Distribution-free testing algorithms for monomials with a sublinear number of queries
Elya Dolev & Dana Ron
Tel-Aviv University

Property testing of (Boolean) functions (“standard/uniform” version)
$f : \{0,1\}^n \to \{0,1\}$ - the tested function
$F$ - family of functions (e.g. linear functions)
Given a distance parameter $\epsilon$ and query access to $f$ (query $x$, get back $f(x)$):
- If $f \in F$, then accept w.p. $\ge 2/3$
- If $\mathrm{dist}(f,F) > \epsilon$, then reject w.p. $\ge 2/3$
where $\mathrm{dist}(f,F) = \min_{g \in F} \{\mathrm{dist}(f,g)\}$ and $\mathrm{dist}(f,g) = \Pr_{x \sim U}[f(x) \ne g(x)]$
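
For very small n, the two distance definitions above can be evaluated exactly by enumeration. The following is only an illustrative sketch: the helper names `dist_uniform` and `dist_to_family` and the two-function family are hypothetical choices, not taken from the slides, and coordinates are 0-based in the code.

```python
from itertools import product

def dist_uniform(f, g, n):
    """Exact Pr_{x ~ U}[f(x) != g(x)], by enumerating all 2^n points."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

def dist_to_family(f, family, n):
    """dist(f, F): the minimum of dist(f, g) over g in the family."""
    return min(dist_uniform(f, g, n) for g in family)

# Hypothetical example: F consists of the two single-variable functions
# x_1 and x_2 on n = 3 bits; f is the monomial x_1 AND x_3.
n = 3
family = [lambda x: x[0], lambda x: x[1]]
f = lambda x: x[0] & x[2]

print(dist_to_family(f, family, n))  # 0.25 (f differs from x_1 on 2 of 8 points)
```

Enumeration costs 2^n evaluations, which is exactly what a tester with few queries is meant to avoid; the sketch only makes the definition concrete.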

Property testing of (Boolean) functions (distribution-free version)
$f : \{0,1\}^n \to \{0,1\}$ - the tested function
$F$ - family of functions (e.g. linear functions)
$D$ - (unknown) underlying distribution
Given a distance parameter $\epsilon$, access to examples distributed by $D$ ($x \sim D$) and query access to $f$ (query $x$, get back $f(x)$):
- If $f \in F$, then accept w.p. $\ge 2/3$
- If $\mathrm{dist}_D(f,F) > \epsilon$, then reject w.p. $\ge 2/3$
where $\mathrm{dist}_D(f,F) = \min_{g \in F} \{\mathrm{dist}_D(f,g)\}$ and $\mathrm{dist}_D(f,g) = \Pr_{x \sim D}[f(x) \ne g(x)]$
Inspired by dist-free PAC learning model [Valiant]
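
To see how much the underlying distribution can matter, here is a small sketch that computes $\mathrm{dist}_D(f,g)$ for an explicitly given D. The dictionary representation of D and the specific example are assumptions made for illustration; in the testing model D is unknown and only accessible through samples.

```python
def dist_D(f, g, D):
    """Pr_{x ~ D}[f(x) != g(x)], for D given as a dict {point: probability}."""
    return sum(p for x, p in D.items() if f(x) != g(x))

f = lambda x: x[0] & x[2]   # x_1 AND x_3 (0-based indexing in code)
g = lambda x: x[0]          # x_1

# Hypothetical distribution that puts most of its weight on a point
# where f and g disagree.
D = {(1, 0, 0): 0.9, (1, 1, 1): 0.1}

print(dist_D(f, g, D))  # 0.9
```

Under the uniform distribution on three bits the same pair f, g disagrees on only 2 of the 8 points, so the two notions of distance can be far apart.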

(Dist-free) Testing and Learning
Dist-free testing was initially considered in [Goldreich,Goldwasser,R], who observed that testing is no harder than (proper) learning (in particular, dist-free + queries).
Q1: When is standard/dist-free testing easier than learning?
Q2: What is the relation between the complexity of standard and dist-free testing?

Testing and Learning
Quite a few classes for which standard testing is easier than learning (under the unif. dist. + queries):
- Linear functions [Blum,Luby,Rubinfeld]
- Low-degree polynomials [Rubinfeld&Sudan]
- Singletons, monomials, small monotone DNF [Parnas,R,Samorodnitsky]
- Monotone functions [Ergun,Kannan,Kumar,Rubinfeld,Viswanathan] [Dodis,Goldreich,Lehman,Raskhodnikova,R,Samorodnitsky]
- Small juntas [Fischer,Kindler,R,Safra,Samorodnitsky]
- Small decision lists, decision trees, DNF (general) [Diakonikolas,Lee,Matulef,Onak,Rubinfeld,Servedio,Wan]
- Linear thresh. functions [Matulef,O’Donnell,Rubinfeld,Servedio]
- ...
Fewer positive results for dist-free testing [Halevy,Kushilevitz]x2. Tends to be more challenging.

Background on distribution-free testing
One of the main positive (and general) results: if a class has a standard tester and can be self-corrected, then it has a dist-free tester [Halevy&Kushilevitz]. In particular, this gives dist-free testers for linear functions and low-degree polynomials.
What about other classes of interest (e.g., from a learning point of view) which don’t have self-correctors?

Background on distribution-free testing
What about other classes of interest?
[Glasner&Servedio] considered the question for monomials (monotone/general), decision lists, linear thresh. func. They prove that every dist-free tester must perform $\Omega((n/\log(n))^{1/5})$ queries (for const. $\epsilon$), in contrast to standard testing of these classes, where there is no dependence on n (and poly dependence on $1/\epsilon$).
Shows that strong dependence on n is unavoidable, but can we get some sublinear dependence on n? (Dist-free learning + queries requires linear dependence [Turán].)

Our Results
We give a positive answer to the question for monomials, both monotone and general. The complexity of our dist-free testing algorithms is $O(n^{1/2}\log(n)/\epsilon)$.

Dist-free testing of monotone monomials
Let MM denote the class of monotone monomials (over n variables). Consider any f in MM (below, $x_j \in f$ means that the variable $x_j$ appears in the monomial f). Observe:
- For each $y$ s.t. $f(y)=0$, there exists $j$ s.t. $y_j=0$ and $x_j \in f$
- For each $y$ s.t. $f(y)=1$, for every $j$ s.t. $y_j=0$, $x_j \notin f$
Example: $y^0=010$, $f(y^0)=0$; $y^1=011$, $f(y^1)=1$
- From $y^0$: $x_1$ or $x_3$ must be in the monomial
- From $y^1$: $x_1$ cannot be in the monomial

Dist-free testing of monotone monomials
Notation: $Z(y)=\{j : y_j=0\}$
Def of the violation hypergraph $H_f$ of a function $f$:
- Its vertex set is $\{0,1\}^n$;
- Each (hyper)edge is a subset $e=\{y^0,y^1,\ldots,y^t\}$ where $f(y^0)=0$ and $f(y^i)=1$ for every $i>0$, s.t. $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$ (so that there is no g in MM consistent with f on e).
Example: $y^0=010$, $y^1=011$, $y^2=110$ ($f(y^0)=0$, $f(y^1)=f(y^2)=1$)
- From $y^0$: $x_1$ or $x_3$ must be in the monomial
- From $y^1$: $x_1$ cannot be in the monomial
- From $y^2$: $x_3$ cannot be in the monomial
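
The edge condition of the violation hypergraph is easy to check mechanically. The sketch below uses hypothetical helper names (`zeros`, `is_violation_edge`) and represents points as tuples of bits, with 1-based coordinates to match the slides.

```python
def zeros(y):
    """Z(y): the set of 1-based coordinates j with y_j = 0."""
    return {j for j, bit in enumerate(y, start=1) if bit == 0}

def is_violation_edge(f, y0, positives):
    """True if {y0} together with the points in `positives` is an edge of H_f."""
    if f(y0) != 0 or any(f(y) != 1 for y in positives):
        return False
    covered = set().union(*(zeros(y) for y in positives))
    return zeros(y0) <= covered   # Z(y0) is contained in the union of the Z(y^i)

# The example from the slide: f(010) = 0, f(011) = f(110) = 1.
table = {(0, 1, 0): 0, (0, 1, 1): 1, (1, 1, 0): 1}
f = lambda y: table[y]

print(is_violation_edge(f, (0, 1, 0), [(0, 1, 1), (1, 1, 0)]))  # True
print(is_violation_edge(f, (0, 1, 0), [(0, 1, 1)]))             # False: index 3 is not covered
```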

Dist-free testing of monotone monomials
Recall the violation hypergraph $H_f$: its vertex set is $\{0,1\}^n$, and each (hyper)edge is a subset $e=\{y^0,y^1,\ldots,y^t\}$ with $f(y^0)=0$, $f(y^i)=1$ for every $i>0$, and $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$.
By def, if f is in MM then there are no edges in $H_f$.
Lemma: If $\mathrm{dist}_D(f,MM) > \epsilon$, then $D(C) > \epsilon$ for every vertex cover $C$ of $H_f$.
Testing algorithm tries to find an edge in $H_f$.
Claim: Let $R \subseteq \{0,1\}^n$. If no $e \in E(H_f)$ is a subset of $R$, then there exists g in MM that agrees with f on R.

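Dist-free testing of monotone monomials
Claim: Let $R \subseteq \{0,1\}^n$. If no $e \in E(H_f)$ is a subset of $R$, then there exists g in MM that agrees with f on R.
Let $S(R) = \{i : y_i=1 \ \forall y \in R \cap f^{-1}(1)\}$ (if $R \cap f^{-1}(1)=\emptyset$ then $S(R)=[n]$).
Define $g(x) = \bigwedge_{i \in S(R)} x_i$. Hence $g(y)=f(y)$ for all $y \in R \cap f^{-1}(1)$.
Consider $y \in R \cap f^{-1}(0)$. Suppose $g(y)=1$, i.e., $y_i=1$ for all $i \in S(R)$. Then every $j \in Z(y)$ lies outside $S(R)$, so some point of $R \cap f^{-1}(1)$ has a 0 in coordinate $j$. But then $\{y\} \cup (R \cap f^{-1}(1))$ is an edge in $H_f$, contrary to the premise of the claim.
[Figure: the set R inside $\{0,1\}^n$, with points marked f(y)=0 and f(y)=1, and the edges $E(H_f)$]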
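
The construction in the claim can be written out directly. This is a minimal sketch with a hypothetical function name (`build_monomial`) and 1-based indices as on the slides; it returns S(R) and g, and leaves it to the caller to check agreement on R.

```python
def build_monomial(f, R, n):
    """Return (S, g): S = S(R) as in the claim, g = AND of x_i over i in S."""
    positives = [y for y in R if f(y) == 1]
    # all(...) over an empty list is True, so S = [n] when R has no positive point.
    S = {i for i in range(1, n + 1) if all(y[i - 1] == 1 for y in positives)}
    g = lambda x: int(all(x[i - 1] == 1 for i in S))
    return S, g

# A set R with no violation edge inside it: g agrees with f on R.
ok = {(0, 1, 1): 1, (1, 1, 1): 1, (1, 0, 0): 0}
S_ok, g_ok = build_monomial(ok.get, list(ok), 3)
print(S_ok, all(g_ok(y) == ok[y] for y in ok))   # {2, 3} True

# The violating example 010 / 011 / 110: g is forced to be x_2 and disagrees on 010.
bad = {(0, 1, 0): 0, (0, 1, 1): 1, (1, 1, 0): 1}
S_bad, g_bad = build_monomial(bad.get, list(bad), 3)
print(S_bad, g_bad((0, 1, 0)))                   # {2} 1
```

The second call shows the contrapositive: when g disagrees with f on some point of R, that point together with the positive points of R forms an edge.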

Dist-free testing of monotone monomials
Testing algorithm tries to find an edge in $H_f$.
Notation: for $Z \subseteq [n]$, $y(Z)$ has all coordinates in $Z$ equal to 0, and all others equal to 1 (e.g., $y(\{1,3\}) = 0101$, $y(\{2\}) = 1011$).
Basic building block: a procedure that, given $y \in f^{-1}(0)$, searches for an index $j$ s.t. $y_j=0$ and $f(y(\{j\}))=0$ (i.e. $x_j$ must be in the monomial if f is in MM).
The procedure performs a binary search:
- Starts with $Z = Z(y)$.
- In each iteration partitions $Z$ into two (nearly) equal parts $Z_1$, $Z_2$, and queries $y(Z_1)$ and $y(Z_2)$.
- Continues with $Z_i$ s.t. $f(y(Z_i))=0$ (if $f(y(Z_1))=f(y(Z_2))=1$ then $\{y(Z),y(Z_1),y(Z_2)\}$ is an edge, so can reject).
- Stops when $|Z|=1$.
[Figure: Z is halved into $Z_1$, $Z_2$ repeatedly, down to a single $\{j\}$ (the rep index)]
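
A minimal sketch of the binary search just described, under a few assumptions of mine: points are tuples of bits, indices are 1-based, the second half is queried only when the first half does not already give f = 0, and the corner case y = 1^n (not discussed on the slide) is treated as a failed search. The names `y_of` and `search` are hypothetical.

```python
def y_of(Z, n):
    """y(Z): coordinates in Z (1-based) are 0, all other coordinates are 1."""
    return tuple(0 if j in Z else 1 for j in range(1, n + 1))

def search(f, y):
    """Binary search from a point y with f(y) = 0.

    Returns (j, None) with f(y({j})) = 0, or (None, edge) if the search
    fails, where `edge` is a tuple of points witnessing a violation.
    """
    n = len(y)
    Z = [j for j in range(1, n + 1) if y[j - 1] == 0]
    if not Z:                      # assumption: y = 1^n with f(y) = 0 counts as failure
        return None, (y,)
    while len(Z) > 1:              # invariant: f(y(Z)) = 0
        half = len(Z) // 2
        Z1, Z2 = Z[:half], Z[half:]
        if f(y_of(Z1, n)) == 0:
            Z = Z1
        elif f(y_of(Z2, n)) == 0:
            Z = Z2
        else:                      # f(y(Z1)) = f(y(Z2)) = 1: found an edge
            return None, (y_of(Z, n), y_of(Z1, n), y_of(Z2, n))
    return Z[0], None

monomial = lambda x: x[0] & x[2]       # x_1 AND x_3
print(search(monomial, (0, 1, 0, 0)))  # (1, None)

or_fn = lambda x: x[0] | x[1]          # not a monomial
print(search(or_fn, (0, 0)))           # (None, ((0, 0), (0, 1), (1, 0)))
```

Each halving step costs at most two queries, so one search uses O(log n) queries.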

Dist-free testing of monotone monomials
Testing algorithm for MM
- Obtain sample $T$ of $\Theta(n^{1/2}/\epsilon)$ points dist. $\sim D$.
- For each $y$ in $T$ s.t. $f(y)=0$ run search proc. on $y$.
- If search failed for some $y$ then reject (and halt): found edge $\{y(Z),y(Z_1),y(Z_2)\}$. Otherwise, let $J$ be the union of all indices returned.
- Obtain sample $T'$ of $\Theta(n^{1/2}/\epsilon)$ points dist. $\sim D$.
- If there exists $y'$ in $T'$ s.t. $f(y')=1$ and $Z(y') \cap J \ne \emptyset$ then reject (found edge $\{y(\{j\}),y'\}$), o.w. accept.
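
Putting the steps together, the tester might be sketched as follows. Everything not on the slide is an assumption: the name `mm_tester`, the sampling oracle `sample_D` (a zero-argument callable returning one point drawn from D), and the constant `c` hidden in the $\Theta(n^{1/2}/\epsilon)$ sample size.

```python
import math
import random

def y_of(Z, n):
    """y(Z): coordinates in Z (1-based) are 0, all others are 1."""
    return tuple(0 if j in Z else 1 for j in range(1, n + 1))

def search(f, y):
    """Binary search from y with f(y) = 0; returns an index j or None on failure."""
    n = len(y)
    Z = [j for j in range(1, n + 1) if y[j - 1] == 0]
    if not Z:
        return None
    while len(Z) > 1:
        half = len(Z) // 2
        Z1, Z2 = Z[:half], Z[half:]
        if f(y_of(Z1, n)) == 0:
            Z = Z1
        elif f(y_of(Z2, n)) == 0:
            Z = Z2
        else:
            return None
    return Z[0]

def mm_tester(f, sample_D, n, eps, c=4):
    """Accept (True) or reject (False); c is a hypothetical constant."""
    m = math.ceil(c * math.sqrt(n) / eps)
    J = set()
    for _ in range(m):                      # first sample T
        y = sample_D()
        if f(y) == 0:
            j = search(f, y)
            if j is None:
                return False                # search failed: an edge was found
            J.add(j)
    for _ in range(m):                      # second sample T'
        y2 = sample_D()
        if f(y2) == 1 and any(y2[j - 1] == 0 for j in J):
            return False                    # edge {y({j}), y'}
    return True

rng = random.Random(0)
n = 6
uniform = lambda: tuple(rng.randint(0, 1) for _ in range(n))
monomial = lambda x: x[0] & x[2]            # x_1 AND x_3
print(mm_tester(monomial, uniform, n, eps=0.1))  # True: a monotone monomial is never rejected
```

The tester only ever rejects after finding an edge of $H_f$, which is why it has one-sided error.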

Dist-free testing of monotone monomials
Testing algorithm for MM (as above): sample $T$, run the search on every $y \in T$ with $f(y)=0$ (reject if a search fails) and let $J$ be the union of the returned indices; then sample $T'$ and reject iff some $y' \in T'$ has $f(y')=1$ and $Z(y') \cap J \ne \emptyset$.
Query complexity of alg: $|T|\log(n)+|T'| = O(n^{1/2}\log(n)/\epsilon)$
If f is in MM, alg always accepts.
If $\mathrm{dist}_D(f,MM) > \epsilon$ then prove that alg rejects w.p. $\ge 2/3$.
Lemma: If $\mathrm{dist}_D(f,MM) > \epsilon$, then w.p. $\ge 5/6$ over the choice of $T$ (of size $\Theta(n^{1/2}/\epsilon)$), either $T_0 = T \cap f^{-1}(0)$ contains a point that fails the search, or $D(Y_1(J)) = \Omega(\epsilon/n^{1/2})$, where $J=J(T_0)$ is the union of indices returned by the search, and $Y_1(J)=\{y \in f^{-1}(1) : Z(y) \cap J \ne \emptyset\}$.
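
The query bound can be unpacked term by term. The sketch below assumes that each binary search makes at most two queries per halving step, hence at most $2\lceil \log_2 n \rceil$ queries, and that evaluating f on each sample point costs one query; these constants are my accounting, not figures from the slides.

```latex
\begin{align*}
\#\text{queries}
  &\le |T| \cdot \bigl(1 + 2\lceil \log_2 n \rceil\bigr) + |T'| \\
  &= \Theta\!\left(\frac{n^{1/2}}{\epsilon}\right) \cdot O(\log n)
     + \Theta\!\left(\frac{n^{1/2}}{\epsilon}\right) \\
  &= O\!\left(\frac{n^{1/2}\log n}{\epsilon}\right).
\end{align*}
```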

Dist-free testing of monotone monomials
Prove contrapositive: If w.p. $> 1/6$ over the choice of $T$ (of size $\Theta(n^{1/2}/\epsilon)$):
- $T_0 = T \cap f^{-1}(0)$ does not contain any empty point (a point for which the search fails), and
- $D(Y_1(J)) = O(\epsilon/n^{1/2})$ (where $Y_1(J)=\{y \in f^{-1}(1) : Z(y) \cap J \ne \emptyset\}$),
then can construct a vertex cover $C$ of $H_f$ s.t. $D(C) \le \epsilon$, so that $\mathrm{dist}_D(f,MM) \le \epsilon$.
First put in $C$ all empty points. Total weight of these points is very small ($O(\epsilon/n^{1/2})$).
Continue in $O(n^{1/2})$ iterations. In iteration $r$ add to $C$ a subset $Y^r \subseteq f^{-1}(1)$ s.t. $D(Y^r)=O(\epsilon/n^{1/2})$.
Why cover? For $y \in f^{-1}(0)$ and $j \in Z(y)$, if $j \in J$ then $Y_1(J)$ covers all edges in $H_f$ that contain $y$ (e.g., $y=0101$, $j=3$, $J=\{3,4\}$: if $y$ is in an edge $\{y=y^0,y^1,\ldots,y^t\}$, then some $y^i$ with $i>0$ has the form ??0?, so that $y^i \in Y_1(J)$).
Can show (by a probabilistic argument) that in each iteration (but the last) there exists $T$ s.t. $D(Y_1(J(T_0))) = O(\epsilon/n^{1/2})$ and $J(T_0)$ contains $\Omega(n^{1/2})$ new indices.
After the last iteration add all $y \in f^{-1}(0)$ whose rep index did not appear in any iteration (can show that these have small weight).

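Dist-free testing of monotone monomials
Suppose w.p. $> 1/6$ over the choice of $T$ (of size $\Theta(n^{1/2}/\epsilon)$), $T_0=T \cap f^{-1}(0)$ does not contain an empty point and $D(Y_1(J(T_0)))=O(\epsilon/n^{1/2})$.
- Init: $C \leftarrow \{\text{empty points}\}$, $J^* \leftarrow \emptyset$ ($J^*$ is the set of “covered indices”)
- Iteration 1: $T^1 \to T^1_0 \to J_1=J(T^1_0) \to Y^1=Y_1(J_1)$, with $|J_1|=\Omega(n^{1/2})$ and $D(Y^1)=O(\epsilon/n^{1/2})$; then $C \leftarrow C \cup Y^1$, $J^* \leftarrow J^* \cup J_1$
- Iteration 2: $T^2 \to T^2_0 \to J_2=J(T^2_0) \to Y^2=Y_1(J_2)$, with $|J_2 \setminus J^*|=\Omega(n^{1/2})$ and $D(Y^2)=O(\epsilon/n^{1/2})$; then $C \leftarrow C \cup Y^2$, $J^* \leftarrow J^* \cup J_2$
- ...
- Iteration s ($s=O(n^{1/2})$): $T^s \to T^s_0 \to J_s=J(T^s_0) \to Y^s=Y_1(J_s)$, with $D(Y^s)=O(\epsilon/n^{1/2})$; then $C \leftarrow C \cup Y^s$, $J^* \leftarrow J^* \cup J_s$
- Finally: $C \leftarrow C \cup \{\text{all } y \in f^{-1}(0) \text{ s.t. } Z(y) \cap J^* = \emptyset\}$; this last set also has small weight, $O(\epsilon/n^{1/2})$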
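
The weight of the cover can be summed up as follows. This is a sketch of the accounting only: it uses the per-step bounds stated on the slides and takes the constants hidden in the O-notation to be small enough that the total is at most $\epsilon$, which the slides do not spell out. The number of iterations is $s = O(n^{1/2})$ because every iteration but the last adds $\Omega(n^{1/2})$ new indices to $J^* \subseteq [n]$.

```latex
\begin{align*}
D(C) &\le \underbrace{D(\text{empty points})}_{O(\epsilon/n^{1/2})}
        + \sum_{r=1}^{s} \underbrace{D(Y^r)}_{O(\epsilon/n^{1/2})}
        + \underbrace{D(\text{uncovered } y \in f^{-1}(0))}_{\text{small}} \\
     &= O\!\left(\frac{\epsilon}{n^{1/2}}\right)
        + O(n^{1/2}) \cdot O\!\left(\frac{\epsilon}{n^{1/2}}\right)
        + \text{small}
      \;=\; O(\epsilon).
\end{align*}
```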

Dist-free testing of general monomials
Let GM denote the class of (general) monomials (over n variables). Consider any f in GM. Observe:
- For each $y$ s.t. $f(y)=0$, either there exists $j$ s.t. $y_j=0$ and $x_j \in f$, or there exists $j$ s.t. $y_j=1$ and $\neg x_j \in f$
- For each $y$ s.t. $f(y)=1$, for every $j$ s.t. $y_j=0$, $x_j \notin f$, and for every $j$ s.t. $y_j=1$, $\neg x_j \notin f$
Example: $y^0=010$, $f(y^0)=0$; $y^1=011$, $f(y^1)=1$
- From $y^0$: $x_1$ or $\neg x_2$ or $x_3$ must be in the monomial
- From $y^1$: $x_1$ cannot be in the monomial, $\neg x_2$ cannot be in the monomial, $\neg x_3$ cannot be in the monomial

Dist-free testing of general monomials
First, modify the notion of the violation hypergraph $H_f$: each edge $\{y^0,y^1,\ldots,y^t\}$ still satisfies $f(y^0)=0$, $f(y^i)=1$ for $i>0$, but now $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$ and $O(y^0) \subseteq \bigcup_{i>0} O(y^i)$, where $O(y)=\{j : y_j=1\}$.
Next, binary search is performed on $y$ in $f^{-1}(0)$ but “w.r.t.” $w$ in $f^{-1}(1)$. The search finds an index $j$ s.t. $f(w')=0$ for $w'$ that differs from $w$ only on the $j$'th coordinate (in the monotone case, implicitly $w = 1^n$).
After performing the search on $O(n^{1/2}/\epsilon)$ sample points in $f^{-1}(0)$ (w.r.t. the same $w$) and obtaining a set $J$ of “representative indices”, take an additional sample and see if it contains $y$ in $f^{-1}(1)$ s.t. $y_j \ne w_j$ for some $j$ in $J$.
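
The slide does not spell out the search "w.r.t. w", so the following is one plausible reading rather than the authors' procedure: binary-search over the set of coordinates where y and w differ, querying hybrid points that equal w except on the current subset, where they take y's values. The names `hybrid` and `search_wrt` are hypothetical; indices are 1-based.

```python
def hybrid(w, y, S):
    """w with the (1-based) coordinates in S replaced by y's values."""
    return tuple(y[j - 1] if j in S else w[j - 1] for j in range(1, len(w) + 1))

def search_wrt(f, y, w):
    """Given f(y) = 0 and f(w) = 1, look for j such that flipping only
    coordinate j of w gives f = 0. Returns j, or None if the search fails
    (in which case no monomial is consistent with the queried points)."""
    n = len(w)
    diff = [j for j in range(1, n + 1) if y[j - 1] != w[j - 1]]
    if not diff:                     # y == w cannot have f(y) = 0 and f(w) = 1
        return None
    while len(diff) > 1:             # invariant: f(hybrid(w, y, diff)) = 0
        half = len(diff) // 2
        d1, d2 = diff[:half], diff[half:]
        if f(hybrid(w, y, d1)) == 0:
            diff = d1
        elif f(hybrid(w, y, d2)) == 0:
            diff = d2
        else:
            return None
    return diff[0]

f = lambda x: x[0] & (1 - x[1])      # the general monomial x_1 AND (NOT x_2)
w = (1, 0, 1, 0)                     # f(w) = 1
y = (1, 1, 0, 0)                     # f(y) = 0
print(search_wrt(f, y, w))           # 2: flipping coordinate 2 of w gives f = 0
```

With w = 1^n the hybrid points are exactly the points y(Z) of the monotone case, so this reading reduces to the monotone search.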

Summary and Open problems
Give sublinear ($\tilde{O}(n^{1/2})$) algorithms for dist-free testing of monotone/general monomials. (Alg for general monomials extends alg for monotone monomials.)
Two natural questions:
- What is the exact complexity of dist-free testing of monomials? (Lower bound of [GS] is $\Omega(n^{1/5})$, up to a logarithmic factor.)
- What about other classes studied by [GS]? (Decision lists and linear threshold functions.)

Thanks

Standard vs. dist-free testing of monomials
When the underlying distribution is uniform (standard testing), if f is a k-monomial, then $\Pr[f(x)=1] = 2^{-k}$, and so can effectively consider only monomials where $k = O(\log(1/\epsilon))$.
This is not generally true in the dist-free case. Specifically, the lower bound of [GS] constructs functions that depend on many variables, and the underlying dist. D helps to “hide non-monomiality”.
Note: dist-free testing for (monotone) k-monomials when k is fixed can be done using exp(k) samples + queries (combine [PRS] and [HK]).
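
The $2^{-k}$ claim is easy to confirm by enumeration for small n. The sketch below uses a hypothetical helper `accept_fraction` and, as an assumption for concreteness, takes the k-monomial to be $x_1 \wedge \ldots \wedge x_k$.

```python
from itertools import product

def accept_fraction(k, n):
    """Fraction of x in {0,1}^n on which the monomial x_1 AND ... AND x_k is 1."""
    points = product((0, 1), repeat=n)
    return sum(all(x[:k]) for x in points) / 2 ** n

print(accept_fraction(3, 6))   # 0.125, i.e. 2 ** -3
print(accept_fraction(5, 6))   # 0.03125, i.e. 2 ** -5
```

Under the uniform distribution, two monomials that are each 1 on less than an $\epsilon/2$ fraction of the points differ on less than an $\epsilon$ fraction, which is why large k can be ignored there; under an arbitrary D the weight of $f^{-1}(1)$ need not shrink with k.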
