Distribution-free testing algorithms for monomials with a sublinear number of queries Elya Dolev & Dana Ron Tel-Aviv University.

Presentation transcript:

Distribution-free testing algorithms for monomials with a sublinear number of queries Elya Dolev & Dana Ron Tel-Aviv University

Property testing of (Boolean) functions ("standard/uniform" version)
$f : \{0,1\}^n \to \{0,1\}$ - the tested function
$F$ - family of functions (e.g. linear functions)
Given a distance parameter $\epsilon$ and query access to $f$ (a query $x$ is answered with $f(x)$):
- If $f \in F$, then accept w.p. $\ge 2/3$
- If $\mathrm{dist}(f,F) > \epsilon$, then reject w.p. $\ge 2/3$
where $\mathrm{dist}(f,F) = \min_{g \in F} \{\mathrm{dist}(f,g)\}$ and $\mathrm{dist}(f,g) = \Pr_{x \sim U}[f(x) \ne g(x)]$.

Property testing of (Boolean) functions: distribution-free version
$f : \{0,1\}^n \to \{0,1\}$ - the tested function
$F$ - family of functions (e.g. linear functions)
$D$ - (unknown) underlying distribution
Given a distance parameter $\epsilon$, access to examples $x \sim D$, and query access to $f$ (a query $x$ is answered with $f(x)$):
- If $f \in F$, then accept w.p. $\ge 2/3$
- If $\mathrm{dist}_D(f,F) > \epsilon$, then reject w.p. $\ge 2/3$
where $\mathrm{dist}_D(f,F) = \min_{g \in F} \{\mathrm{dist}_D(f,g)\}$ and $\mathrm{dist}_D(f,g) = \Pr_{x \sim D}[f(x) \ne g(x)]$.
Inspired by the dist-free PAC learning model [Valiant].
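To make the role of the underlying distribution concrete, here is a minimal Python sketch that estimates Pr over x drawn from D of [f(x) ≠ g(x)] by sampling. The helper names (`estimate_dist`, `uniform_sampler`, `point_mass_sampler`) and the default sample count are illustrative assumptions, not part of the slides; points are represented as 0/1 tuples.

```python
import random


def estimate_dist(f, g, sample_from_d, num_samples=10_000, seed=0):
    """Empirical estimate of Pr_{x ~ D}[f(x) != g(x)].

    sample_from_d(rng) is a hypothetical interface: it returns one point
    drawn from D, using the supplied random.Random instance.
    """
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(num_samples):
        x = sample_from_d(rng)
        if f(x) != g(x):
            disagreements += 1
    return disagreements / num_samples


def uniform_sampler(n):
    """D = uniform distribution over {0,1}^n (the 'standard' setting)."""
    return lambda rng: tuple(rng.randint(0, 1) for _ in range(n))


def point_mass_sampler(point):
    """D puts all of its weight on a single point."""
    return lambda rng: point
```

For f = x1 AND x2 and g = x1, the two functions disagree exactly when x1 = 1 and x2 = 0, so the distance is 1/4 under the uniform distribution, 1 under a point mass on 101, and 0 under a point mass on 111. The same pair of functions can thus be close or far depending on D.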

(Dist-free) Testing and Learning
Dist-free testing was initially considered in [Goldreich,Goldwasser,R], who observed that testing is no harder than (proper) learning (in particular, dist-free + queries).
Q1: When is standard/dist-free testing easier than learning?
Q2: What is the relation between the complexity of standard and dist-free testing?

Testing and Learning
Quite a few classes for which standard testing is easier than learning (under the unif. dist. + queries):
- Linear functions [Blum,Luby,Rubinfeld]
- Low-degree polynomials [Rubinfeld&Sudan]
- Singletons, monomials, small monotone DNF [Parnas,R,Samorodnitsky]
- Monotone functions [Ergun,Kannan,Kumar,Rubinfeld,Viswanathan] [Dodis,Goldreich,Lehman,Raskhodnikova,R,Samorodnitsky]
- Small juntas [Fischer,Kindler,R,Safra,Samorodnitsky]
- Small decision lists, decision trees, DNF (general) [Diakonikolas,Lee,Matulef,Onak,Rubinfeld,Servedio,Wan]
- Linear thresh. functions [Matulef,O'Donnell,Rubinfeld,Servedio]
- ...
Fewer positive results for dist-free testing (two papers by [Halevy,Kushilevitz]). Tends to be more challenging.

Background on distribution-free testing
One of the main positive (and general) results: if a class has a standard tester and can be self-corrected, then it has a dist-free tester [Halevy&Kushilevitz]. In particular, this gives dist-free testers for linear functions and low-degree polynomials.
What about other classes of interest (e.g., from the learning point of view) which don't have self-correctors?

Background on distribution-free testing
What about other classes of interest? [Glasner&Servedio] considered the question for monomials (monotone/general), decision lists, and linear thresh. func.
They prove that every dist-free tester must perform $\Omega((n/\log n)^{1/5})$ queries (for const. $\epsilon$), in contrast to standard testing of these classes, where there is no dependence on $n$ (and poly dependence on $1/\epsilon$).
This shows that strong dependence on $n$ is unavoidable, but can we get some sublinear dependence on $n$? (Dist-free learning + queries requires linear dependence [Turan].)

Our Results
We give a positive answer to the question for monomials, both monotone and general. The complexity of our dist-free testing algorithms is $O(n^{1/2} \log(n)/\epsilon)$.

Dist-free testing of monotone monomials
Let MM denote the class of monotone monomials (over $n$ variables). Consider any $f$ in MM. Observe:
- For each $y$ s.t. $f(y)=0$, there exists $j$ s.t. $y_j=0$ and $x_j \in f$.
- For each $y$ s.t. $f(y)=1$, for every $j$ s.t. $y_j=0$, $x_j \notin f$.
Example: $y^0=010$, $f(y^0)=0$; $y^1=011$, $f(y^1)=1$. From $y^0$: $x_1$ or $x_3$ must be in the monomial. From $y^1$: $x_1$ cannot be in the monomial.

Dist-free testing of monotone monomials
Notation: $Z(y)=\{j : y_j=0\}$.
Def. of the violation hypergraph $H_f$ of a function $f$:
- Its vertex set is $\{0,1\}^n$;
- Each (hyper)edge is a subset $e=\{y^0,y^1,\ldots,y^t\}$ where $f(y^0)=0$ and $f(y^i)=1$ for every $i>0$, s.t. $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$ (so that there is no $g$ in MM consistent with $f$ on $e$).
Example: $y^0=010$, $y^1=011$, $y^2=110$ ($f(y^0)=0$, $f(y^1)=f(y^2)=1$). From $y^0$: $x_1$ or $x_3$ must be in the monomial. From $y^1$: $x_1$ cannot be in the monomial. From $y^2$: $x_3$ cannot be in the monomial.
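The edge condition of the violation hypergraph can be checked mechanically. The following is a small Python sketch under stated assumptions: points are 0/1 tuples, coordinates are 0-indexed (the slides count from 1), and the names `zeros` and `is_violation_edge` are hypothetical helpers introduced only for illustration.

```python
def zeros(y):
    """Z(y): the set of (0-indexed) coordinates on which y is 0."""
    return {j for j, bit in enumerate(y) if bit == 0}


def is_violation_edge(f, y0, one_points):
    """Check whether {y0} together with one_points is an edge of H_f.

    Requires f(y0) = 0, f(y) = 1 for every y in one_points, and
    Z(y0) contained in the union of Z(y) over y in one_points.
    """
    if f(y0) != 0 or any(f(y) != 1 for y in one_points):
        return False
    covered = set()
    for y in one_points:
        covered |= zeros(y)
    return zeros(y0) <= covered
```

On the slide's example (010 labeled 0, and 011, 110 labeled 1) the three points form an edge, while 010 together with 011 alone does not, because the third coordinate of 010 is left uncovered.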

Dist-free testing of monotone monomials
Def. of the violation hypergraph $H_f$ of a function $f$:
- Its vertex set is $\{0,1\}^n$;
- Each (hyper)edge is a subset $e=\{y^0,y^1,\ldots,y^t\}$ where $f(y^0)=0$ and $f(y^i)=1$ for every $i>0$, s.t. $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$.
By def., if $f$ is in MM then there are no edges in $H_f$.
Lemma: If $\mathrm{dist}_D(f,\mathrm{MM}) > \epsilon$, then $D(C) > \epsilon$ for every vertex cover $C$ of $H_f$.
The testing algorithm tries to find an edge in $H_f$.
Claim: Let $R \subseteq \{0,1\}^n$. If no $e \in E(H_f)$ is a subset of $R$, then there exists $g$ in MM that agrees with $f$ on $R$.

Dist-free testing of monotone monomials
Claim: Let $R \subseteq \{0,1\}^n$. If no $e \in E(H_f)$ is a subset of $R$, then there exists $g$ in MM that agrees with $f$ on $R$.
Let $S(R) = \{i : y_i=1 \ \forall y \in R \cap f^{-1}(1)\}$ (if $R \cap f^{-1}(1)=\emptyset$ then $S=[n]$).
Define $g(x) = \bigwedge_{i \in S(R)} x_i$. Hence $g(y)=f(y)$, $\forall y \in R \cap f^{-1}(1)$.
Consider $y \in R \cap f^{-1}(0)$. Suppose $g(y)=1$, i.e., $y_i=1$, $\forall i \in S$. But then $\{y\} \cup (R \cap f^{-1}(1))$ is an edge in $H_f$, contrary to the premise of the claim.
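The construction in the claim can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: R is given as a dictionary from 0/1 tuples to the value of f, coordinates are 0-indexed, and the function name `consistent_monotone_monomial` is hypothetical.

```python
def consistent_monotone_monomial(labeled_points, n):
    """Build S(R) as in the claim and check the resulting monomial g on R.

    labeled_points maps each point of R (a 0/1 tuple of length n) to f's value.
    Returns the index set S(R) if g(x) = AND of x_i over i in S(R) agrees with
    f on all of R, and None otherwise (in which case, by the claim, R must
    contain an edge of the violation hypergraph).
    """
    positives = [y for y, value in labeled_points.items() if value == 1]
    # Coordinates equal to 1 in every positive point of R.
    # With no positive points, all() over an empty list is True, so S = [n].
    s = {i for i in range(n) if all(y[i] == 1 for y in positives)}

    def g(x):
        return int(all(x[i] == 1 for i in s))

    if all(g(y) == value for y, value in labeled_points.items()):
        return s
    return None
```

For R = {010 labeled 0, 011 labeled 1} the sketch returns the index set of the monomial x2 AND x3 (indices {1, 2} when counting from 0). Adding 110 labeled 1 shrinks S(R) to the single variable x2, which wrongly accepts 010, so the function returns None, matching the three-point edge from the earlier example.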

Dist-free testing of monotone monomials
The testing algorithm tries to find an edge in $H_f$.
Notation: for $Z \subseteq [n]$, $y(Z)$ has all coordinates in $Z$ equal to 0, and all others equal to 1 (e.g., $y(\{1,3\}) = 0101$, $y(\{2\}) = 1011$).
Basic building block: a procedure that, given $y \in f^{-1}(0)$, searches for an index $j$ s.t. $y_j=0$ and $f(y(\{j\}))=0$ (i.e., $x_j$ must be in the monomial if $f$ is in MM). The returned index $j$ is the representative (rep) index of $y$.
The procedure performs a binary search:
- Starts with $Z = Z(y)$.
- In each iteration partitions $Z$ into two equal parts $Z_1$, $Z_2$, and queries $y(Z_1)$ and $y(Z_2)$.
- Continues with $Z_i$ s.t. $f(y(Z_i))=0$ (if $f(y(Z_1))=f(y(Z_2))=1$ then $\{y(Z),y(Z_1),y(Z_2)\}$ is an edge, so can reject).
- Stops when $|Z|=1$.
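The binary search can be sketched as follows. This is a minimal Python illustration under assumptions: coordinates are 0-indexed, the names `point_with_zeros` and `find_representative_index` are hypothetical, and the second half is queried only when the first half does not evaluate to 0 (the slide queries both halves; either way at most about 2 log n queries are made). Returning None for the all-ones point is also an assumption of this sketch: a point with no zero coordinates and f value 0 is by itself inconsistent with every monotone monomial.

```python
def point_with_zeros(zero_set, n):
    """y(Z): the point that is 0 on the coordinates in Z and 1 elsewhere."""
    zero_set = set(zero_set)
    return tuple(0 if j in zero_set else 1 for j in range(n))


def find_representative_index(f, y):
    """Binary search for j with y_j = 0 and f(y({j})) = 0.

    Returns the index j on success and None if the search fails,
    i.e. if evidence that f is not a monotone monomial was found.
    """
    if f(y) != 0:
        raise ValueError("the search is only defined for points with f(y) = 0")
    n = len(y)
    z = [j for j in range(n) if y[j] == 0]  # Z(y); note that y(Z(y)) = y
    if not z:
        return None  # assumption: f(1^n) = 0 is itself a violation
    while len(z) > 1:
        half = len(z) // 2
        z1, z2 = z[:half], z[half:]
        if f(point_with_zeros(z1, n)) == 0:
            z = z1
        elif f(point_with_zeros(z2, n)) == 0:
            z = z2
        else:
            # f(y(Z)) = 0 but f(y(Z1)) = f(y(Z2)) = 1: an edge was found.
            return None
    return z[0]
```

For the monomial x1 AND x3 over four variables (indices 0 and 2 when counting from 0), the search started at 0101 returns index 0, and started at 1000 it returns index 2. For f = x1 OR x2 over two variables, the search started at 00 fails, since both 01 and 10 evaluate to 1.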

Dist-free testing of monotone monomials Testing algorithm for MM - Obtain sample T of  (n 1/2 /  ) points dist.  D. - For each y in T s.t. f(y)=0 run search proc. on y. - If search failed for some y then reject (and halt). Otherwise, let J be union of all indices returned. - Obtain sample T’ of  (n 1/2 /  ) points dist.  D. - If exists y’ in T’ s.t. f(y’)=1 and Z(y’)  J   then reject, o.w. accept. Found edge {y(Z),y(Z 1 ),y(Z 2 )} Found edge {y({j}),y’}

Dist-free testing of monotone monomials Testing algorithm for MM - Obtain sample T of  (n 1/2 /  ) points dist.  D. - For each y in T s.t. f(y)=0 run search proc. on y. - If search failed for some y then reject (and halt). Otherwise, let J be union of all indices returned. - Obtain sample T’ of  (n 1/2 /  ) points dist.  D. - If exists y’ in T’ s.t. f(y’)=1 and Z(y’)  J   then reject, o.w. accept. Query complexity of alg: |T|log(n)+|T’| = O(n 1/2 log(n)/  ) If f in MM, alg always accepts. If dist D (f,MM) >  then prove that rejects w.p.  2/3. Lemma: If dist D (f,MM) > , then w.p.  5/6 over choice of T (of size  (n 1/2 /  ) ), either T 0 =T  f -1 (0) contains point that fails search or D(Y 1 (J))=  (  /n 1/2 ) where J=J(T 0 ) is union of indices returned by search, and Y 1 (J)={y  f -1 (1): Z(y)  J  }.

Dist-free testing of monotone monomials
Prove the contrapositive. An empty point is a point for which the search fails. If w.p. $> 1/6$ over the choice of $T$ (of size $\Theta(n^{1/2}/\epsilon)$):
- $T_0 = T \cap f^{-1}(0)$ does not contain any empty point, and
- $D(Y_1(J)) = O(\epsilon/n^{1/2})$ (where $Y_1(J)=\{y \in f^{-1}(1) : Z(y) \cap J \ne \emptyset\}$),
then we can construct a vertex cover $C$ of $H_f$ s.t. $D(C) \le \epsilon$, so that $\mathrm{dist}_D(f,\mathrm{MM}) \le \epsilon$.
First put in $C$ all empty points. The total weight of these points is very small ($O(\epsilon/n^{1/2})$).
Continue in $O(n^{1/2})$ iterations. In iteration $r$ add to $C$ a subset $Y^r \subseteq f^{-1}(1)$ s.t. $D(Y^r)=O(\epsilon/n^{1/2})$.
Why is this a cover? For $y \in f^{-1}(0)$ and $j \in Z(y)$, if $j \in J$ then $Y_1(J)$ covers all edges in $H_f$ that contain $y$ (e.g., $y=0101$, $j=3$, $J=\{3,4\}$: if $y$ is in an edge $\{y=y^0,y^1,\ldots,y^t\}$, then some $y^i$ has the form ??0?, so that $y^i \in Y_1(J)$).
Can show (by a prob. argument) that in each iteration (but the last) there exists $T$ s.t. $D(Y_1(J(T_0))) = O(\epsilon/n^{1/2})$ and $J(T_0)$ contains $\Omega(n^{1/2})$ new indices.
After the last iteration add all $y \in f^{-1}(0)$ whose rep index did not appear in any iteration (can show that these have small weight).

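Dist-free testing of monotone monomials
Suppose w.p. $> 1/6$ over the choice of $T$ (of size $\Theta(n^{1/2}/\epsilon)$), $T_0 = T \cap f^{-1}(0)$ does not contain an empty point and $D(Y_1(J(T_0))) = O(\epsilon/n^{1/2})$.
Construction of the cover:
- $C \leftarrow \{\text{empty points}\}$, $J^* \leftarrow \emptyset$ ($J^*$ is the set of "covered indices").
- $T^1 \to T^1_0 \to J^1=J(T^1_0) \to Y^1 = Y_1(J^1)$; $C \leftarrow C \cup Y^1$, $J^* \leftarrow J^* \cup J^1$; $|J^1| = \Omega(n^{1/2})$, $D(Y^1)=O(\epsilon/n^{1/2})$.
- $T^2 \to T^2_0 \to J^2=J(T^2_0) \to Y^2 = Y_1(J^2)$; $C \leftarrow C \cup Y^2$, $J^* \leftarrow J^* \cup J^2$; $|J^2 \setminus J^*| = \Omega(n^{1/2})$ (for $J^*$ before the update), $D(Y^2)=O(\epsilon/n^{1/2})$.
- ...
- $T^s \to T^s_0 \to J^s=J(T^s_0) \to Y^s = Y_1(J^s)$ ($s=O(n^{1/2})$); $C \leftarrow C \cup Y^s$, $J^* \leftarrow J^* \cup J^s$; $D(Y^s)=O(\epsilon/n^{1/2})$.
- $C \leftarrow C \cup \{\text{all } y \in f^{-1}(0) \text{ s.t. } Z(y) \cap J^* = \emptyset\}$.
The last set added also has small weight, $O(\epsilon/n^{1/2})$, so in total $D(C) \le \epsilon$.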

Dist-free testing of general monomials
Let GM denote the class of (general) monomials (over $n$ variables). Consider any $f$ in GM. Observe:
- For each $y$ s.t. $f(y)=0$, either there exists $j$ s.t. $y_j=0$ and $x_j \in f$, or there exists $j$ s.t. $y_j=1$ and $\neg x_j \in f$.
- For each $y$ s.t. $f(y)=1$, for every $j$ s.t. $y_j=0$, $x_j \notin f$, and for every $j$ s.t. $y_j=1$, $\neg x_j \notin f$.
Example: $y^0=010$, $f(y^0)=0$; $y^1=011$, $f(y^1)=1$. From $y^0$: $x_1$ or $\neg x_2$ or $x_3$ must be in the monomial. From $y^1$: $x_1$, $\neg x_2$, and $\neg x_3$ cannot be in the monomial.

Dist-free testing of general monomials
First, modify the notion of the violation hypergraph $H_f$: each edge $\{y^0,y^1,\ldots,y^t\}$ still satisfies $f(y^0)=0$, $f(y^i)=1$, $i>0$, but now $Z(y^0) \subseteq \bigcup_{i>0} Z(y^i)$ and $O(y^0) \subseteq \bigcup_{i>0} O(y^i)$ (with $O(y)=\{j : y_j=1\}$, by analogy with $Z(y)$).
Next, the binary search is performed on $y$ in $f^{-1}(0)$ but "w.r.t." $w$ in $f^{-1}(1)$. The search finds an index $j$ s.t. $f(w')=0$ for $w'$ that differs from $w$ only on the $j$'th coordinate (in the monotone case, implicitly $w = 1^n$).
After performing the search on $O(n^{1/2}/\epsilon)$ sample points in $f^{-1}(0)$ (w.r.t. the same $w$) and obtaining a set $J$ of "representative indices", take an additional sample and see if it contains $y$ in $f^{-1}(1)$ s.t. $y_j \ne w_j$ for some $j$ in $J$.
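The slide describes the search "w.r.t. w" only briefly. The Python sketch below is one plausible reading, inferred by analogy with the monotone case rather than taken from the slides: the search runs over the coordinates on which y and w differ and queries w with a subset of those coordinates flipped. The names `flip`, `search_wrt`, and `second_phase_rejects` are hypothetical, and coordinates are 0-indexed.

```python
def flip(w, coords):
    """Copy of w with the bits at the given coordinates flipped."""
    coords = set(coords)
    return tuple(1 - bit if j in coords else bit for j, bit in enumerate(w))


def search_wrt(f, y, w):
    """Search for j with y_j != w_j such that flipping only bit j of w gives f = 0.

    Requires f(y) = 0 and f(w) = 1. Returns None if both halves evaluate
    to 1 at some step (the search fails).
    """
    if f(y) != 0 or f(w) != 1:
        raise ValueError("need f(y) = 0 and f(w) = 1")
    # Coordinates where y and w differ; flipping all of them turns w into y.
    z = [j for j in range(len(w)) if y[j] != w[j]]
    while len(z) > 1:
        half = len(z) // 2
        z1, z2 = z[:half], z[half:]
        if f(flip(w, z1)) == 0:
            z = z1
        elif f(flip(w, z2)) == 0:
            z = z2
        else:
            return None
    return z[0]


def second_phase_rejects(f, sample_points, w, rep_indices):
    """True if some sampled y has f(y) = 1 and differs from w on an index in J."""
    return any(
        f(y) == 1 and any(y[j] != w[j] for j in rep_indices)
        for y in sample_points
    )
```

With w equal to the all-ones point, flipping a set of coordinates of w yields exactly the point that is 0 on that set, so this sketch reduces to the monotone search. For f = x1 AND NOT x2 with w = 101, the search from y = 011 returns the first coordinate and the search from y = 110 returns the second.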

Summary and Open problems
We give sublinear ($\tilde{O}(n^{1/2})$) algorithms for dist-free testing of monotone/general monomials. (The alg. for general monomials extends the alg. for monotone monomials.)
Two natural questions:
- What is the exact complexity of dist-free testing of monomials? (The lower bound of [GS] is $\tilde{\Omega}(n^{1/5})$.)
- What about other classes studied by [GS]? (Decision lists and linear threshold functions.)

Thanks

Standard vs. dist-free testing of monomials
When the underlying distribution is uniform (standard testing), if $f$ is a $k$-monomial, then $\Pr[f(x)=1] = 2^{-k}$, and so we can effectively consider only monomials where $k = O(\log(1/\epsilon))$.
This is not generally true in the dist-free case. Specifically, the lower bound of [GS] constructs functions that depend on many variables, and the underlying dist. $D$ helps to "hide non-monomiality".
Note: dist-free testing for (monotone) $k$-monomials when $k$ is fixed can be done using $\exp(k)$ samples + queries (combine [PRS] and [HK]).
