Property Testing: A Learning Theory Perspective Dana Ron Tel Aviv University.

Property Testing (Informal Definition) For a fixed property P and any object O, determine whether O has property P, or whether O is far from having property P (i.e., far from every object that has P). The task should be performed by inspecting the object (in as few places as possible).

Examples The object can be a function and the property can be linearity. The object can be a string and the property can be membership in a fixed regular language L. The object can be a graph and the property can be 3-colorability.

Two Views of Property Testing A relaxation of exactly deciding whether the object has the property or does not have the property. A relaxation of learning the object (with membership queries and under the uniform distribution). In either case we want the testing algorithm to be significantly more efficient than the corresponding decision/learning algorithm. Q: Which view is “more right”? A: Depends… mainly on the type of objects and properties studied: combinatorial objects and properties vs. function classes that are of interest to the learning community.

A Little Background Initially defined by Rubinfeld and Sudan in the context of Program Testing (of algebraic functions). Together with Goldreich and Goldwasser, initiated the study of testing properties of combinatorial objects, and in particular graphs. A growing body of work deals with properties of functions, graphs, strings, sets of points... Many algorithms have complexity that is sublinear in (or even independent of) the size of the object.

Formal Definition (“standard” model) A property testing algorithm for property P is given a distance parameter ε and query access(*) to a function f. If f has property P then the algorithm should accept w.h.p. If f is ε-far from any function having property P then the algorithm should reject w.h.p. Distance is measured with respect to the uniform distribution.(**) (*) May consider testing with random examples only. (**) May consider other distributions (including unknown distributions).

Property Testing and Learning: Basic Comments/Observations Comment (Motivation): Can use testing as a preliminary step to learning, that is, for efficiently selecting a good hypothesis class. Observation: Testing is no harder than (proper) learning: if we have a learning algorithm for function class F that outputs a hypothesis in F, then we can use it to test the property of belonging to F. That is, run the learning algorithm with accuracy parameter set to ε/2, and check that the hypothesis it outputs is at most 3ε/4-far from f on an independent sample. We want the testing algorithm to be more efficient than the learning algorithm.

Classes/Properties for which testing is more efficient than learning: Linear functions; Low-degree polynomials; Singletons, Monomials; (small) DNF and general Boolean formulas and circuits; Monotone functions; Juntas; Halfspaces; Decision lists, decision trees, branching programs; Clustering; Properties of distributions. Two recurring approaches: the “Self-Correcting” approach and the “Enforce & Test” approach.

Linearity Testing [Blum, Luby, Rubinfeld] Def1: Let F be a finite field. A function f: F^n → F is called linear (multi-linear) if there exist constants a_1,…,a_n ∈ F s.t. for every x = x_1,…,x_n ∈ F^n it holds that f(x) = Σ_i a_i x_i. Fact: A function f: F^n → F is linear iff for every x,y ∈ F^n it holds that f(x)+f(y) = f(x+y). Def2: A function f is said to be ε-far from linear if for every linear function g, dist(f,g) > ε, where dist(f,g) = Pr[f(x) ≠ g(x)] (x selected uniformly in F^n).

Linearity Testing Cont’ Linearity Test: 1) Uniformly and independently select Θ(1/ε) pairs of elements x,y ∈ F^n. 2) For every pair x,y selected, verify that f(x)+f(y) = f(x+y). 3) If linearity is violated for any of the selected pairs (i.e., f(x)+f(y) ≠ f(x+y)), then REJECT; otherwise ACCEPT. Theorem: If f is linear then the test accepts w.p. 1, and if f is ε-far from linear then with probability at least 2/3 the test rejects it. Query complexity: Θ(1/ε), i.e., independent of n. In contrast, learning requires Ω(n) queries/examples. Lemma: If f is accepted with probability greater than 1/3, then f is ε-close to linear.
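The test can be sketched in Python over GF(2)^n; the trial count ⌈3/ε⌉ stands in for the Θ(1/ε) of the theorem, and the exact constant is an illustrative assumption:

```python
import random

def blr_linearity_test(f, n, eps):
    """BLR linearity test over GF(2)^n: check f(x)+f(y) = f(x+y)
    on Theta(1/eps) uniformly random pairs."""
    trials = max(1, round(3 / eps))  # Theta(1/eps); constant is illustrative
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [random.randint(0, 1) for _ in range(n)]
        x_plus_y = [(a + b) % 2 for a, b in zip(x, y)]  # addition in GF(2)^n
        if (f(x) + f(y)) % 2 != f(x_plus_y):
            return False  # a violating pair was found: REJECT
    return True  # no violation found: ACCEPT
```

A linear f is accepted with probability 1; for f that is far from linear, a constant fraction of pairs violate the identity, so Θ(1/ε) pairs suffice to reject w.h.p.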

Linearity Testing Cont’ Lemma: If f is accepted with probability greater than 1/3, then f is ε-close to linear. Suppose f is accepted w.p. > 1/3; then there is a small (< ε/2) fraction of violating pairs (pairs with f(x)+f(y) ≠ f(x+y)). Define the self-corrected version of f, denoted g: for each x,y let V_y(x) = f(x+y) − f(y) (the vote of y on x), and let g(x) = Plurality_y(V_y(x)). Can show that (conditioned on a < ε/2 fraction of violating pairs): (1) g is linear; (2) dist(f,g) ≤ ε.
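The plurality vote can be sketched as follows, over GF(2) where subtraction is XOR; the number of votes is an illustrative assumption:

```python
import random
from collections import Counter

def self_correct(f, x, n, votes=41):
    """Self-corrected value g(x): the plurality over random y of the
    vote V_y(x) = f(x+y) - f(y) (over GF(2), subtraction is XOR)."""
    tally = Counter()
    for _ in range(votes):
        y = [random.randint(0, 1) for _ in range(n)]
        x_plus_y = [(a + b) % 2 for a, b in zip(x, y)]
        tally[f(x_plus_y) ^ f(y)] += 1  # the vote of y on x
    return tally.most_common(1)[0][0]
```

If f is close to a linear g, then for each x most votes equal g(x) (a vote is wrong only if f errs at y or at x+y, each a uniform point), so the plurality recovers g(x) w.h.p.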

Testing “Basic” Properties of Functions [Parnas, R, Samorodnitsky] Considers the following function classes: Singletons: functions of the form f(x) = x_i; Monomials: conjunctions of literals; DNF: disjunctions of monomials (terms).

Testing “Basic” Properties of Functions Cont’ Can test whether f is a singleton using O(1/ε) queries. Can test whether f is a monomial using O(1/ε) queries. Can test whether f is a monotone DNF with at most t terms using Õ(t²/ε) queries. Common theme: no dependence of the query complexity on the size of the input, n, and linear dependence on the distance parameter, ε (as opposed to learning these classes, where there is a (logarithmic) dependence on n). A recent result of [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan] greatly extends the above.

Testing (Monotone) Singletons Singletons satisfy: (1) Pr[f(x)=1] = 1/2, and (2) f(x∧y) = f(x)∧f(y) for every x,y. Natural test: check, by sampling, that the conditions hold (approximately). The natural test can be analyzed for the case that the distance between the function and the class of singletons is not too big (bounded away from 1/2).

Testing Singletons II - Parity Testing Observation: Singletons are a special case of parity functions (i.e., linear functions over GF(2): f(x) = ⊕_{i∈S} x_i). Claim: Let g(x) = ⊕_{i∈S} x_i. If |S| > 1, then a constant fraction of the pairs x,y violate g(x∧y) = g(x)∧g(y). Modified algorithm: (1) Test whether f is a parity function (with distance parameter ε) using the algorithm of [BLR]. (2) Uniformly select a constant number of pairs x,y and check whether any is a violating pair (i.e., f(x∧y) ≠ f(x)∧f(y)).

Testing Singletons III - Self-Correcting Use the Self-Corrector of [BLR] to “fix” f into a parity function g, and then test violations on the self-corrected version. This “almost works”: If f is a singleton - always accepted. If f is ε-far from parity - rejected w.h.p. But if f is ε-close to a parity function g, then we cannot simply apply the claim to argue that there are many violating pairs w.r.t. f. If we could only test violations w.r.t. g instead of f...

Testing Singletons IV - The Algorithm Final Algorithm for Testing Singletons: (1) Test whether f is a parity function with distance parameter ε using the algorithm of [BLR]. (2) Uniformly select a constant number of pairs x,y. Verify that Self-Cor(f,x) ∧ Self-Cor(f,y) = Self-Cor(f,x∧y). (3) Verify that Self-Cor(f,1^n) = 1, where 1^n is the all-1 vector.
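The three stages can be put together in a Python sketch; the trial counts, the number of self-correction votes, and the all-1-vector check's placement are illustrative assumptions about the analysis's constants:

```python
import random
from collections import Counter

def rand_point(n):
    return [random.randint(0, 1) for _ in range(n)]

def self_cor(f, x, n, votes=41):
    """BLR self-corrector: plurality of f(x+y)-f(y) over random y (GF(2))."""
    tally = Counter()
    for _ in range(votes):
        y = rand_point(n)
        tally[f([a ^ b for a, b in zip(x, y)]) ^ f(y)] += 1
    return tally.most_common(1)[0][0]

def singleton_test(f, n, eps, pairs=30):
    """Sketch of the final (monotone) singleton test."""
    # (1) BLR parity test with distance parameter eps
    for _ in range(max(1, round(3 / eps))):
        x, y = rand_point(n), rand_point(n)
        if f([a ^ b for a, b in zip(x, y)]) != f(x) ^ f(y):
            return False
    # (2) check for violating pairs w.r.t. the self-corrected function
    for _ in range(pairs):
        x, y = rand_point(n), rand_point(n)
        x_and_y = [a & b for a, b in zip(x, y)]
        if self_cor(f, x, n) & self_cor(f, y, n) != self_cor(f, x_and_y, n):
            return False
    # (3) a monotone singleton evaluates to 1 on the all-1 vector
    return self_cor(f, [1] * n, n) == 1
```

A singleton passes all three stages; a parity over two or more variables passes stage (1) exactly but is caught by stage (2) (a constant fraction of pairs violate the ∧-identity) or by stage (3).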

Testing Monomials and Monotone DNF The monomial testing algorithm has a similar structure to the singleton testing algorithm. (Here too it suffices to find a test for monotone monomials.) - The first stage of linearity testing is replaced by affinity testing: if f is a monomial then F_1 = {x : f(x)=1} is an affine subspace. [Fact: H is an affine subspace iff for every x,y,z ∈ H, x⊕y⊕z ∈ H.] The affinity test is similar to the parity test: select x,y ∈ F_1 and z ∈ {0,1}^n, and verify that f(x⊕y⊕z) = f(x)f(y)f(z). - The second stage is also analogous to the singleton test (check for violating pairs). Here affinity adds structure that helps analyze the second stage. Testing monotone DNF: use the monomial test as a sub-routine. The result of [DLMORSW07], which extends to other families (e.g., non-monotone DNF), uses different techniques.

Testing of Clustering [Alon, Dar, Parnas, R] X - a set of points, |X| = n. Notation: dist(x,y) - the distance between points x and y. Assume that the triangle inequality holds (dist(x,y) ≤ dist(x,z)+dist(z,y)). For any subset S of X, the diameter of S is d(S) = max_{x,y∈S} dist(x,y).

Clustering Cont’ X is (k,b)-clusterable if there exists a k-way partition (clustering) of X s.t. each cluster has diameter at most b. X is ε-far from being (k,b′)-clusterable (b′ ≥ b) if there is no k-way partition of any Y ⊆ X, |Y| ≥ (1−ε)n, s.t. each cluster has diameter at most b′. In the first case the algorithm should accept, and in the second case reject, with probability ≥ 2/3. (In particular, will look at b′ = (1+β)b, β ≤ 1.)

Clustering Cont’ Testing Algorithm (input: k, b, ε, β): (1) Take a sample of m = m(k,ε,β) points from X. (2) If the sample is (k,b)-clusterable then accept, o.w. reject. If X is (k,b)-clusterable then we always accept. Suppose X is ε-far from being (k,(1+β)b)-clusterable; show that we reject w.p. at least 2/3. Will prove this for a general metric and β=1, where m = O(k/ε). Note: for a general metric cannot go below β=1 unless we allow m = Ω(|X|^{1/2}), but can do so for the Euclidean distance in d dimensions (in that case must have a dependence on (1/β)^{d/2}). Other sublinear clustering work (e.g., for k-median) includes: [Indyk], [Mishra, Oblinger, Pitt], [Ben-David], [Meyerson, O’Callaghan, Plotkin].

Clustering Cont’ Consider the following mental experiment. Let the points in the sample be x_1,x_2,…,x_m. - Construct a (growing) set REPS of “cluster representatives” (points pairwise at distance > b). - Initially, REPS = {x_1}. - At each step, take the next point that is at distance > b from every x in REPS, and add it to REPS. Claim: If X is ε-far from being (k,2b)-clusterable then w.h.p. |REPS| > k, causing the algorithm to reject as required. Proof idea: At each step consider the clustering “enforced” by REPS: each point in X\REPS is assigned to the closest x_i in REPS. Additional sample points “test” this clustering.
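The representative-growing experiment doubles as a sketch of the test for a general metric and β=1; the sample-size constant is an illustrative assumption:

```python
import random

def clustering_test(points, dist, k, b, eps):
    """Sketch: sample O(k/eps) points and greedily grow REPS, a set
    of sampled points pairwise at distance > b.  More than k
    representatives witness that even the sample is not
    (k,b)-clusterable, so the algorithm rejects."""
    m = round(10 * k / eps)  # sample-size constant is illustrative
    reps = []
    for _ in range(m):
        x = random.choice(points)
        if all(dist(x, r) > b for r in reps):
            reps.append(x)  # x must open a new cluster
            if len(reps) > k:
                return False  # REJECT
    return True  # ACCEPT
```

Any two representatives are more than b apart, so no cluster of diameter at most b can contain two of them; hence |REPS| > k certifies that the sample itself is not (k,b)-clusterable.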

Tolerant Testing of Clustering [Parnas, R, Rubinfeld] Tolerant testing: reject when ε-far but accept when ε′-close. Tolerant Testing Algorithm (input: k, b, ε′, ε): (1) Take a sample of m = m(k,ε′,ε) points from X. (2) If the sample is (ε′ + (ε−ε′)/2)-close to (k,b)-clusterable then accept, o.w. reject. The sample size has a quadratic dependence on 1/(ε−ε′), and the same dependence on the other parameters as the (standard) testing algorithm. Can be analyzed using a generalization of a framework by Czumaj & Sohler for (standard) testing that captures aspects of the “enforce & test” approach.

Distribution-Free Testing (with queries) First results [GGR]: trivial positive results that follow from the learning ⟹ testing reduction, and simple negative results for combinatorial properties (e.g., bipartiteness). First non-trivial positive results [Halevy, Kushilevitz]: (1) Linearity and low-degree polynomials - in general, whenever we have a self-corrector. (2) Monotonicity in low dimensions. (They also had positive results for graph properties.) On the other hand, [Halevy, Kushilevitz] showed hardness of distribution-free testing of monotonicity in high dimensions (i.e., exponential in d over {0,1}^d). Recently, [Glasner, Servedio] showed that for several classes (monomials, decision lists, linear threshold functions) one needs Ω((n/log n)^{1/5}) queries for distribution-free testing.

Conclusions and Open Problems Property testing: a relaxation of learning, where we should determine (w.h.p.) whether there exists a good approximation of the function f in the class F, rather than find such an approximation. Can serve as a preliminary step to learning. For quite a few function classes we have testing algorithms that are more query-efficient than learning algorithms. Some extend to distribution-free testing; for some there are strong lower bounds. Still much is left to be understood about the relation between testing and learning (it has the flavor of the relation between decision and search).

Thanks

Property Testing and Learning: Basic Comments/Observations Motivation I: Can use testing as a preliminary step to learning, that is, for efficiently selecting a good hypothesis class. Motivation II: The relation between testing and learning is similar to the relation between decision and search. Testing is no harder than (proper) learning: if we have a learning algorithm for function class F that outputs a hypothesis in F, then we can use it to test the property of belonging to F. That is, run the learning algorithm with accuracy parameter set to ε/2, and check that the hypothesis it outputs is at most 3ε/4-far from f on an independent sample. We want the testing algorithm to be significantly more efficient than the learning algorithm.

Linearity Testing Cont’ Lemma: If f is accepted with probability greater than 1/3, then f is ε-close to linear. Suppose f is accepted w.p. > 1/3; then there is a small (< ε/2) fraction of violating pairs (pairs with f(x)+f(y) ≠ f(x+y)). Define the self-corrected version of f, denoted g: for each x,y let V_y(x) = f(x+y) − f(y) (the vote of y on x), and let g(x) = Plurality_y(V_y(x)). Main Technical Lemma (informal): if there are few violating pairs, then for every x we have that for almost all y, V_y(x) = g(x). Can show that (conditioned on a < ε/2 fraction of violating pairs): (1) g is linear; (2) dist(f,g) ≤ ε.

Learning Boolean Formulae Main difference w.r.t. the testing results: in testing there is no dependence on n, and a different algorithmic approach is used. Can learn singletons and monomials under the uniform distribution using O(log n/ε) queries/examples (a variation on Occam’s razor). Can properly learn monotone DNF with t terms and r literals using Õ(r·log²n/ε + t·(r + 1/ε)) queries [Angluin + Bshouty, Jackson & Tamon].