Property Testing: A Learning Theory Perspective Dana Ron Tel Aviv University.


Property Testing (Informal Definition) For a fixed property P and any object O, determine whether O has property P or is far from having property P (i.e., far from every object that has P). The task should be performed by inspecting the object in as few places as possible.

Examples The object can be a function and the property can be linearity. The object can be a string and the property can be membership in a fixed regular language L. The object can be a graph and the property can be 3-colorability.

Two Views of Property Testing (1) A relaxation of exactly deciding whether the object has the property or does not have the property. (2) A relaxation of learning the object (with membership queries and under the uniform distribution). In either case, we want the testing algorithm to be significantly more efficient than the decision/learning algorithm. Q: Which view is “more right”? A: It depends, mainly on the type of objects and properties studied: combinatorial objects and properties vs. function classes that are of interest to the learning community.

A Little Background Initially defined by Rubinfeld and Sudan in the context of program testing (of algebraic functions). Goldreich, Goldwasser, and Ron initiated the study of testing properties of combinatorial objects, and in particular graphs. A growing body of work deals with properties of functions, graphs, strings, sets of points, and more. Many algorithms have complexity that is sublinear in (or even independent of) the size of the object.

Formal Definition (“standard” model) A property testing algorithm for property P is given a distance parameter $\epsilon$ and query access(*) to a function f. If f has property P, then the algorithm should accept w.h.p. If f is $\epsilon$-far from any function having property P, then the algorithm should reject w.h.p. Distance is measured with respect to the uniform distribution(**). (*) May consider testing with random examples only. (**) May consider other (possibly unknown) distributions.

Property Testing and Learning: Basic Comments/Observations Comment (Motivation): Testing can be used as a preliminary step to learning, that is, for efficiently selecting a good hypothesis class. Observation: Testing is no harder than (proper) learning. If we have a learning algorithm for a function class F that outputs a hypothesis in F, then we can use it to test the property of belonging to F: run the learning algorithm with accuracy parameter set to $\epsilon/2$, and check on an independent sample that the hypothesis it outputs is at most $3\epsilon/4$-far from f. We therefore want the testing algorithm to be more efficient than the learning algorithm.
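The reduction from testing to proper learning can be sketched in Python as follows. This is a minimal illustration, not the speaker's code: the names `proper_learning_tester`, `learner` and `sample_point` are hypothetical, and the sample size `64 / eps` is an assumed constant chosen so that a Chernoff bound separates distance at most $\epsilon/2$ from distance at least $\epsilon$.

```python
import math
import random


def proper_learning_tester(f, learner, sample_point, eps, rng):
    """Accept iff the proper learner's hypothesis looks (3*eps/4)-close to f.

    learner(f, accuracy) is a hypothetical proper learner: it may query f and
    must return a hypothesis h that belongs to the class being tested.
    sample_point(rng) draws one uniformly random point of the domain.
    """
    h = learner(f, eps / 2)          # learn with accuracy parameter eps/2
    m = math.ceil(64 / eps)          # assumed sample size, O(1/eps)
    disagreements = 0
    for _ in range(m):
        x = sample_point(rng)        # independent sample, not used by the learner
        if h(x) != f(x):
            disagreements += 1
    # Threshold 3*eps/4 sits between the "close" case (<= eps/2)
    # and the "far" case (>= eps, since h is in the class).
    return disagreements <= 0.75 * eps * m
```

If f is in the class, the learner's hypothesis is (w.h.p.) $\epsilon/2$-close and the check passes; if f is $\epsilon$-far from the class, every hypothesis in the class is $\epsilon$-far from f, so the check fails w.h.p.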

Classes/Properties for which testing is more efficient than learning: linear functions; low-degree polynomials; singletons, monomials; (small) DNF and general Boolean formulae and circuits; monotone functions; juntas; halfspaces; decision lists, decision trees, branching programs; clustering; properties of distributions. Two recurring approaches: the “Self-Correcting” approach and the “Enforce&Test” approach.

Linearity Testing [Blum, Luby, Rubinfeld] Def 1: Let F be a finite field. A function $f : F^n \to F$ is called linear (multi-linear) if there exist constants $a_1, \ldots, a_n \in F$ such that for every $x = x_1, \ldots, x_n \in F^n$ it holds that $f(x) = \sum_{i} a_i x_i$. Fact: A function $f : F^n \to F$ is linear iff for every $x, y \in F^n$ it holds that $f(x) + f(y) = f(x+y)$. Def 2: A function f is said to be $\epsilon$-far from linear if for every linear function g, $\mathrm{dist}(f,g) > \epsilon$, where $\mathrm{dist}(f,g) = \Pr[f(x) \neq g(x)]$ (x selected uniformly in $F^n$).

Linearity Testing Cont’ Linearity Test: (1) Uniformly and independently select $\Theta(1/\epsilon)$ pairs of elements $x, y \in F^n$. (2) For every pair x, y selected, verify that $f(x) + f(y) = f(x+y)$. (3) If for any of the pairs selected linearity is violated (i.e., $f(x) + f(y) \neq f(x+y)$), then REJECT, otherwise ACCEPT. Theorem: If f is linear then the test accepts w.p. 1, and if f is $\epsilon$-far from linear then with probability at least 2/3 the test rejects it. Query complexity: $\Theta(1/\epsilon)$, i.e., independent of n. In contrast, learning needs $\Omega(n)$ queries/examples. Lemma: If f is accepted with probability greater than 1/3, then f is $\epsilon$-close to linear.
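The three steps of the linearity test can be sketched in Python for the special case $F = GF(2)$. The encoding is an assumption made for illustration: points of $\{0,1\}^n$ are stored as n-bit integers, so vector addition is bitwise XOR and field addition of function values is XOR on bits. The constant `c` in the number of pairs is also an assumption; the slides only state $\Theta(1/\epsilon)$.

```python
import math
import random


def blr_linearity_test(f, n, eps, rng=None, c=2.0):
    """One-sided linearity test over GF(2)^n.

    f maps an n-bit integer to 0 or 1. Returns True (ACCEPT) or False (REJECT).
    """
    rng = rng or random.Random()
    trials = math.ceil(c / eps)           # Theta(1/eps) pairs; c is an assumed constant
    for _ in range(trials):
        x = rng.getrandbits(n)            # uniform point of {0,1}^n
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):       # violating pair: f(x)+f(y) != f(x+y)
            return False
    return True
```

A linear function never produces a violating pair, so it is accepted with probability 1; the number of queries is three per pair and does not depend on n.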

Linearity Testing Cont’ Suppose f is accepted w.p. > 1/3. Then there is only a small ($< \epsilon/2$) fraction of violating pairs ($f(x) + f(y) \neq f(x+y)$). Define the self-corrected version of f, denoted g: for each x, y let $V_y(x) = f(x+y) - f(y)$ (the vote of y on x), and let $g(x) = \mathrm{Plurality}_y(V_y(x))$. Can show that (conditioned on a $< \epsilon/2$ fraction of violating pairs) (1) g is linear, and (2) $\mathrm{dist}(f,g) \le \epsilon$. Lemma: If f is accepted with probability greater than 1/3, then f is $\epsilon$-close to linear.
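The self-corrected function g is defined through a plurality over all y, which cannot be computed exactly with few queries. A common way to approximate it, sketched below under the assumption $F = GF(2)$ with n-bit integers as points, is to take the plurality over a small random sample of votes. The name `self_correct` and the default of 25 votes are illustrative choices, not taken from the slides.

```python
import random
from collections import Counter


def self_correct(f, x, n, votes=25, rng=None):
    """Estimate g(x) = Plurality_y V_y(x), where V_y(x) = f(x+y) - f(y).

    Over GF(2), both + and - are XOR. Uses `votes` random values of y.
    """
    rng = rng or random.Random()
    tally = Counter()
    for _ in range(votes):
        y = rng.getrandbits(n)
        tally[f(x ^ y) ^ f(y)] += 1       # the vote of y on x
    return tally.most_common(1)[0][0]     # most frequent vote
```

When f is close to a linear function g, almost every y votes for g(x), so the sampled plurality returns g(x) with high probability even at points x where f itself disagrees with g.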

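Testing “Basic” Properties of Functions [Parnas, R, Samorodnitsky] Considers the following classes of functions $f : \{0,1\}^n \to \{0,1\}$: Singletons: $f(x) = x_i$ or $f(x) = \bar{x}_i$ for some index i. Monomials: conjunctions of literals. DNF: disjunctions of monomials (terms).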

Testing “Basic” Properties of Functions Cont’ Can test whether f is a singleton using $O(1/\epsilon)$ queries. Can test whether f is a monomial using $O(1/\epsilon)$ queries. Can test whether f is a monotone DNF with at most t terms using $\tilde{O}(t^2/\epsilon)$ queries. Common theme: the query complexity has no dependence on the input size n and a linear dependence on $1/\epsilon$, where $\epsilon$ is the distance parameter (as opposed to learning these classes, where there is a logarithmic dependence on n). A recent result of [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan] greatly extends the above.

Testing (Monotone) Singletons Singletons satisfy: (1) $\Pr[f(x) = 1] = 1/2$; (2) $f(x \wedge y) = f(x) \wedge f(y)$ for every x, y. Natural test: check, by sampling, that the conditions hold (approximately). Can analyze the natural test for the case that the distance between the function and the class of singletons is not too big (bounded away from 1/2).

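Testing Singletons II - Parity Testing Observation: Singletons are a special case of parity functions (i.e., linear functions over GF(2), of the form $g(x) = \sum_{i \in S} x_i$ for a set S of indices). Claim: Let g be a parity function over the index set S. If $|S| \ge 2$, then a constant fraction of the pairs x, y are violating pairs for g. Modified algorithm: (1) Test whether f is a parity function (with distance parameter $\epsilon$) using the algorithm of [BLR]. (2) Uniformly select a constant number of pairs x, y and check whether any is a violating pair (i.e., $f(x \wedge y) \neq f(x) \wedge f(y)$).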

Testing Singletons III - Self Correcting Use the Self-Corrector of [BLR] to “fix” f into a parity function (g), and then test violations on the self-corrected version. This “almost works”: If f is a singleton, it is always accepted. If f is $\epsilon$-far from parity, it is rejected w.h.p. But if f is $\epsilon$-close to a parity function g, then we cannot simply apply the claim to argue that there are many violating pairs w.r.t. f. If we could only test violations w.r.t. g instead of f...

Testing Singletons IV - The Algorithm Final Algorithm for Testing Singletons: (1) Test whether f is a parity function with distance parameter $\epsilon$ using the algorithm of [BLR]. (2) Uniformly select a constant number of pairs x, y. Verify that Self-Cor(f, x) $\wedge$ Self-Cor(f, y) = Self-Cor(f, $x \wedge y$). (3) Verify that Self-Cor(f, $1^n$) = 1, where $1^n$ is the all-ones point.
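The three stages of the singleton tester can be put together in a short Python sketch. Everything not stated on the slide is an assumption made for illustration: points are n-bit integers, the helper names are hypothetical, the constants (`2 / eps` pairs in the parity test, 15 votes per self-correction, 40 pairs in stage 2) are placeholders for the unspecified constants, and the all-ones point is used in stage 3.

```python
import math
import random
from collections import Counter


def parity_test(f, n, eps, rng):
    """Stage 1: BLR test over GF(2)^n with an assumed 2/eps random pairs."""
    for _ in range(math.ceil(2 / eps)):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True


def self_cor(f, x, n, rng, votes=15):
    """Sampled plurality of the votes f(x+y) - f(y) over random y."""
    tally = Counter(f(x ^ y) ^ f(y)
                    for y in (rng.getrandbits(n) for _ in range(votes)))
    return tally.most_common(1)[0][0]


def monotone_singleton_tester(f, n, eps, rng=None, pairs=40):
    """Accept iff all three stages pass; singletons x_i are always accepted."""
    rng = rng or random.Random()
    if not parity_test(f, n, eps, rng):                  # stage 1
        return False
    for _ in range(pairs):                               # stage 2
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        lhs = self_cor(f, x, n, rng) & self_cor(f, y, n, rng)
        if lhs != self_cor(f, x & y, n, rng):            # violating pair w.r.t. g
            return False
    all_ones = (1 << n) - 1
    return self_cor(f, all_ones, n, rng) == 1            # stage 3
```

Stage 3 is what rules out the all-zero function, which is a parity function with no violating pairs.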

Testing Monomials and Monotone DNF The monomial testing algorithm has a similar structure to the singleton testing algorithm. (Here too it suffices to find a test for monotone monomials.) - The first stage, linearity testing, is replaced by affinity testing: if f is a monomial then $F_1 = \{x : f(x) = 1\}$ is an affine subspace. [Fact: H is an affine subspace iff $\forall x, y, z \in H$, $x \oplus y \oplus z \in H$.] The affinity test is similar to the parity test: select $x, y \in F_1$ and $z \in \{0,1\}^n$, and verify that $f(x \oplus y \oplus z) = f(z)$ (recall that $f(x) = f(y) = 1$). - The second stage is also analogous to the singleton test (check for violating pairs). Here affinity adds structure that helps analyze the second stage. Testing monotone DNF: use the monomial test as a sub-routine. The result of [DLMORSW07], which extends to other families (e.g., non-monotone DNF), uses different techniques.

Testing of Clustering [Alon, Dar, Parnas, R] X: a set of points, |X| = n. Notation: dist(x, y) is the distance between points x and y. Assume that the triangle inequality holds ($\mathrm{dist}(x,y) \le \mathrm{dist}(x,z) + \mathrm{dist}(z,y)$). For any subset S of X, the diameter of S is $d(S) = \max_{x,y \in S} \mathrm{dist}(x,y)$.

Clustering Cont’ X is (k, b)-clusterable if there exists a k-way partition (clustering) of X such that each cluster has diameter at most b. X is $\epsilon$-far from being (k, b’)-clusterable ($b' \ge b$) if there is no k-way partition of any $Y \subseteq X$, $|Y| \ge (1-\epsilon)n$, such that each cluster has diameter at most b’. In the first case the algorithm should accept, and in the second it should reject with probability $\ge 2/3$. (In particular, will look at $b' = (1+\beta)b$, $\beta \le 1$.)

Clustering Cont’ Testing Algorithm (input: k, b, $\beta$, $\epsilon$): (1) Take a sample of $m = m(k, \epsilon)$ points from X. (2) If the sample is (k, b)-clusterable then accept, otherwise reject. If X is (k, b)-clusterable then the algorithm always accepts. Suppose X is $\epsilon$-far from being (k, $(1+\beta)b$)-clusterable. Show that the algorithm rejects w.p. at least 2/3. Will prove for a general metric and $\beta = 1$, where $m = O(k/\epsilon)$. Note: for a general metric cannot go below $\beta = 1$ unless we allow $m = \Omega(|X|^{1/2})$, but can do so for Euclidean distance in d dimensions (in that case must have a dependence on $(1/\beta)^{d/2}$). Other sublinear clustering work (e.g., for k-median) includes: [Indyk], [Mishra, Oblinger, Pitt], [Ben-David], [Meyerson, O’Callaghan, Plotkin].

Clustering Cont’ Consider the following mental experiment: - Let the points in the sample be $x_1, x_2, \ldots, x_m$. - Construct a (growing) set of “cluster representatives” REPS. - Initially, REPS = $\{x_1\}$. - At each step, take the next sample point that is at distance > b from every x in REPS, and add it to REPS. Claim: If X is $\epsilon$-far from being (k, 2b)-clusterable then w.h.p. |REPS| > k, causing the algorithm to reject as required. Proof Idea: At each step consider the clustering “enforced” by REPS: each point in X \ REPS is assigned to the closest $x_i$ in REPS. Additional sample points “test” this clustering.
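The mental experiment translates directly into a small procedure, sketched below in Python. Note that this is a variant of the tester, not the exact check "is the sample (k, b)-clusterable": it rejects only when the greedy set REPS grows beyond k. The function name `clustering_tester`, sampling with replacement, and passing the sample size `m` and the metric `dist` as arguments are assumptions made for illustration.

```python
import random


def clustering_tester(points, k, b, m, dist, rng=None):
    """Greedy REPS procedure on a random sample of m points.

    Returns False (reject) as soon as more than k sample points are found
    that are pairwise at distance > b; otherwise returns True (accept).
    """
    rng = rng or random.Random()
    sample = [rng.choice(points) for _ in range(m)]   # assumed: with replacement
    reps = []                                         # cluster representatives
    for p in sample:
        if all(dist(p, r) > b for r in reps):         # far from every representative
            reps.append(p)
            if len(reps) > k:
                return False
    return True
```

If X is (k, b)-clusterable, points that are pairwise more than b apart must lie in different clusters, so REPS never exceeds k and the procedure always accepts. Every sample point that is not added to REPS is within distance b of some representative, which is why the enforced clusters have diameter at most 2b.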

Tolerant Testing of Clustering [Parnas, R, Rubinfeld] Tolerant Testing: reject when $\epsilon$-far but accept when $\epsilon'$-close. Tolerant Testing Algorithm (input: k, b, $\epsilon'$, $\epsilon$, $\beta$): (1) Take a sample of $m = m(k, \epsilon', \epsilon)$ points from X. (2) If the sample is $(\epsilon' + (\epsilon - \epsilon')/2)$-close to (k, b)-clusterable then accept, otherwise reject. The sample size has a quadratic dependence on $1/(\epsilon - \epsilon')$, and the same dependence on the other parameters as the (standard) testing algorithm. Can analyze using a generalization of a framework by Czumaj & Sohler for (standard) testing that captures aspects of the “enforce&test” approach.

Distribution-Free Testing (with queries) First results [GGR]: trivial positive results that follow from learning $\Rightarrow$ testing, and simple negative results for combinatorial properties (e.g., bipartiteness). First non-trivial positive results [Halevi, Kushilevitz]: (1) Linearity and low-degree polynomials; in general, whenever there is a self-corrector. (2) Monotonicity in low dimensions. (They also had positive results for graph properties.) On the other hand, [Halevi, Kushilevitz] showed hardness of distribution-free testing of monotonicity in high dimensions (i.e., exponential in d over $\{0,1\}^d$). Recently, [Glasner, Servedio] showed that for several classes (monomials, decision lists, linear threshold functions) $\Omega((n/\log n)^{1/5})$ queries are needed for distribution-free testing.

Conclusions and Open Problems - Property testing: a relaxation of learning where one should determine (w.h.p.) whether there exists a good approximation of the function f in the class F, rather than find such an approximation. - Can serve as a preliminary step to learning. - For quite a few function classes there are testing algorithms that are more query-efficient than learning algorithms. - Some extend to distribution-free testing; for some there are strong lower bounds. - Much is still left to be understood about the relation between testing and learning (it has the flavor of the relation between decision and search).

Thanks

Property Testing and Learning: Basic Comments/Observations Motivation I: Testing can be used as a preliminary step to learning, that is, for efficiently selecting a good hypothesis class. Motivation II: The relation between testing and learning is similar to the relation between decision and search. Testing is no harder than (proper) learning: if we have a learning algorithm for a function class F that outputs a hypothesis in F, then we can use it to test the property of belonging to F. That is, run the learning algorithm with accuracy parameter set to $\epsilon/2$, and check on an independent sample that the hypothesis it outputs is at most $3\epsilon/4$-far from f. We therefore want the testing algorithm to be significantly more efficient than the learning algorithm.

Linearity Testing Cont’ Suppose f is accepted w.p. > 1/3. Then there is only a small ($< \epsilon/2$) fraction of violating pairs ($f(x) + f(y) \neq f(x+y)$). Define the self-corrected version of f, denoted g: for each x, y let $V_y(x) = f(x+y) - f(y)$ (the vote of y on x), and let $g(x) = \mathrm{Plurality}_y(V_y(x))$. Can show that (conditioned on a $< \epsilon/2$ fraction of violating pairs) (1) g is linear, and (2) $\mathrm{dist}(f,g) \le \epsilon$. Main Technical Lemma (informal): if there are few violating pairs then $\forall x$ we have that for almost all y, $V_y(x) = g(x)$. Lemma: If f is accepted with probability greater than 1/3, then f is $\epsilon$-close to linear.

Learning Boolean Formulae Main difference w.r.t. testing results: in testing there is no dependence on n, and the algorithmic approach is different. Can learn singletons and monomials under the uniform distribution using $O(\log n/\epsilon)$ queries/examples (a variation on Occam’s razor). Can properly learn monotone DNF with t terms and r literals using $\tilde{O}(r \log^2 n/\epsilon + t(r + 1/\epsilon))$ queries [Angluin + Bshouty & Jackson & Tamon].
