A Tutorial on Property Testing Dana Ron Tel Aviv University.

Presentation on theme: "A Tutorial on Property Testing Dana Ron Tel Aviv University."— Presentation transcript:

1 A Tutorial on Property Testing Dana Ron Tel Aviv University

2 Property Testing (Informal Definition) For a fixed property P and any object O, determine whether O has property P, or whether O is far from having property P (i.e., far from any other object having P). The task should be performed by querying the object (in as few places as possible).

3 Examples The object can be a graph (represented by its adjacency matrix), and the property can be 3-colorability. The object can be a string and the property can be membership in a given regular language L. The object can be a function and the property can be linearity.

4 Context Property testing can be viewed as: a relaxation of exactly deciding whether the object has the property; a relaxation of learning the object. In either case, we want the testing algorithm to be significantly more efficient than the decision/learning algorithm.

5 When can Property Testing be Useful? Object is too large to even fully scan, so must make approximate decision. Object is not too large but (1) exact decision is NP-hard (e.g. coloring), or (2) prefer sub-linear approximate algorithm to polynomial exact algorithm. Use testing as preliminary step to exact decision or learning: in the first case can quickly rule out objects far from the property; in the second case can aid in efficiently selecting a good hypothesis class.

6 Property Testing - Background Initially defined by Rubinfeld and Sudan in the context of Program Testing (of algebraic functions). Goldreich, Goldwasser and Ron initiated the study of testing properties of graphs. A growing body of work deals with properties of functions, graphs, strings, sets of points... Many algorithms have complexity that is sub-linear in (or even independent of) the size of the object.

7 Talk Organization Will discuss four topics: (1) Testing Algebraic Properties of Functions: Linearity Testing [BLR]; (2) Testing “Basic” (non-algebraic) Properties of Functions: Singletons, Monomials, small DNF [PRS]; (3) Testing Graph Properties: Testing Bipartiteness [GGR]; (4) Testing Properties of Strings: Testing Membership in Regular Languages [AKNS].

8 Testing Algebraic Properties of Functions: Linearity Testing [BLR]

9 Linearity Testing Def1: Let F be a finite field. A function f : F^m → F is called linear (multi-linear) if there exist constants a_1,…,a_m ∈ F s.t. for every x = (x_1,…,x_m) ∈ F^m it holds that f(x) = Σ_i a_i x_i. Fact: A function f : F^m → F is linear iff for every x,y ∈ F^m it holds that f(x)+f(y) = f(x+y). Def2: A function f is said to be ε-far from linear if for every linear function g, dist(f,g) > ε, where dist(f,g) = Pr[f(x) ≠ g(x)] (x selected uniformly in F^m).
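
As a small illustration of Def2 in the special case F = GF(2), the following Python sketch computes dist(f,g) exactly and finds the distance to the closest linear function by brute force. Encoding points of GF(2)^m as m-bit integers and the helper names `dist`, `parity` and `distance_to_linear` are assumptions made here, not part of the talk; the enumeration is exponential in m and is only meant to make the definition concrete.

```python
def dist(f, g, m):
    """Exact dist(f, g) = Pr_x[f(x) != g(x)] for x uniform in GF(2)^m."""
    size = 1 << m
    return sum(f(x) != g(x) for x in range(size)) / size


def parity(a):
    """Linear function x -> sum_i a_i x_i over GF(2), with a and x as bit masks."""
    return lambda x: bin(x & a).count("1") % 2


def distance_to_linear(f, m):
    """Distance from f to the closest linear function (brute force over all a)."""
    return min(dist(f, parity(a), m) for a in range(1 << m))


# Example: the AND of the two lowest bits is 1/4-far from every linear function,
# so it is eps-far from linear for any eps < 1/4.
and_two_bits = lambda x: (x & 1) & ((x >> 1) & 1)
print(distance_to_linear(and_two_bits, 3))
```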

10 Linearity Testing Cont’ Linearity Test (Input: F, m, ε) 1) Uniformly and independently select Θ(1/ε) pairs of elements x,y ∈ F^m. 2) For every pair x,y selected, verify that f(x)+f(y) = f(x+y). 3) If for any of the pairs selected linearity is violated (i.e., f(x)+f(y) ≠ f(x+y)), then REJECT, otherwise ACCEPT. Observe: If f is linear then the test accepts w.p. 1. Theorem: If f is ε-far from linear then with probability at least 2/3 the test rejects it.
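
The three steps of the test can be sketched in Python for the special case F = GF(2), where addition is XOR. The integer encoding of points, the function names, and the constant `c` hidden in Θ(1/ε) are assumptions chosen for illustration; the slides do not fix them.

```python
import math
import random


def blr_linearity_test(f, m, eps, rng=random, c=2):
    """Return True (ACCEPT) iff no sampled pair violates f(x)+f(y) = f(x+y).

    f maps m-bit integers (points of GF(2)^m) to 0/1; addition in GF(2) is XOR.
    The number of pairs is c/eps, with c an assumed constant.
    """
    for _ in range(math.ceil(c / eps)):
        x = rng.getrandbits(m)
        y = rng.getrandbits(m)
        if f(x) ^ f(y) != f(x ^ y):
            return False  # found a violating pair: REJECT
    return True  # ACCEPT


def parity(a):
    """Linear function over GF(2) selected by the bit mask a."""
    return lambda x: bin(x & a).count("1") % 2
```

A linear function such as `parity(0b1011)` is accepted on every run, since no pair can be violating; a function far from linear is rejected only with high probability, not with certainty.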

11 Linearity Testing Cont’ Proof (of special case): Let ε(f) denote distance of f to closest linear function g. Assume 1/2 − ε(f) is constant. Let G = {x: f(x) = g(x)} (so that Pr[x ∉ G] = ε(f) > ε). Say that x and y are a violating pair if f(x)+f(y) ≠ f(x+y). Observation: for any x, y, if among the 3 elements x, y, x+y we have 2 in G and 1 not in G, then x,y are a violating pair. Consider one of the 3 (disjoint) events. Can show: Pr[x ∉ G, y ∈ G, (x+y) ∈ G] ≥ ε(f)·(1 − 2ε(f)). Since events are disjoint, prob of violating pair is at least 3ε(f)·(1 − 2ε(f)) = 6ε(f)·(1/2 − ε(f)) = Ω(ε). Since test takes Θ(1/ε) pairs x,y, will reject w.h.p.
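
The bound stated as "Can show" follows from a short calculation. The sketch below assumes only that x and y are independent and uniform, so that x+y is also uniform and independent of x; by symmetry the same bound holds for the other two events.

```latex
\begin{align*}
\Pr[x \notin G,\ y \in G,\ x+y \in G]
  &\ge \Pr[x \notin G] - \Pr[x \notin G,\ y \notin G] - \Pr[x \notin G,\ x+y \notin G] \\
  &= \varepsilon(f) - \varepsilon(f)^2 - \varepsilon(f)^2 \\
  &= \varepsilon(f)\,\bigl(1 - 2\varepsilon(f)\bigr).
\end{align*}
```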

12 Linearity Testing Cont’ How do we deal with the general case (where ε(f) is not necessarily bounded away from 1/2)? In order to prove that if ε(f) > ε then reject w.p. ≥ 2/3, prove contrapositive: if accept w.p. > 1/3 (i.e., small fraction of violating pairs) then f is ε-close to linear. That is, there exists linear g s.t. dist(f,g) ≤ ε. Specifically, define g as follows: g(x) = 1 if Pr_y[f(x+y) − f(y) = 1] ≥ 1/2; g(x) = 0 if Pr_y[f(x+y) − f(y) = 0] > 1/2. Can prove that if fraction of violating pairs (w.r.t. f) is sufficiently small then f is close to g and g is linear. Note: definition of g allows for Self-Correcting of f (for every x can determine g(x) w.h.p. by few queries to f).
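
The self-correcting idea in the note can be sketched as a majority vote, again for functions over GF(2)^m where subtraction is XOR. The function name `self_correct`, the integer encoding of points, and the default of 15 votes are assumptions made for illustration.

```python
import random


def self_correct(f, x, m, votes=15, rng=random):
    """Estimate g(x) as the majority value of f(x+y) - f(y) over random y.

    Over GF(2), f(x+y) - f(y) is f(x ^ y) ^ f(y). Ties go to 1, matching the
    ">= 1/2" case in the definition of g; an odd number of votes avoids ties.
    """
    ones = sum(f(x ^ y) ^ f(y) for y in (rng.getrandbits(m) for _ in range(votes)))
    return 1 if 2 * ones >= votes else 0


# Usage: a parity function corrupted on two points is still read correctly,
# because each vote queries f at two uniformly distributed points.
g = lambda x: bin(x & 0b101).count("1") % 2
noisy = lambda x: g(x) ^ (x in (3, 77))
print(noisy(3), self_correct(noisy, 3, 10))
```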

13 Testing “Basic” Properties of Functions: Singletons, Monomials, small DNF [PRS]

14 Testing “Basic” Properties of Functions: This work considers “The most basic” function classes: Singletons: f(x) = x_i or f(x) = ¬x_i; Monomials: conjunctions of literals; DNF: disjunctions of monomials.

15 Testing “Basic” Properties of Functions Cont’ Can test whether f is a singleton using queries. Can test whether f is a monomial using queries. Can test whether f is a monotone DNF with at most t terms using queries. Common theme: no dependence of query complexity on size of input, n, and polynomial dependence on distance parameter, ε.

16 Learning Boolean Formulae Basic observation: (proper) learning implies testing. Main difference w.r.t. testing results: no dependence on n and different algorithmic approach. Can learn singletons and monomials under uniform distribution using queries [BEHW]. Can properly learn monotone DNF with t terms and r literals using queries [A+BJT].

17 Testing (Monotone) Singletons Singletons satisfy: (1) (2) Natural test: check, by sampling, that conditions hold (approximately). Can analyze natural test for case that distance between function and class of singletons is not too big (bounded away from 1/2).

18 Testing Singletons II - Parity Testing Observation: Singletons are a special case of parity functions (i.e., functions of the form f(x) = ⊕_{i∈S} x_i for some S ⊆ {1,…,n}). Claim: Let. If then Modified algorithm: (1) Test whether f is a parity function (with dist. par. ε) using algorithm of [BLR]. (2) Uniformly select constant number of pairs x,y and check whether any is a violating pair (i.e.: f(x) ∧ f(y) ≠ f(x∧y)).

19 Testing Singletons III - Self Correcting This “almost works”: If f is singleton - always accepted. If f is ε-far from parity - rejected w.h.p. But if f is ε-close to parity function g, then cannot simply apply claim to argue that there are many violating pairs w.r.t. f. If we could only test violations w.r.t. g instead of f... Use Self-Corrector of [BLR] to “fix” f into parity function (g), and then test violations on self-corrected version.

20 Testing Singletons IV - The Algorithm Final Algorithm for Testing Singletons: (1) Test whether f is a parity function with dist. par. ε using algorithm of [BLR]. (2) Uniformly select constant number of pairs x,y. Verify that Self-Cor(f,x) ∧ Self-Cor(f,y) = Self-Cor(f,x∧y). (3) Verify that Self-Cor(f,1^n) = 1.
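
The three stages of the final algorithm can be put together in a short Python sketch for monotone singletons over {0,1}^n. Several details are assumptions rather than statements from the slides: the operator in step (2) is read as bitwise AND, the argument in step (3) is read as the all-ones vector, and the constants (number of parity-test pairs, `pairs`, `votes`) are illustrative.

```python
import random


def parity_test(f, n, trials, rng):
    """BLR-style check: no sampled pair violates f(x) + f(y) = f(x + y) over GF(2)."""
    return all(
        f(x) ^ f(y) == f(x ^ y)
        for x, y in ((rng.getrandbits(n), rng.getrandbits(n)) for _ in range(trials))
    )


def self_cor(f, x, n, rng, votes=15):
    """Majority vote of f(x + y) - f(y) over random y (self-corrector)."""
    ones = sum(f(x ^ y) ^ f(y) for y in (rng.getrandbits(n) for _ in range(votes)))
    return 1 if 2 * ones >= votes else 0


def monotone_singleton_tester(f, n, eps, rng=None, pairs=20):
    rng = rng or random.Random()
    # (1) Parity test with distance parameter eps (constant 2 is an assumption).
    if not parity_test(f, n, int(2 / eps) + 1, rng):
        return False
    # (2) Look for violating pairs w.r.t. the self-corrected function.
    for _ in range(pairs):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        if self_cor(f, x, n, rng) & self_cor(f, y, n, rng) != self_cor(f, x & y, n, rng):
            return False
    # (3) The self-corrected value at the all-ones point must be 1.
    return self_cor(f, (1 << n) - 1, n, rng) == 1
```

With this reading, step (3) is what rules out the all-zero function and parities of an even number of variables, which evaluate to 0 on the all-ones point.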

21 Testing Monomials and Monotone DNF Monomial testing algorithm has similar structure to Singleton testing algorithm. (Here too it suffices to find a test for monotone monomials.) The first stage of linearity testing is replaced by Affinity Testing: if f is a monomial then F_1 = {x: f(x) = 1} is an affine subspace. [Fact: H is an affine subspace iff ∀x,y,z ∈ H, x⊕y⊕z ∈ H.] Affinity test is similar to parity test: select x,y ∈ F_1, z ∈ {0,1}^n, verify that f(x⊕y⊕z) = f(x)⊕f(y)⊕f(z). The second stage is as in singleton test (check for violating pairs). Here affinity adds structure that helps analyze second stage. Testing monotone DNF: use monomial test as sub-routine (a monotone DNF function is a disjunction of monotone monomials).

22 Testing Graph Properties [GGR]

23 Testing Graph Properties Assume graphs are represented by their adjacency matrix. In this model, testing algorithm can perform queries: “is there an edge between u and v”. Distance between graphs: fraction of entries in adjacency matrix on which they differ. This model is most appropriate for testing dense graphs.

24 Results for Testing Graph Properties In Adjacency-Matrix model: Can test Bipartiteness, k-colorability, ρ-Clique, ρ-Cut and a more general family of partition problems, with sample complexity poly(1/ε) and running time exp(poly(1/ε)), both independent of size of graph [GGR]. Can test all properties that can be formulated by first-order ∃∀ expression about graphs with sample and time complexity independent of graph size (but at “steep” cost as function of 1/ε) [AFKS]. In directed graphs can test acyclicity with sample and time complexity poly(1/ε) [BR] (special case treated in [EKKRV]). In Incidence-Lists model: Connectivity, k-edge-connectivity: complexity poly(1/ε) [GR1]; Bipartiteness: poly(1/ε)·|V|^(1/2) [GR2]; Diameter: poly(1/ε) [PR].

25 Testing Bipartiteness Def: Graph G=(V,E) is bipartite iff its vertices can be partitioned into two subsets V_1 and V_2 s.t. there are no edges between vertices that are both in V_1 or both in V_2. Recall that can decide whether graph is bipartite in time O(|V|+|E|) by Breadth First Search (BFS). However, we want very fast approximate decision. Furthermore, can extend algorithm and analysis to testing k-colorability (which is NP-Hard).

26 Testing Bipartiteness Cont’ Bipartite Testing Algorithm: Uniformly and independently select m = Θ(log(1/ε)/ε²) vertices in graph. For every pair of vertices selected, query whether there is an edge between the two, obtaining the induced sub-graph. Perform a BFS to determine whether the induced subgraph is bipartite. If it is, output accept, o.w. output reject. Query complexity and running time of algorithm: O(log²(1/ε)/ε⁴). Slight variant of alg yields O(log²(1/ε)/ε³) and [AK] have reduced to O(log²(1/ε)/ε²). Correctness: If graph is bipartite then clearly always accepted. From this point on assume graph is ε-far from bipartite. Will show that rejected w.p. at least 2/3.
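
A minimal Python sketch of the algorithm above, in the adjacency-matrix query model. The oracle `has_edge(u, v)`, the function name, and the constant `c` inside Θ(log(1/ε)/ε²) are assumptions for illustration; duplicates among the independently sampled vertices are simply merged.

```python
import math
import random
from collections import deque


def bipartite_tester(n, has_edge, eps, rng=None, c=1.0):
    """Sample vertices, then 2-color the induced subgraph by BFS.

    n is the number of vertices, has_edge(u, v) answers one adjacency query,
    and 0 < eps < 1. Returns True for accept, False for reject.
    """
    rng = rng or random.Random()
    m = math.ceil(c * math.log(1 / eps) / eps ** 2)
    sample = list({rng.randrange(n) for _ in range(m)})
    color = {}
    for start in sample:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in sample:
                if v == u or not has_edge(u, v):
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle in the induced subgraph
    return True
```

A bipartite graph is accepted on every run because every induced subgraph of a bipartite graph is bipartite; for example, `bipartite_tester(1000, lambda u, v: u % 2 != v % 2, 0.1)` returns True.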

27 Analysis of Bipartiteness Testing Alg Def: Let X be a subset of points, and (X_1,X_2) a partition of X. Say that an edge (u,v) is violating w.r.t. (X_1,X_2) if either both u,v are in X_1 or both are in X_2. If there are no violating edges w.r.t. (X_1,X_2) then say it is a bipartite partition. View sample as consisting of two parts: U and S. Show that w.h.p., for every partition (U_1,U_2) of U there is no partition (S_1,S_2) of S s.t. (U_1 ∪ S_1, U_2 ∪ S_2) is a bipartite partition. In other words, the sub-graph induced by sample U ∪ S is not bipartite.

28 Analysis of Bipartiteness Testing Alg Cont’ Def1: A vertex v is influential if it has degree at least (ε/4)|V|. Def2: A vertex v is covered by subset U if it has a neighbor in U. Lem: W.h.p. U covers all influential vertices but at most (ε/4)|V|.

29 Analysis of Bipartiteness Testing Alg Cont’ Let C be vertices covered by U and let R be remaining vertices. Observe: Since R contains at most all non-influential vertices, and at most (ε/4)|V| influential ones, total num of edges incident to R is at most (ε/2)|V|². Recall, graph G is ε-far from bipartite: every partition (V_1,V_2) of V has > ε|V|² violating edges. Together, above two imply that every partition of U ∪ C has > (ε/2)|V|² violating edges.
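
The count of edges incident to R can be spelled out as follows. This is a sketch under the slide's own conventions (edges counted as entries of the adjacency matrix, non-influential vertices having degree below (ε/4)|V|), not a full proof.

```latex
\begin{align*}
\#\{\text{edges incident to } R\}
  &\le \underbrace{|V| \cdot \tfrac{\varepsilon}{4}|V|}_{\text{non-influential vertices}}
     + \underbrace{\tfrac{\varepsilon}{4}|V| \cdot |V|}_{\text{uncovered influential vertices}}
   = \tfrac{\varepsilon}{2}|V|^2, \\
\#\{\text{violating edges not incident to } R\}
  &> \varepsilon |V|^2 - \tfrac{\varepsilon}{2}|V|^2
   = \tfrac{\varepsilon}{2}|V|^2 .
\end{align*}
```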

30 Analysis of Bipartiteness Testing Alg Cont’ Consider fixed partition (U_1,U_2) of U, and let (C_1,C_2) be partition of C where neighbors of vertices in U_1 are put in C_2 and neighbors of vertices in U_2 are put in C_1. Since (U_1 ∪ C_1, U_2 ∪ C_2) contains > (ε/2)|V|² violating edges, this many pairs of vertices (v,w) in C_1 (C_2) have a violating edge between them. If get such a pair (v,w) in sample S, then for every partition (S_1,S_2), partition (U_1 ∪ S_1, U_2 ∪ S_2) contains some violating edge. Since there are many such pairs, the sample S contains such a pair w.h.p. By union bound on number of partitions (U_1,U_2) (at most 2^|U| = exp(O(log(1/ε)/ε))), S contains such a pair for every (U_1,U_2).

31 Testing Other Graph (Partition) Properties Each property (k-colorability, ρ-Clique, ρ-Cut) has its own “particularities” but in all cases: “Natural algorithm” (take small uniform sub-sample and check induced subgraph for property) works. Analysis works by breaking sample into two parts: the first part, U, “forces” constraints on possible partitions of all vertices. Second part, S, “tests” whether constraints are satisfied. More general results of [AFKS] (combination of partition and forbidden subgraph properties (∃∀ properties)) also analyze natural algorithm. Analysis builds on Szemerédi’s regularity lemma.

32 Testing Properties of Strings: Membership in Regular Languages [AKNS]

33 Testing Membership in Regular Languages For fixed regular language L ⊆ {0,1}*, testing algorithm should accept w.h.p. every word w ∈ L, and should reject w.h.p. every word w that differs on more than εn bits (n=|w|) from every w’ ∈ L (|w’|=n). Algorithm can query any bit w_i of w. Let M = (Q, F, q_0, δ) be the (minimum) DFA that accepts L. Let G(M) denote directed graph induced by M (that is, there is a directed edge for every transition). Def: Let u = w_i…w_j be sub-word of w that starts at position i. Say that u is feasible w.r.t. M starting from i if there exists a state q s.t. q can be reached in G(M) from q_0 in exactly i−1 steps, and there is a path of length (n−(|u|+i−1)) in G(M) from q’ = δ(q,u) to an accepting state q_f.
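
The feasibility definition can be checked directly by computing two sets of states: those reachable from q_0 in exactly i−1 steps, and those from which an accepting state is reachable in exactly n−(|u|+i−1) steps. The sketch below does this naively in time O(n·|Q|) per call, which is an assumption made for clarity rather than efficiency; the DFA encoding (`delta[q]` is a pair of next states for bits 0 and 1) and the name `is_feasible` are also illustrative.

```python
def is_feasible(delta, q0, accepting, u, i, n):
    """Is the sub-word u, starting at (1-indexed) position i of a length-n word, feasible?"""
    # States reachable from q0 in exactly i-1 steps.
    reach = {q0}
    for _ in range(i - 1):
        reach = {delta[q][b] for q in reach for b in (0, 1)}
    # States from which some accepting state is reachable in exactly k steps.
    k = n - (len(u) + i - 1)
    good = set(accepting)
    for _ in range(k):
        good = {q for q in delta if any(delta[q][b] in good for b in (0, 1))}
    # u is feasible if reading it from some reachable state lands in `good`.
    for q in reach:
        for b in u:
            q = delta[q][b]
        if q in good:
            return True
    return False


# Example DFA (hypothetical): binary words with no two consecutive 1s.
# State 0: last bit was not 1; state 1: last bit was 1; state 2: dead state.
no_11 = {0: (0, 1), 1: (0, 2), 2: (2, 2)}
print(is_feasible(no_11, 0, {0, 1}, [1, 1], 3, 10))
print(is_feasible(no_11, 0, {0, 1}, [1, 0], 3, 10))
```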

34 Testing Regular Language Cont’ Consider special case: Unique accepting state q_f; Q can be partitioned into two parts, C and D: - q_0, q_f ∈ C; - subgraph G(C) is strongly connected; - no edges from D to C; - the GCD of cycle-lengths in G(C) is 1 ⇒ there exists a constant r (= O(|Q|²)) s.t. ∀ q,q’ ∈ C, ∀ m ≥ r, there exists a path of length m from q to q’.

35 Testing Regular Language Cont’ The Algorithm (simplified version): Uniformly and independently select Θ(r/ε) indices 1 ≤ i ≤ n. For each i selected, check that the substring w_i … w_{i+r/ε} is feasible. If any substring is infeasible then reject, otherwise accept. Number of queries: O(r²/ε²) = poly(|Q|)/ε² and running time poly(|Q|)/ε² (can improve to almost linear dependence on 1/ε). Correctness: If w ∈ L, then always accept. If w is ε-far from L, would like to show that w contains many (short) infeasible substrings (causing rejection w.h.p.).
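
The simplified algorithm can be sketched as a sampling loop around a feasibility check. In this sketch the feasibility check is passed in as a callback `feasible(u, i)`, the word is accessed only through a 1-indexed query function `w_query`, substrings are truncated at the end of the word, and the constant `c` in Θ(r/ε) is arbitrary; all of these are assumptions for illustration.

```python
import math
import random


def regular_language_tester(w_query, n, r, eps, feasible, rng=None, c=2):
    """Sample about c*r/eps start positions and reject on any infeasible substring.

    w_query(k) returns bit w_k for 1 <= k <= n; feasible(u, i) decides whether the
    sub-word u starting at position i is feasible for the fixed DFA.
    """
    rng = rng or random.Random()
    length = math.ceil(r / eps)
    for _ in range(math.ceil(c * r / eps)):
        i = rng.randint(1, n)
        j = min(n, i + length)  # truncate at the end of the word (assumption)
        u = [w_query(k) for k in range(i, j + 1)]
        if not feasible(u, i):
            return False  # infeasible substring found: reject
    return True  # accept


# Toy usage for L = 0*: a sub-word is feasible exactly when it contains no 1.
no_ones = lambda u, i: 1 not in u
print(regular_language_tester(lambda k: 0, 1000, 2, 0.1, no_ones))
```

Each sampled substring costs about r/ε queries, so the total number of queries is on the order of r²/ε², independent of n.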

36 Testing Regular Language Cont’ Prove contrapositive statement: If number of (short) infeasible substrings in w is small then w is close to some w* ∈ L. Proof idea: partition w (except first and last r symbols) into disjoint maximal feasible substrings u_1,…,u_h: each u_j is feasible, but addition of next symbol w_k makes it infeasible. By slightly modifying each u_j, can “glue” the modified substrings together into one string w* that “does not leave C”, and reaches q_f. If h is small (as assumed), then w* is close to w.

37 Testing Regular Language Cont’ General case works by reducing to special case we discussed. In particular need to decompose G(M) into its strongly connected components, and consider how a word “moves between them”. This work has been extended by Newman to testing Branching Programs of bounded width, and by Kupferman and XX to testing Tree Automata.

38 Directions for Further Research “Biggest” open problem: Can we characterize what properties are efficiently testable? (e.g., find a measure analogous to VC-dimension.) Find families of properties that are efficiently testable. Some such results exist for testing graph properties (e.g. partition problems) and we have the regular languages result. Extend scope of property testing.

40 Testing Properties of Collections of Points: Testing of Clustering

41 Property Testing - Background Properties of functions: Initially defined by Rubinfeld and Sudan in the context of Program Testing. Tested algebraic properties of functions: low-degree polynomials. Other work on testing algebraic properties: [BLR,R,EKKRV...]. Non-algebraic properties: Monotonicity [GGLRS,DGLRSS,B,FN]. Properties of other objects: Main focus: Graph properties: [GGR,GR,AK,AFKS,BR,PR,CS...] Growing body of work deals with properties of strings [AKNS,N,PRR], sets of points [PR], geometric objects [CSZ], distributions [BFRW], and more. All algorithms have complexity that is sub-linear in (or even independent of) size of object.

