Some Techniques in Property Testing Dana Ron Tel Aviv University.

1 Some Techniques in Property Testing Dana Ron Tel Aviv University

2 Property Testing (Informal Definition) For a fixed property P and any object O, determine whether O has property P, or whether O is far from having property P (i.e., far from any other object having P). Task should be performed by inspecting the object (in as few places as possible).

3 Examples The object can be a function and the property can be linearity. The object can be a string and the property can be membership in a fixed regular language L. The object can be a graph and the property can be 3-colorability.

4 Context Property testing can be viewed as: a relaxation of exactly deciding whether the object has the property or does not have the property; or a relaxation of learning the object (with membership queries and under the uniform distribution). In either case, want the testing algorithm to be significantly more efficient than the decision/learning algorithm.

5 When can Property Testing be Useful? Object is too HUGE and even scanning it is infeasible, so must make approximate decision. Object is just large but exact decision is NP-hard. Use testing as preliminary step to exact decision: namely, use testing to very quickly rule out objects that are far from having the property. Have poly-time exact algorithm, but approximate answer suffices, so prefer sub-linear approximate algorithm.

6 Property Testing - Background Initially defined by Rubinfeld and Sudan in the context of Program Testing (of algebraic functions). With Goldreich and Goldwasser initiated study of testing properties of combinatorial objects, and in particular graphs. Growing body of work deals with properties of functions, graphs, strings, sets of points... Many algorithms with complexity that is sub-linear in (or even independent of) size of object.

7 Issues and Categories Types of objects and properties: functions (algebraic and non-algebraic properties); graphs; strings; matrices; geometric objects; sets of points. Algorithmic techniques: mostly global sampling + local exploration. Analysis techniques: self-correcting; enforce & test; regularity lemma; testing by implicit learning; testing based on invariance.

8 The Self-Correcting Approach

9 Linearity Testing [Blum, Luby, Rubinfeld] Def1: A function $f : F^n \to F$ is called linear (multi-linear) if there exist coefficients $a_1,\dots,a_n \in F$ s.t. $f(x_1,\dots,x_n) = \sum_{i=1}^{n} a_i x_i$. Def2: A function $f$ is said to be $\epsilon$-far from linear if for every linear function $g$, $\mathrm{dist}(f,g) > \epsilon$, where $\mathrm{dist}(f,g) = \Pr[f(x) \ne g(x)]$ ($x$ selected uniformly in $F^n$). Def3: Linearity Testing Problem: the algorithm can query the function on any $x \in F^n$ to obtain $f(x)$; if $f$ is linear then alg should accept; if $f$ is $\epsilon$-far from linear then alg should reject w.h.p. Fact: A function $f : F^n \to F$ is linear iff for every $x, y \in F^n$ it holds that $f(x) + f(y) = f(x+y)$.

10 Linearity Testing Cont’ Linearity Testing algorithm: 1) Uniformly and independently select $\Theta(1/\epsilon)$ pairs of elements $x, y \in F^n$. 2) For every pair $x, y$ selected, verify that $f(x) + f(y) = f(x+y)$. 3) If for any of the pairs selected linearity is violated (i.e., $f(x) + f(y) \ne f(x+y)$), then REJECT, otherwise ACCEPT. Observe: If $f$ is linear then the test accepts w.p. 1. Lemma: If $f$ is $\epsilon$-far from linear then with probability at least 2/3 the test rejects it. Equivalently: if $f$ is accepted with probability greater than 1/3, then $f$ is $\epsilon$-close to linear.
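
The three steps of the linearity test can be sketched in a few lines. This is a minimal illustration that assumes $F = GF(2)$, so that addition is XOR and inputs are bit tuples; the function names and the constant `c` standing in for the hidden constant in $\Theta(1/\epsilon)$ are hypothetical choices, not taken from the slides.

```python
import math
import random


def blr_linearity_test(f, n, eps, rng=None, c=4):
    """Return True (ACCEPT) or False (REJECT) for f: {0,1}^n -> {0,1}.

    c is an assumed stand-in for the constant hidden in Theta(1/eps).
    """
    rng = rng or random.Random()
    trials = max(1, math.ceil(c / eps))
    for _ in range(trials):
        # Step 1: pick a uniform, independent pair x, y in GF(2)^n.
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = tuple(rng.randint(0, 1) for _ in range(n))
        x_plus_y = tuple(a ^ b for a, b in zip(x, y))
        # Steps 2-3: reject on the first violated pair.
        if f(x) ^ f(y) != f(x_plus_y):
            return False
    return True


def parity_on(indices):
    """A linear function over GF(2): the sum (mod 2) of the chosen coordinates."""
    return lambda x: sum(x[i] for i in indices) % 2
```

A linear function such as `parity_on([0, 2])` is accepted on every run, matching the observation that the test has one-sided error. The constant function 1 violates every pair and is rejected on the first trial.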

11 Linearity Testing Cont’ Lemma: If $f$ is accepted with probability greater than 1/3, then $f$ is $\epsilon$-close to linear. Suppose $f$ is accepted w.p. > 1/3 $\Rightarrow$ small ($< \epsilon/2$) fraction of violating pairs ($f(x) + f(y) \ne f(x+y)$). Define self-corrected version of $f$, denoted $g$: for each $x, y$ let $V_y(x) = f(x+y) - f(y)$ (the vote of $y$ on $x$), and let $g(x) = \mathrm{Plurality}_y(V_y(x))$. Can show that (conditioned on $< \epsilon/2$ fraction of violating pairs): (1) $g$ is linear; (2) $\mathrm{dist}(f,g) \le \epsilon$. Main Technical Lemma (informal): if few violating pairs then for every $x$ we have that for almost all $y$, $V_y(x) = g(x)$.
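
The vote-based definition of $g$ also suggests how to evaluate $g(x)$ with few queries: sample several $y$ and take the plurality of the votes $f(x+y) - f(y)$. The sketch below again assumes $F = GF(2)$ (subtraction is XOR); `self_correct` and `num_votes` are hypothetical names, and sampling votes approximates the plurality over all $y$ rather than computing it exactly.

```python
import random
from collections import Counter


def self_correct(f, x, num_votes=25, rng=None):
    """Estimate g(x) = Plurality_y(f(x+y) - f(y)) over GF(2) from sampled votes."""
    rng = rng or random.Random()
    n = len(x)
    votes = Counter()
    for _ in range(num_votes):
        y = tuple(rng.randint(0, 1) for _ in range(n))
        x_plus_y = tuple(a ^ b for a, b in zip(x, y))
        # V_y(x): the vote of y on the value at x (XOR is subtraction in GF(2)).
        votes[f(x_plus_y) ^ f(y)] += 1
    return votes.most_common(1)[0][0]
```

If $f$ agrees with a linear function everywhere except on a few points, a vote is wrong only when $y$ or $x+y$ lands on a corrupted point, so the plurality recovers the linear value even at a corrupted $x$.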

12 Testing Polynomials (over finite fields) Def: A function $f : F^n \to F$ is a (total) degree-$d$ polynomial if there exist coefficients $\{a_v\}$, where $v = v_1 \dots v_n$, $v_i \ge 0$, $\sum_i v_i \le d$, s.t. $f(x_1,\dots,x_n) = \sum_v a_v \prod_{i=1}^{n} x_i^{v_i}$. Different algorithms were designed to deal with different cases (e.g., $d = 1$ [BLR]; $|F| > d$ [Rubinfeld, Sudan]; $F = GF(2)$, $d > 1$ [Alon, Kaufman, Krivelevich, Litsyn, R]), and are analyzed using the self-correction approach. Unifying algorithm [Kaufman, R] works by restricting the function to low-dimensional affine subspaces, and checking that the restriction is a low-degree polynomial (for prime fields, the dimension is $\lceil (d+1)/(|F|-1) \rceil$). Self-correction (definition of “good” function $g$) works by correcting the value on a point based on the “vote” of all subspaces it belongs to.

13 Notes on Self-Correcting Approach Note1: the definition of the self-corrected function $g$ allows to actually correct $f$: for every $x$ can determine $g(x)$ w.h.p. by few queries to $f$. Note2: Found useful when testing properties that correspond to subclasses of the above. For example, singleton functions ($f(x) = x_i$) are a subclass of linear functions. Test for singletons [Parnas, R, Samorodnitsky] first runs linearity test. If it passes, then runs additional check on self-corrected version of function. Note3: Found useful for distribution-free testing [Halevy, Kushilevitz]: general transformation from testers under the uniform dist. to dist.-free testers when can self-correct.

14 The Enforce&Test Approach

15 Testing Bipartiteness Def1: Graph $G=(V,E)$ is bipartite iff can partition vertices into two subsets $V_1$ and $V_2$ s.t. there are no edges between vertices that are both in $V_1$ or both in $V_2$. Recall that can decide whether graph is bipartite in time $O(|V|+|E|)$ by Breadth First Search (BFS). However, we want very fast approximate decision. Def2: Graph $G=(V,E)$ is $\epsilon$-far from bipartite if every partition $(V_1,V_2)$ has more than $\epsilon|E|$ violating edges. Here consider dense case: $|E| = \Theta(|V|^2)$. Graph is represented by adjacency matrix, and alg can probe matrix.

16 Testing Bipartiteness in Dense Graphs [Goldreich, Goldwasser, R] Uniformly and independently select $\Theta(\log(1/\epsilon)/\epsilon^2)$ vertices in graph. If subgraph induced by selected vertices is bipartite, then accept, otherwise, reject. Query complexity and running time of algorithm: $O(\log^2(1/\epsilon)/\epsilon^4)$. Slight variant yields $O(\log^2(1/\epsilon)/\epsilon^3)$ and [Alon, Krivelevich] reduced to $O(\log^2(1/\epsilon)/\epsilon^2)$. Correctness: If graph is bipartite then always accepted. Need to prove that if $\epsilon$-far from bipartite then rejected w.h.p.
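
The sample-and-check structure of this tester is short enough to write out. The following is a sketch under stated assumptions: the graph is a square 0/1 adjacency matrix `adj`, the constant `c` replaces the hidden constant in the sample size, and the names `bipartite_tester` and `induced_subgraph_is_bipartite` are illustrative only.

```python
import math
import random
from collections import deque


def induced_subgraph_is_bipartite(adj, vertices):
    """BFS 2-coloring of the subgraph induced by `vertices`, probing only adj[u][v]."""
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in vertices:
                if v == u or not adj[u][v]:
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # an edge inside one side: odd cycle found
    return True


def bipartite_tester(adj, eps, rng=None, c=1.0):
    """Accept (True) iff the subgraph induced by a random vertex sample is bipartite."""
    rng = rng or random.Random()
    n = len(adj)
    m = math.ceil(c * math.log(1 / eps) / eps ** 2)  # assumed constant c
    # Uniform, independent draws; repeated vertices are merged.
    sample = sorted({rng.randrange(n) for _ in range(m)})
    return induced_subgraph_is_bipartite(adj, sample)
```

Every induced subgraph of a bipartite graph is bipartite, so the tester never rejects a bipartite input. Only pairs inside the sample are probed, so the number of queries is quadratic in the sample size and independent of the number of vertices.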

17 High-Level Idea of Analysis (when graph is $\epsilon$-far from bipartite) View sample as two parts: U and S. Idea: each partition $(U_1,U_2)$ of U “enforces” a partition of all vertices. Suppose every graph vertex has some neighbor in U. (In fact, w.h.p. over U this holds for almost all sufficiently high-degree vertices.) Since G is $\epsilon$-far from bipartite, the enforced partition must have many violations. Will show that w.h.p. a violation appears in sample S (“test” sample). Since this holds for every partition $(U_1,U_2)$ of U, w.h.p. do not have any bipartite partition of U and S together (induced subgraph not bipartite).

18 Notes on Enforce&Test Note1: Bipartite Testing algorithm and enforce&test analysis can be generalized to testing k-colorability [GGR]. Note2: Other properties whose analysis falls under enforce&test approach: $\rho$-Clique, $\rho$-Cut, and other graph partition properties [GGR]; Hypergraph coloring [Czumaj, Sohler]; Tree metric properties [Parnas, R]; Clustering [Alon, Dar, Parnas, R]; and more. Note3: For k-colorability, Clustering and other properties, can use output of tester to actually construct approximately good colorings/clusterings. E.g., for Bipartiteness, if graph is bipartite can determine partition that is approximately good, in constant time per vertex (has certain similarity to self-correction).

19 Testing By Implicit Learning

20 Testing for Concise Representations [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan] Results (partial) for n-variable Boolean functions:
Class of functions | Num of queries
Decision lists | $\tilde{O}(1/\epsilon^2)$
s-term DNF | $\tilde{O}(s^4/\epsilon^2)$
size-s Decision Trees | $\tilde{O}(s^4/\epsilon^2)$
size-s Branching Programs | $\tilde{O}(s^4/\epsilon^2)$
size-s Boolean formulas | $\tilde{O}(s^4/\epsilon^2)$
size-s Boolean circuits | $\tilde{O}(s^6/\epsilon^2)$
s-sparse polynomials over GF(2) | $\tilde{O}(s^4/\epsilon^2)$
Functions with Fourier deg ≤ d | $\tilde{O}(2^{6d}/\epsilon^2)$
For all classes, poly($1/\epsilon$) and no dependence on n.

21 Testing for Concise Representations (cont) Observation: many classes of functions that have concise representations (e.g., s-term DNF) can be approximated by small juntas in the class. Example: every s-term DNF function $f$ is $\epsilon$-close to an s-term DNF that depends on $s\log(s/\epsilon)$ variables. Rough idea of algorithm(s): 1. Find collection of subsets of variables s.t. each contains a single variable on which function depends (non-negligibly) (variant of junta testing [Fischer, Kindler, R, Safra, Samorodnitsky]); if num of subsets greater than some k, reject. 2. Based on subsets, create sample of labeled examples over $\{0,1\}^k$ (does not identify relevant variables). 3. Check whether exists function of appropriate form over k variables that is consistent with sample.

22 Testing for Concise Representations (cont) Rough idea of algorithm(s): 1. Finds collection of subsets of variables s.t. each contains a single variable on which function depends (non-negligibly); if num of subsets greater than some k, rejects. 2. Based on subsets, creates sample of labeled examples over $\{0,1\}^k$ (does not identify relevant variables). 3. Checks whether exists function of appropriate form over k variables that is consistent with sample. Example (figure; D marks a subset containing a relevant variable, and the last entry of each row is the label):
D - - D - D - - - D
1 - - 0 - 0 - - - 1 | 1
0 - - 1 - 0 - - - 1 | 0
1 - - 1 - 1 - - - 0 | 0
Over the $k = 4$ marked coordinates, $x_1 \wedge x_4$ is consistent with the labeled sample $\Rightarrow$ accept.
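
Step 3 is a brute-force consistency check over the small domain $\{0,1\}^k$. As a toy illustration, the sketch below takes the "appropriate form" to be monotone conjunctions (an assumption made here for brevity, not the classes in the table) and uses three labeled examples over $k = 4$ coordinates; `consistent_conjunctions` is a hypothetical helper.

```python
from itertools import combinations


def consistent_conjunctions(sample, k):
    """Yield each monotone conjunction (as a tuple of 0-based variable indices)
    over k variables that agrees with every labeled example in `sample`."""
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            if all(int(all(x[i] for i in subset)) == label for x, label in sample):
                yield subset


# Labeled examples restricted to the k = 4 relevant coordinates: (assignment, label).
sample = [((1, 0, 0, 1), 1), ((0, 1, 0, 1), 0), ((1, 1, 1, 0), 0)]
found = list(consistent_conjunctions(sample, 4))
decision = "accept" if found else "reject"
```

Here the only consistent conjunction is the one on indices 0 and 3, i.e. $x_1 \wedge x_4$ in 1-based notation, so the decision is accept. The search is exponential in $k$, which fits the remark that running time is in general exponential in the query complexity.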

23 Notes on Testing by Implicit Learning Note1: technique gives rise to many positive results (also extends to non-Boolean functions). Note2: well known that (proper) learning implies testing, but with roughly the same complexity. By using implicit learning, save in complexity. Note3: running time in general is exponential in query complexity. New result for sparse polynomials over GF(2) [Diakonikolas, Lee, Matulef, Servedio, Wan] gives time-efficient algorithm.

24 Extensions of PT: Tolerant Testing and Distance Approximation [Parnas, Rubinfeld, R] Tolerant Testing: Given parameters $0 \le \epsilon_1 < \epsilon_2$, distinguish between being $\epsilon_1$-close to property P and $\epsilon_2$-far from P (“standard” testing: $\epsilon_1 = 0$). Example: Clustering. Standard testing requires to accept only perfect clusterings (k clusters, quality (e.g., diameter) q). Tolerant testing requires to accept good clusterings (with few outliers). Results: clustering, monotonicity, local testing of codes, graph properties (dense and sparse models), and more. Distance approximation: estimate distance of object from having property P.

25 What Hasn’t been Covered? Lots of things! Important Analysis Tool for Graph Properties: Szemerédi’s regularity lemma (variants of). Used for analyzing $\exists\forall$ graph properties (includes partition and forbidden subgraph properties) [Alon, Fischer, Krivelevich, Szegedy]. Many other results used it since. Recently used to characterize all properties testable with no dependence on size of graph [Alon, Fischer, Newman, Shapira]. Important component for graph properties lower bounds (forbidden subgraphs): Arithmetic Progressions [Alon], [Alon, Shapira] (x3). Tantalizing open problem: What is complexity of testing triangle-freeness (in dense-graphs model)? UB: tower of height poly($1/\epsilon$). LB: (roughly) $\exp(1/\epsilon)$.

26 Thanks

27 Testing and the Regularity Lemma [Alon, Fischer, Krivelevich, Szegedy], [Alon, Shapira]*, …, [Alon, Fischer, Newman, Shapira] The Basis: For every $\epsilon$, the vertices of every (sufficiently large) graph can be partitioned into $t = t(\epsilon)$ subsets $V_1,\dots,V_t$ of equal size s.t. the edge distribution between subsets $V_i, V_j$ is roughly like in a random graph with edge prob. $p_{i,j} = |E(V_i,V_j)| / (|V_i||V_j|)$.
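
The pair density $p_{i,j}$ is simply the fraction of present edges among all pairs between the two parts. A minimal sketch, assuming an adjacency-matrix representation and disjoint parts given as index lists (`edge_density` is a hypothetical helper name):

```python
def edge_density(adj, part_i, part_j):
    """p_ij = |E(V_i, V_j)| / (|V_i| * |V_j|) for disjoint vertex lists part_i, part_j."""
    edges = sum(1 for u in part_i for v in part_j if adj[u][v])
    return edges / (len(part_i) * len(part_j))
```

For example, if three of the four possible pairs between two parts of size 2 are edges, the density is 0.75.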

28 Last Example: Monotonicity Testing Def: A function $f : [n] \to R$ is monotone if for every $i, j \in [n]$ with $i < j$ we have $f(i) \le f(j)$. It is $\epsilon$-far from monotone if must modify more than an $\epsilon$-fraction of values so that it becomes monotone. Observation: “Natural algorithm” (take uniform sample and check whether $f$ is monotone on sample) does not work unless sample size $= \Omega(n^{1/2})$. Example (figure): 11 10 13 12 15 14 17 16 19 18 21 20 23 22 25 24 27 26 29 28 31 30 33 32.

29 Monotonicity Testing Cont’ An alternative testing algorithm: Repeat the following $O(1/\epsilon)$ times: 1. Pick an entry uniformly at random. Let $x$ be the value in that entry. 2. Perform a binary search for $x$. 3. If $x$ is not found, output reject. If all searches succeed, output accept. [Figure: binary search for $x = 28$ in an example array.] Main Claim: entries for which search succeeds define a monotonically non-decreasing sequence. Hence, if $\epsilon$-far then must have more than an $\epsilon$-fraction of entries on which search fails, causing test to reject w.h.p.
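
The binary-search tester can be sketched as follows. The sketch assumes the values are distinct (so that finding the value means arriving at the sampled index) and uses a hypothetical constant `c` for the number of repetitions; the function names are illustrative.

```python
import math
import random


def binary_search_succeeds(values, i):
    """Search for values[i] as if `values` were sorted; True iff the value is found.

    Assumes distinct values, so a hit can only happen at index i itself.
    """
    x = values[i]
    lo, hi = 0, len(values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if values[mid] == x:
            return True
        if values[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


def monotonicity_tester(values, eps, rng=None, c=2):
    """Reject (False) as soon as one search fails; accept (True) otherwise."""
    rng = rng or random.Random()
    for _ in range(math.ceil(c / eps)):  # assumed constant c in O(1/eps)
        i = rng.randrange(len(values))
        if not binary_search_succeeds(values, i):
            return False
    return True
```

On a sorted array every search succeeds, so the tester always accepts. On a strictly decreasing array of length 16 only the middle entry (index 7) is found, so a rejection is overwhelmingly likely. Each repetition costs $O(\log n)$ queries.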

30 Tolerant Testing of Clustering [Parnas, R, Rubinfeld] Tolerant Testing: Reject when $\epsilon$-far but accept when $\epsilon'$-close. Tolerant Testing Algorithm (input: $k$, $\epsilon'$, $\epsilon$): (1) Take sample of $m = m(k, \epsilon', \epsilon)$ points from X. (2) If sample is $(\epsilon' + (\epsilon - \epsilon')/2)$-close to $(k,b)$-clusterable then accept, o.w. reject. Sample has quadratic dependence on $1/(\epsilon - \epsilon')$, and same dependence on other parameters as (standard) testing algorithm. Can analyze using a generalization of a framework by Czumaj & Sohler for (standard) testing that captures aspects of “enforce&test” approach.

31 Directions for Further Research “Biggest” open problem: Can we characterize what properties are efficiently testable? (E.g., find a measure analogous to VC-dimension.) Find families of properties that are efficiently testable. (Similarly to results for partition properties of graphs, $\exists\forall$ graph properties, and the regular languages result.) Extend scope of property testing.

