Some Techniques in Property Testing
Dana Ron, Tel Aviv University

Property Testing (Informal Definition)
For a fixed property P and any object O, determine whether O has property P or is far from having property P (i.e., far from every object that has P). The task should be performed by inspecting the object in as few places as possible.

Examples
- The object can be a function and the property can be linearity.
- The object can be a string and the property can be membership in a fixed regular language L.
- The object can be a graph and the property can be 3-colorability.

Context
Property testing can be viewed as:
- A relaxation of exactly deciding whether the object has the property or does not have the property.
- A relaxation of learning the object (with membership queries and under the uniform distribution).
In either case, we want the testing algorithm to be significantly more efficient than the decision/learning algorithm.

When can Property Testing be Useful?
- The object is too huge even to scan, so an approximate decision must be made.
- The object is merely large, but exact decision is NP-hard.
- Testing serves as a preliminary step to exact decision: use it to very quickly rule out objects that are far from having the property.
- A poly-time exact algorithm exists, but an approximate answer suffices, so a sub-linear approximate algorithm is preferred.

Property Testing - Background
- Initially defined by Rubinfeld and Sudan in the context of program testing (of algebraic functions).
- Goldreich, Goldwasser, and Ron initiated the study of testing properties of combinatorial objects, and in particular graphs.
- A growing body of work deals with properties of functions, graphs, strings, sets of points, ...
- Many algorithms have complexity that is sub-linear in (or even independent of) the size of the object.

Issues and Categories
Types of objects and properties:
- functions (algebraic and non-algebraic properties);
- graphs;
- strings;
- matrices;
- geometric objects;
- sets of points.
Algorithmic techniques: mostly global sampling + local exploration.
Analysis techniques:
- self-correcting
- enforce & test
- regularity lemma
- testing by implicit learning
- testing based on invariance

The Self-Correcting Approach

Linearity Testing [Blum, Luby, Rubinfeld]
Def1: A function $f : F^n \to F$ is called linear (multi-linear) if there exist coefficients $a_1,\dots,a_n \in F$ s.t. $f(x_1,\dots,x_n) = \sum_{i=1}^{n} a_i x_i$.
Def2: A function f is said to be $\epsilon$-far from linear if for every linear function g, $\mathrm{dist}(f,g) > \epsilon$, where $\mathrm{dist}(f,g) = \Pr[f(x) \neq g(x)]$ (x selected uniformly in $F^n$).
Def3 (Linearity Testing Problem): The algorithm can query the function on any x in $F^n$ to obtain f(x).
- If f is linear, then the algorithm should accept.
- If f is $\epsilon$-far from linear, then the algorithm should reject w.h.p.
Fact: A function $f : F^n \to F$ is linear iff for every $x,y \in F^n$ it holds that $f(x)+f(y)=f(x+y)$.

Linearity Testing Cont’
Linearity Testing algorithm
1) Uniformly and independently select $\Theta(1/\epsilon)$ pairs of elements $x,y \in F^n$.
2) For every pair x,y selected, verify that $f(x)+f(y) = f(x+y)$.
3) If linearity is violated for any of the selected pairs (i.e., $f(x)+f(y) \neq f(x+y)$), then REJECT; otherwise ACCEPT.
Observe: If f is linear then the test accepts w.p. 1.
Lemma: If f is $\epsilon$-far from linear then the test rejects it with probability at least 2/3.
Equivalently: If f is accepted with probability greater than 1/3, then f is $\epsilon$-close to linear.
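The three steps of the linearity test can be sketched for the special case F = GF(2), where a point of $F^n$ is encoded as an n-bit integer and addition is XOR. This is only an illustration: the names `blr_linearity_test` and `parity_on`, the integer encoding, and the constant 2 in the number of trials are assumptions, not taken from the slides.

```python
import math
import random


def blr_linearity_test(f, n, eps, rng=random):
    """Return True (ACCEPT) or False (REJECT) for f : GF(2)^n -> GF(2).

    Points are n-bit integers; f must return 0 or 1.
    """
    # Theta(1/eps) trials; the constant 2 is an illustrative assumption.
    trials = math.ceil(2 / eps)
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        # Over GF(2), addition in both the domain and the range is XOR.
        if f(x) ^ f(y) != f(x ^ y):
            return False  # a violating pair was found
    return True


def parity_on(mask):
    """Hypothetical helper: the linear function sum of x_i over the bits set in mask."""
    return lambda x: bin(x & mask).count("1") % 2
```

A linear function such as `parity_on(0b1011)` passes every trial, matching the observation that linear functions are accepted with probability 1. The constant function 1 violates the condition on every pair, so it is rejected on the first trial.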

Linearity Testing Cont’
Lemma: If f is accepted with probability greater than 1/3, then f is $\epsilon$-close to linear.
Suppose f is accepted w.p. > 1/3. Then there is only a small ($< \epsilon/2$) fraction of violating pairs ($f(x)+f(y) \neq f(x+y)$).
Define a self-corrected version of f, denoted g: for each x,y let $V_y(x) = f(x+y)-f(y)$ (the vote of y on x), and let $g(x) = \mathrm{Plurality}_y(V_y(x))$.
Can show that (conditioned on a $< \epsilon/2$ fraction of violating pairs):
(1) g is linear.
(2) $\mathrm{dist}(f,g) \le \epsilon$.
Main Technical Lemma (informal): if there are few violating pairs, then for every x we have that for almost all y, $V_y(x)=g(x)$.

Testing Polynomials (over finite fields)
Def: A function $f : F^n \to F$ is a (total) degree-d polynomial if there exist coefficients $\{a_v\}$, where $v = v_1 \dots v_n$, $v_i \ge 0$, $\sum_i v_i \le d$, s.t. $f(x_1,\dots,x_n) = \sum_v a_v \prod_{i=1}^{n} x_i^{v_i}$.
Different algorithms were designed to deal with different cases (e.g., d=1 [BLR]; |F|>d [Rubinfeld, Sudan]; F=GF(2), d>1 [Alon, Kaufman, Krivelevich, Litsyn, R]), and they are analyzed using the self-correction approach.
A unifying algorithm [Kaufman, R] works by restricting the function to low-dimensional affine subspaces and checking that the restriction is a low-degree polynomial (for prime fields, the dimension is $\lceil (d+1)/(|F|-1) \rceil$).
Self-correction (the definition of the “good” function g) works by correcting the value on a point based on the “vote” of all subspaces it belongs to.

Notes on Self-Correcting Approach
Note1: The definition of the self-corrected function g makes it possible to actually correct f: for every x, g(x) can be determined w.h.p. by a few queries to f.
Note2: The approach has been found useful when testing properties that correspond to subclasses of the above. For example, singleton functions ($f(x) = x_i$) are a subclass of linear functions. The test for singletons [Parnas, R, Samorodnitsky] first runs the linearity test. If it passes, the test runs an additional check on the self-corrected version of the function.
Note3: The approach has been found useful for distribution-free testing [Halevi, Kushilevitz]: there is a general transformation from testers under the uniform distribution to distribution-free testers whenever self-correction is possible.
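Note1 can be illustrated with a small sketch of sampled self-correction over GF(2): instead of taking the plurality of $V_y(x)$ over all y, it takes the plurality over a few random y. The function name `self_correct`, the n-bit integer encoding of points, and the default of 31 votes are illustrative assumptions.

```python
import random
from collections import Counter


def self_correct(f, n, x, votes=31, rng=random):
    """Estimate g(x) = Plurality_y(f(x+y) - f(y)) for f : GF(2)^n -> GF(2).

    Points are n-bit integers; over GF(2) both + and - are XOR.
    The number of votes is an illustrative assumption.
    """
    tally = Counter()
    for _ in range(votes):
        y = rng.getrandbits(n)
        tally[f(x ^ y) ^ f(y)] += 1  # V_y(x), the vote of y on x
    return tally.most_common(1)[0][0]
```

If f is exactly linear, every vote equals f(x). If f differs from a linear function on a small fraction of points, most votes still agree with that linear function, so the plurality recovers its value at x even when f(x) itself is corrupted.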

The Enforce&Test Approach

Testing Bipartiteness
Def1: A graph G=(V,E) is bipartite iff its vertices can be partitioned into two subsets $V_1$ and $V_2$ s.t. there are no edges between vertices that are both in $V_1$ or both in $V_2$.
Recall that we can decide whether a graph is bipartite in time O(|V|+|E|) by Breadth First Search (BFS). However, we want a very fast approximate decision.
Def2: A graph G=(V,E) is $\epsilon$-far from bipartite if every partition $(V_1,V_2)$ has more than $\epsilon|E|$ violating edges.
Here we consider the dense case: $|E| = \Theta(|V|^2)$. The graph is represented by its adjacency matrix, and the algorithm can probe the matrix.

Testing Bipartiteness in Dense Graphs [Goldreich, Goldwasser, R]
Algorithm: Uniformly and independently select $\Theta(\log(1/\epsilon)/\epsilon^2)$ vertices in the graph. If the subgraph induced by the selected vertices is bipartite, then accept; otherwise, reject.
Query complexity and running time of the algorithm: $O(\log^2(1/\epsilon)/\epsilon^4)$. A slight variant yields $O(\log^2(1/\epsilon)/\epsilon^3)$, and [Alon, Krivelevich] reduced this to $O(\log^2(1/\epsilon)/\epsilon^2)$.
Correctness: If the graph is bipartite then it is always accepted. Need to prove that if it is $\epsilon$-far from bipartite then it is rejected w.h.p.
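The sample-and-check algorithm can be sketched as follows. The graph is accessed only through an adjacency-matrix probe `adj(u, v)`, as in the dense model. The function name, the probe interface, and the constant 1 hidden in the sample size are illustrative assumptions.

```python
import math
import random
from collections import deque


def bipartite_tester(adj, n, eps, rng=random):
    """Accept (True) iff the subgraph induced by a random sample is bipartite.

    adj(u, v) probes the adjacency matrix of a graph on vertices 0..n-1.
    Assumes 0 < eps < 1.
    """
    # Theta(log(1/eps) / eps^2) draws; the hidden constant 1 is an assumption.
    m = max(2, math.ceil(math.log(1 / eps) / eps ** 2))
    sample = list({rng.randrange(n) for _ in range(m)})  # independent draws, deduplicated

    color = {}
    for start in sample:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:  # BFS 2-coloring of the induced subgraph
            u = queue.popleft()
            for v in sample:
                if v == u or not adj(u, v):
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle inside the sample
    return True
```

A bipartite graph is always accepted, since every induced subgraph of a bipartite graph is bipartite. The number of probes is at most quadratic in the sample size, which is where the $\log^2(1/\epsilon)/\epsilon^4$ bound comes from.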

High-Level Idea of Analysis (when the graph is $\epsilon$-far from bipartite)
View the sample as two parts: U and S.
Idea: each partition $(U_1,U_2)$ of U “enforces” a partition of all vertices.
- Suppose every graph vertex has some neighbor in U. (In fact, w.h.p. over U this holds for almost all vertices of sufficiently high degree.) A vertex with a neighbor in $U_1$ must be placed on the side of $U_2$, and vice versa.
- Since G is $\epsilon$-far from bipartite, the enforced partition must have many violations.
- Will show that w.h.p. some violation appears in the sample S (the “test” sample).
- Since this holds for every partition $(U_1,U_2)$ of U, w.h.p. there is no bipartite partition of U and S together (the induced subgraph is not bipartite).

Notes on Enforce&Test
Note1: The bipartiteness testing algorithm and its enforce&test analysis can be generalized to testing k-colorability [GGR].
Note2: Other properties whose analysis falls under the enforce&test approach: $\rho$-Clique, $\rho$-Cut, and other graph partition properties [GGR]; hypergraph coloring [Czumaj, Sohler]; tree metric properties [Parnas, R]; clustering [Alon, Dar, Parnas, R]; and more.
Note3: For k-colorability, clustering, and other properties, the output of the tester can be used to actually construct approximately good colorings/clusterings. E.g., for bipartiteness, if the graph is bipartite, a partition that is approximately good can be determined in constant time per vertex (this has a certain similarity to self-correction).

Testing By Implicit Learning

Testing for Concise Representations [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan]
Results (partial) for n-variable Boolean functions:

| Class of functions | Number of queries |
|---|---|
| Decision lists | $\tilde{O}(1/\epsilon^2)$ |
| s-term DNF | $\tilde{O}(s^4/\epsilon^2)$ |
| size-s decision trees | $\tilde{O}(s^4/\epsilon^2)$ |
| size-s branching programs | $\tilde{O}(s^4/\epsilon^2)$ |
| size-s Boolean formulas | $\tilde{O}(s^4/\epsilon^2)$ |
| size-s Boolean circuits | $\tilde{O}(s^6/\epsilon^2)$ |
| s-sparse polynomials over GF(2) | $\tilde{O}(s^4/\epsilon^2)$ |
| Functions with Fourier degree $\le d$ | $\tilde{O}(2^{6d}/\epsilon^2)$ |

For all classes, the query complexity is poly$(1/\epsilon)$ with no dependence on n.

Testing for Concise Representations (cont)
Observation: many classes of functions that have concise representations (e.g., s-term DNF) can be approximated by small juntas in the class.
Example: every s-term DNF function f is $\epsilon$-close to an s-term DNF that depends on $s\log(s/\epsilon)$ variables.
Rough idea of algorithm(s):
1. Find a collection of subsets of variables s.t. each contains a single variable on which the function depends (non-negligibly). This is a variant of junta testing [Fischer, Kindler, R, Safra, Samorodnitsky]. If the number of subsets is greater than some k, reject.
2. Based on the subsets, create a sample of labeled examples over $\{0,1\}^k$ (this does not identify the relevant variables).
3. Check whether there exists a function of the appropriate form over k variables that is consistent with the sample.

Testing for Concise Representations (cont)
Example (figure): the labeled sample created in step 2 is consistent with $x_1 \wedge x_4$, so the algorithm accepts.
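The consistency check of step 3 can be illustrated by brute force for one toy class. The sketch below assumes the "appropriate form" is a monotone conjunction over the k variables, which is a choice made here for illustration only. The function name and the sample format are likewise hypothetical, and the enumeration is exponential in k.

```python
from itertools import combinations


def consistent_conjunction(sample, k):
    """Return indices of a monotone conjunction consistent with the sample, or None.

    sample: list of (x, label) pairs, where x is a tuple of k bits and label is 0 or 1.
    Tries all 2^k subsets of variables, smallest first.
    """
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            if all(int(all(x[i] for i in subset)) == label for x, label in sample):
                return subset
    return None
```

With 0-based indices, the sample `[((1,0,0,1), 1), ((1,1,0,0), 0), ((0,1,1,1), 0), ((1,1,1,1), 1)]` is consistent with the conjunction of variables 0 and 3, so the check succeeds and the tester would accept. A sample that labels the same point both 0 and 1 is consistent with nothing.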

Notes on Testing by Implicit Learning
Note1: The technique gives rise to many positive results (and also extends to non-Boolean functions).
Note2: It is well known that (proper) learning implies testing, but with roughly the same complexity. Implicit learning saves in complexity.
Note3: The running time in general is exponential in the query complexity. A new result for sparse polynomials over GF(2) [Diakonikolas, Lee, Matulef, Servedio, Wan] gives a time-efficient algorithm.

Extensions of PT: Tolerant Testing and Distance Approximation [Parnas, Rubinfeld, R]
Tolerant Testing: Given parameters $0 \le \epsilon_1 < \epsilon_2$, distinguish between being $\epsilon_1$-close to property P and being $\epsilon_2$-far from P (“standard” testing: $\epsilon_1 = 0$).
Example: Clustering. Standard testing is required to accept only perfect clusterings (k clusters, quality (e.g., diameter) q). Tolerant testing is required to accept good clusterings as well (ones with few outliers).
Results: clustering, monotonicity, local testing of codes, graph properties (dense and sparse models), and more.
Distance approximation: estimate the distance of the object from having property P.

What Hasn’t been Covered? Lots of things!
Important analysis tool for graph properties: Szemerédi’s regularity lemma (and variants of it).
- Used for analyzing first-order $\exists\forall$ graph properties (these include partition and forbidden subgraph properties) [Alon, Fischer, Krivelevich, Szegedy]. Many other results have used it since.
- Recently used to characterize all properties testable with no dependence on the size of the graph [Alon, Fischer, Newman, Shapira].
Important component for lower bounds on graph properties (forbidden subgraphs): arithmetic progressions [Alon], [Alon, Shapira] (three papers).
Tantalizing open problem: What is the complexity of testing triangle-freeness (in the dense-graphs model)? UB: tower of height poly$(1/\epsilon)$. LB: superpolynomial, roughly $(1/\epsilon)^{c\log(1/\epsilon)}$.

Thanks

Testing and the Regularity Lemma [Alon, Fischer, Krivelevich, Szegedy], [Alon, Shapira]*, ..., [Alon, Fischer, Newman, Shapira]
The Basis: For every $\epsilon$, the vertices of every (sufficiently large) graph can be partitioned into $t=t(\epsilon)$ subsets $V_1,\dots,V_t$ of equal size s.t. the edge distribution between subsets $V_i, V_j$ is roughly like that in a random graph with edge probability $p_{i,j} = |E(V_i,V_j)|/(|V_i||V_j|)$.

Last Example: Monotonicity Testing
Def: A function $f : [n] \to R$ is monotone if for every i,j in [n] with $i < j$ we have $f(i) \le f(j)$. It is $\epsilon$-far from monotone if more than an $\epsilon$-fraction of its values must be modified for it to become monotone.
Observation: The “natural algorithm” (take a uniform sample and check whether f is monotone on the sample) does not work unless the sample size is $\Omega(n^{1/2})$.

Monotonicity Testing Cont’
An alternative testing algorithm: Repeat the following $O(1/\epsilon)$ times:
1. Pick an entry uniformly at random. Let x be the value in that entry.
2. Perform a binary search for x.
3. If x is not found, output reject.
If x is found in every repetition, output accept.
Main Claim: The entries for which the search succeeds define a monotonically non-decreasing sequence (assuming the values are distinct). Hence, if f is $\epsilon$-far from monotone, then the search must fail on more than an $\epsilon$-fraction of the entries, causing the test to reject w.h.p.
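The binary-search tester can be sketched as follows. To avoid assuming distinct values, the sketch searches for the pair (value, index) rather than the value alone; this tie-breaking, the function names, and the constant 2 in the number of repetitions are assumptions made for illustration.

```python
import math
import random


def search_succeeds(a, i):
    """Binary-search the array a for entry i, comparing (value, index) keys."""
    key = (a[i], i)
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        mid_key = (a[mid], mid)
        if mid_key == key:
            return True
        if key < mid_key:
            hi = mid - 1
        else:
            lo = mid + 1
    return False


def monotonicity_tester(a, eps, rng=random):
    """Accept (True) iff every sampled binary search succeeds."""
    # O(1/eps) repetitions; the constant 2 is an illustrative assumption.
    for _ in range(math.ceil(2 / eps)):
        if not search_succeeds(a, rng.randrange(len(a))):
            return False
    return True
```

On a non-decreasing array every search succeeds, so the tester always accepts. On the strictly decreasing array 100, 99, ..., 1 only the entry probed first by the binary search (index 49) can be found, so almost every sampled entry causes rejection.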

Tolerant Testing of Clustering [Parnas, R, Rubinfeld]
Tolerant Testing: Reject when $\epsilon$-far but accept when $\epsilon'$-close.
Tolerant Testing Algorithm (input: $k, \epsilon', \epsilon$)
(1) Take a sample of $m=m(k,\epsilon',\epsilon)$ points from X.
(2) If the sample is $(\epsilon' + (\epsilon-\epsilon')/2)$-close to (k,b)-clusterable then accept; otherwise reject.
The sample size has quadratic dependence on $1/(\epsilon-\epsilon')$, and the same dependence on the other parameters as the (standard) testing algorithm.
The algorithm can be analyzed using a generalization of a framework by Czumaj & Sohler for (standard) testing that captures aspects of the “enforce&test” approach.

Directions for Further Research
- “Biggest” open problem: Can we characterize which properties are efficiently testable? (E.g., find a measure analogous to VC-dimension.)
- Find families of properties that are efficiently testable. (Similar to the results for partition properties of graphs, first-order $\exists\forall$ graph properties, and regular languages.)
- Extend the scope of property testing.