1 Property testing and learning on strings and trees Michel de Rougemont University Paris II & LRI Joint work with E. Fischer, Technion, F. Magniez, LRI.

Presentation transcript:

1 Property testing and learning on strings and trees Michel de Rougemont University Paris II & LRI Joint work with E. Fischer, Technion, F. Magniez, LRI

2 Property testing
1. Testers and correctors on a class K
2. Tester for regular words and regular trees with the edit distance with moves
3. Detailed proof of a key result (u.stat captures the distance)
4. Application to learning regular properties

3 1. Testers on a class K
Let F be a property on a class K of structures U. An ε-tester for F is a probabilistic algorithm A such that:
- if U |= F, A accepts;
- if U is ε-far from F, A rejects with high probability;
- Time(A) is independent of n.
References: R. Rubinfeld, M. Sudan, Robust characterizations of polynomials, 1994. O. Goldreich, S. Goldwasser and D. Ron, Property Testing and its connection to Learning and Approximation, 1996.
A tester usually implies a linear-time corrector.
Related notion: (ε1, ε2)-tolerant tester.

4 Approximate Satisfiability and Equivalence
1. Satisfiability: T |= F
2. Approximate satisfiability: T is ε-close to some T' with T' |= F
3. Approximate equivalence
[Figure: illustration on a class K of trees]

5 History of testers
- Self-testers and correctors for linear algebra, Blum & Kannan, 1989
- Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994
- Testers for graph properties: k-colorability, Goldreich et al.
- Regular languages have testers, Alon et al., 2000s
- Testers for regular tree languages, de Rougemont and Magniez, 2004
- Characterization of testable properties on graphs, Alon et al.
- New areas: sublinear algorithms, approximation of decision problems

6 2. Edit distance with moves
1. Edit distance: insertions, deletions, modifications.
2. Edit distance with moves: the same operations, plus moving a subword in one step.
The edit distance with moves generalizes to ordered trees.
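To illustrate why moves matter, here is a minimal sketch of the classical edit distance (insertions, deletions, modifications, each at unit cost) using the standard dynamic program. It does not compute the edit distance with moves. The example words are hypothetical; the second is obtained from the first by a single move of the block "aaaa", so its distance with moves is at most 1, while the classical distance is much larger.

```python
def edit_distance(u: str, v: str) -> int:
    """Classical edit distance: unit-cost insertions, deletions, modifications."""
    prev = list(range(len(v) + 1))  # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        cur = [i]  # turning u[:i] into the empty word costs i deletions
        for j, b in enumerate(v, start=1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # keep or modify
        prev = cur
    return prev[-1]


w = "aaaabbbb"        # hypothetical example word
w_moved = "bbbbaaaa"  # w after one move of the block "aaaa"
print(edit_distance(w, w_moved))
```

The gap between the two values grows with the length of the moved block, which is why the two distances behave very differently on long words.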

7 Uniform statistics
A word W of length n has n-k+1 blocks of length k = 1/ε. For k=2, n-k+1 = 11.
Distance between words: NP-complete.
Testable in O(1): sample N subwords of length k to obtain Y(w) and Y(w'). If |Y(w) - Y(w')| < ε, accept; otherwise reject.

8 Tester for a regular language
[Figure: words W, Y, Z, T, the automaton A and the polytope H]
The automaton A defines L and a polytope H in the space of u.stats.
Tester for x in L, testable in O(1): compute Y(w). If dist(Y(w), H) < ε, accept; otherwise reject.
Remark: robustness to noise.

9 Pair (A,H)
Blocks: k=2, m=4, |Σ|=4, |Σ|^k + 1 = 17.
Loops of size 1 block: {(aa,ca:1),(bb:2),(cc,ac:3),(dd:4)}
[Figure: automaton A with transitions labeled a, b, c, d, and polytope H with vertices aa, ca, ac, cc, bb, dd]

10 Corrector of a regular language
Y is ε-close to L(A).
Deterministic correction:
1. Decomposition into admissible subwords
2. Decomposition into connected components
3. Recomposition (moves)
The corrected word is at distance 3 from Y.
[Figure: automaton A]

11 Corrector of an ordered tree
2 moves, dist=2.
Tree automaton or DTD: t: l,r r: l,r

12 XML Corrector:

13 Applications
Testers: estimate the distance between two XML files; decide if an XML file F is ε-valid; decide if two DTDs are close.
Correctors: if an XML file F is ε-close to a DTD, find a valid F' ε-close to F; rank XML files for a set of DTDs (supervised learning).
Program verification: decide if two automata are ε-close in polynomial time.
Approximate model checking: specification language, model, distance.

14 3. Block and uniform statistics
W: a word of length n.
b.stat: consecutive subwords of length k, n/k blocks.
u.stat: any subwords of length k, n-k+1 blocks.
For k=2, n/k = 6.
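The two statistics can be sketched as follows, reading b.stat as the frequency vector of the n/k consecutive blocks and u.stat as the frequency vector of the n-k+1 sliding windows. The function names `b_stat` and `u_stat` and the example word are hypothetical, and the sketch assumes k divides n for b.stat.

```python
from collections import Counter
from fractions import Fraction


def u_stat(w: str, k: int) -> dict:
    """Frequencies of all n-k+1 subwords of length k (sliding windows)."""
    m = len(w) - k + 1
    counts = Counter(w[i:i + k] for i in range(m))
    return {block: Fraction(c, m) for block, c in counts.items()}


def b_stat(w: str, k: int) -> dict:
    """Frequencies of the n/k consecutive blocks of length k (assumes k divides n)."""
    m = len(w) // k
    counts = Counter(w[i * k:(i + 1) * k] for i in range(m))
    return {block: Fraction(c, m) for block, c in counts.items()}


w = "abababababab"   # hypothetical word of length n = 12
print(b_stat(w, 2))  # 6 blocks, all equal to "ab"
print(u_stat(w, 2))  # 11 windows: "ab" six times, "ba" five times
```

On this word the block statistics see only "ab", while the uniform statistics also see the "ba" windows that straddle block boundaries.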

15 Tester for equality of strings
Edit distance with moves: an NP-complete problem, but approximable in constant time with additive error.
Uniform statistics: W = [...]
Theorem 1. |u.stat(w) - u.stat(w')| approximates dist(w, w').
Sample N subwords of length k and compute Y(w) and Y(w').
Lemma (Chernoff). Y(w) approximates u.stat(w).
Corollary. |Y(w) - Y(w')| approximates dist(w, w').
Tester: if |Y(w) - Y(w')| < ε, accept; else reject.
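A minimal sketch of this sampling tester, under assumptions the slide does not state: |.| is taken to be the L1 norm on frequency vectors, k is set to round(1/ε), and the sample size `n_samples` (the slide's N) is an arbitrary default rather than the value given by the Chernoff bound. All function names are hypothetical.

```python
import random
from collections import Counter


def sampled_stat(w: str, k: int, n_samples: int, rng: random.Random) -> dict:
    """Y(w): empirical frequencies of n_samples uniformly sampled subwords of length k."""
    counts = Counter()
    for _ in range(n_samples):
        i = rng.randrange(len(w) - k + 1)  # uniform start position
        counts[w[i:i + k]] += 1
    return {block: c / n_samples for block, c in counts.items()}


def l1(p: dict, q: dict) -> float:
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in set(p) | set(q))


def equality_tester(w1: str, w2: str, eps: float,
                    n_samples: int = 2000, seed: int = 0) -> bool:
    """Accept iff the sampled statistics are within eps (L1 norm is an assumption)."""
    k = max(1, round(1 / eps))
    rng = random.Random(seed)
    y1 = sampled_stat(w1, k, n_samples, rng)
    y2 = sampled_stat(w2, k, n_samples, rng)
    return l1(y1, y2) < eps


print(equality_tester("a" * 200, "a" * 200, 0.25))  # identical constant words
print(equality_tester("a" * 200, "b" * 200, 0.25))  # words with disjoint statistics
```

The number of queries depends only on `n_samples` and k, not on the length of the words, which is the point of the constant-time claim.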

16 Soundness and robustness
Let F be a property on strings.
Soundness: ε-close strings have close statistics.
Robustness: ε-far strings have far statistics.
Here F is equality on pairs of strings. For Theorem 1, we prove:
1. b.stat is robust
2. u.stat is sound
3. u.stat is robust

17 Robustness of b.stat Robustness of b-stat:

18 Soundness of u.stat
Simple edit: [...]
Move, w = A.B.C.D, w' = A.C.B.D: [...]
Hence, for ε²n operations, [...]
Remark: b.stat is not sound.
Problem: robustness of u.stat? Harder! We need an auxiliary distribution and two key lemmas.
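The soundness of u.stat under a move can be checked numerically. The bound used below is a counting argument supplied here, not necessarily the formula on the slide: when w = A.B.C.D becomes A.C.B.D, only windows of length k that cross one of the three cut points can change, and there are at most 3(k-1) of them in each word, so the L1 distance between the two u.stat vectors is at most 6(k-1)/(n-k+1). The helper names and parameters are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def u_stat(w: str, k: int) -> dict:
    """Frequencies of all n-k+1 subwords of length k."""
    m = len(w) - k + 1
    counts = Counter(w[i:i + k] for i in range(m))
    return {block: Fraction(c, m) for block, c in counts.items()}


def l1(p: dict, q: dict):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(b, 0) - q.get(b, 0)) for b in set(p) | set(q))


def move(w: str, a: int, b: int, c: int) -> str:
    """w = A.B.C.D with A=w[:a], B=w[a:b], C=w[b:c], D=w[c:]; return A.C.B.D."""
    return w[:a] + w[b:c] + w[a:b] + w[c:]


def check_move_bound(n: int = 200, k: int = 5, trials: int = 100, seed: int = 1):
    """Compare the effect of random moves with the bound 6(k-1)/(n-k+1)."""
    rng = random.Random(seed)
    bound = Fraction(6 * (k - 1), n - k + 1)
    worst = Fraction(0)
    for _ in range(trials):
        w = "".join(rng.choice("ab") for _ in range(n))
        a, b, c = sorted(rng.randrange(n + 1) for _ in range(3))
        worst = max(worst, l1(u_stat(w, k), u_stat(move(w, a, b, c), k)))
    return worst, bound


worst, bound = check_move_bound()
print(float(worst), float(bound))
```

With k = 1/ε, summing this per-operation bound over ε²n operations gives a total change of order ε, independent of n.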

19 Statistics on words
[Figure: windows of length k inside blocks of length K, split into parts of length t and k-t]
Block statistics: b.stat
Uniform statistics: u.stat
Block uniform statistics: bu.stat

20 Uniform Statistics A B Lemma 2:

21 Block Uniform Statistics Lemma 1:

22 Robustness of the uniform statistics
Robustness of u.stat: by Lemma 1: [...] By Lemma 2: [...]
Tolerant tester. Theorem: for two words w and w' large enough, the tester:
1. accepts with probability 1 if w = w';
2. accepts with probability 2/3 if w and w' are ε²-close;
3. rejects with probability 2/3 if w and w' are ε-far.

23 Membership and equivalence tester
Membership tester for w in L (regular):
1. Construction of the tester: precompute H_ε.
2. Tester: compute Y(w) (approx. b.stat(w)). Accept iff Y(w) is at distance less than ε from H_ε.
Construction: time is [...]. Tester: query complexity in [...], time complexity in [...].
Remark 1: the time complexity of previous testers was exponential in m.
Remark 2: the same method works for L context-free.
Equivalence tester for two automata A and B:
1. Compute H_ε,A and H_ε,B.
2. Reject if H_ε,A and H_ε,B are different.
Time polynomial in m = Max(|A|, |B|): [...]
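The acceptance step of the membership tester can be sketched as a distance computation between a statistics vector and a polytope. The slides give neither the representation of the polytope nor the norm, so this sketch makes two loud assumptions: H is given as a hypothetical finite list of vertices (its convex hull is the polytope), and the distance is Euclidean, approximated with the Frank-Wolfe method, a technique swapped in here for illustration. Precomputing H from the automaton is not shown.

```python
import math


def dist_to_hull(point, vertices, iters: int = 200) -> float:
    """Approximate Euclidean distance from point to conv(vertices) by Frank-Wolfe."""
    x = list(vertices[0])  # start from an arbitrary vertex
    for _ in range(iters):
        grad = [xi - pi for xi, pi in zip(x, point)]
        # Vertex minimizing the linearized objective <grad, v>.
        s = min(vertices, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        d = [si - xi for si, xi in zip(s, x)]
        dd = sum(di * di for di in d)
        if dd == 0:
            break  # already at the best vertex
        # Exact line search for the squared distance, clipped to [0, 1].
        gamma = max(0.0, min(1.0, -sum(g * di for g, di in zip(grad, d)) / dd))
        if gamma == 0.0:
            break  # no descent direction inside the hull: x is optimal
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, point)))


def membership_tester(y, hull_vertices, eps: float) -> bool:
    """Accept iff the sampled statistics y are at distance less than eps from H."""
    return dist_to_hull(y, hull_vertices) < eps


H = [(1.0, 0.0), (0.0, 1.0)]  # hypothetical polytope: a segment in a 2-block space
print(membership_tester((0.5, 0.5), H, 0.1))  # point on the segment
print(membership_tester((1.0, 1.0), H, 0.1))  # point far from the segment
```

Once H is precomputed, the cost of this step depends on the dimension of the statistics space and the number of vertices, not on the length of the input word.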

24 4. Application to learning
Model: take random words according to a distribution D: [...]
u.stat representation: [...]
Negative examples could include the distance.
Learning algorithm: convex hulls of positive examples.

25 PAC learning The regular language is a polytope for u.stat. Polytopes have a finite VC dimension. Hence they are PAC learnable. Problem: the learnt concept may be ε-far from the language L. For special distributions D, it may be ε-close. Example: D is uniform and the polytopes are « large ».

26 Conclusion
1. Tester for the edit distance with moves
2. Tester for membership in a regular set
3. Equivalence tester for automata: polynomial-time approximate algorithm (the exact problem is PSPACE-complete). Generalization to Büchi automata: approximate model checking. Context-free languages: exponential algorithm (the exact problem is undecidable).
4. PAC learning versus dist-learning

27 Generalizations
Büchi automata. Distance on infinite words: two words are ε-close if [...]
A word W is ε-close to a language L if there exists w in L such that W and w are ε-close.
Statistics: set of accumulation points of H: compatible loops of connected components of accepting states.
Tester for Büchi automata: compute H_A and H_B; reject if H_A and H_B are different.
Equivalence of context-free grammars is undecidable; approximate equivalence is decidable in exponential time.