Published by Christian Davidson. Modified over 4 years ago.

1 Property testing and learning on strings and trees
Michel de Rougemont, University Paris II & LRI
Joint work with E. Fischer (Technion) and F. Magniez (LRI)

2 Property testing
1. Testers and Correctors on a class K
2. Tester for regular words and regular trees with the Edit Distance with Moves
3. Detailed proof of a key result (u.stat captures the distance)
4. Application to learning regular properties

3 1. Testers on a class K
Let F be a property on a class K of structures U. An ε-tester for F is a probabilistic algorithm A such that:
If U |= F, A accepts.
If U is ε-far from F, A rejects with high probability.
Time(A) is independent of n.
R. Rubinfeld and M. Sudan, Robust characterizations of polynomials, 1994.
O. Goldreich, S. Goldwasser and D. Ron, Property Testing and its connection to Learning and Approximation, 1996.
A tester usually implies a linear-time corrector.
(ε1, ε2)-tolerant tester.

4 Approximate Satisfiability and Equivalence
1. Satisfiability: T |= F
2. Approximate Satisfiability: T |=ε F
3. Approximate Equivalence: F ≡ε G on a class K of trees

5 History of Testers
Self-testers and correctors for Linear Algebra, Blum & Kannan, 1989
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994
Testers for graph properties: k-colorability, Goldreich et al., 1996
Regular languages have testers, Alon et al., 2000s
Testers for regular tree languages, MdR and Magniez, 2004
Characterization of testable properties on graphs, Alon et al., 2005
New areas: sublinear algorithms, approximation of decision problems

6 2. Edit Distance with moves
1. Edit distance: insertions, deletions, modifications
2. Edit distance with moves: 0111000011110011001 → 0111011110000011001
3. The edit distance with moves generalizes to ordered trees
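The two words on this slide differ by a single move. A minimal brute-force sketch can confirm this; the helper name `find_single_move` is hypothetical, and the search is cubic in the word length, so it only suits short illustrative words.

```python
def find_single_move(w, target):
    """Search for one block move turning w into target.

    Returns (block, start, insert_pos), where block = w[start:start+len(block)]
    is cut out and reinserted at insert_pos in the remaining word,
    or None if no single move works.
    """
    if len(w) != len(target):
        return None
    n = len(w)
    for start in range(n):
        for end in range(start + 1, n + 1):
            block = w[start:end]
            rest = w[:start] + w[end:]
            for pos in range(len(rest) + 1):
                if rest[:pos] + block + rest[pos:] == target:
                    return block, start, pos
    return None


w1 = "0111000011110011001"
w2 = "0111011110000011001"
move = find_single_move(w1, w2)
if move is not None:
    block, start, pos = move
    print(f"move block {block!r} from index {start} to index {pos} of the remainder")
```

One valid move swaps the adjacent blocks 000 and 1111 that follow the common prefix 01110. The search may report a different but equivalent move.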

7 Uniform Statistics
W=001010101110, length n, n-k+1 blocks of length k=1/ε
For k=2, n-k+1=11
Distance between words: NP-complete
Testable, O(1): sample N subwords of length k, giving Y(W) and Y(W')
If |Y(W)-Y(W')| < ε, accept, else reject

8 Tester for a regular language
W: 0000000000111111111111
Y: 000001000011111101111
Z: 1111111111110000000000
T: 01001010001011000111010101
[Figure: automaton A with states a, b and transitions labeled 0, 1, 1; polytope H with the points T, Y, W, Z]
Automaton A defines L, and a polytope H in the space of u.stats
Tester for x in L: testable, O(1): compute Y(W); if dist(Y(W),H) < ε, accept, else reject
Remark: robustness to noise.
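The distance to H can be illustrated on the four example words under strong simplifying assumptions that are not stated on the slide: k=2, exact u.stat instead of a sampled Y, and H taken to be the segment between the pure statistics of the two loops (all 00 and all 11). Under those assumptions the L1 distance to H is twice the total frequency of the subwords 01 and 10. The function names are hypothetical.

```python
from collections import Counter


def ustat(w, k):
    """Frequencies of all n-k+1 subwords of length k in w."""
    n_windows = len(w) - k + 1
    counts = Counter(w[i:i + k] for i in range(n_windows))
    return {sub: c / n_windows for sub, c in counts.items()}


def dist_to_H(w):
    """L1 distance from u.stat(w), k=2, to the segment between '00' and '11'.

    Assumption: H is that segment. Every point of H has zero mass on
    '01' and '10', so that mass has to be removed and added elsewhere.
    """
    y = ustat(w, 2)
    return 2 * (y.get("01", 0.0) + y.get("10", 0.0))


words = {
    "W": "0000000000111111111111",
    "Y": "000001000011111101111",
    "Z": "1111111111110000000000",
    "T": "01001010001011000111010101",
}
for name, word in words.items():
    print(name, round(dist_to_H(word), 3))
```

With these assumptions W and Z are both near H (Z is one move away from W), Y is further away, and T is far.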

9 Pair (A,H)
Blocks, k=2, m=4, |Σ|=4, |Σ|^k+1=17
Loops of size 1 block: {(aa,ca:1),(bb:2),(cc,ac:3),(dd:4)}
[Figure: automaton A (states 1 to 4, edge labels a, b, b, c, a, c, d, d) and polytope H (points aa, ca, ac, cc, bb, dd)]

10 Corrector of a regular language
Y: 000001000011111101111 is ε-close to L(A)
Deterministic correction:
1. Decomposition into admissible subwords: 000001000011111101111 gives 000001 000111111 1111
2. Decomposition into connected components: 000001 000 111111 1111
3. Recomposition (moves): 000 000001 111111 1111, at distance 3 from Y
[Figure: automaton A with states a, b and transitions labeled 0, 1, 1]

11 Corrector of an ordered tree
2 moves, dist=2
Tree automaton or DTD: t: l,r r: l,r

12 XML Corrector: http://www.lri.fr/~mdr/xml/

13 Applications
Testers: estimate the distance between two XML files; decide if an XML file F is ε-valid; decide if two DTDs are close.
Correctors: if an XML file F is ε-close to a DTD, find a valid F' ε-close to F; rank XML files for a set of DTDs (supervised learning).
Program verification: decide if two automata are ε-close in polynomial time.
Approximate Model-Checking: http://www.lri.fr/~mdr/vera/
[Figure labels: Specification language, Model, Distance]

14 3. Block and Uniform statistics
W=001010101110, length n
b.stat: consecutive subwords of length k, n/k blocks
u.stat: any subwords of length k, n-k+1 blocks
For k=2, n/k=6
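Both statistics can be computed directly on the example word. The sketch below returns raw counts; dividing by the number of blocks gives the frequency vectors. The names `bstat_counts` and `ustat_counts` are hypothetical.

```python
from collections import Counter


def bstat_counts(w, k):
    """Counts of the n/k consecutive, non-overlapping blocks of length k."""
    return Counter(w[i:i + k] for i in range(0, len(w) - k + 1, k))


def ustat_counts(w, k):
    """Counts of all n-k+1 subwords of length k (overlapping windows)."""
    return Counter(w[i:i + k] for i in range(len(w) - k + 1))


W = "001010101110"
b = bstat_counts(W, 2)
u = ustat_counts(W, 2)
print("b.stat counts:", dict(b), "blocks:", sum(b.values()))
print("u.stat counts:", dict(u), "blocks:", sum(u.values()))
```

For k=2 the word has 6 blocks for b.stat and 11 for u.stat. The subword 01 never starts at an even position in W, so it appears in u.stat but not in b.stat.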

15 Tester for equality of strings
Edit distance with moves: NP-complete problem, but approximable in constant time with additive error.
Uniform statistics ( ): W=001010101110
Theorem 1. |u.stat(w)-u.stat(w')| approximates dist(w,w').
Sample N subwords of length k, compute Y(w) and Y(w').
Lemma (Chernoff). Y(w) approximates u.stat(w).
Corollary. |Y(w)-Y(w')| approximates dist(w,w').
Tester: if |Y(w)-Y(w')| < ε, accept, else reject.
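A minimal sketch of the sampling tester, assuming k = 1/ε rounded to an integer, the L1 norm for |Y(w)-Y(w')|, and an arbitrary sample size N=2000. The slide does not state how N depends on ε, so that value and the names below are illustrative only.

```python
import random
from collections import Counter


def sampled_stat(w, k, n_samples, rng):
    """Y(w): empirical frequencies of n_samples random subwords of length k."""
    last_start = len(w) - k
    counts = Counter()
    for _ in range(n_samples):
        i = rng.randint(0, last_start)  # uniform start position, inclusive bounds
        counts[w[i:i + k]] += 1
    return {sub: c / n_samples for sub, c in counts.items()}


def l1_distance(p, q):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in set(p) | set(q))


def equality_tester(w1, w2, eps, n_samples=2000, seed=0):
    """Accept (True) iff the sampled statistics are within eps in L1."""
    k = max(1, round(1 / eps))
    rng = random.Random(seed)
    y1 = sampled_stat(w1, k, n_samples, rng)
    y2 = sampled_stat(w2, k, n_samples, rng)
    return l1_distance(y1, y2) < eps


print(equality_tester("0" * 100, "0" * 100, eps=0.25))
print(equality_tester("0" * 100, "1" * 100, eps=0.25))
```

The number of positions read is N·k for each word, which does not depend on the word length n.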

16 Soundness and Robustness
Let F be a property on strings.
Soundness: ε-close strings have close statistics.
Robustness: ε-far strings have far statistics.
F is Equality on pairs of strings. For Theorem 1, we prove:
1. b.stat is robust
2. u.stat is sound
3. u.stat is robust

17 Robustness of b.stat
Robustness of b.stat:

18 Soundness of u.stat
Soundness of u.stat:
Simple edit:
Move w=A.B.C.D, w'=A.C.B.D:
Hence, for ε².n operations,
Remark: b.stat is not sound.
Problem: robustness of u.stat? Harder! We need an auxiliary distribution and two key lemmas.
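The formulas on this slide did not survive, so the following check uses a bound derived here rather than copied from the slide. A move A.B.C.D to A.C.B.D preserves every window of length k lying inside one of the four blocks, and at most 3(k-1) windows cross a cut point in each word, which gives an L1 distance of at most 6(k-1)/(n-k+1) between the two u.stat vectors. The sketch tests that bound on random moves and shows a one-move pair on which b.stat changes completely. All names are hypothetical.

```python
import random
from collections import Counter


def window_counts(w, k, step=1):
    """step=1 gives u.stat counts, step=k gives b.stat counts."""
    return Counter(w[i:i + k] for i in range(0, len(w) - k + 1, step))


def stat_distance(w1, w2, k, step=1):
    """L1 distance between the normalized statistics of two equal-length words."""
    c1, c2 = window_counts(w1, k, step), window_counts(w2, k, step)
    total = sum(c1.values())
    return sum(abs(c1[s] - c2[s]) for s in set(c1) | set(c2)) / total


def random_move(w, rng):
    """w = A.B.C.D becomes A.C.B.D for three random cut points."""
    a, b, c = sorted(rng.randint(0, len(w)) for _ in range(3))
    return w[:a] + w[b:c] + w[a:b] + w[c:]


def ustat_bound_holds(n=200, k=4, trials=1000, seed=1):
    """Check the derived bound 6(k-1)/(n-k+1) on random words and moves."""
    rng = random.Random(seed)
    bound = 6 * (k - 1) / (n - k + 1)
    for _ in range(trials):
        w = "".join(rng.choice("01") for _ in range(n))
        if stat_distance(w, random_move(w, rng), k) > bound + 1e-12:
            return False
    return True


print("u.stat bound holds:", ustat_bound_holds())

# b.stat is not sound: moving the first letter to the end is one move,
# yet every block of length 2 changes from 01 to 10.
w = "01" * 50
shifted = w[1:] + w[0]
print("b.stat distance:", stat_distance(w, shifted, 2, step=2))
print("u.stat distance:", stat_distance(w, shifted, 2))
```

With k = 1/ε, each move shifts u.stat by roughly 6/(εn), so ε²·n moves shift it by O(ε), in line with the "for ε².n operations" step above.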

19 Statistics on words
[Figure: a word cut into blocks, with lengths labeled k, k, K, t, k-t]
Block statistics: b.stat
Uniform statistics: u.stat
Block Uniform statistics: bu.stat

20 Uniform Statistics
[Figure labels: A, B]
Lemma 2:

21 Block Uniform Statistics Lemma 1:

22 Robustness of the uniform Statistics
Robustness of u.stat:
By Lemma 1:
By Lemma 2:
Tolerant tester:
Theorem: for two words w and w' large enough, the tester:
1. Accepts if w=w' with probability 1
2. Accepts if w, w' are ε²-close with probability 2/3
3. Rejects if w, w' are ε-far with probability 2/3

23 Membership and Equivalence tester
Membership tester for w in L (regular):
1. Construction of the tester: precompute Hε
2. Tester: compute Y(w) (approx. b.stat(w)). Accept iff Y(w) is at distance less than ε to Hε
Construction: Time is
Tester: query complexity in time complexity in
Remark 1: Time complexity of previous testers was exponential in m.
Remark 2: The same method works for L context-free.
Tester of
1. Compute Hε,A and Hε,B
2. Reject if Hε,A and Hε,B are different.
Time polynomial in m=Max(|A|, |B|):

24 4. Application to learning
Model: take random words according to a distribution D:
U.stat representation:
Negative examples could include the distance.
Learning algorithm: convex hulls of positive examples.

25 PAC learning
The regular language is a polytope for u.stat. Polytopes have a finite VC dimension. Hence they are PAC learnable.
Problem: the learnt concept may be ε-far from the language L. For special distributions D, it may be ε-close.
Example: D is uniform and the polytopes are "large".

26 Conclusion
1. Tester for the Edit Distance with Moves
2. Tester for membership in a regular set
3. Equivalence tester for automata
Polynomial-time approximate algorithm (the exact problem is PSPACE-complete)
Generalization to Büchi automata: approximate Model-Checking
Context-free languages: exponential algorithm (the exact problem is undecidable)
4. PAC learning versus dist-Learning

27 Generalizations
Büchi automata. Distance on infinite words:
Two words are ε-close if
A word W is ε-close to a language L if there exists w in L such that W and w are ε-close.
Statistics: set of accumulation points of
H: compatible loops of connected components of accepting states
Tester for Büchi automata: compute H A and H B; reject if H A and H B are different.
Equivalence of CF grammars is undecidable; approximate equivalence takes exponential time.
