Download presentation

Presentation is loading. Please wait.

Published byChristian Davidson Modified over 3 years ago

1
1 Property testing and learning on strings and trees Michel de Rougemont University Paris II & LRI Joint work with E. Fischer, Technion, F. Magniez, LRI

2
2 1.Testers and Correctors on a class K 2.Tester for regular words and regular trees with the Edit Distance with Moves 3.Detailed proof of a key result (u.stat captures the distance) 4. Application to learning regular properties Property testing

3
3 Let F be a property on a class K of structures U An ε -tester for F is a probabilistic algorithm A such that: If U |= F, A accepts If U is ε far from F, A rejects with high probability Time(A) independent of n. Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994 O. Goldreich, S. Goldwasser and D. Ron, Property Testing and its connection to Learning and Approximation, 1996.Property Testing and its connection to Learning and Approximation Tester usually implies a linear time corrector. (ε 1, ε 2 )- Tolerant Tester 1. Testers on a class K

4
4 1.Satisfiability : T |= F 2.Approximate Satisfiability T |= F 3.Approximate Equivalence Image on a class K of trees Approximate Satisfiability and Equivalence G

5
5 History of Testers Self-testers and correctors for Linear Algebra,Blum & Kanan 1989 Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994 Testers for graph properties : k-colorability, Goldreich and al Regular languages have testers, Alon et al. 2000s Testers for Regular tree languages, Mdr and Magniez, 2004 Charaterization of testable properties on graphs, Alon et al New areas: Sublinear algorithms, Approximation of decision problems

6
6 1.Distance dEdition: Insertions, Effacements, Modifications 2.Distance Edition avec déplacements: Distance Edition avec déplacements se généralise aux arbres ordonnés 2. Edit Distance with moves

7
7 Uniform Statistics W= longueur n, n-k+1 blocs de longueur k=1/ε Pour k=2, n-k+1=11 Distance de mots: NP-complet Testable, O(1): échantillonner N sous-mots de longueur k: Y(W) et Y(W) Si |Y(w)-Y(w)| <ε. accepter, sinon rejeter

8
8 Tester for a regular language W: Y: Z: T: ab H A T Y WZWZ Automate A définit L, et un polytope H dans lespace des u.stats Testeur x dans L: Testable, O(1): calculer Y(W), Si dist(Y(w),H) <ε. accepter, sinon rejeter Remarque: robustesse au bruit.

9
9 Pair (A,H) Blocs, k=2, m=4, | Σ |=4, | Σ| k +1=17: Boucles de taille 1 bloc: {(aa,ca:1),(bb,2),(cc,ac:3),(dd:4)} a b b c a c d d aa ca H A ac cc bb dd

10
10 Corrector of a regular language Y: est ε -proche de L(A) Correction déterministe: 1.Décomposition en sous-mots admissibles Décomposition en composantes connexes Recomposition (déplacements) distance 3 de Y ab A

11
11 Corrector of an ordered tree 2 moves, dist=2 Automate darbre ou DTD: t: l,r r: l,r

12
12 XML Corrector:

13
13 Applications Testers: Estimate the distance between two XML files, Décide if an XML F is ε-valid, Décide if two DTDs are close. Correctors: If an XML file F is ε-close from a DTD, Find a valid F ε-close to F; Rank XML files for a set of DTDs (supervised learning) Program Verification: Decide if two automata are ε-close in polynomial time. Approximate Model-Checking: Specification language Model Distance

14
14 3. Block and Uniform statistics W= length n, b.stat: consecutive subwords of length k, n/k blocks u.stat: any subwords of length k, n-k+1 blocks For k=2, n/k=6

15
15 Tester for equality of strings Edit distance with moves. NP-complete problem, but approximable in constant time with additive error. Uniform statistics ( ): W= Theorem 1. |u.stat(w)-ustat(w)| approximates dist(w,w). Sample N subwords of length k, compute Y(w) and Y(w): Lemma (Chernoff). Y(w) approximates u.stat(w). Corollary. |Y(w)-Y(w)| approximates dist(w,w). Tester: If |Y(w)-Y(w)| <ε. accept, else reject.

16
16 Let F be a property on strings. Soundness: ε-close strings have close statistics Robustness: ε-far strings have far statistics F is Equality on pairs of strings. For theorem 1, we prove: 1.b.stat is robust 2.u.stat is sound 3.u.stat is robust Soundness and Robustness

17
17 Robustness of b.stat Robustness of b-stat:

18
18 Soundness of u.stat Soundness of u-stat: Simple edit: Move w=A.B.C.D, w=A.C.B.D: Hence, for ε 2.n operations, Remark: b.stat is not sound. Problem: robustness of u.stat ? Harder! We need an auxiliary distribution and two key lemmas.

19
19 Statistics on words k k K t k-t Block statistics: b.stat Uniform statistics: u.stat Block Uniform statistics: bu.stat

20
20 Uniform Statistics A B Lemma 2:

21
21 Block Uniform Statistics Lemma 1:

22
22 Robustness of the uniform Statistics Robustness of u-stat: By Lemma 1: By Lemma 2: Tolerant tester: Theorem: for two words w and w large enough, the tester: 1.Accepts if w=w with probability 1 2.Accepts if w,w are ε 2 -close with probability 2/3 3.Rejects if w,w are ε-far with probability 2/3

23
23 Membership and Equivalence tester Membership Tester for w in L (regular): 1. Construction of the tester: Precompute H ε 2. Tester: Compute Y(w) (approx. b.stat(w)). Accept iff Y(w) is at distance less than ε to H ε Construction: Time is Tester: query complexity in time complexity in Remark 1: Time complexity of previous testers was exponential in m. Remark 2: The same method works for L context-free. Tester of 1.Compute H ε,A and H ε,B 2.Reject if H ε,A and H ε,B are different. Time polynomial in m=Max(|A |, |B |):

24
24 4. Application to learning Model: take random words according to a distribution D: U.stat representation: Negative examples could include the distance. Learning algorithm: convex hulls of positive examples.

25
25 PAC learning The regular language is a polytope for u.stat. Polytopes have a finite VC dimension. Hence they are PAC learnable. Problem: the learnt concept may be ε-far from the language L. For special distributions D, it may be ε-close. Example: D is uniform and the polytopes are « large ».

26
26 Conclusion 1.Tester for the Edit Distance with Moves 2.Tester for membership to a regular set 3.Equivalence tester for automata Polynomial time approximate algorithm (PSPACE-complete) Generalization to Buchi automata : approximate Model- Checking Context-Free Languages: exponential algorithm (undecidable problem) 4.PAC learning versus dist-Learning

27
27 Generalizations Buchi Automata. Distance on infinite words: Two words are ε-close if A word is ε-close to a language L if there exists w in L s. t. W and w are ε-close. Statistics: set of accumulation points of H: compatible loops of connected components of accepting states Tester for Buchi Automata: Compute H A and H B Reject if H A and H B are different. Equivalence of CF grammars is undecidable, Approximate equivalence in exponential.

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google