Approximate schemas Michel de Rougemont, LRI, University Paris II.

1 Approximate schemas Michel de Rougemont, LRI, University Paris II

2 Distances between languages
1. Distance between words (structures): O(1). Edit distance with moves.
2. Distance between a word (structure) and a class of words (structures): O(1).
3. Distance between two languages (classes): polynomial.
4. Applications: regular languages, DTDs.

3 1. Approximate Satisfiability and Equivalence
1. Satisfiability: Tree |= F
2. Approximate satisfiability: Tree |= F
3. Approximate equivalence on a class K of trees

4 Testers on a class K
An ε-tester for a property F is a probabilistic algorithm A such that:
If U |= F, A accepts.
If U is ε-far from F, A rejects with high probability.
Time(A) is independent of n.
A tester usually implies a linear-time corrector.
Self-testers and correctors for linear algebra, Blum and Kannan, 1989.
Robust characterizations of polynomials, R. Rubinfeld and M. Sudan, 1994.
Testers for graph properties: k-colorability, Goldreich et al., 1996.
Graph properties have testers, Alon et al., 1999.
Regular languages have testers, Alon et al., 2000.
Testers for regular tree languages, M. de Rougemont and Magniez, ICALP 2004.

5 2. Equality tester
1. Classical edit distance: insertions, deletions, modifications.
2. Edit distance with moves:
   0111000011110011001
   0111011110000011001
3. Edit distance with moves generalizes to trees.
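
As an illustration of the two words on this slide, the following sketch checks by brute force whether one word is obtained from the other by a single block move, and computes the classical edit distance for comparison. The function names `levenshtein` and `one_move_apart` are hypothetical, and the brute-force search only illustrates the definition; it is not the constant-time method of the talk.

```python
def levenshtein(a: str, b: str) -> int:
    """Classical edit distance: insertions, deletions, modifications."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # modify or match
        prev = cur
    return prev[-1]


def one_move_apart(w: str, v: str) -> bool:
    """True if v differs from w and is w with one factor cut out and re-inserted."""
    if w == v:
        return False
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n + 1):
            block, rest = w[i:j], w[:i] + w[j:]
            for p in range(len(rest) + 1):
                if rest[:p] + block + rest[p:] == v:
                    return True
    return False


w = "0111000011110011001"
v = "0111011110000011001"
print(one_move_apart(w, v))  # a single move of the factor "000" suffices
print(levenshtein(w, v))     # classical distance, for comparison
```

Here the factor `000` at positions 5 to 7 of the first word is moved past the following `1111`, so the distance with moves is 1 while the classical distance is larger.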

6 Block and uniform statistics
W = 001010101110…… of length n, subwords of length k, n/k blocks.
For k=2, n/k=6.
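
A minimal sketch of the two statistics for the word on this slide. It assumes that the block statistics count the n/k consecutive disjoint blocks and that the uniform statistics count all n-k+1 subwords of length k; the slide's own formulas did not survive, so these definitions and the names `b_stat` and `u_stat` are assumptions.

```python
from collections import Counter
from fractions import Fraction


def b_stat(w: str, k: int) -> dict:
    """Block statistics: frequencies of the n/k consecutive disjoint blocks."""
    blocks = [w[i:i + k] for i in range(0, len(w) - k + 1, k)]
    return {u: Fraction(c, len(blocks)) for u, c in Counter(blocks).items()}


def u_stat(w: str, k: int) -> dict:
    """Uniform statistics: frequencies of all n-k+1 subwords of length k."""
    subs = [w[i:i + k] for i in range(len(w) - k + 1)]
    return {u: Fraction(c, len(subs)) for u, c in Counter(subs).items()}


W = "001010101110"
print(b_stat(W, 2))  # 6 blocks: 00, 10, 10, 10, 11, 10
print(u_stat(W, 2))  # 11 overlapping subwords of length 2
```

Exact fractions are used so that the vectors can be compared without rounding; each vector has at most |Σ|^k nonzero entries.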

7 Goal: d_1 approximates the distance
Let ε = 1/k. For n > n_0: dist - ε.n < d_1 < dist + ε.n
Practical application: ε = 10^-2, hence k = 100 and statistics of dimension 2^100. For words of length n = 10^9, d_1 is approximated from N samples, with a good approximation after N = O(1/ε^3) trials.
Remarks:
1. Distance with moves: W = 000….0001111…111, W' = 1111…111000….000
2. Robustness to noise: if W, W' are noisy inputs (but ε-close), the method still works.
3. Random words are close with the moves, far without.
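
Remark 1 can be checked numerically. The sketch below assumes the uniform statistics are the frequencies of all n-k+1 subwords of length k, compared in L1 norm; both the definition and the helper names `u_stat` and `l1` are assumptions. It compares W = 0^m 1^m with W' = 1^m 0^m, which are one block move apart.

```python
from collections import Counter
from fractions import Fraction


def u_stat(w: str, k: int) -> Counter:
    """Frequencies of all n-k+1 subwords of length k (assumed definition)."""
    total = len(w) - k + 1
    counts = Counter(w[i:i + k] for i in range(total))
    return Counter({u: Fraction(c, total) for u, c in counts.items()})


def l1(p: Counter, q: Counter) -> Fraction:
    """L1 distance between two frequency vectors (missing keys count as 0)."""
    return sum((abs(p[u] - q[u]) for u in set(p) | set(q)), Fraction(0))


m, k = 500, 4
w = "0" * m + "1" * m
v = "1" * m + "0" * m
d = l1(u_stat(w, k), u_stat(v, k))
hamming = sum(a != b for a, b in zip(w, v))
print(d, hamming)
```

With m = 500 and k = 4 the two statistics differ by only 6/997, while the words disagree at all 1000 positions: only the subwords straddling the boundary between the zeros and the ones change.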

8 Tester for equality of strings
Edit distance with moves: an NP-complete problem, but O(1)-approximable.
Uniform statistics: W = 001010101110
Theorem 1. |u.stat(w) - u.stat(w')| approximates dist(w,w')/n.
Sample N subwords of length k, compute Y(w) and Y(w').
Theorem 2. Y(w) approximates u.stat(w).
Corollary. |Y(w) - Y(w')| approximates dist(w,w')/n.
Tester: if |Y(w) - Y(w')| < ε, accept; else reject.
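
The tester on this slide can be sketched as follows. The L1 norm, the fixed seed, and the names `sample_stat` and `equality_tester` are assumptions made for illustration; the choice of k and N (the talk suggests k = 1/ε and N = O(1/ε^3)) is left to the caller.

```python
import random
from collections import Counter


def sample_stat(w: str, k: int, n_samples: int, rng: random.Random) -> Counter:
    """Y(w): empirical frequencies of N uniformly sampled length-k subwords."""
    counts = Counter()
    for _ in range(n_samples):
        i = rng.randrange(len(w) - k + 1)  # uniform start position
        counts[w[i:i + k]] += 1
    return Counter({u: c / n_samples for u, c in counts.items()})


def equality_tester(w: str, v: str, k: int, eps: float,
                    n_samples: int, seed: int = 0) -> bool:
    """Accept (True) if |Y(w) - Y(v)| < eps in L1 norm, else reject (False)."""
    rng = random.Random(seed)
    y_w = sample_stat(w, k, n_samples, rng)
    y_v = sample_stat(v, k, n_samples, rng)
    dist = sum(abs(y_w[u] - y_v[u]) for u in set(y_w) | set(y_v))
    return dist < eps


print(equality_tester("0" * 1000, "0" * 1000, k=5, eps=0.1, n_samples=200))
print(equality_tester("0" * 1000, "1" * 1000, k=5, eps=0.1, n_samples=200))
```

The running time depends on k and N but not on the length of the words, apart from the cost of drawing a random position.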

9 3a. Tester for regular words
Definition: L is a regular language and A an automaton for L. Test w in L.
Admissible Z =
A word W is Z-feasible if there are two states
[Figure: automaton with init and accept states]
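
The precise definition of Z-feasibility is not recoverable from this slide. As a simplified illustration only, the sketch below calls a subword feasible for a set of states if some state of the set reaches another state of the set while reading it; it ignores the ordering of components along the path Z. The automaton for (ab)* and the names `run` and `is_feasible` are hypothetical.

```python
def run(delta: dict, state, u: str):
    """Follow the partial transition function; None if a transition is missing."""
    for a in u:
        state = delta.get((state, a))
        if state is None:
            return None
    return state


def is_feasible(delta: dict, states: set, u: str) -> bool:
    """Simplified feasibility: some p in states reads u and ends in states."""
    return any(run(delta, p, u) in states for p in states)


# Hypothetical automaton for (ab)*: state 0 is both initial and accepting.
delta = {(0, "a"): 1, (1, "b"): 0}
states = {0, 1}
print(is_feasible(delta, states, "abab"))  # read from state 0
print(is_feasible(delta, states, "ba"))    # read from state 1
print(is_feasible(delta, states, "aa"))    # no state can read "aa"
```

An infeasible subword such as `aa` is a local witness that the whole word is not in the language, which is what a sampling tester looks for.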

10 Tester for regular words
Tester. Input: W, A, ε
For every admissible path Z: … else REJECT.
Theorem: Tester(W,A,ε) is an ε-tester for L(A).

11 Proof schema of the Tester
Theorem: Regular words are testable.
Robustness lemma: If W is ε-far from L, then for every admissible path Z, there exists such that the number of Z-infeasible subwords
Splitting lemma: If W is far from L, there are many disjoint infeasible subwords.
Amplifying lemma: If there are many infeasible words, there are many short ones.

12 Merging words
Merging lemma: Let Z be an admissible path, and let F be a Z-feasible cut of size h'. Then
[Figure: components labeled C]
Take each word and split it along its connected components, removing single letters. Rearrange all the words of the same component in its Z-order. Add gluing words to obtain W' in L:

13 Splitting
Splitting lemma: If Z is an admissible path and W a word s.t. dist(W,L) > h, then W has
Proof by contraposition:

14 3b. Correction in practice: right branch tree http://www.lri.fr/~mdr/xml/ 2 moves, dist=2

15 4. Equivalence testing of Regular Languages
1. Inclusion
2. Equivalence
Equivalence tester

16 Automata for Regular languages
Basic property:
Proposition:
Carathéodory's theorem: in dimension d, the convex hull of N points can be decomposed into the union of convex hulls of d+1 points.
Large loops can be decomposed. Small loops (less than m=|A|) suffice.

17 Approximate Parikh mapping
Lemma: For every X in H, w in L s. t. X. b-stat(w) w
H is a fair representation of L.

18 Construction of H in polynomial time
Enumerate all loops:
Number of b-stat is less:
Some loops have the same b-stat: ABBA and BBAA.
#partitions of a word of length m with "big blocks"
Construct H by matrix iteration:

19 Example
Automaton A:
Blocks: k=2, m=4, |Σ|=4, |Σ|^k+1=17
Loops: {(aa,ca:1),(bb,2),(cc,ac:3),(dd:4)}
[Figure: automaton A with states 1, 2, 3, 4 and transitions labeled a, b, c, d; H_A over the blocks aa, ca, ac, cc, bb, dd]

20 Equivalence tester
Tester for w in L (regular): compute b-stat(w) and H; decide if dist(w,L) > ε.n. Time is polynomial in m=|L|. The previous tester was exponential in m.
Tester of
1. Compute H_A and H_B.
2. Reject if H_A and H_B are different.
Time polynomial in m=|A,B|.

21 Application: Data Exchange
[Figure: Source, Target]
W=010101011, source. Which structure for the target?
Answer: if the two schemas are close, run a corrector and obtain W'=10101010, distance 3. If the two schemas are not close, no guarantee.
General situation for data exchange and query answering.

22 Conclusion
1. Testers and correctors
2. Constant algorithm for edit distance with moves
3a. Testers and correctors for regular words
3b. Tester for regular trees and corrector for regular trees
4. Equivalence tester for automata: polynomial time algorithm
Generalization to Büchi automata and Context-Free Tree regular languages

23 Generalizations
Büchi automata. Distance on infinite words: two words are ε-close if
A word is ε-close to a language L if there exists w' in L s. t. W and w' are ε-close.
Statistics: set of accumulation points of H: compatible loops of connected components of accepting states
Tester for Büchi automata: compute H_A and H_B; reject if H_A and H_B are different.
Equivalence of CF grammars is undecidable; approximate equivalence is in exponential time.

24 Soundness and Robustness
Let F be a property on a class K of structures U. F is Equality.
Soundness: close structures have close statistics.
Robustness: far structures have far statistics.

25 Robustness of b.stat Robustness of b-stat:

26 Soundness of u.stat
Soundness of u-stat:
Simple edit:
Move w=A.B.C.D, w'=A.C.B.D:
Hence, for ε^2.n operations,
Problem: robustness of u.stat? Harder! You need an auxiliary distribution and two key lemmas.
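
The effect of a single move on the uniform statistics can be checked empirically. The slide's own bound did not survive extraction; the constant below comes from a counting argument added here as an assumption: a move w=A.B.C.D to w'=A.C.B.D only changes subwords that straddle one of three block boundaries, at most 3(k-1) in each word, so the L1 distance is at most 6(k-1)/(n-k+1). The helper names are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def u_stat(w: str, k: int) -> Counter:
    """Frequencies of all n-k+1 subwords of length k (assumed definition)."""
    total = len(w) - k + 1
    counts = Counter(w[i:i + k] for i in range(total))
    return Counter({u: Fraction(c, total) for u, c in counts.items()})


def l1(p: Counter, q: Counter) -> Fraction:
    """L1 distance between two frequency vectors (missing keys count as 0)."""
    return sum((abs(p[u] - q[u]) for u in set(p) | set(q)), Fraction(0))


def move(w: str, i: int, j: int, p: int) -> str:
    """Cut the factor w[i:j] and re-insert it at position p of the remainder."""
    block, rest = w[i:j], w[:i] + w[j:]
    return rest[:p] + block + rest[p:]


rng = random.Random(1)
n, k = 2000, 10
w = "".join(rng.choice("01") for _ in range(n))
v = move(w, 300, 900, 1000)
bound = Fraction(6 * (k - 1), n - k + 1)
print(l1(u_stat(w, k), u_stat(v, k)) <= bound)
```

Under this bound, a sequence of t moves changes the statistics by at most 6t(k-1)/(n-k+1), which stays small as long as t is small compared with n/k.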

27 Block Uniform Statistics Lemma 1:

28 Uniform Statistics A B Lemma 2:

29 Robustness of the uniform Statistics Robustness of u-stat: By Lemma 1: By Lemma 3:

30 Tester for the distance with moves
NP-complete problem, but O(1)-approximable.
Approximate u.stat: sample N subwords of length k and compute Y.
Y is a good approximation of u.stat (Chernoff); the uniform statistics is a good approximation of the distance, by soundness and robustness.
Tester: if Y < ε.n, accept; else reject.

