Approximate schemas Michel de Rougemont, LRI, University Paris II.


Distances between languages
1. Distance between words (structures): O(1). Edit distance with moves.
2. Distance between a word (structure) and a class of words (structures): O(1).
3. Distance between two languages (classes): polynomial.
4. Applications: regular languages, DTDs.

1. Approximate Satisfiability and Equivalence
1. Satisfiability: Tree |= F
2. Approximate satisfiability: Tree |=_ε F
3. Approximate equivalence of F and G on a class K of trees

Testers on a class K
An ε-tester for a property F is a probabilistic algorithm A such that:
If U |= F, A accepts.
If U is ε-far from F, A rejects with high probability.
Time(A) is independent of n.
A tester usually implies a linear-time corrector.
Self-testers and correctors for linear algebra, Blum & Kannan, 1989.
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994.
Testers for graph properties: k-colorability, Goldreich et al.
Graph properties have testers, Alon et al.
Regular languages have testers, Alon et al., 2000s.
Testers for regular tree languages, MdR and Magniez, ICALP 2004.

2. Equality tester
1. Classical edit distance: insertions, deletions, modifications.
2. Edit distance with moves.
Edit distance with moves generalizes to trees.

Block and uniform statistics
W = …… of length n, subwords of length k, n/k blocks.
For k=2, n/k=6.
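The two statistics named on this slide can be illustrated with a minimal Python sketch. It assumes, since the transcript does not preserve the definitions, that the block statistics of a word are the frequencies of its n/k consecutive disjoint blocks of length k, and that the uniform statistics are the frequencies of all n-k+1 subwords of length k. The names `b_stat` and `u_stat` and the sample word are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def b_stat(w, k):
    """Block statistics (assumed definition): frequency of each length-k
    block among the n/k consecutive, non-overlapping blocks of w."""
    if k <= 0 or len(w) % k != 0:
        raise ValueError("len(w) must be a positive multiple of k")
    blocks = [w[i:i + k] for i in range(0, len(w), k)]
    return {b: Fraction(c, len(blocks)) for b, c in Counter(blocks).items()}


def u_stat(w, k):
    """Uniform statistics (assumed definition): frequency of each length-k
    subword among all n-k+1 overlapping windows of w."""
    if k <= 0 or len(w) < k:
        raise ValueError("w must contain at least one window of length k")
    windows = [w[i:i + k] for i in range(len(w) - k + 1)]
    return {u: Fraction(c, len(windows)) for u, c in Counter(windows).items()}


# Hypothetical word with n = 12 and k = 2, hence n/k = 6 blocks.
w = "000111000111"
print(b_stat(w, 2))
print(u_stat(w, 2))
```

The block statistics only see the 6 aligned blocks, while the uniform statistics also see windows that straddle block boundaries, such as "10" in this word.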

Goal: d_1 approximates the distance.
Let ε = 1/k. For n > n_0: dist - ε·n < d_1 < dist + ε·n.
Practical application: ε = 10^-2, hence k = 100, stat dimension
For words of length n = 10^9, d_1 is approximated from N samples, with a good approximation after N = O(1/ε^3) trials.
Remarks:
1. Distance with moves. W = 000……111, W' = 1111…111000…
2. Robustness to noise. If W, W' are noisy inputs (but ε-close), the method still works.
3. Random words are close with moves, far without.
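Remark 1 can be checked on a small instance. The sketch below is an illustration under assumptions, not the author's experiment: it takes W = 0^(n/2) 1^(n/2) and W' = 1^(n/2) 0^(n/2), which differ by a single move of one half, and compares the Hamming distance (every position differs) with the L1 gap between uniform statistics. The definition of `u_stat` (frequencies of all length-k windows) and the values n = 1000, k = 10 are assumptions chosen for the example.

```python
from collections import Counter
from fractions import Fraction


def u_stat(w, k):
    """Assumed definition: frequencies of all overlapping length-k windows."""
    windows = [w[i:i + k] for i in range(len(w) - k + 1)]
    return {u: Fraction(c, len(windows)) for u, c in Counter(windows).items()}


def l1_distance(p, q):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in set(p) | set(q))


def hamming(w1, w2):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(w1, w2))


n, k = 1000, 10
W = "0" * (n // 2) + "1" * (n // 2)
W2 = "1" * (n // 2) + "0" * (n // 2)   # one move: swap the two halves

stat_gap = l1_distance(u_stat(W, k), u_stat(W2, k))
print("Hamming distance:", hamming(W, W2))
print("L1 gap between uniform statistics:", float(stat_gap))
```

Only the k-1 windows that straddle the middle boundary differ between the two words, so the statistics are close even though no position agrees.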

Tester for equality of strings
Edit distance with moves: NP-complete problem, but O(1)-approximable.
Uniform statistics: W =
Theorem 1. |u.stat(w) - u.stat(w')| approximates dist(w,w')/n.
Sample N subwords of length k, compute Y(w) and Y(w').
Theorem 2. Y(w) approximates u.stat(w).
Corollary. |Y(w) - Y(w')| approximates dist(w,w')/n.
Tester: if |Y(w) - Y(w')| < ε, accept; else reject.
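The sampling tester on this slide can be sketched as follows. This is a minimal illustration, assuming that Y(w) is the empirical frequency vector of N length-k subwords drawn uniformly at random, that |·| is the L1 norm, and that k = 1/ε as on the previous slide. The function names and the choice of N are hypothetical; the slides suggest N = O(1/ε^3) without giving the constant.

```python
import random
from collections import Counter


def sample_stat(w, k, num_samples, rng):
    """Y(w): empirical frequencies of num_samples length-k subwords of w,
    each drawn at a uniformly random start position."""
    last_start = len(w) - k
    if last_start < 0:
        raise ValueError("w is shorter than k")
    counts = Counter()
    for _ in range(num_samples):
        i = rng.randint(0, last_start)   # randint is inclusive on both ends
        counts[w[i:i + k]] += 1
    return {u: c / num_samples for u, c in counts.items()}


def l1_distance(p, q):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))


def equality_tester(w1, w2, eps, num_samples, rng=None):
    """Accept (True) if the sampled statistics are eps-close in L1."""
    rng = rng or random.Random()
    k = max(1, round(1 / eps))           # assumption: eps = 1/k
    y1 = sample_stat(w1, k, num_samples, rng)
    y2 = sample_stat(w2, k, num_samples, rng)
    return l1_distance(y1, y2) < eps


rng = random.Random(0)
print(equality_tester("0" * 1000, "0" * 1000, 0.1, 200, rng))
print(equality_tester("0" * 1000, "1" * 1000, 0.1, 200, rng))
```

The running time depends on N and k only, not on the length of the words, which is the point of the construction.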

3a. Tester for regular words
Definition: L is a regular language and A an automaton for L. Test whether w is in L.
Admissible Z =
A word W is Z-feasible if there are two states
[Figure: automaton with init and accept states]

Tester for regular words
Tester. Input: W, A, ε.
For every admissible path Z:
else REJECT.
Theorem: Tester(W, A, ε) is an ε-tester for L(A).
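The body of the tester is not preserved in this transcript, so the following Python sketch is not the author's algorithm. It shows a simplified, one-sided sampling test in the same spirit, under these assumptions: the automaton is a partial DFA given as a transition dictionary, and a subword is called feasible if it labels a run between two states that are both reachable from the initial state and able to reach an accepting state. Admissible paths Z are ignored. All names are hypothetical.

```python
import random


def useful_states(delta, init, accepting):
    """States reachable from init that can also reach an accepting state."""
    forward, stack = {init}, [init]
    while stack:
        p = stack.pop()
        for (src, _), dst in delta.items():
            if src == p and dst not in forward:
                forward.add(dst)
                stack.append(dst)
    backward, stack = set(accepting), list(accepting)
    while stack:
        q = stack.pop()
        for (src, _), dst in delta.items():
            if dst == q and src not in backward:
                backward.add(src)
                stack.append(src)
    return forward & backward


def is_feasible(v, delta, useful):
    """True if v labels a run from some useful state to some useful state."""
    for p in useful:
        state = p
        for letter in v:
            state = delta.get((state, letter))
            if state is None:            # run is undefined on this letter
                break
        else:
            if state in useful:
                return True
    return False


def sample_tester(w, delta, init, accepting, k, num_samples, rng=None):
    """Reject (False) as soon as a sampled length-k subword is infeasible."""
    rng = rng or random.Random()
    useful = useful_states(delta, init, accepting)
    if len(w) < k:
        return is_feasible(w, delta, useful)
    for _ in range(num_samples):
        i = rng.randint(0, len(w) - k)
        if not is_feasible(w[i:i + k], delta, useful):
            return False
    return True


# Hypothetical automaton for a*b*: state 0 reads a's, state 1 reads b's.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}
print(sample_tester("a" * 50 + "b" * 50, delta, 0, {0, 1}, 4, 20))
print(sample_tester("ba" * 50, delta, 0, {0, 1}, 4, 20))
```

Every subword of a word in L is feasible in this sense, so words in L are always accepted; the lemmas on the next slides are what bound the rejection probability for far words in the actual tester.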

Proof schema of the Tester
Theorem: Regular words are testable.
Robustness lemma: If W is ε-far from L, then for every admissible path Z, there exists such that the number of Z-infeasible subwords
Splitting lemma: If W is far from L, there are many disjoint infeasible subwords.
Amplifying lemma: If there are many infeasible words, there are many short ones.

Merging words
Merging lemma: Let Z be an admissible path, and let F be a Z-feasible cut of size h'. Then
[Figure: connected components C]
Take each word and split it along its connected components, removing single letters.
Rearrange all the words of the same component in its Z-order.
Add gluing words to obtain W' in L:

Splitting
Splitting lemma: If Z is an admissible path and W a word such that dist(W,L) > h, then W has
Proof by contraposition:

3b. Correction in practice: right branch tree
2 moves, dist=2

4. Equivalence testing of Regular Languages
1. Inclusion
2. Equivalence
Equivalence tester

Automata for Regular languages
Basic property:
Proposition:
Carathéodory's theorem: in dimension d, the convex hull of N points can be decomposed into the union of convex hulls of d+1 points.
Large loops can be decomposed. Small loops (of size less than m=|A|) suffice.

Approximate Parikh mapping
Lemma: For every X in H, w in L s.t. X. b-stat(w) w
H is a fair representation of L.

Construction of H in polynomial time
Enumerate all loops:
The number of b-stats is smaller:
Some loops have the same b-stat: ABBA and BBAA.
Number of partitions of a word of length m with "big blocks"
Construct H by matrix iteration:
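The remark that ABBA and BBAA share a b-stat can be made concrete with a small Python sketch. It assumes that each capital letter stands for one block, so a loop is a sequence of blocks and its b-stat depends only on how often each block occurs, not on the order. The enumeration below ranges over all block sequences of length 4 over two blocks, not over the loops of an actual automaton; the names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def block_counts(loop):
    """Order-insensitive summary of a loop given as a sequence of blocks."""
    return frozenset(Counter(loop).items())


# Same multiset of blocks, hence the same b-stat under the assumption above.
print(block_counts("ABBA") == block_counts("BBAA"))

# All sequences of m = 4 blocks over s = 2 block symbols versus the number
# of distinct block-count vectors they produce.
m, s = 4, 2
sequences = list(product("AB", repeat=m))
distinct = {block_counts(seq) for seq in sequences}
print(len(sequences), len(distinct), comb(m + s - 1, s - 1))
```

The count of distinct statistics grows polynomially in m for a fixed number of block symbols, while the number of sequences grows exponentially, which is consistent with the slide's claim that there are fewer b-stats than loops.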

Example
Automaton A:
Blocks, k=2, m=4, |Σ|=4, |Σ|^k + 1 = 17:
Loops: {(aa,ca:1),(bb:2),(cc,ac:3),(dd:4)}
[Figure: automaton A with transitions labeled a, b, b, c, a, c, d, d; H_A with blocks aa, ca, ac, cc, bb, dd]

Equivalence tester
Tester for w in L (regular): Compute b-stat(w) and H. Decide if dist(w,L) > ε·n. Time is polynomial in m=|L|. The previous tester was exponential in m.
Tester of
1. Compute H_A and H_B.
2. Reject if H_A and H_B are different.
Time is polynomial in m=|A,B|.

Application: Data Exchange
Source, Target
W = , source. Which structure for the target?
Answer: if the two schemas are close, run a corrector and obtain W' = , distance 3. If the two schemas are not close, there is no guarantee.
General situation for data exchange and query answering.

Conclusion
1. Testers and Correctors
2. Constant-time algorithm for Edit Distance with moves
3a. Testers and Correctors for regular words
3b. Tester for regular trees and corrector for regular trees
4. Equivalence tester for automata
Polynomial time algorithm
Generalization to Büchi automata and Context-Free Tree regular languages

Generalizations
Büchi automata.
Distance on infinite words: Two words are ε-close if
A word W is ε-close to a language L if there exists w' in L such that W and w' are ε-close.
Statistics: set of accumulation points of H: compatible loops of connected components of accepting states.
Tester for Büchi automata: Compute H_A and H_B. Reject if H_A and H_B are different.
Equivalence of CF grammars is undecidable; approximate equivalence in exponential time.

Soundness and Robustness
Let F be a property on a class K of structures U.
F is Equality.
Soundness: close structures have close statistics.
Robustness: far structures have far statistics.

Robustness of b.stat Robustness of b-stat:

Soundness of u.stat
Soundness of u-stat:
Simple edit:
Move w=A.B.C.D, w'=A.C.B.D:
Hence, for ε^2·n operations,
Problem: robustness of u.stat? Harder! You need an auxiliary distribution and two key lemmas.
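The bounds on this slide are not preserved in the transcript. The Python sketch below checks a bound of my own derivation for a single move, not one taken from the slides: windows lying entirely inside A, B, C or D are unchanged as a multiset, and at most k-1 windows straddle each of the three boundaries, so the window-count vectors of w and w' differ by at most 6(k-1) in L1. Dividing by n-k+1 gives the corresponding gap between uniform statistics. The function names and example words are hypothetical.

```python
from collections import Counter


def window_counts(w, k):
    """Counts of all overlapping length-k windows of w."""
    return Counter(w[i:i + k] for i in range(len(w) - k + 1))


def move(w, i, j, l):
    """Split w = A.B.C.D with A=w[:i], B=w[i:j], C=w[j:l], D=w[l:]
    and return A.C.B.D (the block B is moved past C)."""
    if not 0 <= i <= j <= l <= len(w):
        raise ValueError("need 0 <= i <= j <= l <= len(w)")
    return w[:i] + w[j:l] + w[i:j] + w[l:]


def count_gap(w1, w2, k):
    """L1 distance between the window-count vectors of w1 and w2."""
    c1, c2 = window_counts(w1, k), window_counts(w2, k)
    return sum(abs(c1[x] - c2[x]) for x in set(c1) | set(c2))


w, k = "aabbbccddd", 3
w2 = move(w, 2, 5, 7)                    # A="aa", B="bbb", C="cc", D="ddd"
print(w2, count_gap(w, w2, k), "<=", 6 * (k - 1))
```

With k = 1/ε, a sequence of ε^2·n moves therefore changes the uniform statistics by an amount of order ε, which matches the shape of the statement on the slide.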

Block Uniform Statistics Lemma 1:

Uniform Statistics A B Lemma 2:

Robustness of the uniform Statistics Robustness of u-stat: By Lemma 1: By Lemma 3:

Tester for the distance with moves
NP-complete problem, but O(1)-approximable.
Approximate u.stat: Sample N subwords of length k, compute Y:
Y is a good approximation of u.stat (Chernoff).
The uniform statistics is a good approximation of the distance, by soundness and robustness.
Tester: if Y < ε·n, accept; else reject.