[Paper Survey] Profinite Methods in Automata Theory Presentation: IPL Rinko, Dec. 7, 2010 by Kazuhiro Inaba by Jean-Éric Pin (Invited Lecture at STACS.

Slides:

Advertisements

Similar presentations

Properties of Regular Languages

Advertisements

IRP ∩ LSI is not RE presentation by Kazuhiro Inaba (NII, IPL-Group Seminar May 25, 2010 ( Introduction of the manuscript:

Deterministic Finite Automata (DFA)

YES-NO machines Finite State Automata as language recognizers.

Sets Lecture 11: Oct 24 AB C. This Lecture We will first introduce some basic set theory before we do counting. Basic Definitions Operations on Sets Set.

Basic Properties of Relations

CSCI 2670 Introduction to Theory of Computing September 13, 2005.

Lecture 6 Hyperreal Numbers (Nonstandard Analysis)

CS21 Decidability and Tractability

Automata and Formal Languages Tim Sheard 1 Lecture 8 Pumping Lemma & Distinguishability Jim Hook Tim Sheard Portland State University.

1 Lecture 22 Myhill-Nerode Theorem –distinguishability –equivalence classes of strings –designing FSA’s –proving a language L is not regular.

Transparency No. 2-1 Formal Language and Automata Theory Chapter 2 Deterministic Finite Automata (DFA) (include Lecture 3 and 4)

Languages. A Language is set of finite length strings on the symbol set i.e. a subset of (a b c a c d f g g g) At this point, we don’t care how the language.

Transparency No Formal Language and Automata Theory Chapter 10 The Myhill-Nerode Theorem (lecture 15,16 and B)

CS5371 Theory of Computation Lecture 5: Automata Theory III (Non-regular Language, Pumping Lemma, Regular Expression)

EE1J2 – Discrete Maths Lecture 7

Regular Expression (EXTRA)

LING 438/538 Computational Linguistics Sandiway Fong Lecture 12: 10/5.

CS 454 Theory of Computation Sonoma State University, Fall 2011 Instructor: B. (Ravi) Ravikumar Office: 116 I Darwin Hall Original slides by Vahid and.

Normal forms for Context-Free Grammars

EE1J2 - Slide 1 EE1J2 – Discrete Maths Lecture 12 Number theory Mathematical induction Proof by induction Examples.

CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)

FSA Lecture 1 Finite State Machines. Creating a Automaton  Given a language L over an alphabet , design a deterministic finite automaton (DFA) M such.

Theory of Computing Lecture 22 MAS 714 Hartmut Klauck.

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

1 Set Theory. Notation S={a, b, c} refers to the set whose elements are a, b and c. a  S means “a is an element of set S”. d  S means “d is not an element.

1 Regular Expressions/Languages Regular languages –Inductive definitions –Regular expressions syntax semantics Not covered in lecture.

1 Set Theory. Notation S={a, b, c} refers to the set whose elements are a, b and c. a  S means “a is an element of set S”. d  S means “d is not an element.

Lecture 1 String and Language. String string is a finite sequence of symbols. For example, string ( s, t, r, i, n, g) CS4384 ( C, S, 4, 3, 8) (1,

Great Theoretical Ideas in Computer Science.

Regular Languages A language is regular over  if it can be built from ;, {  }, and { a } for every a 2 , using operators union ( [ ), concatenation.

Set theory Sets: Powerful tool in computer science to solve real world problems. A set is a collection of distinct objects called elements. Traditionally,

Formal Language Finite set of alphabets Σ: e.g., {0, 1}, {a, b, c}, { ‘{‘, ‘}’ } Language L is a subset of strings on Σ, e.g., {00, 110, 01} a finite language,

Math 3121 Abstract Algebra I Lecture 3 Sections 2-4: Binary Operations, Definition of Group.

CSE 3813 Introduction to Formal Languages and Automata Chapter 8 Properties of Context-free Languages These class notes are based on material from our.

DECIDABILITY OF PRESBURGER ARITHMETIC USING FINITE AUTOMATA Presented by : Shubha Jain Reference : Paper by Alexandre Boudet and Hubert Comon.

CSC312 Automata Theory Lecture # 2 Languages.

CSC312 Automata Theory Lecture # 2 Languages.

Lecture Two: Formal Languages Formal Languages, Lecture 2, slide 1 Amjad Ali.

Theory of Computation, Feodor F. Dragan, Kent State University 1 Regular expressions: definition An algebraic equivalent to finite automata. We can build.

Groups Definition A group  G,  is a set G, closed under a binary operation , such that the following axioms are satisfied: 1)Associativity of  :

1 Chapter 1 Introduction to the Theory of Computation.

Set, Combinatorics, Probability & Number Theory Mathematical Structures for Computer Science Chapter 3 Copyright © 2006 W.H. Freeman & Co.MSCS Slides Set,

1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2006.

Computably Enumerable Semigroups, Algebras, and Groups Bakhadyr Khoussainov The University of Auckland New Zealand Research is partially supported by Marsden.

Sets Define sets in 2 ways  Enumeration  Set comprehension (predicate on membership), e.g., {n | n  N   k  k  N  n = 10  k  0  n  50} the set.

Strings and Languages CS 130: Theory of Computation HMU textbook, Chapter 1 (Sec 1.5)

Chapter 4 Pumping Lemma Properties of Regular Languages Decidable questions on Regular Languages.

Inferring Finite Automata from queries and counter-examples Eggert Jón Magnússon.

Foundations of Discrete Mathematics Chapter 4 By Dr. Dalia M. Gil, Ph.D.

Closure Properties Lemma: Let A 1 and A 2 be two CF languages, then the union A 1  A 2 is context free as well. Proof: Assume that the two grammars are.

UNIT - 2.  A binary operation on a set combines two elements of the set to produce another element of the set. a*b  G,  a, b  G e.g. +, -, ,  are.

CS 203: Introduction to Formal Languages and Automata

Pumping Lemma for CFLs. Theorem 7.17: Let G be a CFG in CNF and w a string in L(G). Suppose we have a parse tree for w. If the length of the longest path.

Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite.

Copyright © Cengage Learning. All rights reserved. CHAPTER 8 RELATIONS.

Strings and Languages Denning, Section 2.7. Alphabet An alphabet V is a finite nonempty set of symbols. Each symbol is a non- divisible or atomic object.

Department of Statistics University of Rajshahi, Bangladesh

Finite Automata A simple model of computation. 2 Finite Automata2 Outline Deterministic finite automata (DFA) –How a DFA works.

Chapter 1. Linear equations Review of matrix theory Fields System of linear equations Row-reduced echelon form Invertible matrices.

Chapter 2 Regular Languages & Finite Automata. Regular Expressions A finitary denotation of a regular language over . ØL Ø = Ø aL a = {a} where a ∈ 

Relations Chapter 9 Copyright © McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill.

Unit-III Algebraic Structures

Set, Combinatorics, Probability & Number Theory

Chapter 3 The Real Numbers.

CSE 3813 Introduction to Formal Languages and Automata

Chapter 3 The Real Numbers.

4. Properties of Regular Languages

Closure Properties of Regular Languages

Presentation transcript:

[Paper Survey] Profinite Methods in Automata Theory Presentation: IPL Rinko, Dec. 7, 2010 by Kazuhiro Inaba by Jean-Éric Pin (Invited Lecture at STACS 2009)

The Topic of the Paper Investigation on (subclasses of) regular languages by using ◦ Topological method ◦ Especially, “profinite metric”

Why I Read This Paper I want to have a different point of view on the “Inverse Regularity Preservation” property of str/tree/graph functions ◦ A function  f :: string  string ◦ is IRP iff  For any regular language L, the inverse image f -1 (L) = {s | f(s) ∈ L} is regular

(Why I Read This Paper) Application of IRP Typechecking f :: L IN → L OUT ？ ◦ Verify that a transformation always generates valid outputs from valid inputs. fL IN L OUT XSLT Template for formating bookmarks XBEL SchemaXHTML Schema PHP ScriptArbitrary String String not containing “ ”

(Why I Read This Paper) Application of IRP Typechecking f :: L IN → L OUT ？ ◦ If f is IRP, we can check this by … f is type-correct ⇔ f(L IN ) ⊆ L OUT ⇔ L IN ⊆ f -1 (L OUT ) ⇔ L IN ∩ f -1 (L OUT ) ＝ Φ with counter-example in the unsafe case (for experts: f is assumed to be deterministic)

(Why I Read This Paper) Characterization of IRP Which function is IRP? We know that MTT* is a strict subclass of IRP (as I have presented half a year ago). But how can we characterize the subclass? Is there any systematic method to define subclasses of IRP functions? The paper [Pin 09] looks to provide an algebraic/topological viewpoint on regular languages, which I didn’t know.

Agenda Metrics Profinite Metric Completion Characterization of ◦ Regular Languages ◦ Inverse Regularity Preservation ◦ Subclasses of Regular Languages by “Profinite Equations” Summary

Notation I use the following notation ◦ Σ = finite set of ‘character’s ◦ Σ* = the set of finite words (strings) e.g., ◦ Σ = {0,1}  Σ* = { ε, 0, 1, 00, 01, 10, … } ◦ Σ = {a,b,c,…,z,A,B,C,…,Z}  Σ* = { ε, a, b, …, HelloWorld, … }

Metrics d :: S × S  R+ ◦ is a metric on a set S, if it satisfies:  d(x, x) = 0  d(x, y) = d(y, x)  d(x, y) ≦ d(x, z) + d(z, y) (triangle inequality)

Example d R :: R × R  R+ d R (a, b) = |a-b| d 2 :: R 2 × R 2  R+ d 2 ( (a x,a y ), (b x,b y ) ) = √ (a x -b x ) 2 + (a y -b y ) 2 d 1 :: R 2 × R 2  R+ d 1 ( (a x,a y ), (b x,b y ) ) = |a x -b x | + |a y -b y | d ∞ :: R 2 × R 2  R+ d ∞ ( (a x,a y ), (b x,b y ) ) = max(|a x -b x |, |a y -b y |)

Metrics on Strings : Example d cp (x, y) = 2 -cp(x,y) where  cp(x,y) = ∞ if x=y  cp(x,y) = the length of the common prefix of x and y d cp ( “abcabc”, “abcdef” ) = 2 -3 = d cp ( “zzz”, “zzz” ) = 2 - ∞ = 0

Proof : d cp (x,y)=2 -cp(x,y) is a metric d cp (x,x) = 0 d cp (x,y) = d cp (y,x) ◦ By definition. d cp (x,y) ≦ d cp (x,z) + d cp (z,y) ◦ Notice that we have either  cp(x,y) ≧ cp(x,z) or cp(x,y) ≧ cp(z,y). ◦ Thus  d cp (x,y) ≦ d cp (x,z) or d cp (x,y) ≦ d cp (z,y).

Profinite Metric on Strings d mA (x, y) = 2 -mA(x,y) where  mA(x,y) = ∞ if x=y  mA(x,y) = the size of the minimal DFA (deterministic finite automaton) that distinguishes x and y

Example d mA (“aa”, “aaa”) = 2 -2 = 0.25 d mA (a 119,a 120 ) = 2 -2 = 0.25 d mA (a 60, a 120 ) = 2 -7 = F F a a F F aa a aaa a

Example d mA (“ab”, “abab”) = 2 -2 = 0.25 d mA (“abab”, “abababab”) = 2 -3 = F F b b a a F F b b a b a a

Proof : d mA (x,y)=2 -mA(x,y) is a metric d mA (x,x) = 0 d mA (x,y) = d mA (y,x) ◦ By definition. d mA (x,y) ≦ d mA (x,z) + d mA (z,y) ◦ Notice that we have either  mA(x,y) ≧ mA(x,z) or mA(x,y) ≧ mA(z,y). ◦ Thus  d mA (x,y) ≦ d mA (x,z) or d mA (x,y) ≦ d mA (z,y).

(Note) In the paper another profinite metric is defined, based on the known fact: ◦ A set of string L is recognizable by DFA if and only if ◦ If it is an inverse image of a subset of a finite monoid by a homomorphism L = ψ -1 (F) where ψ :: Σ*  M is a homomorphism, M is a finite monoid, F ⊆ M

Completion of Metric Space A sequence of elements x 1, x 2, x 3, … ◦ is Cauchy if ∀ ε>0, ∃ N, ∀ i,k>N, d(x i, x k )<ε ◦ is convergent ∃ a ∞, ∀ ε>0, ∃ N, ∀ i>N, d(x i,x ∞ )<ε Completion of a metric space is the minimum extension of S, whose all Cauchy sequences are convergent.

Example of Completion Completion of rational numbers with “normal” distance  Reals ◦ QR ◦ d Q (x,y) = |x-y|d R (x,y) = |x-y| 1, 1.4, 1.41, , …√2 3, 3.1, 3.14, , …π 5, 5, 5, 5, …5

Example of Completion Completion of finite strings with d cp ◦ Σ* ◦ d cp (Common Prefix) a, aa, aaa, aaaaaaaa, … ab, abab, ababab, … zz, zz, zz, zz, …

Example of Completion Completion of finite strings with d cp  the set of finite and infinite strings ◦ Σ*Σ ω ◦ d cp (Common Prefix)d cp a, aa, aaa, aaaaaaaa, …a ω ab, abab, ababab, …(ab) ω zz, zz, zz, zz, …zz

Completion of Strings with Profinite Metric d mA (x, y) = 2 -mA(x,y) Example of a Cauchy sequence: x i = w i! (for some string w ) w, ww, wwwwww, w 24, w 120, w 720, … (NOTE: w i is not a Cauchy sequence)

Completion of Strings with Profinite Metric Completion of ◦ Σ* with d mA (x, y) = 2 -mA(x,y) yields the set of profinite words Σ* In the paper, the limit w i! is called x i = w i! w ω with a note: Note that x ω is simply a notation and one should resist the temptation to interpret it as an inﬁnite word.

Difference from Infinite Words In the set of infinite words ◦ a ω + b = a ω (since the length of the common prefix is ω, their distance is 0, hence equal) In the set of profinite words ◦ a ω + b ≠ a ω (their distance is 0.25, because of: F F b a,b a

p-adic Metric on Q Similar concept in the Number Theory For each n ≧ 2, define d’ n as ◦ d’ n (x,y) = n -a if x-y = b/c n a  where a,b,c ∈ Z and b,c is not divisible by n When p is a prime, d’ p is called the p-adic metric

Example (p-adic Metric) For each n ≧ 2, define d’ n as ◦ d’ n (x,y) = n -a if x-y = b/c n a  where a,b,c ∈ Z and b,c is not divisible by n d’ 10 ( 12345, ) = d’ 10 ( 0.33, 0.43 ) = 10 +1

Q R QpQp Completion by |x-y| by d’ p Infinite Strings by d cp by d mA Profinite Strings Finite Strings 1, 1.4, 1.41, …  … 1, 21, 121, 2121, …  …

Theorem [Hunter 1988] L ⊆ Σ* is regular if and only if cl(L) is clopen in Σ* clopen := closed & open closed := complement is open S is open := ∀ x ∈ S, ∃ ε>0, {y|d(x,y)<ε} ⊆ S cl(S) := unique minimum closed set ⊇ L

Intuition L is regular ◦ ⇔ cl(L) is open ◦ ⇔ ∀ x ∈ cl(L), ∃ ε, ∀ y, d mA (x,y)<ε  y ∈ cl(L) ◦ ⇔ If cl(L) contains x, it contains all ‘hard- to-distinguish-from x’ profinite strings L L x y ε

(Non-)example L = { a n b n | n ∈ nat } is not regular Because ◦ a ω b ω is contained in cl(L) ◦ cl(L) do not contain a ω b ω+k! for each k ◦ but d mA (a ω b ω, a ω b ω+k! ) ≦ 2 -k L L aωbωaωbω

Proof Sketch : clopen ⇔ regular L is Regular ⇒ cl(L) is Clopen (This direction is less surprising.) ◦ It is trivially closed ◦ Suppose L is regular but cl(L) is not ppen. ◦ Then, ∃ x ∈ cl(L), ∀ ε, ∃ y ∉ cl(L), d mA (x,y)<ε ◦ Then, ∀ n, ∃ x ∈ L, ∃ y ∉ L, d mA (x,y)<2 -n ◦ Then, ∀ size-n DFA, ∃ x,y that can’t be separated ◦ Thus, L is not be a regular language.

(not in the paper: just my thought) Generalize: Regular ⇒ Clopen (This direction is less surprising. Why?) Because it doesn’t use any particular property of “regular” Let ◦ F be a set of predicates string  bool ◦ siz be any function F  nat ◦ d mF (x,y) = 2 -min{siz(f) | f(x)≠f(y)} L is F-recognizable ⇒ cl(L) is clopen with d mF

(not in the paper: just my thought) Generalize: Regular ⇒ Clopen (This direction is less surprising. Why?) Because it doesn’t use any particular property of “regular” E.g., ◦ d mPA (x,y) = 2 -min{#states of PD-NFA separating x&y} L is context-free ⇒ cl(L) is clopen with d mPA (But this is not at all interesting, because any set is clopen in this metric!!)

Proof Sketch: Clopen ⇒ Regular Used lemmas: ◦ Σ* is compact  i.e., if it is covered by an infin union of open sets, then it is covered by their finite subfamily, too.  i.e., every infinite seq has convergent subseq  The proof relies on the fact: siz -1 (n) is finite ◦ Concatenation is continuous in this metric  i.e., ∀ x ∀ ε ∃ δ, ∀ x’, d(x,x’)<δ  d(f(x),f(x’))<ε  Due to d mA (wx,wy) ≦ d mA (x,y) By these lemmas, clopen sets are shown to be covered by finite congruence, and hence regular.

Corollary f :: Σ*  Σ* is IRP if and only if f :: Σ*  Σ* is continuous continuous := ∀ x ∀ ε ∃ δ, ∀ x’, d(x,x’)<δ  d(f(x),f(x’))<ε Known to be equivalent to f -1 ( (cl)open ) = (cl)open

“Equational Characterization” Main interest of the paper Many subclasses of regular languages are characterized by Equations on Profinite Strings

Example A regular language L is star-free (i.e., in { ∪, ∩, ￢, ・ }-closure of fin. langs) (or equivalently, FO-definable) if and only if x ω ≡ L x ω+1 ◦ i.e., ∀ u v x, ux ω v ∈ cl(L) ⇔ ux ω+1 v ∈ cl(L) Corollary: FO-definability is decidable Corollary: FO-definability is decidable

Example A regular language L is commutative if and only if xy ≡ L yx ◦ i.e., ∀ u v x, u xy v ∈ cl(L) ⇔ u yx v ∈ cl(L) Corollary: Commutativity is decidable Corollary: Commutativity is decidable

Example A regular language L is dense ( ∀ w, Σ* w Σ* ∩ L ≠ Φ) if and only if {xρ ≡ L ρx ≡ L ρ, x ≦ L ρ} ◦ where ρ = lim n  ∞ v n, v n+1 =(v n u n+1 v n ) (n+1)! u = {ε, a, b, aa, ab, ba, bb, aaa, …} ◦ i.e., ∀ u v x, … & uxv ∈ cl(L) ⇒ uρv ∈ cl(L)

Theorem [Reiterman 1982] If a family (set of languages) F of regular languages is closed under ◦ intersection, union, complement, ◦ quotient (q a (L) = {x | ax ∈ L}), and ◦ inverse of homomorphism if and only if It is defined by a set of profinite equations of the form: u ≡ v

Other Types of Equations [Pin & Gehrke & Grigorieff 2008]

Summary Completion by the Profinite metric d mA (x, y) = 2 -min_automaton(x,y) is used as a tool to characterize (subclasses of) regular languages