# Finite State Machines for Strings over Infinite Alphabets F. Neven, T. Schwentick and V. Vianu Automata Seminar - Spring 2007 Tamar Aizikowitz ACM Transactions.


Finite State Machines for Strings over Infinite Alphabets. F. Neven, T. Schwentick and V. Vianu. Automata Seminar, Spring 2007, Tamar Aizikowitz. ACM Transactions on Computational Logic, Vol. V, No. N, 01/03.

2 of 47: Finite Machine for Infinite Alphabet?

- Finite automaton: transitions are based on the current state and the current input value, so $\delta$ is defined on $Q \times \Sigma$.
- Infinite alphabet $\Rightarrow$ infinite transition function?
- Solution: store a finite number of values.
  - Transitions are based on the stored values.
  - New values can be stored during the computation.

3 of 47: Register Automata

- Suggested by Kaminski and Francez, 1994.
- Finite automata plus a finite number of registers; the registers store values from the alphabet.
- Register operations:
  - compare a register value with the current value;
  - store the current value in a register.
- Transitions specify the change of state, whether the value is stored, and the movement of the head.

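4 of 47: Infinite Alphabets: Definitions

- $D$: an infinite set (e.g., a set of data values).
- $D$-string: $w = d_1 \cdots d_n$ with $d_i \in D$.
- $\mathrm{dom}(w) = \{1, \ldots, |w|\}$ and $\mathrm{val}_w(i) = d_i$ for $i \in \mathrm{dom}(w)$.
- $\triangleright, \triangleleft \notin D$ delimit the input string: 2-way automata work on $\triangleright w \triangleleft$, with $\mathrm{dom}^+(w) = \{0, \ldots, |w|+1\}$, where $\mathrm{val}_w(0) = \triangleright$ and $\mathrm{val}_w(|w|+1) = \triangleleft$.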

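5 of 47: Nondeterministic 2-Way $k$-Register Automata (2N-RA)

$A = \langle D, Q, q_0, \tau_0, \delta, F \rangle$, where:

- $D$ is the infinite alphabet;
- $Q$, $q_0$, $F$ are as usual;
- $\tau_0 : \{1, \ldots, k\} \to D \cup \{\triangleright, \triangleleft\}$ is the initial register assignment;
- $\delta$ is the transition function, with two types of transitions:
  - $(i, q) \to (p, d)$: the current value equals the value of register $i$;
  - $q \to (p, i, d)$: store the current value in register $i$;
- $d \in \{\mathrm{stay}, \mathrm{right}, \mathrm{left}\}$ is the movement direction of the head.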

6 of 47: Configurations

- Configuration: $\gamma = [j, q, \tau]$, with head position $j$, current state $q$ and register assignment $\tau$.
- Initial configuration: $\gamma_0 = [1, q_0, \tau_0]$.
- Accepting configuration: $\gamma_f = [j, q_f, \tau]$ with $q_f \in F$.

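7 of 47: Computations

$[j, q, \tau] \vdash [j', q', \tau']$ iff one of the following holds, reading the directions left, stay, right as $-1, 0, +1$:

1. $(i, q) \to (q', d) \in \delta$, $j' = j + d$, $\mathrm{val}_w(j) = \tau(i)$ and $\tau' = \tau$; or
2. $q \to (q', i, d) \in \delta$, $j' = j + d$ and $\tau' = \tau[i \leftarrow \mathrm{val}_w(j)]$ (register $i$ now holds the current value).

Note: a type 2 transition is relevant only if no type 1 transition applies (why?).

$w$ is accepted by $A$ iff there exists an accepting configuration $\gamma_f$ such that $\gamma_0 \vdash^* \gamma_f$.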
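The step relation can be explored mechanically. The Python sketch below decides acceptance by a breadth-first search over configurations $[j, q, \tau]$; because registers only ever hold values from $\tau_0$ or from the tape, there are finitely many configurations and the search terminates. The names `accepts`, `compare` and `store` are hypothetical. It assumes the reading given on this slide (a type 2 transition is used only when no type 1 transition applies), encodes left/stay/right as $-1/0/+1$, and takes the start position as a parameter, defaulting to 1 as on slide 6.

```python
from collections import deque

LEFT_END, RIGHT_END = "⊳", "⊲"
MOVE = {"left": -1, "stay": 0, "right": 1}


def accepts(word, k, q0, tau0, compare, store, final, start=1):
    """Decide acceptance of a 2N-RA by breadth-first search over configurations [j, q, tau].

    compare: dict mapping (i, q) -> list of (p, d)   # type 1: current value = register i
    store:   dict mapping q -> list of (p, i, d)     # type 2: store current value in register i
    Registers are numbered 1..k and tau0 is a tuple of length k.
    """
    tape = [LEFT_END, *word, RIGHT_END]          # positions dom+(w) = {0, ..., |w|+1}
    init = (start, q0, tuple(tau0))
    seen, queue = {init}, deque([init])
    while queue:
        j, q, tau = queue.popleft()
        if q in final:                            # accepting configuration reached
            return True
        val = tape[j]
        succ = []
        # Type 1: the current value equals the value in register i; registers unchanged.
        for i in range(1, k + 1):
            if tau[i - 1] == val:
                succ += [(j + MOVE[d], p, tau) for p, d in compare.get((i, q), [])]
        # Type 2: store the current value in register i (only if no type 1 move applies).
        if not succ:
            for p, i, d in store.get(q, []):
                succ.append((j + MOVE[d], p, tau[:i - 1] + (val,) + tau[i:]))
        for cfg in succ:
            if 0 <= cfg[0] < len(tape) and cfg not in seen:   # the head stays on the tape
                seen.add(cfg)
                queue.append(cfg)
    return False


# Usage: the automaton of Example 1 (slide 10), with tau0 = (#, #); every move goes right.
compare1 = {(2, "q1"): [("qf", "right")]}
store1 = {"q0": [("q0", 1, "right"), ("q1", 2, "right")], "q1": [("q1", 1, "right")]}
print(accepts([1, 3, 2, 3, 4], 2, "q0", ("#", "#"), compare1, store1, {"qf"}))  # True
```

Breadth-first search is just one convenient way to compute $\gamma_0 \vdash^* \gamma_f$; any graph reachability search would do.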

8 of 47: Variants

- Deterministic: at most one transition applies to each configuration.
- One-way: no left moves in the transition function.
- xC-RA: notation for the various models, where $x \in \{1, 2\}$ and $C \in \{D, N\}$.

9 of 47: Example 1: 1N-RA

$L_1 = \{d_1 \cdots d_n \mid \exists i, j : i \neq j \wedge d_i = d_j\}$ contains all words in which some value appears more than once.

Construction idea:

- Read the input string from left to right.
- "Guess" $i$ and store its value in a register.
- Look for the stored value in the remaining input.

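10 of 47: Example 1: Continued

$A = \langle D, \{q_0, q_1, q_f\}, q_0, \tau_0, \delta, \{q_f\} \rangle$, with register 1 as a "trash" register and register 2 for storing the repeating value. The automaton is one-way and every transition moves right, so the direction is omitted.

- $q_0$, look for $i$:
  - go right: $q_0 \to (1, q_0)$;
  - guess $i$, store its value and move to $q_1$: $q_0 \to (2, q_1)$.
- $q_1$, look for $j$:
  - go right: $q_1 \to (1, q_1)$;
  - if the stored value is found, move to $q_f$: $(2, q_1) \to q_f$.
- $q_f$: accepting configuration reached!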

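11 of 47: Example 1: Concluded

Example of a run on $w = 13234$, starting with both registers holding #:

| Position | Value read | Transition | Registers (1, 2) after | State after |
|---|---|---|---|---|
| 1 | 1 | $q_0 \to (1, q_0)$ | 1, # | $q_0$ |
| 2 | 3 | $q_0 \to (2, q_1)$ | 1, 3 | $q_1$ |
| 3 | 2 | $q_1 \to (1, q_1)$ | 2, 3 | $q_1$ |
| 4 | 3 | $(2, q_1) \to q_f$ | 2, 3 | $q_f$ |

$w$ accepted!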
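Because a one-way automaton only moves right, all configurations reachable after reading a prefix share the same head position. The sketch below exploits this: it keeps the set of live (state, registers) pairs and updates it value by value, in the spirit of the subset construction. The tables encode slide 10; the names `STORE`, `COMPARE` and `accepts` are hypothetical, the initial assignment (#, #) follows the run above, the delimiters are omitted (this automaton never needs them), and type 2 moves are used only when no type 1 move applies, as on slide 7.

```python
# Example 1 (slide 10): register 1 is the "trash" register, register 2 holds the guessed value.
STORE = {                       # type 2: q -> [(p, i)]: store current value in register i, go right
    "q0": [("q0", 1), ("q1", 2)],
    "q1": [("q1", 1)],
}
COMPARE = {                     # type 1: (i, q) -> [p]: current value equals register i, go right
    (2, "q1"): ["qf"],
}
FINAL = {"qf"}


def accepts(word, tau0=("#", "#")):
    """Run all branches of the one-way automaton at once by tracking live (state, registers) pairs."""
    live = {("q0", tuple(tau0))}
    for val in word:
        nxt = set()
        for q, tau in live:
            # Type 1 moves: the current value equals the content of register i.
            succ = {(p, tau)
                    for i, r in enumerate(tau, start=1) if r == val
                    for p in COMPARE.get((i, q), [])}
            # Type 2 moves are used only when no type 1 move applies.
            if not succ:
                succ = {(p, tau[:i - 1] + (val,) + tau[i:]) for p, i in STORE.get(q, [])}
            nxt |= succ
        live = nxt
        if any(q in FINAL for q, _ in live):
            return True
    return False


print(accepts([1, 3, 2, 3, 4]))   # True: one branch follows the run in the table above
```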

12 of 47: Example 2: 2N-RA

$L_2 = \{d_1 \cdots d_n \mid \forall i, j : i \neq j \to d_i \neq d_j\}$ contains all words with distinct values.

Construction idea: scan the symbols from left to right. For each symbol:

- store its value in a register;
- look for the stored value in the remaining input;
- if found, reject;
- else proceed to check the next symbol (how?).

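13 of 47: Example 2: Continued

$A = \langle D, \{q_0, q_1, q_2, q_3, q_{rej}, q_{acc}\}, q_0, \tau_0, \delta, \{q_{acc}\} \rangle$ with transitions:

- $(1, q_0) \to (q_1, \mathrm{right})$
- $q_1 \to (2, q_2, \mathrm{right})$
- $q_2 \to (1, q_2, \mathrm{right})$
- $(2, q_2) \to (q_{rej}, \mathrm{stay})$
- $(3, q_2) \to (q_3, \mathrm{left})$
- $q_3 \to (1, q_3, \mathrm{left})$
- $(2, q_3) \to (q_1, \mathrm{right})$
- $(3, q_1) \to (q_{acc}, \mathrm{stay})$

(Figure: the head moving between $\triangleright$, $d_i$, $d_j$ and $\triangleleft$.)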
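Since this automaton is deterministic, a plain loop suffices to run it. The sketch below is one reading of the transitions, under stated assumptions: register 1 starts as ⊳ and afterwards serves as trash, register 2 holds the value under test, register 3 holds ⊲, and the head starts on ⊳ (needed for $(1, q_0)$ to fire, which differs from the start position 1 on slide 6). The names `COMPARE`, `STORE` and `run` are hypothetical.

```python
LEFT_END, RIGHT_END = "⊳", "⊲"
MOVE = {"left": -1, "stay": 0, "right": 1}

# Example 2 (slide 13). Assumed registers: 1 = ⊳ then trash, 2 = value under test, 3 = ⊲.
COMPARE = {                          # type 1: (i, q) -> (p, d)
    (1, "q0"): ("q1", "right"),      # on ⊳: start checking d_1
    (2, "q2"): ("qrej", "stay"),     # the stored value occurs again: reject
    (3, "q2"): ("q3", "left"),       # reached ⊲: walk back
    (2, "q3"): ("q1", "right"),      # back at the stored value: check the next symbol
    (3, "q1"): ("qacc", "stay"),     # every symbol checked: accept
}
STORE = {                            # type 2: q -> (p, i, d)
    "q1": ("q2", 2, "right"),        # remember the symbol under test
    "q2": ("q2", 1, "right"),        # scan right, discarding values into register 1
    "q3": ("q3", 1, "left"),         # scan left, discarding values into register 1
}


def run(word, tau0=(LEFT_END, "#", RIGHT_END)):
    """Deterministic run: at most one transition applies, so no search is needed."""
    tape = [LEFT_END, *word, RIGHT_END]
    j, q, tau = 0, "q0", list(tau0)              # assumption: the head starts on ⊳
    while q not in ("qacc", "qrej"):
        val = tape[j]
        hits = [COMPARE[(i, q)] for i in (1, 2, 3)
                if tau[i - 1] == val and (i, q) in COMPARE]
        if hits:                                 # type 1 transition (unique for this automaton)
            q, d = hits[0]
        elif q in STORE:                         # type 2 transition
            q, i, d = STORE[q]
            tau[i - 1] = val
        else:                                    # no transition applies: the run is stuck
            return False
        j += MOVE[d]
    return q == "qacc"


print(run([1, 2, 3]), run([1, 2, 1]))   # True False
```

The walk back in $q_3$ stops at the first copy of the stored value seen from the right; since any later copy would already have caused a rejection, that copy is $d_i$ itself, which answers the "(how?)" of the previous slide.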

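14 of 47: Logic

Variants of first-order and monadic second-order logic over $D$-strings. A string $w$ is represented by a logical structure with:

- domain $\mathrm{dom}(w)$ with the natural ordering $<$;
- a value function $\mathrm{val} : \mathrm{dom}(w) \to D$, instantiated by $\mathrm{val}_w$.

Atomic formulae: $x = y$, $x < y$, $\mathrm{val}(x) = \mathrm{val}(y)$, and $\mathrm{val}(x) = d$ for $d \in D \cup \{\triangleright, \triangleleft\}$.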

15 of 47: FO* and MSO*

- The logic FO*: atomic formulae, Boolean connectives, and first-order quantification over $\mathrm{dom}^+(w)$.
- The logic MSO*: FO* plus quantification over unary predicates on $\mathrm{dom}^+(w)$.

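16 of 47: FO* and MSO* Definability

$L(\varphi) := \{w \in D^* \mid w \models \varphi\}$. For example:

- What $\varphi$ defines $L_1$? $\exists x \exists y\, (x \neq y \wedge \mathrm{val}(x) = \mathrm{val}(y))$
- What $\varphi$ defines $L_2$? $\forall x \forall y\, (x \neq y \to \mathrm{val}(x) \neq \mathrm{val}(y))$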
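Over a finite string, first-order quantifiers range over finitely many positions, so $\exists$ and $\forall$ become `any` and `all`. The sketch below (function names hypothetical) evaluates the two formulas directly, with positions ranging over $\mathrm{dom}^+(w)$ as in FO*; including ⊳ and ⊲ is harmless here because each occurs exactly once and neither is in $D$.

```python
LEFT_END, RIGHT_END = "⊳", "⊲"


def models_L1(word):
    """w ⊨ ∃x ∃y (x ≠ y ∧ val(x) = val(y)), with x, y ranging over dom+(w)."""
    val = [LEFT_END, *word, RIGHT_END]          # val[x] plays the role of val_w(x)
    pos = range(len(val))
    return any(x != y and val[x] == val[y] for x in pos for y in pos)


def models_L2(word):
    """w ⊨ ∀x ∀y (x ≠ y → val(x) ≠ val(y)), with x, y ranging over dom+(w)."""
    val = [LEFT_END, *word, RIGHT_END]
    pos = range(len(val))
    return all(x == y or val[x] != val[y] for x in pos for y in pos)


print(models_L1([1, 3, 2, 3, 4]), models_L2([1, 3, 2, 3, 4]))   # True False
```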

17 of 47: RAs vs. MSO*

Theorem 3.1: 2D-RA $\not\subseteq$ MSO*.

Proof: consider the language $L$ of strings $u\#v$ where the number of distinct symbols appearing in $u$ equals the number of distinct symbols appearing in $v$.

- Part 1: there exists a 2D-RA which accepts $L$.
- Part 2: $L$ is not MSO* definable.

$\Rightarrow$ 2D-RA $\not\subseteq$ MSO*.

18 of 47: Proof: Preliminaries

- $N_u$ / $N_v$: the set of distinct symbols in $u$ / $v$ $\Rightarrow$ $L = \{u\#v \mid |N_u| = |N_v|\}$.
- $\mathrm{lmo}_w(d)$: the leftmost occurrence of $d$ in $w$.
- $N_u = \{a_1, \ldots, a_n\}$ and $N_v = \{b_1, \ldots, b_m\}$ such that for every $i < j$, $\mathrm{lmo}_u(a_i) < \mathrm{lmo}_u(a_j)$ and $\mathrm{lmo}_v(b_i) < \mathrm{lmo}_v(b_j)$.

Note: $u\#v \in L$ iff $n = m$.
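As a reference point for the construction that follows, the definitions can be written out directly. The sketch (names `lmo`, `distinct_by_lmo`, `in_L` hypothetical) lists $N_w$ in order of leftmost occurrence and tests $n = m$; it uses Python's unrestricted memory, which is exactly what a register automaton lacks.

```python
def lmo(w, d):
    """Leftmost occurrence of value d in w, using 1-based positions as in dom(w)."""
    return w.index(d) + 1


def distinct_by_lmo(w):
    """N_w listed as a_1, ..., a_n with lmo_w(a_1) < ... < lmo_w(a_n)."""
    return sorted(set(w), key=lambda d: lmo(w, d))


def in_L(s):
    """u#v ∈ L iff n = m, i.e. u and v contain the same number of distinct values."""
    k = s.index("#")
    u, v = s[:k], s[k + 1:]
    return len(distinct_by_lmo(u)) == len(distinct_by_lmo(v))


print(distinct_by_lmo([5, 7, 5, 2]))        # [5, 7, 2]
print(in_L([5, 7, 5, 2, "#", 9, 9, 4, 1]))  # True: 3 distinct values on each side
```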

19 of 47: Proof: Part 1 ($L$ is 2D-RA)

Question: how can we build a 2D-RA for $L$?

Basic concept:

- Visit $\mathrm{lmo}_u(a_1), \mathrm{lmo}_v(b_1), \mathrm{lmo}_u(a_2), \mathrm{lmo}_v(b_2), \ldots$ in order.
- If $\mathrm{lmo}_u(a_n)$ and $\mathrm{lmo}_v(b_m)$ are reached simultaneously (in the same round) $\Rightarrow$ accept.
- Else $\Rightarrow$ reject.

How can we visit the lmo-s in order? Finding $\mathrm{lmo}_u(a_1)$ and $\mathrm{lmo}_v(b_1)$ is easy (how?).

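20 of 47: Proof: Part 1 Concluded

Assume $a_i$ is stored in a register. Compute $\mathrm{lmo}_u(a_{i+1})$ as follows:

1. Move the head to $\mathrm{lmo}_u(a_i)$: go left until $\triangleright$, then go right until $a_i$ (its leftmost occurrence).
2. For positions $\mathrm{lmo}_u(a_i) + j$ (starting with $j = 1$), test whether the position is $\mathrm{lmo}_u(a_{i+1})$:
   - store its value and move left;
   - if the value is encountered, check the next position (j++);
   - else, if $\triangleright$ is reached, then $\mathrm{lmo}_u(a_{i+1}) = \mathrm{lmo}_u(a_i) + j$.

Similarly for the $b_i$-s, with # playing the role of $\triangleright$ $\Rightarrow$ language accepted.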
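The procedure only moves the head and compares one stored value with the cells to its left. The sketch below mimics it with explicit (0-based) positions instead of a head and registers, which is an assumption made for readability; `next_lmo` and `accepts_L` are hypothetical names. Splitting the input at # means the left scan on $v$ stops at # rather than at ⊳.

```python
def next_lmo(w, p):
    """Given p = lmo(a_i) in w (0-based), return lmo(a_{i+1}), or None if a_i is the last new value.

    Mirrors the slide: try positions p+1, p+2, ...; for each candidate, store its value
    and walk left; if the value reappears it is not new, otherwise the left end is reached.
    """
    j = p + 1
    while j < len(w):
        stored = w[j]                 # "store value"
        k = j - 1
        while k >= 0 and w[k] != stored:
            k -= 1                    # "move left"
        if k < 0:                     # reached the left delimiter: w[j] is a new value
            return j
        j += 1                        # value seen before: check the next position
    return None


def accepts_L(s):
    """Alternate between u and v; accept iff both run out of new values in the same round."""
    sep = s.index("#")
    u, v = s[:sep], s[sep + 1:]
    pu = 0 if u else None             # lmo_u(a_1) is the first position of u
    pv = 0 if v else None             # lmo_v(b_1) is the first position after #
    while pu is not None and pv is not None:
        pu, pv = next_lmo(u, pu), next_lmo(v, pv)
    return pu is None and pv is None


print(accepts_L([1, 2, 1, "#", 3, 4]))   # True: two distinct values on each side
```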

21 of 47: Proof: Part 2 ($L$ not MSO*)

Assume by contradiction that $\varphi^*$ is an MSO* sentence such that $u\#v \models \varphi^*$ iff $|N_u| = |N_v|$. Let $C$ be the set of $D$-symbols appearing in $\varphi^*$. $w$ is admissible iff:

- $w$ is of the form $u\#v$;
- $w$ contains no symbols from $C$;
- $N_u \cap N_v = \emptyset$;
- each $D$-symbol occurs at most once in $u$ and at most once in $v$.

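22 of 47: Proof: Part 2 Continued

Let $\varphi$ be obtained from $\varphi^*$ by replacing:

- $\mathrm{val}(x) = \mathrm{val}(y)$ by $x = y$;
- $\mathrm{val}(x) = d$ by false if $d \neq \#$.

For every admissible string $w = d_1 \cdots d_n \# e_1 \cdots e_m$:

- $a^n \# a^m \models \varphi$
- $\Leftrightarrow d_1 \cdots d_n \# e_1 \cdots e_m \models \varphi$ (letters don't matter in $\varphi$)
- $\Leftrightarrow d_1 \cdots d_n \# e_1 \cdots e_m \models \varphi^*$ ($w$ has no letters from $C$, and equal values occur only at equal positions)
- $\Leftrightarrow n = m$ (because all letters are different, $|N_u| = n$ and $|N_v| = m$).

$\varphi$ is an MSO formula.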
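Admissibility and the value-forgetting step can be checked concretely. In the sketch below, `is_admissible` and `erase` are hypothetical names, $C$ is passed explicitly, and # is assumed not to belong to $C$. For an admissible string, $|N_u| = n$ and $|N_v| = m$, so membership in $L$ depends only on the erased shape $a^n\#a^m$.

```python
def is_admissible(w, C):
    """Admissible (slide 21): w = u#v, no symbol of C, N_u ∩ N_v = ∅, no repeats in u or in v."""
    if w.count("#") != 1:
        return False
    k = w.index("#")
    u, v = w[:k], w[k + 1:]
    return (not (set(u) | set(v)) & set(C)      # no symbol from C
            and not set(u) & set(v)             # N_u and N_v are disjoint
            and len(set(u)) == len(u)           # no value repeats inside u
            and len(set(v)) == len(v))          # no value repeats inside v


def erase(w):
    """Forget the data values: d_1...d_n#e_1...e_m becomes a^n#a^m."""
    return ["#" if x == "#" else "a" for x in w]


w = [10, 11, 12, "#", 20, 21, 22]
print(is_admissible(w, C={1, 2}), "".join(erase(w)))   # True aaa#aaa
```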

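23 of 47: Proof: Part 2 Concluded

- For every $n \in \mathbb{N}$, there exists an admissible string $d_1 \cdots d_n \# e_1 \cdots e_n$ (why?) $\Rightarrow$ for every $n \in \mathbb{N}$, $a^n \# a^n \models \varphi$.
- Note: $\varphi$ is in MSO (no value comparisons).
- Define a formula for the form $a^n \# a^m$: $\psi := \exists x\, (\mathrm{val}(x) = \# \wedge \forall y\, (\mathrm{val}(y) = a \vee (\mathrm{val}(y) = \# \wedge y = x)))$.
- $\Rightarrow$ $L' = \{a^n \# a^n \mid n \in \mathbb{N}\}$ is MSO definable by $\varphi \wedge \psi$ $\Rightarrow$ $L'$ is regular (MSO-definable string languages are regular), but $L'$ is not regular $\Rightarrow$ contradiction!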

24 of 47: 2N-RA vs. FO*

Theorem 3.7 (weak version): FO* $\not\subseteq$ 2N-RA.

Proof: define a language $L \subseteq D^*$ such that:

- Part 1: no 2N-RA can accept $L$.
- Part 2: $L$ is FO* definable.

$\Rightarrow$ FO* $\not\subseteq$ 2N-RA.

25 of 47: Proof: Part 1 ($L$ not 2N-RA)

Based on the communication complexity methodology:

- The input string is divided between two parties, I and II.
- The parties can send messages according to a pre-defined protocol.
- The string is accepted if both parties accept.
- Each party has unlimited computational power; the only restriction is on the form of the messages.

26 of 47: Proof: Part 1 Continued

- We consider strings of the form $u\#v$, where $u, v$ encode sets of subsets of $D$.
- $L = \{u\#v \mid u, v \text{ represent the same set of sets}\}$.
- Claim: $L$ cannot be accepted by a 2N-RA.
- Assume by contradiction that there exists a 2N-RA $A$ such that $L(A) = L$. We simulate $A$ by defining an appropriate protocol…

27 of 47: Proof: Part 1 Continued

Define the communication protocol as follows:

- I is given $u$, while II is given $v$.
- I simulates $A$ until $A$ tries to cross # to the right, then sends the configuration information to II.
- II simulates $A$ until $A$ tries to cross # to the left, then sends the configuration information to I.
- And so on, until one of the parties reaches an accepting configuration or gets stuck.

If $A$ exists, such a protocol will accept $L$.

28 of 47: Proof: Part 1 Continued

It remains to define an appropriate protocol…

- Restrict $u\#v$ to at most $N$ data values.
- Assume $A$ has $|Q|$ states and $k$ registers $\Rightarrow$ $M := |Q| N^k$ different messages are needed.
- Each message needs to be sent no more than once in each direction (why?).
- At most $M^{2M}$ different possible series of messages (dialogues) need to be considered.

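29 of 47: Proof: Part 1 Concluded

- $M^{2M} = 2^{2M \log_2 M}$ is only singly exponential in $N$, since $M$ is polynomial in $N$.
- The number of sets of sets of $N$ values is $2^{2^N}$.
- $\Rightarrow$ For large $N$, there exist $u, v$ such that:
  - $u\#u$ and $v\#v$ are accepted by the same dialogue;
  - $u, v$ represent different sets of sets.
- $\Rightarrow$ $u\#v$ is also accepted $\Rightarrow$ no such protocol can accept $L$ $\Rightarrow$ no 2N-RA can accept $L$.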
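Comparing exponents makes the counting concrete: $\log_2 M^{2M} = 2M \log_2 M$ with $M = |Q| N^k$ is polynomial (times a logarithm) in $N$, while $\log_2 2^{2^N} = 2^N$. The sketch below finds the smallest $N$ at which set families outnumber dialogues; the function names and the sample values $|Q| = 10$, $k = 3$ are illustrative assumptions, not taken from the slides.

```python
import math


def log2_dialogues(N, Q, k):
    """log2 of M^(2M), with M = |Q| * N^k possible messages."""
    M = Q * N ** k
    return 2 * M * math.log2(M)


def log2_set_families(N):
    """log2 of 2^(2^N), the number of sets of subsets of N values."""
    return 2 ** N


def crossover(Q, k):
    """Smallest N for which there are more set families than dialogues."""
    N = 1
    while log2_set_families(N) <= log2_dialogues(N, Q, k):
        N += 1
    return N


N = crossover(Q=10, k=3)
print(N, log2_set_families(N), round(log2_dialogues(N, 10, 3)))
```

Whatever $|Q|$ and $k$ are, the loop terminates, because $2^N$ eventually dominates any polynomial in $N$; this is the pigeonhole step that forces two different families onto one dialogue.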

30 of 47: Proof: Part 2 ($L$ is FO*)

We show that $L$ is FO* definable. First we define an encoding for $u, v$:

- Assume \$ is not in $D$.
- $u, v$ are of the form ${\$} d_{11} \cdots d_{n1} {\$} d_{12} \cdots d_{n2} {\$} \cdots {\$} d_{1m} \cdots d_{nm} {\$}$.
- Each $d_{1j} \cdots d_{nj}$ represents a subset of $D$-values.

Goal: define a formula verifying that every subset in $u$ appears in $v$ and vice versa.
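A reference implementation of the encoding, and of $L$ itself, gives something to test the FO* formulas against. In the sketch, strings are token lists, "$" is a token assumed not to be in $D$, and `encode`, `decode` and `in_L` are hypothetical names.

```python
def encode(blocks):
    """Encode a list of subsets as the token list $ d..d $ d..d $ ... $ (slide 30)."""
    tokens = ["$"]
    for block in blocks:
        tokens += list(block) + ["$"]
    return tokens


def decode(tokens):
    """Recover the set of sets represented by a $-separated token list."""
    assert tokens and tokens[0] == "$" and tokens[-1] == "$"
    family, current = set(), []
    for t in tokens[1:]:
        if t == "$":                   # a block ends: record it as a set
            family.add(frozenset(current))
            current = []
        else:
            current.append(t)
    return family


def in_L(w):
    """u#v ∈ L iff u and v represent the same set of sets."""
    k = w.index("#")
    return decode(w[:k]) == decode(w[k + 1:])


u = encode([[1, 2], [3]])
v = encode([[3], [2, 1], [1, 2]])
print(in_L(u + ["#"] + v))   # True: both represent {{1, 2}, {3}}
```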

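31 of 47: Proof: Part 2 Continued

We start with some smaller formulae…

- $w$ is of the form $u\#v$: $\mathrm{form} := \exists x\, (\mathrm{val}(x) = \# \wedge \forall y\, (\mathrm{val}(y) = \# \to y = x))$
- $x$ is in the interval $[y, z]$: $x \in [y, z] := y < x \wedge x < z$
- The interval $[y, z]$ represents a subset: $\mathrm{subs}(y, z) := \mathrm{val}(y) = {\$} \wedge \mathrm{val}(z) = {\$} \wedge y < z \wedge \forall x\, (x \in [y, z] \to \mathrm{val}(x) \neq \# \wedge \mathrm{val}(x) \neq {\$})$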

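32 of 47: Proof: Part 2 Continued

Some more…

- The subset $[y, z]$ is a subset of $[y', z']$: $[y, z] \subseteq [y', z'] := \forall x\, (x \in [y, z] \to \exists x'\, (x' \in [y', z'] \wedge \mathrm{val}(x) = \mathrm{val}(x')))$
- The subset $[y, z]$ equals the subset $[y', z']$: $[y, z] = [y', z'] := [y, z] \subseteq [y', z'] \wedge [y', z'] \subseteq [y, z]$
- The subset $[y, z]$ is in $u$: $[y, z] \in u := \mathrm{subs}(y, z) \wedge \forall x\, (\mathrm{val}(x) = \# \to z < x)$
- The subset $[y, z]$ is in $v$: $[y, z] \in v := \mathrm{subs}(y, z) \wedge \forall x\, (\mathrm{val}(x) = \# \to x < y)$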

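33 of 47: Proof: Part 2 Concluded

Two last formulae…

- Every subset in $u$ appears in $v$: $\mathrm{usubv} := \forall y \forall z\, ([y, z] \in u \to \exists y' \exists z'\, ([y', z'] \in v \wedge [y, z] = [y', z']))$
- $\mathrm{vsubu}$ is defined similarly.

And now to put it all together: $\varphi := \mathrm{form} \wedge \mathrm{usubv} \wedge \mathrm{vsubu}$. It follows that $w \models \varphi$ iff $w \in L$ $\Rightarrow$ $L$ is FO* definable.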
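Each formula above quantifies over positions only, so $\varphi$ can be evaluated by brute force, one helper per formula (form, subs, $\subseteq$, $\in u$, $\in v$, usubv/vsubu). In this sketch, `phi` and its helpers are hypothetical names, positions are 0-based, and the delimiters ⊳ and ⊲ are omitted because they cannot change the truth value of these formulas. The nested quantifiers make it roughly $O(n^6)$ in the worst case, so it is only meant for short strings.

```python
def phi(w):
    """Evaluate φ := form ∧ usubv ∧ vsubu on a token list w by brute force over positions."""
    P = range(len(w))                              # positions, 0-based here

    def inside(x, y, z):                           # x ∈ [y,z] := y < x ∧ x < z
        return y < x < z

    def subs(y, z):                                # [y,z] is a $-delimited block without # or $
        return (w[y] == "$" and w[z] == "$" and y < z
                and all(w[x] not in ("#", "$") for x in P if inside(x, y, z)))

    def subset(y, z, y2, z2):                      # [y,z] ⊆ [y2,z2]
        return all(any(inside(x2, y2, z2) and w[x] == w[x2] for x2 in P)
                   for x in P if inside(x, y, z))

    def in_u(y, z):                                # [y,z] ∈ u: the block lies left of #
        return subs(y, z) and all(z < x for x in P if w[x] == "#")

    def in_v(y, z):                                # [y,z] ∈ v: the block lies right of #
        return subs(y, z) and all(x < y for x in P if w[x] == "#")

    def covered(side, other):                      # usubv, or vsubu with the roles swapped
        return all(any(other(y2, z2) and subset(y, z, y2, z2) and subset(y2, z2, y, z)
                       for y2 in P for z2 in P)
                   for y in P for z in P if side(y, z))

    form = any(w[x] == "#" and all(y == x for y in P if w[y] == "#") for x in P)
    return form and covered(in_u, in_v) and covered(in_v, in_u)


print(phi(list("$12$3$#$3$21$")), phi(list("$12$#$1$")))   # True False
```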

34 of 47: Decision Problems

- Kaminski and Francez showed that emptiness for 1N-RAs is decidable.
- And what of universality?
- We will show that universality for 1N-RAs is undecidable by reduction from a known undecidable problem, PCP.

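35 of 47: Post Correspondence Problem

Introduced by Emil Post in 1946.

- Input: a sequence of pairs $(x_1, y_1), \ldots, (x_n, y_n)$ such that $x_i, y_i \in \{a, b\}^*$ for $i = 1, \ldots, n$.
- Solution: a non-empty sequence of indices $\alpha_1, \ldots, \alpha_m \in \{1, \ldots, n\}$ (repetitions allowed) such that $x_{\alpha_1} \cdots x_{\alpha_m} = y_{\alpha_1} \cdots y_{\alpha_m}$.
- Output: does the given input instance have a solution?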
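A bounded brute-force search illustrates the definition: enumerate index sequences by increasing length and compare the two concatenations. The sketch (`pcp_search` and `max_len` are hypothetical names) is only a semi-decision procedure: if a solution exists, a large enough bound finds it, but when none exists no bound can certify that, which is consistent with undecidability. The sample instance is the one from the PCP example slide (slide 36).

```python
from itertools import product


def pcp_search(pairs, max_len):
    """Return the first index sequence (1-based, length <= max_len) solving the instance, else None."""
    n = len(pairs)
    for m in range(1, max_len + 1):                        # solutions must be non-empty
        for seq in product(range(1, n + 1), repeat=m):     # indices may repeat
            x = "".join(pairs[i - 1][0] for i in seq)
            y = "".join(pairs[i - 1][1] for i in seq)
            if x == y:
                return seq
    return None


pairs = [("a", "aa"), ("ba", "ab"), ("aa", "b"), ("bb", "b")]   # instance of slide 36
print(pcp_search(pairs, 5))
```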

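36 of 47: PCP Example

Input:

| Index | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| $x$ values | a | ba | aa | bb |
| $y$ values | aa | ab | b | b |

Solution: 1, 1, 3, 2, 4, since $x_1 x_1 x_3 x_2 x_4 = a \cdot a \cdot aa \cdot ba \cdot bb = aaaababb$ and $y_1 y_1 y_3 y_2 y_4 = aa \cdot aa \cdot b \cdot ab \cdot b = aaaababb$.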

37 of 47: PCP Undecidability

PCP is known to be undecidable. Proof sketch: reduction from $L_u$.

- Given a Turing machine $M$ and a word $w$, define a PCP instance $P$ based on $M$ and $w$ such that $P$ has a solution iff $M$ accepts $w$.
- A solution for $P$ encodes a run of $M$ on $w$:
  - the $x$-series is always "one step ahead" of the $y$-series;
  - the $y$-series can "catch up" only if the computation in the $x$-series reaches an accepting state.

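38 of 47: PCP Undecidability Continued

Add instance pairs of the following forms (written as dominoes $\left[\frac{\text{top}}{\text{bottom}}\right]$):

- Start computation: $\left[\frac{\#}{\# q_0 w \#}\right]$
- Encode transitions: $\left[\frac{q_i a}{b q_j}\right]$ for a right move $\delta(q_i, a) = (q_j, b, R)$, and $\left[\frac{c q_i a}{q_j c b}\right]$ for a left move $\delta(q_i, a) = (q_j, b, L)$ and every tape symbol $c$
- Copy symbols: $\left[\frac{a}{a}\right]$, $\left[\frac{\#}{\#}\right]$
- $q_{acc}$ "eats" symbols: $\left[\frac{a q_{acc}}{q_{acc}}\right]$, $\left[\frac{q_{acc} a}{q_{acc}}\right]$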

39 of 47: Undecidability of Universality

Theorem 5.1: It is undecidable whether a given 1N-RA is universal.

Proof: for a given PCP instance $P$, construct a 1N-RA $A$ such that $A$ accepts an input string iff it does not represent a solution for $P$.

- $\Rightarrow$ $P$ has no solution iff $A$ is universal.
- $\Rightarrow$ Decidability of universality would imply decidability of PCP.
- $\Rightarrow$ Universality of 1N-RAs is undecidable.

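40 of 47: PCP Encoding

Assume w.l.o.g. that $\mathrm{Sym} = \{1, \ldots, n, a, b, \#, {\$}\} \subseteq D$.

Candidate: a string $u\#v$ such that:

- $u$ encodes $x_{\alpha_1}, \ldots, x_{\alpha_m}$;
- $v$ encodes $y_{\beta_1}, \ldots, y_{\beta_l}$.

The candidate is a solution if:

- $l = m$ and $\alpha_i = \beta_i$ (matching pairs);
- $x_{\alpha_1} \cdots x_{\alpha_m} = y_{\alpha_1} \cdots y_{\alpha_m}$.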

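41 of 47: PCP Encoding Continued

Encoding of $x_{\alpha_j}$: ${\$}\, \gamma\, \alpha_j\, \delta_1 a_1 \cdots \delta_k a_k$, where:

- \$ acts as a separator;
- $\gamma$ represents $j$ by a unique value;
- $\alpha_j \in \{1, \ldots, n\}$;
- the $\delta_i$ encode positions in the concatenated word;
- $\gamma$ and $\delta$ values appear only once in $u$ / $v$;
- $x_{\alpha_j} = a_1 \cdots a_k$.

$y_{\beta_j}$ is encoded similarly.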

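42 of 47: PCP Encoding Example

Instance: $x = (a, ba, aa, bb)$ and $y = (aa, ab, b, b)$ for indices 1 to 4; solution 1, 1, 3, 2, 4 (both concatenations equal $aaaababb$).

| $\gamma$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $\alpha$ | 1 | 1 | 3 | 2 | 4 |

$u$ = `$111a$212a$333a4a$425b6a$547b8b`

$v$ = `$111a2a$213a4a$335b$426a7b$548b`

The candidate is $u\#v$.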
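The encoding can be generated mechanically. The sketch below (`encode_candidate` is a hypothetical name) uses $\gamma = 1, \ldots, m$ for the slots and a running counter $\delta = 1, 2, \ldots$ over the letters of each side, as the example does; in general any distinct data values would serve. It reproduces the example candidate.

```python
def encode_candidate(pairs, seq):
    """Encode the candidate for index sequence seq as a token list u + ['#'] + v.

    Each block is: '$', gamma (slot number j), alpha_j (pair index), then (delta, letter)
    for each letter, where delta counts letter positions across the whole concatenation.
    """
    def encode_side(words):
        tokens, delta = [], 0
        for gamma, (alpha, word) in enumerate(zip(seq, words), start=1):
            tokens += ["$", str(gamma), str(alpha)]
            for letter in word:
                delta += 1
                tokens += [str(delta), letter]
        return tokens

    u = encode_side([pairs[a - 1][0] for a in seq])   # x_{alpha_1}, ..., x_{alpha_m}
    v = encode_side([pairs[a - 1][1] for a in seq])   # y_{alpha_1}, ..., y_{alpha_m}
    return u + ["#"] + v


pairs = [("a", "aa"), ("ba", "ab"), ("aa", "b"), ("bb", "b")]
print("".join(encode_candidate(pairs, (1, 1, 3, 2, 4))))
# $111a$212a$333a4a$425b6a$547b8b#$111a2a$213a4a$335b$426a7b$548b
```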

43 of 47: PCP Encoding Continued

$u\#v$ is syntactically correct if:

- the $\gamma$-projection of $u$ equals the $\gamma$-projection of $v$;
- the $\delta$-projection of $u$ equals the $\delta$-projection of $v$.

$u\#v$ represents a solution if:

- $u\#v$ is syntactically correct;
- for each $\gamma$, the number to the right of $\gamma$ is the same in $u$ and in $v$;
- for each $\delta$, the symbol to the right of $\delta$ is the same in $u$ and in $v$.
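The two conditions translate into a short checker. The sketch below parses each side into blocks and compares projections and the symbols to the right of each $\gamma$ and $\delta$; `blocks`, `gamma_projection`, `delta_projection` and `is_solution` are hypothetical names, and the parser assumes the input is already well formed (the automaton $A$ handles malformed input through separate guesses).

```python
def blocks(side):
    """Split one side into blocks (gamma, alpha, [(delta, letter), ...]); assumes well-formed input."""
    out, i = [], 0
    while i < len(side):
        assert side[i] == "$", "each block starts with $"
        gamma, alpha, i = side[i + 1], side[i + 2], i + 3
        letters = []
        while i < len(side) and side[i] != "$":
            letters.append((side[i], side[i + 1]))
            i += 2
        out.append((gamma, alpha, letters))
    return out


def gamma_projection(bs):
    return [g for g, _, _ in bs]


def delta_projection(bs):
    return [d for _, _, letters in bs for d, _ in letters]


def is_solution(tokens):
    """Check the conditions of slide 43 on a token list u + ['#'] + v."""
    k = tokens.index("#")
    bu, bv = blocks(tokens[:k]), blocks(tokens[k + 1:])
    syntactic = (gamma_projection(bu) == gamma_projection(bv)
                 and delta_projection(bu) == delta_projection(bv))
    index_u = {g: a for g, a, _ in bu}                    # number to the right of each gamma
    index_v = {g: a for g, a, _ in bv}
    letter_u = {d: x for _, _, ls in bu for d, x in ls}   # symbol to the right of each delta
    letter_v = {d: x for _, _, ls in bv for d, x in ls}
    return syntactic and index_u == index_v and letter_u == letter_v


example = list("$111a$212a$333a4a$425b6a$547b8b#$111a2a$213a4a$335b$426a7b$548b")
print(is_solution(example))   # True
```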

44 of 47: Construction of A

Assume the values of Sym are stored in the initial register assignment. $A$ works as follows:

- "Guesses" why $w$ is not a valid solution.
- Checks whether $w$ meets the chosen criterion.
- If yes, accepts; else rejects.

$w$ has an accepting computation $\Leftrightarrow$ $w$ meets some criterion for being "wrong" $\Leftrightarrow$ $w$ is not a solution for the PCP instance.

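45 of 47: When is w "wrong"

- $w$ is of the wrong form:
  - $w$ is not of the form $u\#v$;
  - $u$ or $v \notin ({\$}\gamma\alpha\delta\cdots)^*$;
  - a block with index $i$ spells a word other than $x_i$ in $u$ (or other than $y_i$ in $v$).
- $\gamma$-projections are wrong:
  - the first / last $\gamma$ in $u$ differs from the first / last $\gamma$ in $v$;
  - two $\gamma$-s are the same in $u$ / $v$;
  - $\gamma_1$ and $\gamma_2$ are successors in $u$ but not in $v$.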

46 of 47: When is w "wrong" Concluded

- $\delta$-projections are wrong: similar to $\gamma$-projections.
- $w$ does not represent a solution:
  - the $\alpha$-value for some $\gamma$ in $u$ is different from the corresponding $\beta$-value in $v$;
  - the $a$/$b$-value for some $\delta$ in $u$ is different from the corresponding $a$/$b$-value in $v$.

47 of 47: Equivalence and Inclusion

Corollary 5.2: Equivalence of 1N-RAs is undecidable.

Proof: assume equivalence were decidable. Build an automaton $A_{D^*}$ that accepts every possible input word $\Rightarrow$ universality is decidable by checking equivalence to $A_{D^*}$ $\Rightarrow$ contradiction!

Corollary: inclusion is also undecidable.

