# FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY 15-453.

## Presentation on theme: "FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY 15-453."— Presentation transcript:

FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY 15-453

1 0 0 q1q1 q2q2 q3q3 q4q4 ε

Q is the set of states Σ is the alphabet  : Q  Σ ε → 2 Q is the transition function Q 0  Q is the set of start states F  Q is the set of accept states A non-deterministic finite automaton (NFA) is a 5-tuple N = (Q, Σ, , Q 0, F) 2 Q is the set of subsets of Q and Σ ε = Σ  {ε}

1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 }  Q ε 0 1ε q1q1 q2q2 q3q3 q4q4

1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 }  Q 0 1ε q1q1 {q 3 }  q2q2  {q 4 }  q3q3  {q2}{q2} q4q4  ε

1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 }  Q ε, 0 0 1ε q1q1 {q 3 }  q2q2  {q 4 }  q3q3 {q 2,q 4 }  {q2}{q2} q4q4  00  L(N)? 01  L(N)?

Let w  Σ* and suppose w can be written as w 1... w n where w i  Σ ε (ε is viewed as representing the empty string ) Then N accepts w if there are r 0, r 1,..., r n  Q such that 1.r 0  Q 0 2.r i+1   (r i, w i+1 ) for i = 0,..., n-1, and 3.r n  F A language L is recognized by an NFA N if L = L (N). L(N) = the language of machine N = set of all strings machine N accepts

1 0 0 q1q1 q2q2 q3q3 q4q4 N = (Q, Σ, , Q 0, F) Q = {q 1, q 2, q 3, q 4 } Σ = {0,1} Q 0 = {q 1, q 2 } F = {q 4 }  Q ε, 0 0 1ε q1q1 {q 3 }  q2q2  {q 4 }  q3q3 {q 2,q 4 }  {q2}{q2} q4q4  00  L(N)? 01  L(N)?

Q = 2 Q  : Q  Σ → Q  (R,  ) =  ε(  (r,  ) ) rRrR q 0 = ε(Q 0 ) F = { R  Q | f  R for some f  F } FROM NFA TO DFA Input: N = (Q, Σ, , Q 0, F) Output: M = (Q, Σ, , q 0, F) * For R  Q, the ε-closure of R, ε(R) = {q that can be reached from some r  R by traveling along zero or more ε arrows} *

a a, b a 2 3 1 b ε Given: NFA N = ( {1,2,3}, {a.b}, , {1}, {1} ) Construct: equivalent DFA M ε({1}) = {1,3} N

a a, b a 2 3 1 b ε Given: NFA N = ( {1,2,3}, {a,b}, , {1}, {1} ) Construct: equivalent DFA M ε({1}) = {1,3} N N = ( Q, Σ, , Q 0, F ) = (Q, Σ, , q 0, F)  ab  {1}  {2} {2,3} {3} {1,3}  {1,2} {2,3} {1,3} {2} {2,3} {1,2,3} {3} {1,2,3} {2,3} q 0 =

a a, b a 2 3 1 b ε Given: NFA N = ( {1,2,3}, {a,b}, , {1}, {1} ) Construct: equivalent DFA M ε({1}) = {1,3} N N = ( Q, Σ, , Q 0, F ) = (Q, Σ, , q 0, F)  ab  {1}  {2} {2,3} {3} {1,3}  {1,2} {2,3} {1,3} {2} {2,3} {1,2,3} {3} {1,2,3} {2,3} q 0 =

REGULAR LANGUAGES CLOSED UNDER STAR Let L be a regular language and M be a DFA for L We construct an NFA N that recognizes L* 0 0,1 0 0 1 1 1 ε ε ε

Formally: Input: M = (Q, Σ, , q 1, F) Output: N = (Q, Σ, , {q 0 }, F) Q = Q  {q 0 } F = F  {q 0 }  (q,a) = {  (q,a)} {q 1 }  if q  Q and a ≠ ε if q  F and a = ε if q = q 0 and a = ε if q = q 0 and a ≠ ε  else DFA NFA

Show: L (N) = L* 1.L (N)  L* 2. L (N)  L*

1. L (N)  L* Assume w = w 1 …w k is in L*, where w 1,…,w k  L We show N accepts w by induction on k Base Cases: k = 0 k = 1 Inductive Step: Assume N accepts all strings v = v 1 …v k  L*, v i  L and let u = u 1 …u k u k+1  L*, v j  L Since N accepts u 1 …u k (by induction) and M accepts u k+1, N must accept u (w = ε) (w  L)

Assume w is accepted by N, we show w  L* If w = ε, then w  L* If w ≠ ε accept ε ε  L* 2. L (N)  L* By induction

REGULAR LANGUAGES ARE CLOSED UNDER REGULAR OPERATIONS Union: A  B = { w | w  A or w  B } Intersection: A  B = { w | w  A and w  B } Negation:  A = { w  Σ* | w  A } Reverse: A R = { w 1 …w k | w k …w 1  A } Concatenation: A  B = { vw | v  A and w  B } Star: A* = { w 1 …w k | k ≥ 0 and each w i  A }

REGULAR EXPRESSIONS THURSDAY Jan 21

SOME LANGUAGES ARE NOT REGULAR B = {0 n 1 n | n ≥ 0} is NOT regular!

WHICH OF THESE ARE REGULAR C = { w | w has equal number of 1s and 0s} D = { w | w has equal number of occurrences of 01 and 10} NOT REGULAR REGULAR!!!

THE PUMPING LEMMA Let L be a regular language with |L| =  Then there exists a positive integer P such that 1. |y| > 0 2. |xy| ≤ P 3. xy i z  L for any i ≥ 0 if w  L and |w| ≥ P then w = xyz, where:

Let P be the number of states in M Assume w  L is such that |w| ≥ P q0q0 qiqi qjqj q |w| … There must be j > i such that q i = q j Let M be a DFA that recognizes L 1. |y| > 0 2. |xy| ≤ P 3. xy i z  L for any i ≥ 0 We show w = xyz x

USING THE PUMPING LEMMA Use the pumping lemma to prove that B = {0 n 1 n | n ≥ 0} is not regular Hint: Assume B is regular Let B = L(M), for DFA M, and let P be larger than the number of states in M Try pumping s = 0 P 1 P

USING THE PUMPING LEMMA Use the pumping lemma to prove that B = {0 n 1 n | n ≥ 0} is not regular Hint: Assume B is regular, and try pumping s = 0 P 1 P If B is regular, s can be split into s = xyz, where for any i ≥ 0, xy i z is also in B If y is all 0s: xyyz has more 0s than 1s If y is all 1s: xyyz has more 1s than 0s If y has both 1s and 0s: xyyz will have some 1s before some 0s

Use the pumping lemma to prove that C = { w | w has an equal number of 0s and 1s} is not regular Hint: Try pumping s = 0 P 1 P If C is regular, s can be split into s = xyz, where for any i ≥ 0, xy i z is also in C and |xy| ≤ P

WHAT DOES D LOOK LIKE? 1  0  ε  0(0  1)*0  1(0  1)*1 D = { w | w has equal number of occurrences of 01 and 10} = { w | w = 1, w = 0, w = ε or w starts with a 0 and ends with a 0 or w starts with a 1 and ends with a 1 }

REGULAR EXPRESSIONS  is a regexp representing {  } ε is a regexp representing {ε}  is a regexp representing  If R 1 and R 2 are regular expressions representing L 1 and L 2 then: (R 1 R 2 ) represents L 1  L 2 (R 1  R 2 ) represents L 1  L 2 (R 1 )* represents L 1 *

PRECEDENCE Tightest Star (“*”) Concatenation (“.”, “”) LoosestUnion (“  ”, “+”, “|”)

R2R2 R1*R1*( EXAMPLE R 1 *R 2  R 3 = ())  R3R3

{ w | w has exactly a single 1 } 0*10*

What language does  * represent? {ε}{ε}

{ w | w has length ≥ 3 and its 3rd symbol is 0 } (0  1)(0  1)0(0  1)* 000(0  1)*  010(0  1)*  100(0  1)*  110(0  1)*

{ w | every odd position of w is a 1 } (1(0  1))*(1  ε) 1((0  1)1)*(0  1  ε)  ε Also

EQUIVALENCE L can be represented by a regexp  L is regular

L can be represented by a regexp  L is regular EQUIVALENCE Given regular expression R, we show there exists NFA N such that R represents L(N) Induction on the length of R:

Given regular expression R, we show there exists NFA N such that R represents L(N) Induction on the length of R: L can be represented by a regexp  L is regular Base Cases (R has length 1): R =   R = ε R = 

Inductive Step: Assume R has length k > 1, and that every regular expression of length < k represents a regular language Three possibilities for R: R = R 1  R 2 R = R 1 R 2 R = (R 1 )* (Union Theorem!) (Concatenation) (Star) Therefore: L can be represented by a regexp  L is regular

L can be represented by a regexp  L is a regular language Have Shown

Give an NFA that accepts the language represented by (1(0  1))* 1ε1,0 ε

L can be represented by a regexp  L is a regular language  Proof idea: Transform an NFA for L into a regular expression by removing states and re-labeling arrows with regular expressions

NFA ε ε ε ε ε Add unique and distinct start and accept states While machine has more than 2 states: Pick an internal state, rip it out and re-label the arrows with regexps, to account for the missing state 0 1 0 01*0

NFA ε ε ε ε ε While machine has more than 2 states: In general: R(q 1,q 2 ) R(q 2,q 2 ) R(q 2,q 3 ) R(q 1,q 2 )R(q 2,q 2 )*R(q 2,q 3 )  R(q 1,q 3 ) q1q1 q2q2 q3q3 G R(q 1,q 3 )

q1q1 b a ε q2q2 a,b ε a*b (a*b)(a  b)* q0q0 q3q3 R(q 0,q 3 ) = (a*b)(a  b)* represents L(N)

Formally: Run CONVERT(G): (Outputs a regexp) If #states = 2 return the expression on the arrow going from q start to q accept Add q start and q accept to create G If #states > 2

Formally:Add q start and q accept to create G If #states > 2 select q rip  Q different from q start and q accept define Q = Q – {q rip } define R as: R(q i,q j ) = R(q i,q rip )R(q rip,q rip )*R(q rip,q j )  R(q i,q j ) return CONVERT(G) Run CONVERT(G): (Outputs a regexp) } Defines: G (GNFA) (R = the regexps for edges in G)

Claim: CONVERT(G) is equivalent to G Proof by induction on k (number of states in G) Base Case: k = 2 Inductive Step: Assume claim is true for k-1 state GNFAs We first note that G and G are equivalent But, by the induction hypothesis, G is equivalent to CONVERT(G) Thus: CONVERT(G) equivalent to CONVERT(G) QED

q3q3 q2q2 b a b q1q1 b a a ε ε ε bb

q2q2 b a b q1q1 a a ε ε ε a  ba b

bb  (a  ba)b*a a  ba q2q2 b q1q1 a ε ε bb b b  (a  ba)b* (bb  (a  ba)b*a)* (b  (a  ba)b*)

Convert the NFA to a regular expression q3q3 q2q2 b b q1q1 a a, b ε ε ε b (a  b)b*b bb*b (a  b)b*b(bb*b)* (a  b)b*b(bb*b)*a

((a  b)b*b(bb*b)*a)*  ((a  b)b*b(bb*b)*a)*(a  b)b*b(bb*b)*

ε

DFA NFA Regular Language Regular Expression DEFINITION

FLAC Read Chapter 2.1 of the book for next time www.cs.cmu.edu/~15453