1 Regular Expressions
Highlights:
–A regular expression is used to specify a language, and it does so precisely.
–Regular expressions are very intuitive.
–Regular expressions are very useful in a variety of contexts.
–Given a regular expression, an NFA-ε can be constructed from it automatically.
–Thus, so can an NFA be constructed, and a DFA, and a corresponding program, all automatically!

2 Two Operations
Concatenation:
–x = 010
–y = 1101
–xy = 010 1101
Language Concatenation: L_1 L_2 = {xy | x is in L_1 and y is in L_2}
–L_1 = {01, 00}
–L_2 = {11, 010}
–L_1 L_2 = {01 11, 01 010, 00 11, 00 010}
Language Union:
–L_1 = {01, 00}
–L_2 = {01, 11, 010}
–L_1 ∪ L_2 = {01, 00, 11, 010}
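The two language operations above can be sketched in Python by treating a language as a finite set of strings. The function names `concat` and `union` are chosen here for illustration and do not come from the slides.

```python
def concat(l1, l2):
    """Language concatenation: every x in l1 followed by every y in l2."""
    return {x + y for x in l1 for y in l2}


def union(l1, l2):
    """Language union is ordinary set union."""
    return l1 | l2


# The example languages from the slide.
print(sorted(concat({"01", "00"}, {"11", "010"})))
print(sorted(union({"01", "00"}, {"01", "11", "010"})))
```

Note that the union has four elements, not five, because 01 occurs in both languages and a set keeps it once.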

3 Operations on Languages
Let L, L_1, L_2 be subsets of Σ*.
Concatenation: L_1 L_2 = {xy | x is in L_1 and y is in L_2}
Concatenating a language with itself:
–L^0 = {ε}
–L^i = L L^{i-1}, for all i ≥ 1

4 Kleene closure
Say L, or L^1, = {a, abc, ba}, on Σ = {a, b, c}.
Then L^2 = {aa, aabc, aba, abca, abcabc, abcba, baa, baabc, baba}
L^3 = {a, abc, ba} L^2, and so on.
But L^0 = {ε}.
Kleene closure of L: L* = {ε} ∪ L^1 ∪ L^2 ∪ L^3 ∪ …
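The powers L^i can be computed directly from the recursive definition L^0 = {ε}, L^i = L L^{i-1}. The following is a small Python sketch; the helper names `concat` and `power` are illustrative, and the empty Python string stands for ε.

```python
def concat(l1, l2):
    """Language concatenation on finite sets of strings."""
    return {x + y for x in l1 for y in l2}


def power(lang, i):
    """L^i, built as L^0 = {ε} and L^i = L L^(i-1)."""
    result = {""}  # L^0 contains only the empty string
    for _ in range(i):
        result = concat(lang, result)
    return result


L = {"a", "abc", "ba"}
print(sorted(power(L, 2)))  # the nine strings listed on the slide
```

Since L* is an infinite union whenever L contains a non-empty string, a program can only ever enumerate a finite part of it, for example all powers up to some i.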

5 Operations on Languages
Let L, L_1, L_2 be subsets of Σ*.
Concatenation: L_1 L_2 = {xy | x is in L_1 and y is in L_2}
Union: the set union of L_1 and L_2
Kleene Closure: L* = ∪_{i ≥ 0} L^i = L^0 ∪ L^1 ∪ L^2 ∪ …
Positive Closure: L+ = ∪_{i ≥ 1} L^i = L^1 ∪ L^2 ∪ …
Question: Does L+ contain ε?
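One way to explore the question is to compute both closures restricted to strings of length at most n, which keeps the sets finite. The sketch below is only an illustration; the function `closure_up_to` and its parameters are names invented here. It suggests the answer: L+ contains ε exactly when L itself contains ε.

```python
def closure_up_to(lang, n, positive=False):
    """Strings of length <= n in L* (or in L+ when positive=True)."""
    result = set() if positive else {""}  # L* starts from L^0 = {ε}; L+ does not
    layer = {""}                          # L^0
    while True:
        # Next power of L, keeping only strings that are short enough.
        layer = {x + y for x in layer for y in lang if len(x + y) <= n}
        if layer <= result:               # nothing new: later powers add nothing either
            return result
        result |= layer


L = {"a", "abc", "ba"}
print("" in closure_up_to(L, 4))                          # ε is always in L*
print("" in closure_up_to(L, 4, positive=True))           # ε is not in L, so not in L+
print("" in closure_up_to({"", "0"}, 4, positive=True))   # here ε is in L, so also in L+
```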

6 Definition of a Regular Expression
Let Σ be an alphabet. The regular expressions over Σ are:
–Ø represents the empty set { }
–ε represents the set {ε}
–a represents the set {a}, for any symbol a in Σ
Let r and s be regular expressions that represent the sets R and S, respectively.
–r+s represents the set R ∪ S (precedence 3, the lowest)
–rs represents the set RS (precedence 2)
–r* represents the set R* (highest precedence)
–(r) represents the set R (not an operator; rather, it provides precedence)
If r is a regular expression, then L(r) is used to denote the corresponding language.
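The definition is recursive, so L(r) can be computed by a recursive function that follows it case by case. The sketch below is an illustration under assumptions made here, not part of the slides: a regular expression is encoded as a nested tuple such as `("cat", r, s)`, the tag names are invented, and only strings of length at most n are kept so that the result stays finite.

```python
def lang(r, n):
    """Strings of length <= n in L(r), for r given as a nested tuple."""
    kind = r[0]
    if kind == "empty":                       # Ø
        return set()
    if kind == "eps":                         # ε
        return {""}
    if kind == "sym":                         # a single symbol a
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "union":                       # r + s
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                         # rs
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                        # r*
        base = lang(r[1], n)
        result = {""}
        frontier = {""}
        while frontier:                       # add one more factor until nothing new appears
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


# 01*: a 0 followed by any number of 1's, cut off at length 3.
print(sorted(lang(("cat", ("sym", "0"), ("star", ("sym", "1"))), 3)))
```

The tuple encoding makes precedence explicit, so parentheses are not needed as a separate case.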

7 Examples: Let Σ = {0, 1}
–(0 + 1)* : all strings of 0’s and 1’s
–01* : a 0 followed by any number of 1’s
–0(0 + 1)* : all strings of 0’s and 1’s beginning with a 0
–(0 + 1)*1 : all strings of 0’s and 1’s ending with a 1
–(0 + 1)*0(0 + 1)* : all strings of 0’s and 1’s containing at least one 0
–(0 + 1)*0(0 + 1)*0(0 + 1)* : all strings of 0’s and 1’s containing at least two 0’s
–(0 + 1)*01*01* : all strings of 0’s and 1’s containing at least two 0’s
–(1 + 01*0)* : all strings of 0’s and 1’s containing an even number of 0’s
–1*(01*01*)* : all strings of 0’s and 1’s containing an even number of 0’s
–(1*01*0)*1* : all strings of 0’s and 1’s containing an even number of 0’s
–(0+1)* = (0*1*)* : any string, i.e., Σ*
Σ = {0, 1} in all cases here.
Question: Is there a unique minimum regular expression for a given language?
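Several of the examples describe the same language with different expressions. A brute-force check over all short strings is one way to gain confidence in such claims. The sketch below uses Python's `re` module, where union is written `|` instead of `+`; the helper names (`to_python`, `matches`, `all_strings`, `agree`) are invented for this illustration, and a check up to a fixed length is evidence, not a proof.

```python
import re
from itertools import product


def to_python(regex):
    """Translate the slides' notation to Python `re` syntax ('+' becomes '|')."""
    return regex.replace(" ", "").replace("+", "|")


def matches(regex, s):
    return re.fullmatch(to_python(regex), s) is not None


def all_strings(max_len, alphabet="01"):
    """Every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for w in product(alphabet, repeat=n):
            yield "".join(w)


EVEN_ZEROS = ["(1 + 01*0)*", "1*(01*01*)*", "(1*01*0)*1*"]
TWO_ZEROS = ["(0 + 1)*0(0 + 1)*0(0 + 1)*", "(0 + 1)*01*01*"]


def agree(regexes, predicate, max_len=8):
    """True if every expression accepts exactly the strings satisfying predicate."""
    return all(matches(r, s) == predicate(s)
               for r in regexes for s in all_strings(max_len))


print(agree(EVEN_ZEROS, lambda s: s.count("0") % 2 == 0))
print(agree(TWO_ZEROS, lambda s: s.count("0") >= 2))
```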

8 Identities:
1. Øu = uØ = Ø (like multiplying by 0)
2. εu = uε = u (like multiplying by 1)
3. Ø* = ε, because L* = ∪_{i ≥ 0} L^i = L^0 ∪ L^1 ∪ L^2 ∪ … and L^0 = {ε} for every L
4. ε* = ε (both denote {ε})
5. u+v = v+u
6. u + Ø = u
7. u + u = u
8. u* = (u*)*
9. u(v+w) = uv+uw [which operation is hidden before parenthesis?]
10. (u+v)w = uw+vw
11. (uv)*u = u(vu)* [note: you have to have a single u, at start or end]
12. (u+v)* = (u*+v)* = u*(u+v)* = (u+vu*)* = (u*v*)* = u*(vu*)* = (u*v)*u*
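Identities 11 and 12 can be spot-checked for concrete u and v by comparing the sets of short strings each side accepts. This is a sketch using Python's `re` module, with u = 0 and v = 1 picked as sample sub-expressions; the helpers `group`, `language` and `all_equal` are names introduced here, and agreement up to a fixed length only illustrates the identities.

```python
import re
from itertools import product


def group(p):
    """Wrap a pattern in a non-capturing group so that * applies to all of it."""
    return f"(?:{p})"


def language(pattern, max_len=8, alphabet="01"):
    """All strings of length <= max_len that the pattern matches in full."""
    compiled = re.compile(pattern)
    words = ("".join(w) for n in range(max_len + 1)
             for w in product(alphabet, repeat=n))
    return {w for w in words if compiled.fullmatch(w)}


def all_equal(patterns, max_len=8):
    langs = [language(p, max_len) for p in patterns]
    return all(l == langs[0] for l in langs)


u, v = group("0"), group("1")  # sample choices for u and v

identity_11 = [f"(?:{u}{v})*{u}", f"{u}(?:{v}{u})*"]
identity_12 = [
    f"(?:{u}|{v})*",
    f"(?:{u}*|{v})*",
    f"{u}*(?:{u}|{v})*",
    f"(?:{u}|{v}{u}*)*",
    f"(?:{u}*{v}*)*",
    f"{u}*(?:{v}{u}*)*",
    f"(?:{u}*{v})*{u}*",
]

print(all_equal(identity_11), all_equal(identity_12))
```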

9 Equivalence of Regular Expressions and NFA-εs
Note: Throughout the following, keep in mind that a string is accepted by an NFA-ε if there exists ANY path labeled by that string from the start state to any final state.
Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M such that L(M) = L(r). Furthermore, M has exactly one final state with no transitions out of it.
Proof: (by induction on the number of operators, denoted by OP(r), in r).

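10 Basis: OP(r) = 0
Then r is either Ø, ε, or a, for some symbol a in Σ.
[Figures: for Ø, a start state q_0 and a final state q_f with no transitions; for ε, a single state q_f that is both start and final; for a, a transition labeled a from q_0 to q_f.]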

11 Inductive Hypothesis: Suppose there exists a k ≥ 0 such that for any regular expression r where 0 ≤ OP(r) ≤ k, there exists an NFA-ε M such that L(M) = L(r). Furthermore, suppose that M has exactly one final state.
Inductive Step: Let r be a regular expression with k + 1 operators (OP(r) = k + 1), where k + 1 ≥ 1.
Case 1) r = r_1 + r_2
Since OP(r) = k + 1, it follows that 0 ≤ OP(r_1), OP(r_2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M_1 and M_2 such that L(M_1) = L(r_1) and L(M_2) = L(r_2). Furthermore, both M_1 and M_2 have exactly one final state.
Construct M as:
[Figure: a new start state q_0 with ε-transitions to q_1 (start of M_1) and q_2 (start of M_2), and ε-transitions from f_1 (final state of M_1) and f_2 (final state of M_2) to a new final state q_f.]

12 Case 2) r = r_1 r_2
Since OP(r) = k + 1, it follows that 0 ≤ OP(r_1), OP(r_2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M_1 and M_2 such that L(M_1) = L(r_1) and L(M_2) = L(r_2). Furthermore, both M_1 and M_2 have exactly one final state.
Construct M as:
[Figure: M_1 (start q_1, final f_1) followed by M_2 (start q_2, final f_2), joined by an ε-transition from f_1 to q_2.]
Case 3) r = r_1*
Since OP(r) = k + 1, it follows that 0 ≤ OP(r_1) ≤ k. By the inductive hypothesis there exists an NFA-ε machine M_1 such that L(M_1) = L(r_1). Furthermore, M_1 has exactly one final state.
Construct M as:
[Figure: a new start state q_0 and a new final state q_f around M_1 (start q_1, final f_1), with four ε-transitions: q_0 to q_1, f_1 to q_f, q_0 to q_f, and f_1 back to q_1.]
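The basis and the three inductive cases translate almost line by line into a recursive construction. The following Python sketch is one possible rendering under assumptions made here: regular expressions are nested tuples with invented tag names, `None` stands for ε on a transition, and the ε-transitions for r_1* follow the usual textbook arrangement, which is assumed to match the figures. It is not the slides' own code.

```python
from collections import defaultdict


class NFA:
    """NFA-ε under construction; the symbol None stands for ε."""

    def __init__(self):
        self.trans = defaultdict(set)  # (state, symbol) -> set of next states
        self.count = 0

    def new_state(self):
        self.count += 1
        return self.count - 1

    def add(self, p, symbol, q):
        self.trans[(p, symbol)].add(q)


def build(nfa, r):
    """Return (start, final) of a sub-machine for r, one branch per proof case."""
    kind = r[0]
    if kind == "eps":                       # basis ε: one state, start and final
        q = nfa.new_state()
        return q, q
    if kind == "cat":                       # case 2: ε from f1 to q2
        q1, f1 = build(nfa, r[1])
        q2, f2 = build(nfa, r[2])
        nfa.add(f1, None, q2)
        return q1, f2
    q0, qf = nfa.new_state(), nfa.new_state()
    if kind == "sym":                       # basis a: q0 --a--> qf
        nfa.add(q0, r[1], qf)
    elif kind == "union":                   # case 1
        q1, f1 = build(nfa, r[1])
        q2, f2 = build(nfa, r[2])
        for src, dst in ((q0, q1), (q0, q2), (f1, qf), (f2, qf)):
            nfa.add(src, None, dst)
    elif kind == "star":                    # case 3
        q1, f1 = build(nfa, r[1])
        for src, dst in ((q0, q1), (f1, qf), (q0, qf), (f1, q1)):
            nfa.add(src, None, dst)
    elif kind != "empty":                   # basis Ø: q0 and qf, no transitions
        raise ValueError(f"unknown node {kind!r}")
    return q0, qf


def eps_closure(nfa, states):
    """All states reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in nfa.trans.get((p, None), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(r, w):
    """Build the machine for r and test whether some path spells w."""
    nfa = NFA()
    start, final = build(nfa, r)
    current = eps_closure(nfa, {start})
    for ch in w:
        moved = set()
        for p in current:
            moved |= nfa.trans.get((p, ch), set())
        current = eps_closure(nfa, moved)
    return final in current


# r = 0(0+1)*: all strings beginning with a 0.
R = ("cat", ("sym", "0"), ("star", ("union", ("sym", "0"), ("sym", "1"))))
print(accepts(R, "0110"), accepts(R, "10"))
```

Tracking the set of currently reachable states, as `accepts` does, is the same idea that underlies converting the NFA-ε to a DFA.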

13-18 Example: r = 0(0+1)*
r = r_1 r_2
r_1 = 0
r_2 = (0+1)*
r_2 = r_3*
r_3 = 0+1
r_3 = r_4 + r_5
r_4 = 0
r_5 = 1
[Figures, one construction step per slide:
13: the machine for r_5 = 1, with states q_0 and q_1 joined by a transition on 1.
14: adds the machine for r_4 = 0, with states q_2 and q_3 joined by a transition on 0.
15: adds states q_4 and q_5 and four ε-transitions, giving the machine for r_3 = 0+1.
16: adds states q_6 and q_f and further ε-transitions, giving the machine for r_2 = (0+1)*.
17: adds the machine for r_1 = 0, with states q_8 and q_9 joined by a transition on 0.
18: adds one more ε-transition joining the machine for r_1 to the machine for r_2, giving the machine for r.]

19 Definitions Required to Convert a DFA to a Regular Expression
Let M = (Q, Σ, δ, q_1, F) be a DFA with state set Q = {q_1, q_2, …, q_n}, and define:
R_{i,j} = { x | x is in Σ* and δ(q_i, x) = q_j }
R_{i,j} is the set of all strings that define a path in M from q_i to q_j.
Note that states have been numbered starting at 1, not 0!

20 Example:
R_{2,3} = {0, 001, 00101, 011, …}
R_{1,4} = {01, 00101, …}
R_{3,3} = {11, 100, …}
[Figure: DFA with states q_1 to q_5 over the alphabet {0, 1}.]

21 In words: R^k_{i,j} is the set of all strings that define a path in M from q_i to q_j that passes through no state numbered greater than k.
Definition: R^k_{i,j} = { x | x is in Σ*, δ(q_i, x) = q_j, and there is no u with 1 ≤ |u| < |x| and x = uv such that δ(q_i, u) = q_p where p > k }
Note that it may be true that i > k or j > k; only the intermediate states on the path from q_i to q_j may not be numbered greater than k.

22 Example:
R^4_{2,3} = {0, 1000, 011, …}
–111 is not in R^4_{2,3} because it goes via q_5
R^1_{2,3} = {0}
–111 is not in R^1_{2,3}
–101 is not in R^1_{2,3}
R^5_{2,3} = R_{2,3}, since any state may be on the path now
[Figure: the five-state DFA from slide 20.]

23 Observations:
1) R^n_{i,j} = R_{i,j}
2) R^{k-1}_{i,j} is a subset of R^k_{i,j}
3) L(M) = ∪_{q in F} R^n_{1,q} = ∪_{q in F} R_{1,q}
4) R^0_{i,j} is easily computed from the DFA!
5) R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}
Now you see the purpose of introducing k: so that we can write R_{i,j} as a regular expression.

24 Notes on 5:
5) R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}
Consider paths represented by the strings in R^k_{i,j}:
If x is a string in R^k_{i,j}, then no state numbered > k may be passed through when processing x, and either:
–q_k is not passed through, i.e., x is in R^{k-1}_{i,j}
–q_k is passed through one or more times, i.e., x is in R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j}
[Figure: a path from q_i to q_j.]

25 Lemma 2: Let M = (Q, Σ, δ, q_1, F) be a DFA. Then there exists a regular expression r such that L(M) = L(r).
Proof: First we will show (by induction on k) that for all i, j, and k, where 1 ≤ i, j ≤ n and 0 ≤ k ≤ n, there exists a regular expression r such that L(r) = R^k_{i,j}.
Basis: k = 0
R^0_{i,j} contains single symbols, one for each transition from q_i to q_j, and also ε if i = j.
case 1) No transitions from q_i to q_j and i != j
r^0_{i,j} = Ø
case 2) At least one (m ≥ 1) transition from q_i to q_j and i != j
r^0_{i,j} = a_1 + a_2 + a_3 + … + a_m where δ(q_i, a_p) = q_j, for all 1 ≤ p ≤ m

26 case 3) No transitions from q_i to q_j and i = j
r^0_{i,j} = ε
case 4) At least one (m ≥ 1) transition from q_i to q_j and i = j
r^0_{i,j} = a_1 + a_2 + a_3 + … + a_m + ε where δ(q_i, a_p) = q_j, for all 1 ≤ p ≤ m
Inductive Hypothesis: Suppose that R^{k-1}_{i,j} can be represented by the regular expression r^{k-1}_{i,j} for all 1 ≤ i, j ≤ n, and some k ≥ 1.
Inductive Step: Consider R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}. By the inductive hypothesis there exist regular expressions r^{k-1}_{i,k}, r^{k-1}_{k,k}, r^{k-1}_{k,j}, and r^{k-1}_{i,j} generating R^{k-1}_{i,k}, R^{k-1}_{k,k}, R^{k-1}_{k,j}, and R^{k-1}_{i,j}, respectively. Thus, if we let
r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j} + r^{k-1}_{i,j}
then r^k_{i,j} is a regular expression generating R^k_{i,j}, i.e., L(r^k_{i,j}) = R^k_{i,j}.

27 Finally, if F = {q_{j1}, q_{j2}, …, q_{jr}}, then
r^n_{1,j1} + r^n_{1,j2} + … + r^n_{1,jr}
is a regular expression generating L(M).
Note: not only does this prove that the regular expressions generate the regular languages, but it also provides an algorithm for computing such an expression!
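The proof of Lemma 2 can be turned into a short program that fills in r^k_{i,j} for k = 0, 1, …, n. The sketch below is one possible rendering, not the slides' own code. Its assumptions: the output is written in Python `re` syntax (`|` for +), an expression is stored as a pair `(core, opt)` meaning "L(core), plus ε when opt is true", states are numbered 1..n with q_1 as the start state, and all helper names are invented here. The identities Øu = Ø, εu = u and (ε + u)* = u* are applied along the way to keep the result smaller.

```python
import re
from itertools import product

EMPTY = (None, False)  # Ø
EPS = (None, True)     # ε


def render(e):
    """Pattern text for a non-Ø expression."""
    core, opt = e
    if core is None:
        return ""
    return f"(?:{core})?" if opt else core


def union(a, b):
    (ca, oa), (cb, ob) = a, b
    if ca is None:
        core = cb
    elif cb is None or ca == cb:
        core = ca
    else:
        core = f"(?:{ca}|{cb})"
    return (core, oa or ob)


def concat(a, b):
    if a == EMPTY or b == EMPTY:   # Øu = uØ = Ø
        return EMPTY
    if a == EPS:                   # εu = u
        return b
    if b == EPS:                   # uε = u
        return a
    return (render(a) + render(b), False)


def star(a):
    core, _ = a
    if core is None:               # Ø* = ε* = ε
        return EPS
    return (f"(?:{core})*", False)  # (ε + u)* = u*


def dfa_to_regex(n, delta, finals, alphabet):
    """Pattern for L(M), or None if L(M) is empty. delta maps (state, symbol) to state."""
    states = range(1, n + 1)
    r = {}
    for i in states:               # basis: r^0_{i,j}
        for j in states:
            e = EPS if i == j else EMPTY
            for a in alphabet:
                if delta[(i, a)] == j:
                    e = union(e, (re.escape(a), False))
            r[i, j] = e
    for k in states:               # inductive step: r^k from r^(k-1)
        r = {(i, j): union(concat(concat(r[i, k], star(r[k, k])), r[k, j]), r[i, j])
             for i in states for j in states}
    total = EMPTY
    for f in finals:               # r^n_{1,j1} + ... + r^n_{1,jr}
        total = union(total, r[1, f])
    return None if total == EMPTY else render(total)


# Hypothetical two-state DFA: state 1 = even number of 0's seen, state 2 = odd.
delta = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
pattern = dfa_to_regex(2, delta, [1], "01")
print(pattern)
print(all((re.fullmatch(pattern, "".join(w)) is not None) == ("".join(w).count("0") % 2 == 0)
          for n in range(7) for w in product("01", repeat=n)))
```

The expressions produced this way are correct but far from minimal, which is why the slides simplify each table entry by hand.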

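28 Example:
First table column is computed from the DFA.

             k = 0      k = 1      k = 2
r^k_{1,1}    ε
r^k_{1,2}    0
r^k_{1,3}    1
r^k_{2,1}    0
r^k_{2,2}    ε
r^k_{2,3}    1
r^k_{3,1}    Ø
r^k_{3,2}    0 + 1
r^k_{3,3}    ε

[Figure: DFA with states q_1, q_2, q_3 over {0, 1}; its transitions are the ones recorded in the k = 0 column.]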

29 All remaining columns are computed from the previous column using the formula.
r^1_{2,3} = r^0_{2,1} (r^0_{1,1})* r^0_{1,3} + r^0_{2,3} = 0 (ε)* 1 + 1 = 01 + 1

             k = 0      k = 1      k = 2
r^k_{1,1}    ε          ε
r^k_{1,2}    0          0
r^k_{1,3}    1          1
r^k_{2,1}    0          0
r^k_{2,2}    ε          ε + 00
r^k_{2,3}    1          1 + 01
r^k_{3,1}    Ø          Ø
r^k_{3,2}    0 + 1      0 + 1
r^k_{3,3}    ε          ε

30 r^2_{1,3} = r^1_{1,2} (r^1_{2,2})* r^1_{2,3} + r^1_{1,3} = 0 (ε + 00)* (1 + 01) + 1 = 0*1

             k = 0      k = 1      k = 2
r^k_{1,1}    ε          ε          (00)*
r^k_{1,2}    0          0          0(00)*
r^k_{1,3}    1          1          0*1
r^k_{2,1}    0          0          0(00)*
r^k_{2,2}    ε          ε + 00     (00)*
r^k_{2,3}    1          1 + 01     0*1
r^k_{3,1}    Ø          Ø          (0 + 1)(00)*0
r^k_{3,2}    0 + 1      0 + 1      (0 + 1)(00)*
r^k_{3,3}    ε          ε          ε + (0 + 1)0*1

31 To complete the regular expression, we compute: r^3_{1,2} + r^3_{1,3}
(both are obtained from the k = 2 column of the table on slide 30)
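Substituting the k = 2 column into the formula gives r^3_{1,2} = r^2_{1,3} (r^2_{3,3})* r^2_{3,2} + r^2_{1,2} and r^3_{1,3} = r^2_{1,3} (r^2_{3,3})* r^2_{3,3} + r^2_{1,3}. The Python sketch below writes these out in `re` syntax and compares them with a direct simulation of the DFA on all short strings. Several things here are assumptions of this sketch rather than statements from the slides: the transitions are read off the k = 0 column, the final states q_2 and q_3 are inferred from the sum being computed, and the ε inside r^2_{3,3} is dropped using (ε + u)* = u*.

```python
import re
from itertools import product

# Transitions read off the k = 0 column; final states inferred from r^3_{1,2} + r^3_{1,3}.
DELTA = {(1, "0"): 2, (1, "1"): 3,
         (2, "0"): 1, (2, "1"): 3,
         (3, "0"): 2, (3, "1"): 2}
FINALS = {2, 3}


def dfa_accepts(w):
    """Run the DFA from q_1 and report whether it ends in a final state."""
    state = 1
    for ch in w:
        state = DELTA[(state, ch)]
    return state in FINALS


LOOP = "(?:(?:0|1)0*1)*"                      # (r^2_{3,3})* with the ε dropped
R3_12 = f"0*1{LOOP}(?:0|1)(?:00)*|0(?:00)*"   # r^2_{1,3} (r^2_{3,3})* r^2_{3,2} + r^2_{1,2}
R3_13 = f"0*1{LOOP}"                          # r^2_{1,3} (r^2_{3,3})* r^2_{3,3} + r^2_{1,3}, simplified
PATTERN = re.compile(f"(?:{R3_12}|{R3_13})")


def regex_accepts(w):
    return PATTERN.fullmatch(w) is not None


def agree(max_len=8):
    """True if the expression and the DFA accept the same strings up to max_len."""
    return all(regex_accepts("".join(w)) == dfa_accepts("".join(w))
               for n in range(max_len + 1) for w in product("01", repeat=n))


print(agree())
```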

32 Theorem: Let L be a language. Then there exists a regular expression r such that L = L(r) if and only if there exists a DFA M such that L = L(M).
Proof:
(if) Suppose there exists a DFA M such that L = L(M). Then by Lemma 2 there exists a regular expression r such that L = L(r).
(only if) Suppose there exists a regular expression r such that L = L(r). Then by Lemma 1 there exists an NFA-ε M' such that L = L(M'), and M' can be converted to an equivalent DFA M, so L = L(M).
Corollary: The regular expressions define the regular languages.
Note: The conversion from a regular expression to a DFA and a program accepting L(r) is now complete, and fully automated!
