Chapter 2 Scanning: From Regular Expression to DFA
Gang S.Liu
College of Computer Science & Technology, Harbin Engineering University
From Regular Expression to DFA
Regular expression → NFA → DFA → Program
From a Regular Expression to NFA
The construction we will describe is known as Thompson's construction. It uses ε-transitions to "glue together" the machines for the pieces of a regular expression.
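Basic Regular Expressions
For a single character a, the NFA has a start state and an accepting state joined by one transition on a. For ε, the two states are joined by an ε-transition. For Φ (the empty language), no string may be accepted: the start state and the accepting state have no transition between them.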
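Concatenation
Given an NFA for r and an NFA for s, add an ε-transition from the accepting state of r's machine to the start state of s's machine. The start state of r's machine becomes the start state, and the accepting state of s's machine becomes the only accepting state. Clearly, this machine accepts L(rs) = L(r)L(s) and corresponds to the regular expression rs.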
Choice among Alternatives
We add a new start state with ε-transitions to the start states of the machines for r and s, and a new accepting state reached by ε-transitions from their accepting states. This machine accepts L(r | s) = L(r) ∪ L(s).
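Repetition
We add a new start state and a new accepting state, with four ε-transitions: from the new start state into r's machine, from r's accepting state back to r's start state (to repeat r), from r's accepting state to the new accepting state, and from the new start state directly to the new accepting state (to accept the empty string). This machine corresponds to r*.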
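Example 2.12
Translate the regular expression ab|a into an NFA. Build machines for a and b and join them with an ε-transition to obtain ab; build a second machine for a; then combine the two with the choice construction, adding a new start state, a new accepting state, and four ε-transitions.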
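Example 2.13
letter(letter|digit)*. Build machines for letter and digit and combine them by choice to obtain letter|digit; apply repetition to obtain (letter|digit)*; finally, concatenate a machine for letter in front of it with an ε-transition.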
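The constructions above can be sketched in Python as follows. This is a minimal sketch under assumptions not taken from the slides: states are integers numbered in creation order, a fragment records its single start and accepting state plus a list of (from, label, to) edges, and the label None stands for ε. The names NFA, Builder, and states are hypothetical.

```python
from dataclasses import dataclass, field

EPS = None  # assumption of this sketch: the label None marks an ε-transition


@dataclass
class NFA:
    """Fragment with exactly one start state and one accepting state."""
    start: int
    accept: int
    edges: list = field(default_factory=list)  # (from_state, label, to_state)


class Builder:
    """Hypothetical helper that numbers states 1, 2, 3, ... as they are created."""

    def __init__(self):
        self.count = 0

    def new_state(self):
        self.count += 1
        return self.count

    def symbol(self, c):
        # Basic expression a: start --a--> accept
        s, f = self.new_state(), self.new_state()
        return NFA(s, f, [(s, c, f)])

    def epsilon(self):
        # Basic expression ε: start --ε--> accept
        s, f = self.new_state(), self.new_state()
        return NFA(s, f, [(s, EPS, f)])

    def empty(self):
        # Φ: no path from start to accept, so no string is accepted
        s, f = self.new_state(), self.new_state()
        return NFA(s, f, [])

    def concat(self, r, s):
        # rs: glue r's accepting state to s's start state with ε
        return NFA(r.start, s.accept, r.edges + s.edges + [(r.accept, EPS, s.start)])

    def alt(self, r, s):
        # r|s: new start and accepting states plus four ε-transitions
        st, f = self.new_state(), self.new_state()
        return NFA(st, f, r.edges + s.edges + [
            (st, EPS, r.start), (st, EPS, s.start),
            (r.accept, EPS, f), (s.accept, EPS, f)])

    def star(self, r):
        # r*: enter r, loop back to repeat r, leave r, or skip r entirely
        st, f = self.new_state(), self.new_state()
        return NFA(st, f, r.edges + [
            (st, EPS, r.start), (r.accept, EPS, r.start),
            (r.accept, EPS, f), (st, EPS, f)])


def states(nfa):
    """All states mentioned by the fragment."""
    return {nfa.start, nfa.accept} | {x for p, _, q in nfa.edges for x in (p, q)}


# Example 2.12: ab|a
b = Builder()
ab_or_a = b.alt(b.concat(b.symbol("a"), b.symbol("b")), b.symbol("a"))
print(len(states(ab_or_a)))  # 8

# Example 2.13: letter(letter|digit)*, treating "letter" and "digit" as single symbols
b = Builder()
ident = b.concat(b.symbol("letter"), b.star(b.alt(b.symbol("letter"), b.symbol("digit"))))
print(len(states(ident)))  # 10
```

Every step adds at most two states and a fixed number of edges, so the NFA grows linearly with the size of the regular expression. The counts of 8 and 10 states agree with the state numbering used for these machines in Examples 2.16 and 2.17.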
From NFA to DFA
To turn an NFA into a DFA we must eliminate ε-transitions and multiple transitions from a state on a single input character. Eliminating ε-transitions involves constructing ε-closures. Eliminating multiple transitions involves keeping track of sets of states instead of single states.
ε-Closure
The ε-closure of a single state s is the set of states reachable from s by zero or more ε-transitions. We denote this set by $\overline{s}$. Because zero transitions are allowed, the ε-closure of a state always contains the state itself.
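Example 2.14
The NFA for a* has states 1 to 4, with start state 1 and accepting state 4, ε-transitions from 1 to 2, from 1 to 4, from 3 to 2, and from 3 to 4, and a transition on a from 2 to 3. Its ε-closures are $\overline{1} = \{1, 2, 4\}$, $\overline{2} = \{2\}$, $\overline{3} = \{2, 3, 4\}$, and $\overline{4} = \{4\}$.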
ε-Closure of a Set of States
The ε-closure of a set of states is defined as the union of the ε-closures of the individual states. If $S = \{s_1, s_2, \ldots, s_n\}$ is a set of states, then $\overline{S} = \overline{s_1} \cup \overline{s_2} \cup \cdots \cup \overline{s_n}$. In the previous example we had $\overline{1} = \{1, 2, 4\}$ and $\overline{3} = \{2, 3, 4\}$. Let $S = \{1, 3\}$; then $\overline{S} = \overline{1} \cup \overline{3} = \{1, 2, 4\} \cup \{2, 3, 4\} = \{1, 2, 3, 4\}$.
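The ε-closure can be computed by a depth-first search that follows only ε-transitions. The sketch below is a minimal illustration, assuming the NFA is stored as (from, label, to) triples with None as the ε label (a representation chosen for this example, not taken from the slides); it reproduces the closures of Example 2.14.

```python
EPS = None  # assumption of this sketch: the label None marks an ε-transition

# NFA for a* from Example 2.14, as (from_state, label, to_state) triples
A_STAR = [(1, EPS, 2), (1, EPS, 4), (2, "a", 3), (3, EPS, 2), (3, EPS, 4)]


def eps_closure(states, edges):
    """States reachable from any state in `states` by zero or more ε-transitions."""
    closure = set(states)  # zero transitions: every state is in its own closure
    stack = list(states)
    while stack:
        s = stack.pop()
        for p, label, q in edges:
            if p == s and label is EPS and q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


for s in (1, 2, 3, 4):
    print(s, sorted(eps_closure({s}, A_STAR)))
# Closure of the set {1, 3}
print(sorted(eps_closure({1, 3}, A_STAR)))  # [1, 2, 3, 4]
```

Starting the search from every member of S at once gives the same result as taking the union of the single-state closures, because a state is reachable from the set exactly when it is reachable from one of its members.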
The Subset Construction
Given an NFA M, we construct a corresponding DFA M'.
1. Compute the ε-closure of the start state of M; this becomes the start state of M'.
2. For this set, and for each subsequent set, compute the transitions on each character a as follows:
   a. Given a set of states S and a character a, compute $S'_a = \{t \mid \text{for some } s \in S \text{ there is a transition from } s \text{ to } t \text{ on } a\}$.
   b. Compute $\overline{S'_a}$, the ε-closure of $S'_a$. This becomes a new state (if it is not one already), and there is a transition from S to $\overline{S'_a}$ on the character a.
3. Continue this process until no new states or transitions are created.
4. Mark as accepting those constructed states that contain an accepting state of M.
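The steps above can be sketched in Python with a worklist of DFA states still to be processed. This is a minimal sketch, assuming the NFA is a list of (from, label, to) triples with None as the ε label; the function names are hypothetical, and a missing DFA transition is simply left out (it implicitly goes to an error state).

```python
EPS = None  # assumption of this sketch: the label None marks an ε-transition


def eps_closure(states, edges):
    """ε-closure of a set of states, as a frozenset so it can serve as a DFA state."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for p, label, q in edges:
            if p == s and label is EPS and q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def move(S, a, edges):
    """S'_a: the states t reached from some s in S by a transition on a."""
    return {q for p, label, q in edges if p in S and label == a}


def subset_construction(start, accepting, edges):
    """Return (start, states, transitions, accepting) of the constructed DFA."""
    alphabet = sorted({label for _, label, _ in edges if label is not EPS})
    d_start = eps_closure({start}, edges)          # closure of M's start state
    d_states, d_trans, work = {d_start}, {}, [d_start]
    while work:                                    # until no new states appear
        S = work.pop()
        for a in alphabet:
            moved = move(S, a, edges)
            if not moved:
                continue                           # no transition on a
            U = eps_closure(moved, edges)
            d_trans[(S, a)] = U
            if U not in d_states:                  # a new DFA state
                d_states.add(U)
                work.append(U)
    # Accepting DFA states are those containing an accepting NFA state
    d_accepting = {S for S in d_states if S & accepting}
    return d_start, d_states, d_trans, d_accepting


# NFA for a* (Example 2.14): start 1, accepting 4
a_star = [(1, EPS, 2), (1, EPS, 4), (2, "a", 3), (3, EPS, 2), (3, EPS, 4)]
start, dstates, trans, acc = subset_construction(1, {4}, a_star)
print(sorted(start), sorted(trans[(start, "a")]))  # [1, 2, 4] [2, 3, 4]
```

Using frozensets as DFA states makes "is this a new state?" a plain set-membership test. The result matches Example 2.15, which follows: two DFA states, both accepting, with {2, 3, 4} looping to itself on a.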
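Example 2.15
Apply the subset construction to the NFA for a* from Example 2.14. The start state is $\overline{1} = \{1, 2, 4\}$. From it, $S'_a = \{3\}$ and $\overline{\{3\}} = \{2, 3, 4\}$: a new state and a new transition on a. From $\{2, 3, 4\}$, again $S'_a = \{3\}$ and $\overline{\{3\}} = \{2, 3, 4\}$: no new state, but a new transition (a loop on a). Both DFA states contain the accepting state 4 of the NFA, so both are accepting.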
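Example 2.16
Apply the subset construction to the 8-state NFA for ab|a (accepting state 8). The start state is {1, 2, 6}. On a it goes to {3, 4, 7, 8}, and from {3, 4, 7, 8} on b it goes to {5, 8}. Both {3, 4, 7, 8} and {5, 8} contain state 8, so they are accepting.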
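Example 2.17
Apply the subset construction to the 10-state NFA for letter(letter|digit)* (accepting state 10). The start state is {1}. On letter it goes to {2, 3, 4, 5, 7, 10}. From there, and from each of the two states that follow, letter leads to {4, 5, 6, 7, 9, 10} and digit leads to {4, 5, 7, 8, 9, 10}. The last three states contain state 10, so they are accepting.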
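The two examples above can be checked mechanically. In this self-contained sketch the NFA edge lists are an assumption: they are one numbering, following Thompson's construction, that reproduces exactly the state sets shown on the slides. The representation ((from, label, to) triples, None for ε) and the function names are hypothetical.

```python
EPS = None  # assumption of this sketch: the label None marks an ε-transition


def eps_closure(states, edges):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for p, label, q in edges:
            if p == s and label is EPS and q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def subset_construction(start, edges):
    """Return (DFA start state, {(S, a): T}) built by the subset construction."""
    alphabet = sorted({label for _, label, _ in edges if label is not EPS})
    d_start = eps_closure({start}, edges)
    seen, trans, work = {d_start}, {}, [d_start]
    while work:
        S = work.pop()
        for a in alphabet:
            moved = {q for p, label, q in edges if p in S and label == a}
            if moved:
                T = eps_closure(moved, edges)
                trans[(S, a)] = T
                if T not in seen:
                    seen.add(T)
                    work.append(T)
    return d_start, trans


# Example 2.16: ab|a (assumed numbering; accepting state 8)
ab_or_a = [(1, EPS, 2), (1, EPS, 6), (2, "a", 3), (3, EPS, 4), (4, "b", 5),
           (5, EPS, 8), (6, "a", 7), (7, EPS, 8)]

# Example 2.17: letter(letter|digit)* (assumed numbering; accepting state 10)
ident = [(1, "letter", 2), (2, EPS, 3), (3, EPS, 4), (3, EPS, 10), (4, EPS, 5),
         (4, EPS, 7), (5, "letter", 6), (7, "digit", 8), (6, EPS, 9), (8, EPS, 9),
         (9, EPS, 4), (9, EPS, 10)]

for name, edges in (("ab|a", ab_or_a), ("letter(letter|digit)*", ident)):
    start, trans = subset_construction(1, edges)
    print(name, "start:", sorted(start))
    for (S, a), T in trans.items():
        print("  ", sorted(S), a, "->", sorted(T))
```

The printed transitions list, for each example, every DFA transition described on the slides; any pair (state, character) that is absent goes to the error state.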
Minimizing the Number of States
The resulting DFA may be more complex than necessary. In Example 2.15 we obtained a two-state DFA ({1, 2, 4} and {2, 3, 4}) for a*, but there is a simpler DFA: a single accepting start state with a loop on a. An important result from automata theory: for any given DFA there is an equivalent DFA containing a minimum number of states, and it is unique (up to renaming of states).
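A quick way to see that the two DFAs for a* recognize the same strings is to run both on every short input. The sketch below is a finite spot check over strings of length at most 5 on the alphabet {a, b}, not a proof of equivalence; the tuple-and-dictionary DFA representation and the state names are assumptions of this example.

```python
from itertools import product


def accepts(dfa, text):
    """Run a DFA given as (start, accepting set, {(state, char): state})."""
    state, accepting, trans = dfa
    for c in text:
        state = trans.get((state, c))
        if state is None:  # no transition: go to the error state and reject
            return False
    return state in accepting


# Two-state DFA for a* from Example 2.15 (states named after their NFA subsets)
two_state = ("{1,2,4}", {"{1,2,4}", "{2,3,4}"},
             {("{1,2,4}", "a"): "{2,3,4}", ("{2,3,4}", "a"): "{2,3,4}"})
# Minimal DFA: one accepting start state with a loop on a
one_state = ("S", {"S"}, {("S", "a"): "S"})

agree = all(accepts(two_state, "".join(w)) == accepts(one_state, "".join(w))
            for n in range(6) for w in product("ab", repeat=n))
print(agree)  # True
```

Proving equivalence in general needs the minimization algorithm that follows (or a product-automaton argument); testing only gives evidence.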
Minimizing the Number of States: Algorithm
1. Create two sets of states: all the accepting states and all the non-accepting states.
2. For each set of states, consider the transitions on each character a of the alphabet. If every state in the set has its a-transition landing in the same set of states, this defines an a-transition from the set to that set. If two states s and t in the set have a-transitions that land in different sets, we must split the set into smaller sets according to where the a-transitions land. A missing transition counts as a transition to the error state.
3. Repeat step 2 until either every set contains only one state or no further splitting occurs.
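The algorithm above is a form of partition refinement. The following is a minimal Python sketch, assuming the DFA's transitions are a dictionary from (state, character) to state; the function name minimize is hypothetical. A missing entry stands for a transition to the error state, which belongs to no group, so it distinguishes states exactly as in Example 2.19.

```python
def minimize(states, alphabet, trans, accepting):
    """Return the groups of equivalent states as a list of frozensets.

    trans maps (state, char) -> state; a missing entry is treated as a
    transition to the error state (group None in this sketch)."""
    accepting = set(accepting)
    # Step 1: accepting vs. non-accepting (drop an empty group)
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        refined = []
        for g in partition:
            # Step 2: split g by which group each transition lands in
            by_target = {}
            for s in sorted(g, key=str):
                key = tuple(group_of.get(trans.get((s, a))) for a in alphabet)
                by_target.setdefault(key, set()).add(s)
            refined.extend(by_target.values())
        if len(refined) == len(partition):  # Step 3: no group was split
            return [frozenset(g) for g in refined]
        partition = refined


# DFA of Example 2.17 with states renamed 1-4:
# 1 = {1}, 2 = {2,3,4,5,7,10}, 3 = {4,5,6,7,9,10}, 4 = {4,5,7,8,9,10}
t17 = {(1, "letter"): 2, (2, "letter"): 3, (2, "digit"): 4,
       (3, "letter"): 3, (3, "digit"): 4, (4, "letter"): 3, (4, "digit"): 4}
print(minimize([1, 2, 3, 4], ["letter", "digit"], t17, {2, 3, 4}))  # groups {2, 3, 4} and {1}

# Example 2.19: (a|ε)b*, all three states accepting
t19 = {(1, "a"): 2, (1, "b"): 3, (2, "b"): 3, (3, "b"): 3}
print(minimize([1, 2, 3], ["a", "b"], t19, {1, 2, 3}))  # groups {1} and {2, 3}
```

Each round can only split groups, never merge them, so an unchanged group count means the partition is stable and the loop terminates. The two calls reproduce the results of the examples that follow.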
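Example
Minimize the DFA of Example 2.17. The three accepting states {2, 3, 4, 5, 7, 10}, {4, 5, 6, 7, 9, 10}, and {4, 5, 7, 8, 9, 10} each have a letter-transition and a digit-transition to an accepting state, so they are never split. The minimal DFA has two states: the start state, with a letter-transition to a single accepting state that loops to itself on letter and digit.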
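Example 2.19
(a|ε)b*. The DFA has states 1, 2, and 3, with transitions from 1 to 2 on a, from 1 to 3 on b, from 2 to 3 on b, and from 3 to 3 on b. All states are accepting states, and each has a b-transition to an accepting state. However, a distinguishes state 1 from states 2 and 3: state 1 has an a-transition to an accepting state, while states 2 and 3 have an a-transition to the error state. The minimal DFA therefore has two states, 1 and {2, 3}: state 1 goes to {2, 3} on both a and b, and {2, 3} loops to itself on b.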
DFA for Special Symbols
All special symbols except assignment (:=) are distinct single characters. If we use a variable to indicate the token type, all accepting states can be collapsed into one state, DONE: + leads to DONE with return PLUS, - with return MINUS, …, and ; with return SEMI.
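The collapse into a single DONE state works because the token type lives in a variable rather than in the state. A minimal Python sketch, assuming a lookup table for the three token names shown on the slide (the rest of the list is elided there) and a hypothetical ERROR result for any other character:

```python
# Token names from the slide; the remaining special symbols are not covered here
SPECIAL = {"+": "PLUS", "-": "MINUS", ";": "SEMI"}


def scan_special(text, pos):
    """One step from START: every special symbol leads to the same DONE state.

    Only the token_type variable differs between symbols. Returns
    (token_type, next_pos); "ERROR" for other characters is an assumption."""
    c = text[pos]
    token_type = SPECIAL.get(c, "ERROR")
    return token_type, pos + 1


tokens, pos = [], 0
text = "+;-"
while pos < len(text):
    tok, pos = scan_special(text, pos)
    tokens.append(tok)
print(tokens)  # ['PLUS', 'SEMI', 'MINUS']
```

Without the variable, each of the accepting states would have to be kept apart so that the scanner could tell which token it had recognized.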
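Adding Numbers and Identifiers
From START, a digit leads to INNUM and a letter leads to INID. INNUM loops on digit and INID loops on letter; each goes to DONE on [other], where the brackets mean that the character is not consumed. Each special symbol + - * / = < ( ) ; leads from START directly to DONE.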
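Adding White Space, Comments, and Assignment
White space loops on START. { leads from START to INCOMMENT, which loops on other characters until } returns to START. : leads from START to INASSIGN; from there, = leads to DONE, and [other] leads to DONE without consuming the character. Any other character leads from START to DONE. Numbers, identifiers, and the special symbols + - * / = < ( ) ; are handled as before.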
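The diagram translates almost directly into a program: a loop over the current state, with one branch per state and a "back up one character" step wherever the diagram shows [other]. This is a minimal Python sketch under stated assumptions: the token names other than PLUS, MINUS, and SEMI (TIMES, OVER, EQ, LT, LPAREN, RPAREN, ASSIGN, NUM, ID, ERROR, ENDFILE) are hypothetical, end of input is represented by the empty string, and Python's str.isdigit/str.isalpha also accept non-ASCII digits and letters, which a real scanner might not want.

```python
# Hypothetical token names for the special symbols
SPECIAL = {"+": "PLUS", "-": "MINUS", "*": "TIMES", "/": "OVER", "=": "EQ",
           "<": "LT", "(": "LPAREN", ")": "RPAREN", ";": "SEMI"}


def get_token(text, pos):
    """Run the scanner DFA from START at text[pos]; return (type, lexeme, new_pos)."""
    state, lexeme, tok = "START", "", None
    while state != "DONE":
        c = text[pos] if pos < len(text) else ""  # "" stands for end of input
        pos += 1
        save = True  # whether c becomes part of the lexeme
        if state == "START":
            if c.isdigit():
                state = "INNUM"
            elif c.isalpha():
                state = "INID"
            elif c == ":":
                state = "INASSIGN"
            elif c in (" ", "\t", "\n"):
                save = False  # white space loops on START
            elif c == "{":
                save, state = False, "INCOMMENT"
            else:
                state = "DONE"
                if c == "":
                    save, tok, pos = False, "ENDFILE", pos - 1
                else:
                    tok = SPECIAL.get(c, "ERROR")  # 'other' gives an error token
        elif state == "INCOMMENT":
            save = False
            if c == "}":
                state = "START"
            elif c == "":
                state, tok, pos = "DONE", "ENDFILE", pos - 1
        elif state == "INASSIGN":
            state = "DONE"
            if c == "=":
                tok = "ASSIGN"
            else:  # [other]: do not consume c
                save, tok, pos = False, "ERROR", pos - 1
        elif state == "INNUM":
            if not c.isdigit():  # [other]: back up one character
                save, state, tok, pos = False, "DONE", "NUM", pos - 1
        elif state == "INID":
            if not c.isalpha():  # [other]: back up one character
                save, state, tok, pos = False, "DONE", "ID", pos - 1
        if save:
            lexeme += c
    return tok, lexeme, pos


def tokenize(text):
    pos, out = 0, []
    while True:
        tok, lexeme, pos = get_token(text, pos)
        out.append((tok, lexeme))
        if tok == "ENDFILE":
            return out


print(tokenize("x := 42; { note } y+1"))
```

Backing up by decrementing pos is how this sketch realizes the bracketed [other] transitions: the lookahead character is left in the input for the next call to get_token.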
Homework 2.12
a. Use Thompson's construction to convert the regular expression (a|b)*a(a|b|ε) into an NFA.
b. Convert the NFA of part (a) into a DFA using the subset construction.