# The scanning process


1 The scanning process

Goal: automate the process

Idea:
– Start with an RE
– Build a DFA

How?
– We can build a non-deterministic finite automaton (Thompson's construction)
– Convert that to a deterministic one (Subset construction)
– Minimize the DFA (Hopcroft's algorithm)
– Implement it

Existing scanner generator: flex

2 The scanning process: step 1

Let's build a mini-scanner that recognizes exactly those strings of a's and b's that end in ab.

Step 1: Come up with a Regular Expression

(a|b)*ab
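As a quick sanity check of the expression, Python's `re` module can stand in for the scanner that the following steps build by hand. This is only a sketch; the helper name `ends_in_ab` is made up for illustration.

```python
import re

# The regular expression from Step 1.
PATTERN = re.compile(r"(a|b)*ab")

def ends_in_ab(s: str) -> bool:
    """Return True if s consists only of a's and b's and ends in ab."""
    # fullmatch requires the whole string to match, not just a prefix.
    return PATTERN.fullmatch(s) is not None

for s in ["ab", "aab", "babab", "ba", "", "abc"]:
    print(repr(s), ends_in_ab(s))
```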

3 The scanning process: step 2

Step 2: Use Thompson's construction to create an NFA for that expression

We want to be able to automate the process. Thompson's construction gives a systematic way to create an NFA from an RE. It builds the NFA in a bottom-up manner. At any time during construction
– there is only one final state
– no transitions leave the final state
– components are linked together using ε-productions.

4 The scanning process: step 2

Step 2: Use Thompson's construction to create an NFA for that expression

[Figure: NFA fragments built bottom-up for a, b, a|b, and (a|b)*, linked with ε-productions]

5 The scanning process: step 2

Step 2: Use Thompson's construction to create an NFA for that expression

[Figure: NFA fragments for ab and for the complete expression (a|b)*ab]
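The bottom-up construction can be sketched in a few lines of Python. The names below (`Frag`, `symbol`, `concat`, `union`, `star`, and the `EPS` label) are assumptions of this sketch, not part of the slides, and the state numbers it assigns differ from those used in the figures.

```python
from itertools import count

EPS = None          # label used for ε-productions (an assumption of this sketch)
_ids = count(1)     # fresh state numbers

class Frag:
    """An NFA fragment with exactly one start and one final state."""
    def __init__(self, start, final, edges):
        self.start, self.final = start, final
        self.edges = edges  # list of (source, label, destination)

def symbol(c):
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, c, f)])

def concat(x, y):
    # Link x's final state to y's start state with an ε-production.
    return Frag(x.start, y.final, x.edges + y.edges + [(x.final, EPS, y.start)])

def union(x, y):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, x.start), (s, EPS, y.start),
             (x.final, EPS, f), (y.final, EPS, f)]
    return Frag(s, f, x.edges + y.edges + extra)

def star(x):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, x.start), (s, EPS, f),
             (x.final, EPS, x.start), (x.final, EPS, f)]
    return Frag(s, f, x.edges + extra)

# (a|b)*ab, built from the innermost pieces outward.
nfa = concat(concat(star(union(symbol("a"), symbol("b"))), symbol("a")), symbol("b"))
states = {q for src, _, dst in nfa.edges for q in (src, dst)}
print(len(states), "states, start", nfa.start, "final", nfa.final)
```

Every helper returns a fragment with a single final state and adds no edge that leaves it, which is what lets the pieces be glued together mechanically.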

6 The scanning process: step 3

Step 3: Use subset construction to convert the NFA to a DFA

Observation:
– Two states qi and qk that are linked by an ε-production in the NFA should be the same state in the DFA, because the machine goes from qi to qk without consuming input.

The ε-closure() function takes a state q and returns all the states that can be reached from q on ε-productions only.
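A minimal sketch of ε-closure as a graph search follows. The edge table `EPS_EDGES` is an assumption: it is reconstructed from the NFA figure and the closures listed in the slides for (a|b)*ab. The function takes a set of states rather than a single state so that it can be reused on DFA states.

```python
# ε-productions of the NFA for (a|b)*ab, using the slides' state numbers 1-12
# (reconstructed, so treat the table as an assumption).
EPS_EDGES = {
    1: {2, 8}, 2: {3, 4}, 5: {7}, 6: {7},
    7: {2, 8}, 8: {9}, 10: {11},
}

def eps_closure(states):
    """Return every state reachable from `states` on ε-productions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in EPS_EDGES.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

print(sorted(eps_closure({1})))
```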

7 The scanning process: step 3

Step 3: Use subset construction to convert the NFA to a DFA

Observation:
– If, on some input a, the NFA can go to any one of k states, then those k states should be represented by a single state in the DFA.

The δ() function takes as input a state q and a character x and returns all states that we can go to from q when reading a single x.
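The transition function can be sketched as a lookup over the labelled edges. The table `SYM_EDGES` is reconstructed from the NFA figure for (a|b)*ab and is an assumption, as is the name `move`; the function is lifted to sets of states because that is how subset construction uses it.

```python
# Labelled (non-ε) transitions of the NFA for (a|b)*ab, slides' state numbers.
SYM_EDGES = {(3, "a"): {5}, (4, "b"): {6}, (9, "a"): {10}, (11, "b"): {12}}

def move(states, x):
    """Return all states reachable from any state in `states` by reading one x."""
    result = set()
    for q in states:
        result |= SYM_EDGES.get((q, x), set())
    return result

print(sorted(move({1, 2, 3, 4, 8, 9}, "a")))
print(sorted(move({1, 2, 3, 4, 8, 9}, "b")))
```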

8 The scanning process: step 3

Step 3: Use subset construction to convert the NFA to a DFA
– The start state Q0 of the DFA is the ε-closure of the start state q0 of the NFA.
– Compute ε-closure(δ(Q0, x)) for each valid input character x. This will generate new states.
– Systematically compute ε-closure(δ(Qi, x)) until no new states can be created.
– The final states of the DFA are those that contain final states of the NFA.

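9 The scanning process: step 3

Step 3: Use subset construction to convert the NFA to a DFA

[Figure: NFA for (a|b)*ab with states 1 to 12; labelled transitions are 3 to 5 on a, 4 to 6 on b, 9 to 10 on a, and 11 to 12 on b; all other edges are ε-productions]

ε-closure(1) = {1, 2, 3, 4, 8, 9}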

10 The scanning process: step 3

[Figure: the same NFA for (a|b)*ab, states 1 to 12]

Q0 = {1,2,3,4,8,9}

ε-closure(δ(Q0, a)) = {5,7,8,9,2,3,4,10,11} = Q1
ε-closure(δ(Q0, b)) = {6,7,8,9,2,3,4} = Q2
ε-closure(δ(Q1, a)) = Q1
ε-closure(δ(Q1, b)) = {6,7,8,9,2,3,4,12} = Q3
ε-closure(δ(Q2, a)) = Q1
ε-closure(δ(Q2, b)) = Q2
ε-closure(δ(Q3, a)) = Q1
ε-closure(δ(Q3, b)) = Q2

11 The scanning process: step 3

[Figure: the NFA for (a|b)*ab next to the resulting DFA with states 0 to 3 and transitions on a and b]
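The whole of step 3 can be reproduced with a short worklist loop. This is a sketch under assumptions: the edge tables are reconstructed from the NFA figure, and the names `eps_closure`, `move`, and `subset_construction` are chosen for illustration.

```python
from collections import deque

# NFA for (a|b)*ab with the slides' state numbers (tables are reconstructed).
EPS_EDGES = {1: {2, 8}, 2: {3, 4}, 5: {7}, 6: {7}, 7: {2, 8}, 8: {9}, 10: {11}}
SYM_EDGES = {(3, "a"): {5}, (4, "b"): {6}, (9, "a"): {10}, (11, "b"): {12}}
NFA_START, NFA_FINAL, ALPHABET = 1, 12, "ab"

def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for r in EPS_EDGES.get(stack.pop(), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def move(states, x):
    return {r for q in states for r in SYM_EDGES.get((q, x), ())}

def subset_construction():
    start = frozenset(eps_closure({NFA_START}))
    names = {start: 0}     # set of NFA states -> DFA state index
    delta = {}             # (DFA state index, character) -> DFA state index
    work = deque([start])
    while work:
        current = work.popleft()
        for x in ALPHABET:
            target = frozenset(eps_closure(move(current, x)))
            if not target:
                continue   # no transition on x from this DFA state
            if target not in names:
                names[target] = len(names)
                work.append(target)
            delta[(names[current], x)] = names[target]
    # A DFA state is final if it contains the NFA's final state.
    finals = {i for subset, i in names.items() if NFA_FINAL in subset}
    return names, delta, finals

names, delta, finals = subset_construction()
for subset, i in sorted(names.items(), key=lambda item: item[1]):
    print(f"Q{i} = {sorted(subset)}")
print(delta, finals)
```

Because new states are numbered in the order they are discovered, the indices line up with Q0 through Q3 in the worked example.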

12 The scanning process: step 4

Step 4: Use Hopcroft's algorithm to minimize the DFA

[Figure: the DFA from step 3 with states 0 to 3]

ε-closure(δ(Q0, a)) = Q1
ε-closure(δ(Q0, b)) = Q2
ε-closure(δ(Q2, a)) = Q1
ε-closure(δ(Q2, b)) = Q2

States Q0 and Q2 behave the same way, so they can be merged. Note that even though Q3 also behaves the same way, it cannot be merged with Q0 or Q2 because Q3 is a final state while Q0 and Q2 are not.

[Figure: the minimized DFA with states 0, 1, and 3]
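The merge can be checked mechanically. The sketch below is not Hopcroft's worklist algorithm; it uses a simpler repeated partition refinement (split states until no block changes), which yields the same minimal DFA for this small example. The function name `minimize` and the block numbering are assumptions.

```python
# DFA produced by subset construction for (a|b)*ab.
DELTA = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 3,
         (2, "a"): 1, (2, "b"): 2, (3, "a"): 1, (3, "b"): 2}
STATES, FINALS, ALPHABET = {0, 1, 2, 3}, {3}, "ab"

def minimize():
    """Return a dict mapping each DFA state to the index of its merged block."""
    # Initial partition: final states versus non-final states.
    block_of = {q: int(q in FINALS) for q in STATES}
    while True:
        ids, refined = {}, {}
        for q in sorted(STATES):
            # Signature: own block plus the block reached on each character.
            sig = (block_of[q],) + tuple(block_of[DELTA[(q, x)]] for x in ALPHABET)
            refined[q] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block_of.values())):
            return refined  # no block was split, so the partition is stable
        block_of = refined

block_of = minimize()
min_delta = {(block_of[q], x): block_of[t] for (q, x), t in DELTA.items()}
min_finals = {block_of[q] for q in FINALS}
print(block_of)
print(min_delta, min_finals)
```

The blocks are renumbered from zero, so the final state that the slides call 3 appears here under a different index.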

13 In practice

flex is a scanner generator that takes an RE specification and follows the described process to generate a DFA. The user additionally specifies
– actions to be performed whenever a valid string has been recognized, e.g. insert identifier in symbol table
– error messages to be generated when the input string is invalid.
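To illustrate the "implement it" step, here is a table-driven recognizer for the minimized DFA with user-supplied callbacks. It is a Python sketch of the idea only, not flex's input format or generated code; `scan`, `on_match`, and `on_error` are hypothetical names.

```python
# Minimized DFA for (a|b)*ab, using the state names 0, 1, 3 from step 4.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
         (1, "b"): 3, (3, "a"): 1, (3, "b"): 0}
START, FINALS = 0, {3}

def scan(text, on_match, on_error):
    """Run the DFA over text, then call the matching user-supplied callback."""
    state = START
    for pos, ch in enumerate(text):
        nxt = DELTA.get((state, ch))
        if nxt is None:
            on_error(f"invalid character {ch!r} at position {pos}")
            return False
        state = nxt
    if state in FINALS:
        on_match(text)   # e.g. insert the lexeme in a symbol table
        return True
    on_error(f"{text!r} does not end in ab")
    return False

symbol_table = []
for word in ["abab", "aba", "abc"]:
    scan(word, symbol_table.append, print)
print(symbol_table)
```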

14 In practice

Errors that are typically detected during scanning include
– Unterminated strings
– Unterminated comments
– Invalid characters
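A hand-written sketch of how these three error classes might be detected follows. The token syntax is entirely made up for illustration: double-quoted strings that must close on the same line, `/* ... */` comments, and identifiers made of letters, digits, and underscores. The function name `find_errors` is likewise hypothetical.

```python
def find_errors(src):
    """Return a list of (position, message) pairs for a made-up token syntax."""
    errors, i, n = [], 0, len(src)
    while i < n:
        ch = src[i]
        if ch == '"':
            end = src.find('"', i + 1)
            line_end = src.find("\n", i + 1)
            # A string must close before the end of its line.
            if end == -1 or (line_end != -1 and line_end < end):
                errors.append((i, "unterminated string"))
                i = n if line_end == -1 else line_end + 1
            else:
                i = end + 1
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                errors.append((i, "unterminated comment"))
                i = n
            else:
                i = end + 2
        elif ch.isalnum() or ch.isspace() or ch == "_":
            i += 1
        else:
            errors.append((i, f"invalid character {ch!r}"))
            i += 1
    return errors

print(find_errors('x "abc" /* ok */ y'))
print(find_errors('x "abc'))
print(find_errors("a /* b"))
print(find_errors("a @ b"))
```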

