# Chapter 3 Scanning - Theory and Practice. Prof Chung. 10/8/2015


Outlines
- 3.1 Overview
- 3.2 Regular Expressions
- 3.4 Finite Automata and Scanners
- 3.5 Using a Scanner Generator
  - LEX: introduced in the TA course (LEX introduction)
- 3.7 Practical Considerations
- 3.8 Translating Regular Expressions into Finite Automata
- 3.9 Summary

Modified from http://www.cs.ualberta.ca/~amaral/courses/680/

Overview (1)
- Formal notations for specifying the precise structure of tokens are necessary
  - Quoted strings in Pascal
    - Can a string be split across a line?
    - Is a null string allowed?
  - Is .1 or 10. OK?
  - The 1..10 problem
- Scanner generators
  - Tables, programs
- What formal notations should we use?

Overview (2)
- Lexical analyzer (scanner) role
  - Produce a sequence of tokens for the parser
  - Strip out comments and whitespace
  - Associate a line number with each error message
  - Expand macros

[Figure: the source program enters the Lexical Analyzer; the Parser calls getNextToken and receives a token; both use the Symbol Table; the parser's output goes on to semantic analysis.]

Regular Expressions (1)
- Tokens
  - are built from symbols of a finite vocabulary.
- Structures of tokens
  - are defined with regular expressions
- Set definition
  - The sets of strings defined by regular expressions are termed regular sets
  - ∅ is a regular expression denoting the empty set
  - λ (also written ε) is a regular expression denoting the set that contains only the empty string
  - A string s is a regular expression denoting a set containing only s

Regular Expression (2)
- If A and B are regular expressions, so are
  - A | B (alternation)
    - A regular expression formed by A or B
    - (a)|(b) = {a, b}
  - A·B or AB (concatenation)
    - A regular expression formed by A followed by B
    - (a)(b) = {ab}
  - A* (Kleene closure)
    - A regular expression formed by zero or more repetitions of A
    - a* = {λ, a, aa, aaa, …}
- More complex example
  - (a|b|c)* = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, …}

Regular Expression (3)
- Some notational conveniences
  - P+ = PP* (at least one)
  - Not(A) = V - A, where A is a set of characters
  - Not(S) = V* - S, where S is a set of strings
  - A^k = AA…A (k copies)
  - A? is optional: zero or one occurrence of A
- More complex examples
  - Let D = (0 | 1 | 2 | 3 | 4 | ... | 9)
  - Let L = (A | B | ... | Z | a | b | ... | z)
  - comment = -- Not(EOL)* EOL
    - ex: --hello12_34 \n
  - decimal = D+ . D+
    - ex: 123.456
  - ident = L (L | D | _)*
    - ex: A1a5_6
  - comments = ((# | λ) Not(#))*
    - ex: #A435#3
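
As a minimal sketch of what two of these definitions mean in practice, the hand-written C checks below test a whole string against ident = L (L | D | _)* and decimal = D+ . D+. The function names `is_ident` and `is_decimal` are hypothetical, and the `<ctype.h>` classifiers are assumed to run in the default "C" locale so that L and D are the ASCII letters and digits.

```c
#include <ctype.h>

/* ident = L (L | D | _)* : one letter, then any mix of letters, digits, '_'. */
int is_ident(const char *s)
{
    if (!isalpha((unsigned char)*s))
        return 0;                       /* must start with a letter (L) */
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    return 1;
}

/* decimal = D+ . D+ : at least one digit on each side of the point. */
int is_decimal(const char *s)
{
    int before = 0, after = 0;

    while (isdigit((unsigned char)*s)) { s++; before++; }
    if (*s != '.')
        return 0;
    s++;
    while (isdigit((unsigned char)*s)) { s++; after++; }
    return before > 0 && after > 0 && *s == '\0';
}
```

Under this definition of decimal, the questionable forms from the overview, .1 and 10., are both rejected because one side of the point has no digit.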

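Regular Expressions (4)
- Is a regular expression as powerful as a CFG?
  - No. The set { [ⁱ ]ⁱ | i ≥ 1 } (i left brackets followed by i right brackets) can be defined by a CFG but not by a regular expression.
- Regular grammar
  - Every production has the form A → a or A → aB, or every production has the form A → a or A → Ba.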

Finite Automata and Scanners (1)
- A finite automaton (FA) can be used to recognize the tokens specified by a regular expression
- An FA consists of
  - A finite set of states S
  - A set of input symbols Σ (the input symbol alphabet)
  - A set of transitions (or moves) from one state to another, labeled with characters from the vocabulary V
  - A special start state s0 (only one)
  - A set of final, or accepting, states F
- FA = {S, Σ, s0, F, move}

Finite Automata and Scanners (2)

[Figure legend: the drawing symbols for a transition, a state, a final state, and the start state. An example follows on the next slide.]

Finite Automata and Scanners (3)
- Example
  - A transition diagram
  - This machine accepts (abc+)+

[Figure: transition diagram with transitions labeled a, b, c, a loop on c, and a transition on a back from the final state.]
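
The five components of an FA map directly onto a small data structure. The sketch below is one possible C encoding of a deterministic machine for (abc+)+; the state numbering 0 to 3, the `Dfa` type, and the names `abc_plus` and `dfa_accepts` are assumptions made for illustration, since the slide's diagram is not available in this transcript.

```c
#include <stddef.h>

#define NUM_STATES 4
#define NO_MOVE (-1)

typedef struct {
    int start;                      /* s0 */
    int is_final[NUM_STATES];       /* F, as a membership flag per state */
    int move[NUM_STATES][3];        /* move[state][symbol]; symbols a, b, c */
} Dfa;

/* Assumed numbering: 0 start, 1 after "a", 2 after "ab", 3 after "abc". */
const Dfa abc_plus = {
    .start = 0,
    .is_final = { 0, 0, 0, 1 },
    .move = {
        /*          a        b        c      */
        /* 0 */ { 1,       NO_MOVE, NO_MOVE },
        /* 1 */ { NO_MOVE, 2,       NO_MOVE },
        /* 2 */ { NO_MOVE, NO_MOVE, 3       },
        /* 3 */ { 1,       NO_MOVE, 3       },   /* more c's, or a new group */
    },
};

/* Returns 1 if the whole input string is accepted, 0 otherwise. */
int dfa_accepts(const Dfa *m, const char *input)
{
    int state = m->start;

    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] < 'a' || input[i] > 'c')
            return 0;               /* character outside the alphabet */
        state = m->move[state][input[i] - 'a'];
        if (state == NO_MOVE)
            return 0;
    }
    return m->is_final[state];
}
```

For example, `dfa_accepts(&abc_plus, "abccabc")` returns 1, while `"ab"` and `"abca"` are rejected because they end in a non-final state.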

Finite Automata and Scanners (4)
- Other example
  - (0|1)*0(0|1)(0|1)

[Figure: transition diagram with states 1 to 5; the first part recognizes (0|1)*0 and is followed by two transitions labeled 0,1.]

Finite Automata and Scanners (5)
- Other example
  - ID = L(L|D)*(_(L|D)+)*
  - A data structure can be translated for many REs or FAs

[Figure: transition diagram for ID, with one part for L(L|D)* and one part for (_(L|D)+)*; the final state serves both starred parts.]

What is the difference? Answer: "_" by times; item 2 = item 3.

Finite Automata and Scanners (6)
- Other example
  - RealLit = (D+ (λ | .)) | (D* . D+)

Finite Automata and Scanners (7)
- Two kinds of FA:
  - Deterministic: the next transition is unique
  - Non-deterministic: otherwise

[Figure: a state with two outgoing transitions that are both labeled a. Which path should we select?]

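Finite Automata and Scanners (8)
- A transition diagram

[Figure: 1 --/--> 2 --/--> 3, state 3 loops on Not(Eol), and 3 --Eol--> 4.]

- A transition table

| State | - | Eol | a | b | … |
|---|---|---|---|---|---|
| 1 | 2 |   |   |   |   |
| 2 | 3 |   |   |   |   |
| 3 | 3 | 4 | 3 | 3 | 3 |
| 4 |   |   |   |   |   |

The diagram labels the first two transitions with /, while the first column of the table is headed -.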

Finite Automata and Scanners (9)
- Any regular expression can be translated into a DFA that accepts the set of strings denoted by the regular expression
- The translation can be done
  - Automatically, by a scanner generator: LEX (TA course)
  - Manually, by a programmer
- Coding the DFA takes two forms
  1. Table-driven, commonly produced by a scanner generator
  2. Explicit control, produced automatically or by hand

Finite Automata and Scanners (10): Scanner Driver Interpreting a Transition Table (table-driven)

    /* Note: CurrentChar is already set to the current input character. */
    State = StartState;
    while (TRUE) {
        NextState = T[State][CurrentChar];   /* look up the move in the table */
        if (NextState == ERROR)
            break;                           /* no transition: stop scanning */
        State = NextState;
        CurrentChar = getchar();
    }
    if (is_final_state(State))
        ;   /* Return or process valid token. */
    else
        lexical_error(CurrentChar);

Finite Automata and Scanners (11): Scanner with Fixed Token Definition (explicit control)

    if (CurrentChar == '/') {
        CurrentChar = getchar();
        if (CurrentChar == '/') {
            /* Inside a comment: skip to the end of the line (or of the input). */
            do {
                CurrentChar = getchar();
            } while (CurrentChar != '\n' && CurrentChar != EOF);
        } else {
            ungetc(CurrentChar, stdin);
            lexical_error(CurrentChar);
        }
    } else
        lexical_error(CurrentChar);
    /* Return or process valid token. */

Finite Automata and Scanners (12)
- Transducer
  - We may perform some actions during state transitions.
  - A scanner can be turned into a transducer by the appropriate insertion of actions based on state transitions
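
As a small illustration of an action attached to a transition, the sketch below scans D+ and, on every digit transition, folds the digit into a running value. The function name `scan_integer` and its interface are hypothetical, and overflow of `long` is deliberately not handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Scans D+ at the start of input. Returns the number of characters
   consumed (0 if the input does not start with a digit) and stores the
   value of the literal in *value. */
size_t scan_integer(const char *input, long *value)
{
    size_t i = 0;

    *value = 0;
    while (isdigit((unsigned char)input[i])) {
        /* Action performed on the digit transition: accumulate the value. */
        *value = *value * 10 + (input[i] - '0');
        i++;
    }
    return i;
}
```

The recognizer alone would only answer "is this an integer literal"; with the action inserted, the same pass also produces the translated output, here the numeric value.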

Using a Scanner Generator
- Covered by the TA.

Practical Considerations (1)
- Reserved words
  - Usually, all keywords are reserved in order to simplify parsing.
  - If keywords were not reserved, in Pascal we could even write
    - begin begin; end; end; begin; end
    - if else then if = else;
  - The problem with reserved words is that they can be too numerous.
    - COBOL has several hundred reserved words!
    - ZERO, ZEROS, ZEROES
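
One common way to handle reserved words is to scan every word as an identifier and then look it up in a table. The sketch below assumes a tiny, hypothetical table of five Pascal-like keywords; the names `TokenCode`, `reserved`, and `classify_word` are illustrative, and the comparison is case-sensitive even though Pascal itself is not.

```c
#include <string.h>
#include <stddef.h>

typedef enum {
    TOK_IDENTIFIER, TOK_BEGIN, TOK_END, TOK_IF, TOK_THEN, TOK_ELSE
} TokenCode;

/* Hypothetical reserved-word table; a real language has many more entries. */
static const struct { const char *text; TokenCode code; } reserved[] = {
    { "begin", TOK_BEGIN },
    { "end",   TOK_END   },
    { "if",    TOK_IF    },
    { "then",  TOK_THEN  },
    { "else",  TOK_ELSE  },
};

/* Returns the keyword code for a scanned word, or TOK_IDENTIFIER. */
TokenCode classify_word(const char *lexeme)
{
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(lexeme, reserved[i].text) == 0)
            return reserved[i].code;
    return TOK_IDENTIFIER;
}
```

A linear search is enough for a handful of words; with several hundred reserved words, as in COBOL, a hash table or a sorted table with binary search would be the more natural choice.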

Practical Considerations (2)
- Compiler directives and listing source lines
  - Compiler options, e.g. optimization, profiling, etc.
    - handled by the scanner or by semantic routines
    - Complex pragmas are treated like other statements.
  - Source inclusion
    - e.g. #include in C
    - handled by the preprocessor or the scanner
  - Conditional compilation
    - e.g. #if, #endif in C
    - useful for creating program versions

Practical Considerations (3)
- Entry of identifiers into the symbol table
  - Who is responsible for entering symbols into the symbol table?
    - The scanner?
    - Consider this example, in which the same name abc is declared in two nested scopes:

          { int abc;
            …
            { int abc; }
          }

Practical Considerations (4)
- How to handle end-of-file?
  - Create a special EOF token.
  - The EOF token is useful in a CFG.
- Multicharacter lookahead
  - Blanks are not significant in Fortran
    - DO 10 I = 1,100 is the beginning of a loop
    - DO 10 I = 1.100 is an assignment statement, the same as DO10I = 1.100
  - A Fortran scanner can determine whether the O is the last character of a DO token only after reading as far as the comma
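
The Fortran case can be sketched as a lookahead check over the whole statement. The code below is a deliberately simplified illustration, not a Fortran scanner: it removes blanks, and for a statement that starts with DO it looks past the = sign for a comma outside parentheses. The names `StmtKind` and `classify_do` and the 128-character buffer are assumptions made for the example.

```c
#include <string.h>

typedef enum { STMT_OTHER, STMT_DO_LOOP, STMT_ASSIGNMENT } StmtKind;

/* Decides whether a statement that begins with DO is a loop header or an
   assignment to a variable whose name happens to start with DO. */
StmtKind classify_do(const char *line)
{
    char buf[128];
    size_t n = 0;

    /* Blanks are not significant: drop them before looking ahead. */
    for (; *line != '\0' && n + 1 < sizeof buf; line++)
        if (*line != ' ')
            buf[n++] = *line;
    buf[n] = '\0';

    if (strncmp(buf, "DO", 2) != 0)
        return STMT_OTHER;
    const char *eq = strchr(buf, '=');
    if (eq == NULL)
        return STMT_OTHER;

    /* A comma at parenthesis depth 0 after '=' marks the loop bounds. */
    int depth = 0;
    for (const char *p = eq + 1; *p != '\0'; p++) {
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (*p == ',' && depth == 0)
            return STMT_DO_LOOP;
    }
    return STMT_ASSIGNMENT;
}
```

The point of the sketch is the amount of lookahead: the decision about the first two characters cannot be made until the scan has passed the = sign and reached either a comma or the end of the statement.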

Practical Considerations (5)
- Multicharacter lookahead (cont'd)
  - In Ada and Pascal
    - To scan 10..100
    - There are three tokens
      - 10
      - ..
      - 100
    - Two-character (..) lookahead is needed after the 10
  - It is easy to build a scanner that can perform general backup.
    - If we reach a situation in which we are not in a final state and cannot scan any more characters, we extract characters from the right end of the buffer and queue them for rescanning
    - We continue until we reach a prefix of the scanned characters flagged as a valid token

An example follows on the next slide.

Practical Considerations (6)
- An FA that scans integer and real literals and the subrange operator

[Figure: transition diagram with transitions labeled D (digit) and ● (period).]

| Buffered Token | Token Flag |
|---|---|
| 1 | Integer Literal |
| 12 | Integer Literal |
| 12. | Invalid |
| 12.3 | Real Literal |
| 12.3e | Invalid |
| 12.3e+ | Invalid |

The detailed operation of each case follows on the next slides.

Practical Considerations (7)-(13)
- Trace of the FA that scans integer and real literals and the subrange operator, on the input string 12.3e+q, one character per slide:
  - 1: flagged Integer Literal
  - 12: flagged Integer Literal
  - 12.: flagged Invalid
  - 12.3: flagged Real Literal
  - 12.3e: flagged Invalid
  - 12.3e+: flagged Invalid
- At q the scanner cannot scan any more characters and is not in an accepting state, so backup is invoked: characters are taken from the right end of the buffer and queued for rescanning until the prefix 12.3, flagged as a real literal, is reached.
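
The backup rule can be sketched in C by remembering the longest prefix that was flagged as a valid token. The state names, the `Token` type, and `scan_token` below are hypothetical, and the exponent part (e, optional sign, digits) is an assumption inferred from the trace, since the slide's diagram is not available here.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_NONE, TOK_INT, TOK_REAL, TOK_DOTDOT } TokenKind;

typedef struct {
    TokenKind kind;   /* TOK_NONE when no prefix is a valid token */
    size_t length;    /* characters kept; the rest is rescanned later */
} Token;

enum { S_START, S_INT, S_INT_DOT, S_FRAC, S_E, S_E_SIGN, S_EXP,
       S_DOT, S_DOTDOT, S_DEAD };

static int next_state(int s, char ch)
{
    int d = isdigit((unsigned char)ch);

    switch (s) {
    case S_START:   return d ? S_INT  : ch == '.' ? S_DOT     : S_DEAD;
    case S_INT:     return d ? S_INT  : ch == '.' ? S_INT_DOT : S_DEAD;
    case S_INT_DOT: return d ? S_FRAC : S_DEAD;
    case S_FRAC:    return d ? S_FRAC : ch == 'e' ? S_E       : S_DEAD;
    case S_E:       return d ? S_EXP  : (ch == '+' || ch == '-') ? S_E_SIGN : S_DEAD;
    case S_E_SIGN:  return d ? S_EXP  : S_DEAD;
    case S_EXP:     return d ? S_EXP  : S_DEAD;
    case S_DOT:     return ch == '.' ? S_DOTDOT : S_DEAD;
    default:        return S_DEAD;
    }
}

/* Token flag of a state: TOK_NONE plays the role of "Invalid". */
static TokenKind flag_of(int s)
{
    switch (s) {
    case S_INT:    return TOK_INT;
    case S_FRAC:
    case S_EXP:    return TOK_REAL;
    case S_DOTDOT: return TOK_DOTDOT;
    default:       return TOK_NONE;
    }
}

/* Scans as far as possible, then backs up to the last valid prefix. */
Token scan_token(const char *input)
{
    Token last = { TOK_NONE, 0 };
    int state = S_START;
    size_t i = 0;

    while (input[i] != '\0') {
        state = next_state(state, input[i]);
        if (state == S_DEAD)
            break;                      /* cannot scan any more characters */
        i++;
        if (flag_of(state) != TOK_NONE) {
            last.kind = flag_of(state); /* remember the valid prefix */
            last.length = i;
        }
    }
    return last;
}
```

On 12.3e+q this sketch returns a real literal of length 4, so e+q is left for rescanning; on 10..100 it returns the integer literal 10 and leaves ..100 in the input.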

Outlines
- 3.1 Overview
- 3.2 Regular Expression
- 3.3 Finite Automata and Scanners
- 3.4 Using a Scanner Generator
- 3.5 Practical Considerations
- 3.6 Translating Regular Expressions into Finite Automata
  - Creating Deterministic Automata
  - Optimizing Finite Automata
- 3.7 Tracing Example

Translating Regular Expressions into Finite Automata (1)
- Regular expressions are equivalent to FAs
- The main job of a scanner generator
  - To transform a regular expression definition into an equivalent FA

[Figure: a regular expression → nondeterministic FA → deterministic FA → optimized deterministic FA (minimize the number of states). The NFA → DFA step is marked as the important one.]

Translating Regular Expressions into Finite Automata (2)
- We can transform any regular expression into an NFA with the following properties:
  - There is a unique final state
  - The final state has no successors
  - Every other state has either one or two successors
- Example: a nondeterministic finite automaton (NFA)
  - Regular expression: (a|b)*abb
  - Input: babb

[Figure: NFA with states 0 to 3. State 0 loops on a and b; 0 --a--> 1 --b--> 2 --b--> 3. State 3 is the unique final state and has no successor.]

Translating Regular Expressions into Finite Automata (3)
- We need to review the definition of a regular expression
  - Item 1: λ
    - It is the null string
  - Item 2: a
    - It is a character of the vocabulary
  - Item 3: |
    - It is the "or" operation.
    - Example: A|B
  - Item 4: ●
    - It is the operation of catenation
    - Example: AB
  - Item 5: *
    - It is the operation of repetition
    - Example: A*

More examples follow on the next slides.

Translating Regular Expressions into Finite Automata (4)
- NFA for λ (the null string)
- NFA for a (a one-character string, where a is a character of the vocabulary)

[Figure: for each case, a start state connected to a final state by a single transition, labeled λ or a.]

Translating Regular Expressions into Finite Automata (5)
- NFA for A | B

[Figure: the NFA for A and the NFA for B placed in parallel and joined by λ-transitions.]

Translating Regular Expressions into Finite Automata (6)
- NFA for A ● B

[Figure: the NFA for A followed by the NFA for B, joined by a λ-transition.]

Translating Regular Expressions into Finite Automata (7)
- NFA for A*

[Figure: the NFA for A with λ-transitions that let it be skipped (0 times) or repeated (1 or more times).]

Translating Regular Expressions into Finite Automata (8)-(10)
- Construct an NFA for the regular expression 01*|1, that is, (0(1*))|1
  - (8) Start with the NFA for 1*.
  - (9) Connect the NFA for 0 in front of it to obtain 0(1*).
  - (10) Combine the result with the NFA for 1 using the | construction, adding λ-transitions from a new start state and into the final state.

Translating Regular Expressions into Finite Automata (11)
- What is the problem with an NFA?
  - Answer: it may be ambiguous, which makes it difficult to program.
- A nondeterministic finite automaton (NFA): (a|b)*abb
  - Input: babb
  - After reading ba the choice is ambiguous: which transition on a should we select?

[Figure: NFA with states 0 to 3. State 0 loops on a and b; 0 --a--> 1 --b--> 2 --b--> 3.]
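
One way to program an NFA without guessing is to follow all choices at once and keep the set of states the machine could be in. The sketch below does this for (a|b)*abb with a bit mask per state; the names `Q0` to `Q3`, `step`, and `nfa_accepts` are hypothetical, and the state numbering 0 to 3 follows the figure.

```c
#include <stdint.h>

/* One bit per NFA state 0..3. */
enum { Q0 = 1u << 0, Q1 = 1u << 1, Q2 = 1u << 2, Q3 = 1u << 3 };

/* Advances every state in the current set by one input character. */
static uint32_t step(uint32_t states, char ch)
{
    uint32_t next = 0;

    if (states & Q0) {
        if (ch == 'a')
            next |= Q0 | Q1;    /* the ambiguous choice: keep both */
        if (ch == 'b')
            next |= Q0;
    }
    if ((states & Q1) && ch == 'b')
        next |= Q2;
    if ((states & Q2) && ch == 'b')
        next |= Q3;
    return next;
}

/* Returns 1 if some path through the NFA accepts the whole input. */
int nfa_accepts(const char *input)
{
    uint32_t states = Q0;

    for (; *input != '\0'; input++)
        states = step(states, *input);
    return (states & Q3) != 0;
}
```

On babb the set evolves as {0}, {0}, {0,1}, {0,2}, {0,3}, so the input is accepted. Tracking sets of states in this way is the same idea that the subset construction applies ahead of time to build a DFA.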

Translating Regular Expressions into Finite Automata (12)
- By contrast, a deterministic finite automaton (DFA): b*abb
  - Input: babb
  - After reading ba there is no ambiguity: the machine has a unique path.

[Figure: DFA with states 0 to 3. State 0 loops on b; 0 --a--> 1 --b--> 2 --b--> 3.]

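Creating Deterministic Automata (1)
- The transformation from an NFA N to an equivalent DFA M works by what is sometimes called the subset construction
- An example for each step:
  - Initial NFA: 01*|1, that is, (0(1*))|1

[Figure: NFA with states 1 to 10; state 1 is the start state and state 10 is the final state. The transition 2 → 3 is labeled 0, the transitions 5 → 6 and 8 → 9 are labeled 1, and the remaining transitions are ε-transitions.]

More detailed operation follows on the next slides.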

Creating Deterministic Automata (2)
- Step 1: compute the ε-closure of every NFA state
  1. ε-closure(1) = {1, 2, 8}
  2. ε-closure(2) = {2}
  3. ε-closure(3) = {3, 4, 5, 7, 10}
  4. ε-closure(4) = {4, 5, 7, 10}
  5. ε-closure(5) = {5}
  6. ε-closure(6) = {5, 6, 7, 10}
  7. ε-closure(7) = {5, 7, 10}
  8. ε-closure(8) = {8}
  9. ε-closure(9) = {9, 10}
  10. ε-closure(10) = {10}

[The following slides repeat the list above and highlight, one state at a time, the part of the NFA covered by each ε-closure, from ε-closure(1) through ε-closure(10). The slide for ε-closure(6) is annotated to say that one marked line in the figure is not computed. The last slides of the sequence announce that closures which are subsets of other closures will be deleted.]

Creating Deterministic Automata (2)
- Step 1 (continued): delete every closure that is a subset of another closure
  - Remaining closures: ε-closure(1) = {1, 2, 8}, ε-closure(3) = {3, 4, 5, 7, 10}, ε-closure(6) = {5, 6, 7, 10}, ε-closure(9) = {9, 10}
  - How ε-closure(3) is built: state 3 → state 4 → states 5, 7 → state 10 → nothing new

Creating Deterministic Automata (3)
- Step 1:
  - The initial state of M is the set of states reachable from the initial state of N by ε-transitions
  - This set is usually called the λ-closure or ε-closure
  - The example above applies this step.
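
A minimal C sketch of the closure computation for the 10-state example NFA is shown below. Sets of NFA states are bit masks, and the table `eps_edges` of single ε-moves is an assumption reconstructed from the closure list above (for example, 7 → 5 and 7 → 10), since the diagram itself is not available. The names `BIT`, `eps_edges`, and `eps_closure` are hypothetical.

```c
#include <stdint.h>

#define NSTATES 10
#define BIT(s) ((uint32_t)1 << (s))

/* eps_edges[s]: states reachable from s by ONE epsilon move (assumed). */
static const uint32_t eps_edges[NSTATES + 1] = {
    [1] = BIT(2) | BIT(8),
    [3] = BIT(4),
    [4] = BIT(5) | BIT(7),
    [6] = BIT(7),
    [7] = BIT(5) | BIT(10),
    [9] = BIT(10),
};

/* Returns the epsilon-closure of a set of NFA states. */
uint32_t eps_closure(uint32_t set)
{
    uint32_t closure = set;
    int changed = 1;

    while (changed) {
        uint32_t next = closure;

        /* Add every state one epsilon move away from the current set. */
        for (int s = 1; s <= NSTATES; s++)
            if (closure & BIT(s))
                next |= eps_edges[s];
        changed = (next != closure);
        closure = next;
    }
    return closure;
}
```

With these edges, `eps_closure(BIT(1))` yields {1, 2, 8} and `eps_closure(BIT(3))` yields {3, 4, 5, 7, 10}, in agreement with the list in Step 1.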

Creating Deterministic Automata (4)
- Step 2: create the successor states
  - Take any state S of M and any character c, and compute S's successor under c
  - S is identified with some set of N's states, {n1, n2, …}
  - Find all possible successor states to {n1, n2, …} under c
    - Obtain a set {m1, m2, …}
  - T = close({m1, m2, …})

[Figure: S = {n1, n2, …} with a transition to T = close({m1, m2, …}).]

Creating Deterministic Automata (7): Step 2 as pseudocode

    void make_deterministic(nondeterministic_fa N, deterministic *M)
    {
        set_of_fa_states T;

        M->initial_state = SET_OF(N.initial_state);
        close(&M->initial_state);
        Add M->initial_state to M->states;
        while (states or transitions can be added) {
            choose S in M->states and c in Alphabet;
            T = SET_OF(y in N.states SUCH_THAT x->y under c for some x in S);
            close(&T);
            if (T not in M->states)
                add T to M->states;
            Add the transition to M->transitions: S->T under c;
        }
        M->final_states = SET_OF(S in M->states SUCH_THAT N.final_state in S);
    }

An example follows on the next slide.

Creating Deterministic Automata (5)
- Step 2:
  - First renumber the sets to simplify the work flow
    - ε-closure(1) = {1, 2, 8}, so A = {1, 2, 8}
    - ε-closure(3) = {3, 4, 5, 7, 10}, so B = {3, 4, 5, 7, 10}
    - ε-closure(6) = {5, 6, 7, 10}, so C = {5, 6, 7, 10}
    - ε-closure(9) = {9, 10}, so D = {9, 10}

More operations follow on the next slide.

Creating Deterministic Automata (6)
- A: {1, 2, 8}
- B: {3, 4, 5, 7, 10}
- C: {5, 6, 7, 10}
- D: {9, 10}

[Figure: the resulting DFA. A is the start state; A --0--> B, A --1--> D, B --1--> C, C --1--> C. D has no outgoing transitions. B, C and D are final.]
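
The whole construction for this example can be sketched in runnable C. As before, sets of NFA states are bit masks, and the NFA tables (`eps_edges`, `on_symbol`) are assumptions reconstructed from the closure list and the resulting DFA. The `Dfa` type, `MAX_DFA`, and the helper names are hypothetical; DFA states are numbered in order of discovery rather than by letter.

```c
#include <stdint.h>

#define NSTATES 10
#define MAX_DFA 16
#define BIT(s) ((uint32_t)1 << (s))

/* Assumed NFA for (0(1*))|1: start state 1, final state 10. */
static const uint32_t eps_edges[NSTATES + 1] = {
    [1] = BIT(2) | BIT(8), [3] = BIT(4), [4] = BIT(5) | BIT(7),
    [6] = BIT(7), [7] = BIT(5) | BIT(10), [9] = BIT(10),
};
/* on_symbol[c][s]: states reached from s on character '0' + c. */
static const uint32_t on_symbol[2][NSTATES + 1] = {
    { [2] = BIT(3) },
    { [5] = BIT(6), [8] = BIT(9) },
};

static uint32_t eps_closure(uint32_t set)
{
    uint32_t prev;
    do {
        prev = set;
        for (int s = 1; s <= NSTATES; s++)
            if (prev & BIT(s))
                set |= eps_edges[s];
    } while (set != prev);
    return set;
}

static uint32_t move_on(uint32_t set, int c)
{
    uint32_t out = 0;
    for (int s = 1; s <= NSTATES; s++)
        if (set & BIT(s))
            out |= on_symbol[c][s];
    return out;
}

typedef struct {
    int count;                  /* number of DFA states found */
    uint32_t sets[MAX_DFA];     /* set of NFA states behind each DFA state */
    int next[MAX_DFA][2];       /* next[d][c], or -1 for no transition */
    int is_final[MAX_DFA];
} Dfa;

void make_deterministic(Dfa *m)
{
    m->count = 1;
    m->sets[0] = eps_closure(BIT(1));           /* Step 1: initial state */
    for (int d = 0; d < m->count; d++) {        /* Step 2: successors */
        m->is_final[d] = (m->sets[d] & BIT(10)) != 0;
        for (int c = 0; c < 2; c++) {
            uint32_t t = eps_closure(move_on(m->sets[d], c));
            int target = -1;

            m->next[d][c] = -1;
            if (t == 0)
                continue;                       /* no successor under c */
            for (int k = 0; k < m->count; k++)
                if (m->sets[k] == t)
                    target = k;
            if (target < 0 && m->count < MAX_DFA) {
                target = m->count++;            /* a new DFA state */
                m->sets[target] = t;
            }
            m->next[d][c] = target;
        }
    }
}
```

Running it on this NFA finds four DFA states in the order A, B, D, C (indices 0 to 3), with the same transitions as in the figure: 0 to 1 on '0', 0 to 2 on '1', 1 to 3 on '1', and 3 to 3 on '1'.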


Optimizing Finite Automata (1)
- Minimize the number of states
  - Every DFA has a unique smallest equivalent DFA
  - Given a DFA M, we use its transition table to construct the equivalent minimal DFA.
  - Initially, we draw the transition table from the DFA diagram.

[Figure: DFA with A --0--> B, A --1--> D, B --1--> C, C --1--> C.]

| State | 0 | 1 |
|---|---|---|
| A | B | D |
| B |   | C |
| C |   | C |
| D |   |   |

A: start state. B, C, D: final states.

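Optimizing Finite Automata (2)
- Minimize the number of states
  - In the transition table, the rows for B and C are equal, so B and C can be merged.
  - B can merge into C because both are final states and both have the same transitions (nothing on 0, C on 1). D is also final, but it has no transition on 1, so it stays separate.

| State | 0 | 1 |
|---|---|---|
| A | {B, C} | D |
| {B, C} |   | {B, C} |
| D |   |   |

[Figure: new DFA with A --0--> {B, C}, A --1--> D, and {B, C} looping on 1.]

A: start state. {B, C}, D: final states.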
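
The merge can also be found mechanically. The sketch below uses a simple partition-refinement loop, which is one standard way to minimize a DFA and not necessarily the table method the slides have in mind. It starts from the split into final and non-final states and keeps two states together only while they agree, for every character, on the block of their successor. The table encodes the four-state DFA above with states A to D as indices 0 to 3; all names are hypothetical.

```c
#define N 4
#define NSYM 2

/* Rows A, B, C, D; columns '0', '1'; -1 means no transition. */
static const int next_state[N][NSYM] = {
    { 1, 3 }, { -1, 2 }, { -1, 2 }, { -1, -1 },
};
static const int is_final[N] = { 0, 1, 1, 1 };

static int block_of(const int *block, int s)
{
    return s < 0 ? -1 : block[s];
}

/* Fills block[s] with the index of the merged state that contains s and
   returns the number of merged states. */
int minimize(int block[N])
{
    for (int s = 0; s < N; s++)
        block[s] = is_final[s];         /* initial split: final or not */

    for (;;) {
        int fresh[N];
        int count = 0, changed = 0;

        for (int s = 0; s < N; s++) {
            fresh[s] = -1;
            /* Join s to an earlier state t that is indistinguishable. */
            for (int t = 0; t < s && fresh[s] < 0; t++) {
                int same = block[s] == block[t];
                for (int c = 0; c < NSYM && same; c++)
                    same = block_of(block, next_state[s][c]) ==
                           block_of(block, next_state[t][c]);
                if (same)
                    fresh[s] = fresh[t];
            }
            if (fresh[s] < 0)
                fresh[s] = count++;     /* s starts a new block */
        }
        for (int s = 0; s < N; s++) {
            if (fresh[s] != block[s])
                changed = 1;
            block[s] = fresh[s];
        }
        if (!changed)
            return count;
    }
}
```

For this table the function returns 3 and puts B and C in the same block, with A and D each in a block of their own.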

Additional
- Simplifying rules (removing parentheses)
  - "*" has the highest precedence and is left associative
  - Concatenation has the second highest precedence and is left associative
  - "|" has the lowest precedence and is left associative
  - E.g., (a)|((b)*(c)) == a|b*c

Outlines
- 3.1 Overview
- 3.2 Regular Expression
- 3.3 Finite Automata and Scanners
- 3.4 Using a Scanner Generator
- 3.5 Practical Considerations
- 3.6 Translating Regular Expressions into Finite Automata
- 3.7 Tracing Example

Tracing Example (1)
- Review the steps of a scanner generator: a regular expression → nondeterministic FA → deterministic FA → optimized deterministic FA (minimize the number of states). The NFA → DFA step is the important one.

Tracing Example (2): Regular Expression (IF and IFA)

    if                {return IF;}
    [a-z][a-z0-9]*    {return ID;}
    [0-9]+            {return NUM;}
    .                 {error();}

Tracing Example (3)
- Translate from RE to NFA (the first step: regular expression → nondeterministic FA)

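Tracing Example (4): if {return IF;}
- The NFA for the symbol i is: start → 1 --i--> 2
- The NFA for the symbol f is: start → 1 --f--> 2
- The NFA for the regular expression if is: start → 1 --i--> 2 --f--> 3, where state 3 returns IF

Tracing Example (5): [a-z][a-z0-9]* {return ID;}

[Figure: NFA with states 1 to 4; a transition on a-z from the start state, then a repeated part with transitions on a-z and 0-9; the final state returns ID.]

Tracing Example (6): [0-9]+ {return NUM;}

[Figure: NFA with states 1 to 5; a transition on 0-9 from the start state, then a repeated part with a transition on 0-9; the final state returns NUM.]

Tracing Example (9)

[Figure: the four NFAs side by side: IF (1 --i--> 2 --f--> 3), ID, NUM, and error (1 to 2 on any character but \n).]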

Tracing Example (10)
- Translate from NFA to DFA (the second step: nondeterministic FA → deterministic FA)

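Tracing Example (11): Full NFA Diagram

[Figure: the combined NFA with states 1 to 15. State 1 is the start state, with ε-transitions to 4, 9 and 14. IF: 1 --i--> 2 --f--> 3. ID: 4 --a-z--> 5, then the repeated part 6 --a-z or 0-9--> 7, with final state 8. NUM: 9 --0-9--> 10, then the repeated part 11 --0-9--> 12, with final state 13. error: 14 --any character--> 15.]

Special case: handled in the final state.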

Tracing Example (12)
- ε-closures of the NFA states
  1. ε-closure(1) = {1, 4, 9, 14}
  2. ε-closure(2) = {2}
  3. ε-closure(3) = {3}
  4. ε-closure(5) = {5, 6, 8}
  5. ε-closure(7) = {6, 7, 8}
  6. ε-closure(8) = {6, 8}
  7. ε-closure(10) = {10, 11, 13}
  8. ε-closure(12) = {11, 12, 13}
  9. ε-closure(13) = {11, 13}
  10. ε-closure(15) = {15}

Tracing Example (13)-(23)
- DFA states = {1-4-9-14}. Now we need to compute, for each class of input characters, the move from 1-4-9-14 and its ε-closure:
  - move(1-4-9-14, a-h) = {5, 15}; ε-closure({5, 15}) = {5, 6, 8, 15}, giving the new state 5-6-8-15
  - move(1-4-9-14, i) = {2, 5, 15}; ε-closure({2, 5, 15}) = {2, 5, 6, 8, 15}, giving the new state 2-5-6-8-15
  - move(1-4-9-14, j-z) = {5, 15}; ε-closure({5, 15}) = {5, 6, 8, 15}, the existing state 5-6-8-15
  - move(1-4-9-14, 0-9) = {10, 15}; ε-closure({10, 15}) = {10, 11, 13, 15}, giving the new state 10-11-13-15
  - move(1-4-9-14, other) = {15}; ε-closure({15}) = {15}, giving the new state 15

Tracing Example (24)
- DFA states = {1-4-9-14}
- The analysis for 1-4-9-14 is complete. We mark it and pick another state in the DFA to analyze. (Practice)

[Figure: the partial DFA so far. 1-4-9-14 goes to 5-6-8-15 on a-h and on j-z, to 2-5-6-8-15 on i, to 10-11-13-15 on 0-9, and to 15 on other.]

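Tracing Example (25)

[Figure: the complete DFA.]
- 1-4-9-14 (start): a-h and j-z → 5-6-8-15; i → 2-5-6-8-15; 0-9 → 10-11-13-15; other → 15
- 5-6-8-15 (ID): a-z, 0-9 → 6-7-8
- 2-5-6-8-15 (ID): f → 3-6-7-8; a-e, g-z, 0-9 → 6-7-8
- 3-6-7-8 (IF): a-z, 0-9 → 6-7-8
- 6-7-8 (ID): a-z, 0-9 → 6-7-8
- 10-11-13-15 (NUM): 0-9 → 11-12-13
- 11-12-13 (NUM): 0-9 → 11-12-13
- 15 (error)

See pp. 118 of Aho-Sethi-Ullman and pp. 29 of Appel.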

Tracing Example (26)
- Minimize the DFA (the last step: deterministic FA → optimized deterministic FA, minimizing the number of states)

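Tracing Example (27): Transition Table of the DFA (states A to H)

| State | 0-9 | a-e | f | g-h | i | j-z | other |
|---|---|---|---|---|---|---|---|
| A | D | C | C | C | B | C | E |
| B | G | G | F | G | G | G | - |
| C | G | G | G | G | G | G | - |
| D | H | - | - | - | - | - | - |
| E | - | - | - | - | - | - | - |
| F | G | G | G | G | G | G | - |
| G | G | G | G | G | G | G | - |
| H | H | - | - | - | - | - | - |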

Tracing Example (28): New Transition Table-1 (F and G replaced by C, H replaced by D)

| State | 0-9 | a-e | f | g-h | i | j-z | other |
|---|---|---|---|---|---|---|---|
| A | D | C | C | C | B | C | E |
| B | C | C | C | C | C | C | - |
| C | C | C | C | C | C | C | - |
| D | D | - | - | - | - | - | - |
| E | - | - | - | - | - | - | - |

Tracing Example (29): New Transition Table-2 (C replaced by B)

| State | 0-9 | a-e | f | g-h | i | j-z | other |
|---|---|---|---|---|---|---|---|
| A | D | B | B | B | B | B | E |
| B | B | B | B | B | B | B | - |
| D | D | - | - | - | - | - | - |
| E | - | - | - | - | - | - | - |

Tracing Example (30): New DFA
- Merged states: B = C = F = G, D = H
- New DFA: A --a-z--> B, A --0-9--> D, A --other--> E; B loops on a-z, 0-9 (ID); D loops on 0-9 (NUM); E is the error state.
- Because the state that returned IF has been merged with the ID states, the DFA alone no longer separates the keyword if from identifiers. IF can be handled by look-ahead programming.
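
One way to read "handled by programming" is to scan with the merged DFA and then compare the lexeme with the keyword. The sketch below hand-codes the four-state machine (A, B, D, E) and applies that check; it is an interpretation of the slide's remark, not the slide's own code. The names `Kind` and `scan` are hypothetical, and `islower` and `isdigit` are assumed to run in the default "C" locale so that they match a-z and 0-9.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_ERROR, T_IF, T_ID, T_NUM } Kind;

/* Scans one token at the start of s and stores its length in *len. */
Kind scan(const char *s, size_t *len)
{
    size_t i = 0;
    unsigned char c = (unsigned char)s[0];

    if (islower(c)) {
        /* A --a-z--> B, and B loops on a-z, 0-9. */
        while (islower((unsigned char)s[i]) || isdigit((unsigned char)s[i]))
            i++;
        *len = i;
        /* The merged state B cannot tell IF from ID: check the lexeme. */
        return (i == 2 && strncmp(s, "if", 2) == 0) ? T_IF : T_ID;
    }
    if (isdigit(c)) {
        /* A --0-9--> D, and D loops on 0-9. */
        while (isdigit((unsigned char)s[i]))
            i++;
        *len = i;
        return T_NUM;
    }
    /* A --other--> E: consume one character, if there is one. */
    *len = (c != '\0');
    return T_ERROR;
}
```

With this check, if is returned as IF while ifa is returned as ID of length 3, which matches the priority of the rules in the original specification.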

Chapter 3 End. Any questions?

In-class quiz (1+): What is the optimized DFA for 1+?

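- 1+ = 1·1* (this method can be used)
- ε-closures of the NFA states 1 to 4:
  1. ε-closure(1) = {1, 2}
  2. ε-closure(2) = {2}
  3. ε-closure(3) = {3, 4, 2}
  4. ε-closure(4) = {4, 2}
- Renumber: ε-closure(1) = {1, 2} becomes A, and ε-closure(3) = {3, 4, 2} becomes B

| State | 0 | 1 |
|---|---|---|
| A |   | B |
| B |   | B |

[Figure: DFA with start state A = {1, 2}, A --1--> B = {3, 4, 2}, and B looping on 1.]

- The DFA cannot be optimized (merged) any further, because A is the start state and is not final, while B is the final state.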
