COMP 3438 – Part II - Lecture 3 Lexical Analysis II Par III: Finite Automata Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.

1 COMP 3438 – Part II - Lecture 3
Lexical Analysis II Par III: Finite Automata
Dr. Zili Shao, Department of Computing, The Hong Kong Polytechnic Univ.

2 Overview of the Subject (COMP 3438)
Course Organization (on the original slide, this lecture is marked in red)
Part I: Unix System Programming (Device Driver Development)
  Overview of Unix Sys. Prog.
  Process
  File System
  Overview of Device Driver Development
  Character Device Driver Development
  Introduction to Block Device Driver
Part II: Compiler Design
  Overview of Compiler Design
  Lexical Analysis (HW #3)
  Syntax Analysis (HW #4)

3 Review of the Previous Lecture
Lexical Analyzer
  Input: Source Prog.
  Output: Tokens
Regular expressions describe token patterns; each one denotes a regular set (language).
Basic notions: Alphabet, String, Language
Operations: Union, Concatenation, Kleene closure
Definition:
  1. ε is a RE denoting {ε}.
  2. a in Σ is a RE denoting {a}.
  3. If r and s are REs, then r|s is a RE, rs is a RE, and (r)* is a RE.
Two methods to implement: (1) Lex tool (2) Finite Automaton

4 The Outline
Lexical Specification
  → Regular expressions
  → NFA (Nondeterministic Finite Automaton)
  → DFA (Deterministic Finite Automaton)
  → Table-driven Implementation of DFA (Lexical Analyzer)

5 Part I: NFA, DFA and the Conversion

6 Recognizing Tokens
Regular Expression → Specify tokens
Finite Automaton → Recognize tokens
A language recognizer: a recognizer for a language L is a program that takes a string x and answers "yes" if x is a sentence of L and "no" otherwise.
Recognizer for L
  Input: string x
  Output: YES (x ∈ L) or NO (x ∉ L)

7 Nondeterministic Finite Automata
A finite automaton consists of 5 components (Σ, S, s0, F, move):
  1) An input alphabet Σ
  2) A set of states S
  3) A start state s0
  4) A set of accepting states F ⊆ S
  5) A transition function move that maps state-symbol pairs to sets of states

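8 Transition Graphs
[Figure: legend of the graph symbols for a state, the start state, an accepting state, and a transition labeled a]
A simple example: a finite automaton that accepts only "1"
[Figure: a start state with a single transition on 1 to an accepting state]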

9 Examples
Alphabet: Σ = {0, 1}
A finite automaton accepting any number of 1's followed by a single 0.
[Figure: NFA accepting 1*0]
What language does the following automaton recognize?
[Figure: NFA accepting (1|0)*10]

10 State Transition Table
The transition function of an NFA can also be implemented by a transition table, where the rows are states and the columns are input symbols.
[Figure: NFA with states 0, 1, 2 and transitions on a and b]
STATE   Input Symbols
        a        b
0       {0, 1}   {0}
1       -        {2}
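The table above can be held in a nested dictionary. The following is a minimal Python sketch, not the lecture's implementation: the names NFA_TABLE, move and run are hypothetical, and the empty row for state 2 is an assumption, since the slide's table shows no entries for it.

```python
# Rows are states, columns are input symbols; each entry is a SET of next states.
# A missing entry means "no transition". The empty row for state 2 is an assumption.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {},
}

def move(table, states, symbol):
    """Union of the table entries for every state in `states` on `symbol`."""
    result = set()
    for s in states:
        result |= table.get(s, {}).get(symbol, set())
    return result

def run(table, start, text):
    """Return the set of states the NFA may be in after reading `text`."""
    current = {start}
    for ch in text:
        current = move(table, current, ch)
    return current
```

Because an entry is a set, reading one symbol can leave the machine in several states at once; run tracks all of them instead of guessing a single path.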

11 ε-Transition
Another kind of transition: the machine can move from state s1 to state s2 without reading any input.
[Figure: states 1 and 2 joined by an ε-transition]
Example: Σ = {0, 1}
[Figure: NFA accepting 00*|11*, states 1 to 5, with ε-transitions and transitions on 0 and 1]

12 NFA Are Hard to Implement
Nondeterministic Finite Automata (NFA):
  Can have multiple transitions for one input in a given state
  Can have ε-transitions
  Hard to implement
Another kind of finite automaton: the Deterministic Finite Automaton (DFA)

13 Deterministic Finite Automata
A Deterministic Finite Automaton (DFA) is a special case of an NFA:
  One transition per input per state
  No ε-transitions

14 Examples for NFA and DFA
Σ = {a, b}
[Figure: an NFA recognizing the language (a|b)*abb, states 0 to 3]
[Figure: a DFA accepting (a|b)*abb, states 0 to 3]
NFA:
  1. May get into multiple states with one input symbol
  2. May have ε-transitions
DFA:
  1. Gets into only one state with an input symbol
  2. No ε-transitions

15 Table Implementation of DFA
[Figure: DFA with states S, T, U and transitions on a and b]
State   Input character
        a    b
S       T    U
T       T    U
U       T    U
Input strings: (1) aaaa (2) bbb (3) abababb (4) aaaaaab (5) babaaaa (6) bbbbbba

16 Algorithm 3.1 Simulating a DFA
Input: an input string x terminated by an end-of-file character eof; a DFA D with start state s0 and set of accepting states F.
Output: the answer "yes" if D accepts x, "no" otherwise.
Method: apply the following algorithm to the input string x. move(s, c) gives the state to which there is a transition from state s on input c. nextchar returns the next character of the input string x.
    s := s0;
    c := nextchar;
    while c ≠ eof do
        s := move(s, c);
        c := nextchar
    end;
    if s is in F then return "yes"
    else return "no";
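Algorithm 3.1 can be tried on the S/T/U table from the previous slide. This is a minimal Python sketch under two loudly stated assumptions: the slide's figure did not survive, so the start state S and the accepting set {U} are guesses made only for illustration. The names DFA_TABLE, START, ACCEPTING and simulate_dfa are hypothetical.

```python
# Transition table from the "Table Implementation of DFA" slide.
DFA_TABLE = {
    "S": {"a": "T", "b": "U"},
    "T": {"a": "T", "b": "U"},
    "U": {"a": "T", "b": "U"},
}
START = "S"        # assumption: the slide's figure is lost
ACCEPTING = {"U"}  # assumption: the slide's figure is lost

def simulate_dfa(table, start, accepting, text):
    """Algorithm 3.1: follow exactly one transition per input character."""
    s = start
    for c in text:       # the end of the Python string plays the role of eof
        s = table[s][c]  # s := move(s, c)
    return "yes" if s in accepting else "no"
```

Under these assumptions the machine accepts exactly the strings ending in b, so of the slide's six input strings only (2), (3) and (4) would be accepted. A different choice of accepting states changes the answers but not the loop.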

17 NFA to DFA
A DFA is a special case of an NFA:
  No ε-transitions
  One transition per input per state
The conversion of an NFA to a DFA needs to solve two problems:
  ε-closure(s): the set of all states reachable from s on ε-transitions alone, including s itself (this removes the ε-transitions)
  Consider all input symbols from one state (this gives one transition per input per state)
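A minimal Python sketch of ε-closure, using the five-state NFA for 00*|11* from the ε-transition slide. The edge layout (state 2 reads 0, state 3 reads 1) is an assumption read off the garbled figure, and the names EPS, NFA and epsilon_closure are hypothetical.

```python
EPS = ""  # label chosen here to stand for an epsilon-transition

# Assumed edges: 1 -eps-> 2, 1 -eps-> 3, 2 -0-> 4, 4 -0-> 4, 3 -1-> 5, 5 -1-> 5
NFA = {
    1: {EPS: {2, 3}},
    2: {"0": {4}},
    3: {"1": {5}},
    4: {"0": {4}},
    5: {"1": {5}},
}

def epsilon_closure(nfa, states):
    """All states reachable from `states` using epsilon-transitions only."""
    closure = set(states)  # every state is in its own closure
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure
```

The stack holds states whose ε-edges have not been followed yet, so each state is expanded at most once.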

18 An Example
[Figure: an NFA with states 1 to 5, ε-transitions, and transitions on 0 and 1]
T1 = ε-closure(s1) = {s1, s2, s3}
ε-closure(move(T1, 0)) = {s4} = T2
ε-closure(move(T1, 1)) = {s5} = T3
ε-closure(move(T2, 0)) = {s4} = T2
ε-closure(move(T2, 1)) = DUMP
ε-closure(move(T3, 1)) = {s5} = T3
ε-closure(move(T3, 0)) = DUMP
ε-closure(move(DUMP, 0)) = DUMP
ε-closure(move(DUMP, 1)) = DUMP
[Figure: the resulting DFA with states T1, T2, T3 and DUMP]
An accepting state of the DFA is a state containing at least one accepting state of the NFA.

19 Conversion Algorithm (NFA → DFA)
Input: an NFA N (s0 is the start state)
Output: a DFA D (Dstates: the set of all states; Dtran: the transition function)
Algorithm:
    T0 = ε-closure(s0);
    T0 is unmarked and Dstates = {T0};
    while there is an unmarked state T in Dstates
        mark T;
        for each input symbol a
            U = ε-closure(move(T, a));
            if U is not in Dstates then
                add U as an unmarked state to Dstates;
            Dtran(T, a) = U;
        end for
    end while
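The conversion algorithm can be sketched in Python as follows. This is an illustration under assumptions, not the lecture's code: the NFA edges are the usual Thompson-style NFA for (a|b)*abb with states 0 to 10, chosen to agree with the closures computed on the "Another Example" slide, and all names (EPS, NFA, epsilon_closure, move, subset_construction) are hypothetical.

```python
EPS = ""  # label chosen here to stand for an epsilon-transition

# Assumed edge list for the (a|b)*abb NFA with states 0..10.
NFA = {
    0: {EPS: {1, 7}}, 1: {EPS: {2, 4}}, 2: {"a": {3}}, 3: {EPS: {6}},
    4: {"b": {5}}, 5: {EPS: {6}}, 6: {EPS: {1, 7}},
    7: {"a": {8}}, 8: {"b": {9}}, 9: {"b": {10}}, 10: {},
}

def epsilon_closure(nfa, states):
    """States reachable from `states` on epsilon-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        for t in nfa.get(stack.pop(), {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(nfa, states, symbol):
    """States reachable from `states` by reading `symbol` once."""
    result = set()
    for s in states:
        result |= nfa.get(s, {}).get(symbol, set())
    return result

def subset_construction(nfa, start, alphabet):
    """Return (dstates, dtran); dstates is listed in discovery order."""
    t0 = frozenset(epsilon_closure(nfa, {start}))
    dstates = [t0]
    dtran = {}
    unmarked = [t0]
    while unmarked:
        t = unmarked.pop(0)  # mark T
        for a in alphabet:
            u = frozenset(epsilon_closure(nfa, move(nfa, t, a)))
            if u not in dstates:
                dstates.append(u)
                unmarked.append(u)
            dtran[(t, a)] = u
    return dstates, dtran

dstates, dtran = subset_construction(NFA, 0, ["a", "b"])
```

With this edge list, dstates[0] through dstates[4] correspond to T1 through T5. If move ever returned an empty set, the empty frozenset would be added as one more DFA state, which plays the role of the DUMP state.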

20 Another Example
[Figure: NFA for (a|b)*abb with states 0 to 10, ε-transitions, and transitions on a and b]
T1 = ε-closure(0) = {0,1,2,4,7}
move(T1,a) = {3,8}    ε-closure(move(T1,a)) = ε-closure(3,8) = {1,2,3,4,6,7,8} = T2
move(T1,b) = {5}      ε-closure(move(T1,b)) = ε-closure(5) = {1,2,4,5,6,7} = T3
move(T2,a) = {3,8}    ε-closure(move(T2,a)) = ε-closure(3,8) = T2
move(T2,b) = {5,9}    ε-closure(move(T2,b)) = ε-closure(5,9) = {1,2,4,5,6,7,9} = T4
move(T3,a) = {3,8}    ε-closure(move(T3,a)) = ε-closure(3,8) = T2
move(T3,b) = {5}      ε-closure(move(T3,b)) = ε-closure(5) = T3
move(T4,a) = {3,8}    ε-closure(move(T4,a)) = ε-closure(3,8) = T2
move(T4,b) = {5,10}   ε-closure(move(T4,b)) = ε-closure(5,10) = {1,2,4,5,6,7,10} = T5
move(T5,a) = {3,8}    ε-closure(move(T5,a)) = ε-closure(3,8) = T2
move(T5,b) = {5}      ε-closure(move(T5,b)) = ε-closure(5) = T3

21 Another Example (cont.)
The computations from the previous slide give the following transition table.
Transition Table
State   a    b
T1      T2   T3
T2      T2   T4
T3      T2   T3
T4      T2   T5
T5      T2   T3

22 States of DFA Obtained from NFA
An NFA may be in many states at any time.
If there are N states, the NFA must be in some subset of those N states.
How many subsets are there? A set with N elements has 2^N subsets.
So the DFA has at most 2^N states, where N is the number of states of the NFA.

23 Part II. Regular Expression to NFA

24 Regular Expression to NFA
Lexical Specification
  → Regular expressions
  → NFA (Nondeterministic Finite Automaton)
  → DFA (Deterministic Finite Automaton)
  → Table-driven Implementation of DFA (Lexical Analyzer)

25 Algorithm 3.3 (Thompson's construction): An NFA from a regular expression
Input: a regular expression r over an alphabet Σ.
Output: an NFA N accepting L(r).
Method: defined by rules for the following cases:
  1. For ε, construct the NFA
     [Figure: start state i with an ε-transition to accepting state f]
  2. For a in Σ, construct the NFA
     [Figure: start state i with a transition on a to accepting state f]
  3. Suppose N(s) and N(t) are NFAs for regular expressions s and t. We have the following four sub-cases; the NFAs to be constructed are shown on the next slide.
     a) the regular expression s | t
     b) the regular expression st
     c) the regular expression s*
     d) the regular expression (s)

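26 [Figure: NFA for s|t: a new start state i with ε-transitions into N(s) and N(t), and ε-transitions from both into a new accepting state f]
[Figure: NFA for st: N(s) followed by N(t); the accepting state of s and the start state of t are merged]
[Figure: NFA for s*: a new start state i and a new accepting state f connected to N(s) by ε-transitions]
NFA for case (d), (s): N(s) itself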
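The rules of Algorithm 3.3 can be sketched in Python as operations on NFA fragments, where a fragment is a (start, accept) pair. This is an illustration under assumptions: the class Builder, its methods and the helper accepts are hypothetical names, and concat links the two fragments with an ε-edge instead of merging the two states as the slide does. The accepted language is the same, at the cost of one extra ε-transition.

```python
import itertools

EPS = ""  # label chosen here to stand for an epsilon-transition

class Builder:
    """Builds NFA fragments; each fragment is a (start, accept) pair."""

    def __init__(self):
        self.trans = {}  # state -> {label -> set of next states}
        self._ids = itertools.count()

    def new_state(self):
        s = next(self._ids)
        self.trans[s] = {}
        return s

    def add(self, src, label, dst):
        self.trans[src].setdefault(label, set()).add(dst)

    def symbol(self, a):  # rule 2 (rule 1 when a is EPS)
        i, f = self.new_state(), self.new_state()
        self.add(i, a, f)
        return i, f

    def union(self, s, t):  # rule 3a: s | t
        i, f = self.new_state(), self.new_state()
        self.add(i, EPS, s[0])
        self.add(i, EPS, t[0])
        self.add(s[1], EPS, f)
        self.add(t[1], EPS, f)
        return i, f

    def concat(self, s, t):  # rule 3b: st, with an epsilon edge instead of a merge
        self.add(s[1], EPS, t[0])
        return s[0], t[1]

    def star(self, s):  # rule 3c: s*
        i, f = self.new_state(), self.new_state()
        self.add(i, EPS, s[0])
        self.add(i, EPS, f)        # zero repetitions
        self.add(s[1], EPS, s[0])  # repeat
        self.add(s[1], EPS, f)
        return i, f

def accepts(builder, frag, text):
    """Simulate the fragment as an NFA on `text`."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for t in builder.trans[stack.pop()].get(EPS, set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({frag[0]})
    for ch in text:
        nxt = set()
        for s in current:
            nxt |= builder.trans[s].get(ch, set())
        current = closure(nxt)
    return frag[1] in current

# r = (a | b)* a b b
b = Builder()
r = b.star(b.union(b.symbol("a"), b.symbol("b")))
for ch in "abb":
    r = b.concat(r, b.symbol(ch))
```

Case (d) needs no method: parentheses only group, so the fragment for (s) is the fragment for s.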

27 Example 3.16
Using Algorithm 3.3, construct N(r) for the regular expression r = (a | b)*abb.
[Figures: the construction step by step: N(a) on states 2 and 3; N(b) on states 4 and 5; N(a | b) on states 1 to 6; N((a | b)*) on states 0 to 7]

28 Part III. Homework

29 Homework - Tokens
Sample Prog:
    var a, b, c;
    begin
        a = 5;
        b = 6;
        c = (a + b) / 2.0
    end.
What kinds of tokens do we need to recognize?
  Keywords: var, begin, end
  ID: a, b, var1
  Number: 2.0
  Operators: /, =
  Delimiter: ; . ( )
  Whitespace: '\n', '\t', ' '

30 Homework - Regular Expression
Sample Prog:
    var a, b, c;
    begin
        a = 5;
        b = 6;
        c = (a + b) / 2.0
    end.
Regular Expression:
  LETTER → [a-zA-Z]
  DIGIT → [0-9]
  KEYWORD → var | begin | end
  ID → LETTER (LETTER|DIGIT)*
  …

31 Implementation - Buffer & Pointers
The input is read into a buffer.
Use two pointers:
  lex_begin: the beginning pointer
  forward: the lookahead pointer
To deal with failure, simply move the pointer "forward" back.
[Figure: the input buffer holding the characters of the sample program, with lex_begin and forward pointing into it]
See Section 3.2 for buffers and pointers.

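32 Homework – Keywords & ID
[Figure: transition diagram for ID: state 0 goes to state 1 on a letter; state 1 loops on a letter or digit; state 1 goes to accepting state 2 (marked *) on any other character]
A keyword is identified by this automaton as an ID. So after an ID is obtained, check a keyword table to see whether it is an ID or a keyword. This significantly reduces the number of states.
  1. Return token ID;
  2. Use lex_begin and forward to get the lexeme;
  3. Adjust lex_begin and forward.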
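The keyword-table idea can be sketched in Python as follows. This is a minimal sketch, assuming the input buffer is a Python string and the pointers are integer indexes; the names KEYWORDS and scan_identifier and the returned tuple layout are hypothetical, and the keyword set comes from the homework's token list.

```python
import string

LETTERS = set(string.ascii_letters)  # LETTER -> [a-zA-Z]
DIGITS = set(string.digits)          # DIGIT  -> [0-9]
KEYWORDS = {"var", "begin", "end"}   # the keyword table

def scan_identifier(buf, lex_begin):
    """Run the three-state ID diagram starting at buf[lex_begin].

    Returns (kind, lexeme, new_lex_begin), or None if no ID starts here.
    """
    forward = lex_begin
    # State 0: the first character must be a letter.
    if forward >= len(buf) or buf[forward] not in LETTERS:
        return None
    forward += 1
    # State 1: loop while the next character is a letter or a digit.
    while forward < len(buf) and (buf[forward] in LETTERS or buf[forward] in DIGITS):
        forward += 1
    # State 2: the "other" character is not part of the lexeme, so forward stays on it.
    lexeme = buf[lex_begin:forward]
    kind = "KEYWORD" if lexeme in KEYWORDS else "ID"
    return kind, lexeme, forward
```

For example, scan_identifier("var a, b", 0) yields the keyword var, while "var1" is longer than any keyword and stays an ID. One automaton plus a set lookup replaces a separate chain of states per keyword.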

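33 Homework - Implement Transition Table
[Figure: transition diagram for ID: state 0 goes to state 1 on a letter; state 1 loops on a letter or digit; state 1 goes to state 2 (marked *) on any other character]
[Figure: transition diagram for operators starting at state 3: '>' leads to state 4; from state 4, '=' leads to state 5 and any other character leads to state 6 (marked *); a '<' edge is also shown]

    token nexttoken() {
        …
        switch (state) {
        case S0:
            c = nextchar();
            if (isletter(c)) state = 1;
            else state = fail();
            break;
        case ….
        …
        }
    }

    int fail() {
        forward = lex_begin;
        switch (start) {
        case 0: start = 3; break;
        case 3: start = ….
        …
        }
        return start;
    }

See Section 3.4 (p. 98) for details.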
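The role of fail() is to reset forward to lex_begin and restart at the next transition diagram. Below is a minimal Python sketch of that idea, not a translation of the code above: each diagram is a function that returns the new forward position or None on failure, and the driver tries the diagrams in order. All names (match_id, match_relop, DIAGRAMS, next_token) are hypothetical, and the operator diagram is an assumption, because the '<' branch of the slide's figure is not fully recoverable.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

def match_id(buf, i):
    """ID diagram: a letter, then any number of letters or digits."""
    if i < len(buf) and buf[i] in LETTERS:
        j = i + 1
        while j < len(buf) and (buf[j] in LETTERS or buf[j] in DIGITS):
            j += 1
        return j
    return None

def match_relop(buf, i):
    """Assumed operator diagram: '>' or '<', optionally followed by '='."""
    if i < len(buf) and buf[i] in ("<", ">"):
        j = i + 1
        if j < len(buf) and buf[j] == "=":
            j += 1
        return j
    return None

# Diagrams are tried in this order, like the chain of start states in fail().
DIAGRAMS = [("ID", match_id), ("RELOP", match_relop)]

def next_token(buf, lex_begin):
    """Return (token, lexeme, new_lex_begin), or None if every diagram fails."""
    for name, diagram in DIAGRAMS:
        # Every attempt starts again at lex_begin, which is what
        # "forward = lex_begin" does in fail().
        forward = diagram(buf, lex_begin)
        if forward is not None:
            return name, buf[lex_begin:forward], forward
    return None
```

Returning the new position instead of mutating shared pointers keeps each diagram independent, so a failed attempt leaves nothing to undo.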

34 Any Questions?

