COMP 3438 – Part II - Lecture 3 Lexical Analysis II Par III: Finite Automata Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ. 1.


Overview of the Subject (COMP 3438)
Course organization:
Part I: Unix System Programming (Device Driver Development)
- Overview of Unix Sys. Prog.
- Process
- File System
- Overview of Device Driver Development
- Character Device Driver Development
- Introduction to Block Device Driver
Part II: Compiler Design
- Overview of Compiler Design
- Lexical Analysis (HW #3) (this lecture)
- Syntax Analysis (HW #4)

Review for the Previous Lecture
Lexical Analyzer: Input: Source Prog. Output: Tokens
Regular expressions (patterns) denote regular sets (languages).
Alphabet, String, Language
Operations: Union, Concatenation, Kleene closure
Definition:
1. ε is a RE denoting {ε}.
2. a in Σ is a RE denoting {a}.
3. If r and s are REs, then r|s is a RE, rs is a RE, and (r)* is a RE.
Two methods to implement: (1) Lex tool (2) Finite Automaton

The Outline
Lexical Specification → Regular expressions → NFA (Nondeterministic Finite Automaton) → DFA (Deterministic Finite Automaton) → Table-driven Implementation of DFA (Lexical Analyzer)

Part I: NFA, DFA and the Conversion

Recognizing Tokens
Regular Expression → specifies tokens
Finite Automaton → recognizes tokens
A language recognizer: a recognizer for a language L is a program that takes a string x and answers "yes" if x is a sentence of L and "no" otherwise.
Recognizer for L: Input: string x. Output: YES (x ∈ L) or NO (x ∉ L).

Nondeterministic Finite Automata
A finite automaton consists of 5 components (Σ, S, s0, F, move):
1) An input alphabet Σ
2) A set of states S
3) A start state s0 ∈ S
4) A set of accepting states F ⊆ S
5) A transition function move that maps state-symbol pairs to sets of states.

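Transition Graphs
[Figure: legend showing the symbols for a state, the start state, an accepting state, and a transition labelled a.]
A simple example: a finite automaton that accepts only "1".
[Figure: transition graph with a single transition on 1 from the start state.]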

Examples
Alphabet: Σ = {0,1}
- A finite automaton accepting any number of 1's followed by a single 0.
- What language does the following automaton recognize?
[Figures: transition graphs captioned "NFA accepting 1*" and "NFA accepting (1|0)*10".]

State Transition Table
The transition function of an NFA can also be implemented by a transition table, where the rows correspond to states and the columns to input symbols.
[Figure: NFA with states 0, 1, 2 and transitions on a and b.]
STATE    Input symbol a    Input symbol b
0        {0, 1}            {0}
1        -                 {2}
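The transition table above maps naturally onto a dictionary keyed by (state, symbol). The following is a minimal sketch in Python, not the course's implementation. It assumes the three-state NFA read off the table (0 on a goes to {0, 1}, 0 on b goes to {0}, 1 on b goes to {2}) and assumes that state 2 is the accepting state, which the flattened figure does not show. The names NFA_TABLE, move and nfa_accepts are illustrative.

```python
# Transition table of the example NFA: (state, symbol) -> set of next states.
# A missing entry means "no transition" (the empty set).
NFA_TABLE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START = 0
ACCEPTING = {2}  # assumption: state 2 is the accepting state


def move(states, symbol):
    """Return every state reachable from any state in `states` on `symbol`."""
    result = set()
    for s in states:
        result |= NFA_TABLE.get((s, symbol), set())
    return result


def nfa_accepts(text):
    """Track the set of states the NFA could be in after each symbol."""
    current = {START}
    for ch in text:
        current = move(current, ch)
    # Accept if at least one possible path ended in an accepting state.
    return bool(current & ACCEPTING)
```

Under these assumptions the automaton accepts exactly the strings over {a, b} that end in ab, for example nfa_accepts("aab") is true and nfa_accepts("abb") is false.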

ε-Transition
Another kind of transition: the machine can move from state s1 to state s2 without reading any input.
Example: Σ = {0, 1}
[Figure: NFA with ε-transitions accepting 0 0* | 1 1*]

NFAs are hard to implement
Nondeterministic Finite Automata (NFA):
- Can have multiple transitions for one input in a given state
- Can have ε-transitions
- Hard to implement
Another kind of finite automata: Deterministic Finite Automata

Deterministic Finite Automata
A Deterministic Finite Automaton (DFA) is a special case of an NFA:
- One transition per input per state
- No ε-transitions

Examples for NFA and DFA
Σ = {a, b}
[Figure: an NFA (states 0 to 3) recognizing the language (a|b)*abb]
[Figure: a DFA (states 0 to 3) accepting (a|b)*abb]
NFA:
1. May move into multiple states on one input symbol.
2. May have ε-transitions.
DFA:
1. Moves into exactly one state on each input.
2. No ε-transitions.

Table Implementation of DFA
[Figure: DFA with states S, T, U and transitions on a and b.]
State    Input character a    Input character b
S        T                    U
T        T                    U
U        T                    U
Input strings: (1) aaaa (2) bbb (3) abababb (4) aaaaaab (5) babaaaa (6) bbbbbba

Algorithm 3.1: Simulating a DFA
Input: an input string x terminated by an end-of-file character eof; a DFA D with start state s0 and set of accepting states F.
Output: the answer "yes" if D accepts x, "no" otherwise.
Method: apply the following algorithm to the input string x. move(s, c) gives the state to which there is a transition from state s on input c; nextchar returns the next character of the input string x.
    s := s0;
    c := nextchar;
    while c ≠ eof do
        s := move(s, c);
        c := nextchar
    end;
    if s is in F then return "yes" else return "no";
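Algorithm 3.1 can be tried out on the S/T/U table from the "Table Implementation of DFA" slide. This is a hedged sketch in Python: the table entries come from that slide, but the accepting state was lost with the figure, so ACCEPTING = {"U"} is purely an assumption for illustration. The name simulate_dfa is also illustrative.

```python
# Transition table from the "Table Implementation of DFA" slide.
DFA_TABLE = {
    "S": {"a": "T", "b": "U"},
    "T": {"a": "T", "b": "U"},
    "U": {"a": "T", "b": "U"},
}
START = "S"
ACCEPTING = {"U"}  # assumption: the slide's accepting state is not recoverable


def simulate_dfa(text):
    """Algorithm 3.1: follow exactly one transition per input character."""
    state = START                     # s := s0
    for ch in text:                   # the loop stands in for nextchar / eof
        state = DFA_TABLE[state][ch]  # s := move(s, c)
    return "yes" if state in ACCEPTING else "no"
```

With U assumed accepting, the DFA accepts the strings that end in b, so of the six input strings on the slide, (2), (3) and (4) would be answered "yes". A different choice of accepting state changes that answer, not the algorithm.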

NFA to DFA
A DFA is a special case of an NFA:
- No ε-transitions
- One transition per input per state
The conversion of an NFA to a DFA needs to solve both:
- ε-closure(s): the set of all states reachable from s on ε-transitions alone, including s itself (removes ε-transitions)
- Consider all input symbols from each state (gives one transition per input per state)

An Example
[Figure: an NFA with states s1 to s5 and ε-transitions, and the DFA built from it with states T1, T2, T3 and DUMP.]
T1 = ε-closure(s1) = {s1, s2, s3}
ε-closure(move(T1, 0)) = {s4} = T2
ε-closure(move(T1, 1)) = {s5} = T3
ε-closure(move(T2, 0)) = {s4} = T2
ε-closure(move(T2, 1)) = DUMP
ε-closure(move(T3, 0)) = DUMP
ε-closure(move(T3, 1)) = {s5} = T3
ε-closure(move(DUMP, 0)) = DUMP
ε-closure(move(DUMP, 1)) = DUMP
An accepting state of the DFA is a state containing at least one accepting state of the NFA.

Conversion Algorithm (NFA → DFA)
Input: an NFA N (s0 is the start state)
Output: a DFA D (Dstates: the set of DFA states; Dtran: the transition function)
Algorithm:
    T0 = ε-closure(s0);
    T0 is unmarked and Dstates = {T0};
    while there is an unmarked state T in Dstates
        mark T;
        for each input symbol a
            U = ε-closure(move(T, a));
            if U is not in Dstates then
                add U as an unmarked state to Dstates;
            Dtran(T, a) = U;
        end for
    end while
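The conversion algorithm can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the NFA edges below are not given in the transcript and were reconstructed so that they reproduce the closures of the earlier example (T1 = {s1, s2, s3}, T2 = {s4}, T3 = {s5}); the accepting set {s4, s5} is likewise assumed. The empty set plays the role of DUMP, and all names are illustrative.

```python
EPS = ""  # label used for epsilon-transitions in this sketch

# Assumed edges of the example NFA for 0 0* | 1 1* (reconstructed, see above).
NFA = {
    ("s1", EPS): {"s2", "s3"},
    ("s2", "0"): {"s4"},
    ("s4", "0"): {"s4"},
    ("s3", "1"): {"s5"},
    ("s5", "1"): {"s5"},
}
NFA_ACCEPTING = {"s4", "s5"}  # assumption
ALPHABET = ["0", "1"]


def eps_closure(states):
    """All states reachable from `states` using only epsilon-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` on one occurrence of `symbol`."""
    result = set()
    for s in states:
        result |= NFA.get((s, symbol), set())
    return result


def subset_construction(start):
    t0 = eps_closure({start})
    dstates = {t0}
    unmarked = [t0]
    dtran = {}
    while unmarked:
        t = unmarked.pop()              # mark T
        for a in ALPHABET:
            u = eps_closure(move(t, a))
            if u not in dstates:        # new DFA state discovered
                dstates.add(u)
                unmarked.append(u)
            dtran[(t, a)] = u
    # A DFA state accepts if it contains at least one accepting NFA state.
    accepting = {t for t in dstates if t & NFA_ACCEPTING}
    return t0, dstates, dtran, accepting
```

Calling subset_construction("s1") yields four DFA states: {s1, s2, s3}, {s4}, {s5} and the empty set, which matches T1, T2, T3 and DUMP. Using frozenset for DFA states lets them serve as dictionary keys in Dtran.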

Another Example
[Figure: NFA with states 0 to 10 and ε-transitions for (a|b)*abb.]
T1 = ε-closure(0) = {0,1,2,4,7}
move(T1,a) = {3,8}     ε-closure(move(T1,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = T2
move(T1,b) = {5}       ε-closure(move(T1,b)) = ε-closure({5}) = {1,2,4,5,6,7} = T3
move(T2,a) = {3,8}     ε-closure(move(T2,a)) = ε-closure({3,8}) = T2
move(T2,b) = {5,9}     ε-closure(move(T2,b)) = ε-closure({5,9}) = {1,2,4,5,6,7,9} = T4
move(T3,a) = {3,8}     ε-closure(move(T3,a)) = ε-closure({3,8}) = T2
move(T3,b) = {5}       ε-closure(move(T3,b)) = ε-closure({5}) = T3
move(T4,a) = {3,8}     ε-closure(move(T4,a)) = ε-closure({3,8}) = T2
move(T4,b) = {5,10}    ε-closure(move(T4,b)) = ε-closure({5,10}) = {1,2,4,5,6,7,10} = T5
move(T5,a) = {3,8}     ε-closure(move(T5,a)) = ε-closure({3,8}) = T2
move(T5,b) = {5}       ε-closure(move(T5,b)) = ε-closure({5}) = T3

Another Example (cont.)
Transition table for the DFA states T1 to T5 computed on the previous slide:
State    a     b
T1       T2    T3
T2       T2    T4
T3       T2    T3
T4       T2    T5
T5       T2    T3

States of DFA obtained from NFA
An NFA may be in many states at any time.
If there are N states, the NFA must be in some subset of those N states.
How many subsets are there? A set with N elements has 2^N subsets.
So the DFA has at most 2^N states, where N is the number of states of the NFA.

Part II. Regular Expression to NFA

Regular Expression to NFA
Lexical Specification → Regular expressions → NFA (Nondeterministic Finite Automaton) → DFA (Deterministic Finite Automaton) → Table-driven Implementation of DFA (Lexical Analyzer)

Algorithm 3.3 (Thompson's construction): An NFA from a regular expression
Input: a regular expression r over an alphabet Σ.
Output: an NFA N accepting L(r).
Method: defined by rules for the following cases:
1. For ε, construct the NFA [Figure: start state i with an ε-transition to accepting state f]
2. For a in Σ, construct the NFA [Figure: start state i with a transition on a to accepting state f]
3. Suppose N(s) and N(t) are NFAs for regular expressions s and t. We have the following four sub-cases; the NFAs to be constructed are shown on the next slide.
   a) the regular expression s | t
   b) the regular expression st
   c) the regular expression s*
   d) the regular expression (s)

[Figures: NFA for s|t; NFA for st; NFA for s*, each drawn with start state i and accepting state f around N(s) and N(t), connected by ε-transitions.]
NFA for st: the accepting state of N(s) and the start state of N(t) are merged.
NFA for case (d), (s): N(s) itself.

Example 3.16
Using Algorithm 3.3, construct N(r) for the regular expression r = (a | b)*abb.
[Figure: step-by-step construction of the NFA for (a|b)*abb, with transitions on a and b joined by ε-transitions.]
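The rules of Algorithm 3.3 can be sketched in Python as follows. This is an illustrative sketch, not the textbook construction verbatim: regular expressions are given as nested tuples (an assumed representation), and for concatenation the sketch links the accepting state of N(s) to the start state of N(t) with an ε-edge instead of merging the two states as the slide describes. The accepted language is the same, at the cost of one extra ε-transition per concatenation. Builder, closure, matches and R are hypothetical names.

```python
import itertools

EPS = ""  # label for epsilon-edges in this sketch


class Builder:
    def __init__(self):
        self.edges = {}                # (state, label) -> set of states
        self.ids = itertools.count()   # fresh state numbers 0, 1, 2, ...

    def add(self, src, label, dst):
        self.edges.setdefault((src, label), set()).add(dst)

    def build(self, r):
        """Return (start, accept) of an NFA fragment for the regex tree r."""
        kind = r[0]
        if kind == "cat":              # rule 3b: st, joined by an epsilon-edge
            si, sf = self.build(r[1])
            ti, tf = self.build(r[2])
            self.add(sf, EPS, ti)
            return si, tf
        i, f = next(self.ids), next(self.ids)
        if kind == "eps":              # rule 1: epsilon
            self.add(i, EPS, f)
        elif kind == "sym":            # rule 2: a single symbol
            self.add(i, r[1], f)
        elif kind == "alt":            # rule 3a: s | t
            for sub in r[1:]:
                si, sf = self.build(sub)
                self.add(i, EPS, si)
                self.add(sf, EPS, f)
        elif kind == "star":           # rule 3c: s*
            si, sf = self.build(r[1])
            self.add(i, EPS, si)       # enter the loop body
            self.add(i, EPS, f)        # or skip it entirely
            self.add(sf, EPS, si)      # repeat the body
            self.add(sf, EPS, f)       # or leave the loop
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return i, f


def closure(edges, states):
    """Epsilon-closure of a set of states."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in edges.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def matches(r, text):
    """Build the NFA for r and simulate it on text."""
    b = Builder()
    start, accept = b.build(r)
    current = closure(b.edges, {start})
    for ch in text:
        step = set()
        for s in current:
            step |= b.edges.get((s, ch), set())
        current = closure(b.edges, step)
    return accept in current


# r = (a | b)* a b b from Example 3.16, written as a tree.
A, B = ("sym", "a"), ("sym", "b")
R = ("cat", ("cat", ("cat", ("star", ("alt", A, B)), A), B), B)
```

For instance, matches(R, "aabb") is true and matches(R, "abba") is false. Parentheses (case d) need no rule of their own here because grouping is already expressed by the tree structure.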

Part III. Homework

Homework - Tokens
Sample Prog:
    var a, b, c;
    begin
    a = 5;
    b = 6;
    c = (a + b) / 2.0
    end.
What kinds of tokens do we need to recognize?
- Keywords: var, begin, end
- ID: a, b, var1
- Number: 2.0
- Operators: /, =
- Delimiter: ; . ( )
- Whitespace: '\n', '\t', ' '

Homework - Regular Expression
Sample Prog:
    var a, b, c;
    begin
    a = 5;
    b = 6;
    c = (a + b) / 2.0
    end.
Regular Expression:
LETTER → [a-zA-Z]
DIGIT → [0-9]
KEYWORD → var | begin | end
ID → LETTER (LETTER|DIGIT)*
…
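The four definitions above can be tried out with Python's standard re module. This is only a quick way to experiment with the patterns, assuming nothing beyond the definitions on the slide; the homework itself asks for a finite-automaton implementation, and the helper names is_id and is_keyword are illustrative.

```python
import re

# Named sub-patterns, composed the same way as on the slide.
LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
KEYWORD = r"var|begin|end"
ID = rf"{LETTER}(?:{LETTER}|{DIGIT})*"   # LETTER (LETTER|DIGIT)*


def is_id(lexeme):
    """True if the whole lexeme matches the ID pattern."""
    return re.fullmatch(ID, lexeme) is not None


def is_keyword(lexeme):
    """True if the whole lexeme is one of the keywords."""
    return re.fullmatch(KEYWORD, lexeme) is not None
```

Note that every keyword also matches the ID pattern: is_id("begin") and is_keyword("begin") are both true, so a lexer has to decide which of the two wins.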

Implementation - Buffer & Pointers
The input is read into a buffer.
Use two pointers:
- lex_begin: the beginning pointer
- forward: the lookahead pointer
To deal with failure, simply move the pointer "forward" back.
[Figure: input buffer holding the characters of the source program, with lex_begin and forward pointing into it.]
See Section 3.2 for buffers and pointers.

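Homework – Keywords & ID
[Figure: transition diagram for ID: state 0 goes to state 1 on a letter; state 1 loops on letter or digit; state 1 goes to state 2 (marked *) on any other character.]
A keyword is identified by this automaton as an ID. So after an ID is obtained, check a keyword table to see whether it is an ID or a keyword. This significantly reduces the number of states.
On reaching state 2:
1. Return token ID;
2. Use lex_begin and forward to get the lexeme;
3. Adjust lex_begin and forward.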
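The ID diagram plus keyword-table lookup can be sketched as follows. This is a minimal Python illustration under assumptions: the buffer is a plain string, lex_begin and forward are integer indices rather than pointers, the token names "ID" and "KEYWORD" are placeholders, and the function scan_word is hypothetical. The keyword set is the one listed on the earlier homework slide.

```python
KEYWORDS = {"var", "begin", "end"}  # keyword table


def is_letter(ch):
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_digit(ch):
    return "0" <= ch <= "9"


def scan_word(buffer, lex_begin):
    """Run the ID diagram starting at lex_begin.

    Returns (token, lexeme, new_lex_begin), or None if no ID starts here.
    """
    forward = lex_begin
    # State 0: the first character must be a letter.
    if forward >= len(buffer) or not is_letter(buffer[forward]):
        return None
    forward += 1
    # State 1: stay here while we see letters or digits.
    while forward < len(buffer) and (
        is_letter(buffer[forward]) or is_digit(buffer[forward])
    ):
        forward += 1
    # State 2: an "other" character (or end of input) ended the word;
    # forward is left on that character so it is not consumed.
    lexeme = buffer[lex_begin:forward]
    token = "KEYWORD" if lexeme in KEYWORDS else "ID"
    return token, lexeme, forward
```

For example, on the buffer "var a, b;" a call with lex_begin = 0 gives ("KEYWORD", "var", 3), and a call with lex_begin = 4 gives ("ID", "a", 5). The keyword check is a single set lookup after the automaton finishes, which is why no keyword-specific states are needed.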

Homework - Implement Transition Table
[Figure: transition diagram for ID (states 0 to 2, as on the previous slide) and transition diagram for operators (states 3 to 6, with edges labelled <, >, = and other; state 6 is marked *).]
    token nexttoken() {
        …
        switch (state) {
        case 0:
            c = nextchar();
            if (isletter(c)) state = 1;
            else state = fail();
            break;
        case ….
        …
        }
    }

    int fail() {
        forward = lex_begin;   /* move forward back to the start of the lexeme */
        switch (start) {
        case 0: start = 3; break;
        case 3: start = ….
        …
        }
        return start;
    }
See Section 3.4 (p. 98) for details.

Any Questions?