Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.


Overview of Previous Lesson(s)

Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.  These symbols may specify several paths, some of which lead to accepting states and some that don't.  In such a case the NFA does accept the string, one successful path is enough.  If an edge is labeled ε, then it can be taken for free. 3

Over View..  A deterministic finite automaton (DFA) is a special case of an NFA where:  There are no moves on input ε, secondly,  For each state S and input symbol a, there is exactly one edge out of s labeled a. 4

Over View...  Algorithm for converting any RE to an NFA.  The algorithm is syntax- directed, it works recursively up the parse tree for the regular expression.  For each sub-expression the algorithm constructs an NFA with a single accepting state. 5

Over View... Method:  Begin by parsing r into its constituent subexpressions.  The rules for constructing an NFA consist of basis rules for handling subexpressions with no operators.  Inductive rules for constructing larger NFA's from the NFA's for the immediate sub expressions of a given expression. 6

Over View... Basis Step:  For expression ε construct the NFA  For any sub-expression a in Σ construct the NFA 7

Over View... Induction Step:  Suppose N(s) and N(t) are NFA's for regular expressions s and t, respectively.  If r = s|t. Then N(r), the NFA for r, should be constructed as  N(r) accepts L(s) U L(t), which is the same as L(r). 8

Over View...  Now Suppose r = st, Then N(r), the NFA for r, should be constructed as  N(r) accepts L(s)L(t), which is the same as L(r). 9

Over View...  Now Suppose r = s*, Then N(r), the NFA for r, should be constructed as  N(r) accept all the strings in L(s) 1, L(s) 2, and so on, so the entire set of strings accepted by N(r) is L(s*).  Finally suppose r = (s), Then L(r) = L(s) and we can use the NFA N(s) as N(r). 10


Contents  Design of a Lexical-Analyzer Generator  The Structure of the Generated Analyzer  Pattern Matching Based on NFA 's  DFA's for Lexical Analyzers  Optimization of DFA-Based Pattern Matchers  Important States of an NFA 12

Lexical-Analyzer Design  Here we will see the designing technique in generating a lexical- analyzer.  We will discuss two approaches, based on NFA's and DFA's.  The program that serves as the lexical analyzer includes a fixed program that simulates an automaton.  The rest of the lexical analyzer consists of components that are created from the Lex program. 13

Structure of the Generated Analyzer  Its components are:  A transition table for the automaton.  Functions that are passed directly through Lex to the output.  The actions from the input program, which appear as fragments of code to be invoked by the automaton simulator. 14

Structure of the Generated Analyzer  Architecture of a lexical analyzer generated by Lex. 15

Structure of the Generated Analyzer  To construct the automaton, we begin by taking each regular- expression pattern in the Lex program and converting it to an NFA.  We need a single automaton that will recognize lexemes matching any of the patterns in the program.  So we combine all the NFA's into one by introducing a new start state with ɛ-transitions to each of the start states of the NFA's Ni for pattern Pi 16

Structure of the Generated Analyzer  An NFA constructed from a Lex program 17 a { action A 1 for pattern P 1 } abb { action A 2 for pattern P 2 } a*b + { action A n for pattern P n }

Pattern Matching Based on NFA 's  For pattern based matching the simulator starts reading characters and calculates the set of states.  At some point the input character does not lead to any state or we have reached the eof.  Since we wish to find the longest lexeme matching the pattern we proceed backwards from the current point (where there was no state) until we reach an accepting state (i.e., the set of NFA states, N-states, contains an accepting N-state).  Each accepting N-state corresponds to a matched pattern.  The lex rule is that if a lexeme matches multiple patterns we choose the pattern listed first in the lex-program. 18

Pattern Matching Based on NFA's..  Ex. Consider three patterns and their associated actions and consider processing the input aaba. 19 aAction A 1 abb Action A 2 a*b + Action A 3 Pattern Actions to perform

Pattern Matching Based on NFA's…  We begin by constructing the three NFAs. 20

Pattern Matching Based on NFA's…  We introduce a new start state and ε-transitions as discussed in the previous section. 21

Pattern Matching Based on NFA's…  We start at the ε-closure of the start state, which is {0,1,3,7}.  The first a (remember the input is aaba) takes us to {2,4,7}.  This includes an accepting state and indeed we have matched the first patten. However, we do not stop since we may find a longer match.  The next a takes us to {7} and next b takes us to {8}.  The next a fails since there are no a-transitions out of state 8. 22

Pattern Matching Based on NFA's…  We are back in {8} and ask if one of these N-states is an accepting state.  Indeed state 8 is accepting for third pattern.  Action3 would now be performed. 23

DFA for Lexical Analyzer  In this section we see an architecture to convert the NFA for all the patterns into an equivalent DFA, using the subset construction mechanism of DFA from NFA.  Within each DFA state, if there are one or more accepting NFA states, determine the first pattern whose accepting state is represented, and make that pattern the output of the DFA state. 24

DFA for Lexical Analyzer..  A transition graph for the DFA handling the patterns a, abb and a*b + that is constructed by the subset construction from the NFA. 25

DFA for Lexical Analyzer…  The accepting states are labeled by the pattern that is matched by that state.  For instance, the state {6, 8 } has two accepting states, corresponding to patterns abb and a*b +.  Since the former is listed first, that is the pattern associated with state {6,8}. 26

DFA for Lexical Analyzer…  In the diagram, when there is no NFA state possible, we do not show the edge.  Technically we should show these edges, all of which lead to the same D-state, called the dead state, and corresponds to the empty subset of N-states. 27

Optimization of DFA-based Pattern Matchers  Now we will talk about some algorithms that have been used to implement and optimize pattern matchers constructed from regular expressions.  The first algorithm is useful in a Lex compiler, because it constructs a DFA directly from a regular expression, without constructing an intermediate NFA. The resulting DFA also may have fewer states than the DFA constructed via an NFA. 28

Optimization of DFA-based Pattern Matchers..  The second algorithm minimizes the number of states of any DFA, by combining states that have the same future behavior.  The algorithm itself is quite efficient, running in time O(n log n), where n is the number of states of the DFA.  The third algorithm produces more compact representations of transition tables than the standard, two-dimensional table. 29

Important States of an NFA  Prior to begin our discussion of how to go directly from a regular expression to a DFA, we must first dissect the NFA construction and consider the roles played by various states.  We call a state of an NFA important if it has a non-ɛ out-transition.  The subset construction uses only the important states in a set T when it computes ɛ- closure (move(T, a)), the set of states reachable from T on input a. 30

Important States of an NFA..  During the subset construction, two sets of NFA states can be identified if they:  Have the same important states, and  Either both have accepting states or neither does.  The important states are those introduced as initial states in the basis part for a particular symbol position in the regular expression. 31

Important States of an NFA...  The constructed NFA has only one accepting state, but this state, having no out-transitions, is not an important state.  By concatenating a unique right endmarker # to a regular expression r, we give the accepting state for r a transition on #, making it an important state of the NFA for (r) #.  The important states of the NFA correspond directly to the positions in the regular expression that hold symbols of the alphabet. 32

Important States of an NFA...  It is useful to present the regular expression by its syntax tree, where the leaves correspond to operands and the interior nodes correspond to operators.  An interior node is called a cat-node, or-node, or star-node if it is labeled by the concatenation operator (dot), union operator I, or star operator *, respectively. 33

Important States of an NFA...  Ex. Syntax tree for (a|b)*abb# 34

Thank You