Chapter 2 Scanning: From Regular Expression to DFA
Gang S. Liu, College of Computer Science & Technology, Harbin Engineering University



From Regular Expression to DFA
Regular expression → NFA → DFA → Program

From a Regular Expression to NFA
- The construction we will describe is known as Thompson's construction.
- It uses ε-transitions to "glue together" the machines for each piece of a regular expression.

Basic Regular Expression
[Figure: NFAs for the basic regular expressions a, ε, and Φ.]

Concatenation
- Clearly, this machine accepts L(rs) = L(r)L(s) and corresponds to the regular expression rs.
[Figure: the NFA for regular expression r joined by an ε-transition to the NFA for regular expression s, giving the NFA for regular expression rs.]

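Choice among Alternatives
- We add a new start state and a new accepting state and connect them to the machines for r and s using ε-transitions.
- This machine accepts L(r | s) = L(r) ∪ L(s).
[Figure: the NFAs for r and s in parallel, joined to the new start state and the new accepting state by four ε-transitions.]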

Repetition
- This machine corresponds to r*.
[Figure: the NFA for r with four ε-transitions added.]
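The four constructions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the slides' own code: the `Fragment` class, the function names, and the use of `None` as the ε label are all assumptions made here. Concatenation glues the two machines with a single ε-transition, and choice and repetition each add a new start state and a new accepting state.

```python
from dataclasses import dataclass
from itertools import count

EPS = None            # label standing for an ε-transition (assumption of this sketch)
_next_id = count(1)   # hands out fresh state numbers 1, 2, 3, ...


@dataclass
class Fragment:
    """An NFA piece with one start state and one accepting state."""
    start: int
    accept: int
    edges: list       # list of (source, label, target) triples


def symbol(a):
    """Basic regular expression a: start --a--> accept."""
    s, t = next(_next_id), next(_next_id)
    return Fragment(s, t, [(s, a, t)])


def epsilon():
    """Basic regular expression ε: start --ε--> accept."""
    return symbol(EPS)


def concat(r, s):
    """rs: glue the accepting state of r to the start state of s with ε."""
    return Fragment(r.start, s.accept,
                    r.edges + s.edges + [(r.accept, EPS, s.start)])


def choice(r, s):
    """r|s: new start and accepting states, connected by four ε-transitions."""
    start, accept = next(_next_id), next(_next_id)
    glue = [(start, EPS, r.start), (start, EPS, s.start),
            (r.accept, EPS, accept), (s.accept, EPS, accept)]
    return Fragment(start, accept, r.edges + s.edges + glue)


def star(r):
    """r*: new start and accepting states; ε-edges allow skipping or repeating r."""
    start, accept = next(_next_id), next(_next_id)
    glue = [(start, EPS, r.start), (r.accept, EPS, accept),
            (r.accept, EPS, r.start), (start, EPS, accept)]
    return Fragment(start, accept, r.edges + glue)


def states(fragment):
    """All states mentioned in the fragment's edges."""
    return {s for s, _, _ in fragment.edges} | {t for _, _, t in fragment.edges}


# NFA for ab|a
nfa = choice(concat(symbol("a"), symbol("b")), symbol("a"))
print(len(states(nfa)), len(nfa.edges))  # 8 8
```

Keeping exactly one start state and one accepting state per fragment is what lets every construction treat its operands as black boxes.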

Example 2.12
- Translate the regular expression ab|a into an NFA.
[Figure: step-by-step construction of the NFA for ab|a.]

Example 2.13
- letter(letter|digit)*
[Figure: NFA for letter(letter|digit)*, built from the machines for letter and digit joined by ε-transitions.]

From Regular Expression to DFA
Regular expression → NFA → DFA → Program

From NFA to DFA
- We need some method for eliminating ε-transitions and multiple transitions from a state on a single input character.
- Eliminating ε-transitions involves the construction of ε-closures.
- Eliminating multiple transitions involves keeping track of sets of states instead of single states.

ε-Closure
- The ε-closure of a single state s is the set of states reachable from s by zero or more ε-transitions. We denote this set by s̄ (s with an overbar).
- The ε-closure of a state always contains the state itself.

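Example 2.14
- a*
[Figure: NFA for a* with states 1 to 4: ε-transitions from 1 to 2, 1 to 4, 3 to 2, and 3 to 4, and an a-transition from 2 to 3.]
- 1̄ = {1, 2, 4}
- 2̄ = {2}
- 3̄ = {2, 3, 4}
- 4̄ = {4}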

ε-Closure of a Set of States
- The ε-closure of a set of states is defined as the union of the ε-closures of the individual states.
- If S = {s1, s2, …, sn} is a set of states, then S̄ = s̄1 ∪ s̄2 ∪ … ∪ s̄n.
- In the previous example we had 1̄ = {1, 2, 4} and 3̄ = {2, 3, 4}. Let S = {1, 3}. Then S̄ = 1̄ ∪ 3̄ = {1, 2, 4} ∪ {2, 3, 4} = {1, 2, 3, 4}.
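A minimal sketch of the ε-closure computation, using a worklist. The dictionary encoding of the NFA and the name `eps_closure` are assumptions of this sketch; the transition table is a reading of the a* machine from Example 2.14 that reproduces the closures listed there.

```python
EPS = None  # label standing for an ε-transition (assumption of this sketch)

# NFA for a*: state -> {label: set of target states}
A_STAR = {
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {2, 4}},
    4: {},
}


def eps_closure(states, nfa):
    """Set of states reachable from `states` by zero or more ε-transitions."""
    closure = set(states)      # zero transitions: every state reaches itself
    worklist = list(states)
    while worklist:
        s = worklist.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in closure:
                closure.add(t)
                worklist.append(t)
    return closure


print(sorted(eps_closure({1}, A_STAR)))     # [1, 2, 4]
print(sorted(eps_closure({1, 3}, A_STAR)))  # [1, 2, 3, 4]
```

Passing a set of states directly gives the closure of a set, which equals the union of the closures of its members.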

The Subset Construction
- Given an NFA M.
- We need to construct a corresponding DFA M′.
1. Compute the ε-closure of the start state of M; this becomes the start state of M′.
2. For this set and for each subsequent set, compute the transitions on each character a as follows:
   1. Given a set of states S and a character a, compute the set of states S′_a = {t | for some s in S there is a transition from s to t on a}.
   2. Compute S̄′_a, the ε-closure of S′_a. This becomes a new state. There is a transition from S to S̄′_a on the character a.
3. Continue this process until no new states or transitions are created.
4. Mark as accepting those constructed states that contain an accepting state of M.
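The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the NFA is encoded as a dictionary, `None` stands for ε, DFA states are `frozenset`s of NFA states, and the function names are made up for this sketch. The a* machine used for the demonstration is the same reading of Example 2.14 assumed earlier.

```python
from collections import deque

EPS = None  # label standing for an ε-transition (assumption of this sketch)


def eps_closure(states, nfa):
    """Set of states reachable from `states` by zero or more ε-transitions."""
    closure, worklist = set(states), list(states)
    while worklist:
        s = worklist.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in closure:
                closure.add(t)
                worklist.append(t)
    return closure


def move(states, a, nfa):
    """Step 2.1: states reachable from some s in `states` by one a-transition."""
    return {t for s in states for t in nfa.get(s, {}).get(a, ())}


def subset_construction(nfa, start, accepting, alphabet):
    """Return (DFA start state, DFA transition table, DFA accepting states)."""
    dfa_start = frozenset(eps_closure({start}, nfa))           # step 1
    dfa = {}
    worklist = deque([dfa_start])
    while worklist:                                            # step 3
        S = worklist.popleft()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:                                     # step 2
            T = frozenset(eps_closure(move(S, a, nfa), nfa))   # step 2.2
            if T:                                              # empty set = error, omitted
                dfa[S][a] = T
                worklist.append(T)
    dfa_accepting = {S for S in dfa if S & accepting}          # step 4
    return dfa_start, dfa, dfa_accepting


A_STAR = {1: {EPS: {2, 4}}, 2: {"a": {3}}, 3: {EPS: {2, 4}}, 4: {}}
start, dfa, accepting = subset_construction(A_STAR, 1, {4}, "a")
print(sorted(start))           # [1, 2, 4]
print(sorted(dfa[start]["a"])) # [2, 3, 4]
```

Omitting transitions to the empty set keeps the table small; a missing entry then plays the role of the error state.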

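Example 2.15
[Figure: the NFA for a* from Example 2.14 and the resulting DFA with states {1, 2, 4} and {2, 3, 4}.]
- 1̄ = {1, 2, 4}: start state.
- {1, 2, 4}_a = {3}, whose ε-closure is {2, 3, 4}: new state and transition.
- {2, 3, 4}_a = {3}, whose ε-closure is {2, 3, 4}: no new state, new transition.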

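Example 2.16
[Figure: the eight-state NFA for ab|a and the resulting DFA with states {1, 2, 6}, {3, 4, 7, 8}, and {5, 8}; the DFA has an a-transition from {1, 2, 6} to {3, 4, 7, 8} and a b-transition from {3, 4, 7, 8} to {5, 8}.]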

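Example 2.17
[Figure: the NFA for letter(letter|digit)* and the resulting DFA, whose states include {2, 3, 4, 5, 7, 10}, {4, 5, 6, 7, 9, 10}, and {4, 5, 7, 8, 9, 10}, with transitions on letter and digit.]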

From Regular Expression to DFA
Regular expression → NFA → DFA → Program

Minimizing Number of States
- The resulting DFA may be more complex than necessary. In Example 2.15 we obtained a DFA for a*, but there is a simpler DFA.
- Important result from automata theory: for any given DFA there is an equivalent DFA containing a minimum number of states, and it is unique (up to renaming of states).
[Figure: the two-state DFA from Example 2.15 with states {1, 2, 4} and {2, 3, 4}, next to a simpler DFA for a*.]

Minimizing Number of States: Algorithm
1. Create two sets of states: all accepting states and all non-accepting states.
2. For each set of states, consider the transitions on each character a of the alphabet. If all states in the set have transitions on a to the same set of states, then this defines an a-transition from the set to that target set (possibly itself). If there are two states s and t in the set that have transitions on a that land in different sets, we must split the set according to where the a-transitions land.
3. Repeat step 2 until either every set contains a single state or no further splitting occurs.
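The splitting procedure above can be sketched as a simple partition refinement. The encoding is an assumption of this sketch: `delta` maps `(state, character)` to a state, and a missing entry is treated as a transition to the error state. The three-state table used for the demonstration is a hypothetical DFA for (a|ε)b*, chosen so that all states accept and only state 1 has an a-transition.

```python
def minimize(states, alphabet, delta, accepting):
    """Group the states of a DFA into sets of indistinguishable states."""
    # Step 1: accepting versus non-accepting (empty sets are dropped).
    partition = [set(accepting), set(states) - set(accepting)]
    partition = [block for block in partition if block]

    changed = True
    while changed:                      # Step 3: repeat until nothing splits
        changed = False
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:         # Step 2: split by where transitions land
            groups = {}
            for s in block:
                # Signature: for each character, the index of the target set
                # (None when the transition goes to the error state).
                key = tuple(block_of.get(delta.get((s, a))) for a in alphabet)
                groups.setdefault(key, set()).add(s)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition


# Hypothetical DFA for (a|ε)b*: every state is accepting.
delta = {(1, "a"): 2, (1, "b"): 3, (2, "b"): 3, (3, "b"): 3}
blocks = minimize([1, 2, 3], "ab", delta, {1, 2, 3})
print(sorted(sorted(block) for block in blocks))  # [[1], [2, 3]]
```

Each returned set becomes one state of the minimized DFA; its transitions are those of any one of its members, mapped to the sets that contain the targets.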

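Example 2.18
[Figure: the DFA for letter(letter|digit)* from Example 2.17, with states {2, 3, 4, 5, 7, 10}, {4, 5, 6, 7, 9, 10}, and {4, 5, 7, 8, 9, 10} and transitions on letter and digit, together with its minimized form.]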

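Example 2.19
- (a|ε)b*
- All states are accepting states. Each accepting state has a b-transition to another accepting state.
- The character a distinguishes state 1 from states 2 and 3: there is an a-transition to the error state from states 2 and 3.
[Figure: the original DFA with states 1, 2, and 3, and the minimized DFA with states 1 and {2, 3}, with transitions on a and b.]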

DFA for Special Symbols
- All special symbols except assignment are distinct single characters.
- If we use a variable to indicate the token type, all accepting states can be collapsed into one state DONE.
[Figure: a DFA with one transition per special symbol: + returns PLUS, - returns MINUS, …, ; returns SEMI.]

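Adding Numbers and Identifiers
[Figure: DFA with states START, INNUM, INID, and DONE. Transitions are labeled digit, letter, and [other]; the special symbols + - * / = < ( ) ; lead from START to DONE.]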

Adding White Space, Comments, and Assignment
[Figure: the previous DFA extended with the states INASSIGN and INCOMMENT. Additional transition labels: white space, {, }, :, =, other, and [other].]
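A DFA of this shape is usually turned into a program by keeping the current state in a variable and looping until DONE is reached. The sketch below is one possible reading of the diagram, not the course's scanner. Several points are assumptions: a bracketed label such as [other] is taken to mean a lookahead character that is not consumed, numbers are runs of digits, identifiers are runs of letters, `{ }` delimits comments, and `:=` is assignment. Only the token names PLUS, MINUS, and SEMI come from the slides; all other names are hypothetical.

```python
# Token names other than PLUS, MINUS and SEMI are hypothetical.
SPECIAL = {"+": "PLUS", "-": "MINUS", ";": "SEMI", "*": "TIMES", "/": "OVER",
           "=": "EQ", "<": "LT", "(": "LPAREN", ")": "RPAREN"}


def scan(text):
    """Return a list of (token type, lexeme) pairs for `text`."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        state, lexeme, kind = "START", "", None
        while state != "DONE":
            ch = text[i] if i < n else ""   # "" stands for end of input
            consume = True
            if state == "START":
                if ch == "":
                    return tokens           # only white space or comments were left
                if ch.isdigit():
                    state = "INNUM"
                elif ch.isalpha():
                    state = "INID"
                elif ch == ":":
                    state = "INASSIGN"
                elif ch == "{":
                    state = "INCOMMENT"
                elif ch.isspace():
                    pass                    # white space loops on START
                else:
                    state, kind = "DONE", SPECIAL.get(ch, "ERROR")
            elif state == "INCOMMENT":
                if ch == "":
                    return tokens           # unterminated comment: stop quietly
                if ch == "}":
                    state = "START"
            elif state == "INASSIGN":
                state = "DONE"
                if ch == "=":
                    kind = "ASSIGN"
                else:                       # [other]: do not consume the lookahead
                    kind, consume = "ERROR", False
            elif state == "INNUM":
                if not ch.isdigit():        # [other]
                    state, kind, consume = "DONE", "NUM", False
            elif state == "INID":
                if not ch.isalpha():        # [other]
                    state, kind, consume = "DONE", "ID", False
            if consume:
                i += 1
                if state not in ("START", "INCOMMENT"):
                    lexeme += ch            # white space and comments are dropped
        tokens.append((kind, lexeme))
    return tokens


print(scan("x := 12 + y { note } ;"))
# [('ID', 'x'), ('ASSIGN', ':='), ('NUM', '12'), ('PLUS', '+'), ('ID', 'y'), ('SEMI', ';')]
```

Using one DONE state with a separate `kind` variable mirrors the idea of collapsing all accepting states into one and recording the token type in a variable.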

Homework 2.12
a. Use Thompson's construction to convert the regular expression (a | b)* a (a | b | ε) into an NFA.
b. Convert the NFA of part (a) into a DFA using the subset construction.