Lexical Analysis IV : NFA to DFA DFA Minimization

Lexical Analysis IV : NFA to DFA, DFA Minimization
Lecture 5, CS 4318/5331, Apan Qasem, Texas State University, Spring 2015
*some slides adapted from Cooper and Torczon

Announcements
REU programs in summer

Review
DFA vs. NFA: NFAs are DFAs that allow non-determinism, i.e., empty (ε) transitions and multiple transitions on the same symbol. NFAs and DFAs recognize the same set of languages, and for every RE there exists a DFA. We do not convert REs directly to DFAs here; we build an NFA first and then convert it.

Review: RE to NFA
(figure: Thompson NFAs for a, a*, ab, and a|b)

Thompson’s Construction: NFA properties
Each NFA has a single start state and a single final state. The only transition that enters the initial state is the initial transition. No transitions leave the final state. An empty (ε) transition always connects two states that were start or final states of a component NFA. A state has at most two entering and two exiting empty transitions. Try to convince yourself that these properties hold.

Cycle of Construction
RE → NFA (Thompson’s construction) → DFA (subset construction) → minimized DFA (Hopcroft’s algorithm) → code

Construct NFA for (a|b)*aa
Step 1: Construct trivial NFAs (figure: s0 -a-> s1 for a, and s0 -b-> s1 for b)

Example : NFA for (a|b)*aa
Step 2: Work inside parentheses, a | b (figure: the trivial NFAs for a and b joined by a new start state and a new final state, before renaming)

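Example : NFA for (a|b)*aa
Step 2: Work inside parentheses, a | b, renaming states (figure: s0 is the new start state with ε-transitions to s1 and s2; s1 -a-> s3 and s2 -b-> s4; s3 and s4 have ε-transitions to the new final state s5)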

Example : NFA for (a|b)*aa
Step 3: * (closure) (figure: a new start state and a new final state added around the a|b NFA, before renaming)

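Example : NFA for (a|b)*aa
Step 3: * (closure), renaming states (figure: s0 is the new start state and s7 the new final state; s1 and s6 are the start and final states of the a|b NFA, which now uses s2 -a-> s4 and s3 -b-> s5)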

Example : NFA for (a|b)*aa
Step 4: concatenation with a (figure: an ε-transition links the closure NFA to a new NFA s8 -a-> s9)

Example : NFA for (a|b)*aa
Step 5: concatenation with the second a (figure: an ε-transition links s9 to a new NFA s10 -a-> s11)

Example : NFA for (a|b)*aa
Eliminating empty transitions for concatenation (figure: the final NFA with states s0 through s9, where s7 -a-> s8 -a-> s9 and s9 is the final state)
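The construction steps above can be sketched in Python. This is a minimal sketch, not the course's reference implementation. The `NFA` record, the hypothetical `ThompsonBuilder` helper, and the `"eps"` label for ε-transitions are all assumptions of this example, and its state numbers differ from the slides. It keeps the ε-transitions introduced by concatenation (as in Step 5), so it produces 12 states rather than the 10 that remain after eliminating them.

```python
from dataclasses import dataclass, field

EPS = "eps"  # label used for ε-transitions in this sketch (assumption)

@dataclass
class NFA:
    start: int
    final: int
    edges: list = field(default_factory=list)  # (source, label, target) triples

class ThompsonBuilder:
    """Hypothetical helper: builds Thompson NFAs and hands out fresh state numbers."""

    def __init__(self):
        self.next_state = 0

    def _new_state(self):
        s = self.next_state
        self.next_state += 1
        return s

    def symbol(self, c):
        # Trivial NFA: start --c--> final
        s, f = self._new_state(), self._new_state()
        return NFA(s, f, [(s, c, f)])

    def concat(self, m, n):
        # m's final state gets an ε-transition to n's start state
        return NFA(m.start, n.final, m.edges + n.edges + [(m.final, EPS, n.start)])

    def alt(self, m, n):
        # New start with ε to both operands; both operands ε to a new final
        s, f = self._new_state(), self._new_state()
        return NFA(s, f, m.edges + n.edges + [
            (s, EPS, m.start), (s, EPS, n.start),
            (m.final, EPS, f), (n.final, EPS, f),
        ])

    def star(self, m):
        # New start/final: ε to skip m entirely, ε from m's final back to its start
        s, f = self._new_state(), self._new_state()
        return NFA(s, f, m.edges + [
            (s, EPS, m.start), (s, EPS, f),
            (m.final, EPS, m.start), (m.final, EPS, f),
        ])

# (a|b)*aa, built bottom-up: trivial NFAs, then |, then *, then two concatenations
builder = ThompsonBuilder()
a_or_b = builder.alt(builder.symbol("a"), builder.symbol("b"))
closure = builder.star(a_or_b)
nfa = builder.concat(builder.concat(closure, builder.symbol("a")), builder.symbol("a"))

# Two of the Thompson properties listed earlier
assert all(dst != nfa.start for _, _, dst in nfa.edges)  # nothing enters the start state
assert all(src != nfa.final for src, _, _ in nfa.edges)  # nothing leaves the final state
print(builder.next_state, "states")
```

Each operator adds at most two states, which is why Thompson's NFAs grow linearly with the size of the RE.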

NFA to DFA
To convert NFAs to DFAs, we need to get rid of non-determinism. There are three cases of non-determinism in NFAs:
- a transition to a state without consuming any input (an ε-transition)
- multiple transitions on the same input symbol
- no transition on an input symbol (a DFA can handle this case with an explicit error state)

Examples of Non-determinism in NFAs

Examples of Non-determinism in NFAs
All we need to do is eliminate all non-deterministic transitions.

Subset Construction : Example
In state s2, on input a, the NFA can go to either s3 or s4 (figure: s2 -a-> s3 and s2 -a-> s4). Create a state for the DFA that represents the combined state {s3, s4}.

Subset Construction : Example
In state s2, on input a, the NFA can go to either s3 or s4. From s3, it can go to s5 and s6; from s4, it can go to s6. From s5 … and so on … Follow the path for each state in the combined state to create new DFA states.

NFA→DFA with Subset Construction
Main idea: for every DFA state (a set of NFA states), determine all NFA states reachable on every input symbol. That set of reachable states constitutes a single state in the converted DFA. Each state in the DFA corresponds to a subset of states in the NFA (hence the name). Keep finding reachable states for each new DFA state until no new states can be found.

Finding Reachable States
Two key functions:
- Move(si, a) is the set of states reachable from si by a single transition on a.
- ε-closure(si) is the set of states reachable from si by ε-transitions. It can follow multiple ε hops (hence “closure”), and it always includes si itself.
For the a|b NFA (figure: s0 -ε-> s1 -a-> s3 -ε-> s5 and s0 -ε-> s2 -b-> s4 -ε-> s5):
Move(s1, a)? {s3}. Move(s2, a)? empty. ε-closure(s0)? {s0, s1, s2}. ε-closure(s2)? {s2}.
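Here is a small sketch of the two functions. It assumes the figure is the Thompson NFA for a|b, with s0 -ε-> s1 -a-> s3 -ε-> s5 and s0 -ε-> s2 -b-> s4 -ε-> s5, where state sN is written as the integer N. The dictionary encoding and the `"eps"` label are also assumptions. Unlike the slide's definition, `move` takes a set of states, because that is how the subset construction calls it.

```python
EPS = "eps"  # label for ε-transitions in this sketch (assumption)

# Assumed reading of the figure: Thompson's NFA for a|b
delta = {
    (0, EPS): {1, 2},
    (1, "a"): {3},
    (2, "b"): {4},
    (3, EPS): {5},
    (4, EPS): {5},
}

def move(states, c):
    """States reachable from any state in `states` by exactly one c-transition."""
    result = set()
    for s in states:
        result |= delta.get((s, c), set())
    return result

def eps_closure(states):
    """All states reachable from `states` using zero or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

print(move({1}, "a"))            # {3}
print(move({2}, "a"))            # set()
print(sorted(eps_closure({0})))  # [0, 1, 2]
print(sorted(eps_closure({2})))  # [2]
```

The closure is a graph search restricted to ε-edges. The explicit stack avoids revisiting states, so the function terminates even when ε-transitions form cycles, as they do in the NFA for a Kleene star.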

Subset Construction : Algorithm
The DFA start state s0 is derived from the start state of the NFA: s0 = ε-closure({n0}). s0 represents all the NFA states we can possibly be in at the very beginning. For each DFA state si (starting with s0), compute Move(si, α) for each α ∈ Σ and take its ε-closure. This step gives us the reachable states, each of which becomes a DFA state. Iterate until no more states are added.

Subset Construction : Algorithm
    s0 ← ε-closure({n0})
    S ← {s0}
    W ← {s0}
    while ( W ≠ Ø )
        select and remove si from W
        for each α ∈ Σ
            t ← ε-closure(Move(si, α))
            T[si, α] ← t
            if ( t ∉ S ) then
                add t to S
                add t to W
The algorithm halts: S contains no duplicates (we test before adding), 2^{NFA states} is finite, and the while loop adds to S but never removes from S (monotone) ⇒ the loop halts.
S contains all the reachable NFA configurations: the algorithm tries each character on each si, so it builds every possible NFA configuration ⇒ S and T form the DFA.

Subset Construction : A fixed-point computation
The subset construction is an example of a fixed-point computation: a monotone construction of some finite set that halts when it stops adding to the set. Proofs of halting and correctness are similar. These computations arise in many contexts. Other fixed-point computations:
- canonical construction of sets of LR(1) items (quite similar to the subset construction)
- classic data-flow analysis
- differential equation solvers
- square root computation
We will see more fixed-point computations later in this course.

Subset Construction : Final States
Any DFA state containing an NFA final state becomes a final state of the DFA.
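The worklist algorithm, including the final-state rule, can be sketched in Python as follows. The NFA below is an assumed encoding of the final NFA for (a|b)*aa, numbered so that ε-closure({s0}) = {s0, s1, s2, s3, s7}, as in the worked table; state sN is written as the integer N. One design choice differs from the pseudocode. An empty ε-closure is treated as "no transition" (an implicit error state) instead of being added to S as a DFA state.

```python
from collections import deque

EPS = "eps"  # label for ε-transitions (assumption of this sketch)

# Assumed encoding of the final NFA for (a|b)*aa; state sN is the integer N
NFA_DELTA = {
    (0, EPS): {1, 7},
    (1, EPS): {2, 3},
    (2, "a"): {4},
    (3, "b"): {5},
    (4, EPS): {6},
    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},
    (8, "a"): {9},
}
NFA_START, NFA_FINALS, SIGMA = 0, {9}, ["a", "b"]

def eps_closure(states):
    """States reachable from `states` by zero or more ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        for t in NFA_DELTA.get((stack.pop(), EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)  # frozen so it can be used as a dictionary key

def move(states, c):
    """States reachable from `states` by exactly one c-transition."""
    return {t for s in states for t in NFA_DELTA.get((s, c), set())}

def subset_construction():
    q0 = eps_closure({NFA_START})
    index = {q0: 0}          # DFA state (set of NFA states) -> q-number
    worklist = deque([q0])
    table = {}               # (q-number, symbol) -> q-number
    while worklist:
        q = worklist.popleft()
        for c in SIGMA:
            t = eps_closure(move(q, c))
            if not t:
                continue     # no transition: implicit error state
            if t not in index:   # test before adding, so S has no duplicates
                index[t] = len(index)
                worklist.append(t)
            table[(index[q], c)] = index[t]
    # A DFA state is final if it contains an NFA final state
    finals = {i for q, i in index.items() if q & NFA_FINALS}
    return index, table, finals

dfa_states, dfa_table, dfa_finals = subset_construction()
for q, i in sorted(dfa_states.items(), key=lambda kv: kv[1]):
    print(f"q{i}", sorted(q), {c: dfa_table.get((i, c)) for c in SIGMA})
print("final:", sorted(dfa_finals))
```

With a FIFO worklist, the DFA states come out in the order q0, q1, q2, q3, matching the table on the following slides. Only q3 is final, because it is the only set that contains NFA state 9.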

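DFA Transition Table
(figure: the resulting DFA with states q0 through q3; q0 and q2 are marked as equivalent states)
DFA state   NFA states                        a    b
q0          s0, s1, s2, s3, s7                q1   q2
q1          s1, s2, s3, s4, s6, s7, s8        q3   q2
q2          s1, s2, s3, s5, s6, s7            q1   q2
q3          s1, s2, s3, s4, s6, s7, s8, s9    q3   q2
q3 is the only final state, because it contains the NFA final state s9. q0 and q2 are equivalent: both are non-final, and both go to q1 on a and to q2 on b.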

DFA Minimization
Goal: discover sets of equivalent states, and represent each such set with just one state.
Definition of equivalence: two states are equivalent if and only if both are final or both are non-final, and ∀ α ∈ Σ, their transitions on α lead to identical (or equivalent) states. In other words, both states do the same thing if we land on them.
Trick: it is easier to determine that two states are not equivalent. α-transitions to distinct sets ⇒ the states must be in distinct sets. Think about an algorithm for a primality test.

Partition of a Set
The DFA minimization algorithm is based on the notion of set partitions. A partition P of S is a collection of nonempty sets pi ⊆ S such that each s ∈ S is in exactly one pi ∈ P.
(figure: one example that is not a partition, and one that is)

Hopcroft’s Algorithm
Proposed by John Hopcroft in 1971. It runs in O(n log n) time for an n-state DFA over a fixed alphabet. It was developed in the context of finite automata but has found applications in other areas:
- alias analysis: are two variables referencing the same memory location?
- redundancy elimination: are the values in two variables identical?
Hopcroft is also known for many other contributions to computer science, including the Cinderella book (Hopcroft and Ullman, Introduction to Automata Theory, Languages, and Computation) and the Hopcroft-Karp algorithm.

Hopcroft’s Algorithm: Main idea
Initially put all elements (states/variables/pointers) in a single set. At each step, divide the current partition based on some distinguishing property or behavior of the elements. Elements that remain grouped together are equivalent.
Example: find equivalent cars. Initial partition? Subdivide by make? By color?

Algorithm for DFA Minimization
Hopcroft’s algorithm applied to DFA minimization:
- Pick the initial partition P0 with two sets: the final states F and the non-final states S − F, where D = (S, Σ, δ, s0, F).
- Iteratively split the sets based on the behavior of the states (their transitions).
- States that remain grouped together are equivalent.
What should our initial partition be? How do we capture the behavior of a state?

Splitting a Set
(figure: transitions from states in pi lead into two different sets, pj and pk, so pi is split into pm and pn)

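Splitting a Set
Splitting (partitioning) a set by the symbol a: assume sa and sb ∈ pi, where pi is a subset of the original set of states, and
(i) δ(sa, a) = sx and δ(sb, a) = sy
(ii) sx ∈ pj, sy ∈ pk, j ≠ k
Then sa and sb cannot be equivalent, so a splits pi into the states that go to pj on a and the states that do not.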

Algorithm for DFA Minimization
    T ← {F, S − F}
    P ← { }
    while ( P ≠ T )
        P ← T
        T ← { }
        for each set pi ∈ P
            T ← T ∪ Split(pi)

    Split(p)
        for each c ∈ Σ
            if c splits p into p1 and p2 then return {p1, p2}
        return {p}
A partition P is a set of subsets of S (P ⊆ 2^S). We start off with two subsets of S: F and S − F. Each pass of the while loop takes Pi to Pi+1 by splitting one or more sets, so Pi+1 is at least one step closer to the partition with |S| sets, and there can be at most |S| splits. Note that sets are never combined. The initial partition ensures that final states remain final states.
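A direct Python transcription of this loop, as a sketch. It assumes the five-state DFA for (a|b)*abb used in the example that follows, and its transition table is itself an assumption. Its `split` may return more than two pieces at once, which only speeds up convergence. Both F and S − F are assumed nonempty.

```python
# Assumed transition table for the (a|b)*abb DFA (s4 is the only final state)
DELTA = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s1", "b": "s3"},
    "s2": {"a": "s1", "b": "s2"},
    "s3": {"a": "s1", "b": "s4"},
    "s4": {"a": "s1", "b": "s2"},
}
SIGMA = ["a", "b"]
FINALS = frozenset({"s4"})
STATES = frozenset(DELTA)

def set_index(state, partition):
    """Index of the set in `partition` that contains `state`."""
    return next(i for i, p in enumerate(partition) if state in p)

def split(p, partition):
    """Split p on the first symbol whose transitions lead into different sets.

    Unlike the slide's Split, this may return more than two pieces at once.
    """
    for c in SIGMA:
        groups = {}
        for s in sorted(p):
            groups.setdefault(set_index(DELTA[s][c], partition), set()).add(s)
        if len(groups) > 1:
            return [frozenset(g) for g in groups.values()]
    return [p]

def minimize():
    # Assumes both F and S - F are nonempty
    T = [FINALS, STATES - FINALS]
    P = []
    while set(P) != set(T):
        P, T = T, []
        for p in P:
            T.extend(split(p, P))
    return P

for p in minimize():
    print(sorted(p))
```

Each pass re-examines every set, which is exactly the inefficiency the next slide addresses.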

DFA Minimization: Refining the algorithm
As written, the algorithm examines every pi ∈ P on each iteration. This strategy entails a lot of unnecessary work: we only need to examine pi if some set T reachable from pi has split. Reformulate the algorithm using a worklist:
- Start the worklist with the initial partition, F and S − F.
- When the algorithm splits pi into p1 and p2, place p2 on the worklist.
This version looks at each pi ∈ P far fewer times. Hopcroft’s contribution: choosing to put the smaller half on the worklist is what yields the O(n log n) bound.
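The worklist refinement can be sketched as follows. This follows the commonly published formulation of Hopcroft's algorithm rather than the slide's exact wording. The worklist starts with the smaller of F and S − F. A split set that is already on the worklist is replaced by both halves; otherwise, only the smaller half is added. The sketch assumes a total transition function (`delta[s][c]` defined everywhere), so a partial DFA needs an explicit error state first.

```python
from collections import defaultdict

def hopcroft(states, sigma, delta, finals):
    """Worklist partition refinement in the standard Hopcroft formulation.

    `delta[s][c]` must be defined for every state s and symbol c (total DFA).
    """
    F = frozenset(finals)
    N = frozenset(states) - F
    partition = {p for p in (F, N) if p}
    worklist = {min(F, N, key=len)} if F and N else set()

    # inverse[c][t]: states whose c-transition goes to t
    inverse = {c: defaultdict(set) for c in sigma}
    for s in states:
        for c in sigma:
            inverse[c][delta[s][c]].add(s)

    while worklist:
        splitter = worklist.pop()
        for c in sigma:
            # X: every state that reaches the splitter on c
            X = set()
            for t in splitter:
                X |= inverse[c][t]
            for Y in list(partition):
                inside, outside = Y & X, Y - X
                if not inside or not outside:
                    continue  # c does not split Y
                partition.remove(Y)
                partition |= {inside, outside}
                if Y in worklist:
                    worklist.remove(Y)
                    worklist |= {inside, outside}
                else:
                    # Only the smaller half has to be re-examined
                    worklist.add(min(inside, outside, key=len))
    return partition

# Assumed (a|b)*abb DFA from the example that follows
DELTA = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s1", "b": "s3"},
    "s2": {"a": "s1", "b": "s2"},
    "s3": {"a": "s1", "b": "s4"},
    "s4": {"a": "s1", "b": "s2"},
}
blocks = hopcroft(DELTA.keys(), ["a", "b"], DELTA, {"s4"})
print(sorted(sorted(b) for b in blocks))  # [['s0', 's2'], ['s1'], ['s3'], ['s4']]
```

Using inverse transitions means each split is driven by a splitter set, so only the sets that actually reach the splitter are touched.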

DFA Minimization : Example
DFA for (a | b)*abb (figure: states s0 through s4; s4 is the final state)
Transition Table
State   a    b
s0      s1   s2
s1      s1   s3
s2      s1   s2
s3      s1   s4
s4      s1   s2

DFA Minimization : Example
Step  Current partition          pi               Split on a   Split on b
P0    {s4} {s0,s1,s2,s3}         {s4}             none         none
                                 {s0,s1,s2,s3}    none         {s0,s1,s2} {s3}
P1    {s4} {s0,s1,s2} {s3}       {s0,s1,s2}       none         {s0,s2} {s1}
P2    {s4} {s0,s2} {s1} {s3}     {s0,s2}          none         none
Result: s0 and s2 are equivalent, so the minimized DFA merges them (figure: a four-state DFA with states {s0, s2}, s1, s3, and s4).
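Once the partition is stable, the minimized DFA has one state per set. The sketch below builds it from the P2 partition above. It then spot-checks, on a handful of strings, that the minimized DFA accepts the same words as the original. The transition table is the same assumed one as before, and `quotient` and `accepts` are hypothetical helper names.

```python
# Assumed (a|b)*abb DFA, written as (state, symbol) -> state
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s2",
    ("s1", "a"): "s1", ("s1", "b"): "s3",
    ("s2", "a"): "s1", ("s2", "b"): "s2",
    ("s3", "a"): "s1", ("s3", "b"): "s4",
    ("s4", "a"): "s1", ("s4", "b"): "s2",
}
START, FINALS = "s0", {"s4"}
PARTITION = [{"s0", "s2"}, {"s1"}, {"s3"}, {"s4"}]  # final partition P2

def quotient(delta, start, finals, partition):
    """Build the minimized DFA with one state per set of the partition."""
    name = {}  # original state -> name of its set, e.g. "s0,s2"
    for p in partition:
        label = ",".join(sorted(p))
        for s in p:
            name[s] = label
    # Equivalent states agree on every transition, so overwriting is harmless
    new_delta = {(name[s], c): name[t] for (s, c), t in delta.items()}
    return name[start], new_delta, {name[s] for s in finals}

def accepts(start, delta, finals, word):
    q = start
    for c in word:
        q = delta.get((q, c))
        if q is None:
            return False  # no transition: reject
    return q in finals

m_start, m_delta, m_finals = quotient(DELTA, START, FINALS, PARTITION)
for w in ["abb", "aabb", "babb", "ab", "abba", ""]:
    assert accepts(START, DELTA, FINALS, w) == accepts(m_start, m_delta, m_finals, w)
print(m_start, len({q for q, _ in m_delta}), "states")
```

A few sample strings are evidence, not proof. The equivalence follows from the partition being stable: every state in a set moves into the same set on each symbol.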

Example : Putting it together …
Construct a regular expression for the language of all strings that start with an a, followed by any number of b’s and c’s: a(b|c)*

Example : RE to NFA a(b|c)*
Step 1: Compute trivial NFAs (figure: s0 -a-> s1, s0 -b-> s1, and s0 -c-> s1)

Example : RE to NFA a(b|c)*
Step 2: Work inside parentheses, b | c (figure: the trivial NFAs for b and c joined by a new start state and a new final state, before renaming)

Example : RE to NFA a(b|c)*
Step 2: Work inside parentheses, b | c, renaming states (figure: s0 is the new start state; s1 -b-> s3 and s2 -c-> s4; s5 is the new final state)

Example : RE to NFA a(b|c)*
Step 3: * (closure) (figure: a new start state and a new final state added around the b|c NFA, before renaming)

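Example : RE to NFA a(b|c)*
Step 3: * (closure), renaming states (figure: s0 is the new start state and s7 the new final state; s1 and s6 are the start and final states of the b|c NFA, which now uses s2 -b-> s4 and s3 -c-> s5)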

Example : RE to NFA a(b|c)*
Step 4: concatenation (figure: the NFA for a joined to the closure NFA, giving a final NFA with states s0 through s9)

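NFA to DFA with Subset Construction
(figure: the NFA for a(b|c)* renumbered q0 through q9)
DFA   NFA states                    a      b      c
s0    q0                            s1     none   none
s1    q1, q2, q3, q4, q6, q9        none   s2     s3
s2    q3, q4, q5, q6, q8, q9        none   s2     s3
s3    q3, q4, q6, q7, q8, q9        none   s2     s3
s1, s2, and s3 are final states, since each contains the NFA final state q9.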

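DFA Minimization
(figure: the DFA s0 -a-> s1, with b-transitions into s2 and c-transitions into s3 from each of s1, s2, and s3)
This DFA is not yet minimal. s1, s2, and s3 are all final, all go to s2 on b and to s3 on c, and none has an a-transition, so no symbol splits {s1, s2, s3}. Merging them gives a two-state DFA: s0 -a-> s1, with s1 looping on b and c.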
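One way to check whether this DFA is minimal is to complete it with an explicit dead state `d`, which stands in for the "none" entries, and then refine the partition until it stops changing. The sketch below uses a simple signature-based (Moore-style) refinement rather than Hopcroft's worklist version. The dictionary encoding and the name `refine` are assumptions. In this sketch, s1, s2, and s3 end up in the same set.

```python
SIGMA = ["a", "b", "c"]
# DFA for a(b|c)* from the subset-construction table
PARTIAL = {
    ("s0", "a"): "s1",
    ("s1", "b"): "s2", ("s1", "c"): "s3",
    ("s2", "b"): "s2", ("s2", "c"): "s3",
    ("s3", "b"): "s2", ("s3", "c"): "s3",
}
STATES = ["s0", "s1", "s2", "s3", "d"]
FINALS = {"s1", "s2", "s3"}
# Send every "none" entry to the dead state d so the transition function is total
DELTA = {(s, c): PARTIAL.get((s, c), "d") for s in STATES for c in SIGMA}

def refine(states, finals):
    """Moore-style refinement: split sets until block signatures stop changing."""
    block = {s: int(s in finals) for s in states}  # 1 = final, 0 = non-final
    while True:
        # Signature: a state's own set plus the sets it reaches on each symbol
        sig = {s: (block[s],) + tuple(block[DELTA[(s, c)]] for c in SIGMA)
               for s in states}
        ids = {}
        new_block = {s: ids.setdefault(sig[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return new_block  # no set was split: the partition is stable
        block = new_block

blocks = refine(STATES, FINALS)
groups = {}
for s, b in blocks.items():
    groups.setdefault(b, []).append(s)
print(sorted(groups.values()))  # [['d'], ['s0'], ['s1', 's2', 's3']]
```

Apart from the dead state, two sets remain, so the minimal DFA for a(b|c)* has two live states.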

Homework 1
Homework 1 is out, due by March 9.