Lexical Analysis — Part II From Regular Expression to Scanner Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.

Slides:



Advertisements
Similar presentations
Lexical Analysis IV : NFA to DFA DFA Minimization
Advertisements

4b Lexical analysis Finite Automata
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
From Cooper & Torczon1 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language?
Lexical Analysis — Introduction Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
1 CIS 461 Compiler Design and Construction Fall 2012 slides derived from Tevfik Bultan et al. Lecture-Module 5 More Lexical Analysis.
Lexical Analysis: DFA Minimization & Wrap Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.
Compiler Construction
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
1 The scanning process Main goal: recognize words/tokens Snapshot: At any point in time, the scanner has read some input and is on the way to identifying.
From Cooper & Torczon1 Automating Scanner Construction RE  NFA ( Thompson’s construction )  Build an NFA for each term Combine them with  -moves NFA.
1 The scanning process Goal: automate the process Idea: –Start with an RE –Build a DFA How? –We can build a non-deterministic finite automaton (Thompson's.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #4 Lexical.
2. Lexical Analysis Prof. O. Nierstrasz
Parsing VI The LR(1) Table Construction. LR(k) items The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1)
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
Compiler Construction
Scanner Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.
LR(1) Parsers Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.
1 Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
1 Compiler Construction Finite-state automata. 2 Today’s Goals More on lexical analysis: cycle of construction RE → NFA NFA → DFA DFA → Minimal DFA DFA.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lecture 05: Theory of Automata:08 Kleene’s Theorem and NFA.
Lexical Analysis Constructing a Scanner from Regular Expressions.
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Lexical Analysis III : NFA to DFA DFA Minimization Lecture 5 CS 4318/5331 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper.
Lexical Analysis: DFA Minimization Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
Lexical Analysis: DFA Minimization & Wrap Up. Automating Scanner Construction PREVIOUSLY RE  NFA ( Thompson’s construction ) Build an NFA for each term.
Lexical Analysis – Part II EECS 483 – Lecture 3 University of Michigan Wednesday, September 13, 2006.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
using Deterministic Finite Automata & Nondeterministic Finite Automata
CSE 311 Foundations of Computing I Lecture 24 FSM Limits, Pattern Matching Autumn 2011 CSE 3111.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
1 Compiler Construction Vana Doufexi office CS dept.
Deterministic Finite Automata Nondeterministic Finite Automata.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
Lexical analysis Finite Automata
Two issues in lexical analysis
Recognizer for a Language
Lexical Analysis - An Introduction
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lecture 5: Lexical Analysis III: The final bits
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lecture 4: Lexical Analysis II: From REs to DFAs
4b Lexical analysis Finite Automata
Automating Scanner Construction
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lexical Analysis: DFA Minimization & Wrap Up
4b Lexical analysis Finite Automata
Lexical Analysis - An Introduction
Compiler Construction
Lecture 5 Scanning.
Presentation transcript:

Lexical Analysis — Part II From Regular Expression to Scanner Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. COMP 412 FALL 2010

Comp 412, Fall Quick Review Last class: —The scanner is the first stage in the front end —Specifications can be expressed using regular expressions —Build tables and code from a DFA Scanner Generator specifications Scanner source codeparts of speech & words Specifications written as “regular expressions” Represent words as indices into a global table tables or code design time compile time

Comp 412, Fall Quick Review of Regular Expressions All strings of 1s and 0s ending in a 1 All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending order All strings of 1s and 0s that do not contain three 0s in a row: ( 1 * (  |01 | 001 ) 1 * ) * (  | 0 | 00 ) ( 0 | 1 ) * 1 Let Cons be (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z) Cons * a Cons * e Cons * i Cons * o Cons * u Cons *

Comp 412, Fall Consider the problem of recognizing ILOC register names Register  r (0|1|2| … | 9) (0|1|2| … | 9) * Allows registers of arbitrary number Requires at least one digit RE corresponds to a recognizer (or DFA ) Transitions on other inputs go to an error state, s e Example ( from Lab 1 ) S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register

Comp 412, Fall DFA operation Start in state S 0 & make transitions on each input character DFA accepts a word x iff x leaves it in a final state (S 2 ) So, r17 takes it through s 0, s 1, s 2 and accepts r takes it through s 0, s 1 and fails a takes it straight to s e Example ( continued ) S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register

Comp 412, Fall Example ( continued ) To be useful, the recognizer must be converted into code  r 0,1,2,3,4, 5,6,7,8,9 All others s0s0 s1s1 sese sese s1s1 sese s2s2 sese s2s2 sese s2s2 sese sese sese sese sese Char  next character State  s 0 while (Char  EOF ) State   (State,Char) Char  next character if (State is a final state ) then report success else report failure Skeleton recognizer Table encoding the RE O(1) cost per character (or per transition) S0S0 S2S2 S1S1 r (0|1|2| … 9)

Comp 412, Fall Example ( continued ) We can add “actions” to each transition  r 0,1,2,3,4, 5,6,7,8,9 All others s0s0 s 1 start s e error s e error s1s1 s e error s 2 add s e error s2s2 s e error s 2 add s e error sese s e error s e error s e error Char  next character State  s 0 while (Char  EOF ) Next   (State,Char) Act   (State,Char) perform action Act State  Next Char  next character if (State is a final state ) then report success else report failure Skeleton recognizerTable encoding RE Typical action is to capture the lexeme

Comp 412, Fall r Digit Digit * allows arbitrary numbers Accepts r00000 Accepts r99999 What if we want to limit it to r0 through r31 ? Write a tighter regular expression —Register  r ( (0|1|2) (Digit |  ) | (4|5|6|7|8|9) | (3|30|31) ) —Register  r0|r1|r2| … |r31|r00|r01|r02| … |r09 Produces a more complex DFA DFA has more states DFA has same cost per transition (or per character) DFA has same basic implementation What if we need a tighter specification?

Comp 412, Fall Tighter register specification (continued) The DFA for Register  r ( (0|1|2) (Digit |  ) | (4|5|6|7|8|9) | (3|30|31) ) Accepts a more constrained set of register names Same set of actions, more states S0S0 S5S5 S1S1 r S4S4 S3S3 S6S6 S2S2 0,1,20,1,2 3 0,10,1 4,5,6,7,8,94,5,6,7,8,9 (0|1|2| … 9)

Comp 412, Fall Tighter register specification (continued)  r0, All others s0s0 s1s1 sese sese sese sese sese s1s1 sese s2s2 s2s2 s5s5 s4s4 sese s2s2 sese s3s3 s3s3 s3s3 s3s3 sese s3s3 sese sese sese sese sese sese s4s4 sese sese sese sese sese sese s5s5 sese s6s6 sese sese sese sese s6s6 sese sese sese sese sese sese sese sese sese sese sese sese sese Table encoding RE for the tighter register specification This table runs in the same skeleton recognizer This table uses the same O(1) time per character The extra precision costs us table space, not time S0S0 S5S5 S1S1 r S4S4 S3S3 S6S6 S2S2 0,1,20,1,2 3 0,10,1 4,5,6,7,8,94,5,6,7,8,9 (0|1|2| … 9)

Comp 412, Fall Tighter register specification (continued) State Action r0,123 4,5,6 7,8,9 other 0 1 start eeeee 1e 2 add 2 add 5 add 4 add e 2e 3 add 3 add 3 add 3 add e exit 3,4eeeee e exit 5e 6 add eee e exit 6eeeee x exit eeeeeee S0S0 S5S5 S1S1 r S4S4 S3S3 S6S6 S2S2 0,1,20,1,2 3 0,10,1 4,5,6,7,8,94,5,6,7,8,9 (0|1|2| … 9) We care about path lengths (time) and finite size of set of states (representability), but we don’t worry (much) about number of states.

Comp 412, Fall Where are we going? We will show how to construct a finite state automaton to recognize any RE Overview: —Direct construction of a nondeterministic finite automaton (NFA) to recognize a given RE  Easy to build in an algorithmic way  Requires  -transitions to combine regular subexpressions —Construct a deterministic finite automaton (DFA) to simulate the NFA  Use a set-of-states construction —Minimize the number of states in the DFA  Hopcroft state minimization algorithm —Generate the scanner code  Additional specifications needed for the actions Introduce NFA s Optional, but worthwhile

Comp 412, Fall Non-deterministic Finite Automata What about an RE such as ( a | b ) * abb ? Each RE corresponds to a deterministic finite automaton ( DFA ) We know a DFA exists for each RE The DFA may be hard to build directly Automatic techniques will build it for us … S0S0 S1S1 S3S3 S2S2 b b b b a a a a Nothing here that would change the O(1) cost per transition

Comp 412, Fall Non-deterministic Finite Automata Here is a simpler RE for ( a | b ) * abb This recognizer is more intuitive Structure seems to follow the RE ’s structure This recognizer is not a DFA S 0 has a transition on  S 1 has two transitions on a This is a non-deterministic finite automaton ( NFA ) a | b S0S0 S1S1 S4S4 S2S2 S3S3  abb This NFA needs one more transition, at O(1) cost per transition

Comp 412, Fall Non-deterministic Finite Automata An NFA accepts a string x iff  a path though the transition graph from s 0 to a final state such that the edge labels spell x, ignoring  ’s Transitions on  consume no input To “run” the NFA, start in s 0 and guess the right transition at each step —Always guess correctly —If some sequence of correct guesses accepts x then accept Why study NFA s? They are the key to automating the RE  DFA construction We can paste together NFA s with  -transitions NFA becomes an NFA 

Comp 412, Fall Relationship between NFA s and DFA s DFA is a special case of an NFA DFA has no  transitions DFA ’s transition function is single-valued Same rules will work DFA can be simulated with an NFA —Obviously NFA can be simulated with a DFA ( less obvious ) Simulate sets of possible states Possible exponential blowup in the state space Still, one state per character in the input stream Rabin & Scott, 1959

Comp 412, Fall Automating Scanner Construction To convert a specification into code: 1Write down the RE for the input language 2Build a big NFA 3Build the DFA that simulates the NFA 4Systematically shrink the DFA 5Turn it into code Scanner generators Lex and Flex work along these lines Algorithms are well-known and well-understood Key issue is interface to parser ( define all parts of speech ) You could build one in a weekend!

Comp 412, Fall Where are we? Why are we doing this? RE  NFA ( Thompson’s construction ) Build an NFA for each term Combine them with  -moves NFA  DFA ( Subset construction ) Build the simulation DFA  Minimal DFA  Hopcroft’s algorithm DFA  RE All pairs, all paths problem Union together paths from s 0 to a final state minimal DFA RENFADFA The Cycle of Constructions

Comp 412, Fall RE  NFA using Thompson’s Construction Key idea NFA pattern for each symbol & each operator Join them with  moves in precedence order S0S0 S1S1 a NFA for a S0S0 S1S1 a S3S3 S4S4 b NFA for ab  NFA for a | b S0S0 S1S1 S2S2 a S3S3 S4S4 b S5S5    S0S0 S1S1  S3S3 S4S4  NFA for a * a   Ken Thompson, CACM, 1968

Comp 412, Fall Example of Thompson’s Construction Let’s try a ( b | c ) * 1. a, b, & c 2. b | c 3. ( b | c ) * S0S0 S1S1 a S0S0 S1S1 b S0S0 S1S1 c S2S2 S3S3 b S4S4 S5S5 c S1S1 S6S6 S0S0 S7S7      S1S1 S2S2 b S3S3 S4S4 c S0S0 S5S5    

Comp 412, Fall Example of Thompson’s Construction (con’t) 4. a ( b | c ) * Of course, a human would design something simpler... S0S0 S1S1 a b | c But, we can automate production of the more complex NFA version... S0S0 S1S1 a  S4S4 S5S5 b S6S6 S7S7 c S3S3 S8S8 S2S2 S9S9     

Comp 412, Fall Where are we? Why are we doing this? RE  NFA ( Thompson’s construction ) Build an NFA for each term Combine them with  -moves NFA  DFA ( subset construction )  Build the simulation DFA  Minimal DFA  Hopcroft’s algorithm DFA  RE All pairs, all paths problem Union together paths from s 0 to a final state minimal DFA RENFADFA The Cycle of Constructions

Comp 412, Fall NFA  DFA with Subset Construction Need to build a simulation of the NFA Two key functions Move(s i, a) is the set of states reachable from s i by a  -closure(s i ) is the set of states reachable from s i by  The algorithm: Start state derived from s 0 of the NFA Take its  -closure S 0 =  -closure({s 0 }) Take the image of S 0, Move(S 0,  ) for each   , and take its  -closure Iterate until no more states are added Sounds more complex than it is… Rabin & Scott, 1959

Comp 412, Fall NFA  DFA with Subset Construction The algorithm: s 0  -closure ( {n 0 } ) S  { s 0 } W  { s 0 } while ( W ≠ Ø ) select and remove s from W for each    t   - closure ( Move ( s,  )) T[s,  ]  t if ( t  S ) then add t to S add t to W Let’s think about why this works The algorithm halts: 1. S contains no duplicates (test before adding) 2. 2 {NFA states} is finite 3. while loop adds to S, but does not remove from S (monotone)  the loop halts S contains all the reachable NFA states It tries each character in each s i. It builds every possible NFA configuration.  S and T form the DFA This test is a little tricky s 0 is a set of states S & W are sets of sets of states

Comp 412, Fall NFA  DFA with Subset Construction The algorithm: s 0  -closure ( {n 0 } ) S  { s 0 } W  { s 0 } while ( W ≠ Ø ) select and remove s from W for each    t   - closure ( Move ( s,  )) T[s,  ]  t if ( t  S ) then add t to S add t to W Let’s think about why this works The algorithm halts: 1. S contains no duplicates (test before adding) 2. 2 {NFA states} is finite 3. while loop adds to S, but does not remove from S (monotone)  the loop halts S contains all the reachable NFA states It tries each character in each s i. It builds every possible NFA configuration.  S and T form the DFA Any DFA state containing a final state of the NFA final state becomes a final state of the DFA.

Comp 412, Fall NFA  DFA with Subset Construction Example of a fixed-point computation Monotone construction of some finite set Halts when it stops adding to the set Proofs of halting & correctness are similar These computations arise in many contexts Other fixed-point computations Canonical construction of sets of LR(1) items —Quite similar to the subset construction Classic data-flow analysis (& Gaussian Elimination) —Solving sets of simultaneous set equations We will see many more fixed-point computations

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 none s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3 q 7 is the core state of s 3

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3 q 5 is the core state of s 2

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 q 1, q 2, q 3, q 4, q 6, q 9 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none q 5, q 8, q 9, q 3, q 4, q 6 q 7, q 8, q 9, q 3, q 4, q 6 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3 Final states because of q 9

Comp 412, Fall NFA  DFA with Subset Construction a ( b | c )* : q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      States  -closure(Move(s,*)) DFANFA a bc s0s0 q0q0 s1s1 none s1s1 q 1, q 2, q 3, q 4, q 6, q 9 none s2s2 s3s3 s2s2 q 5, q 8, q 9, q 3, q 4, q 6 nones2s2 s3s3 s3s3 q 7, q 8, q 9, q 3, q 4, q 6 none s2s2 s3s3 Transition table for the DFA

Comp 412, Fall NFA  DFA with Subset Construction The DFA for a ( b | c ) * Much smaller than the NFA ( no  -transitions ) All transitions are deterministic Use same code skeleton as before s3s3 s2s2 s0s0 s1s1 c b a b c c b S0S0 S1S1 a b | c But, remember our goal: a bc s0s0 s1s1 none s1s1 s2s2 s3s3 s2s2 s2s2 s3s3 s3s3 s2s2 s3s3

Comp 412, Fall Where are we? Why are we doing this? RE  NFA ( Thompson’s construction ) Build an NFA for each term Combine them with  -moves NFA  DFA ( subset construction ) Build the simulation DFA  Minimal DFA  Hopcroft’s algorithm DFA  RE All pairs, all paths problem Union together paths from s 0 to a final state Not enough time to teach Hopcroft’s algorithm today minimal DFA RENFADFA The Cycle of Constructions

Comp 412, Fall Alternative Approach to DFA Minimization The Intuition The subset construction merges prefixes in the NFA s0s0 s 10 s9s9 s8s8 s5s5 s7s7 s6s6 s3s3 s2s2 s1s1 s4s4    a b a b c d c abc | bc | ad Thompson’s construction would leave  -transitions between each single- character automaton s0s0 s6s6 s4s4 s5s5 s2s2 s1s1 s3s3 a b b c d c Subset construction eliminates  - transitions and merges the paths for a. It leaves duplicate tails, such as bc.

Comp 412, Fall Alternative Approach to DFA Minimization Idea: use the subset construction twice For an NFA N —Let reverse(N) be the NFA constructed by making initial states final (& vice-versa) and reversing the edges —Let subset(N) be the DFA that results from applying the subset construction to N —Let reachable(N) be N after removing all states that are not reachable from the initial state Then, reachable(subset(reverse[reachable(subset(reverse(N))])) is the minimal DFA that implements N [ Brzozowski, 1962 ] This result is not intuitive, but it is true. Neither algorithm dominates the other.

Comp 412, Fall Alternative Approach to DFA Minimization Step 1 The subset construction on reverse(NFA) merges suffixes in original NFA s 11    Reversed NFA s0s0 s 10 s9s9 s8s8 s5s5 s7s7 s6s6 s3s3 s2s2 s1s1 s4s4    a b a b c d c s 11 s9s9 s8s8 s3s3 s2s2 s1s1 a a b d c subset(reverse(NFA))

Comp 412, Fall Alternative Approach to DFA Minimization Step 2 Reverse it again & use subset to merge prefixes … Reverse it, again s 11 s9s9 s8s8 s3s3 s2s2 s1s1 a a b d c s0s0    And subset it, again s 11 s3s3 s2s2 a b d c s0s0 b Minimal DFA minimal DFA RENFADFA The Cycle of Constructions Brzozowski