From Cooper & Torczon1 Automating Scanner Construction RE  NFA ( Thompson’s construction )  Build an NFA for each term Combine them with  -moves NFA.

Slides:



Advertisements
Similar presentations
Lexical Analysis IV : NFA to DFA DFA Minimization
Advertisements

4b Lexical analysis Finite Automata
Lecture 24 MAS 714 Hartmut Klauck
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
From Cooper & Torczon1 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language?
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
1 CIS 461 Compiler Design and Construction Fall 2012 slides derived from Tevfik Bultan et al. Lecture-Module 5 More Lexical Analysis.
Lexical Analysis: DFA Minimization & Wrap Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.
Scanner wrap-up and Introduction to Parser. Automating Scanner Construction RE  NFA (Thompson’s construction) Build an NFA for each term Combine them.
Compiler Construction
©2004 Brooks/Cole FIGURES FOR CHAPTER 2 SCANNING Click the mouse to move to the next page. Use the ESC key to exit this chapter. This chapter in the book.
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
Lecture 3UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 3.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
1 The scanning process Main goal: recognize words/tokens Snapshot: At any point in time, the scanner has read some input and is on the way to identifying.
1 The scanning process Goal: automate the process Idea: –Start with an RE –Build a DFA How? –We can build a non-deterministic finite automaton (Thompson's.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #4 Lexical.
2. Lexical Analysis Prof. O. Nierstrasz
1 Single Final State for NFAs and DFAs. 2 Observation Any Finite Automaton (NFA or DFA) can be converted to an equivalent NFA with a single final state.
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary  Quoted string in.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #5 Introduction.
Lexical Analysis — Part II From Regular Expression to Scanner Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.
Lexical Analysis The Scanner Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source.
CS 426 Compiler Construction
1 Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
1 Compiler Construction Finite-state automata. 2 Today’s Goals More on lexical analysis: cycle of construction RE → NFA NFA → DFA DFA → Minimal DFA DFA.
2. Lexical Analysis Prof. O. Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary –Quoted string in.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Introduction to Parsing Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analysis Constructing a Scanner from Regular Expressions.
2. Scanning College of Information and Communications Prof. Heejin Park.
Lexical Analyzer (Checker)
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Lexical Analysis III : NFA to DFA DFA Minimization Lecture 5 CS 4318/5331 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper.
Lexical Analysis: DFA Minimization Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions Finite Automata.
Lexical Analysis: DFA Minimization & Wrap Up. Automating Scanner Construction PREVIOUSLY RE  NFA ( Thompson’s construction ) Build an NFA for each term.
Exercise 1 Consider a language with the following tokens and token classes: ID ::= letter (letter|digit)* LT ::= " " shiftL ::= " >" dot ::= "." LP ::=
Regular Expressions Fundamental Data Structures and Algorithms Peter Lee March 13, 2003.
Lexical Analysis – Part II EECS 483 – Lecture 3 University of Michigan Wednesday, September 13, 2006.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
1 Chapter Constructing Efficient Finite Automata.
Algorithms for hard problems Automata and tree automata Juris Viksna, 2015.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Complexity and Computability Theory I Lecture #5 Rina Zviel-Girshin Leah Epstein Winter
1 Section 11.3 Constructing Efficient Finite Automata First we’ll see how to transform an NFA into a DFA. Then we’ll see how to transform a DFA into a.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Lexical Analysis - An Introduction
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lecture 5: Lexical Analysis III: The final bits
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lecture 4: Lexical Analysis II: From REs to DFAs
DFA Equivalence & Minimization
Automating Scanner Construction
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lexical Analysis: DFA Minimization & Wrap Up
Lexical Analysis - An Introduction
Lecture 5 Scanning.
Lexical Analysis Uses formalism of Regular Languages
Presentation transcript:

from Cooper & Torczon1 Automating Scanner Construction RE  NFA ( Thompson’s construction )  Build an NFA for each term Combine them with  -moves NFA  DFA ( subset construction )  Build the simulation DFA  Minimal DFA (today) Hopcroft’s algorithm DFA  RE All pairs, all paths problem Union together paths from s 0 to a final state minimal DFA RENFADFA The Cycle of Constructions

from Cooper & Torczon2 DFA Minimization The Big Picture Discover sets of equivalent states Represent each such set with just one state Two states are equivalent if and only if: The set of paths leading to them are equivalent    , transitions on  lead to equivalent states (DFA) transitions to distinct sets  states must be in distinct sets A partition P of S Each s  S is in exactly one set p i  P The algorithm iteratively partitions the DFA ’s states

from Cooper & Torczon3 DFA Minimization Details of the algorithm Group states into maximal size sets, optimistically Iteratively subdivide those sets, as needed States that remain grouped together are equivalent Initial partition, P 0, has two sets {F} & {Q-F} (D =(Q, , ,q 0,F)) Splitting a set Assume q a, q b, & q c  s, and  (q a,a) = q x,  (q b,a) = q y, &  (q a,a) = q z If q x, q y, & q z are not in the same set, then s must be split One state in the final DFA cannot have two transitions on a

from Cooper & Torczon4 DFA Minimization The algorithm P  { F, {Q-F}} while ( P is still changing) T  { } for each set s  P for each    partition s by  into s 1, s 2, …, s k T  T  s 1, s 2, …, s k if T  P then P  T Why does this work? Partition P  2 Q Start off with 2 subsets of Q {F} and {Q-F} While loop takes P i  P i+1 by splitting 1 or more sets P i+1 is at least one step closer to the partition with |Q | sets Maximum of |Q | splits Note that Partitions are never combined Initial partition ensures that final states are intact This is a fixed-point algorithm!

from Cooper & Torczon5 DFA Minimization Enough theory, does this stuff work?  Recall our example: ( a | b) * abb s0s0 a s1s1 b s3s3 b s4s4 s2s2 a b b a a a b s 0, s 2 a s1s1 b s3s3 b s4s4 b a a a b final state

from Cooper & Torczon6 DFA Minimization What about a ( b | c ) * ? First, the subset construction: q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9      s3s3 s2s2 s0s0 s1s1 c b a b b c c Final states

from Cooper & Torczon7 DFA Minimization Then, apply the minimization algorithm To produce the minimal DFA s3s3 s2s2 s0s0 s1s1 c b a b b c c s0s0 s1s1 a b | c In lecture 6, I said that a human would design a simpler automaton than Thompson’s construction did. The algorithms produce that same DFA ! final states

from Cooper & Torczon8 Limits of Regular Languages Advantages of Regular Expressions Simple & powerful notation for specifying patterns Automatic construction of fast recognizers Many kinds of syntax can be specified with RE s Example — an expression grammar Term  [a-zA-Z] ([a-zA-z] | [0-9]) * Op  + | - |  | / Expr  ( Term Op ) * Term Of course, this would generate a DFA … If RE s are so useful … Why not use them for everything?

from Cooper & Torczon9 Limits of Regular Languages Not all languages are regular RL’s  CFL’s  CSL’s You cannot construct DFA’s to recognize these languages L = { p k q k } (parenthesis languages) L = { wcw r | w   * } Neither of these is a regular language (nor an RE ) But, this is a little subtle. You can construct DFA ’s for Alternating 0’s and 1’s (  | 1)( 0 | 1) ( 0 |  ) Sets of pairs of 0’s and 1’s ( 01 | 10 ) ( 01 | 10 ) * RE ’s can count bounded sets and bounded differences

from Cooper & Torczon10 What can be so hard? Poor language design can complicate scanning Reserved words are important if then then then = else; else else = then (PL/I) Significant blanks (Fortran & Algol68) do 10 i = 1,25 do 10 i = 1.25 String constants with special characters (C, others) newline, tab, quote, comment delimiters, … Finite closures  Limited identifier length  Adds states to count length

from Cooper & Torczon11 What can be so hard? (Fortran 66/77) How does a compiler do this? First pass finds & inserts blanks Can add extra words or tags to create a scanable language Second pass is normal scanner Example due to Dr. F.K. Zadeck