CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.

Slides:



Advertisements
Similar presentations
CSE 311 Foundations of Computing I
Advertisements

4b Lexical analysis Finite Automata
CSC 361NFA vs. DFA1. CSC 361NFA vs. DFA2 NFAs vs. DFAs NFAs can be constructed from DFAs using transitions: Called NFA- Suppose M 1 accepts L 1, M 2 accepts.
Complexity and Computability Theory I Lecture #4 Rina Zviel-Girshin Leah Epstein Winter
Finite Automata CPSC 388 Ellen Walker Hiram College.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
Compiler Construction
Lexical Analysis - Scanner Computer Science Rensselaer Polytechnic Compiler Design Lecture 2.
Lecture 3UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 3.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
Fall 2006Costas Busch - RPI1 Non-Deterministic Finite Automata.
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
1.Defs. a)Finite Automaton: A Finite Automaton ( FA ) has finite set of ‘states’ ( Q={q 0, q 1, q 2, ….. ) and its ‘control’ moves from state to state.
Costas Busch - LSU1 Non-Deterministic Finite Automata.
CSE 311: Foundations of Computing Fall 2014 Lecture 23: State Minimization, NFAs.
Regular Expressions (RE) Empty set Φ A RE denotes the empty set Empty string λ A RE denotes the set {λ} Symbol a A RE denotes the set {a} Alternation M.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions.
1 Outline Informal sketch of lexical analysis –Identifies tokens in input string Issues in lexical analysis –Lookahead –Ambiguities Specifying lexers –Regular.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analysis Constructing a Scanner from Regular Expressions.
Lexical Analyzer (Checker)
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
CS412/413 Introduction to Compilers Radu Rugina Lecture 4: Lexical Analyzers 28 Jan 02.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
1 November 1, November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
CS375 Compilers Lexical Analysis 4 th February, 2010.
CMSC 330: Organization of Programming Languages Finite Automata NFAs  DFAs.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions Finite Automata.
Lexical Analysis – Part II EECS 483 – Lecture 3 University of Michigan Wednesday, September 13, 2006.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 2: Lexical Analysis.
Prof. Necula CS 164 Lecture 31 Lexical Analysis Lecture 3-4.
using Deterministic Finite Automata & Nondeterministic Finite Automata
CS 154 Formal Languages and Computability February 9 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
1 Lexical Analysis Uses formalism of Regular Languages Uses formalism of Regular Languages Regular Expressions Regular Expressions Deterministic Finite.
Deterministic Finite Automata Nondeterministic Finite Automata.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Compiler Construction Lecture Three: Lexical Analysis - Part Two CSC 2103: Compiler Construction Lecture Three: Lexical Analysis - Part Two Joyce Nakatumba-Nabende.
Converting Regular Expressions to NFAs Empty string   is a regular expression denoting  {  } a is a regular expression denoting {a} for any a in 
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
CS314 – Section 5 Recitation 2
Two issues in lexical analysis
Recognizer for a Language
Chapter 2 FINITE AUTOMATA.
COSC 3340: Introduction to Theory of Computation
Non-Deterministic Finite Automata
CSE 311: Foundations of Computing
4b Lexical analysis Finite Automata
Compiler Construction
4b Lexical analysis Finite Automata
CSE 311 Foundations of Computing I
Chapter 1 Regular Language
Lecture 5 Scanning.
Presentation transcript:

CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02

CS 412/413 Spring 2002 Introduction to Compilers2 Outline Regexp review DFAs, NFAs DFA simulation RE-NFA conversion NFA-DFA conversion

CS 412/413 Spring 2002 Introduction to Compilers3 Regular Expressions If R and S are regular expressions, so are: ε empty string a for any character a R S (concatenation: “R followed by S”) R | S(alternation: “R or S”) R *(Kleene star: “zero or more R’s”)

CS 412/413 Spring 2002 Introduction to Compilers4 Regular Expression Extensions If R is a regular expressions, so are: R ?= ε | R (zero or one R) R += RR* (one or more R’s) (R)= R (no effect: grouping) [abc]= a|b|c (any of the listed) [a-e]= a|b|…| e (character ranges) [^ab]= c|d|… (anything but the listed chars)

CS 412/413 Spring 2002 Introduction to Compilers5 Concepts Tokens = strings of characters representing the lexical units of the programs, such as identifiers, numbers, keywords, operators –May represent a unique character string (keywords, operators) –May represent multiple strings (identifiers, numers) Regular expressions = concise description of tokens –A regular expressions describes a set of strings Language denoted by a regular expression = the set of strings that it represents –L(R) is the language denoted by regular expression R

CS 412/413 Spring 2002 Introduction to Compilers6 We need a mechanism to determine if an input string w belongs to the language denoted by a regular expression R Such a mechanism is called an acceptor How To Use Regular Expressions Input string w in the program Regex R which describes a token ? Yes, if w = token No, if w  token

CS 412/413 Spring 2002 Introduction to Compilers7 Acceptors Acceptor = determines if an input string belongs to a language L Finite Automata = acceptor for languages described by regular expressions Input String Language Acceptor w L Yes, if w  L No, if w  L

CS 412/413 Spring 2002 Introduction to Compilers8 Finite Automata Informally, finite automata consist of: –A finite set of states –Transitions between states –An initial state (start state) –A set of final states (accepting state) Two kinds of finite automata: –Deterministic finite automata (DFA): the transition from each state is uniquely determined by the current input character –Non-deterministic finite automata (NFA): there may be multiple possible choices or some transitions do not depend on the input character

CS 412/413 Spring 2002 Introduction to Compilers9 DFA Example Finite automaton that accepts the strings in the language denoted by the regular expression ab*a –A graph –A transition table 01 2 a b a ab 01 Error ErrorError

CS 412/413 Spring 2002 Introduction to Compilers10 Simulating the DFA trans_table[NSTATES][NCHARS] accept_states[NSTATES] state = INITIAL while (state != ERROR) { c = input.read(); if (c == EOF) break; state = trans_table[state][c]; } return accept_states[state]; 01 2 a b a Determine if the DFA accepts an input string

CS 412/413 Spring 2002 Introduction to Compilers11 RE  Finite automaton? Can we build a finite automaton for every regular expression? Strategy: build the finite automaton inductively, based on the definition of regular expressions a a ε

CS 412/413 Spring 2002 Introduction to Compilers12 Alternation R | S Concatenation: R S RE  Finite automaton? R automaton S automaton ? ? ? R automaton S automaton

CS 412/413 Spring 2002 Introduction to Compilers13 NFA Definition A non-deterministic finite automaton (NFA) is an automaton where the state transitions are such that: –There may be ε-transitions (transitions which do not consume input characters) –There may be multiple transitions from the same state on the same input character   a b b a a Example: regexp?

CS 412/413 Spring 2002 Introduction to Compilers14 RE  NFA intuition -?[0-9]+ (-|  ) [0-9][0-9]*  

CS 412/413 Spring 2002 Introduction to Compilers15 NFA construction NFA only needs one stop state (why?) Canonical NFA: Use this canonical form to inductively construct NFAs for regular expressions

CS 412/413 Spring 2002 Introduction to Compilers16 Inductive NFA Construction R SR S R S R | SR | S R S ε εε ε R* R ε ε ε ε ε

CS 412/413 Spring 2002 Introduction to Compilers17 DFA vs NFA DFA: action of automaton on each input symbol is fully determined –obvious table-driven implementation NFA: –automaton may have choice on each step –automaton accepts a string if there is any way to make choices to arrive at accepting state / every path from start state to an accept state is a string accepted by automaton –not obvious how to implement!

CS 412/413 Spring 2002 Introduction to Compilers18 Simulating an NFA Problem: how to execute NFA? “strings accepted are those for which there is some corresponding path from start state to an accept state” Conclusion: search all paths in graph consistent with the string Idea: search paths in parallel –Keep track of subset of NFA states that search could be in after seeing string prefix –“Multiple fingers” pointing to graph

CS 412/413 Spring 2002 Introduction to Compilers19 Example Input string: -23 NFA states: {0,1} {1} {2, 3} {2, 3} 0 1  

CS 412/413 Spring 2002 Introduction to Compilers20 NFA-DFA conversion Can convert NFA directly to DFA by same approach Create one DFA for each distinct subset of NFA states that could arise States: {0,1}, {1}, {2, 3} 0 1   {0,1}{1} {2,3}

CS 412/413 Spring 2002 Introduction to Compilers21 Algorithm For a set S of states in the NFA, compute ε-closure(S) = the set of states reachable from states in S by ε-transitions T = S Repeat T = T U {s | s’  T, (s,s) is ε-transition} Until T remains unchanged ε-closure(S) = T For a set S of states in the NFA, compute DFAedge(S,c) = the set of states reachable from states in S by transitions on character c and ε-transitions DFAedge(S,c) = ε-closure( { s | s’  S, (s’,s) is c-transition} )

CS 412/413 Spring 2002 Introduction to Compilers22 Algorithm Top-level algorithm: DFA-initial-state = ε-closure(NFA-initial-state) For each DFA-state S For each character c S’ = DFAedge(S,c) Add an edge (S, S’) labeled with character c in the DFA For each DFA-state S If S contains an NFA-final state Mark S as DFA-final-state

CS 412/413 Spring 2002 Introduction to Compilers23 Putting the Pieces Together RE  NFA Conversion NFA  DFA Conversion DFA Simulation Yes, if w  L(R) No, if w  L(R) Input String Regular Expression R w