CS412/413 Introduction to Compilers Radu Rugina Lecture 4: Lexical Analyzers 28 Jan 02.

Slides:



Advertisements
Similar presentations
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Advertisements

4b Lexical analysis Finite Automata
Compiler Baojian Hua Lexical Analysis (II) Compiler Baojian Hua
Compiler Construction
Lexical Analysis - Scanner Computer Science Rensselaer Polytechnic Compiler Design Lecture 2.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Winter 2007SEG2101 Chapter 81 Chapter 8 Lexical Analysis.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
1 The scanning process Goal: automate the process Idea: –Start with an RE –Build a DFA How? –We can build a non-deterministic finite automaton (Thompson's.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #4 Lexical.
CS 310 – Fall 2006 Pacific University CS310 Decidability Section 4.1/4.2 November 10, 2006.
Prof. Hilfinger CS 164 Lecture 21 Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger.
Compiler Construction
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
1 Non-Deterministic Automata Regular Expressions.
Lexical Analysis The Scanner Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source.
CS 426 Compiler Construction
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
1 Outline Informal sketch of lexical analysis –Identifies tokens in input string Issues in lexical analysis –Lookahead –Ambiguities Specifying lexers –Regular.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
Lexical Analysis Constructing a Scanner from Regular Expressions.
Lexical Analyzer (Checker)
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
1 November 1, November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
CS 536 Fall Scanner Construction  Given a single string, automata and regular expressions retuned a Boolean answer: a given string is/is not in.
Compiler Construction 2 주 강의 Lexical Analysis. “get next token” is a command sent from the parser to the lexical analyzer. On receipt of the command,
1 Languages and Compilers (SProg og Oversættere) Lexical analysis.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
CS375 Compilers Lexical Analysis 4 th February, 2010.
Joey Paquet, 2000, Lecture 2 Lexical Analysis.
Compiler Construction Sohail Aslam Lecture 9. 2 DFA Minimization  The generated DFA may have a large number of states.  Hopcroft’s algorithm: minimizes.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CSE 425: Syntax I Syntax and Semantics Syntax gives the structure of statements in a language –Allowed ordering, nesting, repetition, omission of symbols.
C Chuen-Liang Chen, NTUCS&IE / 35 SCANNING Chuen-Liang Chen Department of Computer Science and Information Engineering National Taiwan University Taipei,
Exercise 1 Consider a language with the following tokens and token classes: ID ::= letter (letter|digit)* LT ::= " " shiftL ::= " >" dot ::= "." LP ::=
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
Lexical Analysis – Part II EECS 483 – Lecture 3 University of Michigan Wednesday, September 13, 2006.
Lexical Analysis.
1st Phase Lexical Analysis
Exercise Solution for Exercise (a) {1,2} {3,4} a b {6} a {5,6,1} {6,2} {4} {3} {5,6} { } b a b a a b b a a b a,b b b a.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 2: Lexical Analysis.
Prof. Necula CS 164 Lecture 31 Lexical Analysis Lecture 3-4.
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
using Deterministic Finite Automata & Nondeterministic Finite Automata
1 Topic 2: Lexing and Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer.
CS 154 Formal Languages and Computability February 9 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
1 Compiler Construction Vana Doufexi office CS dept.
Deterministic Finite Automata Nondeterministic Finite Automata.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Compiler Construction Lecture Three: Lexical Analysis - Part Two CSC 2103: Compiler Construction Lecture Three: Lexical Analysis - Part Two Joyce Nakatumba-Nabende.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Department of Software & Media Technology
Lecture 2 Lexical Analysis
RegExps & DFAs CS 536.
Recognizer for a Language
Lexical Analysis Lecture 3-4 Prof. Necula CS 164 Lecture 3.
Compiler Construction
Presentation transcript:

CS412/413 Introduction to Compilers Radu Rugina Lecture 4: Lexical Analyzers 28 Jan 02

CS 412/413 Spring 2002 Introduction to Compilers2 Outline DFA state minimization Lexical analyzers Automating lexical analysis Jlex lexical analyzer generator

CS 412/413 Spring 2002 Introduction to Compilers3 Finite Automata Finite automata: –States, transitions between states –Initial state, set of final states DFA = deterministic –Each transition consumes an input character –Each transition is uniquely determined by the input character NFA = non-deterministic –There may be ε-transitions, which do not consume input characters –There may be multiple transitions from the same state on the same input character

CS 412/413 Spring 2002 Introduction to Compilers4 From Regexp to DFA Two steps: –Convert the regular expression to an NFA –Convert the resulting NFA to a DFA The generated DFAs may have a large number of states State Minimization = optimization which converts a DFA to another DFA which recognizes the same language and has a minimum number of states

CS 412/413 Spring 2002 Introduction to Compilers5 Example: –DFA1: –DFA2: –Both DFAs accept: b*ab*a State Minimization a 4 a b 3 a b b b a a a b b

CS 412/413 Spring 2002 Introduction to Compilers6 Idea: find groups of equivalent states –all transitions from states in one group G 1 go to states in the same group G 2 –construct the minimized DFA such that there is one state for each group of states from the initial DFA State Minimization a 4 a b 3 a b b b a

CS 412/413 Spring 2002 Introduction to Compilers7 DFA Minimization Algorithm Step 1: Construct a partition P of the set of states having two groups: F = the set of final (accepting) states S-F = set of non-final states Step 2: Repeat Let P = G 1 U … U G n the current partition Partition each goup G i into subgroups: Two states s and t are in the same subgroup if, for each symbol a there are transitions s  s’ and t  t’ and s’, t’ belong to the same group G j Combine all the computed subgroups into a new partition P’ Until P = P’ Step3: Construct a DFA with one state for each group of states in the final partition P

CS 412/413 Spring 2002 Introduction to Compilers8 Optimized Acceptor RE  NFA NFA  DFA DFA Simulation Yes, if w  L(R) No, if w  L(R) Input String Regular Expression R w Minimize DFA

CS 412/413 Spring 2002 Introduction to Compilers9 Lexical Analyzers vs Acceptors Lexical analyzers use the same mechanism, but they: –Have multiple RE descriptions for multiple tokens –Have a character stream at the input –Return a sequence of matching tokens at the output (or an error) –Always return the longest matching token –For multiple longest matching tokens use rule priorities

CS 412/413 Spring 2002 Introduction to Compilers10 Lexical Analyzers RE  NFA NFA  DFA Minimize DFA DFA Simulation Character Stream REs for Tokens R1 … RnR1 … Rn program Token stream (and errors)

CS 412/413 Spring 2002 Introduction to Compilers11 Handling Multiple REs whitespace identifier number keywords     NFAs Minimized DFA Combine the NFAs of all the regular expressions into a single finite automata

CS 412/413 Spring 2002 Introduction to Compilers12 Lexical Analyzers Token stream at the output –Associate tokens with final states –Output the corresponding token when reaching a final state Longest match –When in a final state, look if there is a further transition; otherwise return the token for the current final state Rule priority –Same longest matching token when there is a final state corresponding to multiple tokens –Associate that final state to the token with the highest priority

CS 412/413 Spring 2002 Introduction to Compilers13 Automating Lexical Analysis All of the lexical analysis process can be automated ! –RE  DFA  Minimized DFA –Minimized DFA  Lexical Analyzer (DFA Simulation Program) We only need to specify: –Regular expressions for the tokens –Rule priorities for multiple longest match cases

CS 412/413 Spring 2002 Introduction to Compilers14 Lexical Analyzer Generators Jlex Compiler Character Stream REs for Tokens Token stream (and errors) lex.llex.javalex.class javac Compiler program

CS 412/413 Spring 2002 Introduction to Compilers15 Jlex Specification File Jlex = Lexical analyzer generator –written in Java –generates a Java lexical analyzer Has three parts: –Preamble, which contains package/import declarations –Definitions, which contains regular expression abbreviations –Regular expressions and actions, which contains: the list of regular expressions for all the tokens Corresponding actions for each token (Java code to be executed when the token is returned)

CS 412/413 Spring 2002 Introduction to Compilers16 Example Specification File Package Parse; Import ErrorMsg.ErrorMsg; % digits = 0|[1-9][0-9]* letter = [A-Za-z] identifier = {letter}({letter}|[0-9_])* whitespace = [\ \t\n\r]+ % {whitespace} {/* discard */} {digits}{ return new IntegerConstant(Integer.parseInt(yytext()); } “if” { return new IfToken(); } “while” { return new WhileToken(); } {identifier}{ return new IdentifierToken(yytext()); }.{ ErrorMsg.error(“illegal character”); }

CS 412/413 Spring 2002 Introduction to Compilers17 Start States Mechanism which specifies in which state to start the execution of the DFA Define states in the second section –%state STATE Use states as prefixes of regular expressions in the third section: – regex {action} Set current state in the actions –yybegin(STATE) There is a pre-defined initial state: YYINITIAL

CS 412/413 Spring 2002 Introduction to Compilers18 Example COMMENT INITIAL. *) (* if %state COMMENT % “if”{ return new IfToken(); } “(*”{ yybegin(COMMENT); } “*)”{ yybegin(YYINITIAL); }.{}

CS 412/413 Spring 2002 Introduction to Compilers19 Start States and REs The use of states allow the lexer to recognize more than regular expressions (or DFAs) –Reason: the lexer can jump across different states in the semantic actions using yybegin(STATE) Example: nested comments –Increment a global variable on open parentheses and decrement it on close parentheses –When the variable gets to zero, jump to YYINITIAL –The global variable essentially models an infinite number of states!

CS 412/413 Spring 2002 Introduction to Compilers20 Conclusion Regular expressions: concise way of specifying tokens Can convert RE to NFA, then to DFA, then to minimized DFA Use the minimized DFA to recognize tokens in the input stream Automate the process using lexical analyzer generators –Write regular expression descriptions of tokens –Automatically get a lexical analyzer program which identifies tokens from an input stream of characters Read Chapter 2, Appel.