Lecture 5: Lexical Analysis III: The final bits

Lecture 5: Lexical Analysis III: The final bits

[Diagram: source code → front end → IR → back end → object code]

Lexical analysis (from the last two lectures):
- Regular Expressions (REs) are formulae that describe a (regular) language.
- Every RE can be converted to a Deterministic Finite Automaton (DFA).
- DFAs can automate the construction of lexical analysers.
- Algorithms to construct a DFA from an RE (through an NFA) were given.

Today's lecture:
- How to minimise the (resulting) DFA
- Practical considerations; lex; wrap-up

30-Nov-18 COMP36512 Lecture 5

DFA minimisation: the problem

Problem: can we minimise the number of states?
Answer: yes, if we can find groups of states such that, for each input symbol, every state of a group has all its transitions into the same group.

DFA minimisation: the algorithm (Hopcroft's algorithm: simple version)

Divide the states of the DFA into two groups: the final states and the non-final states.
while there are group changes:
    for each group:
        for each input symbol:
            if, for any two states of the group and a given input symbol, their transitions do not lead to the same group, these states must belong to different groups.

For the curious, there is an alternative approach: create a graph with an edge between each pair of states that cannot coexist in a group because of the conflict above. Then use a graph-colouring algorithm to find the minimum number of colours needed so that no two nodes connected by an edge share a colour (we'll examine graph-colouring algorithms later on, in register allocation).

How does it work? Recall (a | b)* abb

In each iteration we consider any non-single-member groups and apply the partitioning criterion to all pairs of states in each group.

[Diagram: the 5-state DFA with states A–E (final state E) is minimised to a 4-state DFA in which A and C are merged into a single state.]
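The splitting in the example above can be sketched in code. The following Python is an illustrative sketch, not part of the course material; the transition table is an assumed encoding of the 5-state DFA for (a | b)* abb read off the diagram (states A–E, start A, final E).

```python
# Simple split-until-stable DFA minimisation, as on the slide:
# start from {final, non-final} and split any group whose members
# disagree, for some symbol, on which group they move to.
def minimise(states, alphabet, delta, finals):
    groups = [set(finals), set(states) - set(finals)]
    groups = [g for g in groups if g]          # drop an empty side
    changed = True
    while changed:
        changed = False
        refined = []
        for g in groups:
            # bucket the states of g by the tuple of target groups,
            # one entry per input symbol
            buckets = {}
            for s in g:
                key = tuple(next(i for i, h in enumerate(groups)
                                 if delta[s][a] in h)
                            for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
            if len(buckets) > 1:
                changed = True                 # g was split: iterate again
        groups = refined
    return groups

# Assumed transition table of the DFA for (a|b)*abb from the diagram.
delta = {'A': {'a': 'B', 'b': 'C'}, 'B': {'a': 'B', 'b': 'D'},
         'C': {'a': 'B', 'b': 'C'}, 'D': {'a': 'B', 'b': 'E'},
         'E': {'a': 'B', 'b': 'C'}}
groups = minimise('ABCDE', 'ab', delta, {'E'})
# A and C end up in the same group; B, D and E each stay on their own.
```

Running it reproduces the result on the slide: four groups, with A and C merged.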

From NFA to minimised DFA: recall a (b | c)*

[Diagram: the Thompson-construction NFA with states S0–S9 and ε-moves, and the corresponding subset-construction DFA with states A, B, C, D.]

DFA minimisation: recall a (b | c)*

Apply the minimisation algorithm to produce the minimal DFA: B, C and D collapse into a single group, giving a two-state DFA (A moves on a to {B,C,D}, which loops on b | c).

Remember, last time, we said that a human could construct a simpler automaton than Thompson's construction? Well, algorithms can produce the same DFA!

Building fast scanners...

From an RE to a DFA: why have we been doing all this? If we know the transition table and the final state(s), we can directly build a recogniser that detects acceptance (recall from Lecture 3):

    state = 0; word = '';
    char = input_char();
    while (char != EOF) {
        state = table(state, char);     // '-' marks an error entry
        if (state == '-') return failure;
        word = word + char;
        char = input_char();
    }
    if (state == FINAL) return acceptance; else return failure;

But table-driven recognisers can waste a lot of effort. Can we do better? Encode the states and actions directly in the code.

How to deal with reserved keywords? Some compilers recognise keywords as identifiers and look them up in a table. It is simpler to include them as REs in the lexical analyser's specification.
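The table-driven loop above can be made concrete. The following is a hypothetical Python sketch for the register DFA r digit digit* from Lecture 3; the state numbering and the use of missing table entries for the '-' error entries are assumptions for illustration.

```python
# Table-driven recogniser for Register -> r digit digit* .
# State 0 is the start state, state 2 the only final state; a missing
# (state, class) entry plays the role of the '-' error entry.
def char_class(c):
    if c == 'r':
        return 'r'
    if c.isdigit():
        return 'digit'
    return 'other'

TABLE = {(0, 'r'): 1, (1, 'digit'): 2, (2, 'digit'): 2}
FINAL = {2}

def recognise(text):
    state = 0
    word = ''
    for c in text:                      # end of text stands in for EOF
        state = TABLE.get((state, char_class(c)), '-')
        if state == '-':                # hit an error entry: reject
            return False
        word += c
    return state in FINAL
```

For example, recognise('r17') accepts, while recognise('r') rejects because state 1 is not final.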

Direct coding from the transition table

Recall Register → r digit digit* (Lecture 3, slides 10-11):

    word = ''; char = '';
    goto s0;
    s0: word = word + char;
        char = input_char();
        if (char == 'r') then goto s1 else goto error;
    s1: word = word + char;
        char = input_char();
        if ('0' <= char <= '9') then goto s2 else goto error;
    s2: word = word + char;
        char = input_char();
        if ('0' <= char <= '9') then goto s2
        else if (char == EOF) return acceptance
        else goto error;
    error: print "error"; return failure;

- fewer operations
- avoids memory operations (especially important for large tables)
- the added complexity may make the code ugly and difficult to understand
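The goto-based code above can be mimicked in a language without gotos by inlining each state's test in sequence. This is an illustrative sketch only (the function name and return convention are assumptions, not part of the slide):

```python
# Direct-coded recogniser for Register -> r digit digit* : the per-state
# logic is compiled into the code itself instead of a table lookup.
def recognise_register(text):
    i, n, word = 0, len(text), ''
    # s0: expect 'r'
    if i == n or text[i] != 'r':
        return None                     # goto error
    word += text[i]; i += 1
    # s1: expect at least one digit
    if i == n or not text[i].isdigit():
        return None                     # goto error
    word += text[i]; i += 1
    # s2: loop on further digits; end of input here means acceptance
    while i < n and text[i].isdigit():
        word += text[i]; i += 1
    if i == n:
        return word                     # return acceptance
    return None                         # trailing junk: goto error
```

The two versions accept the same language; the direct-coded one trades table lookups for branches, which is the point of the slide.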

Practical considerations

Poor language design may complicate lexical analysis:
- if then then = else; else else = then (PL/I)
- DO5I=1,25 vs DO5I=1.25 (Fortran ignores blanks: urban legend has it that an error like this caused the crash of an early NASA mission)
The development of a sound theoretical basis has influenced language design positively.

Template syntax in C++: aaaa<mytype> vs aaaa<mytype<int>> (>> is also the operator for writing to the output stream). The lexical analyser treats the >> here as two consecutive > symbols; the confusion is resolved by the parser (by matching each < with its >).
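The >> confusion stems from longest-match ("maximal munch") scanning: a scanner that always takes the longest operator reads >> as a single token, which is exactly why the parser has to step in for nested templates. A toy sketch (the operator set and function are hypothetical):

```python
# Maximal munch: always take the longest operator that matches at
# the current position. Such a scanner reads the ">>" that closes
# "aaaa<mytype<int>>" as one stream-output/shift token, not two '>'s.
OPERATORS = {'<', '>', '<<', '>>', '>='}

def next_operator(s, i):
    """Longest operator starting at position i, or None."""
    for length in (2, 1):               # try the 2-char operators first
        if s[i:i + length] in OPERATORS:
            return s[i:i + length]
    return None
```

For instance, next_operator('a>>b', 1) yields '>>' rather than '>', illustrating why the two closing brackets are fused at the lexical level.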

Lex/Flex: generating lexical analysers

Flex is a tool for generating scanners: programs which recognise lexical patterns in text (from the online manual pages: % man flex). Lex input consists of 3 sections: definitions (names for regular expressions); pairs of regular expressions and C code; auxiliary C code. When the lex input is compiled, it generates as output a C source file, lex.yy.c, that contains a routine yylex(). After compiling the C file, the executable will isolate tokens from the input according to the regular expressions and, for each token, execute the code associated with it. The array char yytext[] contains the text of the current token.

flex example

Lex specification:

    %{
    #define ERROR -1
    int line_number=1;
    %}
    whitespace [ \t]
    letter     [a-zA-Z]
    digit      [0-9]
    integer    ({digit}+)
    l_or_d     ({letter}|{digit})
    identifier ({letter}{l_or_d}*)
    operator   [-+*/]
    separator  [;,(){}]
    %%
    {integer}              {return 1;}
    {identifier}           {return 2;}
    {operator}|{separator} {return (int)yytext[0];}
    {whitespace}           {}
    \n                     {line_number++;}
    .                      {return ERROR;}
    %%
    int yywrap(void) {return 1;}
    int main() {
        int token;
        yyin = fopen("myfile", "r");
        while ((token = yylex()) != 0)
            printf("%d %s \n", token, yytext);
        printf("lines %d \n", line_number);
    }

Input file ("myfile"):

    123+435+34=aaaa
    329*45/a-34*(45+23)**3
    bye-bye

Output (one token per line: the token code, then its text; note that = matches no rule except the catch-all ., so it is reported as ERROR, i.e. -1, and that operators/separators return their ASCII codes):

    1 123
    43 +
    1 435
    43 +
    1 34
    -1 =
    2 aaaa
    1 329
    42 *
    1 45
    47 /
    2 a
    45 -
    1 34
    42 *
    40 (
    1 45
    43 +
    1 23
    41 )
    42 *
    42 *
    1 3
    2 bye
    45 -
    2 bye
    lines 4

Conclusion

Lexical analysis turns a stream of characters into a stream of tokens: a largely automatic process. REs are powerful enough to specify scanners. DFAs have good properties for an implementation.

Next time: Introduction to Parsing
Reading: Aho2 3.9.6, 3.5 (lex); Aho1 pp. 141-144, 105-111 (lex); Grune pp. 86-96; Hunter pp. 53-62; Cooper1 pp. 55-72.
Exercises: Aho2 p. 166, p. 186; Aho1 pp. 146-150; Hunter p. 71; Grune pp. 185-187.