February 23, 2016, Azusa, CA. Sheldon X. Liang, Ph.D., Computer Science at Azusa Pacific University.

Presentation transcript:

1 February 23, 2016, Azusa, CA
  Sheldon X. Liang, Ph.D., Computer Science at Azusa Pacific University
  Azusa Pacific University, Azusa, CA 91702, Tel: (800)
  Department of Computer Science, CS400 Compiler Construction

2 Lexical Analysis & Lexical Analyzer Generators
  – Formalization
  – Regular Expressions
  – Finite Automata
  – RE
  – Conversion
  – FA
  – Lexer Design

3 Keep in mind the following questions
  RegEx to FA
    – Conversion
    – Finite state processing
    – Table-driven processing
  Thompson Construction
    – Compound of RE by NFA
    – Rule out ambiguity
    – Finite state / table-driven processing
  RE -> DFA (Directly)
    – Conversion without ambiguity
    – Closure turns to finite
    – Table-driven processing

4 Design of a Lexical Analyzer Generator: RE to NFA to DFA
  Lex specification with regular expressions:
      p1 { action1 }
      p2 { action2 }
      …
      pn { actionn }
  [Diagram: Lex specification -> NFA -> (subset construction) -> DFA]

5 From Regular Expression to NFA (Thompson’s Construction)

6 From Regular Expression to DFA Directly
  – The “important states” of an NFA are those with a non-ε out-transition; that is, if move({s}, a) ≠ ∅ for some a, then s is an important state.
  – The subset construction algorithm uses only the important states when it determines ε-closure(move(T, a)).

7 From Regular Expression to DFA Directly (Algorithm)
  – Augment the regular expression r with a special end symbol # to make accepting states important: the new expression is r#
  – Construct a syntax tree for r#
  – Traverse the tree to construct the functions nullable, firstpos, lastpos, and followpos

8 From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#

9 From Regular Expression to DFA Directly: Annotating the Tree
  – nullable(n): true if the subtree at node n generates a language that includes the empty string
  – firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
  – lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
  – followpos(i): the set of positions that can follow position i in the tree
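
The three node functions above can be sketched in C as a recursive walk over the syntax tree. This is only an illustration, not code from the slides: the `Node` layout, the bitmask type `PosSet` (bit p - 1 stands for position p, so at most 32 positions), and the variable names for the tree of (a|b)*abb# are all assumptions made here. The per-node rules are the usual textbook ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Node kinds of a regular-expression syntax tree. */
typedef enum { LEAF, EPSILON, CAT, OR, STAR } Kind;

typedef struct Node {
    Kind kind;
    int pos;                          /* 1-based position; meaningful for LEAF only */
    const struct Node *left, *right;  /* STAR uses left only */
} Node;

/* A set of positions as a bitmask: bit (p - 1) is set when position p is a member. */
typedef uint32_t PosSet;

bool nullable(const Node *n) {
    switch (n->kind) {
    case LEAF:    return false;
    case EPSILON: return true;
    case CAT:     return nullable(n->left) && nullable(n->right);
    case OR:      return nullable(n->left) || nullable(n->right);
    case STAR:    return true;
    }
    return false;
}

PosSet firstpos(const Node *n) {
    switch (n->kind) {
    case LEAF:
        assert(n->pos >= 1 && n->pos <= 32);
        return (PosSet)1 << (n->pos - 1);
    case EPSILON: return 0;
    case CAT:     /* the right child contributes only if the left can be empty */
        return nullable(n->left) ? (firstpos(n->left) | firstpos(n->right))
                                 : firstpos(n->left);
    case OR:      return firstpos(n->left) | firstpos(n->right);
    case STAR:    return firstpos(n->left);
    }
    return 0;
}

PosSet lastpos(const Node *n) {
    switch (n->kind) {
    case LEAF:
        assert(n->pos >= 1 && n->pos <= 32);
        return (PosSet)1 << (n->pos - 1);
    case EPSILON: return 0;
    case CAT:     /* the left child contributes only if the right can be empty */
        return nullable(n->right) ? (lastpos(n->left) | lastpos(n->right))
                                  : lastpos(n->right);
    case OR:      return lastpos(n->left) | lastpos(n->right);
    case STAR:    return lastpos(n->left);
    }
    return 0;
}

/* Syntax tree of (a|b)*abb# with positions 1..6 numbered left to right. */
const Node p1 = { LEAF, 1, NULL, NULL };        /* a */
const Node p2 = { LEAF, 2, NULL, NULL };        /* b */
const Node alt = { OR, 0, &p1, &p2 };           /* a|b */
const Node star_ab = { STAR, 0, &alt, NULL };   /* (a|b)* */
const Node p3 = { LEAF, 3, NULL, NULL };        /* a */
const Node cat1 = { CAT, 0, &star_ab, &p3 };
const Node p4 = { LEAF, 4, NULL, NULL };        /* b */
const Node cat2 = { CAT, 0, &cat1, &p4 };
const Node p5 = { LEAF, 5, NULL, NULL };        /* b */
const Node cat3 = { CAT, 0, &cat2, &p5 };
const Node p6 = { LEAF, 6, NULL, NULL };        /* # */
const Node root = { CAT, 0, &cat3, &p6 };
```

Calling `firstpos(&root)` yields the bitmask for {1, 2, 3} and `lastpos(&root)` the bitmask for {6}. A bitmask keeps the sketch short; a real generator would need a growable set type for larger expressions.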

10 From Regular Expression to DFA Directly: Annotating the Tree

11 From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#

12 From Regular Expression to DFA Directly: followpos
    for each node n in the tree do
        if n is a cat-node with left child c1 and right child c2 then
            for each i in lastpos(c1) do
                followpos(i) := followpos(i) ∪ firstpos(c2)
            end do
        else if n is a star-node then
            for each i in lastpos(n) do
                followpos(i) := followpos(i) ∪ firstpos(n)
            end do
        end if
    end do
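
A possible C rendering of the followpos pseudocode is sketched below. It is self-contained, so it repeats compact versions of nullable, firstpos, and lastpos. The `Node` layout, the bitmask type `PosSet` (bit p - 1 stands for position p), the helper `add_follow`, and the tree variables for (a|b)*abb# are assumptions made for this illustration. ε leaves are left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POS 32

typedef enum { LEAF, CAT, OR, STAR } Kind;
typedef struct Node {
    Kind kind;
    int pos;                          /* 1-based position; LEAF only */
    const struct Node *left, *right;  /* STAR uses left only */
} Node;
typedef uint32_t PosSet;              /* bit (p - 1) set: position p is a member */

static bool nullable(const Node *n) {
    if (n->kind == LEAF) return false;
    if (n->kind == STAR) return true;
    if (n->kind == CAT)  return nullable(n->left) && nullable(n->right);
    return nullable(n->left) || nullable(n->right);      /* OR */
}

static PosSet firstpos(const Node *n) {
    if (n->kind == LEAF) return (PosSet)1 << (n->pos - 1);
    if (n->kind == STAR) return firstpos(n->left);
    if (n->kind == CAT && !nullable(n->left)) return firstpos(n->left);
    return firstpos(n->left) | firstpos(n->right);       /* OR, or CAT with nullable left */
}

static PosSet lastpos(const Node *n) {
    if (n->kind == LEAF) return (PosSet)1 << (n->pos - 1);
    if (n->kind == STAR) return lastpos(n->left);
    if (n->kind == CAT && !nullable(n->right)) return lastpos(n->right);
    return lastpos(n->left) | lastpos(n->right);         /* OR, or CAT with nullable right */
}

/* Adds `addition` to followpos(i) for every position i in `from`. */
static void add_follow(PosSet from, PosSet addition, PosSet follow[MAX_POS]) {
    for (int i = 0; i < MAX_POS; i++)
        if (from & ((PosSet)1 << i))
            follow[i] |= addition;
}

/* Visits every node once; follow[i] holds followpos(i + 1) and must start zeroed. */
void compute_followpos(const Node *n, PosSet follow[MAX_POS]) {
    if (n == NULL) return;
    compute_followpos(n->left, follow);
    compute_followpos(n->right, follow);
    if (n->kind == CAT)
        add_follow(lastpos(n->left), firstpos(n->right), follow);
    else if (n->kind == STAR)
        add_follow(lastpos(n), firstpos(n), follow);
}

/* Syntax tree of (a|b)*abb# with positions 1..6 numbered left to right. */
const Node p1 = { LEAF, 1, NULL, NULL };        /* a */
const Node p2 = { LEAF, 2, NULL, NULL };        /* b */
const Node alt = { OR, 0, &p1, &p2 };
const Node star_ab = { STAR, 0, &alt, NULL };
const Node p3 = { LEAF, 3, NULL, NULL };        /* a */
const Node cat1 = { CAT, 0, &star_ab, &p3 };
const Node p4 = { LEAF, 4, NULL, NULL };        /* b */
const Node cat2 = { CAT, 0, &cat1, &p4 };
const Node p5 = { LEAF, 5, NULL, NULL };        /* b */
const Node cat3 = { CAT, 0, &cat2, &p5 };
const Node p6 = { LEAF, 6, NULL, NULL };        /* # */
const Node root = { CAT, 0, &cat3, &p6 };
```

Running `compute_followpos(&root, follow)` on a zeroed array gives followpos(1) = followpos(2) = {1, 2, 3}, followpos(3) = {4}, followpos(4) = {5}, followpos(5) = {6}, and followpos(6) = ∅. The sketch recomputes firstpos and lastpos at every node, which is fine for a small tree; caching them in the nodes would avoid the repeated work.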

13 From Regular Expression to DFA Directly: Algorithm
    s0 := firstpos(root) where root is the root of the syntax tree
    Dstates := {s0} and is unmarked
    while there is an unmarked state T in Dstates do
        mark T
        for each input symbol a ∈ Σ do
            let U be the set of positions that are in followpos(p) for some position p in T, such that the symbol at position p is a
            if U is not empty and not in Dstates then
                add U as an unmarked state to Dstates
            end if
            Dtran[T, a] := U
        end do
    end do
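
The construction loop can be sketched in C as follows, assuming followpos has already been computed. The `Dfa` struct, the bitmask type `PosSet` (bit p - 1 stands for position p), the fixed two-symbol alphabet {a, b}, and the `symbol_at` encoding are all assumptions made for this illustration. An empty U is recorded as -1 in the transition table.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STATES 32
#define NSYM 2                        /* alphabet {a, b}: index 0 is a, index 1 is b */

typedef uint32_t PosSet;              /* bit (p - 1) set: position p is a member */

typedef struct {
    int n_states;
    PosSet states[MAX_STATES];        /* states[0] is the start state */
    int tran[MAX_STATES][NSYM];       /* target state index, or -1 for none */
} Dfa;

/*
 * follow[p - 1] is followpos(p); symbol_at[p - 1] is the alphabet index of
 * position p, or -1 for the end marker #.
 */
void build_dfa(PosSet start, const PosSet *follow, const int *symbol_at,
               int n_pos, Dfa *d) {
    d->n_states = 1;
    d->states[0] = start;
    /* States with index below t are "marked"; new states are appended. */
    for (int t = 0; t < d->n_states; t++) {
        for (int a = 0; a < NSYM; a++) {
            PosSet u = 0;
            for (int p = 0; p < n_pos; p++)
                if ((d->states[t] & ((PosSet)1 << p)) && symbol_at[p] == a)
                    u |= follow[p];
            int target = -1;
            if (u != 0) {
                for (int s = 0; s < d->n_states; s++)
                    if (d->states[s] == u) { target = s; break; }
                if (target < 0) {     /* U is new: add it as an unmarked state */
                    assert(d->n_states < MAX_STATES);
                    target = d->n_states;
                    d->states[d->n_states++] = u;
                }
            }
            d->tran[t][a] = target;
        }
    }
}
```

For (a|b)*abb#, starting from firstpos(root) = {1, 2, 3} with followpos(1) = followpos(2) = {1, 2, 3}, followpos(3) = {4}, followpos(4) = {5}, followpos(5) = {6}, the loop produces four states: {1,2,3}, {1,2,3,4}, {1,2,3,5}, and {1,2,3,6}. A state is accepting when it contains the position of #, here position 6.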

14 From Regular Expression to DFA Directly: Example

15 Time-Space Tradeoffs
    Automaton   Space (worst case)   Time (worst case)
    NFA         O(|r|)               O(|r| × |x|)
    DFA         O(2^|r|)             O(|x|)
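
To make the NFA row concrete, here is a sketch of simulating an NFA directly: the automaton stays small, but every input character requires work over the whole current set of states. The automaton is an 11-state Thompson-style NFA for (a|b)*abb. The state numbering 0 to 10, the bitmask encoding, and all names are assumptions made for this illustration. The ε-closure uses a simple fixed-point loop instead of the usual stack-based algorithm, so it is not tuned to the bound in the table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { N_NFA = 11, NFA_START = 0, NFA_ACCEPT = 10 };

typedef uint32_t StateSet;            /* bit s set: NFA state s is a member */
#define S(i) ((StateSet)1 << (i))

/* Transitions of the NFA for (a|b)*abb, indexed by source state. */
static const StateSet eps[N_NFA]  = { S(1) | S(7), S(2) | S(4), 0, S(6), 0,
                                      S(6), S(1) | S(7), 0, 0, 0, 0 };
static const StateSet on_a[N_NFA] = { 0, 0, S(3), 0, 0, 0, 0, S(8), 0, 0, 0 };
static const StateSet on_b[N_NFA] = { 0, 0, 0, 0, S(5), 0, 0, 0, S(9), S(10), 0 };

/* Adds every state reachable through epsilon edges until nothing changes. */
static StateSet eps_closure(StateSet t) {
    StateSet prev;
    do {
        prev = t;
        for (int s = 0; s < N_NFA; s++)
            if (t & S(s))
                t |= eps[s];
    } while (t != prev);
    return t;
}

/* States reachable from t on input character c; empty for other characters. */
static StateSet nfa_move(StateSet t, char c) {
    const StateSet *delta = (c == 'a') ? on_a : (c == 'b') ? on_b : NULL;
    StateSet u = 0;
    if (delta == NULL)
        return 0;
    for (int s = 0; s < N_NFA; s++)
        if (t & S(s))
            u |= delta[s];
    return u;
}

/* Returns true when the NFA accepts the whole string x. */
bool nfa_accepts(const char *x) {
    StateSet cur = eps_closure(S(NFA_START));
    for (; *x != '\0'; x++)
        cur = eps_closure(nfa_move(cur, *x));
    return (cur & S(NFA_ACCEPT)) != 0;
}
```

With this encoding, `nfa_accepts("abb")` and `nfa_accepts("babb")` are true, while `nfa_accepts("ab")` is false. In the table, |r| is the length of the regular expression and |x| the length of the input string.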

16 Got it with the following questions
  RegEx to FA
    – Conversion
    – Finite state processing
    – Table-driven processing
  Thompson Construction
    – Compound of RE by NFA
    – Rule out ambiguity
    – Finite state / table-driven processing
  RE -> DFA (Directly)
    – Conversion without ambiguity
    – Closure turns to finite
    – Table-driven processing

17 Thank you very much! Questions?


19 Lexical Analysis & Lexical Analyzer Generators
  – Formalization
  – Regular Expressions
  – Finite Automata
  – RE
  – Conversion
  – FA
  – Lexer Design

20 Keep in mind the following questions
  Design: from RE to FA
    – From RegEx to NFA
    – From NFA to DFA
    – Lex Specification
  Lex / Yacc Compiler
    – Lex Specification
    – Yet Another Compiler Compiler
    – From stream to interesting bits
  Design Organization
    – Initial stage
    – L/S/S analysis stage
    – IR as outputs for further generation

21 Design of a Lexical Analyzer Generator
  – Translate regular expressions to NFA
  – Translate NFA to an efficient DFA (optional)
  – Simulate NFA to recognize tokens, or simulate DFA to recognize tokens
  [Diagram: regular expressions -> NFA -> DFA]
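
The "simulate DFA" step is typically table-driven: one table lookup per input character. Below is a minimal C sketch for the language (a|b)*abb. The four-state table is written out by hand, and the state numbering, array names, and function name are assumptions made for this illustration, not output from any generator.

```c
#include <stdbool.h>

enum { N_STATES = 4, N_SYMBOLS = 2 };

/* dtran[state][symbol], with symbol 0 for 'a' and 1 for 'b'. */
static const int dtran[N_STATES][N_SYMBOLS] = {
    /* state 0: nothing useful seen */ { 1, 0 },
    /* state 1: seen ...a           */ { 1, 2 },
    /* state 2: seen ...ab          */ { 1, 3 },
    /* state 3: seen ...abb         */ { 1, 0 },
};

static const bool accepting[N_STATES] = { false, false, false, true };

/* Returns true when x consists only of 'a' and 'b' and ends in "abb". */
bool dfa_accepts(const char *x) {
    int state = 0;
    for (; *x != '\0'; x++) {
        if (*x != 'a' && *x != 'b')
            return false;             /* symbol outside the alphabet */
        state = dtran[state][*x - 'a'];
    }
    return accepting[state];
}
```

For example, `dfa_accepts("aabb")` is true and `dfa_accepts("abba")` is false. A generated scanner differs in that it keeps running after reaching an accepting state so it can report the longest match, which this sketch does not attempt.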

22 Design of a Lexical Analyzer Generator: RE to NFA to DFA
  Lex specification with regular expressions:
      p1 { action1 }
      p2 { action2 }
      …
      pn { actionn }
  [Diagram: Lex specification -> NFA -> (subset construction) -> DFA]

23 The Lex and Flex Scanner Generators
  – Lex and its newer cousin flex are scanner generators
  – They systematically translate regular definitions into C source code for efficient scanning
  – Generated code is easy to integrate in C applications

24 Creating a Lexical Analyzer with Lex and Flex
  – lex source program lex.l -> lex or flex compiler -> lex.yy.c
  – lex.yy.c -> C compiler -> a.out
  – input stream -> a.out -> sequence of tokens

25 Lex/Flex Compiler
  What is Lex?
    Lex is officially known as a "Lexical Analyser". Its main job is to break up an input stream into more usable elements.
  What is Yacc?
    Yacc is officially known as a "parser". Its job is to analyse the structure of the input stream and operate on the "big picture". Yacc stands for "Yet Another Compiler Compiler".
  What are Flex and Bison?
    Lex and Yacc are part of BSD Unix. GNU has its own, enhanced versions called Flex and Bison. I'll keep referring to "Lex" and "Yacc", but you can use Flex and Bison as "drop-in" replacements in most cases. In fact, the additional features of Flex and Bison make them an irresistible choice.

26 Lex/Flex Compiler

27 Lex Specification
  A lex specification consists of three parts, separated by %% lines:
      regular definitions, C declarations in %{ %}
      %%
      translation rules
      %%
      user-defined auxiliary procedures
  The translation rules are of the form:
      p1 { action1 }
      p2 { action2 }
      …
      pn { actionn }

28 Regular Expressions in Lex
    x          match the character x
    \.         match the character .
    "string"   match contents of string of characters
    .          match any character except newline
    ^          match beginning of a line
    $          match the end of a line
    [xyz]      match one character x, y, or z (use \ to escape -)
    [^xyz]     match any character except x, y, and z
    [a-z]      match one of a to z
    r*         closure (match zero or more occurrences)
    r+         positive closure (match one or more occurrences)
    r?         optional (match zero or one occurrence)
    r1r2       match r1 then r2 (concatenation)
    r1|r2      match r1 or r2 (union)
    (r)        grouping
    r1/r2      match r1 when followed by r2
    {d}        match the regular expression defined by d

29 Example Lex Specification 1
    %{
    #include <stdio.h>
    %}
    %%
    [0-9]+  { printf("%s\n", yytext); }
    .|\n    { }
    %%
    int main(void)
    { yylex();
      return 0;
    }
  Notes:
    – The lines between the two %% markers are the translation rules
    – yytext contains the matching lexeme
    – yylex() invokes the lexical analyzer
  Build and run:
    lex spec.l
    gcc lex.yy.c -ll
    ./a.out < spec.l

30 Example Lex Specification 2
    %{
    #include <stdio.h>
    int ch = 0, wd = 0, nl = 0;
    %}
    delim     [ \t]+
    %%
    \n        { ch++; wd++; nl++; }
    ^{delim}  { ch+=yyleng; }
    {delim}   { ch+=yyleng; wd++; }
    .         { ch++; }
    %%
    int main(void)
    { yylex();
      printf("%8d%8d%8d\n", nl, wd, ch);
      return 0;
    }
  Notes:
    – delim is a regular definition
    – The lines between the two %% markers are the translation rules

31 Example Lex Specification 3
    %{
    #include <stdio.h>
    %}
    digit     [0-9]
    letter    [A-Za-z]
    id        {letter}({letter}|{digit})*
    %%
    {digit}+  { printf("number: %s\n", yytext); }
    {id}      { printf("ident: %s\n", yytext); }
    .         { printf("other: %s\n", yytext); }
    %%
    int main(void)
    { yylex();
      return 0;
    }
  Notes:
    – digit, letter, and id are regular definitions
    – The lines between the two %% markers are the translation rules
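
For comparison, the classification done by Specification 3 can be sketched by hand in C. This is not what lex generates, only an illustration of the same three token classes. The enum `TokenKind`, the function `next_token`, and its cursor-based interface are assumptions made here. Unlike the lex rule for `.`, the sketch also reports a newline as "other".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_EOF, TOK_NUMBER, TOK_IDENT, TOK_OTHER } TokenKind;

/*
 * Scans one token starting at *cursor, stores its length in *len,
 * and advances *cursor past it. Each branch consumes the longest match.
 */
TokenKind next_token(const char **cursor, size_t *len) {
    assert(cursor != NULL && *cursor != NULL && len != NULL);
    const char *start = *cursor;
    const char *p = start;
    TokenKind kind;

    if (*p == '\0') {
        *len = 0;
        return TOK_EOF;
    }
    if (isdigit((unsigned char)*p)) {          /* {digit}+ */
        while (isdigit((unsigned char)*p))
            p++;
        kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*p)) {   /* {letter}({letter}|{digit})* */
        while (isalnum((unsigned char)*p))
            p++;
        kind = TOK_IDENT;
    } else {                                   /* any other single character */
        p++;
        kind = TOK_OTHER;
    }
    *len = (size_t)(p - start);
    *cursor = p;
    return kind;
}
```

Called repeatedly on the input `x1 42+`, the function returns an identifier of length 2, an "other" token for the space, a number of length 2, an "other" token for `+`, and then `TOK_EOF`. The character tests rely on `isalpha` and `isdigit`, which match `[A-Za-z]` and `[0-9]` in the default "C" locale.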

32 Organization

33 Got it with the following questions
  Design: from RE to FA
    – From RegEx to NFA
    – From NFA to DFA
    – Lex Specification
  Lex / Yacc Compiler
    – Lex Specification
    – Yet Another Compiler Compiler
    – From stream to interesting bits
  Design Organization
    – Initial stage
    – L/S/S analysis stage
    – IR as outputs for further generation

34 Thank you very much! Questions?