Compiler Construction Lexical Analysis Rina Zviel-Girshin and Ohad Shacham School of Computer Science Tel-Aviv University.

2 Administration
Project teams
  Send me your group
  Send me an email if you can't find a team
First PA: two weeks
There are no recitations next week
November 26, Schreiber 07: 9:00-10:00, 13:00-14:00
November 27, Schreiber 07: 10:00-11:00

3 Generic compiler structure
Compiler: Source text (txt) -> Frontend (analysis) -> Semantic Representation -> Backend (synthesis) -> Executable code (exe)

4 Compiler
IC compiler: IC program (ic) -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST, Symbol Table, etc. -> Intermediate Representation (IR) -> Code Generation -> x86 executable (exe)

5 Lexical Analysis
Converts characters to tokens
  class Quicksort {
    int[] a;
    int partition(int low, int high) {
      int pivot = a[low];
      ...
  }
  1: CLASS  1: CLASS_ID(Quicksort)  1: LCBR
  2: INT  2: LB  2: RB  2: ID(a) ...  2: SEMI

6 Lexical Analysis
Tokens
  ID: _size, _num
  Num: 7, 5, 9, ...
  COMMA: ,
  SEMI: ;
  ...
Non-tokens
  Comment: //
  Whitespace
  Macro
  ...
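The token/non-token split can be illustrated with a small hand-written scanner. This is a sketch under assumptions, not IC's real lexer: the class name `SimpleScanner` and the `line: KIND(text)` output strings are hypothetical, and only the four token kinds from the slide (ID, NUM, COMMA, SEMI) are recognized. Whitespace and `//` comments are consumed without producing a token, which is exactly what makes them non-tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal scanner: emits ID, NUM, COMMA and SEMI tokens,
// and silently consumes whitespace and "//" comments (the non-tokens).
final class SimpleScanner {
    static List<String> scan(String src) {
        List<String> out = new ArrayList<>();
        int i = 0, line = 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') {                          // newline: no token, but advance the line counter
                line++;
                i++;
            } else if (Character.isWhitespace(c)) {   // other whitespace: no token
                i++;
            } else if (c == '/' && i + 1 < src.length() && src.charAt(i + 1) == '/') {
                while (i < src.length() && src.charAt(i) != '\n') i++; // comment: skip to end of line
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                out.add(line + ": ID(" + src.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(line + ": NUM(" + src.substring(start, i) + ")");
            } else if (c == ',') {
                out.add(line + ": COMMA");
                i++;
            } else if (c == ';') {
                out.add(line + ": SEMI");
                i++;
            } else {
                throw new IllegalArgumentException("line " + line + ": unexpected character '" + c + "'");
            }
        }
        return out;
    }
}
```

For example, `SimpleScanner.scan("_size, 7; // note\n_num 5")` returns `[1: ID(_size), 1: COMMA, 1: NUM(7), 1: SEMI, 2: ID(_num), 2: NUM(5)]`. The comment produces nothing, and the newline only advances the line counter.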

7 Problem
Input
  Program text
  Tokens specification
Output
  Sequence of tokens
  class Quicksort {
    int[] a;
    int partition(int low, int high) {
      int pivot = a[low];
      ...
  }
  1: CLASS  1: CLASS_ID(Quicksort)  1: LCBR
  2: INT  2: LB  2: RB  2: ID(a) ...  2: SEMI

8 Solution
Write a lexical analyzer by hand
  Token nextToken() {
    int c;
  loop:
    c = getchar();
    switch (c) {
      case ' ': goto loop;
      case ';': return SemiColon;
      case '+':
        c = getchar();
        switch (c) {
          case '+': return PlusPlus;
          case '=': return PlusEqual;
          default: ungetc(c, stdin); return Plus;
        }
      case '<': ...
      case 'w': ...
    }
  }
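For comparison, the same hand-written approach can be sketched in Java. This is an illustration rather than the course's code: the enum `Tok` and the class `HandLexer` are hypothetical names, and only blanks, `;`, `+`, `++` and `+=` are handled. `java.io.PushbackReader.unread` stands in for C's `ungetc`, which is what lets `+` look one character ahead and give back a character that belongs to the next token.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical token kinds for this sketch (names adapted from the slide).
enum Tok { SEMI_COLON, PLUS, PLUS_PLUS, PLUS_EQUAL, EOF }

// Hand-written scanner mirroring the slide's nextToken():
// PushbackReader.unread() plays the role of C's ungetc().
final class HandLexer {
    private final PushbackReader in;

    HandLexer(String text) {
        in = new PushbackReader(new StringReader(text)); // pushback buffer of one character
    }

    private int read() {
        try { return in.read(); } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    private void unread(int c) {
        try { in.unread(c); } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    Tok nextToken() {
        while (true) {
            int c = read();
            switch (c) {
                case -1:
                    return Tok.EOF;
                case ' ':
                    continue;                     // skip blanks, like the slide's "goto loop"
                case ';':
                    return Tok.SEMI_COLON;
                case '+': {
                    int next = read();            // look one character ahead
                    if (next == '+') return Tok.PLUS_PLUS;
                    if (next == '=') return Tok.PLUS_EQUAL;
                    if (next != -1) unread(next); // not part of this token: push it back
                    return Tok.PLUS;
                }
                default:
                    throw new IllegalStateException("unexpected character '" + (char) c + "'");
            }
        }
    }
}
```

Scanning `"+ ++ += ;+"` yields PLUS, PLUS_PLUS, PLUS_EQUAL, SEMI_COLON, PLUS, EOF. Even this tiny fragment needs lookahead, push-back and an end-of-input check for every operator, which is part of why writing scanners by hand is so much work.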

9 Solution's Problem
A lot of work
Corner cases
Error prone
Hard to debug
Exhausting
Boring
Hard to reuse
Switching the parser's code between people
...

10 JFlex
Off-the-shelf lexical analyzer generator
Input: scanner specification file
Output: lexical analyzer written in Java
IC.lex -> JFlex -> Lexer.java -> javac -> lexical analyzer (IC text -> tokens)

11 JFlex
Simple
Good for reuse
Easy to understand
Many developers and users have debugged the generators
"+"        { return new Symbol(sym.PLUS); }
"boolean"  { return new Symbol(sym.BOOLEAN); }
"int"      { return new Symbol(sym.INT); }
"null"     { return new Symbol(sym.NULL); }
"while"    { return new Symbol(sym.WHILE); }
"="        { return new Symbol(sym.ASSIGN); }
...

12 JFlex Spec File
User code
  Copied directly to the Java file
%%
JFlex directives
  Define macros, state names
%%
Lexical analysis rules
  How to break the input into tokens
  Action when a token is matched
Code that is copied directly into the generated Java file is a possible source of javac errors down the road.
Example:
DIGIT=[0-9]
LETTER=[a-zA-Z]
<YYINITIAL> {LETTER}({LETTER}|{DIGIT})*

13 User code
  package IC.Parser;
  import IC.Parser.Token;
  … any scanner-helper Java code …

14 JFlex Directives
Control JFlex internals
  %line                          switches line counting on
  %char                          switches character counting on
  %class class-name              changes the default name of the generated class (Yylex)
  %cup                           CUP compatibility mode
  %type token-class-name         type of the tokens returned by the scanner
  %public                        makes the generated class public (package-private by default)
  %function read-token-method    name of the method that returns the next token
  %scanerror exception-type-name exception thrown on internal scanner errors

15 JFlex Directives
State definitions
  %state state-name
  e.g. %state STRING
Macro definitions
  macro-name = regex

16 Regular Expression
  r$       match regular expression r at the end of a line
  .        any character except the newline
  "..."    literal string
  {name}   macro expansion
  *        zero or more repetitions
  +        one or more repetitions
  ?        zero or one repetition
  (...)    grouping within regular expressions
  a|b      match a or b
  [...]    class of characters: any one character enclosed in the brackets
  a-b      range of characters (inside [...])
  [^...]   negated class: any one character not enclosed in the brackets

17 Example macros
ALPHA=[A-Za-z_]
DIGIT=[0-9]
ALPHA_NUMERIC={ALPHA}|{DIGIT}
IDENT={ALPHA}({ALPHA_NUMERIC})*
NUMBER=({DIGIT})+
Equivalently: NUMBER=[0-9]+
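To see what the macros denote, they can be expanded by hand into `java.util.regex` patterns. This sketch assumes nothing beyond the slide's definitions; the `Macros` class and its helper methods are hypothetical. One detail: the alternation in ALPHA_NUMERIC is wrapped in a non-capturing group `(?:...)` so that the `*` in IDENT applies to the whole alternative, matching how the parentheses in `({ALPHA_NUMERIC})*` read.

```java
import java.util.regex.Pattern;

// The slide's macros, expanded by hand into java.util.regex syntax.
// (JFlex expands {NAME} references itself; here the strings are concatenated.)
final class Macros {
    static final String ALPHA = "[A-Za-z_]";
    static final String DIGIT = "[0-9]";
    static final String ALPHA_NUMERIC = "(?:" + ALPHA + "|" + DIGIT + ")"; // grouped alternation
    static final String IDENT = ALPHA + ALPHA_NUMERIC + "*";
    static final String NUMBER = DIGIT + "+";

    private static final Pattern IDENT_P = Pattern.compile(IDENT);
    private static final Pattern NUMBER_P = Pattern.compile(NUMBER);

    // matches() requires the whole string to fit the pattern
    static boolean isIdent(String s)  { return IDENT_P.matcher(s).matches(); }
    static boolean isNumber(String s) { return NUMBER_P.matcher(s).matches(); }
}
```

`Macros.isIdent("_size")` is true and `Macros.isIdent("1x")` is false, since an identifier must start with a letter or an underscore.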

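18 Rules
[<states>] regexp {action as Java code}
Breaks the input into tokens; the action is invoked when regexp matches
Which regexp should be evaluated? Priorities:
  Longest match
  Order in the lex file (among matches of equal length, the earliest rule wins)
Rules should match all inputs!!!
Example: "break" vs. "breakdown", "int" vs. "integer": keyword or identifier?
  Longest match makes "breakdown" and "integer" single identifiers.
  "int" matches both the keyword rule and the identifier rule with the same length, so the rule listed first wins.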
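The two priority rules can be sketched as follows: at the current input position, try every rule, keep the longest match, and on a tie keep the rule that appears first. The rule list and the `RulePicker` class are hypothetical, and this uses `java.util.regex` backtracking matchers rather than the DFA JFlex actually generates; for these simple greedy patterns `lookingAt()` yields the longest prefix, but that does not hold for arbitrary Java regexes.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of JFlex-style rule selection at one input position:
// 1. prefer the rule with the longest match;
// 2. on a tie, prefer the rule listed first.
final class RulePicker {
    static final class Rule {
        final String name;
        final Pattern regex;
        Rule(String name, String regex) { this.name = name; this.regex = Pattern.compile(regex); }
    }

    // Keywords are listed before the identifier rule, as in a typical .lex file.
    static final List<Rule> RULES = List.of(
        new Rule("BREAK", "break"),
        new Rule("INT", "int"),
        new Rule("ID", "[a-zA-Z][a-zA-Z0-9]*"),
        new Rule("NUMBER", "[0-9]+"));

    /** Returns "NAME(lexeme)" for the winning rule at the start of input, or null if no rule matches. */
    static String pick(String input) {
        Rule best = null;
        int bestLen = 0;
        for (Rule r : RULES) {
            Matcher m = r.regex.matcher(input);
            // lookingAt() anchors at the start but does not require consuming the whole input
            if (m.lookingAt() && m.end() > bestLen) { // strict '>' keeps the earlier rule on ties
                best = r;
                bestLen = m.end();
            }
        }
        return best == null ? null : best.name + "(" + input.substring(0, bestLen) + ")";
    }
}
```

`pick("breakdown")` gives `ID(breakdown)` because the longest match wins, while `pick("int x")` gives `INT(int)` because both rules match three characters and INT is listed first. `pick(";")` returns null, the situation the slide warns about when it says rules should match all inputs.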

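19 Rules Examples
{DIGIT}+              { return new Symbol(sym.NUMBER, yyline, -1, yytext()); }
"-"                   { return new Symbol(sym.MINUS, yyline, -1, yytext()); }
[a-zA-Z][a-zA-Z0-9]*  { return new Symbol(sym.ID, yyline, -1, yytext()); }
The line number goes into the Symbol's left position and the matched text into its value; -1 marks the right position as unknown.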

20 Rules – Action
Action: Java code
  Can use special methods and variables
    yyline
    yytext()
  Returns a token for a token
  Eats characters (returns nothing) for non-tokens

21 Rules – State
State: which regexps should be evaluated?
  yybegin(stateX) jumps to stateX
  YYINITIAL is JFlex's initial state

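22 Rules – State
<YYINITIAL> "//"  { yybegin(COMMENTS); }
<COMMENTS> [^\n]  { }
<COMMENTS> [\n]   { yybegin(YYINITIAL); }
COMMENTS must be declared in the directives section (%state COMMENTS, or %xstate COMMENTS so that rules without a state prefix are not active inside comments).
State diagram: YYINITIAL goes to COMMENTS on "//"; COMMENTS loops on [^\n]; COMMENTS returns to YYINITIAL on \n.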
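The state mechanism amounts to a small state machine. This hand-coded sketch (the `CommentStates` class is hypothetical, not generated by JFlex) follows the three rules above: in YYINITIAL, `//` switches to COMMENTS; in COMMENTS, every character except newline is eaten with no action, and the newline switches back. As in the rules, the newline itself is consumed rather than copied.

```java
// Hand-coded version of the two lexical states from the rules above.
final class CommentStates {
    private enum State { YYINITIAL, COMMENTS }

    /** Copies the input, dropping each "//" comment together with its terminating newline. */
    static String stripComments(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.YYINITIAL;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (state == State.YYINITIAL) {
                if (c == '/' && i + 1 < src.length() && src.charAt(i + 1) == '/') {
                    state = State.COMMENTS;   // like yybegin(COMMENTS)
                    i += 2;
                } else {
                    out.append(c);            // ordinary input (tokens would be built here)
                    i++;
                }
            } else {                          // COMMENTS
                if (c == '\n') {
                    state = State.YYINITIAL;  // like yybegin(YYINITIAL)
                }
                i++;                          // [^\n] and [\n]: the character is eaten
            }
        }
        return out.toString();
    }
}
```

`stripComments("a // x\nb")` returns `"a b"`, and input without `//`, such as `"x/y"`, passes through unchanged.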

23 Lines Count Example
import java_cup.runtime.Symbol;
%%
%cup
%{
  private int lineCounter = 0;
%}
%eofval{
  System.out.println("line number=" + lineCounter);
  return new Symbol(sym.EOF);
%eofval}
NEWLINE=\n
%%
{NEWLINE}  { lineCounter++; }
[^\n]      { }

24 Lines Count Example
lineCount.lex -> JFlex -> Yylex.java
Yylex.java, Main.java, sym.java -> javac -> lexical analyzer (text -> tokens)
JFlex and JavaCup must be on the CLASSPATH
  java JFlex.Main lineCount.lex
  javac *.java

25 Test Bed
  import java.io.*;
  import java_cup.runtime.Symbol;

  public class Main {
    public static void main(String[] args) {
      Symbol currToken;
      try {
        FileReader txtFile = new FileReader(args[0]);
        Yylex scanner = new Yylex(txtFile);
        do {
          currToken = scanner.next_token();
          // do something with currToken
        } while (currToken.sym != sym.EOF);
      } catch (Exception e) {
        throw new RuntimeException("IO Error (brutal exit): " + e.toString());
      }
    }
  }

26 Common Pitfalls
Classpath
Path to executable
Define environment variables
  JAVA_HOME
  CLASSPATH

27 Programming Assignment 1
Implement a scanner for IC
  class Token
    At least: line, id, value
    Should extend java_cup.runtime.Symbol
    Numeric token ids in sym.java (will later be generated by JavaCup)
  class Compiler
    Testbed: calls the scanner to print the list of tokens
  class LexicalError
    Caught by Compiler
Don't forget to regenerate the scanner and recompile the Java sources when you change the spec
You need to download and install both JFlex and JavaCup
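The intended relation between the scanner, `LexicalError` and the `Compiler` testbed can be sketched as below. Everything here is an assumption for illustration: this `LexicalError` carries a line number, the stand-in `scan` method accepts only `;`, blanks and newlines, the class name `TestbedSketch` is hypothetical, and the output format is invented; the real requirements are in the assignment text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical LexicalError: records the line of the offending character.
class LexicalError extends Exception {
    final int line;

    LexicalError(String message, int line) {
        super(message);
        this.line = line;
    }
}

// Hypothetical testbed in the spirit of "class Compiler":
// scan the input, print the tokens, and report a LexicalError instead of crashing.
final class TestbedSketch {
    // Stand-in scanner: ';' is the only token; blanks and newlines are skipped.
    static List<String> scan(String text) throws LexicalError {
        List<String> tokens = new ArrayList<>();
        int line = 1;
        for (char c : text.toCharArray()) {
            if (c == '\n') line++;
            else if (c == ';') tokens.add(line + ": SEMI");
            else if (c != ' ') throw new LexicalError("illegal character '" + c + "'", line);
        }
        return tokens;
    }

    static String run(String text) {
        try {
            return String.join(" ", scan(text));
        } catch (LexicalError e) {          // the testbed, not the scanner, decides how to report
            return e.line + ": Lexical error: " + e.getMessage();
        }
    }
}
```

`run(";\n;")` returns `"1: SEMI 2: SEMI"`, while `run(";\n@")` returns `"2: Lexical error: illegal character '@'"`.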

28 sym.java
  public class sym {
    public static final int EOF = 0;
    public static final int ID = 1;
    ...
  }
Defines symbol constant ids
Communicates between the parser and the scanner
Actual values don't matter, as long as each token has a unique value
Will be generated by CUP in PA2

29 Token class
  import java_cup.runtime.Symbol;

  public class Token extends Symbol {
    public int getId() {...}
    public Object getValue() {...}
    public int getLine() {...}
    ...
  }
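A sketch of how the elided method bodies might be filled in. To keep it compilable without the CUP jar, `SymbolStandIn` is a hypothetical stand-in that mirrors the public fields `sym`, `left`, `right` and `value` and the four-argument constructor of `java_cup.runtime.Symbol`; in the assignment, `Token` should extend `java_cup.runtime.Symbol` itself. Storing the line in `left` is one possible convention, not a requirement.

```java
// Hypothetical stand-in for java_cup.runtime.Symbol so this sketch compiles on its own.
// It mirrors only the public fields and the four-argument constructor used below.
class SymbolStandIn {
    public int sym;       // token id (a constant from sym.java)
    public int left;      // left position; this sketch stores the line number here
    public int right;     // right position; -1 here means "not tracked"
    public Object value;  // semantic value, e.g. the matched text

    SymbolStandIn(int id, int l, int r, Object o) {
        sym = id;
        left = l;
        right = r;
        value = o;
    }
}

// One possible Token: id in 'sym', line in 'left', lexeme or literal in 'value'.
class Token extends SymbolStandIn {
    Token(int id, int line, Object value) {
        super(id, line, -1, value);
    }

    public int getId()       { return sym; }
    public Object getValue() { return value; }
    public int getLine()     { return left; }
}
```

With this convention, a scanner action would create `new Token(sym.ID, yyline, yytext())`, and the testbed can print `getLine()`, `getId()` and `getValue()` for each token.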

30 JFlex directives to use
%cup          (integrate with CUP)
%line         (count lines)
%type Token   (the scanner returns tokens of type Token)
%class Lexer  (name of the generated scanner class)

31 %cup
%cup implies:
%implements java_cup.runtime.Scanner   the Lexer class implements java_cup.runtime.Scanner
%function next_token                   next_token() returns the next token
%type java_cup.runtime.Symbol          the class of the returned tokens

32 Structure
IC.lex -> JFlex -> Lexer.java
Lexer.java, sym.java, Token.java, LexicalError.java, Compiler.java -> javac -> lexical analyzer
test.ic -> lexical analyzer -> tokens

33 Directions
Download Java
Download JFlex
Download JavaCup
Put JFlex and JavaCup in the classpath
Eclipse
  Use ant build.xml
  Import jflex and javacup
Apache Ant

34 Directions
Use the skeleton from the website
Read the assignment
Use the forum