Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compiler Construction Lexical Analysis. 2 Administration Project Teams Project Teams Send me your group Send me your group

Similar presentations


Presentation on theme: "Compiler Construction Lexical Analysis. 2 Administration Project Teams Project Teams Send me your group Send me your group"— Presentation transcript:

1 Compiler Construction Lexical Analysis

2 2 Administration Project Teams Project Teams Send me your group Send me your group ortamir@post.tau.ac.il ortamir@post.tau.ac.il Send me an email if you’re unable to find partners Send me an email if you’re unable to find partners Use the forum to team up with others Use the forum to team up with others First PA – submission in two weeks First PA – submission in two weeks

3 3 Generic compiler structure Executable code exe Source text txt Semantic Representation Backend (synthesis) Compiler Frontend (analysis)

4 4 Compiler IC Program ic x86 executable exe Lexical Analysis Syntax Analysis Parsing AST Symbol Table etc. Inter. Rep. (IR) Code Generation IC compiler

5 5 Lexical Analysis converts characters to tokens converts characters to tokens class Quicksort { int[] a; int partition(int low, int high) { int pivot = a[low];... } 1: CLASS 1: CLASS_ID(Quicksort) 1: LCBR 2: INT 2: LB 2: RB 2: ID(a)... 2: SEMI

6 6 Lexical Analysis Tokens Tokens ID – size, num, Car ID – size, num, Car Num – 7, 5, 9, 4926209837, 07 Num – 7, 5, 9, 4926209837, 07 COMMA –, COMMA –, SEMI – ; SEMI – ; … Non tokens Non tokens Comment – // Comment – // Whitespace Whitespace …

7 7 Problem Input Input Program text Program text Tokens specification Tokens specification Output Output Sequence of tokens Sequence of tokens class Quicksort { int[] a; int partition(int low, int high) { int pivot = a[low];... } 1: CLASS 1: CLASS_ID(Quicksort) 1: LCBR 2: INT 2: LB 2: RB 2: ID(a)... 2: SEMI

8 8 Solution Write a lexical analyzer Write a lexical analyzer Token nextToken() { char c ; loop: c = getchar(); switch (c){ case ` `:goto loop ; case `;`: return Semicolon; case `+`: c = getchar() ; switch (c) { case `+': return PlusPlus ; case '=’ return PlusEqual; default: ungetc(c); return Plus; } case `<`: case `w`: … }

9 9 Solution’s Problem A lot of work A lot of work Corner cases Corner cases Error prone Error prone Hard to debug Hard to debug Exhausting Exhausting Boring Boring Hard to reuse Hard to reuse Hard to understand Hard to understand …

10 10 JFlex Off-the-shelf lexical analysis generator Off-the-shelf lexical analysis generator Input Input Scanner specification file Scanner specification file Output Output Lexical analyzer written in Java Lexical analyzer written in Java JFlexjavac IC.lex Lexical analyzer IC text tokens Lexer.java

11 11 JFlex Simple Simple Good for reuse Good for reuse Easy to understand Easy to understand Many developers and users debugged the generators Many developers and users debugged the generators "+" { return new symbol (sym.PLUS); } "boolean" { return new symbol (sym.BOOLEAN); } “int" { return new symbol (sym.INT); } "null" {return new symbol (sym.NULL);} "while" {return new symbol (sym.WHILE);} "=" {return new symbol (sym.ASSIGN);}…

12 12 JFlex Spec File User code Copied directly to Java file Copied directly to Java file % JFlex directives Define macros, state names Define macros, state names % Lexical analysis rules How to break input to tokens How to break input to tokens Action when token matched Action when token matched Possible source of javac errors down the road DIGIT= [0-9] LETTER= [a-zA-Z] YYINITIAL {LETTER} ({LETTER}|{DIGIT})*

13 13 User Code package IC.Parser; import IC.Parser.Token; … any scanner-helper Java code …

14 14 JFlex Directives Control JFlex internals Control JFlex internals  %line switches line counting on  %char switches character counting on  %class class-name changes default name  %cup CUP compatibility mode  %type token-class-name  %public Makes generated class public (package by default)  %function read-token-method  %scanerror exception-type-name

15 15 JFlex Directives State definitions State definitions %state state-name %state state-name e.g.: %state STRING e.g.: %state STRING Macro definitions Macro definitions macro-name = regex macro-name = regex

16 16 Regular Expression. any character except the newline "..."string {name} macro expansion * zero or more repetitions + one or more repetitions ? zero or one repetitions (...) grouping within regular expressions a|ba|ba|ba|b match a or b [...] class of characters - any one character enclosed in brackets a–ba–ba–ba–b range of characters [^…] negated class – any one not enclosed in brackets

17 17 Examples of Macros ALPHA=[A-Za-z]DIGIT=[0-9]ALPHA_NUMERIC={ALPHA}|{DIGIT}IDENT={ALPHA}({ALPHA_NUMERIC})*NUMBER=({DIGIT})+NUMBER=[0-9]+

18 18 Rules [states] regexp {action as Java code} Priorities Priorities Longest match Longest match Order in the lex file Order in the lex file Rules should match all inputs!!! Rules should match all inputs!!! Breaks input to tokens Invoked when regexp matches e.g.: break vs breaker int identifier or integer ? The regexp should be evaluated ?

19 19 Rules Examples DIGIT}+ { return new Symbol(sym.NUMBER, yytext(), yyline); } \t {} "-" { return new Symbol(sym.MINUS, yytext(), yyline); } [a-zA-Z] ([a-zA-Z0-9]) * { return new Symbol(sym.ID, yytext(), yyline); }

20 20 Rules – Action Action Action Java code Java code Can use special methods and vars Can use special methods and vars yyline yyline yytext() yytext() Returns a token for a token Returns a token for a token “Eats” chars for non tokens “Eats” chars for non tokens

21 21 Rules – State State State Which regexp should be evaluated? Which regexp should be evaluated? YYINITIAL YYINITIAL JFlex’s initial state JFlex’s initial state yybegin(stateX) yybegin(stateX) Jumps to stateX Jumps to stateX

22 22 Rules – State "//" { yybegin(COMMENTS); } [^\n] { } [\n] { yybegin(YYINITIAL); } YYINITIALCOMMENTS ‘ // ’ \n ^\n

23 23 Lines Count Example import java_cup.runtime.Symbol; % %cup %{ private int lineCounter = 0; %} %eofval{ System.out.println("line number=" + lineCounter); return new Symbol(sym.EOF); %eofval} NEWLINE=\n % {NEWLINE} { lineCounter++; } [^{NEWLINE}] { }

24 24 Lines Count Example JFlex javac lineCount.lex Lexical analyzer text tokens Yylex.java Main.java JFlex and JavaCup must be on CLASSPATH sym.java java JFlex.Main lineCount.lex javac *.java

25 25 Testbed import java.io.*; public class Main { public static void main(String[] args) { Symbol currToken; try { FileReader txtFile = new FileReader(args[0]); Yylex scanner = new Yylex(txtFile); do { currToken = scanner.next_token(); // do something with currToken } while (currToken.sym != sym.EOF); } catch (Exception e) { throw new RuntimeException("IO Error (brutal exit)” + e.toString()); }

26 26 Programming Assignment 1 Implement a scanner for IC Implement a scanner for IC class Token class Token At least – line, id, value At least – line, id, value Should extend java_cup.runtime.Symbol Should extend java_cup.runtime.Symbol Numeric token ids in sym.java Numeric token ids in sym.java Later on, will be generated by JavaCup Later on, will be generated by JavaCup class Compiler class Compiler Testbed - calls scanner to print list of tokens Testbed - calls scanner to print list of tokens class LexicalError class LexicalError Caught by Compiler Caught by Compiler Don’t forget to generate scanner and recompile Java sources when you change the spec Don’t forget to generate scanner and recompile Java sources when you change the spec You need to download and install both JFlex and JavaCup You need to download and install both JFlex and JavaCup

27 27 sym.java public class sym { public static final int EOF = 0; public static final int EOF = 0; public static final int ID = 1; public static final int ID = 1;......} Defines symbol constant ids Communicate between parser and scanner Actual values don’t matter Unique value for each token Will be generated by cup in PA2

28 28 Token class import java_cup.runtime.Symbol; public class Token extends Symbol { public int getId() {...} public int getId() {...} public Object getValue() {...} public int getLine() {...} public Object getValue() {...} public int getLine() {...}......}

29 29 JFlex directives to use %line (count lines) %type Token (pass type Token) %class Lexer (gen. scanner class) %cup (integrate with cup)

30 30 %cup %class Lexer %scanerror LexicalError %function next_token %type Token … %eofval { return new Token(sym.EOF,…); return new Token(sym.EOF,…); %eofval }

31 31 Structure JFlex javac IC.lex Lexical analyzer test.ic tokens Lexer.java sym.java Token.java LexicalError.java Compiler.java

32 32 Directions Download Java Download Java Download JFlex Download JFlex Download JavaCup Download JavaCup Put JFlex and JavaCup in classpath Put JFlex and JavaCup in classpath Apache Ant Apache Ant Eclipse Eclipse Use ant build.xml Use ant build.xml Import jflex and javacup Import jflex and javacup

33 33 Directions Use skeleton from the website Use skeleton from the website Read Assignment Read Assignment


Download ppt "Compiler Construction Lexical Analysis. 2 Administration Project Teams Project Teams Send me your group Send me your group"

Similar presentations


Ads by Google