Compiler Construction Parsing I Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.


1 Compiler Construction Parsing I Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University

2 Administration
- Forum: https://forums.cs.tau.ac.il/viewforum.php?f=64
- Project teams
  - Send me an email if you can't find a team
  - Send me your team if you found one and didn't send an email
  - Check the Excel file on the website
- First PA is at: http://www.cs.tau.ac.il/research/ohad.shacham/wcc08/pa/pa1/pa1.pdf

3 Programming Assignment 1
- Implement a scanner for IC
- class Token
  - At least: line, id, value
  - Should extend java_cup.runtime.Symbol
- Numeric token ids in sym.java
  - Will later be generated by JavaCup
- class Compiler
  - Testbed: calls the scanner to print the list of tokens
- [StateList] > { return appropriate symbol }

4 Programming Assignment 1
- class LexicalError
  - Caught by Compiler
- Assume
  - Class identifiers start with a capital letter
  - Other identifiers start with a non-capital letter

5 sym.java
    public class sym {
        public static final int EOF = 0;
        public static final int ID = 1;
        ...
    }
- Defines symbol constant ids
- Used for communication between the parser and the scanner
- Actual values don't matter
- Unique value for each token
- Will be generated by CUP in PA2

6 Token class
    import java_cup.runtime.Symbol;

    public class Token extends Symbol {
        public int getId() {...}
        public Object getValue() {...}
        public int getLine() {...}
        ...
    }
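As a rough illustration of the Token interface above, here is a minimal sketch. It is not the course skeleton: `SymbolStub` is a hypothetical stand-in for java_cup.runtime.Symbol so that the example compiles without CUP on the classpath, and the constructor argument order and the class name `TokenSketch` are assumptions.

```java
// Hypothetical stand-in for java_cup.runtime.Symbol, so the sketch compiles
// on its own. In the assignment, Token would extend the real class instead.
class SymbolStub {
    final int sym;       // numeric token id (one of the constants in sym.java)
    final Object value;  // semantic value, may be null

    SymbolStub(int sym, Object value) {
        this.sym = sym;
        this.value = value;
    }
}

// Sketch of a token carrying at least line, id, and value.
final class TokenSketch extends SymbolStub {
    private final int line;

    // Assumed argument order: id, line, value.
    TokenSketch(int id, int line, Object value) {
        super(id, value);
        this.line = line;
    }

    int getId() { return sym; }
    Object getValue() { return value; }
    int getLine() { return line; }

    @Override
    public String toString() {
        return "line " + line + ": id=" + sym + (value == null ? "" : " value=" + value);
    }

    public static void main(String[] args) {
        // 1 is the ID constant from the sym.java slide.
        System.out.println(new TokenSketch(1, 3, "foo"));
    }
}
```

A testbed such as the Compiler class could print one such line per token returned by the scanner.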

7 JFlex directives to use
- %cup (integrate with CUP)
- %line (count lines)
- %type Token (pass type Token)
- %class Lexer (generated scanner class)

8 %cup
- %implements java_cup.runtime.Scanner
  - The generated lexer class implements java_cup.runtime.Scanner
- %function next_token
  - Returns the next token
- %type java_cup.runtime.Symbol
  - The class of the returned token

9 Structure
- IC.lex -> JFlex -> Lexer.java
- Lexer.java, sym.java, Token.java, LexicalError.java, Compiler.java -> javac -> lexical analyzer
- test.ic -> lexical analyzer -> tokens

10 Directions
- Download Java
- Download JFlex
- Download JavaCup
- Put JFlex and JavaCup in the classpath
- Eclipse
  - Use ant build.xml
  - Import jflex and javacup
- Apache Ant

11 Directions
- Use the skeleton from the website
- Read the assignment
- Use the forum

12 Tools
- Ant
  - Make environment
  - A build.xml is included in the skeleton
  - Download from: http://ant.apache.org
  - Use:
    - ant (to compile)
    - ant scanner (to run JFlex)

13 Tools
- JFlex
  - Lexical analyzer generator
  - Download from: http://jflex.de/
  - Manual: http://jflex.de/manual.pdf
  - Add $MyJFlex/lib/JFlex.jar to your classpath
  - Use:
    - java JFlex.Main IC.lex
    - ant scanner (for ant users)

14 Tools
- Cup
  - Parser generator
  - Download from: http://www2.cs.tum.edu/projects/cup/
  - Manual: http://www2.cs.tum.edu/projects/cup/manual.html
  - Put java-cup-11a.jar and java-cup-11a-runtime.jar in your classpath
  - Use:
    - java -jar java-cup-11a.jar
    - ant libparser (for ant users)

15 Compiler
IC compiler pipeline: IC program (ic) -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST, Symbol Table, etc. -> Inter. Rep. (IR) -> Code Generation -> x86 executable (exe)

16 Parsing
- Input: sequence of tokens
- Output: abstract syntax tree
- Decide whether the program satisfies the syntactic structure

17 Parsing errors
- Error detection
  - Report the most relevant error message
  - Correct line number
  - Current vs. expected token
- Error recovery
  - Recover and continue to the next error
  - Heuristics for good recovery, to avoid many spurious errors
    - Search for a semicolon and ignore the statement
    - Ignore the next n errors
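The "search for a semicolon and ignore the statement" heuristic can be sketched as follows. This is only an illustration over a list of token strings; the class name `PanicModeSketch` and the method `resync` are hypothetical and not part of the assignment skeleton or of CUP.

```java
import java.util.List;

final class PanicModeSketch {
    /**
     * Returns the index of the first token after the next ";" found at or
     * after errorPos, or tokens.size() if there is no further ";".
     * The parser would resume from the returned index.
     */
    static int resync(List<String> tokens, int errorPos) {
        int i = errorPos;
        while (i < tokens.size() && !tokens.get(i).equals(";")) {
            i++; // skip the rest of the broken statement
        }
        return Math.min(i + 1, tokens.size());
    }

    public static void main(String[] args) {
        // "x = + ;" is malformed; assume the error is detected at index 2 ("+").
        List<String> tokens = List.of("x", "=", "+", ";", "y", "=", "1", ";");
        System.out.println(resync(tokens, 2)); // prints 4, the index of "y"
    }
}
```

Resuming at a statement boundary lets the parser report a later, unrelated error without emitting a cascade of spurious ones for the skipped statement.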

18 Parsing
- Context-free grammars (CFG)
  - Capture program structure (hierarchy)
  - Employ formal theory results
  - Automatically create "efficient" parsers
Grammar:
    S -> if E then S else S
    S -> print E
    E -> num

19 From text to abstract syntax
- Program text: 5 + (7 * x)
- Lexical analyzer output (token stream): num + ( num * id )
- The parser checks the token stream against the grammar:
    E -> id
    E -> num
    E -> E + E
    E -> E * E
    E -> ( E )
- Parser result: either "valid" or "syntax error"
- Parse tree: E[ E[num(5)] + E[ ( E[ E[num(7)] * E[id(x)] ] ) ] ]
- Abstract syntax tree: +( Num(5), *( Num(7), id(x) ) )

20 From text to abstract syntax
- Token stream: num + ( num * id )
- The parser checks the token stream against the grammar:
    E -> id
    E -> num
    E -> E + E
    E -> E * E
    E -> ( E )
- Parser result: either "valid" (with a parse tree and an abstract syntax tree) or "syntax error"
- Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run

21 Parsing terminology
- Symbols:
  - Terminals (tokens): + * ( ) id num
  - Non-terminals: E
- Grammar rules:
    E -> id
    E -> num
    E -> E + E
    E -> E * E
    E -> ( E )
- Derivation: E => E + E => 1 + E => 1 + E * E => 1 + 2 * E => 1 + 2 * 3
- Parse tree (derivation tree): E[ E[1] + E[ E[2] * E[3] ] ]
- Convention: the non-terminal appearing in the first grammar rule is defined to be the initial non-terminal
- Each step in a derivation applies one grammar rule (a production)

22 Ambiguity
Grammar rules:
    E -> id
    E -> num
    E -> E + E
    E -> E * E
    E -> ( E )
- Leftmost derivation: E => E + E => 1 + E => 1 + E * E => 1 + 2 * E => 1 + 2 * 3
  - Parse tree: E[ E[1] + E[ E[2] * E[3] ] ]
- Rightmost derivation: E => E * E => E * 3 => E + E * 3 => E + 2 * 3 => 1 + 2 * 3
  - Parse tree: E[ E[ E[1] + E[2] ] * E[3] ]
- Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations)
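To see why the two parse trees for 1 + 2 * 3 matter, the following sketch builds both as small expression trees and evaluates them. The names (`AmbiguitySketch`, `Expr`, `plusAtRoot`, `timesAtRoot`) are illustrative assumptions and do not come from the course code.

```java
final class AmbiguitySketch {
    // A tiny expression tree: every node knows how to evaluate itself.
    interface Expr { int eval(); }

    static Expr num(int n) { return () -> n; }
    static Expr add(Expr l, Expr r) { return () -> l.eval() + r.eval(); }
    static Expr mul(Expr l, Expr r) { return () -> l.eval() * r.eval(); }

    // Tree with + at the root: 1 + (2 * 3)
    static int plusAtRoot() {
        return add(num(1), mul(num(2), num(3))).eval();
    }

    // Tree with * at the root: (1 + 2) * 3
    static int timesAtRoot() {
        return mul(add(num(1), num(2)), num(3)).eval();
    }

    public static void main(String[] args) {
        System.out.println(plusAtRoot());  // prints 7
        System.out.println(timesAtRoot()); // prints 9
    }
}
```

The same token sequence yields two different values, so a compiler built on an ambiguous grammar needs some way to pick one tree, for example by rewriting the grammar.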

23 Grammar rewriting
Ambiguous grammar:
    E -> id
    E -> num
    E -> E + E
    E -> E * E
    E -> ( E )
Unambiguous grammar:
    E -> E + T
    E -> T
    T -> T * F
    T -> F
    F -> id
    F -> num
    F -> ( E )
- Derivation (=>* abbreviates several steps): E => E + T =>* 1 + T => 1 + T * F => 1 + F * F => 1 + 2 * F => 1 + 2 * 3
- Parse tree: E[ E[ T[ F[1] ] ] + T[ T[ F[2] ] * F[3] ] ]
- Note the difference between a language and a grammar: a grammar represents a language, and a language can be represented by many grammars.

24 Parsing methods – Top Down
- LL(k)
  - "L": left-to-right scan of input
  - "L": leftmost derivation
  - "k": predict based on k look-ahead tokens
- Predict a production for a non-terminal and k tokens

25 Parsing methods – Bottom Up
- LR(0), SLR(1), LR(1), LALR(1)
  - "L": left-to-right scan of input
  - "R": rightmost derivation
- Decide on a production given a RHS and a look-ahead

26 Top Down – parsing
Grammar:
    E -> T + E
    E -> i
    T -> i
Input: 1 + 2 + 3
Derivation, expanding from the start symbol: E => T + E => 1 + E => 1 + T + E => 1 + 2 + E => 1 + 2 + 3

27 Top Down – parsing
- Starts with the start symbol
- Tries to transform it to the input
- Also called predictive parsing
- LL(1) example
Grammar:
    S -> if E then S else S
    S -> begin S L
    S -> print E
    L -> end
    L -> ; S L
    E -> num
Input: if 5 then print 8 else …
Token : rule : resulting sentential form
    (start) : S
    if : S -> if E then S else S : if E then S else S
    5 : E -> num : if 5 then S else S
    print : S -> print E : if 5 then print E else S
    …
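The LL(1) example can be turned into a small recursive-descent recognizer with one procedure per non-terminal, each choosing its rule from the next token alone. This is a sketch under assumptions: tokens are whitespace-separated strings, a num is any run of digits, and the class name `PredictiveParserSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class PredictiveParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    private PredictiveParserSketch(List<String> tokens) { this.tokens = tokens; }

    /** Returns true if the whole input can be derived from S. */
    static boolean accepts(String input) {
        PredictiveParserSketch p =
            new PredictiveParserSketch(Arrays.asList(input.trim().split("\\s+")));
        try {
            p.parseS();
            return p.pos == p.tokens.size();
        } catch (IllegalStateException syntaxError) {
            return false;
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    // S -> if E then S else S | begin S L | print E
    private void parseS() {
        switch (peek()) {
            case "if":
                expect("if"); parseE(); expect("then"); parseS(); expect("else"); parseS();
                break;
            case "begin":
                expect("begin"); parseS(); parseL();
                break;
            case "print":
                expect("print"); parseE();
                break;
            default:
                throw new IllegalStateException("unexpected " + peek());
        }
    }

    // L -> end | ; S L
    private void parseL() {
        switch (peek()) {
            case "end": expect("end"); break;
            case ";": expect(";"); parseS(); parseL(); break;
            default: throw new IllegalStateException("unexpected " + peek());
        }
    }

    // E -> num
    private void parseE() {
        if (!peek().matches("\\d+")) {
            throw new IllegalStateException("expected num but found " + peek());
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(accepts("if 5 then print 8 else print 9")); // prints true
        System.out.println(accepts("if 5 then print 8"));              // prints false
    }
}
```

Because every alternative of S and L starts with a distinct token, one token of look-ahead is enough and the parser never backtracks.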

28 Top Down - problems
- Left recursion
    A -> A a
    A -> a
- Non-termination
    A => Aa => Aaa => Aaaa => …
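A naive recursive-descent procedure for the left-recursive rule A -> A a calls itself before consuming any input, which is the non-termination shown above. The sketch below makes this visible by adding a depth guard; the guard, the class name `LeftRecursionSketch`, and the method names are assumptions added only so that the demonstration stops.

```java
final class LeftRecursionSketch {
    private final int maxDepth; // hypothetical guard, only so the demo stops
    private int depth = 0;
    private int pos = 0;        // input position; the naive procedure never advances it

    private LeftRecursionSketch(int maxDepth) { this.maxDepth = maxDepth; }

    // Naive procedure for A -> A a | a, trying the alternative A -> A a first.
    private void parseA() {
        if (++depth > maxDepth) {
            throw new IllegalStateException(
                "gave up after " + maxDepth + " nested calls at input position " + pos);
        }
        parseA(); // expanding the leading A recurses before any token is consumed
        pos++;    // matching 'a' would happen here, but control never gets this far
    }

    /** Runs the naive parser and returns how many tokens it consumed before the guard fired. */
    static int tokensConsumed(int maxDepth) {
        LeftRecursionSketch p = new LeftRecursionSketch(maxDepth);
        try {
            p.parseA();
        } catch (IllegalStateException guardFired) {
            // expected: without the guard this would recurse until the stack overflows
        }
        return p.pos;
    }

    public static void main(String[] args) {
        System.out.println(tokensConsumed(1000)); // prints 0
    }
}
```

No matter how deep the recursion goes, the input position stays at 0, which is why left recursion has to be removed before top-down parsing.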

29 Top Down - problems
- Two rules cannot start with the same token
  - Can be solved by backtracking
  - Goal: reduce the number of backtracks
    E -> T + E
    E -> T
- Both alternatives for E begin with T, so the parser cannot choose between T and T + E from the next token alone

30 Top Down – solution
Two ways:
- Eliminate left recursion
- Perform left factoring

31 Top Down – solution
Step I: left recursion removal
Before:
    E -> E + T
    E -> T
    T -> T * F
    T -> F
    F -> id
    F -> (E)
After (the left-recursive rules are replaced by):
    E -> T + E
    T -> F * T

32 Top Down – solution
Step II: left factoring
Before:
    E -> T + E
    E -> T
    T -> F * T
    T -> F
    F -> id
    F -> (E)
After:
    E -> T E'
    E' -> + E
    E' -> ε
    T -> F T'
    T' -> * T
    T' -> ε
    F -> id
    F -> (E)
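After both steps the grammar can be parsed top-down with one token of look-ahead. The following recognizer is a sketch of that, with one method per non-terminal. It assumes that an id is a single letter and that whitespace is ignored; the class name `LeftFactoredParserSketch` is hypothetical.

```java
final class LeftFactoredParserSketch {
    private final String in;
    private int pos = 0;

    private LeftFactoredParserSketch(String input) {
        this.in = input.replaceAll("\\s+", ""); // whitespace carries no meaning here
    }

    /** Returns true if the whole input can be derived from E. */
    static boolean accepts(String input) {
        LeftFactoredParserSketch p = new LeftFactoredParserSketch(input);
        return p.parseE() && p.pos == p.in.length();
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '$'; }

    // E -> T E'
    private boolean parseE() { return parseT() && parseEPrime(); }

    // E' -> + E | ε
    private boolean parseEPrime() {
        if (peek() == '+') { pos++; return parseE(); }
        return true; // ε: consume nothing
    }

    // T -> F T'
    private boolean parseT() { return parseF() && parseTPrime(); }

    // T' -> * T | ε
    private boolean parseTPrime() {
        if (peek() == '*') { pos++; return parseT(); }
        return true; // ε: consume nothing
    }

    // F -> id | ( E )
    private boolean parseF() {
        char c = peek();
        if (Character.isLetter(c)) { pos++; return true; }
        if (c == '(') {
            pos++;
            if (!parseE() || peek() != ')') return false;
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts("a + b * (c + d)")); // prints true
        System.out.println(accepts("a + * b"));         // prints false
    }
}
```

The ε alternatives of E' and T' are taken whenever the next token is not + or *, so no backtracking is needed.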

33 Top Down – left factoring
- Applies to a non-terminal with two rules starting with the same prefix
Grammar:
    S -> if E then S else S
    S -> if E then S
Left-factored grammar:
    S -> if E then S X
    X -> ε
    X -> else S

34 Bottom Up – parsing
- No problem with left recursion
- Widely used in practice
- LR(0), SLR(1), LR(1), LALR(1)
  - We will focus only on the theory of LR(0)
  - JavaCup implements LALR(1)
- Starts with the input
- Attempts to rewrite it to the start symbol

35 Bottom Up – parsing
Grammar:
    E -> E + (E)
    E -> i
Input: 1 + (2) + (3)
Reductions, from the input back to the start symbol:
    1 + (2) + (3)
    E + (2) + (3)
    E + (E) + (3)
    E + (3)
    E + (E)
    E
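The reduction sequence for this grammar can be reproduced with a naive shift-reduce loop: shift one token, then reduce if a rule's right-hand side sits on top of the stack. This is a hand-written sketch, not an LR(0) automaton and not what JavaCup generates; it assumes whitespace-separated tokens, treats any run of digits as i, and the class name `ShiftReduceSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShiftReduceSketch {
    private static final List<String> RHS_PLUS = Arrays.asList("E", "+", "(", "E", ")");
    private static final List<String> PUNCTUATION = Arrays.asList("+", "(", ")");

    /** Returns the reductions performed, in order, or null if the input is rejected. */
    static List<String> parse(String input) {
        List<String> stack = new ArrayList<>();
        List<String> reductions = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.matches("\\d+")) {
                stack.add("i");           // shift a number as terminal i
            } else if (PUNCTUATION.contains(token)) {
                stack.add(token);         // shift + ( or )
            } else {
                return null;              // not a token of this grammar
            }
            reduce(stack, reductions);
        }
        boolean accepted = stack.size() == 1 && stack.get(0).equals("E");
        return accepted ? reductions : null;
    }

    // At most one reduction can apply after a shift in this small grammar.
    private static void reduce(List<String> stack, List<String> reductions) {
        int n = stack.size();
        if (n >= 1 && stack.get(n - 1).equals("i")) {
            stack.set(n - 1, "E");
            reductions.add("E -> i");
        } else if (n >= 5 && stack.subList(n - 5, n).equals(RHS_PLUS)) {
            stack.subList(n - 5, n).clear();
            stack.add("E");
            reductions.add("E -> E + ( E )");
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + ( 2 ) + ( 3 )"));
    }
}
```

Reducing greedily happens to suffice for this particular grammar; in general the parser needs a state machine, and possibly look-ahead, to decide between shifting and reducing.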

36 Bottom Up - problems
- Ambiguity
    E -> E + E
    E -> i
- Should 1 + 2 + 3 be parsed as (1 + 2) + 3?
- Or as 1 + (2 + 3)?

37 Summary
- Do PA1
- Use the forum
- Next week
  - Cup
  - LR(0)

