Compiler Construction Parsing Rina Zviel-Girshin and Ohad Shacham School of Computer Science Tel-Aviv University.

2 Administration
- Please use the forum for questions: https://forums.cs.tau.ac.il/viewforum.php?f=70
- Don't compile in the submission directory
- Check whether your group appears in the list
- Send me an email if you can't find a team
- Send me your team if you found one and didn't send an email
- Please send name, ID, nova ID, and leader

3 Complementary Class
- November 26, Schreiber 07: 9:00-10:00 and 13:00-14:00
- November 27, Schreiber 07: 10:00-11:00
Does anyone plan to come on Friday?

4 Compiler
IC compiler pipeline: IC program (ic) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → x86 executable (exe)

5 Parsing
Input:
- Sequence of tokens
- A context-free grammar
Output:
- Abstract syntax tree
Task: decide whether the program satisfies the syntactic structure

6 Parsing
Context-free grammars (CFG):
- Capture program structure (hierarchy)
- Employ formal theory results
- Automatically create "efficient" parsers
Grammar:
S → if E then S else S
S → print E
E → num

7 From text to abstract syntax
Program text: 5 + (7 * x)
Lexical analyzer output (token stream): num(5) + ( num(7) * id(x) )
Parser, with grammar E → id | num | E + E | E * E | ( E ):
- on invalid input: syntax error
- on valid input: a parse tree, and from it the abstract syntax tree: + with children Num(5) and *, where * has children Num(7) and id(x)
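The first stage of this pipeline can be sketched as a tiny hand-written lexer that turns the slide's program text into the token stream the parser receives. This is an illustration only: the class and method names (`TinyLexer`, `tokenize`) are hypothetical, and the course's scanner is generated from a spec like the one on slide 28.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for the tokens used on this slide: num, id, + * ( )
class TinyLexer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace separates tokens, emits nothing
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add("num(" + text.substring(start, i) + ")");
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                tokens.add("id(" + text.substring(start, i) + ")");
            } else if ("+*()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));         // single-character operator or parenthesis
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("5 + (7 * x)")); // [num(5), +, (, num(7), *, id(x), )]
    }
}
```

Token values (5, 7, x) travel with the tokens, which is what lets the AST leaves Num(5), Num(7) and id(x) carry them.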

8 From text to abstract syntax
The same pipeline on the token stream num + ( num * id ), without token values: the parser (grammar E → id | num | E + E | E * E | ( E )) either reports a syntax error or produces a parse tree and the abstract syntax tree + (num, * (num, id)).
Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run.

9 Parsing terminology
Symbols:
- terminals (tokens): + * ( ) id num
- non-terminals: E
Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E )
Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree (bracketed, one [E ...] per node): [E [E 1] + [E [E 2] * [E 3]]]
Convention: the non-terminal on the left-hand side of the first grammar rule is the initial (start) non-terminal.
Each step in a derivation applies one production (grammar rule).
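A derivation is mechanical string rewriting, which a few lines of Java can replay: each step replaces the leftmost E with the right-hand side of one of the five rules. This is a sketch, not part of the course code; the names (`LeftmostDerivation`, `step`, `derive`) are hypothetical, and numerals stand in for num tokens as on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical replay of a leftmost derivation for E -> id | num | E + E | E * E | ( E ).
// Numerals stand in for num tokens and letter-first words for id tokens, as on the slide.
class LeftmostDerivation {
    // True if rhs is the right-hand side of one of the five E-rules.
    static boolean isRightHandSide(String... rhs) {
        List<String> r = Arrays.asList(rhs);
        if (r.equals(List.of("E", "+", "E")) || r.equals(List.of("E", "*", "E"))
                || r.equals(List.of("(", "E", ")"))) {
            return true;
        }
        return rhs.length == 1 && !rhs[0].equals("E")
                && rhs[0].matches("[0-9]+|[A-Za-z][A-Za-z0-9]*");     // num or id
    }

    // One derivation step: replace the leftmost non-terminal E with rhs.
    static List<String> step(List<String> form, String... rhs) {
        if (!isRightHandSide(rhs)) throw new IllegalArgumentException("not a rule: E -> " + String.join(" ", rhs));
        int i = form.indexOf("E");
        if (i < 0) throw new IllegalStateException("no non-terminal left to expand");
        List<String> next = new ArrayList<>(form.subList(0, i));
        next.addAll(Arrays.asList(rhs));
        next.addAll(form.subList(i + 1, form.size()));
        return next;
    }

    // Starts from E, applies the right-hand sides in order, and returns the whole derivation.
    static String derive(String[][] rhsSequence) {
        List<String> form = List.of("E");
        StringBuilder trace = new StringBuilder("E");
        for (String[] rhs : rhsSequence) {
            form = step(form, rhs);
            trace.append(" => ").append(String.join(" ", form));
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        String[][] rhsSequence = { {"E", "+", "E"}, {"1"}, {"E", "*", "E"}, {"2"}, {"3"} };
        System.out.println(derive(rhsSequence));
        // E => E + E => 1 + E => 1 + E * E => 1 + 2 * E => 1 + 2 * 3
    }
}
```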

10 Ambiguity
Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E )
Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree: [E [E 1] + [E [E 2] * [E 3]]]
Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
Parse tree: [E [E [E 1] + [E 2]] * [E 3]]
Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations). A leftmost and a rightmost derivation of the same tree do not count; what matters here is that the two trees differ.
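Ambiguity matters to a compiler because the tree determines the meaning. The sketch below builds the two parse trees of 1 + 2 * 3 as Java objects (collapsed to operator nodes) and evaluates both; the `AmbiguityDemo` and `Node` names are hypothetical.

```java
// Hypothetical minimal tree type: the two parse trees of 1 + 2 * 3, collapsed to operator nodes.
class AmbiguityDemo {
    static final class Node {
        final char op;          // '+', '*', or 'n' for a number leaf
        final int value;        // used only by leaves
        final Node left, right;

        private Node(char op, int value, Node left, Node right) {
            this.op = op; this.value = value; this.left = left; this.right = right;
        }
        static Node num(int v) { return new Node('n', v, null, null); }
        static Node bin(char op, Node l, Node r) { return new Node(op, 0, l, r); }

        int eval() {
            switch (op) {
                case 'n': return value;
                case '+': return left.eval() + right.eval();
                case '*': return left.eval() * right.eval();
                default: throw new IllegalStateException("unknown operator " + op);
            }
        }

        // Fully parenthesized form, so the grouping chosen by the tree is visible.
        String show() {
            return op == 'n' ? Integer.toString(value) : "(" + left.show() + " " + op + " " + right.show() + ")";
        }
    }

    public static void main(String[] args) {
        Node first = Node.bin('+', Node.num(1), Node.bin('*', Node.num(2), Node.num(3)));   // first tree
        Node second = Node.bin('*', Node.bin('+', Node.num(1), Node.num(2)), Node.num(3));  // second tree
        System.out.println(first.show() + " = " + first.eval());    // (1 + (2 * 3)) = 7
        System.out.println(second.show() + " = " + second.eval());  // ((1 + 2) * 3) = 9
    }
}
```

Same string, same grammar, two different results: the parser must pick one tree, which is what grammar rewriting and precedence declarations are for.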

11 Grammar rewriting
Ambiguous grammar: E → id, E → num, E → E + E, E → E * E, E → ( E )
Unambiguous grammar: E → E + T, E → T, T → T * F, T → F, F → id, F → num, F → ( E )
Derivation: E ⇒ E + T ⇒* 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3 (⇒* abbreviates E + T ⇒ T + T ⇒ F + T ⇒ 1 + T)
Parse tree: [E [E [T [F 1]]] + [T [T [F 2]] * [F 3]]]
Note the difference between a language and a grammar: a grammar represents a language, and a language can be represented by many grammars.
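The layered E/T/F grammar also maps directly onto a hand-written evaluator. A top-down parser cannot use the left-recursive rule E → E + T as written (parseE would call itself without consuming input), so this sketch uses the standard rewrite E → T { + T }, implemented as a loop that still groups to the left. It is a hypothetical illustration (`ExprEvaluator` is not course code) and leaves out id because there is no variable environment.

```java
// Hypothetical recursive-descent evaluator for the unambiguous grammar
//   E -> E + T | T     T -> T * F | F     F -> num | ( E )
// (id is left out because there is no variable environment here).
class ExprEvaluator {
    private final String src;
    private int pos = 0;

    private ExprEvaluator(String src) { this.src = src; }

    static int evaluate(String text) {
        ExprEvaluator p = new ExprEvaluator(text);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return value;
    }

    // E -> E + T | T, written as T { + T }: the loop folds from the left, so + stays left-associative.
    private int expr() {
        int value = term();
        while (peek() == '+') { pos++; value += term(); }
        return value;
    }

    // T -> T * F | F, written as F { * F }. Because E is built from T, * binds tighter than +.
    private int term() {
        int value = factor();
        while (peek() == '*') { pos++; value *= factor(); }
        return value;
    }

    // F -> num | ( E )
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return value;
        }
        int start = pos;                     // peek() has already skipped leading spaces
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    private void skipSpaces() { while (pos < src.length() && src.charAt(pos) == ' ') pos++; }

    private char peek() { skipSpaces(); return pos < src.length() ? src.charAt(pos) : '\0'; }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2 * 3"));   // 7
        System.out.println(evaluate("(1 + 2) * 3")); // 9
    }
}
```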

12 Parsing methods: Top Down
- Starts with the start symbol
- Tries to transform it into the input
Grammar:
S → if E then S else S
S → begin S L
S → print E
L → end
L → ; S L
E → num
Input: if 5 then print 8 else…
Token : rule applied, resulting sentential form (starting from S)
if : S → if E then S else S, giving if E then S else S
5 : E → num, giving if 5 then S else S
print : S → print E, giving if 5 then print E else S
…
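Because every S-production and every L-production starts with a different token, the current token alone picks the rule, just as in the trace above. That makes the grammar a natural fit for recursive descent: one method per non-terminal. The sketch below is a hypothetical recognizer (`TopDownParser`, `accepts` are invented names); it only answers yes/no and builds no tree.

```java
import java.util.List;

// Hypothetical recursive-descent recognizer for the grammar on this slide:
//   S -> if E then S else S | begin S L | print E
//   L -> end | ; S L
//   E -> num
class TopDownParser {
    private final List<String> tokens;
    private int pos = 0;

    private TopDownParser(List<String> tokens) { this.tokens = tokens; }

    static boolean accepts(List<String> tokens) {
        TopDownParser p = new TopDownParser(tokens);
        try {
            p.parseS();
            return p.pos == tokens.size();     // the whole input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + ", found " + peek());
        pos++;
    }

    // The current token alone decides which S-production to expand.
    private void parseS() {
        switch (peek()) {
            case "if":    expect("if"); parseE(); expect("then"); parseS(); expect("else"); parseS(); break;
            case "begin": expect("begin"); parseS(); parseL(); break;
            case "print": expect("print"); parseE(); break;
            default: throw new IllegalStateException("no S-production starts with " + peek());
        }
    }

    private void parseL() {
        switch (peek()) {
            case "end": expect("end"); break;
            case ";":   expect(";"); parseS(); parseL(); break;
            default: throw new IllegalStateException("no L-production starts with " + peek());
        }
    }

    // E -> num: any numeral token
    private void parseE() {
        if (!peek().matches("[0-9]+")) throw new IllegalStateException("expected num, found " + peek());
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("if", "5", "then", "print", "8", "else", "print", "9"))); // true
        System.out.println(accepts(List.of("begin", "print", "1", ";", "print", "2", "end")));       // true
        System.out.println(accepts(List.of("if", "5", "print", "8")));                                // false
    }
}
```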

13 Parsing methods: Bottom Up
- Starts with the input
- Attempts to rewrite it to the start symbol
- Widely used in practice: LR(0), SLR(1), LR(1), LALR(1)
- We will focus only on the theory of LR(0)
- JavaCup implements LALR(1)

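14 Bottom-up parsing
Grammar: E → E + (E), E → i
The input 1 + (2) + (3) is rewritten step by step into the start symbol, each step reducing one handle to E:
1 + (2) + (3) → E + (2) + (3) → E + (E) + (3) → E + (3) → E + (E) → E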
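The reduction sequence can be reproduced with a stack: shift tokens one at a time and, whenever the top of the stack matches a right-hand side, replace it by E. This sketch reduces greedily, which happens to be correct for this two-rule grammar; LR parsers such as the ones CUP generates instead consult a state machine and lookahead to decide when to reduce. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shift-reduce recognizer for the grammar E -> E + ( E ) | i, where i is a numeral.
// It records each sentential form produced by a reduction (stack contents + unread input).
class ShiftReduceDemo {
    private static final List<String> HANDLE = List.of("E", "+", "(", "E", ")");

    static List<String> reductions(List<String> input) {
        List<String> stack = new ArrayList<>();
        List<String> forms = new ArrayList<>();
        forms.add(String.join(" ", input));
        for (int k = 0; k < input.size(); k++) {
            stack.add(input.get(k));                             // shift the next token
            List<String> rest = input.subList(k + 1, input.size());
            while (reduceOnce(stack)) {                          // reduce as long as a handle is on top
                List<String> form = new ArrayList<>(stack);
                form.addAll(rest);
                forms.add(String.join(" ", form));
            }
        }
        if (!stack.equals(List.of("E"))) throw new IllegalArgumentException("syntax error, stack: " + stack);
        return forms;
    }

    // Tries one reduction at the top of the stack; returns false if no handle is there.
    private static boolean reduceOnce(List<String> stack) {
        int n = stack.size();
        if (n >= 1 && stack.get(n - 1).matches("[0-9]+")) {    // E -> i
            stack.set(n - 1, "E");
            return true;
        }
        if (n >= 5 && stack.subList(n - 5, n).equals(HANDLE)) { // E -> E + ( E )
            stack.subList(n - 5, n).clear();
            stack.add("E");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1", "+", "(", "2", ")", "+", "(", "3", ")");
        reductions(input).forEach(System.out::println);
    }
}
```

Run on 1 + (2) + (3), it prints the six sentential forms of the reduction sequence above, ending in E.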

15 Bottom-up: problems
Ambiguity: with E → E + E, E → i, the input 1 + 2 + 3 can be read as (1 + 2) + 3 or as 1 + (2 + 3).

16 Cup
Constructor of Useful Parsers
- Automatic LALR(1) parser generator
- Input: CUP spec file
- Output: syntax analyzer in Java
Tool flow: parser spec → JavaCup → Parser .java source → javac → Parser; at run time the parser reads tokens and builds the AST.

17 Expression calculator
    terminal Integer NUMBER;
    terminal PLUS, MINUS, MULT, DIV;
    terminal LPAREN, RPAREN;
    non terminal Integer expr;
    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr
           | LPAREN expr RPAREN
           | NUMBER
           ;
Is 2+3+4+5 a valid expression?

18 Ambiguities
a * b + c can be grouped as a * (b + c) or as (a * b) + c
a + b + c can be grouped as a + (b + c) or as (a + b) + c

19 Expression calculator
    terminal Integer NUMBER;
    terminal PLUS, MINUS, MULT, DIV;
    terminal LPAREN, RPAREN;
    terminal UMINUS;
    non terminal Integer expr;
    precedence left PLUS, MINUS;
    precedence left DIV, MULT;
    precedence left UMINUS;
    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr %prec UMINUS
           | LPAREN expr RPAREN
           | NUMBER
           ;
The precedence declarations are listed in increasing order of precedence; %prec UMINUS gives the unary-minus rule a contextual precedence.

20 Disambiguation
- Each terminal is assigned a precedence
  - By default, all terminals have the lowest precedence
  - The user can assign their own precedence
- CUP assigns each production a precedence
  - By default, the precedence of the last terminal in the production: expr MINUS expr gets the precedence of MINUS
  - Or a user-specified contextual precedence: MINUS expr %prec UMINUS gets the precedence of UMINUS

21 Disambiguation
- On a shift/reduce conflict, CUP resolves the ambiguity by comparing the precedence of the lookahead terminal with the precedence of the production: if the terminal's precedence is higher it shifts, and if the production's is higher it reduces
- When the precedences are equal, the associativity resolves the conflict:
  - left means reduce
  - right means shift
- More information on precedence declarations is in CUP's manual
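The decision rule above fits in one small function. The sketch below models it with integer precedence levels (higher binds tighter, 0 meaning no precedence declared) and also covers the nonassoc case from slide 27. It is a hypothetical model, not CUP's internal code: `ConflictResolver`, `resolve`, and the `UNRESOLVED` outcome are invented here, and how CUP itself reports a conflict with no precedence information is described in its manual.

```java
// Hypothetical model of the shift/reduce decision described on this slide.
// Precedence levels are small integers (higher binds tighter); 0 means "none declared".
class ConflictResolver {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    enum Action { SHIFT, REDUCE, ERROR, UNRESOLVED }

    static Action resolve(int productionPrec, int lookaheadPrec, Assoc lookaheadAssoc) {
        if (productionPrec == 0 || lookaheadPrec == 0) return Action.UNRESOLVED; // left for the generator to report
        if (lookaheadPrec > productionPrec) return Action.SHIFT;   // the incoming operator binds tighter
        if (lookaheadPrec < productionPrec) return Action.REDUCE;  // the rule on the stack binds tighter
        switch (lookaheadAssoc) {                                  // equal precedence: associativity decides
            case LEFT:  return Action.REDUCE;
            case RIGHT: return Action.SHIFT;
            default:    return Action.ERROR;                       // nonassoc: e.g. 1 < 2 < 3 is rejected
        }
    }

    public static void main(String[] args) {
        // Levels from slide 19: PLUS/MINUS = 1, DIV/MULT = 2, UMINUS = 3 (later declarations bind tighter).
        System.out.println(resolve(1, 2, Assoc.LEFT)); // a + b . * c : SHIFT,  giving a + (b * c)
        System.out.println(resolve(2, 1, Assoc.LEFT)); // a * b . + c : REDUCE, giving (a * b) + c
        System.out.println(resolve(1, 1, Assoc.LEFT)); // a + b . + c : REDUCE, giving (a + b) + c
        System.out.println(resolve(3, 2, Assoc.LEFT)); // - a . * b   : REDUCE, giving (- a) * b
    }
}
```

In each comment, the dot marks where the parser stands: the production to its left is on the stack and the terminal to its right is the lookahead.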

22 Resolving ambiguity
a + b + c: (a + b) + c or a + (b + c)
precedence left PLUS;
Equal precedence with left associativity means reduce, so the parser builds (a + b) + c.

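23 Resolving ambiguity
a * b + c: (a * b) + c or a * (b + c)
precedence left PLUS;
precedence left MULT;
MULT is declared after PLUS, so it has higher precedence and the parser builds (a * b) + c.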

24 Resolving ambiguity
a + b * c: a + (b * c) or (a + b) * c
precedence left PLUS;
precedence left MULT;
MULT has higher precedence than PLUS, so the parser builds a + (b * c).

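25 Resolving ambiguity
Input: - a * b, grouped as (- a) * b or as - (a * b)
precedence left PLUS;
precedence left MULT;
MINUS expr %prec UMINUS
The unary-minus rule takes the precedence of UMINUS, which is declared after MULT (slide 19), so the parser builds (- a) * b.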

26 Resolving ambiguity
    terminal Integer NUMBER;
    terminal PLUS, MINUS, MULT, DIV;
    terminal LPAREN, RPAREN;
    terminal UMINUS;
    non terminal Integer expr;
    precedence left PLUS, MINUS;
    precedence left DIV, MULT;
    precedence left UMINUS;
    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr %prec UMINUS
           | LPAREN expr RPAREN
           | NUMBER
           ;
- The rule MINUS expr %prec UMINUS has the precedence of UMINUS
- UMINUS is never returned by the scanner (it is used only to define precedence)
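To see these declarations at work on real input, here is a small expression evaluator driven by the same three precedence levels and left associativity. It swaps in a simpler technique, a two-stack operator-precedence (shunting-yard style) evaluator, rather than the LALR(1) automaton CUP builds, but each "reduce" follows the same rule: compare precedences and break ties by associativity. `PrecedenceEvaluator` and the internal `"u-"` marker are hypothetical; as on this slide, the input only has one minus sign, and the evaluator tells unary from binary by position.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical operator-precedence (shunting-yard style) evaluator using the declarations
//   precedence left PLUS, MINUS;  precedence left DIV, MULT;  precedence left UMINUS;
class PrecedenceEvaluator {
    // Later declarations bind tighter; "(" gets 0 so no operator reduces past it.
    private static int prec(String op) {
        switch (op) {
            case "+": case "-": return 1;
            case "*": case "/": return 2;
            case "u-": return 3;                 // unary minus, i.e. MINUS expr %prec UMINUS
            default: return 0;
        }
    }

    static int evaluate(String text) {
        Deque<Integer> values = new ArrayDeque<>();
        Deque<String> ops = new ArrayDeque<>();
        boolean expectOperand = true;            // true at the start, after "(" and after an operator
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == ' ') { i++; continue; }
            if (Character.isDigit(c)) {          // shift a NUMBER
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                values.push(Integer.parseInt(text.substring(start, i)));
                expectOperand = false;
                continue;
            }
            i++;
            if (c == '(') {
                ops.push("(");
                expectOperand = true;
            } else if (c == ')') {
                while (!ops.isEmpty() && !ops.peek().equals("(")) reduce(values, ops);
                if (ops.isEmpty()) throw new IllegalArgumentException("unmatched ')'");
                ops.pop();
                expectOperand = false;
            } else if (c == '-' && expectOperand) {
                ops.push("u-");                  // only one minus character; its position makes it unary
            } else if ("+-*/".indexOf(c) >= 0) {
                String op = String.valueOf(c);
                // All binary operators are declared left: reduce while the stacked operator's
                // precedence is higher, or equal (equal + left = reduce).
                while (!ops.isEmpty() && prec(ops.peek()) >= prec(op)) reduce(values, ops);
                ops.push(op);
                expectOperand = true;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
        }
        while (!ops.isEmpty()) reduce(values, ops);
        if (values.size() != 1) throw new IllegalArgumentException("malformed expression");
        return values.pop();
    }

    // Pops one operator and its operands, pushes the result (the rule's semantic action).
    private static void reduce(Deque<Integer> values, Deque<String> ops) {
        String op = ops.pop();
        if (op.equals("(")) throw new IllegalArgumentException("unmatched '('");
        if (values.size() < (op.equals("u-") ? 1 : 2)) throw new IllegalArgumentException("missing operand for " + op);
        if (op.equals("u-")) { values.push(-values.pop()); return; }
        int right = values.pop();
        int left = values.pop();
        switch (op) {
            case "+": values.push(left + right); break;
            case "-": values.push(left - right); break;
            case "*": values.push(left * right); break;
            default:  values.push(left / right); break;   // "/"
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("8 - 3 - 2"));   // 3: left associativity, (8 - 3) - 2
        System.out.println(evaluate("2 + 3 * 4"));   // 14: MULT binds tighter than PLUS
        System.out.println(evaluate("-2 + 3"));      // 1: UMINUS binds tighter than PLUS
        System.out.println(evaluate("-(2 + 3)"));    // -5
    }
}
```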

27 More CUP directives
- precedence nonassoc NEQ;
  - For non-associative operators: ==, !=, etc.
  - With < declared nonassoc, 1<2<3 is identified as an error; likewise 6 == 7 == 8 == 9 with == declared nonassoc
- start with non-terminal;
  - Specifies a start non-terminal other than the first non-terminal
  - Can be changed to test parts of the grammar
- Getting the internal representation, via command-line options:
  - -dump_grammar
  - -dump_states
  - -dump_tables
  - -dump

28 Scanner integration
The parser gets its terminals from the scanner; the sym class is generated from the token declarations in the .cup file.
    import java_cup.runtime.*;
    %%
    %cup
    %eofval{
      return new Symbol(sym.EOF);
    %eofval}
    NUMBER=[0-9]+
    %%
    "+"      { return new Symbol(sym.PLUS); }
    "-"      { return new Symbol(sym.MINUS); }
    "*"      { return new Symbol(sym.MULT); }
    "/"      { return new Symbol(sym.DIV); }
    "("      { return new Symbol(sym.LPAREN); }
    ")"      { return new Symbol(sym.RPAREN); }
    {NUMBER} { return new Symbol(sym.NUMBER, Integer.valueOf(yytext())); }
    \n       { }
    .        { }
The last two rules silently skip newlines and every other unmatched character, including spaces.

29 Assigning meaning
- So far, only validation
- Add Java code implementing semantic actions
    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr %prec UMINUS
           | LPAREN expr RPAREN
           | NUMBER
           ;

30 Assigning meaning
- Symbol labels are used to name variables
- RESULT names the left-hand side symbol
    expr ::= expr:e1 PLUS expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() + e2.intValue()); :}
           | expr:e1 MINUS expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() - e2.intValue()); :}
           | expr:e1 MULT expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() * e2.intValue()); :}
           | expr:e1 DIV expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() / e2.intValue()); :}
           | MINUS expr:e1
               {: RESULT = Integer.valueOf(0 - e1.intValue()); :}
               %prec UMINUS
           | LPAREN expr:e1 RPAREN
               {: RESULT = e1; :}
           | NUMBER:n
               {: RESULT = n; :}
           ;

31 Building an AST
- A more useful representation of the syntax tree
  - Less clutter
  - The actual level of detail depends on your design
- Basis for semantic analysis
- Later annotated with various information
  - Type information
  - Computed values

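32 Parse tree vs. AST
For 1 + (2) + (3): the parse tree has an expr node for every rule application and keeps the parenthesis tokens; the AST keeps only the two + nodes over the leaves 1, 2 and 3.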

33 AST construction
- AST nodes are constructed during parsing
  - Stored in a push-down stack
- Bottom-up parser
  - Grammar rules are annotated with actions for AST construction
  - When a node is constructed, all its children are available (already constructed)
  - The node (RESULT) is pushed on the stack

34 AST construction
Reductions for 1 + (2) + (3): 1 + (2) + (3) → expr + (2) + (3) → expr + (expr) + (3) → expr + (3) → expr + (expr) → expr
Resulting AST: plus(e1 = plus(e1 = int_const(val = 1), e2 = int_const(val = 2)), e2 = int_const(val = 3))
    expr ::= expr:e1 PLUS expr:e2   {: RESULT = new plus(e1, e2); :}
           | LPAREN expr:e RPAREN   {: RESULT = e; :}
           | INT_CONST:i            {: RESULT = new int_const(…, i); :}
           ;
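The stack discipline from slide 33 can be made concrete with a value stack kept in parallel with the symbol stack: every shift pushes a token's value, and every reduction pops its children and pushes the new node. This sketch uses the grammar of slide 14 (E → E + (E) | i), so the parentheses are consumed by the plus rule instead of by a separate LPAREN expr RPAREN rule, but the effect of RESULT = e is the same: parentheses leave no node behind. The node classes are hypothetical stand-ins for the slide's plus and int_const, not the course's AST.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bottom-up AST construction for E -> E + ( E ) | i, using a value stack.
class AstBuildDemo {
    // Hypothetical AST classes modeled on the slide's plus and int_const nodes.
    static abstract class Expr {}

    static final class IntConst extends Expr {
        final int val;
        IntConst(int val) { this.val = val; }
        @Override public String toString() { return "int_const(" + val + ")"; }
    }

    static final class Plus extends Expr {
        final Expr e1, e2;
        Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
        @Override public String toString() { return "plus(" + e1 + ", " + e2 + ")"; }
    }

    private static final List<String> HANDLE = List.of("E", "+", "(", "E", ")");

    static Expr parse(List<String> tokens) {
        List<String> symbols = new ArrayList<>();
        List<Object> values = new ArrayList<>();     // token text, or an already-built Expr
        for (String tok : tokens) {
            if (!tok.matches("[0-9]+|[+()]")) throw new IllegalArgumentException("unexpected token " + tok);
            symbols.add(tok);                        // shift: push the symbol and its value
            values.add(tok);
            while (true) {
                int n = symbols.size();
                if (symbols.get(n - 1).matches("[0-9]+")) {
                    // E -> i   {: RESULT = new int_const(i); :}
                    symbols.set(n - 1, "E");
                    values.set(n - 1, new IntConst(Integer.parseInt((String) values.get(n - 1))));
                } else if (n >= 5 && symbols.subList(n - 5, n).equals(HANDLE)) {
                    // E -> E + ( E )   {: RESULT = new plus(e1, e2); :}; both children already exist
                    Expr e1 = (Expr) values.get(n - 5);
                    Expr e2 = (Expr) values.get(n - 2);
                    symbols.subList(n - 5, n).clear();
                    values.subList(n - 5, n).clear();
                    symbols.add("E");
                    values.add(new Plus(e1, e2));    // the new node (RESULT) is pushed
                } else {
                    break;
                }
            }
        }
        if (!symbols.equals(List.of("E"))) throw new IllegalArgumentException("syntax error: " + symbols);
        return (Expr) values.get(0);
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("1", "+", "(", "2", ")", "+", "(", "3", ")")));
        // plus(plus(int_const(1), int_const(2)), int_const(3))
    }
}
```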

35 Designing an AST
    terminal Integer NUMBER;
    terminal PLUS, MINUS, MULT, DIV, LPAREN, RPAREN, SEMI;
    terminal UMINUS;
    non terminal Integer expr;
    non terminal expr_list, expr_part;
    precedence left PLUS, MINUS;
    precedence left DIV, MULT;
    precedence left UMINUS;
    expr_list ::= expr_list expr_part
                | expr_part
                ;
    expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI
                ;
    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr %prec UMINUS
           | LPAREN expr RPAREN
           | NUMBER
           ;

36 PA2
- Write a parser for IC
- Write a parser for libic.sig
- Check syntax
  - Emit either "Parsed [file] successfully!" or "Syntax error in [file]: [details]"
- -print-ast option
  - Prints one AST node per line
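One way to get one node per line is a pre-order walk that prints each node on its own line, indented by its depth. This is only a sketch with a hypothetical generic `Node` class (label plus children); the real IC AST classes, and the exact output format PA2 expects, are defined by the assignment, not here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-order AST printer: one node per line, two spaces of indentation per level.
class AstPrinterDemo {
    // Hypothetical generic node; the IC AST hierarchy will have one class per construct instead.
    static final class Node {
        final String label;
        final List<Node> children;
        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    static List<String> lines(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, 0, out);
        return out;
    }

    // Visit the node first, then its children one level deeper.
    private static void collect(Node n, int depth, List<String> out) {
        out.add("  ".repeat(depth) + n.label);
        for (Node child : n.children) collect(child, depth + 1, out);
    }

    public static void main(String[] args) {
        Node ast = new Node("plus",
                new Node("plus", new Node("int_const 1"), new Node("int_const 2")),
                new Node("int_const 3"));
        lines(ast).forEach(System.out::println);
    }
}
```

In a real compiler the same walk is usually written as a visitor over the AST classes, which keeps printing code out of the node classes themselves.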

37 PA2, step 1
- Understand the IC grammar in the manual
  - Don't touch the keyboard before understanding the spec
- Write a debug JavaCup spec for the IC grammar
  - A spec with "debug actions": print debug messages to understand what's going on
- Try the "debug grammar" on a number of test cases
- Keep a copy of the "debug grammar" spec around
- Optional: perform error recovery
  - Use the JavaCup error token

38 PA2, next week
- Flesh out the AST class hierarchy
  - Don't touch the keyboard before you understand the hierarchy
  - Keep in mind that this is the basis for later stages
  - The web site contains an AST adapted with permission from Tovi Almozlino
- Change the CUP actions to construct AST nodes

39 Partial example of main
    import java.io.*;
    import IC.Lexer.Lexer;
    import IC.Parser.*;
    import IC.AST.*;
    public class Compiler {
        public static void main(String[] args) {
            try {
                FileReader txtFile = new FileReader(args[0]);
                Lexer scanner = new Lexer(txtFile);
                Parser parser = new Parser(scanner);
                // parser.parse() returns a Symbol; its value field holds the AST root
                ProgAST root = (ProgAST) parser.parse().value;
                System.out.println("Parsed " + args[0] + " successfully!");
            } catch (SyntaxError e) {
                System.out.println("Syntax error in " + args[0] + ": " + e);
            } catch (Exception e) {
                // parse() is declared to throw Exception, and FileReader can throw FileNotFoundException
                System.out.println("Error while processing " + args[0] + ": " + e);
            }
            if (libraryFileSpecified) {...
                try {
                    FileReader libicFile = new FileReader(libPath);
                    Lexer scanner = new Lexer(libicFile);
                    LibraryParser parser = new LibraryParser(scanner);
                    ClassAST root = (ClassAST) parser.parse().value;
                    System.out.println("Parsed " + libPath + " successfully!");
                } catch (SyntaxError e) {
                    System.out.println("Syntax error in " + libPath + ": " + e);
                } catch (Exception e) {
                    System.out.println("Error while processing " + libPath + ": " + e);
                }
            }...

