ML-YACC David Walker COS 320. Outline Last Week –Introduction to Lexing, CFGs, and Parsing Today: –More parsing: automatic parser generation via ML-Yacc.


1 ML-YACC David Walker COS 320

2 Outline
Last Week
–Introduction to Lexing, CFGs, and Parsing
Today:
–More parsing: automatic parser generation via ML-Yacc
–Reading: Chapter 3 of Appel

3 Parser Implementation
Implementation Options:
1. Write a Parser from scratch
–not as boring as writing a lexer, but not exactly a weekend in the Bahamas
2. Use a Parser Generator
–Very general & robust. Sometimes not quite as efficient as hand-written parsers. Nevertheless, good for lazy compiler writers.
[Diagram: a Parser Specification is fed to the parser generator, which produces the Parser; the Parser turns a stream of tokens into abstract syntax.]

6 ML-Yacc specification
three parts, separated by %%:
User Declarations: declare values available in the rule actions
%%
ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts
%%
Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

7 ML-Yacc declarations (preliminaries)
specify type of positions
  %pos int * int
specify terminal and nonterminal symbols
  %term IF | THEN | ELSE | PLUS | MINUS...
  %nonterm prog | exp | op
specify end-of-parse token
  %eop EOF
specify start symbol (by default, the nonterminal on the LHS of the first rule)
  %start prog

8 Simple ML-Yacc Example
%%
%term NUM | PLUS | MUL | LPAR | RPAR
%nonterm exp | fact | base
%pos int
%start exp
%eop EOF
%%
exp : fact ()
    | fact PLUS exp ()
fact : base ()
     | base MUL fact ()
base : NUM ()
     | LPAR exp RPAR ()
Annotations: the %term and %nonterm lines declare the grammar symbols; the lines after the second %% are the grammar rules; the () after each rule is its semantic action (currently do nothing).

9 attribute-grammars
ML-Yacc uses an attribute-grammar scheme
–each nonterminal may have a semantic value associated with it
–when the parser reduces with (X ::= s), a semantic action is executed; it uses the semantic values of the symbols in s
–when parsing completes successfully, the parser returns the semantic value associated with the start symbol (usually a parse tree)

10 attribute-grammars
semantic actions typically build the abstract syntax for the internal language
to use semantic values during parsing, we must declare symbol types:
–%term NUM of int | PLUS | MUL |...
–%nonterm exp of int | fact of int | base of int
type of semantic action must match type declared for LHS nonterminal in rule

11 ML-Yacc with Semantic Actions
%%
%term NUM of int | PLUS | MUL | LPAR | RPAR
%nonterm exp of int | fact of int | base of int
%pos int
%start exp
%eop EOF
%%
exp : fact (fact)
    | fact PLUS exp (fact + exp)
fact : base (base)
     | base MUL fact (base * fact)
base : NUM (NUM)
     | LPAR exp RPAR (exp)
Annotations: the %term and %nonterm lines are grammar symbols with type declarations; the rules are grammar rules with semantic actions, here computing an integer result.

12 ML-Yacc with Semantic Actions
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
%%
...
%%
exp : fact (fact)
    | fact PLUS exp (Add (fact, exp))
fact : base (base)
     | base MUL fact (Mul (base, fact))
base : NUM (Int NUM)
     | LPAR exp RPAR (exp)
computing abstract syntax via semantic actions

13 A simpler grammar
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
%%
...
%%
exp : NUM (Int NUM)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | LPAR exp RPAR (exp)
why don't we just use this simpler grammar?

14 A simpler grammar
(same datatype and grammar as slide 13)
this grammar is ambiguous!
[Figure: the two parse trees for NUM + NUM * NUM, one grouping it as NUM + (NUM * NUM), the other as (NUM + NUM) * NUM]

15 a simpler grammar
(same datatype and grammar as slide 13)
But it is so clean that it would be nice to use. Moreover, we know which parse tree we want. We just need a mechanism to specify it!
[Figure: the same two parse trees for NUM + NUM * NUM]
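Why the choice of parse tree matters can be seen by evaluating both trees. The following is a minimal sketch in Standard ML; the eval function and the concrete numbers 1, 2, 3 are assumptions made for illustration and are not part of the slides.

```sml
(* Abstract syntax from the slide. *)
datatype exp = Int of int
             | Add of exp * exp
             | Mul of exp * exp

(* Hypothetical helper (not in the slides): evaluate an expression tree. *)
fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* The two parse trees the ambiguous grammar allows for 1 + 2 * 3. *)
val mulBindsTighter = Add (Int 1, Mul (Int 2, Int 3))   (* 1 + (2 * 3) *)
val addBindsTighter = Mul (Add (Int 1, Int 2), Int 3)   (* (1 + 2) * 3 *)

val v1 = eval mulBindsTighter   (* 7 *)
val v2 = eval addBindsTighter   (* 9 *)
```

Same token stream, two different values: the grammar alone does not say which one the program means.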

16 Recall how LR parsing works:
exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
State of parse so far: E + E
Yet to read: * NUM
[Figure: desired parse tree, NUM + (NUM * NUM), with the elements parsed so far marked]
We have a shift-reduce conflict. What should we do to get the right parse?

17 Recall how LR parsing works (same grammar and input):
We have a shift-reduce conflict. What should we do to get the right parse? SHIFT
State of parse so far: E + E *
Yet to read: NUM

18 Recall how LR parsing works (same grammar and input):
SHIFT
State of parse so far: E + E * NUM
Yet to read: (nothing)

19 Recall how LR parsing works (same grammar and input):
REDUCE
State of parse so far: E + E * E

20 Recall how LR parsing works (same grammar and input):
REDUCE
State of parse so far: E + E

21 Recall how LR parsing works (same grammar and input):
REDUCE
State of parse so far: E
[Figure: the desired parse tree, NUM + (NUM * NUM), is now complete]

22 The alternative parse
exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
State of parse so far: E + E
Yet to read: * NUM
We have a shift-reduce conflict. Suppose we REDUCE next.
[Figure: elements parsed so far, NUM + NUM]

23 The alternative parse (same grammar and input)
REDUCE
State of parse so far: E
Yet to read: * NUM
[Figure: elements parsed so far, (NUM + NUM) under a single E]

24 The alternative parse (same grammar and input)
Now: SHIFT SHIFT REDUCE
State of parse so far: E * E
Yet to read: (nothing)

25 The alternative parse (same grammar and input)
REDUCE
State of parse so far: E
[Figure: resulting parse tree, (NUM + NUM) * NUM]

26 Summary
exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
State of parse so far: E + E
Yet to read: * NUM
[Figure: desired parse tree, NUM + (NUM * NUM), with the elements parsed so far marked]
We have a shift-reduce conflict. We have E + E on stack, we see *. We want to shift. We ALWAYS want to shift since * has higher precedence than + ==> symbols to the right on the stack get processed first

27 Example 2
exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
State of parse so far: E - E
Yet to read: - NUM
We have a shift-reduce conflict. We have E - E on stack, we see -. We want "-" to be a left-associative operator, ie: NUM - NUM - NUM == ((NUM - NUM) - NUM). What do we do?

28 Example 2 (same grammar and input)
We have a shift-reduce conflict. We have E - E on stack, we see -. What do we do? REDUCE
State of parse so far: E
Yet to read: - NUM

29 Example 2 (same grammar and input)
SHIFT SHIFT REDUCE
State of parse so far: E - E
Yet to read: (nothing)

30 Example 2 (same grammar and input)
REDUCE
State of parse so far: E
[Figure: resulting parse tree, (NUM - NUM) - NUM]

31 Example 2: Summary
exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
[Figure: resulting parse tree, (NUM - NUM) - NUM]
We have a shift-reduce conflict. We have E - E on stack, we see -. What do we do? REDUCE. We ALWAYS want to reduce since - is left-associative.
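Associativity is not cosmetic for subtraction: the two groupings compute different results. A small sketch in Standard ML, assuming a cut-down exp datatype with only Int and Sub and a hypothetical eval helper (neither appears in this form on the slides):

```sml
(* Cut-down abstract syntax, for illustration only. *)
datatype exp = Int of int
             | Sub of exp * exp

(* Hypothetical helper: evaluate an expression tree. *)
fun eval (Int n) = n
  | eval (Sub (a, b)) = eval a - eval b

(* 10 - 4 - 3 parsed two ways. *)
val leftAssoc  = Sub (Sub (Int 10, Int 4), Int 3)   (* REDUCE first: (10 - 4) - 3 *)
val rightAssoc = Sub (Int 10, Sub (Int 4, Int 3))   (* SHIFT first:  10 - (4 - 3) *)

val l = eval leftAssoc    (* 3 *)
val r = eval rightAssoc   (* 9 *)
```

Reducing when we see the second "-" yields the left-associative tree, which is the conventional meaning of 10 - 4 - 3.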

32 precedence and associativity
three solutions to dealing with operator precedence and associativity:
1) let Yacc complain. Its default choice is to shift when it encounters a shift-reduce conflict
–BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
2) rewrite the grammar to eliminate ambiguity
–can be complicated and less clear
3) use Yacc precedence directives: %left, %right, %nonassoc

33 precedence and associativity
given directives, ML-Yacc assigns precedence to each terminal and rule
–precedence of a terminal is based on the order in which the associativity directives appear (later directives give higher precedence)
–precedence of a rule is the precedence of its right-most terminal, eg: precedence of (E ::= E + E) == prec(+)
a shift-reduce conflict is resolved as follows
–prec(terminal) > prec(rule) ==> shift
–prec(terminal) < prec(rule) ==> reduce
–prec(terminal) = prec(rule) ==>
   assoc(terminal) = left ==> reduce
   assoc(terminal) = right ==> shift
   assoc(terminal) = nonassoc ==> report as error
Situation at the conflict (% stands for some operator):
   RHS of rule on stack:  ... E % E
   next input terminal:   T
   yet to read:           E ...
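The resolution rule above can be written down as a small function. This is a sketch in Standard ML of the rule as stated on the slide, not ML-Yacc's actual implementation; the names assoc, action, and resolve, and the use of integers for precedence levels, are assumptions made for illustration.

```sml
(* Hypothetical types for the sketch. *)
datatype assoc  = Left | Right | NonAssoc
datatype action = Shift | Reduce | Error

(* rulePrec:  precedence of the rule whose RHS is on the stack
   termPrec:  precedence of the next input terminal
   termAssoc: associativity declared for that terminal *)
fun resolve (rulePrec : int, termPrec : int, termAssoc : assoc) : action =
  if termPrec > rulePrec then Shift
  else if termPrec < rulePrec then Reduce
  else case termAssoc of
         Left     => Reduce
       | Right    => Shift
       | NonAssoc => Error

(* Assume two directives, "%left PLUS MINUS" then "%left MUL DIV",
   giving PLUS/MINUS level 1 and MUL/DIV level 2. *)
val a1 = resolve (1, 2, Left)   (* E PLUS E on stack, MUL next:   Shift  *)
val a2 = resolve (1, 1, Left)   (* E PLUS E on stack, MINUS next: Reduce *)
val a3 = resolve (2, 1, Left)   (* E MUL E on stack, PLUS next:   Reduce *)
```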

34 precedence and associativity
datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp
%%
%left PLUS MINUS
%left MUL DIV
%%
exp : NUM (Int NUM)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MINUS exp (Sub (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | exp DIV exp (Div (exp1, exp2))
    | LPAR exp RPAR (exp)

35 precedence and associativity
precedence directives:
   %left PLUS MINUS
   %left MUL DIV
RHS of rule on stack:  ... E PLUS E
next input terminal:   MUL
yet to read:           E ...
prec(MUL) > prec(PLUS)

36 precedence and associativity
(same directives and situation as slide 35)
prec(MUL) > prec(PLUS) ==> SHIFT

37 precedence and associativity
precedence directives:
   %left PLUS MINUS
   %left MUL DIV
RHS of rule on stack:  ... E PLUS E
next input terminal:   MINUS
yet to read:           E ...
prec(PLUS) = prec(MINUS)

38 precedence and associativity
(same directives and situation as slide 37)
prec(PLUS) = prec(MINUS) and MINUS is left-associative ==> REDUCE
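To see these shift/reduce decisions drive a whole parse, here is a simplified sketch in Standard ML. It is an assumption-laden illustration, not what ML-Yacc generates: it keeps an operand stack and an operator stack instead of LR tables, handles no parentheses, and treats every operator as %left. All names (token, prec, build, reduce, loop, parse) are hypothetical.

```sml
datatype token = NUM of int | PLUS | MINUS | MUL | DIV
datatype exp = Int of int
             | Add of exp * exp | Sub of exp * exp
             | Mul of exp * exp | Div of exp * exp

(* Levels implied by "%left PLUS MINUS" followed by "%left MUL DIV". *)
fun prec PLUS = 1 | prec MINUS = 1
  | prec MUL = 2  | prec DIV = 2
  | prec (NUM _) = 0

fun build (PLUS, l, r)  = Add (l, r)
  | build (MINUS, l, r) = Sub (l, r)
  | build (MUL, l, r)   = Mul (l, r)
  | build (DIV, l, r)   = Div (l, r)
  | build (NUM _, _, _) = raise Fail "not an operator"

(* REDUCE: replace "E oper E" on top of the stacks by one E. *)
fun reduce (r :: l :: es, oper :: os) = (build (oper, l, r) :: es, os)
  | reduce _ = raise Fail "malformed input"

(* es: operand stack, os: operator stack, third argument: input yet to read. *)
fun loop (es, os, []) =
      (case os of
         [] => (case es of [e] => e | _ => raise Fail "malformed input")
       | _  => let val (es', os') = reduce (es, os)
               in loop (es', os', []) end)
  | loop (es, os, NUM n :: rest) = loop (Int n :: es, os, rest)
  | loop (es, os, t :: rest) =
      (case os of
         [] => loop (es, t :: os, rest)
       | top :: _ =>
           if prec t > prec top
           then loop (es, t :: os, rest)                 (* SHIFT *)
           else let val (es', os') = reduce (es, os)     (* REDUCE: lower, or equal and %left *)
                in loop (es', os', t :: rest) end)

fun parse toks = loop ([], [], toks)

(* NUM + NUM * NUM: shifts MUL, so MUL ends up deeper in the tree. *)
val t1 = parse [NUM 1, PLUS, NUM 2, MUL, NUM 3]
(* NUM + NUM - NUM: reduces before shifting MINUS, so it groups to the left. *)
val t2 = parse [NUM 1, PLUS, NUM 2, MINUS, NUM 3]
```

Here t1 is Add (Int 1, Mul (Int 2, Int 3)) and t2 is Sub (Add (Int 1, Int 2), Int 3), matching the SHIFT and REDUCE decisions on the slides.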

39 one more example
datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp | Uminus of exp
%%
%left PLUS MINUS
%left MUL DIV
%%
exp : NUM (Int NUM)
    | MINUS exp (Uminus exp)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MINUS exp (Sub (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | exp DIV exp (Div (exp1, exp2))
    | LPAR exp RPAR (exp)
RHS of rule on stack:  ... MINUS E
next input terminal:   MUL
yet to read:           E ...
what happens?

40 one more example
(same datatype, grammar, and situation as slide 39)
what happens? prec(*) > prec(-) ==> we SHIFT

41 the fix
datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp | Uminus of exp
%%
%left PLUS MINUS
%left MUL DIV
%left UMINUS
%%
exp : NUM (Int NUM)
    | MINUS exp %prec UMINUS (Uminus exp)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MINUS exp (Sub (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | exp DIV exp (Div (exp1, exp2))
    | LPAR exp RPAR (exp)
RHS of rule on stack:  ... MINUS E
next input terminal:   MUL
yet to read:           E ...

42 the fix
(same datatype, grammar, and situation as slide 41)
changing precedence of rule alters decision: prec(UMINUS) > prec(MUL) ==> we REDUCE

43 the dangling else problem
Grammar: S ::= if E then S else S | if E then S | ...
Consider: if a then if b then S else S
–parse 1: if a then (if b then S else S)
–parse 2: if a then (if b then S) else S
Parser reports a shift-reduce conflict
–default behavior: shift (what we want: the else attaches to the nearest if, giving parse 1)

44 the dangling else problem
Grammar: S ::= if E then S else S | if E then S | ...
Alternative solution is to rewrite grammar:
   S ::= M | U
   M ::= if E then M else M | ...
   U ::= if E then S | if E then M else U
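The two parses of "if a then if b then S else S" are different programs. A minimal sketch in Standard ML, assuming a toy statement type in which conditions are already-evaluated booleans and the only effect is printing; stm, run, and the strings "S1" and "S2" are hypothetical names used to tell the two branches apart.

```sml
(* Toy statement syntax, for illustration only. *)
datatype stm = Print of string
             | IfThen of bool * stm
             | IfThenElse of bool * stm * stm

(* Hypothetical interpreter: returns the list of strings a statement prints. *)
fun run (Print s) = [s]
  | run (IfThen (c, s)) = if c then run s else []
  | run (IfThenElse (c, s1, s2)) = if c then run s1 else run s2

(* if a then if b then S1 else S2, with a = false and b = true *)
val a = false
val b = true

(* parse 1: else belongs to the inner if *)
val parse1 = IfThen (a, IfThenElse (b, Print "S1", Print "S2"))
(* parse 2: else belongs to the outer if *)
val parse2 = IfThenElse (a, IfThen (b, Print "S1"), Print "S2")

val out1 = run parse1   (* [] *)
val out2 = run parse2   (* ["S2"] *)
```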

45 default behavior of ML-Yacc
Shift-Reduce conflict
–shift
Reduce-Reduce conflict
–reduce by first rule
–generally considered unacceptable
for assignment 3, your job is to write a grammar for Fun such that there are no conflicts
–you may use precedence directives tastefully

46 Note: To enter ML-Yacc hell, use a parser to catch type errors
when doing assignment 3, your job is to catch parse errors
there are lots of programming errors that will slip by the parser:
–eg: 3 + true
–catching these sorts of errors is the job of the type checker
–just as catching program structure errors was the job of the parser, not the lexer
–attempting to do type checking in the parser is impossible (in general)
why? Hint: what does "context-free grammar" imply?
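For contrast, here is what a separate pass over the abstract syntax looks like when it rejects 3 + true. This is a minimal sketch in Standard ML, not the checker for Fun: the Bool constructor, the ty datatype, and typeOf are assumptions made for illustration.

```sml
(* Abstract syntax extended with booleans, for illustration only. *)
datatype exp = Int of int
             | Bool of bool
             | Add of exp * exp

datatype ty = IntTy | BoolTy

(* Hypothetical checker: SOME t if the expression has type t, NONE on a type error. *)
fun typeOf (Int _)  = SOME IntTy
  | typeOf (Bool _) = SOME BoolTy
  | typeOf (Add (a, b)) =
      (case (typeOf a, typeOf b) of
         (SOME IntTy, SOME IntTy) => SOME IntTy
       | _ => NONE)

(* "3 + true" parses fine as an exp, but the checker rejects it. *)
val bad = typeOf (Add (Int 3, Bool true))   (* NONE *)
val ok  = typeOf (Add (Int 3, Int 4))       (* SOME IntTy *)
```

The parser happily builds Add (Int 3, Bool true); only the pass over the tree notices the mismatch.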

