ML-YACC David Walker COS 320

Outline
Last week:
- Introduction to lexing, CFGs, and parsing
Today:
- More parsing: automatic parser generation via ML-Yacc
- Reading: Chapter 3 of Appel

Parser Implementation
Implementation options:
1. Write a parser from scratch
   - not as boring as writing a lexer, but not exactly a weekend in the Bahamas
2. Use a parser generator
   - very general and robust; sometimes not quite as efficient as a hand-written parser. Nevertheless, good for lazy compiler writers.
Pipeline: parser specification -> parser generator -> parser. The generated parser turns a stream of tokens into abstract syntax.

ML-Yacc specification
An ML-Yacc specification has three parts, separated by lines containing %%:
1. User declarations: declare values available in the rule actions
%%
2. ML-Yacc definitions: declare terminals and nonterminals; special declarations to resolve conflicts
%%
3. Rules: the parser is specified by CFG rules and associated semantic actions that generate abstract syntax
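As a rough illustration of the three-part layout, here is a minimal sketch of a specification file. The parser name Calc, the token names, and the choice of int for positions are assumptions made for this example; a real project also needs a matching lexer and glue code, which are not shown.

```sml
(* Part 1: user declarations, ordinary SML visible to the rule actions *)
datatype exp = Int of int

%%
(* Part 2: ML-Yacc definitions *)
%name Calc                 (* hypothetical prefix for the generated structures *)
%pos int                   (* assumed type of token positions *)
%term NUM of int | LPAR | RPAR | EOF
%nonterm exp of exp        (* the nonterminal exp carries a value of type exp *)
%start exp
%eop EOF                   (* terminals that may end the input *)
%noshift EOF

%%
(* Part 3: rules, each with a semantic action in parentheses *)
exp : NUM              (Int NUM)
    | LPAR exp RPAR    (exp)
```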

Attribute grammars
ML-Yacc uses an attribute-grammar scheme:
- each nonterminal may have a semantic value associated with it
- when the parser reduces with (X ::= s), a semantic action is executed; the action uses the semantic values of the symbols in s
- when parsing completes successfully, the parser returns the semantic value associated with the start symbol, usually a parse tree

Attribute grammars (continued)
- semantic actions typically build the abstract syntax for the internal language
- to use semantic values during parsing, we must declare symbol types:
    %term NUM of int | PLUS | MUL | ...
    %nonterm exp of int | fact of int | base of int
- the type of a semantic action must match the type declared for the LHS nonterminal of its rule

ML-Yacc with Semantic Actions
    datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
    %%
    ...
    %%
    exp  : fact               (fact)
         | fact PLUS exp      (Add (fact, exp))
    fact : base               (base)
         | base MUL fact      (Mul (base, fact))
    base : NUM                (Int NUM)
         | LPAR exp RPAR      (exp)
Computing abstract syntax via semantic actions: each rule's action, written in parentheses, builds the value for its LHS nonterminal.

A simpler grammar
    datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
    %%
    ...
    %%
    exp : NUM               (Int NUM)
        | exp PLUS exp      (Add (exp1, exp2))
        | exp MUL exp       (Mul (exp1, exp2))
        | LPAR exp RPAR     (exp)
This grammar is ambiguous! The input NUM + NUM * NUM has two parse trees:
- (NUM + NUM) * NUM, with * at the root
- NUM + (NUM * NUM), with + at the root

A simpler grammar (continued)
The grammar is ambiguous, but it is so clean that it would be nice to use. Moreover, we know which parse tree we want for NUM + NUM * NUM. We just need a mechanism to specify it!
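To see why the choice of parse tree matters, the two trees for NUM + NUM * NUM can be written down as values of the exp datatype and evaluated. This is only a sketch: the eval function and the sample numbers 1, 2, 3 are assumptions added for illustration, not part of the grammar.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Mul of exp * exp

(* Evaluate an abstract syntax tree to an integer. *)
fun eval (Int n) = n
  | eval (Add (l, r)) = eval l + eval r
  | eval (Mul (l, r)) = eval l * eval r

(* The two parse trees of 1 + 2 * 3 under the ambiguous grammar. *)
val plusAtRoot = Add (Int 1, Mul (Int 2, Int 3))   (* 1 + (2 * 3) *)
val mulAtRoot  = Mul (Add (Int 1, Int 2), Int 3)   (* (1 + 2) * 3 *)

val v1 = eval plusAtRoot   (* 7 *)
val v2 = eval mulAtRoot    (* 9 *)
```

The same token stream yields different abstract syntax, and therefore different meanings, depending on which tree the parser builds.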

Recall how LR parsing works
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
Desired parse tree: NUM + (NUM * NUM), with + at the root

    State of parse so far    Yet to read    Action just taken
    E + E                    * NUM          (shift-reduce conflict: what should we do to get the right parse?)
    E + E *                  NUM            SHIFT
    E + E * NUM                             SHIFT
    E + E * E                               REDUCE (exp ::= NUM)
    E + E                                   REDUCE (exp ::= exp MUL exp)
    E                                       REDUCE (exp ::= exp PLUS exp)

The alternative parse
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM

    State of parse so far    Yet to read    Action just taken
    E + E                    * NUM          (shift-reduce conflict: suppose we REDUCE next)
    E                        * NUM          REDUCE (exp ::= exp PLUS exp)
    E * E                                   SHIFT, SHIFT, REDUCE (exp ::= NUM)
    E                                       REDUCE (exp ::= exp MUL exp)

Resulting parse tree: (NUM + NUM) * NUM, with * at the root.

Summary
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
State of parse so far: E + E; yet to read: * NUM
Desired parse tree: NUM + (NUM * NUM), with + at the root
We have a shift-reduce conflict: E + E is on the stack and we see *.
We want to shift. We ALWAYS want to shift here since * has higher precedence than +
==> symbols to the right on the stack get processed first.

Example 2
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
We want "-" to be a left-associative operator, i.e. NUM - NUM - NUM == ((NUM - NUM) - NUM).

    State of parse so far    Yet to read    Action just taken
    E - E                    - NUM          (shift-reduce conflict: E - E on stack, we see -. What do we do?)
    E                        - NUM          REDUCE (exp ::= exp MINUS exp)
    E - E                                   SHIFT, SHIFT, REDUCE (exp ::= NUM)
    E                                       REDUCE (exp ::= exp MINUS exp)

Example 2: Summary
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
We have a shift-reduce conflict: E - E is on the stack and we see -.
What do we do? REDUCE. We ALWAYS want to reduce here since - is left-associative.
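The two summaries can be mimicked with a toy shift-reduce loop over a token list. This is a sketch under stated assumptions, not the LR automaton that ML-Yacc generates: the precedence table is hard-coded, every operator is treated as left-associative, parentheses are left out, and all names (tok, item, prec, build, parse) are invented for the example.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
datatype tok = NUM of int | PLUS | MINUS | MUL
datatype item = Tok of tok | Exp of exp        (* stack entries *)

(* Hypothetical precedence table: PLUS and MINUS low, MUL high. *)
fun prec PLUS = 1
  | prec MINUS = 1
  | prec MUL = 2
  | prec (NUM _) = 0

fun build (PLUS, l, r) = Add (l, r)
  | build (MINUS, l, r) = Sub (l, r)
  | build (MUL, l, r) = Mul (l, r)
  | build (NUM _, _, _) = raise Fail "not an operator"

(* The stack is a list whose head is the top of the stack. *)
fun parse (stack, input) =
  case (stack, input) of
    (Tok (NUM n) :: rest, _) =>
      parse (Exp (Int n) :: rest, input)                     (* reduce exp ::= NUM *)
  | (Exp r :: Tok oper :: Exp l :: rest, next :: more) =>
      if prec next > prec oper
      then parse (Tok next :: stack, more)                   (* higher precedence: shift *)
      else parse (Exp (build (oper, l, r)) :: rest, input)   (* lower or equal: reduce *)
  | (Exp r :: Tok oper :: Exp l :: rest, []) =>
      parse (Exp (build (oper, l, r)) :: rest, [])           (* end of input: reduce *)
  | ([Exp e], []) => e                                       (* accept *)
  | (_, t :: more) => parse (Tok t :: stack, more)           (* otherwise shift *)
  | _ => raise Fail "parse error"

val t1 = parse ([], [NUM 1, PLUS, NUM 2, MUL, NUM 3])
(* t1 = Add (Int 1, Mul (Int 2, Int 3)): shifted on MUL *)
val t2 = parse ([], [NUM 1, MINUS, NUM 2, MINUS, NUM 3])
(* t2 = Sub (Sub (Int 1, Int 2), Int 3): reduced on the second MINUS *)
```

With E + E on the stack and MUL next, the loop shifts; with E - E on the stack and MINUS next, it reduces, matching the two traces.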

Precedence and associativity
Three solutions for dealing with operator precedence and associativity:
1. Let Yacc complain. Its default choice is to shift when it encounters a shift-reduce conflict.
   - BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
2. Rewrite the grammar to eliminate ambiguity.
   - can be complicated and less clear
3. Use Yacc precedence directives: %left, %right, %nonassoc

Precedence and associativity
Given directives, ML-Yacc assigns a precedence to each terminal and each rule:
- the precedence of a terminal is based on the order in which the associativity directives are written: terminals in later directives have higher precedence
- the precedence of a rule is the precedence of its right-most terminal, e.g. precedence of (E ::= E + E) == prec(+)
Situation: the RHS of a rule, ... E % E for some operator %, is on the stack; the next input terminal is T, with E ... yet to read.
A shift-reduce conflict is resolved as follows:
- prec(terminal) > prec(rule) ==> shift
- prec(terminal) < prec(rule) ==> reduce
- prec(terminal) = prec(rule) ==>
    assoc(terminal) = left ==> reduce
    assoc(terminal) = right ==> shift
    assoc(terminal) = nonassoc ==> report as error
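The resolution rule is small enough to write as a function. The sketch below encodes only the three cases listed above; the record field names and the datatypes assoc and action are assumptions for this example and do not correspond to anything in ML-Yacc's implementation.

```sml
datatype assoc = Left | Right | Nonassoc
datatype action = Shift | Reduce | Error

(* termPrec:  precedence of the next input terminal
   rulePrec:  precedence of the rule whose RHS is on the stack
   termAssoc: associativity declared for the terminal *)
fun resolve {termPrec : int, rulePrec : int, termAssoc : assoc} : action =
  case Int.compare (termPrec, rulePrec) of
    GREATER => Shift
  | LESS => Reduce
  | EQUAL => (case termAssoc of
                Left => Reduce
              | Right => Shift
              | Nonassoc => Error)

(* Assuming PLUS and MINUS have precedence 1 and MUL has precedence 2, all left-associative: *)
val onMul   = resolve {termPrec = 2, rulePrec = 1, termAssoc = Left}   (* Shift *)
val onMinus = resolve {termPrec = 1, rulePrec = 1, termAssoc = Left}   (* Reduce *)
```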

Precedence and associativity
Precedence directives:
    %left PLUS MINUS
    %left MUL DIV
RHS of rule on stack: ... E PLUS E
Next input terminal: MUL (then E ... yet to read)
prec(MUL) > prec(PLUS) ==> SHIFT

Precedence and associativity
Precedence directives:
    %left PLUS MINUS
    %left MUL DIV
RHS of rule on stack: ... E PLUS E
Next input terminal: MINUS (then E ... yet to read)
prec(PLUS) = prec(MINUS) and MINUS is left-associative ==> REDUCE
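Put together, the clean ambiguous grammar plus the two directives might look like the following specification. This is a sketch, not the course's reference solution: the parser name Calc, the position type, the EOF handling, and the constructors Sub and Divide are assumptions introduced for the example.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
             | Divide of exp * exp

%%
%name Calc                 (* hypothetical parser name *)
%pos int                   (* assumed type of token positions *)
%term NUM of int | PLUS | MINUS | MUL | DIV | LPAR | RPAR | EOF
%nonterm exp of exp
%start exp
%eop EOF
%noshift EOF

(* Later directives bind tighter: MUL and DIV above PLUS and MINUS. *)
%left PLUS MINUS
%left MUL DIV

%%
exp : NUM               (Int NUM)
    | exp PLUS exp      (Add (exp1, exp2))
    | exp MINUS exp     (Sub (exp1, exp2))
    | exp MUL exp       (Mul (exp1, exp2))
    | exp DIV exp       (Divide (exp1, exp2))
    | LPAR exp RPAR     (exp)
```

The rules stay as short as the ambiguous grammar; the directives alone tell the generator how to settle each shift-reduce conflict.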

The dangling else problem
Grammar: S ::= if E then S else S | if E then S | ...
Consider: if a then if b then S else S
- parse 1: if a then (if b then S else S)
- parse 2: if a then (if b then S) else S
The parser generator reports a shift-reduce conflict
- the default behavior is to shift, which yields parse 1 (what we want)
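The two parses are not just different shapes; they can run different code. A minimal sketch, assuming a hypothetical statement datatype in which conditions are variable names and the two inner statements are labelled "S1" and "S2" so we can tell them apart:

```sml
datatype stm = Do of string                      (* a labelled basic statement *)
             | If of string * stm * stm option   (* condition, then, optional else *)

(* Return the labels of the statements that execute under env. *)
fun exec (env : string -> bool) (Do label) = [label]
  | exec env (If (c, thenS, elseS)) =
      if env c then exec env thenS
      else (case elseS of
              SOME s => exec env s
            | NONE => [])

(* if a then if b then S1 else S2 *)
val parse1 = If ("a", If ("b", Do "S1", SOME (Do "S2")), NONE)   (* else belongs to inner if *)
val parse2 = If ("a", If ("b", Do "S1", NONE), SOME (Do "S2"))   (* else belongs to outer if *)

(* An environment where a is false and every other variable is true. *)
fun env "a" = false
  | env _ = true

val r1 = exec env parse1   (* [] *)
val r2 = exec env parse2   (* ["S2"] *)
```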

Default behavior of ML-Yacc
Shift-reduce conflict
- shift
Reduce-reduce conflict
- reduce by the first rule
- generally considered unacceptable
For assignment 3, your job is to write a grammar for Fun such that there are no conflicts
- you may use precedence directives tastefully

Note: To enter ML-Yacc hell, use a parser to catch type errors
When doing assignment 3, your job is to catch parse errors.
There are lots of programming errors that will slip by the parser:
- e.g. 3 + true
- catching these sorts of errors is the job of the type checker
- just as catching program structure errors was the job of the parser, not the lexer
- attempting to do type checking in the parser is impossible (in general). Why? Hint: what does "context-free grammar" imply?
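To make the division of labor concrete: 3 + true is a perfectly good abstract syntax tree, and it is a separate pass over that tree that rejects it. The sketch below assumes a tiny hypothetical expression language with integers, booleans, and addition; it is not the Fun language from the assignment.

```sml
datatype exp = Int of int
             | Bool of bool
             | Add of exp * exp
datatype ty = IntTy | BoolTy

(* Return SOME type if the expression is well typed, NONE otherwise. *)
fun typeOf (Int _) = SOME IntTy
  | typeOf (Bool _) = SOME BoolTy
  | typeOf (Add (l, r)) =
      (case (typeOf l, typeOf r) of
         (SOME IntTy, SOME IntTy) => SOME IntTy
       | _ => NONE)

val bad  = typeOf (Add (Int 3, Bool true))   (* NONE: parses fine, fails to type-check *)
val good = typeOf (Add (Int 3, Int 4))       (* SOME IntTy *)
```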
