ML-YACC David Walker COS 320. Outline Last Week –Introduction to Lexing, CFGs, and Parsing Today: –More parsing: automatic parser generation via ML-Yacc.


ML-YACC David Walker COS 320

Outline
Last Week
– Introduction to Lexing, CFGs, and Parsing
Today
– More parsing: automatic parser generation via ML-Yacc
– Reading: Chapter 3 of Appel

Parser Implementation
Implementation options:
1. Write a parser from scratch
   – not as boring as writing a lexer, but not exactly a weekend in the Bahamas
2. Use a parser generator
   – very general and robust; sometimes not quite as efficient as hand-written parsers. Nevertheless, good for lazy compiler writers.
Diagram: the parser generator turns a Parser Specification into a Parser; the Parser consumes a stream of tokens and produces abstract syntax.

ML-Yacc specification
Three parts, separated by %%:
    User Declarations: declare values available in the rule actions
    %%
    ML-Yacc Definitions: declare terminals and nonterminals; special declarations to resolve conflicts
    %%
    Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc declarations (preliminaries)
– specify type of positions
    %pos int * int
– specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
– specify end-of-parse token
    %eop EOF
– specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog

Simple ML-Yacc Example
    %%
    %term NUM | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp | fact | base
    %pos int
    %start exp
    %eop EOF
    %%
    exp  : fact              ()
         | fact PLUS exp     ()
    fact : base              ()
         | base MUL fact     ()
    base : NUM               ()
         | LPAR exp RPAR     ()
The middle section declares the grammar symbols; the last section gives the grammar rules, each followed by its semantic action in parentheses (currently the actions do nothing).

attribute-grammars
ML-Yacc uses an attribute-grammar scheme
– each nonterminal may have a semantic value associated with it
– when the parser reduces with (X ::= s), a semantic action is executed
    – the action uses semantic values from the symbols in s
– when parsing completes successfully, the parser returns the semantic value associated with the start symbol
    – usually a parse tree

attribute-grammars
semantic actions typically build the abstract syntax for the internal language
to use semantic values during parsing, we must declare symbol types:
– %term NUM of int | PLUS | MUL | ...
– %nonterm exp of int | fact of int | base of int
the type of a semantic action must match the type declared for the LHS nonterminal of its rule

ML-Yacc with Semantic Actions
    %%
    %term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp of int | fact of int | base of int
    %pos int
    %start exp
    %eop EOF
    %%
    exp  : fact              (fact)
         | fact PLUS exp     (fact + exp)
    fact : base              (base)
         | base MUL fact     (base * fact)
    base : NUM               (NUM)
         | LPAR exp RPAR     (exp)
The grammar symbols now carry type declarations, and the grammar rules carry semantic actions that compute an integer result.

ML-Yacc with Semantic Actions
    datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
    %%
    ...
    %%
    exp  : fact              (fact)
         | fact PLUS exp     (Add (fact, exp))
    fact : base              (base)
         | base MUL fact     (Mul (base, fact))
    base : NUM               (Int NUM)
         | LPAR exp RPAR     (exp)
Here the semantic actions compute abstract syntax instead of an integer.
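To see what tree the stratified exp/fact/base grammar produces, here is a minimal Python sketch. It is a hand-written recursive-descent mirror of the rules, not the LR parser ML-Yacc would generate, and it assumes the second alternative for fact is `base MUL fact`. The token encoding (Python ints for NUM, the strings "+", "*", "(", ")") and the tuple-shaped trees are illustrative assumptions, not part of the slides.

```python
# Hypothetical sketch: one function per nonterminal, each returning the
# value its semantic action would build (Int / Add / Mul as tagged tuples).
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "EOF"

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def exp():                      # exp : fact | fact PLUS exp
        left = fact()
        if peek() == "+":
            take()
            return ("Add", left, exp())
        return left

    def fact():                     # fact : base | base MUL fact
        left = base()
        if peek() == "*":
            take()
            return ("Mul", left, fact())
        return left

    def base():                     # base : NUM | LPAR exp RPAR
        tok = take()
        if tok == "(":
            inner = exp()
            if take() != ")":
                raise SyntaxError("expected )")
            return inner
        if isinstance(tok, int):
            return ("Int", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = exp()
    if peek() != "EOF":
        raise SyntaxError("trailing input")
    return tree
```

Calling `parse([1, "+", 2, "*", 3])` puts the multiplication below the addition, because `*` can only appear inside a fact.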

A simpler grammar
    datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
    %%
    ...
    %%
    exp : NUM               (Int NUM)
        | exp PLUS exp      (Add (exp1, exp2))
        | exp MUL exp       (Mul (exp1, exp2))
        | LPAR exp RPAR     (exp)
why don’t we just use this simpler grammar?

A simpler grammar (continued)
this grammar is ambiguous! The input NUM + NUM * NUM has two parse trees:
– NUM + (NUM * NUM), with + at the root
– (NUM + NUM) * NUM, with * at the root
But it is so clean that it would be nice to use. Moreover, we know which parse tree we want. We just need a mechanism to specify it!
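The ambiguity matters because the two trees mean different things. A small Python sketch, using tagged tuples as a stand-in for the `exp` datatype (the tuple encoding and the name `evaluate` are assumptions for illustration), evaluates both trees for 1 + 2 * 3:

```python
def evaluate(tree):
    """Evaluate an Int / Add / Mul tree encoded as tagged tuples."""
    tag = tree[0]
    if tag == "Int":
        return tree[1]
    left, right = evaluate(tree[1]), evaluate(tree[2])
    return left + right if tag == "Add" else left * right

# The two parse trees the ambiguous grammar allows for 1 + 2 * 3.
plus_at_root = ("Add", ("Int", 1), ("Mul", ("Int", 2), ("Int", 3)))   # 1 + (2 * 3)
times_at_root = ("Mul", ("Add", ("Int", 1), ("Int", 2)), ("Int", 3))  # (1 + 2) * 3
```

The first tree evaluates to 7 and the second to 9, so the parser has to be told which one to build.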

Recall how LR parsing works
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
Desired parse tree: NUM + (NUM * NUM), with + at the root
    State of parse so far    Yet to read    Next action
    E + E                    * NUM          shift-reduce conflict. What should we do to get the right parse? SHIFT
    E + E *                  NUM            SHIFT
    E + E * NUM              (nothing)      REDUCE by exp ::= NUM
    E + E * E                (nothing)      REDUCE by exp ::= exp MUL exp
    E + E                    (nothing)      REDUCE by exp ::= exp PLUS exp
    E                        (nothing)      done

The alternative parse
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
    State of parse so far    Yet to read    Next action
    E + E                    * NUM          shift-reduce conflict. Suppose we REDUCE by exp ::= exp PLUS exp
    E                        * NUM          SHIFT, SHIFT, REDUCE by exp ::= NUM
    E * E                    (nothing)      REDUCE by exp ::= exp MUL exp
    E                        (nothing)      done
Resulting parse tree: (NUM + NUM) * NUM, with * at the root.

Summary
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
State of parse so far: E + E; yet to read: * NUM
Desired parse tree: NUM + (NUM * NUM), with + at the root
We have a shift-reduce conflict: E + E is on the stack and we see *.
– We want to shift.
– We ALWAYS want to shift here, since * has higher precedence than + ==> symbols to the right on the stack get processed first.

Example 2
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
We want “-” to be a left-associative operator, i.e. NUM - NUM - NUM == ((NUM - NUM) - NUM).
    State of parse so far    Yet to read    Next action
    E - E                    - NUM          shift-reduce conflict. What do we do? REDUCE by exp ::= exp MINUS exp
    E                        - NUM          SHIFT, SHIFT, REDUCE by exp ::= NUM
    E - E                    (nothing)      REDUCE by exp ::= exp MINUS exp
    E                        (nothing)      done

Example 2: Summary
Grammar: exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
We have a shift-reduce conflict: E - E is on the stack and we see -. What do we do? REDUCE.
We ALWAYS want to reduce here, since - is left-associative.

precedence and associativity
three solutions to dealing with operator precedence and associativity:
1) let Yacc complain
   – its default choice is to shift when it encounters a shift-reduce conflict
   – BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
2) rewrite the grammar to eliminate ambiguity
   – can be complicated and less clear
3) use Yacc precedence directives: %left, %right, %nonassoc

precedence and associativity
given directives, ML-Yacc assigns a precedence to each terminal and rule
– precedence of a terminal is based on the order in which the associativity directives are specified (later directives get higher precedence)
– precedence of a rule is the precedence of its right-most terminal
    eg: precedence of (E ::= E + E) == prec(+)
a shift-reduce conflict arises when the RHS of a rule (say E % E) is on the stack and the next input terminal is T; it is resolved as follows
– prec(terminal) > prec(rule) ==> shift
– prec(terminal) < prec(rule) ==> reduce
– prec(terminal) = prec(rule) ==>
    assoc(terminal) = left ==> reduce
    assoc(terminal) = right ==> shift
    assoc(terminal) = nonassoc ==> report as error
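The resolution rule can be written down as a small decision function. This is a Python sketch of the rule as stated on the slide, not ML-Yacc's actual implementation; the function name `resolve`, the integer precedence levels, and the string results are assumptions chosen for illustration.

```python
def resolve(prec_terminal, prec_rule, assoc):
    """Resolve a shift-reduce conflict.

    prec_terminal: precedence of the next input terminal (higher binds tighter)
    prec_rule:     precedence of the rule whose RHS is on the stack
    assoc:         associativity of the terminal: "left", "right" or "nonassoc"
    """
    if prec_terminal > prec_rule:
        return "shift"
    if prec_terminal < prec_rule:
        return "reduce"
    # Equal precedence: associativity breaks the tie.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]
```

For instance, with PLUS at level 1 and MUL at level 2, `resolve(2, 1, "left")` gives "shift" (E + E on the stack, * next), while `resolve(1, 1, "left")` gives "reduce" (E - E on the stack, - next).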

precedence and associativity
    datatype exp = Int of int | Add of exp * exp | Sub of exp * exp
                 | Mul of exp * exp | Div of exp * exp
    %%
    %left PLUS MINUS
    %left MUL DIV
    %%
    exp : NUM               (Int NUM)
        | exp PLUS exp      (Add (exp1, exp2))
        | exp MINUS exp     (Sub (exp1, exp2))
        | exp MUL exp       (Mul (exp1, exp2))
        | exp DIV exp       (Div (exp1, exp2))
        | LPAR exp RPAR     (exp)

precedence and associativity: two conflicts resolved
precedence directives:
    %left PLUS MINUS
    %left MUL DIV
Case 1: RHS of rule on stack: ... E PLUS E; next input terminal: MUL; E yet to read
– prec(MUL) > prec(PLUS) ==> SHIFT
Case 2: RHS of rule on stack: ... E PLUS E; next input terminal: MINUS; E yet to read
– prec(PLUS) = prec(MINUS) and MINUS is left-associative ==> REDUCE
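To watch these decisions drive an actual parse, here is a deliberately tiny shift-reduce loop in Python. It is a sketch under strong assumptions: there is no LR automaton and no parentheses, tokens are Python ints and the operator strings "+", "-", "*", "/", and the `PREC` table hard-codes the two `%left` lines (all operators left-associative). The names `PREC` and `parse` are hypothetical.

```python
# Precedence levels implied by: %left PLUS MINUS / %left MUL DIV
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens):
    """Shift-reduce parse of a flat operator expression into nested tuples."""
    stack, rest = [], list(tokens)
    while True:
        look = rest[0] if rest else "EOF"
        # A handle "E op E" is on top of the stack (E's are tuples, ops are strings).
        handle = (len(stack) >= 3 and isinstance(stack[-1], tuple)
                  and stack[-2] in PREC and isinstance(stack[-3], tuple))
        if handle and (look == "EOF" or look in PREC):
            # Reduce unless the lookahead binds tighter than the rule on the stack.
            # Equal precedence reduces because every operator here is left-associative.
            if look == "EOF" or PREC[look] <= PREC[stack[-2]]:
                right, op, left = stack.pop(), stack.pop(), stack.pop()
                stack.append((op, left, right))
                continue
        if look == "EOF":
            break
        rest.pop(0)
        # Shift; a number is reduced to an expression immediately (exp ::= NUM).
        stack.append(("num", look) if isinstance(look, int) else look)
    if len(stack) != 1 or not isinstance(stack[0], tuple):
        raise SyntaxError("malformed expression")
    return stack[0]
```

On `[1, "+", 2, "*", 3]` the loop shifts past `*` and nests the product under the sum; on `[1, "-", 2, "-", 3]` it reduces first, which yields the left-associative tree.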

one more example
    datatype exp = Int of int | Add of exp * exp | Sub of exp * exp
                 | Mul of exp * exp | Div of exp * exp | Uminus of exp
    %%
    %left PLUS MINUS
    %left MUL DIV
    %%
    exp : NUM               (Int NUM)
        | MINUS exp         (Uminus exp)
        | exp PLUS exp      (Add (exp1, exp2))
        | exp MINUS exp     (Sub (exp1, exp2))
        | exp MUL exp       (Mul (exp1, exp2))
        | exp DIV exp       (Div (exp1, exp2))
        | LPAR exp RPAR     (exp)
RHS of rule on stack: ... MINUS E; next input terminal: MUL; E yet to read
what happens? prec(*) > prec(-) ==> we SHIFT, so - E * E is parsed as -(E * E)

the fix
    datatype exp = Int of int | Add of exp * exp | Sub of exp * exp
                 | Mul of exp * exp | Div of exp * exp | Uminus of exp
    %%
    %left PLUS MINUS
    %left MUL DIV
    %left UMINUS
    %%
    exp : NUM                         (Int NUM)
        | MINUS exp %prec UMINUS      (Uminus exp)
        | exp PLUS exp                (Add (exp1, exp2))
        | exp MINUS exp               (Sub (exp1, exp2))
        | exp MUL exp                 (Mul (exp1, exp2))
        | exp DIV exp                 (Div (exp1, exp2))
        | LPAR exp RPAR               (exp)
RHS of rule on stack: ... MINUS E; next input terminal: MUL; E yet to read
changing the precedence of the rule alters the decision: prec(UMINUS) > prec(MUL) ==> we REDUCE
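The effect of `%prec` can be checked with a few lines of Python. This sketch assumes the three `%left` lines map to levels 1, 2 and 3 in order, and that all symbols are left-associative; the names `PREC` and `decide` are hypothetical.

```python
# Levels implied by: %left PLUS MINUS / %left MUL DIV / %left UMINUS
PREC = {"PLUS": 1, "MINUS": 1, "MUL": 2, "DIV": 2, "UMINUS": 3}

def decide(rule_prec_symbol, lookahead):
    """Shift or reduce, given the symbol that supplies the rule's precedence."""
    if PREC[lookahead] > PREC[rule_prec_symbol]:
        return "shift"
    return "reduce"  # lower precedence, or equal with left associativity

# Without %prec, the rule (exp ::= MINUS exp) takes its precedence from MINUS.
before_fix = decide("MINUS", "MUL")
# With %prec UMINUS, the same rule takes its precedence from UMINUS.
after_fix = decide("UMINUS", "MUL")
```

Here `before_fix` is "shift" and `after_fix` is "reduce", matching the two slides.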

the dangling else problem
Grammar: S ::= if E then S else S | if E then S | ...
Consider: if a then if b then S else S
– parse 1: if a then (if b then S else S)
– parse 2: if a then (if b then S) else S
Parser reports a shift-reduce conflict
– default behavior: shift, which gives parse 1 (what we want)
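Preferring shift means an `else` attaches to the nearest unmatched `if`. The Python sketch below imitates that choice with a hand-written recursive-descent function rather than an LR parser; the token list, the tuple-shaped result, and the name `parse_stmt` are assumptions for illustration, and conditions and plain statements are single tokens.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at pos; return (tree, next position)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected then")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedily take an else if one follows: this is the "shift" choice.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch, None), pos
    return tokens[pos], pos + 1
```

For `if a then if b then s1 else s2`, the inner `if` claims the `else`, which is parse 1.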

the dangling else problem
Grammar: S ::= if E then S else S | if E then S | ...
Alternative solution is to rewrite the grammar:
    S ::= M | U
    M ::= if E then M else M | ...
    U ::= if E then S | if E then M else U

default behavior of ML-Yacc
Shift-reduce conflict
– shift
Reduce-reduce conflict
– reduce by first rule
– generally considered unacceptable
for assignment 3, your job is to write a grammar for Fun such that there are no conflicts
– you may use precedence directives tastefully

Note: To enter ML-Yacc hell, use a parser to catch type errors
when doing assignment 3, your job is to catch parse errors
there are lots of programming errors that will slip by the parser:
– eg: 3 + true
– catching these sorts of errors is the job of the type checker
– just as catching program structure errors was the job of the parser, not the lexer
– attempting to do type checking in the parser is impossible (in general)
    – why? Hint: what does “context-free grammar” imply?
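The division of labour can be illustrated with a separate pass over the abstract syntax. The Python sketch below is a toy type checker, not part of the assignment: the tuple-shaped trees, the `Bool` constructor, and the name `type_of` are assumptions made for the example.

```python
def type_of(tree):
    """Return "int" or "bool" for a well-typed tree; raise TypeError otherwise."""
    tag = tree[0]
    if tag == "Int":
        return "int"
    if tag == "Bool":
        return "bool"
    if tag == "Add":
        left, right = type_of(tree[1]), type_of(tree[2])
        if left != "int" or right != "int":
            raise TypeError(f"cannot add {left} and {right}")
        return "int"
    raise ValueError(f"unknown node {tag!r}")

# 3 + true parses fine: the parser happily builds this tree.
ill_typed = ("Add", ("Int", 3), ("Bool", True))
```

Calling `type_of(ill_typed)` raises `TypeError`, so the error is caught after parsing, by a pass that looks at the tree rather than at the token stream.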