10/25/20151 Programming Languages and Compilers (CS 421) Grigore Rosu 2110 SC, UIUC Slides by Elsa Gunter, based.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Compiler Designs and Constructions
Compiler Principles Fall Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.
Bottom up Parsing Bottom up parsing trys to transform the input string into the start symbol. Moves through a sequence of sentential forms (sequence of.
Pushdown Automata Consists of –Pushdown stack (can have terminals and nonterminals) –Finite state automaton control Can do one of three actions (based.
Bhaskar Bagchi (11CS10058) Lecture Slides( 9 th Sept. 2013)
Lexical and Syntactic Analysis Here, we look at two of the tasks involved in the compilation process –Given source code, we need to first break it into.
5/24/20151 Programming Languages and Compilers (CS 421) Reza Zamani Based in part on slides by Mattox Beckman,
Cse321, Programming Languages and Compilers 1 6/12/2015 Lecture #10, Feb. 14, 2007 Modified sets of item construction Rules for building LR parse tables.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
1 Chapter 5: Bottom-Up Parsing (Shift-Reduce). 2 - attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Lecture #8, Feb. 7, 2007 Shift-reduce parsing,
Bottom Up Parsing.
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
CS 330 Programming Languages 09 / 23 / 2008 Instructor: Michael Eckmann.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Chapter 2 A Simple Compiler
Bottom-up parsing Goal of parser : build a derivation
Syntax and Semantics Structure of programming languages.
LEX and YACC work as a team
9/18/20151 Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC Based in part on slides by Mattox.
10/7/20151 Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC Based in part on slides by Mattox.
LANGUAGE TRANSLATORS: WEEK 17 scom.hud.ac.uk/scomtlm/cis2380/ See Appel’s book chapter 3 for support reading Last Week: Top-down, Table driven parsers.
1 Compiler Construction Syntax Analysis Top-down parsing.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. 1 Chapter 4 Chapter 4 Bottom Up Parsing.
Syntax and Semantics Structure of programming languages.
Chapter 5: Bottom-Up Parsing (Shift-Reduce)
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
Programming Language Concepts (CIS 635) Elsa L Gunter 4303 GITC NJIT,
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Parsing and Code Generation Set 24. Parser Construction Most of the work involved in constructing a parser is carried out automatically by a program,
10/16/081 Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC Based in part on slides by Mattox.
2/1/20161 Programming Languages and Compilers (CS 421) Grigore Rosu 2110 SC, UIUC Slides by Elsa Gunter, based in.
GRAMMARS & PARSING. Parser Construction Most of the work involved in constructing a parser is carried out automatically by a program, referred to as a.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
3/13/20161 Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC Based in part on slides by Mattox.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
Syntax and Semantics Structure of programming languages.
Programming Languages and Compilers (CS 421)
Programming Languages and Compilers (CS 421)
Parsing Bottom Up CMPS 450 J. Moloney CMPS 450.
Programming Languages Translator
Unit-3 Bottom-Up-Parsing.
Table-driven parsing Parsing performed by a finite state machine.
Chapter 4 Syntax Analysis.
Fall Compiler Principles Lecture 4: Parsing part 3
Programming Languages and Compilers (CS 421)
Regular Grammar - Finite Automaton
Top-Down Parsing CS 671 January 29, 2008.
Programming Languages and Compilers (CS 421)
LR Parsing – The Tables Lecture 11 Wed, Feb 16, 2005.
Bison Marcin Zubrowski.
4d Bottom Up Parsing.
Parsing #2 Leonidas Fegaras.
Lecture (From slides by G. Necula & R. Bodik)
LALR Parsing Adapted from Notes by Profs Aiken and Necula (UCB) and
Programming Languages and Compilers (CS 421)
LR Parsing. Parser Generators.
Parsing #2 Leonidas Fegaras.
Programming Languages and Compilers (CS 421)
Lecture 18 Bottom-Up Parsing or Shift-Reduce Parsing
Programming Languages and Compilers (CS 421)
Chap. 3 BOTTOM-UP PARSING
Programming Languages and Compilers (CS 421)
Presentation transcript:

10/25/20151 Programming Languages and Compilers (CS 421) Grigore Rosu 2110 SC, UIUC Slides by Elsa Gunter, based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha

Objective Learn the basics of ocamlyacc parsing Learn the basics of its foundations/theory LR parsing 10/25/2015 2

3 Using Ocamlyacc Input attribute grammar is put in file.mly Execute ocamlyacc.mly Produces code for parser in.ml and interface (including type declaration for tokens) in.mli

10/25/ Parser Code.ml defines one parsing function per entry point Parsing function takes a lexing function (lexer buffer to token) and a lexer buffer as arguments Returns semantic attribute of corresponding entry point

10/25/ Ocamlyacc Input File format: %{ %} % %

10/25/ Ocamlyacc Contains arbitrary Ocaml code Typically used to give types and functions needed for the semantic actions of rules and to give specialized error recovery May be omitted similar. Possibly used to call parser

10/25/ Ocamlyacc %token symbol … symbol Declare given symbols as tokens %token symbol … symbol Declare given symbols as token constructors, taking an argument of type %start symbol … symbol Declare given symbols as entry points; functions of same names in.ml

10/25/ Ocamlyacc %type symbol … symbol Specify type of attributes for given symbols. Mandatory for start symbols %left symbol … symbol %right symbol … symbol %nonassoc symbol … symbol Associate precedence and associativity to given symbols. Same line,same precedence; earlier line, lower precedence (broadest scope)

10/25/ Ocamlyacc nonterminal : symbol... symbol { semantic_action } |... | symbol... symbol { semantic_action } ; Semantic actions are arbitrary Ocaml expressions Must be of same type as declared (or inferred) for nonterminal Access semantic attributes (values) of symbols by position: $1 for first symbol, $2 to second …

10/25/ Ocamlyacc Rules without semantic actions form a context free grammar Without associativity and precedence declarations, Ocamlyacc can only accept unambiguous grammars. With such declaration, associativity and precedence must be only source of ambiguity Your main job: Disambiguate input grammar

Example Given grammar: ::= * | / | + | - | | ( ) Disambiguate to: ::= | + | - ::= | * | / ::= | ( ) 10/25/

10/25/ Example - Base types (* File: expr.ml *) type expr = Term_as_Expr of term | Plus_Expr of (term * expr) | Minus_Expr of (term * expr) and term = Factor_as_Term of factor | Mult_Term of (factor * term) | Div_Term of (factor * term) and factor = Id_as_Factor of string | Parenthesized_Expr_as_Factor of expr

10/25/ Example - Lexer (exprlex.mll) { (*open Exprparse*) } let numeric = ['0' - '9'] let letter =['a' - 'z' 'A' - 'Z'] rule token = parse | "+" {Plus_token} | "-" {Minus_token} | "*" {Times_token} | "/" {Divide_token} | "(" {Left_parenthesis} | ")" {Right_parenthesis} | letter (letter|numeric|"_")* as id {Id_token id} | [' ' '\t' '\n'] {token lexbuf} | eof {EOL}

10/25/ Example - Parser (exprparse.mly) %{ open Expr %} %token Id_token %token Left_parenthesis Right_parenthesis %token Times_token Divide_token %token Plus_token Minus_token %token EOL %start main %type main %

10/25/ Example - Parser (exprparse.mly) (* ::= | + | - *) expr: term { Term_as_Expr $1 } | term Plus_token expr { Plus_Expr ($1, $3) } | term Minus_token expr { Minus_Expr ($1, $3) }

10/25/ Example - Parser (exprparse.mly) (* ::= | * | / *) term: factor { Factor_as_Term $1 } | factor Times_token term { Mult_Term ($1, $3) } | factor Divide_token term { Div_Term ($1, $3) }

10/25/ Example - Parser (exprparse.mly) (* ::= | ( ) *) factor: Id_token { Id_as_Factor $1 } | Left_parenthesis expr Right_parenthesis {Parenthesized_Expr_as_Factor $2 } main: | expr EOL { $1 }

10/25/ Example - Using Parser # #use "expr.ml";; … # #use "exprparse.ml";; … # #use "exprlex.ml";; … # let test s = let lexbuf = Lexing.from_string (s^"\n") in main token lexbuf;;

10/25/ Example - Using Parser # test "a + b";; - : expr = Plus_Expr (Factor_as_Term (Id_as_Factor "a"), Term_as_Expr (Factor_as_Term (Id_as_Factor "b")))

10/25/ LR Parsing Read tokens left to right (L) Create a rightmost derivation (R) How is this possible? Start at the bottom (left) and work your way up Last step has only one non-terminal to be replaced so is right-most Working backwards, replace mixed strings by non-terminals Always proceed so that there are no non- terminals to the right of the string to be replaced

10/25/ Example: = 0 | 1 | ( ) | + => = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => = ( ) + 0 shift => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => => ( ) + 0 reduce = ( ) + 0 shift => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => = + 0 shift => ( ) + 0 reduce = ( ) + 0 shift => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => = + 0 shift => ( ) + 0 reduce = ( ) + 0 shift => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => => + 0 reduce = + 0 shift => ( ) + 0 reduce = ( ) + 0 shift => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => + reduce => + 0 reduce = + 0 shift => ( ) + 0 reduce = ( ) + 0 shift => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example: = 0 | 1 | ( ) | + => + reduce => + 0 reduce = + 0 shift => ( ) + 0 reduce = ( ) + 0 shift => ( + ) + 0 reduce => ( + 1 ) + 0 reduce = ( + 1 ) + 0 shift => ( ) + 0 reduce = ( ) + 0 shift

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ Example ( ) + 0

10/25/ LR Parsing Tables Build a pair of tables, Action and Goto, from the grammar This is the hardest part, we omit here Rows labeled by states For Action, columns labeled by terminals and “end-of-tokens” marker (more generally strings of terminals of fixed length) For Goto, columns labeled by non- terminals

10/25/ Action and Goto Tables Given a state and the next input, Action table says either shift and go to state n, or reduce by production k (explained in a bit) accept or error Given a state and a non-terminal, Goto table says go to state m

10/25/ LR(i) Parsing Algorithm Based on push-down automata Uses states and transitions (as recorded in Action and Goto tables) Uses a stack containing states, terminals and non-terminals

10/25/ LR(i) Parsing Algorithm 0. Insure token stream ends in special “end- of-tokens” symbol 1. Start in state 1 with an empty stack 2. Push state(1) onto stack 3. Look at next i tokens from token stream (toks) (don’t remove yet) 4. If top symbol on stack is state(n), look up action in Action table at (n, toks)

10/25/ LR(i) Parsing Algorithm 5. If action = shift m, a) Remove the top token from token stream and push it onto the stack b) Push state(m) onto stack c) Go to step 3

10/25/ LR(i) Parsing Algorithm 6. If action = reduce k where production k is E ::= u a) Remove 2 * length(u) symbols from stack (u and all the interleaved states) b) If new top symbol on stack is state(m), look up new state p in Goto(m,E) c) Push E onto the stack, then push state(p) onto the stack d) Go to step 3

10/25/ LR(i) Parsing Algorithm 7. If action = accept Stop parsing, return success 8. If action = error, Stop parsing, return failure

10/25/ Adding Synthesized Attributes Add to each reduce a rule for calculating the new synthesized attribute from the component attributes Add to each non-terminal pushed onto the stack, the attribute calculated for it When performing a reduce, gather the recorded attributes from each non- terminal popped from stack Compute new attribute for non-terminal pushed onto stack

10/25/ Shift-Reduce Conflicts Problem: can’t decide whether the action for a state and input character should be shift or reduce Caused by ambiguity in grammar Usually caused by lack of associativity or precedence information in grammar

10/25/ Example: = 0 | 1 | ( ) | shift -> reduce -> shift -> reduce -> + + 0

10/25/ Example - cont Problem: shift or reduce? You can shift-shift-reduce-reduce or reduce-shift-shift-reduce Shift first - right associative Reduce first- left associative

10/25/ Reduce - Reduce Conflicts Problem: can’t decide between two different rules to reduce by Again caused by ambiguity in grammar Symptom: RHS of one production suffix of another Requires examining grammar and rewriting it Harder to solve than shift-reduce errors

10/25/ Example S ::= A | aB A ::= abc B ::= bc abc shift abc Problem: reduce by B ::= bc then by S ::= aB, or by A::= abc then S::A?