1 CIS 461 Compiler Design & Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #12 Parsing 4.

Slides:



Advertisements
Similar presentations
Parsing V: Bottom-up Parsing
Advertisements

A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Pertemuan 12, 13, 14 Bottom-Up Parsing
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #7 Parsing.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #10 Parsing.
Parsing VI The LR(1) Table Construction. LR(k) items The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1)
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.
CS 330 Programming Languages 09 / 23 / 2008 Instructor: Michael Eckmann.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.
– 1 – CSCE 531 Spring 2006 Lecture 8 Bottom Up Parsing Topics Overview Bottom-Up Parsing Handles Shift-reduce parsing Operator precedence parsing Readings:
Shift/Reduce and LR(1) Professor Yihjia Tsai Tamkang University.
Bottom-up parsing Goal of parser : build a derivation
Lexical and syntax analysis
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing IV Bottom-up Parsing. Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves.
LR(1) Parsers Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.
Syntax and Semantics Structure of programming languages.
Parsing. Goals of Parsing Check the input for syntactic accuracy Return appropriate error messages Recover if possible Produce, or at least traverse,
Joey Paquet, 2000, 2002, 2012, Lecture 6 Bottom-Up Parsing.
410/510 1 of 21 Week 2 – Lecture 1 Bottom Up (Shift reduce, LR parsing) SLR, LR(0) parsing SLR parsing table Compiler Construction.
1 Compiler Construction Syntax Analysis Top-down parsing.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Syntax and Semantics Structure of programming languages.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Bottom-up Parsing, Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Parsing V LR(1) Parsers C OMP 412 Rice University Houston, Texas Fall 2001 Copyright 2000, Keith D. Cooper, Ken Kennedy, & Linda Torczon, all rights reserved.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 CIS 461 Compiler Design and Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #8 Parsing Techniques.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #6 Parsing.
Parsing V LR(1) Parsers N.B.: This lecture uses a left-recursive version of the SheepNoise grammar. The book uses a right-recursive version. The derivations.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
Parsing V LR(1) Parsers. LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited right context (1 token) for handle recognition.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Lecture 5: LR Parsing CS 540 George Mason University.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
1 Chapter 6 Bottom-Up Parsing. 2 Bottom-up Parsing A bottom-up parsing corresponds to the construction of a parse tree for an input tokens beginning at.
Conflicts in Simple LR parsers A SLR Parser does not use any lookahead The SLR parsing method fails if knowing the stack’s top state and next input token.
Syntax and Semantics Structure of programming languages.
Parsing — Part II (Top-down parsing, left-recursion removal)
Programming Languages Translator
Bottom-up parsing Goal of parser : build a derivation
Compiler design Bottom-up parsing Concepts
Unit-3 Bottom-Up-Parsing.
UNIT - 3 SYNTAX ANALYSIS - II
Parsing IV Bottom-up Parsing
Table-driven parsing Parsing performed by a finite state machine.
Subject Name:COMPILER DESIGN Subject Code:10CS63
Lexical and Syntax Analysis
Top-Down Parsing CS 671 January 29, 2008.
Lecture 8 Bottom Up Parsing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Bottom Up Parsing.
Parsing IV Bottom-up Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
Chap. 3 BOTTOM-UP PARSING
Presentation transcript:

1 CIS 461 Compiler Design & Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #12 Parsing 4

2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree from the start symbol and grow toward leaves (similar to a derivation) Pick a production and try to match the input Bad “pick”  may need to backtrack Some grammars are backtrack-free (predictive parsing) Bottom-up parsers (LR(1), operator precedence) Start at the leaves and grow toward root We can think of the process as reducing the input string to the start symbol At each reduction step a particular substring matching the right-side of a production is replaced by the symbol on the left-side of the production Bottom-up parsers handle a large class of grammars

3 S lookahead input string S A B C ? ? lookahead start symbol fringe of the parse tree A ? lookahead upper fringe of the parse tree left-to-right scan Bottom-up Parsing left-most derivation Top-down Parsing D D C S right-most derivation in reverse

4 Handle-pruning, Bottom-up Parsers The process of discovering a handle & reducing it to the appropriate left-hand side is called handle pruning Handle pruning forms the basis for a bottom-up parsing method To construct a rightmost derivation S   0   1   2  …   n-1   n  w Apply the following simple algorithm for i  n to 1 by -1 Find the handle in  i Replace  i with  i to generate  i-1

5 Example The expression grammar Handles for rightmost derivation of input string: x – 2 * y Sentential FormHandle Prod’n, Pos’n S— Expr1,1 Expr – Term3,3 Expr – Term * Factor5,5 Expr – Term * 9,5 Expr – Factor * 7,3 Expr – * 8,3 Term – * 4,1 Factor – * 7,1 – * 9,1 1 S  Expr 2 Expr  Expr + Term 3 | Expr – Term 4 | Term 5 Term  Term * Factor 6 | Term / Factor 7 | Factor 8 Factor  num 9 | id

6 Handle-pruning, Bottom-up Parsers One implementation technique is the shift-reduce parser push $ lookahead = get_ next_token( ) repeat until (top of stack == start symbol and lookahead == $) if the top of the stack is a handle  then /* reduce  to  */ pop |  | symbols off the stack push  onto the stack else if (lookahead  $) then /* shift */ push lookahead lookahead = get_next_token( ) How do errors show up? failure to find a handle hitting $ and needing to shift (final else clause) Either generates an error

7 Example, Corresponding Parse Tree S TermFact.Expr – Fact. Term * 1. Shift until top-of-stack is the right end of a handle 2. Pop the left end of the handle & reduce 5 shifts + 9 reduces + 1 accept

8 Shift-reduce Parsing Shift reduce parsers are easily built and easily understood A shift-reduce parser has just four actions Shift — next word is shifted onto the stack Reduce — right end of handle is at top of stack Locate left end of handle within the stack Pop handle off stack & push appropriate lhs Accept — stop parsing & report success Error — call an error reporting/recovery routine Accept & Error are simple Shift is just a push and a call to the scanner Reduce takes |rhs| pops & 1 push If handle-finding requires state, put it in the stack Handle finding is key handle is on stack finite set of handles  use a DFA !

9 LR Parsers LR(k) parsers are table-driven, bottom-up, shift-reduce parsers that use a limited right context (k-token lookahead) for handle recognition LR(k): Left-to-right scan of the input, Rightmost derivation in reverse with k token lookahead A grammar is LR(k) if, given a rightmost derivation S   0   1   2  …   n-1   n  sentence We can 1. isolate the handle of each right-sentential form  i, and 2. determine the production by which to reduce, by scanning  i from left-to-right, going at most k symbols beyond the right end of the handle of  i

10 LR Parsers A table-driven LR parser looks like Scanner Table-driven Parser A CTION & G OTO Tables Parser Generator source code grammar IR Stack

11 LR Shift-Reduce Parsers push($); // $ is the end-of-file symbol push(s 0 ); // s 0 is the start state of the DFA that recognizes handles lookahead = get_next_token(); repeat forever s = top_of_stack(); if ( ACTION[s,lookahead] == reduce  ) then pop 2*|  | symbols; s = top_of_stack(); push(  ); push(GOTO[s,  ]); else if ( ACTION[s,lookahead] == shift s i ) then push(lookahead); push(s i ); lookahead = get_next_token(); else if ( ACTION[s,lookahead] == accept and lookahead == $ ) then return success; else error(); The skeleton parser uses A CTION & G OTO does |words| shifts does |derivation| reductions does 1 accept

12 To make a parser for L(G), we need a set of tables The grammar The tables LR Parsers (parse tables) 1 S  Z 2 Z  Z z 3 | z ACTION State$z 0— shift 2 1accept shift 3 2reduce 3 reduce 3 3reduce 2 reduce 2 GOTO StateZ

13 Example Parses The string “z” The string “zz” Stack InputAction $ s 0 z $shift 2 $ s 0 z s 2 $reduce 3 $ s 0 Z s 1 $accept Stack InputAction $ s 0 z z $shift 2 $ s 0 z s 2 z $reduce 3 $ s 0 Z s 1 z $shift 3 $ s 0 Z s 1 z s 3 $reduce 2 $ s 0 Zs 1 $accept

14 LR Parsers How does this LR stuff work? Unambiguous grammar  unique rightmost derivation Keep upper fringe on a stack –All active handles include TOS –Shift inputs until TOS is right end of a handle Language of handles is regular –Build a handle-recognizing DFA –A CTION & G OTO tables encode the DFA To match subterms, recurse and leave DFA ’s state on stack Final states of the DFA correspond to reduce actions –New state is G OTO[lhs, state at TOS] –For Z, this takes the DFA to S 1 S0S0 S3S3 S2S2 S1S1 z z Z Control DFA for the simple example Reduce action

15 Building LR Parsers How do we generate the A CTION and G OTO tables? Use the grammar to build a model of the handle recognizing DFA Use the DFA model to build A CTION & G OTO tables If construction succeeds, the grammar is LR How do we build the handle-recognizing DFA ? Encode the set of productions that can be used as handles in the DFA state: Use LR(k) items Use two functions goto( s,  ) and closure( s ) –goto() is analogous to move() in the DFA to NFA conversion –closure() is analogous to  -closure Build up the states and transition functions of the DFA Use this information to fill in the A CTION and G OTO tables

16 LR(k) items An LR(k) item is a pair [A, B], where A is a production  with a at some position in the rhs B is a lookahead string of length ≤ k (terminal symbols or $) Examples: [  , a], [  , a], [  , a], & [ , a] The in an item indicates the position of the top of the stack LR(0) items [   ] (no lookahead symbol) LR(1) items [  , a ] (one token lookahead) LR(2) items [  , a b ] (two token lookahead)...

17 LR(k) items The in an item indicates the position of the top of the stack [  , a] means that the input seen so far is consistent with the use of  immediately after the symbol on top of the stack [  , a] means that the input seen so far is consistent with the use of  at this point in the parse, and that the parser has already recognized . [ , a] means that the parser has seen , and that a lookahead a is consistent with reducing to  (for LR(k) parsers a is a string of terminal symbols of length k) The table construction algorithm uses items to represent valid configurations of an LR(1) parser

18 LR(1) Items The production   , with lookahead a, generates 4 items [   , a], [  , a], [    , a], & [   , a] The set of LR(1) items for a grammar is finite What’s the point of all these lookahead symbols? Carry them along to choose correct reduction Lookaheads are bookkeeping, unless item has at right end –Has no direct use in [    , a] –In [   , a], a lookahead of a implies a reduction by    –For { [   , a],[    , b] } lookahead = a  reduce to  ; lookahead  F IRST (  )  shift  Limited right context is enough to pick the actions

19 Back to Finding Handles Parser in a state where the stack (the fringe) was Expr – Term With lookahead of * How did it choose to expand Term rather than reduce to Expr? Lookahead symbol is the key With lookahead of + or –, parser should reduce to Expr With lookahead of * or /, parser should shift Parser uses lookahead to decide All this context from the grammar is encoded in the handle- recognizing mechanism

20 Back to x - 2 * y 1. Shift until TOS is the right end of a handle 2. Find the left end of the handle & reduce shift here reduce here

21 High-level overview  Build the handle-recognizing DFA (aka Canonical Collection of sets of LR(1) items), C = { I 0, I 1,..., I n } aIntroduce a new start symbol S’ which has only one production S’  S bInitial state, I 0 should include [S’  S, $], along with any equivalent items Derive equivalent items as closure( I 0 ) cRepeatedly compute, for each I k, and each grammar symbol , goto(I k,  ) If the set is not already in the collection, add it Record all the transitions created by goto( ) This eventually reaches a fixed point 2 Fill in the ACTION and GOTO tables using the DFA The canonical collection completely encodes the transition diagram for the handle-finding DFA LR(1) Table Construction

22 Computing Closures closure(I) adds all the items implied by items already in I Any item [   , a] implies [    , x] for each production with  on the lhs, and x  F IRST (  a) Since  is valid, any way to derive  is valid, too The algorithm Closure( I ) while ( I is still changing ) for each item [    , a]  I for each production     P for each terminal b  FIRST(  a) if [   , b]  I then add [   , b] to I Fixpoint computation

23 Example Grammar Initial step builds the item [S  A,$] and takes its closure( ) Closure( [S  A, $] ) So, initial state s 0 is { [S  Z,$], [Z  Z z, $],[Z  z, $], [Z  Z z, z], [Z  z, z] } 1 S  Z 2 Z  Z z 3 | z ItemFrom [S  Z, $]Original item [Z  Z z, $]1,  a is $ [Z  z, $]1,  a is $ [Z  Z z, z]2,  a is z $ [Z  z, z]2,  a is z $

24 Computing Gotos goto(I, x) computes the state that the parser would reach if it recognized an x while in state I goto( { [     , a] },  ) produces [     , a] It also includes closure( [     , a] ) to fill out the state The algorithm Goto( I, x ) new = Ø for each [    x , a]  I new = new  [    x , a] return closure(new) Not a fixpoint method Uses closure

25 Example Grammar s 0 is { [S  Z,$], [Z  Z z, $],[Z  z, $], [Z  Z z, z], [Z  z, z] } goto( S 0, z ) Loop produces Closure adds nothing since is at end of rhs in each item In the construction, this produces s 2 { [Z  z, {$, z}]} New, but obvious, notation for two distinct items [Z  z, $ ] and [Z  z, z] ItemFrom [Z  z, $]Item 3 in s 0 [Z  z, z]Item 5 in s 0