CSE P501 – Compilers LR Parsing Build Bottom-Up Parse Tree Handles

Writing a Shift-Reduce Parser; ACTION & GOTO Parse Tables; Dotted Items; SR & RR Conflicts; Next
Spring 2014 Jim Hogg - UW - CSE - P501

LR Parsing
[Figure: compiler pipeline. Source -> Front End (Scan: chars to tokens; Parse: tokens to AST; Convert: AST to IR) -> 'Middle End' (Optimize: IR to IR) -> Back End (Select Instructions, Allocate Registers, Emit: IR to Machine Code) -> Target]
AST = Abstract Syntax Tree; IR = Intermediate Representation

Build Bottom-Up Parse Tree
Input: • a b b c d e
Grammar: S -> aABe; A -> Abc | b; B -> d
The black dot (•) marks how much of the input token stream we have read so far (none). Shift the dot to the right.

abbcde – done wrong – step 2
Input: a • b b c d e
Can we reduce a? No. Shift the dot to the right.

Step 3. Input: a b • b c d e
Can we reduce ab or b? Yes, using A -> b. Note: a b (in red) marks the current frontier.

Step 4. Input: a b • b c d e, with an A built over the first b
Note: the frontier is now aA. Can we reduce aA or A? No. Shift the dot.

Step 5. Input: a b b • c d e
Can we reduce aAb or Ab or b? Yes, using A -> b.

Step 6. Input: a b b • c d e, with a second A built over the second b
The frontier is now aAA. Can we reduce aAA or AA or A? No. Shift the dot.

Step 7. Input: a b b c • d e
Can we reduce aAAc or AAc or Ac or c? No. Shift the dot.

Step 8. Input: a b b c d • e
Can we reduce aAAcd or AAcd or Acd or cd or d? Only d matches a RHS (B -> d), and no production contains AA, so that reduction cannot rescue the parse. Shift the dot.

Step 9. Input: a b b c d e •
Can we reduce aAAcde or AAcde or Acde or cde or de or e? No. We are stuck!

Rewind
Rewind to step 5. Let's not reduce using A -> b. Shift instead.
Try again!

abbcde – done right – step 5
Input: a b b • c d e (frontier aAb)
Can we reduce aAb or Ab or b? Yes, we could, using A -> b. But it led to a dead end the first time. So do not reduce. Instead, shift.

Step 6. Input: a b b c • d e (frontier aAbc)
Can we reduce aAbc or Abc or bc or c? Yes, using A -> Abc.

Step 7. Input: a b b c • d e (frontier aA)
Can we reduce aA or A? No. Shift.

Step 8. Input: a b b c d • e (frontier aAd)
Can we reduce aAd or Ad or d? Yes, using B -> d.

Step 9. Input: a b b c d • e (frontier aAB)
Can we reduce aAB or AB or B? No. Shift.

Step 10. Input: a b b c d e • (frontier aABe)
Can we reduce aABe or ABe or Be or e? Yes, using S -> aABe.

Done. Input: a b b c d e • (frontier S)
We just executed a rightmost derivation, backwards: S => aABe => aAde => aAbcde => abbcde
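The rewind-and-retry procedure walked through above can be sketched as a naive backtracking search. This is only an illustration, not the method the course goes on to build: the function name `backtrack_parse` and the string encoding of the frontier are assumptions made for this sketch, and upper-case letters stand for non-terminals as in the slides.

```python
# Grammar from the slides, as (LHS, RHS) pairs.
GRAMMAR = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]


def backtrack_parse(frontier, rest, actions=()):
    """Return the list of actions that reduces the input to S, or None."""
    if frontier == "S" and not rest:
        return list(actions)
    # Option 1: reduce by any production whose RHS ends the frontier.
    for lhs, rhs in GRAMMAR:
        if frontier.endswith(rhs):
            reduced = frontier[: -len(rhs)] + lhs
            step = f"reduce {lhs} -> {rhs}"
            found = backtrack_parse(reduced, rest, actions + (step,))
            if found is not None:
                return found  # this RHS really was a handle
    # Option 2: every reduction (if any) hit a dead end, so shift instead.
    if rest:
        step = f"shift {rest[0]}"
        return backtrack_parse(frontier + rest[0], rest[1:], actions + (step,))
    return None  # stuck: nothing to reduce and nothing left to shift


for action in backtrack_parse("", "abbcde"):
    print(action)
```

On abbcde the search first tries the second A -> b reduction, fails, rewinds, and then shifts, which mirrors the "done wrong" and "done right" walks.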

LR(1) Parsing
- Left to right scan; Rightmost derivation; 1 token lookahead
- "Bottom-Up" approach. Also called a Shift-Reduce parser
- The syntax of almost all practical programming languages can be specified by an LR(1) grammar
- LALR(1) and SLR are subsets of LR(1)
- LALR(1) can parse most real languages, has a smaller memory footprint, and is used by CUP
- All variants (SLR, LALR, LR) use the same algorithm, but different driver tables

LR Parsing in Greek
Let's take a step back and generalize what this little example has taught us about LR parsing:
- A bottom-up parser builds a rightmost derivation, backwards
- Given the rightmost derivation S => β_1 => β_2 => … => β_{n-2} => β_{n-1} => β_n = w, the parser will first discover β_{n-1} => β_n, then β_{n-2} => β_{n-1}, etc.
- But it discovers β_{n-1} => β_n before seeing all of β_n:
  S => aABe => aAde => aAbcde => abbcde (each step expands the rightmost non-terminal)
  S <= [aABe]• <= aA[d]•e <= a[Abc]•de <= a[b]•bcde
  [ ] denotes the handle; • denotes top-of-stack = end of the input seen so far
- Parsing terminates when β_1 is reduced to S (start symbol, success), or no match can be found (syntax error)

Terminology: Sentential Forms
Before diving in, here is some jargon we should know!
- If S =>* α, the string α is called a sentential form of the grammar (not yet a sentence, but on its way)
- In the derivation S => β_1 => β_2 => … => β_{n-2} => β_{n-1} => β_n = w, each of the β_i is a sentential form
- A sentential form in a rightmost derivation is called a right sentential form

Handles
- Informally, a handle is a substring of the frontier that matches the RHS of the "correct" production
- Even if A -> β is a production, β is a handle only if it matches the frontier at a point where A -> β was used in the derivation. So, it's a handle if we should reduce by it (yes, this definition is circular)
- β may appear in many other places in the frontier without being a handle for that particular production

Handle – the Dragon Definition
Formally, a handle of a right-sentential form γ is a production A -> β and a position in γ where β may be replaced by A to produce the previous right-sentential form in the rightmost derivation of γ

Handle Examples
In the derivation:
S => aABe => aAde => aAbcde => abbcde
- abbcde is a right sentential form whose handle is A -> b at position 2
- aAbcde is a right sentential form whose handle is A -> Abc at position 4
Note: some books take the left of the match as the position, but it really doesn't matter
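The handle examples above can be checked mechanically. The sketch below is an illustration under stated assumptions: the helper name `handle` is made up, sentential forms are plain strings, upper-case letters are non-terminals, and the position is the right end of the match counted from 1, as on the slide.

```python
def handle(previous_form, form):
    """Return (LHS, RHS, position) of the handle of `form`.

    `previous_form` must derive `form` in one rightmost step, so the symbol
    that was expanded is the rightmost non-terminal (upper-case letter).
    """
    i = max(k for k, symbol in enumerate(previous_form) if symbol.isupper())
    tail = previous_form[i + 1:]             # terminals to the right, unchanged
    rhs = form[i:len(form) - len(tail)]      # what the non-terminal became
    return previous_form[i], rhs, i + len(rhs)  # right end of match, 1-based


derivation = ["S", "aABe", "aAde", "aAbcde", "abbcde"]
for previous_form, form in zip(derivation, derivation[1:]):
    lhs, rhs, position = handle(previous_form, form)
    print(f"{form}: handle {lhs} -> {rhs} at position {position}")
```

The function needs the previous right sentential form as input, which is exactly the information a real parser does not have; that gap is what the rest of the lecture fills.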

Writing a Shift-Reduce Parser
Key data structures:
- A stack holding the frontier of the tree
- The token stream of remaining input

Shift-Reduce Parser Operations
- Reduce: if the top of the stack is a handle (the RHS β of some A -> β that we should use to reduce), pop β, push A
- Shift: push the next input symbol onto the stack
- Accept: announce success
- Error: syntax error discovered

Shift-Reduce Example
Grammar: S -> aABe; A -> Abc | b; B -> d
Note: "$" marks bottom-of-stack; "$" also marks end-of-input. Neither one takes part in the parse. They don't move.
At each step, look for a handle at top-of-stack. If we find a handle, then reduce. If we don't find a handle, then shift.
We are relying on clairvoyance (foretelling the future) to decide which RHSs at top-of-stack are handles.

Step  Stack   Input     Action
0     $       abbcde$   shift
1     $a      bbcde$    shift
2     $ab     bcde$     reduce
3     $aA     bcde$     shift
4     $aAb    cde$      shift
5     $aAbc   de$       reduce
6     $aA     de$       shift
7     $aAd    e$        reduce
8     $aAB    e$        shift
9     $aABe   $         reduce
10    $S      $         accept

How Do We Automate This?
- Lacking a clairvoyance function, we could resort to back-tracking. But it's too slow. It's a non-starter
- Viable prefix: a prefix of a right-sentential form that can appear on the stack of the shift-reduce parser (on its way to a successful parse)
- Idea: construct a DFA to recognize viable prefixes, given the stack and (one or two tokens from) the remaining input
- Perform reductions when we recognize handles

DFA for prefixes for: S' -> S$; S -> aABe; A -> Abc | b; B -> d
[Figure: DFA with states 1 to 9; state 1 is the start state. Transitions: 1 -a-> 2; 2 -b-> 4; 2 -A-> 3; 3 -b-> 6; 6 -c-> 7; 3 -d-> 5; 3 -B-> 8; 8 -e-> 9; accept on $. Reduce states: 4 (A -> b), 5 (B -> d), 7 (A -> Abc), 9 (S -> aABe).]
- We have augmented the grammar with a unique start symbol, S'
- This DFA replaces our clairvoyance function (equally magical at this point!)
- States 4, 5, 7, 9 of this DFA define the handles
- E.g.: if the stack is …aAbc then reduce using A -> Abc (always at top of stack); then unwind, back to state 1

Trace – Steps 1 to 12
Grammar: S' -> S$; S -> aABe; A -> Abc | b; B -> d
Trace through the DFA, beginning at the start state each time.

Step 1: Stack = $, Input = abbcde$. Stack = { }; DFA = state 1; not a "reduce" state {4,5,7,9}, so shift.
Step 2: Stack = $a, Input = bbcde$. DFA = 2; not a "reduce" state, so shift.
Step 3: Stack = $ab, Input = bcde$. DFA = 4 => reduce, using production A -> b. So, pop b (RHS of production) and push A (LHS of production). Retrace to DFA state 1.
Step 4: Stack = $aA, Input = bcde$. So transition states 1 -> 2 -> 3. DFA = 3 => shift.
Step 5: Stack = $aAb, Input = cde$. DFA = 6 => shift.
Step 6: Stack = $aAbc, Input = de$. DFA = 7 => reduce by A -> Abc. So, pop Abc, push A. Retreat to state 1.
Step 7: Stack = $aA, Input = de$. DFA = 3 => shift.
Step 8: Stack = $aAd, Input = e$. DFA = 5 => reduce by B -> d. So, pop d, push B. Retreat to state 1.
Step 9: Stack = $aAB, Input = e$. So transition states 1 -> 2 -> 3 -> 8. DFA = 8 => shift.
Step 10: Stack = $aABe, Input = $. DFA = 9 => reduce by S -> aABe. So, pop aABe, push S. Retreat to state 1.
Step 11: Stack = $S, Input = $. DFA = 1 => shift.
Step 12: Stack = $S$ => accept.
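The trace above, which restarts the DFA from state 1 at every step, can be sketched as follows. This is a minimal illustration under assumptions: the names `DELTA`, `REDUCE`, `rescan` and `rescan_parse` are invented here, the transitions are read off the slide's DFA, acceptance is simplified to "stack is S and no input remains", and the counter adds the stack length at every step to show how much work the rescanning repeats.

```python
# Transitions of the viable-prefix DFA (state 1 is the start state).
DELTA = {
    (1, "a"): 2, (2, "b"): 4, (2, "A"): 3,
    (3, "b"): 6, (3, "d"): 5, (3, "B"): 8,
    (6, "c"): 7, (8, "e"): 9,
}
# "Reduce" states and the production each one recognizes: state -> (LHS, RHS).
REDUCE = {4: ("A", "b"), 5: ("B", "d"), 7: ("A", "Abc"), 9: ("S", "aABe")}


def rescan(stack):
    """Run the DFA over the whole stack, starting from state 1 every time."""
    state = 1
    for symbol in stack:
        state = DELTA.get((state, symbol))
        if state is None:
            return None  # the stack is not a viable prefix
    return state


def rescan_parse(text):
    """Return (accepted, total number of stack symbols rescanned)."""
    stack, rest, rescanned = "", text, 0
    while True:
        if stack == "S":
            return rest == "", rescanned
        state = rescan(stack)
        rescanned += len(stack)
        if state is None:
            return False, rescanned
        if state in REDUCE:
            lhs, rhs = REDUCE[state]
            stack = stack[: -len(rhs)] + lhs   # pop the handle, push its LHS
        elif rest:
            stack, rest = stack + rest[0], rest[1:]   # shift
        else:
            return False, rescanned


print(rescan_parse("abbcde"))
```

Most of each rescan walks over the same unchanged stack prefix, which is why this approach does not run in time linear in the number of tokens.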

Cast out the Magic
- We started with a magical clairvoyance function
- We replaced this with an equally magical DFA
- The DFA approach included too much repetition: retreat to DFA = 1, then rescan the stack to find the new DFA state
- We only replaced the handle with its non-terminal LHS, so the first part of the stack is unchanged
- We want the parser to run in linear time: proportional to the total number of tokens
- How do we avoid repetition?
- How do we construct the magic DFA, for any grammar?

Avoiding DFA Rescanning
- Observe: after a reduce, the contents of the stack are little altered: we replaced the handle at top-of-stack with its LHS non-terminal
- So, re-scanning the stack will step through the same DFA transitions as before, until the last one
- So, record the trace of DFA state numbers on the stack to avoid the rescan

LR Stack
- DFA pictures are nice, but we want a program to do it
- Could change the stack to hold <state, token> pairs. Perhaps easier to understand and/or debug? $ <s0,X0> <s1,X1> … <sn,Xn>
- But all we need are the states (think about it!)
- On a reduce, pop the top states (the reduce rule tells us how many), then push the GOTO state for the corresponding LHS non-terminal

Reminder - DFA for: S -> aABe; A -> Abc; A -> b; B -> d
[Figure: the same DFA as before, states 1 to 9, with a legend distinguishing shift states from reduce states.]
Next time: fix start state info (S' ::= S$)

ACTION & GOTO Parse Tables
Rules: 1. S -> aABe   2. A -> Abc   3. A -> b   4. B -> d

      | ACTION                        | GOTO
State | a    b    c    d    e    $    | A    B    S
1     | s2                       acc  |           g1
2     |      s4                       | g3
3     |      s6        s5             |      g8
4     | r3 (any terminal)             |
5     | r4 (any terminal)             |
6     |           s7                  |
7     | r2 (any terminal)             |
8     |                     s9        |
9     | r1 (any terminal)             |

Key:
sN = shift; transition to state number N
rR = reduce using rule R
gN = goto state number N
acc = accept
blank = syntax error in program

DFA Transition Tables: Summary
ACTION => what to do after a shift
- Row = current state
- Column = next token (terminal)
- sN = shift; move to DFA state number N
- rR = reduce, using rule number R
- acc = accept the input program
- blank = syntax error in input program: report, recover, continue (for P501, just report and stop!)
GOTO => what to do after a reduce
- Row = current state (top-of-stack after popping the handle's states): think of this as the uncovered state
- Column = LHS of reduction (non-terminal)
- gN = goto DFA state number N
- blank = bug in the GOTO table

LR Parsing Algorithm
    push 1                                    // start state
    terminal = getToken()
    while (true)
        s = top-of-stack                      // current DFA state number
        if (ACTION[s, terminal] == si)        // shift, and transition to state i
            push i                            // new state
            terminal = getToken()             // advance to the next token
        elseif (ACTION[s, terminal] == rj)    // reduce, using rule number j
            pop (length of RHS of Rj) times   // |β| states, where Rj is A -> β
            uncovered = top-of-stack
            push GOTO[uncovered, A]           // A is the LHS of Rj
        elseif (ACTION[s, terminal] == accept)
            return
        else
            report syntax error; recover
        endif
    endwhile
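The algorithm above can be sketched as a small table-driven parser. Several things here are assumptions rather than part of the slides: the names `RULES`, `ACTION`, `GOTO` and `lr_parse`; the choice to let reduce states 4, 5, 7 and 9 reduce on any next token; and the extra stack-depth check on accept. That check is needed because the slide's table reuses state 1 for the goto on S, so without it the empty input would also be accepted.

```python
# Rule number -> (LHS, length of RHS), numbered as in the table key.
RULES = {1: ("S", 4), 2: ("A", 3), 3: ("A", 1), 4: ("B", 1)}

ACTION = {
    (1, "a"): ("shift", 2), (1, "$"): ("accept", None),
    (2, "b"): ("shift", 4),
    (3, "b"): ("shift", 6), (3, "d"): ("shift", 5),
    (6, "c"): ("shift", 7),
    (8, "e"): ("shift", 9),
}
# Assumption: the reduce states reduce whatever the next token is.
for state, rule in {4: 3, 5: 4, 7: 2, 9: 1}.items():
    for terminal in "abcde$":
        ACTION[(state, terminal)] = ("reduce", rule)

GOTO = {(1, "S"): 1, (2, "A"): 3, (3, "B"): 8}


def lr_parse(text):
    """Return the list of actions taken if `text` is accepted, else None."""
    tokens = iter(text + "$")
    terminal = next(tokens)
    stack = [1]                        # DFA state numbers only
    actions = []
    while True:
        kind, arg = ACTION.get((stack[-1], terminal), ("error", None))
        if kind == "shift":
            stack.append(arg)
            terminal = next(tokens)    # advance to the next token
            actions.append(f"s{arg}")
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]        # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])   # stack[-1] is the uncovered state
            actions.append(f"r{arg}")
        elif kind == "accept" and len(stack) == 2:   # exactly one S was built
            actions.append("acc")
            return actions
        else:
            return None                # blank table entry: syntax error


print(lr_parse("abbcde"))
```

Each token is shifted once and each reduce only touches the top of the stack, so no rescan from state 1 is needed.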

Parse Trace – Steps 0 to 14
Grammar: 1. S -> aABe   2. A -> Abc   3. A -> b   4. B -> d
The stack holds DFA state numbers only. Each reduce is shown as two steps: first pop the states of the RHS, then push the GOTO state for the LHS.

Step  Stack        Input     Action
0     $1           abbcde$   s2
1     $1 2         bbcde$    s4
2     $1 2 4       bcde$     r3: pop 1 state
3     $1 2         bcde$     g3 (GOTO[2, A])
4     $1 2 3       bcde$     s6
5     $1 2 3 6     cde$      s7
6     $1 2 3 6 7   de$       r2: pop 3 states
7     $1 2         de$       g3 (GOTO[2, A])
8     $1 2 3       de$       s5
9     $1 2 3 5     e$        r4: pop 1 state
10    $1 2 3       e$        g8 (GOTO[3, B])
11    $1 2 3 8     e$        s9
12    $1 2 3 8 9   $         r1: pop 4 states
13    $1           $         g1 (GOTO[1, S])
14    $1 1         $         acc

How to build ACTION & GOTO
Idea is that each state encodes:
- The set of all possible productions that we could be looking at, given the current state of the parse, and
- Where we are in the right hand side of each of those productions

Dotted Items
An item is a production with a dot in the right hand side.
Example: items for production A -> XY
    A -> •XY
    A -> X•Y
    A -> XY•
Idea: the dot represents a position in the production
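Enumerating the items of a production is mechanical: there is one item per dot position. A minimal sketch follows; the helper name `dotted_items` is an assumption, single characters stand for grammar symbols, and the dot is written as "." in the output.

```python
def dotted_items(lhs, rhs):
    """List every item of the production lhs -> rhs, one per dot position."""
    return [f"{lhs} -> {rhs[:dot]}.{rhs[dot:]}" for dot in range(len(rhs) + 1)]


print(dotted_items("A", "XY"))   # ['A -> .XY', 'A -> X.Y', 'A -> XY.']
```

A production with n symbols on its right hand side therefore has n + 1 items.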

Items for: S -> aABe; A -> Abc; A -> b; B -> d
[Figure: the DFA with the items belonging to each state.]
State 1: S -> •aABe
State 2: S -> a•ABe; A -> •Abc; A -> •b
State 3: S -> aA•Be; A -> A•bc; B -> •d
State 4: A -> b•
State 5: B -> d•
State 6: A -> Ab•c
State 7: A -> Abc•
State 8: S -> aAB•e
State 9: S -> aABe•
Transitions: 1 -a-> 2; 2 -A-> 3; 2 -b-> 4; 3 -b-> 6; 3 -d-> 5; 3 -B-> 8; 6 -c-> 7; 8 -e-> 9; accept on $

SR & RR Conflicts
Grammars can encounter two problems when constructing an LR parser:
- Shift-reduce conflict
- Reduce-reduce conflict

Shift-Reduce Conflicts
- Situation: both a shift and a reduce are possible at a given point in the parse
- Equivalently: an entry in the ACTION table holds both ri and sj
- Classic example: the if-else statement
    S -> ifthen S | ifthen S else S
  (we elide the (exp), which is common to both)

Parser States for:
1. S -> ifthen S
2. S -> ifthen S else S

State 1: S -> • ifthen S; S -> • ifthen S else S. On ifthen, go to state 2
State 2: S -> ifthen • S; S -> ifthen • S else S. On S, go to state 3
State 3: S -> ifthen S •; S -> ifthen S • else S. On else, go to state 4
State 4: S -> ifthen S else • S

State 3 has an SR conflict:
- Can shift else into state 4
- Can reduce S -> ifthen S

De-Conflicting Shift-Reduce Conflicts
- Fix the grammar (done in the Java reference grammar)
- Use a parse tool with a "longest match" rule, i.e., if there is a conflict, choose to shift instead of reduce. This does exactly what we want for the if-else case
- Guideline: a few shift-reduce conflicts are fine, but be sure they do what you want

Reduce-Reduce Conflicts
Situation: two different reductions are possible in a given state
Contrived example:
    S -> A
    S -> B
    A -> x
    B -> x

Parser States for:
1. S -> A
2. S -> B
3. A -> x
4. B -> x

State 1: S -> • A; S -> • B; A -> • x; B -> • x. On x, go to state 2
State 2: A -> x •; B -> x •

State 2 has an RR conflict (r3, r4)
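Both kinds of conflict can be spotted by inspecting the items of a single state. The sketch below is an illustration only: the name `find_conflicts` and the (lhs, rhs, dot) tuple encoding of an item are assumptions, and the check ignores lookahead, so it reports conflicts that a parser generator using lookahead might still resolve.

```python
def find_conflicts(items, terminals):
    """Classify the conflicts in one parser state, ignoring lookahead.

    Each item is (lhs, rhs, dot): rhs is a tuple of symbols and dot is the
    number of symbols to the left of the dot.
    """
    complete = [item for item in items if item[2] == len(item[1])]
    can_shift = [
        item for item in items
        if item[2] < len(item[1]) and item[1][item[2]] in terminals
    ]
    conflicts = []
    if complete and can_shift:
        conflicts.append("shift-reduce")    # a reduce and a shift both apply
    if len(complete) > 1:
        conflicts.append("reduce-reduce")   # two different reductions apply
    return conflicts


# State 3 of the if-else example: dot after "ifthen S" in both items.
if_state_3 = [
    ("S", ("ifthen", "S"), 2),
    ("S", ("ifthen", "S", "else", "S"), 2),
]
# State 2 of the contrived example: A -> x . and B -> x .
x_state_2 = [("A", ("x",), 1), ("B", ("x",), 1)]

print(find_conflicts(if_state_3, {"ifthen", "else"}))
print(find_conflicts(x_state_2, {"x"}))
```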

De-conflicting Reduce-Reduce Conflicts
Normally indicates a serious problem with the grammar
Fixes:
- Use a different kind of parser generator that takes lookahead information into account when constructing the states (LR(1) instead of SLR(1), for example). Most practical tools use this information
- Fix the grammar

Another Reduce-Reduce Conflict
Suppose the grammar separates arithmetic and boolean expressions:
    exp -> aexp | bexp
    aexp -> aexp * aident | aident
    bexp -> bexp && bident | bident
    aident -> id
    bident -> id
This will create an RR conflict

Covering Grammars
- A solution is to merge aident and bident into a single non-terminal (or use id in place of aident and bident everywhere they appear)
- This expanded grammar accepts a larger language than we want
- This is a covering grammar: it includes some programs that are not generated by the original grammar
- Use the type checker or other static semantic analysis to weed out illegal programs later

Next
- Constructing LR tables: we'll present a simple version, SLR(0), then talk about extending it to LR(1)
- LL parsers and recursive descent
- Cooper&Torczon chapter 4
