LR Parsing Techniques Bottom-Up Parsing

2017/4/21
LR Parsing Techniques
- Bottom-Up Parsing (LR: a special form of BU parser)
- LR Parsing as Handle Pruning
- Shift-Reduce Parser (LR implementation)
- LR(k) Parsing Model (k lookaheads to determine the next action)
- Parsing Table Construction: SLR, LR, LALR
Compilers: LR Parsing

Bottom-Up Parsing A bottom-up parser attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

Bottom-Up Parsing: Ex1
BU Parsing: construct a parse tree from the leaves to the root (left-to-right reduction).
G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde
[Figure: successive partial parse trees over the leaves a b b c d e, reduced left to right until the root S is reached]

Bottom-Up Parsing: Ex2
BU Parsing: construct a parse tree from the leaves to the root (random reduction order).
G: S → a A B e
   A → A b c | b
   B → d
Input: abbcde
[Figure: successive partial parse trees over the leaves a b b c d e, reduced in an arbitrary order until the root S is reached]

LR Parsing: BU + Left-to-Right
There are many ways to construct a parse tree bottom-up (Ex1 and Ex2), so prefer a simpler form of parser: left-to-right scanning.
If scanning is strictly left-to-right, the result is a rightmost derivation in reverse (thus the name LR parser).
Why rightmost? (See Ex1.)
- Never consider the terminals to the right while reducing the grammar symbols in (Σ|N)* (terminals or non-terminals) on the left.
- Reduce the left symbols as much as possible, until no further reduction applies.
- Shift when no further reduction applies.
- Reversing the sequence of reductions then corresponds to a rightmost derivation.
LR parser: a special form of BU parser with a simpler form (left-to-right scan).

LR Parsing: BU + Left-to-Right
LR Parsing: construct a parse tree from the leaves to the root, scanning left-to-right (resulting in a rightmost derivation in reverse).
G: S → a A B e, A → A b c | b, B → d
Input: abbcde
Reductions: abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
[Figure: the partial parse trees produced by each of the four reductions]

Example of Bottom-Up Parsing
Let G =
S → aABe
A → Abc | b
B → d
The sentence abbcde can be reduced to S by the following steps:
abbcde
aAbcde
aAde
aABe
S
The above reductions trace out the following rightmost derivation in reverse:
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

Rightmost Derivation in Reverse

LR Parsing The L stands for scanning the input from left to right
The R stands for constructing a rightmost derivation in reverse

LR Parsing
LR Parsing =/= Leftmost Reduction
- The first reducible substring does not always result in a successful parse.
- Handles: the reducible substrings whose reduction successfully leads to S.
Top-Down: expansion and matching
Bottom-Up: shift/reduce
- Locating the next “handle” to reduce [How to?]
- Handle pruning: hide the details below the reduced grammar symbols in (Σ|N)*
[Figure: the tree over a b b c d e before and after a subtree is pruned to A]

Handles
Not every (leftmost) reduction (A → β) leads to the start symbol S:
αβw ⇐rm αAw ⇐rm ... ⇐rm S
Only handles do.
A handle of a right-sentential form γ consists of
- a production A → β, and
- a position in γ where β can be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
Right-sentential forms: abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
Handles (in order of reduction): A → b, A → A b c, B → d, S → a A B e


Handles
Informally, a handle of a string is a substring that matches the RHS of a production and whose reduction to the nonterminal on the LHS represents one step along the reverse of a rightmost derivation.
E.g., in the previous example, A → b applied to the b that follows “aA” in aAbcde is not a handle.
Formally, a handle of a right-sentential form γ (canonical sentential form) is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
If S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw. The string w contains only terminal symbols.
We say “a handle” rather than “the handle” since the grammar may be ambiguous. But if the grammar is unambiguous, then every right-sentential form has exactly one handle.

Example of Handles
Let G = S → aABe, A → Abc | b, B → d
The sentence abbcde can be reduced to S by the following steps:
abbcde: a right-sentential form whose handle is A → b at position 2.
aAbcde: a right-sentential form whose handle is A → Abc at position 2.
aAde: a right-sentential form whose handle is B → d at position 3.
aABe: a right-sentential form whose handle is S → aABe at position 1.
S
The above reductions trace out the rightmost derivation S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde in reverse.
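The handle-by-handle reduction above can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the names `HANDLES` and `prune` are illustrative, and positions are 0-based here (the slide counts from 1).

```python
# Grammar from the example: S -> aABe, A -> Abc | b, B -> d
# Each entry names a handle: (LHS, RHS, 0-based position in the sentential form).
HANDLES = [
    ("A", "b", 1),     # abbcde -> aAbcde
    ("A", "Abc", 1),   # aAbcde -> aAde
    ("B", "d", 2),     # aAde   -> aABe
    ("S", "aABe", 0),  # aABe   -> S
]

def prune(form, handles):
    """Apply the handle reductions in order; return every sentential form seen."""
    forms = [form]
    for lhs, rhs, pos in handles:
        # The handle must really occur at the stated position.
        if form[pos:pos + len(rhs)] != rhs:
            raise ValueError(f"{rhs!r} is not at position {pos} of {form!r}")
        form = form[:pos] + lhs + form[pos + len(rhs):]
        forms.append(form)
    return forms

forms = prune("abbcde", HANDLES)
print(" <= ".join(forms))            # the reduction sequence
print(" => ".join(reversed(forms)))  # the rightmost derivation it reverses
```

Reading the list of forms backwards gives the rightmost derivation, which is the sense in which handle pruning builds a rightmost derivation in reverse.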

LR Parsing as Handle Pruning
αβw ⇐rm αAw ⇐rm ... ⇐rm S
Pruning: find a string β whose reduction leads to S, hide its details by reducing it, and proceed with the new sentential form.
- Never consider the terminals to the right while reducing the grammar symbols on the left.
- The string w to the right of the handle contains only terminals (A is the rightmost non-terminal).
- A is the leftmost complete interior node with all its children in the tree.
[Figure: parse tree with root S and interior node A over the handle β, with α to its left and w to its right]

An Example
[Figure: the parse tree for abbcde pruned step by step, from the full tree down to the root S]

LR Parsing as Handle Pruning (1st reduction sequence)
A rightmost derivation in reverse can be obtained by handle pruning.
Let G = E → E+E | E*E | (E) | id (ambiguous!)

LR Parsing as Handle Pruning (2nd reduction sequence)
A rightmost derivation in reverse can be obtained by handle pruning.
Let G = E → E+E | E*E | (E) | id (ambiguous!)

Bottom-Up Shift-Reduce Parsing
A bottom-up parser can be implemented as a shift-reduce parser. Input tokens are shifted onto the stack until the top of the stack contains a handle of the sentential form. The handle is reduced by replacing it on the parse stack with the nonterminal that is its parent in the parse tree. A handle is a sequence of symbols that match some RHS of a production and may be correctly replaced with LHS (whose reduction leads to the start symbol). It can be proved that the handles will always appear on the top of (and never appear within) the stack

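Shift-Reduce Parsing
αβw ⇐rm αAw ⇐rm ... ⇐rm S
[Figure: shift-reduce parser model. The input buffer holds w; the stack holds α with the handle β on top; the parsing program consults the parsing table to shift the next input symbol or to reduce by A → β, and emits the output.]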

Stack Implementation of Shift-Reduce Parsers
A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string w to be parsed (a push-down machine with a tape).
The parser operates by shifting zero or more symbols onto the stack until a handle β is on top of the stack.
The parser then reduces β to the left side of the appropriate production.
This procedure repeats until the stack contains the start symbol and the input is empty.

Stack Operations Shift: shift the next input symbol onto the top of the stack Reduce: replace the handle at the top of the stack with the corresponding nonterminal Accept: announce successful completion of the parsing Error: call an error recovery routine

Shift-Reduce Parsing
Handle pruning with a stack. 4 actions:
- Shift: push the next input symbol onto the stack.
- Reduce (assuming that a handle β is at the top of the stack): pop the handle symbols from the stack and push the left part A of the production rule A → β.
- Accept: halt and accept.
- Error: halt and declare “error”.

An Example (S = shift, R = reduce, A = accept)
Action  Stack          Input
S       $              a b b c d e $
S       $ a            b b c d e $
R       $ a b          b c d e $
S       $ a A          b c d e $
S       $ a A b        c d e $
R       $ a A b c      d e $
S       $ a A          d e $
R       $ a A d        e $
S       $ a A B        e $
R       $ a A B e      $
A       $ S            $

Configurations of shift-reduce parser on input id1+id2*id3
*Note: The grammar is ambiguous. Therefore, there is another possible reduction sequence.

LR Parsing States
G: S → a A B e, A → A b c | b, B → d
Input: abbcde
How to represent parsing states so we can tell the right parsing actions to take?
S0: S’ → . S
    S → . a A B e
S1: S → a . A B e (shift a)
    A → . A b c | . b (closed, no further expansion)
S2: A → b . (shift b, reduce A → b)
S3: S → a A . B e
    B → . d
    A → A . b c
S4: A → A b . c (shift b)
S5: A → A b c . (shift c, reduce A → A b c)
S6: S → a A . B e
    B → . d
S7: B → d . (shift d, reduce B → d)
S8: S → a A B . e
S9: S → a A B e . (shift e, reduce S → a A B e)
S10: S’ → S . (S reduced)
[Figure: the parse tree for abbcde over the input a b b c d e $, annotated with the state numbers 0 to 10 reached as each leaf is shifted and each interior node is reduced]

Ambiguity: Sources of Conflicts
When trying to reduce a substring of the current sentential form:
- Not all reducible substrings are handles.
- Ambiguous: more than one substring can be a handle.
Sources of conflicts (non-LR grammar):
- Shift-reduce conflicts
- Reduce-reduce conflicts
@2010/6/3: +”Ambiguity:”

Shift/Reduce Conflict
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Stack: $ if expr then stmt        Input: else stmt $
Shift  → if expr then stmt else stmt
Reduce → if expr then stmt

Reduce/Reduce Conflict
(1) stmt → id ( para_list )          // func(a,b)
(2) stmt → expr := expr
(3) para_list → para_list , para
(4) para_list → para
(5) para → id
(6) expr → id ( expr_list )          // array(a,b)
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr
Need a complex lexical analyzer to identify id vs. procid; otherwise the reduction depends on stack[sp-2].
Stack and input:
(a) $ id ( id , id ) $          [Q: r5? r7?]
(b) $ procid ( id , id ) $      [r5]
[Sol: use “stmt → procid ( para_list )” => (a) r7 (b) r5]

LR(k) Grammars
Only some classes of grammars, known as the “LR(k) grammars”, can be parsed deterministically by a shift-reduce parser.
CFGs that are non-LR may need some adaptation before they can be parsed deterministically with a shift-reduce parser.
Parsing table construction: predict the handles at each position (after shifts).

LR(k) Parsing The L stands for scanning the input from left to right
The R stands for constructing a rightmost derivation in reverse The k stands for the number of lookahead input symbols used to make parsing decisions

LR Parsing The LR parsing algorithm Constructing SLR(1) parsing tables
Constructing LR(1) parsing tables Constructing LALR(1) parsing tables

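Model of an LR Parser
[Figure: LR parser model. Input buffer; stack S0 ... Xm-1 Sm-1 Xm Sm, where S0 is the initial state and Sm is the state on top; the LR parsing program reads the state before the action, performs a shift or a reduce of the handle, and moves to the state after the action (after a reduction, via goto); parsing table with Action and Goto parts; output.]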

Parsing Table for Expression Grammar
State |  id    +    *    (    )    $   |  E   T   F
  0   |  s5              s4            |  1   2   3
  1   |        s6                  acc |
  2   |        r2   s7        r2   r2  |
  3   |        r4   r4        r4   r4  |
  4   |  s5              s4            |  8   2   3
  5   |        r6   r6        r6   r6  |
  6   |  s5              s4            |      9   3
  7   |  s5              s4            |          10
  8   |        s6             s11      |
  9   |        r1   s7        r1   r1  |
 10   |        r3   r3        r3   r3  |
 11   |        r5   r5        r5   r5  |
(0) E’ → E
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
Follow(E) = {+, ), $}
Follow(T) = {+, ), $, *}
Follow(F) = {+, ), $, *}

GOTO Actions
I0: E’ → . E; E → . E + T; E → . T; T → . T * F; T → . F; F → . ( E ); F → . id
I1: E’ → E . ; E → E . + T
I2: E → T . ; T → T . * F
I3: T → F .
I4: F → ( . E ); E → . E + T; E → . T; T → . T * F; T → . F; F → . ( E ); F → . id
I5: F → id .
Transitions from I0: E to I1, T to I2, F to I3, ( to I4, id to I5
Stack before reduction: 0 id 5
Stack after reduction: 0 F 3, then 0 T 2, then 0 E 1

LR Parsing Algorithm
Input: An input string w and an LR parsing table with functions action and goto for a grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. Shift/reduce according to the parsing table (see next page).

LR Parsing Program
a := get input token;
while (1) do {
    s := the state on top of the stack;
    if (action[s,a] == shift s’) {
        push a then s’ on top of the stack;
        a := get input token;
    } else if (action[s,a] == reduce A → β) {
        pop 2*|β| symbols off the stack;
        s’ := the state now on top of the stack;
        push A then goto[s’,A] on top of the stack;
        output the production A → β;
    } else if (action[s,a] == accept) return;
    else error();
}
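As a runnable illustration of the driver loop, here is a minimal Python sketch using the SLR(1) table for the expression grammar shown earlier. It is an assumption-laden variant rather than the slide's exact program: the stack holds states only (so a reduction pops |β| entries instead of 2*|β|), and the names `ACTION`, `GOTO`, `PRODS`, and `lr_parse` are illustrative.

```python
# Productions, numbered as in the slides, stored as (LHS, length of RHS):
# (1) E -> E + T  (2) E -> T  (3) T -> T * F  (4) T -> F  (5) F -> ( E )  (6) F -> id
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

def reduce_row(n):
    """A row that reduces by production n on every terminal in + * ) $."""
    return {t: f"r{n}" for t in "+*)$"}

SHIFT_ROW = {"id": "s5", "(": "s4"}
ACTION = {
    0: SHIFT_ROW,
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: reduce_row(4),
    4: SHIFT_ROW,
    5: reduce_row(6),
    6: SHIFT_ROW,
    7: SHIFT_ROW,
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: reduce_row(3),
    11: reduce_row(5),
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the production numbers used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [0]                      # states only; symbols are implied by the states
    pos, output = 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:              # blank table entry
            raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
        if act == "acc":
            return output
        if act[0] == "s":            # shift: push the new state, advance the input
            stack.append(int(act[1:]))
            pos += 1
        else:                        # reduce: pop |RHS| states, then take the goto
            n = int(act[1:])
            lhs, size = PRODS[n]
            del stack[-size:]
            stack.append(GOTO[stack[-1]][lhs])
            output.append(n)

print(lr_parse(["id", "*", "id", "+", "id"]))
```

The returned list of production numbers, read backwards, is a rightmost derivation of the input.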

LR Parsing on id1*id2+id3
      Stack               Input            shift/reduce+goto     Action
(1)   0                   id * id + id $   (0,id):s5             Shift
(2)   0 id 5              * id + id $      (5,*):r6; (0,F):3     Reduce by F → id
(3)   0 F 3               * id + id $      (3,*):r4; (0,T):2     Reduce by T → F
(4)   0 T 2               * id + id $      (2,*):s7              Shift
(5)   0 T 2 * 7           id + id $        (7,id):s5             Shift
(6)   0 T 2 * 7 id 5      + id $           (5,+):r6; (7,F):10    Reduce by F → id
(7)   0 T 2 * 7 F 10      + id $           (10,+):r3; (0,T):2    Reduce by T → T*F
(8)   0 T 2               + id $           (2,+):r2; (0,E):1     Reduce by E → T
(9)   0 E 1               + id $           (1,+):s6              Shift
(10)  0 E 1 + 6           id $             (6,id):s5             Shift
(11)  0 E 1 + 6 id 5      $                (5,$):r6; (6,F):3     Reduce by F → id
(12)  0 E 1 + 6 F 3       $                (3,$):r4; (6,T):9     Reduce by T → F
(13)  0 E 1 + 6 T 9       $                (9,$):r1; (0,E):1     Reduce by E → E+T
(14)  0 E 1               $                (1,$):acc             Accept

LR Parsing
Advantages
- Efficient parsing: non-backtracking.
- Efficient error detection (and correction): a syntax error is detected as soon as it appears during the left-to-right scan.
- Coverage: virtually all programming language constructs; G(LR) ⊃ G(top-down predictive parsing).
Disadvantages
- Too much work to construct by hand (hence generators such as YACC).


LR Parsing Table Construction Techniques
- SLR(1) parser: LR(0) items and states
- LR(1) parser: shift/reduce conflict resolution; LR(1) items and states
- LALR(1) parser: LR(1) state merge; reduce-reduce conflicts

LR(k) Grammar A grammar that can be parsed by an LR parser examining up to k input symbols on each move is called an LR(k) grammar

SLR Parser
Coverage: weakest in terms of the number of grammars it succeeds on, but easiest to construct.
Parser: a DFA for recognizing viable prefixes.
States: sets of LR(0) items.
- The items in a set can be viewed as the states of an NFA recognizing viable prefixes.
- Grouping items into sets is equivalent to the subset construction.


Viable Prefix
The prefixes of c.s.f.’s (canonical, i.e. right-sentential, forms) that can appear on the stack of a shift-reduce parser are called viable prefixes.
Equivalently, a viable prefix is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form.
If αβ is a viable prefix, then ∃ w ∈ Σ* such that αβw is a c.s.f.

Item and Valid Item
An LR(0) item (item for short) is a marked production [A → β1•β2] (dotted rule: a production with a dot in its RHS).
An item [A → β1•β2] is said to be valid for a viable prefix αβ1 iff ∃ w ∈ Σ* such that S ⇒*rm αAw ⇒rm αβ1β2w.
The “•” represents where we are now during parsing:
- Left of the dot: the symbols already scanned.
- Right of the dot: the symbols to be visited later.

Example of Valid Item
Consider the grammar:
S → 1C | D
C → 3 | 4
D → 1B
B → 2
Valid items for the viable prefix ε: [S → • 1C], [S → • D], and [D → • 1B]

Example of Valid Item (cont.)
Assume 1  a’, i.e., could be
Valid items for the viable prefix “1”: [S → 1 • C], [C → • 3], [C → • 4], [D → 1 • B], and [B → • 2]

Example of Valid Item (cont.)
Assume
Valid item for the viable prefix “13”: [C → 3 •]
Valid item for the viable prefix “1C”: [S → 1 C •]

Closure: All Valid Items Enumerable from G
Given a grammar
E’ → E
E → E+T | T
T → T*F | F
F → (E) | id
What are the valid items for the viable prefix “…E+”?
[E → E+ • T], but also [T → • F], since E’ ⇒* αE+Tw and T ⇒ F give αE+Tw ⇒ αE+Fw.
Likewise, [T → • T*F], [T → • F], [F → • (E)], [F → • id].
Together these are called the closure of [E → E+ • T] (inclusive).

Computation of Closure
Given a set I of items:
- Initially, Closure(I) = I.
- Loop: for every item [A → α•Bβ] in Closure(I) and every production B → γ in P, include [B → •γ] in Closure(I).
- Repeat the loop until no new dotted rules can be added.
Initial set of items for a grammar: I0 = Closure({[S’ → •S]}) (S: start symbol, S’: augmented start symbol)
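The closure loop can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: an item is represented as a tuple (lhs, rhs, dot), and the names `GRAMMAR` and `closure` are assumptions of this example. It uses the expression grammar from the preceding closure example.

```python
# Augmented expression grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """LR(0) closure of a set of items (lhs, rhs, dot), returned as a frozenset."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        # Only an item with a nonterminal B right after the dot adds new items.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)      # [B -> . gamma]
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

I0 = closure({("E'", ("E",), 0)})
after_plus = closure({("E", ("E", "+", "T"), 2)})   # closure of [E -> E + . T]
for lhs, rhs, dot in sorted(after_plus):
    print(lhs, "->", " ".join(rhs[:dot]), ".", " ".join(rhs[dot:]))
```

The worklist replaces the "repeat until nothing changes" loop; both compute the same fixed point.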

GOTO Computation
Let I be a set of items that are valid for some viable prefix γ. Then goto(I,X), where X ∈ (N ∪ Σ), is the set of items that are valid for the viable prefix γX.
So [A → α•Xβ] in I implies Closure({[A → αX•β]}) ⊆ goto(I,X).
S ⇒* δAw ⇒ δαXβw, with γ = δα: the item [A → α•Xβ] (in the item set I, among others) becomes [A → αX•β] after X.

Sets of LR(0) Items Construction
Augment the grammar with: S’ → S
Let I0 = Closure({[S’ → •S]}), C = {I0}
while (not all elements of C are marked) {
  - select an unmarked item set of C (say “I”) and mark it;
  - ∀ X ∈ (V or Σ), if goto(I,X) is not empty and not already in C, then add goto(I,X) to C (unmarked);
}
Also called the Characteristic Finite State Machine (CFSM) construction algorithm.
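A compact sketch of the whole construction, again for the expression grammar, is given below. It is illustrative only: the item representation (lhs, rhs, dot) and the function names are assumptions, and because the symbols are visited in no particular order, the states are not numbered as in the textbook figure.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """LR(0) closure of a set of items (lhs, rhs, dot)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)

def canonical_collection(start_item):
    """All sets of LR(0) items reachable from closure({start_item})."""
    symbols = set(GRAMMAR) | {s for prods in GRAMMAR.values() for p in prods for s in p}
    states = [closure({start_item})]
    work = [states[0]]                 # the "unmarked" item sets
    while work:
        state = work.pop()             # select an unmarked set and mark it
        for x in symbols:
            target = goto(state, x)
            if target and target not in states:
                states.append(target)
                work.append(target)
    return states

C = canonical_collection(("E'", ("E",), 0))
print(len(C), "sets of LR(0) items")
```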

SLR(1) Parsing Actions
Compute the CFSM states C = {I0, I1, …, In}.
1. If [A → α•aβ] ∈ Ii and goto(Ii,a) = Ij, then set action(Ii,a) = shift Ij (where ‘a’ is a terminal).
2. If [A → α•] ∈ Ii, then set action(Ii,a) = reduce A → α for all a in Follow(A).
   - A terminal a in Follow(A) does not guarantee that reducing by A → α will result in a successful parse (α is not necessarily a “handle”).
   - But a terminal NOT in Follow(A) definitely indicates an impossible parse.
   - So reducing on the symbols in Follow(A) is only a loose criterion for a possibly successful parse.
3. If [S’ → S•] ∈ Ii, then set action(Ii,$) = accept.
4. All other action(*,*) = error.

Conflicts Shift-reduce conflicts: both a shift action and a reduce action are possible in the same Closure. E.g., state 2 in Figure 4.37 (p.229) [Aho 86] Reduce-reduce conflicts: two or more distinct reduce actions are possible in the same Closure.

Example: Grammar G for Math Expressions
(0) E’ → E
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → id
Follow(E) = {+, ), $}, Follow(T) = {+, ), $, *}, Follow(F) = {+, ), $, *}
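The Follow sets listed above can be recomputed with a small fixed-point iteration. The sketch below is an illustration under one simplifying assumption that holds for this grammar: no production derives ε, so FIRST of a string is FIRST of its first symbol. The function names are illustrative.

```python
GRAMMAR = [
    ("E'", ["E"]),
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST for each nonterminal (assumes no epsilon productions)."""
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]
            new = first[x] if x in NONTERMS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets(start="E'"):
    """FOLLOW for each nonterminal, with $ following the augmented start symbol."""
    first = first_sets()
    follow = {n: set() for n in NONTERMS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMS:
                    continue
                if i + 1 < len(rhs):            # x is followed by another symbol
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMS else {nxt}
                else:                           # x ends the RHS: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow

follow = follow_sets()
for n in ("E", "T", "F"):
    print(f"Follow({n}) =", sorted(follow[n]))
```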

Computing SLR(1) States for G
an SLR(1) State = a set of LR(0) items (See the next slide, Fig. 4.35, page 225, [Aho 86])

Canonical LR(0) Collection for G
I0: E’ → . E; E → . E + T; E → . T; T → . T * F; T → . F; F → . ( E ); F → . id
I1: E’ → E . ; E → E . + T
I2: E → T . ; T → T . * F
I3: T → F .
I4: F → ( . E ); E → . E + T; E → . T; T → . T * F; T → . F; F → . ( E ); F → . id
I5: F → id .
I6: E → E + . T; T → . T * F; T → . F; F → . ( E ); F → . id
I7: T → T * . F; F → . ( E ); F → . id
I8: F → ( E . ); E → E . + T
I9: E → E + T . ; T → T . * F
I10: T → T * F .
I11: F → ( E ) .
Transitions:
I0: E to I1, T to I2, F to I3, ( to I4, id to I5
I1: + to I6
I2: * to I7
I4: E to I8, T to I2, F to I3, ( to I4, id to I5
I6: T to I9, F to I3, ( to I4, id to I5
I7: F to I10, ( to I4, id to I5
I8: ) to I11, + to I6
I9: * to I7


Transition Diagram of DFA D for Viable Prefixes
State transitions in terms of sets of LR(0) items (Fig. 4.36)
SLR(1) parsing table (Fig. 4.31):
- Ii goes to Ij on terminal a => action(i,a) = shift j
- Ii goes to Ij on nonterminal A => goto(i,A) = j
- Ii contains [A → α .] => action(i,a) = reduce A → α for every a in FOLLOW(A)
- If A = S’ (augmented start symbol) => action(i,$) = accept

Visualizing Transitions in the Transition Diagram
Shift: moving forward one step along an arc (equivalent to pushing an input symbol).
Reduce “LHS → RHS”: moving backward to a previous state ‘s’ along the arcs labeled with the RHS symbols, then taking GOTO(s, LHS) (equivalent to popping the RHS symbols from the stack, pushing LHS, then redefining the current state).

LR Parsing Table Construction Techniques…
Canonical LR Parsing Table …
LALR Parsing Table …
(See Textbook …)

Canonical LR Parser
An SLR(1) parser does NOT always work:
- SLR(1) grammar => unambiguous, but unambiguous CFG =/=> SLR(1) grammar.
- E.g., a shift-reduce conflict in the SLR(1) parsing table may NOT be a real shift-reduce conflict (the “reduce” may be impossible).
Need more specific, additional information to define states [to avoid false reductions]:
- Use LR(1) items instead of LR(0) items.
- Many more states than SLR(1).
Need (canonical) LR(1) or LALR(1) parsers (parsing table construction methods).

Example: non-SLR(1) Grammar for Assignment
(1) S → L = R
(2) S → R
(3) L → * R (content of R)
(4) L → id
(5) R → L
I2: (1) S → L . = R
    (5) R → L .
I3: (2) S → R .
In I2 on ‘=’:
- Shift: Action(2,‘=’) = shift 6
- Reduce: Follow(R) = {‘=’, …} because S => L = R => *R = R, so Action(2,‘=’) = reduce 5
If the parser reduces on ‘=’: it goes to I3 via R, where ‘=’ ∉ Follow(S), so it reports an error. R → L is NOT really reducible here.

Example: non-SLR(1) Grammar for Assignment
Problem: G is unambiguous, so the SLR shift/reduce conflict is a false one, but the SLR parsing table is unable to remember enough left context to decide the proper action on ‘=’ after seeing a string reducible to L.

Why Unambiguous Yet Non-SLR(1)
Some reduce actions chosen by checking the input against Follow(LHS) are not really reducible.
- Not all symbols in FOLLOW(LHS) result in a successful reduction to S; the parse may fail after a few steps of reductions.
- SLR(1) cannot resolve such conflicts, because its states are defined by LR(0) items only.
- Need more specific constraints that rule out a subset of Follow(LHS) from indicating a reduce action.

LR(1) Parsing Table Construction
SLR: reduce A → α on input ‘a’ if Ii contains [A → α .] and ‘a’ ∈ FOLLOW(A).
- The item is not really reducible for all ‘a’ ∈ FOLLOW(A), only for a subset (maybe a proper subset).
- In some cases S ⇒* βαa, yet βAa is not a right-sentential form: reducing by A → α does not produce a right-sentential form.
- E.g., “S ⇒ L • = R …” cannot be reduced to “R • = R …”, although “S ⇒* *R • = R” puts ‘=’ in Follow(R).

Solution: define each state with more specific information to rule out invalid reductions.
- This sometimes results in splitting states that share the same “core”.
- LR(0) item: [A → α . β], only the dotted production (the “core”).
- LR(1) item: [A → α . β, LA’s], the dotted production (the “core”) plus the lookaheads that allow a reduction upon [A → α β .].
- “1”: the length of the LA symbols.

[A → α . β, a] (with β ≠ ε): the LA (‘a’) has no effect on items of this form.
[A → α . , a] (i.e., β = ε): the LA has an effect on items of this form.
- Reduction is called for only when the next input is ‘a’ (not for all terminal symbols in Follow(A)).
- Only a subset of Follow(A) will be the right LA’s.
Initially, only one restriction is known: [S’ → . S, $]
Infer the other restrictions by closure computation.

LR(1) Item and Valid Item
An LR(1) item is a dotted production plus a lookahead symbol: [A → α•β, a]
An LR(1) item [A → α•β, a] is said to be valid for a viable prefix γ if ∃ a rightmost derivation S ⇒* δAw ⇒ δαβw, where
1. γ = δα
2. ‘a’ ∈ First(w) (or w = ε and a = ‘$’)
The “•” represents where we are now during parsing:
- Left of the dot: the symbols already scanned.
- Right of the dot: the symbols to be visited later.

Change the closure() and goto() functions of the SLR parsing table construction, with the initial collection C = {closure({[S’ → . S, $]})}.
[A → α•Bβ, a] valid implies [B → •γ, b] valid if b is in FIRST(βa).
Construction method for the sets of LR(1) items: see the next few pages.

LR(1): Closure(I)
Given a set I of items, initially Closure(I) = I.
Repeat: for each item [A → α•Bβ, a] in Closure(I), each production B → γ in G’, and each terminal b in FIRST(βa), add [B → •γ, b] to Closure(I).
Until no more items can be added to Closure(I).
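The LR(1) closure differs from the LR(0) one only in the lookahead it attaches to each new item. The following sketch is illustrative, not from the slides: it uses the grammar S’ → S, S → CC, C → cC | d of the later example, represents an item as (lhs, rhs, dot, lookahead), and relies on the fact that this grammar has no ε productions when computing FIRST(βa). The names `first_of` and `closure1` are assumptions.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

def first_of(symbols):
    """FIRST of a non-empty symbol string (valid because nothing derives epsilon)."""
    seen, result, work = set(), set(), [symbols[0]]
    while work:
        x = work.pop()
        if x not in GRAMMAR:          # terminal (or the end marker $)
            result.add(x)
        elif x not in seen:           # nonterminal: look at the heads of its productions
            seen.add(x)
            work.extend(p[0] for p in GRAMMAR[x])
    return result

def closure1(items):
    """LR(1) closure of a set of items (lhs, rhs, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # New lookaheads are the terminals in FIRST(beta a).
            for b in first_of(rhs[dot + 1:] + (la,)):
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

I0 = closure1({("S'", ("S",), 0, "$")})
for lhs, rhs, dot, la in sorted(I0):
    print(f"[{lhs} -> {' '.join(rhs[:dot])} . {' '.join(rhs[dot:])}, {la}]")
```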

LR(1): GOTO(I,X)
Let J = {[A → αX•β, a] | [A → α•Xβ, a] is in I}; then goto(I,X) = closure(J).
That is:
J = {}
For all [A → α•Xβ, a] in I, J += {[A → αX•β, a]}
Return(closure(J))
Example: if I contains [A → α•Xβ, a] and [A’ → α’•Xβ’, a’], then Goto(I,X) = Closure({[A → αX•β, a], [A’ → α’X•β’, a’]}).

Sets of LR(1) Items Construction
Augment the grammar with S’ → S, and call it G’.
Let I0 = Closure({[S’ → •S, $]}), C = {I0}
Repeat {
  - ∀ I ∈ C,
  - ∀ X ∈ (N or Σ), if goto(I,X) is not empty and not already in C, then add goto(I,X) to C
} Until no more sets of items can be added to C

Example: resolving shift/reduce conflicts with LR(1) items
G’: {S’ → S, S → CC, C → cC | d}
L(G) = { c^m d c^n d }
=> I0 ~ I9 (Fig. 4.39, p. 235 [Aho 86])
I3 vs. I6: the same set of LR(0) items with different lookaheads, so the conditions for reduction are different.
- I3: lookaheads c/d, so the eventual reduction happens on c/d (when constructing the 1st ‘C’)
- I6: lookahead $, so the eventual reduction happens on $ (when constructing the 2nd ‘C’)
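The ten item sets I0 to I9 can be reproduced with a short sketch. As before, this is illustrative code under stated assumptions (items as (lhs, rhs, dot, lookahead), no ε productions, hypothetical function names). Visiting the symbols in the fixed order S, C, c, d happens to number the states as in the figure.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

def first_of(symbols):
    """FIRST of a non-empty symbol string (no epsilon productions in this grammar)."""
    seen, result, work = set(), set(), [symbols[0]]
    while work:
        x = work.pop()
        if x not in GRAMMAR:
            result.add(x)
        elif x not in seen:
            seen.add(x)
            work.extend(p[0] for p in GRAMMAR[x])
    return result

def closure1(items):
    """LR(1) closure of a set of items (lhs, rhs, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for b in first_of(rhs[dot + 1:] + (la,)):
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto1(items, x):
    """Advance the dot over x, keep each item's lookahead, then close."""
    return closure1({(l, r, d + 1, a) for (l, r, d, a) in items
                     if d < len(r) and r[d] == x})

def lr1_collection():
    states = [closure1({("S'", ("S",), 0, "$")})]
    for state in states:               # the list grows while it is being scanned
        for x in ["S", "C", "c", "d"]:
            target = goto1(state, x)
            if target and target not in states:
                states.append(target)
    return states

def core(state):
    """The LR(0) part of a state: its items without lookaheads."""
    return frozenset((l, r, d) for (l, r, d, _) in state)

states = lr1_collection()
cores = {core(s) for s in states}
print(len(states), "LR(1) states,", len(cores), "distinct cores")
```

The number of distinct cores is the number of states an LALR(1) (or SLR(1)) table would have for the same grammar.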

SLR(1) Goto Graph
G: S’ → S, S → C C, C → c C, C → d
Follow sets: S: {$}, C: {c, d, $}
I0: S’ → . S; S → . C C; C → . c C; C → . d
I1: S’ → S . [$]
I2: S → C . C; C → . c C; C → . d
I3: C → c . C; C → . c C; C → . d
I4: C → d . , c/d/$
I5: S → C C . [$]
I8: C → c C . , c/d/$
Transitions:
I0: S to I1, C to I2, c to I3, d to I4
I2: C to I5, c to I3, d to I4
I3: C to I8, c to I3, d to I4

LR(1) Goto Graph
G: S’ → S, S → C C, C → c C, C → d
I0: S’ → . S, $; S → . C C, $; C → . c C, c/d; C → . d, c/d
I1: S’ → S . , $
I2: S → C . C, $; C → . c C, $; C → . d, $
I3: C → c . C, c/d; C → . c C, c/d; C → . d, c/d
I4: C → d . , c/d
I5: S → C C . , $
I6: C → c . C, $; C → . c C, $; C → . d, $
I7: C → d . , $
I8: C → c C . , c/d
I9: C → c C . , $
Transitions:
I0: S to I1, C to I2, c to I3, d to I4
I2: C to I5, c to I6, d to I7
I3: C to I8, c to I3, d to I4
I6: C to I9, c to I6, d to I7

Construction of Canonical LR(1) Parsing Table
Algorithm 4.10
- Shift: same as SLR, ignoring the LA in the item
- Reduce on ‘a’: [A → α•, a]
- Accept on ‘$’: [S’ → S•, $]
- Goto: same as SLR
LR(1) grammar: a grammar without conflicts (multiply defined actions) in its LR(1) parsing table

SLR(1) vs. LR(1)
LR(1): more specific states; an SLR(1) state may split into states with the same “core” but with different lookaheads.
SLR(1) grammars ⊂ LR(1) grammars
Number of states: LR(1) >> SLR(1)

LALR(1)
Merge LR(1) states with the same core, while retaining the lookahead symbols.
- Considerably smaller than canonical LR tables.
- Most programming language constructs can be expressed by an LALR grammar.
- SLR and LALR have the same number of states; they differ in being without/with lookahead symbols [reduce on the full FOLLOW set / on a subset of FOLLOW].
- Several hundred states for PASCAL; several thousand if using LR(1).
G is an LALR(1) grammar if no conflicts arise after the state merge.

LALR(1) vs. LR(1) Effect of LR(1) state merge:
The merging of states with common cores can never produce a shift-reduce conflict that was not present in one of the original states Because shift actions depend only on the core, not the lookahead However, a merge may produce a reduce-reduce conflict. Because union of lookaheads may introduce unnecessary reductions

LALR(1) vs. LR(1)
Example: a merge that produces reduce-reduce conflicts.
LR(1) grammar:
S’ → S
S → a A d | b B d | a B e | b A e
A → c
B → c
Sets of LR(1) items:
{[A → c . , d], [B → c . , e]} (valid for viable prefix ac)
{[A → c . , e], [B → c . , d]} (valid for viable prefix bc)
Merging the states with common cores gives {[A → c . , d/e], [B → c . , d/e]} (merging also merges the lookaheads).
Reduce-reduce conflicts: A → c and B → c, on inputs d and e.
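The merge in this example is small enough to check directly. The sketch below is illustrative: the two item sets are typed in by hand from the slide rather than computed, items are (lhs, rhs, dot, lookahead), and `core` and `reduce_conflicts` are hypothetical helper names.

```python
# The two LR(1) item sets from the example.
state_ac = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}   # viable prefix ac
state_bc = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}   # viable prefix bc

def core(state):
    """The items of a state with their lookaheads dropped."""
    return frozenset((l, r, d) for (l, r, d, _) in state)

def reduce_conflicts(state):
    """Lookaheads on which two or more completed items call for different reductions."""
    by_lookahead = {}
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):                       # completed item: a reduce action
            by_lookahead.setdefault(la, set()).add((lhs, rhs))
    return {la: prods for la, prods in by_lookahead.items() if len(prods) > 1}

same_core = core(state_ac) == core(state_bc)      # so LALR(1) merges the two states
merged = state_ac | state_bc                      # union of items merges the lookaheads
print("same core:", same_core)
print("conflicts before merge:", reduce_conflicts(state_ac), reduce_conflicts(state_bc))
print("conflicts after merge on:", sorted(reduce_conflicts(merged)))
```

Neither original state has a conflict, while the merged state has a reduce-reduce conflict on both d and e, so the grammar is LR(1) but not LALR(1).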

LALR(1) vs. LR(1) Effect of LR(1) state merge:
Behave like the original, or Declare error later, but before shifting next input symbol For correct input: LR and LALR have the same sequence of shift/reduce For erroneous input: LALR requires extra reduces after LR has detected an error (but before shifting next)

Example: Merge States with Same Core
Fig. 4.39: I4 vs. I7 – same reduction with different lookaheads State merge: dotted rules remain, LA’s merged Examples: I3 + I6 => I36 I4 + I7 => I47 I8 + I9 => I89 Same as SLR(1) table (Fig. 4.41, p239, [Aho 86])

LALR(1) Parsing Table Construction (I)
Method 1: (Naïve Method) [1] Construct LR(1) parsing table Very costly [#states is normally very large] [2] Merge states with the same core

LALR(1) Parsing Table Construction (II)
Method 2: (Efficient Construction Method)
[1] Construct the kernels of the sets of LR(0) items, starting from [S’ → •S].
- It is possible to compute the shift/reduce/goto actions directly from the kernel items.
- Kernel items: the items whose dot is not at the beginning, plus [S’ → . S, $]; i.e., those not derived from closure(). A kernel can represent a whole set of items.
[2] Append lookaheads.
- Compute the initial spontaneous lookaheads, and the item pairs that pass propagated lookaheads.

LALR(1) Parsing Table Construction (II.1)
Compute shift/reduce/goto actions directly from kernel items: (pps ) Reduce: Shift: Goto: Need to pre-compute First’(C) = {A | r.m. C* A } for all pairs of nonterminals (C, A) and 

LALR(1) Parsing Table Construction (II.2)
Determine spontaneous and propagated lookaheads (Fig. 4.43) Compute closure({core,#}) by assuming a “dummy lookahead” ‘#’

LALR(1) Parsing Table Construction: Example
Example: 4.46/Fig [p. 241, Aho 86] Kernels of sets of LR(0) items  Fig [with non-kernel items] Example: 4.47 Get Spontaneous & Propagated lookaheads Fig. 4.44: item pairs that propagate lookaheads Fig. 4.45: initial spontaneous lookahead, and multiple passes of lookahead propagation LALR(1) parsing table: Todo by yourself

LALR(1) Parsing Table Construction
SLR (Fig. 4.37): I2 has a shift/reduce conflict on ‘=’.
I2: (1) S → L . = R
    (5) R → L .
LALR(/LR) (Fig 4.45): I2 shifts on ‘=’ and reduces on ‘$’, so there is NO conflict.
I2: (1) S → L . = R, $
    (5) R → L . , $

Using Ambiguous Grammar
(see Handouts)

Parser Generators YACC (Slide Part II)
