Chap. 3 BOTTOM-UP PARSING

LR parsers
The technique called LR(k) parsing ("L" for left-to-right scanning of the input, "R" for constructing a rightmost derivation in reverse, and k for the number of input symbols of lookahead used in making parsing decisions) can be used to parse a large class of context-free grammars. When (k) is omitted, k is assumed to be 1.

- LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.
- The LR parsing method is the most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods.
- An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
It is a lot of work to construct an LR parser by hand; fortunately, specialized tools called LR parser generators exist.

There are three techniques for constructing an LR parsing table for a grammar:
- The first method, called simple LR (SLR), is the easiest to implement but the least powerful of the three.
- The second method, called canonical LR, is the most powerful and the most expensive.
- The third method, called lookahead LR (LALR), is intermediate in power and cost between the other two.

[Figure: Model of an LR parser: input buffer a1 … ai … an $, stack sm Xm sm-1 Xm-1 … s0, LR parsing program, parsing table (action, goto), output.]

An LR parser consists of an input, a stack, a driver program, and a parsing table that has two parts (action and goto). The driver program is the same for all LR parsers; only the parsing table changes from one parser to another. The parsing program reads input symbols from an input buffer one at a time. It uses a stack to store a string of the form \(s_0 X_1 s_1 X_2 s_2 \ldots X_m s_m\), where \(s_m\) is on top. Each \(X_i\) is a grammar symbol and each \(s_i\) is a symbol called a state. Each state symbol summarizes the information contained in the stack below it, and the combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table and determine the shift-reduce parsing decision.

The parsing table consists of two parts: a parsing action function action and a goto function goto. The program driving the LR parser determines \(s_m\), the state currently on top of the stack, and \(a_i\), the current input symbol. It then consults \(\mathit{action}[s_m, a_i]\), the parsing action table entry for state \(s_m\) and input \(a_i\), which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.

The function goto takes a state and a grammar symbol as arguments and produces a state.
In fact, the goto function of a parsing table is a transition table of a deterministic finite automaton that recognizes viable prefixes of G. The initial state of this DFA is the state initially put on top of the LR parser stack.

A configuration of an LR parser is a pair whose first component is the stack contents and whose second component is the unexpended input:
\[ (s_0 X_1 s_1 X_2 s_2 \ldots X_m s_m,\; a_i a_{i+1} \ldots a_n \$) \]
The next move of the parser is determined by reading the current input symbol \(a_i\) and the state on top of the stack \(s_m\), and then consulting the parsing action table entry \(\mathit{action}[s_m, a_i]\). The configurations resulting after each of the four types of move are as follows:

1. If \(\mathit{action}[s_m, a_i] = \text{shift } s\), the parser executes a shift move, entering the configuration
\[ (s_0 X_1 s_1 X_2 s_2 \ldots X_m s_m a_i s,\; a_{i+1} \ldots a_n \$) \]
Here the parser has shifted both the current input symbol \(a_i\) and the next state \(s\) (the DFA transition from \(s_m\) on \(a_i\)) onto the stack; \(a_{i+1}\) becomes the current input symbol.
2. If \(\mathit{action}[s_m, a_i] = \text{reduce } A \to \beta\), then the parser executes a reduce move, entering the configuration
\[ (s_0 X_1 s_1 X_2 s_2 \ldots X_{m-r} s_{m-r} A s,\; a_i a_{i+1} \ldots a_n \$) \]
where \(s = \mathit{goto}[s_{m-r}, A]\) and \(r\) is the length of \(\beta\), the right side of the production. Here the parser first popped \(2r\) symbols off the stack (\(r\) state symbols and \(r\) grammar symbols), exposing state \(s_{m-r}\). The parser then pushed both \(A\), the left side of the production, and \(s\), the entry for \(\mathit{goto}[s_{m-r}, A]\), onto the stack. The current input symbol is not changed in a reduce move. For LR parsers, \(X_{m-r+1} \ldots X_m\), the sequence of grammar symbols popped off the stack, will always match \(\beta\), the right side of the reducing production.
3. If \(\mathit{action}[s_m, a_i] = \text{accept}\), parsing is completed.
4. If \(\mathit{action}[s_m, a_i] = \text{error}\), the parser calls an error recovery routine.

LR parsing algorithm
Input: An input string w and an LR parsing table with functions action and goto for a grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program until an accept or error action is encountered.

set ip to point to the first symbol of w$;
repeat forever
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a then s' on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce by A → β then begin
        pop 2*|β| symbols off the stack;
        let s' be the state now on top of the stack;
        push A then goto[s', A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return  /* success */
    else error()

Example E  E + T (1) E  T (2) T  T * F (3) T  F (4) F  (E) (5)
F  id (6)

LR grammar
A grammar for which we can construct an LR parsing table is said to be an LR grammar. There is a significant difference between LL and LR grammars. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production, having seen all of what is derived from that right side, with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus LR grammars can describe more languages than LL grammars: every LL(k) grammar is also LR(k), but not conversely.

Constructing SLR Parsing Tables
An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of the right side. Thus, production A → XYZ yields the four items A → .XYZ, A → X.YZ, A → XY.Z and A → XYZ.
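A small Python sketch that enumerates the LR(0) items of a single production. Representing an item as a (left side, right side, dot position) triple is an illustrative choice, and the helper names lr0_items and show are hypothetical.

```python
def lr0_items(lhs, rhs):
    """Return every LR(0) item of the production lhs -> rhs.

    An item is a triple (lhs, rhs, dot), where dot is the position
    of the dot in rhs, from 0 to len(rhs) inclusive.
    """
    rhs = tuple(rhs)
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]

def show(item):
    """Render an item with an explicit '.' marking the dot."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"

for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))
```

A production with a right side of length n yields n + 1 items; an ε-production A → ε yields the single item A → . in this representation.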

The Closure operation
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production, then add the item B → .γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).
Intuitively, A → α.Bβ in closure(I) indicates that, at some point in the parsing process, we think we might next see a substring derivable from Bβ as input. If B → γ is a production, we also expect we might see a substring derivable from γ at this point. For this reason we also include B → .γ in closure(I).

Example
Consider the augmented expression grammar:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id

If I is the set of one item {[E’  E]}, then Closure(I) contains the items :
T .T*F T  .F F  .(E) F  .id

function closure(I);
begin
    J := I;
    repeat
        for each item A → α.Bβ in J and each production B → γ of G
                such that B → .γ is not in J do
            add B → .γ to J
    until no more items can be added to J;
    return J
end
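The closure function can be sketched in Python for the augmented expression grammar above. Items are (left side, right side, dot) triples, and the nonterminals are taken to be exactly the keys of a hypothetical GRAMMAR dictionary; both are representation choices made for this sketch. The while loop mirrors the repeat ... until of the pseudocode.

```python
# Augmented expression grammar: nonterminal -> list of right sides.
# Any symbol that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Return closure(I) for a set of LR(0) items (lhs, rhs, dot)."""
    J = set(items)
    changed = True
    while changed:                       # repeat ... until no more items added
        changed = False
        for lhs, rhs, dot in list(J):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # item A -> α.Bβ
                B = rhs[dot]
                for gamma in GRAMMAR[B]:
                    item = (B, gamma, 0)                 # item B -> .γ
                    if item not in J:
                        J.add(item)
                        changed = True
    return frozenset(J)

def show(item):
    """Render an item with an explicit '.' marking the dot."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"

I0 = closure({("E'", ("E",), 0)})
for item in sorted(I0):
    print(show(item))
```

Applied to {[E' → .E]}, the sketch returns the seven items listed above. Returning a frozenset makes item sets hashable and directly comparable, which is convenient when collecting sets of items.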

The Goto operation
goto(I, X), where I is a set of items and X is a grammar symbol, is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I. Intuitively, if I is the set of items that are valid for some viable prefix γ, then goto(I, X) is the set of items that are valid for the viable prefix γX.
Example
If I is the set of two items {[E' → E.], [E → E.+T]}, then goto(I, +) consists of
E → E+.T
T → .T*F
T → .F
F → .(E)
F → .id
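A Python sketch of goto on top of a closure function (repeated here, in worklist form, so the block runs on its own). GRAMMAR, closure and goto are illustrative names, not a fixed API, and items are (left side, right side, dot) triples.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """closure(I), computed with a worklist instead of repeat ... until."""
    J = set(items)
    work = list(J)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before nonterminal B
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)           # B -> .γ
                if item not in J:
                    J.add(item)
                    work.append(item)
    return frozenset(J)

def goto(I, X):
    """Closure of all [A -> αX.β] such that [A -> α.Xβ] is in I."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in I
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved)

I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
J = goto(I, "+")
print(len(J))  # 5
```

For the two-item set of the example, the result is the five items E → E+.T, T → .T*F, T → .F, F → .(E) and F → .id. A symbol that appears after no dot in I, such as *, gives the empty set.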

The sets-of-items construction
procedure items(G');
begin
    C := {closure({[S' → .S]})};
    repeat
        for each set of items I in C and each grammar symbol X
                such that goto(I, X) is not empty and not in C do
            add goto(I, X) to C
    until no more sets of items can be added to C
end
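The items procedure translates almost line for line into Python. This sketch repeats closure and goto so that it is self-contained, keeps C as a list so that the start set comes first, and tries grammar symbols in sorted order only to make the numbering reproducible; apart from I0, the state numbers it produces need not match the I0 to I11 numbering in the example. All names are hypothetical.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    J, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before nonterminal B
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)           # B -> .γ
                if item not in J:
                    J.add(item)
                    work.append(item)
    return frozenset(J)

def goto(I, X):
    return closure({(l, r, d + 1) for l, r, d in I if d < len(r) and r[d] == X})

def items(start="E'"):
    """Canonical collection C of sets of LR(0) items; C[0] is the start set."""
    symbols = set(GRAMMAR)
    for rhss in GRAMMAR.values():
        for rhs in rhss:
            symbols.update(rhs)
    C = [closure({(start, GRAMMAR[start][0], 0)})]
    changed = True
    while changed:                       # repeat ... until no new sets
        changed = False
        for I in list(C):
            for X in sorted(symbols):
                J = goto(I, X)
                if J and J not in C:     # not empty and not already in C
                    C.append(J)
                    changed = True
    return C

C = items()
print(len(C))  # 12
```

For the expression grammar the sketch finds twelve sets of items, the same sets as I0 to I11 in the example that follows.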

Example I0 : E’  .E, E  .E+T, E .T, T  .T*F, T  .F, F .(E), F  .id I1 : E’  E., E  E.+T I2 : E  T., T  T.*F I3 : T  F. I4 : F  (.E) , E  .E+T, E .T, T  .T*F, T  .F, F .(E), F  .id I5 : F  id. I6 : E  E+.T, T  .T*F, T  .F, F .(E), F  .id I7 : T  T*.F, F .(E), F  .id I8 : F  (E.), E  E.+T I9 : E  E+T., T  T.*F I10 : T  T*F. I11 : F  (E).

SLR parsing table
Input. An augmented grammar G'.
Output. The SLR parsing table functions action and goto for G'.
Method.
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
b) If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A); here A may not be S'.
c) If [S' → S.] is in Ii, then set action[i, $] to "accept".
If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → .S].
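Putting the pieces together, here is a Python sketch of the SLR table construction for the expression grammar. To keep it short, the FOLLOW sets are hard-coded (worked out by hand) rather than computed, a conflict is reported by raising ValueError, and a missing key in the action dictionary plays the role of an "error" entry. The names GRAMMAR, FOLLOW, items and slr_table are hypothetical.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
START = "E'"
# FOLLOW sets worked out by hand (assumption: this sketch does not compute FOLLOW).
FOLLOW = {
    "E": {"+", ")", "$"},
    "T": {"+", "*", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def closure(items):
    J, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before nonterminal B
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)           # B -> .γ
                if item not in J:
                    J.add(item)
                    work.append(item)
    return frozenset(J)

def goto(I, X):
    return closure({(l, r, d + 1) for l, r, d in I if d < len(r) and r[d] == X})

def items():
    """Step 1: canonical LR(0) collection; C[0] contains [S' -> .S]."""
    symbols = set(GRAMMAR).union(*(r for rs in GRAMMAR.values() for r in rs))
    C = [closure({(START, GRAMMAR[START][0], 0)})]
    changed = True
    while changed:
        changed = False
        for I in list(C):
            for X in sorted(symbols):
                J = goto(I, X)
                if J and J not in C:
                    C.append(J)
                    changed = True
    return C

def slr_table():
    """Return (C, action, goto_table); missing action keys mean "error"."""
    C = items()
    action, goto_table = {}, {}

    def set_action(i, a, act):
        if action.get((i, a), act) != act:
            raise ValueError(f"not SLR(1): conflict in state {i} on {a!r}")
        action[i, a] = act

    for i, I in enumerate(C):
        for lhs, rhs, dot in I:
            if dot < len(rhs):
                if rhs[dot] not in GRAMMAR:          # rule 2a: shift on a terminal
                    set_action(i, rhs[dot], ("s", C.index(goto(I, rhs[dot]))))
            elif lhs == START:                       # rule 2c: [S' -> S.]
                set_action(i, "$", ("acc",))
            else:                                    # rule 2b: reduce on FOLLOW(A)
                for a in FOLLOW[lhs]:
                    set_action(i, a, ("r", (lhs, rhs)))
        for A in GRAMMAR:                            # rule 3: goto on nonterminals
            J = goto(I, A)
            if J:
                goto_table[i, A] = C.index(J)
    return C, action, goto_table

C, action, goto_table = slr_table()
print(len(C), len(action), len(goto_table))  # 12 36 9
```

For the expression grammar no conflict is raised, so the grammar is SLR(1); the resulting table has 36 action entries and 9 goto entries, and state 0 (rule 5) is the initial state.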

Example
Every SLR(1) grammar is unambiguous, but there are many unambiguous grammars that are not SLR(1). Consider the grammar with productions
S → L = R
S → R
L → *R
L → id
R → L
We can think of L and R as standing for l-value and r-value, respectively, and of * as an operator indicating "contents of".

I0 : S’  .S, S  .L=R, L  .*R, L  .id, S  .R, R  .L
I2 : S  L.=R, R  L. I3 : S  R. I4 : L  *.R, R  .L, L  .id, L  .*R I5 : L  id. I6 : S  L=.R, R  .L, L  .*R, L  .id I7 : L  *R. I8 : R  L. I9 : S  L=R.

Consider the set of items I2. The first item, S → L.=R, makes action[2, =] = shift 6. Since FOLLOW(L) = FOLLOW(R) = {=, $}, the second item, R → L., makes action[2, =] = reduce by R → L. So state 2 has a shift/reduce conflict on input =.
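The conflict in state 2 can be reproduced mechanically. This Python sketch builds I0 and I2 = goto(I0, L) for the augmented grammar and intersects the terminals that call for a shift with the FOLLOW sets of the completed items. FOLLOW is hard-coded from the values stated above, the names are hypothetical, and the check covers only shift/reduce conflicts (not reduce/reduce).

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}
# FOLLOW sets worked out by hand (assumption: not computed here).
FOLLOW = {"S": {"$"}, "L": {"=", "$"}, "R": {"=", "$"}}

def closure(items):
    J, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before nonterminal B
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)           # B -> .γ
                if item not in J:
                    J.add(item)
                    work.append(item)
    return frozenset(J)

def goto(I, X):
    return closure({(l, r, d + 1) for l, r, d in I if d < len(r) and r[d] == X})

def slr_conflicts(I):
    """Terminals on which I calls for both a shift and an SLR reduce."""
    shift_on = {rhs[dot] for lhs, rhs, dot in I
                if dot < len(rhs) and rhs[dot] not in GRAMMAR}
    reduce_on = set()
    for lhs, rhs, dot in I:
        if dot == len(rhs) and lhs != "S'":          # completed item A -> α.
            reduce_on |= FOLLOW[lhs]
    return shift_on & reduce_on

I0 = closure({("S'", ("S",), 0)})
I2 = goto(I0, "L")
print(slr_conflicts(I2))  # {'='}
```

The grammar is unambiguous, yet the SLR rule for reducing by R → L looks only at FOLLOW(R), which contains =, so the construction cannot decide between shifting and reducing on =.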
