
1 Chap 2. TOP-DOWN PARSING
We introduce the basic ideas behind top-down parsing and show how to construct an efficient non-backtracking form of top-down parsing called a predictive parser. We define the class of LL(1) grammars from which predictive parsers can be constructed automatically. Top-down parsing cannot be performed on left-recursive grammars.

2 Recursive-Descent Parsing
Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. We can also view it as an attempt to construct a parse tree for the input starting from the root and creating the nodes of the parse tree in preorder. Let's consider a general form of top-down parsing, called recursive descent, that may involve backtracking, that is, making repeated scans of the input. However, backtracking parsers are not seen frequently, since backtracking is rarely needed to parse programming language constructs.

3 Example
Consider the grammar
S → cAd
A → ab | a
and the input string w = cad. (Figure: two partial parse trees for w = cad, one in which A is expanded using A → ab, and one, after backtracking, in which A is expanded using A → a.)
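Below is a minimal Python sketch of this backtracking parse. The grammar is hard-coded, the function names are illustrative, and the "backtracking" here is simply trying A's second alternative from the same input position when the first fails; a fully general backtracking parser would also reconsider A's choice if the rest of S failed to match.

```python
# Recursive-descent parsing with backtracking for S -> cAd, A -> ab | a.
# Positions are indices into the input string; None signals a failed match.

def parse_A(s, i):
    """Try the A-productions in order, restarting from position i each time."""
    if s[i:i+2] == "ab":      # A -> a b
        return i + 2
    if s[i:i+1] == "a":       # A -> a   (tried after backtracking to position i)
        return i + 1
    return None

def parse_S(s, i=0):
    """S -> c A d"""
    if s[i:i+1] != "c":
        return None
    j = parse_A(s, i + 1)
    if j is None or s[j:j+1] != "d":
        return None
    return j + 1

w = "cad"
print(parse_S(w) == len(w))   # True: the whole input is derived from S
```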

4 Elimination of Left Recursion
A grammar is immediately left recursive if it has a nonterminal A such that there is a production A → Aα for some string α. In order to parse top-down, left recursion must be eliminated. Immediate left recursion can be eliminated very easily. Consider the production E → E + T. Suppose the procedure for E decides to apply this production. The right side begins with E, so the procedure for E is called recursively and the parser loops forever. A left-recursive production can be eliminated by rewriting the offending production.

5 Consider a nonterminal A with two productions
A → Aα | β
where α and β are sequences of terminals and nonterminals that do not start with A. For example, in E → E + T | T we have A = E, α = + T, and β = T. Repeated application of the production A → Aα builds up a sequence of α's to the right of A:
A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ ⋯
(Figure: the corresponding parse tree, growing down towards the left.)

6 When A is finally replaced by β, we have a β followed by a sequence of zero or more α's. The same effect can be achieved by rewriting the productions for A in the following manner:
A → βR
R → αR | ε
(Figure: the corresponding parse tree, with R's growing down towards the right.)

7 Here R is a new nonterminal. The production R → αR is right recursive, because this production for R has R itself as the last symbol on the right side. Right-recursive productions lead to trees that grow down towards the right.

8 Example Consider the following grammar for arithmetic expressions.
E → E + T | T
T → T * F | F
F → (E) | id
Eliminating the immediate left recursion from the productions for E and then for T, we obtain
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
(The productions for F are unchanged.)

9 No matter how many A-productions there are, we can eliminate immediate left recursion from them. Group the productions as
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with an A. Then replace the A-productions by
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
The nonterminal A generates the same strings as before but is no longer left recursive. The sketch below applies this rule mechanically.
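A short Python sketch of this transformation, under the assumption that a grammar is represented as a dict from a nonterminal to a list of production bodies (tuples of symbols, with () standing for ε); the function name and the priming convention for the new nonterminal are my own choices.

```python
# Eliminate immediate left recursion:
#   A -> A a1 | ... | A am | b1 | ... | bn
# becomes
#   A  -> b1 A' | ... | bn A'
#   A' -> a1 A' | ... | am A' | epsilon

def eliminate_immediate_left_recursion(grammar):
    new_grammar = {}
    for A, bodies in grammar.items():
        alphas = [b[1:] for b in bodies if b[:1] == (A,)]   # bodies A alpha_i
        betas  = [b for b in bodies if b[:1] != (A,)]       # bodies beta_j
        if not alphas:                    # A is not immediately left recursive
            new_grammar[A] = list(bodies)
            continue
        A_prime = A + "'"
        new_grammar[A] = [beta + (A_prime,) for beta in betas]
        new_grammar[A_prime] = [alpha + (A_prime,) for alpha in alphas] + [()]
    return new_grammar

# Applied to the arithmetic expression grammar of the previous slide:
expr = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
for head, bodies in eliminate_immediate_left_recursion(expr).items():
    print(head, "->", " | ".join(" ".join(b) if b else "ε" for b in bodies))
```

Running this prints E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, with F unchanged, matching slide 8.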

10 Nonrecursive Predictive Parsing
It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls. The key problem during predictive parsing is that of determining the production to be applied for a nonterminal. The nonrecursive parser looks up the production to be applied in a parsing table. In what follows, we shall see how the table can be constructed directly from certain grammars.

11 Predictive Parsing Program
(Figure: the model of a predictive parser. An input buffer holding a + b $ is read by the predictive parsing program, which uses a stack holding X Y Z $ and consults a parsing table M, writing to an output stream.)

12 A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream. The input buffer contains the string to be parsed, followed by $, a symbol used as a right endmarker to indicate the end of the input string. The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol of the grammar on top of $. The parsing table is a two-dimensional array M [A, a], where A is a nonterminal, and a is a terminal or the symbol $.

13 The parser is controlled by a program that behaves as follows. The program considers X, the symbol on top of the stack, and a, the current input symbol.
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.

14 Algorithm. Nonrecursive predictive parsing.
Input. A string w and a parsing table M for grammar G.
Output. If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method. Initially, the parser is in a configuration in which it has $S on the stack, with S, the start symbol of G, on top, and w$ in the input buffer.

15 set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a nonterminal */
        if M[X, a] = X → Y1 Y2 … Yk then begin
            pop X from the stack;
            push Yk, Yk–1, …, Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 … Yk
        end
        else error()
until X = $ /* stack is empty */
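The following Python sketch mirrors this driver loop for the arithmetic expression grammar, with the parsing table of slide 17 hard-coded as a dict; error recovery is reduced to raising an exception, and the names and data layout (tuples for production bodies, () for ε) are my own choices.

```python
# Table-driven predictive parser for E -> TE', E' -> +TE' | ε,
# T -> FT', T' -> *FT' | ε, F -> (E) | id.

M = {
    ("E", "id"): ("T", "E'"),      ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"),      ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",),          ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    stack = ["$", start]            # start symbol on top of $
    tokens = tokens + ["$"]         # input followed by the right endmarker
    ip = 0
    while True:
        X, a = stack[-1], tokens[ip]
        if X not in NONTERMINALS:   # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X}, saw {a}")
            stack.pop()
            if X == "$":
                return              # X = a = $: successful completion
            ip += 1
        else:
            body = M.get((X, a))
            if body is None:
                raise SyntaxError(f"error entry M[{X}, {a}]")
            print(X, "->", " ".join(body) if body else "ε")  # output the production
            stack.pop()
            stack.extend(reversed(body))   # push Yk ... Y1, so Y1 ends up on top

predictive_parse(["id", "+", "id", "*", "id"])
```

With input id + id * id this prints the productions of a leftmost derivation, i.e. the output column of the sequence of moves discussed on slide 18.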

16 Example. Consider the grammar for arithmetic expressions introduced earlier; its predictive parsing table is shown on the next slide.

17 The predictive parsing table M for this grammar (blank entries are errors):

Nonterminal | id       | +          | *           | (        | )       | $
E           | E → TE'  |            |             | E → TE'  |         |
E'          |          | E' → +TE'  |             |          | E' → ε  | E' → ε
T           | T → FT'  |            |             | T → FT'  |         |
T'          |          | T' → ε     | T' → *FT'   |          | T' → ε  | T' → ε
F           | F → id   |            |             | F → (E)  |         |

18 With input id + id * id, the predictive parser makes a sequence of moves (the original slide tabulates the stack, input, and output at each step). The input pointer points to the leftmost symbol of the string in the input column. If we observe the actions of this parser carefully, we see that it is tracing out a leftmost derivation for the input; that is, the productions output are those of a leftmost derivation.

19 FIRST and FOLLOW
The construction of a predictive parser is aided by two functions associated with a grammar G. These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing table for G, whenever possible. If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α). Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.

20 To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi–1), that is, Y1 … Yi–1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.

21 Now we can compute FIRST for any string X1 X2 … Xn as follows. Add to FIRST(X1 X2 … Xn) all the non-ε symbols of FIRST(X1). Also add the non-ε symbols of FIRST(X2) if ε is in FIRST(X1), the non-ε symbols of FIRST(X3) if ε is in both FIRST(X1) and FIRST(X2), and so on. Finally, add ε to FIRST(X1 X2 … Xn) if, for all i, FIRST(Xi) contains ε.
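A Python sketch of the rules on this slide and the previous one: FIRST sets for all symbols are computed by fixed-point iteration, then FIRST of a string is built from them. The grammar encoding (dict of tuples, () for ε) and the helper names are assumptions of the sketch, not part of the slides.

```python
# FIRST sets for the transformed arithmetic expression grammar.
EPS = "ε"
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}

def first_of_string(symbols, first):
    """FIRST(X1 X2 ... Xn): keep adding FIRST(Xi) while every earlier Xj derives ε."""
    result = set()
    for X in symbols:
        fx = first[X] if X in first else {X}   # FIRST(terminal) = {terminal}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)      # the whole string (possibly empty) can derive ε
    return result

def first_sets(grammar):
    """Apply the FIRST rules repeatedly until no set changes."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
print(FIRST["E"], FIRST["E'"], FIRST["T'"])
# FIRST(E) = {'(', 'id'}, FIRST(E') = {'+', 'ε'}, FIRST(T') = {'*', 'ε'}
```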

22 To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
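A companion sketch of these three FOLLOW rules, again by fixed-point iteration. The FIRST sets are taken as given (they match the example on the next slide), and the grammar encoding and helper names are the same assumptions as in the FIRST sketch.

```python
# FOLLOW sets for the transformed arithmetic expression grammar.
EPS = "ε"
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def first_of_string(symbols):
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})          # FIRST(terminal) = {terminal}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def follow_sets(grammar, start="E"):
    follow = {A: set() for A in grammar}
    follow[start].add("$")              # rule 1: $ is in FOLLOW(start symbol)
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:
                        continue        # FOLLOW is only defined for nonterminals
                    beta = body[i + 1:]
                    first_beta = first_of_string(beta)
                    before = len(follow[B])
                    follow[B] |= first_beta - {EPS}     # rule 2
                    if EPS in first_beta:               # rule 3 (β empty or β ⇒* ε)
                        follow[B] |= follow[A]
                    changed |= len(follow[B]) != before
    return follow

print(follow_sets(GRAMMAR))
# FOLLOW(E) = FOLLOW(E') = {')', '$'}, FOLLOW(T) = FOLLOW(T') = {'+', ')', '$'},
# FOLLOW(F) = {'+', '*', ')', '$'}
```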

23 Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Then
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

24 Construction of Predictive Parsing Tables
Algorithm. Construction of a predictive parsing table.
Input. Grammar G.
Output. Parsing table M.
Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, a] for each a in FOLLOW(A) (including $ if $ is in FOLLOW(A)).
4. Make each undefined entry of M be error.
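A sketch of this construction in Python, using the FIRST and FOLLOW sets of the previous example slide as given data; the dict-of-lists layout for M and the helper names are my own choices.

```python
# Build the predictive parsing table M for the arithmetic expression grammar.
EPS = "ε"
PRODUCTIONS = [
    ("E",  ("T", "E'")),
    ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T",  ("F", "T'")),
    ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F",  ("(", "E", ")")),  ("F",  ("id",)),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols):
    result = set()
    for X in symbols:
        fx = FIRST.get(X, {X})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def build_table(productions):
    M = {}
    for A, alpha in productions:
        first_alpha = first_of_string(alpha)
        targets = first_alpha - {EPS}
        if EPS in first_alpha:          # step 3: use FOLLOW(A) for the ε case
            targets |= FOLLOW[A]
        for a in targets:               # step 2: one entry per predicted terminal
            M.setdefault((A, a), []).append((A, alpha))
    return M                            # undefined entries stay errors (step 4)

M = build_table(PRODUCTIONS)
print(M[("E", "id")])    # [('E', ('T', "E'"))]   i.e. E  -> TE'
print(M[("E'", "$")])    # [("E'", ())]           i.e. E' -> ε
```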

25 Example. Let's apply the algorithm above to the arithmetic expression grammar. Since FIRST(TE') = FIRST(T) = {(, id}, production E → TE' causes M[E, (] and M[E, id] to acquire the entry E → TE'. Production E' → +TE' causes M[E', +] to acquire the entry E' → +TE'. Production E' → ε causes M[E', )] and M[E', $] to acquire E' → ε, since FOLLOW(E') = {), $}. Continuing in this way, we obtain the parsing table we have already seen.

26 LL(1) Grammars
If G is left recursive or ambiguous, then M will have at least one multiply-defined entry.
Example. Consider the grammar
S → iEtSS' | a
S' → eS | ε
E → b
FIRST(S) = {i, a}
FIRST(S') = {e, ε}
FIRST(E) = {b}
FOLLOW(S) = FOLLOW(S') = {e, $}
FOLLOW(E) = {t}

27 The parsing table M for this grammar (blank entries are errors):

Nonterminal | a      | b      | e                | i           | t | $
S           | S → a  |        |                  | S → iEtSS'  |   |
S'          |        |        | S' → ε, S' → eS  |             |   | S' → ε
E           |        | E → b  |                  |             |   |

28 The entry M[S', e] contains both S' → ε and S' → eS: e is in FIRST(eS), and e is also in FOLLOW(S') = {e, $}, so the ε-production is entered there as well. The grammar is ambiguous, and the ambiguity is manifested by a choice of which production to use when an e (else) is seen.
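A small Python sketch that exposes this conflict mechanically. For this grammar every non-ε production body begins with a terminal, so FIRST of a body is just its first symbol; the FOLLOW sets are the ones listed on slide 26, and the data layout is my own.

```python
# Detect multiply-defined entries in the predictive parsing table for
# S -> iEtSS' | a,  S' -> eS | ε,  E -> b.
PRODUCTIONS = [
    ("S",  ("i", "E", "t", "S", "S'")), ("S", ("a",)),
    ("S'", ("e", "S")),                 ("S'", ()),
    ("E",  ("b",)),
]
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

M = {}
for A, body in PRODUCTIONS:
    # ε-production: enter it under every terminal in FOLLOW(A);
    # otherwise enter it under the terminal that begins the body.
    targets = FOLLOW[A] if body == () else {body[0]}
    for a in targets:
        M.setdefault((A, a), []).append((A, body))

conflicts = {entry: prods for entry, prods in M.items() if len(prods) > 1}
print(conflicts)
# The only multiply-defined entry is M[S', e], holding both S' -> eS and S' -> ε,
# so the grammar is not LL(1).
```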

29 A grammar whose parsing table has no multiply-defined entries is said to be LL(1). The first "L" stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions.

