Chapter 4 - Parsing CSCE 343.

Presentation on theme: "Chapter 4 - Parsing CSCE 343."— Presentation transcript:

1 Chapter 4 - Parsing CSCE 343

2 The Parsing Problem
Goals of the parser:
  Find syntax errors
    Produce good error messages
    Recover quickly to find as many errors as possible
  Produce the parse tree
    Parse tree is used for language translation
Parsers that work for any unambiguous grammar are O(n^3)
We restrict the grammars and examine O(n) parsers

3 Parsers
Two categories of parsers:
  Top down
    Produce the parse tree beginning at the root
    Order is a leftmost derivation
    Builds the parse tree in pre-order
  Bottom up
    Produce the parse tree beginning at the leaves
    Order is the reverse of a rightmost derivation
Parsers look only one token ahead
Where do the tokens come from?

4 Top-Down Parsers
Basic process:
  Given a sentential form xAα, the parser must find the correct A-rule to get the next sentential form in the leftmost derivation
  It can only look ahead one token
Top-down algorithms (LL algorithms):
  Recursive descent: coded implementation
  Table driven implementation
LL: Left-to-right scan of input, Leftmost derivation

5 Recursive-Descent Parsing
Well suited for EBNF, but only works for restricted forms of EBNF
One subprogram (function/method) for each nonterminal
  The subprogram parses sentences generated by its nonterminal and creates the parse tree
All subprograms have access to:
  Lexical analyzer method lex(), which puts the next token code in nextToken

6 Recursive Descent For S
//method for S --- S has only one rule
//S → aBc
S():
  if (nextToken == a)
    lex()
    B()
  else error
  if (nextToken == c)
    lex()
    return
  else error

7 Example
E → A { (+ | -) A }
A → F { (* | /) F }
F → ( E ) | a | b | c
Write the code for E, A, and F
Trace a call to E with this string: a + ( b - c ) * a
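One possible answer to the exercise above, as a minimal Python sketch rather than the course's reference solution. It assumes single-character tokens, uses a hypothetical end marker "$", and returns a nested-tuple operator tree instead of a full parse tree. The `trace` list records the order in which the subprograms are entered, which is what a hand trace of the call to E would show.

```python
# Recursive-descent sketch for the EBNF grammar on this slide:
#   E -> A { (+ | -) A }
#   A -> F { (* | /) F }
#   F -> ( E ) | a | b | c
# Assumptions: one character per token, "$" marks end of input.

class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text):
        self.tokens = [ch for ch in text if not ch.isspace()] + ["$"]
        self.pos = 0
        self.next_token = self.tokens[0]
        self.trace = []  # subprogram names, in the order they are entered

    def lex(self):
        """Advance to the next token, mirroring lex() on the slides."""
        self.pos += 1
        self.next_token = self.tokens[self.pos]

    def E(self):
        self.trace.append("E")
        node = self.A()
        while self.next_token in ("+", "-"):
            op = self.next_token
            self.lex()
            node = (op, node, self.A())
        return node

    def A(self):
        self.trace.append("A")
        node = self.F()
        while self.next_token in ("*", "/"):
            op = self.next_token
            self.lex()
            node = (op, node, self.F())
        return node

    def F(self):
        self.trace.append("F")
        if self.next_token == "(":
            self.lex()
            node = self.E()
            if self.next_token != ")":
                raise ParseError("expected ')' at token %d" % self.pos)
            self.lex()
            return node
        if self.next_token in ("a", "b", "c"):
            node = self.next_token
            self.lex()
            return node
        raise ParseError("unexpected %r at token %d" % (self.next_token, self.pos))


def parse(text):
    """Parse text as an E; return (tree, trace) or raise ParseError."""
    parser = Parser(text)
    tree = parser.E()
    if parser.next_token != "$":
        raise ParseError("trailing input at token %d" % parser.pos)
    return tree, parser.trace


tree, trace = parse("a + ( b - c ) * a")
print(tree)
print(" ".join(trace))
```

Each subprogram decides what to do from `next_token` alone, which is the one-token lookahead the slides require; the `while` loops correspond directly to the EBNF braces.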

8 RD Parsing / EBNF Restrictions
If there is more than one RHS for a nonterminal, A → α1 | α2, the parser must determine which to use
  Choose based on the next token of input
  The next token is compared with the first token that can be generated by each RHS
    First(α) = { a | α ⇒* aβ }
  If no match, error
Pairwise disjointness test:
  First(α1) ∩ First(α2) = Ø
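The First sets and the pairwise disjointness test can be computed mechanically. The sketch below is one way to do it, under these assumptions: a grammar is a dict from nonterminal to a list of right-hand sides, each RHS is a list of symbols, any symbol that is not a key of the dict is treated as a terminal, and an empty list stands for ε. The function names are illustrative only.

```python
EPS = "ε"  # marker used inside First sets for "can derive the empty string"


def first_of(rhs, grammar, first):
    """First set of a symbol sequence, given First sets of the nonterminals."""
    result = set()
    for sym in rhs:
        if sym not in grammar:          # terminal: it is the first token
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:       # sym cannot vanish, so stop here
            return result
    result.add(EPS)                     # every symbol can vanish (or rhs is empty)
    return result


def first_sets(grammar):
    """Compute First for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, grammar, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def pairwise_disjoint(nt, grammar):
    """True if the First sets of all RHSs of nt are pairwise disjoint."""
    first = first_sets(grammar)
    sets = [first_of(rhs, grammar, first) for rhs in grammar[nt]]
    return all(
        sets[i].isdisjoint(sets[j])
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
    )


g1 = {"A": [["a"], ["b", "B"], ["c", "A", "b"]]}   # First sets {a}, {b}, {c}
g2 = {"A": [["a"], ["a", "B"]]}                    # both RHSs start with a
print(pairwise_disjoint("A", g1))
print(pairwise_disjoint("A", g2))
```

The first grammar passes because each RHS begins with a different terminal; the second fails because one token of lookahead cannot separate `a` from `aB`.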

9 RD Parsing / EBNF Restrictions
Left recursion problem
  A grammar with direct or indirect left recursion cannot be parsed by a top-down parser
    Direct: A → Ab
    Indirect: B → Ca, C → Bc
  The grammar can be modified to remove left recursion
    A → AbT | AAX | bc | X | aa
    becomes
    A → bcA’ | XA’ | aaA’
    A’ → bTA’ | AXA’ | ε
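The rewrite for direct left recursion follows a fixed pattern: for A → Aα1 | ... | β1 | ..., produce A → βi A’ and A’ → αi A’ | ε. A small sketch of that pattern, assuming each RHS is a list of symbols and an empty list stands for ε (the function name is illustrative, and indirect left recursion is not handled):

```python
def remove_direct_left_recursion(nt, alternatives):
    """Rewrite the rules of nt so that no RHS starts with nt itself.

    Returns a dict of new rules. An empty list [] stands for ε.
    """
    new_nt = nt + "'"
    # Tails of the left-recursive alternatives (the α parts of A -> A α).
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nt]
    # Alternatives that do not start with nt (the β parts).
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: alternatives}       # nothing to do
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


rules = remove_direct_left_recursion(
    "A",
    [["A", "b", "T"], ["A", "A", "X"], ["b", "c"], ["X"], ["a", "a"]],
)
for lhs, alternatives in rules.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

Applied to the slide's grammar, this reproduces the rewritten rules shown above.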

10 Grammars for RD Parsing
Must pass the pairwise disjointness test
  Can often use left factoring to resolve the problem
    <var> → <ident> | <ident> [<expr>]   //array reference
    becomes
    <var> → <ident> <new>
    <new> → ε | [<expr>]
Cannot have direct or indirect left recursion
  This problem can always be solved, but could get messy
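Left factoring can be sketched in a few lines for the simplest case, where all alternatives of a nonterminal share a common prefix. The function below is a deliberately limited illustration: it factors only a prefix common to every alternative, the caller supplies the name of the new nonterminal, and an empty list stands for ε.

```python
def left_factor(nt, alternatives, new_nt):
    """Factor out the longest prefix shared by ALL alternatives of nt.

    Returns a dict of rules. An empty list [] stands for ε.
    """
    prefix = []
    # zip stops at the shortest alternative, so the prefix never overruns one.
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix:
        return {nt: alternatives}       # no common prefix, leave as is
    return {
        nt: [prefix + [new_nt]],
        new_nt: [rhs[len(prefix):] for rhs in alternatives],
    }


rules = left_factor(
    "<var>",
    [["<ident>"], ["<ident>", "[", "<expr>", "]"]],
    "<new>",
)
for lhs, alternatives in rules.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

A general left-factoring pass would group alternatives by their first symbol and repeat until no two alternatives of any nonterminal share a prefix; that is omitted here to keep the sketch short.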

11 RD Parsing
For each grammar, determine if the RHSs of A are pairwise disjoint
  A → a | bB | cAb
  A → a | aB
  A → B | C
  B → bC
  C → ( E ) | d

12 Bottom up Parsing
Bottom up parsing does not have the same restrictions as top down parsing:
  left recursion
  pairwise disjoint RHS
  sentential forms
Uses BNF, not EBNF
Parse order is the reverse of a rightmost derivation

13 Bottom-up Parsing
Problem:
  Find the correct RHS in a right-sentential form that reduces to the previous right-sentential form (the handle)
Def: β is the handle of the right sentential form γ = αβw if and only if S ⇒*rm αAw ⇒rm αβw
Look at a parse tree
  phrase: the leaf nodes below an internal node of the parse tree
  simple phrase: the leaf nodes below an internal node whose children are all leaves (one level above the leaves)
Handle: the handle of a right-sentential form is its leftmost simple phrase
Given a parse tree, it is easy to find the handle

14 Bottom Up Parsing
Example 1 (Textbook Figure 4.3)
Example 2
  S → AB
  B → bBc | bc
  A → aAb | ab
  Find the parse tree, rightmost derivation, phrases, simple phrases, and handle for aabbbbcc
Example 3
  E → A { (+ | -) A }
  A → F { (* | /) F }
  F → ( E ) | a | b | c
  Find the (partial) parse tree, rightmost derivation, phrases, simple phrases, and handle for A+(E)*a
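Once a parse tree is available, the phrases, simple phrases, and handle can be read off it directly. The sketch below does this for Example 2 (the grammar S → AB, B → bBc | bc, A → aAb | ab with the sentence aabbbbcc). The tree encoding is an assumption made for illustration: a leaf is a string and an internal node is a pair (label, list of children). The tree itself was built by hand from the grammar.

```python
def analyze(tree):
    """Return (sentential_form, phrases, simple_phrases, handle) for a parse tree."""
    leaves = []
    spans = []  # (start, end, is_simple) for every internal node

    def walk(node):
        if isinstance(node, str):       # leaf: terminal or nonterminal symbol
            leaves.append(node)
            return
        _label, children = node
        start = len(leaves)
        for child in children:
            walk(child)
        # A simple phrase comes from a node whose children are all leaves.
        is_simple = all(isinstance(child, str) for child in children)
        spans.append((start, len(leaves), is_simple))

    walk(tree)
    spans.sort()                        # left to right by starting position
    phrases = ["".join(leaves[s:e]) for s, e, _ in spans]
    simple = ["".join(leaves[s:e]) for s, e, is_simple in spans if is_simple]
    handle = simple[0]                  # leftmost simple phrase
    return "".join(leaves), phrases, simple, handle


# Hand-built parse tree of aabbbbcc.
tree = (
    "S",
    [
        ("A", ["a", ("A", ["a", "b"]), "b"]),
        ("B", ["b", ("B", ["b", "c"]), "c"]),
    ],
)
form, phrases, simple, handle = analyze(tree)
print("sentential form:", form)
print("phrases:", phrases)
print("simple phrases:", simple)
print("handle:", handle)
```

The handle found this way is the RHS of the last rule applied in the rightmost derivation S ⇒ AB ⇒ AbBc ⇒ Abbcc ⇒ aAbbbcc ⇒ aabbbbcc, which is the first reduction a bottom-up parser performs.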

15 Grammars and PDA
Formal language theory has a machine called a pushdown automaton (PDA) that can recognize the strings generated by a CFG
These machines are nondeterministic
Knuth realized that this problem could be resolved by adding a finite number of states to the stack
  Canonical LR algorithm (Knuth 1965)

16 Shift-Reduce Algorithms
Canonical LR (Knuth 1965):
  Reduce: replace the handle on the top of the parse stack with its LHS
  Shift: move the next input token to the top of the parse stack
LR parsing table is constructed with a tool (yacc)

17 Canonical LR State:

18 Parser Table

19 LR Example
E → E + T
E → T
T → T * F
T → F
F → ( E )
F → id
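The shift and reduce steps can be seen in a small table-driven driver for this grammar. The parsing table from the original slide did not survive in the transcript, so the ACTION and GOTO tables below are an assumption: they are the SLR(1) table usually given for this grammar in textbook treatments, with the rules numbered 1 to 6 in the order listed above. Tokens are passed as a list of strings and "$" is appended as the end marker.

```python
# rule number -> (LHS, length of RHS)
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# sN = shift and go to state N, rN = reduce by rule N, acc = accept
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the rule numbers in the order they are reduced, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [0]                         # stack of states only
    pos = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:              # blank table entry: syntax error
            raise SyntaxError("unexpected %r at token %d" % (tokens[pos], pos))
        if action == "acc":
            return reductions
        number = int(action[1:])
        if action[0] == "s":            # shift: push state, consume token
            stack.append(number)
            pos += 1
        else:                           # reduce: pop |RHS| states, push GOTO
            lhs, length = RULES[number]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(number)


print(lr_parse(["id", "+", "id", "*", "id"]))
```

Read in reverse, the list of reductions is a rightmost derivation of the input, which is the parse order described for bottom-up parsers. An erroneous input is rejected at the first token for which the table has no entry.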

20 Advantages of LR Parsers
They will work for nearly all grammars that describe programming languages.
They work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser.
They can detect syntax errors as soon as it is possible.
The LR class of grammars is a superset of the class parsable by LL parsers.

