Presentation on theme: "Parsing II : Top-down Parsing"— Presentation transcript:
1 Parsing II : Top-down Parsing
  Lecture 7, CS 4318/5531, Spring 2010
  Apan Qasem, Texas State University
  *some slides adopted from Cooper and Torczon
2 Review
  Parsing goals
  Context-free grammars
  Derivations
    Sequence of production rules leading to a sentence
    Leftmost derivations
    Rightmost derivations
  Parse trees
    Tree representation of a derivation
    Transforms into IR
  Precedence in languages
    Can manipulate grammar to enforce precedence
    Cannot do this with REs
  Are there cases where we have a rightmost derivation but not a leftmost one, or vice versa?
3 Chomsky Hierarchy
  [Figure: nested language classes and their recognizers. RL: DFA/NFA. CFL: PDA, with LL(1) and LR(1) inside (many parsers). CSL. Unrestricted (recursively enumerable): Turing machines.]
  Noam Chomsky, Three Models for the Description of Language, 1956
4 Today
  Top-down parsing algorithm
  Issues in parsing
    Ambiguity
    Backtracking
    Left recursion
5 Another Derivation for x – 2 * y
  Can we categorize this as a leftmost or a rightmost derivation?
6 Two Leftmost Derivations for x – 2 * y
  Original choice
  New choice
  Is this a problem for parsers?
    Implies non-determinism, difficult to automate
7 Ambiguous Grammar
  If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous
  If a grammar has more than one rightmost derivation for a single sentential form, the grammar is ambiguous
  The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous grammar
8 Ambiguity Example : The Dangling else
  Classic example
    Stmt → if Expr then Stmt
         | if Expr then Stmt else Stmt
         | … other stmts …
9 Ambiguity Example : Derivation
  Input: if E1 then if E2 then S1 else S2
  First derivation
    Rule  Sentential form
    -     stmt
    2     if expr then stmt else stmt
    1     if expr then if expr then stmt else stmt
    -     if E1 then if E2 then S1 else S2
  Second derivation
    Rule  Sentential form
    -     stmt
    1     if expr then stmt
    2     if expr then if expr then stmt else stmt
    -     if E1 then if E2 then S1 else S2
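10 Ambiguity Example : Parse Trees
  Input: if E1 then if E2 then S1 else S2
  [Figure: two parse trees for the input. First tree: production 2, then production 1 (else S2 attaches to the outer if E1). Second tree: production 1, then production 2 (else S2 attaches to the inner if E2).]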
11 Resolving The Dangling Else Problem
  Match else to the innermost unmatched if
    Stmt     → if Expr then Stmt
             | if Expr then WithElse else Stmt
             | … other stmts …
    WithElse → if Expr then WithElse else WithElse
  Once into WithElse we cannot generate an unmatched else
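The "match else to the innermost unmatched if" rule can also be sketched outside the grammar. The block below is a minimal illustration, not the grammar rewrite from the slide: a hand-written recursive-descent routine (the name `parse_stmt` and the tuple-shaped tree are assumptions made for this sketch) that greedily takes an `else` as soon as it sees one, which attaches it to the nearest enclosing `if`. Expressions and other statements are assumed to be single tokens.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos].

    Returns (tree, next_pos). Assumption for this sketch: every Expr and
    every non-if statement is a single token such as "E1" or "S1".
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        # Any other statement: consume exactly one token.
        return tokens[pos], pos + 1
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then' after the condition")
    cond = tokens[pos + 1]
    then_branch, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: an 'else' seen here belongs to this (innermost) if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos


tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)
```

For the input from the example slides, the `else S2` ends up inside the inner `if E2` node, which is the tree the rewritten grammar is meant to select.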
12 Deeper Ambiguity
  Ambiguity usually refers to confusion in the CFG
  Overloading can create deeper ambiguity
    a = f(17)
  The above code is fine in C but in Fortran it’s ambiguous
    f could be either a function or a subscripted variable
  Disambiguating this one requires context
    Need values of declarations
    Really an issue of type, not context-free syntax
  Requires an extra-grammatical solution (not in CFG)
  Must handle these with a different mechanism
    Step outside grammar rather than use a more complex grammar
13 Dealing with Ambiguity
  Ambiguity arises from two distinct sources
    Confusion in the context-free syntax (if-then-else)
    Confusion that requires context to resolve (overloading)
  Resolving ambiguity
    To remove context-free ambiguity, rewrite the grammar
    To handle context-sensitive ambiguity takes cooperation
      Knowledge of declarations, types, …
      Accept a superset of L(G) & check it by other means
      This is a language design problem
  In practice, most compilers will accept an ambiguous grammar
    Parsing techniques that “do the right thing”
    i.e., always select the same derivation
14 Detecting Ambiguity
  Can we come up with a rule for detecting ambiguity in CFGs?
  Let ω be a string in L(G)
  Need to show
    A ⇒* α1 ⇒* ω and
    B ⇒* α2 ⇒* ω
  Turns out this is undecidable!
15 Parsing Goal
  Is there a derivation that produces a string of terminals that matches the input string?
  Answer this question by attempting to build a parse tree
16 Two Approaches to Parsing
  Top-down parsers (LL(1), recursive descent)
    Start at the root of the parse tree and grow toward leaves
    At each step pick a re-write rule to apply
    When the sentential form consists of only terminals, check if it matches the input
  Bottom-up parsers (LR(1))
    Start at the leaves and grow toward root
    At each step consume input string and find a matching rule to create parent node in parse tree
    When a node with the start symbol is created we are done
  Very high-level sketch, lots of holes
    Plug in the holes as we go along
17 Top-down Parsing Algorithm
  Construct the root node of the parse tree with the start symbol
  Repeat until input string matches fringe
    Pick a re-write rule to apply
  Start symbol
    Also called goal symbol (comes from bottom-up parsing)
  Fringe
    Leaf nodes from left to right (order is important)
    At any stage of the construction they can be labeled with both terminals and non-terminals
18 Top-down Parsing Algorithm
  Construct the root node of the parse tree with the start symbol
  Repeat until input string matches fringe
    Pick a re-write rule to apply
      Need to expand on this step
19 Top-down Parsing Algorithm
  Construct the root node of the parse tree with the start symbol
  Repeat
    Pick the leftmost node on the fringe labeled with an NT to expand
    If the expansion adds a terminal to the leftmost node of the fringe, match the terminal with the input symbol and, if there is a match, move the cursor on the input string
  Until fringe consists of only terminals
  What type of derivation are we doing?
20 Selecting The Right Rules
  What re-write rule do we pick?
  Can specify leftmost or rightmost NT
    Sentential form: a B C d b A
    a B C d b A (Leftmost : pick B to re-write)
    a B C d b A (Rightmost : pick A to re-write)
  Solves one problem : which NT to re-write
  But we can still have multiple options for each NT
    B → a | b | c
  Grammar does not need to be ambiguous for this to happen
  Different derivations may lead to different strings in (or not in) the language
  What happens if we pick the wrong re-write rule?
21 Back to the Expression Grammar
  Add the start symbol
  Enforce arithmetic precedence
22 Example : Problematic Parse of x – 2 * y
  [Figure: partial parse tree with nodes Expr, Term, +, Fact., <id,x>]
  Leftmost derivation, choose productions in an order that exposes problems
23 Example : Problematic Parse of x – 2 * y
  [Figure: partial parse tree with nodes Expr, Term, +, Fact., <id,x>]
  Followed legal production rules but “–” doesn’t match “+”
  The parser must backtrack to the second re-write applied
24 Example : Problematic Parse of x – 2 * y
  [Figure: partial parse tree with nodes Expr, Term, –, Fact., <id,x>]
25 Example : Problematic Parse of x – 2 * y
  [Figure: partial parse tree with nodes Expr, Term, –, Fact., <id,x>]
  This time, “–” and “–” matched
  We can advance past “–” to look at “2”
  Now, we need to expand Term, the last NT on the fringe
26 Example : Problematic Parse of x – 2 * y
  [Figure: parse tree with nodes Expr, Term, –, Fact., <id,x>, <num,2>]
27 Example : Problematic Parse of x – 2 * y
  [Figure: parse tree with nodes Expr, Term, –, Fact., <id,x>, <num,2>]
  Where are we?
    “2” matches “2”
    We have more input, but no NTs left to expand
    The expansion terminated too soon
  This is also a problem!
28 Example : Problematic Parse of x – 2 * y
  [Figure: parse tree with nodes Expr, Term, –, Fact., <id,x>, <num,2>, *, <id,y>]
  This time, we matched & consumed all the input
  Success!
29 Backtracking
  Whenever we have multiple production rules for the same NT, there is a possibility that our parser might choose the wrong one
  To get around this problem, most parsers will do backtracking
  If the parser realizes that there is no match, it will go back and try other options
  Only when all the options have been tried will the parser reject an input string
  In a way, the parser is simulating all possible paths
  Does this remind you of something we have seen before?
30 Top-down Parsing Algorithm with Backtracking
  Another stab at the algorithm :
  Construct the root node of the parse tree with the start symbol
  Repeat
    At a node labeled A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child
    If the expansion adds a terminal to the leftmost node of the fringe, attempt to match the terminal with the input symbol
      if there is a match, move the cursor on the input string
      else backtrack
    Find the next node to be expanded
  Until fringe consists of only terminals
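The loop above can be sketched as a small backtracking recognizer. This is an illustration under stated assumptions, not the lecture's implementation: the grammar table `GRAMMAR` is an assumed right-recursive expression grammar (chosen so the search terminates), tokens are plain strings such as "id" and "num", and the names `derive` and `accepts` are made up for this sketch. The list `symbols` plays the role of the not-yet-matched part of the fringe; a Python generator that runs out of alternatives is what performs the backtracking.

```python
# Assumed right-recursive expression grammar; an empty list stands for ε.
GRAMMAR = {
    "Goal":   [["Expr"]],
    "Expr":   [["Term", "Expr'"]],
    "Expr'":  [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term":   [["Factor", "Term'"]],
    "Term'":  [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["num"], ["id"]],
}


def derive(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Leftmost NT on the fringe: try each production in order.
        for rhs in GRAMMAR[first]:
            yield from derive(rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == first:
        # Terminal matches the input symbol: move the cursor.
        yield from derive(rest, tokens, pos + 1)
    # Otherwise: mismatch, yield nothing so the caller tries its next option.


def accepts(tokens):
    """True if some sequence of choices consumes the whole input."""
    return any(end == len(tokens) for end in derive(["Goal"], tokens, 0))


print(accepts(["id", "-", "num", "*", "id"]))  # tokens for x - 2 * y
```

The input is rejected only after every alternative has been exhausted, which mirrors the point made on the backtracking slide.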
31 Another Possible Parse of x – 2 * y
  This doesn’t terminate
    Wrong choice of expansion leads to non-termination
    Non-termination is a bad property for a parser to have
    Parser must make the right choice
  consuming no input!
32 Left Recursion
  Top-down parsers cannot handle left-recursive grammars
  Formally,
    A grammar is left recursive if ∃ A ∈ NT such that ∃ a sequence of productions A ⇒+ Aα, for some string α ∈ (NT ∪ T)+
  Our expression grammar is left recursive
    This can lead to non-termination in a top-down parser
    For a top-down parser, any recursion must be right recursion
    We would like to convert the left recursion to right recursion
  Non-termination is a bad property in any part of a compiler
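The definition can be checked mechanically for small grammars. The sketch below is one possible way to do it, under the assumption that the grammar has no epsilon productions (so only the first symbol of each right-hand side matters). The function name `left_recursive_nonterminals` and the dictionary encoding of the grammar are illustrative choices, and the expression grammar used in the usage example is the assumed classic form. A non-terminal is reported when it can reach itself as the leftmost symbol of some derivation, which covers both direct and indirect left recursion.

```python
def left_recursive_nonterminals(grammar):
    """Return the set of NTs A for which A =>+ A ... is possible.

    Assumes no epsilon productions, so only the first rhs symbol is inspected.
    """
    # leads[A] = non-terminals that can appear first in one expansion of A
    leads = {
        nt: {rhs[0] for rhs in productions if rhs and rhs[0] in grammar}
        for nt, productions in grammar.items()
    }
    found = set()
    for nt in grammar:
        seen, stack = set(), list(leads[nt])
        while stack:
            current = stack.pop()
            if current == nt:
                found.add(nt)          # nt reached itself in leftmost position
                break
            if current not in seen:
                seen.add(current)
                stack.extend(leads[current])
    return found


expression_grammar = {
    "Goal":   [["Expr"]],
    "Expr":   [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]],
    "Term":   [["Term", "*", "Factor"], ["Term", "/", "Factor"], ["Factor"]],
    "Factor": [["num"], ["id"]],
}
print(sorted(left_recursive_nonterminals(expression_grammar)))
```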
33 Eliminating Left Recursion
  To remove left recursion, we can transform the grammar
  Consider a grammar fragment of the form
    Foo → Foo α
        | β
    where neither α nor β start with Foo
  We can rewrite this as
    Foo → β Bar
    Bar → α Bar
        | ε
    where Bar is a new non-terminal
  This accepts the same language, but uses only right recursion
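A minimal sketch of this transformation follows. The encoding is an assumption made for illustration: each right-hand side is a tuple of symbols, the empty tuple stands for ε, and the new non-terminal (Bar on the slide) is named by appending a prime to the original name. The function name `eliminate_immediate` is likewise hypothetical.

```python
EPSILON = ()  # the empty right-hand side stands for ε


def eliminate_immediate(nt, productions):
    """Rewrite  nt -> nt α | β  as  nt -> β nt'  and  nt' -> α nt' | ε."""
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: list(productions)}      # nothing to do
    new_nt = nt + "'"                        # plays the role of Bar
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [EPSILON],
    }


result = eliminate_immediate(
    "Expr",
    [("Expr", "+", "Term"), ("Expr", "-", "Term"), ("Term",)],
)
for lhs, alternatives in result.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

Applied to an assumed Expr fragment of the expression grammar, this yields Expr → Term Expr' and Expr' → + Term Expr' | - Term Expr' | ε.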
34 Eliminating Left Recursion
  The expression grammar contains two cases of left recursion
  We can eliminate both of them without changing the language
35 Eliminating Left Recursion
  These fragments use only right recursion
  They retain the original left associativity
36 Eliminating Left Recursion
  This grammar is correct, if somewhat non-intuitive.
  It is left associative, as was the original
  A top-down parser will terminate using it.
  A top-down parser may need to backtrack with it.
37 Eliminating Left Recursion
  The transformation eliminates immediate left recursion
  What about more general, indirect left recursion?
  The general algorithm:
    arrange the NTs into some order A1, A2, …, An
    for i ← 1 to n
      for s ← 1 to i – 1
        replace each production Ai → As γ with Ai → δ1 γ | δ2 γ | … | δk γ,
          where As → δ1 | δ2 | … | δk are all the current productions for As
      eliminate any immediate left recursion on Ai using the direct transformation
  This assumes that the initial grammar has
    no cycles (Ai ⇒+ Ai)
    no epsilon productions
38 Eliminating Left Recursion
  How does this algorithm work?
  Impose arbitrary order on the non-terminals
  Outer loop cycles through NT in order
  Inner loop ensures that a production expanding Ai has no non-terminal As in its rhs, for s < i
  Last step in outer loop converts any direct recursion on Ai to right recursion using the transformation shown earlier
  New non-terminals are added at the end of the order and have no left recursion
  At the start of the ith outer loop iteration
    For all k < i, no production that expands Ak contains a non-terminal As in its rhs, for s < k
39 Example : Eliminating Left Recursion
  Order of symbols: G, E, T
    G → E
    E → E + T
    E → T
    T → E - T
    T → id
40 Example : Eliminating Left Recursion
  Order of symbols: G, E, T
  1. Ai = G:           G → E;  E → E + T;  E → T;  T → E - T;  T → id
41 Example : Eliminating Left Recursion
  Order of symbols: G, E, T
  1. Ai = G:           G → E;  E → E + T;  E → T;  T → E - T;  T → id
  2. Ai = E:           G → E;  E → T E';  E' → + T E';  E' → ε;  T → E - T;  T → id
42 Example : Eliminating Left Recursion
  Order of symbols: G, E, T
  1. Ai = G:           G → E;  E → E + T;  E → T;  T → E - T;  T → id
  2. Ai = E:           G → E;  E → T E';  E' → + T E';  E' → ε;  T → E - T;  T → id
  3. Ai = T, As = E:   G → E;  E → T E';  E' → + T E';  E' → ε;  T → T E' - T;  T → id
43 Example : Eliminating Left Recursion
  Order of symbols: G, E, T
  1. Ai = G:           G → E;  E → E + T;  E → T;  T → E - T;  T → id
  2. Ai = E:           G → E;  E → T E';  E' → + T E';  E' → ε;  T → E - T;  T → id
  3. Ai = T, As = E:   G → E;  E → T E';  E' → + T E';  E' → ε;  T → T E' - T;  T → id
  4. Ai = T:           G → E;  E → T E';  E' → + T E';  E' → ε;  T → id T';  T' → E' - T T';  T' → ε
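The four steps of this example can be reproduced with a short sketch of the general algorithm. The encoding is an assumption for illustration only: right-hand sides are tuples of symbols, the empty tuple stands for ε, new non-terminals are named by appending a prime, and `eliminate_left_recursion` is a hypothetical function name. As on the algorithm slide, the input grammar is assumed to have no cycles and no epsilon productions.

```python
def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion, processing NTs in `order`."""
    g = {nt: list(productions) for nt, productions in grammar.items()}
    for i, a_i in enumerate(order):
        # Inner loop: substitute earlier NTs that appear first in a rhs of a_i.
        for a_s in order[:i]:
            expanded = []
            for rhs in g[a_i]:
                if rhs and rhs[0] == a_s:
                    expanded.extend(delta + rhs[1:] for delta in g[a_s])
                else:
                    expanded.append(rhs)
            g[a_i] = expanded
        # Last step: direct transformation on a_i (a_i -> a_i α | β).
        alphas = [rhs[1:] for rhs in g[a_i] if rhs and rhs[0] == a_i]
        betas = [rhs for rhs in g[a_i] if not rhs or rhs[0] != a_i]
        if alphas:
            new_nt = a_i + "'"
            g[a_i] = [beta + (new_nt,) for beta in betas]
            g[new_nt] = [alpha + (new_nt,) for alpha in alphas] + [()]
    return g


example = {
    "G": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("E", "-", "T"), ("id",)],
}
for lhs, alternatives in eliminate_left_recursion(example, ["G", "E", "T"]).items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

With the order G, E, T the result has the same productions as step 4 of the example: T → id T' and T' → E' - T T' | ε, alongside the unchanged G and the rewritten E and E'.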
44 Detecting Ambiguity
    A → aA | B
    B → bB | b
  One leftmost derivation
    A ⇒ aA ⇒ aB ⇒ ab
  Another leftmost derivation
    A ⇒ B ⇒ b
  What does that tell us?
    Nothing!
    Need multiple leftmost derivations for the same string
45 Detecting Ambiguity
    A → aA | B
    B → bB | b | aAb
  One leftmost derivation
    A ⇒ aA ⇒ aB ⇒ abB ⇒ abb
  Another leftmost derivation
    A ⇒ B ⇒ aAb ⇒ aBb ⇒ abb
  When alternate rules share an identical prefix (containing at least one NT), the grammar may be ambiguous
    X → α β1 | α β2
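For one particular string, ambiguity can be checked by counting parse trees (equivalently, leftmost derivations). The sketch below is a brute-force counter written for illustration; it does not decide ambiguity of a grammar in general, which the earlier slide notes is undecidable. The function name `count_parse_trees` and the dictionary encoding are assumptions, terminals are single characters, and the grammar is assumed to have no epsilon productions and no unit cycles.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, text):
    """Count distinct parse trees deriving `text` from `start`.

    Assumes no epsilon productions and no unit cycles (e.g. A -> B, B -> A).
    """
    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:                      # terminal
            return 1 if j - i == 1 and text[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        first, rest = rhs[0], rhs[1:]
        total = 0
        # Every remaining symbol must cover at least one character.
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_sequence(rest, k, j)
        return total

    return count_symbol(start, 0, len(text))


first_grammar = {"A": [("a", "A"), ("B",)], "B": [("b", "B"), ("b",)]}
second_grammar = {"A": [("a", "A"), ("B",)],
                  "B": [("b", "B"), ("b",), ("a", "A", "b")]}
print(count_parse_trees(first_grammar, "A", "ab"))
print(count_parse_trees(second_grammar, "A", "abb"))
```

A count greater than one for any single string shows that the grammar is ambiguous; a count of one for the strings tried says nothing about the grammar as a whole.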