# Parsing: Recognition of strings in a language

CS466(Prasad)L7Parse1 Parsing: Recognition of strings in a language

CS466(Prasad)L7Parse2 Graph of a Grammar. Represents the leftmost derivations of a CFG. A path from node S to a node w is a leftmost derivation of w.
- Nodes: left sentential forms
- Arc labels: production rules
- Root: start symbol
- Leaves: sentences

CS466(Prasad)L7Parse3 [Figure: graph of a grammar rooted at S. Extracted node labels: S aSbB aaSabB abaB bbS bbC …]

CS466(Prasad)L7Parse4 Properties of Graph of a Grammar
- Every node has a finite number of children, so a simple breadth-first enumeration is feasible.
- The number of leaves is infinite if the language is infinite. This is the typical case.
- There can be infinitely long paths (derivations), so a depth-first traversal can loop forever.
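The breadth-first enumeration mentioned above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the grammar `S -> aSb | ab`, the single-character symbol encoding, and the function names `children` and `enumerate_sentences` are all assumptions made for the example.

```python
from collections import deque

# Hypothetical example grammar (not from the slides): S -> aSb | ab.
# Assumption: every symbol is one character; keys of the dict are non-terminals.
GRAMMAR = {"S": ["aSb", "ab"]}


def children(form, grammar):
    """Return the left sentential forms reachable from `form` in one leftmost step."""
    for i, symbol in enumerate(form):
        if symbol in grammar:  # leftmost non-terminal found
            return [form[:i] + rhs + form[i + 1:] for rhs in grammar[symbol]]
    return []  # no non-terminal left: `form` is a sentence (a leaf)


def enumerate_sentences(grammar, start, limit):
    """Walk the graph of the grammar breadth-first, collecting up to `limit` leaves."""
    sentences = []
    queue = deque([start])
    while queue and len(sentences) < limit:
        form = queue.popleft()
        kids = children(form, grammar)
        if not kids and all(s not in grammar for s in form):
            sentences.append(form)  # a leaf of the graph
        queue.extend(kids)  # finite branching keeps each level finite
    return sentences
```

Because every node has finitely many children, each level of the graph is finite and the queue eventually reaches any given sentence. The `limit` parameter is needed because the set of leaves is infinite for this grammar.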

CS466(Prasad)L7Parse5 Directed Acyclic Graph. Illustrates ambiguity in the grammar. [Figure: graph of a grammar rooted at S. Extracted node labels: S aSSb aaSaab aSb abb Sbb … ab]

CS466(Prasad)L7Parse6 Cyclic structure. Illustrates an ambiguous grammar with cycles. [Figure: graph of a grammar. Extracted node labels: S SS SSS]

CS466(Prasad)L7Parse7 Parser: a program that determines whether a string w belongs to L(G) by constructing a derivation. Equivalently, it searches the graph of G.
- Top-down parsers construct the derivation tree from root to leaves (leftmost derivation).
- Bottom-up parsers construct the derivation tree from leaves to root (rightmost derivation in reverse).

CS466(Prasad)L7Parse8 Derivation Trees: leftmost derivation. [Figure: sequence of derivation trees. Extracted labels: S SS S S SS a S SS ab]

CS466(Prasad)L7Parse9 Derivation Trees: rightmost derivation, and rightmost derivation in reverse. [Figure: sequence of derivation trees. Extracted labels: S SS S S SS b S SS S S SS a ab b SS]

CS466(Prasad)L7Parse10 Top-down parsers: Breadth-first vs Depth-first
- Search the graph of a grammar breadth-first. Uses a queue. (+) Always terminates with the shortest derivation when the string is in the language. (-) Inefficient in general.
- Search the graph of a grammar depth-first. Uses a stack. (-) Can get into infinite loops (e.g., left recursion). (+) Efficient in general.

CS466(Prasad)L7Parse11 Determining when a sentential form cannot lead to w
- The number of terminals in the sentential form exceeds the length of w.
- The prefix of the sentential form preceding the leftmost non-terminal is not a prefix of w.
- No rules are applicable to the sentential form.
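The three tests above can be written as one predicate. This is a sketch under stated assumptions: symbols are single characters, the grammar `S -> A`, `A -> T | A+T`, `T -> b | (A)` is inferred from the search trees on the later slides, and the name `is_dead_end` is hypothetical.

```python
# Assumed grammar, inferred from the search-tree slides: S -> A, A -> T | A+T, T -> b | (A).
# Assumption: every symbol is one character; keys of the dict are non-terminals.
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}


def is_dead_end(form, w, grammar):
    """Return True if the left sentential form `form` cannot lead to `w`."""
    positions = [i for i, s in enumerate(form) if s in grammar]

    # Test 1: more terminals in the form than symbols in w.
    if len(form) - len(positions) > len(w):
        return True

    # No non-terminal left: no rule is applicable, so the form must already be w.
    if not positions:
        return form != w

    # Test 2: the terminal prefix before the leftmost non-terminal must be a prefix of w.
    leftmost = positions[0]
    if not w.startswith(form[:leftmost]):
        return True

    # Test 3: the leftmost non-terminal has no rules at all.
    return len(grammar[form[leftmost]]) == 0
```

For example, with `w = "(b)+b"` the form `b+T` is rejected by the prefix test, while `(A)+T` is kept because `(` is a prefix of `w`.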

CS466(Prasad)L7Parse13 Breadth-first top-down parser: queue up left sentential forms level by level. [Figure: search tree rooted at S. Extracted node labels: S A T A+T b (A) T+T A+T+T (T) (A+T) (b) ((A)) … T+T+T A+T+T+T … (T)+T (A)+T (b)+T (b)+b. Parse successful.]
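A breadth-first top-down parser along these lines might look like the sketch below. The grammar is inferred from the node labels on the slide, symbols are assumed to be single characters, and `bftd_parse` is a hypothetical name. Two pruning tests (terminal count and terminal prefix) are applied before a form is queued.

```python
from collections import deque

# Assumed grammar, inferred from the slide's search tree: S -> A, A -> T | A+T, T -> b | (A).
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}


def leftmost_nonterminal(form, grammar):
    """Index of the leftmost non-terminal in `form`, or None if `form` is a sentence."""
    return next((i for i, s in enumerate(form) if s in grammar), None)


def bftd_parse(w, grammar, start="S"):
    """Return a leftmost derivation of `w` as a list of forms, or None if the search is exhausted."""
    queue = deque([[start]])  # each queue entry is a derivation so far
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        if form == w:
            return derivation
        idx = leftmost_nonterminal(form, grammar)
        if idx is None:
            continue  # a sentence other than w
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            new_idx = leftmost_nonterminal(new_form, grammar)
            prefix = new_form if new_idx is None else new_form[:new_idx]
            terminals = sum(1 for s in new_form if s not in grammar)
            # Queue the form only if it can still lead to w.
            if terminals <= len(w) and w.startswith(prefix):
                queue.append(derivation + [new_form])
    return None
```

Calling `bftd_parse("(b)+b", GRAMMAR)` yields the derivation `S, A, A+T, T+T, (A)+T, (T)+T, (b)+T, (b)+b`. For this grammar the pruning keeps the search finite. A grammar such as `S -> SS` could still grow forms without adding terminals, so termination on strings outside the language is not guaranteed in general.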

CS466(Prasad)L7Parse14 Depth-first top-down parser: use a stack to pursue an entire path from the left, and backtrack on failure. [Figure: search tree rooted at S. Extracted node labels: S A T A+T b (A) T+T A+T+T (T) (A+T) (b) ((A)) … T+T+T A+T+T+T … Parse fails.]
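The depth-first variant can be sketched by swapping the queue for a stack. As before, the grammar is inferred from the slide, symbols are assumed to be single characters, and `dftd_parse` is a hypothetical name. Popping an entry that cannot be extended plays the role of backtracking.

```python
# Assumed grammar, inferred from the slide's search tree: S -> A, A -> T | A+T, T -> b | (A).
GRAMMAR = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}


def dftd_parse(w, grammar, start="S"):
    """Return a leftmost derivation of `w` found depth-first, or None if every path fails."""
    stack = [[start]]  # each stack entry is a derivation so far
    while stack:
        derivation = stack.pop()
        form = derivation[-1]
        if form == w:
            return derivation
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            continue  # a sentence other than w: backtrack to the next stack entry
        # Push alternatives in reverse so the first rule is explored first.
        for rhs in reversed(grammar[form[idx]]):
            new_form = form[:idx] + rhs + form[idx + 1:]
            new_idx = next((i for i, s in enumerate(new_form) if s in grammar), None)
            prefix = new_form if new_idx is None else new_form[:new_idx]
            terminals = sum(1 for s in new_form if s not in grammar)
            # Without these two tests the left-recursive rule A -> A+T would be
            # expanded forever (A+T, A+T+T, ...), as would nested parentheses.
            if terminals <= len(w) and w.startswith(prefix):
                stack.append(derivation + [new_form])
    return None
```

For `w = "b+b"` the search first tries `A -> T`, reaches the sentence `b`, fails, and backtracks to `A -> A+T`. The terminal-count test is what stops the left-recursive path here. It would not help with a rule such as `S -> SS`, which adds no terminals.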

CS466(Prasad)L7Parse15 Summary
- In the BFTD version, all leftmost derivations are investigated in parallel.
- In the DFTD version, one specific derivation is pursued to completion. Done if it succeeds; otherwise, backtrack and investigate another path. This is an incomplete strategy (used by the Prolog interpreter).

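CS466(Prasad)L7Parse16 Bottom-up parsing. [Figure: search tree of reductions starting from (b)+b, with one branch annotated "Not allowed". Extracted node labels: (b)+b (T)+b (b)+T (T)+T (b)+A (T)+T … (A)+b (A)+T (S)+b T+b A+b A+T S A. Parse successful.]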
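A bottom-up search can be sketched as repeated reduction: replace an occurrence of a rule's right-hand side by its left-hand side until only the start symbol remains. The rule list is inferred from the slide and `bottom_up_parse` is a hypothetical name. Note the simplification: this sketch explores every possible reduction breadth-first and does not enforce whatever restriction the slide marks as "Not allowed".

```python
from collections import deque

# Assumed rules (lhs, rhs), inferred from the slide: S -> A, A -> T | A+T, T -> b | (A).
RULES = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def bottom_up_parse(w, rules, start="S"):
    """Return a list of forms from `w` down to `start`, or None if no reduction sequence exists."""
    queue = deque([[w]])
    seen = {w}  # reductions never lengthen a form, so the set of reachable forms is finite
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == start:
            return path
        for lhs, rhs in rules:
            pos = form.find(rhs)
            while pos != -1:  # try every occurrence of the right-hand side
                reduced = form[:pos] + lhs + form[pos + len(rhs):]
                if reduced not in seen:
                    seen.add(reduced)
                    queue.append(path + [reduced])
                pos = form.find(rhs, pos + 1)
    return None
```

Read backwards, a successful path is a derivation of `w` from `S`. A practical bottom-up parser would restrict which reduction is tried at each step instead of exploring all of them.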

CS466(Prasad)L7Parse17 Practical Parsers. The language/grammar is designed to enable deterministic (directed and backtrack-free) searches. Such parsers use lookahead tokens and/or exploit the context in the sentential form constructed so far: “Look before you leap.” vs “Procrastination principle.”
- Top-down parsers: LL(k) languages, e.g., Pascal, Ada. Better error diagnosis and recovery.
- Bottom-up parsers: LALR(1), LR(k) languages, e.g., C/C++, Java. Handle left recursion in the grammar.
- Backtracking parsers, e.g., the Prolog interpreter.
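To illustrate a deterministic, backtrack-free top-down parser, here is a minimal recursive-descent recognizer that decides each step from one lookahead character. It is a sketch under assumptions: the left-recursive rule `A -> A+T` from the earlier examples is rewritten as `A -> T ('+' T)*`, a transformation not shown in the slides, and the name `accepts` is hypothetical.

```python
def accepts(text):
    """Return True iff `text` matches A -> T ('+' T)*, T -> 'b' | '(' A ')'."""
    pos = 0  # index of the current lookahead character

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_a():
        # A -> T ('+' T)* : the lookahead '+' decides whether to continue.
        parse_t()
        while peek() == "+":
            expect("+")
            parse_t()

    def parse_t():
        # T -> 'b' | '(' A ')' : the lookahead selects the alternative.
        if peek() == "b":
            expect("b")
        elif peek() == "(":
            expect("(")
            parse_a()
            expect(")")
        else:
            raise SyntaxError(f"unexpected input at position {pos}")

    try:
        parse_a()
        return pos == len(text)  # all input must be consumed
    except SyntaxError:
        return False
```

Each call consumes input without ever undoing a choice, which is the "look before you leap" behaviour. The error position carried by `SyntaxError` hints at why top-down parsers lend themselves to error diagnosis.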