Introduction Parsers can check whether a word matches a certain grammar, and provide one or more syntactic analyses. There are two basic types: top-down parsing, which is either directional (goes from left to right) or non-directional, and bottom-up parsing.
Introduction Today we will discuss directional top-down parsing.
Directional Top-down Parsing Begin with the start symbol S. Apply productions until we arrive at the input string. We draw the prediction right under the part of the input it predicts.
Imitating Leftmost derivations The sentential form consists of both terminals and non-terminals. If a terminal symbol is in front, we match it with the current input symbol; if a non-terminal is in front, we pick one of its right-hand sides. This way we always replace the leftmost non-terminal, and in the end, if we succeed, we have imitated a leftmost derivation.
Example This is our grammar: Input sentence is aabb.
Example ctd. We try to rederive the input aabb from the start symbol S. The first symbol of our prediction is a non-terminal, so we have to replace it by one of its right-hand sides. S → aB | bA We apply the first alternative, because its leading terminal a matches the input. After matching a, we are left with the prediction B for the remaining input abb. B → b | bS | aBB Only the alternative aBB starts with a, so we apply it and match the terminal again.
Example ctd. We are now left with the prediction BB for the remaining input bb. B → b | bS | aBB We replace the leftmost B by one of its alternatives (B → b), match b, and do the same for the second B. In the end we obtain the following derivation: S → aB → aaBB → aabB → aabb
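The derivation above can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the helper names `replace_leftmost` and `derive` are hypothetical, and the list of chosen right-hand sides is taken directly from the example.

```python
# Hypothetical helpers; the non-terminals are the ones used in the example grammar.
NONTERMINALS = {"S", "A", "B"}

def replace_leftmost(form, rhs):
    """Replace the leftmost non-terminal in the sentential form with rhs."""
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to replace")

def derive(start, choices):
    """Apply the chosen right-hand sides in order, always at the leftmost non-terminal."""
    forms = [start]
    for rhs in choices:
        forms.append(replace_leftmost(forms[-1], rhs))
    return forms

# Choices made in the example: S -> aB, B -> aBB, B -> b, B -> b
steps = derive("S", ["aB", "aBB", "b", "b"])
print(" -> ".join(steps))  # S -> aB -> aaBB -> aabB -> aabb
```

Because each step touches only the leftmost non-terminal, the list of choices alone is enough to reconstruct the whole derivation.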
Push-down automaton. A stack is a first-in, last-out (FILO) list. The PDA operates by popping the stack (which holds symbols of the stack alphabet) and reading an input symbol. These two symbols give us a choice of several lists of stack symbols to be pushed back on the stack. So there is a mapping from (input symbol, stack symbol) pairs to lists of stack symbols. The automaton accepts the input sentence when the stack is empty at the end of the input.
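The mapping described above could be sketched as a small table-driven simulation. This is an illustrative sketch under assumptions: the table is derived from the rules S → aB and B → b | bS | aBB of the earlier example, the alternative S → bA is left out because the rules for A are not listed there, and the names `TRANSITIONS` and `pda_accepts` are hypothetical.

```python
# (input symbol, popped stack symbol) -> alternatives that may be pushed back.
# Assumption: built from S -> aB and B -> b | bS | aBB only; S -> bA is omitted
# because the rules for A are not given in the example.
TRANSITIONS = {
    ("a", "S"): [["B"]],
    ("b", "B"): [[], ["S"]],
    ("a", "B"): [["B", "B"]],
}

def pda_accepts(sentence, start="S"):
    """Return True if some sequence of choices empties the stack at end of input."""
    # Each configuration is a stack, stored as a tuple with the top at index 0.
    configs = {(start,)}
    for symbol in sentence:
        next_configs = set()
        for stack in configs:
            if not stack:
                continue  # stack is empty but input remains: this path fails
            top, rest = stack[0], stack[1:]
            for pushed in TRANSITIONS.get((symbol, top), []):
                next_configs.add(tuple(pushed) + rest)
        configs = next_configs
    # Accept when at least one path ends with an empty stack.
    return () in configs

print(pda_accepts("aabb"))  # True
print(pda_accepts("aab"))   # False
```

Keeping all reachable stacks in a set is one simple way to handle the choice between several push lists; it is not the only one.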
Breadth-first Top-Down Parsing There are two different strategies for going through the decision tree: breadth-first and depth-first. In breadth-first parsing we maintain a list of all possible predictions. We process it in the following way: if there is a non-terminal on top, we replace the prediction stack by several new prediction stacks, one for each alternative of this non-terminal; if there is a terminal on top, we eliminate all the prediction stacks that do not match the current input symbol.
Example Grammar:
S → AB | DC
A → a | aA
B → bc | bBc
D → ab | aDb
C → c | cC
Input: aabc
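The breadth-first procedure can be tried out on this grammar with a short program. The sketch below is one possible reading of the method, not the implementation behind the slides: `GRAMMAR` and `breadth_first_parse` are hypothetical names, and each prediction stack is paired with the list of rules used so far so that the analysis can be reported.

```python
GRAMMAR = {
    "S": [["A", "B"], ["D", "C"]],
    "A": [["a"], ["a", "A"]],
    "B": [["b", "c"], ["b", "B", "c"]],
    "D": [["a", "b"], ["a", "D", "b"]],
    "C": [["c"], ["c", "C"]],
}

def breadth_first_parse(sentence, start="S"):
    """Return every analysis as the list of rules used, in leftmost order."""
    # Each entry pairs a prediction stack (top at index 0) with its analysis so far.
    predictions = [((start,), ())]
    for symbol in sentence:
        # Step 1: expand non-terminals until every prediction has a terminal on top.
        expanded = []
        work = list(predictions)
        while work:
            stack, analysis = work.pop()
            if not stack:
                continue  # prediction exhausted while input remains
            top, rest = stack[0], stack[1:]
            if top in GRAMMAR:
                for rhs in GRAMMAR[top]:
                    rule = f"{top} -> {''.join(rhs)}"
                    work.append((tuple(rhs) + rest, analysis + (rule,)))
            else:
                expanded.append((stack, analysis))
        # Step 2: keep only predictions whose top terminal matches the input symbol.
        predictions = [(s[1:], a) for s, a in expanded if s[0] == symbol]
    # A prediction succeeds if its stack is empty at the end of the input.
    return [list(a) for s, a in predictions if not s]

print(breadth_first_parse("aabc"))
# [['S -> AB', 'A -> aA', 'A -> a', 'B -> bc']]
```

The expansion loop terminates here because this grammar has no left recursion; a left-recursive grammar would need extra care.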
Depth-first (Backtracking) Parsers The breadth-first method can use a lot of memory, because it stores a list of all possible predictions. The depth-first method does not have this problem, because we look at only one path at a time. We first follow one path; if it turns out to be a failure, we roll back our actions and continue with the other possibilities.
Backtracking Sometimes we have multiple right-hand sides and we have to choose one. But if we choose the wrong one, we come to a dead end. So, we have to go back to the point where we made the choice, and try an alternative path. We do this until we succeed, or run out of choices.
Example Backtracking over a terminal is done by moving the vertical line, which marks how much of the input has been matched, backwards.
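A depth-first parser with backtracking can be sketched as a recursive function. This is a minimal illustration under assumptions: it uses the grammar S → AB | DC, A → a | aA, B → bc | bBc, D → ab | aDb, C → c | cC from the breadth-first example, the names `GRAMMAR` and `backtrack_parse` are hypothetical, and the input position `pos` plays the role of the vertical line, so returning from a failed call moves it back automatically.

```python
GRAMMAR = {
    "S": [["A", "B"], ["D", "C"]],
    "A": [["a"], ["a", "A"]],
    "B": [["b", "c"], ["b", "B", "c"]],
    "D": [["a", "b"], ["a", "D", "b"]],
    "C": [["c"], ["c", "C"]],
}

def backtrack_parse(sentence, start="S"):
    """Return the first analysis found (list of rules), or None if every path fails."""
    def parse(prediction, pos, analysis):
        if not prediction:
            # Success only if the whole input has been matched.
            return analysis if pos == len(sentence) else None
        top, rest = prediction[0], prediction[1:]
        if top in GRAMMAR:
            # Choice point: try the alternatives in order. A None result means
            # a dead end, so we fall back here and try the next alternative.
            for rhs in GRAMMAR[top]:
                rule = f"{top} -> {''.join(rhs)}"
                result = parse(rhs + rest, pos, analysis + [rule])
                if result is not None:
                    return result
            return None  # ran out of choices for this non-terminal
        # Terminal: match it with the current input symbol or fail.
        if pos < len(sentence) and sentence[pos] == top:
            return parse(rest, pos + 1, analysis)
        return None

    return parse([start], 0, [])

print(backtrack_parse("aabc"))
# ['S -> AB', 'A -> aA', 'A -> a', 'B -> bc']
```

Only one prediction is alive at any moment; the call stack remembers the choice points, which is what keeps memory use low compared with the breadth-first list.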
Conclusion We always process the leftmost symbol of the prediction. If this symbol is a terminal, we have no choice: we have to match it with the current input symbol or reject the parse. If this symbol is a non-terminal, we have to make a prediction: it has to be replaced by one of its right-hand sides. Thus, we always process the leftmost non-terminal first, so we get a leftmost derivation. As a result, a top-down method recognizes the nodes of the parse tree in pre-order: the parent is identified before any of its children.
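The pre-order claim can be checked on the parse tree of the earlier input aabb. The sketch below is illustrative only: the tree is written out by hand from the derivation S → aB → aaBB → aabB → aabb, and `TREE` and `preorder_rules` are hypothetical names.

```python
# Parse tree for aabb as (symbol, children); terminal leaves are plain strings.
TREE = ("S", ["a", ("B", ["a", ("B", ["b"]), ("B", ["b"])])])

def preorder_rules(node):
    """List the rule applied at each non-terminal node, parent before children."""
    if isinstance(node, str):
        return []  # terminal leaf: no rule is applied here
    symbol, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    rules = [f"{symbol} -> {rhs}"]
    for child in children:
        rules.extend(preorder_rules(child))
    return rules

print(preorder_rules(TREE))
# ['S -> aB', 'B -> aBB', 'B -> b', 'B -> b']
```

The pre-order listing is the same sequence of rules that the leftmost derivation applied, in the same order.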