Syntax Analysis The recognition problem: given a grammar G and a string w, is w ∈ L(G)? The parsing problem: if G is a grammar and w ∈ L(G), how can w be derived in G? For context-free grammars, both of these problems are decidable: there are algorithms which will give a definite (correct) answer for any given instance of the problems. Parsing is important, because understanding the derivation of a structure helps us to understand the meaning of the structure.
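As a rough illustration of why recognition is decidable, here is a minimal sketch for the expression grammar S -> S+S | S*S | (S) | a that these slides use as their running example. It is a brute-force search chosen for clarity, not an efficient or standard parsing algorithm; the names PRODUCTIONS and recognise are made up for this example. It relies on the assumption that no production shortens a sentential form, which holds for this grammar.

```python
from collections import deque

# Productions of the running example grammar: S -> S+S | S*S | (S) | a
PRODUCTIONS = {"S": ["S+S", "S*S", "(S)", "a"]}


def recognise(w, start="S"):
    """Return True iff w can be derived from the start symbol.

    No production shrinks a sentential form, so any form longer than w
    can be discarded. That leaves finitely many forms to search.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        # Expand only the leftmost non-terminal; every derivable string
        # has a leftmost derivation, so nothing is missed.
        pos = next((i for i, c in enumerate(form) if c in PRODUCTIONS), None)
        if pos is None:
            continue  # all terminals, but not equal to w
        for rhs in PRODUCTIONS[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= len(w) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False


print(recognise("a+(a*a)"))  # True
print(recognise("a+"))       # False
```

The search always terminates because only finitely many sentential forms have length at most |w|, so the function gives a yes or no answer for every input.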
Derivation Structure Consider the expression a+(a*a) in the language of the grammar G0, whose productions are: 1) S -> S + S 2) S -> S * S 3) S -> (S) 4) S -> a. In order to process this expression, it helps to treat the substring (a*a) as a more significant sub-unit than, for example, a+(a. We can use the derivation of the string: S => S+S => S+(S) => S+(S*S) => S+(S*a) => S+(a*a) => a+(a*a). The corresponding tree, written with each node's children in square brackets, is S[ S[a] + S[ ( S[ S[a] * S[a] ] ) ] ].
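The derivation above can be replayed mechanically. The following is a small sketch, assuming each step is recorded as a pair (which occurrence of S to rewrite, counted from the left starting at 0; which numbered production to use). The names RULES, apply_rule, derive and STEPS are illustrative only.

```python
# The four numbered productions of G0, all with left-hand side S.
RULES = {1: "S+S", 2: "S*S", 3: "(S)", 4: "a"}


def apply_rule(form, occurrence, rule):
    """Rewrite the given occurrence (0-based, from the left) of S in form."""
    positions = [i for i, c in enumerate(form) if c == "S"]
    i = positions[occurrence]
    return form[:i] + RULES[rule] + form[i + 1:]


def derive(steps, start="S"):
    """Return the list of sentential forms produced by the steps."""
    forms = [start]
    for occurrence, rule in steps:
        forms.append(apply_rule(forms[-1], occurrence, rule))
    return forms


# The derivation of a+(a*a) given on the slide.
STEPS = [(0, 1), (1, 3), (1, 2), (2, 4), (1, 4), (0, 4)]
print(" => ".join(derive(STEPS)))
# S => S+S => S+(S) => S+(S*S) => S+(S*a) => S+(a*a) => a+(a*a)
```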
Derivation Trees For any derivation, we can construct a derivation tree. The root of the tree will be a node representing the start symbol. Every time we apply a production A -> α, we add a subtree below A: A is the root, and there is a branch for every symbol of α, in the same left-to-right order in which they appear in α. We read the string represented by the derivation tree by reading the "leaf" nodes in left-to-right order. Note: "left-to-right" order means the "structural" order (the leftmost path, then the same path but with the next-to-left branch at the last node where there was a choice, and so on), and not any order which may appear in the sketch.
S => S+S => S+(S) => S+(S*S) => S+(S*a) => S+(a*a) => a+(a*a). [Figure: the derivation tree for a+(a*a), built up one derivation step at a time.]
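The tree for a+(a*a) and the rule for reading off its string can be sketched as follows. The representation is an assumption made for this example: an interior node is a pair (symbol, list of children) and a leaf is a plain string. The helper names S and tree_yield are hypothetical.

```python
def S(*children):
    """Build an interior node labelled S with the given children."""
    return ("S", list(children))


# Derivation tree for a+(a*a): one interior node per production applied.
tree = S(
    S("a"),                                   # S -> a
    "+",
    S("(", S(S("a"), "*", S("a")), ")"),      # S -> (S), then S -> S*S
)


def tree_yield(node):
    """Concatenate the leaves in structural left-to-right order."""
    if isinstance(node, str):      # a leaf is a terminal symbol
        return node
    _symbol, children = node
    return "".join(tree_yield(child) for child in children)


print(tree_yield(tree))  # a+(a*a)
```

The recursion visits children in the order they are stored, which is the structural left-to-right order the slide describes, regardless of how the tree is drawn.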
Equivalent Derivations Two different derivations can have the same derivation tree. Example: S => S+S => S+a => a+a and S => S+S => a+S => a+a both produce the tree in which the root S has children S, +, S, and each child S has the single leaf a. In a CFG, the order of applying productions is irrelevant, as long as the same production is applied to the same symbol.
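To check this on the example, the sketch below builds a tree from a derivation and compares the results. It assumes a step is written as (which unexpanded S to rewrite, counted from the left starting at 0; which production of G0 to use), and that a node is a two-element list [symbol, children]. These conventions and the name build_tree are choices made for this illustration.

```python
# Productions of G0, numbered as on the slides.
RULES = {1: ["S", "+", "S"], 2: ["S", "*", "S"], 3: ["(", "S", ")"], 4: ["a"]}


def build_tree(steps):
    """Build a derivation tree from a list of (occurrence, rule) steps.

    A node is [symbol, children]; children stays None for terminals
    and for any S that has not been expanded yet.
    """
    root = ["S", None]

    def open_nodes(node):
        # Unexpanded S nodes in structural left-to-right order.
        if node[0] != "S":
            return []
        if node[1] is None:
            return [node]
        return [n for child in node[1] for n in open_nodes(child)]

    for occurrence, rule in steps:
        target = open_nodes(root)[occurrence]
        target[1] = [[symbol, None] for symbol in RULES[rule]]
    return root


right_first = build_tree([(0, 1), (1, 4), (0, 4)])  # S => S+S => S+a => a+a
left_first = build_tree([(0, 1), (0, 4), (0, 4)])   # S => S+S => a+S => a+a
print(right_first == left_first)  # True
```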
Multiple Derivation Trees Consider the two derivations below: 1. S => S+S => S+S*S => S+S*a => S+a*a => a+a*a 2. S => S*S => S*a => S+S*a => S+a*a => a+a*a. These give essentially different derivation trees for the same final sentence. In tree 1 the root applies S -> S + S, and its right-hand S derives a*a. In tree 2 the root applies S -> S * S, and its left-hand S derives a+a. This causes problems for our attempt to understand a string by considering its derivation.
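One way to see both trees is to enumerate every derivation tree of a string under G0. The following is a minimal sketch that tries each production at the root of each substring; it is written for this particular grammar only, and the function name trees and the bracketed output format are assumptions of the example.

```python
from functools import lru_cache


def trees(w):
    """List all derivation trees of w under S -> S+S | S*S | (S) | a.

    Each tree is rendered as a bracketed string such as S[S[a] + S[a]].
    """
    @lru_cache(maxsize=None)
    def parse(i, j):
        found = []
        if j - i == 1 and w[i] == "a":                       # S -> a
            found.append("S[a]")
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":   # S -> (S)
            found += [f"S[( {t} )]" for t in parse(i + 1, j - 1)]
        for k in range(i + 1, j - 1):                        # S -> S+S, S -> S*S
            if w[k] in "+*":
                found += [f"S[{left} {w[k]} {right}]"
                          for left in parse(i, k)
                          for right in parse(k + 1, j)]
        return tuple(found)

    return list(parse(0, len(w)))


for t in trees("a+a*a"):
    print(t)
# S[S[a] + S[S[a] * S[a]]]
# S[S[S[a] + S[a]] * S[a]]
```

The string a+(a*a) yields a single tree, because the parentheses force the grouping.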
Ambiguous Grammars A derivation in which at each step the rightmost non-terminal is replaced is a right derivation. In a right derivation, the order of symbols to be replaced is fixed. A string has two different right derivations iff it has two different derivation trees. A CFG is ambiguous if there is at least one string in L(G) having two or more different right derivations (or, equally, two or more different derivation trees).
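The correspondence between trees and right derivations can be illustrated by reading the right derivation off a tree. This is a sketch under an assumed representation: an interior node is a pair (symbol, list of children) and a terminal is a plain string. The names render, right_derivation, TREE_1 and TREE_2 are hypothetical. The two trees are the ones for a+a*a from the "Multiple Derivation Trees" slide.

```python
def render(form):
    """Show a sentential form: terminals as-is, pending nodes by symbol."""
    return "".join(x if isinstance(x, str) else x[0] for x in form)


def right_derivation(tree):
    """Return the sentential forms of the right derivation for tree."""
    form = [tree]
    steps = [render(form)]
    while any(not isinstance(x, str) for x in form):
        # Index of the rightmost non-terminal still to be replaced.
        i = max(k for k, x in enumerate(form) if not isinstance(x, str))
        form[i:i + 1] = form[i][1]   # replace the node by its children
        steps.append(render(form))
    return steps


A = ("S", ["a"])                                  # S -> a
TREE_1 = ("S", [A, "+", ("S", [A, "*", A])])      # root applies S -> S + S
TREE_2 = ("S", [("S", [A, "+", A]), "*", A])      # root applies S -> S * S

print(" => ".join(right_derivation(TREE_1)))
# S => S+S => S+S*S => S+S*a => S+a*a => a+a*a
print(" => ".join(right_derivation(TREE_2)))
# S => S*S => S*a => S+S*a => S+a*a => a+a*a
```

Each tree determines exactly one right derivation, because at every step there is only one rightmost non-terminal and the tree fixes which production replaces it.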
The Problem With Ambiguity By the previous example, the grammar of algebraic expressions, G0, is ambiguous. Why is this a problem? Suppose we are attempting to analyse strings in the language of G0 in order to perform simple arithmetic: the structure of the derivation will tell us which operation to apply when. Problem: 2+2*2 = ? Under derivation 1, we get 2 + (2*2) = 6. Under derivation 2, we get (2+2)*2 = 8. Which do we select?
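The two readings can be checked with a small evaluator. In this sketch a tree is assumed to be either a number or a triple (left, operator, right); that encoding and the names OPS, evaluate, tree_1 and tree_2 are choices made for the example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree that is a number or a triple (left, op, right)."""
    if isinstance(tree, (int, float)):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


tree_1 = (2, "+", (2, "*", 2))   # structure given by derivation 1
tree_2 = ((2, "+", 2), "*", 2)   # structure given by derivation 2

print(evaluate(tree_1))  # 6
print(evaluate(tree_2))  # 8
```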
Unambiguous Expressions We are aiming to produce an unambiguous version of G0. Essentially, we want to assign priorities to the operators, and reflect this in the grammar. Also, although it makes no difference to the evaluated expression, we want a+a+a to be read as (a+a)+a. We will do this by introducing new symbols: S will represent an expression (a sum); a term, T, will represent a product; and a factor, F, will represent things that can be multiplied. An expression can be a sum of an expression and a term, or simply a term. A term can be a product of a term and a factor, or simply a factor. A factor can be an expression (in parentheses), or simply a symbol. Example: Grammar G1. S -> S + T | T; T -> T * F | F; F -> (S) | a.
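The effect of G1 on grouping can be seen with a small hand-written parser. This is only a sketch: the left-recursive productions S -> S + T and T -> T * F are handled with loops, a common technique for recursive-descent parsing that the slides do not prescribe. The result is a nested triple (left, operator, right), with parentheses dropped once they have done their grouping job. All function names are made up for this example.

```python
def parse(w):
    """Parse w with G1: S -> S+T | T, T -> T*F | F, F -> (S) | a."""
    pos = 0

    def peek():
        return w[pos] if pos < len(w) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_S():
        tree = parse_T()
        while peek() == "+":            # S -> S + T, grouping to the left
            expect("+")
            tree = (tree, "+", parse_T())
        return tree

    def parse_T():
        tree = parse_F()
        while peek() == "*":            # T -> T * F, grouping to the left
            expect("*")
            tree = (tree, "*", parse_F())
        return tree

    def parse_F():
        if peek() == "(":               # F -> (S)
            expect("(")
            tree = parse_S()
            expect(")")
            return tree
        expect("a")                     # F -> a
        return "a"

    tree = parse_S()
    if pos != len(w):
        raise SyntaxError(f"unexpected {w[pos]!r} at position {pos}")
    return tree


print(parse("a+a*a"))  # ('a', '+', ('a', '*', 'a'))
print(parse("a+a+a"))  # (('a', '+', 'a'), '+', 'a')
```

Because a product can only appear below a sum (through T), multiplication binds more tightly than addition, and each string gets exactly one structure.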
Ambiguity and Decidability The ambiguity we have seen so far has always been a property of the grammar, and not of the language. However, there exist languages for which every grammar defining them is ambiguous. A language for which every defining grammar is ambiguous is inherently ambiguous. Example: {a^i b^j c^k : i = j or j = k}. More importantly, there is no algorithm which will determine whether or not a given context-free grammar is ambiguous.