Bottom-up parsing Pumping Theorem for CFLs MA/CSSE 474 Theory of Computation.


1 Bottom-up parsing Pumping Theorem for CFLs MA/CSSE 474 Theory of Computation

2 Your Questions? Previous class days' material; reading assignments; HW 10 or 11 problems; Exam 2 problems; anything else.

3 PDAs and Context-Free Grammars Theorem: The class of languages accepted by PDAs is exactly the class of context-free languages. Recall: context-free languages are the languages that can be defined with context-free grammars. Restated: a language can be described by a context-free grammar if and only if it can be accepted by some PDA.

4 Recap: Going One Way Lemma: Each context-free language is accepted by some PDA. Proof (by construction). The idea: let the stack do the work. Two approaches: top down and bottom up.
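The top-down "let the stack do the work" construction can be sketched in a few lines of Python. This is a hedged illustration, not the textbook's algorithm verbatim: the function name, the tuple encoding of transitions, and the use of "" for ε are my choices.

```python
# Sketch of the top-down CFG-to-PDA construction.
# A transition is ((state, input, pop), (state, push)); "" stands for epsilon.

def top_down_pda(rules, terminals):
    """Build the top-down PDA's transitions from a CFG given as (lhs, rhs) pairs."""
    delta = [(("p", "", ""), ("q", "S"))]           # start: push the start symbol
    for lhs, rhs in rules:
        delta.append((("q", "", lhs), ("q", rhs)))  # expand a nonterminal on the stack
    for c in terminals:
        delta.append((("q", c, c), ("q", "")))      # match a terminal against the input
    return delta

# The AnBn grammar from these slides: S -> aSb | epsilon
print(top_down_pda([("S", "aSb"), ("S", "")], "ab"))
```

Each grammar rule becomes one expansion transition and each terminal one matching transition, exactly mirroring the transition tables on the surrounding slides.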

5 Example from Yesterday L = { a^n b^m c^p d^q : m + n = p + q }
Grammar rules and the corresponding top-down PDA transitions:
0 ((p, ε, ε), (q, S))
(1) S → a S d    1 ((q, ε, S), (q, aSd))
(2) S → T        2 ((q, ε, S), (q, T))
(3) S → U        3 ((q, ε, S), (q, U))
(4) T → a T c    4 ((q, ε, T), (q, aTc))
(5) T → V        5 ((q, ε, T), (q, V))
(6) U → b U d    6 ((q, ε, U), (q, bUd))
(7) U → V        7 ((q, ε, U), (q, V))
(8) V → b V c    8 ((q, ε, V), (q, bVc))
(9) V → ε        9 ((q, ε, V), (q, ε))
                 10 ((q, a, a), (q, ε))
                 11 ((q, b, b), (q, ε))
                 12 ((q, c, c), (q, ε))
                 13 ((q, d, d), (q, ε))
input = a a b c c d. Trace columns: transition | state | unread input | stack.

6 The Other Way to Build a PDA - Directly L = { a^n b^m c^p d^q : m + n = p + q }
(1) S → a S d  (2) S → T  (3) S → U  (4) T → a T c  (5) T → V  (6) U → b U d  (7) U → V  (8) V → b V c  (9) V → ε
input = a a b c d d

7 The Other Way to Build a PDA - Directly L = { a^n b^m c^p d^q : m + n = p + q }
(1) S → a S d  (2) S → T  (3) S → U  (4) T → a T c  (5) T → V  (6) U → b U d  (7) U → V  (8) V → b V c  (9) V → ε
input = a a b c d d
[State diagram of the directly built four-state PDA (states 1-4). Edge labels have the form input/pop/push: a/ε/a edges push an a for each a read, b/ε/a edges push an a for each b read, and c/a/ε and d/a/ε edges pop an a for each c or d read.]

8 Notice the Nondeterminism Machines constructed with the algorithm are often nondeterministic, even when they needn't be. This happens even with trivial languages. Example: AnBn = { a^n b^n : n ≥ 0 }.
A grammar for AnBn: [1] S → a S b  [2] S → ε
A PDA M for AnBn: (0) ((p, ε, ε), (q, S))  (1) ((q, ε, S), (q, aSb))  (2) ((q, ε, S), (q, ε))  (3) ((q, a, a), (q, ε))  (4) ((q, b, b), (q, ε))
But transitions 1 and 2 make M nondeterministic. A directly constructed machine for AnBn can be deterministic. Constructing deterministic top-down parsers is a major topic in CSSE 404.

9 Bottom-Up PDA The grammar: (1) E → E + T  (2) E → T  (3) T → T * F  (4) T → F  (5) F → (E)  (6) F → id
Reduce transitions (the right-hand side appears reversed, since it sits on top of the stack):
(1) ((p, ε, T + E), (p, E))  (2) ((p, ε, T), (p, E))  (3) ((p, ε, F * T), (p, T))  (4) ((p, ε, F), (p, T))  (5) ((p, ε, )E( ), (p, F))  (6) ((p, ε, id), (p, F))
Shift transitions: (7) ((p, id, ε), (p, id))  (8) ((p, (, ε), (p, ())  (9) ((p, ), ε), (p, )))  (10) ((p, +, ε), (p, +))  (11) ((p, *, ε), (p, *))
The idea: let the stack keep track of what has been found. Discover a rightmost derivation in reverse order: start with the sentence and try to "pull it back" (reduce) to S. When the right side of a production is on the top of the stack, we can replace it by the left side of that production... or not! That's where the nondeterminism comes in: a choice between shift and reduce, and a choice between two reductions. Parse the string: id + id * id

10 Hidden: Solution to bottom-up example A bottom-up parser is sometimes called a shift-reduce parser. Show how it works on id + id * id. (The stack top is on the left.)
state | stack | remaining input | transition to use
p | ε | id + id * id | 7
p | id | + id * id | 6
p | F | + id * id | 4
p | T | + id * id | 2
p | E | + id * id | 10
p | +E | id * id | 7
p | id+E | * id | 6
p | F+E | * id | 4
p | T+E | * id | 11
p | *T+E | id | 7
p | id*T+E | ε | 6
p | F*T+E | ε | 3
p | T+E | ε | 1
p | E | ε | 0 (finish-up)
q | ε | ε | (accept)
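The shift-reduce behavior traced on this slide can be reproduced by a tiny backtracking parser. This is a sketch, not the PDA itself: the stack top is on the right here, nondeterminism is simulated by trying every reduction before shifting, and the tokenized input and function name are mine.

```python
# Backtracking shift-reduce recognizer for the expression grammar on slide 9.
RULES = [("E", ["E", "+", "T"]), ("E", ["T"]), ("T", ["T", "*", "F"]),
         ("T", ["F"]), ("F", ["(", "E", ")"]), ("F", ["id"])]

def parse(tokens, stack=()):
    """True iff some sequence of shifts and reduces turns tokens into E."""
    if not tokens and list(stack) == ["E"]:
        return True
    # try every reduce: a rule's right-hand side must sit on top of the stack
    for lhs, rhs in RULES:
        if list(stack[-len(rhs):]) == rhs:
            if parse(tokens, stack[:-len(rhs)] + (lhs,)):
                return True
    # try a shift
    if tokens and parse(tokens[1:], stack + (tokens[0],)):
        return True
    return False

print(parse(["id", "+", "id", "*", "id"]))  # True
print(parse(["id", "+"]))                   # False
```

The successful branch of the search visits exactly the stack configurations shown in the trace above (reversed, since this sketch keeps the stack top on the right).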

11 A Bottom-Up Parser A top-down parser discovers a leftmost derivation of the input string (if any). A bottom-up parser discovers a rightmost derivation (in reverse order). The outline of M is: M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains:
● The shift transitions: ((p, c, ε), (p, c)), for each c ∈ Σ.
● The reduce transitions: ((p, ε, (s1 s2 … sn)^R), (p, X)), for each rule X → s1 s2 … sn in G. Each one undoes an application of that rule.
● The finish-up transition: ((p, ε, S), (q, ε)).

12 Acceptance by PDA ⇒ derivation from a CFG This direction is much more complex than the other. Nonterminals in the grammar we build from the PDA M are based on combinations of M's states and stack symbols. It gets very messy; it takes 10 dense pages in the textbook. I think we can use our limited course time better.

13 How Many Context-Free Languages Are There? (We had a slide just like this for regular languages.) Theorem: There is a countably infinite number of CFLs. Proof: ● Upper bound: we can lexicographically enumerate all the CFGs. ● Lower bound: {a}, {aa}, {aaa}, … are all distinct CFLs. The number of languages is uncountable. Thus there are more languages than there are context-free languages. So there must be some languages that are not context-free.

14 Languages That Are and Are Not Context-Free a*b* is regular. AnBn = { a^n b^n : n ≥ 0 } is context-free but not regular. AnBnCn = { a^n b^n c^n : n ≥ 0 } is not context-free. Is every regular language also context-free?

15 Showing that L is Context-Free Techniques for showing that a language L is context-free: 1. Exhibit a context-free grammar for L. 2. Exhibit a PDA for L. 3. Use the closure properties of context-free languages. Unfortunately, these are weaker than they are for regular languages. Closed under: union, reverse, concatenation, Kleene star, and intersection of a CFL with a regular language. NOT closed under: intersection, complement, or set difference.

16 Showing that L is Not Context-Free Remember the pumping argument for regular languages:

17 A Review of Parse Trees A parse tree, derived from a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which: ● every leaf node is labeled with an element of Σ ∪ {ε}, ● the root node is labeled S, ● every other node is labeled with some element of V - Σ, ● if m is a non-leaf node labeled X and the children of m are labeled x1, x2, …, xn, then the rule X → x1 x2 … xn is in R.

18 Some Tree Basics The height h of a tree is the length of the longest path from the root to any leaf. The branching factor b of a tree is the largest number of children associated with any node in the tree. Theorem: The length of the yield of any tree T with height h and branching factor b is ≤ b^h. Done in CSSE 230.
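A quick numeric check of the bound, on one small parse tree. The tree encoding and helper names are my own: a node is a (label, children) pair, leaves have no children, and the leaf count (an upper bound on the yield's length) is compared against b^h.

```python
# Check |yield(T)| <= b^h on the parse tree of "aabb" from S -> aSb | epsilon.
def height(t):
    return 0 if not t[1] else 1 + max(height(c) for c in t[1])

def leaves(t):
    return 1 if not t[1] else sum(leaves(c) for c in t[1])

def branching(t):
    return max([len(t[1])] + [branching(c) for c in t[1]])

leaf = lambda x: (x, [])
tree = ("S", [leaf("a"),
              ("S", [leaf("a"), ("S", [leaf("e")]), leaf("b")]),
              leaf("b")])
print(leaves(tree), branching(tree) ** height(tree))  # 5 27
```

Here h = 3 and b = 3, so the 5 leaves comfortably satisfy the b^h = 27 bound.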

19 From Grammars to Trees Given a context-free grammar G: ● let n be the number of nonterminal symbols in G, ● let b be the branching factor of G. Suppose that a tree T is generated by G and no nonterminal appears more than once on any path. The maximum height of T is: n. The maximum length of T's yield is: b^n.

20 The Context-Free Pumping Theorem This time we use parse trees, not machines, as the basis for our argument. Suppose L(G) contains a string w such that |w| is greater than b^n; then its parse tree must be taller than n, so some nonterminal X must appear more than once on some path. Let T be a parse tree for w such that there is no other parse tree for w (generated from G) that has fewer nodes than T. X[1] is the lowest place in the tree at which this repetition happens; i.e., there is no other X in the derivation of x from X[2].

21 The Context-Free Pumping Theorem There is another derivation in G: S ⇒* uXz ⇒* uxz, in which, at the point labeled [1], the nonrecursive rule 2 is used instead. So uxz is also in L(G).

22 The Context-Free Pumping Theorem There are infinitely many derivations in G, such as: S ⇒* uXz ⇒* uvXyz ⇒* uvvXyyz ⇒* uvvxyyz. Those derivations produce the strings uv^2xy^2z, uv^3xy^3z, uv^4xy^4z, … So all of those strings are also in L(G).

23 The Context-Free Pumping Theorem If rule 1 is X → X a, we could have v = ε. If rule 1 is X → a X, we could have y = ε. But it is not possible that both v and y are ε. If they were, then the derivation S ⇒* uXz ⇒* uxz would also yield w, and it would create a parse tree with fewer nodes. That contradicts the assumption that we started with a tree having the smallest possible number of nodes.

24 The Context-Free Pumping Theorem The height of the subtree rooted at [1] is at most: n + 1. So |vxy| ≤ b^(n+1). Let k = b^(n+1).

25 The Context-Free Pumping Theorem If L is a context-free language, then ∃ k ≥ 1 such that ∀ strings w ∈ L where |w| ≥ k, ∃ u, v, x, y, z (w = uvxyz, vy ≠ ε, |vxy| ≤ k, and ∀ q ≥ 0 (uv^q x y^q z is in L)). Write it in contrapositive form
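The slide leaves the contrapositive as an exercise; one standard way to write it (the form actually used to prove a language non-context-free) is:

```latex
\Bigl[\forall k \ge 1,\ \exists w \in L,\ |w| \ge k,\ \text{such that}\;
\forall u,v,x,y,z\ \bigl(w = uvxyz \wedge vy \ne \varepsilon \wedge |vxy| \le k
\;\Rightarrow\; \exists q \ge 0:\ uv^{q}xy^{q}z \notin L\bigr)\Bigr]
\;\Rightarrow\; L \text{ is not context-free.}
```

That is: if for every proposed k we can pick a long-enough w that defeats every legal split, then L cannot be context-free.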

26 Regular vs CF Pumping Theorems Similarities: ● We don't get to choose k. ● We choose w, the string to be pumped, based on k. ● We don't get to choose how w is broken up (into xyz or uvxyz). ● We choose a value for q that shows that w isn't pumpable. ● We may apply closure theorems before we start. Things that are different in the CFL Pumping Theorem: ● Two regions, v and y, must be pumped in tandem. ● We don't know anything about where in the string v and y will fall. All we know is that they are reasonably "close together", i.e., |vxy| ≤ k. ● Either v or y could be empty, but not both.

27 An Example of Pumping: AnBnCn AnBnCn = { a^n b^n c^n : n ≥ 0 }. Choose w = a^k b^k c^k (regions: | 1 | 2 | 3 | — all a's, all b's, all c's). If either v or y spans two regions, then let q = 2 (i.e., pump in once). The resulting string will have letters out of order and thus not be in AnBnCn. If v and y each contain only one distinct character, set q to 2. Additional copies of at most two different characters are added, leaving the third unchanged. There are no longer equal numbers of the three letters, so the resulting string is not in AnBnCn.
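For a small k we can exhaustively confirm the case analysis: no legal split of w = a^k b^k c^k survives pumping with q = 2. This is a brute-force sketch; the helper names and loop encoding are mine.

```python
# Exhaustively check that a^k b^k c^k defeats every legal uvxyz split for q = 2.
def in_anbncn(s):
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def pumpable(w, k, q=2):
    """True iff SOME split w = uvxyz with vy != '' and |vxy| <= k
    keeps u v^q x y^q z in the language."""
    for i in range(len(w) + 1):                          # u = w[:i]
        hi = min(i + k, len(w))                          # |vxy| <= k window
        for j in range(i, hi + 1):                       # v = w[i:j]
            for m in range(j, hi + 1):                   # x = w[j:m]
                for n2 in range(m, hi + 1):              # y = w[m:n2]
                    if i == j and m == n2:
                        continue                         # vy must be nonempty
                    u, v, x, y, z = w[:i], w[i:j], w[j:m], w[m:n2], w[n2:]
                    if in_anbncn(u + v * q + x + y * q + z):
                        return True
    return False

k = 4
print(pumpable("a" * k + "b" * k + "c" * k, k))  # False: every split fails
```

Every split either mixes letter orders or increases at most two of the three counts, exactly as the slide argues.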

28 An Example of Pumping: { a^(n^2) : n ≥ 0 } L = { a^(n^2) : n ≥ 0 }. The elements of L: n = 0: ε; n = 1: a; n = 2: a^4; n = 3: a^9; n = 4: a^16; n = 5: a^25; n = 6: a^36.

29 Hidden: Pumping Example: { a^(n^2) : n ≥ 0 } L = { a^(n^2) : n ≥ 0 }. For any given k > 0, let n = k^2; then n^2 = k^4. Let w = a^(k^4). vy must be a^p, for some nonzero p. Set q to 2. The resulting string, s, is a^(k^4 + p). It must be in L. But it isn't in L, because it is too short: w has (k^2)^2 = k^4 a's, while the next longer string in L has (k^2 + 1)^2 = k^4 + 2k^2 + 1 a's. For s to be in L, p = |vy| would have to be at least 2k^2 + 1. But |vxy| ≤ k, so p can't be that large. Thus s is not in L, and L is not context-free.
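The "too short" arithmetic is easy to sanity-check mechanically: the gap between the consecutive squares k^4 = (k^2)^2 and (k^2 + 1)^2 is 2k^2 + 1, which always exceeds the pump bound |vy| ≤ k.

```python
# Verify the gap between consecutive squares beats the pump bound for many k.
for k in range(1, 200):
    n = k * k                         # so w = a^(n^2) = a^(k^4)
    gap = (n + 1) ** 2 - n ** 2       # distance to the next string in L
    assert gap == 2 * k * k + 1       # the slide's 2k^2 + 1
    assert gap > k                    # so no p with 0 < p <= k can bridge it
print("ok")
```

So pumping in any nonempty vy of length at most k lands strictly between two squares.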

30 Another Example of Pumping L = { a^n b^m a^n : n, m ≥ 0 and n ≥ m }. Let w = a^k b^k a^k: aaa … aaa bbb … bbb aaa … aaa, with regions | 1 | 2 | 3 |.

31 Hidden: L = { a^n b^m a^n : n, m ≥ 0 and n ≥ m }. Let w = a^k b^k a^k: aaa … aaa bbb … bbb aaa … aaa, with regions | 1 | 2 | 3 |.
If either v or y crosses regions, then set q to 2 (thus pumping in once). The resulting string will have letters out of order and so not be in L. So in all the remaining cases we assume that v and y each falls within a single region.
(1, 1): Both v and y fall in region 1. Set q to 2. In the resulting string, the first group of a's is longer than the second group of a's. So the string is not in L.
(2, 2): Both v and y fall in region 2. Set q to 2. In the resulting string, the b region is longer than either of the a regions. So the string is not in L.
(3, 3): Both v and y fall in region 3. Set q to 0. The same argument as for (1, 1).
(1, 2): Nonempty v falls in region 1 and nonempty y falls in region 2. (If either v or y is empty, it does not matter where it falls, so we can treat it as though it falls in the same region as the nonempty one; we have already considered all of those cases.) Set q to 2. In the resulting string, the first group of a's is longer than the second group of a's. So the string is not in L.
(2, 3): Nonempty v falls in region 2 and nonempty y falls in region 3. Set q to 2. In the resulting string, the second group of a's is longer than the first group of a's. So the string is not in L.
(1, 3): Nonempty v falls in region 1 and nonempty y falls in region 3. If this were allowed by the other conditions of the Pumping Theorem, we could pump in a's and still produce strings in L. But if we pumped out, we would violate the requirement that the a regions be at least as long as the b region. More importantly, this case violates the requirement that |vxy| ≤ k, so it need not be considered.
There is no way to divide w into uvxyz such that all the conditions of the Pumping Theorem are met. So L is not context-free.

32 Nested and Cross-Serial Dependencies PalEven = { w w^R : w ∈ {a, b}* }: in a a b b a a, the dependencies are nested. WcW = { w c w : w ∈ {a, b}* }: in a a b c a a b, the dependencies are cross-serial.

33 WcW = { w c w : w ∈ {a, b}* } Let w = a^k b^k c a^k b^k: aaa … aaa bbb … bbb c aaa … aaa bbb … bbb, with regions | 1 | 2 | 3 | 4 | 5 |. Call the part before c the left side and the part after c the right side.
● If v or y overlaps region 3, set q to 0. The resulting string will no longer contain a c.
● If both v and y occur before region 3, or both occur after region 3, then set q to 2. One side will be longer than the other.
● If either v or y overlaps region 1, then set q to 2. In order to make the right side match, something would have to be pumped into region 4. That violates |vxy| ≤ k.
● If either v or y overlaps region 2, then set q to 2. In order to make the right side match, something would have to be pumped into region 5. That violates |vxy| ≤ k.

34 Work with another student on these {(ab)^n a^n b^n : n > 0} {x#y : x, y ∈ {0, 1}* and x ≠ y}

35 Hidden: {(ab)^n a^n b^n : n > 0} Let w = (ab)^k a^k b^k. Divide w into three regions: the ab region (1), the a region (2), and the b region (3). If either v or y crosses the boundary between regions 2 and 3, then pump in once; the resulting string will have characters out of order. We consider the remaining alternatives for where nonempty v and y can occur:
(1, 1): If |vy| is odd, pump in once and the resulting string will have characters out of order. If it is even, pump in once; the number of ab's will no longer match the number of a's in region 2 or b's in region 3.
(2, 2): Pump in once. More a's in region 2 than b's in region 3.
(3, 3): Pump in once. More b's in region 3 than a's in region 2.
v or y crosses the boundary between regions 1 and 2: Pump in once. Even if v and y are arranged so that the characters are not out of order, there will be more ab pairs than there are b's in region 3.
(1, 3): |vxy| must be less than or equal to k, so this is impossible.

36 Hidden: {x#y : x, y ∈ {0, 1}* and x ≠ y} SURPRISINGLY, it is context-free! We can build a PDA M to accept L. All M has to do is to find one way in which x and y differ. We sketch its construction: M starts by pushing a bottom-of-stack marker Z onto the stack. Then it nondeterministically chooses to go to state 1 or state 2. From state 1, it pushes the characters of x, then after the # starts popping the characters of y. It accepts if the two strings are of different lengths. From state 2, it must accept if two equal-length strings have at least one differing character. So M pushes a % for each character it sees, and nondeterministically chooses a character on which to stop pushing. It remembers that character in its state (so it branches, and there are two similar branches from here on, one for 0 and one for 1). Next it reads the characters up to the # and does nothing with them. Starting with the first character after the #, it pops one % for each character it reads. When the stack is empty (actually, when it contains only Z), it checks whether the next input character matches the remembered character. If it does not match, M accepts.
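The two-branch idea can be checked directly in Python. This sketch implements the membership test the PDA computes (either the lengths differ, or some guessed position i has x[i] ≠ y[i]); it is not the PDA itself, and the function name is mine.

```python
# Membership test for {x#y : x, y in {0,1}*, x != y}, mirroring the PDA's
# two nondeterministic branches.
def in_L(s):
    if s.count("#") != 1:
        return False
    x, y = s.split("#")
    if set(x + y) - set("01"):
        return False                       # only 0s and 1s allowed
    # branch 1: the two strings have different lengths
    if len(x) != len(y):
        return True
    # branch 2: guess a position i where they differ; the PDA's stack of
    # %'s is what lets it compare x[i] with y[i] for an unbounded i
    return any(x[i] != y[i] for i in range(len(x)))

print(in_L("01#00"), in_L("01#01"), in_L("0#00"))  # True False True
```

Note that the complement-looking condition x ≠ y causes no trouble here, because an existential witness (one differing position or a length mismatch) is all the PDA needs to guess.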

