Bottom-up Parsing and the Pumping Theorem for CFLs (MA/CSSE 474 Theory of Computation)

Your Questions?
● Previous class days' material
● Reading assignments
● HW10 or 11 problems
● Exam 2 problems
● Anything else

PDAs and Context-Free Grammars

Theorem: The class of languages accepted by PDAs is exactly the class of context-free languages. Recall: context-free languages are those that can be defined with context-free grammars.

Restated: a language can be described by a context-free grammar if and only if it can be accepted by a PDA.

Recap: Going One Way

Lemma: Each context-free language is accepted by some PDA.

Proof (by construction). The idea: let the stack do the work. Two approaches: top-down and bottom-up.

Example from Yesterday

L = { a^n b^m c^p d^q : m + n = p + q }

Grammar rules and the corresponding top-down PDA transitions:

(0)                ((p, ε, ε), (q, S))      [start]
(1) S → a S d      ((q, ε, S), (q, aSd))
(2) S → T          ((q, ε, S), (q, T))
(3) S → U          ((q, ε, S), (q, U))
(4) T → a T c      ((q, ε, T), (q, aTc))
(5) T → V          ((q, ε, T), (q, V))
(6) U → b U d      ((q, ε, U), (q, bUd))
(7) U → V          ((q, ε, U), (q, V))
(8) V → b V c      ((q, ε, V), (q, bVc))
(9) V → ε          ((q, ε, V), (q, ε))
(10)               ((q, a, a), (q, ε))     [match a]
(11)               ((q, b, b), (q, ε))     [match b]
(12)               ((q, c, c), (q, ε))     [match c]
(13)               ((q, d, d), (q, ε))     [match d]

input = a a b c c d

(Trace-table columns: transition, state, unread input, stack.)
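The top-down PDA above can be simulated with a small backtracking search: the stack holds the remaining sentential form, a nonterminal on top is replaced (nondeterministically) by some rule's right-hand side, and a terminal on top must match the next input symbol. This is a sketch, not the course's official construction; the grammar and input come from the slide, and the depth bound is an assumption added to cut off the nondeterministic search.

```python
# Grammar from the slide, written as nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": ["aSd", "T", "U"],
    "T": ["aTc", "V"],
    "U": ["bUd", "V"],
    "V": ["bVc", ""],
}

def accepts(w: str, stack: str = "S", depth: int = None) -> bool:
    """True if some sequence of top-down PDA moves consumes all of w."""
    if depth is None:
        depth = 2 * len(w) + 8       # crude bound on the search (assumption)
    if depth == 0:
        return False
    if not stack:
        return not w                 # accept iff the input is exhausted too
    top, rest = stack[0], stack[1:]
    if top in GRAMMAR:               # expand a nonterminal (moves 1-9)
        return any(accepts(w, rhs + rest, depth - 1) for rhs in GRAMMAR[top])
    # match a terminal against the input (moves 10-13)
    return bool(w) and w[0] == top and accepts(w[1:], rest, depth - 1)
```

For the slide's input, `accepts("aabccd")` succeeds via S → aSd, S → T, T → aTc, T → V, V → bVc, V → ε.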

The Other Way to Build a PDA - Directly

L = { a^n b^m c^p d^q : m + n = p + q }

(1) S → a S d      (6) U → b U d
(2) S → T          (7) U → V
(3) S → U          (8) V → b V c
(4) T → a T c      (9) V → ε
(5) T → V

input = a a b c d d

The Other Way to Build a PDA - Directly (continued)

L = { a^n b^m c^p d^q : m + n = p + q }, same grammar and input (a a b c d d) as above.

[State diagram with four states, 1 through 4. Its transition labels are garbled in this transcript; the recoverable pattern is that transitions of the form a/ε/a and b/ε/a push an a for each a or b read, while c/a/ε and d/a/ε pop an a for each c or d read.]
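The directly built PDA's behavior can be mirrored by a one-pass check: the letters must appear in the order a* b* c* d*, and the stack (one push per a or b, one pop per c or d) ends empty exactly when m + n = p + q. This is an illustrative sketch, not the PDA itself; the function name `in_L` is ours.

```python
import re

def in_L(w: str) -> bool:
    """Check membership in { a^n b^m c^p d^q : m + n = p + q }."""
    # The PDA's state structure enforces the letter order a* b* c* d*.
    if not re.fullmatch(r"a*b*c*d*", w):
        return False
    # The stack discipline: pushes (a's and b's) must equal pops (c's and d's).
    return w.count("a") + w.count("b") == w.count("c") + w.count("d")
```

For example, the slide's input a a b c d d is accepted: 2 + 1 pushes and 1 + 2 pops.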

Notice the Nondeterminism

Machines constructed with the algorithm are often nondeterministic, even when they needn't be. This happens even with trivial languages.

Example: A^nB^n = { a^n b^n : n ≥ 0 }

A grammar for A^nB^n:      A PDA M for A^nB^n:
[1] S → a S b              (0) ((p, ε, ε), (q, S))
[2] S → ε                  (1) ((q, ε, S), (q, aSb))
                           (2) ((q, ε, S), (q, ε))
                           (3) ((q, a, a), (q, ε))
                           (4) ((q, b, b), (q, ε))

But transitions 1 and 2 make M nondeterministic. A directly constructed machine for A^nB^n can be deterministic. Constructing deterministic top-down parsers is a major topic in CSSE 404.
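To make the contrast concrete, here is a sketch of what the directly constructed, deterministic machine for A^nB^n does: a single left-to-right pass where a counter plays the role of the PDA's stack of a's. The function name and coding style are ours, not the textbook's.

```python
def anbn(w: str) -> bool:
    """Deterministic one-pass check for { a^n b^n : n >= 0 }."""
    count = 0            # the counter stands in for a stack of a's
    seen_b = False
    for c in w:
        if c == "a":
            if seen_b:           # an a after a b: out of order, reject
                return False
            count += 1           # push
        elif c == "b":
            seen_b = True
            if count == 0:       # pop from an empty stack: reject
                return False
            count -= 1           # pop
        else:
            return False         # not in the alphabet
    return count == 0            # accept iff the stack is empty at the end
```

No guessing is needed at any step, which is exactly what the grammar-derived PDA lacks.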

Bottom-Up PDA

(1) E → E + T      (4) T → F
(2) E → T          (5) F → (E)
(3) T → T * F      (6) F → id

Reduce transitions (note that right-hand sides appear reversed on the stack):
(1) ((p, ε, T+E), (p, E))
(2) ((p, ε, T), (p, E))
(3) ((p, ε, F*T), (p, T))
(4) ((p, ε, F), (p, T))
(5) ((p, ε, )E( ), (p, F))
(6) ((p, ε, id), (p, F))

Shift transitions:
(7) ((p, id, ε), (p, id))
(8) ((p, (, ε), (p, ())
(9) ((p, ), ε), (p, )))
(10) ((p, +, ε), (p, +))
(11) ((p, *, ε), (p, *))

The idea: let the stack keep track of what has been found. Discover a rightmost derivation in reverse order. Start with the sentence and try to "pull it back" (reduce) to S. When the right side of a production is on the top of the stack, we can replace it by the left side of that production... or not! That's where the nondeterminism comes in: the choice between shift and reduce, and the choice between two reductions.

Parse the string: id + id * id

Hidden: Solution to bottom-up example

A bottom-up parser is sometimes called a shift-reduce parser. Show how it works on id + id * id. (The stack is written top-first.)

State | Stack     | Remaining input | Transition to use
p     | ε         | id + id * id    | 7
p     | id        | + id * id       | 6
p     | F         | + id * id       | 4
p     | T         | + id * id       | 2
p     | E         | + id * id       | 10
p     | +E        | id * id         | 7
p     | id+E      | * id            | 6
p     | F+E       | * id            | 4
p     | T+E       | * id            | 11
p     | *T+E      | id              | 7
p     | id*T+E    | ε               | 6
p     | F*T+E     | ε               | 3
p     | T+E       | ε               | 1
p     | E         | ε               | 0
q     | ε         | ε               |
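The shift-reduce process in the trace above can be sketched as a brute-force recognizer: at each step, either shift the next token onto the stack or reduce a matching stack suffix to the rule's left-hand side, exploring the nondeterministic choices by backtracking. This is an illustrative sketch (tokens are assumed pre-lexed, so "id" is one token), not an efficient parser.

```python
# Expression grammar from the slide, as (LHS, RHS) pairs.
RULES = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]

def parses(tokens, stack=(), depth=100):
    """True if some shift/reduce sequence reduces tokens to E."""
    if depth == 0:
        return False
    if not tokens and stack == ("E",):
        return True                       # finished: stack is exactly E
    # try every applicable reduction (replace a matching suffix by the LHS)
    for lhs, rhs in RULES:
        n = len(rhs)
        if stack[-n:] == rhs and parses(tokens, stack[:-n] + (lhs,), depth - 1):
            return True
    # otherwise try shifting the next input token onto the stack
    return bool(tokens) and parses(tuple(tokens[1:]), stack + (tokens[0],), depth - 1)
```

Here the stack top is at the right, the mirror image of the slide's top-first notation; the nondeterminism the slide points out is exactly the two branches explored above.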

A Bottom-Up Parser

A top-down parser discovers a leftmost derivation of the input string (if any). A bottom-up parser discovers a rightmost derivation, in reverse order.

The outline of M is: M = ({p, q}, Σ, V, Δ, p, {q}), where Δ contains:
● The shift transitions: ((p, c, ε), (p, c)), for each c ∈ Σ.
● The reduce transitions: ((p, ε, (s1 s2 … sn)^R), (p, X)), for each rule X → s1 s2 … sn in G. Each such transition undoes an application of its rule.
● The finish-up transition: ((p, ε, S), (q, ε)).

Going the Other Way: From a PDA to a CFG

This direction is much more complex. Nonterminals in the grammar we build from the PDA M are based on combinations of M's states and stack symbols. It gets very messy, and takes 10 dense pages in the textbook. I think we can use our limited course time better.

How Many Context-Free Languages Are There? (We had a slide just like this for regular languages.)

Theorem: There is a countably infinite number of CFLs.

Proof:
● Upper bound: we can lexicographically enumerate all the CFGs.
● Lower bound: {a}, {aa}, {aaa}, … are all distinct CFLs.

The number of languages is uncountable. Thus there are more languages than there are context-free languages, so there must be some languages that are not context-free.

Languages That Are and Are Not Context-Free

a*b* is regular. A^nB^n = { a^n b^n : n ≥ 0 } is context-free but not regular. A^nB^nC^n = { a^n b^n c^n : n ≥ 0 } is not context-free. Is every regular language also context-free?

Showing that L is Context-Free

Techniques for showing that a language L is context-free:
1. Exhibit a context-free grammar for L.
2. Exhibit a PDA for L.
3. Use the closure properties of context-free languages. Unfortunately, these are weaker than they are for regular languages:
   ● closed under union, reverse, concatenation, Kleene star, and intersection of a CFL with a regular language;
   ● NOT closed under intersection, complement, or set difference.

Showing that L is Not Context-Free Remember the pumping argument for regular languages:

A Review of Parse Trees

A parse tree, derived from a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which:
● every leaf node is labeled with an element of Σ ∪ {ε},
● the root node is labeled S,
● every other node is labeled with some element of V − Σ,
● if m is a non-leaf node labeled X and the children of m are labeled x1, x2, …, xn, then the rule X → x1 x2 … xn is in R.

Some Tree Basics

The height h of a tree is the length of the longest path from the root to any leaf. The branching factor b of a tree is the largest number of children of any node in the tree.

Theorem: The length of the yield of any tree T with height h and branching factor b is ≤ b^h. (Proved in CSSE 230.)

From Grammars to Trees

Given a context-free grammar G:
● Let n be the number of nonterminal symbols in G.
● Let b be the branching factor of G.

Suppose that a tree T is generated by G and no nonterminal appears more than once on any path. Then:
The maximum height of T is n.
The maximum length of T's yield is b^n.

The Context-Free Pumping Theorem

This time we use parse trees, not machines, as the basis for our argument. Suppose L(G) contains a string w such that |w| > b^n. Then its parse tree must contain a path on which some nonterminal X repeats:

[Diagram: S at the root derives u X[1] z; the upper occurrence X[1] derives v X[2] y; the lower occurrence X[2] derives x. So w = uvxyz.]

Let T be a parse tree for w such that there is no other parse tree for w (generated from G) that has fewer nodes than T. X[1] is the lowest place in the tree at which this repetition happens; i.e., there is no other X in the derivation of x from X[2].

The Context-Free Pumping Theorem

There is another derivation in G: S ⇒* uXz ⇒* uxz, in which, at the point labeled [1], the nonrecursive rule 2 is used instead. So uxz is also in L(G).

The Context-Free Pumping Theorem

There are infinitely many derivations in G, such as: S ⇒* uXz ⇒* uvXyz ⇒* uvvXyyz ⇒* uvvxyyz. Those derivations produce the strings uv^2xy^2z, uv^3xy^3z, uv^4xy^4z, … So all of those strings are also in L(G).

The Context-Free Pumping Theorem

If rule 1 is X → Xa, we could have v = ε. If rule 1 is X → aX, we could have y = ε. But it is not possible that both v and y are ε: if they were, then the derivation S ⇒* uXz ⇒* uxz would also yield w, and it would create a parse tree with fewer nodes. That contradicts the assumption that we started with a tree with the smallest possible number of nodes.

The Context-Free Pumping Theorem

The height of the subtree rooted at [1] is at most n + 1. So |vxy| ≤ b^(n+1). We let k = b^(n+1).

The Context-Free Pumping Theorem

If L is a context-free language, then
∃ k ≥ 1 such that
  ∀ strings w ∈ L with |w| ≥ k,
    ∃ u, v, x, y, z with w = uvxyz, vy ≠ ε, |vxy| ≤ k, and
      ∀ q ≥ 0, uv^q x y^q z ∈ L.

Write it in contrapositive form.
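The conclusion of the theorem can be stated executably: given a claimed decomposition w = uvxyz, pumping with exponent q yields uv^q x y^q z. The helper below and the A^nB^n membership test are our illustrative names, not from the slides; the example shows a decomposition of a string in the context-free language A^nB^n that pumps within the language, as the theorem promises.

```python
def pump(u: str, v: str, x: str, y: str, z: str, q: int) -> str:
    """Build u v^q x y^q z from a decomposition w = uvxyz."""
    return u + v * q + x + y * q + z

def anbn(s: str) -> bool:
    """Membership in A^nB^n = { a^n b^n : n >= 0 }."""
    na, nb = s.count("a"), s.count("b")
    return na == nb and s == "a" * na + "b" * nb

# For w = "aabb" take u='a', v='a', x='', y='b', z='b' (vy != '', |vxy| <= 2):
# every exponent q >= 0 keeps the string inside A^nB^n.
assert all(anbn(pump("a", "a", "", "b", "b", q)) for q in range(6))
```

Proving a language NOT context-free means showing that no such decomposition works; the next slides do exactly that.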

Regular vs. CF Pumping Theorems

Similarities:
● We don't get to choose k.
● We choose w, the string to be pumped, based on k.
● We don't get to choose how w is broken up (into xyz or uvxyz).
● We choose a value for q that shows that w isn't pumpable.
● We may apply closure theorems before we start.

Things that are different in the CFL Pumping Theorem:
● Two regions, v and y, must be pumped in tandem.
● We don't know anything about where in the string v and y will fall. All we know is that they are reasonably "close together": |vxy| ≤ k.
● Either v or y could be empty, but not both.

An Example of Pumping: A^nB^nC^n

A^nB^nC^n = { a^n b^n c^n : n ≥ 0 }. Choose w = a^k b^k c^k:

  aaa…a | bbb…b | ccc…c
    1   |   2   |   3     (the regions: all a's, all b's, all c's)

If either v or y spans two regions, then let q = 2 (i.e., pump in once). The resulting string will have letters out of order and thus not be in A^nB^nC^n.

If v and y each contain only one distinct character, set q = 2. Copies of at most two different characters are added, leaving the third unchanged, so there are no longer equal numbers of the three letters and the resulting string is not in A^nB^nC^n.
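For a small k, the case analysis above can be checked exhaustively by machine: try every decomposition w = uvxyz with vy ≠ ε and |vxy| ≤ k, pump with q = 2, and confirm that the result always leaves the language. This brute-force companion is our own sketch, not part of the proof.

```python
def in_anbncn(s: str) -> bool:
    """Membership in A^nB^nC^n = { a^n b^n c^n : n >= 0 }."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

def pumps_within(w: str, k: int) -> bool:
    """True iff SOME decomposition w = uvxyz (vy nonempty, |vxy| <= k)
    stays in A^nB^nC^n when pumped with q = 2."""
    for i in range(len(w) + 1):                       # w = u . vxy . z
        for j in range(i, min(i + k, len(w)) + 1):    # enforce |vxy| <= k
            u, vxy, z = w[:i], w[i:j], w[j:]
            for a in range(len(vxy) + 1):             # split vxy into v, x, y
                for b in range(a, len(vxy) + 1):
                    v, x, y = vxy[:a], vxy[a:b], vxy[b:]
                    if v + y and in_anbncn(u + v * 2 + x + y * 2 + z):
                        return True
    return False
```

Running `pumps_within("aaaabbbbcccc", 4)` returns False: with k = 4, no decomposition of a^4 b^4 c^4 survives q = 2, exactly as the argument predicts.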

An Example of Pumping: { a^(n²) : n ≥ 0 }

L = { a^(n²) : n ≥ 0 }. The elements of L:

n : w
0 : ε
1 : a^1
2 : a^4
3 : a^9
4 : a^16
5 : a^25
6 : a^36

Hidden: Pumping Example: { a^(n²) : n ≥ 0 }

L = { a^(n²) : n ≥ 0 }. For any given k > 0, let n = k², so that n² = k⁴, and let w = a^(k⁴). vy must be a^p for some nonzero p. Set q = 2. The resulting string s is a^(k⁴ + p). If the theorem held, s would be in L. But it isn't, because it is too short:

w: (k²)² = k⁴ a's
next longer string in L: (k² + 1)² = k⁴ + 2k² + 1 a's

For s to be in L, p = |vy| would have to be at least 2k² + 1. But |vxy| ≤ k, so p can't be that large. Thus s is not in L, and L is not context-free.
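The arithmetic gap that drives this argument can be sanity-checked numerically: k⁴ = (k²)² is a perfect square, the next square is (k² + 1)² = k⁴ + 2k² + 1, and any pump of size 1 ≤ p ≤ k lands strictly between them. A small check (our own sketch):

```python
import math

def is_square(n: int) -> bool:
    """True iff n is a perfect square."""
    r = math.isqrt(n)
    return r * r == n

# k^4 is a square, and k^4 + p is NOT a square for any pump size
# 1 <= p <= k, since the next square is k^4 + 2k^2 + 1 and p <= k < 2k^2 + 1.
for k in range(1, 50):
    assert is_square(k ** 4)
    for p in range(1, k + 1):
        assert not is_square(k ** 4 + p)
```

So pumping a^(k⁴) by any amount allowed by |vxy| ≤ k always produces a string whose length is not a perfect square.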

Another Example of Pumping

L = { a^n b^m a^n : n, m ≥ 0 and n ≥ m }. Let w = a^k b^k a^k:

  aaa…a | bbb…b | aaa…a
    1   |   2   |   3

Hidden: L = { a^n b^m a^n : n, m ≥ 0 and n ≥ m }

Let w = a^k b^k a^k:

  aaa…a | bbb…b | aaa…a
    1   |   2   |   3

If either v or y crosses regions, then set q = 2 (thus pumping in once). The resulting string will have letters out of order and so not be in L. So in all the remaining cases we assume that v and y each fall within a single region.

(1, 1): Both v and y fall in region 1. Set q = 2. In the resulting string, the first group of a's is longer than the second group, so the string is not in L.
(2, 2): Both v and y fall in region 2. Set q = 2. In the resulting string, the b region is longer than either of the a regions, so the string is not in L.
(3, 3): Both v and y fall in region 3. Set q = 0. The same argument as for (1, 1).
(1, 2): Nonempty v falls in region 1 and nonempty y falls in region 2. (If either v or y is empty, it does not matter where it falls, so we can treat it as though it falls in the same region as the nonempty one; we have already considered all of those cases.) Set q = 2. In the resulting string, the first group of a's is longer than the second group, so the string is not in L.
(2, 3): Nonempty v falls in region 2 and nonempty y falls in region 3. Set q = 2. In the resulting string, the second group of a's is longer than the first, so the string is not in L.
(1, 3): Nonempty v falls in region 1 and nonempty y falls in region 3. This case violates the requirement that |vxy| ≤ k, so it need not be considered. (If it were allowed, we could pump in a's and still produce strings in L; but if we pumped out, we would violate the requirement that the a regions be at least as long as the b region.)

There is no way to divide w into uvxyz such that all the conditions of the Pumping Theorem are met. So L is not context-free.

Nested and Cross-Serial Dependencies

PalEven = { w w^R : w ∈ {a, b}* }

  a a b b a a     (the dependencies are nested)

WcW = { w c w : w ∈ {a, b}* }

  a a b c a a b   (cross-serial dependencies)

WcW = { w c w : w ∈ {a, b}* }. Let w = a^k b^k c a^k b^k:

  aaa…a | bbb…b | c | aaa…a | bbb…b
    1   |   2   | 3 |   4   |   5

Call the part before c the left side and the part after c the right side.
● If v or y overlaps region 3, set q = 0. The resulting string will no longer contain a c.
● If both v and y occur before region 3, or both occur after it, then set q = 2. One side will be longer than the other.
● If either v or y overlaps region 1, then set q = 2. In order to make the right side match, something would have to be pumped into region 4; but that violates |vxy| ≤ k.
● If either v or y overlaps region 2, then set q = 2. In order to make the right side match, something would have to be pumped into region 5; but that violates |vxy| ≤ k.

Work with another student on these:
● { (ab)^n a^n b^n : n > 0 }
● { x#y : x, y ∈ {0, 1}* and x ≠ y }

Hidden: { (ab)^n a^n b^n : n > 0 }

Let w = (ab)^k a^k b^k. Divide w into three regions: the ab region, the a region, and the b region. If either v or y crosses the boundary between regions 2 and 3, then pump in once: the resulting string will have characters out of order. We consider the remaining alternatives for where nonempty v and y can occur:

(1, 1): If |vy| is odd, pump in once and the resulting string will have characters out of order. If it is even, pump in once; the number of ab's will no longer match the number of a's in region 2 or b's in region 3.
(2, 2): Pump in once. More a's in region 2 than b's in region 3.
(3, 3): Pump in once. More b's in region 3 than a's in region 2.
v or y crosses the boundary between regions 1 and 2: Pump in once. Even if v and y are arranged such that the characters are not out of order, there will be more ab pairs than there are b's in region 3.
(1, 3): |vxy| must be ≤ k, so this is impossible.

Hidden: { x#y : x, y ∈ {0, 1}* and x ≠ y }

Surprisingly, it is context-free! We can build a PDA M to accept L. All M has to do is find one way in which x and y differ. We sketch its construction:

M starts by pushing a bottom-of-stack marker Z onto the stack. Then it nondeterministically chooses to go to state 1 or state 2.

From state 1, it pushes the characters of x, then after the # starts popping the characters of y. It accepts if the two strings are of different lengths.

From state 2, it must accept if two equal-length strings differ in at least one character. So M starts pushing a % for each character it sees. It nondeterministically chooses a character on which to stop pushing, and remembers that character in its state (so the machine branches, and there are two similar branches, one for "0" and one for "1", from here on). Next it reads the characters up to the # and does nothing with them. Starting with the first character after the #, it pops one % for each character it reads. When the stack is empty (actually, when it contains only Z), it checks whether the next input character matches the remembered character. If it does not, it accepts.
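The two nondeterministic branches of M translate directly into a declarative description of the language, which can be checked without a stack: accept iff the lengths differ (branch 1) or some aligned position differs (branch 2, where the stack of %'s marks the guessed position). This checker is our own sketch of what M computes, not the PDA itself.

```python
def differs(w: str) -> bool:
    """Membership in { x#y : x, y in {0,1}* and x != y }."""
    if w.count("#") != 1:
        return False
    x, y = w.split("#")
    if not set(x + y) <= {"0", "1"}:
        return False
    if len(x) != len(y):
        return True                            # branch 1: lengths differ
    return any(a != b for a, b in zip(x, y))   # branch 2: some position differs
```

Note why this is context-free while WcW is not: M never has to compare x and y in full; it only has to find one witness of a difference, and a single stack suffices for that.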