CSE 3813 Introduction to Formal Languages and Automata Chapter 8 Properties of Context-free Languages These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, 4th ed., by Peter Linz, published by Jones and Bartlett Publishers, Inc., Sudbury, MA, 2006. They are intended for classroom use only and are not a substitute for reading the textbook.

The pumping lemma for context-free languages Suppose you have a CFG G in which the variable A is used in two different rules, to derive two different strings, e.g., (1) $S \to vAz$ (2) $A \to wAy$ (3) $A \to x$. We can use these rules, applying rule 2 recursively and finishing with rule 3, to generate the following string: $S \Rightarrow vAz \Rightarrow vwAyz \Rightarrow vwwAyyz \Rightarrow vwwwAyyyz \Rightarrow \cdots \Rightarrow vw^nxy^nz$.

The pumping lemma for CFLs Of course, we can apply rule 3 at any point along the way to bring the process to a halt. Thus, the following strings are all legitimate strings in the language: vwxyz, vwwxyyz, vwwwxyyyz, etc. In fact, with rules 2 and 3 in the grammar, there is no way to prevent the language from containing an infinite number of strings of the form $vw^nxy^nz$.
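The family of strings above can be sketched in a few lines of Python. This is only an illustration: the function name `pumped` and the one-letter placeholder strings are assumptions made for the example, not part of the textbook.

```python
def pumped(v: str, w: str, x: str, y: str, z: str, n: int) -> str:
    """Apply rule 1 once, rule 2 (A -> wAy) n times, then rule 3 (A -> x)."""
    return v + w * n + x + y * n + z


# Placeholder terminal strings, chosen only for illustration.
for n in range(4):
    print(n, pumped("v", "w", "x", "y", "z", n))
```

Every value of n, including n = 0, yields a string of the language, which is why rules 2 and 3 together force the language to be infinite.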

The pumping lemma for CFLs Remember the definition of Chomsky Normal Form grammars: A CFG is in Chomsky Normal Form if every production is of one of these two types: $A \to BC$ or $A \to a$. Remember also that we can put any CFG into CNF (omitting the null string, if it belongs to the original language).

The pumping lemma for CFLs If a grammar is in CNF, then its derivation tree will be binary; that is, every node will have at most two children. Why? There are only 3 possibilities: (1) The node represents the first type of rule above, in which a single variable produces two variables. (2) The node represents the second type of rule above, in which a single variable produces a single terminal. (3) The node is a terminal node and so has no children.

The pumping lemma for CFLs A path in a binary tree is either empty, or consists of a node, one of its descendants, and all of the nodes in between. The length of a path is the number of nodes it contains (for this class, we will use this definition; however, most of the time length and height are measured in the number of edges, not the number of nodes). The height of a binary tree is the length of its longest path.

The pumping lemma for CFLs You could create a very tall binary tree by having all branches be unary. You can create the shortest possible binary tree by having all of its branches be binary, except possibly for some or all of the branches at the bottom level of the tree.

The pumping lemma for CFLs What is the smallest height possible in a binary tree of 7 nodes? How many leaf nodes does it have? height = 3 num. leaves = 4

The pumping lemma for CFLs What is the smallest height possible in a binary tree of $2^n - 1$ nodes? How many leaf nodes does it have? height = $n$, num. leaves = $2^{n-1}$

The pumping lemma for CFLs Note the pattern here: In a completely filled binary tree with $2^n - 1$ nodes, half of the nodes (rounding up) will be leaves. That is, $2^n / 2$ nodes will be leaf nodes. And we can rewrite $2^n / 2$ as $2^{n-1}$. This leads us to the following lemma:

The pumping lemma for CFLs Lemma: For any $h \ge 1$, a binary tree which has more than $2^{h-1}$ leaf nodes must have a height greater than $h$. Example: If a binary tree has 17 leaf nodes, can it have a height of 5? No; a complete binary tree of height 5 has only 16 leaf nodes. A binary tree with 17 leaves must have a height greater than 5.
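The lemma can be checked numerically with a short sketch. The helper names `max_leaves` and `min_height` are made up for this illustration, and height is counted in nodes, as in these notes.

```python
def max_leaves(height: int) -> int:
    """Most leaves a binary tree of this height can have (height counted in nodes)."""
    return 2 ** (height - 1)


def min_height(leaves: int) -> int:
    """Smallest height (in nodes) of any binary tree with this many leaves."""
    h = 1
    while max_leaves(h) < leaves:
        h += 1
    return h


for leaves in (4, 16, 17):
    print(leaves, "leaves need height >=", min_height(leaves))
```

With 16 leaves a height of 5 is enough, but one more leaf pushes the minimum height to 6, matching the example in the lemma.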

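The pumping lemma for CFLs Here is the point of all this: every node on a path of the derivation tree, except the final leaf, is labeled with a variable. If the height of the derivation tree for a given string in the language is $h$ (counted in nodes), its longest path contains $h - 1$ variable nodes. If the grammar has fewer than $h - 1$ variables, then at least one variable must recur on that path in the derivation of this string.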

The pumping lemma for CFLs For a variable to recur farther down in the same path, it must be either: self-recursive (e.g., $A \to aA$) or path-recursive (e.g., $A \to aB$ and $B \to bA$). In either case, this variable may be pumped an unrestricted number of times.

Theorem 8.1 Let L be a CFL. Then there is an integer $m$ such that for any $w \in L$ satisfying $|w| \ge m$, there are strings u, v, x, y, and z satisfying $w = uvxyz$, $|vy| > 0$, and $|vxy| \le m$, such that $uv^ixy^iz \in L$ for every $i \ge 0$.

The pumping lemma for CFLs We can use the pumping lemma for context-free languages to prove that a particular language is not context-free. We do this by assuming that the language is context-free; this means that there must be an m satisfying the conditions given above. If we find that this causes a contradiction, then we know the language can’t be a CFL.

Proof Given the language $L = \{a^ib^ic^i \mid i \ge 1\}$, assume that L is context-free. Let $w = a^mb^mc^m$, so that $|w| = 3m \ge m$. According to Theorem 8.1, $|vy| > 0$. Thus, v and y together must contain at least one symbol. According to Theorem 8.1, $|vxy| \le m$. Thus, the string vxy can contain at most two distinct types of symbols.

Proof The string vxy can’t contain all three symbols, a, b, and c. (Why? Because $|vxy| \le m$, and any substring of w that contains both an a and a c must also contain all m b's.) The string $uv^2xy^2z$ contains additional occurrences of the symbols in v and y, while the count of at least one symbol type is unchanged. Therefore, $uv^2xy^2z$ cannot contain equal numbers of all three symbols. But the pumping lemma says that $uv^2xy^2z$ must be a legitimate string in L. This is a contradiction. Consequently, L cannot be a context-free language.
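For a small value of m, the argument can be checked by brute force. The following is a minimal sketch, not part of the textbook proof: it assumes a stand-in pumping length m = 3, enumerates every decomposition allowed by Theorem 8.1, and tests only the exponents 0 and 2. The function names are made up for this illustration.

```python
from typing import Callable, Iterator, List, Tuple

Split = Tuple[str, str, str, str, str]


def in_L(s: str) -> bool:
    """Membership in L = { a^i b^i c^i : i >= 1 }."""
    k, r = divmod(len(s), 3)
    return r == 0 and k >= 1 and s == "a" * k + "b" * k + "c" * k


def splits(w: str, m: int) -> Iterator[Split]:
    """All decompositions w = uvxyz with |vxy| <= m and |vy| >= 1."""
    n = len(w)
    for start in range(n + 1):                          # u = w[:start]
        for end in range(start, min(start + m, n) + 1): # vxy = w[start:end]
            for j in range(start, end + 1):             # v = w[start:j]
                for k in range(j, end + 1):             # x = w[j:k], y = w[k:end]
                    if (j - start) + (end - k) >= 1:
                        yield w[:start], w[start:j], w[j:k], w[k:end], w[end:]


def surviving_splits(w: str, m: int, member: Callable[[str], bool],
                     exponents=(0, 2)) -> List[Split]:
    """Decompositions whose pumped strings all stay in the language."""
    return [
        (u, v, x, y, z)
        for u, v, x, y, z in splits(w, m)
        if all(member(u + v * e + x + y * e + z) for e in exponents)
    ]


m = 3  # stand-in for the unknown pumping length (an assumption for this demo)
w = "a" * m + "b" * m + "c" * m
print(len(surviving_splits(w, m, in_L)))
```

No decomposition survives, which is the contradiction the proof relies on. A finite check like this illustrates the argument for one m only; the proof itself covers every m.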

Example Given the language $L = \{a^ib^ic^i \mid i \ge 1\}$, how would you try to recognize this language using a push-down automaton? We can ensure that we have an equal number of a’s and b’s, by pushing the a’s onto the stack one at a time, then popping them off and matching them up with the b’s one by one.

Example However, once we have done that, we don’t have anything left to match the c’s with, so we can’t guarantee that we have the same number of c’s as a’s and b’s. We can’t solve this problem by pushing a’s or b’s back onto the stack. This is due to the limitations of the type of memory we have in a PDA.
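The stack discipline described above can be sketched as follows. This is a deterministic Python imitation of the idea, not a formal PDA, and the function name is made up for this illustration. It matches a's against b's with a single stack and then shows that nothing is left to count the c's against.

```python
def accepts_anbn_then_cs(s: str) -> bool:
    """Check a^n b^n (n >= 1) with one stack, followed by any number of c's."""
    stack = []
    pos = 0
    # Push one marker per leading a.
    while pos < len(s) and s[pos] == "a":
        stack.append("a")
        pos += 1
    if not stack:
        return False
    # Pop one marker per b.
    while pos < len(s) and s[pos] == "b" and stack:
        stack.pop()
        pos += 1
    if stack:
        return False  # more a's than b's
    # The stack is empty now: the value of n has been forgotten,
    # so the c's can only be checked for shape, not for number.
    return all(ch == "c" for ch in s[pos:])


for s in ("aabbcc", "aabbc", "aabbccccc", "aabbb"):
    print(s, accepts_anbn_then_cs(s))
```

The sketch accepts aabbc and aabbccccc as readily as aabbcc, because once the a's have been popped there is no record of n left to compare with the number of c's.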

Pumping lemma (again) The pumping lemma for regular languages states: every sufficiently long string in a regular language contains a short substring that can be pumped. The pumping lemma for context-free languages states: every sufficiently long string in a context-free language contains two short (and close-together) substrings that can be pumped (the same number of times).

Formal statement (again) Let L be a context-free language. Then there exists some positive integer $m$ such that any string $w \in L$ of length $|w| \ge m$ can be decomposed into substrings u, v, x, y, z, such that $w = uvxyz$, $|vxy| \le m$, $|v| > 0$ or $|y| > 0$, and $uv^kxy^kz \in L$ for all $k \ge 0$.

Informal statement Every context-free language has a “pumping length” such that every string in the language of at least this length can be pumped to yield another string in the language. The string can be divided into five parts such that the second and fourth parts can be repeated together, or “pumped,” any number of times, and the resulting string remains in the language.

What is m? In the pumping lemma for regular languages, the “pumping length” m reflects the number of states of the finite automaton. In the pumping lemma for context-free languages, what does m reflect? Roughly, it is the length of the longest string that can be generated by a parse tree in which the same nonterminal never occurs twice on the same path through the tree.

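In a sufficiently large parse tree, some nonterminal must repeat along some path from the root. This follows from the pigeonhole principle. [Figure: parse tree with root S in which the nonterminal A occurs twice on one path; the leaves, read left to right, are divided into u, v, x, y, z.]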

Proof Idea The repetition of some nonterminal along a path through the parse tree allows us to replace the subtree under the last occurrence of the nonterminal with the subtree under an earlier occurrence of the nonterminal and still get a valid parse tree. This corresponds to pumping v and y. Note that the parse tree of the previous slide corresponds to the following derivation: $S \Rightarrow^* uAz \Rightarrow^* uvAyz \Rightarrow^* uvxyz$.

Important to remember You can use a pumping lemma to prove that a language is not context-free (or regular). You cannot use a pumping lemma to prove that a language is context-free (or regular).

Exercise The language $L = \{ww \mid w \in \{a, b\}^*\}$ is not context-free. Pick a string in L; try $a^mb^ma^mb^m$. Then note that you must consider three cases: vxy is a substring of the prefix $a^mb^m$, or of the “middle” $b^ma^m$, or of the suffix $a^mb^m$. Intuitively, why can’t a PDA accept this language, although it can accept the language $\{ww^R \mid w \in \{a, b\}^*\}$?
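A brute-force check can serve as a sanity test for the exercise. The sketch below assumes small stand-in values of m and tests only the exponents 0 and 2; it looks for a decomposition of $a^mb^ma^mb^m$ that survives pumping. The function names are made up for this illustration.

```python
def is_ww(s: str) -> bool:
    """Membership in { ww : w in {a, b}* }."""
    half, r = divmod(len(s), 2)
    return r == 0 and s[:half] == s[half:]


def pumpable_splits(w: str, m: int, exponents=(0, 2)):
    """All w = uvxyz with |vxy| <= m, |vy| >= 1 that stay in the language."""
    found = []
    n = len(w)
    for start in range(n + 1):
        for end in range(start, min(start + m, n) + 1):
            for j in range(start, end + 1):
                for k in range(j, end + 1):
                    u, v, x, y, z = w[:start], w[start:j], w[j:k], w[k:end], w[end:]
                    if v + y and all(is_ww(u + v * e + x + y * e + z) for e in exponents):
                        found.append((u, v, x, y, z))
    return found


for m in (2, 3):  # stand-ins for the unknown pumping length
    w = "a" * m + "b" * m + "a" * m + "b" * m
    print(m, pumpable_splits(w, m))
```

Every window vxy falls into one of the three cases named in the exercise, and in each case pumping down or pumping up breaks the ww pattern, so the search finds nothing.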

Pumping Lemma for Linear Languages Let L be an infinite linear language. Then there exists some positive integer $m$, such that any $w \in L$ with $|w| \ge m$ can be decomposed as $w = uvxyz$ with $|uvyz| \le m$ and $|vy| \ge 1$, such that $uv^ixy^iz \in L$ for all $i = 0, 1, 2, \ldots$

Pumping Lemma for Linear Languages Note that the conclusion of this theorem (Theorem 8.2) is different from that of Theorem 8.1, since in Theorem 8.1 we have $|vxy| \le m$ and in Theorem 8.2 we have $|uvyz| \le m$. This implies that the strings v and y to be pumped must now be within m symbols of the left and right ends of w, respectively. The middle string x can be of arbitrary length. Theorem 8.2 helps establish the fact that the family of linear languages is a proper subset of the family of context-free languages.
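To see what a decomposition with short outer parts looks like, here is a small sketch for the linear language $\{a^nb^n \mid n \ge 0\}$, generated by $S \to aSb \mid \lambda$. The choice m = 2 and the function names are assumptions made for this illustration.

```python
def in_L(s: str) -> bool:
    """Membership in { a^n b^n : n >= 0 }."""
    k, r = divmod(len(s), 2)
    return r == 0 and s == "a" * k + "b" * k


def linear_split(w: str):
    """Pump the outermost a and b: u = z = '', v = 'a', y = 'b'; needs |w| >= 2."""
    return "", w[0], w[1:-1], w[-1], ""


m = 2  # assumed bound for this demo
w = "aaaabbbb"
u, v, x, y, z = linear_split(w)
assert len(u + v + y + z) <= m and len(v + y) >= 1
print([u + v * i + x + y * i + z for i in range(4)])
```

Here v and y sit at the two ends of w and the middle part x carries almost the whole string, which is exactly the shape Theorem 8.2 requires and Theorem 8.1 does not.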

Closure properties for context-free languages The family of context-free languages is closed under the operations of union, concatenation, and Kleene closure, but not under the operations of intersection and complementation.

Definition A context-free grammar (CFG) is a 4-tuple $G = (V, T, S, P)$ where V and T are disjoint sets, $S \in V$, and P is a finite set of rules of the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. V = non-terminals or variables; T = terminals; S = start symbol; P = productions or grammar rules.

Closure properties of CFLs CFLs are closed under union, concatenation, and Kleene closure. Proof by construction: Let $G_1 = (V_1, T_1, S_1, P_1)$ and $G_2 = (V_2, T_2, S_2, P_2)$ with $L_1 = L(G_1)$ and $L_2 = L(G_2)$.

Union We create grammar $G_u = (V_u, T_1 \cup T_2, S_u, P_u)$ generating $L_1 \cup L_2$.
1. Rename the elements of $V_2$ if necessary so that $V_1 \cap V_2 = \emptyset$.
2. Create a new start symbol $S_u$, not already in $V_1$ or $V_2$.
3. Set $V_u = V_1 \cup V_2 \cup \{S_u\}$.
4. Set $P_u = P_1 \cup P_2 \cup \{S_u \to S_1 \mid S_2\}$.
Construction completed.
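The union construction can be sketched in Python. The representation is an assumption made for this illustration: a grammar is given by a dictionary mapping each variable to its list of right-hand sides (tuples of symbols), every variable has an entry in the dictionary, and no terminal happens to end in `_1` or `_2`.

```python
from typing import Dict, List, Tuple

Productions = Dict[str, List[Tuple[str, ...]]]


def rename(p: Productions, suffix: str) -> Productions:
    """Step 1: tag every variable (every key of p) with a suffix."""
    fresh = {var: var + suffix for var in p}
    return {
        fresh[var]: [tuple(fresh.get(sym, sym) for sym in rhs) for rhs in rules]
        for var, rules in p.items()
    }


def union(p1: Productions, s1: str, p2: Productions, s2: str,
          new_start: str = "S_u") -> Tuple[Productions, str]:
    """Steps 1 to 4: disjoint variables, new start symbol, S_u -> S_1 | S_2."""
    p1, p2 = rename(p1, "_1"), rename(p2, "_2")
    assert new_start not in p1 and new_start not in p2
    pu = {**p1, **p2}
    pu[new_start] = [(s1 + "_1",), (s2 + "_2",)]
    return pu, new_start


# Both sample grammars use S as start symbol, so renaming is really needed.
g1 = {"S": [("a", "S", "b"), ("a", "b")]}   # a^n b^n, n >= 1
g2 = {"S": [("c", "S"), ("c",)]}            # c^n, n >= 1
pu, start = union(g1, "S", g2, "S")
for var, rules in pu.items():
    print(var, "->", rules)
```

Tagging the two variable sets with different suffixes is just one simple way to make them disjoint; any renaming that avoids clashes would do.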

Concatenation We create grammar $G_c = (V_c, T_1 \cup T_2, S_c, P_c)$ generating $L_1L_2$.
1. Rename the elements of $V_2$ if necessary so that $V_1 \cap V_2 = \emptyset$.
2. Create a new start symbol $S_c$, not already in $V_1$ or $V_2$.
3. Set $V_c = V_1 \cup V_2 \cup \{S_c\}$.
4. Set $P_c = P_1 \cup P_2 \cup \{S_c \to S_1S_2\}$.
Construction completed.
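A sketch of the concatenation construction, together with a small bounded string generator to check the result. Everything here is illustrative: the dictionary representation, the function names, and the sample grammars are assumptions, the variable sets are taken to be disjoint already, and the generator assumes there are no λ-productions so that sentential forms never shrink.

```python
from collections import deque


def concatenate(p1, s1, p2, s2, new_start="S_c"):
    """P_c = P_1 ∪ P_2 ∪ {S_c -> S_1 S_2}; variable sets assumed disjoint."""
    assert not set(p1) & set(p2) and new_start not in p1 and new_start not in p2
    pc = {**p1, **p2}
    pc[new_start] = [(s1, s2)]
    return pc, new_start


def language(productions, start, max_len):
    """Strings of length <= max_len derivable from start (no λ-productions)."""
    seen = {(start,)}
    queue = deque(seen)
    result = set()
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable, if any.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            result.add("".join(form))
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result


g1 = {"S1": [("a", "S1", "b"), ("a", "b")]}   # a^n b^n, n >= 1
g2 = {"S2": [("c", "S2"), ("c",)]}            # c^n, n >= 1
pc, start = concatenate(g1, "S1", g2, "S2")
print(sorted(language(pc, start, 5)))
```

Each generated string is a string of $L_1$ followed by a string of $L_2$, as the single new rule $S_c \to S_1S_2$ suggests.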

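Closure under Kleene star Let $G_1$ be any context-free grammar with the starting symbol S, and suppose S does not appear on the right-hand side of any production. Adding the rules $S \to \lambda$ and $S \to SS$ creates a new context-free grammar $G_2$ such that $L(G_2)$ is the result of applying the Kleene star operator to $L(G_1)$. If S does occur on a right-hand side (e.g., $S \to aSb$), the added rules can be used inside a derivation and may generate strings outside $L(G_1)^*$, so in general a new start symbol is needed.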

Kleene Closure We create grammar $G^* = (V^*, T_1, S, P^*)$ generating $L_1^*$.
1. Create a new start symbol S, not already in $V_1$.
2. Set $V^* = V_1 \cup \{S\}$.
3. Set $P^* = P_1 \cup \{S \to S_1S \mid \lambda\}$.
Construction completed. (See text for justification.)
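The same construction as a short sketch. The dictionary representation of productions, the function name, and the default name of the new start symbol are assumptions for this illustration; the empty tuple stands for the right-hand side λ.

```python
def kleene_star(p1, s1, new_start="S"):
    """P* = P_1 ∪ {S -> S_1 S | λ}, with S a brand-new start symbol."""
    assert new_start not in p1, "the new start symbol must not be in V_1"
    p_star = dict(p1)
    p_star[new_start] = [(s1, new_start), ()]  # () encodes λ
    return p_star, new_start


g1 = {"S1": [("a", "S1", "b"), ("a", "b")]}   # a^n b^n, n >= 1
p_star, start = kleene_star(g1, "S1")
for var, rules in p_star.items():
    print(var, "->", rules)
```

Because the new start symbol never appears inside the rules of $G_1$, each use of $S \to S_1S$ contributes exactly one complete string of $L_1$.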

Not closed under intersection The context-free languages are not closed under intersection or under complementation. However, the intersection of a context-free language with a regular language is always a context-free language.
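A standard example makes the failure of closure under intersection concrete: $L_1 = \{a^nb^nc^m\}$ and $L_2 = \{a^nb^mc^m\}$ (with $n, m \ge 1$) are both context-free, yet their intersection is $\{a^nb^nc^n\}$, which is not. The sketch below checks this on all strings up to a small length; the helper names and the length bound are assumptions for the illustration.

```python
import re
from itertools import product

BLOCKS = re.compile(r"(a+)(b+)(c+)")


def block_lengths(s: str):
    """Lengths of the a-, b- and c-blocks, or None if s is not in a+b+c+."""
    match = BLOCKS.fullmatch(s)
    return tuple(len(g) for g in match.groups()) if match else None


def in_L1(s: str) -> bool:  # a^n b^n c^m
    lengths = block_lengths(s)
    return lengths is not None and lengths[0] == lengths[1]


def in_L2(s: str) -> bool:  # a^n b^m c^m
    lengths = block_lengths(s)
    return lengths is not None and lengths[1] == lengths[2]


def intersection(max_len: int):
    """Strings over {a, b, c} of length <= max_len that lie in both languages."""
    hits = []
    for n in range(1, max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            if in_L1(s) and in_L2(s):
                hits.append(s)
    return hits


print(intersection(6))
```

Only strings with three equal blocks survive, so the intersection is the language already shown not to be context-free with the pumping lemma.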

Corollary: Are Regular Languages context free? Yes. Why? We can express any regular language in the form of a CFG. The family of regular languages is a proper subset of the family of context-free languages.

Are Regular Languages context free? Proof: According to your textbook, the set of regular languages is the smallest set that contains the languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ (for every $a \in \Sigma$) and is closed under the operations of union, concatenation, and Kleene*. We just demonstrated that the operations of union, concatenation, and Kleene* on CFGs produce CFGs, so all we need to do is show that the languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ have CFGs.

Are Regular Languages context free? The empty language can be written $S \to S$. The language consisting of the null string can be written $S \to \lambda$. The language consisting of a single character can be written $S \to a$. QED

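Decision properties of context-free languages We can decide: membership (is a given string in the language?), emptiness (is the language empty?), and infiniteness (is the language infinite?). But there is no algorithm for deciding whether two CFGs generate the same language!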
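Membership is commonly decided with the CYK algorithm, which works on a grammar in Chomsky Normal Form. The slides do not name a specific algorithm, so the following is a sketch of one standard choice; the table layout, the function name, and the sample grammar are assumptions for this illustration.

```python
def cyk(word, unit, binary, start):
    """CYK membership test for a grammar in Chomsky Normal Form.

    unit:   terminal a     -> set of variables A with A -> a
    binary: pair (B, C)    -> set of variables A with A -> BC
    """
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as defined here never derives λ
    # table[i][k] = variables deriving the substring word[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unit.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split - 1]:
                    for right in table[i + split][length - split - 1]:
                        table[i][length - 1] |= binary.get((left, right), set())
    return start in table[0][n - 1]


# CNF grammar for a^n b^n, n >= 1:  S -> AB | AX,  X -> SB,  A -> a,  B -> b
unit = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
for w in ("ab", "aabb", "aab", "abab"):
    print(w, cyk(w, unit, binary, "S"))
```

The table has one entry per substring, so the running time is polynomial in the length of the input, in contrast to the equivalence problem, for which no algorithm exists at all.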
