Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.

Introduction to Computation
Using Grammar Rules to Define a Language
Regular languages and FAs are too simple for many purposes.
– Context-free grammars allow us to describe more interesting languages.
– Much of the syntax of high-level programming languages can be expressed with context-free grammars.
– Context-free grammars of a very simple form provide another way to describe the regular languages.
Grammars can be ambiguous.
We will study how derivations relate to the structure of the string being derived.

Using Grammar Rules to Define a Language (cont’d.)
A grammar is a set of rules, usually simpler than those of English, by which the strings in a language can be generated.
Consider the language AnBn = {a^n b^n | n ≥ 0}, defined recursively:
– Λ ∈ AnBn
– For every S ∈ AnBn, aSb ∈ AnBn
Think of S as a variable representing an arbitrary element, and write these rules as
S → Λ
S → aSb
(In the process of obtaining an element of AnBn, S can be replaced by either string.)

Using Grammar Rules to Define a Language (cont’d.)
If α and β are strings, and α contains at least one occurrence of S, then α ⇒ β means that β is obtained from α in one step, by using one of the two rules to replace a single occurrence of S by either Λ or aSb.
For example, we could write
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
to describe a derivation of the string aaabbb.
We can simplify the rules by using the | symbol to mean “or”, so that the rules become
S → Λ | aSb
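
The derivation of aaabbb can be replayed mechanically. The following is a minimal Python sketch, not part of the slides; the function name derive_anbn is hypothetical, and the empty Python string stands in for Λ.

```python
def derive_anbn(n):
    """Return the sentential forms in a derivation of a^n b^n.

    Applies S -> aSb exactly n times, then S -> Λ (the empty string).
    """
    current = "S"
    steps = [current]
    for _ in range(n):
        # Replace the single occurrence of S using the rule S -> aSb.
        current = current.replace("S", "aSb", 1)
        steps.append(current)
    # Finish with the rule S -> Λ.
    current = current.replace("S", "", 1)
    steps.append(current)
    return steps


print(" => ".join(derive_anbn(3)))
```

A derivation of a^n b^n in this sketch always has n + 1 steps, so the returned list has n + 2 sentential forms.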

Context-Free Grammars: Definitions and More Examples
Definition: A context-free grammar (CFG) is a 4-tuple G = (V, Σ, S, P), where V and Σ are disjoint finite sets, S ∈ V, and P is a finite set of formulas of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*.
– Elements of Σ are terminal symbols, or terminals, and elements of V are variables, or nonterminals.
– S is the start variable, and elements of P are grammar rules, or productions.
– We use → for productions in a grammar and ⇒ for a step in a derivation.
– The notations α ⇒^n β and α ⇒* β refer to n steps and zero or more steps, respectively.

Context-Free Grammars: Definitions and More Examples (cont’d.)
We will sometimes write ⇒_G to indicate a derivation in a particular grammar G.
α ⇒ β means that there are strings α₁, α₂, and γ in (V ∪ Σ)* and a production A → γ in P such that α = α₁Aα₂ and β = α₁γα₂.
– This is a single step in a derivation.
What makes the grammar context-free is that the production above, with left side A, can be applied wherever A occurs in the string (irrespective of the context; i.e., regardless of what α₁ and α₂ are).

Context-Free Grammars: Definitions and More Examples (cont’d.)
Definition: If G = (V, Σ, S, P) is a CFG, the language generated by G is
L(G) = {x ∈ Σ* | S ⇒*_G x}
(S is the start variable, and x is a string of terminals.)
A language L is a context-free language (CFL) if there is a CFG G with L = L(G).

Context-Free Grammars: Definitions and More Examples (cont’d.)
Consider AEqB = {x ∈ {a,b}* | n_a(x) = n_b(x)}, where n_a(x) and n_b(x) are the numbers of a’s and b’s in x.
Let’s develop a CFG for AEqB.
If x is a non-null string in AEqB, then either x = ay, where y ∈ L_b = {z | n_b(z) = n_a(z) + 1}, or x = by, where y ∈ L_a = {z | n_a(z) = n_b(z) + 1}.
– We represent L_b by the variable B and L_a by the variable A.
– The productions so far are S → Λ | aB | bA
– All we need now are productions for A and B.

Context-Free Grammars: Definitions and More Examples (cont’d.)
If a string x ∈ L_a starts with a, then the remainder is a member of AEqB.
If it starts with b, the rest has two more a’s than b’s.
Observation: a string containing two more a’s than b’s must be the concatenation of two strings, each with one more a than b’s; similarly with a and b reversed.
The grammar resulting from these observations is
S → Λ | aB | bA
A → aS | bAA
B → bS | aBB
(Note: if A were the start variable, the grammar would generate L_a.)
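
One way to gain confidence in this grammar is to enumerate everything it generates up to a small length and compare against the definition of AEqB. The sketch below is an illustration only: the names GRAMMAR, generate, and equal_counts are hypothetical, single characters stand for symbols, and the empty string stands for Λ. The pruning step assumes that every production other than S → Λ adds a terminal, which holds for this grammar but not for CFGs in general.

```python
from collections import deque
from itertools import product

# S -> Λ | aB | bA,  A -> aS | bAA,  B -> bS | aBB   ("" stands for Λ)
GRAMMAR = {
    "S": ["", "aB", "bA"],
    "A": ["aS", "bAA"],
    "B": ["bS", "aBB"],
}


def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        idx = next((i for i, c in enumerate(form) if c in grammar), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for c in new if c not in grammar)
            # Terminals never disappear, so longer forms cannot help.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


def equal_counts(max_len):
    """All strings over {a, b} of length <= max_len with as many a's as b's."""
    return {
        "".join(p)
        for n in range(max_len + 1)
        for p in product("ab", repeat=n)
        if p.count("a") == p.count("b")
    }


print(generate(GRAMMAR, "S", 6) == equal_counts(6))
```

Agreement up to a fixed length is evidence, not a proof; the proof that L(G) = AEqB is by induction on the length of the string.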

Context-Free Grammars: Definitions and More Examples (cont’d.)
Theorem 4.9: If L₁ and L₂ are CFLs over Σ, then so are L₁ ∪ L₂, L₁L₂, and L₁*.
Suppose G₁ and G₂ are CFGs that generate L₁ and L₂ respectively, and assume that they have no variables in common.
Suppose that S₁ and S₂ are the start variables. S_u, S_c, and S_k, the start variables of the new grammars, will be new variables.
– G_u just adds the rules S_u → S₁ | S₂ to G₁ and G₂
– G_c just adds the rule S_c → S₁S₂ to G₁ and G₂
– G_k just adds the rules S_k → Λ | S_kS₁ to G₁
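
The three constructions are small enough to write out directly. This is a sketch under assumed conventions that are not in the slides: a grammar is a Python dict mapping each variable to a list of right-hand sides, each right-hand side is a tuple of symbols, and the empty tuple stands for Λ. The function names and the default start-variable names "Su", "Sc", and "Sk" are hypothetical.

```python
def _merge(g1, g2, new_start):
    """Combine two grammars after checking that variable names do not clash."""
    if set(g1) & set(g2) or new_start in g1 or new_start in g2:
        raise ValueError("variable names must be distinct")
    merged = {var: list(bodies) for var, bodies in g1.items()}
    merged.update({var: list(bodies) for var, bodies in g2.items()})
    return merged


def union_grammar(g1, s1, g2, s2, new_start="Su"):
    """G_u: add Su -> S1 | S2."""
    combined = _merge(g1, g2, new_start)
    combined[new_start] = [(s1,), (s2,)]
    return combined


def concat_grammar(g1, s1, g2, s2, new_start="Sc"):
    """G_c: add Sc -> S1 S2."""
    combined = _merge(g1, g2, new_start)
    combined[new_start] = [(s1, s2)]
    return combined


def star_grammar(g1, s1, new_start="Sk"):
    """G_k: add Sk -> Λ | Sk S1."""
    if new_start in g1:
        raise ValueError("new start variable must not occur in g1")
    combined = {var: list(bodies) for var, bodies in g1.items()}
    combined[new_start] = [(), (new_start, s1)]
    return combined


G1 = {"S1": [("a", "S1", "b"), ()]}   # generates {a^n b^n}
G2 = {"S2": [("c", "S2"), ()]}        # generates {c}*
print(union_grammar(G1, "S1", G2, "S2"))
```

The explicit clash check mirrors the theorem's assumption that G₁ and G₂ share no variables; when they do, one grammar's variables would have to be renamed first.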

Regular Languages and Regular Grammars
The three operations in Theorem 4.9 are the ones involved in the recursive definition of regular languages.
The “basic” regular languages over Σ, namely ∅ and {σ} for σ ∈ Σ, are easily seen to be CFLs.
Now we can prove by structural induction that every regular language over Σ is a CFL.
In fact, however, the CFG can be of a simpler form.
Definition 4.13: A context-free grammar is regular if every production is of the form A → σB or A → Λ, where B is a variable and σ ∈ Σ.

Regular Languages and Regular Grammars (cont’d.)
Theorem 4.14: For every language L ⊆ Σ*, L is regular if and only if L = L(G) for some regular grammar G.
Proof:
– If L is a regular language, then there is an FA M = (Q, Σ, q₀, A, δ) that accepts it.
– Define G = (V, Σ, S, P) by letting V be Q, S the initial state q₀, and P the set containing the production T → aU for every transition δ(T, a) = U in M and the production T → Λ for every accepting state T of M.

Regular Languages and Regular Grammars (cont’d.)
G is a regular grammar, and G generates the same language that M accepts.
– For every x = a₁a₂…aₙ, the transitions on these symbols that start at q₀ end at an accepting state if and only if there is a derivation of x in G.
To prove the other direction, we can start with a regular grammar G and reverse the construction to produce M.
– M may be an NFA, but it still accepts L(G), and it follows that L(G) is regular.
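
The construction in the proof can be sketched in a few lines of Python. The representation is an assumption made for illustration: the transition function is a dict keyed by (state, symbol), each production body is a tuple (symbol, variable), and the empty tuple stands for Λ. The names fa_to_regular_grammar, generates, and the two-state example FA (which accepts strings over {a, b} ending in a) are hypothetical.

```python
def fa_to_regular_grammar(delta, accepting):
    """Build productions T -> aU for each delta(T, a) = U, and T -> Λ for accepting T."""
    grammar = {}
    for (state, symbol), target in delta.items():
        grammar.setdefault(state, []).append((symbol, target))
    for state in accepting:
        grammar.setdefault(state, []).append(())
    return grammar


def generates(grammar, start, x):
    """Decide whether the regular grammar derives x, tracking reachable variables."""
    current = {start}
    for symbol in x:
        current = {
            body[1]
            for var in current
            for body in grammar.get(var, [])
            if body and body[0] == symbol
        }
    # x is derived exactly when some reachable variable has a Λ-production.
    return any(() in grammar.get(var, []) for var in current)


# Example FA: q1 is reached exactly when the last symbol read is a.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
}
G = fa_to_regular_grammar(DELTA, {"q1"})
print(G)
print(generates(G, "q0", "ba"), generates(G, "q0", "ab"))
```

Tracking a set of variables in generates is the same idea as simulating an NFA, which is why the reverse construction still yields a regular language even when the resulting machine is nondeterministic.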

Derivation Trees and Ambiguity
So far we’ve been interested in what strings a CFG generates.
It is also useful to consider how a string is generated by a CFG.
A derivation may provide information about the structure of a string, and if a string has several possible derivations, one may be more appropriate than another.
We can draw trees to represent derivations.

Derivation Trees and Ambiguity (cont’d.)
The root node represents the start variable S.
Any interior node and its children represent a production A → α used in the derivation; the node represents A, and the children, from left to right, represent the symbols in α.
Each leaf node represents a terminal symbol or Λ.
The string derived is read off from left to right, ignoring Λ’s.
Every derivation has exactly one derivation tree, but a tree can represent more than one derivation.

Derivation Trees and Ambiguity (cont’d.)
In a derivation, at each step some production is applied to some occurrence of a variable.
Consider a derivation that starts S ⇒ S + S. We could apply a production to either the first or the second S, but the resulting trees would be the same.
When we talk about a string having several possible derivations, one being more appropriate than another, we are talking about derivations corresponding to different trees.

Derivation Trees and Ambiguity (cont’d.)
We can distinguish between trivially different derivations and essentially different ones by specifying that in a derivation, we always choose the leftmost variable to expand.
Definition 4.16: A derivation in a CFG is a leftmost derivation (LMD) if, at each step, a production is applied to the leftmost variable-occurrence in the current string.
– A rightmost derivation (RMD) is defined similarly.
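
To see what “trivially different” means, the sketch below derives a+a twice, once always expanding the leftmost variable and once always expanding the rightmost. It assumes a toy grammar S → S+S | a chosen for illustration; the names PRODUCTIONS and derive are hypothetical.

```python
# Toy grammar assumed for illustration: S -> S+S | a
PRODUCTIONS = {"S": ["S+S", "a"]}


def derive(start, choices, leftmost=True):
    """Apply each chosen right-hand side to the leftmost (or rightmost) variable."""
    form = start
    steps = [form]
    for rhs in choices:
        positions = [i for i, c in enumerate(form) if c in PRODUCTIONS]
        if not positions:
            raise ValueError("no variable left to expand")
        idx = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[idx]]:
            raise ValueError(f"{form[idx]} -> {rhs} is not a production")
        form = form[:idx] + rhs + form[idx + 1:]
        steps.append(form)
    return steps


print(" => ".join(derive("S", ["S+S", "a", "a"], leftmost=True)))
print(" => ".join(derive("S", ["S+S", "a", "a"], leftmost=False)))
```

The two runs differ only in the order in which the two S’s are replaced, so they correspond to the same derivation tree.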

Derivation Trees and Ambiguity (cont’d.)
Theorem 4.17: If G is a CFG, then for any x ∈ L(G) these three statements are equivalent:
– x has more than one derivation tree
– x has more than one LMD
– x has more than one RMD
Proof: see book.
Definition 4.18: A CFG G is ambiguous if, for at least one x ∈ L(G), x has more than one derivation tree (or equivalently, according to Theorem 4.17, more than one LMD).

Derivation Trees and Ambiguity (cont’d.)
A classic example of ambiguity is the dangling else.
In C, an if-statement can be defined by
S → if ( E ) S | if ( E ) S else S | OS
(where OS stands for “other statement”).
Consider the statement if (e1) if (e2) f(); else g();
– In C, the else should belong to the second if, but this grammar does not rule out the other interpretation.
The two derivation trees shown on the next slide show the two interpretations of a dangling else.

[Figure: the two derivation trees for if (e1) if (e2) f(); else g();]

Derivation Trees and Ambiguity (cont’d.)
Clearly the grammar given is ambiguous, but there are equivalent grammars that allow only the correct interpretation.
Example:
S → S₁ | S₂
S₁ → if ( E ) S₁ else S₁ | OS
S₂ → if ( E ) S | if ( E ) S₁ else S₂

Derivation Trees and Ambiguity (cont’d.)
Consider the CFG G:
S → S + S | S * S | (S) | a
G generates simple algebraic expressions.
One reason for ambiguity is that the relative precedence of + and * hasn’t been specified: a+a*a could be interpreted as (a+a)*a or as a+(a*a).
In fact, S → S + S causes ambiguity by itself, because a+a+a could be interpreted as either (a+a)+a or a+(a+a). Similarly for S → S * S.
We might try to correct both problems by using the productions
S → S + T | T
T → T * F | F
(think of T as “term” and F as “factor”).
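
The ambiguity of G can be checked by counting derivation trees. The sketch below is specific to the grammar S → S + S | S * S | (S) | a, with the input written without spaces; the function name count_trees is hypothetical. It counts trees by cases on the production used at the root, and for S → S op S it sums over every position where the operator could sit.

```python
from functools import lru_cache


def count_trees(x):
    """Count derivation trees for x in the grammar S -> S+S | S*S | (S) | a."""

    @lru_cache(maxsize=None)
    def trees(i, j):
        # Number of derivation trees with root S and yield x[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and x[i] == "a":          # S -> a
            total += 1
        if x[i] == "(" and x[j - 1] == ")":     # S -> (S)
            total += trees(i + 1, j - 1)
        for k in range(i + 1, j - 1):           # S -> S+S or S -> S*S, split at k
            if x[k] in "+*":
                total += trees(i, k) * trees(k + 1, j)
        return total

    return trees(0, len(x))


for expr in ["a", "a+a*a", "a+a+a", "(a+a)*a"]:
    print(expr, count_trees(expr))
```

A count greater than 1 for any string, such as a+a*a or a+a+a, is enough to show that G is ambiguous in the sense of Definition 4.18.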

Derivation Trees and Ambiguity (cont’d.)
* now has higher precedence than + (all the multiplications are performed within a term).
By making the production S → S + T, not S → T + S, we make + associate to the left. Similarly for *.
We want parenthetical expressions to be evaluated first; this means we should consider such an expression to be part of a factor.
The resulting unambiguous CFG generating L(G) is
S → S + T | T
T → T * F | F
F → (S) | a
(Proofs of unambiguity and equivalence are both somewhat complicated.)
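
The effect of this grammar on precedence and associativity can be seen with a small hand-written parser. This is a sketch, not the slides’ method: because the productions S → S + T and T → T * F are left-recursive, the code uses the common technique of replacing left recursion with a loop that folds results to the left. The function name parse and the nested-tuple tree format are hypothetical, and the input is assumed to contain no spaces.

```python
def parse(text):
    """Parse text with S -> S+T | T, T -> T*F | F, F -> (S) | a; return a nested tuple."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_s():
        node = parse_t()
        while peek() == "+":          # S -> S + T, folded left to right
            expect("+")
            node = ("+", node, parse_t())
        return node

    def parse_t():
        node = parse_f()
        while peek() == "*":          # T -> T * F, folded left to right
            expect("*")
            node = ("*", node, parse_f())
        return node

    def parse_f():
        if peek() == "(":             # F -> (S)
            expect("(")
            node = parse_s()
            expect(")")
            return node
        expect("a")                   # F -> a
        return "a"

    tree = parse_s()
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return tree


print(parse("a+a*a"))
print(parse("a+a+a"))
```

Each input yields exactly one tree: the multiplication in a+a*a ends up below the addition, and a+a+a groups as (a+a)+a.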

Simplified Forms and Normal Forms
Questions about the strings generated by a CFG are sometimes easier to answer if we know something about the form of the productions.
– For example, if we know that a grammar has no Λ-productions and no unit productions (A → B), we can deduce that no derivation of a string x can take more than 2|x| − 1 steps (see book for details). We could then, in principle, determine whether x can be derived by considering derivations no longer than this.
We show how to modify an arbitrary CFG to have no productions of either of these types.

Simplified Forms and Normal Forms (cont’d.)
Suppose we have the production A → BCDCB, and Λ can be derived from either B or C.
If we get rid of Λ-productions, then the steps that replace B and C by Λ will no longer be possible, but we must still be able to get all the same non-null strings from A.
We must retain the production A → BCDCB, but we should add A → CDCB, A → DCB, A → BDCB, and so on.
We will need to know which variables can derive Λ (we will call such a variable a nullable variable).

Simplified Forms and Normal Forms (cont’d.)
Definition 4.26: A recursive definition of the set of nullable variables of G
– If there is a production A → Λ, then A is nullable
– If A₁, A₂, …, A_k are nullable variables and there is a production B → A₁A₂…A_k, then B is nullable
This leads immediately to an algorithm for identifying the nullable variables.
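
The recursive definition translates into a fixed-point loop: keep adding variables until a full pass adds nothing new. The sketch below assumes a grammar stored as a dict from each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for Λ. The function name and the example grammar are hypothetical.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive Λ."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var in nullable:
                continue
            # An empty body is A -> Λ; otherwise every symbol must be nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable


# Example: S -> AB | a,  A -> aA | Λ,  B -> AA | b,  C -> c
EXAMPLE = {
    "S": [("A", "B"), ("a",)],
    "A": [("a", "A"), ()],
    "B": [("A", "A"), ("b",)],
    "C": [("c",)],
}
print(sorted(nullable_variables(EXAMPLE)))
```

The loop terminates because each pass either adds at least one variable or stops, and there are only finitely many variables.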

Simplified Forms and Normal Forms (cont’d.)
Theorem 4.27: For every CFG G = (V, Σ, S, P), the following algorithm produces a CFG G₁ = (V, Σ, S, P₁) having no Λ-productions for which L(G₁) = L(G) − {Λ}.
– Identify the nullable variables in V and initialize P₁ to P
– For every production A → α in P, add to P₁ every production obtained by deleting from α one or more variable-occurrences involving a nullable variable
– Delete every Λ-production from P₁, as well as every production of the form A → A
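
A minimal sketch of this algorithm follows, using the production A → BCDCB from the earlier slide as the example. The grammar representation (dict of variable to list of symbol tuples, empty tuple for Λ) and all function names are assumptions made for illustration. For each production, every nullable occurrence is independently kept or dropped, which covers “deleting one or more” together with the original production.

```python
from itertools import product


def nullable_variables(grammar):
    """Fixed-point computation of the variables that can derive Λ."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(var)
                changed = True
    return nullable


def eliminate_lambda(grammar):
    """Return a grammar with no Λ-productions and no productions A -> A."""
    nullable = nullable_variables(grammar)
    new = {}
    for var, bodies in grammar.items():
        result = set()
        for body in bodies:
            # Each nullable occurrence may stay or be deleted.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(s for part in choice for s in part)
                if new_body and new_body != (var,):
                    result.add(new_body)
        new[var] = sorted(result)
    return new


# A -> BCDCB,  B -> b | Λ,  C -> c | Λ,  D -> d
EXAMPLE = {
    "A": [("B", "C", "D", "C", "B")],
    "B": [("b",), ()],
    "C": [("c",), ()],
    "D": [("d",)],
}
for body in eliminate_lambda(EXAMPLE)["A"]:
    print("A ->", "".join(body))
```

With four nullable occurrences in BCDCB, the sketch produces 2⁴ = 16 productions for A, ranging from A → BCDCB down to A → D.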

Simplified Forms and Normal Forms (cont’d.)
The procedure we use to eliminate unit productions is similar.
We first identify pairs of variables (A, B) for which A ⇒* B (in this case we call B A-derivable); then for each such pair (A, B) and each nonunit production B → α, we add the production A → α.
Such pairs can be found as follows:
– If A → B is a production, then B is A-derivable
– If C is A-derivable and C → B is a production, then B is A-derivable
– No other variables are A-derivable

Simplified Forms and Normal Forms (cont’d.)
Theorem 4.28: For every CFG G = (V, Σ, S, P) without Λ-productions, the CFG G₁ = (V, Σ, S, P₁) produced by the following algorithm generates the same language as G and has no unit productions:
– Initialize P₁ to P, and for each A ∈ V, identify the A-derivable variables
– For every pair (A, B) in which B is A-derivable, and every nonunit production B → α, add the production A → α to P₁
– Delete all unit productions from P₁
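
The algorithm can be sketched as follows, applied to the expression grammar S → S + T | T, T → T * F | F, F → (S) | a, which has the unit productions S → T and T → F. The grammar representation (dict of variable to list of symbol tuples) and the function name eliminate_unit are assumptions made for illustration, and the input is assumed to have no Λ-productions, as the theorem requires.

```python
def eliminate_unit(grammar):
    """Return an equivalent grammar with no unit productions A -> B."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    # derivable[A] is the set of A-derivable variables.
    derivable = {}
    for a in grammar:
        reach = set()
        frontier = [a]
        while frontier:
            c = frontier.pop()
            for body in grammar[c]:
                if is_unit(body) and body[0] not in reach:
                    reach.add(body[0])
                    frontier.append(body[0])
        derivable[a] = reach

    new = {}
    for a in grammar:
        bodies = {b for b in grammar[a] if not is_unit(b)}
        for b_var in derivable[a]:
            # Copy every nonunit production B -> α up to A.
            bodies |= {b for b in grammar[b_var] if not is_unit(b)}
        new[a] = sorted(bodies)
    return new


EXPR = {
    "S": [("S", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "S", ")"), ("a",)],
}
for var, bodies in eliminate_unit(EXPR).items():
    print(var, "->", " | ".join("".join(b) for b in bodies))
```

In this example T and F are both S-derivable, so S inherits T * F, (S), and a in addition to its own S + T.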

Simplified Forms and Normal Forms (cont’d.)
Definition 4.29: A CFG is said to be in Chomsky normal form if every production is of one of these two types:
A → BC (where B and C are variables)
A → σ (where σ is a terminal)
Theorem 4.30: For every context-free grammar G, there is another CFG G₁ in Chomsky normal form such that L(G₁) = L(G) − {Λ}.
The algorithm on the next slide shows how to generate G₁.

Simplified Forms and Normal Forms (cont’d.)
The first step is to eliminate Λ-productions and unit productions.
The second step is to introduce for every terminal symbol σ a new variable X_σ and a production X_σ → σ.
In every production whose right side has at least two symbols, replace every terminal by its new variable (the new productions X_σ → σ and existing productions of the form A → σ are left alone).
Replace a production like A → BACB by the productions A → BY₁, Y₁ → AY₂, Y₂ → CB, where Y₁ and Y₂ are new variables.
The resulting CFG is in Chomsky normal form.
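
The last three steps can be sketched as below, assuming the input grammar already has no Λ-productions and no unit productions. The representation (dict of variable to list of symbol tuples), the function name to_cnf, and the generated names X_σ and Y1, Y2, … are assumptions made for illustration; the sketch also assumes those generated names do not clash with existing variables, and it only creates X_σ for terminals that actually occur in a long right side.

```python
def to_cnf(grammar):
    """Convert a grammar without Λ- or unit productions to Chomsky normal form."""
    variables = set(grammar)
    term_var = {}          # terminal -> its new variable X_terminal
    new = {}
    counter = 0

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                # Already of the form A -> σ (no unit productions assumed).
                new.setdefault(head, []).append(body)
                continue
            # Replace terminals in long right sides by their X variables.
            symbols = [
                s if s in variables else term_var.setdefault(s, f"X_{s}")
                for s in body
            ]
            # Break A -> B1 B2 ... Bk into a chain of binary productions.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"Y{counter}"
                new.setdefault(current, []).append((symbols[0], fresh))
                current = fresh
                symbols = symbols[1:]
            new.setdefault(current, []).append(tuple(symbols))

    for terminal, var in term_var.items():
        new[var] = [(terminal,)]
    return new


EXAMPLE = {
    "A": [("B", "A", "C", "B"), ("a",)],
    "B": [("b",)],
    "C": [("c",)],
}
for var, bodies in to_cnf(EXAMPLE).items():
    print(var, "->", " | ".join(" ".join(b) for b in bodies))
```

For the slide’s production A → BACB this yields A → B Y1, Y1 → A Y2, and Y2 → C B, matching the chain described above.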
