Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.



Introduction to Computation
Using Grammar Rules to Define a Language
Regular languages and FAs are too simple for many purposes
– Using context-free grammars allows us to describe more interesting languages
– Much high-level programming language syntax can be expressed with context-free grammars
– Context-free grammars with a very simple form provide another way to describe the regular languages
Grammars can be ambiguous
We will study how derivations can be related to the structure of the string being derived

Using Grammar Rules to Define a Language (cont’d.)
A grammar is a set of rules, usually simpler than those of English, by which strings in a language can be generated
Consider the language AnBn = {a^n b^n | n ≥ 0}, defined using the recursive definition:
– Λ ∈ AnBn
– For every S ∈ AnBn, aSb ∈ AnBn
Think of S as a variable representing an arbitrary element, and write these rules as
S → Λ
S → aSb
(In the process of obtaining an element of AnBn, S can be replaced by either string)

Using Grammar Rules to Define a Language (cont’d.)
If α and β are strings, and α contains at least one occurrence of S, then α ⇒ β means that β is obtained from α in one step, by using one of the two rules to replace a single occurrence of S by either Λ or aSb
For example, we could write
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
to describe a derivation of the string aaabbb
We can simplify the rules by using the | symbol to mean “or”, so that the rules become
S → Λ | aSb
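
The derivation above can be reproduced mechanically. What follows is a minimal Python sketch, not code from the slides; the function name `derive_anbn` is an illustrative choice, and the empty string stands in for Λ.

```python
def derive_anbn(n):
    """Return the sentential forms in a derivation of a^n b^n,
    using S -> aSb n times and then S -> Lambda once."""
    forms = ["S"]
    current = "S"
    for _ in range(n):
        # Replace the single occurrence of S by aSb.
        current = current.replace("S", "aSb", 1)
        forms.append(current)
    # Finish by replacing S with the null string.
    current = current.replace("S", "", 1)
    forms.append(current)
    return forms


print(" => ".join(derive_anbn(3)))
```

Calling `derive_anbn(0)` gives the two-step list `["S", ""]`, which corresponds to applying S → Λ immediately.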

Context-Free Grammars: Definitions and More Examples
Definition: A context-free grammar (CFG) is a 4-tuple G = (V, Σ, S, P), where V and Σ are disjoint finite sets, S ∈ V, and P is a finite set of formulas of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*
– Elements of Σ are terminal symbols, or terminals, and elements of V are variables, or nonterminals
– S is the start variable, and elements of P are grammar rules, or productions
– We use → for productions in a grammar and ⇒ for a step in a derivation
– The notations α ⇒^n β and α ⇒* β refer to n steps and zero or more steps, respectively

Context-Free Grammars: Definitions and More Examples (cont’d.)
We will sometimes write ⇒_G to indicate a derivation in a particular grammar G
α ⇒ β means that there are strings α_1, α_2, and γ in (V ∪ Σ)* and a production A → γ in P such that α = α_1 A α_2 and β = α_1 γ α_2
– This is a single step in a derivation
What makes the grammar context-free is that the production above, with left side A, can be applied wherever A occurs in the string (irrespective of the context; i.e., regardless of what α_1 and α_2 are)
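
The single-step relation α ⇒ β can be sketched directly from this definition. The snippet below is an illustration under stated assumptions: variables are single uppercase characters, the grammar is a plain dictionary, the empty string stands for Λ, and `one_step` is a hypothetical helper name.

```python
def one_step(productions, alpha):
    """Return every string beta with alpha => beta in one step.

    productions maps each variable (a single character) to a list of
    right-hand sides; the empty string stands for Lambda.
    """
    results = set()
    for i, symbol in enumerate(alpha):
        for gamma in productions.get(symbol, []):
            # alpha = alpha1 A alpha2 becomes beta = alpha1 gamma alpha2,
            # whatever alpha1 and alpha2 happen to be.
            results.add(alpha[:i] + gamma + alpha[i + 1:])
    return results


anbn = {"S": ["", "aSb"]}
print(sorted(one_step(anbn, "aSb")))
```

The loop never inspects the text around the variable being replaced, which is exactly the sense in which the grammar is context-free.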

Context-Free Grammars: Definitions and More Examples (cont’d.)
Definition: If G = (V, Σ, S, P) is a CFG, the language generated by G is
L(G) = { x ∈ Σ* | S ⇒*_G x }
(S is the start variable, and x is a string of terminals)
A language L is a context-free language (CFL) if there is a CFG G with L = L(G)

Context-Free Grammars: Definitions and More Examples (cont’d.)
Consider AEqB = {x ∈ {a,b}* | n_a(x) = n_b(x)}
Let’s develop a CFG for AEqB
If x is a non-null string in AEqB then either x = ay, where y ∈ L_b = {z | n_b(z) = n_a(z) + 1}, or x = by, where y ∈ L_a = {z | n_a(z) = n_b(z) + 1}
– We represent L_b by the variable B and L_a by the variable A
– The productions so far are S → Λ | aB | bA
– All we need now are productions for A and B

Context-Free Grammars: Definitions and More Examples (cont’d.)
If a string x ∈ L_a starts with a, then the remainder is a member of AEqB
If it starts with b, the rest has two more a’s than b’s
Observation: a string containing two more a’s than b’s must be the concatenation of two strings, each with one more a; similarly with a and b reversed
The grammar resulting from these observations is
S → Λ | aB | bA
A → aS | bAA
B → bS | aBB
(Note: if A were the start variable, it would generate L_a)
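
One way to gain confidence in this grammar is to enumerate the short strings it generates and compare them with AEqB. The sketch below is only a finite sanity check, not a proof. It assumes single-character uppercase variables and the empty string for Λ, and the helper name `generated_strings` is hypothetical.

```python
from itertools import product


def generated_strings(productions, start, max_len):
    """Collect all terminal strings of length <= max_len derivable from start.

    Uses leftmost derivations and discards any sentential form that already
    contains more than max_len terminals. This pruning terminates for the
    AEqB grammar because every production other than S -> Lambda adds a
    terminal; it is not guaranteed to terminate for arbitrary grammars.
    """
    seen = {start}
    frontier = [start]
    words = set()
    while frontier:
        form = frontier.pop()
        # Find the leftmost variable (variables are uppercase letters).
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new if not c.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return words


aeqb = {"S": ["", "aB", "bA"], "A": ["aS", "bAA"], "B": ["bS", "aBB"]}
n = 6
expected = {
    "".join(w)
    for k in range(n + 1)
    for w in product("ab", repeat=k)
    if w.count("a") == w.count("b")
}
print(generated_strings(aeqb, "S", n) == expected)
```

Passing `"A"` as the start symbol instead enumerates short members of L_a, in line with the note on the slide.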

Context-Free Grammars: Definitions and More Examples (cont’d.)
Theorem 4.9: If L_1 and L_2 are CFLs over Σ, then so are L_1 ∪ L_2, L_1 L_2, and L_1*
Suppose G_1 and G_2 are CFGs that generate L_1 and L_2 respectively, and assume that they have no variables in common
Suppose that S_1 and S_2 are the start variables. S_u, S_c and S_k, the start variables of the new grammars, will be new variables.
– G_u just adds the rules S_u → S_1 | S_2 to G_1 and G_2
– G_c just adds the rule S_c → S_1 S_2 to G_1 and G_2
– G_k just adds the rules S_k → Λ | S_k S_1 to G_1
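
The three constructions in Theorem 4.9 are short enough to write out. In this hedged sketch a grammar is a pair (productions, start variable), right sides are tuples of symbols with the empty tuple standing for Λ, and the names `Su`, `Sc`, `Sk` for the new start variables are assumptions that must not already occur in either grammar.

```python
def union_grammar(g1, g2, new_start="Su"):
    """G_u: all productions of G1 and G2 plus Su -> S1 | S2."""
    (p1, s1), (p2, s2) = g1, g2
    assert not set(p1) & set(p2), "grammars must share no variables"
    productions = {**p1, **p2}
    productions[new_start] = [(s1,), (s2,)]
    return productions, new_start


def concat_grammar(g1, g2, new_start="Sc"):
    """G_c: all productions of G1 and G2 plus Sc -> S1 S2."""
    (p1, s1), (p2, s2) = g1, g2
    assert not set(p1) & set(p2), "grammars must share no variables"
    productions = {**p1, **p2}
    productions[new_start] = [(s1, s2)]
    return productions, new_start


def star_grammar(g1, new_start="Sk"):
    """G_k: all productions of G1 plus Sk -> Lambda | Sk S1."""
    p1, s1 = g1
    productions = dict(p1)
    productions[new_start] = [(), (new_start, s1)]
    return productions, new_start


# Example inputs: {a^n b^n} and {c^n | n >= 1}.
g1 = ({"S1": [(), ("a", "S1", "b")]}, "S1")
g2 = ({"S2": [("c",), ("c", "S2")]}, "S2")
print(union_grammar(g1, g2))
print(concat_grammar(g1, g2))
print(star_grammar(g1))
```

The disjointness assertion mirrors the assumption in the theorem that the two grammars have no variables in common; without it, productions of one grammar could interfere with derivations in the other.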

Regular Languages and Regular Grammars
The three operations in Theorem 4.9 are the ones involved in the recursive definition of regular languages
The “basic” regular languages over Σ, ∅ and {σ} (for σ ∈ Σ), are easily seen to be CFLs
Now we can prove by structural induction that every regular language over Σ is a CFL
In fact, however, the CFG can be of a simpler form.
Definition 4.13: A context-free grammar is regular if every production is of the form A → σB or A → Λ

Regular Languages and Regular Grammars (cont’d.)
Theorem 4.14: For every language L ⊆ Σ*, L is regular if and only if L = L(G) for some regular grammar G
Proof:
– If L is a regular language, then there is an FA M = (Q, Σ, q_0, A, δ) that accepts it
– Define G = (V, Σ, S, P) by letting V be Q, S the initial state q_0, and P the set containing the production T → aU for every transition δ(T, a) = U in M and the production T → Λ for every accepting state T of M

Regular Languages and Regular Grammars (cont’d.)
G is a regular grammar, and G generates the same language that M accepts
– For every x = a_1 a_2 … a_n, the transitions on these symbols that start at q_0 end at an accepting state if and only if there is a derivation of x in G
To prove the other direction we can start with a regular grammar G and reverse the construction to produce M
– M may be an NFA, but it still accepts L(G), and it follows that L(G) is regular
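
The construction in the proof of Theorem 4.14 can be sketched as follows. The FA used here (an even number of a’s over {a, b}, with states named `E` and `O`) is an illustrative example that does not come from the slides, and both function names are hypothetical. A right side is either a (terminal, variable) pair or the empty tuple for Λ.

```python
from itertools import product


def fa_to_regular_grammar(delta, accepting):
    """Build T -> aU for every transition delta(T, a) = U and
    T -> Lambda for every accepting state T."""
    productions = {}
    for (state, symbol), target in delta.items():
        productions.setdefault(state, set()).add((symbol, target))
    for state in accepting:
        productions.setdefault(state, set()).add(())
    return productions


def derives(productions, start, x):
    """Check whether the regular grammar derives x by tracking which
    variables can end a sentential form a1 ... ai T."""
    current = {start}
    for symbol in x:
        current = {
            rhs[1]
            for var in current
            for rhs in productions.get(var, ())
            if rhs and rhs[0] == symbol
        }
    # The derivation finishes with T -> Lambda for some reachable T.
    return any(() in productions.get(var, ()) for var in current)


delta = {("E", "a"): "O", ("E", "b"): "E", ("O", "a"): "E", ("O", "b"): "O"}
grammar = fa_to_regular_grammar(delta, accepting={"E"})
ok = all(
    derives(grammar, "E", "".join(w)) == (w.count("a") % 2 == 0)
    for k in range(5)
    for w in product("ab", repeat=k)
)
print(ok)
```

`derives` keeps a set of variables rather than a single one, so it also works for regular grammars that come from the reverse direction of the proof, where the corresponding machine may be an NFA.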

Derivation Trees and Ambiguity
So far we’ve been interested in what strings a CFG generates
It is also useful to consider how a string is generated by a CFG
A derivation may provide information about the structure of a string, and if a string has several possible derivations, one may be more appropriate than another
We can draw trees to represent derivations

Derivation Trees and Ambiguity (cont’d.)
The root node represents the start variable S
Any interior node and its children represent a production A → α used in the derivation; the node represents A, and the children, from left to right, represent the symbols in α
Each leaf node represents a symbol or Λ
The string derived is read off from left to right, ignoring Λ’s
Every derivation has exactly one derivation tree, but a tree can represent more than one derivation

Derivation Trees and Ambiguity (cont’d.)
In a derivation, at each step some production is applied to some occurrence of a variable
Consider a derivation that starts S ⇒ S + S. We could apply a production to either the first or second of the S’s, but the resulting trees would be the same
When we talk about a string having several possible derivations, one being more appropriate, we are talking about derivations corresponding to different trees

Derivation Trees and Ambiguity (cont’d.)
We can distinguish between trivially different derivations and essentially different ones by specifying that in a derivation, we always choose the leftmost variable to expand
Definition 4.16: A derivation in a CFG is a leftmost derivation (LMD) if, at each step, a production is applied to the leftmost variable-occurrence in the current string
– A rightmost derivation (RMD) is defined similarly
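
To make the link between trees and leftmost derivations concrete, here is a small sketch that reads the leftmost derivation off a derivation tree. The tree encoding (a variable node is a pair of a name and a list of children, a terminal leaf is a plain string) and the function names are assumptions made for this illustration.

```python
def render(form):
    """Turn a list of tree nodes into the sentential form it stands for."""
    return "".join(n[0] if isinstance(n, tuple) else n for n in form)


def leftmost_derivation(tree):
    """List the sentential forms of the leftmost derivation for a tree.

    A variable node with an empty child list stands for a Lambda-production.
    """
    form = [tree]
    steps = [render(form)]
    while True:
        # Locate the leftmost node that is still a variable.
        pos = next((i for i, n in enumerate(form) if isinstance(n, tuple)), None)
        if pos is None:
            return steps
        _, children = form[pos]
        # Apply the production recorded at that node.
        form[pos:pos + 1] = children
        steps.append(render(form))


# A tree for a+a*a under S -> S+S | S*S | a, grouping as a+(a*a).
tree = ("S", [("S", ["a"]), "+", ("S", [("S", ["a"]), "*", ("S", ["a"])])])
print(" => ".join(leftmost_derivation(tree)))
```

Each tree yields exactly one leftmost derivation this way, which is why counting leftmost derivations and counting trees give the same answer.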

Derivation Trees and Ambiguity (cont’d.)
Theorem 4.17: If G is a CFG, then for any x ∈ L(G) these three statements are equivalent:
– x has more than one derivation tree
– x has more than one LMD
– x has more than one RMD
Proof: see book
Definition 4.18: A CFG G is ambiguous if, for at least one x ∈ L(G), x has more than one derivation tree (or equivalently, according to Theorem 4.17, more than one LMD)

Derivation Trees and Ambiguity (cont’d.)
A classic example of ambiguity is the dangling else
In C, an if-statement can be defined by
S → if ( E ) S | if ( E ) S else S | OS
(where OS stands for “other statement”)
Consider the statement if (e1) if (e2) f(); else g();
– In C, the else should belong to the second if, but this grammar does not rule out the other interpretation
The two derivation trees shown on the next slide show the two interpretations of a dangling else

[Figure: two derivation trees showing the two interpretations of the dangling else in if (e1) if (e2) f(); else g();]

Derivation Trees and Ambiguity (cont’d.)
Clearly the grammar given is ambiguous, but there are equivalent grammars that allow only the correct interpretation
Example:
S → S_1 | S_2
S_1 → if ( E ) S_1 else S_1 | OS
S_2 → if ( E ) S | if ( E ) S_1 else S_2
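
The difference between the two if-statement grammars can be checked by counting derivation trees. The sketch below makes a simplifying assumption that is not in the slides: each statement is abbreviated to a string of one-letter tokens, with `i` for “if ( E )”, `e` for “else” and `o` for OS, so the dangling-else statement becomes `iioeo`. The function names are hypothetical.

```python
from functools import lru_cache


def count_trees_ambiguous(tokens):
    """Count derivation trees under S -> i S | i S e S | o."""
    @lru_cache(maxsize=None)
    def S(i, j):
        if j - i == 1 and tokens[i] == "o":
            return 1
        if j - i < 2 or tokens[i] != "i":
            return 0
        total = S(i + 1, j)  # S -> i S
        for k in range(i + 2, j - 1):  # S -> i S e S with the else at k
            if tokens[k] == "e":
                total += S(i + 1, k) * S(k + 1, j)
        return total

    return S(0, len(tokens))


def count_trees_unambiguous(tokens):
    """Count trees under S -> S1 | S2, S1 -> i S1 e S1 | o,
    S2 -> i S | i S1 e S2."""
    @lru_cache(maxsize=None)
    def S(i, j):
        return S1(i, j) + S2(i, j)

    @lru_cache(maxsize=None)
    def S1(i, j):
        if j - i == 1 and tokens[i] == "o":
            return 1
        if j - i < 2 or tokens[i] != "i":
            return 0
        return sum(S1(i + 1, k) * S1(k + 1, j)
                   for k in range(i + 2, j - 1) if tokens[k] == "e")

    @lru_cache(maxsize=None)
    def S2(i, j):
        if j - i < 2 or tokens[i] != "i":
            return 0
        total = S(i + 1, j)  # S2 -> i S
        total += sum(S1(i + 1, k) * S2(k + 1, j)
                     for k in range(i + 2, j - 1) if tokens[k] == "e")
        return total

    return S(0, len(tokens))


statement = "iioeo"  # if (e1) if (e2) f(); else g();
print(count_trees_ambiguous(statement), count_trees_unambiguous(statement))
```

For the dangling-else statement the first grammar admits two trees and the second only one. A check on a handful of strings like this supports, but does not prove, that the second grammar is unambiguous.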

Derivation Trees and Ambiguity (cont’d.)
Consider the CFG G: S → S + S | S * S | (S) | a
G generates simple algebraic expressions
One reason for ambiguity is that the relative precedence of + and * hasn’t been specified: a+a*a could be interpreted as (a+a)*a or as a+(a*a)
In fact, S → S + S causes ambiguity by itself, because a+a+a could be interpreted as either (a+a)+a or a+(a+a). Similarly for S → S * S
We might try to correct both problems by using the productions
S → S + T | T
T → T * F | F
(think of T as “term” and F as “factor”)
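
The ambiguity of G can be observed by counting derivation trees for a given expression. This is a minimal sketch with a hypothetical function name; it assumes the expression is written without spaces and uses only the symbols `a`, `+`, `*`, `(` and `)`.

```python
from functools import lru_cache


def count_trees(expr):
    """Count derivation trees for expr under S -> S+S | S*S | (S) | a."""
    @lru_cache(maxsize=None)
    def S(i, j):
        if j - i == 1:
            return 1 if expr[i] == "a" else 0  # S -> a
        if j - i < 3:
            return 0
        total = 0
        if expr[i] == "(" and expr[j - 1] == ")":
            total += S(i + 1, j - 1)  # S -> (S)
        for k in range(i + 1, j - 1):  # S -> S+S or S*S, operator at k
            if expr[k] in "+*":
                total += S(i, k) * S(k + 1, j)
        return total

    return S(0, len(expr))


for e in ["a+a", "a+a*a", "a+a+a"]:
    print(e, count_trees(e))
```

Both a+a*a and a+a+a come out with two trees, matching the two sources of ambiguity described above: unspecified precedence and unspecified associativity.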

Derivation Trees and Ambiguity (cont’d.)
* now has higher precedence than + (all the multiplications are performed within a term)
By making the production S → S + T, not S → T + S, we make + associate to the left. Similarly for *
We want parenthetical expressions to be evaluated first; this means we should consider such an expression to be part of a factor.
The resulting unambiguous CFG generating L(G) is
S → S + T | T
T → T * F | F
F → (S) | a
(proofs of unambiguity and equivalence are both somewhat complicated)
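
As a quick check of the term/factor grammar, the following sketch counts derivation trees under S → S + T | T, T → T * F | F, F → (S) | a. It is an illustration with a hypothetical function name, assuming expressions are written without spaces; a count of 1 on sample inputs is consistent with unambiguity but is not a proof.

```python
from functools import lru_cache


def count_trees_unambiguous(expr):
    """Count derivation trees under S -> S+T | T, T -> T*F | F, F -> (S) | a."""
    @lru_cache(maxsize=None)
    def S(i, j):
        if j <= i:
            return 0
        total = T(i, j)  # S -> T
        total += sum(S(i, k) * T(k + 1, j)  # S -> S + T, plus sign at k
                     for k in range(i + 1, j - 1) if expr[k] == "+")
        return total

    @lru_cache(maxsize=None)
    def T(i, j):
        if j <= i:
            return 0
        total = F(i, j)  # T -> F
        total += sum(T(i, k) * F(k + 1, j)  # T -> T * F, star at k
                     for k in range(i + 1, j - 1) if expr[k] == "*")
        return total

    @lru_cache(maxsize=None)
    def F(i, j):
        if j - i == 1:
            return 1 if expr[i] == "a" else 0  # F -> a
        if j - i >= 3 and expr[i] == "(" and expr[j - 1] == ")":
            return S(i + 1, j - 1)  # F -> (S)
        return 0

    return S(0, len(expr))


for e in ["a+a*a", "a+a+a", "(a+a)*a"]:
    print(e, count_trees_unambiguous(e))
```

The unit productions S → T and T → F are handled by calling the next function on the same span, which terminates because F always shrinks the span.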

Simplified Forms and Normal Forms
Questions about the strings generated by a CFG are sometimes easier to answer if we know something about the form of the productions
– For example, if we know that a grammar has no Λ-productions and no unit productions (A → B) we can deduce that no derivation of a string x can take more than 2|x| − 1 steps (see book for details). We could then, in principle, determine whether x can be derived by considering derivations no longer than this
We show how to modify an arbitrary CFG to have no productions of either of these types

Simplified Forms and Normal Forms (cont’d.)
Suppose we have the production A → BCDCB, and Λ can be derived from either B or C. If we get rid of Λ-productions, then the steps that replace B and C by Λ will no longer be possible, but we must still be able to get all the same non-null strings from A
We must retain the production A → BCDCB but we should add A → CDCB, A → DCB, A → BDCB, and so on
We will need to know what variables can derive Λ (we will call such a variable a nullable variable)

Simplified Forms and Normal Forms (cont’d.)
Definition 4.26: A recursive definition of the set of nullable variables of G
– If there is a production A → Λ then A is nullable
– If A_1, A_2, …, A_k are nullable variables and there is a production B → A_1 A_2 … A_k, then B is nullable
This leads immediately to an algorithm for identifying the nullable variables
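
The algorithm suggested by Definition 4.26 is a fixed-point iteration. Below is a minimal sketch, assuming productions are stored as a dictionary from variables to lists of right sides, each right side a tuple of symbols with the empty tuple standing for Λ. The function name and the sample grammar are illustrative.

```python
def nullable_variables(productions):
    """Return the set of nullable variables of the grammar."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, right_sides in productions.items():
            if var in nullable:
                continue
            # all() over an empty tuple is True, so A -> Lambda is covered
            # by the same test as B -> A1 A2 ... Ak with every Ai nullable.
            if any(all(s in nullable for s in rhs) for rhs in right_sides):
                nullable.add(var)
                changed = True
    return nullable


grammar = {
    "S": [("A", "B"), ("a",)],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ()],
    "C": [("c",)],
}
print(sorted(nullable_variables(grammar)))
```

The loop stops as soon as a full pass adds nothing new, so it runs at most one pass per variable plus one final pass.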

Simplified Forms and Normal Forms (cont’d.)
Theorem 4.27: For every CFG G = (V, Σ, S, P) the following algorithm produces a CFG G_1 = (V, Σ, S, P_1) having no Λ-productions for which L(G_1) = L(G) − {Λ}
– Identify the nullable variables in V and initialize P_1 to P
– For every production A → α in P, add to P_1 every production obtained by deleting from α one or more variable-occurrences involving a nullable variable
– Delete every Λ-production from P_1, as well as every production of the form A → A
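
The three steps of Theorem 4.27 can be sketched as below. The representation (right sides as tuples, the empty tuple for Λ) and the function name are assumptions; the sample grammar extends the earlier A → BCDCB example with productions for B, C and D that are made up for illustration.

```python
from itertools import product


def remove_lambda_productions(productions):
    """Return P1: no Lambda-productions, same non-null strings generated."""
    # Step 1: identify the nullable variables by fixed-point iteration.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, right_sides in productions.items():
            if var not in nullable and any(
                all(s in nullable for s in rhs) for rhs in right_sides
            ):
                nullable.add(var)
                changed = True

    new_productions = {}
    for var, right_sides in productions.items():
        result = set()
        for rhs in right_sides:
            # Step 2: each nullable occurrence is either kept or deleted.
            options = [(True, False) if s in nullable else (True,) for s in rhs]
            for keep in product(*options):
                candidate = tuple(s for s, k in zip(rhs, keep) if k)
                # Step 3: drop Lambda-productions and productions A -> A.
                if candidate and candidate != (var,):
                    result.add(candidate)
        new_productions[var] = result
    return new_productions


grammar = {
    "A": [("B", "C", "D", "C", "B")],
    "B": [("b",), ()],
    "C": [("c",), ()],
    "D": [("d",)],
}
for var, right_sides in remove_lambda_productions(grammar).items():
    print(var, sorted("".join(r) for r in right_sides))
```

With four nullable occurrences in BCDCB, the variable A ends up with 2^4 = 16 right sides, from BCDCB itself down to D alone.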

Simplified Forms and Normal Forms (cont’d.)
The procedure we use to eliminate unit productions is similar
We first identify pairs of variables (A, B) for which A ⇒* B (in this case we call B A-derivable); then for each such pair (A, B) and each nonunit production B → α, we add the production A → α
Such pairs can be found as follows:
– If A → B is a production, then B is A-derivable
– If C is A-derivable and C → B is a production, then B is A-derivable
– No other variables are A-derivable

Simplified Forms and Normal Forms (cont’d.)
Theorem 4.28: For every CFG G = (V, Σ, S, P) without Λ-productions, the CFG G_1 = (V, Σ, S, P_1) produced by the following algorithm generates the same language as G and has no unit productions:
– Initialize P_1 to P, and for each A ∈ V, identify the A-derivable variables
– For every such pair (A, B) and every nonunit production B → α, add the production A → α to P_1
– Delete all unit productions from P_1
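
Here is a hedged sketch of the algorithm in Theorem 4.28, applied to the expression grammar S → S + T | T, T → T * F | F, F → (S) | a. It assumes the input has no Λ-productions and that right sides are tuples of symbols; the function name is hypothetical.

```python
def remove_unit_productions(productions):
    """Return P1 with no unit productions, generating the same language."""
    variables = set(productions)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    new_productions = {}
    for a in variables:
        # Find the A-derivable variables by following unit productions.
        derivable = set()
        stack = [a]
        while stack:
            c = stack.pop()
            for rhs in productions[c]:
                if is_unit(rhs) and rhs[0] not in derivable:
                    derivable.add(rhs[0])
                    stack.append(rhs[0])
        # Keep A's own nonunit productions and add B -> alpha as A -> alpha
        # for every A-derivable B.
        result = {rhs for rhs in productions[a] if not is_unit(rhs)}
        for b in derivable:
            result |= {rhs for rhs in productions[b] if not is_unit(rhs)}
        new_productions[a] = result
    return new_productions


expr_grammar = {
    "S": [("S", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "S", ")"), ("a",)],
}
for var, right_sides in sorted(remove_unit_productions(expr_grammar).items()):
    print(var, sorted(" ".join(r) for r in right_sides))
```

In this example T and F are S-derivable and F is T-derivable, so S inherits the nonunit productions of both T and F.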

Simplified Forms and Normal Forms (cont’d.)
Definition 4.29: A CFG is said to be in Chomsky normal form if every production is of one of these two types:
A → BC (where B and C are variables)
A → σ (where σ is a terminal)
Theorem 4.30: For every context-free grammar G, there is another CFG G_1 in Chomsky normal form such that L(G_1) = L(G) − {Λ}
The algorithm on the next slide shows how to generate G_1

Simplified Forms and Normal Forms (cont’d.)
The first step is to eliminate Λ-productions and unit productions
The second step is to introduce for every terminal symbol σ a new variable X_σ and production X_σ → σ
In every production whose right side has at least two symbols, replace every terminal by its new variable (this leaves the new productions X_σ → σ untouched)
Replace a production like A → BACB by the productions A → BY_1, Y_1 → AY_2, Y_2 → CB, where Y_1 and Y_2 are new variables
The resulting CFG is in Chomsky normal form
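
The last two steps of the conversion can be sketched as follows. This is an illustration under stated assumptions: the input already has no Λ-productions and no unit productions, right sides are tuples of symbols, and the generated names `X_σ` and `Y1`, `Y2`, … do not clash with existing variables. Terminals are replaced only in right sides of length at least two, so a production such as A → a is left as it is.

```python
def to_cnf(productions):
    """Convert a grammar without Lambda- or unit productions to CNF."""
    variables = set(productions)
    result = {var: set() for var in productions}
    counter = 0
    for var, right_sides in productions.items():
        for rhs in right_sides:
            if len(rhs) == 1:
                result[var].add(rhs)  # already of the form A -> sigma
                continue
            # Replace each terminal sigma by its variable X_sigma.
            symbols = []
            for s in rhs:
                if s in variables:
                    symbols.append(s)
                else:
                    x = "X_" + s
                    result.setdefault(x, set()).add((s,))
                    symbols.append(x)
            # Split A -> B1 B2 ... Bk into a chain of two-variable productions.
            left = var
            while len(symbols) > 2:
                counter += 1
                y = "Y" + str(counter)
                result.setdefault(left, set()).add((symbols[0], y))
                left = y
                symbols = symbols[1:]
            result.setdefault(left, set()).add(tuple(symbols))
    return result


grammar = {"A": [("B", "A", "C", "B"), ("a",)], "B": [("b",)], "C": [("c",)]}
for var, right_sides in sorted(to_cnf(grammar).items()):
    print(var, sorted(" ".join(r) for r in right_sides))
```

On the slide’s example A → BACB this yields A → B Y1, Y1 → A Y2 and Y2 → C B. Each long production gets its own fresh Y variables, so the chains created for different productions cannot interfere with one another.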