Closure Properties of Regular Languages


Regular languages are closed under many set operations. Let L1 and L2 be regular languages.
(1) L1 ∪ L2 (the union) is regular.
(2) L1L2 (the concatenation) is regular.
(3) L1* (the Kleene star) and L1+ (the Kleene plus) are regular.
(4) L1^R (the reversed language) is regular.
(5) comp(L1) (the complement of L1) is regular.
(6) L1 ∩ L2 (the intersection) is regular.
(7) L1 − L2 (the set subtraction) is regular.

Proof of the Closure Properties

To keep the proofs simple we may use regular grammars, finite automata, or regular expressions, whichever is most convenient. Let r1 and r2 be regular expressions that denote the languages L1 and L2, respectively.

(1) The regular expression r1 + r2 denotes the union of L1 and L2. Since every regular expression denotes a regular language, L1 ∪ L2 is regular.
We can also prove this property constructively. Let G1 = (V_N1, V_T1, P1, S1) and G2 = (V_N2, V_T2, P2, S2) be regular grammars that generate L1 and L2, respectively. Without loss of generality, assume that V_N1 and V_N2 are disjoint, i.e., V_N1 ∩ V_N2 = ∅. (Otherwise, we can rename nonterminals until the grammars satisfy this property.) Construct a regular grammar G with a new start symbol S, the production rules S → S1 | S2, and all the rules in P1 and P2. Clearly, L(G) = L1 ∪ L2.

(2) The regular expression r1r2 denotes the language L1L2, so L1L2 is regular.

(3) The regular expression (r1)* denotes L1*, so L1* is regular. Since L1+ = L1L1*, L1+ is regular by property (2).

Proof of the Closure Properties (cont'd)

(4) Suppose that a finite state automaton M1 accepts L1. We modify M1 as follows.
Step 1: Add a new accepting state, and add an ε-transition from every old accepting state to it.
Step 2: Make the new accepting state the start state, reverse the direction of all the edges, and make the old start state the only accepting state.
The resulting automaton recognizes the reversed language of L1.

[Figure: the transition graph of M1 over {a, b}, the same graph after Step 1, and the reversed graph after Step 2.]
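The reversal construction in part (4) can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the automaton is stored as a dictionary from (state, symbol) to a set of states, the empty string "" marks an ε-move, the fresh state is named `q_new`, and the example automaton `m1` is made up.

```python
from collections import defaultdict


def reverse_nfa(transitions, start, accepting):
    """Return (transitions, start, accepting) of an NFA for the reversed language.

    transitions maps (state, symbol) to a set of states; symbol "" is an epsilon move.
    """
    new_start = "q_new"  # hypothetical fresh state name, assumed unused in the input
    reversed_transitions = defaultdict(set)
    # Reverse every edge p --symbol--> q into q --symbol--> p.
    for (p, symbol), targets in transitions.items():
        for q in targets:
            reversed_transitions[(q, symbol)].add(p)
    # The epsilon edges from old accepting states into the new state, once reversed,
    # leave the new state (which becomes the start state).
    for f in accepting:
        reversed_transitions[(new_start, "")].add(f)
    return dict(reversed_transitions), new_start, {start}


def accepts(transitions, start, accepting, word):
    """Simulate an NFA with epsilon moves on word."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in transitions.get((s, ""), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in word:
        moved = set()
        for s in current:
            moved |= transitions.get((s, ch), set())
        current = closure(moved)
    return bool(current & accepting)


# Made-up example: strings over {a, b} that end in "ab".
m1 = {("0", "a"): {"0", "1"}, ("0", "b"): {"0"}, ("1", "b"): {"2"}}
rev, rev_start, rev_accepting = reverse_nfa(m1, "0", {"2"})
```

With this example, `accepts(m1, "0", {"2"}, "aab")` holds, and the reversed automaton accepts "baa" but rejects "aab".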

Proof of the Closure Properties (cont'd)

Part (4) can also be proved with grammars. Let G1 be a regular grammar for L1 whose production rules have the form A → xB or A → x, where A and B are arbitrary nonterminal symbols and x is a string of terminal symbols or ε. If we reverse the right side of each production rule, the rules take the form A → Bx or A → x, which is the other form of regular grammars. (Recall that we chose to restrict ourselves to A → xB or A → x.) The resulting grammar G generates L1^R.

(5) As in part (4), we modify the transition graph of a deterministic automaton M1 that recognizes L1. Add the dead state if it is not shown in the transition graph. (Recall that we usually do not show the dead state for convenience.) Then change accepting states to non-accepting states and non-accepting states to accepting states. The result recognizes comp(L1), the complement of L1.

(6) By De Morgan's law, L1 ∩ L2 = comp(comp(L1 ∩ L2)) = comp(comp(L1) ∪ comp(L2)). Since regular languages are closed under union and complementation (properties (1) and (5) above), L1 ∩ L2 is regular.

(7) Since L1 − L2 = L1 ∩ comp(L2), it is regular by properties (5) and (6) above.
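The complement construction in part (5) can be illustrated with a short Python sketch. The representation is an assumption made for this example only: a deterministic automaton is a dictionary from (state, symbol) to a single state, missing entries are the transitions to the dead state that the graph does not show, and the state name `dead` is hypothetical.

```python
def complete(delta, states, alphabet, dead="dead"):
    """Add an explicit dead state so every (state, symbol) pair has a transition."""
    full = dict(delta)
    all_states = set(states) | {dead}  # "dead" is assumed not to clash with a real state
    for q in all_states:
        for a in alphabet:
            full.setdefault((q, a), dead)
    return full, all_states


def complement(delta, states, alphabet, start, accepting):
    """Swap accepting and non-accepting states of the completed automaton."""
    full, all_states = complete(delta, states, alphabet)
    return full, all_states, start, all_states - set(accepting)


def run(delta, start, accepting, word):
    """Run a complete deterministic automaton on word."""
    q = start
    for ch in word:
        q = delta[(q, ch)]
    return q in accepting


# Made-up example: a partial automaton for the language a*b.
delta = {("s", "a"): "s", ("s", "b"): "f"}
delta_c, states_c, start_c, accepting_c = complement(delta, {"s", "f"}, "ab", "s", {"f"})
```

Skipping the completion step would be a mistake: a string such as "abb" falls off the partial graph, so it would be rejected both before and after swapping the states.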

Properties of Context-free Languages

Let L1 and L2 be CFLs.
(1) L1 ∪ L2 (the union) is a CFL.
(2) L1L2 (the concatenation) is a CFL.
(3) L1* (the Kleene star) and L1+ (the Kleene plus) are CFLs.
(4) L1^R (the reversed language) is a CFL.
(5) L1 ∩ L2 (the intersection) is not necessarily a CFL.
(6) comp(L1) (the complement of L1) is not necessarily a CFL.

Proof of the Context-free Language Properties

Let G1 = (V_N1, V_T1, P1, S1) and G2 = (V_N2, V_T2, P2, S2) be context-free grammars that generate L1 and L2, respectively. Without loss of generality, assume that V_N1 and V_N2 are disjoint, i.e., V_N1 ∩ V_N2 = ∅. (Otherwise, we can rename nonterminals.) In each construction below, S is a new start symbol.

(1) Construct a CFG G by merging the rules of G1 and G2 and adding the new rules S → S1 | S2. (This is the same technique we used for regular languages.)
(2) Construct a CFG G by merging the rules of G1 and G2 and adding the new rule S → S1S2.
(3) For L1*, add the rules S → S1S | ε to G1. For L1+, add the rules S → S1S | S1 to G1.
(4) Construct a CFG from G1 by changing each rule A → α to A → α^R, i.e., reverse the right side of each production rule.
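The four grammar constructions above are mechanical enough to write down directly. The following Python sketch assumes a representation that is not part of the slides: a grammar is a dictionary from each nonterminal to a list of right sides, each right side is a tuple of symbols, and the empty tuple stands for ε. The two example grammars are made up.

```python
def union(g1, s1, g2, s2, new_start="S"):
    """Grammar for L1 ∪ L2: all rules of both grammars plus S -> S1 | S2."""
    assert not set(g1) & set(g2), "nonterminal sets must be disjoint"
    assert new_start not in g1 and new_start not in g2, "S must be a new symbol"
    merged = {**g1, **g2}
    merged[new_start] = [(s1,), (s2,)]
    return merged, new_start


def concat(g1, s1, g2, s2, new_start="S"):
    """Grammar for L1L2: all rules of both grammars plus S -> S1 S2."""
    assert not set(g1) & set(g2), "nonterminal sets must be disjoint"
    assert new_start not in g1 and new_start not in g2, "S must be a new symbol"
    merged = {**g1, **g2}
    merged[new_start] = [(s1, s2)]
    return merged, new_start


def star(g1, s1, new_start="S"):
    """Grammar for L1*: the rules of G1 plus S -> S1 S | ε."""
    assert new_start not in g1, "S must be a new symbol"
    merged = dict(g1)
    merged[new_start] = [(s1, new_start), ()]
    return merged, new_start


def plus(g1, s1, new_start="S"):
    """Grammar for L1+: the rules of G1 plus S -> S1 S | S1."""
    assert new_start not in g1, "S must be a new symbol"
    merged = dict(g1)
    merged[new_start] = [(s1, new_start), (s1,)]
    return merged, new_start


def reverse(g1):
    """Grammar for the reversed language: reverse every right side."""
    return {lhs: [tuple(reversed(rhs)) for rhs in alts] for lhs, alts in g1.items()}


# Made-up example grammars: S1 -> a S1 b | ε and S2 -> c S2 | c.
g1 = {"S1": [("a", "S1", "b"), ()]}
g2 = {"S2": [("c", "S2"), ("c",)]}
g_union, s_union = union(g1, "S1", g2, "S2")
g_rev = reverse(g1)
```

The assertions enforce the two side conditions of the proof: disjoint nonterminal sets and a start symbol that does not already occur in either grammar.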

Proof of the Context-free Language Properties (cont'd)

(5) We know that L1 = {a^i b^i c^j | i, j ≥ 0} and L2 = {a^k b^n c^n | k, n ≥ 0} are CFLs. But L1 ∩ L2 = {a^i b^i c^i | i ≥ 0} is not a CFL.

(6) Suppose that CFLs are closed under complementation. CFLs are closed under union (property (1)), and L1 ∩ L2 = comp(comp(L1) ∪ comp(L2)), so CFLs would also be closed under intersection. This contradicts property (5).

Minimizing the Number of ε-Production Rules

Theorem. Given an arbitrary CFG G, we can construct a CFG G' such that L(G) = L(G') and, if ε is not in L(G), then G' does not have any ε-production rule. If ε ∈ L(G), then S → ε is the only ε-production rule of G'.

Proof (an algorithm). Let G = (V_T, V_N, P, S), and let A, B ∈ V_N. We construct a CFG G' = (V_T, V_N, P', S) from G by the following steps.

(1) Find the set W of all nonterminals of G that derive ε, as follows:
    W_0 = {A | A ∈ V_N and A → ε is in P};
    do
        W_{i+1} = W_i ∪ {A | A ∈ V_N and A → α is in P, for some α ∈ (W_i)+};
    until (W_{i+1} = W_i);
    W = W_i;   // W contains all nonterminal symbols from which ε can be derived.
(2) Delete all ε-productions from P. Call this new set of productions P1.
(3) Modify P1 to P' as follows: if a production A → α is in P1, then put into P' the rule A → α and the rules A → α' for all α' (α' ≠ ε) that are obtained from α by deleting one or more nonterminals in the set W constructed by step (1).
(4) If S is in W, then add S → ε to P'.

Minimizing the Number of ε-Production Rules (example)

Convert the following CFG G to another CFG G' such that L(G) = L(G') and G' has the smallest possible number of ε-production rules.

G:  S → ADC | EFg
    A → aA | ε
    D → FGH | b
    C → c | ε
    E → a
    F → f | ε
    G → Gg | H
    H → h | ε

Computing W:
    W_0 = {A, C, F, H}
    W_1 = W_0 ∪ {G} = {A, C, F, G, H}
    W_2 = W_1 ∪ {D} = {A, C, D, F, G, H}
    W_3 = W_2 ∪ {S} = {A, C, D, F, G, H, S}
    W_4 = W_3 ∪ {} = {A, C, D, F, G, H, S}

P1: S → ADC | EFg
    A → aA
    D → FGH | b
    C → c
    E → a
    F → f
    G → Gg | H
    H → h

P': S → ADC | AD | AC | DC | A | D | C | ε | EFg | Eg
    A → aA | a
    D → FGH | FG | FH | GH | F | G | H | b
    C → c
    E → a
    F → f
    G → Gg | g | H
    H → h
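The four steps of the algorithm can be sketched in Python and checked against this example. The encoding is an assumption made for the sketch: each symbol is a single character, uppercase letters are nonterminals, a right side is a string, and the empty string stands for ε. The set W is computed by updating one set in place, which reaches the same fixed point as the sequence W_0, W_1, ... in step (1).

```python
from itertools import product


def nullable_nonterminals(rules):
    """Step (1): the set W of nonterminals that derive the empty string."""
    w = {lhs for lhs, rhs in rules if rhs == ""}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in w and rhs and all(sym in w for sym in rhs):
                w.add(lhs)
                changed = True
    return w


def minimize_epsilon_rules(rules, start):
    """Steps (2) to (4): return W and the new rule set P'."""
    w = nullable_nonterminals(rules)
    new_rules = set()
    for lhs, rhs in rules:
        if rhs == "":
            continue  # step (2): drop every ε-production
        # Step (3): each nonterminal in W may independently be kept or deleted.
        options = [(sym, "") if sym in w else (sym,) for sym in rhs]
        for choice in product(*options):
            candidate = "".join(choice)
            if candidate:  # never create a new ε-production here
                new_rules.add((lhs, candidate))
    if start in w:
        new_rules.add((start, ""))  # step (4)
    return w, new_rules


# The grammar G of the example above.
grammar = {
    "S": ["ADC", "EFg"], "A": ["aA", ""], "D": ["FGH", "b"], "C": ["c", ""],
    "E": ["a"], "F": ["f", ""], "G": ["Gg", "H"], "H": ["h", ""],
}
rules = [(lhs, rhs) for lhs, alts in grammar.items() for rhs in alts]
w, p_prime = minimize_epsilon_rules(rules, "S")
```

For this grammar `w` is {A, C, D, F, G, H, S}, and `p_prime` contains exactly the rules listed under P' above, including S → ε.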

Eliminating Useless Symbols from a CFG

Lemma 1. Given a CFG G = (V_T, V_N, P, S), we can construct an equivalent CFG G' = (V_T, V'_N, P', S) such that every nonterminal symbol A in V'_N derives a string x ∈ (V_T)*.

Proof. Let OLDV and NEWV be sets of nonterminals, and let A be an arbitrary nonterminal. We construct V'_N and P' as follows.

    OLDV = ∅;
    NEWV = {A | A → w is in P for some w ∈ (V_T)*};
    while (OLDV ≠ NEWV) do {
        OLDV = NEWV;
        NEWV = OLDV ∪ {A | A → α is in P for some α in (V_T ∪ OLDV)*};
    }
    V'_N = NEWV;
    P' = {A → α | A → α is in P and α ∈ (V'_N ∪ V_T)*};
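The OLDV/NEWV loop of Lemma 1 translates almost line by line into Python. In this sketch the encoding is an assumption: symbols are single characters, uppercase letters are nonterminals, right sides are strings, and "" stands for ε. The test grammar is the grammar G used in the worked example later in these slides.

```python
def is_nonterminal(sym):
    """Encoding assumption of this sketch: uppercase letters are nonterminals."""
    return sym.isupper()


def productive_nonterminals(rules):
    """Nonterminals that derive some string of terminals (the set V'_N of Lemma 1)."""
    old = None
    new = {lhs for lhs, rhs in rules if not any(is_nonterminal(s) for s in rhs)}
    while old != new:
        old = new
        new = old | {
            lhs for lhs, rhs in rules
            if all((not is_nonterminal(s)) or s in old for s in rhs)
        }
    return new


def keep_productive(rules):
    """The rule set P' of Lemma 1: rules whose right side uses only V'_N and terminals."""
    v = productive_nonterminals(rules)
    return [
        (lhs, rhs) for lhs, rhs in rules
        if all((not is_nonterminal(s)) or s in v for s in rhs)
    ]


# Grammar G of the worked example: S -> AD | EFg, A -> aGD, D -> FGd, C -> cCEc,
# E -> Ee, F -> Ff | ε, G -> Gg | g, H -> hH | h.
rules = [
    ("S", "AD"), ("S", "EFg"), ("A", "aGD"), ("D", "FGd"), ("C", "cCEc"),
    ("E", "Ee"), ("F", "Ff"), ("F", ""), ("G", "Gg"), ("G", "g"),
    ("H", "hH"), ("H", "h"),
]
v_n = productive_nonterminals(rules)
p_prime = keep_productive(rules)
```

Here `v_n` is {A, D, F, G, H, S}, and `p_prime` drops the three rules that mention C or E.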

Eliminating Useless Symbols from a CFG (cont'd)

Lemma 2. Given a CFG G = (V_T, V_N, P, S), we can construct an equivalent CFG G' = (V'_T, V'_N, P', S) such that, for each symbol X ∈ V'_T ∪ V'_N, the start symbol derives αXβ for some α, β ∈ (V'_T ∪ V'_N)*, i.e., S can derive a sentential form (a string of terminals and nonterminals) which contains the symbol X.

Proof. The following algorithm computes V'_T, V'_N and P'.
(1) Let V'_T and V'_N be the empty sets.
(2) Put S into V'_N.
(3) If A ∈ V_N has been put into V'_N and A → α_1 | α_2 | ... | α_n, then put all nonterminals in α_i, 1 ≤ i ≤ n, into V'_N and all terminals in α_i into V'_T.
(4) Repeat (3) until there is no symbol to be added to V'_N.
(5) Let P' contain all the productions in P except for the ones which have a symbol not in V'_T ∪ V'_N.
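Lemma 2 is a reachability computation, which the following Python sketch implements with a worklist. The encoding is again an assumption: single-character symbols, uppercase for nonterminals, right sides as strings with "" for ε. The input is the rule set that Step 1 of the worked example later in these slides produces.

```python
def reachable_symbols(rules, start):
    """Return (V'_N, V'_T): the nonterminals and terminals reachable from start."""
    reach_n, reach_t = {start}, set()
    worklist = [start]
    while worklist:
        a = worklist.pop()
        for lhs, rhs in rules:
            if lhs != a:
                continue
            for sym in rhs:
                if sym.isupper():  # encoding assumption: uppercase means nonterminal
                    if sym not in reach_n:
                        reach_n.add(sym)
                        worklist.append(sym)
                else:
                    reach_t.add(sym)
    return reach_n, reach_t


def keep_reachable(rules, start):
    """Step (5): drop every production that contains an unreachable symbol."""
    reach_n, reach_t = reachable_symbols(rules, start)
    symbols = reach_n | reach_t
    return [(lhs, rhs) for lhs, rhs in rules if all(s in symbols for s in lhs + rhs)]


# Rules left after applying Lemma 1 in the worked example.
rules = [
    ("S", "AD"), ("A", "aGD"), ("D", "FGd"), ("F", "Ff"), ("F", ""),
    ("G", "Gg"), ("G", "g"), ("H", "hH"), ("H", "h"),
]
v_n, v_t = reachable_symbols(rules, "S")
cleaned = keep_reachable(rules, "S")
```

The result is V'_N = {S, A, D, F, G} and V'_T = {a, d, g, f}, and the two rules for H are removed.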

Eliminating Useless Symbols from a CFG (cont'd)

Theorem. Given an arbitrary CFG G = (V_T, V_N, P, S), we can construct an equivalent CFG G' = (V'_T, V'_N, P', S) such that
(1) for each A ∈ V'_N, A ⇒* x for some x ∈ (V'_T)* (i.e., A derives a terminal string or ε), and
(2) for each X ∈ V'_T ∪ V'_N, S ⇒* αXβ for some α, β ∈ (V'_N ∪ V'_T)* (i.e., the start symbol can derive a sentential form which contains X).

Proof. Apply Lemma 1 and then Lemma 2.

Eliminating Useless Symbols from a CFG (example)

Example. Eliminate useless symbols from the following CFG G.

G:  S → AD | EFg
    A → aGD
    D → FGd
    C → cCEc
    E → Ee
    F → Ff | ε
    G → Gg | g
    H → hH | h

Step 1: Apply Lemma 1 to find the set of nonterminals V'_N such that every nonterminal symbol in V'_N derives a string x ∈ (V_T)*.

    OLDV = {}; NEWV = {F, G, H}
    OLDV = NEWV; NEWV = OLDV ∪ {D} = {D, F, G, H};
    OLDV = NEWV; NEWV = OLDV ∪ {A} = {A, D, F, G, H};
    OLDV = NEWV; NEWV = OLDV ∪ {S} = {A, D, F, G, H, S};
    OLDV = NEWV; NEWV = OLDV ∪ {} = {A, D, F, G, H, S};
    V'_N = NEWV = {A, D, F, G, H, S}

Find the set of rules P'.

P': S → AD
    A → aGD
    D → FGd
    F → Ff | ε
    G → Gg | g
    H → hH | h

Eliminating Useless Symbols from a CFG (example, cont'd)

The rules found in Step 1 are:

P': S → AD
    A → aGD
    D → FGd
    F → Ff | ε
    G → Gg | g
    H → hH | h

Step 2: Apply Lemma 2 to find the set of symbols V' = V'_T ∪ V'_N such that each symbol in V' can be derived starting from S.

    1. V'_T = V'_N = {};   // initialize with the empty set
    2. V'_N = V'_N ∪ {S} = {S}                     V'_T = V'_T ∪ {} = {}
    3. V'_N = V'_N ∪ {A, D} = {S, A, D}            V'_T = V'_T ∪ {} = {}
    4. V'_N = V'_N ∪ {G, F} = {S, A, D, G, F}      V'_T = V'_T ∪ {a, d} = {a, d}
    5. V'_N = V'_N ∪ {} = {S, A, D, F, G}          V'_T = V'_T ∪ {g, f} = {a, d, g, f}
    6. V'_N = V'_N ∪ {} = {S, A, D, F, G}          V'_T = V'_T ∪ {} = {a, d, g, f}

Cleaned set of rules:

P': S → AD
    A → aGD
    D → FGd
    F → Ff | ε
    G → Gg | g

Eliminating Useless Symbols from a CFG (cont'd)

Remark: Notice that applying Lemma 2 first and then Lemma 1 may fail to eliminate all useless productions.

Example. Consider the grammar with rules P = {S → AB | a, A → a}.
Applying Lemma 1 first removes S → AB, because B derives no terminal string, and gives P = {S → a, A → a}. Applying Lemma 2 then removes A → a, because A is no longer reachable from S, and gives P = {S → a}.
However, if we apply Lemma 2 first, every symbol is reachable and we still have P = {S → AB | a, A → a}. Applying Lemma 1 then gives P = {S → a, A → a}, which still has a useless production.

Ambiguous Context-free Grammars

There are two kinds of ambiguity in a language.
Lexical ambiguity (or semantic ambiguity): a symbol or an expression has more than one meaning (e.g., story, saw).
Syntactic ambiguity (or structural ambiguity): an expression can be parsed in two different ways.

For a given context-free grammar G and a string x, the parse tree shows how x is derived with the rules of G (see an example on the next slide). A CFG is ambiguous if its language contains a string that has more than one parse tree. In programming languages, different parse trees give different object code. In this course we will only study syntactic ambiguity of context-free grammars.

Example 1 (in natural language). "A man entered a room with a picture" can be interpreted in two different ways.

[Figure: two parse trees for the sentence, one attaching "with a picture" to the man and one attaching it to the room.]

Ambiguous Context-free Grammars (cont'd)

Example 2 (in formal language). The following context-free grammar is ambiguous, because the string p ∨ q ∧ r has the two parse trees shown in Figures (a) and (b) below.

G:  S → S ∨ S | S ∧ S | ¬S | A
    A → p | q | r

[Figure (a): parse tree that groups the string as p ∨ (q ∧ r).]
[Figure (b): parse tree that groups the string as (p ∨ q) ∧ r.]
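One way to see the ambiguity concretely is to count parse trees. The Python sketch below does this for a grammar of the same shape as G: two binary operators, one unary operator, and the atoms p, q, r. The operator tokens `or`, `and` and `not` are an assumption standing in for the grammar's operator symbols, and the counting function is written for this grammar only, not for CFGs in general.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for S -> S or S | S and S | not S | A, with A -> p | q | r."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_s(i, j):
        """Number of parse trees that derive tokens[i:j] from S."""
        if i >= j:
            return 0
        total = 0
        # S -> A with A -> p | q | r
        if j - i == 1 and tokens[i] in ("p", "q", "r"):
            total += 1
        # S -> not S
        if tokens[i] == "not":
            total += count_s(i + 1, j)
        # S -> S or S and S -> S and S: try every operator position as the root.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("or", "and"):
                total += count_s(i, k) * count_s(k + 1, j)
        return total

    return count_s(0, len(tokens))


trees_for_example = count_parse_trees("p or q and r".split())
trees_for_atom = count_parse_trees(["p"])
```

`trees_for_example` is 2, matching the two figures, while a single atom has exactly one tree. Any count above 1 for some string is enough to show that the grammar is ambiguous.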

Some Techniques for Designing Unambiguous CFG

(1) Use parentheses so that each string has a unique derivation tree. Notice that this technique changes the language by introducing new terminal symbols, the parentheses.

Example:
Ambiguous G1:    S → S ∨ S | S ∧ S | ¬S | A
                 A → p | q | r
Unambiguous G2:  S → (S ∨ S) | (S ∧ S) | (¬S) | A
                 A → p | q | r

[Figure (a): parse tree of (p ∨ (q ∧ r)) in G2.]
[Figure (b): parse tree of ((p ∨ q) ∧ r) in G2.]

Some Techniques for Designing Unambiguous CFG (cont'd)

(2) Modify the production rules that cause the ambiguity.

Examples:
(a) Grammar G3 below is ambiguous: for the string bcb it can generate either the left b first and then the right b, or vice versa. Grammar G4 does not have this choice, because it always generates the b's on the left side first, if there are any.

Ambiguous G3:    S → bS | Sb | c
Unambiguous G4:  S → bS | A
                 A → Ab | c

[Figure (a): the two parse trees of bcb in G3, showing the ambiguity of G3.]
[Figure (b): the single parse tree of bcb in the unambiguous grammar G4.]

Some Techniques for Designing Unambiguous CFG (cont'd)

(b) The following grammar G5 is ambiguous, since it can generate ε in two ways. We eliminate this possibility by applying the technique for reducing ε-production rules. Grammar G6 is the result.

G5:  S → B | D
     B → bBc | ε
     D → dDe | ε
G6:  S → B | D | ε
     B → bBc | bc
     D → dDe | de

(c) Grammar G1 can be modified in two different ways to make it unambiguous. Notice that for G7 we used the same technique as in Example (a) above.

G7:  S → A ∨ S | A ∧ S | ¬S | A
     A → p | q | r
G8:  S → D ∨ S | D
     D → C ∧ D | C
     C → ¬C | A
     A → p | q | r

For G8 we set up a precedence rule: ∨, if any, is derived first (by S), then ∧ (by D), and then ¬ (by C), in that order from the top of the parse tree. The later an operator is derived, the higher its precedence over the others.
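A grammar layered by precedence, like G8, maps directly onto a recursive-descent parser with one function per nonterminal. The Python sketch below follows that layering; the tokens `or`, `and` and `not` are an assumption standing in for the three operator symbols, and the nested-tuple tree format is chosen only for this illustration.

```python
def parse(tokens):
    """Parse with S -> D or S | D, D -> C and D | C, C -> not C | A, A -> p | q | r."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def parse_s():  # lowest precedence: derived first, nearest the root
        left = parse_d()
        if peek() == "or":
            take()
            return ("or", left, parse_s())
        return left

    def parse_d():
        left = parse_c()
        if peek() == "and":
            take()
            return ("and", left, parse_d())
        return left

    def parse_c():  # highest precedence: derived last, nearest the leaves
        if peek() == "not":
            take()
            return ("not", parse_c())
        tok = take()
        if tok not in ("p", "q", "r"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = parse_s()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


tree1 = parse("p or q and r".split())
tree2 = parse("not p and q or r".split())
```

`tree1` is `("or", "p", ("and", "q", "r"))`: the operator derived by S sits at the root and the one derived by D sits below it, so each input has exactly one tree.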

Known facts about ambiguous context-free grammars.

There is no algorithm that can tell whether an arbitrary CFG is ambiguous or not.

There are so-called inherently ambiguous context-free languages, for which every CFG is ambiguous. Here is an example:
{a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}.

There is no algorithm that can convert an arbitrary ambiguous CFG, whose language is not inherently ambiguous, to an unambiguous one.

Normal Forms of Context-free Grammars

When we investigate context-free grammars and their languages, it is sometimes convenient to require the right side of each production rule to have a certain form. Such a form is called a normal form. There are two normal forms for context-free grammars: Chomsky Normal Form (CNF) and Greibach Normal Form (GNF).

Let G = (V_N, V_T, P, S) be a context-free grammar. Grammar G is in CNF if all the production rules of the grammar are of the form A → BC or A → a, where A, B, C ∈ V_N and a ∈ V_T. A context-free grammar is in GNF if every production rule of the grammar is of the form A → aα, where A ∈ V_N, a ∈ V_T, and α ∈ (V_N)*. Notice that α is a string of nonterminal symbols or the null string.

We can show that every context-free grammar whose language does not contain ε can be converted to CNF and to GNF. (Recall that we can eliminate all ε-production rules from a given context-free grammar if its language does not contain ε.) The following example shows how to convert a context-free grammar to CNF. We can easily generalize the idea. Converting a context-free grammar to GNF is quite involved (see the text, Chapter 6). We shall not study the proof.

Converting a Context-free Grammar to CNF (example)

Suppose that a context-free grammar has a production rule A → aBCDbE, which is not in CNF. We introduce new nonterminal symbols and production rules in CNF such that A can derive the right side string aBCDbE, as follows.

    A → A1B1    A1 → a     // and we let B1 derive BCDbE as follows;
    B1 → BC1               // and we let C1 derive CDbE as follows;
    C1 → CD1               // and we let D1 derive DbE as follows;
    D1 → DE1               // and we let E1 derive bE as follows;
    E1 → F1E    F1 → b

Example. Convert the following context-free grammar to CNF.

    S → AaBCb
    A → abb
    B → aC
    C → aCb | ac

Answer:
    S → AA1    A1 → A2A3    A2 → a    A3 → BA4    A4 → CA5    A5 → b
    A → B1B2   B1 → a    B2 → B3B4    B3 → b    B4 → b
    B → C1C    C1 → a
    C → D1D2 | E1E2    D1 → a    D2 → CD3    D3 → b    E1 → a    E2 → c
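The idea generalizes to a small procedure, sketched here in Python. Several things are assumptions of the sketch rather than part of the slides: symbols are strings, lowercase symbols are terminals, the grammar has no ε-rules and no unit rules (as in the example), and the fresh nonterminals are named X1, X2, ... and assumed unused. Unlike the hand conversion above, the sketch reuses one fresh nonterminal per terminal symbol instead of creating a new one for every occurrence.

```python
from itertools import count


def to_cnf(rules, is_terminal=str.islower):
    """Convert rules (lhs, rhs-tuple) to CNF; assumes no ε-rules and no unit rules."""
    fresh = (f"X{i}" for i in count(1))  # hypothetical fresh names, assumed unused
    terminal_proxy = {}
    out = []

    def proxy(sym):
        """Replace a terminal by a nonterminal that derives exactly that terminal."""
        if not is_terminal(sym):
            return sym
        if sym not in terminal_proxy:
            terminal_proxy[sym] = next(fresh)
            out.append((terminal_proxy[sym], (sym,)))
        return terminal_proxy[sym]

    for lhs, rhs in rules:
        if len(rhs) == 1:
            out.append((lhs, rhs))  # already of the form A -> a
            continue
        symbols = [proxy(s) for s in rhs]
        head = lhs
        # Peel off one symbol at a time: head -> first rest, rest -> ...
        while len(symbols) > 2:
            rest = next(fresh)
            out.append((head, (symbols[0], rest)))
            head, symbols = rest, symbols[1:]
        out.append((head, tuple(symbols)))
    return out


single = to_cnf([("A", tuple("aBCDbE"))])
example = to_cnf([
    ("S", tuple("AaBCb")), ("A", tuple("abb")), ("B", tuple("aC")),
    ("C", tuple("aCb")), ("C", tuple("ac")),
])
```

For the single rule A → aBCDbE the sketch produces seven rules, the same number as the hand conversion, with X1 and X2 playing the roles of A1 and F1.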