Structure and Ambiguity Removing Ambiguity Chomsky Normal Form Pushdown Automata Intro (who is he foolin', thinking that there will be time to get to this?) MA/CSSE 474 Theory of Computation

Structure
Context-free languages: we care about structure.
[Parse tree for the expression 3 + 5 * 7: E → E + E, the left E → id (3), the right E → E * E with E → id (5) and E → id (7).]

Derivations and parse trees
Parse trees capture essential structure. Two derivations of (())():
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
[Both derivations correspond to the same parse tree: S → SS, the left S → (S) with inner S → (S) → ε, the right S → (S) with inner S → ε.]

Parse Trees
A parse tree, derived from a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which:
● Every leaf node is labeled with an element of Σ ∪ {ε},
● The root node is labeled S,
● Every other node is labeled with an element of V − Σ (a nonterminal), and
● If m is a non-leaf node labeled X and the (ordered) children of m are labeled x1, x2, …, xn, then R contains the rule X → x1 x2 … xn.
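
The definition above translates directly into a small data structure. The following is a minimal Python sketch (not from the slides; the class and helper names are mine) that represents a parse tree as a labeled node with ordered children and checks the four conditions against a grammar given as a dict from nonterminals to sets of right-hand-side tuples, with the empty tuple standing for ε.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # a terminal, a nonterminal, or "" for epsilon
    children: list = field(default_factory=list)

def is_parse_tree(root, grammar, start, terminals):
    """Check the four conditions in the definition of a parse tree."""
    if root.label != start:                      # the root must be labeled S
        return False

    def ok(node):
        if not node.children:                    # a leaf: must be a terminal or epsilon
            return node.label in terminals or node.label == ""
        if node.label not in grammar:            # an internal node must be a nonterminal
            return False
        rhs = tuple(c.label for c in node.children)
        if rhs == ("",):                         # X -> epsilon is drawn as a single epsilon leaf
            rhs = ()
        return rhs in grammar[node.label] and all(ok(c) for c in node.children)

    return ok(root)

# Example: the balanced-parentheses grammar S -> SS | (S) | epsilon, and a tree for ()()
g = {"S": {("S", "S"), ("(", "S", ")"), ()}}
tree = Node("S", [Node("S", [Node("("), Node("S", [Node("")]), Node(")")]),
                  Node("S", [Node("("), Node("S", [Node("")]), Node(")")])])
print(is_parse_tree(tree, g, "S", {"(", ")"}))   # True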

Structure in English
[Parse tree for "the smart cat smells chocolate": S → NP VP, NP → the Nominal, Nominal → Adjs N, Adjs → Adj (smart), N → cat, VP → V NP, V → smells, NP → Nominal → N → chocolate.]

Generative Capacity Because parse trees matter, it makes sense, given a grammar G, to distinguish between: ● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and ● G’s strong generative capacity, defined to be the set of parse trees that G generates.

Algorithms Care How We Search
Algorithms for generation and recognition must be systematic. They typically use either the leftmost derivation or the rightmost derivation.

Derivations of The Smart Cat
A left-most derivation is:
S ⇒ NP VP ⇒ the Nominal VP ⇒ the Adjs N VP ⇒ the Adj N VP ⇒ the smart N VP ⇒ the smart cat VP ⇒ the smart cat V NP ⇒ the smart cat smells NP ⇒ the smart cat smells Nominal ⇒ the smart cat smells N ⇒ the smart cat smells chocolate
A right-most derivation is:
S ⇒ NP VP ⇒ NP V NP ⇒ NP V Nominal ⇒ NP V N ⇒ NP V chocolate ⇒ NP smells chocolate ⇒ the Nominal smells chocolate ⇒ the Adjs N smells chocolate ⇒ the Adjs cat smells chocolate ⇒ the Adj cat smells chocolate ⇒ the smart cat smells chocolate

Ambiguity A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree *. For many applications of context-free grammars, this is a problem. Example: A programming language. If there can be two different structures for a string in the language, there can be two different meanings. Not good! * Equivalently, more than one leftmost derivation, or more than one rightmost derivation.

An Arithmetic Expression Grammar
E → E + E
E → E * E
E → (E)
E → id

Inherent Ambiguity
Some CF languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.
Example: L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.

Inherent Ambiguity
L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.
One grammar for L has the rules:
S → S1 | S2
S1 → S1 c | A   /* Generate all strings in {a^n b^n c^m}.
A → a A b | ε
S2 → a S2 | B   /* Generate all strings in {a^n b^m c^m}.
B → b B c | ε
Consider any string of the form a^n b^n c^n: it can be derived starting from S1 and also starting from S2, so this grammar gives it two parse trees. It turns out that L is inherently ambiguous.

Ambiguity and undecidability Both of the following problems are undecidable*: Given a context-free grammar G, is G ambiguous? Given a context-free language L, is L inherently ambiguous? Informal definition of undecidable for the first problem: There is no algorithm (procedure that is guaranteed to always halt) that, given a grammar G, determines whether G is ambiguous.

But We Can Often Reduce Ambiguity
We can get rid of:
● some ε rules like S → ε,
● rules with symmetric right-hand sides, e.g., S → SS and E → E + E,
● rule sets that lead to ambiguous attachment of optional postfixes, such as if … else ….

A Highly Ambiguous Grammar
S → ε
S → SS
S → (S)

Resolving the Ambiguity with a Different Grammar
The biggest problem is the ε rule. A different grammar for the language of balanced parentheses:
S* → ε
S* → S
S → SS
S → (S)
S → ()
We'd like to have an algorithm for removing all ε-productions… except for the case where ε is actually in the language; then we introduce a new start symbol and have one ε-production whose left side is that symbol.

Nullable Nonterminals
Examples:
(a) S → aTa, T → ε
(b) S → aTa, T → AB, A → ε, B → ε
A nonterminal X is nullable iff either: (1) there is a rule X → ε, or (2) there is a rule X → PQR… and P, Q, R, … are all nullable.

Nullable Nonterminals
A nonterminal X is nullable iff either: (1) there is a rule X → ε, or (2) there is a rule X → PQR… and P, Q, R, … are all nullable.
So compute N, the set of nullable nonterminals, as follows:
1. Set N to the set of nonterminals that satisfy (1).
2. Repeat until an entire pass is made without adding anything to N: evaluate all other nonterminals with respect to (2); if any nonterminal satisfies (2) and is not in N, insert it.
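
The computation of N is a simple fixed-point iteration. Here is a minimal Python sketch (my own representation, not the slides': a grammar is a dict from each nonterminal to a set of right-hand-side tuples, with the empty tuple () standing for ε).

def nullable_nonterminals(grammar):
    """Fixed-point computation of the set N of nullable nonterminals."""
    # Step 1: nonterminals with an explicit X -> epsilon rule.
    nullable = {x for x, rhss in grammar.items() if () in rhss}
    # Step 2: keep passing over the rules until nothing new is added.
    changed = True
    while changed:
        changed = False
        for x, rhss in grammar.items():
            if x in nullable:
                continue
            # X -> P Q R ... where every symbol is already known to be nullable.
            if any(all(sym in nullable for sym in rhs) for rhs in rhss if rhs):
                nullable.add(x)
                changed = True
    return nullable

# Example (b) from the previous slide: S -> aTa, T -> AB, A -> epsilon, B -> epsilon
g = {"S": {("a", "T", "a")}, "T": {("A", "B")}, "A": {()}, "B": {()}}
print(sorted(nullable_nonterminals(g)))   # ['A', 'B', 'T']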

A General Technique for Getting Rid of ε-Rules
Definition: a rule is modifiable iff it is of the form P → αQβ, for some nullable Q.
removeEps(G: CFG) =
1. Let G′ = G.
2. Find the set N of nullable nonterminals in G′.
3. Repeat until G′ contains no modifiable rules that haven't been processed: Given the rule P → αQβ, where Q ∈ N, add the rule P → αβ if it is not already present and if αβ ≠ ε and if P ≠ αβ.
4. Delete from G′ all rules of the form X → ε.
5. Return G′.
L(G′) = L(G) − {ε}
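
Below is a Python sketch of the same idea (not the slides' pseudocode line by line; the representation is mine: a grammar is a dict from nonterminal to a set of right-hand-side tuples, with () standing for ε). It inlines the nullable computation from the previous sketch.

def remove_eps(grammar):
    """Return a grammar with no epsilon-rules, generating L(G) - {epsilon}."""
    # Compute the nullable nonterminals (fixed point, as on the earlier slide).
    nullable = {x for x, rhss in grammar.items() if () in rhss}
    changed = True
    while changed:
        changed = False
        for x, rhss in grammar.items():
            if x not in nullable and any(rhs and all(s in nullable for s in rhs)
                                         for rhs in rhss):
                nullable.add(x)
                changed = True

    # For each rule P -> alpha Q beta with Q nullable, also add P -> alpha beta,
    # as long as the new right-hand side is not epsilon. Repeat to a fixed point.
    new_rules = {x: set(rhss) for x, rhss in grammar.items()}
    changed = True
    while changed:
        changed = False
        for x, rhss in list(new_rules.items()):
            for rhs in list(rhss):
                for i, sym in enumerate(rhs):
                    if sym in nullable:
                        shorter = rhs[:i] + rhs[i + 1:]
                        if shorter and shorter not in new_rules[x]:
                            new_rules[x].add(shorter)
                            changed = True

    # Finally, delete all rules of the form X -> epsilon.
    return {x: {rhs for rhs in rhss if rhs} for x, rhss in new_rules.items()}

# Example grammar from the next slide: S -> aTa, T -> ABC, A -> aA | C,
# B -> Bb | C, C -> c | epsilon
g = {"S": {("a", "T", "a")},
     "T": {("A", "B", "C")},
     "A": {("a", "A"), ("C",)},
     "B": {("B", "b"), ("C",)},
     "C": {("c",), ()}}
print(remove_eps(g)["T"])   # the seven non-empty subsequences of A B C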

An Example
G = ({S, T, A, B, C, a, b, c}, {a, b, c}, R, S), R = {
S → aTa
T → ABC
A → aA | C
B → Bb | C
C → c | ε }
removeEps(G: CFG) =
1. Let G′ = G.
2. Find the set N of nullable nonterminals in G′.
3. Repeat until G′ contains no modifiable rules that haven't been processed: Given the rule P → αQβ, where Q ∈ N, add the rule P → αβ if it is not already present and if αβ ≠ ε and if P ≠ αβ.
4. Delete from G′ all rules of the form X → ε.
5. Return G′.

What If ε ∈ L?
atmostoneEps(G: CFG) =
1. G′ = removeEps(G).
2. If S_G is nullable then   /* i.e., ε ∈ L(G)
2.1 Create in G′ a new start symbol S*.
2.2 Add to R_G′ the two rules: S* → ε and S* → S_G.
3. Return G′.
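
The corresponding wrapper, again as a rough Python sketch in the same representation (the helper names and the fresh start-symbol name are mine; it assumes the remove_eps function from the previous sketch is in scope, so this block is not self-contained on its own).

def atmost_one_eps(grammar, start):
    """atmostoneEps: allow epsilon in the language via a single rule S* -> epsilon."""
    # 1. Strip all epsilon-rules (remove_eps is the sketch given earlier).
    g = remove_eps(grammar)
    # 2. If the original start symbol was nullable (i.e., epsilon is in L(G)),
    #    add a fresh start symbol S* with exactly one epsilon-rule.
    if start_is_nullable(grammar, start):
        new_start = start + "*"                 # fresh name, assumed unused
        g[new_start] = {(), (start,)}
        return g, new_start
    return g, start

def start_is_nullable(grammar, start):
    """True iff epsilon is derivable from the start symbol (fixed point)."""
    nullable = {x for x, rhss in grammar.items() if () in rhss}
    changed = True
    while changed:
        changed = False
        for x, rhss in grammar.items():
            if x not in nullable and any(rhs and all(s in nullable for s in rhs)
                                         for rhs in rhss):
                nullable.add(x)
                changed = True
    return start in nullable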

But There Can Still Be Ambiguity
S* → ε
S* → S
S → SS
S → (S)
S → ()
What about ()()() ?

Eliminating Symmetric Recursive Rules
S* → ε
S* → S
S → SS
S → (S)
S → ()
Replace S → SS with one of:
S → SS1   /* force branching to the left
S → S1S   /* force branching to the right
So we get:
S* → ε
S* → S
S → SS1
S → S1
S1 → (S)
S1 → ()

Eliminating Symmetric Recursive Rules
So we get:
S* → ε
S* → S
S → SS1
S → S1
S1 → (S)
S1 → ()
[Parse tree for ()()(): S* → S, S → S S1, the inner S → S S1 again, bottoming out at S → S1, with each S1 → ().]

Arithmetic Expressions
E → E + E
E → E * E
E → (E)
E → id
Problem 1: Associativity
[Two distinct parse trees for id * id * id: one groups the left two operands first, the other groups the right two.]

Arithmetic Expressions
E → E + E
E → E * E
E → (E)
E → id
Problem 2: Precedence
[Two distinct parse trees for id * id + id: one applies * below +, the other applies + below *.]

Arithmetic Expressions: A Better Way
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
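
To see why this grammar fixes both problems, here is a small Python sketch (mine, not from the slides) that parses a token list according to these rules. It replaces the left recursion in E → E + T and T → T * F with iteration, the usual trick, but still builds left-leaning trees, so + and * come out left-associative and * binds tighter than +.

# Tokens are strings: "id", "+", "*", "(", ")".
# Parse trees are nested tuples labeled with the grammar's nonterminals.

def parse_E(tokens, i):
    """E -> E + T | T   (implemented as T followed by zero or more '+ T')."""
    tree, i = parse_T(tokens, i)
    while i < len(tokens) and tokens[i] == "+":
        right, i = parse_T(tokens, i + 1)
        tree = ("E", tree, "+", right)       # left-leaning: earlier +'s nest deeper
    return tree, i

def parse_T(tokens, i):
    """T -> T * F | F."""
    tree, i = parse_F(tokens, i)
    while i < len(tokens) and tokens[i] == "*":
        right, i = parse_F(tokens, i + 1)
        tree = ("T", tree, "*", right)
    return tree, i

def parse_F(tokens, i):
    """F -> ( E ) | id."""
    if tokens[i] == "(":
        tree, i = parse_E(tokens, i + 1)
        assert tokens[i] == ")", "expected ')'"
        return ("F", "(", tree, ")"), i + 1
    assert tokens[i] == "id", "expected id"
    return ("F", "id"), i + 1

# id + id * id: the * is grouped below the +, and there is only one parse.
tree, _ = parse_E(["id", "+", "id", "*", "id"], 0)
print(tree)   # ('E', ('F', 'id'), '+', ('T', ('F', 'id'), '*', ('F', 'id')))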

Ambiguous Attachment
The dangling else problem:
<stmt> ::= if <cond> then <stmt>
<stmt> ::= if <cond> then <stmt> else <stmt>
Consider: if cond1 then if cond2 then st1 else st2

The Java Fix
<Statement> ::= <IfThenStatement> | <IfThenElseStatement> | …
<StatementNoShortIf> ::= <block> | … (any statement that does not end in an else-less if)
<IfThenStatement> ::= if ( <Expression> ) <Statement>
<IfThenElseStatement> ::= if ( <Expression> ) <StatementNoShortIf> else <Statement>
So in if (cond1) if (cond2) st1 else st2, the else must pair with the nearest if.

Hidden: Going Too Far
S → NP VP
NP → the Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | girl | dogs | ball | chocolate | bat
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | hits
PP → Prep NP
Prep → with
● Chris likes the girl with the cat.
● Chris shot the bear with a rifle.

Hidden: Going Too Far
● Chris likes the girl with the cat.
● Chris shot the bear with a rifle.

Hidden: Comparing Regular and Context-Free Languages
Regular Languages: regular exprs. or regular grammars; recognize.
Context-Free Languages: context-free grammars; parse.

Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid objects, with the following two properties: ● For every element c of C, except possibly a finite set of special cases, there exists some element f of F such that f is equivalent to c with respect to some set of tasks. ● F is simpler than the original form in which the elements of C are written. By “simpler” we mean that at least some tasks are easier to perform on elements of F than they would be on elements of C.

Normal Form Examples
● Disjunctive normal form for database queries so that they can be entered in a query-by-example grid.
● Jordan normal form for a square matrix, in which the matrix is almost diagonal in the sense that its only non-zero entries lie on the diagonal and the superdiagonal.
● Various normal forms for grammars to support specific parsing techniques.

Normal Forms for Grammars
Chomsky Normal Form, in which all rules are of one of the following two forms:
● X → a, where a ∈ Σ, or
● X → BC, where B and C are elements of V − Σ.
Advantages:
● Parsers can use binary trees.
● Exact length of derivations is known: a string of length n takes exactly 2n − 1 rule applications.
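
To make the two rule shapes concrete, here is a small Python checker (my own representation, not the slides': a grammar is a dict from each nonterminal to a set of right-hand-side tuples).

def is_cnf(grammar, terminals):
    """True iff every rule has the form X -> a (one terminal) or X -> B C
    (two nonterminals)."""
    nonterminals = set(grammar)
    for rhss in grammar.values():
        for rhs in rhss:
            terminal_rule = len(rhs) == 1 and rhs[0] in terminals
            binary_rule = len(rhs) == 2 and all(s in nonterminals for s in rhs)
            if not (terminal_rule or binary_rule):
                return False
    return True

# S -> AB, A -> a, B -> b is in CNF; adding S -> aB breaks it.
g = {"S": {("A", "B")}, "A": {("a",)}, "B": {("b",)}}
print(is_cnf(g, {"a", "b"}))                       # True
g["S"].add(("a", "B"))
print(is_cnf(g, {"a", "b"}))                       # False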

Normal Forms for Grammars
Greibach Normal Form, in which all rules are of the following form:
● X → a β, where a ∈ Σ and β ∈ (V − Σ)*.
Advantages:
● Every derivation of a string s contains |s| rule applications.
● Greibach normal form grammars can easily be converted to pushdown automata with no ε-transitions. This is useful because such PDAs are guaranteed to halt.

Theorems: Normal Forms Exist
Theorem: Given a CFG G, there exists an equivalent Chomsky normal form grammar G_C such that L(G_C) = L(G) − {ε}.
Proof: The proof is by construction.
Theorem: Given a CFG G, there exists an equivalent Greibach normal form grammar G_G such that L(G_G) = L(G) − {ε}.
Proof: The proof is also by construction.
Details of Chomsky conversion are complex but straightforward; I leave them for you to read in Chapter 11 and/or in the next 16 slides. Details of Greibach conversion are more complex but still straightforward; I leave them for you to read in Appendix D if you wish (not req'd).

Hidden: Converting to a Normal Form 1. Apply some transformation to G to get rid of undesirable property 1. Show that the language generated by G is unchanged. 2. Apply another transformation to G to get rid of undesirable property 2. Show that the language generated by G is unchanged and that undesirable property 1 has not been reintroduced. 3. Continue until the grammar is in the desired form.

Hidden: Rule Substitution
X → a Y c
Y → b
Y → ZZ
We can replace the X rule with the rules:
X → abc
X → aZZc
(X → a Y c ⇒ a ZZ c)

Hidden: Rule Substitution
Theorem: Let G contain the rules X → αYβ and Y → γ1 | γ2 | … | γn. Replace X → αYβ by: X → αγ1β, X → αγ2β, …, X → αγnβ. The new grammar G' will be equivalent to G.

Hidden: Rule Substitution
Replace X → αYβ by: X → αγ1β, X → αγ2β, …, X → αγnβ.
Proof:
● Every string in L(G) is also in L(G'): If X → αYβ is not used, then use the same derivation. If it is used, then one derivation is:
S ⇒ … ⇒ δ X ρ ⇒ δ α Y β ρ ⇒ δ α γk β ρ ⇒ … ⇒ w
Use this one instead:
S ⇒ … ⇒ δ X ρ ⇒ δ α γk β ρ ⇒ … ⇒ w
● Every string in L(G') is also in L(G): Every new rule can be simulated by old rules.

Hidden: Convert to Chomsky Normal Form
1. Remove all ε-rules, using the algorithm removeEps.
2. Remove all unit productions (rules of the form A → B).
3. Remove all rules whose right-hand sides have length greater than 1 and include a terminal (e.g., A → aB or A → BaC).
4. Remove all rules whose right-hand sides have length greater than 2 (e.g., A → BCDE).
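
Put together, the four steps compose into a single conversion. The sketch below is only the driver (Python, names mine); remove_eps was sketched earlier, and remove_units, remove_mixed, and remove_long are sketched next to the corresponding slides below, so this block depends on those.

def to_cnf(grammar, terminals):
    """Convert a CFG to Chomsky normal form (dropping epsilon if present).

    Driver only: it assumes the four helpers sketched with the individual
    slides (remove_eps above; remove_units, remove_mixed, remove_long below).
    The order matters: each step must not reintroduce what an earlier one removed.
    """
    g = remove_eps(grammar)              # 1. no epsilon-rules
    g = remove_units(g)                  # 2. no unit productions A -> B
    g = remove_mixed(g, terminals)       # 3. terminals only in rules of length 1
    g = remove_long(g)                   # 4. right-hand sides of length at most 2
    return g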

Hidden: Recap: Removing ε-Productions
Remove all ε-productions:
(1) If there is a rule P → αQβ and Q is nullable, then add the rule P → αβ.
(2) Delete all rules Q → ε.

Hidden: Removing ε-Productions
Example:
S → aA
A → B | CDC
B → ε
B → a
C → BD
D → b
D → ε

Hidden: Unit Productions
A unit production is a rule whose right-hand side consists of a single nonterminal symbol.
Example:
S → XY
X → A
A → B | a
B → b
Y → T
T → Y | c

Hidden: Removing Unit Productions
removeUnits(G) =
1. Let G' = G.
2. Until no unit productions remain in G' do:
2.1 Choose some unit production X → Y.
2.2 Remove it from G'.
2.3 Consider only rules that still remain. For every rule Y → β, where β ∈ V*, do: Add to G' the rule X → β unless it is a rule that has already been removed once.
3. Return G'.
After removing epsilon productions and unit productions, all rules whose right-hand sides have length 1 are in Chomsky Normal Form.

Hidden: Removing Unit Productions
removeUnits(G) =
1. Let G' = G.
2. Until no unit productions remain in G' do:
2.1 Choose some unit production X → Y.
2.2 Remove it from G'.
2.3 Consider only rules that still remain. For every rule Y → β, where β ∈ V*, do: Add to G' the rule X → β unless it is a rule that has already been removed once.
3. Return G'.
Example:
S → XY
X → A
A → B | a
B → b
Y → T
T → Y | c
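
A Python sketch of removeUnits in the same representation as the earlier sketches (a dict from nonterminal to sets of right-hand-side tuples; the helper names are mine). Like the slides' version, it records each removed unit production so it is never re-added, which is what guarantees termination.

def remove_units(grammar):
    """Remove unit productions X -> Y (single-nonterminal right-hand sides)."""
    g = {x: set(rhss) for x, rhss in grammar.items()}
    removed = set()                              # unit productions already processed
    while True:
        unit = next(((x, rhs) for x, rhss in g.items() for rhs in rhss
                     if len(rhs) == 1 and rhs[0] in g), None)
        if unit is None:
            return g
        x, (y,) = unit
        g[x].discard((y,))
        removed.add((x, (y,)))
        # For every remaining rule Y -> beta, add X -> beta, unless that rule
        # is itself a unit production we have already removed once.
        for beta in g.get(y, set()).copy():
            if (x, beta) not in removed:
                g[x].add(beta)

# Example from the slide: S -> XY, X -> A, A -> B | a, B -> b, Y -> T, T -> Y | c
g = {"S": {("X", "Y")}, "X": {("A",)}, "A": {("B",), ("a",)},
     "B": {("b",)}, "Y": {("T",)}, "T": {("Y",), ("c",)}}
print(sorted(remove_units(g)["X"]))   # [('a',), ('b',)]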

Hidden: Mixed Rules
removeMixed(G) =
1. Let G' = G.
2. Create a new nonterminal Ta for each terminal a in Σ.
3. Modify each rule in G' whose right-hand side has length greater than 1 and that contains a terminal symbol by substituting Ta for each occurrence of the terminal a.
4. Add to G', for each Ta, the rule Ta → a.
5. Return G'.
Example:
A → a
A → aB
A → BaC
A → BbC
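
A Python sketch of removeMixed (same representation as before; the T_a naming scheme is mine and is assumed not to clash with existing nonterminals).

def remove_mixed(grammar, terminals):
    """Replace terminals inside right-hand sides of length > 1 with fresh nonterminals.

    For each terminal a that occurs in such a rule, introduce T_a with the
    single rule T_a -> a, and use T_a in its place.
    """
    g = {}
    used = set()
    for x, rhss in grammar.items():
        new_rhss = set()
        for rhs in rhss:
            if len(rhs) > 1:
                rhs = tuple("T_" + s if s in terminals else s for s in rhs)
                used.update(s for s in rhs if s.startswith("T_"))
            new_rhss.add(rhs)
        g[x] = new_rhss
    for t in used:                      # add T_a -> a for each introduced T_a
        g[t] = {(t[2:],)}
    return g

# Example from the slide: A -> a | aB | BaC | BbC
g = {"A": {("a",), ("a", "B"), ("B", "a", "C"), ("B", "b", "C")},
     "B": {("b",)}, "C": {("c",)}}
print(sorted(remove_mixed(g, {"a", "b", "c"})["A"]))
# [('B', 'T_a', 'C'), ('B', 'T_b', 'C'), ('T_a', 'B'), ('a',)]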

Hidden: Long Rules
removeLong(G) =
1. Let G' = G.
2. For each rule r of the form A → N1N2N3N4…Nn, n > 2, create new nonterminals M2, M3, …, Mn−1.
3. Replace r with the rule A → N1M2.
4. Add the rules: M2 → N2M3, M3 → N3M4, …, Mn−1 → Nn−1Nn.
5. Return G'.
Example: A → BCDEF
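
A Python sketch of removeLong (same representation as before; the fresh M-names come from a global counter and are assumed not to clash with existing nonterminals).

from itertools import count

def remove_long(grammar):
    """Break right-hand sides longer than 2 into chains of binary rules.

    A -> N1 N2 ... Nn (n > 2) becomes A -> N1 M2, M2 -> N2 M3, ...,
    M(n-1) -> N(n-1) Nn, with fresh nonterminals M2 ... M(n-1).
    """
    fresh = count(2)
    g = {x: set() for x in grammar}
    for x, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) <= 2:
                g[x].add(rhs)
                continue
            names = [f"M{next(fresh)}" for _ in range(len(rhs) - 2)]
            g[x].add((rhs[0], names[0]))                       # A -> N1 M2
            for i, m in enumerate(names):
                second = names[i + 1] if i + 1 < len(names) else rhs[-1]
                g[m] = {(rhs[i + 1], second)}                  # Mi -> Ni M(i+1); last -> N(n-1) Nn
    return g

# Example from the slide: A -> BCDEF
g = {"A": {("B", "C", "D", "E", "F")}}
for lhs, rhss in sorted(remove_long(g).items()):
    print(lhs, "->", sorted(rhss))
# A -> [('B', 'M2')]
# M2 -> [('C', 'M3')]
# M3 -> [('D', 'M4')]
# M4 -> [('E', 'F')]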

Hidden: An Example
S → aACa
A → B | a
B → C | c
C → cC | ε
removeEps returns:
S → aACa | aAa | aCa | aa
A → B | a
B → C | c
C → cC | c

Hidden: An Example
S → aACa | aAa | aCa | aa
A → B | a
B → C | c
C → cC | c
Next we apply removeUnits:
Remove A → B. Add A → C | c.
Remove B → C. Add B → cC (B → c is already there).
Remove A → C. Add A → cC (A → c is already there).
So removeUnits returns:
S → aACa | aAa | aCa | aa
A → a | c | cC
B → c | cC
C → cC | c

Hidden: An Example
S → aACa | aAa | aCa | aa
A → a | c | cC
B → c | cC
C → cC | c
Next we apply removeMixed, which returns:
S → TaACTa | TaATa | TaCTa | TaTa
A → a | c | TcC
B → c | TcC
C → TcC | c
Ta → a
Tc → c

Hidden: An Example
S → TaACTa | TaATa | TaCTa | TaTa
A → a | c | TcC
B → c | TcC
C → TcC | c
Ta → a
Tc → c
Finally, we apply removeLong, which returns:
S → TaS1 | TaS3 | TaS4 | TaTa
S1 → AS2
S3 → ATa
S4 → CTa
S2 → CTa
A → a | c | TcC
B → c | TcC
C → TcC | c
Ta → a
Tc → c

The Price of Normal Forms
E → E + E
E → (E)
E → id
Converting to Chomsky normal form:
E → E E'
E' → P E
E → L E''
E'' → E R
E → id
L → (
R → )
P → +
Conversion doesn't change weak generative capacity but it may change strong generative capacity.