How to find and remove unproductive rules in a grammar Roger L. Costello May 1, 2014 New! How to find and remove unreachable rules in a grammar.

Slides:



Advertisements
Similar presentations
How to convert a left linear grammar to a right linear grammar
Advertisements

1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object.
Closure Properties of CFL's
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence)
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
Simplifying CFGs There are several ways in which context-free grammars can be simplified. One natural way is to eliminate useless symbols those that cannot.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
Transforming Context-Free Grammars to Chomsky Normal Form 1 Roger L. Costello April 12, 2014.
CS 3240 – Chapter 6.  6.1: Simplifying Grammars  Substitution  Removing useless variables  Removing λ  Removing unit productions  6.2: Normal Forms.
Discussion #41/17 Discussion #4 Ambiguity & Precedence Grammars.
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
CS5371 Theory of Computation
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
Context-Free Grammars Lecture 7
Discussion #31/20 Discussion #3 Grammar Formalization & Parse-Tree Construction.
104 Closure Properties of Regular Languages Regular languages are closed under many set operations. Let L 1 and L 2 be regular languages. (1) L 1  L 2.
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
Normal forms for Context-Free Grammars
CS5371 Theory of Computation Lecture 8: Automata Theory VI (PDA, PDA = CFG)
Chapter 3: Formal Translation Models
How to Convert a Context-Free Grammar to Greibach Normal Form
January 15, 2014CS21 Lecture 61 CS21 Decidability and Tractability Lecture 6 January 16, 2015.
MA/CSSE 474 Theory of Computation
COP4020 Programming Languages
Cs466(Prasad)L8Norm1 Normal Forms Chomsky Normal Form Griebach Normal Form.
Context-free grammars are a subset of context-sensitive grammars
Problem of the DAY Create a regular context-free grammar that generates L= {w  {a,b}* : the number of a’s in w is not divisible by 3} Hint: start by designing.
Lecture 21: Languages and Grammars. Natural Language vs. Formal Language.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
Recursive Descent Parsing for XML Developers Roger L. Costello 15 October
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
Context-Free Grammars Normal Forms Chapter 11. Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
PART I: overview material
Context Free Grammar. Introduction Why do we want to learn about Context Free Grammars?  Used in many parsers in compilers  Yet another compiler-compiler,
11 Chapter 4 Grammars and Parsing Grammar Grammars, or more precisely, context-free grammars, are the formalism for describing the structure of.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
Section 12.4 Context-Free Language Topics
Pushdown Automata Chapters Generators vs. Recognizers For Regular Languages: –regular expressions are generators –FAs are recognizers For Context-free.
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
Closure Properties Lemma: Let A 1 and A 2 be two CF languages, then the union A 1  A 2 is context free as well. Proof: Assume that the two grammars are.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
1 Chapter 6 Simplification of CFGs and Normal Forms.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Context-Free Languages
Parsing and Code Generation Set 24. Parser Construction Most of the work involved in constructing a parser is carried out automatically by a program,
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start.
Formal Languages and Grammars
Discrete Structures ICS252 Chapter 5 Lecture 2. Languages and Grammars prepared By sabiha begum.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2007.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
GRAMMARS & PARSING. Parser Construction Most of the work involved in constructing a parser is carried out automatically by a program, referred to as a.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 2 Context-Free Languages Some slides are in courtesy.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Grammar Set of variables Set of terminal symbols Start variable Set of Production rules.
Exercises on Chomsky Normal Form and CYK parsing
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2006.
Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Even-Even Devise a grammar that generates strings with even number of a’s and even number of b’s.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Presentation transcript:

How to find and remove unproductive rules in a grammar Roger L. Costello May 1, 2014 New! How to find and remove unreachable rules in a grammar

Objective This mini-tutorial will answer these questions: 1.What are unproductive grammar rules? 2

Objective This mini-tutorial will answer these questions: 1.What are unproductive grammar rules? 2.Why remove unproductive rules? 3

Objective This mini-tutorial will answer these questions: 1.What are unproductive grammar rules? 2.Why remove unproductive rules? 3.Is there an intuitive algorithm to find unproductive rules? 4

Objective This mini-tutorial will answer these questions: 1.What are unproductive grammar rules? 2.Why remove unproductive rules? 3.Is there an intuitive algorithm to find unproductive rules? 4.Intuition is a dangerous master; is there a precise, formal algorithm to find unproductive rules? 5

Objective This mini-tutorial will answer these questions: 1.What are unproductive grammar rules? 2.Why remove unproductive rules? 3.Is there an intuitive algorithm to find unproductive rules? 4.Intuition is a dangerous master; is there a precise, formal algorithm to find unproductive rules? 5.Can we identify and eliminate unproductive rules in XML Schemas? 6

Objective This mini-tutorial will answer these questions: 1.What are unproductive grammar rules? 2.Why remove unproductive rules? 3.Is there an intuitive algorithm to find unproductive rules? 4.Intuition is a dangerous master; is there a precise, formal algorithm to find unproductive rules? 5.Can we identify and eliminate unproductive rules in XML Schemas? 6.New! What are unreachable rules, how do we identify them, and how do we eliminate them? 7

Context-free grammars The following discussion shows a systematic procedure for finding and eliminating unproductive rules in context-free grammars. – Finding and eliminating unproductive rules is decidable for context-free grammars. There is no procedure for finding and eliminating unproductive rules in context-sensitive or phrase- structure grammars. – Finding and eliminating unproductive rules is undecidable for context-sensitive and phrase- structure grammars. 8

SSABSSAB →→→→→→→→→ A B a bB This is a productive rule. It generates a string: S → A → a 9

SSABSSAB →→→→→→→→→ A B a bB This is an unproductive rule. It does not generate a string: S → B → bB → bbB → bbbB → bbbbB → … (the production process never terminates) 10

Definition A rule is productive if at least one string can be generated from it. A productive rule is also known as an active rule. 11

Why remove unproductive rules? Unproductive rules are not a fundamental problem: they do not obstruct the normal production process. Still, they are dead wood in the grammar and one would like to remove them. Also, when they occur in a grammar specified by a programmer they probably point at some error and one would like to detect them and give warning or error messages.

First, find productive rules To find unproductive rules we will first find the productive rules. The next few slides show an algorithm for finding productive rules.

Algorithm to find productive rules A rule is productive if its right-hand side consists of symbols all of which are productive. Productive symbols: – Terminal symbols are productive since they produce terminals. – Empty (ε) is productive since it produces the empty string. – A non-terminal is productive if there is a productive rule for it. 14

Example grammar The above grammar looks innocent: all its non- terminals are defined and it does not exhibit any suspicious constructions. 15 S → A B S → D E A → a B → b C C → c D → d F E → e F → f D

Initial knowledge RuleProductive S → A B | D E A → aA → a Productive B → b CB → b C C → cC → c D → d F E → eE → e Productive F → f D 16 Go through the grammar and for each rule for which we know that all its right-hand side symbols are productive, mark the rule and the non-terminal it defines as Productive.

Build on top of our knowledge RuleProductive S → A B | D E A → aA → a Productive B → b CB → b C Productive (since b is productive and C is productive) C → cC → c Productive D → d F E → eE → e Productive F → f D 17 Now we know more. Apply this knowledge in a second round through the grammar.

Round three RuleProductive S → A B S → D E Productive (since A is productive and B is productive) A → aA → a Productive B → b CB → b C Productive (since b is productive and C is productive) C → cC → c Productive D → d F E → eE → e Productive F → f D 18

Round four RuleProductive S → A B S → D E Productive (since A is productive and B is productive) A → aA → a Productive B → b CB → b C Productive (since b is productive and C is productive) C → cC → c Productive D → d F E → eE → e Productive F → f D 19 A fourth round yields nothing new.

Recap RuleProductive S → A B S → D E Productive (since A is productive and B is productive) A → aA → a Productive B → b CB → b C Productive (since b is productive and C is productive) C → cC → c Productive D → d F E → eE → e Productive F → f D 20 We now know the rules for A, B, C, E and the rule S → A B are productive. The rules for D, F, and the rule S → D E are unproductive.

Remove unproductive rules RuleProductive S → A B Productive (since A is productive and B is productive) A → aA → a Productive B → b CB → b C Productive (since b is productive and C is productive) C → cC → c Productive E → eE → e 21 We have pursued all possible avenues for productivity and have not found any possibilities for D, F, and the second rule for S. That means they are unproductive and can be removed from the grammar. The grammar after removing unproductive rules

Bottom-up process Removing the unproductive rules is a bottom-up process: only at the bottom level, where the terminal symbols live, can we know what is productive. 22

Find productive rules first We found the unproductive rules by finding the productive rules. After finding all productive rules, the other, remaining rules are the unproductive rules. 23

Knowledge-improving algorithm In the previous slides we increased our knowledge with each round. The previous slides illustrate a closure algorithm. 24

Closure algorithm Closure algorithms are characterized by two components: 1.Initialization: an assessment of what we know initially. For our problem we knew: The grammar rules Terminals and empty are productive 2.Inference rule: a rule telling how knowledge from several places is to be combined. The inference rule for our problem was: If all the right-hand side symbols of a rule are productive, then the rule’s left-hand side non-terminal is productive. The inference rule is repeated until nothing changes any more. 25

Subject to misinterpretation The closure algorithm that we used (below) is expressed in natural language. Natural languages are prone to misinterpretation. A rule is productive if its right-hand side consists of symbols all of which are productive. Symbols that are productive: – Terminal symbols are productive since they produce terminals. – Empty is productive since it produces the empty string. – A non-terminal is productive if there is a productive rule for it. Algorithm to find productive rules 26

Razor-sharp precision desired The following slides present a formal, succinct, precise algorithm for finding productive non- terminals.

Avoid Ambiguity Where possible it is desirable to express things mathematically, using equations. Why? Because an equation avoids the clumsiness and ambiguity of verbal descriptions. Likewise, where possible it is desirable to express algorithms formally, using standardized symbols. Why? Because standardized symbols avoids the clumsiness and ambiguity of verbal descriptions.

Identify rules with the form: X → a A rule is productive if its right-hand side consists of symbols all of which are productive. Symbols that are productive: – Terminal symbols are productive since they produce terminals. – Empty is productive since it produces the empty string. – A non-terminal is productive if there is a productive rule for it. Algorithm to find productive rules 29 Identify rules that use just terminal symbols or ε (empty). Create a set consisting of the rules’ non-terminals.

Symbols we will use Let: V N denote the set of non-terminal symbols V T the set of terminal symbols S the start symbol F the production rules 30

Transformation to a precise expression Identify rules that use just terminal symbols or ε (empty). Create a set consisting of the rules’ non-terminals. Terminal symbols are productive since they produce terminals. Empty is productive since it produces the empty string. A 1 = {X | X → P ∈ F for some P ∈ V T *} “ A 1 is the set of non-terminals X such that X has the form X → P, the rule is one of the grammar rules F, and P is zero or more terminal symbols V T * ” 31

Set A 1 for our example grammar S → A B S → D E A → a B → b C C → c D → d F E → e F → f D A 1 = {X | X → P ∈ F for some P ∈ V T *} A 1 = { A, C, E } These rules have the desired form. Add their non-terminals to A 1. Non-terminal symbols that are productive. 32

A 1 corresponds to the “initial knowledge” diagram A 1 is the set of non-terminals that have terminal symbols on the right-hand side. {X | X → P ∈ F for some P ∈ V T *} is a precise specification of what we intuitively did in this diagram: RuleProductive S → A B | D E A → aA → a Productive B → b CB → b C C → cC → c D → d F E → eE → e Productive F → f D 33

Productive non-terminals { A, C, E } We have identified these productive non-terminal symbols. 34

Identify rules that use terminals and productive non-terminals A rule is productive if its right-hand side consists of symbols all of which are productive. Symbols that are productive: – Terminal symbols are productive since they produce terminals. – Empty is productive since it produces the empty string. – A non-terminal is productive if there is a productive rule for it. Algorithm to find productive rules 35 Identify rules that use terminal symbols and productive non-terminals. Create a set consisting of the rules’ non-terminals. Merge this set with A 1.

Rule which uses terminal symbols and symbols from A 1 S → A B S → D E A → a B → b C C → c D → d F E → e F → f D The right-hand side of this rule consists of a terminal and an element of A 1. 36

Merge (union) sets 37 { A, C, E } { B } A 2 = { A, B, C, E }

Formal definition of set A 2 A 2 = A 1 ∪ {X | X → W ∈ F for some W ∈ (V T ∪ A 1 )* } “ A 2 is the union of A 1 with the set of non-terminals X that have the form X → W, the rule is one of the grammar rules F, and W is zero or more terminal symbols V T and symbols from A 1 ” 38

Productive non-terminals { A, B, C, E } We have identified these productive non-terminal symbols. 39

Make bigger and bigger sets A rule is productive if its right-hand side consists of symbols all of which are productive. Symbols that are productive: – Terminal symbols are productive since they produce terminals. – Empty is productive since it produces the empty string. – A non-terminal is productive if there is a productive rule for it. Algorithm to find productive rules 40 Create new sets until nothing is added to the next set, i.e., A i+1 = A i

Rule which uses symbols from A 2 S → A B S → D E A → a B → b C C → c D → d F E → e F → f D The right-hand side of this rule consists of symbols from A 2. 41

Distinguish the two rules for S S → A B S → D E A → a B → b C C → c D → d F E → e F → f D Let’s call this S1 Let’s call this S2 42

Merge (union) sets 43 { A, B, C, E } { S1 } A 3 = { A, B, C, E, S1 }

Set A 3 A 3 = A 2 ∪ {X | X → W ∈ F for some W ∈ (V T ∪ A 2 )* } “ A 3 is the union of A 2 with the set of non-terminals X that have the form X → W, the rule is one of the grammar rules F, and W is zero or more terminal symbols V T and symbols from A 2 ” 44

A 4 = A 3 S → A B S → D E A → a B → b C C → c D → d F E → e F → f D No additional rules are productive. 45

Grammar’s productive non-terminals { A, B, C, E, S1 } These are the grammar’s productive non-terminal symbols. 46

Formal algorithm for finding productive non-terminals 1.Create a set of all the non-terminals that have just terminal symbols on the right-hand side (RHS): A 1 = {X | X → P ∈ F for some P ∈ V T *} 2.Add to A 1 the non-terminals that have on the RHS non-terminals from A 1 concatenated to terminal symbols: A 2 = A 1 ∪ {X | X → W ∈ F for some W ∈ (V T ∪ A 1 )* } 3.Repeat step 2 until no more non-terminals are added to the set: A i+1 = A i ∪ {X | X → W ∈ F for some W ∈ (V T ∪ A i )* } 4.The resulting set A k consists of all productive non- terminals (those non-terminals that generate strings) 47

How to find unproductive rules in a grammar Find the productive non-terminals as described on the previous slide. Remove the rules for the non-terminals that are not productive. S → A B S → D E A → a B → b C C → c D → d F E → e F → f D remove unproductive rules S → A B A → a B → b C C → c E → e original grammar cleaned grammar 48

Empty Language A grammar might just consist of rules that loop infinitely, in which case the language generated by the grammar is empty, { }. Here’s how to determine if a grammar generates empty: – Find the productive non-terminals for a grammar. – If the start symbol is not in the set of productive non-terminals, then no string can be generated from and therefore the language generated by the grammar is empty. 49 The halting problem is decidable for CF grammars

Eliminate unproductive rules from XML Schemas An XML Schema defines a grammar. The next slide shows an XML Schema corresponding to this grammar: SSABSSAB →→→→→→→→→ ABaBABaB This is an unproductive rule. It does not generate a string: S → B → B → B → B → B → … (the production process never terminates) 50

XML Schema SSABSSAB →→→→→→→→→ ABaBABaB 51

Remove unproductive rules Cleaned XML Schema 52

Find and remove unreachable non-terminals 53

Reachable non-terminal SABSAB →→→→→→→ AabAab A is reachable. That is, we can get to it from the start symbol: S → A 54

Unreachable non-terminals SABSAB →→→→→→→ AabAab B is unreachable. That is, there is no way to get to it from the start symbol. 55

From the start symbol downward To find productive symbols we started with non-terminal symbols that have terminal symbols on the right-hand side. That is, we started at the bottom of a production tree and worked upward. To find reachable symbols we start at the top and work downward. 56

Closure algorithm for finding reachable non-terminals Initialization: the start symbol is marked “reachable”. Inference rule: for each rule in the grammar of the form A → α with A marked “reachable”, all non-terminals in α are marked “reachable”. Continue applying the inference rule until nothing changes any more. The remaining unmarked non-terminals are unreachable and their rules can be removed. 57

Initialization 58 RuleReachable S → A B S is reachable A → aA → a B → b CB → b C C → cC → c E → eE → e

Round one 59 RuleReachable S → A B S is reachable A → aA → a A is reachable because it is reachable from S B → b CB → b C B is reachable because it is reachable from S C → cC → c E → eE → e

Round two 60 RuleReachable S → A B S is reachable A → aA → a A is reachable because it is reachable from S B → b CB → b C B is reachable because it is reachable from S C → cC → c C is reachable because it is reachable from B E → eE → e

Round three 61 RuleReachable S → A B S is reachable A → aA → a A is reachable because it is reachable from S B → b CB → b C B is reachable because it is reachable from S C → cC → c C is reachable because it is reachable from B E → eE → e The third round produces no change. So the rule E → e is unreachable and is removed.

Cleaned grammar 62 S → A B | D E A → a B → b C C → c D → d F E → e F → f D S → A B A → a B → b C C → c E → e S → A B A → a B → b C C → c Initial grammar Grammar after removing unproductive rules Grammar after removing unreachable non- terminals

Subject to misinterpretation The closure algorithm that we used (below) is expressed in natural language. Natural languages are prone to misinterpretation. Initialization: the start symbol is marked “reachable”. Inference rule: for each rule in the grammar of the form A → α with A marked “reachable”, all non-terminals in α are marked “reachable”. Continue applying the inference rule until nothing changes any more. Algorithm to find reachable rules 63

Razor-sharp precision desired The following slides present a formal, succinct, precise algorithm for finding reachable non- terminals. 64

Set of reachable non-terminals 65

R 2 is a set consisting of the start symbol plus all the non-terminals that can be directly reached from the start symbol. This is expressed formally as 66

Non-terminals on RHS of S S → A B A → a B → b C C → c E → e {A, B}  Add these to {S} 67

Merge (union) sets 68 { S } { A, B } R 2 = { A, B, S }

R 2 plus its non-terminals R 3 consists of the symbols in R 2 plus all the non- terminals that can be directly reached from the symbols in R 2. This is expressed formally as “ R 3 is the union of R 2 with the non-terminals that are on the right-hand side of X, where X is a non-terminal in R 2. ” 69

Add non-terminals on RHS of non-terminals in R 2 S → A B A → a B → b C C → c E → e Add {C} to R 2 R 2 = { A, B, S } 70

Merge (union) sets 71 { A, B, S } { C } R 3 = { A, B, C, S }

Add non-terminals on RHS of non-terminals in R 3 S → A B A → a B → b C C → c E → e No additional non-terminals to add! R 3 = { A, B, C, S } 72

We have the set of reachable non-terminals S → A B A → a B → b C C → c E → e These are the reachable non- terminals in this grammar. So, the rule E → e can be removed. R 3 = { A, B, C, S } 73

Formal algorithm for finding reachable non-terminals 74

Non-redundant grammar Remove all the unproductive non-terminals. From the resulting grammar, remove all the unreachable non-terminals. The result is a non-redundant grammar. A non-redundant grammar is one where each non-terminal is both productive and reachable. It is also known as a reduced grammar. 75