# Lecture 17 Naveen Z Quazilbash Simplification of Grammars.


Overview

- Attendance
- Motivation
- Simplification of grammars
  - Eliminating useless variables
  - Eliminating null productions
  - Eliminating unit productions
- Quiz result

Motivation for grammar simplification

Parsing problem: given a CFG G and a string w, determine whether w ∈ L(G).

- This is a fundamental problem in compiler design and natural language processing.
- If G is in general form, the procedure may be very inefficient.
- So the grammar is “transformed” into a simpler form to make the parsing problem easier.

Simplification of Grammars

It involves the removal of:

1. Useless variables
2. ε-productions
3. Unit productions

Useless variables: there are two types of useless variables:

1. Variables that cannot be reached
2. Variables that do not derive any strings

3. ε-productions, e.g. A → ε. Note that if we remove these productions, the language no longer includes the empty string.

4. Unit productions: they are of the form A → B or A → A.

1) Unreachable Variables

E.g.:

- S → BS | B | E
- A → DA | D | S
- B → CB | C
- C → aC | a
- D → bD | b
- E → cE | c

To find unreachable variables, draw a dependency graph.

Dependency graph:

- Vertices of the graph are variables.
- The graph doesn’t include alphabet symbols, such as “a” or “b”.
- If there is a production A → …B…, i.e., the left side is A and the right side includes B, then there is an edge A → B.

- A variable is reachable if there is a path from S to this variable.
- S itself is always reachable.
- After identifying the unreachable variables, remove all productions with an unreachable left side.

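Recall the grammar:

- S → BS | B | E
- A → DA | D | S
- B → CB | C
- C → aC | a
- D → bD | b
- E → cE | c

Drawing its dependency graph gives the edges S → B, S → E, A → D, A → S and B → C, plus a self-loop on every variable.

Reachable: S, B, C, E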

Grammar without unreachable variables:

- S → BS | B | E
- B → CB | C
- C → aC | a
- E → cE | c

Exercise: determine its language.
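The dependency-graph search above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grammar is stored as a dictionary from a variable to its list of right-hand sides, variables are single uppercase letters, and the function names `reachable_variables` and `remove_unreachable` are made up for this example.

```python
def reachable_variables(grammar, start="S"):
    """Return the set of variables reachable from `start` in the dependency graph."""
    reached = {start}          # S itself is always reachable
    stack = [start]
    while stack:
        var = stack.pop()
        for rhs in grammar.get(var, []):
            for sym in rhs:
                # Only variables (uppercase letters, by assumption) are vertices.
                if sym.isupper() and sym not in reached:
                    reached.add(sym)
                    stack.append(sym)
    return reached


def remove_unreachable(grammar, start="S"):
    """Drop every production whose left side is unreachable."""
    keep = reachable_variables(grammar, start)
    return {var: rhss for var, rhss in grammar.items() if var in keep}


grammar = {
    "S": ["BS", "B", "E"],
    "A": ["DA", "D", "S"],
    "B": ["CB", "C"],
    "C": ["aC", "a"],
    "D": ["bD", "b"],
    "E": ["cE", "c"],
}

print(sorted(reachable_variables(grammar)))
print(remove_unreachable(grammar))
```

Running it on the example grammar reports S, B, C and E as reachable and drops the productions for A and D.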

2) Variables that don’t terminate

A variable A terminates if either:

- There is a production A → … with no variables on the right, e.g. A → aabc, OR
- There is a production A → … where all variables on the right terminate, e.g. A → aBbaC, where B and C terminate.

Note: to find all variables that terminate, keep looking for such productions until you cannot find any new ones.

TASK

Example:

- S → A | BC | DE
- A → aA | bA
- B → bB | b
- C → EF
- D → dD | BD | BA
- E → aE | a
- F → cFc | c

Remove all productions that include a variable that doesn’t terminate.

Note: we remove a production if it has such a variable on either side.

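Solution

Marking each variable as terminating (✓) or non-terminating (✗):

- ✓ S → A | BC | DE
- ✗ A → aA | bA
- ✓ B → bB | b
- ✓ C → EF
- ✗ D → dD | BD | BA
- ✓ E → aE | a
- ✓ F → cFc | c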

After removing every production that mentions A or D:

- S → BC
- B → bB | b
- C → EF
- E → aE | a
- F → cFc | c

Exercise: determine its language.
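The "keep looking until nothing new is found" rule is a fixed-point iteration. A minimal Python sketch follows, assuming the same hypothetical encoding as before (a dictionary from variable to right-hand sides, with single uppercase letters as variables); the names `terminating_variables` and `remove_nonterminating` are illustrative only.

```python
def terminating_variables(grammar):
    """Return the variables that can derive a string of terminals."""
    done = set()
    changed = True
    while changed:                      # repeat until no new variable is found
        changed = False
        for var, rhss in grammar.items():
            if var in done:
                continue
            # A terminates if some right side uses only terminals
            # and variables already known to terminate.
            if any(all(not s.isupper() or s in done for s in rhs) for rhs in rhss):
                done.add(var)
                changed = True
    return done


def remove_nonterminating(grammar):
    """Drop productions with a non-terminating variable on either side."""
    good = terminating_variables(grammar)
    return {
        var: [rhs for rhs in rhss if all(not s.isupper() or s in good for s in rhs)]
        for var, rhss in grammar.items()
        if var in good
    }


grammar = {
    "S": ["A", "BC", "DE"],
    "A": ["aA", "bA"],
    "B": ["bB", "b"],
    "C": ["EF"],
    "D": ["dD", "BD", "BA"],
    "E": ["aE", "a"],
    "F": ["cFc", "c"],
}

print(sorted(terminating_variables(grammar)))
print(remove_nonterminating(grammar))
```

On the task grammar this finds that A and D never terminate, and the only production left for S is S → BC.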

3) Eliminating ε-Productions

Nullable variables: a variable is nullable if either:

- There is a production A → ε, or
- There is a production A → B₁B₂…Bₙ (only variables, no symbols), where all variables on the right side are nullable.

Note: to find all nullable variables, keep looking for such productions until you cannot find any new ones.

TASK

- S → SAB | SBC | BC
- A → aA | a
- B → bB | bC | C
- C → cC | ε

First we find the variables that can lead to the empty string:

- C ⇒ ε
- B ⇒ C ⇒ ε
- S ⇒ BC ⇒ B ⇒ C ⇒ ε

Marking the nullable variables (✓):

- ✓ S → SAB | SBC | BC
- A → aA | a (not nullable)
- ✓ B → bB | bC | C
- ✓ C → cC | ε

Thus S, B, and C can lead to ε; they are called nullable variables.
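Finding the nullable variables is again a fixed-point search. The sketch below assumes a dictionary encoding of the grammar in which the empty string `""` stands for ε and variables are single uppercase letters; the name `nullable_variables` is hypothetical.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:                      # stop when a pass adds nothing new
        changed = False
        for var, rhss in grammar.items():
            if var in nullable:
                continue
            # "" encodes ε: all() over an empty right side is True.
            # A terminal is never in `nullable`, so it blocks the production.
            if any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(var)
                changed = True
    return nullable


grammar = {
    "S": ["SAB", "SBC", "BC"],
    "A": ["aA", "a"],
    "B": ["bB", "bC", "C"],
    "C": ["cC", ""],
}

print(sorted(nullable_variables(grammar)))
```

For the task grammar the result is B, C and S, matching the derivations shown above.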

For each production that has nullable variables, consider all possible ways to skip some of these variables and add the corresponding productions.

E.g. for W → aWXaYZb, suppose that X, Y and Z are nullable; then there are 2³ = 8 ways to keep or skip each of them (including skipping none):

W → aWab | aWXab | aWaYb | aWaZb | aWXaYb | aWXaZb | aWaYZb | aWXaYZb

Back to our grammar, where S, B and C are nullable:

- S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
- A → aA | a
- B → b | bB | bC | C
- C → c | cC | ε

(S → SC comes from S → SBC with B skipped.)

Now we can remove the ε-productions without changing the language. The only possible change is losing the empty string, if it is in the original language.

So our grammar without null productions becomes:

- S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
- A → aA | a
- B → b | bB | bC | C
- C → c | cC
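The skip-some-nullable-variables step can be checked mechanically. The following Python sketch is one possible way to do it, assuming a dictionary encoding with `""` for ε and single uppercase letters as variables; `nullable_variables`, `expand` and `eliminate_epsilon` are names invented for this example.

```python
from itertools import product


def nullable_variables(grammar):
    """Fixed-point search for variables that can derive ε (encoded as "")."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in grammar.items():
            if var not in nullable and any(
                all(s in nullable for s in rhs) for rhs in rhss
            ):
                nullable.add(var)
                changed = True
    return nullable


def expand(rhs, nullable):
    """All right sides obtained by keeping or skipping each nullable symbol."""
    options = [(sym, "") if sym in nullable else (sym,) for sym in rhs]
    return {"".join(choice) for choice in product(*options)}


def eliminate_epsilon(grammar):
    """Add the skipped variants, then drop every ε-production."""
    nullable = nullable_variables(grammar)
    result = {}
    for var, rhss in grammar.items():
        alternatives = set()
        for rhs in rhss:
            alternatives |= expand(rhs, nullable)
        alternatives.discard("")        # remove A → ε itself
        result[var] = sorted(alternatives)
    return result


grammar = {
    "S": ["SAB", "SBC", "BC"],
    "A": ["aA", "a"],
    "B": ["bB", "bC", "C"],
    "C": ["cC", ""],
}

print(len(expand("aWXaYZb", {"X", "Y", "Z"})))   # the W example: 8 variants
print(eliminate_epsilon(grammar))
```

For S the sketch produces eleven alternatives; one of them is SC, which arises from SBC when B is skipped. The unit production S → S is kept at this stage and is dealt with when unit productions are removed.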

4) Eliminating Unit Productions

- S → Aa | B
- A → a | bc | B
- B → A | bb | C | cC
- C → a | C

First, for every variable, we find all single variables that can be reached from it:

- For S: S ⇒ B ⇒ A, S ⇒ B ⇒ C
- For A: A ⇒ B ⇒ C
- For B: B ⇒ A, B ⇒ C
- For C: none (C itself doesn’t count)

For finding reachable single variables, what should we do?

Use a dependency graph!

Drawing the dependency graph:

- Vertices of the graph are variables.
- If there is a unit production A → B, then there is an edge A → B.
- A single variable B is reachable from A if there is a path from A to B.

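Dependency graph: the vertices are S, A, B and C, and the unit productions give the edges S → B, A → B, B → A, B → C and C → C.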

To construct an equivalent grammar without unit productions:

- Remove all unit productions.
- For each pair A ⇒* B, where B is a single variable reachable from A, consider all productions B → p₁ | p₂ | … | pₙ and add the corresponding productions A → p₁ | p₂ | … | pₙ.

For example, since A ⇒* B and B → bb | cC, add the productions A → bb | cC.

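Original grammar:

- S → Aa | B
- A → a | bc | B
- B → A | bb | C | cC
- C → a | C

Old non-unit productions:

- S → Aa
- A → a | bc
- B → bb | cC
- C → a

New productions:

- S → bb | cC | a | bc | a
- A → bb | cC | a
- B → a | bc | a

Repeated alternatives, such as the duplicate a, can be merged. Note that the variable B has become useless (it can no longer be reached from S), so we need to remove it!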
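The two steps, finding the single variables reachable through unit productions and then copying over their non-unit productions, can be sketched in Python as follows. As in the rest of these illustrations, the dictionary encoding, the single-uppercase-letter convention for variables, and the names `is_unit`, `unit_reachable` and `eliminate_unit` are assumptions made for the example.

```python
def is_unit(rhs):
    """A right side is a unit production if it is exactly one variable."""
    return len(rhs) == 1 and rhs.isupper()


def unit_reachable(grammar, var):
    """Single variables reachable from `var` via unit productions (excluding `var`)."""
    reached, stack = set(), [var]
    while stack:
        current = stack.pop()
        for rhs in grammar.get(current, []):
            if is_unit(rhs) and rhs not in reached:
                reached.add(rhs)
                stack.append(rhs)
    reached.discard(var)                # the variable itself doesn't count
    return reached


def eliminate_unit(grammar):
    """Drop unit productions and add the non-unit productions they led to."""
    result = {}
    for var, rhss in grammar.items():
        alternatives = [rhs for rhs in rhss if not is_unit(rhs)]
        for other in sorted(unit_reachable(grammar, var)):
            for rhs in grammar.get(other, []):
                # Skip unit productions and merge duplicate alternatives.
                if not is_unit(rhs) and rhs not in alternatives:
                    alternatives.append(rhs)
        result[var] = alternatives
    return result


grammar = {
    "S": ["Aa", "B"],
    "A": ["a", "bc", "B"],
    "B": ["A", "bb", "C", "cC"],
    "C": ["a", "C"],
}

for v in grammar:
    print(v, sorted(unit_reachable(grammar, v)))
print(eliminate_unit(grammar))
```

In the resulting grammar no right side of S, A or C mentions B any more, which is why B ends up unreachable and is removed in the final clean-up step.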

Summary

Main steps of simplifying a grammar:

1. Remove useless variables, which cannot be reached or do not terminate.
2. Remove ε-productions.
3. Remove unit productions.
4. Remove useless variables again!