Normal forms and parsing


1 Normal forms and parsing
The Chinese University of Hong Kong, Fall 2008
CSC 3130: Automata theory and formal languages
Andrej Bogdanov

2 Testing membership and parsing
Given a grammar
S → 0S1 | 1S0S1 | T
T → S | ε
how can we know if a string x is in its language? If so, can we reconstruct a parse tree for x?

3 First attempt
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
Maybe we can try all possible derivations:
S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, …
S ⇒ 1S0S1 ⇒ 10S10S1, …
S ⇒ T ⇒ S ⇒ …
When do we stop?

4 Problems
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
How do we know when to stop?
S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, …
S ⇒ 1S0S1 ⇒ 10S10S1, …

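5 Problems
S → 0S1 | 1S0S1 | T
T → S | ε
x = 01011
Idea: stop a derivation once its length exceeds |x|.
This is not right because of ε-productions. In the derivation
S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01S011 ⇒* 01011
the lengths are 1, 3, 7, 6, 5. An intermediate string is longer than x, even though x itself has length 5.
We might want to eliminate ε-productions too.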

6 Problems
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
Loops among the variables (S → T → S) might make us go on forever.
We might want to eliminate such loops.

7 Unit productions
A unit production is a production of the form
A1 → A2
where A1 and A2 are both variables.
Example
grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
unit productions: S → T, T → S, T → R

8 Removal of unit productions
If there is a cycle of unit productions
A1 → A2 → ... → Ak → A1
delete the unit productions of the cycle and replace A2, ..., Ak everywhere by A1.
Example
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
becomes
S → 0S1 | 1S0S1 | R | ε
R → 0SR
T is replaced by S in the {S, T} cycle.

9 Removal of unit productions
For other unit productions, replace every chain
A1 → A2 → ... → Ak → α
where α is not a single variable, by the productions A1 → α, ..., Ak → α.
Example
S → 0S1 | 1S0S1 | R | ε
R → 0SR
becomes
S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR
The chain S → R → 0SR is replaced by S → 0SR and R → 0SR.
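The cycle rule and the chain rule above can be handled in a single pass. The Python sketch below does this under a few assumptions that do not come from the slides. A grammar is a dict that maps each variable to a list of right-hand sides. Each right-hand side is a tuple of symbols, `()` stands for ε, and a symbol counts as a variable exactly when it is a key. The function names are hypothetical. The sketch does not collapse cycles explicitly. Instead it uses a common equivalent formulation: for every variable A, it finds all variables reachable from A through unit productions, and then gives A every non-unit production of those variables.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols);
# () stands for ε, and a symbol is a variable iff it is a key of the dict.
Grammar = dict[str, list[tuple[str, ...]]]


def is_unit(rhs: tuple[str, ...], grammar: Grammar) -> bool:
    # A unit production has exactly one symbol on the right, and that symbol is a variable.
    return len(rhs) == 1 and rhs[0] in grammar


def remove_unit_productions(grammar: Grammar) -> Grammar:
    result: Grammar = {}
    for a in grammar:
        # Find every variable reachable from a through unit productions (a included).
        # Cycles such as S -> T -> S are harmless: each variable is visited once.
        reachable = {a}
        stack = [a]
        while stack:
            v = stack.pop()
            for rhs in grammar[v]:
                if is_unit(rhs, grammar) and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    stack.append(rhs[0])
        # a inherits every non-unit production of every reachable variable,
        # so each chain a -> ... -> b -> alpha becomes a -> alpha.
        new_rhss: list[tuple[str, ...]] = []
        for b in grammar:  # iterate in grammar order to keep the output deterministic
            if b in reachable:
                for rhs in grammar[b]:
                    if not is_unit(rhs, grammar) and rhs not in new_rhss:
                        new_rhss.append(rhs)
        result[a] = new_rhss
    return result


# The example grammar from the unit-production slides.
g = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}
print(remove_unit_productions(g)["S"])
# [('0', 'S', '1'), ('1', 'S', '0', 'S', '1'), (), ('0', 'S', 'R')]
```

The slides merge T into S. This version keeps T with its own copy of the productions, but T is no longer reachable from S, so the language does not change.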

10 Removal of -productions
A variable N is nullable if there is a derivation How to remove -productions (except from S) N   * Find all nullable variables N1, ..., Nk For i = 1 to k For every production of the form A → Ni, add another production A →  If Ni →  is a production, remove it If S is nullable, add the special production S → 

11 Example
Find the nullable variables of the grammar
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Nullable variables: B, C, D

12 Finding nullable variables
To find nullable variables, we work backwards:
First, mark all variables A such that A → ε as nullable.
Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable.
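The backward marking can be written as a fixed-point loop. The Python sketch below makes the same assumptions as before: a grammar is a dict from variables to tuples of symbols, and `()` is ε. The name `nullable_variables` is hypothetical. The sketch has no separate first pass for A → ε. An ε right-hand side is just the case k = 0, where "all of A1, …, Ak are marked" holds vacuously.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols); () is ε.
Grammar = dict[str, list[tuple[str, ...]]]


def nullable_variables(grammar: Grammar) -> set[str]:
    nullable: set[str] = set()
    changed = True
    while changed:  # keep sweeping until a full pass marks nothing new
        changed = False
        for a, rhss in grammar.items():
            if a in nullable:
                continue
            # A -> ε qualifies because all() of an empty tuple is True; A -> A1...Ak
            # qualifies once every Ai is marked. Terminals are never marked, so a
            # production containing a terminal never qualifies.
            if any(all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(a)
                changed = True
    return nullable


# The grammar from slide 11.
g = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
print(sorted(nullable_variables(g)))  # ['B', 'C', 'D']
```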

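13 Eliminating ε-productions
Grammar: S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b
Nullable variables: B, C, D
Applying the procedure with N1 = B, N2 = C, N3 = D:
i = 1 (B): D → BC gives D → C
i = 2 (C): S → ACD gives S → AD; D → BC gives D → B; D → C gives D → ε
i = 3 (D): S → ACD gives S → AC; S → AD gives S → A; C → ED gives C → E
The ε-productions B → ε, C → ε and D → ε are removed. S is not nullable, so S → ε is not added. The result is
S → ACD | AC | AD | A
A → a
C → ED | E
D → BC | B | C | b
E → b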

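14 Recap
After eliminating ε-productions and unit productions, we know that derivations never shrink in length and never go into cycles. In particular, in any derivation S ⇒* a1…ak, where a1, …, ak are terminals, every intermediate string has length at most k.
Exception: S → ε. We will not use this rule at all, except to check whether ε ∈ L.
Note: ε-productions must be eliminated before unit productions. Eliminating ε-productions can create new unit productions, such as D → C and C → E in the previous example.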

15 Example: testing membership
S → 0S1 | 1S0S1 | T
T → S | ε
After eliminating unit productions and ε-productions:
S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
x = 00111
S ⇒ 01, 101
S ⇒ 0S1 ⇒ 0011, 01011, 00S11, strings of length ≥ 6
  00S11 ⇒ only strings of length ≥ 6
S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
S ⇒ 1S0S1 ⇒ only strings of length ≥ 6
No branch produces 00111, so x is not in the language.

16 Algorithm 1 for testing membership
We can now use the following algorithm to check if a string x is in the language of G:
Eliminate all ε-productions and unit productions.
If x = ε and S → ε, accept; else delete S → ε.
Let X := S.
While some new production P can be applied to X:
  Apply P to X.
  If X = x, accept.
  If |X| > |x|, backtrack.
If no more productions can be applied to X, reject.
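Algorithm 1 is backtracking search, and the recursive Python sketch below implements it. The sketch makes several assumptions of its own. It only follows leftmost derivations, which is enough because every string in the language has one. Every terminal is a single character, so the length of a sentential form can be compared directly with |x|. The input grammar has already had its ε-productions and unit productions eliminated. The names `derives` and `accepts` are hypothetical.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols); () is ε.
Grammar = dict[str, list[tuple[str, ...]]]


def derives(grammar: Grammar, form: tuple[str, ...], x: str) -> bool:
    """Depth-first search over leftmost derivations starting from the sentential form `form`."""
    if len(form) > len(x):
        return False  # backtrack: with no ε- or unit productions, forms never shrink
    # Position of the leftmost variable, or None if form consists of terminals only.
    i = next((k for k, sym in enumerate(form) if sym in grammar), None)
    if i is None:
        return "".join(form) == x  # accept iff X = x
    for rhs in grammar[form[i]]:
        if derives(grammar, form[:i] + rhs + form[i + 1:], x):
            return True
    return False  # no production leads to x from here: reject this branch


def accepts(grammar: Grammar, start: str, x: str) -> bool:
    if x == "":
        return () in grammar[start]  # x = ε: accept iff S -> ε
    # Otherwise delete S -> ε before searching.
    pruned = {a: [r for r in rhss if r or a != start] for a, rhss in grammar.items()}
    return derives(pruned, (start,), x)


# The grammar from slide 15 after eliminating unit and ε-productions.
g = {"S": [(), ("0", "1"), ("1", "0", "1"), ("0", "S", "1"),
           ("1", "0", "S", "1"), ("1", "S", "0", "1"), ("1", "S", "0", "S", "1")]}
print(accepts(g, "S", "00111"), accepts(g, "S", "01011"))  # False True
```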

17 Practical limitations of Algorithm 1
The previous algorithm can be very slow if x is long. There is a faster algorithm, but it requires some more transformations on the grammar.
G = CFG of the Java programming language
x = code for a 200-line Java program
The algorithm might take about … steps!

18 Chomsky Normal Form
A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type
A → BC or A → a
where A, B, C are variables and a is a terminal.
Conversion to Chomsky Normal Form is easy once ε-productions and unit productions have been eliminated:
Replace terminals with new variables: A → BcDE becomes A → BCDE, C → c
Break up sequences with new variables: A → BCDE becomes A → BX1, X1 → CX2, X2 → DE
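The Python sketch below applies the two conversion steps to a grammar that has no ε-productions (other than S → ε) and no unit productions. The representation (dicts of symbol tuples) is an assumption, as is the naming scheme for new variables. The slide names the variable for terminal c simply C, but the sketch uses `T_c` to avoid clashing with an existing C. Fresh variables are named `X1`, `X2`, and so on, on the assumption that the input does not already use those names.

```python
# Assumed representation: variable -> list of right-hand sides (tuples of symbols); () is ε.
Grammar = dict[str, list[tuple[str, ...]]]


def to_cnf(grammar: Grammar) -> Grammar:
    """Assumes ε-productions (other than S -> ε) and unit productions are already gone."""
    result: Grammar = {a: [] for a in grammar}
    term_var: dict[str, str] = {}  # terminal c -> new variable whose only production is -> c
    counter = 0  # numbers the fresh variables X1, X2, ...

    def add(head: str, rhs: tuple[str, ...]) -> None:
        rhss = result.setdefault(head, [])
        if rhs not in rhss:
            rhss.append(rhs)

    for a, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) <= 1:
                add(a, rhs)  # A -> a (or S -> ε) is already allowed
                continue
            # Step 1: replace each terminal c by a new variable T_c with the production T_c -> c.
            syms: list[str] = []
            for sym in rhs:
                if sym not in grammar:
                    if sym not in term_var:
                        term_var[sym] = f"T_{sym}"
                        add(term_var[sym], (sym,))
                    sym = term_var[sym]
                syms.append(sym)
            # Step 2: break A -> B1 B2 ... Bm (m >= 3) into a chain through fresh variables.
            head = a
            while len(syms) > 2:
                counter += 1
                fresh = f"X{counter}"
                add(head, (syms[0], fresh))
                head, syms = fresh, syms[1:]
            add(head, tuple(syms))
    return result


# The slide's A -> BcDE, with made-up productions for B, D, E so that they are variables.
g = {"A": [("B", "c", "D", "E")], "B": [("b",)], "D": [("d",)], "E": [("e",)]}
for head, rhss in to_cnf(g).items():
    print(head, "->", " | ".join(" ".join(r) for r in rhss))
# A -> B X1
# B -> b
# D -> d
# E -> e
# T_c -> c
# X1 -> T_c X2
# X2 -> D E
```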

19 Exercise
Convert this CFG into Chomsky Normal Form:
S → ε | ADDA
A → a
C → c
D → bCb

20 Algorithm 2 for testing membership
SAC S  AB | BC A  BA | a B  CC | b C  AB | a SAC B B SA B SC SA B AC AC B AC x = baaba b a a b a Idea: We generate each substring of x bottom up

21 Parse tree reconstruction
b AC B SA SC SAC S  AB | BC A  BA | a B  CC | b C  AB | a x = baaba Tracing back the derivations, we obtain the parse tree

22 Cocke-Younger-Kasami algorithm
Input: grammar G in CNF, string x = x1…xk
Cell ij holds all variables that derive the substring xi…xj.
(Figure: triangular table with cells 11, 22, …, kk above x1, x2, …, xk and cell 1k at the top.)
For i = 1 to k
  If there is a production A → xi
    Put A in table cell ii
For b = 1 to k − 1
  For s = 1 to k − b
    Set t = s + b
    For j = s to t − 1
      If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
        Put A in cell st
If S is in cell 1k, accept; otherwise reject.
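The loops above translate almost line by line into Python. The sketch below keeps the slide's 1-based cell numbering (cell st covers xs…xt) and runs on the grammar and string from slide 20. The (head, body) representation of productions and the names `cyk_table` and `cyk_accepts` are assumptions. The empty string is left to the separate S → ε check from the recap.

```python
# The CNF grammar from slide 20, as (head, body) pairs:
# S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
RULES = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]


def cyk_table(rules, x):
    """table[(s, t)] = set of variables deriving x_s ... x_t (1-based, inclusive, as on the slide)."""
    k = len(x)
    table = {}
    for i in range(1, k + 1):
        # Cell ii: every A with a production A -> x_i.
        table[(i, i)] = {a for a, body in rules if body == (x[i - 1],)}
    for b in range(1, k):  # b = t - s, so the substrings have length b + 1
        for s in range(1, k - b + 1):
            t = s + b
            cell = set()
            for j in range(s, t):  # split x_s..x_t into x_s..x_j and x_(j+1)..x_t
                for a, body in rules:
                    if len(body) == 2 and body[0] in table[(s, j)] and body[1] in table[(j + 1, t)]:
                        cell.add(a)
            table[(s, t)] = cell
    return table


def cyk_accepts(rules, start, x):
    # The empty string is handled separately (via S -> ε), so it is rejected here.
    return len(x) > 0 and start in cyk_table(rules, x)[(1, len(x))]


x = "baaba"
table = cyk_table(RULES, x)
for b in range(len(x) - 1, -1, -1):  # top row (cell 1k) first, as on slide 20
    row = [table[(s, s + b)] for s in range(1, len(x) - b + 1)]
    print("  ".join("".join(sorted(c)) or "-" for c in row))
print(cyk_accepts(RULES, "S", x))
# ACS
# -  ACS
# -  B  B
# AS  B  CS  AS
# B  AC  AC  B  AC
# True
```

Each cell is printed as its sorted letters, so {S, A, C} appears as ACS. The three nested loops over b, s and j give the O(k³) running time, multiplied by the number of productions.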

