
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Normal forms.


1 CSC 3130: Automata theory and formal languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130 The Chinese University of Hong Kong Normal forms and parsing Fall 2009

2 Testing membership and parsing: Given a grammar, how can we know if a string x is in its language? If so, can we obtain a parse tree for x? Can we tell if the parse tree is unique? Example grammar: S → 0S1 | 1S0S1 | T, T → S | ε

3 First attempt: Maybe we can try all possible derivations. For S → 0S1 | 1S0S1 | T, T → S | ε and x = 00111: S ⇒ 0S1, 1S0S1 or T; 0S1 ⇒ 00S11, 01S0S11, 0T1, ...; 1S0S1 ⇒ 10S10S1, ...; T ⇒ S or ε. When do we stop?

4 Problems: How do we know when to stop? For S → 0S1 | 1S0S1 | T, T → S | ε and x = 00111: S ⇒ 0S1, 1S0S1, ...; 0S1 ⇒ 00S11, 01S0S11, 0T1, ...; 1S0S1 ⇒ 10S10S1, ... When do we stop?

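5 Problems: Idea: stop a derivation when its length exceeds |x|. This is not right because of ε-productions. For S → 0S1 | 1S0S1 | T, T → S | ε and x = 01011, the derivation S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01S011 ⇒* 01011 passes through sentential forms of lengths 1, 3, 7, 6, 5, so it grows past |x| = 5 before shrinking back to x. We might want to eliminate ε-productions too.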

6 Problems: Loops among the variables (S → T → S) might make us go on forever, so we want to eliminate such loops. S → 0S1 | 1S0S1 | T, T → S | ε, x = 00111

7 Removal of ε-productions: A variable N is nullable if there is a derivation N ⇒* ε. To remove ε-productions (except from S): Find all nullable variables N₁, ..., Nₖ. For every production of the form A → αNᵢβ, add another production A → αβ (applying this to the new productions as well, so that any combination of nullable variables can be dropped). If Nᵢ → ε is a production, remove it. If S is nullable, add the special production S → ε.

8 Example: Find the nullable variables of the grammar S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b. Nullable variables: B, C, D.

9 Finding nullable variables: To find nullable variables, we work backwards. First, mark all variables A such that A → ε as nullable. Then, as long as there are productions of the form A → A₁…Aₖ where all of A₁, …, Aₖ are marked as nullable, mark A as nullable.
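The marking procedure above is a fixed-point computation. Below is a minimal Python (3.9+) sketch, assuming a grammar is stored as a dict from each variable to a list of bodies, where a body is a tuple of symbols and the empty tuple stands for an ε-production; the names `Grammar`, `nullable_variables` and `EXAMPLE` are hypothetical. It folds the two marking steps into one loop, since an empty body trivially has "all symbols marked".

```python
# A grammar maps each variable to a list of bodies. A body is a tuple of
# symbols; the empty tuple () is an epsilon-production.
Grammar = dict[str, list[tuple[str, ...]]]


def nullable_variables(grammar: Grammar) -> set[str]:
    """Return all variables N with N =>* epsilon, by repeated marking."""
    nullable: set[str] = set()
    changed = True
    while changed:  # stop once a full pass marks nothing new
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A -> A1 ... Ak with all Ai marked; k = 0 covers A -> epsilon.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# The grammar from slide 8.
EXAMPLE: Grammar = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

print(sorted(nullable_variables(EXAMPLE)))  # ['B', 'C', 'D']
```

Terminals are never added to the set, so any body containing a terminal can never make its head nullable.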

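10 Eliminating ε-productions: Grammar S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b, with nullable variables B, C, D. For every production of the form A → αNᵢβ, add another production A → αβ; if Nᵢ → ε is a production, remove it. New productions: from S → ACD we get S → AD, S → AC and S → A; from C → ED we get C → E; from D → BC we get D → C and D → B (D → ε would also arise, but it is an ε-production and is not kept).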
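One way to carry out this step in code is to enumerate, for each body, every way of keeping or dropping each nullable occurrence; this produces in one pass everything that repeated application of the A → αNᵢβ rule would. This is a sketch under the same hypothetical dict-of-tuples representation as before, taking the nullable set as input; `remove_epsilon` and its `start` parameter are illustrative names, not from the slides.

```python
from itertools import product

Grammar = dict[str, list[tuple[str, ...]]]


def remove_epsilon(grammar: Grammar, nullable: set[str], start: str = "S") -> Grammar:
    """Remove epsilon-productions, given the set of nullable variables.

    For each body, every combination of keeping or dropping its nullable
    occurrences is added, which is what repeatedly applying
    "A -> alpha N beta gives A -> alpha beta" produces.
    """
    result: Grammar = {}
    for head, bodies in grammar.items():
        new_bodies: list[tuple[str, ...]] = []
        for body in bodies:
            # Options per symbol: always keep it; a nullable one may also be dropped.
            options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for pick in product(*options):
                candidate = tuple(s for part in pick for s in part)
                # Skip epsilon (the empty tuple) and duplicates.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    if start in nullable:
        result[start].append(())  # special production S -> epsilon
    return result


# The grammar from slide 8, with nullable variables B, C, D.
EXAMPLE: Grammar = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

print(remove_epsilon(EXAMPLE, {"B", "C", "D"})["S"])
# [('A', 'C', 'D'), ('A', 'C'), ('A', 'D'), ('A',)]
```

In this example B ends up with no productions at all, so bodies that still mention it (D → BC, D → B) can never finish a derivation; removing such useless symbols is a separate clean-up that these slides do not cover.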

11 Dealing with loops: A unit production is a production of the form A₁ → A₂, where A₁ and A₂ are both variables. Example grammar: S → 0S1 | 1S0S1 | T, T → S | R | ε, R → 0SR. Its unit productions are S → T, T → S and T → R.

12 Removal of unit productions: If there is a cycle of unit productions A₁ → A₂ → ... → Aₖ → A₁, delete it and replace every other variable of the cycle with A₁. Example: in S → 0S1 | 1S0S1 | T, T → S | R | ε, R → 0SR, T is replaced by S in the {S, T} cycle, giving S → 0S1 | 1S0S1, S → R | ε, R → 0SR.

13 Removal of unit productions: For other unit productions, replace every chain A₁ → A₂ → ... → Aₖ → α, where α is not a single variable, by the productions A₁ → α, ..., Aₖ → α. Example: the chain S → R → 0SR is replaced by S → 0SR, R → 0SR, so S → 0S1 | 1S0S1 | R | ε, R → 0SR becomes S → 0S1 | 1S0S1 | 0SR | ε, R → 0SR.
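The sketch below does not collapse cycles first as slide 12 does; it swaps in the standard "unit pair" construction, which handles cycles and chains uniformly: for each variable A it finds every variable reachable from A using unit productions alone, and gives A all of their non-unit bodies. The representation and the names `remove_unit` and `LOOPY` are hypothetical, as before. On slide 11's grammar it gives S and R the same productions as slide 13; the difference is that T is kept (with copies of S's bodies) but is no longer reachable from S.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def remove_unit(grammar: Grammar) -> Grammar:
    """Replace unit productions using unit pairs (A, B) with A =>* B."""
    def is_unit(body: tuple[str, ...]) -> bool:
        return len(body) == 1 and body[0] in grammar

    result: Grammar = {}
    for a in grammar:
        # Variables reachable from a through unit productions only (a included).
        reach, stack = [a], [a]
        while stack:
            b = stack.pop()
            for body in grammar[b]:
                if is_unit(body) and body[0] not in reach:
                    reach.append(body[0])
                    stack.append(body[0])
        # a inherits every non-unit body of every variable it reaches.
        bodies: list[tuple[str, ...]] = []
        for b in reach:
            for body in grammar[b]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[a] = bodies
    return result


# Slide 11's grammar (it still contains T -> epsilon, as on slides 11 to 13).
LOOPY: Grammar = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}

print(remove_unit(LOOPY)["S"])
# [('0', 'S', '1'), ('1', 'S', '0', 'S', '1'), (), ('0', 'S', 'R')]
```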

14 Recap: After eliminating ε-productions and unit productions, every step of a derivation S ⇒* a₁…aₖ (where a₁, …, aₖ are terminals) keeps or increases the length of the sentential form, and the derivation cannot go into a cycle. Exception: S → ε. We will not use this rule at all, except to check whether ε ∈ L. Note: ε-productions must be eliminated before unit productions, because eliminating ε-productions can create new unit productions (such as D → C on slide 10).

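15 Example: testing membership. Grammar S → 0S1 | 1S0S1 | T, T → S | ε, and x = 00111. Eliminating ε- and unit productions gives S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1. Exploring from S: S ⇒ 01, 101; S ⇒ 0S1 ⇒ 0011, 01011, 00S11 or strings of length ≥ 6, and 00S11 yields only strings of length ≥ 6; S ⇒ 10S1 ⇒ 10011 or strings of length ≥ 6; S ⇒ 1S01 ⇒ 10101 or strings of length ≥ 6; S ⇒ 1S0S1 yields only strings of length ≥ 6. None of these is x, so 00111 is not in the language.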

16 Algorithm 1 for testing membership: How to check if a string x ≠ ε is in L(G): Eliminate all ε-productions and unit productions. Let X := S. While some new rule R can be applied to X: apply R to X; if X = x, you have found a derivation for x; if |X| > |x|, backtrack. If no more rules can be applied to X, x is not in L.
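A direct Python sketch of Algorithm 1, run on the grammar obtained on slide 15. Two assumptions go beyond the slide: symbols are single characters, with the dict keys as the variables; and the search always rewrites the leftmost variable, which still finds a derivation whenever one exists (any derivation can be reordered into a leftmost one) while cutting down on exploring the same forms in different orders. `GRAMMAR` and `find_derivation` are hypothetical names. Termination relies on the grammar having no ε- or unit productions, so every step either lengthens the form or turns a variable into a terminal.

```python
# Grammar from slide 15 after removing epsilon- and unit productions.
# Symbols are single characters; the dict keys are the variables.
# S -> epsilon is left out: it is only used to decide whether epsilon is in L.
GRAMMAR = {"S": ["01", "101", "0S1", "10S1", "1S01", "1S0S1"]}


def find_derivation(x, grammar=GRAMMAR, start="S"):
    """Backtracking search for a leftmost derivation of a nonempty string x.

    Returns the list of sentential forms from start to x, or None if x is not
    in L(G). Assumes no epsilon- or unit productions, so forms never shrink.
    """
    def search(form, path):
        if form == x:
            return path  # found a derivation for x
        if len(form) > len(x):
            return None  # |X| > |x|: backtrack
        # Position of the leftmost variable, or None if form is all terminals.
        i = next((k for k, ch in enumerate(form) if ch in grammar), None)
        if i is None:
            return None  # no rule applies and form != x
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            found = search(new, path + [new])
            if found is not None:
                return found
        return None

    return search(start, [start])


print(find_derivation("01011"))  # ['S', '0S1', '01011']
print(find_derivation("00111"))  # None, as on slide 15
```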

17 Practical limitations of Algorithm 1: This method can be very slow if x is long. There is a faster algorithm, but it requires that we do some more transformations on the grammar. Example: if G is the CFG of the Java programming language and x is the code of a 200-line Java program, the algorithm might take about 10²⁰⁰ steps!

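18 Chomsky Normal Form: A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type A → BC or A → a, where A, B, C are variables and a is a terminal. Once ε-productions and unit productions have been eliminated, conversion to Chomsky Normal Form is easy. For example, for A → BcDE: first replace terminals with new variables, giving A → BCDE, C → c; then break up sequences with new variables, giving A → BX₁, X₁ → CX₂, X₂ → DE, C → c.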
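The two conversion steps can be sketched in Python as follows, assuming rules are (head, body) pairs with bodies as tuples of symbols, and that ε- and unit productions are already gone. The function name `to_cnf` and the fresh variable names `V_<terminal>` and `X1`, `X2`, … are hypothetical choices, assumed not to clash with existing variables. On slide 18's example the sketch produces the same rules, with `V_c` in place of C.

```python
def to_cnf(rules, variables):
    """Convert (head, body) rules to Chomsky Normal Form.

    Assumes epsilon- and unit productions are already gone, so every body is
    either a single terminal or has length >= 2. Fresh names V_<t> and X<n>
    are hypothetical and assumed not to clash with existing variables.
    """
    out = []
    term_var = {}  # terminal -> the new variable that produces it
    counter = 0

    def var_for(t):
        if t not in term_var:
            term_var[t] = f"V_{t}"
            out.append((term_var[t], (t,)))  # new rule V_t -> t
        return term_var[t]

    for head, body in rules:
        if len(body) == 1:  # already of the form A -> a
            out.append((head, body))
            continue
        # Step 1: replace terminals inside long bodies with new variables.
        syms = [s if s in variables else var_for(s) for s in body]
        # Step 2: break A -> Y1 Y2 ... Yn into a chain of binary rules.
        while len(syms) > 2:
            counter += 1
            fresh = f"X{counter}"
            out.append((head, (syms[0], fresh)))
            head, syms = fresh, syms[1:]
        out.append((head, tuple(syms)))
    return out


print(to_cnf([("A", ("B", "c", "D", "E"))], {"A", "B", "D", "E"}))
# [('V_c', ('c',)), ('A', ('B', 'X1')), ('X1', ('V_c', 'X2')), ('X2', ('D', 'E'))]
```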

19 Exercise: Convert this CFG into Chomsky Normal Form: S → ε | ADDA, A → a, C → c, D → bCb

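20 Algorithm 2 for testing membership: Grammar S → AB | BC, A → BA | a, B → CC | b, C → AB | a, and x = baaba. Idea: we generate each substring of x bottom up, recording which variables derive it. [Figure: table of the variables deriving each substring of x, filled in from single symbols upward.]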

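21 Parse tree reconstruction: S → AB | BC, A → BA | a, B → CC | b, C → AB | a, x = baaba. The completed table, where cell ij lists the variables that derive xᵢ…xⱼ:
Length 1: 11 = {B}, 22 = {A, C}, 33 = {A, C}, 44 = {B}, 55 = {A, C}
Length 2: 12 = {S, A}, 23 = {B}, 34 = {S, C}, 45 = {S, A}
Length 3: 13 = ∅, 24 = {B}, 35 = {B}
Length 4: 14 = ∅, 25 = {S, A, C}
Length 5: 15 = {S, A, C}
Tracing back the derivations, we obtain the parse tree.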

22 Cocke-Younger-Kasami algorithm: Input: grammar G in CNF and a string x = x₁…xₖ. Cell ij of the table remembers all variables that derive the substring xᵢ…xⱼ. For cells in the last row: if there is a production A → xᵢ, put A in table cell ii. For cells st in other rows: if there is a production A → BC where B is in cell sj and C is in cell (j+1)t for some s ≤ j < t, put A in cell st. Then x ∈ L(G) exactly when S is in cell 1k. [Figure: triangular table with cells 11, 22, …, kk in the last row, 12, 23, … in the row above, and cell 1k at the top.]
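A Python sketch of the CYK table for slides 20 to 22, using 1-based cells (i, j) for the substring xᵢ…xⱼ as on slide 22. As an assumption beyond the slides, each cell maps a variable to one backpointer (the split j and the two variables used, or the terminal for cells ii), which is enough to trace back one parse tree as on slide 21; storing every split instead would let you enumerate all parse trees and so tell whether the tree is unique. `RULES`, `cyk` and `parse_tree` are hypothetical names, and x must be nonempty.

```python
# CNF grammar from slides 20-21 as (head, body) pairs: every body is either
# (terminal,) or (B, C) with B and C variables.
RULES = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]


def cyk(rules, x):
    """Return table[(s, t)] = {A: backpointer} for all A =>* x_s ... x_t."""
    k = len(x)
    table = {}
    # Last row: cell ii gets every A with a production A -> x_i.
    for i in range(1, k + 1):
        table[(i, i)] = {a: x[i - 1] for a, body in rules if body == (x[i - 1],)}
    # Other rows, in order of increasing substring length.
    for length in range(2, k + 1):
        for s in range(1, k - length + 2):
            t = s + length - 1
            cell = {}
            for j in range(s, t):  # split into x_s..x_j and x_(j+1)..x_t
                for a, body in rules:
                    if (len(body) == 2 and body[0] in table[(s, j)]
                            and body[1] in table[(j + 1, t)]):
                        cell.setdefault(a, (j, body[0], body[1]))
            table[(s, t)] = cell
    return table


def parse_tree(table, a, s, t):
    """Rebuild one parse tree for A =>* x_s ... x_t as nested tuples."""
    back = table[(s, t)][a]
    if s == t:
        return (a, back)  # leaf: A -> x_s
    j, b, c = back
    return (a, parse_tree(table, b, s, j), parse_tree(table, c, j + 1, t))


x = "baaba"
table = cyk(RULES, x)
print(sorted(table[(1, len(x))]))  # ['A', 'C', 'S']: S is there, so x is in L
print(parse_tree(table, "S", 1, len(x)))
```

The three nested loops over substring length, start position s and split point j give O(k³) cell updates, each scanning the rule list once, in contrast to the exponential search of Algorithm 1.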

