CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Ambiguity.

1 CSCI 3130: Automata theory and formal languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130 The Chinese University of Hong Kong Ambiguity Parsing algorithm for CFGs Fall 2010

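2 Ambiguity
A grammar is ambiguous if some string has more than one parse tree.
E → E + E | E * E | ( E ) | N
N → 1N | 2N | 1 | 2
Example: 1+2*2 has two parse trees. One applies E → E + E at the root and reads the string as 1 + (2*2) = 5; the other applies E → E * E at the root and reads it as (1+2) * 2 = 6.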
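To see the ambiguity concretely, the following Python sketch counts the parse trees of a string under the grammar on this slide. The function name count_parse_trees is made up for illustration, and the code relies on the observation that N → 1N | 2N | 1 | 2 derives every nonempty string of 1s and 2s in exactly one way.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(s):
    """Count parse trees deriving s from E in
    E -> E + E | E * E | ( E ) | N,   N -> 1N | 2N | 1 | 2."""
    if not s:
        return 0
    total = 0
    # E -> N: a nonempty string of 1s and 2s has exactly one parse as N.
    if all(ch in "12" for ch in s):
        total += 1
    # E -> ( E ): strip one pair of outer parentheses.
    if len(s) >= 3 and s[0] == "(" and s[-1] == ")":
        total += count_parse_trees(s[1:-1])
    # E -> E + E and E -> E * E: try every operator as the root of the tree.
    for i, ch in enumerate(s):
        if ch in "+*":
            total += count_parse_trees(s[:i]) * count_parse_trees(s[i + 1:])
    return total


print(count_parse_trees("1+2*2"))  # more than 1 means the string is ambiguous
```

Any string with a count above 1 witnesses that the grammar is ambiguous.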

3 Disambiguation
Sometimes we can rewrite the grammar to remove the ambiguity.
E → E + E | E * E | ( E ) | N
N → 1N | 2N | 1 | 2
In this grammar, + and * have the same precedence!
Idea: divide the expression into terms and factors.
[Figure: the expression 2 * (1 + 2 * 2) with its terms (T) and factors (F) marked]

4 Disambiguation
Original grammar:
E → E + E | E * E | ( E ) | N
N → 1N | 2N | 1 | 2
Rewritten grammar:
E → T | E + T       An expression is a sum of one or more terms
T → F | T * F       Each term is a product of one or more factors
F → ( E ) | 1 | 2   Each factor is a parenthesized expression or a number
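As an illustration of how the term/factor structure fixes precedence, here is a small Python evaluator that follows the E / T / F grammar. It is only a sketch: the left-recursive rules E → E + T and T → T * F are handled with loops (a standard recursive-descent substitute, not something the slides describe), and the names evaluate, expr, term and factor are invented for this example.

```python
def evaluate(text):
    """Evaluate an expression over 1, 2, +, *, ( ) following
    E -> T | E + T,  T -> F | T * F,  F -> ( E ) | 1 | 2."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():  # E: a sum of one or more terms
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    def term():  # T: a product of one or more factors
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def factor():  # F: a parenthesized expression or a number
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            value = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return value
        if tok in ("1", "2"):
            pos += 1
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result


print(evaluate("1+2*2"))
```

Because a term is fully consumed before the next + is considered, multiplication binds tighter than addition, so 1+2*2 can only be read as 1 + (2*2).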

5 Parsing example
E → T | E + T
T → F | T * F
F → ( E ) | 1 | 2
Input: 2 * (1 + 1 + 2 * 2) + 1
[Figure: parse tree of the input under this grammar]

6 Disambiguation
Disambiguation is not always possible
– There exist inherently ambiguous languages
– There is no general procedure for disambiguation
In programming languages, ambiguity comes from precedence rules, and it can be removed as in the example above
In English, ambiguity is sometimes a problem: He ate the cookies on the floor

7 Parsing
Do we have a method for building a parse tree? Can we tell if the parse tree is unique?
S → 0S1 | 1S0S1 | T
T → S | ε
input: 00111

8 First attempt
Maybe we can try all possible derivations:
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...
S ⇒ T ⇒ S, ε
... when do we stop?

9 Problems
How do we know when to stop?
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
S ⇒ 1S0S1 ⇒ 10S10S1, ...
... when do we stop?

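10 Problems
Idea: Stop the derivation when the length exceeds |x|
Not right because of ε-productions:
S → 0S1 | 1S0S1 | T
T → S | ε
x = 01011
S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011
lengths: 1, 3, 7, 6, 5
The intermediate string 01S0S11 has length 7 > |x| = 5, yet the derivation still reaches x, because each S can later disappear via S ⇒ T ⇒ ε.
We want to eliminate ε-productions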

11 Problems
Loops among the variables (S → T → S) might make us go forever
We want to eliminate such loops
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111

12 Removal of ε-productions
A variable N is nullable if there is a derivation N ⇒* ε
How to remove ε-productions:
1. Find all nullable variables N
2. For every production of the form A → αNβ, add another production A → αβ
3. If N → ε is a production, remove it
4. If S is nullable, add the special production S → ε

13 Example
Find the nullable variables
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable variables: B, C, D

14 Finding nullable variables
To find nullable variables, we work backwards
– First, mark all variables A s.t. A → ε as nullable
– Then, as long as there are productions of the form A → A1 … Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
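The marking procedure can be sketched in Python as a fixed-point loop. The grammar encoding here is an assumption made for the example: a dict from each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for ε. The grammar is the one from the Example slide.

```python
def nullable_variables(grammar):
    """Return the set of nullable variables.

    grammar maps each variable to a list of right-hand sides (tuples of
    symbols); the empty tuple () stands for an epsilon-production.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var in nullable:
                continue
            # all() of an empty tuple is True, which covers A -> epsilon.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable


GRAMMAR = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

print(sorted(nullable_variables(GRAMMAR)))
```

The loop stops as soon as a full pass marks nothing new, which happens after at most as many passes as there are variables.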

15 Eliminating ε-productions
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable variables: B, C, D
1. For every production of the form A → αNβ, add another production A → αβ
   added: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
2. If N → ε is a production, remove it
   removed: B → ε, C → ε, D → ε
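A possible Python sketch of the two steps above, under the same assumed encoding (dict of variable to list of tuples, () for ε). For each production it emits every variant obtained by dropping some subset of the nullable symbols and never emits an empty body; the function names are illustrative only.

```python
from itertools import product


def nullable_variables(grammar):
    """Fixed-point computation of the nullable variables."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(var)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start="S"):
    """Return an equivalent grammar whose only possible epsilon-production
    is start -> epsilon."""
    nullable = nullable_variables(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each nullable symbol may be kept (True) or dropped (False).
            options = [(True, False) if s in nullable else (True,) for s in body]
            for keep in product(*options):
                new_body = tuple(s for s, k in zip(body, keep) if k)
                if new_body:  # skip epsilon bodies
                    new_bodies.add(new_body)
        result[var] = sorted(new_bodies)
    if start in nullable:
        result[start].append(())  # the special production S -> epsilon
    return result


GRAMMAR = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

new_grammar = eliminate_epsilon(GRAMMAR)
for var, bodies in new_grammar.items():
    print(var, "->", " | ".join("".join(b) for b in bodies))
```

In this example B is left with no productions at all, so the productions that still mention B can never be completed; the sketch leaves them in place rather than cleaning them up.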

16 Dealing with loops
A unit production is a production of the form A1 → A2 where A1 and A2 are both variables
Example
grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
unit productions: S → T, T → S, T → R

17 Removal of unit productions
If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace everything with A1
Example: T is replaced by S in the {S, T} cycle
before:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
after:
S → 0S1 | 1S0S1
S → R | ε
R → 0SR

18 Removal of unit productions
For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α
Example: S → R → 0SR is replaced by S → 0SR, R → 0SR
before:
S → 0S1 | 1S0S1 | R | ε
R → 0SR
after:
S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR
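One way to sketch unit-production removal in Python is shown below. Note that it uses a slightly different technique from the slides: instead of first merging cycles and then replacing chains, it computes for each variable the set of variables reachable through unit productions and copies their non-unit productions. On a grammar with a unit cycle this keeps every variable instead of merging them. The encoding (dict of variable to list of tuples) and the function name are assumptions for this example.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar without unit productions A -> B."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for var in grammar:
        # Collect every variable reachable from var via unit productions only.
        reachable, stack = {var}, [var]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # var inherits the non-unit productions of everything it reaches.
        bodies = {b for v in reachable for b in grammar[v] if not is_unit(b)}
        result[var] = sorted(bodies)
    return result


# Grammar from this slide; tuple("0S1") is ("0", "S", "1"), () is epsilon.
GRAMMAR = {
    "S": [tuple("0S1"), tuple("1S0S1"), ("R",), ()],
    "R": [tuple("0SR")],
}

unit_free = remove_unit_productions(GRAMMAR)
for var, bodies in unit_free.items():
    print(var, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

Because reachability is computed with a visited set, the loop also terminates on grammars that contain unit cycles such as S → T → S.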

19 Recap
After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn't shrink in length and doesn't go into cycles
Exception: S → ε
– We will not use this rule at all, except to check if ε ∈ L
Note
– ε-productions must be eliminated before unit productions

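20 Example: testing membership
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111
After eliminating unit and ε-productions:
S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
Derivations from S:
– S ⇒ 01, 101
– S ⇒ 0S1 ⇒ 0011, 01011, 00S11 (only strings of length ≥ 6), other strings of length ≥ 6
– S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
– S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
– S ⇒ 1S0S1 ⇒ only strings of length ≥ 6
None of the derived strings of length at most 5 equals x, so 00111 is not in L.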

21 Algorithm 1 for testing membership
How to check if a string x ≠ ε is in L(G):
1. Eliminate all ε-productions and unit productions
2. Let X := S
3. While some new rule R can be applied to X:
   – Apply R to X
   – If X = x, you have found a derivation for x
   – If |X| > |x|, backtrack
4. If no more rules can be applied to X, x is not in L
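A minimal Python sketch of this backtracking search, assuming the grammar has already had its ε-productions and unit productions removed (the rule S → ε is left out, since x ≠ ε). Two details are choices made for this example rather than part of the slide: only the leftmost variable is expanded, which is enough because every derivable string has a leftmost derivation, and a set of already-explored sentential forms avoids repeating work. Symbols are single characters and bodies are plain strings.

```python
def derives(grammar, start, x):
    """Return True if the non-empty string x can be derived from start.

    Assumes grammar has no epsilon- or unit productions, so sentential
    forms never shrink and the search terminates.
    """
    seen = set()  # sentential forms already explored without success

    def search(form):
        if form == x:
            return True  # found a derivation for x
        if len(form) > len(x) or form in seen:
            return False  # too long, or already tried: backtrack
        seen.add(form)
        for i, sym in enumerate(form):
            if sym in grammar:  # leftmost variable
                return any(
                    search(form[:i] + body + form[i + 1:])
                    for body in grammar[sym]
                )
        return False  # only terminals left, and form != x

    return search(start)


# Grammar from the membership example, without S -> epsilon.
GRAMMAR = {"S": ["01", "101", "0S1", "10S1", "1S01", "1S0S1"]}

print(derives(GRAMMAR, "S", "01011"))
print(derives(GRAMMAR, "S", "00111"))
```

The number of sentential forms explored can still grow exponentially with |x|, which is the limitation discussed next.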

22 Practical limitations of Algorithm 1
This method can be very slow if x is long
Example: G = CFG of the Java programming language, x = code for a 200-line Java program; the algorithm might take about 10^200 steps!
There is a faster algorithm, but it requires that we do some more transformations on the grammar

23 Chomsky Normal Form
A CFG is in Chomsky Normal Form if every production (except S → ε) is of the form A → BC or A → a
Convert to Chomsky Normal Form:
A → BcDE
replace terminals with new variables: A → BCDE, C → c
break up sequences with new variables: A → BX1, X1 → CX2, X2 → DE, C → c
[Photo: Noam Chomsky]
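The two conversion steps can be sketched in Python as follows. This assumes ε-productions and unit productions are already gone. The fresh variable names T_c and X1, X2, ... are hypothetical and assumed not to clash with existing variables (the slide calls the new variable for terminal c simply C). The set of variables is passed in explicitly.

```python
def to_cnf(grammar, variables):
    """Convert productions to the forms A -> BC and A -> a.

    grammar maps variables to lists of bodies (tuples of symbols);
    variables is the set of all variable names. Assumes no unit
    productions; an empty body (S -> epsilon) is copied unchanged.
    """
    result = {}
    terminal_var = {}  # terminal -> fresh variable standing for it
    counter = 0

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:
                result.setdefault(head, []).append(body)  # A -> a (or epsilon)
                continue
            # Step 1: replace each terminal with a new variable.
            symbols = []
            for s in body:
                if s not in variables:
                    s = terminal_var.setdefault(s, f"T_{s}")
                symbols.append(s)
            # Step 2: break up long sequences with new variables.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result.setdefault(current, []).append((symbols[0], fresh))
                current, symbols = fresh, symbols[1:]
            result.setdefault(current, []).append(tuple(symbols))

    for terminal, var in terminal_var.items():
        result.setdefault(var, []).append((terminal,))
    return result


cnf = to_cnf({"A": [("B", "c", "D", "E")]}, variables={"A", "B", "D", "E"})
for head, bodies in cnf.items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

On the slide's production A → BcDE this yields the same shape as the slide: A → B X1, X1 → T_c X2, X2 → D E, T_c → c.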

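24 Algorithm 2 for testing membership
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Idea: We generate each substring of x bottom up
Table: the entry in row "length n", column s lists the variables that derive the length-n substring of x starting at position s (– marks an empty cell)
length 5:  SAC
length 4:  –    SAC
length 3:  –    B    B
length 2:  SA   B    SC   SA
length 1:  B    AC   AC   B    AC
x:         b    a    a    b    a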

25 Parse tree reconstruction
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Tracing back the derivations, we obtain the parse tree
[Figure: the filled-in table for x = baaba with the derivations traced back]

26 Cocke-Younger-Kasami algorithm
Input: Grammar G in CNF, string x = x1…xk
Cell ij remembers all possible derivations of substring xi…xj (the variables that generate it)
Table cells: the last row holds cells 11, 22, …, kk (one under each symbol x1, x2, …, xk), the row above holds 12, 23, …, and the top row holds the single cell 1k
For cells in the last row:
  If there is a production A → xi, put A in table cell ii
For cells st in the other rows:
  If there is a production A → BC where B is in cell sj and C is in cell (j+1)t for some s ≤ j < t, put A in cell st
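The table-filling procedure can be sketched in Python as below, run on the grammar and string from the Algorithm 2 slide. The encoding is an assumption for this example: the grammar is a dict from variable to a list of bodies, each either a 1-tuple holding a terminal or a 2-tuple of variables, and cells are keyed by 0-indexed inclusive pairs (s, t) rather than the 1-indexed cells of the slide.

```python
def cyk(grammar, x):
    """Fill the CYK table for string x and a grammar in Chomsky Normal Form.

    Returns a dict where table[(s, t)] is the set of variables that
    derive x[s..t] (0-indexed, inclusive).
    """
    k = len(x)
    table = {}
    # Last row: substrings of length 1, using productions A -> x_i.
    for i in range(k):
        table[(i, i)] = {A for A, bodies in grammar.items() if (x[i],) in bodies}
    # Other rows: longer substrings, built from two shorter pieces.
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # split into x[s..j] and x[j+1..t]
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (
                            len(body) == 2
                            and body[0] in table[(s, j)]
                            and body[1] in table[(j + 1, t)]
                        ):
                            cell.add(A)
            table[(s, t)] = cell
    return table


GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}

table = cyk(GRAMMAR, "baaba")
print("S" in table[(0, 4)])  # x is in L(G) iff S is in the top cell
```

To recover a parse tree as well, each cell entry could additionally record the split point j and the production used; this sketch stores only the variables.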

