Ambiguity Parsing algorithms


1 Ambiguity Parsing algorithms
The Chinese University of Hong Kong, Fall 2011
CSCI 3130: Automata theory and formal languages
Andrej Bogdanov

2 Ambiguity
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
The string 1+2*2 has two parse trees: with + at the root it reads as 1+(2*2) = 5, with * at the root it reads as (1+2)*2 = 6.
A CFG is ambiguous if some string has more than one parse tree

3 Example
Is S → SS | x ambiguous?
Yes, because the string xxx has two parse trees: one groups it as (xx)x, the other as x(xx).
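To see how quickly the ambiguity of S → SS | x grows, the parse trees of a string of n x's can be counted. The sketch below is an illustration only; the function name `count_parse_trees` is made up for this example and is not part of the slides.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(n: int) -> int:
    """Number of parse trees for the string of n x's under S -> SS | x."""
    if n == 1:
        return 1  # the only option is S -> x
    # S -> SS: the left S derives the first k x's, the right S the other n - k
    return sum(count_parse_trees(k) * count_parse_trees(n - k)
               for k in range(1, n))


for n in range(1, 6):
    print(n, count_parse_trees(n))
```

For n = 3 the count is 2, which matches the two trees for xxx on this slide.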

4 Disambiguation
The ambiguous grammar S → SS | x can be replaced by S → Sx | x.
Sometimes we can rewrite the grammar to remove ambiguity

5 Disambiguation
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
In this grammar + and * have the same precedence!
Divide expression into terms and factors, for example 2 * (1 + 2 * 2)
[Figure: the terms (T) and factors (F) of the expression are marked]

6 Disambiguation
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
An expression is a sum of one or more terms: E → T | E + T
Each term is a product of one or more factors: T → F | T * F
Each factor is a parenthesized expression or a number: F → (E) | 1 | 2
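One way to check that the term/factor grammar gives * higher precedence than + is to evaluate expressions while parsing them. The sketch below is a small recursive-descent evaluator for E → T | E + T, T → F | T * F, F → (E) | 1 | 2. It is an illustration, not the parsing method of the slides: the left-recursive rules are read as "T followed by any number of + T" (and likewise for factors), and the name `evaluate` is invented here.

```python
def evaluate(text: str) -> int:
    """Parse and evaluate an expression over 1, 2, +, *, ( and )."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr() -> int:      # E -> T | E + T, read as T (+ T)*
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    def term() -> int:      # T -> F | T * F, read as F (* F)*
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def factor() -> int:    # F -> (E) | 1 | 2
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            value = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return value
        if tok in ("1", "2"):
            pos += 1
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result


print(evaluate("1+2*2"))    # multiplication binds tighter than addition
print(evaluate("(1+2)*2"))  # parentheses override precedence
```

With this grammar 1+2*2 has only the reading 1+(2*2), so the ambiguity between 5 and 6 disappears.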

7 Parsing example
[Figure: parse tree with nodes labeled E, T and F for an arithmetic expression over 1, 2, +, * and parentheses, built with the grammar E → T | E + T, T → F | T * F, F → (E) | 1 | 2]

8 Disambiguation
Disambiguation is not always possible because:
There exist inherently ambiguous languages
There is no general procedure for disambiguation
In programming languages, ambiguity comes from precedence rules, and we can remove it as in the example above
In English, ambiguity is sometimes a problem: He ate the cookies on the floor

9 Ambiguity in English He ate the cookies on the floor

10 Parsing
S → 0S1 | 1S0S | T
T → S | ε
input: 0011
How would we program the computer to build a parse tree for us?

11 Parsing
S → 0S1 | 1S0S | T
T → S | ε
input: 0011
First idea: Try all derivations
From S we can reach 0S1, 1S0S and T; from 0S1 we can reach 00S11, 01S0S1 and 0T1; and so on.
One branch succeeds (✔): S ⇒ 0S1 ⇒ 00S11 ⇒ 00T11 ⇒ 0011

12 Problems Trying all derivations may take a very long time
If input is not in the language, parsing will never stop

13 When to stop
S → 0S1 | 1S0S | T
T → S | ε
Idea 2: Stop when |derived string| > |input|
Problems:
S ⇒ 0S1 ⇒ 0T1 ⇒ 01 (lengths 1, 3, 3, 2): derived strings may shrink because of "ε-productions"
S ⇒ T ⇒ S ⇒ T ⇒ …: derivation may loop because of "unit productions"
Task: remove ε and unit productions

14 Removal of ε-productions
A variable N is nullable if it derives the empty string: N ⇒* ε
Identify all nullable variables N
Remove nullable variables carefully
If start variable S is nullable:
Add a new start variable S'
Add special productions S' → S | ε

15 Example
Identify all nullable variables
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Repeat the following:
If X → ε, mark X as nullable
If X → YZ…W, with Y, Z, …, W all marked nullable, mark X as nullable also.
nullable variables: B, C, D
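The marking procedure above can be sketched as a fixed-point loop. The representation is an assumption made for this example: a grammar is a dict from each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for ε. The name `nullable_variables` is invented here.

```python
def nullable_variables(grammar):
    """Return the set of variables that derive the empty string."""
    nullable = set()
    changed = True
    while changed:  # repeat until no new variable gets marked
        changed = False
        for var, bodies in grammar.items():
            if var in nullable:
                continue
            # An empty body is an epsilon-production: all() over no symbols is True.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable


# The example grammar from this slide.
GRAMMAR = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

print(sorted(nullable_variables(GRAMMAR)))
```

S is not marked because A is not nullable, so the start variable needs no special treatment in this example.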

16 Eliminating ε-productions
Remove nullable variables carefully
grammar: S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b
nullable: B, C, D
For every nullable N:
If you see X → αNβ, add X → αβ
If you see N → ε, remove it.
Productions added along the way: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
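A compact way to get the same result is to take each production and emit every variant in which some of its nullable symbols are dropped, discarding empty right-hand sides. This one-pass formulation is a common variant and not necessarily the step-by-step procedure shown on the slide. The dict-of-tuples representation and the name `eliminate_epsilon` are assumptions for this sketch, and the case of a nullable start variable (S' → S | ε) is left out.

```python
from itertools import product


def eliminate_epsilon(grammar, nullable):
    """Return an equivalent grammar (up to the empty string) without epsilon-productions."""
    new_grammar = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is kept.
            choices = [((sym,), ()) if sym in nullable else ((sym,),)
                       for sym in body]
            for pick in product(*choices):
                new_body = tuple(s for part in pick for s in part)
                if new_body:  # never emit an epsilon-production
                    new_bodies.add(new_body)
        new_grammar[var] = sorted(new_bodies)
    return new_grammar


GRAMMAR = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
NULLABLE = {"B", "C", "D"}  # taken from the slide

for var, bodies in eliminate_epsilon(GRAMMAR, NULLABLE).items():
    print(var, "->", " | ".join("".join(b) for b in bodies) or "(no productions)")
```

In the output B has no productions left, since its only production was B → ε; productions that mention B can then never finish a derivation.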

17 Eliminating unit productions
A unit production is a production of the form A → B
grammar:
S → 0S1 | 1S0S | T
T → S | R | ε
R → 0SR
unit productions graph: S → T, T → S, T → R

18 Removal of unit productions
If there is a cycle of unit productions A → B → ... → C → A, delete it and replace everything with A
Example: S → T → S is a cycle, so replace T by S
before: S → 0S1 | 1S0S | T, T → S | R | ε, R → 0SR
after: S → 0S1 | 1S0S | R | ε, R → 0SR

19 Removal of unit productions
Replace every chain A → B → ... → C → α by A → α, B → α, ..., C → α
Example: S → R → 0SR is replaced by S → 0SR, R → 0SR
before: S → 0S1 | 1S0S | R | ε, R → 0SR
after: S → 0S1 | 1S0S | 0SR | ε, R → 0SR
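Cycles and chains can be handled together by computing, for each variable, every variable reachable through unit productions alone and copying over their non-unit right-hand sides. This closure-based formulation is a standard alternative to the two-step procedure on the slides, not a transcription of it. The dict-of-tuples representation (empty tuple for ε) and the name `eliminate_unit_productions` are assumptions for this sketch.

```python
def eliminate_unit_productions(grammar):
    """Return an equivalent grammar with no productions of the form A -> B."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    new_grammar = {}
    for var in grammar:
        # Collect every variable reachable from var via unit productions only.
        reachable, stack = {var}, [var]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # var inherits all non-unit right-hand sides of the reachable variables.
        bodies = []
        for target in sorted(reachable):
            for body in grammar[target]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        new_grammar[var] = bodies
    return new_grammar


# The grammar from slide 17; () stands for the empty string.
GRAMMAR = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}

for var, bodies in eliminate_unit_productions(GRAMMAR).items():
    print(var, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

Unlike the slides, this sketch keeps T as a separate variable instead of merging it into S; the productions for S come out the same as in the "after" grammar above.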

20 Recap
Problem: If input is not in the language, parsing will never stop
Solution:
Eliminate ε-productions
Eliminate unit productions
(important to do in this order)
Try all possible derivations but stop parsing when |derived string| > |input|

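21 Example
S → 0S1 | 0S0S | T
T → S | 0
input: 0011
[Figure: tree of all derivations from S. The branches through 0S1 and 0S0S end either in a string of terminals different from the input (such as 001 and 0000) or in a derived string marked "too long" (such as 00S11, 00S0S1, 00S10S and 00S0S0S)]
conclusion: 0011 ∉ L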
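The "try all derivations, but stop when the derived string is too long" idea can be sketched as a breadth-first search over leftmost derivations. The code assumes the grammar has already been stripped of ε- and unit productions; for the example it uses S → 0S1 | 0S0S | 0, which is this slide's grammar with the unit productions removed by hand (an assumption of this sketch, not text from the slide). The name `derives` is invented here.

```python
from collections import deque


def derives(grammar, start, target):
    """Decide whether `start` derives `target` by exhaustive search.

    Assumes no epsilon-productions and no unit productions, so derived
    strings never shrink and the length cutoff is safe.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if only terminals remain.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            if "".join(form) == target:
                return True
            continue
        for body in grammar[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            # Cutoff: discard derived strings longer than the input.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return False


GRAMMAR = {"S": [("0", "S", "1"), ("0", "S", "0", "S"), ("0",)]}

print(derives(GRAMMAR, "S", "0011"))
print(derives(GRAMMAR, "S", "001"))
```

The search always terminates because only finitely many derived strings fit under the length bound, but that number can still be very large.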

22 Problems Trying all derivations may take a very long time
If input is not in the language, parsing will never stop

23 Preparations
A faster way to parse: the Cocke-Younger-Kasami algorithm
To use it we must prepare the CFG:
Eliminate ε-productions
Eliminate unit productions
Convert CFG to Chomsky Normal Form

24 Chomsky Normal Form
A CFG is in Chomsky Normal Form if every production* has the form A → BC or A → a
* Exception: We allow S → ε for start variable only
(named after Noam Chomsky)
Convert to Chomsky Normal Form:
A → BcDE
replace terminals with new variables: A → BCDE, C → c
break up sequences with new variables: A → BX, X → CY, Y → DE
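The two conversion steps can be sketched as follows, assuming ε- and unit productions are already gone. The naming scheme for new variables (`T_c` for the variable that produces terminal c, `X1`, `X2`, ... for the variables that break up long sequences) is invented for this example and assumed not to clash with existing variable names. The productions given to B, D and E below are placeholders so that the slide's production A → BcDE has something to refer to.

```python
def to_cnf(grammar):
    """Convert a grammar without epsilon/unit productions to Chomsky Normal Form."""
    variables = set(grammar)
    result = {}
    terminal_var = {}
    counter = 0

    def var_for_terminal(t):
        # Step 1: replace terminal t by a new variable T_t with T_t -> t.
        if t not in terminal_var:
            terminal_var[t] = f"T_{t}"
            result[terminal_var[t]] = [(t,)]
        return terminal_var[t]

    for var, bodies in grammar.items():
        result.setdefault(var, [])
        for body in bodies:
            if len(body) == 1:
                result[var].append(body)  # already of the form A -> a
                continue
            symbols = [s if s in variables else var_for_terminal(s)
                       for s in body]
            # Step 2: break up sequences longer than two with new variables.
            head = var
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result.setdefault(head, []).append((symbols[0], fresh))
                head, symbols = fresh, symbols[1:]
            result.setdefault(head, []).append(tuple(symbols))
    return result


GRAMMAR = {
    "A": [("B", "c", "D", "E")],  # the production from the slide
    "B": [("b",)],                # placeholder productions (hypothetical)
    "D": [("d",)],
    "E": [("e",)],
}

for var, bodies in to_cnf(GRAMMAR).items():
    print(var, "->", " | ".join(" ".join(b) for b in bodies))
```

Up to renaming (T_c for C, X1 for X, X2 for Y) this gives the productions A → BX, X → CY, Y → DE and C → c from the slide.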

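25 Cocke-Younger-Kasami algorithm
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Idea: We generate each substring of x bottom up
Table (each row lists the cells for substrings of one length, left to right by starting position; ∅ marks an empty cell):
length 5: SAC
length 4: ∅, SAC
length 3: ∅, B, B
length 2: SA, B, SC, SA
length 1: B, AC, AC, B, AC
input: b, a, a, b, a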

26 Parse tree reconstruction
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
[Figure: the table for x = baaba, with the cells used in one derivation of x from S highlighted]
Tracing back the derivations, we obtain the parse tree
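A parse tree can also be recovered with a memoized top-down search that asks, for each variable and substring, whether some production and split point works. This is a sketch of the same idea in a different form (recursion with caching rather than tracing back through a filled table), and the names `parse_tree` and `leaves` are invented for the example. The grammar is assumed to be in Chomsky Normal Form.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}


def parse_tree(grammar, start, x):
    """Return one parse tree for x as nested tuples, or None if x is not derivable."""

    @lru_cache(maxsize=None)
    def build(var, s, t):
        # Try to derive x[s:t] from var.
        for body in grammar[var]:
            if len(body) == 1 and t - s == 1 and body[0] == x[s]:
                return (var, x[s])                 # production var -> a
            if len(body) == 2:
                for j in range(s + 1, t):          # split x[s:t] into x[s:j], x[j:t]
                    left = build(body[0], s, j)
                    if left is None:
                        continue
                    right = build(body[1], j, t)
                    if right is not None:
                        return (var, left, right)  # production var -> BC
        return None

    return build(start, 0, len(x)) if x else None


def leaves(tree):
    """Concatenate the terminals at the leaves of a parse tree."""
    if len(tree) == 2:
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


tree = parse_tree(GRAMMAR, "S", "baaba")
print(tree)
print(leaves(tree))
```

Only the first tree found is returned; if the grammar is ambiguous for x, other trees may exist.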

27 Cocke-Younger-Kasami algorithm
Input: Grammar without ε and unit productions in Chomsky Normal Form
Input string x = x1…xk
Table cells: the last row holds cells 11, 22, …, kk; the row above holds 12, 23, …; the top cell is 1k
For all cells in last row:
If there is a production A → xi, put A in table cell ii
For cells st in other rows:
If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st
Cell ij remembers all possible derivations of substring xi…xj
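The table-filling procedure above can be sketched directly, with cells indexed by 1-based pairs (s, t) as on the slide. The dict-of-tuples grammar representation and the name `cyk_table` are assumptions for this example, and each cell stores only the set of variables (no backpointers for rebuilding a parse tree).

```python
def cyk_table(grammar, x):
    """Fill the CYK table for a nonempty string x and a grammar in Chomsky Normal Form.

    table[(s, t)] is the set of variables that derive x_s ... x_t (1-indexed).
    """
    k = len(x)
    table = {}
    # Last row: cell ii gets every A with a production A -> x_i.
    for i in range(1, k + 1):
        table[(i, i)] = {A for A, bodies in grammar.items()
                         if (x[i - 1],) in bodies}
    # Other rows, from shorter substrings to longer ones.
    for length in range(2, k + 1):
        for s in range(1, k - length + 2):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # split into x_s..x_j and x_(j+1)..x_t
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[(s, j)]
                                and body[1] in table[(j + 1, t)]):
                            cell.add(A)
            table[(s, t)] = cell
    return table


GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}

x = "baaba"
table = cyk_table(GRAMMAR, x)
print(sorted(table[(1, len(x))]))
print("S" in table[(1, len(x))])  # x is in the language iff S is in the top cell
```

The three nested loops over s, t and j give a running time cubic in the length of x (times the size of the grammar), in contrast to trying all derivations.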

