CYK (Cocke-Younger-Kasami) Parsing Algorithm


1 CYK (Cocke-Younger-Kasami) Parsing Algorithm
Amirkabir University of Technology, Department of Computer Engineering
CYK (Cocke-Younger-Kasami) Parsing Algorithm
Seyed Mohammad Hossein Moattar
Natural Language Processing

2 Parsing Algorithms
CFGs are the basis for describing the (syntactic) structure of NL sentences, so parsing algorithms are at the core of NL analysis systems.
Recognition vs. parsing:
Recognition: deciding whether the input is a member of the language.
Parsing: recognition plus producing a parse tree for the input.
Is parsing more "difficult" than recognition (in time complexity)?
Ambiguity: an input may have exponentially many parses.

3 Parsing Algorithms
Parsing general CFLs vs. limited forms.
Efficiency:
Deterministic (LR) languages can be parsed in linear time.
A number of parsing algorithms for general CFLs require O(n^3) time.
The asymptotically best known parsing algorithm for general CFLs requires O(n^2.37) time, but it is not practical.
Utility: why parse general grammars and not just CNF?
A grammar is intended to reflect the actual structure of the language.
Conversion to CNF changes the grammar, so its parse trees no longer reflect that intended structure.

4 CYK (Cocke-Younger-Kasami)
One of the earliest recognition and parsing algorithms.
The standard version of CYK works only with context-free grammars in Chomsky Normal Form (CNF). The algorithm can be extended to handle some grammars that are not in CNF, but the extended versions are harder to understand.
Based on a "dynamic programming" approach:
Build solutions compositionally from sub-solutions.
Store sub-solutions and reuse them whenever necessary.
Uses the grammar directly (no PDA is used).
Recognition version: decide whether S ⇒* w.

5 CYK Algorithm
The CYK algorithm for the membership problem is as follows. Let the input string be a sequence of n letters a1 ... an. Let the grammar contain r nonterminal symbols R1 ... Rr, and let R1 be the start symbol. Let P[n,n,r] be an array of booleans.
Initialize all elements of P to false.
For each i = 1 to n
  For each unit production Rj -> ai, set P[i,1,j] = true
For each i = 2 to n                -- length of span
  For each j = 1 to n-i+1          -- start of span
    For each k = 1 to i-1          -- partition of span
      For each production RA -> RB RC
        If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
If P[1,n,1] is true
  then the string is a member of the language
  else the string is not a member of the language
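The boolean-array version above can be sketched in Python as follows. The sketch assumes 0-based positions and a grammar passed as plain lists. The names `cyk_member`, `unit_rules` and `binary_rules` are hypothetical. `nonterminals[0]` plays the role of R1, the start symbol. Like the slide, the sketch assumes a non-empty input, so the usage lines encode grammar G from slide 8 without its S → ε rule.

```python
def cyk_member(word: str, nonterminals: list[str],
               unit_rules: list[tuple[str, str]],
               binary_rules: list[tuple[str, str, str]]) -> bool:
    """Boolean-array CYK recognizer with 0-based indices.
    unit_rules holds (A, a) for A -> a; binary_rules holds (A, B, C) for A -> B C."""
    n, r = len(word), len(nonterminals)
    if n == 0:
        raise ValueError("this version assumes a non-empty input")
    index = {name: j for j, name in enumerate(nonterminals)}
    # P[i][l][j] is True when nonterminals[j] derives the substring of
    # length l starting at position i (length index 0 is unused).
    P = [[[False] * r for _ in range(n + 1)] for _ in range(n)]
    for i, letter in enumerate(word):
        for lhs, terminal in unit_rules:
            if terminal == letter:
                P[i][1][index[lhs]] = True
    for length in range(2, n + 1):            # length of span
        for start in range(n - length + 1):   # start of span
            for k in range(1, length):        # partition: left part has length k
                for a, b, c in binary_rules:
                    if P[start][k][index[b]] and P[start + k][length - k][index[c]]:
                        P[start][length][index[a]] = True
    return P[0][n][0]

NONTERMINALS = ["S", "T", "X", "A", "B"]
UNIT = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "X", "B"), ("T", "A", "B"),
          ("T", "X", "B"), ("X", "A", "T")]
print(cyk_member("aaabbb", NONTERMINALS, UNIT, BINARY))  # True
print(cyk_member("aaabb", NONTERMINALS, UNIT, BINARY))   # False
```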

6 CYK Pseudocode
On input x = x1x2 … xn (table[i][j] holds the variables that derive xi+1 … xj):
for (i = 1 to n)                      // create middle diagonal
  for (each var. A)
    if (A → xi) add A to table[i-1][i]
for (d = 2 to n)                      // d'th diagonal
  for (i = 0 to n-d)
    for (k = i+1 to i+d-1)
      for (each var. B in table[i][k])
        for (each var. C in table[k][i+d])
          for (each var. A with A → BC)
            add A to table[i][i+d]
return S ∈ table[0][n] ? ACCEPT : REJECT
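Here is a minimal Python rendering of the set-based pseudocode above. It keeps the fencepost indexing: `table[i][j]` covers `x[i:j]`. It also adds an index from (B, C) pairs to the heads A of rules A → B C. The function name, the rule encoding and the `start_derives_empty` flag are assumptions made for this sketch. The flag stands in for S → ε, which the pseudocode does not treat.

```python
from collections import defaultdict

def cyk_accepts(x: str, start: str,
                unit_rules: list[tuple[str, str]],
                binary_rules: list[tuple[str, str, str]],
                start_derives_empty: bool = False) -> bool:
    n = len(x)
    if n == 0:
        return start_derives_empty  # CNF permits only start -> epsilon
    # table[i][j] holds the variables deriving x[i:j], i.e. x_{i+1} ... x_j
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                 # create middle diagonal
        for a, terminal in unit_rules:
            if terminal == x[i - 1]:
                table[i - 1][i].add(a)
    heads = defaultdict(set)                  # (B, C) -> {A : A -> B C}
    for a, b, c in binary_rules:
        heads[(b, c)].add(a)
    for d in range(2, n + 1):                 # d'th diagonal
        for i in range(n - d + 1):            # i = 0 .. n-d
            for k in range(i + 1, i + d):     # k = i+1 .. i+d-1
                for b in table[i][k]:
                    for c in table[k][i + d]:
                        table[i][i + d] |= heads.get((b, c), set())
    return start in table[0][n]

UNIT = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "X", "B"), ("T", "A", "B"),
          ("T", "X", "B"), ("X", "A", "T")]
print(cyk_accepts("aaabbb", "S", UNIT, BINARY, start_derives_empty=True))  # True
print(cyk_accepts("aaabb", "S", UNIT, BINARY, start_derives_empty=True))   # False
```

The `heads` index lets each (B, C) pair find its rules without scanning the whole grammar. Scanning every binary rule at each step, as the pseudocode does, gives the same result.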

7 CYK Algorithm
This algorithm considers every possible consecutive subsequence of the sequence of letters. It sets P[i,j,k] to true if the subsequence of length j starting at position i can be generated from Rk. Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts. It then checks whether some production A -> B C exists such that B matches the first part and C matches the second part. If so, it records A as matching the whole subsequence. Once this process is complete, the string is recognized by the grammar if the start symbol matches the subsequence containing the entire string.
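Counting the work done by these loops shows where the O(n^3) bound mentioned earlier comes from. For each span length ℓ there are n - ℓ + 1 start positions and ℓ - 1 split points. The number of (length, start, split) triples examined is therefore:

```latex
\sum_{\ell=2}^{n} (n-\ell+1)(\ell-1)
  = \sum_{m=1}^{n-1} m\,(n-m)
  = \frac{(n+1)\,n\,(n-1)}{6}
  = \binom{n+1}{3}
```

For aaabbb (n = 6) that is 35 triples. Each triple is checked against every binary production, so the running time grows as n^3 times the number of binary productions.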

8 CYK Algorithm for Deciding Context Free Languages
Q: Consider the grammar G given by
S → ε | AB | XB
T → AB | XB
X → AT
A → a
B → b
Is x = aaabb in L(G)? Is x = aaabbb in L(G)?
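One cross-check that does not use CYK is to enumerate L(G) for short lengths. From the rules, T ⇒ AB ⇒ ab and T ⇒ XB ⇒ ATB. So T derives exactly a^n b^n for n ≥ 1, and S adds ε, giving L(G) = {a^n b^n : n ≥ 0}. The Python sketch below confirms this by breadth-first leftmost derivation. The function name `language_up_to` and the string encoding of G are hypothetical. Pruning by length is safe only because the sole erasing rule is S → ε and S never reappears on a right-hand side.

```python
from collections import deque

# Grammar G: uppercase letters are variables, lowercase letters are terminals,
# and "" is the epsilon alternative of S.
RULES = {
    "S": ["", "AB", "XB"],
    "T": ["AB", "XB"],
    "X": ["AT"],
    "A": ["a"],
    "B": ["b"],
}

def language_up_to(max_len: int, start: str = "S") -> set[str]:
    """All strings of L(G) with length <= max_len, found by breadth-first
    leftmost derivation over sentential forms."""
    words: set[str] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, or None if the form is all terminals
        pos = next((i for i, sym in enumerate(form) if sym.isupper()), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in RULES[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Only S -> epsilon shrinks a form, and S never reappears,
            # so forms longer than max_len can never yield short words.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

print(sorted(language_up_to(6), key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

The result lists aaabbb but not aaabb, which matches the answers the CYK walkthrough reaches.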

9 CYK Algorithm for Deciding Context Free Languages
The algorithm is "bottom-up": we start at the bottom of the derivation tree, with the input symbols.
Input: a | a | a | b | b

10 CYK Algorithm for Deciding Context Free Languages
1) Write the variables for all length-1 substrings, using A → a and B → b.
Input: a | a | a | b | b
Length 1: A | A | A | B | B

11 CYK Algorithm for Deciding Context Free Languages
2) Write the variables for all length-2 substrings. Only ab (A followed by B) matches any rule, namely S → AB and T → AB.
Input: a | a | a | b | b
Length 1: A | A | A | B | B
Length 2: - | - | S,T | -

12 CYK Algorithm for Deciding Context Free Languages
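3) Write the variables for all length-3 substrings. The substring aab splits as A followed by T (from ab), which matches X → AT.
Input: a | a | a | b | b
Length 1: A | A | A | B | B
Length 2: - | - | S,T | -
Length 3: - | X | -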

13 CYK Algorithm for Deciding Context Free Languages
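4) Write the variables for all length-4 substrings. The substring aabb splits as X (from aab) followed by B, which matches S → XB and T → XB.
Input: a | a | a | b | b
Length 1: A | A | A | B | B
Length 2: - | - | S,T | -
Length 3: - | X | -
Length 4: - | S,T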

14 CYK Algorithm for Deciding Context Free Languages
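5) Write the variables for the length-5 substring. The string aaabb splits as A followed by T (from aabb), which gives only X. S is not included, so aaabb is REJECTED!
Input: a | a | a | b | b
Length 1: A | A | A | B | B
Length 2: - | - | S,T | -
Length 3: - | X | -
Length 4: - | S,T
Length 5: X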
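The triangle drawn in the steps above can be produced mechanically. In this Python sketch, `cyk_rows` builds one row per substring length, and cell i of row ℓ holds the variables that derive the ℓ letters starting at position i. `show` prints that triangle with "-" for empty cells. Both names and the dict encoding of G are hypothetical.

```python
def cyk_rows(x: str, unit: dict[str, set[str]],
             binary: dict[tuple[str, str], set[str]]) -> list[list[set[str]]]:
    """rows[l - 1][i] = set of variables deriving x[i:i + l]."""
    n = len(x)
    rows = [[set(unit.get(ch, set())) for ch in x]]   # length-1 row
    for length in range(2, n + 1):
        row = []
        for i in range(n - length + 1):
            cell: set[str] = set()
            for split in range(1, length):            # left part has `split` letters
                left = rows[split - 1][i]
                right = rows[length - split - 1][i + split]
                for b in left:
                    for c in right:
                        cell |= binary.get((b, c), set())
            row.append(cell)
        rows.append(row)
    return rows

def show(rows: list[list[set[str]]]) -> None:
    for length, row in enumerate(rows, start=1):
        cells = [",".join(sorted(cell)) or "-" for cell in row]
        print(f"length {length}: " + " | ".join(cells))

UNIT = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S", "T"}, ("X", "B"): {"S", "T"}, ("A", "T"): {"X"}}
show(cyk_rows("aaabb", UNIT, BINARY))
# length 1: A | A | A | B | B
# length 2: - | - | S,T | -
# length 3: - | X | -
# length 4: - | S,T
# length 5: X
```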

15 CYK Algorithm for Deciding Context Free Languages
Now look at aaabbb:
Input: a | a | a | b | b | b

16 CYK Algorithm for Deciding Context Free Languages
1) Write the variables for all length-1 substrings.
Input: a | a | a | b | b | b
Length 1: A | A | A | B | B | B

17 CYK Algorithm for Deciding Context Free Languages
2) Write the variables for all length-2 substrings.
Input: a | a | a | b | b | b
Length 1: A | A | A | B | B | B
Length 2: - | - | S,T | - | -

18 CYK Algorithm for Deciding Context Free Languages
3) Write the variables for all length-3 substrings.
Input: a | a | a | b | b | b
Length 1: A | A | A | B | B | B
Length 2: - | - | S,T | - | -
Length 3: - | X | - | -

19 CYK Algorithm for Deciding Context Free Languages
4) Write the variables for all length-4 substrings.
Input: a | a | a | b | b | b
Length 1: A | A | A | B | B | B
Length 2: - | - | S,T | - | -
Length 3: - | X | - | -
Length 4: - | S,T | -

20 CYK Algorithm for Deciding Context Free Languages
5) Write the variables for all length-5 substrings.
Input: a | a | a | b | b | b
Length 1: A | A | A | B | B | B
Length 2: - | - | S,T | - | -
Length 3: - | X | - | -
Length 4: - | S,T | -
Length 5: X | -

21 CYK Algorithm for Deciding Context Free Languages
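6) Write the variables for the length-6 substring. The string aaabbb splits as X (from aaabb) followed by B, which matches S → XB and T → XB. S is included, so aaabbb is accepted!
Input: a | a | a | b | b | b
Length 1: A | A | A | B | B | B
Length 2: - | - | S,T | - | -
Length 3: - | X | - | -
Length 4: - | S,T | -
Length 5: X | -
Length 6: S,T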

22 CYK Algorithm for Deciding Context Free Languages
We can also use a table for the same purpose. For x = aaabbb, the rows are labeled by the start position i (0 to 5) and the columns by the end position j (1 to 6). Cell (i, j) holds the variables that derive the substring xi+1 … xj.

23 CYK Algorithm for Deciding Context Free Languages
1. Variables for length-1 substrings: (0,1) = A, (1,2) = A, (2,3) = A, (3,4) = B, (4,5) = B, (5,6) = B.

24 CYK Algorithm for Deciding Context Free Languages
2. Variables for length-2 substrings: (0,2) = -, (1,3) = -, (2,4) = S,T, (3,5) = -, (4,6) = -.

25 CYK Algorithm for Deciding Context Free Languages
3. Variables for length-3 substrings: (0,3) = -, (1,4) = X, (2,5) = -, (3,6) = -.

26 CYK Algorithm for Deciding Context Free Languages
4. Variables for length-4 substrings: (0,4) = -, (1,5) = S,T, (2,6) = -.

27 CYK Algorithm for Deciding Context Free Languages
5. Variables for length-5 substrings: (0,5) = X, (1,6) = -.

28 CYK Algorithm for Deciding Context Free Languages
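6. Variables for aaabbb itself: (0,6) = S,T. Since S is in cell (0,6), aaabbb is ACCEPTED! The completed table:
start \ end | 1 | 2 | 3 | 4 | 5 | 6
0 | A | - | - | - | X | S,T
1 | | A | - | X | S,T | -
2 | | | A | S,T | - | -
3 | | | | B | - | -
4 | | | | | B | -
5 | | | | | | B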

29 Parsing results
We keep the results for every substring wij in a table.
We only need to fill in a triangular half of the table, because the longest substring starting at position i has length n-i+1.
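Since only a triangle of cells is ever used, one option is to store the cells in a flat list of n(n+1)/2 entries. This layout is an assumption for illustration; the slides do not prescribe it. In this Python sketch, `tri_index` is a hypothetical helper that uses 0-based start positions, so the longest substring starting at i has length n - i. Cells are grouped by start position and then ordered by length.

```python
def tri_index(i: int, length: int, n: int) -> int:
    """Flat index of the cell for the substring that starts at 0-based
    position i and has the given length (1 <= length <= n - i)."""
    if not (0 <= i < n and 1 <= length <= n - i):
        raise IndexError("cell lies outside the triangular table")
    # start positions 0 .. i-1 own n, n-1, ..., n-i+1 cells respectively
    return i * n - i * (i - 1) // 2 + (length - 1)

n = 6
cells = [None] * (n * (n + 1) // 2)   # 21 cells for aaabbb instead of 36
print(tri_index(0, 6, n), tri_index(5, 1, n))  # 5 20
```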

30 Constructing parse trees
We need to construct parse trees for the string w.
Idea: keep back-pointers to the table entries that we combine.
At the end, reconstruct a parse from the back-pointers.
Keeping every back-pointer allows us to find all parse trees.
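Here is a minimal Python sketch of the back-pointer idea. It hard-codes grammar G from slide 8 as Python data, and the names `cyk_parse`, `back` and `build` are hypothetical. Every way of building each cell is recorded. However, `build` follows only the first pointer, so it returns a single tree as nested tuples. The empty string (S → ε) is not handled.

```python
from collections import defaultdict

UNIT = {"a": {"A"}, "b": {"B"}}
BINARY = [("S", "A", "B"), ("S", "X", "B"), ("T", "A", "B"),
          ("T", "X", "B"), ("X", "A", "T")]

def cyk_parse(x: str, start: str = "S"):
    n = len(x)
    # back[(i, j)][A] lists how A derives x[i:j]: either the terminal itself,
    # or (k, B, C) meaning A -> B C with B over x[i:k] and C over x[k:j].
    back = defaultdict(dict)
    for i, ch in enumerate(x):
        for a in UNIT.get(ch, ()):
            back[(i, i + 1)][a] = [ch]
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in back[(i, k)] and c in back[(k, j)]:
                        back[(i, j)].setdefault(a, []).append((k, b, c))

    def build(a, i, j):
        """Rebuild one tree for A over x[i:j] by following the first back-pointer."""
        ptr = back[(i, j)][a][0]
        if isinstance(ptr, str):
            return (a, ptr)          # leaf: A -> terminal
        k, b, c = ptr
        return (a, build(b, i, k), build(c, k, j))

    return build(start, 0, n) if start in back[(0, n)] else None

print(cyk_parse("ab"))     # ('S', ('A', 'a'), ('B', 'b'))
print(cyk_parse("aaabb"))  # None
```

Recursing over every pointer in a cell, instead of only the first one, would enumerate all parse trees.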

31 Ambiguity: Efficient Representation of Ambiguities
Local ambiguity packing:
A local ambiguity means there are multiple ways to derive the same substring from a nonterminal.
All possible ways to derive each nonterminal are stored together.
When creating back-pointers, create a single back-pointer to the "packed" representation.
This allows a very large number of ambiguities (even exponentially many) to be represented efficiently.
Unpacking: producing one or more of the packed parse trees by following the back-pointers.
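A deliberately ambiguous CNF grammar, S → S S | a, shows why packing matters. This grammar is a hypothetical example, not from the slides. For a^n it has C(n-1) parse trees, where C is the Catalan numbers, so the count grows exponentially. The Python sketch below stores one packed list of back-pointers for each (nonterminal, start, end). It counts trees by following the pointers with memoization instead of unpacking them. The name `count_parses` is hypothetical.

```python
from functools import lru_cache

# Hypothetical ambiguous CNF grammar: S -> S S | a
UNIT = {"a": {"S"}}
BINARY = [("S", "S", "S")]

def count_parses(x: str, start: str = "S") -> int:
    n = len(x)
    # packed[(A, i, j)]: every way A derives x[i:j], stored together.
    # None marks a leaf A -> x[i]; (k, B, C) marks A -> B C split at k.
    packed: dict[tuple[str, int, int], list] = {}
    for i, ch in enumerate(x):
        for a in UNIT.get(ch, ()):
            packed[(a, i, i + 1)] = [None]
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if (b, i, k) in packed and (c, k, j) in packed:
                        packed.setdefault((a, i, j), []).append((k, b, c))

    @lru_cache(maxsize=None)
    def trees(a: str, i: int, j: int) -> int:
        """Number of trees for A over x[i:j], computed from the packed pointers."""
        total = 0
        for ptr in packed[(a, i, j)]:
            if ptr is None:
                total += 1
            else:
                k, b, c = ptr
                total += trees(b, i, k) * trees(c, k, j)
        return total

    return trees(start, 0, n) if (start, 0, n) in packed else 0

print(count_parses("aaaa"))    # 5
print(count_parses("a" * 20))  # 1767263190
```

For a^20, the packed structure holds only 1330 binary back-pointers plus 20 leaves, yet it represents over 1.7 billion parse trees.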

32 References
Hopcroft and Ullman, "Introduction to Automata Theory, Languages, and Computation", Section 6.3, pp
"CYK algorithm", Wikipedia, the free encyclopedia
A presentation by Zeph Grunschlag

