
CYK (Cocke-Younger-Kasami) Parsing Algorithm
Seyed Mohammad Hossein Moattar
Natural Language Processing, Department of Computer Engineering, Amirkabir University of Technology


2 Parsing Algorithms
• CFGs are the basis for describing the (syntactic) structure of natural-language (NL) sentences.
• Thus, parsing algorithms are at the core of NL analysis systems.
• Recognition vs. parsing:
– Recognition: deciding whether an input is a member of the language.
– Parsing: recognition plus producing a parse tree for the input.
• Is parsing more "difficult" than recognition (in time complexity)?
• Ambiguity: an input may have exponentially many parses.

3 Parsing Algorithms
• Parsing general CFLs vs. limited forms
• Efficiency:
– Deterministic (LR) languages can be parsed in linear time.
– A number of parsing algorithms for general CFLs require O(n^3) time.
– The asymptotically best parsing algorithm for general CFLs requires O(n^2.37), but it is not practical.
• Utility: why parse general grammars and not just CNF?
– The grammar is intended to reflect the actual structure of the language.
– Conversion to CNF destroys the parse structure that the original grammar was written to express.
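A quick count of where the O(n^3) figure comes from for CYK, assuming a grammar in CNF: a span of length ℓ has n − ℓ + 1 possible start positions and ℓ − 1 split points, so the number of (span, split) pairs the algorithm examines is:

```latex
\sum_{\ell=2}^{n} \underbrace{(n-\ell+1)}_{\text{start positions}} \, \underbrace{(\ell-1)}_{\text{split points}}
  = \frac{n^3 - n}{6} = O(n^3)
```

Each pair is tested against the binary productions, so recognition takes O(|G| · n^3) time for a grammar with |G| productions. For example, n = 10 gives (1000 − 10)/6 = 165 pairs.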

4 CYK (Cocke-Younger-Kasami)
• One of the earliest recognition and parsing algorithms
• The standard version of CYK can only recognize languages defined by context-free grammars in Chomsky Normal Form (CNF).
• The CYK algorithm can also be extended to handle some grammars that are not in CNF.
– Such extensions are harder to understand.
• Based on a "dynamic programming" approach:
– Build solutions compositionally from sub-solutions.
– Store sub-solutions and reuse them whenever necessary.
• Uses the grammar directly (no PDA is used).
• Recognition version: decide whether S ⇒* w.

5 CYK Algorithm
• The CYK algorithm for the membership problem is as follows:
– Let the input string be a sequence of n letters a1 ... an.
– Let the grammar contain r nonterminal symbols R1 ... Rr, and let R1 be the start symbol.
– Let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
– For each i = 1 to n:
    For each unit production Rj → ai, set P[i,1,j] = true.
– For each i = 2 to n (length of span):
    For each j = 1 to n-i+1 (start of span):
      For each k = 1 to i-1 (partition of span):
        For each production RA → RB RC:
          If P[j,k,B] and P[j+k,i-k,C], then set P[j,i,A] = true.
– If P[1,n,1] is true, then the string is a member of the language; otherwise it is not.
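The steps above translate almost line for line into Python. This is a sketch under stated assumptions: `cyk_recognize` and the tuple-based production lists are hypothetical names chosen for illustration, nonterminal names are mapped to the indices R1 ... Rr, and, like the slide, it assumes a non-empty input. Index 0 of the first two dimensions is left unused so that P[j][i][A] keeps the slide's 1-based meaning (substring starting at j, of length i).

```python
def cyk_recognize(word: str, nonterminals: list[str],
                  unit: list[tuple[str, str]],
                  binary: list[tuple[str, str, str]]) -> bool:
    """Boolean-array CYK. nonterminals[0] plays the role of R1 (the start symbol).
    unit holds (Rj, a) for Rj -> a; binary holds (RA, RB, RC) for RA -> RB RC."""
    n, r = len(word), len(nonterminals)
    if n == 0:
        raise ValueError("this version assumes a non-empty input")
    idx = {name: k for k, name in enumerate(nonterminals)}
    # P[j][i][A]: the substring of length i starting at position j (1-based)
    # can be generated from nonterminal A. Index 0 of the first two axes is unused.
    P = [[[False] * r for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for lhs, terminal in unit:
            if word[i - 1] == terminal:
                P[i][1][idx[lhs]] = True
    for i in range(2, n + 1):              # length of span
        for j in range(1, n - i + 2):      # start of span
            for k in range(1, i):          # partition of span
                for a, b, c in binary:     # production A -> B C
                    if P[j][k][idx[b]] and P[j + k][i - k][idx[c]]:
                        P[j][i][idx[a]] = True
    return P[1][n][0]


# Example grammar from the slides (S -> epsilon omitted: inputs here are non-empty).
NONTERMINALS = ["S", "T", "X", "A", "B"]
UNIT = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "X", "B"), ("T", "A", "B"),
          ("T", "X", "B"), ("X", "A", "T")]

print(cyk_recognize("aaabbb", NONTERMINALS, UNIT, BINARY))  # True
print(cyk_recognize("aaabb", NONTERMINALS, UNIT, BINARY))   # False
```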

6 CYK Pseudocode
On input x = x1 x2 … xn, where table[i][j] holds the variables that derive the substring x(i+1) … x(j):
  for (i = 1 to n)                          // create middle diagonal
    for (each var. A)
      if (A → xi) add A to table[i-1][i]
  for (d = 2 to n)                          // d'th diagonal
    for (i = 0 to n-d)
      for (k = i+1 to i+d-1)                // split point
        for (each var. A)
          for (each var. B in table[i][k])
            for (each var. C in table[k][i+d])
              if (A → BC) add A to table[i][i+d]
  return S ∈ table[0][n] ? ACCEPT : REJECT

7 CYK Algorithm
• This algorithm considers every possible contiguous substring of the input and sets P[i,j,k] to true if the substring of length j starting at position i can be generated from Rk.
• Once it has considered substrings of length 1, it goes on to substrings of length 2, and so on.
• For substrings of length 2 and greater, it considers every possible partition of the substring into two parts and checks whether there is some production A → B C such that B matches the first part and C matches the second part. If so, it records A as matching the whole substring.
• Once this process is complete, the sentence is recognized by the grammar if the substring containing the entire string is matched by the start symbol.
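Written as a recurrence in the notation above (P[i,j,A] is true when the substring of length j starting at position i derives from A, and a_i is the i-th input letter), the procedure reads:

```latex
\begin{aligned}
P[i,1,A] &= \text{true} \iff (A \to a_i) \in G \\
P[i,j,A] &= \bigvee_{k=1}^{j-1} \; \bigvee_{(A \to B\,C) \in G} \bigl( P[i,k,B] \wedge P[i+k,\, j-k,\, C] \bigr),
  \qquad 2 \le j \le n,\ 1 \le i \le n-j+1 \\
w \in L(G) &\iff P[1,n,S] = \text{true} \qquad (w \ne \varepsilon)
\end{aligned}
```

Filling P in order of increasing j guarantees that both operands on the right-hand side are already computed, since k and j − k are both smaller than j.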

8 CYK Algorithm for Deciding Context-Free Languages
Q: Consider the grammar G given by
  S → ε | AB | XB
  T → AB | XB
  X → AT
  A → a
  B → b
1. Is x = aaabb in L(G)?
2. Is x = aaabbb in L(G)?

9 CYK Algorithm for Deciding Context-Free Languages
The algorithm is "bottom-up" in that we start with the bottom of the derivation tree.
Input: aaabb

10 CYK Algorithm for Deciding Context-Free Languages
1) Write the variables for all length-1 substrings.
Length 1: a a a b b → A A A B B

11 CYK Algorithm for Deciding Context-Free Languages
2) Write the variables for all length-2 substrings.
Length 2: aa → -, aa → -, ab → S,T, bb → -

12 CYK Algorithm for Deciding Context-Free Languages
3) Write the variables for all length-3 substrings.
Length 3: aaa → -, aab → X, abb → -

13 CYK Algorithm for Deciding Context-Free Languages
4) Write the variables for all length-4 substrings.
Length 4: aaab → -, aabb → S,T

14 CYK Algorithm for Deciding Context-Free Languages
5) Write the variables for all length-5 substrings.
Length 5: aaabb → X
S is not in the top cell, so aaabb is REJECTED!
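To reproduce this trace, a short Python sketch that builds the table row by row, where row ℓ holds the cells for substrings of length ℓ, as in the slides. `cyk_rows` and `GRAMMAR` are hypothetical names; the grammar is the one from slide 8, encoded as a dict of right-hand-side tuples with () for ε.

```python
# Hypothetical encoding of the slide-8 grammar: nonterminal -> right-hand-side tuples.
GRAMMAR = {
    "S": [(), ("A", "B"), ("X", "B")],
    "T": [("A", "B"), ("X", "B")],
    "X": [("A", "T")],
    "A": [("a",)],
    "B": [("b",)],
}


def cyk_rows(word, grammar):
    """rows[length - 1][i] = set of variables deriving the substring of that length at i."""
    n = len(word)
    # Row 1: variables A with a unit production A -> letter.
    rows = [[{a for a, rhss in grammar.items() if (ch,) in rhss} for ch in word]]
    for length in range(2, n + 1):
        row = []
        for i in range(n - length + 1):
            cell = set()
            for left_len in range(1, length):  # right part has length - left_len letters
                left = rows[left_len - 1][i]
                right = rows[length - left_len - 1][i + left_len]
                for a, rhss in grammar.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right for r in rhss):
                        cell.add(a)
            row.append(cell)
        rows.append(row)
    return rows


for length, row in enumerate(cyk_rows("aaabb", GRAMMAR), start=1):
    print(length, [",".join(sorted(cell)) or "-" for cell in row])
```

It prints one row per substring length, matching slides 10 to 14:
1 ['A', 'A', 'A', 'B', 'B']
2 ['-', '-', 'S,T', '-']
3 ['-', 'X', '-']
4 ['-', 'S,T']
5 ['X']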

15 CYK Algorithm for Deciding Context-Free Languages
Now look at aaabbb:

16 CYK Algorithm for Deciding Context-Free Languages
1) Write the variables for all length-1 substrings.
Length 1: a a a b b b → A A A B B B

17 CYK Algorithm for Deciding Context-Free Languages
2) Write the variables for all length-2 substrings.
Length 2: aa → -, aa → -, ab → S,T, bb → -, bb → -

18 CYK Algorithm for Deciding Context-Free Languages
3) Write the variables for all length-3 substrings.
Length 3: aaa → -, aab → X, abb → -, bbb → -

19 CYK Algorithm for Deciding Context-Free Languages
4) Write the variables for all length-4 substrings.
Length 4: aaab → -, aabb → S,T, abbb → -

20 CYK Algorithm for Deciding Context-Free Languages
5) Write the variables for all length-5 substrings.
Length 5: aaabb → X, aabbb → -

21 CYK Algorithm for Deciding Context-Free Languages
6) Write the variables for all length-6 substrings.
Length 6: aaabbb → S,T
S is included, so aaabbb is ACCEPTED!
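As a cross-check, the entries found for aaabbb spell out a leftmost derivation of the string in G (each step rewrites the leftmost variable):

```latex
\begin{aligned}
S &\Rightarrow XB \Rightarrow ATB \Rightarrow aTB \Rightarrow aXBB \Rightarrow aATBB \Rightarrow aaTBB \\
  &\Rightarrow aaABBB \Rightarrow aaaBBB \Rightarrow aaabBB \Rightarrow aaabbB \Rightarrow aaabbb
\end{aligned}
```

Read bottom-up, T covers ab (positions 3 to 4), X covers aab (2 to 4), T covers aabb (2 to 5), X covers aaabb (1 to 5), and S covers the whole string via S → XB, which are exactly the non-empty cells recorded on slides 17 to 21.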

22 CYK Algorithm for Deciding Context-Free Languages
We can also use a table for the same purpose. Cell [i][j] (start at i, end at j) stands for the substring of aaabbb between fenceposts i and j:
start\end  1       2       3       4       5       6
0          a       aa      aaa     aaab    aaabb   aaabbb
1                  a       aa      aab     aabb    aabbb
2                          a       ab      abb     abbb
3                                  b       bb      bbb
4                                          b       bb
5                                                  b

23 CYK Algorithm for Deciding Context-Free Languages
1. Variables for length-1 substrings (the diagonal): [0][1] = A, [1][2] = A, [2][3] = A, [3][4] = B, [4][5] = B, [5][6] = B

24 CYK Algorithm for Deciding Context-Free Languages
2. Variables for length-2 substrings: [0][2] = -, [1][3] = -, [2][4] = S,T, [3][5] = -, [4][6] = -

25 CYK Algorithm for Deciding Context-Free Languages
3. Variables for length-3 substrings: [0][3] = -, [1][4] = X, [2][5] = -, [3][6] = -

26 CYK Algorithm for Deciding Context-Free Languages
4. Variables for length-4 substrings: [0][4] = -, [1][5] = S,T, [2][6] = -

27 CYK Algorithm for Deciding Context-Free Languages
5. Variables for length-5 substrings: [0][5] = X, [1][6] = -

28 CYK Algorithm for Deciding Context-Free Languages
6. Variables for aaabbb (length 6): [0][6] = S,T. ACCEPTED!
Completed table:
start\end  1    2    3    4    5    6
0          A    -    -    -    X    S,T
1               A    -    X    S,T  -
2                    A    S,T  -    -
3                         B    -    -
4                              B    -
5                                   B

29 Parsing Results
• We keep the results for every substring wij in a table.
• Note that we only need to fill in the entries up to the diagonal: the longest substring starting at i has length n-i+1.

30 Constructing the Parse Tree
• We need to construct parse trees for the string w.
• Idea:
– Keep back-pointers to the table entries that we combine.
– At the end, reconstruct a parse from the back-pointers.
• This allows us to find all parse trees.
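A minimal Python sketch of the back-pointer idea, assuming the same dict-of-tuples grammar encoding used for the example. Each key (i, j, A) maps to the list of ways A was built over x[i:j], either a ("leaf", letter) entry or a split (k, B, C), and `trees` follows these pointers to enumerate every parse tree as nested tuples. `cyk_with_backpointers` and `trees` are hypothetical names. Enumerating all trees can take exponential time on highly ambiguous input, which is the problem local ambiguity packing addresses.

```python
# Hypothetical encoding of the slide-8 grammar: nonterminal -> right-hand-side tuples.
GRAMMAR = {
    "S": [(), ("A", "B"), ("X", "B")],
    "T": [("A", "B"), ("X", "B")],
    "X": [("A", "T")],
    "A": [("a",)],
    "B": [("b",)],
}


def cyk_with_backpointers(x, grammar):
    """back[(i, j, A)] lists every way A was built over x[i:j]:
    ("leaf", letter) for A -> letter, or (k, B, C) for A -> B C split at k."""
    n = len(x)
    back = {}
    for i, ch in enumerate(x):
        for a, rhss in grammar.items():
            if (ch,) in rhss:
                back.setdefault((i, i + 1, a), []).append(("leaf", ch))
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for a, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and (i, k, rhs[0]) in back
                                and (k, j, rhs[1]) in back):
                            back.setdefault((i, j, a), []).append((k, rhs[0], rhs[1]))
    return back


def trees(back, i, j, a):
    """Follow back-pointers to yield every parse tree of A over x[i:j]."""
    for entry in back.get((i, j, a), []):
        if entry[0] == "leaf":
            yield (a, entry[1])
        else:
            k, b, c = entry
            for left in trees(back, i, k, b):
                for right in trees(back, k, j, c):
                    yield (a, left, right)


back = cyk_with_backpointers("aaabbb", GRAMMAR)
for t in trees(back, 0, 6, "S"):
    print(t)
```

For aaabbb this yields exactly one tree, rooted at S with children X (over aaabb) and B (over the final b); for aaabb it yields none.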

31 Ambiguity: Efficient Representation of Ambiguities
• Local ambiguity packing:
– A local ambiguity means there are multiple ways to derive the same substring from a nonterminal.
– All possible ways to derive each nonterminal are stored together.
– When creating back-pointers, create a single back-pointer to the "packed" representation.
• This allows a very large number of ambiguities (even exponentially many) to be represented efficiently.
• Unpacking: producing one or more of the packed parse trees by following the back-pointers.
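Counting parses directly on the packed representation shows why packing pays off. In this sketch (hypothetical names, same dict-of-tuples grammar encoding), each key (i, j, A) appears once and lists all of its alternatives, and `count_trees` sums and multiplies counts over those alternatives with memoization instead of unpacking every tree. The ambiguous grammar S → SS | a is an illustrative assumption, not taken from the slides: for a string of ten a's, its packed forest has 55 entries (one per span), yet it represents 4862 distinct parse trees, the Catalan number C9.

```python
# Illustrative ambiguous CNF grammar (an assumption, not from the slides): S -> S S | a
AMBIGUOUS = {"S": [("S", "S"), ("a",)]}


def packed_forest(x, grammar):
    """One entry per (i, j, A); its list packs every way A derives x[i:j]."""
    n = len(x)
    forest = {}
    for i, ch in enumerate(x):
        for a, rhss in grammar.items():
            if (ch,) in rhss:
                forest.setdefault((i, i + 1, a), []).append(("leaf", ch))
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for a, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and (i, k, rhs[0]) in forest
                                and (k, j, rhs[1]) in forest):
                            forest.setdefault((i, j, a), []).append((k, rhs[0], rhs[1]))
    return forest


def count_trees(forest, i, j, a, memo=None):
    """Number of parse trees packed under (i, j, A), computed without unpacking."""
    if memo is None:
        memo = {}
    key = (i, j, a)
    if key not in memo:
        total = 0
        for alt in forest.get(key, []):
            if alt[0] == "leaf":
                total += 1
            else:
                k, b, c = alt
                total += (count_trees(forest, i, k, b, memo)
                          * count_trees(forest, k, j, c, memo))
        memo[key] = total
    return memo[key]


forest = packed_forest("a" * 10, AMBIGUOUS)
print(len(forest))                      # 55 packed entries
print(count_trees(forest, 0, 10, "S"))  # 4862 parse trees
```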

32 References
• Hopcroft and Ullman, Introduction to Automata Theory, Languages, and Computation, Section 6.3, pp. 139–141.
• "CYK algorithm", Wikipedia, the free encyclopedia.
• A presentation by Zeph Grunschlag.
