Published by Basil Long. Modified about 1 year ago.

1
CYK (Cocke-Younger-Kasami) Parsing Algorithm
Seyed Mohammad Hossein Moattar
Natural Language Processing
Amirkabir University of Technology, Computer Engineering Department

2
Parsing Algorithms
• CFGs are the basis for describing the (syntactic) structure of NL sentences.
• Thus parsing algorithms are at the core of NL analysis systems.
• Recognition vs. parsing:
  – Recognition: deciding membership in the language.
  – Parsing: recognition plus producing a parse tree for the input.
• Is parsing more “difficult” than recognition (in time complexity)?
• Ambiguity: an input may have exponentially many parses.

3
Parsing Algorithms
• Parsing general CFLs vs. limited forms
• Efficiency:
  – Deterministic (LR) languages can be parsed in linear time.
  – A number of parsing algorithms for general CFLs require O(n^3) time.
  – The asymptotically best known parsing algorithm for general CFLs (Valiant's reduction to Boolean matrix multiplication) requires O(n^2.37), but is not practical.
• Utility: why parse general grammars and not just CNF?
  – The grammar is intended to reflect the actual structure of the language.
  – Conversion to CNF changes the parse trees (for example, long rules are split using new intermediate nonterminals), so the intended structure is lost unless it is reconstructed afterwards.
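To make the last point concrete, here is a minimal Python sketch of just the binarization step of CNF conversion (full conversion also removes ε-rules and unit rules and replaces terminals inside long rules). The rule encoding and the fresh-name scheme `LHS_n` are illustrative assumptions, not part of the slides.

```python
from itertools import count


def binarize(rules):
    """Replace each rule with more than two right-hand-side symbols by a
    chain of binary rules (the binarization step of CNF conversion).

    rules: list of (lhs, rhs) pairs, rhs being a tuple of symbols.
    Fresh nonterminals are named f"{lhs}_{n}", a hypothetical scheme that
    assumes no existing symbol already uses such a name.
    """
    fresh = count(1)
    out = []
    for lhs, rhs in rules:
        current = lhs
        while len(rhs) > 2:
            new_nt = f"{lhs}_{next(fresh)}"
            out.append((current, (rhs[0], new_nt)))  # current -> first new_nt
            current, rhs = new_nt, rhs[1:]
        out.append((current, rhs))
    return out


print(binarize([("VP", ("V", "NP", "PP"))]))
# [('VP', ('V', 'VP_1')), ('VP_1', ('NP', 'PP'))]
```

A parse of `V NP PP` under the binarized grammar contains a `VP_1` node that the original grammar never mentions; the flat `VP` the grammar writer intended has to be recovered by splicing such nodes back out.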

4
CYK (Cocke-Younger-Kasami)
• One of the earliest recognition and parsing algorithms
• The standard version of CYK works only with context-free grammars in Chomsky Normal Form (CNF). Since every context-free language (ignoring ε) has a CNF grammar, this restricts the form of the grammar, not the class of languages.
• It is also possible to extend the CYK algorithm to handle some grammars that are not in CNF
  – Harder to understand
• Based on a “dynamic programming” approach:
  – Build solutions compositionally from sub-solutions
  – Store sub-solutions and re-use them whenever necessary
• Uses the grammar directly (no PDA is used)
• Recognition version: decide whether S ⇒* w
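Because standard CYK needs its input grammar in CNF, a natural first step is to check that condition. The sketch below assumes one common definition (A → B C with B, C not the start symbol, A → a, and start → ε); some textbook definitions exclude ε altogether. The dictionary encoding of the grammar is an illustrative assumption.

```python
def is_cnf(grammar, start):
    """Check whether `grammar` is in Chomsky Normal Form.

    Assumed encoding: dict mapping each nonterminal to a list of right-hand
    sides (tuples); symbols that are not keys are terminals, and () is an
    epsilon rule. Allowed forms: A -> B C (B, C nonterminals other than the
    start symbol), A -> a, and start -> epsilon.
    """
    nonterminals = set(grammar)
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) == 2 and all(s in nonterminals and s != start for s in rhs):
                continue  # A -> B C
            if len(rhs) == 1 and rhs[0] not in nonterminals:
                continue  # A -> a
            if rhs == () and lhs == start:
                continue  # S -> epsilon
            return False
    return True


# The example grammar used later in these slides.
G = {
    "S": [(), ("A", "B"), ("X", "B")],
    "T": [("A", "B"), ("X", "B")],
    "X": [("A", "T")],
    "A": [("a",)],
    "B": [("b",)],
}
print(is_cnf(G, "S"))                                       # True
print(is_cnf({"S": [("A", "S"), ()], "A": [("a",)]}, "S"))  # False: S on a right-hand side
```

In the example grammar, T repeats the right-hand sides of S, which appears to be what keeps S off every right-hand side and so permits S → ε.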

5
CYK Algorithm
• The CYK algorithm for the membership problem is as follows:
  – Let the input string be a sequence of n letters a1 ... an.
  – Let the grammar contain r nonterminal symbols R1 ... Rr, and let R1 be the start symbol.
  – Let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
  – For each i = 1 to n
        For each unit production Rj -> ai, set P[i,1,j] = true
  – For each i = 2 to n                      -- length of span
        For each j = 1 to n-i+1              -- start of span
            For each k = 1 to i-1            -- partition of span
                For each production RA -> RB RC
                    If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
  – If P[1,n,1] is true
        then the string is a member of the language
        else the string is not a member of the language
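The membership procedure above translates almost line for line into Python. This is a minimal sketch: the function name and the encoding of unit and binary rules are illustrative assumptions, and array index 0 is left unused so the 1-based indices of the pseudocode carry over unchanged.

```python
def cyk_member(word, nonterminals, unit_rules, binary_rules):
    """Boolean-array CYK recognizer following the membership pseudocode.

    nonterminals: list [R1, ..., Rr]; nonterminals[0] is the start symbol R1.
    unit_rules:   set of (A, a) pairs, one per rule A -> a.
    binary_rules: list of (A, B, C) triples, one per rule A -> B C.
    Index 0 of the first two array dimensions is unused (1-based indices).
    """
    n, r = len(word), len(nonterminals)
    if n == 0:
        return False  # the pseudocode assumes n >= 1; check S -> epsilon separately
    sym = {name: k for k, name in enumerate(nonterminals)}
    # P[j][i][k] is True iff R_k derives the span of length i starting at j.
    P = [[[False] * r for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for A, a in unit_rules:
            if a == word[i - 1]:
                P[i][1][sym[A]] = True
    for i in range(2, n + 1):                  # length of span
        for j in range(1, n - i + 2):          # start of span
            for k in range(1, i):              # partition of span
                for A, B, C in binary_rules:
                    if P[j][k][sym[B]] and P[j + k][i - k][sym[C]]:
                        P[j][i][sym[A]] = True
    return P[1][n][0]


# Example grammar from these slides, without the S -> epsilon rule.
NONTERMINALS = ["S", "T", "X", "A", "B"]
UNIT_RULES = {("A", "a"), ("B", "b")}
BINARY_RULES = [("S", "A", "B"), ("S", "X", "B"),
                ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]

print(cyk_member("aaabb", NONTERMINALS, UNIT_RULES, BINARY_RULES))   # False
print(cyk_member("aaabbb", NONTERMINALS, UNIT_RULES, BINARY_RULES))  # True
```

The innermost body runs once per (i, j, k, rule), which is where the O(n^3 · |G|) running time comes from.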

6
CYK Pseudocode
On input x = x1 x2 … xn:
for (i = 1 to n)                               // create middle diagonal
    for (each var. A)
        if (A → xi) add A to table[i-1][i]
for (d = 2 to n)                               // d'th diagonal
    for (i = 0 to n-d)
        for (k = i+1 to i+d-1)
            for (each var. A)
                for (each var. B in table[i][k])
                    for (each var. C in table[k][i+d])
                        if (A → BC) add A to table[i][i+d]
return S ∈ table[0][n] ? ACCEPT : REJECT
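A runnable Python version of this set-valued, fencepost-indexed table might look as follows. It is a sketch under assumed encodings (a terminal-to-variables dictionary and a list of binary rules); looping over the binary rules and testing membership of B and C is equivalent to the pseudocode's loops over A, B and C, just shorter to write.

```python
def cyk_table(x, unit_rules, binary_rules):
    """Fill the CYK table using fencepost indices.

    table[i][j] is the set of variables that derive x[i:j], 0 <= i < j <= n.
    unit_rules:   dict mapping a terminal a to the set of A with A -> a.
    binary_rules: list of (A, B, C) triples, one per rule A -> B C.
    """
    n = len(x)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                    # create middle diagonal
        table[i - 1][i] |= unit_rules.get(x[i - 1], set())
    for d in range(2, n + 1):                    # d'th diagonal
        for i in range(0, n - d + 1):
            for k in range(i + 1, i + d):        # split x[i:i+d] at k
                for A, B, C in binary_rules:
                    if B in table[i][k] and C in table[k][i + d]:
                        table[i][i + d].add(A)
    return table


def accepts(x, start, unit_rules, binary_rules, start_derives_empty=False):
    """ACCEPT iff start is in table[0][n]; the empty input is handled apart."""
    if not x:
        return start_derives_empty
    return start in cyk_table(x, unit_rules, binary_rules)[0][len(x)]


# Example grammar from these slides (S -> epsilon handled by the flag).
UNIT_RULES = {"a": {"A"}, "b": {"B"}}
BINARY_RULES = [("S", "A", "B"), ("S", "X", "B"),
                ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]

print(accepts("aaabbb", "S", UNIT_RULES, BINARY_RULES))            # True
print(sorted(cyk_table("aaabbb", UNIT_RULES, BINARY_RULES)[0][6]))  # ['S', 'T']
```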

7
CYK Algorithm
• This algorithm considers every possible consecutive subsequence of the input letters and sets P[i,j,k] to true if the subsequence of length j starting at position i can be generated from Rk.
• Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on.
• For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts and checks whether there is some production A -> B C such that B matches the first part and C matches the second part. If so, it records A as matching the whole subsequence.
• Once this process is complete, the string is recognized by the grammar if the subsequence containing the entire string is matched by the start symbol.
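The same procedure can be written as a recurrence. The block below uses the index letters of the membership pseudocode (j = start of span, i = length of span, A, B, C = indices of nonterminals R_A, R_B, R_C), and then counts how many (start, length, split) triples the loops visit.

```latex
\begin{aligned}
P[j,1,A] &= \text{true} \iff (R_A \to a_j) \in G, && 1 \le j \le n,\\
P[j,i,A] &= \bigvee_{k=1}^{i-1} \;\bigvee_{(R_A \to R_B R_C) \in G}
  \big( P[j,k,B] \wedge P[j+k,\, i-k,\, C] \big), && 2 \le i \le n,\; 1 \le j \le n-i+1,\\
w \in L(G) &\iff P[1,n,1].\\[4pt]
\#\{(i,j,k)\} &= \sum_{i=2}^{n} (n-i+1)(i-1) = \binom{n+1}{3} = \frac{n^3-n}{6}.
\end{aligned}
```

Each triple checks every binary rule once, giving the O(n^3 · |G|) bound; for example, n = 3 gives 4 triples.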

8
CYK Algorithm for Deciding Context Free Languages
Q: Consider the grammar G given by
    S → ε | AB | XB
    T → AB | XB
    X → AT
    A → a
    B → b
1. Is x = aaabb in L(G)?
2. Is x = aaabbb in L(G)?
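Before running CYK by hand, the two answers can be cross-checked by brute force: this grammar generates {a^n b^n : n ≥ 0}. The sketch below is not CYK; it enumerates leftmost derivations breadth-first and is exponential in general. It relies on an assumption that holds for this particular grammar: no rule except S → ε shrinks a sentential form.

```python
from collections import deque

# The example grammar G; the empty tuple () encodes S -> epsilon.
GRAMMAR = {
    "S": [(), ("A", "B"), ("X", "B")],
    "T": [("A", "B"), ("X", "B")],
    "X": [("A", "T")],
    "A": [("a",)],
    "B": [("b",)],
}


def language_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Breadth-first search over leftmost derivations. Dropping sentential
    forms longer than max_len is safe only because, in this grammar, no
    rule applied after the first step can shrink a form (S -> epsilon
    applies only to the initial form, since S never occurs on a right-hand
    side).
    """
    words, seen = set(), {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if form is all terminals.
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            words.add("".join(form))
            continue
        for rhs in grammar[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words


print(sorted(language_up_to(GRAMMAR, "S", 6), key=len))
# ['', 'ab', 'aabb', 'aaabbb']
```

So aaabb should be rejected and aaabbb accepted, which is what the table-filling steps arrive at.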

9
CYK Algorithm for Deciding Context Free Languages
The algorithm is “bottom-up”: we start at the bottom of the derivation tree.
Input: aaabb

10
CYK Algorithm for Deciding Context Free Languages
1) Write variables for all length-1 substrings:
    a: A   a: A   a: A   b: B   b: B

11
CYK Algorithm for Deciding Context Free Languages
2) Write variables for all length-2 substrings:
    aa: -   aa: -   ab: S,T   bb: -
(ab = a · b matches S → AB and T → AB.)

12
CYK Algorithm for Deciding Context Free Languages
3) Write variables for all length-3 substrings:
    aaa: -   aab: X   abb: -
(aab = a · ab matches X → AT.)

13
CYK Algorithm for Deciding Context Free Languages
4) Write variables for all length-4 substrings:
    aaab: -   aabb: S,T
(aabb = aab · b matches S → XB and T → XB.)

14
CYK Algorithm for Deciding Context Free Languages
5) Write variables for all length-5 substrings:
    aaabb: X
(aaabb = a · aabb matches X → AT.) S is not among the variables for the whole string: REJECT!

15
CYK Algorithm for Deciding Context Free Languages
Now look at aaabbb:

16
CYK Algorithm for Deciding Context Free Languages
1) Write variables for all length-1 substrings:
    a: A   a: A   a: A   b: B   b: B   b: B

17
CYK Algorithm for Deciding Context Free Languages
2) Write variables for all length-2 substrings:
    aa: -   aa: -   ab: S,T   bb: -   bb: -

18
CYK Algorithm for Deciding Context Free Languages
3) Write variables for all length-3 substrings:
    aaa: -   aab: X   abb: -   bbb: -

19
CYK Algorithm for Deciding Context Free Languages
4) Write variables for all length-4 substrings:
    aaab: -   aabb: S,T   abbb: -

20
CYK Algorithm for Deciding Context Free Languages
5) Write variables for all length-5 substrings:
    aaabb: X   aabbb: -

21
CYK Algorithm for Deciding Context Free Languages
6) Write variables for all length-6 substrings:
    aaabbb: S,T
(aaabbb = aaabb · b matches S → XB.) S is included, so aaabbb is accepted!

22
CYK Algorithm for Deciding Context Free Languages
We can also use a table for the same purpose. The cell in row “start at i” and column “end at j” holds the variables that derive the part of aaabbb from position i to position j, i.e. letters i+1 through j (rows: start at 0 to 5; columns: end at 1 to 6).

23
CYK Algorithm for Deciding Context Free Languages
1. Variables for length-1 substrings, cells (start, end):
    (0,1): A   (1,2): A   (2,3): A   (3,4): B   (4,5): B   (5,6): B

24
CYK Algorithm for Deciding Context Free Languages
2. Variables for length-2 substrings, cells (start, end):
    (0,2): -   (1,3): -   (2,4): S,T   (3,5): -   (4,6): -

25
CYK Algorithm for Deciding Context Free Languages
3. Variables for length-3 substrings, cells (start, end):
    (0,3): -   (1,4): X   (2,5): -   (3,6): -

26
CYK Algorithm for Deciding Context Free Languages
4. Variables for length-4 substrings, cells (start, end):
    (0,4): -   (1,5): S,T   (2,6): -

27
CYK Algorithm for Deciding Context Free Languages
5. Variables for length-5 substrings, cells (start, end):
    (0,5): X   (1,6): -

28
CYK Algorithm for Deciding Context Free Languages
6. Variables for aaabbb, cell (0,6): S,T. ACCEPTED!
Completed table (rows: start at; columns: end at):

start\end  1     2     3     4     5     6
0          A     -     -     -     X     S,T
1                A     -     X     S,T   -
2                      A     S,T   -     -
3                            B     -     -
4                                  B     -
5                                        B

29
Parsing results
• We keep the results for every w_ij in a table.
• Note that we only need to fill in entries up to the diagonal: the longest substring starting at i has length n-i+1.

30
Constructing parse tree
• We need to construct parse trees for the string w.
• Idea:
  – Keep back-pointers to the table entries that we combine.
  – At the end, reconstruct a parse from the back-pointers.
• If every way of combining entries is recorded (not only the first one found), this allows us to find all parse trees.
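A minimal Python sketch of the back-pointer idea, under the same assumed grammar encoding as a dictionary of unit rules plus a list of binary rules (all names are illustrative). To stay short it keeps only the first back-pointer found for each (start, end, variable), so it rebuilds one parse tree; storing a list of (k, B, C) entries per cell instead would let it enumerate all of them.

```python
def cyk_parse(x, start, unit_rules, binary_rules):
    """CYK with back-pointers; returns one parse tree or None.

    Trees are nested tuples (label, child, ...); leaves are terminals.
    back[(i, j, A)] is either the terminal (A -> a over x[i:j]) or a triple
    (k, B, C): A -> B C with B over x[i:k] and C over x[k:j].
    """
    n = len(x)
    back = {}
    for i, ch in enumerate(x):
        for A in unit_rules.get(ch, ()):
            back[(i, i + 1, A)] = ch
    for d in range(2, n + 1):
        for i in range(0, n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for A, B, C in binary_rules:
                    if (i, k, B) in back and (k, j, C) in back and (i, j, A) not in back:
                        back[(i, j, A)] = (k, B, C)  # keep the first split only

    def build(i, j, A):
        entry = back[(i, j, A)]
        if isinstance(entry, str):
            return (A, entry)
        k, B, C = entry
        return (A, build(i, k, B), build(k, j, C))

    return build(0, n, start) if n and (0, n, start) in back else None


UNIT_RULES = {"a": {"A"}, "b": {"B"}}
BINARY_RULES = [("S", "A", "B"), ("S", "X", "B"),
                ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]
print(cyk_parse("aabb", "S", UNIT_RULES, BINARY_RULES))
# ('S', ('X', ('A', 'a'), ('T', ('A', 'a'), ('B', 'b'))), ('B', 'b'))
```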

31
Ambiguity
Efficient Representation of Ambiguities
• Local ambiguity packing:
  – Local ambiguity: multiple ways to derive the same substring from a non-terminal
  – All possible ways to derive each non-terminal are stored together
  – When creating back-pointers, create a single back-pointer to the “packed” representation
• Allows us to efficiently represent a very large number of ambiguities (even exponentially many)
• Unpacking: producing one or more of the packed parse trees by following the back-pointers.
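One way to see why packing pays off: if each chart entry (start, end, variable) stands for all of its derivations at once, the number of parse trees can be computed without unpacking a single one. The sketch below stores only a count per packed entry; a real packed forest would also keep the list of (k, B, C) alternatives so that trees can be unpacked later. The grammar S → S S | a is a hypothetical example chosen because it is highly ambiguous.

```python
from collections import defaultdict


def count_parses(x, start, unit_rules, binary_rules):
    """Number of distinct parse trees of x rooted in `start`.

    count[(i, j, A)] packs every way of deriving x[i:j] from A into one
    number, so exponentially many trees are counted without unpacking them.
    unit_rules:   dict mapping a terminal a to the set of A with A -> a.
    binary_rules: list of (A, B, C) triples, one per rule A -> B C.
    """
    n = len(x)
    count = defaultdict(int)
    for i, ch in enumerate(x):
        for A in unit_rules.get(ch, ()):
            count[(i, i + 1, A)] += 1
    for d in range(2, n + 1):
        for i in range(0, n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for A, B, C in binary_rules:
                    # Trees for A over x[i:j] split at k: every left subtree
                    # pairs with every right subtree.
                    count[(i, j, A)] += count[(i, k, B)] * count[(k, j, C)]
    return count[(0, n, start)]


# Hypothetical, highly ambiguous grammar: S -> S S | a.
UNIT_RULES = {"a": {"S"}}
BINARY_RULES = [("S", "S", "S")]
print([count_parses("a" * n, "S", UNIT_RULES, BINARY_RULES) for n in range(1, 8)])
# [1, 1, 2, 5, 14, 42, 132]
```

The counts are the Catalan numbers, which grow exponentially, while the chart itself holds only O(n^2) entries per variable.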

32
References
• Hopcroft and Ullman, “Introduction to Automata Theory, Languages, and Computation”, Section 6.3, pp.
• “CYK algorithm”, Wikipedia, the free encyclopedia
• A presentation by Zeph Grunschlag
