# 101 The Cocke-Kasami-Younger Algorithm: An example of bottom-up parsing, for a CFG in Chomsky normal form

## Presentation transcript:

101 The Cocke-Kasami-Younger Algorithm. An example of bottom-up parsing, for a CFG in Chomsky normal form (CNF). An example of a CFG in CNF:

G: S → AB | BB; A → CC | AB | a; B → BB | CA | b; C → BA | AA | b

[Figure: derivation trees for the string aabb. There are 2 possibilities for the first production, S → AB and S → BB, each shown with the possible splits of the string aabb, e.g. A deriving a and B deriving abb.]

102 The CKYounger Algorithm. Provides an efficient way of generating substring divisions and checking whether each substring can be legally derived. A nonterminal is placed in cell (i,j) if it can derive the i consecutive symbols of the string starting at the j-th position. Thus, for a string of length 4, if cell (4,1) contains S, the string ∈ L(G). If cell (i,j) contains the nonterminal A₁, cell (i',i+j) contains the nonterminal A₂, and there is a production A → A₁A₂, then cell (i+i',j) will contain the nonterminal A.

103 The CKYounger Algorithm. Provides an efficient way of generating substring divisions and checking whether each substring can be legally derived.

G: S → AB | BB; A → CC | AB | a; B → BB | CA | b; C → BA | AA | b

A nonterminal is placed in cell (i,j) if it can derive the i consecutive symbols of the string starting at the j-th position.
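The cell convention above can be made concrete with a small sketch. The helper names `cell_substring` and `pyramid_cells` are hypothetical, chosen only for illustration; the indexing (row i = substring length, column j = 1-based start position) follows the slide.

```python
def cell_substring(w: str, i: int, j: int) -> str:
    """Substring covered by cell (i, j): i symbols starting at 1-based position j."""
    n = len(w)
    if not (1 <= i <= n and 1 <= j <= n - i + 1):
        raise ValueError(f"cell ({i},{j}) does not exist for a string of length {n}")
    return w[j - 1 : j - 1 + i]


def pyramid_cells(w: str) -> dict:
    """Map every cell (i, j) of the pyramid to the substring it covers."""
    n = len(w)
    return {
        (i, j): cell_substring(w, i, j)
        for i in range(1, n + 1)          # row i: substrings of length i
        for j in range(1, n - i + 2)      # row i has n - i + 1 cells
    }


cells = pyramid_cells("aabb")
top = cells[(4, 1)]  # the single cell of row 4 covers the whole string
```

For aabb the pyramid has 4 + 3 + 2 + 1 = 10 cells, and the top cell (4,1) covers the whole string, which is why finding S there decides membership.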

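104 The Cocke-Kasami-Younger Algorithm. Relation between derivation tree and pyramid. [Figure: derivation trees rooted at S with children A and B for the string aabb, e.g. A deriving a and B deriving abb.]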

105 [Figure: derivation trees rooted at S with children B and B.]

106 The Cocke-Kasami-Younger Algorithm. Builds up the pyramid in a bottom-up fashion.

G: S → AB | BB; A → CC | AB | a; B → BB | CA | b; C → BA | AA | b

Step 1, fill the cells at row 1. A is placed in a cell because of A → a; B and C are placed in a cell because of B → b and C → b.
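Step 1 can be sketched in a few lines of Python. The names `TERMINAL_RULES` and `fill_row_one` are hypothetical; the rules themselves are the terminal productions of the grammar G above.

```python
# Terminal productions of G, as (nonterminal, terminal) pairs.
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("C", "b")]


def fill_row_one(w: str) -> list:
    """Row 1 of the pyramid: list index j-1 holds the contents of cell (1, j)."""
    return [{lhs for lhs, terminal in TERMINAL_RULES if terminal == x} for x in w]


row1 = fill_row_one("aabb")
for j, cell in enumerate(row1, start=1):
    print(f"cell (1,{j}):", sorted(cell))
```

For the string aabb this places A in cells (1,1) and (1,2), and both B and C in cells (1,3) and (1,4).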

107 The Cocke-Kasami-Younger Algorithm. Builds up the pyramid in a bottom-up fashion.

G: S → AB | BB; A → CC | AB | a; B → BB | CA | b; C → BA | AA | b

Step 2, fill the cells at row 2. C is in cell (2,1) because of C → AA, A is in cell (1,1) and A is in cell (1,2). A is in cell (2,2) because of A → AB, A is in cell (1,2) and B is in cell (1,3). S is in cell (2,3) because of S → BB, B is in cell (1,3) and B is in cell (1,4). B is in cell (2,3) because of B → BB, B is in cell (1,3) and B is in cell (1,4).
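As a rough sketch of step 2, each row 2 cell pairs two neighbouring row 1 cells and looks for a matching binary production. The names `BINARY_RULES` and `combine` are hypothetical; the productions are those of G, and `row1` is written out by hand from step 1.

```python
# Binary productions of G, as (left-hand side, X, Y) for a production Z -> X Y.
BINARY_RULES = [
    ("S", "A", "B"), ("S", "B", "B"),
    ("A", "C", "C"), ("A", "A", "B"),
    ("B", "B", "B"), ("B", "C", "A"),
    ("C", "B", "A"), ("C", "A", "A"),
]


def combine(left: set, right: set) -> set:
    """Nonterminals Z with a production Z -> X Y, X in `left` and Y in `right`."""
    return {lhs for lhs, x, y in BINARY_RULES if x in left and y in right}


# Row 1 for the string aabb, as filled in step 1.
row1 = [{"A"}, {"A"}, {"B", "C"}, {"B", "C"}]

# Cell (2, j) pairs cell (1, j) with cell (1, j+1); index j-1 holds cell (2, j).
row2 = [combine(row1[j], row1[j + 1]) for j in range(len(row1) - 1)]
```

Run on aabb, this sketch finds the four entries listed on the slide and two more that the slide does not spell out: S in cell (2,2), via S → AB, and A in cell (2,3), via A → CC.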

108 The Cocke-Kasami-Younger Algorithm. Builds up the pyramid in a bottom-up fashion.

G: S → AB | BB; A → CC | AB | a; B → BB | CA | b; C → BA | AA | b

Step 3, fill the cells at row 3. ? is in cell (3,1) because of ? → XY, where either X is in cell (1,1) and Y is in cell (2,2), or X is in cell (2,1) and Y is in cell (1,3). A is in cell (3,1) because of A → CC, C is in cell (2,1) and C is in cell (1,3). C is in cell (3,1) because of C → AA, A is in cell (1,1) and A is in cell (2,2).

109 The Cocke-Kasami-Younger Algorithm. Builds up the pyramid in a bottom-up fashion.

G: S → AB | BB; A → CC | AB | a; B → BB | CA | b; C → BA | AA | b

Step 4, fill the cell at row 4. General rule (step i): ? is in cell (i,j) because of ? → XY, where X is in cell (m,j) and Y is in cell (i-m,j+m), with 1 ≤ m ≤ i-1. Since S is at the top, aabb ∈ L(G). [Figure: the pyramid for the string aabb, filled in step by step.]
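The general rule translates fairly directly into a recognizer. The following is a minimal sketch, not the presenter's implementation: `cky_table` and `accepts` are hypothetical names, the table is a plain dictionary keyed by (i, j), and the empty string is rejected on the assumption that the CNF grammar has no production deriving it.

```python
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("C", "b")]
BINARY_RULES = [
    ("S", "A", "B"), ("S", "B", "B"),
    ("A", "C", "C"), ("A", "A", "B"),
    ("B", "B", "B"), ("B", "C", "A"),
    ("C", "B", "A"), ("C", "A", "A"),
]


def cky_table(w, terminal_rules=TERMINAL_RULES, binary_rules=BINARY_RULES):
    """Return the pyramid as a dict mapping (i, j) to a set of nonterminals.

    Cell (i, j) covers the i symbols of w starting at 1-based position j.
    """
    n = len(w)
    table = {}
    # Row 1: productions of the form A -> a.
    for j in range(1, n + 1):
        table[(1, j)] = {lhs for lhs, t in terminal_rules if t == w[j - 1]}
    # Rows 2..n: productions of the form A -> X Y.
    for i in range(2, n + 1):
        for j in range(1, n - i + 2):
            cell = set()
            for m in range(1, i):  # X covers m symbols, Y covers the other i - m
                left = table[(m, j)]
                right = table[(i - m, j + m)]
                for lhs, x, y in binary_rules:
                    if x in left and y in right:
                        cell.add(lhs)
            table[(i, j)] = cell
    return table


def accepts(w, start="S"):
    """True if the start symbol ends up in the top cell (n, 1)."""
    if not w:
        return False  # assumption: no production derives the empty string
    return start in cky_table(w)[(len(w), 1)]


print(accepts("aabb"))  # True
```

On aabb the top cell (4,1) contains S, in line with the slide's conclusion that aabb ∈ L(G). The three nested loops over i, j and m are where the cubic running time discussed later comes from.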

110 Theorem: the CKY algorithm is correct. Given a grammar (T, N, P, S) in Chomsky normal form and $w = x_1 \dots x_n \in T^*$, then $A \in N$ is in cell (i,j) of the CKY pyramid if and only if $A \Rightarrow^* x_j \dots x_{j+i-1}$.

Proof by induction on the row number. Base step, i = 1: in row 1 we get the nonterminals from which the length-1 substrings of the string to parse can be derived. This is only possible by using productions of type $A \to a$. Thus A is in cell (1,j), $1 \le j \le n$, if and only if $A \to x_j \in P$, that is, if and only if $A \Rightarrow^* x_j$. Induction hypothesis: the theorem applies for all rows < i, i.e. for all substrings of length < i.

111 Induction step. We first prove $\Leftarrow$. Assume a derivation of a substring of length i, i > 1. Since the grammar is in Chomsky normal form, it must start with a production $A \to BC$: $A \Rightarrow BC \Rightarrow^* x_j \dots x_{j+i-1}$. Then for some m with $1 \le m \le i-1$ there must hold that $B \Rightarrow^* x_j \dots x_{j+m-1}$ and $C \Rightarrow^* x_{j+m} \dots x_{j+i-1}$. Both cells have a lower row number, so the induction hypothesis applies: B is in cell (m,j) and C is in cell (i-m,j+m). Since there is a production $A \to BC$, A is in cell (i,j).

We now prove $\Rightarrow$. Assume A is in cell (i,j) with i > 1. A can only have been placed there because there is a production of the form $A \to BC$ with $B, C \in N$ and, for some m, $1 \le m \le i-1$, B is in cell (m,j) and C is in cell (i-m,j+m). By the induction hypothesis we have $B \Rightarrow^* x_j \dots x_{j+m-1}$ and $C \Rightarrow^* x_{j+m} \dots x_{j+i-1}$. Therefore we can write $A \Rightarrow BC \Rightarrow^* x_j \dots x_{j+i-1}$ and conclude $A \Rightarrow^* x_j \dots x_{j+i-1}$.
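The construction that the proof reasons about can be written as a recurrence. This is only a restatement of the filling rule and the theorem; the symbol $T_{i,j}$ for the contents of cell (i,j) is notation assumed here, not taken from the slides.

```latex
\begin{align*}
T_{1,j} &= \{\, A \in N \mid A \to x_j \in P \,\},
  && 1 \le j \le n, \\
T_{i,j} &= \{\, A \in N \mid \exists\, m,\ 1 \le m \le i-1,\ \exists\, (A \to BC) \in P :
  B \in T_{m,j},\ C \in T_{i-m,\,j+m} \,\},
  && 2 \le i \le n,\ 1 \le j \le n-i+1, \\
A \in T_{i,j} &\iff A \Rightarrow^{*} x_j \cdots x_{j+i-1}.
\end{align*}
```

The first two lines define the pyramid; the third line is the statement of the theorem, and $w \in L(G)$ exactly when $S \in T_{n,1}$.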

112 The complexity of the CKY algorithm. The time complexity for $w \in L(G)$? Let G = (T, N, P, S) be a CFG in Chomsky normal form, with k = #N. Then using the CKY algorithm, $w \in L(G)$ can be decided in time proportional to $n^3$, where n = |w|.

Proof. First notice that (1) the number of entries in a cell is at most k, since each nonterminal can only occur once in a cell, and (2) the maximum number of productions of the form $A \to BC$ is $k^3$.

I. Complexity for row 1 cells. For each $A \in N$, we have to check whether it can be placed in cell (1,i), i.e. whether A derives (in 1 step) the terminal at position i. There are k nonterminals, thus the cost per cell is $k \times 1$. There are n row 1 cells, thus the total cost for row 1 is $kn$. (Cfr. 3.)

113 II. Complexity for a cell in a row > 1. The content of a cell is the result of at most n-1 pairings of lower cells. For each pairing at most k nonterminals are paired with at most k other nonterminals, and each pair is checked against at most $k^3$ productions (cfr. 1 and 2). Thus for each cell: cost $\le k \times k \times k^3 \times 1 \times (n-1) = k^5 (n-1)$. There are $(n-1) + (n-2) + \dots + 1 = n(n-1)/2$ cells in rows 2 to n, thus the total cost for these rows is bounded above by $\frac{n(n-1)}{2} \times k^5 \times (n-1)$.

To conclude: the total cost is bounded above by $kn + \frac{n(n-1)}{2} \times k^5 \times (n-1)$. (See slide 119.) Since k is independent of n, the conclusion is $O(n^3)$.
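To get a feel for the bound, it can be evaluated numerically. The function name `cky_cost_bound` is hypothetical; the formula is the one derived above, with k = 4 taken from the example grammar (nonterminals S, A, B, C).

```python
def cky_cost_bound(n: int, k: int) -> int:
    """Upper bound kn + n(n-1)/2 * k^5 * (n-1) on the cost of filling the pyramid."""
    row_one = k * n                    # n cells, k checks each
    cells_above = n * (n - 1) // 2     # number of cells in rows 2..n
    per_cell = k**5 * (n - 1)          # at most n-1 pairings, k^5 checks each
    return row_one + cells_above * per_cell


k = 4  # the example grammar has the nonterminals S, A, B, C
for n in (4, 10, 100, 1000):
    bound = cky_cost_bound(n, k)
    print(n, bound, bound / n**3)
```

For fixed k the ratio of the bound to $n^3$ approaches $k^5/2$ as n grows, which is the sense in which the cost is cubic in the length of the string.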

114 Some remarks. Not really of practical use, since $O(n^3)$ is too slow. The grammar must be converted to CNF. The algorithm only tests membership; this is not the complexity for building the derivation tree. Semantics!!!! See course on compilers for faster algorithms. To think about: CKY and unambiguous grammars.
