# CYK Parser, by Carla and Cornelia Kempa


CYK Parser, by Carla and Cornelia Kempa

Overview: the non-directional methods are the Unger parser (top-down) and the CYK parser (bottom-up).

Cocke-Younger-Kasami (CYK) method

Recognition phase

Example grammar:

- Number(s) → Integer | Real
- Integer → Digit | Integer Digit
- Real → Integer Fraction Scale
- Fraction → . Integer
- Scale → e Sign Integer | Empty
- Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- Empty → ε
- Sign → + | -

Example sentence: 32.5e+1

1. Concentrate on the substrings of the input sentence.

Building the recognition table

32.5e+1 is in the language. What problems can we already see in this example?

Another complication: ε-rules. Input: 43.1

The ε-problem: the shortest substrings of any input sentence are the ε-substrings. We must compute $R_\varepsilon$, the set of non-terminals that derive ε: $R_\varepsilon$ = { Empty, Scale }

Non-empty substrings of the input sentence. Input: $z = z_1 z_2 z_3 z_4 \ldots z_n$. Compute the set of non-terminals that derive the substring of $z$ starting at position $i$, of length $l$.

Terminology (also on the handout):

- $i$: the index we are starting at
- $l$: the length of this substring
- $R_{s_{i,l}}$: the set of non-terminals deriving the substring $s_{i,l}$
- $s_{i,0} = \varepsilon$
- Set of non-terminals that derive ε: $R_{s_{i,0}} = R_\varepsilon$

$s_{i,l} = z_i z_{i+1} \ldots z_{i+l-1}$
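The definition of $s_{i,l}$ uses 1-based positions, which is easy to get wrong with 0-based string indexing. A small sketch (the helper name `substring` is an assumption for this example):

```python
def substring(z, i, l):
    """Return s_{i,l}: the substring of z that starts at 1-based position i and has length l."""
    return z[i - 1 : i - 1 + l]


z = "32.5e+1"
print(substring(z, 2, 4))        # 2.5e
print(substring(z, 1, len(z)))   # 32.5e+1
print(repr(substring(z, 3, 0)))  # '' (every s_{i,0} is the empty string)
```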

The set of non-terminals deriving the substring $s_{i,l}$ is $R_{s_{i,l}}$. It is computed in order of increasing length:

1. Substrings of length 0: $s_{i,0} = \varepsilon$ and $R_{s_{i,0}} = R_\varepsilon$
2. Short substrings
3. Longer substrings (say $l = j$): all the information on substrings with $l < j$ is available

Check each right-hand side (RHS) in the grammar to see if it derives $s_{i,l}$. For a rule $L \to A_1 \ldots A_m$, $s_{i,l}$ is divided into $m$ (possibly empty) segments: $A_1$ derives the first segment of $s_{i,l}$, $A_2$ derives the second segment, and so on.

$A_1 \ldots A_m$ derives $s_{i,l}$, so $A_1$ derives a first part of $s_{i,l}$, say of length $k$: $A_1$ derives $s_{i,k}$, which means $A_1$ is in the set $R_{s_{i,k}}$.

Assuming this, $A_2 \ldots A_m$ has to derive the rest: $A_2 \ldots A_m$ derives $s_{i+k,l-k}$. This is attempted for every $k$.

Problems with this approach. 1) Consider $A_2 \ldots A_m$: $m$ could be 1 and $A_1$ a non-terminal, so we are dealing with a unit rule. $A_1$ must then derive the whole substring $s_{i,l}$ and thus be a member of $R_{s_{i,l}}$. But that is the set we are computing right now ...

Solution to this problem: if $A_1 \Rightarrow^* s_{i,l}$, then somewhere along the derivation there must be a first step not using a unit rule: $A_1 \Rightarrow B \Rightarrow \ldots \Rightarrow C \Rightarrow^* s_{i,l}$, where $C$ is the first non-terminal using a non-unit rule in the derivation.

Solution cont. At some stage $C$ is added to $R_{s_{i,l}}$. If we repeat the process, at some point $B$ will be added, and in the next step $A_1$ will be added. So we have to repeat the process until no new non-terminals are added to $R_{s_{i,l}}$.

Problem 2: ε-rules. Consider a rule $B \to A_1 A_2 A_3 A_4 A_5 \ldots A_t$ in which all but one of the $A_1 \ldots A_t$ derive ε. $B$ and $A_1 \ldots A_t$ are non-terminals, and $A_2 \ldots A_t$ derive ε. What remains is $B \to A_1$: effectively a unit rule.

We have now computed all the $R_{s_{i,l}}$. If the start symbol $S$ is a member of $R_{s_{1,n}}$, then $S$ derives $z$ ($= s_{1,n}$), the input string.
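The general recognition phase described so far can be sketched as follows. This is an illustrative reading of the slides, not a reference implementation: the dictionary encoding of the grammar and the names `GRAMMAR`, `recognize` and `derives` are assumptions, and `Number` stands for the start symbol. The sets are filled in order of increasing length, each set is recomputed until nothing new is added (which handles unit rules and ε-rules), and a right-hand side is matched by trying every length $k$ for its first symbol.

```python
GRAMMAR = {  # assumed encoding; [] is the right-hand side ε
    "Number": [["Integer"], ["Real"]],
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Real": [["Integer", "Fraction", "Scale"]],
    "Fraction": [[".", "Integer"]],
    "Scale": [["e", "Sign", "Integer"], ["Empty"]],
    "Digit": [[d] for d in "0123456789"],
    "Empty": [[]],
    "Sign": [["+"], ["-"]],
}


def recognize(grammar, start, z):
    """Return True if `start` derives z. table[(i, l)] plays the role of R_{s_{i,l}}."""
    n = len(z)
    table = {}

    def derives(rhs, i, l):
        """Can the symbol sequence rhs derive s_{i,l}, judging by the table so far?"""
        if not rhs:
            return l == 0
        first, rest = rhs[0], rhs[1:]
        for k in range(l + 1):  # the first symbol takes a segment of length k
            if first in grammar:  # non-terminal: look it up
                ok = first in table[(i, k)]
            else:  # terminal: must match exactly one input symbol
                ok = k == 1 and z[i - 1] == first
            if ok and derives(rest, i + k, l - k):
                return True
        return False

    for l in range(n + 1):  # lengths 0, 1, ..., n
        for i in range(1, n - l + 2):  # 1-based start positions
            table[(i, l)] = set()
            changed = True
            while changed:  # repeat until no new non-terminal is added
                changed = False
                for lhs, alternatives in grammar.items():
                    if lhs in table[(i, l)]:
                        continue
                    if any(derives(rhs, i, l) for rhs in alternatives):
                        table[(i, l)].add(lhs)
                        changed = True
    return start in table[(1, n)]


print(recognize(GRAMMAR, "Number", "32.5e+1"))  # True
print(recognize(GRAMMAR, "Number", "43.1"))     # True
print(recognize(GRAMMAR, "Number", "32.5e"))    # False
```

The recursive matching of a long right-hand side is kept deliberately naive here; it is meant to mirror the explanation, not to be efficient.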

CYK recognition with a grammar in ****-form: what restrictions do we want to have on our grammar?

Useful restrictions:

- No ε-rules
- No unit rules
- Limit the length of the right-hand side of each rule, say to two

What we get out of this: every rule has the form $A \to a$ or $A \to BC$, where $a$ is a terminal and $A$, $B$, $C$ are non-terminals.

Chomsky Normal Form (... not only to annoy students): the perfect grammar for CYK.

How CYK works for a grammar in CNF: $R_\varepsilon$ is empty. $R_{s_{i,1}}$ can be read directly from the rules ($A \to a$). A rule $A \to BC$ can never derive a single terminal.

Procedure. Iteratively (as before):

1. Fill the sets $R_{s_{i,1}}$ directly; this processes all substrings of length 1
2. Process all substrings of length 2
3. Continue with each greater length $l$, up to $n$

For the first step we use the rules of the form $A \to a$. For all the following steps we have to use the rules of the form $A \to BC$.

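CYK and CNF. The question the CYK parser has to answer is: for a rule $A \to BC$ and a substring $s_{i,l}$, does a $k$ exist such that $B$ derives $s_{i,k}$ and $C$ derives $s_{i+k,l-k}$?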

Answering this question is easy: just try all possibilities (no problem, since you are a computer ;-)). The range of $k$ is from 1 to $l-1$. All the sets $R_{s_{i,k}}$ and $R_{s_{i+k,l-k}}$ have already been computed at this point.
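The search for $k$ can be sketched in a few lines. The function name `find_split` and the dictionary `R`, keyed by `(i, l)`, are assumptions for this illustration; the two table entries are the sets for the single digits "3" and "2" of the example sentence under a CNF version of the number grammar.

```python
def find_split(R, B, C, i, l):
    """Return the first k in 1..l-1 with B in R[(i, k)] and C in R[(i+k, l-k)], else None."""
    for k in range(1, l):
        if B in R.get((i, k), set()) and C in R.get((i + k, l - k), set()):
            return k
    return None


# Sets for the substrings "3" (i=1) and "2" (i=2), both of length 1.
R = {
    (1, 1): {"Number", "Integer", "Digit"},
    (2, 1): {"Number", "Integer", "Digit"},
}

# Rule Integer -> Integer Digit applied to s_{1,2} = "32": the split k = 1 works.
print(find_split(R, "Integer", "Digit", 1, 2))  # 1
# Rule Fraction -> T1 Integer: no split exists, since "3" is not derived by T1.
print(find_split(R, "T1", "Integer", 1, 2))     # None
```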

Transforming our sample CF grammar into Chomsky Normal Form. Overview:

1. Eliminate ε-rules
2. Eliminate unit rules
3. Remove non-productive non-terminals
4. Remove non-reachable non-terminals
5. Modify the rest until all grammar rules are of the form $A \to a$ or $A \to BC$

Our number grammar in CNF:

- Number(s) → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- Number(s) → Integer Digit
- Number(s) → N1 Scale' | Integer Fraction
- N1 → Integer Fraction
- Integer → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- Integer → Integer Digit
- Fraction → T1 Integer
- T1 → .
- Scale' → N2 Integer
- N2 → T2 Sign
- T2 → e
- Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- Sign → + | -
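A quick sanity check that every rule of this grammar really has the form $A \to a$ or $A \to BC$ can be sketched as below. The dictionary encoding and the names `CNF_GRAMMAR` and `is_cnf` are assumptions for this example; `Number` stands for the start symbol and `Scale'` is written with a plain apostrophe.

```python
DIGIT_RULES = [[d] for d in "0123456789"]

CNF_GRAMMAR = {  # non-terminal -> list of right-hand sides
    "Number": DIGIT_RULES + [["Integer", "Digit"], ["N1", "Scale'"], ["Integer", "Fraction"]],
    "N1": [["Integer", "Fraction"]],
    "Integer": DIGIT_RULES + [["Integer", "Digit"]],
    "Fraction": [["T1", "Integer"]],
    "T1": [["."]],
    "Scale'": [["N2", "Integer"]],
    "N2": [["T2", "Sign"]],
    "T2": [["e"]],
    "Digit": DIGIT_RULES,
    "Sign": [["+"], ["-"]],
}


def is_cnf(grammar):
    """True if every rule is A -> a (one terminal) or A -> B C (two non-terminals)."""
    for alternatives in grammar.values():
        for rhs in alternatives:
            terminal_rule = len(rhs) == 1 and rhs[0] not in grammar
            binary_rule = len(rhs) == 2 and all(sym in grammar for sym in rhs)
            if not (terminal_rule or binary_rule):
                return False
    return True


print(is_cnf(CNF_GRAMMAR))  # True
# A fragment of the original grammar fails because of the unit rule Integer -> Digit.
print(is_cnf({"Integer": [["Digit"], ["Integer", "Digit"]], "Digit": [["0"]]}))  # False
```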

Building the recognition table. Input: our example grammar in CNF and the input sentence 32.5e+1.

Building the recognition table:

1. Bottom row: read directly from the grammar (rules of the form $A \to a$)
2. Check each RHS in the grammar
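The two steps can be sketched as a table-filling routine for the CNF number grammar. This is a minimal sketch under assumptions: the rule encoding (`TERMINAL_RULES`, `BINARY_RULES`), the function name `cyk_table` and the `(i, l)` keys are choices made for this example, `Number` stands for the start symbol, and `Scale'` is written with a plain apostrophe.

```python
DIGITS = "0123456789"

# Rules A -> a, stored as: non-terminal -> string of terminals it can produce.
TERMINAL_RULES = {
    "Number": DIGITS, "Integer": DIGITS, "Digit": DIGITS,
    "T1": ".", "T2": "e", "Sign": "+-",
}

# Rules A -> B C, stored as (A, B, C).
BINARY_RULES = [
    ("Number", "Integer", "Digit"), ("Number", "N1", "Scale'"),
    ("Number", "Integer", "Fraction"), ("N1", "Integer", "Fraction"),
    ("Integer", "Integer", "Digit"), ("Fraction", "T1", "Integer"),
    ("Scale'", "N2", "Integer"), ("N2", "T2", "Sign"),
]


def cyk_table(z):
    """Return table[(i, l)]: the non-terminals deriving s_{i,l} (1-based i)."""
    n = len(z)
    table = {}
    # Step 1: bottom row, read directly from the rules A -> a.
    for i in range(1, n + 1):
        table[(i, 1)] = {A for A, chars in TERMINAL_RULES.items() if z[i - 1] in chars}
    # Step 2: longer substrings, checking each right-hand side B C for a split k.
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            table[(i, l)] = {
                A
                for A, B, C in BINARY_RULES
                if any(B in table[(i, k)] and C in table[(i + k, l - k)]
                       for k in range(1, l))
            }
    return table


z = "32.5e+1"
table = cyk_table(z)
print("Number" in table[(1, len(z))])  # True: 32.5e+1 is in the language
print(table[(2, 4)])                   # set(): nothing derives "2.5e"
```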

Check each RHS of the grammar: there are two ways. Example: 2.5e ($= s_{2,4}$).

1. Check each RHS, e.g. N1 Scale'
2. Compute the possible right-hand sides from the recognition table

How this is done:

1. N1 is not in $R_{s_{2,1}}$ or $R_{s_{2,2}}$. N1 is a member of $R_{s_{2,3}}$, but Scale' is not a member of $R_{s_{5,1}}$.
2. $R_{s_{2,4}}$ is the set of non-terminals that have a RHS AB where either:
   - A in $R_{s_{2,1}}$ and B in $R_{s_{3,3}}$, or
   - A in $R_{s_{2,2}}$ and B in $R_{s_{4,2}}$, or
   - A in $R_{s_{2,3}}$ and B in $R_{s_{5,1}}$

   Possible combinations: N1 T2 or Number T2. In our grammar we do not have such a RHS, so nothing is added to $R_{s_{2,4}}$.
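The second way can be replayed for $s_{2,4}$ in a few lines. The sets in `R` below were worked out by hand from the CNF grammar for this illustration (they are not quoted from the slides), and the names `BINARY_RHS` and `candidate_pairs` are assumptions.

```python
from itertools import product

# Right-hand sides B C of the CNF number grammar -> left-hand sides that have them.
BINARY_RHS = {
    ("Integer", "Digit"): ["Number", "Integer"],
    ("N1", "Scale'"): ["Number"],
    ("Integer", "Fraction"): ["Number", "N1"],
    ("T1", "Integer"): ["Fraction"],
    ("N2", "Integer"): ["Scale'"],
    ("T2", "Sign"): ["N2"],
}

# Sets for the shorter substrings of "2.5e" (hand-derived for this sketch).
R = {
    (2, 1): {"Number", "Integer", "Digit"},  # "2"
    (3, 3): set(),                           # ".5e"
    (2, 2): set(),                           # "2."
    (4, 2): set(),                           # "5e"
    (2, 3): {"Number", "N1"},                # "2.5"
    (5, 1): {"T2"},                          # "e"
}


def candidate_pairs(R, i, l):
    """All pairs (A, B) with A in R[(i, k)] and B in R[(i+k, l-k)] for some k in 1..l-1."""
    pairs = set()
    for k in range(1, l):
        pairs |= set(product(R[(i, k)], R[(i + k, l - k)]))
    return pairs


pairs = candidate_pairs(R, 2, 4)
print(sorted(pairs))  # [('N1', 'T2'), ('Number', 'T2')]

# Neither pair is a right-hand side of the grammar, so the set for "2.5e" stays empty.
result = {lhs for pair in pairs for lhs in BINARY_RHS.get(pair, [])}
print(result)  # set()
```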

Recognition table

Recognition table (well-formed substring table)

Computing $R_{s_{i,l}}$: follow the arrows V and W simultaneously. For a rule $A \to BC$, $B$ must be a member of a set on the V arrow and $C$ a member of a set on the W arrow.

Comparison: this process is much less complicated than the one we saw before. Why?

Conclusion: this process is much less complicated. Reasons: 1) We do not have to repeat the process until no new non-terminals are added to $R_{s_{i,l}}$ (the substrings we are dealing with are proper substrings and cannot be equal to the string we start with).

Reasons cont. 2) We only have to find one place where the substring must be split into two: for $A \to B\,C$, the split lies between the part derived by $B$ and the part derived by $C$.

Result of the algorithm we have seen so far: the complete collection of sets $R_{s_{i,l}}$. These sets can be organized in a triangular table.

Cost of the CYK algorithm. Operations dependent on $n$, the number of input symbols: $n(n+1)/2$ substrings to be examined; for each substring, at most $n-1$ different $k$-positions (the worst case).

Cost of the CYK algorithm cont. All other operations are independent of $n$, so the algorithm works in time at most proportional to $n^3$. That is far more efficient than exhaustive search (time exponential in the length of the input sentence).
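Multiplying the two counts gives the cubic bound directly. As a worked step, using only the quantities named on the slides:

$$
\frac{n(n+1)}{2}\cdot(n-1) \;=\; \frac{n^3 - n}{2} \;\le\; \frac{n^3}{2}
$$

For the example sentence 32.5e+1 with $n = 7$, this is $28 \cdot 6 = 168$ split positions at most, each checked against a number of rules that does not depend on $n$.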
