Converting CFG into CNF

– CNF
– Extended CNF
– CFG in general vs. CFG for natural languages
– Converting CFG into CNF
– Converting PCFG into CNF
– Recovering parse trees
Definition of CNF

A, B, C are non-terminals, a is a terminal, S is the start symbol.

Definition 1:
– A → B C
– A → a
– S → ε
where B and C are not start symbols.

Definition 2: ε-free grammar
– A → B C
– A → a
Extended CNF

Definition 3:
– A → B C
– A → a or A → B

We use Def 3:
– Unit rules such as NP → N are allowed.
– There is no need to remove unit rules during conversion.
– The CYK algorithm needs to be modified.
CYK algorithm with Def 2

For every rule A → w_i, …
For span = 2 to N
  for begin = 1 to N-span+1
    end = begin + span - 1;
    for m = begin to end-1;
      for all non-terminals A, B, C:
        if … then …
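The loop structure above can be sketched in Python. This is a minimal sketch that assumes the probabilistic (Viterbi) variant of CYK, because the slide's own condition and update are not shown. The names `cyk_viterbi`, `pi`, and `back`, and the dictionary encoding of the grammar, are illustrative assumptions.

```python
from collections import defaultdict

def cyk_viterbi(words, lexical, binary):
    """lexical: {(A, word): prob} for rules A -> word.
    binary: {(A, B, C): prob} for rules A -> B C.
    Spans are 1-based and inclusive, as in the pseudocode."""
    n = len(words)
    pi = defaultdict(float)   # pi[(begin, end, A)] = best probability found so far
    back = {}                 # back[(begin, end, A)] = (m, B, C)
    # Initialization: one-word spans come from the lexical rules.
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                pi[(i, i, A)] = p
    # Longer spans: try every split point m and every binary rule.
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C), p in binary.items():
                    val = pi[(begin, m, B)] * pi[(m + 1, end, C)] * p
                    if val > pi[(begin, end, A)]:
                        pi[(begin, end, A)] = val
                        back[(begin, end, A)] = (m, B, C)
    return pi, back
```

The sentence is accepted if `pi[(1, N, S)]` is greater than zero for the start symbol S; following `back` from that entry recovers the best tree.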
CYK algorithm with Def 3

For every position i
  for all A, if A → w_i, …
  for all A and B, if A => B, update …
For span = 2 to N
  for begin = 1 to N-span+1
    end = begin + span - 1;
    for m = begin to end-1;
      for all non-terminals A, B, C:
        ….
    for all non-terminals A and B, if A → B, update …
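One way to realize the extra unit-rule step is to re-apply the unit rules to a cell until nothing improves, which also handles chains such as A → B → C. The sketch below assumes a probabilistic chart with all rule probabilities at most 1, and omits back pointers for brevity. The names `apply_unit_rules` and `cyk_def3` are hypothetical.

```python
from collections import defaultdict

def apply_unit_rules(cell, unary):
    """cell: {A: best prob} for one span. unary: {(A, B): prob} for rules A -> B.
    Repeats until no entry improves, so chains of unit rules are covered."""
    changed = True
    while changed:
        changed = False
        for (A, B), p in unary.items():
            val = p * cell.get(B, 0.0)
            if val > cell.get(A, 0.0):
                cell[A] = val
                changed = True

def cyk_def3(words, lexical, unary, binary):
    n = len(words)
    chart = defaultdict(dict)   # chart[(begin, end)] = {A: best prob}
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                chart[(i, i)][A] = p
        apply_unit_rules(chart[(i, i)], unary)
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            cell = chart[(begin, end)]
            for m in range(begin, end):
                left, right = chart[(begin, m)], chart[(m + 1, end)]
                for (A, B, C), p in binary.items():
                    val = p * left.get(B, 0.0) * right.get(C, 0.0)
                    if val > cell.get(A, 0.0):
                        cell[A] = val
            # Unit rules are applied once all split points have been tried.
            apply_unit_rules(cell, unary)
    return chart
```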
Conversion from CFG to CNF

CFG (in general) to CNF (Def 1)
– Add S0 → S
– Remove ε-rules
– Remove unit rules
– Replace n-ary rules with binary rules

CFG (for NL) to CNF (Def 3)
– CFG (for NL) has no ε-rules
– Unit rules are allowed in CNF (Def 3)
– Only the last step (replacing n-ary rules with binary rules) is necessary
An example

VP → V NP PP PP

To recover the parse tree with respect to the original CFG, just remove the added non-terminals and attach their children directly to the parent node.
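Removing the added non-terminals amounts to splicing each added node's children into its parent. A minimal sketch follows, assuming trees are nested tuples `(label, child, ...)` with words as plain strings, and assuming added non-terminals can be recognized by a name starting with "X". Both the encoding and the `unbinarize` helper are illustrative assumptions.

```python
def unbinarize(tree, is_added=lambda label: label.startswith("X")):
    """Return a copy of tree with every added non-terminal removed.
    tree is (label, child, ...); leaves (words) are plain strings."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    flat = []
    for child in children:
        child = unbinarize(child, is_added)
        if not isinstance(child, str) and is_added(child[0]):
            flat.extend(child[1:])   # splice the added node's children into the parent
        else:
            flat.append(child)
    return (label, *flat)
```

If the original grammar has genuine non-terminals whose names start with "X", a different `is_added` test would be needed, for example a set of the names created during conversion.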
Converting PCFG into CNF

VP → V NP PP PP   0.1
=>
VP → V X1     0.1
X1 → NP X2    1.0
X2 → PP PP    1.0
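The conversion shown here can be sketched as a short loop: the original probability stays on the first binary rule and every added rule gets probability 1.0, so the product over the new rules equals the original rule's probability. The `binarize` function and the tuple encoding of rules are assumptions made for illustration.

```python
from itertools import count

def binarize(rules):
    """rules: list of (lhs, rhs_tuple, prob).
    Returns an equivalent rule list in which every right-hand side has at most 2 symbols."""
    fresh = count(1)          # numbering for the added non-terminals X1, X2, ...
    out = []
    for lhs, rhs, prob in rules:
        while len(rhs) > 2:
            new = f"X{next(fresh)}"
            out.append((lhs, (rhs[0], new), prob))   # original prob stays on the first rule
            lhs, rhs, prob = new, rhs[1:], 1.0       # added rules get probability 1.0
        out.append((lhs, rhs, prob))
    return out
```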
N-best parse trees

Best parse tree:
N-best parse trees:
CYK algorithm for N-best

For every rule A → w_i, …
For span = 2 to N
  for begin = 1 to N-span+1
    end = begin + span - 1;
    for m = begin to end-1;
      for all non-terminals A, B, C:
        for each …
          if val > one of the probs in …
          then
            remove the last element in … and insert val into the array
            remove the last element in B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A].
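The innermost update can be sketched as follows, under the assumption that each chart cell keeps, per non-terminal, a list of at most N entries sorted by decreasing probability. Here the probability and the back pointer are stored together in one list rather than in two parallel arrays. The function names and this layout are illustrative assumptions.

```python
def insert_nbest(entries, val, backptr, n):
    """entries: list of (prob, backpointer), sorted by prob descending, length <= n.
    Inserts (val, backptr) if it belongs among the n best; modifies entries in place."""
    if len(entries) == n and val <= entries[-1][0]:
        return                      # not better than the current worst entry
    if len(entries) == n:
        entries.pop()               # remove the last (lowest-probability) element
    pos = 0
    while pos < len(entries) and entries[pos][0] >= val:
        pos += 1
    entries.insert(pos, (val, backptr))

def combine(left, right, rule_prob, m, B, C, target, n):
    """For rule A -> B C at split point m, try every pair
    (i-th best B on the left, j-th best C on the right)."""
    for i, (pb, _) in enumerate(left, start=1):
        for j, (pc, _) in enumerate(right, start=1):
            insert_nbest(target, rule_prob * pb * pc, (m, B, C, i, j), n)
```

Trying all pairs costs up to N×N candidates per rule and split point; that is the simplest choice, not the most efficient one.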
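Mary bought books with cash

N-best chart, one line per cell [begin, end]. Each binary rule is followed by its back pointer (m, i, j): the split point m, then the ranks i and j of the left and right children in their cells' N-best lists.

[1,1]  N → Mary; NP → N
[2,2]  V → bought
[3,3]  N → books; NP → N
[4,4]  P → with
[5,5]  N → cash; NP → N
[1,2]  -
[2,3]  VP → V NP (2,1,1)
[3,4]  -
[4,5]  PP → P NP (4,1,1)
[1,3]  S → NP VP (1,1,1)
[2,4]  -
[3,5]  NP → NP PP (3,1,1)
[1,4]  -
[2,5]  VP → V NP (2,1,1); VP → VP PP (3,1,1)
[1,5]  S → NP VP (1,1,1); S → NP VP (1,1,2)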
Three types of probability

– Joint prob: P(x,y) = prob of x and y happening together
– Conditional prob: P(x|y) = prob of x given a specific value of y
– Marginal prob: P(x) = prob of x summed over all possible values of y, i.e. P(x) = Σ_y P(x,y)
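The three quantities can be computed from a single joint table. The distribution below is made up purely for illustration, and the helper names are assumptions.

```python
from collections import defaultdict

# Hypothetical joint distribution P(x, y) over x in {a, b} and y in {c, d}.
joint = {("a", "c"): 0.25, ("a", "d"): 0.25,
         ("b", "c"): 0.125, ("b", "d"): 0.375}

def marginal_x(joint):
    """P(x) = sum over y of P(x, y)."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    return dict(px)

def conditional_x_given_y(joint, x, y):
    """P(x | y) = P(x, y) / P(y)."""
    p_y = sum(p for (_, y2), p in joint.items() if y2 == y)
    return joint.get((x, y), 0.0) / p_y
```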
Two variables A and B are independent if
– P(A,B) = P(A) * P(B)
– P(A) = P(A|B)
– P(B) = P(B|A)

Two variables A and B are conditionally independent given C if
– P(A,B|C) = P(A|C) * P(B|C)
– P(A|B,C) = P(A|C)
– P(B|A,C) = P(B|C)

An independence assumption removes some conditioning factors, which reduces the number of parameters in a model.
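The first condition, P(A,B) = P(A) * P(B), can be checked directly on a joint table. This is a small sketch with a hypothetical helper `is_independent`; the table encoding `{(a, b): prob}` is an assumption.

```python
import math
from itertools import product

def is_independent(joint, tol=1e-12):
    """joint: {(a, b): P(a, b)}. True if P(a, b) = P(a) * P(b) for every pair of values."""
    a_vals = {a for a, _ in joint}
    b_vals = {b for _, b in joint}
    pa = {a: sum(joint.get((a, b), 0.0) for b in b_vals) for a in a_vals}
    pb = {b: sum(joint.get((a, b), 0.0) for a in a_vals) for b in b_vals}
    return all(
        math.isclose(joint.get((a, b), 0.0), pa[a] * pb[b], abs_tol=tol)
        for a, b in product(a_vals, b_vals)
    )
```

With two binary variables, a full joint table needs 3 free parameters, while the independent model needs only 2 (one per variable); the saving grows quickly with more variables.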
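PCFG parsers

A PCFG parser assumes that each rule is independent of the other rules in the tree, so the probability of a parse tree is the product of the probabilities of the rules used in it.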
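Under this independence assumption, the probability of a tree is the product of the probabilities of its rules. A minimal sketch, assuming trees are nested tuples `(label, child, ...)` with words as strings, and rule probabilities stored in a dictionary (both encodings are illustrative):

```python
def tree_prob(tree, rule_probs):
    """tree: (label, child, ...), leaves are plain strings.
    rule_probs: {(lhs, rhs_tuple): prob}.
    Returns the product of the probabilities of all rules used in the tree."""
    if isinstance(tree, str):
        return 1.0                      # a word contributes no rule of its own
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(label, rhs)]
    for child in children:
        p *= tree_prob(child, rule_probs)
    return p
```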
Problems of independence assumptions

Lexical independence:
– P(VP → V, V → bought) = P(VP → V) * P(V → bought)

See Table 12.2 in M&S, p. 418.

               come     take     think    want
VP → V          9.5%     2.6%     4.6%     5.7%
VP → V NP       1.1%    32.1%     0.2%    13.9%
VP → V PP      34.5%     3.1%     7.1%     0.3%
VP → V SBAR     6.6%     0.3%    73.0%     0.2%
Problems of independence assumptions (cont)

Structural independence:
– P(S → NP VP, NP → Pron) = P(S → NP VP) * P(NP → Pron)

See Table 12.3 in M&S, p. 420.

                % as subj   % as obj
NP → Pron         13.7%       2.1%
NP → Det NN        5.6%       4.6%
NP → NP SBAR       0.5%       2.6%
NP → NP PP         5.6%      14.1%
Dealing with the problems

Lexical rules:
– P(VP → V | V=come)
– P(VP → V | V=think)

Adding context info:
– … is a function that groups … into equivalence classes.
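Lexical rule probabilities such as P(VP → V | V=come) can be estimated by relative frequency: count how often each rule occurs with a given head verb and divide by the verb's total count. This is a sketch of that standard estimate, not necessarily the estimator used in the lecture; the function name and the input format are assumptions.

```python
from collections import Counter

def lexicalized_rule_probs(observations):
    """observations: list of (rule, head_verb) pairs seen in a treebank.
    Returns {(rule, verb): P(rule | V=verb)} by relative frequency."""
    pair_counts = Counter(observations)
    verb_counts = Counter()
    for (_, verb), c in pair_counts.items():
        verb_counts[verb] += c          # total number of VPs headed by this verb
    return {(rule, verb): c / verb_counts[verb]
            for (rule, verb), c in pair_counts.items()}
```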