# Probabilistic Parsing Ling 571 Fei Xia Week 4: 10/18-10/20/05.

## Presentation on theme: "Probabilistic Parsing Ling 571 Fei Xia Week 4: 10/18-10/20/05."— Presentation transcript:

Probabilistic Parsing Ling 571 Fei Xia Week 4: 10/18-10/20/05

Outline Misc: Hw3 and Hw4: lexicalized rules CYK recap –Converting CFG into CNF –N-best Quiz #2 Common prob equations Independence assumption Lexicalized models

CYK Recap

Converting CFG into CNF CNF Extended CNF CFG in general vs. CFG for natural languages Converting CFG into CNF Converting PCFG into CNF Recovering parse trees

Definition of CNF A, B,C are non-terminal, a is terminal, S is start symbol Definition 1: –A  B C, –A  a, –S  Where B, C are not start symbols. Definition 2: -free grammar –A  B C –A  a

Extended CNF Definition 3: –A  B C –A  a or A  B We use Def 3: –Unit rules such as NP  N are allowed. –No need to remove unit rules during conversion. –CYK algorithm needs to be modified.

CYK algorithm with Def 2 For every rule A  w_i, For span=2 to N for begin=1 to N-span+1 end = begin + span – 1; for m=begin to end-1; for all non-terminals A, B, C: if then

CYK algorithm with Def 3 For every position i for all A, if A  w_i, for all A and B, if A=>B, update For span=2 to N for begin=1 to N-span+1 end = begin + span – 1; for m=begin to end-1; for all non-terminals A, B, C: …. for all non-terminals A and B, if A  B, update

CFG CFG in general: –G=(N, T, P, S) –Rules: CFG for natural languages: –G=(N, T, P, S) –Pre-terminal: –Rules: Syntactic rules: Lexicon:

Conversion from CFG to CNF CFG (in general) to CNF (Def 1) –Add S0  S –Remove e-rules –Remove unit rules –Replace n-ary rules with binary rules CFG (for NL) to CNF (Def 3) –CFG (for NL) has no e-rules –Unit rules are allowed in CNF (Def 3) –Only the last step is necessary

An example VP  V NP PP PP To recover the parse tree w.r.t original CFG, just remove added non-terminals.

Converting PCFG into CNF VP  V NP PP PP 0.1 => VP  V X1 0.1 X1  NP X2 1.0 X2  PP PP 1.0

CYK with N-best output

N-best parse trees Best parse tree: N-best parse trees:

CYK algorithm for N-best For every rule A  w_i, For span=2 to N for begin=1 to N-span+1 end = begin + span – 1; for m=begin to end-1; for all non-terminals A, B, C: for each if val > one of probs in then remove the last element in and insert val to the array remove the last element in B[begin][end][A] and insert (m, B,C,i, j) to B[begin][end][A].

Mary bought books with cash S  NP VP (1,1,1) S  NP VP (1,1,2) VP  V NP (2,1,1) VP  VP PP (3,1,1) NP  NP PP (3,1,1) PP  P NP (4,1,1) N  cash NP  N ---P  with S  NP VP (1,1,1) VP  V NP (2,1,1) N  books NP  N -V  bought N  book NP  N

Common probability equations

Three types of probability Joint prob: P(x,y)= prob of x and y happening together Conditional prob: P(x|y) = prob of x given a specific value of y Marginal prob: P(x) = prob of x for all possible values of y

Common equations

An example #(words)=100, #(nouns)=40, #(verbs)=20 “books” appears 10 times, 3 as verbs, 7 as nouns P(w=books)=0.1 P(w=books,t=noun)=0.07 P(t=noun|w=books)=0.7 P(nouns)=0.4 P(w=books|t=nouns)=7/40

More general cases

Independence assumption

Two variables A and B are independent if –P(A,B)=P(A)*P(B) –P(A)=P(A|B) –P(B)=P(B|A) Two variables A and B are conditional independent given C if –P(A,B|C)=P(A|C) * P(B|C) –P(A|B,C)=P(A|C) –P(B|A,C)=P(B|C) Independence assumption is used to remove some conditional factors, which will reduce the number of parameters in a model.

PCFG parsers It assumes each rule is independent of other rules

Problems of independence assumptions Lexical independence: –P(VP  V, V  bought) = P(VP  V)*P(V  bought) See Table 12.2 on M&S P418. cometakethinkwant VP->V9.5%2.6%4.6%5.7% VP->V NP1.1%32.1%0.2%13.9% VP->V PP34.5%3.1%7.1%0.3% VP->V SBAR6.6%0.3%73.0%0.2%

Problems of independence assumptions (cont) Structural independence: –P(S  NP VP, NP  Pron) = P(S  NP VP) * P(NP  Pron) See Table 12.3 on M&S P420. % as subj% as obj NP  Pron13.7%2.1% NP  Det NN 5.6%4.6% NP  NP SBAR 0.5%2.6% NP  NP PP 5.6%14.1%

Dealing with the problems Lexical rules: –P(VP  V | V=come) –P(VP  V | V=think) Adding context info: is a function that groups into equivalence classes.

PCFG It assumes each rule is independent of other rules

A lexicalized model

An example he likes her

Collecting the counts

Remaining problems he likes her The Prob(T,S) is the same if the sentence is changed to “her likes he”.

Previous model

A new model

New formula he likes her