Probabilistic Parsing Ling 571 Fei Xia Week 4: 10/18-10/20/05.


1 Probabilistic Parsing Ling 571 Fei Xia Week 4: 10/18-10/20/05

2 Outline
Misc: Hw3 and Hw4 (lexicalized rules)
CYK recap:
– Converting CFG into CNF
– N-best
Quiz #2
Common probability equations
Independence assumption
Lexicalized models

3 CYK Recap

4 Converting CFG into CNF
– CNF
– Extended CNF
– CFG in general vs. CFG for natural languages
– Converting CFG into CNF
– Converting PCFG into CNF
– Recovering parse trees

5 Definition of CNF
A, B, C are non-terminals, a is a terminal, S is the start symbol.
Definition 1:
– A → B C
– A → a
– S → ε
where B, C are not the start symbol.
Definition 2 (ε-free grammar):
– A → B C
– A → a

6 Extended CNF
Definition 3:
– A → B C
– A → a or A → B
We use Def 3:
– Unit rules such as NP → N are allowed.
– No need to remove unit rules during conversion.
– The CYK algorithm needs to be modified.

7 CYK algorithm with Def 2
For every position i, for every rule A → w_i: set P[i][i][A].
For span = 2 to N
 for begin = 1 to N − span + 1
  end = begin + span − 1;
  for m = begin to end − 1:
   for all non-terminals A, B, C:
    if A → B C, P[begin][m][B], and P[m+1][end][C] are all set,
    then set P[begin][end][A].
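The loop above can be sketched as a boolean CYK recognizer. This is a minimal sketch, not the course code; the toy grammar and sentence are illustrative.

```python
def cyk_recognize(words, lexical, binary, start="S"):
    """CYK recognizer for a CNF grammar (Def 2): A -> B C and A -> a.

    chart[(begin, end)] is the set of non-terminals deriving words
    begin..end (1-indexed, inclusive), mirroring the span/begin/m loops
    on the slide.
    """
    n = len(words)
    chart = {}
    # Base case: lexical rules A -> w_i fill the diagonal.
    for i, w in enumerate(words, start=1):
        chart[(i, i)] = {A for (A, a) in lexical if a == w}
    # Recursive case: combine two adjacent spans with a binary rule.
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            cell = set()
            for m in range(begin, end):
                for (A, B, C) in binary:
                    if B in chart[(begin, m)] and C in chart[(m + 1, end)]:
                        cell.add(A)
            chart[(begin, end)] = cell
    return start in chart[(1, n)]

# Toy CNF grammar (illustrative, not from the lecture):
lexical = [("NP", "Mary"), ("V", "bought"), ("NP", "books")]
binary = [("S", "NP", "VP"), ("VP", "V", "NP")]
```

For example, `cyk_recognize(["Mary", "bought", "books"], lexical, binary)` accepts, while a scrambled word order does not.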

8 CYK algorithm with Def 3
For every position i:
 for all A: if A → w_i, update P[i][i][A];
 for all A and B: if A ⇒ B (unit rule), update P[i][i][A].
For span = 2 to N
 for begin = 1 to N − span + 1
  end = begin + span − 1;
  for m = begin to end − 1:
   for all non-terminals A, B, C: … (as with Def 2)
  for all non-terminals A and B: if A → B, update P[begin][end][A].
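Under Def 3, each chart cell must additionally be closed under the unit rules A → B. A small helper (a sketch; the fixed-point loop handles chains of unit rules):

```python
def close_unit_rules(cell, unit_rules):
    """Close a chart cell (a set of non-terminals) under unit rules.

    unit_rules is a list of (A, B) pairs meaning A -> B; we iterate to a
    fixed point so chains such as C -> B, B -> A are all applied.
    """
    changed = True
    while changed:
        changed = False
        for (A, B) in unit_rules:
            if B in cell and A not in cell:
                cell.add(A)
                changed = True
    return cell
```

For example, `close_unit_rules({"N"}, [("NP", "N")])` adds NP to a cell that already contains N.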

9 CFG
CFG in general:
– G = (N, T, P, S)
– Rules: A → α, with A ∈ N and α ∈ (N ∪ T)*
CFG for natural languages:
– G = (N, T, P, S)
– Pre-terminals: the POS-tag level between non-terminals and words
– Rules:
 – Syntactic rules: non-terminal → sequence of non-terminals
 – Lexicon: pre-terminal → word

10 Conversion from CFG to CNF
CFG (in general) to CNF (Def 1):
– Add S0 → S
– Remove ε-rules
– Remove unit rules
– Replace n-ary rules with binary rules
CFG (for NL) to CNF (Def 3):
– CFG (for NL) has no ε-rules
– Unit rules are allowed in CNF (Def 3)
– Only the last step (binarization) is necessary

11 An example
VP → V NP PP PP
⇒ VP → V X1, X1 → NP X2, X2 → PP PP
To recover the parse tree w.r.t. the original CFG, just remove the added non-terminals (X1, X2).

12 Converting PCFG into CNF
VP → V NP PP PP 0.1
=> VP → V X1 0.1
   X1 → NP X2 1.0
   X2 → PP PP 1.0
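The binarization on this slide can be sketched as follows. The fresh non-terminals X1, X2, … carry probability 1.0, so the product over a derivation is unchanged; the function is a sketch, not the lecture's code.

```python
import itertools

_fresh = itertools.count(1)  # generator of fresh X1, X2, ... names

def binarize_rule(lhs, rhs, prob):
    """Binarize one PCFG rule lhs -> rhs (prob) into binary rules.

    The first new rule keeps the original probability; the introduced
    X-rules get probability 1.0, so the derivation probability is the
    same before and after conversion.
    """
    rules = []
    while len(rhs) > 2:
        new = "X%d" % next(_fresh)
        rules.append((lhs, (rhs[0], new), prob))
        lhs, rhs, prob = new, rhs[1:], 1.0
    rules.append((lhs, tuple(rhs), prob))
    return rules
```

Applied to the slide's rule, `binarize_rule("VP", ("V", "NP", "PP", "PP"), 0.1)` yields exactly the three rules shown above.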

13 CYK with N-best output

14 N-best parse trees
Best parse tree: T* = argmax_T P(T, S)
N-best parse trees: the N trees with the highest P(T, S)

15 CYK algorithm for N-best
For every position i, for every rule A → w_i: initialize the N-best list for (i, i, A).
For span = 2 to N
 for begin = 1 to N − span + 1
  end = begin + span − 1;
  for m = begin to end − 1:
   for all non-terminals A, B, C with A → B C:
    for each pair of entries (i, j) from the N-best lists of (begin, m, B) and (m+1, end, C):
     compute val from P(A → B C) and the two entry probabilities;
     if val > one of the probs in the N-best list of (begin, end, A),
     then remove the last element of that list and insert val,
     and remove the last element of B[begin][end][A] and insert (m, B, C, i, j).
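The "remove the last element and insert" step amounts to a bounded sorted insertion into parallel probability and backpointer lists. A sketch (the helper name and list layout are assumptions, not from the lecture):

```python
import bisect

def nbest_insert(probs, backs, val, back, N):
    """Insert (val, back) into parallel N-best lists sorted by descending
    probability; drop the worst entry if the lists grow beyond N."""
    # bisect works on ascending lists, so search on the negated probs.
    pos = bisect.bisect_left([-p for p in probs], -val)
    probs.insert(pos, val)
    backs.insert(pos, back)
    if len(probs) > N:
        probs.pop()   # discard the now-worst probability
        backs.pop()   # and its backpointer
    return probs, backs
```

With N = 2, inserting a candidate of probability 0.3 into [0.5, 0.2] evicts the 0.2 entry and keeps the lists aligned.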

16 An example: Mary bought books with cash
Chart entries with backpointers (m, B, C, i, j):
– S → NP VP (1,1,1); S → NP VP (1,1,2)
– VP → V NP (2,1,1); VP → VP PP (3,1,1)
– NP → NP PP (3,1,1)
– PP → P NP (4,1,1)
– Lexicon: N → cash, NP → N; P → with; N → books, NP → N; V → bought
The two S entries reflect the PP-attachment ambiguity: [VP bought [NP books] [PP with cash]] vs. [VP bought [NP books [PP with cash]]].

17 Common probability equations

18 Three types of probability
– Joint prob: P(x,y) = prob of x and y happening together
– Conditional prob: P(x|y) = prob of x given a specific value of y
– Marginal prob: P(x) = prob of x, summed over all possible values of y

19 Common equations
– Marginalization: P(x) = Σ_y P(x, y)
– Chain rule: P(x, y) = P(x) P(y|x) = P(y) P(x|y)
– Bayes' rule: P(x|y) = P(y|x) P(x) / P(y)

20 An example
#(words)=100, #(nouns)=40, #(verbs)=20
"books" appears 10 times: 3 times as a verb, 7 times as a noun.
– P(w=books) = 10/100 = 0.1
– P(w=books, t=noun) = 7/100 = 0.07
– P(t=noun | w=books) = 7/10 = 0.7
– P(t=noun) = 40/100 = 0.4
– P(w=books | t=noun) = 7/40 = 0.175
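The numbers on this slide follow directly from the counts, and Bayes' rule ties them together; a quick check:

```python
# Counts from the slide.
words, nouns = 100, 40
books, books_noun = 10, 7

p_books = books / words                  # P(w=books) = 0.1
p_books_and_noun = books_noun / words    # P(w=books, t=noun) = 0.07
p_noun_given_books = books_noun / books  # P(t=noun | w=books) = 0.7
p_noun = nouns / words                   # P(t=noun) = 0.4
p_books_given_noun = books_noun / nouns  # P(w=books | t=noun) = 0.175

# Bayes' rule: P(books | noun) = P(noun | books) * P(books) / P(noun)
assert abs(p_books_given_noun - p_noun_given_books * p_books / p_noun) < 1e-12
```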

21 More general cases

22 Independence assumption

23 Independence
Two variables A and B are independent if:
– P(A,B) = P(A) * P(B)
– P(A) = P(A|B)
– P(B) = P(B|A)
Two variables A and B are conditionally independent given C if:
– P(A,B|C) = P(A|C) * P(B|C)
– P(A|B,C) = P(A|C)
– P(B|A,C) = P(B|C)
The independence assumption lets us drop some conditioning factors, which reduces the number of parameters in a model.
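The definition P(A,B) = P(A) * P(B) can be checked numerically on a small joint table (toy numbers, for illustration only):

```python
def is_independent(joint, eps=1e-9):
    """Check whether a joint table {(a, b): prob} factors as P(a)*P(b)."""
    a_vals = {a for a, _ in joint}
    b_vals = {b for _, b in joint}
    # Marginals by summing out the other variable.
    pa = {a: sum(joint[(a, b)] for b in b_vals) for a in a_vals}
    pb = {b: sum(joint[(a, b)] for a in a_vals) for b in b_vals}
    return all(abs(joint[(a, b)] - pa[a] * pb[b]) < eps for (a, b) in joint)

# Toy joint built as a product of marginals P(A)=(0.4, 0.6), P(B)=(0.3, 0.7),
# so the two variables are independent by construction.
joint = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
```

A table such as {(0,0): 0.5, (1,1): 0.5} (with zeros elsewhere) fails the check, since knowing A fixes B.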

24 PCFG parsers
A PCFG assumes that each rule is applied independently of the other rules, so P(T, S) is the product of the rule probabilities.

25 Problems of independence assumptions
Lexical independence:
– P(VP → V, V → bought) = P(VP → V) * P(V → bought)
See Table 12.2 on M&S p. 418:

              come    take    think   want
VP → V         9.5%    2.6%    4.6%    5.7%
VP → V NP      1.1%   32.1%    0.2%   13.9%
VP → V PP     34.5%    3.1%    7.1%    0.3%
VP → V SBAR    6.6%    0.3%   73.0%    0.2%

26 Problems of independence assumptions (cont)
Structural independence:
– P(S → NP VP, NP → Pron) = P(S → NP VP) * P(NP → Pron)
See Table 12.3 on M&S p. 420:

               % as subj   % as obj
NP → Pron        13.7%       2.1%
NP → Det NN       5.6%       4.6%
NP → NP SBAR      0.5%       2.6%
NP → NP PP        5.6%      14.1%

27 Dealing with the problems
Lexicalized rules:
– P(VP → V | V=come)
– P(VP → V | V=think)
Adding context info: condition each rule on a function that groups contexts into equivalence classes.

28 PCFG
A plain PCFG assumes each rule is independent of the other rules; the lexicalized models below relax this assumption.

29 A lexicalized model

30 An example: "he likes her"

31 Head-head probability

32 Head-rule probability

33 Collecting the counts
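A minimal sketch of estimating the two lexicalized probabilities from treebank counts by relative frequency. The event names and decomposition below are assumptions for illustration; the slide itself does not show the exact conditioning used in the lecture's model.

```python
from collections import Counter

rule_head = Counter()   # counts of (rule, head word of the lhs)
head_count = Counter()  # counts of each head word
head_head = Counter()   # counts of (parent head, dependent head)

def observe(rule, head, dep_heads):
    """Record one treebank event: a rule, its head word, and the head
    words of its dependents."""
    rule_head[(rule, head)] += 1
    head_count[head] += 1
    for d in dep_heads:
        head_head[(head, d)] += 1

def p_rule_given_head(rule, head):
    """Head-rule probability: P(rule | head word), relative frequency."""
    return rule_head[(rule, head)] / head_count[head]

def p_dep_given_head(dep, head):
    """Head-head probability: P(dependent head | parent head)."""
    return head_head[(head, dep)] / head_count[head]

# Hypothetical observations:
observe("VP -> V NP", "bought", ["books"])
observe("VP -> V", "bought", [])
```

With these two observations, "bought" takes an NP object half the time, so P(VP → V NP | bought) = 0.5. In practice such estimates are smoothed, since most (rule, head) pairs are unseen.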

34 Remaining problems
For "he likes her": P(T, S) is unchanged if the sentence is changed to "her likes he".

35 Previous model

36 A new model

37 New formula: "he likes her"

