1
Improved Inference for Unlexicalized Parsing
Slav Petrov and Dan Klein

2
Unlexicalized Parsing [Petrov et al. '06]
Hierarchical, adaptive refinement:
- 1,140 nonterminal symbols
- 531,200 rewrites
- 1,621 min parsing time
- 91.2 F1 score on the dev set (1,600 sentences)
[Figure: the DT tag refined hierarchically, DT splitting into DT1/DT2, then DT1-DT4, then DT1-DT8.]

3
Parsing time: 1,621 min

4
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]
[Figure: Treebank → coarse grammar (… NP, VP, …) → parse → prune → refined grammar (NP-dog, NP-cat, NP-apple, VP-run, NP-eat, … or NP-1, NP-12, NP-17, VP-6, VP-31, …) → parse.]

5
Prune?
For each chart item X[i,j], compute its posterior probability given the sentence w:

    P(X[i,j] | w) = P_OUT(X,i,j) · P_IN(X,i,j) / P(w)

E.g. consider the span 5 to 12: any coarse item (… QP, NP, VP, …) whose posterior falls below a threshold is pruned, along with all of its refined versions.
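A minimal sketch of this pruning test, assuming inside/outside log scores for the coarse chart have already been computed; the function, variable names, and default threshold are illustrative, not the Berkeley Parser's API:

```python
import math

def prune_chart(inside, outside, sentence_logprob, log_threshold=math.log(1e-4)):
    """Keep only coarse chart items whose posterior clears the threshold.

    inside, outside: dicts mapping (symbol, i, j) -> log score
    sentence_logprob: log P(sentence) under the coarse grammar
    """
    keep = set()
    for item, log_in in inside.items():
        # posterior log P(item | sentence) = inside + outside - log P(sentence)
        log_posterior = log_in + outside[item] - sentence_logprob
        if log_posterior >= log_threshold:
            keep.add(item)
    return keep
```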

6
Parsing time: 1,621 min → 111 min (no search error)

7
Multilevel Coarse-to-Fine Parsing [Charniak et al. '06]
Add more rounds of pre-parsing with grammars coarser than X-bar: a single symbol X, then a small set of symbols (A, B, …), before the refined grammar (NP-dog, NP-cat, NP-apple, VP-run, NP-eat, …). But which coarser grammars should be used?

8
Hierarchical Pruning
Consider again the span 5 to 12 (a multilevel pruning pass is sketched below):
- coarse:         … QP NP VP …
- split in two:   … QP1 QP2 NP1 NP2 VP1 VP2 …
- split in four:  … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
- split in eight: … and so on.
An item pruned at one level removes all of its refinements from the next level's chart.
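A schematic of the multilevel loop, under the same assumptions as the sketch above; `inside_outside`, `refinements`, and `best_parse` are hypothetical helpers standing in for a real chart parser:

```python
def hierarchical_parse(sentence, grammars, refinements, log_threshold):
    """grammars: coarsest (X-bar) to finest; refinements(level, sym) lists
    the symbols at level + 1 that project onto sym (hypothetical helpers)."""
    allowed = None  # no restriction at the coarsest level
    for level, grammar in enumerate(grammars[:-1]):
        inside, outside, sent_lp = grammar.inside_outside(sentence, allowed)
        keep = prune_chart(inside, outside, sent_lp, log_threshold)
        # an item pruned at this level takes all of its refinements with it
        allowed = {(sub, i, j) for (sym, i, j) in keep
                   for sub in refinements(level, sym)}
    return grammars[-1].best_parse(sentence, allowed)
```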

9
Intermediate Grammars
Learning produces a sequence of increasingly refined grammars, one per split round: X-bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G.
[Figure: the DT refinement hierarchy again, DT → DT1/DT2 → DT1-DT4 → DT1-DT8.]

10
Parsing time: 1,621 min → 111 min → 35 min (no search error)

11
State Drift (DT tag)
[Figure: across EM iterations, the words assigned to each DT substate drift, e.g. a substate covering {some, this, That, these} gradually shifting toward {some, these, this, That, This, that}.]
Because of this drift, the substates of intermediate grammars need not correspond to those of the final grammar.

12
Projected Grammars
Rather than the intermediate grammars from learning (X-bar = G0 → G1 → … → G6 = G), use projections of the final grammar: π0(G), π1(G), π2(G), π3(G), π4(G), π5(G), where projection πi collapses G to the granularity of level i.

13
Estimating Projected Grammars
Nonterminals? Easy: the projection maps each split nonterminal of G to its base symbol.
- Nonterminals in G:    S0, S1, NP0, NP1, VP0, VP1
- Nonterminals in π(G): S, NP, VP

14
Estimating Projected Grammars
Rules? Rules in G:
    S1 → NP1 VP1   0.20
    S1 → NP1 VP2   0.12
    S1 → NP2 VP1   0.02
    S1 → NP2 VP2   0.03
    S2 → NP1 VP1   0.11
    S2 → NP1 VP2   0.05
    S2 → NP2 VP1   0.08
    S2 → NP2 VP2   0.12
Rules in π(G):
    S → NP VP      ?

15
Estimating Projected Grammars [Corazza & Satta '06]
Estimate π(G) not from the treebank but from the infinite distribution over trees defined by G: weight each rule of G by the expected frequency of its left-hand-side symbol, and sum within each projected rule. For the rules above, this yields

    S → NP VP      0.56
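A small sketch of this estimate, assuming the expected symbol frequencies under G's tree distribution are already available (they are computed on the next slide); the names and data layout here are illustrative, not the paper's code:

```python
from collections import defaultdict

def project_rules(rules, expected_counts, project):
    """Estimate rule probabilities for the projected grammar pi(G).

    rules: dict mapping (lhs, rhs_tuple) -> probability in G
    expected_counts: expected frequency of each split symbol under G's
        infinite tree distribution
    project: maps a split symbol (e.g. 'NP1') to its base symbol ('NP')
    """
    weight = defaultdict(float)
    total = defaultdict(float)
    for (lhs, rhs), prob in rules.items():
        coarse_rule = (project(lhs), tuple(project(s) for s in rhs))
        # each split rule contributes in proportion to how often
        # its left-hand side is expected to occur
        weight[coarse_rule] += expected_counts[lhs] * prob
    for sym, count in expected_counts.items():
        total[project(sym)] += count
    return {rule: w / total[rule[0]] for rule, w in weight.items()}
```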

16
Calculating Expectations
Nonterminals: let c_k(X) be the expected count of X in trees up to depth k, computed by the fixed-point iteration

    c_{k+1}(X) = [X = root] + Σ_{Y→γ} c_k(Y) · P(Y→γ) · (occurrences of X in γ)

Converges within 25 iterations (a few seconds).
Rules: the expected count of a rule then follows as c(Y) · P(Y→γ).
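A sketch of that fixed point for a PCFG given as rule probabilities (a toy stand-in, not the paper's implementation):

```python
def expected_counts(rules, root, iterations=25):
    """Expected nonterminal frequencies under the PCFG's tree distribution.

    rules: dict mapping lhs -> list of (rhs_tuple, probability)
    Iterates c_{k+1}(X) = [X == root] + sum over rules Y -> gamma of
    c_k(Y) * P(Y -> gamma) * (occurrences of X in gamma).
    """
    counts = {sym: float(sym == root) for sym in rules}
    for _ in range(iterations):
        new = {sym: float(sym == root) for sym in rules}
        for lhs, expansions in rules.items():
            for rhs, prob in expansions:
                for sym in rhs:
                    if sym in new:  # terminals carry no count here
                        new[sym] += counts[lhs] * prob
        counts = new
    return counts
```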

17
Parsing time: 1,621 min → 111 min → 35 min → 15 min (no search error)

18
Parsing Times
Share of parsing time per level, from X-bar = G0 up the hierarchy toward G = G6: 60%, 12%, 7%, 6%, 5%, 4%, …

19
Bracket Posteriors (after G0)

20
Bracket Posteriors (after G1)

21
Bracket Posteriors (Movie: Final Chart)

22
Bracket Posteriors (Best Tree)

23
Parse Selection
Computing the most likely unsplit tree is NP-hard. Options:
- Settle for the best derivation.
- Rerank an n-best list.
- Use an alternative objective function.
[Figure: a parse's probability is the sum of its derivations' probabilities, so the single best derivation need not belong to the best parse.]
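A toy illustration of the gap between the two objectives (all probabilities invented for the example):

```python
# Two unsplit parses; each split derivation of a parse has its own probability.
derivations = {
    "parse_A": [0.30],              # one strong derivation
    "parse_B": [0.15, 0.15, 0.12],  # several weaker derivations
}

best_derivation = max(derivations, key=lambda t: max(derivations[t]))
best_parse = max(derivations, key=lambda t: sum(derivations[t]))

print(best_derivation)  # parse_A: 0.30 beats any single derivation of B
print(best_parse)       # parse_B: 0.15 + 0.15 + 0.12 = 0.42 > 0.30
```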

24
Parse Risk Minimization [Titov & Henderson '06]
Minimize the expected loss according to our beliefs, where T_T is the true tree, T_P is the predicted tree, and L is a loss function (0/1, precision, recall, F1). Use an n-best candidate list and approximate the expectation with samples.
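Written out, this is the standard minimum-risk choice; the notation below is reconstructed from the slide's definitions rather than copied from it:

```latex
T_P^{*} = \operatorname*{argmin}_{T_P} \; \mathbb{E}_{T_T \sim P(T_T \mid w)}\!\left[ L(T_P, T_T) \right]
        \approx \operatorname*{argmin}_{T_P} \; \frac{1}{n} \sum_{i=1}^{n} L\!\left(T_P, T_T^{(i)}\right),
\qquad T_T^{(i)} \sim P(T_T \mid w).
```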

25
Reranking Results
  Objective                Precision  Recall  F1     Exact
  BEST DERIVATION
    Viterbi Derivation     89.6       89.4    89.5   37.4
    Exact (non-sampled)    —          —       90.8   41.7
    Exact/F1 (oracle)      95.3       94.4    95.0   63.9
  RERANKING
    Precision (sampled)    91.1       88.1    89.6   21.4
    Recall (sampled)       88.2       91.3    89.7   21.5
    F1 (sampled)           90.2       89.3    89.8   27.2
    Exact (sampled)        —          —       89.5   25.8

26
Dynamic Programming
- Approximate the posterior parse distribution [Matsuzaki et al. '05]
- Maximize the number of expected correct rules, à la [Goodman '98]
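One way to write the rule-level objectives reported on the next slide (a reconstruction, not the talk's exact notation): let q(r) be the posterior probability of an unsplit rule application r in the chart, obtained by summing inside/outside scores over the latent substates; then

```latex
T_{\text{Max-Rule-Sum}} = \operatorname*{argmax}_{T} \sum_{r \in T} q(r),
\qquad
T_{\text{Max-Rule-Product}} = \operatorname*{argmax}_{T} \prod_{r \in T} q(r).
```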

27
Dynamic Programming Results
  Objective              Precision  Recall  F1     Exact
  BEST DERIVATION
    Viterbi Derivation   89.6       89.4    89.5   37.4
  DYNAMIC PROGRAMMING
    Variational          90.7       90.9    90.8   41.4
    Max-Rule-Sum         90.5       91.3    90.9   40.4
    Max-Rule-Product     91.2       91.1    91.2   41.4

28
Final Results (Efficiency)
- Berkeley Parser: 15 min, 91.2 F-score (implemented in Java)
- Charniak & Johnson '05 parser: 19 min, 90.7 F-score (implemented in C)

29
Final Results (Accuracy)
                                            ≤ 40 words F1   all F1
  ENG  Charniak & Johnson '05 (generative)  90.1            89.6
       This Work                            90.6            90.1
       Charniak & Johnson '05 (reranked)    92.0            91.4
  GER  Dubey '05                            76.3            —
       This Work                            80.8            80.1
  CHN  Chiang et al. '02                    80.0            76.6
       This Work                            86.3            83.4

30
Conclusions
- Hierarchical coarse-to-fine inference
  - Projections
  - Marginalization
- Multi-lingual unlexicalized parsing

31
Thank You! Parser available at http://nlp.cs.berkeley.edu
