
1 Improved Inference for Unlexicalized Parsing Slav Petrov and Dan Klein

2 Unlexicalized Parsing. Hierarchical, adaptive refinement [Petrov et al. ‘06]: 1,140 nonterminal symbols, 531,200 rewrites, 1621 min parsing time, 91.2 F1 score on the dev set (1,600 sentences). [Figure: the DT tag refined hierarchically, DT → DT1, DT2 → DT1…DT4 → DT1…DT8]

3 Parsing time: 1621 min

4 Coarse-to-Fine Parsing [Goodman ‘97, Charniak & Johnson ‘05]: train a coarse grammar (NP, VP, …) and a refined grammar (e.g. lexicalized NP-dog, NP-cat, NP-apple, VP-run, NP-eat, …, or split NP-17, NP-12, NP-1, VP-6, VP-31, …) from the treebank; parse with the coarse grammar, prune the chart, then parse with the refined grammar.

5 Prune? For each chart item X[i,j], compute its posterior probability and discard the item if it falls below a threshold. E.g. consider the span 5 to 12: in the coarse chart only a few labels (…, QP, NP, VP, …) survive, and in the refined chart only the split versions of those surviving labels need to be built.
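
A minimal sketch of this pruning step (Python for illustration only; the actual parser is written in Java, and all names here, such as prune_chart, the inside/outside dicts, and the 1e-5 threshold, are hypothetical): a chart item is kept for the next pass only if its inside-outside posterior clears the threshold.

    def prune_chart(inside, outside, sentence_prob, threshold=1e-5):
        """inside / outside: dicts mapping (label, i, j) -> probability mass;
        sentence_prob: total inside probability of the root over the sentence."""
        allowed = set()
        for item, in_score in inside.items():
            out_score = outside.get(item, 0.0)
            # Posterior probability that this label spans words i..j given the sentence.
            posterior = in_score * out_score / sentence_prob
            if posterior >= threshold:
                allowed.add(item)
        return allowed  # only these items are expanded in the refined pass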

6 Parsing time: 1621 min → 111 min (no search error)

7 Multilevel Coarse-to-Fine Parsing [Charniak et al. ‘06]: add more rounds of pre-parsing, using grammars even coarser than X-bar (a single symbol X, then a small set A, B, …) before the refined grammar (NP-dog, NP-cat, NP-apple, VP-run, …). But which intermediate grammars should we use?

8 Hierarchical Pruning. Consider again the span 5 to 12. Coarse chart: …, QP, NP, VP, …. Split in two: …, QP1, QP2, NP1, NP2, VP1, VP2, …. Split in four: …, QP1, …, QP4, NP1, …, NP4, VP1, …, VP4, …. Split in eight: …. At each level only the split versions of items that survived pruning at the previous level are considered (see the sketch below).
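
The sketch below (the same illustrative Python as before; hierarchical_prune, descendants, and the inside_outside callable are assumptions, not the parser's real API) chains the passes: each level parses under the current allow-list, prunes by posterior, and projects the survivors onto the next level's finer symbols.

    def hierarchical_prune(sentence, grammars, descendants, inside_outside,
                           threshold=1e-5):
        """grammars: coarsest (X-bar) to finest; descendants[k][sym]: the
        level-(k+1) symbols that sym splits into; inside_outside(grammar,
        sentence, allowed) is assumed to return (inside, outside,
        sentence_prob) restricted to the allowed chart items."""
        allowed = None  # None = no restriction for the coarsest pass
        for level, grammar in enumerate(grammars):
            inside, outside, z = inside_outside(grammar, sentence, allowed)
            kept = {item for item, score in inside.items()
                    if score * outside.get(item, 0.0) / z >= threshold}
            if level + 1 == len(grammars):
                return kept  # allow-list used by the final, fully refined pass
            # Project surviving spans onto the finer symbols of the next level.
            allowed = {(child, i, j) for (sym, i, j) in kept
                       for child in descendants[level][sym]}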

9 Intermediate Grammars. Learning produces a sequence of increasingly refined grammars G0 = X-bar, G1, G2, …, G6 = G (e.g. DT → DT1, DT2 → DT1…DT4 → DT1…DT8), so the grammars from earlier training stages could serve as the coarse grammars.

10 Parsing time: 1621 min → 111 min → 35 min (no search error)

11 State Drift (DT tag): as EM training proceeds, the determiners assigned to each DT substate (some, this, That, these, the, that, …) drift between substates, so a substate of an intermediate grammar need not correspond cleanly to a substate of the final grammar. [Figure: DT substate membership changing across EM iterations]

12 Projected Grammars. Instead of the learned intermediate grammars G1, …, G6, learn only the final grammar G (with G0 = X-bar) and obtain the coarser grammars by projection: π0(G), π1(G), …, π5(G), where πi maps G onto the symbol set of level i.

13 Estimating Projected Grammars: nonterminals? Easy: the projection π maps each split nonterminal of G (NP0, NP1, VP0, VP1, S0, S1, …) to its unsplit symbol in π(G) (NP, VP, S, …).

14 Estimating Projected Grammars: rules? The rules of G that project onto S → NP VP each carry a probability (S1 → NP1 VP1 0.20, S1 → NP1 VP2 0.12, S1 → NP2 VP1 0.02, S1 → NP2 VP2 0.03, S2 → NP1 VP1 0.11, S2 → NP1 VP2 0.05, S2 → NP2 VP1 0.08, S2 → NP2 VP2 0.12), but what probability should S → NP VP receive in π(G)?

15 Estimating Projected Grammars [Corazza & Satta ‘06]: rather than re-estimating from the treebank, estimate π(G) from the infinite tree distribution defined by G; for the rules above this gives S → NP VP a probability of 0.56.

16 Calculating Expectations. Nonterminals: c_k(X) is the expected count of nonterminal X in trees of depth up to k; the iteration converges within 25 iterations (a few seconds). Rules: each projected rule's probability is then the expectation-weighted sum of the refined rules that map onto it (see the sketch below).
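
A hedged sketch of that computation (illustrative Python; the rule encoding as a dict from parent symbol to (children, probability) pairs, the project function, and the 25-iteration cutoff follow the slide's description rather than the parser's actual data structures; children are taken to be nonterminal symbols). Expected counts are computed by a truncated fixed point, and each coarse rule's probability is the expectation-weighted sum of the refined rules projecting onto it, normalized by the expected count of the coarse parent.

    from collections import defaultdict

    def expected_counts(rules, root, iterations=25):
        """Expected occurrences of each symbol in a tree drawn from G,
        truncated at depth `iterations` (the c_k(X) of the slide)."""
        counts = defaultdict(float)
        counts[root] = 1.0
        for _ in range(iterations):
            new = defaultdict(float)
            new[root] = 1.0
            for parent, expansions in rules.items():
                for children, prob in expansions:
                    for child in children:
                        # Each expected occurrence of `parent` contributes
                        # `prob` expected occurrences of each child symbol.
                        new[child] += counts[parent] * prob
            counts = new
        return counts

    def project_rule_probs(rules, counts, project):
        """Probability of each coarse rule in pi(G)."""
        numer = defaultdict(float)
        denom = defaultdict(float)
        for parent, expansions in rules.items():
            denom[project(parent)] += counts[parent]
            for children, prob in expansions:
                coarse = (project(parent), tuple(project(c) for c in children))
                numer[coarse] += counts[parent] * prob
        return {rule: mass / denom[rule[0]] for rule, mass in numer.items()}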

17 Parsing time: 1621 min → 111 min → 35 min → 15 min (no search error)

18 Parsing times per pass: with the hierarchy X-bar = G0, π0(G), …, G, the successive coarse-to-fine passes account for roughly 60 %, 12 %, 7 %, 6 %, 5 %, and 4 % of the total parsing time.

19 Bracket Posteriors (after G 0 )

20 Bracket Posteriors (after G 1 )

21 Bracket Posteriors (Movie) (Final Chart)

22 Bracket Posteriors (Best Tree)

23 Parse Selection. Computing the most likely unsplit tree is NP-hard. Options: settle for the best derivation; rerank an n-best list; use an alternative objective function. [Figure: one parse corresponds to several derivations over split symbols, whose probabilities must be summed]
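
The difficulty comes from the fact that an unsplit tree's probability is a sum over all of its derivations, i.e. over all ways of annotating it with split symbols (written here in standard latent-variable notation, where π removes the annotations, not to be confused with the grammar projection π above):

    P(T \mid w) \;=\; \sum_{d \,:\, \pi(d) = T} P(d \mid w)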

24 Parse Risk Minimization [Titov & Henderson ‘06]: pick the tree that minimizes the expected loss according to our beliefs, where T_T is the true tree, T_P the predicted tree, and L a loss function (0/1, precision, recall, F1). Use an n-best candidate list and approximate the expectation with samples.
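
Spelled out (the standard minimum-Bayes-risk formulation; the slide itself only names the ingredients), the selected tree minimizes the expected loss under the model's posterior over true trees:

    T_P^{*} \;=\; \operatorname*{argmin}_{T_P} \sum_{T_T} P(T_T \mid w)\, L(T_P, T_T)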

25 Reranking Results

    Objective               Precision   Recall   F1     Exact
    BEST DERIVATION
      Viterbi Derivation      89.6       89.4    89.5   37.4
      Exact (non-sampled)      –          –      90.8   41.7
      Exact/F1 (oracle)       95.3       94.4    95.0   63.9
    RERANKING
      Precision (sampled)     91.1       88.1    89.6   21.4
      Recall (sampled)        88.2       91.3    89.7   21.5
      F1 (sampled)            90.2       89.3    89.8   27.2
      Exact (sampled)          –          –      89.5   25.8

26 Dynamic Programming [Matsuzaki et al. ‘05]: approximate the posterior parse distribution à la [Goodman ‘98], then maximize the expected number of correct rules.
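
As a rough sketch of the max-rule objectives (written the way such rule-posterior scores are usually defined; the exact normalization in the paper may differ): every unsplit rule application over a span is scored by marginalizing the inside, outside, and rule probabilities over the split symbols,

    q(A \to B\,C,\; i,k,j) \;=\; \frac{1}{P(w)} \sum_{a \in A}\sum_{b \in B}\sum_{c \in C} P_{\mathrm{out}}(a,i,j)\, p(a \to b\,c)\, P_{\mathrm{in}}(b,i,k)\, P_{\mathrm{in}}(c,k,j)

and the output tree maximizes the sum (Max-Rule-Sum) or product (Max-Rule-Product) of these scores, which a Viterbi-style dynamic program over the unsplit chart can compute exactly.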

27 Dynamic Programming Results

    Objective               Precision   Recall   F1     Exact
    BEST DERIVATION
      Viterbi Derivation      89.6       89.4    89.5   37.4
    DYNAMIC PROGRAMMING
      Variational             90.7       90.9    90.8   41.4
      Max-Rule-Sum            90.5       91.3    90.9   40.4
      Max-Rule-Product        91.2       91.1    91.2   41.4

28 Final Results (Efficiency). Berkeley Parser: 15 min, 91.2 F-score, implemented in Java. Charniak & Johnson ‘05 parser: 19 min, 90.7 F-score, implemented in C.

29 Final Results (Accuracy)

    Language  Parser                                ≤ 40 words F1   all F1
    ENG       Charniak & Johnson ‘05 (generative)        90.1        89.6
              This Work                                  90.6        90.1
              Charniak & Johnson ‘05 (reranked)          92.0        91.4
    GER       Dubey ‘05                                  76.3          –
              This Work                                  80.8        80.1
    CHN       Chiang et al. ‘02                          80.0        76.6
              This Work                                  86.3        83.4

30 Conclusions: hierarchical coarse-to-fine inference; projections; marginalization; multilingual unlexicalized parsing.

31 Thank You! Parser available at http://nlp.cs.berkeley.edu

