1
Learning Accurate, Compact, and Interpretable Tree Annotation
Recent Advances in Parsing Technology, WS 2011/2012
Saarland University, Saarbrücken
Miloš Ercegovčević

2
Outline
- Introduction
- EM algorithm
- Latent Grammars
  - Motivation
  - Learning Latent PCFG
  - Split-Merge Adaptation
- Efficient inference with Latent Grammars
  - Pruning in Multilevel Coarse-to-Fine parsing
  - Parse Selection

3
Introduction: EM Algorithm
An iterative algorithm for finding MLE or MAP estimates of parameters in statistical models.
X – observed data; Z – set of latent variables; Θ – vector of unknown parameters.
Likelihood function: L(Θ; X, Z) = p(X, Z | Θ)
MLE of the marginal likelihood: L(Θ; X) = p(X | Θ) = Σ_Z p(X, Z | Θ)
However, this quantity is intractable to maximize directly: we know neither Z nor Θ.

4
Introduction: EM Algorithm
Find the MLE of the marginal likelihood by iteratively applying two steps:
Expectation step (E-step): compute the expected log-likelihood under the posterior over Z given the current Θ: Q(Θ | Θ^(t)) = E_{Z | X, Θ^(t)} [ log p(X, Z | Θ) ]
Maximization step (M-step): find the Θ that maximizes this quantity: Θ^(t+1) = argmax_Θ Q(Θ | Θ^(t))
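As a concrete illustration (not from the slides), the two steps can be sketched with EM for a mixture of two biased coins: Z is the latent coin identity of each trial, Θ the two unknown biases. All names here are hypothetical.

```python
def em_two_coins(flips, theta=(0.6, 0.4), iters=50):
    """EM for a 50/50 mixture of two biased coins.
    flips: list of (heads, tails) counts per trial; which coin produced
    each trial is the latent variable Z."""
    t1, t2 = theta
    for _ in range(iters):
        # E-step: posterior probability that each trial used coin 1,
        # then accumulate expected head/tail counts per coin
        c1_h = c1_t = c2_h = c2_t = 0.0
        for h, t in flips:
            l1 = (t1 ** h) * ((1 - t1) ** t)
            l2 = (t2 ** h) * ((1 - t2) ** t)
            w1 = l1 / (l1 + l2)
            c1_h += w1 * h;        c1_t += w1 * t
            c2_h += (1 - w1) * h;  c2_t += (1 - w1) * t
        # M-step: re-estimate each bias from its expected counts
        t1 = c1_h / (c1_h + c1_t)
        t2 = c2_h / (c2_h + c2_t)
    return t1, t2

# trials drawn from a heads-heavy coin and a tails-heavy coin
flips = [(8, 2), (9, 1), (3, 7), (2, 8), (7, 3)]
t1, t2 = em_two_coins(flips)
```

With these trials the estimates settle near 0.8 and 0.25, the empirical head rates of the two groups of trials.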

5
Latent PCFG
Standard coarse treebank tree. Baseline for parsing: F1 72.6.

6
Latent PCFG
Parent-annotated trees [Johnson ’98, Klein & Manning ’03]: F1 86.3.

7
Head-lexicalized trees [Collins ’99, Charniak ’00]: F1 88.6.

8
Latent PCFG
Automatically clustered subcategories: F1 86.7 [Matsuzaki et al. ’05]. Uses the same number of subcategories for all categories.

9
Latent PCFG
At each step, split every category in two. After 6 split iterations the number of subcategories per category is 64 (2^6). Initialize each EM run with the results of the smaller grammar.
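The splitting step can be sketched as duplicating a subcategory's rule distribution with a small random perturbation, which breaks symmetry so EM can specialize the two copies (a hypothetical sketch, not the authors' code):

```python
import random

def split_distribution(probs, eps=0.01, seed=0):
    """Given one subcategory's rule distribution (probs summing to 1),
    return two perturbed copies for the two new subcategories.
    Small noise breaks the tie between the copies; renormalizing keeps
    each copy a valid distribution."""
    rng = random.Random(seed)
    copies = []
    for _ in range(2):
        noisy = [p * (1 + rng.uniform(-eps, eps)) for p in probs]
        z = sum(noisy)
        copies.append([p / z for p in noisy])
    return copies

a1, a2 = split_distribution([0.5, 0.3, 0.2])
```

Without the perturbation, the two copies would remain identical under EM and the split would be useless.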

10
Learning Latent PCFG
Induce subcategories, like forward-backward for HMMs, but with fixed brackets: the treebank tree structure is observed.
[Figure: parse tree with latent subcategories X1–X7 over the sentence "He was right.", with forward (inside) and backward (outside) passes indicated]

11
Learning Latent Grammar Inside-Outside probabilities
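The inside (β) and outside (α) scores referred to above can be written, in standard notation for annotated nonterminals A_x, B_y, C_z and spans (i, j), roughly as:

```latex
\beta(A_x, i, j) = \sum_{A_x \to B_y C_z} p(A_x \to B_y C_z)
    \sum_{k=i+1}^{j-1} \beta(B_y, i, k)\, \beta(C_z, k, j)

\alpha(B_y, i, k) = \sum_{A_x \to B_y C_z} \sum_{j > k}
    \alpha(A_x, i, j)\, p(A_x \to B_y C_z)\, \beta(C_z, k, j)
  + \sum_{A_x \to C_z B_y} \sum_{h < i}
    \alpha(A_x, h, k)\, p(A_x \to C_z B_y)\, \beta(C_z, h, i)
```

In the two outside terms, B_y appears as the left and the right child of its parent, respectively.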

12
Learning Latent Grammar
Expectation step (E-step): compute expected counts of each annotated rule from the inside-outside scores, c(A_x → B_y C_z) ∝ Σ_{i<k<j} α(A_x, i, j) · p(A_x → B_y C_z) · β(B_y, i, k) · β(C_z, k, j)
Maximization step (M-step): re-estimate rule probabilities by relative frequency of the expected counts, p(A_x → B_y C_z) = c(A_x → B_y C_z) / Σ_{B', C'} c(A_x → B' C')

13
Latent Grammar: Adaptive Splitting
Want to split more where the data demands it, without loss in accuracy.
Solution: split everything, then merge back those splits whose removal causes the smallest loss in likelihood.

14
Latent Grammar: Adaptive Splitting
The likelihood of the data for tree T and sentence w can be recovered at any node n from the inside-outside scores: P(w, T) = Σ_x α(A_x, i, j) · β(A_x, i, j), where A_x ranges over the annotations of the category spanning (i, j) at n.
Then, for two annotations A_1 and A_2, the overall loss of merging them can be estimated as the product over all nodes and trees of the ratio P_merge(w, T) / P(w, T), where P_merge replaces the two annotations by their weighted average.

15
Number of Phrasal Subcategories

16
[Bar chart of phrasal subcategory counts; PP, VP, and NP receive the most subcategories]

17
Number of Phrasal Subcategories
[Bar chart; X and NAC are among the least-split categories]

18
Number of Lexical Subcategories
[Bar chart; TO and POS highlighted]

19
Number of Lexical Subcategories
[Bar chart; IN, DT, RB, and the VBx tags highlighted]

20
Number of Lexical Subcategories
[Bar chart; NN, NNS, NNP, and JJ receive the most subcategories]

21
Latent Grammar: Results

Parser                    F1 (≤ 40 words)   F1 (all words)
Klein & Manning ’03       86.3              85.7
Matsuzaki et al. ’05      86.7              86.1
Collins ’99               88.6              88.2
Charniak & Johnson ’05    90.1              89.6
Petrov et al. ’06         90.2              89.7

22
Efficient Inference with Latent Grammars
The latent grammar reaches an F1 score of 91.2 on the WSJ dev set (1600 sentences).
Parsing the dev set takes 1621 minutes: more than a minute per sentence, too slow for real-world applications.
Improve on inference with:
- Hierarchical Pruning
- Parse Selection

23
Intermediate Grammars
Learning proceeds through a hierarchy of grammars: X-Bar = G0 → G1 → G2 → … → G6 = G.
Example: DT starts as a single category, then splits to {DT1, DT2}, then {DT1, …, DT4}, then {DT1, …, DT8}.

24
Projected Grammars
Rather than keeping the intermediate grammars G1 … G6 from learning (X-Bar = G0, G6 = G), derive the coarser grammars by projection from the final grammar: π0(G), π1(G), …, π5(G), where πi projects G onto the symbols of level i.

25
Estimating Grammars

Rules in G:                   Rule in π(G):
S1 → NP1 VP1   0.20           S → NP VP   0.56
S1 → NP1 VP2   0.12
S1 → NP2 VP1   0.02
S1 → NP2 VP2   0.03
S2 → NP1 VP1   0.11
S2 → NP1 VP2   0.05
S2 → NP2 VP1   0.08
S2 → NP2 VP2   0.12

Projected probabilities are estimated from the infinite tree distribution of the treebank grammar.
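A projection can be sketched by collapsing annotated rules onto their coarse symbols (hypothetical function names). Note this sketch weights each parent subcategory uniformly, so it yields 0.365 for S → NP VP rather than the slide's 0.56, which comes from weighting subcategories by the infinite tree distribution:

```python
from collections import defaultdict

def project(split_rules):
    """Collapse annotated rules like (S1 -> NP2 VP1, p) into coarse rules
    like (S -> NP VP). split_rules maps (parent, left, right) -> prob,
    where symbols carry numeric subscripts such as 'S1'.
    Uniform-weight approximation of the true projection."""
    strip = lambda s: s.rstrip('0123456789')
    # count subcategories of each coarse parent symbol
    subcats = defaultdict(set)
    for (a, _, _) in split_rules:
        subcats[strip(a)].add(a)
    coarse = defaultdict(float)
    for (a, b, c), p in split_rules.items():
        # each parent subcategory contributes with equal weight
        coarse[(strip(a), strip(b), strip(c))] += p / len(subcats[strip(a)])
    return dict(coarse)

rules = {('S1', 'NP1', 'VP1'): 0.20, ('S1', 'NP1', 'VP2'): 0.12,
         ('S1', 'NP2', 'VP1'): 0.02, ('S1', 'NP2', 'VP2'): 0.03,
         ('S2', 'NP1', 'VP1'): 0.11, ('S2', 'NP1', 'VP2'): 0.05,
         ('S2', 'NP2', 'VP1'): 0.08, ('S2', 'NP2', 'VP2'): 0.12}
proj = project(rules)
```

The difference between 0.365 and 0.56 is exactly why the proper estimate needs the tree distribution: S1 and S2 are not equally frequent in trees generated by G.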

26
Hierarchical Pruning
Consider one span of the chart as the grammars get finer:
coarse:           … QP  NP  VP …
split in two:     … QP1 QP2  NP1 NP2  VP1 VP2 …
split in four:    … QP1 … QP4  NP1 … NP4  VP1 … VP4 …
split in eight:   … and so on …
Items pruned at a coarser level are never built at the finer levels.
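The pruning step can be sketched as thresholding posterior span probabilities computed under the coarser grammar (hypothetical names; the threshold value is illustrative):

```python
def prune_chart(posteriors, threshold=1e-5):
    """Keep only (symbol, i, j) chart items whose posterior under the
    coarser grammar exceeds the threshold; the finer (split) grammar
    then scores only the survivors.
    posteriors: dict mapping (symbol, i, j) -> posterior probability."""
    return {item for item, p in posteriors.items() if p > threshold}

coarse_posteriors = {('QP', 0, 2): 0.40,
                     ('NP', 0, 2): 0.59,
                     ('VP', 0, 2): 1e-8}
allowed = prune_chart(coarse_posteriors)
# VP is pruned on this span, so VP1/VP2 are never considered there
```

Because every split symbol projects to a coarse symbol, one cheap coarse pass can rule out most items for all of its refinements at once.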

27
Parse Selection
Given a sentence w and a split PCFG grammar G, select the best parse tree T. Maximizing P(T | w, G) directly is intractable: each unannotated tree T corresponds to exponentially many annotated derivations, and we cannot enumerate them all.

28
Parse Selection
Possible solutions:
- best derivation (Viterbi over annotated symbols, ignoring derivation ambiguity)
- generate n-best parses and re-rank them
- sample derivations from the grammar
- select the minimum-risk candidate based on a loss function over posterior rule marginals
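One concrete instance of the minimum-risk idea is a max-rule-product style objective over posterior rule marginals (a sketch: r is the posterior probability that the coarse rule is used over the given span, marginalizing out the annotations):

```latex
r(A \to B\,C,\; i, k, j) = \frac{1}{P(w)} \sum_{x, y, z}
    \alpha(A_x, i, j)\, p(A_x \to B_y C_z)\,
    \beta(B_y, i, k)\, \beta(C_z, k, j)

T^* = \arg\max_T \prod_{(A \to B C,\; i, k, j) \in T}
    r(A \to B\,C,\; i, k, j)
```

Unlike the best derivation, this objective aggregates probability mass from all annotations of each rule before choosing the tree.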

29
Results

30
Thank You!

31
References
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. In COLING-ACL ’06, pages 433–440 (paper and slides).
S. Petrov and D. Klein. 2007. Improved Inference for Unlexicalized Parsing. In HLT-NAACL ’07 (paper and slides).
T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with Latent Annotations. In ACL ’05, pages 75–82.
