Learning Accurate, Compact, and Interpretable Tree Annotation Recent Advances in Parsing Technology WS 2011/2012 Saarland University in Saarbrücken Miloš.


Learning Accurate, Compact, and Interpretable Tree Annotation Recent Advances in Parsing Technology WS 2011/2012 Saarland University in Saarbrücken Miloš Ercegovčević

Outline
Introduction
  EM algorithm
Latent Grammars
  Motivation
  Learning Latent PCFG
  Split-Merge Adaptation
Efficient inference with Latent Grammars
  Pruning in Multilevel Coarse-to-Fine parsing
  Parse Selection

Introduction: EM Algorithm. Iterative algorithm for finding MLE or MAP estimates of parameters in statistical models. X: observed data; Z: set of latent variables; Θ: a vector of unknown parameters. Likelihood function: $L(\Theta; X, Z) = p(X, Z \mid \Theta)$. MLE of the marginal likelihood: $L(\Theta; X) = p(X \mid \Theta) = \sum_Z p(X, Z \mid \Theta)$. However, this quantity is often intractable, and we typically know neither Z nor Θ.

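Introduction: EM Algorithm. Find the MLE of the marginal likelihood by iteratively applying two steps. Expectation step (E-step): compute the expected log-likelihood over Z, taken under the conditional distribution of Z given X and the current estimate $\Theta^{(t)}$: $Q(\Theta \mid \Theta^{(t)}) = E_{Z \mid X, \Theta^{(t)}}[\log L(\Theta; X, Z)]$. Maximization step (M-step): find the Θ that maximizes this quantity: $\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)})$.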
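The two steps can be illustrated on a toy problem that is not from the slides: several rounds of coin flips, each made with one of two coins of unknown bias, where the coin identity is the latent variable Z. The sketch below assumes a uniform prior over the two coins; all function names and the data are hypothetical.

```python
import math

def sequence_likelihood(heads, flips, bias):
    # Probability of one particular sequence with `heads` heads in `flips` flips.
    return bias ** heads * (1.0 - bias) ** (flips - heads)

def log_likelihood(trials, theta_a, theta_b):
    # Marginal log-likelihood: the latent coin identity is summed out.
    return sum(
        math.log(0.5 * sequence_likelihood(h, n, theta_a)
                 + 0.5 * sequence_likelihood(h, n, theta_b))
        for h, n in trials
    )

def em_step(trials, theta_a, theta_b):
    heads_a = flips_a = heads_b = flips_b = 0.0
    for h, n in trials:
        # E-step: posterior responsibility of each coin for this round.
        like_a = sequence_likelihood(h, n, theta_a)
        like_b = sequence_likelihood(h, n, theta_b)
        resp_a = like_a / (like_a + like_b)
        resp_b = 1.0 - resp_a
        # Accumulate expected counts.
        heads_a += resp_a * h
        flips_a += resp_a * n
        heads_b += resp_b * h
        flips_b += resp_b * n
    # M-step: relative frequencies of the expected counts.
    return heads_a / flips_a, heads_b / flips_b

def run_em(trials, theta_a, theta_b, iterations=20):
    history = [log_likelihood(trials, theta_a, theta_b)]
    for _ in range(iterations):
        theta_a, theta_b = em_step(trials, theta_a, theta_b)
        history.append(log_likelihood(trials, theta_a, theta_b))
    return theta_a, theta_b, history

# Hypothetical data: (heads, flips) for five rounds of ten flips each.
TRIALS = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]
theta_a, theta_b, history = run_em(TRIALS, 0.6, 0.5)
```

The `history` list records the marginal log-likelihood after every iteration; EM never decreases it, which is a convenient sanity check when implementing the grammar version.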

Latent PCFG Standard coarse Treebank Tree Baseline for parsing F1 72.6

Latent PCFG Parent-annotated trees [Johnson ’98], [Klein & Manning ’03] F1 86.3

Latent PCFG Head-lexicalized trees [Collins ’99, Charniak ’00] F1 88.6

Latent PCFG Automatically clustered categories with F1 86.7 [Matsuzaki et al. ’05] Same number of subcategories for all categories

Latent PCFG. At each step, split every category in two. After 6 iterations, each category has 2^6 = 64 subcategories. Initialize EM with the results of the smaller grammar.
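A minimal sketch of one split step, assuming binary rules stored in a dictionary and symbols written as (category, index) pairs. The representation, the function names, and the noise level are illustrative assumptions, not taken from the slides. Each rule's probability is spread over the refined children with a small random perturbation, so that EM can break the symmetry between the two new subcategories.

```python
import itertools
import random

def split_symbol(symbol):
    # (category, index) -> its two refinements.
    category, index = symbol
    return [(category, 2 * index), (category, 2 * index + 1)]

def split_rules(rules, rng, noise=0.01):
    """rules maps (parent, left, right) -> probability for binary rules."""
    new_rules = {}
    for (parent, left, right), prob in rules.items():
        for new_parent in split_symbol(parent):
            children = list(itertools.product(split_symbol(left),
                                              split_symbol(right)))
            # Near-uniform weights; the noise breaks the symmetry.
            weights = [1.0 + rng.uniform(-noise, noise) for _ in children]
            total = sum(weights)
            for (new_left, new_right), weight in zip(children, weights):
                new_rules[(new_parent, new_left, new_right)] = prob * weight / total
    return new_rules

rng = random.Random(0)
grammar = {(("S", 0), ("NP", 0), ("VP", 0)): 1.0}
for _ in range(3):  # three splits give 2**3 = 8 subcategories per category
    grammar = split_rules(grammar, rng)
```

In a full system each split would be followed by EM training before splitting again; the loop here only shows how the symbol and rule inventories grow.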

Learning Latent PCFG. Induce subcategories. Like forward-backward for HMMs, but with fixed brackets. [Figure: tree with root S and latent nodes X1 ... X7 over the sentence "He was right."; arrows mark the forward and backward passes]

Learning Latent Grammar Inside-Outside probabilities
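The equations on this slide did not survive extraction. As a reference point, the inside and outside recursions can be written as follows, using the notation of Petrov et al. (2006): a node labeled A spans (r, t), its children B and C span (r, s) and (s, t), the subscripts x, y, z range over latent subcategories, and β denotes a rule probability.

```latex
\begin{align*}
P_{\mathrm{IN}}(r,t,A_x)  &= \sum_{y,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{IN}}(r,s,B_y)\, P_{\mathrm{IN}}(s,t,C_z) \\
P_{\mathrm{OUT}}(r,s,B_y) &= \sum_{x,z} \beta(A_x \to B_y C_z)\, P_{\mathrm{OUT}}(r,t,A_x)\, P_{\mathrm{IN}}(s,t,C_z) \\
P_{\mathrm{OUT}}(s,t,C_z) &= \sum_{x,y} \beta(A_x \to B_y C_z)\, P_{\mathrm{OUT}}(r,t,A_x)\, P_{\mathrm{IN}}(r,s,B_y)
\end{align*}
```

Because the brackets of the treebank tree are fixed, the sums run only over subcategories and not over split points, so the computation is linear in the size of the tree.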

Learning Latent Grammar Expectation step (E-step): Maximization step (M-step):
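The formulas for the two steps were lost in extraction. Following Petrov et al. (2006), and writing P_IN and P_OUT for the inside and outside probabilities of a node with span (r, t) whose children span (r, s) and (s, t), they can be stated as:

```latex
\begin{align*}
\text{E-step:}\quad & P\big((r,s,t,\,A_x \to B_y C_z) \mid w, T\big) \;\propto\;
  P_{\mathrm{OUT}}(r,t,A_x)\, \beta(A_x \to B_y C_z)\, P_{\mathrm{IN}}(r,s,B_y)\, P_{\mathrm{IN}}(s,t,C_z) \\
\text{M-step:}\quad & \beta(A_x \to B_y C_z) \;:=\;
  \frac{\#\{A_x \to B_y C_z\}}{\sum_{y',z'} \#\{A_x \to B_{y'} C_{z'}\}}
\end{align*}
```

Here #{·} is the expected count of a rule, accumulated from the E-step posteriors over all training trees.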

Latent Grammar: Adaptive splitting. Want to split more where the data calls for it, without loss in accuracy. Solution: split everything, then merge back the splits whose removal costs the least likelihood.

Latent Grammar : Adaptive splitting The likelihood of data for tree T and sentence w: Then for two annotations the overall loss can be estimated as:
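The two formulas referred to above are missing from the transcript. In the notation of Petrov et al. (2006), for a node n with span (r, t) and label A, with p_1 and p_2 the relative frequencies of the subcategories A_1 and A_2 in the training data, they can be reconstructed as:

```latex
\begin{align*}
P(w,T) &= \sum_{x} P_{\mathrm{IN}}(r,t,A_x)\, P_{\mathrm{OUT}}(r,t,A_x) \\
P_{\mathrm{IN}}(r,t,A)  &= p_1\, P_{\mathrm{IN}}(r,t,A_1) + p_2\, P_{\mathrm{IN}}(r,t,A_2) \\
P_{\mathrm{OUT}}(r,t,A) &= P_{\mathrm{OUT}}(r,t,A_1) + P_{\mathrm{OUT}}(r,t,A_2) \\
\Delta_{\mathrm{annotation}}(A_1, A_2) &= \prod_{i} \prod_{n \in T_i} \frac{P_n(w_i, T_i)}{P(w_i, T_i)}
\end{align*}
```

P_n(w, T) is the likelihood obtained when A_1 and A_2 are merged at node n only, using the merged inside and outside scores in place of the sum over x. Splits whose estimated loss is smallest are the ones merged back.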

Number of Phrasal Subcategories [Figure: bar chart of the number of subcategories per phrasal category; highlighted labels: NP, VP, PP; X, NAC]

Number of Lexical Subcategories [Figure: bar chart of the number of subcategories per part-of-speech tag; highlighted labels: TO, ",", POS; IN, DT, RB, VBx; NN, NNS, NNP, JJ]

Latent Grammar: Results

Parser                    F1 (≤ 40 words)   F1 (all words)
Klein & Manning ’03       86.3              85.7
Matsuzaki et al. ’05      86.7              86.1
Collins ’99               88.6              88.2
Charniak & Johnson ’05    90.1              89.6
Petrov et al. ’06         90.2              89.7

Efficient inference with Latent Grammars. Latent grammar with 91.2 F1 score on the WSJ dev set (1600 sentences). Parsing time 1621 min: more than a minute per sentence. For use in real-world applications this is too slow. Improve on inference: Hierarchical Pruning; Parse Selection.

Intermediate Grammars. Learning produces a sequence of grammars G1, G2, ..., G6, with X-Bar = G0 and G = G6. [Figure: split hierarchy for DT across levels: DT; DT1 DT2; DT1 ... DT4; DT1 ... DT8]

Projected Grammars. [Figure: learning yields G1 ... G6, with X-Bar = G0 and G = G6; the projections π_i yield π_0(G), π_1(G), ..., π_5(G), G]

Estimating Grammars

Rules in G:
  S1 → NP1 VP1   0.20
  S1 → NP1 VP2   0.12
  S1 → NP2 VP1   0.02
  S1 → NP2 VP2   0.03
  S2 → NP1 VP1   0.11
  S2 → NP1 VP2   0.05
  S2 → NP2 VP1   0.08
  S2 → NP2 VP2   0.12
  …
Rules in π(G):
  S → NP VP      0.56
  …
[Figure labels: Treebank; Infinite tree distribution]

Hierarchical Pruning. Consider the span:
  coarse:          … QP NP VP …
  split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
  split in four:   … QP1 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
  split in eight:  … … …
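The pruning idea can be sketched as follows, assuming the coarser pass has already produced a posterior probability for every chart item. The data layout, the function names, and the threshold value are hypothetical; the slides do not state them. Items whose coarse posterior falls below the threshold are dropped, and only the refinements of the surviving items are considered at the next level.

```python
def refine(symbol):
    # (category, index) -> the two subcategories it splits into at the next level.
    category, index = symbol
    return [(category, 2 * index), (category, 2 * index + 1)]

def allowed_fine_items(coarse_posteriors, threshold):
    """coarse_posteriors maps (start, end, symbol) -> posterior probability
    computed with the coarser grammar. Returns the chart items the finer
    grammar is allowed to build."""
    allowed = set()
    for (start, end, symbol), posterior in coarse_posteriors.items():
        if posterior < threshold:
            continue  # pruned: none of its refinements is considered
        for fine_symbol in refine(symbol):
            allowed.add((start, end, fine_symbol))
    return allowed

# Hypothetical posteriors for one span under the coarsest grammar.
coarse = {
    (0, 2, ("QP", 0)): 1e-6,
    (0, 2, ("NP", 0)): 0.7,
    (0, 2, ("VP", 0)): 0.3,
}
fine_items = allowed_fine_items(coarse, threshold=1e-4)
```

Repeating this at each level keeps the number of items touched by the most refined grammar small, which is where the speed-up comes from.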

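Parse Selection. Given a sentence w and a split PCFG grammar G, select the parse that minimizes the expected loss under our beliefs: $T_P^* = \arg\min_{T_P} \sum_{T_T} P(T_T \mid w, G)\, L(T_P, T_T)$. Intractable: we cannot enumerate all the candidate true trees $T_T$.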

Parse Selection. Possible solutions:
  best derivation
  generate n-best parses and re-rank them
  sampling derivations of the grammar
  select the minimum risk candidate based on a loss function of posterior marginals
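As an illustration of the n-best option, the sketch below picks the minimum-risk tree from a short candidate list. It assumes each tree is represented as a set of labeled brackets and that the candidate probabilities have been renormalized over the list. The loss (size of the symmetric difference of bracket sets) and all names are illustrative choices, not the method used in the cited papers.

```python
def bracket_loss(predicted, reference):
    # Number of labeled brackets present in one tree but not the other.
    return len(predicted ^ reference)

def min_risk_parse(candidates, loss=bracket_loss):
    """candidates is a list of (tree, probability) pairs. Returns the tree
    with the lowest expected loss under the candidate distribution."""
    def expected_loss(predicted):
        return sum(prob * loss(predicted, other) for other, prob in candidates)
    return min((tree for tree, _ in candidates), key=expected_loss)

# Hypothetical 3-best list; brackets are (label, start, end).
t1 = frozenset({("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)})
t2 = frozenset({("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)})
t3 = frozenset({("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3), ("ADVP", 1, 2)})
n_best = [(t1, 0.40), (t2, 0.35), (t3, 0.25)]
best = min_risk_parse(n_best)
```

In this toy list the most probable tree is t1, but t2 has the lower expected loss because it shares most of its brackets with t3. This is the kind of disagreement between best derivation and minimum-risk selection that motivates the latter.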

Results

Thank You!

References
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. COLING-ACL 2006 slides.
S. Petrov and D. Klein. Improved Inference for Unlexicalized Parsing. NAACL 2007 slides.
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In COLING-ACL ’06, pages 433–440.
S. Petrov and D. Klein. 2007. Improved Inference for Unlexicalized Parsing. In NAACL ’07.
T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In ACL ’05, pages 75–82.
