1
Parsing German with Latent Variable Grammars Slav Petrov and Dan Klein UC Berkeley

2
The Game of Designing a Grammar
Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99, Charniak 00]
- Automatic clustering?

3
Previous Work: Manual Annotation [Klein & Manning 03]
Manually split categories:
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional
Advantages:
- Fairly compact grammar
- Linguistic motivations
Disadvantages:
- Performance leveled out
- Manually annotated

Model                    F1
Naive Treebank Grammar   72.6
Klein & Manning 03       86.3

4
Previous Work: Automatic Annotation Induction [Matsuzaki et al. 05, Prescher 05]
Advantages:
- Automatically learned: label all nodes with latent variables, with the same number k of subcategories for every category
Disadvantages:
- Grammar gets too large
- Most categories are oversplit while others are undersplit

Model                F1
Klein & Manning 03   86.3
Matsuzaki et al. 05  86.7

5
Overview [Petrov, Barrett, Thibaux & Klein, ACL 06; Petrov & Klein, NAACL 07]
Learning:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Inference:
- Coarse-to-Fine Decoding
- Variational Approximation
German Analysis

6
Learning Latent Annotations
EM algorithm, with forward and backward passes over the tree (figure: latent subcategories X1 through X7 over the sentence "He was right."):
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs.
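A minimal sketch of the "forward" (inside) half of this E-step, not the authors' implementation: because the brackets and base categories are known, the recursion runs only over the observed tree, summing over latent subcategories. The tree encoding and the `rule_prob`/`lex_prob` dictionary formats are invented here for illustration.

```python
def inside(node, rule_prob, lex_prob):
    """Inside scores over latent subcategories for a FIXED parse tree.

    node: ("X", word) for a preterminal, or ("X", left, right) internally.
    rule_prob[(X, Y, Z)][(x, y, z)] = P(X_x -> Y_y Z_z)
    lex_prob[(X, word)][x]          = P(X_x -> word)
    Returns {subcategory: inside score} for `node`.
    """
    if len(node) == 2 and isinstance(node[1], str):
        label, word = node
        return dict(lex_prob[(label, word)])
    label, left, right = node
    li = inside(left, rule_prob, lex_prob)
    ri = inside(right, rule_prob, lex_prob)
    scores = {}
    # Sum over children's subcategories, exactly like the HMM forward pass.
    for (x, y, z), p in rule_prob[(label, left[0], right[0])].items():
        scores[x] = scores.get(x, 0.0) + p * li.get(y, 0.0) * ri.get(z, 0.0)
    return scores
```

Pairing this with an analogous outside pass gives the posterior over each node's subcategories, from which EM re-estimates the rule probabilities.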

7
Starting Point (limit of computational resources)

8
Refinement of the DT tag (figure: DT split into subcategories DT-1 through DT-4)

9
Refinement of the DT tag

10
Hierarchical Refinement of the DT tag

11
Hierarchical Estimation Results

Model                  F1
Baseline               87.3
Hierarchical Training  88.4

12
Refinement of the ',' tag
Splitting all categories the same amount is wasteful:

13
The DT tag revisited Oversplit?

14
Adaptive Splitting
Want to split complex categories more.
Idea: split everything, then roll back the splits which were least useful.

16
Adaptive Splitting
Evaluate the loss in likelihood from removing each split:

    loss = (data likelihood with split reversed) / (data likelihood with split)

No loss in accuracy when 50% of the splits are reversed.
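A small sketch of the merge selection step under the criterion above (an illustration, not the paper's code): each candidate split gets a likelihood ratio, and the splits whose reversal costs the least, i.e. ratios closest to 1, are rolled back.

```python
def splits_to_merge(loss, fraction=0.5):
    """Pick the least useful splits to reverse.

    loss: {split_name: likelihood-with-split-reversed / likelihood-with-split}.
    Ratios near 1 mean the split bought almost nothing, so those splits
    are merged back first. Returns the chosen fraction of splits.
    """
    ranked = sorted(loss, key=loss.get, reverse=True)  # closest to 1 first
    return ranked[: int(len(ranked) * fraction)]
```

With `fraction=0.5` this mirrors the "reverse 50% of the splits" setting reported on the slide.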

17
Adaptive Splitting Results

Model             F1
Previous          88.4
With 50% Merging  89.5

18
Number of Phrasal Subcategories

19
Number of Lexical Subcategories

20
Smoothing
Heavy splitting can lead to overfitting.
Idea: smoothing allows us to pool statistics across subcategories.

21
Linear Smoothing
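One way to read "linear smoothing" is as interpolation of each subcategory's estimate with the mean across all subcategories of the same base category; the sketch below illustrates that (the interpolation weight `alpha` is an assumed hyperparameter, not a value from the slides).

```python
def smooth(probs, alpha=0.01):
    """Linearly smooth subcategory estimates toward their shared mean:
    p'_x = (1 - alpha) * p_x + alpha * mean(p).
    Pools statistics across the subcategories of one base category
    while preserving the total probability mass."""
    mean = sum(probs) / len(probs)
    return [(1 - alpha) * p + alpha * mean for p in probs]
```

Because the mean is unchanged by the interpolation, the smoothed values still sum to the original total.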

22
Result Overview

Model           F1
Previous        89.5
With Smoothing  90.7

23
Coarse-to-Fine Parsing [Goodman 97, Charniak & Johnson 05]
Treebank -> coarse grammar (NP, VP, ...) -> parse & prune -> refined grammar (lexicalized: NP-dog, NP-cat, NP-apple, VP-run, VP-eat, ...; or latent: NP-17, NP-12, NP-1, VP-6, VP-31, ...) -> parse.

24
Hierarchical Pruning
Consider the span 5 to 12:
- coarse:         ... QP NP VP ...
- split in two:   ... QP1 QP2 NP1 NP2 VP1 VP2 ...
- split in four:  ... QP1 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
- split in eight: ...
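The pruning step at each level of the hierarchy can be sketched as a simple posterior threshold on chart items (an illustration with an invented chart encoding and threshold, not the parser's actual data structures): only the survivors' split versions are considered by the next, finer grammar.

```python
def prune_chart(posteriors, threshold=1e-4):
    """Keep only the chart items whose posterior under the coarser
    grammar clears the threshold.

    posteriors: {(start, end, label): posterior probability}.
    The finer pass then only instantiates subcategories of survivors.
    """
    return {item for item, p in posteriors.items() if p >= threshold}
```

Applied level by level (coarse, split in two, split in four, ...), this is what keeps the refined chart small for a span like 5 to 12 above.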

25
Intermediate Grammars
X-Bar = G0; learning produces G1, G2, ..., G6 = G.
Figure: DT splits hierarchically, DT -> DT1 DT2 -> DT1 ... DT4 -> DT1 ... DT8.

26
State Drift (DT tag)
Figure: as EM continues, the words assigned to each DT subcategory drift (e.g. some, this, that, these, the, That, This).

27
Projected Grammars
X-Bar = G0; learning produces G1, G2, ..., G6 = G. Projections pi_0(G), ..., pi_5(G) recover the coarse grammars from G instead.
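The projection of a refined grammar onto coarser symbols can be sketched as an expectation over subcategory rules; in this illustration the distribution over parent subcategories is assumed to be given (the paper derives it from the grammar itself), and the dictionary formats are invented.

```python
def project(rule_prob, parent_weight):
    """Project refined rule probabilities onto coarse symbols:

        P(X -> Y Z) = sum_x w(X_x) * sum_{y,z} P(X_x -> Y_y Z_z)

    rule_prob[(X, Y, Z)][(x, y, z)] = P(X_x -> Y_y Z_z)
    parent_weight[X][x]             = w(X_x), assumed known here.
    """
    coarse = {}
    for (X, Y, Z), sub in rule_prob.items():
        coarse[(X, Y, Z)] = sum(parent_weight[X][x] * p
                                for (x, y, z), p in sub.items())
    return coarse
```

Because each refined parent's rules sum to 1 and the weights w sum to 1, the projected coarse rules again form a proper distribution.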

28
Bracket Posteriors (after G0)

29
Bracket Posteriors (after G1)

30
Bracket Posteriors (Movie) (Final Chart)

31
Bracket Posteriors (Best Tree)

32
Parse Selection
Computing the most likely unsplit tree is NP-hard. Options:
- Settle for the best derivation
- Rerank an n-best list
- Use an alternative objective function / variational approximation
(Figure: several derivations, each with log-probability -2, sum into a single parse.)
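The problem on this slide can be made concrete with a toy computation (an illustration; the parse IDs and probabilities are invented): each unsplit parse corresponds to many derivations over subcategories, and the single best derivation may belong to a parse whose summed probability is not the highest.

```python
def best_derivation_vs_parse(derivations):
    """derivations: list of (parse_id, probability), one per derivation.

    Returns (parse of the single best derivation,
             parse with the highest TOTAL probability over derivations).
    The two can disagree, which is why 'settle for best derivation'
    is only an approximation to the most likely parse.
    """
    best_deriv = max(derivations, key=lambda d: d[1])[0]
    totals = {}
    for parse, p in derivations:
        totals[parse] = totals.get(parse, 0.0) + p
    best_parse = max(totals, key=totals.get)
    return best_deriv, best_parse
```

For example, one derivation of parse A with probability 0.3 beats any single derivation of parse B with three derivations of 0.2 each, yet B's total (0.6) makes it the more likely parse.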

33
Efficiency Results

Parser                        Time    Implementation
Berkeley Parser               15 min  Java
Charniak & Johnson 05 Parser  19 min  C

34
Accuracy Results

                                          <=40 words F1   all F1
ENG  Charniak & Johnson 05 (generative)   90.1            89.6
     This Work                            90.6            90.1
GER  Dubey 05                             76.3            -
     This Work                            80.8            80.1
CHN  Chiang et al. 02                     80.0            76.6
     This Work                            86.3            83.4

35
Parsing German Shared Task
Two-pass parsing:
- Determine constituency structure (F1: 85/94)
- Assign grammatical functions
One-pass approach:
- Treat categories + grammatical functions as labels

37
Development Set Results

38
Shared Task Results

39
Part-of-speech splits

40
Linguistic Candy

41
Conclusions
- Split & Merge Learning: Hierarchical Training, Adaptive Splitting, Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference: Projections, Marginalization
- Multi-lingual Unlexicalized Parsing

42
Thank You! The parser is available at http://nlp.cs.berkeley.edu
