
Self-training with Products of Latent Variable Grammars Zhongqiang Huang, Mary Harper, and Slav Petrov.




1 Self-training with Products of Latent Variable Grammars Zhongqiang Huang, Mary Harper, and Slav Petrov

2 Overview: Motivation and Prior Related Research; Experimental Setup; Results; Analysis; Conclusions

3 PCFG-LA Parser: a parse tree for a sentence is scored by summing over its latent-annotated derivations, whose parameters are learned from data [Matsuzaki et al. ’05] [Petrov et al. ’06] [Petrov & Klein ’07]

4 PCFG-LA Parser: hierarchical splitting (& merging). The original node NP is split to 2 (NP1, NP2), then to 4 (NP1–NP4), then to 8 (NP1–NP8), with increasing model complexity; the n-th grammar is the grammar trained after n split-merge rounds. Grammar order selection follows the typical learning curve, using the development set.
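The doubling schedule described above can be sketched directly (a toy illustration; the function name is mine, and a real grammar merges some splits back, so counts need not stay exact powers of two):

```python
# Each split round doubles a nonterminal's latent subcategories:
# NP -> {NP1, NP2} -> {NP1..NP4} -> {NP1..NP8} -> ...
def subcategories(symbol: str, n_rounds: int) -> list[str]:
    """Latent subcategories of `symbol` after n_rounds splits (no merging)."""
    return [f"{symbol}{i}" for i in range(1, 2 ** n_rounds + 1)]

print(subcategories("NP", 2))        # ['NP1', 'NP2', 'NP3', 'NP4']
print(len(subcategories("NP", 7)))   # 128 states after seven rounds, sans merging
```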

5 PCFG-LA Properties
Hierarchical Training ◦ increase the number of latent states hierarchically
Adaptive State Splitting ◦ goal: split complex categories more and simple categories less ◦ idea: split everything, then roll back the splits that are least useful (ranked by the loss in likelihood from removing each split; typically 50% are undone)
Parameter Smoothing (pool statistics)
Decoding Methods (max-rule-product)
Coarse-to-Fine Parsing (to speed up decoding)
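The split-then-roll-back idea can be sketched as follows (a minimal illustration; the `select_splits` helper and the loss figures are hypothetical stand-ins for the real loss in likelihood from removing each split):

```python
# Adaptive state splitting, sketched: split everything, score each split by the
# likelihood loss its removal would cause, and undo the least useful half.
def select_splits(splits, losses, keep_fraction=0.5):
    """Return the keep_fraction of splits whose removal would cost the most."""
    ranked = sorted(zip(splits, losses), key=lambda pair: pair[1], reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    return [split for split, _ in ranked[:n_keep]]

splits = ["NP->NP1|NP2", "DT->DT1|DT2", "VP->VP1|VP2", "IN->IN1|IN2"]
losses = [3.2, 0.1, 2.7, 0.05]  # hypothetical likelihood losses if undone
print(select_splits(splits, losses))  # complex categories keep their splits
```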

6 Max-Rule Decoding (Single Grammar): example rule S → NP VP [Goodman ’98, Matsuzaki et al. ’05, Petrov & Klein ’07]

7 Variability [Petrov ’10]

8 Max-Rule Decoding (Multiple Grammars): multiple grammars trained from the treebank are combined [Petrov ’10]
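In product-model max-rule decoding, a rule's score under the combined model multiplies its posteriors from the individual grammars; a minimal numeric sketch (the function name is mine):

```python
import math

def product_score(posteriors):
    """Product of one rule's posterior probabilities across grammars."""
    return math.exp(sum(math.log(p) for p in posteriors))

# Rules that all grammars agree on survive; a single dissenting grammar
# drags the combined score down sharply:
print(product_score([0.9, 0.8, 0.85]))  # high agreement -> high score
print(product_score([0.9, 0.1, 0.85]))  # one grammar disagrees -> low score
```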

9 Product Model Results [Petrov ’10]

10 Motivation for Self-Training

11 Self-training (ST): train on the hand-labeled data, label the unlabeled data, train on the automatically labeled data, and select with the dev set

12 Self-training (ST): a model trained on the hand-labeled data labels the unlabeled data; a new model is then trained on the automatically labeled data
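The loop in the two diagrams above can be sketched generically (the `toy_train`/`toy_label` stand-ins are mine; a real instantiation would train a PCFG-LA grammar and parse with it):

```python
from collections import Counter

def self_train(hand_labeled, unlabeled, train, label, n_rounds=1):
    """hand_labeled: list of (x, y) pairs; unlabeled: list of x."""
    model = train(hand_labeled)                           # train on hand-labeled data
    for _ in range(n_rounds):
        auto = [(x, label(model, x)) for x in unlabeled]  # label the unlabeled data
        model = train(hand_labeled + auto)                # retrain on the union
    return model

# Toy instantiation: the "model" is just the majority label.
toy_train = lambda data: Counter(y for _, y in data).most_common(1)[0][0]
toy_label = lambda model, x: model
print(self_train([(1, "NP"), (2, "NP"), (3, "VP")], [4, 5], toy_train, toy_label))
```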

13 Self-Training Curve

14 WSJ Self-Training Results (F score) [Huang & Harper ’09]

15 Self-Trained Grammar Variability (self-trained parser)

16 Self-Trained Grammar Variability: self-trained round 7 vs. self-trained round 6

17 Summary
Two issues: variability & over-fitting
Product model: makes use of variability; over-fitting remains in individual grammars
Self-training: alleviates over-fitting; variability remains in individual grammars
Next step: combine self-training with product models

18 Experimental Setup
Two genres:
WSJ: Sections 2-21 for training, 22 for dev, 23 for test; 176.9K sentences per self-trained grammar
Broadcast News: WSJ + 80% of BN for training, 10% for dev, 10% for test (see paper)
Training Scenarios: train 10 models with different seeds and combine using max-rule decoding
Regular: treebank training with up to 7 split-merge iterations
Self-Training: three methods with up to 7 split-merge iterations

19 ST-Reg: the round-6 product labels the unlabeled data once, producing a single automatically labeled set; grammars are trained on the hand-labeled and automatically labeled data and selected with the dev set (train multiple grammars? combine as a product?)

20 ST-Prod: a single automatically labeled set produced by the round-6 product; multiple grammars are trained on it and combined as a product (use more data?)

21 ST-Prod-Mult: the round-6 product labels a different automatically labeled set for each grammar; the resulting grammars are combined as a product
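The key difference between the three setups is how many distinct automatically labeled sets feed the individual grammars; a schematic sketch (the function and its arguments are mine, not the paper's interface):

```python
import random

def auto_sets(setup, unlabeled_pool, n_grammars, per_set):
    """Which auto-labeled sets the n_grammars are trained on, per setup."""
    if setup in ("ST-Reg", "ST-Prod"):  # one shared set for every grammar
        shared = random.sample(unlabeled_pool, per_set)
        return [shared] * n_grammars
    if setup == "ST-Prod-Mult":         # a different set for each grammar
        return [random.sample(unlabeled_pool, per_set) for _ in range(n_grammars)]
    raise ValueError(setup)

pool = list(range(10000))
shared = auto_sets("ST-Prod", pool, n_grammars=10, per_set=100)
multi = auto_sets("ST-Prod-Mult", pool, n_grammars=10, per_set=100)
print(all(s is shared[0] for s in shared))  # True: one set, reused by all grammars
print(len(multi))                           # 10 independently sampled sets
```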


25 A Closer Look at Regular Results

26 A Closer Look at Regular Results

27 A Closer Look at Regular Results

28 A Closer Look at Self-Training Results

29 A Closer Look at Self-Training Results

30 A Closer Look at Self-Training Results

31 Analysis of Rule Variance: we measure the average empirical variance of the log posterior probabilities of the rules among the learned grammars, over a held-out set S, to quantify the diversity among the grammars:
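The formula itself did not survive the transcript; a plausible reconstruction of such an average empirical variance, with hypothetical notation (n grammars G_i, rules r occurring in the held-out set S):

```latex
\mathrm{Var}(S) \;=\; \frac{1}{|S|} \sum_{r \in S} \frac{1}{n-1}
  \sum_{i=1}^{n} \Bigl( \log P_{G_i}(r) \;-\; \overline{\log P(r)} \Bigr)^{2},
\qquad
\overline{\log P(r)} \;=\; \frac{1}{n} \sum_{i=1}^{n} \log P_{G_i}(r)
```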

32 Analysis of Rule Variance

33 English Test Set Results (WSJ 23)
Single Parser: [Charniak ’00], [Petrov et al. ’06], [Carreras et al. ’08], [Huang & Harper ’08], This Work
Product: [Petrov ’10], This Work
Reranker: [Charniak & Johnson ’05], [Huang ’08]
Parser Combination: [McClosky et al. ’06], [Sagae & Lavie ’06], [Fossum & Knight ’09], [Zhang et al. ’09]

34 Broadcast News

35 Conclusions
Very high parse accuracies can be achieved by combining self-training and product models on newswire and broadcast news parsing tasks.
Two important factors: 1. accuracy of the model used to parse the unlabeled data; 2. diversity of the individual grammars


