
1
Inducing Structure for Perception Slav Petrov Advisors: Dan Klein, Jitendra Malik Collaborators: L. Barrett, R. Thibaux, A. Faria, A. Pauls, P. Liang, A. Berg a.k.a. Slav’s split&merge Hammer

2
The Main Idea (figure): a complex underlying process produces the observation "He was right."; compare the manually specified structure, the true structure, and the MLE structure.

3
The Main Idea (figure): a complex underlying process produces the observation "He was right."; EM turns the manually specified structure into an automatically refined structure.

4
Why Structure? Without structure, the sentence is just a bag of words (the, the, the, food, cat, dog, ate, and) or, worse, a bag of letters.

5
Structure is important The dog ate the cat and the food. The dog and the cat ate the food. The cat ate the food and the dog.

6
Syntactic Ambiguity Last night I shot an elephant in my pajamas.

7
Visual Ambiguity Old or young?

8
Three Peaks? Machine Learning Computer Vision Natural Language Processing

9
No, One Mountain! Machine Learning Computer Vision Natural Language Processing

10
Three Domains: Speech, Scenes, Syntax

11
Timeline (figure): Syntax, Speech, and Scenes tracks from '07 through '09, marking Learning, Inference, Syntactic MT, Bayesian and Conditional Learning, a summer at ISI, Coarse-to-Fine Decoding, Synthesis, TrecVid, and Now.

12
Syntax: Language Modeling, Split & Merge Learning, Syntactic Machine Translation, Coarse-to-Fine Inference, Non-parametric Bayesian Learning, Generative vs. Conditional Learning.

13
Learning Accurate, Compact, and Interpretable Tree Annotation Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein

14
Motivation (Syntax) Task: parse "He was right." Why? Information Extraction, Syntactic Machine Translation.

15
Treebank Parsing: read a grammar off the treebank: S → NP VP . (1.0), NP → PRP (0.5), NP → DT NN (0.5), …, PRP → She (1.0), DT → the (1.0), …
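As a minimal sketch (my own toy trees, not the deck's code), reading a grammar off a treebank is just counting rules and normalizing per left-hand side:

```python
from collections import defaultdict

# Toy "treebank": trees as (label, children...) tuples; leaves are words.
trees = [
    ("S", ("NP", ("PRP", "She")), ("VP", ("VBD", "slept")), (".", ".")),
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "ran")), (".", ".")),
]

rule_counts = defaultdict(lambda: defaultdict(int))

def count_rules(node):
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        rule_counts[label][(children[0],)] += 1   # lexical rule, e.g. DT -> the
        return
    rhs = tuple(child[0] for child in children)
    rule_counts[label][rhs] += 1                  # phrasal rule, e.g. S -> NP VP .
    for child in children:
        count_rules(child)

for t in trees:
    count_rules(t)

# Relative-frequency (maximum-likelihood) estimates, as on the slide.
grammar = {
    (lhs, rhs): c / sum(rhss.values())
    for lhs, rhss in rule_counts.items()
    for rhs, c in rhss.items()
}
print(grammar[("NP", ("PRP",))])   # 0.5: one of NP's two observed expansions
```

These relative-frequency estimates are exactly the MLE grammar the slide shows (NP → PRP and NP → DT NN each occur once out of two NP expansions, hence 0.5 each).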

16
Non-Independence Independence assumptions are often too strong. (charts: expansion distributions of all NPs vs. NPs under S vs. NPs under VP)

17
The Game of Designing a Grammar Annotation refines base treebank symbols to improve statistical fit of the grammar Parent annotation [Johnson ’98]

18
The Game of Designing a Grammar Annotation refines base treebank symbols to improve statistical fit of the grammar Parent annotation [Johnson ’98] Head lexicalization [Collins ’99, Charniak ’00]

19
The Game of Designing a Grammar Annotation refines base treebank symbols to improve statistical fit of the grammar Parent annotation [Johnson ’98] Head lexicalization [Collins ’99, Charniak ’00] Automatic clustering?

20
Learning Latent Annotations EM algorithm over trees with latent subcategory variables X1 … X7 for "He was right." Brackets are known and base categories are known; only the subcategories are induced. Just like Forward-Backward for HMMs.

21
Inside/Outside Scores For an annotated rule A_x → B_y C_z at a node: Inside: I(A_x) = Σ_{y,z} P(A_x → B_y C_z) I(B_y) I(C_z). Outside: O(B_y) = Σ_{x,z} P(A_x → B_y C_z) O(A_x) I(C_z).

22
Learning Latent Annotations (Details) E-Step: compute expected counts of each annotated rule from the inside/outside scores, c(A_x → B_y C_z) ∝ O(A_x) P(A_x → B_y C_z) I(B_y) I(C_z). M-Step: re-estimate P(A_x → B_y C_z) by normalizing the expected counts per left-hand side A_x.
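The inside pass of the E-step can be sketched over a known tree (brackets fixed, only subcategories latent). This is an illustration in my own notation with invented toy probabilities, not the paper's code; with a uniform rule table the inside score works out the same for both subcategories:

```python
import numpy as np

# Each treebank category carries k latent subcategories.
# rule_probs[(A, B, C)][x, y, z] = P(A_x -> B_y C_z)
# lex_probs[(T, w)][x]          = P(T_x -> w)
k = 2
rule_probs = {("S", "NP", "VP"): np.full((k, k, k), 0.25)}  # uniform over the 4 (y, z) pairs
lex_probs = {("NP", "He"): np.array([0.4, 0.1]),
             ("VP", "slept"): np.array([0.2, 0.3])}

def inside(node):
    label, children = node[0], node[1:]
    if isinstance(children[0], str):              # preterminal over a word
        return lex_probs[(label, children[0])]    # shape (k,)
    left, right = children
    i_left, i_right = inside(left), inside(right)
    p = rule_probs[(label, left[0], right[0])]
    # I(A_x) = sum_{y,z} P(A_x -> B_y C_z) * I(B_y) * I(C_z)
    return np.einsum("xyz,y,z->x", p, i_left, i_right)

tree = ("S", ("NP", "He"), ("VP", "slept"))
scores = inside(tree)
print(scores)   # inside score of S_x for each latent subcategory x
```

An outside pass down the same tree, plus the product O·P·I·I per node, gives the expected rule counts for the M-step, just as forward-backward does for HMM transitions.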

23
Overview Staying within the limits of computational resources: Hierarchical Training, Adaptive Splitting, Parameter Smoothing.

24
Refinement of the DT tag: DT → DT-1, DT-2, DT-3, DT-4.

25
Refinement of the DT tag DT

26
Hierarchical refinement of the DT tag DT

27
Hierarchical Estimation Results Model / F1: Baseline 87.3; Hierarchical Training 88.4.

28
Refinement of the ',' tag Splitting all categories the same amount is wasteful:

29
The DT tag revisited Oversplit?

30
Adaptive Splitting Want to split complex categories more Idea: split everything, roll back splits which were least useful


32
Adaptive Splitting Evaluate the loss in likelihood from removing each split: (data likelihood with the split reversed) / (data likelihood with the split). There is no loss in accuracy when 50% of the splits are reversed.

33
Adaptive Splitting (Details) True data likelihood: P(w) = Σ_x P_OUT(A_x, i, j) P_IN(A_x, i, j), evaluated at any node spanning (i, j). Approximate likelihood with the split at node n reversed: replace the sum at n by (P_OUT(A_1) + P_OUT(A_2)) (p_1 P_IN(A_1) + p_2 P_IN(A_2)), where p_1, p_2 are the relative frequencies of the two subcategories. Approximate loss in likelihood: the product of these per-node ratios over all occurrences of the split pair.
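A hedged sketch of that merge criterion (the occurrence list and all numbers are invented for illustration; the real inside/outside scores come from the trained model): the approximate ratio stays near 1 when two subcategories behave alike, and drops well below 1 when they have specialized.

```python
def merge_loss(occurrences, p1, p2):
    """Approximate likelihood ratio (merged / split) for one subcategory pair.

    occurrences: list of (I1, I2, O1, O2) inside/outside scores at each node
                 where the pair A_1/A_2 occurs.
    p1, p2:      relative frequencies of the two subcategories (p1 + p2 == 1).
    """
    ratio = 1.0
    for i1, i2, o1, o2 in occurrences:
        merged_inside = p1 * i1 + p2 * i2      # weighted average of inside scores
        merged_outside = o1 + o2               # outside scores add up
        split_score = o1 * i1 + o2 * i2        # contribution with the split kept
        ratio *= (merged_outside * merged_inside) / split_score
    return ratio

# Two subcategories that behave almost identically: merging costs almost nothing.
same = merge_loss([(0.30, 0.29, 0.010, 0.011)], 0.5, 0.5)
# Two that specialized strongly: merging is costly (ratio well below 1).
diff = merge_loss([(0.50, 0.001, 0.020, 0.0001)], 0.5, 0.5)
print(same, diff)
```

Ranking all split pairs by this ratio and reversing the cheapest half is exactly the "roll back the least useful splits" idea of the slides.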

34
Adaptive Splitting Results Model / F1: Previous 88.4; With 50% Merging 89.5.

35
Number of Phrasal Subcategories

36
Number of Phrasal Subcategories (chart: NP, VP, PP among the most split)

37
Number of Phrasal Subcategories (chart: X, NAC among the least split)

38
Number of Lexical Subcategories (chart: TO, ',', POS among the least split)

39
Number of Lexical Subcategories (chart: IN, DT, RB, VBx)

40
Number of Lexical Subcategories (chart: NN, NNS, NNP, JJ)

41
Smoothing Heavy splitting can lead to overfitting. Idea: smoothing allows us to pool statistics across subcategories.

42
Linear Smoothing Interpolate each subcategory's rule probability with the average across all subcategories: p̂_x = (1 − α) p_x + α p̄, where p̄ = (1/k) Σ_{x'} p_{x'}.
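A minimal sketch of that interpolation (the value of alpha and the probabilities are illustrative, not the paper's): pulling every subcategory toward the shared mean keeps the overall mass unchanged while giving rare subcategories nonzero estimates.

```python
import numpy as np

def smooth(probs, alpha=0.01):
    """Linear smoothing across subcategories.

    probs: array of shape (k,), P(A_x -> beta) for each subcategory x.
    Returns (1 - alpha) * p_x + alpha * mean(p), per subcategory.
    """
    mean = probs.mean()
    return (1 - alpha) * probs + alpha * mean

p = np.array([0.60, 0.02, 0.02, 0.00])   # one subcategory dominates this rule
print(smooth(p))                          # pulled slightly toward the mean 0.16
```

Note the zero entry becomes small-but-positive, which is the point: a subcategory that never saw the rule in training still pools statistics from its siblings.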

43
Result Overview Model / F1: Previous 89.5; With Smoothing 90.7.

44
Linguistic Candy Proper Nouns (NNP): NNP-14: Oct., Nov., Sept.; NNP-12: John, Robert, James; NNP-2: J., E., L.; NNP-1: Bush, Noriega, Peters; NNP-15: New, San, Wall; NNP-3: York, Francisco, Street. Personal pronouns (PRP): PRP-0: It, He, I; PRP-1: it, he, they; PRP-2: it, them, him.

45
Relative adverbs (RBR): RBR-0: further, lower, higher; RBR-1: more, less, More; RBR-2: earlier, Earlier, later. Cardinal Numbers (CD): CD-7: one, two, Three; CD-4: 1989, 1990, 1988; CD-11: million, billion, trillion; CD-0: 1, 50, 100; CD-3: 1, 30, 31; CD-9: 78, 58, 34.

46
Nonparametric PCFGs using Dirichlet Processes Percy Liang, Slav Petrov, Dan Klein and Michael Jordan

47
Improved Inference for Unlexicalized Parsing Slav Petrov and Dan Klein

48
1621 min

49
Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05] Treebank → coarse grammar (NP, VP, …) → parse → prune → refined grammar (NP-dog, NP-cat, NP-apple, VP-run, NP-eat, …) → parse → prune → further refined grammar (NP-17, NP-12, NP-1, VP-6, VP-31, …) → parse.

50
Prune? For each chart item X[i,j], compute its posterior probability under the coarse grammar: P(X, i, j | sentence) = P_OUT(X, i, j) P_IN(X, i, j) / P(sentence). E.g. consider the span 5 to 12: coarse items (… QP NP VP …) whose posterior is < threshold are pruned, together with all their refinements.
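The pruning test can be sketched in a few lines (the chart values, threshold, and span are invented for illustration; real inside/outside scores come from the coarse pass):

```python
THRESHOLD = 1e-4   # illustrative; the real threshold is tuned

def prune(coarse_chart, sentence_prob, threshold=THRESHOLD):
    """Keep only chart items whose coarse posterior clears the threshold."""
    allowed = set()
    for (label, i, j), (inside, outside) in coarse_chart.items():
        posterior = inside * outside / sentence_prob
        if posterior >= threshold:
            allowed.add((label, i, j))   # all refinements label-1..label-k kept
    return allowed

# Toy coarse chart for the span 5 to 12: (inside, outside) per item.
chart = {
    ("NP", 5, 12): (1e-8, 2e-3),
    ("QP", 5, 12): (1e-12, 1e-6),   # hopeless item: gets pruned
    ("VP", 5, 12): (5e-9, 8e-3),
}
print(prune(chart, sentence_prob=1e-10))
```

Only the surviving coarse items have their refined subcategories instantiated in the next, finer pass, which is where the large speedups on the following slides come from.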

51
1621 min 111 min (no search error)

52
Hierarchical Pruning Consider again the span 5 to 12: coarse: … QP NP VP …; split in two: … QP1 QP2 NP1 NP2 VP1 VP2 …; split in four: … QP1 QP2 QP3 QP4 NP1 … NP4 VP1 … VP4 …; split in eight: …

53
Intermediate Grammars Learning proceeds through a sequence X-Bar = G0, G1, G2, G3, G4, G5, G6 = G; e.g. DT is refined stepwise: DT → DT1, DT2 → DT1 … DT4 → DT1 … DT8.

54
1621 min 111 min 35 min (no search error)

55
State Drift (DT tag) (figure: across EM iterations the subcategory memberships of the, this, that, That, This, these, some drift between clusters)

56
Projected Grammars Learning gives X-Bar = G0, G1, …, G6 = G; projections π_i map the final grammar back: π_0(G), π_1(G), …, π_5(G).

57
Estimating Projected Grammars Nonterminals? Easy: the nonterminals in G (S0, S1, NP0, NP1, VP0, VP1) project onto the nonterminals in π(G) (S, NP, VP).

58
Estimating Projected Grammars Rules? Rules in G: S1 → NP1 VP1 0.20; S1 → NP1 VP2 0.12; S1 → NP2 VP1 0.02; S1 → NP2 VP2 0.03; S2 → NP1 VP1 0.11; S2 → NP1 VP2 0.05; S2 → NP2 VP1 0.08; S2 → NP2 VP2 0.12. Rule in π(G): S → NP VP = ?

59
Estimating Projected Grammars [Corazza & Satta '06] Estimate the projected grammar from the infinite tree distribution induced by G rather than from the treebank; the eight annotated S rules above then project to S → NP VP with probability 0.56.

60
Calculating Expectations Nonterminals: c_k(X) is the expected count of nonterminal X in trees of depth up to k; iterating the recurrence converges within 25 iterations (a few seconds). Rules: weight each rule probability in G by the expected relative frequency of its left-hand side and sum over the rules that share a projected image.
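As a sketch of how a projected rule probability is assembled from the slide's eight rules (the expected counts c below are hypothetical; in the paper they come from the fixpoint iteration just described):

```python
# P(S_x -> NP_y VP_z) for x, y, z in {1, 2}, from the slide.
fine_rules = {
    (1, 1, 1): 0.20, (1, 1, 2): 0.12, (1, 2, 1): 0.02, (1, 2, 2): 0.03,
    (2, 1, 1): 0.11, (2, 1, 2): 0.05, (2, 2, 1): 0.08, (2, 2, 2): 0.12,
}
c = {1: 3.0, 2: 1.0}   # hypothetical expected counts of S_1, S_2

# Weight each fine rule by the expected relative frequency of its LHS and sum.
total = sum(c.values())
projected = sum(c[x] / total * p for (x, _, _), p in fine_rules.items())
print(round(projected, 4))   # P(S -> NP VP) under the projection
```

With these hypothetical counts the answer differs from the slide's 0.56; the slide's number reflects the actual expected counts of S1 and S2 under the full grammar's infinite tree distribution, not the toy ones used here.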

61
1621 min 111 min 35 min 15 min (no search error)

62
Parsing times The passes through X-Bar = G0, G1, …, G5 consume 60%, 12%, 7%, 6%, 5%, and 4% of total parsing time.

63
Bracket Posteriors (after G 0 )

64
Bracket Posteriors (after G 1 )

65
Bracket Posteriors (Movie)(Final Chart)

66
Bracket Posteriors (Best Tree)

67
Parse Selection Computing the most likely unsplit tree is NP-hard: a single parse sums over many derivations of the split categories. Options: settle for the best derivation; rerank an n-best list; use an alternative objective function.

68
Final Results (Efficiency) Berkeley Parser: 15 min, 91.2 F-score, implemented in Java. Charniak & Johnson '05 Parser: 19 min, 90.7 F-score, implemented in C.

69
Final Results (Accuracy) F1 (≤ 40 words / all): ENG: Charniak & Johnson '05 (generative) 90.1 / 89.6, This Work 90.6 / 90.1. GER: Dubey '05 76.3 / -, This Work 80.8 / 80.1. CHN: Chiang et al. '02 80.0 / 76.6, This Work 86.3 / 83.4.

70
Conclusions (Syntax) Split & Merge Learning: Hierarchical Training, Adaptive Splitting, Parameter Smoothing. Hierarchical Coarse-to-Fine Inference: Projections, Marginalization. Multi-lingual Unlexicalized Parsing.

71
Generative vs. Discriminative Conditional Estimation (L-BFGS, Iterative Scaling), Conditional Structure, Alternative Merging Criterion.

72
How much supervision?

73
Syntactic Machine Translation Collaboration with ISI/USC: use parse trees; use annotated parse trees; learn split synchronous grammars.

74
Speech: Speech Synthesis, Split & Merge Learning, Coarse-to-Fine Decoding, Combined Generative + Conditional Learning.

75
Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein

76
Motivation (Speech) Words: and you couldn't care less. Phones: ae n d y uh k uh d n t k ae r l eh s.

77
Traditional Models The word "dad": Start → Begin-Middle-End structure → End. Triphones: #-d-a, d-a-d, a-d-#. Triphones + decision-tree clustering: d17 = c(#-d-a), a1 = c(d-a-d), d9 = c(a-d-#). Emissions: mixtures of Gaussians.

78
Model Overview (figure: the traditional model vs. our model)

79
Differences to Grammars (figure)

81
Refinement of the ih-phone

82
Inference: Coarse-to-Fine, Variational Approximation.

83
Phone Classification Results Method / Error Rate: GMM Baseline (Sha and Saul, 2006) 26.0%; HMM Baseline (Gunawardana et al., 2005) 25.1%; SVM (Clarkson and Moreno, 1999) 22.4%; Hidden CRF (Gunawardana et al., 2005) 21.7%; This Paper 21.4%; Large Margin GMM (Sha and Saul, 2006) 21.1%.

84
Phone Recognition Results Method / Error Rate: State-Tied Triphone HMM (HTK) (Young and Woodland, 1994) 27.1%; Gender Dependent Triphone HMM (Lamel and Gauvain, 1993) 27.1%; This Paper 26.1%; Bayesian Triphone HMM (Ming and Smith, 1998) 25.6%; Heterogeneous classifiers (Halberstadt and Glass, 1998) 24.4%.

85
Confusion Matrix

86
How much supervision? Hand-aligned: exact phone boundaries are known. Automatically-aligned: only the sequence of phones is known.

87
Generative + Conditional Learning Learn the structure generatively; estimate the Gaussians conditionally. Collaboration with Fei Sha.

88
Speech Synthesis The acoustic phone model is generative, accurate, and models phone-internal structure well. Use it for speech synthesis!

89
Large Vocabulary ASR ASR System = Acoustic Model + Decoder. Coarse-to-Fine Decoder: Subphone → Phone → Syllable → Word → Bigram → …

90
Scenes: Split & Merge Learning, Decoding.

91
Motivation (Scenes) Region labels: Sky, Water, Grass, Rock. Scene category: Seascape.

92
Motivation (Scenes)

93
Learning Oversegment the image; extract vertical stripes; extract features; train HMMs.

94
Inference Decode the stripes; enforce horizontal consistency.
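Decoding one vertical stripe with a trained HMM is standard Viterbi over the segments from top to bottom. A minimal sketch, with invented labels, transition, start, and emission numbers (real ones would be trained as on the previous slide):

```python
import numpy as np

labels = ["sky", "water", "rock"]
trans = np.array([[0.80, 0.15, 0.05],   # labels tend to persist moving down
                  [0.10, 0.80, 0.10],
                  [0.05, 0.15, 0.80]])
start = np.array([0.70, 0.20, 0.10])    # the top of an image is usually sky
emit = np.array([[0.90, 0.05, 0.05],    # one row per segment, top to bottom:
                 [0.60, 0.30, 0.10],    # P(observed features | label)
                 [0.10, 0.70, 0.20],
                 [0.02, 0.08, 0.90]])

def viterbi(start, trans, emit):
    """Most likely label sequence for one stripe (log-space Viterbi)."""
    n, k = emit.shape
    score = np.log(start) + np.log(emit[0])
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + np.log(trans)   # (from_state, to_state)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + np.log(emit[t])
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):               # follow back-pointers upward
        path.append(int(back[t][path[-1]]))
    return [labels[i] for i in reversed(path)]

print(viterbi(start, trans, emit))
```

Running this per stripe gives independent columns of labels; the horizontal-consistency step then reconciles neighboring stripes, which is exactly the dependency a CRF (next slide) would model jointly instead.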

95
Alternative Approach Conditional Random Fields. Pro: vertical and horizontal dependencies are learnt; inference is more natural. Contra: computationally more expensive.

96
Timeline (figure): Syntax, Speech, and Scenes tracks from '07 through '09, marking Learning, Inference, Syntactic MT, Bayesian and Conditional Learning, a summer at ISI, Coarse-to-Fine Decoding, Synthesis, TrecVid, and Now.

97
Results so far State-of-the-art parser for different languages: automatically learnt; simple and compact; fast and accurate; available for download. Phone recognizer: automatically learnt; competitive performance; a good foundation for a speech recognizer.

98
Proposed Deliverables Syntax Parser, Speech Recognizer, Speech Synthesizer, Syntactic Machine Translation, Scene Recognizer.

99
Thank You!
