
1
Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea Presenter: Zhonghua Li Mentor: Jun Lang I2R SMT Reading Group

2
Paper Info Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing ACL-08 long paper, cited 37 times Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea

3
Core Ideas Variational Bayes Tic-tac-toe pruning Word-to-phrase bootstrapping

4
Outline Paper presentation – Pipeline – Model – Training – Parsing (pruning) – Results Shortcomings Discussion

5
Summary of the Pipeline
1. Run IBM Model 1 on sentence-aligned data
2. Use tic-tac-toe pruning to prune the bitext space
3. Train a word-based ITG with Variational Bayes and take the Viterbi alignment
4. Apply the non-compositional constraint to restrict the space of phrase pairs
5. Train a phrasal ITG with VB and run a Viterbi pass to get the phrasal alignment

6
Phrasal Inversion Transduction Grammar

7
Dirichlet Prior for Phrasal ITG

8
Review: Inside-Outside Algorithm [figure: state-space trellis X_1 … X_{n-1}, Z_n, X_{n+1} … X_N with root, and chart spans s/u–t/v] The forward-backward algorithm is not only used for HMMs, but for any state-space model. The inside-outside algorithm is a special case of the forward-backward algorithm. (Slide by Shujie Liu)

9
VB Algorithm for Training SITGs – E step (1) Inside probabilities: initialization and recursion. [equations and chart diagram shown as a figure: a span pair s/u–t/v is split at S/U into subspans] (Slide copied from Liu)
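The recursion on this slide survives only as figure residue, so below is a minimal runnable sketch of the inside pass for a one-nonterminal, word-based ITG, under assumed simplifications: a single terminal table term[(e, f)], fixed straight/inverted rule probabilities, and no null-aligned words or phrasal rules.

```python
# A minimal sketch of the ITG inside recursion, assuming a terminal table
# term[(e_word, f_word)] and fixed probabilities for the straight ([]) and
# inverted (<>) rules; null-aligned words and phrasal rules are omitted.
def inside(src, tgt, term, p_straight=0.4, p_inverted=0.4):
    n, m = len(src), len(tgt)
    beta = {}
    # Initialization: 1x1 span pairs from word-translation probabilities.
    for i in range(n):
        for l in range(m):
            beta[(i, i + 1, l, l + 1)] = term.get((src[i], tgt[l]), 0.0)
    # Recursion: sum over split points k (source) and q (target). A straight
    # rule pairs (i..k, l..q) with (k..j, q..mm); an inverted rule swaps the
    # target-side subspans.
    for w_e in range(1, n + 1):
        for w_f in range(1, m + 1):
            if w_e == 1 and w_f == 1:
                continue
            for i in range(n - w_e + 1):
                for l in range(m - w_f + 1):
                    j, mm = i + w_e, l + w_f
                    total = 0.0
                    for k in range(i + 1, j):
                        for q in range(l + 1, mm):
                            total += p_straight * beta.get((i, k, l, q), 0.0) \
                                                * beta.get((k, j, q, mm), 0.0)
                            total += p_inverted * beta.get((i, k, q, mm), 0.0) \
                                                * beta.get((k, j, l, q), 0.0)
                    beta[(i, j, l, mm)] = total
    return beta.get((0, n, 0, m), 0.0)
```

For a two-word sentence pair the sentence probability reduces to p_straight (or p_inverted) times the product of the two word-translation probabilities.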

10
VB Algorithm for Training SITGs – E step (2) Outside probabilities: initialization and recursion. [equations and chart diagram shown as a figure: the outside score of s/u–t/v combines the outside score of a parent span S/U with the inside score of a sibling span] (Slide copied from Liu)

16
VB Algorithm for Training SITGs – M step s = 3 is the number of right-hand sides for X; m is the number of observed phrase pairs; ψ is the digamma function. [update equations shown as a figure]
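The update on this slide can be sketched in Python. This is a hedged reconstruction of the standard mean-field update for a Dirichlet-multinomial: EM's normalized count c/total is replaced by exp(ψ(c + α) − ψ(total + Kα)), where K is s for the structural rules and m for the phrase pairs; the digamma implementation is a stdlib-only approximation.

```python
import math

def digamma(x):
    """psi(x) for x > 0, via the recurrence psi(x) = psi(x + 1) - 1/x
    followed by an asymptotic series once x is large enough."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - ...
    result += math.log(x) - 1.0 / (2.0 * x)
    inv2 = 1.0 / (x * x)
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def vb_m_step(counts, alpha):
    """Mean-field update: weight(r) = exp(psi(c_r + alpha) - psi(total + K*alpha)).
    counts holds the expected rule counts from the E step."""
    K = len(counts)                     # s for structural rules, m for phrase pairs
    total = sum(counts.values())
    log_norm = digamma(total + K * alpha)
    return {r: math.exp(digamma(c + alpha) - log_norm) for r, c in counts.items()}
```

With a very small α (such as the 1e-9 used in the word-alignment experiment later in the deck), rules with little expected count are driven toward zero weight, giving the sparse solutions VB is used for here. Note the resulting weights are sub-normalized (they sum to slightly less than 1), which is intentional in this update.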

17
Pruning
– Tic-tac-toe pruning (Hao Zhang 2005)
– Fast tic-tac-toe pruning (Hao Zhang 2008)
– High-precision alignment pruning (Haghighi, ACL 2009): prune all bitext cells that would invalidate more than 8 high-precision alignment links
– 1-1 alignment posterior pruning (Haghighi, ACL 2009): prune all 1-1 bitext cells whose posterior falls below a threshold in both HMM models

18
Tic-tac-toe pruning (Hao Zhang 2005)
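A toy sketch of the scoring idea behind tic-tac-toe pruning, assuming IBM Model 1 lexical probabilities t[(f, e)] are available: each bitext cell (i, j, l, m) is scored by an inside Model 1 score of the span pair plus an outside score of the complementary regions, and low-scoring cells are pruned. The real algorithm computes these scores with dynamic programming in the tic-tac-toe pattern; this sketch recomputes them directly and ignores NULL alignments.

```python
import math

def model1_logprob(t, src, tgt):
    """log P(tgt | src) under IBM Model 1 with uniform alignment (no NULL)."""
    if not tgt:
        return 0.0
    if not src:
        return float("-inf")
    score = 0.0
    for f in tgt:
        # Average translation probability over the source words; unseen
        # pairs get a tiny floor to keep the log finite.
        score += math.log(sum(t.get((f, e), 1e-9) for e in src) / len(src))
    return score

def cell_score(t, src, tgt, i, j, l, m):
    """Inside score of span pair (i..j, l..m) plus outside score of the rest."""
    inside = model1_logprob(t, src[i:j], tgt[l:m])
    outside = model1_logprob(t, src[:i] + src[j:], tgt[:l] + tgt[m:])
    return inside + outside
```

Cells whose score falls far below the best score are discarded before ITG parsing, shrinking the O(n^6) bitext chart.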

19
Non-Compositional Phrase Constraint e(i,j): the number of links emitted from the source substring e_i … e_j f(l,m): the number of links emitted from the target substring f_l … f_m
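A small sketch of how these link counts can gate candidate phrase pairs, assuming the word-level Viterbi alignment is given as a set of (source, target) index pairs; the cutoff max_links is a hypothetical parameter for illustration, not the paper's actual threshold.

```python
def link_counts(links, i, j, l, m):
    """links: set of (src, tgt) index pairs from the word-level Viterbi
    alignment. Returns (e(i,j), f(l,m)) for the given span pair."""
    e = sum(1 for s, t in links if i <= s <= j)
    f = sum(1 for s, t in links if l <= t <= m)
    return e, f

def allowed_phrase(links, i, j, l, m, max_links=3):
    """Keep a span pair as a candidate phrase only if its combined link
    count stays under a threshold (hypothetical cutoff)."""
    e, f = link_counts(links, i, j, l, m)
    return e + f <= max_links
```

Filtering the bitext cells this way keeps the phrasal ITG's space of phrase pairs small, which is what makes the second VB training pass tractable.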

20
Word Alignment Evaluation Both models were trained for 10 iterations. EM: the lowest AER is achieved after the second iteration; at iteration 10, the AER for EM increases to 0.42. VB: with α_C = 1e-9, VB gets an AER close to 0.35 at iteration 10.

21
End-to-End Evaluation NIST Chinese-English training data; NIST 2002 evaluation datasets for tuning and evaluation. The 10-reference development set was used for MERT; the 4-reference test set was used for evaluation.

22
Shortcomings The grammar is not perfect ITG ordering is context-independent Phrase pairs are sparse

23
The Grammar Is Not Perfect: Over-Counting Alternative ITG parse trees can produce the same word alignment; this is called the over-counting problem. [figure: mapping from ITG parse-tree space to word-alignment space, example sentence "I am rich !"]
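The over-counting can be made concrete: for a fixed monotone alignment of n words, every binary bracketing built from straight rules produces the same alignment, so an unconstrained ITG spreads probability mass over Catalan(n − 1) distinct derivations of a single alignment. A small counting sketch:

```python
from functools import lru_cache

# Counts binary ITG derivations of a fixed monotone n-word alignment: every
# split point yields the same alignment, so the count follows the
# Catalan-number recurrence over split points.
@lru_cache(maxsize=None)
def derivations(n):
    if n <= 1:
        return 1
    return sum(derivations(k) * derivations(n - k) for k in range(1, n))
```

Already at three words there are two derivations of one alignment, and the count grows quickly (5 for four words, 14 for five), which is what motivates the better-constrained grammar.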

24
A Better-Constrained Grammar A series of nested constituents with the same orientation always receives a left-heavy derivation, so the second parse tree of the previous example is not generated. [canonical-form rules shown as a figure: A → [C C], B → ⟨C C⟩, and terminal productions C → 1/3, C → 2/4, C → 3/2, C → 4/1]

25
Thanks Q&A
