1 Parameter Estimation in IBM Models: Ling 572, Fei Xia, Week ??

2 Outline IBM Model 1 review (from LING571): –Word alignment –Modeling –Training formulae New material: –EM algorithm –Objective function

3 IBM Model Basics Classic paper: Brown et al. (1993) Translation direction: F → E (or Fr → Eng) Resource required: –Parallel data (a set of “sentence” pairs) Main concepts: –Source-channel model –Hidden word alignment –EM training

4 Intuition Sentence pairs where the word mapping is one-to-one: –(1) S: a b c d e T: l m n o p –(2) S: c a e T: p n m –(3) S: d a c T: n p l ⇒ (b, o), (d, l), (e, m), and either (a, p), (c, n) or (a, n), (c, p)

5 Source-channel model for MT Eng sent → Noisy channel → Fr sent Two types of parameters: –Language model: P(E) –Translation model: P(F | E)
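The diagram corresponds to the standard noisy-channel decoding rule (a textbook formulation, rendered here in LaTeX as a reading aid): to translate a Fr sentence F, search for the Eng sentence that maximizes the product of the two models.

\hat{E} = \arg\max_E P(E \mid F) = \arg\max_E \frac{P(E)\, P(F \mid E)}{P(F)} = \arg\max_E P(E)\, P(F \mid E)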

6 Word alignment a(j) = i ⇔ a_j = i, a = (a_1, …, a_m) Ex: –F: f_1 f_2 f_3 f_4 f_5 –E: e_1 e_2 e_3 e_4 –a_4 = 3 –a = (0, 1, 1, 3, 2)

7 An alignment, a, is a function from Fr word positions to Eng word positions: a(j) = i means that f_j is generated by e_i. The constraint: each Fr word is generated by exactly one Eng word (including e_0), i.e., a: {1, …, m} → {0, 1, …, l}

8 Modeling P(F | E) with alignment Treat the word alignment a as hidden and sum it out: P(F | E) = Σ_a P(F, a | E)

9 Notation E: the Eng sentence: E = e_1 … e_l e_i: the i-th Eng word F: the Fr sentence: F = f_1 … f_m f_j: the j-th Fr word e_0: the Eng NULL word f_0: the Fr NULL word a_j: the position of the Eng word that generates f_j

10 Notation (cont) l: Eng sent length m: Fr sent length i: Eng word position j: Fr word position e: an Eng word f: a Fr word

11 Generative process To generate F from E: –Pick a length m for F, with prob P(m | l) –Choose an alignment a, with prob P(a | E, m) –Generate the Fr sent given the Eng sent and the alignment, with prob P(F | E, a, m) Another way to look at it (sketched in code below): –Pick a length m for F, with prob P(m | l) –For j = 1 to m: pick an Eng word position a_j, with prob P(a_j | j, m, l); pick a Fr word f_j according to the Eng word e_i, where a_j = i, with prob P(f_j | e_i)
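A minimal Python sketch of the second view of the generative story. The names length_prob, align_prob, and t are hypothetical stand-ins for the model's distributions, assumed to expose a sample method and a mapping from Eng words to Fr-word distributions.

import random

def generate_F(E, length_prob, align_prob, t):
    # Hypothetical interfaces: length_prob.sample(l) draws m ~ P(m | l);
    # align_prob.sample(j, m, l) draws a_j ~ P(a_j | j, m, l);
    # t[e] maps each Fr word f to t(f | e).
    l = len(E)
    E0 = ["NULL"] + list(E)                  # e_0 is the Eng NULL word
    m = length_prob.sample(l)                # pick a length m for F
    F, a = [], []
    for j in range(1, m + 1):
        i = align_prob.sample(j, m, l)       # pick an Eng word position a_j
        words, probs = zip(*t[E0[i]].items())
        F.append(random.choices(words, weights=probs)[0])  # f_j ~ t(. | e_i)
        a.append(i)
    return F, a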

12 Decomposition Chain rule over the generative steps: P(F, a | E) = P(m | l) × P(a | E, m) × P(F | E, a, m), so P(F | E) = Σ_a P(m | l) × P(a | E, m) × P(F | E, a, m)

13 Approximation Fr sent length depends only on Eng sent length: P(m | E) ≈ P(m | l) Fr word depends only on the Eng word that generates it: P(F | E, a, m) ≈ Π_{j=1..m} t(f_j | e_{a_j}) Estimating P(a | E, m): all alignments are equally likely: P(a | E, m) = 1 / (l+1)^m

14 Decomposition Combining the three approximations: P(F, a | E) = P(m | l) / (l+1)^m × Π_{j=1..m} t(f_j | e_{a_j}), and summing over alignments: P(F | E) = P(m | l) / (l+1)^m × Σ_a Π_{j=1..m} t(f_j | e_{a_j})

15 Final formula and parameters for Model 1 Swapping the sum and the product gives a form that is cheap to compute: P(F | E) = P(m | l) / (l+1)^m × Π_{j=1..m} Σ_{i=0..l} t(f_j | e_i) Two types of parameters: –Length prob: P(m | l) –Translation prob: P(f_j | e_i), written t(f_j | e_i)
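A minimal sketch of the final formula in Python, assuming t is a dict mapping (f, e) pairs to t(f | e) and length_prob is a dict for P(m | l); both names are placeholders, and unseen pairs fall back to a small epsilon.

def model1_prob(F, E, t, length_prob, epsilon=1e-12):
    # P(F | E) = P(m | l) / (l+1)^m * prod_j sum_i t(f_j | e_i)
    l, m = len(E), len(F)
    E0 = ["NULL"] + list(E)                          # include e_0
    p = length_prob.get((m, l), 1.0) / (l + 1) ** m
    for f in F:
        p *= sum(t.get((f, e), epsilon) for e in E0)
    return p

The product of sums costs O(l × m) per sentence pair, instead of enumerating all (l+1)^m alignments.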

16 Training Mathematically motivated: –Having an objective function to optimize –Using several clever tricks The resulting formulae: –are intuitively expected –can be calculated efficiently EM algorithm: –Hill climbing; each iteration is guaranteed to improve the objective function –It is not guaranteed to reach a global optimum

17 Training: Fractional counts Let Ct(f, e) be the fractional count of the (f, e) pair in the training data, given the alignment prob P: Ct(f, e) = Σ_a P(a | E, F) × count(f, e; a), where P(a | E, F) is the alignment prob and count(f, e; a) is the actual count of times e and f are linked in (E, F) by alignment a

18 Estimating P(a | E, F) We could list all the alignments and estimate P(a | E, F) directly: P(a | E, F) = P(F, a | E) / P(F | E) = P(F, a | E) / Σ_{a'} P(F, a' | E)
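For intuition, a brute-force sketch that does list every alignment; feasible only for toy sentences. It assumes the same hypothetical t dict as above, covering all co-occurring (f, e) pairs; under Model 1 the uniform alignment prob and P(m | l) cancel in the ratio, so only the t products matter.

from itertools import product

def alignment_posteriors(F, E, t):
    # Enumerate all (l+1)^m alignments and normalize their joint scores.
    l, m = len(E), len(F)
    E0 = ["NULL"] + list(E)
    joint = {}
    for a in product(range(l + 1), repeat=m):
        score = 1.0
        for j in range(m):
            score *= t[(F[j], E0[a[j]])]     # prod_j t(f_j | e_{a_j})
        joint[a] = score
    total = sum(joint.values())              # proportional to P(F | E)
    return {a: s / total for a, s in joint.items()}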

19 Formulae so far Ct(f, e) = Σ_a P(a | E, F) × count(f, e; a), summed over all sentence pairs ⇒ New estimate for t(f | e): t(f | e) = Ct(f, e) / Σ_{f'} Ct(f', e)

20 The algorithm (a runnable sketch follows) 1. Start with an initial estimate of t(f | e), e.g., a uniform distribution 2. Calculate P(a | F, E) 3. Calculate Ct(f, e); normalize to get t(f | e) 4. Repeat steps 2-3 until the improvement is too small
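A runnable sketch of the whole loop, assuming the corpus is a list of (F, E) token-list pairs (a hypothetical format). P(m | l) is left out because it does not affect the t(f | e) updates, and the E-step already uses the efficient per-position form justified on the next slide.

from collections import defaultdict

def train_model1(corpus, iterations=10):
    fr_vocab = {f for F, _ in corpus for f in F}
    t = defaultdict(lambda: 1.0 / len(fr_vocab))    # step 1: uniform init
    for _ in range(iterations):                     # step 4: repeat
        count = defaultdict(float)                  # Ct(f, e)
        total = defaultdict(float)                  # sum_f Ct(f, e)
        for F, E in corpus:
            E0 = ["NULL"] + list(E)
            for f in F:                             # step 2: posteriors
                z = sum(t[(f, e)] for e in E0)
                for e in E0:
                    c = t[(f, e)] / z               # P(a_j = i | F, E)
                    count[(f, e)] += c
                    total[e] += c
        for f, e in count:                          # step 3: normalize
            t[(f, e)] = count[(f, e)] / total[e]
    return dict(t)

On a toy corpus such as [(['la', 'maison'], ['the', 'house']), (['la', 'fleur'], ['the', 'flower'])], the shared word quickly pulls t('la' | 'the') above the competing entries.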

21 No need to enumerate all word alignments Luckily, for Model 1, there is a way to calculate Ct(f, e) efficiently.
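Reconstructed in LaTeX from the surrounding slides: the sum over alignments factorizes, so each Fr position's alignment can be marginalized independently, and the fractional counts come out in closed form.

\sum_{a} \prod_{j=1}^{m} t(f_j \mid e_{a_j}) = \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i),
\qquad
Ct(f, e) = \sum_{(F,E)} \; \sum_{j:\, f_j = f} \; \sum_{i:\, e_i = e} \frac{t(f \mid e)}{\sum_{i'=0}^{l} t(f \mid e_{i'})}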

22 The algorithm 1. Start with an initial estimate of t(f | e), e.g., a uniform distribution 2. Calculate P(a | F, E) 3. Calculate Ct(f, e); normalize to get t(f | e) 4. Repeat steps 2-3 until the improvement is too small

23 Summary of Model 1 Modeling: –Pick the length of F with prob P(m | l) –For each position j: pick an Eng word position a_j, with prob P(a_j | j, m, l); pick a Fr word f_j according to the Eng word e_i, with prob t(f_j | e_i), where i = a_j –The resulting formula can be calculated efficiently Training: EM algorithm; the update can be done efficiently Finding the best alignment: can be done easily

24 New stuff

25 EM algorithm EM: expectation-maximization In a model with hidden states (e.g., word alignment), how can we estimate the model parameters? EM does the following: –E-step: take the current model parameters and calculate the expected values of the hidden data –M-step: use the expected values to re-estimate the parameters, maximizing the likelihood of the training data
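In symbols (a standard textbook formulation, not from the slides): with observed data X, hidden data Z, and parameters θ,

\text{E-step: } Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}} \left[ \log P(X, Z \mid \theta) \right],
\qquad
\text{M-step: } \theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})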

26 Objective function Maximize the log-likelihood of the training data: L(t) = Σ_{(F, E)} log P(F | E) Each EM iteration is guaranteed not to decrease L(t)

27 Training Summary Mathematically motivated: –Having an objective function to optimize –Using several clever tricks The resulting formulae: –are intuitively expected –can be calculated efficiently EM algorithm: –Hill climbing; each iteration is guaranteed to improve the objective function –It is not guaranteed to reach a global optimum

