
1 Novel Reordering Approaches in Phrase-Based Statistical Machine Translation S. Kanthak, D. Vilar, E. Matusov, R. Zens & H. Ney ACL Workshop on Building and Using Parallel Text 2005

2 [Cartoon: a British character and a French character, from Goscinny/Uderzo: Astérix chez les Bretons]

3 Problem: Reordering Potentially many long-distance reorderings; probably the source of the largest number of errors in current MT systems:
Would you like to go to the cinema with me on Saturday? / Möchtest du mit mir am Samstag ins Kino gehen?
System output: "two weeks ago in the south of france 17 years old and the sixteen years old robert friend romain with a gun and a baseball bat killed in unfounded, without motive, as a Sunday evening their johan on television"
Reference: "one Sunday evening a fortnight ago in the south of france, johan, aged 17, and robert, aged 16, murdered their childhood friend, romain, with a firearm and a baseball bat, for no reason or motive, just like on tv"

4 Basic Translation Approach: WFSTs Focus on translation of spoken language, where translation needs to be integrated with speech recognition. ASR systems use strict left-to-right finite-state decoding (HMMs), so FST-based translation makes the integration easy.

5 Notation

6 Instead of using the conditional model Pr(e_1^I | f_1^J), use a joint-probability model Pr(f_1^J, e_1^I), maximized over target sentences e_1^I.

7 Each source word is aligned with a target phrase (which may be the empty phrase). Assuming a uniform probability distribution over all alignments A, the translation model is an m-gram model over pairs (f_j, ẽ_j) of source words and target phrases. This can be expressed as a WFST T.
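The bilingual tuple sequence that this joint m-gram model is trained on can be sketched in Python (a hypothetical illustration, not the authors' code; the example sentence and alignment are invented):

```python
# Sketch: build the sequence of (source word, target phrase) tuples over
# which the joint m-gram translation model is estimated. Each source word
# pairs with a possibly empty target phrase.

def make_tuples(source, target, alignment):
    """alignment[j] is the list of target indices aligned to source word j."""
    tuples = []
    for j, f in enumerate(source):
        phrase = tuple(target[i] for i in alignment[j])  # may be empty ()
        tuples.append((f, phrase))
    return tuples

# Toy German -> English example with a hypothetical alignment:
src = ["ich", "gehe", "ins", "Kino"]
tgt = ["i", "go", "to", "the", "cinema"]
align = [[0], [1], [2, 3], [4]]
print(make_tuples(src, tgt, align))
# [('ich', ('i',)), ('gehe', ('go',)), ('ins', ('to', 'the')), ('Kino', ('cinema',))]
```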

8 Reordering problem The FST model does not work well when the alignment is non-monotonic (i.e., does not satisfy a_j ≤ a_{j+1}), which is bad for languages with very different word order. Therefore, apply reordering during training and search to either the source or the target language sentences. Here: reorder the source words prior to training such that the alignments become monotonic for all sentences.
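The step "reorder the source words so the alignment becomes monotonic" can be sketched as follows (hypothetical code, not from the paper; it assumes each source word carries a single aligned target position):

```python
# Sketch: permute the source words by the target position they align to.
# A stable sort keeps ties in their original relative order, and the
# resulting source-target alignment is monotonic by construction.

def reorder_source(source, align_to_target):
    """align_to_target[j] is the target position source word j aligns to."""
    order = sorted(range(len(source)), key=lambda j: align_to_target[j])
    return [source[j] for j in order]

# Invented example: German words aligned out of order to English positions.
print(reorder_source(["Kino", "gehe", "ich"], [4, 1, 0]))
# ['ich', 'gehe', 'Kino']
```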

9 Reordering during training Perform bidirectional word alignment. Estimate a cost matrix C for each sentence pair; C_ij indicates the local cost of aligning source word f_j to target word e_i. The cost is derived from state occupation probabilities: the probability of e_i occurring at target sentence position i as a translation of word f_j, normalized over the target sentence positions.
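Turning state occupation probabilities into alignment costs can be sketched as below (a minimal Python illustration, not the authors' code; the normalization over target positions follows the slide, while the smoothing floor is an assumption to avoid log of zero):

```python
import math

def cost_matrix(post):
    """post[i][j]: state occupation probability of target position i
    emitting source word j. Normalize over target positions i for each
    source word j, then take the negative log as a local cost."""
    I, J = len(post), len(post[0])
    C = [[0.0] * J for _ in range(I)]
    for j in range(J):
        z = sum(post[i][j] for i in range(I)) or 1e-12
        for i in range(I):
            p = post[i][j] / z
            C[i][j] = -math.log(max(p, 1e-12))  # floor is an assumption
    return C
```

A confident alignment (high posterior) thus gets a cost near zero, and an unlikely one a large positive cost.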

10 Reordering during training The reordering is a function of the source words; all source words must be aligned in the new (reordered) sentence. Create a second alignment as a function of the target words, based on a new cost matrix obtained by reordering C or by re-estimating it.

11 Reordering during training If the cost matrix is re-estimated, a monotonic alignment cannot be guaranteed. Find the minimum-cost monotonic alignment path through the cost matrix using dynamic programming. [Figure: I × J cost matrix with entries C_11 … C_IJ and the monotonic path through it]
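The minimum-cost monotonic path search is a small dynamic program; a sketch (not the authors' implementation; it assumes the path runs from the top-left to the bottom-right cell with down, right, and diagonal moves):

```python
def monotone_path(C):
    """Minimum-cost monotonic path through an I x J cost matrix C,
    from cell (0, 0) to (I-1, J-1)."""
    I, J = len(C), len(C[0])
    INF = float("inf")
    D = [[INF] * J for _ in range(I)]      # best cost to reach each cell
    back = [[None] * J for _ in range(I)]  # backpointers
    D[0][0] = C[0][0]
    for i in range(I):
        for j in range(J):
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0 and D[pi][pj] + C[i][j] < D[i][j]:
                    D[i][j] = D[pi][pj] + C[i][j]
                    back[i][j] = (pi, pj)
    # Trace the backpointers from the final cell to recover the path.
    path, cell = [], (I - 1, J - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    return list(reversed(path)), D[I - 1][J - 1]
```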

14 Reordering during search During search, the source sentence needs to be permuted in all possible ways (J! permutations). This is represented as an FST with 2^J states; since this is expensive, the automaton is computed on demand, and beam pruning is applied to eliminate unlikely permutations. Each state in the automaton represents a permutation of a subset of the words, encoded as a bit vector: each bit stands for an arc in the input FSA and is set to 1 if that arc has been used on the path from the initial to the final state.
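The bit-vector state encoding and on-demand expansion can be illustrated as follows (a toy sketch, not the actual FST toolkit code; states are integers whose bits mark covered source positions):

```python
def successors(state, J):
    """Lazily expand one state of the permutation automaton. A state is a
    coverage bit vector over the J source positions; each outgoing arc
    reads one still-uncovered position and sets its bit."""
    return [(j, state | (1 << j)) for j in range(J) if not state & (1 << j)]

# From the empty coverage with J = 3 words, three arcs leave the start state:
print(successors(0b000, 3))
# [(0, 1), (1, 2), (2, 4)]
# The full automaton would have 2**3 = 8 states, but states are only
# built when the search actually reaches them.
```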

15 Reordering Constraints This representation makes it easy to minimize/determinize the permutation automaton. For long sentences it is still too complex, so additional constraints on the permutation are needed: IBM constraints, inverse IBM constraints, local constraints, ITG constraints.

16 Reordering Constraints IBM constraints: at each state, translate any of the first l word positions that are still uncovered. Inverse IBM constraints: choose any uncovered position for translation unless l-1 words at positions beyond the first uncovered position j have already been translated (in that case, translate j). Local constraints: choose the next word to translate from a window of size l around the first uncovered position (words in the window may be covered or uncovered). ITG constraints: the input is a sequence of segments; initially each word is a segment, then segments are recursively combined into larger ones, with the possibility of reversing the two segments at each combination step. Stop when the only remaining segment is the entire sentence.
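The IBM and local constraints can be sketched as predicates over a coverage bit vector (hypothetical code; the exact window semantics in the paper may differ slightly):

```python
def ibm_allowed(covered, J, l):
    """IBM constraints: the next position must be one of the first l
    still-uncovered source positions."""
    uncovered = [j for j in range(J) if not covered & (1 << j)]
    return uncovered[:l]

def local_allowed(covered, J, l):
    """Local constraints: choose from a window of size l starting at the
    first uncovered position; covered positions inside the window are
    simply skipped."""
    first = next(j for j in range(J) if not covered & (1 << j))
    return [j for j in range(first, min(first + l, J))
            if not covered & (1 << j)]

# With positions 0 and 2 covered (bits set) out of 5 words:
print(ibm_allowed(0b00101, 5, 2))    # [1, 3]
print(local_allowed(0b00101, 5, 3))  # [1, 3]
```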

17 Permutation probabilities Monotonic orderings are given higher probability than non-monotonic ones. At each state, assign probability α to the outgoing arc that maintains monotonicity and distribute the remaining probability mass 1-α uniformly over all other arcs. Probabilities are computed on demand at each state.
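Assigning α to the monotone arc and splitting 1-α over the rest can be sketched as (illustrative only; the value of alpha here is an invented example, and the monotone continuation is taken to be the first uncovered position):

```python
def arc_probs(covered, J, alpha=0.8):
    """Probability of each outgoing arc at one permutation-automaton state:
    alpha for the monotone continuation (first uncovered position),
    (1 - alpha) spread uniformly over all other uncovered positions."""
    uncov = [j for j in range(J) if not covered & (1 << j)]
    if len(uncov) == 1:
        return {uncov[0]: 1.0}  # only one continuation left
    rest = (1.0 - alpha) / (len(uncov) - 1)
    return {j: (alpha if j == uncov[0] else rest) for j in uncov}
```

By construction the probabilities at each state sum to one, so the weighted permutation automaton stays properly normalized.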

18 Data Basic Travel Expressions Corpus (BTEC), part of IWSLT:
Chinese-to-English (20K train, ~500 dev/test)
Japanese-to-English (20K train, 500 dev/test)
Italian-to-English (66K train, ~500K dev/test)
Evaluated using BLEU, WER, PER, NIST; multiple reference translations in the first two cases.

19 Experiments 4-gram language model over tuples; moderate beam pruning for l>3. Window size and type of reordering constraint were optimized on the dev set, plus rescoring of n-best lists. Japanese-English is highly non-monotonic; best performance with a 9-word window and inverse IBM constraints.

20 Experiments Chinese-English is moderately non-monotonic: a window size of 7 gave the best performance, but window size 4 is quite suitable for most sentences. Italian-English is almost monotonic: IBM or local reordering constraints with window size 3 or 4 work best, and the improvement due to reordering is not as large as for the other language pairs.

