Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation Arianna Bisazza Advisor: Marcello Federico Fondazione Bruno.


1 Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation Arianna Bisazza Advisor: Marcello Federico Fondazione Bruno Kessler / Università di Trento PhD Thesis:

2 PSMT decoding overview. Source sentence: E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali. Arianna Bisazza – PhD Thesis – 19 April 2013

3 PSMT decoding overview. Partial hypothesis: "Freedom of movement must be encouraged". Each expansion is scored by the language model (LM), the translation model (TM) and the reordering model (ReoM).

4 PSMT decoding overview. Candidate continuations of "Freedom of movement must be encouraged": "career paths", "while ensuring that", … Each is again given LM, TM and ReoM scores.

5 PSMT decoding overview. Hypothesis so far: "Freedom of movement must be encouraged while ensuring that career paths …", with LM, TM and ReoM scores.

6 Reordering models. Many solutions have been proposed, with different reordering classes, features, training modes, etc.: Tillmann 04; Zens & Ney 06; Al-Onaizan & Papineni 06; Galley & Manning 08; Green & al. 10; Feng & al. 10; …

7 Reordering models. No matter which reordering model is used, the permutation search space must be limited! The power of every reordering model is bounded by the reordering constraints in use.

8 Example sentence: E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali (ReoM scores).

9 Reordering constraints. For the 11-word example sentence (E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali), the number of permutations is #perm = |w|! ≈ 40,000,000.

10 Reordering constraints: source-to-source distortion. D(w_x, w_y) = |y - x - 1| is the cost of translating source word w_y right after source word w_x. #perm = |w|! ≈ 40,000,000. [Distortion matrix for words w0 … w10: the first row (nothing translated yet) reads 0 1 2 … 10; row w_x holds D(w_x, w_y) for each y, so row w1 starts with 2 and row w10 runs from 11 down to 2.]
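
The distortion function on slide 10 can be sketched in a few lines of Python. This is only an illustration of the formula D(w_x, w_y) = |y - x - 1|; the function names and the convention of using x = -1 for "nothing translated yet" are assumptions made here, not taken from the thesis.

```python
from typing import List


def distortion(x: int, y: int) -> int:
    """Cost D(w_x, w_y) = |y - x - 1| of translating w_y right after w_x."""
    return abs(y - x - 1)


def distortion_matrix(n: int) -> List[List[int]]:
    """Rows: last translated position, from -1 (start) to n-1.
    Columns: candidate next position 0..n-1.
    Diagonal entries (x == y) are computed but never used by a decoder."""
    return [[distortion(x, y) for y in range(n)] for x in range(-1, n)]


if __name__ == "__main__":
    for row in distortion_matrix(5):
        print(row)
```

A monotone step (y = x + 1) costs 0, skipping one word forward costs 1, and the shortest backward jump (y = x - 1) already costs 2.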

11 Reordering constraints: source-to-source distortion. D(w_x, w_y) = |y - x - 1|. Without constraints, #perm = |w|! ≈ 40,000,000; with a distortion limit (DL) of 3, #perm ≈ 7,000. [Distortion matrix for w0 … w10.]

12 The problem with DL… Arabic-English. [Figure: Arabic (AR) and English (EN) examples, shown with the distortion matrix for w0 … w10.]

13 The problem with DL… German-English. [Figure: German (DE) and English (EN) examples, shown with the distortion matrix for w0 … w10.]

14 Source-to-source distortion. D(w_x, w_y) = |y - x - 1|. #perm = |w|! ≈ 40,000,000; with DL = 3, #perm ≈ 7,000. Current solution: increasing the distortion limit! [Distortion matrix for w0 … w10.]

15 Source-to-source distortion. Current solution: increasing the distortion limit! With DL = 3, #perm ≈ 7,000; with DL = 7, #perm ≈ 7,000,000. This is a coarse definition of the reordering space: slower decoding and worse translations. [Distortion matrix for w0 … w10.]
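
To make the effect of a distortion limit concrete, here is a minimal sketch that counts the visiting orders allowed when every single jump must satisfy |y - x - 1| <= DL. The constraint used here is a simplification chosen for illustration; a real decoder applies its own definition of the limit, so the counts need not match the figures quoted on the slides.

```python
from functools import lru_cache
from math import factorial


def count_permutations(n: int, dl: int) -> int:
    """Count the orders in which n source positions can be visited when every
    jump must satisfy |y - x - 1| <= dl (x = -1 before the first word)."""

    @lru_cache(maxsize=None)
    def extend(last: int, remaining: frozenset) -> int:
        if not remaining:
            return 1
        return sum(
            extend(y, remaining - {y})
            for y in remaining
            if abs(y - last - 1) <= dl
        )

    return extend(-1, frozenset(range(n)))


if __name__ == "__main__":
    print(factorial(11))  # unconstrained permutations of an 11-word sentence
    for dl in (3, 7):
        print(dl, count_permutations(11, dl))
```

Memoizing on the pair (last position, set of uncovered positions) keeps the count tractable for an 11-word sentence, since the number of completions depends only on that pair.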

16 Observations. Word reordering is difficult! The existing word reordering models are not perfect, yet they are expected to guide search over huge search spaces. One way to go: design a perfect model. Problem: many have already tried and failed. Our way: simplify the task for the existing reordering models.

17 Working hypotheses. A better definition of the reordering search space (i.e. of the constraints) can simplify the task of the reordering model. (Shallow) linguistic knowledge can help us refine the reordering search space for a given language pair.

18 Outline. The problem. The solutions: verb reordering lattices; modified distortion matrices; dynamically pruning the reordering space. Comparative evaluation & conclusions.

19 Outline. The problem. The solutions: verb reordering lattices; modified distortion matrices; dynamically pruning the reordering space. Comparative evaluation & conclusions. References: Bisazza and Federico, Chunk-based Verb Reordering in VSO Sentences for Arabic-English, WMT 2010; Bisazza, Pighin, Federico, Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation, MT Journal 2012.

20 Source-to-source distortion. D(w_x, w_y) = |y - x - 1|. #perm = |w|! ≈ 40,000,000; DL = 3: #perm ≈ 7,000; DL = 7: #perm ≈ 7,000,000. Idea: keep a low distortion limit and modify the input to allow only specific long reorderings. [Distortion matrix for w0 … w10.]

21 Reordering patterns in Arabic-English. Example of VSO sentences: the Arabic verb comes earlier than it does in the English order. Typical PSMT outputs: *The Moroccan monarch King Mohamed VI __ his support to… *He renewed the Moroccan monarch King Mohamed VI his support to…

22 Working hypothesis. Uneven distribution of long- and short-range word movements. Few long movements (verb-subject-object sentences): we try to model them explicitly! Many short movements (adjective-noun, head-initial genitive constructions (idafa)): we assume they are well handled in standard PSMT.

23 Chunk-based fuzzy reordering rules. Shallow syntax chunking: cheaper and easier than deep parsing; constrains reorderings in a softer way. Fuzzy (non-deterministic) reordering rules: generate N permutations for each matching sequence; the final reordering decision is taken during translation, guided by all SMT models (ReoM, LM, ...). Few rules per language pair, to capture only long reordering.

24 Chunk-based fuzzy reordering rules. Move the verb chunk ahead by 1 to N chunks. Move the verb chunk and the following chunk ahead by 1 to N chunks. [Diagram: chunk sequence … CH(*) CH(V) CH(*) CH(*) CH(*) … before and after each move.]
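
The two rules on slide 24 can be sketched as a small generator of candidate chunk orders. This is a hypothetical illustration: the function name, the `max_shift` parameter (standing in for N) and the chunk labels are assumptions, and the real rules may apply further conditions, for example on which chunk types may be crossed.

```python
def verb_reorderings(chunks, verb_idx, max_shift):
    """Return the original chunk sequence plus every sequence obtained by
    moving the verb chunk alone, or the verb chunk together with the chunk
    that follows it, to the right by 1..max_shift chunks."""
    chunks = list(chunks)
    results = [chunks]
    for block_len in (1, 2):  # 1: verb chunk only, 2: verb + following chunk
        block = chunks[verb_idx:verb_idx + block_len]
        if len(block) < block_len:
            continue  # no following chunk to move along with the verb
        rest = chunks[:verb_idx] + chunks[verb_idx + block_len:]
        for shift in range(1, max_shift + 1):
            pos = verb_idx + shift
            if pos > len(rest):
                break  # cannot move past the end of the sentence
            candidate = rest[:pos] + block + rest[pos:]
            if candidate not in results:
                results.append(candidate)
    return results


if __name__ == "__main__":
    example = ["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"]
    for order in verb_reorderings(example, verb_idx=1, max_shift=2):
        print(" ".join(order))
```

All generated orders are kept as alternatives; as the slides stress, the final choice among them is left to the decoder and its models.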

25 Chunk-based verb reordering in parallel data. The optimal reordering is the one that minimizes total distortion.

26 Chunk-based verb reordering in test data. [Figure legend: move verb chunk; move verb chunk and following chunk; verb chunk; other chunks.]

27 Experiments. Task: NIST-MT09 (news translation). Systems based on Moses, including lexicalized phrase reordering models [Tillmann 04; Koehn & al 05]. Non-monotonic lattice decoding [Dyer & al 08]. Evaluation by BLEU [Papineni & al 01] for lexical match & local order, and by KRS [Birch & al 10] for global order.

28 Arabic-English: translation quality. Test set: eval09-nw. Lattices are always used with pre-ordered training. Oracle: test set pre-ordered by looking at the reference (more details on lattice pruning in the thesis). [Chart annotation: +0.5 BLEU, +0.4 KRS.]

29 Arabic-English: translation quality and translation time. Test set: eval09-nw. Lattices are always used with pre-ordered training. Oracle: test set pre-ordered by looking at the reference (more details on lattice pruning in the thesis). [Charts: translation quality, and translation time split into pruning and decoding; annotation: -0.1 BLEU, -0.3 KRS.]

30 Lessons learned. Long reordering is limited to a few chunks only. A lattice is used to represent the extra reordering. Decoding slows down. Can we do better? Observation: the lattice topology basically distorts word-to-word distances, i.e. during decoding some distant positions become closer. Can we achieve the same effect more directly?

31 Outline. The problem. The solutions: verb reordering lattices; modified distortion matrices; dynamically pruning the reordering space. Comparative evaluation & conclusions. Reference: Bisazza and Federico, Modified Distortion Matrices for Phrase-Based Statistical Machine Translation, ACL 2012.

32 Source-to-source distortion. D(w_x, w_y) = |y - x - 1|. #perm = |w|! ≈ 40,000,000; DL = 3: #perm ≈ 7,000; DL = 7: #perm ≈ 7,000,000. [Distortion matrix for w0 … w10.]

33 Source-to-source distortion. Idea: modify the distortion matrix for each test sentence! #perm = |w|! ≈ 40,000,000; DL = 3: #perm ≈ 7,000; DL = 7: #perm ≈ 7,000,000; DL = 3 & modif(D): #perm ≈ 20,000. Refined reordering search space. [Distortion matrix for w0 … w10 in which a few entries have been lowered to 0 or 2.]

34 Chunk-based fuzzy reordering rules, Arabic-English. Move the verb chunk (and the following chunk) to the right by 1 to N chunks. Example chunking: CC1 [w-] VC2 [$Ark] PC3 [fy AltZAhrp] NC4 [E$rAt AlmslHyn] PC5 [mn AlktA}b] Pct6 [.]. Gloss: and took part in the march dozens of militants from the Brigades.

35 Chunk-based fuzzy reordering rules, Arabic-English. Move the verb chunk (and the following chunk) to the right by 1 to N chunks. [Diagram: first candidate reorderings of CC1 VC2 PC3 NC4 PC5 Pct6 for the sentence w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b.]

36 Chunk-based fuzzy reordering rules, Arabic-English. Move the verb chunk (and the following chunk) to the right by 1 to N chunks. [Diagram: all candidate reorderings of CC1 VC2 PC3 NC4 PC5 Pct6.]

37 Chunk-based fuzzy reordering rules: reordering selection. The candidate reorderings are scored with a reordered source LM. [Diagram: the candidates with scores 0.9, 0.4, 0.1 and 0.7.]

38 Chunk-based fuzzy reordering rules: reordering selection. The candidates are ranked by reordered source LM score (0.9, 0.7, 0.4, 0.1); the top-scoring ones are the reorderings to include in the distortion matrix.

39 Modifying the distortion matrix. Chunks and words: CC1 = w0; VC2 = w1; PC3 = w2 w3; NC4 = w4 w5; PC5 = w6 w7; Pct6 = w8. [Unmodified distortion matrix for w0 … w8, D(w_x, w_y) = |y - x - 1|, shown next to the reorderings to include in the distortion matrix.]

40 Modifying the distortion matrix. Reorderings to include in the distortion matrix. Step shown: D(w0, w2) = D(w0, w3) = 0 (jump from CC1 to PC3).

41 Modifying the distortion matrix. Reorderings to include in the distortion matrix. Step shown: D(w3, w1) = 2 (jump from PC3 back to VC2).

42 Modifying the distortion matrix. Reorderings to include in the distortion matrix. Step shown: D(w1, w4) = D(w1, w5) = 0 (jump from VC2 to NC4).

43 Modifying the distortion matrix. Reorderings to include in the distortion matrix. Step shown: D(w0, w4) = D(w0, w5) = 0 (jump from CC1 to NC4).

44 Modifying the distortion matrix. Reorderings to include in the distortion matrix. Step shown: D(w6, w1) = D(w7, w1) = 2 (jump from PC5 back to VC2).

45 Modifying the distortion matrix. Reorderings to include in the distortion matrix. Step shown: D(w2, w8) = D(w3, w8) = 0 (jump from PC3 to Pct6).

46 Modifying the distortion matrix. Reorderings to include in the distortion matrix. [Same matrix as on slide 45.]

47 Modifying the distortion matrix. Decoder input: the sentence w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b. together with its modified distortion matrix. Rows w0 … w8 over columns w0 … w8 (- on the diagonal): w0: - 0 0 0 0 0 5 6 7; w1: 2 - 0 1 0 0 4 5 6; w2: 3 2 - 0 1 2 3 4 0; w3: 4 2 2 - 0 1 2 3 0; w4: 5 4 3 2 - 0 1 2 3; w5: 6 5 4 3 2 - 0 1 2; w6: 7 2 5 4 3 2 - 0 1; w7: 8 2 6 5 4 3 2 - 0; w8: 9 8 7 6 5 4 3 2 -.
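
The matrix updates walked through on slides 39 to 47 can be reproduced with the sketch below. The update rule is inferred from the example matrices only and is an assumption: for every pair of chunks that becomes adjacent in a selected reordering without being adjacent in the original order, the cost of jumping from any word of the first chunk to any word of the second is lowered to 0 for a forward jump and to 2 for a backward jump. Function and parameter names are hypothetical.

```python
def base_matrix(n):
    """D[x][y] = |y - x - 1| for source positions x, y in 0..n-1.
    Diagonal entries are computed but never used."""
    return [[abs(y - x - 1) for y in range(n)] for x in range(n)]


def add_shortcuts(D, chunks, reordering, forward_cost=0, backward_cost=2):
    """chunks: word-index lists in original order.
    reordering: permutation of chunk indices selected for this sentence.
    Lowers D in place for chunk pairs made adjacent by the reordering."""
    for a, b in zip(reordering, reordering[1:]):
        if b == a + 1:
            continue  # already adjacent in the original order: keep costs
        cost = forward_cost if b > a else backward_cost
        for x in chunks[a]:
            for y in chunks[b]:
                D[x][y] = min(D[x][y], cost)
    return D


if __name__ == "__main__":
    # CC1=w0, VC2=w1, PC3=w2 w3, NC4=w4 w5, PC5=w6 w7, Pct6=w8
    chunks = [[0], [1], [2, 3], [4, 5], [6, 7], [8]]
    D = base_matrix(9)
    add_shortcuts(D, chunks, [0, 2, 1, 3, 4, 5])  # CC1 PC3 VC2 NC4 PC5 Pct6
    add_shortcuts(D, chunks, [0, 3, 4, 1, 2, 5])  # CC1 NC4 PC5 VC2 PC3 Pct6
    for row in D:
        print(row)
```

With these two reorderings the off-diagonal entries match the final matrix on slide 47, which suggests, but does not prove, that these are the reorderings selected in the example. The decoder itself is unchanged: it still applies the same low distortion limit, only on different distances.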

48 Experiments. Tasks: NIST-MT09 for Ar-En, WMT10 for De-En. Systems based on Moses, including state-of-the-art hierarchical lexicalized reordering models [Tillmann 04; Koehn & al 05; Galley & Manning 08]. Baseline distortion limits: 5 in Ar-En, 10 in De-En. Evaluation by BLEU for lexical match & local order, and by KRS for global order.

49 Arabic-English: translation quality and translation time. Test set: eval09-nw. Distortion modified with the 3-best reorderings per rule-matching sequence. [Chart annotation: +0.9 BLEU, +0.6 KRS.]

50 German-English: translation quality and translation time. Test set: newstest10. Distortion modified with the 3-best reorderings per rule-matching sequence. [Chart annotation: +0.5 BLEU, +0.7 KRS.]

51 Lessons learned. Modified distortion matrices improve reordering without decoding overhead. Language-specific reordering rules are still needed. Can we learn everything from the data?

52 Outline. The problem. The solutions: verb reordering lattices; modified distortion matrices; dynamically pruning the reordering space. Comparative evaluation & conclusions. Reference: Bisazza and Federico, Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation, Transactions of ACL 2013 (accepted with minor revisions).

53 A fully data-driven approach. Word-after-Word (WaW) reordering model: train a binary classifier to learn whether an input word w_y is to be translated right after another word w_x. [Example: … anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet, with word pairs labeled yes or no.] No rules required: everything is learnt from parallel data. The approach is easily portable to new language pairs with similar reordering characteristics.

54 Decoder integration. Usual approach: additional feature function. Novel approach: dynamically prune the reordering space, i.e. use the model score to decide (early) whether a given reordering path is promising enough to be explored further.

55 Early reordering pruning. Test time: run the classifier for each input sentence. Example: Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet.

56 Early reordering pruning. Test time: run the classifier for each input sentence. Consider a larger space (DL). [Matrix of WaW scores for every word pair of the example sentence.]

57 Early reordering pruning. Test time: run the classifier for each input sentence. Consider a larger space (DL). [Same WaW score matrix.]

58 Early reordering pruning. Test time: run the classifier for each input sentence. Consider a larger space (DL). Dynamically prune reorderings before each hypothesis expansion. [Same WaW score matrix.]

59 Early reordering pruning. Consider a larger space (DL). Dynamically prune reorderings before each hypothesis expansion. For example after "Die" … [Same WaW score matrix.]

60 Early reordering pruning. Consider a larger space (DL). Dynamically prune reorderings before each hypothesis expansion. For example after "Die" … [Same WaW score matrix.]

61 Early reordering pruning. Consider a larger space (DL). Dynamically prune reorderings before each hypothesis expansion. For example after "Die" … and after "Staat" … [Same WaW score matrix.]

62 Early reordering pruning. Consider a larger space (DL). Dynamically prune reorderings before each hypothesis expansion. For example after "Die" … and after "Staat" … [Same WaW score matrix.]

63 Improved Word Reordering for PBSMT: decoder integration. How to reduce early pruning errors? Always allow short jumps! [Same WaW score matrix.]

64 Improved Word Reordering for PBSMT: decoder integration. How to reduce early pruning errors? Always allow short jumps! [Same WaW score matrix, divided into three zones: off limits, prunable zone, non-prunable zone.]
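
The three zones on slide 64 can be sketched as a filter applied before each hypothesis expansion. The slides only say that the model score decides whether a long jump is worth exploring; the fixed `threshold` used below, the dictionary of scores and all names are assumptions made for illustration, not the pruning criterion of the thesis.

```python
def allowed_next(last, uncovered, scores, dl, zone_width, threshold):
    """Return the source positions that may be translated right after `last`.

    last:       index of the last translated source word (-1 at the start)
    uncovered:  iterable of source positions not yet translated
    scores:     dict mapping (x, y) to the WaW score of translating y after x
    dl:         distortion limit; longer jumps are off limits
    zone_width: jumps up to this distortion are never pruned
    threshold:  minimum WaW score for jumps in the prunable zone (assumed)
    """
    allowed = []
    for y in sorted(uncovered):
        d = abs(y - last - 1)
        if d > dl:
            continue  # off limits
        if d <= zone_width:
            allowed.append(y)  # non-prunable zone: short jumps always allowed
        elif scores.get((last, y), 0.0) >= threshold:
            allowed.append(y)  # prunable zone: kept only if promising
    return allowed


if __name__ == "__main__":
    scores = {(0, 5): 0.9, (0, 7): 0.1}
    print(allowed_next(0, {1, 2, 3, 5, 7, 12}, scores,
                       dl=8, zone_width=2, threshold=0.5))
```

Keeping short jumps unconditionally means a classifier error can only remove long-range options, which is the safeguard against early pruning errors that slide 63 asks for.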

65 Experiments. Same tasks. Similar baselines, but with early distortion cost [Moore & Quirk 07]. Baseline distortion limit: 8. Evaluation by BLEU, KRS and KRS-V (weighted KRS, only sensitive to verbs).

66 Arabic-English: translation quality. Test set: eval09-nw. Non-prunable zone width: 5 (more metrics and test sets in the thesis). [Chart annotation: +0.3 BLEU, +0.8 KRS-V.]

67 Arabic-English: translation quality and translation time. Test set: eval09-nw. Non-prunable zone width: 5 (more metrics and test sets in the thesis). [Chart annotation: +0.6 BLEU, +1.2 KRS-V.]

68 German-English: translation quality. Test set: newstest10. Non-prunable zone width: 5 (more metrics and test sets in the thesis). [Chart annotation: +0.2 BLEU, +0.7 KRS-V.]

69 German-English: translation quality and translation time. Test set: newstest10. Non-prunable zone width: 5 (more metrics and test sets in the thesis). [Chart annotation: +1.3 BLEU, +4.0 KRS-V.]

70 Outline. The problem. The solutions: verb reordering lattices; modified distortion matrices; dynamically pruning the reordering space. Comparative evaluation & conclusions.

71 Experiments. Same PSMT baselines. Best enhanced PSMT systems: for Ar-En, WaW model & early reordering pruning; for De-En, reordering lattices pruned with the reordered source LM. Hierarchical phrase-based system: default configuration (max span for rule extraction: 10 words); max span for decoding: 10 or 20. Evaluation by BLEU, KRS and KRS-V (weighted KRS, only sensitive to verbs).

72 Arabic-English: translation quality and translation time. Test set: eval09-nw. Non-prunable zone width: 5 (more metrics and test sets in the thesis). [Charts not reproduced.]

73 German-English: translation quality and translation time. Test set: newstest10. Lattices pruned with the reordered source LM (more metrics and test sets in the thesis). [Charts not reproduced.]

74 Arabic-English examples (1)

75 Arabic-English examples (1)

76 Arabic-English examples (2)

77 Arabic-English examples (2)

78 German-English examples (1)

79 German-English examples (1)

80 German-English examples (2)

81 German-English examples (2)

82 Conclusions. Our techniques advance the state of the art in reordering modeling within the PSMT framework: they capture long-range reordering patterns without sacrificing decoding efficiency, and they show the importance of refining the reordering search space. Positive results on large-scale news translation tasks in two difficult language pairs: significant gains in reordering-specific metrics, while generic scores are preserved or increased; our best PSMT systems compare favorably with a strong tree-based approach (HSMT), both in quality and efficiency.

83 Future directions. Improve the proposed methods by: refining chunk-based reordering rules with POS or lexical clues; increasing the accuracy of the WaW model with new features; combining different reordering scores for early pruning. Evaluate on language pairs with similar reordering characteristics. Analyze the effect of improved long reordering on post-editing effort by human translators. Address the problem of reordering search space definition in HSMT, possibly with analogous strategies.

84 Related publications. A. Bisazza, M. Federico, Chunk-based Verb Reordering in VSO Sentences for Arabic-English, WMT 2010. C. Hardmeier, A. Bisazza, M. Federico, Word Lattices for Morphological Reduction and Chunk-based Reordering, WMT 2010. A. Bisazza, D. Pighin, M. Federico, Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation, MT Journal, Special Issues on MT for Arabic, 2012. A. Bisazza, M. Federico, Modified Distortion Matrices for Phrase-Based Statistical Machine Translation, ACL 2012. A. Bisazza, M. Federico, Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation, Transactions of the ACL 2013 (accepted with minor revisions).

85 THANKS FOR YOUR ATTENTION! [Spelled out inside the cells of the distortion matrix for w0 … w10.]

86 THANKS FOR YOUR ATTENTION! [Same slide as 85.]

