
1 Active Learning for Statistical Phrase-based Machine Translation Gholamreza Haffari Joint work with: Maxim Roy, Anoop Sarkar Simon Fraser University NAACL talk, Boulder, June 2009

2 The Problem Statistical Machine Translation (SMT): M_F→E is a standard log-linear model composed of two main components: –Phrase tables –Language model. Good phrase tables are typically learned from large bilingual (F,E) text –What if we don't have large bilingual text? [Slide diagram: M_F→E translates from language F to language E.]

3 A Solution Suppose we are given a large monolingual text in the source language F. Pay a human expert and ask him/her to translate these sentences into the target language E –This way, we will have a bigger bilingual text. But our budget is limited! –We cannot afford to translate all monolingual sentences

4 A Better Solution Choose a subset of monolingual sentences for which: if we had the translation, the SMT performance would increase the most. Only ask the human expert for the translation of these highly informative sentences. This is the goal of Active Learning –Workshop on Active Learning for NLP

5 Active Learning for SMT [Slide diagram of the AL loop: train M_F→E on the bilingual text, decode the monolingual text, select informative sentences, have a human translate them, add the new sentence pairs to the bilingual text, and re-train.] For more details, see the paper
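The loop on this slide can be sketched in Python. All the function parameters here (train_model, select_informative, human_translate) are hypothetical stand-ins for the real Portage training, selection, and human-translation steps, not actual APIs:

```python
def active_learning_loop(bilingual, monolingual, select_informative,
                         train_model, human_translate, iterations, batch_size):
    """Iteratively grow the bilingual corpus with human-translated batches."""
    model = train_model(bilingual)
    for _ in range(iterations):
        # Pick the sentences whose translations should help the model most.
        batch = select_informative(model, monolingual, bilingual, batch_size)
        monolingual = [s for s in monolingual if s not in batch]
        # Ask the human expert only for the selected sentences.
        bilingual = bilingual + [(s, human_translate(s)) for s in batch]
        model = train_model(bilingual)  # re-train on the enlarged corpus
    return model, bilingual
```

The selection strategy is the pluggable part; the rest of the talk is about what to pass as `select_informative`.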

6 Outline General idea of active learning (AL) for statistical machine translation (SMT). Sentence Selection Strategies –Similarity, Decoder's Confidence –Hierarchical Adaptive Sampling –Sentence merit based on the translation units. Experiments –The simulated AL setting –The real AL setting

7 Intuitive Underpinnings for Sentence Selection Sentences for which the model is not confident about their translations –Hopefully, high-confidence translations are good ones. Sentences similar to the bilingual text are easy for the model to translate –Select the ones dissimilar to the bilingual text. Cluster monolingual sentences –Choose some representative sentences for each cluster

8 Sentence Selection Strategies Baseline: Randomly choose sentences from the pool of monolingual sentences. Previous Work: Decoder's confidence for the translations (Kato & Barnard, 2007). Our proposed methods: –Similarity to the bilingual training data –Reverse model –Hierarchical Adaptive Sampling (HAS) –Utility of the translation units

9 Sentence Selection Strategies Baseline: Randomly choose sentences from the pool of monolingual sentences. Previous Work: Decoder's confidence for the translations (Kato & Barnard, 2007). Our proposed methods: –Similarity to the bilingual training data  Reverse model  Hierarchical Adaptive Sampling (HAS)  Utility of the translation units

10 Reverse Model Comparing –the original sentence, and –the final (round-trip) sentence tells us something about the value of the sentence. Example: "I will let you know about the issue later" → (M_E→F) "Je vais vous faire plus tard sur la question" → (Rev: M_F→E) "I will later on the question"
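The round-trip comparison can be sketched as follows. The unigram-overlap similarity is a crude stand-in chosen for illustration (the paper's actual comparison measure is not shown on this slide), and `forward`/`backward` are assumed callables wrapping the two translation models:

```python
def ngram_overlap(a, b, n=1):
    """Fraction of the n-grams of `a` that also occur in `b`."""
    grams = lambda s: {tuple(s.split()[i:i + n])
                       for i in range(len(s.split()) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga) if ga else 0.0

def reverse_model_score(sentence, forward, backward):
    """Round-trip the sentence through both models; low overlap with the
    original suggests the models handle it poorly, i.e. it is informative."""
    round_trip = backward(forward(sentence))
    return 1.0 - ngram_overlap(sentence, round_trip)
```

On the slide's example, the round trip drops "let you know about the issue", so the score is high and the sentence would be selected.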

11 Hierarchical Adaptive Sampling (Dasgupta & Hsu, 2008) [Slide diagram: U_0, the pool of monolingual sentences sorted by similarity to the bilingual text, is split recursively into nodes U_1, U_2 and then U_2,1, U_2,2; sentences are sampled from the two sibling nodes, and the average decoder score decides which node to explore further.]
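A rough sketch of the HAS idea under stated assumptions: `similarity` scores a sentence against the bilingual text, `decoder_score` is the decoder's confidence, and the splitting is a plain recursive halving. The exact sampling schedule of Dasgupta & Hsu (2008) is not reproduced here:

```python
import random

def has_select(pool, similarity, decoder_score, depth=3, sample_size=5, seed=0):
    """Return the leaf node of sentences to send to the human translator."""
    rng = random.Random(seed)
    # Root node: the whole pool, sorted by similarity to the bilingual text.
    node = sorted(pool, key=similarity)
    for _ in range(depth):
        if len(node) < 2:
            break
        left, right = node[:len(node) // 2], node[len(node) // 2:]

        def avg_score(child):
            sample = rng.sample(child, min(sample_size, len(child)))
            return sum(decoder_score(s) for s in sample) / len(sample)

        # Descend into the child the decoder is less confident about.
        node = left if avg_score(left) < avg_score(right) else right
    return node
```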

12 Utility of the Translation Units Phrases are the basic units of translation in phrase-based SMT. [Slide diagram: phrase counts for "I will let you know about the issue later" in the monolingual text and in the bilingual text.] The more frequent a phrase is in the monolingual text, the more important it is; the more frequent a phrase is in the bilingual text, the less important it is

13 Generative Models for Phrases [Slide tables: phrase counts and the corresponding probabilities in the monolingual text, defining the model θ_m, and in the bilingual text, defining the model θ_b.]
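The two models can be sketched as smoothed relative-frequency estimates over phrase counts. Add-one smoothing and the toy counts below are assumptions for illustration, not the paper's estimator:

```python
from collections import Counter

def phrase_model(phrase_counts, vocab_size):
    """Return a function phrase -> add-one-smoothed relative frequency."""
    total = sum(phrase_counts.values()) + vocab_size
    return lambda phrase: (phrase_counts.get(phrase, 0) + 1) / total

# Toy counts standing in for the slide's monolingual/bilingual tallies
# (hypothetical numbers, assumed vocabulary of 100 phrase types).
theta_m = phrase_model(Counter({"the issue": 8, "know about": 6}), vocab_size=100)
theta_b = phrase_model(Counter({"the issue": 2, "know about": 6}), vocab_size=100)
```

A phrase like "the issue", frequent in the monolingual text but rare in the bilingual text, gets a higher probability under θ_m than under θ_b, which is exactly the asymmetry the next slide's score exploits.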

14 Averaged Probability Ratio Score For a monolingual sentence S –Consider the bag of its phrases –Score: normalized probability ratio P(S|θ_m)/P(S|θ_b) –We will refer to it as Geom-Phrase. Dividing the phrase probabilities captures our intuition about the utility of the translation units
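A minimal sketch of the Geom-Phrase score, assuming `theta_m` and `theta_b` map a phrase to its probability. The geometric mean (length normalization) is computed in log space to avoid underflow on long sentences:

```python
import math

def geom_phrase_score(phrases, theta_m, theta_b):
    """Geometric mean over the bag of phrases of P(x|theta_m) / P(x|theta_b)."""
    if not phrases:
        return 0.0
    log_ratio = sum(math.log(theta_m(x)) - math.log(theta_b(x)) for x in phrases)
    return math.exp(log_ratio / len(phrases))
```

Sentences built from phrases that are common in the monolingual pool but poorly covered by the bilingual text score above 1 and are selected first.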

15 Sentence Segmentation How to prepare the bag of phrases for a sentence S? –For the bilingual text, we have the segmentation from the training phase of the SMT model –For the monolingual text, we run the SMT model to produce the top-n translations and the corresponding segmentations

16 Extensions of the Score Instead of using phrases, we may use n-grams. We may alternatively use the following score –We will refer to it as Arithmetic Average [Slide formula not recovered in the transcript.]
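Both extensions can be sketched alongside the geometric version; this is a simplified reading of the slide (the exact formula is in the paper), assuming `theta_m` and `theta_b` map a unit to its probability:

```python
def arith_phrase_score(phrases, theta_m, theta_b):
    """Arithmetic mean over the bag of P(x|theta_m) / P(x|theta_b)."""
    if not phrases:
        return 0.0
    return sum(theta_m(x) / theta_b(x) for x in phrases) / len(phrases)

def ngrams(sentence, n):
    """Bag of n-grams, usable in place of the phrase segmentation."""
    toks = sentence.split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
```

Unlike the geometric mean, the arithmetic mean lets one high-ratio unit dominate the score, so the two variants can rank sentences differently.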

17 Sentence Selection Strategies (Recap) Baseline: Randomly choose sentences from the pool of monolingual sentences. Previous Work: Decoder's confidence for the translations (Kato & Barnard, 2007). Our proposed methods:  Similarity to the bilingual training data  Reverse model  Hierarchical Adaptive Sampling (HAS)  Utility of the translation units

18 Outline General idea of active learning (AL) for statistical machine translation (SMT). Sentence Selection Strategies –Similarity, Decoder's Confidence –Hierarchical Adaptive Sampling –Sentence merit based on the translation units. Experiments –The simulated AL setting –The real AL setting

19 Experimental Setup Dataset size: We select 200 (or 100) sentences from the monolingual sentence set for 25 (or 5) iterations. We use Portage from NRC as the underlying SMT system (Ueffing et al., 2007). Corpus sizes (sentences): Bangla-English: 11K bilingual, 20K monolingual, 1K test. French/German/Spanish-English: 5K bilingual, 20K monolingual, 2K test.

20 The Simulated AL Setting [Slide plot: learning curves for Geometric Phrase, Decoder's Confidence, and Random; an arrow marks the "Better" direction.]

21 The Real AL Setting Our human translator is different from the text author –The methods are good at adapting to the new writing style. [Slide plot: learning curves for Geometric Phrase and Random.]

22 Domain Adaptation Now suppose both the test and monolingual text are out-of-domain with respect to the bilingual text –The 'Decoder's Confidence' does a good job –The 'Geom 1-gram' outperforms the other methods since it quickly expands the lexicon set in an effective manner. [Slide plot: learning curves for Geom 1-gram, Decoder's Confidence, and Random.]

23 Analysis The coverage of the bilingual text is important but is not the only factor –Notice the Geom 1-gram and Geom-Phrase methods. [Slide plot: coverage curves.]

24 Analysis [Figure-only slide.]

25 Conclusions We presented different sentence selection methods for SMT in an AL setting. Using knowledge about the internal architecture of the SMT system is crucial. Yet, we are after better sentence selection strategies –See our upcoming paper in ACL 2009

26 Merci / Thank You

27 Domain Adaptation Selecting sentences based on: –The 'Confidence' does a good job –The '1-gram' outperforms the other methods since it quickly expands the lexicon set in an effective manner.

Method: BLEU% / PER% / WER%
Geom 1-gram: 14.92 / 34.83 / 46.06
Confidence: 14.74 / 35.02 / 46.11
Random: 14.11 / 35.28 / 46.47

28 The Simulated AL Setting Using measures other than BLEU –wer: word error rate –per: position-independent word error rate.

Language pair: Geometric Average (BLEU% / PER% / WER%) vs. Random baseline (BLEU% / PER% / WER%)
French-English: 22.49 / 27.99 / 38.45 vs. 21.97 / 28.31 / 38.80
German-English: 17.54 / 31.51 / 44.28 vs. 17.25 / 31.63 / 44.41
Spanish-English: 23.03 / 28.86 / 39.17 vs. 23.00 / 28.97 / 39.21

