
1 Alignment Entropy as an Automated Predictor of Bitext Fidelity for Statistical Machine Translation
Shankar Ananthakrishnan, Rohit Prasad, Prem Natarajan
Speech and Language Processing Unit, BBN Technologies, Cambridge, MA

2 Talk progress
- Statistical machine translation
- Word alignment
- Alignment entropy
- Alignment error analysis
- Bitext translation quality
- Translation quality analysis
- Conclusion and future directions

3 Statistical machine translation (SMT)
- Start with a large bitext: parallel corpora, or "sentence pairs"; lots (thousands to millions) of translation pairs!
- Align the sentence pairs at the word level
- Extract phrase pairs or translation rules, constrained by the word alignments
- Decode the source text with the extracted phrases/rules, in conjunction with a language model

4 Talk progress
- Statistical machine translation
- Word alignment
- Alignment entropy
- Alignment error analysis
- Bitext translation quality
- Translation quality analysis
- Conclusion and future directions

5 Word alignment
- Link corresponding words in sentence pairs; forms the basis of almost all SMT architectures
- Statistical word alignment [Brown93, Vogel96]: a probabilistic noisy-channel translation model T_m, estimated using expectation-maximization (EM)
- Choose the most likely (Viterbi) alignment A_v
- (Slide figure: example alignment linking Arabic "tjAr bDAEp" to English "commodity traders", with a NULL target token)
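As a rough illustration of EM-trained word alignment (real systems such as GIZA++ train the full IBM-4/HMM model cascade on large corpora; the function names and tiny bitext below are invented for this sketch), here is a toy IBM Model 1 trainer with a Viterbi alignment step:

```python
from collections import defaultdict

def train_ibm1(bitext, iterations=10):
    """Toy IBM Model 1: EM estimation of t(f|e) from sentence pairs.
    Each pair is (source_words, target_words); 'NULL' is prepended to
    the target side so a source word may align to nothing."""
    t = defaultdict(lambda: 1.0)  # uniform-ish initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in bitext:
            tgt = ["NULL"] + tgt
            for f in src:
                z = sum(t[(f, e)] for e in tgt)  # normalizer over targets
                for e in tgt:
                    c = t[(f, e)] / z  # expected link count (E-step)
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]  # re-estimate t(f|e) (M-step)
    return dict(t)

def viterbi_align(src, tgt, t):
    """Most likely target index for each source word (0 = NULL)."""
    tgt = ["NULL"] + tgt
    return [max(range(len(tgt)), key=lambda k: t.get((f, tgt[k]), 0.0))
            for f in src]
```

On a small bitext, EM resolves the ambiguity from co-occurrence statistics alone, e.g. `viterbi_align(["das", "Buch"], ["the", "book"], t)` yields one link per source word, as the IBM models require.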

6 Word alignment quality
- Errors in alignment are caused by data sparsity (low-resource languages), translation errors, and paraphrasing or non-literal translations
- Alignment errors affect translation quality [Fraser07]; correcting or discarding bad alignments may help
- How do we identify poorly aligned constituents? We need an automated alignment quality metric that is unsupervised (no manual intervention), correlates with supervised measures (e.g. AER), and scales up from the word level to the corpus level

7 An obvious candidate metric
- Length-normalized Viterbi alignment score: a monotonic function of p(A_v | T_m), available as a by-product of the alignment process
- Benefits: a readily available unsupervised metric; intuitive and easy to understand
- Drawbacks: a low-probability alignment need not be incorrect, and the granularity is poor (sentence-level alignment quality only)
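Under a Model-1-style lexical table (the `t` argument below is a hypothetical link-probability table, not GIZA++ output), the length-normalized score can be sketched as the average per-word log-probability of the Viterbi links:

```python
import math

def normalized_viterbi_score(src, tgt, t, floor=1e-12):
    """(1/|src|) * sum_j log max_k t(f_j | e_k).  Higher is 'better',
    but, as noted above, a low score need not mean a wrong alignment."""
    tgt = ["NULL"] + tgt
    total = 0.0
    for f in src:
        best = max(t.get((f, e), 0.0) for e in tgt)
        total += math.log(max(best, floor))  # floor guards log(0)
    return total / len(src)
```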

8 Talk progress
- Statistical machine translation
- Word alignment
- Alignment entropy
- Alignment error analysis
- Bitext translation quality
- Translation quality analysis
- Conclusion and future directions

9 Alignment entropy
- Uncertainty of a link in the Viterbi alignment: does higher uncertainty imply poorer alignment?
- Basis for an automated alignment quality metric
- We need a probability distribution over alignments, i.e. different contexts for a given sentence pair, to estimate a multinomial distribution over word alignments
- Bootstrapping simulates different contexts: resample the original bitext with replacement
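The resampling step can be sketched as follows (a minimal illustration; the experiments described later use 99 resampled bags plus the original bitext):

```python
import random

def bootstrap_bags(bitext, num_bags=99, seed=0):
    """Resample the bitext with replacement to simulate different
    training contexts; each 'bag' has the same size as the original."""
    rng = random.Random(seed)
    n = len(bitext)
    return [[bitext[rng.randrange(n)] for _ in range(n)]
            for _ in range(num_bags)]
```

The alignment model is then retrained on each bag, so a given sentence pair is aligned under as many different contexts as there are bags containing it.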

10 Defining alignment entropy
- f_ij: the j-th word of the i-th source sentence
- e_ik: the k-th target word; k iterates over all target words, including NULL
- a_ij: the index of the target word to which f_ij is aligned
- delta_ijk(l) = 1 iff f_ij is aligned to e_ik in the l-th bag, and 0 otherwise
- B_i: the set of resampled bitexts (bags) in which the i-th sentence pair occurs
- Link probabilities: p_ijk = (1 / |B_i|) * sum over l in B_i of delta_ijk(l)
- Alignment entropy of f_ij: H_ij = - sum over k of p_ijk log p_ijk
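Given the per-bag Viterbi links for one source word, the entropy follows directly from the link counts (a sketch; base-2 logarithms are an arbitrary choice here):

```python
import math
from collections import Counter

def alignment_entropy(links):
    """Entropy of the target index a source word links to across bags.
    `links` holds one Viterbi target index per bag in B_i (0 = NULL)."""
    n = len(links)
    counts = Counter(links)               # empirical multinomial p_ijk
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A word aligned identically in every bag has zero entropy; a word split evenly between two targets has one bit of entropy.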

11 Evaluating alignment entropy

12 Notes on alignment entropy
- Measures the variability of alignments across bags
- Defined only for IBM model alignments: each source word is linked to exactly one target word
- Unidirectional: defined for source-to-target links; reverse the alignment for target-to-source alignment entropy, and combine the two for bidirectional alignment entropy
- Sentence-pair specific: not fixed for a given source vocabulary word, but defined for each source word in every sentence pair

13 Talk progress
- Statistical machine translation
- Word alignment
- Alignment entropy
- Alignment error analysis
- Bitext translation quality
- Translation quality analysis
- Conclusion and future directions

14 Alignment error analysis
- IBM-4 alignments using GIZA++ [Al-Onaizan99]
- English/Arabic: 129,126 pairs (ca. 1.5M words)
- 100 training contexts (1 original, 99 resampled)
- Bidirectional sentence-level alignment entropy: bin sentence pairs into (H)ighest, (L)ow, and (Z)ero entropy sets; select ca. 250 sentence pairs from each set
- Length-normalized Viterbi alignment score: pool the sentence-pair sets selected above, re-rank by normalized Viterbi alignment score, and pick the ca. 250 worst-scoring pairs (the A-set)
- Gold-standard manual alignments for each set: precision, recall, AER, balanced F-measure

15 Alignment error analysis

Measure       Z-set   L-set   H-set   A-set
Align. Ent.    0.00    0.13    0.57    0.47
Precision     94.3%   82.7%   55.0%   61.8%
Recall        82.3%   73.0%   54.1%   61.0%
AER           12.0%   22.2%   45.4%   38.6%
Balanced F    87.9%   77.6%   54.5%   61.4%

Table 1: Alignment entropy vs. alignment quality

16 Notes on alignment error analysis
- The results support our hypothesis: higher alignment entropy indicates poorer alignment quality
- Superior to the normalized Viterbi alignment score: AER(H-set) exceeds AER(A-set) by 6.8% absolute

17 Talk progress
- Statistical machine translation
- Word alignment
- Alignment entropy
- Alignment error analysis
- Bitext translation quality
- Translation quality analysis
- Conclusion and future directions

18 Bitext translation quality
- Human translations often contain errors: non-native speakers of one language, constructs that are difficult to translate (e.g. idioms), oversight, inadequate quality control
- Predicting problems in human translations: semantic errors, missing chunks, etc.; non-literality (paraphrasing)
- Can we use alignment entropy to identify such problems? Is it correlated with translation quality?

19 Measuring bitext translation quality
- TER/HTER analysis of existing translations [Snover06], against carefully prepared gold-standard translations
- Translation Edit Rate (TER): the number of insertions, deletions, substitutions, and shifts; lexically based, with no notion of semantic equivalence
- Human-targeted Translation Edit Rate (HTER): a human expert produces targeted references by minimally editing hypotheses for semantic equivalence to the untargeted gold-standard references; HTER = TER evaluated against the targeted references, which minimizes the impact of lexical choice
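Omitting the shift operation (full TER also allows block shifts at unit cost, found by a greedy search), the edit-distance core of TER can be sketched as:

```python
def ter_no_shifts(hyp, ref):
    """Word-level edit distance (insertions, deletions, substitutions)
    divided by reference length.  Real TER additionally permits block
    shifts, each costing one edit; that search is omitted here."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return d[m][n] / n
```

HTER is then this same score computed against the human-targeted references instead of the original gold standard.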

20 Talk progress
- Statistical machine translation
- Word alignment
- Alignment entropy
- Alignment error analysis
- Bitext translation quality
- Translation quality analysis
- Conclusion and future directions

21 Translation quality analysis
- Existing translations are the "hypotheses"
- New gold-standard references for the H-, L-, and Z-sets: minimal paraphrasing, as literal as possible, thoroughly checked for quality
- Evaluate TER between the "hypotheses" and the gold standard: a measure of translation literality
- HTER evaluation: targeted references produced from the hypotheses and the gold standard; HTER = TER of the hypotheses w.r.t. the targeted references, a measure of semantic translation correctness

22 Translation quality analysis

Eval set   AER     TER     HTER
Z-set      12.0%   23.7%    1.7%
L-set      22.2%   48.3%    0.9%
H-set      45.4%   63.0%    8.6%

Table 2: Alignment entropy vs. translation quality

23 Notes on translation quality analysis
- Predicting translation literality: higher alignment entropy produces higher TER, indicative of paraphrasing
- Semantic correctness of translation pairs: excellent equivalence in the zero- and low-entropy pairs; significant errors in the highest-entropy pairs

24 Talk progress
- Statistical machine translation
- Word alignment
- Alignment entropy
- Alignment error analysis
- Bitext translation quality
- Translation quality analysis
- Conclusion and future directions

25 Conclusion
- Alignment entropy is an excellent predictor of alignment quality: a fine-grained, extensible, word-level measure, superior to the normalized Viterbi alignment score
- It also serves as a measure of translation literality and identifies translation pairs with gross errors
- A useful tool for validating human translations

26 Future directions
- Bootstrapped phrase confidence for SMT [ongoing]: consistency of phrase pairs across resampled bitexts, integrated as a phrase-level feature (tuned with MERT); modest BLEU improvements (0.7-1.0 points)
- Online human translation validation [planned]: identify potential translation errors on the fly; assist human translators for rapid SMT development
- Enriched machine translation [planned]: project features across high-confidence alignments; the availability of a fine-grained measure is key

27 References
[Brown93] Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19:263-311.
[Vogel96] Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. HMM-based Word Alignment in Statistical Translation. In Proceedings of the 16th Conference on Computational Linguistics, pp. 836-841, Morristown, NJ.
[Fraser07] Alexander Fraser and Daniel Marcu. 2007. Measuring Word Alignment Quality for Statistical Machine Translation. Computational Linguistics, 33(3):293-303.
[Al-Onaizan99] Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz Josef Och, David Purdy, Noah A. Smith, and David Yarowsky. 1999. Statistical Machine Translation: Final Report. Technical Report, JHU Summer Workshop.
[Snover06] Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A Study of Translation Edit Rate with Targeted Human Annotation. In Proceedings of AMTA, pp. 223-231.

28 Thank you!

29 Supervised alignment quality
- Annotate "sure" (S) and "possible" (P) links, with S a subset of P
- Evaluate against the hypothesis alignment A:
  precision = |A ∩ P| / |A|
  recall = |A ∩ S| / |S|
  AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
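These supervised measures follow the standard sure/possible-link definitions of Och and Ney; a direct implementation over sets of (source, target) link pairs:

```python
def aer(A, S, P):
    """Alignment Error Rate over link sets: A = hypothesis links,
    S = sure gold links, P = possible gold links (S is a subset of P).
    precision = |A & P| / |A|, recall = |A & S| / |S|,
    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    A, S, P = set(A), set(S), set(P)
    precision = len(A & P) / len(A)
    recall = len(A & S) / len(S)
    rate = 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, rate
```

A hypothesis containing all sure links and nothing outside the possible set scores an AER of zero.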

