Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation. Marine Carpuat and Dekai Wu, Human Language Technology Center, HKUST.


Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation. Marine Carpuat and Dekai Wu, Human Language Technology Center, Department of Computer Science and Engineering, HKUST.

New resources for SMT: context-dependent phrasal translation lexicons
- A key new resource for Phrase Sense Disambiguation (PSD) for SMT [Carpuat & Wu 2007]
- Entirely automatically acquired
- Consistently improves 8 translation quality metrics [EMNLP 2007]
- Fully phrasal, just like conventional SMT lexicons [TMI 2007]
- But… much larger than conventional lexicons! Why is this extremely large resource necessary? Is its contribution observably useful? Is it used by the SMT system differently than conventional SMT lexicons?
HKUST Human Language Technology Center. Carpuat & Wu, LREC 2008.

Our finding: context-dependent lexicons directly improve lexical choice in SMT
- They exploit the available vocabulary better for phrasal segmentation: more and longer phrases are used in decoding
- Consistent with other findings [TMI 2007]: fully phrasal context-dependent lexicons yield more reliable improvements than single-word lexicons
- They select better translation candidates, even after compensating for differences in phrasal segmentation
- So improvements in BLEU, TER, METEOR, etc. really reflect improved lexical choice

Problems with current SMT systems
Input: 张教授给一群人就“中国和印度”上课。
Ref.: Prof. Zhang gave a lecture on “China and India” to a packed audience.
SMT 1: Prof. Zhang to a group of people on “China and India” class.
SMT 2: Prof. Zhang and a group of people go into class on “China and India”.
For the ambiguous phrase, the lexicon offers candidates such as up, go into, climb, …, attend, gave, …; both SMT outputs fail to choose the correct one (“gave [a lecture]”).

Translation lexicons in SMT are independent of context!
张教授给一群人就“中国和印度”上课。
→ Prof. Zhang gave a lecture on “China and India” to a packed audience.
欢迎大家明天来上课，题目是“中国和印度”。
→ Everyone is welcome to attend class tomorrow, on the topic “China and India”.
In both contexts, the lexicon offers the same static candidate list (up, go into, climb, …, attend, gave, …) with the same probabilities.

Phrasal lexicons in SMT are independent of context too!
For the same two sentences, the phrasal lexicon likewise assigns static scores (.45 and .15) to its candidates “attend class” and “gave a lecture”, identically in both contexts.

Current SMT systems are hurt by very weak models of context
Translation disambiguation models are too simplistic:
- Phrasal lexicon translation probabilities are static, so they are not sensitive to context
- Context in the input language is only weakly modeled by phrase segments
- Context in the output language is only weakly modeled by n-grams
Error analysis reveals many lexical choice errors, yet there have been few attempts at directly modeling context.

Today’s SMT systems ignore the contextual features that would help lexical choice
- No full sentential context, merely local n-gram context
- No POS information, merely the surface form of words
- No structural information, merely word n-gram identities

Correct translation disambiguation requires rich context features
张教授给一群人就“中国和印度”上课。
→ Prof. Zhang gave a lecture on “China and India” to a packed audience.
欢迎大家明天来上课，题目是“中国和印度”。
→ Everyone is welcome to attend class tomorrow, on the topic “China and India”.
POS tags (V, N, ADV, …) and syntactic roles (e.g. SUBJ) over the full sentence provide the evidence needed to choose between “attend class” and “gave a lecture”.


Today’s SMT systems ignore context in their phrasal translation lexicons: the lexicon scores a phrase while ignoring c_j(f), the entire input-sentence context.

But context-dependent lexical choice does not necessarily improve translation quality
- Early pilot study [Brown et al. 1991]: used the single most discriminative feature to disambiguate between 2 English translations of a French word; WSD improved French-English translation quality, but not over a significant vocabulary, and allowing only 2 senses
- Context-dependent lexical choice helps word alignment, but not really translation quality [Garcia Varea et al. 2001, 2002]: a maximum-entropy-trained bilexicon replaces IBM-4/5 translation probabilities and improves AER on the Canadian Hansards and Verbmobil tasks; small improvements in WER and PER by rescoring n-best lists, but not statistically significant [Garcia Varea & Casacuberta 2005]

Context-dependent modeling improves the quality of statistical MT [Carpuat & Wu 2007]
- Introduced context-dependent phrasal lexicons for SMT: leverage WSD techniques for SMT lexical choice; generalize conventional WSD to Phrase Sense Disambiguation
- Context-dependent modeling always improves SMT accuracy: on all tasks (3 different IWSLT06 datasets, NIST04) and on all 8 common automatic metrics (BLEU, NIST, METEOR, METEOR+synsets, TER, WER, PER, CDER)

No other WSD-for-SMT approach improves translation quality as consistently
- Until recently, using WSD to improve SMT quality met with mixed or disappointing results: Carpuat & Wu [ACL-2005], Cabezas & Resnik [unpub.]
- Last year, for the first time, different approaches showed that WSD can help translation quality:
  - WSD improved BLEU (but how about other metrics?) on 3 Chinese-English tasks [Carpuat et al. IWSLT-2006]
  - WSD improved BLEU (but how about other metrics?) on the Chinese-English NIST task [Chan et al. ACL-2007]
  - WSD improved METEOR (but not BLEU!) on the Spanish-English Europarl task [Giménez & Màrquez WMT-2007]
  - Phrasal WSD improved BLEU, NIST, and METEOR (but how about error rates?) on Italian-English and Chinese-English IWSLT tasks [Stroppa et al. TMI-2007]
- But no other approach improves on 8 metrics on 4 different tasks

But how useful are the context-dependent lexicons as resources?
- Improving translation quality is great, but… metrics aggregate the impact of many different factors, and ignore how translation hypotheses are generated
- Context-dependent lexicons are more expensive to train, so… are their contributions observably useful?
- Direct analysis needed: how do SMT systems use context-dependent vs. conventional lexicons?

Learning context-dependent vs. conventional lexicons for SMT
- Both are learned from the same word-aligned parallel data: they cover the same phrasal input vocabulary and know the same phrasal translation candidates
- Only difference: an additional context-dependent parameter, dynamically computed (vs. static conventional scores), using WSD modeling (vs. MLE in conventional lexicons)
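For contrast with the WSD-based scores discussed below, the conventional static lexicon can be sketched as a relative-frequency (MLE) estimate over extracted phrase pairs. The toy phrase pairs here are illustrative, not drawn from the actual training data:

```python
from collections import Counter

def mle_phrase_lexicon(phrase_pairs):
    """Conventional context-independent lexicon: relative-frequency
    (MLE) estimate p(e|f) from extracted phrase-pair counts."""
    joint = Counter(phrase_pairs)                    # counts of (f, e) pairs
    marginal = Counter(f for f, _ in phrase_pairs)   # counts of f alone
    return {(f, e): c / marginal[f] for (f, e), c in joint.items()}

# Toy data: the same source phrase extracted with two different translations.
pairs = [("上课", "attend class"), ("上课", "attend class"),
         ("上课", "gave a lecture"), ("上课", "gave a lecture"),
         ("上课", "gave a lecture")]
lex = mle_phrase_lexicon(pairs)
# p("gave a lecture" | "上课") = 3/5, the same score in every sentence
```

The key point of the slide: this estimate is fixed once training ends, whereas the context-dependent parameter is recomputed for every input sentence.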

Word Sense Disambiguation provides appropriate models of context
- WSD has long targeted the questions of how to design context features and how to combine contextual evidence into a sense prediction
- Senseval/SemEval have extensively evaluated WSD systems with different feature sets and different machine learning classifiers
- Senseval multilingual lexical sample tasks use observable lexical translations as senses, just like lexical choice in SMT (e.g. Senseval-2003 English-Hindi, SemEval-2007 Chinese-English)

Leveraging a Senseval WSD system
- Top Senseval-3 Chinese Lexical Sample system [Carpuat et al. 2004]
- Standard classification models: maximum entropy, SVM, boosted decision stumps, naïve Bayes
- Rich lexical and syntactic features: bag-of-words sentence context; position-sensitive co-occurring words and POS tags; basic syntactic dependency features
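Two of the feature families listed above (bag-of-words sentence context and position-sensitive words and POS tags) can be sketched roughly as follows; the window size, the feature-name format, and the toy sentence are assumptions for illustration, and the syntactic dependency features are omitted:

```python
def wsd_features(tokens, pos_tags, i, window=3):
    """Sketch of WSD context features for the target phrase starting at
    token index i: bag-of-words over the whole sentence, plus
    position-sensitive word and POS features in a fixed window."""
    feats = {}
    for w in tokens:                          # bag-of-words sentence context
        feats["bow=" + w] = 1
    for off in range(-window, window + 1):    # position-sensitive features
        j = i + off
        if off != 0 and 0 <= j < len(tokens):
            feats[f"w{off:+d}={tokens[j]}"] = 1       # co-occurring word
            feats[f"pos{off:+d}={pos_tags[j]}"] = 1   # its POS tag
    return feats

# Toy tokenized sentence with made-up POS tags; target phrase at index 4.
f = wsd_features(["欢迎", "大家", "明天", "来", "上课"],
                 ["V", "N", "N", "V", "V"], 4)
```

Each occurrence of a phrase thus becomes a sparse feature vector that a maxent or SVM classifier can consume.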

Generalizing WSD to PSD for context-dependent phrasal translation lexicons
- One PSD model per input-language phrase, regardless of POS, length, etc.: a generalization of standard WSD models
- Sense candidates are the phrase translation candidates seen in training
- The sense candidates are extracted just like the conventional SMT phrasal lexicon: typically, output-language phrases consistent with the intersection of bidirectional IBM alignments

Extracting PSD senses and training examples from word-aligned parallel text
Example sentence pair:
在最近一段时间里有流行音乐会吗？
is there a new-age music concert within the next few days ?
Each consistently aligned phrase pair yields an extracted PSD training instance, with the full source sentence as its context.
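The extraction step can be sketched as follows, using the standard consistency criterion for phrase pairs (no word inside the pair aligned to a word outside it). The function name, the maximum phrase length, and the toy sentence pair are assumptions for illustration; the actual system uses the intersection of bidirectional IBM alignments over the full corpus:

```python
def extract_psd_instances(src, tgt, alignment, max_len=3):
    """Sketch: each consistent aligned phrase pair becomes one PSD
    training instance; the aligned target phrase is the 'sense' label
    and the full source sentence is kept as context."""
    instances = []
    for i in range(len(src)):
        for j in range(i + 1, min(i + max_len, len(src)) + 1):
            # Target positions linked to the source span [i, j).
            linked = sorted({t for s, t in alignment if i <= s < j})
            if not linked:
                continue
            lo, hi = linked[0], linked[-1]
            # Consistency: nothing in tgt[lo..hi] aligns outside [i, j).
            if all(i <= s < j for s, t in alignment if lo <= t <= hi):
                instances.append((" ".join(src[i:j]),        # source phrase
                                  " ".join(tgt[lo:hi + 1]),  # sense label
                                  src))                      # full context
    return instances

# Toy word-aligned sentence pair: 我→i, 上课→attend class.
inst = extract_psd_instances(["我", "上课"], ["i", "attend", "class"],
                             {(0, 0), (1, 1), (1, 2)})
```

Running this over the whole word-aligned corpus gives, for every known source phrase, a labeled training set for its PSD classifier.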

Integrating the context-dependent lexicon into phrase-based SMT architectures
- The context-dependent phrasal lexicon probabilities are conditional translation probabilities, so they can naturally be added as a feature in log-linear translation models
- Unlike conventional translation probabilities, they are dynamically computed, dependent on full-sentence context
- Decoding can make full use of the context-dependent phrasal lexicon's predictions at all stages of decoding, unlike in n-best reranking
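The integration point can be sketched as one extra feature in the log-linear score. The names `score_phrase`, `p_tm`, and `p_psd` are hypothetical, and `wsd_model` is a stand-in for the trained PSD classifier; a real decoder would also include language model, reordering, and length features:

```python
import math

def loglinear_score(features, weights):
    """Standard log-linear model score: weighted sum of log feature values."""
    return sum(weights[k] * math.log(v) for k, v in features.items())

def score_phrase(f, e, sentence, static_lex, wsd_model, weights):
    """Sketch: the context-dependent lexicon enters as one extra feature,
    computed dynamically from the full input sentence, alongside the
    static phrase-table probability."""
    feats = {"p_tm": static_lex[(f, e)],          # static, context-independent
             "p_psd": wsd_model(f, e, sentence)}  # dynamic, context-dependent
    return loglinear_score(feats, weights)

# Toy usage: a dummy classifier standing in for the PSD model.
s = score_phrase("上课", "gave a lecture",
                 ["张教授", "上课"],
                 {("上课", "gave a lecture"): 0.5},
                 lambda f, e, sent: 0.8,
                 {"p_tm": 1.0, "p_psd": 1.0})
```

Because `p_psd` is just another feature, its weight can be tuned with the same minimum-error-rate training used for the other features, and it is available to the decoder at every hypothesis expansion rather than only in reranking.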

Evaluating context-dependent phrasal translation lexicons
- Lexical choice only, vs. translation quality [Carpuat & Wu EMNLP 2007]
- Integrated evaluation in SMT, vs. stand-alone as in Senseval [Carpuat et al. 2004]
- Fully phrasal lexicons only, vs. single-word context-dependent lexicons [Carpuat & Wu TMI 2007]
Translation task: NIST-04 Chinese-English text translation test set (1788 sentences, 4 reference translations), with a standard phrase-based SMT decoder (Moses).

Experimental setup: learning the lexicons
- Standard conventional lexicon learning: newswire Chinese-English corpus (~2M sentences); standard word-alignment methodology (GIZA++, intersection using “grow-diag” heuristics [Koehn et al. 2003]); standard Pharaoh/Moses phrase table (maximum phrase length = 10; translation probabilities in both directions, plus lexical weights)
- Context-dependent lexicons: use the exact same word-aligned parallel data; train a WSD model for each known phrase

Step 1: Evaluating phrasal segmentation with context-dependent vs. conventional lexicons
- Goal: compare the phrasal segmentation of the input sentence used to produce the top hypothesis
- Method: we do not evaluate accuracy, since there is no gold-standard phrasal segmentation; instead, we analyze how the input phrases available in the lexicons are used

SMT uses longer input phrases with context-dependent lexicons
Context-dependent lexicons help use longer, less ambiguous phrases.

SMT uses more input phrase types with context-dependent lexicons
- 26% of the phrase types used with the context-dependent lexicon are not used with the conventional lexicon
- 96% of those lexicon entries are truly phrasal (not single words)
- Context-dependent lexicons make better use of the available input-language vocabulary

SMT uses more rare phrases with context-dependent lexicons
With context modeling, less training data is needed for phrases to be used.

Step 2: Comparing translation selection
- Goal: compare translation selection only
- Method: we compare the accuracy of translation selection for identical segments only, because different lexicons yield different phrasal segmentations
- A translation is considered accurate if it matches any of the reference translations, because the input sentence and references are not word-aligned
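The matching criterion above can be sketched as a substring test against the references; the function name and toy data are assumptions for illustration, and a real evaluation would restrict `choices` to segments identical under both lexicons:

```python
def selection_accuracy(choices, references):
    """Sketch: a selected translation counts as accurate if it occurs
    in any reference translation, since the input sentence and the
    references are not word-aligned."""
    hits = sum(1 for e in choices
               if any(e in ref for ref in references))
    return hits / len(choices)

refs = ["no parliament members voted against him ."]
acc = selection_accuracy(["voted against", "without any"], refs)
# "voted against" occurs in the reference; "without any" does not -> 0.5
```

This deliberately avoids crediting or penalizing segmentation differences: only the chosen target phrase is scored.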

Context-dependent lexicon predictions match references better
- Context-dependent lexicons yield more matches than conventional lexicons
- 48% of the errors made with conventional lexicons are corrected with context-dependent lexicons
[2×2 contingency table of match vs. no-match counts for the conventional and context-dependent lexicons; the counts were not preserved in the transcript]

Conclusion: context-dependent phrasal translation lexicons are useful resources for SMT
- A key new resource for Phrase Sense Disambiguation (PSD) for SMT [Carpuat & Wu 2007]
- Entirely automatically acquired
- Consistently improves 8 translation quality metrics [EMNLP 2007]
- Fully phrasal, just like conventional SMT lexicons [TMI 2007]
- But… much larger than conventional lexicons! Why is this extremely large resource necessary? Is its contribution observably useful? Is it used by the SMT system differently than conventional SMT lexicons?

Conclusion: context-dependent phrasal translation lexicons are useful resources for SMT
- Improve phrasal segmentation: exploit the available input vocabulary better; more phrases, longer phrases, and more rare phrases are used in decoding; consistent with other findings (fully phrasal context-dependent lexicons yield more reliable improvements than single-word lexicons [Carpuat & Wu TMI 2007])
- Improve translation candidate selection, even after compensating for differences in phrasal segmentation
- Genuinely improve lexical choice, not just BLEU and other metrics!


Translation quality evaluation
Not just BLEU, but 8 automatic metrics:
- N-gram matching metrics: BLEU4, NIST, METEOR, METEOR+synsets (augmented with WordNet synonym matching)
- Edit distances: TER, WER, PER, CDER

Context-dependent modeling consistently improves translation quality
[table: BLEU, NIST, METEOR (no syn), TER, WER, PER, and CDER scores for SMT vs. SMT+WSD on the IWSLT 1, IWSLT 2, IWSLT 3, and NIST test sets; the numeric scores were not preserved in the transcript]

Results are statistically significant
- NIST results are statistically significant at the 95% level
- Tested using paired bootstrap resampling

Translations with context-dependent phrasal lexicons often differ from SMT translations
Share of translations changed by context modeling, per test set: IWSLT 1, IWSLT 2, IWSLT 3 (percentages not preserved in the transcript); NIST: 95.74%.

Context-dependent modeling helps even for the small, single-domain IWSLT task
- IWSLT is a single-domain task with very short sentences
- Even in these conditions, context-dependent phrasal lexicons are helpful: there are genuine sense ambiguities (e.g. “turn” vs. “transfer”), and context features are available (19 observed features per occurrence of a Chinese phrase)

The most useful context features are not available in standard SMT
- The 3 most useful context feature types are: the POS tag of the word preceding the target phrase; the POS tag of the word following the target phrase; bag-of-words context
- We use the weights learned by the maximum entropy classifier to determine the most useful features: we normalize the feature weights for each WSD model, then compute the average weight of each feature type
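The normalize-then-average analysis described above can be sketched as follows. The use of absolute weights, the `type=value` feature-name convention, and the toy weights are assumptions for illustration:

```python
def average_feature_type_weights(models):
    """Sketch: normalize the (absolute) maxent feature weights within
    each WSD model, group them by feature type, and average across
    models to rank feature types by usefulness."""
    totals, counts = {}, {}
    for weights in models:                    # one weight dict per WSD model
        z = sum(abs(w) for w in weights.values()) or 1.0
        for feat, w in weights.items():
            ftype = feat.split("=", 1)[0]     # e.g. "pos-1=V" -> "pos-1"
            totals[ftype] = totals.get(ftype, 0.0) + abs(w) / z
            counts[ftype] = counts.get(ftype, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}

# Toy model: one WSD model with a strong preceding-POS feature.
avg = average_feature_type_weights(
    [{"pos-1=V": 2.0, "bow=the": 1.0, "bow=cat": 1.0}])
```

Per-model normalization matters here: without it, phrases with many training examples (and hence larger overall weight mass) would dominate the ranking.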

Dynamic context-dependent sense predictions are better than static predictions
- Context-dependent modeling often helps rank the correct translation first
- Even when context-dependent modeling picks the same translation candidate, the WSD scores are more discriminative than the baseline translation probabilities: better at overriding incorrect LM predictions, and giving higher confidence to translate longer input phrases when appropriate

Context-dependent modeling improves phrasal lexical choice: examples
[example translations shown as figures; not preserved in the transcript]

Context-dependent modeling prefers longer phrases
Input: [Chinese input not preserved in the transcript]
Ref.: No parliament members voted against him.
SMT: Without any congressmen voted against him.
SMT+WSD: No congressmen voted against him.

Context-dependent modeling prefers longer phrases
- The average length of the Chinese phrases used is higher with the context-dependent phrasal lexicon
- This confirms that context-dependent predictions for all phrases are useful, and that they should be available at all stages of decoding
- This explains why using WSD for single words only has a less reliable impact on translation quality, as in Cabezas & Resnik [2005] and Carpuat et al. [2006]

Context-dependent lexicons should be phrasal to always help translation
[table: BLEU, NIST, METEOR (no syn), TER, WER, PER, and CDER scores for SMT vs. word-level vs. phrasal context-dependent lexicons on test sets #1, #2, and #3; the numeric scores were not preserved in the transcript]


Context-dependent modeling improves the quality of statistical MT
- Presented context-dependent phrasal lexicons for SMT, leveraging WSD techniques for SMT lexical choice
- Context-dependent modeling always improves SMT accuracy: on all tasks (3 different IWSLT06 datasets, NIST04) and on all 8 common automatic metrics (BLEU, NIST, METEOR, METEOR+synsets, TER, WER, PER, CDER)
- Why? The most useful context features are unavailable to current SMT systems; better phrasal segmentation; better phrasal lexical choice (more accurate rankings, more discriminative scores)

Maxent-based sense disambiguation in Candide [Berger 1996]
- No evaluation of impact on translation quality: only 2 example sentences; no contrastive evaluation by human judgment or any automatic metric; the extension by Garcia Varea et al. does not significantly improve translation quality
- Still does not model input-language context
- Overly simplified context model: does not use full sentential context (only 3 words to the left and 3 to the right); does not generalize over word identities (only words, no POS tags); does not generalize to phrasal disambiguation targets (only words)
- Does not augment the existing SMT model: only replaces the context-independent translation probability