1 Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
Marine CARPUAT and Dekai WU
Human Language Technology Center, Department of Computer Science and Engineering, HKUST

2 New resources for SMT: context-dependent phrasal translation lexicons
A key new resource for Phrase Sense Disambiguation (PSD) for SMT [Carpuat & Wu 2007]:
entirely automatically acquired
consistently improves 8 translation quality metrics [EMNLP 2007]
fully phrasal, just like conventional SMT lexicons [TMI 2007]
But… much larger than conventional lexicons!
Why is this extremely large resource necessary?
Is its contribution observably useful?
Is it used by the SMT system differently than conventional SMT lexicons?
HKUST Human Language Technology Center. Carpuat & Wu, LREC 2008.

3 Our finding: context-dependent lexicons directly improve lexical choice in SMT
Exploit the available vocabulary better for phrasal segmentation:
more and longer phrases are used in decoding
consistent with other findings [TMI 2007]: fully phrasal context-dependent lexicons yield more reliable improvements than single-word lexicons
Select better translation candidates:
even after compensating for differences in phrasal segmentation
→ improvements in BLEU, TER, METEOR, etc. really reflect improved lexical choice

4 Problems with current SMT systems
Input: 张教授给一群人就“中国和印度”上课。
Ref.: Prof. Zhang gave a lecture on “China and India” to a packed audience.
SMT 1: Prof. Zhang to a group of people on `China and India` class.
SMT 2: Prof. Zhang and a group of people go into class on “China and India”.
(Figure: among the candidate translations of 上课 (up, go into, climb, …, attend, gave), the correct choice here is “gave … a lecture”, but SMT 2 picks “go into class”.)

5 Translation lexicons in SMT are independent of context!
张教授给一群人就“中国和印度”上课。
Prof. Zhang gave a lecture on “China and India” to a packed audience.
欢迎大家明天来上课，题目是“中国和印度”。
Everyone is welcome to attend class tomorrow, on the topic “China and India”.
(Figure: a single static distribution (up .25, go into .20, climb, …, attend .10, gave .05, …) is attached to the word 上 and reused unchanged in both sentences.)

6 Phrasal lexicons in SMT are independent of context too!
张教授给一群人就“中国和印度”上课。
Prof. Zhang gave a lecture on “China and India” to a packed audience.
欢迎大家明天来上课，题目是“中国和印度”。
Everyone is welcome to attend class tomorrow, on the topic “China and India”.
(Figure: the phrasal entry for 上课 (attend class .45, gave a lecture .15, …) is likewise static across both sentences.)

7 Current SMT systems are hurt by very weak models of context
Translation disambiguation models are too simplistic:
phrasal lexicon translation probabilities are static, so not sensitive to context
context in the input language is only weakly modeled by phrase segments
context in the output language is only weakly modeled by n-grams
Error analysis reveals many lexical choice errors
Yet, few attempts at directly modeling context
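The static lexicon scores criticized above are just relative frequencies estimated once from the training data. A minimal sketch of such context-independent MLE estimation (toy data; the function name is ours):

```python
from collections import Counter, defaultdict

def mle_phrase_lexicon(extracted_pairs):
    """Estimate static, context-independent translation probabilities
    p(e|f) = count(f, e) / count(f) from extracted phrase pairs."""
    pair_counts = Counter(extracted_pairs)
    src_counts = Counter(f for f, _ in extracted_pairs)
    table = defaultdict(dict)
    for (f, e), c in pair_counts.items():
        table[f][e] = c / src_counts[f]
    return table

# Toy data: 上课 aligned 3 times to "attend class", once to "gave a lecture".
pairs = [("上课", "attend class")] * 3 + [("上课", "gave a lecture")]
lex = mle_phrase_lexicon(pairs)
# p("attend class" | 上课) is the same 0.75 for every input sentence,
# which is exactly the insensitivity to context the slide criticizes.
```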

8 Today’s SMT systems ignore the contextual features that would help lexical choice
No full sentential context: merely local n-gram context
No POS information: merely surface forms of words
No structural information: merely word n-gram identities

9 Correct translation disambiguation requires rich context features
张教授给一群人就“中国和印度”上课。
Prof. Zhang gave a lecture on “China and India” to a packed audience.
欢迎大家明天来上课，题目是“中国和印度”。
Everyone is welcome to attend class tomorrow, on the topic “China and India”.
(Figure: with POS tags (N, V, ADV, P, …) and a SUBJ dependency available as context, the scores shift from the static .45/.15 to context-dependent values, e.g. gave a lecture .80 in the first sentence and attend class .70 in the second.)

10 Today’s SMT systems ignore context in their phrasal translation lexicons

11 Today’s SMT systems ignore context in their phrasal translation lexicons
(Figure: the lexicon probability is extended to condition on c_j(f), the entire input sentence context of the phrase f.)

12 But context-dependent lexical choice does not necessarily improve translation quality
Early pilot study [Brown et al. 1991]:
used the single most discriminative feature to disambiguate between 2 English translations of a French word
WSD improves French-English translation quality, but not on a significant vocabulary, and allowing only 2 senses
Context-dependent lexical choice helps word alignment, but not really translation quality [Garcia Varea et al. 2001, 2002]:
a maximum-entropy trained bilexicon replaces the IBM-4/5 translation probabilities
improves AER on the Canadian Hansards and Verbmobil tasks
small improvement in WER and PER by rescoring n-best lists, but not statistically significant [Garcia Varea & Casacuberta 2005]

13 Context-dependent modeling improves quality of statistical MT [Carpuat & Wu 2007]
Introduced context-dependent phrasal lexicons for SMT:
leverage WSD techniques for SMT lexical choice
generalize conventional WSD to Phrase Sense Disambiguation
Context-dependent modeling always improves SMT accuracy:
on all tasks: 3 different IWSLT06 datasets, NIST04
on all 8 common automatic metrics: BLEU, NIST, METEOR, METEOR+synsets, TER, WER, PER, CDER

14 No other WSD-for-SMT approach improves translation quality as consistently
Until recently, using WSD to improve SMT quality met with mixed or disappointing results: Carpuat & Wu [ACL-2005], Cabezas & Resnik [unpub.]
Last year, for the first time, different approaches showed that WSD can help translation quality:
WSD improved BLEU (but how about other metrics?) on 3 Chinese-English tasks [Carpuat et al. IWSLT-2006]
WSD improved BLEU (but how about other metrics?) on the Chinese-English NIST task [Chan et al. ACL-2007]
WSD improved METEOR (but not BLEU!) on the Spanish-English Europarl task [Giménez & Màrquez WMT-2007]
Phrasal WSD improved BLEU, NIST, METEOR (but how about error rates?) on the Italian-English and Chinese-English IWSLT tasks [Stroppa et al. TMI-2007]
But no other approach improves on 8 metrics on 4 different tasks

15 But how useful are the context-dependent lexicons as resources?
Improving translation quality is great, but…
metrics aggregate the impact of many different factors
metrics ignore how translation hypotheses are generated
Context-dependent lexicons are more expensive to train, so…
are their contributions observably useful?
Direct analysis needed: how do SMT systems use context-dependent vs. conventional lexicons?

16 Learning context-dependent vs. conventional lexicons for SMT
Both are learned from the same word-aligned parallel data:
cover the same phrasal input vocabulary
know the same phrasal translation candidates
Only difference: an additional context-dependent parameter
dynamically computed, vs. static conventional scores
uses WSD modeling, vs. MLE in conventional lexicons

17 Word Sense Disambiguation provides appropriate models of context
WSD has long targeted the questions of:
how to design context features
how to combine contextual evidence into a sense prediction
Senseval/SemEval have extensively evaluated WSD systems:
with different feature sets
with different machine learning classifiers
Senseval multilingual lexical sample tasks use observable lexical translations as senses, just like lexical choice in SMT
e.g. Senseval-2003 English-Hindi, SemEval-2007 Chinese-English

18 Leveraging a Senseval WSD system
Top Senseval-3 Chinese lexical sample system [Carpuat et al. 2004]
Standard classification models: maximum entropy, SVM, boosted decision stumps, naïve Bayes
Rich lexical and syntactic features:
bag-of-words sentence context
position-sensitive co-occurring words and POS tags
basic syntactic dependency features
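The feature types listed above can be sketched as a simple extractor over a POS-tagged sentence. This is an illustrative simplification, not the actual Senseval system; the feature-naming scheme is ours:

```python
def context_features(tokens, pos_tags, i):
    """Sketch of Senseval-style context features for the token at
    position i: bag-of-words sentence context plus position-sensitive
    neighboring words and POS tags in a +/-2 window."""
    feats = {}
    for w in tokens:
        feats["bow=" + w] = 1                  # bag-of-words context
    for off in (-2, -1, 1, 2):                 # position-sensitive window
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"w{off}={tokens[j]}"] = 1   # neighboring word identity
            feats[f"p{off}={pos_tags[j]}"] = 1 # neighboring POS tag
    return feats

# Toy sentence, features around the verb "gave" (position 2):
toks = ["prof", "zhang", "gave", "a", "lecture"]
tags = ["NN", "NNP", "VBD", "DT", "NN"]
f = context_features(toks, tags, 2)
```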

19 Generalizing WSD to PSD for context-dependent phrasal translation lexicons
One PSD model per input-language phrase, regardless of POS, length, etc.
Generalization of standard WSD models: sense candidates are the phrase translation candidates seen in training
The sense candidates are extracted just like the conventional SMT phrasal lexicon:
typically, output-language phrases consistent with the intersection of bidirectional IBM alignments

20 Extracting PSD senses and training examples from word-aligned parallel text
is there a new - age music concert within the next few days ?
在 最近 一段 时间 里 有 流行音乐 会 吗 ?

21 Extracting PSD senses and training examples from word-aligned parallel text
is there a new - age music concert within the next few days ?
在 最近 一段 时间 里 有 流行音乐 会 吗 ?
Extracted PSD training instances: (Figure: the aligned phrase pairs, each kept together with its full source sentence context.)


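The extraction these slides step through can be sketched as follows, with a simplified alignment-consistency check in the spirit of standard phrase extraction (the function name and the toy sentence pair are ours):

```python
def extract_psd_instances(src, tgt, alignment, max_len=3):
    """Sketch: from one word-aligned sentence pair, emit (source phrase,
    target phrase, full source sentence) PSD training instances. A target
    span is kept only if it is consistent with the alignment: no link
    from outside the source span lands inside the target span."""
    instances = []
    for i in range(len(src)):
        for j in range(i, min(i + max_len, len(src))):
            # target positions linked to the source span [i, j]
            tgt_pos = [t for s, t in alignment if i <= s <= j]
            if not tgt_pos:
                continue
            lo, hi = min(tgt_pos), max(tgt_pos)
            # consistency check on the induced target span [lo, hi]
            if any(lo <= t <= hi and not (i <= s <= j)
                   for s, t in alignment):
                continue
            instances.append((" ".join(src[i:j + 1]),
                              " ".join(tgt[lo:hi + 1]),
                              " ".join(src)))      # sentence = context
    return instances

# Toy aligned pair: 我/i, 上课/attend class
src = ["我", "上课"]
tgt = ["i", "attend", "class"]
align = [(0, 0), (1, 1), (1, 2)]
instances = extract_psd_instances(src, tgt, align)
```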

24 Integrating the context-dependent lexicon into phrase-based SMT architectures
The context-dependent phrasal lexicon probabilities:
are conditional translation probabilities
can naturally be added as a feature in log-linear translation models
Unlike conventional translation probabilities, they are dynamically computed, dependent on full-sentence context
Decoding can make full use of context-dependent phrasal lexicon predictions at all stages of decoding, unlike in n-best reranking
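How the extra feature slots into a log-linear model can be sketched in a few lines. The weights and probabilities below are illustrative, not tuned values (in practice the weights come from MERT-style tuning); only the p_wsd feature changes with the surrounding sentence:

```python
import math

def loglinear_score(features, weights):
    """Log-linear model score: weighted sum of log feature values.
    The context-dependent lexicon contributes one extra feature whose
    value p_wsd(e | f, context) is recomputed per input sentence,
    unlike the static phrase-table probabilities. (Toy sketch.)"""
    return sum(weights[k] * math.log(v) for k, v in features.items())

weights = {"p_tm": 1.0, "p_lm": 1.0, "p_wsd": 0.5}  # illustrative weights
static = {"p_tm": 0.45, "p_lm": 0.10}               # same for every sentence

# Scoring the candidate "gave a lecture" for 上课 in two contexts:
# the WSD feature is high in the lecture sentence, low in the other.
score_lecture_ctx = loglinear_score({**static, "p_wsd": 0.80}, weights)
score_class_ctx = loglinear_score({**static, "p_wsd": 0.15}, weights)
```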

25 Evaluating context-dependent phrasal translation lexicons
Lexical choice only, vs. translation quality [Carpuat & Wu EMNLP 2007]
Integrated evaluation in SMT, vs. stand-alone as in Senseval [Carpuat et al. 2004]
Fully phrasal lexicons only, vs. single-word context-dependent lexicon [Carpuat & Wu TMI 2007]
Translation task:
test set: NIST-04 Chinese-English text translation, 1788 sentences, 4 reference translations
standard phrase-based SMT decoder (Moses)

26 Experimental setup: learning the lexicons
Standard conventional lexicon learning:
newswire Chinese-English corpus, ~2M sentences
standard word-alignment methodology: GIZA++, intersection with “grow-diag” heuristics [Koehn et al. 2003]
standard Pharaoh/Moses phrase table: maximum phrase length = 10; translation probabilities in both directions, plus lexical weights
Context-dependent lexicons:
use the exact same word-aligned parallel data
train a WSD model for each known phrase

27 Step 1: Evaluating phrasal segmentation with context-dependent vs. conventional lexicons
Goal: compare the phrasal segmentations of the input sentence used to produce the top hypothesis
Method: we do not evaluate accuracy, since there is no gold-standard phrasal segmentation! Instead, we analyze how the input phrases available in the lexicons are used

28 SMT uses longer input phrases with context-dependent lexicons
Context-dependent lexicons help use longer, less ambiguous phrases

29 SMT uses more input phrase types with context-dependent lexicons
26% of the phrase types used with the context-dependent lexicon are not used with the conventional lexicon
96% of those lexicon entries are truly phrasal (not single words)
Context-dependent lexicons make better use of the available input-language vocabulary

30 SMT uses more rare phrases with context-dependent lexicons
With context modeling, less training data is needed for phrases to be used

31 Step 2: Comparing translation selection
Goal: compare translation selection only
Method:
we compare the accuracy of translation selection for identical segments only, because different lexicons yield different phrasal segmentations
a translation is considered accurate if it matches any of the reference translations, because the input sentence and references are not word-aligned
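The matching criterion described above (correct if the chosen translation appears in any reference) can be sketched directly; the names and toy data are ours:

```python
def selection_accuracy(choices, references):
    """Sketch of the slide's matching criterion: a phrase translation
    counts as correct if its string occurs in any reference translation.
    No word alignment between input and references is assumed, so this
    is substring matching, not alignment-based scoring."""
    hits = sum(any(c in ref for ref in references) for c in choices)
    return hits / len(choices)

refs = ["prof. zhang gave a lecture on china and india "
        "to a packed audience ."]
# "gave a lecture" matches the reference, "go into class" does not:
acc = selection_accuracy(["gave a lecture", "go into class"], refs)
```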

32 Context-dependent lexicon predictions match references better
Context-dependent lexicons yield more matches than conventional lexicons
48% of the errors made with conventional lexicons are corrected with context-dependent lexicons

                              Conventional: Match   Conventional: No match
Context-dependent: Match            1435                   2139
Context-dependent: No match          683                   2272
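The headline figures can be re-derived from the slide's contingency counts (a quick arithmetic check; the variable names are ours):

```python
# Contingency counts from the slide: rows are the context-dependent
# lexicon's outcome, columns the conventional lexicon's outcome.
both_match, cd_only = 1435, 2139   # context-dependent matches a reference
conv_only, neither = 683, 2272     # context-dependent misses

cd_matches = both_match + cd_only      # matches with context modeling
conv_matches = both_match + conv_only  # matches without it
conv_errors = cd_only + neither        # conventional-lexicon errors
corrected = cd_only / conv_errors      # share fixed by context modeling
# corrected is roughly 0.48, the 48% quoted on the slide
```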

33 Conclusion: context-dependent phrasal translation lexicons are useful resources for SMT
A key new resource for Phrase Sense Disambiguation (PSD) for SMT [Carpuat & Wu 2007]:
entirely automatically acquired
consistently improves 8 translation quality metrics [EMNLP 2007]
fully phrasal, just like conventional SMT lexicons [TMI 2007]
But… much larger than conventional lexicons!
Why is this extremely large resource necessary?
Is its contribution observably useful?
Is it used by the SMT system differently than conventional SMT lexicons?

34 Conclusion: context-dependent phrasal translation lexicons are useful resources for SMT
Improve phrasal segmentation:
exploit the available input vocabulary better
more phrases, longer phrases, and more rare phrases are used in decoding
consistent with other findings: fully phrasal context-dependent lexicons yield more reliable improvements than single-word lexicons [Carpuat & Wu TMI 2007]
Improve translation candidate selection:
even after compensating for differences in phrasal segmentation
genuinely improve lexical choice, not just BLEU and other metrics!

35 Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
Marine CARPUAT and Dekai WU
Human Language Technology Center, Department of Computer Science and Engineering, HKUST

36 Translation quality evaluation
Not just BLEU, but 8 automatic metrics:
n-gram matching metrics: BLEU4, NIST, METEOR, METEOR+synsets (augmented with WordNet synonym matching)
edit distances: TER, WER, PER, CDER

37 Context-dependent modeling consistently improves translation quality

Test set  Experiment  BLEU   NIST   METEOR  METEOR    TER    WER    PER    CDER
                                            (no syn)
IWSLT 1   SMT         42.21  7.888  65.40   63.24     40.45  45.58  37.80  40.09
          SMT+WSD     42.38  7.902  65.73   63.64     39.98  45.30  37.60  39.91
IWSLT 2   SMT         41.49  8.167  66.25   63.85     40.95  46.42  37.52  40.35
          SMT+WSD     41.97  8.244  66.35   63.86     40.63  46.14  37.25  40.10
IWSLT 3   SMT         49.91  9.016  73.36   70.70     35.60  40.60  32.30  35.46
          SMT+WSD     51.05  9.142  74.13   71.44     34.68  39.75  31.71  34.58
NIST      SMT         20.41  7.155  60.21   56.15     76.76  88.26  61.71  70.32
          SMT+WSD     20.92  7.468  60.30   56.79     71.34  83.37  57.29  67.38

38 Results are statistically significant
The NIST results are statistically significant at the 95% level
Tested using paired bootstrap resampling
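Paired bootstrap resampling can be sketched as below. Note this toy version compares sums of per-sentence scores, whereas BLEU and the other metrics are corpus-level; it illustrates the resampling logic only, and all names and data are ours:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Sketch of paired bootstrap significance testing: resample test
    sentences with replacement, and count how often system B outscores
    system A on the resampled set. B is significantly better at the
    95% level if it wins in at least 95% of resamples."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # same sentences for both
        if sum(scores_b[i] for i in idx) > sum(scores_a[i] for i in idx):
            wins += 1
    return wins / n_resamples

# Toy per-sentence scores: system B beats A on every sentence.
a = [0.30, 0.25, 0.40, 0.35, 0.20] * 20
b = [s + 0.05 for s in a]
p = paired_bootstrap(a, b)
```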

39 Translations with context-dependent phrasal lexicons often differ from SMT translations

Test set   Translations changed by context modeling
IWSLT 1    25.49%
IWSLT 2    30.40%
IWSLT 3    29.25%
NIST       95.74%

40 Context-dependent modeling helps even for the small, single-domain IWSLT task
IWSLT is a single-domain task with very short sentences
Even in these conditions, context-dependent phrasal lexicons are helpful:
there are genuine sense ambiguities, e.g. “turn” vs. “transfer”
context features are available: 19 observed features per occurrence of a Chinese phrase

41 The most useful context features are not available in standard SMT
The 3 most useful context feature types are:
POS tag of the word preceding the target phrase
POS tag of the word following the target phrase
bag-of-words context
We use the weights learned by the maximum entropy classifier to determine the most useful features: we normalize the feature weights for each WSD model, then compute the average weight of each feature type
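The weight-analysis procedure described above (normalize per model, then average per feature type) can be sketched as follows; the feature-name convention and the toy weights are ours:

```python
from collections import defaultdict

def avg_weight_by_feature_type(models):
    """Sketch of the slide's analysis: normalize the absolute maxent
    feature weights within each per-phrase WSD model, then average the
    normalized weight per feature *type* across all models. The type is
    taken as the prefix before '=' (e.g. 'p-1' for preceding POS tag)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for weights in models:                    # one weight dict per model
        z = sum(abs(w) for w in weights.values()) or 1.0
        for name, w in weights.items():
            ftype = name.split("=", 1)[0]     # 'p-1=DT' -> 'p-1'
            totals[ftype] += abs(w) / z       # normalized within model
            counts[ftype] += 1
    return {t: totals[t] / counts[t] for t in totals}

# Two toy per-phrase models; preceding-POS features carry the most weight.
models = [{"p-1=DT": 2.0, "bow=class": 1.0, "p+1=NN": 1.0},
          {"p-1=VBD": 3.0, "bow=lecture": 1.0}]
avg = avg_weight_by_feature_type(models)
```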

42 Dynamic context-dependent sense predictions are better than static predictions
Context-dependent modeling often helps rank the correct translation first
Even when context-dependent modeling picks the same translation candidate, the WSD scores are more discriminative than the baseline translation probabilities:
better at overriding incorrect LM predictions
give higher confidence to translate longer input phrases when appropriate

43 Context-dependent modeling improves phrasal lexical choice: examples


45 Context-dependent modeling prefers longer phrases
Input:
Ref.: No parliament members voted against him.
SMT: Without any congressmen voted against him.
SMT+WSD: No congressmen voted against him.



48 Context-dependent modeling prefers longer phrases
The average length of the Chinese phrases used is higher with the context-dependent phrasal lexicon
This confirms that:
context-dependent predictions for all phrases are useful
context-dependent predictions should be available at all stages of decoding
This explains why using WSD for single words only has a less reliable impact on translation quality, as in Cabezas & Resnik [2005] and Carpuat et al. [2006]

49 Context-dependent lexicons should be phrasal to always help translation

Test set  Experiment     BLEU   NIST   METEOR  METEOR    TER    WER    PER    CDER
                                               (no syn)
# 1       SMT            42.21  7.888  65.40   63.24     40.45  45.58  37.80  40.09
          +word lex.     41.94  7.911  65.55   63.52     40.59  45.61  37.75  40.09
          +phrasal lex.  42.38  7.902  65.73   63.64     39.98  45.30  37.60  39.91
# 2       SMT            41.49  8.167  66.25   63.85     40.95  46.42  37.52  40.35
          +word lex.     41.31  8.161  66.23   63.72     41.34  46.82  37.98  40.69
          +phrasal lex.  41.97  8.244  66.35   63.86     40.63  46.14  37.25  40.10
# 3       SMT            49.91  9.016  73.36   70.70     35.60  40.60  32.30  35.46
          +word lex.     49.73  9.017  73.32   70.82     35.72  40.61  32.10  35.30
          +phrasal lex.  51.05  9.142  74.13   71.44     34.68  39.75  31.71  34.58


51 Context-dependent modeling improves quality of statistical MT
Presented context-dependent phrasal lexicons for SMT: leverage WSD techniques for SMT lexical choice
Context-dependent modeling always improves SMT accuracy:
on all tasks: 3 different IWSLT06 datasets, NIST04
on all 8 common automatic metrics: BLEU, NIST, METEOR, METEOR+synsets, TER, WER, PER, CDER
Why?
the most useful context features are unavailable to current SMT systems
better phrasal segmentation
better phrasal lexical choice: more accurate rankings, more discriminative scores

52 Maxent-based sense disambiguation in Candide [Berger 1996]
No evaluation of impact on translation quality:
only 2 example sentences; no contrastive evaluation by human judgment or any automatic metric
the extension by Garcia Varea et al. does not significantly improve translation quality
Still does not model input-language context
Overly simplified context model:
does not use full sentential context: only 3 words to the left, 3 words to the right
does not generalize over word identities: only words, no POS tags
does not generalize to phrasal disambiguation targets: only words
Does not augment the existing SMT model: only replaces the context-independent translation probability

