
9/12/2003 LTI Student Research Symposium
An Integrated Phrase Segmentation/Alignment Algorithm for Statistical Machine Translation
Joy
Advisors: Stephan Vogel and Alex Waibel

Outline
  Background
  Phrase Alignment Algorithms in SMT
  Segmentation Approaches
  Integrated Segmentation and Alignment Algorithm (ISA)
  Experiments
  Discussions

Statistical Machine Translation
  Statistical Machine Translation (Brown et al., 93)
    – Noisy Channel Model
  Translating from F to E
  Given a testing sentence f, generate the translation e*, which is
    $e^* = \arg\max_e \Pr(e \mid f) = \arg\max_e \Pr(e)\,\Pr(f \mid e)$
  Pr(e): Language Model (LM)
  Pr(f|e): Translation Model (TM)

Training
  Using large English corpora (e.g. Wall Street Journal) to train an LM
  Using bilingual corpora (e.g. Canadian Hansard) to train the TM
    – To get the building blocks for Pr(f|e)
      » Word-to-word translations or phrase-to-phrase translations
      » Reordering information
      » Other features

Alignment
  Alignment for one sentence pair (e,f):
    – Suppose e has l words, $e = e_1 e_2 \ldots e_l$, and f has m words, $f = f_1 f_2 \ldots f_m$
    – Then an alignment a can be represented as $a = a_1 a_2 \ldots a_m$, a sequence of m values, each between 0 and l. $a_j = i$ means $f_j$ is "aligned" to $e_i$, where $e_0$ stands for the NULL word
    – In short: the alignment tells us which word in e is translated into which word in f
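The alignment vector described on this slide can be illustrated with a few lines of Python. This is only a sketch: the function name `aligned_pairs` and the German/English toy sentence are made up for illustration and do not come from the talk. Index 0 is reserved for the NULL word, as on the slide.

```python
def aligned_pairs(e_words, f_words, a):
    """Expand an alignment vector into (f_word, e_word) pairs.

    a[j] = i means the (j+1)-th word of f is aligned to e_i,
    where i = 0 denotes the NULL word and 1..l index the words of e.
    """
    if len(a) != len(f_words):
        raise ValueError("alignment needs exactly one value per word of f")
    e_with_null = ["NULL"] + list(e_words)  # e_0 stands for NULL
    pairs = []
    for j, i in enumerate(a):
        if not 0 <= i <= len(e_words):
            raise ValueError("alignment values must lie between 0 and l")
        pairs.append((f_words[j], e_with_null[i]))
    return pairs


# Hypothetical toy sentence pair: the third f word has no counterpart in e.
example = aligned_pairs(["the", "house"], ["das", "Haus", "ja"], [1, 2, 0])
```

Because each f word carries exactly one value, this representation cannot link one f word to several e words, which is the word-to-word restriction the later slides argue against.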

Alignment Example

Alignment Models
  Alignment algorithms:
    – IBM models 1 to 5 (Brown et al.)
    – HMM model, similar to IBM model 2 (Vogel)
    – Competitive linking (Melamed)
    – Flow Network (Gaussier)
    – Others

IBM Model 1
  IBM model 1
    – Easy to train
    – Simple to understand
    – Used very often in MT research
    – One serious problem for IBM models
      Word-to-word alignment assumption
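To make the "easy to train" point concrete, here is a minimal sketch of the textbook EM procedure for IBM model 1 lexical probabilities. It is not the code used in the talk; the function name `train_ibm1`, the iteration count, and the three-sentence toy corpus are assumptions chosen for illustration.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """Estimate t[(f, e)] ~ Pr(f | e) with EM, IBM model 1 style.

    corpus is a list of (e_words, f_words) sentence pairs.
    A NULL word is added to every e sentence.
    """
    f_vocab = {f for _, f_words in corpus for f in f_words}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        for e_words, f_words in corpus:
            e_all = ["NULL"] + list(e_words)
            for f in f_words:
                # E-step: spread one unit of count for f over all e candidates
                z = sum(t[(f, e)] for e in e_all)
                for e in e_all:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise per e word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = train_ibm1(toy_corpus)
```

Every f word is explained by exactly one e word (or NULL), so the model cannot express phrase-level correspondences directly.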

Phrase-to-phrase Alignment
  Phrase-to-phrase alignment is better
    – Mismatch between languages
    – Phrases encapsulate the context of words
    – Phrases encapsulate local reordering

Alignment Algorithms
  Based on initial word alignment
    – Train word alignment
    – Read off phrase-to-phrase alignments from the Viterbi path
    – Examples:
      HMM phrase alignment (Vogel)
      Alignment templates from IBM 4 (Och)
      Bilingual bracketing (Wu, B. Zhao)
  Popular in SMT research

Segmentation Approaches
  Identify monolingual phrases and segment/bracket phrases into one unit (super-word) (Zhang 2000)
  Train the regular word-to-word alignment

Problems in Segmentation Approaches
  Segmentation uses only monolingual information
  Good segmentations may make alignment even harder

Outline
  Background
  Alignment Algorithms in SMT
  Segmentation Approaches
  Integrated Segmentation and Alignment Algorithm (ISA)
  Experiments
  Discussions

Integrated Segmentation and Alignment
  Let's look at an example first

Integrated Segmentation and Alignment
  Represent a sentence pair (e,f) as a matrix D
  $D(i,j) = I'(e_i, f_j)$, where I' is a modified point-wise mutual information
  A partition over D is a series of non-overlapping rectangular regions $d_1, d_2, \ldots, d_m$
  Region $d_k(r_s, r_e, c_s, c_e)$ indicates that $e_{r_s} \ldots e_{r_e}$ are aligned to $f_{c_s} \ldots f_{c_e}$
  Segmentation and alignment are achieved at the same time
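The matrix D can be sketched as follows. The slide does not define the modification behind I', so this sketch substitutes plain point-wise mutual information estimated from sentence-level co-occurrence counts; that substitution, the function names `pmi_table` and `build_matrix`, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_table(corpus):
    """Return a function pmi(e, f) based on sentence-level co-occurrence.

    Plain PMI is used here as a stand-in for the modified I' of the talk.
    """
    n = len(corpus)
    e_count, f_count, ef_count = Counter(), Counter(), Counter()
    for e_words, f_words in corpus:
        e_set, f_set = set(e_words), set(f_words)
        e_count.update(e_set)
        f_count.update(f_set)
        ef_count.update((e, f) for e in e_set for f in f_set)

    def pmi(e, f):
        joint = ef_count[(e, f)]
        if joint == 0:
            return float("-inf")  # never seen together
        # log( P(e,f) / (P(e) P(f)) ) with probabilities as counts / n
        return math.log(joint * n / (e_count[e] * f_count[f]))

    return pmi


def build_matrix(e_words, f_words, pmi):
    """D[i][j] scores the pair (e_i, f_j); rows follow e, columns follow f."""
    return [[pmi(e, f) for f in f_words] for e in e_words]


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
D = build_matrix(["the", "house"], ["das", "haus"], pmi_table(toy_corpus))
```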

Integrated Segmentation and Alignment
  Best partition should yield maximum [formula not recoverable]
  Computationally intractable to search all possible partitions
    – Exponential in sentence length
    – DP: not a good idea
      "An optimal policy has the property that whatever the initial state and the initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." -- Richard Bellman's Principle of Optimality
      But here, the decision of how to expand the first cell changes the search space for the rest of the cells
  Use a computationally cheap algorithm to find the "good" partitions

An Example

Computationally Cheap Algorithm
  Assumption:
    – If the translation for $e_1 e_2$ is f, $I'(e_1, f)$ should be very "similar" to $I'(e_2, f)$
    – Example: [not recoverable]
  Algorithm
    – Step 1: find the cell in D with the max value of I'
    – Step 2: expand this cell to a rectangular region where all cells in the region have an I' similar to this cell's
    – Repeat Step 1 and Step 2 until no more regions can be found
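The two steps above can be sketched as a greedy loop. Several details are not given on the slide and are assumptions here: "similar" is taken to mean "at least `ratio` times the seed value", a region blocks the rows and columns it covers so that every word belongs to at most one phrase pair, and the search stops once the best remaining cell is not above `min_score`. The function names and the toy matrix are hypothetical.

```python
def _similar(D, cells, threshold):
    """True if every listed cell is at least as large as the threshold."""
    return all(D[r][c] >= threshold for r, c in cells)


def greedy_partition(D, ratio=0.8, min_score=0.0):
    """Greedily carve rectangular regions (r_s, r_e, c_s, c_e) out of D."""
    free_rows = set(range(len(D)))
    free_cols = set(range(len(D[0])))
    regions = []
    while free_rows and free_cols:
        # Step 1: the best remaining cell becomes the seed
        seed, i, j = max((D[r][c], r, c) for r in free_rows for c in free_cols)
        if seed <= min_score:
            break
        threshold = ratio * seed
        rs = re = i
        cs = ce = j
        # Step 2: grow the rectangle one row or column at a time
        grown = True
        while grown:
            grown = False
            if rs - 1 in free_rows and _similar(D, [(rs - 1, c) for c in range(cs, ce + 1)], threshold):
                rs -= 1
                grown = True
            if re + 1 in free_rows and _similar(D, [(re + 1, c) for c in range(cs, ce + 1)], threshold):
                re += 1
                grown = True
            if cs - 1 in free_cols and _similar(D, [(r, cs - 1) for r in range(rs, re + 1)], threshold):
                cs -= 1
                grown = True
            if ce + 1 in free_cols and _similar(D, [(r, ce + 1) for r in range(rs, re + 1)], threshold):
                ce += 1
                grown = True
        regions.append((rs, re, cs, ce))
        free_rows -= set(range(rs, re + 1))
        free_cols -= set(range(cs, ce + 1))
    return regions


# Hypothetical scores: rows are e words, columns are f words.
toy_D = [
    [5.0, 0.1, 0.1],
    [4.8, 0.2, 0.1],
    [0.1, 0.1, 3.0],
]
regions = greedy_partition(toy_D)
```

On the toy matrix the first two e words are grouped against the first f word, mirroring the assumption that both words of a phrase score similarly against its translation; the third e word pairs with the third f word, and the second f word stays unaligned.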

Example: Apply the Algorithm

Estimate the probabilities for phrase translations
  The decoder needs the conditional probabilities P(f|e)
  Cannot be estimated directly: data sparseness
  Convert I'(f,e) to P(f|e)
    – IBM model 1 style: [formula not recoverable]
    – Context-dependent style: [formula not recoverable]

Experiments
  Chinese-English
  Small data track
  Evaluation: NIST score against 4 human references

              Sentences    Chinese Words   English Words
  Training    3540 pairs   90 K            115 K
  Testing     993          26 K

Results
  Baseline: IBM model 1 + HMM phrase
  Compare to using ISA only, and ISA+Baseline

                 Prec   Length Penalty   Final Score
  Baseline       6.77   1.00             6.77
  ISA            6.97   0.97             6.78
  ISA+Baseline   7.05   0.99             7.06

T-test
  Student's t-test at the sentence level

                              Precision Scores             Final Scores
                              t-value   Confidence Level   t-value   Confidence Level
  ISA vs. Baseline            6.6084    99.99%             1.6890    95.00%
  ISA+Baseline vs. Baseline   9.0772    99.99%             4.1007    99.99%
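A sentence-level t-statistic of this kind could be computed as in the sketch below. The slide does not say whether the test was paired; this sketch assumes a paired test over per-sentence scores of two systems on the same test sentences. The function name `paired_t` and the score lists are hypothetical and are not the experiment's data.

```python
import math


def paired_t(scores_a, scores_b):
    """Paired t-statistic for per-sentence scores of two systems."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # unbiased sample variance of the per-sentence differences
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


# Made-up per-sentence scores, for illustration only.
t_value = paired_t([2.0, 3.0, 5.0, 4.0], [1.0, 2.5, 3.0, 4.0])
```

The confidence level then follows from comparing the statistic against a t distribution with n - 1 degrees of freedom.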

Compared to IBM1
  Large data track (2.6M English words, 414K Chinese words)
  Using 20M words LM

           LDC+IBM                    LDC+IBM+ISA
           NIST=6.6235                NIST=7.4234                Incr.
           LenPenalty=0.9998          LenPenalty=0.9915          +0.800
           #Type   %      Contrib.    #Type   %      Contrib.
  1-gram   2425    0.60   5.587       2680    0.66   6.161       +0.574
  2-gram   3601    0.22   0.877       4369    0.27   1.091       +0.214
  3-gram   1807    0.08   0.130       2403    0.11   0.186       +0.056
  4-gram   788     0.03   0.024       1096    0.05   0.036       +0.012
  5-gram   382     0.02   0.007       499     0.02   0.011       +0.004
  Sum                     6.625                      7.486       +0.861

No IBM1 is Better
  Small data track (LDC+IBM1+ISA)
  ISA is better than IBM1 even on unigram match

  W_IBM1           NIST   1-gram score   2-gram score   3-gram score
  1.00             6.70   5.43           1.07           0.16
  0.50             6.76   5.47           1.08           0.16
  0.20             6.78   5.49           1.09           0.16
  0.02             6.79   5.50           1.09           0.16
  0.00 (no IBM1)   6.81   5.51           1.10           0.16

Summary
  Integrated Alignment and Segmentation
  Simple algorithm
  Enhanced translation quality
    – Better than IBM models
    – Higher quality than HMM alignment
  A major component in the CMU SMT system

ISA Toolkit
  Location:
    – /afs/cs.cmu.edu/user/joy/Release/PhraseAlign
  Documentation:
    – /afs/cs.cmu.edu/user/joy/Release/PhraseAlign/documentation/readme.txt
  Speed
    – Example: 4172 sentence pairs (133K En words, 20K Ch words)
    – About 160 seconds for the alignment (10 loops for each sentence pair)

Selected References
  Franz Josef Och, Christoph Tillmann, Hermann Ney, "Improved Alignment Models for Statistical Machine Translation," Proceedings of the Joint Conference of Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 20-28. University of Maryland, College Park, MD, June 1999.
  Stephan Vogel, Hermann Ney, and Christoph Tillmann, "HMM-based Word Alignment in Statistical Translation," Proceedings of COLING '96: The 16th International Conference on Computational Linguistics, pp. 836-841. Copenhagen, August 1996.
  Stephan Vogel, Ying Zhang, Fei Huang, Alicia Tribble, Ashish Venugopal, Bing Zhao, Alex Waibel, "The CMU Statistical Translation System," to appear in the Proceedings of MT Summit IX, New Orleans, LA, U.S.A., September 2003.
  Ying Zhang, Ralf D. Brown, Robert E. Frederking and Alon Lavie, "Pre-processing of Bilingual Corpora for Mandarin-English EBMT," Proceedings of MT Summit VIII, Santiago de Compostela, Spain, September 2001.
  Ying Zhang, Stephan Vogel, Alex Waibel, "Integrated Phrase Segmentation and Alignment Algorithm for Statistical Machine Translation," in the Proceedings of International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE'03), Beijing, China, October 2003.
