(Statistical) Approaches to Word Alignment


(Statistical) Approaches to Word Alignment 11-734 Advanced Machine Translation Seminar Sanjika Hewavitharana Language Technologies Institute Carnegie Mellon University 02/02/2006

Word Alignment Models
- We want to learn how to translate words and phrases, and we can learn this from parallel corpora
- We typically work with sentence-aligned corpora (available from LDC, etc.); for specific applications new data collection is required
- We model the associations between the two languages:
  - word-to-word mapping -> lexicon
  - differences in word order -> distortion model
  - 'wordiness', i.e. how many words are needed to express a concept -> fertility
- Statistical translation is based on word alignment models

Alignment Example
Observations (on a typical example alignment; the figure is not reproduced here):
- often 1-1
- often monotone
- some 1-to-many
- some 1-to-nothing

Word Alignment Models
- IBM1 – lexical probabilities only
- IBM2 – lexicon plus absolute position
- IBM3 – plus fertilities
- IBM4 – inverted relative position alignment
- IBM5 – non-deficient version of Model 4
- HMM – lexicon plus relative position
- BiBr – Bilingual Bracketing: lexical probabilities plus reordering via parallel segmentation
- Syntactic alignment models
[Brown et al. 1993, Vogel et al. 1996, Och et al. 1999, Wu 1997, Yamada et al. 2003]

Notation
Source language:
- f : source (French) word
- J : length of the source sentence
- j : position in the source sentence; j = 1, 2, ..., J
- f_1^J = f_1 ... f_J : source sentence
Target language:
- e : target (English) word
- I : length of the target sentence
- i : position in the target sentence; i = 1, 2, ..., I
- e_1^I = e_1 ... e_I : target sentence

SMT - Principle
- Translate a 'French' string f into an 'English' string e
- Bayes' decision rule for translation, based on the noisy-channel model (see the equation below)
- We will call f the source and e the target
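The decision rule the slide refers to is the standard noisy-channel formulation; it is reconstructed here in the usual notation, since the original slide rendered it as an image:

```latex
% Noisy-channel decision rule (standard form, reconstructed)
\hat{e} \;=\; \operatorname*{arg\,max}_{e} \; p(e \mid f)
        \;=\; \operatorname*{arg\,max}_{e} \; p(e)\, p(f \mid e)
```

Here p(e) is the language model and p(f | e) is the translation model.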

Alignment as Hidden Variable
- 'Hidden alignments' capture the word-to-word correspondences
- Number of possible connections: J * I (each source word with each target word)
- Number of possible alignments: 2^(J*I)
- Restricted alignment: each source word has exactly one connection, i.e. a function a_j = i giving the position i of the target word e_i connected to source position j
- The number of alignments is now I^J; a_1^J denotes the whole alignment
- Relationship between the translation model and the alignment model: marginalize over the hidden alignment (see below)
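The relationship between the translation model and the alignment model is the usual marginalization over the hidden alignment variable, reconstructed here because the slide showed it as an image:

```latex
% Translation model as a sum over hidden alignments (reconstruction)
p(f_1^J \mid e_1^I) \;=\; \sum_{a_1^J} p(f_1^J, a_1^J \mid e_1^I)
```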

Empty Position (Null Word)
- Sometimes a source word has no correspondence in the target sentence
- The alignment function aligns each source word to one target word, i.e. it cannot skip a source word
- Solution: introduce the empty position 0 with the null word e_0
- 'Skip' a source word f_j by aligning it to e_0
- The target sentence is extended to e_0^I = e_0 e_1 ... e_I
- The alignment is extended so that a_j ∈ {0, 1, ..., I}

Translation Model
- Sum over all possible alignments: p(f_1^J | e_1^I) = Σ_{a_1^J} p(f_1^J, a_1^J | e_1^I)
- The joint probability is built from 3 probability distributions (see the decomposition below):
- Length probability
- Alignment probability
- Lexicon probability
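Written out in full, the chain-rule decomposition into length, alignment, and lexicon distributions (before any independence assumptions) follows Brown et al. (1993) / Och & Ney (2003); the slide showed the three distributions as images, so this is a reconstruction:

```latex
% Chain-rule decomposition into length, alignment and lexicon terms (reconstruction)
p(f_1^J, a_1^J \mid e_1^I)
  \;=\; p(J \mid e_1^I)
  \prod_{j=1}^{J} p(a_j \mid a_1^{j-1}, f_1^{j-1}, J, e_1^I)\;
                  p(f_j \mid a_1^{j}, f_1^{j-1}, J, e_1^I)
```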

Model Assumptions
- Decompose the interaction into pairwise dependencies:
- Length: the source length depends only on the target length, p(J | I) (a very weak model)
- Alignment:
  - zero-order model: the target position depends only on the source position, p(a_j | j, I, J)
  - first-order model: the target position depends only on the previous target position, p(a_j | a_{j-1}, I)
- Lexicon: the source word depends only on the aligned target word, p(f_j | e_{a_j})

IBM Model 1
- Length: the source length depends only on the target length, p(J | I)
- Alignment: uniform probability for the position alignment, p(a_j | j, I, J) = 1 / (I + 1)
- Lexicon: the source word depends only on the aligned target word, t(f_j | e_{a_j})
- Alignment probability: p(f_1^J, a_1^J | e_1^I) = p(J | I) / (I + 1)^J * Π_j t(f_j | e_{a_j})

IBM Model 1 – Generative Process
To generate a French string f from an English string e:
- Step 1: Pick the length J of f; all lengths are equally probable, so p(J | I) = ε is a constant
- Step 2: Pick an alignment a_1^J with probability 1 / (I + 1)^J
- Step 3: Pick the French words with probability t(f_j | e_{a_j})
- Final result: see the equation below
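Combining the three steps and summing over alignments gives the Model 1 sentence probability in the usual form of Brown et al. (1993) / Knight (1999), reconstructed here since the slide showed it as an image:

```latex
% IBM Model 1 sentence probability (reconstruction)
p(f_1^J \mid e_1^I)
  \;=\; \frac{\epsilon}{(I+1)^{J}} \sum_{a_1^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j})
  \;=\; \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j \mid e_i)
```

The second equality, which swaps the sum over alignments with the product over source positions, is what makes exact EM over all alignments tractable for Model 1.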

IBM Model 1 – Training
- Parameters of the model: the lexicon probabilities t(f | e)
- Training data: parallel sentence pairs
- We adjust the parameters so that they maximize the likelihood of the training data
- Normalized for each target word e: Σ_f t(f | e) = 1
- The EM algorithm is used for the estimation:
  - initialize the parameters uniformly
  - collect counts for each word pair in the corpus
  - re-estimate the parameters using the counts
  - repeat for several iterations
- The model is simple enough to compute over all alignments exactly
- The estimated parameters do not depend on their initial values

IBM Model 1 Training – Pseudo Code

# Accumulation (over corpus)
For each sentence pair
  For each source position j
    Sum = 0.0
    For each target position i
      Sum += p(fj|ei)
    For each target position i
      Count(fj,ei) += p(fj|ei) / Sum

# Re-estimate probabilities (over count table)
For each target word e
  Sum = 0.0
  For each source word f
    Sum += Count(f,e)
  For each source word f
    p(f|e) = Count(f,e) / Sum

# Repeat accumulation and re-estimation for several iterations
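As a concrete companion to the pseudo-code, here is a minimal, self-contained Python sketch of Model 1 EM training. It is an illustration, not code from the slides; the toy corpus, the NULL token, and the function name train_model1 are assumptions made for the example:

```python
# Minimal IBM Model 1 EM sketch (illustrative, not from the slides): trains
# lexicon probabilities t(f|e) on a toy parallel corpus.
from collections import defaultdict

NULL = "<null>"  # empty position e0

def train_model1(corpus, iterations=10):
    # corpus: list of (source_sentence, target_sentence) pairs, each a list of words
    src_vocab = {f for fs, _ in corpus for f in fs}
    # Initialize t(f|e) uniformly over the source vocabulary
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f,e)
        total = defaultdict(float)   # normalizers per target word e
        # E-step: accumulate expected counts over the corpus
        for fs, es in corpus:
            es = [NULL] + es  # allow alignment to the empty word
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalizer for this source word
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f|e) from the counts
        t = defaultdict(float,
                        {(f, e): count[(f, e)] / total[e] for (f, e) in count})
    return t

if __name__ == "__main__":
    corpus = [(["das", "haus"], ["the", "house"]),
              (["das", "buch"], ["the", "book"]),
              (["ein", "buch"], ["a", "book"])]
    t = train_model1(corpus, iterations=20)
    print(round(t[("das", "the")], 3), round(t[("haus", "house")], 3))
```

Because Model 1 factorizes over source positions, the E-step only needs the per-source-word normalizer z rather than an explicit enumeration of alignments.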

IBM Model 2
- The only difference from Model 1 is in the alignment probability
- Length: the source length depends only on the target length, p(J | I)
- Alignment: the target position depends on the source position (in addition to the source and target lengths), a(i | j, J, I)
- Model 1 is a special case of Model 2 in which a(i | j, J, I) = 1 / (I + 1)
- Lexicon: the source word depends only on the aligned target word, t(f_j | e_{a_j})

IBM Model 2 – Generative Process
To generate a French string f from an English string e:
- Step 1: Pick the length J of f; all lengths are equally probable, so p(J | I) = ε is a constant
- Step 2: Pick an alignment a_1^J with probability Π_j a(a_j | j, J, I)
- Step 3: Pick the French words with probability t(f_j | e_{a_j})
- Final result: see the equation below
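Putting these steps together gives the Model 2 probability of a French string and alignment, reconstructed in the standard notation (the slide showed it as an image):

```latex
% IBM Model 2 string/alignment probability (reconstruction)
p(f_1^J, a_1^J \mid e_1^I) \;=\; \epsilon \prod_{j=1}^{J} a(a_j \mid j, J, I)\; t(f_j \mid e_{a_j})
```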

IBM Model 2 – Training
- Parameters of the model: the translation probabilities t(f | e) and the alignment probabilities a(i | j, J, I)
- Training data: parallel sentence pairs
- We maximize the likelihood of the training data w.r.t. the translation and alignment parameters
- The EM algorithm is used for the estimation:
  - initialize the alignment parameters uniformly and the translation probabilities from Model 1
  - accumulate counts, re-estimate the parameters
- The model is simple enough to compute over all alignments exactly

Fertility-based Alignment Models
- Models 3-5 are based on fertility
- Fertility: the number of source words connected with a target word
- φ_i : the fertility value of e_i
- n(φ | e) : the probability that e is connected with φ source words
- Alignment: defined in the reverse direction (target to source)
- d(j | i, I, J) : the probability of French position j given that the English position is i

IBM Model 3 – Generative Process
To generate a French string f from an English string e:
- Step 1: Choose the (I+1) fertilities φ_0, φ_1, ..., φ_I; each φ_i (i ≥ 1) is chosen with probability n(φ_i | e_i), and the fertility φ_0 of the null word with its own distribution

IBM Model 3 – Generative Process (cont.)
- Step 2: For each e_i, for k = 1 ... φ_i, choose a position j in 1...J with probability d(j | i, I, J) and a French word with probability t(f_j | e_i)
- For a given alignment there are Π_i φ_i! orderings

IBM Model 3 – Example [Knight 99]
- [e]: e0 Mary did not slap the green witch
- [choose fertility]: 1 0 1 3 1 1 1 (with fertility 1 for e0): Mary not slap slap slap the green witch
- [insert null word]: Mary not slap slap slap NULL the green witch
- [choose translation]: Mary no daba una bofetada a la verde bruja
- [choose target positions j]: Mary no daba una bofetada a la bruja verde
- positions j = 1 2 3 4 5 6 7 8 9, alignment a_j = 1 3 4 4 4 0 5 7 6

IBM Model 3 – Training
- Parameters of the model: translation probabilities t, fertility probabilities n, distortion probabilities d, and the null-word probabilities p0, p1
- The EM algorithm is used for the estimation, but it is not possible to compute exact EM updates
- Initialize n, d, p uniformly, and the translation probabilities from Model 2
- Accumulate counts, re-estimate the parameters
- We cannot efficiently compute over all alignments; only the Viterbi alignment is used
- Model 3 is deficient: probability mass is wasted on impossible translations

IBM Model 4
- Tries to model the re-ordering of phrases
- The distortion distribution d(j | i, I, J) is replaced with two sets of parameters:
  - one for placing the first word (head) of a group of words
  - one for placing the rest of the words relative to the head
- Model 4 is also deficient: the alignment can generate source positions outside of the sentence length J
- Model 5 removes this deficiency

HMM Alignment Model
- Idea: a relative position model [Vogel 96]
- (The slide shows an alignment path plotted over target vs. source positions; figure not reproduced)

HMM Alignment
- First-order model: the target position depends on the previous target position (this captures the movement of entire phrases)
- Alignment probability: p(a_j | a_{j-1}, I)
- The alignment depends only on the relative position (the jump width a_j - a_{j-1})
- Maximum approximation: replace the sum over alignments by the best alignment (see below)
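In the notation of Vogel et al. (1996), the HMM alignment model and its maximum approximation can be written as follows (reconstructed; the slide showed the formulas as images):

```latex
% HMM alignment model and maximum (Viterbi) approximation (reconstruction)
p(f_1^J \mid e_1^I)
  \;=\; \sum_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I)\; p(f_j \mid e_{a_j})
  \;\approx\; \max_{a_1^J} \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I)\; p(f_j \mid e_{a_j})
```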

IBM2 vs HMM [Vogel 96]
(Comparison figure from the slide not reproduced)

Enhancements to HMM & IBM Models
- HMM model with empty word: add I empty words to the target side
- Model 6:
  - IBM 4 predicts the distance between subsequent source positions
  - the HMM predicts the distance between subsequent target positions
  - Model 6 is a log-linear combination of the IBM 4 and HMM models
- Smoothing:
  - alignment probabilities: interpolate with a uniform distribution
  - fertility probabilities: depend on the number of letters in a word
- Symmetrization: heuristic postprocessing to combine alignments trained in both directions (see the sketch below)
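Symmetrization is usually done by treating the two directional alignments as sets of links. Here is a minimal illustrative sketch (not the presenter's method) that computes their intersection and union, the usual starting points for heuristics such as grow-diag-final; the function name symmetrize and the toy link sets are assumptions for the example:

```python
# Minimal symmetrization sketch (illustrative, not from the slides).
# Directional alignments are sets of position pairs; the target-to-source
# alignment is flipped so both use the same (source_pos, target_pos) convention.

def symmetrize(src2tgt, tgt2src):
    """Return (intersection, union) of two directional word alignments.

    src2tgt: set of (j, i) links from source->target training
    tgt2src: set of (i, j) links from target->source training
    """
    flipped = {(j, i) for (i, j) in tgt2src}   # put both alignments in (j, i) form
    intersection = src2tgt & flipped           # high-precision links
    union = src2tgt | flipped                  # high-recall links
    return intersection, union

if __name__ == "__main__":
    a_sd = {(0, 0), (1, 1), (2, 1)}            # toy source->target links
    a_ds = {(0, 0), (1, 1), (3, 2)}            # toy target->source links (i, j)
    inter, uni = symmetrize(a_sd, a_ds)
    print(sorted(inter), sorted(uni))
```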

Experimental Results [Och & Ney 03]
- Refined models perform better: Models 4, 5, 6 are better than Model 1 or the Dice coefficient model; the HMM is better than IBM 2
- Alignment quality depends on the training method and the bootstrapping scheme used: IBM 1 -> HMM -> IBM 3 is better than IBM 1 -> IBM 2 -> IBM 3
- Smoothing and symmetrization have a significant effect on alignment quality
- Using more alignments in training yields better results
- Using word classes: improvement for large corpora but not for small corpora

References:
- Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer (1993). The Mathematics of Statistical Machine Translation. Computational Linguistics, vol. 19, no. 2.
- Stephan Vogel, Hermann Ney, Christoph Tillmann (1996). HMM-based Word Alignment in Statistical Translation. COLING, The 16th Int. Conf. on Computational Linguistics, Copenhagen, Denmark, August, pp. 836-841.
- Franz Josef Och, Hermann Ney (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, vol. 29, no. 1, pp. 19-51.
- Kevin Knight (1999). A Statistical MT Tutorial Workbook. Available at http://www.isi.edu/natural-language/mt/wkbk.rtf.