Slide 1: Machine Translation: Factored Models
Stephan Vogel, Spring Semester 2011

Slide 2: Overview
- Factored Language Models
- Multi-Stream Word Alignment
- Factored Translation Models

Slide 3: Motivation
- Vocabulary grows dramatically for morphology-rich languages
- Looking at the surface word form does not take connections (morphological derivations) between words into account
  - Example: 'book' and 'books' are as unrelated as 'book' and 'sky'
- Dependencies between words within a sentence are not well detected
  - Example: number or gender agreement
    Singular: der alte Tisch (the old table)
    Plural: die alten Tische (the old tables)
- Consider a word as a bundle of factors: surface word form, stem, root, prefix, suffix, POS, gender marker, case marker, number marker, …

Slide 4: Two Solutions
- Morphological decomposition into a stream of morphemes
  - Compound noun splitting
  - Prefix-stem-suffix splitting
- Words as bundles of (parallel) factors
  [Figure: words w1 w2 w3 w4 … each annotated with parallel factor streams: word, lemma, POS, morphology, word class, prefix/stem/suffix]
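The "bundle of factors" view can be made concrete with a small sketch. This is hypothetical illustration code, not from the slides; the class and factor names are my own choices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactoredWord:
    """A token as a bundle of parallel factors (surface, lemma, POS, morphology)."""
    surface: str
    lemma: str
    pos: str
    morph: str

# The German plural noun phrase "die alten Tische" as factored tokens
sentence = [
    FactoredWord("die", "die", "ART", "plural"),
    FactoredWord("alten", "alt", "ADJ", "plural"),
    FactoredWord("Tische", "Tisch", "NN", "plural"),
]

# A model can now back off from surface forms to coarser factor streams:
lemmas = [w.lemma for w in sentence]
print(lemmas)  # ['die', 'alt', 'Tisch']
```

Any component (LM, alignment model, translation model) can then read whichever stream it needs instead of only the surface forms.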

Slide 5: Questions
- Which information is the most useful?
- How to use this information?
  - In the language model
  - In the translation model
- How to use it at training time?
- How to use it at decoding time?

Slide 6: Factored Models
- Morphological preprocessing: a significant body of work
- Factored language models: Kirchhoff et al.
- Hierarchical lexicon: Niessen et al.
- Bi-stream alignment: Zhao et al.
- Factored translation models: Koehn et al.

Slide 7: Factored Language Model
Some papers:
- Bilmes and Kirchhoff, 2003: Factored Language Models and Generalized Parallel Backoff
- Duh and Kirchhoff, 2004: Automatic Learning of Language Model Structure
- Kirchhoff and Yang, 2005: Improved Language Modeling for Statistical Machine Translation

Slide 8: Factored Language Model
- Representation
- LM probability
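The two equations on this slide were images and did not survive transcription. In Bilmes and Kirchhoff (2003) they have roughly this shape (a reconstruction in my notation, not necessarily the slide's):

```latex
% Representation: a word is a bundle of K factors
w_t \equiv \{ f_t^1, f_t^2, \ldots, f_t^K \}

% n-gram LM probability over factor bundles
p(w_t \mid w_{t-1}, \ldots, w_{t-n+1})
  \;=\; p\bigl(f_t^{1:K} \mid f_{t-1}^{1:K}, \ldots, f_{t-n+1}^{1:K}\bigr)
```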

Slide 9: Language Model Backoff
- Smoothing by backing off
- Backoff paths
  [Figure: backoff paths in a standard LM vs. in a factored LM]

Slide 10: Choosing Backoff Paths
- Different possibilities:
  - Fixed path
  - Choose path dynamically during training
  - Choose multiple paths dynamically during training and combine results (Generalized Parallel Backoff)
- Many paths -> optimization problem
  - Duh and Kirchhoff (2004) use a genetic algorithm
- Bilmes and Kirchhoff (2003) report LM perplexities
- Kirchhoff and Yang (2005) use an FLM to rescore n-best lists generated by an SMT system
  - 3-gram FLM slightly worse than standard 4-gram LM
  - Combined LM does not outperform standard 4-gram LM
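A minimal sketch of the idea behind generalized parallel backoff (hypothetical Python with made-up, unsmoothed toy counts): when the full factor context is unseen, drop one factor at a time and combine the resulting backed-off estimates, here by taking the maximum.

```python
from collections import Counter

# Toy counts of word given a two-factor context (lemma, POS of previous word).
# In a real FLM these come from a training corpus; here they are made up.
counts = Counter({
    ("house", ("haus", "NN")): 3,
    ("home", ("haus", "NN")): 1,
    ("house", ("haus",)): 4,   # context with the POS factor dropped
    ("house", ("NN",)): 5,     # context with the lemma factor dropped
})
context_totals = Counter({
    ("haus", "NN"): 4,
    ("haus",): 6,
    ("NN",): 10,
})

def p(word, context):
    """Maximum-likelihood estimate for one (possibly reduced) context."""
    if context_totals[context] == 0:
        return 0.0
    return counts[(word, context)] / context_totals[context]

def parallel_backoff(word, context):
    """If the full context was seen, use it; otherwise drop each factor
    in turn and combine the backed-off estimates (max-combination)."""
    if context_totals[context] > 0:
        return p(word, context)
    reduced = [context[:i] + context[i + 1:] for i in range(len(context))]
    return max(p(word, c) for c in reduced)

print(parallel_backoff("house", ("haus", "NN")))  # full context seen: 3/4
print(parallel_backoff("house", ("haus", "VB")))  # unseen: backs off to p(house | haus) = 4/6
```

Real FLMs smooth each node (e.g. Kneser-Ney discounting) and support other combination functions (mean, weighted sum) besides the max used here.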

Slide 11: Hierarchical Lexicon
- Morphological analysis
  - Using GERCG, a constraint grammar parser for German, for lexical analysis and morphological and syntactic disambiguation
- Build equivalence classes
  - Group words which tend to translate into the same target word
  - Don't distinguish what does not need to be distinguished!
  - E.g. for nouns: gender is irrelevant, as are nominative, dative, and accusative; but genitive translates differently

Sonja Nießen and Hermann Ney, "Toward hierarchical models for statistical machine translation of inflected languages", Proceedings of the Workshop on Data-Driven Methods in Machine Translation, 2001.

Slide 12: Hierarchical Lexicon
- Equivalence classes at different levels of abstraction
- Example: ankommen
  - Level n is the full analysis
  - Level n-1: drop the person distinction -> group "ankomme", "ankommst", "ankommt"
  - Level n-2: drop the singular/plural distinction
  - …

Slide 13: Hierarchical Lexicon
- Translation probability: probability taking all factors up to level i into account
- Assumption: does not depend on e, and the word form follows unambiguously from the tags
- Linear combination of the p_i
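The formulas on this slide were images; a plausible reconstruction in the spirit of Nießen and Ney (2001), with notation of my own choosing:

```latex
% Translation probability at abstraction level i, using tags t_1, \ldots, t_i
p_i(f \mid e) \;=\; p(t_1, \ldots, t_i \mid e)\; p(f \mid t_1, \ldots, t_i)

% Assumption: the word form follows unambiguously from the tags,
% i.e. p(f \mid t_1, \ldots, t_i, e) = p(f \mid t_1, \ldots, t_i)

% Final lexicon model: linear combination over abstraction levels
p(f \mid e) \;=\; \sum_i \lambda_i \, p_i(f \mid e)
```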

Slide 14: Multi-Stream Word Alignment
- Use multiple annotations: stem, POS, …
- Consider each annotation as an additional stream or tier
- Use generative alignment models
  - Model each stream
  - But tie the streams through the alignment
- Example: Bi-Stream HMM word alignment (Zhao et al., 2005)

Slide 15: Bi-Stream HMM Alignment
- HMM
- Relative word position as distortion component (can be conditioned on word classes)
- Forward-backward algorithm for training
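The HMM formula on this slide was an image; the standard HMM alignment model it refers to (Vogel et al., 1996) is:

```latex
p(f_1^J \mid e_1^I) \;=\; \sum_{a_1^J} \prod_{j=1}^{J}
    p(a_j \mid a_{j-1}, I)\; p(f_j \mid e_{a_j})
```

Here $a_j$ is the position of the target word that generates source word $f_j$, and the transition $p(a_j \mid a_{j-1}, I)$ is the relative-position distortion component.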

Slide 16: Bi-Stream HMM Alignment
- Bi-Stream HMM: assume the hidden alignment generates two data streams: words and word class labels
  [Figure: source and target sentences shown as two parallel streams, words (stream 1) and word classes (stream 2), tied by a single alignment]
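The bi-stream equation was also an image. A reconstruction of the factorization described in the text, where one hidden alignment emits both streams (my notation: $c_j$ and $d_i$ are the source- and target-side class labels):

```latex
p(f_1^J, c_1^J \mid e_1^I, d_1^I) \;=\; \sum_{a_1^J} \prod_{j=1}^{J}
    p(a_j \mid a_{j-1}, I)\; p(f_j \mid e_{a_j})\; p(c_j \mid d_{a_j})
```

The two emission terms are trained jointly with forward-backward, but they share the same alignment variables, which is what ties the streams together.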

Slide 17: Second Stream: Bilingual Word Clusters
- Ideally, we want word classes which group translations of words in a source-language cluster into a cluster on the target side
- Bilingual word clusters (Och, 1999)
  - Fix the monolingual clusters first
  - Optimize the clusters for the other language (mkcls in GIZA++)
- Bilingual word spectral clusters
  - Eigen-structure analysis
  - K-means or single-linkage clustering
- Other word clusters
  - LDA (Blei, 2000)
  - Co-clusters, etc.

Slide 18: Bi-Stream HMM with Word Clusters
- Evaluating word alignment accuracy: F-measure
- Bi-stream HMM (Bi-HMM) is better than HMM
- Bilingual word-spectral clusters are better than traditional ones
- Helps more for small training data
  [Figure: F-measure comparison of HMM vs. Bi-HMM with traditional and spectral clusters, on TreeBank and FBIS data, in both F2E and E2F directions]

Slide 19: Factored Translation Models
Paper: Koehn and Hoang, "Factored Translation Models", EMNLP 2007
- Factored translation model as an extension of phrase-based SMT
- Interesting for translating into or between morphology-rich languages
- Experiments for English-German, English-Spanish, English-Czech
(The following slides follow that paper; the description on the Moses web site is nearly identical.)

Slide 20: Factored Model
- Analysis as preprocessing
- Need to specify the transfer
- Need to specify the generation
  [Figure: factored representation and factored model; input and output words are each a bundle of factors (word, lemma, POS, morphology, word class, …), connected by transfer and generation mappings]

Slide 21: Transfer
- Mapping individual factors:
  - As we do with non-factored models
  - Example: Haus -> house, home, building, shell
- Mapping combinations of factors:
  - New vocabulary as the Cartesian product of the vocabularies of the individual factors, e.g. NN and singular -> NN|singular
  - Map these combinations
  - Example: NN|plural|nominative -> NN|plural, NN|singular
- The number of factors on the source and target side can differ
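The Cartesian-product construction and a factor-combination mapping can be sketched in a few lines (hypothetical Python; real transfer tables are learned from word-aligned data rather than written by hand):

```python
from itertools import product

# Combined vocabulary as the Cartesian product of two factor vocabularies
pos_tags = ["NN", "VB"]
numbers = ["singular", "plural"]
combined_vocab = ["|".join(pair) for pair in product(pos_tags, numbers)]
print(combined_vocab)  # ['NN|singular', 'NN|plural', 'VB|singular', 'VB|plural']

# A transfer table maps source factor combinations to target ones, e.g.
# German nominative plural NN can map to English plural or singular NN:
transfer = {
    "NN|plural|nominative": ["NN|plural", "NN|singular"],
}
print(transfer["NN|plural|nominative"])
```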

Slide 22: Generation
- Generate the surface form from the factors
- Examples:
  house|NN|plural -> houses
  house|NN|singular -> house
  house|VB|present|3rd-person -> houses

Slide 23: Example Including All Steps
- German word: Häuser
- Analysis:
  häuser|haus|NN|plural|nominative|neutral
- Translation
  - Mapping lemma: { ?|house|?|?|?, ?|home|?|?|?, ?|building|?|?|? }
  - Mapping morphology: { ?|house|NN|plural, ?|house|NN|singular, ?|home|NN|plural, ?|building|NN|plural }
- Generation
  - Generating surface forms: { houses|house|NN|plural, house|house|NN|singular, homes|home|NN|plural, buildings|building|NN|plural }
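The three steps above can be sketched as simple table lookups (hypothetical Python; the tables are hand-filled for the Häuser example, whereas in a real system they are estimated from annotated parallel and monolingual data):

```python
# Step 1: analysis (surface -> lemma, POS, number), here precomputed
analysis = {"Häuser": ("haus", "NN", "plural")}

# Step 2: transfer, split into a lemma mapping and a morphology mapping
lemma_table = {"haus": ["house", "home", "building"]}
morph_table = {("NN", "plural"): [("NN", "plural"), ("NN", "singular")]}

# Step 3: generation (lemma + POS + number -> surface form)
generation = {
    ("house", "NN", "plural"): "houses",
    ("house", "NN", "singular"): "house",
    ("home", "NN", "plural"): "homes",
    ("building", "NN", "plural"): "buildings",
}

def translate(word):
    lemma, pos, num = analysis[word]
    outputs = []
    for tgt_lemma in lemma_table[lemma]:
        for tgt_pos, tgt_num in morph_table[(pos, num)]:
            surface = generation.get((tgt_lemma, tgt_pos, tgt_num))
            if surface is not None:  # generation can rule out a combination
                outputs.append(surface)
    return outputs

print(translate("Häuser"))  # ['houses', 'house', 'homes', 'buildings']
```

Note how generation prunes combinations that have no surface form (e.g. there is no singular entry for "home" or "building" in this toy table), mirroring how the expanded options are scored and pruned in a real decoder.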

Slide 24: Training the Model
- Parallel data needs to be annotated -> preprocessing
  - Source and target side annotation is typically independent of each other
  - Some work on 'coupled' annotation, e.g. inducing word classes through clustering with mkcls, or morphological analysis of Arabic conditioned on the English side (Linh)
- Word alignment
  - Operate on the surface form only
  - Use multi-stream alignment (example: Bi-Stream HMM)
  - Use discriminative alignment (example: CRF approach)
  - Estimate translation probabilities: collect counts for factors or combinations of factors
- Phrase alignment
  - Extract from word alignment using standard heuristics
  - Estimate various scoring functions

Slide 25: Training the Model
- Word alignment (symmetrized)

Slide 26: Training the Model
- Extract phrase: natürlich hat john # naturally john has

Slide 27: Training the Model
- Extract phrase for other factors: ADV V NNP # ADV NNP V

Slide 28: Training the Generation Steps
- Train on the target side of the corpus
  - Can use additional monolingual data
- Map factor(s) to factor(s), e.g. word -> POS and POS -> word
- Example: The/DET big/ADJ tree/NN
  - Count collection:
    count( the, DET )++
    count( big, ADJ )++
    count( tree, NN )++
  - Probability distributions (maximum likelihood estimates):
    p( the | DET ) and p( DET | the )
    p( big | ADJ ) and p( ADJ | big )
    p( tree | NN ) and p( NN | tree )
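The count collection above can be written out directly (a minimal Python sketch, assuming the word/TAG corpus format shown on the slide):

```python
from collections import Counter

# Target-side corpus annotated with POS tags, as on the slide
corpus = "The/DET big/ADJ tree/NN"

pair_counts = Counter()   # count(word, tag)
word_counts = Counter()   # count(word)
tag_counts = Counter()    # count(tag)

for token in corpus.split():
    word, tag = token.rsplit("/", 1)
    word = word.lower()
    pair_counts[(word, tag)] += 1
    word_counts[word] += 1
    tag_counts[tag] += 1

def p_word_given_tag(word, tag):
    """Maximum likelihood estimate p(word | tag)."""
    return pair_counts[(word, tag)] / tag_counts[tag]

def p_tag_given_word(tag, word):
    """Maximum likelihood estimate p(tag | word)."""
    return pair_counts[(word, tag)] / word_counts[word]

print(p_word_given_tag("the", "DET"))  # 1.0 on this one-sentence corpus
print(p_tag_given_word("NN", "tree"))  # 1.0
```

Because this needs only the target side, the generation distributions can be sharpened with any amount of additional monolingual tagged text.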

Slide 29: Combination of Components
- Log-linear combination of feature functions
- Sentence translation generated from a set of phrase pairs
- Translation component: feature functions defined over phrase pairs
- Generation component: feature functions defined over output words
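The formulas here were images; in Koehn and Hoang (2007) they have roughly this shape (a reconstruction in my notation):

```latex
% Log-linear combination of feature functions h_i with weights \lambda_i
p(\mathbf{e} \mid \mathbf{f}) \;\propto\; \exp \sum_i \lambda_i \, h_i(\mathbf{e}, \mathbf{f})

% Translation features decompose over the phrase pairs (\bar{f}_k, \bar{e}_k)
h_T(\mathbf{e}, \mathbf{f}) \;=\; \sum_k \tau(\bar{f}_k, \bar{e}_k)

% Generation features decompose over the output words e_j
h_G(\mathbf{e}, \mathbf{f}) \;=\; \sum_j \gamma(e_j)
```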

Slide 30: Decoding with Factored Models
- Instead of just one phrase table, now multiple tables
- Important: all mappings operate on the same segmentation of the source sentence into phrases
- More target translations are now possible
- Example: … beat … can be a verb or a noun
  - Translations: beat # schlag (NN or VB), schlagen (VB), Rhythmus (NN)
  - Non-factored: schlag, schlagen, Rhythmus
  - Factored: schlag|NN|Nom, schlag|NN|Dat, schlag|NN|Akk, schlag|VB|1-person|singular

Slide 31: Decoding with Factored Models
- Combinatorial explosion -> harsher pruning needed
- Notice: translation step features and generation step features depend only on the phrase pair
  - Alternative translations can be generated and inserted into the translation lattice before the best-path search begins (building a fully expanded phrase table?)
  - Features can be calculated and used for translation model pruning (observation pruning)
- Pruning in the Moses decoder
  - Non-factored model: default is 20 alternatives
  - Factored model: default is 50 alternatives
- Increase in decoding time: factor 2-3

Slide 32: Factored LMs in Moses
- The training script allows specifying multiple LMs on different factors, with individual orders (history lengths)
- Example:
  --lm 0:3:factored-corpus/surface.lm // surface form 3-gram LM
  --lm 2:3:factored-corpus/pos.lm // POS 3-gram LM
- This generates different LMs on the different factors, not a factored LM
- The different LMs are used as independent features in the decoder
- No backing off between different factors

Slide 33: Summary
- Factored models:
  - Deal with the large vocabulary of morphology-rich languages
  - 'Connect' words, thereby getting better model estimates
  - Explicitly model morphological dependencies within sentences
- Factored models are not always called factored models
  - Hierarchical model (lexicon)
  - Multi-stream model (alignment)
- Factored LMs were introduced for ASR
  - Many backoff paths
- Moses decoder
  - Allows factored TMs and factored LMs
  - But no backing off between factors, only log-linear combination