C SC 620 Advanced Topics in Natural Language Processing Lecture 24 4/22

Reading List
– Readings in Machine Translation, Eds. Nirenburg, S. et al. MIT Press
  – 19. Montague Grammar and Machine Translation. Landsbergen, J.
  – 20. Dialogue Translation vs. Text Translation – Interpretation Based Approach. Tsujii, J.-I. and M. Nagao
  – 21. Translation by Structural Correspondences. Kaplan, R. et al.
  – 22. Pros and Cons of the Pivot and Transfer Approaches in Multilingual Machine Translation. Boitet, C.
  – 31. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. Nagao, M.
  – 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
– Time: early 1990s
– Emergence of the statistical approach to MT and to language modelling in general
  – statistical learning methods for context-free grammars: the inside-outside algorithm
– Like the popular Example-Based Machine Translation (EBMT) framework discussed last time, this approach avoids the explicit construction of linguistically sophisticated models of grammar
– Why now, and not in the 1950s?
  – computers are 10^5 times faster
  – gigabytes of storage
  – large, machine-readable corpora readily available for parameter estimation
  – "it's our turn": symbolic methods have been tried for 40 years

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Machine Translation
– source sentence S, target sentence T
– every pair (S, T) has a probability
– P(T|S) = probability that the target is T given the source S
– Bayes' theorem: P(S|T) = P(S)P(T|S)/P(T)
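The noisy-channel view above can be sketched in code: for a fixed target T, P(T) is constant, so the best source maximizes P(S)·P(T|S). All probabilities below are made-up toy values for illustration, not estimates from the paper.

```python
# Noisy-channel decoding sketch: pick the source S maximizing P(S) * P(T|S).
# All probabilities here are made-up toy values, purely for illustration.

def best_source(candidates, lm, tm, target):
    """Return the candidate S maximizing P(S) * P(T|S); P(T) is constant."""
    return max(candidates, key=lambda s: lm[s] * tm.get((target, s), 0.0))

lm = {"John loves Mary": 1e-4,       # fluent English: high language-model score
      "Mary loves John": 1e-4,
      "loves John Mary": 1e-9}       # word salad: low language-model score
tm = {("Jean aime Marie", "John loves Mary"): 0.1,
      ("Jean aime Marie", "Mary loves John"): 0.01,
      ("Jean aime Marie", "loves John Mary"): 0.1}

print(best_source(lm, lm, tm, "Jean aime Marie"))  # "John loves Mary"
```

Note how the language model rescues word order here: "loves John Mary" gets the same translation-model score as "John loves Mary", but its far lower P(S) eliminates it.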

The Language Model: P(S)
– sequences of words: S = w1 … wn
– chain rule: P(S) = P(w1)P(w2|w1)…P(wn|w1…wn-1)
  – the product of the probability of each wi given its preceding context
  – problem: we need to know too many probabilities
– bigram approximation: limit the context
  – P(S) ≈ P(w1)P(w2|w1)…P(wn|wn-1)
  – bigrams of w1 w2 w3 w4 w5: w1w2, w2w3, w3w4, w4w5
– bigram probability estimation from corpora: P(wi|wi-1) ≈ freq(wi-1 wi)/freq(wi-1)
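The relative-frequency estimate and the bigram approximation above can be sketched in a few lines. This is a minimal sketch; the `<s>`/`</s>` boundary markers are an added convention, not part of the slide.

```python
from collections import Counter

def bigram_probs(corpus):
    """Relative-frequency estimates: P(wi|wi-1) ≈ freq(wi-1 wi) / freq(wi-1)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        words = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(words[:-1])                 # every context word
        bigrams.update(zip(words[:-1], words[1:]))  # adjacent word pairs
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

def sentence_prob(sent, p):
    """P(S) under the bigram approximation: product of P(wi|wi-1)."""
    words = ["<s>"] + sent.split() + ["</s>"]
    prob = 1.0
    for bg in zip(words[:-1], words[1:]):
        prob *= p.get(bg, 0.0)  # an unseen bigram zeroes P(S): hence smoothing
    return prob

p = bigram_probs(["the dog barks", "the dog runs", "a dog barks"])
print(p[("the", "dog")])                 # 1.0: "the" is always followed by "dog"
print(sentence_prob("the dog barks", p))
```

The `p.get(bg, 0.0)` line shows exactly why smoothing (next slide) is needed: one unseen bigram makes the whole sentence impossible.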

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
The Language Model: P(S)
– n-gram models have been used successfully in speech recognition
– could use trigrams; trigrams of w1 w2 w3 w4 w5: w1w2w3, w2w3w4, w3w4w5
– problem: need even more data for parameter estimation
  – sparse-data problem even with large corpora
  – handled using smoothing: interpolate for missing data, estimating trigram probabilities from bigram and unigram data
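The smoothing idea above can be sketched as linear interpolation of trigram, bigram, and unigram relative-frequency estimates. The λ weights below are illustrative placeholders, not values from the paper; in practice they are tuned on held-out data.

```python
from collections import Counter

def interpolated_trigram(w1, w2, w3, uni, bi, tri, n, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation: P(w3|w1,w2) ≈ l3*f(w3|w1,w2) + l2*f(w3|w2) + l1*f(w3).
    uni/bi/tri are n-gram frequency Counters; n is the total token count.
    The lambda weights are illustrative, not taken from the paper."""
    l3, l2, l1 = lambdas
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    p_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p_uni = uni[w3] / n if n else 0.0
    return l3 * p_tri + l2 * p_bi + l1 * p_uni

tokens = "the dog barks the dog runs".split()
uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))
tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
# Even when a trigram was never seen, the bigram and unigram terms keep the
# estimate non-zero, which is the point of interpolation.
print(interpolated_trigram("the", "dog", "barks", uni, bi, tri, len(tokens)))
```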

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
The Translation Model: P(T|S)
– alignment model: assume a transfer relationship between source and target words, not necessarily 1-to-1
– example: S = w1 w2 w3 w4 w5 w6 w7, T = u1 u2 u3 u4 u5 u6 u7 u8 u9
  – w4 → u3, u5: the fertility of w4 is 2
  – w5 → u9: distortion

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Alignment notation
– word positions are given in parentheses; a word with no position has no mapping
– example: ( Les propositions ne seront pas mises en application maintenant | The(1) proposal(2) will(4) not(3,5) now(9) be implemented(6,7,8) )
– this particular alignment is not correct; it is an artifact of their algorithm

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
How do we compute the probability of an alignment?
– need to estimate:
  – fertility probabilities: P(fertility = n | w) = probability that word w has fertility n
  – distortion probabilities: P(i | j, l) = probability that the target word is at position i, given a source word at position j and target length l
– example: (Le chien est battu par Jean | John(6) does beat(3,4) the(1) dog(2))
  – P(f=1|John)P(Jean|John) ×
  – P(f=0|does) ×
  – P(f=2|beat)P(est|beat)P(battu|beat) ×
  – P(f=1|the)P(Le|the) ×
  – P(f=1|dog)P(chien|dog) ×
  – P(f=1|⟨null⟩)P(par|⟨null⟩) ×
  – distortion probabilities…
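The product of fertility, translation, and distortion probabilities above can be sketched in code. All probability tables here are made-up toy values (mirroring the slide's P(f=1|the)P(Le|the)-style factors), not estimates from the paper.

```python
import math

def alignment_log_prob(alignment, fert, trans, dist, tgt_len):
    """Score one alignment: sum of log fertility, translation, and distortion
    probabilities. `alignment` lists, for each source word (at position j), the
    (target_word, target_position) pairs it generates. All tables are toy values."""
    logp = 0.0
    for j, (w, generated) in enumerate(alignment, start=1):
        logp += math.log(fert[(len(generated), w)])   # P(fertility | w)
        for u, i in generated:
            logp += math.log(trans[(u, w)])           # P(u | w)
            logp += math.log(dist[(i, j, tgt_len)])   # P(i | j, l)
    return logp

fert = {(1, "the"): 0.7, (1, "dog"): 0.8}
trans = {("Le", "the"): 0.5, ("chien", "dog"): 0.6}
dist = {(1, 1, 2): 0.9, (2, 2, 2): 0.9}
alignment = [("the", [("Le", 1)]), ("dog", [("chien", 2)])]
print(alignment_log_prob(alignment, fert, trans, dist, 2))
```

Working in log space avoids underflow, since a real sentence multiplies many small probabilities.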

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Not done yet
– given T, the translation problem is to find the S that maximizes P(S)P(T|S)
– we can't enumerate every possible S in the language
– idea (search): construct the best S incrementally
  – start with a highly likely word transfer and find a valid alignment, extending the candidate S at each step
  – (Jean aime Marie | * )
  – (Jean aime Marie | John(1) * )
– failure modes:
  – the best S is not a good translation: the language model or the translation model failed
  – the best S couldn't be found: search failure
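The incremental-search idea above can be sketched as greedy hypothesis extension. This is a toy illustration only: the paper's actual search keeps many partial hypotheses, not just one, and the dictionary and scoring function below are invented for the example.

```python
# Greedy sketch of incremental search: grow the candidate source S one word at
# a time, always keeping the single best extension. A real decoder keeps a
# stack/beam of partial hypotheses; this one-hypothesis version is a toy.

def greedy_decode(target, vocab, score, max_len):
    """Extend S word by word while the surrogate for P(S)*P(T|S) improves."""
    s = []
    for _ in range(max_len):
        best_word = max(vocab, key=lambda w: score(s + [w], target))
        if score(s + [best_word], target) <= score(s, target):
            break  # no extension improves the score: stop
        s.append(best_word)
    return s

# Toy score: reward covering distinct target words via a tiny dictionary,
# with a small length penalty standing in for the language model.
dictionary = {"John": "Jean", "loves": "aime", "Mary": "Marie"}

def score(s, target):
    covered = len({dictionary.get(w) for w in s} & set(target))
    return covered - 0.1 * len(s)

print(greedy_decode("Jean aime Marie".split(), list(dictionary), score, 5))
# ['John', 'loves', 'Mary']
```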

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Parameter Estimation
– English/French data from the Hansard corpus
  – 100 million words of bilingual Canadian parliamentary proceedings
  – the corpus is unaligned
– language model P(S): estimated from a bigram model
– translation model: how to estimate it from an unaligned corpus?
  – used the EM (Expectation-Maximization) algorithm, an iterative algorithm for re-estimating probabilities
– need:
  – P(u|w) for words u in T and w in S
  – P(n|w) for fertility n and words w in S
  – P(i|j,l) for target position i, source position j, and target length l
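The EM idea for estimating P(u|w) from unaligned sentence pairs can be sketched with IBM Model 1, a simplification of the paper's full model (no fertility, no distortion, no null word): the E-step collects expected word-pair counts under the current translation probabilities, and the M-step renormalizes them.

```python
from collections import defaultdict
from itertools import product

def ibm1_em(pairs, iterations=20):
    """IBM Model 1 EM sketch: estimate t(u|w) = P(target word u | source word w)
    from unaligned sentence pairs. A simplification of the paper's full model:
    no fertility, no distortion, no null word."""
    src_vocab = {w for s, _ in pairs for w in s.split()}
    tgt_vocab = {u for _, t in pairs for u in t.split()}
    # Start from uniform translation probabilities ("minimal assumptions").
    t = {(u, w): 1.0 / len(tgt_vocab) for u, w in product(tgt_vocab, src_vocab)}
    for _ in range(iterations):
        count = defaultdict(float)  # E-step: expected (u, w) co-translation counts
        total = defaultdict(float)
        for s, tgt in pairs:
            for u in tgt.split():
                z = sum(t[(u, w)] for w in s.split())  # normalize over alignments of u
                for w in s.split():
                    c = t[(u, w)] / z
                    count[(u, w)] += c
                    total[w] += c
        # M-step: renormalize expected counts into probabilities.
        t = {(u, w): count[(u, w)] / total[w] for (u, w) in count}
    return t

pairs = [("the dog", "le chien"), ("the house", "la maison"), ("a dog", "un chien")]
t = ibm1_em(pairs)
print(max({"le", "chien", "un"}, key=lambda u: t[(u, "dog")]))  # "chien"
```

Even though no sentence pair says which word translates which, the co-occurrence statistics pull t(chien|dog) up over the iterations: this is exactly how an unaligned corpus can train a translation model.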

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Experiment 1: Parameter Estimation for the Translation Model
– the 9,000 most common words for French and for English
– 40,000 sentence pairs
– 81,000,000 parameters
– initial guess: minimal assumptions

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Experiment 1: results
– (English) Hear, hear! → (French) Bravo!

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Experiment 2: Translation from French to English
– making the task manageable:
  – English lexicon: the 1,000 most frequent English words in the corpus
  – French lexicon: the 1,700 most frequent French words in translations completely covered by the selected English words
– 117,000 sentence pairs whose words are covered by the lexicons
– 17 million parameters estimated for the translation model
– bigram model of English: 570,000 sentences, 12 million words
– 73 test sentences
– categories: (exact, alternate, different), wrong, ungrammatical

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.

– 48% of the translations fell into the (exact, alternate, different) categories
– editing the system output into the Hansard translation required 776 keystrokes, versus 1,916 keystrokes to type the Hansard translation directly

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Plans
– only a small fraction of the available data was used: the parameter estimates can only get better
– many-to-one problem: only one-to-many is allowed in the current model, so it can't handle
  – to go → aller
  – will … be → seront
– no model of phrases, e.g. displacement of phrases

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
Plans
– trigram language model
  – perplexity = a measure of the degree of uncertainty in the language model with respect to a corpus
  – Experiment 2: bigram model (78), trigram model (9); trigram model on general English (247)
– no morphology: stemming would help the statistics
– could define translation between phrases in a probabilistic phrase-structure grammar
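The perplexity figures above can be grounded in the standard definition: perplexity is 2 raised to the average negative log2 probability the model assigns per word. The quoted numbers (78, 9, 247) come from the slide; the code below only illustrates the computation on toy probabilities.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given the model probability of each token:
    PP = 2 ** (-(1/N) * sum(log2 p_i)). Lower means the model is less uncertain."""
    n = len(token_probs)
    return 2 ** (-sum(math.log2(p) for p in token_probs) / n)

# A model assigning each word probability 1/4 has perplexity 4: it is as
# uncertain as a uniform choice among 4 words at every position.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
print(perplexity([0.5, 0.5, 0.5, 0.5]))      # 2.0
```

Read this way, a trigram perplexity of 9 on the restricted task means the model is, on average, about as uncertain as a uniform choice among 9 words, versus 78 for the bigram model.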

Administrivia
– away next week at the University of Geneva
  – work on your projects and papers
  – reachable by
– last class: Tuesday, May 4th