NLP: Machine Translation. Source-channel model of communication; parametric probabilistic models of language and translation.


NLP

Machine Translation

Source-channel model of communication. Parametric probabilistic models of language and translation.

Given f, guess e. In the source-channel (noisy channel) view, a sentence e passes through an encoder (E → F) to produce f; the decoder (F → E) recovers the most probable e':

e' = argmax_e P(e|f) = argmax_e P(f|e) P(e)

where P(f|e) is the translation model and P(e) is the language model.
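
The second equality is Bayes' rule with the denominator dropped, since P(f) is constant for the observed f (standard derivation, written out here for completeness):

    e' = \operatorname*{argmax}_e P(e \mid f)
       = \operatorname*{argmax}_e \frac{P(f \mid e)\, P(e)}{P(f)}
       = \operatorname*{argmax}_e P(f \mid e)\, P(e)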

Translate from French: "une fleur rouge"?

Candidate e      p(e)    p(f|e)   p(e)*p(f|e)
a flower red     low     high     low
red flower a     low     high     low
flower red a     low     high     low
a red dog        high    low      low
dog cat mouse    low     low      low
a red flower     high    high     high

The language model p(e) prefers fluent English, the translation model p(f|e) prefers faithful translations, and only the correct answer, "a red flower", scores high on both.
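
A minimal Python sketch of this ranking; the numeric probabilities below are illustrative placeholders standing in for the table's low/high entries, not outputs of any real model:

    # Noisy-channel ranking sketch. Values are illustrative placeholders.
    candidates = {
        "a flower red":  (0.01,  0.80),   # (p(e), p(f|e))
        "red flower a":  (0.01,  0.80),
        "flower red a":  (0.01,  0.80),
        "a red dog":     (0.30,  0.001),
        "dog cat mouse": (0.001, 0.001),
        "a red flower":  (0.30,  0.80),
    }

    # Decoder rule: e' = argmax_e p(f|e) * p(e)
    best = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
    print(best)  # a red flower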

p(English|Chinese) ∝ p(Chinese|English) × p(English)

Text-to-text (summarization)
– also text-to-signal: speech recognition, OCR, spelling correction
Example (OCR)
– P(text|pixels) ∝ P(text) P(pixels|text)

[Alignment diagram: English "I watched an interesting play" paired with French "J'ai vu une pièce de théâtre intéressante"; "watched" aligns to the two words "ai vu" and "play" to the three words "pièce de théâtre", illustrating one-to-many alignments.]

The IBM models add progressively richer parameters:
– Model 1: word translation
– Model 2: local alignment
– Model 3: fertilities
– Model 4: class-based alignment
– Model 5: non-deficient alignment (avoid overlaps, overflow)

Tokenization
Sentence alignment (1-1, 2-2, 2-1 mappings)
– Gale and Church: based on sentence length
– Church: sequences of character 4-grams, based on cognates

[Gale and Church 1993]

Alignments
– La maison bleue
– The blue house
– Example alignments (English position for each French word): {1,2,3}, {1,3,2}, {1,3,3}, {1,1,1}
– All alignments are equally likely a priori
Conditional probabilities
– P(f|A,e) = ?
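
A quick sketch of the space of alignments (our own illustration; tuple entries are 1-based English positions for each French word, and the NULL word is omitted):

    from itertools import product

    english = ["the", "blue", "house"]
    french = ["la", "maison", "bleue"]

    # Each French word may align to any English position, so there are
    # len(english) ** len(french) = 27 alignments here (NULL omitted).
    alignments = list(product(range(1, len(english) + 1), repeat=len(french)))
    print(len(alignments))   # 27
    print(alignments[:4])    # (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1)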

Algorithm (generative story)
– Pick the length of the translation
– Choose an alignment
– Pick the French words
– That gives you P(f,A|e)
– We need P(f|e), summing over all alignments
– Use EM (expectation-maximization) to handle the hidden alignment variables

We need p(f|e) but we don’t know the word alignments (which are assumed to be equally likely)
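
For IBM Model 1 the sum over alignments factorizes into a closed form (the standard result from Brown et al. 1993; l and m are the English and French sentence lengths, e_0 is the NULL word, and ε is a constant length probability):

    P(f \mid e) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)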

Corpus (two sentence pairs):
– green house ↔ casa verde
– the house ↔ la casa
Uniform initial translation model: every t(f|e) starts out equal, e.g. t(casa|e) = t(verde|e) = t(la|e) = 1/3 for each English word e.

E-step 1: compute the expected counts E[count(t(f|e))] for all word pairs (f_j, e_{a_j})
– E-step 1a: compute P(a,f|e) by multiplying the t probabilities: P(a,f|e) = ∏_j t(f_j|e_{a_j})
– E-step 1b: normalize P(a,f|e) to get P(a|e,f): P(a|e,f) = P(a,f|e) / Σ_a′ P(a′,f|e)
– E-step 1c: compute expected fractional counts, weighting each count by P(a|e,f)

M-step 1: compute the MLE probability parameters by normalizing the t counts to sum to 1
E-step 2a: recompute P(a,f|e) by multiplying the updated t probabilities
Repeat the E and M steps until convergence
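
A minimal runnable sketch of these E and M steps for Model 1 on the toy corpus above (variable names are ours; the NULL word is omitted, and instead of enumerating alignments we exploit the fact that Model 1's alignment posterior factorizes per French position):

    from collections import defaultdict

    # Toy corpus from the slides: (French, English) sentence pairs.
    corpus = [
        (["casa", "verde"], ["green", "house"]),
        (["la", "casa"], ["the", "house"]),
    ]

    f_vocab = {f for fs, _ in corpus for f in fs}
    e_vocab = {e for _, es in corpus for e in es}

    # Uniform initialization: t(f|e) = 1 / |French vocabulary|.
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

    for _ in range(10):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: fractional counts, each weighted by P(a|e,f).
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalization
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f|e) by normalizing the t counts.
        for f, e in t:
            t[(f, e)] = count[(f, e)] / total[e]

    print(f"t(casa|house) = {t[('casa', 'house')]:.3f}")  # rises toward 1.0

Over the iterations the counts sharpen: "casa" co-occurs with "house" in both pairs, so t(casa|house) grows while the competing hypotheses shrink.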

Distortion parameters D(i|j,l,m)
– i and j are word positions in the two sentences
– l and m are the lengths of these sentences
Example
– D(i = position of "boy" | j = position of "garçon", 5, 6)

Fertility P(φ_i|e_i): the number of French words generated by the English word e_i
Examples
– (a) play = pièce de théâtre
– (to) place = mettre en place
p_1 is an extra parameter that defines φ_0, the fertility of the NULL word
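
In the standard Model 3 story (Brown et al. 1993), p_1 ties φ_0 to a binomial: each of the m′ French words generated by real English words independently triggers one spurious NULL-generated word with probability p_1:

    P(\phi_0 \mid \phi_1, \dots, \phi_l)
      = \binom{m'}{\phi_0}\, p_1^{\phi_0}\, p_0^{\,m' - \phi_0},
    \qquad m' = \sum_{i=1}^{l} \phi_i, \quad p_0 = 1 - p_1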

Resources
– an awesome tutorial by Kevin Knight
– a comprehensive site, including references to the old IBM papers, pointers to Moses, etc.

NLP