NLP
Machine Translation
Source-channel model of communication
Parametric probabilistic models of language and translation
Given f, guess e
e → encoder (E→F) → f → decoder (F→E) → e′
e′ = argmax_e P(e|f) = argmax_e P(f|e) P(e)
–P(f|e): translation model
–P(e): language model
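The decoding rule above can be sketched in a few lines. The candidate translations and all probability values below are made-up illustrative numbers (matching the "une fleur rouge" example), not trained model scores.

```python
# Toy noisy-channel decoder: e' = argmax_e P(f|e) * P(e).
# All probabilities are illustrative, not trained values.

# Language model P(e): rewards fluent English.
p_e = {
    "a red flower": 0.5,
    "a flower red": 0.01,
    "a red dog": 0.4,
}

# Translation model P(f|e) for f = "une fleur rouge".
p_f_given_e = {
    "a red flower": 0.6,
    "a flower red": 0.7,   # the word-for-word gloss scores high here
    "a red dog": 0.001,
}

def decode(candidates):
    """Return the candidate e maximizing P(f|e) * P(e)."""
    return max(candidates, key=lambda e: p_f_given_e[e] * p_e[e])

best = decode(p_e.keys())
print(best)  # "a red flower": only the fluent AND faithful candidate wins
```

Neither model alone suffices: the language model alone prefers any fluent sentence ("a red dog"), the translation model alone prefers a literal gloss ("a flower red"); the product picks the candidate that is good on both axes.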
Translate from French: “une fleur rouge”?

e                p(e)    p(f|e)   p(e)·p(f|e)
a flower red     low     high     low
red flower a     low     high     low
flower red a     low     high     low
a red dog        high    low      low
dog cat mouse    low     low      low
a red flower     high    high     high
p(English|Chinese) ∝ p(Chinese|English) × p(English)
Text-to-text (summarization)
–also text-to-signal: speech recognition, OCR, spelling correction
Example (OCR)
–P(text|pixels) ∝ P(text) P(pixels|text)
Example alignment
–English: I watched an interesting play
–French: J’ ai vu une pièce de théâtre intéressante
–“watched” aligns to “ai vu” (one English word, two French words)
–“play” aligns to “pièce de théâtre” (one English word, three French words)
Word translation
Local alignment
Fertilities
Class-based alignment
Non-deficient algorithm (avoids overlaps and overflow)
Tokenization
Sentence alignment (1-1, 2-2, 2-1 mappings)
–Church and Gale: based on sentence length
–Church: sequences of 4-grams, based on cognates
[Church/Gale 1993]
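The length-based idea can be sketched as a small dynamic program over 1-1, 2-1, and 1-2 sentence "beads". This is a simplification: the real Church and Gale algorithm scores beads with a Gaussian model of character-length ratios and empirically estimated bead priors; the log-ratio cost below is a stand-in, and the lengths are invented.

```python
import math

# Simplified length-based sentence alignment (Church & Gale style):
# DP over 1-1, 2-1, and 1-2 beads, each scored by how far the
# character-length ratio of the bead deviates from 1.

def bead_cost(src_len, tgt_len):
    # 0 cost for equal lengths, growing with mismatch.
    return abs(math.log((src_len + 1) / (tgt_len + 1)))

def align(src, tgt):
    """src, tgt: lists of sentence lengths (in characters).
    Returns (total cost, bead sequence as (src_count, tgt_count) pairs)."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in ((1, 1), (2, 1), (1, 2)):
                if i + di <= n and j + dj <= m:
                    c = cost[i][j] + bead_cost(sum(src[i:i+di]),
                                               sum(tgt[j:j+dj]))
                    if c < cost[i+di][j+dj]:
                        cost[i+di][j+dj] = c
                        back[i+di][j+dj] = (di, dj)
    # Trace back the best bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((di, dj))
        i, j = i - di, j - dj
    return cost[n][m], beads[::-1]

# Two source sentences (lengths 20 and 44) against three target
# sentences: the 44-char sentence was split into 21 + 23 in translation.
total, beads = align([20, 44], [19, 21, 23])
print(beads)  # [(1, 1), (1, 2)]
```

The DP correctly prefers pairing the long source sentence with the two shorter target sentences (a 1-2 bead) over any 1-1 pairing with badly mismatched lengths.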
Alignments
–La maison bleue
–The blue house
–Alignments: {1,2,3}, {1,3,2}, {1,3,3}, {1,1,1}, …
–All are equally likely
Conditional probabilities
–P(f|A,e) = ?
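The "all alignments equally likely" point can be checked by brute force. A minimal sketch under Model-1-style scoring, P(f,a|e) ∝ ∏_j t(f_j|e_{a_j}), with a uniform t-table and no NULL word (both simplifying assumptions):

```python
from itertools import product

# Score an alignment a (a_j = English position generating French word j)
# under a given translation table t.
def p_f_a_given_e(f_words, e_words, alignment, t):
    p = 1.0
    for j, i in enumerate(alignment):
        p *= t.get((f_words[j], e_words[i]), 0.0)
    return p

e = ["the", "blue", "house"]
f = ["la", "maison", "bleue"]

# Uniform translation table: every French word equally likely
# given every English word.
uniform_t = {(fw, ew): 1.0 / 3 for fw in f for ew in e}

# Enumerate all 3^3 = 27 alignments of 3 French words to 3 English
# positions (no NULL word, for simplicity).
scores = {a: p_f_a_given_e(f, e, a, uniform_t)
          for a in product(range(3), repeat=3)}
print(len(scores), len(set(scores.values())))  # 27 alignments, 1 distinct score
```

With a uniform table every product is (1/3)³, so all 27 alignments tie; only once t is trained do good alignments pull ahead.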
Algorithm
–Pick the length of the translation
–Choose an alignment
–Pick the French words
–That gives you P(f,A|e)
–We need P(f|e) = Σ_A P(f,A|e)
–Use EM (expectation-maximization) to find the hidden alignment variables
We need p(f|e), but we don’t know the word alignments (initially, all alignments are assumed equally likely)
Corpus:
–green house ↔ casa verde
–the house ↔ la casa
Uniform translation model: every t(f|e) initialized to the same value
E-step 1: compute the expected counts E[count(t(f|e))] for all word pairs (f_j, e_a_j)
–E-step 1a: compute P(a,f|e) by multiplying the relevant t probabilities
–E-step 1b: normalize P(a,f|e) to get P(a|e,f)
–E-step 1c: compute expected fractional counts, weighting each count by P(a|e,f)
M-step 1: compute the MLE probability parameters by normalizing the t counts to sum to 1
E-step 2a: recompute P(a,f|e) by multiplying the updated t probabilities
Iterate until convergence
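The E/M steps above can be run end-to-end on the slide's toy corpus. A minimal IBM Model 1 sketch (no NULL word, no length term, both of which the full model includes):

```python
from collections import defaultdict

# IBM Model 1 EM on the toy corpus. Starting from a uniform t-table,
# t("casa"|"house") should climb toward 1, since "casa" and "house"
# co-occur in both sentence pairs.
corpus = [
    (["green", "house"], ["casa", "verde"]),
    (["the", "house"],   ["la", "casa"]),
]

e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}

# Uniform initialization: t(f|e) = 1/|French vocab| for every pair.
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(50):
    # E-step: expected fractional link counts. For Model 1 the
    # posterior over alignments factorizes per French word, so we just
    # normalize each f's t values over the English words in its sentence.
    count = defaultdict(float)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                frac = t[(f, e)] / norm
                count[(f, e)] += frac
                total[e] += frac
    # M-step: MLE by renormalizing the fractional counts per English word.
    for f, e in t:
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("casa", "house")], 2))  # approaches 1.0
```

After the first iteration t("casa"|"house") already rises from 1/3 to 1/2, because "casa" gets a full expected count from the pair where it co-occurs with "house" and "la"; repeated iterations drive it toward 1 while "verde" settles on "green" and "la" on "the".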
Distortion parameters D(i|j,l,m)
–i and j are word positions in the two sentences
–l and m are the lengths of these sentences
Example
–D(“boy”|”garçon”,5,6)
Fertility P(φ_i|e)
Examples
–(a) play = pièce de théâtre
–(to) place = mettre en place
p_1 is an extra parameter governing φ_0 (the fertility of the NULL word)
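Given word-aligned data, fertility distributions can be read off as relative frequencies. The alignment below is hand-made for illustration (it encodes the "watched → ai vu", "play → pièce de théâtre" example), not the output of a trained model:

```python
from collections import Counter, defaultdict

# Estimate fertility distributions n(phi | e) from word-aligned pairs.
# alignment[j] = index of the English word generating French word j.
aligned_corpus = [
    # "I watched an interesting play"
    # -> "J' ai vu une pièce de théâtre intéressante"
    (["I", "watched", "an", "interesting", "play"],
     [0, 1, 1, 2, 4, 4, 4, 3]),  # "watched"->2 French words, "play"->3
]

fert_counts = defaultdict(Counter)
for e_words, alignment in aligned_corpus:
    links_per_position = Counter(alignment)
    for i, e in enumerate(e_words):
        fert_counts[e][links_per_position.get(i, 0)] += 1

def n(phi, e):
    """Relative-frequency estimate of n(phi | e)."""
    c = fert_counts[e]
    return c[phi] / sum(c.values())

print(n(3, "play"), n(2, "watched"), n(1, "I"))  # 1.0 1.0 1.0
```

With one sentence pair each estimate is trivially 0 or 1; over a real corpus the counts spread into a distribution over fertilities per English word, which is exactly what the fertility-based models parameterize.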
–(an awesome tutorial by Kevin Knight)
–(a comprehensive site, including references to the old IBM papers, pointers to Moses, etc.)