
1 CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 17 – Alignment in SMT) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 14th Feb, 2011

2 Language Divergence Theory: Lexico-Semantic Divergences (ref: Dave, Parikh, Bhattacharyya, Journal of MT, 2002)
- Conflational divergence: F: vomir; E: to be sick. E: stab; H: churaa se maaranaa (knife-with hit). S: utrymningsplan; E: escape plan
- Structural divergence: E: SVO; H: SOV
- Categorial divergence: the change is in POS category (many examples discussed)
- Head-swapping divergence: E: Prime Minister of India; H: bhaarat ke pradhaan mantrii (India-of Prime Minister)
- Lexical divergence: E: advise; H: paraamarsh denaa (advice give): noun incorporation, a very common Indian-language phenomenon

3 Language Divergence Theory: Syntactic Divergences
- Constituent-order divergence: E: Singh, the PM of India, will address the nation today; H: bhaarat ke pradhaan mantrii, singh, … (India-of PM, Singh, …)
- Adjunction divergence: E: She will visit here in the summer; H: vah yahaa garmii meM aayegii (she here summer-in will come)
- Preposition-stranding divergence: E: Who do you want to go with?; H: kisake saath aap jaanaa chaahate ho? (who with …)
- Null-subject divergence: E: I will go; H: jaauMgaa (subject dropped)
- Pleonastic divergence: E: It is raining; H: baarish ho rahii hai (rain happening is: no translation of 'it')

4 Alignment
Completely aligned:
Your answer is right
Votre réponse est juste
Problematic alignment:
We first met in Paris
Nous nous sommes rencontrés pour la première fois à Paris

5 The Statistical MT model: notation
Source language: F; target language: E
Source-language sentence: f; target-language sentence: e
Source-language word: w_f; target-language word: w_e

6 The Statistical MT model
To translate f:
1. Assume that all sentences in E are translations of f with some probability!
2. Choose the translation with the highest probability.

7 SMT Model
What is a good translation?
- Faithful to the source (faithfulness)
- Fluent in the target (fluency)
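This trade-off is exactly the standard noisy-channel factorization of SMT (not spelled out on the slide, but standard since Brown et al.): the translation model P(f|e) scores faithfulness, and the language model P(e) scores fluency.

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e} P(e \mid f)
        \;=\; \operatorname*{arg\,max}_{e} \underbrace{P(f \mid e)}_{\text{faithfulness}} \; \underbrace{P(e)}_{\text{fluency}}
```

The P(f) in the denominator of Bayes' rule is dropped because it is constant over the candidate translations e.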

8 Language Modeling
Task: to find P(e), i.e., to assign probabilities to sentences.

9 Language Modeling: The N-gram approximation
Probability of a word given the previous N-1 words.
N=2: bigram approximation; N=3: trigram approximation.
Bigram approximation: P(e) = P(w_1 w_2 … w_n) ≈ ∏_{i=1}^{n} P(w_i | w_{i-1})
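As a small illustration (the corpus, the train_bigram helper, and the <s>/</s> boundary markers are my own; a real LM would also smooth unseen bigrams), a maximum-likelihood bigram model in Python:

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate P(w | prev) by counting bigrams over the sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]   # sentence-boundary markers
        unigrams.update(words[:-1])              # contexts (history words)
        bigrams.update(zip(words[:-1], words[1:]))
    # Maximum-likelihood estimate; raises ZeroDivisionError for unseen prev.
    return lambda w, prev: bigrams[(prev, w)] / unigrams[prev]

p = train_bigram(["your answer is right", "your answer is wrong"])
print(p("answer", "your"))   # 1.0: 'answer' always follows 'your' here
print(p("right", "is"))      # 0.5: 'is' is followed by 'right' or 'wrong'
```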

10 Translation Modeling
Task: to find P(f|e)
- Cannot use raw counts of f and e (almost every full sentence pair is unseen)
- Approximate P(f|e) using the product of word-translation probabilities (IBM Model 1)
- Problem: how to calculate the word-translation probabilities?
- Note: we do not have word-level counts; the training corpus is sentence-aligned, not word-aligned
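For reference, the Model 1 equation this approximation alludes to (the slide's own formula did not survive the transcript; this is the standard Brown et al. form, with l and m the lengths of e and f):

```latex
P(f \mid e) \;=\; \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
```

so a sentence-level probability is built entirely from word-translation probabilities t(f_j | e_i), which is what the EM procedure below estimates.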

11 Word-alignment example
English: Ram(1) has(2) an(3) apple(4)
Hindi: राम(1) के(2) पास(3) एक(4) सेब(5) है(6)

12 Expectation-Maximization for the translation model

13 Expectation-Maximization algorithm
1. Start with uniform word-translation probabilities.
2. Use these probabilities to find the (fractional) expected counts.
3. Use these new counts to recompute the word-translation probabilities.
4. Repeat the above steps until the values converge.
Works because words that are actually translations of each other co-occur across sentence pairs. It can be proven that EM converges.
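A minimal runnable sketch of this loop on the toy corpus of slides 17-25 (variable names are mine; it follows the lecture's convention of normalizing the counts per English word). Using exact fractions, it reproduces the tables that follow: t(b→x) = 1/4 initially, 5/12 after one iteration, and 85/144 after two.

```python
from collections import defaultdict
from fractions import Fraction

# Toy sentence-aligned corpus of slides 17-25: English words a..d,
# French words w..z; "a b" <-> "w x" and "b c d" <-> "x y z".
corpus = [("a b".split(), "w x".split()),
          ("b c d".split(), "x y z".split())]

f_vocab = {f for _, fs in corpus for f in fs}

# Uniform initialisation: every cell t(e -> f) = 1/4, as on slide 18.
t = defaultdict(lambda: defaultdict(lambda: Fraction(1, len(f_vocab))))

for _ in range(2):
    # E-step: fractional counts c(e -> f), as on slides 19-21.
    count = defaultdict(lambda: defaultdict(Fraction))
    for es, fs in corpus:
        for e in es:
            denom = sum(t[e][f] for f in fs)   # e.g. t(a->w) + t(a->x)
            for f in fs:
                count[e][f] += t[e][f] / denom
    # M-step: renormalise the counts into probabilities (slides 22-23).
    for e in count:
        total = sum(count[e].values())
        for f in count[e]:
            t[e][f] = count[e][f] / total

print(t["b"]["x"])   # Fraction(5, 12) after 1 iteration, 85/144 after 2
```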

14 The counts in IBM Model 1
Works by maximizing P(f|e) over the entire corpus. For IBM Model 1, the expected count of a word pair (e, f) in a sentence pair (E, F) works out to:
c(e→f; E, F) = [ t(e→f) / Σ_{f′ ∈ F} t(e→f′) ] × #(e in E) × #(f in F)

15 The translation probabilities in IBM Model 1
Re-estimate by normalizing the expected counts collected over the whole corpus:
t(e→f) = Σ_pairs c(e→f; E, F) / Σ_{f′} Σ_pairs c(e→f′; E, F)

16 English-French example of alignment
Completely aligned:
Your(1) answer(2) is(3) right(4)
Votre(1) réponse(2) est(3) juste(4)
Alignment: 1→1, 2→2, 3→3, 4→4
Problematic alignment:
We(1) first(2) met(3) in(4) Paris(5)
Nous(1) nous(2) sommes(3) rencontrés(4) pour(5) la(6) première(7) fois(8) à(9) Paris(10)
Alignment: 1→(1,2), 2→(5,6,7,8), 3→4, 4→9, 5→10
Fertility? Yes: single English words map to several French words.

17 EM for word alignment from sentence alignment: example
English: (1) three rabbits → a b; (2) rabbits of Grenoble → b c d
French: (1) trois lapins → w x; (2) lapins de Grenoble → x y z

18 Initial probabilities: each cell denotes t(a→w), t(a→x), etc.; all start uniform at 1/4.

      a     b     c     d
w    1/4   1/4   1/4   1/4
x    1/4   1/4   1/4   1/4
y    1/4   1/4   1/4   1/4
z    1/4   1/4   1/4   1/4

19 The counts in IBM Model 1 (recap)
Works by maximizing P(f|e) over the entire corpus. For IBM Model 1:
c(e→f; E, F) = [ t(e→f) / Σ_{f′ ∈ F} t(e→f′) ] × #(e in E) × #(f in F)

20 Example of expected count

c(a→w; (a b)→(w x)) = [ t(a→w) / (t(a→w) + t(a→x)) ] × #(a in 'a b') × #(w in 'w x')
                    = [ (1/4) / (1/4 + 1/4) ] × 1 × 1
                    = 1/2

21 “Counts”

(b c d) → (x y z):
      a    b     c     d
w     0    0     0     0
x     0   1/3   1/3   1/3
y     0   1/3   1/3   1/3
z     0   1/3   1/3   1/3

(a b) → (w x):
      a     b    c    d
w    1/2   1/2   0    0
x    1/2   1/2   0    0
y     0     0    0    0
z     0     0    0    0

22 Revised probability: example

t_revised(a→w) = (1/2) / [ (1/2 + 1/2 + 0 + 0) from (a b)→(w x) + (0 + 0 + 0 + 0) from (b c d)→(x y z) ] = 1/2

23 Revised probabilities table

      a     b      c     d
w    1/2   1/4     0     0
x    1/2   5/12   1/3   1/3
y     0    1/6    1/3   1/3
z     0    1/6    1/3   1/3

24 “Revised counts”

(b c d) → (x y z):
      a    b     c     d
w     0    0     0     0
x     0   5/9   1/3   1/3
y     0   2/9   1/3   1/3
z     0   2/9   1/3   1/3

(a b) → (w x):
      a     b     c    d
w    1/2   3/8    0    0
x    1/2   5/8    0    0
y     0     0     0    0
z     0     0     0    0

25 Re-revised probabilities table

      a     b        c     d
w    1/2   3/16      0     0
x    1/2   85/144   1/3   1/3
y     0    1/9      1/3   1/3
z     0    1/9      1/3   1/3

Continue until convergence; notice that the (b, x) binding gets progressively stronger: 1/4 → 5/12 → 85/144.

26 Another Example
A four-sentence corpus (two sentence pairs):
a b ↔ x y (illustrated book ↔ livre illustré)
b c ↔ x z (book shop ↔ livre magasin)
Assuming no null alignments, the possible alignments per sentence pair are:
(a b, x y): {a→x, b→y} or {a→y, b→x}
(b c, x z): {b→x, c→z} or {b→z, c→x}

27 Iteration 1
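The per-iteration tables for this example did not survive in the transcript. As a reconstruction, assuming the same kind of uniform start as before (t(e→f) = 1/3 for every pair, since there are three French words), iteration 1 proceeds as follows:

```latex
% Both alignments of each pair have probability (1/3)^2, hence weight 1/2 each.
% Expected counts:
c(a\to x) = c(a\to y) = \tfrac{1}{2}, \qquad
c(b\to x) = \tfrac{1}{2} + \tfrac{1}{2} = 1, \qquad
c(b\to y) = c(b\to z) = \tfrac{1}{2}, \qquad
c(c\to x) = c(c\to z) = \tfrac{1}{2}
% Normalizing per English word:
t(b\to x) = \tfrac{1}{2}, \quad
t(b\to y) = t(b\to z) = \tfrac{1}{4}, \quad
t(a\to x) = t(a\to y) = t(c\to x) = t(c\to z) = \tfrac{1}{2}
```

Repeating the E- and M-steps drives t(b→x) up to 2/3 after iteration 2: b and x ('book' and 'livre') are the only pair that co-occurs in both sentences, so their binding strengthens exactly as (b, x) did in the previous example.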

28 Iteration 2

29 Normalized probabilities: after iteration 2

30 Normalized probabilities: after iteration 3

31 Translation Model: Exact expression

P(f | e) = Σ_a P(f, a | e), with
P(f, a | e) = P(m | e) ∏_{j=1}^{m} P(a_j | a_1^{j-1}, f_1^{j-1}, m, e) P(f_j | a_1^{j}, f_1^{j-1}, m, e)

Five models for estimating the parameters in this expression [2]: Model 1, Model 2, Model 3, Model 4, Model 5. The generative story:
- Choose the length m of the foreign-language string, given e
- Choose the alignment, given e and m
- Choose the identity of each foreign word, given e, m, a

32 Proof of Translation Model: Exact expression
m is fixed for a particular f, hence P(f | e) = P(f, m | e). Marginalizing over alignments:
P(f, m | e) = Σ_a P(f, a, m | e) = Σ_a P(m | e) P(f, a | m, e),
and applying the chain rule over the m foreign words gives the product form above.

33 Model-1
Simplest model. Assumptions:
- Pr(m|e) is independent of m and e and is equal to ε
- The alignment of each foreign-language word (FLW) depends only on the length of the English sentence: P(a_j = i) = (l + 1)^{-1}, where l is the length of the English sentence
The likelihood function is then
P(f | e) = [ ε / (l+1)^m ] Σ_a ∏_{j=1}^{m} t(f_j | e_{a_j}) = [ ε / (l+1)^m ] ∏_{j=1}^{m} Σ_{i=0}^{l} t(f_j | e_i)
Maximize the likelihood function subject to the constraint Σ_f t(f | e) = 1 for every e.

34 Model-1: Parameter estimation
Using a Lagrange multiplier for the constrained maximization, the solution for the Model-1 parameters is
t(f | e) = λ_e^{-1} c(f | e; f, e)
with the expected count
c(f | e; f, e) = [ t(f|e) / (t(f|e_0) + … + t(f|e_l)) ] Σ_{j=1}^{m} δ(f, f_j) Σ_{i=0}^{l} δ(e, e_i)
where λ_e is a normalization constant, c(f|e; f, e) is the expected count, and δ(f, f_j) is 1 if f and f_j are the same word, zero otherwise. Estimate t(f|e) using the Expectation-Maximization (EM) procedure.
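A sketch of the Lagrangian step the slide alludes to, following the standard Model 1 derivation (one multiplier λ_e per English word; setting the derivative to zero yields the count-normalization form above):

```latex
h(t, \lambda) \;=\; \log P(f \mid e) \;-\; \sum_{e} \lambda_e \Big( \sum_{f} t(f \mid e) - 1 \Big),
\qquad
\frac{\partial h}{\partial\, t(f \mid e)} = 0
\;\;\Rightarrow\;\;
t(f \mid e) = \lambda_e^{-1}\, c(f \mid e;\, f, e)
```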

