Pushpak Bhattacharyya CSE Dept., IIT Bombay 31st Jan, 2011


1 Pushpak Bhattacharyya CSE Dept., IIT Bombay 31st Jan, 2011
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 12–IBM Model 1) Pushpak Bhattacharyya CSE Dept., IIT Bombay 31st Jan, 2011

2 Grammar based and N-gram based models of Language
A rule-based model of language is a grammar: a set of rules that determines whether a sentence is valid in that language, e.g. NP -> N | Adj P NP | N PP | Art NP. This is a 1/0 decision. Recursive rules allow generation of an infinite number of sentences in the language.
A statistical model (e.g. bi-gram, tri-gram) calculates a score in the range 0 to 1 to determine how well a sentence belongs to the language. This is NOT a 1/0 decision, but a ranking.

3 Statistical Machine Translation (SMT)
Data-driven approach. The goal is to find the English sentence e that maximizes p(e|f) for a given foreign-language sentence f. Translations are generated on the basis of a statistical model. Parameters are estimated using bilingual parallel corpora.
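The objective above is usually rewritten with Bayes' rule, which is what splits the problem into the language model and the translation model of the following slides. A short worked form, using only the symbols e and f already introduced (the hat on e marks the chosen translation and is notation assumed here):

```latex
\hat{e} = \arg\max_{e} \Pr(e \mid f)
        = \arg\max_{e} \frac{\Pr(e)\,\Pr(f \mid e)}{\Pr(f)}
        = \arg\max_{e} \Pr(e)\,\Pr(f \mid e)
```

Pr(f) can be dropped because it does not depend on e. Pr(e) is the language model and Pr(f|e) is the translation model.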

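4 SMT: Language Model
To detect good English sentences. The probability of an English sentence s_1 s_2 … s_n can be written as
$\Pr(s_1 s_2 \ldots s_n) = \Pr(s_1)\,\Pr(s_2 \mid s_1) \cdots \Pr(s_n \mid s_1 s_2 \ldots s_{n-1})$
Here $\Pr(s_n \mid s_1 s_2 \ldots s_{n-1})$ is the probability that word s_n follows the word string s_1 s_2 … s_{n-1}.
N-gram model probability: the history is truncated to the previous N-1 words.
Trigram model probability calculation: $\Pr(s_n \mid s_1 s_2 \ldots s_{n-1}) \approx \Pr(s_n \mid s_{n-2} s_{n-1})$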
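A minimal sketch of a trigram language model, assuming maximum-likelihood estimates from raw counts with no smoothing. The function names, the `<s>`/`</s>` padding symbols and the two-sentence toy corpus are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories over padded sentences."""
    trigram_counts = Counter()
    history_counts = Counter()
    for words in sentences:
        # Two start symbols so the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return trigram_counts, history_counts

def sentence_probability(words, trigram_counts, history_counts):
    """Chain-rule product with every history truncated to two words."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for i in range(2, len(padded)):
        history = (padded[i - 2], padded[i - 1])
        if history_counts[history] == 0:
            return 0.0  # unseen history: the unsmoothed estimate is zero
        prob *= trigram_counts[history + (padded[i],)] / history_counts[history]
    return prob

# Toy corpus (assumed for illustration only).
corpus = [
    ["ram", "went", "to", "school"],
    ["ram", "went", "to", "market"],
]
tri, hist = train_trigram_counts(corpus)
p_seen = sentence_probability(["ram", "went", "to", "school"], tri, hist)
p_scrambled = sentence_probability(["school", "to", "went", "ram"], tri, hist)
```

The scrambled word order gets a lower score than the attested one, which is the ranking behaviour described for statistical models. A practical model would add smoothing so that unseen trigrams do not force the score to zero.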

5 SMT: Translation Model
P(f|e): probability of some f given a hypothesized English translation e. How do we assign values to p(f|e)? Sentences are infinite, so it is not possible to find the pair (e, f) for all sentences. Introduce a hidden variable a that represents alignments between the individual words in the sentence pair, moving the model from the sentence level to the word level.

6 Alignment
If the string $e = e_1^l = e_1 e_2 \ldots e_l$ has l words, and the string $f = f_1^m = f_1 f_2 \ldots f_m$ has m words, then the alignment a can be represented by a series $a_1^m = a_1 a_2 \ldots a_m$ of m values, each between 0 and l, such that if the word in position j of the f-string is connected to the word in position i of the e-string, then $a_j = i$, and if it is not connected to any English word, then $a_j = 0$.

7 Example of alignment
English: Ram went to school
Hindi: Raama paathashaalaa gayaa

8 Alignment between source and target sentence
e0 = Φ         f0 = Φ
e1 = Ram       f1 = Raama
e2 = went      f2 = paathashaalaa
e3 = to        f3 = gayaa
e4 = school
Alignment: a1 = 1, a2 = 4, a3 = 2
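The alignment in this example can be held as a plain list of English positions, one per foreign word. The sketch below is one possible representation; the names `NULL`, `aligned_pairs` and `unaligned` are assumptions made for illustration.

```python
NULL = "NULL"  # stands for the empty word e0 (written Φ on the slide)

english = [NULL, "Ram", "went", "to", "school"]  # e0..e4, so l = 4
hindi = ["Raama", "paathashaalaa", "gayaa"]       # f1..f3, so m = 3

# alignment[j - 1] holds a_j: the English position linked to foreign position j.
alignment = [1, 4, 2]

def aligned_pairs(english, foreign, alignment):
    """Return the (foreign word, English word) pairs implied by the alignment."""
    if len(alignment) != len(foreign):
        raise ValueError("need exactly one alignment value per foreign word")
    l = len(english) - 1
    if any(not 0 <= a <= l for a in alignment):
        raise ValueError("alignment values must lie between 0 and l")
    return [(f, english[a]) for f, a in zip(foreign, alignment)]

pairs = aligned_pairs(english, hindi, alignment)

# English positions 1..l that no foreign word points to.
unaligned = [english[i] for i in range(1, len(english)) if i not in alignment]
```

Here `pairs` links Raama to Ram, paathashaalaa to school and gayaa to went, while "to" is left without a Hindi counterpart. Since each a_j takes one of l + 1 values, there are (l + 1)^m possible alignments for a sentence pair.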

9 Translation Model: Exact expression
The expression is built from three choices:
Choose the length m of the foreign language string given e.
Choose the alignment given e and m.
Choose the identity of each foreign word given e, m, a.
Five models for estimating the parameters in the expression [2]: Model-1, Model-2, Model-3, Model-4, Model-5.
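The three choices map onto the three factors of the expression as it is usually written for the IBM models (Brown et al., 1993). The form below is that standard one, written in the e, f, a, l, m notation of the alignment slide; treat it as a reconstruction under that assumption.

```latex
\Pr(f \mid e) = \sum_{a} \Pr(f, a \mid e)

\Pr(f, a \mid e) =
  \underbrace{\Pr(m \mid e)}_{\text{length}}
  \prod_{j=1}^{m}
  \underbrace{\Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e)}_{\text{alignment}}
  \;
  \underbrace{\Pr(f_j \mid a_1^{j}, f_1^{j-1}, m, e)}_{\text{word identity}}
```

No independence assumption has been made at this point; the five models differ in how they simplify the alignment and word-identity factors.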

10 Proof of Translation Model: Exact expression
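$\Pr(f \mid e) = \sum_{a} \Pr(f, a \mid e)$ ; marginalization over alignments
$\Pr(f, a \mid e) = \sum_{m} \Pr(f, a, m \mid e) = \sum_{m} \Pr(m \mid e)\,\Pr(f, a \mid m, e)$ ; marginalization over lengths
m is fixed for a particular f, hence only one term of the sum survives:
$\Pr(f, a \mid e) = \Pr(m \mid e)\,\Pr(f, a \mid m, e)$
Applying the chain rule over positions j = 1 … m then splits Pr(f, a | m, e) into one alignment term and one word-identity term per position.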

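11 Model-1
Simplest model. Assumptions:
Pr(m|e) is independent of m and e and is equal to ε.
The alignment of foreign language words (FLWs) depends only on the length of the English sentence: each value of a_j has probability $(l+1)^{-1}$, where l is the length of the English sentence.
The identity of f_j depends only on the English word it is aligned to, with translation probability $t(f_j \mid e_{a_j})$.
The likelihood function will be
$\Pr(f \mid e) = \frac{\epsilon}{(l+1)^m} \sum_{a} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$
Maximize the likelihood function constrained to $\sum_{f} t(f \mid e) = 1$ for every English word e.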
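Under the Model-1 assumptions, the probability of one alignment is ε/(l+1)^m times a product of word translation probabilities t(f_j | e_{a_j}), and the sum over all alignments factorizes into a per-word sum. The sketch below computes both. The function names and the toy translation table `t_toy` are assumptions for illustration; the table values are made up, not estimated.

```python
def model1_joint(foreign, english, alignment, t, epsilon=1.0):
    """Pr(f, a | e) = epsilon / (l + 1)^m * prod_j t(f_j | e_{a_j})."""
    l = len(english) - 1  # english[0] is the NULL word e0
    m = len(foreign)
    prob = epsilon / (l + 1) ** m
    for f_word, a_j in zip(foreign, alignment):
        prob *= t.get((f_word, english[a_j]), 0.0)
    return prob

def model1_likelihood(foreign, english, t, epsilon=1.0):
    """Pr(f | e): the sum over all alignments, computed word by word."""
    l = len(english) - 1
    m = len(foreign)
    prob = epsilon / (l + 1) ** m
    for f_word in foreign:
        # Each foreign word may link to any of the l + 1 English positions.
        prob *= sum(t.get((f_word, e_word), 0.0) for e_word in english)
    return prob

english = ["NULL", "Ram", "went"]
foreign = ["Raama", "gayaa"]

# Hypothetical t(f | e), keyed as (f, e); each English word's row sums to 1.
t_toy = {
    ("Raama", "NULL"): 0.5, ("gayaa", "NULL"): 0.5,
    ("Raama", "Ram"): 0.8, ("gayaa", "Ram"): 0.2,
    ("Raama", "went"): 0.25, ("gayaa", "went"): 0.75,
}

joint = model1_joint(foreign, english, [1, 2], t_toy)
likelihood = model1_likelihood(foreign, english, t_toy)
```

The factorized form needs only m(l+1) table lookups instead of enumerating (l+1)^m alignments, which is what makes Model-1 cheap to train.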

12 Model-1: Parameter estimation
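Using a Lagrange multiplier for constrained maximization, the solution for the Model-1 parameters is
$t(f \mid e) = \lambda_e^{-1}\, c(f \mid e; \mathbf{f}, \mathbf{e})$, with $c(f \mid e; \mathbf{f}, \mathbf{e}) = \sum_{a} \Pr(a \mid \mathbf{f}, \mathbf{e}) \sum_{j=1}^{m} \delta(f, f_j)\,\delta(e, e_{a_j})$
Here bold f and e denote the sentence pair. λ_e: normalization constant; c(f|e; f, e): expected count of the number of times e connects to f in that pair; δ(f, f_j) is 1 if f and f_j are the same, zero otherwise.
Estimate t(f|e) using the Expectation Maximization (EM) procedure.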
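A minimal sketch of the EM procedure for Model-1. It relies on the known simplification for this model that the expected count from one sentence pair reduces to t(f|e) divided by the sum of t(f|e_i) over the English words of that sentence, so alignments never need to be enumerated. The function name `train_model1`, the uniform initialisation, the fixed iteration count and the two-pair toy corpus are all assumptions for illustration.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """Estimate t(f | e) with EM; returns a dict keyed by (f, e).

    corpus: list of (foreign_words, english_words) pairs. A NULL word is
    prepended to each English sentence to play the role of e0.
    """
    pairs = [(list(f), ["NULL"] + list(e)) for f, e in corpus]
    f_vocab = {w for f_sent, _ in pairs for w in f_sent}
    # Uniform start: every t(f | e) begins at 1 / |foreign vocabulary|.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f | e)
        total = defaultdict(float)  # per-e normaliser, the role of lambda_e
        # E-step: share each foreign word's unit count among the English words.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise so that sum_f t(f | e) = 1 for every e.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

# Toy parallel corpus (assumed): Hindi on the foreign side, English on the other.
corpus = [
    (["Raama", "gayaa"], ["Ram", "went"]),
    (["Raama", "aayaa"], ["Ram", "came"]),
]
t_table = train_model1(corpus)
```

On this corpus "gayaa" appears only alongside "went", so successive iterations shift probability mass for "went" toward "gayaa" and away from "Raama", which is also explained by "Ram". Only word pairs that co-occur in some sentence pair appear in the returned table.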

