Statistical Machine Translation Raghav Bashyal

Statistical Machine Translation
- Uses pre-translated text (corpora)
- Compares translated text to the original
- Notices patterns and associates words
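One simple way to "associate words" across a parallel corpus is to count how often a Spanish word and an English word appear in the same sentence pair. The sketch below uses the Dice coefficient for this. It is a hypothetical illustration, not the method used in this project; the function names `dice_associations` and `best_match` and the toy sentence pairs are made up.

```python
from collections import Counter

def dice_associations(pairs):
    """Score (Spanish, English) word pairs by sentence-level co-occurrence."""
    es_count, en_count, both = Counter(), Counter(), Counter()
    for es_sent, en_sent in pairs:
        es_words = set(es_sent.lower().split())
        en_words = set(en_sent.lower().split())
        es_count.update(es_words)
        en_count.update(en_words)
        both.update((s, e) for s in es_words for e in en_words)
    # Dice coefficient: 2 * c(s, e) / (c(s) + c(e)), always in [0, 1].
    return {(s, e): 2 * c / (es_count[s] + en_count[e]) for (s, e), c in both.items()}

def best_match(scores, spanish_word):
    """Return the English word most strongly associated with spanish_word."""
    candidates = {e: v for (s, e), v in scores.items() if s == spanish_word}
    return max(candidates, key=candidates.get)

# Hypothetical toy parallel corpus.
pairs = [("la casa", "the house"),
         ("la casa grande", "the big house"),
         ("una casa", "a house"),
         ("la mesa", "the table")]
scores = dice_associations(pairs)
print(best_match(scores, "casa"), best_match(scores, "mesa"))  # house table
```

Dice scores are used instead of raw counts so that frequent words like "the" do not dominate every association.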

SMT Process
- Knight: A Statistical MT Tutorial Workbook
- Basic probabilities: P(word)
- Conditional probabilities: P(word | word) …
- Pick the most probable translation
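Assuming these probabilities are estimated by relative frequency from a corpus of $N$ word tokens (the usual maximum-likelihood choice; the slide does not say which estimator is used) and a bigram model, they take the form:

```latex
P(w) = \frac{\mathrm{count}(w)}{N}
\qquad
P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}\, w_i)}{\mathrm{count}(w_{i-1})}
\qquad
P(w_1 \dots w_n) \approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})
```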

SMT process [figure: http://isoft.postech.ac.kr/research/SMT/images/math.jpg]
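"Pick the most probable translation" is commonly framed as the noisy-channel rule ê = argmax_e P(e) · P(f | e), where P(e) is an English language model and P(f | e) a translation model. A minimal sketch over a fixed candidate list; `best_translation` and all probabilities are hypothetical.

```python
def best_translation(source, candidates, lm, tm):
    """Return the candidate e that maximizes P(e) * P(f | e).

    lm -- language model: English sentence -> P(e)
    tm -- translation model: (source, English sentence) -> P(f | e)
    """
    def score(e):
        return lm.get(e, 0.0) * tm.get((source, e), 0.0)
    return max(candidates, key=score)

# Hypothetical toy probabilities for the Spanish sentence "la casa".
lm = {"the house": 0.02, "house the": 0.0001, "the home": 0.01}
tm = {("la casa", "the house"): 0.3,
      ("la casa", "house the"): 0.3,
      ("la casa", "the home"): 0.1}

print(best_translation("la casa", list(lm), lm, tm))  # the house
```

Here the translation model alone cannot separate "the house" from "house the"; the language model breaks the tie. A real decoder searches a far larger space of candidates and usually works with log probabilities to avoid underflow.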

Project
- Translate basic text from Spanish to English
- Test effectiveness with and without hard-coded (syntax) components
- Use specific procedures/algorithms that add speed

Literature
- Guides on Statistical Machine Translation
- Most research projects follow the same procedure as outlined by Knight
- “State of the art” implementation: Google

Literature
- NLTK: Christina Wallin
- UC Berkeley: modifications; larger corpora are more useful
- Syntax-based (hard-coded) rules: higher translation quality when used together with SMT

Procedure
- NLTK (Natural Language Toolkit): a Python library built from natural language processing projects
- Current procedure: read the SMT worksheet and code along with it

Development
- Create corpora
- Tokenization: clean the input string
- Probability: P(word) in the corpora
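The tokenization and P(word) steps can be sketched with the Python standard library alone. The cleaning rule (lowercase, keep only runs of letters including Spanish accented letters) and the toy corpus are assumptions for illustration; the project itself may clean strings differently, for example with NLTK's tokenizers.

```python
import re
from collections import Counter

# Hypothetical cleaning rule: keep runs of lowercase letters, including Spanish accents.
WORD_RE = re.compile(r"[a-záéíóúüñ]+")

def tokenize(text):
    """Clean a string: lowercase it and drop punctuation such as ¿ ? ¡ !"""
    return WORD_RE.findall(text.lower())

def unigram_probs(corpus):
    """Estimate P(word) as count(word) / total number of tokens in the corpus."""
    counts = Counter(tok for sentence in corpus for tok in tokenize(sentence))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

corpus = ["¿Dónde está la casa?", "La casa es grande."]
probs = unigram_probs(corpus)
print(tokenize(corpus[0]))           # ['dónde', 'está', 'la', 'casa']
print(probs["la"], probs["grande"])  # 0.25 0.125
```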

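Smoothing
- Coefficients are used to modify the probabilities
  - Large coefficients for trigrams
  - Small coefficients for bigrams and single words
- The coefficients sum to 1, so the weighted combination of all the words/phrases is still a probability; trigrams are more informative, so they get the most weight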
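This kind of smoothing is linear interpolation: P(z | x y) is a weighted sum of trigram, bigram and unigram estimates, plus a small uniform term so no word gets probability zero. A minimal sketch; the weights, vocabulary size, table layout and the name `interpolated_prob` are hypothetical (in practice the weights are tuned on held-out data).

```python
def interpolated_prob(x, y, z, tri, bi, uni, vocab_size,
                      weights=(0.7, 0.2, 0.09, 0.01)):
    """P(z | x y) as a weighted mix of trigram, bigram, unigram and uniform estimates.

    weights -- hypothetical (trigram, bigram, unigram, uniform) coefficients;
               they must sum to 1 so the result is still a probability.
    """
    l3, l2, l1, l0 = weights
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return (l3 * tri.get((x, y, z), 0.0)
            + l2 * bi.get((y, z), 0.0)
            + l1 * uni.get(z, 0.0)
            + l0 / vocab_size)

# Hypothetical unsmoothed estimates.
tri = {("la", "casa", "es"): 0.5}
bi = {("casa", "es"): 0.4}
uni = {"es": 0.1}
seen = interpolated_prob("la", "casa", "es", tri, bi, uni, vocab_size=100)
unseen = interpolated_prob("mi", "casa", "es", tri, bi, uni, vocab_size=100)
print(round(seen, 4), round(unseen, 4))  # 0.4391 0.0891
```

The trigram "mi casa es" never appears in the table, yet it still receives a nonzero probability from the bigram, unigram and uniform terms.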

Algorithm
For translation, IBM Model 3 is used:
1. For each English word $e_i$ indexed by $i = 1, 2, \dots, l$, choose fertility $\phi_i$ with probability $n(\phi_i \mid e_i)$.
2. Choose the number $\phi_0$ of "spurious" French words to be generated from $e_0 = \text{NULL}$, using probability $p_1$ and the sum of fertilities from step 1.
3. Let $m$ be the sum of fertilities for all words, including NULL.
4. For each $i = 0, 1, 2, \dots, l$ and each $k = 1, 2, \dots, \phi_i$, choose a French word $\tau_{ik}$ with probability $t(\tau_{ik} \mid e_i)$.
5. For each $i = 1, 2, \dots, l$ and each $k = 1, 2, \dots, \phi_i$, choose target French position $\pi_{ik}$ with probability $d(\pi_{ik} \mid i, l, m)$.
6. For each $k = 1, 2, \dots, \phi_0$, choose a position $\pi_{0k}$ from the $\phi_0 - k + 1$ remaining vacant positions in $1, 2, \dots, m$, for a total probability of $1/\phi_0!$.
7. Output the French sentence with words $\tau_{ik}$ in positions $\pi_{ik}$ ($0 \le i \le l$, $1 \le k \le \phi_i$).
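Multiplying out the Model 3 generative steps for one fixed alignment gives the standard closed-form probability P(a, f | e). The sketch below computes it from dictionary tables. The table layout, the function name `model3_prob` and all numbers are hypothetical; here the generated ("French") side is the Spanish sentence.

```python
from math import comb, factorial

def model3_prob(e, f, a, n, t, d, p1):
    """P(a, f | e) under IBM Model 3 for one fixed alignment a.

    e  -- English words with e[0] == "NULL"; l = len(e) - 1
    f  -- generated (here Spanish) words f_1..f_m, stored 0-indexed
    a  -- a[j-1] is the index i of the English word that generated f_j
    n  -- fertility table:   (phi, e_i)    -> n(phi | e_i)
    t  -- translation table: (f_j, e_i)    -> t(f_j | e_i)
    d  -- distortion table:  (j, i, l, m)  -> d(j | i, l, m)
    p1 -- probability that each generated word spawns one spurious NULL word
    """
    l, m = len(e) - 1, len(f)
    p0 = 1.0 - p1

    # Fertility phi_i = number of generated words aligned to English position i.
    phi = [0] * (l + 1)
    for i in a:
        phi[i] += 1
    phi0 = phi[0]
    if 2 * phi0 > m:  # more NULL words than non-NULL ones: impossible in step 2
        return 0.0

    # Step 2: phi0 ~ Binomial(m - phi0, p1) over the words produced in step 1.
    prob = comb(m - phi0, phi0) * p0 ** (m - 2 * phi0) * p1 ** phi0

    # Step 1: fertilities. phi_i! counts the orderings of tau_i1..tau_i,phi_i that
    # give the same (f, a); the NULL word's phi0! cancels step 6's 1/phi0!.
    for i in range(1, l + 1):
        prob *= n.get((phi[i], e[i]), 0.0) * factorial(phi[i])

    # Steps 4-5: translation for every word, distortion for non-NULL words.
    for j, (fj, i) in enumerate(zip(f, a), start=1):
        prob *= t.get((fj, e[i]), 0.0)
        if i != 0:
            prob *= d.get((j, i, l, m), 0.0)
    return prob

# Hypothetical toy tables.
e = ["NULL", "the", "house"]
f = ["la", "casa"]
a = [1, 2]  # la <- the, casa <- house
n = {(1, "the"): 0.8, (1, "house"): 0.9}
t = {("la", "the"): 0.7, ("casa", "house"): 0.8}
d = {(1, 1, 2, 2): 0.6, (2, 2, 2, 2): 0.6}
print(model3_prob(e, f, a, n, t, d, p1=0.1))  # ≈ 0.1176
```

Training (estimating n, t, d and p1 from a corpus) and decoding both need this score, but both are outside the scope of this sketch.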

Expected Results
- Translation will probably be very basic
- Systems usually perform better on “sample” text than on “real” text
- Highlighted errors: the program should use reference data to find some errors
- Error frequency plots for certain words
- Test the effectiveness of adjustments (hard coding, other algorithms)
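Per-word error frequencies, the data behind such plots, could be collected by aligning each system output against its reference translation. The sketch uses Python's `difflib.SequenceMatcher` for the word alignment. This is a hypothetical approach, not the project's method; `word_error_counts` and the example sentences are made up.

```python
from collections import Counter
from difflib import SequenceMatcher

def word_error_counts(outputs, references):
    """Count reference words that each system output dropped or replaced.

    A reference word counts as an error when difflib places it in a
    'replace' or 'delete' opcode, i.e. it is not matched in the output.
    """
    errors = Counter()
    for out, ref in zip(outputs, references):
        ref_toks, out_toks = ref.lower().split(), out.lower().split()
        matcher = SequenceMatcher(a=ref_toks, b=out_toks, autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                errors.update(ref_toks[i1:i2])
    return errors

outputs = ["the home is big", "the house big"]
references = ["the house is big", "the house is big"]
print(word_error_counts(outputs, references).most_common())  # [('house', 1), ('is', 1)]
```

The resulting counts can then be passed to any plotting library to draw the error frequency plots.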
