
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 – Alignment in SMT and Tutorial on Giza++ and Moses) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 15th Feb, 2011

Going forward from word alignment
Word alignment → Phrase alignment (going to bigger units of correspondence) → Decoding (best possible translation)

Abstract Problem
Given: e0 e1 e2 e3 … en en+1 (entities)
Goal: l0 l1 l2 l3 … ln ln+1 (labels)
The goal is to find the best possible label sequence (generative model).
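In equation form, the generative-model objective the slide names can be sketched as follows (a reconstruction, not text from the slide itself):

\hat{l}_{1..n} = \operatorname*{argmax}_{l_{1..n}} P(l_{1..n} \mid e_{1..n}) = \operatorname*{argmax}_{l_{1..n}} P(l_{1..n}) \, P(e_{1..n} \mid l_{1..n})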

Simplification
Using the Markov assumption, the language model can be represented using bigrams. Similarly, the translation model can also be represented in the following way:
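The equations this slide points to are not preserved in the transcript; under the bigram (Markov) assumption for the language model and a word-level alignment assumption for the translation model, they would typically read:

P(E) \approx \prod_{i=1}^{n} P(e_i \mid e_{i-1}), \qquad P(F \mid E) \approx \prod_{j=1}^{m} P(f_j \mid e_{a_j})

where a_j is the position of the English word to which the j-th foreign word is aligned.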

Statistical Machine Translation
Finding the best possible English sentence given the foreign sentence.
P(E) = language model; P(F|E) = translation model. E: English, F: foreign language.
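Written out, this is the standard noisy-channel formulation (the equation itself is a reconstruction, since the slide only lists the components):

\hat{E} = \operatorname*{argmax}_{E} P(E \mid F) = \operatorname*{argmax}_{E} \frac{P(E)\,P(F \mid E)}{P(F)} = \operatorname*{argmax}_{E} P(E)\,P(F \mid E)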

Problems in the framework
Labels are words of the target language and are very large in number.
Example: aap kis ke_sath jaana chahate_ho → "Who do you want to_go with?" / "With whom do you want to go?"
Each word has multiple translation options (who, do, and so on; you; want; to_go; with), and phenomena such as preposition stranding arise.

Columns of words of the target language over the source-language words:
^ aap kis ke_sath jaana chahate_ho .
Each source word carries a column of candidate target words (who, who, …; do, do, and so on; you, you; want, want, …; to_go, to_go; with, with).
Find the best possible path from ^ to . using transition and observation probabilities; Viterbi can be used.
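A minimal sketch of the Viterbi search described here, in Python; the candidate lists and the transition/observation probability tables are invented toy values, not output of any real model:

# Toy Viterbi over the lattice of target-word candidates stacked on source positions.
def viterbi(source, candidates, trans_p, obs_p, start="^", end="."):
    # paths[label] = (score of best path ending in label, that path)
    paths = {start: (1.0, [start])}
    for f in source:
        new_paths = {}
        for label in candidates[f]:
            new_paths[label] = max(
                (score * trans_p.get((prev, label), 1e-6) * obs_p.get((f, label), 1e-6),
                 path + [label])
                for prev, (score, path) in paths.items()
            )
        paths = new_paths
    return max(
        (score * trans_p.get((prev, end), 1e-6), path + [end])
        for prev, (score, path) in paths.items()
    )

source = ["aap", "kis", "ke_sath", "jaana", "chahate_ho"]
candidates = {"aap": ["you", "who"], "kis": ["who", "do"], "ke_sath": ["with"],
              "jaana": ["to_go", "go"], "chahate_ho": ["want"]}
# Invented transition and observation probabilities, just to make the search run.
trans_p = {("^", "you"): 0.4, ("you", "who"): 0.3, ("who", "with"): 0.3,
           ("with", "to_go"): 0.4, ("to_go", "want"): 0.5, ("want", "."): 0.6}
obs_p = {("aap", "you"): 0.8, ("kis", "who"): 0.7, ("ke_sath", "with"): 0.9,
         ("jaana", "to_go"): 0.8, ("chahate_ho", "want"): 0.9}

score, path = viterbi(source, candidates, trans_p, obs_p)
print(path)    # best label sequence from ^ to .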

TUTORIAL ON Giza++ and Moses tools (delivered by Kushal Ladha)

Word-based alignment
For each word in the source language, align the words from the target language that this word possibly produces.
Based on IBM Models 1-5; Model 1 is the simplest. As we go from Model 1 to Model 5, the models get more complex but more realistic.
This is all that Giza++ does.

Alignment
A function from target position to source position. For example, the alignment sequence 2, 3, 4, 5, 6, 6, 6 corresponds to the alignment function A with A(1) = 2, A(2) = 3, … A different alignment function would give the sequence 1, 2, 1, 2, 3, 4, 3, 4 for A(1), A(2), …
To allow spurious insertion, allow alignment with word 0 (NULL).
No. of possible alignments: (I+1)^J.
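A tiny sketch of how such an alignment function can be represented; the sentence pair and the alignment values below are illustrative only:

# Alignment as a function from target positions 1..J to source positions 0..I (0 = NULL).
source = ["NULL", "aap", "kis", "ke_sath", "jaana", "chahate_ho"]   # I = 5 real words + NULL at index 0
target = ["who", "do", "you", "want", "to_go", "with"]              # J = 6
A = {1: 2, 2: 5, 3: 1, 4: 5, 5: 4, 6: 3}                            # illustrative values of A(j)

for j, i in A.items():
    print(f"{target[j - 1]} <- {source[i]}")

I, J = len(source) - 1, len(target)
print("possible alignments:", (I + 1) ** J)                         # (I+1)^J, counting NULL alignments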

IBM Model 1: Generative Process
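The figure accompanying this slide is not in the transcript; as a summary of the generative process it names, IBM Model 1 assigns a foreign sentence F = f_1 … f_J with alignment A = a_1 … a_J, given E = e_0 e_1 … e_I (e_0 = NULL), the probability usually written as

P(F, A \mid E) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} t(f_j \mid e_{a_j})

i.e. a uniform choice over alignment positions multiplied by the lexical translation probabilities t(f|e).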

Training Alignment Models
Given a parallel corpus, for each (F, E) learn the best alignment A and the component probabilities: t(f|e) for Model 1 (the lexicon probability P(f|e)), and the alignment probability P(a_i | a_{i-1}, I).
How do we compute these probabilities if all we have is a parallel corpus?

Intuition: Interdependence of Probabilities
If you knew which words are probable translations of each other, then you could guess which alignment is probable and which one is improbable. If you were given alignments with probabilities, then you could compute translation probabilities.
Looks like a chicken-and-egg problem; the EM algorithm comes to the rescue.
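A compact sketch of the EM training loop for Model 1 lexical probabilities (the textbook algorithm, not the actual Giza++ implementation; the toy corpus is made up):

from collections import defaultdict

# Toy parallel corpus of (foreign, English) sentence pairs; purely illustrative.
corpus = [(["ghar", "chhota", "hai"], ["the", "house", "is", "small"]),
          (["ghar"], ["house"])]

# Initialise t(f|e) uniformly over all co-occurring word pairs (NULL included).
f_vocab = set(w for F, _ in corpus for w in F)
t = {}
for F, E in corpus:
    for f in F:
        for e in E + ["NULL"]:
            t[(f, e)] = 1.0 / len(f_vocab)

for _ in range(10):                              # EM iterations
    count = defaultdict(float)                   # expected counts c(f, e)
    total = defaultdict(float)                   # expected counts c(e)
    for F, E in corpus:
        E0 = E + ["NULL"]
        for f in F:
            z = sum(t[(f, e)] for e in E0)       # normaliser for this foreign word
            for e in E0:
                p = t[(f, e)] / z                # E-step: posterior that e generated f
                count[(f, e)] += p
                total[e] += p
    for (f, e), c in count.items():              # M-step: re-estimate t(f|e)
        t[(f, e)] = c / total[e]

print(round(t[("ghar", "house")], 3))            # rises well above the uniform 1/3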

Limitation: Only 1-to-many alignments allowed

Phrase-based alignment More natural Many-to-one mappings allowed

Giza++ and Moses Package
Select your Ubuntu version and browse the nlp folder. Download the Debian packages of giza++, moses, mkcls and srilm; resolve all the dependencies and they get installed. For alternate installation, refer to …

Steps
Input: sentence-aligned parallel corpus; Output: target-side tagged data
Training
Tuning
Generate output on test corpus (decoding)

Training
Create a folder named corpus containing the test, train and tuning files. Giza++ is used to generate the alignment, and the phrase table is generated after training. Before training, a language model needs to be built on the target side:

mkdir lm;
/usr/bin/ngram-count -order 3 -interpolate -kndiscount -text $PWD/corpus/train_surface.hi -lm lm/train.lm;
/usr/share/moses/scripts/training/train-factored-phrase-model.perl -scripts-root-dir /usr/share/moses/scripts -root-dir . -corpus train.clean -e hi -f en -lm 0:3:$PWD/lm/train.lm:0;

Example
train.en: h e l l o w o r l d c o m p o u n d w o r d h y p h e n a t e d o n e b o o m k w e e z l e b o t t e r
train.pr: hh eh l ow hh ah l ow w er l d k aa m p aw n d w er d hh ay f ah n ey t ih d ow eh n iy b uw m k w iy z l ah b aa t ah r

Sample from Phrase-table
b o ||| b aa ||| (0) (1) ||| (0) (1) |||
b ||| b ||| (0) ||| (0) |||
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) |||
c ||| p ||| (0) ||| (0) |||
d w ||| d w ||| (0) (1) ||| (0) (1) |||
d ||| d ||| (0) ||| (0) |||
e b ||| ah b ||| (0) (1) ||| (0) (1) |||
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) |||
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) |||
e l ||| eh ||| (0) (0) ||| (0,1) |||
e ||| ah ||| (0) ||| (0) |||
h e ||| hh ah ||| (0) (1) ||| (0) (1) |||
h ||| hh ||| (0) ||| (0) |||
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) |||
l e ||| l ah ||| (0) (1) ||| (0) (1) |||
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) |||
l l ||| l ||| (0) (0) ||| (0,1) |||
l o ||| l ow ||| (0) (1) ||| (0) (1) |||
l ||| l ||| (0) ||| (0) |||
m ||| m ||| (0) ||| (0) |||
n d ||| n d ||| (0) (1) ||| (0) (1) |||
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) |||
n e ||| n iy ||| (0) (1) ||| (0) (1) |||
n ||| eh n ||| (1) ||| () (0) |||
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) |||
o o ||| uw ||| (0) (0) ||| (0,1) |||
o ||| aa ||| (0) ||| (0) |||
o ||| ow eh ||| (0) ||| (0) () |||
o ||| ow ||| (0) ||| (0) |||
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) |||
w ||| w ||| (0) ||| (0) |||

Tuning
Not a compulsory step, but it will improve the decoding by a small percentage.

mkdir tuning;
cp $WDIR/corpus/tun.en tuning/input;
cp $WDIR/corpus/tun.hi tuning/reference;
/usr/share/moses/scripts/training/mert-moses.pl $PWD/tuning/input $PWD/tuning/reference /usr/bin/moses $PWD/model/moses.ini --working-dir $PWD/tuning --rootdir /usr/share/moses/scripts

It will take around 1 hour on a server with 32 GB RAM.

Testing

mkdir evaluation;
/usr/bin/moses -config $WDIR/tuning/moses.ini -input-file $WDIR/corpus/test.en > evaluation/test.output;

The output will be in the evaluation/test.output file.
Sample output (input letters → decoded phones):
h o t → hh aa t
p h o n e → p|UNK hh ow eh n iy
b o o k → b uw k