English-Persian SMT
Reza Saeedi
WTLAB
Wednesday, May 25, 2011

Outline
- MT Introduction
- SMT Introduction
- Requirements for SMT
- Evaluation metrics
- English-Persian MT challenges
- English-Persian SMT
  - System1
  - System2
- Problems in English-Persian SMT

MT Introduction
- Machine Translation (MT) is the automatic translation, by computer, of text written in one natural language into another.
- There are several ways to do this:
  - Dictionary-based
  - Rule-based
  - Example-based
  - Statistical

SMT Introduction
- The first ideas of statistical machine translation were proposed by Warren Weaver.
- Statistical machine translation learns how to translate by examining translations made by humans.

SMT Introduction (Cont.)
- Statistical MT models take the view that every sentence in the target language is a translation of the source-language sentence with some probability.
- The best translation is the sentence with the highest probability.
- The key problems in statistical MT are:
  - estimating the probability of a translation, and
  - efficiently finding the sentence with the highest probability.

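SMT Introduction (Cont.)
- Given a source sentence f, we seek the target sentence e that maximizes P(e | f):
  $$\hat{e} = \arg\max_{e} P(e \mid f)$$
- By Bayes' rule, P(e | f) depends on two factors:
  $$P(e \mid f) = \frac{P(e)\, P(f \mid e)}{P(f)}$$
- P(f) is the same for every candidate e, so it can be dropped from the maximization:
  $$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \underbrace{P(e)}_{\text{fluency}} \, \underbrace{P(f \mid e)}_{\text{faithfulness}}$$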
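As a toy illustration of the decision rule above, the sketch below picks the candidate with the highest P(e) · P(f|e), adding logarithms to avoid numerical underflow. The candidate sentences and all probabilities are hypothetical numbers chosen for illustration; in a real system P(e) would come from the language model, P(f|e) from the translation model, and the decoder would search a huge candidate space rather than a fixed list.

```python
import math

# Hypothetical toy scores for the candidates of one source sentence f.
# In a real system these would come from the LM and the translation model.
LM_PROB = {            # P(e): fluency
    "the house is small": 0.20,
    "the house is little": 0.05,
    "small is the house": 0.01,
}
TM_PROB = {            # P(f | e): faithfulness
    "the house is small": 0.30,
    "the house is little": 0.40,
    "small is the house": 0.30,
}


def noisy_channel_best(candidates, lm_prob, tm_prob):
    """Return argmax_e P(e) * P(f|e), computed in log space to avoid underflow."""
    def score(e):
        return math.log(lm_prob[e]) + math.log(tm_prob[e])
    return max(candidates, key=score)


best = noisy_channel_best(list(LM_PROB), LM_PROB, TM_PROB)
print(best)  # the house is small
```

Here "the house is little" has the highest P(f|e) but still loses, because the language model judges it less fluent; the product of the two factors decides.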

SMT Introduction (Cont.)
[Figure credited to Philipp Koehn; not recoverable]

Why SMT?
- Makes better use of resources
- Does not need linguistic knowledge
- Can be used for any language pair
- But: it needs a large training corpus

Steps of SMT
[Figure not recoverable]

Requirements for SMT
- Bilingual and monolingual corpora:
  - A bilingual corpus needs two files aligned sentence by sentence (one file for the source language and one for the target language).
  - Tool: Microsoft Bilingual Sentence Aligner
- Language model:
  - We need a tool to compute P(e).
  - This step needs a monolingual corpus.
  - SRILM: a toolkit for building N-gram language models
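To make "compute P(e)" concrete, here is a minimal bigram language model with add-one (Laplace) smoothing, written from scratch. It is only a sketch: SRILM builds higher-order N-gram models with more refined smoothing and backoff, and the training sentences below are hypothetical.

```python
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        sentences = list(sentences)
        self.unigrams = Counter()   # counts of each word used as a history
        self.bigrams = Counter()    # counts of (previous word, word) pairs
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # Possible next words: the training vocabulary plus the end marker.
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(self, prev, word):
        """P(word | prev) with add-one smoothing."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.vocab))

    def sentence_prob(self, sentence):
        """P(e) as a product of bigram probabilities, including the end marker."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p *= self.prob(prev, word)
        return p


lm = BigramLM(["the house is small", "the house is big", "a small house"])
print(lm.sentence_prob("the house is small") > lm.sentence_prob("small the is house"))  # True
```

The fluent word order receives a higher probability than the scrambled one, which is exactly the "fluency" role P(e) plays in the decision rule.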

LM output
[Figure not recoverable]

Requirements for SMT
- Translation model:
  - We need a tool to compute P(f|e).
  - This step needs a bilingual corpus.
  - GIZA++ produces word alignments, from which the phrase table is extracted.
- Decoder:
  - Searches for the best translation.
  - Moses
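GIZA++ trains a sequence of word-alignment models (the IBM models and an HMM model). As a rough sketch of the idea only, the code below implements IBM Model 1 with the EM algorithm, estimating lexical translation probabilities t(f|e) from a tiny hypothetical corpus; it is not GIZA++'s implementation.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with IBM Model 1 EM.

    corpus: list of (source_tokens, target_tokens) pairs, i.e. (f, e).
    A NULL token is added to every target sentence so that a source word
    may be aligned to nothing.
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialisation of t(f|e)

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: spread each source word over all target words in proportion to t(f|e).
        for fs, es in corpus:
            es = ["NULL"] + list(es)
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(corpus, iterations=10)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

Because "das" is already well explained by "the" across sentences, EM shifts the probability mass of "house" toward "Haus"; a phrase-based pipeline then uses the resulting word alignments to extract phrases.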

Phrase table
[Figure not recoverable]

Moses tool
[Figure not recoverable]

The training steps
- Prepare data
- Run GIZA++
- Align words
- Get lexical translation table
- Extract phrases
- Score phrases
- Build reordering model
- Build generation models
- Create configuration file
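The "Extract phrases" step collects every phrase pair that is consistent with the word alignment: no word inside the pair may be aligned to a word outside it. The sketch below implements that consistency check on a toy sentence pair with a hypothetical alignment. Unlike the full extraction algorithm used in Moses training, it does not extend phrases over unaligned boundary words.

```python
def extract_phrases(f_words, e_words, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (i, j) pairs meaning f_words[i] is aligned to e_words[j].
    Simplification: unaligned boundary words are not used to extend phrases.
    """
    phrases = set()
    for e_start in range(len(e_words)):
        for e_end in range(e_start, min(len(e_words), e_start + max_len)):
            # Source positions linked to the target span [e_start, e_end].
            f_points = [i for (i, j) in alignment if e_start <= j <= e_end]
            if not f_points:
                continue
            f_start, f_end = min(f_points), max(f_points)
            if f_end - f_start + 1 > max_len:
                continue
            # Consistency: no word in the source span may align outside the target span.
            consistent = all(
                e_start <= j <= e_end
                for (i, j) in alignment
                if f_start <= i <= f_end
            )
            if consistent:
                phrases.add((
                    " ".join(f_words[f_start:f_end + 1]),
                    " ".join(e_words[e_start:e_end + 1]),
                ))
    return phrases


f_words = "das Haus ist klein".split()
e_words = "the house is small".split()
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
for pair in sorted(extract_phrases(f_words, e_words, alignment)):
    print(pair)
```

The "Score phrases" step would then turn counts of these pairs over the whole corpus into translation probabilities, for example by relative frequency.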

Evaluation metrics
- BLEU (BiLingual Evaluation Understudy)
  - Developed at IBM
  - The closer a machine translation is to a professional human translation, the better it is.
- NIST
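BLEU measures n-gram overlap between the system output and one or more reference translations. The sketch below computes a sentence-level score from clipped n-gram precisions (n = 1 to 4, uniform weights) and a brevity penalty. Real evaluations usually compute BLEU over a whole test corpus and often add smoothing, which this sketch omits.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty.

    No smoothing: returns 0.0 if any n-gram precision is zero.
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, c in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], c)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


ref = ["the cat is on the mat"]
print(sentence_bleu("the cat is on the mat", ref))  # 1.0
print(sentence_bleu("the cat sat on the mat", ref, max_n=2))
```

Clipping is what stops a degenerate output such as "the the the the the the the" from scoring well just by repeating a frequent reference word.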

English-Persian MT challenges
- Persian sentence structure is very different from English.
- The structure of the Persian language is very complex.
- Little previous work has been done on this language pair.
- Effective SMT systems rely on very large bilingual corpora, but these are not readily available for the English-Persian language pair.

English-Persian SMT
- Few English-Persian MT systems have been developed.
- Most of them are purely rule-based.
- There are two lines of work on English-Persian SMT:
  - Mohaghegh and Sarrafzadeh (Massey University)
  - Pilevar and Faili (Tehran University)

System1
- Corpus: BBC news

System1 (Cont.)
- Tools: SRILM, GIZA++, Moses

System1: Improved Language Modeling
[Figure not recoverable]

System2
- Corpus:
  - Bilingual (TEP): film subtitles, 3 books, KDE4

System2 (Cont.)
- Corpus:
  - Monolingual: Hamshahri, film subtitles

System2 (Cont.)
- Tools: SRILM, GIZA++, Moses
- PersianSMT with 4-gram Sub-LM

Comparison of PersianSMT with Google Translate
[Figure not recoverable]

Problems in English-Persian SMT
- Compound verbs (alignment problem)
  - Remedy: use a phrase-based SMT system.
  - The remaining problem is inflectional morphology: the large number of inflected verb forms prevents the system from learning to translate all the individual forms of a compound verb.
- Personal pronouns are optional elements in Persian sentences (alignment problem).

Problems (Cont.)
- The system fails to place the elements of the sentence in the right order.
- Possible remedies:
  - Use a phrase-based SMT system.
  - Re-rank the n-best output list and/or reorder the output sentences.
  - Reorder the input sentence before translation, using morpho-syntactic information, so that its word order better resembles that of the target language.
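Persian is a verb-final (SOV) language, whereas English is SVO. As a deliberately simplified illustration of reordering the input before translation, the sketch below applies one hypothetical rule to a POS-tagged English sentence: move the verbs to the end. It assumes Penn Treebank tags supplied by some tagger (not shown), treats every VB* token the same way, and does not handle punctuation or subordinate clauses; practical systems use many rules derived from morpho-syntactic analysis.

```python
def reorder_svo_to_sov(tagged_tokens):
    """Toy pre-reordering rule: move all verbs to the end of the sentence.

    tagged_tokens: list of (word, tag) pairs with Penn Treebank style tags;
    tags starting with "VB" are treated as verbs. Hypothetical single rule,
    for illustration only.
    """
    verbs = [(w, t) for w, t in tagged_tokens if t.startswith("VB")]
    others = [(w, t) for w, t in tagged_tokens if not t.startswith("VB")]
    return [w for w, _ in others + verbs]


tagged = [("I", "PRP"), ("read", "VBD"), ("the", "DT"), ("book", "NN")]
print(" ".join(reorder_svo_to_sov(tagged)))  # I the book read
```

Reordering the source this way lets a phrase-based system translate more monotonically, instead of relying only on its reordering model to move the verb at decoding time.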

References
[1] A. Ramanathan, "Statistical Machine Translation", Ph.D. Seminar Report, Department of Computer Science and Engineering, Indian Institute of Technology.
[2] A. Lopez, "Statistical Machine Translation", ACM Computing Surveys.
[3] M. Mohaghegh and A. Sarrafzadeh, "The First English-Persian Statistical Machine Translation", New Zealand Postgraduate Conference.
[4] M. Mohaghegh and A. Sarrafzadeh, "An Analysis of the Effect of Training Data Variation in English-Persian Statistical Machine Translation", 2009 International Conference on Innovations in Information Technology (IIT 2009).
[5] M. Mohaghegh and A. Sarrafzadeh, "Performance Evaluation of Various Training Data in English-Persian Statistical Machine Translation", to appear in Proceedings of the 10th International Conference on the Statistical Analysis of Textual Data (JADT 2010), Rome, Italy, June 9-11.
[6] M. Mohaghegh and A. Sarrafzadeh, "Improved Language Modeling for English-Persian Statistical Machine Translation", COLING 2010 / SIGMT Workshop, 23rd International Conference on Computational Linguistics, Beijing, China, 28 August 2010.

[7] M. T. Pilevar and H. Faili, "PersianSMT: A First Attempt to English-Persian Statistical Machine Translation", to appear in Proceedings of the 10th International Conference on Statistical Analysis of Textual Data (JADT 2010).