MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011.

Slides:



Advertisements
Similar presentations
Statistical Machine Translation
Advertisements

Rationale for a multilingual corpus for machine translation evaluation Debbie Elliott Anthony Hartley Eric Atwell Corpus Linguistics 2003, Lancaster, England.
The Application of Machine Translation in CADAL Huang Chen, Chen Haiying Zhejiang University Libraries, Hangzhou, China
The Chinese Room: Understanding and Correcting Machine Translation This work has been supported by NSF Grants IIS Solution: The Chinese Room Conclusions.
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Research & Development ICASSP' Analysis of Model Adaptation on Non-Native Speech for Multiple Accent Speech Recognition D. Jouvet & K. Bartkova France.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
Information Extraction Lecture 12 – Multilingual Extraction CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
Improving Machine Translation Quality via Hybrid Systems and Refined Evaluation Methods Andreas Eisele DFKI GmbH and Saarland University Helsinki, November.
Module 1 Dictionary skills Part 1
Chinese Word Segmentation Method for Domain-Special Machine Translation Su Chen; Zhang Yujie; Guo Zhen; Xu Jin’an Beijing Jiaotong University.
Hybridity in MT: Experiments on the Europarl Corpus Declan Groves 24 th May, NCLT Seminar Series 2006.
The current status of Chinese- English EBMT -where are we now Joy (Ying Zhang) Ralf Brown, Robert Frederking, Erik Peterson Aug 2001.
A Phrase-Based, Joint Probability Model for Statistical Machine Translation Daniel Marcu, William Wong(2002) Presented by Ping Yu 01/17/2006.
The current status of Chinese-English EBMT research -where are we now Joy, Ralf Brown, Robert Frederking, Erik Peterson Aug 2001.
Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau.
C SC 620 Advanced Topics in Natural Language Processing Lecture 24 4/22.
MT Summit VIII, Language Technologies Institute School of Computer Science Carnegie Mellon University Pre-processing of Bilingual Corpora for Mandarin-English.
Natural Language Processing Expectation Maximization.
Why do we study English? Form 9, unit 6.
A New Approach for Cross- Language Plagiarism Analysis Rafael Corezola Pereira, Viviane P. Moreira, and Renata Galante Universidade Federal do Rio Grande.
Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alex Fraser Institute for Natural Language Processing University of Stuttgart
An Introduction to SMT Andy Way, DCU. Statistical Machine Translation (SMT) Translation Model Language Model Bilingual and Monolingual Data* Decoder:
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Technical Report of NEUNLPLab System for CWMT08 Xiao Tong, Chen Rushan, Li Tianning, Ren Feiliang, Zhang Zhuyu, Zhu Jingbo, Wang Huizhen
Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
Active Learning for Statistical Phrase-based Machine Translation Gholamreza Haffari Joint work with: Maxim Roy, Anoop Sarkar Simon Fraser University NAACL.
Named Entity Recognition based on Bilingual Co-training Li Yegang School of Computer, BIT.
THE DEPARTMENT OF DIPLOMATIC TRANSLATION
Translating from Morphologically Complex Languages: A Paraphrase-Based Approach Preslav Nakov & Hwee Tou Ng.
Intelligent Database Systems Lab Presenter: WU, MIN-CONG Authors: Zhiyuan Liu, Xinxiong Chen, Yabin Zheng, Maosong Sun 2011, FCCNLL Automatic Keyphrase.
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling Ferhan Ture and Jimmy Lin University of Maryland,
New RCLayout. Do product layout 3 improvements All products Local databases New functionalities.
Martin KayTranslation—Meaning1 Martin Kay Stanford University with thanks to Kevin Knight.
Using Surface Syntactic Parser & Deviation from Randomness Jean-Pierre Chevallet IPAL I2R Gilles Sérasset CLIPS IMAG.
Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alexander Fraser Institute for Natural Language Processing Universität Stuttgart.
For Wednesday No reading Homework –Chapter 23, exercise 15 –Process: 1.Create 5 sentences 2.Select a language 3.Translate each sentence into that language.
Chinese Word Segmentation Adaptation for Statistical Machine Translation Hailong Cao, Masao Utiyama and Eiichiro Sumita Language Translation Group NICT&ATR.
Capitalization DAY 1 COMPLETE SLIDES 1-8
Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem Mikhail Zaslavskiy Marc Dymetman Nicola Cancedda ACL 2009.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
A Multilingual Hierarchy Mapping Method Based on GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of.
September 2004CSAW Extraction of Bilingual Information from Parallel Texts Mike Rosner.
1 Hitting The Right Paraphrases In Good Time Stanley Kok Dept. of Comp. Sci. & Eng. Univ. of Washington Seattle, USA Chris Brockett NLP Group Microsoft.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
Review: Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
Find International Driving Document Translator Online
How to teach what you don’t know: a call for pedagogical experiments Anthony Pym.
Languages of Europe Romance, Germanic, and Slavic.
Is Neural Machine Translation the New State of the Art?
Statistical Machine Translation Part II: Word Alignments and EM
RECENT TRENDS IN SMT By M.Balamurugan, Phd Research Scholar,
Measuring Monolinguality
How to teach translation technologies
8. Translation resources
KantanNeural™ LQR Experiment
Joint Training for Pivot-based Neural Machine Translation
--Mengxue Zhang, Qingyang Li
Statistical Machine Translation Part III – Phrase-based SMT / Decoding
On the Impact of Various Types of Noise on Neural Machine Translation
Statistical Machine Translation Papers from COLING 2004
Machine Translation(MT)
The XMU SMT System for IWSLT 2007
Code-Switching and Style Shifting
Idiap Research Institute University of Edinburgh
Statistical Machine Translation Part VI – Phrase-based Decoding
Presentation transcript:

MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Paper  Intersecting multilingual data for faster and better statistical translations. Association for Computational Linguistics, June Yu Chen Martin Kay Andreas Eisele

Introduction  Statistical Machine Translation (SMT): Spanish (foreign) Translation model language model Broken English model translation model language model

Statistical MT Overview Bilingual data model monolingual data learned parameters Foreign sentence Translation Find the best translation given the foreign sentence and the model English sentence

Problems  Noise from word/phrase alignment  Efficiency: time and space

Problems Sie lieben ihre Kinder nicht They love their children not They don’t love their children ihre Kinder nicht => a) their children are not b) their children Language model =/= They love their children are not Solution: remove b) from translation model

    

Solution: Triangulation SOURCE LANGUAGETARGET LANGUAGE BRIDGE LANGUAGE

Triangulation  Look at source-target phrase pair (S,T)  S => X, T => X, S => T (keep)  S => X, T => Y, S =/> T (drop)  S => ?, T => ?, ??? (keep)

Experimental Design  Database: Europarl (1.3 million)  Divided by maximal sentence length  Spanish => English; bridges: French, Portuguese, Danish, German, Finnish  Models from Moses toolkit  Train, develop, test

Results

Phrase Length  "Long phrases suck” (Koehn, 2003)

Further Research

 German => English

Problems / Conclusion  Throwing away a lot of data (94%)  Picking a bridge language (sometimes)  Limitations of data set (Arabic? Chinese?)  Overall, great!

Thanks!