MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work.

Slides:



Advertisements
Similar presentations
A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
Advertisements

The Application of Machine Translation in CADAL Huang Chen, Chen Haiying Zhejiang University Libraries, Hangzhou, China
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Multi-Mode Survey Management An Approach to Addressing its Challenges
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Sean Powers Florida Institute of Technology ECE 5525 Final: Dr. Veton Kepuska Date: 07 December 2010 Controlling your household appliances through conversation.
BLEU, Its Variants & Its Critics Arthur Chan Prepared for Advanced MT Seminar.
Confidence Estimation for Machine Translation J. Blatz et.al, Coling 04 SSLI MTRG 11/17/2004 Takahiro Shinozaki.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
NICE: Native language Interpretation and Communication Environment Lori Levin, Jaime Carbonell, Alon Lavie, Ralf Brown Carnegie Mellon University.
The current status of Chinese-English EBMT research -where are we now Joy, Ralf Brown, Robert Frederking, Erik Peterson Aug 2001.
TIDES MT Workshop Review. Using Syntax?  ISI-small: –Cross-lingual parsing/decoding Input: Chinese sentence + English lattice built with all possible.
1 Language Model Adaptation in Machine Translation from Speech Ivan Bulyko, Spyros Matsoukas, Richard Schwartz, Long Nguyen, and John Makhoul.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
MT Summit VIII, Language Technologies Institute School of Computer Science Carnegie Mellon University Pre-processing of Bilingual Corpora for Mandarin-English.
Scalable Text Mining with Sparse Generative Models
Overview of Search Engines
Finding Advertising Keywords on Web Pages Scott Wen-tau YihJoshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University.
Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA.
Language Identification of Search Engine Queries Hakan Ceylan Yookyung Kim Department of Computer Science Yahoo! Inc. University of North Texas 2821 Mission.
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Some Thoughts on HPC in Natural Language Engineering Steven Bird University of Melbourne & University of Pennsylvania.
Survey of Semantic Annotation Platforms
Statistical Machine Translation Part IV – Log-Linear Models Alex Fraser Institute for Natural Language Processing University of Stuttgart Seminar:
METEOR-Ranking & M-BLEU: Flexible Matching & Parameter Tuning for MT Evaluation Alon Lavie and Abhaya Agarwal Language Technologies Institute Carnegie.
METEOR: Metric for Evaluation of Translation with Explicit Ordering An Automatic Metric for MT Evaluation with Improved Correlations with Human Judgments.
July 24, 2007GALE Update: Alon Lavie1 Statistical Transfer and MEMT Activities Multi-Engine Machine Translation –MEMT service within the cross-GALE IOD.
MEMT: Multi-Engine Machine Translation Machine Translation Alon Lavie February 19, 2007.
Carnegie Mellon School of Computer Science Copyright © 2001, Carnegie Mellon. All Rights Reserved. JAVELIN Project Briefing 1 AQUAINT Phase I Kickoff December.
Comparison of the SPHINX and HTK Frameworks Processing the AN4 Corpus Arthur Kunkle ECE 5526 Fall 2008.
Recent Major MT Developments at CMU Briefing for Joe Olive February 5, 2008 Alon Lavie and Stephan Vogel Language Technologies Institute Carnegie Mellon.
NUDT Machine Translation System for IWSLT2007 Presenter: Boxing Chen Authors: Wen-Han Chao & Zhou-Jun Li National University of Defense Technology, China.
Advanced MT Seminar Spring 2008 Instructors: Alon Lavie and Stephan Vogel.
Transfer-based MT with Strong Decoding for a Miserly Data Scenario Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:
MEMT: Multi-Engine Machine Translation Faculty: Alon Lavie, Robert Frederking, Ralf Brown, Jaime Carbonell Students: Shyamsundar Jayaraman, Satanjeev Banerjee.
Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.
Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alexander Fraser Institute for Natural Language Processing Universität Stuttgart.
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce Leonidas Akritidis Panayiotis Bozanis Department of Computer & Communication.
FEISGILTT Dublin 2014 Yves Savourel ENLASO Corporation QuEst Integration in Okapi This presentation was made possible by This project is sponsored by the.
Combining GATE and UIMA Ian Roberts. University of Sheffield NLP 2 Overview Introduction to UIMA Comparison with GATE Mapping annotations between GATE.
Digital Library The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Toward an Open Source Textual Entailment Platform (Excitement Project) Bernardo Magnini (on behalf of the Excitement consortium) 1 STS workshop, NYC March.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
MEMT: Multi-Engine Machine Translation Faculty: Alon Lavie, Jaime Carbonell Students and Staff: Gregory Hanneman, Justin Merrill (Shyamsundar Jayaraman,
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Steven Perry Dave Vieglais. W a s a b i Web Applications for the Semantic Architecture of Biodiversity Informatics Overview WASABI is a framework for.
Pastra and Saggion, EACL 2003 Colouring Summaries BLEU Katerina Pastra and Horacio Saggion Department of Computer Science, Natural Language Processing.
Large Vocabulary Data Driven MT: New Developments in the CMU SMT System Stephan Vogel, Alex Waibel Work done in collaboration with: Ying Zhang, Alicia.
July 24, 2007GALE Update: Alon Lavie1 Statistical Transfer and MEMT Activities Chinese-to-English Statistical Transfer MT system (Stat-XFER) –Developed.
CMU Statistical-XFER System Hybrid “rule-based”/statistical system Scaled up version of our XFER approach developed for low-resource languages Large-coverage.
MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching Faculty: Alon Lavie, Jaime Carbonell Students and Staff: Gregory Hanneman, Justin.
CMU MilliRADD Small-MT Report TIDES PI Meeting 2002 The CMU MilliRADD Team: Jaime Carbonell, Lori Levin, Ralf Brown, Stephan Vogel, Alon Lavie, Kathrin.
Combining GATE and UIMA Ian Roberts. 2 Overview Introduction to UIMA Comparison with GATE Mapping annotations between GATE and UIMA.
Message Source Linguistic Channel Articulatory Channel Acoustic Channel Observable: MessageWordsSounds Features Bayesian formulation for speech recognition:
MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work.
MEMT: Multi-Engine Machine Translation Faculty: Alon Lavie, Robert Frederking, Ralf Brown, Jaime Carbonell Students: Shyamsundar Jayaraman, Satanjeev Banerjee.
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
1 Minimum Bayes-risk Methods in Automatic Speech Recognition Vaibhava Geol And William Byrne IBM ; Johns Hopkins University 2003 by CRC Press LLC 2005/4/26.
Mastering the Pipeline CSCI-GA.2590 Ralph Grishman NYU.
Multi-Engine Machine Translation
METEOR: Metric for Evaluation of Translation with Explicit Ordering An Improved Automatic Metric for MT Evaluation Alon Lavie Joint work with: Satanjeev.
Search Engine Architecture
Language Technologies Institute Carnegie Mellon University
Monoligual Semantic Text Alignment and its Applications in Machine Translation Alon Lavie March 29, 2012.
Evergreen Data Systems
Database Management Systems
Statistical Machine Translation Part III – Phrase-based SMT / Decoding
CMU Y2 Rosetta GnG Distillation
Presentation transcript:

MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with: Gregory Hanneman, Justin Merrill, Shyamsundar Jayaraman, Satanjeev Banerjee, Jaime Carbonell

March 22, 2006GALE: MEMT2 MEMT Goals and Approach Scientific Challenge: –How to combine the output of multiple MT engines into a synthetic output that outperforms the originals in translation quality –Synthetic combination of the output from the original systems, NOT just selecting the best system Engineering Challenge: –How to integrate multiple distributed translation engines and the MEMT combination engine in a common framework that supports ongoing development and evaluation

March 22, 2006GALE: MEMT3 Synthetic Combination MEMT Two Stage Approach: 1.Identify common words and phrases across the translations provided by the engines 2.Decode: search the space of synthetic combinations of words/phrases and select the highest scoring combined translation Example: 1.announced afghan authorities on saturday reconstituted four intergovernmental committees 2.The Afghan authorities on Saturday the formation of the four committees of government

March 22, 2006GALE: MEMT4 Synthetic Combination MEMT Two Stage Approach: 1.Identify common words and phrases across the translations provided by the engines 2.Decode: search the space of synthetic combinations of words/phrases and select the highest scoring combined translation Example: 1.announced afghan authorities on saturday reconstituted four intergovernmental committees 2.The Afghan authorities on Saturday the formation of the four committees of government MEMT: the afghan authorities announced on Saturday the formation of four intergovernmental committees

March 22, 2006GALE: MEMT5 The Word Alignment Matcher Developed by Satanjeev Banerjee as a component in our METEOR Automatic MT Evaluation metric Finds maximal alignment match with minimal “crossing branches” Allows alignment of: –Identical words –Morphological variants of words –Synonymous words (based on WordNet synsets) Implementation: Clever search algorithm for best match using pruning of sub-optimal sub- solutions

March 22, 2006GALE: MEMT6 Matcher Example the sri lanka prime minister criticizes the leader of the country President of Sri Lanka criticized by the country’s Prime Minister

March 22, 2006GALE: MEMT7 Scoring MEMT Hypotheses Scoring: –Word confidence score [0,1] based on engine confidence and reinforcement from alignments of the words –LM score based on trigram LM –Log-linear combination: weighted sum of logs of confidence score and LM score –Select best scoring hypothesis based on: Total score (bias towards shorter hypotheses) Average score per word

March 22, 2006GALE: MEMT8 Demo

March 22, 2006GALE: MEMT9 Example IBM: victims russians are one man and his wife and abusing their eight year old daughter plus a ( 11 and 7 years ) man and his wife and driver, egyptian nationality. : ISI: The victims were Russian man and his wife, daughter of the most from the age of eight years in addition to the young girls ) 11 7 years ( and a man and his wife and the bus driver Egyptian nationality. : CMU: the victims Cruz man who wife and daughter both critical of the eight years old addition to two Orient ( 11 ) 7 years ) woman, wife of bus drivers Egyptian nationality. : MEMT Sentence : Selected : the victims were russian man and his wife and daughter of the eight years from the age of a 11 and 7 years in addition to man and his wife and bus drivers egyptian nationality Oracle : the victims were russian man and wife and his daughter of the eight years old from the age of a 11 and 7 years in addition to the man and his wife and bus drivers egyptian nationality young girls

March 22, 2006GALE: MEMT10 System Development Initial development tests performed on TIDES 2003 Arabic-to-English MT data, using IBM, ISI and CMU SMT system output Evaluation tests performed on Arabic-to- English EBMT Apptek and SYSTRAN system output and on three Chinese-to-English COTS systems Tests on GALE dry-run data currently in progress: –MT systems from IBM, CMU, UMD

March 22, 2006GALE: MEMT11 Experimental Results: Arabic-to-English SystemMETEOR Score Apptek.4241 EBMT.4231 Systran.4405 Choosing best online translation.4432 MEMT.5185 Best hypothesis generated by MEMT.5883

March 22, 2006GALE: MEMT12 Architecture and Engineering Challenge: How do we construct an effective architecture for running MEMT within large- scale distributed projects? –Example: GALE Project –Multiple MT engines running at different locations –Input may be text or output of speech recognizers, Output may go downstream to other applications (IE, Summarization, TDT) Approach: Using IBM’s UIMA: Unstructured Information Management Architecture –Provides support for building robust processing “workflows” with heterogeneous components –Components act as “annotators” at the character level within documents

March 22, 2006GALE: MEMT13 UIMA-based MEMT MT engines and MEMT engine are set up as distributed servers: –Communication over socket connections –Sentence-by-sentence translation Java “wrappers” convert these into UIMA-style annotator components UIMA-based “workflows” implement a variety of a- synchronous tasks, with results stored in a common Annotations Database (ADB) –Translation workflows –MEMT workflow –Evaluation/scoring workflow ADB and ADB Collection Reader/Consumer components developed at CMU by Eric Nyberg’s group

March 22, 2006GALE: MEMT14 UIMA-based MEMT MEMT Workflow: –Retrieve document translation annotations labeled by X, Y, Z from ADB –“Annotate” the document with a new MEMT annotation –Write back MEMT annotation into ADB

March 22, 2006GALE: MEMT15 Conclusions New sentence-level MEMT approach with nice properties and encouraging results Easy to run on both research and COTS systems UIMA-based architecture design for effective integration in large distributed systems/projects –Pilot study has been very positive –Can serve as a model for integration framework(s) under GALE

March 22, 2006GALE: MEMT16 Open Research Issues Main Open Research Issues: –Improvements to the underlying algorithm: better word alignments, “artificial” word alignments –Confidence scores at the sentence or word/phrase level –Engines providing phrasal information –Decoding is still suboptimal Oracle scores show there is much room for improvement Need for additional discriminant features –Extend approach to Multi-Engine SR combination –Engineering issues: synchronization, human friendly interfaces with workflows

March 22, 2006GALE: MEMT17 References 2005, Jayaraman, S. and A. Lavie. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Companion Volume of Proceedings of the 43th Annual Meeting of the Association of Computational Linguistics (ACL-2005), Ann Arbor, Michigan, June 2005.Jayaraman, S. and A. Lavie. "Multi-Engine Machine Translation Guided by Explicit Word Matching" 2005, Jayaraman, S. and A. Lavie. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT- 2005), Budapest, Hungary, May 2005.Jayaraman, S. and A. Lavie. "Multi-Engine Machine Translation Guided by Explicit Word Matching"