1 MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching
Alon Lavie
Language Technologies Institute, Carnegie Mellon University
Joint work with: Gregory Hanneman, Justin Merrill, Shyamsundar Jayaraman, Satanjeev Banerjee, Jaime Carbonell

2 MEMT Goals and Approach
(GALE: MEMT, March 22, 2006)
Scientific challenge:
– How to combine the output of multiple MT engines into a synthetic output that outperforms the originals in translation quality
– Synthetic combination of the output from the original systems, NOT just selecting the best system
Engineering challenge:
– How to integrate multiple distributed translation engines and the MEMT combination engine in a common framework that supports ongoing development and evaluation

4 Synthetic Combination MEMT
Two-stage approach:
1. Identify common words and phrases across the translations provided by the engines
2. Decode: search the space of synthetic combinations of words/phrases and select the highest-scoring combined translation
Example:
Translation 1: announced afghan authorities on saturday reconstituted four intergovernmental committees
Translation 2: The Afghan authorities on Saturday the formation of the four committees of government
MEMT: the afghan authorities announced on Saturday the formation of four intergovernmental committees

5 The Word Alignment Matcher
Developed by Satanjeev Banerjee as a component of our METEOR automatic MT evaluation metric
Finds the maximal alignment match with the minimal number of “crossing branches”
Allows alignment of:
– Identical words
– Morphological variants of words
– Synonymous words (based on WordNet synsets)
Implementation: a search algorithm for the best match that prunes sub-optimal partial solutions

6 Matcher Example
Translation 1: the sri lanka prime minister criticizes the leader of the country
Translation 2: President of Sri Lanka criticized by the country’s Prime Minister
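The matcher's objective on slide 5 (as many links as possible, then as few crossing branches as possible) can be illustrated with a minimal sketch. It assumes only the identical-word stage (case-insensitive exact match); the morphological and WordNet synonym stages are omitted, and the exhaustive search with a simple upper-bound prune is an illustrative stand-in, not Banerjee's actual METEOR implementation. The function names `align_identical_words` and `count_crossings` are hypothetical.

```python
from typing import List, Set, Tuple

Alignment = List[Tuple[int, int]]  # (position in translation A, position in translation B)


def count_crossings(pairs: Alignment) -> int:
    """Count pairs of links (i1, j1), (i2, j2) whose order differs between the two sides."""
    crossings = 0
    for k, (i1, j1) in enumerate(pairs):
        for i2, j2 in pairs[k + 1:]:
            if (i1 - i2) * (j1 - j2) < 0:
                crossings += 1
    return crossings


def align_identical_words(hyp_a: str, hyp_b: str) -> Alignment:
    """Maximal one-to-one alignment of identical (case-insensitive) words, ties broken by fewest crossings."""
    a, b = hyp_a.lower().split(), hyp_b.lower().split()
    # For each word of A, the positions in B holding the same surface form.
    candidates = [[j for j, w in enumerate(b) if w == word] for word in a]
    best: Alignment = []
    best_key = (-1, 0)  # (number of links, -crossings): larger is better

    def search(i: int, used: Set[int], pairs: Alignment) -> None:
        nonlocal best, best_key
        # Prune: even linking every remaining word that has a candidate cannot reach the best size.
        upper_bound = len(pairs) + sum(1 for c in candidates[i:] if c)
        if upper_bound < best_key[0]:
            return
        if i == len(a):
            key = (len(pairs), -count_crossings(pairs))
            if key > best_key:
                best, best_key = list(pairs), key
            return
        for j in candidates[i]:
            if j not in used:
                used.add(j)
                pairs.append((i, j))
                search(i + 1, used, pairs)
                pairs.pop()
                used.remove(j)
        search(i + 1, used, pairs)  # option: leave word i unaligned

    search(0, set(), [])
    return best
```

On the slide 6 pair, this exact-match stage links six words (the, sri, lanka, prime, minister, of) and prefers linking the final "the" of translation 1 because that choice yields the fewest crossings; "criticizes"/"criticized" would only be linked by the morphological stage. The search is exponential in the worst case, so it is only suitable for short sentences.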

7 Scoring MEMT Hypotheses
Scoring:
– Word confidence score in [0,1], based on engine confidence and reinforcement from alignments of the words
– LM score based on a trigram LM
– Log-linear combination: weighted sum of the logs of the confidence score and the LM score
– Select the best-scoring hypothesis based on either:
   Total score (biased towards shorter hypotheses)
   Average score per word
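The two selection criteria on slide 7 can be contrasted with a small sketch. The slide does not give the exact word-confidence formula or the weights, so this assumes per-word confidences in (0, 1] and per-word trigram LM log-probabilities are already computed; `Hypothesis`, `w_conf`, `w_lm` and the example values are hypothetical.

```python
import math
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    words: List[str]
    confidences: List[float]  # assumed per-word confidence in (0, 1]
    lm_logprobs: List[float]  # assumed per-word trigram LM log-probabilities (<= 0)


def total_score(h: Hypothesis, w_conf: float = 1.0, w_lm: float = 1.0) -> float:
    """Log-linear combination: weighted log confidence plus weighted LM log-probability, summed over words."""
    return sum(w_conf * math.log(c) + w_lm * lp
               for c, lp in zip(h.confidences, h.lm_logprobs))


def average_score(h: Hypothesis, w_conf: float = 1.0, w_lm: float = 1.0) -> float:
    """Total score normalized by hypothesis length."""
    return total_score(h, w_conf, w_lm) / len(h.words)


def select_best(hyps: List[Hypothesis], per_word: bool = True) -> Hypothesis:
    """Pick the best hypothesis by average-per-word score or by total score."""
    scorer = average_score if per_word else total_score
    return max(hyps, key=scorer)


short_hyp = Hypothesis(["afghan", "authorities"], [0.9, 0.9], [-2.0, -2.0])
long_hyp = Hypothesis(["the", "afghan", "authorities", "announced"], [0.95] * 4, [-1.5] * 4)
```

Since every per-word term is at most zero, each added word can only lower the total, which is why the total score favors the shorter `short_hyp` here, while the per-word average favors `long_hyp`, whose individual words score better.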

8 Demo

9 Example
IBM: victims russians are one man and his wife and abusing their eight year old daughter plus a ( 11 and 7 years ) man and his wife and driver, egyptian nationality. : 0.6327
ISI: The victims were Russian man and his wife, daughter of the most from the age of eight years in addition to the young girls ) 11 7 years ( and a man and his wife and the bus driver Egyptian nationality. : 0.7054
CMU: the victims Cruz man who wife and daughter both critical of the eight years old addition to two Orient ( 11 ) 7 years ) woman, wife of bus drivers Egyptian nationality. : 0.5293
MEMT sentence:
Selected: the victims were russian man and his wife and daughter of the eight years from the age of a 11 and 7 years in addition to man and his wife and bus drivers egyptian nationality. 0.7647 -3.25376
Oracle: the victims were russian man and wife and his daughter of the eight years old from the age of a 11 and 7 years in addition to the man and his wife and bus drivers egyptian nationality young girls. 0.7964 -3.44128

10 System Development
Initial development tests performed on TIDES 2003 Arabic-to-English MT data, using IBM, ISI and CMU SMT system output
Evaluation tests performed on Arabic-to-English EBMT, Apptek and SYSTRAN system output, and on three Chinese-to-English COTS systems
Tests on GALE dry-run data currently in progress:
– MT systems from IBM, CMU, UMD

11 Experimental Results: Arabic-to-English
System                               METEOR score
Apptek                               .4241
EBMT                                 .4231
Systran                              .4405
Choosing best online translation     .4432
MEMT                                 .5185
Best hypothesis generated by MEMT    .5883
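Expressed as relative gains, the METEOR scores in the table work out as follows (plain arithmetic on the reported numbers):

```latex
\[
\frac{0.5185 - 0.4405}{0.4405} \approx 0.177 \quad (\text{MEMT vs. best single system, Systran})
\]
\[
\frac{0.5185 - 0.4432}{0.4432} \approx 0.170 \quad (\text{MEMT vs. choosing best online translation})
\]
\[
0.5883 - 0.5185 = 0.0698 \quad (\text{gap between MEMT and the best hypothesis it generated})
\]
```

The last line quantifies the decoding headroom mentioned under open research issues: the best hypothesis MEMT generates scores about 0.07 METEOR above the one it selects.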

12 Architecture and Engineering
Challenge: How do we construct an effective architecture for running MEMT within large-scale distributed projects?
– Example: GALE Project
– Multiple MT engines running at different locations
– Input may be text or the output of speech recognizers; output may go downstream to other applications (IE, Summarization, TDT)
Approach: Use IBM’s UIMA (Unstructured Information Management Architecture)
– Provides support for building robust processing “workflows” with heterogeneous components
– Components act as “annotators” at the character level within documents

13 UIMA-based MEMT
MT engines and the MEMT engine are set up as distributed servers:
– Communication over socket connections
– Sentence-by-sentence translation
Java “wrappers” convert these into UIMA-style annotator components
UIMA-based “workflows” implement a variety of asynchronous tasks, with results stored in a common Annotations Database (ADB):
– Translation workflows
– MEMT workflow
– Evaluation/scoring workflow
ADB and ADB Collection Reader/Consumer components developed at CMU by Eric Nyberg’s group
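The server-plus-wrapper arrangement on slide 13 can be sketched in a few lines. The slide does not specify the wire protocol, so this assumes a hypothetical newline-delimited UTF-8 protocol (one sentence per line in, one translation per line out); `ToyEngineHandler` just uppercases its input as a stand-in for a real MT engine, and Python stands in for the Java wrappers. All names here are hypothetical.

```python
import socket
import socketserver
import threading
from typing import List


class ToyEngineHandler(socketserver.StreamRequestHandler):
    """Hypothetical MT engine server: one source sentence per line in, one translation per line out."""

    def handle(self) -> None:
        while True:
            raw = self.rfile.readline()
            if not raw:  # client closed the connection
                break
            sentence = raw.decode("utf-8").rstrip("\n")
            translation = sentence.upper()  # stand-in for a real MT engine
            self.wfile.write((translation + "\n").encode("utf-8"))


def start_toy_server() -> socketserver.ThreadingTCPServer:
    """Start the toy engine on an ephemeral localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ToyEngineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def translate_document(host: str, port: int, sentences: List[str]) -> List[str]:
    """Wrapper-side client: send sentences one at a time and wait for each translation."""
    translations = []
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for sentence in sentences:
                sock.sendall((sentence + "\n").encode("utf-8"))
                translations.append(reader.readline().rstrip("\n"))
    return translations
```

Usage: `server = start_toy_server()`, then `translate_document(*server.server_address, ["hello world"])`. Keeping one connection open per document while exchanging sentences one at a time mirrors the sentence-by-sentence translation on the slide; a real wrapper would add timeouts and error handling.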

14 UIMA-based MEMT
MEMT workflow:
– Retrieve the document translation annotations labeled X, Y, Z from the ADB
– “Annotate” the document with a new MEMT annotation
– Write the MEMT annotation back into the ADB
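The three workflow steps above can be sketched with an in-memory dictionary standing in for the ADB. The dictionary layout, the `memt_workflow` function and its parameters are hypothetical, and the `combine` argument is a placeholder for the MEMT engine, not the combination algorithm itself.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the Annotations Database (ADB):
# maps (document id, annotation label) -> list of sentence-level translations.
ADB = Dict[Tuple[str, str], List[str]]


def memt_workflow(adb: ADB, doc_id: str, engine_labels: List[str],
                  combine: Callable[[List[str]], str],
                  memt_label: str = "MEMT") -> List[str]:
    # 1. Retrieve the translation annotations produced by each engine (e.g. X, Y, Z).
    per_engine = [adb[(doc_id, label)] for label in engine_labels]
    if len({len(t) for t in per_engine}) != 1:
        raise ValueError("engines disagree on the number of sentences")
    # 2. "Annotate" the document: combine the engines' outputs sentence by sentence.
    memt = [combine(list(parallel)) for parallel in zip(*per_engine)]
    # 3. Write the new MEMT annotation back into the ADB.
    adb[(doc_id, memt_label)] = memt
    return memt


adb: ADB = {
    ("doc1", "X"): ["a b", "c"],
    ("doc1", "Y"): ["a b c", "c d"],
    ("doc1", "Z"): ["a", "e"],
}
# Placeholder combiner: keep the longest translation (NOT the MEMT algorithm).
result = memt_workflow(adb, "doc1", ["X", "Y", "Z"], combine=lambda ts: max(ts, key=len))
```

Passing the combiner in as a function keeps the workflow independent of the engine behind it, loosely echoing how the annotator wrappers decouple engines from workflows.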

15 Conclusions
New sentence-level MEMT approach with nice properties and encouraging results
Easy to run on both research and COTS systems
UIMA-based architecture design for effective integration in large distributed systems/projects
– Pilot study has been very positive
– Can serve as a model for integration framework(s) under GALE

16 Open Research Issues
Main open research issues:
– Improvements to the underlying algorithm: better word alignments, “artificial” word alignments
– Confidence scores at the sentence or word/phrase level
– Engines providing phrasal information
– Decoding is still suboptimal:
   Oracle scores show there is much room for improvement
   Need for additional discriminant features
– Extend approach to multi-engine SR combination
– Engineering issues: synchronization, human-friendly interfaces with workflows

17 References
Jayaraman, S. and A. Lavie. 2005. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Companion Volume of the Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), Ann Arbor, Michigan, June 2005.
Jayaraman, S. and A. Lavie. 2005. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT-2005), Budapest, Hungary, May 2005.

