July 24, 2007 GALE Update: Alon Lavie
Statistical Transfer and MEMT Activities

Presentation transcript:

Slide 1: Statistical Transfer and MEMT Activities
Chinese-to-English Statistical Transfer MT system (Stat-XFER)
–Developed over the past year
–Included in Rosetta Phase-II GNG evaluation
–Team: Erik Peterson and Alon Lavie
Multi-Engine Machine Translation
–MEMT service within the cross-GALE IOD
–MEMT system combination for GNG evaluation within Rosetta consortium
–Team: Greg Hanneman, Shilpa Aurora, Dave Svoboda, Alon Lavie and Eric Nyberg

Slide 2: CMU Statistical-XFER System
Truly hybrid “rule-based”/statistical system
Scaled-up version of our XFER approach developed for low-resource languages (under NSF funding)
Large-coverage “clean” bilingual lexicon + syntactic transfer rules (human written + extracted from data)
XFER formalism is a Synchronous CFG + feature unification constraints
Supports morphological analysis and generation as “plug in” components
Two-stage translation process:
–Build lattice of translation constituents at all grammar levels “bottom-up”
–Monotonic decoder selects best combination of lattice edges
–Beam-search with multiple features at both stages
–Features include: LM, fragmentation, lexical probabilities, length, etc.
–Optimized log-linear combination of feature scores
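The second stage above (a monotonic decoder choosing the best combination of lattice edges) can be illustrated with a minimal Python sketch. It is not the Stat-XFER decoder: it assumes each edge already carries a single combined log-score, replaces beam search with exact dynamic programming, and models fragmentation as a flat per-edge penalty. The names `Edge`, `decode_monotonic` and `frag_penalty` are hypothetical.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    start: int    # index of the first source word covered
    end: int      # index one past the last source word covered
    text: str     # English rendering of the span
    score: float  # assumed combined log-score of the edge (higher is better)

def decode_monotonic(edges, n_words, frag_penalty=1.0):
    """Pick the best left-to-right sequence of edges covering words [0, n_words)."""
    neg_inf = float("-inf")
    best = [neg_inf] * (n_words + 1)  # best[i]: best score covering words [0, i)
    back = [None] * (n_words + 1)     # back[i]: last edge on the best path to i
    best[0] = 0.0
    # Visit edges by end position so best[edge.start] is final when it is read.
    for edge in sorted(edges, key=lambda e: e.end):
        if best[edge.start] == neg_inf:
            continue
        # Every edge used pays the penalty, so fragmented paths score lower.
        cand = best[edge.start] + edge.score - frag_penalty
        if cand > best[edge.end]:
            best[edge.end] = cand
            back[edge.end] = edge
    if best[n_words] == neg_inf:
        return None, neg_inf
    pieces, pos = [], n_words
    while pos > 0:
        pieces.append(back[pos].text)
        pos = back[pos].start
    return " ".join(reversed(pieces)), best[n_words]

# Toy lattice over a three-word source sentence (illustrative values only).
edges = [
    Edge(0, 1, "the scientists", -0.5),
    Edge(1, 2, "completed", -0.5),
    Edge(2, 3, "sequencing", -0.5),
    Edge(1, 3, "completed sequencing", -1.25),
]
translation, score = decode_monotonic(edges, 3)
print(translation, score)
```

In this toy lattice the two-edge path wins over the three-edge path even though its raw edge scores are lower, because it pays the fragmentation penalty only twice.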

Slide 3: Chinese-English S-XFER System
Bilingual lexicon: over 1.1 million entries (multiple resources, incl. ADSO, Wikipedia, extracted base NPs)
Manual syntactic XFER grammar: 76 rules! (mostly NPs, a few PPs, and reordering of NPs/PPs within VPs)
Multiple overlapping Chinese word segmentations
English morphology generation
Uses CMU SMT-group’s Suffix-Array LM toolkit for LM
Current Performance (GALE dev-test; B = BLEU, M = METEOR):
–NW XFER: 10.89(B)/0.4509(M); Best (UMD): 15.58(B)/0.4769(M)
–NG XFER: 8.92(B)/0.4229(M); Best (UMD): 12.96(B)/0.4455(M)
In Progress:
–Automatic extraction of “clean” base NPs from parallel data
–Automatic learning and extraction of high-quality transfer-rules from parallel data

Slide 4: Recent Performance Analysis
What fraction of the time does each MT system produce the best translation (sentence-by-sentence)?
Evaluated on Chinese GALE dev-test (text) data

System                       BLEU                 METEOR
CMU-PhraseSyntaxCombination  60 of 284 (21.1%)    41 of 284 (14.4%)
IBM-smt                      50 of 284 (17.6%)    49 of 284 (17.2%)
IBM-ylee                     64 of 284 (22.5%)    50 of 284 (17.6%)
maryland-jhu-combination     71 of 284 (25.0%)    77 of 284 (27.1%)
Stat-XFER                    32 of 284 (11.2%)    56 of 284 (19.7%)
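A sentence-by-sentence analysis of this kind can be sketched in a few lines of Python. The sketch assumes per-sentence metric scores are available for every system and that a tie credits each tied system. The slide does not say how ties were handled, so that rule, the function name `best_system_counts` and the toy scores are all assumptions.

```python
def best_system_counts(scores):
    """Count, per system, the sentences on which it has the top score.

    scores maps a system name to its list of per-sentence scores;
    all lists are assumed to have the same length.
    """
    systems = list(scores)
    n_sentences = len(scores[systems[0]])
    wins = {name: 0 for name in systems}
    for i in range(n_sentences):
        top = max(scores[name][i] for name in systems)
        for name in systems:
            if scores[name][i] == top:  # assumption: ties credit every tied system
                wins[name] += 1
    return wins, n_sentences

# Toy per-sentence scores for two hypothetical systems (not GALE data).
toy_scores = {
    "sys_a": [0.30, 0.10, 0.25, 0.40],
    "sys_b": [0.20, 0.15, 0.25, 0.10],
}
wins, total = best_system_counts(toy_scores)
for name, count in wins.items():
    print(f"{name}: {count} of {total} ({100 * count / total:.1f}%)")
```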

Slide 5: Translation Example
REFERENCE: When responding to whether it is possible to extend Russian fleet's stationing deadline at the Crimean peninsula, Yanukovych replied, "Without a doubt.
Stat-XFER (0.3989): In reply to whether the possibility to extend the Russian fleet stationed in Crimea Pen. left the deadline of the problem, Yanukovich replied : " of course.
IBM-ylee (0.2203): In response to the possibility to extend the deadline for the presence in Crimea peninsula, the Queen Vic said : " of course.
CMU-SMT (0.2067): In response to a possible extension of the fleet in the Crimean Peninsula stay on the issue, Yanukovych vetch replied : " of course.
maryland-hiero (0.1878): In response to the possibility of extending the mandate of the Crimean peninsula in, replied: "of course.
IBM-smt (0.1862): The answer is likely to be extended the Crimean peninsula of the presence of the problem, Yanukovych said: " Of course.
CMU-syntax (0.1639): In response to the possibility of extension of the presence in the Crimean Peninsula, replied : " of course.

Slide 7: Synthetic Combination MEMT
Two Stage Approach:
1. Identify common words and phrases across the translations provided by the engines
2. Decode: search the space of synthetic combinations of words/phrases and select the highest scoring combined translation
Example:
1. announced afghan authorities on saturday reconstituted four intergovernmental committees
2. The Afghan authorities on Saturday the formation of the four committees of government
MEMT: the afghan authorities announced on Saturday the formation of four intergovernmental committees
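The first stage, identifying common words across engine outputs, can be illustrated with a deliberately simple Python sketch. It assumes case-insensitive exact matching and a greedy left-to-right pairing in which each word is used at most once. The actual MEMT matcher is not described on the slide, and `match_words` is a hypothetical name.

```python
def match_words(hyp_a, hyp_b):
    """Greedily pair identical words (ignoring case) between two hypotheses.

    Returns (index_in_a, index_in_b, word) triples; each word of hyp_b
    is paired at most once.
    """
    words_a = hyp_a.lower().split()
    words_b = hyp_b.lower().split()
    used_b = set()
    pairs = []
    for i, word in enumerate(words_a):
        for j, other in enumerate(words_b):
            if j not in used_b and word == other:
                used_b.add(j)
                pairs.append((i, j, word))
                break
    return pairs

engine_1 = ("announced afghan authorities on saturday reconstituted "
            "four intergovernmental committees")
engine_2 = ("The Afghan authorities on Saturday the formation of "
            "the four committees of government")
pairs = match_words(engine_1, engine_2)
print([word for _, _, word in pairs])
```

On the slide's two example outputs this pairs `afghan`, `authorities`, `on`, `saturday`, `four` and `committees`, which are the words both engines agree on and which also appear in the combined MEMT output.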

Slide 8: Scoring MEMT Hypotheses
Scoring Features:
–Word confidence score [0,1] based on engine confidence and reinforcement from alignments
–LM score based on suffix-array 5-gram LM
–Exponential “occurrence-based” long n-gram model
–Select best scoring hypothesis based on optimized log-linear combination of feature scores
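A log-linear combination of feature scores is a weighted sum of the feature values, and the hypothesis with the highest sum is selected. The Python sketch below shows only that selection step. The feature names, feature values and weights are invented for illustration; in the real system the weights come from the parameter optimization mentioned on the slides.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values (a log-linear model score)."""
    return sum(weights[name] * value for name, value in features.items())

def select_best(hypotheses, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))

# Hypothetical weights and feature values, not taken from the MEMT system.
weights = {"confidence": 2.0, "lm": 0.5, "ngram": 0.25}
hypotheses = [
    {"text": "hypothesis A",
     "features": {"confidence": 0.5, "lm": -10.0, "ngram": 2.0}},
    {"text": "hypothesis B",
     "features": {"confidence": 0.75, "lm": -8.0, "ngram": 1.0}},
]
best = select_best(hypotheses, weights)
print(best["text"], loglinear_score(best["features"], weights))
```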

Slide 10: MEMT – Main Activities
Preserving Source Alignments: target phrases that originate from the same source word can be marked as unbreakable units (performance effects are currently under testing…)
LM experiments using CMU’s Suffix-Array LM toolkit and new LM features (work in progress)
Case Restoration: scheme for selecting the case of words in final MEMT output
Handling of varying number of MEMT input engines
Parameter Optimization for MEMT decoder
Upgrades to MEMT software infrastructure to support IOD-3 requirements: UIMA v2.0 and ActiveMQ
MEMT server is up 24/7 for ongoing IOD runs
Active participation in phase-II GNG evaluation

Slide 11: Recent Evaluation Results

Slide 12: MEMT Fresh Example
Extracted from IOD overnight run of Sunday, July 22
IBM_MT: “defence minister announced that the Afghan security forces and Afghan joint NATO began this morning the process of wide security”
RWTH_MT: “afghan defence minister that the security forces, a joint us-afghan started today in the morning, the implementation of a wide security operation.”
CMU_MEMT: “afghan defence minister announced that the security forces and Afghan joint NATO began this morning in the process of the a wide security.”

Slide 13: Future Plans and New Directions
Classifiers for Hypothesis Selection
–Simpler than MEMT, but perhaps more effective (given recent analysis results)
Constrained Search-spaces for MEMT
–Other groups use more constrained combination spaces with good results – can we also do better?
Discriminative feature-rich LMs for MT (and MEMT)
–Standard statistical LMs are not sufficiently discriminative for MT
–New NSF grant (with Rebecca Hwa) to explore novel feature-rich “occurrence-based” models
M-TER
–Create a fully automatic metric that approximates H-TER using the stemming and synonymy capabilities of METEOR to create “targeted” references

Slide 14: Translation Example
REFERENCE: I believe there is a detail that worth mentioning.
Stat-XFER (0.5799): There is a detail, I believe worth together.
maryland-hiero (0.3221): There is a one details, I think they deserve.
IBM-ylee (0.2525): There is one of the details, I think it is worth.
IBM-smt (0.2062): There is a details, I think is a way.
CMU-SMT (0.2000): One of the details, I think it is worth to join in.
CMU-syntax (0.1546): a detail, I think it is a cold front.
maryland-jhu (0.1031): One to details, I think they deserve.

Slide 15: Translation Example
REFERENCE: Xinhua News Agency, Zhuhai, Nov. 2 (reporters Wang Hongshan, Li Xuanliang) - Yang Liwei, Deputy Director of China Astronaut Research and Training Center and 'Space Hero', said in Zhuhai on the 1st that China has no plan to select female astronaut yet.
Stat-XFER (0.6349): The Xinhua News Agency and the Zhuhai November seconds ( reporters Wang Hongshan, li announced good ) Chinese astronaut research and training center deputy director, " aerospace hero " Yang Liwei and the first said in zhuhai, china now not yet to select woman astronaut plan.
IBM-ylee (0.5325): Zhuhai November 2 ( Xinhua News Agency, Chinese astronauts scientific training center, deputy director of the " space hero Yang Liwei said on July 1 in Zhuhai, China is still no plan and the flight, astronauts.
IBM-smt (0.5194): Xinhua Zhuhai, November 2nd (Xinhua) (Reporter,) Chinese astronauts scientific training, deputy director of the Center, " Space hero ' yang Liwei, 1st in Zhuhai, said that China has no plan of selecting astronaut.
maryland-hiero (0.5180): ZHUHAI, November 2 (Xinhua, ) deputy director of the Chinese astronauts scientific research and training center, "1" space hero Yang Liwei said in Zhuhai, China does not plan to make astronaut selection.
CMU-combination (0.4701): zhuhai, november 2nd ( xinhua ) deputy director of china's astronaut training centre, " space hero yang liwei, 1st in zhuhai, china currently has no plans of the selected astronaut.

Slide 16: Chinese-English Example - Before
THE SCIENTISTS IN ORDER TO 攸 TO CLOSE IN THE EARLY PERIOD TO GO THE THE KNOWLEDGE THE THE DISEASE IN THE CHROMOSOME HAS BEEN COMPLETED IS SCHEDULED TO ORDER
Overall: , Prob: , Rules: , Frag: 0.4, Length: , Words: 13,

Slide 17: Chinese-English Example - After
THE SCIENTISTS COMPLETED SEQUENCING FOR THE CHROMOSOMES WHICH RELATED TO THE INITIAL STAGE DEMENTIA
Overall: , Prob: , Rules: , Frag: 0, Length: , Words: 8,
< : 科学家 为 攸关 初期 失智症 的 染色体 完成 定序
(S,1
  (NP,1 (LITERAL 'THE') (NB,1 (N,21601 'SCIENTISTS')))
  (VP,4
    (VP,1 (V,7513 'COMPLETED') (NP,2 (NB,1 (N, 'SEQUENCING'))))
    (PP,1 (PREP,5 'FOR')
      (NPRC,1
        (NP,1 (LITERAL 'THE') (NB,1 (N, 'CHROMOSOMES')))
        (LITERAL 'WHICH')
        (VP,1 (V,18 'RELATED TO')
          (NPASSOC,5
            (NP,1 (LITERAL 'THE') (NB,1 (N,7637 'INITIAL STAGE')))
            (NP,2 (NB,1 (N,445 'DEMENTIA')))))))))>
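The bracketed derivation shown on this slide can be read mechanically: each node is `(LABEL child ...)`, and the quoted strings are the English leaves. The Python sketch below parses that notation and recovers the English yield. It is a reading aid written for this transcript, not part of the Stat-XFER toolchain, and the names `parse_derivation` and `leaves` are hypothetical.

```python
import re

# Tokens: "(", ")", a quoted leaf such as 'RELATED TO', or a label such as NP,1
TOKEN = re.compile(r"\(|\)|'[^']*'|[^\s()']+")

def parse_derivation(text):
    """Parse a bracketed derivation into nested (label, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos].strip("'"))  # quoted English leaf
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()

def leaves(tree):
    """Collect the English leaf strings of a parsed derivation, left to right."""
    _, children = tree
    out = []
    for child in children:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out

fragment = ("(VP,1 (V,18 'RELATED TO') (NPASSOC,5 "
            "(NP,1 (LITERAL 'THE') (NB,1 (N,7637 'INITIAL STAGE'))) "
            "(NP,2 (NB,1 (N,445 'DEMENTIA')))))")
tree = parse_derivation(fragment)
print(" ".join(leaves(tree)))
```

Labels with a missing rule number, such as `N,` in `(N, 'SEQUENCING')`, are kept as they appear, so the gaps in the transcript do not break the parse.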

Slide 18: C/E Stat-XFER Lexicon
Lexical sources and their sizes:
–Named Entities from LDC lexical entries
–Filtered base NPs from Parallel Corpus lexical entries
–ADSO bilingual lexicon lexical entries
–LDC word bilingual glossary lexical entries
–Wikipedia extracted bilingual lexicon lexical entries
–Phrases from Parallel Corpus lexical entries
–Manual bilingual lexicon (high freq): 1149 lexical entries