CMU MilliRADD Small-MT Report
TIDES PI Meeting 2002
The CMU MilliRADD Team: Jaime Carbonell, Lori Levin, Ralf Brown, Stephan Vogel, Alon Lavie, Katharina Probst, Erik Peterson, Chris Monson
Language Technologies Institute, CMU

MilliRADD Approaches
MT for minimal-resource languages
- Limited parallel corpora (100K)
- Limited dictionary (10K)
- Limited native speaker access
- No pre-existing grammars, treebanks, etc.
MT Engines
- Statistical MT (SMT)
- Example-Based MT (EBMT)
- Transfer-based MT (learning transfer rules)
- Multi-Engine MT (MEMT)

Statistical MT
- Core engine: same as MegaRADD
- Improvements since dry run:
  - Treatment of articles/morphology
  - Phrasal alignments and units
  - Dynamic re-segmentation
- Best-performing MT engine (official NIST score = 6.14; latest NIST score = 6.30)

Improvements to SMT (measured on dry-run data)
System                              Score
Baseline SMT                        6.38
SMT + phrasing for S-T and T-S      6.74
SMT + phrasing for S-S and T-T      6.80
SMT + both phrasings                6.88
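The gains over the baseline can be worked out directly from the four scores on this slide. The short Python sketch below does that arithmetic; the dictionary keys and the `gains` helper are illustrative names, not part of the original presentation.

```python
# Scores on dry-run data, copied from the slide above.
scores = {
    "baseline SMT": 6.38,
    "+ phrasing for S-T and T-S": 6.74,
    "+ phrasing for S-S and T-T": 6.80,
    "+ both phrasings": 6.88,
}


def gains(scores, baseline_key="baseline SMT"):
    """Return {system: (absolute gain, relative gain in percent)} vs. the baseline."""
    base = scores[baseline_key]
    return {
        name: (round(score - base, 2), round(100 * (score - base) / base, 1))
        for name, score in scores.items()
        if name != baseline_key
    }


for name, (absolute, relative) in gains(scores).items():
    print(f"{name}: +{absolute:.2f} ({relative:.1f}% over baseline)")
```

Combining both phrasings gives the largest gain, half a point over the baseline.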

Example-Based MT
- Longest-fragment match in corpus
- Permits inexact matching
- Multiple matches generate a lattice
- Target language model finds the best path
- Augmentation: 10K dictionary + bilingual word pairs extracted statistically from the 100K corpus
- Official NIST score = 5.29 (3.97 without dictionary)
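As a rough illustration of longest-fragment matching, the Python sketch below greedily covers an input sentence with the longest word sequences found on the source side of a toy corpus. It is a simplification under stated assumptions: matching is exact only (the engine described above also permits inexact matches), no lattice or language model is built, and the function name `longest_fragment_cover` is hypothetical.

```python
def longest_fragment_cover(tokens, corpus):
    """Greedily cover `tokens` with the longest fragments that occur in `corpus`.

    `corpus` is a list of tokenized source-side sentences. Returns a list of
    (fragment, matched) pairs; unmatched words come back as single tokens.
    """
    # Index every contiguous fragment of every corpus sentence (toy-sized only).
    seen = set()
    for sentence in corpus:
        for i in range(len(sentence)):
            for j in range(i + 1, len(sentence) + 1):
                seen.add(tuple(sentence[i:j]))

    cover, pos = [], 0
    while pos < len(tokens):
        # Try the longest candidate starting at `pos` first.
        for end in range(len(tokens), pos, -1):
            fragment = tuple(tokens[pos:end])
            if fragment in seen:
                cover.append((fragment, True))
                pos = end
                break
        else:
            # No corpus fragment starts with this word: emit it unmatched.
            cover.append(((tokens[pos],), False))
            pos += 1
    return cover


corpus = [
    "i would like to meet her".split(),
    "the tallest man is my father".split(),
]
cover = longest_fragment_cover("i would like to meet the tallest man".split(), corpus)
```

In a real system each matched fragment would carry its aligned target-side text, and overlapping alternatives would be kept rather than chosen greedily.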

EBMT: Combine Translation Fragments
English: I would like to meet her.
Mapudungun: Ayükefun trawüael fey engu.
English: The tallest man is my father.
Mapudungun: Chi doy fütra chi wentru fey ta inche ñi chaw.
English: I would like to meet the tallest man.
Mapudungun (new): Ayükefun trawüael Chi doy fütra chi wentru
Mapudungun (correct): Ayüken ñi trawüael chi doy fütra wentruengu.

Transfer Rules
- Manually developed transfer rules (only 71 hours of development time)
- Strict compositionality
- Lexicon = 10K dictionary + statistical pairs from the 100K parallel corpus
- Target language model disambiguation
- Official NIST score = 4.84

Segmentation Differences on Mandarin Dry-Run Evaluation
[Table: SMT, EBMT, and Transfer engines compared under two segmentations: segmentation with large dictionary vs. 10K dictionary + treebank words]

Next Major Developments
- Continued improvements to all engines
- New algorithm for EBMT (for Mega- and Milli-RADD)
- Multi-Engine combinations
- Automatically acquired transfer rules: approach based on seeded version spaces (VSs)

Compositionality
- Adjust rule to reflect compositionality
- NP rule can be used to translate part of the sentence; keep/add context constraints, eliminate unnecessary ones

Flat Seed Generation
The highly qualified applicant visits the company.
Der äußerst qualifizierte Bewerber besucht die Firma.
((1,1),(2,2),(3,3),(4,4),(5,5),(6,6))
S::S [det adv adj n v det n] [det adv adj n v det n]
((x1::y1) (x2::y2) … ((x4 agr) = *3-sing) … ((y3 case) = *nom) …)

Goal: Syntactic Transfer Rules
1) Flat Seed Generation: produce rules from word-aligned sentence pairs, abstracted only to POS level; no syntactic structure
2) Add compositional structure to the seed rule by exploiting previously learned rules
3) Seeded Version Space Learning: group seed rules by constituent sequences and alignments; seed rules form the s-boundary of the VS; generalize with validation

Seeded Version Space Learning
Group seed rules into version spaces: … NP v det n …
Notes:
1) Partial order of rules in VS
2) Generalization via merging

S::S [NP v det n] [NP n v det n]
((x1::y1) (x2::y2) … ((y1 case) = *nom) …)
NP::NP [det adv adj n] [det adv adj n]
((x1::y1) … ((y4 agr) = (x4 agr)) …)

Merge two rules:
1) Deletion of constraint
2) Raising of two value constraints to one agreement constraint, e.g. ((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
3) Use merged rule to translate
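
Two of the steps above lend themselves to a small Python sketch: building a flat seed rule from POS sequences and word alignments, and raising two value constraints to one agreement constraint during merging. The data representation here is an assumption made for illustration (constraints as tuples, rules as strings), and the names `flat_seed_rule` and `raise_to_agreement` are hypothetical. Feature constraints such as agreement and case are left out of the generated seed rule.

```python
from itertools import combinations


def flat_seed_rule(src_pos, tgt_pos, alignment, label="S"):
    """Build a flat seed rule string from POS sequences and 1-based alignments."""
    pairs = " ".join(f"(x{i}::y{j})" for i, j in alignment)
    return f"{label}::{label} [{' '.join(src_pos)}] [{' '.join(tgt_pos)}] ({pairs})"


def raise_to_agreement(constraints):
    """Replace two value constraints that share a feature and a value with one
    agreement constraint.

    Assumed encoding: a value constraint is ((constituent, feature), "*value");
    an agreement constraint is ((constituent, feature), (constituent, feature)).
    """
    result = list(constraints)
    value_constraints = [c for c in constraints if isinstance(c[1], str)]
    for a, b in combinations(value_constraints, 2):
        (_, feat_a), val_a = a
        (_, feat_b), val_b = b
        same = feat_a == feat_b and val_a == val_b
        if same and a in result and b in result:
            result.remove(a)
            result.remove(b)
            result.append((a[0], b[0]))
    return result


# POS sequence and alignment taken from the flat seed generation example.
pos = "det adv adj n v det n".split()
seed = flat_seed_rule(pos, pos, [(i, i) for i in range(1, 7)])

# ((x1 num) = *pl), ((x3 num) = *pl) becomes ((x1 num) = (x3 num)).
merged = raise_to_agreement([(("x1", "num"), "*pl"), (("x3", "num"), "*pl")])
```

The merged constraint is more general than the two it replaces: it no longer requires plural, only that the two constituents agree in number, which is why the merged rule is then validated by using it to translate.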