Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System Alon Lavie Language Technologies Institute Carnegie Mellon University.


Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System
Alon Lavie, Language Technologies Institute, Carnegie Mellon University
Joint work with:
Shuly Wintner, Danny Shacham, Nurit Melnik, Yuval Krymolowski (University of Haifa)
Erik Peterson (Carnegie Mellon University)

June 20, 2007, ISCOL/BISFAI
Outline
Context of this Work
CMU Statistical Transfer MT Framework
Hebrew and its Challenges for MT
Hebrew-to-English System
  – Morphological Analysis and Generation
  – MT Resources: lexicon and grammar
  – Translation Examples
  – Performance Evaluation
Conclusions, Current and Future Work

Current State-of-the-art in Machine Translation
MT underwent a major paradigm shift over the past 15 years:
  – From manually crafted rule-based systems with manually designed knowledge resources
  – To search-based approaches founded on automatic extraction of translation models/units from large sentence-parallel corpora
Current Dominant Approach: Phrase-based Statistical MT:
  – Extract and statistically model large volumes of phrase-to-phrase correspondences from automatically word-aligned parallel corpora
  – “Decode” new input by searching for the most likely sequence of phrase matches, using a statistical Language Model for the target language

Current State-of-the-art in Machine Translation
Phrase-based MT State-of-the-art:
  – Requires minimally several million words of parallel text for adequate training
  – Limited to language-pairs for which such data exists: major European languages, Chinese, Japanese, a few others…
  – Linguistically shallow and highly lexicalized models result in weak generalization
  – Best performance levels (BLEU=~0.6) on Arabic-to-English provide understandable but often still somewhat disfluent translations
  – Ill suited for Hebrew and most of the world’s minor languages

CMU’s Statistical-Transfer (XFER) Approach
Framework: Statistical search-based approach with syntactic translation transfer rules that can be acquired from data but also developed and extended by experts
Elicitation: use bilingual native informants to produce a small high-quality word-aligned bilingual corpus of translated phrases and sentences
Transfer-rule Learning: apply ML-based methods to automatically acquire syntactic transfer rules for translation between the two languages
XFER + Decoder:
  – XFER engine produces a lattice of possible transferred structures at all levels
  – Decoder searches and selects the best scoring combination
Rule Refinement: refine the acquired rules via a process of interaction with bilingual informants
Word and Phrase bilingual lexicon acquisition

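System architecture (diagram)
Pipeline: Hebrew Input → Preprocessing → Morphology → Transfer Engine → Translation Output Lattice → Decoder → English Output
Resources: Transfer Rules and Translation Lexicon (used by the Transfer Engine), English Language Model (used by the Decoder)
Hebrew Input: בשורה הבאה
English Output: in the next line
Transfer Rules (example):
{NP1,3}
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
(X3::Y1) (X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1)
)
Translation Lexicon (examples):
N::N |: ["$WR"] -> ["BULL"]
((X1::Y1) ((X0 NUM) = s) ((Y0 lex) = "BULL"))
N::N |: ["$WRH"] -> ["LINE"]
((X1::Y1) ((X0 NUM) = s) ((Y0 lex) = "LINE"))
Translation Output Lattice (entries are cut off in the diagram):
(0 1 (1 1 (2 2 (1 2 "THE (0 2 "IN (0 4 "IN THE NEXT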

Transfer Rule Formalism
Type information
Part-of-speech/constituent information
Alignments
x-side constraints
y-side constraints
xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

; SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)
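The alignment part of the rule above can be sketched in a few lines of Python. This is only an illustration, not the XFER engine's implementation: the class name `TransferRule`, its fields, and the `transfer` method are hypothetical, and the feature constraints of the rule are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRule:
    """Hypothetical container for the structural part of a transfer rule."""
    src_type: str
    tgt_type: str
    src_rhs: tuple      # source-side constituents, X1..Xn
    tgt_rhs: tuple      # target-side constituents, Y1..Ym
    alignments: tuple   # (x, y) pairs, 1-based as in the rule notation

    def transfer(self, src_translations):
        """Copy the translation of each Xi into every Yj it is aligned to."""
        if len(src_translations) != len(self.src_rhs):
            raise ValueError("expected one translation per source constituent")
        out = [None] * len(self.tgt_rhs)
        for x, y in self.alignments:
            out[y - 1] = src_translations[x - 1]
        return out


# NP::NP [DET ADJ N] -> [DET N DET ADJ] with (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
NP_RULE = TransferRule(
    src_type="NP",
    tgt_type="NP",
    src_rhs=("DET", "ADJ", "N"),
    tgt_rhs=("DET", "N", "DET", "ADJ"),
    alignments=((1, 1), (1, 3), (2, 4), (3, 2)),
)

# Translations of "the", "old", "man" in source order.
print(NP_RULE.transfer(["ha", "zaqen", "ish"]))
```

Because X1 is aligned to both Y1 and Y3, the determiner is copied twice, which is how the single English "the" ends up on both the noun and the adjective on the Hebrew side.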

The Transfer Engine
Main algorithm: chart-style bottom-up integrated parsing+transfer with beam pruning
  – Seeded by word-to-word translations
  – Driven by transfer rules
  – Generates a lattice of transferred translation segments at all levels
Some Unique Features:
  – Works with either learned or manually-developed transfer grammars
  – Handles rules with or without unification constraints
  – Supports interfacing with servers for morphological analysis and generation
  – Can handle ambiguous source-word analyses and/or SL segmentations represented in the form of lattice structures

XFER Output Lattice
(28 28 "AND" "W" "(CONJ,0 'AND')")
(29 29 "SINCE" "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ")
(29 29 "SINCE THEN" "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ")
(29 29 "EVER SINCE" "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ")
(30 30 "WORKED" "&BD " "(VERB,0 (V,11 'WORKED')) ")
(30 30 "FUNCTIONED" "&BD " "(VERB,0 (V,10 'FUNCTIONED')) ")
(30 30 "WORSHIPPED" "&BD " "(VERB,0 (V,12 'WORSHIPPED')) ")
(30 30 "SERVED" "&BD " "(VERB,0 (V,14 'SERVED')) ")
(30 30 "SLAVE" "&BD " "(NP0,0 (N,34 'SLAVE')) ")
(30 30 "BONDSMAN" "&BD " "(NP0,0 (N,36 'BONDSMAN')) ")
(30 30 "A SLAVE" "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')) ) ) ) ")
(30 30 "A BONDSMAN" "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")

The Lattice Decoder
Simple Stack Decoder, similar in principle to simple Statistical MT decoders
Searches for best-scoring path of non-overlapping lattice arcs
No reordering during decoding
Scoring based on log-linear combination of scoring components, with weights trained using MERT
Scoring components:
  – Statistical Language Model
  – Fragmentation: how many arcs to cover the entire translation?
  – Length Penalty
  – Rule Scores
  – Lexical Probabilities (not fully integrated)
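The search for a best-scoring path of non-overlapping arcs can be illustrated with a short Python sketch. It swaps in a plain left-to-right dynamic program for the stack decoder named on the slide, which is sufficient here because there is no reordering. The function name `decode`, the half-open span convention, the feature names `lm` and `frag`, and all numeric values are assumptions made for the example, not values from the system.

```python
import math


def decode(arcs, n, weights):
    """Return (score, texts) for the best sequence of arcs covering [0, n).

    arcs: list of (start, end, text, features) with half-open spans.
    weights: feature name -> weight for the log-linear combination.
    """
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for pos in range(n):
        score_here, path_here = best[pos]
        if score_here == -math.inf:
            continue  # no partial translation ends at this position
        for start, end, text, feats in arcs:
            if start != pos:
                continue
            # Log-linear score: weighted sum of the arc's feature values.
            arc_score = sum(weights[name] * value for name, value in feats.items())
            candidate = score_here + arc_score
            if candidate > best[end][0]:
                best[end] = (candidate, path_here + [text])
    return best[n]


# Invented arcs over a four-position input; "frag" counts one per arc used.
ARCS = [
    (0, 1, "IN", {"lm": -1.0, "frag": 1}),
    (1, 2, "THE", {"lm": -0.5, "frag": 1}),
    (2, 4, "NEXT LINE", {"lm": -1.5, "frag": 1}),
    (0, 4, "IN THE NEXT LINE", {"lm": -2.0, "frag": 1}),
]
WEIGHTS = {"lm": 1.0, "frag": -0.5}

print(decode(ARCS, 4, WEIGHTS))
```

With these invented numbers the single long arc wins over the three-arc path, which is the behavior the fragmentation component is meant to encourage: fewer, larger transferred segments are preferred when their other scores are comparable.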

XFER Lattice Decoder
0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: , Prob: , Rules: 0, Frag: , Length: 0, Words: 13,
< : B H IWM RBI&I (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE') (NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))>
918 < : H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>
584 < : L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>

XFER MT Prototypes
General XFER framework under development for past five years
Prototype systems so far:
  – German-to-English
  – Dutch-to-English
  – Chinese-to-English
  – Hindi-to-English
  – Hebrew-to-English
In progress or planned:
  – Mapudungun-to-Spanish
  – Quechua-to-Spanish
  – Brazilian Portuguese-to-English
  – Native-Brazilian languages to Brazilian Portuguese
  – Hebrew-to-Arabic

Challenges for Hebrew MT
Paucity of existing language resources for Hebrew
  – No publicly available broad coverage morphological analyzer
  – No publicly available bilingual lexicons or dictionaries
  – No POS-tagged corpus or parse tree-bank corpus for Hebrew
  – No large Hebrew/English parallel corpus
Scenario well suited for CMU transfer-based MT framework for languages with limited resources

Modern Hebrew Spelling
Two main spelling variants
  – “KTIV XASER” (deficient): spelling that relies on vowel diacritics, leaving consonant-only words when the diacritics are removed
  – “KTIV MALEH” (full): words with I/O/U vowels are written with long vowels which include a letter
KTIV MALEH is predominant, but not strictly adhered to even in newspapers and official publications, which leads to inconsistent spelling
Example:
  – niqud (spelling): NIQWD, NQWD, NQD
  – When written as NQD, could also be niqed, naqed, nuqad
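One way to see why the three spellings in the example collide is to reduce each to a consonant skeleton. The Python sketch below is a deliberately crude heuristic written for this illustration, not something the system is described as doing: the function name `skeleton` is hypothetical, and dropping every non-initial I and W will also merge words that are genuinely different, since those letters can be real consonants.

```python
def skeleton(word):
    """Drop non-initial I and W from a transliterated word (crude heuristic).

    The first letter is kept as-is because an initial I or W is more likely
    to be a consonant or a prefix than a vowel letter.
    """
    head, tail = word[:1], word[1:]
    return head + "".join(ch for ch in tail if ch not in "IW")


VARIANTS = ["NIQWD", "NQWD", "NQD"]

# All three spellings of the same word share one skeleton.
print({word: skeleton(word) for word in VARIANTS})
```

The same collapse is what makes the bare form NQD ambiguous between several readings, so a lexicon lookup keyed on skeletons would trade missed matches for extra ambiguity.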

Morphological Analyzer
We use a publicly available morphological analyzer distributed by the Technion’s Knowledge Center, adapted for our system
Coverage is reasonable (for nouns, verbs and adjectives)
Produces all analyses or a disambiguated analysis for each word
Output format includes lexeme (base form), POS, morphological features
Output was adapted to our representation needs (POS and feature mappings)

Morphology Example
Input word: B$WRH
Segmentations shown in the diagram:
|        B$WRH        |
|-----B-----|$WR|--H--|
|--B--|-H--|--$WRH---|

Morphology Example
Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))
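Because each analysis carries a start and end position, the analyses form a small lattice, and every chain of analyses from position 0 to position 4 is one candidate segmentation of B$WRH. The Python sketch below enumerates those chains. The span values are copied as they appear in this transcript, keeping only the span, lexeme, and POS fields; the dictionary layout and the function name `paths` are assumptions for the example.

```python
# name -> (span start, span end, lexeme, POS), as listed in the transcript
ANALYSES = {
    "Y0": (0, 4, "B$WRH", "N"),
    "Y1": (0, 2, "B", "PREP"),
    "Y2": (1, 3, "$WR", "N"),
    "Y3": (3, 4, "$LH", "POSS"),
    "Y4": (0, 1, "B", "PREP"),
    "Y5": (1, 2, "H", "DET"),
    "Y6": (2, 4, "$WRH", "N"),
    "Y7": (0, 4, "B$WRH", "LEX"),
}


def paths(analyses, start, end):
    """List every chain of analyses whose spans tile [start, end) exactly."""
    if start == end:
        return [[]]
    found = []
    for name, (span_start, span_end, _lex, _pos) in analyses.items():
        if span_start == start:
            for rest in paths(analyses, span_end, end):
                found.append([name] + rest)
    return found


for chain in paths(ANALYSES, 0, 4):
    print(" + ".join(f"{ANALYSES[n][2]}/{ANALYSES[n][3]}" for n in chain))
```

A transfer engine that accepts lattice input can keep all of these chains alive and let later scoring choose among them, rather than committing to one segmentation up front.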

Translation Lexicon
Constructed our own Hebrew-to-English lexicon, based primarily on existing “Dahan” H-to-E and E-to-H dictionary made available to us, augmented by other public sources
Coverage is not great but not bad as a start
  – Dahan H-to-E is about 15K translation pairs
  – Dahan E-to-H is about 7K translation pairs
Base forms, POS information on both sides
Converted Dahan into our representation, added entries for missing closed-class entries (pronouns, prepositions, etc.)
Had to deal with spelling conventions
Recently augmented with ~50K translation pairs extracted from Wikipedia (mostly proper names and named entities)

Manual Transfer Grammar (human-developed)
Initially developed by Alon in a couple of days, extended and revised by Nurit over time
Current grammar has 36 rules:
  – 21 NP rules
  – one PP rule
  – 6 verb complexes and VP rules
  – 8 higher-phrase and sentence-level rules
Captures the most common (mostly local) structural differences between Hebrew and English

Transfer Grammar Example Rules

{NP1,2}
;;SL: $MLH ADWMH
;;TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
(X2::Y1) (X1::Y2)
((X1 def) = -)
((X1 status) =c absolute)
((X1 num) = (X2 num))
((X1 gen) = (X2 gen))
(X0 = X1)
)

{NP1,3}
;;SL: H $MLWT H ADWMWT
;;TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
(X3::Y1) (X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1)
)
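The constraints in rule {NP1,2} can be read as a gate on the reordering: the adjective is moved in front of the noun only if the noun phrase is indefinite, in absolute state, and agrees with the adjective in number and gender. The Python sketch below encodes that reading under stated assumptions: "=" is treated as unification (it succeeds when either side is unspecified), "=c" is treated as a constraining check that requires the value to be present, and the feature dictionaries, the `en` key, and the function names are all hypothetical.

```python
def unifies(a, b):
    """'=' succeeds if either value is unspecified or both agree."""
    return a is None or b is None or a == b


def apply_np1_adj(np1, adj):
    """Sketch of {NP1,2}: [NP1 ADJ] -> [ADJ NP1]; returns None if blocked."""
    allowed = (
        unifies(np1.get("def"), "-")              # ((X1 def) = -)
        and np1.get("status") == "absolute"       # ((X1 status) =c absolute)
        and unifies(np1.get("num"), adj.get("num"))   # number agreement
        and unifies(np1.get("gen"), adj.get("gen"))   # gender agreement
    )
    if not allowed:
        return None
    return [adj["en"], np1["en"]]  # (X2::Y1) (X1::Y2)


# Hypothetical feature structures for $MLH (dress) and ADWMH (red, fem. sing.).
DRESS = {"en": "DRESS", "def": "-", "status": "absolute", "num": "s", "gen": "f"}
RED_FS = {"en": "RED", "num": "s", "gen": "f"}
RED_MP = {"en": "RED", "num": "p", "gen": "m"}

print(apply_np1_adj(DRESS, RED_FS))
print(apply_np1_adj(DRESS, RED_MP))
```

The second call is blocked by the agreement constraints, which is what keeps the rule from attaching an adjective to a noun it does not modify.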

Hebrew-to-English MT Prototype
Initial prototype developed within a two-month intensive effort
Accomplished:
  – Adapted available morphological analyzer
  – Constructed a preliminary translation lexicon
  – Translated and aligned Elicitation Corpus
  – Learned XFER rules
  – Developed (small) manual XFER grammar
  – System debugging and development
  – Evaluated performance on unseen test data using automatic evaluation metrics

Example Translation
Input:
  – לאחר דיונים רבים החליטה הממשלה לערוך משאל עם בנושא הנסיגה
  – After debates many decided the government to hold referendum in issue the withdrawal
Output:
  – AFTER MANY DEBATES THE GOVERNMENT DECIDED TO HOLD A REFERENDUM ON THE ISSUE OF THE WITHDRAWAL

Noun Phrases – Construct State

החלטת הנשיא הראשון
decision.3SF-CS the-president.3SM the-first.3SM
THE DECISION OF THE FIRST PRESIDENT

החלטת הנשיא הראשונה
decision.3SF-CS the-president.3SM the-first.3SF
THE FIRST DECISION OF THE PRESIDENT

Noun Phrases - Possessives

הנשיא הכריז שהמשימה הראשונה שלו תהיה למצוא פתרון לסכסוך באזורנו

HNSIA HKRIZ $HM$IMH HRA$WNH $LW THIH
the-president announced that-the-task.3SF the-first.3SF of-him will.3SF
LMCWA PTRWN LSKSWK BAZWRNW
to-find solution to-the-conflict in-region-POSS.1P

Without transfer grammar: THE PRESIDENT ANNOUNCED THAT THE TASK THE BEST OF HIM WILL BE TO FIND SOLUTION TO THE CONFLICT IN REGION OUR
With transfer grammar: THE PRESIDENT ANNOUNCED THAT HIS FIRST TASK WILL BE TO FIND A SOLUTION TO THE CONFLICT IN OUR REGION

Subject-Verb Inversion

אתמול הודיעה הממשלה שתערכנה בחירות בחודש הבא

ATMWL HWDI&H HMM$LH
yesterday announced.3SF the-government.3SF
$T&RKNH BXIRWT BXWD$ HBA
that-will-be-held.3PF elections.3PF in-the-month the-next

Without transfer grammar: YESTERDAY ANNOUNCED THE GOVERNMENT THAT WILL RESPECT OF THE FREEDOM OF THE MONTH THE NEXT
With transfer grammar: YESTERDAY THE GOVERNMENT ANNOUNCED THAT ELECTIONS WILL ASSUME IN THE NEXT MONTH

Subject-Verb Inversion

לפני כמה שבועות הודיעה הנהלת המלון שהמלון יסגר בסוף השנה

LPNI KMH $BW&WT HWDI&H HNHLT HMLWN
before several weeks announced.3SF management.3SF.CS the-hotel
$HMLWN ISGR BSWF H$NH
that-the-hotel.3SM will-be-closed.3SM at-end.3SM.CS the-year

Without transfer grammar: IN FRONT OF A FEW WEEKS ANNOUNCED ADMINISTRATION THE HOTEL THAT THE HOTEL WILL CLOSE AT THE END THIS YEAR
With transfer grammar: SEVERAL WEEKS AGO THE MANAGEMENT OF THE HOTEL ANNOUNCED THAT THE HOTEL WILL CLOSE AT THE END OF THE YEAR

Evaluation Results
Test set of 62 sentences from Haaretz newspaper, 2 reference translations
Table columns: System, BLEU, NIST, P, R, METEOR
Table rows (systems compared): No Gram, Learned, Manual

Current and Future Work
Issues specific to the Hebrew-to-English system:
  – Coverage: further improvements in the translation lexicon and morphological analyzer
  – Manual Grammar development
  – Acquiring/training of word-to-word translation probabilities
  – Acquiring/training of a Hebrew language model at a post-morphology level that can help with disambiguation
General Issues related to XFER framework:
  – Discriminative Language Modeling for MT
  – Effective models for assigning scores to transfer rules
  – Improved grammar learning
  – Merging/integration of manual and acquired grammars

Conclusions
Test case for the CMU XFER framework for rapid MT prototyping
Preliminary system was a two-month, three-person effort; we were quite happy with the outcome
Core concept of XFER + Decoding is very powerful and promising for MT
We experienced the main bottlenecks of knowledge acquisition for MT: morphology, translation lexicons, grammar...

Questions?