Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:

Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with: Jaime Carbonell, Lori Levin, Bob Frederking, Erik Peterson, Christian Monson, Vamshi Ambati, Greg Hanneman, Kathrin Probst, Ariadna Font-Llitjos, Alison Alvarez, Roberto Aranovich

Aug 29, 2007Statistical XFER MT2 Outline Background and Rationale Stat-XFER Framework Overview Elicitation Learning Transfer Rules Automatic Rule Refinement Example Prototypes Major Research Challenges

Aug 29, 2007Statistical XFER MT3 Progression of MT Started with rule-based systems –Very large expert human effort to construct language- specific resources (grammars, lexicons) –High-quality MT extremely expensive  only for handful of language pairs Along came EBMT and then Statistical MT… –Replaced human effort with extremely large volumes of parallel text data –Less expensive, but still only feasible for a small number of language pairs –We “traded” human labor with data Where does this take us in 5-10 years? –Large parallel corpora for maybe 25-50 language pairs What about all the other languages? Is all this data (with very shallow representation of language structure) really necessary? Can we build MT approaches that learn deeper levels of language structure and how they map from one language to another?

Aug 29, 2007Statistical XFER MT4 Rule-based vs. Statistical MT Traditional Rule-based MT: –Expressive and linguistically-rich formalisms capable of describing complex mappings between the two languages –Accurate “clean” resources –Everything constructed manually by experts –Main challenge: obtaining broad coverage Phrase-based Statistical MT: –Learn word and phrase correspondences automatically from large volumes of parallel data –Search-based “decoding” framework: Models propose many alternative translations Effective search algorithms find the “best” translation –Main challenge: obtaining high translation accuracy

Aug 29, 2007Statistical XFER MT5 Main Principles of Stat-XFER Integrate the major strengths of rule-based and statistical MT within a common framework: –Linguistically rich formalism that can express complex and abstract compositional transfer rules –Rules can be written by human experts and also acquired automatically from data –Easy integration of morphological analyzers and generators –Word and basic phrase correspondences (i.e. base NPs) can be automatically acquired from parallel text when available –Search-based decoding from statistical MT adapted to find the best translation within the search space: multi-feature scoring, beam-search, parameter optimization, etc. –Framework suitable for both resource-rich and resource- poor language scenarios

Aug 29, 2007Statistical XFER MT6 Stat-XFER MT Approach Interlingua Syntactic Parsing Semantic Analysis Sentence Planning Text Generation Source (e.g. Quechua) Target (e.g. English) Transfer Rules Direct: SMT, EBMT Statistical-XFER

Aug 29, 2007Statistical XFER MT7 Transfer Engine Language Model + Additional Features Transfer Rules {NP1,3} NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1] ((X3::Y1) (X1::Y2) ((X1 def) = +) ((X1 status) =c absolute) ((X1 num) = (X3 num)) ((X1 gen) = (X3 gen)) (X0 = X1)) Translation Lexicon N::N |: ["$WR"] -> ["BULL"] ((X1::Y1) ((X0 NUM) = s) ((Y0 lex) = "BULL")) N::N |: ["$WRH"] -> ["LINE"] ((X1::Y1) ((X0 NUM) = s) ((Y0 lex) = "LINE")) Source Input בשורה הבאה Decoder English Output in the next line Translation Output Lattice (0 1 "IN" @PREP) (1 1 "THE" @DET) (2 2 "LINE" @N) (1 2 "THE LINE" @NP) (0 2 "IN LINE" @PP) (0 4 "IN THE NEXT LINE" @PP) Preprocessing Morphology

Aug 29, 2007Statistical XFER MT8 Transfer Rule Formalism Type information Part-of-speech/constituent information Alignments x-side constraints y-side constraints xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)) ; SL: the old man, TL: ha-ish ha-zaqen NP::NP [DET ADJ N] -> [DET N DET ADJ] ( (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) ((X1 AGR) = *3-SING) ((X1 DEF = *DEF) ((X3 AGR) = *3-SING) ((X3 COUNT) = +) ((Y1 DEF) = *DEF) ((Y3 DEF) = *DEF) ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y4 GENDER)) )

Aug 29, 2007Statistical XFER MT9 Transfer Rule Formalism (II) Value constraints Agreement constraints ;SL: the old man, TL: ha-ish ha-zaqen NP::NP [DET ADJ N] -> [DET N DET ADJ] ( (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) ((X1 AGR) = *3-SING) ((X1 DEF = *DEF) ((X3 AGR) = *3-SING) ((X3 COUNT) = +) ((Y1 DEF) = *DEF) ((Y3 DEF) = *DEF) ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y4 GENDER)) )

Aug 29, 2007Statistical XFER MT10 Hebrew Manual Transfer Grammar (human-developed) Initially developed in a couple of days, with some later revisions by a CL post-doc Current grammar has 36 rules: –21 NP rules –one PP rule –6 verb complexes and VP rules –8 higher-phrase and sentence-level rules Captures the most common (mostly local) structural differences between Hebrew and English

Aug 29, 2007Statistical XFER MT11 Hebrew Transfer Grammar Example Rules {NP1,2} ;;SL: $MLH ADWMH ;;TL: A RED DRESS NP1::NP1 [NP1 ADJ] -> [ADJ NP1] ( (X2::Y1) (X1::Y2) ((X1 def) = -) ((X1 status) =c absolute) ((X1 num) = (X2 num)) ((X1 gen) = (X2 gen)) (X0 = X1) ) {NP1,3} ;;SL: H $MLWT H ADWMWT ;;TL: THE RED DRESSES NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1] ( (X3::Y1) (X1::Y2) ((X1 def) = +) ((X1 status) =c absolute) ((X1 num) = (X3 num)) ((X1 gen) = (X3 gen)) (X0 = X1) )

Aug 29, 2007Statistical XFER MT12 The XFER Engine Input: source-language input sentence, or source- language confusion network Output: lattice representing collection of translation fragments at all levels supported by transfer rules Basic Algorithm: “bottom-up” integrated “parsing- transfer-generation” guided by the transfer rules –Start with translations of individual words and phrases from translation lexicon –Create translations of larger constituents by applying applicable transfer rules to previously created lattice entries –Beam-search controls the exponential combinatorics of the search-space, using multiple scoring features

Aug 29, 2007Statistical XFER MT13 Source-language Confusion Network Hebrew Example Input word: B$WRH 0 1 2 3 4 |--------B$WRH--------| |-----B-----|$WR|--H--| |--B--|-H--|--$WRH---|

Aug 29, 2007Statistical XFER MT14 XFER Output Lattice (28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')") (29 29 "SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ") (29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ") (29 29 "EVER SINCE" -12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ") (30 30 "WORKED" -10.9913 "&BD " "(VERB,0 (V,11 'WORKED')) ") (30 30 "FUNCTIONED" -16.0023 "&BD " "(VERB,0 (V,10 'FUNCTIONED')) ") (30 30 "WORSHIPPED" -17.3393 "&BD " "(VERB,0 (V,12 'WORSHIPPED')) ") (30 30 "SERVED" -11.5161 "&BD " "(VERB,0 (V,14 'SERVED')) ") (30 30 "SLAVE" -13.9523 "&BD " "(NP0,0 (N,34 'SLAVE')) ") (30 30 "BONDSMAN" -18.0325 "&BD " "(NP0,0 (N,36 'BONDSMAN')) ") (30 30 "A SLAVE" -16.8671 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')) ) ) ) ") (30 30 "A BONDSMAN" -21.0649 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")

Aug 29, 2007Statistical XFER MT15 The Lattice Decoder Simple Stack Decoder, similar in principle to simple Statistical MT decoders Searches for best-scoring path of non-overlapping lattice arcs No reordering during decoding Scoring based on log-linear combination of scoring components, with weights trained using MERT Scoring components: –Statistical Language Model –Fragmentation: how many arcs to cover the entire translation? –Length Penalty –Rule Scores –Lexical Probabilities

Aug 29, 2007Statistical XFER MT16 XFER Lattice Decoder 0 0 ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0, Words: 13,13 235 < 0 8 -19.7602: B H IWM RBI&I (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE') (NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))> 918 < 8 14 -46.2973: H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100 (NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))> 584 < 14 17 -30.6607: L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>

Aug 29, 2007Statistical XFER MT17 Data Elicitation for Languages with Limited Resources Rationale: –Large volumes of parallel text not available  create a small maximally-diverse parallel corpus that directly supports the learning task –Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using elicitation tool –Elicitation corpus designed to be typologically and structurally comprehensive and compositional –Transfer-rule engine and new learning approach support acquisition of generalized transfer-rules from the data

Aug 29, 2007Statistical XFER MT18 Elicitation Tool: English-Chinese Example

Aug 29, 2007Statistical XFER MT19 Elicitation Tool: English-Chinese Example

Aug 29, 2007Statistical XFER MT20 Elicitation Tool: English-Hindi Example

Aug 29, 2007Statistical XFER MT21 Elicitation Tool: English-Arabic Example

Aug 29, 2007Statistical XFER MT22 Elicitation Tool: Spanish-Mapudungun Example

Aug 29, 2007Statistical XFER MT23 Designing Elicitation Corpora Goal: Create a small representative parallel corpus that contains examples of the most important translation correspondences and divergences between the two languages Method: –Elicit translations and word alignments for a broad diversity of linguistic phenomena and constructions Current Elicitation Corpus: ~3100 sentences and phrases, constructed based on a broad feature-based specification Open Research Issues: –Feature Detection: discover what features exist in the language and where/how they are marked Example: does the language mark gender of nouns? How and where are these marked? –Dynamic corpus navigation based on feature detection: no need to elicit for combinations involving non-existent features

Aug 29, 2007Statistical XFER MT24 Rule Learning - Overview Goal: Acquire Syntactic Transfer Rules Use available knowledge from the source side (grammatical structure) Three steps: 1.Flat Seed Generation: first guesses at transfer rules; flat syntactic structure 2.Compositionality Learning: use previously learned rules to learn hierarchical structure 3.Constraint Learning: refine rules by learning appropriate feature constraints

Aug 29, 2007Statistical XFER MT25 Flat Seed Rule Generation Learning Example: NP Eng: the big apple Heb: ha-tapuax ha-gadol Generated Seed Rule: NP::NP [ART ADJ N]  [ART N ART ADJ] ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

Aug 29, 2007Statistical XFER MT26 Compositionality Learning Initial Flat Rules: S::S [ART ADJ N V ART N]  [ART N ART ADJ V P ART N] ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8)) NP::NP [ART ADJ N]  [ART N ART ADJ] ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)) NP::NP [ART N]  [ART N] ((X1::Y1) (X2::Y2)) Generated Compositional Rule: S::S [NP V NP]  [NP V P NP] ((X1::Y1) (X2::Y2) (X3::Y4))

Aug 29, 2007Statistical XFER MT27 Constraint Learning Input: Rules and their Example Sets S::S [NP V NP]  [NP V P NP] {ex1,ex12,ex17,ex26} ((X1::Y1) (X2::Y2) (X3::Y4)) NP::NP [ART ADJ N]  [ART N ART ADJ] {ex2,ex3,ex13} ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)) NP::NP [ART N]  [ART N] {ex4,ex5,ex6,ex8,ex10,ex11} ((X1::Y1) (X2::Y2)) Output: Rules with Feature Constraints: S::S [NP V NP]  [NP V P NP] ((X1::Y1) (X2::Y2) (X3::Y4) (X1 NUM = X2 NUM) (Y1 NUM = Y2 NUM) (X1 NUM = Y1 NUM))

Aug 29, 2007Statistical XFER MT28 Automated Rule Refinement Bilingual informants can identify translation errors and pinpoint the errors A sophisticated trace of the translation path can identify likely sources for the error and do “Blame Assignment” Rule Refinement operators can be developed to modify the underlying translation grammar (and lexicon) based on characteristics of the error source: –Add or delete feature constraints from a rule –Bifurcate a rule into two rules (general and specific) –Add or correct lexical entries See [Font-Llitjos, Carbonell & Lavie, 2005]

Aug 29, 2007Statistical XFER MT29 Stat-XFER MT Prototypes General Statistical XFER framework under development for past five years (funded by NSF and DARPA) Prototype systems so far: –Chinese-to-English –Dutch-to-English –French-to-English –Hindi-to-English –Hebrew-to-English –Mapudungun-to-Spanish In progress or planned: –Brazilian Portuguese-to-English –Native-Brazilian languages to Brazilian Portuguese –Hebrew-to-Arabic –Iñupiaq-to-English –Urdu-to-English –Turkish-to-English

Aug 29, 2007Statistical XFER MT30 Chinese-English Stat-XFER System Bilingual lexicon: over 1.1 million entries (multiple resources, incl. ADSO, Wikipedia, extracted base NPs) Manual syntactic XFER grammar: 76 rules! (mostly NPs, a few PPs, and reordering of NPs/PPs within VPs) Multiple overlapping Chinese word segmentations English morphology generation Uses CMU SMT-group’s Suffix-Array LM toolkit for LM Current Performance (GALE dev-test): –NW: XFER: 10.89(B)/0.4509(M) Best (UMD): 15.58(B)/0.4769(M) –NG XFER: 8.92(B)/0.4229(M) Best (UMD): 12.96(B)/0.4455(M) In Progress: –Automatic extraction of “clean” base NPs from parallel data –Automatic learning and extraction of high-quality transfer- rules from parallel data

Aug 29, 2007Statistical XFER MT31 Translation Example REFERENCE: When responding to whether it is possible to extend Russian fleet's stationing deadline at the Crimean peninsula, Yanukovych replied, "Without a doubt. Stat-XFER (0.3989): In reply to whether the possibility to extend the Russian fleet stationed in Crimea Pen. left the deadline of the problem, Yanukovich replied : " of course. IBM-ylee (0.2203): In response to the possibility to extend the deadline for the presence in Crimea peninsula, the Queen Vic said : " of course. CMU-SMT (0.2067): In response to a possible extension of the fleet in the Crimean Peninsula stay on the issue, Yanukovych vetch replied : " of course. maryland-hiero (0.1878): In response to the possibility of extending the mandate of the Crimean peninsula in, replied: "of course. IBM-smt (0.1862):The answer is likely to be extended the Crimean peninsula of the presence of the problem, Yanukovych said: " Of course. CMU-syntax (0.1639): In response to the possibility of extension of the presence in the Crimean Peninsula, replied : " of course.

Aug 29, 2007Statistical XFER MT32 Major Research Directions Automatic Transfer Rule Learning: –From manually word-aligned elicitation corpus –From large volumes of automatically word-aligned “wild” parallel data –In the absence of morphology or POS annotated lexica –Compositionality and generalization –Identifying “good” rules from “bad” rules –Effective models for rule scoring for Decoding: using scores at runtime Pruning the large collections of learned rules –Learning Unification Constraints

Aug 29, 2007Statistical XFER MT33 Major Research Directions Extraction of Base-NP translations from parallel data: –Base-NPs are extremely important “building blocks” for transfer-based MT systems Frequent, often align 1-to-1, improve coverage Correctly identifying them greatly helps automatic word- alignment of parallel sentences –Parsers (or NP-chunkers) available for both languages: Extract base-NPs independently on both sides and find their correspondences –Parsers (or NP-chunkers) available for only one language (i.e. English): Extract base-NPs on one side, and find reliable correspondences for them using word-alignment, frequency distributions, other features… Promising preliminary results

Aug 29, 2007Statistical XFER MT34 Major Research Directions Algorithms for XFER and Decoding –Integration and optimization of multiple features into search-based XFER parser –Complexity and efficiency improvements (i.e. “Cube Pruning”) –Non-monotonicity issues (LM scores, unification constraints) and their consequences on search

Aug 29, 2007Statistical XFER MT35 Major Research Directions Discriminative Language Modeling for MT: –Current standard statistical LMs provide only weak discrimination between good and bad translation hypotheses –New Idea: Use “occurrence-based” statistics: Extract instances of lexical, syntactic and semantic features from each translation hypothesis Determine whether these instances have been “seen before” (at least once) in a large monolingual corpus –The Conjecture: more grammatical MT hypotheses are likely to contain higher proportions of feature instances that have been seen in a corpus of grammatical sentences. –Goals: Find the set of features that provides the best discrimination between good and bad translations Learn how to combine these into a LM-like function for scoring alternative MT hypotheses

Aug 29, 2007Statistical XFER MT36 Major Research Directions Building Elicitation Corpora: –Feature Detection –Corpus Navigation Automatic Rule Refinement Translation for highly polysynthetic languages such as Mapudungun and Iñupiaq

Aug 29, 2007Statistical XFER MT37 Questions?

Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:

Similar presentations

Presentation on theme: "Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:

Similar presentations

Presentation on theme: "Statistical XFER: Hybrid Statistical Rule-based Machine Translation Alon Lavie Language Technologies Institute Carnegie Mellon University Joint work with:"— Presentation transcript:

Similar presentations

About project

Feedback