11-731 Machine Translation
Syntax-Based Translation Models – Principles, Approaches, Acquisition
Alon Lavie, 16 March 2011


Outline
- Syntax-based Translation Models: Rationale and Motivation
- Resource Scenarios and Model Definitions
  - String-to-Tree, Tree-to-String and Tree-to-Tree
- Hierarchical Phrase-based Models (Chiang's Hiero)
- Syntax-Augmented Hierarchical Models (Venugopal and Zollmann)
- String-to-Tree Models
  - Phrase-Structure-based Model (Galley et al., 2004, 2006)
- Tree-to-Tree Models
  - Phrase-Structure-based Stat-XFER Model (Lavie et al., 2008)
  - DCU Tree-bank Alignment Method (Zhechev, Tinsley et al.)
- Tree-to-String Models
  - Tree Transduction Models (Yamada and Knight; Gildea et al.)

Syntax-based Models: Rationale
- Phrase-based models model translation at very shallow levels:
  - Translation equivalence is modeled at the multi-word lexical level
  - Phrases capture some cross-language local reordering, but only for phrases that were seen in training – no effective generalization
  - Non-local cross-language reordering is modeled only by permuting the order of phrases during decoding
  - No explicit modeling of syntax, structural divergences, or syntax-to-semantic mapping differences
- Goal: improve translation quality using syntax-based models
  - Capture generalizations, reorderings and divergences at appropriate levels of abstraction
  - Models direct the search during decoding toward more accurate translations
- Still statistical MT: acquire translation models automatically from (annotated) parallel data and model them statistically!

Syntax-based Statistical MT
- Building a syntax-based statistical MT system is similar in concept to simpler phrase-based SMT methods:
  - Model acquisition from bilingual sentence-parallel corpora
  - Decoders that, given an input string, find the best translation according to the models
- Our focus today is on the models and their acquisition
- Next week: Chris Dyer will cover decoding for hierarchical and syntax-based MT

Syntax-based Resources vs. Models
- An important distinction:
  1. What structural information for the parallel data is available during model acquisition and training?
  2. What type of translation models are we acquiring from the annotated parallel data?
- Structure available during acquisition – main distinctions:
  - Is the syntactic/structural information for the parallel training data given by external components (parsers) or inferred from the data?
  - Is syntax/structure available for one language or for both?
  - Phrase-structure or dependency-structure?
- What do we extract from parallel sentences?
  - Sub-sentential units of translation equivalence annotated with structure
  - Rules/structures that determine how these units combine into full transductions

Syntax-based Translation Models
- String-to-Tree:
  - Models explain how to transduce a string in the source language into a structural representation in the target language
  - During decoding:
    - No separate parsing on the source side
    - Decoding results in a set of possible translations, each annotated with syntactic structure
    - The best-scoring string+structure can be selected as the translation
  - Example (see the sketch below): ne VB pas -> (VP (AUX does) (RB not) x2)
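To make the rule notation concrete, here is a minimal Python sketch (illustrative only, not part of the original systems) of how such a string-to-tree rule might be represented; the SyncRule class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SyncRule:
    # Source side: a sequence of terminals and non-terminal slots.
    # Non-terminals appear as bare category symbols (e.g. "VB");
    # the target side refers back to them by source position ("x2").
    src: tuple
    # Target side: a tree fragment in bracketed notation, with xN
    # variables standing for translations of source non-terminals.
    tgt: str

# The example rule from the slide: "ne VB pas" maps to a negated English
# VP, where x2 is filled by the translation of the VB at source position 2.
rule = SyncRule(src=("ne", "VB", "pas"),
                tgt="(VP (AUX does) (RB not) x2)")
print(rule)
```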

Syntax-based Translation Models
- Tree-to-String:
  - Models explain how to transduce a structural representation of the source-language input into a string in the target language
  - During decoding:
    - Parse the source string to derive its structure
    - Decoding explores various ways of decomposing the parse tree into a sequence of composable models, each generating a translation string on the target side
    - The best-scoring string can be selected as the translation
  - Examples:

Syntax-based Translation Models
- Tree-to-Tree:
  - Models explain how to transduce a structural representation of the source-language input into a structural representation in the target language
  - During decoding:
    - The decoder synchronously explores alternative ways of parsing the source-language input string and transducing it into the corresponding target-language structural output
    - The best-scoring structure+string can be selected as the translation
  - Example:

    NP::NP [VP 的 CD 国家 之一] -> [one of the CD countries that VP]
    (
      ;; Alignments
      (X1::Y7)
      (X3::Y4)
    )

Structure Available During Acquisition
- What information/annotations are available for the bilingual sentence-parallel training data?
  - (Symmetrized) Viterbi word alignments (i.e. from GIZA++)
  - (Non-syntactic) extracted phrases for each parallel sentence
  - Parse trees/dependencies for the "source" language
  - Parse trees/dependencies for the "target" language
- Some major potential issues and problems:
  - GIZA++ word alignments are not aware of syntax – word-alignment errors can have bad consequences for the extracted syntactic models
  - Using external monolingual parsers is also problematic:
    - Using the single-best parse for each sentence introduces parsing errors
    - Parsers were designed for monolingual parsing, not translation
    - Parser design decisions for each language may be very different:
      - Different notions of constituency and structure
      - Different sets of POS and constituent labels

Hierarchical Phrase-Based Models
- Proposed by David Chiang in 2005
- A natural hierarchical extension of phrase-based models
- Representation: rules in the form of a synchronous CFG
- Formally syntactic, but with no direct association to linguistic syntax: a single non-terminal "X"
- Acquisition scenario: similar to standard phrase-based models
  - No independent syntactic parsing on either side of the parallel data
  - Uses "symmetrized" bi-directional Viterbi word alignments
  - Extracts phrases and rules (hierarchical phrases) from each parallel sentence
  - Models the extracted phrases statistically using MLE scores

Hierarchical Phrase-Based Models
- Extraction process overview:
  1. Start with standard phrase extraction from the symmetrized, Viterbi word-aligned sentence pair
  2. For each phrase pair, find all embedded phrase pairs, and create a hierarchical rule for each instance
  3. Accumulate the collection of all such rules from the entire corpus, along with their counts
  4. Model them statistically using maximum-likelihood estimate (MLE) scores (sketched below):
     - P(target|source) = count(source, target) / count(source)
     - P(source|target) = count(source, target) / count(target)
  5. Filtering:
     - Rules of length < 5 (terminals and non-terminals)
     - At most two non-terminals X
     - Non-terminals must be separated by a terminal
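A minimal Python sketch of the MLE scoring in step 4; the input format (one (source, target) pair per extracted rule instance) is an assumption for illustration.

```python
from collections import Counter

def mle_scores(instances):
    """Relative-frequency estimates over extracted rule instances:
    P(t|s) = count(s,t)/count(s) and P(s|t) = count(s,t)/count(t)."""
    instances = list(instances)
    pair_counts = Counter(instances)
    src_counts = Counter(s for s, _ in instances)
    tgt_counts = Counter(t for _, t in instances)
    p_tgt_given_src = {(s, t): c / src_counts[s]
                       for (s, t), c in pair_counts.items()}
    p_src_given_tgt = {(s, t): c / tgt_counts[t]
                       for (s, t), c in pair_counts.items()}
    return p_tgt_given_src, p_src_given_tgt

# Toy usage: three instances of two rules sharing one source side.
p_ts, p_st = mle_scores([("X1 de X2", "X2 of X1"),
                         ("X1 de X2", "X2 of X1"),
                         ("X1 de X2", "X1 of X2")])
print(p_ts[("X1 de X2", "X2 of X1")])  # 2/3
```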

Hierarchical Phrase-Based Models
- Example: Chinese-to-English rules

Syntax-Augmented Hierarchical Model
- Proposed by CMU's Venugopal and Zollmann in 2006
- Representation: rules in the form of a synchronous CFG
- Main goal: add linguistic syntax to the hierarchical rules extracted by the Hiero method
  - Hiero's "X" labels are completely generic – they allow substituting any sub-phrase into an X-hole (if the context matches)
  - Linguistic structure has labeled constituents – the labels determine which sub-structures are allowed to combine
- Idea: use labels derived from parse structures on one side of the parallel data to label the "X"s in the extracted rules
  - Labels from one language (e.g. English) are "projected" to the other language (e.g. Chinese)
- Major issues/problems:
  - How do we label X-holes that are not complete constituents?
  - What do we do about rule "fragmentation" – rules that are identical except for the labels inside them?

Syntax-Augmented Hierarchical Model
- Extraction process overview:
  1. Parse the "strong" side of the parallel data (i.e. English)
  2. Run the Hiero extraction process on the parallel-sentence instance and find all phrase pairs and all hierarchical rules for the sentence pair
  3. Labeling: for each X-hole that corresponds to a parse constituent C, label the X as C; for all other X-holes, assign combination labels (see the sketch below)
  4. Accumulate the collection of all such rules from the entire corpus, along with their counts
  5. Model the rules statistically: Venugopal & Zollmann use six different rule-score features instead of just two MLE scores
  6. Filtering: similar to Hiero rule filtering
- Advanced modeling: preference grammars
  - Avoid rule fragmentation: instead of explicitly labeling the X-holes in the rules with different labels, keep them as "X", with distributions over the possible labels that could fill each "X"; these distributions are used as features during decoding
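A Python sketch of the labeling idea in step 3. Only the simplest "A+B" combination label is shown; SAMT also defines categorial labels such as A/B and A\B, which this sketch omits, and the span-indexed constituents map is a hypothetical input format.

```python
def samt_label(span, constituents):
    """Label an X-hole by its target-side span. If the span is an exact
    parse constituent, use its label; otherwise try to describe it as two
    adjacent constituents ("A+B"); otherwise fall back to the generic X."""
    if span in constituents:
        return constituents[span]
    i, j = span
    for k in range(i + 1, j):
        if (i, k) in constituents and (k, j) in constituents:
            return constituents[(i, k)] + "+" + constituents[(k, j)]
    return "X"

# Toy usage over a three-word span with and without an exact constituent.
cons = {(0, 3): "NP", (0, 1): "DT", (1, 3): "NX"}
print(samt_label((0, 3), cons))                          # NP (exact match)
print(samt_label((0, 3), {(0, 1): "DT", (1, 3): "NX"}))  # DT+NX
print(samt_label((0, 2), cons))                          # X (no cover found)
```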

Syntax-Augmented Hierarchical Model
- Example:

Tree-to-Tree: Stat-XFER
- Developed by Lavie, Ambati and Parlikar in 2007
- Goal: extract linguistically-supported syntactic phrase pairs and synchronous transfer rules automatically from parsed parallel corpora
- Representation: synchronous CFG rules with constituent labels, POS tags or lexical items on the RHS of rules; syntax-labeled phrases are fully-lexicalized S-CFG rules
- Acquisition scenario:
  - The parallel corpus is word-aligned using GIZA++, then symmetrized
  - Phrase-structure parses for the source and/or target language of each parallel sentence are obtained using monolingual parsers

Transfer Rule Formalism
- Type information
- Part-of-speech/constituent information
- Alignments
- x-side constraints
- y-side constraints
- xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

; SL: the old man  TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1)
  (X1::Y3)
  (X2::Y4)
  (X3::Y2)
  ((X1 AGR) = *3-SING)
  ((X1 DEF) = *DEF)
  ((X3 AGR) = *3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = *DEF)
  ((Y3 DEF) = *DEF)
  ((Y2 AGR) = *3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)

Translation Lexicon: French-to-English Examples

DET::DET |: ["le"] -> ["the"]
( (X1::Y1) )

Prep::Prep |: ["dans"] -> ["in"]
( (X1::Y1) )

N::N |: ["principes"] -> ["principles"]
( (X1::Y1) )

N::N |: ["respect"] -> ["accordance"]
( (X1::Y1) )

NP::NP |: ["le respect"] -> ["accordance"]
( )

PP::PP |: ["dans le respect"] -> ["in accordance"]
( )

PP::PP |: ["des principes"] -> ["with the principles"]
( )

French-English Transfer Grammar: Example Rules (Automatically Acquired)

{PP,24691}
;; SL: des principes
;; TL: with the principles
PP::PP ["des" N] -> ["with the" N]
( (X1::Y1) )

{PP,312}
;; SL: dans le respect des principes
;; TL: in accordance with the principles
PP::PP [Prep NP] -> [Prep NP]
( (X1::Y1)
  (X2::Y2) )

Syntax-driven Acquisition Process
- Overview of the extraction process:
  1. Word-align the parallel corpus (GIZA++)
  2. Parse the sentences independently for both languages
  3. Tree-to-tree constituent alignment:
     a. Run the Constituent Aligner over the parsed sentence pairs
     b. Enhance the alignments with additional constituent projections
  4. Extract all aligned constituents from the parallel trees
  5. Extract all derived synchronous transfer rules from the constituent-aligned parallel trees
  6. Construct a "database" of all extracted parallel constituents and synchronous rules with their frequencies, and model them statistically (assign them MLE maximum-likelihood probabilities)

PFA Constituent Node Aligner
- Input: a bilingual pair of parsed and word-aligned sentences
- Goal: find all sub-sentential constituent alignments between the two trees that are translation equivalents of each other
- Equivalence constraint: a pair of constituents (S, T) are considered translation equivalents if:
  - All words in the yield of S are aligned only to words in the yield of T (and vice versa)
  - If S has a sub-constituent that is aligned to some constituent T', then T' must be a sub-constituent of T (and vice versa)
- The algorithm is a bottom-up process starting from the word level, marking nodes that satisfy the constraints (the word-level check is sketched below)
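A Python sketch of the word-level part of the equivalence constraint, assuming yields are given as sets of word positions and the alignment as a set of (source, target) position pairs; the real aligner additionally enforces the sub-constituent condition as it proceeds bottom-up.

```python
def yields_equivalent(src_yield, tgt_yield, links):
    """True iff every alignment link either stays entirely inside the two
    yields or entirely outside them, i.e. no word in one yield is aligned
    to a word outside the other yield. Unaligned words are permitted."""
    src_yield, tgt_yield = set(src_yield), set(tgt_yield)
    return all((i in src_yield) == (j in tgt_yield) for i, j in links)

# Toy usage: source words {2,3} align only into target words {5,6,7}: OK.
print(yields_equivalent({2, 3}, {5, 6, 7}, {(2, 5), (3, 7)}))  # True
# A link from outside the source yield into the target yield: violation.
print(yields_equivalent({2, 3}, {5, 6, 7}, {(2, 5), (4, 6)}))  # False
```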

PFA Node Alignment Algorithm: Example
- Words don't have to align one-to-one
- Constituent labels can be different in each language
- Tree structures can be highly divergent

PFA Node Alignment Algorithm: Example
- The aligner uses a clever arithmetic manipulation to enforce the equivalence constraints
- The resulting aligned nodes are highlighted in the figure

PFA Node Alignment Algorithm: Example
- Extraction of phrases: get the yields of the aligned nodes and add them to a phrase table, tagged with syntactic categories on both the source and target sides (a sketch follows)
- Example: NP # NP :: 澳洲 # Australia
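A sketch of this extraction step, assuming a hypothetical node interface carrying a label and a (start, end) span over its own sentence.

```python
from collections import namedtuple

Node = namedtuple("Node", "label start end")

def extract_syntactic_phrases(node_pairs, src_words, tgt_words):
    """Turn each aligned constituent pair into a category-tagged phrase
    pair in the slide's "NP # NP :: <src> # <tgt>" notation."""
    for s, t in node_pairs:
        src = " ".join(src_words[s.start:s.end])
        tgt = " ".join(tgt_words[t.start:t.end])
        yield f"{s.label} # {t.label} :: {src} # {tgt}"

# Toy usage with the slide's example pair.
pairs = [(Node("NP", 0, 1), Node("NP", 0, 1))]
print(list(extract_syntactic_phrases(pairs, ["澳洲"], ["Australia"])))
# ['NP # NP :: 澳洲 # Australia']
```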

PFA Node Alignment Algorithm: Example
All phrases from this tree pair:
1. IP # S :: 澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 。 # Australia is one of the few countries that have diplomatic relations with North Korea.
2. VP # VP :: 是 与 北韩 有 邦交 的 少数 国家 之一 # is one of the few countries that have diplomatic relations with North Korea
3. NP # NP :: 与 北韩 有 邦交 的 少数 国家 之一 # one of the few countries that have diplomatic relations with North Korea
4. VP # VP :: 与 北韩 有 邦交 # have diplomatic relations with North Korea
5. NP # NP :: 邦交 # diplomatic relations
6. NP # NP :: 北韩 # North Korea
7. NP # NP :: 澳洲 # Australia

Further Improvements
- The tree-to-tree (T2T) method is high precision but suffers from low recall
- Alternative: tree-to-string (T2S) methods (e.g. [Galley et al., 2006]) use trees on ONE side and project the nodes based on word alignments
  - High recall, but lower precision
- Recent work by Vamshi Ambati [Ambati and Lavie, 2008]: combine both methods (T2T*) by seeding with the T2T correspondences and then adding in additional consistent projected nodes from the T2S method
  - Can be viewed as restructuring the target tree to be maximally isomorphic to the source tree
  - Produces richer and more accurate syntactic phrase tables that improve translation quality (versus T2T and T2S)

Extracted Syntactic Phrases

TnS:
  English                          French
  the principles                   principes
  with the principles              principes
  accordance with the ..           respect des principes
  accordance                       respect
  in accordance with the ...       dans le respect des principes
  is all in accordance with ..     tout ceci dans le respect ...
  this                             et

TnT:
  English                          French
  the principles                   principes
  with the principles              des principes
  accordance                       respect

TnT*:
  English                          French
  the principles                   principes
  with the principles              des principes
  accordance with the ..           respect des principes
  accordance                       respect
  in accordance with the ...       dans le respect des principes
  is all in accordance with ..     tout ceci dans le respect ...
  this                             et

Comparative Results: French-to-English
- MT experimental setup:
  - Dev set: 600 sentences, WMT 2006 data, 1 reference
  - Test set: 2000 sentences, WMT 2007 data, 1 reference
  - NO transfer rules; Stat-XFER monotonic decoder
  - SALM language model (4M words)

Transfer Rule Acquisition
- Input: constituent-aligned parallel trees
- Idea: aligned nodes act as possible decomposition points of the parallel trees
  - The sub-trees of any aligned pair of nodes can be broken apart at any lower-level aligned nodes, creating an inventory of "tree-fragment" correspondences
  - Synchronous "tree-frags" can be converted into synchronous rules
- Algorithm:
  - Find all possible tree-frag decompositions from the node-aligned trees
  - "Flatten" the tree-frags into synchronous CFG rules

Rule Extraction Algorithm
- Sub-treelet extraction: extract sub-tree segments, including the synchronous alignment information into the target tree. All the sub-trees and the super-tree are extracted.

Rule Extraction Algorithm
- Flat rule creation: each treelet pair is flattened to create a rule in the Stat-XFER formalism. There are four major parts to the rule:
  1. Type of the rule: source- and target-side type information
  2. Constituent sequence of the synchronous flat rule
  3. Alignment information for the constituents
  4. Constraints in the rule (currently not extracted)

Rule Extraction Algorithm
- Flat rule creation: sample rule:

IP::S [NP VP .] -> [NP VP .]
(
  ;; Alignments
  (X1::Y1)
  (X2::Y2)
  ;; Constraints
)

Rule Extraction Algorithm
- Flat rule creation: sample rule:

NP::NP [VP 的 CD 国家 之一] -> [one of the CD countries that VP]
(
  ;; Alignments
  (X1::Y7)
  (X3::Y4)
)

Notes:
1. Any one-to-one aligned words are elevated to part-of-speech in the flat rule.
2. Any non-aligned words, on either the source or target side, remain lexicalized.
(A sketch of this flattening step follows.)
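A Python sketch of the flattening step that produces such right-hand sides, following the two notes above; the tree interface (label, word, children) and the alignment sets are hypothetical.

```python
def flatten_rhs(node, aligned_nodes, one_to_one_words):
    """Flatten a tree fragment into the RHS of a Stat-XFER rule:
    aligned sub-constituents become constituent labels, one-to-one
    aligned words are elevated to their POS tag (here, the preterminal
    label over the word), and unaligned words stay lexicalized (quoted)."""
    rhs = []
    for child in node.children:
        if child in aligned_nodes:              # lower decomposition point
            rhs.append(child.label)             # e.g. VP, NP
        elif child.word is not None:            # a preterminal over one word
            if child.word in one_to_one_words:
                rhs.append(child.label)         # POS tag, e.g. CD
            else:
                rhs.append(f'"{child.word}"')   # stays lexical, e.g. "的"
        else:                                   # unaligned internal node:
            rhs.extend(flatten_rhs(child, aligned_nodes, one_to_one_words))
    return rhs
```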

Rule Extraction Algorithm
All rules extracted:

VP::VP [VC NP] -> [VBZ NP]
( (*score* 0.5)
  ;; Alignments
  (X1::Y1)
  (X2::Y2) )

NP::NP [NR] -> [NNP]
( (*score* 0.5)
  ;; Alignments
  (X1::Y1) )

VP::VP [与 NP VE NP] -> [VBP NP with NP]
( (*score* 0.5)
  ;; Alignments
  (X2::Y4)
  (X3::Y1)
  (X4::Y2) )

NP::NP [VP 的 CD 国家 之一] -> [one of the CD countries that VP]
( (*score* 0.5)
  ;; Alignments
  (X1::Y7)
  (X3::Y4) )

IP::S [NP VP] -> [NP VP]
( (*score* 0.5)
  ;; Alignments
  (X1::Y1)
  (X2::Y2) )

NP::NP ["北韩"] -> ["North" "Korea"]
( ; a many-to-one alignment is a phrase )

Some Chinese XFER Rules

;; SL: (2,4) 对 台 贸易
;; TL: (3,5) trade to taiwan
;; Score: 22
{NP, }
NP::NP [PP NP] -> [NP PP]
((*score* ) (X2::Y1) (X1::Y2))

;; SL: (2,7) 直接 提到 伟 哥 的 广告
;; TL: (1,7) commercials that directly mention the name viagra
;; Score: 5
{NP, }
NP::NP [VP "的" NP] -> [NP "that" VP]
((*score* ) (X3::Y1) (X1::Y3))

;; SL: (4,14) 有 一 至 多 个 高 新 技术 项目 或 产品
;; TL: (3,14) has one or more new, high level technology projects or products
;; Score: 4
{VP, }
VP::VP ["有" NP] -> ["has" NP]
((*score* 0.1) (X2::Y2))

DCU Tree-bank Alignment Method
- Proposed by Tinsley, Zhechev et al. in 2007
- Main idea:
  - Focus on the parallel treebank scenario: parallel sentences annotated with constituent parse trees on both sides (obtained by parsing)
  - Same notion and idea as Lavie et al.: find sub-sentential constituent nodes across the two trees that are translation equivalents
  - Main difference: does not depend on the Viterbi word alignments
  - Instead, uses the lexical probabilities (obtained by GIZA++) to score all possible node-to-node alignments and incrementally grow the set of aligned nodes (a scoring sketch follows)
  - Various types of rules can then be extracted (e.g. Stat-XFER rules)
- Overcomes some of the problems due to incorrect and sparse word alignments
- Produces surprisingly different collections of rules than the Stat-XFER method
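A sketch of the lexical-probability scoring idea, simplified to score only the words inside the two spans (the published method also factors in the words outside them); the dictionary format lex[f][e] = P(e|f) and the combination by geometric mean are assumptions for illustration.

```python
import math

def span_score(src_span, tgt_span, lex_st, lex_ts):
    """Score a candidate node-pair from GIZA++ lexical translation
    probabilities alone, in the spirit of the DCU aligner: no Viterbi
    word alignment is consulted."""
    def avg_best(src, tgt, lex):
        # Mean, over words on one side, of each word's best translation
        # probability in the other span (small floor for unseen pairs).
        return sum(max(lex.get(f, {}).get(e, 1e-9) for e in tgt)
                   for f in src) / len(src)
    # Combine the two directional scores symmetrically.
    return math.sqrt(avg_best(src_span, tgt_span, lex_st) *
                     avg_best(tgt_span, src_span, lex_ts))

# Toy usage with a two-entry lexicon in each direction.
lex_st = {"principes": {"principles": 0.8}, "le": {"the": 0.6}}
lex_ts = {"principles": {"principes": 0.7}, "the": {"le": 0.5}}
print(span_score(["le", "principes"], ["the", "principles"], lex_st, lex_ts))
```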

String-to-Tree: Galley et al. (GHKM)
- Proposed by Galley et al. in 2004 and improved in 2006
- Idea: model full syntactic structure on the target side only, in order to produce translations that are more grammatical
- Representation: synchronous hierarchical strings on the source side and their corresponding tree fragments on the target side
- Example: ne VB pas -> (VP (AUX does) (RB not) x2)

String-to-Tree: Galley et al. (GHKM)
- Overview of the extraction process:
  1. Obtain symmetrized Viterbi word alignments for the parallel sentences
  2. Parse the "strong" side of the parallel data (i.e. English)
  3. Find all constituent nodes in the parsed (target-language) tree that have consistent word alignments to strings in the other language (a sketch of this consistency test follows)
  4. Treat these as "decomposition" points: extract tree fragments on the target side, along with the corresponding "gapped" string on the source side
  5. Labeling: for each "gap" that corresponds to a parse constituent C, label the gap as C
  6. Accumulate the collection of all such rules from the entire corpus, along with their counts
  7. Model the rules statistically: initially used "standard" P(tgt|src) MLE scores; later experiments used other scores, similar to SAMT
- Advanced modeling: extraction of composed rules, not just minimal rules
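A Python sketch of the alignment-consistency test behind steps 3 and 4, as a simplified stand-in for the GHKM frontier-node definition; the node interface (a label with a half-open target span) and the link format are assumptions.

```python
from collections import namedtuple

Span = namedtuple("Span", "label start end")

def consistent_nodes(nodes, links):
    """Yield target-parse nodes whose spans are consistent with the word
    alignment: the source words linked to the node's yield share no link
    with any target word outside it. These serve as the decomposition
    points of the extraction. `links` is a set of (src, tgt) positions."""
    for node in nodes:
        inside = {i for (i, j) in links if node.start <= j < node.end}
        outside = {i for (i, j) in links if not (node.start <= j < node.end)}
        if inside and not inside & outside:
            yield node

# Toy usage: both nodes are consistent with these three links.
links = {(0, 0), (1, 2), (2, 1)}
nodes = [Span("NP", 0, 1), Span("VP", 1, 3)]
print([n.label for n in consistent_nodes(nodes, links)])  # ['NP', 'VP']
```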

Tree Transduction Models
- Originally proposed by Yamada and Knight; influenced later work by Gildea et al. on tree-to-string models
- Conceptually simpler than most other models: learn finite-state transductions on source-language parse trees in order to map them into well-ordered and well-formed target sentences, based on the Viterbi word alignments
- Representation: simple local transformations on the tree structure, given contextual structure in the tree (sketched below):
  - Transduce leaf words in the tree from the source to the target language
  - Delete a leaf word or a sub-tree in a given context
  - Insert a leaf word or a sub-tree in a given context
  - Transpose (invert the order of) two sub-trees in a given context
  - [Advanced model by Gildea: duplicate and insert a sub-tree]
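A minimal sketch of these four edits on a mutable tree (hypothetical node interface with a children list and a word attribute on leaves); in the actual models each operation is conditioned on the surrounding tree context and scored probabilistically, which this sketch omits.

```python
def transduce_leaf(leaf, lexicon):
    """Transduce a leaf word from the source to the target language."""
    leaf.word = lexicon[leaf.word]

def delete_child(node, i):
    """Delete the i-th leaf word or sub-tree under `node`."""
    del node.children[i]

def insert_child(node, i, subtree):
    """Insert a leaf word or sub-tree at position i under `node`."""
    node.children.insert(i, subtree)

def transpose(node, i, j):
    """Invert the order of two sub-trees of `node` (by child index)."""
    node.children[i], node.children[j] = node.children[j], node.children[i]
```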

Tree Transduction Models
- Main issues/problems:
  - Some complex reorderings and correspondences cannot be modeled using these simple tree transductions
  - Highly sensitive to errors in the source-language parse tree and to word-alignment errors

Summary
- A variety of structure- and syntax-based models: string-to-tree, tree-to-string, tree-to-tree
- Different models utilize different structural annotations on the training resources and depend on different independent components (parsers, word alignments)
- Different model acquisition processes from parallel data, but several recurring themes:
  - Finding sub-sentential translation equivalents and relating them via hierarchical and/or syntax-based structure
  - Statistical modeling of the massive collections of rules acquired from the parallel data

Major Challenges
- Sparse coverage: the acquired syntax-based models are often much sparser in coverage than non-syntactic phrases
  - Because they apply additional hard constraints, beyond word alignment, as evidence of translation equivalence
  - Because the models fragment the data – rules are often observed far fewer times in the training data, making them more difficult to model statistically
- Consequently, "pure" syntactic models often lag behind phrase-based models in translation performance – observed and learned again and again by different groups (including our own)
  - This motivates approaches that integrate syntax-based models with phrase-based models
- Overcoming pipeline errors:
  - Adding independent components (parser output, Viterbi word alignments) introduces cumulative errors that are hard to overcome
  - Various approaches try to get around these problems
  - There is also recent work on "syntax-aware" word alignment and "bilingual-aware" parsing

Major Challenges
- Optimizing for structure granularity and labels:
  - Syntactic structure in MT is heavily based on Penn TreeBank structures and labels (POS and constituents) – are these needed and optimal for MT, even for MT into English?
  - Approaches range from a single abstract hierarchical "X" label to fully lexicalized constituent labels. What is optimal? How do we answer this question?
  - Alternative approaches (e.g. ITGs) aim to overcome this problem by unsupervised inference of the structure from the data
- Direct contrast and comparison of alternative approaches is extremely difficult:
  - Decoding with these syntactic models is highly complex and computationally intensive
  - Different groups/approaches develop their own decoders
  - It is hard to compare anything beyond BLEU (or other metric) scores
- Different groups continue to pursue different approaches – this is at the forefront of current research in statistical MT

References
- (2008) Vamshi Ambati & Alon Lavie: Improving syntax driven translation models by re-structuring divergent and non-isomorphic parse tree structures. AMTA-2008: MT at work: Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, Waikiki, Hawai'i, October 2008.
- (2005) David Chiang: A hierarchical phrase-based model for statistical machine translation. ACL-2005: 43rd Annual Meeting of the Association for Computational Linguistics, University of Michigan, Ann Arbor, June 2005.
- (2004) Michel Galley, Mark Hopkins, Kevin Knight & Daniel Marcu: What's in a translation rule? HLT-NAACL 2004: Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics Annual Meeting, May 2-7, 2004, Boston, USA.
- (2006) Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang & Ignacio Thayer: Scalable inference and training of context-rich syntactic translation models. Coling-ACL 2006: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, July 2006.
- (2008) Alon Lavie, Alok Parlikar & Vamshi Ambati: Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora. Second ACL Workshop on Syntax and Structure in Statistical Translation (ACL-08 SSST-2), 20 June 2008, Columbus, Ohio, USA.
- (2007) John Tinsley, Ventsislav Zhechev, Mary Hearne & Andy Way: Robust language pair-independent sub-tree alignment. MT Summit XI, September 2007, Copenhagen, Denmark.
- (2007) Ashish Venugopal & Andreas Zollmann: Hierarchical and syntax structured MT. First Machine Translation Marathon, Edinburgh, April 16-20, 2007.
- (2001) Kenji Yamada & Kevin Knight: A syntax-based statistical translation model. ACL-EACL-2001: 39th Annual Meeting of the Association for Computational Linguistics and 10th Conference of the European Chapter of the ACL, July 9-11, 2001, Toulouse, France.
- (2006) Andreas Zollmann & Ashish Venugopal: Syntax augmented machine translation via chart parsing. HLT-NAACL 2006: Proceedings of the Workshop on Statistical Machine Translation, New York, NY, USA, June 2006.