A Path-based Transfer Model for Machine Translation


A Path-based Transfer Model for Machine Translation Dekang Lin presented by Joshua Johanson

Training
Get a parallel corpus.
Parse the source-language side into dependency trees.
Word-align the text.
Extract the paths from the dependency trees.
Learn translation rules from the paths using the word alignment.

Translation
Parse the sentence into a dependency tree.
Extract the paths.
Merge the paths.
Choose the transfer rules that give the highest probability.
Output the resulting sentence.

What is a Dependency Tree?
A dependency tree shows the relationships between the words of a sentence. Links are directed from the head to the modifier.
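As a concrete illustration (not from the slides), a dependency tree can be stored as a map from each modifier to its head. The tree below for the example sentence used later is an assumed analysis; Minipar's actual output may differ.

```python
# A dependency tree stored as modifier -> (head, relation).
# Links are directed head -> modifier; the root has no head.
# Tree for "Connect both power cables to the controller"
# (assumed analysis; Minipar's actual output may differ).
DEPS = {
    "cables": ("Connect", "obj"),
    "both": ("cables", "det"),
    "power": ("cables", "nn"),
    "to": ("Connect", "mod"),
    "controller": ("to", "pcomp"),
    "the": ("controller", "det"),
}

def head_of(word):
    """Return the head of `word`, or None if it is the root."""
    entry = DEPS.get(word)
    return entry[0] if entry else None

def modifiers_of(word):
    """Return all modifiers directly governed by `word`."""
    return [m for m, (h, _) in DEPS.items() if h == word]
```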

Comparing a Dependency Tree with a POS Tree

What is a path?
A simple path is a single link between two nodes, or two links joined by an unaligned preposition.
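A sketch of path extraction under this definition, using an assumed link set for the running example; the preposition list is illustrative only.

```python
# Sketch: enumerate simple paths from a set of head -> modifier links.
# A simple path is a single link, or two links whose shared middle
# node is a preposition (which may itself be unaligned).
PREPOSITIONS = {"to", "of", "on", "in", "with"}  # illustrative list

LINKS = [
    ("Connect", "cables"),
    ("cables", "both"),
    ("cables", "power"),
    ("Connect", "to"),
    ("to", "controller"),
    ("controller", "the"),
]

def simple_paths(links, prepositions):
    paths = [(h, m) for h, m in links]           # single links
    for h, mid in links:
        if mid in prepositions:                  # two links joined by a preposition
            for h2, m in links:
                if h2 == mid:
                    paths.append((h, mid, m))
    return paths
```

Here ("Connect", "to", "controller") is the path used in the slides' worked example.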

Learning the Transfer Rules
Extracts only paths with all words aligned; a preposition in the middle of a path is allowed to be unaligned.
Uses the word alignment to determine the relative order of the paths (there can be gaps).
Learns the word translations and the remapping.

Phrases
Head span: the target word sequence aligned with the node n.
Phrase span: the word sequence from the lower bound of the head spans of all nodes in the subtree rooted at n to the upper bound of the same set of spans.
Both spans are indices into the target-language sentence.
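A minimal sketch of the two span computations, assuming a hypothetical word alignment for the running example (0-based indices into the French sentence):

```python
# Sketch: head span and phrase span over target-side word positions,
# given a word alignment (source word -> set of target indices).
# Indices are 0-based positions in the French sentence
# "Branchez les deux câbles d' alimentation sur le contrôleur".
ALIGN = {
    "Connect": {0},      # Branchez
    "both": {2},         # deux
    "cables": {3},       # câbles
    "power": {5},        # alimentation
    "controller": {8},   # contrôleur
    # "to" and "the" left unaligned for illustration
}

SUBTREE = {  # node -> all nodes in the subtree rooted there (assumed tree)
    "cables": {"cables", "both", "power"},
    "controller": {"controller", "the"},
}

def head_span(node):
    """Target positions aligned with the node itself."""
    pos = ALIGN.get(node, set())
    return (min(pos), max(pos)) if pos else None

def phrase_span(node):
    """Lower to upper bound of the head spans over the whole subtree."""
    positions = set()
    for n in SUBTREE.get(node, {node}):
        positions |= ALIGN.get(n, set())
    return (min(positions), max(positions)) if positions else None
```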

Start with a simple path, say from Connect (H) to controller (M), where H aligns to Branchez (H') and M aligns to contrôleur (M'). (A simple path can have a middle node with an unaligned preposition, like "to".)
Let S be the phrase span of a sibling of M (or the head span of H) that lies between H' and M' and is closest to M'. In this case it corresponds to câbles d’ alimentation (S).
The right-hand side of the rule is the simple path in the source language.
The left-hand side is:
the link between H' and M'
a link between M' and the nodes between S and the phrase span of M.
All unaligned words (like sur) become leaf nodes.
Example: Connect both power cables to the controller → Branchez les deux câbles d’ alimentation sur le contrôleur

To translate more complicated paths, combine the translations of simpler paths. This can create rules that are not themselves paths:

Divergences
This will create dependency trees that are not consistent with the new language. In this case the translation will still produce the words in the correct order.
Example: X swim across Y → X cruzar Y nadando

21 permutations

Generalize

Calculate the Translation Probability
Si is the path (Connect to controller).
Ti is the tree fragment (Branchez sur contrôleur).
c(Si) is the count of Si.
c(Ti, Si) is the count of Ti and Si occurring together.
M is a constant.
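The slide names the quantities but not the formula itself. A smoothed estimate consistent with these quantities might look like the sketch below; the exact form c(Ti, Si) / (c(Si) + M) is an assumption, and the paper's actual smoothing may differ.

```python
# Sketch of the rule probability P(Ti | Si). The smoothing form
# c(Ti, Si) / (c(Si) + M) is an assumption based on the slide's
# mention of a constant M; the paper's exact formula may differ.
def rule_prob(c_ti_si, c_si, M=1.0):
    """Estimate P(Ti | Si) from co-occurrence counts."""
    return c_ti_si / (c_si + M)
```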

Merging
If two target nodes are mapped to the same source node, they get merged.
Merging will not create a loop: we only have to worry about the unaligned words, which are leaf nodes and do not point to anything.
The merged translation is therefore a tree: all nodes are connected and there are no loops.

Node ordering
If two nodes go on different sides of the head h, they go to their respective sides.
Example: deux câbles & câbles existants → deux câbles existants

Node ordering
If they are on the same side of h in the target sentence as in the source sentence, they stay the same distance from h as in the source sentence.
Example: existing coaxial cables → câbles coaxiaux existants

Node ordering
If they are on the same side in the target sentence but not in the source sentence, use the word order of the source sentence.
Example: m1 h m2 (source) → h m1 m2 (target)
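The ordering rules above can be approximated in a small sketch. The signed-offset representation and the distance-based sort are an assumed formalization: they reproduce the slides' worked examples but are not the paper's exact algorithm.

```python
# Sketch (assumed formalization) of the node-ordering rules for
# modifiers placed around a shared head h. Each modifier carries its
# signed source-side offset from h and the target side (-1 left,
# +1 right) chosen by its transfer rule.
def arrange(head, mods):
    """mods: list of (word, src_offset, tgt_side). Returns target order."""
    left = [m for m in mods if m[2] < 0]
    right = [m for m in mods if m[2] > 0]
    # Nodes keeping their source side stay the same distance from h
    # (nearest stays nearest); side-flipped nodes are approximated by
    # the same distance sort, which reproduces the slides' examples.
    left.sort(key=lambda m: -abs(m[1]))   # farthest first, read left to right
    right.sort(key=lambda m: abs(m[1]))
    return [m[0] for m in left] + [head] + [m[0] for m in right]
```

For instance, merging "deux câbles" with "câbles existants" places the two modifiers on opposite sides of the head, and "existing coaxial cables" keeps each modifier at its source distance after both move to the right.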

Translation
Parse the sentence to obtain its dependency structure.
Extract all the paths in the dependency tree and retrieve the translations of all the paths.
Find rules that can be merged to cover the whole tree.
Output the one with the highest probability.

Probability
C is a set of paths covering the source sentence S.
Paths in C may overlap, but no path is completely contained in another in the final output.
This is a direct translation model (not a noisy-channel model).
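A minimal sketch of scoring a covering set as the product of its rule probabilities, computed in log space for numerical stability. Treating the sentence score as a plain product of P(Ti | Si) terms is an assumption consistent with a direct model.

```python
import math

# Sketch: score a covering set of paths as the product of its rule
# probabilities, computed in log space. Treating the sentence score
# as a plain product of P(Ti | Si) terms is an assumption.
def coverage_score(rule_probs):
    return sum(math.log(p) for p in rule_probs)
```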

Experiment
Used the English–French portion of the 1999 European Parliament Proceedings.
Used 1,755 sentences with 5–15 words out of 116,889.
Used Minipar to parse the sentences.
Used ProAlign to align the words.

Results

System         BLEU Score
IBM Model 4    .2555
Path-based     .2612
Phrasal Model  .3149

What is different about this approach?
Translations are based on a dependency tree in the source language, so the model is syntactically based.
There are fewer paths than subtrees (quadratic rather than exponential), so the model is less sparse.
It learns word order automatically.
Syntactic knowledge is required only for the source language, not the target language.