Imposing Constraints from the Source Tree on ITG Constraints for SMT. Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita, National Institute of Information and Communications Technology.


Imposing Constraints from the Source Tree on ITG Constraints for SMT. Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita. National Institute of Information and Communications Technology; ATR Spoken Language Communication Research Labs.; Kindai University, School of Science and Engineering, Department of Information

Background: In current SMT, erroneous word reordering is one of the most serious problems, especially for dissimilar language pairs such as English-Chinese or English-Japanese. Two types of approaches address it. 1) Introducing linguistic syntax directly (tree-to-string, string-to-tree, and tree-to-tree models). Drawback: not robust to parsing errors.

Background (continued): 2) Assigning probabilistic constraints to word reordering (IBM distortion, lexical reordering, ITG). Drawback: these constraints are weaker than those of the first type. Our aim is to introduce syntax information into this second type.

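ITG Constraints: The source sentence is represented by a binary tree, and target word orders are generated by rotating (swapping) the branches of the tree's nodes. [Figure: binary trees over source words A, B, C, D yielding target orders such as dbca and acbd.] Some target word orders cannot be generated from any source binary tree; for four words these are exactly bdac and cadb. Note that ITG constraints do not consider the particular source tree: every binary tree over the sentence is allowed.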
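To make the ITG constraint concrete, here is a minimal Python sketch (our illustration, not the authors' code) that tests whether a target order can be produced by rotating the nodes of some binary tree over the source words. It uses the standard recursive split test: an order is reachable if it can be cut into two blocks that each cover a contiguous range of source positions and are themselves reachable. The helper names `itg_reachable` and `spell` are hypothetical.

```python
from itertools import permutations


def itg_reachable(order):
    """True if `order` (source word positions listed in target order) can be
    produced by rotating nodes of SOME binary tree over the source words."""
    if len(order) <= 1:
        return True
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        # Each block must cover a contiguous range of source positions; the two
        # blocks are then adjacent source spans, kept in order or rotated.
        if max(left) - min(left) + 1 != len(left):
            continue
        if max(right) - min(right) + 1 != len(right):
            continue
        if itg_reachable(left) and itg_reachable(right):
            return True
    return False


def spell(order, words="abcd"):
    """Render source positions as target words, e.g. (3, 1, 2, 0) -> 'dbca'."""
    return "".join(words[i] for i in order)


all_orders = list(permutations(range(4)))
allowed = [spell(p) for p in all_orders if itg_reachable(p)]
excluded = [spell(p) for p in all_orders if not itg_reachable(p)]
print(len(allowed))  # 22
print(excluded)      # ['bdac', 'cadb']
```

Both example orders from the slide, dbca and acbd, pass the test; only the two "inside-out" orders fail.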

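Basic Idea of IST-ITG: Apply ITG constraints under the given source tree (the parse of the source sentence) instead of over every possible binary tree. [Figure: two source trees over A, B, C, D.] The first tree allows only abcd, abdc, bacd, badc, cdab, cdba, dcab, dcba; the second allows only abcd, bacd, cabd, cbad, dabc, dbac, dcab, dcba. Each tree thus allows 8 orders, whereas the original ITG constraints allow 22 combinations.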
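The two lists can be reproduced with a short Python sketch (an illustration, not the authors' implementation). It assumes, as the listed orders imply, that the two source trees are ((A B)(C D)) and (((A B) C) D), and enumerates every order obtained by keeping or rotating each node. The function name `ist_itg_orders` is hypothetical.

```python
from itertools import product


def ist_itg_orders(tree):
    """All target orders reachable under IST-ITG for one binary source tree.

    A tree is either a leaf (a one-character word) or a 2-tuple of subtrees.
    At every internal node the two branches are either kept or rotated.
    """
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    orders = []
    for l, r in product(ist_itg_orders(left), ist_itg_orders(right)):
        orders.append(l + r)  # keep the branch order
        orders.append(r + l)  # rotate the node
    return sorted(orders)


balanced = (("a", "b"), ("c", "d"))        # assumed first tree: ((A B)(C D))
left_branching = ((("a", "b"), "c"), "d")  # assumed second tree: (((A B) C) D)
print(ist_itg_orders(balanced))
# ['abcd', 'abdc', 'bacd', 'badc', 'cdab', 'cdba', 'dcab', 'dcba']
print(ist_itg_orders(left_branching))
# ['abcd', 'bacd', 'cabd', 'cbad', 'dabc', 'dbac', 'dcab', 'dcba']
```

Because the tree is fixed, only its three internal nodes can be rotated, which is why each list has 2 × 2 × 2 = 8 entries.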

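The Number of Word Order Combinations: For a binary source tree over $N$ words, $N!$ word order combinations are allowed without constraints. Under the IST-ITG constraints, this number is reduced to $2^{N-1}$, because each of the tree's $N-1$ internal nodes is either kept or rotated. [Table: number of combinations without constraints, under ITG constraints, and under IST-ITG, for two values of $N$.] For example, with $N = 4$: 24 without constraints, 22 under ITG constraints, and 8 under IST-ITG.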
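A minimal Python sketch comparing the three counts for small $N$. The ITG column relies on the known combinatorial fact that the ITG-permissible orders of $N$ words are counted by the large Schröder numbers (equivalently, the separable permutations); the recurrence and the helper name `itg_count` are our own illustration, not taken from the slides.

```python
from math import factorial


def itg_count(n):
    """Number of word orders of n words allowed by ITG constraints.

    This is the large Schroeder number S(n - 1), computed with the recurrence
    (m + 1) S(m) = 3(2m - 1) S(m - 1) - (m - 2) S(m - 2), S(0) = 1, S(1) = 2.
    """
    if n <= 1:
        return 1
    s = [1, 2]
    for m in range(2, n):
        s.append((3 * (2 * m - 1) * s[m - 1] - (m - 2) * s[m - 2]) // (m + 1))
    return s[n - 1]


print(f"{'N':>2} {'no constraint':>14} {'ITG':>8} {'IST-ITG':>8}")
for n in range(2, 11):
    # IST-ITG on a binary tree: one keep/rotate choice per internal node.
    print(f"{n:>2} {factorial(n):>14} {itg_count(n):>8} {2 ** (n - 1):>8}")
# The N = 4 row reads: 24, 22, 8.
```

The gap between the columns widens quickly: the ITG count grows roughly like $5.8^N$, while IST-ITG on a binary tree grows only like $2^N$.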

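Extension to Non-binary Trees: Parsing results are sometimes not binary trees. For nodes with more than two branches, any reordering of the branches is allowed. [Figure: a source tree over A, B, C, D in which A attaches above a node with three branches B, C, D.] The allowed orders are: abcd, abdc, acbd, acdb, adbc, adcb, bcda, bdca, cbda, cdba, dbca, dcba.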

For a non-binary tree, the number of word order combinations allowed by IST-ITG is $\prod_{i} b_i!$, where $b_i$ is the number of branches of the $i$-th node. For a binary tree every $b_i = 2$, which gives $2^{N-1}$; for the four-word example above, $2! \cdot 3! = 12$.
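The non-binary extension can be sketched the same way: at each node, every permutation of its branches is tried. This Python sketch assumes the example tree is (A (B C D)), which is what the twelve listed orders imply; the names `ist_itg_orders_general` and `ist_itg_count` are hypothetical, and the count function simply evaluates the product-of-factorials formula.

```python
from itertools import permutations, product
from math import factorial, prod


def ist_itg_orders_general(tree):
    """Target orders allowed by IST-ITG for a possibly non-binary source tree.

    A leaf is a one-character word; an internal node is a tuple of two or more
    subtrees. Any permutation of a node's branches is allowed (for a binary
    node this is just keep-or-rotate).
    """
    if isinstance(tree, str):
        return {tree}
    child_orders = [ist_itg_orders_general(child) for child in tree]
    result = set()
    for perm in permutations(range(len(tree))):
        # Concatenate one order per branch, with the branches in permuted order.
        for parts in product(*(child_orders[i] for i in perm)):
            result.add("".join(parts))
    return result


def ist_itg_count(tree):
    """Product over internal nodes of (number of branches)!."""
    if isinstance(tree, str):
        return 1
    return factorial(len(tree)) * prod(ist_itg_count(child) for child in tree)


tree = ("a", ("b", "c", "d"))  # assumed example tree: (A (B C D))
orders = sorted(ist_itg_orders_general(tree))
print(len(orders), ist_itg_count(tree))  # 12 12
print(orders)
```

The enumeration and the formula agree, and on a binary tree such as (("a", "b"), ("c", "d")) both give 8, matching $2^{N-1}$.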

IST-ITG in Phrase-based SMT (1): Problem: the unit of the parse tree is the word, but the unit of phrase-based SMT is the phrase, so the units differ. Additional rules for phrase-based SMT: 1) Word reordering that breaks a phrase is not allowed. 2) Phrase-internal word reordering is not checked. Advantage: word-to-word alignments are sometimes not one-to-one, but phrase-to-phrase alignments are always one-to-one.

IST-ITG in Phrase-based SMT (2): [Figure: source tree over words A to G with a phrase (Ph); five example cases are checked against the tree: 1: NG (unacceptable), 2: NG, 3: OK, 4: NG, 5: OK.]

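Decoding Algorithm with IST-ITG: During decoding, each node of the source tree is labeled with its translation status: 0: untranslated, 1: translated, 2: translating (partly translated). [Figure: source tree over words A to I; target output so far: d e.]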

If phrases A and B are translated next, a sub-tree comes to include two or more “2” (translating) branches, so the hypothesis is rejected (NG). [Figure: target output: d e a b.]

Consider the minimum translating sub-tree (the smallest sub-tree that includes both “0” and “1”). [Figure: target output so far: d e.]

If the new phrase completes the whole minimum translating sub-tree, the hypothesis is accepted (OK). [Figure: target output: d e f g h.]

If the new phrase translates only a sub-part of the minimum translating sub-tree, the hypothesis is also accepted (OK). [Figure: target output: d e g.]
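The OK/NG decisions above can be approximated by a compact prefix test. The Python sketch below is our own formulation under stated assumptions, not the authors' decoder code: a hypothesis is a list of source phrases [start, end) in target order; words inside a phrase are taken as one block in source order, since phrase-internal reordering is not checked; and a hypothesis passes if every sub-tree's translated words are contiguous in the target so far and every “2” (translating) sub-tree is the one currently being extended. In our reading this amounts to requiring that the partial translation can still be completed into an order IST-ITG allows. The names `prefix_ok`, `status`, `leaves`, and `subtrees` are hypothetical.

```python
from typing import List, Tuple, Union

Tree = Union[int, tuple]  # leaf: source word position; internal node: tuple of subtrees


def leaves(t: Tree) -> List[int]:
    return [t] if isinstance(t, int) else [w for c in t for w in leaves(c)]


def subtrees(t: Tree):
    yield t
    if not isinstance(t, int):
        for c in t:
            yield from subtrees(c)


def status(t: Tree, covered: set) -> int:
    """0: untranslated, 1: translated, 2: translating (partly translated)."""
    n = sum(w in covered for w in leaves(t))
    return 0 if n == 0 else 1 if n == len(leaves(t)) else 2


def prefix_ok(tree: Tree, phrases: List[Tuple[int, int]]) -> bool:
    """Check a partial hypothesis given as source phrases [start, end) in target order.

    Assumption: a phrase's words are treated as one block in source order,
    because phrase-internal reordering is not checked.
    """
    order = [w for s, e in phrases for w in range(s, e)]
    if len(set(order)) != len(order):
        return False  # some source word would be translated twice
    pos = {w: i for i, w in enumerate(order)}
    covered = set(order)
    for t in subtrees(tree):
        idx = sorted(pos[w] for w in leaves(t) if w in covered)
        if not idx:
            continue
        # A sub-tree's translated words must be contiguous in the target so far ...
        if idx[-1] - idx[0] + 1 != len(idx):
            return False
        # ... and a "translating" (status 2) sub-tree must be the one being
        # extended, i.e. its block must end the current target prefix.
        if status(t, covered) == 2 and idx[-1] != len(order) - 1:
            return False
    return True


tree = ((0, 1), (2, 3))                    # words A B C D under ((A B)(C D))
print(prefix_ok(tree, [(0, 1), (2, 3)]))   # False: A, then C leaves (A B) unfinished
print(prefix_ok(tree, [(2, 4), (0, 2)]))   # True: phrase "C D", then phrase "A B"
```

Checking only the partial hypothesis keeps the test cheap enough to run on every hypothesis expansion, which is where the constraint has to be applied in a left-to-right phrase-based decoder.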

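English and Japanese Patent Corpus Experiments: experimental corpus size (E/J = English/Japanese; single reference)
Train: # of sentences 1.8M; total words (E/J) M/64M; # of entries (E/J) 188K/118K
Dev: total words (E/J) 30K/32K; # of entries (E/J) 4,072/3,646
Eval: total words (E/J) 29K/32K; # of entries (E/J) 3,967/3,682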

Other Experimental Conditions:
LM training: SRI Language Model toolkit (5-grams)
Word alignment for TM training: GIZA++
Decoder: Moses-compatible in-house decoder named CleopATRa
Evaluation measures: BLEU, NIST, WER, PER

English and Japanese Patent Translation Experimental Results, English-to-Japanese: [Table: BLEU, NIST, WER, and PER for No Constraint, Monotone, IBM, IST-ITG, IBM+Lex, and IBM+Lex+IST.]


English and Japanese Patent Translation Experimental Results, Japanese-to-English: [Table: BLEU, NIST, WER, and PER for IBM+Lex and IBM+Lex+IST-ITG.]


English-to-Chinese Translation Experiments (NIST MT08 English-to-Chinese track):
Training data for TM: 6.2M
Training data for LM: 20.1M
Development data: 1,664 (1 reference)
Evaluation data: 1,859 (4 references)
Experimental Results: [Table: W-Bleu, C-Bleu, WER, and CER for IBM+Lex and IBM+Lex+IST-ITG.]

Conclusion: We proposed a new word reordering constraint, IST-ITG, that uses the source tree structure; it is an extension of ITG constraints. We evaluated the proposed method in three experiments: E-J and J-E patent translation, and the NIST MT08 E-C track. In all experiments, BLEU and WER improved. The WER improvement was especially large, confirming the method's effectiveness for global word reordering.

Thank you!