
SYNTAX BASED MACHINE TRANSLATION
UNDER GUIDANCE OF PROF. PUSHPAK BHATTACHARYYA
PRESENTED BY ROUVEN RÖHRIG (10V05101), ERANKI KIRAN (10438004), SRIHARSA MOHAPATRA ( ), ARJUN ATREYA ( )
9/4/2011

OUTLINE
Motivation
Introduction
Synchronous grammar
Syntax-based Language Model for SMT
Hierarchical Phrase-Based MT
Example Hindi translations
Joshua Toolkit
Conclusions

Motivation
Consider the following English-Japanese example:
(1) The boy stated that the student said that the teacher danced
(2) shoonen-ga gakusei-ga sensei-ga odotta to itta to hanasita
    (the-boy the-student the-teacher danced that said that stated)
-> It is easy to translate the words.
-> It is very hard to find the correct reordering!
Syntax-based machine translation techniques start with the syntax. Some can deliver guaranteed correct syntax!
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.

Introduction (1): Syntax-based Language Model
- Noisy channel model
- Uses 3 steps, starting from the parse tree:
  1. Reordering - create the foreign-language syntax tree
  2. Insertion - add extra words which are required in the target language
  3. Translation - translate the leaf words
Eugene Charniak, Kevin Knight et al. - Syntax-based Language Models for Statistical Machine Translation, Brown Univ. (2002)

Introduction (2): Basic phrase-based model
- Uses phrases instead of words
- Instance of the noisy channel model, modeled as:
  argmax_e P(e | f) = argmax_e P(e, f) = argmax_e P(e) x P(f | e)
- Then: 1. segmentation of e into phrases ē_1 … ē_I, 2. reordering of the ē_i, 3. translation of the ē_i using P(f̄ | ē)
- Problem: phrases are usually reordered independently of their content
  -> It is desirable to include a larger scope
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
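A toy sketch of how one candidate e is scored under this decomposition; the phrase table, the unigram language model and all numbers below are made up for illustration and are not taken from any cited system:

```python
import math

# hypothetical phrase translation probabilities P(f_phrase | e_phrase)
phrase_table = {
    ("I", "watashi wa"): 0.8,
    ("open", "akemasu"): 0.7,
    ("the box", "hako wo"): 0.6,
}

# hypothetical unigram language model P(e) over English words (toy numbers)
lm = {"I": 0.10, "open": 0.05, "the": 0.20, "box": 0.03}

def lm_logprob(e_words):
    """Toy unigram log P(e); a real system would use an n-gram LM."""
    return sum(math.log(lm.get(w, 1e-6)) for w in e_words)

def tm_logprob(phrase_pairs):
    """log P(f | e) as a product of phrase translation probabilities."""
    return sum(math.log(phrase_table.get(pair, 1e-9)) for pair in phrase_pairs)

# one candidate: e segmented into phrases, each paired with a foreign phrase
e_words = ["I", "open", "the", "box"]
pairs = [("I", "watashi wa"), ("open", "akemasu"), ("the box", "hako wo")]

score = lm_logprob(e_words) + tm_logprob(pairs)  # log( P(e) * P(f|e) )
print(f"log P(e) + log P(f|e) = {score:.2f}")
```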

Introduction (3): Hierarchical Phrase-Based Model
- Consists of words and phrases. For example:
  English: "Australia is one of the few countries that have diplomatic relations with North Korea"
  German: "Australien ist eines der wenigen Länder, das diplomatische Beziehungen mit Nord-Korea hat"
- A hierarchical phrase contains placeholders [i] for sub-phrases; an illustrative example is given below.
- This captures the fact that constituents are placed differently in German and English.
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
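For illustration only, a hierarchical phrase pair constructed for the sentence pair above (not the rule from Chiang's paper) that captures the different verb placement could be:

X → das diplomatische Beziehungen mit X[1] hat, that have diplomatic relations with X[1]

with X[1] covering the pair (Nord-Korea, North Korea): the German verb "hat" is clause-final, while the English "have" directly follows "that".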

Synchronous grammar (1)
- Production of a syntactically correct source-language string always delivers a syntactically correct target-language string
- Generalizes context-free grammars (CFGs)
- Generates pairs of strings, e.g.
  (1) S → NP[1] VP[2], NP[1] VP[2]
  (2) VP → V[1] NP[2], NP[2] V[1]
- The indices [i] model the links between non-terminal symbols
- Applying rule (1) and then rule (2) produces:
  NP[1] VP[2], NP[1] VP[2]  =>  NP[1] V[3] NP[4], NP[1] NP[4] V[3]
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.

Synchronous grammar (2)
NP[1] V[3] NP[4], NP[1] NP[4] V[3]
- When applying a rule, both sides have to be replaced together!
- When replacing NP[1] on the left side, NP[1] on the right side must also be replaced.
  NP → I, watashi wa
  NP → the box, hako wo
  V → open, akemasu
  => I open the box, watashi wa hako wo akemasu
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.
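A minimal sketch (not from the cited tutorial) of this derivation as linked string rewriting; the apply() helper and the bracketed indices are just one way to encode the linked non-terminals:

```python
def apply(pair, linked_nt, replacement):
    """Rewrite the linked non-terminal on BOTH sides of the string pair."""
    src, tgt = pair
    r_src, r_tgt = replacement
    return (src.replace(linked_nt, r_src), tgt.replace(linked_nt, r_tgt))

# start symbol expanded with rule (1): S -> <NP[1] VP[2], NP[1] VP[2]>
pair = ("NP[1] VP[2]", "NP[1] VP[2]")

# rule (2): VP -> <V[3] NP[4], NP[4] V[3]>  (English VO order vs. Japanese OV order)
pair = apply(pair, "VP[2]", ("V[3] NP[4]", "NP[4] V[3]"))

# lexical rules from the slide
pair = apply(pair, "NP[1]", ("I", "watashi wa"))
pair = apply(pair, "NP[4]", ("the box", "hako wo"))
pair = apply(pair, "V[3]",  ("open", "akemasu"))

print(pair)  # ('I open the box', 'watashi wa hako wo akemasu')
```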

Synchronous grammar (3)
- A solution for everything? -> Lowering or raising within the tree is not possible!
- Example: John misses Mary / Mary manque à John (Mary is-missed by John)
- In the French tree, "à John" is part of the VP, while in the English tree "John" is the subject NP
- Not possible to replace correctly with a synchronous CFG!
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.

Syntax-based Language Model for SMT
- Noisy channel model: the source-language sentence E is distorted by the channel into the foreign-language sentence F.
  argmax_E p(E|F) = argmax_E p(E) p(F|E) ......(1)   [p(E): LM, p(F|E): TM]
- Base SMT system: a parse-tree-to-string translation model (English parse tree [input] --> French sentence [output])
  p(E|F) ∝ Σ_π p(E, π) p(F|E, π), where π is a parse tree of the English sentence
- This model performs 3 types of operations: reordering, insertion, translation
- The direction of real translation (decoding) is the reverse of the translation model
- CFG rules are extracted from a parsed corpus of English using a standard bottom-up parser; the decoder is given a Chinese sentence and finds the best English parse tree using p(E) and p(F|E).
Eugene Charniak, Kevin Knight et al. - Syntax-based Language Models for Statistical Machine Translation, Brown Univ. (2002)

Syntax-based Language Model for SMT
Parsing/Language Model: comprises the following stages, based on the Penn Treebank corpus:
a. Non-lexical PCFG (creates a large parse forest for the sentence)
b. Pruning step: compute the inside and outside probabilities of the parse forest and eliminate edges whose probability falls below an empirically set threshold:
   p(e^k_{i,j} | w_{1,n}) = [ α(n^k_{i,j}) · p(rule(e^k_{i,j})) · Π_{n^n_{l,m} ∈ rhs(e^k_{i,j})} β(n^n_{l,m}) ] / p(w_{1,n})
c. Lexical PCFG (examines the edges and pulls out the most probable parse tree from the forest)
Issues while parsing: incompatibilities with the translation model, phrasal translations, non-linear word ordering.
Eugene Charniak, Kevin Knight et al. - Syntax-based Language Models for Statistical Machine Translation, Brown Univ. (2002)
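A minimal sketch of the pruning computation above; the alpha/beta values, the rule probability and the threshold are invented for illustration:

```python
def edge_posterior(alpha_parent, rule_prob, child_betas, sentence_prob):
    """p(edge | w_1..n) = alpha(parent) * p(rule) * prod(beta(child)) / p(w_1..n)."""
    prob = alpha_parent * rule_prob
    for beta in child_betas:
        prob *= beta
    return prob / sentence_prob

THRESHOLD = 1e-4  # empirically set pruning threshold (assumed value)

posterior = edge_posterior(alpha_parent=2e-3, rule_prob=0.4,
                           child_betas=[5e-2, 1e-1], sentence_prob=3e-5)
print(f"posterior = {posterior:.3f}, keep = {posterior >= THRESHOLD}")
# -> posterior = 0.133, keep = True
```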

Syntax-based Translation Model for SMT
Input: "He adores listening to music" [English parse tree]
Output: "kare ha ongaku wo kiku no ga daisuki desu" [Japanese sentence]
The slide's figure shows the channel operations applied to the English parse tree (nodes VB, PRP, VB1, VB2, TO, NN over "He adores listening to music"):
1. Reordering (R-table): the children of each node are permuted, e.g. SVO -> SOV
2. Insertion (N-table): Japanese function words (e.g. ha, no, ga) are inserted
3. Translation (T-table): the leaf words are translated (He -> kare, music -> ongaku, listening -> kiku, adores -> daisuki desu)
Kenji Yamada and Kevin Knight - Syntax-based Translation Model, Univ. of Southern California (2002)
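A minimal sketch (not the authors' implementation) of the reordering and translation operations on a simplified version of this parse tree; the r-table patterns and word translations are read off the example, and the insertion of the Japanese particles (ha, wo, no, ga) is omitted, so the output is only the particle-free skeleton:

```python
# simplified English parse of "He adores listening to music" (illustrative bracketing)
tree = ("VB", [("PRP", "He"),
               ("VB1", "adores"),
               ("VB2", [("VB", "listening"),
                        ("TO", [("TO", "to"), ("NN", "music")])])])

# r-table: child-label sequence -> new order (SVO -> SOV style)
r_table = {
    ("PRP", "VB1", "VB2"): (0, 2, 1),
    ("VB", "TO"): (1, 0),
    ("TO", "NN"): (1, 0),
}

# t-table: toy leaf translations; "to" is simply dropped here
t_table = {"He": "kare", "adores": "daisuki desu",
           "listening": "kiku", "to": "", "music": "ongaku"}

def reorder(node):
    """Reordering step: permute children according to the r-table."""
    label, kids = node
    if isinstance(kids, str):
        return node
    kids = [reorder(k) for k in kids]
    order = r_table.get(tuple(k[0] for k in kids), range(len(kids)))
    return (label, [kids[i] for i in order])

def translate_leaves(node):
    """Translation step: translate each leaf word with the t-table."""
    label, kids = node
    if isinstance(kids, str):
        return (label, t_table.get(kids, kids))
    return (label, [translate_leaves(k) for k in kids])

def yield_string(node):
    """Read off the leaf words of the (reordered, translated) tree."""
    label, kids = node
    if isinstance(kids, str):
        return [kids] if kids else []
    return [w for k in kids for w in yield_string(k)]

# Insertion of case markers is omitted, so this prints the skeleton only:
print(" ".join(yield_string(translate_leaves(reorder(tree)))))
# -> kare ongaku kiku daisuki desu
```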

Syntax-based Translation Model for SMT
The model parameters, i.e. the probabilities n(ν|N), r(ρ|R) and t(τ|T), determine the behaviour of the translation model.
Kenji Yamada and Kevin Knight - Syntax-based Translation Model, Univ. of Southern California (2002)
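As a sketch of how these parameters combine (the notation follows Yamada and Knight's formulation and is reconstructed here, not quoted from the slide): for an English parse tree ε with nodes ε_1 … ε_n, the channel probability of a foreign string f sums over all operation sequences θ = (ν, ρ, τ) whose yield is f:

```latex
P(f \mid \varepsilon) \;=\; \sum_{\theta:\,\mathrm{Str}(\theta(\varepsilon)) = f}\;
\prod_{i=1}^{n} n(\nu_i \mid \mathcal{N}(\varepsilon_i))\;
               r(\rho_i \mid \mathcal{R}(\varepsilon_i))\;
               t(\tau_i \mid \mathcal{T}(\varepsilon_i))
```

where N(ε_i), R(ε_i) and T(ε_i) are the features used to condition insertion, reordering and translation at node ε_i.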

Hierarchical Phrase-Based MT
- Uses hierarchical phrases, not words, as translation units; a phrase is a sequence of words
- Uses bitext to infer the syntax for both source and target language
- The syntax is a synchronous grammar:
  - inherent reordering
  - phrase-to-phrase alignment
  - phrase-to-phrase translation
  - handling of divergence
- Translation has two phases: training and decoding
- The bitext is a word-aligned corpus: a set of triples ⟨f, e, ~⟩ where
  - f is the French sentence (source language)
  - e is the corresponding English sentence (target language)
  - ~ is the many-to-many mapping between phrases in the sentences
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Hierarchical Phrase-Based MT
- A phrase grammar rule is of the form X → ⟨source phrase, target phrase⟩; in the example, (i, j) is the source phrase boundary and (k, l) is the target phrase boundary
- The example shows that the attachment of a subordinate clause is reversed in English
- In the training phase, the minimal set of all such rules is extracted
- A derivation D is a set of triples [R, i, j]; each triple is a step in the derivation:
  - R is the rule used
  - f_i^j is the phrase in the source language that was rewritten using the grammar
- In the decoding phase, given a French sentence f, D(f) rewrites the sentence in English
- An alternate notation for f and e is f(D) and e(D) respectively
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Hierarchical Phrase-Based MT
The following is a partial left-most derivation of the sentence:
English: "Australia is one of the few countries that have diplomatic relations with North Korea"
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Hierarchical Phrase-Based MT
- For decoding, a CKY parser with beam search is used
- The argmax is computed over each derivation D that yields f; the corresponding English sentence is given by e(D)
- The decoder outputs the English yield of the highest-probability single derivation
- In each cell of the CKY parser, the beam search eliminates:
  - each item with a score worse than β times the best score in the same cell
  - each item worse than the b-th best item in the same cell
  - b = 40, β = 10^-1 for X cells; b = 15, β = 10^-1 for S cells
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
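A minimal sketch of this per-cell pruning; the item names and scores are invented, while b and β follow the slide:

```python
def prune_cell(items, b, beta):
    """Keep at most the b best items and drop every item whose score is
    worse than beta times the best score in the cell (scores are
    probabilities, so higher is better)."""
    items = sorted(items, key=lambda it: it[1], reverse=True)
    best = items[0][1] if items else 0.0
    return [it for it in items[:b] if it[1] >= beta * best]

# X cells: b = 40, beta = 10^-1 ; S cells: b = 15, beta = 10^-1
cell = [("item%d" % i, p) for i, p in enumerate([0.9, 0.5, 0.08, 0.005])]
print(prune_cell(cell, b=40, beta=1e-1))  # drops the two low-scoring items
```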

Hierarchical Phrase-Based MT
- w(r) is the weight of rule r [the first formula; see the sketch below]
- P_lm is the language model probability of sentence e
- |e| denotes the length of sentence e
- λ_lm and λ_wp denote the respective exponents
- exp(−λ_wp · |e|) is the word penalty
- φ_i and λ_i denote the i-th feature and its weight (exponent)
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
Franz Josef Och and Hermann Ney - The alignment template approach to statistical machine translation, Computational Linguistics, 2004.
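A sketch of the two formulas these symbols refer to, reconstructed here following Chiang (2005) rather than copied from the slide:

```latex
w(D) = \prod_{(r,i,j)\in D} w(r)\;\times\; p_{\mathrm{lm}}(e)^{\lambda_{\mathrm{lm}}}\;\times\;
       \exp\!\bigl(-\lambda_{\mathrm{wp}}\,|e|\bigr),
\qquad
w(r) = \prod_{i} \phi_i(r)^{\lambda_i}
```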

Hierarchical Phrase-Based MT
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Example translation: Hierarchical Phrase-Based MT
Rules (ENGLISH | hindi):
S → { X1 X2, X1 ne X2 }
X → { RAM, ram }
X → { HAD TOLD, kaha tha }
THE ENGLISH SENTENCE: RAM HAD TOLD
Derivation:
S ⇒ { X1 X2, X1 ne X2 } ⇒ { RAM X2, ram ne X2 } ⇒ { RAM HAD TOLD, ram ne kaha tha }
Compared to pure statistical phrase-based MT, hierarchical phrase-based (in general, syntax-based) MT handles dependency and divergence better.

Example translation: Synchronous CFG Translation
Rules (ENGLISH | hindi):
S → { NP VP | NP ne VP }
NP → { N | N }
N → { RAM | ram }
VP → { V_PAST_P | V_PAST_P }
V_PAST_P → { HAD TOLD | kaha tha }
THE ENGLISH SENTENCE PARSE TREE:
S ⇒ NP VP ⇒ N VP ⇒ RAM VP ⇒ RAM V_PAST_P ⇒ RAM HAD TOLD
HINDI TRANSLATION BY APPLYING THE DUAL OF EACH RULE:
S ⇒ NP ne VP ⇒ N ne VP ⇒ ram ne VP ⇒ ram ne V_PAST_P ⇒ ram ne kaha tha

Joshua Toolkit
- Open-source toolkit for parsing-based machine translation
- The Joshua decoder is written in Java and implements several algorithms:
  - chart parsing
  - n-gram language model integration
  - beam and cube pruning
  - unique k-best extraction

Goals
- Extensibility: the implementation is organized as packages for customization
- End-to-end cohesion: integrated with suffix-array grammar extraction (Callison-Burch et al., 2005) and minimum error rate training (Och, 2003)
- Scalability: parsing and pruning algorithms are implemented with dynamic programming

Experiment
- Data/Training: Chinese-English, 570K parallel data; the language model was built on 130M words
- Decoding: SCFG with 3M rules, 49M n-grams
- Results show the decoder is 22 times faster than other decoders
- Translation quality, measured with BLEU-4 (Papineni et al., 2002), is also better

Joshua Features: Decoding Algorithms
- Grammar formalism: currently handles only SCFGs
- Chart parsing: generates the one-best or k-best translations using the CKY algorithm
- Pruning: increases computational efficiency

Joshua Features: Decoding Algorithms
- Hyper-graphs and k-best extraction: for each source sentence, a hyper-graph containing a set of derivations is generated; k-best extraction retrieves a subset of these derivations
- Parallel and distributed computing: parallel decoding and a distributed language model

Conclusions
- Syntax-based language and translation models provide a promising technique for use in noisy-channel SMT.
- Syntax-based LMs can be combined with several MT systems.
- Parsing models such as YC, YT and BT have achieved 45% perfect translations by improving the English syntax of the translations.
- By using syntactic linguistic information about different word orders and case markers, the quality of translation can be improved.

Conclusions
- Hierarchical phrase-based translation does not require a synchronous grammar as input; it uses the bitext to generate one.
- Hierarchical phrase pairs can be learned without any syntactically annotated training data.
- Translation accuracy improves over pure statistical phrase-based MT by 7.5% (relative, in BLEU).
- The major challenge in the future is to produce a complete, provable MT system.
- Another goal is to reduce the number of derivation trees with a more syntactically motivated grammar.

References
1. Eugene Charniak, Kevin Knight et al. - Syntax-based Language Models for Statistical Machine Translation, Brown Univ. (2002)
2. David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
3. David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.
4. Franz Josef Och and Hermann Ney - The alignment template approach to statistical machine translation, Computational Linguistics, 2004.
5. Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Ann Irvine, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Ziyuan Wang, Jonathan Weese and Omar F. Zaidan - Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies, Proceedings of the Joint 5th Workshop on Statistical Machine Translation and MetricsMATR, Uppsala, Sweden, July 2010.