Statistical NLP Spring 2011

Presentation transcript:

Statistical NLP Spring 2011
Lecture 10: Phrase Alignment
Dan Klein – UC Berkeley

Phrase Weights

Phrase Scoring
- Learning phrase weights has been tried, several times: [Marcu and Wong, 02], [DeNero et al., 06], and others
- Seems not to work well, for a variety of partially understood reasons
- Main issue: big chunks get all the weight; obvious priors don't help
- Though see [DeNero et al., 08]
(Example alignment from the slide: les chats aiment le poisson frais ↔ cats like fresh fish .)
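
For contrast, the surface heuristic these learned models compete against simply scores extracted phrase pairs by relative frequency. A minimal sketch in Python, with toy data echoing the slide's example (not code from the talk):

```python
from collections import Counter

def relative_frequency_scores(extracted_pairs):
    """Surface heuristic: phi(e | f) = count(f, e) / count(f),
    counted over phrase pairs extracted from a word-aligned corpus."""
    pair_counts = Counter(extracted_pairs)               # (f, e) -> count
    src_counts = Counter(f for f, _ in extracted_pairs)  # f -> count
    return {(f, e): c / src_counts[f] for (f, e), c in pair_counts.items()}

# Toy phrase pairs:
pairs = [("le poisson", "the fish"), ("le poisson", "the fish"),
         ("le poisson", "fish"), ("poisson frais", "fresh fish")]
print(relative_frequency_scores(pairs))
# ('le poisson', 'the fish') -> 2/3, ('le poisson', 'fish') -> 1/3, ...
```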

Phrase Size
- Phrases do help
- But they don't need to be long
- Why should this be?

Lexical Weighting

Phrase Alignment

Identifying Phrasal Translations
the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen; 过去 can translate as "past" or "go over")
Phrase alignment models:
- Learning a translation model or synchronous grammar requires us to find a correspondence between sentences
- That correspondence is multi-level (big and small phrases); we want a model of alignment that directly models this
- Example: 过去 has many translations, but the following word helps disambiguate
- Choose a segmentation and a one-to-one phrase alignment
- Underlying assumption: there is a correct phrasal segmentation

Unique Segmentations?
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
- Problem 1: Overlapping phrases can be useful (and complementary)
- Problem 2: Phrases and their sub-phrases can both be useful
- Hypothesis: This is why models of phrase alignment don't work well

Identifying Phrasal Translations
the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
- This talk: modeling sets of overlapping, multi-scale phrase pairs
- Input: sentence pairs; output: extracted phrases
Note: without context, we don't even know whether to make "year" plural in English.

… But the Standard Pipeline has Overlap!
(Figure: word alignment grid for 过去 两 年 中 ↔ In the past two years, with overlapping extracted phrases)
Pipeline: Sentence Pair → Word Alignment → Extracted Phrases
Notes: We treat alignment as a task, but it's plagued with much-maligned metrics, controversial test sets, and a tenuous relationship to machine translation performance. We do need to decide which pieces of a sentence to recycle and how to translate them; the disconnect between alignment and extraction fuels these problems.

Our Task: Predict Extraction Sets
- Conditional model of extraction sets given sentence pairs
(Figure: indexed sentence pair 过去 两 年 中 ↔ In the past two years, with its extraction set)
Pipeline: Sentence Pair → Extracted Phrases + "Word Alignments"

Alignments Imply Extraction Sets
- Word-level alignment links
- Word-to-span alignments
- Extraction set of bispans
(Figure: alignment grid for 过去 两 年 中 ↔ In the past two years)
Notes: We want (a) a notion of coherent extraction sets, and (b) a way of leveraging existing word alignment resources; these definitions give us both. (Meta point: this is phrase extraction, described in the context of defining a phrase-to-phrase alignment model.)

Incorporating Possible Alignments
- Sure and possible word links
- Word-to-span alignments
- Extraction set of bispans
(Figure: alignment grid for 过去 两 年 中 ↔ In the past two years, now with possible links)
Key point: We are predicting extraction sets, but using word-level and word-to-span alignments to constrain the output space.

Linear Model for Extraction Sets
- Features on sure links
- Features on all bispans
Notes:
- The output is an extraction set projected from a word alignment
- Features are defined on sure alignment links and extracted phrase pairs
- The model is linear over all features in an extraction set (a minimal scoring sketch follows)
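
To make the model concrete, here is a minimal sketch of the linear scoring function. The feature functions and encodings are hypothetical placeholders, not the system's actual templates:

```python
import numpy as np

def extraction_set_score(sure_links, bispans, link_feats, bispan_feats, w):
    """score(y) = w . phi(y), where phi(y) sums feature vectors over
    all sure alignment links and all extracted bispans in y."""
    phi = np.zeros_like(w)
    for link in sure_links:    # link = (src_word_index, tgt_word_index)
        phi += link_feats(link)
    for bispan in bispans:     # bispan = (i1, i2, j1, j2), half-open spans
        phi += bispan_feats(bispan)
    return float(w @ phi)
```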

Features on Bispans and Sure Links
Some features on sure links:
- HMM posteriors
- Presence in dictionary
- Numbers & punctuation
(Figure examples: 过 "go over", 地球 "the Earth")
Features on bispans:
- HMM phrase table features, e.g., phrase relative frequencies
- Lexical indicator features for phrases with common words
- Shape features, e.g., Chinese character counts
- Monolingual phrase features, e.g., "the _____"
Notes: Moses features receive high weight; distortion is still a good idea; monolingual features help too. (Hypothetical feature templates are sketched below.)
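
For illustration, a few of the slide's feature templates written out as code. These are hypothetical reconstructions of the template shapes, not the actual feature set:

```python
def example_bispan_features(src_words, tgt_words, bispan):
    """Hypothetical bispan feature templates in the spirit of the slide:
    shape features and monolingual phrase features."""
    i1, i2, j1, j2 = bispan
    f, e = src_words[i1:i2], tgt_words[j1:j2]
    feats = {}
    feats[f"shape:src_len={i2 - i1},tgt_len={j2 - j1}"] = 1.0
    feats[f"shape:zh_char_count={sum(len(w) for w in f)}"] = 1.0
    if len(e) > 1 and e[0] == "the":
        feats["mono:tgt=the_____"] = 1.0   # "the _____" template
    return feats

print(example_bispan_features(["过去", "两", "年"],
                              ["the", "past", "two", "years"],
                              (0, 1, 0, 2)))
```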

Getting Gold Extraction Sets
- Hand aligned: sure and possible word links
- Deterministic: word-to-span alignments (find the min and max alignment index for each word)
- Deterministic: extraction set of bispans (a bispan is included iff every word within the bispan aligns within the bispan)
Notes: We really wouldn't need word links (only word-to-span alignments), except that word links are what we have annotated, so we can use existing resources to train our model. (Both deterministic steps are sketched below.)
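
Both deterministic steps are easy to state in code. A sketch under the slide's definitions, using sure links only (the system's treatment of possible links is more permissive; this simplification ignores them):

```python
def word_to_spans(links, src_len):
    """For each source word, the min and max target index it aligns to
    (None if unaligned)."""
    spans = [None] * src_len
    for i, j in links:
        spans[i] = ((j, j) if spans[i] is None
                    else (min(spans[i][0], j), max(spans[i][1], j)))
    return spans

def extraction_set(links, src_len, tgt_len, max_len=4):
    """Include a bispan [i1,i2) x [j1,j2) iff it contains at least one
    link and no link crosses its boundary, i.e., every word within the
    bispan aligns only within the bispan."""
    bispans = []
    for i1 in range(src_len):
        for i2 in range(i1 + 1, min(i1 + max_len, src_len) + 1):
            for j1 in range(tgt_len):
                for j2 in range(j1 + 1, min(j1 + max_len, tgt_len) + 1):
                    inside = any(i1 <= i < i2 and j1 <= j < j2
                                 for i, j in links)
                    crossing = any((i1 <= i < i2) != (j1 <= j < j2)
                                   for i, j in links)
                    if inside and not crossing:
                        bispans.append((i1, i2, j1, j2))
    return bispans

# Toy alignment (indices only, not the annotated gold for the example):
links = [(0, 2), (1, 3), (2, 4), (3, 0)]
print(extraction_set(links, src_len=4, tgt_len=5))
```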

Discriminative Training with MIRA
- Gold (annotated) vs. guess (arg max w·φ)
- Loss function: F-score of bispan errors (precision & recall); the loss was in fact 1 - F_5
- Training criterion: the minimal change to w such that the gold is preferred to the guess by a loss-scaled margin
Note: MIRA works well, but other learners could be tried. (A sketch of the update follows.)
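
A sketch of what that criterion implies, in the standard closed-form 1-best MIRA step, together with the 1 - F_5 bispan loss. The clip constant C and the edge-case handling are assumptions:

```python
import numpy as np

def bispan_loss(gold_bispans, guess_bispans, beta=5.0):
    """1 - F_beta over bispans; beta = 5 weights recall heavily."""
    gold, guess = set(gold_bispans), set(guess_bispans)
    if not gold and not guess:
        return 0.0
    tp = len(gold & guess)
    if tp == 0:
        return 1.0
    p, r = tp / len(guess), tp / len(gold)
    return 1.0 - (1 + beta**2) * p * r / (beta**2 * p + r)

def mira_update(w, phi_gold, phi_guess, loss, C=1.0):
    """Minimal change to w such that the gold outscores the guess by a
    loss-scaled margin (clipped closed-form 1-best step)."""
    diff = phi_gold - phi_guess
    violation = loss - w @ diff      # margin shortfall
    if violation <= 0 or not diff.any():
        return w                     # constraint already satisfied
    tau = min(C, violation / (diff @ diff))
    return w + tau * diff
```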

Inference: An ITG Parser
- ITG captures some bispans (a minimal sketch follows)
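
To make "an ITG parser" concrete: a minimal Viterbi sketch over bispans, where every bispan is either a terminal block or a straight/inverted combination of two sub-bispans. This is the unpruned O(n^3 m^3) dynamic program, not the pruned coarse-to-fine parser a real system would need; block_score is a hypothetical bispan scorer:

```python
from functools import lru_cache

def itg_viterbi(block_score, n, m):
    """Best ITG derivation score over the full bispan [0,n) x [0,m)."""
    @lru_cache(maxsize=None)
    def best(i1, i2, j1, j2):
        score = block_score(i1, i2, j1, j2)   # terminal block
        for k in range(i1 + 1, i2):           # straight/inverted splits
            for l in range(j1 + 1, j2):
                straight = best(i1, k, j1, l) + best(k, i2, l, j2)
                inverted = best(i1, k, l, j2) + best(k, i2, j1, l)
                score = max(score, straight, inverted)
        return score
    return best(0, n, 0, m)

# Toy scorer that prefers square blocks:
print(itg_viterbi(lambda i1, i2, j1, j2: -abs((i2 - i1) - (j2 - j1)), 3, 3))
```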

Experimental Setup
- Chinese-to-English newswire
- Parallel corpus: 11.3 million words; sentence length ≤ 40
- Supervised data: 150 training & 191 test sentences (NIST '02)
- Unsupervised model: jointly trained HMM (Berkeley Aligner)
- MT systems: tuned and tested on NIST '04 and '05

Baselines and Limited Systems
- HMM: joint training & competitive posterior decoding
  - Source of many features for supervised models
  - State-of-the-art unsupervised baseline
- ITG: supervised ITG aligner with block terminals
  - Re-implementation of Haghighi et al., 2009
  - State-of-the-art supervised baseline
- Coarse: supervised block ITG + possible alignments
  - Coarse pass of the full extraction set model
Notes: While an extraction set model predicts phrase pairs, it can also be used as an aligner; comparing to aligners gives us the most direct form of evaluation.

Word Alignment Performance
(Results figure not recoverable from the transcript; speaker note: we win.)

Extracted Bispan Performance
(Results figure not recoverable from the transcript; speaker note: we win.)

Translation Performance (BLEU)
(Results table not recoverable from the transcript; speaker note: we win.)
- Supervised conditions also included HMM alignments