Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from the paper. Critique and snarky remarks, however, are original.
Motivation We have a new way of learning phrase translations, but… “What is the best method to extract phrase translation pairs?”
What to do, what to do Compose a framework for consistent comparison Implement each algorithm Compare the results
Evaluation Framework
Phrases
Models involved: language model, statistical model for translation, distortion model
Decoder
Evaluation Framework: Phrases We all know what phrases are, right? NP, VP… wait, what? Oh. Here, they’re generic spanning and non-overlapping subsequences of words. Are these guys really linguists?
Evaluation Framework: Models
Language model: trigram, usually p(e_n | e_{n-1}, e_{n-2})
Translation model: argmax_e p(e|f) = argmax_e p(f|e) p(e)
e_best = argmax_e p(f|e) p_LM(e) ω^length(e)
p(f|e) is decomposed into phrase translation and distortion probabilities: p(f|e) = Π_i φ(f̄_i | ē_i) d(a_i − b_{i-1})
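The argmax above, e_best = argmax_e p(f|e) p_LM(e) ω^length(e), is easiest to compute in log space. A minimal sketch; the function names and candidate format are my own assumptions, not from the paper:

```python
import math

def score(trans_logprob, lm_logprob, length, omega=1.0):
    # log of p(f|e) * p_LM(e) * omega^length(e):
    # the word-count weight omega turns into length * log(omega)
    return trans_logprob + lm_logprob + length * math.log(omega)

def best_translation(candidates, omega=1.0):
    # candidates: (english_string, log p(f|e), log p_LM(e)) triples
    return max(candidates,
               key=lambda c: score(c[1], c[2], len(c[0].split()), omega))
```

Setting ω < 1 penalizes longer outputs, which is the usual role of the length weight.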
Evaluation Framework: Models
Distortion model: d(a_i − b_{i-1})
a_i = start position of the foreign phrase translated into the i-th English phrase
b_{i-1} = end position of the foreign phrase translated into the (i−1)-th English phrase
Learned from the joint probability model Ping told us about
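The paper instantiates d with a single parameter, d(a_i − b_{i-1}) = α^|a_i − b_{i-1} − 1|. A minimal sketch; the α value here is an arbitrary assumption:

```python
def distortion_cost(a_i, b_prev, alpha=0.5):
    # d(a_i - b_{i-1}) = alpha^|a_i - b_{i-1} - 1|:
    # monotone phrase order (a_i directly follows b_{i-1}) costs 1.0;
    # reordering decays exponentially with the size of the jump.
    return alpha ** abs(a_i - b_prev - 1)
```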
Evaluation Framework: Decoder Left-to-right incremental Stack-based beam search Estimates future costs Same decoder used in all experiments
Baseline Experiments Word-based alignment Syntactic phrases Phrase alignments
Baseline: Word-based Alignment Learn the phrases from word alignments
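The extraction criterion can be sketched as: keep every phrase pair whose alignment links stay inside the pair's bounding box. This is a sketch under my own naming and a small max_len, not the paper's code:

```python
def extract_phrases(alignment, e_len, max_len=3):
    # alignment: set of (f_pos, e_pos) word-alignment links.
    # A phrase pair is consistent with the alignment if no link
    # connects a word inside the pair to a word outside it.
    pairs = []
    for e1 in range(e_len):
        for e2 in range(e1, min(e1 + max_len, e_len)):
            # foreign positions linked to the English span [e1, e2]
            fs = [f for (f, e) in alignment if e1 <= e <= e2]
            if not fs:
                continue
            f1, f2 = min(fs), max(fs)
            if f2 - f1 + 1 > max_len:
                continue
            # no link may leave the box [f1, f2] x [e1, e2]
            if all(e1 <= e <= e2 for (f, e) in alignment if f1 <= f <= f2):
                pairs.append(((f1, f2), (e1, e2)))
    return pairs
```

For a two-word swap alignment this yields both single-word pairs plus the spans that cover the whole swap, which is the intended behavior.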
Baseline: Syntactic Phrases Learn only syntactically correct phrases Start with the word-based alignment Prune out the phrase pairs which aren’t subtrees in the parsed sentences for either language.
Baseline: Phrase Alignment Marcu and Wong, 2002 Yes, this is the paper Ping just presented.
Experiment Background Europarl and BLEU Training corpora of 10, 20, 40, 80, 160, and 320 thousand sentence pairs
Baseline Results Notice the bottom row there? Comparing these models is like taking a 5-year-old to a chess tournament.
Baseline Results
More Experiments
Weighting Syntactic Phrases
Maximum Phrase Length
Lexical Weighting
Phrase Extraction Heuristic
Simpler Underlying Word-Based Models
Other Languages
Experiments and Results: Weighting Syntactic Phrases Double the count on syntactic phrases. Is that sufficient? Insufficient post-analysis on this one: the BLEU score was the same. Were the translations in better syntax? Did the translations at least use more syntactic phrases?
Experiments and Results Maximum Phrase Length
Experiments and Results Lexical Weighting Lexical probability distribution Lexical Weight
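The lexical weight scores a phrase pair word-by-word under a word alignment a, averaging the word translation probabilities w for each foreign word over its links (unaligned words are scored against NULL). Reconstructed here from the paper's definition, so verify against the original:

```latex
p_w(\bar{f} \mid \bar{e}, a) \;=\;
  \prod_{i=1}^{n} \frac{1}{\left|\{\, j : (i,j) \in a \,\}\right|}
  \sum_{(i,j) \in a} w(f_i \mid e_j)
```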
Experiments and Results Lexical Weighting Example
Experiments and Results Lexical Weighting With multiple alignments Extended to fit this model
Experiments and Results Lexical Weighting Improvement: 0.01 BLEU
Experiments and Results: Phrase Extraction Heuristic
Align bidirectionally; note this gives two different word alignment sets.
Start with the intersection of the two sets.
Add possible alignments only if they’re in the union of the sets and only if they connect at least one previously unaligned word.
Experiments and Results: Phrase Extraction Heuristic Algorithm
Start with the first English word.
Expand only directly adjacent alignment points.
Move to the next English word, repeat.
Finally add non-adjacent alignment points which meet the heuristic criteria.
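The growing loop above can be sketched roughly as follows. All names are mine, and the final non-adjacent pass is omitted, so this is an approximation of the heuristic, not the paper's exact procedure:

```python
def grow_alignment(intersection, union_links):
    # Start from the intersection of the two directional alignments,
    # then repeatedly add union links that are adjacent (incl.
    # diagonally) to an existing link and connect at least one
    # previously unaligned word.
    a = set(intersection)
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    changed = True
    while changed:
        changed = False
        for (f, e) in sorted(union_links - a):
            aligned_f = {x for (x, _) in a}
            aligned_e = {y for (_, y) in a}
            adjacent = any((f + df, e + de) in a for df, de in neighbors)
            if adjacent and (f not in aligned_f or e not in aligned_e):
                a.add((f, e))
                changed = True
    return a
```

A union link that is never adjacent to the growing set (e.g. an isolated far-away point) is left out, which is exactly why the intersection/union split makes the extracted alignment conservative.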
Experiments and Results
Simpler Underlying Word-Based Models: IBM models 1-4
Experiments and Results Other Languages