1
A Phrase-Based, Joint Probability Model for Statistical Machine Translation
Daniel Marcu and William Wong (2002)
Presented by Ping Yu, 01/17/2006
2
Statistical Machine Translation: A Refresher
3
The Noisy Channel
Translate from f to e: the English sentence e passes through a noisy channel (encoder) and comes out as the foreign sentence f; the decoder recovers e' from f.
e' = argmax_e P(e|f) = argmax_e P(e) · P(f|e)
P(e): source model (language model); P(f|e): channel model (translation model)
4
Language Model
Bag translation: treat a sentence as a bag of words
N-gram language model
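For reference (not shown on the slide), the standard n-gram factorization being referred to is

P(e) = P(w_1 \cdots w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})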
5
Translation Model
Alignment: P(f, a | e)
Fertility: depends solely on the English word
  Mary did not slap the green witch
  Mary no daba una bofetada a la verde bruja (Spanish)
  Fertilities: Mary: 1; did: 0; slap: 3; the: 2; green: 1; witch: 1
(Example from Kevin Knight's tutorial)
6
Conditional Probability & Word-Based Statistical MT
Fertility: one-to-one and one-to-many mappings from e to f
Conditional probability: given e, what is the probability of an alignment with f? i.e., p(f, a | e)
Word-based MT: IBM Models 1-5
7
How About Many-to-Many Mappings?
e.g., a three-word phrase "a b c" aligned as a whole to a two-word phrase "x y"
8
Out of sight, out of mind: "Invisible Idiot"
Round-trip output from Systran:
  French: Hors de la vue, hors de l'esprit. → Back to English: Out of the sight, of the spirit.
  German: Aus dem Anblick des Geistes heraus. → Back to English: From the sight of the spirit out.
  Italian: Dalla vista dello spirito fuori. → Back to English: From the sight of the spirit outside.
  Portuguese: Da vista do espírito fora. → Back to English: Of the sight of the spirit it are.
  Spanish: De la vista del alcohol está. → Back to English: Of the Vista of the alcohol it is.
From http://www.discourse.net/archives/2005/06/of_the_vista_of_the_alcohol_it_is.html
9
Lost in Translation
10
Solution: Many-to-Many Mapping
How? Word-based vs. phrase-based
11
Alignment Between Multiple Phrases
"Phrases" here are not really linguistic phrases
Phrases are defined differently in different models
Most extracted phrases are based on word-based alignments
  Och and Ney (1999): alignment template model
  Melamed (2001): non-compositional compounds model
12
Marcu and Wong (2002)
13
Promising Features
Finds phrases and alignments simultaneously for both source and target sentences
Directly models phrase-based probabilities
Does not depend on word-based probabilities
14
Phrase & Concept
Phrase: a sequence of consecutive words
Concept: a pair of aligned phrases
A set of concepts C can be linearized into a sentence pair (E, F) if E and F can be obtained by permuting the phrases e_i and f_i that characterize all concepts c_i ∈ C. This property is denoted by the predicate L(E, F, C). (A minimal sketch of such a check follows.)
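As an illustration (my own sketch, not from the slides), the linearization predicate L(E, F, C) can be checked by asking whether some ordering of the concepts' source phrases concatenates exactly to E and, independently, some ordering of their target phrases concatenates to F. All names below are hypothetical.

def covers(sentence, phrases):
    """True if some permutation of `phrases` concatenates exactly to `sentence`."""
    def backtrack(pos, remaining):
        if pos == len(sentence):
            return not remaining
        for i, ph in enumerate(remaining):
            if sentence[pos:pos + len(ph)] == list(ph):
                if backtrack(pos + len(ph), remaining[:i] + remaining[i + 1:]):
                    return True
        return False
    return backtrack(0, list(phrases))

def linearizable(E, F, concepts):
    """Predicate L(E, F, C): E is covered by the e-phrases and F by the f-phrases."""
    e_phrases = [e for e, _ in concepts]
    f_phrases = [f for _, f in concepts]
    return covers(E, e_phrases) and covers(F, f_phrases)

# Toy example with two concepts
E = "the green witch".split()
F = "la bruja verde".split()
C = [(("the",), ("la",)), (("green", "witch"), ("bruja", "verde"))]
print(linearizable(E, F, C))  # True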
15
Two Models
Model 1:
  – Joint probability distribution
  – Phrases are equivalent translations
16
Model 2
A joint probability model with position-based distortion
Models the probability of the alignment (positions) between two phrases
17
Probability to Generate a Sentence Pair
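The slide's equations were shown as an image; the following is a hedged reconstruction of the two joint models from the paper (notation approximate). Model 1 sums over all concept sets C that linearize into the sentence pair:

p(E, F) = \sum_{C \,:\, L(E, F, C)} \; \prod_{c_i \in C} t(\bar{e}_i, \bar{f}_i)

Model 2 multiplies each concept's contribution by a position-based distortion factor for every word position j of the phrase \bar{f}_i, taken relative to the center of mass of the aligned phrase \bar{e}_i:

p(E, F) = \sum_{C \,:\, L(E, F, C)} \; \prod_{c_i \in C} \Big[ t(\bar{e}_i, \bar{f}_i) \prod_{j} d\big(\mathrm{pos}(f_i^{j}), \mathrm{poscm}(\bar{e}_i)\big) \Big]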
18
How?
Sentences → phrases → concepts
19
Four Steps
1. Determine phrases & concepts
2. Initialize the joint probability of concepts, i.e., the t-distribution table
3. EM training on Viterbi alignments
   – Calculate the t-distribution table
   – A full iteration, then an approximation of EM
   – Viterbi alignment
   – Smoothing
4. Generate conditional probabilities from the joint probabilities, as needed by the decoder
20
Step 1: Phrase Determination
Keep all unigrams
Keep longer n-grams with frequency >= 5
(a sketch follows)
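A minimal sketch of this inventory step (my own illustration; the count threshold of 5 is from the slide, the 6-word cap anticipates the limitation mentioned on a later slide, and all names are hypothetical):

from collections import Counter

def phrase_inventory(corpus, max_len=6, min_count=5):
    """Keep all unigrams plus longer n-grams that occur at least `min_count` times."""
    counts = Counter()
    for sentence in corpus:                     # each sentence is a list of tokens
        for n in range(1, max_len + 1):
            for i in range(len(sentence) - n + 1):
                counts[tuple(sentence[i:i + n])] += 1
    return {ph for ph, c in counts.items() if len(ph) == 1 or c >= min_count}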
21
Step 2: Initialize the t-Distribution Table
Given a sentence E of l words, there are S(l, k) ways in which the l words can be partitioned into k non-empty concepts (S is the Stirling number of the second kind)
22
Similarly, there are S(m, k) ways in which the m words of F can be partitioned into k non-empty concepts
The number of concepts k ranges from 1 to min(l, m)
Total number of concept alignments between the two sentences: (formula sketched below)
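The formula the slide points to can be reconstructed from the quantities just defined (my reconstruction, consistent with Marcu and Wong's derivation): S(l, k) is the Stirling number of the second kind,

S(l, k) = \frac{1}{k!} \sum_{i=0}^{k-1} (-1)^i \binom{k}{i} (k - i)^{l}

and the total number of concept alignments between E (l words) and F (m words) is

\sum_{k=1}^{\min(l, m)} k! \; S(l, k) \; S(m, k)

since each of the k! one-to-one pairings of the k source concepts with the k target concepts yields a distinct alignment.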
23
Probability of Two Concepts
24
What About Word Order?
The equation does not take word order into consideration
Yet phrases must consist of consecutive words
The formula overestimates the numerator and denominator equally, so the approximation works well in practice
25
Step 3: EM Training on Viterbi Alignments
After the initial t-table is built, EM can be used to improve the parameters
However, it is infeasible to compute expectations over all possible alignments
So for the initial alignment, only concepts with high t-probabilities are aligned
26
Implementation
Greedy alignment: greedily produce an initial alignment
Hill climbing: examine the probability of neighboring alignments to reach a local maximum by performing the following operations:
27
Swap concepts
Merge concepts
Break a concept apart
Move words across concepts
(a hill-climbing sketch follows)
From www.iccs.informatics.ed.ac.uk/~osborne/msc-projects/oconnor.pdf
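To make the hill-climbing loop concrete, here is a minimal sketch (my own illustration, not the paper's code). Concepts are pairs of phrases, an alignment is a list of concepts, and the score is the product of t-probabilities; only the swap and merge moves are implemented, the break and word-move operations are omitted for brevity. All names are hypothetical.

import math

def score(alignment, t_table, floor=1e-7):
    """Log-probability of an alignment under the t-distribution."""
    return sum(math.log(t_table.get((e, f), floor)) for e, f in alignment)

def neighbors(alignment):
    """Yield alignments reachable by one swap or one merge of concepts."""
    n = len(alignment)
    for i in range(n):
        for j in range(i + 1, n):
            (e1, f1), (e2, f2) = alignment[i], alignment[j]
            # Swap the target phrases of two concepts.
            swapped = list(alignment)
            swapped[i], swapped[j] = (e1, f2), (e2, f1)
            yield swapped
            # Merge two concepts into one (concatenating their phrases).
            merged = [c for k, c in enumerate(alignment) if k not in (i, j)]
            merged.append((e1 + e2, f1 + f2))
            yield merged

def hillclimb(alignment, t_table):
    """Greedily move to the best neighbor until no neighbor improves the score."""
    current, best = alignment, score(alignment, t_table)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(current):
            s = score(cand, t_table)
            if s > best:
                current, best, improved = cand, s, True
    return current

# Toy usage: the swap move repairs a crossed initial alignment.
t = {(("green", "witch"), ("bruja", "verde")): 0.4, (("the",), ("la",)): 0.5}
init = [(("the",), ("bruja", "verde")), (("green", "witch"), ("la",))]
print(hillclimb(init, t))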
28
Viterbi Search
Smoothing
29
Training Iterations
The first iteration uses Model 1
The remaining iterations use Model 2
30
Step 4: Derive the Conditional Probability Model
P(f|e) = p(e, f) / p(e)
Used in the decoder
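A minimal sketch of this step (my own illustration; `t_joint` is assumed to map phrase pairs (e, f) to joint probabilities):

from collections import defaultdict

def conditionalize(t_joint):
    """Turn a joint phrase table p(e, f) into conditionals p(f | e) = p(e, f) / p(e)."""
    marginal = defaultdict(float)
    for (e, f), p in t_joint.items():
        marginal[e] += p                        # p(e) = sum over f of p(e, f)
    return {(e, f): p / marginal[e] for (e, f), p in t_joint.items()}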
31
Decoder
Given a foreign sentence F, maximize the joint probability p(E, F)
Hill-climb by modifying E and the alignment between E and F to maximize P(E) · P(F|E)
P(E) is a trigram language model at the word level instead of the phrase level
32
Evaluation
Data: French-English Hansard corpus
Compared with GIZA (IBM Model 4)
Training: 100,000 sentence pairs
Testing: 500 unseen sentences, uniformly distributed across lengths 6, 8, 10, 15, and 20
33
Results
34
Comparison of the Models: from Koehn et al. (2003)
35
Limitations of the Model: Complexity Problems
Phrases limited to at most 6 words
Size of the t-table
Large number of possible alignments
Memory management
Expensive operations such as swap, break, and merge during Viterbi training
36
Limitations of the Model: Non-Consecutive Phrases
English "not" corresponds to French "ne … pas"
  – is not => "ne est pas"
  – is not here => "ne est pas ici"
Longer alignments? Sparseness problem
37
Complexity vs. Performance
Marcu and Wong: phrases of at most 6 words
Koehn et al. (2003):
  – allowing phrases longer than 3 words
  – greatly increases complexity but yields no significant improvement
38
Questions?