
Addressing the Rare Word Problem in Neural Machine Translation


1 Addressing the Rare Word Problem in Neural Machine Translation
Minh-Thang Luong (Stanford University), Ilya Sutskever (Google), Quoc V. Le (Google), Oriol Vinyals (Google), Wojciech Zaremba (New York University)

2 Abstract Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results, comparable to traditional approaches. A significant weakness of conventional NMT systems is their inability to correctly translate very rare words.

3 Neural Machine Translation
A neural machine translation system is any neural network that maps a source sentence, s1, ..., sn, to a target sentence, t1, ..., tm. More concretely, an NMT system uses a neural network to parameterize the conditional distributions p(tj | t1, ..., tj-1, s1, ..., sn) for 1 ≤ j ≤ m.
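A minimal sketch of this factorization, assuming only the chain-rule decomposition stated above (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

# Toy illustration: an NMT system scores a translation as the product of
# per-step conditional probabilities p(tj | t1..tj-1, s), i.e. a sum of
# per-step log-softmax terms over the target vocabulary.

def log_prob_of_translation(step_logits, target_ids):
    """step_logits[j]: unnormalized scores over the target vocabulary at
    step j, already conditioned on the source s and the prefix t1..tj-1.
    target_ids[j]: the reference target word index tj."""
    total = 0.0
    for logits, t_j in zip(step_logits, target_ids):
        # numerically stable log-softmax
        shifted = logits - logits.max()
        log_probs = shifted - np.log(np.exp(shifted).sum())
        total += log_probs[t_j]
    return total  # = log p(t1 ... tm | s) under the model

# toy usage: target vocabulary of 5 words, a 3-word target sentence
rng = np.random.default_rng(0)
step_logits = [rng.normal(size=5) for _ in range(3)]
print(log_prob_of_translation(step_logits, [2, 0, 4]))
```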

4 Neural Machine Translation
They use a deep LSTM to encode the input sequence and a separate deep LSTM to output the translation. The encoder reads the source sentence, one word at a time, and produces a large vector that represents the entire source sentence. The decoder is initialized with this vector and generates a translation, one word at a time, until it emits the end-of-sentence symbol <eos>.
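The following PyTorch sketch mirrors this encoder-decoder setup; it is not the authors' implementation, and the vocabulary sizes, embedding size, hidden size, and depth are made-up toy values:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        # one deep LSTM encodes the source; a separate deep LSTM decodes
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)  # scores over target vocab

    def forward(self, src_ids, tgt_ids):
        # the encoder reads the source one word at a time; its final (h, c)
        # state is the fixed-size vector summarizing the whole sentence
        _, state = self.encoder(self.src_emb(src_ids))
        # the decoder is initialized with that state and predicts each
        # target word given the previous ones, until <eos> at test time
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # unnormalized scores for p(tj | t<j, s)

# toy usage
model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (1, 7))   # a 7-word source sentence
tgt = torch.randint(0, 120, (1, 5))   # shifted target words fed to decoder
print(model(src, tgt).shape)          # (1, 5, 120)
```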

5 Rare Word Models They treat the NMT system as a black box and train it on a corpus annotated by one of the models listed on the next slide. First, the alignments are produced with an unsupervised aligner. Next, they use the alignment links to construct a word dictionary that will be used for the word translations in the post-processing step. If a word does not appear in their dictionary, they apply the identity translation.
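A rough sketch of that post-processing step, under the assumptions above (dictionary lookup for each unknown target token, identity translation as the fallback); the function and token names are my own, not from the paper:

```python
def post_process(translated_tokens, aligned_source_tokens, dictionary):
    """Replace each unknown target token with the dictionary translation of
    its aligned source word; if the source word is not in the dictionary,
    fall back to the identity translation (copy the source word through)."""
    output = []
    for tok, src_word in zip(translated_tokens, aligned_source_tokens):
        if tok == "<unk>":
            output.append(dictionary.get(src_word, src_word))
        else:
            output.append(tok)
    return output

# toy usage: "Bordeaux" is out of vocabulary and not in the dictionary,
# so the identity translation copies it through unchanged
dictionary = {"cat": "chat"}
print(post_process(["le", "<unk>", "de", "<unk>"],
                   ["the", "cat", "of", "Bordeaux"],
                   dictionary))
# ['le', 'chat', 'de', 'Bordeaux']
```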

6 1. Copyable Model
2. Position All Model (PosAll)
3. Positional Unknown Model (PosUnk) (see the sketch below)
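The sketch below shows my reading of the PosUnk-style target annotation: each out-of-vocabulary target word is tagged with the relative position of its aligned source word, clipped to a small window, and unaligned unknowns get a null tag. The token names, the sign convention for the offset, and the window size are illustrative assumptions, not taken from the slides:

```python
def annotate_target_posunk(target_tokens, alignments, vocab, window=7):
    """alignments[j] is the source index aligned to target position j,
    or None if the target word is unaligned."""
    annotated = []
    for j, tok in enumerate(target_tokens):
        if tok in vocab:
            annotated.append(tok)
        elif alignments[j] is None:
            annotated.append("unkpos_null")
        else:
            # relative offset of the aligned source word, clipped to [-window, window]
            d = max(-window, min(window, alignments[j] - j))
            annotated.append(f"unkpos_{d}")
    return annotated

# toy usage: target position 1 is out of vocabulary and aligned to source position 2
print(annotate_target_posunk(["le", "Bordeaux"], [0, 2], {"le"}))
# ['le', 'unkpos_1']
```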

7 Training Data Training data consisted of 12M parallel sentences (348M French words and 304M English words). Due to the computationally intensive nature of the naive softmax, they limited the French vocabulary to either the 40K or the 80K most frequent French words. On the source side, they could afford a much larger vocabulary, so they used the 200K most frequent English words. The model treats all other words as unknowns.
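A small sketch of this vocabulary truncation, assuming a simple frequency cutoff (keep the K most frequent words, map everything else to an unknown token); the toy corpus and K value below are just for illustration, not the paper's 40K/80K/200K settings:

```python
from collections import Counter

def build_vocab(corpus_tokens, k):
    # keep only the k most frequent words (ties broken by first occurrence)
    counts = Counter(corpus_tokens)
    return {w for w, _ in counts.most_common(k)}

def map_unknowns(sentence_tokens, vocab):
    # replace every out-of-vocabulary word with the unknown token
    return [w if w in vocab else "<unk>" for w in sentence_tokens]

corpus = "the cat sat on the mat the cat".split()
vocab = build_vocab(corpus, k=3)
print(map_unknowns("the dog sat".split(), vocab))
# ['the', '<unk>', 'sat']
```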

8 Main result

9 Comparison of different alignment models

10 Effect of depth

11 Sample Translation

12 Conclusions A simple alignment-based technique can mitigate and even overcome the main weakness of current NMT systems, which is their inability to translate words that are not in their vocabulary. A key advantage is that the technique is applicable to any NMT system, not only to the deep LSTM model. It yielded a consistent and substantial improvement of up to 2.8 BLEU points over various NMT systems. At 37.5 BLEU points, they established the first NMT system that outperformed the best MT system on the WMT'14 contest dataset.

13 Thank You!

