Neural Machine Translation by Jointly Learning to Align and Translate

1 Neural Machine Translation by Jointly Learning to Align and Translate
Bahdanau et al., ICLR 2015. Presented by İhsan Utlu

2 Outline
- Neural Machine Translation overview
- Relevant studies
- Encoder/Decoder framework
- Attention mechanism
- Results
- Conclusion

3 Neural Machine Translation
- Massive improvement in recent years: Google Translate, Skype Translator
- Compare: phrase-based MT
- End-to-end trainable
- Europarl FR-ENG: 2M aligned sentences
- Yandex RUS-EN: 1M aligned sentences

4 Neural Machine Translation
- Basic framework: Encoder/Decoder
- Encoder: vector representation of the source sentence
- Decoder: a (conditional) language model
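
In the usual formulation (a minimal sketch, not the slide's exact notation), the encoder reads the source sentence $(x_1, \dots, x_{T_x})$ into hidden states $h_j$ and summarizes them into a single vector $c$, and the decoder is a language model conditioned on $c$:

$$ c = q(h_1, \dots, h_{T_x}), \qquad p(y \mid x) = \prod_{t=1}^{T_y} p(y_t \mid y_1, \dots, y_{t-1}, c), $$

where $q$ is often simply the last encoder state $h_{T_x}$.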

5 NMT: Preceding studies
- Kalchbrenner and Blunsom, 2013: Recurrent Continuous Translation Models (encoder: convolutional sequence model)
- Cho et al., 2014: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (GRUs introduced)
- Sutskever et al., 2014: Sequence to Sequence Learning with Neural Networks (multi-layer LSTMs)

6 RNN Encoder/Decoder (Cho 2014, Sutskever 2014)
- LSTM/GRU units used
- Word embeddings also learnt
- <EoS>, <UNK> tokens: <UNK> replaces words outside the top frequency rank (a small sketch follows below)
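
A minimal sketch of how the <UNK> and <EoS> tokens are typically applied (hypothetical helper names, not code from the presentation): words outside the top-K frequency rank are mapped to <UNK>, and <EoS> marks the end of every sentence.

    from collections import Counter

    def build_vocab(sentences, top_k=30000):
        """Keep only the top_k most frequent words; everything else maps to <UNK>."""
        counts = Counter(w for s in sentences for w in s.split())
        words = [w for w, _ in counts.most_common(top_k)]
        return {"<UNK>": 0, "<EoS>": 1, **{w: i + 2 for i, w in enumerate(words)}}

    def encode(sentence, vocab):
        """Map a sentence to token ids, replacing rare words and appending <EoS>."""
        ids = [vocab.get(w, vocab["<UNK>"]) for w in sentence.split()]
        return ids + [vocab["<EoS>"]]

    vocab = build_vocab(["the cat sat", "the dog sat"], top_k=3)
    print(encode("the zebra sat", vocab))  # 'zebra' is out of vocabulary -> <UNK>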

7 RNN Units: GRU vs LSTM
[Diagrams on the slide: a basic LSTM unit and a basic GRU unit]
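
For reference, the GRU update equations as commonly written (the slide shows the units only as diagrams; bias terms omitted): an update gate $z_t$, a reset gate $r_t$, a candidate state $\tilde{h}_t$, and an interpolation between the old and candidate states.

$$ z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1}), $$
$$ \tilde{h}_t = \tanh\big(W x_t + U (r_t \odot h_{t-1})\big), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t. $$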

8 Decoder: RNN-based LM
- Chain rule factorization of the target distribution (written out below)
- RNN implementation
- Could also condition on the previous target word (Cho et al., 2014)
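
The chain-rule factorization the slide refers to, in the paper's notation (decoder state $s_t$, encoder summary $c$):

$$ p(y \mid x) = \prod_{t=1}^{T_y} p(y_t \mid y_1, \dots, y_{t-1}, c), \qquad p(y_t \mid y_1, \dots, y_{t-1}, c) = g(y_{t-1}, s_t, c), $$

where $g$ is the output layer and the RNN state is updated as $s_t = f(s_{t-1}, y_{t-1}, c)$.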

9 Decoder: Sentence generation
- Greedy search
- Beam search (a sketch follows below): keep a collection of B translation candidates at time t, calculate the conditional distributions at t+1, prune down to B, repeat until <EoS>
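
A minimal beam-search sketch over an abstract next-token distribution (the step function and the toy bigram table are placeholders, not code from the presentation): at every step each of the B candidates is expanded, hypotheses are re-ranked by cumulative log-probability, the list is pruned back to B, and hypotheses ending in <EoS> are set aside as finished.

    import math

    def beam_search(step_fn, bos_id, eos_id, beam_size=12, max_len=50):
        """step_fn(prefix) -> {token_id: log_prob} for the next token.
        Returns the highest-scoring finished hypothesis (token ids, score)."""
        beams = [([bos_id], 0.0)]          # (prefix, cumulative log-probability)
        finished = []
        for _ in range(max_len):
            candidates = []
            for prefix, score in beams:
                for tok, logp in step_fn(prefix).items():
                    candidates.append((prefix + [tok], score + logp))
            candidates.sort(key=lambda c: c[1], reverse=True)
            beams = []
            for prefix, score in candidates:   # prune down to beam_size
                if prefix[-1] == eos_id:
                    finished.append((prefix, score))
                else:
                    beams.append((prefix, score))
                if len(beams) == beam_size:
                    break
            if not beams:
                break
        finished.extend(beams)                 # fall back to unfinished hypotheses
        return max(finished, key=lambda c: c[1])

    # Toy example: a fixed "bigram model" over token ids 0..3, with 3 as <EoS>
    table = {0: {1: math.log(0.6), 2: math.log(0.4)},
             1: {2: math.log(0.9), 3: math.log(0.1)},
             2: {3: math.log(1.0)}}
    print(beam_search(lambda p: table[p[-1]], bos_id=0, eos_id=3, beam_size=2))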

10 Limitations on the Encoder
- Encoding of long sentences is an issue, even with LSTM/GRU units
- Fixed-size vector is restrictive
- Encoded representations are biased; Sutskever et al.: process the input in reverse
- Need to 'attend' to each individual word

11 Proposed Solutions
- Convolutional encoder (Kalchbrenner and Blunsom, 2013): represent the input as a matrix, use convnet architectures
- Attention-based models: use an adaptive weighted sum of the individual word vectors

12 Attention Model
- Introduce BiRNNs into the encoder
- Adaptive source embedding: weights depend on the target hidden state
- Alignments inferred with end-to-end training

13 Attention Model
- e.g. Google Translate (currently deployed)

14 BiRNN Encoder with Attention
- One-hot input vectors
- GRU update equations
- BiRNN with GRUs (annotations sketched below)
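
A sketch of the encoder annotations in the paper's notation: the embedding matrix $E$ is applied to the one-hot word vectors $x_j$, a forward and a backward GRU read the sentence in opposite directions, and their states are concatenated,

$$ \overrightarrow{h}_j = \overrightarrow{f}\big(\overrightarrow{h}_{j-1}, E x_j\big), \qquad \overleftarrow{h}_j = \overleftarrow{f}\big(\overleftarrow{h}_{j+1}, E x_j\big), \qquad h_j = \big[\overrightarrow{h}_j^{\top}; \overleftarrow{h}_j^{\top}\big]^{\top}, $$

so each annotation $h_j$ summarizes the whole sentence with a focus on the words around $x_j$; $\overrightarrow{f}$ and $\overleftarrow{f}$ are GRUs with the update equations shown earlier.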

15 Decoder implementation
- GRU update equations with feedback from the target sentence (sketched below)
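
The decoder update the slide refers to, sketched in the paper's notation: the GRU state $s_i$ depends on the previous state, the embedding of the previously emitted target word (the feedback), and the current context vector,

$$ s_i = f(s_{i-1}, y_{i-1}, c_i). $$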

16 Decoder implementation
Attention model
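
The attention model itself, in the paper's notation: an alignment score between the previous decoder state and each encoder annotation, a softmax over source positions, and a context vector given by the resulting weighted sum,

$$ e_{ij} = a(s_{i-1}, h_j), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j, $$

where the alignment model $a$ is a small feed-forward network, e.g. $a(s_{i-1}, h_j) = v_a^{\top}\tanh(W_a s_{i-1} + U_a h_j)$, trained jointly with everything else.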

17 Decoder implementation
- Output layer (with maxout neurons)
- Output embedding matrix, similar to word2vec algorithms
- Sampled with beam search
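
A hedged sketch of this output layer, in the spirit of the paper's deep-output formulation (the exact indexing of the decoder state is in the paper's appendix; bias terms omitted): a pre-activation built from the decoder state, the previous target embedding, and the context vector; a maxout layer that keeps the larger of each pair of units; and a softmax through the output embedding matrix $W_o$,

$$ \tilde{t}_i = U_o s_i + V_o E y_{i-1} + C_o c_i, \qquad t_{i,k} = \max\big(\tilde{t}_{i,2k-1}, \tilde{t}_{i,2k}\big), \qquad p(y_i \mid s_i, y_{i-1}, c_i) \propto \exp\big(y_i^{\top} W_o t_i\big). $$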

18 Training
- Objective: maximize the log-probability of the correct translation (written out below)
- Training dataset: WMT '14 corpora, 384M words after denoising
- Test dataset: 3003 sentences
- Frequency rank threshold: 30000
- 1000 hidden units
- Embedding dimensions: 620 and 500 (input and output)
- Beam size: 12
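
The objective written out (a standard maximum-likelihood sketch): over the $N$ sentence pairs of the parallel corpus, maximize the average log-probability that the model assigns to the reference translation,

$$ \max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}\big(y^{(n)} \mid x^{(n)}\big). $$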

19 Learnt Alignments

20 Results
[Plot: BLEU scores of the generated translations on the test set with respect to sentence length]

21 Results: BLEU scores
- BLEU-n: a metric for automated scoring of translations
- Based on precision: the percentage of n-grams in the candidate translation that exist in one of the reference translations
- Further modifications are applied to the precision criterion to account for abuses (see the sketch below)
- RNNencdec: Cho et al., 2014
- RNNsearch: proposed method
- Moses: phrase-based MT
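
A minimal sketch of the clipped ("modified") n-gram precision that BLEU is built on (illustrative only; full BLEU additionally combines n = 1..4 with a geometric mean and a brevity penalty):

    from collections import Counter

    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def modified_precision(candidate, references, n):
        """Each candidate n-gram is credited at most as many times as it
        appears in the best-matching reference (clipping)."""
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0
        max_ref_counts = Counter()
        for ref in references:
            for ng, c in Counter(ngrams(ref, n)).items():
                max_ref_counts[ng] = max(max_ref_counts[ng], c)
        clipped = sum(min(c, max_ref_counts[ng]) for ng, c in cand_counts.items())
        return clipped / sum(cand_counts.values())

    # The kind of abuse the clipping prevents:
    candidate = "the the the the".split()
    references = ["the cat is on the mat".split()]
    print(modified_precision(candidate, references, 1))  # 0.5 rather than 1.0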

22 Conclusion
- The concept of attention introduced in the context of neural machine translation
- The restriction of fixed-length encoding for variable-length source sequences lifted
- Improvements obtained in BLEU scores
- Rare words seen to cause performance problems

23 References
K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," CoRR, 2014.
I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," CoRR, 2014.
M. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," CoRR, 2015.
N. Kalchbrenner and P. Blunsom, "Recurrent continuous translation models," in EMNLP, 2013.

