1
Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin (Facebook AI Research). Presented by Shiyu Zhang
2
Classic RNN seq2seq: an encoder-decoder with soft attention.
3
How about CNN seq2seq? Advantages:
- Does not depend on previous time steps => parallelization
- The hierarchical structure provides a shorter path for capturing long-range dependencies: path length drops from O(n) to O(n/k) with kernel width k
- Can see the whole sentence
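The O(n/k) claim can be made concrete: each width-k convolution layer grows the receptive field by k - 1 positions, so the number of stacked layers needed to connect any two positions shrinks roughly by a factor of k. A minimal sketch (the function name is an illustration, not from the paper):

```python
import math

def layers_for_full_context(n, k):
    """Number of stacked width-k convolution layers needed so that one
    output position covers all n input positions.
    Each layer extends the receptive field by k - 1, so after L layers
    the receptive field is L * (k - 1) + 1."""
    return math.ceil((n - 1) / (k - 1))

# A 25-token sentence with kernel width 5 is fully covered after only
# 6 layers, versus 25 sequential steps for an RNN.
```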
4
Architecture: encoder. Input: word embedding + position embedding.
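The input representation is just the elementwise sum of a word-embedding lookup and an absolute-position-embedding lookup. A sketch, where the table shapes (vocab x d and max_len x d) and the function name are assumptions for illustration:

```python
import numpy as np

def input_representation(tokens, word_emb, pos_emb):
    """Encoder input e = w + p: word embedding plus absolute position
    embedding, looked up from two tables.
    tokens: integer ids of length n; word_emb: (vocab, d); pos_emb: (max_len, d)."""
    return word_emb[tokens] + pos_emb[np.arange(len(tokens))]
```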
Kernel parameters: W (kd x 2d), b (2d). GLU (gated linear unit): $v([A\ B]) = A \otimes \sigma(B)$. The non-linearity lets the network exploit the full input field or focus on fewer elements; the sigmoid gate $\sigma(B)$ controls which inputs A are relevant.
$$z_i^l = v\left(W^l \left[z_{i-k/2}^{l-1}, \ldots, z_{i+k/2}^{l-1}\right] + b_w^l\right) + z_i^{l-1}$$
Residual connection: the layer input $z_i^{l-1}$ is added back to the block output.
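The encoder block above can be sketched in numpy. This is a minimal illustration, not the paper's implementation: the loop over positions stands in for a batched convolution, and zero padding keeps the sequence length fixed.

```python
import numpy as np

def glu_conv_block(z_prev, W, b):
    """One encoder convolution block (sketch): a width-k convolution
    producing 2d channels, split into halves A and B, gated as
    A * sigmoid(B) (the GLU), plus a residual connection.
    z_prev: (n, d); W: (k*d, 2*d); b: (2*d,)."""
    n, d = z_prev.shape
    k = W.shape[0] // d
    pad = k // 2
    z_pad = np.pad(z_prev, ((pad, pad), (0, 0)))   # zero-pad to keep length n
    out = np.empty_like(z_prev)
    for i in range(n):
        window = z_pad[i:i + k].reshape(-1)        # concatenate k inputs -> (k*d,)
        y = window @ W + b                         # (2*d,)
        A, B = y[:d], y[d:]
        out[i] = A * (1.0 / (1.0 + np.exp(-B)))    # GLU: A * sigmoid(B)
    return out + z_prev                            # residual connection
```

With all-zero weights the gated term vanishes and the residual passes the input through unchanged, which is one way to sanity-check the block.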
5
Architecture: decoder. The same convolutional block is applied to the decoder states h:
$$h_i^l = v\left(W^l \left[h_{i-k/2}^{l-1}, \ldots, h_{i+k/2}^{l-1}\right] + b_w^l\right) + h_i^{l-1}$$
6
Architecture: attention. A separate attention is computed for each decoder layer.
The resulting context c is simply added to the decoder state h.
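The per-layer attention above can be sketched as a softmax over dot-product scores; following the paper, the weighted sum runs over the encoder outputs plus the input embeddings (z + e), and the context is added back to the decoder state. A simplified sketch (the paper additionally transforms h before scoring, which is omitted here; the function name is an assumption):

```python
import numpy as np

def multistep_attention(h, z, e):
    """Attention for one decoder layer (sketch).
    h: (m, d) decoder states; z: (n, d) encoder outputs; e: (n, d)
    encoder input embeddings. The context is a weighted sum of z + e,
    and c is simply added to h."""
    scores = h @ z.T                                  # (m, n) dot-product scores
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                 # softmax over source positions
    c = a @ (z + e)                                   # (m, d) context vectors
    return h + c                                      # add context to decoder state
```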
7
Architecture: output.
8
Strategies to stabilize learning: maintain the variance of activations throughout the forward and backward passes.
Normalization:
- Scale the sum of a residual connection by √0.5, halving the variance of the sum.
- Scale the conditional input from the attention (a weighted sum of m vectors) by m√(1/m).
Initialization:
- Layers whose output is not fed into a GLU: initialize weights from N(0, √(1/n_l)).
- Layers whose output is fed into a GLU: the GLU's output variance is 1/4 of its input variance, so initialize weights from N(0, √(4/n_l)).
- With dropout (retain probability p), the two become N(0, √(p/n_l)) and N(0, √(4p/n_l)).
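The initialization rules above reduce to one small formula for the weight standard deviation. A sketch, assuming n_l is the fan-in of the layer and the argument names are illustrative:

```python
import math

def init_std(n_l, feeds_glu=False, retain_p=1.0):
    """Standard deviation for N(0, std) weight initialization that keeps
    activation variance stable (sketch of the scheme above).
    n_l: fan-in. feeds_glu: the layer's output goes into a GLU, which
    quarters the variance, so the init variance gets a 4x correction.
    retain_p: dropout retain probability p, contributing a factor p."""
    var = retain_p / n_l
    if feeds_glu:
        var *= 4.0
    return math.sqrt(var)

# Residual sums are separately scaled by sqrt(0.5) to halve the
# variance of adding two equal-variance terms.
RESIDUAL_SCALE = math.sqrt(0.5)
```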
9
Experiments
10
Experiments
11
Experiments: summarization
12
References
Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. Convolutional Sequence to Sequence Learning. ArXiv e-prints (May 2017).
Gehring, J., Auli, M., Grangier, D., and Dauphin, Y. N. A Convolutional Encoder Model for Neural Machine Translation. ArXiv e-prints (Nov.).
14
Questions: Is parallelization possible at the decoder (during inference, generation is still sequential)?