Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin
Facebook AI Research (2017.05.12)
Presented by Shiyu Zhang, 2017.05.18
Classic RNN seq2seq: encoder-decoder with soft attention
Encoder: can see the whole sentence
How about CNN seq2seq? Advantages:
Does not depend on previous time steps => parallelization
Hierarchical structure gives a shorter path for capturing long-range dependencies: O(n) -> O(n/k) (see the sketch below)
Can still see the whole sentence
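A tiny illustration of the O(n/k) claim (my own sketch, not from the paper or slides): each stacked convolution of width k widens the receptive field by k - 1 positions, so relating two words n positions apart needs roughly n/(k-1) layers instead of n recurrent steps.

import math

def layers_to_cover(n, k):
    # stacked width-k convolutions: receptive field after l layers is l*(k-1) + 1
    return math.ceil((n - 1) / (k - 1))

print(layers_to_cover(40, 5))  # 10 layers instead of ~40 recurrent steps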
Architecture: Encoder
Input: word embedding + position embedding
Kernel parameters: $W \in \mathbb{R}^{2d \times kd}$, $b_w \in \mathbb{R}^{2d}$
GLU (gated linear units): $v([A\;B]) = A \otimes \sigma(B)$; the non-linearity lets the network exploit the full input field or focus on fewer elements, and the gate $\sigma(B)$ controls which parts of the input $A$ are relevant
Residual connections from the block input to its output:
$z_i^l = v\left(W^l [z_{i-k/2}^{l-1}, \ldots, z_{i+k/2}^{l-1}] + b_w^l\right) + z_i^{l-1}$
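A minimal PyTorch sketch of one encoder block as described above; this is my own illustration, not the fairseq implementation, and the class/variable names are made up (the sqrt(0.5) residual scaling is taken from the Strategies slide below).

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLUEncoderBlock(nn.Module):
    def __init__(self, d, k):
        super().__init__()
        # W^l in R^{2d x kd}, b_w^l in R^{2d}: a width-k convolution with 2d output channels
        self.conv = nn.Conv1d(d, 2 * d, kernel_size=k, padding=k // 2)

    def forward(self, z):                     # z: (batch, d, src_len), the states z^{l-1}
        a_b = self.conv(z)                    # (batch, 2d, src_len), the concatenation [A B]
        out = F.glu(a_b, dim=1)               # GLU: A * sigmoid(B), back to d channels
        return (out + z) * math.sqrt(0.5)     # residual connection with variance-preserving scaling

block = ConvGLUEncoderBlock(d=256, k=3)
print(block(torch.randn(2, 256, 10)).shape)   # torch.Size([2, 256, 10])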
Architecture: Decoder
Same convolutional block structure as the encoder, but the convolution may only see already-generated target positions (the paper pads the input and trims the convolution output so no future information is available):
$h_i^l = v\left(W^l [h_{i-k/2}^{l-1}, \ldots, h_{i+k/2}^{l-1}] + b_w^l\right) + h_i^{l-1}$
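A hedged sketch of the corresponding decoder block (again my own names, not fairseq code); here I use the standard left-padding causal-convolution trick, which has the same effect as the padding-and-trimming described in the paper.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLUDecoderBlock(nn.Module):
    def __init__(self, d, k):
        super().__init__()
        self.k = k
        self.conv = nn.Conv1d(d, 2 * d, kernel_size=k)   # no symmetric padding here

    def forward(self, h):                      # h: (batch, d, tgt_len), the states h^{l-1}
        x = F.pad(h, (self.k - 1, 0))          # left-pad so position i only sees positions <= i
        out = F.glu(self.conv(x), dim=1)       # GLU non-linearity, as in the encoder
        return (out + h) * math.sqrt(0.5)      # residual connection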
Architecture: Attention
A separate attention is computed for each decoder layer (multi-step attention)
The resulting context vector c is simply added to the decoder state h
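A sketch of one decoder layer's attention under the above description; the equations follow the paper, but the function and parameter names (W_d, b_d) are my own.

import math
import torch

def multistep_attention(h, g, z, e, W_d, b_d):
    # h: (batch, tgt_len, d)  decoder states h^l
    # g: (batch, tgt_len, d)  embeddings of the previous target elements
    # z: (batch, src_len, d)  outputs of the last encoder layer
    # e: (batch, src_len, d)  source word + position embeddings
    d_l = h @ W_d.T + b_d + g                                 # combine decoder state with previous target
    scores = torch.softmax(d_l @ z.transpose(1, 2), dim=-1)   # (batch, tgt_len, src_len)
    m = z.size(1)
    c = scores @ (z + e)                                      # weighted sum over encoder outputs + embeddings
    c = c * (m * math.sqrt(1.0 / m))                          # variance scaling (see Strategies slide)
    return h + c                                              # c is simply added to h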
Architecture: Output
The top decoder state is mapped to a distribution over the next target word:
$p(y_{i+1} \mid y_1, \ldots, y_i, \mathbf{x}) = \mathrm{softmax}(W_o h_i^L + b_o)$
Strategies
To stabilize learning: maintain the variance of activations throughout the forward and backward passes.
Normalization:
Multiply the sum of a residual connection by $\sqrt{0.5}$ to halve its variance
Multiply the attention's weighted sum (over $m$ source vectors) by $m\sqrt{1/m}$
Initialization:
Layers not followed by a GLU: initialize weights from $N(0, \sqrt{1/n_l})$, where $n_l$ is the number of input connections per neuron
Layers followed by a GLU: the GLU output has 1/4 of the input variance, so initialize weights from $N(0, \sqrt{4/n_l})$
If dropout retains inputs with probability $p$, the above two become $N(0, \sqrt{p/n_l})$ and $N(0, \sqrt{4p/n_l})$
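A small helper sketching this initialization scheme (my own function, not the fairseq API):

import math
import torch.nn as nn

def init_convs2s_weight(weight, n_l, feeds_glu, retain_p=1.0):
    # n_l: number of input connections per neuron; retain_p: dropout retention probability
    # layers whose output feeds a GLU get 4x the variance to compensate for the gating
    variance = (4.0 if feeds_glu else 1.0) * retain_p / n_l
    nn.init.normal_(weight, mean=0.0, std=math.sqrt(variance))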
Experiments: machine translation
Experiments: abstractive summarization
References
Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. Convolutional Sequence to Sequence Learning. arXiv e-prints (May 2017).
Gehring, J., Auli, M., Grangier, D., and Dauphin, Y. N. A Convolutional Encoder Model for Neural Machine Translation. arXiv e-prints (Nov. 2016).
https://github.com/facebookresearch/fairseq
Questions
Parallelization at the decoder? (During training the decoder can run over all target positions in parallel because the gold targets are fed in; at inference time, generation is still sequential.)