ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –Recurrent Neural Networks (RNNs) –BackProp Through Time (BPTT) –Vanishing / Exploding.

Slides:



Advertisements
Similar presentations
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Regression –Model selection, Cross-validation –Error decomposition.
Advertisements

ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –SVM –Multi-class SVMs –Neural Networks –Multi-layer Perceptron Readings:
Branch Prediction with Neural- Networks: Hidden Layers and Recurrent Connections Andrew Smith CSE Dept. June 10, 2004.
Introduction to Recurrent neural networks (RNN), Long short-term memory (LSTM) Wenjie Pei In this coffee talk, I would like to present you some basic.
Artificial Neural Networks
ECE 5984: Introduction to Machine Learning
Appendix B: An Example of Back-propagation algorithm
ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –Neural Networks –Backprop –Modular Design.
ECE 6504: Deep Learning for Perception
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Nearest Neighbour Readings: Barber 14 (kNN)
ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets.
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Classification: Logistic Regression –NB & LR connections Readings: Barber.
Kai Sheng-Tai, Richard Socher, Christopher D. Manning
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Regression Readings: Barber 17.1, 17.2.
Haitham Elmarakeby.  Speech recognition
Recurrent Neural Networks - Learning, Extension and Applications
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16.
ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –LSTMs (intuition and variants) –[Abhishek:] Lua / Torch Tutorial.
Recurrent Neural Networks Long Short-Term Memory Networks
Predicting the dropouts rate of online course using LSTM method
Introduction to Neural Networks Gianluca Pollastri, Head of Lab School of Computer Science and Informatics and Complex and Adaptive Systems Labs University.
Xintao Wu University of Arkansas Introduction to Deep Learning 1.
Attention Model in NLP Jichuan ZENG.
Deep Learning RUSSIR 2017 – Day 3
Best viewed with Computer Modern fonts installed
CS 388: Natural Language Processing: LSTM Recurrent Neural Networks
CS 4501: Introduction to Computer Vision Computer Vision + Natural Language Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy / Justin Johnson.
Deep Learning Amin Sobhani.
Dhruv Batra Georgia Tech
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
Recursive Neural Networks
Recurrent Neural Networks for Natural Language Processing
Show and Tell: A Neural Image Caption Generator (CVPR 2015)
Matt Gormley Lecture 16 October 24, 2016
Intro to NLP and Deep Learning
ICS 491 Big Data Analytics Fall 2017 Deep Learning
Intelligent Information System Lab
Different Units Ramakrishna Vedantam.
Neural networks (3) Regularization Autoencoder
Neural Networks 2 CS446 Machine Learning.
Please, Pay Attention, Neural Attention
CSE P573 Applications of Artificial Intelligence Neural Networks
ECE 5424: Introduction to Machine Learning
Zhu Han University of Houston
A critical review of RNN for sequence learning Zachary C
Goodfellow: Chap 6 Deep Feedforward Networks
RNN and LSTM Using MXNet Cyrus M Vahid, Principal Solutions Architect
Image Captions With Deep Learning Yulia Kogan & Ron Shiff
Recurrent Neural Networks
Recurrent Neural Networks (RNN)
Tips for Training Deep Network
RNNs & LSTM Hadar Gorodissky Niv Haim.
CSE 573 Introduction to Artificial Intelligence Neural Networks
Understanding LSTM Networks
Neural Networks Geoff Hulten.
Lecture 16: Recurrent Neural Networks (RNNs)
Machine Learning – Neural Networks David Fenyő
Neural Networks II Chen Gao Virginia Tech ECE-5424G / CS-5824
Attention.
实习生汇报 ——北邮 张安迪.
Neural networks (3) Regularization Autoencoder
Please enjoy.
Neural Networks II Chen Gao Virginia Tech ECE-5424G / CS-5824
Deep Learning Authors: Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Recurrent Neural Networks
Deep learning: Recurrent Neural Networks CV192
Mahdi Kalayeh David Hill
Dhruv Batra Georgia Tech
Dhruv Batra Georgia Tech
Presentation transcript:

ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –Recurrent Neural Networks (RNNs) –BackProp Through Time (BPTT) –Vanishing / Exploding Gradients –[Abhishek:] Lua / Torch Tutorial

Administrativia HW3 –Out today –Due in 2 weeks –Please please please please please start early – (C) Dhruv Batra2

Plan for Today Model –Recurrent Neural Networks (RNNs) Learning –BackProp Through Time (BPTT) –Vanishing / Exploding Gradients [Abhishek:] Lua / Torch Tutorial (C) Dhruv Batra3

New Topic: RNNs (C) Dhruv Batra4 Image Credit: Andrej Karpathy

Synonyms Recurrent Neural Networks (RNNs) Recursive Neural Networks –General familty; think graphs instead of chains Types: –Long Short Term Memory (LSTMs) –Gated Recurrent Units (GRUs) –Hopfield network –Elman networks –… Algorithms –BackProp Through Time (BPTT) –BackProp Through Structure (BPTS) (C) Dhruv Batra5

What’s wrong with MLPs? Problem 1: Can’t model sequences –Fixed-sized Inputs & Outputs –No temporal structure Problem 2: Pure feed-forward processing –No “memory”, no feedback (C) Dhruv Batra6 Image Credit: Alex Graves, book

Sequences are everywhere… (C) Dhruv Batra7 Image Credit: Alex Graves and Kevin Gimpel

(C) Dhruv Batra8 Even where you might not expect a sequence… Image Credit: Vinyals et al.

Even where you might not expect a sequence… 9 Image Credit: Ba et al.; Gregor et al Input ordering = sequence (C) Dhruv Batra

10 Image Credit: [Pinheiro and Collobert, ICML14]

Why model sequences? Figure Credit: Carlos Guestrin

Why model sequences? (C) Dhruv Batra12 Image Credit: Alex Graves

Name that model Hidden Markov Model (HMM) Y 1 = {a,…z} X 1 = Y 5 = {a,…z}Y 3 = {a,…z}Y 4 = {a,…z}Y 2 = {a,…z} X2 =X2 =X 3 =X 4 =X 5 = Figure Credit: Carlos Guestrin(C) Dhruv Batra13

How do we model sequences? No input (C) Dhruv Batra14 Image Credit: Bengio, Goodfellow, Courville

How do we model sequences? With inputs (C) Dhruv Batra15 Image Credit: Bengio, Goodfellow, Courville

How do we model sequences? With inputs and outputs (C) Dhruv Batra16 Image Credit: Bengio, Goodfellow, Courville

How do we model sequences? With Neural Nets (C) Dhruv Batra17 Image Credit: Alex Graves

How do we model sequences? It’s a spectrum… (C) Dhruv Batra18 Input: No sequence Output: No sequence Example: “standard” classification / regression problems Input: No sequence Output: Sequence Example: Im2Caption Input: Sequence Output: No sequence Example: sentence classification, multiple-choice question answering Input: Sequence Output: Sequence Example: machine translation, video captioning, open- ended question answering, video question answering Image Credit: Andrej Karpathy

Things can get arbitrarily complex (C) Dhruv Batra19 Image Credit: Herbert Jaeger

Key Ideas Parameter Sharing + Unrolling –Keeps numbers of parameters in check –Allows arbitrary sequence lengths! “Depth” –Measured in the usual sense of layers –Not unrolled timesteps Learning –Is tricky even for “shallow” models due to unrolling (C) Dhruv Batra20

Plan for Today Model –Recurrent Neural Networks (RNNs) Learning –BackProp Through Time (BPTT) –Vanishing / Exploding Gradients [Abhishek:] Lua / Torch Tutorial (C) Dhruv Batra21

BPTT a (C) Dhruv Batra22 Image Credit: Richard Socher

Illustration [Pascanu et al] Intuition Error surface of a single hidden unit RNN; High curvature walls Solid lines: standard gradient descent trajectories Dashed lines: gradient rescaled to fix problem (C) Dhruv Batra23

Fix #1 Pseudocode (C) Dhruv Batra24 Image Credit: Richard Socher

Fix #2 Smart Initialization and ReLus –[Socher et al 2013] –A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, Le et al (C) Dhruv Batra25