Advanced topics

Outline: Self-taught learning; Learning feature hierarchies (deep learning); Scaling up.

Self-taught learning

Supervised learning: given labeled examples of cars and motorcycles, train a classifier; at test time, ask "what is this?". Sometimes, whoever has the most data wins. So how do we get more data? Even with Amazon Mechanical Turk (AMT), labeling is often slow and expensive.

Semi-supervised learning: in addition to the labeled car/motorcycle examples, use unlabeled images that are all cars or motorcycles. Testing: what is this?

Self-taught learning: in addition to the labeled examples, use unlabeled images drawn at random from the internet (not necessarily cars or motorcycles). Testing: what is this?

Self-taught learning: run sparse coding, LCC, etc. on the unlabeled data to learn features f1, f2, …, fk, then use the learned f1, f2, …, fk to represent the labeled training and test examples as activations a1, a2, …, ak. If the labeled training set is small, this can give a huge performance boost.
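
A minimal sketch of this pipeline (assuming scikit-learn and random placeholder data; dictionary learning stands in for the sparse coding/LCC step): learn features from unlabeled data, re-represent the labeled set, and train a standard classifier.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_unlabeled = rng.rand(2000, 64)                                # e.g. 8x8 patches from random images
X_train, y_train = rng.rand(100, 64), rng.randint(2, size=100)  # small labeled set (car/motorcycle)
X_test = rng.rand(20, 64)

# 1. Learn features f1..fk from the unlabeled data.
dico = MiniBatchDictionaryLearning(n_components=50, alpha=1.0, random_state=0)
dico.fit(X_unlabeled)

# 2. Re-represent the labeled train/test sets by their activations a1..ak.
A_train = dico.transform(X_train)
A_test = dico.transform(X_test)

# 3. Train an ordinary supervised classifier on the new representation.
clf = LogisticRegression(max_iter=1000).fit(A_train, y_train)
print(clf.predict(A_test))
```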

Learning feature hierarchies/Deep learning

Why feature hierarchies? Pixels → edges → object parts (combinations of edges) → object models.

Deep learning algorithms: stacked sparse coding, Deep Belief Networks (DBNs) (Hinton), deep sparse autoencoders (Bengio). [Other related work: LeCun, Lee, Yuille, Ng, …]

Deep learning with autoencoders: logistic regression → neural network → sparse autoencoder → deep autoencoder.

Logistic regression. A logistic regression unit has a learned parameter vector θ. On inputs x1, x2, x3 plus a constant +1 bias input, it outputs h_θ(x) = 1 / (1 + exp(−θᵀx)). We draw a logistic regression unit as a single node with these inputs and one output.
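
A tiny numpy sketch of the unit described above (the 3-dimensional input and parameter values are just illustrative):

```python
import numpy as np

def logistic_unit(x, theta):
    """Output of one logistic regression unit: h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    x = np.append(x, 1.0)                       # append the +1 bias input
    return 1.0 / (1.0 + np.exp(-theta @ x))

print(logistic_unit(np.array([0.5, -1.0, 2.0]), theta=np.array([0.1, 0.2, 0.3, 0.0])))
```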

Neural network: string a lot of logistic units together. Example 3-layer network: inputs x1, x2, x3 and a +1 bias in Layer 1, hidden units a1, a2, a3 and a +1 bias in Layer 2, and an output unit in Layer 3.
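
A numpy sketch of the forward pass of such a 3-layer network (layer sizes and the random weights are illustrative; bias vectors stand in for the +1 units):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
W1, b1 = rng.randn(3, 3), rng.randn(3)   # Layer 1 -> Layer 2 (3 inputs -> 3 hidden units)
W2, b2 = rng.randn(1, 3), rng.randn(1)   # Layer 2 -> Layer 3 (3 hidden units -> 1 output)

x = np.array([0.5, -1.0, 2.0])           # inputs x1, x2, x3
a = sigmoid(W1 @ x + b1)                 # hidden activations a1, a2, a3
h = sigmoid(W2 @ a + b2)                 # network output
print(a, h)
```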

Example 4-layer network with 2 output units: inputs x1, x2, x3 and a +1 bias in Layer 1, two hidden layers (Layer 2 and Layer 3, each with a +1 bias unit), and 2 output units in Layer 4.

Neural Network example [Courtesy of Yann LeCun]

Training a neural network. Given a training set (x1, y1), (x2, y2), (x3, y3), …, adjust the parameters θ (for every node) to make h_θ(xi) ≈ yi, e.g. by minimizing Σi ||h_θ(xi) − yi||². (Use gradient descent; the gradients are computed with the "backpropagation" algorithm. Susceptible to local optima.)
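
A minimal training sketch, here using PyTorch's autograd to compute the backpropagation gradients; the network shape, squared-error loss, and random data are assumptions for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(64, 3)                    # training inputs (x1, x2, x3 per example)
y = torch.rand(64, 1)                    # training targets

net = nn.Sequential(nn.Linear(3, 3), nn.Sigmoid(), nn.Linear(3, 1), nn.Sigmoid())
opt = torch.optim.SGD(net.parameters(), lr=0.5)

for step in range(1000):
    loss = ((net(X) - y) ** 2).mean()    # squared error between h_theta(x_i) and y_i
    opt.zero_grad()
    loss.backward()                      # backpropagation
    opt.step()                           # gradient descent update (may hit local optima)
```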

Unsupervised feature learning with a neural network. Autoencoder: the network is trained to output its input (to learn the identity function). This has a trivial solution unless we constrain the number of units in Layer 2 (to learn a compressed representation), or constrain Layer 2 to be sparse.

Unsupervised feature learning with a neural network. Training a sparse autoencoder: given an unlabeled training set x1, x2, …, minimize a reconstruction error term plus an L1 sparsity term on the hidden activations a1, a2, a3, e.g. Σi ||x̂(xi) − xi||² + λ Σi Σj |aj(xi)|.
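
A minimal PyTorch sketch of this objective, with the reconstruction error plus an L1 penalty on the hidden activations; the layer sizes and the sparsity weight `lam` are illustrative choices, not values from the tutorial:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 6)                                    # unlabeled examples x1, x2, ... (6-dim)

encoder = nn.Sequential(nn.Linear(6, 3), nn.Sigmoid())    # Layer 1 -> Layer 2 (activations a1..a3)
decoder = nn.Linear(3, 6)                                 # Layer 2 -> Layer 3 (reconstruction)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)
lam = 1e-3                                                # sparsity weight (assumed value)

for step in range(500):
    a = encoder(X)                                        # hidden activations
    x_hat = decoder(a)                                    # reconstruction of the input
    loss = ((x_hat - X) ** 2).sum(dim=1).mean() + lam * a.abs().sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```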

Unsupervised feature learning with a neural network. [Diagram: autoencoder with inputs x1–x6 and a +1 bias in Layer 1, hidden units a1–a3 in Layer 2, and reconstructions of x1–x6 in Layer 3.]

Unsupervised feature learning with a neural network. [Diagram: after training, keep the Layer 1 → Layer 2 encoder; the hidden activations a1–a3 are the new representation for the input.]

Unsupervised feature learning with a neural network. Train a second sparse autoencoder on top: with the first layer fixed, train parameters so that the units b1, b2, b3 reconstruct the first-layer activations a1, a2, a3, subject to the bi's being sparse.

Unsupervised feature learning with a neural network. [Diagram: the second-layer activations b1, b2, b3 give a new representation for the input.]

Unsupervised feature learning with a neural network. Repeat: train a third sparse autoencoder on b1, b2, b3 to learn activations c1, c2, c3.

Unsupervised feature learning with a neural network. The third-layer activations are the new representation for the input: use [c1, c2, c3] as the representation to feed to a supervised learning algorithm.
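
The same greedy layer-wise idea as a sketch (assuming PyTorch; sizes, learning rate, and penalty weight are placeholders): train one sparse autoencoder, freeze it, encode the data, and train the next layer on those activations.

```python
import torch
import torch.nn as nn

def train_sparse_autoencoder(X, n_hidden, lam=1e-3, steps=500):
    """Train one sparse autoencoder layer and return its encoder."""
    enc = nn.Sequential(nn.Linear(X.shape[1], n_hidden), nn.Sigmoid())
    dec = nn.Linear(n_hidden, X.shape[1])
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(steps):
        h = enc(X)
        loss = ((dec(h) - X) ** 2).sum(dim=1).mean() + lam * h.abs().sum(dim=1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return enc

torch.manual_seed(0)
X = torch.rand(256, 6)                       # unlabeled inputs x1..x6

enc1 = train_sparse_autoencoder(X, 3)        # learn a1, a2, a3 from x
A = enc1(X).detach()
enc2 = train_sparse_autoencoder(A, 3)        # learn b1, b2, b3 from a
B = enc2(A).detach()
enc3 = train_sparse_autoencoder(B, 3)        # learn c1, c2, c3 from b
C = enc3(B).detach()                         # [c1, c2, c3]: representation fed to the learner
```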

Deep Belief Net. A Deep Belief Net (DBN) is another algorithm for learning a feature hierarchy. Building block: a 2-layer graphical model (the Restricted Boltzmann Machine). Additional layers can then be learned one at a time.

Restricted Boltzmann machine (RBM). Layer 2: [a1, a2, a3] (binary-valued); input: [x1, x2, x3, x4]. An MRF with joint distribution P(x, a) = (1/Z) exp(Σi,j Wij xi aj), with connections only between x and a. Use Gibbs sampling for inference. Given observed inputs x, we want the maximum likelihood estimate maxW Σi log P(xi).

Restricted Boltzmann machine (RBM). Gradient ascent on log P(x): Wij ← Wij + α([xi aj]obs − [xi aj]prior), where [xi aj]obs comes from fixing x to the observed value and sampling a from P(a | x), and [xi aj]prior comes from running Gibbs sampling to convergence. Adding a sparsity constraint on the ai's usually improves results.
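
A numpy sketch of this update using the common CD-1 shortcut, which replaces the converged Gibbs chain for [xi aj]prior with a single reconstruction step; bias terms are omitted and the data is a random placeholder:

```python
import numpy as np

rng = np.random.RandomState(0)
X = (rng.rand(100, 4) > 0.5).astype(float)   # binary inputs [x1..x4]
W = 0.01 * rng.randn(4, 3)                   # weights between x (4 units) and a (3 units)
alpha = 0.1                                  # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10):
    # Positive phase: fix x to the observed values, sample a ~ P(a | x).
    pa = sigmoid(X @ W)
    a = (rng.rand(*pa.shape) < pa).astype(float)
    pos = X.T @ pa                            # estimate of [x_i a_j]obs
    # Negative phase: one Gibbs step (CD-1 stand-in for [x_i a_j]prior).
    px = sigmoid(a @ W.T)
    x_neg = (rng.rand(*px.shape) < px).astype(float)
    pa_neg = sigmoid(x_neg @ W)
    neg = x_neg.T @ pa_neg
    W += alpha * (pos - neg) / X.shape[0]     # gradient ascent on log P(x)
```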

Deep Belief Network. Similar to a sparse autoencoder in many ways. Stack RBMs on top of each other to get a DBN: Layer 3 [b1, b2, b3] on top of Layer 2 [a1, a2, a3] on top of the input [x1, x2, x3, x4]. Train each layer with approximate maximum likelihood (often with a sparsity constraint on the ai's).
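
A sketch of the greedy stacking using scikit-learn's BernoulliRBM (which trains with persistent contrastive divergence rather than exact maximum likelihood, and has no sparsity term); the binary data here is a random placeholder:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.RandomState(0)
X = (rng.rand(500, 4) > 0.5).astype(float)        # inputs [x1..x4]

rbm1 = BernoulliRBM(n_components=3, learning_rate=0.05, n_iter=20, random_state=0)
A = rbm1.fit_transform(X)                         # Layer 2 activations [a1, a2, a3]

rbm2 = BernoulliRBM(n_components=3, learning_rate=0.05, n_iter=20, random_state=0)
B = rbm2.fit_transform(A)                         # Layer 3 activations [b1, b2, b3]
```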

Deep Belief Network: continue stacking, e.g. Layer 4 [c1, c2, c3] on top of Layer 3 [b1, b2, b3], Layer 2 [a1, a2, a3], and the input [x1, x2, x3, x4]. One of the challenges is scaling up: most people work with inputs of only 14x14 up to 32x32 pixels.

Deep learning examples

Convolutional DBN for audio: a spectrogram feeds into detection units, which feed into a max-pooling unit.

Convolutional DBN for audio: convolving the detection units across the spectrogram yields time-invariant features.

Probabilistic max pooling. In a convolutional neural net, the pooling unit computes max{x1, x2, x3, x4}, where the xi are real numbers. In a convolutional DBN, the xi are binary and mutually exclusive, so the pooling unit and detection units x1, …, x4 jointly take only 5 possible configurations (all off, or exactly one xi on with the pooling unit on). This collapses 2^n configurations into n+1 configurations and permits both bottom-up and top-down inference.
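
A numpy sketch of sampling one pooling block under this scheme: the four detection units plus the all-off case form the n+1 = 5 mutually exclusive configurations, with probabilities proportional to exp of the bottom-up inputs (the input values below are placeholders):

```python
import numpy as np

rng = np.random.RandomState(0)
I = np.array([0.2, 1.5, -0.3, 0.7])        # bottom-up inputs to detection units x1..x4

# n+1 = 5 mutually exclusive configurations: exactly one unit on, or all off.
p = np.exp(np.append(I, 0.0))              # last entry is the "all off" case
p /= p.sum()

case = rng.choice(5, p=p)                  # sample which configuration is active
x = np.zeros(4)
if case < 4:
    x[case] = 1.0                          # at most one detection unit is on
pooling_unit = x.max()                     # pooling unit is on iff some detection unit is on
print(x, pooling_unit)
```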

Convolutional DBN for audio: stack two CDBN layers. The spectrogram feeds the first CDBN layer (detection units, then max pooling), whose output feeds a second CDBN layer (detection units, then max pooling).

CDBNs for speech: learned first-layer bases. (For visual bases, one can inspect whether they make sense / correspond to Gabor filters; a similar analysis can be performed on the learned audio bases.)

Convolutional DBN for images. Input data V: visible nodes (binary or real-valued). Detection layer H: hidden nodes (binary), computed from V with shared "filter" weights Wk. Max-pooling layer P: binary max-pooling nodes; within each pooled block, at most one hidden node is active.
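
A small numpy/scipy sketch of the shapes involved: shared filter weights Wk convolved with the visible layer V give the detection layer H, which is reduced block-by-block into the pooling layer P (ordinary max pooling is used here for brevity instead of the probabilistic version):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.RandomState(0)
V = rng.rand(16, 16)                                  # visible layer (e.g. an image patch)
W_k = rng.randn(3, 3)                                 # shared "filter" weights for one group k
b_k = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = sigmoid(convolve2d(V, W_k, mode='valid') + b_k)   # detection layer H (14x14)
P = H.reshape(7, 2, 7, 2).max(axis=(1, 3))            # pooling layer P: one node per 2x2 block
print(H.shape, P.shape)
```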

Convolutional DBN on face images: the layers learn a hierarchy from pixels to edges to object parts (combinations of edges) to object models. Note: sparsity is important for these results.

Learning of object parts: examples of learned object parts from the object categories faces, cars, elephants, and chairs.

Training on multiple objects: trained on 4 classes (cars, faces, motorbikes, airplanes). Second layer: shared features and object-specific features (second-layer bases learned from the 4 object categories; plot of H(class | neuron active)). Third layer: more specific features (third-layer bases learned from the 4 object categories).

Hierarchical probabilistic inference: generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003); combine bottom-up and top-down inference. [Figure: input images; samples from feedforward inference (control); samples from full posterior inference.]

Key issue in feature learning: Scaling up

Scaling up with graphics processors. [Chart: peak GFLOPS of a US$250 NVIDIA GPU vs. an Intel CPU, 2003–2008 (source: NVIDIA CUDA Programming Guide).] For comparison, the supercomputer described at http://www.cbsnews.com/stories/2000/06/29/tech/main210684.shtml reached 12.3 TFLOPS, cost $110 million, and was used to simulate nuclear weapons testing; that is like 13 graphics cards costing $250 each, and 40 people with US$250 graphics cards would have matched #18 on the top-supercomputers list from two years earlier (http://www.top500.org/list/2006/11/100).

Scaling up with GPUs. [Chart: approximate number of parameters (millions) trained, using GPUs (Raina et al., 2009).]
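
A brief PyTorch sketch of the kind of computation being accelerated: the large matrix products at the heart of feature learning run on a GPU when one is available (sizes are illustrative):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Most of the work in unsupervised feature learning is large matrix products like X @ W;
# running them on an inexpensive GPU is what yields the speedups discussed here.
X = torch.rand(10000, 1024, device=device)   # a batch of unlabeled examples
W = torch.rand(1024, 4096, device=device)    # feature weights (millions of parameters)
A = torch.sigmoid(X @ W)                     # feature activations, computed on the GPU if present
print(device, A.shape)
```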

Unsupervised feature learning: Does it work?

State-of-the-art task performance

Audio
TIMIT phone classification: prior art (Clarkson et al., 1999) 79.6%; Stanford feature learning 80.3%
TIMIT speaker identification: prior art (Reynolds, 1995) 99.7%; Stanford feature learning 100.0%

Images
CIFAR object classification: prior art (Yu and Zhang, 2010) 74.5%; Stanford feature learning 75.5%
NORB object classification: prior art (Ranzato et al., 2009) 94.4%; Stanford feature learning 96.2%

Video
UCF activity classification: prior art (Kalser et al., 2008) 86%; Stanford feature learning 87%
Hollywood2 classification: prior art (Laptev, 2004) 47%; Stanford feature learning 50%

Multimodal (audio/video)
AVLetters lip reading: prior art (Zhao et al., 2009) 58.9%; Stanford feature learning 63.1%

Summary: instead of hand-tuning features, use unsupervised feature learning! Sparse coding, LCC. Advanced topics: self-taught learning, deep learning, scaling up.

Other resources. Workshop page: http://ufldl.stanford.edu/eccv10-tutorial/ (code for sparse coding and LCC, references, and a full online tutorial).