
Deep Belief Network Training

Same greedy layer-wise approach:
– First train the lowest RBM (h0 – h1) using the RBM update algorithm (note that h0 is x).
– Freeze its weights and train the subsequent RBM layers one at a time.
– Then connect the final outputs to a supervised model and train that model.
– Finally, unfreeze all weights and train the full DBN together with the supervised model to fine-tune the weights.

During execution, typically iterate multiple times at the top RBM layer.
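A minimal NumPy sketch of this greedy layer-wise procedure, assuming a binary RBM trained with one-step contrastive divergence (the class, learning rate, and layer sizes are illustrative choices, not the course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Binary-binary RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0):
        # Positive phase: hidden activations driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step (reconstruction).
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # CD-1 approximation to the log-likelihood gradient.
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / v0.shape[0]
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

def greedy_pretrain(X, layer_sizes, epochs=10, batch=100):
    """Train a stack of RBMs bottom-up; each frozen layer's outputs feed the next."""
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(data), batch):
                rbm.cd1_update(data[i:i + batch])
        rbms.append(rbm)
        data = rbm.hidden_probs(data)  # weights frozen; propagate features upward
    return rbms
```

After pretraining, the top-level feature vectors would be fed to a supervised model, and the whole stack could then be fine-tuned with backpropagation, as the slide describes.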

Can use a DBN as a generative model to create sample x vectors:
1. Initialize the top layer to an arbitrary vector, then Gibbs sample (relaxation) between the top two layers m times. If we initialize the top layer with values obtained from a training example, fewer Gibbs samples are needed.
2. Pass the vector down through the network, sampling with the calculated probabilities at each layer.
3. The last sample at the bottom is the generated x value (it can be real valued if we use the probability vector rather than sampling).

Alternatively, we can start with an x at the bottom, relax up to a top-layer value, and then start from that vector when generating a new x (the dotted-lines version in the figure). This is more like standard Boltzmann machine processing.
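Continuing the sketch above, generation roughly follows the three steps on this slide (again an illustrative sketch, using the RBM class defined earlier):

```python
def generate_sample(rbms, m=50, start_x=None, rng=None):
    """Generate an x vector from a DBN given as a bottom-up list of trained RBMs."""
    rng = rng or np.random.default_rng(0)
    top = rbms[-1]
    if start_x is None:
        # Step 1: arbitrary top-layer state.
        h = (rng.random(top.W.shape[1]) < 0.5).astype(float)
    else:
        # Initializing from a training example: pass it up first,
        # so fewer Gibbs iterations are needed at the top.
        v = start_x
        for rbm in rbms:
            v = (rng.random(rbm.W.shape[1]) < rbm.hidden_probs(v)).astype(float)
        h = v
    # Gibbs sampling (relaxation) between the top two layers, m times.
    for _ in range(m):
        v = (rng.random(top.W.shape[0]) < top.visible_probs(h)).astype(float)
        h = (rng.random(top.W.shape[1]) < top.hidden_probs(v)).astype(float)
    # Steps 2-3: pass the result down through the lower layers.
    x = top.visible_probs(h)
    for rbm in reversed(rbms[:-1]):
        x = rbm.visible_probs((rng.random(x.shape) < x).astype(float))
    return x  # real-valued if we keep the probability vector rather than sampling
```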


DBN Learning Notes

Each layer updates its weights so as to make training-sample patterns more likely (lower energy) in the free state, and non-training patterns less likely. This unsupervised approach learns broad features (in the hidden/subsequent layers of RBMs) that make the kinds of patterns found in the training set more likely. It discovers features that are shared across training patterns, and thus potentially helpful for other goals with the training set (classification, compression, etc.).

Note that the RBMs still use only pairwise weights, but because we can pick the number of hidden units and layers, we can represent an arbitrary distribution.
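The "lower energy for training patterns" intuition can be checked directly with the RBM free energy; a small helper under the same assumptions as the earlier sketch:

```python
def free_energy(rbm, v):
    """Binary-RBM free energy F(v) = -b_v . v - sum_j log(1 + exp(v . W_j + b_hj)).
    Lower free energy means the model assigns v higher probability."""
    return -(v @ rbm.b_v) - np.sum(np.log1p(np.exp(v @ rbm.W + rbm.b_h)), axis=-1)

# Sanity check after training: training vectors should have noticeably lower
# free energy than random vectors of the same shape.
```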

DBN Notes

– Lateral connections can be used within a layer (it is then no longer an RBM), but sampling becomes more difficult, as with a standard Boltzmann machine requiring longer sampling chains. Lateral connections can capture pairwise dependencies, allowing the hidden nodes to focus on higher-order structure, and can give better results.
– Deep Boltzmann machine: allows continual relaxation across the full network, with nodes receiving input from above and below rather than sequencing through the RBM layers. For successful training, weights are typically first initialized using the standard greedy DBN training with RBM layers. Requires longer sampling chains, as with a Boltzmann machine.
– Conditional and temporal RBMs: allow node probabilities to be conditioned on other inputs, such as context or recurrence (time-series changes in input and internal state), etc.
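One way to read the conditional/temporal RBM idea above: the context (e.g. recent time steps) shifts the unit biases while the pairwise weights stay as in an ordinary RBM. A rough sketch under that reading, with A and B as illustrative conditioning matrices rather than the specific model from the lecture:

```python
def cond_hidden_probs(rbm, v, context, B):
    """Hidden probabilities with biases shifted by a context/history vector.
    B has shape (n_context, n_hidden) and supplies the dynamic hidden biases."""
    return sigmoid(v @ rbm.W + rbm.b_h + context @ B)

def cond_visible_probs(rbm, h, context, A):
    """Visible probabilities with context-dependent visible biases (A: n_context x n_visible)."""
    return sigmoid(h @ rbm.W.T + rbm.b_v + context @ A)
```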

Discrimination with Deep Networks

Discrimination approaches with DBNs (Deep Belief Nets):
– Use the outputs of the DBN as inputs to a supervised model (i.e. just an unsupervised preprocessor for feature extraction). This is the basic approach we have been discussing.
– Train a DBN for each class. For each one, clamp the unknown x and iterate m times. The DBN that ends with the lowest normalized free energy (a softmax variation) is the winner.
– Train just one DBN for all classes, but with an additional visible unit for each class. For each output class: clamp the unknown x, initialize the visible node for that class high, relax, and then see which final state has the lowest free energy; there is no need to normalize, since all energies come from the same network. Alternatively, just clamp x, relax, and see which class node ends up with the highest probability.
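A simplified sketch of the "one DBN per class" scheme above, reusing the free_energy helper from earlier. Here each class is represented by a single trained RBM rather than a full multi-layer DBN with clamped relaxation, so treat it as an illustration of the softmax normalization idea only:

```python
def classify_by_free_energy(class_models, x):
    """Pick the class whose model assigns x the lowest free energy.
    class_models: dict mapping label -> trained RBM standing in for that class's DBN."""
    labels = list(class_models)
    energies = np.array([free_energy(class_models[c], x) for c in labels])
    # Softmax over negative free energies normalizes across the separate networks.
    scores = -energies
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return labels[int(np.argmax(probs))], dict(zip(labels, probs))
```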

MNIST (figure slide)

DBN Project Notes

To be consistent, just use the 28×28 (784) data set of gray-scale values (0-255):
– Normalize to 0-1.
– You could try better preprocessing if you want, but start with this.
– Use small random initial weights.

Last year gave only slight improvements over the baseline MLP (which gets into the high 90s):
– Best results came from using real net values vs. sampling.
– Fine tuning helps.

Parameters – see the Hinton paper and others; do a little searching and send me a reference for extra credit points. Still figuring these out.
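A minimal preprocessing sketch for the project setup (the function name and array handling are assumptions; only the 28×28/784 shape, the 0-1 normalization, and the small-random-weights advice come from the slide):

```python
def prepare_mnist(images_uint8):
    """Flatten 28x28 gray-scale images (0-255) into 784-element vectors scaled to 0-1."""
    X = images_uint8.reshape(len(images_uint8), 28 * 28).astype(np.float64)
    return X / 255.0

# Small random initial weights, as the slide suggests (the 0.01 scale and the
# 256 hidden units are arbitrary illustrative choices):
rng = np.random.default_rng(0)
W_init = 0.01 * rng.standard_normal((784, 256))
```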

Conclusion

– Much recent excitement; still much to be discovered.
– Biological plausibility.
– Potential for significant improvements.
– Good in structured/Markovian spaces.
– Important research question: to what extent can we use deep learning in more arbitrary feature spaces? Recent deep training of MLPs with BP shows potential in this area.