Some Slides from 2007 NIPS tutorial by Prof. Geoffrey Hinton

Presentation transcript:

Introduction to DBN. Jiaxing Tan. (Some slides from the 2007 NIPS tutorial by Prof. Geoffrey Hinton.)

Outline
- Shortcomings of label-based backpropagation
- What is a belief net?
- Restricted Boltzmann Machines
- Deep Belief Networks
- Example

What is wrong with back-propagation? (This may no longer be true today.)
- It requires labeled training data, yet almost all data is unlabeled.
- The learning time does not scale well: it is very slow in networks with multiple hidden layers.

Overcoming the limitations of backpropagation: keep the efficiency and simplicity of using a gradient method for adjusting the weights, but use it for modeling the structure of the sensory input. Adjust the weights to maximize the probability that a generative model would have produced the sensory input. Learn p(image), not p(label | image): to recognize shapes, first learn to generate images.

Outline
- Shortcomings of label-based backpropagation
- What is a belief net?
- Restricted Boltzmann Machines
- Deep Belief Networks
- Example

Belief Nets. A belief net is a directed acyclic graph (no directed cycles) composed of stochastic variables. We get to observe some of the variables and we would like to solve two problems:
- The inference problem: infer the states of the unobserved variables.
- The learning problem: adjust the interactions between variables to make the network more likely to generate the observed data.
[Figure: stochastic hidden causes with directed connections down to visible effects.]
We will use nets composed of layers of stochastic binary variables with weighted connections.

The learning rule for sigmoid belief nets. Learning is easy if we can get an unbiased sample from the posterior distribution over hidden states given the observed data. For each unit, maximize the log probability that its binary state in the sample from the posterior would be generated by the sampled binary states of its parents. With p_i = sigmoid(sum_j s_j w_ji), the update for the weight from parent j to unit i is delta w_ji = epsilon * s_j * (s_i - p_i): if s_j = 0 the weight does not change, and if s_j = 1 it moves in proportion to the difference between the unit's actual state and its predicted probability.
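A minimal sketch of this delta rule for a single child unit (the function name, the toy parent states, and the learning rate are illustrative, not from the slides):

```python
import numpy as np

def sbn_weight_update(s_parents, s_child, w, eps=0.1):
    """One maximum-likelihood update for the weights into one child unit of a
    sigmoid belief net, given binary states sampled from the posterior."""
    p_child = 1.0 / (1.0 + np.exp(-np.dot(s_parents, w)))  # prob. the parents turn the child on
    # Parents with s_j = 0 leave their weight unchanged; parents with s_j = 1
    # move it in proportion to (actual child state - predicted probability).
    return w + eps * s_parents * (s_child - p_child)

w = np.zeros(3)
w = sbn_weight_update(np.array([1.0, 0.0, 1.0]), 1.0, w)
```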

Outline
- Shortcomings of label-based backpropagation
- What is a belief net?
- Restricted Boltzmann Machines
- Deep Belief Networks
- Example

Restricted Boltzmann Machines. We restrict the connectivity to make learning easier:
- only one layer of hidden units (we will deal with more layers later);
- no connections between hidden units.
In an RBM the hidden units are conditionally independent given the visible states, so we can quickly get an unbiased sample from the posterior distribution given a data vector (just multiply the data by W and apply the logistic function). This is a big advantage over directed belief nets, where explaining away makes the hidden units conditionally dependent.
[Figure: a layer of hidden units j fully connected to a layer of visible units i, with no within-layer connections.]
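Because the posterior factorizes, computing it takes one matrix product followed by a sigmoid. A toy sketch (the sizes, names, and bias term are assumptions for illustration):

```python
import numpy as np

def hidden_probs(v, W, b_hid):
    """p(h_j = 1 | v) for every hidden unit at once; no iterative inference needed."""
    return 1.0 / (1.0 + np.exp(-(v @ W + b_hid)))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(6, 4))          # 6 visible units, 4 hidden units
v = rng.integers(0, 2, size=6).astype(float)     # a binary data vector
print(hidden_probs(v, W, np.zeros(4)))
```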

The energy of a joint configuration (ignoring terms to do with biases):
E(v, h) = - sum over visible units i and hidden units j of v_i h_j w_ij,
where v_i is the binary state of visible unit i, h_j is the binary state of hidden unit j, and w_ij is the weight between units i and j. (This is the Boltzmann energy; the same distribution appears in physics and in models of Brownian motion in finance.) The minus sign means lower energy corresponds to higher probability, and since only v, h, and the weights appear, the weights can be fitted by maximum likelihood.
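The energy above is just a bilinear form in v and h; a one-line check on toy values (names and sizes are made up):

```python
import numpy as np

def energy(v, h, W):
    """E(v, h) = -sum_ij v_i * h_j * w_ij, ignoring bias terms as on the slide."""
    return -v @ W @ h

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))                      # 3 visible units, 2 hidden units
print(energy(np.array([1.0, 0.0, 1.0]), np.array([1.0, 1.0]), W))
```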

Weights → Energies → Probabilities. Each possible joint configuration of the visible and hidden units has an energy. The energy is determined by the weights and biases (as in a Hopfield net). The energy of a joint configuration of the visible and hidden units determines its probability. The probability of a configuration over the visible units is found by summing the probabilities of all the joint configurations that contain it.

Using energies to define probabilities. The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration compared with the energy of all other joint configurations:
p(v, h) = exp(-E(v, h)) / Z, where Z = sum over all u, g of exp(-E(u, g)) is the partition function.
The probability of a configuration of the visible units is the sum of the probabilities of all the joint configurations that contain it: p(v) = sum over h of p(v, h).
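For a tiny RBM these probabilities can be computed exactly by enumerating every joint configuration; a brute-force sketch (only feasible for a handful of units, names are illustrative):

```python
import itertools
import numpy as np

def exact_joint_probs(W):
    """Exact p(v, h) = exp(-E(v, h)) / Z for a tiny bias-free RBM."""
    n_vis, n_hid = W.shape
    configs, energies = [], []
    for v_bits in itertools.product([0, 1], repeat=n_vis):
        for h_bits in itertools.product([0, 1], repeat=n_hid):
            v, h = np.array(v_bits, float), np.array(h_bits, float)
            configs.append((v, h))
            energies.append(-v @ W @ h)           # E(v, h)
    p = np.exp(-np.array(energies))
    return configs, p / p.sum()                   # normalize by the partition function Z

# p(v) is then the sum of p(v, h) over every h paired with that v.
configs, probs = exact_joint_probs(np.array([[1.0, -1.0], [0.5, 0.0]]))
```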

A picture of the maximum-likelihood learning algorithm for an RBM. [Figure: a Gibbs chain of alternating hidden and visible updates running from t = 0 (the data) through t = 1, t = 2, ... to t = infinity (a "fantasy" sample from the model).] Start with a training vector on the visible units, then alternate between updating all the hidden units in parallel and updating all the visible units in parallel (Gibbs sampling over v and h), and then change w:
delta w_ij = epsilon * ( <v_i h_j>^0 - <v_i h_j>^infinity ),
where <v_i h_j>^0 is the expected product of states with the training vector clamped on the visible units, and <v_i h_j>^infinity is the expected product at thermal equilibrium. The beauty of the model is that we do not need to change the weights during sampling.
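A sketch of that gradient estimate: the positive statistics come from the data-clamped configuration at t = 0, the negative statistics from a long Gibbs chain standing in for t = infinity. The chain length, names, and bias-free setup are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ml_gradient(v_data, W, n_steps=1000, seed=0):
    """Approximate d log p(v_data) / dW = <v h>_data - <v h>_model for a bias-free RBM."""
    rng = np.random.default_rng(seed)
    pos = np.outer(v_data, sigmoid(v_data @ W))        # <v_i h_j> with data clamped (t = 0)
    v = v_data.copy()
    for _ in range(n_steps):                           # Gibbs sampling toward equilibrium
        h = (sigmoid(v @ W) > rng.random(W.shape[1])).astype(float)
        v = (sigmoid(h @ W.T) > rng.random(W.shape[0])).astype(float)
    neg = np.outer(v, sigmoid(v @ W))                  # "fantasy" statistics (t = infinity)
    return pos - neg
```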

A quick way to learn an RBM: contrastive divergence. [Figure: the same chain truncated after one step, with t = 0 (data) and t = 1 (reconstruction).] Start with a training vector on the visible units. Update all the hidden units in parallel. Update all the visible units in parallel to get a "reconstruction". Update the hidden units again. Then change the weights with
delta w_ij = epsilon * ( <v_i h_j>^0 - <v_i h_j>^1 ).
This shortcut does not follow the exact likelihood gradient, but it works well in practice. (Presenter's note: overall, learning became roughly a million times faster, from about 100x less computation and computers becoming about 10,000x faster over 17 years.)
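A compact CD-1 training loop under the same bias-free toy setup (an illustrative sketch, not Hinton's original code; names, sizes, and hyperparameters are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, eps=0.05, epochs=50, seed=0):
    """Train a bias-free RBM on binary row vectors with one-step contrastive divergence."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.normal(size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        for v0 in data:
            h0 = (sigmoid(v0 @ W) > rng.random(n_hidden)).astype(float)  # up: sample hidden
            v1 = sigmoid(h0 @ W.T)                                       # down: reconstruction
            h1 = sigmoid(v1 @ W)                                         # up again
            W += eps * (np.outer(v0, h0) - np.outer(v1, h1))             # CD-1 weight update
    return W

data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
W = train_rbm_cd1(data, n_hidden=2)
```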

What is next? How to use an RBM. Suppose there are M visible units and N hidden units and we have the weight matrix W, where W_ij is the weight w(v_i -> h_j). Given a new data point represented by the visible states v, calculate the activation of each hidden unit h_j, i.e. the probability of h_j being on. (Adapted from http://blog.163.com/silence_ellen/blog/static/176104222201431710264087/, a Chinese-language blog.)

How to use an RBM (cont'd). To decide whether h_j is on or not, sample u from a Uniform(0, 1) distribution; then set h_j = 1 if p(h_j = 1 | v) > u, and h_j = 0 otherwise. Given the weights in the other direction, w(h_i -> v_j), and an input of hidden states, a visible sample is generated by a similar process. (Adapted from the same blog post as above.)
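The same decision rule in code, in both directions (a sketch with made-up names and sizes):

```python
import numpy as np

def sample_hidden(v, W, rng):
    """h_j = 1 if p(h_j = 1 | v) > u_j with u_j ~ Uniform(0, 1), else 0."""
    p = 1.0 / (1.0 + np.exp(-(v @ W)))
    return (p > rng.random(p.shape)).astype(float)

def sample_visible(h, W, rng):
    """The similar process in the other direction, using the transposed weights."""
    p = 1.0 / (1.0 + np.exp(-(h @ W.T)))
    return (p > rng.random(p.shape)).astype(float)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 3))                 # 6 visible units, 3 hidden units
h = sample_hidden(rng.integers(0, 2, 6).astype(float), W, rng)
x = sample_visible(h, W, rng)
```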

Outline
- Shortcomings of label-based backpropagation
- What is a belief net?
- Restricted Boltzmann Machines
- Deep Belief Networks
- Example

Training a deep network. [Figure: a stack of layers v0, h0, v1, h1, v2, h2, ... with tied weights; contrastive divergence applies under these restricted conditions. Does the structure look familiar?]

Training a deep network. First train a layer of features that receive input directly from the pixels. Then treat the activations of the trained features as if they were pixels and learn features of features in a second hidden layer (greedy layer-wise training); a sketch of this procedure follows below. It can be proved that each time we add another layer of features we improve a variational lower bound on the log probability of the training data. The proof is slightly complicated, but it is based on a neat equivalence between an RBM and a deep directed model.
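A sketch of greedy layer-wise pre-training, reusing the hypothetical train_rbm_cd1 and sigmoid helpers (and the toy data array) sketched earlier; layer sizes and names are illustrative:

```python
def train_dbn(data, layer_sizes, **cd1_kwargs):
    """Greedy layer-wise pre-training: each RBM models the activations of the layer below."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm_cd1(x, n_hidden, **cd1_kwargs)  # train this layer as an ordinary RBM
        x = sigmoid(x @ W)                            # its features become the next layer's "pixels"
        weights.append(W)
    return weights

dbn_weights = train_dbn(data, layer_sizes=[8, 4])
```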

Fine-tuning with the wake-sleep algorithm (actually run as sleep and wake).
- Wake phase: do a bottom-up pass, sampling h with the recognition weights from the input v for each RBM, and then adjust the generative weights by the RBM learning rule. If reality differs from the imagination, modify the generative weights to make what is imagined as close as possible to reality.
- Sleep phase: do a top-down pass, starting from a random state of h at the top layer and generating v; then modify the recognition weights. If the illusions produced by the concepts learned during the wake phase differ from those concepts, modify the recognition weights to make the illusions as close as possible to the concepts.
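A heavily simplified sketch of one wake-sleep step for a single pair of layers, with separate recognition weights R and generative weights G; the single-layer restriction, names, and learning rate are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wake_sleep_step(v_data, R, G, rng, eps=0.01):
    """R: (n_visible, n_hidden) recognition weights; G: (n_hidden, n_visible) generative weights."""
    # Wake phase: recognize h from real data, then adjust the generative weights
    # so they reproduce that data from h (delta rule on the generative side).
    h = (sigmoid(v_data @ R) > rng.random(R.shape[1])).astype(float)
    v_gen = sigmoid(h @ G)
    G += eps * np.outer(h, v_data - v_gen)

    # Sleep phase: dream up v from a random top-level state, then adjust the
    # recognition weights so they recover the h that generated it.
    h_dream = rng.integers(0, 2, R.shape[1]).astype(float)
    v_dream = (sigmoid(h_dream @ G) > rng.random(G.shape[1])).astype(float)
    h_rec = sigmoid(v_dream @ R)
    R += eps * np.outer(v_dream, h_dream - h_rec)
    return R, G

rng = np.random.default_rng(0)
R, G = rng.normal(scale=0.1, size=(6, 3)), rng.normal(scale=0.1, size=(3, 6))
R, G = wake_sleep_step(rng.integers(0, 2, 6).astype(float), R, G, rng)
```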

A neural model of digit recognition. [Architecture: a 28 x 28 pixel image feeds a hidden layer of 500 neurons, then a second layer of 500 neurons, whose activities connect, together with 10 label neurons, to 2000 top-level neurons.] The model learns to generate combinations of labels and images. To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

Outline
- Shortcomings of label-based backpropagation
- What is a belief net?
- Restricted Boltzmann Machines
- Deep Belief Networks
- Example

Show DBN http://www.cs.toronto.edu/~hinton/adi/index.htm

Reference Materials
- Hinton, G. E., Osindero, S. and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, pp. 1527-1554.
- Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11, pp. 428-434.
- Hinton, G. E. (2007). To recognize shapes, first learn to generate images. In P. Cisek, T. Drew and J. Kalaska (Eds.), Computational Neuroscience: Theoretical Insights into Brain Function. Elsevier.
- Hinton, G. E., Dayan, P., Frey, B. J. and Neal, R. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, pp. 1158-1161.

Questions?