Boltzmann Machines and their Extensions
S. M. Ali Eslami, Nicolas Heess, John Winn
March 2013, Heriot-Watt University



Goal: define a probability distribution over images like this:

What can one do with an ideal shape model? Segmentation.

Weizmann horse dataset: sample training images (327 images).

What can one do with an ideal shape model? Image in-painting.

What can one do with an ideal shape model? Computer graphics.

Energy-based models: the Gibbs distribution.
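For reference, the standard form of a Gibbs distribution over images $\mathbf{v}$ with energy function $E(\mathbf{v})$ (the slide's own equation was not captured in this transcript) is:

```latex
p(\mathbf{v}) = \frac{1}{Z}\exp\bigl(-E(\mathbf{v})\bigr),
\qquad
Z = \sum_{\mathbf{v}'} \exp\bigl(-E(\mathbf{v}')\bigr)
```

Lower-energy configurations receive higher probability; the normaliser $Z$ (the partition function) sums over all configurations, and is what makes learning and inference hard.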

Shallow architectures: Mean.

Shallow architectures: MRF.

Existing shape models: the most commonly used architectures (Mean and MRF), each shown with a sample from the model.

What is a strong model of shape? We define a strong model of object shape as one which meets two requirements:
– Realism: generates samples that look realistic,
– Generalization: can generate samples that differ from the training images.
(Illustration: training images, the real distribution, and the learned distribution.)

Shallow architectures: HOP-MRF.

Shallow architectures: RBM.

Shallow architectures: Restricted Boltzmann Machines. The effect of the latent variables can be appreciated by considering the marginal distribution over the visible units.

Shallow architectures: Restricted Boltzmann Machines. In fact, the hidden units can be summed out analytically, and the energy of this marginal distribution has a simple closed form, given below.
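In standard RBM notation, with binary visible units $\mathbf{v}$, binary hidden units $\mathbf{h}$, weights $W$ and biases $\mathbf{b}$, $\mathbf{c}$ (assumed symbols; the slide's equations were not captured), the joint energy is

```latex
E(\mathbf{v},\mathbf{h}) = -\mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{\top}\mathbf{h} - \mathbf{v}^{\top}W\mathbf{h}
```

and summing out the hidden units gives $p(\mathbf{v}) \propto \exp(-F(\mathbf{v}))$ with marginal ("free") energy

```latex
F(\mathbf{v}) = -\mathbf{b}^{\top}\mathbf{v} - \sum_{j}\log\Bigl(1 + \exp\bigl(c_{j} + \mathbf{v}^{\top}W_{:,j}\bigr)\Bigr)
```

The softplus terms are how the hidden units induce high-order interactions between the visible units.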

Shallow architectures: Restricted Boltzmann Machines. All hidden units are conditionally independent given the visible units, and vice versa.
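In the same assumed notation, with $\sigma(x) = 1/(1 + e^{-x})$ the logistic sigmoid, the factorised conditionals are:

```latex
p(h_{j} = 1 \mid \mathbf{v}) = \sigma\bigl(c_{j} + \mathbf{v}^{\top}W_{:,j}\bigr),
\qquad
p(v_{i} = 1 \mid \mathbf{h}) = \sigma\bigl(b_{i} + W_{i,:}\,\mathbf{h}\bigr)
```

This is what makes block-Gibbs sampling (next slides) cheap: each layer can be resampled in parallel.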

RBM inference: Block-Gibbs MCMC.
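A minimal NumPy sketch of block-Gibbs sampling for a binary RBM, using the conditionals above; the names W, b, c follow the notation assumed earlier, not code from the talk:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs(v, W, b, c, n_steps=1, rng=None):
    """Alternately resample all hidden units given the visibles and all
    visible units given the hiddens; each block is sampled in parallel."""
    rng = rng or np.random.default_rng()
    h = None
    for _ in range(n_steps):
        h = (rng.random(c.shape) < sigmoid(c + v @ W)).astype(float)   # ~ p(h | v)
        v = (rng.random(b.shape) < sigmoid(b + h @ W.T)).astype(float)  # ~ p(v | h)
    return v, h
```

One full sweep costs just two matrix products, which is why sampling can be fast.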

RBM learning: stochastic gradient descent. Maximize the log-likelihood of the training data with respect to the model parameters (the weights and biases).
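The gradient of the log-likelihood with respect to a weight $w_{ij}$ takes the standard two-term form:

```latex
\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}}
= \langle v_{i} h_{j} \rangle_{\text{data}} - \langle v_{i} h_{j} \rangle_{\text{model}}
```

The first expectation is easy (clamp the visibles to a training image); the second requires samples from the model, which is the difficulty the next slide addresses.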

RBM learning: contrastive divergence. Getting an unbiased sample of the second term, however, is very difficult. It can be done by starting at any random state of the visible units and performing Gibbs sampling for a very long time. Instead, contrastive divergence starts the Gibbs chain at a training image and runs it for only a few steps.

RBM learning: what is contrastive divergence actually doing?
– Crudely approximating the gradient of the log probability of the training data.
– More closely approximating the gradient of another objective function, called the contrastive divergence, but it ignores one tricky term in that objective, so it is not even following that gradient.
– Sutskever and Tieleman have shown that it is not following the gradient of any function.
– Nevertheless, it works well enough to achieve success in many significant applications.
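A sketch of a single CD-1 parameter update for a binary RBM, in the same assumed notation (this is the commonly used recipe, not code from the talk):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01, rng=None):
    """One contrastive-divergence (CD-1) update from a training vector v0."""
    rng = rng or np.random.default_rng()
    # Positive phase: hidden activations with the visibles clamped to the data.
    ph0 = sigmoid(c + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: a single block-Gibbs step instead of a long chain.
    pv1 = sigmoid(b + h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(c + v1 @ W)
    # Approximate gradient: <v h>_data minus <v h>_reconstruction.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c
```

Replacing the model expectation with a one-step reconstruction is exactly the biased shortcut the bullet points above describe.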

Deep architectures: DBM.

Deep architectures: Deep Boltzmann Machines.

Deep architectures: Deep Boltzmann Machines. Conditional distributions remain factorised due to layering.
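For a two-hidden-layer DBM with weight matrices $W^{1}$ (between $\mathbf{v}$ and $\mathbf{h}^{1}$) and $W^{2}$ (between $\mathbf{h}^{1}$ and $\mathbf{h}^{2}$), assumed notation again, a middle-layer unit receives input from both neighbouring layers:

```latex
p\bigl(h^{1}_{j} = 1 \mid \mathbf{v}, \mathbf{h}^{2}\bigr)
= \sigma\Bigl(\sum_{i} W^{1}_{ij}\, v_{i} + \sum_{k} W^{2}_{jk}\, h^{2}_{k}\Bigr)
```

So block-Gibbs sampling still applies, alternating between the odd and even layers.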

Shallow and deep architectures: modeling high-order and long-range interactions (MRF, RBM, DBM).

Deep Boltzmann Machines:
– Probabilistic,
– Generative,
– Powerful,
– but typically trained with many examples, and we only have datasets with few training examples.

From the DBM to the ShapeBM: restricted connectivity and sharing of weights. Limited training data, therefore reduce the number of parameters:
1. Restrict connectivity,
2. Tie parameters,
3. Restrict capacity.

Shape Boltzmann Machine: architecture in 2D.
– Top hidden units capture object pose,
– Given the top units, middle hidden units capture local (part) variability,
– Overlap helps prevent discontinuities at patch boundaries.

ShapeBM inference: Block-Gibbs MCMC (image, reconstruction, sample 1, …, sample n). Fast: ~500 samples per second.

ShapeBM learning: stochastic gradient descent. Maximize the log-likelihood of the training data with respect to the model parameters, in two stages (a schematic of the first stage is sketched below):
1. Pre-training: greedy, layer-by-layer, bottom-up, with a ‘Persistent CD’ MCMC approximation to the gradients.
2. Joint training: variational + persistent chain approximations to the gradients; separates learning of local and global shape properties.
Takes ~2-6 hours on the small datasets that we consider.
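A schematic of the greedy pre-training stage in Python; rbm.fit and rbm.hidden_probs are hypothetical helpers standing in for the Persistent-CD updates and the upward inference step, not an API from the talk:

```python
def pretrain(data, rbms):
    """Greedy layer-wise pre-training: train each RBM in the stack on the
    features produced by the already-trained layer below it."""
    for rbm in rbms:                    # bottom-up through the stack
        rbm.fit(data)                   # hypothetical: Persistent-CD training
        data = rbm.hidden_probs(data)   # hypothetical: pass features upward
    return rbms
```

Joint training then fine-tunes all layers together, as described in step 2 above.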

Results

Sampled shapes: evaluating the Realism criterion (Weizmann horses – 327 images).
– Data,
– FA: incorrect generalization,
– RBM: failure to learn variability,
– ShapeBM: natural shapes, variety of poses, sharply defined details, correct number of legs (!)

Sampled shapes: evaluating the Realism criterion (Weizmann horses – 327 images). This is great, but has it just overfit?

Sampled shapes: evaluating the Generalization criterion (Weizmann horses – 327 images). For each sample from the ShapeBM: the closest image in the training dataset, and the difference between the two images.

Interactive GUI: evaluating Realism and Generalization (Weizmann horses – 327 images).

Further results: sampling and completion (Caltech motorbikes – 798 images). Shown: training images, ShapeBM samples, sample generalization, shape completion.

Constrained shape completion: evaluating Realism and Generalization (Weizmann horses – 327 images). Comparison: ShapeBM vs. NN (nearest neighbour).

Further results: constrained completion (Caltech motorbikes – 798 images). Comparison: ShapeBM vs. NN.

Imputation scores: quantitative comparison (Weizmann horses – 327 images). The score is written out as a formula below.
1. Collect 25 unseen horse silhouettes,
2. Divide each into 9 segments,
3. Estimate the conditional log probability of a segment under the model given the rest of the image,
4. Average over images and segments.
Models compared: Mean, RBM, FA, ShapeBM.
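In symbols, assuming the average is a plain mean over the $N = 25$ images and their 9 segments (the talk does not spell this out):

```latex
\text{score} = \frac{1}{9N} \sum_{n=1}^{N} \sum_{s=1}^{9}
\log p\bigl(\mathbf{v}^{(n)}_{s} \mid \mathbf{v}^{(n)}_{\setminus s}\bigr)
```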

Multiple object categories: simultaneous detection and completion (Caltech-101 objects – 531 images). Train jointly on 4 categories without knowledge of class. Shown: sampled shapes and shape completions.

What does h^2 do?
– Weizmann horses: pose information,
– Multiple categories: class label information.
(Plot: accuracy vs. number of training images.)

What does h^2 do? (continued)

What does the overlap do?

Summary
– Shape models are essential in applications such as segmentation, detection, in-painting and graphics.
– The ShapeBM characterizes a strong model of shape:
  – Samples are realistic,
  – Samples generalize from training data.
– The ShapeBM learns distributions that are qualitatively and quantitatively better than those of other models for this task.

Questions? MATLAB GUI available at