CSC2535 2013 Lecture 8: Modeling image covariance structure. Geoffrey Hinton.


Test examples from the CIFAR-10 dataset (figure): plane, car, bird, cat, deer, dog, frog, horse, ship, truck.

Application to the CIFAR-10 labeled subset of the TINY images dataset (Marc'Aurelio Ranzato)
- There are 5,000 32x32 training images and 1,000 32x32 testing images for each of 10 different classes.
  – In addition, there are 80 million unlabeled images.
- Train the mcRBM model on a very large number of 8x8 color patches
  – 81 hiddens for the mean
  – 144 hiddens and 900 factors for the precision
- Replicate the patches across the 32x32 color images
  – 49 patches with a stride of 4
  – This gives 49 x 225 = 11,025 hidden units.
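A minimal sketch of the patch-replication arithmetic on this slide; the variable names are my own and the numbers come directly from the figures quoted above.

```python
# Patch-replication arithmetic for the CIFAR-10 setup described above.
image_size, patch_size, stride = 32, 8, 4
mean_hiddens, precision_hiddens = 81, 144

positions_per_axis = (image_size - patch_size) // stride + 1   # 7 positions
num_patches = positions_per_axis ** 2                          # 49 patches
hiddens_per_patch = mean_hiddens + precision_hiddens           # 225 hiddens
total_hiddens = num_patches * hiddens_per_patch                # 11,025 hiddens

print(num_patches, hiddens_per_patch, total_hiddens)           # 49 225 11025
```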

How well does it discriminate?
- Compare with a Gaussian-binary RBM model that has the same number of hidden units, but only models the means of the pixel intensities.
- Use multinomial logistic regression directly on the hidden units representing the means and the hidden units representing the precisions.
  – We can probably do better, but the aim is to evaluate the mcRBM idea.
- Also try unsupervised learning of extra hidden layers with a standard RBM to see if this gives even better features for discrimination.
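A minimal sketch of this evaluation protocol, assuming the mcRBM hidden activities have already been extracted into feature matrices; the array names, sizes, and random placeholder data are illustrative, not the real 11,025-dimensional features.

```python
# Multinomial logistic regression trained directly on the concatenated
# hidden activities for the means and the precisions (placeholder data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_classes = 2000, 500, 225, 10

# Stand-ins for the extracted mcRBM hidden activities and the CIFAR labels.
X_train = rng.standard_normal((n_train, n_feat))
y_train = rng.integers(0, n_classes, size=n_train)
X_test = rng.standard_normal((n_test, n_feat))
y_test = rng.integers(0, n_classes, size=n_test)

clf = LogisticRegression(max_iter=500)  # softmax over the 10 classes
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```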

Change of topic: modeling the covariance structure of image patches

Generating the parts of an object: why multiplicative interactions are useful
- One way to maintain the constraints between the parts is for the level above to specify the location of each part very accurately.
  – But this would require a lot of communication bandwidth.
- Sloppy top-down specification of the parts is less demanding,
  – but it messes up relationships between parts,
  – so use redundant features and specify lateral interactions to sharpen up the mess.
- Each part helps to locate the others.
  – This allows a noisy top-down channel.

Generating the parts of an object (figure): sloppy top-down activation of parts; clean-up using lateral interactions specified by the layer above; pose parameters; parts with top-down support; "square". It's like soldiers on a parade ground.

Towards a more powerful, multi-linear stackable learning module
- We want the states of the units in one layer to modulate the pair-wise interactions in the layer below (not just the biases).
  – Can we do this without losing the nice property that the hidden units are conditionally independent given the visible states?

Modeling the covariance structure of a static image by using two copies of the image (figure: Copy 1, Copy 2)
- Each factor sends the squared output of a linear filter to the hidden units.
- It is exactly the standard model of simple and complex cells. It allows complex cells to extract oriented energy.
- The standard model drops out of doing belief propagation for a factored third-order energy function.
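A hedged sketch of the factored third-order energy this slide refers to; the weight matrices B, C, P and the sign convention are my notation, not necessarily the symbols used on the original slide.

```latex
% Factored third-order energy over two copies of the image, v and v', and
% binary hidden units h; B, C, P are factor weight matrices (my notation).
\[
E(\mathbf{v},\mathbf{v}',\mathbf{h})
 \;=\; -\sum_{f}\Big(\sum_i v_i B_{if}\Big)\Big(\sum_j v'_j C_{jf}\Big)\Big(\sum_k h_k P_{kf}\Big).
\]
% Tying the two copies (v' = v) and the two filter sets (B = C), each factor
% contributes the squared output of a linear filter, gated by the hidden units:
\[
E(\mathbf{v},\mathbf{v},\mathbf{h})
 \;=\; -\sum_{f}\Big(\sum_i v_i C_{if}\Big)^{2}\Big(\sum_k h_k P_{kf}\Big).
\]
```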

What is a vertical edge? An intensity difference? A color difference? A texture difference? A depth difference? A motion difference? A combination of several of these? Is there a single simple definition of a vertical edge that covers all of these cases?

An advantage of modeling covariances between pixels rather than pixels
- During generation, a hidden "vertical edge" unit can turn off the horizontal interpolation in a region without worrying about exactly where the intensity discontinuity will be.
  – This gives some translational invariance.
  – It also gives a lot of invariance to brightness and contrast.
  – The "vertical edge" unit acts like a complex cell.
- By modulating the correlations between pixels rather than the pixel intensities, the generative model can still allow interpolation parallel to the edge.

Using linear filters to model the inverse covariance matrix of two pixel intensities. (Figure: the joint distribution of 2 pixels; each factor creates a parabolic energy trough, with its steepness set by the factor's weight: small weight vs. big weight.)
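A hedged reconstruction of the two-pixel energy behind this figure, in my own notation (filter coefficients J and positive factor weights w are assumptions, not the slide's symbols): each squared linear filter adds a parabolic trough, and together they define the inverse covariance of the two pixels.

```latex
\[
p(x_1, x_2) \;\propto\; \exp\!\big(-E(x_1, x_2)\big), \qquad
E(x_1, x_2) \;=\; \tfrac{1}{2}\sum_f w_f \,(J_{f1} x_1 + J_{f2} x_2)^2 .
\]
% Writing x = (x_1, x_2)^T, the energy is a quadratic form,
\[
E(\mathbf{x}) \;=\; \tfrac{1}{2}\,\mathbf{x}^\top \big(J^\top \mathrm{diag}(\mathbf{w})\, J\big)\,\mathbf{x},
\]
% so the precision (inverse covariance) matrix is J^T diag(w) J: a big weight
% w_f makes factor f's parabolic trough steep, a small weight makes it shallow.
```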

Modulating the precision matrix by using additive contributions that can be switched off
- Use the squared outputs of a set of linear filters to create an energy function.
  – The energy function represents the negative log probability of the data under a full covariance Gaussian.
- Adapt the precision matrix to each datapoint by switching off the energy contributions from some of the linear filters.
  – This is good for modeling smoothness constraints that almost always apply, but sometimes fail catastrophically (e.g. at edges).
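A minimal numerical sketch of the same idea for a whole patch, with illustrative names (C, w, h) rather than the lecture's symbols: the binary gates select which squared-filter contributions are added into the image-specific precision matrix.

```python
# Image-specific precision matrix built from gated, squared linear-filter
# contributions. Each column of C is one filter, w its positive weight,
# and h says which smoothness constraints are currently switched on.
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_factors = 16, 32
C = rng.standard_normal((n_pixels, n_factors))   # linear filters (columns)
w = rng.uniform(0.5, 1.5, size=n_factors)        # per-factor weights
h = rng.integers(0, 2, size=n_factors)           # binary gates from hidden units

# Energy E(v) = 0.5 * sum_f h_f * w_f * (c_f^T v)^2, i.e. a Gaussian with
# precision matrix C diag(h * w) C^T adapted to this particular image.
precision = C @ np.diag(h * w) @ C.T

v = rng.standard_normal(n_pixels)                # a toy "image"
energy = 0.5 * v @ precision @ v
print(energy)
```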

Using binary hidden units to remove violated smoothness constraints
- When the negative input from the squared filter exceeds the positive bias, the hidden unit turns off.
(Figure: free energy as a function of the filter output, y.)
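A small worked version of the turn-off rule, with illustrative symbols (filter output y, gate weight w > 0, positive bias b); the sign conventions are my assumption.

```latex
% Energy contribution of one factor with gating hidden unit h in {0, 1}:
\[
E(h, y) \;=\; h\,\big(w\,y^2 - b\big)
\quad\Rightarrow\quad
p(h = 1 \mid y) \;=\; \sigma\!\big(b - w\,y^2\big).
\]
% Summing out h gives this factor's contribution to the free energy:
\[
F(y) \;=\; -\log\!\big(1 + e^{\,b - w\,y^2}\big),
\]
% which is parabolic in y near zero and flattens out once w*y^2 exceeds b,
% i.e. the hidden unit switches off instead of paying a huge smoothness penalty.
```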

Inference with hidden units that represent active smoothness constraints
- The hidden units are all independent given the pixel intensities.
  – The factors do not create dependencies between hidden units.
- Given the states of the hidden units, the pixel intensity distribution is a full covariance Gaussian that is adapted for that particular image.
  – The hidden states do create dependencies between the pixels.

Learning with an adaptive precision matrix
- Since the pixel intensities are no longer independent given the hidden states, it is much harder to produce reconstructions.
  – We could invert the precision matrix for each training example, but this is slow.
- Instead, we produce reconstructions using Hybrid Monte Carlo, starting at the data.
  – The rest of the learning algorithm is the same as before.

Hybrid Monte Carlo
- Given the pixel intensities, we can integrate out the hidden states to get a free energy that is a deterministic function of the image.
  – Backpropagation can then be used to get the derivatives of the free energy with respect to the pixel intensities.
- Hybrid Monte Carlo simulates a particle that starts at the datapoint with a random initial momentum and then moves over the free energy surface.
  – 20 leapfrog steps work well for our networks.
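A hedged sketch of the Hybrid Monte Carlo step described above; free_energy and free_energy_grad are placeholders for the mcRBM free energy and its backpropagated gradient with respect to the pixels, and the toy quadratic at the end only shows how the function is called.

```python
# Hybrid (Hamiltonian) Monte Carlo: start at the data image with a random
# momentum, follow the free-energy surface with leapfrog steps, then
# accept or reject the proposed "reconstruction".
import numpy as np

def hmc_sample(x0, free_energy, free_energy_grad, n_leapfrog=20, step_size=0.01, rng=None):
    rng = rng or np.random.default_rng()
    x = x0.copy()
    p = rng.standard_normal(x.shape)                       # random initial momentum
    start_h = free_energy(x) + 0.5 * np.sum(p ** 2)        # initial Hamiltonian

    # Leapfrog integration of the Hamiltonian dynamics.
    p -= 0.5 * step_size * free_energy_grad(x)
    for _ in range(n_leapfrog - 1):
        x += step_size * p
        p -= step_size * free_energy_grad(x)
    x += step_size * p
    p -= 0.5 * step_size * free_energy_grad(x)

    # Metropolis accept/reject corrects for discretization error.
    end_h = free_energy(x) + 0.5 * np.sum(p ** 2)
    if np.log(rng.random()) < start_h - end_h:
        return x            # accepted move
    return x0               # rejected: stay at the starting point

# Toy usage with a quadratic free energy standing in for the mcRBM one.
F = lambda v: 0.5 * np.sum(v ** 2)
dF = lambda v: v
sample = hmc_sample(np.ones(64), F, dF, n_leapfrog=20, step_size=0.05)
```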

mcRBM (mean and covariance RBM)
- Use one set of binary hidden units to model the means of the real-valued pixels.
  – These hidden units learn blurry patterns for coloring in regions.
- Use a separate set of binary hidden units to model the image-specific precision matrix.
  – These hidden units get their input from factors.
  – The factors learn sharp edge filters for representing breakdowns in smoothness.

A product of a mean expert and a covariance expert. (Equation on slide, with the two terms labeled "mean expert" and "covariance expert".)
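A hedged reconstruction of the product-of-experts form on this slide, in my own notation (a, b, W, P, C and the exact scaling constants are assumptions and differ from the published mcRBM): the distribution over an image is the product of a Gaussian-binary RBM "mean expert" and a gated squared-filter "covariance expert".

```latex
\[
p(\mathbf{v}) \;\propto\; e^{-F_m(\mathbf{v})} \;\times\; e^{-F_c(\mathbf{v})}
\qquad \text{(mean expert} \times \text{covariance expert)}
\]
% Gaussian-binary RBM free energy over the mean hidden units:
\[
F_m(\mathbf{v}) \;=\; \tfrac{1}{2}\sum_i (v_i - a_i)^2
  \;-\; \sum_k \log\!\Big(1 + e^{\,b_k + \sum_i W_{ik} v_i}\Big)
\]
% Gated squared-filter free energy over the covariance hidden units:
\[
F_c(\mathbf{v}) \;=\; -\sum_j \log\!\Big(1 + e^{\,b_j - \sum_f P_{jf}\,(\sum_i C_{if} v_i)^2}\Big)
\]
```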

Multiple reconstructions from the same hidden state of an mcRBM (figure). The mcRBM hidden states are the same for each row. The hidden states should reflect human similarity judgements much better than the squared difference of pixel intensities.

Receptive fields of the hidden units that represent the means (trained on 16x16 patches of natural images).

Receptive fields of the factors that are used to represent precisions. Notice the color blob with low-frequency red-green and yellow-blue filters.

Why is the map topographic?
- We laid out the factors in a 2-D grid and then connected each hidden unit to a small set of nearby factors.
- If two factors get activated at the same time, it pays to connect them to the same hidden unit.
  – You only lose once by turning off that hidden unit.

Summary
- RBMs can be modified to allow factored multiplicative interactions. Inference is still easy.
  – Learning is still easy if we condition on one set of inputs (the pre-image for learning image transformations; the style for learning mocap).
- Multiplicative interactions allow an RBM to model pixel covariances within one image in an image-specific way.
  – Unbiased reconstructions from the hidden units are hard to compute because we need to invert a precision matrix.
  – We can avoid the inversion by using Hybrid Monte Carlo in image space.

Percent correct on CIFAR-10 test data
- Gaussian RBM (only models the means), 49x225 = 11,025 hiddens: 59.7%
- 3-way RBM (only models the covariances), 49x225 = 11,025 hiddens, 225 filters per patch: 62.3%
- 3-way RBM (only models the covariances), 49x225 = 11,025 hiddens, 900 filters per patch (extra factors allow pooling of similar filters): 67.8%
- mcRBM (models means & covariances), 49x(81+144) = 11,025 hiddens, 900 filters per patch: 69.1%
- mcRBM then an extra hidden layer of 8096 units, 49x(81+144) = 11,025 hiddens, 900 filters per patch: 72.1%