Convolutional Restricted Boltzmann Machines for Feature Learning
Mohammad Norouzi, Advisor: Dr. Greg Mori, Simon Fraser University, 27 Nov 2009

Presentation transcript:

1 Convolutional Restricted Boltzmann Machines for Feature Learning. Mohammad Norouzi. Advisor: Dr. Greg Mori. CS @ Simon Fraser University. 27 Nov 2009

2 CRBMs for Feature Learning. Mohammad Norouzi. Advisor: Dr. Greg Mori. CS @ Simon Fraser University. 27 Nov 2009

3 Problems: human detection; handwritten digit classification

4 Sliding Window Approach

5 Sliding Window Approach (Cont'd) [INRIA Person Dataset; decision boundary illustration]

6 The success or failure of an object recognition algorithm hinges on the features used. [Pipeline diagram: input → feature representation (our focus: learning) → classifier → label (human / background, 0 / 1 / 2 / 3 / …)]

7 Local Feature Detector Hierarchies [Diagram: going up the hierarchy, features become larger, more complicated, and less frequent]

8 Generative & Layerwise Learning [Diagram: a stack of layers with unknown filters, learned one layer at a time with a generative CRBM]

9 Visual Features: Filtering [Diagram: a filter kernel (feature) is slid over the image; the dot product at each position gives the filter response]
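Filtering of this kind can be sketched in a few lines of NumPy. The function name and the Sobel-like example kernel below are illustrative choices, not taken from the slides:

```python
import numpy as np

def filter2d_valid(image, kernel):
    """'Valid' filtering: slide the kernel over the image and take a
    dot product at every position (no padding)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge kernel responds where intensity changes left to right.
image = np.zeros((5, 5))
image[:, 3:] = 1.0                  # right-hand side of the image is bright
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])  # Sobel-like vertical-edge filter
response = filter2d_valid(image, kernel)
```

The response is large in magnitude only at positions straddling the brightness edge, which is the sense in which the kernel acts as an edge "feature".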

10 Our approach to feature learning is generative (the CRBM model). [Diagram: unknown filters; binary hidden variables]

11 Related Work

12 Related Work
Convolutional Neural Network (CNN) [LeCun et al. 98; Ranzato et al. CVPR'07] (discriminative)
– Filtering layers are bundled with a classifier, and all the layers are learned together using error backpropagation.
– Does not perform well on natural images.
Biologically plausible models [Serre et al. PAMI'07; Mutch and Lowe CVPR'06] (no learning)
– Hand-crafted first layer; randomly selected prototypes for the second layer.

13 Related Work (cont'd)
Deep Belief Net [Hinton et al., NC'2006] (generative & unsupervised)
– A two-layer partially observed MRF, called the RBM, is the building block.
– Learning is performed unsupervised, layer-by-layer from the bottom layer upwards.
Our contributions:
– We incorporate spatial locality into RBMs and adapt the learning algorithm accordingly.
– We add more complicated components such as pooling and sparsity into deep belief nets.

14 Why Generative & Unsupervised?
Discriminative learning of deep and large neural networks has not been successful:
– Requires large training sets.
– Easily gets over-fitted for large models.
– First-layer gradients are relatively small.
Alternative hybrid approach:
– Learn a large set of first-layer features generatively.
– Switch to a discriminative model to select the discriminative features from those that are learned.
– Discriminative fine-tuning is helpful.

15 Details

16 CRBM. The image is the visible layer, and the hidden layer corresponds to filter responses. The CRBM is an energy-based probabilistic model whose interaction term is a dot product of vectorized matrices (each hidden map with its filter's response).
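A minimal sketch of such an energy function, assuming binary units, one bias per hidden map, a single shared visible bias, and 'valid' filtering (all names and the exact parameterization here are my illustrative assumptions, not taken from the slides):

```python
import numpy as np

def corr_valid(img, ker):
    """'Valid' filtering: kernel dot product at every image position."""
    kh, kw = ker.shape
    return np.array([[np.sum(img[i:i+kh, j:j+kw] * ker)
                      for j in range(img.shape[1] - kw + 1)]
                     for i in range(img.shape[0] - kh + 1)])

def crbm_energy(v, h, W, b, c):
    """Energy of a binary CRBM (illustrative sketch). Each hidden map h[k]
    is paired with the response of filter W[k] on the visible image v;
    the interaction is a dot product of the two vectorized matrices."""
    E = -c * v.sum()                      # visible bias term
    for k in range(len(W)):
        resp = corr_valid(v, W[k])        # filter response, same shape as h[k]
        E -= np.sum(h[k] * resp)          # <vec(h^k), vec(W^k filtered over v)>
        E -= b[k] * h[k].sum()            # hidden bias term
    return E

# One 3x3 filter on a 4x4 image gives a 2x2 hidden map.
rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=(4, 4)).astype(float)
W = [rng.standard_normal((3, 3))]
h = [rng.integers(0, 2, size=(2, 2)).astype(float)]
E = crbm_energy(v, h, W, b=[-1.0], c=0.1)
```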

17 Training CRBMs. Maximum-likelihood learning of CRBMs is difficult, but Contrastive Divergence (CD) learning is applicable. For CD learning we need to compute the conditionals P(h | v) and P(v | h), alternating between the data and a one-step sample (reconstruction).

18 CRBM (Backward). Nearby hidden variables cooperate in reconstruction. The conditional probabilities take a sigmoid form, as in the standard RBM but with convolution in place of full connectivity.
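Under the standard convolutional-RBM formulation, the two conditionals and one CD step can be sketched as follows. The helper names, shapes, and hyperparameters are my illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def corr_valid(img, ker):
    """'Valid' cross-correlation (filtering)."""
    kh, kw = ker.shape
    return np.array([[np.sum(img[i:i+kh, j:j+kw] * ker)
                      for j in range(img.shape[1] - kw + 1)]
                     for i in range(img.shape[0] - kh + 1)])

def conv_full(h, ker):
    """'Full' convolution: pad, then correlate with the flipped kernel.
    This is how nearby hidden units cooperate to reconstruct a visible unit."""
    kh, kw = ker.shape
    padded = np.pad(h, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return corr_valid(padded, ker[::-1, ::-1])

def p_h_given_v(v, W, b):
    """P(h^k_ij = 1 | v) = sigmoid((W^k filtered over v)_ij + b_k)."""
    return [sigmoid(corr_valid(v, W[k]) + b[k]) for k in range(len(W))]

def p_v_given_h(h, W, c):
    """P(v_ij = 1 | h): each visible unit sums full-convolution
    contributions from every hidden map, then passes through a sigmoid."""
    total = sum(conv_full(h[k], W[k]) for k in range(len(W)))
    return sigmoid(total + c)

# One CD-1 step (sketch): positive phase, reconstruction, negative phase.
rng = np.random.default_rng(1)
v0 = rng.integers(0, 2, size=(6, 6)).astype(float)
W = [0.01 * rng.standard_normal((3, 3)) for _ in range(2)]
b, c = [-2.0, -2.0], 0.0      # negative hidden biases encourage sparse maps
ph0 = p_h_given_v(v0, W, b)                                  # positive phase
h0 = [(rng.random(p.shape) < p).astype(float) for p in ph0]  # sample hiddens
v1 = p_v_given_h(h0, W, c)                                   # reconstruction
ph1 = p_h_given_v(v1, W, b)                                  # negative phase
grads = [corr_valid(v0, ph0[k]) - corr_valid(v1, ph1[k]) for k in range(2)]
```

Each filter's CD gradient is the difference between a data-driven and a reconstruction-driven correlation, mirroring the "data" and "sample" phases on the slide.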

19 Learning the Hierarchy. The structure is trained bottom-up and layerwise; the CRBM model is used to train the filtering layers. Each filtering layer (filtering followed by a non-linearity) is itself followed by a down-sampling (pooling) layer that reduces the dimensionality, and a classifier sits on top of the stack.
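One layer of this filtering → non-linearity → pooling pipeline might be sketched as below. This is a simplification assuming a sigmoid non-linearity and non-overlapping max pooling; the names are mine:

```python
import numpy as np

def filt(img, ker):
    """'Valid' filtering: kernel dot product at every image position."""
    kh, kw = ker.shape
    return np.array([[np.sum(img[i:i+kh, j:j+kw] * ker)
                      for j in range(img.shape[1] - kw + 1)]
                     for i in range(img.shape[0] - kh + 1)])

def max_pool(x, s=2):
    """Down-sample by taking the max over non-overlapping s x s blocks."""
    h, w = x.shape[0] // s * s, x.shape[1] // s * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

def layer(v, filters, bias):
    """One layer: filtering -> sigmoid non-linearity -> max pooling,
    producing one (smaller) output map per filter."""
    return [max_pool(1.0 / (1.0 + np.exp(-(filt(v, k) + bias))))
            for k in filters]

# Stacking: the pooled maps of one layer become inputs to the next.
v = np.random.default_rng(0).random((12, 12))
maps1 = layer(v, [np.ones((3, 3))], bias=-4.0)   # 12x12 -> 10x10 -> 5x5
```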

20 [Figure: input image, first-layer filters and their responses, second-layer filters and their responses]

21 Experiments

22 Evaluation
MNIST digit dataset
– Training set: 60,000 images of digits of size 28x28.
– Test set: 10,000 images.
INRIA person dataset
– Training set: 2416 person windows of size 128x64 pixels and 4.5x10^6 negative windows.
– Test set: 1132 positive and 2x10^6 negative windows.

23 First-layer filters. INRIA: 15 filters of 7x7, learned on gray-scale images of the INRIA positive set. MNIST: 15 filters of 5x5, learned on unlabeled digits.

24 Second-Layer Features (MNIST). The filters are hard to visualize directly, so we show patches that respond strongly to each filter.

25 Second-Layer Features (INRIA)

26 MNIST Results. MNIST error rate when the model is trained on the full training set.

27 Results [Figure: false positives]

28 [Figure: 1st false-positive example]

29 [Figure: 2nd false-positive example]

30 [Figure: 3rd false-positive example]

31 [Figure: 4th false-positive example]

32 [Figure: 5th false-positive example]

33 INRIA Results. Adding our large-scale features significantly improves the performance of the baseline (HOG).

34 Conclusion. We extended the RBM model to the Convolutional RBM, useful for domains with spatial locality. We exploited CRBMs to train local hierarchical feature detectors, one layer at a time and generatively. This method obtained results comparable to the state of the art in digit classification and human detection.

35 Thank You

36 Hierarchical Feature Detector [Diagram: stacked layers of unknown filters]

37 Contrastive Divergence Learning

38 Training CRBMs (Cont'd). The problem of reconstructing the border region becomes severe when the number of Gibbs sampling steps is greater than 1.
– Partition the visible units into middle and border regions.
Instead of maximizing the likelihood, we (approximately) maximize the conditional likelihood of the middle region given the border.

39 Enforcing Feature Sparsity. The CRBM's representation is K times overcomplete (K = number of filters), and after a few CD learning iterations the visible layer is perfectly reconstructed. We enforce sparsity to tackle this problem:
– Hidden bias terms were frozen at large negative values.
– Having a single non-sparse hidden unit improves the learned features (this might be related to the ergodicity condition).

40 Probabilistic Meaning of Max [Figure: a max is taken over blocks of hidden units]
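One plausible reading of the "probabilistic meaning of max": if the hidden units in a pooling block are treated as conditionally independent Bernoulli variables, the max over the block is 1 unless every unit is 0. This is an interpretive sketch, not taken verbatim from the slides:

```python
import numpy as np

def p_max_is_one(p_block):
    """Probability that the max over a block of independent Bernoulli
    hidden units is 1: the max is 0 only if every unit is 0, so
    P(max = 1) = 1 - prod_i (1 - p_i)."""
    return 1.0 - np.prod(1.0 - np.asarray(p_block, dtype=float))

print(p_max_is_one([0.5, 0.5]))   # 0.75: only the all-zero case gives max 0
```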

41 The Classifier Layer. We used an SVM as our final classifier:
– RBF kernel for MNIST.
– Linear kernel for INRIA.
– For INRIA we combined our 4th-layer outputs with HOG features.
We experimentally observed that relaxing the sparsity of the CRBM's hidden units yields better results; this lets the discriminative model set the thresholds itself.

42 Why are HOG features added? Because part-like features are very sparse; having a template of the human figure helps a lot.

43 RBM. A two-layer pairwise MRF with a full set of hidden-visible connections; the RBM is an energy-based model. Hidden random variables are binary; visible variables can be binary or continuous. Inference is straightforward: P(h_j = 1 | v) = sigmoid(sum_i w_ij v_i + b_j) and P(v_i = 1 | h) = sigmoid(sum_j w_ij h_j + c_i). Contrastive Divergence learning is used for training.
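These RBM conditionals are easy to express directly. A sketch with illustrative names, where w is the visible-by-hidden weight matrix and b, c are the hidden and visible biases:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_p_h_given_v(v, w, b):
    """P(h_j = 1 | v) = sigmoid(sum_i w_ij v_i + b_j): factorizes over j."""
    return sigmoid(v @ w + b)

def rbm_p_v_given_h(h, w, c):
    """P(v_i = 1 | h) = sigmoid(sum_j w_ij h_j + c_i): factorizes over i."""
    return sigmoid(w @ h + c)

# Tiny example: 6 visible units, 4 hidden units.
rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal((6, 4))
v = rng.integers(0, 2, size=6).astype(float)
p_h = rbm_p_h_given_v(v, w, np.zeros(4))
h = (rng.random(4) < p_h).astype(float)
p_v = rbm_p_v_given_h(h, w, np.zeros(6))
```

Because the conditionals factorize, a full layer can be sampled in one vectorized step, which is what makes block Gibbs sampling (and hence CD learning) practical.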

44 Why Unsupervised Bottom-Up?
Discriminative learning of deep structure has not been successful:
– Requires large training sets.
– Is easily over-fitted for large models.
– First-layer gradients are relatively small.
Alternative hybrid approach:
– Learn a large set of first-layer features generatively.
– Later, switch to a discriminative model to select the discriminative features from those learned.
– Fine-tune the features using discriminative training.

45 INRIA Results (Cont'd). Miss rate at different FPPW rates. FPPI is a better indicator of performance. More experiments on the size of features and the number of layers are desired.


