1 AN ANALYSIS OF SINGLE-LAYER NETWORKS IN UNSUPERVISED FEATURE LEARNING [1] Yani Chen 10/14/2014

2 Outline
- Introduction
- Framework for feature learning
- Unsupervised feature learning algorithms
- Effect of some parameters
- Experiments and analysis of the results

3 Introduction
1. Much prior work has focused on employing increasingly complex unsupervised feature learning algorithms.
2. Simple factors, such as the number of hidden nodes, may matter more for achieving high performance than the learning algorithm or the depth of the model.
3. Even a single-layer network can produce very good learned features.

4 Unsupervised feature learning framework
1. Extract random patches from unlabeled training images (images serve as the running example).
2. Apply a pre-processing stage to the patches.
3. Learn a feature mapping using an unsupervised feature learning algorithm.
4. Extract features from equally spaced sub-patches covering each input image.
5. Pool the features together to reduce the number of feature values.
6. Train a linear classifier to predict the labels given the feature vectors.
A minimal sketch of these six steps follows.
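
An illustrative end-to-end sketch of the pipeline on synthetic data. All function names here are my own (not from the paper or the slides); plain k-means with the "triangle" encoding stands in for whichever of the four feature-learning algorithms is used, and pooling is a 2x2 grid of sum pools, as discussed later in the deck.

```python
# Sketch of the six-step single-layer pipeline on synthetic 32x32 RGB images.
import numpy as np

rng = np.random.default_rng(0)

def extract_random_patches(images, patch_size=6, n_patches=2000):
    """Step 1: sample random patch_size x patch_size patches from the images."""
    n, h, w, c = images.shape
    patches = np.empty((n_patches, patch_size * patch_size * c))
    for i in range(n_patches):
        img = images[rng.integers(n)]
        y, x = rng.integers(h - patch_size + 1), rng.integers(w - patch_size + 1)
        patches[i] = img[y:y + patch_size, x:x + patch_size].ravel()
    return patches

def normalize(patches, eps=1e-5):
    """Step 2: per-patch brightness/contrast normalization (whitening is covered later)."""
    patches = patches - patches.mean(axis=1, keepdims=True)
    return patches / np.sqrt(patches.var(axis=1, keepdims=True) + eps)

def learn_centroids(patches, k=20, iters=10):
    """Step 3: learn k centroids (stand-in for any of the four algorithms)."""
    centroids = patches[rng.choice(len(patches), k, replace=False)]
    for _ in range(iters):
        d = ((patches[:, None, :] - centroids[None]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = patches[labels == j].mean(axis=0)
    return centroids

def encode(patches, centroids):
    """Soft ('triangle') feature mapping: f_k = max(0, mean(z) - z_k)."""
    z = np.sqrt(((patches[:, None, :] - centroids[None]) ** 2).sum(-1))
    return np.maximum(0.0, z.mean(axis=1, keepdims=True) - z)

def image_features(img, centroids, patch_size=6, stride=1):
    """Steps 4-5: dense feature extraction over the image, then 2x2 sum pooling."""
    h, w, _ = img.shape
    ys = range(0, h - patch_size + 1, stride)
    xs = range(0, w - patch_size + 1, stride)
    patches = np.array([img[y:y + patch_size, x:x + patch_size].ravel()
                        for y in ys for x in xs])
    codes = encode(normalize(patches), centroids).reshape(len(ys), len(xs), -1)
    hy, hx = len(ys) // 2, len(xs) // 2                   # split into four quadrants
    quads = [codes[:hy, :hx], codes[:hy, hx:], codes[hy:, :hx], codes[hy:, hx:]]
    return np.concatenate([q.sum(axis=(0, 1)) for q in quads])  # 4*k features per image

# Demo on synthetic images; step 6 would feed X to a linear classifier.
images = rng.random((10, 32, 32, 3))
centroids = learn_centroids(normalize(extract_random_patches(images)))
X = np.array([image_features(img, centroids) for img in images])
print(X.shape)  # (10, 80)
```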

5 Unsupervised learning algorithms
1. Sparse autoencoder
2. Sparse restricted Boltzmann machine (RBM)
3. K-means clustering
4. Gaussian mixture model (GMM) clustering

6 Sparse auto-encoder
- Objective function (minimized during training): squared reconstruction error plus a sparsity penalty (see the hedged reconstruction below)
- Feature mapping function: (see below)
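
The equations on this slide were not transcribed. As a hedged reconstruction: the paper describes the objective only as squared reconstruction error plus a penalty that keeps the average hidden activation low, so the KL form of that penalty below is an assumption; the feature mapping f(x) = g(Wx + b) is as stated in [1].

```latex
% Objective (assumed KL sparsity penalty; rho is the target activation, lambda its weight):
\min_{W,b,W',b'} \; \sum_{i} \bigl\| W' g\!\left(W x^{(i)} + b\right) + b' - x^{(i)} \bigr\|_2^2
  \;+\; \lambda \sum_{j=1}^{K} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right),
\qquad g(z) = \frac{1}{1 + \exp(-z)}

% Feature mapping: the K-dimensional encoder output
f(x) = g(Wx + b)
```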

7 Sparse restricted Boltzmann machine
- Energy function of an RBM: (see the hedged reconstruction below)
- The same type of sparsity penalty as in the sparse autoencoder can be added.
- Sparse RBMs can be trained with a contrastive divergence approximation [7].
- Feature mapping function: (see below)
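
The energy function on this slide was not transcribed. A standard Gaussian(visible)-Bernoulli(hidden) form, appropriate for whitened real-valued patches, is shown below as an assumption; the feature mapping uses the same sigmoid encoder form as the sparse autoencoder.

```latex
% Assumed Gaussian-Bernoulli RBM energy (sigma is the visible-unit noise scale):
E(v, h) = \frac{1}{2\sigma^2} \sum_{i} v_i^2
  - \frac{1}{\sigma^2} \Bigl( \sum_{i,j} v_i W_{ij} h_j + \sum_{i} b_i v_i + \sum_{j} c_j h_j \Bigr)

% Feature mapping: same form as the autoencoder's encoder
f(x) = g(Wx + b), \qquad g(z) = \frac{1}{1 + \exp(-z)}
```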

8 K-means clustering
- Objective function for learning K centroids
- Feature mapping function:
  1. Hard assignment
  2. Soft ("triangle") assignment
Both the objective and the two mappings are written out below.
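
A hedged reconstruction of the missing equations, following the definitions in [1]: the objective is the standard K-means distortion, the hard mapping is a 1-of-K code for the nearest centroid, and the "triangle" mapping activates only centroids closer than average.

```latex
% Objective over the learned centroids c^{(k)}:
\min_{\{c^{(k)}\}} \sum_{i} \min_{k} \bigl\| c^{(k)} - x^{(i)} \bigr\|_2^2

% Hard assignment:
f_k(x) =
\begin{cases}
  1 & \text{if } k = \arg\min_j \| c^{(j)} - x \|_2^2 \\
  0 & \text{otherwise}
\end{cases}

% Soft ("triangle") assignment, with z_k the distance to centroid k and \mu(z) its mean:
f_k(x) = \max\bigl\{ 0,\; \mu(z) - z_k \bigr\},
\qquad z_k = \| x - c^{(k)} \|_2, \quad \mu(z) = \tfrac{1}{K} \textstyle\sum_{j} z_j
```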

9 GMM clustering
A Gaussian mixture model (GMM) is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
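
In symbols (standard GMM notation, not transcribed from the slide), the model density is a weighted sum of Gaussian components:

```latex
p(x) = \sum_{k=1}^{K} \phi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right),
\qquad \phi_k \ge 0, \quad \sum_{k=1}^{K} \phi_k = 1
```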

10 GMM (Gaussian mixture models)

11 EM algorithm
- The EM (expectation-maximization) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models.
- E-step: softly assign points to clusters (compute posterior responsibilities).
- M-step: re-estimate the model parameters from those assignments.
A minimal EM sketch for a GMM follows.
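
A minimal, illustrative EM loop for a diagonal-covariance GMM on synthetic 2-D data (numpy only; the toy data and diagonal-covariance choice are assumptions made to keep the sketch short).

```python
# Minimal EM for a diagonal-covariance GMM on synthetic 2-D data.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(200, 2)) for loc in ([-2, 0], [2, 1])])
n, d = X.shape
K = 2

# Initialization: random means, unit variances, uniform mixing weights.
mu = X[rng.choice(n, K, replace=False)]
var = np.ones((K, d))
phi = np.full(K, 1.0 / K)

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    log_p = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                     + np.log(2 * np.pi * var)).sum(-1) + np.log(phi))   # (n, K)
    log_p -= log_p.max(axis=1, keepdims=True)                            # numerical stability
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means, and variances from the responsibilities.
    Nk = resp.sum(axis=0)                                 # effective counts per component
    phi = Nk / n
    mu = (resp.T @ X) / Nk[:, None]
    var = (resp.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + 1e-6  # keep variances positive

print(np.round(mu, 2))  # means should be near [-2, 0] and [2, 1]
```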

12 Gaussian mixtures
- Feature mapping function: the posterior membership probability of each component (see the hedged reconstruction below)
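
The equation was not transcribed; as a hedged reconstruction of the mapping described in [1], each feature is the posterior probability that the input was generated by component k, given the mixing weights, means, and covariances learned by EM:

```latex
f_k(x) = \frac{\phi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right)}
              {\sum_{j=1}^{K} \phi_j \, \mathcal{N}\!\left(x \mid \mu_j, \Sigma_j\right)}
```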

13 Feature extraction and classification
- Convolutional feature extraction followed by (sum) pooling
- Classification: linear (L2) SVM
A sketch of the classification stage follows.
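
A short sketch of the classification stage: a linear SVM with squared-hinge (L2) loss on the pooled feature vectors. The feature matrix would come from the pipeline sketched earlier; here random features and labels stand in so the snippet runs on its own, and the C value is an assumption (the paper cross-validates it).

```python
# Linear (L2) SVM on pooled feature vectors (toy stand-in data).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((500, 4 * 1600))    # 1600 features x 4 pooling regions per image
y = rng.integers(0, 10, size=500)  # 10 classes, e.g. CIFAR-10 labels

clf = make_pipeline(
    StandardScaler(),                        # standardize the pooled features
    LinearSVC(loss="squared_hinge", C=1.0),  # the "(L2) SVM" from the slide
)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```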

14 Data
1. CIFAR-10 (used to tune the parameters)
2. NORB
3. Downsampled STL-10 (96x96 --> 32x32)

15 CIFAR-10 dataset
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. [3]

16 NORB dataset
This dataset is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: animals, human figures, airplanes, trucks, and cars. There are 24,300 training image pairs and 24,300 test image pairs, each 96x96. [4]

17 STL-10 dataset
The STL-10 dataset consists of 96x96 color images in 10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck). There are 5000 labeled training images, 8000 test images, and 100000 unlabeled images; this work uses the images downsampled to 32x32. [5]

18 Factors studied
1. With or without whitening
2. Number of features
3. Stride (spacing between patches)
4. Receptive field size

19 Effect of whitening
- Whitening makes the features less correlated with each other and gives them all the same variance.
- For the sparse autoencoder and sparse RBM: with only 100 features there is a significant benefit from whitening; as the number of features grows, the advantage disappears.
- For the clustering algorithms: whitening is a crucial preprocessing step, because these algorithms cannot handle the correlations in the data themselves.
A ZCA-style whitening sketch follows.
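
A minimal sketch of ZCA-style whitening of patches: rotate into the eigenbasis of the patch covariance, rescale each direction to unit variance, rotate back. The eps value and the toy data are assumptions for illustration.

```python
# ZCA-style whitening: decorrelate patch dimensions and equalize their variance.
import numpy as np

def zca_whiten(patches, eps=0.01):
    """patches: (n_patches, dim) array, e.g. flattened 6x6x3 patches."""
    mean = patches.mean(axis=0)
    X = patches - mean                          # center each dimension
    cov = X.T @ X / X.shape[0]                  # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X @ W, mean, W                       # whitened data + transform to reuse at test time

# Toy check: after whitening, the covariance is close to the identity
# (decorrelated features with equal variance, as the slide states).
rng = np.random.default_rng(0)
mix = np.eye(108) + 0.02 * rng.normal(size=(108, 108))   # well-conditioned mixing matrix
raw = rng.normal(size=(5000, 108)) @ mix                 # correlated "patches"
white, _, _ = zca_whiten(raw)
dev = np.abs(np.cov(white, rowvar=False) - np.eye(108)).max()
print(dev)  # small deviation from the identity
```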

20 Effect of number of features
- Numbers of features tested: 100, 200, 400, 800, 1600
- All algorithms generally achieved higher performance by learning more features.

21 Effect of stride
- The stride is the spacing between the patches from which feature values are extracted.
- Performance drops as the stride (step size) increases.

22 Effect of receptive field size
- The receptive field size is the patch size.
- Overall, a 6-pixel receptive field worked best.

23 Classification results
  Algorithm                              Accuracy
  Raw pixels                             37.3%
  3-way factored RBM (3 layers)          65.3%
  Mean-covariance RBM (3 layers)         71.0%
  Improved Local Coord. Coding           74.5%
  Conv. Deep Belief Net (2 layers)       78.9%
  Sparse auto-encoder                    73.4%
  Sparse RBM                             72.4%
  K-means (Hard)                         68.6%
  K-means (Triangle, 1600 features)      77.9%
  K-means (Triangle, 4000 features)      79.6%
Table 1: Test recognition accuracy on CIFAR-10 (stride = 1, receptive field = 6 pixels, with whitening, large number of features)

24 Classification results
  Algorithm                              Accuracy (error)
  Conv. Neural Network                   93.4% (6.6%)
  Deep Boltzmann Machine                 92.8% (7.2%)
  Deep Belief Network                    95.0% (5.0%)
  Best result of [6]                     94.4% (5.6%)
  Deep neural network                    97.13% (2.87%)
  Sparse auto-encoder                    96.9% (3.1%)
  Sparse RBM                             96.2% (3.8%)
  K-means (Hard)                         96.9% (3.1%)
  K-means (Triangle, 1600 features)      97.0% (3.0%)
  K-means (Triangle, 4000 features)      97.21% (2.79%)
Table 2: Test recognition accuracy (and error) on NORB (normalized-uniform) (stride = 1, receptive field = 6 pixels, with whitening, large number of features)

25 Classification results
  Algorithm                              Accuracy
  Raw pixels                             31.8% (±0.62%)
  K-means (Triangle, 1600 features)      51.5% (±1.73%)
Table 3: Test recognition accuracy on STL-10
The proposed method is strongest when large labeled training sets are available.

26 Conclusion
- The best performance was obtained with k-means clustering.
- It is easy to implement and fast.
- It has no hyperparameters to tune.
- A single-layer network can achieve very good results.
- Use more features and dense (small-stride) feature extraction.

27 References
[1] Coates, Adam, Andrew Y. Ng, and Honglak Lee. "An analysis of single-layer networks in unsupervised feature learning." International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[2] http://ace.cs.ohio.edu/~razvan/courses/dl6900/index.html
[3] A. Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Master's thesis, Dept. of Computer Science, University of Toronto, 2009.
[4] LeCun, Yann, Fu Jie Huang, and Leon Bottou. "Learning methods for generic object recognition with invariance to pose and lighting." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[5] http://cs.stanford.edu/~acoates/stl10
[6] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?" IEEE International Conference on Computer Vision (ICCV), 2009.
[7] Goh, Hanlin, Nicolas Thome, and Matthieu Cord. "Biasing restricted Boltzmann machines to manipulate latent selectivity and sparsity." NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.

28 THANK YOU!

