An Analysis of Single-Layer Networks in Unsupervised Feature Learning
Adam Coates, Honglak Lee and Andrew Y. Ng, AISTATS 2011

The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
Adam Coates and Andrew Y. Ng, ICML 2011

Presented by: Mingyuan Zhou, Duke University, ECE, June 17, 2011

An Analysis of Single-Layer Networks in Unsupervised Feature Learning
Adam Coates, Honglak Lee and Andrew Y. Ng, AISTATS 2011

Outline
- Introduction
- Unsupervised feature learning
- Parameter setting
- Experiments on CIFAR, NORB and STL
- Conclusions

Training/testing pipeline

Feature learning:
- Extract random patches from unlabeled training images
- Apply a pre-processing stage to the patches
- Learn a feature-mapping using an unsupervised learning algorithm

Feature extraction and classification:
- Extract features from equally spaced sub-patches covering the input image
- Pool features together over regions of the input image to reduce the number of feature values
- Train a linear classifier to predict the labels given the feature vectors
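To make the first step concrete, here is a minimal NumPy sketch (not the authors' code) of sampling random w x w patches from a stack of unlabeled images; the CIFAR-like 32x32x3 random data is a stand-in for illustration.

```python
# A minimal sketch (not the authors' code) of the first feature-learning
# step: sampling random w x w patches from a stack of unlabeled images.
# The CIFAR-like 32x32x3 random data is a stand-in for illustration.
import numpy as np

def extract_random_patches(images, num_patches=10000, w=6, rng=None):
    """images: (N, H, W, C) array; returns (num_patches, w*w*C) patch matrix."""
    rng = np.random.default_rng(rng)
    N, H, W, C = images.shape
    patches = np.empty((num_patches, w * w * C))
    for i in range(num_patches):
        n = rng.integers(N)              # random image
        r = rng.integers(H - w + 1)      # random top-left corner
        c = rng.integers(W - w + 1)
        patches[i] = images[n, r:r + w, c:c + w, :].reshape(-1)
    return patches

images = np.random.randint(0, 256, size=(100, 32, 32, 3)).astype(float)
X = extract_random_patches(images, num_patches=2000, w=6)
print(X.shape)  # (2000, 108)
```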

Feature learning

Pre-processing of patches:
- Mean subtraction and scale normalization
- Whitening

Unsupervised learning:
- Sparse auto-encoder
- Sparse restricted Boltzmann machine
- K-means clustering
  - Hard-assignment: f_k(x) = 1 if k = argmin_j ||c^(j) - x||_2^2, and 0 otherwise
  - Soft-assignment: f_k(x) = max{0, mu(z) - z_k}, where z_k = ||x - c^(k)||_2 and mu(z) is the mean of the elements of z
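Below is a hedged sketch of this pre-processing and of the K-means feature mapping, assuming per-patch contrast normalization followed by ZCA whitening; the epsilon constants and the random stand-in patches are illustrative choices, not values taken from the slides.

```python
# A hedged sketch (not the authors' code) of the pre-processing and the
# K-means feature mapping on this slide: per-patch contrast normalization,
# ZCA whitening, then hard or soft ("triangle") assignment to centroids.
import numpy as np

def normalize_patches(X, eps=10.0):
    """Per-patch mean subtraction and scale normalization.
    eps=10 is the kind of constant used for 8-bit pixel intensities."""
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.sqrt(X.var(axis=1, keepdims=True) + eps)

def zca_whiten(X, eps=0.1):
    """ZCA whitening fitted on the patch matrix X (one patch per row)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / Xc.shape[0]
    d, V = np.linalg.eigh(cov)
    W = V @ np.diag(1.0 / np.sqrt(d + eps)) @ V.T
    return Xc @ W, (mu, W)

def kmeans_features(X, centroids, soft=True):
    """Hard: 1-of-K indicator of the nearest centroid.
    Soft: f_k(x) = max(0, mean(z) - z_k), with z_k = ||x - c_k||_2."""
    z = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    if soft:
        return np.maximum(0.0, z.mean(axis=1, keepdims=True) - z)
    F = np.zeros_like(z)
    F[np.arange(len(X)), z.argmin(axis=1)] = 1.0
    return F

# Stand-in "patches" (in practice: the random patches sampled earlier);
# centroids would come from running K-means on the whitened patches.
X = np.random.randint(0, 256, size=(2000, 108)).astype(float)
Xw, (mu, W_zca) = zca_whiten(normalize_patches(X))
centroids = Xw[np.random.choice(len(Xw), 50, replace=False)]
print(kmeans_features(Xw, centroids, soft=True).shape)  # (2000, 50)
```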

Feature learning

Unsupervised learning:
- Sparse auto-encoder
- Sparse restricted Boltzmann machine
- K-means clustering
- Gaussian mixture model (GMM)

Feature extraction and classification
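The extraction and classification stage can be sketched as follows: slide a w x w window over the image with stride s, push each patch through the learned feature mapping, sum-pool the codes over the four image quadrants, and train a linear classifier on the pooled vectors. The function below is an illustrative sketch; the encoder argument and the dummy random-projection encoder are stand-ins, not part of the papers.

```python
# Illustrative sketch of feature extraction and pooling: slide a w x w
# window with stride s, encode each patch, sum-pool the codes over the
# four image quadrants, and feed the pooled vector to a linear classifier.
import numpy as np

def image_features(image, encoder, w=6, s=1):
    """image: (H, W, C); encoder maps (n, w*w*C) patches to (n, K) codes.
    Returns a pooled feature vector of length 4*K."""
    H, W, C = image.shape
    rows = range(0, H - w + 1, s)
    cols = range(0, W - w + 1, s)
    patches = np.array([image[r:r + w, c:c + w, :].reshape(-1)
                        for r in rows for c in cols])
    codes = encoder(patches).reshape(len(rows), len(cols), -1)
    hr, hc = len(rows) // 2, len(cols) // 2
    quads = [codes[:hr, :hc], codes[:hr, hc:], codes[hr:, :hc], codes[hr:, hc:]]
    return np.concatenate([q.sum(axis=(0, 1)) for q in quads])

# With the earlier sketches, a real encoder could look like:
#   encoder = lambda P: kmeans_features((normalize_patches(P) - mu) @ W_zca, centroids)
# and the pooled features would then be fed to a linear SVM.
rng = np.random.default_rng(0)
proj = rng.standard_normal((6 * 6 * 3, 50))
dummy_encoder = lambda P: np.maximum(0.0, P @ proj)   # stand-in encoder
feats = image_features(rng.random((32, 32, 3)), dummy_encoder, w=6, s=2)
print(feats.shape)  # (200,)
```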

Experiments and analysis

Model parameters:
- Whitening?
- Number of features K
- Stride s (all overlapping patches are used when s = 1)
- Receptive field (patch) size w

Experiments and analysis (result figures from these slides are not reproduced in the transcript)

Conclusions

Mean subtraction, scale normalization and whitening
+ large K
+ small stride s
+ the right patch size w
+ a simple feature learning algorithm (soft K-means)
= state-of-the-art results on CIFAR-10 and NORB

The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
Adam Coates and Andrew Y. Ng, ICML 2011

Outline
- Motivations and contributions
- Review of dictionary learning algorithms
- Review of sparse coding algorithms
- Experiments on CIFAR, NORB and Caltech101
- Conclusions

Main contributions

Dictionary learning algorithms
- Sparse coding (SC)
- Orthogonal matching pursuit (OMP-k)
- Sparse RBMs and sparse auto-encoders (RBM, SAE)
- Randomly sampled patches (RP)
- Random weights (R)
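As a rough illustration of the two cheapest dictionary choices in this list, the sketch below builds a dictionary from randomly sampled patches (RP) and from random Gaussian weights (R); the learned alternatives (SC, OMP-k, RBM, SAE) would instead fit D to the data by optimization. Function names and defaults are assumptions for illustration.

```python
# Illustrative sketch (assumed names, not from the papers) of the two
# cheapest dictionary choices: randomly sampled patches (RP) and random
# Gaussian weights (R), both normalized to unit length.
import numpy as np

def dict_random_patches(X, K, rng=None):
    """RP: K patches drawn at random from X, normalized to unit length."""
    rng = np.random.default_rng(rng)
    D = X[rng.choice(len(X), K, replace=False)].astype(float)
    return D / np.linalg.norm(D, axis=1, keepdims=True)

def dict_random_weights(d, K, rng=None):
    """R: K unit-norm Gaussian random directions in patch space."""
    rng = np.random.default_rng(rng)
    D = rng.standard_normal((K, d))
    return D / np.linalg.norm(D, axis=1, keepdims=True)

X = np.random.randn(2000, 108)          # stand-in for whitened patches
print(dict_random_patches(X, 400).shape, dict_random_weights(108, 400).shape)
```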

Sparse coding (encoding) algorithms
- Sparse coding (SC)
- OMP-k
- Soft threshold (T)
- "Natural" encoding
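Two of these encoders are simple enough to sketch directly: the fixed soft-threshold encoder f_j(x) = max{0, d_j^T x - alpha} and a 1-sparse OMP-style code that keeps only the coefficient of the best-matching atom. The threshold value and the unit-norm dictionary below are illustrative assumptions (the paper tunes alpha by cross-validation); full sparse coding (SC) would instead solve an L1-penalized least-squares problem for each input.

```python
# Sketch of two simple encoders, assuming a unit-norm dictionary D (K x d).
# alpha below is a placeholder value, not one taken from the paper.
import numpy as np

def soft_threshold_encode(X, D, alpha=0.25):
    """Soft threshold (T): X is (n, d), D is (K, d); returns (n, K) codes."""
    return np.maximum(0.0, X @ D.T - alpha)

def omp1_encode(X, D):
    """Keep, per input, only the projection onto the most correlated atom."""
    proj = X @ D.T                          # (n, K) correlations
    S = np.zeros_like(proj)
    best = np.abs(proj).argmax(axis=1)
    S[np.arange(len(X)), best] = proj[np.arange(len(X)), best]
    return S

X = np.random.randn(100, 108)
D = np.random.randn(50, 108)
D /= np.linalg.norm(D, axis=1, keepdims=True)
print(soft_threshold_encode(X, D).shape, omp1_encode(X, D).shape)  # (100, 50) each
```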

Experimental results (result tables and figures from these slides are not reproduced in the transcript)

Comments on dictionary learning

- The results show that the main advantage of sparse coding is as an encoder; the choice of basis functions has little effect on performance.
- The main value of the dictionary is to provide a highly overcomplete basis on which to project the data before applying an encoder; the exact structure of these basis functions is less critical than the choice of encoding.
- All that appears necessary is to choose a basis that roughly tiles the space of the input data. This increases the chances that a few basis vectors will be near any given input, yielding a large activation that is useful for identifying the location of the input on the data manifold later.
- This explains why vector quantization is quite capable of competing with more complex algorithms: it simply ensures that there is at least one dictionary entry near any densely populated area of the input space.
- We expect learning to be more crucial for small dictionaries, since we would then need to be more careful to pick basis functions that span the space of inputs equitably.

Conclusions

- The main power of sparse coding is not that it learns better basis functions. In fact, we discovered that any reasonable tiling of the input space (including randomly chosen input patches) is sufficient to obtain high performance on any of the three very different recognition problems that we tested.
- Instead, the main strength of sparse coding appears to arise from its non-linear encoding scheme, which was almost universally effective in our experiments, even with no training at all. Indeed, it was difficult to beat this encoding on the Caltech 101 dataset.
- In many cases, however, it was possible to do nearly as well using only a soft threshold function, provided we have sufficient labeled data.
- Overall, we conclude that most of the performance obtained in our results is a function of the choice of architecture and encoding, suggesting that these are key areas for further study and improvement.