Image classification by sparse coding.

Feature learning problem Given a 14x14 image patch x, we can represent it using 196 real numbers (its pixel values). Problem: Can we learn a better representation for it?

Unsupervised feature learning Given a set of images, learn a better way to represent them than raw pixels.

First stage of visual processing in the brain: V1 The first stage of visual processing in the brain (V1) does “edge detection.” The resulting filters resemble “Gabor functions,” which are also used in image compression and denoising. (Figure: schematic and actual simple-cell receptive fields; green regions respond to a white dot, red regions to a black dot.) http://www.ldeo.columbia.edu/4d4/wavelets/dm.html [Images from DeAngelis, Ohzawa & Freeman, 1995]

Learning an image representation Sparse coding (Olshausen & Field, 1996) Input: Images x(1), x(2), …, x(m) (each in Rn x n). Learn: Dictionary of bases f1, f2, …, fk (also in Rn x n), so that each input x can be approximately decomposed as x ≈ Σj aj fj, s.t. the aj’s are mostly zero (“sparse”). Use this to represent a 14x14 image patch succinctly, e.g. as [a7=0.8, a36=0.3, a41=0.5], indicating which “basic edges” make up the image. [NIPS 2006, 2007]
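The decomposition above can be sketched numerically. This is a minimal illustration, not learned bases: the dictionary entries and the three nonzero coefficients are random stand-ins chosen to mirror the slide's example.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 14 * 14          # patch dimension (a 14x14 patch, flattened)
k = 64               # number of dictionary bases
F = rng.standard_normal((n, k))   # columns play the role of f_1, ..., f_k

# A sparse code: only a few coefficients are nonzero, as in the example
# [a36=0.8, a42=0.3, a63=0.5] (1-indexed on the slide; 0-indexed here).
a = np.zeros(k)
a[35], a[41], a[62] = 0.8, 0.3, 0.5

x_hat = F @ a        # reconstruction of the patch from 3 "edge" bases
print(np.count_nonzero(a))  # → 3
```

The point of the representation is exactly this asymmetry: x lives in R196, while a has only a handful of nonzero entries.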

Sparse coding illustration Natural images → learned bases (f1, …, f64): “edges.” Test example: x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63, i.e. the feature representation [a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]. Compact & easily interpretable.

More examples One patch: x ≈ 0.6 * f15 + 0.8 * f28 + 0.4 * f37, represented as [0, 0, …, 0, 0.6, 0, …, 0, 0.8, 0, …, 0, 0.4, …]. Another: x ≈ 1.3 * f5 + 0.9 * f18 + 0.3 * f29, represented as [0, 0, …, 0, 1.3, 0, …, 0, 0.9, 0, …, 0, 0.3, …]. The method hypothesizes that edge-like patches are the most “basic” elements of a scene, and represents an image in terms of the edges that appear in it. Use this to obtain a more compact, higher-level representation of the scene than pixels.

Digression: Sparse coding applied to audio Efficient kernels: here is a collection of revcor filters, recorded from cat auditory nerve fibers. For each optimized kernel function, we pick the revcor filter it best matches and overlay them. [Evan Smith & Mike Lewicki, 2006]

Digression: Sparse coding applied to audio Nearly all of the optimized kernel functions closely match the detailed structure of an individual ANF revcor filter. [Evan Smith & Mike Lewicki, 2006]

Sparse coding details Input: Images x(1), x(2), …, x(m) (each in Rn x n). Minimize over the fj’s and a(i)’s: Σi ||x(i) − Σj aj(i) fj||² + λ Σi,j |aj(i)|, where the second term is an L1 sparsity penalty (causes most aj(i)’s to be 0). Alternating minimization: alternately minimize with respect to the fj’s (easy) and the a’s (harder).
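The alternating scheme can be sketched as follows. This is a hedged toy implementation, not the paper's algorithm: the a-step uses ISTA (iterative soft-thresholding), a simple stand-in for the faster feature-sign search described next, and the f-step is a plain least-squares solve with column renormalization; data and dimensions are arbitrary.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(X, F, lam, n_iter=100):
    """Solve for codes A with F fixed, via ISTA. X: (n, m), F: (n, k)."""
    L = np.linalg.norm(F, 2) ** 2          # spectral norm^2: step-size bound
    A = np.zeros((F.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = F.T @ (F @ A - X)           # half-gradient of ||X - FA||^2
        A = soft_threshold(A - grad / L, lam / (2 * L))
    return A

def update_bases(X, A):
    """Solve for F with codes A fixed (least squares), then normalize columns."""
    F = np.linalg.lstsq(A.T, X.T, rcond=None)[0].T
    norms = np.maximum(np.linalg.norm(F, axis=0), 1e-12)
    return F / norms

rng = np.random.default_rng(0)
X = rng.standard_normal((25, 200))         # 200 toy 5x5 "patches"
F = rng.standard_normal((25, 10))          # 10 bases (arbitrary choice)
F /= np.linalg.norm(F, axis=0)

for _ in range(10):                        # alternate the two sub-problems
    A = sparse_code(X, F, lam=0.5)
    F = update_bases(X, A)

A = sparse_code(X, F, lam=0.5)             # final codes for the final F
err = np.linalg.norm(X - F @ A) / np.linalg.norm(X)
```

On real image patches one would use many more bases and a stronger solver; the point here is only the structure of the two alternating steps.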

Solving for bases Early versions of sparse coding were used to learn about this many bases: 32 learned bases. How to scale this algorithm up?

Sparse coding details (recap) Input: Images x(1), x(2), …, x(m) (each in Rn x n). Minimize Σi ||x(i) − Σj aj(i) fj||² + λ Σi,j |aj(i)| (the second term is the L1 sparsity penalty). Alternating minimization: alternately minimize with respect to the fj’s (easy) and the a’s (harder).

Feature-sign search (solve for the ai’s) Goal: minimize the objective with respect to the ai’s. Simplified example: suppose I tell you the sign (+, −, or 0) of each ai. The problem then simplifies to a quadratic function of the ai’s, which can be solved efficiently in closed form. Algorithm: repeatedly guess the sign (+, −, or 0) of each ai; solve for the ai’s in closed form; refine the guess for the signs.
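The key closed-form step can be sketched directly. Once a sign pattern θ for the active coefficients is guessed, the L1 term becomes linear, the objective ||x − Fa||² + λ θᵀa is quadratic in a, and its minimizer is a = (FᵀF)⁻¹(Fᵀx − (λ/2)θ). The dictionary, input, λ, and sign guess below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 3                      # only the active bases enter the solve
F = rng.standard_normal((n, k))   # columns: the currently active bases
x = rng.standard_normal(n)
lam = 0.1
theta = np.array([1.0, -1.0, 1.0])   # guessed signs of the active a_j's

# Closed-form minimizer of ||x - F a||^2 + lam * theta^T a
a = np.linalg.solve(F.T @ F, F.T @ x - 0.5 * lam * theta)

# Stationarity check: the gradient of the sign-fixed objective vanishes
grad = -2 * F.T @ (x - F @ a) + lam * theta
```

Feature-sign search would now check whether sign(a) matches θ, refine the guess where it disagrees, and repeat until the signs are consistent.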

The feature-sign search algorithm: Visualization Current guess: Starting from zero (default)

The feature-sign search algorithm: Visualization Current guess: 1: Activate a2 with “+” sign Active set ={a2} Starting from zero (default)

The feature-sign search algorithm: Visualization Starting from zero (default). 1: Activate a2 with “+” sign; active set = {a2}. 2: Update a2 (closed form). Current guess updated.

The feature-sign search algorithm: Visualization 3: Activate a1 with “+” sign Active set ={a1,a2} Current guess: Starting from zero (default)

The feature-sign search algorithm: Visualization 3: Activate a1 with “+” sign; active set = {a1, a2}. 4: Update a1 & a2 (closed form). Current guess updated.

Before feature-sign search: 32 learned bases

With feature-sign search

Recap of sparse coding for feature learning Training time Input: Images x(1), x(2), …, x(m) (each in Rn x n). Learn: Dictionary of bases f1, f2, …, fk (also in Rn x n). Test time Input: Novel image x (in Rn x n) and the previously learned fi’s. Output: Representation [a1, a2, …, ak] of image x. E.g., x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63, represented as [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …].

Sparse coding recap x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63 → [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]. Much better than the pixel representation, but still not competitive with SIFT, etc. Three ways to make it competitive: Combine it with SIFT. Advanced versions of sparse coding (e.g., LCC). Deep learning.

Combining sparse coding with SIFT Input: SIFT descriptors x(1), x(2), …, x(m) (each in R128). Learn: Dictionary of bases f1, f2, …, fk (also in R128). Test time: given a novel SIFT descriptor x (in R128), represent it as its sparse code [a1, …, ak].

Putting it together Suppose you’ve already learned bases f1, f2, …, fk. Here’s how you represent an image: sparse-code on top of its SIFT features (a soft analogue of the histogram / bag-of-words view), mapping each image x(i) to a feature representation a(i); then feed x(1) → a(1), x(2) → a(2), x(3) → a(3), … into a learning algorithm. E.g., 73–75% accuracy on Caltech 101 (Yang et al., 2009; Boureau et al., 2009).
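A pipeline of this shape can be sketched end-to-end. This is a hedged toy version, not the cited systems: the dictionary and the “SIFT-like” R128 descriptors are random stand-ins, the encoder is a single thresholded projection (a cheap proxy for a real sparse-coding solver), and the pooled codes would then go to whatever classifier you like.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def encode(desc, F, lam=0.2):
    """One proximal-gradient step from zero: a crude sparse code of desc."""
    L = np.linalg.norm(F, 2) ** 2
    return soft_threshold((F.T @ desc) / L, lam / (2 * L))

def image_feature(descriptors, F):
    """Max-pool the sparse codes of all descriptors in one image
    into a single fixed-length vector a(i)."""
    codes = np.stack([encode(d, F) for d in descriptors])
    return codes.max(axis=0)

rng = np.random.default_rng(0)
k = 128                                  # dictionary size (arbitrary)
F = rng.standard_normal((128, k))        # bases over R^128 descriptors
F /= np.linalg.norm(F, axis=0)

descriptors = rng.standard_normal((50, 128))   # 50 descriptors, one image
feat = image_feature(descriptors, F)     # the image's pooled representation
```

Whatever the descriptor count per image, `feat` has fixed length k, which is what lets a linear classifier consume it.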

K-means vs. sparse coding (Figure: a data point near Centroid 1, Centroid 2, and Centroid 3, with its k-means representation shown below.)

K-means vs. sparse coding Intuition: sparse coding is a “soft” version of k-means (membership in multiple clusters). K-means: Centroid 1, Centroid 2, Centroid 3; represent each point by its single nearest centroid. Sparse coding: Basis f1, Basis f2, Basis f3; represent each point as a sparse combination of several bases.
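The contrast can be made concrete on a toy point that genuinely sits between two clusters. The centroids/bases here are random stand-ins, and the “sparse code” is a least-squares fit with small coefficients zeroed out, a simplification of a real L1 solver.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((3, 5))          # 3 centroids / bases in R^5
x = 0.7 * C[0] + 0.3 * C[1]              # a point between two clusters

# k-means: one-hot membership in the single nearest centroid
dists = np.linalg.norm(C - x, axis=1)
hard = np.zeros(3)
hard[np.argmin(dists)] = 1.0

# sparse-coding-style code: fit x as a combination of the bases,
# then zero out negligible coefficients
soft = np.linalg.lstsq(C.T, x, rcond=None)[0]
soft[np.abs(soft) < 0.05] = 0.0

print(np.count_nonzero(hard))   # → 1 (hard assignment)
```

The hard code throws away the fact that x is 30% explained by the second centroid; the soft code keeps both contributions, which is exactly the extra information sparse coding preserves.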

K-means vs. sparse coding Rule of thumb: whenever you use k-means to build a dictionary, replacing it with sparse coding will often work better.