Group Sparse Coding. Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow. Google, Mountain View, CA (NIPS 2009). Presented by Miao Liu, July 23, 2010.


*Figures and formulae are directly copied from the original paper.

Outline
– Introduction
– Group Coding
– Dictionary Learning
– Results and Discussion

Introduction
Bag-of-words document representations
– Encode a document as a vector of descriptor (word) counts
– Widely used in text, image, and video processing
For text documents, a suitable word dictionary is easy to determine.
For images and videos
– There is no simple mapping from the raw document to descriptor counts
– Visual descriptors (color, texture, angles, and shapes) must be extracted
– Descriptors must be measured at appropriate locations (regular grids, special interest points, multiple scales)
– More careful design of the dictionary is needed

Dictionary Construction
Unsupervised vector quantization (VQ), often k-means clustering
– Pro: maximally sparse per descriptor occurrence
– Cons:
  – Does not guarantee a sparse coding of the whole image
  – Not robust with respect to descriptor variability
Regularized optimization
– Encodes each visual descriptor as a weighted sum of dictionary elements
Mixed-norm regularizers
– Take into account the structure of bags of visual descriptors in images
– Presenting sets of images from a given category
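To make the VQ bullet concrete, here is a minimal sketch of nearest-codeword encoding followed by a bag-of-words histogram. The function name `vq_encode` and the toy codebook are illustrative assumptions, not taken from the slides; in practice the codebook would come from k-means on training descriptors.

```python
import numpy as np

def vq_encode(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and count the assignments.

    descriptors: (n, k) array, one local descriptor per row.
    codebook:    (m, k) array, one codeword per row (e.g. k-means centers).
    """
    # Squared Euclidean distances, shape (n, m).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Each descriptor activates exactly one codeword: maximally sparse per descriptor.
    nearest = np.argmin(dists, axis=1)
    # Bag-of-words histogram over the whole codebook.
    counts = np.bincount(nearest, minlength=codebook.shape[0])
    return nearest, counts

# Toy data (hypothetical values chosen for illustration only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]])
nearest, counts = vq_encode(descriptors, codebook)
print(nearest, counts)
```

Each descriptor uses a single codeword, but nothing limits how many distinct codewords an image uses in total, which is the first drawback listed on the slide.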

Problem Statement
The main goal: encode groups of instances (e.g. image patches) in terms of dictionary code words (some kind of average patches)
Notations [symbols missing in transcript]
– The m'th group
– The subscript m is dropped for single-group operations
Sub-goals
– Encoding
– Learning a good dictionary from a set of training groups

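Group Coding
Given a group and a dictionary, group coding is achieved by solving a regularized reconstruction problem [formula missing in transcript], where
– [definitions missing in transcript]
– a regularization weight balances fidelity and reconstruction complexity.
Coordinate descent is applied to solve the above problem.
Finally, the group's code is compressed into a single vector by taking the p-norm of each dictionary word's weight vector.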
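The formula on this slide did not survive in the transcript. As a reading aid, a mixed-norm objective consistent with the surrounding text can be sketched as follows. The symbols are assumed notation: $G$ is a group of instances $x_i$, $D$ is the dictionary with words $d_j$, $\alpha^i_j$ is the weight of word $j$ for instance $i$, and $\lambda$ is the regularization weight. Check these against the original paper before relying on them.

```latex
\min_{A}\;
\frac{1}{2}\sum_{i \in G}\Bigl\| x_i - \sum_{j=1}^{|D|} \alpha^i_j\, d_j \Bigr\|^2
\;+\; \lambda \sum_{j=1}^{|D|} \|\alpha_j\|_p ,
\qquad
\alpha_j = \bigl(\alpha^1_j, \ldots, \alpha^{|G|}_j\bigr)

% Compression of the group code into one vector z (one entry per dictionary word):
z_j = \|\alpha_j\|_p , \qquad j = 1, \ldots, |D|
```

The first term measures fidelity and the second penalizes each dictionary word's whole weight vector at once, so a word tends to be either used by the group or dropped for every instance in it.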

Group Coding (coordinate-descent updates; formulas missing in transcript)
– Define
– Optimum for p=1
– Optimum for p=2
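The closed-form updates named on this slide are missing from the transcript. The sketch below derives them from the assumed objective 0.5 * sum_i ||x_i - sum_j a[i, j] d_j||^2 + lam * sum_j ||a[:, j]||_p, updating one dictionary word at a time. The names `group_code`, `pool_code`, `mu`, and `lam` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def group_code(X, D, lam, p=2, n_sweeps=50):
    """Coordinate descent for one group (a sketch under the stated assumptions).

    X: (n, k) instances of the group; D: (m, k) dictionary, one word per row.
    Returns A: (n, m), where A[i, j] is the weight of word j for instance i.
    """
    n, m = X.shape[0], D.shape[0]
    A = np.zeros((n, m))
    sq_norms = np.sum(D ** 2, axis=1)
    for _ in range(n_sweeps):
        for r in range(m):
            if sq_norms[r] == 0.0:
                continue  # an all-zero word cannot reconstruct anything
            # Residual of every instance with word r's contribution removed.
            resid = X - A @ D + np.outer(A[:, r], D[r])
            mu = resid @ D[r]  # correlation of each residual with word r
            if p == 1:
                # Separable case: soft-threshold each instance independently.
                A[:, r] = np.sign(mu) * np.maximum(np.abs(mu) - lam, 0.0) / sq_norms[r]
            elif p == 2:
                # Group case: shrink the whole vector, or zero it out entirely.
                norm_mu = np.linalg.norm(mu)
                scale = max(1.0 - lam / norm_mu, 0.0) if norm_mu > 0.0 else 0.0
                A[:, r] = scale * mu / sq_norms[r]
            else:
                raise ValueError("this sketch only covers p=1 and p=2")
    return A

def pool_code(A, p=2):
    """Compress a group's code into one vector: the p-norm of each word's weights."""
    return np.linalg.norm(A, ord=p, axis=0)

# Toy group with an orthonormal dictionary (hypothetical values).
X = np.array([[3.0, 0.0], [4.0, 0.0]])
D = np.eye(2)
A2 = group_code(X, D, lam=1.0, p=2)
A1 = group_code(X, D, lam=1.0, p=1)
z2 = pool_code(A2, p=2)
print(A2, z2)
```

With p=2 the second word is switched off for the whole group at once, while p=1 thresholds every weight independently; that difference is what makes the mixed norm a group-level regularizer.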

Dictionary Learning
A good dictionary should balance
– Reconstruction error
– Reconstruction complexity
– Overall complexity relative to the given training set
Seek a learning method that facilitates both
– induction of new dictionary words
– removal of dictionary words that have low predictive power
Applying / Let / Objective: [formulas missing in transcript]
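The objective on this slide is missing from the transcript. One form consistent with the three bullets (reconstruction error, reconstruction complexity, overall dictionary complexity) is sketched below. The symbols are assumptions: $G_m$ is the $m$'th training group, $\alpha^{m}_j$ is the weight vector of word $j$ within group $m$, and $\gamma$ is a second regularization weight that penalizes the dictionary words themselves.

```latex
Q(A, D) \;=\;
\sum_{m}\Bigl[
  \frac{1}{2}\sum_{i \in G_m}\Bigl\| x_{m,i} - \sum_{j=1}^{|D|} \alpha^{m,i}_j\, d_j \Bigr\|^2
  + \lambda \sum_{j=1}^{|D|} \|\alpha^{m}_j\|_p
\Bigr]
\;+\; \gamma \sum_{k=1}^{|D|} \|d_k\|_p
```

Under this reading, the last term is what allows whole dictionary words to be driven to zero and thus removed during learning.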

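Dictionary Learning
In this paper p=2.
– Define auxiliary variables [missing in transcript]
– Define a vector (appearing in the gradient of the objective function) [missing in transcript]
Similar to the argument in group coding, one can obtain the dictionary update [formula missing in transcript].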
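The update itself is not preserved in the transcript. Assuming the objective is a sum of squared reconstruction errors plus gamma * ||d_r||_2 for each word, minimizing over a single word d_r with all codes fixed gives a shrinkage rule analogous to the p=2 group-coding update. The sketch below implements that derivation; `update_word`, `v`, and `s` are illustrative names, not the paper's auxiliary variables.

```python
import numpy as np

def update_word(groups, codes, D, r, gamma):
    """Closed-form update of dictionary word r for p=2 (a derived sketch).

    groups: list of (n_m, k) arrays of instances, one array per training group.
    codes:  list of (n_m, |D|) arrays of reconstruction weights, same order.
    D:      (|D|, k) float array, one dictionary word per row.
    """
    v = np.zeros(D.shape[1])  # weighted sum of residuals (linear gradient term)
    s = 0.0                   # total squared weight placed on word r
    for X, A in zip(groups, codes):
        # Residual with word r's own contribution removed.
        resid = X - A @ D + np.outer(A[:, r], D[r])
        v += A[:, r] @ resid
        s += float(np.sum(A[:, r] ** 2))
    norm_v = np.linalg.norm(v)
    if s == 0.0 or norm_v <= gamma:
        # The word does not pay for its penalty: it is set to zero (removed).
        return np.zeros_like(v)
    return (1.0 - gamma / norm_v) * v / s

# Toy example with a single group and a single word (hypothetical values).
groups = [np.array([[2.0, 0.0], [4.0, 0.0]])]
codes = [np.array([[1.0], [2.0]])]
D = np.array([[1.0, 0.0]])
kept = update_word(groups, codes, D, r=0, gamma=2.0)
dropped = update_word(groups, codes, D, r=0, gamma=20.0)
print(kept, dropped)
```

A small gamma only shrinks the word, while a large gamma removes it, which matches the slide's stated aim of discarding dictionary words with low predictive power.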

Experimental Setting
Compare with previous sparse coding methods by measuring the impact on classification on the PASCAL VOC (Visual Object Classes) 2007 dataset
– Images from 20 classes, including people, animals, vehicles, and indoor objects
– Around 2,500 images each for training and validation; 5,000 images for testing
Extract local descriptors based on Gabor wavelet responses at
– Four orientations [values missing in transcript]
– Spatial scales and offsets (27 combinations)
The 27 (scale, offset) pairs were chosen by optimizing a previous image recognition task, unrelated to this paper.

Results and Discussion