Learning Visual Bits with Direct Feature Selection
Joel Jurik (University of Central Florida) and Rahul Sukthankar (Intel Research Pittsburgh; Robotics Institute, Carnegie Mellon)

Abstract

Popular object category recognition systems represent images by a bag of visual words. However, these systems can be improved in two ways using the framework of visual bits [1]. First, instead of representing each image feature by a single visual word, each feature is represented by a sequence of visual bits. Second, instead of separating the processes of codebook generation and classifier training, we unify them into a single framework. We propose a new way to learn visual bits using direct feature selection, which avoids the complicated optimization framework of [1]. Our results confirm that visual bits outperform the bag-of-words model on object category recognition.

Explanation

1. Represent each image by its SIFT descriptors [2]. [Figure 1: examples of SIFT features]
2. Once the SIFT descriptors for all images have been obtained, randomly generate weight vectors from a uniform distribution over [-1000, +1000], each with 128 dimensions (the same as SIFT).
3. Compute the linear projections of the descriptors onto these weight vectors, and find the best projection and threshold using the GentleBoost algorithm [3]. [Figure 2: the GentleBoost algorithm]
4. Using GentleBoost, construct a sequence of visual bits that represents an image at a much higher level than raw SIFT.

References

[1] L. Yang, R. Jin, R. Sukthankar, F. Jurie: Unifying Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition, CVPR 2008.
[2] D. G. Lowe: Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004.
[3] J. Friedman, T. Hastie, R. Tibshirani: Additive Logistic Regression: A Statistical View of Boosting, Annals of Statistics, Vol. 28, 2000.
[4] P. Viola, M. Jones: Robust Real-time Object Recognition, IJCV 2001.
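The weight generation and projection-plus-threshold selection described above can be sketched in pure Python. This is an illustrative simplification, not the authors' implementation: it scores candidate stumps by weighted misclassification error, whereas GentleBoost proper fits regression stumps by weighted least squares; all function names here are ours.

```python
import random

def make_random_weights(n_weights, dim=128, lo=-1000.0, hi=1000.0, seed=0):
    """Random projection directions, uniform over [lo, hi] per dimension.

    dim=128 matches the SIFT descriptor length used in the poster.
    """
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_weights)]

def project(descriptor, weights):
    """Linear projection of one descriptor onto one weight vector."""
    return sum(d * w for d, w in zip(descriptor, weights))

def best_stump(descriptors, labels, sample_weights, weight_vectors):
    """Select the (projection, threshold, polarity) triple that minimizes
    weighted misclassification error over labels in {+1, -1} -- a stand-in
    for the weak-learner selection performed inside GentleBoost."""
    best = None
    for k, wv in enumerate(weight_vectors):
        proj = [project(x, wv) for x in descriptors]
        for thr in sorted(set(proj)):
            for polarity in (+1, -1):
                # Predict +1 when polarity * (projection - thr) >= 0.
                err = sum(sw for p, y, sw in zip(proj, labels, sample_weights)
                          if (1 if polarity * (p - thr) >= 0 else -1) != y)
                if best is None or err < best[0]:
                    best = (err, k, thr, polarity)
    return best  # (weighted error, weight-vector index, threshold, polarity)
```

In a boosting loop, the selected stump's outputs over an image's descriptors would become one visual bit, the sample weights would be updated, and the search repeated for each round.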
We then take the aggregates of the visual bits to train an SVM, which creates the model used for testing. [Figure 3: training using visual bits]

Training Parameters

For training the visual bits system, we use the following parameters:
- Weight distribution: uniform over [-1000, +1000]
- Number of weights: 10,000
- Rounds of boosting: 200

We compare our performance against a baseline object category recognition system that uses k-means + SVM with the following parameter:
- Number of cluster centers: 1,000

Each system uses the same 200 training images, of which 100 are positive and 100 are negative. The positive images are of airplanes; the negative images are of rhinos, cars, elephants, faces, or minarets. SIFT is used for image representation, and an SVM is used during training.

Testing Parameters

For testing, we use a separate set of 100 images disjoint from the training set, of which 50 are positive and 50 are negative. The task is to distinguish airplane images from non-airplane images. [Figure 4: examples of images used]

Results

System       Accuracy
Visual Bits  89%
K-means      86%

Table 1: Results.

The visual bits system outperforms the k-means baseline on the task of distinguishing airplane from non-airplane images, showing improvement over standard clustering approaches. Object category recognition over multiple categories using visual bits is currently being implemented.
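The aggregation step above can be sketched as follows. The poster does not specify the exact aggregate, so this is one plausible choice, assumed for illustration: the fraction of an image's descriptors for which each visual bit fires, yielding a fixed-length vector per image as SVM input. All names are ours.

```python
def descriptor_bits(descriptor, stumps):
    """Evaluate each learned (weights, threshold, polarity) stump on one
    descriptor, yielding its sequence of visual bits (0 or 1)."""
    bits = []
    for weights, threshold, polarity in stumps:
        proj = sum(d * w for d, w in zip(descriptor, weights))
        bits.append(1 if polarity * (proj - threshold) >= 0 else 0)
    return bits

def image_feature(descriptors, stumps):
    """Aggregate visual bits over all descriptors of an image: the fraction
    of descriptors for which each bit fires. This fixed-length vector is
    what the SVM would be trained on."""
    n = len(descriptors)
    totals = [0] * len(stumps)
    for d in descriptors:
        for i, b in enumerate(descriptor_bits(d, stumps)):
            totals[i] += b
    return [t / n for t in totals]
```

Because every image is reduced to one vector whose length equals the number of boosting rounds, images with different numbers of SIFT keypoints become directly comparable, just as bag-of-words histograms make them comparable under k-means.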