Methods for classification and image representation


Object recognition: methods for classification and image representation

Credits:
Paul Viola and Michael Jones, Robust Real-time Object Detection, IJCV 2004
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes
S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Yoav Freund and Robert E. Schapire, A Short Introduction to Boosting

Object recognition: What is it? Where is it? How many are there? (Instance, category, "something with a tail"; localization, segmentation, counting.) Object recognition is a broad, interesting and important area. We will think of it as a number of specific questions of varying difficulty. The most specific question is "Is this my blue cup?": here we aim to recognize a very specific instance of an object. It very quickly becomes inconvenient to reason about every single object separately, so we would rather recognize that an object belongs to some category; here we aim to separate cars from airplanes, people from cats, dogs from tables, and so on. It is hard to believe that we can reasonably enumerate everything in advance, so when we encounter something we have never seen before we still want to reason about it: we may say it has a tail, or that it is made of metal. Once we can tell road from trees, we would like to say where they are in the image. A simple and powerful recipe is to look at small sub-images and ask the "what is it?" question over and over again; when we know what it is, we say "here". After that we may want to remove the object, and for that we need to know which pixels belong to it and which do not. We call this segmentation. The scanning-window recipe is great for many cases, but sometimes things get tricky: to reliably count objects we need to understand the relations between objects and their parts. In an image, we would need to work out which parts belong to which object and whether we missed anything.

Face detection. x → features → classify → F(x) → y: +1 (face) or -1 (not face). We slide a window over the image, extract features for each window, and classify each window into face/non-face. We start with a walk through a face detector.

What is a face? Eyes are dark (eyebrows + shadows). Cheeks and forehead are bright. The nose is bright. Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04

Basic feature extraction. Information type: intensity. Sum over: gray and white rectangles. Output: gray minus white. A separate output value for each type, each scale, and each position in the window: FEX(im) = x = [x1, x2, ..., xn]. Basic features for face detection work on just intensity values: they capture differences in intensity between rectangular regions. [Figure: example features x120, x357, x629, x834 overlaid on a face window.] Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04

Face detection. We slide a window over the image, extract features for each window, and classify each window into face/non-face: x → F(x) → y, +1 face / -1 not face.

Classification. Examples are points in R^n. Positives are separated from negatives by the hyperplane w: y = sign(w^T x - b). [Figure: + and - points on either side of the hyperplane w.] We have converted all windows to n-dimensional vectors, i.e. points in n-dimensional space.

Classification. Notation: x ∈ R^n are the data points; P(x) is the distribution of the data; y(x) is the true value of y for each x; F is the decision function, y = F(x, θ); θ are the parameters of F, e.g. θ = (w, b). We want an F that makes few mistakes. [Figure: the same point set and hyperplane w.] Not all points are equally likely, and P(x) captures the distribution of the data. For each point there is a correct prediction value y(x).

Loss function. Our decision may have severe implications: think of a classifier deciding between "possible cancer" and "absolutely no risk of cancer". L(y(x), F(x, θ)) is the loss function: how much we pay for predicting F(x, θ) when the true value is y(x). Examples: the classification (0-1) error and the hinge loss.
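
The two losses named on the slide, written out in standard form (the deck shows them only in a figure):

```latex
L_{0\text{-}1}\bigl(y(x),F(x,\theta)\bigr)=\mathbf{1}\bigl[y(x)\neq F(x,\theta)\bigr],
\qquad
L_{\mathrm{hinge}}\bigl(y(x),x\bigr)=\max\bigl(0,\;1-y(x)\,(w^{T}x-b)\bigr)
```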

Learning. The total loss shows how good a function (F, θ) is. Learning means finding the parameters that minimize this loss. But how can we see all possible x?
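
A reconstruction of the two formulas the slide leaves blank, assuming the standard expected-risk formulation consistent with the notation above:

```latex
R(F,\theta)=\mathbb{E}_{x\sim P(x)}\bigl[L\bigl(y(x),F(x,\theta)\bigr)\bigr],
\qquad
\theta^{*}=\arg\min_{\theta}\,R(F,\theta)
```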

Datasets. A dataset is a finite sample {x_i} from P(x), with labels {(x_i, y_i)}. Datasets today are big to ensure the sampling is fair.
Dataset       #images   #classes   #instances
Caltech 256    30,608      256          -
Pascal VOC      4,340       20       10,363
LabelMe       176,975      ???      414,687

Overfitting. A simple dataset, two models: one non-linear, one linear. [Figure: a handful of training points fit by both models.]

Overfitting. Let's get more data: the simple model has better generalization. [Figure: the same two models on a much larger sample of + and - points.]

Overfitting. As complexity increases, the model overfits the data: training loss decreases while the real loss increases. We need to penalize model complexity, that is, to regularize. [Plot: real loss and training loss vs. model complexity.]

Overfitting. Split the dataset into a training set, a validation set, and a test set. Use the training set to optimize model parameters, the validation set to choose the best model, and the test set only to measure the expected loss. [Plot: training, validation and test loss vs. model complexity; the stopping point is at the minimum of the validation loss.]
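
A minimal NumPy sketch of that protocol; the split fractions and the candidate-model loop are illustrative, not taken from the slides:

```python
import numpy as np

def split_dataset(X, y, frac=(0.6, 0.2, 0.2), seed=0):
    """Shuffle, then cut into training / validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(frac[0] * len(X))
    n_va = int(frac[1] * len(X))
    train, val, test = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Protocol: fit every candidate model on the training set,
# keep the one with the lowest validation loss,
# and report the test-set loss only once, at the very end.
```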

Classification methods: K nearest neighbors, decision trees, linear SVMs, kernel SVMs, boosted classifiers.

K nearest neighbors. Memorize all training data. Find the K closest points to the query. The neighbors vote for the label, e.g. Vote(+) = 2, Vote(-) = 1. [Figure: a query point o surrounded by + and - training points.]
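
A minimal sketch of the rule; Euclidean distance is an assumption, the slides do not name a metric:

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Memorize all training data; the k closest points vote for the label (+1 / -1)."""
    dists = np.linalg.norm(X_train - query, axis=1)   # distance to every training point
    neighbors = np.argsort(dists)[:k]                 # indices of the k closest points
    votes = y_train[neighbors]                        # their labels, e.g. [+1, +1, -1]
    return 1 if votes.sum() > 0 else -1               # majority vote: Vote(+) vs Vote(-)
```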

K-Nearest Neighbors Nearest Neighbors (silhouettes) Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes

K-Nearest Neighbors Silhouettes from other views 3D Visual hull Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes

Decision tree. [Diagram: a tree whose root tests X1 > 2 and whose second node tests X2 > 1; the leaves hold vote counts such as V(+)=8, V(+)=2, V(-)=8.]

Decision tree training. Partition the data into pure chunks: find a good rule, split the training data, build the left tree, build the right tree. Count the examples in the leaves to get the votes V(+), V(-). Stop when purity is high, when the data size is small, or at a fixed level. [Figure: recursive splits of the training points, with leaf purities such as V(-)=100%, V(+)=80%, V(-)=57%.]

Decision stumps. A stump is a minimal tree: 1 root, 2 leaves. If x_i > a then positive, else negative. A very simple "weak classifier". Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04
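
A sketch of the stump as a weak classifier, plus an exhaustive weak learner of the kind boosting (later slides) would call; the brute-force search over thresholds is my assumption about how the stump is fit:

```python
import numpy as np

def stump_predict(X, i, a):
    """Weak classifier: if x_i > a then +1, else -1, for each row of X."""
    return np.where(X[:, i] > a, 1, -1)

def train_stump(X, y, weights):
    """Pick the (feature, threshold) pair with the lowest weighted 0-1 error."""
    best = (None, None, np.inf)
    for i in range(X.shape[1]):
        for a in np.unique(X[:, i]):
            err = weights[stump_predict(X, i, a) != y].sum()
            if err < best[2]:
                best = (i, a, err)
    return best   # (feature index, threshold, weighted error)
```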

Support vector machines. A simple decision rule, good classification, good generalization. [Figure: a separating hyperplane w with the margin between the + and - points.]

Support vector machines. Support vectors: the training points that lie on the margin, closest to the decision boundary; they are the only points that determine the hyperplane w. [Figure: the separating hyperplane with the support vectors highlighted.]
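
The standard hard-margin formulation behind these pictures (not written out on the slides; it uses the slide's sign convention y = sign(w^T x - b)):

```latex
\min_{w,\,b}\ \tfrac{1}{2}\,\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,(w^{T}x_i-b)\ge 1\ \ \forall i
```

The support vectors are exactly the points for which the constraint is tight, y_i (w^T x_i - b) = 1.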

How do I solve the problem? It is a convex optimization problem. You can solve it in Matlab (don't); better, download a solver from the web:
SMO: Sequential Minimal Optimization
SVM-Light: http://svmlight.joachims.org/
LibSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
LibLinear: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
SVM-Perf: http://svmlight.joachims.org/
Pegasos: http://ttic.uchicago.edu/~shai/
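
As a usage sketch, scikit-learn's LinearSVC wraps LIBLINEAR from the list above; scikit-learn itself is not mentioned on the slides, and the toy data below is made up:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy data: two Gaussian blobs standing in for face / non-face feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1, 1, (100, 10)), rng.normal(-1, 1, (100, 10))])
y = np.array([1] * 100 + [-1] * 100)

clf = LinearSVC(C=1.0)   # C trades margin size against training errors
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy
```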

Linear SVM for pedestrian detection

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Gradient filters compared in the paper: centered, uncentered, diagonal, cubic-corrected, Sobel. Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Histogram of gradient orientations Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

The window descriptor: 8 orientations × 15×7 cells. Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
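
The HOG descriptor on these slides can be sketched with scikit-image; this is not a tool from the slides, and the cell/block sizes are my assumptions, chosen so a 128x64 window yields a 15x7 grid of blocks with 8-bin histograms:

```python
import numpy as np
from skimage.feature import hog

# A 128x64 detection window (a typical pedestrian window size), random here.
window = np.random.rand(128, 64)

descriptor = hog(
    window,
    orientations=8,           # 8 orientation bins, as on the slide
    pixels_per_cell=(8, 8),   # illustrative cell size
    cells_per_block=(2, 2),   # illustrative block size for normalization
    feature_vector=True,
)
print(descriptor.shape)       # one long vector x, fed to the linear SVM
```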

pedestrian Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Kernel SVM. The decision function is a linear combination of support vectors, so prediction is a dot product. A kernel is a function that computes the dot product of data points in some unknown space, so we can compute the decision without ever constructing that space.
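
A reconstruction of the missing formulas in their standard form, consistent with the slide text but not copied from the deck:

```latex
F(x)=\operatorname{sign}\Bigl(\sum_{i\in\mathrm{SV}}\alpha_i\,y_i\,\langle x_i,x\rangle-b\Bigr),
\qquad
K(x_i,x)=\langle\phi(x_i),\phi(x)\rangle
\;\Longrightarrow\;
F(x)=\operatorname{sign}\Bigl(\sum_{i\in\mathrm{SV}}\alpha_i\,y_i\,K(x_i,x)-b\Bigr)
```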

Useful kernels Linear! RBF Histogram intersection Pyramid match
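
Standard forms of the first three kernels named on the slide; the (spatial) pyramid match of the next slides is a weighted sum of histogram intersections computed at several grid resolutions:

```latex
K_{\mathrm{lin}}(x,x')=x^{T}x',\qquad
K_{\mathrm{RBF}}(x,x')=\exp\bigl(-\gamma\,\lVert x-x'\rVert^{2}\bigr),\qquad
K_{\cap}(h,h')=\sum_{j}\min\bigl(h_j,\,h'_j\bigr)
```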

Histogram intersection. [Figure: each local feature is assigned to a texture cluster (visual word) and the assignments are counted into a histogram.] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.

(Spatial) Pyramid Match S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.

Boosting. A weak classifier is a classifier that is only slightly better than random guessing. A weak learner builds weak classifiers.

Boosting. Start with a uniform distribution over the training examples. Iterate: get a weak classifier f_k, compute its 0-1 error, take the classifier weight alpha_k, and update the distribution. Output the final "strong" classifier. Yoav Freund and Robert E. Schapire, A Short Introduction to Boosting
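
The quantities the slide leaves blank, in the standard AdaBoost form from Freund and Schapire:

```latex
D_1(i)=\tfrac{1}{N},\quad
\varepsilon_k=\sum_i D_k(i)\,\mathbf{1}\bigl[f_k(x_i)\neq y_i\bigr],\quad
\alpha_k=\tfrac{1}{2}\ln\frac{1-\varepsilon_k}{\varepsilon_k},\quad
D_{k+1}(i)\propto D_k(i)\,e^{-\alpha_k y_i f_k(x_i)};\quad
F(x)=\operatorname{sign}\Bigl(\textstyle\sum_k\alpha_k f_k(x)\Bigr)
```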

Face detection. We slide a window over the image, extract features for each window, and classify each window into face/non-face: x → F(x) → y, +1 face / -1 not face.

Face detection. Use Haar-like features. Use decision stumps as weak classifiers (e.g. if x_234 > 1.3 then +1 face, else -1 non-face). Use boosting to build a strong classifier. Use a sliding window to detect the face.
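
A minimal sketch of one two-rectangle Haar-like feature computed via an integral image; the window size, coordinates and rectangle layout are illustrative, not the actual Viola-Jones feature set:

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so that any rectangle sum costs at most 4 lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using the integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """Haar-like feature: sum over the top (gray) half minus the bottom (white) half."""
    top = rect_sum(ii, r, c, r + h // 2, c + w)
    bottom = rect_sum(ii, r + h // 2, c, r + h, c + w)
    return top - bottom

window = np.random.rand(24, 24)          # a 24x24 detection window, random here
ii = integral_image(window)
x = two_rect_feature(ii, 4, 4, 8, 16)    # one entry of the feature vector x
```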