Object Detection using Histograms of Oriented Gradients

Slides:

Advertisements

Similar presentations

Object Recognition from Local Scale-Invariant Features David G. Lowe Presented by Ashley L. Kapron.

Advertisements

Human Detection Phanindra Varma. Detection -- Overview  Human detection in static images is based on the HOG (Histogram of Oriented Gradients) encoding.

Combining Detectors for Human Hand Detection Antonio Hernández, Petia Radeva and Sergio Escalera Computer Vision Center, Universitat Autònoma de Barcelona,

Pose Estimation and Segmentation of People in 3D Movies Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev Inria, Ecole Normale Superieure ICCV.

Jan-Michael Frahm, Enrique Dunn Spring 2013

Histograms of Oriented Gradients for Human Detection

Carolina Galleguillos, Brian McFee, Serge Belongie, Gert Lanckriet Computer Science and Engineering Department Electrical and Computer Engineering Department.

Ivan Laptev IRISA/INRIA, Rennes, France September 07, 2006 Boosted Histograms for Improved Object Detection.

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented.

Lecture 31: Modern object recognition

LPP-HOG: A New Local Image Descriptor for Fast Human Detection Andy Qing Jun Wang and Ru Bo Zhang IEEE International Symposium.

Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.

Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR 2005 Another Descriptor.

Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/10/12.

EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.

Groups of Adjacent Contour Segments for Object Detection Vittorio Ferrari Loic Fevrier Frederic Jurie Cordelia Schmid.

Ghunhui Gu, Joseph J. Lim, Pablo Arbeláez, Jitendra Malik University of California at Berkeley Berkeley, CA

Viola/Jones: features “Rectangle filters” Differences between sums of pixels in adjacent rectangles { y t (x) = +1 if h t (x) >  t -1 otherwise Unique.

Detecting Pedestrians by Learning Shapelet Features

EECS 274 Computer Vision Object detection. Human detection HOG features Cue integration Ensemble of classifiers ROC curve Reading: Assigned papers.

Fast intersection kernel SVMs for Realtime Object Detection

More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.

DISCRIMINATIVE DECORELATION FOR CLUSTERING AND CLASSIFICATION ECCV 12 Bharath Hariharan, Jitandra Malik, and Deva Ramanan.

Recognition using Regions CVPR Outline Introduction Overview of the Approach Experimental Results Conclusion.

1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.

Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson

Ensemble Tracking Shai Avidan IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE February 2007.

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor

Learning Spatial Context: Can stuff help us find things? Geremy Heitz Daphne Koller April 14, 2008 DAGS Stuff (n): Material defined by a homogeneous or.

Generic object detection with deformable part-based models

Object Recognizing. Object Classes Individual Recognition.

Shape-Based Human Detection and Segmentation via Hierarchical Part- Template Matching Zhe Lin, Member, IEEE Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS.

“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)

Detecting Pedestrians Using Patterns of Motion and Appearance Paul Viola Microsoft Research Irfan Ullah Dept. of Info. and Comm. Engr. Myongji University.

1 Action Classification: An Integration of Randomization and Discrimination in A Dense Feature Representation Computer Science Department, Stanford University.

Visual Object Recognition

Object Detection with Discriminatively Trained Part Based Models

Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.

Pedestrian Detection and Localization

Efficient Subwindow Search: A Branch and Bound Framework for Object Localization ‘PAMI09 Beyond Sliding Windows: Object Localization by Efficient Subwindow.

Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.

Recognition II Ali Farhadi. We have talked about Nearest Neighbor Naïve Bayes Logistic Regression Boosting.

Face Detection Ying Wu Electrical and Computer Engineering Northwestern University, Evanston, IL

Project 3 Results.

Object detection, deep learning, and R-CNNs

Histograms of Oriented Gradients for Human Detection(HOG)

Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR 2005 Another Descriptor.

Sean M. Ficht.  Problem Definition  Previous Work  Methods & Theory  Results.

CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.

Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.

Pedestrian Detection Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR ‘05 Pete Barnum March 8, 2006.

Improved Object Detection

Object Recognizing. Object Classes Individual Recognition.

More sliding window detection: Discriminative part-based models

Object Recognizing. Object Classes Individual Recognition.

Week 4: 6/6 – 6/10 Jeffrey Loppert. This week.. Coded a Histogram of Oriented Gradients (HOG) Feature Extractor Extracted features from positive and negative.

Cascade for Fast Detection

Object detection with deformable part-based models

Data Driven Attributes for Action Detection

Lit part of blue dress and shadowed part of white dress are the same color

Object detection, deep learning, and R-CNNs

Recap: Advanced Feature Encoding

Object detection as supervised classification

Introduction of Pedestrian Detection

A Tutorial on HOG Human Detection

HOGgles Visualizing Object Detection Features

An HOG-LBP Human Detector with Partial Occlusion Handling

Pedestrian Detection Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR ‘05 Pete Barnum March 8, 2006.

Pedestrian Detection Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR ‘05 Pete Barnum March 8, 2006.

ADABOOST(Adaptative Boosting)

Presentation transcript:

Object Detection using Histograms of Oriented Gradients Navneet Dalal, Bill Triggs INRIA Rhône-Alpes Grenoble, France Introduce urself, bill and Matthijs Thanks audience Introduce the topic Thanks to Matthijs Douze for volunteering to help with the experiments 7 May, 2006 Pascal VOC 2006 Workshop ECCV 2006, Graz, Austria

Talk Outline Current approaches Overall architecture Histogram of oriented gradient Description of image encoding algorithm Multi-scale detection architecture Fusion of detections at multiple scales and locations Key findings on Pascal VOC 2006 Conclusions

Motivation Current Approaches Our Approach Dense feature sets based approaches Papageorgiou & Poggio, 2000; Viola & Jones, 2001 Template or image fragments based approaches Gavrila & Philomen, 1999; Vidal-Naquet & Ullman, 2003 Models based on key points Leibe et al, 2005; Fergus et al, 2003 Our Approach Focus on creating robust encoding of images Linear SVM as classifier on normalized image windows, is reliable & fast Moving window based detector with non-maximum suppression over scale space Focus on why i opted for my approach. Before working on modelling, it is important to have robust representation. We divide image into windows and check if each window is a person or not. How we use linear SVM. Advantages of linear SVM. Describe multi-scale approach Feature space and learning method Include CMU guys and Deva, Forsyth paper

Overall Architecture Learning Phase Detection Phase Create normalised training data set Scan image at all scales and locations Encode images into feature vectors Run classifier to obtain object/non-object decisions Learn binary classifier Fuse multiple detections in 3-D position & scale space Object/Non-object decision Object detections with bounding boxes

Descriptor Processing Chain Object/Non-object Linear SVM [ ..., ..., ..., ...] Collect HOGs over detection window Contrast normalize over overlapping spatial cells Weighted vote in spatial & orientation cells Let me start with an overview of the encoding scheme algorithm. Don’t say anything think of phrase that a cell would activate neuron. Why normalize… We want invariance to color… so gradient information is necessary and we through away color information The problem in detecting human is that appearance and pose changes lot. We want invariance to small movements. So spatial binning is required. David Lowe’s SIFT features are very good in describing people. But key here is because of different clothng, there are no interest points. Using the lessons learned from it, we do orientation binning. One can put a linear SVM at this stage. But I will show that it is a bad idea. We want contrast normalization. One may ask why we need overlapping. It is the same information being copied over, but key is with different normalization. Overlapping helps encode same information in multiple ways and brings invariance to small displacements of silhouttes. We call it HOG. Actual implementation differs bit. We perform steps 1—3 over the whole image. Compute gradients Gamma compression Image Window

HOG Descriptors Parameters Gradient scale R-HOG/SIFT Orientation bins HOG: Histogram of Oriented Gradients Parameters Gradient scale Orientation bins Block overlap area R-HOG/SIFT Cell Schemes RGB or Lab, Color/gray- space Block normalization L2-hys, or L1-sqrt, Block HOGs can have any shape, like Lowe’s SIFT or more psychological inspired 3D version of shape context of Belongie et al. The log-polar shape of CHOG are motivated from human fovea system – where there are more cells at center and resolution decreases at the peripheries. Even in RHOG, we find that putting a Gaussian weight on top of block help increase the perform. But HOG has lot of parameters… that is hard to tune in. Surprisingly, our experience shows we only need to vary some – even if we change the object class or our detection window size. We take inspiration from Biological… C-HOG Center bin

Fine grained features improve performance Lessons on HOGs No gradient smoothing, [1 0 -1] derivative mask Use gradient magnitude (no thresholding) Orientation voting into fine bins (20˚ wide bins) Spatial voting into coarser bins Strong local normalization Use overlapping blocks +1 -1 Fine grained features improve performance Point 4 and 1st are opposite of what we do in Haar wavelets. This provides significant performance gain. It may seem lots of parameters, but our experience on humans, faces, and other object classes suggest that only normalization and size of blocks/cells need to be tested. Out of this, the object size already provides a fair idea of size of the object.  Have 1-2 order lower false positives than other descriptors  Slower than integral images of Viola & Jones, 2001 N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005

Multi-Scale Detection Clip Detection Score After dense multi-scale scan of detection window Map each detection to 3D [x,y,scale] space x y s (in log) Now we need to get rid of multiple detections. How can we do it in principled manner? Tell about hypothesis… This leads us to kernel density estimates… We create 3d log space.. Put a point dependent sigma pose the problem as mode detection of this weighted density estimates… Final detections Apply robust mode detection, like mean shift

Performance Evaluation Transformation functions Scale-space pyramid steps Hard clipping Sigmoid mapping Hard clipping is better than sigmoid mapping Fine scale transition is very important

Overall robust non-maximum suppression is important Effect of Smoothing Spatial smoothing proportional to window size performs best Relatively independent to smoothing across scales Detector’s normalized image window size Detector’s response at the given scale level Overall robust non-maximum suppression is important Classifier responses for a dense scan at a scale

Overall Performance Recall-precision on INRIA person database R/C-HOG have 1-2 order lower false positives than other descriptors

Descriptor Cues: Motorbikes Average gradients Weighted pos wts Weighted neg wts Dominant pos orientations Dominant neg orientations Input window Detection Examples

Key Descriptor Parameters Class Window Size Avg. Size # of Orient- ation Bins Orientat- ion Range Gamma Compre- ssion Normal- isation Method Person 64×128 Height 96 9 0˚-180˚ √RGB L2-Hys Car 104×56 Height 48 18 0˚-360˚ L1-Sqrt Bus 120×80 Height 64 Motorbike Width 112 Bicycle 104×64 Width 96 Cow 128×80 Width 56 Sheep 104×60 Height 56 Horse RGB Cat 96×56 Dog Mention that bar helps for persons, motorbikes, bicycles

Conclusions Contributions Future Work Robust feature encoding for object detection Gives good performance for variety of object classes Real time detection is possible Future Work Part based detector for handling partial occlusions Incorporate texture and color descriptors into the framework One single optimization phase based on AdaBoost to learn most relevant descriptors

Thank You

Descriptor Cues: Cars Detection Examples Average gradients Weighted pos wts Weighted neg wts Detection Examples

Effect of Parameters Gradient smoothing, σ Orientation bins, β Using simple smoothed gradients and many orientations helps! Gradient scale 3→ 0 ⇒ false positives drop by 10 times Orientation bins 45˚→ 20˚ ⇒ false positives drop by 10 times

... Continued Normalization method Block overlap Strong local normalization is essential Overlapping block increases performance, but descriptor size increases

Effect of Block and Cell Size 64 128 Trade off between need for local spatial invariance and need for finer spatial resolution

Descriptor Cues: Persons Input example Average gradients Weighted pos wts Weighted neg wts Outside-in weights Most important cues are head, shoulder, leg silhouettes Vertical gradients inside a person are counted as negative Overlapping blocks just outside the contour are most important Of course we want to know what is going behind the scenes… On left, we have an example window. Then average gradient…… In some separate tests, not shown here, we find that if we reduce the background information, the detector performance drops.