Object Detection using Histograms of Oriented Gradients


1 Object Detection using Histograms of Oriented Gradients
Navneet Dalal, Bill Triggs
INRIA Rhône-Alpes, Grenoble, France
7 May, 2006, Pascal VOC 2006 Workshop, ECCV 2006, Graz, Austria
Thanks to Matthijs Douze for volunteering to help with the experiments
Notes: Introduce yourself, Bill and Matthijs. Thank the audience. Introduce the topic.

2 Talk Outline
Current approaches
Overall architecture
Histogram of oriented gradient
  Description of image encoding algorithm
Multi-scale detection architecture
  Fusion of detections at multiple scales and locations
Key findings on Pascal VOC 2006
Conclusions

3 Motivation
Current Approaches
  Dense feature sets based approaches: Papageorgiou & Poggio, 2000; Viola & Jones, 2001
  Template or image fragments based approaches: Gavrila & Philomin, 1999; Vidal-Naquet & Ullman, 2003
  Models based on key points: Leibe et al, 2005; Fergus et al, 2003
Our Approach
  Focus on creating a robust encoding of images
  Linear SVM as classifier on normalized image windows; it is reliable & fast
  Moving window based detector with non-maximum suppression over scale space
Notes: Focus on why I opted for this approach. Before working on modelling, it is important to have a robust representation. We divide the image into windows and check whether each window contains a person or not. Explain how we use the linear SVM and its advantages. Describe the multi-scale approach, the feature space and the learning method. Include the CMU guys and the Deva, Forsyth paper.

4 Overall Architecture
Learning Phase
  Create normalised training data set
  → Encode images into feature vectors
  → Learn binary classifier
  → Object/Non-object decision
Detection Phase
  Scan image at all scales and locations
  → Run classifier to obtain object/non-object decisions
  → Fuse multiple detections in 3-D position & scale space
  → Object detections with bounding boxes
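The "scan image at all scales and locations" step of the detection phase can be sketched as below. This is only an illustration: the function name `scan_windows`, the 8-pixel stride and the 1.2 scale step are assumptions not given on the slide; the 64×128 default is the person window size quoted later in the talk.

```python
def scan_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale_step=1.2):
    """Yield (x, y, scale) for every detection window position.

    The image is conceptually shrunk by `scale` at each pyramid level and a
    fixed-size window slides over it. (x, y) is the window's top-left corner
    mapped back to original image coordinates.
    stride and scale_step are illustrative assumptions.
    """
    scale = 1.0
    while img_w / scale >= win_w and img_h / scale >= win_h:
        level_w = int(img_w / scale)
        level_h = int(img_h / scale)
        for y in range(0, level_h - win_h + 1, stride):
            for x in range(0, level_w - win_w + 1, stride):
                yield (x * scale, y * scale, scale)
        scale *= scale_step
```

Each yielded window would then be encoded and passed to the classifier; the resulting scores are what the later fusion step operates on.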

5 Descriptor Processing Chain
Image Window
→ Gamma compression
→ Compute gradients
→ Weighted vote in spatial & orientation cells
→ Contrast normalize over overlapping spatial cells
→ Collect HOGs over detection window [ ..., ..., ..., ]
→ Linear SVM
→ Object/Non-object
Notes: Let me start with an overview of the encoding algorithm. Don't say anything; think of a phrase like "a cell would activate a neuron". Why normalize… We want invariance to color, so gradient information is necessary and we throw away the color information. The problem in detecting humans is that appearance and pose change a lot. We want invariance to small movements, so spatial binning is required. David Lowe's SIFT features are very good at describing people, but the key point here is that, because of different clothing, there are no interest points. Using the lessons learned from SIFT, we do orientation binning. One could put a linear SVM at this stage, but I will show that it is a bad idea: we want contrast normalization. One may ask why we need overlapping. It is the same information being copied over, but the key is that each copy gets a different normalization. Overlapping encodes the same information in multiple ways and brings invariance to small displacements of silhouettes. We call it HOG. The actual implementation differs a bit: we perform steps 1-3 over the whole image.
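A minimal NumPy sketch of the first stages of the chain (gamma compression, gradients, weighted orientation vote into spatial cells) may help make the steps concrete. It is not the authors' implementation: the function name `cell_histograms` and the 8×8-pixel cell size are assumptions, a grayscale window is assumed, and votes go to a single bin with no interpolation between neighbouring bins or cells.

```python
import numpy as np

def cell_histograms(window, cell=8, n_bins=9):
    """Orientation histograms per spatial cell for one grayscale window.

    window: 2-D float array with non-negative values.
    Returns an array of shape (rows_of_cells, cols_of_cells, n_bins).
    """
    img = np.sqrt(window)                       # square-root gamma compression
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]      # centred 1-D derivative mask
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)                      # gradient magnitude = vote weight
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation, 0-180
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)

    ny, nx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ny, nx, n_bins))
    for i in range(ny):
        for j in range(nx):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(bins[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=n_bins)
    return hist
```

With 9 bins over 0˚-180˚ each bin is 20˚ wide. The contrast normalization and SVM stages would follow on the returned array.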

6 HOG Descriptors
HOG: Histogram of Oriented Gradients
Parameters
  Gradient scale
  Orientation bins
  Block overlap area
Schemes
  RGB or Lab, color/gray space
  Block normalization: L2-Hys or L1-sqrt
Figure labels: R-HOG/SIFT (Cell, Block); C-HOG (Center bin)
Notes: HOGs can have any shape, like Lowe's SIFT or the more psychologically inspired 3D version of the shape context of Belongie et al. The log-polar shape of C-HOG is motivated by the human fovea system, where there are more cells at the center and resolution decreases towards the periphery. Even in R-HOG, we find that putting a Gaussian weight on top of each block helps increase performance. HOG has a lot of parameters, which makes it hard to tune. Surprisingly, our experience shows we only need to vary some of them, even if we change the object class or the detection window size. We take inspiration from Biological…
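The two block normalization schemes named on the slide can be sketched as follows. The slide does not give the formulas, so the details here are assumptions: L2-Hys is taken as L2 normalization, clipping at 0.2 and renormalizing, L1-sqrt as L1 normalization followed by a square root. Also assumed are the small constant `eps` and the block layout (2×2 cells, stepping one cell at a time so that blocks overlap). `hist` is a hypothetical array of per-cell histograms with shape (rows, cols, bins).

```python
import numpy as np

def l2_hys(v, clip=0.2, eps=1e-6):
    """L2-normalize, clip large values, then renormalize (assumed clip=0.2)."""
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    v = np.minimum(v, clip)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)

def l1_sqrt(v, eps=1e-6):
    """L1-normalize, then take the square root of each component."""
    return np.sqrt(v / (np.sum(np.abs(v)) + eps))

def normalize_blocks(hist, block=2, stride=1, norm=l2_hys):
    """Normalize overlapping blocks of cells and concatenate them.

    hist: array of shape (rows_of_cells, cols_of_cells, n_bins).
    block and stride are measured in cells (illustrative defaults).
    """
    ny, nx, _ = hist.shape
    out = []
    for i in range(0, ny - block + 1, stride):
        for j in range(0, nx - block + 1, stride):
            out.append(norm(hist[i:i + block, j:j + block].ravel()))
    return np.concatenate(out)
```

Because neighbouring blocks share cells, each cell histogram appears several times in the final vector, each time scaled by a different block's norm.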

7 Lessons on HOGs
No gradient smoothing, [1 0 -1] derivative mask
Use gradient magnitude (no thresholding)
Orientation voting into fine bins (20˚ wide bins)
Spatial voting into coarser bins
Strong local normalization
Use overlapping blocks
Fine grained features improve performance
  Has 1-2 orders of magnitude fewer false positives than other descriptors
  Slower than integral images of Viola & Jones, 2001
N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005
Notes: Points 1 and 4 are the opposite of what we do with Haar wavelets. This provides a significant performance gain. It may seem like a lot of parameters, but our experience on humans, faces and other object classes suggests that only the normalization and the size of blocks/cells need to be tested. Of these, the object size already gives a fair idea of suitable block/cell sizes.

8 Multi-Scale Detection
After dense multi-scale scan of detection window
→ Clip detection score
→ Map each detection to 3D [x, y, scale] space
→ Apply robust mode detection, like mean shift
→ Final detections
Figure axes: x, y, s (in log)
Notes: Now we need to get rid of multiple detections. How can we do it in a principled manner? Tell about the hypothesis… This leads us to kernel density estimates… We create a 3D space with log scale, put a point-dependent sigma on each detection, and pose the problem as mode detection on this weighted density estimate…
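The fusion step can be illustrated with a small weighted mean-shift routine over points in [x, y, log(scale)] space. This is a simplified sketch, not the authors' method: it uses one fixed Gaussian bandwidth `sigma` for all points, whereas the speaker notes mention a point-dependent sigma. The function name, iteration count, tolerance and merge rule are all assumptions.

```python
import numpy as np

def mean_shift_modes(points, weights, sigma, n_iter=50, tol=1e-3):
    """Find modes of a weighted Gaussian kernel density estimate.

    points:  (n, 3) array of [x, y, log(scale)] detections.
    weights: (n,) positive detection scores (already clipped).
    sigma:   (3,) kernel bandwidth per axis (fixed here for simplicity).
    Returns an (m, 3) array of fused detections, m <= n.
    """
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    modes = pts.copy()                      # start one search from every point
    for _ in range(n_iter):
        diff = (modes[:, None, :] - pts[None, :, :]) / sigma
        k = w * np.exp(-0.5 * np.sum(diff ** 2, axis=2))   # (n, n) kernel weights
        new = (k @ pts) / k.sum(axis=1, keepdims=True)     # weighted mean shift
        converged = np.max(np.abs(new - modes)) < tol
        modes = new
        if converged:
            break

    merged = []                             # collapse searches that met at one mode
    for m in modes:
        if not any(np.all(np.abs(m - q) < sigma) for q in merged):
            merged.append(m)
    return np.array(merged)
```

For example, five detections forming two tight groups collapse to two fused detections; exponentiating the third coordinate of each mode recovers the detection scale.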

9 Performance Evaluation
Transformation functions (hard clipping vs. sigmoid mapping)
  Hard clipping is better than sigmoid mapping
Scale-space pyramid steps
  Fine scale transition is very important
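The two transformation functions compared here turn raw classifier scores into non-negative weights for the fusion step. The slide does not give their exact form, so the parameterizations below (`threshold`, `a`, `b`) are assumptions chosen for illustration.

```python
import numpy as np

def hard_clip(score, threshold=0.0):
    """Zero out scores below the threshold, keep the excess above it."""
    return np.maximum(np.asarray(score, dtype=float) - threshold, 0.0)

def sigmoid_map(score, a=1.0, b=0.0):
    """Squash scores smoothly into (0, 1); a and b are assumed parameters."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(score, dtype=float) + b)))
```

Hard clipping discards every window scoring below the threshold, while the sigmoid still gives such windows a small positive weight.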

10 Effect of Smoothing
Spatial smoothing proportional to window size performs best
Relatively independent of smoothing across scales
Overall, robust non-maximum suppression is important
Figure labels: Detector's normalized image window size; Detector's response at the given scale level; Classifier responses for a dense scan at a scale

11 Overall Performance
Recall-precision on INRIA person database
R/C-HOG have 1-2 orders of magnitude fewer false positives than other descriptors

12 Descriptor Cues: Motorbikes
Panels: Input window; Average gradients; Weighted pos wts; Weighted neg wts; Dominant pos orientations; Dominant neg orientations
Detection Examples

13 Key Descriptor Parameters
Class      Window Size  Avg. Size   # of Orientation Bins  Orientation Range  Gamma Compression  Normalisation Method
Person     64×128       Height 96   9                      0˚-180˚            √RGB               L2-Hys
Car        104×56       Height 48   18                     0˚-360˚                               L1-Sqrt
Bus        120×80       Height 64
Motorbike               Width 112
Bicycle    104×64       Width 96
Cow        128×80       Width 56
Sheep      104×60       Height 56
Horse                                                                         RGB
Cat        96×56
Dog
Blank cells are values that did not survive in the transcript (likely cells merged with a neighbouring row on the slide).
Notes: Mention that bar helps for persons, motorbikes, bicycles.

14 Conclusions
Contributions
  Robust feature encoding for object detection
  Gives good performance for a variety of object classes
  Real time detection is possible
Future Work
  Part based detector for handling partial occlusions
  Incorporate texture and color descriptors into the framework
  One single optimization phase based on AdaBoost to learn most relevant descriptors

15 Thank You

16 Descriptor Cues: Cars
Panels: Average gradients; Weighted pos wts; Weighted neg wts
Detection Examples

17 Effect of Parameters
Gradient smoothing, σ: gradient scale 3 → 0 ⇒ false positives drop by a factor of 10
Orientation bins, β: bin width 45˚ → 20˚ ⇒ false positives drop by a factor of 10
Using simple unsmoothed gradients and many orientations helps!

18 ... Continued
Normalization method: strong local normalization is essential
Block overlap: overlapping blocks increase performance, but descriptor size increases

19 Effect of Block and Cell Size
Trade-off between the need for local spatial invariance and the need for finer spatial resolution

20 Descriptor Cues: Persons
Panels: Input example; Average gradients; Weighted pos wts; Weighted neg wts; Outside-in weights
Most important cues are head, shoulder, leg silhouettes
Vertical gradients inside a person are counted as negative
Overlapping blocks just outside the contour are most important
Notes: Of course we want to know what is going on behind the scenes… On the left, we have an example window. Then the average gradient… In some separate tests, not shown here, we find that if we reduce the background information, the detector performance drops.

