More sliding window detection: Discriminative part-based models

Jan-Michael Frahm, Enrique Dunn Spring 2013
Presentation transcript:

More sliding window detection: Discriminative part-based models Many slides based on P. Felzenszwalb, E. Seemann, N. Dalal

Challenge: Generic object detection

Gradient Histograms Have become extremely popular and successful in the vision community. Avoid hard decisions, unlike edge-based features. Examples: SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients)

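Computing gradients One-sided: $f'(x) \approx \frac{f(x+h) - f(x)}{h}$ Two-sided: $f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$ Filter masks in x-direction: [-1 1] (one-sided), [-1 0 1] (two-sided) Magnitude: $s = \sqrt{s_x^2 + s_y^2}$ Orientation: $\theta = \arctan(s_y / s_x)$ Note: the orientation is in radians. Example: a step function and what the gradient looks like there. Dr. Edgar Seemann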

Histograms Gradient histograms measure the orientations and strengths of image gradients within an image region

Histograms of Oriented Gradients Gradient-based feature descriptor developed for people detection Authors: Dalal & Triggs (INRIA Grenoble, France) Global descriptor for the complete body Very high-dimensional: typically ~4000 dimensions

HOG Very promising results on challenging data sets Phases Learning Phase Detection Phase

Detector: Learning Phase Set of cropped images containing pedestrians in normal environment Global descriptor rather than local features Using linear SVM

Detector: Detection Phase Sliding window over each scale Simple SVM prediction

Descriptor Compute gradients on an image region of 64x128 pixels Compute histograms on ‘cells’ of typically 8x8 pixels (i.e. 8x16 cells) Normalize histograms within overlapping blocks of cells (typically 2x2 cells, i.e. 7x15 blocks) Concatenate histograms
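
The cell and block counts on this slide determine the descriptor length. A minimal sketch of the arithmetic, assuming 9 orientation bins per cell and a block stride of one cell (the function name and parameter names are illustrative, not from the slides):

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8, block=2,
                          stride_cells=1, bins=9):
    """Length of a HOG descriptor for one detection window.

    Assumes square cells, square blocks of `block` x `block` cells,
    and blocks that slide by `stride_cells` cells.
    """
    cells_x = win_w // cell                               # 64 / 8  = 8
    cells_y = win_h // cell                               # 128 / 8 = 16
    blocks_x = (cells_x - block) // stride_cells + 1      # 7
    blocks_y = (cells_y - block) // stride_cells + 1      # 15
    # Each block concatenates block*block cell histograms of `bins` values.
    return blocks_x * blocks_y * block * block * bins


print(hog_descriptor_length())  # 7 * 15 * 4 * 9 = 3780
```

Under these assumptions the result is 3780 values, in line with the "typically ~4000 dimensions" quoted earlier.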

HOG Descriptors HOG: Histogram of Oriented Gradients Parameters: gradient scale; orientation bins; block overlap area; cell schemes (R-HOG/SIFT-like rectangular cells, C-HOG circular cells with a center bin); color space (RGB or Lab, color or gray); block normalization (L2-Hys or L1-sqrt; L2-Hys: L2 normalize -> clip) Notes: HOG blocks can have any shape, like Lowe's SIFT or the more psychologically inspired 3D version of the shape context of Belongie et al. The log-polar shape of C-HOG is motivated by the human foveal system, where there are more cells at the center and resolution decreases toward the periphery. Even in R-HOG, we find that putting a Gaussian weight on top of the block helps increase performance. But HOG has a lot of parameters, which makes it hard to tune. Surprisingly, our experience shows we only need to vary some of them, even if we change the object class or our detection window size. We take inspiration from biological…

Gradients Convolution with [-1 0 1] filters No smoothing Compute gradient magnitude+direction Per pixel: color channel with greatest magnitude -> final gradient
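
The gradient step on this slide can be sketched with NumPy as follows. This is an illustrative sketch, not the original implementation: the function name is made up, border pixels are simply left at zero, and orientations are folded into the unsigned range of 0 to 180 degrees.

```python
import numpy as np


def image_gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation (degrees).

    img: array of shape (H, W) or (H, W, C). No smoothing is applied.
    For multi-channel input, each pixel takes the gradient of the
    channel with the greatest magnitude.
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 2:
        img = img[:, :, None]

    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Centered [-1 0 1] differences; the one-pixel border stays zero.
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]

    mag = np.hypot(gx, gy)
    best = mag.argmax(axis=2)              # strongest channel per pixel
    rows, cols = np.indices(best.shape)
    gx_b = gx[rows, cols, best]
    gy_b = gy[rows, cols, best]

    # arctan2 returns radians; convert and fold to unsigned orientation.
    angle = np.degrees(np.arctan2(gy_b, gx_b)) % 180.0
    return mag[rows, cols, best], angle
```

A horizontal intensity ramp gives orientation 0 and a vertical ramp gives orientation 90 in the interior of the image.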

Cell histograms 9 bins for unsigned gradient orientations (0-180 degrees) vote is gradient magnitude Interpolated trilinearly: Bilinearly into spatial cells Linearly into orientation bins

Linear and Bilinear interpolation for subsampling Draw this on the black board

Histogram interpolation example θ = 85 degrees. Distance to bin centers: bin 70 -> 15 degrees, bin 90 -> 5 degrees. Ratios: 5/20 = 1/4 (bin 70), 15/20 = 3/4 (bin 90). Distance to cell centers: left 2, right 6; top 2, bottom 6. Ratio left-right: 6/8, 2/8. Ratio top-bottom: 6/8, 2/8. Ratios: 6/8*6/8 = 36/64 = 9/16; 6/8*2/8 = 12/64 = 3/16; 2/8*6/8 = 12/64 = 3/16; 2/8*2/8 = 4/64 = 1/16
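
The numbers in this example can be reproduced with a short sketch. It assumes 9 unsigned orientation bins with centers at 10, 30, ..., 170 degrees (so bin 3 is centered at 70 and bin 4 at 90) and cell centers 8 pixels apart; both function names are illustrative.

```python
import math


def orientation_votes(theta_deg, magnitude=1.0, n_bins=9, span=180.0):
    """Split one gradient vote linearly between the two nearest bins."""
    width = span / n_bins                 # 20 degrees per bin
    pos = theta_deg / width - 0.5         # bin k has its center at pos == k
    lo = math.floor(pos)
    frac = pos - lo                       # share going to the upper bin
    # Modulo wraps around because unsigned orientations are periodic.
    return {lo % n_bins: magnitude * (1.0 - frac),
            (lo + 1) % n_bins: magnitude * frac}


def bilinear_cell_weights(d_left, d_right, d_top, d_bottom):
    """Spatial weights of the four surrounding cells.

    Arguments are pixel distances to the neighbouring cell centers;
    the closer center receives the larger weight.
    """
    w_left = d_right / (d_left + d_right)
    w_top = d_bottom / (d_top + d_bottom)
    return {
        "top-left": w_top * w_left,
        "top-right": w_top * (1 - w_left),
        "bottom-left": (1 - w_top) * w_left,
        "bottom-right": (1 - w_top) * (1 - w_left),
    }


print(orientation_votes(85.0))            # {3: 0.25, 4: 0.75}
print(bilinear_cell_weights(2, 6, 2, 6))  # 9/16, 3/16, 3/16, 1/16
```

Multiplying an orientation share by a spatial weight gives the trilinear weight of the vote for one (cell, bin) pair.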

Blocks Overlapping blocks of 2x2 cells. Cell histograms are concatenated and then normalized. Note that each cell has several occurrences with different normalization in the final descriptor. Normalization: different norms are possible (L2, L2-Hys etc.). We add a normalization epsilon to avoid division by zero. Note: briefly explain L2-Hys
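
A minimal sketch of L2-Hys block normalization: L2-normalize, clip large values, then normalize again. The clipping threshold of 0.2 and the epsilon value are assumptions that do not appear in these slides; 0.2 is the value reported by Dalal and Triggs.

```python
import numpy as np


def l2_hys(block_hist, clip=0.2, eps=1e-5):
    """L2-Hys normalization of one block of concatenated cell histograms.

    clip and eps are assumed defaults, not values given in the slides.
    """
    v = np.asarray(block_hist, dtype=np.float64).ravel()
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)   # L2 normalize
    v = np.minimum(v, clip)                      # limit dominant entries
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)   # renormalize
    return v
```

The epsilon keeps an all-zero block (a flat image region) from producing a division by zero, and the clipping step reduces the influence of a single very strong gradient.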

Blocks Gradient magnitudes are weighted according to a Gaussian spatial window Distant gradients contribute less to the histogram

Final Descriptor Concatenation of Blocks Visualization:

Engineering Developing a feature descriptor requires a lot of engineering: testing of parameters (e.g. size of cells, blocks, number of cells in a block, size of overlap) and normalization schemes (e.g. L1, L2 norms etc., gamma correction, pixel intensity normalization). An extensive evaluation of different choices was performed when the descriptor was proposed. It's not only the idea, but also the engineering effort

Effect of Block and Cell Size (figure: 64x128 detection window) Trade-off between the need for local spatial invariance and the need for finer spatial resolution

Descriptor Cues: Persons Figure panels: input example, average gradients, weighted pos wts, weighted neg wts, outside-in weights. Most important cues are head, shoulder, leg silhouettes. Vertical gradients inside a person are counted as negative. Overlapping blocks just outside the contour are most important. Notes: Of course we want to know what is going on behind the scenes… On the left, we have an example window. Then the average gradient…… In some separate tests, not shown here, we find that if we reduce the background information, the detector performance drops.

Training Set More than 2000 positive & 2000 negative training images (96x160px) Carefully aligned and resized Wide variety of backgrounds

Model learning Simple linear SVM on top of the HOG features. Fast (one inner product per evaluation window). Hyperplane normal vector: $w = \sum_i \alpha_i y_i x_i$, with $y_i \in \{-1, +1\}$ and $x_i$ the support vectors. Decision: $f(x) = \operatorname{sign}(w \cdot x + b)$ Slightly better results can be achieved by using an SVM with a Gaussian kernel, but with a considerable increase in computation time. Note: show on the blackboard how the normal SVM equation with a kernel becomes linear, and then the decision
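
Why the linear SVM is fast can be sketched as follows: the support vectors collapse into a single normal vector once, and each window then costs one inner product. The sketch assumes the usual SVM convention of labels in {-1, +1} and a bias term b; the function names and the dual coefficients `alphas` are illustrative.

```python
import numpy as np


def svm_normal(alphas, labels, support_vectors):
    """Hyperplane normal w = sum_i alpha_i * y_i * x_i."""
    a = np.asarray(alphas, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)       # assumed in {-1, +1}
    X = np.asarray(support_vectors, dtype=np.float64)
    return (a * y) @ X


def linear_decision(w, b, x):
    """One inner product per evaluation window."""
    return 1 if float(np.dot(w, x)) + b > 0 else -1


w = svm_normal([0.5, 0.5], [+1, -1], [[1.0, 1.0], [-1.0, -1.0]])
print(w)                                   # [1. 1.]
print(linear_decision(w, 0.0, [2.0, 0.0]))  # 1
```

With a Gaussian kernel the sum over support vectors cannot be collapsed this way, so every window requires one kernel evaluation per support vector.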

Result on INRIA database Test Set contains 287 images Resolution ~640x480 589 persons Avg. size: 288 pixels

Demo

Fall 2015 Computer Vision

Last Class: Pedestrian detection Features: Histograms of oriented gradients (HOG) Partition image into 8x8 pixel blocks and compute histogram of gradient orientations in each block Learn a pedestrian template using a linear support vector machine At test time, convolve feature map with template HOG feature map Template Detector response map N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
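
Applying the template at test time can be sketched as a sliding dot product over the HOG feature map (strictly a cross-correlation, since the template is not flipped). This is a naive illustration with made-up names; practical detectors use optimized convolution routines and repeat this at every level of an image pyramid.

```python
import numpy as np


def response_map(features, template):
    """Detector response for every template placement.

    features: (H, W, D) feature map, D values per cell.
    template: (h, w, D) learned weights.
    Returns an (H - h + 1, W - w + 1) array of scores.
    """
    H, W, D = features.shape
    h, w, d = template.shape
    if d != D:
        raise ValueError("feature dimensions differ")
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Dot product between the template and the window at (y, x).
            out[y, x] = np.sum(features[y:y + h, x:x + w] * template)
    return out
```

Detections are then the local maxima of the response map that exceed a threshold.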

Discriminative part-based models Root filter Part filters Deformation weights P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010

Object hypothesis Multiscale model: the resolution of part filters is twice the resolution of the root

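Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costs: $\mathrm{score}(p_0, \dots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i)$ Here $F_i$ are the filters, $\phi(H, p_i)$ the subwindow features, $d_i$ the deformation weights, and $\phi_d(dx_i, dy_i)$ the displacement features

Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costs. Pictorial structures view: the first sum (filters applied to subwindow features) is the matching cost, and the second sum (deformation weights applied to displacements) is the deformation cost

Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costs. Equivalently, $\mathrm{score}(z) = w \cdot \psi(H, z)$, where $w$ is the concatenation of filter and deformation weights and $\psi(H, z)$ is the concatenation of subwindow features and (negated) displacement features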
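
A minimal sketch of the scoring rule, given already-computed filter responses. It assumes the displacement features (dx, dy, dx^2, dy^2) used in the cited PAMI 2010 paper and an optional bias term; the function name and argument layout are illustrative.

```python
def hypothesis_score(filter_scores, displacements, deform_weights, bias=0.0):
    """Sum of filter scores minus the sum of deformation costs.

    filter_scores:  [root, part_1, ..., part_n] filter responses.
    displacements:  [(dx, dy)] for each part, relative to its anchor.
    deform_weights: [(a, b, c, d)] for each part, applied to the
                    assumed feature vector (dx, dy, dx^2, dy^2).
    """
    total = sum(filter_scores) + bias
    for (dx, dy), d in zip(displacements, deform_weights):
        features = (dx, dy, dx * dx, dy * dy)
        total -= sum(w * f for w, f in zip(d, features))
    return total


score = hypothesis_score(
    filter_scores=[2.0, 1.0, 0.5],
    displacements=[(0, 0), (1, -2)],
    deform_weights=[(0, 0, 1, 1), (0, 0, 1, 1)],
)
print(score)  # 3.5 - (0 + 5) = -1.5
```

Because both sums are linear in the weights, the same score can be written as a single dot product between a concatenated weight vector and a concatenated feature vector.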

Detection Define the score of each root filter location as the score given the best part placements: $\mathrm{score}(p_0) = \max_{p_1, \dots, p_n} \mathrm{score}(p_0, \dots, p_n)$

Detection Define the score of each root filter location as the score given the best part placements: $\mathrm{score}(p_0) = \max_{p_1, \dots, p_n} \mathrm{score}(p_0, \dots, p_n)$ Efficient computation: generalized distance transforms. For each "default" part location, find the best-scoring displacement (figure: head filter, head filter responses, distance transform)
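
What the distance transform computes can be sketched in one dimension by brute force. For every default location p it finds the placement q that maximizes the filter response minus a deformation cost. This sketch is quadratic in the number of locations and assumes a cost of the form lin*d + quad*d^2 with d = q - p; the generalized distance transform obtains the same result in linear time and is applied along rows and then columns in 2D. Names are illustrative.

```python
import numpy as np


def best_displacement_1d(responses, quad=1.0, lin=0.0):
    """Brute-force 1D version of the transformed part response.

    Returns (best_score, best_pos): for each default location p, the
    best value of responses[q] - (lin * d + quad * d**2), d = q - p,
    and the location q that achieves it.
    """
    r = np.asarray(responses, dtype=np.float64)
    n = len(r)
    q = np.arange(n)
    best_score = np.empty(n)
    best_pos = np.empty(n, dtype=int)
    for p in range(n):
        d = q - p
        candidates = r - (lin * d + quad * d ** 2)
        best_pos[p] = int(np.argmax(candidates))
        best_score[p] = candidates[best_pos[p]]
    return best_score, best_pos


scores, positions = best_displacement_1d([0, 0, 10, 0, 0])
print(scores)     # [ 6.  9. 10.  9.  6.]
print(positions)  # [2 2 2 2 2]
```

A strong response at one location is spread to its neighbours with a quadratic falloff, which is why the transformed response map looks like a blurred version of the raw filter response.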

Detection

Matching result

Training Training data consists of images with labeled bounding boxes Need to learn the filters and deformation parameters

Training Classifier has the form $f_w(x) = \max_z w \cdot \Phi(x, z)$, where w are the model parameters and z are latent hypotheses representing the object configuration. Latent SVM training: initialize w and iterate: (1) fix w and find the best z for each training example (detection); (2) fix z and solve for w (standard SVM training). Issue: too many negative examples. Do "data mining" to find "hard" negatives. Note on z: an object hypothesis for a mixture model specifies a mixture component, 1 ≤ c ≤ m, and a location for each filter of Mc, z = (c, p0, ..., pnc). Here nc is the number of parts in Mc. The score of this hypothesis is the score of the hypothesis z′ = (p0, ..., pnc) for the c-th model component.
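
For reference, the classifier and the objective minimized by latent SVM training can be written as below, following the formulation in the cited PAMI 2010 paper. The symbols $\Phi$, $Z(x)$, $C$, and the labels $y_i \in \{-1, +1\}$ are not spelled out in the slides and are taken as assumptions here.

```latex
\begin{align}
  f_w(x) &= \max_{z \in Z(x)} \; w \cdot \Phi(x, z) \\
  L_D(w) &= \tfrac{1}{2}\,\lVert w \rVert^2
            + C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\, f_w(x_i)\bigr)
\end{align}
```

Once z is fixed for the positive examples, the objective becomes convex in w, which is why the alternation between finding the best z and solving for w reduces each second step to standard SVM training.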

Car model Component 1 Component 2

Car detections

Person model

Person detections

Cat model

Cat detections

More detections

Quantitative results (PASCAL 2008) 7 systems competed in the 2008 challenge. Out of 20 classes, first place in 7 classes and second place in 8 classes (figure panels: Bicycles, Person, Bird, each marking the proposed approach)