CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.

Slides:

Advertisements

Similar presentations

Jan-Michael Frahm, Enrique Dunn Spring 2013

Advertisements

Fitting: The Hough transform. Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not.

Recap: Advanced Feature Encoding Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region (0 th order.

TP14 - Local features: detection and description Computer Vision, FCUP, 2014 Miguel Coimbra Slides by Prof. Kristen Grauman.

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented.

Lecture 31: Modern object recognition

Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.

Global spatial layout: spatial pyramid matching Spatial weighting the features Beyond bags of features: Adding spatial information.

Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/10/12.

Face detection Many slides adapted from P. Viola.

Cos 429: Face Detection (Part 2) Viola-Jones and AdaBoost Guest Instructor: Andras Ferencz (Your Regular Instructor: Fei-Fei Li) Thanks to Fei-Fei Li,

EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.

Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.

Viola/Jones: features “Rectangle filters” Differences between sums of pixels in adjacent rectangles { y t (x) = +1 if h t (x) >  t -1 otherwise Unique.

Detecting Pedestrians by Learning Shapelet Features

More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.

The Viola/Jones Face Detector (2001)

Beyond bags of features: Part-based models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.

Fitting: The Hough transform

Object Category Detection: Statistical Templates Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/14/15.

Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.

1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.

1 Model Fitting Hao Jiang Computer Science Department Oct 8, 2009.

Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson

Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.

Face Detection CSE 576. Face detection State-of-the-art face detection demo (Courtesy Boris Babenko)Boris Babenko.

Generic object detection with deformable part-based models

“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)

Fitting: The Hough transform. Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not.

Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,

Window-based models for generic object detection Mei-Chen Yeh 04/24/2012.

Marco Pedersoli, Jordi Gonzàlez, Xu Hu, and Xavier Roca

Lecture 29: Face Detection Revisited CS4670 / 5670: Computer Vision Noah Snavely.

Face detection Slides adapted Grauman & Liebe’s tutorial

Visual Object Recognition

Object Detection with Discriminatively Trained Part Based Models

Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.

Deformable Part Model Presenter ： Liu Changyu Advisor ： Prof. Alex Hauptmann Interest ： Multimedia Analysis April 11 st, 2013.

Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.

Recognition II Ali Farhadi. We have talked about Nearest Neighbor Naïve Bayes Logistic Regression Boosting.

Fitting: The Hough transform

Project 3 Results.

Object detection, deep learning, and R-CNNs

Methods for classification and image representation

The Viola/Jones Face Detector A “paradigmatic” method for real-time object detection Training is slow, but detection is very fast Key ideas Integral images.

Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.

Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11.

Lecture 15: Eigenfaces CS6670: Computer Vision Noah Snavely.

Lecture 15: Eigenfaces CS6670: Computer Vision Noah Snavely.

Object Recognizing. Object Classes Individual Recognition.

CS 2750: Machine Learning Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh February 17, 2016.

More sliding window detection: Discriminative part-based models

Visual Object Recognition

Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,

Recent developments in object detection

Object DetectionI Ali Taalimi 01/08/2013.

Cascade for Fast Detection

Object detection with deformable part-based models

Lit part of blue dress and shadowed part of white dress are the same color

Object detection, deep learning, and R-CNNs

Recap: Advanced Feature Encoding

Object detection as supervised classification

R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.

A Tutorial on HOG Human Detection

CS 1674: Intro to Computer Vision Scene Recognition

“The Truth About Cats And Dogs”

Object DetectionII Ali Taalimi 01/08/2013.

Outline Background Motivation Proposed Model Experimental Results

RCNN, Fast-RCNN, Faster-RCNN

Lecture 29: Face Detection Revisited

Presentation transcript:

CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015

Today: Object category detection Window-based approaches: – Review: Viola-Jones detector – Dalal-Triggs pedestrian detector Part-based approaches: – Implicit shape model – Deformable parts model

Object Category Detection Focus on object search: “Where is it?” Build templates that quickly differentiate object patch from background patch Dog or Non-Dog? … Derek Hoiem

Viola-Jones detector: features Feature output is difference between adjacent regions Efficiently computable with integral image: any sum can be computed in constant time “Rectangular” filters Value at (x,y) is sum of pixels above and to the left of (x,y) Integral image Adapted from Kristen Grauman and Lana Lazebnik Value = ∑ (pixels in white area) – ∑ (pixels in black area)

Considering all possible filter parameters: position, scale, and type: 180,000+ possible features associated with each 24 x 24 window Which subset of these features should we use to determine if a window has a face? Use AdaBoost both to select the informative features and to form the classifier Viola-Jones detector: features Kristen Grauman

Viola-Jones detector: AdaBoost Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non- faces) training examples, in terms of weighted error. Outputs of a possible rectangle feature on faces and non-faces. … Resulting weak classifier: For next round, reweight the examples according to errors, choose another filter/threshold combo. Kristen Grauman

Cascading classifiers for detection Form a cascade with low false negative rates early on Apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative Kristen Grauman DEMO

Dalal-Triggs pedestrian detector 1.Extract fixed-sized (64x128 pixel) window at each position and scale 2.Compute HOG (histogram of gradient) features within each window 3.Score the window with a linear SVM classifier 4.Perform non-maxima suppression to remove overlapping detections with lower scores Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Resolving detection scores Non-max suppression Score = 0.1 Score = 0.8 Adapted from Derek Hoiem

Slide by Pete Barnum, Kristen Grauman Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 Map each grid cell in the input window to a histogram counting the gradients per orientation. Train a linear SVM using training set of pedestrian vs. non-pedestrian windows. Code available:

Histogram of gradient orientations – Votes weighted by magnitude – Bilinear interpolation between cells Orientation: 9 bins (for unsigned angles) Histograms in 8x8 pixel cells Slide by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Histograms of oriented gradients (HOG) N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005Histograms of Oriented Gradients for Human Detection 10x10 cells 20x20 cells Image credit: N. Snavely

Histograms of oriented gradients (HOG) N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005Histograms of Oriented Gradients for Human Detection Image credit: N. Snavely

Histograms of oriented gradients (HOG) N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005Histograms of Oriented Gradients for Human Detection

Histograms of oriented gradients (HOG) N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005Histograms of Oriented Gradients for Human Detection

Slide by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 pos w neg w

pedestrian Slide by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Detection examples

Are we done? Single rigid template usually not enough to represent a category – Many objects (e.g. humans) are articulated, or have parts that can vary in configuration – Many object categories look very different from different viewpoints, or from instance to instance Slide by N. Snavely

Deformable objects Images from Caltech-256 Slide Credit: Duan Tran

Deformable objects Images from D. Ramanan’s dataset Slide Credit: Duan Tran

Parts-based Models Define object by collection of parts modeled by 1.Appearance 2.Spatial configuration Slide credit: Rob Fergus

How to model spatial relations? One extreme: fixed template Derek Hoiem

Part-based template Object model = sum of scores of features at fixed positions = = 10.5 > 7.5 ? ? Non-object Object Derek Hoiem

How to model spatial relations? Another extreme: bag of words = Derek Hoiem

How to model spatial relations? Star-shaped model Root Part Derek Hoiem

How to model spatial relations? Star-shaped model = X X X ≈ Root Part Derek Hoiem

Parts-based Models Articulated parts model – Object is configuration of parts – Each part is detectable and can move around Adapted from Derek Hoiem, images from Felzenszwalb

Implicit shape models Visual vocabulary is used to index votes for object position [a visual word = “part”] B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004Combined Object Categorization and Segmentation with an Implicit Shape Model visual codeword with displacement vectors training image annotated with object localization info Lana Lazebnik

Implicit shape models Visual vocabulary is used to index votes for object position [a visual word = “part”] B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004Combined Object Categorization and Segmentation with an Implicit Shape Model test image Lana Lazebnik

Implicit shape models: Training 1.Build vocabulary of patches around extracted interest points using clustering Lana Lazebnik

Implicit shape models: Training 1.Build vocabulary of patches around extracted interest points using clustering 2.Map the patch around each interest point to closest word Lana Lazebnik

Implicit shape models: Training 1.Build vocabulary of patches around extracted interest points using clustering 2.Map the patch around each interest point to closest word 3.For each word, store all positions it was found, relative to object center Lana Lazebnik

Implicit shape models: Testing 1.Given new test image, extract patches, match to vocabulary words 2.Cast votes for possible positions of object center 3.Search for maxima in voting space Lana Lazebnik

Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial K. Grauman, B. Leibe Original image Example: Results on Cows

Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial K. Grauman, B. Leibe Original imageInterest points Example: Results on Cows

Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial K. Grauman, B. Leibe Original imageInterest points Matched patches Example: Results on Cows

Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial K. Grauman, B. Leibe 1 st hypothesis Example: Results on Cows

Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial K. Grauman, B. Leibe 2 nd hypothesis Example: Results on Cows

Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial K. Grauman, B. Leibe Example: Results on Cows 3 rd hypothesis

Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial K. Grauman, B. Leibe Detection Results Qualitative Performance  Recognizes different kinds of objects  Robust to clutter, occlusion, noise, low contrast

Discriminative part-based models P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010Object Detection with Discriminatively Trained Part Based Models Root filter Part filters Deformation weights Lana Lazebnik

Discriminative part-based models P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010Object Detection with Discriminatively Trained Part Based Models Multiple components Lana Lazebnik

Discriminative part-based models P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010Object Detection with Discriminatively Trained Part Based Models Lana Lazebnik

Scoring an object hypothesis The score of a hypothesis is the sum of appearance scores minus the sum of deformation costs Appearance weights Subwindow features Deformation weights Displacements Adapted from Lana Lazebnik

Detection Lana Lazebnik

Training Training data consists of images with labeled bounding boxes Need to learn the filters and deformation parameters Lana Lazebnik

Training Our classifier has the form w are model parameters, z are latent hypotheses Latent SVM training: Initialize w and iterate: Fix w and find the best z for each training example Fix z and solve for w (standard SVM training) Lana Lazebnik

Car model Component 1 Component 2 Lana Lazebnik

Car detections Lana Lazebnik

Person model Lana Lazebnik

Person detections Lana Lazebnik

Bottle model Lana Lazebnik

Cat model Lana Lazebnik

Cat detections Lana Lazebnik

Detection state of the art Object detection system overview. Our system (1) takes an input image, (2) extracts around 2000 bottom-up region proposals, (3) computes features for each proposal using a large convolutional neural network (CNN), and then (4) classifies each region using class-specific linear SVMs. R-CNN achieves a mean average precision (mAP) of 53.7% on PASCAL VOC For comparison, Uijlings et al. (2013) report 35.1% mAP using the same region proposals, but with a spatial pyramid and bag-of-visual- words approach. The popular deformable part models perform at 33.4%. R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation Lana Lazebnik

Summary Window-based approaches – Assume object appears in roughly the same configuration in different images – Look for alignment with a global template Part-based methods – Allow parts to move somewhat from their usual locations – Look for good fits in appearance, for both the global template and the individual part templates Next time: Analyzing and debugging detection methods