Regionlets for Generic Object Detection IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 37, NO. 10, OCTOBER 2015 Xiaoyu Wang, Ming.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Regionlets for Generic Object Detection
Context-based object-class recognition and retrieval by generalized correlograms by J. Amores, N. Sebe and P. Radeva Discussion led by Qi An Duke University.
Evaluating Color Descriptors for Object and Scene Recognition Koen E.A. van de Sande, Student Member, IEEE, Theo Gevers, Member, IEEE, and Cees G.M. Snoek,
Detecting Faces in Images: A Survey
Limin Wang, Yu Qiao, and Xiaoou Tang
Ming-Ming Cheng 1 Ziming Zhang 2 Wen-Yan Lin 3 Philip H. S. Torr 1 1 Oxford University, 2 Boston University 3 Brookes Vision Group Training a generic objectness.
ImageNet Classification with Deep Convolutional Neural Networks
Ivan Laptev IRISA/INRIA, Rennes, France September 07, 2006 Boosted Histograms for Improved Object Detection.
Presenter: Hoang, Van Dung
Object-centric spatial pooling for image classification Olga Russakovsky, Yuanqing Lin, Kai Yu, Li Fei-Fei ECCV 2012.
EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.
A KLT-Based Approach for Occlusion Handling in Human Tracking Chenyuan Zhang, Jiu Xu, Axel Beaugendre and Satoshi Goto 2012 Picture Coding Symposium.
Large-Scale Object Recognition with Weak Supervision
High-level Component Filtering for Robust Scene Text Detection
Recognition using Regions CVPR Outline Introduction Overview of the Approach Experimental Results Conclusion.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson
Robust Real-time Object Detection by Paul Viola and Michael Jones ICCV 2001 Workshop on Statistical and Computation Theories of Vision Presentation by.
1 Integration of Background Modeling and Object Tracking Yu-Ting Chen, Chu-Song Chen, Yi-Ping Hung IEEE ICME, 2006.
Scale Invariant Feature Transform (SIFT)
PANDA: Pose Aligned Networks for Deep Attribute Modeling Ning Zhang1;2, Manohar Paluri1, Marc’Aurelio Ranzato1, Trevor Darrell2, Lubomir Bourdev1 1: Facebook.
Spatial Pyramid Pooling in Deep Convolutional
On the Object Proposal Presented by Yao Lu
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
Face Processing System Presented by: Harvest Jang Group meeting Fall 2002.
Fast Human Detection Using a Novel Boosted Cascading Structure With Meta Stages Yu-Ting Chen and Chu-Song Chen, Member, IEEE.
Generic object detection with deformable part-based models
Olga Zoidi, Anastasios Tefas, Member, IEEE Ioannis Pitas, Fellow, IEEE
Object Bank Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 4 th, 2013.
“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)
Video Tracking Using Learned Hierarchical Features
Detection, Segmentation and Fine-grained Localization
Marco Pedersoli, Jordi Gonzàlez, Xu Hu, and Xavier Roca
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
BING: Binarized Normed Gradients for Objectness Estimation at 300fps
Pedestrian Detection and Localization
Deformable Part Model Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 11 st, 2013.
BING: Binarized Normed Gradients for Objectness Estimation at 300fps
Locality-constrained Linear Coding for Image Classification
Object detection, deep learning, and R-CNNs
Hierarchical Matching with Side Information for Image Classification
Human Detection Method Combining HOG and Cumulative Sum based Binary Pattern Jong Gook Ko', Jin Woo Choi', So Hee Park', Jang Hee You', ' Electronics and.
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
Robust Nighttime Vehicle Detection by Tracking and Grouping Headlights Qi Zou, Haibin Ling, Siwei Luo, Yaping Huang, and Mei Tian.
Object Recognition by Discriminative Combinations of Line Segments and Ellipses Alex Chia ^˚ Susanto Rahardja ^ Deepu Rajan ˚ Maylor Leung ˚ ^ Institute.
Learning Hierarchical Features for Scene Labeling
Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,
Finding Clusters within a Class to Improve Classification Accuracy Literature Survey Yong Jae Lee 3/6/08.
Spatial Localization and Detection
WLD: A Robust Local Image Descriptor Jie Chen, Shiguang Shan, Chu He, Guoying Zhao, Matti Pietikäinen, Xilin Chen, Wen Gao 报告人:蒲薇榄.
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition arXiv: v4 [cs.CV(CVPR)] 23 Apr 2015 Kaiming He, Xiangyu Zhang, Shaoqing.
Evaluation of Gender Classification Methods with Automatically Detected and Aligned Faces Speaker: Po-Kai Shen Advisor: Tsai-Rong Chang Date: 2010/6/14.
Face Detection 蔡宇軒.
When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.
Recent developments in object detection
Object detection with deformable part-based models
Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek
Learning Mid-Level Features For Recognition
Classification of Hand-Written Digits Using Scattering Convolutional Network Dongmian Zou Advisor: Professor Radu Balan.
R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.
Outline Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no.
Convolutional Neural Networks for Visual Tracking
KFC: Keypoints, Features and Correspondences
Outline Background Motivation Proposed Model Experimental Results
Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu
Human-object interaction
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
Presentation transcript:

Regionlets for Generic Object Detection IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 37, NO. 10, OCTOBER 2015 Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin 1

Outline  INTRODUCTION  REGIONLETS FOR DETECTION  SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  LEARNING THE REGIONLET MODEL  EXPERIMENTS  CONCLUSION 2

Outline  INTRODUCTION  REGIONLETS FOR DETECTION  SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  LEARNING THE REGIONLET MODEL  EXPERIMENTS  CONCLUSION 3

INTRODUCTION  Despite the success of face detection (roughly rigid), different object classes demonstrate variable degrees of deformation or due to viewing distances or angles  Dilemma :  Model rigid object → Hardly handle deformable objects  High tolerance of deformation → Imprecise or false positives 4

INTRODUCTION  Primarily four typical strategies : 1.The classical Adaboost detector models local variations with an ensemble classifier of fast features (roughly rigid) 2.Deformable part model (DPM) method 3.Spatial pyramid matching (SPM) of bag-of-words (BoW) models 4.Deep convolutional neural networks (DCNN) 5

INTRODUCTION  The major contributions of this paper are three-fold : 1.The novel regionlet-based representation that models relative spatial layouts inside an object. 2.The regionlet-based detector efficiently applies to arbitrary bounding boxes at different scales and aspect ratios 3.The support pixel integral image allows fast access to feature vectors in arbitrary regions given a set of spatially sparse features in an image 6

OUTLINE  INTRODUCTION  REGIONLETS FOR DETECTION  SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  LEARNING THE REGIONLET MODEL  EXPERIMENTS  CONCLUSION 7

REGIONLETS FOR DETECTION  Object detection is composed of two key components : 1.Determining where the candidate object locations are 2.Discerning whether they are the objects of interests  Exhaustive search → Selective search [8]  Selective search over-segments an image into superpixels, and then groups the superpixels to propose candidate bounding boxes (about 1,000~2,000 each image, achieve a very high recall) 8 [8] K. E. A. Van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, “Segmentation as selective search for object recognition,” in Proc. Int. Conf. Comput. Vis., 2011, pp. 1879–1886

REGIONLETS FOR DETECTION  In our proposed method, we : 1.construct a large regionlet feature pool 2.design a cascaded boosting learning process to select the most discriminative regionlets for detection 9

Regionlets Definition 10 R will contribute a weak classifier to the boosting classifier

Regionlets Definition  A hand, as a supposedly informative part for a person, may appear at different locations within the bounding box of a person  More regionlets in R will increase the capacity to model deformations  Rigid objects may only require one regionlet from a feature extraction region 11

Regionlets Normalized by Detection Windows  Candidate bounding boxes derived from the selective search approach [8]  However, these proposed bounding boxes have arbitrary sizes and aspect ratios  Using the relative positions and sizes of the regionlets and their groups to an object bounding box 12

Regionlets Normalized by Detection Windows 13

Region Feature Extraction  Feature extraction from R takes two steps : 1.Extracting appearance features 2.Generating the representation of R based on regionlets’ features  In the first step, efficient extraction is straightforward for dense features where the features are accessible at every pixel location  Histogram of Oriented Gradients (HOG)  Local Binary Pattern (LBP)  Covariance features  Yet complicated for spatially sparse features  Sparse Scale-invariant feature transform (SIFT) features 14

Region Feature Extraction  We apply : 1.L2 normalization for HOG features 2.L1 normalization for LBP features 3.Covariance features are normalized by the corresponding variance 15

Region Feature Extraction  For the second step, we define a max-pooling operation over features extracted from individual regionlet  It is motivated by that a permutation invariant and exclusive operation over regionlet features allows for deformations inside these regionlets 16

Region Feature Extraction 17

Region Feature Extraction  We have millions of such 1D features in a detection window and the most discriminative ones are determined through a boosting type learning process 18

Outline  INTRODUCTION  REGIONLETS FOR DETECTION  SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  LEARNING THE REGIONLET MODEL  EXPERIMENTS  CONCLUSION 19

SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  The regionlet-based detector requires efficient access to features from arbitrary regionlets  This is not an issue for dense features such as HOG and LBP histograms, since the computations like gradient calculation or LBP extraction are amortized among multiple regionlets  However, this is hard for spatially sparse features, where the features are only available on a small set of pixel locations, e.g., sparse SIFT or DCNN features 20

Support Pixel Integral Images 21 Integral Images

Support Pixel Integral Images 22

Support Pixel Integral Images  The first layer stores effective integral vectors on a set of support pixels  The second layer produces a dense map with the same size as the input image, indicating the hashing table entry of the corresponding integral vector 23

Support Pixel Integral Images 24

Support Pixel Integral Images 25

Support Pixel Integral Images 26

27

SPII on DCNN Responses  In this section, we explain how to incorporate DCNN as a feature generator in the regionlet framework using support pixel integral images  DCNN : 1.Convolutional layers 2.Max-pooling layers 3.Fully-Connected layers 28 Deep Convolutional Neural Network Preserve their spatial support information (higher layers) Effective but inefficient

SPII on DCNN Responses 29 [39] for feature extraction The architecture is similar to [13] [39] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “DeCAF: A deep convolutional activation feature for generic visual recognition,” arXiv preprint arXiv: , [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Neural Inf. Process. Syst., 2012, pp. 1097–1105. with a 16 x 16 pixel stride

SPII on DCNN Responses 30

Outline  INTRODUCTION  REGIONLETS FOR DETECTION  SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  LEARNING THE REGIONLET MODEL  EXPERIMENTS  CONCLUSION 31

LEARNING THE REGIONLET MODEL  We follow the boosting framework to learn the discriminative regionlet groups and their configurations from a huge pool of candidate regions and regionlets 32

Regions/Regionlets Pool Construction  Before regionlet learning  R’ : a feature region prototype  r’ : a regionlet prototype  R’ = (l’, t’, r’, b’, k)  k : kth element of the low-level feature vector of the region  Enumerating all possible regions is impractical and not necessary. We employ a sampling process to reduce the pool size 33

34

Regions/Regionlets Pool Construction 35

Training with Boosting Regionlet Features 36 [40] R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence- rated predictions,” Mach. Learn., vol. 37, pp. 297–336, 1999 [18] C. Huang, H. Ai, B. Wu, and S. Lao, “Boosting nested cascade detector for multi- view face detection,” in Proc. Int. Conf. Patter Recognit., 2004, pp. 415–418.

Training with Boosting Regionlet Features 37

Training with Boosting Regionlet Features 38

Testing with Regionlets 39

Outline  INTRODUCTION  REGIONLETS FOR DETECTION  SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  LEARNING THE REGIONLET MODEL  EXPERIMENTS  CONCLUSION 40

EXPERIMENTS  PASCAL VOC 2007 and VOC 2010 datasets  mean of Average Precisions (mAP)  The PASCAL VOC datasets have two evaluation tracks: 1.Comp3 : without external training data 2.Comp4 : use DCNN features  The much larger ImageNet object detection dataset (ILSVRC2013) which has 200 object categories 41

Experiment on PASCAL VOC Datasets  Selective search method [8] → bounding boxes  HOG,LBP and covariance features → features 42

Experiment on PASCAL VOC Datasets 43

Experiment on PASCAL VOC Datasets 44

Experiment on PASCAL VOC Datasets 45 [34] R. Cinbis, J. Verbeek, and C. Schmid, “Segmentation driven object detection with Fisher vectors,” in Proc. Int. Conf. Comput. Vis., 2013, pp. 2968–2975 [41] Z. Song, Q. Chen, Z. Huang, Y. Hua, and S. Yan, “Contextualizing object detection and classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2011, pp. 1585–1592.

Experiment on PASCAL VOC Datasets 46

Experiment on PASCAL VOC Datasets 47

48

49

Experiment on PASCAL VOC Datasets  Speed :  12-core Xeon 2.1 GHz blade servers (multi-threading)  Training for one category on a single machine takes 4 to 8 hours  The detection runs at 50 frames per second on a server 5 frames per second :  single core  object region proposals are given 2 frames per second :  single core  computing object proposals 50

Experiment on PASCAL VOC Datasets  Recently, Cheng et al. [46] show that real-time object proposal generation can be achieved 51 [46] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. H. S. Torr, “BING: Binarized normed gradients for objectness estimation at 300fps,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 3286– 3293.

Experiment on ImageNet Object Detection  the 200 object categories with 20 machines (each machine has 8-12 cores)  Training finishes in 2.5 days  Testing finishes in roughly 1 hour 52

Experiment on ImageNet Object Detection  ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2013) 53

Outline  INTRODUCTION  REGIONLETS FOR DETECTION  SUPPORT PIXEL INTEGRAL IMAGES FOR REGIONLETS  LEARNING THE REGIONLET MODEL  EXPERIMENTS  CONCLUSION 54

CONCLUSION  Our regionlet model can adapt itself for detecting rigid objects, objects with small local deformations or large long-range deformations  Incorporate various types of features, e.g., dense HOG, LBP, Covariance features and sparse DCNN response features by support pixel integral images  PASCAL VOC datasets, ImageNet object detection dataset 55

Thanks for listening! 56