3Review: rigid vs. deformable Rigid objectsTackles local variations, hardly handle deformationsClassical cascaded AdaBoostDeformable part models (DPM)Spatial pyramid matching (SPM) of BoWDeformable objectsHigh deformation tolerance results in imprecise localization or false positives for rigid objectsCan we mode both rigid and deformable objects in a unified framework?
4Review: parameter selection Deformable Part-based Model (DPM)Specify the number of deformable partsSpatial Pyramid MatchingSpecify the number of pyramids to buildDo we have to pre-define model parameters tohandle different degrees of deformation?
5Review: multi scales/viewpoints DPMResize an image to detect objects at a fixed scaleMultiple models, each deals with one viewpointSpatial Pyramid MatchingNo need to resize the imageOne model, a codebook is used to encode featuresCan we learn a model that can be easily adapted to arbitrary scales and viewpoints?
6Review: sliding window vs. selective search Classical: sliding windowCheck hundreds of thousands (if not millions) of sliding windows.Resize image and detect objects at fixed scaleNeeds to use very efficient classifiersRecent: selective searchThousands of object-independent candidate regions[12PAMI] B. Alexe, T. Deselaers, and V. Ferrari, Measuring the objectness of image window.[13IJCV] K. E. A. Van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, Selective search for object recognition[13PAMI] Ian Endres, and Derek Hoiem, Category-independent object proposals with diverse ranking.[11ICCV] E. Rahtu, J. Kannala, and M. Blaschko. Learning a category independent object detection cascade.[11CVPR] Z. Zhang, J. Warrell, and P. H. S. Torr. Proposal generation for object detection using cascaded ranking SVMs. InCVPR, 2011Allowing strong classifier, needs to work with arbitrary window size
7Motivation A flexible and general object-level representation with Hassle free deformation handlingArbitrary scales and aspect ratio handling
9Regionlet: Definition Region (𝑅): feature extraction regionRegionlet ( 𝑟 1 , 𝑟 2 , 𝑟 3 ): A sub-region in a feature extraction area whose position/resolution are relative and normalized to a detection windowBounding boxRegionRegionletsWhy regionlets?Some part of the region may not be informative or even distractive.Why ‘regionlets’?Define sub-parts of regions, as the basic units to extract appearance features. Then organize these subparts into small groups which are more flexible to describe distinct object categories with different degree of deformations.The features of these subregions will be aggregated to a single feature for R, and they are bellow the level of standalone feature extraction region in an object classifier.
10Regionlet: definition (cont.) Relative normalized position
12Regionlet Constructing the regions/regionlets pool Small region, fewer regionlets -> fine spatial layoutLarge region, more regionlets -> robust to deformationUse cascaded boosting learning to select the most discriminative regionlets from a largely over-complete pool
13Regionlet: Training Construct regionlets pool Enumerating all possible regions is impractical and not necessary.Propose a set of regionlets with random positions (up to 5) inside each region with identical size.
14Regionlet: TrainingConstructing the regions/regionlets poolLearning Boosting cascades16K region/regionlets candidates for each cascadeLearning of each cascade stops when the error rate is achieved (1% for positive, 37.5% for negative)Last cascade stops after collecting 5000 weak classifiersResult in 4-7 cascades2-3 hours to finish training one category on a 8-core machine
15Regionlets: Testing No image resizing Any scale, any aspect ratio Adapt the model size to the same size as the object candidate bounding box
16Experiments Datasets Investigated Features PASCAL VOC 2007, 2010 20 object categoriesImageNet Large Scale Object Detection Dataset200 object categoriesInvestigated FeaturesHOGLBPCovarianceDeep Convolutional Neural Network (DCNN) feature (only for the ImageNet challenge)
17Experiments: PASCAL VOC BaselinesDPM: excellent at detecting rigid objects. E.g. car, bus, etc.SS_SPM: for objects with significant global deformations.E.g. cat, cow sheep, etc.SPM outperforms DPM in airplane and TV. Due to very diverse viewpoints or rich sub-categories.Objectness (use old version of DPM): selective search itself dose not benefit detection tool much (0.6% improvement) in terms of accuracy.
18Experiments: PASCAL VOC Baselines:DPMSPMObjectnessRegionletsWon 16 out of 20 categories‘To our best knowledge, is the best performance in VOC07’Multiple regionlets consistently outperforms single regionlets
21Running speed0.2 second per image using a single core if candidate bounding boxes are given, real time(>30 frames per second) using 8 cores 2 seconds per image to generate candidate bounding boxes 2-3 hours to finish training one category on a 8-core machine
22Conclusions A new object representation for object detection Non-local max-pooling of regionletsRelative normalized locations of regionletsFlexibility to incorporate various types of featuresA principled data-driven detection framework, effective in handling deformation, multiple scales, multiple viewpointsSuperior performance with a fast running speed
23Latest resultsRegionlets with Deep CNN feature (outside data)
24Some really exciting parts during the reading group will be made public valuable after CVPR 2014 results announced, together with source code!
25My thoughts towards a real-time system Current bottleneck is object proposal generationState of the art methods needs 2+ seconds to process one image [IJCV13, PAMI12, PAMI13]Our recent resultsTraining on VOC: 20sTesting speed: 300fpsGeneric ability