Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,

Slides:



Advertisements
Similar presentations
Rich feature Hierarchies for Accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitandra Malik (UC Berkeley)
Advertisements

Diagnosing Error in Object Detectors Department of Computer Science University of Illinois at Urbana-Champaign (UIUC) Derek Hoiem Yodsawalai Chodpathumwan.
Road-Sign Detection and Recognition Based on Support Vector Machines Saturnino, Sergio et al. Yunjia Man ECG 782 Dr. Brendan.
Lecture 6: Classification & Localization
Caroline Rougier, Jean Meunier, Alain St-Arnaud, and Jacqueline Rousseau IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 5,
ImageNet Classification with Deep Convolutional Neural Networks
Large-Scale Object Recognition with Weak Supervision
Ghunhui Gu, Joseph J. Lim, Pablo Arbeláez, Jitendra Malik University of California at Berkeley Berkeley, CA
A Robust Pedestrian Detection Approach Based on Shapelet Feature and Haar Detector Ensembles Wentao Yao, Zhidong Deng TSINGHUA SCIENCE AND TECHNOLOGY ISSNl.
Recognition using Regions CVPR Outline Introduction Overview of the Approach Experimental Results Conclusion.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
PANDA: Pose Aligned Networks for Deep Attribute Modeling Ning Zhang1;2, Manohar Paluri1, Marc’Aurelio Ranzato1, Trevor Darrell2, Lubomir Bourdev1 1: Facebook.
1 Accurate Object Detection with Joint Classification- Regression Random Forests Presenter ByungIn Yoo CS688/WST665.
R-CNN By Zhang Liliang.
Distributed Representations of Sentences and Documents
Spatial Pyramid Pooling in Deep Convolutional
From R-CNN to Fast R-CNN
Face Recognition Using Neural Networks Presented By: Hadis Mohseni Leila Taghavi Atefeh Mirsafian.
Generic object detection with deformable part-based models
Convolutional Neural Networks for Image Processing with Applications in Mobile Robotics By, Sruthi Moola.
Presented by: Kamakhaya Argulewar Guided by: Prof. Shweta V. Jain
Avoiding Segmentation in Multi-digit Numeral String Recognition by Combining Single and Two-digit Classifiers Trained without Negative Examples Dan Ciresan.
Detection, Segmentation and Fine-grained Localization
Object Detection with Discriminatively Trained Part Based Models
Object Recognition in Images Slides originally created by Bernd Heisele.
Object detection, deep learning, and R-CNNs
VIP: Finding Important People in Images Clint Solomon Mathialagan Andrew C. Gallagher Dhruv Batra CVPR
The Viola/Jones Face Detector A “paradigmatic” method for real-time object detection Training is slow, but detection is very fast Key ideas Integral images.
Fully Convolutional Networks for Semantic Segmentation
Deep Convolutional Nets
Neural networks in modern image processing Petra Budíková DISA seminar,
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.
Recognition Using Visual Phrases
Learning Features and Parts for Fine-Grained Recognition Authors: Jonathan Krause, Timnit Gebru, Jia Deng, Li-Jia Li, Li Fei-Fei ICPR, 2014 Presented by:
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.
Convolutional Restricted Boltzmann Machines for Feature Learning Mohammad Norouzi Advisor: Dr. Greg Mori Simon Fraser University 27 Nov
Unsupervised Visual Representation Learning by Context Prediction
Cascade Region Regression for Robust Object Detection
Cell Segmentation in Microscopy Imagery Using a Bag of Local Bayesian Classifiers Zhaozheng Yin RI/CMU, Fall 2009.
Convolutional Neural Network
Lecture 4a: Imagenet: Classification with Localization
Spatial Localization and Detection
Deep Learning Overview Sources: workshop-tutorial-final.pdf
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition arXiv: v4 [cs.CV(CVPR)] 23 Apr 2015 Kaiming He, Xiangyu Zhang, Shaoqing.
Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.
When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.
Recent developments in object detection
Object Detection based on Segment Masks
Compact Bilinear Pooling
Object detection with deformable part-based models
Krishna Kumar Singh, Yong Jae Lee University of California, Davis
Nonparametric Semantic Segmentation
Object Localization Goal: detect the location of an object within an image Fully supervised: Training data labeled with object category and ground truth.
Object detection as supervised classification
R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.
Object detection.
Multiple Organ Detection in CT Volumes using CNN Week 2
Computer Vision James Hays
Image Classification.
Object Detection + Deep Learning
Faster R-CNN By Anthony Martinez.
Outline Background Motivation Proposed Model Experimental Results
Analysis of Trained CNN (Receptive Field & Weights of Network)
RCNN, Fast-RCNN, Faster-RCNN
Deep Learning Authors: Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Department of Computer Science Ben-Gurion University of the Negev
Object Detection Implementations
Jiahe Li
Presentation transcript:

Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik UC Berkeley 1

Outline Introduction Object detection with R-CNN Visualization, ablation, and modes of error Semantic segmentation Conclusion 2

Outline Introduction Object detection with R-CNN Visualization, ablation, and modes of error Semantic segmentation Conclusion 3

Introduction In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012—achieving a mAP of 53.3% 4

Introduction Our approach combines two key insights : 1.Apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects 2.When labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost 5

Outline Introduction Object detection with R-CNN Visualization, ablation, and modes of error Semantic segmentation Conclusion 6

Object detection with R-CNN R-CNN 7

Object detection with R-CNN Our object detection system consists of three modules : 1.Region proposals 2.Feature extraction 3.A set of classifiers 8

Region proposals Objectness [1] Selective search [34] Category-independent object proposals [12] Constrained parametric min-cuts (CPMC) [5] Multi-scale combinatorial grouping [3] Ciresan, et al. [6] (Mitosis detection in breast cancer) 9

Region proposals Objectness [1] Selective search [34] Category-independent object proposals [12] Constrained parametric min-cuts (CPMC) [5] Multi-scale combinatorial grouping [3] Ciresan, et al. [6] (Mitosis detection in breast cancer) 10

Feature extraction We extract a 4096-dimensional feature vector from each region proposal using the Caffe [22] implementation of the CNN described by Krizhevsky et al. [23] (AlexNet) 11

Feature extraction Features are computed by forward propagating a mean-subtracted 227 × 227 RGB image We warp all pixels in a tight bounding box around it to the required size Dilate the tight bounding box so that at the warped size there are exactly p pixels of warped image context around the original box (we use p = 16) Outperformed 3-5 mAP points 12

Feature extraction 13

Feature extraction Training : 1.Supervised pre-training Using the open source Caffe CNN library 2.Domain-specific fine-tuning We continue stochastic gradient descent (SGD) training of the CNN parameters using only warped region proposals from Visual Object Classes (VOC) 14

Feature extraction Replacing the CNN’s ImageNet-specific 1000-way classification layer with a randomly initialized (N+1)-way (VOC N=20) classification layer (N classes + 1 background) We treat all region proposals with ≥ 0.5 intersection-over-union(IoU) overlap with a ground-truth box as positives SGD learning rate = (1/10th of the initial pre-training rate) In each SGD iteration, we uniformly sample 32 positive windows (over all classes) and 96 background windows 15

A set of classifiers Linear SVMs Training binary classifiers Non-maximum suppression Below an IoU overlap threshold (0.3) which regions are defined as negatives Positive examples are defined simply to be the ground-truth bounding boxes for each class Proposals that fall into the grey zone (more than 0.3 IoU overlap, but are not ground truth) are ignored 16

17

Outline Introduction Object detection with R-CNN Visualization, ablation, and modes of error Semantic segmentation Conclusion 18

Visualization, ablation, and modes of error Visualizing learned features Ablation studies Detection error analysis Bounding box regression 19

Visualizing learned features 20

21

Ablation studies 22

23

Detection error analysis To understand some finer details, we applied the excellent detection analysis tool from Hoiem et al. [21] in order to reveal our method’s error modes, understand how fine-tuning changes them, and to see how our error types compare with DPM False positive False negative [21] D. Hoiem, Y. Chodpathumwan, and Q. Dai. Diagnosing error in object detectors. In ECCV

Detection error analysis – False Positive Loc — poor localization (a detection with an IoU overlap with the correct class between 0.1 and 0.5, or a duplicate) Sim — confusion with a similar category Oth — confusion with a dissimilar object category BG — a FP that fired on background 25

26

Detection error analysis – False Negative Sensitivity to object characteristics Occlusion, Truncation, Bounding box area, Aspect ratio, Viewpoint, Part visibility Fine-tuning does not reduce sensitivity 27

Bounding box regression 28

Bounding box regression scale-invariant translation log-space translations 29

Bounding box regression We learn by optimizing the regularized least squares objective 30

Outline Introduction Object detection with R-CNN Visualization, ablation, and modes of error Semantic segmentation Conclusion 31

Semantic segmentation 32

Semantic segmentation CNN features for segmentation, three strategies : Full : Ignores the region’s shape and computes CNN features directly on the warped window. However, two regions might have very similar bounding boxes while having very little overlap FG : Foreground mask. We replace the background with the mean input so that background regions are zero after mean subtraction 33

Semantic segmentation 34

35

36

37

38

39

Outline Introduction Object detection with R-CNN Visualization, ablation, and modes of error Semantic segmentation Conclusion 40

Conclusion This paper presents a simple and scalable object detection algorithm that gives a 30% relative improvement over the best previous results on PASCAL VOC 2012 We achieved this performance through two insights : 1.Apply high-capacity convolutional neural networks to bottom-up region proposals in order to localize and segment objects 2.A paradigm for training large CNNs when labeled training data is scarce We show that it is highly effective to pre-train the network with supervision and then fine-tune 41

Thanks for listening! 42