From R-CNN to Fast R-CNN

Detection: From R-CNN to Fast R-CNN
Reporter: Liliang Zhang

Object Detection: Intuition
Detection ≈ Localization + Classification

Outline: R-CNN, SPP-Net, Fast R-CNN

R-CNN: Pipeline Overview
Step 1. Input an image.
Step 2. Use selective search to obtain ~2k proposals.
Step 3. Warp each proposal to a fixed size and apply the CNN to extract its features.
Step 4. Score each proposal with class-specific SVMs.
Step 5. Rank the proposals and apply non-maximum suppression (NMS) to get the bboxes.
Step 6. Refine the bboxes' positions with class-specific regressors.
Ross Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR14
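Steps 5 and 6 can be sketched as follows. This is a minimal illustration, not the R-CNN code: greedy NMS with an assumed IoU threshold of 0.3, plus a box-refinement helper that applies center shifts (dx, dy) and log-space size changes (dw, dh) to a proposal. Box width is taken as x2 - x1 (no +1 pixel convention), which is a simplifying assumption.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.3) -> List[int]:
    """Greedy NMS (Step 5): keep the best box, drop heavy overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def apply_deltas(p: Box, d: Tuple[float, float, float, float]) -> Box:
    """Step 6: refine a proposal with regressed offsets (dx, dy, dw, dh)."""
    pw, ph = p[2] - p[0], p[3] - p[1]
    px, py = p[0] + 0.5 * pw, p[1] + 0.5 * ph
    dx, dy, dw, dh = d
    gx, gy = px + pw * dx, py + ph * dy            # shift center by a fraction of box size
    gw, gh = pw * math.exp(dw), ph * math.exp(dh)  # rescale width/height in log space
    return (gx - 0.5 * gw, gy - 0.5 * gh, gx + 0.5 * gw, gy + 0.5 * gh)
```

For example, `nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps indices 0 and 2, because the second box overlaps the first with IoU of about 0.68.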

R-CNN: Performance on PASCAL VOC07
AlexNet (T-Net): 58.5 mAP
VGG-Net (O-Net): 66.0 mAP

R-CNN: Limitations
Too slow: 13 s/image on a GPU or 53 s/image on a CPU, and VGG-Net is 7x slower, because the CNN is run separately on every one of the ~2k warped proposals.
Proposals need to be warped to a fixed size.

SPP-Net: Motivation
Cropping may lose some information about the object.
Warping may change the object's appearance.
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, TPAMI15

SPP-Net: Spatial Pyramid Pooling (SPP) Layer
FC layers need a fixed-length input, while conv layers can accept inputs of arbitrary size. Thus we need a bridge between the conv and FC layers: the SPP layer, which pools a feature map of any size into a fixed-length vector.
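A minimal pure-Python sketch of the SPP layer idea, assuming max pooling and an illustrative pyramid of 4×4, 2×2 and 1×1 bins (the level set is an assumption, not a value taken from these slides). Whatever the input height and width, the output length is C × (16 + 4 + 1), which is what lets the FC layers receive a fixed-length vector.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def spp(fmap: FeatureMap, levels: Sequence[int] = (4, 2, 1)) -> List[float]:
    """Max-pool a C x H x W feature map into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid of bins
    (bins may overlap slightly when H or W is not divisible by n), and each
    bin is max-pooled per channel. The output length is
    C * sum(n * n for n in levels), independent of H and W.
    """
    out: List[float] = []
    for n in levels:
        for ch in fmap:
            h, w = len(ch), len(ch[0])
            for i in range(n):
                # floor / ceil bin edges guarantee every bin is non-empty
                r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
                for j in range(n):
                    c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                    out.append(max(ch[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out
```

Two feature maps of different spatial sizes (say 5×7 and 13×9, both with 2 channels) both come out as vectors of length 2 × 21 = 42.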

SPP-Net: Training for Detection (1)
Step 1. Generate an image pyramid and extract the conv5 feature map of the whole image at each scale.
[Figure: image pyramid → conv layers → pyramid of conv5 feature maps]

SPP-Net: Training for Detection (2)
Step 2. For each proposal, walk the image pyramid and pick the scale at which the projected proposal has a pixel count closest to 224×224 (for scale invariance in training).
Step 3. Find the proposal's corresponding region in the conv5 feature map and use the SPP layer to pool it to a fixed size.
Step 4. After extracting all proposals' features, fine-tune only the FC layers.
Step 5. Train the class-specific SVMs.
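Step 2 can be sketched as a one-line search over the pyramid. This assumes each pyramid level resizes the image so that its shorter side equals the scale s, and the scale set below is an assumed example, not taken from the slides.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in original-image pixels


def pick_scale(box: Box, img_w: int, img_h: int,
               scales: Sequence[int] = (480, 576, 688, 864, 1200),
               target: int = 224) -> int:
    """Return the pyramid scale whose resized box area is closest to target**2.

    Assumption: scale s resizes the image so that its shorter side equals s.
    """
    bw, bh = box[2] - box[0], box[3] - box[1]

    def area_at(s: int) -> float:
        f = s / min(img_w, img_h)  # resize factor at this pyramid level
        return (bw * f) * (bh * f)

    return min(scales, key=lambda s: abs(area_at(s) - target * target))
```

For a 100×100 proposal in a 500×375 image, the 864 level scales it to about 230×230, the closest to 224×224, so that level is chosen; a large proposal covering most of the image would instead pick the smallest level.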

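SPP-Net: Testing for Detection
Almost the same as R-CNN, except Step 3: instead of warping each proposal and running the CNN on it, the conv features are computed once per image and each proposal's region of the conv5 feature map is pooled by the SPP layer.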
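The modified Step 3 needs a way to map a proposal from image pixels onto conv5 cells. The sketch below assumes a total stride S = 16 up to conv5 and a corner-rounding rule (left/top: floor(x/S) + 1, right/bottom: ceil(x/S) - 1) as we understand it from the SPP-Net paper; treat both the stride and the rule as assumptions, and note that clipping to the actual feature-map size is left to the caller.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def project_to_fmap(box: Box, stride: int = 16) -> Tuple[int, int, int, int]:
    """Map an image-space box to inclusive conv5 cell indices.

    Left/top corners round inward with floor(x / S) + 1, right/bottom corners
    with ceil(x / S) - 1, where S is the product of all strides up to conv5
    (assumed to be 16 here).
    """
    x1 = math.floor(box[0] / stride) + 1
    y1 = math.floor(box[1] / stride) + 1
    x2 = max(math.ceil(box[2] / stride) - 1, x1)  # keep at least one cell
    y2 = max(math.ceil(box[3] / stride) - 1, y1)
    return x1, y1, x2, y2
```

The resulting cell range is what the SPP layer pools; because the conv5 map is shared by all ~2k proposals, the expensive conv layers run once per image instead of once per proposal.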

SPP-Net: Performance
Speed: 64x faster than R-CNN using one scale, and 24x faster using a five-scale pyramid.
mAP: +1.2 vs R-CNN

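SPP-Net: Limitations
1. Training is a multi-stage pipeline: conv layers → FC layers → SVMs → bbox regressors, each stage trained separately.
2. Training is expensive in space and time: the features of every proposal must be extracted and stored on disk.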

Fast R-CNN: Motivation
Joint training!
Ross Girshick, Fast R-CNN, arXiv tech report

Fast R-CNN: Joint Training Framework
Train the feature extractor, classifier, and bbox regressor jointly in one unified framework.
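Joint training means a single multi-task loss per RoI. As we understand the Fast R-CNN report, it takes the form below, where p is the softmax output over K + 1 classes, u is the ground-truth class (u = 0 is background), t^u is the predicted box offset for class u, v is the regression target, and λ balances the two terms (the report uses λ = 1). The Iverson bracket [u ≥ 1] switches the localization term off for background RoIs.

```latex
L(p, u, t^{u}, v) = L_{\mathrm{cls}}(p, u) + \lambda\,[u \ge 1]\,L_{\mathrm{loc}}(t^{u}, v),
\qquad L_{\mathrm{cls}}(p, u) = -\log p_{u},
\qquad L_{\mathrm{loc}}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\left(t^{u}_{i} - v_{i}\right)
```

Because both terms are differentiable with respect to the shared conv and FC features, one backward pass updates the feature extractor, classifier and regressor together, replacing the separate SVM and regressor stages of R-CNN and SPP-Net.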

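Fast R-CNN: RoI Pooling Layer
RoI pooling layer ≈ single-level SPP layer: each RoI's region of the conv feature map is max-pooled into a fixed H × W grid (7 × 7 for VGG16).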
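A single-channel sketch of RoI max pooling. It assumes a feature stride of 16 (spatial_scale = 1/16), round-half-up projection of the RoI corners, and floor/ceil bin edges clipped to the feature map; these rounding conventions are our assumptions, and real implementations may differ in details. For multi-channel maps the same pooling is applied to each channel.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # one channel of a conv feature map: [row][col]


def roi_pool(fmap: Grid, roi: Tuple[float, float, float, float],
             spatial_scale: float = 1.0 / 16, out_h: int = 7, out_w: int = 7) -> Grid:
    """Max-pool one RoI (x1, y1, x2, y2 in image pixels) into an out_h x out_w grid."""
    H, W = len(fmap), len(fmap[0])
    # Project the RoI onto the feature map (round half up; inclusive end coordinates).
    x1, y1, x2, y2 = (math.floor(c * spatial_scale + 0.5) for c in roi)
    roi_w, roi_h = max(x2 - x1 + 1, 1), max(y2 - y1 + 1, 1)
    bin_w, bin_h = roi_w / out_w, roi_h / out_h
    out: Grid = []
    for ph in range(out_h):
        # Bin rows [hs, he), clipped to the feature map.
        hs = min(max(y1 + math.floor(ph * bin_h), 0), H)
        he = min(max(y1 + math.ceil((ph + 1) * bin_h), 0), H)
        row: List[float] = []
        for pw in range(out_w):
            ws = min(max(x1 + math.floor(pw * bin_w), 0), W)
            we = min(max(x1 + math.ceil((pw + 1) * bin_w), 0), W)
            cells = [fmap[r][c] for r in range(hs, he) for c in range(ws, we)]
            row.append(max(cells) if cells else 0.0)  # empty bin -> 0
        out.append(row)
    return out
```

Compared with the SPP sketch, there is just one pyramid level, which is what makes back-propagating through the pooling step simple enough to train the conv layers end to end.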

Fast R-CNN: Regression Loss
A smooth L1 loss, which is less sensitive to outliers than the L2 loss.
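The outlier argument is easiest to see through the gradients. This sketch defines smooth L1 (quadratic for |x| < 1, linear beyond) and compares its gradient with that of a plain L2 loss written as 0.5·x², the scaling chosen here so the two agree near zero.

```python
def smooth_l1(x: float) -> float:
    """Smooth L1: 0.5 * x**2 for |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def smooth_l1_grad(x: float) -> float:
    """Gradient of smooth_l1: equals x inside (-1, 1), clipped to +/-1 outside."""
    return x if abs(x) < 1 else (1.0 if x > 0 else -1.0)


def l2_grad(x: float) -> float:
    """Gradient of 0.5 * x**2: grows without bound as the residual grows."""
    return x
```

With an outlier residual of 10, the L2 gradient is 10 while the smooth L1 gradient is capped at 1, so one badly matched box cannot dominate the update.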

Fast R-CNN: Scale Invariance
Two options: brute force (single scale) or image pyramids (multi-scale).
[Figure: image pyramid → conv layers → conv5 feature map]
In practice, a single scale is good enough. (This is the main reason Fast R-CNN can be 10x faster than SPP-Net.)
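Single-scale ("brute force") processing only needs one resize factor per image. The sketch assumes the setting described in the Fast R-CNN report: shorter side resized to 600 pixels, with the longer side capped at 1000 pixels.

```python
def single_scale_factor(img_w: int, img_h: int,
                        short_side: int = 600, max_long_side: int = 1000) -> float:
    """Resize factor for single-scale training and testing."""
    factor = short_side / min(img_w, img_h)
    # Cap the longer side so very elongated images do not blow up memory.
    if factor * max(img_w, img_h) > max_long_side:
        factor = max_long_side / max(img_w, img_h)
    return factor
```

A 500×375 image is scaled by 1.6 (to 800×600), while a 2000×600 panorama is limited by the cap and scaled by 0.5. The network then sees each image once, instead of once per pyramid level.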

Fast R-CNN: Other Tricks
SVD on FC layers: 30% speed-up at test time with a small performance drop.
Which layers to fine-tune? Freezing the shallow conv layers reduces training time with a small performance drop.
Data augmentation: using VOC12 as additional training data boosts mAP by ~3%.
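The SVD trick replaces one FC layer with two thinner ones. A minimal NumPy sketch, assuming the layer computes y = W x with W of shape u × v: keep the top t singular values, so W ≈ U_t diag(S_t) V_tᵀ, and the parameter count drops from u·v to t·(u + v).

```python
import numpy as np


def truncated_svd_fc(W: np.ndarray, t: int):
    """Split one FC layer y = W x (W: u x v) into two thinner layers.

    The first layer (t x v) applies diag(S_t) V_t^T with no bias; the second
    (u x t) applies U_t and would keep the original layer's bias.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    first = S[:t, None] * Vt[:t]  # t x v: rows of V^T scaled by singular values
    second = U[:, :t]             # u x t
    return first, second
```

For instance, with t = 1024 on a 25088 × 4096 matrix (the shape of VGG16's fc6), the weights shrink from 102,760,448 to 1024 × (25088 + 4096) = 29,884,416 parameters; the small mAP drop comes from discarding the smaller singular values.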

Fast R-CNN: Performance
Without data augmentation, mAP is only +0.9 on VOC07.
Without data augmentation, mAP is +2.3 on VOC12.
But training and testing are greatly sped up (training 9x, testing 213x faster vs R-CNN).

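Fast R-CNN: Discussion About #Proposals
Are more proposals always better? NO! (In the Fast R-CNN experiments, mAP rises and then falls slightly as the number of proposals grows.)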

Thanks