From R-CNN to Fast R-CNN


1 From R-CNN to Fast R-CNN
Detection: From R-CNN to Fast R-CNN
Reporter: Liliang Zhang

2 Object Detection: Intuition
Detection ≈ Localization + Classification

3 Outline R-CNN SPP-Net Fast R-CNN

4 Outline R-CNN SPP-Net Fast R-CNN

5 R-CNN: Pipeline Overview
Step 1. Input an image.
Step 2. Use selective search to obtain ~2k proposals.
Step 3. Warp each proposal and apply a CNN to extract its features.
Step 4. Use class-specific SVMs to score each proposal.
Step 5. Rank the proposals and use NMS to get the bounding boxes.
Step 6. Use class-specific regressors to refine the bounding boxes' positions.
Ross Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR14
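The NMS in Step 5 can be sketched as greedy suppression by IoU overlap. This is a minimal NumPy illustration, not the authors' code; the function name `nms`, the `[x1, y1, x2, y2]` box layout, and the 0.3 threshold are assumptions made for the example.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.3):
    """Greedy non-maximum suppression (illustrative sketch).

    boxes:  (N, 4) array of [x1, y1, x2, y2]
    scores: (N,) array of per-box scores for one class
    Returns the indices of the kept boxes, highest score first.
    The threshold value is an assumption, not taken from the slides.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the current best box too much
        order = rest[iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

In this toy input the second box overlaps the first heavily and is suppressed, while the third box is disjoint and survives. NMS is applied per class, so a real detector would call this once for each class's scores.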

6 R-CNN: Performance in PASCAL VOC07
AlexNet (T-Net): 58.5 mAP
VGG-Net (O-Net): 66.0 mAP

7 R-CNN: Limitation
Too slow! (13 s/image on a GPU or 53 s/image on a CPU; VGG-Net is 7x slower.)
Proposals need to be warped to a fixed size.

8 Outline R-CNN SPP-Net Fast R-CNN

9 SPP-Net: Motivation
Cropping may lose some information about the object.
Warping may change the object's appearance.
He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, TPAMI15

10 SPP-Net: Spatial Pyramid Pooling (SPP) Layer
FC layers need a fixed-length input, while conv layers can adapt to an arbitrary input size. We therefore need a bridge between the conv and FC layers: the SPP layer.
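A minimal NumPy sketch of how such a layer can turn feature maps of different sizes into vectors of the same length. The function name `spp` and the pyramid levels `(1, 2, 4)` are assumptions chosen for illustration, not the configuration used in the paper.

```python
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Each level n splits the map into an n x n grid of bins and keeps the
    per-channel maximum of every bin. The output length is
    C * sum(n * n for n in levels), whatever H and W are.
    The levels are illustrative assumptions.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # floor for the start and ceil for the end keeps every bin non-empty
            y0, y1 = int(np.floor(i * h / n)), int(np.ceil((i + 1) * h / n))
            for j in range(n):
                x0, x1 = int(np.floor(j * w / n)), int(np.ceil((j + 1) * w / n))
                pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)

small = np.random.rand(8, 6, 9)
large = np.random.rand(8, 13, 17)
v_small, v_large = spp(small), spp(large)
```

Both inputs yield a vector of length 8 * (1 + 4 + 16) = 168, which is what lets a fixed-size FC layer follow conv layers applied to arbitrary-size inputs.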

11 SPP-Net: Training for Detection(1)
Step 1. Generate an image pyramid and extract the conv feature map of the whole image at each scale.
[Figure: image pyramid → conv → Conv5 feature map pyramid]

12 SPP-Net: Training for Detection(2)
Step 2. For each proposal, walk the image pyramid and find the projected version whose number of pixels is closest to 224x224. (For scale invariance in training.)
Step 3. Find the corresponding region in the Conv5 feature map and use the SPP layer to pool it to a fixed size.
Step 4. Once all the proposals' features have been extracted, fine-tune the FC layers only.
Step 5. Train the class-specific SVMs.
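The scale selection in Step 2 can be sketched as a small search over the pyramid. This is an illustration under assumptions: the helper `pick_scale`, the parameter `image_short_side`, and the five scale values are not given in the slides, and each scale is taken to be the length of the resized image's shorter side.

```python
def pick_scale(box_w, box_h, image_short_side, scales=(480, 576, 688, 864, 1200)):
    """Return the pyramid scale at which the proposal is closest to 224x224 pixels.

    box_w, box_h:      proposal size in the original image (pixels)
    image_short_side:  shorter side of the original image (pixels)
    scales:            target shorter-side lengths of the pyramid (assumed values)
    """
    target = 224 * 224

    def area_at(scale):
        ratio = scale / image_short_side  # resize factor for this pyramid level
        return (box_w * ratio) * (box_h * ratio)

    return min(scales, key=lambda s: abs(area_at(s) - target))

small_box_scale = pick_scale(100, 100, image_short_side=480)
large_box_scale = pick_scale(300, 300, image_short_side=480)
```

Small proposals end up at a large pyramid scale and large proposals at a small one, so the region fed to the SPP layer always covers roughly the same number of pixels.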

13 SPP-Net: Testing for Detection
Almost the same as R-CNN, except for Step 3.

14 SPP-Net: Performance
Speed: 64x faster than R-CNN using one scale, and 24x faster using a five-scale pyramid.
mAP: +1.2 mAP vs. R-CNN

15 SPP-Net: Limitation
1. Training is a multi-stage pipeline: conv layers → FC layers → SVM → regressor.
2. Training is expensive in space and time (extracted features have to be stored).

16 Outline R-CNN SPP-Net Fast R-CNN

17 Fast R-CNN: Motivation
Joint training!
Ross Girshick, Fast R-CNN, arXiv tech report

18 Fast R-CNN: Joint Training Framework
Join the feature extractor, classifier, and regressor in a unified framework.

19 Fast R-CNN: RoI pooling layer
≈ an SPP layer with a single pyramid level
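A minimal NumPy sketch of RoI pooling as single-level pooling over a projected region. The function name `roi_pool`, the 1/16 feature stride, and the 7x7 output grid are assumptions for illustration (they match a VGG-style Conv5 setup); this is not the paper's implementation.

```python
import numpy as np

def roi_pool(feature_map, roi_xyxy, spatial_scale=1.0 / 16, out_size=(7, 7)):
    """Max-pool one region of interest to a fixed out_size grid.

    feature_map:   (C, H, W) conv feature map of the whole image
    roi_xyxy:      (x1, y1, x2, y2) in image pixels
    spatial_scale: image-to-feature-map ratio (assumed stride of 16)
    """
    _, fh, fw = feature_map.shape
    x1, y1, x2, y2 = roi_xyxy
    # Project the RoI onto the feature map, keeping it at least 1x1 and in bounds
    fx0 = min(int(np.floor(x1 * spatial_scale)), fw - 1)
    fy0 = min(int(np.floor(y1 * spatial_scale)), fh - 1)
    fx1 = min(max(int(np.ceil(x2 * spatial_scale)), fx0 + 1), fw)
    fy1 = min(max(int(np.ceil(y2 * spatial_scale)), fy0 + 1), fh)
    region = feature_map[:, fy0:fy1, fx0:fx1]
    c, h, w = region.shape
    out_h, out_w = out_size
    out = np.empty((c, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        ys, ye = int(np.floor(i * h / out_h)), int(np.ceil((i + 1) * h / out_h))
        for j in range(out_w):
            xs, xe = int(np.floor(j * w / out_w)), int(np.ceil((j + 1) * w / out_w))
            out[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

fm = np.arange(200, dtype=float).reshape(2, 10, 10)
pooled = roi_pool(fm, (0, 0, 160, 160))
```

Because only one grid is used, the output keeps its spatial layout (C x 7 x 7 here) instead of concatenating several pyramid levels.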

20 Fast R-CNN: Regression Loss
A smooth L1 loss, which is less sensitive to outliers than the L2 loss.
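The smooth L1 function is quadratic for residuals below 1 in magnitude and linear beyond that. A short NumPy sketch, with an L2 loss added only for comparison (the function names and the 0.5 scaling of the L2 term are choices made for this example):

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 where |x| < 1, and |x| - 0.5 otherwise (elementwise)."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def l2(x):
    """Squared-error loss scaled by 0.5, for comparison only."""
    x = np.asarray(x, dtype=float)
    return 0.5 * x ** 2

residuals = np.array([0.5, -3.0, 10.0])
robust = smooth_l1(residuals)
quadratic = l2(residuals)
```

For the outlier residual of 10, the smooth L1 value grows linearly (9.5) while the L2 value grows quadratically (50), so a single bad regression target has a bounded gradient.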

21 Fast R-CNN: Scale Invariance
Two options: brute force (single scale) or image pyramids (multi-scale).
[Figure: image → conv → Conv5 feature map]
In practice, a single scale is good enough. (This is the main reason it can be 10x faster than SPP-Net.)

22 Fast R-CNN: Other tricks
SVD on FC layers: 30% speedup at test time with a small performance drop.
Which layers to fine-tune? Freezing the shallow conv layers reduces training time with a small performance drop.
Data augmentation: using VOC12 as an additional training set can boost mAP by ~3%.
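The SVD trick can be sketched as replacing one FC weight matrix with two thinner ones obtained from a truncated SVD. The helper name `compress_fc` and the layer sizes below are assumptions for illustration.

```python
import numpy as np

def compress_fc(weight, t):
    """Split a (u, v) FC weight matrix into two layers using the top-t singular values.

    Returns (w1, w2) with shapes (t, v) and (u, t) such that w2 @ w1 approximates
    weight. The first layer carries no bias; the second keeps the original bias.
    """
    u_mat, sing, vt = np.linalg.svd(weight, full_matrices=False)
    w1 = sing[:t, None] * vt[:t]  # Sigma_t V^T
    w2 = u_mat[:, :t]             # U restricted to the top-t columns
    return w1, w2

rng = np.random.default_rng(0)
# A rank-5 matrix, so a rank-5 truncation reconstructs it exactly
weight = rng.standard_normal((64, 5)) @ rng.standard_normal((5, 128))
w1, w2 = compress_fc(weight, t=5)
params_before = weight.size
params_after = w1.size + w2.size
```

The parameter count drops from u * v to t * (u + v), here from 8192 to 960. Real FC weights are not exactly low rank, so truncation loses a little accuracy in exchange for the speedup.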

23 Fast R-CNN: Performance
Without data augmentation, mAP improves by only +0.9 on VOC07.
Without data augmentation, mAP improves by +2.3 on VOC12.
But training and testing are greatly sped up (training 9x, testing 213x vs. R-CNN).

24 Fast R-CNN: Discussion of the number of proposals
Are more proposals always better? NO!

25 Thanks

