RCNN, Fast-RCNN, Faster-RCNN

Slides:



Advertisements
Similar presentations
Rich feature Hierarchies for Accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitandra Malik (UC Berkeley)
Advertisements

Object recognition and scene “understanding”
Lecture 6: Classification & Localization
EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.
DeepID-Net: deformable deep convolutional neural network for generic object detection Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng.
Large-Scale Object Recognition with Weak Supervision
R-CNN By Zhang Liliang.
Spatial Pyramid Pooling in Deep Convolutional
On the Object Proposal Presented by Yao Lu
From R-CNN to Fast R-CNN
Generic object detection with deformable part-based models
“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)
Detection, Segmentation and Fine-grained Localization
Object detection, deep learning, and R-CNNs
Fully Convolutional Networks for Semantic Segmentation
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
Feedforward semantic segmentation with zoom-out features
Cascade Region Regression for Robust Object Detection
Lecture 4a: Imagenet: Classification with Localization
Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,
Spatial Localization and Detection
Lecture 3b: CNN: Advanced Layers
Deep Learning Overview Sources: workshop-tutorial-final.pdf
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition arXiv: v4 [cs.CV(CVPR)] 23 Apr 2015 Kaiming He, Xiangyu Zhang, Shaoqing.
Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.
When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.
Recent developments in object detection
CS 4501: Introduction to Computer Vision Object Localization, Detection, Semantic Segmentation Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy.
Faster R-CNN – Concepts
Object Detection based on Segment Masks
Object detection with deformable part-based models
Sentence Modeling Representation of sentences is the heart of Natural Language Processing A sentence model is a representation and analysis of semantic.
Krishna Kumar Singh, Yong Jae Lee University of California, Davis
Part-Based Room Categorization for Household Service Robots
Lecture 5 Smaller Network: CNN
Presentation by Ryan Brand
CS6890 Deep Learning Weizhen Cai
Object detection as supervised classification
R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.
Image Question Answering
Object detection.
A Convolutional Neural Network Cascade For Face Detection
Computer Vision James Hays
Recognition IV: Object Detection through Deep Learning and R-CNNs
Introduction to Neural Networks
Image Classification.
Vessel Extraction in X-Ray Angiograms Using Deep Learning
Object Detection + Deep Learning
SAS Deep Learning Object Detection, Keypoint Detection
Smart Robots, Drones, IoT
CornerNet: Detecting Objects as Paired Keypoints
Object Detection Creation from Scratch Samsung R&D Institute Ukraine
Faster R-CNN By Anthony Martinez.
On Convolutional Neural Network
Papers 15/08.
Outline Background Motivation Proposed Model Experimental Results
Object Tracking: Comparison of
Analysis of Trained CNN (Receptive Field & Weights of Network)
边缘检测年度进展概述 Ming-Ming Cheng Media Computing Lab, Nankai University
Convolutional Neural Network
Human-object interaction
Rotational Rectification Network (R2N):
Motivation State-of-the-art two-stage instance segmentation methods depend heavily on feature localization to produce masks.
Object Detection Implementations
Learning Deconvolution Network for Semantic Segmentation
End-to-End Facial Alignment and Recognition
Jiahe Li
Point Set Representation for Object Detection and Beyond
Van-Thanh Hoang May 11, 2019 Improving Object Localization with Fitness NMS and Bounded IoU Loss Lachlan Tychsen-Smith, Lars.
Presentation transcript:

RCNN, Fast-RCNN, Faster-RCNN Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Today’s Class Object Detection The RCNN Object Detector (2014) The Fast RCNN Object Detector (2015) The Faster RCNN Object Detector (2016)

Object Detection deer cat Input: image Output: object category and location for every object in the image

Object Detection as Classification deer? cat? background? CNN

Object Detection as Classification deer? cat? background? CNN

Object Detection as Classification CNN deer? cat? background?

Object Detection as Classification with Sliding Window CNN deer? cat? background? Main idea:

Object Detection as Classification with Box Proposals

RCNN https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf Rich feature hierarchies for accurate object detection and semantic segmentation. Girshick et al. CVPR 2014.

RCNN First stage: generate category-independent region proposals. 2000 Region proposals for every image Selective Search: combine the strength of both an exhaustive search and segmentation. Uijlings et al. IJCV 2013. ref On the one hand, use the image structure to guide the sampling process. On the other hand: aim to capture all possible object locations.

RCNN CNN First stage: generate category-independent region proposals. 2000 Region proposals for every image Second stage: extracts a fixed-length feature vector from each region. a 4096-dimensional feature vector from each region proposal feature vector warp CNN On the one hand, use the image structure to guide the sampling process. On the other hand: aim to capture all possible object locations. Arbitrary rectangles? A fixed size input? 227 x 227 5 conv layers + 2 fully connected layers

Bounding box regression RCNN First stage: generate category-independent region proposals. 2000 Region proposals for every image Second stage: extracts a fixed-length feature vector from each region. a 4096-dimensional feature vector from each region proposal linear svm people? horse? background? feature vector Third stage: a set of class- specific linear SVMs. On the one hand, use the image structure to guide the sampling process. On the other hand: aim to capture all possible object locations. x y w h Bounding box regression object category and location proposal location

? Fast-RCNN RCNN Simple and scalable. improves mAP. A multistage pipeline. Training is expensive in space and time (features are extracted from each region proposal in each image and written into disk). Object detection is slow. ?

Fast-RCNN Idea: No need to recompute features for every box independently https://arxiv.org/abs/1504.08083 Fast R-CNN. Girshick. ICCV 2015.

Fast-RCNN Process the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map. a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the region feature map. FC+ softmax K + 1 categories feature vector + four real-valued numbers for each of the K object classes. FC+ regressor …

? Fast-RCNN RCNN Faster-RCNN Simple and scalable. improves mAP. Higher mAP. Single stage, end-to-end training. No disk storage is required for feature caching. A multistage pipeline. Training is expensive in space and time (features are extracted from each region proposal in each image and written into disk). Object detection is slow. ? proposals are the computational bottleneck in detection systems.

Faster-RCNN Idea: Integrate the Bounding Box Proposals as part of the CNN predictions https://arxiv.org/abs/1506.01497 Ren et al. NIPS 2015.

Faster-RCNN … Region Proposal Networks: k anchors boxes 2k scores 4k coordinates object or not object bounding box proposal RPN 1x1 conv layer 1x1 conv layer cls layer reg layer Shared conv layers nxn conv layer Fast-RCNN feature map sliding window, nxn …

? Fast-RCNN RCNN Faster-RCNN Simple and scalable. improves mAP. Higher mAP. Single stage, end-to-end training. No disk storage is required for feature caching. compute proposals with a deep convolutional neural network --Region Proposal Network (RPN) merge RPN and Fast R-CNN into a single network, enabling nearly cost-free region proposals. A multistage pipeline. Training is expensive in space and time (features are extracted from each region proposal in each image and written into disk). Object detection is slow. proposals are the computational bottleneck in detection systems. ?

YOLO- You Only Look Once Idea: No bounding box proposal. A single regression problem, straight from image pixels to bounding box coordinates and class probabilities. extremely fast reason globally learn generalizable representations https://arxiv.org/abs/1506.02640 Redmon et al. CVPR 2016.

YOLO- You Only Look Once Divide the image into 7x7 cells. Each cell trains a detector. The detector needs to predict the object’s class distributions. The detector has 2 bounding-box predictors to predict bounding-boxes and confidence scores.

SSD: Single Shot Detector Idea: Similar to YOLO, but denser grid map, multiscale grid maps. + Data augmentation + Hard negative mining + Other design choices in the network. Liu et al. ECCV 2016.

Questions?