Fully Convolutional Networks for Semantic Segmentation

Presentation transcript:

Fully Convolutional Networks for Semantic Segmentation. Jonathan Long*, Evan Shelhamer*, Trevor Darrell (UC Berkeley). Goal of the work: use an FCN to predict a class at every pixel, transferring existing classification models to dense prediction tasks. Presented by: Gordon Christie. Slide credit: Jonathan Long

Overview. Reinterpret standard classification convnets as “fully convolutional” networks (FCNs) for semantic segmentation. AlexNet, VGG, and GoogLeNet are used in the experiments. Novel architecture: combine information from different layers for segmentation. State-of-the-art segmentation on PASCAL VOC 2011/2012, NYUDv2, and SIFT Flow at the time. Inference takes less than one fifth of a second for a typical image. Note that reusing existing networks is transfer learning. Slide credit: Jonathan Long

Pixels in, pixels out: examples include monocular depth estimation (Liu et al. 2015), boundary prediction (Xie & Tu 2015), and semantic segmentation. Slide credit: Jonathan Long

Convnets perform classification: an image goes in and a 1000-dim vector of class scores (“tabby cat”) comes out in < 1 millisecond, with end-to-end learning. Slide credit: Jonathan Long

R-CNN does detection: labeling regions (“dog”, “cat”) with R-CNN takes many seconds. Slide credit: Jonathan Long

[Figure: R-CNN pipeline, from Girshick et al.] Slide credit: Jonathan Long

Goal for semantic segmentation: < 1/5 second per image with end-to-end learning, but what network (???) can do it? Slide credit: Jonathan Long

A classification network (“tabby cat”). Note the omissions in the diagram (“activations”). It takes a fixed-size input and produces a single label; what we want instead is efficient per-pixel output. Slide credit: Jonathan Long

Becoming fully convolutional. Slide credit: Jonathan Long
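
The “becoming fully convolutional” step can be checked numerically. In this minimal NumPy sketch, all sizes are hypothetical and `conv2d_valid` is a helper written here for illustration (not a Caffe API). A fully connected layer over a C × K × K input equals a convolution whose K × K filters are the FC weight rows reshaped. On a larger input, the same weights slide over the image and produce a spatial map of outputs instead of a single vector.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Plain 'valid' cross-correlation with stride 1 (hypothetical helper).
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) -> (C_out, H-k+1, W-k+1)"""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            # contract over (C_in, k, k) for all output channels at once
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

rng = np.random.default_rng(0)
C, K, N_OUT = 8, 7, 10  # hypothetical: channels, spatial extent seen by the FC layer, outputs
fc_weight = rng.standard_normal((N_OUT, C * K * K))  # FC layer on a flattened C x K x K input
fc_bias = rng.standard_normal(N_OUT)

# "Convolutionalize": reshape each FC weight row into a C x K x K filter
conv_weight = fc_weight.reshape(N_OUT, C, K, K)

# On an input of exactly K x K, the conv yields a 1 x 1 map equal to the FC output
x_small = rng.standard_normal((C, K, K))
fc_out = fc_weight @ x_small.reshape(-1) + fc_bias
conv_out = conv2d_valid(x_small, conv_weight, fc_bias)

# On a larger input, the same weights produce a spatial map of outputs
x_large = rng.standard_normal((C, K + 4, K + 6))
score_map = conv2d_valid(x_large, conv_weight, fc_bias)
```

Each entry of `score_map` is the original FC layer applied to one K × K window, which is why the convolutionalized net can reuse the ImageNet-trained weights unchanged.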


Upsampling output. Slide credit: Jonathan Long

An end-to-end, pixels-to-pixels network: conv, pool, and nonlinearity layers, followed by upsampling to a pixelwise output and a pixelwise loss. Slide credit: Jonathan Long
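
The “pixelwise output + loss” step amounts to a softmax classification loss at every pixel. Here is a minimal NumPy sketch, assuming the scores have already been upsampled to image size. The ignore label 255 follows the PASCAL VOC convention for void pixels. Summing versus averaging over pixels is exposed as a hypothetical `normalize` flag rather than presented as the paper's setting.

```python
import numpy as np

def pixelwise_softmax_loss(scores, labels, ignore_label=255, normalize=False):
    """Multinomial logistic (softmax cross-entropy) loss over all labeled pixels.

    scores: (C, H, W) unnormalized class scores at image resolution
    labels: (H, W) integer class per pixel; pixels equal to ignore_label are skipped
    """
    # numerically stable log-softmax over the class axis
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))

    valid = labels != ignore_label
    rows, cols = np.nonzero(valid)  # same row-major order as labels[valid]
    picked = log_probs[labels[valid], rows, cols]
    loss = -picked.sum()
    if normalize:
        loss /= max(valid.sum(), 1)
    return loss

# usage: uniform scores over 21 classes, one void pixel
scores = np.zeros((21, 2, 2))
labels = np.array([[0, 3], [255, 20]])
loss = pixelwise_softmax_loss(scores, labels)
```

With uniform scores, each of the three labeled pixels contributes log 21, and the void pixel contributes nothing.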

Dense predictions. Two options were considered: shift-and-stitch, a trick that yields dense predictions without interpolation, and upsampling via deconvolution. Shift-and-stitch was used in preliminary experiments but not included in the final model, because upsampling was found to be more effective and efficient. “Final layer deconvolutional filters are fixed to bilinear interpolation, while intermediate upsampling layers are initialized to bilinear upsampling.” Changing only the filters and layer strides of a convnet can produce the same output as the shift-and-stitch trick.
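
The bilinear initialization quoted above can be sketched in NumPy. `bilinear_kernel` follows the construction commonly used to initialize FCN deconvolution layers (kernel size 2f − f mod 2 for upsampling factor f). `conv_transpose2d` is a plain single-channel transposed convolution written for illustration, not Caffe's implementation. It also omits the padding and cropping a real network needs to align the output with the input image.

```python
import numpy as np

def bilinear_kernel(factor):
    """2-D bilinear interpolation kernel for upsampling by an integer factor."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    weights_1d = 1 - np.abs(np.arange(size) - center) / factor
    return np.outer(weights_1d, weights_1d)

def conv_transpose2d(x, kernel, stride):
    """Single-channel transposed convolution ("deconvolution"), no padding or cropping."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # each input value adds a scaled copy of the kernel to the output
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

kernel = bilinear_kernel(2)                         # 4 x 4 kernel for 2x upsampling
ramp = np.tile(np.array([0.0, 1.0, 2.0]), (3, 1))   # every row is 0, 1, 2
up = conv_transpose2d(ramp, kernel, stride=2)       # 8 x 8; interior rows rise linearly
```

Away from the borders, the overlapping kernel copies sum to one, so a constant map stays constant and a ramp is linearly interpolated. This is why such a layer can start as fixed bilinear upsampling and, where allowed, be refined by learning.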

Classifier to dense FCN. Convolutionalize proven classification architectures: AlexNet, VGG, and GoogLeNet (a reimplementation). Remove the final classification layer and convert all fully connected layers to convolutions. Append a 1 × 1 convolution with channel dimension 21 to predict scores for each PASCAL class (20 object categories + background) at each of the coarse output locations. On GoogLeNet: “Despite similar classification accuracy, our implementation of GoogLeNet did not match this segmentation result.”
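
A 1 × 1 convolution is simply a linear classifier applied independently at every location of the coarse map. In this NumPy sketch, the 4096-channel input mirrors a convolutionalized fc7, the 10 × 12 grid is an arbitrary assumed size, and the weights are random placeholders rather than trained values.

```python
import numpy as np

N_CLASSES = 21  # PASCAL VOC: 20 object categories + background

def score_1x1(features, weight, bias):
    """1 x 1 convolution as a per-location linear classifier.
    features: (C, H, W), weight: (N_CLASSES, C), bias: (N_CLASSES,) -> (N_CLASSES, H, W)"""
    return np.einsum("kc,chw->khw", weight, features) + bias[:, None, None]

rng = np.random.default_rng(0)
fc7 = rng.standard_normal((4096, 10, 12))  # hypothetical coarse fc7 feature map
w = rng.standard_normal((N_CLASSES, 4096)) * 0.01
b = np.zeros(N_CLASSES)

scores = score_1x1(fc7, w, b)          # coarse per-location class scores
coarse_labels = scores.argmax(axis=0)  # predicted class at each coarse location
```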

Classifier to dense FCN: results. The ILSVRC classifiers are cast into FCNs and compared on the PASCAL VOC 2011 validation set (note that these are validation numbers). Even at this starting point they are already state of the art. The networks are initialized from the classification models trained on ImageNet, trained with a per-pixel multinomial logistic loss, and validated with mean intersection over union (mean IU).
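
Mean IU can be computed from a pixel-level confusion matrix: for each class i, IU = n_ii / (t_i + Σ_j n_ji − n_ii), where n_ij counts pixels of class i predicted as j and t_i is the total number of pixels of class i. This NumPy sketch skips void pixels (label 255). It also skips classes that appear in neither the ground truth nor the prediction, since their IU is 0/0; that choice is an assumption here.

```python
import numpy as np

def confusion_matrix(gt, pred, n_cl, ignore_label=255):
    """hist[i, j] = number of pixels of true class i predicted as class j."""
    valid = gt != ignore_label
    return np.bincount(n_cl * gt[valid].astype(int) + pred[valid].astype(int),
                       minlength=n_cl ** 2).reshape(n_cl, n_cl)

def mean_iu(hist):
    """Average over classes of n_ii / (t_i + sum_j n_ji - n_ii); 0/0 classes are skipped."""
    intersection = np.diag(hist)
    union = hist.sum(axis=1) + hist.sum(axis=0) - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        iu = intersection / union
    return np.nanmean(iu)

gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
hist = confusion_matrix(gt, pred, n_cl=3)
score = mean_iu(hist)  # class 0: 1/2, class 1: 2/3, class 2 absent and skipped
```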

Spectrum of deep features: combine where (local, shallow) with what (global, deep) by fusing features into a “deep jet” (cf. Hariharan et al. CVPR15 “hypercolumn”). Slide credit: Jonathan Long

Skip layers: skip connections fuse layers. Coarse predictions are interpolated and summed with predictions from shallower layers, giving dense output with end-to-end, joint learning of semantics and location. Slide credit: Jonathan Long
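
The “interp + sum” fusion can be traced through array shapes. This NumPy sketch uses nearest-neighbour upsampling as a stand-in for the learned 2x deconvolution, assumes random feature maps of VGG16-like channel counts at strides 32, 16, and 8, and sets the new skip score weights to zero. With zero weights, the fused output initially reproduces the coarser prediction exactly.

```python
import numpy as np

N_CLASSES = 21

def upsample2x_nearest(scores):
    """Stand-in for the learned 2x deconvolution: nearest-neighbour upsampling."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def score_1x1(features, weight):
    """Per-location linear class scores (1 x 1 convolution, no bias)."""
    return np.einsum("kc,chw->khw", weight, features)

rng = np.random.default_rng(0)
score_32 = rng.standard_normal((N_CLASSES, 8, 8))  # coarse scores at stride 32
pool4 = rng.standard_normal((512, 16, 16))         # stride-16 features
pool3 = rng.standard_normal((256, 32, 32))         # stride-8 features

# new skip score layers, zero-initialized in this sketch
w_pool4 = np.zeros((N_CLASSES, 512))
w_pool3 = np.zeros((N_CLASSES, 256))

# one skip: upsample stride-32 scores and add scores predicted from pool4 (stride 16)
fuse_16 = upsample2x_nearest(score_32) + score_1x1(pool4, w_pool4)
# two skips: repeat with pool3 (stride 8)
fuse_8 = upsample2x_nearest(fuse_16) + score_1x1(pool3, w_pool3)
```

Sum fusion keeps both paths differentiable, so the skip weights can then be learned jointly with the rest of the net.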

Skip layers (notes from the paper). On the fusion operation: “Max fusion made learning difficult due to gradient switching.” On the alternative of refining predictions by removing downsampling: “Decreasing the stride of pooling layers is the most straightforward way to obtain finer predictions. However, doing so is problematic for our VGG16-based net. Setting the pool5 layer to have stride 1 requires our convolutionalized fc6 to have a kernel size of 14 × 14 in order to maintain its receptive field size. In addition to their computational cost, we had difficulty learning such large filters. We made an attempt to re-architect the layers above pool5 with smaller filters, but were not successful in achieving comparable performance; one possible explanation is that the initialization from ImageNet-trained weights in the upper layers is important.”
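
One way to read the “gradient switching” remark is sketched below in NumPy. This is an illustrative interpretation, not code from the paper. With sum fusion, every input receives the full upstream gradient. With elementwise max fusion, each position's gradient goes to only one input, and which input receives it can flip as the values change during training.

```python
import numpy as np

def fuse_grads(a, b, upstream):
    """Gradients of sum fusion and elementwise max fusion with respect to both inputs."""
    # sum: d(a + b)/da = d(a + b)/db = 1 everywhere
    grad_sum_a, grad_sum_b = upstream.copy(), upstream.copy()
    # max: the gradient flows only to the input that won at each position (ties go to a)
    a_wins = a >= b
    grad_max_a = np.where(a_wins, upstream, 0.0)
    grad_max_b = np.where(a_wins, 0.0, upstream)
    return (grad_sum_a, grad_sum_b), (grad_max_a, grad_max_b)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))  # e.g. upsampled coarse scores
b = rng.standard_normal((4, 4))  # e.g. scores from a shallower layer
upstream = np.ones((4, 4))
(gs_a, gs_b), (gm_a, gm_b) = fuse_grads(a, b, upstream)
```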

Comparison of skip FCNs: results on a subset of the PASCAL VOC 2011 validation set. “Fixed” means that only the final layer is fine-tuned.

[Figure: skip layer refinement. Panels: input image; stride 32 (no skips); stride 16 (1 skip); stride 8 (2 skips); ground truth.] Slide credit: Jonathan Long

Training and testing: train on a full image at a time, without patch sampling, and reshape the network to take input of any size. Forward time is ~150 ms for a 500 × 500 × 21 output. Slide credit: Jonathan Long
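
Because the network is fully convolutional, the coarse output grid simply scales with the input, and its stride is the product of the downsampling strides. This Python sketch recovers the strides 8, 16, and 32 from the refinement figure for VGG16, whose five pooling layers each have stride 2 while its convolutions have stride 1. The `coarse_grid` helper is hypothetical: it assumes “same”-style padding, so it only approximates the exact sizes, which depend on the padding and cropping used in the real nets.

```python
import math

VGG16_POOLS = ["pool1", "pool2", "pool3", "pool4", "pool5"]
POOL_STRIDE = 2  # every VGG16 pooling layer downsamples by 2

def output_strides(pools, stride=POOL_STRIDE):
    """Cumulative stride (input pixels per output cell) after each pooling layer."""
    strides, total = {}, 1
    for name in pools:
        total *= stride
        strides[name] = total
    return strides

def coarse_grid(height, width, stride):
    """Approximate coarse output size for an arbitrary input ('same'-padding assumption)."""
    return math.ceil(height / stride), math.ceil(width / stride)

strides = output_strides(VGG16_POOLS)
# pool3 -> 8, pool4 -> 16, pool5 -> 32: the strides in the skip refinement figure
grid_32 = coarse_grid(500, 500, strides["pool5"])
```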

Results – PASCAL VOC 2011/12. VOC 2011: 8498 training images (including additional labeled data). For the following three results, dropout was used wherever it was used in the original network. SDS, the prior state of the art, uses MCG proposals, feature extraction, an SVM to classify, and region refinement.

Results – NYUDv2. 1449 RGB-D images with pixelwise labels in 40 categories. Gupta et al.: region proposals (using depth and RGB), deep features for depth and RGB, an SVM classifier, and segmentation; they encode depth differently, also including surface normals and height from the ground. “To add depth information, we train on a model upgraded to take four-channel RGB-D input (early fusion).” RGB-D early fusion gave little improvement, perhaps because it is difficult to propagate meaningful gradients all the way through the model.
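
Early fusion means stacking depth as a fourth input channel and widening the first convolution to accept it. In this NumPy sketch, the filter shape matches VGG16's conv1_1 (64 filters of 3 × 3). Initializing the new depth-channel filter as the mean of the RGB filters is an assumed choice, not the paper's stated method. The helper name `upgrade_first_conv_to_rgbd` is hypothetical.

```python
import numpy as np

def upgrade_first_conv_to_rgbd(w_rgb):
    """Widen first-layer filters from 3 to 4 input channels for early-fusion RGB-D input.

    w_rgb: (C_out, 3, k, k). The depth-channel filter is initialized to the mean of
    the RGB filters (an assumption made for this sketch)."""
    w_depth = w_rgb.mean(axis=1, keepdims=True)
    return np.concatenate([w_rgb, w_depth], axis=1)

rng = np.random.default_rng(0)
w_rgb = rng.standard_normal((64, 3, 3, 3))  # conv1_1-shaped placeholder weights
w_rgbd = upgrade_first_conv_to_rgbd(w_rgb)

rgb = rng.random((3, 32, 32))
depth = rng.random((1, 32, 32))
rgbd = np.concatenate([rgb, depth], axis=0)  # early fusion: depth as a 4th input channel
```

Everything after the first layer stays unchanged, which is what makes early fusion cheap. It also means the depth signal must travel through the whole pretrained stack, consistent with the gradient-propagation difficulty noted above.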

Results – SIFT Flow. 2688 images with pixel labels for 33 semantic categories (e.g. bridge, mountain, sun) and 3 geometric categories (horizontal, vertical, sky). Both label spaces are learned jointly; joint learning and inference give similar performance and computation to independent models. Comparison methods: Farabet et al., a multi-scale convnet that averages class predictions across superpixels; Pinheiro et al., patch-based learning using multiple scales with recurrent CNNs (rCNNs).
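
Learning both label spaces jointly can be sketched as two 1 × 1 score heads on one shared feature map. This NumPy sketch uses a hypothetical 64-channel 6 × 6 feature map and random labels. Combining the two per-pixel losses by summing them is an assumption made for illustration.

```python
import numpy as np

N_SEMANTIC, N_GEOMETRIC = 33, 3

def log_softmax(scores):
    shifted = scores - scores.max(axis=0, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))

def head_loss(scores, labels):
    """Summed per-pixel multinomial loss for one prediction head."""
    lp = log_softmax(scores)
    rows, cols = np.indices(labels.shape)
    return -lp[labels, rows, cols].sum()

rng = np.random.default_rng(0)
features = rng.standard_normal((64, 6, 6))  # shared feature map (hypothetical size)
w_sem = rng.standard_normal((N_SEMANTIC, 64)) * 0.01
w_geo = rng.standard_normal((N_GEOMETRIC, 64)) * 0.01

# two 1 x 1 score heads reading the same shared features
sem_scores = np.einsum("kc,chw->khw", w_sem, features)
geo_scores = np.einsum("kc,chw->khw", w_geo, features)

sem_labels = rng.integers(0, N_SEMANTIC, size=(6, 6))
geo_labels = rng.integers(0, N_GEOMETRIC, size=(6, 6))

# joint objective (assumed here to be the sum of the two heads' losses)
total_loss = head_loss(sem_scores, sem_labels) + head_loss(geo_scores, geo_labels)
```

Since everything below the heads is shared, the joint model costs little more than a single-task one, in line with the similar computation reported on the slide.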

Results. [Figure: example outputs; columns: Input, FCN, SDS*, Truth.] Relative to the prior state of the art, SDS: a 20% relative improvement in mean IoU and 286× faster. In addition: an NYUD net for multi-modal input and a SIFT Flow net for multi-task output. *SDS: Simultaneous Detection and Segmentation, Hariharan et al. ECCV14. Slide credit: Jonathan Long

[Figure: segmentation leaderboard, with entries marked as “segmentation with Caffe” and as FCN-based.] Many segmentation methods are powered by Caffe, most of them FCNs. Slide credit: Jonathan Long

Conclusion: fully convolutional networks are fast, end-to-end models for pixelwise problems. Code is in a Caffe branch (to be merged soon), with models for PASCAL VOC, NYUDv2, SIFT Flow, and PASCAL-Context. Links: caffe.berkeleyvision.org, fcn.berkeleyvision.org, github.com/BVLC/caffe. Slide credit: Jonathan Long