Feedforward semantic segmentation with zoom-out features


Feedforward semantic segmentation with zoom-out features Mostajabi, Yadollahpour and Shakhnarovich Toyota Technological Institute at Chicago

Main Ideas
Casting semantic segmentation as classifying a set of superpixels.
Extracting CNN features from different levels of spatial context around the superpixel at hand.
Using an MLP as the classifier.
Photo credit: Mostajabi et al.
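To make the superpixel-classification framing concrete, here is a minimal Python sketch (not the authors' code). SLIC superpixels from scikit-image are an assumption, and `zoomout_features` and `classifier` are hypothetical stand-ins for the components described on the following slides.

```python
# Hypothetical sketch: label an image by classifying each superpixel.
import numpy as np
from skimage.segmentation import slic

def segment(image, zoomout_features, classifier):
    """image: H x W x 3 array -> H x W array of predicted class labels."""
    superpixels = slic(image, n_segments=500)            # superpixel id per pixel
    labels = np.zeros(superpixels.shape, dtype=np.int64)
    for sp_id in np.unique(superpixels):
        mask = superpixels == sp_id
        feat = zoomout_features(image, mask)             # zoom-out descriptor of this superpixel
        labels[mask] = classifier(feat)                  # predicted class for the whole superpixel
    return labels
```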

Zoom-out feature extraction Photo credit: Mostajabi et al.

Zoom-out feature extraction
Subscene-level features:
Take the bounding box of the superpixels within radius three of the superpixel at hand.
Warp the bounding box to 256 x 256 pixels.
Use the activations of the last fully connected layer.
Scene-level features:
Warp the whole image to 256 x 256 pixels.
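A minimal sketch of the subscene- and scene-level features described above, assuming a torchvision VGG-16 backbone whose last fully connected (4096-d) activations serve as the descriptor; the helper names are illustrative and ImageNet normalization is omitted for brevity.

```python
# Hypothetical sketch of subscene/scene zoom-out features (not the authors' code).
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

backbone = models.vgg16(weights="IMAGENET1K_V1").eval()
# Everything up to (but excluding) the final classification layer -> 4096-d activations.
fc_extractor = torch.nn.Sequential(backbone.features, backbone.avgpool,
                                   torch.nn.Flatten(), backbone.classifier[:-1])

def fc_features(region):
    """Warp a region (C x H x W float tensor) to 256 x 256 and return fc activations."""
    warped = TF.resize(region, [256, 256]).unsqueeze(0)
    with torch.no_grad():
        return fc_extractor(warped).squeeze(0)

def subscene_feature(image, bbox):
    """bbox = (x0, y0, x1, y1): bounding box of superpixels within radius three."""
    x0, y0, x1, y1 = bbox
    return fc_features(image[:, y0:y1, x0:x1])

def scene_feature(image):
    return fc_features(image)  # the whole image, warped to 256 x 256
```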

Training
Extract features from the image and its mirror image, and take the element-wise max over the resulting two feature vectors.
This yields a 12416-dimensional representation for each superpixel.
Two classifiers are trained:
Linear classifier (softmax).
MLP: hidden layer (1024 neurons) + ReLU + hidden layer (1024 neurons) with dropout.
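A minimal sketch of the two classifiers described on this slide; the dropout probability, the PASCAL VOC class count of 21, and the exact layer ordering are assumptions.

```python
# Hypothetical sketch of the superpixel classifiers (not the authors' code).
import torch
import torch.nn as nn

zoomout_dim, num_classes = 12416, 21

# Linear (softmax) baseline.
softmax_classifier = nn.Linear(zoomout_dim, num_classes)

# MLP: 1024-unit hidden layer + ReLU + 1024-unit hidden layer with dropout.
mlp_classifier = nn.Sequential(
    nn.Linear(zoomout_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, num_classes),
)

# Mirror augmentation at the feature level: element-wise max of the features of
# the image and its horizontal flip (`zoomout_features` is a hypothetical helper).
# feat = torch.maximum(zoomout_features(img), zoomout_features(torch.flip(img, dims=[-1])))
```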

Loss Function
The dataset is imbalanced, so a weighted loss function is used in place of the standard one.
Let f_c be the frequency of class c in the training data, with sum_c f_c = 1; the loss contribution of each class is reweighted according to f_c.
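A minimal sketch of an inverse-frequency weighted softmax loss; weighting the term for class c by 1/f_c is one common choice consistent with the slide, and the paper's exact weighting may differ.

```python
# Hypothetical sketch of a class-frequency-weighted loss (not the authors' code).
import torch
import torch.nn.functional as F

def weighted_loss(logits, targets, class_freq):
    """logits: (N, C); targets: (N,) integer labels; class_freq: (C,), sums to 1."""
    weights = 1.0 / class_freq            # rare classes contribute more to the loss
    return F.cross_entropy(logits, targets, weight=weights)
```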

Effect of Zoom-out Levels
Qualitative comparison of the image, the ground truth, and predictions obtained with levels G1:3, G1:5, G1:5+S1, and G1:5+S1+S2.
Photo and table credit: Mostajabi et al.

Quantitative Results
Softmax results on VOC 2012. Table credit: Mostajabi et al.

Quantitative Results
MLP results. Table credit: Mostajabi et al.

Qualitative Results
Photo credit: Mostajabi et al.

Learning Deconvolution Network for Semantic Segmentation Noh, Hong and Han POSTECH, Korea

Motivations
Comparison of the image, the ground truth, and the FCN prediction. Photo credit: Noh et al.

Motivations Photo credit: Noh et al.

Deconvolution Network Architecture Photo credit: Noh et al.

Unpooling Photo credit: Noh et al.
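A minimal sketch of unpooling with recorded max locations ("switches"), using PyTorch's MaxPool2d/MaxUnpool2d rather than the authors' implementation.

```python
# Unpooling demo: pooled values are placed back at the positions of the original
# maxima, and all other positions are filled with zeros.
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 3, 8, 8)
pooled, switches = pool(x)            # remember where each maximum came from
restored = unpool(pooled, switches)   # sparse 8 x 8 map, non-zero only at the switches
```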

Deconvolution Photo credit: Noh et al.
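A minimal sketch of a "deconvolution" (transposed convolution) layer as used in the decoder: it densifies the sparse map produced by unpooling with learned filters. The channel count and kernel size here are illustrative only.

```python
# Transposed-convolution demo (not the paper's exact configuration).
import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(in_channels=64, out_channels=64,
                            kernel_size=3, stride=1, padding=1)
sparse = torch.randn(1, 64, 14, 14)   # e.g. the output of an unpooling layer
dense = deconv(sparse)                # same spatial size, densified activations
```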

Unpooling and Deconvolution Effects Photo credit: Noh et al.

Pipeline
Generate 2K object proposals per image using Edge-Box and keep the top 50 based on their objectness scores.
Compute a segmentation map for each proposal and aggregate the maps with a pixel-wise maximum or average.
Construct the class-conditional probability map using softmax.
Apply a fully-connected CRF to the probability map.
Ensemble with FCN: compute the mean of the probability maps produced by DeconvNet and FCN, then apply the CRF.
Photo credit: Noh et al.
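A minimal sketch of the aggregation step: combine per-proposal class score maps with a pixel-wise maximum (or average) and convert the result into a class-conditional probability map with softmax. It assumes each proposal's map has already been placed back into full-image coordinates.

```python
# Hypothetical sketch of proposal aggregation (not the authors' code).
import torch

def aggregate(proposal_maps, mode="max"):
    """proposal_maps: (P, C, H, W) class score maps, one per object proposal."""
    if mode == "max":
        agg = proposal_maps.max(dim=0).values   # pixel-wise maximum over proposals
    else:
        agg = proposal_maps.mean(dim=0)         # pixel-wise average
    return torch.softmax(agg, dim=0)            # (C, H, W) class-conditional probabilities
```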

Training the Deep Network
Add a batch normalization layer to the output of every convolutional and deconvolutional layer.
Two-stage training: train on easy examples first, then fine-tune with more challenging ones.
Easy examples are constructed by cropping object instances using the ground-truth annotations.
Limiting the variation in object location and size substantially reduces the search space for semantic segmentation.
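A minimal sketch of the "conv/deconv layer followed by batch normalization" pattern described above; channel counts, kernel sizes, and the ReLU placement are assumptions.

```python
# Hypothetical building blocks with batch normalization (not the authors' code).
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

def deconv_bn_relu(in_ch, out_ch):
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
```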

Effect of Number of Proposals Photo credit: Noh et al.

Quantitative Results Table credit: Noh et al.

Qualitative Results Photo credit: Noh et al.

Qualitative Results
Examples where FCN produces better results than DeconvNet. Photo credit: Noh et al.

Qualitative Results
Examples where inaccurate predictions from both DeconvNet and FCN are improved by the ensemble. Photo credit: Noh et al.