Deep Residual Learning for Image Recognition


Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Microsoft Research.

Challenge Winner (1): MS COCO Detection. 80 object categories, more than 300,000 images.

Challenge Winner (2): MS COCO Segmentation.

Challenge Winner (3): ImageNet Classification, Localization & Detection.

Advantages of Depth

Very Deep Networks: two similar approaches, ResNet and Highway Networks (R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway Networks. arXiv:1505.00387, 2015).

Degradation Problem: as depth increases, accuracy saturates and then degrades; notably, deeper plain networks show higher training error than shallower ones.

Possible Causes? Vanishing/exploding gradients. Overfitting.

Vanishing/Exploding Gradients

Vanishing Gradients: the gradient at the first layers is a product of per-layer terms $w_k\,\sigma'(z_k)$. Since $|\sigma'(z_k)| \le \frac{1}{4}$, whenever $|w_k| < 1$ each factor satisfies $|w_k\,\sigma'(z_k)| < \frac{1}{4}$, so gradients in the first layers become very small. (http://neuralnetworksanddeeplearning.com/)

Exploding Gradients: conversely, with large weights, e.g. $|w_k| = 100$, and biases chosen so that $\sigma'(z_k)$ stays near its maximum of $\frac{1}{4}$, each factor satisfies $|w_k\,\sigma'(z_k)| \approx 25$, so gradients in the first layers become very large. (http://neuralnetworksanddeeplearning.com/)
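
To make the arithmetic concrete, here is a small illustrative Python sketch (not from the slides) that multiplies these per-layer factors across a hypothetical 30-layer network:

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)  # maximum value is 0.25, attained at z = 0

depth = 30  # hypothetical number of layers
for w in (0.9, 100.0):
    factor = abs(w) * sigmoid_prime(0.0)  # |w * sigma'(z)| for one layer
    print(f"w={w}: per-layer factor {factor}, after {depth} layers {factor**depth:.3e}")
# small weights -> the product vanishes; large weights -> it explodes
```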

Batch Normalization (1): addresses the problem of vanishing/exploding gradients, increases learning speed, and mitigates several other training problems. In each iteration, every activation in each layer is normalized to zero mean and unit variance over the mini-batch. The normalization is integrated into the back-propagation algorithm.

Batch Normalization (2): S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
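
As a minimal sketch of the idea (illustrative NumPy code, not the paper's implementation), the training-time forward pass for one layer normalizes over the mini-batch and then applies the learned scale gamma and shift beta:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D) mini-batch of activations; gamma, beta: (D,) learned parameters
    mu = x.mean(axis=0)                    # per-activation mean over the batch
    var = x.var(axis=0)                    # per-activation variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta            # learned affine restores representational power
```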

BN Solves Vanishing Gradients (1): with batch normalization, each layer becomes $z_i = \sigma(\mathrm{BN}(W z_{i-1} + b_i))$, i.e. the pre-activation is normalized before the non-linearity is applied. S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

Residual Learning: the two weight layers fit the residual $F(x) = H(x) - x$ instead of the desired mapping $H(x)$; the block's output is $F(x) + x$. Plain networks might have difficulty fitting the identity mapping.
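
A minimal sketch of this basic block, written here in PyTorch for illustration (the framework and class name are assumptions, not the authors' code):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """y = F(x) + x, where F is two 3x3 conv layers (sketch of the basic block)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: the layers only fit the residual
```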

Similar Architecture – Highway Net: $y = H(x, W_H)\,T(x, W_T) + x\,C(x, W_C)$, where $T$ is the transform gate and $C$ the carry gate (typically $C = 1 - T$). R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway Networks. arXiv:1505.00387, 2015.

Highway Net vs. ResNet: the gates C and T are data-dependent and have parameters. When a gated shortcut is "closed", the layers in a highway network represent non-residual functions. Highway networks have not demonstrated accuracy gains at depths of over 100 layers. R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway Networks. arXiv:1505.00387, 2015.

ResNet Design Rules: 3×3 filters; the same number of filters for the same feature map size; when the feature map size is halved, the number of filters is doubled; almost no max-pooling; no hidden fully-connected layers; no dropout.

ResNet Building Blocks: the basic building block and the bottleneck building block. Building blocks are stacked to create the full networks.

ResNet and Plain Nets

ResNets vs. PlainNets (1): deeper ResNets have lower train and test errors!

ResNets vs. PlainNets (2): deeper ResNets have lower train and test errors!

ResNets vs. PlainNets (3): for 34 layers, the ResNet has a lower error than the plain net. For 18 layers, the error is roughly the same, but convergence is faster.

Layer Response Magnitudes: responses of the 3×3 layers, measured after BN and before the non-linearity. The residual functions are generally closer to zero than their non-residual counterparts.

Dimensionality Change: identity shortcut connections assume the input $x$ and the output $F(x)$ have the same dimensions. If the dimensions do not match, either zero-pad $x$ or use a projection mapping (extra parameters).

Exploring Different Shortcut Types. Three options: (A) zero-padding shortcuts for increasing dimensions; (B) projection shortcuts for increasing dimensions, identity otherwise; (C) all shortcuts are projections.
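
As a sketch of options B/C (assuming the illustrative PyTorch BasicBlock above), a projection shortcut is commonly a strided 1×1 convolution that matches both the channel count and the spatial size:

```python
import torch.nn as nn

def projection_shortcut(in_channels, out_channels, stride):
    # a 1x1 convolution projects x to the dimensions of F(x)
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )

# e.g. entering a stage where the feature map halves and the filters double:
shortcut = projection_shortcut(64, 128, stride=2)
```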

Overall Results

Uses for Image Detection and Localization: based on the Faster R-CNN architecture, with ResNet-101 as the backbone. Obtained the best results on the MS COCO, ImageNet localization, and ImageNet detection datasets.

Conclusions: the degradation problem is addressed for very deep NNs, with no additional parameter complexity and faster convergence. The approach works well for different types of tasks and can easily be trained with existing solvers (Caffe, MatConvNet, etc.). Sepp Hochreiter described the vanishing-gradient phenomenon as early as 1991.

Bottleneck Architectures: a bottleneck building block with an identity shortcut mapping has similar time complexity to the non-bottleneck block. Using it, 50-layer, 101-layer, and 152-layer ResNets were constructed.
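
A minimal PyTorch sketch of the bottleneck block, using the paper's 256-64-256 channel example (the downsampling/projection variant is omitted):

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 restore, with an identity shortcut (sketch)."""
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, reduced, 1, bias=False),            # 1x1: 256 -> 64
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, 3, padding=1, bias=False),  # 3x3 on the reduced maps
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1, bias=False),            # 1x1: 64 -> 256
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # identity shortcut around the bottleneck
```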

ResNet Implementation Details

Architectures for CIFAR-10: the first layer is a 3×3 convolution, followed by a stack of 6n layers, 2n layers for each feature map size. When the feature map size is halved, the number of filters is doubled. Only identity shortcuts are used, and each pair of layers has a shortcut connection.
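
A sketch of this stacking scheme, reusing the illustrative BasicBlock above (the downsampling transition between stages is omitted for brevity; the 16/32/64 filter counts follow the paper's CIFAR design):

```python
import torch.nn as nn

def make_stage(n, channels):
    # 2n conv layers per feature-map size = n BasicBlocks of 2 layers each
    return nn.Sequential(*[BasicBlock(channels) for _ in range(n)])

# n = 3 gives the 6n + 2 = 20-layer CIFAR-10 ResNet; filters double per stage
stages = [make_stage(3, c) for c in (16, 32, 64)]
```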

Localization and Detection Results

Detection Results on MS COCO

Overall Results on CIFAR-10: ResNet-110 was trained five times; results are reported as "best (mean ± std)". Depths correspond to n ∈ {3, 5, 7, 9, 18, 200}. For the 1202-layer net, results degrade, likely due to overfitting.

Overall Results on ImageNet (2)