Lecture 3b: CNN: Advanced Layers


Lecture 3b: CNN: Advanced Layers
boris.ginsburg@gmail.com

Agenda: Advanced Layers
- Dropout (Hinton et al.)
- Stochastic pooling (Zeiler, Fergus)
- Maxout (Goodfellow)
- Network-in-Network (Min Lin et al.)
- GoogLeNet (Szegedy et al.)
- Siamese networks

Dropout
Dropout is a very powerful training technique, usually used for fully connected layers (Hinton et al.).
Training: set the output of each hidden neuron to 0 with probability 0.5 ("drop"). The neurons which are "dropped out" in this way do not contribute to the forward pass and do not participate in back-propagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. Note that a smaller weight initialization (scaled by 1/2) should be used.
Testing: use all the neurons, but multiply their outputs by 0.5.
See http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
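A minimal numpy sketch of the training/testing behaviour described above (the function name and shapes are illustrative, not from the lecture): units are dropped with probability 0.5 during training; at test time all units are kept and their outputs are multiplied by 0.5.

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    if train:
        mask = (rng.random(x.shape) >= p_drop)   # each unit kept with probability 1 - p_drop
        return x * mask                          # dropped units contribute nothing to forward/backward
    else:
        return x * (1.0 - p_drop)                # test time: use all units, scale outputs by 0.5

h = np.array([1.0, -2.0, 3.0, 0.5])
print(dropout_forward(h, train=True))   # some entries zeroed
print(dropout_forward(h, train=False))  # all entries halved
```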

Training with dropout
The ideal training regime for dropout is when the training procedure resembles training an ensemble with bagging under parameter-sharing constraints: each dropout update can be seen as an update to a different model on a different subset of the training set.
Training with dropout is very different from ordinary SGD:
- SGD moves slowly and steadily in the most promising direction. SGD usually works best with a small learning rate that results in a smoothly decreasing objective function.
- Dropout rapidly explores many different directions and rejects the ones that worsen performance. Dropout works best with a large learning rate, resulting in a constantly fluctuating objective function.
http://arxiv.org/pdf/1302.4389.pdf

Stochastic Pooling (Zeiler & Fergus)
Similar to the dropout technique; used instead of max-pooling.
Training:
- Compute a probability for each element in the pooling region by normalizing the activations inside the region: $p_i = \frac{a_i}{\sum_{k \in R} a_k}$
- Pool one activation sampled according to the probabilities from step 1.
Testing: weighted pooling: $s = \sum_{k \in R} p_k a_k$
http://arxiv.org/pdf/1302.4389.pdf
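A small numpy sketch of pooling a single region under these rules (function and variable names are mine; the region is assumed to hold non-negative, post-ReLU activations with a positive sum):

```python
import numpy as np

def stochastic_pool(region, train=True, rng=np.random.default_rng(0)):
    """Pool one region of non-negative activations stochastically."""
    a = region.ravel()
    p = a / a.sum()                      # p_i = a_i / sum_k a_k
    if train:
        return rng.choice(a, p=p)        # sample one activation according to p
    return np.dot(p, a)                  # test: weighted pooling, s = sum_k p_k a_k

region = np.array([[1.0, 2.0], [0.0, 5.0]])
print(stochastic_pool(region, train=True))
print(stochastic_pool(region, train=False))  # (1*1 + 2*2 + 5*5) / 8 = 3.75
```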

Stochastic Pooling

Maxout
Maxout is a new type of non-linear activation function which takes the maximum across $M$ affine features ("pooling across channels").
Example:
Classical MLP unit (one affine piece): $h_i = \mathrm{ReLU}(z_{i1}) = \mathrm{ReLU}(w_{i1} \cdot v + b_{i1})$
Maxout unit: $h_i = \max_{j=1..M} z_{ij} = \max_{j=1..M} (w_{ij} \cdot v + b_{ij})$
MLP with 2 maxout units
Goodfellow: http://www-etud.iro.umontreal.ca/~goodfeli/maxout.html
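A possible numpy sketch of a maxout layer as defined above, with each unit taking the max over its $M$ affine pieces (shapes and names are illustrative assumptions):

```python
import numpy as np

def maxout(v, W, b):
    """Maxout units: h_i = max_j (w_ij . v + b_ij).
    W has shape (num_units, num_pieces, input_dim), b has shape (num_units, num_pieces)."""
    z = np.einsum('ijk,k->ij', W, v) + b   # z_ij = w_ij . v + b_ij
    return z.max(axis=1)                   # max over the M affine pieces of each unit

rng = np.random.default_rng(0)
v = rng.standard_normal(4)
W = rng.standard_normal((3, 2, 4))         # 3 maxout units, 2 affine pieces each
b = rng.standard_normal((3, 2))
print(maxout(v, W, b))                     # shape (3,)
```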

Maxout + Dropout
Maxout works exceptionally well with dropout.

Network-in-Network
Replace the conventional linear filter with a micro nonlinear filter (a small MLP), which is slid over the input (similar to a CNN).
Min Lin et al., http://arxiv.org/abs/1312.4400
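As an illustration, the micro-MLP applied at every spatial position is equivalent to a stack of 1x1 convolutions. A rough numpy sketch (layer sizes and names are assumptions, and the ordinary KxK convolution that precedes the micro-MLP in an mlpconv layer is omitted):

```python
import numpy as np

def mlpconv_1x1(x, W1, b1, W2, b2):
    """Two-layer micro-MLP slid over every spatial position (= two 1x1 convolutions).
    x: (C, H, W) feature map; W1: (C1, C); W2: (C2, C1)."""
    C, H, Wd = x.shape
    flat = x.reshape(C, -1)                       # each column is one spatial position
    h = np.maximum(0, W1 @ flat + b1[:, None])    # first 1x1 conv + ReLU
    out = np.maximum(0, W2 @ h + b2[:, None])     # second 1x1 conv + ReLU
    return out.reshape(-1, H, Wd)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
out = mlpconv_1x1(x, rng.standard_normal((16, 8)), np.zeros(16),
                  rng.standard_normal((4, 16)), np.zeros(4))
print(out.shape)   # (4, 5, 5)
```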

Network-in-Network - 2
The overall structure of NIN is:
- a stack of mlpconv layers (each a stack of MLP layers)
- global average pooling, used instead of FC layers: the last mlpconv layer has the number of output feature maps equal to the number of classes
- loss layer
https://github.com/BVLC/caffe/wiki/Model-Zoo
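A tiny sketch of the global-average-pooling head described above: the last mlpconv layer emits one feature map per class, each map is averaged to a single score, and softmax is applied directly, with no FC layer (sizes are illustrative):

```python
import numpy as np

def global_average_pool(feature_maps):
    """feature_maps: (num_classes, H, W) output of the last mlpconv layer.
    Returns one score per class, fed directly to softmax."""
    return feature_maps.mean(axis=(1, 2))

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

maps = np.random.default_rng(0).standard_normal((10, 6, 6))   # 10 classes
print(softmax(global_average_pool(maps)))
```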

Network-in-Network: performance
NIN is one of the top performers:

GoogLeNet (2014)
Winner of ILSVRC-2014. Very deep network with 22 layers: network-in-network-in-network...
Removed fully connected layers → small number of parameters (5 million weights)
[Figure legend: Convolution, Pooling, Softmax, Other; OverFeat, 2013]

GoogLeNet (2014) Inception layer

GoogLeNet (2014)
First "naïve" version of the Inception module:
- Very expensive when placed on top of a convolutional layer with a large number of filters.
- Becomes even more expensive when pooling units are used: the number of output filters equals the number of filters in the previous stage.

GoogLeNet (2014)
Adding 1x1 convolutions → reduce dimensions → less compute
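A back-of-the-envelope illustration of why the 1x1 reduction helps: rough multiply-accumulate counts for one 3x3 branch with and without a 1x1 bottleneck (the feature-map and channel sizes below are made up for illustration, not taken from the paper):

```python
# Rough MAC count for a KxK convolution over an HxW feature map.
def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

H, W, C_in, C_out, C_red = 28, 28, 256, 128, 64   # illustrative sizes

naive = conv_macs(H, W, C_in, C_out, 3)
reduced = conv_macs(H, W, C_in, C_red, 1) + conv_macs(H, W, C_red, C_out, 3)
print(f"naive 3x3:     {naive / 1e6:.1f} M MACs")    # ~231.2 M
print(f"1x1 then 3x3:  {reduced / 1e6:.1f} M MACs")  # ~70.6 M
```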

GoogLeNet (2014)
[Figure: full architecture with auxiliary classifiers and the main classifier]

GoogLeNet (2014)
Details of the GoogLeNet architecture

GoogLeNet (2014)
Training:
- DistBelief CPU cluster
- Asynchronous SGD with momentum 0.9
- Fixed learning-rate schedule (decreased by 4% every 8 epochs)
- Polyak averaging for the final model
- Data augmentation
Testing:
- Softmax averaged over 7 models
- Multi-scale detection
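A one-line sketch of the fixed schedule mentioned above, i.e. the learning rate decreased by 4% every 8 epochs (the base learning rate is an assumption, not given on the slide):

```python
def learning_rate(epoch, base_lr=0.01):       # base_lr is an illustrative assumption
    return base_lr * (0.96 ** (epoch // 8))   # 4% decrease every 8 epochs

for e in (0, 8, 16, 64):
    print(e, learning_rate(e))
```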

Siamese Networks
Face verification problem:
- the number of categories is very large and not known during training
- the number of training samples for a single category is very small
The idea: learn a function that maps input patterns into a target space such that the distance is small for pairs of faces from the same person, and large for pairs from different persons.
The mapping from the raw input to the target space is a convolutional network.
The system is trained on pairs of patterns taken from a training set.

Siamese Network
The training set is composed of an equal number of "true" and "false" samples.
Each sample consists of a pair of images and a label ("true" or "false").
http://caffe.berkeleyvision.org/gathered/examples/siamese.html

Siamese Network
Minimize the contrastive loss: pull together the outputs for images from the same class, and push apart the outputs for images from different classes.
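A minimal numpy sketch of one common contrastive-loss formulation consistent with this description (the margin value and function names are assumptions, not taken from the lecture):

```python
import numpy as np

def contrastive_loss(out1, out2, same, margin=1.0):
    """Loss on a pair of network outputs.
    same=1: penalize distance (pull the pair together);
    same=0: penalize closeness up to the margin (push the pair apart)."""
    d = np.linalg.norm(out1 - out2)
    return 0.5 * d**2 if same else 0.5 * max(0.0, margin - d)**2

a = np.array([0.1, 0.9])
b = np.array([0.2, 0.8])
print(contrastive_loss(a, b, same=1))   # small distance, same class -> small loss
print(contrastive_loss(a, b, same=0))   # small distance, different class -> large loss
```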