Lecture 3b: CNN: Advanced Layers


Lecture 3b: CNN: Advanced Layers
boris.ginsburg@gmail.com

Agenda: Advanced Layers
- Dropout (Hinton et al.)
- Stochastic pooling (Zeiler, Fergus)
- Maxout (Goodfellow)
- Network-in-Network (Min Lin et al.)
- GoogLeNet (Szegedy et al.)
- Siamese networks

Dropout
Dropout is a very powerful training technique, usually used for fully connected layers (Hinton et al.).
Training: set the output of each hidden neuron to 0 with probability 0.5 ("drop"). The neurons which are "dropped out" in this way do not contribute to the forward pass and do not participate in back-propagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. Note that a smaller weight initialization (scaled by 1/2) should be used.
Testing: use all the neurons, but multiply their outputs by 0.5.
See http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
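A minimal numpy sketch of the training/testing behaviour described above (the function name and shapes are illustrative, not from the lecture): units are dropped with probability 0.5 during training; at test time all units are kept and their outputs are multiplied by 0.5.

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    if train:
        mask = (rng.random(x.shape) >= p_drop)   # each unit kept with probability 1 - p_drop
        return x * mask                          # dropped units contribute nothing to forward/backward
    else:
        return x * (1.0 - p_drop)                # test time: use all units, scale outputs by 0.5

h = np.array([1.0, -2.0, 3.0, 0.5])
print(dropout_forward(h, train=True))   # some entries zeroed
print(dropout_forward(h, train=False))  # all entries halved
```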

Training with dropout
The ideal training regime for dropout is when the training procedure resembles training an ensemble with bagging under parameter-sharing constraints: each dropout update can be seen as an update to a different model on a different subset of the training set.
Training with dropout is very different from ordinary SGD:
- SGD moves slowly and steadily in the most promising direction. SGD usually works best with a small learning rate that results in a smoothly decreasing objective function.
- Dropout rapidly explores many different directions and rejects the ones that worsen performance. Dropout works best with a large learning rate, resulting in a constantly fluctuating objective function.
http://arxiv.org/pdf/1302.4389.pdf

Stochastic Pooling (Zeiler & Fergus)
Similar to the dropout technique; used instead of max-pooling.
Training:
- Compute a probability for each element in the pooling region by normalizing the activations inside the region: $p_i = \frac{a_i}{\sum_{k \in R} a_k}$
- Pool one activation sampled according to the probabilities from step 1.
Testing: weighted pooling: $s = \sum_{k \in R} p_k a_k$
http://arxiv.org/pdf/1302.4389.pdf
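A small numpy sketch of pooling a single region under these rules (function and variable names are mine; the region is assumed to hold non-negative, post-ReLU activations with a positive sum):

```python
import numpy as np

def stochastic_pool(region, train=True, rng=np.random.default_rng(0)):
    """Pool one region of non-negative activations stochastically."""
    a = region.ravel()
    p = a / a.sum()                      # p_i = a_i / sum_k a_k
    if train:
        return rng.choice(a, p=p)        # sample one activation according to p
    return np.dot(p, a)                  # test: weighted pooling, s = sum_k p_k a_k

region = np.array([[1.0, 2.0], [0.0, 5.0]])
print(stochastic_pool(region, train=True))
print(stochastic_pool(region, train=False))  # (1*1 + 2*2 + 5*5) / 8 = 3.75
```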

Stochastic Pooling

Maxout
Maxout is a new type of non-linear activation function which takes the maximum across $M$ affine features ("pooling across channels").
Example:
Classical MLP unit (one affine piece): $h_i = \mathrm{ReLU}(z_{i1}) = \mathrm{ReLU}(w_{i1} \cdot v + b_{i1})$
Maxout unit: $h_i = \max_{j=1..M} z_{ij} = \max_{j=1..M} (w_{ij} \cdot v + b_{ij})$
MLP with 2 maxout units
Goodfellow: http://www-etud.iro.umontreal.ca/~goodfeli/maxout.html
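A possible numpy sketch of a maxout layer as defined above, with each unit taking the max over its $M$ affine pieces (shapes and names are illustrative assumptions):

```python
import numpy as np

def maxout(v, W, b):
    """Maxout units: h_i = max_j (w_ij . v + b_ij).
    W has shape (num_units, num_pieces, input_dim), b has shape (num_units, num_pieces)."""
    z = np.einsum('ijk,k->ij', W, v) + b   # z_ij = w_ij . v + b_ij
    return z.max(axis=1)                   # max over the M affine pieces of each unit

rng = np.random.default_rng(0)
v = rng.standard_normal(4)
W = rng.standard_normal((3, 2, 4))         # 3 maxout units, 2 affine pieces each
b = rng.standard_normal((3, 2))
print(maxout(v, W, b))                     # shape (3,)
```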

Maxout + Dropout
Maxout works exceptionally well with dropout.

Network-in-Network
Replace the conventional linear filter with a micro nonlinear filter (a small MLP), which is slid over the input (similar to a CNN).
Min Lin et al., http://arxiv.org/abs/1312.4400
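As an illustration, the micro-MLP applied at every spatial position is equivalent to a stack of 1x1 convolutions. A rough numpy sketch (layer sizes and names are assumptions, and the ordinary KxK convolution that precedes the micro-MLP in an mlpconv layer is omitted):

```python
import numpy as np

def mlpconv_1x1(x, W1, b1, W2, b2):
    """Two-layer micro-MLP slid over every spatial position (= two 1x1 convolutions).
    x: (C, H, W) feature map; W1: (C1, C); W2: (C2, C1)."""
    C, H, Wd = x.shape
    flat = x.reshape(C, -1)                       # each column is one spatial position
    h = np.maximum(0, W1 @ flat + b1[:, None])    # first 1x1 conv + ReLU
    out = np.maximum(0, W2 @ h + b2[:, None])     # second 1x1 conv + ReLU
    return out.reshape(-1, H, Wd)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
out = mlpconv_1x1(x, rng.standard_normal((16, 8)), np.zeros(16),
                  rng.standard_normal((4, 16)), np.zeros(4))
print(out.shape)   # (4, 5, 5)
```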

Network-in-Network - 2
The overall structure of NIN is:
- a stack of mlpconv layers (each a stack of MLP layers)
- global average pooling, used instead of FC layers: the last mlpconv layer has the number of output feature maps equal to the number of classes
- loss layer
https://github.com/BVLC/caffe/wiki/Model-Zoo
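A tiny sketch of the global-average-pooling head described above: the last mlpconv layer emits one feature map per class, each map is averaged to a single score, and softmax is applied directly, with no FC layer (sizes are illustrative):

```python
import numpy as np

def global_average_pool(feature_maps):
    """feature_maps: (num_classes, H, W) output of the last mlpconv layer.
    Returns one score per class, fed directly to softmax."""
    return feature_maps.mean(axis=(1, 2))

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

maps = np.random.default_rng(0).standard_normal((10, 6, 6))   # 10 classes
print(softmax(global_average_pool(maps)))
```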

Network-in-Network: performance
NIN is one of the top performers:

GoogLeNet (2014)
Winner of ILSVRC-2014. Very deep network with 22 layers: network-in-network-in-network...
Removed fully connected layers → small number of parameters (5 million weights)
[Figure legend: Convolution, Pooling, Softmax, Other; OverFeat, 2013]

GoogLeNet (2014) Inception layer

GoogLeNet (2014)
First "naïve" version of the Inception module:
- Very expensive when placed on top of a convolutional layer with a large number of filters.
- Becomes even more expensive when pooling units are used: the number of output filters equals the number of filters in the previous stage.

GoogLeNet (2014)
Adding 1x1 convolutions → reduce dimensions → less compute
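A back-of-the-envelope illustration of why the 1x1 reduction helps: rough multiply-accumulate counts for one 3x3 branch with and without a 1x1 bottleneck (the feature-map and channel sizes below are made up for illustration, not taken from the paper):

```python
# Rough MAC count for a KxK convolution over an HxW feature map.
def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

H, W, C_in, C_out, C_red = 28, 28, 256, 128, 64   # illustrative sizes

naive = conv_macs(H, W, C_in, C_out, 3)
reduced = conv_macs(H, W, C_in, C_red, 1) + conv_macs(H, W, C_red, C_out, 3)
print(f"naive 3x3:     {naive / 1e6:.1f} M MACs")    # ~231.2 M
print(f"1x1 then 3x3:  {reduced / 1e6:.1f} M MACs")  # ~70.6 M
```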

GoogLeNet (2014)
[Figure: full architecture with auxiliary classifiers and the main classifier]

GoogLeNet (2014)
Details of the GoogLeNet architecture

GoogLeNet (2014)
Training:
- DistBelief CPU cluster
- Asynchronous SGD with momentum 0.9
- Fixed learning-rate schedule (decreased by 4% every 8 epochs)
- Polyak averaging for the final model
- Data augmentation
Testing:
- Softmax averaged over 7 models
- Multi-scale detection
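A one-line sketch of the fixed schedule mentioned above, i.e. the learning rate decreased by 4% every 8 epochs (the base learning rate is an assumption, not given on the slide):

```python
def learning_rate(epoch, base_lr=0.01):       # base_lr is an illustrative assumption
    return base_lr * (0.96 ** (epoch // 8))   # 4% decrease every 8 epochs

for e in (0, 8, 16, 64):
    print(e, learning_rate(e))
```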

Siamese Networks
Face verification problem:
- the number of categories is very large and not known during training
- the number of training samples for a single category is very small
The idea: learn a function that maps input patterns into a target space such that the distance is small for pairs of faces from the same person, and large for pairs from different persons.
The mapping from the raw input to the target space is a convolutional network.
The system is trained on pairs of patterns taken from a training set.

Siamese Network
The training set is composed of an equal number of "true" and "false" samples.
Each sample consists of a pair of images and a label ("true" or "false").
http://caffe.berkeleyvision.org/gathered/examples/siamese.html

Siamese Network
Minimize the contrastive loss: pull together the outputs for images from the same class, and push apart the outputs for images from different classes.
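A minimal numpy sketch of one common contrastive-loss formulation consistent with this description (the margin value and function names are assumptions, not taken from the lecture):

```python
import numpy as np

def contrastive_loss(out1, out2, same, margin=1.0):
    """Loss on a pair of network outputs.
    same=1: penalize distance (pull the pair together);
    same=0: penalize closeness up to the margin (push the pair apart)."""
    d = np.linalg.norm(out1 - out2)
    return 0.5 * d**2 if same else 0.5 * max(0.0, margin - d)**2

a = np.array([0.1, 0.9])
b = np.array([0.2, 0.8])
print(contrastive_loss(a, b, same=1))   # small distance, same class -> small loss
print(contrastive_loss(a, b, same=0))   # small distance, different class -> large loss
```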