Download presentation

Presentation is loading. Please wait.

Published byAllen Webster Modified about 1 year ago

1
**ImageNet Classification with Deep Convolutional Neural Networks**

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012 Eunsoo Oh (오은수) 1

2
**ILSVRC ImageNet Large Scale Visual Recognition Challenge**

An image classification challenge with 1,000 categories (1.2 million images) Processing… Deep Convolutional Neural Network (ILSVRC-2012 Winner) reference :

3
**Why Deep Learning? “Shallow” vs. “deep” architectures**

Learn a feature hierarchy all the way from pixels to classifier reference :

4
**Background A neuron x1 w1 x2 w2 x3 w3 f … wd xd Input Weights**

(raw pixel) Weights x1 w1 x2 w2 Output: f(w·x+b) f x3 w3 … wd xd reference :

5
**Background Multi-Layer Neural Networks Nonlinear classifier**

Learning can be done by gradient descent Back-Propagation algorithm Input Layer Hidden Layer Output Layer

6
**Background Convolutional Neural Networks Kernel (Convolution Matrix)**

Variation of multi-layer neural networks Kernel (Convolution Matrix) reference :

7
**Background Input Feature Map Convolutional Filter ...**

reference :

8
**Proposed Method Deep Convolutional Neural Network**

5 convolutional and 3 fully connected layers 650,000 neurons, 60 million parameters Some techniques for boosting up performance ReLU nonlinearity Training on Multiple GPUs Overlapping max pooling Data Augmentation Dropout

9
**Rectified Linear Units (ReLU)**

reference :

10
**Training on Multiple GPUs**

Spread across two GPUs GTX 580 GPU with 3GB memory Particularly well-suited to cross-GPU parallelization Very efficient implementation of CNN on GPUs

11
**Pooling Spatial Pooling Non-overlapping / overlapping regions**

Sum or max Max Sum reference :

12
**Data Augmentation Training Images Enlarge the dataset! Training Image**

Horizontal Flip Training Images 224x224 224x224 224x224 224x224 256x256 Enlarge the dataset! 224x224 224x224 Training Image

13
Dropout Independently set each hidden unit activity to zero with 0.5 probability Used in the two globally-connected hidden layers at the net's output reference :

14
Overall Architecture Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (5~6 days) 650,000 neurons, 60 million parameters, 630 million connections The last layer contains 1,000 neurons which produces a distribution over the 1,000 class labels.

15
**Results ILSVRC-2010 test set ILSVRC-2010 winner Previous best**

published result Proposed Method

16
**Results ILSVRC-2012 results Runner-up Top-5 error rate : 26.172%**

reference : Results ILSVRC-2012 results Runner-up Top-5 error rate : % Proposed method Top-5 error rate : %

17
**Qualitative Evaluations**

18
**Qualitative Evaluations**

19
**ILSVRC-2013 Classification**

reference :

20
**ILSVRC-2014 Classification**

22 Layers 19 Layers

21
Conclusion Large, deep convolutional neural networks for large scale image classification was proposed 5 convolutional layers, 3 fully-connected layers 650,000 neurons, 60 million parameters Several techniques for boosting up performance Several techniques for reducing overfitting The proposed method won the ILSVRC-2012 Achieved a winning top-5 error rate of 15.3%, compared to 26.2% achieved by the second-best entry

22
Q & A ? ? ?

23
Quiz 1. The proposed method used hand-designed features, thus there is no need to learn features and feature hierarchies. (True / False) 2. Which technique was not used in this paper? ① Dropout ② Rectified Linear Units nonlinearity ③ Training on multiple GPUs ④ Local contrast normalization

24
**Appendix Feature Visualization**

96 learned low-level(1st layer) filters

25
**Appendix Visualizing CNN**

reference : M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional neural networks. arXiv preprint arXiv: , 2013.

26
**Appendix Local Response Normalization**

: the activity of a neuron computed by applyuing kernel i at position (x, y) The response-normalized activity is given by N : the total # of kernels in the layer n : hyper-parameter, n=5 k : hyper-parameter, k=2 α : hyper-parameter, α=10^(-4) This aids generalization even though ReLU don’t require it. This reduces top-5 error rate by 1.2%

27
**Appendix Another Data Augmentation**

Alter the intensities of the RGB channels in training images Perform PCA on the set of RGB pixel values To each training image, add multiples of the found principal components To each RGB image pixel add the following quantity , : i-th eigenvector and eigenvalue : random variable drawn from a Gaussian with mean 0 and standard deviation 0.1 This reduces top-1 error rate by over 1%

28
**Appendix Details of Learning**

Use stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weigh decay of The update rule for weight w was i : the iteration index : the learning rate, initialized at 0.01 and reduced three times prior to termination : the average over the i-th batch Di of the derivative of the objective with respect to w Train for 90 cycles through the training set of 1.2 million images

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on forward and future contracts Ppt on tata trucks india Ppt on group development and change How to open ppt on mac Ppt on 2nd world war deaths Ppt on amplitude shift keying experiment Ppt on standing order act scores Ppt on new product development steps Ppt on animation Ppt on eye osmolarity