1 ImageNet Classification with Deep Convolutional Neural Networks
● Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012
● Presented by Eunsoo Oh (오은수)

2 ILSVRC
● ImageNet Large Scale Visual Recognition Challenge
● An image classification challenge with 1,000 categories (1.2 million training images)
● Won in 2012 by a deep convolutional neural network

3 Why Deep Learning?
● “Shallow” vs. “deep” architectures
● Deep architectures learn a feature hierarchy all the way from pixels to classifier

4 Background
● A neuron takes inputs x1, …, xd (raw pixels) with weights w1, …, wd and produces the output f(w·x + b)
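A minimal numpy sketch of this neuron model; the input vector, weights, and the ReLU-style activation below are illustrative choices, not values from the paper:

```python
import numpy as np

# One neuron: output = f(w . x + b), with an illustrative activation f.
def neuron(x, w, b, f=lambda z: np.maximum(0.0, z)):
    """Compute f(w . x + b) for a single neuron."""
    return f(np.dot(w, x) + b)

x = np.array([0.2, 0.5, 0.1])   # raw pixel inputs x1..xd (made up)
w = np.array([0.4, -0.3, 0.8])  # weights w1..wd (made up)
print(neuron(x, w, b=0.1))      # scalar activation
```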

5 Background
● Multi-Layer Neural Networks (input layer → hidden layer → output layer)
● Nonlinear classifier
● Learning can be done by gradient descent using the back-propagation algorithm, as sketched below
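A toy two-layer network with a single back-propagation step, to make the gradient-descent learning concrete; the layer sizes, tanh nonlinearity, and squared-error loss are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

# Two-layer network: one forward pass, one backward pass, one update.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.01, (4, 3)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 0.01, (2, 4)), np.zeros(2)   # hidden -> output

x, y = rng.normal(size=3), np.array([1.0, 0.0])     # one training example

# Forward pass with a tanh hidden nonlinearity.
h = np.tanh(W1 @ x + b1)
p = W2 @ h + b2
loss = 0.5 * np.sum((p - y) ** 2)

# Backward pass (chain rule), then a gradient-descent update.
dp = p - y
dW2, db2 = np.outer(dp, h), dp
dh = W2.T @ dp * (1 - h ** 2)        # tanh'(z) = 1 - tanh(z)^2
dW1, db1 = np.outer(dh, x), dh
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```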

6 Background
● Convolutional Neural Networks
● A variation of multi-layer neural networks
● Each layer applies a kernel (convolution matrix) to its input

7 Background
● A convolutional filter slides over the input to produce a feature map, as in the sketch below
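A naive sketch of how a convolutional filter produces a feature map (real CNN libraries implement this far more efficiently); the 8×8 input and the vertical-edge kernel are made up for illustration:

```python
import numpy as np

# Naive 2-D convolution of an input map with a kernel. Like CNN layers,
# this actually computes cross-correlation (no kernel flip).
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])  # vertical-edge kernel
feature_map = conv2d(np.random.rand(8, 8), edge)
print(feature_map.shape)  # (6, 6)
```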

8 Proposed Method
● Deep Convolutional Neural Network
● 5 convolutional and 3 fully connected layers
● 650,000 neurons, 60 million parameters
● Techniques for boosting performance:
● ReLU nonlinearity
● Training on multiple GPUs
● Overlapping max pooling
● Data augmentation
● Dropout

9 Rectified Linear Units (ReLU)
● f(x) = max(0, x)
● Non-saturating; the paper reports that ReLU networks train several times faster than equivalent tanh networks
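A one-line sketch of the ReLU nonlinearity next to the saturating tanh it replaced:

```python
import numpy as np

# ReLU: f(x) = max(0, x). Unlike tanh, it does not saturate for
# positive inputs, which helps gradients flow during training.
def relu(x):
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))      # [0.  0.  0.  0.5 2. ]
print(np.tanh(z))   # saturates toward +/-1 for large |z|
```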

10 Training on Multiple GPUs
● The network is spread across two GTX 580 GPUs with 3 GB of memory each
● The architecture is particularly well-suited to cross-GPU parallelization
● Very efficient GPU implementation of the CNN

11 Pooling
● Spatial pooling over non-overlapping or overlapping regions
● Sum or max within each region (see the sketch below)
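A sketch of spatial max pooling; with a 3×3 window and stride 2 (the paper's overlapping-pooling setting) the pooling regions overlap:

```python
import numpy as np

# Max pooling: stride < window size makes the regions overlap.
def max_pool(x, size=3, stride=2):
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r+size, c:c+size].max()
    return out

print(max_pool(np.random.rand(6, 6)).shape)  # (2, 2)
```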

12 Data Augmentation
● Extract random 224×224 crops from the 256×256 training images
● Also train on horizontal flips of the crops
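A sketch of this augmentation, a random 224×224 crop of a 256×256 image plus an occasional horizontal flip; the helper name is mine, not the paper's:

```python
import numpy as np

# Random 224x224 crop of a 256x256 image, flipped half the time.
def augment(img, crop=224):
    h, w = img.shape[:2]
    top = np.random.randint(h - crop + 1)
    left = np.random.randint(w - crop + 1)
    patch = img[top:top+crop, left:left+crop]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]   # horizontal flip
    return patch

print(augment(np.zeros((256, 256, 3))).shape)  # (224, 224, 3)
```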

13 Dropout
● Independently set each hidden unit's activity to zero with probability 0.5
● Used in the two globally-connected hidden layers at the net's output
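A sketch of dropout as described on this slide; the test-time scaling by 0.5 follows the paper's scheme of halving the outputs of units that were dropped during training:

```python
import numpy as np

# Dropout: zero each activation independently with probability p
# while training; scale activations by (1 - p) at test time.
def dropout(h, p=0.5, training=True):
    if not training:
        return h * (1.0 - p)                 # test-time scaling
    mask = (np.random.rand(*h.shape) >= p)   # keep with prob 1 - p
    return h * mask

h = np.random.rand(8)
print(dropout(h))  # roughly half the activations are zeroed
```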

14 Overall Architecture
● Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (5-6 days)
● 650,000 neurons, 60 million parameters, 630 million connections
● The last layer contains 1,000 neurons, which produce a distribution over the 1,000 class labels

15 Results
● ILSVRC-2010 test set (top-1 / top-5 error rates)
● ILSVRC-2010 winner (sparse coding): 47.1% / 28.2%
● Previous best published result (SIFT + Fisher vectors): 45.7% / 25.7%
● Proposed method: 37.5% / 17.0%

16 Results
● ILSVRC-2012 results
● Proposed method top-5 error rate: 15.3%
● Runner-up top-5 error rate: 26.2%

17 Qualitative Evaluations

18 Qualitative Evaluations

19 ILSVRC-2013 Classification

20 ILSVRC-2014 Classification
● GoogLeNet: 22 layers
● VGG: 19 layers

21 Conclusion
● A large, deep convolutional neural network for large-scale image classification was proposed
● 5 convolutional layers, 3 fully-connected layers
● 650,000 neurons, 60 million parameters
● Several techniques for boosting performance
● Several techniques for reducing overfitting
● The proposed method won ILSVRC-2012
● Achieved a winning top-5 error rate of 15.3%, compared to 26.2% for the second-best entry

22 Q & A

23 Quiz
● 1. The proposed method used hand-designed features, so there was no need to learn features and feature hierarchies. (True / False)
● 2. Which technique was not used in this paper? ① Dropout ② Rectified Linear Units nonlinearity ③ Training on multiple GPUs ④ Local contrast normalization

24 Appendix: Feature Visualization
● 96 learned low-level (1st-layer) filters

25 Appendix: Visualizing CNN
● Reference: M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. arXiv preprint arXiv:1311.2901, 2013.

26 Appendix: Local Response Normalization
● $a^i_{x,y}$ : the activity of a neuron computed by applying kernel $i$ at position $(x, y)$
● The response-normalized activity is given by $b^i_{x,y} = a^i_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^j_{x,y} \right)^2 \right)^{\beta}$
● $N$ : the total number of kernels in the layer
● Hyper-parameters: $n = 5$, $k = 2$, $\alpha = 10^{-4}$, $\beta = 0.75$
● This aids generalization even though ReLUs don't require it
● This reduces the top-5 error rate by 1.2%
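A numpy sketch of this normalization for activations laid out as (kernels, height, width); the loop-based form mirrors the formula directly rather than any optimized implementation:

```python
import numpy as np

# Local response normalization across adjacent kernel maps, using the
# slide's hyper-parameters k=2, n=5, alpha=1e-4 and beta=0.75.
def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    N = a.shape[0]                 # total number of kernels
    b = np.empty_like(a)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = k + alpha * np.sum(a[lo:hi+1] ** 2, axis=0)
        b[i] = a[i] / denom ** beta
    return b

print(lrn(np.random.rand(96, 4, 4)).shape)  # (96, 4, 4)
```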

27 Appendix: Another Data Augmentation
● Alter the intensities of the RGB channels in training images
● Perform PCA on the set of RGB pixel values
● To each training image, add multiples of the found principal components
● To each RGB image pixel, add the quantity $[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3][\alpha_1 \lambda_1, \alpha_2 \lambda_2, \alpha_3 \lambda_3]^T$
● $\mathbf{p}_i$, $\lambda_i$ : $i$-th eigenvector and eigenvalue of the 3×3 covariance matrix of RGB pixel values
● $\alpha_i$ : random variable drawn from a Gaussian with mean 0 and standard deviation 0.1
● This reduces the top-1 error rate by over 1%
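A sketch of this PCA color augmentation; for brevity the covariance is computed from a single image, whereas the paper performs PCA over the RGB values of the whole training set:

```python
import numpy as np

# PCA color augmentation: eigen-decompose the 3x3 RGB covariance, then
# add p_i * (alpha_i * lambda_i) to every pixel, alpha_i ~ N(0, 0.1).
def pca_color_augment(img):
    pixels = img.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)      # 3x3 RGB covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # lambda_i, p_i (as columns)
    alphas = np.random.normal(0.0, 0.1, 3)
    shift = eigvecs @ (alphas * eigvals)    # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return img + shift                      # broadcast over all pixels

img = np.random.rand(224, 224, 3)
print(pca_color_augment(img).shape)  # (224, 224, 3)
```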

28 Appendix: Details of Learning
● Stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005
● The update rule for weight $w$ was $v_{i+1} = 0.9\, v_i - 0.0005\, \epsilon\, w_i - \epsilon \left\langle \frac{\partial L}{\partial w} \Big|_{w_i} \right\rangle_{D_i}$, $\; w_{i+1} = w_i + v_{i+1}$
● $i$ : the iteration index
● $\epsilon$ : the learning rate, initialized at 0.01 and reduced three times prior to termination
● $\left\langle \partial L / \partial w |_{w_i} \right\rangle_{D_i}$ : the average over the $i$-th batch $D_i$ of the derivative of the objective with respect to $w$
● Trained for roughly 90 cycles through the training set of 1.2 million images
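A sketch of this update rule with the slide's settings (momentum 0.9, weight decay 0.0005, initial learning rate 0.01); the random weight and gradient below stand in for a real parameter and its batch-averaged gradient:

```python
import numpy as np

# Momentum SGD with weight decay, mirroring the slide's update rule:
# v <- 0.9*v - 0.0005*eps*w - eps*grad ; w <- w + v
def sgd_step(w, v, grad, eps=0.01, momentum=0.9, weight_decay=5e-4):
    v = momentum * v - weight_decay * eps * w - eps * grad
    return w + v, v

w = np.random.randn(10)
v = np.zeros_like(w)
grad = np.random.randn(10)   # stands in for the batch-averaged gradient
w, v = sgd_step(w, v, grad)
```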

