
1 ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012 Eunsoo Oh ( 오은수 )

2 ILSVRC ● ImageNet Large Scale Visual Recognition Challenge ● An image classification challenge with 1,000 categories (1.2 million images) ● Deep Convolutional Neural Network (ILSVRC-2012 winner)

3 Why Deep Learning? ● "Shallow" vs. "deep" architectures ● Learn a feature hierarchy all the way from pixels to classifier

4 Background ● A neuron ● Inputs x1, …, xd (raw pixels) with weights w1, …, wd and bias b ● Output: f(w·x + b)
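
A minimal sketch of this computation (NumPy; the tanh nonlinearity and the example values are illustrative, not from the slides):

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    """A single neuron: weighted sum of inputs plus bias, through a nonlinearity f."""
    return f(np.dot(w, x) + b)

# d = 4 raw-pixel inputs with weights w1..w4 and bias b
x = np.array([0.2, 0.5, 0.1, 0.9])
w = np.array([0.4, -0.3, 0.8, 0.1])
print(neuron(x, w, b=0.5))
```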

5 Background ● Multi-Layer Neural Networks ● Nonlinear classifier ● Learning can be done by gradient descent → the back-propagation algorithm (see the sketch below) ● (figure: input layer → hidden layer → output layer)
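
A minimal back-propagation sketch, assuming a tiny one-hidden-layer network trained by gradient descent on XOR (the architecture, loss, and learning rate are illustrative choices, not the paper's):

```python
import numpy as np

# One-hidden-layer network trained by gradient descent (back-propagation)
# on XOR, a nonlinear problem a linear classifier cannot solve.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):
    # Forward pass: input -> hidden -> output
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output
    # Backward pass: chain rule, layer by layer (L = mean squared error / 2)
    d_out = (out - y) * out * (1 - out) / len(X)  # dL/d(output pre-activation)
    dW2, db2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)           # back through tanh
    dW1, db1 = X.T @ d_h, d_h.sum(0)
    # Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(out.round(2).ravel())  # typically approaches [0, 1, 1, 0]
```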

6 Background ● Convolutional Neural Networks ● Variation of multi-layer neural networks ● Kernel (convolution matrix)

7 Background ● Convolutional Filter ● (figure: a kernel slides across the input to produce a feature map; a sketch follows below)
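
A minimal sketch of a convolutional filter (plain NumPy loops for clarity; the edge kernel is illustrative, and this is not the paper's GPU implementation):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation) of a kernel over an image,
    producing a feature map; no padding or striding, for illustration."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge = np.array([[-1., 0., 1.]] * 3)   # simple vertical-edge kernel
image = np.random.rand(8, 8)
print(conv2d(image, edge).shape)       # (6, 6) feature map
```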

8 Proposed Method ● Deep Convolutional Neural Network ● 5 convolutional and 3 fully connected layers ● 650,000 neurons, 60 million parameters ● Some techniques for boosting performance: ● ReLU nonlinearity ● Training on multiple GPUs ● Overlapping max pooling ● Data augmentation ● Dropout

9 Rectified Linear Units (ReLU) ● f(x) = max(0, x) ● Non-saturating; trains several times faster than tanh units
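
A one-line sketch of the nonlinearity (NumPy, illustrative):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```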

10 Training on Multiple GPUs ● Network spread across two GPUs ● GTX 580 GPUs with 3GB of memory ● GPUs are particularly well-suited to cross-GPU parallelization ● Very efficient implementation of CNNs on GPUs

11 Pooling ● Spatial pooling ● Non-overlapping / overlapping regions ● Sum or max (see the sketch below) ● (figure: max and sum pooling examples)
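
A minimal pooling sketch covering both variants (NumPy; window size and stride are parameters, with the paper using overlapping 3x3 max pooling at stride 2):

```python
import numpy as np

def pool2d(x, size, stride, op=np.max):
    """Spatial pooling: apply op (max or sum) over size x size windows.
    stride == size gives non-overlapping pooling; stride < size overlaps,
    as in the paper's 3x3 windows with stride 2."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = op(x[i * stride:i * stride + size,
                             j * stride:j * stride + size])
    return out

x = np.arange(16.0).reshape(4, 4)
print(pool2d(x, size=2, stride=2))             # non-overlapping max pooling
print(pool2d(x, size=3, stride=2, op=np.sum))  # overlapping sum pooling
```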

12 Data Augmentation ● Random 224x224 crops from the 256x256 training images ● Horizontal flips
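
A minimal sketch of these two augmentations (NumPy; the helper name `augment` is hypothetical):

```python
import numpy as np

def augment(img, crop=224, rng=np.random.default_rng()):
    """Take a random crop x crop patch from a 256x256 training image
    and flip it horizontally with probability 0.5."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]   # horizontal flip
    return patch

img = np.zeros((256, 256, 3))
print(augment(img).shape)        # (224, 224, 3)
```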

13 Dropout ● Independently set each hidden unit's activity to zero with probability 0.5 ● Used in the two globally-connected hidden layers at the net's output
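
A minimal training-time dropout sketch (NumPy; the test-time halving noted in the comment follows the paper):

```python
import numpy as np

def dropout(h, p=0.5, rng=np.random.default_rng()):
    """Training-time dropout: independently zero each hidden activity with
    probability p. (At test time the paper instead multiplies the units'
    outputs by 0.5 rather than dropping any.)"""
    mask = rng.random(h.shape) >= p
    return h * mask

h = np.ones(8)
print(dropout(h))  # roughly half the units are zeroed
```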

14 Overall Architecture ● Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (5-6 days) ● 650,000 neurons, 60 million parameters, 630 million connections ● The last layer contains 1,000 neurons, which produce a distribution over the 1,000 class labels (see the softmax sketch below)
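
The distribution over class labels comes from a softmax on the final layer's outputs; a minimal sketch (NumPy; the random logits are placeholders for real network outputs):

```python
import numpy as np

def softmax(z):
    """Turn raw last-layer scores into a probability distribution;
    subtracting the max is a standard numerical-stability trick."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.random.randn(1000)      # scores from the 1,000 output neurons
probs = softmax(logits)
print(probs.sum(), probs.argmax())  # 1.0, index of the predicted label
```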

15 Results ● ILSVRC-2010 test set (top-5 error rate): ILSVRC-2010 winner 28.2%, previous best published result 25.7%, proposed method 17.0%

16 Results ● ILSVRC-2012 results ● Proposed method top-5 error rate: 15.3% ● Runner-up top-5 error rate: 26.2%

17 Qualitative Evaluations

18 Qualitative Evaluations

19 ILSVRC-2013 Classification

20 ILSVRC-2014 Classification ● Winning entries grew still deeper: 22 layers (GoogLeNet) and 19 layers (VGG)

21 Conclusion ● A large, deep convolutional neural network for large-scale image classification was proposed ● 5 convolutional layers, 3 fully-connected layers ● 650,000 neurons, 60 million parameters ● Several techniques for boosting performance ● Several techniques for reducing overfitting ● The proposed method won ILSVRC-2012 ● Achieved a winning top-5 error rate of 15.3%, compared to 26.2% for the second-best entry

22 Q & A

23 Quiz ● 1. The proposed method used hand-designed features, so there was no need to learn features and feature hierarchies. (True / False) ● 2. Which technique was not used in this paper? ① Dropout ② Rectified Linear Units nonlinearity ③ Training on multiple GPUs ④ Local contrast normalization

24 Appendix: Feature Visualization ● 96 learned low-level (1st-layer) filters

25 Appendix: Visualizing CNNs ● Reference: M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901, 2013.

26 Appendix: Local Response Normalization ● a^i_{x,y}: the activity of a neuron computed by applying kernel i at position (x, y) ● The response-normalized activity is given by b^i_{x,y} = a^i_{x,y} / ( k + α · Σ_{j = max(0, i−n/2)}^{min(N−1, i+n/2)} (a^j_{x,y})² )^β ● N: the total # of kernels in the layer ● n: hyper-parameter, n = 5 ● k: hyper-parameter, k = 2 ● α: hyper-parameter, α = 10^(−4) ● β: hyper-parameter, β = 0.75 ● This aids generalization even though ReLUs don't require it ● This reduces the top-5 error rate by 1.2%
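
A minimal NumPy sketch of the formula above (β = 0.75 follows the paper; the slide's hyper-parameter list omits it):

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalization over the kernel (channel) axis.
    a has shape (N_kernels, H, W); each activity is divided by
    (k + alpha * sum of squares over n neighboring kernels) ** beta."""
    N = a.shape[0]
    b = np.empty_like(a)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.random.randn(96, 55, 55)      # e.g. first-layer activations
print(local_response_norm(a).shape)  # (96, 55, 55)
```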

27 Appendix: Another Data Augmentation ● Alter the intensities of the RGB channels in training images ● Perform PCA on the set of RGB pixel values ● To each training image, add multiples of the found principal components ● To each RGB image pixel, add the quantity [p1, p2, p3][α1·λ1, α2·λ2, α3·λ3]^T ● p_i, λ_i: the i-th eigenvector and eigenvalue of the 3x3 covariance matrix of RGB pixel values ● α_i: a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1 ● This reduces the top-1 error rate by over 1%
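
A minimal sketch of this "fancy PCA" augmentation (NumPy; note the paper performs the PCA once over the whole training set, whereas this illustration computes it from a single image):

```python
import numpy as np

def pca_color_augment(img, rng=np.random.default_rng()):
    """Perturb each pixel's RGB values along the principal components of
    the RGB covariance, scaled by the eigenvalues and N(0, 0.1) draws."""
    flat = img.reshape(-1, 3)
    cov = np.cov(flat, rowvar=False)      # 3x3 RGB covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)  # lambda_i and p_i (as columns)
    alpha = rng.normal(0.0, 0.1, size=3)  # drawn once per image
    delta = eigvec @ (alpha * eigval)     # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return img + delta                    # same offset added to every pixel

img = np.random.rand(256, 256, 3)
print(pca_color_augment(img).shape)       # (256, 256, 3)
```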

28 Appendix: Details of Learning ● Use stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005 ● The update rule for weight w was v_{i+1} = 0.9·v_i − 0.0005·ε·w_i − ε·⟨∂L/∂w⟩_{D_i}, w_{i+1} = w_i + v_{i+1} ● i: the iteration index ● v: the momentum variable ● ε: the learning rate, initialized at 0.01 and reduced three times prior to termination ● ⟨∂L/∂w⟩_{D_i}: the average over the i-th batch D_i of the derivative of the objective with respect to w ● Train for 90 cycles through the training set of 1.2 million images
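
A minimal sketch of this update rule (NumPy; the function name `sgd_step` and the random gradient are hypothetical):

```python
import numpy as np

def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One update of the paper's rule:
        v <- 0.9*v - 0.0005*lr*w - lr*grad
        w <- w + v
    where grad is the batch-averaged derivative of the objective w.r.t. w."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w = np.random.randn(10)
v = np.zeros_like(w)
grad = np.random.randn(10)   # stands in for <dL/dw> averaged over a batch
w, v = sgd_step(w, v, grad)
```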


