8 Proposed Method
Deep convolutional neural network
- 5 convolutional and 3 fully connected layers
- 650,000 neurons, 60 million parameters
Some techniques for boosting performance:
- ReLU nonlinearity
- Training on multiple GPUs
- Overlapping max pooling
- Data augmentation
- Dropout
10 Training on Multiple GPUs
- The network is spread across two GPUs (GTX 580, 3 GB of memory each)
- The architecture is particularly well-suited to cross-GPU parallelization
- Very efficient GPU implementation of the convolution operations (the idea is sketched below)
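As a rough illustration of the model-parallel split, here is a minimal PyTorch sketch (PyTorch and the class name TwoGPUConv are assumptions for illustration only; the paper used a custom, highly optimized GPU implementation). Each GPU holds half of the first layer's 96 kernels:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the cross-GPU split; requires two CUDA devices.
class TwoGPUConv(nn.Module):
    def __init__(self):
        super().__init__()
        # Half of the 96 first-layer kernels live on each GPU.
        self.conv_a = nn.Conv2d(3, 48, kernel_size=11, stride=4).to('cuda:0')
        self.conv_b = nn.Conv2d(3, 48, kernel_size=11, stride=4).to('cuda:1')

    def forward(self, x):
        # Both towers see the same input; their feature maps are combined
        # only at the layers where the paper allows GPU communication.
        ya = self.conv_a(x.to('cuda:0'))
        yb = self.conv_b(x.to('cuda:1'))
        return torch.cat([ya, yb.to('cuda:0')], dim=1)  # 96 feature maps
```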
11 Pooling
- Spatial pooling summarizes responses over non-overlapping or overlapping regions
- The pooled value is either the sum or the max of each region
- The paper uses overlapping max pooling (sketched below)
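A minimal sketch using PyTorch's built-in pooling layer (PyTorch is an assumption; the overlapping configuration of 3x3 windows with stride 2 is the one the paper reports):

```python
import torch.nn as nn

# Overlapping max pooling as in the paper: 3x3 windows moved with stride 2,
# so neighbouring pooling windows overlap by one pixel.
overlapping_pool = nn.MaxPool2d(kernel_size=3, stride=2)

# Non-overlapping pooling for comparison: the window equals the stride.
non_overlapping_pool = nn.MaxPool2d(kernel_size=2, stride=2)
```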
12 Data Augmentation
- Enlarge the dataset: extract random 224x224 patches from the 256x256 training images
- Also include the horizontal reflections of these patches (sketched below)
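A NumPy sketch of this scheme (NumPy and the function name random_crop_and_flip are assumptions for illustration):

```python
import random
import numpy as np

def random_crop_and_flip(image, crop=224):
    """Sample a random 224x224 patch from a 256x256 training image and
    reflect it horizontally with probability 0.5."""
    h, w = image.shape[:2]
    top = random.randint(0, h - crop)
    left = random.randint(0, w - crop)
    patch = image[top:top + crop, left:left + crop]
    if random.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

# Example: one augmented sample from a dummy 256x256 RGB image.
img = np.zeros((256, 256, 3), dtype=np.uint8)
sample = random_crop_and_flip(img)  # shape: (224, 224, 3)
```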
13 Dropout
- Independently set each hidden unit's activity to zero with 0.5 probability
- Used in the two globally-connected hidden layers at the net's output (sketched below)
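A NumPy sketch of this behaviour (NumPy and the function name are assumptions; the test-time branch reflects the paper's choice of multiplying all outputs by 0.5 instead of dropping units):

```python
import numpy as np

def dropout(activations, p=0.5, train=True):
    """Zero each hidden unit independently with probability p during
    training; at test time, scale all outputs instead."""
    if not train:
        return activations * (1.0 - p)  # test-time scaling (0.5 for p=0.5)
    mask = np.random.rand(*activations.shape) >= p
    return activations * mask
```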
14 Overall Architecture
- Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (5-6 days)
- 650,000 neurons, 60 million parameters, 630 million connections
- The last layer contains 1,000 neurons, producing a distribution over the 1,000 class labels
- A single-GPU sketch of the layer stack follows
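For reference, a single-tower PyTorch sketch of the 8 learned layers (an assumption for illustration: the two-GPU split and response normalization are omitted, so the parameter count differs slightly from the paper's 60 million):

```python
import torch.nn as nn

# Single-GPU approximation of the 5-conv + 3-FC architecture (input: 3x224x224).
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),                      # overlapping pooling
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1,000 logits -> softmax over the class labels
)

n_params = sum(p.numel() for p in alexnet_like.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~62M for this single-tower variant
```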
15 Results
Error rates on the ILSVRC-2010 test set:

Model                                        | Top-1 error | Top-5 error
ILSVRC-2010 winner (sparse coding)           | 47.1%       | 28.2%
Previous best published result (SIFT + FVs)  | 45.7%       | 25.7%
Proposed method (CNN)                        | 37.5%       | 17.0%
21 Conclusion
- Large, deep convolutional neural networks for large-scale image classification were proposed
- 5 convolutional layers, 3 fully-connected layers
- 650,000 neurons, 60 million parameters
- Several techniques for boosting performance
- Several techniques for reducing overfitting
- The proposed method won ILSVRC-2012, achieving a winning top-5 error rate of 15.3%, compared to 26.2% for the second-best entry
23 Quiz
1. The proposed method used hand-designed features, so there was no need to learn features and feature hierarchies. (True / False)
2. Which technique was not used in this paper?
① Dropout
② Rectified Linear Unit (ReLU) nonlinearity
③ Training on multiple GPUs
④ Local contrast normalization
25 Appendix: Visualizing CNNs
reference: M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. arXiv preprint arXiv:1311.2901, 2013.
26 Appendix: Local Response Normalization
Let $a^i_{x,y}$ denote the activity of a neuron computed by applying kernel $i$ at position $(x, y)$. The response-normalized activity $b^i_{x,y}$ is given by

$$b^i_{x,y} = a^i_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^j_{x,y} \right)^2 \right)^{\beta}$$

- $N$: the total number of kernels in the layer
- $n$: hyper-parameter, $n = 5$
- $k$: hyper-parameter, $k = 2$
- $\alpha$: hyper-parameter, $\alpha = 10^{-4}$
- $\beta$: hyper-parameter, $\beta = 0.75$
- This aids generalization even though ReLUs do not require it
- Reduces the top-5 error rate by 1.2%
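A NumPy sketch of this normalization (NumPy and the function name are assumptions; the hyper-parameter defaults follow the values listed above):

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Apply the response normalization above to float activities `a`
    of shape (N, H, W), where axis 0 indexes the N kernels."""
    N = a.shape[0]
    b = np.empty_like(a)
    for i in range(N):
        lo = max(0, i - n // 2)            # j runs over adjacent kernel maps
        hi = min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b
```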
27 Appendix: Another Data Augmentation
- Alter the intensities of the RGB channels in training images
- Perform PCA on the set of RGB pixel values
- To each training image, add multiples of the found principal components
- To each RGB image pixel, add the quantity

$$[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3]\, [\alpha_1 \lambda_1, \alpha_2 \lambda_2, \alpha_3 \lambda_3]^T$$

- $\mathbf{p}_i$, $\lambda_i$: the $i$-th eigenvector and eigenvalue of the 3x3 covariance matrix of RGB pixel values
- $\alpha_i$: random variable drawn from a Gaussian with mean 0 and standard deviation 0.1
- Reduces the top-1 error rate by over 1% (a sketch follows)
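A NumPy sketch of the procedure (an assumption for illustration: the paper performs the eigen-decomposition once over the training set's pixels, while this sketch computes it from a single image for brevity):

```python
import numpy as np

def pca_color_augment(image, sigma=0.1):
    """Add alpha_i * lambda_i multiples of the RGB principal components
    to every pixel of an HxWx3 float image."""
    pixels = image.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)        # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)    # lambda_i and columns p_i
    alphas = np.random.normal(0.0, sigma, 3)  # alpha_i ~ N(0, 0.1^2)
    shift = eigvecs @ (alphas * eigvals)      # [p1 p2 p3][a1*l1 a2*l2 a3*l3]^T
    return image + shift                      # same shift for every pixel
```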
28 Appendix: Details of Learning
- Stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005
- The update rule for weight $w$ was

$$v_{i+1} := 0.9\, v_i - 0.0005\, \epsilon\, w_i - \epsilon \left\langle \frac{\partial L}{\partial w} \Big|_{w_i} \right\rangle_{D_i}, \qquad w_{i+1} := w_i + v_{i+1}$$

- $i$: the iteration index
- $\epsilon$: the learning rate, initialized at 0.01 and reduced three times prior to termination
- $\langle \partial L / \partial w |_{w_i} \rangle_{D_i}$: the average over the $i$-th batch $D_i$ of the derivative of the objective with respect to $w$, evaluated at $w_i$
- Train for 90 cycles through the training set of 1.2 million images (one step of the rule is sketched below)
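A minimal sketch of one step of this rule (the function name is hypothetical; NumPy arrays stand in for the weights):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One application of the update rule above:
    v <- 0.9*v - 0.0005*lr*w - lr*grad;  w <- w + v."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

# Example with a single scalar "weight".
w, v = np.array(1.0), np.array(0.0)
w, v = sgd_momentum_step(w, v, grad=np.array(0.2), lr=0.01)
```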