ImageNet Classification with Deep Convolutional Neural Networks


ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton (University of Toronto)
Presenter: Aydin Ayanzadeh
Computer Vision, Dr.-Ing. Hazım Kemal Ekenel, Spring 2018

Outline
● Introduction
● Dataset
● Architecture of the Network
● Reducing Overfitting
● Results

ImageNet
● About 15M labeled high-resolution images
● Roughly 22K categories
● Collected from the web and labeled by workers on Amazon Mechanical Turk

ILSVRC: ImageNet Large Scale Visual Recognition Challenge
Task: classify 1.2M training images (plus 50K validation and 150K test images) into 1,000 categories
Goal: minimize top-5 error
Winning top-5 error rates by year:
● 2010: NEC-UIUC (Lin et al.), 28.2%
● 2011: XRCE (Perronnin et al.), 25.8%
● 2012: SuperVision (Krizhevsky et al.), 16.4%
● 2013: ZFNet (Zeiler & Fergus), 11.7%
● 2014: GoogLeNet (Szegedy et al.), 6.7%

Task in ImageNet

Rectified Linear Units (ReLUs)
● Much faster than classical saturating activations such as tanh (see the sketch below)
● Very computationally efficient
● Converges quickly: about six times faster than tanh
Fig 2. A four-layer convolutional neural network with ReLUs (solid line) reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons (dashed line). The learning rates for each network were chosen independently to make training as fast as possible. No regularization of any kind was employed. The magnitude of the effect demonstrated here varies with network architecture, but networks with ReLUs consistently learn several times faster than equivalents with saturating neurons.
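As a quick illustration (ours, not from the slides): ReLU is simply f(x) = max(0, x), so its gradient is 1 wherever the unit is active and the unit never saturates, unlike tanh. A minimal sketch in Python:

```python
import numpy as np

def relu(x):
    # Identity for positive inputs, zero otherwise.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 for x > 0; no saturation, so gradients do not vanish.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))        # [0.  0.  0.  1.5 3. ]
print(relu_grad(x))   # [0. 0. 0. 1. 1.]
print(np.tanh(x))     # saturates toward -1/+1, so gradients shrink for large |x|
```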

AlexNet: General Features
● 650K neurons
● 60M parameters
● 630M connections
● 7 hidden weight layers
● Rectified Linear Units (ReLU)
● Dropout trick
● Trained on randomly extracted 224×224 patches

Architecture
The input image size cannot be 224×224: (224 - 11 + 2×0)/4 + 1 = 54.25, which is not an integer. With a 227×227 input, (227 - 11 + 2×0)/4 + 1 = 55, which matches CONV1's 55×55 output.
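This output-size arithmetic generalizes to (W - F + 2P)/S + 1 for input width W, filter size F, padding P, and stride S. A small helper (ours) makes the check explicit:

```python
def conv_output_size(w, f, stride, pad):
    """Spatial output size of a conv layer: (W - F + 2P)/S + 1."""
    out = (w - f + 2 * pad) / stride + 1
    if out != int(out):
        raise ValueError(f"non-integer output size {out}: invalid geometry")
    return int(out)

print(conv_output_size(227, 11, 4, 0))   # 55 -> valid CONV1 geometry
try:
    conv_output_size(224, 11, 4, 0)
except ValueError as e:
    print(e)                             # 54.25 is not an integer
```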

Architecture
Full (simplified) AlexNet architecture:
[227×227×3]  INPUT
[55×55×96]   CONV1: 96 11×11 filters at stride 4, pad 0
[27×27×96]   MAX POOL1: 3×3 filters at stride 2
[27×27×96]   NORM1: normalization layer
[27×27×256]  CONV2: 256 5×5 filters at stride 1, pad 2
[13×13×256]  MAX POOL2: 3×3 filters at stride 2
[13×13×256]  NORM2: normalization layer
[13×13×384]  CONV3: 384 3×3 filters at stride 1, pad 1
[13×13×384]  CONV4: 384 3×3 filters at stride 1, pad 1
[13×13×256]  CONV5: 256 3×3 filters at stride 1, pad 1
[6×6×256]    MAX POOL3: 3×3 filters at stride 2
[4096]       FC6: 4096 neurons
[4096]       FC7: 4096 neurons
[1000]       FC8: 1000 neurons (class scores)
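This layer stack translates almost directly into code. A minimal PyTorch sketch (ours, not the authors' original implementation; the paper splits most layers across two GPUs, which this single-device version ignores):

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # CONV1 -> 55x55x96
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # POOL1 -> 27x27x96
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # CONV2 -> 27x27x256
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # POOL2 -> 13x13x256
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # CONV3 -> 13x13x384
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # CONV4 -> 13x13x384
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # CONV5 -> 13x13x256
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # POOL3 -> 6x6x256
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),                   # FC6
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),                          # FC7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                   # FC8 (class scores)
        )

    def forward(self, x):
        x = self.features(x)          # [N, 256, 6, 6]
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNet()
print(model(torch.randn(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])
```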

Local Response Normalization
● Reduces top-1 and top-5 error rates by 1.4% and 1.2%, respectively
● Hyperparameters: k = 2, n = 5, α = 10^-4, β = 0.75
● Applied after the ReLU nonlinearity in certain layers
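For reference, the normalization formula from the paper, where a^i_{x,y} is the ReLU activity of kernel i at position (x, y), N is the total number of kernels in the layer, and the sum runs over n adjacent kernel maps:

```latex
b^{i}_{x,y} = a^{i}_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \bigl( a^{j}_{x,y} \bigr)^{2} \right)^{\beta}
```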

Data Augmentation
● Reduces overfitting by artificially enlarging the dataset
● Two forms of augmentation:
○ Extracting 224×224 patches and their horizontal reflections from the 256×256 images (at test time, the four corner patches plus the center patch, and their reflections)
○ Altering the intensities of the RGB channels in training images by performing PCA on the set of RGB pixel values (see the sketch below)
● The color-intensity scheme reduces top-1 error by over 1%
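A sketch of the PCA color-intensity augmentation ("fancy PCA"); the function name and the per-image eigen-decomposition in the toy usage are ours, whereas the paper computes the decomposition once over all training-set RGB pixels and draws α ~ N(0, 0.1):

```python
import numpy as np

def fancy_pca(image, eigvecs, eigvals, sigma=0.1):
    """Add PCA-based color noise: shift every pixel along the principal
    components of the RGB distribution, scaled by the eigenvalues.

    image:   H x W x 3 float array in [0, 1]
    eigvecs: 3 x 3 matrix whose columns are eigenvectors of the RGB covariance
    eigvals: length-3 array of the matching eigenvalues
    """
    alpha = np.random.normal(0.0, sigma, size=3)  # drawn once per image
    shift = eigvecs @ (alpha * eigvals)           # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return np.clip(image + shift, 0.0, 1.0)

# Toy usage: eigen-decomposition from one image's own pixels.
img = np.random.rand(227, 227, 3)
cov = np.cov(img.reshape(-1, 3), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
augmented = fancy_pca(img, eigvecs, eigvals)
```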

Dropout
● Reduces overfitting
● Zeroes the output of each hidden neuron with probability 0.5
● Roughly doubles the number of iterations required to converge
● Forces the network to learn more robust features
● Applied in the first two fully connected layers

Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
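A minimal sketch of dropout on a single activation vector. The paper zeroes units at training time and halves the outputs at test time; the "inverted" rescaling by 1/(1 - p) shown here is the common modern equivalent:

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors so the expected activation is unchanged, which lets
    test-time inference use the weights as-is."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.random.rand(8)
print(dropout(h, p=0.5))           # roughly half the units zeroed, rest rescaled
print(dropout(h, training=False))  # unchanged at test time
```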

Stochastic Gradient Descent
● SGD with a batch size of 128
● Learning rate initialized to 0.01 (equal for all layers) and divided by 10 whenever the validation error stopped improving
● Neuron biases in conv layers 2, 4, and 5 and in the FC layers initialized to 1; all other biases to 0
● Weights initialized from a zero-mean Gaussian with standard deviation 0.01
● Trained on NVIDIA GTX 580 GPUs (3 GB)
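The paper's exact update rule adds momentum 0.9 and weight decay 0.0005; with i the iteration index, v the momentum variable, ε the learning rate, and the gradient averaged over the i-th batch D_i:

```latex
v_{i+1} := 0.9\, v_i \;-\; 0.0005\,\varepsilon\, w_i \;-\; \varepsilon \left\langle \frac{\partial L}{\partial w} \Big|_{w_i} \right\rangle_{D_i},
\qquad
w_{i+1} := w_i + v_{i+1}
```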

Results

Model      Top-1 (val)   Top-5 (val)   Top-5 (test)
SIFT+FVs   -             -             26.2%
1 CNN      40.7%         18.2%         -
5 CNNs     38.1%         16.4%         16.4%
1 CNN*     39.0%         16.6%         -
7 CNNs*    36.7%         15.4%         15.3%

Table 2: Comparison of error rates on ILSVRC-2012 validation and test sets. In italics are best results achieved by others. Models with an asterisk were "pre-trained" to classify the entire ImageNet 2011 Fall release. See Section 6 for details.

● Averaging the predictions of two CNNs that were pre-trained on the entire Fall 2011 release with the five CNNs gives the 15.3% top-5 test error.

Conclusion
AlexNet's key ingredients:
● Rectified Linear Units (ReLU)
● Dropout trick
● Data augmentation
● Training with batch stochastic gradient descent
● Top-5 error rate of 15.3% on the ILSVRC-2012 test set

Qualitative Evaluations

Visualizing the First Layer
Fig 5. 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images. The top 48 kernels were learned on GPU 1 and the bottom 48 on GPU 2. See Section 6.1 for details.
● Top 48 kernels (GPU 1): largely color-agnostic
● Bottom 48 kernels (GPU 2): largely color-specific

