Deep Learning Overview
Sources: https://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf


Deep Learning Overview. Jaya Thomas, Computer Science Department, SUNY Korea

Deep Learning = Learning Representations/Features

Deep Learning = Learning Hierarchical Representations

Trainable Feature Hierarchy

Three Types of Training Protocols

Deep Learning: Why Training is Hard
 Depending on the situation, one or the other hypothesis tends to prevail
 If the first hypothesis holds (underfitting, i.e. the optimization is too hard): use better optimization
   Active area of research
 If the second hypothesis holds (overfitting): use better regularization
   Unsupervised learning
   Stochastic ("dropout") training
 Solution: initialize the hidden layers using unsupervised learning
   Force the network to represent the latent structure of the input distribution
   Encourage the hidden layers to encode that structure

Unsupervised Pre-training
 We use a greedy, layer-wise procedure
 Train one layer at a time, from first to last, with an unsupervised criterion
 Fix the parameters of the previous hidden layers
 The previous layers are viewed as feature extraction
 Procedure:
   First layer: find hidden-unit features that are more common in training inputs than in random inputs
   Second layer: find combinations of hidden-unit features that are more common than random hidden-unit features
   Third layer: find combinations of combinations of hidden-unit features, and so on
 Pre-training initializes the parameters in a region such that the nearby local optima overfit the data less
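A minimal sketch of this greedy, layer-wise procedure, using stacked autoencoders as the unsupervised criterion (one of the options listed later); the layer sizes, learning rate, and number of epochs are illustrative assumptions, not values from the slides.

```python
import torch
import torch.nn as nn

def pretrain_layers(X, layer_sizes, epochs=10, lr=1e-3):
    """Greedy, layer-wise pre-training: train one layer at a time, fixing previous layers."""
    encoders = []
    inputs = X
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        enc = nn.Sequential(nn.Linear(n_in, n_out), nn.Sigmoid())
        dec = nn.Linear(n_out, n_in)                       # throwaway decoder for reconstruction
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(enc(inputs)), inputs)   # unsupervised criterion
            loss.backward()
            opt.step()
        with torch.no_grad():                              # fix this layer; use it as a feature extractor
            inputs = enc(inputs)
        encoders.append(enc)
    return encoders

X = torch.rand(256, 784)                                   # random data, just to show shapes
stack = pretrain_layers(X, [784, 256, 64])
```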

Fine-Tuning
 Once all the layers are pre-trained:
   Add an output layer
   Train the whole network using supervised learning (backpropagation)
 Supervised learning is performed as in a regular feed-forward network:
   Forward propagation, backpropagation, and parameter update
 We call this last phase fine-tuning:
   All parameters are "tuned" for the supervised task at hand
   The representation is adjusted to be more discriminative
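A minimal fine-tuning sketch under the same assumptions: the pre-trained hidden layers are represented here by stand-in modules, an output layer is added on top, and the whole network is trained with backpropagation; all sizes and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

# Stand-ins for the pre-trained hidden layers (in practice these come from the
# unsupervised pre-training step above); sizes and the 10-class output are assumptions.
hidden = [nn.Sequential(nn.Linear(784, 256), nn.Sigmoid()),
          nn.Sequential(nn.Linear(256, 64), nn.Sigmoid())]
model = nn.Sequential(*hidden, nn.Linear(64, 10))           # add the output layer

X = torch.rand(256, 784)
y = torch.randint(0, 10, (256,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)          # all parameters are now "tuned"
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)          # supervised criterion
    loss.backward()                                           # backpropagation
    opt.step()
```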

Deep Learning

Which Unsupervised Learning Algorithm?
 Stacked restricted Boltzmann machines
 Stacked autoencoders
 Stacked denoising autoencoders
 Stacked semi-supervised embeddings
 Stacked kernel PCA
 Stacked independent subspace analysis

Advantages
 The architecture of a CNN is designed to take advantage of the 2D structure of an input image
 This is achieved with local connections and tied weights, followed by some form of pooling, which results in translation-invariant features
 CNNs are easier to train and have many fewer parameters than fully connected networks with the same number of hidden units

Architecture
 A CNN consists of a number of convolutional and subsampling layers, optionally followed by fully connected layers
 The input to a convolutional layer is an m x m x r image, where m x m is the height and width of the image and r is the number of channels (e.g. an RGB image has r = 3)
 A convolutional layer has k filters (or kernels) of size n x n x q
   n is smaller than the dimension of the image
   q can either be the same as the number of channels r, or smaller, and may vary for each kernel
Fig 1: First layer of a convolutional neural network with pooling. Units of the same color have tied weights, and units of a different color represent different filter maps.
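A minimal shape check for this slide, assuming an RGB input (r = 3) of size 96 x 96 and k = 16 filters with n = 5 and q = r; none of these numbers come from the slides.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)  # k = 16 filters of size n x n x q
x = torch.rand(1, 3, 96, 96)           # one m x m x r image (m = 96, r = 3)
y = conv(x)
print(y.shape)                          # torch.Size([1, 16, 92, 92]); 92 = m - n + 1 per feature map
```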

 A convolutional neural network consists of several layers. These layers can be of three types:
   Convolutional
   Max pooling
   Fully connected

Convolutional  Convolutional: Convolutional layers consist of a rectangular grid of neurons.  It requires that the previous layer also be a rectangular grid of neurons.  Each neuron takes inputs from a rectangular section of the previous layer;  the weights for this rectangular section are the same for each neuron in the convolutional layer.  Thus, the convolutional layer is just an image convolution of the previous layer, where the weights specify the convolution filter.  In addition, there may be several grids in each convolutional layer; each grid takes inputs from all the grids in the previous layer, using potentially different filters.

Feature Extraction using CNNs: Locally Connected Networks
 Fully connecting every hidden unit to every input pixel does not scale to large images; the solution to this problem is to restrict the connections between the hidden units and the input units, allowing each hidden unit to connect to only a small subset of the input units
 Each hidden unit will connect to only a small contiguous region of pixels in the input
 This idea of having locally connected networks also draws inspiration from how the early visual system is wired up in biology: neurons in the visual cortex have localized receptive fields (i.e., they respond only to stimuli in a certain location)

Pooling: Using Features Obtained after Convolution for Classification
 In theory, one could use all the extracted features with a classifier such as a softmax classifier, but this can be computationally challenging
 Example: consider images of size 96 x 96 pixels, and suppose we have learned 400 features over 8 x 8 inputs. Each convolution results in an output of size (96 - 8 + 1) x (96 - 8 + 1) = 89 x 89 = 7,921, and since we have 400 features, this results in a vector of 7,921 x 400 = 3,168,400 features per example
 Learning a classifier with inputs having 3+ million features can be unwieldy, and can also be prone to overfitting

 Max-Pooling: After each convolutional layer, there may be a pooling layer
 The pooling layer takes small rectangular blocks from the convolutional layer and subsamples each block to produce a single output from that block
 There are several ways to do this pooling, such as taking the average or the maximum, or a learned linear combination of the neurons in the block
 Our pooling layers will always be max-pooling layers; that is, they take the maximum of the block they are pooling
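A minimal max-pooling sketch over non-overlapping 2 x 2 blocks; the feature map and block size are illustrative assumptions.

```python
import numpy as np

def max_pool(fmap, p=2):
    """Subsample a 2D feature map by taking the maximum of each p x p block."""
    h, w = fmap.shape
    out = fmap[:h - h % p, :w - w % p]       # drop rows/cols that don't fill a block
    out = out.reshape(h // p, p, w // p, p)
    return out.max(axis=(1, 3))              # one output per block

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fmap))                        # [[ 5.  7.] [13. 15.]]
```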

 Fully-Connected: Finally, after several convolutional and max-pooling layers, the high-level reasoning in the neural network is done via fully connected layers
 A fully connected layer takes all neurons in the previous layer (be it fully connected, pooling, or convolutional) and connects each of them to every single neuron it has
 Fully connected layers are no longer spatially located (you can visualize them as one-dimensional), so there can be no convolutional layers after a fully connected layer
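A minimal sketch tying the three layer types together (convolution, then max pooling, then a fully connected output); the sizes and the 10-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),    # convolutional layer: shared weights per filter
    nn.ReLU(),
    nn.MaxPool2d(2),                    # max pooling: maximum over 2 x 2 blocks
    nn.Flatten(),                       # leave the spatial layout behind...
    nn.Linear(16 * 46 * 46, 10),        # ...so the fully connected layer sees a 1-D vector
)

x = torch.rand(1, 3, 96, 96)            # (96 - 5 + 1) / 2 = 46 after conv + pool
print(model(x).shape)                   # torch.Size([1, 10])
```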

Forward Propagation
 1. Compute activations for layers with known inputs: $y^{\ell} = f(x^{\ell})$
 2. Compute inputs for the next layer from these activations: $x^{\ell+1} = W^{\ell+1} y^{\ell} + b^{\ell+1}$
 3. Repeat steps 1 and 2 until you reach the output layer, and know the values of $y^{L}$
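A minimal sketch of this forward-pass loop for a fully connected network; the sigmoid nonlinearity, layer sizes, and random weights are illustrative assumptions.

```python
import numpy as np

def forward(x, weights, biases):
    """Alternate between computing the next layer's input and its activation, up to y_L."""
    y = x
    for W, b in zip(weights, biases):
        z = W @ y + b                      # input to the next layer
        y = 1.0 / (1.0 + np.exp(-z))       # activation (sigmoid nonlinearity)
    return y                               # y_L, the output layer activations

sizes = [4, 5, 3]
weights = [np.random.randn(m, n) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.random.randn(m) for m in sizes[1:]]
print(forward(np.random.randn(4), weights, biases))
```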

Forward Propagation in a Convolutional Neural Network
 Suppose we have some $N \times N$ square neuron layer which is followed by our convolutional layer. If we use an $m \times m$ filter $\omega$, our convolutional layer output will be of size $(N - m + 1) \times (N - m + 1)$. In order to compute the pre-nonlinearity input to some unit $x_{ij}^{\ell}$ in our layer, we need to sum up the contributions (weighted by the filter components) from the previous layer cells:
   $x_{ij}^{\ell} = \sum_{a=0}^{m-1} \sum_{b=0}^{m-1} \omega_{ab} \, y_{(i+a)(j+b)}^{\ell-1}$
 Then, the convolutional layer applies its nonlinearity:
   $y_{ij}^{\ell} = \sigma(x_{ij}^{\ell})$
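A minimal sketch of the pre-nonlinearity sum above, written as an explicit double loop followed by a sigmoid nonlinearity; N, m, and the random inputs are illustrative assumptions.

```python
import numpy as np

def conv_forward(y_prev, w):
    """x[i, j] = sum_a sum_b w[a, b] * y_prev[i + a, j + b], then apply the nonlinearity."""
    N, m = y_prev.shape[0], w.shape[0]
    x = np.zeros((N - m + 1, N - m + 1))
    for i in range(N - m + 1):
        for j in range(N - m + 1):
            x[i, j] = np.sum(w * y_prev[i:i + m, j:j + m])
    return 1.0 / (1.0 + np.exp(-x))          # y^l = sigma(x^l)

y_prev = np.random.randn(8, 8)                # N = 8
w = np.random.randn(3, 3)                     # m = 3
print(conv_forward(y_prev, w).shape)          # (6, 6) = (N - m + 1, N - m + 1)
```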

Back Propagation

Back Propagation in Convolutional Network

Back Propagation

 The upsample operation has to propagate the error through the pooling layer by calculating the error w.r.t. each unit incoming to the pooling layer
 Example: with mean pooling, upsample simply distributes the error for a single pooling unit uniformly among the units which feed into it in the previous layer. With max pooling, the unit which was chosen as the max receives all of the error, since very small changes in the input would perturb the result only through that unit
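A minimal sketch of the upsample step for 2 x 2 max pooling: the error of each pooled unit is routed back to the unit that was the maximum of its block; the arrays are illustrative assumptions.

```python
import numpy as np

def max_pool_upsample(fmap, delta_pooled, p=2):
    """Propagate pooled-layer error back to the pre-pooling feature map (max pooling)."""
    delta = np.zeros_like(fmap)
    for i in range(delta_pooled.shape[0]):
        for j in range(delta_pooled.shape[1]):
            block = fmap[i * p:(i + 1) * p, j * p:(j + 1) * p]
            a, b = np.unravel_index(np.argmax(block), block.shape)   # winner of the block
            delta[i * p + a, j * p + b] = delta_pooled[i, j]         # it receives all of the error
    return delta

fmap = np.arange(16, dtype=float).reshape(4, 4)
delta_pooled = np.array([[0.1, 0.2], [0.3, 0.4]])
print(max_pool_upsample(fmap, delta_pooled))
```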

Gradient Descent
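For reference, a one-line version of the parameter update this slide title refers to; the learning rate, weight vector, and gradient values are placeholders, not values from the slides.

```python
import numpy as np

eta = 0.1                                   # learning rate (illustrative)
W = np.array([1.0, -2.0])                   # some parameters
dW = np.array([0.5, 0.5])                   # gradient dE/dW from backpropagation
W = W - eta * dW                            # gradient descent update: w <- w - eta * dE/dw
print(W)                                    # [ 0.95 -2.05]
```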

Thank You