
1 Deep Learning and Its Application to Signal and Image Processing and Analysis
Class III - Fall 2016 Tammy Riklin Raviv, Electrical and Computer Engineering

2 Today’s plan
Convolutional Neural Networks: convolution and pooling; different variants; different data types and dimensions; different applications; how to make them more efficient; case studies.
TensorFlow.
Main resource: Deep Learning book, chapter 9, Goodfellow, Bengio and Courville, MIT Press, 2016.

3 Convolution
Convolution is an operation on two functions of a real-valued argument: the input and the kernel; the output is referred to as the feature map.

4 Convolution
In fact, in applications the input is a multidimensional array of data and the kernel is a multidimensional array of learned parameters, so the integral in the definition is replaced by a finite sum.
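For reference, the definition written out in LaTeX (notation as in the Deep Learning book, ch. 9): x is the input, w is the kernel, and the output s is the feature map.

  s(t) = (x * w)(t) = \int x(a)\, w(t - a)\, da                     (continuous case)
  s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a)     (discrete case)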

5 Multi-dimensional convolution
Convolution over several axes at once is commutative: flipping the kernel relative to the input lets the sum be written either way. In practice, cross-correlation (the same operation without kernel flipping) is commonly used instead and is still called convolution by most libraries.
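In two dimensions, with input image I and kernel K, the commutative property and the cross-correlation that libraries actually implement read as follows (again following the book's notation):

  S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(m, n)\, K(i - m, j - n)
                          = \sum_m \sum_n I(i - m, j - n)\, K(m, n)         (commutativity)
  S(i, j) = (K \star I)(i, j) = \sum_m \sum_n I(i + m, j + n)\, K(m, n)     (cross-correlation)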

6 2D convolution w/o kernel flipping
An example of 2-D convolution without kernel-flipping. The output is restricted to only positions where the kernel lies entirely within the image, called “valid” convolution in some contexts.
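A minimal NumPy sketch of this operation, assuming a toy 4 x 4 input and a 2 x 2 kernel (both made up for illustration):

import numpy as np

def valid_cross_correlation_2d(image, kernel):
    # 2-D cross-correlation (convolution without kernel flipping),
    # keeping only the "valid" positions where the kernel fits entirely.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1., 0.],
                   [0., -1.]])
print(valid_cross_correlation_2d(image, kernel).shape)  # (3, 3): the output shrinks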

7 Convolution as a matrix
Discrete convolution can be viewed as multiplication by a matrix with special structure (a Toeplitz matrix in 1-D): 1. Some entries are constrained to be equal to other entries (the kernel weights repeat along the diagonals). 2. The matrix is usually very sparse, because the kernel is much smaller than the input. A small sketch of this view follows.
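A small NumPy illustration of the matrix view for a 1-D kernel; the input vector and kernel are made up for the example:

import numpy as np

# 1-D "valid" cross-correlation of x with a length-3 kernel w,
# written as multiplication by a sparse Toeplitz-structured matrix.
x = np.array([1., 2., 3., 4., 5.])
w = np.array([1., 2., 0.5])

n_out = len(x) - len(w) + 1          # number of valid output positions
T = np.zeros((n_out, len(x)))
for i in range(n_out):
    T[i, i:i + len(w)] = w           # the same (tied) weights repeat on every row

print(T @ x)                                   # matrix-multiplication view
print(np.convolve(x, w[::-1], mode='valid'))   # the equivalent convolution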

8 Convolution – three key ideas
1. Sparse interactions 2. Parameter sharing 3. Equivariant representations

9 Sparse interactions The kernel is smaller than the input
Each output unit interacts with only a few input units: millions of input units can be reduced to hundreds of meaningful features (e.g. pixels -> edges). This reduces memory requirements and gives statistical and computational efficiency, in contrast with a traditional fully connected NN where every output depends on every input.

10 A simple example
Detecting vertical edges in a 280 x 320 image by taking differences of horizontally adjacent pixels (a two-element kernel). The forward operation is a convolution of the original image that produces the vertical edge map, and it is far more efficient in terms of storage (and computation) than the equivalent matrix multiplication; a sketch follows.
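A NumPy sketch of the example, assuming a grayscale 280 x 320 image (values made up) and the two-element difference kernel [-1, 1]:

import numpy as np

# Hypothetical 280 x 320 grayscale image; the vertical edge map is the
# difference of horizontally adjacent pixels, i.e. the kernel [-1, 1].
image = np.random.rand(280, 320)
edge_map = image[:, 1:] - image[:, :-1]   # "valid" output, shape (280, 319)
print(edge_map.shape)

# Written as a dense matrix multiplication, the same linear map would need a
# (280*319) x (280*320) weight matrix; the convolution stores only 2 numbers.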

11 Receptive field The receptive field of the units in the deeper layers of a convolutional network is larger than the receptive field of the units in the shallow layers. This means that even though direct connections in a convolutional net are very sparse, units in the deeper layers can be indirectly connected to all or most of the input image.
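A quick way to see this growth, assuming a stack of identical stride-1 convolutions with kernel size k (a standard recurrence, not specific to this course):

def receptive_field(num_layers, kernel_size, stride=1):
    # Receptive field after stacking identical convolution layers:
    # each layer adds (kernel_size - 1) * jump input positions,
    # where jump is the cumulative stride of the layers below it.
    r, jump = 1, 1
    for _ in range(num_layers):
        r += (kernel_size - 1) * jump
        jump *= stride
    return r

print(receptive_field(3, 3))  # three stride-1 3x3 layers see a 7x7 input window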

12 Parameter sharing (tied weights)
Parameter sharing means using the same parameter for more than one function in a model. In a convolutional model, the central element of a 3-element kernel is used at every input location, so that single parameter is shared across the whole input. In a fully connected model there is no parameter sharing: each element of the weight matrix is used only once.

13 Pooling
A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs (e.g. over a rectangular neighborhood). Common choices: max pooling, average, weighted average (e.g. based on the distance from the central voxel), L2 norm.

14 Pooling - advantages
Invariance to small translations of the input (a strong prior). Computational efficiency: fewer pooling units than detector units. Handling inputs of varying size. For example, we need not know the location of the eyes with pixel-perfect accuracy; we just need to know that there is an eye on the left side of the face and an eye on the right side of the face. With max pooling, shifting the input by one pixel changes only some of the pooled outputs.

15 Max pooling – an example (Invariance to rotation)
Max pooling over the outputs of several separately parameterized filters can learn invariance to transformations of the input, such as rotation.

16 Max pooling – an example (reduction of size)
Reduction by a factor of 2
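A minimal NumPy sketch of non-overlapping 2x2 max pooling, which produces exactly this factor-of-2 reduction; the input values are made up:

import numpy as np

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling (stride 2); halves each spatial dimension.
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]                      # drop an odd border if any
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 1., 0.],
              [2., 0., 0., 3.]])
print(max_pool_2x2(x))   # [[4. 8.]
                         #  [2. 3.]]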

17 CNN Architecture
A typical layer of a convolutional network consists of three stages. First stage: several convolutions are performed in parallel to produce a set of linear activations. Second stage: each linear activation is run through a nonlinear activation function; this is called the detector stage. Third stage: a pooling function is used to modify the output of the layer further. A TensorFlow sketch of one such layer is shown below.
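A minimal TensorFlow 1.x sketch of the three stages (the framework used in this course); the batch size, image size, filter count and variable names are illustrative assumptions, not part of the slides:

import tensorflow as tf  # TensorFlow 1.x style, matching the course's era

# Illustrative shapes: batch of 32 grayscale 28x28 images, 16 filters of 5x5.
images = tf.placeholder(tf.float32, [32, 28, 28, 1])
kernels = tf.Variable(tf.truncated_normal([5, 5, 1, 16], stddev=0.1))
bias = tf.Variable(tf.zeros([16]))

# Stage 1: several convolutions in parallel -> a set of linear activations.
conv = tf.nn.conv2d(images, kernels, strides=[1, 1, 1, 1], padding='SAME') + bias
# Stage 2: detector stage -- an elementwise nonlinearity.
detect = tf.nn.relu(conv)
# Stage 3: pooling summarizes nearby outputs (2x2 max pooling, stride 2).
pooled = tf.nn.max_pool(detect, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                        padding='SAME')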

18 CNN example LeCun, Bengio, Hinton Nature, 2015

19 Convolution and Pooling as an Infinitely Strong Prior
Prior probability distribution - a probability distribution over the parameters of a model that encodes our beliefs about what models are reasonable, before we have seen any data. Priors can be considered weak or strong. Weak prior – high entropy – e.g. high variance Gaussian dist. Strong prior – low entropy – e.g. low variance Gaussian dist.

20 Convolution and Pooling as an Infinitely Strong Prior
A CNN can be viewed as a fully connected NN with an infinitely strong prior on its weights: the weights of each hidden unit must be identical to those of its neighbor, only shifted in space (parameter sharing), and all weights outside the unit's small receptive field must be zero.

21 Convolution and Pooling as an Infinitely Strong Prior
Key insights: 1. Convolution and pooling are only useful when the assumptions made by the prior are reasonably accurate; if not, they may cause underfitting. 2. We should only compare convolutional models to other convolutional models in benchmarks of statistical learning performance (a fully connected NN is permutation invariant, so it does not exploit the spatial structure that a CNN assumes).

22 Variants of the Basic Convolution Function
Strided convolution Zero padding Unshared convolution Tiled CNN

23 Strided Convolution

24 Zero padding
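A small TensorFlow sketch of these two variants; the 8x8 input and single 3x3 kernel are illustrative assumptions:

import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 8, 8, 1])   # toy 8x8 single-channel input
k = tf.Variable(tf.ones([3, 3, 1, 1]))         # a single 3x3 kernel

# Strided convolution: keep only every second output position in each direction.
strided = tf.nn.conv2d(x, k, strides=[1, 2, 2, 1], padding='VALID')  # -> 3x3
# Zero padding ('SAME') preserves the spatial size before striding: 8x8 -> 4x4.
padded = tf.nn.conv2d(x, k, strides=[1, 2, 2, 1], padding='SAME')    # -> 4x4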

25 Locally connected, CNN and Fully connected
Comparison of a locally connected layer (unshared convolution), a standard CNN layer, and a fully connected layer.

26 Tiled Convolution
Comparison of a locally connected layer (unshared convolution), tiled convolution, and a standard CNN. Tiled convolution cycles through a small set of kernels as it moves over space, a compromise between unshared and standard convolution.

27 Google Inception Module

28 Data Types
       Single channel                       Multi channel
1-D    Audio waveform                       Skeleton animation data
2-D    Fourier transform of audio data      Color image data
3-D    Volumetric data (e.g. CT scans)      Color video data
4-D    Heart scans                          Multi-modal MRI

29 Efficient convolution algorithms
Parallel computing. Pointwise multiplication in the frequency domain via the Fourier transform. Performing d 1-D convolutions instead of one d-D convolution when the kernel is separable. Approximate convolution. A sketch of the Fourier and separability ideas follows.
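A NumPy sketch of the Fourier and separable-kernel ideas; the signals and kernels are toy examples:

import numpy as np

# Fourier route: convolution in the spatial domain is pointwise multiplication
# in the frequency domain (this form gives a circular convolution; zero-pad the
# signals first to obtain the ordinary linear convolution).
x = np.random.rand(256)
w = np.random.rand(256)
fft_conv = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)))

# Separable kernel: a d-D kernel that is an outer product of 1-D kernels can be
# applied as d cheap 1-D convolutions instead of one expensive d-D convolution.
col = np.array([1., 2., 1.])
row = np.array([-1., 0., 1.])
sobel_like = np.outer(col, row)   # 3x3 Sobel-like kernel, separable by construction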

30 Random or Unsupervised Features
There are a few basic strategies for obtaining convolution kernels without supervised training: 1. Random initialization. 2. Design them by hand, for example by setting each kernel to detect edges at a certain orientation or scale. 3. Learn the kernels with an unsupervised criterion; for example, Coates et al. (2011) apply k-means clustering to small image patches and then use each learned centroid as a convolution kernel (see the sketch after this slide). 4. Unsupervised pre-training: extract the features for the entire training set just once, essentially constructing a new training set for the last layer; learning the last layer is then typically a convex optimization problem, assuming the last layer is something like logistic regression or an SVM.
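A minimal scikit-learn sketch of the k-means strategy of Coates et al., with a random toy image standing in for real training patches:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.image import extract_patches_2d

# Toy stand-in for training data; in practice patches come from many real images.
image = np.random.rand(64, 64)
patches = extract_patches_2d(image, (8, 8), max_patches=2000, random_state=0)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=1, keepdims=True)     # simple per-patch centering

# Each learned centroid is reshaped into an 8x8 convolution kernel.
kmeans = KMeans(n_clusters=16, random_state=0).fit(patches)
kernels = kmeans.cluster_centers_.reshape(16, 8, 8)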

31 Case studies (deeper, faster, stronger)
LeNet - LeCun, 1990s. AlexNet - Alex Krizhevsky, Ilya Sutskever and Geoff Hinton; won the ImageNet ILSVRC challenge in 2012 (like LeNet but bigger and deeper). ZF Net - M. Zeiler and R. Fergus; ILSVRC 2013 winner (improved AlexNet by hyperparameter tweaking). GoogLeNet - Szegedy et al. from Google; ILSVRC 2014 winner (Inception module). VGGNet - K. Simonyan and A. Zisserman; runner-up in ILSVRC 2014 (depth matters). ResNet - Kaiming He et al.; ILSVRC 2015 winner (special skip connections and heavy use of batch normalization). ILSVRC = ImageNet Large Scale Visual Recognition Challenge.

