1
Introduction to Neural Networks
2
Overview
The motivation for NNs
Neuron – The basic unit
Fully-connected Neural Networks
Feedforward (inference)
The linear algebra behind
Convolutional Neural Networks & Deep Learning
3
The Brain does Complex Tasks
[Figure: examples of brain tasks – computing 3×4 = 12; recognizing a tiger and deciding "Danger!" vs. "Fine"]
4
Inside the Brain
5
Real vs. Artificial Neuron
[Figure: a biological neuron with its inputs and outputs, alongside the artificial model – inputs I_1, I_2, I_3 scaled by weights w_1, w_2, w_3 and combined by f(·) to produce the output]
6
Neuron – General Model
Inputs $I_1, \dots, I_N$ are scaled by weights $w_1, \dots, w_N$, summed ($\Sigma$), and passed through an activation function $\sigma$:
$a = \sigma\left( \sum_{j=1}^{N} I_j \cdot w_j \right)$   (activation)
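As a concrete illustration of the model above, here is a minimal C sketch of a single neuron, assuming a sigmoid activation (the function and variable names are illustrative, not from the slides):

#include <math.h>

/* Sigmoid activation: squashes the weighted sum into (0, 1). */
static double sigmoid(double z) {
    return 1.0 / (1.0 + exp(-z));
}

/* One artificial neuron: a = sigma( sum_j I[j] * w[j] ). */
double neuron(const double *I, const double *w, int N) {
    double z = 0.0;
    for (int j = 0; j < N; j++)
        z += I[j] * w[j];
    return sigmoid(z);
}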
7
Neuron Activation Functions
The most popular activation functions: the sigmoid was the first to be used; tanh came later. The Rectified Linear Unit (ReLU) is the most widely used today and is also the simplest function; it allows for faster network training (discussed later). A sketch of all three follows.
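A minimal C sketch of the three activations named above (tanh comes straight from the C math library; the other names are illustrative):

#include <math.h>

/* Sigmoid: 1 / (1 + e^{-z}), output in (0, 1). */
double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

/* Hyperbolic tangent: output in (-1, 1); provided by <math.h> as tanh(z). */
double tanh_act(double z) { return tanh(z); }

/* ReLU: max(0, z) – the simplest of the three, just a comparison. */
double relu(double z) { return z > 0.0 ? z : 0.0; }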
8
Overview
The motivation for NNs
Neuron – The basic unit
Fully-connected Neural Networks
Feedforward (inference)
The linear algebra behind
Convolutional Neural Networks & Deep Learning
9
Feedforward Neural Network
[Figure: a fully-connected network – an input layer, several hidden layers, and an output layer]
10
Feedforward Neural Network
11
Feedforward – How Does it Work?
[Figure: inputs I_1, I_2, I_3 enter the input layer and the computation flows left to right toward the outputs O_1, O_2]
12
Feedforward – What Can it Do?
Classification region vs. number of layers in a NN: more neurons allow more complex classification regions.
13
Indexing Conventions
Activation $a^1_3$: the superscript is the layer index (layer 1), the subscript is the neuron index within that layer (neuron 3).
Weight $w^2_{1\,3}$: the superscript is the layer index (layer 2), the first subscript is the neuron index (neuron 1), and the second subscript is the weight index within that neuron (weight 3, i.e. which input it multiplies).
14
Feedforward – General Equations
[Figure: the example network – inputs I_1, I_2, I_3 feeding Layer 1, then Layer 2, then the output]
Weights matrix of layer $j$ (one row per neuron, one column per input):
$W^j_{[m,n]} = \begin{bmatrix} w^j_{1\,1} & \cdots & w^j_{1\,n} \\ \vdots & & \vdots \\ w^j_{k\,1} & \cdots & w^j_{k\,n} \\ \vdots & & \vdots \\ w^j_{m\,1} & \cdots & w^j_{m\,n} \end{bmatrix}$   (row $k$ = the weights of neuron $k$)
$m$ = number of neurons in layer $j$;  $n$ = inputs per neuron = number of activations of layer $j-1$.
Input vector: $I = \begin{bmatrix} I_1 & I_2 & \cdots & I_N \end{bmatrix}^T$
15
The Linear Algebra Behind Feedforward: Example
[Figure: the example network – inputs I_1, I_2, I_3 feeding a 5-neuron Layer 1, then Layer 2, then the output]
1st hidden layer weights: $W^1_{[5,3]} = \begin{bmatrix} w^1_{1\,1} & w^1_{1\,2} & w^1_{1\,3} \\ \vdots & & \vdots \\ w^1_{5\,1} & w^1_{5\,2} & w^1_{5\,3} \end{bmatrix}$,  input $I = \begin{bmatrix} I_1 & I_2 & I_3 \end{bmatrix}^T$
1st neuron activation: $a^1_1 = \sigma\left( \sum_{j=1}^{3} w^1_{1\,j} \cdot I_j \right)$
All of layer 1 at once:
$a^1 = \sigma\left( W^1 \cdot I \right) = \sigma\left( \begin{bmatrix} w^1_{1\,1} & w^1_{1\,2} & w^1_{1\,3} \\ \vdots & & \vdots \\ w^1_{5\,1} & w^1_{5\,2} & w^1_{5\,3} \end{bmatrix} \cdot \begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix} \right) = \begin{bmatrix} \sigma\left( \sum_j w^1_{1\,j} I_j \right) \\ \vdots \\ \sigma\left( \sum_j w^1_{5\,j} I_j \right) \end{bmatrix}$
16
The Linear Algebra Behind Feedforward
Layer-1 output: $a^1_{[5,1]} = \begin{bmatrix} \sigma\left( \sum_j w^1_{1\,j} I_j \right) \\ \vdots \\ \sigma\left( \sum_j w^1_{5\,j} I_j \right) \end{bmatrix}$
2nd hidden layer weights: $W^2_{[4,5]} = \begin{bmatrix} w^2_{1\,1} & \cdots & w^2_{1\,5} \\ \vdots & & \vdots \\ w^2_{4\,1} & \cdots & w^2_{4\,5} \end{bmatrix}$
$a^2 = \sigma\left( W^2 \cdot a^1 \right) = \sigma\left( \begin{bmatrix} w^2_{1\,1} & \cdots & w^2_{1\,5} \\ \vdots & & \vdots \\ w^2_{4\,1} & \cdots & w^2_{4\,5} \end{bmatrix} \cdot \begin{bmatrix} a^1_1 \\ a^1_2 \\ \vdots \\ a^1_5 \end{bmatrix} \right) = \begin{bmatrix} \sigma\left( \sum_j w^2_{1\,j} a^1_j \right) \\ \vdots \\ \sigma\left( \sum_j w^2_{4\,j} a^1_j \right) \end{bmatrix}$
17
The Linear Algebra Behind Feedforward
For a general layer $k$ with $m$ neurons and $n$ inputs per neuron ($n$ = number of activations in layer $k-1$):
$a^k = \sigma\left( W^k \cdot a^{k-1} \right) = \sigma\left( \begin{bmatrix} w^k_{1\,1} & \cdots & w^k_{1\,n} \\ \vdots & & \vdots \\ w^k_{m\,1} & \cdots & w^k_{m\,n} \end{bmatrix} \cdot \begin{bmatrix} a^{k-1}_1 \\ a^{k-1}_2 \\ \vdots \\ a^{k-1}_n \end{bmatrix} \right) = \begin{bmatrix} \sigma\left( \sum_j w^k_{1\,j} a^{k-1}_j \right) \\ \vdots \\ \sigma\left( \sum_j w^k_{m\,j} a^{k-1}_j \right) \end{bmatrix}$
Unrolling the recursion: $a^k = \sigma\left( W^k \cdot a^{k-1} \right) = \sigma\left( W^k \cdot \sigma\left( W^{k-1} \cdot a^{k-2} \right) \right) = \ldots$
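A minimal C sketch of one such step, $a^k = \sigma(W^k \cdot a^{k-1})$, with the weight matrix stored row-major (the function and parameter names are illustrative, not from the slides):

#include <math.h>

static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

/* One layer: a_out[i] = sigma( sum_j W[i][j] * a_in[j] ).
 * W is m x n, stored row-major, so row i holds the weights of neuron i. */
void layer_forward(const double *W, const double *a_in, double *a_out,
                   int m, int n) {
    for (int i = 0; i < m; i++) {
        double z = 0.0;
        for (int j = 0; j < n; j++)
            z += W[i * n + j] * a_in[j];
        a_out[i] = sigmoid(z);
    }
}

A full forward pass is just this call applied layer after layer, each output vector becoming the next layer's input.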
18
Number of Computations
Given a NN with $k$ layers (hidden + output), where layer $i$ has $n_i$ neurons and $n_0$ is the number of inputs:
Each layer $i$ multiplies an $n_i \times n_{i-1}$ matrix $\begin{bmatrix} w^i_{1\,1} & \cdots & w^i_{1\,n_{i-1}} \\ \vdots & & \vdots \\ w^i_{n_i\,1} & \cdots & w^i_{n_i\,n_{i-1}} \end{bmatrix}$ by the vector $\begin{bmatrix} a^{i-1}_1 & \cdots & a^{i-1}_{n_{i-1}} \end{bmatrix}^T$.
Total DMV (Dense Matrix $\cdot$ Vector) multiplications = $k$
Time complexity = $O\left( \sum_{i=1}^{k} n_i \cdot n_{i-1} \right)$
Memory complexity = $O\left( \sum_{i=1}^{k} n_i \cdot n_{i-1} \right)$
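Plugging in the running example (3 inputs, a 5-neuron Layer 1, a 4-neuron Layer 2, and, assuming the 2-output layer from the earlier feedforward figure) makes the count concrete:
$\sum_{i=1}^{3} n_i \cdot n_{i-1} = 5 \cdot 3 + 4 \cdot 5 + 2 \cdot 4 = 15 + 20 + 8 = 43$
multiply-accumulate operations per inference, and (ignoring biases) the same 43 weights to store.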
19
Small Leftover – The Bias
The trick: append a constant activation of 1 to the input and to every layer (the "+1" nodes in the figure); the weight attached to it acts as the neuron's bias.
$a^1 = \sigma\left( W^1 \cdot I \right) = \sigma\left( \begin{bmatrix} w^1_{1\,1} & w^1_{1\,2} & w^1_{1\,3} & w^1_{1\,4} \\ \vdots & & & \vdots \\ w^1_{5\,1} & w^1_{5\,2} & w^1_{5\,3} & w^1_{5\,4} \end{bmatrix} \cdot \begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ 1 \end{bmatrix} \right)$
$a^2 = \sigma\left( W^2 \cdot a^1 \right)$ with $a^1$ likewise extended by a constant last entry of 1, so every $W^k$ gains one extra column.
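Why this works (a one-line check, not spelled out on the slide): splitting off the last column of the extended matrix as a bias vector $b$ gives
$\sigma\left( \begin{bmatrix} W & b \end{bmatrix} \cdot \begin{bmatrix} a \\ 1 \end{bmatrix} \right) = \sigma\left( W \cdot a + b \right),$
so the constant-1 trick folds the usual "weighted sum plus bias" into a single matrix-vector product.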
20
Classification – Softmax Layer
In classification the desired output is one-hot: one output should be '1' and the rest '0'. A neuron's weighted sum, however, is not limited to 1. The solution: a softmax output layer.
Weighted sum (pre-activation) of output neuron $k$: $z_k = \sum_{j=1}^{N} w_{k\,j} \cdot I_j$
Softmax: $a_k = \dfrac{e^{z_k}}{\sum_{j \in \text{output}} e^{z_j}}, \qquad \forall k:\; a_k \in (0, 1]$ and the outputs sum to 1.
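A minimal C sketch of the softmax above; subtracting the maximum $z$ before exponentiating is a standard safeguard against overflow (an addition here, not on the slide):

#include <math.h>

/* Softmax: a[k] = exp(z[k]) / sum_j exp(z[j]). */
void softmax(const double *z, double *a, int n) {
    /* Subtract the max for numerical stability (does not change the result). */
    double zmax = z[0];
    for (int k = 1; k < n; k++)
        if (z[k] > zmax) zmax = z[k];

    double sum = 0.0;
    for (int k = 0; k < n; k++) {
        a[k] = exp(z[k] - zmax);
        sum += a[k];
    }
    for (int k = 0; k < n; k++)
        a[k] /= sum;   /* each a[k] is in (0, 1] and they sum to 1 */
}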
21
Overview
The motivation for NNs
Neuron – The basic unit
Fully-connected Neural Networks
Feedforward (inference)
The linear algebra behind
Convolutional Neural Networks & Deep Learning
22
Convolutional Neural Networks
A main building block of deep learning, called ConvNets or CNNs for short.
Motivation: when the input is an image (e.g. 1000×1000 pixels), spatial correlation is local, so it is better to put the resources elsewhere than into full connectivity.
23
Convolutional Layer: reduce connectivity to local regions.
Example: a 1000×1000 image, 100 different filters, filter size 10×10 → 10K parameters (worked out below). Every filter is different.
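The parameter count in the example works out as follows (weights only, biases ignored):
$100 \text{ filters} \times (10 \times 10) \text{ weights each} = 100 \times 100 = 10{,}000 \approx 10\text{K},$
compared with on the order of $10^6 \times 10^6 = 10^{12}$ weights if a layer of the same size were fully connected to all $10^6$ pixels (a hypothetical comparison, not from the slide).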
24
Convnets – sliding window computation:
[Figure: a 3×3 filter kernel with weights $W_{1,1}, W_{1,2}, \dots, W_{3,3}$ slides across the input one position at a time]
33
Conv Layer – The Math
$w$ – filter kernel of size $K \times K$;  $x$ – input
$a_{i,j} = (w * x)_{i,j} = \sum_{p=1}^{K} \sum_{q=1}^{K} w_{p,q} \cdot x_{i+p,\,j+q}$
[Figure: input $*$ filter $=$ output feature map]
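A minimal C sketch of the formula above for a single-channel H×W input and a K×K kernel, with no padding and stride 1 (the formula is written as a cross-correlation, and so is the code; the names are illustrative):

/* a[i][j] = sum_{p,q} w[p][q] * x[i+p][j+q], no padding, stride 1.
 * x is H x W, w is K x K, a is (H-K+1) x (W-K+1); all row-major. */
void conv2d(const double *x, const double *w, double *a,
            int H, int W, int K) {
    int outH = H - K + 1, outW = W - K + 1;
    for (int i = 0; i < outH; i++) {
        for (int j = 0; j < outW; j++) {
            double sum = 0.0;
            for (int p = 0; p < K; p++)
                for (int q = 0; q < K; q++)
                    sum += w[p * K + q] * x[(i + p) * W + (j + q)];
            a[i * outW + j] = sum;
        }
    }
}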
34
Conv Layer - Parameters
Zero padding – add surrounding zeros so that the output size equals the input size.
Stride – the number of pixels the filter moves between window positions (e.g. stride = 2 evaluates the filter at every second position). The usual output-size relation is given below.
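Under common conventions (a standard relation, not spelled out on the slide), an input of size $n$, kernel size $K$, zero padding $P$ and stride $S$ give an output of size
$n_{\text{out}} = \left\lfloor \dfrac{n + 2P - K}{S} \right\rfloor + 1,$
e.g. $n = 1000$, $K = 10$, $P = 0$, $S = 2$ yields $\lfloor 990 / 2 \rfloor + 1 = 496$.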
35
Conv Layer – Multiple Inputs
Output = sum of convolutions; the multiple outputs are called feature maps.
[Figure: M input feature maps, each convolved with its own K×K filter kernel; the results are summed (Σ) into each of the N output feature maps]
for (n = 0; n < N; n++)            /* output feature maps */
  for (m = 0; m < M; m++)          /* input feature maps  */
    for (y = 0; y < Y; y++)        /* output rows         */
      for (x = 0; x < X; x++)      /* output columns      */
        for (p = 0; p < K; p++)    /* kernel rows         */
          for (q = 0; q < K; q++)  /* kernel columns      */
            AL(n; x, y) += AL-1(m; x+p, y+q) * w(m, n; p, q);
The cost implied by this loop nest is summarized below.
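Reading the cost directly off the loop nest: the layer performs $N \cdot M \cdot Y \cdot X \cdot K^2$ multiply-accumulates while storing only $N \cdot M \cdot K^2$ kernel weights. For the earlier example (one 1000×1000 input map, $N = 100$ filters, $K = 10$, output roughly the input size) that is about $100 \cdot 1 \cdot 10^6 \cdot 100 = 10^{10}$ MACs against only 10K weights.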
36
Pooling Layer: multiple inputs → a single output; reduces the amount of data.
Several types: max (the most common), average, … A sketch of max pooling follows.
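A minimal C sketch of max pooling with a 2×2 window and stride 2 (a common choice assumed here; the slide does not fix the window size):

/* 2x2 max pooling, stride 2: keeps the largest value in each 2x2 block,
 * shrinking an H x W map to (H/2) x (W/2). H and W are assumed even. */
void maxpool2x2(const double *x, double *y, int H, int W) {
    int outW = W / 2;
    for (int i = 0; i < H / 2; i++) {
        for (int j = 0; j < outW; j++) {
            double m = x[(2 * i) * W + (2 * j)];
            for (int p = 0; p < 2; p++)
                for (int q = 0; q < 2; q++) {
                    double v = x[(2 * i + p) * W + (2 * j + q)];
                    if (v > m) m = v;
                }
            y[i * outW + j] = m;
        }
    }
}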
37
Putting it All Together - AlexNet
The first CNN to win the ImageNet challenge. ImageNet: 1.2M 256×256 images, 1000 classes.
Krizhevsky et al., "ImageNet Classification with deep CNNs," NIPS 2012.
Zeiler, Matthew D., and Rob Fergus, "Visualizing and understanding convolutional networks," European Conference on Computer Vision, 2014.
38
AlexNet – Inside the Feature Maps
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." In European Conference on Computer Vision. Springer International Publishing, 2014.
39
Putting it All Together - AlexNet
Trained for 1 week on 2 GPUs, each a GeForce GTX 580 with 3 GB of memory.
Krizhevsky et al., "ImageNet Classification with deep CNNs," NIPS 2012.
40
Architecture for Classification
[Figure: the AlexNet layer stack for category prediction – repeated CONV, LOCAL CONTRAST NORM, and MAX POOLING blocks followed by two FULLY CONNECTED layers and a LINEAR output; total nr. of params ≈ 60M, total nr. of flops ≈ 832M, with the convolutional layers holding relatively few parameters but most of the flops, and the fully-connected layers the reverse. Adapted from Ranzato.]
Krizhevsky et al., "ImageNet Classification with deep CNNs," NIPS 2012.
41
Convnet - Bigger is Better?
GoogLeNet (2014) – ImageNet winner.
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
42
Thank you