1
Principles of Back-Propagation
The relation between biological vision and computer vision
Principles of Back-Propagation
Prof. Bart ter Haar Romeny
2
Deep Learning Convolutional Neural Networks
How does this actually work? Error backpropagation.
AlexNet (Alex Krizhevsky, 2012), ImageNet challenge: 1.4 million images, 1000 classes, 75% → 94%.
The layers: convolution, ReLU, max pooling, convolution, convolution, etc.
A typical big deep NN has (hundreds of) millions of connections: weights.
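As an illustration of one such stage, here is a minimal sketch of a convolution → ReLU → max-pooling block, assuming PyTorch is available; the channel counts, kernel size and image size are illustrative and not AlexNet's exact configuration.

# One convolution -> ReLU -> max-pooling stage (illustrative sizes,
# not the exact AlexNet configuration).
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=11, stride=4),  # convolution
    nn.ReLU(),                                                            # nonlinearity
    nn.MaxPool2d(kernel_size=3, stride=2),                                # max pooling
)

x = torch.randn(1, 3, 227, 227)   # a batch with one RGB image
print(block(x).shape)             # spatial size shrinks, channel count grows

Every weight in such a stack is one of the millions of connections that backpropagation has to adjust.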
3
A numerical example of backpropagation on a simple network:
From Prakash Jay, Senior Data Scientist
4
Approach:
1. Build a small neural network as defined in the architecture shown on the right.
2. Initialize the weights and biases randomly.
3. Fix the input and output.
4. Forward pass the inputs.
5. Calculate the cost.
6. Compute the gradients and errors.
7. Backprop and adjust the weights and biases accordingly.
We initialize the network randomly:
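A sketch of this initialization step in NumPy; the layer sizes below are assumptions, not the slide's exact dimensions:

# Random initialization of weights and biases for a small three-layer network:
# layer 1 (ReLU), layer 2 (sigmoid), output layer (softmax).
# The sizes are illustrative, not the slide's exact dimensions.
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 5, 4, 3]   # input, hidden 1, hidden 2, output (assumed)

W = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros((1, n)) for n in sizes[1:]]

for i, (Wi, bi) in enumerate(zip(W, b), start=1):
    print(f"layer {i}: W {Wi.shape}, b {bi.shape}")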
5
Forward pass layer 1:
6
Forward pass layer 1: a matrix operation followed by the ReLU operation.
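For example (a sketch with made-up numbers, not the values on the slide):

# Layer-1 forward pass: z1 = x·W1 + b1, then the element-wise ReLU.
import numpy as np

x  = np.array([[1.0, 2.0, 3.0, 4.0]])   # one input row vector (assumed size 4)
W1 = np.full((4, 5), 0.1)               # placeholder weights
b1 = np.zeros((1, 5))                   # placeholder biases

z1 = x @ W1 + b1                        # matrix operation
a1 = np.maximum(0.0, z1)                # ReLU: max(0, z)
print(a1)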
7
Forward pass layer 2:
8
Forward pass layer 2: a matrix operation followed by the sigmoid operation.
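For example (again with placeholder numbers, not the slide's values):

# Layer-2 forward pass: z2 = a1·W2 + b2, then the element-wise sigmoid.
import numpy as np

a1 = np.array([[1.0, 1.0, 1.0, 1.0, 1.0]])   # assumed output of layer 1
W2 = np.full((5, 4), 0.1)                    # placeholder weights
b2 = np.zeros((1, 4))                        # placeholder biases

z2 = a1 @ W2 + b2                            # matrix operation
a2 = 1.0 / (1.0 + np.exp(-z2))               # sigmoid: 1 / (1 + e^(-z))
print(a2)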
9
Forward pass layer 3:
11
Forward pass output layer:
A matrix operation followed by the softmax operation.
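For example (placeholder numbers; the softmax turns the scores into probabilities that sum to 1):

# Output-layer forward pass: z3 = a2·W3 + b3, then the softmax.
import numpy as np

a2 = np.array([[0.62, 0.62, 0.62, 0.62]])    # assumed output of layer 2
W3 = np.full((4, 3), 0.1)                    # placeholder weights
b3 = np.zeros((1, 3))                        # placeholder biases

z3 = a2 @ W3 + b3                            # matrix operation
e  = np.exp(z3 - z3.max())                   # shift for numerical stability
p  = e / e.sum()                             # softmax: probabilities summing to 1
print(p, p.sum())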
12
Analysis: the actual output should be [1.0, 0.0, 0.0], but we got [0.2698, …, …]. To calculate the error we use the cross-entropy error, E = -Σ_i a_i log(p_i); only the terms where the target a_i is 1 contribute.
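A sketch of this computation (the last two predicted values below are placeholders, not the slide's numbers):

# Cross-entropy error for a one-hot target: E = -sum_i a_i * log(p_i).
# With target [1, 0, 0] only the log-probability of the first class contributes.
import numpy as np

a = np.array([1.0, 0.0, 0.0])            # actual (target) output
p = np.array([0.2698, 0.36, 0.3702])     # predicted output (last two values assumed)

error = -np.sum(a * np.log(p))
print(error)                             # -log(0.2698) ≈ 1.31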
14
We are done with the forward pass. We know the error
We are done with the forward pass and we know the error of the first iteration (we will do this numerous times). Now let us study the backward pass.
15
A chain of functions (from Rohan Kapur).
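For the three layers above, this chain is the nested expression below (a sketch in the row-vector notation used in the earlier examples):

$$\hat{y} \;=\; \operatorname{softmax}\!\Big(\sigma\big(\operatorname{ReLU}(x W_1 + b_1)\,W_2 + b_2\big)\,W_3 + b_3\Big)$$

Differentiating the error with respect to any weight therefore means peeling off these functions one by one with the chain rule.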
16
We recall:
17
For gradient descent: The derivative of this function with respect to some arbitrary weight (for example w1) is calculated by applying the chain rule: For a simple error measure (p = predicted, a = actual):
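A common simple choice is the squared error; assuming that choice, the formulas referred to here take the form:

$$E = \tfrac{1}{2}(p - a)^2, \qquad \frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial p}\,\frac{\partial p}{\partial w_1} = (p - a)\,\frac{\partial p}{\partial w_1}$$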
18
Important derivatives:
Sigmoid, ReLU, and softmax:
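In their standard forms, these derivatives are:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\big(1 - \sigma(x)\big)$$

$$\operatorname{ReLU}'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases}$$

$$s_i = \frac{e^{z_i}}{\sum_k e^{z_k}}, \qquad \frac{\partial s_i}{\partial z_j} = s_i\,(\delta_{ij} - s_j)$$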
19
Two slides ago, we saw that:
20
Going one more layer backwards, we can determine that:
With the analogous expressions for the remaining weights and biases, etc. And finally, we iterate until convergence:
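The iterated update is the usual gradient-descent step, with learning rate η:

$$w_{ij} \;\leftarrow\; w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}}$$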
21
Numerical example in great detail by Prakash Jay on Medium.com:
etc.
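As a compact companion to that article, here is a self-contained sketch of one complete forward and backward pass, iterated with gradient descent, for the architecture used above. Layer sizes, input values and the learning rate are illustrative assumptions, not Prakash Jay's exact numbers.

# Tiny end-to-end example: ReLU -> sigmoid -> softmax with cross-entropy error,
# trained by backpropagation and gradient descent (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4))               # one input sample (assumed size 4)
a = np.array([[1.0, 0.0, 0.0]])               # actual (target) output, one-hot

W1, b1 = rng.standard_normal((4, 5)) * 0.1, np.zeros((1, 5))
W2, b2 = rng.standard_normal((5, 4)) * 0.1, np.zeros((1, 4))
W3, b3 = rng.standard_normal((4, 3)) * 0.1, np.zeros((1, 3))

eta = 0.1                                     # learning rate (assumed)
for step in range(100):
    # forward pass
    z1 = x @ W1 + b1;  a1 = np.maximum(0.0, z1)          # layer 1: ReLU
    z2 = a1 @ W2 + b2; a2 = 1.0 / (1.0 + np.exp(-z2))    # layer 2: sigmoid
    z3 = a2 @ W3 + b3
    e = np.exp(z3 - z3.max()); p = e / e.sum()           # output layer: softmax
    error = -np.sum(a * np.log(p))                       # cross-entropy error

    # backward pass (chain rule)
    d3 = p - a                                # dE/dz3 for softmax + cross-entropy
    dW3, db3 = a2.T @ d3, d3
    d2 = (d3 @ W3.T) * a2 * (1.0 - a2)        # back through the sigmoid
    dW2, db2 = a1.T @ d2, d2
    d1 = (d2 @ W2.T) * (z1 > 0)               # back through the ReLU
    dW1, db1 = x.T @ d1, d1

    # gradient-descent update of all weights and biases
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2
    W3 -= eta * dW3; b3 -= eta * db3

print(error, p)                               # the error shrinks, p moves towards [1, 0, 0]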
22
Deeper reading: