
Principles of Back-Propagation



Presentation on theme: "Principles of Back-Propagation"— Presentation transcript:

1 Principles of Back-Propagation
The relation between biological vision and computer vision. Prof. Bart ter Haar Romeny

2 Deep Learning Convolutional Neural Networks
How does this actually work? The key mechanism is error backpropagation. AlexNet (Alex Krizhevsky, 2012) won the ImageNet challenge (1.4 million images, 1000 classes), raising accuracy from 75% to 94%. The architecture stacks convolution, ReLU, max pooling, convolution, convolution, etc. A typical big deep NN has (hundreds of) millions of connections: weights.

3 A numerical example of backpropagation on a simple network:
From Prakash Jay, Senior Data Scientist, on Medium.com.

4 Approach
Build a small neural network as defined in the architecture on the right.
Initialize the weights and biases randomly.
Fix the input and output.
Forward-pass the inputs.
Calculate the cost.
Compute the gradients and errors.
Backprop and adjust the weights and biases accordingly.
We initialize the network randomly (a sketch follows below):
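A minimal Python/NumPy sketch of such a random initialization, assuming illustrative layer sizes (4 inputs, hidden layers of 5 and 4 units, 3 output classes) rather than the exact architecture drawn on the slide:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (assumed; the slide's architecture figure defines the real ones).
n_in, n_h1, n_h2, n_out = 4, 5, 4, 3

# Random initialization of the weights and biases of the three layers.
W1, b1 = rng.normal(size=(n_in, n_h1)), np.zeros((1, n_h1))
W2, b2 = rng.normal(size=(n_h1, n_h2)), np.zeros((1, n_h2))
W3, b3 = rng.normal(size=(n_h2, n_out)), np.zeros((1, n_out))

# Fixed input and target output (target class 1, as in the slides; input values are assumed).
x = np.array([[0.5, -0.2, 0.1, 0.7]])
y_true = np.array([[1.0, 0.0, 0.0]])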

5 Forward pass layer 1:

6 Forward pass layer 1: Matrix operation: z1 = x·W1 + b1. ReLU operation: a1 = ReLU(z1) = max(0, z1).
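A small sketch of this step (continuing the NumPy initialization above; the input values are assumed, not the slide's numbers):

import numpy as np

x  = np.array([[0.5, -0.2, 0.1, 0.7]])              # 1 x 4 input row vector (assumed values)
W1 = np.random.default_rng(0).normal(size=(4, 5))   # layer-1 weights
b1 = np.zeros((1, 5))                                # layer-1 biases

z1 = x @ W1 + b1             # matrix operation
a1 = np.maximum(0.0, z1)     # ReLU operation: element-wise max(0, z)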

7 Forward pass layer 2:

8 Forward pass layer 2: Matrix operation: z2 = a1·W2 + b2. Sigmoid operation: a2 = sigmoid(z2) = 1 / (1 + exp(-z2)).
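The same step as a sketch (the layer-1 activations a1 are illustrative values, not the slide's numbers):

import numpy as np

a1 = np.array([[0.3, 0.0, 1.2, 0.4, 0.9]])          # illustrative layer-1 activations
W2 = np.random.default_rng(1).normal(size=(5, 4))   # layer-2 weights
b2 = np.zeros((1, 4))                                # layer-2 biases

z2 = a1 @ W2 + b2                 # matrix operation
a2 = 1.0 / (1.0 + np.exp(-z2))    # sigmoid operation, element-wise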

9 Forward pass layer 3:


11 Forward pass output layer:
Matrix operation: z3 = a2·W3 + b3. Softmax operation: predicted = softmax(z3), with softmax(z)_i = exp(z_i) / Σ_j exp(z_j).
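A sketch of the output layer (the activations a2 are illustrative; subtracting the row maximum before exponentiating is a standard numerical-stability trick and does not change the result):

import numpy as np

a2 = np.array([[0.6, 0.2, 0.8, 0.4]])               # illustrative layer-2 activations
W3 = np.random.default_rng(2).normal(size=(4, 3))   # output-layer weights
b3 = np.zeros((1, 3))                                # output-layer biases

z3 = a2 @ W3 + b3                                    # matrix operation
e  = np.exp(z3 - z3.max(axis=1, keepdims=True))      # stabilized exponentials
y_pred = e / e.sum(axis=1, keepdims=True)            # softmax: each row sums to 1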

12 Analysis: The actual output should be [1.0, 0.0, 0.0] but we got [0.2698, …, …]. To calculate the error, let us use the cross-entropy error: Error = -Σ_i actual_i · log[predicted_i] = -(1 · log[0.2698] + 0 · log[…] + 0 · log[…]).
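A sketch of this error computation (the two missing predicted values are assumed here purely so the vector sums to 1; only 0.2698 appears in the slide):

import numpy as np

y_true = np.array([[1.0, 0.0, 0.0]])
y_pred = np.array([[0.2698, 0.3223, 0.4079]])   # 0.3223 and 0.4079 are assumed placeholders

# Cross-entropy: only the true-class term contributes, because the other targets are 0.
error = -np.sum(y_true * np.log(y_pred))
print(error)   # -log(0.2698), roughly 1.31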


14 We are done with the forward pass. We know the error
We are done with the forward pass. We know the error of the first iteration (we will do this numerous times). Now let us study the backward pass.

15 A chain of functions: From Rohan Kapur:

16 We recall:

17 For gradient descent: The derivative of this function with respect to some arbitrary weight (for example w1) is calculated by applying the chain rule: For a simple error measure (p = predicted, a = actual):
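Written out (assuming the simple error measure on this slide is the squared error E = ½(p - a)², the usual choice, and writing z for the pre-activation that w1 feeds into):

\[
  \frac{\partial E}{\partial w_1}
  = \frac{\partial E}{\partial p}\,
    \frac{\partial p}{\partial z}\,
    \frac{\partial z}{\partial w_1},
  \qquad
  \frac{\partial E}{\partial p} = p - a .
\]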

18 Important derivatives:
Sigmoid: σ′(z) = σ(z) · (1 - σ(z)). ReLU: derivative is 1 for z > 0 and 0 for z < 0. SoftMax: ∂s_i/∂z_j = s_i · (δ_ij - s_j), where s = softmax(z).
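These derivatives as a sketch in NumPy (standard forms; the function names are my own, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)             # sigma(z) * (1 - sigma(z))

def d_relu(z):
    return (z > 0).astype(float)     # 1 where z > 0, else 0

def d_softmax(z):
    # Full Jacobian of softmax for a single 1 x n row vector z.
    e = np.exp(z - z.max())
    s = (e / e.sum()).ravel()
    return np.diag(s) - np.outer(s, s)   # s_i * (delta_ij - s_j)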

19 Two slides ago, we saw that:

20 Going one more layer backwards, we can determine that:
With analogous expressions for the remaining weights, etc. And finally the weights are updated, w ← w - η·∂E/∂w, and we iterate until convergence (a complete sketch follows below):
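An end-to-end sketch of the whole procedure in NumPy: forward pass, backward pass through all three layers, gradient-descent update, repeated until the error is small. It uses the softmax + cross-entropy combination from the slides, whose output-layer delta simplifies to predicted - actual; the layer sizes, input values, learning rate, and iteration count are assumptions for illustration, not Prakash Jay's exact numbers.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative architecture: 4 -> 5 (ReLU) -> 4 (sigmoid) -> 3 (softmax).
W1, b1 = 0.1 * rng.normal(size=(4, 5)), np.zeros((1, 5))
W2, b2 = 0.1 * rng.normal(size=(5, 4)), np.zeros((1, 4))
W3, b3 = 0.1 * rng.normal(size=(4, 3)), np.zeros((1, 3))

x      = np.array([[0.5, -0.2, 0.1, 0.7]])   # fixed input (assumed values)
y_true = np.array([[1.0, 0.0, 0.0]])         # fixed target, class 1
lr     = 0.1                                  # learning rate (assumed)

for step in range(1000):                      # iterate until (near) convergence
    # Forward pass.
    z1 = x @ W1 + b1
    a1 = np.maximum(0.0, z1)                  # ReLU layer
    z2 = a1 @ W2 + b2
    a2 = 1.0 / (1.0 + np.exp(-z2))            # sigmoid layer
    z3 = a2 @ W3 + b3
    e  = np.exp(z3 - z3.max(axis=1, keepdims=True))
    y_pred = e / e.sum(axis=1, keepdims=True) # softmax output

    # Backward pass (chain rule, layer by layer).
    d3 = y_pred - y_true                      # softmax + cross-entropy delta
    d2 = (d3 @ W3.T) * a2 * (1.0 - a2)        # through the sigmoid
    d1 = (d2 @ W2.T) * (z1 > 0)               # through the ReLU

    # Gradient-descent update of all weights and biases.
    W3 -= lr * (a2.T @ d3); b3 -= lr * d3
    W2 -= lr * (a1.T @ d2); b2 -= lr * d2
    W1 -= lr * (x.T  @ d1); b1 -= lr * d1

error = -np.sum(y_true * np.log(y_pred))
print(y_pred, error)   # prediction moves toward [1, 0, 0]; cross-entropy error shrinks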

21 Numerical example in great detail by Prakash Jay on Medium.com:
etc.

22 Deeper reading:

