Principles of Back-Propagation
The relation between biological vision and computer vision
Prof. Bart ter Haar Romeny
Deep Learning Convolutional Neural Networks: how does this actually work? Error backpropagation. AlexNet (Alex Krizhevsky, 2012), ImageNet challenge: 1.4 million images, 1000 classes, accuracy 75% → 94%. Convolution, ReLU, max pooling, convolution, convolution, etc. A typical big deep NN has (hundreds of) millions of connections: weights.
A numerical example of backpropagation on a simple network: From Prakash Jay, Senior Data Scientist @FractalAnalytics: https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
Approach:
- Build a small neural network as defined in the architecture on the right.
- Initialize the weights and biases randomly.
- Fix the input and output.
- Forward pass the inputs.
- Calculate the cost.
- Compute the gradients and errors.
- Backprop and adjust the weights and biases accordingly.
We initialize the network randomly (a minimal sketch of this setup is shown below):
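A minimal NumPy sketch of this random initialization, assuming an architecture of 3 inputs, a ReLU hidden layer of 4 units, a sigmoid hidden layer of 4 units, and 3 softmax outputs. The layer sizes and the input values are assumptions for illustration; only the target [1.0, 0.0, 0.0] comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.1, 0.2, 0.7])    # fixed input (illustrative values, not the slide's)
y = np.array([1.0, 0.0, 0.0])    # fixed target, as on the slides

# Randomly initialized weights and biases for the three layers
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1 (ReLU)
W2, b2 = rng.normal(size=(4, 4)), rng.normal(size=4)   # layer 2 (sigmoid)
W3, b3 = rng.normal(size=(3, 4)), rng.normal(size=3)   # output layer (softmax)
```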
Forward pass layer 1:
Forward pass layer 1: a matrix operation followed by a ReLU operation (worked example on the slide).
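Layer 1 in code, as a sketch: the matrix operation followed by the ReLU, using the (hypothetical) weights from the initialization sketch above.

```python
def relu(z):
    # ReLU operation: element-wise max(z, 0)
    return np.maximum(z, 0.0)

z1 = W1 @ x + b1   # matrix operation
h1 = relu(z1)      # ReLU operation
```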
Forward pass layer 2:
Forward pass layer 2: a matrix operation followed by a sigmoid operation (worked example on the slide).
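Layer 2 in code, as a sketch: the matrix operation followed by the sigmoid, continuing from the layer-1 sketch.

```python
def sigmoid(z):
    # Sigmoid operation: squashes each value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z2 = W2 @ h1 + b2   # matrix operation
h2 = sigmoid(z2)    # sigmoid operation
```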
Forward pass layer 3:
Forward pass output layer: a matrix operation followed by a softmax operation. Example output: [0.1985, 0.2855, 0.5158].
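The output layer in code, as a sketch: the matrix operation followed by the softmax. With the slide's actual weights this step yields roughly [0.1985, 0.2855, 0.5158]; with the random weights of the sketch above the numbers will differ.

```python
def softmax(z):
    # Softmax operation: exponentiate and normalize to a probability vector
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

z3 = W3 @ h2 + b3   # matrix operation
p = softmax(z3)     # predicted class probabilities
```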
Analysis: The actual output should be [1.0, 0.0, 0.0], but we got [0.19858, 0.28559, 0.51583]. To calculate the error, let us use cross-entropy. Example:
Error = -(1 * Log[0.19858] + 0 * Log[1-0.19858] + 0 * Log[0.28559] + 1 * Log[1-0.28559] + 0 * Log[0.51583] + 1 * Log[1-0.51583]) = 2.67818
We are done with the forward pass. We know the error of the first iteration (we will do this numerous times). Now let us study the backward pass.
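The slide's error value can be reproduced directly. This snippet uses the same per-output cross-entropy form as the slide, -(a·log p + (1-a)·log(1-p)), summed over the three outputs:

```python
p = np.array([0.19858, 0.28559, 0.51583])   # predicted output from the forward pass
a = np.array([1.0, 0.0, 0.0])               # actual (target) output

error = -np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
print(error)   # ~2.67818, matching the slide
```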
A chain of functions: From Rohan Kapur: https://ayearofai.com/rohan-lenny-1-neural-networks-the-backpropagation-algorithm-explained-abf4609d4f9d
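As a sketch, the forward pass of the small example network above is exactly such a chain: one nested function built from the three layers (symbols as in the earlier code sketches).

```latex
p \;=\; \mathrm{softmax}\Bigl(W_3\,\sigma\bigl(W_2\,\mathrm{ReLU}(W_1 x + b_1) + b_2\bigr) + b_3\Bigr)
```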
We recall:
For gradient descent, the derivative of this function with respect to some arbitrary weight (for example w1) is calculated by applying the chain rule, for a simple error measure (p = predicted, a = actual); see the sketch below.
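A sketch of what this looks like, assuming the "simple error measure" on the slide is the squared error (the slide's own choice of error measure may differ):

```latex
E = \tfrac{1}{2}\,(p - a)^2, \qquad
\frac{\partial E}{\partial w_1}
  \;=\; \frac{\partial E}{\partial p}\,\frac{\partial p}{\partial w_1}
  \;=\; (p - a)\,\frac{\partial p}{\partial w_1}
```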
Important derivatives: sigmoid, ReLU, and softmax (standard forms given below).
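The standard forms of these three derivatives, as used throughout the backward pass (with p = softmax(z)):

```latex
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr), \qquad
\mathrm{ReLU}'(x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases}, \qquad
\frac{\partial\,\mathrm{softmax}(z)_i}{\partial z_j} = p_i\,(\delta_{ij} - p_j)
```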
Two slides ago, we saw that:
Going one more layer backwards, we can determine the corresponding gradients in the same way (with the same chain rule, etc.). And finally, the weights and biases are updated and we iterate until convergence (a sketch is given below):
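A minimal sketch of the backward pass and the gradient-descent update, continuing the earlier code sketches. It assumes the common softmax-plus-cross-entropy simplification for the output delta (p - y); the slide's summed binary-cross-entropy form leads to a slightly different expression.

```python
learning_rate = 0.01   # step size (assumed value)

for step in range(1000):                  # "iterate until convergence"
    # Forward pass (relu, sigmoid, softmax as defined in the earlier sketches)
    z1 = W1 @ x + b1;  h1 = relu(z1)
    z2 = W2 @ h1 + b2; h2 = sigmoid(z2)
    z3 = W3 @ h2 + b3; p = softmax(z3)

    # Backward pass: propagate the error layer by layer
    d3 = p - y                            # output-layer delta (softmax + cross-entropy)
    d2 = (W3.T @ d3) * h2 * (1 - h2)      # through the sigmoid derivative
    d1 = (W2.T @ d2) * (z1 > 0)           # through the ReLU derivative

    # Gradient-descent updates of weights and biases
    W3 -= learning_rate * np.outer(d3, h2);  b3 -= learning_rate * d3
    W2 -= learning_rate * np.outer(d2, h1);  b2 -= learning_rate * d2
    W1 -= learning_rate * np.outer(d1, x);   b1 -= learning_rate * d1
```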
The numerical example is worked out in great detail by Prakash Jay on Medium.com: https://medium.com/@14prakash/back-propagation-is-very-simple-who-made-it-complicated-97b794c97e5c
Deeper reading: https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative https://eli.thegreenplace.net/2018/backpropagation-through-a-fully-connected-layer/