12 1 Variations on Backpropagation
12 2 Variations
Heuristic Modifications
–Momentum
–Variable Learning Rate
Standard Numerical Optimization
–Conjugate Gradient
–Newton's Method (Levenberg-Marquardt)
12 3 Performance Surface Example: network architecture, nominal function, and parameter values (figures).
12 4 Squared Error vs. w^1_{1,1} and w^2_{1,1} (figure).
12 5 Squared Error vs. w^1_{1,1} and b^1_1 (figure).
12 6 Squared Error vs. b^1_1 and b^1_2 (figure).
12 7 Convergence Example (trajectory in the w^1_{1,1}–w^2_{1,1} plane; figure).
12 8 Learning Rate Too Large (trajectory in the w^1_{1,1}–w^2_{1,1} plane; figure).
12 9 Momentum Filter Example: a first-order low-pass filter y(k) = γ y(k-1) + (1-γ) w(k), 0 ≤ γ < 1 (figure).
12 10 Momentum Backpropagation
Steepest Descent Backpropagation (SDBP): ΔW^m(k) = -α s^m (a^{m-1})^T, Δb^m(k) = -α s^m
Momentum Backpropagation (MOBP): ΔW^m(k) = γ ΔW^m(k-1) - (1-γ) α s^m (a^{m-1})^T, Δb^m(k) = γ Δb^m(k-1) - (1-γ) α s^m
(figure: trajectory in the w^1_{1,1}–w^2_{1,1} plane)
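The MOBP update is simply the SDBP update passed through the momentum filter. As a rough illustration, here is a minimal NumPy sketch of one MOBP step for a single layer; the function name, the array shapes, and the values of α and γ are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def mobp_update(W, b, dW_old, db_old, s, a_prev, alpha=0.1, gamma=0.8):
    """One momentum-backpropagation step: the new increment is a
    low-pass filtered version of the steepest-descent increment.
    s is the sensitivity (S^m x 1), a_prev is a^{m-1} (S^{m-1} x 1)."""
    dW = gamma * dW_old - (1 - gamma) * alpha * s @ a_prev.T
    db = gamma * db_old - (1 - gamma) * alpha * s
    return W + dW, b + db, dW, db
```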
12 11 Variable Learning Rate (VLBP)
–If the squared error (over the entire training set) increases by more than some set percentage ζ after a weight update, then the weight update is discarded, the learning rate is multiplied by some factor ρ (1 > ρ > 0), and the momentum coefficient γ is set to zero.
–If the squared error decreases after a weight update, then the weight update is accepted and the learning rate is multiplied by some factor η > 1. If γ has been previously set to zero, it is reset to its original value.
–If the squared error increases by less than ζ, then the weight update is accepted, but the learning rate and the momentum coefficient are unchanged.
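To make the three cases concrete, here is a minimal sketch of the VLBP acceptance rule in Python. Only the case logic comes from the rule above; the function name and the numeric values of ζ, ρ, and η are illustrative assumptions, and the caller is assumed to have already computed the squared error before and after a trial weight update.

```python
def vlbp_adjust(old_sse, new_sse, alpha, gamma, gamma0,
                zeta=0.04, rho=0.7, eta=1.05):
    """Return (accept_step, alpha, gamma) for one trial weight update."""
    if new_sse > old_sse * (1 + zeta):      # error grew by more than zeta
        return False, alpha * rho, 0.0      # discard step, shrink rate, zero momentum
    if new_sse < old_sse:                   # error decreased
        return True, alpha * eta, (gamma0 if gamma == 0.0 else gamma)
    return True, alpha, gamma               # small increase: accept, leave alpha and gamma
```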
12 12 Example (figures: VLBP trajectory in the w^1_{1,1}–w^2_{1,1} plane, squared error, and learning rate).
12 13 Conjugate Gradient
1. The first search direction is steepest descent: p_0 = -g_0.
2. Take a step and choose the learning rate α_k to minimize the function along the search direction: x_{k+1} = x_k + α_k p_k.
3. Select the next search direction according to: p_k = -g_k + β_k p_{k-1}, where β_k = (Δg_{k-1}^T g_k) / (Δg_{k-1}^T p_{k-1}) or β_k = (g_k^T g_k) / (g_{k-1}^T g_{k-1}) or β_k = (Δg_{k-1}^T g_k) / (g_{k-1}^T g_{k-1}).
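As a sketch of how these three steps fit together, the following minimal example runs conjugate gradient on a quadratic F(x) = ½ x^T A x + d^T x, where the exact line search has a closed form. The quadratic test problem and the choice of the Fletcher-Reeves form of β_k are assumptions for illustration.

```python
import numpy as np

def conjugate_gradient_quadratic(A, d, x0, tol=1e-8):
    """Minimize F(x) = 0.5 x^T A x + d^T x (A symmetric positive definite):
    steepest descent first, exact line search, then p_k = -g_k + beta_k p_{k-1}."""
    x = x0.astype(float)
    g = A @ x + d                            # gradient of the quadratic
    p = -g                                   # 1. first direction: steepest descent
    while np.linalg.norm(g) > tol:
        alpha = -(g @ p) / (p @ (A @ p))     # 2. exact line search along p
        x = x + alpha * p
        g_new = A @ x + d
        beta = (g_new @ g_new) / (g @ g)     # 3. Fletcher-Reeves beta
        p = -g_new + beta * p
        g = g_new
    return x
```

For an n-dimensional quadratic with positive definite A, this converges in at most n iterations.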
12 14 Interval Location (figure).
12 15 Interval Reduction (figure).
12 16 Golden Section Search
τ = 0.618
Set c_1 = a_1 + (1-τ)(b_1 - a_1), Fc = F(c_1)
    d_1 = b_1 - (1-τ)(b_1 - a_1), Fd = F(d_1)
For k = 1, 2, ... repeat
    If Fc < Fd then
        Set a_{k+1} = a_k; b_{k+1} = d_k; d_{k+1} = c_k
            c_{k+1} = a_{k+1} + (1-τ)(b_{k+1} - a_{k+1})
            Fd = Fc; Fc = F(c_{k+1})
    else
        Set a_{k+1} = c_k; b_{k+1} = b_k; c_{k+1} = d_k
            d_{k+1} = b_{k+1} - (1-τ)(b_{k+1} - a_{k+1})
            Fc = Fd; Fd = F(d_{k+1})
    end
end until b_{k+1} - a_{k+1} < tol
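The pseudocode above translates almost line for line into Python; the sketch below updates the interval in place, and the test function and tolerance are illustrative placeholders.

```python
def golden_section_search(F, a, b, tol=1e-4):
    """Shrink the interval [a, b] containing a minimum of F,
    following the pseudocode above with tau = 0.618."""
    tau = 0.618
    c = a + (1 - tau) * (b - a); Fc = F(c)
    d = b - (1 - tau) * (b - a); Fd = F(d)
    while (b - a) > tol:
        if Fc < Fd:
            b, d = d, c                       # keep [a, d]
            c = a + (1 - tau) * (b - a)
            Fd, Fc = Fc, F(c)
        else:
            a, c = c, d                       # keep [c, b]
            d = b - (1 - tau) * (b - a)
            Fc, Fd = Fd, F(d)
    return (a + b) / 2

# Example (illustrative): minimum of (x - 2)^2 on [0, 5]
# print(golden_section_search(lambda x: (x - 2) ** 2, 0.0, 5.0))
```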
12 17 Conjugate Gradient BP (CGBP) (figures: intermediate steps and complete trajectory in the w^1_{1,1}–w^2_{1,1} plane).
12 18 Newton's Method
If the performance index is a sum of squares function: F(x) = Σ_{i=1}^N v_i^2(x) = v^T(x) v(x), then the jth element of the gradient is [∇F(x)]_j = ∂F(x)/∂x_j = 2 Σ_{i=1}^N v_i(x) ∂v_i(x)/∂x_j.
12 19 Matrix Form
The gradient can be written in matrix form: ∇F(x) = 2 J^T(x) v(x), where J(x) is the Jacobian matrix with elements [J]_{i,j} = ∂v_i(x)/∂x_j (one row per error term, one column per parameter).
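A quick numerical check of ∇F(x) = 2 J^T(x) v(x) is to compare it with a finite-difference gradient of F(x) = v^T(x) v(x). The two-error, two-parameter function v used below is a made-up example, not from the slides.

```python
import numpy as np

def v(x):  # illustrative error vector v(x)
    return np.array([x[0] ** 2 - x[1], np.sin(x[0]) + 3.0 * x[1]])

def J(x):  # its analytic Jacobian, [J]_{i,j} = dv_i/dx_j
    return np.array([[2.0 * x[0], -1.0],
                     [np.cos(x[0]), 3.0]])

x = np.array([0.5, -1.2])
grad_analytic = 2.0 * J(x).T @ v(x)           # gradient of F = v^T v

F = lambda x: v(x) @ v(x)
eps = 1e-6                                    # central finite differences
grad_fd = np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
print(grad_analytic, grad_fd)                 # the two should agree closely
```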
12 20 Hessian
∇²F(x) = 2 J^T(x) J(x) + 2 S(x), where S(x) = Σ_{i=1}^N v_i(x) ∇²v_i(x).
12 21 Gauss-Newton Method
Approximate the Hessian matrix as: ∇²F(x) ≈ 2 J^T(x) J(x) (i.e., assume S(x) is small). Newton's method then becomes: Δx_k = -[2 J^T(x_k) J(x_k)]^{-1} 2 J^T(x_k) v(x_k) = -[J^T(x_k) J(x_k)]^{-1} J^T(x_k) v(x_k).
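Here is a minimal sketch of one Gauss-Newton step, assuming helper functions v_fun(x) and J_fun(x) that return the error vector and Jacobian (for instance, the illustrative pair defined in the previous sketch). In practice the step is computed with a least-squares solve rather than by forming the inverse explicitly.

```python
import numpy as np

def gauss_newton_step(x, v_fun, J_fun):
    """One Gauss-Newton update: dx = -(J^T J)^{-1} J^T v,
    computed via a least-squares solve for numerical stability."""
    Jk, vk = J_fun(x), v_fun(x)
    dx, *_ = np.linalg.lstsq(Jk, -vk, rcond=None)   # solves J dx ≈ -v
    return x + dx
```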
12 22 Levenberg-Marquardt
Gauss-Newton approximates the Hessian by: H = 2 J^T J. This matrix may be singular, but it can be made invertible as follows: G = H + μI. If the eigenvalues and eigenvectors of H are {λ_1, λ_2, ..., λ_n} and {z_1, z_2, ..., z_n}, then G z_i = [H + μI] z_i = H z_i + μ z_i = (λ_i + μ) z_i, so the eigenvalues of G are λ_i + μ, and G can be made positive definite (hence invertible) by increasing μ. The resulting update is: Δx_k = -[J^T(x_k) J(x_k) + μ_k I]^{-1} J^T(x_k) v(x_k).
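The eigenvalue shift is easy to verify numerically: adding μI moves every eigenvalue of H up by μ, so a singular H becomes an invertible G. The rank-deficient J below is a deliberately constructed example, not from the slides.

```python
import numpy as np

J = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # rank-deficient Jacobian (rows are parallel)
H = 2.0 * J.T @ J                   # Gauss-Newton Hessian approximation
mu = 0.1
G = H + mu * np.eye(2)              # Levenberg-Marquardt modification

print(np.linalg.eigvalsh(H))        # one eigenvalue is 0: H is singular
print(np.linalg.eigvalsh(G))        # each eigenvalue shifted up by mu: G is invertible
```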
12 23 Adjustment of μ_k
As μ_k → 0, LM becomes Gauss-Newton. As μ_k → ∞, LM becomes steepest descent with a small learning rate. Therefore, begin with a small μ_k to use Gauss-Newton and speed convergence. If a step does not yield a smaller F(x), then repeat the step with an increased μ_k until F(x) is decreased. F(x) must decrease eventually, since we will be taking a very small step in the steepest descent direction.
12 24 Application to Multilayer Network
The performance index for the multilayer network is: F(x) = Σ_{q=1}^Q (t_q - a_q)^T (t_q - a_q) = Σ_{q=1}^Q e_q^T e_q = Σ_{i=1}^N v_i^2.
The error vector is: v^T = [v_1 v_2 ... v_N] = [e_{1,1} e_{2,1} ... e_{S^M,1} e_{1,2} ... e_{S^M,Q}].
The parameter vector is: x^T = [x_1 x_2 ... x_n] = [w^1_{1,1} w^1_{1,2} ... w^1_{S^1,R} b^1_1 ... b^1_{S^1} w^2_{1,1} ... b^M_{S^M}].
The dimensions of the two vectors are: N = Q × S^M and n = S^1(R+1) + S^2(S^1+1) + ... + S^M(S^{M-1}+1).
12 25 Jacobian Matrix (structure of J(x) for the multilayer network; figure).
12 26 Computing the Jacobian
SDBP computes terms like ∂F̂/∂x_l, i.e., derivatives of the squared error with respect to the weights and biases. For the Jacobian we need to compute terms like [J]_{h,l} = ∂v_h/∂x_l = ∂e_{k,q}/∂x_l, i.e., derivatives of the individual errors. These are obtained using the chain rule, for example ∂e_{k,q}/∂w^m_{i,j} = (∂e_{k,q}/∂n^m_{i,q}) · (∂n^m_{i,q}/∂w^m_{i,j}), where the sensitivity is computed using backpropagation.
12 27 Marquardt Sensitivity
If we define a Marquardt sensitivity: s̃^m_{i,h} ≡ ∂v_h/∂n^m_{i,q}, we can compute the Jacobian elements as follows:
weight: [J]_{h,l} = ∂e_{k,q}/∂w^m_{i,j} = s̃^m_{i,h} · a^{m-1}_{j,q}
bias: [J]_{h,l} = ∂e_{k,q}/∂b^m_i = s̃^m_{i,h}
12 28 Computing the Sensitivities
Initialization (at the final layer): S̃^M_q = -Ḟ^M(n^M_q).
Backpropagation: S̃^m_q = Ḟ^m(n^m_q) (W^{m+1})^T S̃^{m+1}_q.
The individual matrices are then augmented over the training set: S̃^m = [S̃^m_1 | S̃^m_2 | ... | S̃^m_Q].
12 29 LMBP
1. Present all inputs to the network and compute the corresponding network outputs and the errors. Compute the sum of squared errors over all inputs, F(x).
2. Compute the Jacobian matrix. Calculate the sensitivities with the backpropagation algorithm, after initializing. Augment the individual matrices into the Marquardt sensitivities. Compute the elements of the Jacobian matrix.
3. Solve [J^T(x_k) J(x_k) + μ_k I] Δx_k = -J^T(x_k) v(x_k) to obtain the change in the weights.
4. Recompute the sum of squared errors with the new weights. If this new sum of squares is smaller than that computed in step 1, then divide μ_k by ϑ, update the weights, and go back to step 1. If the sum of squares is not reduced, then multiply μ_k by ϑ and go back to step 3.
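Steps 1-4 can be condensed into the loop sketched below, with the network-specific parts (forward pass, Marquardt sensitivities, Jacobian assembly) hidden behind assumed helpers v_fun(x) and J_fun(x); the adjustment factor ϑ = 10, the initial μ, and the iteration cap are illustrative choices, not values from the slides.

```python
import numpy as np

def lmbp(x, v_fun, J_fun, mu=0.01, theta=10.0, n_iter=50):
    """Levenberg-Marquardt loop following steps 1-4 above.
    v_fun(x) returns the error vector, J_fun(x) its Jacobian
    (assumed helpers standing in for the network computations)."""
    v = v_fun(x)
    sse = v @ v                                      # step 1: sum of squared errors
    for _ in range(n_iter):
        Jk = J_fun(x)                                # step 2: Jacobian
        while mu < 1e10:                             # retry with larger mu if needed
            A = Jk.T @ Jk + mu * np.eye(x.size)
            dx = np.linalg.solve(A, -Jk.T @ v)       # step 3: solve for the weight change
            v_new = v_fun(x + dx)
            sse_new = v_new @ v_new                  # step 4: recompute the error
            if sse_new < sse:
                mu /= theta                          # accepted: move toward Gauss-Newton
                x, v, sse = x + dx, v_new, sse_new
                break
            mu *= theta                              # rejected: move toward steepest descent
    return x
```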
12 30 Example LMBP Step (trajectory in the w^1_{1,1}–w^2_{1,1} plane; figure).
12 31 LMBP Trajectory (in the w^1_{1,1}–w^2_{1,1} plane; figure).