Presentation transcript: Improved BP Algorithms (First-Order Gradient Methods)

1 Improved BP algorithms (first-order gradient methods): 1. BP with momentum 2. Delta-bar-delta 3. Decoupled momentum 4. RProp 5. Adaptive BP 6. Trinary BP 7. BP with adaptive gain 8. Extended BP

2 BP with momentum (BPM) The basic improvement to BP (Rumelhart 1986). The momentum factor alpha is selected between zero and one. Adding momentum improves the convergence speed and helps keep the network from being trapped in a local minimum.
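A minimal sketch of the momentum update (the learning rate, variable names, and use of NumPy arrays are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def bpm_update(w, grad, prev_delta, lr=0.1, alpha=0.9):
    """BP with momentum: the new weight change is the negative gradient
    step plus alpha times the previous change (0 < alpha < 1)."""
    delta = -lr * grad + alpha * prev_delta
    return w + delta, delta

# Example: one update of a small weight vector
w, delta = np.zeros(3), np.zeros(3)
grad = np.array([0.2, -0.1, 0.05])
w, delta = bpm_update(w, grad, delta)
```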

3 Modification form: proposed by Nagata (1990). Beta is a constant value chosen by the user. Nagata claimed that the beta term reduces the possibility of the network being trapped in a local minimum. This appears to repeat the role of the alpha (momentum) term, but the distinction is not made clear.

4 Delta-bar-delta (DBD) Uses an adaptive learning rate to speed up convergence. The rate is adapted based on local optimization: gradient descent supplies the search direction, while each weight has its own individual step size.
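A sketch of the usual delta-bar-delta rate adaptation with per-weight learning rates (the constants kappa, phi, theta and the function names are illustrative defaults, not from the slides):

```python
import numpy as np

def dbd_step(w, grad, lr, bar_delta, kappa=0.01, phi=0.1, theta=0.7):
    """Delta-bar-delta: grow a weight's learning rate additively while the
    gradient keeps its sign, shrink it multiplicatively when the sign flips."""
    agree = bar_delta * grad > 0
    flip = bar_delta * grad < 0
    lr = np.where(agree, lr + kappa, np.where(flip, lr * (1 - phi), lr))
    bar_delta = (1 - theta) * grad + theta * bar_delta   # smoothed gradient
    return w - lr * grad, lr, bar_delta
```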

5 RProp Jervis and Fitzgerald (1993); limits the size of the step.
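A sketch of the sign-based RProp step, with each weight's step size grown or shrunk and clipped to fixed limits (the factors 1.2/0.5 and the limits are commonly used defaults, assumed here rather than taken from the slide):

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """RProp: use only the sign of the gradient; adapt and limit each step."""
    agree = grad * prev_grad > 0
    flip = grad * prev_grad < 0
    step = np.where(agree, np.minimum(step * eta_plus, step_max), step)
    step = np.where(flip, np.maximum(step * eta_minus, step_min), step)
    return w - np.sign(grad) * step, step, grad
```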

6 Second order gradient methods Newton Gauss-Newton Levenberg-Marquardt Quickprop Conjugate gradient descent Broyden-Fletcher-Goldfarb-Shanno (BFGS)

7 Performance Surfaces Taylor Series Expansion
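For reference, the standard second-order Taylor expansion of a performance index F(x) about a point x* (the notation is the usual one, assumed rather than taken from the slide):

```latex
F(\mathbf{x}) \approx F(\mathbf{x}^*)
  + \nabla F(\mathbf{x})^{T}\big|_{\mathbf{x}=\mathbf{x}^*}\,(\mathbf{x}-\mathbf{x}^*)
  + \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}^*)^{T}\,\nabla^{2}F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\,(\mathbf{x}-\mathbf{x}^*)
```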

8 Example

9 Plot of Approximations

10 Vector Case

11 Matrix Form

12 Performance Optimization Basic Optimization Algorithm Steepest Descent
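The basic steepest-descent iteration in the same notation, where g_k is the gradient evaluated at x_k and alpha_k is the learning rate:

```latex
\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\,\mathbf{g}_k,
\qquad \mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}
```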

13 Examples

14 Minimizing Along a Line
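For a quadratic performance index, the learning rate that minimizes F along the search direction p_k has the standard closed form (A_k is the Hessian evaluated at x_k):

```latex
\alpha_k = -\,\frac{\nabla F(\mathbf{x})^{T}\big|_{\mathbf{x}=\mathbf{x}_k}\,\mathbf{p}_k}
                  {\mathbf{p}_k^{T}\,\mathbf{A}_k\,\mathbf{p}_k}
```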

15 Example

16 Plot

17 Newton’s Method
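Newton's iteration in the same notation (A_k is the Hessian at x_k):

```latex
\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\,\mathbf{g}_k,
\qquad \mathbf{A}_k \equiv \nabla^{2}F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}
```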

18 Example

19 Plot

20 Non-Quadratic Example

21 Different Initial Conditions

22 Difficulties: the Hessian may be singular and cannot be inverted, and the inversion is computationally complex.

23 Newton’s Method

24 Matrix Form

25 Hessian
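For a sum-of-squares index F(x) = v^T(x) v(x), the standard expressions for the gradient and Hessian in terms of the Jacobian J(x) are:

```latex
\nabla F(\mathbf{x}) = 2\,\mathbf{J}^{T}(\mathbf{x})\,\mathbf{v}(\mathbf{x}),
\qquad
\nabla^{2}F(\mathbf{x}) = 2\,\mathbf{J}^{T}(\mathbf{x})\,\mathbf{J}(\mathbf{x}) + 2\,\mathbf{S}(\mathbf{x}),
\qquad
\mathbf{S}(\mathbf{x}) = \sum_{h} v_h(\mathbf{x})\,\nabla^{2} v_h(\mathbf{x})
```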

26 Gauss-Newton Method
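Dropping the S(x) term of the Hessian gives the Gauss-Newton step:

```latex
\Delta\mathbf{x}_k = -\left[\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k)\right]^{-1}
                      \mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k)
```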

27 Levenberg-Marquardt
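Levenberg-Marquardt regularizes the Gauss-Newton step so the matrix is always invertible; a large mu_k behaves like steepest descent with a small step, while mu_k near zero approaches Gauss-Newton:

```latex
\Delta\mathbf{x}_k = -\left[\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k) + \mu_k\,\mathbf{I}\right]^{-1}
                      \mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k)
```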

28 Adjustment of μk

29 Application to Multilayer Network

30 Jacobian Matrix
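In this setting the Jacobian collects the derivative of each individual error with respect to each weight and bias:

```latex
\left[\mathbf{J}(\mathbf{x})\right]_{h,l} = \frac{\partial v_h(\mathbf{x})}{\partial x_l}
```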

31 Computing the Jacobian

32 Marquardt Sensitivity

33 Computing the Sensitivities

34 LMBP
1. Present all inputs to the network and compute the corresponding network outputs and the errors. Compute the sum of squared errors over all inputs.
2. Compute the Jacobian matrix: calculate the sensitivities with the backpropagation algorithm, after initializing; augment the individual matrices into the Marquardt sensitivities; compute the elements of the Jacobian matrix.
3. Solve to obtain the change in the weights.
4. Recompute the sum of squared errors with the new weights. If this new sum of squares is smaller than that computed in step 1, then divide μk by ϑ, update the weights, and go back to step 1. If the sum of squares is not reduced, then multiply μk by ϑ and go back to step 3.
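A minimal sketch of the μ-adjustment loop described in these steps; compute_sse and compute_jacobian_and_errors are hypothetical placeholder names for the network-specific parts (forward pass, error vector, and Jacobian via the Marquardt sensitivities):

```python
import numpy as np

def lmbp_train(w, compute_sse, compute_jacobian_and_errors,
               mu=0.01, theta=10.0, mu_max=1e10, n_epochs=100):
    """Levenberg-Marquardt outer loop: accept a weight change only if it
    lowers the sum of squared errors, otherwise raise mu and retry."""
    sse = compute_sse(w)
    for _ in range(n_epochs):
        J, v = compute_jacobian_and_errors(w)        # Jacobian and error vector
        while True:
            # Solve (J^T J + mu I) dw = -J^T v for the weight change
            dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), -(J.T @ v))
            new_sse = compute_sse(w + dw)
            if new_sse < sse:                        # improvement: accept, relax damping
                w, sse, mu = w + dw, new_sse, mu / theta
                break
            mu *= theta                              # no improvement: damp more, retry
            if mu > mu_max:
                return w
    return w
```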

35 Example LMBP Step

36 LMBP Trajectory

37 Conjugate Gradient
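The conjugate gradient iteration combines the current negative gradient with the previous search direction; with the Fletcher-Reeves choice of beta_k:

```latex
\mathbf{p}_0 = -\mathbf{g}_0,\qquad
\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\,\mathbf{p}_k,\qquad
\mathbf{p}_k = -\mathbf{g}_k + \beta_k\,\mathbf{p}_{k-1},\qquad
\beta_k = \frac{\mathbf{g}_k^{T}\mathbf{g}_k}{\mathbf{g}_{k-1}^{T}\mathbf{g}_{k-1}}
```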

38 LRLS method A method based on recursive least squares (RLS); it does not need gradient descent.
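For orientation, the standard recursive least squares update on which such methods build (not necessarily the presentation's exact LRLS formulation), with forgetting factor lambda, input vector x_n, a-priori error e_n, and inverse-correlation matrix P:

```latex
\mathbf{k}_n = \frac{\mathbf{P}_{n-1}\,\mathbf{x}_n}{\lambda + \mathbf{x}_n^{T}\,\mathbf{P}_{n-1}\,\mathbf{x}_n},
\qquad
\mathbf{w}_n = \mathbf{w}_{n-1} + \mathbf{k}_n\,e_n,
\qquad
\mathbf{P}_n = \frac{1}{\lambda}\left(\mathbf{P}_{n-1} - \mathbf{k}_n\,\mathbf{x}_n^{T}\,\mathbf{P}_{n-1}\right)
```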

39 LRLS method cont.

40

41 Example comparing the methods

Algorithm | Average time steps (cut off at 0.15) | Average time steps (cut off at 0.1) | Average time steps (cut off at 0.05)
BPM       | 101 | 175 | 594
DBD       | 64  | 167 | 2382
LM        | 21  | 23  | 264
LRLS      | 3   | 9   | 69

42 IGLS (Integration of the Gradient and Least Square) The LRLS algorithm converges faster than the BPM algorithm, but it is computationally more complex and is not ideal for very large networks. The learning strategy described here combines ideas from both algorithms to keep the complexity manageable for large networks while retaining reasonably fast convergence. The output layer weights are updated using the least-squares (LRLS) method, while the hidden layer weights are updated using the BPM method.
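A rough sketch of the hybrid idea, under the assumption that the output layer is re-fit by least squares while the hidden layer takes a BP-with-momentum step; the tanh activation, the absence of biases, and all function names are illustrative, not the presentation's exact algorithm:

```python
import numpy as np

def igls_epoch(W_hidden, W_out, X, T, prev_delta, lr=0.1, alpha=0.9):
    """One hybrid epoch (illustrative): output-layer weights are re-solved by
    linear least squares on the hidden activations, hidden-layer weights take
    a BP-with-momentum step."""
    H = np.tanh(X @ W_hidden)                      # hidden activations
    W_out = np.linalg.lstsq(H, T, rcond=None)[0]   # least-squares output weights
    E = H @ W_out - T                              # output error
    # Back-propagate through tanh to get the hidden-layer gradient
    grad_hidden = X.T @ ((E @ W_out.T) * (1.0 - H**2))
    delta = -lr * grad_hidden + alpha * prev_delta # momentum update
    return W_hidden + delta, W_out, delta
```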

