A back-propagation neural network is only practical in certain situations. Following are some guidelines on when you should use another approach: Can you write down a flow chart or a formula that accurately describes the problem? If so, then stick with a traditional programming method. Is there a simple piece of hardware or software that already does what you want? If so, then the development time for a NN might not be worth it. Do you want the functionality to "evolve" in a direction that is not pre-defined?
Do you have an easy way to generate a significant number of input/output examples of the desired behavior? If not, then you won't be able to train your NN to do anything. Is the problem very "discrete"? Can the correct answer be found in a look-up table of reasonable size? If so, a look-up table is much simpler and more accurate. Are precise numeric output values required? NNs are not good at giving precise numeric answers.
Conversely, here are some situations where a BP NN might be a good idea: A large amount of input/output data is available, but you're not sure how to relate the input to the output. The problem appears to have overwhelming complexity, but there is clearly a solution. It is easy to create a number of examples of the correct behavior. The solution to the problem may change over time, within the bounds of the given input and output parameters (i.e., today 2+2=4, but in the future we may find that 2+2=3.8). Outputs can be "fuzzy", or non-numeric.
Back-propagation is the most popular and successful training method. Steps to follow during training:
◦ Select the next training pair from the training set (an input vector and its desired output).
◦ Present the input vector to the network.
◦ The network calculates its output.
◦ The network calculates the error between its output and the desired output.
◦ The network back-propagates the error.
◦ Adjust the weights of the network in a way that minimizes the error.
◦ Repeat the above steps for each vector in the training set until the error over the whole training set is acceptable.
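The training steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: it assumes one hidden layer, sigmoid activations, a summed squared error, and per-example weight updates. The names `make_net`, `forward`, `train`, and `total_error`, and the values of `rate`, `max_epochs`, and `acceptable_error`, are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_net(n_inputs, n_hidden, rng):
    """Small random weights for an n_inputs -> n_hidden -> 1 network."""
    def small():
        return rng.uniform(-0.5, 0.5)
    return {
        "w_hidden": [[small() for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b_hidden": [small() for _ in range(n_hidden)],
        "w_out": [small() for _ in range(n_hidden)],
        "b_out": small(),
    }


def forward(net, inputs):
    """Return (hidden activations, network output)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output


def total_error(net, training_set):
    """Summed squared error over the whole training set."""
    return sum((target - forward(net, inputs)[1]) ** 2
               for inputs, target in training_set)


def train(net, training_set, rate=0.5, max_epochs=5000, acceptable_error=0.01):
    """Repeat the listed steps until the error is acceptable (or epochs run out)."""
    epoch_error = float("inf")
    for _ in range(max_epochs):
        epoch_error = 0.0
        for inputs, target in training_set:        # select the next training pair
            hidden, output = forward(net, inputs)  # present input, compute output
            error = target - output                # compare with the desired output
            epoch_error += error ** 2
            # Back-propagate: output delta first, then the hidden deltas
            # (computed before the output weights are changed).
            delta_out = error * output * (1.0 - output)
            delta_hidden = [delta_out * w * h * (1.0 - h)
                            for w, h in zip(net["w_out"], hidden)]
            # Adjust the weights in the direction that reduces the error.
            for j, h in enumerate(hidden):
                net["w_out"][j] += rate * delta_out * h
            net["b_out"] += rate * delta_out
            for j, d in enumerate(delta_hidden):
                for i, x in enumerate(inputs):
                    net["w_hidden"][j][i] += rate * d * x
                net["b_hidden"][j] += rate * d
        if epoch_error <= acceptable_error:
            break
    return epoch_error


# Usage: learn logical OR from four input/output examples.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
net = make_net(2, 2, random.Random(0))
error_before = total_error(net, or_examples)
train(net, or_examples)
error_after = total_error(net, or_examples)
```

Updating after every training pair follows the order of the steps in the list; accumulating the changes over a whole pass before applying them is a common alternative.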
Step 1: Feed the inputs forward through the network:
$a^0 = p$
$a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})$, where $m = 0, 1, \ldots, M-1$
$a = a^M$
Step 2: Back-propagate the sensitivities (error), using one expression at the output layer and another at the hidden layers, where $m = M-1, \ldots, 2, 1$.
Step 3: Finally, update the weights and biases.
(Details on constructing the algorithm and other related issues can be found in the textbook Neural Network Design.)
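The expressions for Steps 2 and 3 did not survive in this transcript. Assuming the slide follows the notation of the cited textbook, with a squared-error performance index, they are usually written as below. The symbols $s^m$ (sensitivity of layer $m$), $n^m = W^m a^{m-1} + b^m$ (net input), $\dot{F}^m(n^m)$ (diagonal matrix of the derivatives of $f^m$ at $n^m$), $t$ (target), $\alpha$ (learning rate), and $k$ (iteration index) are assumptions taken from that convention, not from the slide itself.

```latex
\begin{align*}
s^{M} &= -2\,\dot{F}^{M}(n^{M})\,(t - a) && \text{(at the output layer)} \\
s^{m} &= \dot{F}^{m}(n^{m})\,(W^{m+1})^{T}\, s^{m+1} && \text{(at the hidden layers)} \\
W^{m}(k+1) &= W^{m}(k) - \alpha\, s^{m}\,(a^{m-1})^{T} \\
b^{m}(k+1) &= b^{m}(k) - \alpha\, s^{m}
\end{align*}
```

The sensitivities are computed from the last layer back to the first, which is why Step 2 runs from $m = M-1$ down to $1$ after $s^M$ is known.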
Back-propagation is the most commonly used generalization of the delta rule. The procedure involves two phases: (i) Forward phase: when the input is presented, it propagates forward through the network to compute output values for each processing element (PE). For each PE, the current outputs are compared with the desired outputs and the error is computed. (ii) Backward phase: the calculated error is now fed backward and the weights are adjusted. After both phases are complete, a new input is presented for further training. This technique is slow, can cause instability, and has a tendency to get stuck in local minima, but it is still very popular.
[Equation not reproduced; its terms are labelled "Target output" and "Network output"] Note: LMS = least mean square
This method of weight adjustment is also known as the steepest gradient descent technique or the Widrow-Hoff rule, and it is the most common type. It is also known as the delta rule.
where $x_i(t)$ and $y_j(t)$ are the outputs at nodes $i$ and $j$, and $w_{ij}$ is the weight between nodes $i$ and $j$.
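The delta-rule equation itself is missing from this transcript. As a hedged sketch, the code below assumes the standard Widrow-Hoff (LMS) form for a single linear unit, in which each weight changes by `rate * (desired - y) * x_i`. The function name `delta_rule_update`, the learning rate, and the sample data are hypothetical.

```python
def delta_rule_update(weights, x, desired, rate):
    """One Widrow-Hoff (LMS) step for a single linear output unit."""
    y = sum(w * xi for w, xi in zip(weights, x))  # network output
    error = desired - y                           # target output minus network output
    return [w + rate * error * xi for w, xi in zip(weights, x)]


# Usage: recover the weights of y = 2*x1 - 1*x2 from three consistent examples.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = [0.0, 0.0]
for _ in range(200):
    for x, desired in samples:
        weights = delta_rule_update(weights, x, desired, rate=0.1)
```

Because the examples are consistent with one linear function, the weights settle close to `[2.0, -1.0]`; with noisy data they would instead hover around the least-mean-square solution.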
[Figure: General multi-layered neural network, showing the input layer, hidden layer, and output layer, with node labels such as X0,0, X1,0, X9,0 and weight labels such as W0,0, W1,0, Wi,0]
Backpropagation ◦ Calculation of hidden layer activation values
Backpropagation ◦ Calculation of output layer activation values
Backpropagation ◦ Calculation of the error at output node $k$: $f(D_k) - f(O_k)$
Advantages ◦ Relatively simple implementation ◦ Standard method and generally works well Disadvantages ◦ Slow and inefficient ◦ Can get stuck in local minima resulting in sub-optimal solutions
[Figure labels: Local Minimum, Global Minimum]
Simulated Annealing
◦ Advantages: can guarantee the optimal solution (global minimum)
◦ Disadvantages: may be slower than gradient descent; much more complicated implementation
Genetic Algorithms/Evolutionary Strategies
◦ Advantages: faster than simulated annealing; less likely to get stuck in local minima
◦ Disadvantages: slower than gradient descent; memory intensive for large nets
Simplex Algorithm
◦ Advantages: similar to gradient descent but faster; easy to implement
◦ Disadvantages: does not guarantee a global minimum
Momentum ◦ Adds a percentage of the last movement to the current movement
Momentum
◦ Useful for getting over small bumps in the error function
◦ Often finds a minimum in fewer steps
◦ w(t) = -n*d*y + a*w(t-1), where
  w is the change in weight
  n is the learning rate
  d is the error
  y is different depending on which layer we are calculating
  a is the momentum parameter
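The momentum formula translates directly into code. The sketch below keeps the slide's symbols (`n`, `d`, `y`, `a`); the function name `momentum_step` and the one-weight example are hypothetical. In the example, `d` stands in for the slope of a simple error curve `(weight - 3)**2` and `y` is fixed at 1, which is an assumption made only to keep the demonstration small.

```python
def momentum_step(n, d, y, a, previous_change):
    """w(t) = -n*d*y + a*w(t-1), where w is the change in weight."""
    return -n * d * y + a * previous_change


# Usage: minimise the error curve (weight - 3)**2 for a single weight.
weight = 0.0
change = 0.0
for _ in range(100):
    d = 2.0 * (weight - 3.0)  # stand-in error term: slope of the error curve
    change = momentum_step(n=0.1, d=d, y=1.0, a=0.5, previous_change=change)
    weight += change          # a fraction of the last movement is carried over
```

Setting `a` to 0 reduces this to plain gradient descent, which makes it easy to compare the two on the same error curve.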
Adaptive Backpropagation Algorithm
◦ It assigns each weight its own learning rate
◦ That learning rate is determined by comparing the sign of the error-function gradient with its sign in the last iteration
  If the signs are equal, the slope is more likely to be shallow, so the learning rate is increased
  The signs are more likely to differ on a steep slope, so the learning rate is decreased
◦ This speeds up progress on gradual slopes
Adaptive Backpropagation
◦ Possible problem: since we minimize the error for each weight separately, the overall error may increase
◦ Solution: calculate the total output error after each adaptation and, if it is greater than the previous error, reject that adaptation and calculate new learning rates
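A minimal sketch of the adaptive scheme with the rejection step might look like this. The slides do not say by how much the learning rates change, so the factors `up=1.2` and `down=0.5`, the choice to shrink every rate after a rejection, and all function names are assumptions made for illustration.

```python
def adaptive_descent(error_fn, grad_fn, weights, rate=0.1, up=1.2, down=0.5, steps=100):
    """Per-weight learning rates adapted by gradient sign, with rejection."""
    rates = [rate] * len(weights)
    prev_grad = [0.0] * len(weights)
    error = error_fn(weights)
    for _ in range(steps):
        grad = grad_fn(weights)
        for i, (g, pg) in enumerate(zip(grad, prev_grad)):
            if g * pg > 0:      # same sign as last iteration: increase the rate
                rates[i] *= up
            elif g * pg < 0:    # sign changed: decrease the rate
                rates[i] *= down
        candidate = [w - r * g for w, r, g in zip(weights, rates, grad)]
        candidate_error = error_fn(candidate)
        if candidate_error > error:
            # Total error went up: reject this adaptation and try smaller rates.
            rates = [r * down for r in rates]
            continue
        weights, error, prev_grad = candidate, candidate_error, grad
    return weights, error


# Usage: an error bowl that is shallow along one weight and steep along the other.
def bowl_error(w):
    return (w[0] - 1.0) ** 2 + 10.0 * (w[1] + 2.0) ** 2


def bowl_gradient(w):
    return [2.0 * (w[0] - 1.0), 20.0 * (w[1] + 2.0)]


final_weights, final_error = adaptive_descent(bowl_error, bowl_gradient, [0.0, 0.0])
```

Because a candidate is only accepted when the total error does not increase, the error returned is never worse than the starting error, which is the point of the rejection rule.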
SuperSAB (Super Self-Adapting Backpropagation)
◦ Combines the momentum and adaptive methods.
◦ Uses the adaptive method and momentum so long as the sign of the gradient does not change
  The effects of both methods add up, resulting in a faster traversal of gradual slopes
◦ When the sign of the gradient does change, the momentum cancels the drastic drop in learning rate
  This allows the search to roll up the other side of the minimum, possibly escaping local minima
SuperSAB ◦ Experiments show that the SuperSAB converges faster than gradient descent ◦ Overall this algorithm is less sensitive (and so is less likely to get caught in local minima)
Varying training data
◦ Cycle through input classes
◦ Randomly select from input classes
Add noise to training data
◦ Randomly change the value of an input node (with low probability)
Retrain with expected inputs after initial training
◦ E.g. speech recognition
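The first two ideas are simple to express in code. This is a sketch under assumptions: the slides do not say what a changed input value is replaced with, so here it is drawn uniformly from a range `low` to `high`. The function names and default values are hypothetical.

```python
import itertools
import random


def cycle_classes(classes, count):
    """Present the input classes in a fixed repeating order."""
    return list(itertools.islice(itertools.cycle(classes), count))


def random_classes(classes, count, rng):
    """Present the input classes in random order."""
    return [rng.choice(classes) for _ in range(count)]


def add_noise(inputs, probability, rng, low=0.0, high=1.0):
    """Replace each input value with a random one, with low probability."""
    return [rng.uniform(low, high) if rng.random() < probability else value
            for value in inputs]


# Usage: perturb roughly 5% of the input nodes of one training vector.
rng = random.Random(42)
noisy_vector = add_noise([0.0, 1.0, 0.5, 0.25], probability=0.05, rng=rng)
```

Passing in a seeded `random.Random` keeps a training run reproducible while still varying the data between passes.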
Adding and removing neurons from layers ◦ Adding neurons speeds up learning but may cause loss in generalization ◦ Removing neurons has the opposite effect
1) Image analysis
   a. Recognizing text in images.
   b. Finding oil fields.
2) Source code recognition.
3) Reproducing similar sound.
4) Robotics
A mad scientist wants to make billions of dollars by controlling the stock market. He will do this by controlling the stock purchases of several wealthy people. The scientist controls information that can be given by Wall Street insiders and has a device to control how much different people trust each other. Using his ability to input insider information and control trust between people, he will control the purchases made by wealthy individuals. If the purchases made are ideal for the mad scientist, he can gain capital by controlling the market.
Information is planted at the top level to Wall Street insiders. They then relay this information to stock brokers who are their friends. The brokers then relay that information to their favorite wealthy clients who then make trades. The weight for each edge is the amount of trust that person has for the person above them. The more they trust a person, the more likely they are to either pass along information or make a trade based on the information.
As a mad scientist, you will need to adjust this social network in order to create optimal actions in the marketplace. You do this using your secret Trust 'o' Vac. With it you can increase or decrease each trust weight as you see fit. You then observe the trades that are made by the rich dudes. If the trades are not to your liking, then we consider this error. The more to your liking the trades are, the less error they contain. Ideally, you want to slowly adjust the network so that it gets closer and closer to what you want and contains less error. In general terms this is referred to as gradient descent.
As you place insider information, you observe the amount of error coming out of your network. If a person is making trades that are rather poor, you need to figure out where they are getting the information to do so. A strong trust (shown by a thick line) indicates where more error is coming from and where larger changes need to be made.
There are many ways in which we can adjust the trust weights, but we will use a very simple method here. Each time we place some insider information, we watch the trades that come from our rich dudes. If there is a large error coming from one rich dude, then they are getting bad information from someone they trust too much or are not getting good information from someone they should trust more. When the mad scientist sees this, he uses the Trust 'o' Vac 2000 to weaken a strong trust by a little and strengthen a weak trust by a little. Thus, we try to slowly cut off the source of bad information and increase the source of good information going to the rich dudes.
We next have to adjust the trust weights between the CEOs and the brokers. We do this by propagating error backwards: if a strong weight exists between a broker and a rich dude who is making bad purchases on a regular basis, then we can attribute that error to the broker. We can then make the rich dude trust this broker less and also adjust the weights of trust between the broker and the fat cats in a similar way.