Presentation on theme: "Introduction to Artificial Neural Networks"— Presentation transcript:
1 Introduction to Artificial Neural Networks Neural networks do not perform miracles. But if used sensibly they can produce some amazing results. Presented by: Ghayas Ur Rehman. Course Trainer: Dr. Tehseen Jilani. Department of Computer Science, University of Karachi
2 Back-propagation Network: It is the most widely used architecture and a very popular technique that is relatively easy to implement. It requires a large amount of training data for conditioning the network before using it to predict outcomes. A back-propagation network includes at least one hidden layer. The approach is considered a "feed-forward / back-propagation" approach. Limitations: NNs do not do well at tasks that people themselves do not perform well. They lack an explanation facility. Training time can be excessive.
4 BPNN in simple words: Back-propagation is an algorithm that extends the analysis that underpins the delta rule to neural nets with hidden nodes. To see the problem, imagine that Bob tells Alice a story, and then Alice tells Ted. Ted checks the facts and finds that the story is erroneous. Now Ted needs to find out how much of the error is due to Bob and how much to Alice. When output nodes take their inputs from hidden nodes, and the net finds that it is in error, its weight adjustments require an algorithm that will pick out how much the various nodes contributed to the overall error. The net needs to ask, "Who led me astray? By how much? And how do I fix this?" What's a net to do?
5 When not to use BPNN? A back-propagation neural network is only practical in certain situations. Following are some guidelines on when you should use another approach: Can you write down a flow chart or a formula that accurately describes the problem? If so, then stick with a traditional programming method. Is there a simple piece of hardware or software that already does what you want? If so, then the development time for a NN might not be worth it. Do you want the functionality to "evolve" in a direction that is not pre-defined?
6 When not to use BPNN? (Contd.) Do you have an easy way to generate a significant number of input/output examples of the desired behavior? If not, then you won't be able to train your NN to do anything. Is the problem very "discrete"? Can the correct answer be found in a look-up table of reasonable size? If so, a look-up table is much simpler and more accurate. Are precise numeric output values required? NNs are not good at giving precise numeric answers.
7 When to use BPNN? Conversely, here are some situations where a BP NN might be a good idea: A large amount of input/output data is available, but you're not sure how the inputs relate to the outputs. The problem appears to have overwhelming complexity, but there is clearly a solution. It is easy to create a number of examples of the correct behavior. The solution to the problem may change over time, within the bounds of the given input and output parameters (i.e., today 2+2=4, but in the future we may find that 2+2=3.8). Outputs can be "fuzzy", or non-numeric.
8 Back-propagation algorithm: The most popular & successful method. Steps to be followed for training: Select the next training pair from the training set (input vector and desired output). Present the input vector to the network. The network calculates its output. The network calculates the error between the network output and the desired output. The network back-propagates the error. Adjust the weights of the network in a way that minimizes the error. Repeat the above steps for each vector in the training set until the error is acceptable.
9 Back-propagation algorithm
Step 1: Feed the inputs forward through the network:
a^0 = p
a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), where m = 0, 1, ..., M − 1
a = a^M
Step 2: Back-propagate the sensitivities (errors):
at the output layer: s^M = −2 F'^M(n^M)(t − a)
at the hidden layers: s^m = F'^m(n^m)(W^{m+1})^T s^{m+1}, where m = M − 1, ..., 2, 1
Step 3: Finally, weights and biases are updated by the following formulas:
W^m(k+1) = W^m(k) − α s^m (a^{m−1})^T
b^m(k+1) = b^m(k) − α s^m
(Details on constructing the algorithm and other related issues can be found in the textbook Neural Network Design.)
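The three steps above can be sketched in code. This is a minimal illustration for a two-layer network (one hidden layer) with tanh hidden units and a linear output unit; the architecture, the learning rate `alpha`, the name `train_step`, and the toy data are all illustrative assumptions, not taken from the slides or the textbook.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))  # hidden layer: 2 inputs -> 3 units
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))  # output layer: 3 units -> 1 output

def train_step(p, t, alpha=0.01):
    global W1, b1, W2, b2
    # Step 1: feed the input forward through the network
    a0 = p
    a1 = np.tanh(W1 @ a0 + b1)
    a2 = W2 @ a1 + b2                     # a = a^M (linear output layer)
    # Step 2: back-propagate the sensitivities
    s2 = -2 * (t - a2)                    # output layer (linear f, so f' = 1)
    s1 = (1 - a1 ** 2) * (W2.T @ s2)      # hidden layer (tanh' = 1 - tanh^2)
    # Step 3: update weights and biases
    W2 -= alpha * s2 @ a1.T
    b2 -= alpha * s2
    W1 -= alpha * s1 @ a0.T
    b1 -= alpha * s1
    return float(((t - a2) ** 2).sum())   # squared error before this update

# Repeat until the error is acceptable, as the previous slide says
p, t = np.array([[1.0], [-1.0]]), np.array([[0.5]])
errors = [train_step(p, t) for _ in range(1000)]
```

Repeating the step drives the squared error down toward zero on this toy problem.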
10 Network Training
Supervised Learning: The network is presented with the input and the desired output. Uses a set of inputs for which the desired output results / classes are known. The difference between the desired and actual output is used to calculate adjustments to the weights of the NN structure.
Unsupervised Learning: The network is not shown the desired output. The concept is similar to clustering. It tries to create classifications in the outcome.
11 Unsupervised Learning: Only input stimuli (parameters) are presented to the network. The network is self-organizing, that is, it organizes itself internally so that each hidden processing element and its weights respond appropriately to a different set of input stimuli. No knowledge is supplied about the classification of outputs. However, the number of categories into which the network classifies the inputs can be controlled by varying certain parameters in the model. In any case, a human expert must examine the final classifications to assign meaning and judge the usefulness of the results.
Reinforcement Learning: Falls in between supervised & unsupervised learning. The network gets feedback from the environment.
12 Learning (Training) Algorithms: The training process requires a set of properly selected data in the form of network inputs and target outputs. During training, the weights and biases are iteratively adjusted to minimize the network performance function (error). The default performance function is mean square error. Input data should be independent.
Back-propagation learning algorithm: There are many variations. The commonly used one is the gradient descent algorithm:
x_{k+1} = x_k − α_k g_k
where x_k is a vector of current weights and biases, g_k is the current gradient, and α_k is the chosen learning rate.
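The update x_{k+1} = x_k − α_k g_k can be shown on a one-dimensional example. The error surface E(x) = (x − 3)^2 and the fixed learning rate here are illustrative choices, not from the slides.

```python
# Gradient descent on the illustrative error surface E(x) = (x - 3)^2.

def gradient(x):
    return 2 * (x - 3.0)     # g_k = dE/dx

x, alpha = 0.0, 0.1          # initial weight and a fixed learning rate
for _ in range(100):
    x = x - alpha * gradient(x)
# x converges toward the minimum at x = 3
```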
13 Back Propagation Learning Algorithm: It is the most commonly used generalization of the delta rule. This procedure involves two phases.
Forward phase: When the input is presented, it propagates forward through the network to compute output values for each processing element (PE). All the current outputs are compared with the desired outputs and the error is computed.
Backward phase: The calculated error is now fed backward and the weights are adjusted.
After completing both phases, a new input is presented for further training. This technique is slow, can cause instability, and has a tendency to get stuck in local minima, but it is still very popular.
14 Gradient Descent Algorithm: The idea is to calculate an error each time the network is presented with a training vector (given that we have supervised learning, where there is a target vector) and to perform a gradient descent on the error, considered as a function of the weights. There will be a gradient or slope for each weight. Thus, we find the weights which give the minimal error. Typically the error criterion is defined by the square of the difference between the pattern output and the target output (least squared error). The total error E is then just the sum of the squared pattern errors.
15 Error function (LMS): E = ½ Σ_k (t_k − o_k)², where t_k is the target output and o_k is the network output. (Note: LMS = least mean square.)
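The LMS error can be computed directly; the target and network output values below are made up for illustration.

```python
# Squared-error sum over output nodes, as on the slide: E = 0.5 * sum (t_k - o_k)^2

targets = [1.0, 0.0, 1.0]   # t_k: target outputs (made-up values)
outputs = [0.8, 0.1, 0.9]   # o_k: network outputs (made-up values)

error = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))
# error = 0.5 * (0.04 + 0.01 + 0.01) = 0.03
```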
16 This method of weight adjustment is also known as the steepest gradient descent technique or the Widrow-Hoff rule, and it is the most common type. It is also known as the Delta rule.
17 Network Learning Rules
Hebbian Rule: The first and best-known learning rule was introduced by Donald Hebb. The basic rule is: if a neuron receives an input from another neuron, and if both are highly active (mathematically, have the same sign), the weight between the neurons should be strengthened:
Δw_ij = η x_i(t) y_j(t)
where x_i(t) and y_j(t) are the outputs at nodes i and j, and w_ij is the weight between nodes i and j.
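The Hebbian update can be sketched in a few lines: the weight is strengthened when the two node outputs have the same sign. The learning rate `eta` and the sample activations are illustrative.

```python
# Hebbian weight update: delta_w = eta * x_i * y_j

def hebbian_update(w_ij, x_i, y_j, eta=0.1):
    return w_ij + eta * x_i * y_j

w = 0.0
w = hebbian_update(w, 1.0, 1.0)    # both active, same sign: strengthened
w = hebbian_update(w, -1.0, -1.0)  # same sign: strengthened again
w = hebbian_update(w, 1.0, -1.0)   # opposite signs: weakened
```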
18 Backpropagation: The Math
[Figure: a general multi-layered neural network, with an input layer (x_0 ... x_9), a hidden layer, and an output layer; weights w_{i,j} connect nodes between layers.]
19 Backpropagation: The Math Calculation of hidden layer activation values
20 Backpropagation: The Math Calculation of output layer activation values
21 Backpropagation: The Math Calculation of error: d_k = f(D_k) − f(O_k)
22 Backpropagation: The Math Gradient Descent objective function; Gradient Descent termination condition
23 Backpropagation: The Math Output layer weight recalculation (learning rate, e.g. 0.25; error at k)
24 Backpropagation: The Math Hidden Layer weight recalculation
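The math on these slides (hidden and output activation values, the error d_k = f(D_k) − f(O_k), and the two weight recalculations) can be sketched roughly as follows. The sigmoid choice of f, the array shapes, and the random data are illustrative assumptions; the 0.25 learning rate is the example value from the slide.

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))    # a common choice of activation f

rng = np.random.default_rng(1)
x = rng.random(10)                 # input layer activations x_0 .. x_9
W_ih = rng.normal(size=(5, 10))    # input -> hidden weights
W_ho = rng.normal(size=(3, 5))     # hidden -> output weights
eta = 0.25                         # learning rate from the slide

h = f(W_ih @ x)                    # hidden layer activation values
o = f(W_ho @ h)                    # output layer activation values

D = rng.random(3)                  # desired raw outputs (made up)
d = f(D) - o                       # error d_k at each output node

d_h = (W_ho.T @ d) * h * (1 - h)   # error propagated back to the hidden layer

W_ho += eta * np.outer(d, h)       # output layer weight recalculation
W_ih += eta * np.outer(d_h, x)     # hidden layer weight recalculation
```

Note that the hidden-layer error `d_h` is computed with the pre-update output weights before either layer is adjusted.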
25 Backpropagation Using Gradient Descent
Advantages: Relatively simple implementation. Standard method and generally works well.
Disadvantages: Slow and inefficient. Can get stuck in local minima, resulting in sub-optimal solutions.
27 Alternatives To Gradient Descent: Simulated Annealing
Advantages: Can guarantee an optimal solution (global minimum).
Disadvantages: May be slower than gradient descent. Much more complicated implementation.
28 Alternatives To Gradient Descent: Genetic Algorithms / Evolutionary Strategies
Advantages: Faster than simulated annealing. Less likely to get stuck in local minima.
Disadvantages: Slower than gradient descent. Memory intensive for large nets.
29 Alternatives To Gradient Descent: Simplex Algorithm
Advantages: Similar to gradient descent but faster. Easy to implement.
Disadvantages: Does not guarantee a global minimum.
30 Enhancements To Gradient Descent: Momentum. Adds a percentage of the last movement to the current movement.
31 Enhancements To Gradient Descent: Momentum
Useful to get over small bumps in the error function; often finds a minimum in fewer steps.
Δw(t) = −η·δ·y + α·Δw(t−1)
where Δw is the change in weight, η is the learning rate, δ is the error, y differs depending on which layer we are calculating, and α is the momentum parameter.
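The momentum update above can be sketched on a one-weight quadratic error surface E(w) = w², an illustrative choice not from the slides; `eta` and `alpha` correspond to the slide's learning rate and momentum parameter.

```python
# Momentum update: dw(t) = -eta * grad + alpha * dw(t-1)

def momentum_step(w, dw_prev, eta=0.1, alpha=0.9):
    grad = 2 * w                        # dE/dw, playing the role of delta * y
    dw = -eta * grad + alpha * dw_prev  # current step plus a fraction of the last
    return w + dw, dw

w, dw = 5.0, 0.0
for _ in range(300):
    w, dw = momentum_step(w, dw)
# the momentum term carries the weight down toward the minimum at w = 0
```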
32 Enhancements To Gradient Descent: Adaptive Backpropagation Algorithm
It assigns each weight its own learning rate. That learning rate is determined by the sign of the gradient of the error function from the last iteration: if the signs are equal, the slope is more likely shallow, so the learning rate is increased; the signs are more likely to differ on a steep slope, so the learning rate is decreased. This speeds up the advancement on gradual slopes.
33 Enhancements To Gradient Descent: Adaptive Backpropagation
Possible problem: since we minimize the error for each weight separately, the overall error may increase.
Solution: calculate the total output error after each adaptation, and if it is greater than the previous error, reject that adaptation and calculate new learning rates.
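The per-weight adaptive rule can be sketched for a single weight: its learning rate is increased when the gradient sign repeats and decreased when it flips. The 1.2 and 0.5 factors, and the quadratic error surface, are illustrative choices, not from the slides.

```python
# Adaptive per-weight learning rate driven by the sign of the last gradient.

def adaptive_step(w, lr, prev_grad, grad, up=1.2, down=0.5):
    if prev_grad * grad > 0:
        lr *= up          # same sign -> likely shallow slope -> speed up
    elif prev_grad * grad < 0:
        lr *= down        # sign flip -> likely steep slope -> slow down
    return w - lr * grad, lr

w, lr, prev = 4.0, 0.05, 0.0
for _ in range(100):
    g = 2 * w             # gradient of an illustrative E(w) = w^2
    w, lr = adaptive_step(w, lr, prev, g)
    prev = g
```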
34 Enhancements To Gradient Descent: SuperSAB (Super Self-Adapting Backpropagation)
Combines the momentum and adaptive methods. Uses the adaptive method and momentum as long as the sign of the gradient does not change; this is an additive effect of both methods, resulting in a faster traversal of gradual slopes. When the sign of the gradient does change, the momentum will cancel the drastic drop in learning rate. This allows the function to roll up the other side of the minimum, possibly escaping local minima.
35 Enhancements To Gradient Descent: SuperSAB
Experiments show that SuperSAB converges faster than gradient descent. Overall this algorithm is less sensitive (and so is less likely to get caught in local minima).
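A SuperSAB-style update can be roughly sketched by combining the two previous ideas: a per-weight adaptive learning rate plus a momentum term that persists across sign changes. The published rule differs in details; all constants here, and the quadratic error surface, are illustrative assumptions.

```python
# Rough SuperSAB-style combination of adaptive learning rate and momentum.

def supersab_step(w, lr, dw_prev, prev_grad, grad,
                  up=1.05, down=0.5, alpha=0.9):
    if prev_grad * grad > 0:
        lr *= up          # same sign: adaptive growth and momentum add up
    elif prev_grad * grad < 0:
        lr *= down        # sign flip: rate drops, but momentum softens the stop
    dw = -lr * grad + alpha * dw_prev
    return w + dw, lr, dw

w, lr, dw, prev = 5.0, 0.01, 0.0, 0.0
for _ in range(300):
    g = 2 * w             # gradient of an illustrative E(w) = w^2
    w, lr, dw = supersab_step(w, lr, dw, prev, g)
    prev = g
```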
36 Other Ways To Minimize Error: Varying training data
Cycle through input classes. Randomly select from input classes. Add noise to training data: randomly change the value of an input node (with low probability). Retrain with expected inputs after initial training (e.g., speech recognition).
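The noise-injection idea above can be sketched in a few lines: with low probability, perturb the value of each input node. The probability, noise scale, and sample vector are illustrative.

```python
import random

def add_noise(vector, prob=0.05, scale=0.1, rng=None):
    """With probability `prob`, perturb each input value by uniform noise."""
    rng = rng or random.Random()
    return [x + rng.uniform(-scale, scale) if rng.random() < prob else x
            for x in vector]

clean = [0.2, 0.8, 0.5, 0.1]
noisy = add_noise(clean, prob=1.0, rng=random.Random(42))  # perturb every node
```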
37 Other Ways To Minimize Error: Adding and removing neurons from layers
Adding neurons speeds up learning but may cause a loss in generalization. Removing neurons has the opposite effect.
38 Applications of Backpropagation: Image analysis. Text-in-image recognition. Finding oil fields. Source code recognition. Reproducing similar sounds. Robotics.
40 Case study: A mad scientist wants to make billions of dollars by controlling the stock market. He will do this by controlling the stock purchases of several wealthy people. The scientist controls information that can be given by Wall Street insiders and has a device to control how much different people can trust each other. Using his ability to input insider information and control trust between people, he will control the purchases made by wealthy individuals. If purchases can be made that are ideal to the mad scientist, he can gain capital by controlling the market.
41 Information is planted at the top level with Wall Street insiders. They then relay this information to stock brokers who are their friends. The brokers then relay that information to their favorite wealthy clients, who then make trades. The weight for each edge is the amount of trust that person has for the person above them. The more they trust a person, the more likely they are to either pass along information or make a trade based on it.
42 Case study (Contd…): As a mad scientist, you will need to adjust this social network in order to create optimal actions in the marketplace. You do this using your secret Trust 'o' Vac. With it you can increase or decrease each trust weight as you see fit. You then observe the trades that are made by the rich dudes. If the trades are not to your liking, then we consider this error. The more to your liking the trades are, the less error they contain. Ideally, you want to slowly adjust the network so that it gets closer and closer to what you want and contains less error. In general terms, this is referred to as gradient descent.
43 As you place insider information, you observe the amount of error coming out of your network. If a person is making trades that are rather poor, you need to figure out where they are getting the information to do so. A strong trust (shown by a thick line) indicates where more of the error is coming from and where larger changes need to be made.
44 Case study (Contd…): There are many ways in which we can adjust the trust weights, but we will use a very simple method here. Each time we place some insider information, we watch the trades that come from our rich dudes. If there is a large error coming from one rich dude, then they are getting bad information from someone they trust too much, or are not getting good information from someone they should trust more. When the mad scientist sees this, he uses the Trust 'o' Vac 2000 to weaken a strong trust by a little and strengthen a weak trust by a little. Thus, we try to slowly cut off the source of bad information and increase the source of good information going to the rich dudes.
45 We next have to adjust the trust weights between the CEOs and the brokers. We do this by propagating error backwards: if a strong weight exists between a broker and a rich dude who is making bad purchases on a regular basis, then we can attribute that error to the broker. We can then make the rich dude trust this broker less, and also adjust the weights of trust between the broker and the fat cats in a similar way.
46 The End. Thanks for your patience.