
1 Introduction to Neural Networks

3 Biological neural activity
– Each neuron has a body, an axon, and many dendrites.
– A neuron can be in one of two states: firing or resting. It fires when the total incoming stimulus exceeds its threshold.
– Synapse: a thin gap between the axon of one neuron and a dendrite of another, where signals are exchanged. Each synapse has a strength (efficiency) that determines how much of the signal is passed on.
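
A minimal MATLAB sketch of this threshold behavior as an artificial unit (the inputs, synaptic weights, and threshold below are hypothetical illustration values, not taken from the slides):

x = [1 0 1];               % incoming signals from other neurons (hypothetical)
w = [0.5 0.2 0.4];         % synaptic strengths (hypothetical)
theta = 0.8;               % firing threshold (hypothetical)
stimulus = dot(w, x);      % total incoming stimulus
firing = stimulus > theta; % true = firing, false = resting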

8 Neuron Model

10 Transfer Function

11 Sigmoidal Neurons The output (spike frequency) of every neuron is simulated as a value between 0 (no spikes) and 1 (maximum frequency).
[Figure: output o_i(t) as a function of the net input net_i(t); sigmoid curves shown for steepness parameters τ = 1 and τ = 0.1.]
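
A small MATLAB sketch of this model, assuming the common logistic form o_i(t) = 1 / (1 + exp(-net_i(t) / τ)); the exact formula from the original slide is not in the transcript, so this form and the plotted range are assumptions:

net = -5:0.1:5;                        % range of net input values
o_smooth = 1 ./ (1 + exp(-net / 1));   % tau = 1: gradual transition
o_steep  = 1 ./ (1 + exp(-net / 0.1)); % tau = 0.1: nearly a step function
plot(net, o_smooth, net, o_steep);
xlabel('net_i(t)'); ylabel('o_i(t)');
legend('\tau = 1', '\tau = 0.1');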

25 Supervised Learning in the BPN In supervised learning, we train an ANN with a set of vector pairs, so-called exemplars. Each pair (x, y) consists of an input vector x and a corresponding output vector y. Whenever the network receives input x, we would like it to provide output y. The exemplars thus describe the function that we want to “teach” our network. Besides learning the exemplars, we would like our network to generalize, that is, give plausible output for inputs that the network has not been trained with.

26 Supervised Learning in the BPN Before the learning process starts, all weights (synapses) in the network are initialized with pseudorandom numbers. We also have to provide a set of training patterns (exemplars). They can be described as a set of ordered vector pairs {(x1, y1), (x2, y2), …, (xP, yP)}. Then we can start the backpropagation learning algorithm. This algorithm iteratively minimizes the network’s error by finding the gradient of the error surface in weight space and adjusting the weights in the opposite direction (gradient-descent technique).

27 Supervised Learning in the BPN Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x).
[Figure: curve of f(x) with the slope f’(x0) marked at a point x0.]
Update rule: x1 = x0 − η · f’(x0), where η is the learning rate. Repeat this iteratively until, for some xi, f’(xi) is sufficiently close to 0.
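
A minimal MATLAB sketch of this update rule, using a made-up error function f(x) = (x − 2)^2 and learning rate η = 0.1 (both are illustration choices, not from the slide):

f_prime = @(x) 2 * (x - 2);    % derivative of the example error function f(x) = (x-2)^2
eta = 0.1;                     % learning rate eta
x = 0;                         % starting point x0
while abs(f_prime(x)) > 1e-6   % repeat until f'(x_i) is sufficiently close to 0
    x = x - eta * f_prime(x);  % x_{i+1} = x_i - eta * f'(x_i)
end
% x now approximates the minimum at x = 2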

28 Supervised Learning in the BPN Gradients of two-dimensional functions: The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient is always pointing in the direction of the steepest increase of the function. In order to find the function’s minimum, we should always move against the gradient.

29 Supervised Learning in the BPN In the BPN, learning is performed as follows:
1. Randomly select a vector pair (xp, yp) from the training set and call it (x, y).
2. Use x as input to the BPN and successively compute the outputs of all neurons in the network (bottom-up) until you get the network output o.
3. Compute the error of the network, i.e., the difference between the desired output y and the actual output o.
4. Apply the backpropagation learning rule to update the weights in the network so that its output o for input x is closer to the desired output y.

30 Supervised Learning in the BPN Repeat steps 1 to 4 for all vector pairs in the training set; this is called a training epoch. Run as many epochs as required to reduce the network error E below a threshold that you set beforehand.
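
A compact MATLAB sketch of steps 1 to 4 wrapped in the epoch loop, for one sigmoidal hidden layer with bias units; the data, layer size, learning rate, and stopping threshold are hypothetical illustration values:

X = [0 0 1 1; 0 1 0 1];            % input vectors x_p as columns (hypothetical)
Y = [0 1 1 0];                     % desired outputs y_p (hypothetical)
nHid = 3; eta = 0.5;               % hidden-layer size and learning rate (hypothetical)
W1 = rand(nHid, 3) - 0.5;          % pseudorandom initial weights, input+bias -> hidden
W2 = rand(1, nHid + 1) - 0.5;      % pseudorandom initial weights, hidden+bias -> output
sig = @(n) 1 ./ (1 + exp(-n));     % sigmoidal transfer function
for epoch = 1:20000                % run epochs until the error is small enough
    E = 0;
    for p = randperm(size(X, 2))   % step 1: pick each training pair in random order
        x = [X(:, p); 1];          % input plus constant bias unit
        h = [sig(W1 * x); 1];      % step 2: hidden outputs (bottom-up)...
        o = sig(W2 * h);           % ...then the network output o
        err = Y(:, p) - o;         % step 3: difference between desired y and actual o
        E = E + sum(err .^ 2);
        d2 = err .* o .* (1 - o);  % step 4: backpropagate the error signal...
        d1 = (W2(1:nHid)' * d2) .* h(1:nHid) .* (1 - h(1:nHid));
        W2 = W2 + eta * d2 * h';   % ...and move the weights against the gradient
        W1 = W1 + eta * d1 * x';
    end
    if E < 0.01, break; end        % network error E fell below the preset threshold
end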

31 Supervised Learning in the BPN Now our BPN is ready to go! If we choose the type and number of neurons in our network appropriately, after training the network should show the following behavior: If we input any of the training vectors, the network should yield the expected output vector (with some margin of error). If we input a vector that the network has never “seen” before, it should be able to generalize and yield a plausible output vector based on its knowledge about similar input vectors.

37 For most situations, we recommend that you try the Levenberg-Marquardt algorithm first. If this algorithm requires too much memory, then try the BFGS algorithm or one of the conjugate gradient methods. The Rprop algorithm is also very fast and has relatively small memory requirements.
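
In MATLAB's (older) Neural Network Toolbox, the algorithm is chosen through the network's training function; a hedged sketch, assuming an input matrix p and target matrix t already exist:

% 'trainlm' = Levenberg-Marquardt, 'trainbfg' = BFGS quasi-Newton,
% 'traincgf' = conjugate gradient (Fletcher-Reeves), 'trainrp' = Rprop
net = newff(minmax(p), [10 1], {'tansig', 'purelin'}, 'trainlm');
net = train(net, p, t);    % training uses the selected algorithm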

38 One of the problems that occurs during neural network training is called overfitting. The error on the training set is driven to a very small value, but when new data is presented to the network the error is large. The network has memorized the training examples, but it has not learned to generalize to new situations.

39 One method for improving network generalization is to use a network that is just large enough to provide an adequate fit. The larger the network, the more complex the functions it can create; if we use a small enough network, it will not have enough power to overfit the data. It is also possible to improve generalization by modifying the performance function, adding a term that consists of the mean of the sum of squares of the network weights and biases:
Modified performance function: msereg = γ · mse + (1 − γ) · msw, where γ is the performance ratio and msw = (1/n) Σ_j w_j².
Using this performance function will cause the network to have smaller weights and biases, and this will force the network response to be smoother and less likely to overfit.
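
In the same (older) toolbox this regularized performance function is msereg; a hedged sketch, assuming a network net created as in the earlier example:

net.performFcn = 'msereg';      % gamma*mse + (1 - gamma)*msw
net.performParam.ratio = 0.5;   % performance ratio gamma (illustration value)
net = train(net, p, t);         % training now also penalizes large weights and biases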

40 Early Stopping Another method for improving generalization is called early stopping. In this technique the available data is divided into three subsets. The first subset is the training set which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set. The error on the validation set is monitored during the training process. The validation error will normally decrease during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the error on the validation set will typically begin to rise. When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned. The test set error is not used during the training, but it is used to compare different models. It is also useful to plot the test set error during the training process. If the error in the test set reaches a minimum at a significantly different iteration number than the validation set error, this may indicate a poor division of the data set.
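
A hedged sketch of early stopping with the older toolbox API, assuming the data have already been divided into training (p, t), validation (valP, valT), and test (tstP, tstT) subsets:

val.P = valP; val.T = valT;     % validation set, monitored during training
tst.P = tstP; tst.T = tstT;     % test set, used only to compare models
[net, tr] = train(net, p, t, [], [], val, tst);
% Training stops once the validation error has risen for a set number of
% iterations; tr records the training, validation, and test errors, so a
% plot can reveal a poor division of the data set:
plot(tr.epoch, tr.perf, tr.epoch, tr.vperf, tr.epoch, tr.tperf);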

41 Min and Max (PREMNMX, POSTMNMX, TRAMNMX) Before training, it is often useful to scale the inputs and targets so that they always fall within a specified range. The function premnmx can be used to scale inputs and targets so that they fall in the range [-1, 1]. If premnmx is used to preprocess the training set data, then whenever the trained network is used with new inputs, they should be preprocessed with the minimums and maximums that were computed for the training set. This can be accomplished with the routine tramnmx. In the following code, we simulate the network that was trained in the previous code and then convert the network output back into the original units.
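
The code referred to here is missing from the transcript; a hedged reconstruction of the usual premnmx/tramnmx/postmnmx workflow, where pnew stands for new, unscaled inputs:

[pn, minp, maxp, tn, mint, maxt] = premnmx(p, t); % scale training data to [-1, 1]
net = train(net, pn, tn);            % train on the normalized data
pnewn = tramnmx(pnew, minp, maxp);   % scale new inputs with the training set's min/max
anewn = sim(net, pnewn);             % simulate the trained network
anew = postmnmx(anewn, mint, maxt);  % convert outputs back into the original units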

