
1 Artificial Intelligence Techniques

2 Aims: Section aim – the fundamental theory and practical applications of artificial neural networks.

3 Aims: Session aim – an introduction to the biological background and implementation issues relevant to the development of practical systems.

4 Biological neuron (figure taken from http://hepunx.rl.ac.uk/~candreop/minos/NeuralNets/neuralNetIntro.html)

5 The human brain consists of approximately 10 billion neurons interconnected with about 10 trillion synapses.

6 A neuron: a specialised cell for receiving, processing and transmitting information.

7 Electrical charges from neighbouring neurons reach the neuron and are summed.

8 The summed signal is passed to the soma, which processes this information.

9  A signal threshold is applied.

10 If the summed signal > threshold, the neuron fires

11 A constant output signal is transmitted to other neurons.

12 The strength and polarity of the output depend on the features of each synapse.

13 Varying these features adapts the network.

14 Varying the input contributions varies the system!

15 McCulloch-Pitts model (figure): inputs X1, X2, X3 with weights W1, W2, W3, threshold T and output Y. Y=1 if W1X1+W2X2+W3X3 ≥ T; Y=0 if W1X1+W2X2+W3X3 < T

16 McCulloch-Pitts model Y=1 if W1X1+W2X2+W3X3 ≥ T; Y=0 if W1X1+W2X2+W3X3 < T

17 Introduce the bias Take the threshold over to the other side of the equation and replace it with a weight W0 which equals -T, and include a constant input X0 which equals 1.

18 Introduce the bias Y=1 if W1X1+W2X2+W3X3 − T ≥ 0; Y=0 if W1X1+W2X2+W3X3 − T < 0

19 Introduce the bias  Let's just use weights – replace the threshold T with a 'fake' input X0 that is always 1, weighted by W0 = −T.

20 Introduce the bias Y=1 if W1X1+W2X2+W3X3+W0X0 ≥ 0; Y=0 if W1X1+W2X2+W3X3+W0X0 < 0

21 Logic functions - OR (figure): inputs X1 and X2 each with weight 1, plus the bias input X0 (a bias weight of −1 gives the required behaviour); Y = X1 OR X2

22 Logic functions - AND (figure): inputs X1 and X2 each with weight 1, bias input X0 with weight −2; Y = X1 AND X2

23 Logic functions - NOT (figure): input X1 (a weight of −1 gives the required behaviour) and bias input X0 with weight 0; Y = NOT X1

24 The weighted sum  The weighted sum, Σ WiXi, is called the "net" sum.  Net = Σ WiXi  y=1 if net ≥ 0  y=0 if net < 0
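The following minimal sketch (Python, not from the original slides) implements this hard-limited neuron in its bias form and checks it against the logic-function slides above; the AND weights (−2, 1, 1) are taken from slide 22, while the OR and NOT weights are standard choices assumed here for illustration.

```python
# Minimal sketch of a McCulloch-Pitts neuron in its bias form (slides 19-24).
# The AND weights (-2, 1, 1) come from slide 22; the OR (-1, 1, 1) and
# NOT (0, -1) weights are standard choices assumed here for illustration.

def neuron(weights, inputs):
    """Hard-limited neuron: y = 1 if net = sum(Wi*Xi) >= 0, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= 0 else 0

AND_W = [-2, 1, 1]   # W0 (bias), W1, W2 - from slide 22
OR_W  = [-1, 1, 1]   # assumed standard values
NOT_W = [0, -1]      # assumed standard values

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", neuron(AND_W, [1, x1, x2]),
              "OR:",  neuron(OR_W,  [1, x1, x2]))
print("NOT 0:", neuron(NOT_W, [1, 0]), " NOT 1:", neuron(NOT_W, [1, 1]))
```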

25 Hard-limiter The threshold function is known as a hard-limiter (figure: y plotted against net). When net is zero or positive the output is 1; when net is negative the output is 0.

26 Example (figure): the weights are set from an original image; when the original image is presented, Net = 14.

27 Example with bias With a bias of -14, the weighted sum, net, is 0. Any pattern other than the original will produce a sum that is less than 0. If the bias is changed to -13, then patterns with 1 bit different from the original will give a sum that is 0 or more, so an output of 1.
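As an illustration of slides 26–28, here is a hedged sketch: the actual pixel weights in the slide figure are not recoverable from this transcript, so the code assumes a 14-pixel "on" template with weight +1 on "on" pixels and −1 elsewhere, which reproduces the behaviour described (net = 14 for the original image and at most 13 for anything else).

```python
# Hedged sketch of the image-matching neuron on slides 26-28. The slide's
# actual pixel weights are not recoverable from this transcript; here we
# assume a 20-pixel image with 14 'on' pixels, weight +1 on 'on' pixels and
# -1 elsewhere, which reproduces the behaviour described (net = 14 for the
# original image, 13 for any one-pixel variation).

template = [1] * 14 + [0] * 6            # assumed 20-pixel image, 14 pixels on
weights  = [1 if p else -1 for p in template]

def net(pattern, bias):
    return bias + sum(w * x for w, x in zip(weights, pattern))

def output(pattern, bias):
    return 1 if net(pattern, bias) >= 0 else 0

print(net(template, bias=0))             # 14, as on slide 26
print(output(template, bias=-14))        # 1: only the exact original fires

flipped = template[:]
flipped[3] ^= 1                          # flip one pixel
print(output(flipped, bias=-14))         # 0 with bias -14
print(output(flipped, bias=-13))         # 1 with bias -13: generalisation
```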

28 Generalisation  The neuron can respond to the original image and to small variations  The neuron is said to have generalised because it recognises patterns that it hasn’t seen before

29 Pattern space  To understand what a neuron is doing, the concept of pattern space has to be introduced  A pattern space is a way of visualizing the problem  It uses the input values as co-ordinates in a space

30 Pattern space in 2 dimensions (figure): the AND function plotted with X1 and X2 as co-ordinates. Truth table (X1, X2 → Y): 0 0 → 0; 0 1 → 0; 1 0 → 0; 1 1 → 1.

31 Linear separability The AND function shown earlier had weights of -2, 1 and 1. Substituting into the equation for net gives: net = W0X0+W1X1+W2X2 = -2X0+X1+X2 Also, since the bias, X0, always equals 1, the equation becomes: net = -2+X1+X2

32 Linear separability The change in the output from 0 to 1 occurs when: net = -2+X1+X2 = 0 This is the equation for a straight line. X2 = -X1 + 2 Which has a slope of -1 and intercepts the X2 axis at 2. This line is known as a decision surface.
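A quick numerical check of the algebra above, evaluating net = −2 + X1 + X2 at the four corners of the pattern space:

```python
# Check of slides 31-33: net = -2 + X1 + X2 changes sign exactly on the
# decision line X2 = -X1 + 2, so only the point (1, 1) gives an output of 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        net = -2 + x1 + x2
        print((x1, x2), "net =", net, "output =", 1 if net >= 0 else 0)
```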

33 Linear separability (figure): the AND function in pattern space (truth table as on slide 30) with the decision line X2 = −X1 + 2, which crosses both axes at 2.

34 Linear separability  When a neuron learns it is positioning a line so that all points on or above the line give an output of 1 and all points below the line give an output of 0  When there are more than 2 inputs, the pattern space is multi-dimensional, and is divided by a multi-dimensional surface (or hyperplane) rather than a line

35 Are all problems linearly separable?  No  For example, the XOR function is non-linearly separable  Non-linearly separable functions cannot be implemented on a single neuron

36 Exclusive-OR (XOR) (figure): truth table (X1, X2 → Y): 0 0 → 0; 0 1 → 1; 1 0 → 1; 1 1 → 0. No single straight line can separate the 1s from the 0s.
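The claim that XOR cannot be implemented on a single neuron can be illustrated (not proved) with a small grid search over candidate weights; the sketch below finds weights for AND but none for XOR over the grid searched.

```python
# Illustrative check of slide 35: a grid search over single-neuron weights
# finds a weight set for AND but none for XOR. (A search is not a proof,
# but it matches the linear-separability argument above.)
from itertools import product

def neuron(w, x1, x2):
    w0, w1, w2 = w
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

def implements(w, truth):
    return all(neuron(w, x1, x2) == y for (x1, x2), y in truth.items())

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [i / 2 for i in range(-8, 9)]     # weights from -4 to 4 in steps of 0.5
for name, truth in (("AND", AND), ("XOR", XOR)):
    found = next((w for w in product(grid, repeat=3) if implements(w, truth)), None)
    print(name, "->", found)             # AND finds a weight set, XOR prints None
```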

37 Learning  A single neuron learns by adjusting the weights  The process is known as the delta rule  Weights are adjusted in order to minimise the error between the actual output of the neuron and the desired output  Training is supervised, which means that the desired output is known

38 Delta rule The equation for the delta rule is: ΔWi = ηXiδ = ηXi(d-y) where d is the desired output and y is the actual output. The Greek “eta”, η, is a constant called the learning coefficient and is usually less than 1. ΔWi means the change to the weight, Wi.

39 Delta rule  The change to a weight is proportional to Xi and to d-y.  If the desired output is bigger than the actual output then d - y is positive  If the desired output is smaller than the actual output then d - y is negative  If the actual output equals the desired output the change is zero
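A minimal sketch of one delta-rule update as described above (the function name is illustrative, not from the slides):

```python
# Minimal sketch of the delta rule on slides 38-39: each weight changes by
# eta * Xi * (d - y), so weights only change when the hard-limited output is wrong.
def delta_rule_step(weights, inputs, desired, eta=0.1):
    net = sum(w * x for w, x in zip(weights, inputs))
    y = 1 if net >= 0 else 0                      # hard-limiter output
    return [w + eta * x * (desired - y) for w, x in zip(weights, inputs)]
```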

40 Changes to the weight

41 Example  Assume that the weights are initially random  The desired function is the AND function  The inputs are shown one pattern at a time and the weights adjusted

42 The AND function

43 Example Start with random weights of 0.5, -1, 1.5. When shown the input pattern 1 0 0 the weighted sum is: net = 0.5 x 1 + (-1) x 0 + 1.5 x 0 = 0.5. This goes through the hard-limiter to give an output of 1. The desired output is 0. So the changes to the weights are: W0 negative, W1 zero, W2 zero.

44 Example New value of weights (with η equal to 0.1) of 0.4, -1, 1.5. When shown the input pattern 1 0 1 the weighted sum is: net = 1 x 0.4 + (-1) x 0 + 1.5 x 1 = 1.9. This goes through the hard-limiter to give an output of 1. The desired output is 0. So the changes to the weights are: W0 negative, W1 zero, W2 negative.

45 Example New value of weights of 0.3, -1, 1.4. When shown the input pattern 1 1 0 the weighted sum is: net = 1 x 0.3 + (-1) x 1 + 1.4 x 0 = -0.7. This goes through the hard-limiter to give an output of 0. The desired output is 0. So the changes to the weights are: W0 zero, W1 zero, W2 zero.

46 Example New value of weights of 0.3, -1, 1.4. When shown the input pattern 1 1 1 the weighted sum is: net = 1 x 0.3 + (-1) x 1 + 1.4 x 1 = 0.7. This goes through the hard-limiter to give an output of 1. The desired output is 1. So the changes to the weights are: W0 zero, W1 zero, W2 zero.
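The whole worked example on slides 43–46 can be reproduced with a short sketch; the starting weights, learning coefficient and pattern order below are exactly those given in the slides.

```python
# Sketch reproducing the worked example on slides 43-46: delta-rule training
# of the AND function starting from weights 0.5, -1, 1.5 with eta = 0.1.
def train_step(weights, inputs, desired, eta):
    net = sum(w * x for w, x in zip(weights, inputs))
    y = 1 if net >= 0 else 0
    return [w + eta * x * (desired - y) for w, x in zip(weights, inputs)], net, y

weights = [0.5, -1.0, 1.5]                     # W0 (bias), W1, W2
patterns = [([1, 0, 0], 0), ([1, 0, 1], 0),    # X0 is the constant bias input
            ([1, 1, 0], 0), ([1, 1, 1], 1)]    # desired outputs for AND

for inputs, desired in patterns:
    weights, net, y = train_step(weights, inputs, desired, eta=0.1)
    print(inputs, "net =", round(net, 2), "y =", y,
          "new weights =", [round(w, 2) for w in weights])
```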

47 Example - with η = 0.5

48 Example

54-60 What happened in pattern space (animation over slides 54–60, figure only): the decision line in the X1–X2 pattern space is repositioned step by step as the weights are adjusted.

61 Conclusions  A single neuron can be trained to implement any linearly separable function  Training is achieved using the delta rule which adjusts the weights to reduce the error  Training stops when there is no error  Training is supervised

62 Conclusions  To understand what a neuron is doing, it helps to picture what's going on in pattern space  A linearly separable function can divide the pattern space into two areas using a hyperplane  If a function is not linearly separable, networks of neurons are needed

63 Kohonen network  All neurons are connected to the inputs but not to each other  Often uses an MLP as an output layer  Neurons are self-organising  Trained using "winner-takes-all"

64 What can they do?  Perform tasks that conventional software cannot do  For example, reading text, understanding speech, recognising faces

65 Neural network approach  Set up examples of numerals  Train a network  Done, in a matter of seconds

66 Learning and generalising  Neural networks can do this easily because they have the ability to learn and to generalise from examples

67 Learning and generalising  Learning is achieved by adjusting the weights  Generalisation is achieved because similar patterns produce the same output

68 Summary  Neural networks have a long history but are now a major part of computer systems

69 Summary  They can perform tasks (not perfectly) that conventional software finds difficult

70  Neural networks can  Classify  Learn and generalise.

71 Multilayer Perceptrons 1

72 Overview  Recap of neural network theory  The multi-layered perceptron  Back-propagation  Introduction to training  Uses

73 Recap

74 Linear separability  When a neuron learns it is positioning a line so that all points on or above the line give an output of 1 and all points below the line give an output of 0  When there are more than 2 inputs, the pattern space is multi-dimensional, and is divided by a multi-dimensional surface (or hyperplane) rather than a line

75 Pattern space - linearly separable (figure: two classes in the X1–X2 plane separated by a single straight line)

76 Non-linearly separable problems  If a problem is not linearly separable, then the pattern space cannot be divided into the two classes by a single straight line or hyperplane  A network of neurons is needed

77 Pattern space - non linearly separable (figure: classes in the X1–X2 plane that require a curved decision surface)

78 The multi-layered perceptron (MLP)

79 The MLP (figure): input layer – hidden layer – output layer

80 Complex decision surface  The MLP has the ability to emulate any function using one hidden layer with a sigmoid function, and a linear output layer  A 3-layered network can therefore produce any complex decision surface  However, the number of neurons in the hidden layer cannot be calculated

81 Network architecture  All neurons in one layer are connected to all neurons in the next layer  The network is a feedforward network, so all data flows from the input to the output  The architecture of the network shown is described as 3:4:2  All neurons in the hidden and output layers have a bias connection
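A minimal sketch of such a fully connected feedforward network (here a 3:4:2 example with randomly initialised weights; sigmoid outputs are assumed for both layers, although the slides note the output layer could also be linear):

```python
# Minimal sketch of the 3:4:2 feedforward architecture on slides 79-84:
# every neuron in one layer connects to every neuron in the next, and the
# hidden and output neurons each have a bias. Sigmoid outputs are assumed
# for both layers here.
import math, random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer(inputs, weights):
    # weights: one row per neuron; the first entry is the bias weight (input = 1)
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]

random.seed(0)
hidden_w = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]  # 4 neurons, 3 inputs + bias
output_w = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(2)]  # 2 neurons, 4 inputs + bias

x = [0.2, 0.7, -0.1]                     # an arbitrary 3-element input
hidden = layer(x, hidden_w)
output = layer(hidden, output_w)
print(hidden, output)
```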

82 Input layer  Receives all of the inputs  Number of neurons equals the number of inputs  Does no processing  Connects to all the neurons in the hidden layer

83 Hidden layer  Could be more than one layer, but theory says that only one layer is necessary  The number of neurons is found by experiment  Processes the inputs  Connects to all neurons in the output layer  The output is a sigmoid function

84 Output layer  Produces the final outputs  Processes the outputs from the hidden layer  The number of neurons equals the number of outputs  The output could be linear or sigmoid

85 Problems with networks  Originally the neurons had a hard-limiter on the output  Although an error could be found between the desired output and the actual output, which could be used to adjust the weights in the output layer, there was no way of knowing how to adjust the weights in the hidden layer

86 The invention of back- propagation  By introducing a smoothly changing output function, it was possible to calculate an error that could be used to adjust the weights in the hidden layer(s)

87 Output function The sigmoid function (figure: y plotted against net for net from -5 to 5, rising smoothly from 0 to 1)

88 Sigmoid function  The sigmoid function goes smoothly from 0 to 1 as net increases  The value of y when net=0 is 0.5  When net is negative, y is between 0 and 0.5  When net is positive, y is between 0.5 and 1.0
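The sigmoid is simple to express in code; a short sketch:

```python
# The sigmoid output function on slides 87-88: y = 1 / (1 + exp(-net)).
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

print(sigmoid(0))      # 0.5
print(sigmoid(-5))     # close to 0
print(sigmoid(5))      # close to 1
```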

89 Back-propagation  The method of training is called the back- propagation of errors  The algorithm is an extension of the delta rule, called the generalised delta rule

90 Generalised delta rule  The equation for the generalised delta rule is ΔWi = ηXiδ  δ is defined according to which layer is being considered.  For the output layer, δ is y(1-y)(d-y).  For the hidden layer, δ is more complex.
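A sketch of these error terms as functions (the names are illustrative, not from the slides); the hidden-layer form shown is the one used in the worked example later in these slides:

```python
# Sketch of the generalised delta rule on slide 90. For the output layer the
# error term is delta = y*(1 - y)*(d - y); for a hidden neuron it is
# x*(1 - x) times the weighted sum of the deltas of the neurons it feeds.
def output_delta(y, d):
    return y * (1 - y) * (d - y)

def hidden_delta(x, downstream_weights, downstream_deltas):
    return x * (1 - x) * sum(w * dlt for w, dlt in
                             zip(downstream_weights, downstream_deltas))

def weight_change(eta, x_in, delta):
    return eta * x_in * delta            # delta-W = eta * Xi * delta
```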

91 Training a network  Example: the problem could not be implemented on a single layer because it is non-linearly separable  A 3-layer MLP with 2 neurons in the hidden layer trained successfully  With 1 neuron in the hidden layer it failed to train

92 The hidden neurons

93 The weights  The weights for the 2 neurons in the hidden layer are -9, 3.6 and 0.1 and 6.1, 2.2 and -7.8  These weights can be shown in the pattern space as two lines  The lines divide the space into 4 regions

94 Training and Testing

95  Starting with a data set, the first step is to divide the data into a training set and a test set  Use the training set to adjust the weights until the error is acceptably low  Test the network using the test set, and see how many it gets right

96 A better approach  Critics of this standard approach have pointed out that training to a low error can sometimes cause “overfitting”, where the network performs well on the training data but poorly on the test data  The alternative is to divide the data into three sets, the extra one being the validation set

97 Validation set  During training, the training data is used to adjust the weights  At each iteration, the validation/test data is also passed through the network and the error recorded but the weights are not adjusted  The training stops when the error for the validation/test set starts to increase

98 Stopping criteria (figure: error plotted against training time for the training set and the validation set; training stops where the validation error starts to rise)

99 The multi-layered perceptron (MLP) and Back-propagation

100 Architecture (figure): input layer – hidden layer – output layer

101 Back-propagation  The method of training is called the back- propagation of errors  The algorithm is an extension of the delta rule, called the generalised delta rule

102 Generalised delta rule  The equation for the generalised delta rule is ΔWi = ηXiδ  δ is defined according to which layer is being considered.  For the output layer, δ is y(1-y)(d-y).  For the hidden layer, δ is more complex.

103 Hidden Layer  We have to deal with the error from the output layer being fed back to the hidden layer.  Let's look at an example: the weight w2(1,2)  This is the weight connecting neuron 1 in the input layer with neuron 2 in the hidden layer.

104  Δw2(1,2)=ηX1(1)δ2(2)  Where  X1(1) is the output of neuron 1 in the input layer.  δ2(2) is the error on the output of neuron 2 in the hidden layer.  δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)

105  δ3(1)= y(1-y)(d-y) =x3(1)[1-x3(1)][d-x3(1)]  So we start with the error at the output and use this result to ripple backwards altering the weights.

107 Example  Exclusive OR using the network shown earlier: 2:2:1 network  Initial weights  W2(0,1)=0.862518, W2(1,1)=-0.155797, W2(2,1)=0.282885  W2(0,2)=0.834986, w2(1,2)=-0.505997, w2(2,2)=-0.864449  W3(0,1)=0.036498, w3(1,1)=-0.430437, w3(2,1)=0.48121

108 Feedforward – hidden layer (neuron 1)  So if  X1(0)=1 (the bias)  X1(1)=0  X1(2)=0  The weighted sum inside neuron 1 of the hidden layer = 0.862518  Then, using the sigmoid function,  X2(1)=0.7031864

109 Feedforward – hidden layer (neuron 2)  So if  X1(0)=1 (the bias)  X1(1)=0  X1(2)=0  The weighted sum inside neuron 2 of the hidden layer = 0.834986  Then, using the sigmoid function,  X2(2)=0.6974081

110 Feedforward – output layer  So if  X2(0)=1 (the bias)  X2(1)=0.7031864  X2(2)=0.6974081  The weighted sum inside neuron 1 of the output layer = 0.0694203  Then, using the sigmoid function,  X3(1)=0.5173481  Desired output=0

111  δ3(1)=x3(1)[1-x3(1)][d-x3(1)] =-0.1291812  δ2(1)=X2(1)[1-X2(1)]w3(1,1) δ3(1)=0.0116054  δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)=-0.0131183  Now we can use the delta rule to calculate the change in the weights  ΔWi = ηXiδ

112 Examples  If we set η=0.5  ΔW2(0,1) = ηX1(0)δ2(1) = 0.5 x 1 x 0.0116054 = 0.0058027  ΔW3(1,1) = ηX2(1)δ3(1) = 0.5 x 0.7031864 x (–0.1291812) = –0.0454192

113  What would be the results of the following?  ΔW2(2,1) = ηX1(2)δ2(1)  ΔW2(2,2) = ηX1(2)δ2(2)

114  ΔW2(2,1) = ηX1(2)δ2(1) = 0.5 x 0 x 0.0116054 = 0  ΔW2(2,2) = ηX1(2)δ2(2) = 0.5 x 0 x (–0.0131183) = 0

115  New weights  W2(0,1)=0.868321, W2(1,1)=-0.155797, W2(2,1)=0.282885  W2(0,2)=0.828427, W2(1,2)=-0.505997, W2(2,2)=-0.864449  W3(0,1)=-0.028093, W3(1,1)=-0.475856, W3(2,1)=0.436164
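The whole of slides 107–115 can be reproduced with a short sketch using the weight notation of the slides; note that W3(0,1) comes out negative (0.036498 − 0.064591 = −0.028093), consistent with the value given above.

```python
# Sketch reproducing the numbers on slides 107-115: one forward pass and one
# weight update for the 2:2:1 XOR network, input (0, 0), desired output 0,
# eta = 0.5. Notation follows the slides: w2(i,j) feeds hidden neuron j from
# input i (i = 0 is the bias), w3(i,1) feeds the single output neuron.
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

eta = 0.5
w2 = {(0, 1): 0.862518, (1, 1): -0.155797, (2, 1): 0.282885,
      (0, 2): 0.834986, (1, 2): -0.505997, (2, 2): -0.864449}
w3 = {(0, 1): 0.036498, (1, 1): -0.430437, (2, 1): 0.48121}

x1 = {0: 1, 1: 0, 2: 0}                        # bias plus the input pattern (0, 0)
d = 0                                          # desired XOR output

# Feedforward
x2 = {0: 1}
for j in (1, 2):
    x2[j] = sigmoid(sum(w2[i, j] * x1[i] for i in (0, 1, 2)))
x3 = sigmoid(sum(w3[i, 1] * x2[i] for i in (0, 1, 2)))
print(x2[1], x2[2], x3)                        # 0.7031..., 0.6974..., 0.5173...

# Back-propagate the error
d3 = x3 * (1 - x3) * (d - x3)                  # -0.12918...
d2 = {j: x2[j] * (1 - x2[j]) * w3[j, 1] * d3 for j in (1, 2)}

# Update the weights with the generalised delta rule
for (i, j) in w2:
    w2[i, j] += eta * x1[i] * d2[j]
for (i, _) in w3:
    w3[i, 1] += eta * x2[i] * d3
print(w2, w3)                                  # matches the new weights on slide 115
```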

117 Outcomes  Look at the theory of self-organisation.  Other self-organising networks  Look at examples of neural network applications

118 Multi-layered perceptron  Feedforward network  Trained by passing the error backwards  Input-hidden-output layers  Most common

119 Multi-layered perceptron (Taken from Picton 2004) Input layer Hidden layer Output layer

120 Hopfield network  Feedback network  Easy to train  Single layer of neurons  Neurons fire in a random sequence

121 Hopfield network x1 x2 x3

122 Radial basis function network  Feedforward network  Has 3 layers  Hidden layer uses statistical clustering techniques to train  Good at pattern recognition

123 Radial basis function networks Input layer Hidden layer Output layer

124 Four requirements for SOM The weights in each neuron must represent a class of pattern  one neuron, one class

125 Four requirements for SOM The input pattern is presented to all neurons and each produces an output.  Output: a measure of the match between the input pattern and the pattern stored by the neuron.

126 Four requirements A competitive learning strategy selects neuron with largest response.

127 Four requirements A method of reinforcing the largest response.

128 Architecture  The Kohonen network (named after Teuvo Kohonen from Finland) is a self-organising network  Neurons are usually arranged on a 2- dimensional grid  Inputs are sent to all neurons  There are no connections between neurons

129 Architecture Kohonen network X

130 Theory  For neuron j the output is a weighted sum: netj = Σ wijxi  Where x is the input, wij are the weights and netj is the output of the neuron

131 Four requirements - Kohonen networks  One neuron, one class – true  Match measured by Euclidean distance or the weighted sum  Winner-takes-all selection  Reinforcement by the Kohonen learning rule

132 Output value  The output of each neuron is the weighted sum  There is no threshold or bias  Input values and weights are normalized

133 “Winner takes all”  Initially the weights in each neuron are random  Input values are sent to all the neurons  The outputs of each neuron are compared  The “winner” is the neuron with the largest output value

134 Training  Having found the winner, the weights of the winning neuron are adjusted  Weights of neurons in a surrounding neighbourhood are also adjusted

135 Neighbourhood (figure: the winning neuron X and its surrounding neighbourhood on the Kohonen grid)

136 Training  As training progresses the neighbourhood gets smaller  Weights are adjusted according to the following formula (the standard Kohonen update): wij(new) = wij(old) + α(xi − wij(old))

137 Weight adjustment  The learning coefficient (alpha) starts with a value of 1 and gradually reduces to 0  This has the effect of making big changes to the weights initially, but no changes at the end  The weights are adjusted so that they more closely resemble the input patterns

138 Example  A Kohonen network receives the input pattern 0.6 0.6 0.6.  Two neurons in the network have weights 0.5 0.3 0.8 and -0.6 –0.5 0.6.  Which neuron will have its weights adjusted and what will the new values of the weights be if the learning coefficient is 0.4?

139 Answer
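The answer slide is blank in this transcript; the sketch below works through the calculation, selecting the winner by the weighted sum (slide 133) and assuming the standard Kohonen update w_new = w + α(x − w) (slide 136).

```python
# Sketch of the calculation behind slide 138 (the answer slide is empty in
# this transcript). Winner selection uses the weighted sum, as on slide 133;
# the update rule w_new = w + alpha * (x - w) is the standard Kohonen rule
# assumed here, consistent with slide 137.
inputs  = [0.6, 0.6, 0.6]
neurons = {"A": [0.5, 0.3, 0.8], "B": [-0.6, -0.5, 0.6]}
alpha   = 0.4

outputs = {name: sum(w * x for w, x in zip(ws, inputs))
           for name, ws in neurons.items()}
winner = max(outputs, key=outputs.get)
print(outputs)                                 # A: 0.96, B: -0.3 -> neuron A wins
new_w = [w + alpha * (x - w) for w, x in zip(neurons[winner], inputs)]
print(winner, [round(w, 2) for w in new_w])    # A [0.54, 0.42, 0.72]
```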

140 Summary  The Kohonen network is self-organising  It uses unsupervised training  All the neurons are connected to the input  A winner takes all mechanism determines which neuron gets its weights adjusted  Neurons in a neighbourhood also get adjusted

141 Demonstration  A demonstration of a Kohonen network learning has been taken from the following websites:  http://www.patol.com/java/TSP/index.html  http://www.samhill.co.uk/kohonen/index.htm

142 Applications of Neural Networks

143 Example Applications  Analysis of data  Classification of EEG signals  Pattern recognition in ECG  EMG disease detection

144 Gueli N et al (2005) The influence of lifestyle on cardiovascular risk factors analysis using a neural network Archives of Gerontology and Geriatrics 40 157–172  To produce a model of risk factors in heart disease.  MLP used  The accuracy was relatively good for cholesterolaemia and triglyceridaemia:  Training phase around 99%  Testing phase around 93%  Not so good for HDL

146 Subasi A (in press) Automatic recognition of alertness level from EEG by using neural network and wavelet coefficients Expert Systems with Applications xx (2004) 1–11  Electroencephalography (EEG)  Recordings of electrical activity from the brain.  Classification task:  Awake  Drowsy  Sleep

147  MLP  15-23-3  Hidden layer – log-tanh function  Output layer – log-sigmoid function  Input is normalised to be within the range 0 to 1.

148  Accuracy  95%±3% alert  93%±4% drowsy  92%±5% sleep  Features were extracted from wavelets and form the input to the network.

149 Karsten Sternickel (2002) Automatic pattern recognition in ECG time series Computer Methods and Programs in Biomedicine 68 109–115  ECG – electrocardiographs – electrical signals from the heart.  Wavelets again.  Classification of patterns  Patterns were spotted

152 Abel et al (1996) Neural network analysis of the EMG interference pattern Med. Eng. Phys. Vol. 18, No. 1, pp. 12–17  EMG – Electromyography – muscle activity.  Interference patterns are signals produced from various parts of a muscle – it is hard to see individual features.  Applied neural networks to EMG interference patterns.

153  Classifying  Nerve disease  Muscle disease  Controls  Applied various different ways of presenting the pattern to the ANN.  Good for less severe cases; severe cases can often be seen by the clinician.

154 Example Applications  Wave prediction  Controlling a vehicle  Condition monitoring

155 Wave prediction  Raoa S, Mandal S(2005) Hindcasting of storm waves using neural networks Ocean Engineering 32 (2005) 667–684  MLP used to predict storm waves.  2:2:2 network  Good correlation between ANN model and another model

157 van de Ven P, Flanagan C, Toal D (in press) Neural network control of underwater vehicles Engineering Applications of Artificial Intelligence  Semi-autonomous vehicle  Control using ANN  The ANN replaces a mathematical model of the system.

160 Silva et al (2000) The adaptability of a tool wear monitoring system under changing cutting conditions Mechanical Systems and Signal Processing 14(2), 287-298  Modelling tool wear  Combines ANN with other AI techniques (expert systems)  Self-organising maps (SOM) and ART2 investigated  SOM better for extracting the required information.

162 Examples to try yourself  A.1 Number recognition (ONR)  http://www.generation5.org/jdk/demos.asp#neuralNetworks  Details: http://www.generation5.org/content/2004/simple_ocr.asp

163  B.1 Kohonen Self Organising Example 1  http://www.generation5.org/jdk/demos.asp#neuralNetworks  B.2 Kohonen 3D travelling salesman problem  http://fbim.fh-regensburg.de/~saj39122/jfroehl/diplom/e-index.html

