1
1 Chapter 6: Artificial Neural Networks Part 2 of 3 (Sections 6.4 – 6.6) Asst. Prof. Dr. Sukanya Pongsuparb Dr. Srisupa Palakvangsa Na Ayudhya Dr. Benjarath Pupacdi SCCS451 Artificial Intelligence Week 12
2
2 Agenda Multi-layer Neural Network Hopfield Network
3
3 Multilayer Neural Networks A multilayer perceptron is a feedforward neural network with one or more hidden layers. (Diagram: single-layer vs. multilayer neural networks.)
4
4 Roles of Layers
Input layer: accepts input signals from the outside world and distributes them to the neurons in the hidden layer; usually does not do any computation.
Output layer (computational neurons): accepts output signals from the previous hidden layer, outputs them to the world, and knows the desired outputs.
Hidden layer (computational neurons): determines its own desired outputs.
5
5 Hidden (Middle) Layers
Neurons in hidden layers are unobservable through the inputs and outputs of the network: their desired outputs are unknown (hidden) from the outside and are determined by the layer itself.
One hidden layer is sufficient for continuous functions; two hidden layers are needed for discontinuous functions.
Practical applications mostly use three layers (input, one hidden, output). More layers are possible, but each additional layer exponentially increases the computing load.
6
6 How do multilayer neural networks learn? More than a hundred different learning algorithms are available for multilayer ANNs. The most popular method is back-propagation.
7
7 Back-propagation Algorithm
In a back-propagation neural network, the learning algorithm has 2 phases:
1. Forward propagation of inputs
2. Backward propagation of errors
The algorithm loops over the 2 phases until the errors obtained are lower than a certain threshold.
Learning is done in a similar manner as in a perceptron: a set of training inputs is presented to the network, the network computes the outputs, and the weights are adjusted to reduce the errors. The activation function used is a sigmoid function.
8
8 Common Activation Functions
Hard-limit functions (step and sign): often used for decision-making neurons in classification and pattern recognition.
Sigmoid function: popular in back-propagation networks; its output is a real number in the [0, 1] range.
Linear function: often used for linear approximation.
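The following is a minimal Python sketch of the activation functions named above (the function names are illustrative, not from the slides):

```python
import math

# Hard-limit functions: used for decision-making neurons.
def step(x):
    return 1 if x >= 0 else 0

def sign(x):
    return 1 if x >= 0 else -1

# Sigmoid: popular in back-propagation networks, output in the [0, 1] range.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Linear: often used for linear approximation.
def linear(x):
    return x
```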
9
9 3-layer Back-propagation Neural Network
10
10 How a neuron determines its output
Very similar to the perceptron:
1. Compute the net weighted input
2. Pass the result to the activation function
Example: a neuron j with inputs (2, 5, 1, 8), weights (0.1, 0.2, 0.5, 0.3) and threshold θ = 0.2:
X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) − 0.2 = 3.9
Y = 1 / (1 + e^(−3.9)) = 0.98
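A short Python check of the two steps above, reproducing the numbers in the example; the helper name neuron_output is illustrative only:

```python
import math

def neuron_output(inputs, weights, theta):
    # Step 1: net weighted input X = sum(x_i * w_i) - theta
    x = sum(i * w for i, w in zip(inputs, weights)) - theta
    # Step 2: pass X through the sigmoid activation function
    return 1.0 / (1.0 + math.exp(-x))

y = neuron_output([2, 5, 1, 8], [0.1, 0.2, 0.5, 0.3], theta=0.2)
print(round(y, 2))  # 0.98 (net input X = 3.9)
```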
11
11 How the errors propagate backward
The errors are computed in a similar manner to the errors in the perceptron:
Error = the output we want − the output we get
Error at an output neuron k at iteration p: e_k(p) = y_d,k(p) − y_k(p)
Example: if the desired output is 1 and the neuron outputs 0.98, then e_k(p) = 1 − 0.98 = 0.02.
12
12 Back-Propagation Training Algorithm
Step 1: Initialization
Randomly initialize the weights and thresholds θ to numbers within a small range, e.g. (−2.4/F_i, +2.4/F_i), where F_i is the total number of inputs of neuron i. The weight initialization is done on a neuron-by-neuron basis.
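A sketch of this neuron-by-neuron initialization in Python, assuming the (−2.4/F_i, +2.4/F_i) range mentioned above; names are illustrative:

```python
import random

def init_neuron(num_inputs):
    # Draw each weight and the threshold of one neuron uniformly
    # from (-2.4/F_i, +2.4/F_i), where F_i = num_inputs.
    bound = 2.4 / num_inputs
    weights = [random.uniform(-bound, bound) for _ in range(num_inputs)]
    theta = random.uniform(-bound, bound)
    return weights, theta
```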
13
13 Back-Propagation Training Algorithm
Step 2: Activation
Propagate the input signals forward from the input layer to the output layer, computing each neuron's output as in the example above (X = 3.9, Y = 1 / (1 + e^(−3.9)) = 0.98).
14
14 Back-Propagation Training Algorithm
Step 3: Weight Training
There are 2 types of weight training: (1) for the output-layer neurons and (2) for the hidden-layer neurons.
*** It is important to understand that first the input signals propagate forward, and then the errors propagate backward to help train the weights. ***
In each iteration (p + 1), the weights are updated based on the weights from the previous iteration p. The signals keep flowing forward and backward until the errors are below some preset threshold value.
15
15 3.1 Weight Training (Output-layer neurons)
These formulas are used to perform the weight corrections for an output neuron k at iteration p:
Error: e_k(p) = y_d,k(p) − y_k(p)
Error gradient: δ_k(p) = y_k(p) × (1 − y_k(p)) × e_k(p)
Weight correction: Δw_j,k(p) = α × y_j(p) × δ_k(p)
Weight update: w_j,k(p+1) = w_j,k(p) + Δw_j,k(p)
16
16 In the output-layer update, the weights w_j,k(p) and the outputs y_j(p) of the previous layer are known from the forward pass, the desired output y_d,k(p) is predefined by the training set, and e_k(p) and δ_k(p) can be computed from them; the correction Δw_j,k(p) is what we want to compute. We do the above for each of the weights of the output-layer neurons.
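A Python sketch of the output-layer correction formulas in 3.1, assuming sigmoid neurons; the variable names mirror the slide notation but are otherwise my own:

```python
def train_output_weights(w_jk, y_hidden, y_k, y_dk, alpha):
    # e_k(p) = y_d,k(p) - y_k(p)
    e_k = y_dk - y_k
    # delta_k(p) = y_k(p) * (1 - y_k(p)) * e_k(p)
    delta_k = y_k * (1 - y_k) * e_k
    # w_j,k(p+1) = w_j,k(p) + alpha * y_j(p) * delta_k(p)
    new_w = [w + alpha * y_j * delta_k for w, y_j in zip(w_jk, y_hidden)]
    return new_w, delta_k
```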
17
17 3.2 Weight Training (Hidden-layer neurons)
These formulas are used to perform the weight corrections for a hidden neuron j at iteration p:
Error gradient: δ_j(p) = y_j(p) × (1 − y_j(p)) × Σ_k [δ_k(p) × w_j,k(p)], summed over the neurons k of the following layer
Weight correction: Δw_i,j(p) = α × x_i(p) × δ_j(p)
Weight update: w_i,j(p+1) = w_i,j(p) + Δw_i,j(p)
18
18 In the hidden-layer update, the input x_i(p) is predefined, the weights w_i,j(p) and the output y_j(p) are known, and the error gradients δ_k(p) propagate back from the output layer; from these we compute δ_j(p) and the correction Δw_i,j(p). We do the above for each of the weights of the hidden-layer neurons.
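And the matching sketch for the hidden-layer formulas in 3.2 (again an illustrative helper, not the course's code):

```python
def train_hidden_weights(w_ij, x_inputs, y_j, deltas_k, w_jk, alpha):
    # delta_j(p) = y_j(p) * (1 - y_j(p)) * sum_k delta_k(p) * w_j,k(p)
    delta_j = y_j * (1 - y_j) * sum(d * w for d, w in zip(deltas_k, w_jk))
    # w_i,j(p+1) = w_i,j(p) + alpha * x_i(p) * delta_j(p)
    new_w = [w + alpha * x_i * delta_j for w, x_i in zip(w_ij, x_inputs)]
    return new_w, delta_j
```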
19
19 P = 1
20
20 Weights trained P = 1
21
21 After the weights are trained in p = 1, we go back to Step 2 (Activation) and compute the outputs with the new weights. If the errors obtained with the updated weights are still above the error threshold, we start weight training for p = 2. Otherwise, we stop.
22
22 P = 2
23
23 Weights trained P = 2
24
24 Example: 3-layer ANN for XOR
XOR is not a linearly separable function: plotting the four inputs (0, 0), (0, 1), (1, 0) and (1, 1) in the x1 (input 1) – x2 (input 2) plane, no single straight line separates the two output classes.
A single-layer ANN, i.e. the perceptron, cannot deal with problems that are not linearly separable. We cope with such problems using multi-layer neural networks.
25
25 Example: 3-layer ANN for XOR
The network has two input neurons (non-computing), two hidden neurons (3 and 4) and one output neuron (5). Let the learning rate α = 0.1.
26
26 Example: 3-layer ANN for XOR
Training example: x1 = x2 = 1 with desired output y_d,5 = 0.
Initial weights and thresholds: w_1,3 = 0.5, w_1,4 = 0.9, w_2,3 = 0.4, w_2,4 = 1.0, w_3,5 = −1.2, w_4,5 = 1.1, θ_3 = 0.8, θ_4 = −0.1, θ_5 = 0.3. Let α = 0.1.
Forward pass:
y_3 = sigmoid(1(0.5) + 1(0.4) − 0.8) = 0.5250
y_4 = sigmoid(1(0.9) + 1(1.0) + 0.1) = 0.8808
y_5 = sigmoid(0.5250(−1.2) + 0.8808(1.1) − 0.3) = sigmoid(−0.63 + 0.9689 − 0.3) = 0.5097
Error: e = 0 − 0.5097 = −0.5097
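A few lines of Python that reproduce this forward pass, using the initial weights listed above:

```python
import math

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

x1, x2 = 1, 1
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3

y3 = sigmoid(x1 * w13 + x2 * w23 - t3)  # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)  # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)  # 0.5097
e = 0 - y5                              # -0.5097
```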
27
27 Example: 3-layer ANN for XOR (2)
Back-propagation of error (p = 1, output layer):
δ_5 = y_5 × (1 − y_5) × e = 0.5097 × (1 − 0.5097) × (−0.5097) = −0.1274
Δw_j,k(p) = α × y_j(p) × δ_k(p):  Δw_3,5(1) = 0.1 × 0.5250 × (−0.1274) = −0.0067
w_j,k(p+1) = w_j,k(p) + Δw_j,k(p):  w_3,5(2) = −1.2 − 0.0067 = −1.2067
28
28 Example: 3-layer ANN for XOR (3)
Back-propagation of error (p = 1, output layer), δ_5 = −0.1274:
Δw_4,5(1) = 0.1 × 0.8808 × (−0.1274) = −0.0112
w_4,5(2) = 1.1 − 0.0112 = 1.0888
29
29 Example: 3-layer ANN for XOR (4)
Back-propagation of error (p = 1, output layer), δ_5 = −0.1274. For thresholds the input is −1:
Δθ_k(p) = α × (−1) × δ_k(p):  Δθ_5(1) = 0.1 × (−1) × (−0.1274) = 0.0127
θ_5(p+1) = θ_5(p) + Δθ_5(p):  θ_5(2) = 0.3 + 0.0127 = 0.3127
30
30 Example: 3-layer ANN for XOR (5)
Back-propagation of error (p = 1, hidden layer):
δ_j(p) = y_j(p) × (1 − y_j(p)) × Σ_k [δ_k(p) × w_j,k(p)]:  δ_3 = 0.525 × (1 − 0.525) × (−0.1274 × (−1.2)) = 0.0381
Δw_i,j(p) = α × x_i(p) × δ_j(p):  Δw_1,3(1) = 0.1 × 1 × 0.0381 = 0.0038
w_i,j(p+1) = w_i,j(p) + Δw_i,j(p):  w_1,3(2) = 0.5 + 0.0038 = 0.5038
31
31 Example: 3-layer ANN for XOR (6)
Back-propagation of error (p = 1, hidden layer):
δ_4 = 0.8808 × (1 − 0.8808) × (−0.1274 × 1.1) = −0.0147
Δw_1,4(1) = 0.1 × 1 × (−0.0147) = −0.0015
w_1,4(2) = 0.9 − 0.0015 = 0.8985
32
32 Example: 3-layer ANN for XOR (7)
Back-propagation of error (p = 1, hidden layer), δ_3 = 0.0381:
Δw_2,3(1) = 0.1 × 1 × 0.0381 = 0.0038
w_2,3(2) = 0.4 + 0.0038 = 0.4038
33
33 Example: 3-layer ANN for XOR (8)
Back-propagation of error (p = 1, hidden layer), δ_4 = −0.0147:
Δw_2,4(1) = 0.1 × 1 × (−0.0147) = −0.0015
w_2,4(2) = 1.0 − 0.0015 = 0.9985
34
34 Example: 3-layer ANN for XOR (9)
Back-propagation of error (p = 1, hidden layer), δ_3 = 0.0381:
Δθ_3(1) = 0.1 × (−1) × 0.0381 = −0.0038
θ_3(2) = 0.8 − 0.0038 = 0.7962
35
35 Example: 3-layer ANN for XOR (10)
Back-propagation of error (p = 1, hidden layer), δ_4 = −0.0147:
Δθ_4(1) = 0.1 × (−1) × (−0.0147) = 0.0015
θ_4(2) = −0.1 + 0.0015 = −0.0985
36
36 Example: 3-layer ANN for XOR (11)
Now the 1st iteration (p = 1) is finished. The updated weights and thresholds are: w_1,3 = 0.5038, w_1,4 = 0.8985, w_2,3 = 0.4038, w_2,4 = 0.9985, w_3,5 = −1.2067, w_4,5 = 1.0888, θ_3 = 0.7962, θ_4 = −0.0985, θ_5 = 0.3127.
The weight training process is repeated until the sum of squared errors is less than 0.001 (the threshold).
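Putting the whole example together, here is a compact training-loop sketch for a 2-2-1 XOR network: sigmoid units, thresholds treated as weights on a fixed −1 input, α = 0.1, stopping when the sum of squared errors over the four training examples drops below 0.001. It is an illustrative reimplementation with random initial weights, not the original course code, and an unlucky initialization may require a re-run:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR training set: ((x1, x2), desired output)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

alpha = 0.1
rnd = lambda: random.uniform(-1.0, 1.0)
w_h = [[rnd(), rnd(), rnd()], [rnd(), rnd(), rnd()]]  # [w1j, w2j, theta_j] for hidden neurons 3, 4
w_o = [rnd(), rnd(), rnd()]                           # [w35, w45, theta_5] for output neuron 5

for epoch in range(100000):
    sse = 0.0
    for (x1, x2), yd in data:
        # Forward pass
        y_h = [sigmoid(x1 * w[0] + x2 * w[1] - w[2]) for w in w_h]
        y5 = sigmoid(y_h[0] * w_o[0] + y_h[1] * w_o[1] - w_o[2])
        e = yd - y5
        sse += e * e
        # Backward pass: error gradients (hidden deltas use the old output weights)
        d5 = y5 * (1 - y5) * e
        d_h = [y_h[j] * (1 - y_h[j]) * d5 * w_o[j] for j in range(2)]
        # Weight updates (thresholds use a fixed input of -1)
        w_o[0] += alpha * y_h[0] * d5
        w_o[1] += alpha * y_h[1] * d5
        w_o[2] += alpha * (-1) * d5
        for j in range(2):
            w_h[j][0] += alpha * x1 * d_h[j]
            w_h[j][1] += alpha * x2 * d_h[j]
            w_h[j][2] += alpha * (-1) * d_h[j]
    if sse < 0.001:
        print("converged after", epoch + 1, "epochs")
        break
```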
37
37 Learning Curve for XOR The curve shows ANN learning speed. 224 epochs or 896 iterations were required.
38
38 Final Results
(Diagram: the network with its final trained weights and thresholds, e.g. values such as −10.4, 9.8, 4.6, 4.7, 4.8, 6.4, 7.3 and 2.8 in place of the initial ones.)
Training again with different initial values may give a different result; any result is acceptable so long as the sum of squared errors is below the preset error threshold.
39
39 Final Results A different result is possible for different initial weights, but the result always satisfies the error criterion.
40
40 McCulloch-Pitts Model: XOR Operation Activation function: sign function
41
41 Decision Boundary (a) Decision boundary constructed by hidden neuron 3; (b) Decision boundary constructed by hidden neuron 4; (c) Decision boundaries constructed by the complete three-layer network
42
42 Problems of Back-Propagation It is not similar to the process of a biological neuron, and it carries a heavy computing load.
43
43 Accelerated Learning in Multi-layer NN (1)
Represent the sigmoid function by a hyperbolic tangent: Y_tanh = 2a / (1 + e^(−bX)) − a, where a and b are constants. Suitable values: a = 1.716 and b = 0.667.
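A small Python sketch of this activation; the exact form Y = 2a / (1 + e^(−bX)) − a is reconstructed from the constants given, so treat it as an assumption:

```python
import math

A, B = 1.716, 0.667

def tanh_activation(x):
    # Bipolar sigmoid with outputs in (-a, +a)
    return 2 * A / (1 + math.exp(-B * x)) - A
```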
44
44 Accelerated Learning in Multi-layer NN (2)
Include a momentum term in the delta rule: Δw_j,k(p) = β × Δw_j,k(p − 1) + α × y_j(p) × δ_k(p), where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95. This equation is called the generalized delta rule.
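A one-function sketch of the generalized delta rule; the constant values follow the slide, the names are illustrative:

```python
ALPHA, BETA = 0.1, 0.95  # learning rate and momentum constant

def delta_w_with_momentum(prev_delta_w, y_j, delta_k):
    # delta_w(p) = beta * delta_w(p-1) + alpha * y_j(p) * delta_k(p)
    return BETA * prev_delta_w + ALPHA * y_j * delta_k
```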
45
45 Learning with Momentum Reduced from 224 to 126 epochs
46
46 Accelerated Learning in Multi-layer NN (3)
Adaptive learning rate. Idea: a small learning rate gives a smooth learning curve, while a large learning rate speeds up learning but may make it unstable.
Heuristic rule: increase the learning rate when the change of the sum of squared errors has kept the same algebraic sign for several consecutive epochs; decrease the learning rate when the sign alternates for several consecutive epochs.
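A possible implementation of this heuristic; the growth/decay factors (1.05 and 0.7) and the window of epochs are illustrative assumptions, not values from the slides:

```python
def adapt_learning_rate(alpha, sse_history, window=4, grow=1.05, shrink=0.7):
    # sse_history: sum of squared errors per epoch, most recent value last
    if len(sse_history) < window + 1:
        return alpha
    changes = [b - a for a, b in zip(sse_history[-window - 1:-1], sse_history[-window:])]
    if all(c < 0 for c in changes) or all(c > 0 for c in changes):
        return alpha * grow    # same sign for several consecutive epochs: speed up
    signs = [c >= 0 for c in changes]
    if all(s != t for s, t in zip(signs, signs[1:])):
        return alpha * shrink  # sign alternates: slow down
    return alpha
```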
47
47 Effect of Adaptive Learning Rate
48
48 Momentum + Adaptive Learning Rate
49
49 The Hopfield Network
Neural networks were designed by analogy with the brain, which has associative memory: we can recognize a familiar face in an unfamiliar environment, and our brain can recognize certain patterns even though some information about the patterns differs from what we have remembered.
Multilayer ANNs are not intrinsically intelligent in this sense. Recurrent neural networks (RNNs) are used to emulate this associative memory; the Hopfield network is an RNN.
50
50 The Hopfield Network: Goal To recognize a pattern even if some parts are not the same as what it was trained to remember. The Hopfield network is a single-layer network. It is recurrent. The network outputs are calculated and then fed back to adjust the inputs. The process continues until the outputs become constant. Let’s see how it works.
51
51 Single-layer n-neuron Hopfield Network (Diagram: the input signals feed the n neurons and the output signals are fed back as the next inputs.)
52
52 Activation Function If the neuron’s weighted input is greater than zero, the output is +1. If the neuron’s weighted input is less than zero, the output is -1. If the neuron’s weighted input is zero, the output remains in its previous state.
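The three cases above as a small Python function (name illustrative):

```python
def hopfield_activation(x, previous_output):
    # +1 if the weighted input is > 0, -1 if < 0, unchanged if exactly 0
    if x > 0:
        return 1
    if x < 0:
        return -1
    return previous_output
```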
53
53 Hopfield Network Current State The current state of the network is determined by the current outputs, i.e. the state vector.
54
54 What can it recognize?
n = the number of neurons = the number of inputs. Each input can be +1 or −1, so there are 2^n possible sets of inputs/outputs, i.e. patterns.
M = the total number of patterns that the network was trained with, i.e. the total number of patterns that we want the network to be able to recognize.
55
55 Example: n = 3, 2^3 = 8 possible states
56
56 Weights
Weights between neurons are usually represented in matrix form.
For example, let's train the 3-neuron network to recognize the following 2 patterns (M = 2, n = 3): Y_1 = (+1, +1, +1)^T and Y_2 = (−1, −1, −1)^T.
Once the weights are calculated, they remain fixed.
57
57 Weights (2)
With M = 2 we can determine the weight matrix as W = Y_1 Y_1^T + Y_2 Y_2^T − 2I, where I is the 3 × 3 identity matrix:
W =
0 2 2
2 0 2
2 2 0
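A quick Python check of this construction, W = Y1 Y1^T + Y2 Y2^T − M·I, without external libraries:

```python
n, M = 3, 2
patterns = [[1, 1, 1], [-1, -1, -1]]

# Sum of outer products of the stored patterns, minus M on the diagonal
W = [[sum(p[i] * p[j] for p in patterns) - (M if i == j else 0)
      for j in range(n)] for i in range(n)]
print(W)  # [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
```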
58
58 How is the Hopfield network tested?
Given an input vector X, we calculate the output in a similar manner as we have seen before:
Y_m = sign(W X_m − θ), m = 1, 2, …, M
where θ is the threshold vector. In this case all thresholds are set to zero.
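A sketch of this test in Python, iterating Y = sign(W X − θ) until the outputs stop changing. It uses synchronous updates of all neurons, which is a simplification of the usual asynchronous Hopfield update:

```python
def recall(W, x, theta=None, max_iters=100):
    n = len(x)
    theta = theta if theta is not None else [0] * n
    state = list(x)
    for _ in range(max_iters):
        net = [sum(W[i][j] * state[j] for j in range(n)) - theta[i] for i in range(n)]
        # Activation: +1 / -1, or keep the previous output when the net input is 0
        new_state = [1 if v > 0 else -1 if v < 0 else state[i] for i, v in enumerate(net)]
        if new_state == state:   # outputs constant: a stable state has been reached
            break
        state = new_state
    return state

W = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
print(recall(W, [1, 1, -1]))  # converges to the fundamental memory [1, 1, 1]
```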
59
59 Stable States As we see, Y 1 = X 1 and Y 2 = X 2. Thus both states are said to be stable (also called fundamental states).
60
60 Unstable States
With 3 neurons in the network, there are 8 possible states. The remaining 6 states are unstable.
(Table omitted: for each possible state it lists the inputs x1 x2 x3, the outputs y1 y2 y3 after one iteration, and the fundamental memory the state settles into.)
61
61 Error Correction Network Each of the unstable states represents a single error, compared to the fundamental memory. The Hopfield network can act as an error correction network.
62
62 The Hopfield Network The Hopfield network can store a set of fundamental memories. It can recall those fundamental memories when presented with inputs that may be exactly those memories or slightly different. However, it may not always recall correctly. Let's see an example.
63
63 Ex: When the Hopfield Network cannot recall
X1 = (+1, +1, +1, +1, +1) X2 = (+1, -1, +1, -1, +1) X3 = (-1, +1, -1, +1, -1)
Let the probe vector be X = (+1, +1, -1, +1, +1). It is very similar to X1 (it differs in only one bit), but the network recalls it as X3. This is a problem with the Hopfield network.
64
64 Storage capacity of the Hopfield Network Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly. The maximum number of fundamental memories M_max that can be stored in the n-neuron recurrent network is limited to roughly M_max = 0.15n.