Pattern Recognition Exercise
- Architecture?
- Weights?
- Are the original patterns classified correctly?
- Are the original patterns with one piece of wrong data classified correctly?
- Are the original patterns with one piece of missing data classified correctly?
Perceptrons (1958)
- Very important early neural network
- Guaranteed training procedure under certain circumstances
[Diagram: inputs x0 = 1 (bias), x1, ..., xn, each connected by weights w0, w1, ..., wn to a single output unit y]
Activation Function
f(y_in) = 1 if y_in > θ
f(y_in) = 0 if -θ <= y_in <= θ
f(y_in) = -1 otherwise
Graph interpretation: a step function with an undecided band of width 2θ around 0.
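A minimal sketch of this three-valued activation in Python (the threshold value θ = 0.2 is an assumed example, not from the slides):

```python
def activation(y_in, theta=0.2):
    """Perceptron activation: +1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0
```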
Learning Rule
w_i(new) = w_i(old) + α·t·x_i if error
α is the learning rate
Typically, 0 < α <= 1
Algorithm
1. set w_i = 0 for 0 <= i <= n (can be random)
2. for each training exemplar do
3.   x_i = s_i
4.   y_in = Σ_i x_i·w_i
5.   y = f(y_in)
6.   w_i(new) = w_i(old) + α·t·x_i if error (y ≠ t)
7. if stopping condition not reached, go to 2
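The algorithm above can be sketched in Python. Training on the bipolar AND concept is an assumed example (it is not named on this slide); α = 1 and θ = 0.2 are also assumed values:

```python
def f(y_in, theta=0.2):
    # three-valued perceptron activation
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(samples, alpha=1.0, theta=0.2, max_epochs=100):
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                 # step 1: w_i = 0; w[0] is the bias weight
    for _ in range(max_epochs):
        changed = False
        for s, t in samples:            # step 2: each training exemplar
            x = [1] + list(s)           # step 3: prepend the bias input x0 = 1
            y_in = sum(xi * wi for xi, wi in zip(x, w))   # step 4
            if f(y_in, theta) != t:     # steps 5-6: update only on error
                w = [wi + alpha * t * xi for xi, wi in zip(x, w)]
                changed = True
        if not changed:                 # step 7: stop once an epoch has no errors
            return w
    return w

AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = train_perceptron(AND)               # converges to w = [-1, 1, 1] here
```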
Exercise Continue the above example until the learning algorithm is finished.
Perceptron Learning Rule Convergence Theorem If a weight vector exists that correctly classifies all of the training examples, then the perceptron learning rule will converge to some weight vector that gives the correct response for all training patterns. This will happen in a finite number of steps.
Exercise
Show perceptron weights for the 2-of-3 concept (y = 1 when at least two of the three bipolar inputs are 1):
x1  x2  x3 | y
 1   1   1 |  1
 1   1  -1 |  1
 1  -1   1 |  1
-1   1   1 |  1
 1  -1  -1 | -1
-1   1  -1 | -1
-1  -1   1 | -1
-1  -1  -1 | -1
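One candidate weight vector can be checked mechanically. The weights below (bias w0 = 0, w1 = w2 = w3 = 1, threshold θ = 0.5) are an assumed solution, not necessarily the only one:

```python
from itertools import product

def f(y_in, theta=0.5):
    # three-valued perceptron activation
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

w0, w = 0, (1, 1, 1)   # candidate weights to verify (an assumption)
for x in product((1, -1), repeat=3):
    y_in = w0 + sum(wi * xi for wi, xi in zip(w, x))
    target = 1 if sum(1 for xi in x if xi == 1) >= 2 else -1
    assert f(y_in) == target    # every one of the 8 patterns is classified
```

The check works because the bipolar input sum is 3 or 1 when at least two inputs are 1, and -1 or -3 otherwise, so θ = 0.5 separates the two cases.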
Adaline (Widrow, Hoff 1960)
- ADAptive LInear NEuron
- Learning rule minimizes the mean squared error
- Learns on all examples, not just ones with errors
Training Algorithm
1. set w_i (small random values typical)
2. set α (0.1 typical)
3. for each training exemplar do
4.   x_i = s_i
5.   y_in = Σ_i x_i·w_i
6.   w_i(new) = w_i(old) + α·(t − y_in)·x_i
7. go to 3 if the largest weight change is still too big
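The loop above can be sketched in Python. Training on the bipolar AND concept is an assumed example; α = 0.1 follows the slide, while the tolerance and epoch limit are assumed (with these settings the weight changes never fall below the tolerance, so the epoch limit ends training near the error-minimizing weights):

```python
import random

def train_adaline(samples, alpha=0.1, tol=0.01, max_epochs=1000):
    random.seed(0)                      # fixed seed so the run is repeatable
    n = len(samples[0][0])
    w = [random.uniform(-0.1, 0.1) for _ in range(n + 1)]  # step 1: small random
    for _ in range(max_epochs):
        biggest = 0.0
        for s, t in samples:            # step 3: each training exemplar
            x = [1] + list(s)           # bias input x0 = 1
            y_in = sum(xi * wi for xi, wi in zip(x, w))    # step 5
            delta = alpha * (t - y_in)  # step 6: update on EVERY example
            w = [wi + delta * xi for xi, wi in zip(x, w)]
            biggest = max(biggest, abs(delta))
        if biggest < tol:               # step 7: stop when changes are small
            return w
    return w

AND = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = train_adaline(AND)
# w oscillates close to the E-minimizing weights w0 = -0.5, w1 = w2 = 0.5
```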
Activation Function f(y in ) = 1 if y in >= 0 f(y in ) = -1 otherwise
Delta Rule
squared error E = (t − y_in)²
to minimize E, follow the negative gradient: ∂E/∂w_i = −2(t − y_in)·x_i
Δw_i = α·(t − y_in)·x_i
Example: AND concept
bipolar inputs, bipolar targets
w0 = -0.5, w1 = 0.5, w2 = 0.5 minimizes E
x0  x1  x2 | y_in |  t |  E
 1   1   1 |  0.5 |  1 | 0.25
 1   1  -1 | -0.5 | -1 | 0.25
 1  -1   1 | -0.5 | -1 | 0.25
 1  -1  -1 | -1.5 | -1 | 0.25
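A quick sketch recomputing the error column, and spot-checking that nudging any weight away from (-0.5, 0.5, 0.5) only increases the total squared error (the perturbed weight vectors are assumed test points, not from the slides):

```python
def total_error(w):
    """Sum of squared errors (t - y_in)^2 over the bipolar AND patterns."""
    samples = [((1, 1, 1), 1), ((1, 1, -1), -1),
               ((1, -1, 1), -1), ((1, -1, -1), -1)]   # x0 = 1 included
    return sum((t - sum(wi * xi for wi, xi in zip(w, x))) ** 2
               for x, t in samples)

best = total_error((-0.5, 0.5, 0.5))   # four rows of 0.25, total 1.0
# any single-weight perturbation does no better
assert all(total_error(w) >= best
           for w in [(-0.4, 0.5, 0.5), (-0.5, 0.6, 0.5), (-0.5, 0.5, 0.4)])
```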
Exercise Demonstrate that you understand the Adaline training procedure.
Madaline
Many adaptive linear neurons
[Diagram: inputs x1, ..., xm (plus a bias input 1) feed hidden Adaline units z1, ..., zk (plus a bias input 1), which feed a single output unit y]
Madaline MRI (1960) – only learns weights from input layer to hidden layer MRII (1987) – learns all weights
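As a concrete example, the forward pass of a small Madaline computing XOR with two hidden Adaline units can be sketched as follows. The hand-set weights are an assumed textbook-style example, not something stated on these slides:

```python
def sign(y_in):
    # binary Adaline activation: 1 if y_in >= 0, else -1
    return 1 if y_in >= 0 else -1

def madaline(x1, x2):
    z1 = sign(-0.5 + 0.5 * x1 - 0.5 * x2)   # fires for x1 AND NOT x2
    z2 = sign(-0.5 - 0.5 * x1 + 0.5 * x2)   # fires for x2 AND NOT x1
    return sign(0.5 + 0.5 * z1 + 0.5 * z2)  # output unit ORs z1 and z2

# bipolar XOR: true exactly when the inputs differ
assert [madaline(*x) for x in [(1, 1), (1, -1), (-1, 1), (-1, -1)]] == [-1, 1, 1, -1]
```

Under MRI, only the z-unit weights would be trained while the output OR unit stays fixed; MRII would adapt all three units.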