Neural Networks Lecture 7: Perceptron Modifications (September 28, 2010)

1 Adaline Schematic
[Schematic] Inputs i_1, i_2, …, i_n feed a weighted sum w_0 + w_1 i_1 + … + w_n i_n. The resulting output is compared with the desired value class(i) (1 or −1), and the weights are adjusted accordingly.

2 The Adaline Learning Algorithm
The Adaline uses gradient descent to determine the weight vector that leads to minimal error. Error is defined as the MSE between the neuron's net input net_j and its desired output d_j (= class(i_j)) across all training samples i_j. The idea is to pick samples in random order and perform (slow) gradient descent on their individual error functions. This technique allows incremental learning, i.e., refining the weights as more training samples are added.
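The per-sample procedure described above can be sketched in Python as follows. This is a minimal illustration, not code from the lecture; the learning rate eta, the epoch count, and the convention of a leading 1 in each input vector for the bias weight w_0 are illustrative assumptions.

```python
import random

def train_adaline(samples, classes, eta=0.01, epochs=50):
    """Incremental (per-sample) gradient descent on the Adaline error.

    samples : list of input vectors (lists of floats), each starting with 1.0
              so that w[0] acts as the bias weight w_0 (assumed convention)
    classes : list of desired outputs d_j, each +1 or -1
    eta     : learning rate (assumed value)
    """
    n = len(samples[0])
    w = [0.0] * n                         # weight vector, including bias weight
    for _ in range(epochs):
        order = list(range(len(samples)))
        random.shuffle(order)             # pick samples in random order
        for j in order:
            i_j = samples[j]
            net_j = sum(wk * xk for wk, xk in zip(w, i_j))  # net input (no threshold)
            err = classes[j] - net_j                         # d_j - net_j
            for k in range(n):            # gradient-descent step on this sample's error
                w[k] += eta * err * i_j[k]
    return w
```

Because each sample is processed on its own, new training samples can simply be appended and the loop continued, which is the incremental-learning property mentioned above.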

3 The Adaline Learning Algorithm
The Adaline uses gradient descent to determine the weight vector that leads to minimal error.
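As a sketch of the error being minimized (the factor 1/2 and the bias convention i_{j0} = 1 are assumptions of this write-up; constant factors do not change the minimizing weight vector):

\[ E(\mathbf{w}) \;=\; \tfrac{1}{2} \sum_j \bigl( d_j - \mathrm{net}_j \bigr)^2, \qquad \mathrm{net}_j \;=\; \sum_{k=0}^{n} w_k\, i_{jk}, \quad i_{j0} = 1. \]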

4 The Adaline Learning Algorithm
The gradient of the error with respect to the weights is then computed. For gradient descent, the weight change Δw should be a negative multiple of the gradient:
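A sketch of these two formulas, using the error function written above and an assumed learning-rate symbol η > 0:

\[ \nabla_{\mathbf{w}} E \;=\; -\sum_j \bigl( d_j - \mathrm{net}_j \bigr)\, \mathbf{i}_j, \qquad \Delta \mathbf{w} \;=\; -\eta\, \nabla_{\mathbf{w}} E \;=\; \eta \sum_j \bigl( d_j - \mathrm{net}_j \bigr)\, \mathbf{i}_j. \]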

5 The Widrow-Hoff Delta Rule
In the original learning rule, longer input vectors result in greater weight changes, which can cause problems if there are extreme differences in vector length in the training set. Widrow and Hoff (1960) suggested the following modification of the learning rule:
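A common way to write such a modification (per-sample form, with η and the notation as in the sketches above) divides the step by the squared length of the input vector, so the size of the weight change no longer scales with the input's magnitude:

\[ \Delta \mathbf{w} \;=\; \frac{\eta}{\lVert \mathbf{i}_j \rVert^{2}} \bigl( d_j - \mathrm{net}_j \bigr)\, \mathbf{i}_j. \]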

6 Multiclass Discrimination
Often, our classification problems involve more than two classes. For example, character recognition requires at least 26 different classes. We can perform such tasks using layers of perceptrons or Adalines.

7 Multiclass Discrimination
[Diagram] A four-node perceptron for a four-class problem in n-dimensional input space: inputs i_1, i_2, …, i_n are fully connected to output units o_1, o_2, o_3, o_4 through weights w_11, w_12, …, w_4n.

8 Multiclass Discrimination
Each perceptron learns to recognize one particular class, i.e., to output 1 if the input is in that class and 0 otherwise. The units can be trained separately and in parallel. In production mode, the network decides that its current input is in the k-th class if and only if o_k = 1 and o_j = 0 for all j ≠ k; otherwise the input is considered misclassified. For units with real-valued output, the neuron with maximal output can be picked to indicate the class of the input. This maximum should be significantly greater than all other outputs; otherwise the input is considered misclassified.
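A minimal sketch of this decision rule for real-valued outputs; the margin threshold used for the "significantly greater" test is an assumption, not a value from the lecture:

```python
def classify(outputs, margin=0.1):
    """Winner-take-all decision over real-valued unit outputs.

    Returns the index of the winning class, or None if the maximal output
    does not exceed every other output by at least `margin`
    (the input is then treated as misclassified / rejected).
    """
    k = max(range(len(outputs)), key=lambda j: outputs[j])
    if all(outputs[k] - o >= margin for j, o in enumerate(outputs) if j != k):
        return k
    return None

# Example: unit 2 wins clearly, so the input is assigned to class 2.
print(classify([0.1, 0.2, 0.9, 0.3]))   # -> 2
print(classify([0.5, 0.45, 0.48, 0.3])) # -> None (no clear winner)
```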

9 Multilayer Networks
Although single-layer perceptron networks can distinguish between any number of classes, they still require linear separability of inputs. To overcome this serious limitation, we can use multiple layers of neurons. Rosenblatt first suggested this idea in 1961, but he used perceptrons, whose non-differentiable output function led to an inefficient and weak learning algorithm. The idea that eventually led to a breakthrough was the use of continuous output functions and gradient descent.

10 Multilayer Networks
The resulting backpropagation algorithm was popularized by Rumelhart, Hinton, and Williams (1986). This algorithm solved the "credit assignment" problem, i.e., crediting or blaming individual neurons across layers for particular outputs. The error at the output layer is propagated backwards to units at lower layers, so that the weights of all neurons can be adapted appropriately. The gradient descent technique is similar to that of the Adaline, but propagating the error requires some additional computations.

11 Terminology
[Diagram] Example: a network computing a function f: R^3 → R^2. The input vector (x_1, x_2, x_3) enters the input layer, is processed by the hidden layer, and the output layer produces the output vector (o_1, o_2).
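A minimal forward-pass sketch of such a network, using a continuous (sigmoid) output function as motivated above. The hidden-layer size of 4, the sigmoid activation, and the random weight initialization are illustrative assumptions, not details from the slide:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out):
    # one weight row per unit; index 0 of each row is the bias weight
    return [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_output(weights, inputs):
    # each unit applies the sigmoid to its bias plus weighted inputs
    return [sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], inputs)))
            for w in weights]

# Network function f: R^3 -> R^2 with one hidden layer of 4 units (assumed size)
hidden = make_layer(3, 4)
output = make_layer(4, 2)

def f(x):                               # x = input vector (x_1, x_2, x_3)
    h = layer_output(hidden, x)         # hidden layer
    return layer_output(output, h)      # output vector (o_1, o_2)

print(f([0.2, -0.7, 1.0]))
```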

