Ch2: Adaline and Madaline


1 Ch2: Adaline and Madaline
Adaline: Adaptive Linear Neuron. Madaline: Multiple Adaline (many Adalines).
2.1 Adaline (Bernard Widrow, Stanford Univ.)
The Adaline computes a linear combination y of its weighted inputs, including a bias term, and passes it through the output function to give d = f(y); an error (gain-adjust) feedback term is used to adapt the weights.

2 2.1.1 Least Mean Square (LMS) Learning
◎ Input vectors: x_k; ideal (desired) outputs: d_k; actual outputs: y_k = w^T x_k.
Assume the output function is the identity, f(y) = y, so ideally y = d.
Mean square error: ξ(w) = <(d_k - y_k)^2> = <d_k^2> - 2 p^T w + w^T R w.
Let the correlation matrix R = <x_k x_k^T> and p = <d_k x_k>.

3 Idea: Set the gradient to zero, ∇ξ(w) = 2Rw - 2p = 0, and obtain the optimal weights w* = R^{-1} p.
Practical difficulties of the analytical formula:
1. For large dimensions, R^{-1} is difficult to calculate.
2. The expected values < > require knowledge of the underlying probabilities.
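For concreteness, a minimal numpy sketch of this analytical route: R and p are estimated from a finite sample and the optimal weights are obtained by solving R w = p. The data, sizes, and names here are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: inputs x_k drawn at random and desired outputs d_k
# produced by a "true" linear model plus noise (assumed for the demo).
X = rng.normal(size=(500, 3))          # 500 input vectors, 3 weights
w_true = np.array([0.5, -1.0, 2.0])
d = X @ w_true + 0.1 * rng.normal(size=500)

# Sample estimates of R = <x_k x_k^T> and p = <d_k x_k>.
R = (X.T @ X) / len(X)
p = (X.T @ d) / len(X)

# Analytical (Wiener) solution w* = R^{-1} p, obtained by solving R w = p.
w_star = np.linalg.solve(R, p)
print(w_star)                          # close to w_true
```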

4 2.1.2 Steepest Descent
The graph of ξ(w) is a paraboloid (a bowl-shaped quadratic surface in the weights).

5 Let ∇ξ(w) = 2Rw - 2p. Steps:
1. Initialize the weight values w(0).
2. Determine the steepest descent direction, -∇ξ(w(k)) = 2p - 2Rw(k).
3. Modify the weight values: w(k+1) = w(k) - μ ∇ξ(w(k)).
4. Repeat 2~3.
No calculation of R^{-1} is needed.
Drawbacks: i) Knowing R and p is equivalent to knowing the error surface in advance. ii) Steepest descent training is a batch training method.
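A minimal sketch of these steps, assuming R and p have already been estimated as in the previous sketch; the step size mu and iteration count are illustrative choices.

```python
import numpy as np

def steepest_descent(R, p, mu=0.05, n_iters=200):
    """Batch steepest descent on xi(w) = <d^2> - 2 p^T w + w^T R w."""
    w = np.zeros(len(p))               # 1. initialize weight values
    for _ in range(n_iters):
        grad = 2.0 * (R @ w - p)       # 2. gradient of the error surface
        w = w - mu * grad              # 3. step against the gradient
    return w                           # 4. repeat steps 2~3

# With R and p estimated as above, steepest_descent(R, p) approaches the
# analytical solution R^{-1} p without ever inverting R.
```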

6 2.1.3 Stochastic Gradient Descent
Approximate ∇ξ(w) by randomly selecting one training example at a time:
1. Apply an input vector x_k.
2. Compute the error ε_k = d_k - w(k)^T x_k.
3. Approximate the gradient by -2 ε_k x_k.
4. Update the weights: w(k+1) = w(k) + 2 μ ε_k x_k.
5. Repeat 1~4 with the next input vector.
No calculation of R or p is needed.
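A minimal sketch of this per-example (Widrow-Hoff) update; the function and parameter names are illustrative.

```python
import numpy as np

def lms_train(X, d, mu=0.01, n_epochs=20, seed=0):
    """Per-example LMS: w <- w + 2*mu*err*x."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for k in rng.permutation(len(X)):   # 1. pick one training example at a time
            err = d[k] - w @ X[k]           # 2. instantaneous error d_k - y_k
            w += 2.0 * mu * err * X[k]      # 3.-4. gradient estimate and weight update
    return w                                # 5. repeat over the training set
```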

7 Drawback: time-consuming.
Improvement: the mini-batch training method.
○ Practical considerations: (a) number of training vectors, (b) stopping criteria, (c) initial weights, (d) step size.

8 2.1.4 Conjugate Gradient Descent
-- Drawback: can only minimize quadratic functions, e.g., f(w) = (1/2) w^T A w - b^T w + c.
Advantage: guaranteed to find the optimum solution in at most n iterations, where n is the size of matrix A.
A-conjugate vectors: let A be an n × n symmetric, positive-definite matrix. Vectors s(0), s(1), ..., s(n-1) are A-conjugate if s(i)^T A s(j) = 0 for all i ≠ j.
* If A = I (the identity matrix), conjugacy reduces to orthogonality.

9 The set S = {s(0), ..., s(n-1)} forms a basis for the space R^n.
The solution w* in R^n can therefore be written as a linear combination of the s(i).
The conjugate-direction method for minimizing f(w) is defined by
w(i+1) = w(i) + η(i) s(i), i = 0, 1, ..., n-1,
where w(0) is an arbitrary starting vector and η(i) is determined by minimizing f(w(i) + η s(i)) with respect to η.
How to determine the directions s(i)? Define the residual r(i) = b - A w(i), which points in the steepest descent direction of f at w(i). Let
s(i) = r(i) + β(i) s(i-1).   (A)

10 Multiply both sides of (A) by s(i-1)^T A:
s(i-1)^T A s(i) = s(i-1)^T A r(i) + β(i) s(i-1)^T A s(i-1).
In order for s(i) and s(i-1) to be A-conjugate, s(i-1)^T A s(i) = 0, giving
β(i) = - [s(i-1)^T A r(i)] / [s(i-1)^T A s(i-1)].   (B)
The directions generated by Eqs. (A) and (B) are A-conjugate.
We desire that evaluating β(i) does not require knowledge of A.
Polak-Ribiere formula: β(i) = [r(i)^T (r(i) - r(i-1))] / [r(i-1)^T r(i-1)].

11 Fletcher-Reeves formula: β(i) = [r(i)^T r(i)] / [r(i-1)^T r(i-1)].
* The conjugate-direction method for minimizing f(w): w(i+1) = w(i) + η(i) s(i), where w(0) is an arbitrary starting vector and η(i) is determined by a line search that minimizes f(w(i) + η s(i)).

12 Nonlinear Conjugate Gradient Algorithm
1. Initialize w(0) by an appropriate process; set r(0) = -∇f(w(0)) and s(0) = r(0).
2. Find η(i) by a line search minimizing f(w(i) + η s(i)) and set w(i+1) = w(i) + η(i) s(i).
3. Compute r(i+1) = -∇f(w(i+1)) and β(i+1) by the Polak-Ribiere (or Fletcher-Reeves) formula.
4. Set s(i+1) = r(i+1) + β(i+1) s(i); repeat 2~4 until convergence.
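A minimal sketch of this loop, using the Polak-Ribiere beta and a simple backtracking line search; the line-search details and clipping of beta at zero are assumptions for robustness, not prescribed by the slides.

```python
import numpy as np

def nonlinear_cg(f, grad, w0, n_iters=100, tol=1e-8):
    """Nonlinear conjugate gradient with the Polak-Ribiere beta."""
    w = np.asarray(w0, dtype=float)
    r = -grad(w)                       # negative gradient = steepest descent direction
    s = r.copy()                       # initial search direction
    for _ in range(n_iters):
        # Backtracking line search for eta (an illustrative choice).
        eta = 1.0
        while f(w + eta * s) > f(w) - 1e-4 * eta * (r @ s) and eta > 1e-12:
            eta *= 0.5
        w = w + eta * s
        r_new = -grad(w)
        if np.linalg.norm(r_new) < tol:
            break
        # Polak-Ribiere formula, clipped at zero for robustness.
        beta = max(0.0, r_new @ (r_new - r) / (r @ r))
        s = r_new + beta * s           # Eq. (A): new search direction
        r = r_new
    return w

# On a quadratic f(w) = 0.5*w@A@w - b@w this behaves like the linear method
# and reaches the minimum in roughly n steps.
```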

13 Example: a comparison of the convergence of gradient descent (green) and conjugate gradient (red) for minimizing a quadratic function. Conjugate gradient converges in at most n steps, where n is the size of the matrix of the system (here n = 2).

14 2.3 Applications
2.3.1 Echo Cancellation in Telephone Circuits
n: the incoming voice; s: the outgoing voice; the noise is the leakage of the incoming voice into the outgoing line; y: the output of the adaptive filter, which mimics that leakage.

15 Hybrid circuit: deals with the leakage issue; it attempts to isolate the incoming from the outgoing signals.
Adaptive filter: deals with the resulting choppy speech; it mimics the leakage of the incoming voice so that the leakage can be subtracted from the outgoing signal. Since s is not correlated with y, minimizing the power of the outgoing signal suppresses only the leakage term and leaves s intact.
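A toy numpy sketch of this arrangement: the adaptive filter sees the incoming voice n, its output y is subtracted from the outgoing line (s plus the leaked incoming voice), and LMS adapts the weights so that y cancels the leakage. All signals, the leakage path, and the parameter values are synthetic assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, taps, mu = 5000, 4, 0.01

n = rng.normal(size=T)                         # incoming voice (reference input)
s = np.sin(0.05 * np.arange(T))                # outgoing voice
leak = np.convolve(n, [0.6, 0.3, 0.1])[:T]     # assumed leakage path through the hybrid
line = s + leak                                # what actually leaves: voice + leakage

w = np.zeros(taps)                             # adaptive FIR filter weights
out = np.zeros(T)                              # echo-cancelled outgoing signal
for k in range(taps, T):
    x = n[k - taps + 1:k + 1][::-1]            # recent incoming samples
    y = w @ x                                  # filter output: mimics the leakage
    err = line[k] - y                          # residual ~ s (s is uncorrelated with y)
    w += 2 * mu * err * x                      # LMS update
    out[k] = err

# After adaptation, `out` is close to s: the leaked incoming voice has been removed.
```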

16 2.3.2 Predict Signal
An adaptive filter is trained to predict a signal: the filter input is a delayed version of the actual signal, and the desired (expected) output is the current signal.
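A small sketch of this prediction setup: the filter input is the signal delayed by one step, the desired output is the current sample, and LMS adapts the weights. The signal, filter length, and step size are illustrative choices.

```python
import numpy as np

signal = np.sin(0.1 * np.arange(2000))     # illustrative signal to be predicted
delay, taps, mu = 1, 8, 0.05

w = np.zeros(taps)
for k in range(taps + delay, len(signal)):
    x = signal[k - delay - taps + 1:k - delay + 1][::-1]  # delayed past samples (input)
    pred = w @ x                                          # filter output = prediction
    err = signal[k] - pred                                # desired output = current sample
    w += 2 * mu * err * x                                 # LMS update

# After training, the filter predicts the current sample from samples
# that are at least `delay` steps old.
```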

17 2.3.3 Reproduce Signal

18 2.3.4 Adaptive Beam-Forming Antenna Arrays
Antenna: a spatial array of sensors that are directional in their reception characteristics. The adaptive filter learns to steer the antenna array so that it responds to incoming signals regardless of their direction, while reducing its response to unwanted noise signals arriving from other directions.

19 2.4 Madaline: Many Adalines
○ Can the XOR function be realized? A single Adaline cannot (XOR is not linearly separable), but a Madaline built from several Adalines can, as sketched below.
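A minimal sketch of such a Madaline, with hand-picked (untrained) bipolar weights; the particular weight values are one illustrative choice among many.

```python
import numpy as np

def adaline(x, w, b):
    """Bipolar threshold unit: sign of the linear combination plus bias."""
    return 1 if np.dot(w, x) + b > 0 else -1

def madaline_xor(x1, x2):
    # Hidden Adaline 1 fires only for (+1, -1); Adaline 2 only for (-1, +1).
    h1 = adaline([x1, x2], [1, -1], -1)
    h2 = adaline([x1, x2], [-1, 1], -1)
    # Output Adaline acts as an OR of the two hidden units.
    return adaline([h1, h2], [1, 1], 1)

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, madaline_xor(a, b))   # +1 exactly when a != b
```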

20

21 2.4.2 Madaline Rule II (MRII)
○ Training algorithm: a trial-and-error procedure based on a minimum disturbance principle (those nodes that can affect the output error while incurring the least change in their weights should have precedence in the learning process).
○ Procedure:
1. Input a training pattern.
2. Count the number of incorrect values in the output layer.

22 3. For all units on the output layer:
3.1. Select the first previously unselected error node whose analog output is closest to zero (this node can reverse its bipolar output with the least change in its weights).
3.2. Change the weights on the selected unit so that its bipolar output changes.
3.3. Input the same training pattern.
3.4. If the number of errors is reduced, accept the weight change; otherwise, restore the original weights.
4. Repeat Step 3 for all layers except the input layer.

23 5. For all units on the output layer:
5.1. Select the previously unselected pair of units whose analog outputs are closest to zero.
5.2. Apply a weight correction to both units so that their bipolar outputs change.
5.3. Input the same training pattern.
5.4. If the number of errors is reduced, accept the correction; otherwise, restore the original weights.
6. Repeat Step 5 for all layers except the input layer.
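To make the minimum-disturbance idea concrete, here is a compact sketch of the single-unit trial (Steps 2 and 3 above) on the hidden layer of a one-hidden-layer Madaline. The network shapes, the margin, and the minimal-norm weight change used to flip a unit are illustrative assumptions, not the exact formulas from the slides.

```python
import numpy as np

def bipolar(v):
    """Hard-limiting bipolar activation: +1 or -1."""
    return np.where(v >= 0.0, 1, -1)

def forward(W_h, W_o, x):
    h = bipolar(W_h @ x)             # hidden Adaline outputs
    return h, bipolar(W_o @ h)       # output Adaline outputs

def mrii_trial(W_h, W_o, x, target, margin=0.1):
    """One minimum-disturbance trial on the hidden layer for one pattern."""
    _, out = forward(W_h, W_o, x)
    errors = np.sum(out != target)   # Step 2: count incorrect output values
    if errors == 0:
        return W_h
    nets = W_h @ x
    j = np.argmin(np.abs(nets))      # Step 3.1: unit whose analog output is closest to zero
    # Step 3.2: minimal-norm weight change that flips this unit's bipolar output.
    desired_net = -bipolar(nets[j]) * margin
    W_trial = W_h.copy()
    W_trial[j] += (desired_net - nets[j]) * x / (x @ x)
    # Steps 3.3-3.4: re-apply the pattern; keep the change only if errors drop.
    _, out_new = forward(W_trial, W_o, x)
    return W_trial if np.sum(out_new != target) < errors else W_h
```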

24 ※ Steps 5 and 6 can be repeated with triplets, quadruplets, or longer combinations of units until satisfactory results are obtained.
The MRII learning rule considers networks with only one hidden layer. For networks with more hidden layers, the backpropagation learning strategy, to be discussed later, can be employed.

25 2.4.3 A Madaline for Translation-Invariant Pattern Recognition

26 ○ Relationships among the weight matrices of the Adalines

27 ○ Extension -- Multiple slabs with different key weight matrices for discriminating more than two classes of patterns.

