Neural Networks 2nd Edition Simon Haykin 柯博昌 Chap 3. Single-Layer Perceptrons.


Slide 1: Neural Networks, 2nd Edition, Simon Haykin, 柯博昌. Chap. 3 Single-Layer Perceptrons

Slide 2: Adaptive Filtering Problem
Dynamic system. The external behavior of the system: T: {x(i), d(i); i = 1, 2, …, n, …}, where x(i) = [x_1(i), x_2(i), …, x_m(i)]^T.
x(i) can arise from:
– Spatial: x(i) is a snapshot of data.
– Temporal: x(i) is uniformly spaced in time.
Signal-flow graph of the adaptive filter:
– Filtering process: y(i) is produced in response to x(i); e(i) = d(i) - y(i).
– Adaptive process: automatic adjustment of the synaptic weights in accordance with e(i).
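A minimal sketch of one filtering-plus-adaptation step for a linear neuron, assuming an LMS-style weight adjustment (the slide only says the weights are adjusted "in accordance with e(i)"); the function name and step size are illustrative:

```python
import numpy as np

def adaptive_filter_step(w, x, d, eta=0.01):
    """One combined step of the filtering and adaptive processes.

    w   : current weight vector, shape (m,)
    x   : input vector x(i), shape (m,)
    d   : desired response d(i)
    eta : step size (an assumed value, not from the slides)
    """
    y = float(w @ x)       # filtering process: y(i) produced in response to x(i)
    e = d - y              # error signal e(i) = d(i) - y(i)
    w = w + eta * e * x    # adaptive process: adjust weights using e(i)
    return w, y, e
```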

Slide 3: Unconstrained Optimization Techniques
– Let C(w) be a continuously differentiable function of some unknown weight (parameter) vector w. C(w) maps w into real numbers.
– Goal: find an optimal solution w* that satisfies C(w*) ≤ C(w), i.e., minimize C(w) with respect to w.
– Necessary condition for optimality: ∇C(w*) = 0 (∇ is the gradient operator).
– A class of unconstrained optimization algorithms: starting with an initial guess denoted by w(0), generate a sequence of weight vectors w(1), w(2), …, such that the cost function C(w) is reduced at each iteration of the algorithm.

Slide 4: Method of Steepest Descent
The successive adjustments applied to w are in the direction of steepest descent, that is, in a direction opposite to the gradient vector ∇C(w). Let g(n) = ∇C(w(n)).
The steepest descent algorithm: w(n+1) = w(n) - η g(n)
– η: a positive constant called the step size or learning-rate parameter.
– Δw(n) = w(n+1) - w(n) = -η g(n)
– Small η: overdamps the transient response. Large η: underdamps the transient response. If η exceeds a certain value, the algorithm becomes unstable.
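A short sketch of the update w(n+1) = w(n) - η g(n); the quadratic example cost, step size, and iteration count are illustrative assumptions, not from the slides:

```python
import numpy as np

def steepest_descent(grad, w0, eta=0.1, n_iter=500):
    """Steepest-descent loop: w(n+1) = w(n) - eta * g(n)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        g = grad(w)        # g(n): gradient of C at w(n)
        w = w - eta * g    # step opposite to the gradient
    return w

# Example: quadratic cost C(w) = 0.5 * w^T A w - b^T w, gradient A w - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_star = steepest_descent(lambda w: A @ w - b, w0=np.zeros(2))
```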

Slide 5: Newton’s Method
Idea: minimize the quadratic approximation of the cost function C(w) around the current point w(n), obtained from a second-order Taylor series expansion of C(w) around w(n):
ΔC(w) ≈ g^T(n) Δw(n) + (1/2) Δw^T(n) H(n) Δw(n), where H(n) is the Hessian of C(w).
ΔC(w) is minimized when Δw(n) = -H^{-1}(n) g(n), giving w(n+1) = w(n) - H^{-1}(n) g(n).
Generally speaking, Newton’s method converges quickly.
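A sketch of one Newton update, assuming the gradient and Hessian are available as callables; the quadratic example cost is an illustration only:

```python
import numpy as np

def newton_step(w, grad, hess):
    """One Newton update: w(n+1) = w(n) - H^{-1}(n) g(n)."""
    g = grad(w)
    H = hess(w)
    # Solve H * delta = g rather than forming H^{-1} explicitly
    return w - np.linalg.solve(H, g)

# For the quadratic cost C(w) = 0.5 w^T A w - b^T w, a single step reaches the minimum.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_min = newton_step(np.zeros(2), lambda w: A @ w - b, lambda w: A)
```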

Slide 6: Gauss-Newton Method
The Gauss-Newton method is applicable to a cost function C(w) that is the sum of error squares: C(w) = (1/2) Σ_{i=1}^{n} e^2(i).
Linearize the errors around w(n): e'(i, w) = e(i) + [∇e(i)]^T (w - w(n)). The Jacobian J(n) is the n-by-m matrix whose i-th row is [∇e(i)]^T.
Goal: w(n+1) = arg min_w (1/2) ||e(n) + J(n)(w - w(n))||^2.

Slide 7: Gauss-Newton Method (Cont.)
Differentiating this expression with respect to w and setting the result to zero yields
w(n+1) = w(n) - (J^T(n) J(n))^{-1} J^T(n) e(n).
To guard against the possibility that J(n) is rank deficient, a small positive constant δ is added to the diagonal:
w(n+1) = w(n) - (J^T(n) J(n) + δI)^{-1} J^T(n) e(n).
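A sketch of the regularized Gauss-Newton update written above; the function name and the default choice of δ are assumptions:

```python
import numpy as np

def gauss_newton_step(w, e, J, delta=1e-6):
    """w(n+1) = w(n) - (J^T J + delta*I)^{-1} J^T e(n).

    e     : error vector e(n), shape (n,)
    J     : Jacobian of e with respect to w, shape (n, m)
    delta : small regularizer guarding against a rank-deficient J(n)
    """
    m = J.shape[1]
    return w - np.linalg.solve(J.T @ J + delta * np.eye(m), J.T @ e)
```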

Slide 8: Linear Least-Squares Filter
Characteristics of the linear least-squares filter:
– The single neuron around which it is built is linear.
– The cost function C(w) consists of the sum of error squares.
The error vector is e(n) = d(n) - X(n) w(n), where d(n) = [d(1), d(2), …, d(n)]^T and X(n) = [x(1), x(2), …, x(n)]^T.
Substituting this into the equation derived from the Gauss-Newton method gives w(n+1) = X^+(n) d(n), where X^+(n) = (X^T(n) X(n))^{-1} X^T(n) is the pseudoinverse of X(n).
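A sketch of the pseudoinverse solution using NumPy; the synthetic data (the assumed true weights and noise level) is purely illustrative:

```python
import numpy as np

def least_squares_filter(X, d):
    """Weight vector of the linear least-squares filter: w = X^+ d,
    where X^+ is the pseudoinverse of the data matrix X(n)."""
    return np.linalg.pinv(X) @ d

# Hypothetical data: 100 input vectors of dimension 3 with noisy desired responses.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([0.5, -1.0, 2.0])
d = X @ w_true + 0.01 * rng.standard_normal(100)
w_hat = least_squares_filter(X, d)   # close to w_true
```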

Slide 9: Wiener Filter
Limiting form of the linear least-squares filter for an ergodic environment.
– Let w_0 denote the Wiener solution to the linear optimum filtering problem.
– Let R_x denote the correlation matrix of the input vector x(i).
– Let r_xd denote the cross-correlation vector of x(i) and d(i).
The Wiener solution is w_0 = R_x^{-1} r_xd.
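A sketch that estimates the Wiener solution from sample (time) averages, leaning on the ergodicity assumption; the helper name and data layout are illustrative:

```python
import numpy as np

def wiener_solution(x_samples, d_samples):
    """Estimate w_0 = R_x^{-1} r_xd from sample averages
    (ergodicity lets time averages stand in for ensemble averages)."""
    X = np.asarray(x_samples, dtype=float)   # shape (n, m): one input vector per row
    d = np.asarray(d_samples, dtype=float)   # shape (n,)
    Rx = X.T @ X / len(X)                    # estimate of the correlation matrix R_x
    rxd = X.T @ d / len(X)                   # estimate of the cross-correlation vector r_xd
    return np.linalg.solve(Rx, rxd)
```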

Slide 10: Least-Mean-Square (LMS) Algorithm
LMS is based on instantaneous values for the cost function: C(w) = (1/2) e^2(n), where e(n) is the error signal measured at time n.
ŵ(n) is used in place of w(n) to emphasize that LMS produces an estimate of the weight vector that would result from the method of steepest descent.
Summary of the LMS algorithm: e(n) = d(n) - ŵ^T(n) x(n); ŵ(n+1) = ŵ(n) + η x(n) e(n).
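A sketch of the LMS recursion summarized above; the step size and the zero initial weights are assumed choices:

```python
import numpy as np

def lms(X, d, eta=0.05):
    """LMS algorithm: w_hat(n+1) = w_hat(n) + eta * x(n) * e(n)."""
    n, m = X.shape
    w = np.zeros(m)                  # initial estimate w_hat(0) = 0
    errors = np.empty(n)
    for i in range(n):
        y = w @ X[i]                 # filter output in response to x(n)
        e = d[i] - y                 # instantaneous error e(n)
        w = w + eta * e * X[i]       # stochastic-gradient weight update
        errors[i] = e
    return w, errors
```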

Slide 11: Virtues and Limitations of LMS
Virtues:
– Simplicity
Limitations:
– Slow rate of convergence
– Sensitivity to variations in the eigenstructure of the input

Slide 12: Learning Curve

Slide 13: Learning Rate Annealing
– Normal approach: keep the learning-rate parameter constant, η(n) = η_0.
– Stochastic approximation: η(n) = c/n. There is a danger of parameter blowup for small n when c is large.
– Search-then-converge schedule: η(n) = η_0 / (1 + n/τ).
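A sketch of the two annealing schedules; the default parameter values (c, η_0, τ) are illustrative assumptions:

```python
def eta_stochastic(n, c=1.0):
    """Stochastic-approximation schedule eta(n) = c / n.
    Blows up for small n when c is large."""
    return c / n

def eta_search_then_converge(n, eta0=0.1, tau=100.0):
    """Search-then-converge schedule eta(n) = eta0 / (1 + n/tau):
    roughly constant for n << tau, then decays like 1/n."""
    return eta0 / (1.0 + n / tau)
```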

Slide 14: Perceptron
Let x_0 = 1 and b = w_0, so the bias is absorbed into the weight vector.
– The simplest form of a neural network used for the classification of patterns said to be linearly separable.
– Goal: classify the set {x(1), x(2), …, x(n)} into one of two classes, C_1 or C_2.
– Decision rule: assign x(i) to class C_1 if y = +1 and to class C_2 if y = -1.
w^T x > 0 for every input vector x belonging to class C_1
w^T x ≤ 0 for every input vector x belonging to class C_2

Slide 15: Perceptron (Cont.)
Algorithms:
1. w(n+1) = w(n) if w^T x(n) > 0 and x(n) belongs to class C_1
   w(n+1) = w(n) if w^T x(n) ≤ 0 and x(n) belongs to class C_2
2. w(n+1) = w(n) - η(n) x(n) if w^T x(n) > 0 and x(n) belongs to class C_2
   w(n+1) = w(n) + η(n) x(n) if w^T x(n) ≤ 0 and x(n) belongs to class C_1
Error-correction learning rule form: w(n+1) = w(n) + η [d(n) - y(n)] x(n)
– Smaller η provides stable weight estimates.
– Larger η provides fast adaptation.
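A sketch of the error-correction form of the perceptron rule; the epoch count and fixed η are assumed, and the inputs are expected to carry x_0 = 1 so that w[0] plays the role of the bias b:

```python
import numpy as np

def perceptron_train(X, labels, eta=1.0, n_epochs=10):
    """Error-correction perceptron rule: w(n+1) = w(n) + eta * [d(n) - y(n)] * x(n).

    X      : inputs with x_0 = 1 prepended, shape (n, m+1)
    labels : desired responses d(n) in {+1, -1}
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, d in zip(X, labels):
            y = 1.0 if w @ x > 0 else -1.0   # decision rule: C_1 if y = +1, C_2 if y = -1
            w = w + eta * (d - y) * x        # no change when the example is classified correctly
    return w
```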

