Neural Networks, 2nd Edition, Simon Haykin (柯博昌). Chap. 3: Single-Layer Perceptrons

2 Adaptive Filtering Problem
Dynamic system: its external behavior is described by the data set
T: {x(i), d(i); i = 1, 2, ..., n, ...}, where x(i) = [x_1(i), x_2(i), ..., x_m(i)]^T.
x(i) can arise from:
- Spatial: x(i) is a snapshot of data.
- Temporal: x(i) is uniformly spaced in time.
Signal-flow graph of the adaptive filter:
- Filtering process: y(i) is produced in response to x(i), and the error signal is e(i) = d(i) - y(i).
- Adaptive process: automatic adjustment of the synaptic weights in accordance with e(i).
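
A minimal sketch of these two processes (illustrative function and variable names; NumPy assumed; the weight-update rule is left as a pluggable argument since the chapter develops several):

```python
import numpy as np

def adaptive_filter(X, d, adapt, w0):
    """Filtering process: y(i) = w^T x(i) and e(i) = d(i) - y(i).
    Adaptive process: `adapt` adjusts the weights in accordance with e(i)."""
    w = np.asarray(w0, dtype=float)
    for x_i, d_i in zip(X, d):
        y_i = w @ x_i              # output produced in response to x(i)
        e_i = d_i - y_i            # error signal e(i) = d(i) - y(i)
        w = adapt(w, x_i, e_i)     # automatic adjustment of the synaptic weights
    return w
```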

3 Unconstrained Optimization Techniques
Let C(w) be a continuously differentiable cost function of some unknown weight (parameter) vector w; C(w) maps w into the real numbers.
Goal: find an optimal solution w* that satisfies C(w*) ≤ C(w) for all w, i.e., minimize C(w) with respect to w.
Necessary condition for optimality: ∇C(w*) = 0, where ∇ is the gradient operator.
A class of unconstrained optimization algorithms: starting with an initial guess w(0), generate a sequence of weight vectors w(1), w(2), ..., such that the cost function C(w) is reduced at each iteration of the algorithm.
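
A minimal sketch of this class of algorithms (names are illustrative; NumPy assumed): iterate a supplied update rule until the necessary condition ∇C(w) ≈ 0 is met.

```python
import numpy as np

def iterative_descent(grad_C, update_rule, w0, tol=1e-6, max_iter=10_000):
    """Generic unconstrained minimizer: apply `update_rule` repeatedly so that
    C(w) is reduced, stopping when the gradient is numerically zero."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_C(w)                 # gradient of the cost at the current point
        if np.linalg.norm(g) < tol:   # necessary condition for optimality
            break
        w = update_rule(w, g)         # e.g. steepest descent or Newton's method
    return w
```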

4 Method of Steepest Descent
The successive adjustments applied to w are in the direction of steepest descent, that is, in a direction opposite to the gradient vector ∇C(w). Let g(n) = ∇C(w(n)).
The steepest-descent algorithm: w(n+1) = w(n) - η g(n), where η is a positive constant called the step size or learning-rate parameter, so that
Δw(n) = w(n+1) - w(n) = -η g(n).
- Small η: overdamps the transient response.
- Large η: underdamps the transient response.
- If η exceeds a certain value, the algorithm becomes unstable.
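
A minimal sketch of the steepest-descent update, tried on an illustrative quadratic cost C(w) = (1/2) w^T A w - b^T w (A, b, and the value of η here are assumptions, not from the text):

```python
import numpy as np

def steepest_descent(grad_C, w0, eta=0.1, n_iter=500):
    """w(n+1) = w(n) - eta * g(n), with g(n) the gradient at w(n)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        w = w - eta * grad_C(w)
    return w

# Quadratic example: C(w) = 0.5 * w^T A w - b^T w, minimized where A w = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_star = steepest_descent(lambda w: A @ w - b, w0=np.zeros(2), eta=0.1)
# If eta exceeds 2 / (largest eigenvalue of A), the iteration becomes unstable.
```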

5 Newton's Method
Idea: minimize the quadratic approximation of the cost function C(w) around the current point w(n).
Applying a second-order Taylor series expansion of C(w) around w(n):
ΔC(w(n)) ≈ g^T(n) Δw(n) + (1/2) Δw^T(n) H(n) Δw(n),
where g(n) is the gradient and H(n) is the Hessian matrix of C(w) evaluated at w(n).
ΔC(w(n)) is minimized when Δw(n) = -H^{-1}(n) g(n), which gives the update
w(n+1) = w(n) - H^{-1}(n) g(n).
Generally speaking, Newton's method converges quickly.
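
A minimal sketch of the Newton update (illustrative names; NumPy assumed). Solving the linear system H Δw = -g avoids forming H^{-1} explicitly; on the quadratic cost above, a single step reaches the minimum.

```python
import numpy as np

def newton_method(grad_C, hess_C, w0, n_iter=20):
    """w(n+1) = w(n) - H^{-1}(n) g(n), implemented via a linear solve."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        g = grad_C(w)
        H = hess_C(w)
        w = w - np.linalg.solve(H, g)   # step = -H^{-1} g without explicit inversion
    return w
```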

6 Gauss-Newton Method
The Gauss-Newton method is applicable to a cost function C(w) that is the sum of error squares:
C(w) = (1/2) Σ_{i=1}^{n} e²(i).
Linearizing the error around the current weight vector w(n) gives
e'(n, w) = e(n) + J(n) (w - w(n)),
where e(n) = [e(1), e(2), ..., e(n)]^T and the Jacobian J(n) is [∇e(n)]^T, the n-by-m matrix whose i-th row is the gradient of e(i) with respect to w.
Goal: w(n+1) = arg min_w { (1/2) ||e(n) + J(n) (w - w(n))||² }.

7 Gauss-Newton Method (Cont.)
Differentiating this expression with respect to w and setting the result to zero yields
J^T(n) e(n) + J^T(n) J(n) (w - w(n)) = 0,
and hence the update
w(n+1) = w(n) - (J^T(n) J(n))^{-1} J^T(n) e(n).
To guard against the possibility that J(n) is rank deficient (which would make J^T(n) J(n) singular), a small diagonal loading term δI is added:
w(n+1) = w(n) - (J^T(n) J(n) + δI)^{-1} J^T(n) e(n), with δ a small positive constant.
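
A minimal sketch of one Gauss-Newton update with the diagonal loading term δI (illustrative names; NumPy assumed):

```python
import numpy as np

def gauss_newton_step(w, e, J, delta=1e-3):
    """One Gauss-Newton step for a sum-of-error-squares cost.
    e: error vector e(n); J: n-by-m Jacobian of e with respect to w.
    delta * I guards against J^T J being (near) rank deficient."""
    m = w.shape[0]
    JTJ = J.T @ J + delta * np.eye(m)
    return w - np.linalg.solve(JTJ, J.T @ e)
```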

8 Linear Least-Squares Filter
Characteristics of the linear least-squares filter:
- The single neuron around which it is built is linear.
- The cost function C(w) consists of the sum of error squares.
The error vector is e(n) = d(n) - X(n) w(n), where d(n) = [d(1), d(2), ..., d(n)]^T is the desired-response vector and X(n) = [x(1), x(2), ..., x(n)]^T is the n-by-m data matrix, so the Jacobian is J(n) = -X(n).
Substituting this into the update derived from the Gauss-Newton method gives
w(n+1) = w(n) + (X^T(n) X(n))^{-1} X^T(n) [d(n) - X(n) w(n)] = (X^T(n) X(n))^{-1} X^T(n) d(n) = X^+(n) d(n),
where X^+(n) is the pseudoinverse of X(n).
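
A minimal sketch of the resulting filter, w = X^+ d, on synthetic data (the data-generation details are illustrative, not from the text):

```python
import numpy as np

def linear_least_squares_filter(X, d):
    """Weight vector of the linear least-squares filter: w = X^+ d,
    where X^+ is the Moore-Penrose pseudoinverse of the data matrix X."""
    return np.linalg.pinv(X) @ d

# Illustrative use: n = 200 observations of an m = 3 dimensional input.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                 # X(n) = [x(1), ..., x(n)]^T
w_true = np.array([0.5, -1.0, 2.0])
d = X @ w_true + 0.01 * rng.standard_normal(200)  # desired response with small noise
w_hat = linear_least_squares_filter(X, d)         # recovers w_true closely
```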

9 Wiener Filter
The Wiener filter is the limiting form of the linear least-squares filter for an ergodic environment.
- Let R_x = E[x(i) x^T(i)] denote the correlation matrix of the input vector x(i).
- Let r_xd = E[x(i) d(i)] denote the cross-correlation vector between x(i) and d(i).
- Let w_0 denote the Wiener solution to the linear optimum filtering problem. As n grows, the least-squares solution converges to it:
  lim_{n→∞} w(n+1) = R_x^{-1} r_xd = w_0.
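
A minimal sketch of the Wiener solution with the ensemble averages replaced by time averages, as the ergodicity assumption permits (illustrative names; NumPy assumed). For large n this coincides with the least-squares solution X^+ d above.

```python
import numpy as np

def wiener_solution(X, d):
    """w0 = R_x^{-1} r_xd, estimating R_x and r_xd by time averages."""
    n = X.shape[0]
    R_x = (X.T @ X) / n       # estimate of the correlation matrix of x(i)
    r_xd = (X.T @ d) / n      # estimate of the cross-correlation vector
    return np.linalg.solve(R_x, r_xd)
```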

10 Least-Mean-Square (LMS) Algorithm
The LMS algorithm is based on instantaneous values for the cost function:
C(w) = (1/2) e²(n),
where e(n) is the error signal measured at time n. The resulting instantaneous gradient estimate leads to the update
ŵ(n+1) = ŵ(n) + η x(n) e(n).
The hat in ŵ(n) is used in place of w(n) to emphasize that LMS produces an estimate of the weight vector that would result from the method of steepest descent.
Summary of the LMS algorithm:
- Initialization: set ŵ(0) = 0.
- For n = 1, 2, ...: compute e(n) = d(n) - ŵ^T(n) x(n) and ŵ(n+1) = ŵ(n) + η x(n) e(n).
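
A minimal sketch of a single pass of the LMS algorithm (illustrative names; NumPy assumed; η is a user-selected parameter):

```python
import numpy as np

def lms(X, d, eta=0.01):
    """Least-mean-square algorithm over n training samples.
    X: n-by-m matrix of input vectors x(n); d: desired responses d(n)."""
    n, m = X.shape
    w = np.zeros(m)                  # initialization: w_hat(0) = 0
    errors = np.empty(n)
    for i in range(n):
        e = d[i] - w @ X[i]          # e(n) = d(n) - w_hat^T(n) x(n)
        w = w + eta * X[i] * e       # w_hat(n+1) = w_hat(n) + eta * x(n) * e(n)
        errors[i] = e
    return w, errors
```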

11 Virtues and Limitations of LMS
Virtues:
- Simplicity.
Limitations:
- Slow rate of convergence.
- Sensitivity to variations in the eigenstructure of the input.

12 Learning Curve
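
The slide carries only the title; in this chapter the learning curve is commonly the squared error e²(n) averaged over an ensemble of independent trials and plotted against the iteration number n. A minimal sketch of how such a curve could be estimated, reusing the lms() sketch above (make_data is a hypothetical data generator):

```python
import numpy as np

def lms_learning_curve(make_data, eta=0.01, n_trials=100):
    """Ensemble-averaged squared error of LMS as a function of iteration n."""
    curves = []
    for _ in range(n_trials):
        X, d = make_data()                # fresh realization of the environment
        _, errors = lms(X, d, eta=eta)    # lms() as sketched above
        curves.append(errors ** 2)
    return np.mean(curves, axis=0)        # mean squared error versus n
```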

13 Learning-Rate Annealing
- Normal approach: keep the learning-rate parameter constant, η(n) = η_0.
- Stochastic approximation: η(n) = c/n, with c a constant. There is a danger of parameter blowup for small n when c is large.
- Search-then-converge schedule: η(n) = η_0 / (1 + n/τ), with user-selected constants η_0 and τ.
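
A minimal sketch of the three schedules as reconstructed above (the parameter values η_0 = 0.1, c = 1.0, and τ = 100 are illustrative):

```python
def eta_constant(n, eta0=0.1):
    return eta0                        # "normal" approach: constant learning rate

def eta_stochastic(n, c=1.0):
    return c / n                       # stochastic approximation: blows up for small n if c is large

def eta_search_then_converge(n, eta0=0.1, tau=100.0):
    return eta0 / (1.0 + n / tau)      # roughly eta0 while n << tau, then decays like tau*eta0/n
```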

14 Perceptron
The perceptron is the simplest form of a neural network used for the classification of patterns said to be linearly separable.
Let x_0 = 1 and b = w_0, so that the bias is absorbed into the weight vector.
- Goal: classify the set {x(1), x(2), ..., x(n)} into one of two classes, C_1 or C_2.
- Decision rule: assign x(i) to class C_1 if y = +1 and to class C_2 if y = -1, which requires
  w^T x > 0 for every input vector x belonging to class C_1, and
  w^T x ≤ 0 for every input vector x belonging to class C_2.

15 Perceptron (Cont.)
Perceptron learning algorithm:
1. No correction:
   w(n+1) = w(n) if w^T(n) x(n) > 0 and x(n) belongs to class C_1;
   w(n+1) = w(n) if w^T(n) x(n) ≤ 0 and x(n) belongs to class C_2.
2. Correction:
   w(n+1) = w(n) - η(n) x(n) if w^T(n) x(n) > 0 and x(n) belongs to class C_2;
   w(n+1) = w(n) + η(n) x(n) if w^T(n) x(n) ≤ 0 and x(n) belongs to class C_1.
Equivalently, in error-correction learning-rule form:
w(n+1) = w(n) + η [d(n) - y(n)] x(n).
- A smaller η provides stable weight estimates.
- A larger η provides fast adaptation.
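
A minimal sketch of the error-correction form of the perceptron rule (NumPy assumed; class labels are coded as +1 for C_1 and -1 for C_2, and the bias is absorbed via x_0 = 1):

```python
import numpy as np

def perceptron_train(X, d, eta=1.0, n_epochs=50):
    """w(n+1) = w(n) + eta * [d(n) - y(n)] * x(n), with d(n), y(n) in {+1, -1}."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # x_0 = 1 absorbs the bias b = w_0
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x, target in zip(X, d):
            y = 1.0 if w @ x > 0 else -1.0          # decision rule: sign of w^T x
            w = w + eta * (target - y) * x          # no change when the sample is classified correctly
    return w

def perceptron_predict(X, w):
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.where(X @ w > 0, 1, -1)
```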