Ch2: Adaline and Madaline

Slides from: Doug Gray, David Poole

Ch2: Adaline and Madaline
Adaline: ADAptive LInear NEuron; Madaline: Multiple (Many) Adalines
2.1 Adaline (Bernard Widrow, Stanford Univ.)
Structure: inputs, weights, a bias term, and an adaptation loop (feedback, error, gain, adjustment terms). The unit forms the linear combination y of its weighted inputs plus the bias, and produces the output d = f(y).
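As a concrete reference for the structure just described, here is a minimal NumPy sketch of an Adaline unit; the function name, the bipolar quantizer option, and the example weights are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def adaline_output(x, w, b, quantize=False):
    """Adaline: linear combination of inputs plus bias, y = w.x + b.
    With the identity output function f(y) = y the unit is linear;
    quantize=True applies a bipolar threshold instead."""
    y = np.dot(w, x) + b
    return np.sign(y) if quantize else y

# Example: two inputs with hand-chosen weights
x = np.array([1.0, -1.0])
w = np.array([0.5, 0.3])
print(adaline_output(x, w, b=0.1))                  # linear (analog) output
print(adaline_output(x, w, b=0.1, quantize=True))   # bipolar output
```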

2.1.1 Least Mean Square (LMS) Learning
◎ Input vectors: x_1, x_2, ..., x_L; ideal (desired) outputs: d_1, d_2, ..., d_L; actual outputs: y_1, y_2, ..., y_L.
Assume a linear output function, f(y) = y, so the actual output is simply y_k = w^T x_k.
Mean square error: ξ(w) = <(d_k - y_k)^2> = <d_k^2> + w^T R w - 2 p^T w,
where the correlation matrix R = <x_k x_k^T>, p = <d_k x_k>, and < > denotes the expected value over the training set.

Idea: set the gradient to zero, ∇ξ(w) = 2 R w - 2 p = 0, and obtain the optimal weight vector w* = R^{-1} p.
Practical difficulties of the analytical formula:
1. Large dimensions - R and its inverse are difficult to calculate.
2. The expected values < > require knowledge of the underlying probabilities.
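For illustration, the analytical solution can be evaluated when the expected values are replaced by sample averages over a finite training set; a minimal sketch under that assumption (the synthetic data and variable names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: inputs X (L x n) and desired outputs d (L,)
L, n = 200, 3
X = rng.normal(size=(L, n))
true_w = np.array([0.5, -1.0, 2.0])
d = X @ true_w + 0.01 * rng.normal(size=L)

# Sample estimates of the correlation matrix R = <x x^T> and p = <d x>
R = X.T @ X / L
p = X.T @ d / L

# Analytic minimizer of the mean square error: w* = R^{-1} p
w_star = np.linalg.solve(R, p)
print(w_star)   # close to true_w
```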

2.1.2 Steepest Descent
The graph of ξ(w) is a paraboloid (a quadratic bowl), so it has a single global minimum at w*.

Let w(k) denote the weight vector at iteration k. Steps:
1. Initialize the weight values w(0).
2. Determine the steepest descent direction: -∇ξ(w(k)) = 2(p - R w(k)).
3. Modify the weight values: w(k+1) = w(k) + μ (p - R w(k)).
4. Repeat 2~3.
No calculation of the matrix inverse R^{-1} is required.
Drawbacks: i) Knowing R and p is equivalent to knowing the error surface in advance. ii) Steepest descent training is a batch training method.
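A minimal sketch of the batch steepest-descent iteration above, assuming R and p are already available; the example matrices, step size, and stopping test are illustrative.

```python
import numpy as np

def steepest_descent(R, p, mu=0.05, n_iters=1000, tol=1e-10):
    """Batch steepest descent on the quadratic MSE surface.
    The gradient of xi(w) is 2(R w - p), so p - R w is the descent direction."""
    w = np.zeros_like(p)
    for _ in range(n_iters):
        direction = p - R @ w            # steepest-descent direction (up to a factor of 2)
        if np.linalg.norm(direction) < tol:
            break
        w = w + mu * direction           # w(k+1) = w(k) + mu (p - R w(k))
    return w

R = np.array([[2.0, 0.5], [0.5, 1.0]])   # example correlation matrix
p = np.array([1.0, 0.3])                 # example cross-correlation vector
print(steepest_descent(R, p), np.linalg.solve(R, p))   # both give w*
```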

2.1.3 Stochastic Gradient Descent
Approximate the gradient ∇ξ by randomly selecting one training example at a time:
1. Apply an input vector x_k.
2. Compute the error ε_k = d_k - w(k)^T x_k.
3. Use the instantaneous gradient estimate -2 ε_k x_k in place of ∇ξ.
4. Update the weights: w(k+1) = w(k) + 2 μ ε_k x_k (the Widrow-Hoff / delta rule).
5. Repeat 1~4 with the next input vector.
No calculation of R and p is required.
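A minimal sketch of the per-sample LMS (delta-rule) update, assuming training inputs stored row-wise in a matrix; the learning rate, epoch count, and synthetic data are illustrative.

```python
import numpy as np

def lms_train(X, d, mu=0.01, n_epochs=20, seed=0):
    """Stochastic LMS / Widrow-Hoff learning: one randomly chosen example per update,
    w <- w + 2 mu (d_k - w.x_k) x_k, so R and p are never formed explicitly."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for k in rng.permutation(len(X)):
            err = d[k] - w @ X[k]         # instantaneous error on example k
            w += 2 * mu * err * X[k]      # delta-rule update
    return w

# Tiny usage example with synthetic linear data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
d = X @ np.array([0.5, -1.0, 2.0])
print(lms_train(X, d))                    # approaches [0.5, -1.0, 2.0]
```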

Drawback: updating on one example at a time is time consuming. Improvement: the mini-batch training method (update on small groups of examples).
○ Practical Considerations: (a) number of training vectors, (b) stopping criteria, (c) initial weights, (d) step size.

2.1.4 Conjugate Gradient Descent
Drawback: exactly minimizes only quadratic functions, e.g., f(w) = (1/2) w^T A w - b^T w + c.
Advantage: guaranteed to find the optimal solution in at most n iterations, where n is the size of matrix A.
A-Conjugate Vectors: let A be an n x n square, symmetric, positive-definite matrix. Vectors s(0), s(1), ..., s(n-1) are A-conjugate if s(i)^T A s(j) = 0 for all i ≠ j.
* If A = I (identity matrix), conjugacy = orthogonality.
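A small numeric check of A-conjugacy for a hand-picked 2x2 example; the particular matrix and vectors are invented for illustration.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # symmetric, positive definite

s0 = np.array([1.0, 0.0])
# s1 is chosen so that s0^T A s1 = 0 (A-conjugate to s0, but not orthogonal to it)
s1 = np.array([-1.0, 3.0])

print(s0 @ A @ s1)    # 0.0  -> A-conjugate
print(s0 @ s1)        # -1.0 -> not orthogonal (conjugacy = orthogonality only if A = I)
```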

The set S = {s(0), s(1), ..., s(n-1)} forms a basis for the space R^n, so the solution of A w = b can be written as a linear combination of the s(i).
The conjugate-direction method for minimizing f(w) is defined by w(i+1) = w(i) + η(i) s(i), where w(0) is an arbitrary starting vector and η(i) is determined by minimizing f(w(i) + η s(i)) with respect to η.
How to determine the directions s(i)? Define the residual r(i) = b - A w(i) = -∇f(w(i)), which points in the steepest-descent direction of f at w(i), and let
s(i) = r(i) + β(i) s(i-1), with s(0) = r(0).   (A)

Multiplying Eq. (A) on the left by s(i-1)^T A gives s(i-1)^T A s(i) = s(i-1)^T A r(i) + β(i) s(i-1)^T A s(i-1). In order for s(i) to be A-conjugate to s(i-1), the left-hand side must be zero, so
β(i) = - s(i-1)^T A r(i) / (s(i-1)^T A s(i-1)).   (B)
The directions generated by Eqs. (A) and (B) are A-conjugate. We would like to evaluate β(i) without having to know A. Polak-Ribiere formula:
β(i) = r(i)^T (r(i) - r(i-1)) / (r(i-1)^T r(i-1)).

Fletcher-Reeves formula:
β(i) = r(i)^T r(i) / (r(i-1)^T r(i-1)).
* The conjugate-gradient method for minimizing f(w): w(i+1) = w(i) + η(i) s(i), where w(0) is an arbitrary starting vector and η(i) is determined by a line search that minimizes f(w(i) + η s(i)).
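Putting the pieces together, here is a sketch of the conjugate-gradient iteration for the quadratic case, using the exact step length (available here because A is known) and the Fletcher-Reeves beta; the test matrix is illustrative.

```python
import numpy as np

def conjugate_gradient(A, b, n_iters=None):
    """Minimize f(w) = 0.5 w^T A w - b^T w (i.e. solve A w = b) by conjugate gradients.
    Terminates in at most n steps for an n x n symmetric positive-definite A."""
    n = len(b)
    w = np.zeros(n)
    r = b - A @ w                          # residual = negative gradient
    s = r.copy()                           # first direction: steepest descent
    for _ in range(n_iters or n):
        As = A @ s
        eta = (r @ r) / (s @ As)           # exact line-search step for a quadratic
        w = w + eta * s
        r_new = r - eta * As
        if np.linalg.norm(r_new) < 1e-12:
            break
        beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves formula
        s = r_new + beta * s               # next A-conjugate direction
        r = r_new
    return w

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))   # agree after at most 2 steps
```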

Nonlinear Conjugate Gradient Algorithm
1. Initialize w(0) by an appropriate process; set s(0) = r(0) = -∇f(w(0)).
2. Line search: find η(i) that minimizes f(w(i) + η s(i)).
3. Update the weights: w(i+1) = w(i) + η(i) s(i).
4. Compute r(i+1) = -∇f(w(i+1)); stop if it is sufficiently small.
5. Compute β(i+1) by the Polak-Ribiere (or Fletcher-Reeves) formula and set s(i+1) = r(i+1) + β(i+1) s(i).
6. Repeat from step 2.
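A sketch of one way to assemble these steps, assuming a Polak-Ribiere beta with restarts and a simple backtracking line search; both are illustrative choices rather than the slides' prescription.

```python
import numpy as np

def nonlinear_cg(f, grad, w0, n_iters=200, tol=1e-8):
    """Nonlinear conjugate gradient with a Polak-Ribiere beta (restarted when
    negative) and a simple backtracking (Armijo) line search."""
    w = np.asarray(w0, dtype=float)
    r = -grad(w)                          # negative gradient
    s = r.copy()                          # initial direction: steepest descent
    for _ in range(n_iters):
        if r @ s <= 0:                    # safeguard: ensure a descent direction
            s = r.copy()
        eta = 1.0                         # backtracking line search along s
        while f(w + eta * s) > f(w) - 1e-4 * eta * (r @ s) and eta > 1e-12:
            eta *= 0.5
        w = w + eta * s
        r_new = -grad(w)
        if np.linalg.norm(r_new) < tol:
            break
        beta = max(0.0, r_new @ (r_new - r) / (r @ r))   # Polak-Ribiere formula
        s = r_new + beta * s
        r = r_new
    return w

# Example: a simple non-quadratic function with minimum at (1, -2)
f = lambda w: (w[0] - 1.0) ** 4 + (w[1] + 2.0) ** 2
grad = lambda w: np.array([4 * (w[0] - 1.0) ** 3, 2 * (w[1] + 2.0)])
print(nonlinear_cg(f, grad, [0.0, 0.0]))   # approaches (1, -2)
```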

Example (figure): a comparison of the convergence of gradient descent (green) and conjugate gradient (red) for minimizing a quadratic function. Conjugate gradient converges in at most n steps, where n is the size of the system matrix (here n = 2).

2.3 Applications
2.3.1 Echo Cancellation in Telephone Circuits
n: incoming voice; s: outgoing voice; noise: the leakage of the incoming voice into the outgoing line; y: the output of the adaptive filter, which mimics the leakage.

Hybrid circuit: deals with the leakage issue by attempting to isolate the incoming from the outgoing signals.
Adaptive filter: deals with the choppy-speech issue; it mimics the leakage of the incoming voice so that the leakage can be suppressed in the outgoing signal. Because the outgoing speech s is not correlated with the filter output y, minimizing the residual power removes only the leakage and leaves s intact.
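A minimal simulation of this arrangement, assuming the leakage can be modeled as a short FIR echo path and using the LMS rule from Section 2.1.3; the signal model, path coefficients, and step size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, taps = 5000, 4
incoming = rng.normal(size=T)                   # n: incoming voice
speech = 0.3 * rng.normal(size=T)               # s: outgoing voice (uncorrelated with n)
echo_path = np.array([0.6, -0.3, 0.2, 0.1])     # assumed leakage through the hybrid

w = np.zeros(taps)                              # adaptive FIR filter weights
mu = 0.01
for t in range(taps, T):
    x = incoming[t - taps:t][::-1]              # recent incoming samples
    leak = echo_path @ x                        # leakage added to the outgoing line
    line = speech[t] + leak                     # outgoing line before cancellation
    y = w @ x                                   # filter output mimics the leakage
    e = line - y                                # error = cleaned outgoing signal
    w += 2 * mu * e * x                         # LMS update

print(np.round(w, 2))                           # converges toward echo_path
```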

2.3.2 Predict Signal
An adaptive filter is trained to predict a signal: the input to the filter is a delayed copy of the actual signal, and the desired output is the current signal value.
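A minimal sketch of such a predictor trained with LMS, assuming a noisy sinusoidal test signal and a delay of one sample; both are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(2000)
signal = np.sin(0.05 * t) + 0.05 * rng.normal(size=t.size)

taps, mu = 8, 0.02
w = np.zeros(taps)
for k in range(taps, len(signal) - 1):
    x = signal[k - taps:k][::-1]      # delayed samples of the actual signal (input)
    d = signal[k]                     # current sample (desired output)
    e = d - w @ x
    w += 2 * mu * e * x               # LMS update

# After training, the filter predicts the next (unseen) sample from past ones
x = signal[-taps - 1:-1][::-1]
print(w @ x, signal[-1])              # prediction vs. actual current value
```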

2.3.3 Reproduce Signal

2.3.4 Adaptive Beam-Forming Antenna Arrays
Antenna: a spatial array of sensors that are directional in their reception characteristics.
The adaptive filter learns to steer the array so that it responds to incoming signals of interest regardless of their direction, while reducing its response to unwanted noise signals arriving from other directions.

2.4 Madaline: Many Adalines
○ Can a Madaline realize the XOR function? A single Adaline cannot, because XOR is not linearly separable, but two Adalines feeding a third output unit can, as the sketch below illustrates.
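A minimal sketch of one such Madaline with hand-chosen (not learned) weights: two hidden Adalines detect the two "inputs differ" cases and an output Adaline ORs them; the particular weight values are an assumption for illustration.

```python
import numpy as np

def adaline(x, w, b):
    """Bipolar threshold Adaline."""
    return 1 if np.dot(w, x) + b > 0 else -1

def madaline_xor(x1, x2):
    """Two hidden Adalines detect (x1 AND NOT x2) and (NOT x1 AND x2);
    an output Adaline ORs them, giving XOR on bipolar inputs."""
    h1 = adaline([x1, x2], w=[1, -1], b=-1)
    h2 = adaline([x1, x2], w=[-1, 1], b=-1)
    return adaline([h1, h2], w=[1, 1], b=1)       # OR unit

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print((x1, x2), madaline_xor(x1, x2))     # +1 exactly when the inputs differ
```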

2.4.2 Madaline Rule II (MRII)
○ Training algorithm: a trial-and-error procedure based on a minimum disturbance principle (those nodes that can affect the output error while incurring the least change in their weights take precedence in the learning process).
○ Procedure:
1. Input a training pattern.
2. Count the number of incorrect values in the output layer.

3. For all units on the output layer:
3.1 Select the first previously unselected error node whose analog output is closest to zero (this node can reverse its bipolar output with the least change in its weights).
3.2 Change the weights on the selected unit so that its bipolar output changes.
3.3 Apply the same training pattern again.
3.4 If the number of errors is reduced, accept the weight change; otherwise restore the original weights.
4. Repeat Step 3 for all layers except the input layer.

5. For all units on the output layer:
5.1 Select the previously unselected pair of units whose analog outputs are closest to zero.
5.2 Apply a weight correction to both units so as to change their bipolar outputs.
5.3 Apply the same training pattern again.
5.4 If the number of errors is reduced, accept the correction; otherwise restore the original weights.
6. Repeat Step 5 for all layers except the input layer.

※ Steps 5 and 6 can be repeated with triplets, quadruplets, or longer combinations of units until satisfactory results are obtained.
The MRII learning rule considers networks with only one hidden layer. For networks with more hidden layers, the backpropagation learning strategy discussed later can be employed. A compact sketch of the single-node trials is given below.
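A sketch of the single-node trial at the heart of MRII, assuming a one-hidden-layer bipolar Madaline; the perturbation used to flip a unit's output and the restriction to hidden-layer adaptation (pair/triplet trials omitted) are illustrative simplifications.

```python
import numpy as np

def forward(W1, W2, x):
    """Bipolar Madaline with one hidden layer of Adalines."""
    a = W1 @ x                                   # analog sums of the hidden Adalines
    h = np.where(a > 0, 1, -1)                   # bipolar hidden outputs
    o = np.where(W2 @ h > 0, 1, -1)              # bipolar network outputs
    return a, h, o

def mrii_single_node_trial(W1, W2, x, d, delta=0.5):
    """One MRII trial on a single pattern: starting from the hidden unit whose
    analog sum is closest to zero (minimum disturbance), try flipping its bipolar
    output; keep the weight change only if it reduces the output-error count."""
    a, _, o = forward(W1, W2, x)
    base_errors = np.sum(o != d)                 # step 2: count incorrect output values
    if base_errors == 0:
        return W1
    for j in np.argsort(np.abs(a)):              # least-disturbance order
        trial = W1.copy()
        # Move unit j's analog sum just past zero on the opposite side, flipping its output
        trial[j] += -np.sign(a[j]) * (np.abs(a[j]) + delta) * x / (x @ x)
        _, _, o_trial = forward(trial, W2, x)
        if np.sum(o_trial != d) < base_errors:
            return trial                         # accept the weight change
    return W1                                    # otherwise restore the original weights
```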

2.4.3. A Madaline for Translation–Invariant Pattern Recognition

○ Relationships among the weight matrices of the Adalines

○ Extension: multiple slabs with different key weight matrices for discriminating more than two classes of patterns.