Chapter 4 Supervised learning: Multilayer Networks II

Other Feedforward Networks
- Madaline
  - Multiple adalines (of a sort) as hidden nodes
  - Weight change follows the minimum disturbance principle
- Adaptive multilayer networks
  - Dynamically change the network size (number of hidden nodes)
- Prediction networks
  - Recurrent nets
  - BP nets for prediction
- Networks of radial basis functions (RBF)
  - e.g., the Gaussian function
  - Perform better than sigmoid functions for some tasks (e.g., interpolation in function approximation)
- Some other selected types of layered NN

Madaline
- Architecture
  - Hidden layers of adaline nodes
  - Output nodes differ across models
- Learning
  - Error driven, but not by gradient descent
  - Minimum disturbance: a smaller change of weights is preferred, provided it can reduce the error
- Three Madaline models
  - Different node functions
  - Different learning rules (MR I, II, and III)
  - MR I and II were developed in the 1960s; MR III came much later (1988)

Madaline
- MRI net: output nodes use a fixed logic function
- MRII net: output nodes are adalines
- MRIII net: same as MRII, except that the nodes use a sigmoid function

Madaline: MR II rule
- Only change weights associated with nodes that have small |net_j|
- Proceed bottom up, layer by layer
- Outline of the algorithm, at layer h:
  - Sort all nodes in order of increasing |net|; take those with |net| < θ and put them in a candidate set S
  - For each A_j in S: if reversing its output (changing x_j to -x_j) improves the output error, then change the weight vector leading into A_j by LMS (or another rule)
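Below is a minimal sketch of this trial-flip step for one hidden layer. It assumes a layer of adaline weight vectors and an error-evaluation callback; all names (mr2_layer_update, net_error, and so on) are illustrative rather than taken from the chapter.

```python
import numpy as np

def mr2_layer_update(adalines, x, net_error, theta=0.5, lr=0.1):
    """One MRII pass over a single hidden layer (illustrative sketch).

    adalines  : list of weight vectors (np.ndarray), one per node in the layer
    x         : input vector seen by this layer for the current sample
    net_error : callable; net_error(None) gives the current network error,
                net_error(j) gives it with node j's output sign reversed
    theta     : only nodes with |net| < theta are trial-flip candidates
    """
    nets = np.array([w @ x for w in adalines])

    # Candidate set S: nodes with small |net|, i.e. those that can be flipped
    # with the least disturbance, processed smallest |net| first.
    S = sorted((j for j in range(len(nets)) if abs(nets[j]) < theta),
               key=lambda j: abs(nets[j]))

    for j in S:
        # Trial: does reversing node j's output reduce the output error?
        if net_error(j) < net_error(None):
            # Accept the flip: nudge node j's weights by an LMS-style step
            # toward the opposite sign of its current net value.
            desired = -np.sign(nets[j])
            adalines[j] += lr * (desired - nets[j]) * x
    return adalines
```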

Madaline: MR III rule
- Even though the node function is sigmoid, do not use gradient descent (its derivative is not assumed to be known)
- Use trial adaptation:
  - E: total squared error at the output nodes
  - E_k: total squared error at the output nodes if net_k at node k is increased by ε (> 0)
  - Change the weight vector leading into node k according to the finite-difference estimate Δw_k = -η·((E_k - E)/ε)·x_k (or an equivalent form)
- It can be shown to be equivalent to BP
- Since it does not depend explicitly on derivatives, this method can be used for hardware devices that implement the sigmoid function inaccurately
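As a concrete illustration of the trial-adaptation update, here is a small sketch with illustrative names; it assumes the perturbed error E_k has already been measured by re-running the net with net_k increased by ε.

```python
import numpy as np

def mr3_weight_update(x, E, E_perturbed, eps=1e-3, lr=0.1):
    """MRIII-style trial adaptation for one node (illustrative sketch).

    x           : input vector feeding node k
    E           : total squared error at the output nodes
    E_perturbed : the same error after net_k is increased by eps
    Returns the change to apply to node k's incoming weight vector.
    """
    # Finite-difference estimate of dE/dnet_k replaces the analytic
    # derivative, so nothing is assumed about the exact node function.
    dE_dnet = (E_perturbed - E) / eps
    return -lr * dE_dnet * x
```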

Adaptive Multilayer Networks
Smaller nets are often preferred because they:
- Train faster
  - Fewer weights to be trained
  - Fewer training samples needed
- Generalize better
Heuristics for finding an "optimal" net size:
- Pruning: start with a large net, then prune it by removing unimportant nodes and the associated connections/weights
- Growing: start with a very small net, then increase its size in small increments until the performance becomes satisfactory
- Combining the two: cycles of pruning and growing until performance is satisfactory and no more pruning is possible

Adaptive Multilayer Networks
Pruning a network: candidates for removal include
- Weights with small magnitude (e.g., ≈ 0)
- Nodes with small incoming weights
- Weights whose existence does not significantly affect network output
  - e.g., if ∂o/∂w is negligible, or by examining the second derivative of the error with respect to the weight
- Input nodes can also be pruned if the resulting change of output is negligible
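A minimal sketch of the first two pruning criteria (small-magnitude weights, and nodes whose incoming weights are all negligible); the weight-matrix orientation (W[i, j] = weight from node i into node j) is an assumption made for the example.

```python
import numpy as np

def prune_small_weights(weight_matrices, tol=1e-3):
    """Zero out negligible weights and flag dead nodes (illustrative sketch).

    weight_matrices : list of 2-D arrays, one per layer, with W[i, j] the
                      weight from node i of the previous layer into node j
    tol             : magnitude below which a weight is treated as ~0
    Returns the pruned matrices and, per layer, a boolean mask of nodes
    whose incoming weights were all pruned (candidates for removal along
    with their outgoing connections).
    """
    pruned, dead_nodes = [], []
    for W in weight_matrices:
        keep = np.abs(W) >= tol
        pruned.append(W * keep)
        # A node (column) with no surviving incoming weight contributes
        # nothing and can be dropped together with its outgoing weights.
        dead_nodes.append(~keep.any(axis=0))
    return pruned, dead_nodes
```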

Adaptive Multilayer Networks
Cascade correlation (an example of growing the net size)
Cascade architecture development:
- Start with a net without hidden nodes
- Hidden nodes are added one at a time, between the output nodes and all other nodes
- Each new node is connected to the output nodes, and receives connections from all other nodes (inputs and all existing hidden nodes)
- The resulting net is therefore not strictly layered (each new hidden node receives input from all earlier hidden nodes)

Adaptive Multilayer Networks
Correlation learning: when a new node n is added
- First train all input weights to n from all nodes below it (maximize the covariance of n's output with the current error E at the output nodes)
- Then train all weights to the output nodes (minimize E)
- Quickprop is used for both phases
- All other weights to lower hidden nodes are not changed (so training is fast)
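The two-phase procedure can be summarized by the following high-level loop; the callables train_output_weights, train_candidate, and error are placeholders for the quickprop-based routines the slides refer to, not part of any particular library.

```python
def cascade_correlation_train(train_output_weights, train_candidate,
                              error, max_hidden=10, target_error=0.05):
    """High-level cascade-correlation growth loop (illustrative sketch).

    train_output_weights : callable() -> None; trains only the weights into
                           the output nodes to minimize E
    train_candidate      : callable() -> new hidden node; trains only the
                           candidate's incoming weights to maximize S(w_new)
    error                : callable() -> current total error E
    """
    hidden_nodes = []
    train_output_weights()                  # start with no hidden nodes
    while error() > target_error and len(hidden_nodes) < max_hidden:
        # Phase 1: train the candidate's incoming weights (covariance ascent);
        # all previously trained weights stay frozen, which keeps training fast.
        hidden_nodes.append(train_candidate())
        # Phase 2: retrain only the weights feeding the output nodes.
        train_output_weights()
    return hidden_nodes
```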

Adaptive Multilayer Networks
Train w_new, the weights into the new node x_new, to maximize S(w_new), the covariance between x_new's output and the current error E_old at the output nodes:
S(w_new) = Σ_o | Σ_p (x_new,p − x̄_new)(E_p,o − Ē_o) |
- When S(w_new) is maximized, the variation of x_new around its mean mirrors that of the error around its mean
- S(w_new) is maximized by gradient ascent
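A sketch of the covariance score and the gradient used for the ascent, following the standard cascade-correlation formulation; tanh is used as the candidate node function purely for concreteness, and all names are illustrative.

```python
import numpy as np

def candidate_covariance(w, X, E, f=np.tanh,
                         f_prime=lambda net: 1.0 - np.tanh(net) ** 2):
    """Covariance score S(w_new) and its gradient for a candidate node (sketch).

    w : (num_inputs,)               candidate's incoming weights
    X : (num_patterns, num_inputs)  inputs reaching the candidate node
    E : (num_patterns, num_outputs) current errors at the output nodes
    """
    net = X @ w                       # candidate net input, one value per pattern
    V = f(net)                        # candidate output per pattern
    Vc = V - V.mean()                 # centered candidate output
    Ec = E - E.mean(axis=0)           # centered output errors
    corr = Vc @ Ec                    # per-output covariance, shape (num_outputs,)
    S = np.abs(corr).sum()            # S(w_new): sum of covariance magnitudes
    # Gradient for ascent: sign of each per-output covariance times the
    # chain-rule term f'(net) * x, summed over outputs and patterns.
    grad = ((Ec * np.sign(corr)).sum(axis=1) * f_prime(net)) @ X
    return S, grad

# One gradient-ascent step on the candidate's incoming weights:
#   S, grad = candidate_covariance(w, X, E)
#   w = w + 0.1 * grad
```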

Adaptive Multilayer Networks
Example: the corner isolation problem
(Figure omitted: the four corner points of the input region, each marked X, form one class.)
- Hidden nodes use a sigmoid function with range [-0.5, 0.5]
- When trained without hidden nodes: 4 of the 12 patterns are misclassified
- After adding 1 hidden node: only 2 patterns are misclassified
- After adding a second hidden node: all 12 patterns are correctly classified
- At least 4 hidden nodes would be required with BP learning