Chapter 4 Supervised learning: Multilayer Networks II

Other Feedforward Networks
- Madaline
  - Multiple adalines (of a sort) as hidden nodes
  - Weight change follows the minimum disturbance principle
- Adaptive multilayer networks
  - Dynamically change the network size (number of hidden nodes)
- Prediction networks
  - Recurrent nets
  - BP nets for prediction
- Networks of radial basis functions (RBF)
  - e.g., the Gaussian function
  - Perform better than sigmoid functions for some tasks (e.g., interpolation in function approximation)
- Some other selected types of layered NN

Madaline
- Architecture
  - Hidden layers of adaline nodes
  - Output nodes differ across models
- Learning
  - Error driven, but not by gradient descent
  - Minimum disturbance: a smaller change of weights is preferred, provided it can reduce the error
- Three Madaline models
  - Different node functions
  - Different learning rules (MR I, II, and III)
  - MR I and II were developed in the 1960s; MR III came much later (1988)

Madaline
- MRI net: output nodes use a fixed logic function
- MRII net: output nodes are adalines
- MRIII net: same as MRII, except that the nodes use a sigmoid function

Madaline: MR II rule
- Only change weights associated with nodes that have small |net_j|
- Proceed bottom up, layer by layer
- Outline of the algorithm, at layer h:
  - Sort all nodes in order of increasing |net|; take those with |net| < θ and put them in a candidate set S
  - For each A_j in S: if reversing its output (changing x_j to -x_j) improves the output error, then change the weight vector leading into A_j by LMS (or another rule)
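Below is a minimal sketch of this trial-flip step for one hidden layer. It assumes a layer of adaline weight vectors and an error-evaluation callback; all names (mr2_layer_update, net_error, and so on) are illustrative rather than taken from the chapter.

```python
import numpy as np

def mr2_layer_update(adalines, x, net_error, theta=0.5, lr=0.1):
    """One MRII pass over a single hidden layer (illustrative sketch).

    adalines  : list of weight vectors (np.ndarray), one per node in the layer
    x         : input vector seen by this layer for the current sample
    net_error : callable; net_error(None) gives the current network error,
                net_error(j) gives it with node j's output sign reversed
    theta     : only nodes with |net| < theta are trial-flip candidates
    """
    nets = np.array([w @ x for w in adalines])

    # Candidate set S: nodes with small |net|, i.e. those that can be flipped
    # with the least disturbance, processed smallest |net| first.
    S = sorted((j for j in range(len(nets)) if abs(nets[j]) < theta),
               key=lambda j: abs(nets[j]))

    for j in S:
        # Trial: does reversing node j's output reduce the output error?
        if net_error(j) < net_error(None):
            # Accept the flip: nudge node j's weights by an LMS-style step
            # toward the opposite sign of its current net value.
            desired = -np.sign(nets[j])
            adalines[j] += lr * (desired - nets[j]) * x
    return adalines
```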

Madaline: MR III rule
- Even though the node function is sigmoid, do not use gradient descent (its derivative is not assumed to be known)
- Use trial adaptation:
  - E: total squared error at the output nodes
  - E_k: total squared error at the output nodes if net_k at node k is increased by ε (> 0)
  - Change the weight vector leading into node k according to the finite-difference estimate Δw_k = -η·((E_k - E)/ε)·x_k (or an equivalent form)
- It can be shown to be equivalent to BP
- Since it does not depend explicitly on derivatives, this method can be used for hardware devices that implement the sigmoid function inaccurately
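As a concrete illustration of the trial-adaptation update, here is a small sketch with illustrative names; it assumes the perturbed error E_k has already been measured by re-running the net with net_k increased by ε.

```python
import numpy as np

def mr3_weight_update(x, E, E_perturbed, eps=1e-3, lr=0.1):
    """MRIII-style trial adaptation for one node (illustrative sketch).

    x           : input vector feeding node k
    E           : total squared error at the output nodes
    E_perturbed : the same error after net_k is increased by eps
    Returns the change to apply to node k's incoming weight vector.
    """
    # Finite-difference estimate of dE/dnet_k replaces the analytic
    # derivative, so nothing is assumed about the exact node function.
    dE_dnet = (E_perturbed - E) / eps
    return -lr * dE_dnet * x
```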

Adaptive Multilayer Networks
Smaller nets are often preferred because they:
- Train faster
  - Fewer weights to be trained
  - Fewer training samples needed
- Generalize better
Heuristics for finding an "optimal" net size:
- Pruning: start with a large net, then prune it by removing unimportant nodes and the associated connections/weights
- Growing: start with a very small net, then increase its size in small increments until the performance becomes satisfactory
- Combining the two: cycles of pruning and growing until performance is satisfactory and no more pruning is possible

Adaptive Multilayer Networks
Pruning a network: candidates for removal include
- Weights with small magnitude (e.g., ≈ 0)
- Nodes with small incoming weights
- Weights whose existence does not significantly affect network output
  - e.g., if ∂o/∂w is negligible, or by examining the second derivative of the error with respect to the weight
- Input nodes can also be pruned if the resulting change of output is negligible
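A minimal sketch of the first two pruning criteria (small-magnitude weights, and nodes whose incoming weights are all negligible); the weight-matrix orientation (W[i, j] = weight from node i into node j) is an assumption made for the example.

```python
import numpy as np

def prune_small_weights(weight_matrices, tol=1e-3):
    """Zero out negligible weights and flag dead nodes (illustrative sketch).

    weight_matrices : list of 2-D arrays, one per layer, with W[i, j] the
                      weight from node i of the previous layer into node j
    tol             : magnitude below which a weight is treated as ~0
    Returns the pruned matrices and, per layer, a boolean mask of nodes
    whose incoming weights were all pruned (candidates for removal along
    with their outgoing connections).
    """
    pruned, dead_nodes = [], []
    for W in weight_matrices:
        keep = np.abs(W) >= tol
        pruned.append(W * keep)
        # A node (column) with no surviving incoming weight contributes
        # nothing and can be dropped together with its outgoing weights.
        dead_nodes.append(~keep.any(axis=0))
    return pruned, dead_nodes
```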

Adaptive Multilayer Networks
Cascade correlation (an example of growing the net size)
Cascade architecture development:
- Start with a net without hidden nodes
- Hidden nodes are added one at a time, between the output nodes and all other nodes
- Each new node is connected to the output nodes, and receives connections from all other nodes (inputs and all existing hidden nodes)
- The resulting net is therefore not strictly layered (each new hidden node receives input from all earlier hidden nodes)

Adaptive Multilayer Networks
Correlation learning: when a new node n is added
- First train all input weights to n from all nodes below it (maximize the covariance of n's output with the current error E at the output nodes)
- Then train all weights to the output nodes (minimize E)
- Quickprop is used for both phases
- All other weights to lower hidden nodes are not changed (so training is fast)
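The two-phase procedure can be summarized by the following high-level loop; the callables train_output_weights, train_candidate, and error are placeholders for the quickprop-based routines the slides refer to, not part of any particular library.

```python
def cascade_correlation_train(train_output_weights, train_candidate,
                              error, max_hidden=10, target_error=0.05):
    """High-level cascade-correlation growth loop (illustrative sketch).

    train_output_weights : callable() -> None; trains only the weights into
                           the output nodes to minimize E
    train_candidate      : callable() -> new hidden node; trains only the
                           candidate's incoming weights to maximize S(w_new)
    error                : callable() -> current total error E
    """
    hidden_nodes = []
    train_output_weights()                  # start with no hidden nodes
    while error() > target_error and len(hidden_nodes) < max_hidden:
        # Phase 1: train the candidate's incoming weights (covariance ascent);
        # all previously trained weights stay frozen, which keeps training fast.
        hidden_nodes.append(train_candidate())
        # Phase 2: retrain only the weights feeding the output nodes.
        train_output_weights()
    return hidden_nodes
```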

Adaptive Multilayer Networks
Train w_new, the weights into the new node x_new, to maximize S(w_new), the covariance between x_new's output and the current error E_old at the output nodes:
S(w_new) = Σ_o | Σ_p (x_new,p − x̄_new)(E_p,o − Ē_o) |
- When S(w_new) is maximized, the variation of x_new around its mean mirrors that of the error around its mean
- S(w_new) is maximized by gradient ascent
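A sketch of the covariance score and the gradient used for the ascent, following the standard cascade-correlation formulation; tanh is used as the candidate node function purely for concreteness, and all names are illustrative.

```python
import numpy as np

def candidate_covariance(w, X, E, f=np.tanh,
                         f_prime=lambda net: 1.0 - np.tanh(net) ** 2):
    """Covariance score S(w_new) and its gradient for a candidate node (sketch).

    w : (num_inputs,)               candidate's incoming weights
    X : (num_patterns, num_inputs)  inputs reaching the candidate node
    E : (num_patterns, num_outputs) current errors at the output nodes
    """
    net = X @ w                       # candidate net input, one value per pattern
    V = f(net)                        # candidate output per pattern
    Vc = V - V.mean()                 # centered candidate output
    Ec = E - E.mean(axis=0)           # centered output errors
    corr = Vc @ Ec                    # per-output covariance, shape (num_outputs,)
    S = np.abs(corr).sum()            # S(w_new): sum of covariance magnitudes
    # Gradient for ascent: sign of each per-output covariance times the
    # chain-rule term f'(net) * x, summed over outputs and patterns.
    grad = ((Ec * np.sign(corr)).sum(axis=1) * f_prime(net)) @ X
    return S, grad

# One gradient-ascent step on the candidate's incoming weights:
#   S, grad = candidate_covariance(w, X, E)
#   w = w + 0.1 * grad
```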

Adaptive Multilayer Networks
Example: the corner isolation problem
(Figure omitted: the four corner points of the input region, each marked X, form one class.)
- Hidden nodes use a sigmoid function with range [-0.5, 0.5]
- When trained without hidden nodes: 4 of the 12 patterns are misclassified
- After adding 1 hidden node: only 2 patterns are misclassified
- After adding a second hidden node: all 12 patterns are correctly classified
- At least 4 hidden nodes would be required with BP learning