Neuro-Computing Lecture 4 Radial Basis Function Network


Suggested Reading (RBF)
Haykin: Sections 5.1, 5.2, 5.3, 5.4, 5.6, 5.8, 5.13, 5.14
Bishop: Section 6.3
Introduction to Radial Basis Function Networks, Mark J. L. Orr, 1996

Bishop (ch.6, p.299)

Radial Basis Functions
RBF networks, just like MLP networks, can be used for classification and/or function approximation problems. RBFs have an architecture similar to that of MLPs; however, they achieve this goal using a different strategy: an input layer, a nonlinear transformation layer (which generates local receptive fields), and a linear output layer.

Radial Basis Function
A hidden layer of radial kernels: the hidden layer performs a non-linear transformation of the input space; the resulting hidden space is typically of higher dimensionality than the input space.
An output layer of linear neurons: the output layer performs linear regression to predict the desired targets.
The dimension of the hidden layer is much larger than that of the input layer.
Cover's theorem on the separability of patterns: "A complex pattern-classification problem cast in a high-dimensional space non-linearly is more likely to be linearly separable than in a low-dimensional space."

Radial Basis Function
[Diagram: the input x = (x1, x2, x3, …) is passed through nonlinear functions φ_j, increasing the dimension from N to m1; the φ_j outputs are linearly combined with weights w_i and summed to give the output y. In the higher-dimensional space the patterns are more likely to be linearly separable.]

ARCHITECTURE
Input layer: source nodes that connect the network with its environment.
Hidden layer: applies a non-linear transformation from the input space to the hidden space.
Output layer: applies a linear transformation from the hidden space to the output space.
[Diagram: inputs x1, …, xm feed the hidden units, whose outputs are combined with weights w1, …, wm1 to produce the output y.]
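A minimal sketch of this architecture in Python/numpy, assuming Gaussian hidden units, a single linear output, and a shared spread (all names are illustrative, not from the slides):

```python
import numpy as np

def rbf_forward(x, centers, sigma, w, b=0.0):
    """Forward pass of a single-output RBF network.

    x       : (d,)   input vector
    centers : (H, d) one center per hidden unit
    sigma   : scalar spread shared by all hidden units
    w       : (H,)   hidden-to-output weights
    """
    # Hidden layer: nonlinear transformation based on the distance to each center
    dists = np.linalg.norm(centers - x, axis=1)        # ||x - t_j||
    phi = np.exp(-(dists ** 2) / (2.0 * sigma ** 2))   # Gaussian receptive fields
    # Output layer: linear combination of the hidden activations
    return w @ phi + b

# Toy usage: 2-D input, 3 hidden units
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
w = np.array([0.3, -0.8, 1.1])
print(rbf_forward(np.array([0.2, 0.9]), centers, sigma=0.7, w=w))
```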

φ-separability of patterns
Hidden functions: φ1(x), …, φ_m1(x); hidden space: the space spanned by these functions.
A (binary) partition, also called a dichotomy, (C1, C2) of the training set C is φ-separable if there is a vector w of dimension m1 such that w^T φ(x) > 0 for x ∈ C1 and w^T φ(x) < 0 for x ∈ C2, where φ(x) = [φ1(x), …, φ_m1(x)]^T.

Examples of φ-separability
Separating surface: the set of points x for which w^T φ(x) = 0.
Examples of separable partitions (C1, C2): linearly separable, quadratically separable, and spherically separable, where the separating surface is a hyperplane, a quadric, or a hypersphere, respectively.

φ( || x - t||2) HIDDEN NEURON MODEL Hidden units: use a radial basis function φ( || x - t||2) the output depends on the distance of the input x from the center t x2 x1 xm φ( || x - t||2) t is called center  is called spread center and spread are parameters

Hidden Neurons
A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the spread σ: larger spread means less sensitivity.

Gaussian Radial Basis Function
φ(x) = exp(−||x − t||² / (2σ²)), centered at t.
σ is a measure of how spread out the curve is: a large σ gives a wide, flat curve; a small σ gives a narrow, peaked curve.

Types of φ (with r = ||x − t||)
Multiquadrics: φ(r) = (r² + c²)^(1/2), c > 0
Inverse multiquadrics: φ(r) = 1 / (r² + c²)^(1/2), c > 0
Gaussian functions: φ(r) = exp(−r² / (2σ²)), σ > 0
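A short sketch of these three basis functions in Python (r = ||x − t||; the parameter names c and sigma are mine):

```python
import numpy as np

def multiquadric(r, c=1.0):
    # phi(r) = sqrt(r^2 + c^2): grows with distance (non-local)
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    # phi(r) = 1 / sqrt(r^2 + c^2): decays with distance (localized)
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 sigma^2)): decays with distance (localized)
    return np.exp(-r**2 / (2.0 * sigma**2))

r = np.linspace(0.0, 3.0, 7)
print(multiquadric(r), inverse_multiquadric(r), gaussian(r), sep="\n")
```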

Nonlinear Receptive Fields
The hallmark of RBF networks is their use of nonlinear receptive fields. RBFs are universal approximators!
The receptive fields nonlinearly transform (map) the input feature space, where the input patterns are not linearly separable, to the hidden-unit space, where the mapped inputs may be linearly separable. The hidden-unit space often needs to be of a higher dimensionality.
Cover's Theorem (1965) on the separability of patterns: a complex pattern-classification problem that is nonlinearly separable in a low-dimensional space is more likely to be linearly separable in a high-dimensional space.

Example: the XOR problem
Input space: the four corners (0,0), (0,1), (1,0), (1,1). Output space: {0, 1}.
Construct an RBF pattern classifier such that: (0,0) and (1,1) are mapped to 0, class C1; (1,0) and (0,1) are mapped to 1, class C2.

Example: the XOR problem
In the feature (hidden) space: when mapped into the feature space (φ1, φ2), C1 and C2 become linearly separable. So a linear classifier with φ1(x) and φ2(x) as inputs can be used to solve the XOR problem.
[Plot: the mapped points in the (φ1, φ2) plane, with a linear decision boundary separating (0,0) and (1,1) from (0,1) and (1,0).]

The (you guessed it right) XOR Problem
Consider the nonlinear functions that map the input vector x = [x1 x2] to the φ1-φ2 space:
φ1(x) = exp(−||x − t1||²), t1 = [1, 1]
φ2(x) = exp(−||x − t2||²), t2 = [0, 0]

Input x   φ1(x)    φ2(x)
(1,1)     1        0.1353
(0,1)     0.3678   0.3678
(1,0)     0.3678   0.3678
(0,0)     0.1353   1

The nonlinear φ functions transformed a nonlinearly separable problem into a linearly separable one!
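The table above can be reproduced with a few lines of numpy (Gaussian units centered at t1 = (1,1) and t2 = (0,0), as in the slide):

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
phi = lambda x, t: np.exp(-np.sum((x - t) ** 2))   # Gaussian RBF with the spread folded into the exponent

for x in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    x = np.array(x, dtype=float)
    print(x, round(phi(x, t1), 4), round(phi(x, t2), 4))

# (1,1) -> (1, 0.1353); (0,1) and (1,0) -> (0.368, 0.368); (0,0) -> (0.1353, 1)
# In the (phi1, phi2) plane, class C1 = {(1,1), (0,0)} and class C2 = {(0,1), (1,0)}
# can be separated by a straight line.
```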

Initial Assessment
Using nonlinear functions, we can convert a nonlinearly separable problem into a linearly separable one. From a function approximation perspective, this is equivalent to implementing a complex function (corresponding to the nonlinearly separable decision boundary) using simple functions (corresponding to linearly separable decision boundaries).
Implementing this procedure using a network architecture yields RBF networks, if the nonlinear mapping functions are radial basis functions.
Radial Basis Functions:
Radial: symmetric around its center.
Basis functions: a set of functions whose linear combination can generate an arbitrary function in a given function space.

RBF Networks
d input nodes, H hidden-layer RBFs (receptive fields), and c output nodes with linear activation functions.
[Network diagram: inputs x1, …, xd connect to the hidden RBF units y1, …, yH through the center weights u_ji and the spread constant σ; the hidden outputs are combined through the weights w_kj to give net_k and the outputs z1, …, zc.]

Principle of Operation
Each hidden unit J computes the Euclidean norm between the input (x1, …, xd) and its center (given by the weights u_Ji), applies the radial basis function with spread constant σ to obtain y_J; the outputs are then weighted sums of y_1, …, y_H with weights w_Kj.
Unknowns: u_ji, w_kj, σ.

Principle of Operation
What do these parameters represent? Physical meanings:
φ: the radial basis function for the hidden layer. This is a simple nonlinear mapping function (typically Gaussian) that transforms the d-dimensional input patterns to a (typically higher) H-dimensional space. The complex decision boundary will be constructed from linear combinations (weighted sums) of these simple building blocks.
u_ji: the weights joining the input layer to the hidden layer. These weights constitute the center points of the radial basis functions.
σ: the spread constant(s). These values determine the spread (extent) of each radial basis function.
w_kj: the weights joining the hidden and output layers. These are the weights used in obtaining the linear combination of the radial basis functions; they determine the relative amplitudes of the RBFs when they are combined to form the complex function.

Principle of Operation
w_J: relative weight of the Jth RBF.
φ_J: the Jth RBF function.
u_J: center of the Jth RBF.
The output is formed from the products w_J · φ_J summed over all hidden units.

Learning Algorithms
Parameters to be learnt are: centers, spreads, and weights. Different learning algorithms exist, depending on how these parameters are obtained.

Learning Algorithm 1
Centers are selected at random: center locations are chosen randomly from the training set.
Spreads are chosen by normalization: σ = d_max / √(2·m1), where d_max is the maximum distance between the chosen centers and m1 is the number of centers.

Learning Algorithm 1 (continued)
Weights are found by means of the pseudo-inverse method: w = Φ⁺ d, where d is the vector of desired responses and Φ⁺ is the pseudo-inverse of the matrix Φ of hidden-unit activations.
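A minimal sketch of this algorithm in numpy, assuming randomly chosen centers, the normalization spread from the previous slide, and a single output (all names are illustrative):

```python
import numpy as np

def train_output_weights(X, d, centers, sigma):
    """Compute RBF output weights by the pseudo-inverse: w = pinv(Phi) @ d."""
    # Phi[n, j] = exp(-||x_n - t_j||^2 / (2 sigma^2))
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-(dists ** 2) / (2.0 * sigma ** 2))
    return np.linalg.pinv(Phi) @ d            # least-squares solution for the linear output layer

# Toy usage: approximate y = sin(x) with centers drawn at random from the training inputs
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50)[:, None]
d = np.sin(X).ravel()
centers = X[rng.choice(len(X), size=8, replace=False)]
d_max = np.max(np.linalg.norm(centers[:, None] - centers[None, :], axis=2))
sigma = d_max / np.sqrt(2 * len(centers))     # spread by normalization
w = train_output_weights(X, d, centers, sigma)
print(w)
```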

Learning Algorithm 2
Hybrid learning process:
Self-organized learning stage for finding the centers; spreads chosen by normalization.
Supervised learning stage for finding the weights, using the LMS algorithm.
Centers are obtained from unsupervised learning (clustering); spreads are obtained as the variances of the clusters; the weights w are obtained through the LMS algorithm. Clustering (k-means) and LMS are iterative.
This is the most commonly used procedure and typically provides good results.

Learning Algorithm 2: Centers
K-means clustering algorithm for the centers:
1. Initialization: choose random initial centers t_k(0), k = 1, …, m1
2. Sampling: draw a sample x from the input space C
3. Similarity matching: find the index of the best (closest) center
4. Updating: adjust the centers
5. Continuation: increment n by 1, go to step 2, and continue until no noticeable changes of the centers occur
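A short sketch of this center-selection step in numpy. Note this is the batch variant of k-means rather than the online, one-sample-at-a-time update described above; the function and variable names are assumptions:

```python
import numpy as np

def kmeans_centers(X, m1, n_iter=100, seed=0):
    """Select RBF centers by k-means clustering of the training inputs X of shape (N, d)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m1, replace=False)]         # initialization
    for _ in range(n_iter):
        # similarity matching: assign each sample to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None, :], axis=2), axis=1)
        # updating: move each center to the mean of its cluster
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(m1)])
        if np.allclose(new_centers, centers):                        # continuation test
            break
        centers = new_centers
    return centers, labels

X = np.random.default_rng(1).normal(size=(200, 2))
centers, labels = kmeans_centers(X, m1=5)
print(centers)
```

The spread of each RBF can then be taken as the variance of its cluster, as stated in the slide.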

Learning Algorithm 3
Supervised learning of all the parameters using the gradient-descent method; all unknowns are obtained from supervised learning.
Instantaneous error function: E = ½ e², where e is the difference between the desired and actual output.
Modify the centers by gradient descent on E, with a learning rate specific to the centers. Depending on the specific function φ, the gradient can be computed using the chain rule of calculus.

Learning Algorithm 3 (continued)
Modify the spreads and the output weights in the same way, each by gradient descent on E with its own learning rate.
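A hedged sketch of one such gradient step for a single-output Gaussian RBF network. The gradients follow from the chain rule applied to E = ½ e² with a Gaussian φ; the learning rates and all names are assumptions, not values from the slides:

```python
import numpy as np

def gradient_step(x, d, centers, sigmas, w, eta_w=0.01, eta_c=0.01, eta_s=0.01):
    """One supervised update of weights, centers and spreads for input x and target d."""
    diff = x - centers                                  # (H, dim): x - t_j
    sq = np.sum(diff ** 2, axis=1)                      # ||x - t_j||^2
    phi = np.exp(-sq / (2.0 * sigmas ** 2))             # hidden activations
    e = d - w @ phi                                     # instantaneous error
    # Chain-rule gradients of E = 0.5 * e^2 with respect to w_j, t_j and sigma_j
    grad_w = e * phi
    grad_c = (e * w * phi / sigmas ** 2)[:, None] * diff
    grad_s = e * w * phi * sq / sigmas ** 3
    w += eta_w * grad_w
    centers += eta_c * grad_c
    sigmas += eta_s * grad_s
    return centers, sigmas, w

# One toy step
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 2)); sigmas = np.ones(4); w = rng.normal(size=4)
centers, sigmas, w = gradient_step(rng.normal(size=2), 1.0, centers, sigmas, w)
print(w)
```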

Demo 1: Function Width
Application: function approximation of a target function.
Focus: the width and the centers of the RBFs, and the generalization of the RBF network.

Change the width of RBF (1)
Radial basis function model: Gaussian.
Test case: the number of RBFs (neurons) is fixed at 21.
Case 1: width = 1; Case 2: width = 0.2; Case 3: width = 200.
Validation test: to check the generalization ability of RBF networks.

Change the width of RBF (2)
[Plots of the approximation for width = 1, width = 0.2, and width = 200.]
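A rough sketch of this kind of width experiment in numpy. The target function, grids, and center positions below are my own choices for illustration, not the ones used in the original demo:

```python
import numpy as np

def fit_and_eval(x_train, y_train, x_test, centers, width):
    """Fit the output weights by least squares for fixed centers and width, then predict."""
    design = lambda x: np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))
    w = np.linalg.pinv(design(x_train)) @ y_train
    return design(x_test) @ w

target = lambda x: np.sin(x)                 # assumed target function
x_train = np.linspace(-10, 10, 41)
x_test = np.linspace(-10, 10, 201)           # validation grid inside the training region
centers = np.linspace(-10, 10, 21)           # 21 fixed RBF centers

for width in (1.0, 0.2, 200.0):
    y_hat = fit_and_eval(x_train, target(x_train), x_test, centers, width)
    err = np.mean((y_hat - target(x_test)) ** 2)
    print(f"width = {width:6.1f}  validation MSE = {err:.4f}")
```

A moderate width fits and generalizes well; a very small width interpolates the training points but oscillates between them; a very large width makes the basis functions nearly identical and the fit poor.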

Change the number of RBF centers (1)
Test case A: number of RBFs = 7, center positions [−8, −5, −2, 0, 2, 5, 8]; Case 1: width = 1; Case 2: width = 6.
Test case B: number of RBFs = 2, center positions [−3, 10]; Case 3: width = 1; Case 4: width = 6.

Change the number of RBF centers (2)
[Plots of the approximation for width = 1 and width = 6.]

Change the number of RBF centers (3)
[Plots of the approximation for width = 1 and width = 6.]

Summary
The centers and widths of the RBFs should be chosen appropriately.
The RBF network shows good generalization ability within the training region.

Comparison with multilayer NN
RBF networks are used to perform complex (non-linear) pattern classification tasks. Comparison between RBF networks and multilayer perceptrons:
Both are examples of non-linear layered feed-forward networks, and both are universal approximators.
Hidden layers: RBF networks have a single hidden layer; MLP networks may have more hidden layers.

Comparison with multilayer NN
Neuron models: the computation nodes in the hidden layer of an RBF network are different from, and serve a different purpose than, those in the output layer. In an MLP, computation nodes in the hidden and output layers typically share a common neuron model.
Linearity: the hidden layer of an RBF network is non-linear and its output layer is linear; the hidden and output layers of an MLP are usually both non-linear.

Comparison with multilayer NN
Activation functions: the argument of the activation function of each hidden unit in an RBF network is the Euclidean distance between the input vector and the center of that unit; the argument of the activation function of each hidden unit in an MLP is the inner product of the input vector and the synaptic weight vector of that unit.
Approximations: RBF networks using Gaussian functions construct local approximations to non-linear input-output mappings; MLP networks construct global approximations to non-linear input-output mappings.