
Neuro-Computing Lecture 4 Radial Basis Function Network


1 Neuro-Computing Lecture 4 Radial Basis Function Network

2 Suggested Reading (RBF): Haykin: Sections 5.1, 5.2, 5.3, 5.4, 5.6, 5.8, 5.13, 5.14; Bishop: Section 6.3; Introduction to Radial Basis Function Networks by Mark J. L. Orr, 1996.

3 Bishop (ch.6, p.299)

4 Radial Basis Functions
RBF networks, just like MLP networks, can be used for classification and/or function approximation problems. RBFs have an architecture similar to that of MLPs but achieve this goal using a different strategy: an input layer, a nonlinear transformation layer (which generates local receptive fields), and a linear output layer.

5 Radial Basis Function
A hidden layer of radial kernels: the hidden layer performs a non-linear transformation of the input space; the resulting hidden space is typically of higher dimensionality than the input space.
An output layer of linear neurons: the output layer performs linear regression to predict the desired targets.
The dimension of the hidden layer is much larger than that of the input layer.
Cover's theorem on the separability of patterns: "A complex pattern-classification problem cast in a high-dimensional space non-linearly is more likely to be linearly separable than in a low-dimensional space."

6 Radial Basis Function
(Network diagram) The inputs x1, x2, …, xN are passed through nonlinear functions φj, increasing the dimension from N to m1; a linear weighted sum with weights wi produces the output y. In the higher-dimensional hidden space the patterns are more likely to be linearly separable.

7 Architecture
Input layer: source nodes that connect the network with its environment.
Hidden layer: applies a non-linear transformation from the input space to the hidden space.
Output layer: applies a linear transformation from the hidden space to the output space.
(Diagram: inputs x1 … xm, hidden units, output weights w1 … wm1, output y.)
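As a concrete illustration of this architecture, here is a minimal forward-pass sketch in NumPy; the Gaussian kernel and the specific centers, spread, and weights are illustrative assumptions, not values from the slides.

```python
import numpy as np

def rbf_forward(x, centers, sigma, W):
    """Forward pass of a single-hidden-layer RBF network.

    x       : (d,) input vector
    centers : (H, d) centers of the H hidden radial units
    sigma   : scalar spread shared by all hidden units
    W       : (c, H) linear output weights
    Returns the (c,) output vector.
    """
    # Hidden layer: nonlinear transformation based on distance to each center
    dists = np.linalg.norm(centers - x, axis=1)      # ||x - t_j||
    phi = np.exp(-dists**2 / (2.0 * sigma**2))       # Gaussian receptive fields
    # Output layer: linear combination of the hidden activations
    return W @ phi

# Illustrative numbers only
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # H = 3, d = 2
W = np.array([[0.5, -0.2, 1.0]])                            # c = 1 output
print(rbf_forward(np.array([0.2, 0.8]), centers, sigma=0.7, W=W))
```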

8 φ-separability of patterns
Hidden functions: φ(x) = [φ1(x), φ2(x), …, φm1(x)]ᵀ; the space spanned by the φi(x) is the hidden space.
A (binary) partition, also called a dichotomy, (C1, C2) of the training set C is φ-separable if there is a vector w of dimension m1 such that:
wᵀφ(x) > 0, if x ∈ C1
wᵀφ(x) < 0, if x ∈ C2

9 Examples of φ-separability
Separating surface: wᵀφ(x) = 0.
Examples of separable partitions (C1, C2):
Linearly separable: the separating surface is a hyperplane.
Quadratically separable: the separating surface is a quadric (second-order surface).
Spherically separable: the separating surface is a hypersphere.

10 φ( || x - t||2) HIDDEN NEURON MODEL
Hidden units: use a radial basis function φ( || x - t||2) the output depends on the distance of the input x from the center t x2 x1 xm φ( || x - t||2) t is called center  is called spread center and spread are parameters

11 Hidden Neurons
A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the spread σ: a larger spread means less sensitivity.

12 Gaussian Radial Basis Function
φσ(||x − t||²) = exp(−||x − t||² / (2σ²)), centered at t.
σ is a measure of how spread out the curve is: a large σ gives a wide, flat curve, while a small σ gives a narrow, peaked curve.

13 Types of φ
With r = ||x − t||:
Multiquadrics: φ(r) = (r² + c²)^(1/2), c > 0
Inverse multiquadrics: φ(r) = 1 / (r² + c²)^(1/2), c > 0
Gaussian functions: φ(r) = exp(−r² / (2σ²)), σ > 0
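A short sketch of these three kernel types as functions of r = ||x − t|| (NumPy); the constants c and σ are free parameters chosen here purely for illustration.

```python
import numpy as np

def multiquadric(r, c=1.0):
    # phi(r) = sqrt(r^2 + c^2), grows with distance
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    # phi(r) = 1 / sqrt(r^2 + c^2), decays with distance
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 sigma^2)), local receptive field
    return np.exp(-r**2 / (2.0 * sigma**2))

r = np.linspace(0.0, 3.0, 7)
print(multiquadric(r), inverse_multiquadric(r), gaussian(r), sep="\n")
```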

14 Nonlinear Receptive Fields
The hallmark of RBF networks is their use of nonlinear receptive fields. RBFs are universal approximators!
The receptive fields nonlinearly transform (map) the input feature space, where the input patterns are not linearly separable, to the hidden-unit space, where the mapped inputs may be linearly separable. The hidden-unit space often needs to be of a higher dimensionality.
Cover's Theorem (1965) on the separability of patterns: a complex pattern-classification problem that is nonlinearly separable in a low-dimensional space is more likely to be linearly separable in a high-dimensional space.

15 Example: the XOR problem
Input space: the four corners of the unit square, (0,0), (0,1), (1,0), (1,1).
Output space: y ∈ {0, 1}.
Construct an RBF pattern classifier such that:
(0,0) and (1,1) are mapped to 0, class C1
(1,0) and (0,1) are mapped to 1, class C2

16 Example: the XOR problem
In the feature (hidden) space: when mapped into the feature space ⟨φ1, φ2⟩, C1 and C2 become linearly separable, so a linear classifier with φ1(x) and φ2(x) as inputs can be used to solve the XOR problem.
(In the φ1–φ2 plane, (1,1) and (0,0) lie near the ends of the diagonal, while (0,1) and (1,0) map to the same interior point; a straight decision boundary separates the two classes.)

17 The (you guessed it right) XOR Problem
Consider the nonlinear functions mapping the input vector x = [x1 x2] to the φ1–φ2 space:
φ1(x) = exp(−||x − t1||²), t1 = (1,1)
φ2(x) = exp(−||x − t2||²), t2 = (0,0)

Input x    φ1(x)     φ2(x)
(1,1)      1         0.1353
(0,1)      0.3678    0.3678
(1,0)      0.3678    0.3678
(0,0)      0.1353    1

The nonlinear φ functions transformed a nonlinearly separable problem into a linearly separable one!
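The mapping above is easy to verify numerically. The sketch below (NumPy) computes φ1 and φ2 for the four inputs and fits the linear output layer by least squares; the added bias term is an assumption for convenience and is not shown on the slide.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 0], dtype=float)         # class labels: C1 -> 0, C2 -> 1

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
phi1 = np.exp(-np.sum((X - t1)**2, axis=1))     # phi_1(x) = exp(-||x - t1||^2)
phi2 = np.exp(-np.sum((X - t2)**2, axis=1))     # phi_2(x) = exp(-||x - t2||^2)

# Linear classifier in the <phi1, phi2> feature space (with a bias column)
Phi = np.column_stack([phi1, phi2, np.ones(4)])
w = np.linalg.pinv(Phi) @ d                      # pseudo-inverse solution
print(np.round(Phi[:, :2], 4))                   # the table of phi values
print((Phi @ w > 0.5).astype(int))               # predictions: [0, 1, 1, 0]
```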

18 Initial Assessment
Using nonlinear φ functions, we can convert a nonlinearly separable problem into a linearly separable one. From a function-approximation perspective, this is equivalent to implementing a complex function (corresponding to the nonlinearly separable decision boundary) using simple functions (corresponding to linearly separable decision boundaries).
Implementing this procedure with a network architecture yields the RBF networks, provided the nonlinear mapping functions are radial basis functions.
Radial basis functions:
Radial: symmetric around its center.
Basis functions: a set of functions whose linear combinations can generate an arbitrary function in a given function space.


20 RBF Networks
(Architecture diagram) d input nodes x1 … xd; H hidden-layer RBFs (receptive fields) y1 … yH with centers uji and spread constant σ; c output nodes z1 … zc with linear activation functions, connected through weights wkj.

21 Principle of Operation
Each hidden unit computes the Euclidean norm (distance) between the input x = [x1 … xd] and its center uJ, then applies the radial basis function with spread constant σ: yJ = φ(||x − uJ||).
Each output unit computes a weighted sum of the hidden outputs: zK = Σj wKj yj.
Unknowns: uji, wkj, σ.

22 Principle of Operation
What do these parameters represent? Physical meanings:
φ: the radial basis function for the hidden layer. This is a simple nonlinear mapping function (typically Gaussian) that transforms the d-dimensional input patterns into a (typically higher) H-dimensional space. The complex decision boundary is constructed from linear combinations (weighted sums) of these simple building blocks.
uji: the weights joining the input layer to the hidden layer. These weights constitute the center points of the radial basis functions.
σ: the spread constant(s). These values determine the spread (extent) of each radial basis function.
wkj: the weights joining the hidden and output layers. These are the weights used in forming the linear combination of the radial basis functions; they determine the relative amplitudes of the RBFs when they are combined to form the complex function.

23 Principle of Operation
The output is a weighted sum of the radial basis functions: z(x) = ΣJ wJ φJ(x), where
wJ: relative weight of the Jth RBF
φJ: Jth RBF function
uJ: center of the Jth RBF

24 Learning Algorithms
Parameters to be learnt are: the centers, the spreads, and the weights.
Different learning algorithms differ in how these parameters are determined.

25 Learning Algorithm 1
Centers are selected at random: center locations are chosen randomly from the training set.
Spreads are chosen by normalization:
σ = d_max / √(2·m1)
where d_max is the maximum distance between the chosen centers and m1 is the number of centers.

26 Learning Algorithm 1
Weights are found by means of the pseudo-inverse method:
w = Φ⁺ d
where d is the desired response vector and Φ⁺ is the pseudo-inverse of the matrix Φ with elements φji = φ(||xj − ti||²).
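A minimal sketch of Learning Algorithm 1 in NumPy: random centers drawn from the training set, the normalized spread σ = d_max / √(2·m1), and pseudo-inverse output weights. The 1-D target function is an assumed toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 100).reshape(-1, 1)        # training inputs
d = np.sin(X).ravel()                              # assumed desired response

m1 = 10                                            # number of hidden RBFs
centers = X[rng.choice(len(X), m1, replace=False)]             # random centers
d_max = np.max(np.linalg.norm(centers[:, None] - centers[None, :], axis=2))
sigma = d_max / np.sqrt(2 * m1)                    # spread by normalization

# Interpolation matrix Phi_ji = phi(||x_j - t_i||^2)
dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
Phi = np.exp(-dist**2 / (2 * sigma**2))
w = np.linalg.pinv(Phi) @ d                        # pseudo-inverse weights

print("training MSE:", np.mean((Phi @ w - d)**2))
```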

27 Learning Algorithm 2 Hybrid Learning Process:
Self-organized learning stage for finding the centers; spreads chosen by normalization; supervised learning stage for finding the weights, using the LMS algorithm.
Centers are obtained from unsupervised learning (clustering); spreads are obtained as the variances of the clusters; the weights w are obtained through the LMS algorithm.
Clustering (k-means) and LMS are iterative.
This is the most commonly used procedure and typically provides good results.

28 Learning Algorithm 2: Centers
K-means clustering algorithm for the centers:
1. Initialization: tk(0) random, k = 1, …, m1
2. Sampling: draw x from the input space C
3. Similarity matching: find the index of the best (closest) center, k(x) = arg min_k ||x(n) − tk(n)||
4. Updating: adjust the centers, tk(n+1) = tk(n) + η [x(n) − tk(n)] if k = k(x), otherwise tk(n+1) = tk(n)
5. Continuation: increment n by 1, go to step 2, and continue until no noticeable changes of the centers occur
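A sketch of these steps as an online update loop (NumPy); the learning rate η, the stopping tolerance, and the toy data are assumptions.

```python
import numpy as np

def kmeans_centers(X, m1, eta=0.1, epochs=50, seed=0):
    """Online k-means for RBF centers, following the sampling/matching/updating steps."""
    rng = np.random.default_rng(seed)
    t = X[rng.choice(len(X), m1, replace=False)].astype(float)   # initialization
    for _ in range(epochs):
        old = t.copy()
        for x in X[rng.permutation(len(X))]:                     # sampling
            k = np.argmin(np.linalg.norm(t - x, axis=1))         # similarity matching
            t[k] += eta * (x - t[k])                             # updating (winner only)
        if np.max(np.linalg.norm(t - old, axis=1)) < 1e-4:       # continuation test
            break
    return t

X = np.random.default_rng(1).normal(size=(200, 2))
print(kmeans_centers(X, m1=4))
```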

29 Learning Algorithm 3
Supervised learning of all the parameters using the gradient descent method; all unknowns are obtained from supervised learning.
Instantaneous error function: E(n) = ½ Σk ek(n)².
Modify centers: tj(n+1) = tj(n) − ηt ∂E(n)/∂tj(n), with a separate learning rate ηt for the centers.
Depending on the specific function φ, the gradient ∂E/∂tj can be computed using the chain rule of calculus.

30 Learning Algorithm 3
Modify spreads: σj(n+1) = σj(n) − ησ ∂E(n)/∂σj(n)
Modify output weights: wkj(n+1) = wkj(n) − ηw ∂E(n)/∂wkj(n)
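A sketch of Learning Algorithm 3 for a Gaussian RBF network with a single linear output, applying the chain-rule updates above; the learning rates, the clamp on the spreads, and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
d = np.sin(X).ravel()                          # assumed toy target

H = 8
t = rng.uniform(-3, 3, size=(H, 1))            # centers t_j
s = np.ones(H)                                 # spreads sigma_j
w = rng.normal(scale=0.1, size=H)              # output weights w_j
eta_w, eta_t, eta_s = 0.05, 0.02, 0.02         # assumed learning rates

for epoch in range(200):
    for x, dn in zip(X, d):
        diff = x - t                           # (H, 1) differences x - t_j
        r2 = np.sum(diff**2, axis=1)           # ||x - t_j||^2
        phi = np.exp(-r2 / (2 * s**2))         # Gaussian hidden outputs
        e = dn - w @ phi                       # instantaneous error e(n)
        # chain-rule gradient-descent updates for weights, centers, spreads
        w += eta_w * e * phi
        t += eta_t * e * (w * phi / s**2)[:, None] * diff
        s = np.clip(s + eta_s * e * w * phi * r2 / s**3, 0.1, None)

r2_all = np.sum((X[:, None, :] - t[None, :, :])**2, axis=2)
y = np.exp(-r2_all / (2 * s**2)) @ w
print("final training MSE:", np.mean((y - d)**2))
```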

31 Demo: Function Approximation
Model: an RBF network approximating a target function.
Focus: the width and the centers of the RBFs, and the generalization of the RBF network.

32 Change the width of RBF (1)
Radial basis function model: Gaussian.
Test case: the number of RBFs (neurons) is fixed at 21.
Case 1: width = 1; Case 2: width = 0.2; Case 3: width = 200.
Validation test: to check the generalization ability of RBF networks.
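This width experiment can be reproduced with a short script; the sketch below assumes a simple smooth 1-D target function and 21 evenly spaced centers, since the slides do not state the exact target or center positions.

```python
import numpy as np

X = np.linspace(-10, 10, 81).reshape(-1, 1)          # training inputs
d = np.sin(0.5 * X).ravel()                           # assumed target function
centers = np.linspace(-10, 10, 21).reshape(-1, 1)     # 21 fixed RBF centers

def design_matrix(Z, width):
    # Phi_ji = exp(-||z_j - c_i||^2 / (2 width^2))
    dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-dist**2 / (2 * width**2))

def fit_and_validate(width):
    w = np.linalg.pinv(design_matrix(X, width)) @ d   # pseudo-inverse weights
    Xv = np.linspace(-10, 10, 200).reshape(-1, 1)     # validation grid
    dv = np.sin(0.5 * Xv).ravel()
    return np.mean((design_matrix(Xv, width) @ w - dv)**2)

for width in (1, 0.2, 200):                           # the three cases on the slide
    print(f"width = {width}: validation MSE = {fit_and_validate(width):.4g}")
```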

33 Change the width of RBF (2)
Results (plots) for width = 1, width = 0.2, and width = 200.

34 Change the number of RBF center (1)
Test case A: number of RBFs = 7, center positions [-8, -5, -2, 0, 2, 5, 8]; Case 1: width = 1; Case 2: width = 6.
Test case B: number of RBFs = 2, center positions [-3, 10]; Case 3: width = 1; Case 4: width = 6.

35 Change the number of RBF center (2)
Results (plots) for 7 centers: width = 1 and width = 6.

36 Change the number of RBF center (3)
Results (plots) for 2 centers: width = 1 and width = 6.

37 Summary
The centers and widths of the RBFs should be chosen appropriately.
The RBF network shows good generalization ability within the training region.

38 Comparison with multilayer NN
RBF networks are used to perform complex (non-linear) pattern classification tasks.
Comparison between RBF networks and multilayer perceptrons (MLPs):
Both are examples of non-linear layered feed-forward networks.
Both are universal approximators.
Hidden layers: RBF networks have a single hidden layer; MLP networks may have more hidden layers.

39 Comparison with multilayer NN
Neuron models: the computation nodes in the hidden layer of an RBF network differ from, and serve a different purpose than, those in the output layer; the computation nodes of an MLP, whether in a hidden or output layer, typically share a common neuron model.
Linearity: the hidden layer of an RBF network is non-linear and its output layer is linear; the hidden and output layers of an MLP are usually all non-linear.

40 Comparison with multilayer NN
Activation functions: the argument of the activation function of each hidden unit in an RBF network is the Euclidean distance between the input vector and the center of that unit; the argument of the activation function of each hidden unit in an MLP is the inner product of the input vector and the synaptic weight vector of that unit.
Approximations: RBF networks using Gaussian functions construct local approximations to a non-linear input-output mapping; MLP networks construct global approximations to a non-linear input-output mapping.


