Last lecture summary.


Last lecture summary

Multilayer perceptron (MLP): the most famous type of neural network. Layers: input layer, hidden layer, output layer.

Processing by one neuron: the inputs are multiplied by the weights, summed together with the bias, and passed through the activation function to give the output.

Linear activation functions: linear, and threshold (the output switches between the cases w∙x > 0 and w∙x ≤ 0).

Nonlinear activation functions: logistic (sigmoid, unipolar) and tanh (bipolar).
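The processing done by one neuron, with the activation functions listed above, can be sketched as follows. This is only an illustration: the function names are made up here, and the 1/0 output of the threshold unit is an assumption (the slides only give the two cases w∙x > 0 and w∙x ≤ 0).

```python
import math


def linear(a):
    # Linear activation: the net input is passed through unchanged.
    return a


def threshold(a):
    # Threshold activation; the 1/0 output values are an assumption.
    return 1.0 if a > 0 else 0.0


def logistic(a):
    # Logistic (sigmoid, unipolar): output lies in (0, 1).
    return 1.0 / (1.0 + math.exp(-a))


def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, then the activation function.
    a = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(a)


x = [1.0, 0.5]
w = [0.4, -0.6]
b = 0.1
for name, f in [("linear", linear), ("threshold", threshold),
                ("logistic", logistic), ("tanh", math.tanh)]:
    print(name, neuron(x, w, b, f))
```

math.tanh is the bipolar counterpart of the logistic function: its output lies in (-1, 1).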

Backpropagation training algorithm
MLP is trained by backpropagation, which is based on steepest descent.
Forward pass: present a training sample to the neural network and calculate the error (MSE) in each output neuron.
Backward pass: first calculate the gradient for the hidden-to-output weights, then calculate the gradient for the input-to-hidden weights. Knowledge of the hidden-to-output gradient is necessary to calculate the input-to-hidden gradient.
Finally, update the weights in the network. β is the learning rate.
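A minimal sketch of one backpropagation step for a network with a single hidden layer. Several things here are assumptions, not taken from the slides: logistic units in both layers, the error E = 0.5 · Σ(y − t)² for one sample, online (per-sample) updates, and all function and variable names.

```python
import math
import random


def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W1, b1, W2, b2):
    # Forward pass: input -> hidden -> output.
    h = [logistic(b + sum(w * xi for w, xi in zip(row, x)))
         for row, b in zip(W1, b1)]
    y = [logistic(b + sum(w * hj for w, hj in zip(row, h)))
         for row, b in zip(W2, b2)]
    return h, y


def backprop_step(x, t, W1, b1, W2, b2, beta):
    h, y = forward(x, W1, b1, W2, b2)
    # Gradient signal at the output neurons (hidden-to-output part).
    d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    # The hidden-layer signal needs d_out: the error propagates backward.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * W2[k][j] for k in range(len(W2)))
             for j, hj in enumerate(h)]
    # Steepest-descent update with learning rate beta.
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] -= beta * d_out[k] * h[j]
        b2[k] -= beta * d_out[k]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] -= beta * d_hid[j] * x[i]
        b1[j] -= beta * d_hid[j]
    # Error measured before this update.
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


rng = random.Random(0)
n_in, n_hid, n_out = 2, 3, 1
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [0.0] * n_out

x, t = [1.0, 0.0], [1.0]
errors = [backprop_step(x, t, W1, b1, W2, b2, beta=0.5) for _ in range(200)]
print("first error:", errors[0], "last error:", errors[-1])
```

Repeating the step on the same sample only shows that the error goes down; real training cycles through the whole training set.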

The input signal propagates forward; the error propagates backward.

Momentum
Online learning vs. batch learning. Online learning: new patterns must be processed as they are introduced. Batch learning improves stability by averaging. Another averaging approach that provides stability is momentum (μ). μ (between 0 and 1) indicates the relative importance of the past weight change ∆w_(m−1) for the new weight increment ∆w_m.
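The slide gives no formula for the update with momentum. One common form, used here purely as an assumption, is ∆w_m = μ · ∆w_(m−1) − β · gradient. A small sketch on a one-dimensional error function (the function E(w) = (w − 3)² and all names are illustrative):

```python
def momentum_descent(grad, w0, beta, mu, steps):
    # grad: function returning dE/dw at w
    # beta: learning rate, mu: momentum (between 0 and 1)
    w, dw = w0, 0.0
    for _ in range(steps):
        # The new increment mixes the past weight change with the gradient.
        dw = mu * dw - beta * grad(w)
        w += dw
    return w


def grad(w):
    # Gradient of the illustrative error E(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)


print(momentum_descent(grad, w0=0.0, beta=0.1, mu=0.9, steps=500))
print(momentum_descent(grad, w0=0.0, beta=0.1, mu=0.0, steps=500))
```

With μ = 0 this reduces to plain steepest descent; both runs end up close to the minimum at w = 3.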

Other improvements
Delta-Bar-Delta (Turboprop): each weight has its own learning rate β.
Second order methods: use the Hessian matrix (how fast does the rate of increase of the function change in a small neighborhood, i.e. the curvature). Examples: QuickProp, Gauss-Newton, Levenberg-Marquardt. They need fewer epochs but are computationally expensive (Hessian inverse, storage).

New stuff

Bias-variance
Just a small reminder.
Bias (lack of fit, underfitting): the model does not fit the data well enough; it is not flexible enough (too few parameters).
Variance (overfitting): the model is too flexible (too many parameters) and fits noise.
Bias-variance tradeoff: improving the generalization ability of the model (i.e. finding the correct amount of flexibility).

Parameters in MLP: weights
If you use one more hidden neuron, by how much does the number of weights increase? By # input neurons + # output neurons (for one hidden layer, not counting bias weights).
If an MLP is used for a regression task, be careful! To use an MLP in a statistically correct way, the number of degrees of freedom (i.e. weights) can't exceed the number of data points. Compare with the polynomial regression example from the 2nd lecture.
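A quick check of the weight count for an MLP with one hidden layer. The helper name and the optional bias handling are illustrative; with bias weights, each extra hidden neuron adds one more weight than the slide's count.

```python
def count_weights(n_in, n_hidden, n_out, bias=True):
    # Each hidden neuron has n_in incoming weights (+1 bias if used);
    # each output neuron has n_hidden incoming weights (+1 bias if used).
    extra = 1 if bias else 0
    return n_hidden * (n_in + extra) + n_out * (n_hidden + extra)


n_in, n_out = 4, 3
for n_hidden in (5, 6):
    print(n_hidden, count_weights(n_in, n_hidden, n_out, bias=False))

# Adding one hidden neuron (no bias): increase = n_in + n_out
print(count_weights(4, 6, 3, bias=False) - count_weights(4, 5, 3, bias=False))
```

Comparing this count with the number of data points gives a rough check of the degrees-of-freedom rule for regression.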

Improving generalization of MLP
Flexibility comes from hidden neurons. Choose the number of hidden neurons so that neither underfitting nor overfitting occurs. The three most common approaches: exhaustive search, early stopping, regularization.

Exhaustive search: increase the number of hidden units and monitor the performance on the validation data set. Figure: x-axis shows the number of neurons.

Early stopping
A fixed and large number of neurons is used. The network is trained while its performance on a validation set is tested at regular intervals. The minimum of the validation error gives the correct weights. Figure: x-axis shows epochs.
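The early stopping loop can be sketched generically. Everything named here (the callables, ToyModel, check_every) is hypothetical; the toy model only stands in for a real network so that the loop can be run.

```python
def train_with_early_stopping(train_one_epoch, validation_error, get_weights,
                              max_epochs, check_every=1):
    # Keep the weights seen at the minimum of the validation error.
    best_err = float("inf")
    best_epoch = 0
    best_weights = get_weights()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if epoch % check_every == 0:  # test at regular intervals
            err = validation_error()
            if err < best_err:
                best_err, best_epoch, best_weights = err, epoch, get_weights()
    return best_weights, best_epoch, best_err


class ToyModel:
    # Stand-in for a network: one scalar "weight" that grows every epoch,
    # with a validation error that is smallest at w = 5.
    def __init__(self):
        self.w = 0.0

    def train_one_epoch(self):
        self.w += 1.0

    def validation_error(self):
        return (self.w - 5.0) ** 2 + 1.0

    def get_weights(self):
        return self.w


model = ToyModel()
weights, epoch, err = train_with_early_stopping(
    model.train_one_epoch, model.validation_error, model.get_weights,
    max_epochs=20)
print(weights, epoch, err)
```

For a real network, get_weights would have to return a copy of the weights, since training keeps modifying them.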

Weight decay (regularization)
Idea: keep the growth of the weights to a minimum, in such a way that unimportant weights are pulled toward zero. Only the important weights are allowed to grow; the others are forced to decay.

This is achieved by minimizing not the MSE alone but W, the sum of the MSE and a second term, the regularization term, which is built from the sum of the squared weights.
m: number of weights in the network.
δ: regularization parameter; the larger the δ, the more important the regularization.
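The formula itself did not survive in the transcript. A sketch assuming the form W = MSE + δ · Σ w_j² (summed over all m weights); the exact scaling of the regularization term on the original slide may differ, and the function names are illustrative.

```python
def regularized_error(mse, weights, delta):
    # Assumed form: W = MSE + delta * (sum of squared weights).
    return mse + delta * sum(w * w for w in weights)


def decay_step(weights, mse_grads, delta, beta):
    # Gradient of delta * sum(w^2) with respect to w_j is 2 * delta * w_j,
    # so every weight is pulled toward zero in addition to the MSE gradient.
    return [w - beta * (g + 2.0 * delta * w)
            for w, g in zip(weights, mse_grads)]


weights = [2.0, -1.0]
print(regularized_error(0.5, weights, delta=0.1))

# With a zero MSE gradient the weights simply decay toward zero.
print(decay_step(weights, [0.0, 0.0], delta=0.1, beta=0.5))
```

A weight keeps its size only if the MSE gradient pushes against the decay, which is how the important weights survive.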

Network pruning
Both early stopping and weight decay use all weights in the NN. They do not reduce the complexity of the model. Network pruning: reduce complexity by keeping only essential weights/neurons. Several pruning approaches exist, e.g. optimal brain damage (OBD), optimal brain surgeon (OBS), optimal cell damage (OCD).

OBD
Based on sensitivity analysis: systematically change parameters in a model to determine the effects of such changes. Weights that are not important for the input-output mapping are removed. The importance (saliency, i.e. significance) of a weight is measured by the cost of setting that weight to zero. The saliency can be computed from the Hessian.

How to perform OBD?
1. Train a flexible network in the normal way (i.e. use early stopping, weight decay, …).
2. Compute the saliency of each weight.
3. Remove the weights with small saliencies.
4. Train the reduced network again with the kept weights, initializing the training with the values obtained in the previous step.
5. Repeat from step 1.
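Steps 2 and 3 can be sketched as follows. The slides do not give the saliency formula; the sketch assumes the usual OBD approximation s_i = 0.5 · H_ii · w_i² with a diagonal Hessian. The numbers and function names are made up for illustration.

```python
def obd_saliencies(weights, hessian_diag):
    # Assumed OBD saliency: s_i = 0.5 * H_ii * w_i^2 (diagonal Hessian only).
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_smallest(weights, hessian_diag, n_remove):
    s = obd_saliencies(weights, hessian_diag)
    # Indices ordered from the least to the most salient weight.
    order = sorted(range(len(weights)), key=lambda i: s[i])
    removed = set(order[:n_remove])
    mask = [0 if i in removed else 1 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask


weights = [0.8, -0.05, 1.5, 0.3]
hessian_diag = [2.0, 4.0, 0.1, 1.0]  # illustrative second derivatives
print(obd_saliencies(weights, hessian_diag))
print(prune_smallest(weights, hessian_diag, n_remove=2))
```

Note that the largest weight (1.5) is not the most salient one here: where the error surface is flat, even a large weight can be cheap to remove. The mask would then keep the removed weights at zero during retraining.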

Radial Basis Function Networks

Radial Basis Function (RBF) Network Becoming an increasingly popular neural network. Is probably the main rival to the MLP. Completely different approach by viewing the design of a neural network as an approximation problem in high-dimensional space. Uses radial functions as activation function.

Gaussian RBF
A typical radial function is the Gaussian RBF. The response decreases with distance from a central point. Parameters: center c and width (radius) r.
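The formula for the Gaussian RBF is not preserved in the transcript. The sketch below assumes the form exp(−‖x − c‖² / (2r²)); this choice is consistent with the XOR slide later in the deck, which sets 2r² = 1. The function name is illustrative.

```python
import math


def gaussian_rbf(x, c, r):
    # Squared Euclidean distance from the center c.
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    # Assumed form of the Gaussian RBF: exp(-d^2 / (2 r^2)).
    return math.exp(-d2 / (2.0 * r * r))


center = (0.0, 0.0)
for point in [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (3.0, 0.0)]:
    print(point, gaussian_rbf(point, center, r=1.0))
```

The response is 1 at the center and falls toward 0 with growing distance; a smaller r makes it fall off faster.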

Local vs. global units
Local units are localized (i.e., non-zero) in just a certain part of the space: Gaussian.
Global units: sigmoid, linear.

Figure: classification using global (MLP) and local (RBF) units (Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009).

RBFN architecture
Each of the n components of the input vector x feeds forward to m basis functions, whose outputs h are linearly combined with weights w (i.e. the dot product h∙w) into the network output f(x). There are no weights between the input layer and the hidden layer.
Figure: input layer (x1 … xn), hidden layer of RBFs (h1 … hm), output layer (weights W1 … Wm, output f(x)) (Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009).

Figure: 2D Gaussian units combined by a sum (Σ) (Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009).

The basic architecture of an RBF network is a 3-layer network. The input layer is simply a fan-out layer and does no processing. The hidden layer performs a non-linear mapping from the input space into a (usually) higher dimensional space in which the patterns become linearly separable. The output layer performs a simple weighted sum (i.e. w∙h, where h is the vector of hidden-layer outputs). If the RBFN is used for regression, this output is fine. However, if pattern classification is required, a hard-limiter or sigmoid function can be placed on the output neurons to give 0/1 output values.
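The three layers can be sketched as a forward pass. Assumptions: Gaussian hidden units of the form exp(−‖x − c‖² / (2r²)), no bias on the output neuron, and a hard limiter with threshold 0.5; the centers, radii and weights below are made-up numbers.

```python
import math


def hidden_outputs(x, centers, radii):
    # Hidden layer: one Gaussian RBF per center (no weights on the inputs).
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * r * r))
            for c, r in zip(centers, radii)]


def rbfn(x, centers, radii, weights):
    # Output layer: simple weighted sum of the hidden outputs.
    h = hidden_outputs(x, centers, radii)
    return sum(w * hj for w, hj in zip(weights, h))


def hard_limiter(f, threshold=0.5):
    # For classification: turn the output into 0/1 (threshold is assumed).
    return 1 if f > threshold else 0


centers = [(0.0, 0.0), (10.0, 10.0)]
radii = [1.0, 1.0]
weights = [2.0, -1.0]

for x in [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]:
    f = rbfn(x, centers, radii, weights)
    print(x, f, hard_limiter(f))
```

For regression the raw value f is used directly; for classification the hard limiter (or a sigmoid) is applied on top.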

Clustering The unique feature of the RBF network is the process performed in the hidden layer. The idea is that the patterns in the input space form clusters. If the centres of these clusters are known, then the distance from the cluster centre can be measured.

Furthermore, this distance measure is made non-linear, so that if a pattern is in an area that is close to a cluster centre it gives a value close to 1. Beyond this area, the value drops dramatically. The notion is that this area is radially symmetrical around the cluster centre, thus the non-linear function becomes known as the radial-basis function. Figure: non-linearly transformed distance plotted against the distance from the center of the cluster.

RBFN for classification. Figure labels: Category 1, Category 2, Σ.

RBFN for regression. From http://diwww.epfl.ch/mantra/tutorial/english/rbf/html/, but this page no longer works.

XOR problem. The inputs 0,0 and 1,1 give 0 as output.

XOR problem
2 inputs x1, x2; 2 hidden units; one output. The parameters of the hidden neurons are set as follows. Centers: c1 = <0,0>, c2 = <1,1>. Radius: r is chosen such that 2r² = 1. φ1, φ2 are the outputs of the hidden neurons h1, h2.
Figure: the network (x1, x2, h1, h2) and the input patterns plotted in the (φ1, φ2) plane, with the values 1, 0.1 and 0.4 marked.

When mapped into the feature space <h1, h2>, the two classes become linearly separable. So a linear classifier with h1(x) and h2(x) as inputs can be used to solve the XOR problem. The linear classifier is represented by the output layer.
Figure: the patterns 0,0, 0,1, 1,0 and 1,1 in the input space (x1, x2) and in the feature space (φ1, φ2); 0,0 and 1,1 give 0 as output.
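The XOR mapping can be checked numerically. The sketch assumes Gaussian hidden units exp(−‖x − c‖² / (2r²)) with 2r² = 1, as on the slide; the output weights and bias of the linear classifier are hand-picked for illustration and are not taken from the slides.

```python
import math

C1, C2 = (0.0, 0.0), (1.0, 1.0)


def phi(x, c):
    # With 2 r^2 = 1 the Gaussian reduces to exp(-||x - c||^2).
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def xor_rbf(x, w=(-1.0, -1.0), bias=0.95):
    # Linear classifier on the features (phi1, phi2); w and bias are
    # hypothetical values chosen by hand.
    s = bias + w[0] * phi(x, C1) + w[1] * phi(x, C2)
    return 1 if s > 0 else 0


for x in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    p1, p2 = phi(x, C1), phi(x, C2)
    print(x, round(p1, 1), round(p2, 1), xor_rbf(x))
```

Rounded to one decimal place, the features take the values 1.0, 0.1 and 0.4, which are the numbers marked in the slide's figure. The patterns 0,1 and 1,0 land on the same point (0.4, 0.4), so a single line separates them from 0,0 and 1,1.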

RBF Learning
Design decision: the number of hidden neurons. Max of neurons = number of input patterns; min of neurons: has to be determined. More neurons: more complex network, smaller tolerance.
Parameters to be learnt: centers, radii, and the weights between the hidden and output layers.
A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the radius. Smaller radius: fits the training data better (overfitting). Larger radius: less sensitivity, less overfitting, a network of smaller size, faster execution.

Learning can be divided into two independent tasks: (1) determination of centers and radii, (2) learning of the output layer weights.
Learning strategies for RBF parameters: sample center positions randomly from the training data; self-organized selection of centers; both layers are learnt using supervised learning.

Select centers at random
Choose centers randomly from the training set. Radius r is calculated from a formula (not preserved in this transcript). Weights are found by numerical linear algebra. This approach requires a large training set for a satisfactory level of performance.
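A sketch of this strategy for one-dimensional inputs. Assumptions: Gaussian units exp(−(x − c)² / (2r²)), a radius passed in by the caller (the slide's formula for r is not available), and least squares via the normal equations as the "numerical linear algebra" step. All names are illustrative.

```python
import math
import random


def gaussian(x, c, r):
    return math.exp(-((x - c) ** 2) / (2.0 * r * r))


def solve(A, b):
    # Gaussian elimination with partial pivoting for A w = b
    # (assumes A is square and non-singular).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def fit_rbfn(xs, ys, m, r, seed=0):
    # Centers: m training inputs chosen at random.
    centers = random.Random(seed).sample(xs, m)
    # Design matrix H: one row per pattern, one column per hidden unit.
    H = [[gaussian(x, c, r) for c in centers] for x in xs]
    # Normal equations (H^T H) w = H^T y give the least-squares weights.
    HtH = [[sum(row[i] * row[j] for row in H) for j in range(m)] for i in range(m)]
    Hty = [sum(row[i] * y for row, y in zip(H, ys)) for i in range(m)]
    return centers, solve(HtH, Hty)


def predict(x, centers, w, r):
    return sum(wi * gaussian(x, c, r) for wi, c in zip(w, centers))


xs = [i * 0.5 for i in range(13)]
ys = [math.sin(x) for x in xs]
centers, w = fit_rbfn(xs, ys, m=4, r=1.0)
for x, y in zip(xs, ys):
    print(round(x, 1), round(y, 3), round(predict(x, centers, w, 1.0), 3))
```

The quality of the fit depends on which centers happen to be drawn, which is one reason this strategy needs a large training set.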

Self-organized selection of centers
Centers are selected using the k-means clustering algorithm. Radii are usually found using k-NN: find the k nearest centers. The root-mean-squared distance between the current cluster centre and its k (typically 2) nearest neighbours is calculated, and this is the value chosen for r. The output layer is learnt using a gradient descent technique.
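A sketch of the first two steps: k-means for the centers, then the root-mean-squared distance to the nearest centers for the radii. The initialization with the first k points, the fixed number of iterations, and the toy data are assumptions made to keep the example short and deterministic.

```python
import math


def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmeans(points, k, iterations=20):
    # Naive initialization: the first k points (an assumption for this sketch).
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def radii_from_neighbours(centers, k_nearest=2):
    # r = root-mean-squared distance to the k nearest other centers
    # (needs at least two centers).
    radii = []
    for i, c in enumerate(centers):
        d = sorted(dist(c, o) for j, o in enumerate(centers) if j != i)[:k_nearest]
        radii.append(math.sqrt(sum(x * x for x in d) / len(d)))
    return radii


points = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0),
          (0.2, 0.1), (5.1, -0.1), (0.1, 5.2),
          (-0.2, 0.1), (4.9, 0.2), (-0.1, 4.8)]
centers = kmeans(points, k=3)
radii = radii_from_neighbours(centers, k_nearest=2)
print(centers)
print(radii)
```

The resulting centers and radii would then be fixed while the output weights are trained.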

Supervised learning
Supervised learning of all parameters (centers, radii, weights) using gradient descent. There are mathematical formulas for updating all of these parameters. They are not shown here; I don't want to scare you more than necessary.

RBFN and MLP
An RBFN trains faster than an MLP. Although the RBFN is quick to train, it is slower in retrieval than an MLP. RBFNs are essentially well-established statistical techniques presented as neural networks. Learning mechanisms in statistical neural networks are not biologically plausible. An RBFN can give an "I don't know" answer. RBFNs construct local approximations to a non-linear I/O mapping. MLPs construct global approximations to a non-linear I/O mapping.