Lecture Slides for Introduction to Machine Learning 2e, Chapter 12: Local Models
Ethem Alpaydın © The MIT Press, 2010

Introduction
Function approximation: divide the input space into local regions and learn a simple (constant or linear) model in each patch.
Unsupervised (competitive models, online clustering):
Adaptive resonance theory (ART)
Self-organizing map (SOM)
Supervised:
Radial basis function network (RBF)
Mixture of experts (MoE)

Competitive Learning
Online k-means
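The update equations on this slide did not survive extraction; the following restates the standard online k-means (competitive learning) rule from the textbook. Given instance x^t, the winner is the closest center, and only the winner is moved toward the instance:

    i = \arg\min_j \|x^t - m_j\|, \qquad \Delta m_i = \eta \,(x^t - m_i)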

Online k-means
The advantages of an online method:
We do not need extra memory to store the whole training set.
The update at each step is simple to implement.
The method can track an input distribution that changes in time.
Related ideas: Hebbian learning; the online k-means algorithm.
Note: the batch version of k-means was given in an earlier figure.
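As a concrete illustration of the rule above, here is a minimal online k-means sketch in Python (NumPy). The function name, parameters, and random initialization are illustrative choices, not the book's code.

    import numpy as np

    def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
        """Online (competitive) k-means: move only the winning center toward
        each presented instance. X: (N, d) data matrix; k: number of centers."""
        rng = np.random.default_rng(seed)
        m = X[rng.choice(len(X), size=k, replace=False)].copy()  # init centers from data
        for _ in range(epochs):
            for x in rng.permutation(X):                    # present instances one at a time
                i = np.argmin(((x - m) ** 2).sum(axis=1))   # winner = closest center
                m[i] += eta * (x - m[i])                    # update only the winner
        return m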

Winner-take-all
Only the winning unit is updated; the losing units are not updated at all.
The winner-take-all competitive neural network is a network of k perceptrons with recurrent connections at the output. In the figure, dashed lines are recurrent connections; those ending with an arrow are excitatory and those ending with a circle are inhibitory. Each output unit reinforces its own value and tries to suppress the other outputs. Under a suitable assignment of these recurrent weights, the maximum suppresses all the others.

Adaptive Resonance Theory (ART)
Incremental: a new cluster is added if the instance is not covered by an existing one; coverage is defined by the vigilance parameter ρ (Carpenter and Grossberg, 1988).
If the closest unit m_i is within vigilance of the instance, the instance belongs to m_i and updates it; otherwise a new class is created.
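The slide's equations were lost; restated in the textbook's notation, the ART rule is:

    b_i = \|x^t - m_i\| = \min_{l} \|x^t - m_l\|
    \text{if } b_i > \rho:\quad m_{k+1} \leftarrow x^t \quad (\text{create a new unit})
    \text{else:}\quad \Delta m_i = \eta\,(x^t - m_i) \quad (x^t \text{ belongs to } m_i)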

Self-Organizing Maps (SOM)
Units have a neighborhood defined on them: in a one-dimensional map, m_i is “between” m_{i-1} and m_{i+1}, and all of them are updated together when unit i wins (Kohonen, 1990).
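The update equation did not survive extraction; the standard SOM rule is given below, where i is the winning unit and e(l, i) is a neighborhood function (for example a Gaussian in index distance) whose width decreases over training:

    \Delta m_l = \eta\, e(l, i)\,(x^t - m_l), \qquad e(l, i) = \exp\!\left[-\frac{(l - i)^2}{2\sigma^2}\right]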

Radial-Basis Functions
Locally tuned units: each hidden unit has a bell-shaped (Gaussian) basis function centered at m_h with spread s_h (see Figure 12.8).
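The equations on this slide were lost in transcription; the standard form from the textbook is reproduced here. The basis function activation and the RBF output for input x^t are:

    p_h^t = \exp\!\left[-\frac{\|x^t - m_h\|^2}{2 s_h^2}\right], \quad h = 1, \ldots, H, \qquad
    y^t = \sum_{h=1}^{H} w_h\, p_h^t + w_0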

Local vs. Distributed Representation
Distributed representation: the input is encoded by the simultaneous activation of many hidden units. Example: multilayer perceptron (MLP).
Local representation: for a given input, only one or a few units are active. Example: radial basis functions.

Local vs. Distributed Representation (figure)

Training RBF
Hybrid learning (Figure 12.8):
First layer (centers and spreads): unsupervised k-means.
Second layer (weights): supervised gradient descent.
Alternatively, all parameters can be trained in a fully supervised manner (Broomhead and Lowe, 1988; Moody and Darken, 1989).
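A minimal sketch of hybrid training in Python (NumPy), assuming the Gaussian basis functions given above. For brevity the second-layer weights are solved by linear least squares instead of the gradient descent mentioned on the slide; function and variable names, the spread heuristic, and the number of k-means iterations are illustrative assumptions.

    import numpy as np

    def train_rbf_hybrid(X, r, H, seed=0):
        """Hybrid RBF training sketch: unsupervised centers/spreads,
        then a linear solve for the second-layer weights."""
        rng = np.random.default_rng(seed)
        # 1) First layer: H centers by simple batch k-means
        m = X[rng.choice(len(X), size=H, replace=False)].copy()
        for _ in range(20):
            d = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # (N, H) squared distances
            labels = d.argmin(axis=1)
            for h in range(H):
                if np.any(labels == h):
                    m[h] = X[labels == h].mean(axis=0)
        # Spread s_h: root-mean-square distance of the instances assigned to center h (heuristic)
        d = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        s = np.array([np.sqrt(d[labels == h, h].mean()) if np.any(labels == h) else 1.0
                      for h in range(H)]) + 1e-8
        # 2) Second layer: Gaussian activations plus a bias unit, weights by least squares
        P = np.exp(-((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2) / (2 * s ** 2))
        P = np.hstack([P, np.ones((len(X), 1))])
        W, *_ = np.linalg.lstsq(P, r, rcond=None)   # weights w_{ih} and bias w_{i0}
        return m, s, W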

Regression
The second-layer weights (and, in the fully supervised case, also the centers and spreads) are found by minimizing the batch error over the training set.
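The error function and update rules were lost from this slide; their standard form in the textbook's notation, with outputs y_i^t = \sum_h w_{ih} p_h^t + w_{i0}, is:

    E\left(\{m_h, s_h, w_{ih}\} \mid \mathcal{X}\right) = \frac{1}{2} \sum_t \sum_i \left(r_i^t - y_i^t\right)^2
    \Delta w_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) p_h^t

In the fully supervised case, the chain rule also gives updates for the centers and spreads:

    \Delta m_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}, \qquad
    \Delta s_h = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) w_{ih}\right] p_h^t\, \frac{\|x^t - m_h\|^2}{s_h^3}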

Classification
The outputs are a softmax over linear combinations of the basis functions, and the cross-entropy error is minimized. Update rules can similarly be derived using gradient descent (Exercise 2).
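For reference, the missing equations in the textbook's notation: the outputs and the cross-entropy error are

    y_i^t = \frac{\exp o_i^t}{\sum_k \exp o_k^t}, \qquad o_i^t = \sum_{h=1}^{H} w_{ih}\, p_h^t + w_{i0}, \qquad
    E = -\sum_t \sum_i r_i^t \log y_i^t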

Rules and Exceptions
The constant term w_0 models the default rule, and the Gaussian units model the exceptions that locally override it.

Rule-Based Knowledge
Prior knowledge can be incorporated before training: in the RBF framework, a rule whose antecedent is a disjunction of conjunctions is encoded by one Gaussian unit per conjunction (the slide's example uses two Gaussian units), as illustrated below.
The better our initial knowledge of a problem, the faster we can achieve good performance and the less training is required.
This is closely related to fuzzy membership functions and fuzzy rules.
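The rule and its encoding were lost from the slide; the following is an illustrative example of this kind of encoding. The constants a, b, c and the output value 0.1 are assumptions for illustration, not recovered from the slide:

    \text{IF } (x_1 \approx a \text{ AND } x_2 \approx b) \text{ OR } (x_3 \approx c) \text{ THEN } y \approx 0.1
    p_1^t = \exp\!\left[-\frac{(x_1^t - a)^2}{2 s_1^2} - \frac{(x_2^t - b)^2}{2 s_2^2}\right], \qquad
    p_2^t = \exp\!\left[-\frac{(x_3^t - c)^2}{2 s_3^2}\right], \qquad w_1 = w_2 = 0.1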

Normalized Basis Functions
Why normalize?
To make sure that for any input there is at least one nonzero unit.
To make sure that the values of the local units sum up to 1.
The slide compares the unit responses before and after normalization.
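The normalization equation itself was lost; in the textbook's notation the normalized basis functions and the corresponding output are:

    g_h^t = \frac{p_h^t}{\sum_{l=1}^{H} p_l^t} = \frac{\exp\!\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_{l} \exp\!\left[-\|x^t - m_l\|^2 / (2 s_l^2)\right]}, \qquad
    y_i^t = \sum_{h=1}^{H} w_{ih}\, g_h^t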

Competitive Basis Functions
In an RBF network, the final output is a (cooperative) weighted sum of the contributions of all local units. With competitive basis functions, we instead assume that the output is drawn from a mixture model: the gating values are the mixture proportions, and each hidden unit h is a mixture component that generates the output if that component is chosen. In regression, we then maximize the likelihood under this probabilistic model.
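The mixture model equation, lost from the slide, is in the textbook's notation:

    p(r^t \mid x^t) = \sum_{h=1}^{H} p(h \mid x^t)\; p(r^t \mid h, x^t)

where p(h | x^t) = g_h^t are the mixture proportions (for example, the normalized basis functions) and p(r^t | h, x^t) are the component densities.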

Regression
Let us take the case of regression where the components are Gaussian. We write the log likelihood of the sample and maximize it by gradient ascent.
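The missing log likelihood (Gaussian components with unit variance) and a representative update rule are sketched below in the textbook's notation, where y_{ih}^t = w_{ih} are the constant local fits and f_h^t is the posterior responsibility of unit h:

    \mathcal{L} = \sum_t \log \sum_{h=1}^{H} g_h^t \exp\!\left[-\frac{1}{2}\sum_i \left(r_i^t - y_{ih}^t\right)^2\right]
    f_h^t = \frac{g_h^t \exp\!\left[-\frac{1}{2}\sum_i (r_i^t - y_{ih}^t)^2\right]}{\sum_l g_l^t \exp\!\left[-\frac{1}{2}\sum_i (r_i^t - y_{il}^t)^2\right]}, \qquad
    \Delta w_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t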

Classification
In classification, each component by itself is a multinomial, and we again maximize the log likelihood by gradient ascent.
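The missing log likelihood for this case, with the component class probabilities y_{ih}^t given by a softmax over the second-layer weights, is:

    \mathcal{L} = \sum_t \log \sum_{h=1}^{H} g_h^t \prod_i \left(y_{ih}^t\right)^{r_i^t}, \qquad
    y_{ih}^t = \frac{\exp w_{ih}}{\sum_k \exp w_{kh}}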

EM for RBF (Supervised EM)
E-step: compute the posterior responsibilities of the hidden units given both the input and the desired output.
M-step: re-estimate the centers, spreads, and weights using these responsibilities.
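The E- and M-step equations were lost; a sketch of their standard form is given below, with responsibilities f_h^t as defined earlier and isotropic Gaussians of dimension d assumed for simplicity:

    \text{E-step:}\quad f_h^t = p(h \mid x^t, r^t) = \frac{g_h^t\, p_h(r^t \mid x^t)}{\sum_l g_l^t\, p_l(r^t \mid x^t)}
    \text{M-step:}\quad m_h = \frac{\sum_t f_h^t x^t}{\sum_t f_h^t}, \qquad
    s_h^2 = \frac{\sum_t f_h^t \|x^t - m_h\|^2}{d \sum_t f_h^t}, \qquad
    w_{ih} = \frac{\sum_t f_h^t r_i^t}{\sum_t f_h^t}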

Learning Vector Quantization (LVQ)
There are H prelabeled units per class (Kohonen, 1990). Given x, let m_i be the closest unit:
If the closest center has the correct label, it is moved toward the input to represent it better.
If it belongs to the wrong class, it is moved away from the input, in the expectation that if it is moved sufficiently far away, a center of the correct class will become the closest in a future iteration.
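The update equations, lost from the slide, are:

    \Delta m_i = +\eta\,(x^t - m_i) \quad \text{if } x^t \text{ and } m_i \text{ have the same class label}
    \Delta m_i = -\eta\,(x^t - m_i) \quad \text{otherwise}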

Mixture of Experts (MoE)
In an RBF network, each local fit is a constant: the second-layer weight w_ih.
In a mixture of experts, each local fit is a linear function of x, a local expert (Jacobs et al., 1991). The mixture of experts can thus be seen as an RBF network where the second-layer weights are the outputs of linear models (only one linear model is shown in the figure for clarity).
MoE therefore implements a piecewise linear approximation.
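The defining equations, missing from the slide, are (linear experts, gating values g_h^t):

    y_h^t = v_h^{\top} x^t \quad (\text{local expert } h), \qquad
    y^t = \sum_{h=1}^{H} g_h^t\, y_h^t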

MoE as Models Combined
The mixture of experts can also be seen as a model for combining multiple models: the experts w_h are the models, and the gating network is another model that determines the weight of each model, as given by g_h. The gating can be radial (localized) or softmax.
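The two gating functions named on the slide have the following standard forms:

    \text{Radial gating:}\quad g_h^t = \frac{\exp\!\left[-\|x^t - m_h\|^2/(2 s_h^2)\right]}{\sum_l \exp\!\left[-\|x^t - m_l\|^2/(2 s_l^2)\right]}
    \text{Softmax gating:}\quad g_h^t = \frac{\exp\!\left(m_h^{\top} x^t\right)}{\sum_l \exp\!\left(m_l^{\top} x^t\right)}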

Cooperative MoE: Regression
In the cooperative case, there is no force on the units to be localized.
In the competitive case, to increase the likelihood, units are forced to be localized, with more separation between them and smaller spreads.

Competitive MoE: Regression and Classification
As with competitive basis functions, the regression and classification cases are handled by maximizing the corresponding mixture log likelihoods, now with linear experts as the components.

Hierarchical Mixture of Experts
A tree of MoEs, where each MoE is an expert in a higher-level MoE.
This is a soft decision tree: it takes a weighted (gating) average of all leaves (experts), as opposed to following a single path to a single leaf.
It can be trained using EM (Jordan and Jacobs, 1994).

Discussion
Cooperative vs. competitive models:
The cooperative model is generally more accurate; the competitive model learns faster.
Why? In the cooperative case, the models overlap more and implement a smoother approximation, which is preferable in regression problems. The competitive model makes a harder split; generally only one expert is active for a given input, and therefore learning is faster.