Kernel Methods – Gaussian Processes
Presented by Shankar Bhargav
Arizona State University DMML

Gaussian Processes
Extending the role of kernels to probabilistic discriminative models leads to the framework of Gaussian processes.
– Linear regression model: evaluate the posterior distribution over the weights w.
– Gaussian processes: define a probability distribution over functions directly.

Linear regression
Consider the linear model y(x) = w^T φ(x), where x is the input vector and w is an M-dimensional weight vector.
The prior distribution of w is given by the Gaussian form p(w) = N(w | 0, α^{-1} I).
The prior distribution over w induces a probability distribution over functions y(x).

Linear regression
The vector y with elements y_n = y(x_n) is a linear combination of Gaussian distributed variables given by the elements of w: y = Φw, where Φ is the design matrix with elements Φ_{nk} = φ_k(x_n). Hence y is itself Gaussian.
We need only the mean and covariance to find the joint distribution of y:
E[y] = Φ E[w] = 0
cov[y] = E[y y^T] = Φ E[w w^T] Φ^T = (1/α) Φ Φ^T = K
where K is the Gram matrix with elements K_{nm} = k(x_n, x_m) = (1/α) φ(x_n)^T φ(x_m).
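This result is easy to check numerically. The following sketch (Python with NumPy; the Gaussian basis functions, the value of α, and the input grid are illustrative assumptions, not from the slides) samples w from the prior and compares the empirical covariance of y = Φw with the Gram matrix K:

import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0                          # prior precision: p(w) = N(w | 0, alpha^-1 I)
x = np.linspace(-1.0, 1.0, 5)        # a few input points x_n
centers = np.linspace(-1.0, 1.0, 9)  # centers of M = 9 Gaussian basis functions

# Design matrix Phi with elements Phi_nk = phi_k(x_n)
Phi = np.exp(-(x[:, None] - centers[None, :])**2 / 0.1)

# Monte Carlo: sample many w ~ N(0, alpha^-1 I) and form y = Phi w
W = rng.normal(scale=alpha**-0.5, size=(len(centers), 100_000))
Y = Phi @ W                          # each column is one sampled function
emp_cov = np.cov(Y)                  # empirical covariance of y across samples

K = Phi @ Phi.T / alpha              # analytic Gram matrix K = (1/alpha) Phi Phi^T
print(np.abs(emp_cov - K).max())     # small, and shrinks with more samples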

Gaussian Processes
Definition: a Gaussian process is a probability distribution over functions y(x) such that the set of values of y(x) evaluated at an arbitrary set of points x_1, ..., x_N jointly have a Gaussian distribution.
– The mean is assumed to be zero.
– The covariance of y(x) evaluated at any two values of x is given by the kernel function: E[y(x_n) y(x_m)] = k(x_n, x_m).
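The definition suggests a direct way to draw sample functions from a GP prior: evaluate the kernel on a grid of points and sample from the resulting joint Gaussian. A minimal sketch (Python/NumPy; the squared-exponential kernel, its length scale, and the jitter term are my choices for illustration):

import numpy as np

def rbf_kernel(xa, xb, length_scale=0.3):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))
    return np.exp(-0.5 * (xa[:, None] - xb[None, :])**2 / length_scale**2)

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
K = rbf_kernel(x, x)

# y evaluated on the grid is jointly Gaussian: y ~ N(0, K).
# A small jitter on the diagonal keeps the Cholesky factorization stable.
L = np.linalg.cholesky(K + 1e-10 * np.eye(len(x)))
samples = L @ rng.normal(size=(len(x), 5))   # five sample functions y(x), one per column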

Gaussian Processes for regression
To apply Gaussian process models to regression we need to take account of noise on the observed target values: t_n = y_n + ε_n.
Consider noise processes with a Gaussian distribution, p(t_n | y_n) = N(t_n | y_n, β^{-1}), so that p(t | y) = N(t | y, β^{-1} I_N).
To find the marginal distribution over t we need to integrate over y:
p(t) = ∫ p(t | y) p(y) dy = N(t | 0, C)
where the covariance matrix C has elements C(x_n, x_m) = k(x_n, x_m) + β^{-1} δ_{nm}.

Gaussian Processes for regression
The joint distribution over t_1, ..., t_{N+1} is given by p(t_{N+1}) = N(t_{N+1} | 0, C_{N+1}), with the partitioned covariance
C_{N+1} = [ C_N  k ; k^T  c ]
The conditional distribution p(t_{N+1} | t) is a Gaussian distribution with mean and covariance given by
m(x_{N+1}) = k^T C_N^{-1} t
σ^2(x_{N+1}) = c − k^T C_N^{-1} k
where the vector k has elements k(x_n, x_{N+1}), the scalar c = k(x_{N+1}, x_{N+1}) + β^{-1}, and C_N is the N×N covariance matrix defined on the previous slide.
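These two equations are all that is needed to make predictions with a GP. A minimal sketch (Python/NumPy; the kernel, the toy data, and the value of β are assumptions for illustration):

import numpy as np

def rbf_kernel(xa, xb, length_scale=0.3):
    return np.exp(-0.5 * (xa[:, None] - xb[None, :])**2 / length_scale**2)

beta = 25.0                               # assumed noise precision
X = np.array([-0.8, -0.3, 0.1, 0.6])      # training inputs
t = np.sin(2 * np.pi * X)                 # toy training targets
Xs = np.linspace(-1.0, 1.0, 100)          # test inputs x_{N+1}

C_N = rbf_kernel(X, X) + np.eye(len(X)) / beta  # C(x_n, x_m) = k(x_n, x_m) + delta_nm / beta
k = rbf_kernel(X, Xs)                           # column j holds the vector k for test point j
c = 1.0 + 1.0 / beta                            # k(x, x) = 1 for this kernel

mean = k.T @ np.linalg.solve(C_N, t)                   # m(x_{N+1}) = k^T C_N^{-1} t
var = c - np.sum(k * np.linalg.solve(C_N, k), axis=0)  # sigma^2 = c - k^T C_N^{-1} k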

Learning the hyperparameters
Rather than fixing the covariance function, we can use a parametric family of functions and then infer the parameter values from the data.
This requires evaluation of the likelihood function p(t | θ), where θ denotes the hyperparameters of the Gaussian process model:
ln p(t | θ) = −(1/2) ln |C_N| − (1/2) t^T C_N^{-1} t − (N/2) ln(2π)
The simplest approach is to make a point estimate of θ by maximizing the log likelihood function.
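As a concrete sketch (Python with NumPy and SciPy; the kernel parameterization and the toy data are assumptions), the log likelihood above can be evaluated directly and maximized with a general-purpose optimizer:

import numpy as np
from scipy.optimize import minimize

X = np.array([-0.8, -0.3, 0.1, 0.6])   # toy training data
t = np.sin(2 * np.pi * X)
N = len(X)

def neg_log_likelihood(log_theta):
    # theta = (length scale, noise precision beta); optimized in log space so both stay positive
    ls, beta = np.exp(log_theta)
    C = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ls**2) + np.eye(N) / beta
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * logdet + 0.5 * t @ np.linalg.solve(C, t) + 0.5 * N * np.log(2 * np.pi)

res = minimize(neg_log_likelihood, x0=np.log([0.3, 25.0]))   # point estimate of theta
length_scale, beta = np.exp(res.x)

Note that the log likelihood is in general a non-convex function of θ, so the optimizer may return a local rather than a global maximum.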

Gaussian Process for classification
We can adapt Gaussian processes to classification problems by transforming the output using an appropriate nonlinear activation function.
– Define a Gaussian process over a function a(x) and transform it using the logistic sigmoid y = σ(a); we obtain a non-Gaussian stochastic process over functions y(x) ∈ (0, 1).

The left plot shows a sample from the Gaussian process prior over functions a(x), for a one-dimensional input space. The right plot shows the result of transforming this sample using the logistic sigmoid function. The probability distribution over the target variable t is then given by the Bernoulli distribution p(t | a) = σ(a)^t (1 − σ(a))^{1−t}.
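The figure can be reproduced in a few lines (a sketch; the squared-exponential kernel and its length scale are illustrative choices): draw one sample a(x) from the GP prior and pass it through the logistic sigmoid.

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 200)
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.3**2)

# Left plot: one sample a(x) from the GP prior
a = np.linalg.cholesky(K + 1e-10 * np.eye(len(x))) @ rng.normal(size=len(x))
# Right plot: y(x) = sigma(a(x)), squashed into (0, 1)
y = 1.0 / (1.0 + np.exp(-a))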

Gaussian Process for classification
To determine the predictive distribution we introduce a Gaussian process prior over the vector a_{N+1}; the Gaussian prior takes the form p(a_{N+1}) = N(a_{N+1} | 0, C_{N+1}).
The predictive distribution is given by
p(t_{N+1} = 1 | t_N) = ∫ σ(a_{N+1}) p(a_{N+1} | t_N) da_{N+1}
where the covariance matrix has elements C(x_n, x_m) = k(x_n, x_m) + ν δ_{nm}, with ν a small positive constant that ensures C_{N+1} is positive definite.

Gaussian Process for classification
The integral is analytically intractable, so it may be approximated using sampling methods. Alternatively, techniques based on analytical approximation can be used:
– Variational inference
– Expectation propagation
– Laplace approximation (sketched below)
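Of these, the Laplace approximation is the simplest to sketch: it replaces p(a_N | t_N) with a Gaussian centered at its mode, found by Newton iteration. A minimal version (Python/NumPy; the toy data and kernel are assumptions, and the update follows the standard iteration a_new = C_N (I + W_N C_N)^{-1} (t_N − σ_N + W_N a_N) with W_N = diag(σ_n (1 − σ_n))):

import numpy as np

X = np.array([-0.7, -0.2, 0.3, 0.8])   # toy inputs
t = np.array([0.0, 0.0, 1.0, 1.0])     # binary targets
C = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / 0.3**2) + 1e-6 * np.eye(len(X))

a = np.zeros(len(X))                   # initialize at the prior mean
for _ in range(20):                    # Newton iterations for the mode of p(a_N | t_N)
    sigma = 1.0 / (1.0 + np.exp(-a))
    W = np.diag(sigma * (1.0 - sigma))
    a = C @ np.linalg.solve(np.eye(len(X)) + W @ C, t - sigma + W @ a)

# At the mode, the predictive mean of a at a new point x* is k(x*)^T (t - sigma)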

Illustration of Gaussian process for classification
The optimal decision boundary is shown in green; the decision boundary obtained from the Gaussian process classifier is shown in black.

Connection to Neural Networks
For a broad class of prior distributions over w, the distribution of functions generated by a neural network will tend to a Gaussian process in the limit M → ∞, where M is the number of hidden units.
In this Gaussian process limit the output variables of the neural network become independent.
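This limit can be observed empirically (a sketch; the tanh network, standard-normal weight priors, and the 1/√M output scaling are standard choices assumed here, not from the slides): as M grows, the output of a randomly drawn network at a fixed input approaches a Gaussian, e.g. its excess kurtosis tends to zero.

import numpy as np

rng = np.random.default_rng(4)
x = 0.5                                  # a fixed input point
n_nets = 20_000                          # number of random networks per width

for M in (1, 10, 100, 1000):             # number of hidden units
    w1 = rng.normal(size=(n_nets, M))    # input-to-hidden weights
    b1 = rng.normal(size=(n_nets, M))    # hidden biases
    w2 = rng.normal(size=(n_nets, M))    # hidden-to-output weights
    y = (w2 * np.tanh(w1 * x + b1)).sum(axis=1) / np.sqrt(M)
    z = (y - y.mean()) / y.std()
    print(M, (z**4).mean() - 3.0)        # excess kurtosis -> 0 as M grows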

Thank you