Gaussian Process and Prediction
(C) 2001 SNU CSE Artificial Intelligence Lab (SCAI)

Outline

Gaussian Process and Bayesian Regression
- Bayesian regression
- Weight-space view
- Function-space view
- Spline smoothing
- Neural networks
- Classification problems

Active Data Selection
- Maximizing the expected information gain
- Minimizing the regression error
- Experimental results

Mixtures of Gaussian Processes

Gaussian Process and Bayesian Regression (1)

- A distribution of y in Bayesian regression: y = f(x) + ε with ε ~ N(0, σ²) and a prior placed on f
- Generalized linear regression: f(x) = Σ_i w_i φ_i(x) for fixed basis functions φ_i
- Weight-space view: place a Gaussian prior on the weights w and compute their posterior given the data
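
The slide's equations did not survive extraction. As an illustrative sketch of the weight-space view only (not the original derivation; the prior covariance Sigma_p, noise variance and toy basis are assumptions), the posterior over weights is the standard Bayesian linear-regression result:

```python
import numpy as np

def weight_space_posterior(Phi, y, Sigma_p, noise_var):
    """Posterior over w for y = Phi @ w + noise, with prior w ~ N(0, Sigma_p).

    Returns the posterior mean and covariance of the weights.
    """
    A = Phi.T @ Phi / noise_var + np.linalg.inv(Sigma_p)  # posterior precision
    cov = np.linalg.inv(A)
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov

# Toy usage with a hypothetical cubic polynomial basis.
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(20)
Phi = np.vander(x, 4, increasing=True)        # basis: 1, x, x^2, x^3
m, S = weight_space_posterior(Phi, y, np.eye(4), noise_var=0.01)
```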

Gaussian Process and Bayesian Regression (2)

Function-space view
- Y(x) is a linear combination of Gaussian random variables, with weights W ~ N(0, Σ)
- {Y(x)} is then a Gaussian process with mean function m(x) = E[Y(x)] = 0 and covariance function C(x, x') = E[Y(x) Y(x')] = φ(x)^T Σ φ(x')
- New outputs can be predicted from the conditional (Gaussian) distribution given the observed data
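
A minimal sketch of prediction from the conditional Gaussian distribution, assuming some covariance function cov_fn and noisy targets (the function and variable names here are ours, not the slide's):

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, cov_fn, noise_var):
    """GP regression: condition the joint Gaussian on the observed targets."""
    K = cov_fn(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = cov_fn(X_train, X_test)                       # train/test cross-covariances
    K_ss = cov_fn(X_test, X_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)          # predictive mean
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)        # predictive covariance
    return mean, cov
```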

Gaussian Process and Bayesian Regression (3)

- The weight-space view and the function-space view give the same results
- For a small number of basis functions the weight-space view is preferred, while for a large number of basis functions the function-space view (Gaussian process view) is better
- Cf. the nonparametric kernel estimator for a density p(y): p̂(y) = (1/nh) Σ_i K((y − y_i)/h)

Spline Smoothing (1)

Interpolating spline
- An interpolating spline is a cubic polynomial defined piecewise between adjacent knots, with continuous second derivatives (Schoenberg, 1964)

Smoothing spline
- Minimizes a penalized least-squares criterion with smoothing parameter λ
- As λ → 0 it approaches the interpolating spline; as λ → ∞ it approaches the least-squares linear fit
- The smoothing spline is also a cubic spline (Reinsch, 1967)
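
As a hedged illustration of the two spline types (SciPy's spline routines are our choice, not the slides'), the sketch below fits an interpolating cubic spline and a smoothing cubic spline to the same noisy data; the smoothing parameter s plays the role of the roughness/fit trade-off:

```python
import numpy as np
from scipy.interpolate import CubicSpline, UnivariateSpline

x = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(15)

interp = CubicSpline(x, y)                    # passes through every knot
smooth = UnivariateSpline(x, y, k=3, s=0.5)   # cubic smoothing spline

x_grid = np.linspace(0, 1, 200)
f_interp, f_smooth = interp(x_grid), smooth(x_grid)
```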

Spline Smoothing (2)

Linear smoothing property of the smoothing spline
- If the design points are equally spaced, all n component smoothing splines have the same shape, and that shape converges to an equivalent kernel (Silverman, 1984)
- Cf. nonparametric kernel regression (Nadaraya, 1964; Watson, 1964): m̂(x) = Σ_i K((x − x_i)/h) y_i / Σ_i K((x − x_i)/h)
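
A minimal NumPy sketch of the Nadaraya–Watson estimator cited above, using a Gaussian kernel and a hand-picked bandwidth h (both our assumptions):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, h=0.1):
    """m(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)."""
    d = (x_query[:, None] - x_train[None, :]) / h
    K = np.exp(-0.5 * d**2)               # Gaussian kernel; any kernel would do
    return (K @ y_train) / K.sum(axis=1)
```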

Spline Smoothing (3)

- The spline estimation procedure can be interpreted as a Bayesian MAP estimate: minimize Σ_i (y_i − f(x_i))² + λ ∫ (f^(p)(x))² dx
- When p = 2 the result is a cubic spline (a piecewise cubic function with knots at the data points)

Spline Smoothing (4)

- Spline priors are Gaussian processes: the roughness penalty corresponds to a (possibly improper) Gaussian prior over functions, p(f) ∝ exp(−(λ/2) ∫ (f^(p)(x))² dx)

Spline Smoothing (5)

- Splines correspond to Gaussian processes with a particular choice of covariance function.

Known covariance function for modeling
- e.g. the squared-exponential covariance C(x, x') = v0 exp(−(x − x')² / (2 ℓ²)), possibly with an additive noise term
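
The slide's example equation was lost; as a sketch of one commonly used choice (the squared-exponential covariance, with signal variance v0 and length scale ell as hypothetical parameters), for 1-D inputs:

```python
import numpy as np

def sq_exp_cov(x1, x2, v0=1.0, ell=0.3):
    """C[i, j] = v0 * exp(-(x1[i] - x2[j])**2 / (2 * ell**2)) for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return v0 * np.exp(-0.5 * (d / ell) ** 2)
```

This function has the two-argument signature assumed by the gp_predict sketch earlier, so it can be passed directly as cov_fn.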

Covariance function with unknown parameters
- For a small number of parameters: choose a parametric family of covariance functions and estimate the parameters by maximizing the log likelihood
- For a larger number of parameters, or when local maxima are a concern: place a prior distribution on the parameters and integrate over them numerically (e.g. by MCMC)
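
A hedged sketch of the first option: estimate a few covariance parameters by maximizing the standard GP log marginal likelihood (the squared-exponential-plus-noise family, parameter names, and use of scipy.optimize are our assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, x, y):
    """-log p(y | x, params) for a GP with squared-exponential covariance plus noise."""
    v0, ell, noise = np.exp(log_params)      # optimise in log space to keep params positive
    d = x[:, None] - x[None, :]
    K = v0 * np.exp(-0.5 * (d / ell) ** 2) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 y^T K^{-1} y + 0.5 log|K| + (n/2) log(2 pi)
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(x) * np.log(2 * np.pi)

# Hypothetical data; initial guess v0 = ell = noise = 1.
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(30)
res = minimize(neg_log_marginal_likelihood, np.zeros(3), args=(x, y))
v0, ell, noise = np.exp(res.x)
```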

Multilayer Neural Networks and Gaussian Processes

- A neural network with one hidden layer converges to a Gaussian process as the number of hidden units tends to infinity, provided standard weight-decay (Gaussian) priors are placed on the weights (Neal, 1996)
- The covariance function of this Gaussian process depends on the weight priors and on the activation functions of the hidden units

Classification Problems

- Estimate the posterior p(k | x) for each class k
- Place a Gaussian process prior on a latent activation y(x) and pass it through a logistic (sigmoid) function to obtain class probabilities
- Make a prediction for a test input x* by integrating over the conditional distribution of the latent activation y* (apply the appropriate Jacobian to obtain the distribution of the class probability itself)
- When p(t | y) is Gaussian there is an exact expression; when it is logistic there is no exact expression, so use an analytic approximation or MCMC
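
In the logistic case the predictive class probability has no closed form. As a hedged sketch (not the slides' method), given a Gaussian approximation N(mu_star, var_star) to the latent activation at a test point, the integral E[sigmoid(y*)] can be estimated by simple Monte Carlo:

```python
import numpy as np

def predictive_class_prob(mu_star, var_star, n_samples=10_000, rng=None):
    """Approximate p(t = 1 | x*) = E[sigmoid(y*)] with y* ~ N(mu_star, var_star)."""
    rng = rng or np.random.default_rng(0)
    y = rng.normal(mu_star, np.sqrt(var_star), size=n_samples)
    return np.mean(1.0 / (1.0 + np.exp(-y)))
```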

Active Data Selection (1)

- Maximizing the expected information gain criterion (MacKay, 1992): select the data point with maximum predictive variance
- Minimizing the regression error (Cohn, 1996): select the point that minimizes the overall (average) predictive variance
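
A sketch of the first criterion under a GP model: the expected information gain of a candidate grows with its predictive variance, so the query rule reduces to picking the candidate with the largest posterior variance. This reuses the gp_predict and sq_exp_cov sketches above; the candidate-pool setup is our assumption:

```python
import numpy as np

def select_next_query(X_train, y_train, X_candidates, cov_fn, noise_var):
    """Return the index of the candidate input with the largest predictive variance."""
    _, cov = gp_predict(X_train, y_train, X_candidates, cov_fn, noise_var)
    return int(np.argmax(np.diag(cov)))

# Example: pick the next point from a random candidate pool (hypothetical data).
x_train = np.array([0.1, 0.5, 0.9])
y_train = np.sin(2 * np.pi * x_train)
x_cand = np.random.rand(300)
i_next = select_next_query(x_train, y_train, x_cand, sq_exp_cov, noise_var=0.01)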

Active Data Selection (2)

[Figure] (a) Target function drawn from a covariance function; (b) expected change of the average variance over x for 100 reference points

Active Data Selection (3)

Experiments:
- The first data point is selected at random
- 150 data points are then selected actively
- 500 reference points are used for error evaluation
- The optimal query is chosen by evaluating the criterion at 300 random candidate points

Active Data Selection (4)

For real data: pumadyn-8nm (Puma 560 robot arm)
- 250 data points available for active selection, 400 reference points