Gaussian Processes
Li An

The Plan
- Introduction to Gaussian processes
- Revisit linear regression
- Linear regression updated by Gaussian processes
- Gaussian processes for regression
- Conclusion

Why GPs?
Here are some data points! What function did they come from? I have no idea. Oh. Okay. Uh, you think this point is likely in the function too? I have no idea.

Why GPs?
You can't get anywhere without making some assumptions. GPs are a nice way of expressing this 'prior on functions' idea, and they can do a bunch of cool stuff:
- Regression
- Classification
- Optimization

Gaussian
- Unimodal
- Concentrated
- Easy to compute with (sometimes)
- Tons of crazy properties

Linear Regression Revisited
Linear regression model: a combination of M fixed basis functions given by φ(x), so that
y(x) = wᵀ φ(x)
Prior distribution over the weights:
p(w) = N(w | 0, α⁻¹ I)
Given training data points x_1, …, x_N, what is the joint distribution of y(x_1), …, y(x_N)? Let y be the vector with elements y_n = y(x_n); this vector is given by
y = Φ w
where Φ is the design matrix with elements Φ_{nk} = φ_k(x_n).

Linear Regression Revisited
Since y = Φ w, y is a linear combination of Gaussian distributed variables (the elements of w), and hence is itself Gaussian. We need only find its mean and covariance:
E(y) = Φ E(w) = 0
cov(y) = E(y yᵀ) = Φ E(w wᵀ) Φᵀ = (1/α) Φ Φᵀ = K
where K is the Gram matrix with elements K_{nm} = k(x_n, x_m) = (1/α) φ(x_n)ᵀ φ(x_m).
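A minimal sketch of this computation, assuming Gaussian bump basis functions and illustrative values for α and the inputs (none of these choices come from the slides):

```python
import numpy as np

# Sketch: the covariance of y = Phi @ w under a weight prior N(0, (1/alpha) I).
# Basis functions, alpha, and the inputs are all illustrative assumptions.
def design_matrix(x, centers, width=0.5):
    """Phi[n, k] = phi_k(x_n): Gaussian basis function k evaluated at x_n."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

alpha = 1.0                           # precision of the weight prior
x = np.linspace(-1.0, 1.0, 5)         # N = 5 training inputs
centers = np.linspace(-1.0, 1.0, 10)  # M = 10 basis-function centers
Phi = design_matrix(x, centers)       # N x M design matrix
K = Phi @ Phi.T / alpha               # cov(y) = (1/alpha) Phi Phi^T, the Gram matrix
```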

Definition of GP
A Gaussian process is defined as a probability distribution over functions y(x), such that the set of values of y(x) evaluated at an arbitrary set of points x_1, …, x_N jointly have a Gaussian distribution. It is a probability distribution indexed by an arbitrary set: any finite subset of indices defines a multivariate Gaussian distribution. For an input space X, the distribution at each x is a Gaussian, and what determines the GP is:
- the mean function µ(x) = E(y(x))
- the covariance function (kernel) k(x, x') = E(y(x) y(x'))
In most applications we take µ(x) = 0, so the prior is represented entirely by the kernel.
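A small sketch of the defining property: evaluating µ and k at any finite set of inputs yields the parameters of a multivariate Gaussian. The squared-exponential kernel here is an assumed choice for illustration:

```python
import numpy as np

# Any finite index set induces a multivariate Gaussian under the GP prior.
def k(xa, xb, ell=0.5):
    """Squared-exponential kernel k(x, x') (assumed choice)."""
    return np.exp(-0.5 * ((xa[:, None] - xb[None, :]) / ell) ** 2)

x = np.array([0.0, 0.3, 1.2])   # an arbitrary finite set of index points
mu = np.zeros(len(x))           # mu(x) = 0, the usual choice
K = k(x, x)                     # covariance of (y(x_1), y(x_2), y(x_3))
# (y(x_1), y(x_2), y(x_3)) ~ N(mu, K). Marginalizing out x_3 simply deletes
# its row and column, which is exactly the GP consistency requirement:
K_marginal = K[:2, :2]
```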

Linear Regression Updated by GP
This is a specific case of a Gaussian process: it is defined by the linear regression model y(x) = wᵀ φ(x) with a weight prior p(w) = N(w | 0, α⁻¹ I), and the kernel function is given by
k(x, x') = (1/α) φ(x)ᵀ φ(x').

Kernel Function
We can also define the kernel function directly, rather than deriving it from basis functions. The figure shows samples of functions drawn from Gaussian processes for two different choices of kernel function.
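A sketch of drawing such sample functions from a zero-mean GP prior under two directly specified kernels; both kernel forms and their parameters are assumptions for illustration, echoing the figure:

```python
import numpy as np

def se_kernel(xa, xb, ell=0.3):
    """Squared-exponential kernel: smooth sample functions."""
    return np.exp(-0.5 * ((xa[:, None] - xb[None, :]) / ell) ** 2)

def exp_kernel(xa, xb, theta=4.0):
    """Exponential kernel: continuous but rough (non-smooth) samples."""
    return np.exp(-theta * np.abs(xa[:, None] - xb[None, :]))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
for kern in (se_kernel, exp_kernel):
    K = kern(x, x) + 1e-9 * np.eye(len(x))   # tiny jitter for numerical stability
    samples = rng.multivariate_normal(np.zeros(len(x)), K, size=5)
    # each row of `samples` is one function drawn from the GP prior
```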

GP for Regression
Take account of the noise on the observed target values, which are given by
t_n = y_n + ε_n
where y_n = y(x_n) and ε_n is Gaussian noise, so that p(t_n | y_n) = N(t_n | y_n, β⁻¹) with noise precision β.

GP for Regression
From the definition of a GP, the marginal distribution p(y) is given by
p(y) = N(y | 0, K)
The marginal distribution of t is then
p(t) = ∫ p(t | y) p(y) dy = N(t | 0, C)
where the covariance matrix C has elements
C(x_n, x_m) = k(x_n, x_m) + β⁻¹ δ_{nm}.

GP for Regression
Figure: the sampling of data points t from the marginal distribution p(t).

GP for Regression
We have used the GP to build a model of the joint distribution over sets of data points. Goal: to find the predictive distribution p(t_{N+1} | t) for a new input x_{N+1}, we begin by writing down the joint distribution
p(t_{N+1}) = N(t_{N+1} | 0, C_{N+1})
where the (N+1) × (N+1) covariance matrix C_{N+1} is partitioned as
C_{N+1} = [ C_N  k ; kᵀ  c ]
with k the vector of elements k(x_n, x_{N+1}) for n = 1, …, N, and c = k(x_{N+1}, x_{N+1}) + β⁻¹.
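A small sketch of assembling C_{N+1} from these blocks; the function and argument names are my own, not from the slides:

```python
import numpy as np

def joint_cov(C_N, k_vec, k_new, beta):
    """C_{N+1} = [[C_N, k], [k^T, c]] with c = k(x_{N+1}, x_{N+1}) + 1/beta."""
    c = k_new + 1.0 / beta
    top = np.hstack([C_N, k_vec[:, None]])   # N x (N+1)
    bottom = np.append(k_vec, c)[None, :]    # 1 x (N+1)
    return np.vstack([top, bottom])
```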

GP for Regression
The conditional distribution p(t_{N+1} | t) is a Gaussian distribution with mean and covariance given by
m(x_{N+1}) = kᵀ C_N⁻¹ t
σ²(x_{N+1}) = c - kᵀ C_N⁻¹ k
These are the key results that define Gaussian process regression. The predictive distribution is a Gaussian whose mean and variance both depend on x_{N+1}.
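Putting the pieces together, a minimal end-to-end sketch of these prediction equations; the kernel, β, and the toy sine data are all assumptions for illustration:

```python
import numpy as np

def k(xa, xb, ell=0.3):
    """Squared-exponential kernel (assumed choice)."""
    return np.exp(-0.5 * ((xa[:, None] - xb[None, :]) / ell) ** 2)

beta = 25.0                             # assumed noise precision
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 8)            # training inputs
t = np.sin(2 * np.pi * x) + rng.normal(0.0, beta ** -0.5, len(x))  # noisy targets

C_N = k(x, x) + np.eye(len(x)) / beta   # C_N = K + (1/beta) I
C_inv = np.linalg.inv(C_N)              # fine for tiny N; see the cost note below

x_star = np.linspace(0.0, 1.0, 100)     # test inputs
K_star = k(x, x_star)                   # columns are the k-vectors
mean = K_star.T @ C_inv @ t             # m(x*) = k^T C_N^{-1} t
c = 1.0 + 1.0 / beta                    # k(x*, x*) = 1 for this kernel
var = c - np.einsum('ij,ij->j', K_star, C_inv @ K_star)  # sigma^2(x*)
```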

An Example of GP Regression

GP for Regression
The only restriction on the kernel is that the covariance matrix given by C(x_n, x_m) = k(x_n, x_m) + β⁻¹ δ_{nm} must be positive definite. GP regression involves inverting a matrix of size N × N, which requires O(N³) computation.
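In practice the explicit inverse is usually replaced by a Cholesky factorization, which is still O(N³) but numerically more stable and reusable across test points; a sketch, with a helper name of my own:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_predict_one(C_N, k_vec, c, t):
    """Predictive mean and variance for a single test input."""
    L = cho_factor(C_N, lower=True)   # the O(N^3) step, done once
    alpha = cho_solve(L, t)           # solves C_N alpha = t
    v = cho_solve(L, k_vec)           # solves C_N v = k
    return k_vec @ alpha, c - k_vec @ v
```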

Conclusion
- Distribution over functions: any finite set of values jointly has a Gaussian distribution
- Index set can be pretty much whatever: reals, real vectors, graphs, strings, …
- Most interesting structure is in k(x, x'), the 'kernel'
- Used in regression to predict the target for a new input

Questions?
Thank you!