Spline and Kernel Methods; Gaussian Processes

Spline and Kernel Methods; Gaussian Processes. Recitation 3, April 30.

Penalized Cubic Regression Splines
gam() in library "mgcv":
gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(…)), data=dataset)
By default, the smoothing parameter is selected by GCV. (R Demo 1)
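A minimal sketch of this fit, assuming simulated data (the variable names, data, and knot count are illustrative, not from R Demo 1):

# Penalized cubic regression spline with mgcv; smoothing parameter chosen by GCV.
library(mgcv)
set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
dataset <- data.frame(x = x, y = y)
fit <- gam(y ~ s(x, bs = "cr", k = 10), data = dataset, method = "GCV.Cp")
plot(fit, residuals = TRUE)   # fitted smooth with partial residuals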

Kernel Method
Nadaraya-Watson (locally constant) model; locally linear / local polynomial models.
How to define "local"? By a kernel function, e.g. the Gaussian kernel. (R Demo 1)
R package: "locfit". Fit: locfit(y ~ x, kern="gauss", deg=…, alpha=…)
Bandwidth selected by GCV: gcvplot(y ~ x, kern="gauss", deg=…, alpha=<range of bandwidths>)
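A minimal sketch of local regression with locfit, following the calls above (the simulated data and bandwidth grid are illustrative, not the actual R Demo 1 code):

# Locally linear (deg = 1) fit with a Gaussian kernel; alpha is the bandwidth.
library(locfit)
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
gcvplot(y ~ x, kern = "gauss", deg = 1, alpha = seq(0.2, 0.8, by = 0.05))  # GCV over bandwidths
fit <- locfit(y ~ x, kern = "gauss", deg = 1, alpha = 0.4)                 # refit at chosen alpha
plot(fit, get.data = TRUE)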

Gaussian Processes
A distribution over functions: f ~ GP(m, κ), where m is the mean function and κ is the covariance function.
For any finite set of locations x1, …, xn:
(f(x1), …, f(xn)) ~ Nn(μ, K), with μ = [m(x1), …, m(xn)] and Kij = κ(xi, xj).
Idea: if xi and xj are similar according to the kernel, then f(xi) is similar to f(xj).

As a quick tutorial: a Gaussian process is a distribution over functions, and the process is uniquely defined by a mean function and a covariance function. By definition of the GP, for any set of n locations the function values are jointly normally distributed, with mean given by evaluating the mean function at these locations and covariance given by the kernel matrix above. For example, a standard covariance function is the squared-exponential (RBF) kernel, which leads to smooth, continuously differentiable functions. This kernel has two parameters: a variance parameter that determines how much the function can vary from its mean, and a length-scale parameter that determines the smoothness.
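As a concrete illustration (not part of the recitation demos), the sketch below draws sample functions from a zero-mean GP prior with a squared-exponential kernel; the variance and length-scale values are arbitrary:

# Squared-exponential (RBF) covariance: var * exp(-(x1 - x2)^2 / (2 * ell^2)).
se_kernel <- function(x1, x2, var = 1, ell = 0.2) {
  var * exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)
}
set.seed(1)
xgrid <- seq(0, 1, length.out = 100)
K <- se_kernel(xgrid, xgrid) + 1e-8 * diag(length(xgrid))    # jitter for numerical stability
U <- chol(K)                                                 # K = t(U) %*% U
samples <- t(U) %*% matrix(rnorm(3 * length(xgrid)), ncol = 3)
matplot(xgrid, samples, type = "l", lty = 1, ylab = "f(x)")  # three draws from the prior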

Gaussian Processes – Noise-free observations
Example task: learn a function f(x) to estimate y from data (x, y).
A function can be viewed as an infinite-dimensional random variable; a GP provides a distribution over functions.

Gaussian Processes – Noise-free observations
Model: (x, f) are the observed locations and values (training data); (x*, f*) are the test or prediction locations and values.
After observing some noise-free data (x, f), the standard GP conditioning result (for a zero prior mean, writing K = κ(x, x), K* = κ(x, x*), K** = κ(x*, x*)) gives
f* | x*, x, f ~ N( K*ᵀK⁻¹f , K** − K*ᵀK⁻¹K* ).
The predictions depend on the kernel hyperparameters, e.g. the length-scale. (R Demo 2)
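A minimal sketch of noise-free GP prediction implementing the formula above (illustrative data only, not the actual R Demo 2 code; reuses se_kernel() from the earlier sketch):

# Condition the GP on a few noise-free observations and plot the posterior mean.
x  <- c(0.1, 0.4, 0.6, 0.9)                        # observed locations
f  <- sin(2 * pi * x)                              # observed (noise-free) values
xs <- seq(0, 1, length.out = 100)                  # prediction locations
K   <- se_kernel(x, x) + 1e-8 * diag(length(x))    # jitter for stability
Ks  <- se_kernel(x, xs)
Kss <- se_kernel(xs, xs)
post_mean <- t(Ks) %*% solve(K, f)                 # K*ᵀ K⁻¹ f
post_cov  <- Kss - t(Ks) %*% solve(K, Ks)          # K** − K*ᵀ K⁻¹ K*
plot(xs, post_mean, type = "l", ylab = "f(x)")
points(x, f, pch = 19)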

Gaussian Processes – Noisy observations (GP for regression)
Model: y = f(x) + ε with ε ~ N(0, σ²); (x, y) are the observed locations and values (training data); (x*, f*) are the test or prediction locations and values.
After observing some noisy data (x, y), the standard predictive distribution (zero prior mean, same notation as above) is
f* | x*, x, y ~ N( K*ᵀ(K + σ²I)⁻¹y , K** − K*ᵀ(K + σ²I)⁻¹K* ). (R Demo 3)
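A minimal sketch of GP regression with noisy observations (illustrative data and noise level, not the actual R Demo 3 code; reuses se_kernel() from above):

# Posterior mean and pointwise ±2 SD band under observation noise sigma2.
set.seed(2)
sigma2 <- 0.1^2
x  <- runif(20)
y  <- sin(2 * pi * x) + rnorm(20, sd = sqrt(sigma2))
xs <- seq(0, 1, length.out = 100)
Ky  <- se_kernel(x, x) + sigma2 * diag(length(x))            # K + σ²I
Ks  <- se_kernel(x, xs)
Kss <- se_kernel(xs, xs)
post_mean <- t(Ks) %*% solve(Ky, y)
post_var  <- pmax(diag(Kss - t(Ks) %*% solve(Ky, Ks)), 0)    # guard against tiny negatives
plot(xs, post_mean, type = "l", ylab = "f(x)")
lines(xs, post_mean + 2 * sqrt(post_var), lty = 2)
lines(xs, post_mean - 2 * sqrt(post_var), lty = 2)
points(x, y, pch = 19)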

References
Rasmussen, C. E. and Williams, C. K. I., Gaussian Processes for Machine Learning, Chapter 2.
527 lecture notes by Emily Fox.