Probabilistic Surrogate Models

Probabilistic Surrogate Models. It is often useful to quantify confidence in a surrogate model's predictions. One approach is to use a probabilistic model that represents this confidence explicitly. A common choice of probabilistic model is the Gaussian process.

Gaussian Distribution. The Gaussian distribution is also called the normal distribution. A univariate Gaussian is parameterized by a mean µ and a variance σ². A multivariate Gaussian is parameterized by a mean vector µ and a covariance matrix Σ. The probability density at x is

\[ \mathcal{N}(x \mid \mu, \Sigma) = (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right) \]

where d is the dimension of x.
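
As a minimal sketch (not from the original slides), the density formula can be evaluated directly with NumPy; the helper name `gaussian_pdf` is an illustrative choice:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of a multivariate Gaussian N(mu, Sigma) evaluated at x."""
    d = len(mu)
    diff = x - mu
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    # Solve Sigma @ z = diff instead of forming the inverse explicitly
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm_const

# Standard bivariate Gaussian at the origin: 1 / (2 pi) ~ 0.1592
print(gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2)))
```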

Gaussian Distribution. Sampling a value from a Gaussian is written x ∼ N(µ, Σ). Two jointly Gaussian random variables a and b are written

\[ \begin{bmatrix} a \\ b \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix} \right) \]

Each variable's marginal distribution is written a ∼ N(µ_a, A) and b ∼ N(µ_b, B). A variable's conditional distribution given the other is written a | b ∼ N(µ_{a|b}, Σ_{a|b}).

Gaussian Distribution. Given the full set of parameters µ and Σ of the joint distribution, the parameters of the conditional distribution can be computed in closed form:

\[ \mu_{a \mid b} = \mu_a + C B^{-1} (b - \mu_b), \qquad \Sigma_{a \mid b} = A - C B^{-1} C^\top \]
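
A NumPy sketch of this conditioning rule; the function name `condition` and its index-list interface are illustrative choices, not from the slides:

```python
import numpy as np

def condition(mu, Sigma, idx_a, idx_b, b):
    """Parameters of p(x_a | x_b = b) for a joint Gaussian N(mu, Sigma)."""
    mu_a, mu_b = mu[idx_a], mu[idx_b]
    A = Sigma[np.ix_(idx_a, idx_a)]
    B = Sigma[np.ix_(idx_b, idx_b)]
    C = Sigma[np.ix_(idx_a, idx_b)]
    mu_cond = mu_a + C @ np.linalg.solve(B, b - mu_b)
    Sigma_cond = A - C @ np.linalg.solve(B, C.T)
    return mu_cond, Sigma_cond

# Condition the first variable on observing the second
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
print(condition(mu, Sigma, [0], [1], np.array([2.0])))  # mean [0.25], cov [[0.875]]
```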

Gaussian Processes. A Gaussian process extends the idea of the Gaussian distribution from vectors to functions. For any finite set of points x₁, …, xₙ, the distribution over the function evaluations at those points is

\[ \begin{bmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix} \sim \mathcal{N}(m, K), \qquad m_i = m(x_i), \quad K_{ij} = k(x_i, x_j) \]

Here, m(x) is the mean function and k(x, x′) is the covariance function, or kernel.

Gaussian Processes. A common kernel function is the squared exponential kernel

\[ k(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2 \ell^2} \right) \]

where ℓ is the characteristic length scale: the distance over which the function must move before it changes significantly.
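
A minimal implementation of this kernel and of the covariance matrix it induces (the function names are illustrative, not from the slides):

```python
import numpy as np

def squared_exponential(x, xp, ell=1.0):
    """Squared exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 ell^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    return np.exp(-np.sum(diff ** 2) / (2 * ell ** 2))

def kernel_matrix(X, Xp, kernel):
    """Covariance matrix K with K[i, j] = kernel(X[i], Xp[j])."""
    return np.array([[kernel(x, xp) for xp in Xp] for x in X])
```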

Gaussian Processes. (Figure: examples of functions sampled from Gaussian processes with different kernel functions.)

Gaussian Processes. (Figure: samples from the multivariate Gaussian induced by the squared exponential kernel at different length scales ℓ.)
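
A NumPy sketch that reproduces this kind of figure; the evaluation grid, random seed, and length scales below are arbitrary choices:

```python
import numpy as np

# Draw sample functions from a zero-mean GP prior with a squared exponential kernel
X = np.linspace(0, 10, 100)
rng = np.random.default_rng(0)

for ell in (0.5, 1.0, 2.0):
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * ell ** 2))
    K += 1e-8 * np.eye(len(X))  # jitter keeps the covariance positive definite
    f = rng.multivariate_normal(np.zeros(len(X)), K)
    # plotting f against X would show smoother sample paths for larger ell
```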

Prediction. Given a set of function evaluations y at points X and the mean and covariance functions of the model, a Gaussian process can be used to predict the function values ŷ at new points X*. The joint distribution is

\[ \begin{bmatrix} \hat{y} \\ y \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} m(X^*) \\ m(X) \end{bmatrix}, \begin{bmatrix} K(X^*, X^*) & K(X^*, X) \\ K(X, X^*) & K(X, X) \end{bmatrix} \right) \]

where K(X, X′) denotes the matrix of kernel evaluations between the points in X and X′.

Prediction. The predicted values follow from the conditional distribution formulas presented earlier. The predicted mean and covariance of ŷ are

\[ \mu^* = m(X^*) + K(X^*, X)\, K(X, X)^{-1} \left( y - m(X) \right) \]
\[ \Sigma^* = K(X^*, X^*) - K(X^*, X)\, K(X, X)^{-1} K(X, X^*) \]
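
A minimal sketch of these prediction formulas, assuming a zero mean function and the squared exponential kernel (the function name `gp_predict` and the toy data are illustrative):

```python
import numpy as np

def gp_predict(X, y, Xs, ell=1.0):
    """Posterior mean and pointwise variance of a zero-mean GP at test points Xs."""
    k = lambda A, B: np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))
    K = k(X, X) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
    Ks = k(Xs, X)                         # cross-covariance K(X*, X)
    mean = Ks @ np.linalg.solve(K, y)
    cov = k(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# Usage: predict at new points and form a 95% confidence band
X = np.array([1.0, 2.0, 4.0])
y = np.sin(X)
Xs = np.linspace(0.0, 5.0, 50)
mean, var = gp_predict(X, y, Xs)
lo, hi = mean - 1.96 * np.sqrt(var), mean + 1.96 * np.sqrt(var)
```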

Prediction. Taking the square root of the predicted variance gives the standard deviation, so the predicted mean and standard deviation can be computed at any point. This enables calculation of the 95% confidence region, µ ± 1.96σ.

Gradient Measurements. If gradient evaluations of the function can be made as well, the joint Gaussian distribution can be extended to include them, yielding higher-fidelity predictions of both function values and gradients.

Noisy Measurements. If function evaluations are corrupted by zero-mean Gaussian noise with variance ν, the covariance of the observed values becomes K(X, X) + νI, and the augmented joint distribution is conditioned in the same way as before.
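
A sketch of how the noise term changes the earlier prediction code; only the training-block covariance gains the νI term (the name `gp_predict_noisy` and the default values are illustrative):

```python
import numpy as np

def gp_predict_noisy(X, y, Xs, ell=1.0, noise_var=0.1):
    """Zero-mean GP posterior with zero-mean observation noise of variance noise_var."""
    k = lambda A, B: np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))
    K = k(X, X) + noise_var * np.eye(len(X))  # noise enters only the observed block
    Ks = k(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = np.diag(k(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T))
    return mean, var
```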

Fitting Gaussian Processes. The choice of kernel and its parameters can be made using cross-validation methods. Instead of minimizing squared error, we maximize the likelihood of the data. Because the likelihood involves numbers of greatly disparate orders of magnitude, the log-likelihood is maximized in practice.

Fitting Gaussian Processes. For a Gaussian process with zero-mean noise of variance ν, the log-likelihood is

\[ \log p(y \mid X, \theta) = -\frac{n}{2} \log 2\pi - \frac{1}{2} \log \lvert \Sigma_\theta \rvert - \frac{1}{2} (y - m_X)^\top \Sigma_\theta^{-1} (y - m_X) \]

where Σ_θ = K_θ(X, X) + νI and θ collects the kernel parameters. It can be maximized using gradient ascent. The gradient of the log-likelihood is

\[ \frac{\partial}{\partial \theta_j} \log p(y \mid X, \theta) = \frac{1}{2} \operatorname{tr}\!\left( \left( \alpha \alpha^\top - \Sigma_\theta^{-1} \right) \frac{\partial \Sigma_\theta}{\partial \theta_j} \right) \]

where α = Σ_θ⁻¹ (y − m_X).
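
A sketch of the log-likelihood for a zero-mean GP with the squared exponential kernel, using a Cholesky factorization for numerical stability (the function name is an illustrative choice):

```python
import numpy as np

def log_marginal_likelihood(X, y, ell, noise_var):
    """Log-likelihood of observations y under a zero-mean GP with SE kernel."""
    n = len(X)
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * ell ** 2)) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # equals 0.5 * log|K|
            - 0.5 * n * np.log(2 * np.pi))
```

To fit the parameters, this quantity would be maximized over ℓ and ν, for example by a simple grid search or by gradient ascent using the gradient formula above.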

Summary. Gaussian processes are probability distributions over functions. The choice of kernel affects the smoothness of the functions sampled from a Gaussian process. The multivariate normal distribution has analytic conditional and marginal distributions. We can compute the mean and standard deviation of our prediction of an objective function at a particular design point given a set of past evaluations.

Summary. We can incorporate gradient observations to improve our predictions of the objective value and its gradient. We can incorporate measurement noise into a Gaussian process. We can fit the parameters of a Gaussian process using maximum likelihood.