Kriging (presentation transcript)
Cost of surrogates In linear regression, fitting involves solving a set of linear equations once. With radial basis neural networks we have to optimize the selection of neurons, which again entails multiple solutions of the linear system. We may find the best spread by minimizing cross-validation errors. Kriging, our next surrogate, is even more expensive: it has a spread constant in every direction, and we have to perform an optimization to calculate the best set of constants (hyperparameters). –With many hundreds of data points this can become a significant computational burden.
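As a point of reference for the cost comparison, here is a minimal sketch of the cheapest case: a polynomial linear regression fitted with a single least-squares solve. The quadratic test data and the variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Illustrative data (assumption): exact samples of 1 + 2x + 3x^2.
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Design matrix for a quadratic polynomial: columns 1, x, x^2.
A = np.column_stack([np.ones_like(x), x, x ** 2])

# One least-squares solve gives all coefficients; no hyperparameter search is needed.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)
```

Kriging, by contrast, repeats a solve of comparable or larger size for every trial value of the hyperparameters.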
Introduction to Kriging Method invented in the 1950s by South African geologist Danie G. Krige ( ) for predicting the distribution of minerals. –Formalized by French engineer Georges Matheron in –Statisticians refer to the more general Gaussian Process regression. Became very popular for fitting surrogates to expensive computer simulations in the 21st century. It is one of the best surrogates available. It probably became popular late mostly because of the high computational cost of fitting it to data.
Kriging philosophy We assume that the data is sampled from an unknown function that obeys simple correlation rules. The value of the function at a point is correlated with the values at neighboring points based on their separation in different directions. The correlation is strong with nearby points and weak with faraway points, but its strength does not change with location. Kriging is normally used with the assumption that there is no noise, so that it interpolates the function values exactly.
Reminder: Covariance and Correlation Covariance of two random variables X and Y: cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]. The covariance of a random variable with itself is the square of the standard deviation (the variance). The covariance matrix of a random vector contains the covariances of its components. Correlation: cor(X, Y) = cov(X, Y) / (σ_X σ_Y). The correlation matrix has 1 on the diagonal.
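A small numerical sketch of these two quantities, assuming the usual definitions (covariance as the mean product of deviations from the means, correlation as covariance divided by the product of the standard deviations). The function names and the sample data are hypothetical.

```python
import numpy as np

def covariance(x, y):
    """Sample covariance of two equally long 1-D samples (divides by n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1)

def correlation(x, y):
    """Covariance normalized by both standard deviations, so it lies in [-1, 1]."""
    return covariance(x, y) / np.sqrt(covariance(x, x) * covariance(y, y))

# Illustrative samples (assumption).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])

print("cov(x, x) =", covariance(x, x))    # the variance of x
print("cor(x, y) =", correlation(x, y))
print("cor(x, x) =", correlation(x, x))   # the diagonal entry of a correlation matrix
```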
Correlation between function values at nearby points for sine(x)
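The figure for this slide did not survive extraction. The sketch below reproduces the idea under stated assumptions: take points x spread uniformly over one period, and compute the correlation between sin(x) and sin(x + shift) for several shifts. The grid, the shifts and the function name are illustrative choices. For a uniform grid over a full period the correlation equals cos(shift), so it is close to 1 for small shifts and decays as the points move apart.

```python
import numpy as np

def sine_correlation(shift, n=1000):
    """Correlation between sin(x) and sin(x + shift) for x on a uniform grid over [0, 2*pi)."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    y = np.sin(x)
    y_near = np.sin(x + shift)
    return np.corrcoef(y, y_near)[0, 1]

for shift in (0.1, 1.0, np.pi / 2, np.pi):
    print(f"shift = {shift:.3f}, correlation = {sine_correlation(shift):.3f}")
```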
Gaussian correlation function
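The equation for this slide was lost in extraction. A commonly used form, assumed here, is the product Gaussian correlation exp(-sum_k theta_k (x_k - s_k)^2) between points x and s, with one hyperparameter theta_k per direction. The sketch below implements that assumed form; the function names are hypothetical.

```python
import numpy as np

def gaussian_correlation(x1, x2, theta):
    """Assumed Gaussian correlation exp(-sum_k theta_k * (x1_k - x2_k)**2) between two points."""
    x1 = np.atleast_1d(np.asarray(x1, dtype=float))
    x2 = np.atleast_1d(np.asarray(x2, dtype=float))
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    return float(np.exp(-np.sum(theta * (x1 - x2) ** 2)))

def correlation_matrix(X, theta):
    """Correlation matrix for n points stored as the rows of X (shape (n, d))."""
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences, shape (n, n, d)
    return np.exp(-np.sum(theta * diff ** 2, axis=2))

# Illustrative points and hyperparameters (assumption).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta = np.array([1.0, 2.0])
R = correlation_matrix(X, theta)
print(R)
```

A larger theta_k makes the correlation decay faster in direction k, so the correlation depends only on the separation between the points and not on where they are.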
Universal Kriging The trend function is most often a low-order polynomial. We will cover Ordinary Kriging, where the trend is just a constant to be estimated from the data. There is also Simple Kriging, where the constant is assumed to be known (often 0). Assumption: the systematic departures Z(x) are correlated. The Kriging prediction comes with a normal distribution of the uncertainty in the prediction. [Figure: y versus x, showing the sampled data points, the linear trend model, the systematic departure, and the resulting Kriging fit.]
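The three variants differ only in how the trend is treated. A sketch in assumed notation (the symbols beta_i, xi_i, mu and mu_0 are not from the slides; Z(x) is the correlated systematic departure with zero mean):

```latex
\begin{align*}
\text{Universal Kriging:} \quad & y(\mathbf{x}) = \sum_{i=1}^{p} \beta_i \, \xi_i(\mathbf{x}) + Z(\mathbf{x}) \\
\text{Ordinary Kriging:} \quad & y(\mathbf{x}) = \mu + Z(\mathbf{x}), \quad \mu \text{ estimated from the data} \\
\text{Simple Kriging:} \quad & y(\mathbf{x}) = \mu_0 + Z(\mathbf{x}), \quad \mu_0 \text{ known (often } 0\text{)}
\end{align*}
```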
Prediction and shape functions
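The equations for this slide were not preserved. The standard ordinary Kriging predictor, given here as a sketch in assumed notation, is a constant plus a weighted sum of shape functions centered at the n data points x^(i). With the Gaussian correlation, each shape function is a bell-shaped bump whose width in direction k is set by theta_k.

```latex
\begin{align*}
\hat{y}(\mathbf{x}) &= \hat{\mu} + \sum_{i=1}^{n} b_i \, r_i(\mathbf{x}), \\
r_i(\mathbf{x}) &= \exp\!\Big(-\sum_{k=1}^{d} \theta_k \big(x_k - x_k^{(i)}\big)^2\Big), \\
\mathbf{b} &= \mathbf{R}^{-1}\big(\mathbf{y} - \hat{\mu}\,\mathbf{1}\big), \qquad R_{ij} = r_i\big(\mathbf{x}^{(j)}\big).
\end{align*}
```

Because the weights b are obtained from the correlation matrix R of the data points, the predictor reproduces the data exactly when there is no noise.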
Fitting the data
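The fitting equations for this slide were lost. The sketch below shows one standard way to fit ordinary Kriging by maximum likelihood, under stated assumptions: for a fixed theta the constant mu and the process variance sigma^2 have closed forms, and theta is then chosen to maximize the resulting log-likelihood. The coarse grid of candidate theta values stands in for a real optimizer, and all function names, the jitter term and the test data are hypothetical.

```python
import numpy as np

def corr(XA, XB, theta):
    """Gaussian correlation between every row of XA and every row of XB."""
    diff = XA[:, None, :] - XB[None, :, :]
    return np.exp(-np.sum(theta * diff ** 2, axis=2))

def fit_ordinary_kriging(X, y, theta, jitter=1e-10):
    """Closed-form mu and sigma^2 for a fixed theta, plus the log-likelihood (constants dropped)."""
    n = len(y)
    # The small diagonal jitter is a numerical safeguard (an assumption, not from the slides).
    R = corr(X, X, theta) + jitter * np.eye(n)
    ones = np.ones(n)
    Rinv_1 = np.linalg.solve(R, ones)
    mu = (Rinv_1 @ y) / (Rinv_1 @ ones)       # generalized least-squares estimate of the constant
    b = np.linalg.solve(R, y - mu)            # weights of the shape functions
    sigma2 = (y - mu) @ b / n                 # estimate of the process variance
    _, log_det = np.linalg.slogdet(R)
    log_like = -0.5 * (n * np.log(sigma2) + log_det)
    return {"X": X, "theta": theta, "mu": mu, "b": b, "sigma2": sigma2, "log_like": log_like}

def predict(model, X_new):
    """Ordinary Kriging prediction: constant plus weighted sum of correlations to the data."""
    r = corr(X_new, model["X"], model["theta"])
    return model["mu"] + r @ model["b"]

# Illustrative 1-D data (assumption): 8 samples of one period of a sine.
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])

# Each candidate theta requires a new set of linear solves, which is where the cost comes from.
candidates = [np.array([t]) for t in (10.0, 20.0, 50.0, 100.0, 200.0)]
models = [fit_ordinary_kriging(X, y, t) for t in candidates]
best = max(models, key=lambda m: m["log_like"])
print("selected theta:", best["theta"], "mu:", best["mu"], "sigma2:", best["sigma2"])
print("prediction at x = 0.3:", predict(best, np.array([[0.3]])))
```

In practice the search over theta is done with a numerical optimizer in as many dimensions as there are input variables, which is why the fit becomes expensive for many data points.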
Kriging fitting issues The maximum likelihood or cross-validation optimization problem solved to obtain the Kriging fit is often ill-conditioned, leading to a poor fit or a poor estimate of the prediction variance. A poor estimate of the prediction variance can be checked by comparing it to the cross-validation error. Poor fits are often characterized by the Kriging surrogate having large curvature near data points (see example on next slide). It is recommended to visualize the result by plotting the Kriging fit and its standard error.
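One way to carry out the check described above is a leave-one-out comparison: remove each data point in turn, predict it from the rest, and compare the actual error with the standard error Kriging reports at that point. The sketch below assumes ordinary Kriging with a Gaussian correlation, keeps theta fixed during cross-validation (a simplification), and uses the standard ordinary Kriging variance formula. The function names, the jitter and the test data are hypothetical.

```python
import numpy as np

def corr(XA, XB, theta):
    """Gaussian correlation between every row of XA and every row of XB."""
    diff = XA[:, None, :] - XB[None, :, :]
    return np.exp(-np.sum(theta * diff ** 2, axis=2))

def krige(X, y, theta, x_new):
    """Ordinary Kriging prediction and standard error at a single point x_new (shape (d,))."""
    n = len(y)
    R = corr(X, X, theta) + 1e-10 * np.eye(n)   # jitter is a numerical safeguard (assumption)
    ones = np.ones(n)
    Rinv_1 = np.linalg.solve(R, ones)
    mu = (Rinv_1 @ y) / (Rinv_1 @ ones)
    resid = y - mu
    Rinv_resid = np.linalg.solve(R, resid)
    sigma2 = resid @ Rinv_resid / n
    r = corr(x_new[None, :], X, theta)[0]
    Rinv_r = np.linalg.solve(R, r)
    pred = mu + r @ Rinv_resid
    var = sigma2 * (1.0 - r @ Rinv_r + (1.0 - ones @ Rinv_r) ** 2 / (ones @ Rinv_1))
    return pred, np.sqrt(max(var, 0.0))

def loo_check(X, y, theta):
    """Rows of (actual leave-one-out error, predicted standard error), one row per data point."""
    rows = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        pred, se = krige(X[keep], y[keep], theta, X[i])
        rows.append((abs(y[i] - pred), se))
    return np.array(rows)

# Illustrative data and hyperparameter (assumption).
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
theta = np.array([50.0])
rows = loo_check(X, y, theta)
for err, se in rows:
    print(f"|error| = {err:.4f}   standard error = {se:.4f}")
```

If the actual errors are routinely many times larger than the reported standard errors, the prediction variance is probably underestimated and the fit deserves a closer look.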
Example of poor fits.
Good fit and poor standard error (SE: standard error)
Kriging with nuggets A nugget refers to the inclusion of noise at the data points. The more general Gaussian Process surrogates, or Kriging with nuggets, can handle noisy data (e.g. experimental results).
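A common way to include a nugget, assumed here, is to add a positive constant to the diagonal of the correlation matrix of the data points. The sketch below compares a near-zero nugget, which interpolates the noisy data, with a larger one, which smooths it. The function name, the nugget values, theta and the synthetic noisy data are all illustrative assumptions.

```python
import numpy as np

def predict_with_nugget(X, y, theta, nugget, X_new):
    """Ordinary Kriging prediction with a nugget added to the diagonal of the correlation matrix."""
    diff = X[:, None, :] - X[None, :, :]
    R = np.exp(-np.sum(theta * diff ** 2, axis=2)) + nugget * np.eye(len(y))
    ones = np.ones(len(y))
    Rinv_1 = np.linalg.solve(R, ones)
    mu = (Rinv_1 @ y) / (Rinv_1 @ ones)
    b = np.linalg.solve(R, y - mu)
    # Correlations between new points and data points carry no nugget.
    diff_new = X_new[:, None, :] - X[None, :, :]
    r = np.exp(-np.sum(theta * diff_new ** 2, axis=2))
    return mu + r @ b

# Synthetic noisy data (assumption): a sine plus Gaussian noise.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(10)
theta = np.array([100.0])

interp = predict_with_nugget(X, y, theta, 1e-10, X)   # essentially no nugget
smooth = predict_with_nugget(X, y, theta, 0.1, X)     # nugget treats part of the data as noise

print("max |prediction - data| without nugget:", np.max(np.abs(interp - y)))
print("max |prediction - data| with nugget:   ", np.max(np.abs(smooth - y)))
```

With the nugget the surrogate no longer passes through every data point, which is the desired behavior when the data themselves are noisy.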
Introduction to optimization with surrogates Based on cycles. Each cycle consists of sampling design points by simulations, fitting surrogates to the simulations, and then optimizing an objective. Construct surrogate, optimize original objective, refine region and surrogate. –Typically a small number of cycles with a large number of simulations in each cycle. Adaptive sampling –Construct surrogate, add points by taking into account not only the surrogate prediction but also the uncertainty in the prediction. –Most popular: Jones's EGO (Efficient Global Optimization). –Easiest with one added sample at a time.
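EGO is usually described as choosing the next sample by maximizing the expected improvement, a criterion that combines the surrogate prediction with its uncertainty. The sketch below implements the standard expected improvement formula for minimization, assuming a normal prediction with mean y_hat and standard error s. The function name and the candidate values are hypothetical, and this is not presented as the exact formulation used in the slides.

```python
import math

def expected_improvement(y_hat, s, y_best):
    """Expected improvement over the best observed value y_best, for a minimization problem."""
    if s <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(y_best - y_hat, 0.0)
    z = (y_best - y_hat) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (y_best - y_hat) * cdf + s * pdf

# Hypothetical candidates: (label, predicted value, standard error); best observed value is 0.0.
candidates = [("near data, good prediction", -0.05, 0.01),
              ("far from data, uncertain", 0.20, 0.50),
              ("near data, poor prediction", 0.30, 0.01)]
for label, y_hat, s in candidates:
    print(f"{label}: EI = {expected_improvement(y_hat, s, 0.0):.4f}")

# One adaptive-sampling step would simulate the candidate with the largest EI, then refit.
next_label = max(candidates, key=lambda c: expected_improvement(c[1], c[2], 0.0))[0]
print("next sample:", next_label)
```

The criterion rewards both a low predicted value and a large uncertainty, so the search balances exploiting the current best region against exploring poorly sampled regions.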