
1 Surrogate Response Surface Methods for Optimization,
Parameter Estimation, and Uncertainty Quantification of Multi-modal, Computationally Expensive Models (With Environmental Applications). Christine A. Shoemaker, Civil and Environmental Engineering & Operations Research and Information Engineering, Cornell University. With results from joint work with others as noted. NSF Institute for Mathematical Applications, "Large-Scale Inverse Problems and Uncertainty Quantification" Workshop, June 5-10, 2011. (References given at the end of the slides; contact for codes.)

2 Acknowledgements This work has been a collaboration with others including: R. Regis, S. Wild, Y. Wang (optimization); A. Singh, A. Espinet, and P. Mugunthan (environmental applications); N. Bliznyuk and D. Ruppert (uncertainty quantification). Funding by NSF from the CISE, Math & Physical Science, Geoscience, and Engineering Directorates.

3 Outline of Talk on Analysis for Computationally Expensive Nonlinear Simulation Models
Background Global Optimization with Surrogates Integration of global optimization into Uncertainty Quantification

4 Calibration Optimization
Our goal is to find the minimum of f(x), where x ∈ D. Here f(x) can be a measure of error between model predictions and observations, and x can be parameter values. Let Fmax be the maximum number of function evaluations (e.g., simulations). We want Fmax to be small because f(x) is "costly" to evaluate.

5 Context (i.e. Is this of Interest to You?)
We are dealing with general models for which one can make few mathematical assumptions. The models are NOT assumed to be linear, Gaussian, or unimodal, and we do not have bounds on the amplitude of derivatives. (Almost none of our real applications have simulation models that satisfy any of these criteria.) The primary assumptions are that:
- the simulation models and decision variables are continuous;
- the simulation computation time (minutes to days per run) is significantly longer than the time required to compute a response surface approximation (seconds);
- the decision variables are in a closed (bounded) space.
Our methods can be applied without derivatives if they are not available.

6 Global Optimization versus Local Optimization: Multi-Modal Problems Have Multiple Local Minima
[Figure: F(x) versus X (parameter value), showing a local minimum and the global minimum.]

7 IDEAS: Local derivative-based methods have difficulty when the objective function is a rough surface because of numerical simulation and embedded equations. The figures below plot objective function versus parameter value for a real watershed model called SWAT.

8 Context--Continued As a result of the lack of assumptions (on unimodality, etc.) and the cost of the simulations, we have no evidence (yet) that we can solve problems with more than 40 decision variables. For problems with more variables, it is sometimes possible to take a hierarchical approach: a subset of the variables (the lower level) can justifiably be computed from a nonlinear or linear programming solution as a function of the remaining (upper-level) variables. Then the global optimization problem has only the upper-level variables, which greatly reduces the computational effort. A sketch of this decomposition follows.
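In symbols, a hedged sketch of that decomposition (the inner function g and the feasible sets are our illustration, not notation from the talk):

$$\min_{x_u \in D_u} f\bigl(x_u,\; x_\ell^*(x_u)\bigr), \qquad x_\ell^*(x_u) = \arg\min_{x_\ell \in D_\ell(x_u)} g(x_u, x_\ell),$$

where the inner problem is a cheap linear or nonlinear program solved anew for each upper-level candidate $x_u$, so the expensive global search runs only over the upper-level variables.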

9 Objective Function and Simulation Model
Let G(x) = the simulation model (computationally expensive), e.g., a PDE model of groundwater contamination. Let Ω = the objective function, which depends on output from the simulation. For example, Ω could be the sum of squared errors between data and output from G(x). Then the function to be minimized is f(x) = Ω(G(x)). We build our response surface on f(x).
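A minimal sketch of this composition (the simulator body and the data here are hypothetical stand-ins for an expensive PDE model):

```python
import numpy as np

def G(x):
    """Stand-in for the expensive simulation model (e.g., a PDE solver).
    In practice one call may take minutes to days."""
    return np.tanh(x[0]) * np.exp(-x[1])   # hypothetical output

def f(x, y_observed):
    """Objective f(x) = Omega(G(x)): here Omega is the sum of squared
    errors between observed data and simulated output."""
    y_model = G(x)
    return np.sum((y_model - y_observed) ** 2)
```

The surrogate is then fit to evaluated (x, f(x)) pairs, not to the raw simulator output.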

10 Ideas: Nonlinear Simulation-Optimization is usually a Global Optimization Problem
If the objective function is a nonlinear function of the output from a nonlinear simulation model, the resulting optimization problem is multimodal and hence needs a global optimization algorithm. Hence most nonlinear simulation-optimization problems require global optimization.

11 Why Do Nonlinear Simulation Models Lead to Global Optimization Problems?
Assume the simulation model G(x) is nonlinear and the objective function (e.g., an error function) Ω(y) is nonlinear. The simplest nonlinear form is for both of them to be quadratic. Then the function to be minimized is a fourth-order polynomial. A fourth-order polynomial can be multimodal and hence can have multiple local minima; in that case a global optimization method is required to find the best solution. This is the simplest form of nonlinearity; in multiple dimensions with more nonlinear terms, we can expect many local minima.
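A one-dimensional illustration (our own toy example, not from the slides): take $\Omega(y) = (y - 1)^2$ and $G(x) = x^2$. Then

$$f(x) = \Omega(G(x)) = (x^2 - 1)^2,$$

a fourth-order polynomial with two minima at $x = \pm 1$ separated by a local maximum at $x = 0$, so a local descent method converges to one basin or the other depending on its starting point.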

12 Global Optimization versus Local Optimization: Multi-Modal Problems Have Multiple Local Minima
[Figure: F(x) versus X (parameter value), showing a local minimum and the global minimum.]

13 Response Surface Approximation for Surrogate
Different types of splines can be used for creating the response surface. We use Radial Basis Functions (RBF) plus a linear polynomial for our spline because it works well with scattered data. Some other applications have also used a kriging surface.
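As a minimal sketch of fitting such a surface: SciPy's RBFInterpolator supports a cubic kernel with an appended degree-1 (linear) polynomial, the combination described above. The sample points and the toy objective here are hypothetical.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical evaluated points: rows of X are parameter vectors,
# fvals are the corresponding costly objective values f(x).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 2))
fvals = np.sum((X**2 - 1.0) ** 2, axis=1)     # toy stand-in for f(x)

# Cubic RBF plus a linear polynomial tail (degree=1), which works
# well with scattered data.
surrogate = RBFInterpolator(X, fvals, kernel="cubic", degree=1)

x_new = np.array([[0.5, -0.5]])
print(surrogate(x_new))                        # cheap prediction of f
```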

14 IDEAS: Response Surface for Objective Function
It has been relatively common for years to create an approximation A-G(x) of the simulation model of G(x) (often called a surrogate model or a low fidelity model). Often this is done either by making the mesh for the PDE model coarser or by using something like a neural net approximation of G(x). Then an optimization method is applied to A-G(x), so it is feasible to do many evaluations of the cheap model A-G(x). I believe it is much better to make the response surface of the objective function f(x)=Ω(G(x)) since f’s sensitivity to changes in x can be different from A-G(x)’s sensitivities.

15 Experimental Design with Symmetric Latin Hypercube (SLHD)
To fit the first Surrogate Approximation we need to have evaluated the function at several points. We use a symmetric Latin Hypercube (SLHD) to pick these initial points.
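A minimal sketch of one way to construct an SLHD on [0, 1]^d (our illustration of the symmetric construction, not the exact code used in the talk; it assumes n is even):

```python
import numpy as np

def symmetric_latin_hypercube(n, d, seed=None):
    """Symmetric Latin hypercube with n points in [0, 1]^d (n even).
    Each column is a permutation of the n levels, and the design is
    symmetric about the center: row i mirrors row n-1-i."""
    rng = np.random.default_rng(seed)
    half = n // 2
    levels = np.empty((n, d), dtype=int)
    for j in range(d):
        # For each symmetric pair of levels {k, n-1-k}, put one member
        # in the top half of the design, in a random row order.
        base = np.arange(half)
        picks = np.where(rng.random(half) < 0.5, base, n - 1 - base)
        rng.shuffle(picks)
        levels[:half, j] = picks
        levels[half:, j] = n - 1 - picks[::-1]   # mirrored bottom half
    return (levels + 0.5) / n   # map integer levels to cell midpoints

# Usage: scale to the box domain [lo, hi] before evaluating f.
# X0 = lo + symmetric_latin_hypercube(12, 2, seed=0) * (hi - lo)
```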

16 One-Dimensional Example of Experimental Design to Obtain Initial Surrogate Approximation
[Figure: objective function f(x) (a measure of error) versus x (parameter value, one-dimensional example); each costly function evaluation can take over 0.5 hour of CPU time.]

17 Surrogate Approximation with Initial Points from Experimental Design
[Figure: surrogate approximation of f(x) through the initial design points.] In real applications x is multidimensional since there are many parameters (e.g., 10).

18 Surrogate Approximation with Initial Points from Experimental Design
[Figure: surrogate approximation with candidate next points.] There are two goals in picking the next point: a small value of the surrogate, and distance away from previously evaluated points. (We select based on a weighted sum of these two factors; a sketch of this scoring follows.)
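A hedged sketch of that weighted two-criterion selection (the fixed weight w and the min-max scaling are illustrative; the published algorithm cycles the weight over a fixed pattern):

```python
import numpy as np

def select_next_point(candidates, surrogate, X_evaluated, w=0.5):
    """Score candidates by a weighted sum of (a) predicted surrogate
    value and (b) negative distance to the nearest evaluated point,
    each rescaled to [0, 1]; lower score is better."""
    s = surrogate(candidates)                              # criterion (a)
    dmin = np.min(np.linalg.norm(
        candidates[:, None, :] - X_evaluated[None, :, :], axis=2), axis=1)

    def unit(v):   # rescale to [0, 1], guarding against a flat vector
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    score = w * unit(s) + (1 - w) * (1 - unit(dmin))       # (b) via 1 - dist
    return candidates[np.argmin(score)]
```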

19 Update in Surrogate Approximation with New Evaluation
The surrogate approximation is updated in each iteration of the algorithm. [Figure: updated surrogate after evaluating f(x) at the new point x.] The surrogate approximation is a guess of the function value f(x) for all x.

20 Example: Cubic RBF Surrogate
Graph by Stefan Wild

21 Example: Multiquadric RBF Surrogate
Graph by Stefan Wild, ANL-DOE

22 Why Use Surrogate Approximation Methods?
A Surrogate Approximation R(x) can be used as part of an efficient parallel optimization algorithm in order to reduce the number of points at which we evaluate f(x), and thereby significantly reduce computational cost. Our Surrogate Approximation algorithm searches for the global minimum.

23 Idea: Use previous Simulations
One of the advantages of this approach is that it makes use of all previous simulation results (since they are incorporated in the response surface). A great deal of time has been invested in these previous simulations. Contrast this with derivative-based optimization, which is based entirely on the derivative at the current search point and does not use any of the values from previous function evaluations.

24 Outline of Stochastic RBF (LMSRBF) Method¹
1. Use a "space filling" experimental design (e.g., SLHD) to select a limited number of evaluation points.
2. Make an approximation of the function (e.g., with radial basis functions) based on the experimental design points over the entire domain.
3. Use our algorithm to select the next function evaluation point from a set of random candidate points. (Note: selection can be based on the surrogate approximation and the locations of previously evaluated points, and can use different probability distributions.)
4. Construct a new surrogate approximation that incorporates the newly evaluated point and all prior points.
5. Stop if the maximum number of iterations is reached; otherwise go to Step 3.
A compressed sketch of this loop follows.
¹ Reference: Regis and Shoemaker, INFORMS J. on Computing, 2007.
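Putting Steps 1-5 together, a compressed and simplified sketch (it assumes the select_next_point helper from the earlier sketch; a plain Latin hypercube stands in for the SLHD, and the candidate distribution and weight schedule are simplified relative to the published LMSRBF):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def stochastic_rbf(f, lo, hi, n_init=10, max_evals=100,
                   n_cand=100, sigma=0.1, seed=0):
    """Minimal sketch of the Stochastic RBF loop over the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    d = lo.size
    # Step 1: space-filling design (plain LHS here; the talk uses an SLHD)
    perms = rng.permuted(np.tile(np.arange(n_init), (d, 1)), axis=1).T
    X = lo + (perms + rng.random((n_init, d))) / n_init * (hi - lo)
    F = np.array([f(x) for x in X])
    while len(F) < max_evals:
        # Steps 2/4: (re)fit cubic RBF + linear tail to ALL evaluated points
        surrogate = RBFInterpolator(X, F, kernel="cubic", degree=1)
        # Step 3: random candidates as normal perturbations of current best
        best = X[np.argmin(F)]
        cand = best + sigma * (hi - lo) * rng.standard_normal((n_cand, d))
        cand = np.clip(cand, lo, hi)
        x_next = select_next_point(cand, surrogate, X)  # two-criterion score
        X = np.vstack([X, x_next])
        F = np.append(F, f(x_next))                     # one costly call
    return X[np.argmin(F)], F.min()                     # Step 5: budget hit
```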

25 Stochastic RBF Algorithm Converges Almost Surely
Regis and Shoemaker, Informs Jn. of Computing, 2007

26 So What Are the Components of a Surrogate Optimization Algorithm after initialization?
- Type of surrogate (e.g., polynomial, kriging, radial basis function (RBF): cubic, Gaussian, etc.)
- Method of search on the surrogate model (math programming, heuristic, random search)
- Objective function used in the search (lowest surrogate surface value, maximum expected uncertainty, or a weighted or constrained objective that considers both the surrogate model value and other criteria to promote global search)

27 Applications for Optimization
The following applications focus on the use of Stochastic RBF, published in 2007 in the INFORMS J. on Computing, and a modification for parallel computation published in the INFORMS J. on Computing in 2009. Three subsurface applications are discussed: bioremediation, groundwater remediation, and carbon sequestration. The method is general and can be used on other applications (e.g., the watershed application discussed later in the talk).

28 Application to Groundwater Bioremediation at Cape Canaveral Field Site
This example requires solution of a highly nonlinear system of partial differential equations and is a truly "computationally expensive function," requiring up to 2.5 hours for one evaluation. I show here application to a "hypothetical" example that has the same equations as the 2.5-hour simulation model but a smaller grid, since it was run with many other algorithms for comparison. The results for the full 2.5-hour model are given in the paper by Mugunthan, Shoemaker, and Regis, Water Resources Research, 2005.

29 Why Does Simulation Take So Long?
The partial differential equation has 22 species at each node in the finite difference model and a total of 1448 time steps. There are 4800 nodes in 3 dimensions. The total number of unknowns over all time steps in this nonlinear system is about 153 million. The biological reactions are very nonlinear (the biggest factor). Many groundwater problems are larger than this.
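As a check on that figure: 22 species × 4800 nodes × 1448 time steps ≈ 1.53 × 10⁸, i.e., about 153 million unknowns.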

30 Algorithms Used for Comparison of Optimization Performance on Calibration
- Stochastic greedy algorithm: neighborhood defined to make the search global; neighbors generated from a triangular distribution around the current solution; moves only to a better solution.
- Evolutionary algorithms: derandomized evolution strategy (DES) with λ = 10, b₁ = 1/n, and b₂ = 1/n^0.5 (Ostermeier et al. 1992); binary or real genetic algorithm (GA) with population size 10, one-point crossover, mutation probability 0.1, crossover probability 1.
- RBF surrogate approximation algorithms: the Gutmann radial basis function approach, with cycle length five and an SLH space-filling design; our (Cornell) global Stochastic RBF approach.
- FMINCON: derivative-based optimizer in Matlab with numerical derivatives.
10 trials of 100 function evaluations were performed for the heuristic and surrogate approximation algorithms for comparison.

31 Comparison of Algorithm Performance on Hypothetical Aquifer – CNS
[Convergence plot; the lowest curve is best (note the log scale). The experimental design phase for the RBF algorithms ends at 28 evaluations.]

32 Mean and Standard Deviation of Best Solution Produced After 100 Function Evaluations – Hypothetical Example
[Table of results; "ours" marks our algorithm.] An algorithm with the lowest mean and lowest standard deviation is desirable. Based on 10 trials.

33 Conclusions on Bioremediation Example
Our surrogate approximation algorithm generally outperformed the alternative algorithms considered. This performance was based on a limited number of function evaluations. The surrogate approximation algorithm was robust in that it had very few bad results in 10 trials.

34 Types of Global Optimization Methods (all of which are used in later comparisons)
Heuristics (genetic algorithms, simulated annealing, particle swarm, etc.) Response surface methods (that utilize mathematical analysis associated with approximation of continuous functions by other continuous functions) Coupling a local optimization method with a “multi-start method” that does another local search with a new starting point each time the local search finds a local minimum.

35 Stochastic RBF (our algorithm) Results On Many Global Optimization Test Problems (2007)
We solved 17 global optimization test problems and one small groundwater problem. All methods were repeated in 30 trials, and the values shown are the means over these 30 trials. (Local metric) Stochastic RBF is the best among the eight algorithms considered on higher-dimensional problems and is as good or better on lower-dimensional test problems. (Regis and Shoemaker, INFORMS J. on Computing, 2007)

36 14 Dimensional Schoen 14.100 Global Test Function
[Convergence plot; ours (LMSRBF) shown. From Regis and Shoemaker, INFORMS JOC, 2007.]

37 Conclusions on our Surrogate Approximation Optimization with RBF from INFORMS JOC 2007 paper
Our local metric Stochastic RBF outperformed the other algorithms for dimensions 6 to 14 on the suite of global optimization test problems with limited numbers of simulations. We did not consider dimensions over 14. Contrary to popular belief, using multistart with an excellent local optimizer is not the most effective method for global optimization. If the number of simulations is not limited, algorithms other than a surrogate method like Stochastic RBF might find a more accurate solution.

38 Current Research: Groundwater Remediation Site (Umatilla) History
- 19,728-acre military reservation in Oregon, established in 1941.
- From the 1950s until 1965, the depot operated an onsite explosives washout plant.
- Wash water was disposed of in two unlined lagoons; an estimated 85 million gallons of water were discharged during that 15-year stretch.
- Two contaminants: RDX (hexahydro-1,3,5-trinitro-1,3,5-triazine) and TNT (2,4,6-trinitrotoluene).

39 Results (Umatilla): This plot shows that Stochastic RBF is the most efficient of the tested algorithms (lowest curve). (From the copyrighted PhD thesis of my student Amandeep Singh, 2011.) [Convergence plot; note the log scale.]

40 Conclusions for comparison of Algorithms on Umatilla Groundwater Remediation Problem
The response surface algorithm (Stochastic RBF) was more efficient than heuristic and nonlinear optimization methods for an expensive remediation model. It produced good solutions with the lowest mean and least spread on Umatilla. Response surface based methods were robust across the two different problems, Umatilla and Blaine. We recommend response surface based methods as an optimization tool for computationally expensive groundwater models.

41 CO2 Plume Estimation by Automatic Calibration of TOUGH2 Models for Carbon Sequestration in Geological Formations (current research)
Antoine Espinet & Christine Shoemaker, Cornell University; Christine Doughty, DOE Berkeley Lab

42 Current Research: CO2 Plume Estimation by Automatic Calibration of TOUGH2 Models for Carbon Sequestration in Geological Formations. Carbon sequestration: storage of supercritical carbon dioxide in geological formations.

43 Example Based on Frio Field Site
We generated multiple realizations that were modeled on a DOE Field Site for Carbon Sequestration. The parameter to be estimated was permeability.

44 From the PhD thesis in progress of my student Antoine Espinet

45 Optimization to Estimate CO2 Plume by Estimating Spatial Distribution of Permeabilities.
Data locations are sparse because of the expense of drilling wells over 1000 m deep. There were insufficient data to estimate all 60 permeabilities; reducing the problem to 7 parameters gave good results. Fortunately, knowing precise permeabilities for the high-permeability regions was not important.

46 Approximated Permeabilities with 7 Parameters
From the PhD thesis in progress (2011) of my student Antoine Espinet

47 Results for the Carbon Sequestration Problem (from the PhD thesis in progress by my student Antoine Espinet)
1. The current forecast (based on pressure data up to 1.5 years from two wells, one inside the plume and one outside, plus the injection well) has R² =
2. The future forecast (to 7 years into the future, with no data beyond 1.5 years) has R² = 0.85.
These are excellent results for deep subsurface estimates. Results are only slightly better when CO2 saturation is also measured.

48 7-parameter, t=7.5 yr, (R2=0.85), layer 4
[Three plume panels: (a) simulation with the true parameters; (b) simulation with parameters estimated with limited data; (c) simulation with parameters generated randomly.]

49 Comparison of Optimization Algorithms for Geological Carbon Sequestration
- One simulation takes up to 3 hours.
- 7 parameters (the permeabilities).
- Heterogeneous, 3D geological model.
- Averaged over 3 trials; 120 evaluations.
- Multiple local minima, so the problem requires global optimization.

50 Algorithm Comparison
[Convergence plot: objective function versus number of simulations; note the log scale on the Y-axis. Low curves are best, so the derivative-based local algorithms did poorly compared with our surrogate algorithms.]

51 Conclusions on Carbon Sequestration
Managing risks for carbon sequestration requires solving a large-scale nonlinear inverse problem in a reasonable amount of time, which we could do with our Stochastic RBF surrogate method and with our global optimization trust-region RBF method GORBIT (Wild, PhD thesis, 2009), which uses our local RBF algorithm ORBIT (Wild et al., SIAM J. Sci. Computing & SIAM J. on Optimization, in press). The surrogate global optimization methods work much better than the local optimizers, presumably because the problem has multiple local minima. Future work: we are currently applying our uncertainty method SOARS to this GCS problem.

52 New Global Optimization Method
My postdoc Yilun Wang and I are now developing a new method that continues to use the RBF surrogate approximation but changes the objective function and the way candidate points are selected. This method changes the procedure for randomly selecting the perturbation size and incorporates derivatives (cheap to obtain) on the RBF surface to generate candidate points. The results are very promising for higher dimensions (problems up to 40 dimensions tested).

53 Keane Function (30 Dimensional)
[Convergence plot: our earlier Stochastic RBF method versus our new RBF method; the new method (blue, lowest curve) is best. From Y. Wang and C. Shoemaker, manuscript in draft, 2011.]

54 Conclusions Stochastic RBF is a very effective global optimization method. Further improvements appear to be possible, especially for higher dimensions.

55 Future Work on Global Optimization
I have just received positive notice of an NSF CISE award to develop an asynchronous parallel global optimization algorithm that incorporates some of the improvements above and some newer ideas to give faster results for high-dimensional problems.¹ ¹ We published an earlier paper on a synchronous parallel algorithm based on an extension of Stochastic RBF (Regis and Shoemaker, INFORMS J. on Computing, 2009).

56 Next Idea: Optimization + Response Surface in Uncertainty Quantification for Computationally Expensive Models. Typically uncertainty analysis (e.g., Monte Carlo sampling or MCMC) requires a huge number of samples (e.g., simulations) in the domain. This makes such methods infeasible for computationally expensive simulation models. I thought that the use of optimization in the sampling process, combined with response surfaces, could greatly reduce the number of simulations and still give accurate results.

57 National Science Foundation Project
SOARS: An Uncertainty Analysis Combining Statistical and Optimization Analysis and Response Surfaces for Computationally Expensive Models. National Science Foundation project (jointly funded by the Mathematical Science (Statistics) and Geoscience Programs—now expired). C. Shoemaker and D. Ruppert, PIs. Students and postdocs: N. Blizniouk, R. Rommel, S. Wild, D. Cowen, P. Mugunthan, Yilun Wang, Amy Li. (This presentation covers 2 papers from this project.)

58 MCMC: Markov Chain Monte Carlo
MCMC is a Bayesian and statistically rigorous method for estimating uncertainty. Our SOARS method reduced computational effort by a factor of about 65 (e.g., 150 simulations rather than 10,000 simulations for conventional MCMC), and SOARS is also a statistically rigorous method that does not require any linearity or unimodality assumptions.

59 Paper 1 on SOARS We have several papers on SOARS (one published, one in review, and two in draft). I will first discuss a statistics journal paper: Blizniouk, N., D. Ruppert, C.A. Shoemaker, R.G. Regis, S. Wild, P. Mugunthan, "Bayesian Calibration of Computationally Expensive Models Using Optimization and Radial Basis Surrogate Approximation," Journal of Computational and Graphical Statistics, July 2008.

60 Our Objective Function
The objective is the likelihood function, which we want to maximize. The likelihood function is based on generalized least squares and includes the basic model parameters as well as parameters for transformations that convert non-normal random variables into normal ones. We want our method to be able to address multimodal functions. A sketch of the form this takes follows.
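As a hedged sketch of that form (our illustration, following a transform-both-sides setup; $h_\lambda$ stands for a transformation such as Box-Cox with parameter $\lambda$): the statistical model is

$$h_\lambda(Y_i) = h_\lambda\bigl(G_i(\beta)\bigr) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2) \ \text{i.i.d.},$$

so the log-likelihood to maximize over $(\beta, \sigma^2, \lambda)$ is

$$\ell(\beta, \sigma^2, \lambda) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl[h_\lambda(Y_i) - h_\lambda\bigl(G_i(\beta)\bigr)\bigr]^2 + \sum_{i=1}^{n}\log\bigl|h_\lambda'(Y_i)\bigr|,$$

where the last (Jacobian) term accounts for transforming the data.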

61 Role of Optimization We want to approximate the objective function (e.g., the likelihood function) in parameter space. We use a derivative-free optimization search to find the global maximum (the MLE) and important local maxima. We do additional evaluations around the local maxima to get a more accurate description of the function. We then build an RBF surrogate approximation of the objective function based on this information.

62 Find Maximum Likelihood and Other Modes in Likelihood Functions (picture is one dimensional example)
[Figure: objective (likelihood) versus parameter value, with high-probability and low-probability regions marked.]

63 Overview of Methodology
There are three steps:
1. Use optimization to search for the maximum likelihood and construct a surrogate approximation (with radial basis functions) of the likelihood function from the simulation results of this search. If the likelihood is possibly multimodal, use a surrogate global optimization method.
2. Locate the region(s) of high posterior density and do more simulations to get an improved (RBF) surrogate approximation of the likelihood function, using both optimization and extra simulations.
3. Do MCMC and standard Bayesian analysis using the approximate posterior density (likelihood based on the RBF). (This is the major step for reducing computational effort.)

64 Find Maximum Likelihood and Other Modes in Likelihood Functions (picture is one dimensional example)
[Figure: objective (likelihood) versus parameter value, with high-probability and low-probability regions marked.]

65 SOARS Computational Effort
Based on the "costly" function evaluations done in the optimization search, plus additional function evaluations around the local optimizers, we build a surrogate approximation of the likelihood function (hundreds of expensive simulations). We then apply MCMC (Markov Chain Monte Carlo) to the surrogate approximation of the likelihood function. This generates the joint posterior density function of the parameter values and requires 10,000+ evaluations of the inexpensive approximation function. A sketch of this MCMC step follows.
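A minimal sketch of that MCMC step (random-walk Metropolis; log_post_surrogate is an assumed input standing for the cheap RBF approximation of the log-posterior, i.e., the surrogate log-likelihood plus a log-prior):

```python
import numpy as np

def mcmc_on_surrogate(log_post_surrogate, x0, n_steps=10000, step=0.05, seed=0):
    """Random-walk Metropolis on the *cheap* surrogate of the log-posterior.
    Every iteration evaluates the surrogate, never the expensive simulator."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post_surrogate(x)
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        prop = x + step * rng.standard_normal(x.size)   # propose a move
        lp_prop = log_post_surrogate(prop)
        if np.log(rng.random()) < lp_prop - lp:         # Metropolis accept
            x, lp = prop, lp_prop
        samples[i] = x
    return samples   # draws from the approximate joint posterior
```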

66 Paper 1 Application to Chemical Spill Problem
There is a chemical spill of mass M into a long, narrow channel of water at both locations marked in red. The system is described by an advection-diffusion equation. We want to estimate the model parameters: the mass M, the time τ and location L of the second spill, and the diffusion coefficient D. The output we want from the model is the average pollutant concentration over time at the end of the channel.

67 The equations describing the concentration C as a function of location s and time t, with parameters M, D, L, and τ, are given below. Because this equation can be solved analytically, we can afford to do the 10,000 or more model evaluations necessary for conventional MCMC and compare it to our SOARS method, which requires far fewer model evaluations. The general goal is to handle non-analytical models.
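The equation itself did not survive this transcript. A hedged reconstruction of the standard two-release diffusion solution consistent with the description above (each spill has mass M; the second occurs at location L and time τ) is:

$$C(s, t; M, D, L, \tau) = \frac{M}{\sqrt{4\pi D t}}\,\exp\!\left(-\frac{s^{2}}{4 D t}\right) + \frac{M}{\sqrt{4\pi D (t-\tau)}}\,\exp\!\left(-\frac{(s-L)^{2}}{4 D (t-\tau)}\right)\mathbb{1}\{t > \tau\}.$$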

68 Joint Posterior Density
Y⁰ = vector of observed data; η = parameter vector in the (joint) statistical model (the final solution is called β). [list1 | list2] denotes the conditional density of the random variables in list1 given list2; e.g., [η | Y⁰] is the conditional density of η given the data Y⁰.
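In this notation, Bayes' rule gives the joint posterior as likelihood times prior, normalized by the marginal density of the data:

$$[\eta \mid Y^{0}] = \frac{[Y^{0} \mid \eta]\,[\eta]}{[Y^{0}]} \propto [Y^{0} \mid \eta]\,[\eta].$$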

69 Kernel estimates of the marginal posterior densities from (a) (solid line) the exact joint posterior obtained from conventional MCMC analysis with 10,000 function evaluations, and (b) (dashed lines) our surrogate approximation method with 150 function evaluations. One graph for each parameter.

70 Significance of Results
To obtain the pdf by MCMC normally requires 10,000 or more "costly" function evaluations. These results indicate that we were able to get good results with much less (by almost two orders of magnitude) computational effort by: (a) doing a few hundred "costly" function evaluations and fitting a surrogate approximation to the likelihood function, and then (b) evaluating 10,000 points on the inexpensive surrogate approximation surface with the MCMC.

71 SOARS Paper 3 A third paper is not yet submitted:
Shoemaker, C.A., D. Cowen, N. Bliznyuk, D. Ruppert, J. Woodbury, X. Lin, “Application of SOARS for Uncertainty Analysis of Computationally Expensive Hydrologic Models with Application to Cannonsville Basin”

72 How Do We Protect This Water From Pollution? (New York City water supply)
The problem is that phosphorus from the surrounding watershed can result in the need for an $8 billion water treatment plant for NYC!

73 SWAT2000 Model Discretization of Cannonsville Watershed (1200 km2)
Using a spatially distributed model helps us evaluate management options. [Map of the watershed showing land areas, the reservoir, and the lake.] The discretization (different than GWLF) has 43 subbasins and 758 HRUs, with an average HRU of 1.6 km².

74 Rough Surface of Numerical Simulation Error Function Versus Parameter Value
Roughness and multiple local minima exist, making solution difficult. [Surface plot: error versus Parameter 1 and Parameter 14.]

75 Marginal Posterior Densities of the Parameters (βi) Computed by SOARS
These are the marginal posterior densities for two of the parameters in the analysis of the SWAT model for the Cannonsville Watershed. The analysis considered a larger number of parameters. It is important to observe the asymmetry in the densities; this is clearly not a normally distributed density.

76 Uncertainty in Unmeasured Output during the Calibration Period
The preceding slide shows the posterior densities of the parameters for the Cannonsville example (which depend on the data Y⁰ from the calibration period). The next slide shows quantiles of model output that is not directly measured (such as evapotranspiration or groundwater recharge). This is only possible with a "process" model; it is not possible with a polynomial regression, neural net, or genetic programming model of rainfall-runoff.

77 Quantiles, Means, Standard Dev.
These are based on a statistically rigorous analysis with transformations to account for non-normal data, obtained with a small fraction of the number of simulations required by other methods, including MCMC and GLUE.

78 What Have We Achieved? We applied modern statistical tools, including transformations, to the calibration of environmental engineering models. We implemented a Bayesian method of uncertainty analysis. We substantially reduced the number of simulations (by one or two orders of magnitude). This reduction makes the method feasible for computationally expensive models for which no other existing method is computationally feasible.

79 Overall Conclusions Surrogate Response Surfaces in combination with global optimization and uncertainty quantification algorithms can enable us to do analysis on computationally expensive simulation models that would not otherwise be possible.

80 Some References

81 Additional References

82 End. Contact Christine Shoemaker for information on open access to available codes.

