Function Approximation

Presentation on theme: "Function Approximation"— Presentation transcript:

1 Function Approximation
Function Approximation: Optimization for Computationally Expensive, Nonconvex Functions, Including Environmental Applications
Christine Shoemaker, Cornell University
McMaster Optimization Conference, July 2004
Joint work with my Ph.D. students Rommel Regis and Pradeep Mugunthan

2 Applications of Optimization
Optimization in environmental and engineering problems can be used for designing the “best” (e.g. least expensive) solution to a problem. Optimization can also be used to solve the “inverse” problem, i.e. given observations, find the best values of calibration parameters to fit the observations. The numerical examples here are for calibration, but the method can be used for optimal design as well.

3 Optimization Our goal is to find the minimum of f(x), where x ∈ D.
We want to do very few evaluations of f(x) because it is "costly" to evaluate. f(x) can be a measure of the error between model predictions and observations; x can be the parameter values.

4 Function Approximation Methods
A nonlinear function approximation R(x) is a continuous nonlinear multivariate approximation to f(x). R(x) has also been called a “response surface” or a “surrogate model”. We use radial basis functions for our response surface, but other methods for non-convex surfaces could also be used.
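To make the response surface concrete, here is a minimal NumPy sketch of fitting a cubic RBF surrogate with a linear polynomial tail to the evaluated points. This is a generic illustration, not the authors' implementation; the cubic kernel is just one of the forms mentioned later in the talk.

    import numpy as np

    def fit_cubic_rbf(X, f):
        # Fit s(x) = sum_i w_i * ||x - x_i||^3 + b0 + b.x  (cubic RBF plus linear tail).
        # X: (n, d) array of evaluated points, f: (n,) array of costly function values.
        n, d = X.shape
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        Phi = dist ** 3                                   # cubic kernel matrix
        P = np.hstack([np.ones((n, 1)), X])               # linear polynomial tail [1, x]
        A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
        rhs = np.concatenate([f, np.zeros(d + 1)])
        coef = np.linalg.solve(A, rhs)                    # interpolation conditions
        w, b = coef[:n], coef[n:]

        def surrogate(x):
            r = np.linalg.norm(X - np.asarray(x), axis=1)
            return (r ** 3) @ w + b[0] + np.asarray(x) @ b[1:]
        return surrogate

The returned surrogate interpolates f at every evaluated point and is cheap to evaluate and differentiate, which is what makes it useful inside the optimization.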

5 Why Use Function Approximation Methods?
A function approximation R(x) can be used as part of an efficient parallel optimization algorithm in order to reduce the number of points at which we evaluate f(x), and thereby significantly reduce computational cost. Our function approximation algorithm searches for the global minimum.

6 Goal: Derivative-Free Optimization of Costly, Black Box, Nonconvex Functions
For some complex functions, derivatives are unavailable because they are infeasible to compute or cannot be computed accurately and quickly enough. Inaccurate derivatives lead to inefficient gradient-based optimization, and some important functions are not differentiable. Our method does not require f′(x).

7 Goal: Derivative-Free Optimization of Costly, Black Box, Nonconvex Functions
Costly functions f(x) require a substantial amount of computation (minutes, hours, days) to evaluate the function once. Examples of costly functions include simulation models or codes that solve systems of nonlinear partial differential equations. Our method seeks to minimize the number of costly function evaluations.

8 Goal: Derivative-free Optimization of Costly, Black Box, Nonconvex Functions
Gradient-based optimization methods stop at local minima instead of searching further for the global minimum. For black box functions, we don't know whether the function is convex or nonconvex. Our method is a good global optimization method for black box and other nonconvex functions. The method can be easily connected to different simulation models (as can heuristic methods like GAs).

9 Global versus Local Minima
Many optimization methods only find one local minimum. We want a method that finds the global minimum. [Figure: f(x) versus x (parameter value), showing a local minimum and the global minimum.]

10 Experimental Design with Symmetric Latin Hypercube (SLHD)
To fit the first function approximation we need to have evaluated the function at several points. We use a symmetric Latin Hypercube (SLHD) to pick these initial points. The number of points we evaluate in the SLHD is (d+1)(d+2)/2, where d is the number of parameters (decision variables).
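A small sketch of how such a design can be generated. The construction below (mirrored level pairs on the unit box) is one simple way to obtain a symmetric Latin hypercube; it is an assumption, not necessarily the generator used in the talk.

    import numpy as np

    def symmetric_lhd(n, d, seed=None):
        # Symmetric Latin hypercube on [0, 1]^d: every column is a permutation of
        # the n levels, and rows i and n-1-i are mirror images of each other.
        rng = np.random.default_rng(seed)
        half = n // 2
        design = np.empty((n, d), dtype=int)
        for j in range(d):
            low = rng.permutation(half)                   # one level from each mirrored pair
            flip = rng.integers(0, 2, size=half).astype(bool)
            top = np.where(flip, n - 1 - low, low)        # randomly orient each pair
            design[:half, j] = top
            design[n - half:, j] = (n - 1 - top)[::-1]    # mirrored bottom half
            if n % 2:
                design[half, j] = (n - 1) // 2            # odd n: centre level in middle row
        return (design + 0.5) / n                         # cell midpoints in [0, 1]^d

    # Example: d = 10 calibration parameters -> (d+1)(d+2)/2 = 66 initial points.
    d = 10
    n_initial = (d + 1) * (d + 2) // 2
    initial_points = symmetric_lhd(n_initial, d, seed=0)  # 66 x 10 array in [0, 1]^10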

11 One-Dimensional Example of Experimental Design to Obtain the Initial Function Approximation
[Figure: objective function f(x), a measure of error, versus x (parameter value, one-dimensional example); each plotted point is a costly function evaluation (e.g., over 0.5 hour of CPU time per evaluation).]

12 Function Approximation with Initial Points from Experimental Design
[Figure: function approximation through the initial experimental design points, plotted as f(x) versus x.] In real applications x is multidimensional, since there are many parameters (e.g., 10).

13 Update in Function Approximation with New Evaluation
The update of the function approximation is done in each iteration (for each algorithm expert, in the parallel method). [Figure: f(x) versus x (parameter value), with the newly evaluated point marked.] The function approximation is an estimate of the value of f(x) for all x.

14 Outline of Function Approximation Optimization
1. Use a "space filling" experimental design (e.g., SLHD) to select a limited number of evaluation points.
2. Make an approximation of the function (with radial basis function splines) based on the experimental design points.
3. Use our algorithm to select the next point for evaluation, based on the function approximation and the locations of previously evaluated points.
4. Construct a new function approximation that incorporates the newly evaluated point and all prior points.
5. Stop if the maximum number of iterations is reached; otherwise go to Step 3.

15 Outline of Function Approximation Optimization
1. Use a "space filling" experimental design (e.g., SLHD) to select a limited number of evaluation points.
2. Make an approximation of the function (with radial basis function splines) based on the experimental design points.
3. Use our algorithm to select the next point for evaluation, based on the function approximation and the locations of previously evaluated points. (We have developed different algorithms; this is a research topic.)
4. Construct a new function approximation that incorporates the newly evaluated point and all prior points.
5. Stop if the maximum number of iterations is reached; otherwise go to Step 3.
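A skeleton of this loop, reusing the symmetric_lhd sketch given with slide 10, the fit_cubic_rbf sketch given with slide 4, and the select_next_point sketch given later with slide 22. The cycled distance schedule is an assumed example; the real algorithm's schedule and stopping rule may differ.

    import numpy as np

    def rbf_optimize(f, d, n_iterations, seed=None):
        # Skeleton of the surrogate optimization loop outlined above.
        rng = np.random.default_rng(seed)
        n0 = (d + 1) * (d + 2) // 2                       # Step 1: initial SLHD size
        X = symmetric_lhd(n0, d, seed)                    # space-filling design
        y = np.array([f(x) for x in X])                   # costly evaluations
        distance_cycle = [0.3, 0.2, 0.1, 0.05, 0.0]       # assumed: global -> local search
        for k in range(n_iterations):
            surrogate = fit_cubic_rbf(X, y)               # Steps 2 and 4: (re)fit the RBF
            delta = distance_cycle[k % len(distance_cycle)]
            x_new = select_next_point(surrogate, X, delta, rng)   # Step 3
            X = np.vstack([X, x_new])
            y = np.append(y, f(x_new))                    # one more costly evaluation
        i_best = int(np.argmin(y))
        return X[i_best], y[i_best]                       # Step 5: stop after max iterations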

16 Use of Derivatives We use gradient-based methods only on the function approximation R(x) (for which accurate derivatives are inexpensive to compute). We do not try to compute gradients/derivatives of the underlying costly function f(x).

17 Our Radial Basis Function Derivative-Free Optimization Algorithm
The following graphs illustrate the search procedure and the function approximation. We use radial basis functions (RBFs) for the function approximation. The objective function in the following contour plots is a two-dimensional global optimization test function.

18 Exploratory Radial Basis Function Method Applied on Goldstein-Price
[Figure: contour plots of the true Goldstein-Price function and of its function approximation after 6 function evaluations (the experimental design). x1 and x2 are the two parameters, and the contours indicate the error function value. The global minimum and three other local minima are marked; the points on the right-hand plot show the experimental design (SLHD). The best value found so far is shown (global minimum value = 3).]

19 Exploratory Radial Basis Function Method Applied on Goldstein-Price
[Figure: contour plots of the true function and the response surface after 20 function evaluations. The best value found so far is shown (global minimum value = 3).]

20 Exploratory Radial Basis Function Method Applied on Goldstein-Price
[Figure: contour plots of the true function and the response surface after 40 function evaluations. The best value found so far is shown (global minimum value = 3).]

21 Our RBF Algorithm Our paper on RBF optimization algorithm has will appear soon in Jn. of Global Optimization . The following graphs show a related RBF method called “Our RBF” as well as an earlier RBF optimization suggested by Gutmann (2000) in Jn. of Global Optimization called “Gutmann RBF”.

22 How does our algorithm work?
One approach is to pick the next point by solving an optimization problem based on both the function approximation value (which we want to be small, for a local search) and the distance to the nearest previously evaluated point (which we want to be large, for a global search). We cycle the allowable minimum distance so that some of the search is local and some is global. A proof shows that this approach will eventually generate a point arbitrarily close to the global optimum.
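A simplified sketch of this selection step. It approximates the constrained sub-problem by scoring random candidate points; the real algorithm solves the sub-problem more carefully, and the candidate count and unit-box domain are assumptions.

    import numpy as np

    def select_next_point(surrogate, X_evaluated, min_distance, rng, n_candidates=1000):
        # Among random candidates in the unit box, keep those at least min_distance
        # away from every previously evaluated point, then take the one with the
        # lowest surrogate value.  Cycling min_distance from large to small moves
        # the search from global to local.
        d = X_evaluated.shape[1]
        cand = rng.random((n_candidates, d))
        dmin = np.linalg.norm(cand[:, None, :] - X_evaluated[None, :, :],
                              axis=-1).min(axis=1)
        feasible = cand[dmin >= min_distance]
        if len(feasible) == 0:          # constraint too tight: fall back to all candidates
            feasible = cand
        values = np.array([surrogate(x) for x in feasible])
        return feasible[np.argmin(values)]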

23 Previously evaluated points
[Figure: previously evaluated points and a potential new point for evaluation in the (Variable 1, Variable 2) plane, with the minimum distance to the nearest evaluated point marked.] We would like the minimum distance to be large when we want to explore the space.

24 Comparison of RBF Methods on a 14-dimensional Schoen Test Function (Average of 10 trials)
[Figure: objective function value versus number of function evaluations for each method, with "Our RBF" marked.]

25 Comparison of RBF Methods on a 12-dimensional Groundwater Aerobic Bioremediation Problem (a PDE system) (Average of 10 trials) [Figure: objective function value versus number of function evaluations for each method, with "Our RBF" marked.]

26 The project is funded by the NSF Environmental Engineering Program.
The following results are from NSF Project 1: Function Approximation Algorithms for Environment Analysis with Application to Bioremediation of Chlorinated Ethenes. Project title: "Improving Calibration, Sensitivity and Uncertainty Analysis of Data-Based Models of the Environment". The project is funded by the NSF Environmental Engineering Program.

27 Now a real costly function: DECHLOR, a Transport Model of Anaerobic Bioremediation of Chlorinated Ethenes. This model was originally developed by Willis and Shoemaker, based on kinetics equations by Fennell and Gossett. This model will be our "costly" function in the optimization. Cleanup of groundwater is estimated to cost up to a trillion dollars (!), so optimization is well worthwhile.

28 Engineered Dechlorination by Injection of Hydrogen Donor and Extraction
The injected donor promotes degradation by providing hydrogen. Forecasting effectiveness requires a transport model that predicts the movement of the donor and the species through space over time, and the effect of pumping and injection of the donor. The simulation runs for 12 years; there are 3 stress periods of 4 years each, and the injection and extraction rates are the same. The objective is computed from the concentrations at observation wells at the end of the simulation, plus pumping and material costs.

29 Dechlorination: PCE to ETH
Chlorinated ethenes in groundwater (drinking water). [Figure: the dechlorination chain Tetrachloroethene (PCE) → Trichloroethene (TCE) → cis-1,2-Dichloroethene (cDCE) → Vinyl Chloride (VC) → Ethene (ETH), with one chlorine atom replaced by hydrogen at each of the four dechlorination steps.] All compounds except ethene are very toxic.

30 Complex model: 18 species at each of thousands of nodes of finite difference model
[Figure: reaction network of the 18 species, comprising donors (Lactate, Propionate, Butyrate, Acetate, H2), chlorinated ethenes (PCE, TCE, DCE, VC, Ethene), the dechlorinator biomass, methane, and the reactions Lac2Ace, Lac2Prop, Prop2Ace, But2Ace, and Hyd2Meth.]

31 Optimization of Calibration of Chlorinated Ethenes Model
The model involves the solution of a system of partial differential equations by finite difference methods. There are 18 species variables at each node of the finite difference grid. The equations are highly nonlinear because of the biological reactions. Optimization was applied to two cases: a hypothetical case (8 minutes per simulation) and a field data application (3 hours per simulation).

32 Example of Objective Function for Optimization of Chlorinated Ethene Model
SSE = Σ_{t=1..T} Σ_{j=1..J} Σ_{i=1..I} (c_obs(i, j, t) − c_sim(i, j, t))²
where SSE is the sum of squared errors between the observed and simulated chlorinated ethene concentrations; c_obs(i, j, t) is the observed molar concentration of species j at time t and location i; c_sim(i, j, t) is the simulated molar concentration of species j at time t and location i; t = 1 to T are the time points at which measured data are available; j = 1 to J indexes PCE, TCE, DCE, VC and ethene, in that order; and i = 1 to I indexes the monitoring locations.
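A minimal sketch of this objective; run_dechlor_model in the usage comment is a hypothetical stand-in for the costly DECHLOR simulation.

    import numpy as np

    def sse_objective(c_obs, c_sim):
        # Sum of squared errors between observed and simulated molar concentrations.
        # Both arrays are indexed (monitoring location i, species j, time t);
        # set entries with no measurement to NaN so they are ignored.
        return float(np.nansum((c_sim - c_obs) ** 2))

    # Example usage (hypothetical model call):
    # sse = sse_objective(observed, run_dechlor_model(parameters))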

33 Algorithms Used for Comparison of Optimization Performance on Calibration
Stochastic greedy algorithm: neighborhood defined to make the search global; neighbors generated from a triangular distribution around the current solution; moves only to a better solution.
Evolutionary algorithms: derandomized evolution strategy (DES) with lambda = 10, b1 = 1/n and b2 = 1/n^0.5 (Ostermeier et al. 1992); binary or real genetic algorithm (GA) with population size 10, one-point crossover, mutation probability 0.1, and crossover probability 1.
RBF function approximation algorithms: Gutmann RBF (radial basis function approach with cycle length five and an SLH space-filling design) and RBF-Cornell (our radial basis function approach).
FMINCON: derivative-based optimizer in Matlab, with numerical derivatives.
10 trials of 100 function evaluations were performed for the heuristic and function approximation algorithms for comparison.

34 [Figure: objective function value versus number of function evaluations for each algorithm; a lower curve is better, and "ours" is marked.]
The average is based on 10 trials. The best possible value for –NS is –. Experimental design evaluations done.

35 Boxplot comparing best objective value (CNS) produced by the algorithms in each trial over 10 trials
[Figure: boxplots of the best CNS objective value for each algorithm over 10 trials, with outliers, the average, and "ours" marked.]

36 Conclusions Optimizing costly functions is typically done only once.
The purpose of examining multiple trials is to assess how well one is likely to do when the problem is solved only once. Hence we want the method that has both the smallest mean objective function value and the smallest variance. Our RBF has both the smallest mean and the smallest variance. The second best method is Gutmann RBF, so RBF methods seem very good in general.

37 Empirical Cumulative Distribution Plot – CNS Objective
[Figure: empirical cumulative distribution of the best CNS objective value for each algorithm, with "ours" marked.] Curves to the left are best and stochastically dominate (higher probability of a lower objective function value).

38 Conclusions Optimizing costly functions is typically done only once.
The purpose of examining multiple trials is to assess how well one is likely to do when the problem is solved only once. Hence we want the method that has both the smallest mean objective function value and the smallest variance. Our RBF has both the smallest mean and the smallest variance. The second best method is Gutmann RBF, so RBF methods seem very good in general.

39 Alameda Field Data The next step was to work with a real field site.
We obtained data from a DOD field site studied by a group including Alleman, Morse, Gossett, and Fennell. Running the chlorinated ethene simulation model at this site takes about three hours per run because of the nonlinearities in the kinetics equations.

40 Site Layout

41 Numerical Set-up for Simulation
Finite difference grid with over 5300 nodes. 18 species and about 6000(?) time steps equals almost 8 billion variables. Each simulation takes between 2 and 3 hours. Optimization was attempted for calibration of 8 parameters (6 biokinetic and 2 biological). [Figure: numerical set-up for the simulation, showing the flow direction, head h = 21 ft (6.4 m), no-flow boundaries, and a 65 ft (22 m) domain.]

42 Range of objective values for the SSE objective function at the Alameda field site. [Figure: mean, min, and max of the SSE objective for each algorithm; the gradient-based method and "ours" are marked.]

43 Conclusions on RBF Optimization of Calibration
Radial basis function approximation methods can be used effectively to find optimal solutions of costly functions. "Our RBF" performed substantially better than the previous RBF method by Gutmann on the difficult chlorinated ethene remediation problem, especially because our RBF is robust (small variance). Both genetic algorithms and derivative-based search did very poorly. The two RBF methods did much better on the Alameda field data problem than the other methods.

44 However, 300 hours is a long time to wait! Solution: Parallel Algorithms
We would like to be able to speed up calculations for costly functions by using parallel computers. To get a good speed up on a parallel computer, you need an algorithm that parallelizes efficiently. We are developing such an algorithm.

45 New Project 2: Parallel Optimization Algorithms
Funded by the Computer Science (CISE) Directorate at NSF The method is general and can be used for a wide range of problems including other engineering systems in addition to environmental systems. This research is underway.

46 Development of Parallel Algorithms
This proposal focuses on developing function approximation algorithms that will be very efficient when done in parallel. One can re-code any optimization algorithm to run in parallel, but it might not be very efficient (e.g. runs only twice as fast with 10 processors as with one processor).

47 MAPO: Multi-Algorithm Parallel Optimization
MAPO is an optimization algorithm we are developing that is designed to find, with parallel computation, good solutions with relatively few function evaluations. The basic idea is that in each iteration there is a "committee" of algorithm experts, each of whom recommends several candidate points.

48 MAPO: Multi-Algorithm Parallel Optimization
MAPO is an optimization algorithm we are developing that is designed to find, with parallel computation, good solutions with relatively few function evaluations. The basic idea is that in each iteration there is a "committee" of algorithm experts, each of whom recommends several candidate points. All candidate points are compared and P are selected for evaluation (P = number of processors).

49 MAPO: Multi-Algorithm Parallel Optimization
MAPO is an optimization algorithm we are developing that is designed to find, with parallel computation, good solutions with relatively few function evaluations. The basic idea is that in each iteration there is a "committee" of algorithm experts, each of whom recommends several candidate points. All candidate points are compared and P are selected for evaluation (P = number of processors). The f(x) values obtained from all P processors are then available to all of the algorithms for building their response surfaces in the next generation.
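A greedy stand-in for this selection step, sketched in NumPy. It is not MAPO's actual multi-criteria / machine-learning selection; in particular, the predicted-value and spacing terms would normally be normalized before being combined.

    import numpy as np

    def select_batch(candidates, surrogates, X_evaluated, P):
        # Pick P evaluation points from the committee's pooled candidates:
        # prefer a low average predicted value across the experts' surrogates,
        # and keep each new pick far from the evaluated points and earlier picks.
        preds = np.array([[s(c) for s in surrogates] for c in candidates])
        value = preds.mean(axis=1)                 # committee's average prediction (low = good)
        chosen, taken = [], X_evaluated
        for _ in range(P):
            dmin = np.linalg.norm(candidates[:, None, :] - taken[None, :, :],
                                  axis=-1).min(axis=1)
            score = value - dmin                   # trade predicted value against spacing
            score[chosen] = np.inf                 # never pick the same candidate twice
            k = int(np.argmin(score))
            chosen.append(k)
            taken = np.vstack([taken, candidates[k]])
        return candidates[chosen]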

50 Evaluation Points in MAPO
[Figure: candidate points not evaluated, previously evaluated points, and new points selected for evaluation in the (Variable 1, Variable 2) plane, with the minimum distance marked.] For P parallel processors we want to evaluate P points at once (P = 4 in the picture).

51 MAPO Algorithm is Designed to be Coarse-Grained
Serial (quick): use multiple algorithms with function approximation to generate N candidate points for costly function evaluation.
Serial (quick): use multiple criteria and machine learning methods to select the P points at which costly function evaluation is actually done.
Parallel (slow): do the P costly function evaluations (usually one function evaluation per processor on M processors).
Serial (quick): use the values of the new function evaluations to update the function approximations.
One iteration loops through all these steps.
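The slow parallel step can be sketched with Python's standard multiprocessing module; on a real cluster this would use MPI or a job scheduler, so the snippet below is only an illustration.

    from multiprocessing import Pool

    def evaluate_batch(costly_f, points, n_processors):
        # Parallel step of one iteration: run the P selected costly simulations,
        # usually one per processor.  costly_f must be a picklable, module-level
        # function that maps one parameter vector to one objective value.
        with Pool(processes=n_processors) as pool:
            return pool.map(costly_f, points)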

52 Test Functions in Numerical Results
The following results are based on applying function approximation methods for global optimization to four test functions, which are designed to mimic typical global optimization problems.

53 Algorithms Used in Numerical Examples
In MAPO, we used four algorithm experts. Each algorithm expert used a greedy search on a radial basis function (RBF) plus linear polynomial function approximation. The form of the RBF was different for each expert (cubic, thin plate, multiquadric, Gaussian).
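For reference, the four radial kernels phi(r) can be written as follows; the shape parameter g and these exact parameterizations are assumptions, since several conventions exist. Each expert's surrogate has the form sum_i w_i * phi(||x - x_i||) plus a linear polynomial tail, as in the fit_cubic_rbf sketch shown with slide 4.

    import numpy as np

    RBF_KERNELS = {
        "cubic":        lambda r, g=1.0: r ** 3,
        "thin plate":   lambda r, g=1.0: np.where(r > 0, r ** 2 * np.log(np.maximum(r, 1e-300)), 0.0),
        "multiquadric": lambda r, g=1.0: np.sqrt(r ** 2 + g ** 2),
        "Gaussian":     lambda r, g=1.0: np.exp(-(g * r) ** 2),
    }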

54 Alternative Parallel Algorithms
The 8 alternative parallel algorithms were parallel versions of each of the 4 individual algorithm experts, parallelized in one of two ways:
No communication: run the algorithm separately on p processors without communication and stop when one of the processors finds a solution within 1% of the optimum.
With communication: run the algorithm separately on p processors and communicate all function evaluation values after each iteration so that improved function approximations can be made (same stopping criterion).

55 Wall Clock Time(p) = Ep*(τ+c + A(p))
Estimating the wall clock time with p processors for f(x) evaluations requiring τ CPU minutes:
Wall Clock Time(p) = Ep * (τ + c + A(p))
where Ep = number of iterations (= number of function evaluations / p), p = number of processors, τ = time required for one costly function evaluation, c = communication time per iteration (negligible for large τ), and A(p) = time for serial calculations that do not depend on the evaluation time τ (e.g., doing the optimization search on the function approximations and selecting evaluation points from the candidate points).

56 Importance of Ep: Wall Clock Time(p) = Ep * (τ + c + A(p))
Ep = number of iterations (= number of function evaluations / p); best if this value is low. p = number of processors, τ = time required for one simulation, c = communication time. A(p) is on the order of a few minutes, c is very small, τ is assumed to be at least thirty minutes, and Ep is much greater than 1. Hence Ep is important! The following graphs give Ep for the alternative algorithms.
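A one-line helper and an example with assumed numbers make the formula concrete.

    def wall_clock_time(E_p, tau, c=0.0, A_p=0.0):
        # Estimated wall clock minutes with p processors: E_p iterations, each costing
        # one batch of simulations (tau) plus communication (c) and serial work (A_p).
        return E_p * (tau + c + A_p)

    # Assumed example: 30 iterations of a 60-minute simulation with about 2 minutes
    # of serial surrogate/selection work per iteration -> 1860 minutes (31 hours).
    print(wall_clock_time(E_p=30, tau=60.0, A_p=2.0))   # 1860.0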

57 Test Function Schoen4-10. [Figure: Ep for the alternative parallel algorithms, run with no communication and with communication, and for MAPO; the best results are low values.]

58 Test Function Schoen4-10, average of 10 runs.
[Figure: Ep for each algorithm, with MAPO marked; some algorithms show no convergence at 80. The best results are low values.]

59 Test Function Hartman 6. [Figure: Ep for each algorithm, with MAPO marked; some algorithms show no convergence.]

60 Comparison on Four Test Problems
[Figure: Ep for each algorithm on each test problem, with cases of no convergence marked.] MAPO is the only method that converges for all runs on all 4 problems.

61 Discussion of Numerical Results: Robustness of MAPO
MAPO is the only algorithm that always finds a solution within 1% of the optimum for all test problems. MAPO was designed to be robust because it uses multiple algorithms.

62 Discussion of Numerical Results: Computational Speed of MAPO
MAPO gives the fastest results for large τ (i.e., the lowest Ep) of all the methods tried here on 8 processors, because it requires fewer function evaluations to find a good solution.

63 General Applications of MAPO
In these numerical examples we used four experts that were all RBF greedy methods that differed only in the functional form of the RBF.

64 Examples of Algorithms That Can Be Used in MAPO
FA-greedy search with radial basis functions (RBF): find the local minima of the RBF function approximation (FA) surface, using gradients of the FA.
Gutmann's RBF (2001) algorithm, which seeks to minimize the "bumpiness" of an FA that passes through a target minimum point.
Our own new FA algorithms, which combine a search for the minimum with selecting new points that improve the FA.
Kriging.
Neural nets.
Other splines for function approximation.

65 Combining Parallel Optimization and Parallel Simulation
Especially when combined with parallelized costly function evaluation, parallel optimization can lead to efficient use of a large number of processors.

66 Multiplicative Benefits of Parallelizing Optimization
Assume the simulation alone runs efficiently on S processors, and above S efficiency diminishes. Assume the optimization alone can run efficiently with M processors. Combining optimization and simulation permits efficient use of S × M processors. For example, if S = 20 and M = 8, then the problem can be efficiently solved on 160 processors.

67 Purpose of Generating More Candidate Points Than Processors
The MAPO algorithm uses multiple algorithms to generate a large number of candidate points. Only a subset of these candidate points is selected for actual costly function evaluation. The evaluation points are chosen from the candidate points based on a variety of criteria. In this way, the points selected for costly function evaluation have "competed" to be the ones most likely to help in finding the minimum and/or improving the function approximation.

68 Future Work Extend the current results to more processors to examine speed-ups. Incorporate more types of algorithm experts; a greater diversity of experts will probably generate even better results for MAPO. Apply the MAPO method to additional "real" problems that actually do require long computation times for each simulation.

69 Possible Additional Directions for Costly Function Optimization
Objective functions that have a more complex statistical structure: for example, do calibration assuming the parameters are described by probability distributions with correlations between parameters. The goal is to quantify the uncertainty associated with model predictions. (This is the topic of an expected new NSF project, done in collaboration with a statistician.)

70 General Summary Function approximation optimization, including the new algorithms, shows promise for optimization analysis of costly, nonlinear models with multiple local minima. MAPO shows promise as an efficient parallel algorithm for costly functions.

71

72 Criteria for Selecting Evaluation Points from Candidate Points
Criteria include the value of the current function approximations at the candidate point, the distance from other points evaluated, and disagreement among algorithm experts.

