Stochastic Trust Region Gradient-Free Method (STRONG): A Response-Surface-Based Algorithm in Stochastic Optimization via Simulation
Kuo-Hao Chang
Advisor: Hong Wan, School of Industrial Engineering, Purdue University
Acknowledgement: This project was partially supported by a grant from the Naval Postgraduate School.

2 Outline
- Background
- Problem Statement
- Literature Review
- STRONG
- Preliminary Numerical Evaluations
- Future Research

3 Background
- Stochastic Optimization: the minimization (or maximization) of a function in the presence of randomness.
- Optimization via Simulation: there is no explicit form of the objective function (only observations from simulation); function evaluations are stochastic and usually computationally expensive.
- Applications: investment portfolio optimization, production planning, traffic control, etc.

4 Problem Statement (I)
Consider the unconstrained continuous minimization problem
    min over x in R^p of g(x) = E[G(x, ω)]
The response can only be observed through G(x, ω) = g(x) + ε(x, ω), where
- ω: the randomness, defined on the underlying probability space
- ε(x, ω): the noise term, whose distribution may depend on x

5 Problem Statement (II)
Given: a simulation oracle capable of generating observations G(x, ω) such that the Strong Law of Large Numbers holds for every x.
Find: a local minimizer x*, i.e., find x* having a neighborhood N(x*) such that every x in N(x*) satisfies g(x) ≥ g(x*).
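To make this setting concrete, here is a minimal Python sketch of such a simulation oracle. The quadratic "true" response and the additive Gaussian noise are illustrative assumptions, not part of the thesis; the point is that only noisy observations G(x, ω) are available, and by the Strong Law of Large Numbers the sample mean of replications recovers g(x).

```python
import numpy as np

def simulation_oracle(x, rng, noise_sd=1.0):
    """Return one noisy observation G(x, w) = g(x) + eps(x, w).

    The quadratic g and Gaussian noise are illustrative assumptions;
    in practice g is unknown and each call runs a stochastic simulation.
    """
    g_x = float(np.sum((np.asarray(x) - 1.0) ** 2))   # hypothetical true response
    return g_x + rng.normal(scale=noise_sd)           # additive noise term

def estimate_response(x, n_reps, rng):
    """Sample mean of n_reps replications; SLLN: converges to g(x) as n_reps grows."""
    return float(np.mean([simulation_oracle(x, rng) for _ in range(n_reps)]))

rng = np.random.default_rng(0)
print(estimate_response([0.0, 0.0], n_reps=1000, rng=rng))  # approximately g([0,0]) = 2
```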

6 Problem Assumptions
For the underlying function g:
1. g is bounded below and twice differentiable for every x.
2.

7 Literature Review (Fu, 1994; Fu, 2002)
Methodology | Efficient | Convergent | Automated
Stochastic Approximation | Usually no | Yes | Requires human tuning
Sample-Path Optimization | | Usually yes |
Response Surface Methodology (RSM) | Yes | No | No
Other heuristic methods (e.g., genetic algorithm, tabu search) | Yes | Usually no | Yes

8 Proposed Work
- An RSM-based method with a convergence property (combining the trust-region method for deterministic optimization with RSM)
- Does not require human involvement
- Appropriate DOE to handle high-dimensional problems (ongoing work)

9 Response Surface Methodology
Stage I:
- Employ a proper experimental design
- Fit a first-order model
- Perform a line search
- Move to a better solution
Stage II (when close to the optimal solution):
- Employ a proper experimental design
- Fit a second-order model
- Find the optimal solution
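As a rough illustration of a Stage-I iteration, the sketch below runs a 2^k factorial design around the current point, fits a first-order model by ordinary least squares, and moves along the estimated steepest-descent direction. The fixed normalized step (in place of a full line search), the replication count, and the function name `stage_one_step` are assumptions for the sketch; any noisy oracle, such as the `simulation_oracle` sketched earlier, can be passed in.

```python
import numpy as np
from itertools import product

def stage_one_step(oracle, center, delta, n_reps, rng, step=0.5):
    """One illustrative RSM Stage-I iteration (not the thesis implementation):
    2^k factorial design around `center`, first-order OLS fit, then a move
    along the estimated steepest-descent direction."""
    center = np.asarray(center, dtype=float)
    k = center.size
    # 2^k factorial design points at center +/- delta in each coordinate
    design = np.array([center + delta * np.array(s) for s in product((-1.0, 1.0), repeat=k)])
    y = np.array([np.mean([oracle(p, rng) for _ in range(n_reps)]) for p in design])
    X = np.column_stack([np.ones(len(design)), design - center])  # intercept + linear terms
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)                  # OLS fit of first-order model
    grad_hat = beta[1:]                                           # estimated gradient
    # simplified fixed step along -grad (a line search would tune the step length)
    return center - step * grad_hat / (np.linalg.norm(grad_hat) + 1e-12)
```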

10 RSM (Montgomery, 2001)

11 Deterministic Trust Region Framework (Conn et al., 2000)
Suppose we want to minimize a deterministic objective function f(x).
Step 0: Given an initial point x_0, an initial trust-region radius Δ_0, and constants 0 < η_1 ≤ η_2 < 1 and 0 < γ_1 ≤ 1 ≤ γ_2; set k = 0.
Step 1: Compute a step s_k within the trust region that "sufficiently reduces" the local model m_k constructed by Taylor expansion (to second order) around x_k.
Step 2: Compute ρ_k = [f(x_k) − f(x_k + s_k)] / [m_k(x_k) − m_k(x_k + s_k)]; if ρ_k ≥ η_1 then define x_{k+1} = x_k + s_k; otherwise define x_{k+1} = x_k. Update the trust-region radius Δ_{k+1} accordingly.
Step 3: Increment k by 1 and go to Step 1.
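A minimal sketch of this generic deterministic framework is given below, assuming exact f, gradient, and Hessian are available (which is precisely what fails in the stochastic case). The Cauchy-point step and the particular constants are common textbook choices, not the thesis's settings.

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x0, delta0=1.0,
                          eta1=0.25, eta2=0.75, gamma1=0.5, gamma2=2.0,
                          max_iter=200, tol=1e-8):
    """Basic deterministic trust-region loop with a Cauchy-point step.
    Constants are illustrative textbook defaults."""
    x, delta = np.asarray(x0, dtype=float), float(delta0)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        # Cauchy point: minimize the quadratic model along -g within the region
        gHg = g @ H @ g
        tau = 1.0 if gHg <= 0 else min(1.0, np.linalg.norm(g) ** 3 / (delta * gHg))
        s = -tau * delta * g / np.linalg.norm(g)
        model_red = -(g @ s + 0.5 * s @ H @ s)       # reduction predicted by the model
        actual_red = f(x) - f(x + s)                  # reduction actually achieved
        rho = actual_red / model_red if model_red > 0 else -np.inf
        if rho >= eta1:                               # accept the step
            x = x + s
        # expand the region after very successful steps, shrink after failures
        delta = gamma2 * delta if rho >= eta2 else (gamma1 * delta if rho < eta1 else delta)
    return x
```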

12 Trust Region Method

13 Comparison between RSM and TR
Similarity:
- Both build a local model to approximate the response function and use it to generate the search direction.
Differences:
TR
- Developed for deterministic optimization and has a nice convergence property.
- Cannot handle the stochastic case; requires an explicit objective function, gradient, and Hessian matrix.
RSM
- Can handle the stochastic case and has well-studied DOE techniques.
- Human involvement is required; no convergence property.
Idea: combine these two methods.

14 STRONG: Stochastic TRust RegiON Gradient-Free Method
- "Gradient-free": no direct gradient measurements; rather, the algorithm is based on an approximation to the gradient (Spall, 2003; Fu, 2005).
- Combines RSM and TR.
- Consists of two algorithms:
  - Main algorithm: approaches the optimal solution (major framework).
  - Sub-algorithm: obtains a satisfactory solution within the trust region.

15 Stochastic Trust Region
Use a "response surface" model to replace the Taylor expansion:
- Deterministic model: m_k(x) = f(x_k) + ∇f(x_k)ᵀ(x − x_k) + ½ (x − x_k)ᵀ ∇²f(x_k) (x − x_k)
- Stochastic model: r_k(x) = β̂_0 + β̂ᵀ(x − x_k) + ½ (x − x_k)ᵀ B̂ (x − x_k)
- k: iteration counter
- Use the estimated response ĝ(x_k) to replace f(x_k).

16 STRONG: Main Algorithm

17 Trust Region and Sampling Region
- Trust region B(x_k; Δ_k), where Δ_k is the radius of the trust region at iteration k.
- Sampling region B(x_k; Δ̃_k), where Δ̃_k is the radius of the sampling region at iteration k.
- The initial radii Δ_0 and Δ̃_0 are determined by the user in the initialization stage; afterwards both shrink/expand by the same ratio automatically.

18 Select an Appropriate DOE
- Used for constructing the first-order and second-order models in Stage I and Stage II.
- Currently requires orthogonality of the design for the second-order model to guarantee consistency of the gradient estimator.
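For reference, here is a small sketch of generating the two designs used later in the numerical evaluation: a 2^k full factorial design for the first-order model and a central composite design (CCD) for the second-order model. The coded (+/-1) units and the rotatable axial distance alpha = (2^k)^(1/4) are standard conventions assumed for illustration.

```python
import numpy as np
from itertools import product

def full_factorial(k):
    """2^k full factorial design in coded units (+/-1), one row per design point."""
    return np.array(list(product((-1.0, 1.0), repeat=k)))

def central_composite(k, alpha=None):
    """Central composite design: factorial + axial + center points (coded units).
    alpha defaults to the rotatable choice (2^k)^(1/4), an illustrative convention."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    factorial = full_factorial(k)
    axial = np.vstack([a * np.eye(k) for a in (alpha, -alpha)])  # +/- alpha on each axis
    center = np.zeros((1, k))
    return np.vstack([factorial, axial, center])

print(central_composite(2))  # 4 factorial + 4 axial + 1 center = 9 design points
```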

19 Estimation Method in STRONG
Given an appropriate design strategy and an initial sample size n_k for the center point:
- Intercept estimator: β̂_0 = (1/n_k) Σ_{i=1}^{n_k} G(x_k, ω_i), where G(x_k, ω_i) represents the i-th observation at the center point x_k and n_k is determined by the algorithm.
- Gradient and Hessian estimators: suppose we have n design points x_1, …, x_n with response values y_1, …, y_n, respectively. Let X be the design matrix; then the least-squares estimator is β̂ = (XᵀX)⁻¹ Xᵀ y.
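The sketch below illustrates this regression step for the second-order model: the intercept, gradient, and Hessian estimates are read off an ordinary least-squares fit to the design-point responses. The column layout of the design matrix (intercept, linear, pure quadratic, cross terms) and the function name `fit_second_order_model` are assumptions for the sketch.

```python
import numpy as np

def fit_second_order_model(points, y, center):
    """OLS fit of y ~ beta0 + b^T d + 0.5 d^T B d, with d = x - center.
    Returns (beta0_hat, grad_hat, hess_hat)."""
    D = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    n, k = D.shape
    cols = [np.ones(n)]                                            # intercept column
    cols += [D[:, i] for i in range(k)]                            # linear terms
    cols += [0.5 * D[:, i] ** 2 for i in range(k)]                 # pure quadratic terms
    cols += [D[:, i] * D[:, j] for i in range(k) for j in range(i + 1, k)]  # cross terms
    X = np.column_stack(cols)                                      # design matrix
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    beta0, grad = beta[0], beta[1:1 + k]
    H = np.zeros((k, k))
    H[np.diag_indices(k)] = beta[1 + k:1 + 2 * k]                  # diagonal of Hessian
    idx = 1 + 2 * k
    for i in range(k):
        for j in range(i + 1, k):                                  # symmetric off-diagonals
            H[i, j] = H[j, i] = beta[idx]
            idx += 1
    return beta0, grad, H
```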

20 Decide the Moving Direction and Step
- Definition (Subproblem): minimize the local model r_k(x) within the trust region to obtain the candidate solution.
- Determine whether the new iterate solution is accepted: if the ratio falls below the threshold, the solution is rejected; otherwise the solution is accepted.
- Definition (Reduction Test): used in Stage I.
- Definition (Sufficient Reduction Test): used in Stage II.
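A minimal sketch of the acceptance decision follows. The threshold constant `eta0` and the exact form of the ratio are assumptions; the key difference from the deterministic framework is that estimated responses replace exact function values on the numerator.

```python
def ratio_test(g_hat_center, g_hat_candidate, model_center, model_candidate, eta0=0.01):
    """Illustrative acceptance test: rho compares the estimated (observed)
    reduction with the reduction predicted by the local model.
    eta0 is an assumed threshold constant."""
    predicted = model_center - model_candidate     # reduction promised by the model
    observed = g_hat_center - g_hat_candidate      # reduction seen in estimated responses
    rho = observed / predicted if predicted > 0 else float("-inf")
    return rho >= eta0, rho
```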

21 Three situations in which we cannot find a satisfactory solution
- The local approximation model is poor.
- The step size is too large.
- Sampling error in the observed responses at the center point and the candidate solution.

22 Solutions
- Shrink the trust region and the sampling region.
- Increase the number of replications at the center point.
- Add more design points.
- Collect all the visited solutions within the trust region and increase the number of replications for each of them.

23 STRONG: Sub-algorithm (Trust Region)

24 Sub-algorithm (Sampling Region)

25 STRONG: Sub-algorithm

26 Implementation Issues
- Initial solution
- Scaling problems
- Experimental designs
- Variance reduction techniques
- Timing of the "sufficient reduction" test
- Stopping rules

27 Advantages of STRONG
- Allows unequal variances.
- Has the potential to solve high-dimensional problems with efficient DOE.
- Fully automated.
- Local convergence property.

28 Limitations of STRONG
- Computationally intensive if the problem is large-scale.
- Slow convergence if the variables are ill-scaled.

29 Preliminary Numerical Evaluation (I)
- Rosenbrock test function.
- The minimizer is located at (1, 1), where the minimal objective value is 0.
- Full factorial design for Stage I and central composite design for Stage II.
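A sketch of this test problem as a simulation oracle is shown below; the noise variance of 10 matches the cases reported on the next slides, while the additive Gaussian form of the noise is an assumption for illustration.

```python
import numpy as np

def noisy_rosenbrock(x, rng, noise_var=10.0):
    """Rosenbrock function observed with additive noise; minimum g(1, 1) = 0."""
    x1, x2 = float(x[0]), float(x[1])
    g = 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
    return g + rng.normal(scale=np.sqrt(noise_var))

rng = np.random.default_rng(0)
# averaging many replications at the optimum recovers a value close to 0
print(np.mean([noisy_rosenbrock([1.0, 1.0], rng) for _ in range(10000)]))
```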

30 The Performance of STRONG (Case 1)
- Initial solution: (30, -30)
- Variance of noise: 10
- Sample size for each design point: 2
(Results table: # of observations)

31 The Performance of FDSA (Case 2)
- Initial solution: (30, -30)
- Variance of noise: 10
- Parameter bounds: (-100, 100)
(Results table: number of observations; the FDSA runs diverged)

32 The Performance of FDSA with a Good Starting Solution (Case 3)
- Initial solution: (3, 3)
- Variance of noise: 10
- Parameter bounds: (0, 5)
(Results table: # of observations)

33 Future Research
- Large-scale problems: design of experiments, variance reduction techniques.
- Testing on practical problems.
- Ill-scaled problems: iteratively adapting the shape of the trust region.

34 Thanks! Questions?

35 Trust Region and Line Search

36 Hypothesis Testing Scheme
- H0: the candidate solution cannot yield sufficient reduction.
- H1: the candidate solution can yield sufficient reduction.
- The Type I error of the test is required to satisfy a prescribed condition.
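To illustrate the idea, the sketch below tests for sufficient reduction with a one-sided Welch-type t-test on the replications at the center and candidate points. The Welch form, the conservative degrees of freedom, and the alpha level are assumptions for this sketch rather than the thesis's exact test.

```python
import numpy as np
from scipy import stats

def sufficient_reduction_test(obs_center, obs_candidate, threshold=0.0, alpha=0.05):
    """Illustrative one-sided test of
       H0: g(center) - g(candidate) <= threshold  (no sufficient reduction)
       H1: g(center) - g(candidate) >  threshold  (sufficient reduction)."""
    a = np.asarray(obs_center, dtype=float)
    b = np.asarray(obs_candidate, dtype=float)
    diff = a.mean() - b.mean() - threshold
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))  # Welch standard error
    t_stat = diff / se
    dof = min(len(a), len(b)) - 1                                  # conservative choice
    p_value = 1.0 - stats.t.cdf(t_stat, dof)
    return p_value < alpha, p_value
```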

37 Relevant Definitions in the Sub-algorithm
- Reject-solution set: R_k denotes the reject-solution set, which collects all the solutions visited up to iteration k of the sub-algorithm.
- Simulation Allocation Rule (SAR) (Hong and Nelson, 2006): the SAR guarantees that additional observations are allocated to x at iteration k if x is a newly visited solution at that iteration, and that the cumulative sample size grows without bound for all visited solutions.

38 Features of the Sub-algorithm
- The trust region and the sampling region keep shrinking.
- The sample size for the center point is increasing.
- Design points are accumulated.
Intuitive explanation:
- The quality of the local model keeps improving.
- The algorithm becomes more conservative in its optimization step size.
- The sampling error for each visited point in the reject-solution set is reduced.

39 Significant Theorems in STRONG
- (Theorem 3.2.3, Corollary 3) In the sub-algorithm, if the current iterate is not a stationary point, the sub-algorithm terminates in a finite number of iterations.
- (Theorem 3.2.4) For any initial point, the algorithm generates a sequence of iterates that converges to a stationary point almost surely.

40 Some Problems with TR When Applied in the Stochastic Case
- TR was developed for deterministic optimization, where f(x_k), ∇f(x_k), and ∇²f(x_k) are available.
- Bias in the intercept and gradient estimates.
- The ratio ρ_k has an inconsistent basis of comparison.
- Notice: in general, the estimated function values do not equal the true values.

41 General Properties of the Algorithm
1. (a.s.)
2. (a.s.)
3. If the current iterate is not stationary, the algorithm eventually moves away from it; therefore the algorithm will not get stuck at a nonstationary point.

42 Algorithm Assumptions
For the local approximation model:

43 Literature Review (I)
Stochastic Approximation
- Robbins-Monro (1951) algorithm: gradient-based.
- Kiefer-Wolfowitz (1952) algorithm: uses the finite-difference method as the gradient estimate.
- The basic recursion of stochastic approximation is x_{k+1} = x_k − a_k ĝ_k(x_k), where ĝ_k is the finite-difference gradient estimate.
- Strength: converges under proper conditions.
- Weaknesses: the gain sequence needs to be tuned manually; suffers from slow convergence in some cases (Andradóttir, 1998); when the objective function grows faster than quadratically, it can fail to converge (Andradóttir, 1998).
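For comparison with the FDSA results reported earlier, here is a minimal sketch of the Kiefer-Wolfowitz recursion with central finite differences. The gain-sequence constants below are common textbook choices (not the ones used in the experiments) and, as the slide notes, typically need manual tuning.

```python
import numpy as np

def fdsa_minimize(oracle, x0, rng, n_iter=1000, a=0.1, c=0.5,
                  alpha=0.602, gamma=0.101, bounds=None):
    """Kiefer-Wolfowitz stochastic approximation with central finite differences.
    Gain-sequence constants are illustrative and usually require manual tuning."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        ak, ck = a / k ** alpha, c / k ** gamma        # decaying gain sequences
        g_hat = np.zeros_like(x)
        for i in range(x.size):                        # one coordinate at a time
            e = np.zeros_like(x)
            e[i] = ck
            g_hat[i] = (oracle(x + e, rng) - oracle(x - e, rng)) / (2 * ck)
        x = x - ak * g_hat                             # gradient-descent-type update
        if bounds is not None:                         # optional projection onto a box
            x = np.clip(x, bounds[0], bounds[1])
    return x
```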

44 Literature Review (II)
Response Surface Methodology
- Proposed by Box and Wilson (1951).
- A sequential experimental procedure to determine the best input combination so as to maximize the output or yield.
- Strengths: a general procedure; powerful statistical tools such as design of experiments, regression analysis, and ANOVA are all at its disposal (Fu, 1994).
- Weaknesses: no convergence guarantee; human involvement is needed.

45 Literature Review (III)
Other heuristic methods
- Genetic algorithms
- Evolutionary strategies
- Simulated annealing
- Tabu search
- Nelder and Mead's simplex search
Strength: can "usually" obtain a satisfactory solution.
Weakness: no general convergence theory.

46 Literature Review (Fu, 1994; Fu, 2002)
Method | Strengths | Weaknesses
Stochastic Approximation | Various gradient estimation methods; converges under proper conditions | Converges slowly when the objective function is flat; fails to converge when the objective function is steep; sometimes the gain sequence must be tuned manually; uses only gradient information
Sample-Path Optimization | Easy to extend to situations where the objective function cannot be evaluated analytically | Needs excessive function evaluations
Response Surface Methodology (RSM) | Systematic and sequential procedure; efficient and effective; well-studied statistical tools as back-up | No convergence guarantee; not automated
Heuristic Methods | Usually obtain a satisfactory solution; simple and efficient | No general convergence guarantee