CHAPTER 17: OPTIMAL DESIGN FOR EXPERIMENTAL INPUTS
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
Organization of chapter in ISSO:
–Background (motivation; finite-sample and asymptotic (continuous) designs; precision matrix and D-optimality)
–Linear models (connections to D-optimality; key equivalence theorem)
–Response surface methods
–Nonlinear models

17-2 Optimal Design in Simulation
Two roles for experimental design in simulation:
–Building an approximation to an existing large-scale simulation via a "metamodel"
–Building the simulation model itself
Metamodels are "curve fits" that approximate simulation input/output
–Usual form is a low-order polynomial in the inputs, linear in the parameters θ
–Linear design theory useful
Building the simulation model
–Typically needs nonlinear design theory
Some terminology distinctions:
–"Factors" (statistics term) correspond to "inputs" (modeling and simulation term)
–"Levels" correspond to "values"
–"Treatments" correspond to "runs"
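
A minimal sketch (Python; the run_simulation function and its coefficient values are assumptions for illustration, not from ISSO) of a metamodel: a quadratic polynomial fit by least squares to simulation input/output pairs. The model is nonlinear in the input x but linear in the parameters θ, which is why linear design theory applies.

    import numpy as np

    rng = np.random.default_rng(0)

    def run_simulation(x):
        # Stand-in for an expensive stochastic simulation (assumption for illustration)
        return 3.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.2)

    # Inputs ("factors"/"levels" in statistics terms) at which the simulation is run
    x_runs = np.linspace(-2.0, 2.0, 9)
    z_runs = np.array([run_simulation(x) for x in x_runs])

    # Quadratic metamodel z ~ theta0 + theta1*x + theta2*x^2: nonlinear in x but
    # linear in the parameters theta, so linear (least-squares) design theory applies
    X = np.column_stack([np.ones_like(x_runs), x_runs, x_runs**2])
    theta_hat, *_ = np.linalg.lstsq(X, z_runs, rcond=None)
    print(theta_hat)   # should be near the underlying coefficients (3, -2, 0.5)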

17-3 Unique Advantages of Design in Simulation
Simulation experiments may be considered a special case of general experiments, but some unique benefits occur due to the simulation structure:
–Can control factors that are not generally controllable (e.g., arrival rates into a network)
–Direct repeatability due to the deterministic nature of random number generators; variance reduction (CRNs, etc.) may be helpful
–Not necessary to randomize runs to avoid systematic variation due to inherent conditions (e.g., randomizing run order and input levels in a biological experiment to reduce the effect of changing ambient humidity in the laboratory); in simulation, such systematic effects can be eliminated since the analyst controls "nature"

17-4 Design of Computer Experiments in Statistics
There is significant activity among statisticians on experimental design for computer experiments:
–T. J. Santner et al. (2003), The Design and Analysis of Computer Experiments, Springer-Verlag
–J. Sacks et al. (1989), "Design and Analysis of Computer Experiments (with discussion)," Statistical Science, 409–435
–Etc.
The above statistical work differs from experimental design with Monte Carlo simulations:
–It assumes deterministic function evaluations via computer (e.g., solution to a complicated ODE)
–One implication of deterministic function evaluations: no need to replicate experiments for a given set of inputs
–This contrasts with Monte Carlo, where replication provides variance reduction

17-5 General Optimal Design Formulation (Simulation or Non-Simulation)
Assume the model z = h(θ, x) + v, where x is an input we are trying to pick optimally
An experimental design ξ consists of N specific input values x = χ_i and the proportions (weights) w_i assigned to these input values
A finite-sample design allocates the n ≥ N available measurements exactly; an asymptotic (continuous) design allocates proportions based on n → ∞

17-6 D-Optimal Criterion
Picking the optimal design ξ requires a criterion for optimization
The most popular criterion is the D-optimal measure
Let M(θ, ξ) denote the "precision matrix" for an estimate of θ based on a design ξ
–M(θ, ξ) is the inverse of the covariance matrix for the estimate and/or
–M(θ, ξ) is the Fisher information matrix for the estimate
The D-optimal solution is the design ξ that maximizes the determinant det M(θ, ξ)
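
A minimal numerical sketch (Python, not from ISSO) of this criterion for a linear model z = h(x)^T θ + v: under an asymptotic design ξ with points χ_i and weights w_i, the precision matrix is proportional to M(ξ) = Σ_i w_i h(χ_i) h(χ_i)^T, and the D-optimal design maximizes det M. The quadratic regressor and the two candidate designs below are illustrative assumptions.

    import numpy as np

    def h(x):
        # Regressor vector for a quadratic model in one input: z = theta0 + theta1*x + theta2*x^2 + v
        return np.array([1.0, x, x**2])

    def precision_matrix(points, weights):
        # M(xi) = sum_i w_i h(chi_i) h(chi_i)^T  (proportional to Fisher information for constant noise variance)
        M = np.zeros((3, 3))
        for x, w in zip(points, weights):
            hx = h(x)
            M += w * np.outer(hx, hx)
        return M

    # Two candidate asymptotic designs on the interval [-1, 1]
    design_a = ([-1.0, 0.0, 1.0], [1/3, 1/3, 1/3])   # equal weight at -1, 0, 1
    design_b = ([-1.0, -0.5, 0.5, 1.0], [0.25] * 4)  # equal weight at four points

    for name, (pts, wts) in [("A", design_a), ("B", design_b)]:
        print(name, np.linalg.det(precision_matrix(pts, wts)))
    # The design with the larger determinant is preferred under the D-optimal criterion.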

17-7 Equivalence Theorem
Consider the linear model z = h(x)^T θ + v
The prediction based on the parameter estimate θ̂ and a "future" measurement vector h = h(x) is ẑ = h^T θ̂
The Kiefer-Wolfowitz equivalence theorem states: the D-optimal solution for determining the ξ used in forming θ̂ is the same ξ that minimizes the maximum (over x) variance of the predictor ẑ
Useful in the practical determination of the optimal ξ
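
A sketch of the equivalence (reusing the illustrative quadratic model and design from the previous sketch; these are assumptions, not from ISSO): for an asymptotic design the standardized prediction variance is d(x, ξ) = h(x)^T M(ξ)^{-1} h(x), and the Kiefer-Wolfowitz theorem implies that a D-optimal ξ makes max over x of d(x, ξ) equal to the number of parameters p (here p = 3).

    import numpy as np

    def h(x):
        return np.array([1.0, x, x**2])

    def prediction_variance(x, M_inv):
        # Standardized variance of the predicted response at input x: d(x, xi) = h(x)^T M(xi)^{-1} h(x)
        hx = h(x)
        return hx @ M_inv @ hx

    points, weights = [-1.0, 0.0, 1.0], [1/3, 1/3, 1/3]
    M = sum(w * np.outer(h(x), h(x)) for x, w in zip(points, weights))
    M_inv = np.linalg.inv(M)

    grid = np.linspace(-1, 1, 201)
    d = [prediction_variance(x, M_inv) for x in grid]
    print(max(d))   # approximately 3 = number of parameters, consistent with Kiefer-Wolfowitz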

17-8 Variance Function as it Depends on Input: Optimal Asymptotic Design for Example 17.6 in ISSO

17-9 Orthogonal Designs
With linear models, usually more than one solution is D-optimal
Orthogonality is a means of reducing the number of solutions
Orthogonality also introduces desirable secondary properties:
–Separates the effects of input factors (avoids "aliasing")
–Makes estimates for the elements of θ uncorrelated
Orthogonal designs are not generally D-optimal, and D-optimal designs are not generally orthogonal
–However, some designs are both
Classical factorial ("cubic") designs are orthogonal (and often D-optimal)
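
A small illustrative sketch (Python) of the orthogonality property for a classical 2^r factorial ("cube") design with coded levels ±1 and a main-effects regression model: the design matrix has orthogonal columns, so X^T X is diagonal and the least-squares estimates of the elements of θ are uncorrelated (given independent, constant-variance noise).

    import numpy as np
    from itertools import product

    r = 3
    # 2^r factorial ("cube") design: all combinations of coded levels -1, +1
    cube = np.array(list(product([-1.0, 1.0], repeat=r)))

    # Design matrix for a main-effects model: intercept column plus the r factor columns
    X = np.column_stack([np.ones(len(cube)), cube])

    # Orthogonality: X^T X is diagonal, so the least-squares estimates of the
    # elements of theta are uncorrelated (with independent, constant-variance noise)
    print(X.T @ X)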

17-10 Example Orthogonal Designs, r = 2 Factors
[Figure: cube (2^r) design and star (2r) design, plotted against inputs x_k1 and x_k2]

17-11 Example Orthogonal Designs, r = 3 Factors
[Figure: cube (2^r) design and star (2r) design, plotted against inputs x_k1, x_k2, and x_k3]

17-12 Response Surface Methodology (RSM)
Suppose we want to determine the inputs x that minimize the mean response E(z) of some process
–There are also other (nonoptimization) uses for RSM
RSM can be used to build local models with the aim of finding the optimal x
–Based on building a sequence of local models as one moves through the factor (x) space
Each response surface is typically a simple regression polynomial
Experimental design can be used to determine the input values for building the response surfaces

17-13 Steps of RSM for Optimizing x
Step 0 (Initialization): Make an initial guess at the optimal value of x.
Step 1 (Collect data): Collect responses z from several x values in a neighborhood of the current estimate of the best x value (can use experimental design).
Step 2 (Fit model): From the (x, z) pairs in step 1, fit a regression model in the region around the current best estimate of the optimal x.
Step 3 (Identify steepest descent path): Based on the response surface in step 2, estimate the path of steepest descent in factor space.
Step 4 (Follow steepest descent path): Perform a series of experiments at x values along the path of steepest descent until no additional improvement in the z response is obtained. This x value represents the new estimate of the best vector of factor levels.
Step 5 (Stop or return): Go to step 1 and repeat the process until the final best factor level is obtained.
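
A minimal sketch of one RSM cycle (steps 1-4) under simplifying assumptions: a hypothetical noisy response simulate(x) to be minimized, a 2^2 factorial design around the current point, a first-order regression model, and a fixed step size along the steepest-descent path. This is an illustration of the steps above, not the specific procedure in ISSO.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(x):
        # Hypothetical noisy response to be minimized (assumption, not from ISSO)
        return (x[0] - 2.0)**2 + (x[1] + 1.0)**2 + rng.normal(scale=0.1)

    def rsm_iteration(x_center, radius=0.5, step=0.25, max_steps=20):
        # Step 1: collect responses at a 2^2 factorial design around the current center
        design = x_center + radius * np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
        z = np.array([simulate(x) for x in design])

        # Step 2: fit a first-order regression model z ~ b0 + b^T x by least squares
        X = np.column_stack([np.ones(len(design)), design])
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        b = coef[1:]

        # Step 3: the steepest-descent direction is -b (normalized)
        direction = -b / np.linalg.norm(b)

        # Step 4: move along the path until the response stops improving
        best_x, best_z = x_center, simulate(x_center)
        for k in range(1, max_steps + 1):
            x_try = x_center + k * step * direction
            z_try = simulate(x_try)
            if z_try >= best_z:
                break
            best_x, best_z = x_try, z_try
        return best_x

    x = np.array([0.0, 0.0])
    for _ in range(5):          # Step 5: repeat from the new center
        x = rsm_iteration(x)
    print(x)                    # should move toward the minimizer near (2, -1)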

17-14 Conceptual Illustration of RSM for Two Variables in x; Shows More Refined Experimental Design Near Solution
Adapted from Montgomery (2001), Design and Analysis of Experiments, Fig. 11-3

17-15 Nonlinear Design
Assume the model z = h(θ, x) + v, where θ enters nonlinearly
D-optimality remains the dominant measure
–Maximize the determinant of the Fisher information matrix F_n(θ, x), the Fisher information matrix based on n data points (from Chapter 13 of ISSO)
The fundamental distinction from the linear case is that the D-optimal criterion depends on θ
Leads to a conundrum: we choose x to best estimate θ, yet we need to know θ to determine x
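
A sketch of why the criterion depends on θ, assuming an illustrative nonlinear model h(θ, x) = θ_0 exp(−θ_1 x) with additive Gaussian noise (an assumption for this sketch, not from ISSO): the Fisher information F_n(θ, x) = (1/σ²) Σ_k g_k g_k^T with g_k = ∂h(θ, x_k)/∂θ involves θ itself, so det F_n changes as θ changes.

    import numpy as np

    def grad_h(theta, x):
        # Gradient of h(theta, x) = theta0 * exp(-theta1 * x) with respect to theta
        a, b = theta
        return np.array([np.exp(-b * x), -a * x * np.exp(-b * x)])

    def fisher_information(theta, inputs, sigma2=1.0):
        # F_n(theta, x) = (1/sigma^2) * sum_k g_k g_k^T, with g_k = dh(theta, x_k)/dtheta
        F = np.zeros((2, 2))
        for x in inputs:
            g = grad_h(theta, x)
            F += np.outer(g, g) / sigma2
        return F

    inputs = [0.5, 1.0, 2.0]
    for theta in [np.array([1.0, 0.5]), np.array([1.0, 2.0])]:
        print(theta, np.linalg.det(fisher_information(theta, inputs)))
    # The determinant (the D-optimal criterion) changes with theta,
    # so the best inputs cannot be chosen without some knowledge of theta.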

17-16 Strategies for Coping with Dependence on θ
–Assume a nominal value of θ and develop an optimal design based on this fixed value
–Use a sequential design strategy based on an iterated design and model-fitting process
–Use a Bayesian strategy in which a prior distribution is assigned to θ, reflecting uncertainty in the knowledge of the true value of θ

17-17 Sequential Approach for Parameter Estimation and Optimal Design
Step 0 (Initialization): Make an initial guess at θ and allocate n_0 measurements to an initial design. Set k = 0 and n = n_0.
Step 1 (D-optimal maximization): Given the inputs X_n used so far, choose the n_k new inputs to maximize the determinant of the Fisher information matrix evaluated at the current estimate of θ.
Step 2 (Update θ estimate): Collect n_k measurements based on the inputs from step 1. Use the measurements to update the estimate of θ.
Step 3 (Stop or return): Stop if the value of θ in step 2 is satisfactory. Else return to step 1 with the new k set to the former k + 1 and the new n set to the former n + n_k (the updated X_n now includes the inputs from step 1).
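
A minimal sketch of the sequential loop, reusing the illustrative exponential-decay model above and making further assumptions not in ISSO: a greedy one-input-at-a-time D-optimal choice over a candidate grid (n_k = 1) and a nonlinear least-squares refit of the θ estimate via scipy.optimize.curve_fit at each step.

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(1)
    theta_true = np.array([1.0, 0.8])

    def h(x, a, b):
        # Illustrative nonlinear response model (assumption, not from ISSO)
        return a * np.exp(-b * x)

    def grad_h(theta, x):
        a, b = theta
        return np.array([np.exp(-b * x), -a * x * np.exp(-b * x)])

    def fisher_det(theta, inputs):
        F = sum(np.outer(grad_h(theta, x), grad_h(theta, x)) for x in inputs)
        return np.linalg.det(F)

    candidates = np.linspace(0.1, 5.0, 50)
    theta_hat = np.array([0.5, 0.5])           # step 0: initial guess at theta
    X = [0.5, 2.0]                             # step 0: initial design (n0 = 2 inputs)
    Z = [h(x, *theta_true) + rng.normal(scale=0.05) for x in X]

    for k in range(10):
        # Step 1: greedily add the candidate input that maximizes det F(theta_hat, X + [x])
        x_next = max(candidates, key=lambda x: fisher_det(theta_hat, X + [x]))
        # Step 2: collect the new measurement and refit theta_hat
        X.append(x_next)
        Z.append(h(x_next, *theta_true) + rng.normal(scale=0.05))
        theta_hat, _ = curve_fit(h, np.array(X), np.array(Z), p0=theta_hat)
        # Step 3: in practice, stop when theta_hat is satisfactory; here we simply iterate

    print(theta_hat)   # should approach theta_true = [1.0, 0.8]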

17-18 Comments on Sequential Design
Note that two optimization problems are being solved: one for the estimate of θ and one for the design (choice of inputs x)
The next n_k input values (step 1) are determined conditioned on the current estimate of θ
–Each step is analogous to a nonlinear design with a fixed (nominal) value of θ
"Full sequential" mode (n_k = 1) updates the estimate of θ based on each new input–output pair (x_k, z_k)
Can use stochastic approximation to update the estimate of θ as each new pair arrives

17-19 Bayesian Design Strategy
Assume a prior distribution (density) p(θ) for θ, reflecting uncertainty in the knowledge of the true value of θ
There exist multiple versions of the D-optimal criterion
One possible D-optimal criterion: choose the inputs x to maximize ∫ log det[F_n(θ, x)] p(θ) dθ
The above criterion is related to Shannon information
While the log transform makes no difference with fixed θ, it does affect the integral-based solution
To simplify the integral, it may be useful to choose a discrete prior p(θ)
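
A sketch of the integral-based criterion with a discrete (two-point) prior, again using the illustrative exponential-decay model from the earlier sketches; the criterion implemented is the expected value of log det F_n(θ, x) under p(θ), with the best three-point design found by exhaustive search over a small candidate grid. The model, prior values, and grid are assumptions for illustration.

    import numpy as np
    from itertools import combinations

    def grad_h(theta, x):
        # Gradient of h(theta, x) = theta0 * exp(-theta1 * x) with respect to theta (illustrative model)
        a, b = theta
        return np.array([np.exp(-b * x), -a * x * np.exp(-b * x)])

    def log_det_fisher(theta, inputs):
        F = sum(np.outer(grad_h(theta, x), grad_h(theta, x)) for x in inputs)
        return np.log(np.linalg.det(F))

    def bayes_criterion(inputs, prior):
        # Discrete-prior version of the integral of log det F_n(theta, x) * p(theta) d(theta)
        return sum(p * log_det_fisher(theta, inputs) for theta, p in prior)

    prior = [(np.array([1.0, 0.5]), 0.5), (np.array([1.0, 2.0]), 0.5)]  # two-point prior on theta

    # Pick the best 3-point design from a small candidate grid by exhaustive search
    candidates = np.linspace(0.2, 4.0, 20)
    best = max(combinations(candidates, 3), key=lambda d: bayes_criterion(list(d), prior))
    print(best)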