Computational Stochastic Optimization: Modeling


Computational Stochastic Optimization: Modeling
October 25, 2012
Warren Powell, CASTLE Laboratory, Princeton University
http://www.castlelab.princeton.edu
© 2012 Warren B. Powell, Princeton University

Outline
- Overview and major problem classes
- How to model a sequential decision problem
- Steps in the modeling process
- Examples (under development)
© 2012 Warren B. Powell

Problem classes: Where to send a plane
- Action: where to send the plane to accomplish a goal.
- Noise: demands on the system, equipment failures.
© 2012 Warren B. Powell

Problem classes: How to land a plane
- Control: angle, velocity, acceleration, pitch, yaw, …
- Noise: wind, measurement error.
© 2012 Warren B. Powell

Problem classes: How to manage a fleet of planes
- Decision: which plane to assign to each customer.
- Noise: demands on the system, equipment failures.
© 2012 Warren B. Powell

Problem classes
These three problems illustrate three very different applications:
- Managing a single entity, which can be represented with a discrete action, typical of computer science.
- Controlling a piece of machinery, which we model with a multidimensional (but low-dimensional) control vector.
- Managing large fleets of vehicles with high-dimensional vectors (but exploiting convexity).
All three can be "modeled" using Bellman's equation. Mathematically they look the same, but computationally they are very different.
© 2012 Warren B. Powell

Problem classes
Dimensions of our problem:
- Decisions
  - Discrete actions
  - Multidimensional controls (without convexity)
  - High-dimensional vectors (with convexity)
- Information stages
  - Single, deterministic decisions (or parameters), after which random information is revealed to compute the cost.
  - Two-stage with recourse: make decision, see information, make one more decision.
  - Fully sequential (multistage): decision, information, decision, information, decision, …
- The objective function
  - Min/max expectation
  - Dynamic risk measures
  - Robust optimization
© 2012 Warren B. Powell

Problem classes
Our presentation focuses on sequential (also known as multistage) control problems: problems that involve sequences of decision, information, decision, information, …
There are important applications in stochastic optimization which belong to the first two classes of problems:
- Decision/information
- Decision/information/decision
We will also focus on problems which use an expectation for the objective function. There are many problems where risk is a major issue; we take the position that the objective function is part of the model.
© 2012 Warren B. Powell

Deterministic modeling
For deterministic problems, we speak the language of mathematical programming, both for static problems and for time-staged problems. Arguably Dantzig's biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.
© 2012 Warren B. Powell
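The equations on the original slide are not reproduced in this transcript. The standard forms meant here are along the following lines, a sketch using generic cost vectors and constraint matrices (the symbols $c$, $A$, $b$ and their time-indexed versions are placeholders):

```latex
% Static problem: a single linear program
\[
  \min_{x}\; c^{\top} x
  \quad \text{s.t.} \quad A x = b, \;\; x \ge 0.
\]

% Time-staged problem: decision vectors x_t linked over t = 0, ..., T
\[
  \min_{x_0, \ldots, x_T}\; \sum_{t=0}^{T} c_t^{\top} x_t
  \quad \text{s.t.} \quad A_t x_t - B_{t-1} x_{t-1} = b_t, \;\; x_t \ge 0.
\]
```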

Modeling as a Markov decision process
For stochastic problems, many people model the problem using Bellman's equation. This is the canonical form of a dynamic program building on Bellman's seminal research: simple, elegant, and widely used, but difficult to scale to realistic problems.
© 2012 Warren B. Powell
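The equation itself is not captured in the transcript; the standard discounted form, in the notation used later in the talk (state $S_t$, action $a$, cost $C$, discount factor $\gamma$), is:

```latex
\[
  V_t(S_t) = \min_{a \in \mathcal{A}}
  \Big( C(S_t, a) + \gamma \sum_{s'} \mathbb{P}\big(S_{t+1} = s' \mid S_t, a\big)\, V_{t+1}(s') \Big).
\]
```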

Modeling as a stochastic program
A third strategy is to use the vocabulary of "stochastic programming." For "two-stage" stochastic programs (decisions/information, or decisions/information/decisions), this can be written in a generic form.
© 2012 Warren B. Powell
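The generic form on the slide is not captured in the transcript; the classical two-stage formulation, written with placeholder data ($c_0$, $c_1(\omega)$, $A_1(\omega)$, $B_0$, $b_1(\omega)$ are generic), nests a recourse problem $Q$ inside the first-stage problem:

```latex
\[
  \min_{x_0}\; c_0^{\top} x_0 + \mathbb{E}\big[ Q(x_0, \omega) \big],
  \qquad \text{where} \qquad
  Q(x_0, \omega) = \min_{x_1(\omega)} \; c_1(\omega)^{\top} x_1(\omega)
\]
\[
  \text{s.t.} \quad A_1(\omega)\, x_1(\omega) = b_1(\omega) - B_0 x_0, \quad x_1(\omega) \ge 0.
\]
```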

Modeling as a stochastic program
In this talk we focus on multistage, sequential problems. Later in the presentation we show how the stochastic programming community models multistage stochastic optimization problems. We are going to show that, for sequential problems, dynamic programming and stochastic programming both begin by providing a model of a sequential problem (which we refer to as a dynamic program). However, we will show that stochastic programming (for sequential problems) is actually modeling what we will call the lookahead model, which is itself a dynamic program. This gives us what we will call a lookahead policy for solving dynamic programs.
© 2012 Warren B. Powell

Outline
- Overview and major problem classes
- How to model a sequential decision problem
- Steps in the modeling process
- Examples (under development)
© 2012 Warren B. Powell

Modeling
We lack a standard language for modeling sequential, stochastic decision problems. In the slides that follow, we propose to model problems along five fundamental dimensions:
- State variables
- Decision variables
- Exogenous information processes
- Transition function
- Objective function
This framework is widely followed in the control theory community, and almost completely ignored in operations research and computer science.
© 2012 Warren B. Powell
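To make the five elements concrete, here is a minimal sketch (my own illustration, not code from the presentation) of a toy inventory problem organized along these five dimensions; all names, costs, and dynamics are hypothetical:

```python
import random


class InventoryModel:
    """Toy illustration of the five modeling elements for a small inventory
    problem: state, decision, exogenous information, transition, objective."""

    def initial_state(self):
        # State variable S_0: all the information needed to make the first decision.
        return {"inventory": 0}

    def decisions(self, state):
        # Feasible decisions x_t: how many units to order (kept tiny here).
        return range(0, 6)

    def exogenous_info(self, rng):
        # Exogenous information W_{t+1}: what we learn after deciding at time t.
        return {"demand": rng.randint(0, 5)}

    def transition(self, state, decision, info):
        # Transition function: S_{t+1} as a function of S_t, x_t, W_{t+1}.
        leftover = max(0, state["inventory"] + decision - info["demand"])
        return {"inventory": leftover}

    def cost(self, state, decision, info):
        # Objective pieces C(S_t, x_t, W_{t+1}): ordering cost plus shortage penalty.
        unmet = max(0, info["demand"] - state["inventory"] - decision)
        return 2.0 * decision + 10.0 * unmet


if __name__ == "__main__":
    # Walk the model forward a few steps under an arbitrary fixed decision.
    model, rng = InventoryModel(), random.Random(0)
    state = model.initial_state()
    for t in range(5):
        decision = 3                      # placeholder decision; policies come later
        info = model.exogenous_info(rng)
        print(t, state, decision, info, model.cost(state, decision, info))
        state = model.transition(state, decision, info)
```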

Modeling dynamic problems
The system state: the state variable is the minimally dimensioned function of history that is necessary and sufficient to calculate the decision function, cost function, and transition function.
© 2012 Warren B. Powell

Modeling dynamic problems
The system state: the state variable is, without question, one of the most controversial concepts in stochastic optimization. A number of leading authors claim either that it cannot be defined or that it should not be. We argue that students need to learn how to model a system properly, and the state variable is central to a proper model. Our definition insists that the state variable include all the information we need to make a decision (and only the information needed), now or in the future. We also feel that it should be "minimally dimensioned," which is to say, as simple and compact as possible. This means that all (properly modeled) dynamic systems are Markovian, eliminating the need for the concept of "history-dependent" processes.
© 2012 Warren B. Powell

Modeling dynamic problems
Decisions:
© 2012 Warren B. Powell
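The slide's notation is not captured in the transcript. Consistent with the problem classes described earlier, the decision is written differently by community and is always chosen by a decision function (policy) evaluated at the state; a sketch of the usual notation:

```latex
\[
  a_t \;\;\text{(discrete action)}, \qquad
  u_t \;\;\text{(low-dimensional control)}, \qquad
  x_t \;\;\text{(high-dimensional vector)}, \qquad
  x_t = X^{\pi}(S_t).
\]
```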

Modeling dynamic problems
Exogenous information: note that any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.
© 2012 Warren B. Powell
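A one-line example of the convention (the demand variable is hypothetical): if $\hat{D}_{t+1}$ denotes the demand that arrives between $t$ and $t+1$, the exogenous information is

```latex
\[
  W_{t+1} = \hat{D}_{t+1},
\]
```

which carries the index $t+1$ precisely because it first becomes known at time $t+1$.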

Modeling dynamic problems
The transition function, also known as the "system model," "state transition model," "plant model," or simply the "model."
© 2012 Warren B. Powell
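In Powell's standard notation (which motivates the names above), the transition function is written

```latex
\[
  S_{t+1} = S^{M}\big(S_t, x_t, W_{t+1}\big).
\]
```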

Stochastic optimization models
The objective function: given a system model (transition function), we have to find the best policy, which is a function that maps states to feasible actions, using only the information available when the decision is made. (The slide annotates each piece of the objective: the cost function, the decision function or policy, the minimization to find the best policy, the expectation over all random outcomes, and the state variable.)
© 2012 Warren B. Powell
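In Powell's usual notation the objective reads:

```latex
\[
  \min_{\pi \in \Pi} \; \mathbb{E} \left\{ \sum_{t=0}^{T} \gamma^{t}\, C\big(S_t, X^{\pi}(S_t)\big) \right\},
  \qquad
  S_{t+1} = S^{M}\big(S_t, X^{\pi}(S_t), W_{t+1}\big),
\]
```

with $C$ the cost function, $X^{\pi}$ the decision function (policy), $S_t$ the state variable, and the expectation taken over all random outcomes.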

Objective functions
There are different objectives that we can use:
- Expectations
- Risk measures
- Worst case ("robust optimization")
© 2012 Warren B. Powell

Modeling
This framework (very familiar to the control theory community) offers a model for sequential decision problems (minimizing expected costs). The most difficult hurdles involve:
- Understanding (and properly modeling) the state variable.
- Understanding what is meant (computationally) by the state transition function. While very familiar to the control theory community, this is not a term used in operations research or computer science.
- Understanding what in the world is meant by "minimizing over policies."
Finding computationally meaningful solution approaches involves entering what I have come to call the jungle of stochastic optimization.
© 2012 Warren B. Powell

Outline
- Overview and major problem classes
- How to model a sequential decision problem
- Steps in the modeling process
- Examples (under development)
© 2012 Warren B. Powell

Modeling stochastic optimization
In these slides, I am going to try to present a four-step process for modeling a sequential, stochastic system. The approach begins by developing the idea of simulating a fixed policy; this is our model. We then address the challenge of finding an effective policy. The goal is to focus attention initially on modeling, after which we turn to the challenge of finding effective policies.
© 2012 Warren B. Powell

Modeling stochastic optimization
Step 1: Start by modeling the problem deterministically. In this step, we focus on understanding decisions and costs.
© 2012 Warren B. Powell

Modeling stochastic optimization
Step 2: Now imagine that the process is unfolding stochastically. Every time you see a decision, replace it with the decision function (policy) and take the expectation. Instead of maximizing over decisions, we are now maximizing over the types of policies for making a decision.
© 2012 Warren B. Powell

Stochastic optimization models
Step 3: Now write out the objective function as a simulation. This can be done as one long simulation, or as an average over multiple sample paths.
© 2012 Warren B. Powell
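The formulas on the slide are not captured in the transcript; in the notation above, simulating a fixed policy $\pi$ along a single sample path $\omega$, or averaging over $N$ sample paths $\omega^1, \ldots, \omega^N$, gives

```latex
\[
  \hat{F}^{\pi}(\omega) = \sum_{t=0}^{T} C\big(S_t(\omega), X^{\pi}(S_t(\omega))\big),
  \qquad
  \bar{F}^{\pi} = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{T} C\big(S_t(\omega^{n}), X^{\pi}(S_t(\omega^{n}))\big).
\]
```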

Stochastic optimization models
Step 4: Now search for the best policy.
- First choose a type of policy:
  - Myopic cost function approximation
  - Lookahead policy (deterministic, stochastic)
  - Policy function approximation
  - Policy based on a value function approximation
  - Or some sort of hybrid
- Then identify the tunable parameters of the policy.
- Tune the parameters using your favorite stochastic search or optimal learning algorithm.
- Loop over other types of policies.
© 2012 Warren B. Powell
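As a concrete illustration of Step 4 (again a sketch, not code from the talk), here is a one-parameter policy function approximation for the inventory example sketched earlier, re-implemented inline so it runs on its own, with its tunable parameter found by brute-force search over simulated sample paths; all names and cost values are hypothetical:

```python
import random


def simulate_policy(theta, horizon=50, n_paths=200, seed=0):
    """Estimate the average cost of the order-up-to policy
    X^pi(S_t) = max(0, theta - inventory) over n_paths sample paths."""
    rng = random.Random(seed)  # common random numbers across candidate thetas
    total = 0.0
    for _ in range(n_paths):
        inventory, path_cost = 0, 0.0
        for _ in range(horizon):
            decision = max(0, theta - inventory)          # x_t = X^pi(S_t | theta)
            demand = rng.randint(0, 5)                    # W_{t+1}: exogenous information
            unmet = max(0, demand - inventory - decision)
            path_cost += 2.0 * decision + 10.0 * unmet    # C(S_t, x_t, W_{t+1})
            inventory = max(0, inventory + decision - demand)  # transition
        total += path_cost
    return total / n_paths


def tune_policy(candidates=range(0, 11)):
    """Search over the tunable parameter theta (here by exhaustive enumeration;
    any stochastic search or optimal learning algorithm could be substituted)."""
    return min(candidates, key=simulate_policy)


if __name__ == "__main__":
    theta_star = tune_policy()
    print("best theta:", theta_star, "estimated cost:", simulate_policy(theta_star))
```

Swapping in a lookahead policy or a policy based on a value function approximation would change only the decision rule inside `simulate_policy`; the outer tuning loop stays the same.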

- Stochastic programming
- Stochastic search
- Model predictive control
- Optimal control
- Reinforcement learning
- On-policy learning
- Off-policy learning
- Markov decision processes
- Simulation optimization
- Policy search
© 2012 Warren B. Powell

Computational Stochastic Optimization
- Stochastic programming
- Stochastic search
- Model predictive control
- Optimal control
- Reinforcement learning
- On-policy learning
- Markov decision processes
- Simulation optimization
- Policy search
© 2012 Warren B. Powell