Dynamic Programming Lecture 13 (5/31/2017).


- A Forest Thinning Example -
[Network diagram: nodes represent stand states at ages 10, 20, and 30 across Stages 1-3; arcs represent thinning decisions and their associated returns.]
Source: Dykstra's Mathematical Programming for Natural Resource Management (1984)

Dynamic Programming Cont.
Stages, states, and actions (decisions)
Backward and forward recursion

Solving Dynamic Programs
Recursive relation at stage i (Bellman Equation), in its general deterministic form: the best value attainable from state s at stage i equals the best combination of the immediate return and the best value attainable from the resulting state at stage i+1,

f_i(s) = max over decisions d of { r_i(s, d) + f_{i+1}(s') }

where s' is the state reached from s under decision d (replace max with min for cost-minimization problems).

Dynamic Programming
Structural requirements of DP:
The problem can be divided into stages.
Each stage has a finite set of associated states (discrete-state DP).
The effect of a policy decision, transforming a state at one stage into a state at the subsequent stage, is deterministic.
Principle of optimality: given the current state of the system, the optimal policy for the remaining stages is independent of any policy adopted in prior stages.
These requirements are illustrated by the sketch below.
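
A minimal backward-recursion sketch in Python, assuming a made-up three-stage network; the stage, state, and return data below are hypothetical and not taken from the lecture:

```python
# Minimal backward-recursion sketch for a deterministic, discrete-state DP.
# Hypothetical data: 3 stages; rewards[stage][(state, next_state)] is the
# return earned by moving from `state` to `next_state` at that stage.
rewards = {
    1: {("A", "A"): 5, ("A", "B"): 8},
    2: {("A", "A"): 6, ("A", "B"): 4, ("B", "B"): 7},
    3: {("A", "END"): 10, ("B", "END"): 12},
}

# f[stage][state] = best total return obtainable from this state onward
f = {4: {"END": 0}}
best = {}

for stage in (3, 2, 1):                      # work backward through the stages
    f[stage] = {}
    for (state, nxt), r in rewards[stage].items():
        value = r + f[stage + 1].get(nxt, float("-inf"))
        if value > f[stage].get(state, float("-inf")):
            f[stage][state] = value          # Bellman update: keep the best decision
            best[(stage, state)] = nxt

print("Optimal value from stage 1, state A:", f[1]["A"])
print("Optimal decisions (stage, state) -> next state:", best)
```

Because each stage's values depend only on the values already computed for the following stage, the principle of optimality lets the whole problem be solved one stage at a time.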

Examples of DP
The Floyd-Warshall algorithm (used in the Bucket formulation of ARM). Its recurrence is a classic DP: letting d_k(i, j) be the length of the shortest path from i to j using only nodes 1..k as intermediates,

d_k(i, j) = min { d_{k-1}(i, j), d_{k-1}(i, k) + d_{k-1}(k, j) }

with d_0(i, j) equal to the direct arc length from i to j. A sketch follows below.
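
A short Python sketch of this recurrence on an assumed toy distance matrix; the 4-node data are hypothetical, not taken from the ARM formulation:

```python
# Floyd-Warshall: d[i][j] is iteratively improved by allowing node k as an intermediate.
INF = float("inf")

# Hypothetical 4-node distance matrix (INF = no direct arc).
d = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]

n = len(d)
for k in range(n):            # allow node k as an intermediate
    for i in range(n):
        for j in range(n):
            # DP recurrence: d_k(i,j) = min(d_{k-1}(i,j), d_{k-1}(i,k) + d_{k-1}(k,j))
            if d[i][k] + d[k][j] < d[i][j]:
                d[i][j] = d[i][k] + d[k][j]

for row in d:                 # all-pairs shortest path distances
    print(row)
```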

Examples of DP cont.
Minimizing the risk of losing an endangered species (non-linear DP):
[Network diagram: Stages 1-3 correspond to the three projects; nodes are the budget levels ($0, $1M, $2M) available at each stage, and arcs are the possible allocations.]
Source: Buongiorno and Gilles' (2003) Decision Methods for Forest Resource Management

Minimizing the risk of losing an endangered species (DP example)
Stages (t = 1, 2, 3) represent the $ allocation decisions for Projects 1, 2, and 3;
States (i = 0, 1, 2) represent the budget (in $M) available at each stage t;
Decisions (j = 0, 1, 2) represent the budget left available at stage t+1, so i - j is spent on Project t;
q_t(i - j) denotes the probability of failure of Project t if decision j is made (i.e., i - j is spent on Project t); and
f_t(i) is the smallest probability of total failure from stage t onward, starting in state i and making the best decision j*.
Because the species is lost only if every remaining project fails, the recursive relation can be stated as

f_t(i) = min over j <= i of { q_t(i - j) * f_{t+1}(j) },   with f_4(j) = 1 for all j.

A sketch of this recursion with hypothetical failure probabilities follows below.
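
A hedged Python sketch of this recursion; the failure probabilities q[t][spend] below are invented for illustration and are not the values used by Buongiorno and Gilles (2003):

```python
# Stages t = 1..3 are projects; states i = budget ($M) still available (0, 1, or 2);
# decision j = budget passed on to stage t+1, so i - j is spent on project t.
# q[t][spend] = probability that project t fails given the amount spent on it
# (hypothetical numbers, for illustration only).
q = {
    1: {0: 0.80, 1: 0.50, 2: 0.30},
    2: {0: 0.75, 1: 0.45, 2: 0.25},
    3: {0: 0.90, 1: 0.60, 2: 0.40},
}

budgets = (0, 1, 2)
f = {4: {i: 1.0 for i in budgets}}   # boundary: after the last project the product factor is 1
best = {}

for t in (3, 2, 1):                  # backward recursion over the projects
    f[t] = {}
    for i in budgets:
        # probability that ALL projects from t onward fail, minimized over the split of i
        candidates = {j: q[t][i - j] * f[t + 1][j] for j in budgets if j <= i}
        j_star = min(candidates, key=candidates.get)
        f[t][i] = candidates[j_star]
        best[(t, i)] = j_star

print("Smallest probability of losing the species with a $2M budget:", f[1][2])
print("Optimal carry-overs (stage, budget) -> budget passed on:", best)
```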

Markov Chains (based on Buongiorno and Gilles 2003)
Transition probability matrix P; the state distribution evolves as p_{t+1} = p_t P, so p_t = p_0 P^t.
The vector p_t converges to a vector of steady-state probabilities p*, and p* is independent of p0!
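
A small Python sketch showing convergence to p* from two different starting distributions, using an assumed 3-state transition matrix rather than one from the lecture:

```python
import numpy as np

# Hypothetical transition probability matrix for three forest states
# (rows sum to 1; these numbers are for illustration only).
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
])

def distribution_after(P, p0, n_steps=200):
    """Iterate p <- p P for n_steps; for an ergodic chain this approaches p*."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ P
    return p

# Two different starting distributions converge to (practically) the same p*.
print(distribution_after(P, [1.0, 0.0, 0.0]))
print(distribution_after(P, [0.0, 0.0, 1.0]))
```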

Markov Chains cont.
Berger-Parker Landscape Index
Mean Residence Times
Mean Recurrence Times
(See the sketch below, which computes these summaries from P and the steady-state vector.)
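
A sketch of these summaries, continuing the hypothetical matrix above; the Berger-Parker form used here (the largest steady-state share) is an assumption, not taken from the lecture:

```python
import numpy as np

# Reusing the hypothetical transition matrix from the previous sketch.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
])

# Steady-state vector p*: solve p P = p subject to the probabilities summing to 1.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
p_star, *_ = np.linalg.lstsq(A, b, rcond=None)

residence = 1.0 / (1.0 - np.diag(P))   # expected consecutive periods spent in each state
recurrence = 1.0 / p_star              # expected time between returns to each state
berger_parker = p_star.max()           # share of the most common state (assumed definition)

print("p* =", p_star)
print("mean residence times =", residence)
print("mean recurrence times =", recurrence)
print("Berger-Parker landscape index =", berger_parker)
```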

Markov Chains cont.
Forest dynamics: expected revenues (or biodiversity) with vs. without management. (A hedged comparison sketch follows below.)
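
A hedged comparison sketch: compute the steady-state distribution under a "managed" and an "unmanaged" transition matrix (both hypothetical) and weight a per-state revenue or biodiversity score by each:

```python
import numpy as np

def steady_state(P):
    """Stationary distribution of a row-stochastic transition matrix."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Hypothetical forest dynamics with and without management (rows sum to 1).
P_managed   = np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.6, 0.2],
                        [0.1, 0.2, 0.7]])
P_unmanaged = np.array([[0.8, 0.15, 0.05],
                        [0.4, 0.5,  0.1],
                        [0.3, 0.3,  0.4]])

reward = np.array([10.0, 40.0, 90.0])   # per-period revenue (or biodiversity score) by state

for label, P in (("managed", P_managed), ("unmanaged", P_unmanaged)):
    p_star = steady_state(P)
    print(label, "long-run expected return per period:", p_star @ reward)
```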

Markov Chains cont.
Present value of expected returns: with a per-period discount factor delta and a vector r of expected immediate returns by state, the value vector satisfies v = r + delta * P v, i.e., v = (I - delta * P)^{-1} r.
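
A sketch of this calculation using the hypothetical managed-forest matrix and reward vector from above; the discount factor is also assumed:

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
r = np.array([10.0, 40.0, 90.0])   # expected immediate return in each state (hypothetical)
delta = 0.95                        # assumed per-period discount factor

# Present value of expected returns: v = r + delta * P v  =>  v = (I - delta*P)^{-1} r
v = np.linalg.solve(np.eye(3) - delta * P, r)
print("Present value of expected returns by starting state:", v)
```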

Markov Decision Processes