Online Sampling for Markov Decision Processes
Bob Givan
Joint work with E. K. P. Chong, H. Chang, G. Wu
Electrical and Computer Engineering, Purdue University

Bob Givan Electrical and Computer Engineering Purdue University 1 November 4-9, 2001

Markov Decision Process (MDP)
- Ingredients:
  - System state x in state space X
  - Control action a in A(x)
  - Reward R(x,a)
  - State-transition probability P(x,y,a)
- Find a control policy to maximize the objective function

Optimal Policies
- Policy: a mapping from state and time to actions
- Stationary policy: a mapping from state to actions
- Goal: a policy maximizing the objective function
  $V_H^*(x_0) = \max \mathrm{Obj}[R(x_0,a_0), \ldots, R(x_{H-1},a_{H-1})]$,
  where the max is over all policies $u = u_0, \ldots, u_{H-1}$
- For large H, $a_0$ is independent of H (with an ergodicity assumption)
- Stationary optimal action $a_0$ for $H = \infty$ via receding-horizon control

Q Values
- Fix a large H and focus on the finite-horizon reward
- Define $Q(x,a) = R(x,a) + E[V_{H-1}^*(y)]$
  - The "utility" of action a at state x
  - Name: Q-value of action a at state x
- Key identities (Bellman's equations):
  - $V_H^*(x) = \max_a Q(x,a)$
  - $u_0^*(x) = \arg\max_a Q(x,a)$
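As a minimal sketch of the Bellman identities on this slide, the following Python computes $V_h^*$ and Q-values by backward induction. The two-state MDP (its states, actions, rewards, and transition probabilities) is hypothetical and invented purely for illustration, and the objective is assumed to be the total undiscounted reward over the horizon; the talk fixes neither.

```python
# Hypothetical toy MDP (not from the talk).
STATES = [0, 1]
ACTIONS = ["stay", "move"]
# R[(x, a)] is the one-step reward R(x, a).
R = {(0, "stay"): 0.0, (0, "move"): 1.0, (1, "stay"): 2.0, (1, "move"): 0.0}
# P[(x, a)] maps each next state y to the probability P(x, y, a).
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {1: 1.0},
    (1, "stay"): {0: 0.1, 1: 0.9},
    (1, "move"): {0: 1.0},
}


def q_value(x, a, v_next):
    """Q(x, a) = R(x, a) + E[V(y)], with y distributed as P(x, ., a)."""
    return R[(x, a)] + sum(p * v_next[y] for y, p in P[(x, a)].items())


def optimal_values(horizon):
    """Return [V_0^*, ..., V_horizon^*], assuming a total-reward objective."""
    values = [{x: 0.0 for x in STATES}]  # V_0^* = 0: no steps left
    for _ in range(horizon):
        prev = values[-1]
        # Bellman identity: V_h^*(x) = max_a Q(x, a)
        values.append({x: max(q_value(x, a, prev) for a in ACTIONS) for x in STATES})
    return values


def greedy_action(x, v_next):
    """Bellman identity: u^*(x) = argmax_a Q(x, a)."""
    return max(ACTIONS, key=lambda a: q_value(x, a, v_next))
```

Exact backward induction like this touches every state at every step, which is why it stops being an option once the state space is very large.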

Solution Methods
- Recall:
  - $u_0^*(x) = \arg\max_a Q(x,a)$
  - $Q(x,a) = R(x,a) + E[V_{H-1}^*(y)]$
- Problem: the Q-value depends on the optimal policy
- The state space is extremely large (often continuous)
- Two-pronged solution approach:
  - Apply a receding-horizon method
  - Estimate Q-values via simulation/sampling

Methods for Q-value Estimation
Previous work by other authors:
- Unbiased sampling (exact Q-value) [Kearns et al., IJCAI-99]
- Policy rollout (lower bound) [Bertsekas & Castanon, 1999]
Our techniques:
- Hindsight optimization (upper bound)
- Parallel rollout (lower bound)

Expectimax Tree for $V^*$

Unbiased Sampling

Unbiased Sampling (Cont’d)
- For a given desired accuracy, how large should the sampling width and depth be?
  - Answered by Kearns, Mansour, and Ng (1999)
- Requires prohibitive sampling width and depth
  - e.g., C on the order of $10^8$ and $H_s > 60$ to distinguish the “best” and “worst” policies in our scheduling domain
- We evaluate with smaller width and depth
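To make the cost of sampling width and depth concrete, here is a rough sketch of a sparse-sampling Q-value estimator in the spirit of Kearns, Mansour, and Ng. It is a simplified, undiscounted finite-horizon variant, which is an assumption of this sketch; the toy MDP, the `calls` counter, and all function names are hypothetical.

```python
import random

# Hypothetical toy MDP (not from the talk).
ACTIONS = ["stay", "move"]
R = {(0, "stay"): 0.0, (0, "move"): 1.0, (1, "stay"): 2.0, (1, "move"): 0.0}
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {1: 1.0},
    (1, "stay"): {0: 0.1, 1: 0.9},
    (1, "move"): {0: 1.0},
}
calls = {"n": 0}  # counts simulator calls, to show how the cost grows


def sample_next(x, a, rng):
    """Generative model: draw one next state y from P(x, ., a)."""
    calls["n"] += 1
    states = list(P[(x, a)])
    weights = [P[(x, a)][y] for y in states]
    return rng.choices(states, weights=weights)[0]


def estimate_v(x, depth, width, rng):
    """Sampled estimate of V_depth^*(x): max over actions of the sampled Q."""
    if depth == 0:
        return 0.0
    return max(estimate_q(x, a, depth, width, rng) for a in ACTIONS)


def estimate_q(x, a, depth, width, rng):
    """Sampled estimate of Q(x, a) using `width` next-state samples per node."""
    total = 0.0
    for _ in range(width):
        y = sample_next(x, a, rng)
        total += estimate_v(y, depth - 1, width, rng)
    return R[(x, a)] + total / width
```

Each node draws `width` samples per action and recurses, so the number of simulator calls grows roughly like (number of actions × width) raised to the depth, which is what makes large widths and depths prohibitive.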

How to Look Deeper?

Policy Roll-out

Policy Rollout in Equations
- Write $V_H^u(y)$ for the value of following policy u
- Recall: $Q(x,a) = R(x,a) + E[V_{H-1}^*(y)] = R(x,a) + E[\max_u V_{H-1}^u(y)]$
- Given a base policy u, use $R(x,a) + E[V_{H-1}^u(y)]$ as a lower-bound estimate of the Q-value
- The resulting policy is PI(u), given infinite sampling
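The rollout estimate $R(x,a) + E[V_{H-1}^u(y)]$ can be sketched by simulation as follows. This is an illustrative sketch only: the toy MDP, the base policy `always_move`, and the function names are hypothetical, and the value of a policy is assumed to be its total undiscounted reward.

```python
import random

# Hypothetical toy MDP (not from the talk).
ACTIONS = ["stay", "move"]
R = {(0, "stay"): 0.0, (0, "move"): 1.0, (1, "stay"): 2.0, (1, "move"): 0.0}
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {1: 1.0},
    (1, "stay"): {0: 0.1, 1: 0.9},
    (1, "move"): {0: 1.0},
}


def sample_next(x, a, rng):
    """Draw one next state y from P(x, ., a)."""
    states = list(P[(x, a)])
    weights = [P[(x, a)][y] for y in states]
    return rng.choices(states, weights=weights)[0]


def simulate(policy, x, steps, rng):
    """Total reward of following `policy` for `steps` steps from state x."""
    total = 0.0
    for _ in range(steps):
        a = policy(x)
        total += R[(x, a)]
        x = sample_next(x, a, rng)
    return total


def rollout_q(x, a, horizon, policy, n_samples, rng):
    """Estimate R(x, a) + E[V_{H-1}^u(y)] for base policy u = `policy`."""
    total = 0.0
    for _ in range(n_samples):
        y = sample_next(x, a, rng)
        total += simulate(policy, y, horizon - 1, rng)
    return R[(x, a)] + total / n_samples


def rollout_action(x, horizon, policy, n_samples, rng):
    """Pick the action with the largest rollout estimate."""
    return max(ACTIONS, key=lambda a: rollout_q(x, a, horizon, policy, n_samples, rng))


def always_move(x):
    """Hypothetical base policy: move in every state."""
    return "move"
```

Because each sample is a single trajectory of length H-1 rather than a tree, the cost is linear in the horizon, which is what lets rollout look deeper than tree-based sampling.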

Policy Roll-out (cont’d)

Parallel Policy Rollout
- Generalization of policy rollout, due to [Chang, Givan, and Chong, 2000]
- Given a set U of base policies, use $R(x,a) + E[\max_{u \in U} V_{H-1}^u(y)]$ as an estimate of the Q-value
- More accurate estimate than policy rollout
- Still gives a lower bound on the true Q-value
- Still gives a policy no worse than any in U
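A possible sampled version of $R(x,a) + E[\max_{u \in U} V_{H-1}^u(y)]$ is sketched below. The nested sampling scheme (an inner average per policy, then a max, then an outer average over next states) is an assumption of this sketch and may differ from how the authors sample; the toy MDP, base policies, and names are hypothetical.

```python
import random

# Hypothetical toy MDP (not from the talk).
ACTIONS = ["stay", "move"]
R = {(0, "stay"): 0.0, (0, "move"): 1.0, (1, "stay"): 2.0, (1, "move"): 0.0}
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {1: 1.0},
    (1, "stay"): {0: 0.1, 1: 0.9},
    (1, "move"): {0: 1.0},
}


def sample_next(x, a, rng):
    """Draw one next state y from P(x, ., a)."""
    states = list(P[(x, a)])
    weights = [P[(x, a)][y] for y in states]
    return rng.choices(states, weights=weights)[0]


def simulate(policy, x, steps, rng):
    """Total reward of following `policy` for `steps` steps from state x."""
    total = 0.0
    for _ in range(steps):
        a = policy(x)
        total += R[(x, a)]
        x = sample_next(x, a, rng)
    return total


def parallel_rollout_q(x, a, horizon, policies, n_next, n_inner, rng):
    """Estimate R(x, a) + E[max over u in U of V_{H-1}^u(y)]."""
    total = 0.0
    for _ in range(n_next):
        y = sample_next(x, a, rng)
        # Estimate V^u(y) for every base policy u, then keep the best one.
        best = max(
            sum(simulate(u, y, horizon - 1, rng) for _ in range(n_inner)) / n_inner
            for u in policies
        )
        total += best
    return R[(x, a)] + total / n_next


def always_stay(x):
    return "stay"


def always_move(x):
    return "move"
```

With `policies=[always_stay]` this reduces to ordinary rollout of a single base policy. Note that taking a max over noisy inner averages biases the estimate upward for small `n_inner`, so the lower-bound property holds for the exact expectation rather than for every finite sample.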

Hindsight Optimization – Tree View

Hindsight Optimization – Equations
- Swap Max and Exp in the expectimax tree
- Solve each offline optimization problem
  - $O(kC' f(H))$ time
  - where f(H) is the offline problem complexity
- Jensen’s inequality implies upper bounds
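The effect of swapping Max and Exp can be illustrated on a deliberately tiny problem. In the sketch below, each sampled trace fixes the payoff of two hypothetical actions at every step, the offline problem is solved by brute force, and the average of the per-trace optima (Exp of Max) is compared with the best single plan evaluated on average (Max of Exp). The problem, the names, and the brute-force offline solver are assumptions made for illustration; they are not the talk's domains.

```python
import itertools
import random

ACTIONS = "ab"  # two hypothetical actions


def sample_trace(horizon, rng):
    """One sampled future: the payoff of each action at every step."""
    return [{a: rng.random() for a in ACTIONS} for _ in range(horizon)]


def total_reward(plan, trace):
    """Reward of a fixed action sequence on a known trace."""
    return sum(step[a] for a, step in zip(plan, trace))


def offline_optimum(trace):
    """Solve the deterministic offline problem by enumerating all plans."""
    plans = itertools.product(ACTIONS, repeat=len(trace))
    return max(total_reward(plan, trace) for plan in plans)


def hindsight_value(traces):
    """Exp of Max: average of the per-trace offline optima."""
    return sum(offline_optimum(t) for t in traces) / len(traces)


def best_plan_estimate(traces):
    """Max of Exp: best single plan, scored by its average over traces."""
    plans = itertools.product(ACTIONS, repeat=len(traces[0]))
    return max(sum(total_reward(p, t) for t in traces) / len(traces) for p in plans)


def hindsight_q(a, traces):
    """Hindsight Q-estimate: take action a now, then act with hindsight."""
    return sum(t[0][a] + offline_optimum(t[1:]) for t in traces) / len(traces)
```

On any set of traces, `hindsight_value(traces)` is at least `best_plan_estimate(traces)`, since the mean of maxima is never below the maximum of means. That is the upper-bound property the slide attributes to Jensen's inequality.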

Hindsight Optimization (Cont’d)

Application to Example Problems
- Apply unbiased sampling, policy rollout, parallel rollout, and hindsight optimization to:
  - Multi-class deadline scheduling
  - Random early dropping
  - Congestion control

Basic Approach
- The traffic model provides a stochastic description of possible future outcomes
- Method:
  - Formulate network decision problems as POMDPs by incorporating the traffic model
  - Solve the belief-state MDP online using sampling (choose the time-scale to allow for computation time)

Domain 1: Deadline Scheduling
- Objective: minimize weighted loss

Domain 2: Random Early Dropping
- Objective: minimize delay without sacrificing throughput

Domain 3: Congestion Control

Traffic Modeling
- A Hidden Markov Model (HMM) for each source
- Note: the state is hidden; the model is partially observed
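As a rough illustration of an HMM traffic source and of the belief state that a partially observed model calls for, the sketch below simulates a two-state source and updates a belief over its hidden state from observed arrivals. The two states (idle and bursty), the transition matrix, the arrival probabilities, and the convention of transitioning before emitting are all assumptions chosen for illustration.

```python
import random

# Hypothetical two-state source: 0 = idle, 1 = bursty.
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]  # TRANSITION[i][j] = P(next = j | current = i)
ARRIVAL_PROB = [0.1, 0.7]              # P(packet arrives | hidden state)


def step_source(state, rng):
    """Advance the hidden state, then emit 1 (packet) or 0 (no packet)."""
    state = 0 if rng.random() < TRANSITION[state][0] else 1
    arrival = 1 if rng.random() < ARRIVAL_PROB[state] else 0
    return state, arrival


def update_belief(belief, arrival):
    """One step of HMM filtering: predict with TRANSITION, correct with the observation."""
    predicted = [sum(belief[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]
    likelihood = [ARRIVAL_PROB[j] if arrival else 1.0 - ARRIVAL_PROB[j] for j in range(2)]
    unnormalized = [predicted[j] * likelihood[j] for j in range(2)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

The controller never sees `state`; it only carries `belief` forward, and a sampling method can draw future traffic by first sampling a hidden state from the belief and then calling `step_source` repeatedly.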

Deadline Scheduling Results
Non-sampling policies:
- EDF: earliest deadline first
  - Deadline sensitive, class insensitive
- SP: static priority
  - Deadline insensitive, class sensitive
- CM: current minloss [Givan et al., 2000]
  - Deadline and class sensitive
  - Minimizes weighted loss for the current packets
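The contrast between EDF and SP can be sketched in a few lines. The `Packet` representation, and the choice to model a packet's class by a single loss weight that also serves as its static priority, are assumptions of this sketch; CM is not shown because the slide does not give enough detail to reconstruct it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    deadline: int   # last time slot in which the packet can still be served
    weight: float   # loss cost of the packet's class (higher = more important)


def edf(queue: List[Packet]) -> Optional[Packet]:
    """Earliest deadline first: looks only at deadlines, ignores class."""
    return min(queue, key=lambda p: p.deadline) if queue else None


def static_priority(queue: List[Packet]) -> Optional[Packet]:
    """Static priority: looks only at class weight, ignores deadlines."""
    return max(queue, key=lambda p: p.weight) if queue else None
```

For a queue such as `[Packet(5, 10.0), Packet(2, 1.0), Packet(3, 4.0)]`, EDF serves the packet due at slot 2 while SP serves the weight-10 packet, which shows why neither is sensitive to both deadline and class.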

Deadline Scheduling Results
- Objective: minimize weighted loss
- Comparison:
  - Non-sampling policies
  - Unbiased sampling (Kearns et al.)
  - Hindsight optimization
  - Rollout with CM as base policy
  - Parallel rollout
- Results due to H. S. Chang

Deadline Scheduling Results

Random Early Dropping Results
- Objective: minimize delay subject to throughput loss-tolerance
- Comparison:
  - Candidate policies: RED and “buffer-k”
  - KMN-sampling
  - Rollout of buffer-k
  - Parallel rollout
  - Hindsight optimization
- Results due to H. S. Chang

Random Early Dropping Results

Congestion Control Results
- MDP objective: minimize weighted sum of throughput, delay, and loss-rate
- Fairness is hard-wired
- Comparisons:
  - PD-k (proportional-derivative with k target queue)
  - Hindsight optimization
  - Rollout of PD-k == parallel rollout
- Results due to G. Wu, in progress

Congestion Control Results

Results Summary
- Unbiased sampling cannot cope
- Parallel rollout wins in 2 domains
  - Not always equal to simple rollout of one base policy
- Hindsight optimization wins in 1 domain
- Simple policy rollout – the cheapest method
  - Poor in domain 1
  - Strong in domain 2 with the best base policy – but how to find this policy?
  - So-so in domain 3 with any base policy

Talk Summary
- Case study of MDP sampling methods
- New methods offering practical improvements
  - Parallel policy rollout
  - Hindsight optimization
- Systematic methods for using traffic models to help make network control decisions
- Feasibility of real-time implementation depends on problem timescale

Ongoing Research
- Apply to other control problems (different timescales):
  - Admission/access control
  - QoS routing
  - Link bandwidth allotment
  - Multiclass connection management
  - Problems arising in proxy-services
  - Diagnosis and recovery

Ongoing Research (Cont’d)
- Alternative traffic models
  - Multi-timescale models
  - Long-range dependent models
  - Closed-loop traffic
  - Fluid models
- Learning traffic model online
- Adaptation to changing traffic conditions

Congestion Control (Cont’d)

Congestion Control Results

Hindsight Optimization (Cont’d)

Policy Rollout (Cont’d)
- Figure labels: Base Policy; Policy performance

Receding-horizon Control
- For a large horizon H, the policy is approximately stationary
- At each time, if the state is x, apply action $u^*(x) = \arg\max_a Q(x,a) = \arg\max_a \{R(x,a) + E[V_{H-1}^*(y)]\}$
- Compute an estimate of the Q-value at each time
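The control loop described on this slide might look like the following sketch: at every step the Q-values for a fixed lookahead horizon are re-estimated at the current state, the maximizing action is applied, and the horizon recedes with time. The toy MDP is hypothetical, and `exact_q` is a stand-in estimator that only works for tiny MDPs; in practice any sampling-based Q-value estimate with the same signature could be passed as `q_estimate`.

```python
import random

# Hypothetical toy MDP (not from the talk).
ACTIONS = ["stay", "move"]
R = {(0, "stay"): 0.0, (0, "move"): 1.0, (1, "stay"): 2.0, (1, "move"): 0.0}
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {1: 1.0},
    (1, "stay"): {0: 0.1, 1: 0.9},
    (1, "move"): {0: 1.0},
}


def sample_next(x, a, rng):
    """Draw one next state y from P(x, ., a)."""
    states = list(P[(x, a)])
    weights = [P[(x, a)][y] for y in states]
    return rng.choices(states, weights=weights)[0]


def exact_q(x, a, horizon):
    """Q(x, a) with `horizon` steps to go, by exact recursion (tiny MDPs only)."""
    future = 0.0
    if horizon > 1:
        future = sum(
            p * max(exact_q(y, b, horizon - 1) for b in ACTIONS)
            for y, p in P[(x, a)].items()
        )
    return R[(x, a)] + future


def receding_horizon_run(x0, steps, horizon, q_estimate, rng):
    """Apply argmax_a Q(x, a) at each time, re-planning from the new state."""
    x, visited = x0, []
    for _ in range(steps):
        a = max(ACTIONS, key=lambda b: q_estimate(x, b, horizon))
        visited.append((x, a))
        x = sample_next(x, a, rng)
    return visited
```

Only the first action of each lookahead is ever executed, so the same `horizon` is reused at every step regardless of how long the controller runs.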

Congestion Control (Cont’d)

Domain 3: Congestion Control
- High-priority traffic: open-loop controlled
- Low-priority traffic: closed-loop controlled
- Resources: bandwidth and buffer
- Objective: optimize throughput, delay, loss, and fairness
- Figure labels: Bottleneck Node; High-priority Traffic; Best-effort Traffic

Congestion Control Results