Presentation transcript:

Intelligent Systems (AI-2)
Computer Science, CPSC 422, Lecture 9
Sep 28, 2015

An MDP Approach to Multi-Category Patient Scheduling in a Diagnostic Facility
Adapted from: Matthew Dirks

Goal / Motivation
- Develop a mathematical model for multi-category patient scheduling decisions in computed tomography (CT), and investigate the associated trade-offs from economic and operational perspectives.
- Contributions to AI, operations research (OR), and radiology.

Types of patients:
- Emergency Patients (EP)
  - Critical (CEP)
  - Non-critical (NCEP)
- Inpatients (IP)
- Outpatients
  - Scheduled OP
  - Add-on OP: semi-urgent (OP)
(Green in the original slides marks the types used in this model.)

Proposed Solution
- Finite-horizon MDP
- Non-stationary arrival probabilities for IPs and EPs
- Performance objective: maximize expected revenue
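To make "non-stationary arrival probabilities" concrete, here is a minimal sketch in which arrival chances depend on the slot index within the work-day. The function name and all numbers are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical non-stationary arrival probabilities: the chance that an
# emergency patient (EP) or inpatient (IP) request arrives during slot n.
# Numbers are illustrative only, not from the paper.

def arrival_probs(n, num_slots=24):
    frac = n / num_slots                  # position within the day
    p_ep = 0.10 + 0.10 * frac             # EP arrivals ramp up over the day
    p_ip = 0.20 if frac < 0.5 else 0.10   # IP demand higher in the morning
    return {"EP": p_ep, "IP": p_ip}

print(arrival_probs(0))    # start of day
print(arrival_probs(23))   # end of day
```

Because these probabilities depend on n, the optimal action in a given state can differ by time of day, which is why a stationary policy would not suffice here.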

MDP Representation

MDP Representation (cont.): Transition Probabilities

Example

Performance Metrics (over one work-day)
- Expected net CT revenue
- Average waiting time
- Average number of patients not scanned by day's end
- Rewards
- Discount factor? 1 (no discounting is needed over the finite, one-day horizon)
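As a sketch, these metrics could be computed from one simulated day's log as follows. The log format, field names, and numbers are hypothetical, not the paper's.

```python
# Compute the slide's performance metrics from one simulated work-day.
# Each record is one patient; the schema here is hypothetical.

day_log = [
    {"type": "OP",   "revenue": 300.0, "wait_slots": 1, "scanned": True},
    {"type": "NCEP", "revenue": 250.0, "wait_slots": 3, "scanned": True},
    {"type": "IP",   "revenue": 0.0,   "wait_slots": 6, "scanned": False},
]

net_revenue = sum(p["revenue"] for p in day_log if p["scanned"])
avg_wait = sum(p["wait_slots"] for p in day_log) / len(day_log)
not_scanned = sum(1 for p in day_log if not p["scanned"])

print(net_revenue, avg_wait, not_scanned)
```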

Optimal Policy
- Objective: maximize total expected revenue.
- Finite-horizon MDP: solving it yields an action for each state at each decision epoch n in the day.
- Student question: the recursive equation (3) computes the value of the current epoch, V_n, from the future epoch, V_{n+1}; does this contradict the equation given in class, where V_{n+1} depends on V_n? No: in finite-horizon backward induction the subscript indexes time within the day, and values are computed backward from the horizon, whereas in infinite-horizon value iteration the subscript indexes successive iterates of an update that runs until convergence.
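A minimal backward-induction sketch of this recursion, using a hypothetical two-state, two-action model rather than the paper's actual CT model, to show concretely how V_n is filled in from V_{n+1}:

```python
# Backward induction for a finite-horizon MDP: a minimal sketch with a
# hypothetical 2-state, 2-action model (NOT the paper's CT model).
# V[n][s] is the optimal expected revenue-to-go from state s at epoch n.

N = 3                        # decision epochs in the day: n = 0 .. N-1
states = [0, 1]
actions = [0, 1]

def reward(n, s, a):
    # Hypothetical immediate revenue for action a in state s at epoch n.
    return 1.0 if a == 1 else 0.5

def transition(n, s, a):
    # Hypothetical non-stationary transition probabilities P_n(s' | s, a).
    p_stay = 0.9 if n < 2 else 0.5
    return {s: p_stay, 1 - s: 1.0 - p_stay}

V = [dict() for _ in range(N + 1)]
policy = [dict() for _ in range(N)]
for s in states:
    V[N][s] = 0.0            # terminal value at day's end

for n in reversed(range(N)):             # n = N-1, ..., 0: backward in time
    for s in states:
        q = {a: reward(n, s, a)
                + sum(p * V[n + 1][s2]
                      for s2, p in transition(n, s, a).items())
             for a in actions}
        policy[n][s] = max(q, key=q.get)
        V[n][s] = q[policy[n][s]]        # V_n is computed from V_{n+1}

print("policy at epoch 0:", policy[0], "values:", V[0])
```

Note that the loop runs backward over epochs, so each V_n is defined in terms of the already-computed V_{n+1}, exactly the direction the student question asks about.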

Evaluation: Comparison of MDP with Heuristic Policies (results by metric)

Heuristics (a sketch of how they could be coded follows this list)
- FCFS: first come, first served
- R-1: one patient from a randomly chosen type is scanned
- R-2: one patient randomly chosen from all waiting patients (favors types with more people waiting)
- O-1: priority OP > NCEP > IP
- O-2: priority OP > IP > NCEP
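A sketch of how these heuristic policies could be coded against a waiting-list state. The queue contents are hypothetical, and ties and empty queues are handled arbitrarily.

```python
import random

# State: how many patients of each type are waiting. Types follow the
# slides: OP (semi-urgent add-on outpatient), NCEP, IP. Counts are made up.
waiting = {"OP": 2, "NCEP": 5, "IP": 1}
arrival_order = ["NCEP", "OP", "IP", "NCEP", "OP", "NCEP"]

def fcfs(waiting, arrival_order):
    # FCFS: scan the earliest arrival still waiting.
    return arrival_order[0] if arrival_order else None

def r1(waiting):
    # R-1: pick a patient *type* uniformly at random among non-empty types.
    types = [t for t, k in waiting.items() if k > 0]
    return random.choice(types) if types else None

def r2(waiting):
    # R-2: pick a *patient* uniformly at random, so types with longer
    # queues are proportionally more likely to be chosen.
    pool = [t for t, k in waiting.items() for _ in range(k)]
    return random.choice(pool) if pool else None

def o1(waiting):
    # O-1: fixed priority OP > NCEP > IP.
    for t in ("OP", "NCEP", "IP"):
        if waiting.get(t, 0) > 0:
            return t
    return None

def o2(waiting):
    # O-2: fixed priority OP > IP > NCEP.
    for t in ("OP", "IP", "NCEP"):
        if waiting.get(t, 0) > 0:
            return t
    return None

print(fcfs(waiting, arrival_order), r1(waiting), r2(waiting),
      o1(waiting), o2(waiting))
```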

Number of patients not scanned

Waiting-time

Single-scanner

Two-scanner

Sample Policy (n = 12, NCEP = 5)

Question Types from students
- Finite vs. infinite horizon? Finite, for simplicity: there is a lot of uncertainty about what can happen overnight.
- Non-stationary process: the best action depends on the time of day, since the arrival probabilities change over the day.
- More scanners?
- Modeling more patient types (by urgency), or a different hospital? The model can easily be extended, but the data come from only one hospital (how general is it?).
- Uniform slot length (realistic?): why not model the probability distribution of CT scan completion times rather than assume a fixed duration, or use finer-grained time slots?
- Operational cost of implementing the policy (should be taken into account): computing the policy vs. applying the policy.
- Modeling even more uncertainty: "Accidents happen randomly without any pattern", "Scanner not working".
- Two patients arriving at once (one would need to collect the relevant probabilities and fold them into the transition probabilities).
- P-value?
- Why no value iteration (VI)?
- Used in practice?

- Other models: would a continuous-time Markov chain and queueing theory be better for analyzing this scheduling problem?
- How would this model handle two CEPs that came in at the same time? Randomly push one to the next slot.
- How does approximate dynamic programming compare to value iteration? It is an approximate method: it can deal with bigger models, but is not optimal.
- Transfer the model to other facilities? Yes…
- Discount factor of 1? Yes.
- This work does not take into account human suffering, or the urgency of scans for in- and outpatients. Could the reward function be tailored to include such nebulous concepts, or is that beyond the capabilities of the model? The model is specific to the target hospital.
- Outperforming other MDP-based models would better illustrate the effectiveness of this model's features; are the comparison methods chosen in this paper good? This is a first step, showing that sound probabilistic models can be built and can outperform heuristics; after that, one can do the above.