Modeling Long Term Care and Supportive Housing Marisela Mainegra Hing Telfer School of Management University of Ottawa Canadian Operational Research Society, May 18, 2011

Outline
- Long Term Care and Supportive Housing
- Queueing Models
- Dynamic Programming Model
- Approximate Dynamic Programming

LTC problem
[Queueing diagram: Community (λ_C, λ_RC) and Hospital (λ_H, λ_RH) demand flowing into the LTC station with service rate μ_LTC and capacity C_LTC]
Goal:
- Hospital level below a given threshold
- Community waiting times below 90 days

LTC previous results
- An MDP model determined a threshold policy for the Hospital, but it did not take community demand into account
- A simulation model determined that the current capacity is insufficient to achieve the goal

Queueing Model
[Network diagram: Hospital (λ_H) and Community (λ_C) demand, with flows λ_H-LTC, λ_C-LTC, λ_LTC, λ_RH and λ_RC]
- Station LTC: M/M/C_LTC, service rate μ_LTC, capacity C_LTC
- Station H_renege: M/M/∞, service rate μ_RH

Queueing Model Station LTC: M/M/C_LTC
Steady state requires ρ_LTC < 1. The quantities of interest (standard formulas below):
- The probability that no patients are in the system
- The average number of patients in the waiting line
- The average time a client spends in the waiting line
- The number of patients from the Hospital that are in the queue for LTC (L_qH-LTC)
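For reference, these are the standard M/M/c steady-state formulas (presumably what the slide displays), written with λ = λ_LTC, μ = μ_LTC and c = C_LTC:

\[
\rho = \frac{\lambda}{c\mu}, \qquad
P_0 = \left[\sum_{n=0}^{c-1}\frac{(\lambda/\mu)^n}{n!} + \frac{(\lambda/\mu)^c}{c!\,(1-\rho)}\right]^{-1}
\]
\[
L_q = \frac{P_0\,(\lambda/\mu)^c\,\rho}{c!\,(1-\rho)^2}, \qquad
W_q = \frac{L_q}{\lambda}
\]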

Queueing Model Station H_renege: M/M/∞
- The average number of patients in the system (see the formula below)
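For reference, the standard M/M/∞ result for this quantity, assuming λ_RH and μ_RH are the station's arrival and service rates as labelled in the diagram:

\[
L_{H\_renege} = \frac{\lambda_{RH}}{\mu_{RH}}
\]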

Queueing Model Data analysis
- Data on all hospital demand arriving at the CCAC from April 1st, 2006 to May 15th
- ρ_LTC exceeds 1 for the current capacity C_LTC = 4530
- To have ρ_LTC < 1, 2841 (62.71%) more beds than the current capacity are needed
- With C_LTC > 7370 we apply the formulas
- Given a threshold T for the hospital patients and the number Lq_LTC of total patients waiting to go to LTC, we want to determine the capacity C_LTC in LTC such that the goal is met: the hospital queue stays below T and community waiting times stay below 90 days

Queueing Model Results
- 19 iterations of capacity values
- Goal achieved with capacity 7389: the average waiting time is 31 days and the average number of Hospital patients waiting in the queue is 130 (T = 134)
- This required capacity is 2859 (63.1%) more than the current capacity
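As an illustration of this kind of capacity search, here is a minimal Python sketch that increases C_LTC until the M/M/c queue length and waiting time meet the targets. The arrival and service rates in the example are placeholders, not the CCAC figures, and the Erlang-B/C recursion is simply a numerically stable way to evaluate the formulas above for large c.

```python
def mmc_metrics(lam, mu, c):
    """Lq and Wq for an M/M/c queue via the Erlang-B/C recursion (stable for large c)."""
    a = lam / mu                           # offered load
    rho = a / c                            # server utilization
    if rho >= 1.0:
        return float("inf"), float("inf")  # no steady state
    b = 1.0                                # Erlang-B with 0 servers
    for k in range(1, c + 1):
        b = a * b / (k + a * b)            # Erlang-B recursion
    p_wait = b / (1.0 - rho * (1.0 - b))   # Erlang-C: probability an arrival waits
    lq = p_wait * rho / (1.0 - rho)        # mean number waiting
    wq = lq / lam                          # mean wait, in the rates' time unit
    return lq, wq

def required_capacity(lam, mu, queue_target, wait_target, c_start):
    """Smallest capacity whose mean queue length and mean wait meet both targets."""
    c = c_start
    while True:
        lq, wq = mmc_metrics(lam, mu, c)
        if lq <= queue_target and wq <= wait_target:
            return c, lq, wq
        c += 1

# Placeholder per-day rates only (demand just above 7370 beds' worth), not the study data.
print(required_capacity(lam=59.0, mu=0.008, queue_target=134, wait_target=90, c_start=7371))
```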

Queueing Model with SH
[Network diagram: Hospital (λ_H) and Community (λ_C) demand routed to stations LTC (μ_LTC, C_LTC), SH (μ_SH, C_SH) and H_renege (μ_RH), with flows λ_H-LTC, λ_C-LTC, λ_H-SH, λ_C-SH, λ_SH-LTC, λ_RH and λ_RC]

Queueing Model with SH Results
- Required capacity in LTC is 6835, i.e. 2305 (50.883%) more beds than the current capacity (4530)
- Required capacity in SH is 1169
- With capacity values at LTC: 6835 and at SH: 1169, there are … Hospital patients waiting for care (T = 134; for LTC: …, reneging: …, for SH: …), and Community patients wait for care on average (days) at LTC: … and at SH: …

Semi-MDP Model
- State space: S = {(D_H_LTC, D_H_SH, D_C_LTC, D_C_SH, D_SH_LTC, C_LTC, C_SH, p)}
- Action space: A = {0, .., max(TC_LTC, TC_SH)}
- Transition time: d(s,a)
- Transition probabilities: Pr(s,a,s')
- Immediate reward: r(s,a)
- Optimality criterion: total expected discounted reward
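To make the state and action spaces concrete, here is a small Python sketch of the state tuple and action set as listed above. The field interpretations and the TC_LTC/TC_SH values are assumptions for illustration, not the thesis specification.

```python
from dataclasses import dataclass

TC_LTC = 10   # total LTC capacity (placeholder value)
TC_SH = 5     # total SH capacity (placeholder value)

@dataclass(frozen=True)
class State:
    """s = (D_H_LTC, D_H_SH, D_C_LTC, D_C_SH, D_SH_LTC, C_LTC, C_SH, p)."""
    d_h_ltc: int   # Hospital demand waiting for LTC (assumed meaning)
    d_h_sh: int    # Hospital demand waiting for SH (assumed)
    d_c_ltc: int   # Community demand waiting for LTC (assumed)
    d_c_sh: int    # Community demand waiting for SH (assumed)
    d_sh_ltc: int  # SH residents waiting for LTC (assumed)
    c_ltc: int     # LTC capacity component of the state
    c_sh: int      # SH capacity component of the state
    p: int         # patient/event indicator (assumed)

def action_space() -> range:
    """A = {0, .., max(TC_LTC, TC_SH)}."""
    return range(max(TC_LTC, TC_SH) + 1)
```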

Approximate Dynamic Programming
- γ: discount factor
- Goal: find π: S → A that maximizes the state-action value function Q(s,a)
- Bellman: there exists an optimal Q* with Q*(s,a) = max_π Q^π(s,a), and the optimal policy π* is greedy with respect to Q* (equation below)
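For reference, the corresponding Bellman optimality equation in the discounted setting (in the semi-MDP the effective discount additionally depends on the transition time d(s,a)):

\[
Q^*(s,a) = r(s,a) + \gamma \sum_{s'} \Pr(s,a,s')\,\max_{a'} Q^*(s',a'),
\qquad
\pi^*(s) = \arg\max_{a} Q^*(s,a).
\]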

Reinforcement Learning
[Diagram: state, action, and reinforcement signals in the reinforcement learning loop]

RL: environment
[Diagram: the environment, defined by its transition probabilities and reward function, receives an action and returns the state (next state) and immediate reward]

RL: agent
[Diagram: the agent receives the state and reward and selects an action. Knowledge: Q(s,a); Behavior: exploratory; Learning: update Q-values; Knowledge representation (FA): backup table or neural network; Learning method: Watkins QL or Sarsa]
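A minimal sketch of the tabular (backup-table) Watkins Q-learning update named in the diagram, with an ε-greedy exploratory behaviour policy. The env interface and the parameter values are illustrative assumptions, not the model from the talk.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, gamma=0.95, alpha=1e-3, epsilon=1.0):
    """Tabular Watkins Q-learning with an epsilon-greedy behaviour policy."""
    Q = defaultdict(float)  # backup table: (state, action) -> value estimate

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Exploration vs. exploitation: random action with probability epsilon.
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s_next, r, done = env.step(a)
            # Q-learning backup: bootstrap on the best action in the next state.
            target = r if done else r + gamma * Q[(s_next, greedy(s_next))]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```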

QL: parameters
- θ: number of hidden neurons
- T: number of iterations of the learning process
- α_0: initial value of the learning rate
- ε_0: initial value of the exploration rate
- Learning-rate decreasing function
- Exploration-rate decreasing function

QL: algorithm
[Diagram: exploration vs. exploitation trade-off, governed by the learning and exploration rates; parameters θ and T]
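One common way to implement the decreasing learning and exploration rates referenced here is a hyperbolic decay with time constants T_alpha and T_eps. This is a generic sketch, not necessarily the schedule used in the talk; the time-constant values are placeholders.

```python
def decayed(initial, time_constant, t):
    """Hyperbolic decay: the value halves when t reaches time_constant."""
    return initial * time_constant / (time_constant + t)

# Example with alpha_0 = 1e-3 and eps_0 = 1 (placeholder time constants).
alpha_t = decayed(1e-3, time_constant=2_000, t=500)
eps_t = decayed(1.0, time_constant=3_000, t=500)
print(alpha_t, eps_t)
```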

QL: tuning parameters (observed regularities)
1. (θ, k)-scheme (writing k for the second scheme parameter): T = k·10^4, α_0 = 10^-3, ε_0 = 1, T_α = k·10^3, T_ε = v·10^3, v ∈ [1, .., k]; PR(θ, k) is the best performance obtained with the (θ, k)-scheme
2. PR(θ, k) increases monotonically with respect to k up to a certain value k(θ)
3. PR(θ, k) increases monotonically with respect to θ up to a certain value θ(k)
4. k(θ) and θ(k) depend on the problem instance

QL: tuning parameters (methodology: learning schedule given PR_Heu)
1. ∆θ = 50, θ = 0, PR_θ = 0, k = 0, v_best = 1
2. While PR_θ < PR_Heu or no-stop:
   1. θ = θ + ∆θ, PR_best = 0
   2. While PR_best ≥ PR_k:
      1. k = k + 1, T = k·10^4, T_α = k·10^3, PR_k = PR_best
      3. For v = v_best to k: T_ε = v·10^3, PR[v] = Q-Learning(T, θ, 10^-3, 1, T_α, T_ε)
      4. [PR_best, v_best] = max(PR)
   3. PR_θ = PR_k
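A loose Python rendering of the schedule above, under the reading that k scales the iteration budget and the learning-rate time constant while v scales the exploration-rate time constant. run_q_learning, the guard limits, and the exact stopping conditions are assumptions, so treat this as a structural sketch only.

```python
def learning_schedule(run_q_learning, pr_heu, delta_theta=50, max_theta=1000, max_k=100):
    """Incremental search over hidden-neuron count (theta) and time-scale settings."""
    theta, pr_theta, v_best = 0, 0.0, 1
    while pr_theta < pr_heu and theta < max_theta:
        theta += delta_theta
        k, pr_k, pr_best = 0, 0.0, 0.0
        while pr_best >= pr_k and k < max_k:   # grow the time scales while they help
            k += 1
            pr_k = pr_best
            T, t_alpha = k * 10_000, k * 1_000
            results = {v: run_q_learning(T, theta, 1e-3, 1.0, t_alpha, v * 1_000)
                       for v in range(min(v_best, k), k + 1)}
            v_best, pr_best = max(results.items(), key=lambda item: item[1])
        pr_theta = pr_k
    return theta, v_best, pr_theta
```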

Discussion
- For given capacities, solve the SMDP with QL
- Model other LTC complexities: different facilities and room accommodations, client choice and level of care

Thank you for your attention. Questions?

Neural Network for Q(s,a)
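As a rough illustration of the idea on this closing slide, here is a feed-forward network with θ hidden neurons approximating Q(s,a) from a combined state-action feature vector. The architecture details (single sigmoid hidden layer, squared TD-error updates) are assumptions, not a description of the network used in the work.

```python
import numpy as np

class QNetwork:
    """One-hidden-layer network mapping a (state, action) feature vector to Q(s,a)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def predict(self, x):
        h = 1.0 / (1.0 + np.exp(-(self.w1 @ x + self.b1)))  # sigmoid hidden layer
        return self.w2 @ h + self.b2

    def update(self, x, target, alpha=1e-3):
        """One gradient step on the squared error (target - Q(x))^2."""
        h = 1.0 / (1.0 + np.exp(-(self.w1 @ x + self.b1)))
        q = self.w2 @ h + self.b2
        delta = target - q
        # Backpropagate through the output and hidden layers.
        self.w2 += alpha * delta * h
        self.b2 += alpha * delta
        grad_h = delta * self.w2 * h * (1.0 - h)
        self.w1 += alpha * np.outer(grad_h, x)
        self.b1 += alpha * grad_h
        return q
```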