1 Possible Roles for Reinforcement Learning in Clinical Research S.A. Murphy November 14, 2007.

Presentation transcript:

1 Possible Roles for Reinforcement Learning in Clinical Research S.A. Murphy November 14, 2007

2 Outline Goal: Improving Clinical Decision Support Systems Using Data –Clinical Decision Support Systems –Critical Decisions –Types of Data –Challenges Incomplete, primitive, mechanistic models Measures of Confidence –Clinical Trials

5 Questions Patient Evaluation Screen with MSE

7 Outline –Clinical Decision Support Systems –Critical Decisions –Types of Data –Challenges Incomplete mechanistic models Measures of Confidence –Clinical Trials

8 Critical Decisions Which treatments should be offered first? How long should we wait for these treatments to work? How long should we wait before offering a transition to a maintenance stage? Which treatments should be offered next?

9 Critical Decisions All of these questions relate to the formulation of a policy. Actions include medications, behavioral therapies, delivery mechanisms, monitoring. Observations include biological measures, family history, severity, side effects, functionality, symptoms. Rewards include functionality, side effects, symptoms.

10 Outline –Clinical Decision Support Systems –Critical Decisions –Types of Data –Challenges Incomplete mechanistic models Measures of Confidence –Clinical Trials

11 Types of Data
Large Observational Data Sets
–Actions are not manipulated by scientist
Clinical Trial Data
–Actions are manipulated by scientist
Bench research on cells/animals/humans

12 Clinical Trial Data Sets
Experimental trial data collected for research purposes
–Scientists decide proactively which data to collect and how to collect this data
–Use scientific knowledge to enhance the quality of the proxies for observation, reward
–Actions are manipulated (randomized) by scientist
–Short Horizon (less than 5)
–Hundreds of subjects.

13 Observational Data Sets
Observational data collected for research purposes
–Scientists decide proactively which data to collect and how to collect this data
–Use scientific knowledge to enhance the quality of the proxies for observation, action, reward
–Actions are not manipulated by scientist
–Moderate Horizon
–Hundreds to thousands of subjects.

14 Observational Data Sets
Clinical databases or registries (an example in the US would be the VA registries)
–Data was not collected for research purposes
–Use gross proxies to define observation, action, reward
–Moderate to Long Horizon
–Thousands to millions of subjects

15 Outline –Clinical Decision Support Systems –Critical Decisions –Types of Data –Challenges Incomplete mechanistic models Measures of Confidence –Clinical Trials

16 Availability of Mechanistic Models In many areas of RL, scientists can use mechanistic theory, e.g., physical laws, to model or simulate the interrelationships between observations and how the actions might impact the observations. Scientists know many (the most important) of the causes of the observations and know a model for how the observations relate to one another.

17 Incomplete Mechanistic Models in Medical Sciences Scientists who want to use data on individuals to construct policies must confront the fact that non-causal “associations” occur due to the unknown causes of the observations.

18 Conceptual Structure in the Medical Sciences (observational data)

19 Unknown, Unobserved Causes (Incomplete Mechanistic Models)

20 Unknown, Unobserved Causes (Incomplete Mechanistic Models)
Problem: Non-causal associations between treatment (here counseling) and rewards are likely.
Solutions:
–Collect clinical trial data in which treatments are randomized. This breaks the non-causal associations yet permits causal associations.
–Prior to applying methods to observational data, proactively brainstorm with domain experts to ascertain and measure the main determinants of treatment selection. Then take advantage of causal inference methods designed to minimize assumptions on the data.
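
A small simulation can illustrate why randomization breaks a non-causal association. The sketch below is not from the talk: it assumes a hypothetical unobserved cause `u` (say, motivation) that raises both the chance of choosing counseling and the reward, while counseling itself has no effect at all. Every name and number in it is an illustrative assumption.

```python
import random

def simulate(n, randomized, rng):
    """Return (treated, reward) pairs; the treatment has NO true effect."""
    rows = []
    for _ in range(n):
        u = rng.random()  # hypothetical unobserved cause, uniform on [0, 1)
        if randomized:
            treated = rng.random() < 0.5  # coin flip, independent of u
        else:
            treated = rng.random() < u    # higher u -> more likely to choose counseling
        reward = 10.0 * u + rng.gauss(0.0, 1.0)  # reward depends on u only
        rows.append((treated, reward))
    return rows

def diff_in_means(rows):
    """Mean reward of the treated minus mean reward of the untreated."""
    treated = [r for t, r in rows if t]
    control = [r for t, r in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(0)
observational_gap = diff_in_means(simulate(20000, False, rng))
randomized_gap = diff_in_means(simulate(20000, True, rng))
print(f"observational: {observational_gap:.2f}, randomized: {randomized_gap:.2f}")
```

Under these assumptions the observational comparison shows a sizeable gap even though the true effect is zero, while the randomized comparison stays close to zero.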

21 Unknown, Unobserved Causes (Incomplete Mechanistic Models)

22 Unknown, Unobserved Causes (Incomplete Mechanistic Models)

23 Unknown, Unobserved Causes (Incomplete Mechanistic Models)
The problem: Even when treatments are randomized, non-causal associations occur in the data.
Solutions:
–Recognize that parts of the Q-function/transition probabilities cannot be informed by domain expertise, as these parts reflect non-causal associations
–Or use methods for constructing policies that “average” over the non-causal associations between action and reward.
I think that the importance of this second causal inference problem depends on the kind of data and how you use it.

24 Measures of Confidence
–Measures of confidence are essential
Noisy data
Need to know when any one of a subset of actions will yield the best rewards, that is, when there is little or no evidence otherwise.
It is important to minimize the number of observations that must be collected in the clinical setting.

25 Measures of Confidence
We would like measures of confidence for the following:
–To compare the value of two estimated policies (both estimated using the training data).
–To assess if there is sufficient evidence that a particular observation (e.g. output of a biological test) should be part of the policy.
–To assess if there is sufficient evidence that a subset of the actions lead to better rewards for a given observation than the remaining actions.
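
As a rough illustration of what "the value of a policy" means with randomized data, the sketch below computes an inverse-probability-weighted estimate of mean reward for two fixed policies and takes their difference. This estimator is a swapped-in standard technique, not a method stated in the talk; the toy trial, the action names, and the randomization probability of 0.5 are all hypothetical. It yields a point estimate only, not a measure of confidence.

```python
def ipw_value(policy, data, prob_action):
    """Inverse-probability-weighted estimate of a policy's mean reward.

    data: list of (observation, action, reward) from a trial in which each
    action was randomized with known probability prob_action.
    """
    total = 0.0
    for obs, action, reward in data:
        if action == policy(obs):          # subject happened to follow the policy
            total += reward / prob_action  # reweight to stand in for those who did not
    return total / len(data)

# Hypothetical toy trial: two actions, each assigned with probability 0.5.
trial = [
    ("severe", "med", 6.0),
    ("severe", "therapy", 2.0),
    ("mild", "med", 3.0),
    ("mild", "therapy", 5.0),
]

def always_med(obs):
    return "med"

def tailored(obs):
    return "med" if obs == "severe" else "therapy"

value_gap = ipw_value(tailored, trial, 0.5) - ipw_value(always_med, trial, 0.5)
print(value_gap)
```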

26 Measures of Confidence I must both learn the policy and provide an evaluation of the policy using one data set. The data set is small.

27 Measures of Confidence Traditional methods for constructing measures of confidence require differentiability (if frequentist properties are desired). Q-functions are constructed via non-differentiable operations (e.g. maximization). The value of a policy is a non-differentiable function of the policy.
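
To show where the maximization enters, here is a minimal tabular sketch of two-stage Q-learning by backward induction. It assumes discrete actions and a single discrete intermediate observation, and it uses cell means in place of any regression model; the record layout and all names are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

def mean_by_key(pairs):
    """Average the values attached to each key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def two_stage_q(records):
    """records: (first_action, observation, second_action, reward) tuples."""
    # Stage 2: Q2 is the mean reward in each (a1, obs, a2) cell.
    q2 = mean_by_key(((a1, o, a2), r) for a1, o, a2, r in records)
    # Best stage-2 action and value per (a1, obs): this max is the non-smooth step.
    best2 = {}
    for (a1, o, a2), q in q2.items():
        if (a1, o) not in best2 or q > best2[(a1, o)][1]:
            best2[(a1, o)] = (a2, q)
    # Stage 1: average the maximized stage-2 value over subjects given a1.
    q1 = mean_by_key((a1, best2[(a1, o)][1]) for a1, o, _, _ in records)
    policy2 = {key: a2 for key, (a2, _) in best2.items()}
    return q1, policy2

# Hypothetical toy data, one subject per cell.
records = [
    ("A", "resp", "x", 4.0), ("A", "resp", "y", 2.0),
    ("A", "nonresp", "x", 1.0), ("A", "nonresp", "y", 3.0),
    ("B", "resp", "x", 5.0), ("B", "resp", "y", 5.0),
    ("B", "nonresp", "x", 0.0), ("B", "nonresp", "y", 1.0),
]
q1, policy2 = two_stage_q(records)
print(q1, policy2)
```

Because the stage-1 values are averages of maxima, a small change in the stage-2 cell means can switch which action attains the maximum, which is the source of the non-differentiability noted above.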

28 Outline –Clinical Decision Support Systems –Critical Decisions –Types of Data –Challenges Causal: unknown, unobserved causes Measures of Confidence –Clinical Trials

29 Clinical Trials
Data from the short-horizon clinical trials make excellent test beds for combinations of supervised/unsupervised and reinforcement learning methods.
–Developing methods for variable selection in decision making (in addition to variable selection for prediction)
–Model selection when goal is learning good policies.
–Confidence intervals for the difference in value between two policies.
–Feature Construction

30 ExTENd Ongoing study at U. Pennsylvania (D. Oslin). Goal is to learn how best to help alcohol-dependent individuals reduce alcohol consumption.

31 Oslin ExTENd
[Design diagram] Random assignment: Early Trigger for Nonresponse or Late Trigger for Nonresponse. At 8 wks:
–Response: random assignment to Naltrexone or TDM + Naltrexone
–Nonresponse: random assignment to CBI or CBI + Naltrexone

32 Adaptive Treatment for ADHD Ongoing study at the State U. of NY at Buffalo (B. Pelham). Goal is to learn how best to help children with ADHD improve functioning at home and school.

33 ADHD Study
Random assignment:
A. Begin low-intensity behavior modification. At 8 weeks, assess: adequate response?
–Yes: A1. Continue, reassess monthly; randomize if deteriorate
–No: random assignment:
A2. Add medication; bemod remains stable but medication dose may vary
A3. Increase intensity of bemod with adaptive modifications based on impairment
B. Begin low dose medication. At 8 weeks, assess: adequate response?
–Yes: B1. Continue, reassess monthly; randomize if deteriorate
–No: random assignment:
B2. Increase dose of medication with monthly changes as needed
B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment
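
The assignment flow in this design can be sketched as a short simulation. This is an illustration only: the response probability `p_response` is a made-up parameter, the arm labels simply reuse the A1 to B3 codes from the slide, and the "randomize if deteriorate" step for responders is left out.

```python
import random

def assign_participant(rng, p_response=0.5):
    """Simulate one participant's path through the two-stage design."""
    first = rng.choice(["A", "B"])         # initial random assignment
    responded = rng.random() < p_response  # adequate response at 8 weeks? (hypothetical rate)
    if responded:
        second = first + "1"               # continue and reassess monthly
    else:
        second = rng.choice([first + "2", first + "3"])  # re-randomize nonresponders
    return first, responded, second

rng = random.Random(1)
paths = [assign_participant(rng) for _ in range(200)]
print(paths[:5])
```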

34 Studies under review
H. Jones study of drug-addicted pregnant women (goal is to reduce cocaine/heroin use during pregnancy and thereby improve neonatal outcomes)
J. Sacks study of parolees with substance abuse disorders (goal is to reduce recidivism and substance use)

35 Jones’ Study for Drug-Addicted Pregnant Women
[Design diagram; recoverable labels: Random assignment; 2 wks; Response; Nonresponse; treatment arms rRBT, tRBT, aRBT, eRBT.]

36 Sacks’ Study of Adaptive Transitional Case Management
[Design diagram; recoverable labels: Random assignment; Standard Services; Standard TCM; 4 wks; Response; Nonresponse; Augmented TCM.]

37 Discussion
Methods for updating the policy online as data accumulates.
Methods for producing composite rewards.
–High quality elicitation of functionality
Human-Computer interface
Improving tactics

38 This seminar can be found at: seminars/UAlberta07.ppt Email me with questions or if you would like a copy:

39 Unknown, Unobserved Causes Problem: We recruit students via flyers posted in dormitories. Associations between observations and rewards are highly likely to be (due to the unknown causes) non-representative. Solution: Sample a representative group of college students.

40 STAR*D This trial is over and the data is being analyzed (PI: J. Rush). One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.
