# Confidence Intervals, Q-Learning and Dynamic Treatment Regimes (S.A. Murphy, Time for Causality, Bristol, April 2012)


1 Confidence Intervals, Q-Learning and Dynamic Treatment Regimes. S.A. Murphy. Time for Causality, Bristol, April 2012.

2 Outline

- Dynamic Treatment Regimes
- Example Experimental Designs & Challenges
- Q-Learning & Challenges

3 Dynamic treatment regimes are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. They operationalize clinical practice.

There are k stages for one individual: $X_j$ is the observation available at the jth stage, and $A_j$ is the action at the jth stage (usually a treatment).

4 Example of a Dynamic Treatment Regime: the Adaptive Drug Court Program for drug-abusing offenders. The goal is to minimize recidivism and drug use (Marlowe et al., 2008, 2011).

6 Goal (k = 2 stages): construct decision rules that take as input the information available at each stage and output a recommended decision. These decision rules should lead to a maximal mean Y, where Y is a function of the observations and actions. The dynamic treatment regime is a sequence of two decision rules, one for each stage.

7 Outline

- Dynamic Treatment Regimes
- Example Experimental Designs & Challenges
- Q-Learning & Challenges

8 Data for Constructing the Dynamic Treatment Regime: subject data from sequential, multiple assignment, randomized trials (SMARTs). At each stage subjects are randomized among alternative options. $A_j$ is a randomized action with known randomization probability; here the actions are binary with $P[A_j = 1] = P[A_j = -1] = 0.5$.
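The randomization described on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `randomize_actions`, the seeds and the sample size are assumptions made for the example.

```python
import numpy as np

def randomize_actions(n_subjects, seed=0):
    """Draw one action in {-1, +1} per subject, each with probability 0.5."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1, 1]), size=n_subjects, p=[0.5, 0.5])

# Hypothetical trial with 1000 subjects and two randomized stages.
a1 = randomize_actions(1000, seed=1)  # stage 1 actions
a2 = randomize_actions(1000, seed=2)  # stage 2 actions

# With the -1/+1 coding and equal probabilities, each action has mean near zero.
print(a1.mean(), a2.mean())
```

Coding the actions as -1/+1 with equal probabilities gives each treatment variable mean zero, which is the coding mentioned later in the talk when Q-learning is compared with structural nested mean models.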

9 Pelham's ADHD Study

Initial random assignment to A or B.

- A. Begin low-intensity behavior modification (bemod). After 8 weeks, assess: adequate response?
  - Yes: A1. Continue, reassess monthly; randomize if deteriorate.
  - No: random assignment to A2 or A3.
    - A2. Add medication; bemod remains stable but medication dose may vary.
    - A3. Increase intensity of bemod with adaptive modifications based on impairment.
- B. Begin low-dose medication. After 8 weeks, assess: adequate response?
  - Yes: B1. Continue, reassess monthly; randomize if deteriorate.
  - No: random assignment to B2 or B3.
    - B2. Increase dose of medication with monthly changes as needed.
    - B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment.

10 Oslin's ExTENd Study

Initial random assignment to an early or a late trigger for nonresponse, followed by 8 weeks of Naltrexone. Under either trigger:

- Response: random assignment to TDM + Naltrexone or Naltrexone.
- Nonresponse: random assignment to CBI or CBI + Naltrexone.

11 Kasari Autism Study

Initial random assignment to A or B.

- A. JAE+EMT. After 12 weeks, assess: adequate response?
  - Yes: JAE+EMT.
  - No: random assignment to JAE+EMT+++ or JAE+AAC.
- B. JAE+AAC. After 12 weeks, assess: adequate response?
  - Yes: B1. JAE+AAC.
  - No: B2. JAE+AAC++.

12 Jones' Study for Drug-Addicted Pregnant Women

Design figure: a SMART in which participants are assessed for response after 2 weeks and are randomly assigned among versions of RBT (rRBT, tRBT, aRBT, eRBT) according to response or nonresponse.

13 Challenges

- Goal of trial may differ
  - Experimental designs for settings involving new drugs (cancer, ulcerative colitis)
  - Experimental designs for settings involving already approved drugs/treatments
- Choice of Primary Hypothesis/Analysis & Secondary Hypothesis/Analysis
- Longitudinal/Survival Primary Outcomes
- Sample Size Formulae

14 Outline

- Dynamic Treatment Regimes
- Example Experimental Designs & Challenges
- Q-Learning & Challenges

15 Secondary Analysis: Q-Learning

- Q-Learning (Watkins, 1989; Ernst et al., 2005; Murphy, 2005), a popular method from computer science
- Optimal nested structural mean model (Murphy, 2003; Robins, 2004)

The first method is an inefficient version of the second method when (a) linear models are used, (b) each stage's covariates include the prior stages' covariates and (c) the treatment variables are coded to have conditional mean zero.

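16 Goal (k = 2 stages): construct the decision rules for which the mean of Y under the regime is maximal. The mean of Y under a regime is called the value of that regime; the maximal value is the value attained by the best regime.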

17 Idea behind Q-Learning

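18 Simple Version of Q-Learning

There is a regression for each stage.

- Stage 2 regression: regress Y on the stage 2 history and treatment to obtain the estimated stage 2 Q-function.
- Stage 1 regression: regress the predicted best stage 2 response on the stage 1 history and treatment to obtain the estimated stage 1 Q-function.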

19 For subjects entering stage 2, the stage 1 dependent variable is the maximum of the estimated stage 2 Q-function over the stage 2 treatments. It is the predicted end-of-stage-2 response when the stage 2 treatment is equal to the "best" treatment, and it is the dependent variable in the stage 1 regression for patients moving to stage 2.

20 A Simple Version of Q-Learning: the stage 2 regression (using Y as the dependent variable) yields the estimated stage 2 Q-function. The arg-max over $a_2$ yields the estimated stage 2 decision rule.

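21 A Simple Version of Q-Learning: the stage 1 regression (using the predicted best stage 2 response as the dependent variable) yields the estimated stage 1 Q-function. The arg-max over $a_1$ yields the estimated stage 1 decision rule.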
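The two regressions can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the linear feature sets in `q2_features` and `q1_features`, the simulated data and all function names are hypothetical, every subject is assumed to enter stage 2, and actions are coded -1/+1.

```python
import numpy as np

def fit_ols(design, y):
    """Least-squares coefficients for a linear regression of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def q2_features(x1, a1, x2, a2):
    # Assumed stage 2 model: intercept, x1, a1, x2, a2, and the a2*x2 interaction.
    return np.column_stack([np.ones_like(x1), x1, a1, x2, a2, a2 * x2])

def q1_features(x1, a1):
    # Assumed stage 1 model: intercept, x1, a1, and the a1*x1 interaction.
    return np.column_stack([np.ones_like(x1), x1, a1, a1 * x1])

def q_learning(x1, a1, x2, a2, y):
    # Stage 2 regression: regress Y on stage 2 history and treatment.
    beta2 = fit_ols(q2_features(x1, a1, x2, a2), y)
    # Predicted response under each stage 2 treatment; keep the larger one.
    plus = np.ones_like(x1)
    q2_if_plus = q2_features(x1, a1, x2, plus) @ beta2
    q2_if_minus = q2_features(x1, a1, x2, -plus) @ beta2
    y_tilde = np.maximum(q2_if_plus, q2_if_minus)
    # Stage 1 regression: the dependent variable is the predicted best response.
    beta1 = fit_ols(q1_features(x1, a1), y_tilde)
    return beta1, beta2

def stage2_rule(beta2, x2):
    # Arg-max over a2: the sign of the estimated stage 2 treatment effect.
    return np.where(beta2[4] + beta2[5] * x2 >= 0, 1, -1)

def stage1_rule(beta1, x1):
    # Arg-max over a1: the sign of the estimated stage 1 treatment effect.
    return np.where(beta1[2] + beta1[3] * x1 >= 0, 1, -1)

# Simulated two-stage trial (hypothetical data-generating model).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
a1 = rng.choice([-1.0, 1.0], size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
a2 = rng.choice([-1.0, 1.0], size=n)
y = x1 + 0.5 * a1 + a2 * x2 + rng.normal(size=n)

beta1, beta2 = q_learning(x1, a1, x2, a2, y)
print("stage 2 coefficients:", np.round(beta2, 2))
print("stage 1 coefficients:", np.round(beta1, 2))
```

In this simulated model the best stage 2 treatment is the sign of `x2`, so `stage2_rule(beta2, x2)` should agree with `np.sign(x2)` for most subjects.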

22 Confidence Intervals

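23 Non-regularity

- The limiting distribution of the stage 1 estimator is non-regular (Robins, 2004).
- The problematic area in the parameter space is around stage 2 parameter values for which the stage 2 treatment effect is zero with positive probability.
- Standard asymptotic approaches are invalid without modification (see Shao, 1994; Andrews, 2000).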

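24 Non-regularity: the problematic term in the stage 1 estimator comes from the maximization over the stage 2 treatment. This term is well-behaved if the stage 2 treatment effect is bounded away from zero. We want to form an adaptive confidence interval.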
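A toy simulation can make the non-regularity concrete. With actions coded -1/+1, maximizing a linear model m + a·c over a gives m + |c|, so an absolute value of an estimated effect enters the stage 1 analysis. The sketch below is an illustration under assumptions, not the estimator from the talk: it studies the simpler problem of estimating |μ| by the absolute value of a sample mean, and the function name, sample size and number of replications are hypothetical.

```python
import numpy as np

def scaled_errors(mu, n=200, reps=5000, seed=0):
    """Simulate sqrt(n) * (|sample mean| - |mu|) for N(mu, 1) data."""
    rng = np.random.default_rng(seed)
    xbar = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)
    return np.sqrt(n) * (np.abs(xbar) - abs(mu))

err_zero = scaled_errors(0.0)  # effect exactly zero: the problematic point
err_far = scaled_errors(1.0)   # effect bounded away from zero

# At mu = 0 the scaled error is never negative, so it cannot be centered normal;
# at mu = 1 it is approximately standard normal with mean near zero.
print("mu = 0: mean", err_zero.mean(), "min", err_zero.min())
print("mu = 1: mean", err_far.mean())
```

The limiting distribution changes abruptly between the two cases, which is why a single normal approximation cannot serve both.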

25 Idea from Econometrics

- In nonregular settings in econometrics there is a fixed number of easily identified "bad" parameter values at which the estimator behaves nonregularly.
- Use a pretest (e.g. a hypothesis test) to test whether you are near a "bad" parameter value. If the pretest rejects, use standard critical values to form the confidence interval; if the pretest accepts, use the maximal critical value over all possible local alternatives (Andrews & Soares, 2007; Andrews & Guggenberger, 2009).
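The pretest recipe can be sketched on a toy problem. This is an illustration under assumptions, not the procedure of the cited papers or of the talk: the target is |μ| estimated by the absolute value of a sample mean from unit-variance data, the "bad" value is μ = 0, the pretest cutoff sqrt(log n) is a hypothetical choice, and the local alternatives μ = h/sqrt(n) are searched over a finite grid.

```python
import numpy as np

def pretest_critical_values(xbar, n, alpha=0.05, cutoff=None, n_draws=20000, seed=0):
    """Lower and upper critical values for sqrt(n) * (|xbar| - |mu|)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_draws)
    probs = [alpha / 2, 1 - alpha / 2]
    if cutoff is None:
        cutoff = np.sqrt(np.log(n))  # hypothetical pretest cutoff
    if np.sqrt(n) * abs(xbar) > cutoff:
        # Pretest rejects "mu is near 0": use standard normal critical values.
        lo, hi = np.quantile(z, probs)
        return float(lo), float(hi)
    # Pretest accepts: take the most extreme quantiles over local alternatives
    # mu = h / sqrt(n), where the scaled error behaves like |Z + h| - |h|.
    # By symmetry of Z it is enough to search h >= 0.
    grid = np.linspace(0.0, 5.0, 51)
    q = np.array([np.quantile(np.abs(z + h) - h, probs) for h in grid])
    return float(q[:, 0].min()), float(q[:, 1].max())

far = pretest_critical_values(xbar=1.0, n=100)    # pretest rejects
near = pretest_critical_values(xbar=0.05, n=100)  # pretest accepts
print("far from 0:", far)
print("near 0:", near)
```

Near the bad value the upper critical value is larger than the standard one, so the resulting interval is more conservative there and reverts to the usual interval elsewhere.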

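26 Our Approach

- Construct smooth upper and lower bounds on the quantity of interest that hold for all n.
- The upper and lower bounds use a pretest: embedded in the formula for the estimator is a pretest, based on a test statistic, of whether the stage 2 treatment effect is zero.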

27 Non-regularity The upper bound adds to :

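28 Confidence Interval

The interval is formed from the bootstrap analogues of the upper and lower bounds, with probabilities taken with respect to the bootstrap weights: one endpoint uses a quantile of the bootstrap upper bound and the other a quantile of the bootstrap lower bound.

Theorem: Assume moment conditions and invertible variance-covariance matrices; then the resulting interval is asymptotically valid.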

29 Adaptation

Theorem: Under finite moment conditions and invertible variance-covariance matrices, the quantities being compared each converge in distribution to the same limiting distribution (the last two in probability).

30 Empirical Study

Some competing methods:

- Soft-thresholding (ST), Chakraborty et al. (2009)
- Centered percentile bootstrap (CPB)
- Plug-in pretesting estimator (PPE)

Generative models:

- Nonregular (NR)
- Nearly nonregular (NNR)
- Regular (R)

Settings: n = 150, 1000 Monte Carlo replications, 1000 bootstrap samples.
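For orientation, one of the competing methods, the centered percentile bootstrap, can be sketched generically. This is a minimal sketch assuming the usual construction (quantiles of the bootstrap estimate minus the original estimate, reflected around the original estimate) applied to an arbitrary estimator; it is not the code used in the empirical study, and the function name and example data are hypothetical.

```python
import numpy as np

def centered_percentile_bootstrap(data, estimator, alpha=0.05, n_boot=1000, seed=0):
    """Interval from quantiles of (bootstrap estimate - original estimate)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    theta_hat = estimator(data)
    centered = np.empty(n_boot)
    for b in range(n_boot):
        # Resample subjects with replacement and re-estimate.
        resample = data[rng.integers(0, n, size=n)]
        centered[b] = estimator(resample) - theta_hat
    lo_q, hi_q = np.quantile(centered, [alpha / 2, 1 - alpha / 2])
    # Reflect the centered quantiles around the original estimate.
    return theta_hat - hi_q, theta_hat - lo_q

# Hypothetical example: interval for a mean with n = 150 and 1000 bootstrap samples.
rng = np.random.default_rng(1)
sample = rng.normal(loc=1.0, scale=1.0, size=150)
ci = centered_percentile_bootstrap(sample, np.mean)
print(ci)
```

For a smooth estimator such as a mean this interval behaves well; the tables that follow report how it performs in the nonregular Q-learning settings.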

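31 Example: two stages, two treatments per stage. Entries are size (width).

| Type | CPB | ST | PPE | ACI |
|---|---|---|---|---|
| NR | .93* | .95 (.34) | .93* | .99 (.50) |
| R | .94 (.45) | .92* | .93* | .95 (.48) |
| NR | .93* | .76* | .90* | .96 (.49) |
| NNR | .93* | .76* | .90* | .97 (.49) |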

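32 Example: two stages, three treatments in stage 2. Entries are size (width).

| Type | CPB | PPE | ACI |
|---|---|---|---|
| NR | .93* | | 1.0 (.72) |
| R | .94 (.56) | .92* | .96 (.63) |
| NR | .89* | .86* | .97 (.67) |
| NNR | .90* | .86* | .97 (.67) |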

33 Pelham's ADHD Study: the SMART design from slide 9. Children are initially randomized to low-dose medication or low-intensity behavior modification; after 8 weeks, responders continue and non-responders are re-randomized to intensifying the current treatment or adding the other treatment.

34 ADHD Example

Data for each child: $(X_1, A_1, R_1, X_2, A_2, Y)$

- $Y$ = end-of-year school performance
- $R_1 = 1$ if responder; $= 0$ if non-responder
- $X_2$ includes the month of non-response, $M_2$, and a measure of adherence in stage 1, $S_2$
  - $S_2 = 1$ if adherent in stage 1; $= 0$ if non-adherent
- $X_1$ includes baseline school performance, $Y_0$; whether medicated in prior year, $S_1$; and ODD, $O_1$
  - $S_1 = 1$ if medicated in prior year; $= 0$ otherwise

35 Stage 2 regression for Y: Stage 1 outcome: ADHD Example

36 ADHD Example: stage 1 regression for the stage 1 outcome. Interesting stage 1 contrast: is it important to know whether the child was medicated in the prior year ($S_1 = 1$) to determine the best initial treatment in the sequence?

37 ADHD Example

| Stage 1 treatment effect | 90% ACI |
|---|---|
| when $S_1 = 1$ | (-0.48, 0.16) |
| when $S_1 = 0$ | (-0.05, 0.39) |

38 Dynamic Treatment Regime Proposal

- IF medication was not used in the prior year, THEN begin with BMOD; ELSE select either BMOD or MED.
- IF the child is nonresponsive and was non-adherent, THEN augment present treatment; ELSE IF the child is nonresponsive and was adherent, THEN select intensification of current treatment.
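The proposed regime is a pair of decision rules, so it can be written as two small functions. This is a sketch for illustration only: the function names, argument names and string labels are hypothetical, and the "continue" branch for responders is taken from the study design on slide 9 rather than from this slide.

```python
def initial_treatment(medicated_prior_year: bool) -> set:
    """Stage 1 rule: the set of recommended initial treatments."""
    if not medicated_prior_year:
        return {"BMOD"}
    # The proposal does not distinguish the two options in this case.
    return {"BMOD", "MED"}

def second_stage_action(responder: bool, adherent: bool) -> str:
    """Stage 2 rule: what to do after the response assessment."""
    if responder:
        # Assumption from the study design: responders continue their treatment.
        return "continue current treatment"
    if not adherent:
        return "augment present treatment"
    return "intensify current treatment"

# Example: a child not medicated in the prior year who is nonresponsive but adherent.
print(initial_treatment(medicated_prior_year=False))
print(second_stage_action(responder=False, adherent=True))
```

Returning a set from the first rule keeps the "either BMOD or MED" recommendation explicit instead of forcing an arbitrary choice.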

39 Challenges

- There are multiple ways to form the stage 1 dependent variable; what are the pros and cons?
- Improve adaptation by a pretest
- High dimensional data; investigators want to collect real time data
- Feature construction & feature selection
- Many stages or infinite horizon

40 This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/Bristol04.10.12.ppt

Email Eric Laber or me with questions: laber@stat.ncsu.edu or samurphy@umich.edu
