Variable Selection for Tailoring Treatment

Presentation transcript:

Variable Selection for Tailoring Treatment. L. Gunter, J. Zhu & S.A. Murphy. ASA, Nov 11, 2008.

Abstract. Susan A. Murphy, Lacey Gunter, Ji Zhu, University of Michigan, Ann Arbor, Michigan, United States. In order to tailor treatment to individuals we should collect pretreatment variables that are useful in deciding which treatment to provide to whom. To decide which variables are most likely to be useful in the future, we might use a combination of theory and statistical variable selection methods with presently available data. However, most current variable selection methods are focused on finding risk and protective variables. While these variables may be useful in predicting whether an individual needs treatment, they do not necessarily tell us which treatment to provide. We will discuss the necessary characteristics of variables that are useful for tailoring treatment. Our method searches through pretreatment variables in order to find those variables that satisfy these characteristics. We apply this method to ascertain which variables collected in a clinical trial of two depression treatments might be useful in tailoring the type of treatment to the individual.

Outline: Motivation; Need for Variable Selection; Characteristics of a Tailoring Variable; A New Technique for Finding Tailoring Variables; Comparisons; Discussion

Motivating Example

Simple Example: Nefazodone - CBASP Trial. Patients were randomized to Nefazodone or to Nefazodone + Cognitive Behavioral Analysis System of Psychotherapy (CBASP). 50+ baseline covariates, both categorical and continuous.

From Wikipedia: Nefazodone is not considered to be an SSRI, MAOI or tricyclic antidepressant. It is not chemically related to either bupropion/amfebutamone, or venlafaxine. Nefazodone hydrochloride (trade name Serzone) is an antidepressant drug marketed by Bristol-Myers Squibb. Its sale was discontinued in 2003 in some countries, due to the small possibility of hepatic (liver) injury, which could lead to the need for a liver transplant, or even death. The incidence of severe liver damage is approximately one in 250,000 to 300,000 patient-years. On May 20, 2004, Bristol-Myers Squibb discontinued the sale of Serzone in the United States. Several generic formulations of nefazodone are still available.

Simple Example: Nefazodone - CBASP Trial. Which variables in X are important for tailoring the treatment?
X: patient’s medical history, severity of depression, current symptoms, etc.
A: Nefazodone OR Nefazodone + CBASP
R: depression symptoms post treatment; R is inverse-coded HAMD (coded so high is good)

Optimization: We want to select the treatment that “optimizes” R. The optimal choice of treatment may depend on X.

Optimization The optimal treatment(s) is given by The value of d is
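The two formulas on this slide did not survive in the transcript. A standard way to write them, assuming d denotes a decision rule that maps the pretreatment variables X to a treatment (the symbols d*, V_d and the exact form are assumptions, not recovered from the slide), would be:

```latex
d^{*}(x) \in \arg\max_{a} E[R \mid X = x, A = a],
\qquad
V_{d} = E\big[\, E[R \mid X, A = d(X)] \,\big]
```

Under this reading, the value of a rule is the mean response that would be seen if every patient were treated according to that rule.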

Need for Variable Selection: In clinical trials many pretreatment variables are collected to improve understanding and inform future treatment. Yet in clinical practice, only the most informative variables for tailoring treatment can be collected. A combination of theory, clinical experience and statistical variable selection methods can be used to determine which variables are important. Monetary cost and the burden on clinical staff and patients require that only a few variables be used in tailoring.

Current Statistical Variable Selection Methods: Current statistical variable selection methods focus on finding good predictors of the response. We also need variables that help determine which treatment is best for which types of patients, i.e. tailoring variables. Experts typically have knowledge of which variables are good predictors, but intuition about tailoring variables is often lacking. Tailoring variables = prescriptive variables.

What is a Tailoring Variable? Tailoring variables help us determine which treatment is best. Tailoring variables qualitatively interact with the treatment; different values of the tailoring variable result in different best treatments. [Figure: three panels showing No Interaction, Non-qualitative Interaction and Qualitative Interaction; high R is good.]

Qualitative Interactions: Qualitative interactions have been discussed by many within the statistical literature (e.g. Byar & Corle, 1977; Peto, 1982; Shuster & Van Eys, 1983; Gail & Simon, 1985; Yusuf et al., 1991; Senn, 2001; Lagakos, 2001). Many express skepticism concerning the validity of qualitative interactions when found in studies. Our approach for finding qualitative interactions should be robust against spurious findings.

Skepticism is due to:
- Rarity of qualitative interactions, especially in drug trials with a very homogeneous sample (at least in a one-time-point setting; this may be less so in multiple-time-point settings when gathering intermediate outcomes); indeed the sample does not represent a well-defined population.
- Data fishing without controlling the family-wise error rate.
- Tendency of journals to publish only significant results.

Qualitative Interactions: We focus on two important factors:
- The magnitude of the interaction between the variable and the treatment indicator
- The proportion of patients for whom the best choice of treatment changes given knowledge of the variable
[Figure: three panels: big interaction with big proportion; small interaction with big proportion; big interaction with small proportion. Green curves represent the variable distribution.]

Ranking Score S. Ranking score: S estimates the quantity described by Parmigiani (2002) as the value of information. The fits shown are linear regressions. [Figure: green ticks represent observations; the yellow shaded area represents the area the S-score is estimating.]
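The formula for S is missing from the transcript, so the following is only a sketch of one score in the spirit of the description above (a value-of-information quantity built from linear regressions). It assumes a binary treatment coded 0/1, fits R on (1, X_j, A, X_j*A) by least squares, and sums over patients the gain from choosing the treatment using X_j rather than giving everyone the single treatment with the best average response. The function name `s_score` and this exact form are assumptions, not the authors' definition.

```python
import numpy as np

def s_score(x, a, r):
    """Hypothetical S-like score for one candidate variable x (assumed form)."""
    x, a, r = (np.asarray(v, dtype=float) for v in (x, a, r))
    # Linear regression of R on intercept, x, treatment and their interaction.
    Z = np.column_stack([np.ones_like(x), x, a, x * a])
    beta, *_ = np.linalg.lstsq(Z, r, rcond=None)
    # Predicted mean response for every patient under each treatment.
    pred = {t: beta[0] + beta[1] * x + beta[2] * t + beta[3] * x * t
            for t in (0.0, 1.0)}
    # Treatment that is best on average when x is ignored (sample means).
    a_star = max((0.0, 1.0), key=lambda t: r[a == t].mean())
    # Gain from tailoring on x, summed over patients.
    best = np.maximum(pred[0.0], pred[1.0])
    return float(np.sum(best - pred[a_star]))
```

With this form the score is zero whenever one treatment is predicted to be best for every observed value of the variable, and grows with both the size of the crossing interaction and the share of patients on the "other" side of the crossing.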

Ranking Score S: Higher S scores correspond to stronger evidence of a qualitative interaction between X and A. We use this ranking in a variable selection algorithm to select important tailoring variables. The algorithm is designed to avoid over-fitting due to the large number of X variables and to consider variables jointly.

Notes: You can’t use the ranking score by itself, as it treats each variable in isolation. X1 may only be a useful tailoring variable if tailoring variable X2 is not collected. Also, there are so many pretreatment variables that we will overfit the model. Overfitting the model means that we get a very good model fit on a particular data set but we cannot replicate our result, as much of our model is actually fit to noise specific to the data set we are using. This problem arises often when one has many covariates. We don’t want to say a variable is a potential tailoring variable if S is high just due to the noise in the data (not due to underlying structure). The following algorithm helps us avoid overfitting.

In Latex code: \begin{eqnarray} U_j&=&\left(\frac{D_j - \min_{1\leq k\leq p} D_k} {\max_{1\leq k\leq p} D_k-\min_{1\leq k\leq p} D_k}\right)\left(\frac{P_j - \min_{1\leq k\leq p} P_k} {\max_{1\leq k\leq p} P_k - \min_{1\leq k\leq p} P_k}\right) \end{eqnarray}
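The U score in the LaTeX above is a product of two min-max rescaled quantities. A minimal sketch of that arithmetic follows. It assumes D_j and P_j are per-variable numbers already computed (presumably the interaction magnitude and the proportion of patients whose best treatment changes, though the notes do not define them) and that neither list is constant. The function name is hypothetical.

```python
def u_scores(D, P):
    """U_j = rescaled D_j times rescaled P_j, each rescaled to [0, 1]."""
    def rescale(values):
        lo, hi = min(values), max(values)
        # Assumes hi > lo; a constant list would divide by zero.
        return [(v - lo) / (hi - lo) for v in values]
    return [d * p for d, p in zip(rescale(D), rescale(P))]

# Example: three candidate variables.
print(u_scores([1.0, 2.0, 3.0], [0.1, 0.5, 0.3]))
```

Because of the product, a variable scores high only if it is high on both quantities; the variable with the smallest D or the smallest P always gets U = 0.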

Variable Selection Algorithm (steps 1 and 2)
1. Select important predictors of R from (X, X*A) using Lasso.
   -- Select the tuning parameter using BIC.
2. Select all X*A variables with nonzero S.
   -- Use the predictors from step 1 to form the linear regression estimator used to compute S.

Notes: For step 1 we used Lasso with the penalty parameter chosen by BIC. We chose the Bayesian Information Criterion to select the penalty parameter (Zou, Hastie and Tibshirani, 2007) because of its conservative nature, to ensure only strong predictors enter the model. We used BIC Lasso over CV Lasso because CV Lasso selects too many variables; the estimated value, or probably more precisely the estimated optimal policy, seems to be very sensitive to the inclusion of spurious interactions. CV Lasso tends to include several spurious interactions, which when used in step 1 caused the performance of the U and S scores to suffer noticeably. But we still wanted to include interactions when selecting predictive variables, so we had to use a method in step 1 that would be far more conservative in its selection of interaction variables.

For step 2 we included the variables selected in step 1 to decrease the variability of the estimates. S is calculated one interaction variable at a time; we look for qualitative interactions individually, using an approach (based on linear models) that rates each variable in X on its potential for a qualitative interaction with the action.

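Lasso: Lasso on (X, A, XA) (Tibshirani, 1996). Lasso minimization criterion, in Latex code: \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (R_i - Z_i^T \beta)^2 + \lambda \sum_{j} |\beta_j|, where Z_i is the vector of predictors for patient i and \lambda is a penalty parameter. The coefficient for A is not penalized. The value of \lambda is chosen by the Bayesian Information Criterion (BIC) (Zou, Hastie & Tibshirani, 2007).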
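To make the Lasso step concrete, here is a small sketch of a Lasso with per-coefficient penalty weights (weight 0 leaves the treatment A unpenalized) and the penalty chosen by BIC. It is an illustration under stated assumptions, not the authors' code: it uses plain coordinate descent, scales the squared-error term by 1/(2n), takes the degrees of freedom in BIC to be the number of nonzero coefficients (the idea in Zou, Hastie & Tibshirani, 2007), and searches a user-supplied grid of penalties. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso(Z, r, lam, w, n_sweeps=200):
    """Minimize (1/2n)*sum_i (r_i - b0 - Z_i b)^2 + lam * sum_j w_j*|b_j|.

    w_j = 0 leaves column j unpenalized (used here for the treatment A).
    """
    n, p = Z.shape
    b, b0 = np.zeros(p), r.mean()
    col_ms = (Z ** 2).mean(axis=0)  # mean square of each column
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with column j's current contribution added back.
            partial = r - b0 - Z @ b + Z[:, j] * b[j]
            b[j] = soft_threshold(Z[:, j] @ partial / n, lam * w[j]) / col_ms[j]
        b0 = (r - Z @ b).mean()  # unpenalized intercept
    return b0, b

def bic_lasso(Z, r, w, lams):
    """Fit the weighted Lasso on a grid of penalties; keep the smallest BIC."""
    n = len(r)
    best = None
    for lam in lams:
        b0, b = weighted_lasso(Z, r, lam, w)
        rss = float(np.sum((r - b0 - Z @ b) ** 2))
        bic = n * np.log(rss / n) + np.log(n) * np.count_nonzero(b)
        if best is None or bic < best[0]:
            best = (bic, lam, b0, b)
    return best  # (bic, lam, intercept, coefficients)
```

A call would build Z by stacking X, A and the X*A products column-wise, set the weight of the A column to 0 and all others to 1, and read the selected variables off the nonzero entries of the returned coefficient vector.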

Variable Selection Algorithm (step 3)
3. Rank order the (X, X*A) variables selected in steps 1 & 2 using a weighted Lasso.
   -- The weight is 1 if the variable is not an interaction.
   -- Otherwise the weight for the kth interaction is w_k = 1 - S_k/(max(S) + epsilon).
   -- epsilon is a small positive number.
   -- Produces a combined ranking of the selected (X, X*A) variables (say p variables).

Notes: In experimentation we found that setting epsilon = H/n worked well, where H is the number of interactions with a non-zero S score (all variables which indicate that different subjects should get different treatments) and n is the sample size. For step 3 we used a weighted Lasso to obtain a ranking over the variables in steps 1 and 2 to create the nested subsets. The weighting scheme gave main effects a weight of 1 and gave interactions a weight between 0 and 1 that was a non-increasing function of the ranking score S. We need step 3 because step 2 just looks at interaction variables individually. It may be that once variable j is used, we no longer need variable k. Step 3 forms a combined ranking that takes this into account.

W = 1 if the variable is not an interaction; W = 1 - (S)/[max(S) + H/n]. In Latex code: w = 1-\frac{S}{max(S)+\frac{H}{n}}

For step 4 our criterion was the AGV criterion, where in Latex code: \begin{equation*} AGV_k=\frac{(\hat{V}_k-\hat{V}_0)/k} {(\hat{V}_{m}-\hat{V}_0)/m}, \end{equation*} for $k = 1,...,m$, where $m = \argmax_k \hat{V}_k- \hat{V}_0$ and $\hat{V}_0$ is the estimated Value of the policy $\hat{\pi}^*_0 = \argmax_a
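The step 3 weights given in the notes, W = 1 - S/[max(S) + H/n] with H the number of interactions that have a non-zero S score, can be computed as in the short sketch below. The function name is hypothetical, and returning all-ones weights when every S score is zero is an added assumption to avoid dividing zero by zero.

```python
def interaction_weights(S, n):
    """Weights for interaction terms in the weighted Lasso of step 3.

    S: ranking scores of the interaction variables; n: sample size.
    Main effects are not handled here; their weight is simply 1.
    """
    H = sum(1 for s in S if s != 0)   # interactions with non-zero S
    if H == 0:
        return [1.0] * len(S)         # assumption: nothing to favour
    eps = H / n
    top = max(S)
    return [1 - s / (top + eps) for s in S]

# Example: three interactions, n = 100 patients.
print(interaction_weights([0.0, 2.0, 4.0], 100))
```

A smaller weight means a lighter penalty, so interactions with high S enter the Lasso path earlier and end up higher in the combined ranking; epsilon keeps even the top-scoring interaction's weight strictly positive.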

Variable Selection Algorithm (step 4)
4. Choose between variable subsets using a criterion that trades off maximal value of information and complexity.
   -- The ordering of the p variables creates p subsets of variables. Estimate the value of information for each of the p subsets.
   -- Select the subset k with the largest value of a criterion that trades off the complexity and the observed mean response (unconditional mean response) of each of the models.

Notes: The criterion is similar in idea to the adjusted R^2 value. The model with j* = arg max_k V_k variables is akin to a saturated model, because the addition of more variables does not improve the Value of the model. Thus the denominator is the observed maximum gain in Value among the different variable subsets, divided by j*, an estimate of the degrees of freedom used to achieve that gain in Value. The numerator then measures the gain in Value of the intermediate model, the model with k variables, divided by k, the estimated degrees of freedom needed to achieve that gain in Value.
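The trade-off described above can be illustrated with the AGV formula from the speaker notes, AGV_k = ((V_k - V_0)/k) / ((V_m - V_0)/m) with m the subset size giving the largest gain. The sketch below assumes the estimated Values of the nested subsets are already available as a list whose first entry is V_0, and that at least one subset improves on V_0; how the Values themselves are estimated is not shown. The function name is hypothetical.

```python
def agv_select(values):
    """Return (AGV_1..AGV_m, selected k) from [V_0, V_1, ..., V_p]."""
    v0 = values[0]
    gains = [v - v0 for v in values[1:]]          # gains[k-1] = V_k - V_0
    m = max(range(1, len(gains) + 1), key=lambda k: gains[k - 1])
    if gains[m - 1] <= 0:
        raise ValueError("no subset improves on V_0")
    denom = gains[m - 1] / m                      # max gain per variable
    agv = [(gains[k - 1] / k) / denom for k in range(1, m + 1)]
    best_k = max(range(1, m + 1), key=lambda k: agv[k - 1])
    return agv, best_k

# Example: most of the gain comes from the first variable.
agv, k = agv_select([10.0, 12.0, 13.0, 13.2, 13.1])
print(agv, k)
```

In this toy example the first variable delivers 2.0 of the 3.2 maximum gain, so its gain per variable is highest and the one-variable subset is chosen.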

Simulations: Data were simulated under a wide variety of realistic decision-making scenarios (with and without qualitative interactions). We used X from the CBASP study and generated new A and R. We compared:
- New method: S with the variable selection algorithm
- Standard method: BIC Lasso on (X, A, XA)
1000 simulated data sets: we recorded the percentage of time each variable’s interaction with treatment was selected by each method.

Data generation: we randomly selected rows with replacement from the observation matrix of the Nefazodone CBASP trial data. We then generated new actions and new responses. For each generative model, we used main effect coefficients for X estimated in an analysis of the real data set. Interaction variables were randomly selected. The treatment, qualitative interaction and non-qualitative interaction coefficients were set using a variant of Cohen's D (in Latex code): \begin{equation} D = \frac{\beta \sqrt{Var(R|X,A)}}{\sqrt{Var(X_j)}} \end{equation} We maintained the definitions of `small' and `moderate' effect sizes suggested by Cohen as D = 0.2 and D = 0.5 respectively.

Simulation Results

Generative Model | Ave # of Spurious Interactions Selected over BIC LASSO | Ave % increase in Value over BIC LASSO*
No Interactions | 0.5 | -0.03
Non-qualitative Interactions Only | 0.1 | 0.00
Qualitative Interaction Only | 1.1 | 0.23
Both Qualitative and Non-qualitative Interactions | 0.2 | 0.39
* Over the total possible increase; 1000 data sets, each of size 440.

Notes: The interaction effect is always small as defined by Cohen’s d: beta = .2 * std(x_j)/residual std. The table uses generative models 2, 3, 5 and 6:
(2) Main effects of X, moderate treatment effect and no interactions with treatment
(3) Main effects of X, moderate treatment effect, multiple medium to small non-qualitative interactions with treatment, no qualitative interaction with treatment
(5) Main effects of X, small treatment effect, small qualitative interaction with a continuous variable, no non-qualitative interactions
(6) Main effects of X, small treatment effect, multiple moderate to small non-qualitative interactions with treatment, small to moderate qualitative interaction with a binary variable and treatment

When there are both qualitative and non-qualitative interactions, BIC Lasso picks up the non-qualitative interactions. All "Ave % increase in E[R] over BIC LASSO" values are significant except for the non-qualitative-interactions-only model.

There are 1000 simulated data sets, each of size n=440 like the Nefazodone CBASP data set. The criterion for selecting a tailoring variable depends on the method. For Lasso it is the variables with non-zero coefficients for the chosen penalty parameter. For the new method S it is the variables in the subset chosen by the AGV criterion. A chosen interaction is deemed spurious if it was not in the generative model. I did not test for significance in the average difference in spurious variables selected (I'm pretty sure they are probably all significant).

Simulation Results Pros: when the model contained qualitative interactions, the new method gave significant increases in expected response over BIC-Lasso Cons: the new method resulted in a slight increase in the number of spurious interactions over BIC-Lasso

Nefazodone - CBASP Trial: Aim of the Nefazodone CBASP trial: to compare the efficacy of three alternate treatments for major depressive disorder (MDD):
- Nefazodone
- Cognitive behavioral-analysis system of psychotherapy (CBASP)
- Nefazodone + CBASP
Which variables might help tailor the depression treatment to each patient?

Nefazodone - CBASP Trial: For our analysis we used data from 440 patients with
X: 61 baseline variables
A: Nefazodone vs. Nefazodone + CBASP
R: Hamilton’s Rating Scale for Depression (HAMD) score, post treatment; R = 34 - HAMD, so high R is good

Method Application and Confidence Measures When applying new method to real data it is desirable to have a measure of reliability and to control family-wise error rate We used bootstrap sampling to assess reliability On each of 1000 bootstrap samples: Run variable selection method Record the interaction variables selected Calculate selection percentages over bootstrap samples
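The bootstrap procedure above can be sketched as a generic loop around any variable selection method. The sketch assumes the method is supplied as a function `select(X, A, R)` that returns one True/False flag per candidate interaction; that interface, the function names and the toy correlation-based selector (a stand-in so the example runs, not the method from the talk) are all hypothetical.

```python
import numpy as np

def selection_percentages(X, A, R, select, n_boot=1000, seed=0):
    """Percentage of bootstrap samples in which each interaction is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample patients with replacement
        counts += np.asarray(select(X[idx], A[idx], R[idx]), dtype=bool)
    return 100.0 * counts / n_boot

def toy_select(X, A, R, cutoff=0.3):
    """Stand-in selector: flag X_j*A if its |correlation| with R exceeds cutoff."""
    XA = X * A[:, None]
    corr = np.array([abs(np.corrcoef(XA[:, j], R)[0, 1])
                     for j in range(X.shape[1])])
    return corr > cutoff
```

In practice `toy_select` would be replaced by the full selection algorithm, and the resulting percentages compared against the thresholds described on the next slide.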

Error Rate Thresholds: To help control the family-wise error rate, compute the following inclusion thresholds for selection percentages:
1. Repeat 100 times:
   - Permute interactions to remove effects from the data
   - Run the method on 1000 bootstrap samples of the permuted data
   - Calculate selection percentages over the bootstrap samples
   - Record the largest selection percentage over the p interactions
2. Threshold: the (1-α)th percentile over the 100 max selection percentages
3. Select all interactions with selection percentage greater than the threshold

Notes: For the threshold we permuted X*A within the (X, A, X*A) data matrix. Only interaction variables with selection percentages above the thresholds should be selected.

The latest update on the threshold simulations for new method S: 26.5% of the simulations had a variable over the 70% threshold; 20% of the simulations had a variable over the 80% threshold; 10% of the simulations had a variable over the 90% threshold.

The latest update on the threshold simulations for BIC Lasso: 25% of the simulations had a variable over the 70% threshold; 10.5% of the simulations had a variable over the 90% threshold. This is from 102 simulated runs.
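The threshold computation above can be sketched as follows. The notes say only that X*A was permuted within the (X, A, X*A) data matrix; this sketch reads that as shuffling the rows of the interaction columns while leaving X, A and R in place, which is an assumption. The selection method is passed in as a hypothetical function `select(X, A, XA, R)` returning one True/False flag per interaction.

```python
import numpy as np

def max_selection_threshold(X, A, R, select, alpha=0.1,
                            n_perm=100, n_boot=1000, seed=0):
    """(1 - alpha) percentile of the largest selection percentage under permutation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XA = X * A[:, None]
    maxima = []
    for _perm in range(n_perm):
        # Break any link between the interactions and the response.
        XA_perm = XA[rng.permutation(n)]
        counts = np.zeros(p)
        for _boot in range(n_boot):
            idx = rng.integers(0, n, size=n)
            flags = select(X[idx], A[idx], XA_perm[idx], R[idx])
            counts += np.asarray(flags, dtype=bool)
        # Largest selection percentage over the p interactions.
        maxima.append(100.0 * counts.max() / n_boot)
    return float(np.percentile(maxima, 100 * (1 - alpha)))
```

With alpha = 0.1 this gives a "90% threshold": in about 90% of permuted data sets no interaction reaches a selection percentage this high. The full 100 x 1000 nesting refits the method 100,000 times, so smaller values of `n_perm` and `n_boot` are useful for a first check.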

Error Rate Thresholds When tested in simulations using new method, error rate threshold effectively controlled family-wise error rate This augmentation of bootstrap sampling and thresholding was also tested on BIC Lasso and effectively controlled family-wise error rate in simulations

Nefazodone - CBASP Trial: [Figure: adjusted selection percentages for the interaction variables under each method, with the points for OCD and ALC labeled. The blue solid line is the 90% threshold and the green dotted line the 80% threshold (e.g. 80% of the time no spurious variable has a selection percentage this high).]

The selection percentages shown are actually the adjusted selection percentages: the absolute value of the number of times an interaction is selected with a positive coefficient minus the number of times it is selected with a negative coefficient.

BIC Lasso selected Obsessive Compulsive Disorder when using the 80% threshold. The new method selected past Alcohol Abuse when using the 80% threshold. Regression analyses illustrate the potential qualitative interaction with the variable “past alcohol dependence”, but not with “past OCD”, the variable that appears second highest in the standard method.

Interaction Plot: Green bars show the density of subjects. Here R is coded so high is good (R = 34 - HAMD). It is not clear why the people with an alcohol abuse/dependence history would do better on medication than people without this history (the red line slopes up to the right). This may be due to how people were enrolled in the trial: of the people with an alcohol abuse/dependence history, only the really motivated enrolled, whereas of the people with no such history, both motivated and unmotivated people enrolled. Also note that, from the error bars, the response to medication is about the same for people with and without a prior alcohol abuse/dependence history (the blue line is flat).

Interaction Plot Green bars show density of subjects. Here R (34-hamd) is coded so High is good.

Discussion This method provides a list of potential tailoring variables while reducing the number of false leads. Replication is required to confirm the usefulness of a tailoring variable. Our long term goal is to generalize this method so that it can be used with data from Sequential, Multiple Assignment, Randomized Trials as illustrated by STAR*D.

Email Susan Murphy at samurphy@umich.edu for more information! This seminar can be found at http://www.stat.lsa.umich.edu/~samurphy/seminars/ASA11.11.08.ppt

Support: NIDA P50 DA10075, NIMH R01 MH080015 and NSF DMS 0505432.

Thanks for technical and data support go to A. John Rush, MD, Betty Jo Hay Chair in Mental Health at the University of Texas Southwestern Medical Center, Dallas; Martin Keller; and the investigators who conducted the trial `A Comparison of Nefazodone, the Cognitive Behavioral-analysis System of Psychotherapy, and Their Combination for Treatment of Chronic Depression’.

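Lasso Weighting Scheme: The Lasso minimization criterion is equivalent to a weighted form in which a smaller w_j means greater importance. Weights: v_j = 1 for predictive variables; for prescriptive variables v_j depends on the ranking score (the step 3 notes give W = 1 - S/[max(S) + H/n]), with epsilon = H/n.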

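AGV Criterion: For a subset of k variables, X{k}, the Average Gain in Value (AGV) criterion is, in Latex code, AGV_k = \frac{(\hat{V}_k-\hat{V}_0)/k}{(\hat{V}_{m}-\hat{V}_0)/m}, where m = \arg\max_k (\hat{V}_k - \hat{V}_0). The criterion selects the subset of variables with the maximum proportion of increase in E[R] per variable. Here a* = arg max_a E[R|A=a].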

Simulation Results (S-score): [Figure: two panels of selection percentages for the interaction variables across the 1000 samples. Legend: × marks the qualitative interaction; other markers denote non-qualitative and spurious interactions.]

Top model: main effects of X, small treatment effect, multiple moderate to small non-qualitative interactions with treatment, small to moderate qualitative interaction with a binary variable and treatment.

Bottom model: main effects of X, small treatment effect, multiple moderate to small non-qualitative interactions with treatment, small to moderate qualitative interaction with a continuous variable and treatment.

The selection percentages shown are actually the adjusted selection percentages: the absolute value of the number of times an interaction is selected with a positive coefficient minus the number of times it is selected with a negative coefficient.