# What is Interaction for A Binary Outcome? Chun Li Department of Biostatistics Center for Human Genetics Research September 19, 2007.



2 What We Have Learned

Little. Generic.

In linear regression: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

In any other regression, the right-hand side is $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$.

For a binary outcome, we often use logistic regression. For example, the log-odds of cancer risk:

$$\log(O_{ij}) = \beta_0 + \beta_1 \times \text{sex} + \beta_2 \times \text{smoking} + \beta_3 \times \text{sex} \times \text{smoking}$$

The $\beta_1$ and $\beta_2$ terms are labeled “main effect”; the $\beta_3$ term is labeled “interaction effect”.

3 Interaction Introduced by R. A. Fisher to generalize the concept “epistasis” in genetics. The concept is ubiquitous. The word sounds easy to understand, and is charismatic in some circles. Ambiguous without model context. Hard to interpret and translate to reality for some models, such as logistic regression.

4 Epistasis

Example: Genotype BB masks the effect of gene A.

It is a very special type of interaction.

Such a phenomenon can be seen in other contexts, e.g. gene-environment interaction.

[Figure: two grids with rows aa, Aa, AA. Columns are genotypes bb, Bb, BB in the first grid and Exposure No, Yes in the second.]

5 “No Interaction” ≠ Independence

Interaction is about the joint effect of input variables on an outcome, or how the effect of one input variable changes as the values of the other input variables change.

Independence is about the statistical relationship between input variables, irrespective of the outcome or the effect on the outcome.

Using “independent effect” to describe “no interaction” may be confusing.

6 Interaction = Effect Modification

Effect modification: The effect of one variable on the outcome is modified depending on the values of other variables.

It depends on how “effect” is measured and on what scale (Kenneth Rothman, Sander Greenland).

For a binary outcome, “effect” can be measured as
– risk difference
– risk ratio
– odds ratio

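7 Measuring Effect: Risk Difference

| | Smoking: No (0) | Smoking: Yes (1) | Marginal |
|---|---|---|---|
| Male (0) | $R_{00}$ | $R_{01}$ | $R_{0\cdot}$ |
| Female (1) | $R_{10}$ | $R_{11}$ | $R_{1\cdot}$ |
| Marginal | $R_{\cdot 0}$ | $R_{\cdot 1}$ | |

“Effect” of smoking: $R_{01} - R_{00}$ (in males), $R_{11} - R_{10}$ (in females).

If gender doesn’t modify the “effect” of smoking, then the following are equivalent:

– $R_{01} - R_{00} = R_{11} - R_{10} = R_{\cdot 1} - R_{\cdot 0}$ (!)
– $R_{11} - R_{00} = (R_{10} - R_{00}) + (R_{01} - R_{00}) = (R_{1\cdot} - R_{0\cdot}) + (R_{\cdot 1} - R_{\cdot 0})$
– $RR_{11} - 1 = (RR_{10} - 1) + (RR_{01} - 1)$, where $RR_{ij} = R_{ij} / R_{00}$
– additive decomposition of risk: $R_{ij} = a_i + b_j$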

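8 Measuring Effect: Risk Ratio

| | Smoking: No (0) | Smoking: Yes (1) | Marginal |
|---|---|---|---|
| Male (0) | $R_{00}$ | $R_{01}$ | $R_{0\cdot}$ |
| Female (1) | $R_{10}$ | $R_{11}$ | $R_{1\cdot}$ |
| Marginal | $R_{\cdot 0}$ | $R_{\cdot 1}$ | |

“Effect” of smoking: $R_{01} / R_{00}$ (in males), $R_{11} / R_{10}$ (in females).

If gender doesn’t modify the “effect” of smoking, then the following are equivalent:

– $R_{01} / R_{00} = R_{11} / R_{10} = R_{\cdot 1} / R_{\cdot 0}$ (!)
– $RR_{11} = RR_{10} \times RR_{01}$
– $RR_{11} = (R_{1\cdot} / R_{0\cdot}) \times (R_{\cdot 1} / R_{\cdot 0})$
– multiplicative decomposition of risk: $R_{ij} = c_i \times d_j$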

9 Measuring Effect: Odds Ratio

| | Smoking: No (0) | Smoking: Yes (1) | Marginal |
|---|---|---|---|
| Male (0) | $O_{00}$ | $O_{01}$ | $O_{0\cdot}$ |
| Female (1) | $O_{10}$ | $O_{11}$ | $O_{1\cdot}$ |
| Marginal | $O_{\cdot 0}$ | $O_{\cdot 1}$ | |

Here $O_{**} = R_{**} / (1 - R_{**})$.

“Effect” of smoking: $O_{01} / O_{00}$ (in males), $O_{11} / O_{10}$ (in females).

If gender doesn’t modify the “effect” of smoking, then the following are equivalent:

– $O_{01} / O_{00} = O_{11} / O_{10}$, which is $\neq O_{\cdot 1} / O_{\cdot 0}$ in general (?!?)
– $OR_{11} = OR_{10} \times OR_{01}$, where $OR_{ij} = O_{ij} / O_{00}$
– additive decomposition of log-odds $\ln(O_{ij})$

Even if gender doesn’t modify the effect of smoking, smoking’s marginal effect may be different from its gender-specific effect !?!
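The marginal-versus-stratum puzzle on this slide can be checked numerically. The sketch below uses made-up risks (not from the talk) and assumes males and females are equally represented among smokers and non-smokers, so that marginal risks are simple averages.

```python
# Hypothetical risks R[sex][smoking]; sex 0 = male, 1 = female.
R = {0: {0: 0.2, 1: 0.5},
     1: {0: 0.5, 1: 0.8}}

def odds(r):
    """Convert a risk (probability) to odds."""
    return r / (1 - r)

# Stratum-specific effects of smoking on two scales.
rd_by_sex = {s: R[s][1] - R[s][0] for s in R}
or_by_sex = {s: odds(R[s][1]) / odds(R[s][0]) for s in R}

# Marginal risks: average over sex (assumes a 50/50 sex split in both
# smoking groups, which is an assumption of this example).
R_marg = {k: (R[0][k] + R[1][k]) / 2 for k in (0, 1)}
rd_marg = R_marg[1] - R_marg[0]
or_marg = odds(R_marg[1]) / odds(R_marg[0])

print("risk difference by sex:", rd_by_sex, "marginal:", round(rd_marg, 3))
print("odds ratio by sex:     ", or_by_sex, "marginal:", round(or_marg, 3))
```

With these numbers the risk difference is 0.3 in both sexes and marginally, while the odds ratio is 4 in both sexes but about 3.45 marginally.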


11 Interaction = Effect Modification

“No interaction” under one definition often means interaction under another definition.

Results from an interaction analysis should always be reported together with the scale that was used to measure effect.

Some effect measures are intuitive; some are not intuitive and not even internally consistent.
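A small numerical illustration of the first point, using hypothetical risks chosen so that the risk ratio for smoking is exactly 2 in both sexes. The interaction contrast is then computed on each of the three scales from the earlier slides.

```python
# Hypothetical risks R[(sex, smoking)]; sex 0 = male, 1 = female.
R = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.6}

def odds(r):
    """Convert a risk (probability) to odds."""
    return r / (1 - r)

# Additive scale: difference of risk differences (0 means no interaction).
rd_contrast = (R[1, 1] - R[1, 0]) - (R[0, 1] - R[0, 0])

# Multiplicative scale: ratio of risk ratios (1 means no interaction).
rr_contrast = (R[1, 1] / R[1, 0]) / (R[0, 1] / R[0, 0])

# Odds scale: ratio of odds ratios (1 means no interaction).
or_contrast = (odds(R[1, 1]) / odds(R[1, 0])) / (odds(R[0, 1]) / odds(R[0, 0]))

print(f"difference of risk differences: {rd_contrast:.3f}")
print(f"ratio of risk ratios:           {rr_contrast:.3f}")
print(f"ratio of odds ratios:           {or_contrast:.3f}")
```

Here there is no interaction on the risk-ratio scale, yet the additive contrast is 0.2 and the ratio of odds ratios is about 1.56.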

12 Biologic Interaction

Biologic interaction = biologically causal interaction.

Greenland and Rothman argued that “biologic interaction” is reflected by departure from additive risks.
– Counterfactual arguments
– Causal pie arguments

The additive definition is difficult to test directly in case-control studies.

13 Advantages of Logistic Regression

For retrospective studies (e.g., case-control studies), risk difference and risk ratio cannot be estimated and analyzed. But odds ratio can!

Odds ratio doesn’t have a boundary effect. Both risk difference and risk ratio do:
– Interaction effect must exist under some circumstances.
– May cause problems computationally.

Odds ratio ≈ risk ratio, when risks are very small.
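Two of these claims are easy to see with numbers. The sketch below uses hypothetical baseline risks: it first compares the odds ratio with a fixed risk ratio of 2 as the baseline risk shrinks, then shows the boundary problem when the baseline risk is high.

```python
def odds(r):
    """Convert a risk (probability) to odds."""
    return r / (1 - r)

def risk_from_odds(o):
    """Convert odds back to a risk (probability)."""
    return o / (1 + o)

# Exposure doubles the risk (RR = 2) at several hypothetical baseline risks.
for r0 in (0.001, 0.01, 0.1, 0.3):
    r1 = 2 * r0
    print(f"baseline risk {r0}: RR = 2.000, OR = {odds(r1) / odds(r0):.3f}")

# Boundary effect: with baseline risk 0.6, RR = 2 would require a risk of 1.2,
# which is impossible, whereas OR = 2 still yields a valid probability.
r0 = 0.6
risk_if_rr_2 = 2 * r0
risk_if_or_2 = risk_from_odds(2 * odds(r0))
print(f"RR = 2 implies risk {risk_if_rr_2:.2f}; OR = 2 implies risk {risk_if_or_2:.2f}")
```

So if the risk ratio is 2 in a low-risk group, it cannot also be 2 in a group whose baseline risk exceeds 0.5, which forces an interaction on that scale.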

14 Misconception 1

Interaction terms are treated the same way as main-effect terms:
– Numerical comparison between an interaction coefficient and a main-effect coefficient.
– (logistic regression) Power to detect interaction when “interaction explains half of the total effect.”
– (logistic regression) “Odds ratio” of the interaction.

Fact: They are apples and oranges.

15 Misconception Reinforced by Software

Stata output:

    . logistic case v1 v2 v12

    Logistic regression                               Number of obs   =       1530
                                                      LR chi2(3)      =      12.93
                                                      Prob > chi2     =     0.0048
    Log likelihood = -878.77373                       Pseudo R2       =     0.0073

    ------------------------------------------------------------------------------
            case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              v1 |    1.52674   .8978875     0.72   0.472     .4821329     4.83463
              v2 |   .7779552   .4651644    -0.42   0.675     .2409871    2.511397
             v12 |   1.004005   .3277949     0.01   0.990     .5294554    1.903893
    ------------------------------------------------------------------------------

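16 Interaction in Logistic Regression

$$\mu_{ij} = \log(O_{ij}) = \beta_0 + \beta_1 \times \text{sex} + \beta_2 \times \text{smoking} + \beta_3 \times \text{sex} \times \text{smoking}$$

| | Smoking: No (0) | Smoking: Yes (1) |
|---|---|---|
| Male (0) | $O_{00}$ | $O_{01}$ |
| Female (1) | $O_{10}$ | $O_{11}$ |

$\mu_{00} = \beta_0$

$\mu_{01} = \beta_0 + \beta_2$

$\mu_{10} = \beta_0 + \beta_1$

$\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3$

| Coefficient $\beta$ | $\exp(\beta)$ | Meaning |
|---|---|---|
| $\beta_1 = \mu_{10} - \mu_{00}$ | $O_{10} / O_{00}$ | Baseline OR |
| $\beta_2 = \mu_{01} - \mu_{00}$ | $O_{01} / O_{00}$ | Baseline OR |
| $\beta_3 = (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})$ | $(O_{11} / O_{10}) / (O_{01} / O_{00})$ | Ratio of odds ratios |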
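The mapping between the four cell odds and the four coefficients can be traced with a short calculation. The odds below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical odds O[(sex, smoking)]; sex 0 = male, 1 = female.
O = {(0, 0): 0.25, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 1.2}
mu = {cell: math.log(o) for cell, o in O.items()}

# Coefficients of the saturated model, read off the four log-odds.
beta0 = mu[0, 0]
beta1 = mu[1, 0] - mu[0, 0]
beta2 = mu[0, 1] - mu[0, 0]
beta3 = (mu[1, 1] - mu[1, 0]) - (mu[0, 1] - mu[0, 0])

or_smoking_males = O[0, 1] / O[0, 0]                   # equals exp(beta2)
or_smoking_females = O[1, 1] / O[1, 0]                 # equals exp(beta2 + beta3)
ratio_of_ors = or_smoking_females / or_smoking_males   # equals exp(beta3)

print(f"exp(beta2) = {math.exp(beta2):.2f}  (smoking OR in males)")
print(f"exp(beta2 + beta3) = {math.exp(beta2 + beta3):.2f}  (smoking OR in females)")
print(f"exp(beta3) = {math.exp(beta3):.2f}  (ratio of the two ORs)")
```

In this example exp(β3) = 1.5 is not the odds ratio of any single comparison between groups; it is the factor by which the smoking odds ratio differs between the sexes.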

17 Misconception 2

Interpreting main-effect terms when interaction terms are included in the model:

– Misconception: Evaluating the statistical significance of a “main effect”.
– Fact: A main-effect term should always be included in the model as long as it is involved in some interaction terms.

– Misconception: A main-effect coefficient is interpreted as the magnitude of the “main effect” or “marginal effect”.
– Fact: The main-effect coefficient of variable X represents its “baseline effect” when all variables “interacting” with X are zero (i.e. at baseline).
– Its interpretation depends on how the other variables are coded (i.e. where the baselines are).

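18 Significance of a Main-Effect Term in Logistic Regression

$$\mu_{ij} = \log(O_{ij}) = \beta_0 + \beta_1 \times \text{sex} + \beta_2 \times \text{smoking} + \beta_3 \times \text{sex} \times \text{smoking}$$

| | Smoking: No (0) | Smoking: Yes (1) |
|---|---|---|
| Male (0) | $O_{00}$ | $O_{01}$ |
| Female (1) | $O_{10}$ | $O_{11}$ |

$\mu_{00} = \beta_0$

$\mu_{01} = \beta_0 + \beta_2$

$\mu_{10} = \beta_0 + \beta_1$

$\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3$

Statistical significance of a term ≡ whether it can be removed.

What would happen if $\beta_2 = 0$? Then $\mu_{01} = \mu_{00}$: smoking has no effect in the group coded sex = 0 (males here), while its effect in the other group is $\beta_3$.

This means different things when sex is coded differently.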
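The dependence of β2 on the coding of sex can be made concrete. In this sketch the odds are hypothetical, and `coefficients` is a helper written for this example, not something from the talk; the same four odds are described under two codings of sex.

```python
import math

def coefficients(O):
    """Saturated-model coefficients from odds O[(sex_code, smoking)]."""
    mu = {cell: math.log(o) for cell, o in O.items()}
    b0 = mu[0, 0]
    b1 = mu[1, 0] - mu[0, 0]
    b2 = mu[0, 1] - mu[0, 0]
    b3 = (mu[1, 1] - mu[1, 0]) - (mu[0, 1] - mu[0, 0])
    return b0, b1, b2, b3

# Hypothetical odds with male = 0, female = 1:
# smoking has no effect in males and triples the odds in females.
male_baseline = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.4, (1, 1): 1.2}

# The same odds with sex recoded so that female = 0, male = 1.
female_baseline = {(1 - sex, smk): o for (sex, smk), o in male_baseline.items()}

_, _, b2_male_ref, b3_male_ref = coefficients(male_baseline)
_, _, b2_female_ref, b3_female_ref = coefficients(female_baseline)

print(f"male = 0:   beta2 = {b2_male_ref:.3f}, beta3 = {b3_male_ref:.3f}")
print(f"female = 0: beta2 = {b2_female_ref:.3f}, beta3 = {b3_female_ref:.3f}")
```

With males as baseline β2 is exactly 0; with females as baseline β2 is log 3, although the data are identical. Only the sign of β3 changes.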

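19 One Input Variable is Continuous

$$Y = \beta_0 + \beta_1 G + \beta_2 X + \beta_3 G \times X$$

Group A ($G = 0$): $Y_A = \beta_0 + \beta_2 X$

Group B ($G = 1$): $Y_B = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X$

| Coefficient | Meaning | If zero |
|---|---|---|
| $\beta_1$ | $Y_B - Y_A$ when $X = 0$ | same $Y$ when $X = 0$ (often extrapolative and meaningless) |
| $\beta_2$ | slope for group A | group A is flat |
| $\beta_3$ | difference in slopes (B − A) | equal slopes |

These are not marginal effects.

[Figure: two lines in the x-y plane, one for group A (G = 0) and one for group B (G = 1).]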
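Why β1 is “often extrapolative” can be seen by shifting the origin of X. The sketch below uses hypothetical coefficients and a hypothetical centering value of 50 (say, X is age); `recenter` is an illustrative helper that rewrites the same model in terms of X − c.

```python
def predict(b, g, x):
    """Y = b0 + b1*G + b2*X + b3*G*X."""
    b0, b1, b2, b3 = b
    return b0 + b1 * g + b2 * x + b3 * g * x

def recenter(b, c):
    """Coefficients of the same model written in terms of X' = X - c."""
    b0, b1, b2, b3 = b
    return (b0 + b2 * c, b1 + b3 * c, b2, b3)

b = (1.0, 0.5, 0.2, 0.1)          # hypothetical beta0..beta3
b_centered = recenter(b, c=50)    # group difference now evaluated at X = 50

print("original beta1 (group difference at X = 0): ", b[1])
print("centered beta1 (group difference at X = 50):", b_centered[1])

# Both parameterizations give identical predictions.
for g in (0, 1):
    for x in (0, 30, 50, 80):
        assert abs(predict(b, g, x) - predict(b_centered, g, x - 50)) < 1e-9
```

The slopes β2 and β3 are unchanged, but β1 moves from 0.5 to 5.5, so a test of β1 = 0 answers a different question depending on where X = 0 sits.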

20 Misconception 3 If a set of variables/genes together with all possible combinations among them (i.e. allowing full interactions) significantly predict the outcome, then we have found interaction among these variables. Fact: Interaction is about departure from additive effects. The variables may just have additive effects without interaction.

21 Do We Want Generic Interaction?

A gene is identified to metabolize a carcinogen. Allele A is the putative susceptibility allele.

Goal: Is the risk elevated for those who have carcinogen exposure and carry the risk allele?

Data from Piegorsch et al. (1994), shown as #case/#control:

| Genotype | Carcinogen exposure: No | Carcinogen exposure: Yes |
|---|---|---|
| aa | 14/30 | 12/34 |
| Aa | 8/20 | 19/19 |
| AA | 9/18 | 18/19 |

Generic interaction test: H0: 4 parameters; Ha: 6 parameters; DF = 2, p = 0.19.

Odds ratios relative to unexposed aa (−):

| Genotype | Carcinogen: No | Carcinogen: Yes |
|---|---|---|
| aa | − | 0.76 |
| Aa | 0.86 | 2.14 |
| AA | 1.07 | 2.03 |
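The second table on this slide can be reproduced from the case/control counts, reading its entries as odds ratios against the unexposed aa cell. A minimal sketch:

```python
# (cases, controls) by genotype and carcinogen exposure, from the slide.
counts = {
    ("aa", "No"): (14, 30), ("aa", "Yes"): (12, 34),
    ("Aa", "No"): (8, 20),  ("Aa", "Yes"): (19, 19),
    ("AA", "No"): (9, 18),  ("AA", "Yes"): (18, 19),
}

def cell_odds(cell):
    """Case-to-control odds in one genotype-by-exposure cell."""
    cases, controls = counts[cell]
    return cases / controls

reference = cell_odds(("aa", "No"))
odds_ratios = {cell: cell_odds(cell) / reference
               for cell in counts if cell != ("aa", "No")}

for (genotype, exposure), value in odds_ratios.items():
    print(f"{genotype}, exposure {exposure}: OR = {value:.2f}")
```

The printed values match the slide: 0.76, 0.86, 2.14, 1.07 and 2.03.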

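22 Do We Want Generic Interaction?

Approach 2: H0: 2 groups; Ha: 4 groups; DF = 2, p = 0.037

| Genotype | Carcinogen: No | Carcinogen: Yes |
|---|---|---|
| aa | − | 0.77 |
| Aa | − | 2.19 |
| AA | − | 2.08 |

Approach 3: H0: 1 group; Ha: 3 groups; DF = 2, p = 0.017

| Genotype | Carcinogen: No | Carcinogen: Yes |
|---|---|---|
| aa | − | − |
| Aa | − | 2.37 |
| AA | − | 2.25 |

Approach 4: H0: 1 group; Ha: 2 groups; DF = 1, p = 0.0043

| Genotype | Carcinogen: No | Carcinogen: Yes |
|---|---|---|
| aa | − | − |
| Aa | − | 2.31 |
| AA | − | 2.31 |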
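These focused tests can be sketched as likelihood-ratio tests on pooled cells, using the case/control counts from the Piegorsch et al. table on the previous slide. How cells are pooled into “groups” below is a reading of the tables (cells marked − share one reference group), not something the slide spells out; under that reading the p-values come out close to the reported ones. Only the standard library is used, with closed-form chi-square tail probabilities for 1 and 2 degrees of freedom.

```python
import math

# (cases, controls) by genotype and carcinogen exposure.
counts = {
    ("aa", "No"): (14, 30), ("aa", "Yes"): (12, 34),
    ("Aa", "No"): (8, 20),  ("Aa", "Yes"): (19, 19),
    ("AA", "No"): (9, 18),  ("AA", "Yes"): (18, 19),
}

def loglik(grouping):
    """Maximized binomial log-likelihood when cells with the same label share one odds."""
    totals = {}
    for cell, (cases, controls) in counts.items():
        c, k = totals.get(grouping(cell), (0, 0))
        totals[grouping(cell)] = (c + cases, k + controls)
    return sum(c * math.log(c / (c + k)) + k * math.log(k / (c + k))
               for c, k in totals.values())

def lrt(null, alt, df):
    """Likelihood-ratio statistic and p-value (closed forms, df = 1 or 2 only)."""
    g = 2 * (loglik(alt) - loglik(null))
    p = math.erfc(math.sqrt(g / 2)) if df == 1 else math.exp(-g / 2)
    return g, p

def one_group(cell):
    return "all"

def by_exposure(cell):
    return cell[1]

def exposed_by_genotype(cell):   # Approach 2 alternative: unexposed pooled
    return cell[0] if cell[1] == "Yes" else "unexposed"

def exposed_carriers_split(cell):   # Approach 3 alternative
    return cell[0] if cell[1] == "Yes" and cell[0] != "aa" else "rest"

def exposed_carriers_pooled(cell):  # Approach 4 alternative
    return "exposed carrier" if cell[1] == "Yes" and cell[0] != "aa" else "rest"

g2, p2 = lrt(by_exposure, exposed_by_genotype, df=2)
g3, p3 = lrt(one_group, exposed_carriers_split, df=2)
g4, p4 = lrt(one_group, exposed_carriers_pooled, df=1)
print(f"Approach 2: G = {g2:.2f}, p = {p2:.3f}")
print(f"Approach 3: G = {g3:.2f}, p = {p3:.3f}")
print(f"Approach 4: G = {g4:.2f}, p = {p4:.4f}")
```

Fewer, better-targeted degrees of freedom give the smaller p-values, which is the contrast with the generic 2-DF interaction test.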

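23 Testing for Interaction While Adjusting for Other Covariates

$$\mu_{\text{age},ij} = \log(O_{\text{age},ij}) = \beta_0 + \beta_4\,\text{age} + \beta_1\,\text{sex} + \beta_2\,\text{smoking} + \beta_3\,\text{sex} \times \text{smoking}$$

| | Smoking: No (0) | Smoking: Yes (1) |
|---|---|---|
| Male (0) | $O_{\text{age},00}$ | $O_{\text{age},01}$ |
| Female (1) | $O_{\text{age},10}$ | $O_{\text{age},11}$ |

$\mu_{\text{age},00} = (\beta_0 + \beta_4\,\text{age})$

$\mu_{\text{age},01} = (\beta_0 + \beta_4\,\text{age}) + \beta_2$

$\mu_{\text{age},10} = (\beta_0 + \beta_4\,\text{age}) + \beta_1$

$\mu_{\text{age},11} = (\beta_0 + \beta_4\,\text{age}) + \beta_1 + \beta_2 + \beta_3$

We are testing for interaction under the assumption that the effects of sex, smoking, and sex×smoking are the same over the whole ranges of the covariates.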

