What is Interaction for A Binary Outcome? Chun Li Department of Biostatistics Center for Human Genetics Research September 19, 2007.


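2 What We Have Learned
Little. Generic.
In linear regression: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
In any other regression, the right-hand side is $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
For a binary outcome, we often use logistic regression. For example, the log-odds of cancer:
$\log(O_{ij}) = \beta_0 + \beta_1 \times \text{sex} + \beta_2 \times \text{smoking} + \beta_3 \times \text{sex} \times \text{smoking}$
Here $\beta_1 \times \text{sex}$ and $\beta_2 \times \text{smoking}$ are the “main effect” terms, and $\beta_3 \times \text{sex} \times \text{smoking}$ is the “interaction effect” term.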

3 Interaction
Introduced by R. A. Fisher to generalize the concept of “epistasis” in genetics.
The concept is ubiquitous.
The word sounds easy to understand, and is charismatic in some circles.
Ambiguous without a model context.
Hard to interpret and translate to reality for some models, such as logistic regression.

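4 Epistasis
Example: Genotype BB masks the effect of gene A.
It is a very special type of interaction.
Such a phenomenon can be seen in other contexts, e.g. gene-environment interaction.
[Figure: two grids. Left: genotype at gene A (aa, Aa, AA) by genotype at gene B (bb, Bb, BB). Right: genotype at gene A (aa, Aa, AA) by exposure (No, Yes).]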

5 “No Interaction” ≠ Independence
Interaction is about the joint effect of input variables on an outcome, or how the effect of one variable changes as the values of the other input variables change.
Independence is about the statistical relationship between input variables, irrespective of the outcome or the effect on the outcome.
Using “independent effect” to describe “no interaction” may be confusing.

6 Interaction = Effect Modification
Effect modification: The effect of one variable on the outcome is modified depending on the values of other variables.
It depends on how “effect” is measured and on what scale. (Kenneth Rothman, Sander Greenland)
For a binary outcome, “effect” can be measured as
– risk difference
– risk ratio
– odds ratio

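7 Measuring Effect: Risk Difference
Risk by sex and smoking:
Sex | Smoking: No (0) | Smoking: Yes (1) | Marginal
Male (0) | $R_{00}$ | $R_{01}$ | $R_{0\cdot}$
Female (1) | $R_{10}$ | $R_{11}$ | $R_{1\cdot}$
Marginal | $R_{\cdot 0}$ | $R_{\cdot 1}$ |
“Effect” of smoking: $R_{01} - R_{00}$ (in males), $R_{11} - R_{10}$ (in females).
If gender doesn’t modify the “effect” of smoking, then
$R_{01} - R_{00} = R_{11} - R_{10}$, and both are equivalent to $R_{\cdot 1} - R_{\cdot 0}$ (!)
$R_{11} - R_{00} = (R_{10} - R_{00}) + (R_{01} - R_{00}) = (R_{1\cdot} - R_{0\cdot}) + (R_{\cdot 1} - R_{\cdot 0})$
$RR_{11} - 1 = (RR_{10} - 1) + (RR_{01} - 1)$, where $RR_{ij} = R_{ij} / R_{00}$
This is an additive decomposition of risk: $R_{ij} = a_i + b_j$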
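A small numeric check can make the additive case concrete. The sketch below builds four risks from made-up contributions `a` and `b` (illustrative assumptions, not values from the slides) and checks that the smoking risk difference is the same in both sexes and that the excess relative risks add up.

```python
import math

# Hypothetical contributions; the values are illustrative, not from the slides.
a = {0: 0.05, 1: 0.10}  # sex: 0 = male, 1 = female
b = {0: 0.00, 1: 0.15}  # smoking: 0 = no, 1 = yes

# Additive decomposition of risk: R_ij = a_i + b_j
R = {(i, j): a[i] + b[j] for i in a for j in b}

rd_male = R[0, 1] - R[0, 0]    # effect of smoking in males
rd_female = R[1, 1] - R[1, 0]  # effect of smoking in females

# Same statement in terms of risk ratios relative to R_00
RR = {cell: risk / R[0, 0] for cell, risk in R.items()}
excess_joint = RR[1, 1] - 1
excess_sum = (RR[1, 0] - 1) + (RR[0, 1] - 1)

print(math.isclose(rd_male, rd_female))        # equal risk differences
print(math.isclose(excess_joint, excess_sum))  # excess relative risks add up
```

Any other choice of `a` and `b` that keeps all four risks between 0 and 1 should give the same two equalities.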

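8 Measuring Effect: Risk Ratio
Risk by sex and smoking:
Sex | Smoking: No (0) | Smoking: Yes (1) | Marginal
Male (0) | $R_{00}$ | $R_{01}$ | $R_{0\cdot}$
Female (1) | $R_{10}$ | $R_{11}$ | $R_{1\cdot}$
Marginal | $R_{\cdot 0}$ | $R_{\cdot 1}$ |
“Effect” of smoking: $R_{01} / R_{00}$ (in males), $R_{11} / R_{10}$ (in females).
If gender doesn’t modify the “effect” of smoking, then
$R_{01} / R_{00} = R_{11} / R_{10}$, and both are equivalent to $R_{\cdot 1} / R_{\cdot 0}$ (!)
$RR_{11} = RR_{10} \times RR_{01}$, where $RR_{ij} = R_{ij} / R_{00}$
$RR_{11} = (R_{1\cdot} / R_{0\cdot}) \times (R_{\cdot 1} / R_{\cdot 0})$
This is a multiplicative decomposition of risk: $R_{ij} = c_i \times d_j$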
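The multiplicative case can be checked the same way. In this sketch the factors `c` and `d` are made-up values (an assumption for illustration only). The risk ratio for smoking comes out equal in both sexes, while the risk differences for the same four risks do not.

```python
import math

# Hypothetical factors; the values are illustrative, not from the slides.
c = {0: 0.05, 1: 0.10}  # sex: 0 = male, 1 = female
d = {0: 1.0, 1: 3.0}    # smoking: 0 = no, 1 = yes

# Multiplicative decomposition of risk: R_ij = c_i * d_j
R = {(i, j): c[i] * d[j] for i in c for j in d}

rr_male = R[0, 1] / R[0, 0]
rr_female = R[1, 1] / R[1, 0]
RR = {cell: risk / R[0, 0] for cell, risk in R.items()}

# On the difference scale the same risks show effect modification.
rd_male = R[0, 1] - R[0, 0]
rd_female = R[1, 1] - R[1, 0]

print(rr_male, rr_female)  # equal on the ratio scale
print(rd_male, rd_female)  # unequal on the difference scale
```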

9 Measuring Effect: Odds Ratio
Odds by sex and smoking, with $O_{**} = R_{**} / (1 - R_{**})$:
Sex | Smoking: No (0) | Smoking: Yes (1) | Marginal
Male (0) | $O_{00}$ | $O_{01}$ | $O_{0\cdot}$
Female (1) | $O_{10}$ | $O_{11}$ | $O_{1\cdot}$
Marginal | $O_{\cdot 0}$ | $O_{\cdot 1}$ |
“Effect” of smoking: $O_{01} / O_{00}$ (in males), $O_{11} / O_{10}$ (in females).
If gender doesn’t modify the “effect” of smoking, then
$O_{01} / O_{00} = O_{11} / O_{10}$, but in general this common value $\neq O_{\cdot 1} / O_{\cdot 0}$ (?!?)
$OR_{11} = OR_{10} \times OR_{01}$, where $OR_{ij} = O_{ij} / O_{00}$
This is an additive decomposition of the log-odds $\ln(O_{ij})$.
Even if gender doesn’t modify the effect of smoking, smoking’s marginal effect may be different from its gender-specific effect!?!
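The gap between the sex-specific and the marginal odds ratio is easy to reproduce numerically. The sketch below uses made-up risks chosen so that the smoking odds ratio is 4 in both sexes. It also assumes that men and women are equally represented among smokers and among non-smokers, so each marginal risk is a plain average. Both the risks and that balance are assumptions for illustration.

```python
import math

def odds(risk):
    """Convert a risk (probability) to odds."""
    return risk / (1 - risk)

# Hypothetical risks chosen so the smoking odds ratio is 4 in both sexes.
R = {(0, 0): 0.2, (0, 1): 0.5,   # males: non-smokers, smokers
     (1, 0): 0.5, (1, 1): 0.8}   # females: non-smokers, smokers

or_male = odds(R[0, 1]) / odds(R[0, 0])
or_female = odds(R[1, 1]) / odds(R[1, 0])

# Assumption: half of the smokers and half of the non-smokers are female,
# so each marginal risk is a plain average over the two sexes.
risk_nonsmokers = (R[0, 0] + R[1, 0]) / 2
risk_smokers = (R[0, 1] + R[1, 1]) / 2
or_marginal = odds(risk_smokers) / odds(risk_nonsmokers)

print(round(or_male, 2), round(or_female, 2), round(or_marginal, 2))
```

Because sex is balanced across the smoking groups here, the smaller marginal odds ratio is not a confounding artifact. It comes from the odds scale itself.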

11 Interaction = Effect Modification Measure
“No interaction” under one definition often means interaction under another definition.
Results from an interaction analysis should always be reported with the scale that was used to measure effect.
Some effect measures are intuitive; some are not intuitive, and not even intrinsically consistent.
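To see how one set of risks can show "no interaction" on one scale and interaction on the others, the sketch below computes the interaction contrast on all three scales. The helper `interaction_contrasts` and the four risks are hypothetical, chosen so that the risks are exactly additive.

```python
def interaction_contrasts(R):
    """Return the interaction contrast on three scales for a 2x2 risk table.

    R maps (sex, smoking) to a risk. A value of 0 (difference scale) or
    1 (ratio scales) means "no interaction" on that scale.
    """
    odds = {cell: r / (1 - r) for cell, r in R.items()}
    return {
        "risk difference": (R[1, 1] - R[1, 0]) - (R[0, 1] - R[0, 0]),
        "risk ratio": (R[1, 1] / R[1, 0]) / (R[0, 1] / R[0, 0]),
        "odds ratio": (odds[1, 1] / odds[1, 0]) / (odds[0, 1] / odds[0, 0]),
    }

# Hypothetical risks that are exactly additive (illustrative values only).
additive = {(0, 0): 0.05, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.25}
contrasts = interaction_contrasts(additive)
for scale, value in contrasts.items():
    print(f"{scale}: {value:.3f}")
```

With these risks the difference-scale contrast is zero up to rounding, while the ratio-scale contrasts are clearly below 1.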

12 Biologic Interaction
Biologic interaction = biologically causal interaction.
Greenland and Rothman argued that “biologic interaction” is reflected by departure from additive risks.
– Counterfactual arguments
– Causal pie arguments
The additive definition is difficult to test directly in case-control studies.

13 Advantages of Logistic Regression
For retrospective studies (e.g., case-control studies), the risk difference and risk ratio cannot be estimated and analyzed. But the odds ratio can!
The odds ratio doesn’t have a boundary effect. Both the risk difference and the risk ratio do, because risks are bounded between 0 and 1:
– Interaction effect must exist under some circumstances.
– May cause problems computationally.
Odds ratio ≈ risk ratio when risks are very small.
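The last point can be illustrated with two made-up pairs of risks that share a risk ratio of 2, one rare and one common. The numbers and the helper functions `risk_ratio` and `odds_ratio` are assumptions for illustration.

```python
def risk_ratio(r1, r0):
    """Risk ratio of exposed (r1) to unexposed (r0)."""
    return r1 / r0

def odds_ratio(r1, r0):
    """Odds ratio of exposed (r1) to unexposed (r0), computed from risks."""
    return (r1 / (1 - r1)) / (r0 / (1 - r0))

# Illustrative risk pairs (exposed, unexposed), each with a risk ratio of 2.
rare = (0.002, 0.001)
common = (0.4, 0.2)

for label, (r1, r0) in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(r1, r0), 3), round(odds_ratio(r1, r0), 3))
```

For the rare outcome the odds ratio stays very close to 2. For the common outcome it is noticeably larger than the risk ratio.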

14 Misconception 1
Interaction terms are treated the same way as main-effect terms:
– Numerical comparison between an interaction coefficient and a main-effect coefficient.
– (logistic regression) Power to detect interaction when “interaction explains half of the total effect.”
– (logistic regression) “Odds ratio” of the interaction.
Fact: They are apples and oranges.

15 Misconception Reinforced by Software
Stata output:
. logistic case v1 v2 v12
Logistic regression    Number of obs = 1530
                       LR chi2(3) =
                       Prob > chi2 =
Log likelihood =       Pseudo R2 =
case | Odds Ratio | Std. Err. | z | P>|z| | [95% Conf. Interval]
v1 |
v2 |
v12 |

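16 Interaction in Logistic Regression
$\mu_{ij} = \log(O_{ij}) = \beta_0 + \beta_1 \times \text{sex} + \beta_2 \times \text{smoking} + \beta_3 \times \text{sex} \times \text{smoking}$
Sex | Smoking: No (0) | Smoking: Yes (1)
Male (0) | $O_{00}$ | $O_{01}$
Female (1) | $O_{10}$ | $O_{11}$
$\mu_{00} = \beta_0$
$\mu_{01} = \beta_0 + \beta_2$
$\mu_{10} = \beta_0 + \beta_1$
$\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3$
Coefficient $\beta$ | $\exp(\beta)$
$\beta_1 = \mu_{10} - \mu_{00}$ | $O_{10} / O_{00}$
$\beta_2 = \mu_{01} - \mu_{00}$ | $O_{01} / O_{00}$
$\beta_3 = (\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})$ | $(O_{11} / O_{10}) / (O_{01} / O_{00})$
$\exp(\beta_1)$ and $\exp(\beta_2)$ are baseline ORs; $\exp(\beta_3)$ is a ratio of odds ratios.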
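The mapping from cell odds to coefficients can be traced with a few lines of arithmetic. The four odds below are made-up values (an assumption for illustration). The code only applies the definitions of the saturated model and does not fit anything.

```python
import math

# Hypothetical odds O_ij, indexed by (sex, smoking); illustrative values only.
O = {(0, 0): 0.25, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}
mu = {cell: math.log(o) for cell, o in O.items()}

beta0 = mu[0, 0]
beta1 = mu[1, 0] - mu[0, 0]                            # sex, among non-smokers
beta2 = mu[0, 1] - mu[0, 0]                            # smoking, among males
beta3 = (mu[1, 1] - mu[1, 0]) - (mu[0, 1] - mu[0, 0])  # interaction

or_smoking_male = O[0, 1] / O[0, 0]
or_smoking_female = O[1, 1] / O[1, 0]

print(math.exp(beta2), or_smoking_male)                      # baseline OR
print(math.exp(beta3), or_smoking_female / or_smoking_male)  # ratio of ORs
```

Here `exp(beta3)` is not the odds ratio of anything on its own. It is the factor by which the smoking odds ratio in females differs from the one in males.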

17 Misconception 2
Interpreting main-effect terms when interaction terms are included in the model:
– Misconception: Evaluating the statistical significance of a “main effect”.
– Fact: A main-effect term should always be included in the model as long as it is involved in some interaction term.
– Misconception: A main-effect coefficient is interpreted as the magnitude of the “main effect” or “marginal effect”.
– Fact: The main-effect coefficient of variable X represents its “baseline effect” when all variables “interacting” with X are zero (i.e. at baseline).
– Its interpretation depends on how other variables are coded (i.e. where the baselines are).

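18 Significance of a Main-Effect Term in Logistic Regression
$\mu_{ij} = \log(O_{ij}) = \beta_0 + \beta_1 \times \text{sex} + \beta_2 \times \text{smoking} + \beta_3 \times \text{sex} \times \text{smoking}$
Sex | Smoking: No (0) | Smoking: Yes (1)
Male (0) | $O_{00}$ | $O_{01}$
Female (1) | $O_{10}$ | $O_{11}$
$\mu_{00} = \beta_0$
$\mu_{01} = \beta_0 + \beta_2$
$\mu_{10} = \beta_0 + \beta_1$
$\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3$
Statistical significance of a term ≡ whether it can be removed.
What would happen if $\beta_2 = 0$? Then $\mu_{01} = \mu_{00}$: smoking has no effect in the group coded sex = 0 (here, males). This means something different when sex is coded differently.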
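The dependence on coding can be shown by computing the smoking coefficient twice, once with males as the reference group and once with females. The odds and the helper `smoking_terms` are hypothetical, introduced only for this illustration.

```python
import math

# Hypothetical odds indexed by (sex, smoking) with 0 = male, 1 = female.
O = {(0, 0): 0.25, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}

def smoking_terms(odds, reference_sex):
    """Return (beta2, beta3) of the saturated model when `reference_sex`
    is the group coded 0 for the sex variable."""
    other_sex = 1 - reference_sex
    log_or_ref = math.log(odds[reference_sex, 1] / odds[reference_sex, 0])
    log_or_other = math.log(odds[other_sex, 1] / odds[other_sex, 0])
    return log_or_ref, log_or_other - log_or_ref

beta2_male_ref, beta3_male_ref = smoking_terms(O, reference_sex=0)
beta2_female_ref, beta3_female_ref = smoking_terms(O, reference_sex=1)

print(math.exp(beta2_male_ref), math.exp(beta2_female_ref))
print(beta3_male_ref, beta3_female_ref)
```

The same data give a different smoking "main effect" under each coding, while the interaction coefficient only changes sign. A test of whether the smoking coefficient is zero therefore answers a different question depending on which sex is the baseline.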

19 One Input Variable is Continuous
$Y = \beta_0 + \beta_1 G + \beta_2 X + \beta_3 G \times X$
A ($G = 0$): $Y_A = \beta_0 + \beta_2 X$
B ($G = 1$): $Y_B = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X$
$\beta_1 = Y_B - Y_A$ when $X = 0$
$\beta_2$ = slope for group A
$\beta_3$ = difference in slopes (B − A)
$\beta_1 = 0$ → same $Y$ when $X = 0$ (often extrapolative and meaningless).
$\beta_2 = 0$ → group A is flat.
$\beta_3 = 0$ → equal slopes.
Not marginal effects.
[Figure: y against x, with one line for G = 0 (group A) and one for G = 1 (group B).]
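One way to see why the group coefficient is tied to X = 0 is to shift the origin of X and rewrite the same model. In the sketch below the coefficients, the shift of 10, and the helpers `predict` and `recenter` are all illustrative assumptions.

```python
import math

def predict(beta, g, x):
    """Linear predictor Y = b0 + b1*G + b2*X + b3*G*X."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * g + b2 * x + b3 * g * x

def recenter(beta, c):
    """Coefficients of the same model after replacing X by X - c."""
    b0, b1, b2, b3 = beta
    return (b0 + b2 * c, b1 + b3 * c, b2, b3)

beta = (1.0, 2.0, 0.5, 0.3)  # illustrative coefficients, X on its raw scale
beta_centered = recenter(beta, c=10.0)

# Both parameterizations describe the same two lines.
same_fit = all(
    math.isclose(predict(beta, g, x), predict(beta_centered, g, x - 10.0))
    for g in (0, 1) for x in (0.0, 5.0, 10.0, 20.0)
)
print(same_fit)
print(beta[1], beta_centered[1])  # group difference at X = 0 vs at X = 10
print(beta[3], beta_centered[3])  # difference in slopes is unchanged
```

The group coefficient changes with the choice of origin even though the fitted lines are identical, whereas the interaction coefficient does not.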

20 Misconception 3
If a set of variables/genes, together with all possible combinations among them (i.e. allowing full interactions), significantly predicts the outcome, then we have found interaction among these variables.
Fact: Interaction is about departure from additive effects. The variables may just have additive effects without interaction.

21 Do We Want Generic Interaction?
A gene is identified to metabolize a carcinogen. Allele A is the putative susceptibility allele.
Goal: Is the risk elevated for those who have carcinogen exposure and carry the risk allele?
Data from Piegorsch et al. (1994), #case/#control by genotype and carcinogen exposure:
Genotype | Exposure: No | Exposure: Yes
aa | 14/30 | 12/34
Aa | 8/20 | 19/19
AA | 9/18 | 18/19
Generic interaction: H0: 4 parameters; Ha: 6 parameters; DF = 2, p = 0.19
Genotype | Carcinogen: No | Carcinogen: Yes
aa | − | 0.76
Aa | |
AA | |

22 Do We Want Generic Interaction?
Approach 2: H0: 2 groups; Ha: 4 groups; DF = 2, p =
Genotype | Carcinogen: No | Carcinogen: Yes
aa | − | 0.77
Aa | − | 2.19
AA | − | 2.08
Approach 3: H0: 1 group; Ha: 3 groups; DF = 2, p =
Genotype | Carcinogen: No | Carcinogen: Yes
aa | − | −
Aa | − | 2.37
AA | − | 2.25
Approach 4: H0: 1 group; Ha: 2 groups; DF = 1, p =
Genotype | Carcinogen: No | Carcinogen: Yes
aa | − | −
Aa | − | 2.31
AA | − | 2.31
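The numbers in these tables can be reproduced from the case/control counts on the previous slide if they are read as odds ratios against a pooled reference group (the cells marked "−"). That reading is an inference from the values, not something the slides state. The helper names below are hypothetical.

```python
# (cases, controls) by genotype and carcinogen exposure, copied from the slides.
counts = {
    ("aa", "no"): (14, 30), ("aa", "yes"): (12, 34),
    ("Aa", "no"): (8, 20),  ("Aa", "yes"): (19, 19),
    ("AA", "no"): (9, 18),  ("AA", "yes"): (18, 19),
}

def pooled_odds(cells):
    """Odds of being a case after pooling the given cells."""
    cases = sum(counts[cell][0] for cell in cells)
    controls = sum(counts[cell][1] for cell in cells)
    return cases / controls

def odds_ratio(cells, reference):
    """Odds ratio of the pooled `cells` against the pooled `reference`."""
    return round(pooled_odds(cells) / pooled_odds(reference), 2)

unexposed = [(g, "no") for g in ("aa", "Aa", "AA")]

# Approach 2: each exposed genotype against all unexposed subjects.
approach2 = {g: odds_ratio([(g, "yes")], unexposed) for g in ("aa", "Aa", "AA")}

# Approach 3: exposed aa joins the reference group.
reference3 = unexposed + [("aa", "yes")]
approach3 = {g: odds_ratio([(g, "yes")], reference3) for g in ("Aa", "AA")}

# Approach 4: exposed carriers of allele A form a single group.
approach4 = odds_ratio([("Aa", "yes"), ("AA", "yes")], reference3)

print(approach2, approach3, approach4)
```

The sketch covers only the point estimates. The tests behind the DF and p entries would need fitted models, which are not attempted here.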

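23 Testing for Interaction While Adjusting for Other Covariates
$\mu_{\text{age},ij} = \log(O_{\text{age},ij}) = \beta_0 + \beta_4\,\text{age} + \beta_1\,\text{sex} + \beta_2\,\text{smoking} + \beta_3\,\text{sex} \times \text{smoking}$
Sex | Smoking: No (0) | Smoking: Yes (1)
Male (0) | $O_{\text{age},00}$ | $O_{\text{age},01}$
Female (1) | $O_{\text{age},10}$ | $O_{\text{age},11}$
$\mu_{\text{age},00} = (\beta_0 + \beta_4\,\text{age})$
$\mu_{\text{age},01} = (\beta_0 + \beta_4\,\text{age}) + \beta_2$
$\mu_{\text{age},10} = (\beta_0 + \beta_4\,\text{age}) + \beta_1$
$\mu_{\text{age},11} = (\beta_0 + \beta_4\,\text{age}) + \beta_1 + \beta_2 + \beta_3$
We are testing for interaction under the assumption that the effects of sex, smoking, and sex×smoking are the same over the whole ranges of the covariates.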