Graphical Exploration of Statistical Interactions
Nick Jackson, University of Southern California, Department of Psychology, 10/25/2013
Overview
What is Interaction?
2-Way Interactions
◦ Categorical X Categorical
◦ Continuous X Categorical
◦ Continuous X Continuous
3-Way Interactions
◦ Categorical X Continuous X Continuous
◦ Continuous X Continuous X Continuous
◦ Time in a Three-Way Interaction
4-Way and beyond
What is an Interaction?
Equivalent statements:
◦ The relationship between X and Y depends on the level of a third variable Z.
◦ Z modifies the effect of X on Y.
◦ The relationship between X and Y is different at differing levels of Z.
Also called Moderation or Effect Modification.
Moderation is a stupid term.
◦ Moderation (n): The avoidance of excess or extremes.
◦ Moderate (v): To make or become less extreme or intense.
Those are kinda the opposite of what we mean when we say moderation in a statistical sense.
What is an Interaction?
As SEM diagrams: [Figure: two path diagrams, one with nodes X, Y, and Z, and one with nodes X, Y, X*Z, and Z]
What is an Interaction?
[Figure: two plots of Y against X. One shows Z=0 and Z=1 separately and is labeled "Z modifies the effect of X on Y". The other is labeled "Effect of X on Y if we ignore Z".]
Types of Interaction
Quantitative Interaction Only: The difference between X(0) and X(1) is significantly different between Z(0) and Z(1) (X*Z, p<0.05), though these differences are not qualitatively different (visually, they look about the same). This occurs as a result of substantial power.
Qualitative Interaction: The difference between X(0) and X(1) may or may not be significantly different between Z(0) and Z(1); however, these differences are qualitatively different (i.e., it really does look like an interaction).
[Figure: for each type of interaction, plots of Y at X=0 and X=1, shown for Z=0 and Z=1]
Graphing the Interaction
Why graph?
◦ Interpreting the interaction coefficient(s) is not always intuitive.
Two ways to graph:
◦ 1) Look at observed means/values
   – Represents your actual data
   – Very easy to do in any package
   – Does not represent the statistical model being used
◦ 2) Look at marginal (predicted) means/values from the regression equation
   – A direct representation of the statistical model you are using
   – For interactions with continuous variables, it allows you to see where the interaction is occurring
Graphing the Interaction
More about marginal (predicted) means/values from the regression equation
The general idea:
◦ Take the regression equation and predict values for the different levels of your variables X and Z.
◦ For any covariates, use their mean levels.
◦ An example (the fitted equation implied by the calculations is Y = 75 + 20.5(Diabetes) + 15(Gender) + 10.5(Diabetes*Gender)). Find the predicted means:
   Diabetes=1, Gender=1: 75 + 20.5(1) + 15(1) + 10.5(1*1) = 121
   Diabetes=0, Gender=1: 75 + 20.5(0) + 15(1) + 10.5(0*1) = 90
   Diabetes=1, Gender=0: 75 + 20.5(1) + 15(0) + 10.5(1*0) = 95.5
   Diabetes=0, Gender=0: 75 + 20.5(0) + 15(0) + 10.5(0*0) = 75
Standard errors of the predictions can also be obtained, though that is a bit more difficult.
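The four predicted means above can be reproduced in a few lines. This is a minimal sketch in Python (not one of the packages named in the talk); the coefficients are the ones in the worked example, while the function and variable names are made up for illustration.

```python
# Coefficients read off the worked example on this slide.
INTERCEPT = 75.0
B_DIABETES = 20.5
B_GENDER = 15.0
B_INTERACTION = 10.5


def predicted_mean(diabetes: int, gender: int) -> float:
    """Predicted outcome for one cell of the 2 x 2 design (0/1 coding)."""
    return (INTERCEPT
            + B_DIABETES * diabetes
            + B_GENDER * gender
            + B_INTERACTION * diabetes * gender)


# One predicted mean per combination of the two 0/1 predictors.
cell_means = {(d, g): predicted_mean(d, g) for d in (0, 1) for g in (0, 1)}

# Effect of diabetes within each gender; the gap between the two effects
# equals the interaction coefficient.
diabetes_effect = {g: cell_means[(1, g)] - cell_means[(0, g)] for g in (0, 1)}

for (d, g), mean in sorted(cell_means.items()):
    print(f"Diabetes={d}, Gender={g}: {mean}")
print("Diabetes effect by gender:", diabetes_effect)
```

Plotting `cell_means` as a bar graph gives the marginal-means picture; `diabetes_effect` shows the same interaction as a difference between two differences.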
Graphing the Interaction (Marginal Estimates)
Available in most software packages:
◦ margins/marginsplot commands in Stata
◦ lsmeans and effects packages in R; predict and predict.lm commands in R. Some good ways to look at interactions in R: http://www.ats.ucla.edu/stat/r/faq/concon.htm
◦ Least-Squares Means (LSMEANS), Slicing, Contrasts, Estimate in SAS
◦ SPSS GLM (emmeans), estimated marginal means
Two-Way Interactions
Categorical X Categorical Interaction
◦ Use bar graphs.
◦ 2 X 2: the two graphs below are equivalent representations of the same interaction, so which is it?
[Figure: bar graph of blood pressure, grouped by gender] Among males, Asians have higher blood pressure than Whites. Among females, Asians have lower blood pressure than Whites.
[Figure: bar graph of blood pressure, grouped by race] Among Whites, females have higher blood pressure than males. Among Asians, females have lower blood pressure than males.
Two-Way Interactions
Continuous X Categorical Interaction
◦ Could make the continuous variable categorical and use a bar graph.
◦ Better idea: use scatter plots with a linear prediction for each category.
We can see that as BMI increases, blood pressure increases more sharply in men than in women. By looking at the confidence intervals we can start to get an idea about when the genders diverge (statistically) in their effects.
Two-Way Interactions
Continuous X Categorical Interaction
◦ Look at how the slope of Gender (the difference between men and women) changes across varying levels of BMI.
◦ We can use the 95% CI to see when these differences become significant.
The differences in mean blood pressure between men and women become more pronounced at higher BMIs, such that women have a lower BP than men as BMI increases. These differences are statistically significant (the 95% CI of the difference does not include 0) past a BMI of around 35.
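The idea of tracking the gender difference and its 95% CI across BMI can be sketched as follows. Everything numeric here is hypothetical: the coefficients, variances, and covariance are invented for illustration and are not the estimates behind the slide, and the interval uses a normal approximation (1.96). The model assumed is BP ~ BMI + Male + BMI*Male.

```python
import math

# Hypothetical estimates from a model BP ~ BMI + Male + BMI*Male.
B_MALE = -14.0          # male minus female difference at BMI = 0
B_BMI_X_MALE = 0.6      # change in that difference per unit of BMI
VAR_MALE = 36.0         # sampling variance of B_MALE
VAR_BMI_X_MALE = 0.04   # sampling variance of B_BMI_X_MALE
COV_MALE_INT = -1.1     # covariance between the two estimates

Z_95 = 1.96  # normal approximation for a 95% interval


def gender_difference(bmi):
    """Return (difference, lower, upper) for male minus female BP at `bmi`."""
    diff = B_MALE + B_BMI_X_MALE * bmi
    # Variance of a linear combination b1 + bmi * b2.
    variance = (VAR_MALE
                + bmi ** 2 * VAR_BMI_X_MALE
                + 2 * bmi * COV_MALE_INT)
    se = math.sqrt(variance)
    return diff, diff - Z_95 * se, diff + Z_95 * se


for bmi in (20, 25, 30, 35, 40):
    diff, lower, upper = gender_difference(bmi)
    flag = "excludes 0" if lower > 0 or upper < 0 else "includes 0"
    print(f"BMI {bmi}: {diff:.1f} [{lower:.1f}, {upper:.1f}] {flag}")
```

The BMI values where the interval stops covering 0 are where the groups differ statistically, which is what the plotted CI band shows at a glance.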
Two-Way Interactions
Continuous X Categorical Interaction
◦ With a categorical variable that has more than two groups
Same as before, just plotting the differences relative to the reference group. Works the same with non-linear continuous variables.
Two-Way Interactions
Continuous X Continuous Interaction
◦ Traditional method: discretize one of the continuous variables, making it categorical, and do the usual procedures for categorical X continuous interactions.
   – Usually +1 and -1 SD (this method sucks)
   – Can miss where the interaction occurs
◦ Newer method: predict values at percentiles of the continuous variables.
   – Generally avoid the extreme percentiles (below the 5th or above the 95th), as the variability is greater at the extremes
◦ Newer method: use 3-D graphing (surface/mesh plots).
   – Same idea as predicting values at the percentiles, but utilizing 3D modeling software
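A rough sketch of the percentile approach next to the traditional +1/-1 SD approach, using only the Python standard library. The cholesterol values and the two coefficients are hypothetical, chosen only to show the mechanics for an assumed model BP ~ BMI + Chol + BMI*Chol.

```python
import statistics

# Hypothetical cholesterol values, invented for illustration only.
cholesterol = [2.9, 3.1, 3.4, 3.6, 3.8, 4.0, 4.3, 4.7, 5.2, 6.0, 7.5]

# Hypothetical coefficients from a model BP ~ BMI + Chol + BMI*Chol.
B_BMI = -1.0
B_BMI_X_CHOL = 0.4


def percentile(values, p):
    """Return the p-th percentile (1 <= p <= 99) by linear interpolation."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[p - 1]


def bmi_slope(chol):
    """Slope of BP on BMI when cholesterol is held at `chol`."""
    return B_BMI + B_BMI_X_CHOL * chol


# Percentile approach: several levels, staying away from the extremes.
levels = {p: percentile(cholesterol, p) for p in (10, 25, 50, 75, 90)}
slopes = {p: bmi_slope(c) for p, c in levels.items()}

# Traditional approach for comparison: only mean - 1 SD and mean + 1 SD.
mean = statistics.mean(cholesterol)
sd = statistics.stdev(cholesterol)
sd_slopes = {"-1 SD": bmi_slope(mean - sd), "+1 SD": bmi_slope(mean + sd)}

for p in sorted(slopes):
    print(f"{p}th percentile (chol={levels[p]:.2f}): BMI slope = {slopes[p]:.2f}")
print(sd_slopes)
```

With five levels instead of two, it is easier to see roughly where along the second variable the slope of the first one changes.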
Two-Way Interactions
Continuous X Continuous Interaction: Predicted values at percentiles
Two-Way Interactions
Continuous X Continuous Interaction: Which way we graph it is fairly arbitrary.
Graphed one way: We can see that the nature of the relationship changes at around a BMI of 30. We could say that BMI has a positive association with blood pressure, and that this relationship is strongest among those with high cholesterol. Those with low cholesterol do not show a relationship of BMI with blood pressure.
Graphed the other way: We can see that the nature of the relationship changes at around a cholesterol value of 3.5. We could say that cholesterol has a positive association with blood pressure, and that this relationship is strongest among those with high BMI. Those with low BMI have a negative relationship, or no relationship, of cholesterol with blood pressure.
Two-Way Interactions
Continuous X Continuous Interaction: Another way to interpret: the 4-Corners Method
◦ Low Chol, Low BMI = 133
◦ Low Chol, High BMI = 125
◦ High Chol, Low BMI = 130
◦ High Chol, High BMI = 155
The combination of being obese (BMI >30) and having high cholesterol results in high BP.
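The four corner values above can be turned into a quick difference-of-differences check. This is only a sketch of the arithmetic in Python; the four numbers come from the slide, and the dictionary layout and names are made up for illustration.

```python
# Predicted blood pressure at the four corners, as given on the slide.
corners = {
    ("low chol", "low BMI"): 133,
    ("low chol", "high BMI"): 125,
    ("high chol", "low BMI"): 130,
    ("high chol", "high BMI"): 155,
}

# Change in predicted BP from low to high BMI, within each cholesterol level.
bmi_effect = {
    chol: corners[(chol, "high BMI")] - corners[(chol, "low BMI")]
    for chol in ("low chol", "high chol")
}

# How much larger the BMI effect is at high cholesterol than at low.
interaction_contrast = bmi_effect["high chol"] - bmi_effect["low chol"]

print(bmi_effect)
print("Difference of differences:", interaction_contrast)
```

Going from low to high BMI changes predicted BP by -8 at low cholesterol and by +25 at high cholesterol, a gap of 33, which is the interaction expressed in corner terms.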
Two-Way Interactions
Continuous X Continuous Interaction: 3D Mesh Plots (MATLAB, SigmaPlot, R)
Same data as before, same interpretation. Use the 4-Corners method.
Why we generally don't use observed data: the surface is not smooth.
[Figure: two 3D mesh plots, "Observed Data" and "Marginal Estimates Data"]
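A mesh plot of marginal estimates is just the regression equation evaluated over a grid of the two continuous variables. A minimal sketch of building that grid in Python, assuming a model BP ~ BMI + Chol + BMI*Chol; the coefficients and grid ranges are hypothetical and are not the values behind the plots in the talk.

```python
# Hypothetical coefficients for BP ~ BMI + Chol + BMI*Chol.
B0 = 150.0
B_BMI = -1.2
B_CHOL = -10.0
B_BMI_X_CHOL = 0.4


def predict_bp(bmi, chol):
    """Predicted blood pressure at one (BMI, cholesterol) point."""
    return B0 + B_BMI * bmi + B_CHOL * chol + B_BMI_X_CHOL * bmi * chol


# Evenly spaced grid values (hypothetical ranges).
bmi_values = [20 + 2 * i for i in range(11)]       # 20, 22, ..., 40
chol_values = [3.0 + 0.5 * i for i in range(9)]    # 3.0, 3.5, ..., 7.0

# surface[row][col]: rows follow chol_values, columns follow bmi_values.
surface = [[predict_bp(b, c) for b in bmi_values] for c in chol_values]

# The four corners of the grid are the "4-Corners" summary of the surface.
four_corners = {
    ("low chol", "low BMI"): surface[0][0],
    ("low chol", "high BMI"): surface[0][-1],
    ("high chol", "low BMI"): surface[-1][0],
    ("high chol", "high BMI"): surface[-1][-1],
}
print(four_corners)
```

Because every grid point comes from the fitted equation, the resulting surface is smooth, unlike a surface drawn through the observed data.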
Two-Way Interactions
Continuous X Continuous Interaction: Useful for non-linear continuous interactions (Response Surface Model)
Three-Way Interactions
Now things get complicated.
◦ Variables W*X*Z are used to predict Y.
◦ The interaction of X*Z is different at differing levels of W
◦ Or X*W is different at differing levels of Z
◦ Or Z*W is different at differing levels of X
◦ Or the relationship of X and Y is different according to the levels of W and Z, etc.
◦ Substantially easier when one of X, W, or Z is categorical
Three-Way Interactions
Substantially easier when one of X, W, or Z is categorical, so we pick a small set of values of one of the variables to predict over, treating it as semi-discrete (quartiles?).
Often Time is the third variable.
◦ We are interested in whether the interaction of X*Z changes over Time (W).
Three-Way Interactions
Categorical X Continuous X Continuous Interaction: Sleep Medication (Y/N) * BMI * Pulse
◦ Stratify on the categorical variable (Sleep Meds).
The interaction of BMI and Pulse exists only for those on sleep medications.
Three-Way Interactions
Another way to look at this is how the difference in Apnea between those on sleep medications and those not on them changes depending upon the relationship of pulse and BMI.
Three-Way Interactions
Continuous X Continuous X Continuous Interaction: Glucose Level * BMI * Pulse
◦ Stratify on Glucose.
Asks the question: How does the interaction of Pulse and BMI change across levels of glucose?
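Stratifying on glucose amounts to asking what the BMI*Pulse interaction coefficient is at a given glucose level. In a model with a three-way product term, that conditional interaction is the two-way coefficient plus the three-way coefficient times glucose. A small Python sketch, with hypothetical coefficients and hypothetical glucose quartiles:

```python
# Hypothetical coefficients from a model of Apnea that includes
# BMI*Pulse and BMI*Pulse*Glucose product terms.
B_BMI_X_PULSE = -0.008
B_BMI_X_PULSE_X_GLUCOSE = 0.0001

# Hypothetical glucose levels to stratify on (for example, quartiles).
glucose_levels = {"25th": 85.0, "50th": 95.0, "75th": 110.0}


def bmi_pulse_interaction(glucose):
    """BMI*Pulse interaction coefficient with glucose held at `glucose`."""
    return B_BMI_X_PULSE + B_BMI_X_PULSE_X_GLUCOSE * glucose


conditional = {
    label: bmi_pulse_interaction(g) for label, g in glucose_levels.items()
}

for label, value in conditional.items():
    print(f"Glucose at {label} percentile: BMI*Pulse interaction = {value:.4f}")
```

Each entry corresponds to one stratified panel: the larger the value, the more the BMI and Pulse lines fan out in that panel.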
Three-Way Interactions
Continuous X Continuous X Continuous Interaction: Glucose Level * BMI * Pulse
◦ Look at how the slopes of Glucose on Apnea change.
Asks the question: How does the relationship of Glucose to Apnea change across levels of BMI and pulse?
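The slope of Glucose on Apnea at given values of BMI and pulse collects every coefficient that involves glucose. A minimal Python sketch of that calculation; the coefficients and the low/high values of BMI and pulse are hypothetical, chosen only to show how the slope is assembled.

```python
# Hypothetical coefficients for the terms of the model that involve Glucose.
B_GLUCOSE = 0.05
B_GLUCOSE_X_BMI = -0.002
B_GLUCOSE_X_PULSE = -0.001
B_GLUCOSE_X_BMI_X_PULSE = 0.0001


def glucose_slope(bmi, pulse):
    """Slope of Apnea on Glucose with BMI and pulse held at the given values."""
    return (B_GLUCOSE
            + B_GLUCOSE_X_BMI * bmi
            + B_GLUCOSE_X_PULSE * pulse
            + B_GLUCOSE_X_BMI_X_PULSE * bmi * pulse)


# Hypothetical low and high values for each of the other two variables.
bmi_points = {"low BMI": 25, "high BMI": 35}
pulse_points = {"low pulse": 60, "high pulse": 90}

slope_grid = {
    (b_label, p_label): glucose_slope(b, p)
    for b_label, b in bmi_points.items()
    for p_label, p in pulse_points.items()
}

for key, value in slope_grid.items():
    print(key, round(value, 3))
```

Reading across `slope_grid` answers the slide's question directly: it shows how the glucose slope moves as BMI and pulse move.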
Three-Way Interactions
What if we have time as our third variable? Same techniques, but perhaps in the future we won't be limited to just static graphs.
[Figure: Interaction of BMI and Pulse on Apnea Score across Time]
Presenting Data in Motion
Even better, let's do some of this:
◦ http://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty.html
Four-Way Interactions and Beyond
Understanding anything much more complex than a 3-way interaction is difficult without a good way to break down variables into categories.
Classification Techniques/Machine Learning/Exploratory Data Mining
◦ Can take high-dimensional data and find homogeneous groups based upon relationships of continuous/categorical variables.
Take Home Points
Test for interactions at the beginning of model building
◦ Because they are interesting
◦ Because they obscure your main effects
Interactions give us clues about underlying etiology (David Schwartz). It is not enough to detect them; we have to understand why the interaction exists.
◦ We must search for the variable(s) that make interactions go away (mediated moderation)
Modern classification/data mining methods are great at detecting high-dimensional (numerous variables) non-linear interactions.
Stata versions 12 and 13 are amazing at doing these types of plots (margin plots). Also, check out “Interpreting and Visualizing Regression Models Using Stata” by Michael Mitchell.