Coding Multiple Category Variables for Inclusion in Multiple Regression


More kinds of predictors for our multiple regression models
Some review of interpreting binary variables
Coding binary variables
 – Dummy coding
 – Effect coding
 – Interpreting b weights of binary coded variables
 – Interpreting weights in a larger model
 – Interpreting r of binary coded variables
Coding multiple-category variables
 – Dummy coding
 – Effect coding
 – Interpreting b weights of coded multiple-category variables
 – Interpreting weights in a larger model
 – Interpreting r of coded multiple-category variables
Comparison coding

Things we've learned so far…

Interpreting b from quantitative & binary predictor variables in bivariate & multivariate models

Bivariate regression
 – both kinds of b can be interpreted as "direction and extent of expected change in y for a 1-unit increase in the predictor"
 – a binary predictor's b can also be interpreted as "direction and extent of the y mean difference between the groups"

Multivariate regression
 – both kinds of b can be interpreted as "direction and extent of expected change in y for a 1-unit increase in that predictor, holding the value of all other predictors constant at 0.0"
 – a binary predictor's b can also be interpreted as "direction and extent of the y mean difference between the groups, holding the value of all other predictors constant at 0.0"

Review of interpreting unit-coded (1 vs. 2) binary predictors…

Correlation
 – r tells the direction & strength of the predictor-criterion relationship
 – r tells which coded group has the larger mean criterion scores (the significance test of r is a test of the mean difference)

Bivariate regression
 – b tells the size & direction of the y mean difference between the groups (the t-test of b is the significance test of the mean difference)
 – a is the expected value of y if x = 0, which can't happen, since the binary variable is coded 1 vs. 2!!

Multivariate regression
 – b tells the size & direction of the y mean difference between the groups, holding all other variables constant at 0.0 (the t-test of b is a test of the group mean difference beyond that accounted for by the other predictors -- ANCOVA)
 – a is the expected value of y if the value of all predictors = 0, which can't happen, since the binary variable is coded 1 vs. 2!!

Coding & transforming predictors for MR models

Categorical predictors will be converted to dummy codes
 – the comparison/control group is coded 0 on every dummy code
 – each other group is the "target group" of one dummy code, coded 1

Quantitative predictors will be centered, usually at the mean
 – centered = score – mean, so the mean of the centered variable = 0

Why?
 – Mathematically: 0s (as the control group & the mean) simplify the math & minimize collinearity complications
 – Interpretively: the "controlling for" in multiple regression weight interpretations is really "controlling for all other variables in the model at the value 0" -- making "0" the comparison group & the mean makes the b interpretations simpler and more meaningful
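As a concrete (made-up) illustration, here is a minimal numpy sketch of dummy-coding a two-group variable and mean-centering a quantitative predictor before fitting the multiple regression; the variable names and scores are hypothetical, not from the original slides.

import numpy as np

# hypothetical data: group (1 = Tx, 2 = Cx comparison), a quantitative predictor & a criterion
group = np.array([1, 1, 1, 2, 2, 2])
motiv = np.array([14., 11., 9., 12., 8., 6.])
y     = np.array([22., 18., 15., 14., 10., 9.])

dc      = (group == 1).astype(float)   # dummy code: Tx -> 1, comparison group -> 0
motiv_c = motiv - motiv.mean()         # centered predictor: its mean is now 0

# multiple regression  y = a + b1*dc + b2*motiv_c, fit by ordinary least squares
X = np.column_stack([np.ones(len(y)), dc, motiv_c])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# a  -> expected y for the comparison group at the mean of the quantitative predictor
# b1 -> Tx vs. comparison mean difference, holding the centered predictor at 0 (its mean)
# b2 -> expected change in y per 1-unit increase in the predictor, holding group constant
print(a, b1, b2)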

Dummy coding for two-category variables

need 1 code (since there is 1 BG df)
 – the comparison condition/group gets coded "0"
 – the treatment or target group gets coded "1"

"conceptually"...
  Group   dc
    1      1
    2*     0
  * = comparison group

For several participants... each case gets the dc value of its group.

Interpretations for dummy coded binary variables

Correlation
 – r tells the direction & strength of the predictor-criterion relationship
 – r tells which coded group has the larger mean criterion scores (the significance test of r is a test of the mean difference)

Bivariate regression
 – R² is the effect size & F is the significance test of the group difference
 – a is the mean of the comparison condition/group
 – b tells the size & direction of the y mean difference between the groups (the t-test of b is the significance test of the mean difference)

Multivariate regression (including other variables)
 – b tells the size & direction of the mean difference between the groups, holding all other variables constant at 0.0 (the t-test of b is a test of the group mean difference beyond that accounted for by the other predictors -- ANCOVA)
 – a is the expected value of y if the value of all predictors = 0
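A small numeric check of these interpretations with made-up scores (only numpy is assumed): with 0/1 dummy coding, the intercept reproduces the comparison-group mean and b reproduces the group mean difference.

import numpy as np

y_cx = np.array([4., 5., 6., 5.])   # comparison group, dc = 0
y_tx = np.array([8., 9., 7., 8.])   # target group,     dc = 1

y  = np.concatenate([y_cx, y_tx])
dc = np.concatenate([np.zeros(4), np.ones(4)])

a, b = np.linalg.lstsq(np.column_stack([np.ones(8), dc]), y, rcond=None)[0]

print(a, y_cx.mean())                 # a reproduces the comparison-group mean
print(b, y_tx.mean() - y_cx.mean())   # b reproduces the group mean difference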

1 & -1 effect coding for two-category variables

need 1 code (since there is 1 BG df)
 – the comparison condition/group gets coded "-1"
 – the treatment or target group gets coded "1"

"conceptually"...
  Group   ec
    1      1
    2*    -1
  * = comparison group

For several participants... each case gets the ec value of its group.

Interpretations for 1 & -1 effect coded binary variables

Correlation
 – r tells the direction & strength of the predictor-criterion relationship
 – r tells which coded group has the larger mean criterion scores (the significance test of r is a significance test of the mean difference)

Bivariate regression
 – R² is the effect size & F is the significance test of the group difference
 – a is the expected grand mean of all participants/groups
 – b tells the size & direction of the y mean difference between the target group & the expected grand mean (1/2 of the group mean difference) (the t-test of b is the significance test of that difference)

Multivariate regression (including other variables)
 – b tells the size & direction of the mean difference between the target group & the expected grand mean (1/2 of the group mean difference), holding all other variables constant at 0.0 (the t-test of b is a test of that difference controlling for all other predictors -- ANCOVA)
 – a is the expected value of y if the value of all predictors = 0
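The same made-up scores recoded 1 & -1 (a verification sketch, not part of the original slides): a now reproduces the mean of the two group means and b is half the group mean difference.

import numpy as np

y_cx = np.array([4., 5., 6., 5.])   # comparison group, ec = -1
y_tx = np.array([8., 9., 7., 8.])   # target group,     ec = +1

y  = np.concatenate([y_cx, y_tx])
ec = np.concatenate([-np.ones(4), np.ones(4)])

a, b = np.linalg.lstsq(np.column_stack([np.ones(8), ec]), y, rcond=None)[0]

print(a, (y_cx.mean() + y_tx.mean()) / 2)   # a = expected grand mean (mean of the group means)
print(b, (y_tx.mean() - y_cx.mean()) / 2)   # b = half of the group mean difference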

What's with all this "expected" stuff???

We have to discriminate the "sample/descriptive grand mean" from the "expected grand mean" -- the difference has to do with equal vs. unequal sample sizes!

How should we estimate the grand mean for 2-group samples? As the mean of all cases in all groups? Or as the average of the group means?

If we have equal n, these two will match!
  Tx  M=8  n=10      mean of cases = 6        <- (8*10 + 4*10) / 20
  Cx  M=4  n=10      mean of group means = 6  <- (8 + 4) / 2

If we have unequal n, these two will not match!
  Tx  M=8  n=30      mean of cases = 7        <- (8*30 + 4*10) / 40
  Cx  M=4  n=10      mean of group means = 6  <- (8 + 4) / 2

"a" will be the "expected grand mean" -- the mean of the group means -- for both equal-n and unequal-n samples
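A quick sketch reproducing the unequal-n example above (1 & -1 effect coding assumed): the case-weighted grand mean is 7, but the regression intercept is the expected grand mean of 6.

import numpy as np

y_tx = np.full(30, 8.0)   # Tx: M = 8, n = 30
y_cx = np.full(10, 4.0)   # Cx: M = 4, n = 10

y  = np.concatenate([y_tx, y_cx])
ec = np.concatenate([np.ones(30), -np.ones(10)])   # 1 / -1 effect code

a, b = np.linalg.lstsq(np.column_stack([np.ones(40), ec]), y, rcond=None)[0]

print(y.mean())   # 7.0 -> sample/descriptive grand mean (weighted by the group ns)
print(a)          # 6.0 -> "expected" grand mean = (8 + 4) / 2, the mean of the group means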

.5 & -.5 effect coding for two-category variables

need 1 code (since there is 1 BG df)
 – the comparison or control condition/group gets coded "-.5"
 – the treatment or target group gets coded ".5"

"conceptually"...
  Group   ec
    1      .5
    2*    -.5
  * = comparison group

For several participants... each case gets the ec value of its group.

SPSS uses this coding system in its GLM & ANOVA procs

Interpretations for .5 & -.5 effect coded binary variables

Correlation -- same as with 1 & -1 effect coding
 – r tells the direction & strength of the predictor-criterion relationship
 – r tells which coded group has the larger mean criterion scores (the significance test of r is a significance test of the mean difference)

Bivariate regression
 – R² is the effect size & F is the significance test of the group difference
 – a is the expected grand mean of all participants/groups
 – b tells the size & direction of the y group mean difference (the t-test of b is the significance test of that difference)

Multivariate regression (including other variables)
 – b tells the size & direction of the y group mean difference, holding all other variables constant at 0.0 (the t-test of b is a test of that difference controlling for all other predictors -- ANCOVA)
 – a is the expected value of y if the value of all predictors = 0
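And with .5 & -.5 coding (same made-up scores as before), a is still the expected grand mean while b becomes the full group mean difference:

import numpy as np

y_cx = np.array([4., 5., 6., 5.])   # comparison group, ec = -.5
y_tx = np.array([8., 9., 7., 8.])   # target group,     ec =  .5

y  = np.concatenate([y_cx, y_tx])
ec = np.concatenate([np.full(4, -.5), np.full(4, .5)])

a, b = np.linalg.lstsq(np.column_stack([np.ones(8), ec]), y, rcond=None)[0]

print(a, (y_cx.mean() + y_tx.mean()) / 2)   # a = expected grand mean
print(b, y_tx.mean() - y_cx.mean())         # b = the full group mean difference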

Dummy coding for multiple-category variables

can't use the 1=Tx1, 2=Tx2, 3=Cx values put into SPSS -- the conditions aren't quantitatively different

need k-1 codes (one for each BG df)
 – the comparison or control condition/group gets "0" for all codes
 – each other group gets "1" for one code and "0" for all others

"conceptually" (for k = 3 groups)...
  Group   dc1   dc2
    1      1     0
    2      0     1
    3*     0     0
  * = comparison group

For several participants... each case gets the dc1 & dc2 values of its group.

Interpreting dummy codes for multiple-category variables

Multiple regression including only the k-1 dummy codes
 – R² is the effect size & F is the significance test of the group differences
 – a is the mean of the comparison condition/group
 – each b tells the size/direction of the y mean difference between that group & the comparison group (the t-test of b is the significance test of that mean difference)

Multivariate regression (including other variables)
 – b tells the size/direction of the y mean difference between that group & the comparison group, holding all other predictors constant at 0.0 (the t-test of b is a test of the y mean difference between the groups, beyond that accounted for by the other predictors -- ANCOVA)
 – a is the expected value of y if the value of all predictors = 0

Correlation
 – Don't interpret the r of k-group dummy codes!!!!!!! more later
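A sketch with three made-up groups (means 12, 10 & 8, Group 3 as the comparison) confirming that a is the comparison-group mean and each b is that group's difference from the comparison group:

import numpy as np

# hypothetical scores; group means are 12, 10 & 8, Group 3 is the comparison group
y   = np.concatenate([np.array([11., 13., 12., 12., 12.]),    # Group 1
                      np.array([ 9., 11., 10., 10., 10.]),    # Group 2
                      np.array([ 7.,  9.,  8.,  8.,  8.])])   # Group 3 (comparison)
dc1 = np.repeat([1., 0., 0.], 5)
dc2 = np.repeat([0., 1., 0.], 5)

X = np.column_stack([np.ones(15), dc1, dc2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

print(a)    # 8.0 -> mean of the comparison group (the group with all dummy codes = 0)
print(b1)   # 4.0 -> Group 1 mean minus the comparison-group mean
print(b2)   # 2.0 -> Group 2 mean minus the comparison-group mean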

A set of k-1 dummy codes gives the "simple analytic comparisons" we looked at in Psyc941

notice -- you won't get all the pairwise information
 – for k=3 groups you'll get 2 of the 3 pairwise comparisons
 – for k=4 groups you'll get 3 of the 6 pairwise comparisons
 – for k=5 groups you'll get 4 of the 10 pairwise comparisons

often the "largest" or "most common" group is used as the comparison
 – this gives the comparison of each other group to it
 – but it doesn't give the comparisons among the others

using the comparison group with the "middle-most mean"
 – G1 = 12   G2 = 10   G3 = 8   ->  use G2 as the comparison
 – dc1 = (G1 vs G2)    dc2 = (G3 vs G2)
 – remember that the omnibus F tells us about the largest pairwise difference

The omnibus F(2,57) with p < .05 tells us 12 > 8; dc1 with p > .05 tells us 12 = 10; dc2 with p > .05 tells us 10 = 8
The omnibus F(2,57) with p < .05 tells us 12 > 8; dc1 with p < .05 tells us 12 > 10; dc2 with p < .05 tells us 10 > 8

We (usually) don't interpret bivariate correlations between dummy codes for k>2 groups and the criterion. Why??

The b-weights of k-group dummy codes have the interpretation we give them in a multiple regression because of the collinearity pattern produced by the set of coding weights. Correlated with the criterion separately, they have different meanings that we (probably) don't care about.

  Group   dc1   dc2
    1      1     0
    2      0     1
    3*     0     0

Taken by itself, dc1 compares Group 1 with the average of Groups 2 & 3 -- a complex comparison that is not comparable to the interpretation of b1 in the multiple regression.

Taken by itself, dc2 compares Group 2 with the average of Groups 1 & 3 -- another, different complex comparison that is not comparable to the interpretation of b2 in the multiple regression.
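A quick illustration of why, using the same made-up data: correlating dc1 by itself with the criterion gives a point-biserial r that contrasts Group 1 with Groups 2 & 3 pooled, which is not what b1 means in the full model.

import numpy as np

y   = np.concatenate([np.array([11., 13., 12., 12., 12.]),    # Group 1 (mean 12)
                      np.array([ 9., 11., 10., 10., 10.]),    # Group 2 (mean 10)
                      np.array([ 7.,  9.,  8.,  8.,  8.])])   # Group 3 (mean 8)
dc1 = np.repeat([1., 0., 0.], 5)

r = np.corrcoef(dc1, y)[0, 1]
print(r)   # positive: Group 1 (coded 1) vs. Groups 2 & 3 pooled (both coded 0),
           # a complex comparison -- not the "Group 1 vs. comparison group" difference b1 tests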

1 & -1 effect coding for multiple-category variables

again need k-1 codes (one for each BG df)
 – the comparison/control condition/group gets "-1" for all codes
 – each other group gets "1" for one code and "0" for all others

"conceptually" (for k = 3 groups)...
  Group   ec1   ec2
    1      1     0
    2      0     1
    3*    -1    -1
  * = comparison group

For several participants... each case gets the ec1 & ec2 values of its group.

Interpreting 1 & -1 effect codes for multiple-category variables

Multiple regression including only the k-1 effect codes
 – R² is the effect size & F is the significance test of the group differences
 – a is the expected grand mean of the groups
 – each b tells the size/direction of the difference between that group's mean & the expected grand mean (the t-test of b is the significance test of that difference)

Multivariate regression (including other variables)
 – b tells the size/direction of the y mean difference between that group & the expected grand mean, holding the values of the other predictor variables constant at 0.0 (the t-test of b is a test of the difference between that group mean & the grand mean, controlling for the other predictors -- ANCOVA)
 – a is the expected value of y if the value of all predictors = 0 (which no one has!)

Correlation
 – Don't interpret the r of k-group effect codes!!!!!!! next slide
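The same three made-up groups recoded with 1 & -1 effect codes (Group 3 as the comparison): a becomes the expected grand mean and each b is that group's deviation from it.

import numpy as np

y   = np.concatenate([np.array([11., 13., 12., 12., 12.]),    # Group 1 (mean 12)
                      np.array([ 9., 11., 10., 10., 10.]),    # Group 2 (mean 10)
                      np.array([ 7.,  9.,  8.,  8.,  8.])])   # Group 3 (mean 8), comparison
ec1 = np.repeat([1., 0., -1.], 5)
ec2 = np.repeat([0., 1., -1.], 5)

X = np.column_stack([np.ones(15), ec1, ec2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

print(a)    # 10.0 -> expected grand mean (mean of the three group means)
print(b1)   #  2.0 -> Group 1 mean minus the expected grand mean
print(b2)   #  0.0 -> Group 2 mean minus the expected grand mean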

We (usually) don't interpret bivariate correlations between effect codes for k>2 groups and the criterion. Why??

The b-weights of k-group effect codes have the interpretation we give them in a multiple regression because of the collinearity pattern produced by the set of coding weights. Correlated with the criterion separately, they have different meanings that we (probably) don't care about.

  Group   ec1   ec2
    1      1     0
    2      0     1
    3*    -1    -1

Taken by itself, ec1 appears to be a quantitative variable that lines up the groups 3 – 2 – 1, with equal spacing -- not true!!

Taken by itself, ec2 appears to be a quantitative variable that lines up the groups 3 – 1 – 2, with equal spacing -- not true!!

.5 & -.5 effect coding for multiple-category variables

again need k-1 codes (one for each BG df)
 – the comparison/control condition/group gets "-.5" for all codes
 – each other group gets ".5" for one code and "0" for all others

"conceptually" (for k = 3 groups)...
  Group   ec1   ec2
    1      .5    0
    2      0     .5
    3*    -.5   -.5
  * = comparison group

For several participants... each case gets the ec1 & ec2 values of its group.

Interpreting .5 & -.5 effect codes for multiple-category variables

Multiple regression including only the k-1 effect codes
 – R² is the effect size & F is the significance test of the group differences
 – a is the expected grand mean of the groups (not the sample grand mean!)
 – each b tells the size/direction of the difference between that group's mean & the expected grand mean, but doubled -- it is 2 × the corresponding 1 & -1 effect-code b, not the difference from the comparison group (the t-test of b is the significance test of that difference)

Multivariate regression (including other variables)
 – b tells the size/direction of that same (doubled) deviation of the group mean from the expected grand mean, holding the values of the other predictor variables constant at 0.0 (the t-test of b is a test of that difference, controlling for the other predictors -- ANCOVA)
 – a is the expected value of y if the value of all predictors = 0 (which no one has!)
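A quick check of that interpretation with made-up scores (group means 12, 10 & 5, Group 3 as the comparison): each b comes out as twice the group's deviation from the expected grand mean, not as the difference from the comparison group.

import numpy as np

y   = np.concatenate([np.array([11., 13., 12., 12., 12.]),    # Group 1 (mean 12)
                      np.array([ 9., 11., 10., 10., 10.]),    # Group 2 (mean 10)
                      np.array([ 4.,  6.,  5.,  5.,  5.])])   # Group 3 (mean 5), comparison
ec1 = np.repeat([.5, 0., -.5], 5)
ec2 = np.repeat([0., .5, -.5], 5)

X = np.column_stack([np.ones(15), ec1, ec2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

print(a)    # 9.0 -> expected grand mean: (12 + 10 + 5) / 3
print(b1)   # 6.0 -> 2 x (12 - 9); Group 1 minus the comparison group would be 7
print(b2)   # 2.0 -> 2 x (10 - 9); Group 2 minus the comparison group would be 5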

Comparison codes for multi-category predictors

This is the same thing as the "simple and complex analytic comparisons" -- the comparisons can be orthogonal or not
 – folks who like them like the "focused nature" of the comparisons
 – those who don't, don't like the "untested assumptions" &/or the dissimilarity to the more common pairwise comparisons

Example… these codes will (see the sketch below):
 – cc1 compares the average of the two Tx groups to the control
 – cc2 compares the two Tx groups
 – each b is the specific mean difference & its t is the significance test

e.g. (one set of codes that does this):
         Tx1    Tx2    Cx
  cc1    1/3    1/3   -2/3
  cc2    1/2   -1/2     0

For these codes, we could interpret the r of cc1 -- it contrasts Tx1 & Tx2 (pooled) vs. Cx. But we should not interpret the r from cc2 -- it is not a quantitative variable!!!
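A sketch verifying those code values with made-up scores (Tx1 mean = 12, Tx2 mean = 10, Cx mean = 8); the fractional codes are one possible set consistent with the slide's description, chosen so that each b equals the stated mean difference, not necessarily the slide's original values.

import numpy as np

y   = np.concatenate([np.array([11., 13., 12., 12., 12.]),    # Tx1 (mean 12)
                      np.array([ 9., 11., 10., 10., 10.]),    # Tx2 (mean 10)
                      np.array([ 7.,  9.,  8.,  8.,  8.])])   # Cx  (mean 8)
cc1 = np.repeat([1/3,  1/3, -2/3], 5)   # average of Tx1 & Tx2 vs. Cx
cc2 = np.repeat([1/2, -1/2,  0.0], 5)   # Tx1 vs. Tx2

X = np.column_stack([np.ones(15), cc1, cc2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

print(b1)   # 3.0  -> (Tx1 mean + Tx2 mean)/2 - Cx mean = 11 - 8
print(b2)   # 2.0  -> Tx1 mean - Tx2 mean = 12 - 10
print(a)    # 10.0 -> expected grand mean of the three groups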