Extension The General Linear Model with Categorical Predictors.

1 Extension The General Linear Model with Categorical Predictors

2 Extension: Regression can handle different types of predictors, and in the social sciences we are often interested in differences between groups. For now we will concern ourselves with the case of two independent groups, e.g. gender, or Republican vs. Democrat.

3 Dummy coding: There are different ways to code categorical data for regression. In general, to represent a categorical variable you need k-1 coded variables, where k = number of categories/groups. Dummy coding uses zeros and ones to identify group membership. Since we only have two groups, one coded variable is enough: one group is coded 0 (the reference group) and the other is coded 1.
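The k-1 rule can be illustrated with a small sketch. The function name `dummy_code` and the group labels below are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def dummy_code(labels, reference):
    """Build k-1 zero/one indicator columns; the reference level is all zeros."""
    levels = sorted(set(labels))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in labels")
    # One column per non-reference level, so k levels give k-1 columns.
    columns = [level for level in levels if level != reference]
    rows = [[1 if label == level else 0 for level in columns] for label in labels]
    return columns, rows


# Two groups -> a single 0/1 variable (hypothetical labels).
labels = ["control", "treatment", "control", "treatment"]
columns, rows = dummy_code(labels, reference="control")
print(columns)  # ['treatment']
print(rows)     # [[0], [1], [0], [1]]
```

With two groups this reduces to the single 0/1 variable used in the rest of the example.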

4 Dummy coding: Example. At this point we have a simple bivariate correlation/simple regression setting. The correlation between group and the DV is .76. This is sometimes referred to as the point-biserial correlation (r_pb) because one of the variables is categorical. However, don’t be fooled: it is calculated exactly the same way as the Pearson correlation, i.e. you treat the 0/1 grouping variable like any other variable in calculating the correlation coefficient. The sign is arbitrary, since either group could have been coded one or zero, so the coding needs to be noted.
Data as (Group, Outcome) pairs: (0, 3), (0, 5), (0, 7), (0, 2), (0, 3), (1, 6), (1, 7), (1, 8), (1, 9)
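As a rough check that the point-biserial correlation is just the Pearson formula applied to a 0/1 variable, here is a minimal sketch. One thing to flag loudly: the slide text lists only nine observations, while the t-test output later in the deck reports df = 8, which implies ten. The tenth value used here (a 7 in group 1) is an assumption, chosen because it reproduces the reported statistics.

```python
from math import sqrt

# Nine observations come from the slide; the final 7 in group 1 is an
# ASSUMPTION (the slide text appears to be missing one group 1 value).
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 8, 9, 7]


def pearson_r(x, y):
    """Ordinary Pearson correlation; nothing special is done for 0/1 data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(group, outcome)
print(round(r, 2))  # 0.76

# Swapping which group is coded 1 only flips the sign.
flipped = [1 - g for g in group]
print(round(pearson_r(flipped, outcome), 2))  # -0.76
```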

5 Example: Graphical display (figure not reproduced). The R-square is .76² = .577. The regression equation is Y_pred = 4 + 3.4X.
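The fitted line and R-square can be reproduced with ordinary least squares, sketched below. The data are the nine values listed on the example slide plus one assumed observation (a 7 in group 1), which is needed to match the df = 8 reported later; treat that value as an assumption rather than part of the source.

```python
# Nine observations come from the slide; the final 7 in group 1 is an
# ASSUMPTION added to match the reported degrees of freedom.
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 8, 9, 7]

n = len(group)
mean_x = sum(group) / n
mean_y = sum(outcome) / n

# Simple-regression least squares estimates.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(group, outcome))
sxx = sum((x - mean_x) ** 2 for x in group)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R-square as the proportion of outcome variance explained by group.
predicted = [intercept + slope * x for x in group]
ss_residual = sum((y - p) ** 2 for y, p in zip(outcome, predicted))
ss_total = sum((y - mean_y) ** 2 for y in outcome)
r_squared = 1 - ss_residual / ss_total

print(f"Y_pred = {intercept:.1f} + {slope:.1f} * X")  # Y_pred = 4.0 + 3.4 * X
print(f"R-square = {r_squared:.3f}")                  # R-square = 0.577
```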

6 Example  Look closely at the descriptive output compared to the coefficients.  What do you see?

7 The constant: Note again our regression equation, and recall the definitions of the slope and the constant. First the constant: what does “when X = 0” mean in this setting? It means we are in the 0 group. What is the predicted value there? Y_pred = 4 + 3.4(0) = 4. That is the 0 group’s mean. The constant here is thus the reference group’s mean.

8 The coefficient: Now think about the slope. What does a ‘1 unit change in X’ mean in this setting? It means we go from one group to the other. Based on that coefficient, what does the slope represent in this case (i.e. can you derive the coefficient from the descriptive stats in some way)? The coefficient is the difference between the two group means.
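The question on this slide can be answered directly from the descriptive statistics, as in the short sketch below. The data include one assumed observation (a 7 in group 1) beyond the nine listed on the example slide, so the numbers are a reconstruction under that assumption.

```python
from statistics import mean

# Nine observations come from the slide; the final 7 in group 1 is an
# ASSUMPTION added to match the reported degrees of freedom.
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 8, 9, 7]

mean_0 = mean([y for g, y in zip(group, outcome) if g == 0])
mean_1 = mean([y for g, y in zip(group, outcome) if g == 1])

print(mean_0)                     # reference group mean: matches the constant, 4
print(round(mean_1 - mean_0, 1))  # difference between means: matches the slope, 3.4
```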

9 The regression line: The regression line covers the values represented, i.e. 0 and 1 for the two groups, and it passes through each group’s mean. With least squares regression the line always passes through the point (mean of X, mean of Y), though the mean of X here (the proportion of cases coded 1) has no substantive meaning. The constant (if we are using dummy coding) is the mean for the zero (reference) group. The coefficient is the difference between means.
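Why the constant and coefficient take these values can be sketched in a few lines. The symbols b_0, b_1, \bar{Y}_0 and \bar{Y}_1 are notation introduced here for the constant, the coefficient, and the two group means; they are not taken from the slides. With X coded 0/1, the least squares criterion splits into one sum per group, and because b_0 and b_0 + b_1 can vary freely, each sum is minimized at its own group mean:

```latex
\begin{aligned}
\mathrm{SSE}(b_0, b_1) &= \sum_{i:\,X_i = 0} (Y_i - b_0)^2 \;+\; \sum_{i:\,X_i = 1} \bigl(Y_i - (b_0 + b_1)\bigr)^2 \\
b_0 &= \bar{Y}_0, \qquad b_0 + b_1 = \bar{Y}_1 \\
b_1 &= \bar{Y}_1 - \bar{Y}_0
\end{aligned}
```

With the values in this example, b_0 = 4 and b_1 = 3.4, so the group coded 1 has a mean of 4 + 3.4 = 7.4.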

10 Furthermore, the previous analysis gives the same results we would have gotten via a t-test, to which we are about to turn. However, you can now see that the t-test is not a procedure distinct from regression: it is a linear model of some outcome predicted by a grouping variable.
Two Sample t-test. data: Outcome by Group. t = 3.3024, df = 8, p-value = 0.01082. 95 percent confidence interval for the difference in means: 1.025823 to 5.774177
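The equivalence can be checked numerically. The sketch below computes the pooled-variance two-sample t statistic and the t statistic for the regression slope and compares them. Two assumptions are worth flagging: the tenth observation (a 7 in group 1) is not in the slide text and is inferred from the reported df = 8, and the critical value 2.306004 is the standard two-sided 95% t value for 8 degrees of freedom, hard-coded here rather than computed.

```python
from math import sqrt

# Group 0 values come from the slide; in group 1 the final 7 is an ASSUMPTION.
g0 = [3, 5, 7, 2, 3]
g1 = [6, 7, 8, 9, 7]


def mean(values):
    return sum(values) / len(values)


n0, n1 = len(g0), len(g1)
m0, m1 = mean(g0), mean(g1)
df = n0 + n1 - 2

# Pooled-variance (equal variances assumed) two-sample t-test.
ss_within = sum((y - m0) ** 2 for y in g0) + sum((y - m1) ** 2 for y in g1)
pooled_var = ss_within / df
se_diff = sqrt(pooled_var * (1 / n0 + 1 / n1))
t_ttest = (m1 - m0) / se_diff

# The same test as a regression of outcome on a 0/1 dummy variable.
x = [0] * n0 + [1] * n1
y = g0 + g1
mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
se_slope = sqrt((sse / df) / sxx)
t_slope = slope / se_slope

print(round(t_ttest, 4), round(t_slope, 4))  # both 3.3024

# 95% confidence interval for the difference in means (group 1 minus group 0).
T_CRIT_95_DF8 = 2.306004  # two-sided critical value from a t table, df = 8
ci_low = (m1 - m0) - T_CRIT_95_DF8 * se_diff
ci_high = (m1 - m0) + T_CRIT_95_DF8 * se_diff
print(round(ci_low, 3), round(ci_high, 3))
```

The slope's standard error and the standard error of the mean difference are the same quantity here, which is why the two t statistics agree.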

11 Understanding the basics of the general linear model can go a long way toward understanding any analysis. It holds not only here but is also used in more complex univariate and multivariate analyses, and even in some nonlinear situations (e.g. logistic regression) we use ‘generalized’ linear models. Y = Xb + e. For properly specified models, linear models provide reasonable fits and are more intuitive to understand than more complex approaches.
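To connect Y = Xb + e back to the two-group example, here is a minimal sketch that builds the design matrix X (a column of ones plus the dummy variable) and solves the normal equations (X'X)b = X'Y by hand for the 2x2 case. The variable names are illustrative, and the data include one assumed observation (a 7 in group 1) beyond the nine listed on the example slide.

```python
# Nine observations come from the slide; the final 7 in group 1 is an
# ASSUMPTION added to match the reported degrees of freedom.
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3, 5, 7, 2, 3, 6, 7, 8, 9, 7]

# Design matrix X: intercept column and dummy-coded group column.
X = [[1, g] for g in group]

# Entries of the symmetric 2x2 matrix X'X and the vector X'Y.
xtx_00 = sum(row[0] * row[0] for row in X)
xtx_01 = sum(row[0] * row[1] for row in X)
xtx_11 = sum(row[1] * row[1] for row in X)
xty_0 = sum(row[0] * y for row, y in zip(X, outcome))
xty_1 = sum(row[1] * y for row, y in zip(X, outcome))

# Solve (X'X) b = X'Y using the closed-form inverse of a 2x2 matrix.
det = xtx_00 * xtx_11 - xtx_01 * xtx_01
b0 = (xtx_11 * xty_0 - xtx_01 * xty_1) / det
b1 = (-xtx_01 * xty_0 + xtx_00 * xty_1) / det

# e = Y - Xb: the residual vector.
residuals = [y - (b0 * row[0] + b1 * row[1]) for row, y in zip(X, outcome)]

print(b0, b1)  # 4.0 3.4
```

Adding more predictors only adds columns to X; the equation itself keeps the same form.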
