# Extension: The General Linear Model with Categorical Predictors



Extension  Regression can actually handle different types of predictors, and in the social sciences we are often interested in differences between groups  For now we will concern ourselves with the two independent groups case  E.g. gender, republican vs. democrat etc.

Dummy coding  There are different ways to code categorical data for regression, and in general, to represent a categorical variable you need k-1 coded variables 1  k = number of categories/groups  Dummy coding involves using zeros and ones to identify group membership, and since we only have two groups, one group will be zero (the reference group) and the other 1

Dummy coding  Example  The thing to note at this point is that we have a simple bivariate correlation/simple regression setting  The correlation between group and the DV is.76  This is sometimes referred to as the point biserial correlation (r pb ) because of the categorical variable  However, don’t be fooled, it is calculated exactly the same way as the Pearson before i.e. you treat that 0,1 grouping variable like any other in calculating the correlation coefficient  However, the sign is arbitrary since either group could have been a one or zero, and so that needs to be noted Group Outcome 03 05 07 02 03 16 17 18 19

Example  Graphical display  The R-square is.76 2 =.577  The regression equation is

Example  Look closely at the descriptive output compared to the coefficients.  What do you see?

The constant  Note again our regression equation  Recall the definition for the slope and constant  First the constant, what does “when X = O” mean here in this setting?  It means when we are in the O group  What is that predicted value?  Y pred = 4 + 3.4(0) = 4  That is the group’s mean  The constant here is thus the reference group’s mean

The coefficient  Now think about the slope  What does a ‘1 unit change in X’ mean in this setting?  It means we go from one group to the other  Based on that coefficient, what does the slope represent in this case (i.e. can you derive that coefficient from the descriptive stats in some way?)  The coefficient is the difference between means

The regression line  The regression line covers the values represented, i.e. 0 and 1 for the two groups  It passes through each group’s mean  With least squares regression the line always passes through the mean of X and the mean of Y, though the mean of X here is nonsensical as a group value  The constant (if we are using dummy coding) is the mean of the zero (reference) group  The coefficient is the difference between means

 Furthermore, the previous gives the same results we would have gotten via a t-test, to which we now turn  However, you can now see that the t-test is not a procedure distinct from regression: it is a linear model with some outcome predicted by a grouping variable

    Two Sample t-test
    data: Outcome by Group
    t = 3.3024, df = 8, p-value = 0.01082
    95 percent confidence interval:
      1.025823 5.774177
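The equivalence can be verified with a hand-computed pooled two-sample t-test (plain Python, example data as above); the numerator is exactly the regression slope, and the t statistic matches the output shown:

```python
from statistics import mean, variance

# Example data split by group
group0 = [3, 5, 7, 2, 3]
group1 = [6, 7, 8, 9, 7]

n0, n1 = len(group0), len(group1)
diff = mean(group1) - mean(group0)              # 3.4, the regression slope

# Pooled variance across the two groups (equal-variance t-test)
sp2 = ((n0 - 1) * variance(group0) +
       (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
se = (sp2 * (1 / n0 + 1 / n1)) ** 0.5           # standard error of the difference
t = diff / se

print(round(t, 4), "df =", n0 + n1 - 2)  # 3.3024 df = 8
```

The same t and p-value would come from the test of the slope coefficient in the regression output.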

 Understanding the basics of the general linear model goes a long way toward one’s ability to understand any analysis  It not only holds specifically here but underlies more complex univariate and multivariate analyses, and even some nonlinear situations (e.g. logistic regression), where we use ‘generalized’ linear models  Y = Xb + e  For properly specified models, linear models provide reasonable fits and an intuitive understanding relative to more complex approaches
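The matrix form Y = Xb + e can be made concrete for this example by solving the normal equations b = (X′X)⁻¹X′Y for a two-column design matrix (an intercept column plus the dummy column); a sketch in plain Python, hand-inverting the 2×2 system:

```python
# Design matrix X: intercept column (all 1s) and the 0/1 dummy column
X = [[1, g] for g in [0] * 5 + [1] * 5]
Y = [3, 5, 7, 2, 3, 6, 7, 8, 9, 7]

# Entries of X'X and X'Y for the two-column case
s11 = sum(row[0] * row[0] for row in X)
s12 = sum(row[0] * row[1] for row in X)
s22 = sum(row[1] * row[1] for row in X)
t1 = sum(row[0] * y for row, y in zip(X, Y))
t2 = sum(row[1] * y for row, y in zip(X, Y))

# Solve the 2x2 normal equations via the explicit inverse
det = s11 * s22 - s12 * s12
b0 = ( s22 * t1 - s12 * t2) / det
b1 = (-s12 * t1 + s11 * t2) / det
print(b0, b1)  # 4.0 3.4
```

The same machinery, with more columns in X, carries over to multiple predictors and to the more complex analyses mentioned above.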