Multiple Regression 1 Sociology 5811 Lecture 22 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.


1 Multiple Regression 1 Sociology 5811 Lecture 22 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

2 Announcements None!

3 Multiple Regression Question: What if a dependent variable is affected by more than one independent variable? Strategy #1: Do separate bivariate regressions –One regression for each independent variable This yields separate slope estimates for each independent variable –Bivariate slope estimates implicitly assume that neither independent variable mediates the other –In reality, one variable may have no effect over and above another (e.g., family wealth may have no effect on job prestige over and above education)

4 Multiple Regression Job Prestige: Two separate regression models Both variables have positive, significant slopes

5 Multiple Regression Strategy #2: Use Multiple Regression Multiple regression can examine “partial” relationships –Partial = Relationships after the effects of other variables have been “controlled” (taken into account) This lets you determine the effects of variables “over and above” other variables –And shows the relative impact of different factors on a dependent variable And, you can use several independent variables to improve your predictions of the dependent variable

6 Multiple Regression Job Prestige: 2 variable multiple regression Education slope is basically unchanged Family Income slope decreases compared to bivariate analysis (bivariate: b = 2.07) And, outcome of hypothesis test changes – t < 1.96

7 Multiple Regression Ex: Job Prestige: 2 variable multiple regression 1. Education has a large slope effect controlling for (i.e. “over and above”) family income 2. Family income does not have much effect controlling for education Despite a strong bivariate relationship Possible interpretations: Family income may lead to education, but education is the critical predictor of job prestige Or, family income is wholly unrelated to job prestige… but is coincidentally correlated with a variable that is related to it (education), which generated a spurious “effect”.
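To make this mediation pattern concrete, here is a minimal simulation sketch, not the lecture's actual data, assuming Python with numpy and statsmodels; the variable names and coefficient values are invented for illustration.

```python
# Hypothetical illustration: family income raises education, and education
# raises job prestige, but income has NO direct effect on prestige.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
income = rng.normal(0, 1, n)                      # family income (standardized)
education = 0.6 * income + rng.normal(0, 1, n)    # income -> education
prestige = 2.0 * education + rng.normal(0, 1, n)  # only education -> prestige

# Separate bivariate regressions: both slopes come out positive and significant
print(sm.OLS(prestige, sm.add_constant(income)).fit().params)
print(sm.OLS(prestige, sm.add_constant(education)).fit().params)

# Multiple regression: the income slope shrinks toward zero once education
# is controlled, mirroring the pattern the slides describe
X = sm.add_constant(np.column_stack([education, income]))
print(sm.OLS(prestige, X).fit().summary())
```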

8 The Multiple Regression Model A two-independent variable regression model: Note: There are now two X variables And a slope (b) is estimated for each one The full multiple regression model is: For k independent variables
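The model formulas shown on the original slide are not reproduced in this transcript; the standard forms being described are:

```latex
% Two-predictor model
Y_i = a + b_1 X_{1i} + b_2 X_{2i} + e_i
% General model with k independent variables
Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + e_i
```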

9 Multiple Regression: Slopes Regression slope for the two variable case: b1 = slope for X1, controlling for the other independent variable X2 b2 is computed symmetrically: swap the X1 and X2 terms Compare to bivariate slope:
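The slope formulas on this slide are likewise not transcribed; expressed in terms of correlations and standard deviations, the usual two-predictor expressions (and the bivariate slope for comparison) are:

```latex
% Partial slope for X_1, controlling for X_2 (swap subscripts for b_2)
b_1 = \frac{s_Y}{s_{X_1}} \cdot
      \frac{r_{Y X_1} - r_{Y X_2}\, r_{X_1 X_2}}{1 - r_{X_1 X_2}^{2}}
% Bivariate slope, for comparison
b = r_{Y X} \, \frac{s_Y}{s_X}
```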

10 Multiple Regression Slopes Let’s look more closely at the formulas: What happens to b1 if X1 and X2 are totally uncorrelated? Answer: The formula reduces to the bivariate slope formula What if X1 and X2 are correlated with each other AND X2 is more correlated with Y than X1? Answer: b1 gets smaller (compared to bivariate)

11 Regression Slopes So, if two variables (X1, X2) are correlated and both predict Y: The X variable that is more correlated with Y will have a higher slope in multivariate regression –The slope of the less-correlated variable will shrink Thus, slopes for each variable are adjusted to how well the other variable predicts Y –It is the slope “controlling” for other variables

12 Multiple Regression Slopes One last thing to keep in mind… What happens to b1 if X1 and X2 are almost perfectly correlated? Answer: The denominator approaches zero The slope “blows up”, approaching infinity Highly correlated independent variables can cause trouble for regression models… watch out
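A quick numerical sketch of the "blow up" problem, using plain numpy with made-up data; the tiny 0.01 noise level is just an assumption chosen to force near-perfect correlation between the predictors.

```python
# Hypothetical illustration: two nearly identical predictors make the
# estimated slopes huge and unstable, even though each true slope is 1.0.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.01, n)   # x2 is almost a copy of x1 (r close to 1)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)   # intercept, b1, b2 -- b1 and b2 can land far from 1.0,
               # even though their sum is estimated quite precisely
```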

13 Interpreting Results (Over)Simplified rules for interpretation –Assumes good sample, measures, models, etc. Multivariate regression with two variables: A, B If slopes of A, B are the same as bivariate, then each has an independent effect If A remains large and B shrinks to zero, we typically conclude that the effect of B was spurious, or operates through A If both A and B shrink a little, each has an effect, but some overlap or mediation is occurring

14 Interpreting Multivariate Results Things to watch out for: 1. Remember: Correlation is not causation –Ability to “control” for many variables can help detect spurious relationships… but it isn’t perfect. –Be aware that other (omitted) variables may be affecting your model. Don’t over-interpret results. 2. Reverse causality –Many sociological processes involve bi-directional causality. Regression slopes (and correlations) do not identify which variable “causes” the other. Ex: self-esteem and test scores.

15 Standardized Regression Coefficients Regression slopes reflect the units of the independent variables Question: How do you compare how “strong” the effects of two variables are if they have totally different units? Example: Education, family wealth, job prestige –Education measured in years, b = 2.5 –Family wealth measured on 1-5 scale, b = .18 –Which is a “bigger” effect? Units aren’t comparable! Answer: Create “standardized” coefficients

16 Standardized Regression Coefficients Standardized Coefficients –Also called “Betas” or “Beta Weights” –Symbol: Greek beta with asterisk: β* –Equivalent to Z-scoring (standardizing) all independent variables before doing the regression Formula of the coefficient for Xj: Result: The unit is standard deviations Betas: Indicate the effect of a 1 standard deviation change in Xj on Y
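The beta formula on the slide is not transcribed; the standard conversion from an unstandardized slope is:

```latex
% Standardized ("beta") coefficient for X_j
b_j^{*} = b_j \cdot \frac{s_{X_j}}{s_Y}
```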

17 Standardized Regression Coefficients Ex: Education, family income, and job prestige: An increase of 1 standard deviation in Education results in a .52 standard deviation increase in job prestige Betas give you a sense of which variables “matter most” What is the interpretation of the “family income” beta?

18 R-Square in Multiple Regression Multivariate R-square is much like bivariate: But, SSregression is based on the multivariate regression The addition of new variables results in better prediction of Y, less error (e), higher R-square.
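The R-square formula referenced here, in the same notation as the bivariate case:

```latex
R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}
    = 1 - \frac{SS_{\text{error}}}{SS_{\text{total}}}
```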

19 R-Square in Multiple Regression Example: R-square of .272 indicates that education and parents’ wealth explain 27% of the variance in job prestige “Adjusted R-square” is a more conservative, more accurate measure in multiple regression –Generally, you should report Adjusted R-square.

20 Dummy Variables Question: How can we incorporate nominal variables (e.g., race, gender) into regression? Option 1: Analyze each sub-group separately –Generates a different slope, constant for each group Option 2: Dummy variables –“Dummy” = a dichotomous variable coded to indicate the presence or absence of something –Absence coded as zero, presence coded as 1.

21 Dummy Variables Strategy: Create a separate dummy variable for each nominal category Ex: Gender – make female & male variables –DFEMALE: coded as 1 for all women, zero for men –DMALE: coded as 1 for all men, zero for women Next: Include all but one of the dummy variables in a multiple regression model If two dummies, include 1; if 5 dummies, include 4.
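A small sketch of this coding step, assuming Python with pandas; the data and column names are invented for illustration.

```python
# Hypothetical example: build gender dummies and keep only one of them.
import pandas as pd

df = pd.DataFrame({"gender": ["male", "female", "female", "male"],
                   "prestige": [45, 52, 60, 38]})

df["DFEMALE"] = (df["gender"] == "female").astype(int)  # 1 = woman, 0 = man
df["DMALE"] = (df["gender"] == "male").astype(int)      # 1 = man, 0 = woman

# Only one of the two goes into the model (DMALE = 1 - DFEMALE is redundant).
# pd.get_dummies with drop_first=True automates "include all but one".
dummies = pd.get_dummies(df["gender"], prefix="D", drop_first=True)
print(df.join(dummies))
```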

22 Dummy Variables Question: Why can’t you include DFEMALE and DMALE in the same regression model? Answer: They are perfectly correlated (negatively): r = -1 –Result: Regression model “blows up” For any set of nominal categories, a full set of dummies contains redundant information –DMALE and DFEMALE contain same information –Dropping one removes redundant information.

23 Dummy Variables: Interpretation Consider the following regression equation: Question: What if the case is a male? Answer: DFEMALE is 0, so the entire b2 term becomes zero. –Result: Males are modeled using the familiar regression model: a + b1X + e.
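The equation discussed on this slide and the next is not reproduced in the transcript; from the interpretation given, it is presumably of the form:

```latex
Y_i = a + b_1 X_i + b_2\,\mathrm{DFEMALE}_i + e_i
```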

24 Dummy Variables: Interpretation Consider the following regression equation: Question: What if the case is a female? Answer: DFEMALE is 1, so b2(1) stays in the equation (and is added to the constant) –Result: Females are modeled using a different regression line: (a + b2) + b1X + e –Thus, the coefficient b2 reflects the difference in the constant for women.

25 Dummy Variables: Interpretation Remember, a different constant generates a different line, either higher or lower –Variable: DFEMALE (women = 1, men = 0) –A positive coefficient (b) indicates that women are consistently higher compared to men (on the dependent variable) –A negative coefficient indicates women are lower Example: If the DFEMALE coefficient = 1.2: –“Women are on average 1.2 points higher than men”.

26 Dummy Variables: Interpretation [Scatterplot: HAPPY (0-10) vs. INCOME (0-100,000); women = blue, men = red; overall slope for all data points shown] Note: The lines for men and women have the same slope, but one is higher and the other is lower. The constant differs! If women = 1, men = 0: the constant (a) reflects men only; the dummy coefficient (b) reflects the increase for women (relative to men)

27 Dummy Variables What if you want to compare more than 2 groups? Example: Race –Coded 1=white, 2=black, 3=other (like GSS) Make 3 dummy variables: –“DWHITE” is 1 for whites, 0 for everyone else –“DBLACK” is 1 for Af. Am., 0 for everyone else –“DOTHER” is 1 for “others”, 0 for everyone else Then, include two of the three variables in the multiple regression model.

28 Dummy Variables: Interpretation Ex: Job Prestige Negative coefficient for DBLACK indicates a lower level of job prestige compared to whites –T- and P-values indicate if difference is significant.

29 Dummy Variables: Interpretation Comments: 1. Dummy coefficients shouldn’t be called slopes –Referring to the “slope” of gender doesn’t make sense –Rather, it is the difference in the constant (or “level”) 2. The contrast is always with the nominal category that was left out of the equation –If DFEMALE is included, the contrast is with males –If DBLACK, DOTHER are included, coefficients reflect difference in constant compared to whites.

30 Interaction Terms Question: What if you suspect that a variable has a totally different slope for two different sub-groups in your data? Example: Income and Happiness –Perhaps men are more materialistic -- an extra dollar increases their happiness a lot –If women are less materialistic, each dollar has a smaller effect on happiness (compared to men) Issue isn’t men = “more” or “less” than women –Rather, the slope of a variable (income) differs across groups

31 Interaction Terms Issue isn’t men = “more” or “less” than women –Rather, the slope (coefficient) of a variable (income) differs across groups Again, we want to specify a different regression line for each group –We want lines with different slopes, not parallel lines that are higher or lower.

32 Interaction Terms [Scatterplot: HAPPY (0-10) vs. INCOME (0-100,000); women = blue, men = red; overall slope for all data points shown] Note: Here, the slope for men and women differs. The effect of income on happiness (X1 on Y) varies with gender (X2). This is called an “interaction effect”

33 Interaction Terms Interaction effects: Differences in the relationship (slope) between two variables for each category of a third variable Option #1: Analyze each group separately Option #2: Multiply the two variables of interest: (DFEMALE, INCOME) to create a new variable –Called: DFEMALE*INCOME –Add that variable to the multiple regression model.

34 Interaction Terms Consider the following regression equation: Question: What if the case is male? Answer: DFEMALE is 0, so the b2(DFEM*INC) term drops out of the equation –Result: Males are modeled using the ordinary regression equation: a + b1X + e.
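The equation referenced on this slide and the next is likewise not reproduced in the transcript; from the interpretation given, it is presumably:

```latex
Y_i = a + b_1\,\mathrm{INCOME}_i
        + b_2\,(\mathrm{DFEMALE}_i \times \mathrm{INCOME}_i) + e_i
```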

35 Interaction Terms Consider the following regression equation: Question: What if the case is female? Answer: DFEMALE is 1, so b2(DFEM*INC) becomes b2*INCOME, which is added to b1 –Result: Females are modeled using a different regression line: a + (b1 + b2)X + e –Thus, the coefficient b2 reflects the difference in the slope of INCOME for women.

36 Interaction Terms Interpreting interaction terms: A positive b for DFEMALE*INCOME indicates the slope for income is higher for women vs. men –A negative effect indicates the slope is lower –Size of coefficient indicates actual difference in slope Example: DFEMALE*INCOME. Observed b’s: –Income: b = .5 –DFEMALE * INCOME: b = -.2 Interpretation: Slope is .5 for men, .3 for women.

37 Interaction Terms Continuous variables can also interact Example: Effect of education and income on happiness –Perhaps highly educated people are less materialistic –As education increases, the slope between income and happiness would decrease Simply multiply Education and Income to create the interaction term “EDUCATION*INCOME” –And add it to the model

38 Interaction Terms How do you interpret continuous variable interactions? Example: EDUCATION*INCOME: Coefficient = 2.0 Answer: For each unit change in education, the slope of income vs. happiness increases by 2 –Note: the coefficient is symmetrical: For each unit change in income, the education slope increases by 2 –Dummy interactions result in a slope for each group –Continuous interactions result in many slopes: each value of education implies a different slope for income (and vice versa).
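A minimal sketch of fitting and reading a continuous-by-continuous interaction, assuming Python with statsmodels' formula interface; the data and coefficient values are invented for illustration.

```python
# Hypothetical example: an education*income interaction in a happiness model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({"education": rng.normal(14, 2, n),
                   "income": rng.normal(50, 10, n)})
# Invented data-generating process: the income slope weakens as education rises
df["happy"] = (0.30 * df["income"]
               - 0.01 * df["education"] * df["income"]
               + rng.normal(0, 1, n))

# "education * income" expands to both main effects plus the product term,
# which is the safe specification the next slide recommends
fit = smf.ols("happy ~ education * income", data=df).fit()
print(fit.params)

# The implied income slope at a given education level is:
#   b_income + b_interaction * education
```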

39 Interaction Terms Comments: 1. If you include an interaction term, you should also include its component variables in the model: –A model with “DFEMALE * INCOME” should also include DFEMALE and INCOME –There is some debate on this issue… but that is the safest course of action 2. Sometimes interaction terms are highly correlated with their components Watch out for that.

40 Interaction Terms Question: Can you think of examples of two variables that might interact? Either from your final project? Or anything else?

