 # Multiple Regression Fenster Today we start on the last part of the course: multivariate analysis. Up to now we have been concerned with testing the significance.

Multiple Regression Fenster

Today we start on the last part of the course: multivariate analysis. Up to now we have been concerned with testing the significance of a relationship between two variables, where one variable is considered the dependent variable and the other the independent variable.

Today we want to extend that discussion in three important ways: (1) looking at tests of statistical significance for bivariate regression; (2) looking at the case where we have more than one independent variable predicting a dependent variable at the same time; (3) looking at both points 1 and 2 at the same time, that is to say, tests of statistical significance where we have more than one independent variable.

Remember that we have three key assumptions of simple regression: (1) linearity, (2) bivariate normality, and (3) homoscedasticity.

In the multiple regression case, assumptions 1 and 3 are the same. Assumption 2 will no longer work, so we change it to MULTIVARIATE NORMALITY. That is to say, $X_1$, $X_2$, etc. are jointly normally distributed with $y$.

The distinction between simple and multiple regression is very straightforward. If we have one independent variable, we have a simple regression. If we have more than one independent variable, we have a multiple regression.

Example of simple regression:

$$y = \alpha + b_1 X_1 + e$$

Examples of multiple regression:

$$y = \alpha + b_1 X_1 + b_2 X_2 + e$$

or

$$y = \alpha + b_1 X_1 + b_2 X_2 + b_3 X_3 + e$$
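To make the two-predictor form concrete, the sketch below estimates $\alpha$, $b_1$, and $b_2$ by ordinary least squares on a small made-up data set. The slides do not say how the estimates are computed; solving the normal equations is one standard route, and the helper names `solve` and `fit_ols` as well as the data are assumptions for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, y):
    """Return [alpha, b1, b2, ...]; xs is a list of rows of predictor values."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data generated exactly from y = 2 + 3*X1 - 1*X2 (no error term),
# so the fit should recover those three numbers up to rounding.
xs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in xs]
alpha, b1, b2 = fit_ols(xs, y)
print(round(alpha, 6), round(b1, 6), round(b2, 6))
```

With real data the error term $e$ is not zero, so the estimates would only approximate the underlying coefficients.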

WHY RUN A MULTIPLE REGRESSION?

(1) Very often more than one independent variable affects a dependent variable. We might get a distorted picture of the relationship(s) we are trying to test if we ignored this effect.

EXAMPLE

Let us say our dependent variable was performance on state-mandated assessments. What possible independent variables are related to this dependent variable?

(a) Socio-economic status of families sending their child to school. (One way to assess this relationship would be to hypothesize that the higher the percentage of students on free and reduced lunch, the lower students will score on state-mandated assessments.)

(b) School size. (Some researchers have argued that the smaller the school, the higher students will score on state-mandated assessments.)

(c) Class size. (Some researchers have argued that the smaller the class size, the higher students will score on state-mandated assessments.)

(d) Per capita student spending. (One could hypothesize that the higher the per capita student expenditures, the higher students will score on state-mandated assessments.)

(e) Quality of teacher. (William Sanders, developer of the Tennessee Value Added System, has data showing that the quality of a teacher can make a difference in student learning.)

(f) (add your own)

Not all of these variables may affect the dependent variable at the same time. However, multiple regression gives us a way to assess whether more than one variable influences a dependent variable at the same time.

(2) Multiple regression reduces unexplained variation, giving education models increased predictive power and strength they would not otherwise have. Why? Multiple regression allows more than one independent variable to predict a dependent variable.

(3) Multiple regression allows us to test a wide range of hypotheses simultaneously.

Interpreting a Multiple Regression Equation

First, let us review how to interpret a bivariate regression equation. In the equation

$$y = \alpha + b_1 X_1 + e$$

$\alpha$ = the predicted value of $y$ when $X_1 = 0$

$b_1$ = for every one-unit increase in $X_1$, we predict $y$ to increase by $b_1$

We could have said

$\alpha$ = the predicted value of $y$ when all X's = 0

$b_1$ = for every one-unit increase in $X_1$, we predict $y$ to increase by $b_1$, holding all other X's equal.

Since we only had one X, it did not matter. In a multiple regression equation, we have more than one X and it will matter.

Let us say we had the following multiple regression equation:

$$y = \alpha + b_1 X_1 + b_2 X_2 + e$$

We interpret the equation in the following way:

$\alpha$ = the predicted value of $y$ when all X's = 0

$b_1$ = for every one-unit increase in $X_1$, we predict $y$ to increase by $b_1$, holding all other X's equal.

$b_2$ = for every one-unit increase in $X_2$, we predict $y$ to increase by $b_2$, holding all other X's equal.

By holding a variable constant we mean controlling for that variable statistically, allowing us to assess the unique effects of $X_1$ and $X_2$ on $y$ simultaneously.
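The "holding all other X's equal" reading can be checked with a few lines of arithmetic. The coefficient values below are hypothetical, chosen only to illustrate the interpretation; they do not come from any data in these slides.

```python
def predict(alpha, b1, b2, x1, x2):
    """Predicted y from a two-predictor regression equation (error term omitted)."""
    return alpha + b1 * x1 + b2 * x2


# Hypothetical coefficients, for illustration only.
alpha, b1, b2 = 10.0, 2.5, -4.0

at_zero = predict(alpha, b1, b2, x1=0, x2=0)  # equals alpha
base = predict(alpha, b1, b2, x1=3, x2=7)
bumped = predict(alpha, b1, b2, x1=4, x2=7)   # X1 up by one unit, X2 unchanged

print(at_zero)        # 10.0
print(bumped - base)  # 2.5
```

The difference between the two predictions equals $b_1$ no matter what value $X_2$ is fixed at, which is what the coefficient's interpretation claims.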

HOW TO TEST HYPOTHESES IN REGRESSION

We test hypotheses in simple and multiple regression in the same manner. We already have beta, a point estimate of the parameter we wish to estimate. What we do not have is any estimate of the variability of beta. The measure of variability we use is called the STANDARD ERROR of beta.

Hypothesis testing in regression is easily done. Take your estimate for beta and divide by the standard error for beta. This gives us a t statistic. You can then look up the significance of the t value in any statistics book, or simply look at your computer printout.
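The division described above can be sketched for the simple regression case. The slides do not give a formula for the standard error; the code assumes the usual textbook one, the square root of the residual variance (with $n - 2$ degrees of freedom) divided by the sum of squared deviations of $X_1$. The function name `slope_t_test` and the data are made up for illustration.

```python
import math


def slope_t_test(x, y):
    """Return (b1, standard error of b1, t statistic) for a simple regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    alpha = mean_y - b1 * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (alpha + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # assumed textbook formula
    return b1, se_b1, b1 / se_b1


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se_b1, t = slope_t_test(x, y)
print(round(b1, 3), round(se_b1, 3), round(t, 3))  # 0.6 0.283 2.121
```

With five observations there are 3 degrees of freedom, and the two-tailed critical value at the .05 level is 3.182, so a t of about 2.12 would not let us reject the null hypothesis for this toy data set.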

HOW TO TEST HYPOTHESES IN MULTIPLE REGRESSION

Note: Remember that a standard error is the standard deviation of a sampling distribution. We have already dealt with the standard error of the mean. Here we are dealing with a different sampling distribution, the sampling distribution of beta.

Hypothesis testing in multiple regression proceeds in a very similar manner to hypothesis testing in simple regression. For multiple regression we have the same formulas to test hypotheses as for bivariate regression:

$$t_1 = \frac{b_1}{\text{SE}(b_1)}$$

for the first independent variable, and

$$t_2 = \frac{b_2}{\text{SE}(b_2)}$$

for the second independent variable. And if you have three independent variables,

$$t_3 = \frac{b_3}{\text{SE}(b_3)}$$

for the third independent variable.

This means we may be able to REJECT the NULL HYPOTHESIS for the first independent variable and NOT BE ABLE TO REJECT THE NULL HYPOTHESIS for a second (and perhaps an additional) independent variable. Since we have more than one independent variable, our tests for each may come to different results.
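A small sketch of how two predictors in the same equation can reach different verdicts. The slides do not show how the standard errors are obtained in the multiple case; the code assumes the usual least-squares result, the square root of the residual variance times the matching diagonal entry of $(X'X)^{-1}$. The helper names and the data are made up, and the noise was chosen by hand so the arithmetic is easy to follow.

```python
import math


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * q for v, q in zip(m[r], m[col])]
    return [row[n:] for row in m]


def t_statistics(xs, y):
    """Return (coefficients, standard errors, t values), intercept first."""
    design = [[1.0] + list(row) for row in xs]
    n, k = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coefs, r)) for r, yi in zip(design, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    ses = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return coefs, ses, [c / se for c, se in zip(coefs, ses)]


# Made-up data: y depends strongly on X1 and only weakly on X2.
x1 = [-1, -1, 1, 1, -1, -1, 1, 1]
x2 = [-1, 1, -1, 1, -1, 1, -1, 1]
noise = [0.5, -0.5, -0.5, 0.5, -0.5, 0.5, 0.5, -0.5]
y = [5 + 3 * a + 0.1 * b + e for a, b, e in zip(x1, x2, noise)]

coefs, ses, ts = t_statistics(list(zip(x1, x2)), y)
print([round(t, 2) for t in ts[1:]])  # t values for X1 and X2
```

Here there are 8 observations and 3 estimated coefficients, so 5 degrees of freedom and a two-tailed .05 critical value of 2.571. The t for $X_1$ is far above it and the t for $X_2$ is far below it, so we would reject the null hypothesis for the first variable but not for the second.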

Dummy Variable Regression

Let us say we had the following model:

$$y = \alpha + b_1 X_1 + e$$

where $X_1$ = gender and $y$ = score on math assessment.

We hypothesized that males had higher math assessment scores than females. Can such a question be handled using regression? Our assumption that the dependent and independent variables be measured at the interval level or higher would seem to be a problem in this case.

For the dependent variable, we have no problem. Assessment scores are an interval-level variable. However, our independent variable, gender, is nominal. Is there anything we can do?

Let us say we coded gender in the following manner: 1 = females, 0 = males.

What implications would this coding system have for our equation?

$$y = \alpha + b_1 X_1 + e$$

Suppose we were trying to predict assessment scores for males. What happens to the beta coefficient for males? Since males are coded zero, the beta coefficient drops out of the equation because ANYTHING times zero = 0.

$$y = \alpha + b_1 \cdot 0 + e$$

Thus the equation for males simplifies to

$$y = \alpha + e$$

For females the beta coefficient does enter the equation and we get

$$y = \alpha + b_1 X_1 + e$$

or

$$y = \alpha + b_1 \cdot 1 + e$$

or

$$y = \alpha + b_1 + e$$

because $X_1 = 1$ and 1 multiplied by ANYTHING = ANYTHING.

This means that beta "shifts" the regression line up or down for the category coded 1 (females), relative to the left-out category (males). If the beta coefficient is negative (as we hypothesize in this case), the regression line would be shifted down. If the beta coefficient is positive, the regression line would be shifted upward.
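The shift can be seen numerically: with a single 0/1 dummy, least squares sets $\alpha$ to the mean of the group coded 0 and $b_1$ to the difference between the two group means. The scores below are invented purely to illustrate the arithmetic and make no claim about real assessment results; `fit_simple` is a hypothetical helper name.

```python
def fit_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b1 * mean_x, b1


# Invented scores: first three cases coded 0 (males), last three coded 1 (females).
gender = [0, 0, 0, 1, 1, 1]
score = [70, 80, 90, 60, 70, 80]

alpha, b1 = fit_simple(gender, score)
print(alpha)       # 80.0, the mean of the group coded 0
print(alpha + b1)  # 70.0, the mean of the group coded 1
```

A negative $b_1$ of 10 points moves the prediction for the group coded 1 down from 80 to 70, which is the downward shift described above.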

By coding nominal-level variables as 0 and 1, we can deal with nominal-level data in regression equations. On a substantive level, our equation would be very uninteresting. Test developers go to considerable effort to ensure that assessments are "gender neutral".

Consider a different example:

$$y = \alpha + b_1 X_1 + b_2 X_2 + b_3 X_3 + e$$

where

$X_1$ = gender, coded the same as before

$X_2$ = math preparation [number of prior courses]

$X_3$ = math preparation [type of course preparation: AP Calculus or algebra]

Now we hypothesize that males have higher math assessment scores than females ($H_1$), AND the greater the number of prior math courses, the higher the performance on the math assessment ($H_2$), AND students who take advanced math courses will have higher scores on a math assessment ($H_3$).

If $X_1$ is significant, then we have some support for the hypothesis that males have higher scores on a math assessment, controlling for differences in math preparation ($H_2$ and $H_3$). If $X_2$ is significant, then we have support for the hypothesis that number of courses is positively related to performance on a math assessment, controlling for differences in gender and type of math preparation ($H_1$ and $H_3$).

If $X_3$ is significant, then we have support for the hypothesis that type of math preparation is associated with performance on a math assessment, controlling for differences in gender and number of courses ($H_1$ and $H_2$). This three-variable multiple regression equation would be a MUCH more demanding test of the hypothesis concerning gender differences in math achievement than the simple regression equation.

If $X_1$ retains significance after controlling for number and type of math courses, then we would have more confidence in the conclusion that there were gender differences in math achievement. If $X_1$ loses significance after controlling for number and type of courses, then we would know that the differences in math performance attributed to gender at the bivariate level were more properly related to number and type of math courses and (perhaps) that males were more likely to take these courses.

Two points follow from this example:

(1) We can control for other variables and still introduce dummy variables in the equation.

(2) SPSS does not know the difference between dummy variables and any other type of variable. YOU DO!! YOU ARE THE ONE WHO INTERPRETS THE VARIABLES. SPSS cannot interpret variables for you.

Dummy variables with more than two categories

For gender we had a variable with two categories: males and females. One dummy variable picked up the effect for gender. In general, if we have z categories, we need z-1 dummy variables.

Example: consider the variable region of a country with the following coding scheme: 1 = north, 2 = south, 3 = midwest, 4 = west.

We have 4 categories, so we need 3 dummy variables to account for region. We want to create the following coding scheme:

| Region | $X_1$ | $X_2$ | $X_3$ |
|---|---|---|---|
| North | 1 | 0 | 0 |
| South | 0 | 1 | 0 |
| Midwest | 0 | 0 | 1 |
| West | 0 | 0 | 0 |

That is to say, for $X_1$ a state has a value of 1 if it is in the north; otherwise it has a code of zero. For $X_2$ a state has a value of 1 if it is in the south; otherwise it has a code of zero. For $X_3$ a state has a value of 1 if it is in the midwest; otherwise it has a code of zero. We do not need to do anything for the west. The west will represent the left-out category, and we will interpret the alpha term as the western region.
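The same recoding can be sketched outside SPSS. The Python function below turns the 1 to 4 region code into the three dummies from the coding scheme; the names `REGIONS` and `dummy_code` are hypothetical, and this is only one way to write the recode.

```python
# Region codes as given in the coding scheme.
REGIONS = {1: "north", 2: "south", 3: "midwest", 4: "west"}


def dummy_code(region_code, reference="west"):
    """Return (X1, X2, X3) for north, south, midwest; the reference category is all zeros."""
    name = REGIONS[region_code]
    categories = [r for r in REGIONS.values() if r != reference]
    return tuple(1 if name == c else 0 for c in categories)


for code in (1, 2, 3, 4):
    print(REGIONS[code], dummy_code(code))
```

Four categories produce three dummies, matching the z-1 rule, and the west comes out as (0, 0, 0) because it is the left-out category.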

How can region be set up for dummy variable regression in SPSS? EASY! We can create new variables using the COMPUTE function.

Now we can test the impact of region in a regression framework. Let us say we had the following equation:

$$y = \alpha + b_1 X_1 + b_2 X_2 + b_3 X_3 + e$$

In this equation:

$X_1$ = NORTH

$X_2$ = SOUTH

$X_3$ = MIDWEST

$b_1$ = the shift of the regression line for the NORTH

$b_2$ = the shift of the regression line for the SOUTH

$b_3$ = the shift of the regression line for the MIDWEST

$\alpha$ = the reference group, the predicted value of $y$ when all X's = 0 [the west in this case]

$y$ = the dependent variable

FUNDAMENTAL ASSUMPTION IN DUMMY VARIABLE REGRESSION: The slope coefficients for all categories of the nominal-level variable are equal. The only change is in the intercept term. If you want to hypothesize different slopes, you need to consider interaction terms.
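The slides do not write out an interaction model, so the following is only a sketch of one common form. The symbols $D$ (a 0/1 dummy) and $X$ (an interval-level predictor) are assumptions introduced here for illustration.

$$y = \alpha + b_1 D + b_2 X + b_3 (D \cdot X) + e$$

When $D = 0$ the equation reduces to

$$y = \alpha + b_2 X + e$$

and when $D = 1$ it becomes

$$y = (\alpha + b_1) + (b_2 + b_3) X + e$$

Without the product term, the two groups differ only by the intercept shift $b_1$. With it, $b_3$ lets the slope on $X$ differ between the groups as well.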

STATISTICAL CONTROLS

Regression controls for variables statistically. By controlling for a variable statistically, we mean accounting for that portion of the variance that UNIQUELY contributes to the explanation of the dependent variable, holding all other statistical effects constant.

Another example looking at income differences by gender

$$y = \alpha + b_1 X_1 + b_2 X_2 + b_3 X_3 + e$$

where

$y$ = yearly earned income

$X_1$ = gender, coded 1 for females, 0 for males

$X_2$ = number of years of formal education

$X_3$ = years in the labor force

Gender would be extremely significant, since census data indicate that females earn about \$0.74 for every \$1.00 earned by males. However, what could we conclude from this equation? Could we argue that females are discriminated against?

Is there any other interpretation of the same equation, consistent with the hypothesis that females make less money than males? Let us say that we think that females have lower income than males because females have lower educational attainment. Additionally, let us hypothesize that the greater the number of years in the labor force, the higher the income level.

With those three variables, we would have a much more demanding test of our hypothesis that females earn less than males. The simple model

$$y = \alpha + b_1 X_1 + e$$

does not allow us to control for education level and number of years in the labor force. Additionally, one could think of many other variables that could influence yearly earned income (number of years out of the labor force to deal with child-raising responsibilities, etc.).
