1 Multiple Regression Analysis Multiple Regression Model Sections 16.1 - 16.6

2 The Model and Assumptions
If we can predict the value of a variable on the basis of one explanatory variable, we might make a better prediction with two or more explanatory variables:
- Expect to reduce the chance component of our model
- Hope to reduce the standard error of the estimate
- Expect to eliminate bias that may result if we ignore a variable that substantially affects the dependent variable

3 The Model and Assumptions
The multiple regression model is

  yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + … + βₖxₖᵢ + εᵢ

where
- yᵢ is the dependent variable for the i-th observation
- β₀ is the Y intercept
- β₁, …, βₖ are the population partial regression coefficients
- x₁ᵢ, x₂ᵢ, …, xₖᵢ are the observed values of the independent variables X₁, X₂, …, Xₖ
- there are k explanatory variables
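The model and its assumptions can be made concrete by simulating data from it. This is a minimal sketch, not taken from the slides: the coefficient values, sample size, and error variance below are all assumptions chosen for illustration.

```python
import numpy as np

# Simulate data from the population model
#   y_i = beta0 + beta1*x1_i + beta2*x2_i + eps_i
# with normal, constant-variance, independent errors (the slide's assumptions).
rng = np.random.default_rng(0)
n = 1000
beta0, beta1, beta2 = 5.0, 2.0, -1.5     # assumed population coefficients
x1 = rng.uniform(0, 10, n)               # first explanatory variable
x2 = rng.uniform(0, 10, n)               # second explanatory variable
eps = rng.normal(0, 1.0, n)              # error term: normal, sigma constant
y = beta0 + beta1 * x1 + beta2 * x2 + eps
```

Each simulated yᵢ is a linear function of the Xs plus a random error, which is exactly what the assumptions on the next slide describe.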

4 The Model and Assumptions
The assumptions of the model are the same as those discussed for simple regression:
- The expected value of Y for the given Xs is a linear function of the Xs
- The standard deviation of the Y terms for given X values is a constant, designated σ_y|x
- The observations yᵢ are statistically independent
- The distribution of the Y values (error terms) is normal

5 Interpreting the Partial Regression Coefficients
For each X term there is a partial regression coefficient, βₖ. This coefficient measures the change in E(Y) given a one-unit change in the explanatory variable Xₖ,
- holding the remaining explanatory variables constant
- controlling for the remaining explanatory variables
- ceteris paribus
- Equivalent to a partial derivative in calculus

6 Method of Least Squares - OLS
To estimate the population regression equation, we use the method of least squares.
The model written in terms of the sample notation is

  yᵢ = b₀ + b₁x₁ᵢ + … + bₖxₖᵢ + eᵢ

The sample regression equation is

  ŷᵢ = b₀ + b₁x₁ᵢ + … + bₖxₖᵢ

7 Method of Least Squares - OLS
The goal is to minimize the distance between the predicted values of Y, the ŷᵢ, and the observed values, yᵢ; that is, to minimize the residuals, eᵢ.

  Minimize SSE = Σ eᵢ² = Σ (yᵢ - ŷᵢ)²

8 Method of Least Squares - OLS
Take partial derivatives of SSE with respect to each of the partial regression coefficients and the intercept, and set each equation equal to zero.
- This gives us k+1 equations in k+1 unknowns
- The equations must be independent and non-homogeneous
- Using matrix algebra or a computer, this system of equations can be solved
- With a single explanatory variable, the fitted model is a straight line. With two explanatory variables, the model represents a plane in three-dimensional space. With three or more variables it becomes a hyperplane in higher-dimensional space.
- The sample regression equation is correctly called a regression surface, but we will call it a regression line
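The "k+1 equations in k+1 unknowns" are the normal equations, which in matrix form read (X′X)b = X′y. A minimal sketch of solving them with matrix algebra, on simulated data (the coefficient values below are assumptions for the example, not the slides' results):

```python
import numpy as np

# Simulate a sample whose true coefficients we know.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the intercept, then the explanatory variables.
X = np.column_stack([np.ones(n), x1, x2])

# Solve the normal equations (X'X) b = X'y for b = (b0, b1, b2).
b = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ b   # e_i = y_i - yhat_i
```

Setting the partial derivatives of SSE to zero is equivalent to requiring X′e = 0, i.e. the residuals are orthogonal to every column of the design matrix.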

9 An Example: The Human Capital Model
Consider education as an investment in human capital. There should be a return on this investment in terms of higher future earnings. Most people accept that earnings tend to rise with schooling levels, but this knowledge by itself does not imply that individuals should go on for more schooling. More schooling is usually costly:
- Direct payments (tuition)
- Indirect payments (foregone earnings)
Thus the actual magnitude of the increase in earnings with additional years of schooling is important. We cannot simply calculate the average earnings for a sample of workers with different education levels:
- We have to consider the effects on earnings of other factors, for example, experience in the labor market, age, ability, race and sex

10 An Example: The Human Capital Model
Consider a first simple model:
- (1) Earnings = β₀ + β₁·education + ε
- Expect that the coefficient on education will be positive, β₁ > 0
Realize that most people have higher earnings as they age, regardless of their education.
- If age and education are positively correlated, the estimated regression coefficient on education will overstate the marginal impact of education
- A better model would account for the effect of age:
- (2) Earnings = β₀ + β₁·education + β₂·age + ε

11 A Conceptual Experiment
Multiple regression involves a conceptual experiment that we might not be able to carry out in practice. What we would like to do is to compare individuals with different education levels who are the same age.
- We would then be able to see the effects of education on average earnings, while controlling for age

12 Current Population Survey, White Males, March 1991
What is the effect of an additional year of education?
$31,523.24 - $27,970.59 = $3,552.65
All workers are 40 years old.

  Education    n     Average Annual Earnings
  Educ = 12    227   $27,970.59
  Educ = 13    132   $31,523.24

13 A Conceptual Experiment
Frequently we do not have large enough data sets to be able to ask this type of question. Multiple regression analysis allows us to perform the conceptual exercise of comparing individuals with the same age and different education levels, even if the sample contains no such pairs of individuals.

14 Sample Data
Data were obtained from the March 1992 Current Population Survey.
- The CPS is the source of the official Government statistics on employment and unemployment
- A very important secondary purpose is to collect information such as age, sex, race, education, income and previous work experience
- The survey has been conducted monthly for over 50 years
- About 57,000 households are interviewed monthly, containing approximately 114,500 persons 15 years and older, based on the civilian non-institutional population
- For the multiple regression question, the sample consists of white male respondents 18-65 years old, who spent at least one week in the labor force in the preceding year and who provided information on wage earnings during the preceding year
- Sample size is 30,040
Students: download the Multiple Regression Human Capital hand-out.

15 Sample Statistics

                        age       earn            educ
  Mean                  37.50     27,561.92       13.02
  Standard Error        0.070     119.610         0.017
  Median                36        24,000          13
  Mode                  35        30,000          12
  Standard Deviation    12.19     20,730.89       2.92
  Sample Variance       148.54    429,769,891.2   8.54
  Minimum               18        2               0
  Maximum               65        199,998         20
  Count                 30,040    30,040          30,040

In 1991, the average white male in the sample was 37.5 years old, had 13.0 years of education and earned $27,561.92.

16 Correlation Matrix
Second, consider the correlation matrix, which shows the simple correlation coefficients for all pairs of variables. There is a small but positive correlation between education and age.
- A simple regression of earnings on education will overstate the effect of education, because education is positively correlated with age and age has a strong positive effect on earnings

          age        earn       educ
  age     1
  earn    0.365051   1
  educ    0.072856   0.413496   1
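A correlation matrix like the one above can be computed directly with numpy. This is a toy sketch on simulated data, not the CPS sample, so the numbers will differ from the slide; the slopes and noise scales below are assumptions chosen so that educ is weakly linked to age, as in the lecture's story.

```python
import numpy as np

# Simulate three variables with the qualitative pattern described in the slide:
# educ weakly positively related to age, earn strongly related to both.
rng = np.random.default_rng(42)
n = 5000
age = rng.uniform(18, 65, n)
educ = 12 + 0.02 * (age - 40) + rng.normal(scale=2.9, size=n)
earn = 1500 * educ + 400 * age + rng.normal(scale=15000, size=n)

# Rows/columns in order: age, earn, educ (matching the slide's table).
corr = np.corrcoef(np.vstack([age, earn, educ]))
```

The matrix is symmetric with ones on the diagonal, which is why the slide shows only the lower triangle.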

17 Earnings = β₀ + β₁·education + ε
The regression output on the original slide reported the estimates b₀ and b₁, their standard errors S_b0 and S_b1, and the standard error of the estimate S_e. (From the slides that follow: b₁ = 2,933.78 and S_e = $18,876.)

18 Is Education a Significant Explanatory Variable?
Use a t-test:
- H₀: β₁ ≤ 0 (no relationship)
- H₁: β₁ > 0 (positive relationship)
The t-test statistic is 78.709 and the p-value is 0.000.
- Reject H₀: β₁ ≤ 0. There is a significant positive relationship between education and earnings.

19 Additional Information from the Analysis
For each additional year of schooling, average earnings increase by $2,933.78.
The R² = 0.1710.
- We find that 17.1% of the variation in earnings across workers is explained by variation in education levels
The standard error of the estimate, S_e, equals $18,876.

20 Earnings = β₀ + β₁·education + β₂·age + ε
The regression output on the original slide reported b₀, b₁, b₂, their standard errors, and the standard error of the estimate. (From the slides that follow: b₁ = 2,759.73, b₂ = 572.74, and S_e = $17,545.)

21 Interpret the Coefficients
In terms of this problem:
- For each additional year of schooling, average earnings increase by $2,759.73, controlling for age
- For each additional year of age, average earnings increase by $572.74, controlling for schooling

22 Prediction
Predict the mean earnings for white male workers who are 30 years old and have a college degree.
The standard error of the estimate is

  S_e = sqrt( SSE / (n - k - 1) ) = $17,545

where k = number of explanatory variables.
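The prediction step plugs values of the explanatory variables into the fitted plane. The slope estimates below come from slide 21; the intercept is hypothetical, since the fitted intercept is not reproduced in this transcript, and "college degree" is taken as 16 years of schooling for the illustration.

```python
# Fitted-plane prediction sketch.
b0_hat = -20000.0   # HYPOTHETICAL intercept, not the slide's estimate
b_educ = 2759.73    # per-year effect of education (slide 21)
b_age = 572.74      # per-year effect of age (slide 21)

def predict_mean_earnings(educ: float, age: float) -> float:
    """Point prediction of mean earnings from the fitted regression plane."""
    return b0_hat + b_educ * educ + b_age * age

yhat = predict_mean_earnings(16, 30)  # 16 years of schooling assumed for a degree
```

With the real intercept from the regression output, the same arithmetic gives the slide's predicted mean earnings.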

23 Assessing the Regression as a Whole
We want to assess the performance of the model as a whole.
- H₀: β₁ = β₂ = β₃ = … = βₖ = 0 (the model has no worth)
- H₁: at least one regression coefficient is not equal to zero (the model has worth)
If all the b's are close to zero, then the SSR will approach zero.

24 Assessing the Regression as a Whole
Test statistic:

  F = (SSR / k) / (SSE / (n - k - 1))

where k = the number of explanatory variables.
If the null hypothesis is true, the calculated test statistic will tend to be small (near 1); if the null hypothesis is false, the F test statistic will be "large".
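The F statistic above is simple enough to compute by hand; a short sketch, using toy sums of squares rather than the CPS results:

```python
# F test statistic for the overall regression, as defined on the slide.
def f_statistic(ssr: float, sse: float, n: int, k: int) -> float:
    """F = (SSR/k) / (SSE/(n-k-1)), the ratio of explained to
    unexplained mean squares, with k explanatory variables."""
    return (ssr / k) / (sse / (n - k - 1))

# Toy numbers for illustration (not the slides' ANOVA table):
f = f_statistic(ssr=800.0, sse=200.0, n=103, k=2)  # (800/2)/(200/100) = 200.0
```

A large F means the explained variation per degree of freedom dwarfs the unexplained variation, which is evidence against H₀.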

25 Assessing the Regression as a Whole
The calculated F test statistic is compared with the critical F to determine whether the null hypothesis should be rejected.
- If F(k, n-k-1) > F(α; k, n-k-1), the critical value, reject H₀
(The original slide showed the F distribution with the rejection region to the right of the critical value.)

26 ANOVA Table in Regression
The ANOVA table reports the SSR, the SSE, and the p-value, written as "Significance F", which equals 0.0000. This tells us that there is essentially zero probability of observing a test statistic as large as 5,949.8 if the null hypothesis is true. The model has worth.

27 Inferences Concerning the Population Regression Coefficients
Which explanatory variables have coefficients significantly different from zero? Perform a hypothesis test for each explanatory variable.
- Essentially the same t-test used for simple regression
Hypotheses:
- H₀: βₖ = 0
- H₁: βₖ ≠ 0

28 Inferences Concerning the Population Regression Coefficients
The test statistic is

  t = bₖ / s_bk

- where k = the number of independent variables
- The denominator, s_bk, is the standard error of the regression coefficient bₖ
- Take the standard errors of the regression coefficients from the computer output
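The coefficient t test is one division. In the sketch below the slope comes from slide 21, but the standard error is hypothetical, since the regression output is not reproduced in this transcript.

```python
# t statistic for a single regression coefficient: t = b_k / s_bk.
def t_statistic(b_k: float, se_bk: float) -> float:
    """Standardized coefficient; compare against a t critical value
    with n - k - 1 degrees of freedom."""
    return b_k / se_bk

# b_educ = 2759.73 (slide 21); se = 40.0 is a HYPOTHETICAL standard error.
t_educ = t_statistic(2759.73, 40.0)
# Compare with the critical value 2.326 at alpha = 0.01 (one-tail test).
```

With a sample of 30,040, the t distribution is essentially normal, which is why the slides use 2.326 as the 0.01 one-tail critical value.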

29 Inferences Concerning the Population Regression Coefficients
In our model, there are two explanatory variables, so there will be two tests about population regression coefficients.
Test whether education is a significant variable:
- H₀: β_educ ≤ 0
- H₁: β_educ > 0
Test whether age is a significant variable:
- H₀: β_age ≤ 0
- H₁: β_age > 0
Let α = 0.01; the critical value t(0.01) = 2.326 from the t tables.

30 T-test
The test statistics for education and for age appeared in the output on the original slide; both p-values are < 0.01.
Reject the null hypothesis (one-tail test, α = 0.01): education is significantly and positively related to earnings. Again, we reject the null hypothesis and conclude that age is significantly and positively related to earnings.

31 The Coefficient of Determination and the Adjusted R²
The R² value is still defined as the ratio of the SSR to the SST. We see that 28.38% of the variation in earnings is explained by variation in education and in age.
The simple regression has an R² = 0.1710.
- It appears that adding the new explanatory variable improved the "goodness of fit"
- This conclusion can be misleading
As we add new explanatory variables to our model, the R² never decreases, even when the new explanatory variables are not significant. The SSE always decreases as more explanatory variables are added.
- This is a mathematical property and doesn't depend on the relevance of the additional variables

32 The Coefficient of Determination and the Adjusted R²
If we take into account the degrees of freedom, SSE/(n-k-1) can increase or decrease:
- depending on whether the additional variables are significant explanatory variables or not
Adjust the R² statistic as follows:

  Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)

- Adjusted R² can increase if the additional explanatory variables are important
- It can decrease if the additional explanatory variables are not significant
When comparing regression models with different numbers of explanatory variables, you should compare the adjusted R² to decide which is the best model.
The adjusted R² ≤ 1, but it can take on a value less than zero if the model is very poor.
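The adjustment above is a one-line formula. A sketch applying it to the two-variable model's R² and sample size from the slides:

```python
# Adjusted R^2 penalizes each added explanatory variable:
#   adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Degrees-of-freedom-corrected coefficient of determination."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 = 0.2838 with n = 30,040 and k = 2, from the slides.
adj = adjusted_r2(0.2838, 30040, 2)
```

With n this large the penalty is tiny, so the adjusted R² is only marginally below 0.2838; with small samples and many variables the gap can be substantial.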

33 Online Homework - Chapter 16 Multiple Regression
CengageNOW sixteenth assignment.

