1 8.4 Multiple Regression Lecture Unit 8
2 8.4 Introduction In this section we extend simple linear regression, where we had one explanatory variable, to allow for any number of explanatory variables. We expect to build a model that fits the data better than the simple linear regression model.
3 Introduction We shall use computer printout to –Assess the model: How well does it fit the data? Is it useful? Are any required conditions violated? –Employ the model: interpreting the coefficients; making predictions using the prediction equation; estimating the expected value of the dependent variable.
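4 Multiple Regression Model We allow for k explanatory variables to potentially be related to the response variable: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$, where $y$ is the dependent variable, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \beta_1, \dots, \beta_k$ are the coefficients, and $\varepsilon$ is the random error variable.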
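The Multiple Regression Model Idea: Examine the linear relationship between 1 response variable (y) and 2 or more explanatory variables ($x_i$). Population model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$, with Y-intercept $\beta_0$, population slopes $\beta_1, \dots, \beta_k$, and random error $\varepsilon$. Estimated multiple regression model: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$, with estimated (or predicted) value of y $\hat{y}$, estimated intercept $b_0$, and estimated slope coefficients $b_1, \dots, b_k$.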
Simple Linear Regression [Figure: fitted line with intercept $b_0$ and slope $b_1$ on axes x and y, showing for a given $x_i$ the observed value of y, the predicted value of y, and the random error $\varepsilon_i$ between them.]
7 Multiple Regression, 2 explanatory variables Least squares plane (instead of a line). The scatter of points around the plane is random error.
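Multiple Regression Model Two-variable model [Figure: least squares plane over axes $x_1$, $x_2$, and y, showing a sample observation $y_i$ at $(x_{1i}, x_{2i})$, its fitted value $\hat{y}_i$, and the residual $e = (y_i - \hat{y}_i)$.] The best-fit equation, $\hat{y}$, is found by minimizing the sum of squared errors, $\sum e^2$.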
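The least squares idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_least_squares` and `sse` are hypothetical, the data are made up rather than taken from the lecture, and the fit solves the normal equations directly, whereas the lecture relies on statistical software for this step.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals.

    X is a list of rows [x1, ..., xk]; y is a list of responses.
    Solves the normal equations (A^T A) b = A^T y, where A is X with a
    leading column of ones for the intercept. Assumes A^T A is invertible
    (no predictor is an exact linear combination of the others).
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def sse(X, y, b):
    """Sum of squared residuals e_i = y_i - yhat_i for coefficients b."""
    return sum((yi - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)))) ** 2
               for row, yi in zip(X, y))


# Made-up illustrative data (not from the lecture): points lying exactly
# on the plane y = 1 + 2*x1 + 3*x2, so the fitted plane should recover it.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
b = fit_least_squares(X, y)
```

With real data the points scatter around the plane, and `sse(X, y, b)` is the quantity that no other choice of coefficients can make smaller.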
9 Required conditions for the error variable The error is normally distributed. The mean is equal to zero and the standard deviation is constant ($\sigma_\varepsilon$) for all values of y. The errors are independent.
10 8.4 Estimating the Coefficients and Assessing the Model The procedure used to perform regression analysis: –Obtain the model coefficients and statistics using statistical software. –Diagnose violations of required conditions. Try to remedy problems when identified. –Assess the model fit using statistics obtained from the sample. –If the model assessment indicates good fit to the data, use it to interpret the coefficients and generate predictions.
11 Estimating the Coefficients and Assessing the Model, Example Predicting final exam scores in BUS/ST 350 –We would like to predict final exam scores in 350. –Use information generated during the semester. –Predictors of the final exam score: Exam 1, Exam 2, Exam 3, Homework total.
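12 Estimating the Coefficients and Assessing the Model, Example Data were collected from 203 randomly selected students from previous semesters. The following model is proposed: final exam = $\beta_0 + \beta_1\,\text{exam1} + \beta_2\,\text{exam2} + \beta_3\,\text{exam3} + \beta_4\,\text{hwtot} + \varepsilon$. [Data table with columns exam1, exam2, exam3, hwtot, finalexm.]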
13 Regression Analysis, Excel Output This is the sample regression equation (sometimes called the prediction equation): Final exam score = $b_0 + b_1\,\text{exam1} + b_2\,\text{exam2} + b_3\,\text{exam3} + b_4\,\text{hwtot}$, with the numerical estimates read from the Excel output.
14 Interpreting the Coefficients $b_0$: This is the intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept. $b_1$: In this model, for each additional point on exam 1, the final exam score increases on average by $b_1$ (assuming the other variables are held constant).
15 Interpreting the Coefficients $b_2$: In this model, for each additional point on exam 2, the final exam score increases on average by $b_2$ (assuming the other variables are held constant). $b_3$: For each additional point on exam 3, the final exam score increases on average by $b_3$ (assuming the other variables are held constant). $b_4$: For each additional point on the homework, the final exam score increases on average by $b_4$ (assuming the other variables are held constant).
16 Final Exam Scores, Predictions Predict the average final exam score of a student with the following exam scores and homework score: –Exam 1 score 75, –Exam 2 score 79, –Exam 3 score 85, –Homework score 310. –Use the TREND function in Excel. Final exam score = $b_0 + b_1(75) + b_2(79) + b_3(85) + b_4(310)$.
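As a rough sketch of what the prediction step computes, the snippet below evaluates the prediction equation for the student on this slide. The helper `predict` is a hypothetical name, and the coefficients in `b_placeholder` are invented placeholders, not the estimates from the Excel output, so the resulting number has no meaning beyond showing the arithmetic.

```python
def predict(b, x):
    """Evaluate yhat = b[0] + b[1]*x[0] + ... + b[k]*x[k-1]."""
    if len(b) != len(x) + 1:
        raise ValueError("need one intercept plus one slope per predictor")
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Placeholder coefficients b0..b4: NOT the values from the lecture's printout.
b_placeholder = [10.0, 0.1, 0.2, 0.3, 0.05]

# Exam 1, exam 2, exam 3, and homework total for the student on the slide.
student = [75, 79, 85, 310]

y_hat = predict(b_placeholder, student)
```

Substituting the actual estimates for `b_placeholder` would give the same kind of point prediction that the slide obtains from Excel.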
17 Model Assessment The model is assessed using three tools: –The standard error of the residuals –The coefficient of determination –The F-test of the analysis of variance The standard error of the residuals participates in building the other tools.
18 Standard Error of Residuals The standard deviation of the residuals is estimated by the standard error of the residuals: $s = \sqrt{\mathrm{SSE}/(n-k-1)}$. The magnitude of s is judged by comparing it to $\bar{y}$.
19 Regression Analysis, Excel Output [Excel printout with callouts:] standard error of the residuals, $s = \sqrt{\mathrm{MSE}}$; (standard error of the residuals)$^2$, $\mathrm{MSE} = \mathrm{SSE}/198$; sum of squares of residuals, SSE.
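The relationship between SSE, MSE, and the standard error of the residuals can be written out as a short function. This is a minimal sketch: the name `standard_error_of_residuals` and the tiny data set are illustrative assumptions, not part of the lecture. In the final exam example the divisor would be n - k - 1 = 203 - 4 - 1 = 198.

```python
import math


def standard_error_of_residuals(y, y_hat, k):
    """Return s = sqrt(SSE / (n - k - 1)) for k explanatory variables."""
    n = len(y)
    dof = n - k - 1
    if dof <= 0:
        raise ValueError("need more observations than coefficients")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / dof  # (standard error of the residuals) squared
    return math.sqrt(mse)


# Made-up observed and fitted values for a model with one predictor (k = 1).
y_obs = [3.0, 5.0, 7.0, 10.0]
y_fit = [2.0, 6.0, 7.0, 9.0]
s = standard_error_of_residuals(y_obs, y_fit, k=1)
```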
20 Standard Error of Residuals From the printout, s = …. Calculating the mean value of y, we have $\bar{y}$ = …. It seems s is not particularly small. Question: Can we conclude the model does not fit the data well?
21 Coefficient of Determination $R^2$ (like $r^2$ in simple linear regression) The proportion of the variation in y that is explained by differences in the explanatory variables $x_1, x_2, \dots, x_k$: $R^2 = 1 - (\mathrm{SSE}/\mathrm{SSTotal})$. From the printout, $R^2$ = …. 38.25% of the variation in final exam score is explained by differences in the exam1, exam2, exam3, and hwtot explanatory variables; 61.75% remains unexplained. When adjusted for degrees of freedom, Adjusted $R^2$ = 36.99%.
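The two quantities on this slide can be checked with a small calculation. The slide does not state the adjustment formula, so the sketch below assumes the standard one, Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1); the function names are hypothetical.

```python
def r_squared(sse, ss_total):
    """Coefficient of determination: R^2 = 1 - SSE / SSTotal."""
    return 1.0 - sse / ss_total


def adjusted_r_squared(r2, n, k):
    """R^2 adjusted for degrees of freedom (standard formula, assumed here)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values from the slides: R^2 = 38.25%, n = 203 students, k = 4 predictors.
adj = adjusted_r_squared(0.3825, n=203, k=4)
```

With these inputs `adj` comes out at roughly 0.37, in line with the 36.99% on the slide; the small gap is consistent with the 38.25% having been rounded.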
22 Testing the Validity of the Model We pose the question: Is there at least one explanatory variable linearly related to the response variable? To answer the question we test the hypotheses $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$; $H_1$: at least one $\beta_i$ is not equal to zero. If at least one $\beta_i$ is not equal to zero, the model has some validity.
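23 Testing the Validity of the Final Exam Scores Regression Model The hypotheses are tested by what is called an F test, shown in the Excel output. [Excel ANOVA table with callouts: degrees of freedom k = 4, n–k–1 = 198, and n–1 = 202; sums of squares SSR and SSE; mean squares MSR = SSR/k and MSE = SSE/(n–k–1); F = MSR/MSE; P-value.]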
24 Testing the Validity of the Final Exam Scores Regression Model [Variation in y] = SSR + SSE. A large F results from a large SSR. Then much of the variation in y is explained by the regression model; the model is useful, and thus the null hypothesis $H_0$ should be rejected. Reject $H_0$ when P-value < 0.05.
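To make the link between SSR, SSE, and F concrete, here is a small sketch of the F statistic. The function names are hypothetical. Because the sums of squares from the printout are not reproduced in this transcript, the example instead uses the identity F = (R²/k) / ((1 - R²)/(n - k - 1)), which follows from SSR = R²·SSTotal and SSE = (1 - R²)·SSTotal, together with the R² of 38.25% reported on an earlier slide. The P-value itself comes from the F distribution and is left to the statistical software.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with MSR = SSR/k and MSE = SSE/(n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


def f_from_r_squared(r2, n, k):
    """Same statistic written in terms of R^2 (SSTotal cancels out)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# R^2 = 38.25%, n = 203, k = 4, as reported on the slides.
f_value = f_from_r_squared(0.3825, n=203, k=4)
```

For these inputs `f_value` is a little above 30, a large F of the kind this slide describes.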
25 Testing the Validity of the Final Exam Scores Regression Model The P-value (Significance F) < 0.05, so reject the null hypothesis. Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero. Thus, at least one explanatory variable is linearly related to y. This linear regression model is valid.
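26 Testing the Coefficients The hypotheses for each $\beta_i$ are $H_0: \beta_i = 0$, $H_1: \beta_i \neq 0$. Test statistic: $t = (b_i - \beta_i)/s_{b_i}$, where $s_{b_i}$ is the standard error of $b_i$; d.f. = n – k – 1. [Excel printout.]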
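The coefficient test can be sketched as follows. This assumes the usual t statistic for a regression coefficient, t = (b_i - β_i)/s_{b_i}, which reduces to b_i/s_{b_i} under H_0: β_i = 0. The function names and the example estimate and standard error are hypothetical placeholders, not values from the Excel printout.

```python
def t_statistic(b_i, se_b_i, beta_null=0.0):
    """t = (b_i - beta_null) / s_{b_i}; beta_null is 0 under H0: beta_i = 0."""
    if se_b_i <= 0:
        raise ValueError("standard error must be positive")
    return (b_i - beta_null) / se_b_i


def degrees_of_freedom(n, k):
    """Degrees of freedom for the coefficient t test: n - k - 1."""
    return n - k - 1


# Placeholder estimate and standard error (NOT from the lecture's output).
t = t_statistic(0.5, 0.1)

# For the final exam example: n = 203 students, k = 4 predictors.
df = degrees_of_freedom(203, 4)
```

The resulting t value would be compared with a t distribution on `df` degrees of freedom, or equivalently judged by the P-value that the software prints next to each coefficient.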