8.4 Introduction
- In this section we extend simple linear regression, which had one explanatory variable, to allow for any number of explanatory variables.
- We expect to build a model that fits the data better than the simple linear regression model.
Introduction
We shall use computer printout to:
- Assess the model
  - How well does it fit the data?
  - Is it useful?
  - Are any required conditions violated?
- Employ the model
  - Interpreting the coefficients
  - Making predictions using the prediction equation
  - Estimating the expected value of the dependent variable
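Multiple Regression Model
We allow for k explanatory variables to potentially be related to the response variable:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
- $y$: dependent variable
- $x_1, x_2, \dots, x_k$: independent variables
- $\beta_0, \beta_1, \dots, \beta_k$: coefficients
- $\varepsilon$: random error variable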
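The Multiple Regression Model
Idea: Examine the linear relationship between one response variable ($y$) and two or more explanatory variables ($x_i$).
Population model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
- $\beta_0$: y-intercept
- $\beta_1, \dots, \beta_k$: population slopes
- $\varepsilon$: random error
Estimated multiple regression model:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
- $\hat{y}$: estimated (or predicted) value of y
- $b_0$: estimated intercept
- $b_1, \dots, b_k$: estimated slope coefficients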
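Simple Linear Regression
[Figure: scatter plot with a fitted line, axes x and y. At $x_i$ the plot marks the observed value of y, the predicted value of y on the line, and the random error $\varepsilon_i$ for this x value as the gap between them. Intercept = $\beta_0$, slope = $\beta_1$.]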
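Multiple Regression, 2 explanatory variables
[Figure: three-dimensional scatter plot with a fitted plane.]
- With two explanatory variables the fit is a least squares plane instead of a line.
- The scatter of points around the plane is random error.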
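Multiple Regression Model
Two-variable model
[Figure: three-dimensional plot with axes $x_1$, $x_2$, and y, showing a sample observation $y_i$ at $(x_{1i}, x_{2i})$, the fitted value $\hat{y}_i$ on the plane, and the residual $e_i = y_i - \hat{y}_i$.]
The best-fit equation, $\hat{y}$, is found by minimizing the sum of squared errors, $\sum e_i^2$.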
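To make the minimization concrete, here is a minimal sketch of a least squares fit that solves the normal equations in plain Python. The function name `fit_least_squares` and the five data points are hypothetical and chosen only for illustration; in practice statistical software does this step.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    # Design matrix: a leading 1 for the intercept, then the k predictors.
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term),
# so the fit should recover those three coefficients.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_least_squares(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the hypothetical data contain no error term, every residual is zero and the fitted plane passes through all the points. With real data the residuals are nonzero and the same procedure returns the plane with the smallest sum of squared residuals.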
Required conditions for the error variable
- The error $\varepsilon$ is normally distributed.
- The mean of the error is zero, and its standard deviation $\sigma_\varepsilon$ is constant for all values of the explanatory variables.
- The errors are independent.
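The conditions can be illustrated by simulating an error variable that satisfies them. This is a sketch only: the standard deviation `SIGMA` and the number of draws `N` are hypothetical values, not quantities from the exam example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SIGMA = 5.0   # hypothetical constant standard deviation of the error
N = 10_000    # hypothetical number of simulated observations

# Independent draws from a normal distribution with mean 0 and the same
# standard deviation for every observation.
errors = [random.gauss(0.0, SIGMA) for _ in range(N)]

print("mean of errors:", round(statistics.mean(errors), 3))
print("sd of errors:  ", round(statistics.stdev(errors), 3))
```

In a real analysis the errors are not observed, so the residuals from the fitted model are examined in their place.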
8.4 Estimating the Coefficients and Assessing the Model
The procedure used to perform regression analysis:
1. Obtain the model coefficients and statistics using statistical software.
2. Diagnose violations of required conditions, and try to remedy any problems identified.
3. Assess the model fit using statistics obtained from the sample.
4. If the model assessment indicates a good fit to the data, use the model to interpret the coefficients and generate predictions.
Estimating the Coefficients and Assessing the Model, Example
Predicting final exam scores in BUS/ST 350
- We would like to predict final exam scores in BUS/ST 350.
- Use information generated during the semester.
- Predictors of the final exam score:
  - Exam 1
  - Exam 2
  - Exam 3
  - Homework total
Estimating the Coefficients and Assessing the Model, Example
- Data were collected from 203 randomly selected students from previous semesters.
- The following model is proposed:
  final exam = $\beta_0 + \beta_1\,\text{exam1} + \beta_2\,\text{exam2} + \beta_3\,\text{exam3} + \beta_4\,\text{hwtot}$
- Sample of the data (columns: exam1, exam2, exam3, hwtot, finalexm):
  80601597270753597695903308410092272643448535188352005525140293
Regression Analysis, Excel Output
[Excel regression output]
This is the sample regression equation (sometimes called the prediction equation):
Final exam score = … + … exam1 + … exam2 + … exam3 + … hwtot
Interpreting the Coefficients
- b0 = … This is the intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept.
- b1 = … In this model, for each additional point on exam 1, the final exam score increases on average by b1 points (assuming the other variables are held constant).
Interpreting the Coefficients
- b2 = … In this model, for each additional point on exam 2, the final exam score increases on average by b2 points (assuming the other variables are held constant).
- b3 = … For each additional point on exam 3, the final exam score increases on average by b3 points (assuming the other variables are held constant).
- b4 = … For each additional point on the homework, the final exam score increases on average by b4 points (assuming the other variables are held constant).
Final Exam Scores, Predictions
Predict the average final exam score of a student with the following exam scores and homework score:
- Exam 1 score: 75
- Exam 2 score: 79
- Exam 3 score: 85
- Homework score: 310
Use the TREND function in Excel.
Final exam score = … + …(75) + …(79) + …(85) + …(310) = …
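The prediction is a direct evaluation of the sample regression equation at the student's scores. The sketch below uses hypothetical coefficients, since the estimates from the printout are not reproduced in the text. The helper name `predict_final` is likewise an assumption for illustration.

```python
# Hypothetical coefficients for illustration only; the actual estimates
# come from the regression printout.
coefficients = {"intercept": 10.0, "exam1": 0.10, "exam2": 0.20,
                "exam3": 0.30, "hwtot": 0.05}


def predict_final(scores, coef=coefficients):
    """Evaluate b0 + b1*exam1 + b2*exam2 + b3*exam3 + b4*hwtot."""
    return coef["intercept"] + sum(coef[name] * value
                                   for name, value in scores.items())


# The student described above.
student = {"exam1": 75, "exam2": 79, "exam3": 85, "hwtot": 310}
prediction = predict_final(student)

# One extra point on exam 1, other scores held constant, changes the
# prediction by exactly the exam1 coefficient.
bumped = dict(student, exam1=student["exam1"] + 1)
change = predict_final(bumped) - prediction
print(round(prediction, 2), round(change, 2))
```

The second calculation mirrors the interpretation of a slope coefficient: it is the change in the predicted score per additional point on one predictor, with the others fixed.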
Model Assessment
The model is assessed using three tools:
- The standard error of the residuals
- The coefficient of determination
- The F-test of the analysis of variance
The standard error of the residuals is also used in computing the other two tools.
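Standard Error of Residuals
- The standard deviation of the error variable is estimated by the standard error of the residuals:
  $s_\varepsilon = \sqrt{\dfrac{SSE}{n - k - 1}}$
- The magnitude of $s_\varepsilon$ is judged by comparing it to the mean value of y, $\bar{y}$.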
Regression Analysis, Excel Output
[Excel regression output, annotated as follows]
- Standard error of the residuals: $s_\varepsilon = \sqrt{MSE}$
- (Standard error of the residuals)$^2$: MSE = SSE/198
- Sum of squares of residuals: SSE
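As a sketch of how these quantities relate, the following computes the standard error of the residuals from a list of residuals. The residual values and the helper name `residual_standard_error` are hypothetical. Only the degrees of freedom (n = 203, k = 4) come from the example.

```python
import math


def residual_standard_error(residuals, k):
    """sqrt(SSE / (n - k - 1)) for a model with k explanatory variables."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)   # sum of squares of residuals
    mse = sse / (n - k - 1)                # mean square error
    return math.sqrt(mse)


# Degrees of freedom in the exam example: n = 203 students, k = 4 predictors.
df_error = 203 - 4 - 1  # the divisor 198 used for MSE

# Hypothetical residuals from a small model with k = 1 predictor.
residuals = [1.0, -2.0, 0.5, 1.5, -1.0]
s_e = residual_standard_error(residuals, k=1)
print(df_error, round(s_e, 4))
```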
Standard Error of Residuals
- From the printout, $s_\varepsilon$ = …
- Calculating the mean value of y, we have $\bar{y}$ = …
- It seems $s_\varepsilon$ is not particularly small.
- Question: Can we conclude that the model does not fit the data well?
Coefficient of Determination
$R^2$ (like $r^2$ in simple linear regression)
- The proportion of the variation in y that is explained by differences in the explanatory variables $x_1, x_2, \dots, x_k$
- $R^2 = 1 - \dfrac{SSE}{SSTotal}$
- From the printout, $R^2$ = …
- 38.25% of the variation in final exam score is explained by differences in the exam1, exam2, exam3, and hwtot explanatory variables; 61.75% remains unexplained.
- When adjusted for degrees of freedom, adjusted $R^2$ = 36.99%.
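The adjustment for degrees of freedom can be checked with the standard formula, adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$. The sketch below assumes this is the formula behind the reported figure and uses the rounded $R^2$ of 38.25% with n = 203 and k = 4. The function name is hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.3825      # R^2 as reported (rounded to four decimals)
n, k = 203, 4    # sample size and number of explanatory variables

unexplained = 1 - r2
adj = adjusted_r_squared(r2, n, k)
print(f"unexplained: {unexplained:.2%}, adjusted R^2: {adj:.2%}")
```

The result agrees with the reported 36.99% up to rounding of the $R^2$ input.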
Testing the Validity of the Model
- We pose the question: Is there at least one explanatory variable linearly related to the response variable?
- To answer the question we test the hypotheses
  $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
  $H_1$: At least one $\beta_i$ is not equal to zero.
- If at least one $\beta_i$ is not equal to zero, the model has some validity.
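Testing the Validity of the Final Exam Scores Regression Model
The hypotheses are tested by an F test, shown in the Excel output below.
[Excel ANOVA output, annotated as follows]
- Degrees of freedom: regression k = 4; residual n - k - 1 = 198; total n - 1 = 202
- Sums of squares: SSR (regression) and SSE (residual)
- Mean squares: MSR = SSR/k and MSE = SSE/(n - k - 1)
- F = MSR/MSE, reported with its P-value (Significance F)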
Testing the Validity of the Final Exam Scores Regression Model
- [Variation in y] = SSR + SSE.
- A large F results from a large SSR relative to SSE. In that case much of the variation in y is explained by the regression model, the model is useful, and the null hypothesis $H_0$ should be rejected.
- Reject $H_0$ when P-value < 0.05.
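A short sketch shows how F is assembled from the sums of squares. Because the printout's SSR and SSE are not reproduced in the text, the example works with proportions of total variation, taking SSR/SSTotal as the $R^2$ of 38.25% reported for this model (with n = 203 and k = 4). The resulting F is therefore implied by that rounded figure and is not a value read from the printout. The function name is hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE, with MSR = SSR / k and MSE = SSE / (n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


# Proportions of total variation: SSR/SSTotal = R^2 and SSE/SSTotal = 1 - R^2.
# SSTotal cancels in the ratio, so the proportions can stand in for the sums.
r2, n, k = 0.3825, 203, 4
F = f_statistic(ssr=r2, sse=1 - r2, n=n, k=k)
print(round(F, 2))
```

Turning F into a P-value requires the F distribution with k and n - k - 1 degrees of freedom, which the printout reports as Significance F.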
Testing the Validity of the Final Exam Scores Regression Model
[Excel ANOVA output]
- The P-value (Significance F) < 0.05, so reject the null hypothesis.
- Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero. Thus, at least one explanatory variable is linearly related to y.
- This linear regression model is valid.
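Testing the Coefficients
The hypotheses for each $\beta_i$ are
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
Test statistic: $t = \dfrac{b_i - \beta_i}{s_{b_i}}$, which reduces to $b_i / s_{b_i}$ under $H_0$, with d.f. = n - k - 1
[Excel printout]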