Economics 173 Business Statistics Lecture 18 Fall, 2001 Professor J. Petry

http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/

17.9 Regression Diagnostics - I
The three conditions required for the validity of the regression analysis are:
–The error variable is normally distributed.
–The error variance is constant for all values of x.
–The errors are independent of each other.
How can we diagnose violations of these conditions? For now we will use visual inspection; soon we will conduct formal tests of these conditions.
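Visual inspection starts from the residuals $e_i = y_i - \hat{y}_i$. Below is a minimal pure-Python sketch that computes the residuals you would then plot. It assumes a simple regression and uses hypothetical data. The helper names `fit_simple` and `residuals` are illustrative and do not come from the course software.

```python
from statistics import mean


def fit_simple(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y):
    """Pairs (x_i, e_i) with e_i = y_i - (b0 + b1*x_i), ready for plotting."""
    b0, b1 = fit_simple(x, y)
    return [(xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

for xi, e in residuals(x, y):
    print(f"x = {xi}: residual = {e:+.3f}")
```

Each plot checks one condition. A roughly bell-shaped histogram of the residuals supports normality. No funnel shape when the residuals are plotted against x supports constant variance. No pattern when they are plotted in time order supports independence.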

One Other Issue Before Using the Equation: Outliers
–An outlier is an observation that is unusually small or unusually large.
–Several possibilities need to be investigated when an outlier is observed:
  There was an error in recording the value.
  The point does not belong in the sample.
  The observation is valid.
–Identify outliers from the scatter diagram.
–It is customary to suspect that an observation is an outlier if its |standardized residual| > 2.
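The |standardized residual| > 2 rule can be sketched as follows. This sketch assumes one common definition: standardized residual $= e_i / s_\varepsilon$, with $s_\varepsilon = \sqrt{SSE/(n-2)}$ for simple regression. Some software divides each residual by a slightly different, point-specific standard deviation, so its values can differ a little. The data and the function name `flag_outliers` are hypothetical.

```python
import math


def fit_simple(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def flag_outliers(x, y, cutoff=2.0):
    """Return indices with |e_i / s_eps| > cutoff, plus all standardized residuals."""
    b0, b1 = fit_simple(x, y)
    res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_eps = math.sqrt(sum(e * e for e in res) / (len(x) - 2))
    std_res = [e / s_eps for e in res]
    flagged = [i for i, z in enumerate(std_res) if abs(z) > cutoff]
    return flagged, std_res


# Hypothetical data: roughly y = 2x + 1, except one recorded value (x = 8).
x = list(range(1, 11))
y = [3.1, 4.9, 7.0, 9.1, 10.9, 13.0, 15.1, 30.0, 19.0, 21.1]

flagged, std_res = flag_outliers(x, y)
print("suspected outliers at indices:", flagged)
```

In this example only the point at index 7 (x = 8, y = 30.0) is flagged. That point would then be investigated as the slide describes: a recording error, a point that does not belong in the sample, or a valid observation.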

The outlier causes a shift in the regression line, but some outliers may be very influential.
[Figure: two scatter diagrams with fitted lines, labeled "An outlier" and "An influential observation".]
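One way to see what "influential" means numerically is to refit the line with and without a suspect point and compare the slopes. The minimal sketch below uses hypothetical data: five points show no trend, and a sixth point lies far to the right. The helper name `slope` is illustrative.

```python
def slope(x, y):
    """Least-squares slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: the last point (x = 20) is far from the others.
x = [1, 2, 3, 4, 5, 20]
y = [5, 4, 6, 5, 4, 20]

b1_all = slope(x, y)            # fit using every point
b1_drop = slope(x[:-1], y[:-1])  # fit without the point at x = 20
print(f"slope with the point: {b1_all:.3f}, without it: {b1_drop:.3f}")
```

Dropping the single point at x = 20 changes the slope from about 0.85 to about -0.10. That one observation therefore largely determines the fitted line.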

Procedure for regression diagnostics
–Develop a model that has a theoretical basis.
–Gather data for the two variables in the model.
–Draw the scatter diagram to determine whether a linear model appears to be appropriate.
–Check the required conditions for the errors.
–Assess the model fit.
–If the model fits the data, use the regression equation.

Chapter 18 Multiple Regression
18.1 Introduction
In this chapter we extend the simple linear regression model and allow for any number of independent variables. We expect to build a model that fits the data better than the simple linear regression model does.

We will use computer printout to
–Assess the model
  How well does it fit the data?
  Is it useful?
  Are any required conditions violated?
–Employ the model
  Interpreting the coefficients
  Predictions using the prediction equation
  Estimating the expected value of the dependent variable

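18.2 Model and Required Conditions
We allow for k independent variables to potentially be related to the dependent variable:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
where y is the dependent variable, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \dots, \beta_k$ are the coefficients, and $\varepsilon$ is the random error variable.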

The simple linear regression model allows for one independent variable, x:
$y = \beta_0 + \beta_1 x + \varepsilon$
The multiple linear regression model allows for more than one independent variable:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Note how the straight line $y = \beta_0 + \beta_1 x$ becomes a plane $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, and...
[Figure: a regression line in the (x, y) plane and a regression plane in (x1, x2, y) space.]

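… a parabola becomes a parabolic surface:
$y = b_0 + b_1 x^2$ becomes $y = b_0 + b_1 x_1^2 + b_2 x_2$
[Figure: a parabola in the (x, y) plane and a parabolic surface in (x1, x2, y) space, with intercept $b_0$ marked.]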
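A parabolic surface can still be estimated by ordinary least squares. The model is nonlinear in $x_1$ but linear in the coefficients, so $x_1^2$ simply enters as another column of data. The minimal NumPy sketch below uses hypothetical, noise-free data generated from $y = 3 + 0.5x_1^2 + 2x_2$, so the fit should recover exactly those coefficients.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 3 + 0.5*x1**2 + 2*x2.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = 3 + 0.5 * x1**2 + 2 * x2

# Design matrix: intercept column, the squared term, and x2.
design = np.column_stack([np.ones_like(x1), x1**2, x2])

# Least squares on the columns; the squared term needs no special treatment.
b, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(b, 6))
```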

Required conditions for the error variable $\varepsilon$
–The error $\varepsilon$ is normally distributed with mean equal to zero.
–The error term has a constant standard deviation $\sigma_\varepsilon$ (independent of the values of the independent variables).
–The errors are independent.
These conditions are required in order to
–estimate the model coefficients,
–assess the resulting model.

18.3 Estimating the Coefficients and Assessing the Model
The procedure:
–Obtain the model coefficients and statistics using statistical software.
–Diagnose violations of required conditions. Try to remedy problems when identified.
–Assess the model fit and usefulness using the model statistics.
–If the model passes the assessment tests, use it to interpret the coefficients and generate predictions.
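Statistical software obtains the coefficients by least squares. The minimal NumPy sketch below shows that step on simulated data. The true coefficients are hypothetical, and the name `fit_ols` is illustrative; it is not an Excel or course function.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares b0..bk for y = b0 + b1*x1 + ... + bk*xk + e.

    X is an (n, k) array of independent variables; an intercept column is added.
    Returns the coefficient vector and the residuals.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b, y - design @ b


# Simulated data from hypothetical true coefficients, for illustration only.
rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.uniform(0, 10, size=(n, k))
beta = np.array([5.0, 1.5, -2.0, 0.5])
y = beta[0] + X @ beta[1:] + rng.normal(0, 1.0, size=n)

b, e = fit_ols(X, y)
print(np.round(b, 2))
```

With 100 observations and modest noise, the estimates land close to the hypothetical true values. The residuals `e` are what the diagnostic step would examine.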

Example 18.1 Where to locate a new motor inn?
–La Quinta Motor Inns is planning an expansion.
–Management wishes to predict which sites are likely to be profitable.
–Several areas where predictors of profitability can be identified are:
  Competition
  Market awareness
  Demand generators
  Demographics
  Physical quality

Profitability is measured by Margin. The candidate predictors, grouped by area, are:
–Competition: Rooms, the number of hotel/motel rooms within 3 miles of the site.
–Market awareness: Nearest, the distance to the nearest La Quinta inn.
–Customers: Office space; College enrollment.
–Community: Income, the median household income.
–Physical: Disttwn, the distance to downtown.

–Data were collected from 100 randomly selected La Quinta inns and used to estimate the following suggested model:
$\text{Margin} = \beta_0 + \beta_1\,\text{Rooms} + \beta_2\,\text{Nearest} + \beta_3\,\text{Office} + \beta_4\,\text{College} + \beta_5\,\text{Income} + \beta_6\,\text{Disttwn} + \varepsilon$

Excel output
This is the sample regression equation (sometimes called the prediction equation):
MARGIN = 72.455 - 0.008 ROOMS - 1.646 NEAREST + 0.02 OFFICE + 0.212 COLLEGE - 0.413 INCOME + 0.225 DISTTWN
Assessing this equation
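The prediction equation is a weighted sum, so evaluating it for a candidate site is direct. The sketch below assumes the coefficients exactly as printed above. The function name `predict_margin` and the site values are hypothetical, and each input is in whatever units the original data set used.

```python
# Coefficients copied from the Excel prediction equation for Example 18.1.
INTERCEPT = 72.455
COEFFICIENTS = {
    "rooms": -0.008,
    "nearest": -1.646,
    "office": 0.02,
    "college": 0.212,
    "income": -0.413,
    "disttwn": 0.225,
}


def predict_margin(site):
    """Predicted MARGIN for a site given as a dict of the six predictors."""
    return INTERCEPT + sum(coef * site[name] for name, coef in COEFFICIENTS.items())


# Hypothetical candidate site, for illustration only.
site = {"rooms": 3000, "nearest": 2, "office": 500,
        "college": 10, "income": 35, "disttwn": 5}
print(round(predict_margin(site), 3))
```

Because the equation is linear, each coefficient is the predicted change in MARGIN for a one-unit increase in that variable, holding the other variables constant.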

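Standard error of estimate
–We need to estimate the standard deviation of the error variable, $\sigma_\varepsilon$. Its estimate is the standard error of estimate, $s_\varepsilon = \sqrt{SSE/(n-k-1)}$.
–Compare $s_\varepsilon$ to the mean value of y.
From the printout, Standard Error = 5.5121.
Calculating the mean value of y and comparing:
–It seems $s_\varepsilon$ is not particularly small.
–Can we conclude the model does not fit the data well?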
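The sketch below applies the formula with n = 100 and k = 6, as in Example 18.1. The SSE used here is back-calculated from the printed Standard Error rather than read from the output, so it serves only as a consistency check. The function name is hypothetical.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """s_eps = sqrt(SSE / (n - k - 1)) for a model with k independent variables."""
    return math.sqrt(sse / (n - k - 1))


# SSE implied by the printed Standard Error (derived, not read from the printout).
n, k = 100, 6
implied_sse = 5.5121 ** 2 * (n - k - 1)
print(round(implied_sse, 1))
print(round(standard_error_of_estimate(implied_sse, n, k), 4))
```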

Coefficient of determination
–The definition is $R^2 = 1 - \dfrac{SSE}{SST}$ (equivalently, $SSR/SST$).
–From the printout, $R^2 = 0.5251$.
–52.51% of the variation in the measure of profitability is explained by the linear regression model formulated above.
–When adjusted for degrees of freedom,
$\text{Adjusted } R^2 = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)} = 49.44\%$
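Both measures follow directly from SSE and SST. The minimal sketch below uses hypothetical sums of squares (SSE = 50, SST = 100, n = 11, k = 2). The function names are illustrative.

```python
def r_squared(sse, sst):
    """Coefficient of determination, R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """R^2 adjusted for degrees of freedom: 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical sums of squares, for illustration only.
print(r_squared(50, 100))                  # 0.5
print(adjusted_r_squared(50, 100, 11, 2))  # 0.375
```

The adjustment penalizes each added independent variable. Whenever k ≥ 1 and SSE > 0, adjusted R² is below R², which is why 49.44% is smaller than 52.51% in Example 18.1.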

Testing the validity of the model
–We pose the question: Is there at least one independent variable linearly related to the dependent variable?
–To answer the question we test the hypotheses
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: At least one $\beta_i$ is not equal to zero.
–If at least one $\beta_i$ is not equal to zero, the model is valid.

To test these hypotheses we perform an analysis of variance procedure.
The F test
–Construct the F statistic $F = \dfrac{MSR}{MSE}$, where $MSR = SSR/k$, $MSE = SSE/(n-k-1)$, and $SST = SSR + SSE$.
–Rejection region: $F > F_{\alpha,k,n-k-1}$
A large F results from a large SSR, which means that much of the variation in y is explained by the regression model. The null hypothesis should then be rejected; thus, the model is valid. The required conditions must be satisfied for this test to be valid.
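The F statistic and the rejection rule translate directly into code. In the sketch below, the critical value $F_{\alpha,k,n-k-1}$ is passed in rather than computed; it would come from an F table or statistical software. The function names and the sums of squares in the first call are hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE, with MSR = SSR/k and MSE = SSE/(n-k-1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


def reject_h0(f, f_critical):
    """Reject H0 when F falls in the rejection region F > F_critical."""
    return f > f_critical


# Hypothetical sums of squares, for illustration only.
print(f_statistic(ssr=50, sse=50, n=11, k=2))  # 4.0
# Using the figures reported for Example 18.1 (F = 17.14, critical value 2.17):
print(reject_h0(17.14, 2.17))  # True
```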

Excel provides the following ANOVA results (Example 18.1, continued).
[Table: Excel ANOVA output showing SSR, SSE, MSR, MSE, and F = MSR/MSE.]

Excel provides the following ANOVA results (Example 18.1, continued).
$F_{\alpha,k,n-k-1} = F_{0.05,6,100-6-1} \approx 2.17$
$F = 17.14 > 2.17$
Also, the p-value (Significance F) $= 3.03382 \times 10^{-13}$.
Clearly, $\alpha = 0.05 > 3.03382 \times 10^{-13}$, and the null hypothesis is rejected.
Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the $\beta_i$ is not equal to zero. Thus, at least one independent variable is linearly related to y, and this linear regression model is valid.
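The two numbers on this slide can be cross-checked. Because $R^2/(1-R^2) = SSR/SSE$, the F statistic can be rewritten as $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$. The sketch below recomputes F from the printed $R^2$ and applies the p-value decision rule. The function names are hypothetical.

```python
def f_from_r2(r2, n, k):
    """F statistic expressed through R^2: (R^2/k) / ((1 - R^2)/(n-k-1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def reject_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is smaller than the significance level."""
    return p_value < alpha


# Values from the Example 18.1 printout.
print(round(f_from_r2(0.5251, n=100, k=6), 2))  # 17.14
print(reject_by_p_value(3.03382e-13))          # True
```

The recomputed value matches the printed F = 17.14, which confirms that the R² and ANOVA figures are consistent with each other.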
