# 1 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Notes on Residuals Simple Linear Regression Models.

## Presentation transcript:

1 Notes on Residuals: Simple Linear Regression Models

2 Terminology

3 Example: Consider the following data on percentage unemployment and suicide rates. * Smith, D. (1977) Patterns in Human Geography, Canada: Douglas David and Charles Ltd., 158.

4 Example: The plot of the data points produced by Minitab follows.

5 Residual Analysis: The simple linear regression model equation is $y = \alpha + \beta x + e$, where $e$ represents the random deviation of an observed $y$ value from the population regression line $\alpha + \beta x$. Key assumptions about $e$:
1. At any particular $x$ value, the distribution of $e$ is a normal distribution.
2. At any particular $x$ value, the standard deviation of $e$ is $\sigma$, which is constant over all values of $x$.
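As a minimal sketch of what these assumptions mean, the following Python snippet simulates responses from the model. The function name `simulate_responses`, the parameter values, and the x values are all hypothetical, chosen only for illustration; they do not come from the slides.

```python
import random


def simulate_responses(xs, alpha, beta, sigma, seed=0):
    """Return y = alpha + beta*x + e for each x, with e drawn from Normal(0, sigma).

    The same sigma is used at every x, which is the constant-spread assumption.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values and x values (not taken from the slides).
ALPHA, BETA, SIGMA = 1.0, 0.5, 0.3
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = simulate_responses(xs, ALPHA, BETA, SIGMA)

# The deviations e can be recovered here only because alpha and beta are known.
deviations = [y - (ALPHA + BETA * x) for x, y in zip(xs, ys)]
```

With real data, $\alpha$ and $\beta$ are unknown, so the deviations cannot be computed this way.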

6 Residual Analysis: To check these assumptions, one would examine the deviations $e_1, e_2, \ldots, e_n$. Generally, the deviations are not known, so we check the assumptions by looking at the residuals, which are the deviations from the estimated line, $a + bx$. The residuals are given by $y_i - \hat{y}_i = y_i - (a + b x_i)$ for $i = 1, 2, \ldots, n$.
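A short Python sketch of this computation follows. It assumes the estimated line is the usual least-squares line; the helper names `fit_line` and `residuals` and the data values are hypothetical and are not the unemployment data from the example.

```python
def fit_line(xs, ys):
    """Return (a, b), the least-squares intercept and slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys):
    """Return the deviations of each observed y from the estimated line a + b*x."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
res = residuals(xs, ys)
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding, which is a quick sanity check on the arithmetic.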

7 Standardized Residuals: Recall that a quantity is standardized by subtracting its mean value and then dividing by its true (or estimated) standard deviation. For the residuals, the true mean is zero if the assumptions are true. The estimated standard deviation of a residual depends on the $x$ value. The estimated standard deviation of the $i$th residual, $y_i - \hat{y}_i$, is given by $s_e \sqrt{1 - \frac{1}{n} - \frac{(x_i - \bar{x})^2}{S_{xx}}}$, where $s_e = \sqrt{\sum (y_i - \hat{y}_i)^2 / (n - 2)}$ and $S_{xx} = \sum (x_i - \bar{x})^2$.
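The standardization can be sketched in Python as below. The sketch assumes the standard textbook expression $s_e \sqrt{1 - 1/n - (x_i - \bar{x})^2 / S_{xx}}$ for the estimated standard deviation of the $i$th residual, with $s_e$ computed on $n - 2$ degrees of freedom. The function name and the data are hypothetical.

```python
import math


def standardized_residuals(xs, ys):
    """Return each residual divided by its estimated standard deviation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx            # least-squares slope
    a = y_bar - b * x_bar      # least-squares intercept
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]

    # s_e estimates sigma using n - 2 degrees of freedom.
    s_e = math.sqrt(sum(r * r for r in resid) / (n - 2))

    result = []
    for x, r in zip(xs, resid):
        # Residuals at x values far from x_bar have a smaller standard deviation.
        sd_i = s_e * math.sqrt(1.0 - 1.0 / n - (x - x_bar) ** 2 / s_xx)
        result.append(r / sd_i)
    return result


# Hypothetical data, for illustration only.
std_res = standardized_residuals([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
```

Dividing by a point-specific standard deviation, rather than by $s_e$ alone, is what lets residuals at different $x$ values be compared on a common scale.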

8 Standardized Residuals: As the formula for the estimated standard deviation shows, calculating the standardized residuals by hand is tedious. Fortunately, most statistical software packages perform these calculations automatically.

9 Standardized Residuals - Example: Consider the data on percentage unemployment and suicide rates. Notice that the standardized residual for Pittsburgh is -2.50, which is somewhat large in magnitude for a data set of this size.

10 Example: Pittsburgh. This point has an unusually large residual.

11 Normal Plots: Notice that both of the normal plots look similar. If a software package is available to do the calculations and plots, it is preferable to look at the normal plot of the standardized residuals. In both cases, the points look reasonably linear with the possible exception of Pittsburgh, so the assumption that the errors are normally distributed seems to be supported by the sample data.
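For readers without a plotting package, the coordinates of a normal plot can be sketched in plain Python. This is only one possible construction: the plotting positions $(i - 0.375)/(n + 0.25)$ are an assumption and may differ from what Minitab uses. The function names and the sample values are hypothetical, not the standardized residuals from the example.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected standard normal order statistics for a sample of size n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def normal_plot_points(values):
    """Pair each sorted value with its normal score: (normal score, ordered value)."""
    return list(zip(normal_scores(len(values)), sorted(values)))


def straightness(points):
    """Pearson correlation of the plotted points; values near 1 suggest a linear plot."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical standardized residuals, for illustration only.
sample = [-2.5, -0.9, -0.4, 0.1, 0.3, 0.6, 0.8, 1.2]
points = normal_plot_points(sample)
r = straightness(points)
```

The correlation is a rough numerical companion to the visual check; it does not replace looking at the plot itself.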

12 More Comments: The fact that Pittsburgh has a large standardized residual makes it worthwhile to look at that city carefully to make sure the figures were reported correctly. One might also check whether some other characteristic distinguishes Pittsburgh from all of the other cities, which would be a reason to treat it separately. Pittsburgh does have a large effect on the model.
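One simple way to gauge how much a single observation affects the fitted model is to refit the line without it and compare slopes. The sketch below illustrates that idea; it is not necessarily the method used to produce the slides. The helper names and the data are hypothetical, with the last point deliberately placed far from the others in $x$.

```python
def fit_line(xs, ys):
    """Return (a, b), the least-squares intercept and slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def slope_without(xs, ys, index):
    """Return the least-squares slope after deleting the observation at `index`."""
    kept_x = xs[:index] + xs[index + 1:]
    kept_y = ys[:index] + ys[index + 1:]
    return fit_line(kept_x, kept_y)[1]


# Hypothetical data, for illustration only; the last point is far out in x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 10.0]

full_slope = fit_line(xs, ys)[1]
slope_changes = [slope_without(xs, ys, i) - full_slope for i in range(len(xs))]

# Index of the observation whose removal moves the slope the most.
most_influential = max(range(len(xs)), key=lambda i: abs(slope_changes[i]))
```

A large change in slope after deleting one point indicates that the point has a strong effect on the fitted line.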

13 Visual Interpretation of Standardized Residuals: This is an example of a satisfactory plot, one that indicates that the model assumptions are reasonable.

14 Visual Interpretation of Standardized Residuals: This plot suggests that a curvilinear regression model is needed.

15 Visual Interpretation of Standardized Residuals: This plot suggests non-constant variance, so the assumption of a constant standard deviation is violated.

16 Visual Interpretation of Standardized Residuals: This plot shows a data point with a large standardized residual.

17 Visual Interpretation of Standardized Residuals: This plot shows a potentially influential observation.

18 Example - % Unemployment vs. Suicide Rate: This plot of the residuals indicates some possible problems with this linear model. The points show a pattern that is generally decreasing. One point has an unusually large residual. Two points are quite influential, since they are far away from the others in terms of the % unemployed.