# Residuals.


Residuals

Deviations from the overall pattern of the regression line are important.
“Left-over” variations in the response after fitting the regression line are called residuals.

Residuals
A residual is the difference between the observed value of the response variable and the value predicted by the regression line (how far the data fall from the regression line).
Residual = observed y – predicted y
Residual = y – ŷ

Ex: Gesell Scores
Does the age at which a child begins to talk predict a later score on a test of mental ability? Scatterplot of Gesell Adaptive Scores.

Describe the distribution
The line is the LSRL for predicting Gesell score from age at first word. The plot shows a negative association. The pattern is moderately strong (with some scatter) and roughly linear. The correlation r describes the direction and strength of the relationship.

Predictions LSRL: ŷ = 109.8738 – 1.1270x
For a child who first spoke at 15 months, we predict:
ŷ = 109.8738 – 1.1270(15)
ŷ = 92.97
The child’s actual score was 95.
Residual = observed y – predicted y
Residual = 95 – 92.97 = 2.03

Residual = 2.03 The residual is positive because the data point lies above the line.
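The arithmetic on the Predictions slide can be checked with a short Python sketch. The intercept, slope, age, and observed score come from the slides; the function names `predict_gesell` and `residual` are made up here for illustration.

```python
def predict_gesell(age_months):
    """Predicted Gesell score from the LSRL quoted on the slide."""
    intercept = 109.8738
    slope = -1.1270
    return intercept + slope * age_months


def residual(observed, predicted):
    """Residual = observed y - predicted y."""
    return observed - predicted


predicted = predict_gesell(15)   # child who first spoke at 15 months
res = residual(95, predicted)    # the child's actual score was 95

print(round(predicted, 2))  # 92.97
print(round(res, 2))        # 2.03 (positive: the point lies above the line)
```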

The mean of the least-squares residuals is always zero.
Allow for round-off error when checking this. A line at 0 serves as a reference point that helps orient us.
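As a quick check of the zero-mean property, the sketch below fits a least-squares line in plain Python and averages the residuals. The data values are made up for illustration (they are not the Gesell data), and `fit_lsrl` is a hypothetical helper name.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustration data (x = age in months, y = score).
xs = [8, 10, 12, 15, 20, 26]
ys = [104, 100, 96, 95, 87, 80]

a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

# Zero up to floating-point round-off, which is why a tolerance is used.
print(abs(mean_residual) < 1e-9)
```

The tolerance plays the same role as the round-off caveat on the slide: the mean is zero in exact arithmetic, but computed values can differ from zero by a tiny amount.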

Scatterplot and Residual plot
Residual plot for the regression of Gesell score on age at first word. Child 19 is an outlier. Child 18 is an influential observation that does not have a large residual.

Residual Plots
A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line. If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern.

Things to look out for with residual plots
The uniform scatter of points indicates that the regression line fits the data well, so the line is a good model.

A curved pattern shows that the relationship is not linear.
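To illustrate a curved residual pattern, the sketch below fits a straight line to made-up data that follow y = x² exactly. The data and the helper name `fit_lsrl` are assumptions for illustration only.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Positive at both ends, negative in the middle: a U-shaped pattern.
print([round(r) for r in residuals])  # [5, 0, -3, -4, -3, 0, 5]
```

The residuals are not scattered at random around 0. They trace a curve, which is the signal that a straight line is the wrong model for these data.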

Increasing or decreasing spread about the line. When the response variable y has more spread for larger values of the explanatory variable x, predictions will be less accurate when x is large.

Watch out for: Individual points with large residuals, like Child 19.
Individual points that are extreme in the x direction, like Child 18.

Outliers and Influential Observations in Regression
Outlier: an observation that lies outside the overall pattern.
Influential: an observation is influential if removing it would markedly change the result of the calculation.

Points that are outliers in the x direction of a scatterplot are often influential for the LSRL.
The dashed line is calculated leaving out Child 18 (the influential observation). Leaving out this observation changes the regression line quite a bit.
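The effect of an influential point can be sketched by fitting the line twice, with and without the point. The data below are made up (five points on y = 2x plus one point far out in the x direction); they are not the Gesell data, and `fit_lsrl` is a hypothetical helper.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: the last point is extreme in the x direction.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 6, 8, 10, 10]

a_all, b_all = fit_lsrl(xs, ys)            # fit using every point
a_wo, b_wo = fit_lsrl(xs[:-1], ys[:-1])    # fit leaving the extreme point out

residuals_all = [y - (a_all + b_all * x) for x, y in zip(xs, ys)]

print("slope with the extreme point:   ", round(b_all, 2))
print("slope without the extreme point:", round(b_wo, 2))
print("residual of the extreme point:  ", round(residuals_all[-1], 2))
print("largest |residual| overall:     ", round(max(abs(r) for r in residuals_all), 2))
```

In this made-up example the extreme point pulls the line toward itself, so the slope changes a great deal when it is removed even though its own residual is not the largest one.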

Least-Squares Regression
Correlation and Regression Wisdom
Examine the change in the LSRL when removing the outlier (Child 19) and the influential point (Child 18).
Definition: An outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals.
An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.

Exercise: Investing at Home and Overseas (with calc)
Investors ask about the relationship between returns on investments in the United States and on investments overseas. The table gives the total returns on U.S. and overseas common stocks over a 26-year period.

Residual plots with the calculator
a. Make a scatterplot for predicting overseas returns (y) from U.S. returns (x).
Clear L1, L2, L3.
Enter U.S. returns in L1 and overseas returns in L2.

STATPLOT: choose the first graph type (scatterplot) with L1, L2; then ZOOM:STAT.

b. Find the correlation and r²
Describe the relationship between U.S. and overseas returns in words, using r and r² to make your description more precise.
STAT:CALC:LinReg(a+bx) L1, L2, Y1
The output lists r and r²; here r² = 21.4%.

There is a positive association between U.S. and overseas returns but it is not very strong. Knowing the U.S. returns accounts for only about 21.4% of the variation in overseas returns.
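The link between r and r² can be sketched in Python. The returns below are made up for illustration, since the exercise's table is not part of this transcript. The last line takes the square root of the rounded r² from the slide, so the resulting r ≈ 0.46 is a derived, approximate value rather than one quoted from the slides. The function names are hypothetical.

```python
import math


def correlation(xs, ys):
    """Correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fraction_explained(xs, ys):
    """1 - SSE/SST for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Made-up percent returns, NOT the exercise data.
xs = [10, -5, 20, 30, 2, 15]
ys = [5, 12, 8, 25, -10, 30]

r = correlation(xs, ys)
r2_fit = fraction_explained(xs, ys)
print(abs(r ** 2 - r2_fit) < 1e-9)  # r squared equals the fraction of variation explained

# Derived from the slide's rounded r^2 = 21.4% and the positive association.
r_from_slide = math.sqrt(0.214)
print(round(r_from_slide, 2))  # 0.46
```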

c. Find the LSRL of overseas returns on U.S. returns.
Draw the line on the scatterplot. The regression in (b) already gives the equation ŷ = a + bx and stores it in Y1, so just select GRAPH.

Use the regression line to predict
d. In 1997, the return on U.S. stocks was 33.4%. Use the regression line to predict the return on overseas stocks. The actual overseas return was 2.1%. ŷ = a + b(33.4)

With the calculator: when x = 33.4%, ŷ = 26.3%.
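Combining the prediction with the actual 1997 return gives the residual for that year. This is a small sketch using only the two values quoted on the slides; the variable names are made up.

```python
observed_1997 = 2.1    # actual overseas return, in percent
predicted_1997 = 26.3  # prediction from the regression line, in percent

# Residual = observed y - predicted y
residual_1997 = observed_1997 - predicted_1997

print(round(residual_1997, 1))  # -24.2
```

The residual is negative because the actual return falls well below the line, so the prediction for 1997 missed by about 24 percentage points.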

Are you confident that predictions using the regression line will be quite accurate? Why?
Since the correlation is so low, the predictions will not be very reliable.

e. Identify the point that has the largest residual, either positive or negative. What year is this? Are there any points that seem to be very influential?
Look at the graph (TRACE) and the table: the largest residual is in 1986, when the overseas return was 69.4%. There are no points that look influential.

Graphing residuals
f. Make a scatterplot of the residuals against the U.S. % return.
Turn off the Y1 graph. Note: The calculator automatically stores the residuals in “resid” after LinReg(a+bx) is executed.

At the main screen: 2nd STAT (LIST): NAME, scroll down to “resid” and press ENTER, then STO L3.
STATPLOT: L1, L3
The x-axis in the residual plot serves as a reference line. Points above it are positive residuals and points below it are negative residuals.
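For readers without the calculator, the same workflow can be sketched in Python. The lists `L1`, `L2`, and `L3` mirror the calculator lists; the returns are made up for illustration (the exercise's table is not in this transcript), and `fit_lsrl` is a hypothetical helper.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up percent returns, NOT the exercise data.
L1 = [12.0, -5.0, 21.0, 30.0, 7.0, -10.0]   # U.S. returns (x)
L2 = [20.0, -12.0, 10.0, 35.0, 28.0, -3.0]  # overseas returns (y)

a, b = fit_lsrl(L1, L2)

# Same role as storing "resid" into L3 on the calculator.
L3 = [y - (a + b * x) for x, y in zip(L1, L2)]

# Points of the residual plot: x on the horizontal axis, residual on the vertical.
residual_plot_points = list(zip(L1, L3))

above = [(x, r) for x, r in residual_plot_points if r > 0]  # positive residuals
below = [(x, r) for x, r in residual_plot_points if r < 0]  # negative residuals

print(len(above), "points above the reference line")
print(len(below), "points below the reference line")
```

Plotting `residual_plot_points` with a horizontal line at 0 would give the same picture as STATPLOT with L1 and L3.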

HOMEWORK 3.42, 3.44