Presentation on theme: "Residuals. Deviations from the overall pattern of the regression line are important. “Left-over” variations in the response after fitting the regression."— Presentation transcript:
Deviations from the overall pattern of the regression line are important. “Left-over” variations in the response after fitting the regression line are called residuals.
Residuals A residual is the difference between the observed value of the response variable and the value predicted by the regression line (that is, how far the data point falls from the regression line). Residual = observed y – predicted y = y – ŷ
Ex: Gesell Scores Does the age at which a child begins to talk predict later score on a test of mental ability? Scatter plot of Gesell Adaptive scores.
Describe the distribution The line is the LSRL for predicting Gesell score from age at first word. The plot shows a negative association. The pattern is moderately strong (some scatter) and roughly linear. The correlation r = -0.640 describes the direction and strength.
Predictions LSRL: ŷ = 109.8738 – 1.1270x For a child who first spoke at 15 months, we predict: ŷ = 109.8738 – 1.1270(15) = 92.97 The child’s actual score was 95. Residual = observed y – predicted y = 95 – 92.97 = 2.03
Residual = 2.03 The residual is positive because the data point lies above the line.
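The Gesell calculation can be checked with a minimal Python sketch. The coefficients, age, and score come from the slides; the function names `predict` and `residual` are illustrative choices, not part of the presentation.

```python
# LSRL coefficients quoted in the Gesell example.
INTERCEPT = 109.8738
SLOPE = -1.1270


def predict(age_months):
    """Predicted Gesell score (y-hat) for a given age at first word."""
    return INTERCEPT + SLOPE * age_months


def residual(observed_score, age_months):
    """Residual = observed y - predicted y."""
    return observed_score - predict(age_months)


y_hat = predict(15)       # child who first spoke at 15 months
res = residual(95, 15)    # the child's actual score was 95

print(round(y_hat, 2))    # 92.97
print(round(res, 2))      # 2.03
print("above the line" if res > 0 else "on or below the line")
```

A positive result means the observed point sits above the fitted line, which matches the slide.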
The mean of the least-squares residuals is always zero (allowing for roundoff error). A horizontal line at 0 on the residual plot is a reference point that helps orient us.
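The zero-mean property can be seen numerically. The sketch below fits a least-squares line with the standard formulas and averages the residuals. The data values are hypothetical, made up for illustration only; they are not the Gesell data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical (x, y) values, NOT taken from the presentation.
xs = [8, 10, 11, 15, 20, 26]
ys = [105, 100, 98, 95, 87, 80]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

# The mean is zero up to floating-point roundoff.
print(abs(mean_residual) < 1e-9)
```

The comparison against a small tolerance rather than exact zero is the code analogue of "allowing for roundoff error".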
Scatterplot and Residual plot Residual plot for the regression of Gesell score on age at first word. Child 19 is an outlier. Child 18 is an influential observation that does not have a large residual.
Residual Plots A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line. If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern.
Things to look out for with residual plots The uniform scatter of points indicates that the regression line fits the data well, so the line is a good model.
A curved pattern shows that the relationship is not linear.
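As a rough illustration of a curved residual pattern, the sketch below fits a straight line to data that follow y = x² exactly and lists the residuals in order of x. The data are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical curved data: y = x^2 for x = 0..4.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
signs = ["+" if r > 0 else "-" for r in residuals]

print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
print(signs)      # ['+', '-', '-', '-', '+']
```

The residuals are positive at both ends and negative in the middle, which is the systematic U shape a residual plot would show when a line is fitted to a curved relationship.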
Increasing or decreasing spread about the line: here the response variable y has more spread for larger values of the explanatory variable x, so predictions will be less accurate when x is large.
Watch out for: Individual points with large residuals, like Child 19. Individual points that are extreme in the x direction, like Child 18.
Outliers and Influential Observations in Regression Outlier: an observation that lies outside the overall pattern. Influential: an observation is influential if removing it would markedly change the result of the calculation.
Points that are outliers in the x direction of a scatterplot are often influential for the LSRL. The dashed line is calculated leaving out Child 18 (the influential observation). Leaving out this observation changes the regression line quite a bit.
Least-Squares Regression: Correlation and Regression Wisdom Definition: An outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals. An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line. Examine the change in the LSRL when removing the outlier, Child 19, and the influential point, Child 18.
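The effect of an influential point can be sketched by fitting the line twice, with and without the point that is extreme in x. The data below are hypothetical (not the Gesell scores): five points lie exactly on y = 2x, and a sixth sits far to the right.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data; the last point is extreme in the x direction.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 10]

a_all, b_all = fit_line(xs, ys)              # line using every point
a_drop, b_drop = fit_line(xs[:-1], ys[:-1])  # line leaving the extreme point out

# Residuals measured from the line fitted to ALL points.
residuals = [y - (a_all + b_all * x) for x, y in zip(xs, ys)]
extreme_residual = abs(residuals[-1])
largest_other = max(abs(r) for r in residuals[:-1])

print(round(b_all, 3), round(b_drop, 3))  # slope with vs. without the point
print(extreme_residual < largest_other)   # True: influential, yet a small residual
```

Removing the one point changes the slope from about 0.3 to 2, yet that point's own residual is smaller than several others, because it pulls the line toward itself. This is why an influential observation may not stand out in a residual plot.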
Exercise: Investing at Home and Overseas (with calc) Investors ask about the relationship between returns on investments in the United States and on investments overseas. The table gives the total returns on U.S. and overseas common stocks over a 26-year period.
Residual plots with the calculator a. Make a scatterplot for predicting overseas returns (y) from U.S. returns (x). Clear L1, L2, L3. Enter U.S. returns in L1, overseas returns in L2.
STATPLOT [this first graph is scatterplot] L1,L2; ZOOM:STAT
b. Find the correlation r and r². Describe the relationship between U.S. and overseas returns in words, using r and r² to make your description more precise. STAT:CALC:LinReg(a+bx):L1,L2,Y1 r = 0.463, r² = 0.214 = 21.4%
There is a positive association between U.S. and overseas returns but it is not very strong. Knowing the U.S. returns accounts for only about 21.4% of the variation in overseas returns.
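The 21.4% figure is simply the square of the correlation. A minimal check, using only the value of r reported in the exercise:

```python
r = 0.463                  # correlation reported by LinReg(a+bx) in the exercise
r_squared = r ** 2         # fraction of variation in y accounted for by the line

print(round(r_squared, 3))          # 0.214
print(f"{r_squared * 100:.1f}%")    # 21.4%
```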
c. Find the LSRL of overseas returns on U.S. returns. Draw the line on the scatterplot. ŷ = 5.683 + 0.6181x (from (b)) (Equation should be at Y1: Y1= 5.683 + 0.6181x) Just select GRAPH
Use the regression line to predict d. In 1997, the return on U.S. stocks was 33.4%. Use the regression line to predict the return on overseas stocks. (The actual overseas return was 2.1%.) ŷ = 5.683 + 0.6181(33.4)
With calculator: when x = 33.4%, ŷ = 26.3%
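The same prediction, and the residual it implies for 1997, can be reproduced with a short sketch. All numbers come from the exercise; the function name `predict_overseas` is an illustrative choice.

```python
# LSRL of overseas returns on U.S. returns, as given in part (c).
INTERCEPT = 5.683
SLOPE = 0.6181


def predict_overseas(us_return_pct):
    """Predicted overseas return (%) for a given U.S. return (%)."""
    return INTERCEPT + SLOPE * us_return_pct


y_hat_1997 = predict_overseas(33.4)   # U.S. return in 1997 was 33.4%
residual_1997 = 2.1 - y_hat_1997      # actual overseas return was 2.1%

print(round(y_hat_1997, 1))      # 26.3
print(round(residual_1997, 1))   # -24.2
```

The prediction misses the actual 1997 return by about 24 percentage points, which is consistent with the weak correlation found in part (b).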
Are you confident that predictions using the regression line will be quite accurate? Why? Since the correlation is so low, the predictions will not be very reliable.
e. Identify the point that has the largest residual, either positive or negative. What year is this? Are there any points that seem to be very influential? Look at the graph (TRACE) and the table: in 1986, the overseas return was 69.4%. There are no points that look influential.
Graphing residuals f. Make a scatterplot of the residuals against the U.S. % return. Turn off the Y1 graph. 2nd STAT (LIST): Note: The calculator automatically stores the residuals in “resid” after LinReg(a+bx) is executed.
Graphing residuals At the main screen: 2nd STAT (LIST): NAMES. Scroll down to “resid”, press ENTER, then STO→ L3. STATPLOT: L1, L3 The x-axis in the residual plot serves as a reference line. Points above it are positive residuals and points below it are negative residuals.