Residuals Recall that the least-squares regression line makes the sum of the squared vertical distances from the points to the line as small as possible. Because those vertical distances represent "leftover" variation in the response after fitting the regression line, these distances are called residuals.
Or in other words, the residuals are the signed vertical distances from the points to the least-squares regression line (LSRL): positive for points above the line, negative for points below it.
Calculating a Residual One subject's NEA rose by 135 calories and he gained 2.7 kg of fat. The predicted gain ŷ for 135 calories is found by substituting x = 135 into the regression equation. The residual for this subject is therefore: residual = observed y - predicted ŷ = 2.7 - ŷ.
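As a minimal sketch of the observed - predicted calculation, the Python snippet below uses a hypothetical regression line and a hypothetical observation. The intercept, slope, and data values are made up for illustration; they are not the NEA regression equation.

```python
def predict(intercept, slope, x):
    """Predicted response y-hat from a fitted line y-hat = intercept + slope * x."""
    return intercept + slope * x


def residual(observed, predicted):
    """Residual = observed y - predicted y."""
    return observed - predicted


# Hypothetical values for illustration only (NOT the NEA regression equation).
intercept, slope = 2.0, 0.5
x_value, y_observed = 4.0, 3.5

y_predicted = predict(intercept, slope, x_value)
res = residual(y_observed, y_predicted)
print(y_predicted, res)  # 4.0 -0.5
```

A negative residual, as here, means the observed point lies below the regression line.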
Fat Gain & NEA (yet again!) Here are the residuals for all 16 data values from the NEA experiment: Although residuals can be calculated from any model that is fitted to the data, the residuals from the least-squares line have a special property: the sum of the least-squares residuals is always zero. (Try adding the numbers above: they add up to zero!)
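The zero-sum property can be checked numerically. The sketch below fits a least-squares line to a small hypothetical data set (not the NEA measurements) and adds up the residuals; the helper name `fit_least_squares` is introduced here for illustration only.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for illustration only (not the NEA measurements).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
total = sum(residuals)
print(abs(total) < 1e-9)  # True, up to floating-point rounding
```

Residuals that have been rounded for display may add up to a value slightly different from zero.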
The horizontal line y = 0 in a residual plot corresponds to the regression line, and also marks the mean of our residuals. The residual plot magnifies the deviations from the line to make patterns easier to see.
Residual Plots What to look for when examining a residual plot: 1. Residual plots should have no pattern.
Residual Plots What to look for when examining a residual plot: A curved pattern shows that the relationship may not be linear. Increasing spread about the line as x increases indicates the prediction will be less accurate for larger x values. Similarly, decreasing spread indicates the prediction will be less accurate for smaller x values.
Residual Plots What to look for when examining a residual plot: 1. The residual plot should show no pattern. 2. The residuals should be relatively small in size.
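To illustrate the curved pattern, the sketch below fits a straight line to hypothetical data that actually follow y = x² and prints the residuals in order of x. The data and the helper name `fit_least_squares` are assumptions made for this example.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical curved data: y = x^2, so a straight line is the wrong model.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

a, b = fit_least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print([round(r, 6) for r in residuals])
# [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle. Plotted against x, they trace a U shape, which is the kind of curved pattern that signals a nonlinear relationship.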
The role of r² in regression A residual plot is a graphical tool for evaluating how well a linear model fits the data. Look at the residual plot first to see if a linear model is a good fit. If the linear model is a good fit, then there is also a numerical quantity that tells us how well the LSRL does at predicting values of the response variable y. It is r², the coefficient of determination.
The role of r² in regression r² is actually the correlation squared, but there's more to the story... The idea of r² is this: how much better is the least-squares line at predicting responses y than if we just used the mean of the y values as our prediction for every x?
The role of r² in regression Is the LSRL better at predicting the data values than the mean? r² tells us how much better. Here's the line that represents the mean of the y values in our data. Here's our LSRL.
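Note: Remember we defined the variance back when we talked about standard deviation. r² compares the total variation of the y values about their mean (the SST part of the equation) with the variation left over in the residuals (the SSE part of the equation). Here's the formula:

$$r^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}, \qquad SST = \sum (y_i - \bar{y})^2, \quad SSE = \sum (y_i - \hat{y}_i)^2$$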
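As a rough check of this idea, the sketch below computes r² from SST and SSE and compares it with the square of the correlation. The data are hypothetical and the function names are chosen for this example only.

```python
import math


def r_squared_from_sums(xs, ys):
    """Return r^2 computed as 1 - SSE/SST for the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # SST: squared deviations of y from its mean (predicting with the mean).
    sst = sum((y - y_bar) ** 2 for y in ys)
    # SSE: squared residuals (predicting with the regression line).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst


def correlation(xs, ys):
    """Return the correlation r between x and y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only (not the NEA measurements).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [5.2, 4.1, 4.4, 3.0, 2.7, 1.9]

r2 = r_squared_from_sums(xs, ys)
r = correlation(xs, ys)
print(round(r2, 4), round(r ** 2, 4))  # the two values agree
```

Here r is negative because y decreases as x increases, yet r² is still a fraction between 0 and 1.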
For example, if r² = 0.606 (as it does in the NEA example), then about 61% of the variation in fat gain among the individual subjects is explained by the straight-line relationship between fat gain and NEA. The other 39% is individual variation among subjects that is not explained by the linear relationship.
When you report a regression, give r² as a measure of how successful the regression was in explaining the response. When you see a correlation, square it to get a better feel for the strength of the linear relationship.
Review Facts About Least-Squares Regression The distinction between explanatory and response variables is essential in regression. In the regression setting you must know clearly which variable is explanatory!
Review Facts About Least-Squares Regression There is a close connection between correlation and the slope of the LSRL. The slope is $b = r \dfrac{s_y}{s_x}$, where $s_x$ and $s_y$ are the standard deviations of x and y. This equation says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
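The connection between the slope and the correlation can be verified numerically. The sketch below uses hypothetical data and computes the slope two ways: directly from the least-squares formula, and as r times the ratio of the sample standard deviations.

```python
import math
import statistics

# Hypothetical data for illustration only (not the NEA measurements).
xs = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
ys = [3.1, 4.8, 5.2, 7.9, 8.4, 11.5]

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Slope straight from the least-squares formula.
slope = sxy / sxx

# Slope rebuilt from the correlation and the sample standard deviations.
r = sxy / math.sqrt(sxx * syy)
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)
slope_from_r = r * s_y / s_x

print(math.isclose(slope, slope_from_r))  # True
```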
Review Facts About Least-Squares Regression The least-squares regression line of y on x always passes through the point (mean of x values, mean of y values).
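A short derivation shows why. It assumes the usual least-squares intercept formula $a = \bar{y} - b\bar{x}$, which is standard but is not stated on these slides. Substituting $x = \bar{x}$ into the line gives $\bar{y}$:

```latex
\begin{aligned}
\hat{y} &= a + b x, \qquad a = \bar{y} - b\bar{x} \\
\hat{y}\,\big|_{x=\bar{x}} &= (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}
\end{aligned}
```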
The correlation r describes the strength of a straight-line relationship. The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.