AP STATISTICS LESSON 3 – 3 LEAST-SQUARES REGRESSION
Regression Line A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable. LSRL is the abbreviation for least-squares regression line. The LSRL is a mathematical model.
Least-squares Regression Line Error = observed y − predicted y. To compare candidate lines, we square each error and sum the squares. The most effective model is the line with the smallest sum of squared errors.
Least-squares Regression Line The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
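The least-squares idea can be checked numerically. The sketch below uses a small made-up data set (hypothetical, not from the lesson) and a helper named `sum_squared_errors`, which is also just an illustrative name. It compares the sum of squared errors for the line ŷ = 2.2 + 0.6x, which is the least-squares line for this toy data, with the sums for lines whose intercept or slope has been nudged slightly.

```python
# Hypothetical toy data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sum_squared_errors(intercept, slope):
    """Sum of squared vertical distances (observed y - predicted y) for one line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Least-squares line for this toy data: y-hat = 2.2 + 0.6x
best = sum_squared_errors(2.2, 0.6)

# Nudge the intercept and/or the slope a little in each direction.
nearby = [
    sum_squared_errors(2.2 + da, 0.6 + db)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.05, 0.0, 0.05)
    if (da, db) != (0.0, 0.0)
]

print(round(best, 4))                      # 2.4
print(all(sse > best for sse in nearby))   # True
```

Every nudged line has a larger sum of squared errors, which is what "least squares" means. This only samples a few nearby lines, so it illustrates the definition rather than proving it.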
Equation of the LSRL We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ, the standard deviations s_x and s_y, and the correlation r. The least-squares regression line is ŷ = a + bx, with slope b = r(s_y / s_x) and intercept a = ȳ − b·x̄.
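As a rough illustration, the line can be built in Python from the five summary statistics alone, using the standard formulas slope = r(s_y / s_x) and intercept = ȳ − slope·x̄. The data set and all variable names are hypothetical. The sketch assumes sample standard deviations (dividing by n − 1) and computes r as the sum of products of standardized values divided by n − 1, the usual textbook definition.

```python
from statistics import mean, stdev

# Hypothetical toy data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations (divide by n - 1)

# Correlation r: sum of products of standardized values, divided by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept

def predict(x):
    """Predicted value y-hat for a given x."""
    return a + b * x

print(round(b, 4), round(a, 4))   # 0.6 2.2
print(round(predict(6), 4))       # 5.8
```

Predicting at x = 6 goes slightly beyond the observed x values, so it is shown only to demonstrate how the equation is used.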
What happened to y = mx+b? y represents the observed (actual) values for y, and ŷ ("y hat") represents the predicted values for y. We use ŷ in the equation of the regression line to emphasize that the line gives predicted values for any x. When you are solving regression problems, be sure to distinguish between y and ŷ. Hot tip: (x̄, ȳ) is always a point on the regression line!
AP STATISTICS LESSON 3 – 3 (DAY 2) The role of r² in regression
Essential Question: How is r² used to determine the reliability of a linear regression line? Objectives: To calculate r². To find the SST and the SSE, and to find r² from them.
Definitions and Abbreviations r² = coefficient of determination (the proportion of the total sample variability that is explained by the least-squares regression of y on x). LSRL = least-squares regression line. SST (total sum of squares): SST = ∑(y − ȳ)². SSE (sum of squares of errors): SSE = ∑(y − ŷ)².
Exercises Small r² and Large r² Page 158: Example 3.10 SMALL r² Page 160: Example 3.11 LARGE r²
r² in Regression The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x. r² = (SST − SSE) / SST
Facts about Least-squares Regression Fact 1: The distinction between explanatory and response variables is essential in regression. Fact 2: There is a close connection between correlation and the slope of the least-squares line. Along the line, a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y.
Facts of Regression (continued) Fact 3: The least-squares regression line always passes through the point (x̄, ȳ). Fact 4: The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
AP STATISTICS LESSON 3 – 3 (DAY 3) RESIDUALS
ESSENTIAL QUESTION: What is a residual and what can a residual graph tell us about linear regression lines? Objective: To define and use residuals in the analysis of linear regression lines.
Residuals A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is, residual = observed y − predicted y = y − ŷ
Residual Facts The mean of the least-squares residuals is always zero. When software output shows a sum of residuals that is not exactly 0, it is because the software rounded the residuals (for example, to four decimal places). This is roundoff error. The horizontal line in a residual plot is drawn at zero, the mean of the residuals.
Residual Plots A residual plot is a scatterplot of the regression residuals against the explanatory variable. Residual plots help us assess the fit of a regression line. If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern. In that case the residual plot looks something like a simplified ideal pattern: a uniform scatter of points about the fitted line, with no unusual individual observations.