Stats Chapter 5 - Least Squares Regression
Definition of a regression line: A regression line is a straight line that describes how a response variable (y) changes as an explanatory variable (x) changes.
Used to predict a y-value given an x-value.
Requires an explanatory and a response variable.
Given as an equation of a line in slope-intercept form: $\hat{y} = a + bx$
Read as: "y-hat"
a = y-intercept
b = slope
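How It Works: Using the regression line to predict a y-value
[Figure: the line $\hat{y} = a + bx$; for a chosen x on the horizontal axis, the predicted $\hat{y}$ is read from the line on the vertical axis]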
Close-Up: We are trying to find a line that minimizes the sum of the squares of the vertical distances.
Vertical Distance = Observed y - Predicted y
[Figure: data points and the line. A point above the line has a positive vertical distance; a point below the line has a negative vertical distance.]
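Written out as a formula, the quantity being minimized looks roughly like this. The index i and the count n are notation assumed here for illustration; they do not appear on the slide.

```latex
\[
\text{choose } a \text{ and } b \text{ to minimize}\quad
\sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
= \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2
\]
```

Each term is one squared vertical distance (observed minus predicted), so positive and negative distances both add to the total.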
Least-Squares Regression Line: The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
The slope is the amount of change in $\hat{y}$ when x increases by one unit.
The intercept of the line is the predicted value $\hat{y}$ when x = 0.
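As a rough sketch of what the calculator does behind the scenes, the slope and intercept can be computed from the standard closed-form formulas. The function names and the five data points below are made up for illustration and are not from the slides.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes
    the sum of squared vertical distances."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept: the line passes through (x_mean, y_mean)
    return a, b


def sum_squared_distances(xs, ys, a, b):
    """Sum of squared (observed - predicted) vertical distances for a given line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_fit(xs, ys)
print(f"y-hat = {a:.3f} + {b:.3f}x")
print("sum of squares:", round(sum_squared_distances(xs, ys, a, b), 3))
```

Nudging a or b away from the fitted values and calling `sum_squared_distances` again should give a larger total, which is what "as small as possible" means here.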
Calculator Procedure
1) Enter data into lists (List 1 and List 2)…
2) Run Stat > Calc > 8
3) Write regression line from calculated a and b values:
$\hat{y} = a + bx$
$\hat{y} = 1.089 + 0.189x$
y-int = predicted gas used when degree days = 0
slope = predicted increase in gas used when degree days increase by one
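To make the interpretation concrete, here is a small sketch that plugs values into the fitted line from the slide. The function name and the input of 20 degree days are assumptions chosen for illustration; the slide does not state units, so none are assumed here.

```python
# Coefficients taken from the slide: y-hat = 1.089 + 0.189x
A = 1.089  # y-intercept: predicted gas used when degree days = 0
B = 0.189  # slope: predicted increase in gas used per additional degree day


def predict_gas_used(degree_days):
    """Predicted gas used (y-hat) for a given number of degree days (x)."""
    return A + B * degree_days


# 20 degree days is a hypothetical input, not a value from the slides
print(round(predict_gas_used(0), 3))
print(round(predict_gas_used(20), 3))
print(round(predict_gas_used(21) - predict_gas_used(20), 3))
```

The first call returns the intercept, and the difference between the predictions at 21 and 20 degree days returns the slope, matching the two interpretations above.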
Correlation vs. Regression
The square of the correlation ($r^2$) is the fraction of the variation in the values of y that is explained by the least-squares regression line of y on x.
$0 \le r^2 \le 1$
When reporting a regression, give $r^2$ as a measure of how successful the regression was in explaining the response.
ex: 5.4 pg 134.
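The "fraction of variation explained" reading can be checked numerically. The sketch below computes the correlation r directly, then separately computes 1 minus (unexplained variation / total variation) from the least-squares line. The data and variable names are made up for illustration.

```python
import math

# Hypothetical data, made up for illustration (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
syy = sum((y - y_mean) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

# Correlation r, and the least-squares line y-hat = a + b*x
r = sxy / math.sqrt(sxx * syy)
b = sxy / sxx
a = y_mean - b * x_mean

# Variation left unexplained by the line (sum of squared residuals)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = r ** 2
explained_fraction = 1 - sse / syy

print(round(r_squared, 3), round(explained_fraction, 3))
```

For this made-up data both numbers come out to 0.6, so the line accounts for about 60% of the variation in y.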
Residuals
Residual = the difference between the observed and predicted y-values.
Residual = $y - \hat{y}$
Residual Plot: plots the residual values on the y-axis vs. the explanatory variable on the x-axis.
Makes patterns easier to see.
Used to assess the fit of a regression line.
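A minimal sketch of the residual calculation follows. The data are made up, and the line ŷ = 2.2 + 0.6x is the least-squares line for that made-up data, not a line from the slides. The printed (x, residual) pairs are the points a residual plot would show.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y minus predicted y-hat, one per data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data and its least-squares line y-hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

res = residuals(xs, ys, a, b)
for x, e in zip(xs, res):
    # Each (x, residual) pair is one point on the residual plot
    print(f"x = {x}: residual = {e:+.2f}")

print("sum of residuals:", round(sum(res), 6))
```

Points above the line give positive residuals and points below give negative ones. For a least-squares line the residuals sum to zero, apart from floating-point rounding.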
Calculator Procedure
1) Run regression
2) Go into Stat Plot 1
3) Set Y-List to 'RESID' (2nd > STAT > 7)
4) Set window values to match data range
5) Graph
Residual Patterns
[Figures: three residual plots labeled Ideal, Curved, and Spread]
Outliers
[Figures: two scatterplots, one showing a Vertical outlier and one showing a Horizontal outlier]
Cautions About Correlation and Regression
Both describe LINEAR relationships
Both are affected by outliers
Always plot your data before interpreting
Beware of EXTRAPOLATION
Beware of LURKING VARIABLES
CORRELATION DOES NOT IMPLY CAUSATION!!!
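To illustrate the caution that regression is affected by outliers, the sketch below fits a slope with and without one extra point that lies far out in the x direction. The data, the added point (15, 2), and the helper name are all made up for illustration.

```python
def ls_slope(xs, ys):
    """Least-squares slope b for the line y-hat = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical data with an upward trend
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# The same data plus one hypothetical point far to the right with a low y-value
xs_out = xs + [15]
ys_out = ys + [2]

slope_without = ls_slope(xs, ys)
slope_with = ls_slope(xs_out, ys_out)

print("slope without outlier:", round(slope_without, 3))
print("slope with outlier:   ", round(slope_with, 3))
```

In this example a single added point is enough to turn a positive slope into a negative one, which is why plotting the data before interpreting the line matters.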