# The Role of r² in Regression


Target Goal: I can use r² to describe how much of the variation in y is explained by the LSRL. D4: 3.2b. Hw: pg 191 – 43, 46, 48, 53, 63

r: the correlation coefficient

r²: the coefficient of determination
If the line ŷ is a poor model, r² is small (close to 0). If the line ŷ fits the data well, r² is large (close to 1).

What is the meaning of r² in regression?

[Figure: squares of the deviations about ŷ]

Least-Squares Regression
The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors. Another numerical quantity tells us how well the least-squares regression line predicts values of the response y.

Definition: The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the formula

$$r^2 = 1 - \frac{SSE}{SST}$$

where $SSE = \sum (y - \hat{y})^2$ and $SST = \sum (y - \bar{y})^2$.

The formula for r² is made up of these parts:

SST, the total sum of squares about the mean ȳ: $SST = \sum (y - \bar{y})^2$

SSE, the sum of squares for error: $SSE = \sum (y - \hat{y})^2$

r², the coefficient of determination: $r^2 = \dfrac{SST - SSE}{SST}$
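As a minimal sketch, the three parts translate directly into a small Python function. The name `r_squared` is our own, and it assumes `y_hat` holds the predictions from the least-squares line. The usage values are our own computation: for x = 0, 5, 10 and y = 0, 7, 8 the least-squares line is ŷ = 1 + 0.8x, which gives fitted values 1, 5, 9.

```python
def r_squared(y, y_hat):
    """Coefficient of determination r^2 = (SST - SSE) / SST."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and the same length")
    y_bar = sum(y) / len(y)
    # SST: total sum of squares about the mean y-bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # SSE: sum of squares for error (squared residuals y - y-hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    if sst == 0:
        raise ValueError("all y values are equal, so SST = 0 and r^2 is undefined")
    return (sst - sse) / sst


# Usage: observed y values and fitted values from the line y-hat = 1 + 0.8x
print(round(r_squared([0, 7, 8], [1, 5, 9]), 3))  # 0.842
```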

Ex: Large r². If x is a good predictor of y, then the deviations $y - \hat{y}$, and thus SSE, would be small; in fact, if all of the points fell exactly on the regression line, SSE would be 0 and $r^2 = \dfrac{SST - SSE}{SST} = 1$.
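To see the extreme case concretely, here is a sketch using hypothetical data that lie exactly on the line y = 1 + 2x. The helper `fit_lsrl` is our own name for the usual least-squares slope and intercept computation. Because every point is on the fitted line, every residual is 0, so SSE = 0 and r² = 1.

```python
def fit_lsrl(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
a, b = fit_lsrl(x, y)
y_hat = [a + b * xi for xi in x]
y_bar = sum(y) / len(y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # every residual is 0
sst = sum((yi - y_bar) ** 2 for yi in y)
print(a, b, sse, (sst - sse) / sst)  # 1.0 2.0 0.0 1.0
```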

For the data in this example: x = 0, 5, 10 and y = 0, 7, 8.
$$r^2 = \frac{SST - SSE}{SST} = \frac{38 - 6}{38} = \frac{32}{38} \approx 0.842$$

Conclusion: We say that 84% of the variation in y is explained by the least-squares regression of y on x.
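The arithmetic above can be reproduced in a few lines of plain Python, as a sketch that fits the least-squares line to x = 0, 5, 10 and y = 0, 7, 8 and then computes SST, SSE, and r². Nothing here relies on a statistics library; the variable names are our own.

```python
x = [0, 5, 10]
y = [0, 7, 8]
n = len(x)
x_bar = sum(x) / n  # 5.0
y_bar = sum(y) / n  # 5.0

# Least-squares slope and intercept
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # variation about the mean
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # variation left over after the line
r2 = (sst - sse) / sst
print(f"y-hat = {a:.1f} + {b:.1f}x, SST = {sst:.0f}, SSE = {sse:.0f}, r^2 = {r2:.3f}")
# y-hat = 1.0 + 0.8x, SST = 38, SSE = 6, r^2 = 0.842
```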

r² in Regression: The coefficient of determination r² is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean ȳ for each value in the data set. Consider the hiker backpack example (page …). If we needed to predict a backpack weight for a new hiker but didn't know the hiker's body weight, we could use the average backpack weight as our prediction.

If we use the mean backpack weight as our prediction, the sum of the squared residuals is SST = 83.87. If we use the LSRL to make our predictions, the sum of the squared residuals is SSE = 30.90.

SSE/SST = 30.90/83.87 ≈ 0.368, so 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line.

r² = 1 − SSE/SST = 1 − 30.90/83.87 ≈ 0.632, so 63.2% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight.
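A quick arithmetic check of the backpack numbers, as a sketch that takes the stated sums SST = 83.87 and SSE = 30.90 as given (the raw hiker data are not reproduced here).

```python
sst = 83.87  # sum of squared residuals when predicting with the mean pack weight
sse = 30.90  # sum of squared residuals when predicting with the LSRL

unexplained = sse / sst  # fraction of variation the line does not account for
r2 = 1 - unexplained     # fraction the line does account for
print(f"SSE/SST = {unexplained:.3f}, r^2 = {r2:.3f}")  # SSE/SST = 0.368, r^2 = 0.632
```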

Correlation and Regression Wisdom

Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, be aware of their limitations.

Fact 1: The distinction between explanatory and response variables is important in regression. Regressing y on x and regressing x on y give different least-squares lines.
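To illustrate Fact 1, this sketch fits both lines to the small data set x = 0, 5, 10 and y = 0, 7, 8 and rewrites the x-on-y line in terms of y so the two can be compared directly. The helper `fit_lsrl` is our own name. The two lines disagree because each one minimizes squared errors in a different direction (vertical versus horizontal).

```python
def fit_lsrl(x, y):
    """Least-squares line for predicting y from x; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


x = [0, 5, 10]
y = [0, 7, 8]

a_yx, b_yx = fit_lsrl(x, y)  # regress y on x
a_xy, b_xy = fit_lsrl(y, x)  # regress x on y (roles swapped)

# Solve x = a_xy + b_xy*y for y, giving y = c + d*x
d = 1 / b_xy
c = -a_xy / b_xy
print(f"y on x:               y-hat = {a_yx:.3f} + {b_yx:.3f}x")  # 1.000 + 0.800x
print(f"x on y, solved for y: y = {c:.3f} + {d:.3f}x")            # 0.250 + 0.950x
```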

Fact 2: There is a close connection between correlation and the slope of the least-squares regression line: $b = r \dfrac{s_y}{s_x}$. As the correlation grows less strong, the prediction ŷ moves less in response to changes in x.

Fact 3: Every LSRL passes through the point $(\bar{x}, \bar{y})$.

Remember: When reporting a regression, give r² as a measure of how successful the regression was in explaining the response. When you see a correlation r, square it to get a better feel for the strength of the association.

Exercise: Predicting The Stock Market.
Some people think that the behavior of the stock market in January predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent change in a stock market index in January and the response variable y to be the percent change in the index for the entire year. We expect a positive correlation between x and y because the change during January contributes to the year's full change.

Calculation from the data for the years 1960 to 1997 gives:
$\bar{x} = 1.75\%$, $s_x = 5.36\%$, $\bar{y} = 9.07\%$, $s_y = 15.35\%$, and $r = 0.596$

a. What percent of the observed variation in yearly changes in the index is explained by a straight-line relationship with the changes during January? Since r = 0.596, r² = (0.596)² ≈ 0.355, so 35.5% of the variation in yearly changes in the index is explained by the straight-line relationship with the changes during January.
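Squaring the given correlation is a one-line computation; this sketch only restates r = 0.596 from the summary statistics and formats the result as a percent.

```python
r = 0.596  # correlation between January change and full-year change, 1960 to 1997
r2 = r ** 2
print(f"r^2 = {r2:.3f} -> {r2:.1%} of the variation explained")  # r^2 = 0.355 -> 35.5% ...
```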

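What is the equation of the least-squares regression line for predicting full-year change from January change? Find b: $b = r \dfrac{s_y}{s_x} = 0.596 \cdot \dfrac{15.35}{5.36} \approx 1.707$. Find a: $a = \bar{y} - b\bar{x} = 9.07 - 1.707(1.75) \approx 6.083\%$.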

The regression equation is

$$\hat{y} = a + bx = 6.083\% + 1.707x$$
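The slope and intercept can be computed straight from the summary statistics, as a sketch using the formulas b = r·s_y/s_x and a = ȳ − b·x̄. Only the reported summary values are used; the yearly data themselves are not needed.

```python
x_bar, s_x = 1.75, 5.36   # January change (%)
y_bar, s_y = 9.07, 15.35  # full-year change (%)
r = 0.596

b = r * s_y / s_x      # slope: b = r * (s_y / s_x)
a = y_bar - b * x_bar  # intercept: a = y-bar - b * x-bar
print(f"y-hat = {a:.3f} + {b:.3f}x")  # y-hat = 6.083 + 1.707x
```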

Predictions: The mean change in January is $\bar{x} = 1.75\%$.

Use your regression line to predict the change in the index in a year in which the index rises 1.75% ($\bar{x}$) in January. Why could you have given this result without doing the calculation? Every LSRL passes through $(\bar{x}, \bar{y})$. Recall that $\bar{y} = 9.07\%$, so the predicted change is $\hat{y} = 9.07\%$.
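A sketch of the prediction step: build the line from the summary statistics and evaluate it at x̄. The function name `predict` is our own. Because a = ȳ − b·x̄, the prediction at x̄ collapses to ȳ regardless of the slope; with the rounded coefficients, 6.083 + 1.707(1.75) ≈ 9.07 as well.

```python
x_bar, s_x = 1.75, 5.36
y_bar, s_y = 9.07, 15.35
r = 0.596

b = r * s_y / s_x
a = y_bar - b * x_bar


def predict(x):
    """Predicted full-year change (%) for a January change of x (%)."""
    return a + b * x


print(round(predict(x_bar), 2))  # 9.07, i.e. y-bar
```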