Presentation on theme: "The Role of r2 in Regression Target Goal: I can use r2 to explain the variation of y that is explained by the LSRL. D4: 3.2b Hw: pg 191 – 43, 46, 48,"— Presentation transcript:
1 The Role of r² in Regression. Target Goal: I can use r² to explain the variation of y that is explained by the LSRL. D4: 3.2b. Hw: pg 191 – 43, 46, 48, 53, 63
2 r: correlation coefficient. r²: the coefficient of determination. If the line ŷ is a poor model, the value of r² turns out to be small, closer to 0. If the line ŷ fits the data fairly well, the value of r² turns out to be larger, closer to 1.
3 What is the meaning of r² in regression? Squares of the deviations about ŷ
4 Least-Squares Regression: The Role of r² in Regression. The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y. Definition: The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the following formula: r² = 1 – SSE/SST, where SSE = ∑(y – ŷ)² and SST = ∑(y – ȳ)².
5 Formula for r² made up of these parts. SST: total sum of squares about the mean ȳ. SST = ∑(y – ȳ)². SSE: sum of the squares for error. SSE = ∑(y – ŷ)². r²: coefficient of determination. r² = (SST – SSE)/SST.
6 Ex: Large r². If x is a good predictor of y, then the deviations, and thus the SSE, would be small; in fact, if all of the points fell exactly on the regression line, SSE would be 0. r² = (SST – SSE)/SST.
7 For the data in this example, x: 0, 5, 10 and y: 0, 7, 8. r² = (SST – SSE)/SST = (38 – 6)/38 = 0.842. Conclusion: We say that 84% of the variation in y is explained by the least-squares regression of y on x.
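The SST, SSE, and r² values on this slide can be checked directly from the three data points. The following is a minimal sketch in plain Python; the helper names `least_squares_line` and `r_squared` are illustrative and not part of the presentation.

```python
# Three-point data set from the slide.
xs = [0, 5, 10]
ys = [0, 7, 8]


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys):
    """Return (SST, SSE, r^2) using r^2 = (SST - SSE) / SST."""
    a, b = least_squares_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    # SST: squared deviations of y about its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    # SSE: squared deviations of y about the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sst, sse, (sst - sse) / sst


a, b = least_squares_line(xs, ys)
sst, sse, r2 = r_squared(xs, ys)
print(f"y-hat = {a:.1f} + {b:.1f}x")
print(f"SST = {sst:.0f}, SSE = {sse:.0f}, r^2 = {r2:.3f}")  # SST = 38, SSE = 6, r^2 = 0.842
```

The fitted line for these points is ŷ = 1 + 0.8x, which gives residuals of –1, 2, and –1 and therefore SSE = 6.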
8 r² in Regression. The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
9 Least-Squares Regression: The Role of r² in Regression. r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean ȳ for each value in the dataset. Consider the backpack example. If we needed to predict a backpack weight for a new hiker, but didn't know the hiker's body weight, we could use the average backpack weight as our prediction. If we use the mean backpack weight as our prediction, the sum of the squared residuals is SST = 83.87. If we use the LSRL to make our predictions, the sum of the squared residuals is SSE = 30.90. SSE/SST = 30.90/83.87 = 0.368. Therefore, 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line. 1 – SSE/SST = 1 – 30.90/83.87, so r² = 0.632. 63.2% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight.
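The backpack arithmetic can be reproduced in a few lines. This is a small sketch that assumes SSE = 30.90, the value the slide states when it introduces the LSRL residuals; the variable names are illustrative.

```python
# Sums of squared residuals quoted on the slide.
sst = 83.87  # predicting every pack weight with the mean pack weight
sse = 30.90  # predicting pack weight with the least-squares line

unexplained = sse / sst   # fraction of variation the line does not account for
r2 = 1 - unexplained      # coefficient of determination

print(f"SSE/SST = {unexplained:.3f}")
print(f"r^2     = {r2:.3f}")
```

Rounded to three decimals, the two fractions are 0.368 and 0.632, and they always sum to 1.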
10 Least-Squares Regression: Correlation and Regression Wisdom. Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, be aware of their limitations. Fact 1: The distinction between explanatory and response variables is important in regression.
11 Facts about least-squares regression. Fact 2: There is a close connection between correlation and the slope of the least-squares regression line. As the correlation grows less strong, the prediction ŷ moves less in response to changes in x.
12 Fact 3: Every LSRL passes through (x̄, ȳ). Remember: When reporting a regression, give r² as a measure of how successful the regression was in explaining the response. When you see a correlation (r), square it to get a better feel for the strength of the association.
13 Exercise: Predicting the Stock Market. Some people think that the behavior of the stock market in January predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent change in a stock market index in January and the response variable y to be the change in the index for the entire year. We expect a positive correlation between x and y because the change during January contributes to the year's full change.
14 Calculation from the data for the years 1960 to 1997 gives: x̄ = 1.75%, sx = 5.36%; ȳ = 9.07%, sy = 15.35%; and r = 0.596.
15 r = 0.596. a. What percent of the observed variation in yearly changes in the index is explained by a straight-line relationship with the changes during January? The answer is given by r² = (0.596)² = 0.355; that is, 35.5% of the variation in yearly changes in the index is explained by the changes during January.
16 What is the equation of the least-squares regression line for predicting full-year change from January change? Find b: b = r(sy/sx) = 0.596(15.35/5.36), so b = 1.707. Find a: a = ȳ – b·x̄ = 9.07 – (1.707)(1.75), so a = 6.083%.
17 The regression equation is ŷ = a + bx: ŷ = 6.083% + 1.707x.
18 Predictions. The mean change in January is x̄ = 1.75%. Use your regression line to predict the change in the index in a year in which the index rises 1.75% (x̄) in January. Why could you have given this result without doing the calculation? Every LSRL passes through (x̄, ȳ). Recall ȳ = 9.07%, so the predicted change is ŷ = 9.07%.
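The stock market exercise can be worked through from the summary statistics alone. The sketch below uses the standard relations b = r·sy/sx and a = ȳ – b·x̄; the variable names and the `predict` helper are illustrative, not taken from the slides.

```python
# Summary statistics quoted in the exercise (percent changes, 1960 to 1997).
x_bar, s_x = 1.75, 5.36    # January change: mean and standard deviation
y_bar, s_y = 9.07, 15.35   # full-year change: mean and standard deviation
r = 0.596                  # correlation between January and full-year change

r2 = r ** 2                # fraction of variation in y explained by the line
b = r * s_y / s_x          # slope of the least-squares line
a = y_bar - b * x_bar      # intercept, chosen so the line passes through (x_bar, y_bar)


def predict(x):
    """Predicted full-year percent change for a January percent change x."""
    return a + b * x


print(f"r^2 = {r2:.3f}")
print(f"b = {b:.3f}, a = {a:.3f}")
print(f"prediction at x_bar = {predict(x_bar):.2f}")
```

Because the intercept is defined as ȳ – b·x̄, evaluating the line at x̄ returns ȳ up to floating-point rounding, which is why the prediction for a 1.75% January rise needs no calculation.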
19 Exercise: Class attendance and grades. A study of class attendance and grades among first-year students at a state university showed that, in general, students who attended a higher percent of their classes earned higher grades.
20 Class attendance explained 16% of the variation in grade index among students. What is the numerical value of the correlation between percent of class attended and grade index? r² = 0.16, so r = ±√0.16 = ±0.40. High attendance goes with high grades, so the correlation must be positive: r = 0.40.
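Recovering r from r² takes a square root plus a sign that has to come from context, since r² alone does not say whether the association is positive or negative. A minimal sketch, with a hypothetical helper name `correlation_from_r2`:

```python
import math


def correlation_from_r2(r2, positive_association=True):
    """Return r given r^2, with the sign supplied by the direction of the association."""
    magnitude = math.sqrt(r2)
    return magnitude if positive_association else -magnitude


# Attendance explains 16% of the variation in grade index, and
# higher attendance goes with higher grades, so r is positive.
r = correlation_from_r2(0.16, positive_association=True)
print(f"r = {r:.2f}")
```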