
Chapter 8. Residuals, Part 2


1 Chapter 8

2 Residuals, Part 2 Residuals are the amount that a data score is above or below the line. A data score which is above the line will result in a positive residual. A data score which is below the line will result in a negative residual. The line represents every possible prediction our model might make about the data.

3 Residuals, Part 2 A positive residual means it was an underestimate. This is because a positive residual means the data score was above the line...which means the line was below the data score. Since the line is our estimate, it would be an underestimate.
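The sign rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are made up for this sketch and do not come from the slides or the book.

```python
def residual(observed, predicted):
    """Residual = observed data score minus the value the line predicts."""
    return observed - predicted


def describe(observed, predicted):
    """Say whether the line under- or overestimated the data score."""
    r = residual(observed, predicted)
    if r > 0:
        return "underestimate"  # data score is above the line
    if r < 0:
        return "overestimate"   # data score is below the line
    return "exact"              # data score sits on the line


# Hypothetical data scores and predictions, for illustration only.
print(residual(12.0, 10.0), describe(12.0, 10.0))
print(residual(7.0, 10.0), describe(7.0, 10.0))
```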

4 Residuals, Part 2 Once we have calculated all of the residuals, we can plot them in a special kind of scatterplot. It is called a residual plot, but it is really just a scatterplot with the residuals for the y-values. An ideal residual plot will have no visible patterns and will not fan out in one direction.

5 Residuals, Part 2 When we make our line, we want it to be the best model available for our data. There are two parts to the relationship between our two data sets. One of these parts is the part we describe with our model. The other part is the part which our model does not account for.

6 Residuals, Part 2 The residuals represent the variation in the data which our model does not account for. If the residuals have an obvious pattern, then that suggests that our model is incomplete. If the residual plot has a pattern, the pattern provides insight into what sort of transformation would be ideal to use. We will not be transforming data based on this, but it is what later classes in statistics might do. You are still expected to know that this is why we check for patterns in the residual plot.

7 The Basic Idea The least squares regression line is based on the idea that you can approximate the z-scores for the response variable from the z-scores for the predictor variable. To do this, we take a data value from our predictor variable, find the z-score for that data value, multiply it by our correlation, and then convert this new z-score into a data score for our response variable.
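The z-score recipe on this slide could be written out roughly as follows. The function name and every number in the usage lines are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def predict_from_z(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict a response value by way of z-scores."""
    z_x = (x - mean_x) / sd_x        # z-score of the predictor value
    z_y_hat = r * z_x                # predicted z-score of the response
    return mean_y + z_y_hat * sd_y   # convert back to a data score


# Hypothetical summary statistics, not taken from the slides.
prediction = predict_from_z(x=70, mean_x=50, sd_x=10,
                            mean_y=100, sd_y=20, r=0.5)
print(prediction)
```

With these made-up values the predictor is 2 standard deviations above its mean, while the prediction is only 1 standard deviation above the response mean, because the z-score was multiplied by r = 0.5.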

8 Conservative Estimates Since the correlation is never more than 1 and never less than -1, our predicted z-score for the response variable will never be further from the mean than our z-score for the predictor variable. In other words, we will tend to make predictions conservatively, leaning closer to the mean.

9 Conservative Estimates So, if we were using weights to predict height, any weight we used would be predicting a more conservative height. In other words a more unusual weight would predict a less unusual height. If we went the other way around, then any height would predict a more conservative weight. This is the reverse of what we had before.

10 Conservative Estimates Because we will be predicting conservatively, it matters which variable is doing the predicting. Even though correlation is the same between two variables no matter what the order, the regression line changes. If we want to switch which variable we use for predicting, then we need to recalculate the regression line.
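A small sketch can show why the line has to be recalculated when the roles of the variables are swapped. The data set below is invented for illustration, and `least_squares` is a hypothetical helper that uses the usual least squares formulas for slope and intercept.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

a_yx, b_yx = least_squares(xs, ys)  # line that predicts y from x
a_xy, b_xy = least_squares(ys, xs)  # separate line that predicts x from y

y_hat = a_yx + b_yx * 5        # predict y for x = 5
x_back = a_xy + b_xy * y_hat   # feed that prediction into the other line
print(y_hat, x_back)           # x_back lands closer to the mean than 5
```

Feeding the predicted y back into the second line does not return the original x. Both predictions are pulled toward the mean, which is the conservative behavior described above.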

11 Example from Book There is an example in your book, starting on page 180. The basic idea is that there is a moderate relationship between the grams of fat and grams of protein in Burger King menu items. Because this example is roughly one page worth of paragraphs, I am only going to summarize it, but I recommend you make time to read it.

12 Example from Book So first, grams of protein are used to predict grams of fat. For example, 30 grams of protein predicts 35.9 grams of fat. Then the grams of fat are used to predict grams of protein. If we put in 35.9 grams of fat, intuitively we would expect to get back 30 grams of protein. Instead, the new equation predicts 26.0 grams of protein. This is why we have to recalculate our line every time we use the other variable to predict.

13 Interpreting the Line To interpret the equation of the line we need to interpret the slope and we need to interpret the intercept. This is generally best done as a mostly scripted process. As a side note, interpret is a key word in a stats question to let you know you need to produce one or more sentences.

14 Interpreting the Slope Slope is rise over run. This means that when you express slope as a fraction, the top of the fraction is your change in the y direction. In other words, the change in your response variable. The bottom of the fraction is your change in the x direction. So, the change in the predictor variable.

15 Interpreting the Slope We can turn any number into a fraction by putting it over 1. So the slope we calculated, we can just put over 1, and now it is a fraction. In other words, when we interpret the slope, the change on the x-axis (predictor) will always be 1 unit.

16 Interpreting the Slope The slope we calculated (b or b₁) will be the change on our y-axis (response). So in yesterday’s example estimating the cost of tops based on the cost of pants, our slope was 1.10, presumably in US dollars. I’m clarifying now that it IS in US dollars. So the change in our x (pants) is 1, and the change in our y (top) is 1.10.

17 Interpreting the Slope The general form of the script is: “The model predicts that for every ___ that ___ will ___ by ___.” Because I care, here is the specific example: “The model predicts that for every dollar more that the pants cost, that the cost of the top worn with them will increase by $1.10.”

18 Interpreting the Slope Once more: “The model predicts that for every dollar more that the pants cost, that the cost of the top worn with them will increase by $1.10.” Note that when I talk about the 1 unit change in x, I did not even use a number as much as I just mentioned the unit. Note that for the change in y, I very much used the number.

19 Interpreting the Intercept Two minus one is one. 2 – 1 = 1 One minus zero is one. 1 – 0 = 1 Any starving person can tell you that the difference between one meal and two meals is not the same as the difference between one meal and no meals at all.

20 Interpreting the Intercept Concept Alert! 0 is the number we typically use to represent things such as “none”, “nothing”, and “completely missing”. It is important to be able to see the number 0 and then rationally consider what “nothing” means in the context of the problem.

21 Interpreting the Intercept The intercept is our prediction when x (the predictor variable) equals zero. Sometimes the intercept is ridiculous to talk about, because in the context of the problem, “nothing” is a goofy idea. We still want to interpret the intercept even when it is ridiculous, but we usually want to also mention that it is ridiculous.

22 Interpreting the Intercept Consider the Burger King foods from the book: 0 grams of fat could make sense. 0 grams of protein could make sense. We would want to use terms like “fat free” and “protein free” instead of saying 0 grams. To consider our clothing example, we have to get a bit more technical.

23 Interpreting the Intercept One way to interpret what $0 cost pants are is “free pants”. If the study was instead based on the retail value of clothing rather than what the person wearing it personally paid, it now refers to genuinely valueless pants. This pretty much means either no pants at all or pants so useless they are not all that different from no pants at all.

24 Interpreting the Intercept While some of the bolder contrarians out there will want to disagree with me, I say it is ridiculous to discuss valueless pants. I think the concept of valueless pants is just silly and it should be considered an example of when an intercept does not have useful predictive power.

25 Interpreting the Intercept If we were using the nicotine content of cigarettes to see how many times a week a smoker would go buy cigarettes, our intercept would be a prediction for cigarettes with no nicotine at all. While I am not personally a smoker, it seems to me that a nicotine-free cigarette would kill the point even more than a caffeine-free diet soda. So I would say this is also a ridiculous intercept.

26 Interpreting the Intercept Whether or not it is ridiculous, here is the script: “The model predicts that for ___ the ___ would be ___.” So in the context of our clothing, where the intercept was -10: “The model predicts that for truly valueless pants the cost of the top worn with them would be -$10.”
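To tie the two scripts together, here is a rough sketch of the clothing line using the slope (1.10) and intercept (-10) quoted in these slides. The function name and the $50 pants price are assumptions made for this illustration.

```python
INTERCEPT = -10.0  # predicted top cost, in dollars, when the pants cost $0
SLOPE = 1.10       # predicted change in top cost per extra dollar of pants


def predict_top_cost(pants_cost):
    """Predicted cost of the top, in US dollars, from the cost of the pants."""
    return INTERCEPT + SLOPE * pants_cost


# Intercept: the prediction when x is 0 (the "valueless pants" case).
print(predict_top_cost(0))

# Slope: raising x by 1 unit changes the prediction by the slope.
# The $50 starting price is a hypothetical value.
change = predict_top_cost(51) - predict_top_cost(50)
print(round(change, 2))
```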

27 Assignments Read the first half of Chapter 9 for Friday. Chapter 8 Quiz Friday. Ch. 8: Do 1 problem from each set of 5. You may skip two sets of 5 (of your choosing). You should do 8 problems. Due Monday.

28 Chapter 8 Quiz Bulletpoints Know why we need a hat for regression equations. Know how to find a least squares regression line from the data. Know how to find a least squares regression line from the summary statistics. Know how to find r from r². Know how to interpret the slope and the intercept for a regression line.
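As a study aid for two of the bullet points, here is one possible sketch of finding the line from summary statistics and finding r from r². It relies on the standard facts that the slope equals r times the ratio of the standard deviations and that r has the same sign as the slope. The function names and sample values are hypothetical and are not taken from the book.

```python
import math


def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept, slope) of the least squares line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_from_r_squared(r_squared, slope):
    """Take the square root of r squared, then give it the sign of the slope."""
    magnitude = math.sqrt(r_squared)
    return magnitude if slope >= 0 else -magnitude


# Hypothetical values, for illustration only.
print(line_from_summary(mean_x=50, sd_x=10, mean_y=100, sd_y=20, r=0.5))
print(r_from_r_squared(0.64, slope=-2.0))
```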

