Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 9 Regression Wisdom. Getting the “Bends” Linear regression only works for data with a linear association. Curved relationships may not be evident.

Similar presentations


Presentation on theme: "Chapter 9 Regression Wisdom. Getting the “Bends” Linear regression only works for data with a linear association. Curved relationships may not be evident."— Presentation transcript:

1 Chapter 9 Regression Wisdom

2 Getting the “Bends” Linear regression only works for data with a linear association. Curved relationships may not be evident when looking at a scatterplot; it is sometimes more be obvious in the residuals plot. You need to examine the residual plot to re-check the Straight Enough Condition after you find the regression equation.

3 More About Residuals Residuals can reveal subtleties that are not clear from a plot of the original data. Sometimes the things we see are additional details that help confirm or refine our understanding of our data. Sometimes they reveal violations of the conditions required for regression.

4 Examine both a histogram and a scatterplot of the residuals after a regression, looking for curves in the scatterplot and a unimodal/symmetric distribution in the histogram of the residuals: The smaller modes in this histogram are marked with x and ■ in the scatterplot above. What do you see? More About Residuals

5 Sifting Residuals for Groups/Subsets A careful examination of the residuals may lead us to discover groups of observations that are different from the rest. If we decide there is more than one group, we should consider analyzing the groups separately, obtaining different regression models. Together or apart? You must decide what makes the most sense in context.

6 Extrapolation: Reaching Beyond the Data Linear models allow us to make predictions based on the data we have. It is dangerous to assume that a linear relationship exists beyond the range of our data. The farther an x value is from the mean of x, the less trust we should place in the model to accurately predict y-hat. Predictions outside the range of our existing data are called extrapolations. Extrapolations are dangerous and should be avoided.

7 Extrapolation (cont.) Here is a timeplot of the Energy Information Administration’s predictions and actual prices of oil. How did forecasters do? They extrapolated and missed a sharp run-up in oil prices in the past few years.

8 Predicting the Future – We Can’t!! Extrapolation is especially dangerous when the x- variable in the model is time, extrapolation becomes an attempt to look into the future. Knowing that extrapolation is dangerous doesn’t stop people from doing it. The temptation to see into the future is hard to resist. My Advice: If you’re asked to extrapolate into the future, at least don’t believe the prediction will come true (and use weasel words liberally!!).

9

10 Unusual Points, Leverage, and Influence Unusual points can strongly influence a regression. Even a single point far from the body of the data can dominate a regression analysis. Any point that stands away from the others deserves your special attention. This scatterplot shows that something was awry in Palm Beach County, Florida, during the 2000 presidential election…

11 Outliers, Leverage, and Influence (cont.) The red line shows the effect an unusual point has on a regression: Pay special attention to points with large residuals and leverage.

12 Outliers, Leverage, and Influence (cont.) Data points with x-values far away from the mean of x have high leverage.

13 Outliers, Leverage, and Influence (cont.) The extraordinarily large shoe size gives the data point high leverage. Wherever the IQ is, the line will follow! A point is influential if removing it from the analysis gives a very different model – be especially cautious about slope changes.

14 Outliers, Leverage, and Influence (cont.) When we investigate an unusual point, we often learn more about the data than we could have learned from the model alone. Don’t just delete unusual points from the data set. You should fit a model with and without the unusual points and examine the two regression models to understand how they differ and what influence the point exerts. Influential points can ‘hide’ in residual plots! Points with high leverage “pull” the regression line closer to them, so their residual decreases. It’s often easies to see influential points in scatterplots of the original data.

15

16 Lurking Variables and Causation No matter how strong the association is… No matter how large R 2 is… No matter how straight the points appear… No matter how reasonable the association seems… No matter how “random” the residual plot looks… There is no way to conclude from a regression alone that one variable causes the other. There is always the possibility that a lurking variable is affecting both variables. So…………How can we prove causation? Answer: Controlled Experiments

17 Lurking Variables and Causation (cont.) Average life expectancy vs. number of doctors per person: Average life expectancy vs. number of TVs per person: Since TVs are cheaper than doctors, we should send TVs to countries with low life expectancies!!

18 Lurking Variables and Causation (cont.) TVs are cheaper than doctors, so if we sent TVs to countries with low life expectancies we can raise life expectancy…Right? Lurking variable? Countries with higher standards of living have both longer life expectancies and more doctors (and TVs!). Improving living standards might prolong lives, incent doctors to move there and provide more spending money so people can buy TVs!! REMEMBER ICE CREAM AND SHARK ATTACKS!! (SLIDES SHOW??)

19 Working With Summary Values Summary statistics (averages) show less variability than we would see if we measured the same variable on every individual.

20 Working With Summary Values (cont.) Weight vs. HeightMean Weight vs. Height

21 Working With Summary Values (cont.) Means vary less than individual values. Scatterplots of summary statistics show less scatter than the individual data points, giving a false impression of how strong the model is (both r and R 2 will be artificially high). There is no easy way to correct this so we must be aware of it and mention it in our analysis.

22 What Can Go Wrong? Make sure the relationship is straight. Check the Straight Enough Condition before and after the regression. Be on guard for different groups in your regression. If you find subsets that behave differently, consider fitting a different linear model to each subset. Beware of extrapolating…be especially cautious about extrapolating into the future!

23 Look for unusual points & consider additional regression models…use r-squared to evaluate how good your model is! Treat unusual points honestly. Don’t just remove them to get a model that fits better. Beware of lurking variables - association does not equal causation!! Be cautious about your model’s strength when dealing with data that are averages. Summary data tend to inflate the impression of the strength of a relationship. What Can Go Wrong?


Download ppt "Chapter 9 Regression Wisdom. Getting the “Bends” Linear regression only works for data with a linear association. Curved relationships may not be evident."

Similar presentations


Ads by Google