Chapter 3 Unusual points and cautions in regression.

1 Chapter 3 Unusual points and cautions in regression

2 Look for Outliers & Influential Observations Does the age at which a child begins to talk predict later scores on a test of mental ability? Analyze the following data (using the technology toolbox!)

3 Remember the toolbox... Answer the key questions. Graph the data. Calculate numerical summaries. When possible, use a mathematical model to represent the data. Interpret.

4 1. Key Questions Who? What? When? Where? Why? How? By Whom?

5 2. Graph

6 3. Numerical Summaries Which numerical summaries should I report? s_x = 7.947, s_y = 13.987, r = -0.640, r^2 = 0.41

7 4. Model How do we express the data with a model? The equation of the LSRL is

8 Residual Plot

9 Interpretation? What do the graphs, numerical summaries, and model tell you?

10 Outliers Child 19 is an outlier in the y-direction, with a score so high that we should check for a mistake in recording it. (In fact, it is correct). Child 18 is an outlier in the x-direction.

11 Influential Points This picture adds a second regression line (blue), calculated after leaving out Child 18. This one point moves the line quite a bit. In fact, the new least-squares line (equation shown on the slide) has r = -0.33.

12

13 Be aware... In the regression setting, not all outliers are influential. Influential points often have small residuals because they pull the regression line toward themselves. The surest way to verify that a point is influential is to find the regression line both with and without the suspect point.
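The with-and-without check described on this slide can be sketched in a few lines of Python. This is only an illustration: the data below are hypothetical (they are not the Gesell data), and `fit_line` is a helper name introduced here for the sketch.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

# Hypothetical data (NOT the Gesell data): the last point lies far out in x.
xs = [1, 2, 3, 4, 5, 15]
ys = [5, 3, 6, 2, 4, -6]

# Fit once with every point, once with the suspect point left out.
slope_all, icpt_all, r_all = fit_line(xs, ys)
slope_wo, icpt_wo, r_wo = fit_line(xs[:-1], ys[:-1])

# Residual of the suspect point relative to the line fitted WITH it.
resid_suspect = ys[-1] - (icpt_all + slope_all * xs[-1])

print(f"all points:      slope={slope_all:.3f}, r={r_all:.3f}")
print(f"without suspect: slope={slope_wo:.3f}, r={r_wo:.3f}")
print(f"residual of suspect point: {resid_suspect:.3f}")
```

In this made-up example the suspect point has a small residual even though removing it changes both the slope and r substantially, which is the pattern the slide warns about.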

14 Gesell scores continued The original data have r^2 = 0.41. That is, the LSRL relating age at which a child begins to talk with Gesell score explains 41% of the variation on this later test of mental ability. This relationship is strong enough to be interesting to parents. If we leave out Child 18, r^2 drops to only 0.11 (11%). The apparent strength of the association was largely due to a single influential observation. Wow!
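As a quick arithmetic check, the two percentages follow from squaring the correlations quoted on the earlier slides (r = -0.640 with Child 18, r = -0.33 without). The variable names are just labels chosen for this sketch.

```python
# Correlations reported on the slides, with and without Child 18.
r_with = -0.640
r_without = -0.33

# r^2 is the fraction of variation in y explained by the least-squares line.
r2_with = r_with ** 2
r2_without = r_without ** 2

print(f"with Child 18:    r^2 = {r2_with:.2f}")
print(f"without Child 18: r^2 = {r2_without:.2f}")
```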

15 Gesell scores continued What should the researcher do? Without Child 18, the evidence for a connection between the variables vanishes. If she keeps Child 18, she needs data on other children who were also slow to begin talking so that the analysis no longer depends so heavily on just one child.

16 Beware the Lurking Variable

17 Ice cream causes drowning? The amount of ice cream consumed and the number of drowning deaths are positively associated. This might lead someone to wonder if eating ice cream causes drowning. What other variable might influence both ice cream consumption and drowning deaths?
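One way to see how a common cause can produce such an association is a small constructed example. Everything below is hypothetical: temperature is assumed to be the lurking variable, the numbers are invented, and `pearson_r` and `residuals` are helper names introduced for this sketch. Removing the temperature trend from each variable before correlating is one possible check, not the only one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def residuals(xs, ys):
    """Residuals of ys after subtracting the least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

# Hypothetical data: both variables depend on temperature, not on each other.
temps = [18, 20, 22, 24, 26, 28, 30, 32]
noise_ice = [1, -1, -1, 1, 1, -1, -1, 1]
noise_drown = [1, 1, -1, -1, -1, -1, 1, 1]
ice_cream = [3 * t + 2 * a for t, a in zip(temps, noise_ice)]
drownings = [0.5 * t + b for t, b in zip(temps, noise_drown)]

# Raw association between the two variables.
r_raw = pearson_r(ice_cream, drownings)

# Association left over once the temperature trend is removed from each.
r_adjusted = pearson_r(residuals(temps, ice_cream), residuals(temps, drownings))

print(f"raw r = {r_raw:.2f}, r after removing temperature = {r_adjusted:.2f}")
```

In this constructed data the raw correlation is strong, but almost nothing remains once temperature is accounted for.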

18 Nonsense Correlation How close is the linear relationship between these two variables? Guess the correlation.

19 Nonsense Correlation Now look at the labels on the bottom and side... Is the amount of goods imported to the United States really related to private health spending? No. In fact, any two variables that both increase over time will show a strong association. This does not mean that one variable explains or influences the other.
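The point about shared time trends can be illustrated with two invented series that have nothing to do with each other except that both grow. The series below are hypothetical, not real import or health-spending figures, and `pearson_r` is a helper written for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

years = range(10)

# Hypothetical series A: grows by a fixed amount each year.
series_a = [100 + 20 * t for t in years]

# Hypothetical series B: grows by 10% each year, independently of A.
series_b = [50 * 1.1 ** t for t in years]

r_trend = pearson_r(series_a, series_b)
print(f"correlation between two unrelated upward trends: r = {r_trend:.3f}")
```

Neither series was built from the other, yet the correlation is close to 1 simply because both rise with time.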

20 Nonsense Correlation Nonsense correlations are real correlations. Just make sure you understand that association does not imply causation.

21 Lurking Variable Hides Relationship A housing study in Hull, England compared overcrowding with lack of toilets. The researchers expected the two to be correlated...

22 Lurking Variable Hides Relationship but found the correlation to be only r = 0.08!

23 But when they looked again... A lurking variable, the amount of public housing, actually divided the data into two clusters; looked at as a whole, the clusters made the variables look uncorrelated. [Scatterplot of overcrowding vs. lack of public toilets. One cluster: areas with lots of public housing (and more toilets). Other cluster: areas with less public housing (and fewer toilets).]
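A small constructed example shows how two clusters can mask a relationship. The numbers are hypothetical and are not the Hull data; in particular, placing the public-housing cluster at higher overcrowding is an assumption made only for this sketch, and `pearson_r` is a helper defined here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical areas with less public housing: (overcrowding, lack of toilets).
crowd_less = [1, 2, 3, 4]
lack_less = [4, 6, 5, 7]

# Hypothetical areas with lots of public housing: fewer missing toilets overall.
crowd_lots = [6, 7, 8, 9]
lack_lots = [3, 5, 4, 6]

# Correlation inside each cluster, then for all areas pooled together.
r_less = pearson_r(crowd_less, lack_less)
r_lots = pearson_r(crowd_lots, lack_lots)
r_pooled = pearson_r(crowd_less + crowd_lots, lack_less + lack_lots)

print(f"less public housing: r = {r_less:.2f}")
print(f"lots of public housing: r = {r_lots:.2f}")
print(f"all areas pooled: r = {r_pooled:.2f}")
```

Within each cluster the two variables move together, but the offset between the clusters cancels that pattern when the areas are pooled.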

24 Beware Correlations Based on Averaged Data Correlations based on averages are usually higher than correlations based on individual scores.
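The effect of averaging can be seen in a small made-up data set: individuals scatter widely around their group average, and averaging removes that scatter. The data and the helper `pearson_r` are hypothetical and serve only as a sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical individuals: three per x value, spread around y = x.
group_xs = [1, 2, 3, 4]
offsets = [-2, 0, 2]
ind_x = [x for x in group_xs for _ in offsets]
ind_y = [x + d for x in group_xs for d in offsets]

# Average y within each x group (the individual scatter averages out).
mean_y = [sum(x + d for d in offsets) / len(offsets) for x in group_xs]

r_individuals = pearson_r(ind_x, ind_y)
r_averages = pearson_r(group_xs, mean_y)

print(f"individual scores: r = {r_individuals:.2f}")
print(f"group averages:    r = {r_averages:.2f}")
```

Here the averages fall on a line while the individual scores show only a moderate correlation, so a conclusion drawn from the averages would overstate how well x predicts y for any one individual.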

