# Outliers and Influential Points

Outliers and Influential Points
Erik Johnson AP Statistics 5/25/04 erik.PPT

Definitions

- Outlier: a value in a data set that does not fit with the rest of the data.
- Influential point: a point in a data set that has leverage on the regression coefficients.
- Leverage: a point is said to have leverage if the regression line changes substantially when the point is removed.
- Q1, Q3: the first and third quartiles; approximately half of the data lies between them.
- Interquartile range (IQR): Q3 - Q1.

Outliers

- Data points more than 2 standard deviations away from the mean of the data set
- Data points that do not fit the pattern followed by the rest of the data
- In regression, any data point that has an unusually large residual
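The first criterion above can be sketched in a few lines of Python. This is only an illustration: the function name `two_sd_outliers` and the data are made up for this example, and the sample standard deviation is assumed.

```python
import statistics

def two_sd_outliers(values):
    """Return the values lying more than 2 standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [v for v in values if abs(v - mean) > 2 * sd]

# Made-up data: nine values near 11-12 and one far away.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 40]
print(two_sd_outliers(data))  # [40]
```

Here the mean is 14.4 and the standard deviation is about 9.06, so only 40 lies more than 2 standard deviations from the mean.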

How can I tell if a point in my data set is an outlier?
Take the IQR (interquartile range) of your data set and multiply it by 1.5. Subtract that number from Quartile 1 and add it to Quartile 3. Any value lying outside these two boundaries can be considered an outlier. Now you try a sample problem on outliers!
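A minimal sketch of this rule in Python, assuming the usual multiplier of 1.5 (the value used in the worked answer later in these slides). The function names and the data are hypothetical, and the quartiles are passed in directly because textbooks and software compute them by slightly different conventions.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) boundaries Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the two boundaries."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

# Made-up data whose quartiles (median-of-halves method) are Q1 = 4, Q3 = 9.
data = [1, 4, 5, 6, 8, 9, 20]
print(iqr_fences(4, 9))          # (-3.5, 16.5)
print(iqr_outliers(data, 4, 9))  # [20]
```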

Sample Problem on IQR

In a data set with five-number summary [12, 18, 19, 21, 25], how many values can be considered outliers?

- A) None
- B) Exactly 1
- C) At least 1
- D) Exactly 2
- E) At least 2

IF YOU ANSWERED C….. YOU’RE RIGHT!!!!!
The interquartile range for this set of data is 3, and when multiplied by 1.5 you get 4.5. Adding this number to 21 gives you 25.5, which is larger than the maximum value of the data set. This means that there are no outliers on the upper side of the data. When you subtract 4.5 from 18, you get 13.5. The minimum value of 12 lies below this number, meaning that there is at least 1 outlier in the set of data.
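The arithmetic in this answer can be checked with a short Python sketch. The variable names are chosen for this example only; the numbers come from the five-number summary in the problem.

```python
# Five-number summary from the sample problem.
minimum, q1, median, q3, maximum = 12, 18, 19, 21, 25

iqr = q3 - q1            # 21 - 18 = 3
step = 1.5 * iqr         # 4.5
lower_fence = q1 - step  # 18 - 4.5 = 13.5
upper_fence = q3 + step  # 21 + 4.5 = 25.5

has_low_outlier = minimum < lower_fence   # 12 < 13.5, so True
has_high_outlier = maximum > upper_fence  # 25 > 25.5 is False

print(lower_fence, upper_fence)           # 13.5 25.5
print(has_low_outlier, has_high_outlier)  # True False
```

The summary only shows that the minimum is an outlier. Other values between 12 and 13.5 could also be outliers, which is why the answer is "at least 1" rather than "exactly 1".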

Influential Points

- Influential points are normally outliers in the X direction, but they are not always outliers in terms of regression (they need not have large residuals).
- A point is said to influence the data if it is responsible for changes to the least-squares regression (LSR) line.
- Any point that has leverage on a set of data is an influential point.

There are no outliers on either the X or Y axis
To the right is a chart of a data set with a perfect linear fit (r^2 = 1) and a regression equation of Y = X.

Now look at this graph. The point previously at X = 5 has been moved to X = 8. The regression equation has changed and the r^2 value has decreased significantly.
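The effect of moving one point can be reproduced with a small Python sketch. The slides do not list the plotted data, so the five points (1,1) through (5,5) are an assumption chosen to give Y = X with r^2 = 1; the helper `fit_line` is likewise written for this example and uses the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Assumed original data: a perfect line Y = X.
before = fit_line([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])

# Same data with the point (5,5) moved to (8,5).
after = fit_line([1, 2, 3, 4, 8], [1, 2, 3, 4, 5])

print(before)  # slope 1, intercept 0, r^2 = 1
print(after)   # slope about 0.55, intercept about 1.03, r^2 about 0.88
```

With these assumed points, moving a single X value cuts the slope roughly in half, which is the kind of change that marks the point as influential.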

The point (8,5) is an influential point in this data set
Watch how the regression line changes as the point (8,5) is added.

Sample Problem on Influential Points
Given the plot below, which of the following can you conclude about the data point in the upper right-hand corner?

- A) It is an Outlier in the Regression
- B) It is an Influential Point
- C) It does not fit the pattern of the data
- D) It has a large residual
- E) All of the Above

The correct answer is……
B) It is an Influential Point

Explanation for Sample Question
Since the data point in question seems to fit the general pattern of the other observations in the data set, there is no evidence to call it an outlier in terms of regression. Likewise, it will not have a large residual when an LSR line is fit to the data. This data point IS an influential point, because it has an X value differing greatly from the others in the set.
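The distinction between a far-out X value and a large residual can be made concrete with a sketch. The data below are invented to mimic the plot described (five points near the line Y = X and one point far to the upper right), and the leverage formula h = 1/n + (x - x̄)^2 / Sxx is the standard one for simple linear regression; it does not appear in the slides and is brought in here as an assumption.

```python
# Invented data: five clustered points plus one far to the upper right.
xs = [1, 2, 3, 4, 5, 15]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 14.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line fitted to all six points.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Leverage depends only on how far each X value is from the mean of X.
leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

# Residual = observed Y minus the Y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, h, e in zip(xs, leverages, residuals):
    print(f"x={x:>2}  leverage={h:.3f}  residual={e:+.3f}")
```

In this made-up set the point at X = 15 has by far the largest leverage (about 0.94) yet the smallest residual, matching the explanation above: unusual in X, but in line with the pattern of the other points.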

THE END