erik.PPT Outliers and Influential Points Erik Johnson AP Statistics 5/25/04
Definitions
Outlier: a value in a data set that does not fit with the rest of the data
Influential point: a point in a data set that has leverage on the regression coefficients
Leverage: a point is said to have leverage if removing it changes the regression line substantially
Q1, Q3: the first and third quartiles; approximately the middle half of the data lies between them
Interquartile range (IQR): Q3 - Q1
Outliers
- Data points more than 2 standard deviations from the mean of the data set
- Data points that do not fit the pattern followed by the rest of the data
- In regression, any data point that has an unusually large residual
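The first rule on this slide can be sketched in a few lines of Python. This is a minimal sketch, assuming the sample standard deviation is meant (the slide does not say whether the sample or population version is intended); the function name `two_sd_outliers` and the data values are hypothetical, chosen only for illustration.

```python
import statistics


def two_sd_outliers(data):
    """Return the values lying more than 2 standard deviations from the mean.

    Assumption: uses the sample standard deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > 2 * sd]


# Hypothetical data set, made up for illustration
values = [10, 11, 12, 12, 13, 13, 14, 15, 40]
print(two_sd_outliers(values))  # [40]
```

Note that a single extreme value also inflates the standard deviation it is measured against, so this rule can miss outliers in small data sets; the 1.5 × IQR rule is less affected by this.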
How can I tell if a point in my data set is an outlier?
Find the IQR (interquartile range) of your data set and multiply it by 1.5. Subtract that number from Q1 and add it to Q3. Any value lying outside these two boundaries can be considered an outlier.
Now you try a sample problem on outliers!
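The 1.5 × IQR procedure above can be sketched as follows. Quartile conventions differ between textbooks, calculators, and software, so this sketch assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median left out when the count is odd. The helper names `quartiles` and `iqr_outliers` and the sample values are hypothetical.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data.

    Assumed convention: when the count is odd, the overall median is excluded
    from both halves. Needs at least two values.
    """
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return statistics.median(lower), statistics.median(upper)


def iqr_outliers(data, k=1.5):
    """Return the values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, q3 = quartiles(data)
    step = k * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    return [x for x in data if x < low_fence or x > high_fence]


# Hypothetical data set
values = [2, 8, 9, 10, 11, 12, 13, 30]
print(iqr_outliers(values))  # [2, 30]
```

With a different quartile convention the boundaries can shift slightly, which occasionally changes whether a borderline value is flagged.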
Sample Problem on IQR
In a data set with five-number summary [12, 18, 19, 21, 25], how many values can be considered outliers?
A) None
B) Exactly 1
C) At least 1
D) Exactly 2
E) At least 2
IF YOU ANSWERED C...
The interquartile range for this data set is 21 - 18 = 3, and multiplying it by 1.5 gives 4.5. Adding 4.5 to 21 gives 25.5, which is larger than the maximum value of 25, so there are no outliers on the upper side of the data. Subtracting 4.5 from 18 gives 13.5. The minimum value of 12 lies below 13.5, so there is at least 1 outlier in the data set. The answer is "at least 1" rather than "exactly 1" because the five-number summary shows only the minimum; other values between 12 and 13.5 could also be present.
YOU’RE RIGHT!!!!!
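The arithmetic in this answer can be checked with a short sketch that works directly from a five-number summary. The function name `fences_from_summary` is hypothetical. Because a five-number summary reveals only the minimum and maximum, a `True` flag means at least one outlier on that side, not an exact count.

```python
def fences_from_summary(summary, k=1.5):
    """Given [min, Q1, median, Q3, max], return the fences and whether each extreme lies outside them."""
    minimum, q1, _median, q3, maximum = summary
    step = k * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    return low_fence, high_fence, minimum < low_fence, maximum > high_fence


low, high, low_out, high_out = fences_from_summary([12, 18, 19, 21, 25])
print(low, high)          # 13.5 25.5
print(low_out, high_out)  # True False
```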
Influential Points
- Influential points are usually outliers in the X direction, but they are not always outliers in terms of regression (they need not have large residuals).
- A point is said to be influential if removing it noticeably changes the least-squares regression (LSR) line.
- Any point that has leverage on a set of data is an influential point.
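One way to see whether a point changes the LSR line is to refit the line with each point left out and compare slopes. This is a sketch of that leave-one-out idea under stated assumptions: the helpers `lsr_line` and `slope_change_when_removed` are hypothetical names, the data are made up, and the X values must be spread out enough that every refit still has at least two distinct X values.

```python
def lsr_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def slope_change_when_removed(xs, ys):
    """For each point, return (refit slope - full-data slope) with that point dropped."""
    full_slope, _ = lsr_line(xs, ys)
    changes = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        slope, _ = lsr_line(rest_x, rest_y)
        changes.append(slope - full_slope)
    return changes


# Hypothetical data: the last point sits far to the right of the others
xs = [1, 2, 3, 4, 5, 6, 15]
ys = [3, 5, 4, 6, 7, 8, 4]
for (x, y), change in zip(zip(xs, ys), slope_change_when_removed(xs, ys)):
    print(f"drop ({x}, {y}): slope changes by {change:+.3f}")
```

A change for one point that is much larger than the changes for the others is the numerical counterpart of saying that point has leverage on the line.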
To the right is a chart of a data set with a perfect linear fit: r^2 = 1 and the equation Y = X. There are no outliers in either the X or the Y direction.
Now look at this graph. The point previously at X = 5 has been moved to X = 8. The equation has changed, and the r^2 value has decreased noticeably.
Watch how the regression line changes as the point (8,5) is added. The point (8,5) is an influential point in this data set.
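The change described on these slides can be reproduced numerically. The slides do not list the data, so this sketch assumes five hypothetical points (1,1) through (5,5) lying on Y = X, then moves the point at X = 5 to X = 8. The function `fit_with_r_squared` is a hypothetical helper that computes the LSR slope, intercept, and r^2 from the usual sums of squares.

```python
def fit_with_r_squared(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


ys = [1, 2, 3, 4, 5]
before = [1, 2, 3, 4, 5]  # hypothetical points on Y = X
after = [1, 2, 3, 4, 8]   # the point at X = 5 moved to X = 8, giving (8, 5)

for label, xs in (("before", before), ("after", after)):
    slope, intercept, r2 = fit_with_r_squared(xs, ys)
    print(f"{label}: y = {intercept:.3f} + {slope:.3f}x, r^2 = {r2:.3f}")
# before: y = 0.000 + 1.000x, r^2 = 1.000
# after: y = 1.027 + 0.548x, r^2 = 0.877
```

With these assumed points, moving one point drops the slope from 1 to about 0.55 and r^2 from 1 to about 0.88; the chart on the slides may use different points, so its exact numbers can differ.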
Sample Problem on Influential Points
Given the plot below, which of the following can you conclude about the data point in the upper right-hand corner?
A) It is an outlier in the regression
B) It is an influential point
C) It does not fit the pattern of the data
D) It has a large residual
E) All of the above
The correct answer is B.
Explanation for Sample Question
Since the data point in question appears to fit the general pattern of the other observations, there is no evidence to call it an outlier in terms of regression. Likewise, it will not have a large residual when an LSR line is fit to the data. This data point IS an influential point, because its X value differs greatly from the others in the set.
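The explanation rests on the point's X value being far from the others. A standard way to put a number on that (swapped in here; the presentation itself gives no formula) is the leverage statistic h_i = 1/n + (x_i - x_bar)^2 / Sxx, which depends only on the X values. The plot from the slide is not available, so the data below are hypothetical values chosen to resemble it: five clustered points and one far to the upper right on the same trend. `leverage_and_residuals` is a hypothetical helper name.

```python
def leverage_and_residuals(xs, ys):
    """Return (leverages, residuals) for a simple least-squares fit.

    Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx uses only the X values;
    residual = observed y - predicted y.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return leverages, residuals


# Hypothetical data: the last point is far to the right but follows the same trend
xs = [1, 2, 3, 4, 5, 15]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 15.1]
leverages, residuals = leverage_and_residuals(xs, ys)
for x, h, e in zip(xs, leverages, residuals):
    print(f"x = {x:>2}: leverage = {h:.3f}, residual = {e:+.3f}")
```

For this made-up data the far-right point has by far the largest leverage while its residual is close to zero, which matches the explanation: it does not stand out as a regression outlier, yet its X position gives it the most potential to move the fitted line. In simple linear regression the leverages always sum to 2.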