Statistics That Deceive: Simpson’s Paradox


1 Statistics That Deceive

2 Simpson’s Paradox
• It is a widely accepted rule that the larger the data set, the better.
• Simpson’s Paradox demonstrates that great care has to be taken when combining smaller data sets into a larger one.
• Sometimes the conclusion drawn from the larger data set is the opposite of the conclusions drawn from the smaller data sets.

3 Example: Simpson’s Paradox
Baseball batting statistics for two players:

            First Half  Second Half  Total Season
Carson      .400        .250         .264
Kennington  .350        .200         .336

How could Carson beat Kennington in each half individually, yet have a lower batting average for the total season?

4 Example Continued

            First Half     Second Half    Total Season
Carson      4/10 (.400)    25/100 (.250)  29/110 (.264)
Kennington  35/100 (.350)  2/10 (.200)    37/110 (.336)

We weren’t told how many at bats each player had. Carson’s dismal second half and Kennington’s great first half each covered 100 at bats, so they carried far more weight in the season totals than the other two values, which covered only 10 at bats each.
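The arithmetic behind the table can be checked with a short Python sketch. The dictionary layout and the function name `season_average` are illustrative choices, not part of the original slides; only the hit and at-bat counts come from the table.

```python
from fractions import Fraction

# (hits, at bats) per half, copied from the table on slide 4.
stats = {
    "Carson": {"first": (4, 10), "second": (25, 100)},
    "Kennington": {"first": (35, 100), "second": (2, 10)},
}

def season_average(halves):
    """Pool hits and at bats across halves, then divide once."""
    total_hits = sum(hits for hits, _ in halves.values())
    total_at_bats = sum(at_bats for _, at_bats in halves.values())
    return Fraction(total_hits, total_at_bats)

for name, halves in stats.items():
    first = Fraction(*halves["first"])
    second = Fraction(*halves["second"])
    season = season_average(halves)
    print(f"{name:<11}{float(first):.3f}  {float(second):.3f}  {float(season):.3f}")
```

The season average is not the mean of the two half-season averages; it is the pooled ratio, so the half with more at bats dominates.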

5 Another Example
Average college physics grades for students in an engineering program:

                    HS Physics  No HS Physics
Number of Students  50          5
Average Grade       80          70

Average college physics grades for students in a liberal arts program:

                    HS Physics  No HS Physics
Number of Students  5           50
Average Grade       95          85

It appears that in both programs, taking high school physics improves your college physics grade by 10 points.

6 Example continued
In order to get better results, let’s combine our datasets. In particular, let’s combine all the students who took high school physics. More precisely, combine the engineering students who took high school physics with the liberal arts students who took high school physics. Likewise, combine the engineering students who did not take high school physics with the liberal arts students who did not take high school physics. But be careful! You can’t just take the average of the two averages, because each dataset has a different number of students. Each average has to be weighted by the number of students behind it.

7 Example continued
Average college physics grades for students who took high school physics:

             # Students  Grade  Weighted Grade
Engineering  50          80     50/55 * 80 = 72.73
Lib Arts     5           95     5/55 * 95 = 8.64
Total        55                 72.73 + 8.64 ≈ 81.4

Average college physics grades for students who did not take high school physics:

             # Students  Grade  Weighted Grade
Engineering  5           70     5/55 * 70 = 6.36
Lib Arts     50          85     50/55 * 85 = 77.27
Total        55                 6.36 + 77.27 ≈ 83.6

Did the students who did not have high school physics actually do better?
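The weighting step can be sketched in Python as follows. The function name `weighted_mean` and the list layout are assumptions made for illustration; the student counts and grades are the ones from the tables above.

```python
def weighted_mean(groups):
    """groups: list of (number_of_students, average_grade) pairs."""
    total_students = sum(n for n, _ in groups)
    return sum(n * grade for n, grade in groups) / total_students

# (students, average grade) for engineering, then liberal arts.
took_hs_physics = [(50, 80), (5, 95)]
no_hs_physics = [(5, 70), (50, 85)]

print(round(weighted_mean(took_hs_physics), 2))  # 81.36
print(round(weighted_mean(no_hs_physics), 2))    # 83.64

# The naive "average of the averages" ignores group sizes
# and gives a very different picture.
print((80 + 95) / 2)  # 87.5
print((70 + 85) / 2)  # 77.5
```

The naive averages preserve the 10-point advantage seen within each program, while the correctly weighted averages reverse it.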

8 Example another way
Average college physics grades for students who took high school physics:

             # Students  Grade  Grade Pts
Engineering  50          80     4000
Lib Arts     5           95     475
Total        55                 4475

Average = total grade points / total students = 4475/55 ≈ 81.4

Average college physics grades for students who did not take high school physics:

             # Students  Grade  Grade Pts
Engineering  5           70     350
Lib Arts     50          85     4250
Total        55                 4600

Average = total grade points / total students = 4600/55 ≈ 83.6

Did the students who did not have high school physics actually do better?

9 The Problem
• Two problems with combining the data:
  - Each combined table was dominated by one type of student: 50 of the 55 students who took high school physics were engineering students, and 50 of the 55 who did not were liberal arts students.
  - The engineering students had a more rigorous physics class than the liberal arts students, so the program is a hidden variable.
• So be very careful when you combine data into a larger set.
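One way to guard against this is to compare the effect within each group against the effect in the pooled data before trusting the combined number. The following Python sketch does that for the physics example; the `data` layout and the names `pooled_mean`, `within_gap`, and `reversal` are hypothetical and chosen only for illustration.

```python
# (program, took HS physics) -> (students, average grade), from slide 5.
data = {
    ("Engineering", True): (50, 80),
    ("Engineering", False): (5, 70),
    ("Liberal Arts", True): (5, 95),
    ("Liberal Arts", False): (50, 85),
}
programs = ("Engineering", "Liberal Arts")

def pooled_mean(took_hs):
    """Student-weighted average grade across both programs."""
    rows = [v for (_, hs), v in data.items() if hs == took_hs]
    return sum(n * g for n, g in rows) / sum(n for n, _ in rows)

# Effect of HS physics inside each program (the hidden variable held fixed).
within_gap = {p: data[(p, True)][1] - data[(p, False)][1] for p in programs}

# Effect of HS physics after combining the programs.
pooled_gap = pooled_mean(True) - pooled_mean(False)

# Simpson's Paradox: every subgroup points one way, the pooled data the other.
reversal = all(gap > 0 for gap in within_gap.values()) and pooled_gap < 0

# Share of engineering students in each combined table shows the imbalance.
for took_hs in (True, False):
    total = sum(data[(p, took_hs)][0] for p in programs)
    share = data[("Engineering", took_hs)][0] / total
    print(f"took HS physics={took_hs}: {share:.0%} engineering")

print(within_gap, round(pooled_gap, 2), reversal)
```

When the within-group gaps and the pooled gap disagree in sign, the pooled figure reflects the group mix rather than the effect being studied.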

10 More …
• There are many real examples of this type of situation, which leads to an apparent contradiction.
• The deceptive result rests on this [remember this]: if you view the same data in 2 different ways, or break it into 2 different parts, you CAN get different results!

