Professor B. Jones University of California, Davis.


1 Professor B. Jones University of California, Davis

2  Pitfalls and Paradoxes  The Concept of a “Lurker”

3  Bottom of the ninth, down by 1 run  Two Outs  Runners on second and third  …and the pitcher is up  You have only two players left  …and this is the National League.  What will you do?

4  Player 1: 110 hits from 500 at bats.  Player 2: 280 hits from 1200 at bats.  Their “batting average”  Player 1: 110/500=.220  Player 2: 280/1200=.233  Who would you choose?  On batting average, Player 2 > Player 1
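The arithmetic on this slide can be checked with a minimal Python sketch. It uses the totals from the slide's batting-average lines (110/500 and 280/1200), which are the ones that agree with the per-side table on the next slide; the names `totals` and `batting_average` are illustrative only.

```python
# Career totals as (hits, at_bats), from the slide's batting-average lines.
totals = {
    "Player 1": (110, 500),
    "Player 2": (280, 1200),
}

def batting_average(hits, at_bats):
    """Batting average is simply hits divided by at bats."""
    return hits / at_bats

averages = {name: batting_average(h, ab) for name, (h, ab) in totals.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")

# The player with the higher overall average.
best_overall = max(averages, key=averages.get)
print("Higher overall average:", best_overall)
```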

5  Both players are switch-hitters (they can bat from the left or right side of the plate)  We’ll go “money ball” and play the best match-up  The data:

             Player 1                 Player 2
Side         From Right  From Left    From Right  From Left
At Bats      400         100          400         800
Hits         84          26           80          200
Average      0.210       0.260        0.200       0.250

6  What happened?  Not accounting for switch hitting, Player 2 is preferred to Player 1  When accounting for switch hitting, Player 1 is preferred to Player 2  Worse! From either side of the plate, we would conclude Player 1 is better than Player 2 even though Player 2’s overall batting average is higher!
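The reversal can be reproduced from the per-side counts. The following is a sketch only, assuming the per-side hits and at bats given on slide 5; the dictionary layout and the helper name `average` are illustrative.

```python
# (hits, at_bats) for each player from each side of the plate, as on slide 5.
splits = {
    "Player 1": {"right": (84, 400), "left": (26, 100)},
    "Player 2": {"right": (80, 400), "left": (200, 800)},
}

def average(hits, at_bats):
    return hits / at_bats

# Batting average for each player from each side.
per_side = {
    player: {side: average(*counts) for side, counts in sides.items()}
    for player, sides in splits.items()
}

# Overall average: pool hits and at bats across both sides first.
overall = {
    player: average(
        sum(hits for hits, _ in sides.values()),
        sum(at_bats for _, at_bats in sides.values()),
    )
    for player, sides in splits.items()
}

for side in ("right", "left"):
    better = max(per_side, key=lambda player: per_side[player][side])
    print(f"From the {side}: {better} has the higher average")
print("Overall:", max(overall, key=overall.get), "has the higher average")
```

Player 2's overall figure is pulled up because 800 of his 1200 at bats come from the left side, where both players hit better.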

7  University Admission Statistics  1000 women apply, 1000 men apply  Admission Rate:  Women: 510/1000=51 percent  Men: 800/1000=80 percent  Conclusion?  Evidence of gender bias?  This was the basis of the U.C. Berkeley gender bias case in the 1970s  Source: http://walrandpc.eecs.berkeley.edu/126/simpson.htm

8  Students apply to one of two colleges, College A and College B.  The Admissions Data:

           Female                       Male
College    Applied  Accepted  Rate      Applied  Accepted  Rate
A          980      490       50%       200      80        40%
B          20       20        100%      800      720       90%
Total      1000     510       51%       1000     800       80%

 Findings?  Admission Rate for each college is higher for women than men.  Overall admission rate is higher for men.
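The two findings can be verified directly from the counts. This is a minimal sketch: the number of women accepted at College B (20) is an assumption inferred from the slide's totals (510 - 490) and its 100% rate, and the names used are illustrative.

```python
# (applied, accepted) by gender and college.
# Assumption: 20 women accepted at College B, inferred from 510 - 490.
admissions = {
    "Female": {"A": (980, 490), "B": (20, 20)},
    "Male": {"A": (200, 80), "B": (800, 720)},
}

def admission_rate(applied, accepted):
    return accepted / applied

# Rate within each college, per gender.
college_rates = {
    gender: {c: admission_rate(*counts) for c, counts in colleges.items()}
    for gender, colleges in admissions.items()
}

# Overall rate per gender: pool applicants and acceptances across colleges.
overall_rates = {
    gender: admission_rate(
        sum(applied for applied, _ in colleges.values()),
        sum(accepted for _, accepted in colleges.values()),
    )
    for gender, colleges in admissions.items()
}

for gender in admissions:
    by_college = ", ".join(
        f"{c}: {r:.0%}" for c, r in college_rates[gender].items()
    )
    print(f"{gender}: {by_college}, overall: {overall_rates[gender]:.0%}")
```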

9  Two preceding examples illustrate Simpson’s Paradox  Named for E.H. Simpson (based on 1951 paper)  Phenomenon has been known since at least 1899 (and Yule 1903 published a paper on it).  Why a paradox?  The result is counterintuitive.

10  The Paradox:  A “reversal result”  The relationship between two variables found within sub-groups differs in direction when the sub-groups are combined  Batting Averages on Left/Right Side vs. Overall  Gender admissions by college vs. Overall Gender Admission Rate  Consider admissions data again.
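The "reversal result" can be stated as a small check. This sketch is one possible formalization, not a standard library routine: the function `is_reversal` is hypothetical, and it only handles the two-group case where one group wins in every sub-group. The count of 20 women accepted at College B is inferred from the slide totals.

```python
def rate(successes, trials):
    return successes / trials

def pooled_rate(group):
    """Rate after combining all sub-groups of one group."""
    return rate(
        sum(s for s, _ in group.values()),
        sum(t for _, t in group.values()),
    )

def is_reversal(group_a, group_b):
    """Each argument maps sub-group name -> (successes, trials).

    Returns True when one group has the higher rate in every
    sub-group but the lower rate once the sub-groups are combined.
    """
    keys = list(group_a)
    a_wins_all = all(rate(*group_a[k]) > rate(*group_b[k]) for k in keys)
    b_wins_all = all(rate(*group_b[k]) > rate(*group_a[k]) for k in keys)
    if a_wins_all:
        return pooled_rate(group_a) < pooled_rate(group_b)
    if b_wins_all:
        return pooled_rate(group_b) < pooled_rate(group_a)
    return False

# Batting example: (hits, at_bats) by side of the plate.
player_1 = {"right": (84, 400), "left": (26, 100)}
player_2 = {"right": (80, 400), "left": (200, 800)}

# Admissions example: (accepted, applied) by college.
women = {"A": (490, 980), "B": (20, 20)}
men = {"A": (80, 200), "B": (720, 800)}

print(is_reversal(player_1, player_2))
print(is_reversal(women, men))
```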

11  Our example  The “model”: Admission Rate=f(Gender)  Gender Bias Hypothesis: Admission rates of women will be lower than men.  Y=Admission Rate; X=Gender  Data seem consistent with the hypothesis.  The Problem:  There is a third variable; what is it?  College to which students applied (A vs. B)  Z=College

12  The Problem is Simple  (A) There is a strong association between Y and Z  One college (B) is easier to “get into” than the other college (A)  (B) There is a strong association between X and Z  Women tend to apply to the harder college (A) at higher rates; men tend to apply to the easier college (B) at higher rates.  Therefore, because of (A) and (B), there is a strong connection between Y and X  This connection, however, is spurious.
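Points (A) and (B) can be made concrete by writing each overall admission rate as a weighted average of the college-level rates, with the weights given by where each gender applied. The worked equation below uses the figures from slide 8; the conditional-probability notation is added here for illustration and is not from the slides.

```latex
\begin{align*}
P(\text{admit} \mid \text{women})
  &= P(A \mid \text{women})\,P(\text{admit} \mid \text{women}, A)
   + P(B \mid \text{women})\,P(\text{admit} \mid \text{women}, B) \\
  &= 0.98 \times 0.50 + 0.02 \times 1.00 = 0.51 \\[4pt]
P(\text{admit} \mid \text{men})
  &= P(A \mid \text{men})\,P(\text{admit} \mid \text{men}, A)
   + P(B \mid \text{men})\,P(\text{admit} \mid \text{men}, B) \\
  &= 0.20 \times 0.40 + 0.80 \times 0.90 = 0.80
\end{align*}
```

Women have the higher rate at each college, but 98 percent of their weight falls on the harder college (A), while 80 percent of the men's weight falls on the easier college (B).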

13  The Nature of the Problem  [Diagram relating Gender (X), Admission Rate (Y), and College (Z)]

14  Beware the Lurker Variable  Lurker Variable:  A lurking variable (confounding factor or variable, or simply a confound or confounder) is a "hidden" variable in a statistical or research model that affects the variables in question but is not known or acknowledged, and thus (potentially) distorts the resulting data. This hidden third variable causes the two measured variables to falsely appear to be in a causal relation. Such a relation between two observed variables is termed a spurious relationship. (Source: http://en.wikipedia.org/wiki/Confounder)  The Problem: Z is a confounder. If we had accounted for Z, we would have arrived at different conclusions.

15  Berkeley, 1973  Gender bias not found when accounting for departmental admission rates  Interestingly, it was found that women tended to apply to more difficult graduate programs than men.  Across departments, graduate admission rates were higher for women.  Not accounting for departmental differences, gender bias appeared

16  Combining sub-groups (aggregation) can lead to serious inferential problems  ESPECIALLY if the presence of lurking variables is not accounted for  Large samples with lots of subgroups can lead to these kinds of problems  Simpson’s Paradox is a real concern  …but often not recognized.

