Simpson's Paradox & Simpson's 2nd Paradox


Simpson's Paradox & Simpson's 2nd Paradox
H. James Norton, William Anderson, Megan Templin, Carolinas Medical Center, Charlotte, NC
George Divine, Henry Ford Hospital, Detroit, MI

Paradox in published research goes back to at least the late 1800s. See “The Pirates of Penzance” by Gilbert & Sullivan (1879).

Keeping undergraduate biology students and medical residents interested in statistics can be challenging when the majority of them are taking the class as a requirement.

The following are examples of Simpson’s Paradox that you might find helpful in your course instruction.

Survival Rates
[Table: Died, Survived, and Death Rate (%) for Hospital A and Hospital B; values not preserved in this transcript.]
In which hospital would you want to have your surgery, A or B?
Moore, D.S., McCabe, G.P., 1999, Introduction to the Practice of Statistics, 3rd edition: W. H. Freeman.

Patients in Good Condition
[Table: Died, Survived, and Death Rate (%) for Hospital C and Hospital D; values not preserved in this transcript.]
If you are in good condition, in which hospital would you want to have your surgery, C or D?
Moore, D.S., McCabe, G.P., 1999, Introduction to the Practice of Statistics, 3rd edition: W. H. Freeman.

Patients in Poor Condition
[Table: Died, Survived, and Death Rate (%) for Hospital C and Hospital D; values not preserved in this transcript.]
If you are in poor condition, in which hospital would you want to have your surgery, C or D?
Moore, D.S., McCabe, G.P., 1999, Introduction to the Practice of Statistics, 3rd edition: W. H. Freeman.

Survival Rates
[Table: Died, Survived, and Death Rate (%) for Hospital A and Hospital B overall, then for Hospital C and Hospital D in Good Condition and in Bad Condition; values not preserved in this transcript.]
Hospital A is the combined data for Hospital C; Hospital B is the combined data for Hospital D.

Simpson’s Paradox refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group.
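The reversal described above can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical counts for two hospitals, labelled X and Y, chosen only to make the effect visible; they are not the figures from the slides, whose table values did not survive in this transcript.

```python
# Hypothetical counts chosen only to illustrate the reversal; they are NOT
# the values from the slides.
# Each entry: (hospital, condition) -> (deaths, patients)
counts = {
    ("X", "good"): (1, 100),
    ("X", "poor"): (90, 900),
    ("Y", "good"): (18, 900),
    ("Y", "poor"): (12, 100),
}


def death_rate(hospital, condition=None):
    """Death rate for one hospital, within one condition or combined."""
    cells = [
        value
        for (h, c), value in counts.items()
        if h == hospital and (condition is None or c == condition)
    ]
    deaths = sum(d for d, _ in cells)
    patients = sum(n for _, n in cells)
    return deaths / patients


for condition in ("good", "poor", None):
    label = condition or "combined"
    print(
        f"{label:>8}: X = {death_rate('X', condition):.1%}, "
        f"Y = {death_rate('Y', condition):.1%}"
    )
```

With these made-up counts, hospital X has the lower death rate within each condition, yet the higher death rate once the conditions are pooled, because X treats far more patients in poor condition.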

Brief History of Simpson’s Paradox
Yule, GU, 1903, “Notes on the theory of association of attributes in Statistics”, Biometrika, 2: 121–134.
Cohen, MR, and Nagel, E, 1934, An Introduction to Logic and Scientific Method, New York: Harcourt, Brace and Co.
Simpson, EH, 1951, “The interpretation of interaction in contingency tables”, Journal of the Royal Statistical Society (Series B), 13: 238–241.
Blyth, CR, 1972, “On Simpson's Paradox and the Sure Thing Principle”, Journal of the American Statistical Association, 67: 364–366.
Bickel, PJ, Hammel, EA, and O'Connell, JW, 1975, “Sex Bias in Graduate Admissions: Data From Berkeley”, Science, 187: 398–404.

Conditions when Simpson’s Paradox will not occur: Sample sizes are the same
[Table: Died, Survived, and Death Rate (%) for Hospital A and Hospital B overall, in Good Condition, and in Bad Condition; values not preserved in this transcript.]
# Hospital A Good Condition = # Hospital A Bad Condition = 200
# Hospital B Good Condition = # Hospital B Bad Condition = 1500

Conditions when Simpson’s Paradox will not occur: Rates are the same
[Table: Died, Survived, and Death Rate (%) for Hospital C and Hospital D overall, in Good Condition, and in Bad Condition; values not preserved in this transcript.]
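A short derivation suggests why these two conditions rule out a reversal. The notation is introduced here for illustration and is not from the slides: $w_h$ is the fraction of hospital $h$'s patients in good condition, and $g_h$ and $b_h$ are its death rates in good and bad condition. "Rates are the same" is read as each hospital having one death rate in both conditions, which is an assumption, since the slide's table values are missing.

```latex
% Combined death rate of hospital h as a weighted average of stratum rates
r_h = w_h\, g_h + (1 - w_h)\, b_h

% Case 1: same mix of conditions in both hospitals, w_A = w_B = w
% (in the slide, w = 1/2 for each hospital)
r_A - r_B = w\,(g_A - g_B) + (1 - w)\,(b_A - b_B)
% If g_A - g_B and b_A - b_B have the same sign, r_A - r_B has that sign too.

% Case 2: each hospital has the same rate in both conditions, g_h = b_h
r_h = w_h\, g_h + (1 - w_h)\, g_h = g_h
% The combined rate equals the stratum rate, whatever the mix w_h.
```

In both cases the combined comparison cannot point the opposite way from the comparisons within the strata.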

Examples of Simpson’s Paradox in the Literature
Study Type | Author | Dependent Variable | Independent Variable | Stratification Variable
Epidemiological | Cohen | death | location | race
Epidemiological | Morrell | medical aid | children (followed, not followed) | race
Epidemiological | Severijnen | urinary tract infection | antibiotic prophylaxis (y/n) | incidence of urinary tract infection
Legal | Bickel | admission | gender | department
Legal | Blume | death sentence | race of offender | race of victim
Medical | Charig | success removing kidney stones | open surgery or percutaneous | kidney stone diameter (<2cm)
Medical | Gatling | death | insulin dependent (y/n) | age (<40)
Psychological | Hand | percent male | year (1970/1975) | age (<65)

Severijnen AJ, Verbrugh HA, Mintjes-de Groot AJ, Vandenbroucke-Grauls CMJE, van Pelt W. Sentinel System for Nosocomial Infections in the Netherlands: A Pilot Study. Infect Control Hosp Epidemiol 1997; 18:818–824.
[Table: UTI, No UTI, and % with UTI for Antibiotic Prophylaxis vs. No Antibiotic Prophylaxis, shown for low incidence hospitals, high incidence hospitals, and combined; values not preserved in this transcript.]

C. Morrell, Mathematical Science Department, Loyola College, Baltimore, MD
[Table: Medical Aid, No Medical Aid, and Rate of Medical Aid (%) for children not traced vs. the five-year group, shown for Caucasians, African Americans, and combined; values not preserved in this transcript.]

Hand DJ. Psychiatric examples of Simpson's paradox. Br J Psychiat 1979;135: 90–1.
[Table: Male, Female, and Percent Male by year, shown for age < 65, age >= 65, and combined; values not preserved in this transcript.]

Cohen, M. R., and Nagel, E., 1934, An Introduction to Logic and Scientific Method, New York: Harcourt, Brace and Co.
Death Rates from Tuberculosis in Richmond and New York City in 1910
[Table: Died, Survived, and Death Rate per 100,000 by city, shown for Caucasians, African Americans, and combined. Only the Died column is partly preserved in this transcript: Caucasians, New York 8,365 and Richmond 131; African Americans, New York 513 and Richmond 155; combined, New York 8,878.]

Charig, C.R., Webb, D.R., Payne, S.R., Wickham, J.E. "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy." Br Med J (Clin Res Ed) 292 (6524): 879–882.
[Table: success, failure, and % successful for open vs. percutaneous surgery, shown for kidney stones < 2cm, kidney stones >= 2cm, and combined. Only one row is preserved in this transcript: kidney stones < 2cm, open surgery, 81 successes, 6 failures, 93% successful.]

Gatling W, Mullee MA, Hill RD. The general characteristics of a community based population. Practical Diabetes 1989; 5:
[Table: Died, Alive, and % Died for insulin dependent vs. non-insulin dependent patients, shown for age <= 40, age > 40, and combined; values not preserved in this transcript.]

Blume JH, Eisenberg T, Wells MT; Explaining Death Row’s Population & Racial Composition; A Digital Repository; 3/1/2004.
Indiana Death Sentence Rate
[Table: murders, # of death sentences, and death sentence rate for black vs. white offenders, shown for black victims, white victims, and combined; values not preserved in this transcript.]

Bickel, PJ, Hammel, EA and O’Connell, JW. Sex Bias in Graduate Admissions: Data from Berkeley. Science 1975; 187: 398–404.
[Three tables: Admitted, Applicants, and Admittance Rate (%) for men vs. women. Each table repeats the overall figures and then shows two departments: A and B, C and D, E and F. Values not preserved in this transcript.]

Which statistical methods will “Lift the curtain of illusion to let the truth of my soul shine through*” regarding Simpson’s Paradox?
- Stratification
- Standardization
- Logistic regression
* By Trudy Symeonakis Vesotksy
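As an illustration of the first method, stratification, the sketch below pools the stratum-specific 2x2 tables with the Mantel-Haenszel odds ratio and compares the result with the crude odds ratio from the collapsed table. The counts are hypothetical and the hospital labels X and Y are placeholders; none of this comes from the slides.

```python
# Hypothetical 2x2 tables, one per stratum (NOT data from the slides).
# Each stratum: (deaths_X, survivors_X, deaths_Y, survivors_Y)
strata = {
    "good": (1, 99, 18, 882),
    "poor": (90, 810, 12, 88),
}


def odds_ratio(a, b, c, d):
    """Odds of death at X relative to Y for a single 2x2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Collapse the strata into one table, then take the odds ratio."""
    a, b, c, d = (sum(column) for column in zip(*tables))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


tables = list(strata.values())
crude = crude_odds_ratio(tables)
adjusted = mantel_haenszel_odds_ratio(tables)
print(f"crude OR = {crude:.2f}, Mantel-Haenszel OR = {adjusted:.2f}")
```

For these made-up counts the crude odds ratio is above 1 while the stratified estimate is below 1, which is the same reversal seen when comparing rates directly.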

Definition of confounding
Confounding/distortion can arise when two conditions are true:
1. The risk groups differ on the background factor/variable
2. The background factor/variable itself influences the outcome
If you do not control for confounding, the unadjusted variables can be distorted/misleading.
Simpson’s Paradox is caused by confounding.
Anderson S, Auquier A, Hauck WW, Oakes D, Vandaele W, Weisberg, HI; 1980; Statistical Methods For Comparative Studies: Techniques For Bias Reduction; John Wiley & Sons; New York.

Schield, Milo; Beware the Lurking Variable: Understanding Confounding from Lurking Variables Using Graphs; STATS, Fall 2006, #46.
The data below are an example of Simpson’s Paradox.

The combined overall death rate of 3.5% for the rural hospital versus 5.5% for the urban hospital is not a fair comparison, as 30% of the rural hospital’s patients are in poor condition while 90% of the urban hospital’s patients are in poor condition. Let’s standardize the rates to make it a fair comparison. The standard population will consist of all the patients from both hospitals. The combined population consists of 800 patients in good condition and 1200 patients in poor condition (40% vs. 60%).
The standardized death rate for the rural hospital would be (.02 × .40) + (.07 × .60) = .008 + .042 = .05 = 5%.
The standardized death rate for the urban hospital would be (.01 × .40) + (.06 × .60) = .004 + .036 = .04 = 4%.
With the death rates standardized, we see that the urban hospital has a lower death rate of 4% vs. 5% for the rural hospital! Dr. Schield then provides us with a graphic presentation.
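The standardization arithmetic above can be checked with a short script. This is a minimal sketch: the stratum death rates (2% and 7% rural, 1% and 6% urban), the 30% and 90% poor-condition shares, and the 800/1200 standard population are the figures quoted in the slides, while the variable and function names are illustrative.

```python
# Stratum-specific death rates quoted in the slides.
rates = {
    "rural": {"good": 0.02, "poor": 0.07},
    "urban": {"good": 0.01, "poor": 0.06},
}
# Share of each hospital's own patients who are in poor condition.
own_poor_share = {"rural": 0.30, "urban": 0.90}
# Standard population: both hospitals combined (800 good, 1200 poor).
standard = {"good": 800, "poor": 1200}


def crude_rate(hospital):
    """Death rate weighted by the hospital's own case mix."""
    p = own_poor_share[hospital]
    return rates[hospital]["poor"] * p + rates[hospital]["good"] * (1 - p)


def standardized_rate(hospital):
    """Death rate weighted by the shared standard population."""
    total = sum(standard.values())
    return sum(
        rates[hospital][stratum] * n / total for stratum, n in standard.items()
    )


crude = {h: crude_rate(h) for h in rates}
standardized = {h: standardized_rate(h) for h in rates}
for h in rates:
    print(f"{h}: crude = {crude[h]:.1%}, standardized = {standardized[h]:.1%}")
```

The crude rates reproduce the 3.5% and 5.5% quoted above, and the standardized rates reproduce 5% and 4%, so the ordering of the two hospitals flips once both are weighted by the same case mix.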

Graphic Presentation of Simpson’s Paradox
Let SRR = standardized rate rural, SRC = standardized rate city, and P = proportion of patients in poor condition in a standard population.
SRR = (.07 × P) + .02 × (1 − P) = (.07 − .02)P + .02 = .02 + (.05 × P)
SRC = (.06 × P) + .01 × (1 − P) = (.06 − .01)P + .01 = .01 + (.05 × P)
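The two lines can also be evaluated numerically instead of plotted. The sketch below uses the SRR and SRC formulas from the slide; the function names and the grid of P values are illustrative choices, and the 30% and 90% poor-condition shares are the ones quoted earlier for the rural and urban hospitals.

```python
def srr(p):
    """Standardized rate, rural hospital, when a fraction p is in poor condition."""
    return 0.07 * p + 0.02 * (1 - p)


def src(p):
    """Standardized rate, city hospital, when a fraction p is in poor condition."""
    return 0.06 * p + 0.01 * (1 - p)


# The lines are parallel: at any common P the rural rate is higher by .01.
for p in (0.0, 0.3, 0.6, 0.9, 1.0):
    gap = srr(p) - src(p)
    print(f"P = {p:.1f}: rural = {srr(p):.3f}, city = {src(p):.3f}, gap = {gap:.3f}")

# Each crude rate is a point on that hospital's line at its OWN case mix.
crude_rural = srr(0.30)
crude_city = src(0.90)
print(f"crude rural = {crude_rural:.3f}, crude city = {crude_city:.3f}")
```

At any shared value of P the city hospital sits .01 below the rural hospital, but the crude rates are read off at different values of P (0.30 and 0.90), which is why the crude comparison points the other way.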

SAS code for logistic regression using Dr. Schield’s data. (The data rows that followed the cards statement are not preserved in this transcript.)

data simpsonsparadox;
  input condition hospital death numcell;
  cards;
;
run;

proc logistic data=simpsonsparadox descending;
  class condition hospital death / param=ref;
  model death=hospital condition;
  weight numcell;
  format condition condition. hospital hospital. death death.;
  title 'Data from Dr. Schield analyzed using logistic regression – Probability modeled is death = yes';
run;

Data from Dr. Schield analyzed using logistic regression
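The output of the logistic regression is not preserved in this transcript. As a rough stand-in, the sketch below fits the same kind of model, logit P(death) = b0 + b1·urban + b2·poor, with a hand-written Newton-Raphson loop in plain Python. It is not the authors' SAS analysis. The cell counts are an inference from the quoted figures (stratum death rates, 30% and 90% poor-condition shares, 800 and 1200 patients per condition), which imply 1000 patients per hospital; that inference and all names are assumptions.

```python
import math

# Cell counts INFERRED from the rates and totals quoted for Dr. Schield's
# example (assumes 1000 patients per hospital; the original data rows are lost).
# Each row: (urban, poor, deaths, patients)
cells = [
    (0, 0, 14, 700),  # rural, good condition: 2% of 700
    (0, 1, 21, 300),  # rural, poor condition: 7% of 300
    (1, 0, 1, 100),   # urban, good condition: 1% of 100
    (1, 1, 54, 900),  # urban, poor condition: 6% of 900
]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_logistic(rows, iterations=25):
    """Newton-Raphson fit of logit P(death) = b0 + b1*urban + b2*poor."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for urban, poor, deaths, n in rows:
            x = (1.0, float(urban), float(poor))
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(3):
                grad[i] += x[i] * (deaths - n * p)
                for j in range(3):
                    hess[i][j] += n * p * (1 - p) * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


b0, b_urban, b_poor = fit_logistic(cells)
adjusted_or = math.exp(b_urban)  # urban vs. rural, adjusted for condition
crude_or = ((1 + 54) / (99 + 846)) / ((14 + 21) / (686 + 279))
print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
```

Under these assumed counts the crude odds ratio for urban versus rural is above 1, while the coefficient for the urban hospital is negative once condition is in the model, in line with the standardized rates.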

Simpson’s 2nd Paradox
Whether “the sensible interpretation” exists in the separate tables, or is instead found in the combined table, depends upon the context of the data being analyzed. This means that the correct interpretation cannot be reliably determined merely by looking at the numbers in the table. Suppose (hypothetical) data are analyzed to determine whether a new treatment (A) is superior to the standard treatment (B) for septic shock.

Success of treatment for septic shock by diastolic blood pressure
[Table: Alive, Dead, and % alive for Treatment A vs. Treatment B, shown combined, for DBP < 50, and for DBP ≥ 50. Only one row is fully preserved in this transcript: DBP ≥ 50, Treatment A, 810 alive, 90 dead, 90% alive.]

In the previous examples, a sensible interpretation has been found in the separate tables. Could it be that the separate tables are not showing the complete story for this situation? Suppose the facts regarding the data are:
- 2000 patients thought to have septic shock are randomized equally to Treatment A or Treatment B.
- The 2 groups have identical DBP distributions upon arrival at the Emergency Department.
- All patients survive the 1st day.
- DBP is measured at the end of 24 hours of treatment, and the DBP in the table is based upon this 2nd measurement.
- Only one tenth (100/1000) of the Treatment A patients crash below 50.
- One half (500/1000) of the Treatment B patients crash below 50.
The biology of the situation would suggest that the sensible interpretation is found in the combined table. Note that DBP was not a variable that was fixed at the start of the experiment but was an intermediate outcome affected by the treatment. If the variable is only on the causal pathway, it is not a confounder variable and you should not adjust for it.

Conclusions: Hopefully, the use of Simpson's Paradox will improve the learning experience for the students.

Facts are stubborn but statistics are more pliable.

There are 2 kinds of statistics: the kind you look up & the kind you make up.

“Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.” Aaron Levenstein

Statistics can be used to support just about anything...

including statisticians!!