Analysis of matched data
Pair Matching: Why match? Pairing can control for extraneous sources of variability and increase the power of a statistical test. Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.
Example
Johnson and Johnson (NEJM 287: , 1972) selected 85 Hodgkin’s patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient’s. They presented the data as:

                 Tonsillectomy   None
Hodgkin’s              41          44
Sib control            33          52

OR=1.47; chi-square=1.53 (NS)
From John A. Rice, “Mathematical Statistics and Data Analysis.”
Example
But several letters to the editor pointed out that those investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired. It is better to analyze the data like this, where each count is a sibling pair:

                         Control (sibling)
Case (patient)       Tonsillectomy   None
Tonsillectomy              26          15
None                        7          37

OR=2.14; chi-square=2.91 (p=.09)
From John A. Rice, “Mathematical Statistics and Data Analysis.”
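The two analyses can be compared with a short calculation. This is a sketch only: the cell counts are taken from the tables in Rice's book as an assumption, and they are checked here only by confirming that they reproduce the statistics quoted on the slides (OR=1.47, chi-square=1.53 unpaired; OR=2.14, chi-square=2.91, p=.09 paired).

```python
import math

def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# ASSUMED counts (from Rice's tables), unpaired layout:
# each tuple is (tonsillectomy, none) for one group.
hodgkins = (41, 44)
sib_control = (33, 52)

a, b = hodgkins
c, d = sib_control
n = a + b + c + d
unpaired_or = (a * d) / (b * c)
# Pearson chi-square for a 2x2 table, no continuity correction.
unpaired_chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# ASSUMED counts, paired layout: one count per sibling pair.
both, patient_only, sibling_only, neither = 26, 15, 7, 37
paired_or = patient_only / sibling_only
mcnemar_chi2 = (patient_only - sibling_only) ** 2 / (patient_only + sibling_only)
paired_p = chi2_pvalue_1df(mcnemar_chi2)

print("unpaired:", round(unpaired_or, 2), round(unpaired_chi2, 2))
print("paired:  ", round(paired_or, 2), round(mcnemar_chi2, 2), round(paired_p, 2))
```

The paired analysis uses only the 22 discordant pairs, which is why its odds ratio differs from the unpaired one even though the margins of the two tables agree.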
Pair Matching
Match each MI case to a control without MI based on age and gender. Ask about history of diabetes to find out whether diabetes increases the risk of MI.
Pair Matching

                        MI controls
MI cases            Diabetes   No diabetes
Diabetes                9          37
No diabetes            16          82
Each pair is its own “age-gender” stratum
Example: Concordant for exposure (cell “a” from before)

                 Case (MI)   Control
Diabetes             1          1
No diabetes          0          0
The four possible pair types (each pair is a 2x2 table with rows Diabetes / No diabetes and columns Case (MI) / Control):
Case diabetic, control diabetic:          x 9
Case diabetic, control not diabetic:      x 37
Case not diabetic, control diabetic:      x 16
Neither diabetic:                         x 82
Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes and MI controlling for age and gender. Mantel-Haenszel methods apply.
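RECALL: The Mantel-Haenszel Summary Odds Ratio
For stratum k, with total $T_k = a_k + b_k + c_k + d_k$:

               Case   Control
Exposed          a       b
Not Exposed      c       d

$OR_{MH} = \frac{\sum_k a_k d_k / T_k}{\sum_k b_k c_k / T_k}$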
Contribution of each pair type (T = 2 for every pair):
Case diabetic, control diabetic:          ad/T = 0     bc/T = 0
Case diabetic, control not diabetic:      ad/T = 1/2   bc/T = 0
Case not diabetic, control diabetic:      ad/T = 0     bc/T = 1/2
Neither diabetic:                         ad/T = 0     bc/T = 0
Mantel-Haenszel Summary OR
$OR_{MH} = \frac{37 \times 1/2}{16 \times 1/2} = \frac{37}{16} = 2.31$
                        MI controls
MI cases            Diabetes   No diabetes
Diabetes                9          37
No diabetes            16          82

OR estimate comes only from discordant pairs!
OR = 37/16 = 2.31
Makes sense!
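To see why the concordant pairs drop out, the Mantel-Haenszel sums can be accumulated pair type by pair type. A minimal sketch, assuming the cell labels a = exposed case, b = exposed control, c = unexposed case, d = unexposed control, and the pair counts 9, 37, 16 and 82 from the slides:

```python
# Each matched pair is one stratum with a 2x2 table (a, b, c, d), T = 2.
pair_types = [
    # (a, b, c, d), number of pairs of this type
    ((1, 1, 0, 0), 9),    # both diabetic
    ((1, 0, 0, 1), 37),   # only the case is diabetic
    ((0, 1, 1, 0), 16),   # only the control is diabetic
    ((0, 0, 1, 1), 82),   # neither is diabetic
]

numerator = 0.0    # running sum of a*d/T over all pairs
denominator = 0.0  # running sum of b*c/T over all pairs
for (a, b, c, d), count in pair_types:
    total = a + b + c + d
    numerator += count * a * d / total
    denominator += count * b * c / total

mh_odds_ratio = numerator / denominator
print(round(mh_odds_ratio, 2))
```

Only the two discordant pair types add anything to either sum, so the ratio collapses to 37/16.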
McNemar’s Test
OR estimate comes only from discordant pairs! The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control? If more discordant pairs “favor” the case, this indicates OR>1.
P(“favors” case | discordant pair) = $\frac{37}{37+16} = \frac{37}{53} = 0.70$
odds(“favors” case | discordant pair) = $\frac{37}{16} = 2.31$
McNemar’s Test
Null hypothesis: P(“favors” case | discordant pair) = .5 (note: equivalent to OR=1.0 or cell b = cell c)
By normal approximation to binomial:
$Z = \frac{37 - 53(.5)}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88$; p<.01
McNemar’s Test: generally

               controls
cases        exp   No exp
exp           a      b
No exp        c      d

By normal approximation to binomial:
$Z = \frac{b - (b+c)(.5)}{\sqrt{(b+c)(.5)(.5)}}$
Equivalently:
$\chi^2_1 = \frac{(b-c)^2}{b+c}$
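The two forms of the test can be checked against each other numerically. This is an illustrative sketch, not a library routine: the function name `mcnemar` is made up here, and it takes only the two discordant counts (b = pairs where only the case is exposed, c = pairs where only the control is exposed).

```python
import math

def mcnemar(b, c):
    """Return (Z, chi-square, two-sided p) from the discordant-pair counts."""
    n = b + c
    # Normal approximation to Binomial(n, 0.5) for the count b.
    z = (b - n * 0.5) / math.sqrt(n * 0.5 * 0.5)
    # Equivalent 1-df chi-square form.
    chi2 = (b - c) ** 2 / n
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi2, p_two_sided

# Diabetes and MI example: 37 pairs favor the case, 16 favor the control.
z, chi2, p = mcnemar(37, 16)
print(round(z, 2), round(chi2, 2), p < 0.01)
```

Squaring Z gives the chi-square statistic exactly, which is why the slides call the two forms equivalent.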
Example: Salmonella Outbreak in France, 1996
From: “Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study,” BMJ 312: ; Jan
Epidemic Curve
Matched Case Control Study
Case = Salmonella gastroenteritis.
Community controls (1:1) matched for: age group ( = 65 years), gender, and city of residence.
Results
In 2x2 table form: any goat’s cheese
Cases (rows) by controls (columns), each classified as Goat’s cheese or None. 2930
In 2x2 table form: Brand B goat’s cheese
Cases (rows) by controls (columns), each classified as Goat’s cheese B or None. 1049
Introduction to Logistic Regression: binary outcome!
Example: The Bernoulli (binomial) distribution
[Figure: lung cancer (yes/no) plotted against smoking (cigarettes/day)]
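Could model probability of lung cancer:
$\pi = \alpha + \beta_1 X$
where $\pi$ is the probability of lung cancer (between 0 and 1) and X is smoking (cigarettes/day).
But why might this not be best modeled as linear? [A straight line can predict probabilities below 0 or above 1.]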
Alternatively…
$\log\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 X$
Logit function
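The Logit Model
$\ln\left(\frac{\pi_i}{1-\pi_i}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \dots$
Left side: logit function (log odds), where $\pi_i$ is the probability of disease for individual i.
$\alpha$: baseline (log) odds.
$\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 \dots$: linear function of risk factors for individual i.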
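A small sketch of why the logit scale helps: the linear predictor can take any real value, yet converting it back always yields a probability between 0 and 1. The coefficients below are hypothetical, chosen only for illustration, and the helper names `logit` and `inverse_logit` are not from the slides.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Map any real-valued log odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# HYPOTHETICAL intercept and slope (log odds per cigarette per day).
alpha, beta_1 = -4.0, 0.1

probabilities = []
for cigarettes in (0, 20, 40, 80):
    log_odds = alpha + beta_1 * cigarettes  # linear on the logit scale
    probability = inverse_logit(log_odds)   # always strictly between 0 and 1
    probabilities.append(probability)
    print(cigarettes, round(log_odds, 1), round(probability, 3))
```

A straight-line model for the probability itself with the same kind of slope would eventually cross 1, whereas the fitted probabilities here only approach it.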
To get back to OR’s…
“Adjusted” Odds Ratio Interpretation
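One way to write out the step from the logit model to an adjusted odds ratio, as a sketch for a model with a binary exposure $x_1$ (1 = exposed, 0 = unexposed) and one other covariate $x_2$ held fixed; the symbols $\alpha$, $\beta_1$, $\beta_2$ follow the usual logit-model notation and are assumed here:

```latex
\begin{align*}
\ln\bigl(\text{odds}(x_1 = 1, x_2)\bigr) &= \alpha + \beta_1 + \beta_2 x_2 \\
\ln\bigl(\text{odds}(x_1 = 0, x_2)\bigr) &= \alpha + \beta_2 x_2 \\
\ln(OR) &= (\alpha + \beta_1 + \beta_2 x_2) - (\alpha + \beta_2 x_2) = \beta_1 \\
OR &= e^{\beta_1}
\end{align*}
```

Because $\alpha$ and $\beta_2 x_2$ cancel, the odds ratio for the exposure is the same at every value of $x_2$, which is what "adjusted for $x_2$" means in this model.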
Adjusted odds ratio, continuous predictor
Practical Interpretation
The odds of disease increase multiplicatively by $e^{\beta}$ for every one-unit increase in the exposure, controlling for other variables in the model.
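The multiplicative interpretation can be checked numerically. In this sketch the intercept and slope are hypothetical values picked for illustration; the point is only that the ratio of odds depends on the size of the increase, not on the starting level.

```python
import math

# HYPOTHETICAL coefficients for a single continuous exposure x.
alpha, beta = -3.0, 0.05

def odds(x):
    """Odds of disease implied by the logit model at exposure level x."""
    return math.exp(alpha + beta * x)

one_unit_ratio = odds(11) / odds(10)        # one-unit increase starting at 10
one_unit_ratio_high = odds(51) / odds(50)   # one-unit increase starting at 50
ten_unit_ratio = odds(20) / odds(10)        # ten-unit increase

print(round(one_unit_ratio, 4), round(math.exp(beta), 4))
print(round(ten_unit_ratio, 4), round(math.exp(10 * beta), 4))
```

For a k-unit increase the odds are multiplied by $e^{k\beta}$, so a ten-unit odds ratio is the one-unit odds ratio raised to the tenth power.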
Example: >2 exposure levels *(dummy coding)

CHD status   White   Black   Hispanic   Other
Present
Absent        20      10
SAS CODE
data race;
  input chd race_2 race_3 race_4 number;
  datalines;
;
run;
proc logistic data=race descending;
  weight number;
  model chd = race_2 race_3 race_4;
run;
Note the use of “dummy variables.” “Baseline” category is white here.
SAS OUTPUT – model fit

Criterion   Intercept Only   Intercept and Covariates
AIC
SC
-2 Log L

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio
Score
Wald
SAS OUTPUT – regression coefficients
Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept
race_2
race_3
race_4
SAS output – OR estimates
The LOGISTIC Procedure
Odds Ratio Estimates

Effect   Point Estimate   95% Wald Confidence Limits
race_2
race_3
race_4

Interpretation:
Odds of CHD are 8 times as high for Black vs. White
Odds of CHD are 6 times as high for Hispanic vs. White
Odds of CHD are 4 times as high for Other vs. White
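The link between dummy coding and these odds ratios can be sketched without fitting software, because a model with a single categorical predictor is saturated: each coefficient equals the log of that group's odds relative to the baseline group. The counts below are HYPOTHETICAL, picked only so that the ratios come out to 8, 6 and 4; they are not the data from the slides. The mapping of race_2, race_3, race_4 to Black, Hispanic, Other is likewise an assumption based on the order of the interpretation lines.

```python
import math

# HYPOTHETICAL counts per group: (CHD present, CHD absent).
counts = {
    "white": (5, 20),
    "black": (20, 10),
    "hispanic": (15, 10),
    "other": (10, 10),
}
reference = "white"  # baseline category: all dummy variables equal 0

def dummy_code(race):
    """Return (race_2, race_3, race_4) indicator values for one group."""
    return tuple(int(race == level) for level in ("black", "hispanic", "other"))

def group_odds(present, absent):
    return present / absent

# Intercept = log odds in the baseline group.
intercept = math.log(group_odds(*counts[reference]))
# Each beta = log odds in that group minus log odds in the baseline group.
betas = {
    race: math.log(group_odds(*counts[race])) - intercept
    for race in counts
    if race != reference
}
odds_ratios = {race: math.exp(beta) for race, beta in betas.items()}

print(dummy_code("hispanic"))
print({race: round(value, 1) for race, value in odds_ratios.items()})
```

Changing the baseline category changes every coefficient but not the fit: the odds ratio between any two groups is still the ratio of their group odds.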