Lecture 13: Case-control studies: introduction to matching

1 Lecture 13: Case-control studies: introduction to matching
Jeffrey E. Korte, PhD
BMTRY 747: Foundations of Epidemiology II
Department of Public Health Sciences, Medical University of South Carolina
Spring 2015

2 Matching: overview
Matching controls confounding at the design stage.
Residual confounding may still occur, either with the matched variables or with other covariates.
Matched analysis: in a case-control study, pairs of subjects with the same exposure status are non-informative, so only the discordant pairs matter.

3 Matching (definition)
Matching refers to the selection of a reference series (unexposed subjects in a cohort study or controls in a case-control study) that is identical, or nearly so, to the index series with respect to the distribution of one or more potentially confounding factors. (Rothman and Greenland, Modern Epidemiology, 1998)

4 Matching
Matching is most commonly done in case-control studies: cases (usually those with the disease of interest) are matched with controls (those without the disease) on the value of a chosen variable.
It can also be employed in other designs: exposed members of a cohort are matched with unexposed members and followed longitudinally.

5 Types of Matching
Individual matching: selection of one or more reference subjects with matching-factor values equal to those of the index subject.
Frequency matching: selection of an entire stratum of reference subjects with matching-factor values equal to those of a stratum of index subjects.

6 Advantages of Matching
Controls for factors that are unknown or unmeasurable: matching siblings, for example, can control for many genetic and environmental factors.
Can improve precision.
Convenience: narrows down a large number of potential controls.
Can reduce sampling variability: less heterogeneity across paired subjects.
Can increase statistical power.
Can improve efficiency by reducing the sample size needed for a study.

7 Disadvantages of Matching
Lose the ability to assess the matched variable as a risk factor.
Additional time and expense.
The decision is irreversible.
Requires special analytic techniques.
If the matching variable is not an independent risk factor for the outcome, matching is wasteful.
If the matching variable is not an independent risk factor for the outcome, but is associated with the risk factor under study, matching is wasteful and inefficient.
Potentially alters the distribution of the risk factor under study.

8 Elements of Matching
The unit of analysis is the pair: two components with something in common (twins, siblings, neighborhood controls).
The two components are selected to have different exposures (RCT or cohort study) or different outcomes (case-control study).

9 Elements of Matching
Matched pairs can be used in different study designs: cross-sectional, cohort, case-control, and randomized clinical trial.

10 Elements of Matching
Same-subject matching (within subject): the data pairs can be measured on the same subject.
Example: compare skin-grafting techniques by applying each treatment to the same person.
Example: compare intra-ocular pressure in both eyes of one subject, after giving each eye a different treatment.
Notice that the data structure may bear a superficial resemblance to multi-level analysis: both analytic strategies take "groups" into account.

11 Overmatching
Overmatching occurs when the factor to be matched is (partially or wholly) on the causal pathway between the risk factor and the outcome.
Example study goal: assess the association between alcohol consumption and heart disease (case-control study).
Matching variable: sleep apnea (to control for conditions that may influence oxygenation).
If sleep apnea occurs as a result of alcohol consumption and is itself a risk factor for heart disease, then matching:
forces cases and controls to have a more similar distribution of alcohol consumption;
accordingly, attenuates the odds ratio.
The true association between alcohol consumption and heart disease cannot then be assessed.

12 Example 1: Number of Sick Days
Twin study (children): 10 pairs of twins.
One twin is immunized; the other twin is not.
Subjects are followed through the school year.
This is a matched study: each pair of twins is a matched set for the clinical trial.

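13 Example 1: Number of Sick Days
[Table: one row per twin pair (pairs 1 to 10), with columns Pair, Control, Treated, and Control-Treated, in sick days. The individual values are not recoverable from this transcript.]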

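14 Example 1: Number of Sick Days
Mean of column 3 (Control-Treated): $\bar{d} = 1.5$ days
Standard deviation: $\sqrt{\sum_{i=1}^{n} (d_i - \bar{d})^2 / (n - 1)} = 1.78$ days, where $d_i$ is the difference for pair $i$ and $n = 10$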

15 Example 1: Number of Sick Days
Standard error: $SE = SD / \sqrt{n} = 1.78 / \sqrt{10} = 0.56$ days
95% confidence interval: $1.5 \pm 1.96 \times 0.56$, that is, 1.5 (0.40, 2.60) more sick days in the control group
Paired t-test: p = 0.026
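As a rough check of the arithmetic on slides 14 and 15, the sketch below recomputes the standard error, the normal-approximation interval, and the paired t statistic from the reported summary values only (mean difference 1.5, SD 1.78, 10 pairs). The function name `paired_summary` is ours, chosen for illustration.

```python
import math

def paired_summary(mean_diff, sd_diff, n_pairs, z=1.96):
    """Return (SE, CI lower, CI upper, t statistic) for a mean paired difference."""
    se = sd_diff / math.sqrt(n_pairs)   # standard error of the mean difference
    lower = mean_diff - z * se
    upper = mean_diff + z * se
    t_stat = mean_diff / se             # refer to a t distribution with n_pairs - 1 df
    return se, lower, upper, t_stat

se, lower, upper, t_stat = paired_summary(1.5, 1.78, 10)
print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), t = {t_stat:.2f} on 9 df")
```

The slide uses the normal multiplier 1.96. With only 10 pairs, a t multiplier (about 2.26 for 9 degrees of freedom) would give a somewhat wider interval.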

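16 Example 1: Unmatched analysis?
Control mean = 4.6 days
Treated mean = 3.1 days
Difference between means: estimated benefit = 1.5 days, 95% CI (-0.47, 3.47)
Two-sample t-test: p = 0.145
[Table: sick days listed by group, with columns Control and Treated. The individual values are not recoverable from this transcript.]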

17 Example 1: Number of Sick Days
Conclusion: a study based on matched pairs can be more powerful than an unmatched study.
The matched analysis gives tighter confidence limits, which makes it easier to show statistical significance.
In this case, the unmatched analysis did not result in bias (we still estimated 1.5 fewer sick days in the treated group).

18 Statistical Methods for Analyzing Matched Data
Unmatched study: one entry for each subject

                     Exposure
                     E         not E
Outcome   D          a         b
          not D      c         d

19 Statistical Methods for Analyzing Matched Data
Matched cohort study: one entry for each pair

                       Exposed
                       D         not D
Unexposed   D          a         b
            not D      c         d

20 Statistical Methods for Analyzing Matched Data
Matched case-control study: one entry for each pair

                  Control
                  E         not E
Case   E          a         b
       not E      c         d

21 Statistical Methods for Analyzing Matched Data
Matched case-control study:
Matched pair both exposed: add 1 to a
Case exposed; control unexposed: add 1 to b
Control exposed; case unexposed: add 1 to c
Matched pair both unexposed: add 1 to d
Only discordant pairs (in cells “b” and “c”) give useful information.
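The tallying rule above can be sketched in a few lines of Python. This is an illustration only: the function name `tally_pairs`, the 0/1 coding, and the example pairs are assumptions, not part of the lecture.

```python
from collections import Counter

def tally_pairs(pairs):
    """Count matched case-control pairs into cells a, b, c, d.

    pairs: iterable of (case_exposed, control_exposed), each coded 0/1.
    """
    counts = Counter()
    for case_exposed, control_exposed in pairs:
        if case_exposed and control_exposed:
            counts["a"] += 1      # both exposed
        elif case_exposed:
            counts["b"] += 1      # case exposed, control unexposed
        elif control_exposed:
            counts["c"] += 1      # control exposed, case unexposed
        else:
            counts["d"] += 1      # both unexposed
    return {cell: counts[cell] for cell in "abcd"}

# Hypothetical pairs, for illustration only
example_pairs = [(1, 1), (1, 0), (1, 0), (0, 1), (0, 0)]
print(tally_pairs(example_pairs))
```

Each pair contributes exactly one count, so the four cells sum to the number of pairs, not the number of subjects.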

22 Statistical Methods for Analyzing Matched Data
Discordant pairs → estimate the odds ratio!
p1 = probability of exposure in cases
p0 = probability of exposure in controls
Therefore:
probability of cell “b” = p1(1-p0)
probability of cell “c” = p0(1-p1)
The ratio b/c therefore estimates p1(1-p0) / [p0(1-p1)], which is the odds ratio.

23 Statistical Methods for Analyzing Matched Data
Formula for the standard error of the log odds ratio: $SE(\ln OR) = \sqrt{1/b + 1/c}$
95% confidence limits: lnOR ± 1.96(SE(lnOR))
(the confidence limits of lnOR must then be exponentiated to obtain the 95% CI for the odds ratio)

24 Statistical Methods for Analyzing Matched Data
McNemar’s test: a chi-squared statistic for matched data, with 1 degree of freedom.
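The transcript does not show the test statistic itself. A minimal sketch, assuming the usual uncorrected form $\chi^2 = (b - c)^2 / (b + c)$ built from the discordant cells b and c of slide 21, is given below. The function name `mcnemar_test` and the example counts are hypothetical.

```python
import math

def mcnemar_test(b, c):
    """McNemar chi-squared statistic (no continuity correction) and its p-value (1 df)."""
    if b + c == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = mcnemar_test(b=30, c=10)   # hypothetical discordant-pair counts
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
```

Only b and c enter the statistic, which mirrors the point that concordant pairs carry no information about the association.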

25 Example 2: Matched Case-Control Study
Research question: Is there an association between the amount of time a mother spends on her feet during a pregnancy and the likelihood of preterm birth?
Study sample: 223 matched case-control pairs of women who had given birth at a local hospital.
Disease: 1 = preterm birth (<37 weeks gestation); 0 = no preterm birth
Exposure: 1 = mother’s work required standing; 0 = mother’s work did not require standing

26 Example 2 (continued)
Matching: each case (disease=1) was matched with a control (disease=0) on the basis of maternal age (<3 years) and parity (1, 0).
4 possible exposure combinations of matched pairs:
Both case and control are EXPOSED
Case EXPOSED and control NON-EXPOSED
Case NON-EXPOSED and control EXPOSED
Both case and control NON-EXPOSED

27 A Look at the Raw Data
[Table: one row per subject, with columns ID/Matched Pair, Preterm Birth (case/control), Age, Parity, and Work Standing (exposure). The cell values are not recoverable from this transcript.]

28 Example 2 (continued)

                        Control (Standing)   Control (Not Standing)
Case (Standing)                147                     31
Case (Not Standing)             14                     31

Note: Relevant information is confined to discordant pairs. The OR cannot be estimated in studies in which all matched pairs have the same level of exposure.

29 Example 2 (continued)
Odds ratio for a matched case-control study: the ratio of the number of positive to negative discordant pairs.
31 pairs in which the case was exposed and the control was not
14 pairs in which the control was exposed and the case was not
Odds ratio = 31/14 = 2.21

30 Example 2 (continued)
Approximate 95% confidence interval:
Take two standard errors on each side of the estimated log odds ratio.
Exponentiate the result (take the anti-logarithm).
The confidence interval ranges from 1.16 to 4.22.
Conclusion: standing is associated with preterm birth.
Note: McNemar’s chi-squared statistic provides the p-value for testing the null hypothesis of no association between exposure and disease.
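The interval quoted for Example 2 can be reproduced with a short sketch, assuming the usual large-sample standard error $\sqrt{1/b + 1/c}$ for the log of the matched odds ratio. The function name `matched_odds_ratio` and its `multiplier` parameter are illustrative choices, not part of the lecture.

```python
import math

def matched_odds_ratio(b, c, multiplier=1.96):
    """Matched-pair odds ratio b/c with a confidence interval built on the log scale."""
    odds_ratio = b / c
    se_log_or = math.sqrt(1 / b + 1 / c)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - multiplier * se_log_or)
    upper = math.exp(log_or + multiplier * se_log_or)
    return odds_ratio, lower, upper

# Example 2: 31 pairs with only the case exposed, 14 with only the control exposed
or_2se = matched_odds_ratio(31, 14, multiplier=2)       # "two standard errors"
or_196 = matched_odds_ratio(31, 14, multiplier=1.96)    # conventional 95% limits
print([round(x, 2) for x in or_2se])
print([round(x, 2) for x in or_196])
```

With a multiplier of 2 the limits round to the slide's 1.16 and 4.22. With 1.96 they are slightly narrower, roughly 1.18 to 4.16.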

31 Example 2: What happens if you ignore the matching?
Four possible combinations of matched pairs:
1. Unexposed control, unexposed case
2. Unexposed control, exposed case
3. Exposed control, unexposed case
4. Exposed control, exposed case
Unmatched table, with each entry the number of pairs of the types shown:

            Unexp     Exp
Control     1 + 2     3 + 4
Case        1 + 3     2 + 4
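The collapse from pair counts to an unmatched table can be sketched as follows. The function name `collapse_pairs` is hypothetical. The counts are those of Example 2, where the both-unexposed count of 31 is an inference: it is obtained by subtracting the other three cells from the 223 pairs.

```python
def collapse_pairs(a, b, c, d):
    """Collapse matched-pair cell counts into an unmatched 2x2 table.

    a: both exposed; b: case exposed only; c: control exposed only; d: neither exposed.
    """
    return {
        "case": {"exposed": a + b, "unexposed": c + d},
        "control": {"exposed": a + c, "unexposed": b + d},
    }

# Example 2 pair counts (d = 223 - 147 - 31 - 14 = 31, by subtraction)
table = collapse_pairs(a=147, b=31, c=14, d=31)
print(table)
```

Every pair contributes one case and one control, so each row of the collapsed table sums to the number of pairs. The pairing itself is lost in the process.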

32 Unmatched Data from the same study

            Standing   Not Standing
Preterm        178          45
Term           161          62

OR = (178 x 62) / (45 x 161) = 1.52
95% CI = 0.98, 2.36
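The unmatched figures can be checked with the cross-product odds ratio and a log-scale interval. The sketch assumes the usual large-sample (Woolf) standard error, the square root of the sum of reciprocal cell counts, which the transcript does not state explicitly. The function and parameter names are illustrative.

```python
import math

def unmatched_odds_ratio(exposed_cases, unexposed_cases,
                         exposed_controls, unexposed_controls, z=1.96):
    """Cross-product odds ratio for an unmatched 2x2 table, with a log-scale CI."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))

# Preterm (cases): 178 standing, 45 not; term (controls): 161 standing, 62 not
or_hat, lower, upper = unmatched_odds_ratio(178, 45, 161, 62)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

All four cells enter this calculation, whereas the matched estimate uses only the two discordant cells.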

33 When Matching is ignored?
There is a noticeable difference between the matched and unmatched analyses:
Matched: OR = 2.21 (1.16, 4.22)
Unmatched: OR = 1.52 (0.98, 2.36)
The unmatched analysis ignores any correlation in exposure status between the case and control in a matched pair. If this correlation is substantial, the unmatched analysis gives a biased result.

34 Should a matched analysis always be used for matched data?
If there is no evidence of a correlation within pairs, should you still proceed with a matched analysis? NOT NECESSARILY.
Matched analyses can give an unstable result if the sample size is too small.

35 Example 3: Matched Cohort Study
Research question: Is there an association between vasectomy and myocardial infarction (MI)?
Study sample: 4830 exposed-unexposed pairs of men
Matching variables: age (5-year band), current smoking status (yes/no)
Disease outcome: 1 = MI; 0 = no MI
Exposure: 1 = vasectomy; 0 = no vasectomy

36 Example 3: Matched Cohort Study Analysis
The odds ratio method used for matched case-control pairs also applies to pair-matched cross-sectional or cohort studies with binary outcomes.
Matching: each matched pair contains one exposed and one unexposed individual.
4 possible outcome combinations of matched pairs:
Unexposed has no disease / exposed has no disease
Unexposed has no disease / exposed has disease
Unexposed has disease / exposed has no disease
Unexposed has disease / exposed has disease

37 Example 3 (continued)

                     No Vasectomy (MI)   No Vasectomy (No MI)
Vasectomy (MI)               0                    20
Vasectomy (No MI)           16                 4,794

Note: Relevant information is confined to discordant pairs. The OR cannot be estimated in studies in which all matched pairs have the same disease outcome.

38 Example 3: OR computation
The odds ratio of a matched cohort study may be estimated by taking the ratio of the number of positive to negative discordant pairs.
20 pairs in which the exposed member experienced the outcome and the non-exposed member did not
16 pairs in which these outcomes were reversed
Odds ratio = 20/16 = 1.25, 95% CI (0.65, 2.41)
The confidence interval includes 1, so these data give no evidence of an association between having a vasectomy and suffering an MI.

39 Matched analysis in modeling
Use “conditional logistic regression”:
Produces a matched OR and confidence interval
Can control for confounders
Ignores pairs that are concordant on all variables
Logistic regression in unmatched data is “unconditional logistic regression”.
Either one of these can be done using PROC LOGISTIC.

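40 Conditional logistic regression in SAS
proc logistic data=one;
  strata ID;  /* matched-set identifier: requests the conditional analysis */
  model outcome(event='1') = expose cov1 cov2;  /* model P(outcome = 1) */
run;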
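To see why conditional logistic regression ignores concordant pairs, the sketch below fits the 1:1 matched model for a single binary exposure by Newton-Raphson on the conditional likelihood. It is a pure-Python illustration, not what PROC LOGISTIC does internally. The function name `fit_conditional_logit` is hypothetical, and the data are Example 2's pair counts, with the both-unexposed count of 31 obtained by subtraction from the 223 pairs.

```python
import math

def fit_conditional_logit(pairs, iterations=25):
    """Estimate the log odds ratio for 1:1 matched pairs with one binary exposure.

    pairs: iterable of (case_exposed, control_exposed), each coded 0/1.
    """
    # Only the within-pair exposure difference enters the conditional likelihood,
    # so concordant pairs (difference 0) drop out.
    diffs = [case - control for case, control in pairs if case != control]
    if len(set(diffs)) < 2:
        raise ValueError("need discordant pairs in both directions to estimate the OR")
    beta = 0.0
    for _ in range(iterations):
        probs = [1 / (1 + math.exp(-beta * d)) for d in diffs]
        score = sum(d * (1 - p) for d, p in zip(diffs, probs))
        information = sum(d * d * p * (1 - p) for d, p in zip(diffs, probs))
        beta += score / information   # Newton-Raphson update
    return beta

# Example 2: 147 both exposed, 31 case only, 14 control only, 31 neither
pairs = [(1, 1)] * 147 + [(1, 0)] * 31 + [(0, 1)] * 14 + [(0, 0)] * 31
beta_hat = fit_conditional_logit(pairs)
print(f"OR = {math.exp(beta_hat):.2f}")
```

With no other covariates, the fitted odds ratio coincides with the discordant-pair ratio 31/14. Covariates such as `cov1` and `cov2` would need a multi-parameter version of the same likelihood.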

41 Summary: reasons to match
Control for confounding in the design phase: suited to nuisance variables, not to important predictors you hope to assess.
Improve study efficiency: useful if sample size is limited.
May clarify or simplify decision-making about control recruitment.

42 Summary: problems with matching
Risk of over-matching, or unnecessary matching.
May add a layer of complexity and difficulty to the study implementation: it may be difficult or impossible to find a match for some individuals, which may therefore add expense.
Once matching is done, it cannot be undone.
Must (usually) use a matched analysis.

