Presentation on theme: "Ian White MRC Biostatistics Unit, Cambridge"— Presentation transcript:
1 Ian White MRC Biostatistics Unit, Cambridge An overview of meta-analysis in Stata Part II: multivariate meta-analysisIan White MRC Biostatistics Unit, CambridgeStata Users’ GroupLondon, 10th September 2010Abstract: An overview of meta-analysis in StataA comprehensive range of user-written commands for meta-analysis is available in Stata, and documented in detail in the recent book Meta-Analysis in Stata (Stata Press 2009).The purpose of this session will be to describe these commands, with a focus on recent developments and areas in which further work is needed. We will define systematic reviews and meta-analyses, and introduce the metan command, which is the main Stata meta-analysis command. We will distinguish between meta-analyses of randomized controlled trials and observational studies, and discuss the additional complexities inherent in systematic reviews of the latter.Meta-analyses are often complicated by heterogeneity – variation between the results of different studies beyond that expected due to sampling variation alone. Meta-regression – implemented in the metareg command – can be used to explore reasons for heterogeneity, although its utility in medical research is limited by the modest numbers of studies typically included in meta-analyses, and the many possible reasons for heterogeneity. Heterogeneity is a striking feature of meta-analyses of diagnostic test accuracy studies. We will describe how to use the midas and metandi command to display and meta-analyse the results of such studies.Many meta-analysis problems involve combining estimates of more than one quantity: for example, treatment effects on different outcomes, or contrasts between more than two groups. Such problems can be tackled using multivariate meta-analysis, implemented in the mvmeta command. We will describe how the model is fitted, when it may be superior to a set of univariate meta-analyses, and illustrate its application in a variety of settings.
2 Plan Example 1: Berkey data Multivariate random-effects meta-analysis modelSituations where it could be usedSoftware: mvmetaA problem: unknown within-study correlationExample 2: fibrinogensoftware: mvmeta_makeMultivariate vs. univariate
3 Example from Berkey et al (1998) 5 trials comparing a surgical with a non-surgical procedure for treating periodontal disease2 outcomes:“probing depth” (PD)“attachment level” (AL)trialy1s1y2s2corr10.470.09-0.320.3920.200.08-0.600.030.4230.400.05-0.120.040.4140.26-0.310.4350.560.12-0.390.170.34y1,y2 - treatment effects for PD, AL; s1,s2 - standard errors
4 Berkey data (1) Could analyse the outcomes one by one Random effects weightRandom effects weightStudy IDStudy ID117.8217.82119.7119.71219.8419.84222.0522.05325.6425.64321.8321.83424.0824.08421.7921.79512.6212.62514.6114.61Overall100.00100.00Overall100.00100.00I2 = 68.8%, p = 0.012I2 = 96.4%, p = 0.000.51-1-.5Mean improvement in probing depthMean improvement in attachment level
5 Berkey data (2) Dots mark the point estimates for the 5 studies -.8-.6-.4-.2Mean improvement in attachment level.22.214.171.124Mean improvement in probing depthDots mark the point estimates for the 5 studiesBubbles show 50% confidence regionsNote the positive within-study correlation ( for all studies)bubble.ado, available on my website34512
6 One or two stages?I’m assuming a two-stage meta-analysis (as in the Berkey data):1st stage: compute results for each study2nd stage: use these results as “data”makes a Normal approximation to the within-study log-likelihoodsOne-stage meta-analysis is possible if we have individual participant data (IPD), but can be computationally horrible (Smith et al 2005)we’ll use the two-stage method even with IPD
7 Bivariate meta-analysis: data Data from ith study:yi1, yi2 – estimates for 1st, 2nd outcomessi1, si2 – their standard errorsbut we also need the correlation rWi of yi1 and yi2It’s often most convenient to use matrix notation:estimatewith within-study varianceNB yi1 or yi2 can be missing.
8 Bivariate meta-analysis: the model Data from ith study:yi – vector of estimatesSi – variance-covariance matrixModel is yi ~ N(m, Si+S)Total variance = within between variance:knownto be estimated
9 Bivariate meta-analysis: 2 correlations Within-study correlation rWione per studyshould be known from 1st stage of meta-analysisbut often unknown: discussed laterBetween-study correlation rBoverall parameterto be estimated
10 Multivariate meta-analysis: the model Data from ith study:yi – vector of estimates (p-dimensional)Si – variance-covariance matrix (pxp)Model is again yi ~ N(m, Si+S)Can also extend to meta-regression: e.g. yi ~ N(bxi, Si+S)xi is a q–dimensional vector of explanatory variablesb is a pxq matrix containing the regression coefficients for each of the p outcomesmore generally, can allow different x’s for different outcomes
11 When could multivariate meta-analysis be used? (1) Original applications: meta-analysis of randomised controlled trials (RCTs)several outcomes of interestsome trials report more than one outcome“data” are treatment effects on each outcome in each study (some may be missing)data are correlated within studies because outcomes are correlatedalso used in health economics for cost and effect (Pinto et al, 2005)
12 When could multivariate meta-analysis be used? (2) Meta-analysis of diagnostic accuracy studies“data” are sensitivity and specificity in each studydata are uncorrelated within studies because they refer to different subgroupsstill likely to be correlated between studiesSee Roger’s talksparse data often invalidates Normal approximationbest to use metandi
13 When could multivariate meta-analysis be used? (3) Meta-analysis of RCTs comparing more than two treatments“data” are treatment effects for each treatment compared to same controldata are correlated within studies because they use same control groupSimilarly multiple treatments meta-analysismy current area of research
14 When could multivariate meta-analysis be used? (4) Meta-analysis of observational studies exploring shape of exposure-disease relationshipif exposure is categorised, “data” could be contrasts between categoriesif fractional polynomial model is used, “data” would be coefficients of different model terms
15 Stata software for multivariate random-effects meta-analysis Can almost use xtmixedbut you need to constrain the level 1 (co)variancesnot possible in xtmixedSo I wrote mvmeta (White, 2009)
16 My program: mvmetaAnalyses a data set containing point estimates with their (within-study) variances and covariancesUtility mvmeta_make creates a data set in the correct format (demo later)Fits random-effects modeluses ml to maximise the (restricted) likelihood using numerical derivativesbetween-studies variance-covariance matrix is parameterised via its Cholesky decompositionCIs are based on Normal distributionalso offers method of moments estimation (Jackson et al, 2009)
17 Data format for mvmeta: Berkey data trialy1y2V11V22V1210.47-0.320.00750.00770.00320.2-0.60.00570.00080.000930.4-0.120.00210.00140.000740.26-0.310.00290.001550.56-0.390.01480.03040.0072y1, y2 treatment effects for PD, ALV11, V22 squared standard errors (si12, si22)V12 covariance (rWisi1si2)
18 Running mvmeta: Berkey data . mvmeta y VNote: using method remlNote: using variables y1 y2Note: 5 observations on 2 variables[5 iterations]Number of obs =Wald chi2(2) =Log likelihood = Prob > chi2 =| Coef. Std. Err z P>|z| [95% Conf. Interval]Overall_mean |y1 |y2 |Estimated between-studies SDs and correlation matrix:SD y y2yy
19 Running mvmeta: method of moments . mvmeta y V, mmNote: using method mm (truncated)Note: using variables y1 y2Note: 5 observations on 2 variablesMultivariate meta-analysisMethod = mm Number of dimensions = 2Number of observations = 5| Coef. Std. Err z P>|z| [95% Conf. Interval]y1 |y2 |Estimated between-studies SDs and correlation matrix:SD y y2yy
20 Running mvmeta: I2I2 measures the impact of heterogeneity (Higgins & Thompson, 2002). mvmeta1 y V, i2[output omitted]I-squared statistics:Variable I-squared [95% Conf. Interval]y % % %y % % %(computed from estimated between and typical within variances)Requires updated mvmeta1
21 Running mvmeta: meta-regression . mvmeta1 y V publication_year, reml dof(n-2)Note: using method remlNote: using variables y1 y2Note: 5 observations on 2 variablesVariance-covariance matrix: unstructured[4 iterations]Multivariate meta-analysisMethod = reml Number of dimensions = 2Restricted log likelihood = Number of observations = 5Degrees of freedom = 3| Coef. Std. Err z P>|z| [95% Conf. Interval]y |publicatio~r |_cons |y |publicatio~r |_cons |
22 mvmeta: programmingBasic parameters: Cholesky decomposition of the between-studies variance SEliminate fixed parameters from (restricted) likelihoodMaximise using ml, method d0 (can’t use lf for REML)Likelihood now coded in MataStata creates matrices yi , Si for each study & sends them to Mata
23 Estimating the within-study correlation ρwi Sometimes known to be 0e.g. in diagnostic test studies where sens and spec are estimated on different subgroupsEstimation usually requires IPDeven then, not always trivial: e.g. for 2 outcomes in RCTs, can fit seemingly unrelated regressions, or observe ρwi = correlation of the outcomesPublished literature never (?) reports ρwinot the objective of the original studydifficult to estimate from summary dataWhat do we do in a published literature meta-analysis if ρwi values are missing?23
24 Unknown ρwi: possible solutions Ignore within-study correlation (set ρwi = 0)not advisable (Riley, 2009)Sensitivity analysis using a range of valuescan be time-consuming & confusingUse external evidence (e.g. IPD on one study)Bayesian approach (Nam et al., 2004)e.g. ρwi ~U(0,1)Some special cases where it can be done% survival at multiple time-pointsnested binary outcomes?Use an alternative model that models the ‘overall’ correlation (Riley et al., 2008)2424
25 Alternative bivariate model Standard model with overall rB and one rWi per study:Alternative model with one ‘overall’ correlation r:mvmeta1: corr(riley) option25
26 Example: Fibrinogen Fibrinogen Studies Collaboration (2005) assembled IPD from 31 observational studiesparticipantsto explore the association between fibrinogen levels (measured in blood) and coronary heart diseaseWe focus on exploring the shape of the association using grouped fibrinogenData (IPD):Variable fg contains fibrinogen in 5 groupsStudies are identified by variable cohortTime to CHD has been stsetIn each cohort, I want to run the Cox model xi: stcox age i.fg, strata(sex tr)
27 1st stage of meta-analysis: mvmeta_make Getting IPD into the right format can be the hardest bitI wrote mvmeta_make to do thisIt assumes the 1st stage of meta-analysis involves fitting a regression model
28 Fibrinogen data: using mvmeta_make Stata command within each study:xi: stcox age i.fg, strata(sex tr)Create meta-analysis data set:xi: mvmeta_make stcox age i.fg, strata(sex tr) by(cohort) usevars(i.fg) name(b V) saving(FSC2)Creates file FSC2.dta containingcoefficients: b_Ifg_2, b_Ifg_3, b_Ifg_4, b_Ifg_5variances and covariances: V_Ifg_2_Ifg_2, V_Ifg_2_Ifg_3 etc.We then run mvmeta b V on file FSC2.dta.
29 A problem: perfect prediction . tab fg allchd if cohort=="KORA_S3"Fibrinogen | Any CHD event?groups | | Total1 | |2 | |3 | |4 | |5 | |Total | 3, | 3,134No events in the reference categoryFit Cox model: HR for 2 vs 1 is (se 0.91) – wrong
30 mvmeta_make: handling perfect prediction Recall:no events in fg=1 (reference) groupstcox’s “fix” can yield large hazard ratios with small standard errors – and disaster for mvmeta!mvmeta_make implements a different “fix” in any study with perfect prediction:add a few observations, with very small weight, that “break” the perfect predictionall contrasts with fg=1 are large with large s.e.all other contrasts (e.g. fg=3 vs. fg=2) are correctWorks fine for likelihood-based procedures (REML, ML, fixed-effect model) but not for method of moments
31 FSC: partial results of mvmeta_make . l c b* V_Ifg_2_Ifg_2 V_Ifg_3_Ifg_3 , clean noocohort b_Ifg_2 b_Ifg_3 b_Ifg_4 b_Ifg_5 V_Ifg_~2 ~3_Ifg_3ARICBRUNCAERCHSCOPENEASFINRISKIFRAMGOTOGOTOGRIPSHONOLKIHDKORA_SKORA_SMALMO...Study with no events in fg=1 group: “perfect prediction”
32 FSC: results of mvmeta . mvmeta b V Number of obs = 31 Wald chi2(4) =Log likelihood = Prob > chi2 =| Coef. Std. Err. z P>|z| [95% Conf. Int.]Overall_mean |b_Ifg_2 |b_Ifg_3 |b_Ifg_4 |b_Ifg_5 |Estimated between-studies variance matrix Sigma:b_Ifg_2 b_Ifg_3 b_Ifg_4 b_Ifg_5b_Ifg_b_Ifg_b_Ifg_b_Ifg_
33 FSC: graphical results Other choices of reference category give the same results.
34 Example 2: borrowing strength y2>0 ⇒ y1 missingy2<0 ⇒ y1 observedPooling the observed y1 can’t be a good way to estimate m1Bivariate model helps:assumes a linear regression of m1 on m2assumes data are missing at randomBivariate model can avoid bias & increase precision (“Borrowing strength”)StudyLog hazard ratio (mutant vs. normal p53 gene)Disease-free survivalOverall survivaly1s1y2s21-0.580.56-0.1820.790.2430.210.664-1.020.39-0.630.2951.010.486-0.690.40-0.64
35 Multivariate vs. univariate meta-analysis Advantages:“borrowing strength”avoiding bias from selective outcome reportingJoint confidence / prediction intervalsFunctions of estimatesLongitudinal dataCoherenceDisadvantages:more computationally complexboundary solutions for rBunknown within-study correlationsmore assumptions
36 Getting mvmeta mvmeta is in the SJ Current update mvmeta1 is available on my website (includes meta-regression, I2, structured S, speed & other improvements)net frombubble is also available
37 ReferencesBerkey CS et al. Meta-analysis of multiple outcomes by regression with random effects. Statistics in Medicine 1998;17:2537–2550.Fibrinogen Studies Collaboration. Plasma fibrinogen and the risk of major cardiovascular diseases and non-vascular mortality. JAMA 2005; 294: 1799–1809.Higgins J, Thompson S. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 2002;21:1539–58.Jackson D, White I, Thompson S. Extending DerSimonian and Laird’s methodology to perform multivariate random effects meta-analyses. Statistics in Medicine 2009;28:Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 1997; 53: 983–997.Nam IS, Mengersen K, Garthwaite P. Multivariate meta-analysis. Statistics in Medicine 2003; 22: 2309–2333.Pinto E, Willan A, O’Brien B. Cost-effectiveness analysis for multinational clinical trials. Statistics in Medicine 2005;24:1965–82.Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. JRSSA 2009;172:Riley RD, Thompson JR, Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics 2008; 9:Smith CT, Williamson PR, Marson AG. Investigating heterogeneity in an individual patient data meta-analysis of time to event outcomes. Statistics In Medicine 2005;24:1307–1319.White IR. Multivariate random-effects meta-analysis. Stata Journal 2009;9:40–56.