University of Wisconsin School of Medicine and Public Health
An Introduction to Bayesian GLM Methods for Cost-Effectiveness Analysis of Primary Data
Dave Vanness, Ph.D.
Introduction
In cost-effectiveness analysis (CEA) conducted alongside clinical trials, it is common simply to calculate the Incremental Cost-Effectiveness Ratio (ICER) or Net Benefit (NB) directly from the average costs and outcomes observed in each treatment group. In classical ("frequentist") inference, we would then construct confidence intervals for the ICER or NB. "95% confidence" does NOT express the probability that an unknown parameter (e.g., the "true" NB) lies within a specific confidence interval. Rather, it means that the procedure used to calculate such intervals would capture the true parameter value about 95% of the time if the experiment were repeated over and over.
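The repeated-sampling meaning of "95% confidence" can be checked by simulation. The Python sketch below uses a deliberately simple toy setup, not a CEA: a normal mean with a known standard deviation, with the true mean, standard deviation and sample size chosen arbitrarily for illustration. It runs many hypothetical experiments, builds a 95% interval from each, and counts how often the interval covers the fixed true value.

```python
import math
import random

def ci_coverage(true_mean=1.0, sd=2.0, n=50, n_experiments=10_000, seed=1):
    """Fraction of 95% z-intervals (known sd) that contain true_mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sd / math.sqrt(n)  # half-width of a 95% z-interval
    hits = 0
    for _ in range(n_experiments):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_experiments

coverage = ci_coverage()
print(f"Empirical coverage: {coverage:.3f}")  # close to 0.95
```

Any single interval either contains the true value or it does not; the 95% describes the long-run behavior of the procedure.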
Is Inference Irrelevant in CEA?
“If the objective is to maximise health gain for a given budget, then programmes should be selected based on the posterior mean net benefit irrespective of whether any differences are regarded as statistically significant or fall outside a Bayesian range of equivalence. This is because one of the mutually exclusive alternatives must be chosen and this decision cannot be deferred. The opportunity costs of failing to make the correct decision based on the mean are symmetrical and the historical accident that dictates which of the alternatives is regarded as current practice is irrelevant.”
Karl Claxton (1999), pp. 347-348
Overview/Learning Objectives
- Introduction to Bayesian Analysis
  - Probability as Uncertainty
  - Bayes’ Rule
  - Markov Chain Monte Carlo
- Analytical Example
  - Hypothetical CEA alongside a trial (simulated dataset)
  - Generalized Linear Models (GLM) of:
    - Cure and Adverse Events: Bernoulli with Probit Link
    - Cost: Gamma with Log Link
    - QALY: Beta with Logit Link
- Decision-Theoretic Analysis of Results
  - Goal: the probability that a treatment is cost-effective
  - Posterior ICER, expected net benefit and acceptability
Introduction to Bayesian Analysis
Probability as Uncertainty
- 1654: Pascal and Fermat (the “Points Problem”): how to divide the winnings when a sequence of games is interrupted.
- stochastic: 1662, "pertaining to conjecture," from Gk. stokhastikos "able to guess, conjecturing," from stokhazesthai "guess," from stokhos "a guess, aim, target, mark," lit. "pointed stick set up for archers to shoot at." [http://www.etymonline.com/index.php?search=stochastic&searchmode=none]
- Rev. Thomas Bayes (1701?-1761): “An Essay towards solving a Problem in the Doctrine of Chances,” Philos. Trans. R. Soc. London, 1763, 53: 370-418.
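Bayes’ Rule
Posterior = Likelihood × Prior ÷ Normalizing Constant:
$$P(\theta \mid Y) = \frac{L(Y \mid \theta)\, P(\theta)}{P(Y)}$$
Because P(Y) does not depend on θ, the posterior is proportional to the likelihood times the prior:
$$P(\theta \mid Y) \propto L(Y \mid \theta)\, P(\theta)$$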
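A grid approximation makes the roles of the four terms concrete. In this Python sketch (the function name `grid_posterior` and the 1,000-point grid are illustrative choices, not part of the deck), the prior is multiplied by the likelihood at each grid point and the product is divided by a numerical estimate of the normalizing constant P(Y). With a uniform prior and a single success it reproduces the hand calculation in the slides that follow: P(Y = 1) = 1/2 and a posterior of 2θ.

```python
def grid_posterior(prior, likelihood, n_grid=1000):
    """Approximate P(theta | Y) on a midpoint grid over [0, 1] via Bayes' Rule."""
    width = 1.0 / n_grid
    thetas = [(k + 0.5) * width for k in range(n_grid)]
    unnormalized = [likelihood(t) * prior(t) for t in thetas]  # likelihood x prior
    normalizing_constant = sum(unnormalized) * width            # P(Y): integral over theta
    posterior = [u / normalizing_constant for u in unnormalized]
    return thetas, posterior, normalizing_constant

# Uniform prior and one observed success (Y = 1), so L(Y = 1 | theta) = theta.
thetas, posterior, p_y = grid_posterior(prior=lambda t: 1.0, likelihood=lambda t: t)
print(round(p_y, 4))  # P(Y = 1) = 1/2
```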
A Simple Bayesian Hierarchical Model
Let Yi = 1 if a treatment is successful for individual i, and Yi = 0 otherwise.
Yi ~ Bernoulli(θ), where θ = P(Yi = 1) for all i.
All individuals are “exchangeable” members of the population with an unknown probability of success, θ. As Bayesians, we represent uncertainty about θ with a probability distribution:
- θ ~ P(θ) (prior: before observing data)
- θ ~ P(θ|Y) (posterior: after observing data)
Pr(Yi = 1) = E[Yi] = θ
[Figure: prior beliefs P(θ) for θ between 0 and 1 (in this example, the uniform prior P(θ) = 1). Any prior must integrate to 1.]
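Suppose we observe one individual…
Suppose Y1 = 1. What is P(Y2 = 1 | θ, Y1 = 1)? Conditional on θ, it’s still θ. But do we now have the same beliefs about θ that we had before? No! We apply Bayes’ Rule to update our prior beliefs.
The Likelihood Function
$$L(Y_1 = 1 \mid \theta) = \theta, \qquad 0 \le \theta \le 1$$
(For a failure, the likelihood would instead be L(Y1 = 0 | θ) = 1 - θ.)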
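Applying Bayes’ Rule
Multiply the prior by the likelihood:
$$P(\theta) \times L(Y_1 = 1 \mid \theta) = 1 \times \theta = \theta$$
The result is proportional to P(θ|Y1 = 1), but it doesn’t integrate to 1:
$$\int_0^1 \theta \, d\theta = \tfrac{1}{2}$$
The fix is just a normalizing constant, P(Y):
$$P(Y_1 = 1) = \int_0^1 L(Y_1 = 1 \mid \theta)\, P(\theta)\, d\theta = \tfrac{1}{2}$$
$$P(\theta \mid Y_1 = 1) = \theta \div \tfrac{1}{2} = 2\theta$$
Now it integrates to 1.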
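Now, we observe one more…
Suppose Y2 = 1. Now, what is P(Y3 = 1 | θ, Y1 = 1, Y2 = 1)? Conditional on θ, it’s still θ. This time, though, we bring some information with us: our posterior P(θ|Y1 = 1) becomes our new prior, and we apply Bayes’ Rule again.
$$P(\theta \mid Y_1 = 1)\, L(Y_2 = 1 \mid \theta) = 2\theta \times \theta = 2\theta^2$$
$$P(\theta \mid Y_1 = 1, Y_2 = 1) \propto \theta^2, \quad \text{which normalizes to } 3\theta^2$$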
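Sequential updating can be checked numerically. This Python sketch (grid size and helper names are illustrative) updates a uniform prior with Y1 = 1, reuses that posterior as the prior for Y2 = 1, and compares the result with a single batch update on both observations. Both should match the analytic posterior 3θ², the normalized form of θ².

```python
def normalize(values, width):
    """Rescale grid values so they integrate to 1 (midpoint rule)."""
    total = sum(values) * width
    return [v / total for v in values]

def bernoulli_lik(y, t):
    """L(Y = y | theta) for a single Bernoulli observation."""
    return t if y == 1 else 1.0 - t

n_grid = 1000
width = 1.0 / n_grid
thetas = [(k + 0.5) * width for k in range(n_grid)]

# Sequential: the posterior after Y1 becomes the prior for Y2.
prior = [1.0] * n_grid                                                      # uniform P(theta)
post1 = normalize([bernoulli_lik(1, t) * p for t, p in zip(thetas, prior)], width)
post2 = normalize([bernoulli_lik(1, t) * p for t, p in zip(thetas, post1)], width)

# Batch: update the uniform prior with both observations at once.
batch = normalize([bernoulli_lik(1, t) ** 2 for t in thetas], width)

max_gap = max(abs(a - b) for a, b in zip(post2, batch))          # sequential vs batch
max_err = max(abs(p - 3 * t ** 2) for t, p in zip(thetas, post2))  # vs analytic 3*theta^2
print(f"{max_gap:.1e} {max_err:.1e}")
```

The order in which exchangeable observations arrive does not matter: updating one at a time and updating all at once give the same posterior.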
Conjugate Analysis
It turns out that the prior and posterior distributions you just saw are Beta distributions. The likelihood function was Bernoulli (binomial if we observe multiple individuals at once). Multiplying a Beta prior by a Bernoulli/binomial likelihood gives back a Beta posterior:
$$\text{Posterior}(\theta \mid \text{Data}) = \text{Beta}(N_S + a,\; N_F + b)$$
where the prior is Beta(a, b) (a = b = 1 in our example), NS is the number of observed successes (say, 6) and NF is the number of observed failures (say, 3). For the two successes above, this gives Beta(3, 1), whose density is 3θ².
[Figure: Prior(θ) = Beta(1,1) and Posterior(θ) = Beta(7,4).]
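The conjugate update is simple arithmetic on the Beta parameters. This Python sketch (the helper name `beta_binomial_update` is illustrative) applies it to the example above (a Beta(1,1) prior with 6 successes and 3 failures) and then checks the posterior mean a/(a + b) against Monte Carlo draws from the standard library's `random.betavariate`.

```python
import random

def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + failures

a_post, b_post = beta_binomial_update(a=1, b=1, successes=6, failures=3)
posterior_mean = a_post / (a_post + b_post)   # mean of Beta(a, b) is a / (a + b)

# Monte Carlo check: draws from Beta(7, 4) should average about 7/11.
rng = random.Random(42)
draws = [rng.betavariate(a_post, b_post) for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)
print(a_post, b_post, round(posterior_mean, 4))  # 7 4 0.6364
```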
Markov Chain Monte Carlo
When the posterior has no convenient closed form, the Metropolis-Hastings algorithm uses proposal distributions we already know how to sample from to generate random draws from a Markov chain whose stationary distribution is P(θ|X). After discarding an initial burn-in period, we collect those draws and analyze them (take their mean, median, etc.) or run them through a cost-effectiveness simulation as parameter values.
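A random-walk Metropolis sampler (the special case of Metropolis-Hastings with a symmetric proposal, so the proposal densities cancel) needs only the unnormalized posterior. This Python sketch targets the Beta(7,4) posterior from the conjugate example, θ⁶(1 - θ)³, so the answer can be checked against 7/11. The step size of 0.2, chain length and 1,000-draw burn-in are arbitrary illustrative choices.

```python
import math
import random

def log_target(theta):
    """Unnormalized log posterior: Beta(1,1) prior x 6 successes, 3 failures."""
    if not 0.0 < theta < 1.0:
        return -math.inf                      # zero density outside (0, 1)
    return 6 * math.log(theta) + 3 * math.log(1.0 - theta)

def random_walk_metropolis(log_p, start, n_draws, step, seed=0):
    rng = random.Random(seed)
    current, current_lp = start, log_p(start)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)   # symmetric normal proposal
        proposal_lp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(current)                       # a rejected move repeats the current value
    return draws

chain = random_walk_metropolis(log_target, start=0.5, n_draws=50_000, step=0.2)
kept = chain[1_000:]                                # discard burn-in
print(round(sum(kept) / len(kept), 3))              # near 7/11 = 0.636
```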
Gibbs Sampling
When θ is multidimensional, it can be useful to break the joint distribution P(θ|X) into a sequence of “full conditional distributions”:
$$P(\theta_j \mid \theta_{-j}, X) \propto L(X \mid \theta_j, \theta_{-j})\, P(\theta_j \mid \theta_{-j})$$
where “-j” signifies all elements of θ other than j. We specify a starting vector θ⁰ and then draw each θj in turn from its full conditional, given the current values of all the other elements. If a full conditional P(θj|θ-j,X) is not a known type of distribution, we can use the Metropolis algorithm to sample from it. Cycling from j = 1 to M, where M is the number of elements of θ, gives one full sample of θ.
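[Figure: a Gibbs sampler’s path in the (θ1, θ2) plane, starting from the initial value θ2⁰ and moving one coordinate at a time (steps 1 to 5).]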
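A standard bivariate normal with correlation ρ is the textbook case where both full conditionals are known exactly: θ1 | θ2 ~ N(ρθ2, 1 - ρ²), and symmetrically for θ2. This Python sketch (ρ = 0.8, the deliberately poor starting point and the burn-in length are illustrative) alternates the two draws, tracing the one-coordinate-at-a-time path described above, and checks that the sample correlation recovers ρ. Because the conditionals are normal, no Metropolis step is needed.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_draws, start=(3.0, -3.0), seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)            # sd of each full conditional
    theta1, theta2 = start
    draws = []
    for _ in range(n_draws):
        theta1 = rng.gauss(rho * theta2, cond_sd)  # draw from P(theta1 | theta2)
        theta2 = rng.gauss(rho * theta1, cond_sd)  # draw from P(theta2 | theta1)
        draws.append((theta1, theta2))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_draws=20_000)[500:]  # drop burn-in
n = len(draws)
m1 = sum(d[0] for d in draws) / n
m2 = sum(d[1] for d in draws) / n
cov = sum((d[0] - m1) * (d[1] - m2) for d in draws) / n
sd1 = math.sqrt(sum((d[0] - m1) ** 2 for d in draws) / n)
sd2 = math.sqrt(sum((d[1] - m2) ** 2 for d in draws) / n)
corr = cov / (sd1 * sd2)
print(round(corr, 2))   # close to rho = 0.8
```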
Heterogeneity
Our first example was a very simple, homogeneous model: every individual’s outcome is drawn from the same distribution. The opposite extreme (complete heterogeneity) is also fairly simple:
Yi ~ Bernoulli(θi)
But with a separate θi for every individual and no systematic variation across individuals, it is very difficult to extrapolate (make predictions).
Modeling Heterogeneity
Usually, we assume there is a systematic relationship that explains some of the heterogeneity in observed outcomes. The classical normal regression model (which we usually estimate with Ordinary Least Squares) can be thought of as a hierarchical model:
$$Y_i = X_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
where Xi is a row vector of individual covariates and β is a column vector of parameters. We can also write this as:
$$Y_i \sim N(\mu_i, \sigma^2), \qquad \mu_i = X_i\beta, \qquad \theta = \{\beta, \sigma^2\} \sim P(\theta)$$
P(θ) summarizes our knowledge about the joint distribution of the unknown parameters. β is probably multidimensional, and σ² is a variance term that must be positive, so P(θ) is probably a weird mixture of distributions with no convenient closed form. Yikes!
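Gibbs sampling sidesteps the need to write P(θ) in closed form: for the normal regression model, each full conditional is a standard distribution. This Python sketch fits Yi = b0 + b1·xi + εi under the common noninformative prior p(b0, b1, σ²) ∝ 1/σ² (an assumption made here for illustration; the deck does not specify it). It cycles through b0 | b1, σ² (normal), b1 | b0, σ² (normal) and σ² | b0, b1 (inverse-gamma, drawn as the reciprocal of a Gamma precision). The data are simulated with hypothetical true values b0 = 2, b1 = 0.5, σ = 1.

```python
import random

def gibbs_normal_regression(x, y, n_draws=5_000, burn_in=500, seed=0):
    """Gibbs sampler for y = b0 + b1*x + e, e ~ N(0, s2), prior p(b0, b1, s2) ∝ 1/s2."""
    rng = random.Random(seed)
    n = len(y)
    sum_x2 = sum(xi * xi for xi in x)
    b0, b1, s2 = 0.0, 0.0, 1.0                      # starting values
    draws = []
    for it in range(n_draws + burn_in):
        # b0 | b1, s2, y ~ N(mean(y - b1*x), s2 / n)
        b0 = rng.gauss(sum(yi - b1 * xi for xi, yi in zip(x, y)) / n, (s2 / n) ** 0.5)
        # b1 | b0, s2, y ~ N(sum(x*(y - b0)) / sum(x^2), s2 / sum(x^2))
        b1_mean = sum(xi * (yi - b0) for xi, yi in zip(x, y)) / sum_x2
        b1 = rng.gauss(b1_mean, (s2 / sum_x2) ** 0.5)
        # s2 | b0, b1, y ~ Inv-Gamma(n/2, SSR/2): draw the precision 1/s2 from a Gamma
        ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        s2 = 1.0 / rng.gammavariate(n / 2, 2.0 / ssr)   # gammavariate(shape, scale)
        if it >= burn_in:
            draws.append((b0, b1, s2))
    return draws

# Hypothetical data: true b0 = 2.0, b1 = 0.5, sigma = 1.0 (illustration only).
data_rng = random.Random(123)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
y = [2.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

draws = gibbs_normal_regression(x, y)
post_means = [sum(d[k] for d in draws) / len(draws) for k in range(3)]
print([round(m, 2) for m in post_means])   # near (2.0, 0.5, 1.0), up to sampling noise
```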
WinBUGS (The BUGS Project): http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml
Analytical Example: Cost-Effectiveness Analysis
Using MCMC to Conduct CEA
We can use Markov Chain Monte Carlo in WinBUGS to estimate a very wide range of models. Draws from the posterior distributions can be used to conduct inference and test simple hypotheses, or they can become inputs for policy-relevant simulations. Using the flexibility of WinBUGS, we can also explore the relationships among treatments, covariates, costs and health outcomes using regression analysis with Generalized Linear Models (GLM), and perform probabilistic sensitivity analysis at the same time.
Simulated CEA Dataset
We use a combination of real-world variables and simulated relationships.
- 800 individuals were selected at random from the 2,452 individuals who self-reported hypertension in the 2005 Medical Expenditure Panel Survey, Household Component (MEPS-HC).
- Covariates: Age; Sex (Male = 1); BMI.
- We created a latent class that equals 1 if an individual self-reported diabetes and 0 otherwise. The class variable was excluded from all analyses (it is assumed to be unobservable).
Simulated CEA Dataset
Treatment (T = 1):
$$T_i \sim \text{Bernoulli}(0.5)$$
Adverse events (AE = 1):
$$AE_i \sim \text{Bernoulli}(P_i^{AE}), \qquad P_i^{AE} = 0.1 + 0.9\, T_i\, \text{Class}_i$$
“Cure” (S = 1):
$$S_i \sim \text{Bernoulli}(P_i^{S}), \qquad P_i^{S} = 0.8\, T_i + 0.1\,(1 - T_i)$$
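The event-generating rules above can be reproduced in a few lines. This Python sketch does not use the MEPS-HC records: it draws a hypothetical latent class with 24% prevalence (an illustrative assumption, roughly in line with the class rates reported for the sample) and then applies the stated treatment, adverse-event and cure probabilities. Because the adverse-event probability depends on treatment only through the latent class, any observed covariate correlated with that class can act as a proxy for it in the fitted models.

```python
import random

def simulate_trial_events(latent_class, seed=0):
    """Simulate treatment, adverse events and cure from the rules on this slide.

    latent_class: list of 0/1 indicators (1 = diabetes); unobserved in the analysis.
    Returns (t, class, ae, s) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for c in latent_class:
        t = int(rng.random() < 0.5)           # T ~ Bernoulli(0.5)
        p_ae = 0.1 + 0.9 * t * c              # P(AE) = 0.1 + 0.9*T*Class
        p_s = 0.8 * t + 0.1 * (1 - t)         # P(S) = 0.8*T + 0.1*(1 - T)
        ae = int(rng.random() < p_ae)
        s = int(rng.random() < p_s)
        rows.append((t, c, ae, s))
    return rows

def rate(rows, t, col):
    """Mean of column col among rows with treatment t."""
    group = [r for r in rows if r[0] == t]
    return sum(r[col] for r in group) / len(group)

# Hypothetical latent class with 24% prevalence (illustrative assumption).
class_rng = random.Random(7)
latent = [int(class_rng.random() < 0.24) for _ in range(20_000)]
rows = simulate_trial_events(latent)

print(round(rate(rows, 0, 2), 3), round(rate(rows, 1, 2), 3))  # AE: about 0.10 vs 0.10 + 0.9*0.24
print(round(rate(rows, 0, 3), 3), round(rate(rows, 1, 3), 3))  # cure: about 0.10 vs 0.80
```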
Simulated CEA Dataset
Costs were simulated from Gamma distributions as follows:
$$C_i = C_i^T\, T_i + C_i^X$$
$$C_i^T \sim \text{Gamma}(25, 1/400), \qquad C_i^X \sim \text{Gamma}(4, 1/\exp(X_i\beta^C))$$
where Xi is the row vector and βC the column vector of parameters:
$$X_i = (1, \text{Age}_i, \text{Sex}_i, AE_i, S_i), \qquad \beta^C = (7,\ 0.03,\ 0,\ 1.5,\ -0.5)^\top$$
Simulated Costs by Treatment Group
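Simulated CEA Dataset
QALYs were simulated from Beta distributions:
$$Q_i \sim \text{Beta}(\alpha_i, \beta_i), \qquad \alpha_i = \beta_i \exp(X_i\beta^Q), \qquad \beta_i = 1.2 + 2T_i$$
where Xi is the row vector and βQ the column vector of parameters:
$$X_i = (1, \text{Age}_i, \text{Sex}_i, AE_i, S_i), \qquad \beta^Q = (1,\ -0.01,\ 0.25,\ -1,\ 1.5)^\top$$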
Simulated QALYs by Treatment Group
t = 0 (no treatment)

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| c | 390 | 7671.389 | 7850.924 | 477.4742 | 65191 |
| q | 390 | .6354654 | .2557378 | .0053504 | .9961702 |
| t | 390 | 0 | 0 | 0 | 0 |
| age | 390 | 54.7641 | 5.513782 | 45 | 64 |
| male | 390 | .4871795 | .5004777 | 0 | 1 |
| ae | 390 | .1230769 | .3289475 | 0 | 1 |
| s | 390 | .1025641 | .3037784 | 0 | 1 |
| bmi | 390 | 31.78385 | 6.994469 | 16 | 57.4 |
| class | 390 | .2512821 | .4343075 | 0 | 1 |

t = 1 (treatment)

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| c | 410 | 18385.52 | 8746.241 | 6958.342 | 55654.68 |
| q | 410 | .7849 | .163569 | .1765754 | .9979415 |
| t | 410 | 1 | 0 | 1 | 1 |
| age | 410 | 54.7561 | 5.482493 | 45 | 64 |
| male | 410 | .4414634 | .4971683 | 0 | 1 |
| ae | 410 | .297561 | .4577439 | 0 | 1 |
| s | 410 | .802439 | .3986455 | 0 | 1 |
| bmi | 410 | 30.90756 | 6.540621 | 17.2 | 56.5 |
| class | 410 | .2268293 | .4192929 | 0 | 1 |
Sample (n = 800) ICER: $74,357/QALY
Population (n = 2,452) ICER: $79,933/QALY
Population ICER by (unobserved) class:
- Class 0 (no diabetes): $43,519/QALY
- Class 1 (diabetes): $589,954/QALY
BMI by Class
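GLM Cure and Adverse Event Models (Bernoulli with Probit Link)
$$S_i \sim \text{Bernoulli}(P_i^S), \qquad P_i^S = \Phi(X_i^S \beta^S)$$
$$AE_i \sim \text{Bernoulli}(P_i^{AE}), \qquad P_i^{AE} = \Phi(X_i^{AE} \beta^{AE})$$
where Φ is the standard normal cumulative distribution function and both models use the same covariates:
$$X_i^S = X_i^{AE} = (1,\ \text{Age}_i,\ \text{BMI}_i,\ \text{Sex}_i,\ T_i,\ T_i\text{Age}_i,\ T_i\text{BMI}_i,\ T_i\text{Sex}_i)$$
Note: we are using non-informative (“flat”) priors, which give results similar to Maximum Likelihood. We could instead bring outside information into the prior, for example through meta-analysis.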
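What the BUGS Code Looks Like

    model {
      for (i in 1:800) {
        # Cure: Bernoulli outcome, probit link; st_AGE and st_BMI are z-scores.
        # The linear predictor is clamped to [-5, 5] to keep phi() away from 0 and 1.
        S[i] ~ dbern(p_S[i])
        p_S[i] <- phi(arg.S[i])
        arg.S[i] <- max(min(bS[1] + bS[2]*st_AGE[i] + bS[3]*MALE[i] + bS[4]*st_BMI[i]
          + bS[5]*T[i] + bS[6]*T[i]*st_AGE[i] + bS[7]*T[i]*MALE[i] + bS[8]*T[i]*st_BMI[i], 5), -5)
        # Adverse event: same covariates and link
        AE[i] ~ dbern(p_AE[i])
        p_AE[i] <- phi(arg.AE[i])
        arg.AE[i] <- max(min(bAE[1] + bAE[2]*st_AGE[i] + bAE[3]*MALE[i] + bAE[4]*st_BMI[i]
          + bAE[5]*T[i] + bAE[6]*T[i]*st_AGE[i] + bAE[7]*T[i]*MALE[i] + bAE[8]*T[i]*st_BMI[i], 5), -5)
      }
      # Vague normal priors: an assumed stand-in, since the priors are not shown here
      for (k in 1:8) {
        bS[k] ~ dnorm(0, 1.0E-6)
        bAE[k] ~ dnorm(0, 1.0E-6)
      }
    }

Note: For complete code, email dvanness@wisc.edu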
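GLM Cost Model (Gamma with Log Link)
We model cost with two Gamma distributions, one for each treatment arm (a mixture in which each individual’s component is set by the observed treatment):
$$C_i = T_i\, C_{i1} + (1 - T_i)\, C_{i0}$$
$$C_{i1} \sim \text{Gamma}(\text{shape}_{C1}, \text{rate}_{C1}), \qquad C_{i0} \sim \text{Gamma}(\text{shape}_{C0}, \text{rate}_{C0})$$
Here the second Gamma parameter is a rate, as in the BUGS dgamma(r, μ) distribution, whose mean is r/μ. We use the log function to link mean cost to the shape r and the rate:
$$\ln[\text{mean}(C)] = X\beta \;\Rightarrow\; \text{mean}(C) = \exp(X\beta) = \frac{r}{\text{rate}} \;\Rightarrow\; \text{rate} = \frac{r}{\exp(X\beta)}$$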
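The shape/rate bookkeeping is easy to get wrong when moving between software, because Python's `random.gammavariate(alpha, beta)` takes a shape and a scale (mean = alpha × beta), while the BUGS dgamma uses a rate. This Python sketch (the helper `draw_cost` and the $8,000 mean are hypothetical) converts a log-link linear predictor into a rate, then into a scale, and checks that simulated costs average exp(Xβ).

```python
import math
import random

def draw_cost(linear_predictor, shape, rng):
    """Draw a cost from a Gamma with mean exp(X*beta) and the given shape.

    BUGS-style dgamma(shape, rate): mean = shape / rate, so rate = shape / exp(X*beta).
    Python's gammavariate takes (shape, scale), with scale = 1 / rate.
    """
    mean_cost = math.exp(linear_predictor)   # log link: ln E[C] = X*beta
    rate = shape / mean_cost
    return rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(3)
xb = math.log(8_000.0)                        # hypothetical linear predictor: mean cost $8,000
draws = [draw_cost(xb, shape=4.0, rng=rng) for _ in range(200_000)]
print(round(sum(draws) / len(draws)))         # close to 8000
```

The shape controls dispersion separately from the mean: the coefficient of variation of a Gamma is 1/√shape, so a shape of 4 implies costs spread about ±50% around their mean.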
GLM QALY Model (Beta with Logit Link)
Rescale Q to the [0,1] interval by dividing by the maximum possible Q (the follow-up time).
$$Q_i = T_i\, Q_{i1} + (1 - T_i)\, Q_{i0}$$
$$Q_{i1} \sim \text{Beta}(a_{Q1}, b_{Q1}), \qquad Q_{i0} \sim \text{Beta}(a_{Q0}, b_{Q0})$$
We use the logit function to link mean QALYs to the Beta parameters:
$$\text{mean}(Q) = \frac{\exp(X\beta)}{1 + \exp(X\beta)} = \frac{a}{a + b}$$
$$a + a\exp(X\beta) = a\exp(X\beta) + b\exp(X\beta)$$
$$a = b\exp(X\beta)$$
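The derivation can be verified directly: setting a = b·exp(Xβ) makes the Beta mean equal the inverse logit of Xβ. In this Python sketch the linear predictor 0.6 and b = 1.2 are hypothetical values (1.2 matches the simulated control-arm β), and the helper name `beta_params_from_logit` is illustrative. It checks the identity exactly and by Monte Carlo with `random.betavariate`.

```python
import math
import random

def beta_params_from_logit(linear_predictor, b):
    """Return (a, b) such that a / (a + b) = logistic(linear_predictor)."""
    a = b * math.exp(linear_predictor)        # a = b * exp(X*beta)
    return a, b

xb = 0.6                                      # hypothetical linear predictor
a, b = beta_params_from_logit(xb, b=1.2)
implied_mean = a / (a + b)
inverse_logit = 1.0 / (1.0 + math.exp(-xb))

rng = random.Random(11)
mc_mean = sum(rng.betavariate(a, b) for _ in range(200_000)) / 200_000
print(round(implied_mean, 4), round(inverse_logit, 4), round(mc_mean, 3))
```

With this parameterization, b acts as an extra dispersion parameter: the mean depends only on Xβ, while a + b = b(1 + exp(Xβ)) governs how tightly QALYs cluster around it.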
Posterior Inference

Adverse Events (Bernoulli, Probit)

| Node | Posterior Mean | Posterior SD | MC Error | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|---|
| AGE | 0.0094 | 0.0783 | 0.0025 | -0.1363 | 0.0090 | 0.1647 |
| BMI | -0.0119 | | 0.0029 | -0.1665 | -0.0095 | 0.1339 |
| CONSTANT | -1.1630 | 0.1181 | 0.0054 | -1.4030 | -1.1610 | -0.9381 |
| MALE | -0.0121 | 0.1690 | 0.0080 | -0.3376 | -0.0135 | 0.3178 |
| T | 0.6391 | 0.1487 | 0.0069 | 0.3501 | 0.6405 | 0.9348 |
| TxAGE | 0.0608 | 0.1023 | 0.0034 | -0.1457 | 0.0620 | 0.2571 |
| TxBMI | 0.2338 | 0.1043 | 0.0037 | 0.0342 | 0.2353 | 0.4395 |
| TxMALE | -0.0047 | 0.2211 | 0.0104 | -0.4334 | -0.0014 | 0.4152 |

Cure (Bernoulli, Probit)

| Node | Posterior Mean | Posterior SD | MC Error | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|---|
| AGE | -0.0022 | 0.0885 | 0.0031 | -0.1753 | -0.0013 | 0.1693 |
| BMI | -0.0012 | 0.0859 | | -0.1764 | 0.0002 | 0.1633 |
| CONSTANT | -1.3500 | 0.1214 | 0.0055 | -1.5990 | -1.3490 | -1.1180 |
| MALE | 0.1391 | 0.1687 | 0.0077 | -0.1796 | 0.1384 | 0.4846 |
| T | 2.2460 | 0.1519 | 0.0068 | 1.9600 | 2.2480 | 2.5600 |
| TxAGE | 0.0326 | 0.1156 | 0.0039 | -0.1936 | 0.0336 | 0.2552 |
| TxBMI | -0.1557 | 0.1131 | 0.0040 | -0.3741 | -0.1560 | 0.0684 |
| TxMALE | -0.2274 | 0.2189 | 0.0097 | -0.6529 | -0.2242 | 0.2165 |
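Probit coefficients are easiest to read on the probability scale. This Python sketch plugs the reported posterior means into Φ for a female (MALE = 0) with standardized AGE = 0 and BMI = 0, so only CONSTANT and T contribute. This is an approximation: because Φ is nonlinear, Φ(posterior mean) is not the posterior mean of the probability. A fuller analysis would apply Φ to every MCMC draw and then average.

```python
import math

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Posterior means reported above, for MALE = 0 and standardized AGE = BMI = 0,
# so the interaction terms drop out and only CONSTANT and T contribute.
ae_const, ae_t = -1.1630, 0.6391
cure_const, cure_t = -1.3500, 2.2460

p_ae = (std_normal_cdf(ae_const), std_normal_cdf(ae_const + ae_t))
p_cure = (std_normal_cdf(cure_const), std_normal_cdf(cure_const + cure_t))
print("P(AE):   control %.3f, treated %.3f" % p_ae)
print("P(cure): control %.3f, treated %.3f" % p_cure)
```

These plug-in probabilities (roughly 0.12 vs 0.30 for adverse events, 0.09 vs 0.81 for cure) are close to the group rates in the simulated dataset.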
Posterior Inference

Cost w/o Treatment (Gamma, Log)

| Node | Posterior Mean | Posterior SD | MC Error | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|---|
| AE | 1.5190 | 0.0737 | 0.0015 | 1.3770 | | 1.6690 |
| AGE | 0.1552 | 0.0240 | 0.0004 | 0.1098 | 0.1550 | 0.2037 |
| BMI | 0.0031 | 0.0230 | | -0.0424 | 0.0032 | 0.0477 |
| CONSTANT | -0.8600 | 0.0358 | 0.0012 | -0.9289 | | -0.7893 |
| MALE | 0.0277 | 0.0490 | | -0.0671 | 0.0289 | 0.1231 |
| S | -0.5440 | 0.0775 | 0.0016 | -0.6897 | -0.5430 | -0.3867 |
| SHAPE | 4.5590 | 0.3130 | 0.0029 | 3.9700 | 4.5530 | 5.1890 |

Cost w/Treatment (Gamma, Log)

| Node | Posterior Mean | Posterior SD | MC Error | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|---|
| AE | 0.7035 | 0.0253 | 0.0006 | 0.6544 | | 0.7544 |
| AGE | 0.0746 | 0.0114 | 0.0002 | 0.0522 | 0.0747 | 0.0971 |
| BMI | 0.0147 | 0.0121 | | -0.0080 | 0.0144 | 0.0390 |
| CONSTANT | 0.2002 | 0.0303 | | 0.1382 | 0.2013 | 0.2563 |
| MALE | | 0.0229 | | -0.0420 | | 0.0472 |
| S | -0.1725 | 0.0299 | 0.0014 | -0.2294 | -0.1736 | -0.1133 |
| SHAPE | 18.3500 | 1.2890 | 0.0119 | 15.9000 | 18.3300 | 20.9600 |
Posterior Inference

QALY w/o Treatment (Beta, Logit)

| Node | Posterior Mean | Posterior SD | MC Error | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|---|
| AE | -1.1760 | 0.1578 | 0.0030 | -1.4950 | -1.1740 | -0.8751 |
| AGE | -0.0193 | 0.0493 | 0.0009 | -0.1144 | -0.0205 | 0.0795 |
| BMI | -0.0015 | 0.0479 | 0.0010 | -0.0954 | -0.0017 | 0.0927 |
| CONSTANT | 0.5294 | 0.0733 | 0.0023 | 0.3849 | 0.5300 | 0.6753 |
| MALE | 0.1366 | 0.0963 | | -0.0565 | 0.1368 | 0.3198 |
| S | 1.3100 | 0.1615 | | 0.9928 | 1.3160 | 1.6240 |
| β | 1.1600 | 0.0756 | 0.0008 | 1.0170 | 1.1580 | 1.3110 |

QALY w/Treatment (Beta, Logit)

| Node | Posterior Mean | Posterior SD | MC Error | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|---|
| AE | -0.9542 | 0.0677 | 0.0015 | -1.0880 | -0.9524 | -0.8262 |
| AGE | -0.0092 | 0.0300 | 0.0006 | -0.0673 | -0.0088 | 0.0484 |
| BMI | 0.0865 | 0.0305 | | 0.0298 | 0.0862 | 0.1478 |
| CONSTANT | 0.4160 | 0.0797 | 0.0041 | 0.2625 | 0.4129 | 0.5812 |
| MALE | 0.2812 | 0.0577 | 0.0017 | 0.1621 | 0.2827 | 0.3906 |
| S | 1.4720 | 0.0799 | 0.0040 | 1.3120 | 1.4760 | 1.6230 |
| β | 3.4980 | 0.2322 | 0.0022 | 3.0580 | 3.4950 | 3.9740 |
Subgroup Simulations
Within WinBUGS, we take the draws from the parameter posteriors and, for a hypothetical individual with a given covariate profile X:
1. Assign to No Treatment (T = 0): simulate cure or no cure; simulate an adverse event; simulate cost and QALY, given the simulated cure and adverse-event status.
2. Assign to Treatment (T = 1): repeat the cure, adverse-event, cost and QALY simulations.
3. Repeat, say, 1,000 times and calculate the average incremental costs and QALYs.
The following ICERs apply to a female (Sex = 0) of average age (z-transformed Age = 0) at five levels of z-transformed BMI: {-2, -1, 0, 1, 2}.
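The simulation loop can be sketched outside WinBUGS to show its logic. In this Python sketch everything specific is hypothetical: the dictionary layout (one dict per arm holding that arm's linear-predictor pieces, already evaluated at the target covariate profile), the parameter values (chosen for illustration, costs in dollars; they are not the deck's estimates), and the choice to average `n_reps` simulated patients per posterior draw. The deck's own WinBUGS implementation may organize the repetitions differently. Cure and adverse events use the probit link, cost the Gamma-log model (scale = mean/shape), and QALY the Beta-logit model (a = b·exp(Xβ)).

```python
import math
import random

def phi(z):
    """Standard normal CDF (inverse of the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_arm(p, rng):
    """Simulate one patient's (cost, QALY) under one arm at a fixed covariate profile.

    Hypothetical keys: eta_s, eta_ae (probit predictors); c0, c_ae, c_s, shape
    (Gamma-log cost model); q0, q_ae, q_s, b (Beta-logit QALY model).
    """
    s = int(rng.random() < phi(p["eta_s"]))       # simulate cure
    ae = int(rng.random() < phi(p["eta_ae"]))     # simulate adverse event
    mean_cost = math.exp(p["c0"] + p["c_ae"] * ae + p["c_s"] * s)
    cost = rng.gammavariate(p["shape"], mean_cost / p["shape"])  # scale = mean / shape
    a = p["b"] * math.exp(p["q0"] + p["q_ae"] * ae + p["q_s"] * s)
    qaly = rng.betavariate(a, p["b"])
    return cost, qaly

def incremental_outcomes(posterior_draws, n_reps=1_000, seed=0):
    """Return one (delta_cost, delta_qaly) pair per posterior draw."""
    rng = random.Random(seed)
    out = []
    for draw in posterior_draws:
        totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}
        for _ in range(n_reps):
            for t in (0, 1):                      # same covariate profile, each arm
                cost, qaly = simulate_arm(draw[t], rng)
                totals[t][0] += cost
                totals[t][1] += qaly
        out.append(((totals[1][0] - totals[0][0]) / n_reps,
                    (totals[1][1] - totals[0][1]) / n_reps))
    return out

# Hypothetical parameter values (NOT the deck's estimates).
control = dict(eta_s=-1.3, eta_ae=-1.2, c0=8.8, c_ae=1.5, c_s=-0.5, shape=4.0,
               q0=0.5, q_ae=-1.2, q_s=1.3, b=1.2)
treated = dict(control, eta_s=0.9, eta_ae=-0.5, c0=9.8, shape=18.0, b=3.5)
draws = [{0: control, 1: treated}] * 5            # stand-in for MCMC posterior draws

results = incremental_outcomes(draws)
mean_dc = sum(r[0] for r in results) / len(results)
mean_dq = sum(r[1] for r in results) / len(results)
print(f"ICER of posterior means: {mean_dc / mean_dq:,.0f} per QALY")
```

With real MCMC output, each element of `draws` would differ, so the spread of the per-draw increments would also carry parameter uncertainty into the acceptability analysis.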
Posterior ICERs by BMI z-score
Posterior Acceptability (BMI z-score = -2) (WTP from $0 to $200,000)
Posterior Acceptability (BMI z-score = 2) (WTP from $0 to $200,000)
Posterior Net Benefit (WTP from $0 to $200,000)
[Figure: one net benefit curve for each BMI z-score: -2, -1, 0, 1, 2.]
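Both curves come from the same posterior draws of incremental cost ΔC and incremental QALY ΔQ. At willingness to pay λ, incremental net benefit is λΔQ - ΔC. Acceptability is the posterior probability that this is positive. Expected net benefit is its posterior mean, which is the quantity Claxton's argument says should drive the decision. In this Python sketch, normally distributed draws with made-up means and spreads stand in for the simulated increments, and the function names are illustrative.

```python
import random

def ceac(delta_costs, delta_qalys, wtp_grid):
    """Acceptability: P(incremental net benefit > 0) at each willingness to pay."""
    n = len(delta_costs)
    return [sum(1 for dc, dq in zip(delta_costs, delta_qalys) if wtp * dq - dc > 0) / n
            for wtp in wtp_grid]

def expected_incremental_nb(delta_costs, delta_qalys, wtp):
    """Posterior mean of incremental net benefit: WTP * E[dQ] - E[dC]."""
    n = len(delta_costs)
    return wtp * sum(delta_qalys) / n - sum(delta_costs) / n

# Hypothetical posterior draws of incremental cost and QALY (illustration only).
rng = random.Random(2024)
d_c = [rng.gauss(11_000, 1_500) for _ in range(4_000)]
d_q = [rng.gauss(0.15, 0.03) for _ in range(4_000)]

wtp_grid = list(range(0, 200_001, 25_000))               # $0 to $200,000 per QALY
acceptability = ceac(d_c, d_q, wtp_grid)
enb = [expected_incremental_nb(d_c, d_q, w) for w in wtp_grid]
icer = (sum(d_c) / len(d_c)) / (sum(d_q) / len(d_q))    # ratio of posterior means

for w, p, nb in zip(wtp_grid, acceptability, enb):
    decision = "treat" if nb > 0 else "no treatment"
    print(f"WTP ${w:>7,}: P(cost-effective) = {p:.2f}, E[INB] = {nb:>9,.0f} -> {decision}")
print(f"ICER (ratio of posterior means): {icer:,.0f} per QALY")
```

Expected net benefit turns positive once λ exceeds the ratio of posterior means, whatever the acceptability probability is at that point. That is why the decision rule uses the mean, while the acceptability curve conveys how uncertain the decision is.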
References
Woodworth G. Biostatistics: A Bayesian Introduction. Wiley-Interscience, 2004. ISBN 0471468428 (9780471468424). Appendix B is an excellent WinBUGS tutorial, available at http://www.stat.uiowa.edu/~gwoodwor/BBIText/AppendixBWinbugs.pdf
Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. 2nd ed. London: Chapman & Hall, 2000.
Claxton K. The irrelevance of inference: a decision-making approach to the stochastic evaluation of health care technologies. J Health Econ. 1999 Jun;18(3):341-364.
Fryback DG, Stout NK, Rosenberg MA. An elementary introduction to Bayesian computing using WinBUGS. Int J Technol Assess Health Care. 2001;17(1):98-113.
Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press, 2007.
Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. London: Chapman & Hall, 1996.
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2002.
Luce BR, Claxton K. Redefining the analytical approach to pharmacoeconomics. Health Econ. 1999 May;8(3):187-189.
Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput. 2000;10(4):325-337.
O'Hagan A, Stevens JW, Montmartin J. Bayesian cost-effectiveness analysis from clinical trial data. Stat Med. 2001 Mar 15;20(5):733-753.
Skrepnek GH. The contrast and convergence of Bayesian and frequentist statistical approaches in pharmacoeconomic analysis. Pharmacoeconomics. 2007;25(8):649-664.
Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR. Bayesian methods in health technology assessment: a review. Health Technol Assess. 2000;4(38):1-130.
Tanner MA. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. 3rd ed. Springer, 1996.
Vanness DJ, Kim WR. Empirical modeling, simulation and uncertainty analysis using Markov Chain Monte Carlo: ganciclovir prophylaxis in liver transplantation. Health Econ. 2002;11(6):551-566.