
Slide 1: §4 The Bayesian Revolution: Markov Chain Monte Carlo (MCMC). Robert J. Tempelman. Applied Bayesian Inference, KSU, April 29, 2012.

Slide 2: Simulation-based inference

Suppose you are interested in the integral/expectation

   E_f[g(x)] = ∫ g(x) f(x) dx,   where f(x) is a density and g(x) is a function.

If you can draw random samples x_1, x_2, ..., x_n from f(x), then compute

   ĝ = (1/n) Σ_{i=1}^n g(x_i),

with Monte Carlo standard error

   MCSE = sqrt( Σ_{i=1}^n (g(x_i) - ĝ)² / (n(n-1)) ),

so that ĝ → E_f[g(x)] as n → ∞.
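
A minimal sketch of this idea in a SAS data step, assuming for illustration that f(x) is standard normal and g(x) = x², so the true expectation is 1 (the stderr printed by PROC MEANS is the s/√n quantity above):

data mc;
   seed1 = 1234;
   do i = 1 to 10000;
      x = rannor(seed1);   /* draw x_i from f(x): standard normal */
      g = x*x;             /* evaluate g(x_i)                     */
      output;
   end;
run;
proc means data=mc mean stderr;   /* mean of g = Monte Carlo estimate; stderr = MCSE */
   var g;
run;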

Slide 3: Beauty of Monte Carlo methods

You can determine the distribution of any function of the random variable(s). Distribution summaries include:
 - means,
 - medians,
 - key percentiles (2.5%, 97.5%),
 - standard deviations, etc.
Generally more reliable than the "delta method", especially for highly non-normal distributions.

Slide 4: Using the method of composition for sampling (Tanner, 1996)

Involves two stages of sampling. Example:
 - Suppose Y_i | λ_i ~ Poisson(λ_i).
 - In turn, λ_i | α, β ~ Gamma(α, β).
 - Then, marginally, Y_i follows a negative binomial distribution with mean α/β and variance (α/β)(1 + β⁻¹).

Slide 5: Using the method of composition to sample from the negative binomial

 1. Draw λ_i | α, β ~ Gamma(α, β).
 2. Draw Y_i | λ_i ~ Poisson(λ_i).

data new;
   seed1 = 2; alpha = 2; beta = 0.25;
   do j = 1 to 10000;
      call rangam(seed1,alpha,x);     /* x ~ Gamma(alpha, 1)            */
      lambda = x/beta;                /* lambda ~ Gamma(alpha, beta)    */
      call ranpoi(seed1,lambda,y);    /* y | lambda ~ Poisson(lambda)   */
      output;
   end;
run;
proc means mean var;
   var y;
run;

The MEANS Procedure
Variable     Mean       Variance
y            7.9749     39.2638

Check: E(y) = α/β = 2/0.25 = 8; Var(y) = (α/β)(1 + β⁻¹) = 8*(1+4) = 40.

Slide 6: Another example: Student t

 1. Draw λ_i | ν ~ Gamma(ν/2, ν/2).
 2. Draw t_i | λ_i ~ Normal(0, 1/λ_i).
Then t ~ Student t with ν degrees of freedom.

data new;
   seed1 = 29523; df = 4;
   do j = 1 to 100000;
      call rangam(seed1,df/2,x);
      lambda = x/(df/2);                 /* lambda ~ Gamma(df/2, df/2) */
      t = rannor(seed1)/sqrt(lambda);
      output;
   end;
run;
proc means mean var p5 p95;
   var t;
run;
data new;
   t5 = tinv(.05,4); t95 = tinv(.95,4);
run;
proc print;
run;

Variable     Mean        Variance     5th Pctl     95th Pctl
t           -0.00524     2.011365     -2.1376      2.122201

Obs      t5          t95
1       -2.1319      2.13185

Slide 7: Expectation-Maximization (EM)

OK, I know that EM is NOT a simulation-based inference procedure.
 - However, it is based on data augmentation.
It is an important progenitor of Markov Chain Monte Carlo (MCMC) methods.
 - Recall the plant genetics example.

Slide 8: Data augmentation

Augment the "data" by splitting the first cell into two cells with probabilities 1/2 and θ/4, giving 5 categories. The resulting complete-data density, as a function of θ, looks like a Beta distribution to me!

Slide 9: Data augmentation (cont'd)

So the joint distribution of the "complete" data is the product of the augmented multinomial cell probabilities. Consider the part involving just the "missing data" x2: conditionally, x2 | y1, θ ~ Binomial(y1, p) with p = (θ/4)/(1/2 + θ/4) = θ/(θ+2).

Slide 10: Expectation-Maximization

Start with the complete-data log-likelihood.
 1. Expectation (E-step): replace the missing count x2 by its conditional expectation E(x2 | y1, θ) = y1 θ/(θ+2).

Slide 11: Maximization step

 2. Maximization (M-step): use first- or second-derivative methods to maximize the expected complete-data log-likelihood. Setting the derivative to 0 gives θ = (E[x2] + y4)/(E[x2] + y2 + y3 + y4).

Slide 12: Recall the data

Genotype    Probability      Data (counts)
A_B_        Prob(A_B_)       y1 = 1997
aaB_        Prob(aaB_)       y2 = 906
A_bb        Prob(A_bb)       y3 = 904
aabb        Prob(aabb)       y4 = 32

0 ≤ θ ≤ 1;  θ → 0: close linkage in repulsion;  θ → 1: close linkage in coupling.

Slide 13: PROC IML code

proc iml;
y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
theta = 0.20;                              /* starting value */
do iter = 1 to 20;
   Ex2 = y1*(theta)/(theta+2);             /* E-step */
   theta = (Ex2+y4)/(Ex2+y2+y3+y4);        /* M-step */
   print iter theta;
end;
run;

iter    theta
  1     0.1055303
  2     0.0680147
  3     0.0512031
  4     0.0432646
  5     0.0394234
  6     0.0375429
  7     0.0366170
  8     0.0361598
  9     0.0359338
 10     0.0358219
 11     0.0357666
 12     0.0357392
 13     0.0357256
 14     0.0357189
 15     0.0357156
 16     0.0357139
 17     0.0357131
 18     0.0357127
 19     0.0357125
 20     0.0357124

Slower than Newton-Raphson/Fisher scoring, but generally more robust to poorer starting values.

Slide 14: How do we derive an asymptotic standard error using EM?

From Louis (1982): the observed-data information equals the conditional expectation of the complete-data information minus the conditional variance of the complete-data score, both evaluated given the observed data.

Slide 15: Finish off

Applying the identity to the plant genetics example, with x2 | y1, θ ~ Binomial(y1, θ/(θ+2)), yields the observed information and hence the asymptotic standard error of the EM estimate (see the sketch below).
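
A PROC IML sketch of what Louis's identity implies for this example, using the data and EM estimate from slide 13; this is an illustrative reconstruction rather than code from the original slides, but the resulting standard error (about 0.006) agrees with the asymptotic value used on slide 21:

proc iml;
y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
theta = 0.0357124;                     /* EM estimate from slide 13            */
p    = theta/(theta+2);                /* x2 | y1, theta ~ Binomial(y1, p)     */
Ex2  = y1*p;                           /* E(x2 | y, theta)                     */
Vx2  = y1*p*(1-p);                     /* Var(x2 | y, theta)                   */
/* conditional expectation of the complete-data information */
Icom = (Ex2 + y4)/theta**2 + (y2 + y3)/(1-theta)**2;
/* Louis (1982): subtract the conditional variance of the complete-data score */
Iobs = Icom - Vx2/theta**2;
se   = 1/sqrt(Iobs);                   /* approx. 0.006, as on slide 21        */
print theta se;
quit;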

Slide 16: Stochastic Data Augmentation (Tanner, 1996)

The posterior identity and the predictive identity together imply a transition function for a Markov chain, which suggests an "iterative" method-of-composition approach for sampling from p(θ|y).

Slide 17: Sampling strategy for p(θ|y)

Start somewhere (starting value θ = θ[0]), then:
 - Cycle 1: sample x[1] from p(x | θ[0], y); sample θ[1] from p(θ | x[1], y).
 - Cycle 2: sample x[2] from p(x | θ[1], y); sample θ[2] from p(θ | x[2], y).
 - etc.
It's like alternating between stochastic "E-steps" and "M-steps".

Slide 18: What are these full conditional densities (FCD)?

Recall the "complete" likelihood function and assume the prior on θ is "flat". Then the FCD are:
 - θ | x, y ~ Beta(α = y1 - x + y4 + 1, β = y2 + y3 + 1)
 - x | θ, y ~ Binomial(n = y1, p = 2/(θ+2))

Slide 19: IML code for the chained data augmentation example

proc iml;
seed1 = 4;
ncycle = 10000;                      /* total number of samples                     */
theta = j(ncycle,1,0);
y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
beta = y2+y3+1;
theta[1] = ranuni(seed1);            /* starting value: initial draw between 0 and 1 */
do cycle = 2 to ncycle;
   p = 2/(2+theta[cycle-1]);
   xvar = ranbin(seed1,y1,p);        /* draw x | theta, y                           */
   alpha = y1+y4-xvar+1;
   xalpha = rangam(seed1,alpha);     /* draw theta | x, y ~ Beta(alpha, beta)       */
   xbeta  = rangam(seed1,beta);      /*   via a ratio of gamma deviates             */
   theta[cycle] = xalpha/(xalpha+xbeta);
end;
create parmdata var {theta xvar};
append;
run;

data parmdata;
   set parmdata;
   cycle = _n_;
run;

Slide 20: Trace plot

proc gplot data=parmdata;
   plot theta*cycle;
run;

Burn-in? With a "bad" starting value, one should discard the first "few" samples to ensure that one is truly sampling from p(θ|y); the starting value should have no impact ("convergence in distribution"). How to decide on this? See Cowles and Carlin (1996). Here, throw away the first 1000 samples as "burn-in".

Slide 21: Histogram of samples post burn-in

proc univariate data=parmdata;
   where cycle > 1000;
   var theta;
   histogram / normal(color=red mu=0.0357 sigma=0.0060);
run;

Bayesian inference:
   N                        9000
   Posterior mean           0.03671503
   Posterior std deviation  0.00607971

Quantiles:
   Percent     Observed (Bayesian)     Asymptotic (Likelihood)
   5.0         0.02702                 0.02583
   95.0        0.04728                 0.04557

The normal overlay (mu = 0.0357, sigma = 0.0060) represents asymptotic likelihood inference.

Slide 22: Zooming in on the trace plot

Hints of autocorrelation, which is expected with Markov chain Monte Carlo simulation schemes: the number of drawn samples is NOT the number of independent draws. The greater the autocorrelation, the greater the problem, and the more samples you need.

Slide 23: Sample autocorrelation

proc arima data=parmdata plots(only)=series(acf);
   where cycle > 1000;
   identify var=theta nlag=1000 outcov=autocov;
run;

Autocorrelation Check for White Noise
To Lag   Chi-Square   DF   Pr > ChiSq   Autocorrelations
6        3061.39      6    <.0001       0.497  0.253  0.141  0.079  0.045  0.029

Slide 24: How to estimate the effective number of independent samples (ESS)

Consider the posterior mean based on m samples, θ̄ = (1/m) Σ_{j=1}^m θ[j]. The initial positive sequence estimator (Geyer, 1992; Sorensen and Gianola, 1995) estimates its Monte Carlo variance from the lag-t autocovariances γ(t) of the chain. Form the sums of adjacent lag autocovariances Γ_t = γ(2(t-1)) + γ(2(t-1)+1), t = 1, 2, ...; then

   Var(θ̄) ≈ [ -γ(0) + 2 Σ_t Γ_t ] / m,     ESS = γ(0) / Var(θ̄).

Slide 25: Initial positive sequence estimator

Choose the cutoff T such that all Γ_1, ..., Γ_T are positive (i.e., stop at the first negative Γ_t). SAS PROC MCMC chooses a slightly different cutoff (see its documentation). Extensive autocorrelation across lags leads to a smaller ESS.

Slide 26: SAS code

%macro ESS1(data,variable,startcycle,maxlag);
data _null_;
   set &data nobs=_n;
   call symputx('nsample',_n);
run;
proc arima data=&data;
   where cycle > &startcycle;             /* cycle = index variable created on slide 19 */
   identify var=&variable nlag=&maxlag outcov=autocov;
run;
proc iml;
use autocov;
read all var{'COV'} into cov;
nsample = &nsample;
nlag2 = nrow(cov)/2;
Gamma = j(nlag2,1,0);
cutoff = 0;
t = 0;
do while (cutoff = 0);
   t = t+1;
   Gamma[t] = cov[2*(t-1)+1] + cov[2*(t-1)+2];
   if Gamma[t] < 0 then cutoff = 1;
   if t = nlag2 then do;
      print "Too much autocorrelation";
      print "Specify a larger max lag";
      stop;
   end;
end;
varm = (-Cov[1] + 2*sum(Gamma)) / nsample;   /* Monte Carlo variance of the posterior mean */
ESS = Cov[1]/varm;                           /* effective sample size                      */
stdm = sqrt(varm);                           /* Monte Carlo standard error                 */
parameter = "&variable";
print parameter stdm ESS;
run;
%mend ESS1;

Recall: 9000 post burn-in MCMC cycles.

Slide 27: Executing %ESS1

%ESS1(parmdata,theta,1000,1000);

parameter     stdm          ESS
theta         0.0001116     2967.1289

Recall: 1000 MCMC burn-in cycles. That is, the chain carries information equivalent to drawing about 2967 independent draws from the density.

Slide 28: How large of an ESS should I target?

Routinely, in the thousands or greater; it depends on what you want to estimate.
 - Recommend no less than 100 for estimating "typical" location parameters: means, medians, etc.
 - Several times that for "typical" dispersion parameters like variances.
 - Want to provide key percentiles (e.g., 2.5th, 97.5th)? Then you need an ESS in the thousands.
See Raftery and Lewis (1992) for further direction.

Slide 29: Worthwhile to consider this sampling strategy?

For these data, there is not much difference, if any, from likelihood inference. But how about smaller samples? E.g., y1 = 200, y2 = 91, y3 = 90, y4 = 3: a different story.

Slide 30: Gibbs sampling: origins (Geman and Geman, 1984)

Gibbs sampling was first developed in statistical physics in relation to a spatial inference problem.
 - Problem: a true image θ was corrupted by a stochastic process to produce an observable image y (data). Objective: restore or estimate the true image θ in light of the observed image y.
 - Inference on θ was based on the Markov random field joint posterior distribution, through successively drawing from updated FCD, which were rather easy to specify.
 - These FCD each happened to be Gibbs distributions.
The misnomer has been used ever since to describe a rather general process.

Slide 31: Gibbs sampling

An extension of chained data augmentation to the case of several unknown parameters. Consider p = 3 unknown parameters with joint posterior density p(θ1, θ2, θ3 | y). Gibbs sampling is the MCMC sampling strategy used when all FCD are recognizable: p(θ1 | θ2, θ3, y), p(θ2 | θ1, θ3, y), and p(θ3 | θ1, θ2, y).

Slide 32: Gibbs sampling: the process

 1) Start with some "arbitrary" starting values (within the allowable parameter space).
 2) Draw θ1 from p(θ1 | θ2, θ3, y).
 3) Draw θ2 from p(θ2 | θ1, θ3, y).
 4) Draw θ3 from p(θ3 | θ1, θ2, y).
 5) Repeat steps 2)-4) m times.
Steps 2)-4) constitute one cycle of Gibbs sampling; one cycle yields one (correlated) random draw from the joint posterior; m is the length of the Gibbs chain.

Slide 33: General extension of Gibbs sampling

When there are d parameters and/or blocks of parameters, again specify starting values θ1(0), ..., θd(0); then in cycle k+1:
 - Sample θ1(k+1) from p(θ1 | θ2(k), θ3(k), ..., θd(k), y)
 - Sample θ2(k+1) from p(θ2 | θ1(k+1), θ3(k), ..., θd(k), y)
 - ...
 - Sample θd(k+1) from p(θd | θ1(k+1), θ2(k+1), ..., θd-1(k+1), y)
Generically, sample each θi from its FCD given the most recent values of all the other parameters.

Slide 34: Throw away enough burn-in samples before summarizing the remaining draws.

Slide 35: Mixed model example with known variance components, flat prior on β

Recall the linear mixed model y = Xβ + Zu + e, with the variance components known and p(β) ∝ 1. Writing θ' = [β' u'], we ALREADY KNOW the joint posterior density: it is multivariate normal, centered at the mixed model equation solutions with covariance equal to the inverse of the mixed-model coefficient matrix C.

Slide 36: FCD for the mixed effects model with known variance components

OK, it is really pointless to use MCMC here, but let's demonstrate. It can be shown that the FCD are univariate normal. Writing the mixed model equations as C θ̂ = W'R⁻¹y (with θ' = [β' u'] and W = [X Z]), the FCD of the i-th element is

   θi | θ-i, y ~ Normal( (ri - Σ_{j≠i} cij θj)/cii ,  1/cii ),

where ri is the i-th element of W'R⁻¹y, cij are the elements of the i-th row of C, and cii is the i-th diagonal element of C.

Slide 37: Two ways to sample β and u

 1. Block draw from p(β, u | y):
    - faster MCMC mixing (less/no autocorrelation across MCMC cycles),
    - but slower computing time (depending on the dimension of θ), since it requires the Cholesky factor of C. Some alternative strategies are available (Garcia-Cortes and Sorensen, 1995). A sketch is given below.
 2. Series of univariate draws from the FCD above:
    - faster computationally,
    - slower MCMC mixing. Partial solution: "thinning" the MCMC chain, e.g., saving every 10th cycle rather than every cycle.
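
A hedged IML sketch of option 1 (not the code from the original slides): it assumes coeff is the mixed-model coefficient matrix C, wry = W'R⁻¹y, and dim, seed, and solution are defined as in the univariate sampler on slide 49; root() returns the upper-triangular Cholesky factor U with C = U'U, so solving U against a standard normal vector gives a deviation with variance C⁻¹:

start block_draw;
   /* one joint draw of all location effects from N(C^{-1} W'R^{-1}y, C^{-1}) */
   thetahat = solve(coeff, wry);        /* posterior mean: solve C*thetahat = W'R^{-1}y */
   U = root(coeff);                     /* upper-triangular Cholesky factor: C = U`*U  */
   z = j(dim,1,0);
   do j = 1 to dim;
      z[j] = rannor(seed);              /* independent standard normals                */
   end;
   solution = thetahat + solve(U, z);   /* inv(U)*z has variance (U`U)^{-1} = C^{-1}   */
finish block_draw;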

Slide 38: Example: a split plot in time (data from Kuehl, 2000, p. 493)

An experiment designed to explore mechanisms for early detection of phlebitis during amiodarone therapy.
 - Three intravenous treatments: (A1) amiodarone, (A2) the vehicle solution only, (A3) a saline solution.
 - 5 rabbits per treatment in a completely randomized design.
 - 4 repeated measures per animal (30 min. intervals).

Slide 39: SAS data step

data ear;
   input trt rabbit time temp;
   y = temp;
   A = trt;
   B = time;
   trtrabbit = compress(trt||'_'||rabbit);
   wholeplot = trtrabbit;
cards;
1 1 1 -0.3
1 1 2 -0.2
1 1 3 1.2
1 1 4 3.1
1 2 1 -0.5
1 2 2 2.2
1 2 3 3.3
1 2 4 3.7
... etc.

Slide 40: The data ("spaghetti plot").

Slide 41: Profile (interaction) means plots.

Slide 42: A split plot model assumption for repeated measures

[Diagram: rabbits nested within treatment, with times 1-4 measured on each rabbit.]
RABBIT IS THE EXPERIMENTAL UNIT FOR TREATMENT; RABBIT IS THE BLOCK FOR TIME.

Slide 43: Suppose the compound symmetry (CS) assumption is appropriate

CONDITIONAL SPECIFICATION: model variation between experimental units (i.e., rabbits). This is a partially nested or split-plot design: for treatments, the rabbit is the experimental unit; for time, the rabbit is the block.

Slide 44: Analytical (non-simulation) inference based on PROC MIXED

Let's assume the variance components are "known" (held at 0.1 and 0.6) and flat priors on the fixed effects, p(β) ∝ 1.

title 'Split Plot in Time using Mixed';
title2 'Known Variance Components';
proc mixed data=ear noprofile;
   class trt time rabbit;
   model temp = trt time trt*time / solution;
   random rabbit(trt);
   parms (0.1) (0.6) / hold = 1,2;
   ods output solutionf = solutionf;
run;
proc print data=solutionf;
   where estimate ne 0;
run;

Slide 45: (Partial) output

Obs   Effect       trt   time   Estimate   StdErr   DF
 1    Intercept    _     _       0.2200    0.3742   12
 2    trt          1     _       2.3600    0.5292   12
 3    trt          2     _      -0.2200    0.5292   12
 5    time         _     1      -0.9000    0.4899   36
 6    time         _     2       0.0200    0.4899   36
 7    time         _     3      -0.6400    0.4899   36
 9    trt*time     1     1      -1.9200    0.6928   36
10    trt*time     1     2      -1.2200    0.6928   36
11    trt*time     1     3      -0.0600    0.6928   36
13    trt*time     2     1       0.3200    0.6928   36
14    trt*time     2     2      -0.5400    0.6928   36
15    trt*time     2     3       0.5800    0.6928   36

Slide 46: MCMC inference

First set up dummy variables, based on the "zero out the last level" restrictions (the corner parameterization implicit in SAS linear models software):

proc transreg data=ear design order=data;
   model class(trt|time / zero=last);
   id y trtrabbit;
   output out=recodedsplit;
run;
proc print data=recodedsplit (obs=10);
   var intercept &_trgind;
run;

Slide 47: Partial output (first two rabbits)

The printed output shows the first 10 rows of the recoded, full-rank design matrix X: an intercept column plus 0/1 dummy columns for trt1, trt2, time1-time3, and the trt*time interactions, along with the original trt, time, y, and trtrabbit variables.

Slide 48: MCMC using PROC IML

proc iml;
seed = &seed;
nburnin = 5000;        /* number of burn-in samples                        */
total = 200000;        /* total number of Gibbs cycles beyond burn-in      */
thin = 10;             /* saving every "thin"-th cycle                     */
ncycle = total/thin;   /* leaving a total of ncycle saved samples          */

Full code available online.

Slide 49: Key subroutine (univariate sampling)

start gibbs;                        /* univariate Gibbs sampler */
   do j = 1 to dim;                 /* dim = p + q */
      /* generate from full conditionals for fixed and random effects */
      solt = wry[j] - coeff[j,]*solution + coeff[j,j]*solution[j];
      solt = solt/coeff[j,j];       /* conditional mean     */
      vt = 1/coeff[j,j];            /* conditional variance */
      solution[j] = solt + sqrt(vt)*rannor(seed);
   end;
finish gibbs;

Slide 50: Output samples to a SAS data set called soldata

proc means mean median std data=soldata;
run;

ods graphics on;
%tadplot(data=soldata, var=_all_);
ods graphics off;

%tadplot is a SAS autocall macro suited for processing MCMC samples.

Slide 51: Comparisons for fixed effects

MCMC (some Monte Carlo error):
Variable      Mean      Median    Std Dev   N
int           0.218               0.374     20000
TRT1          2.365     2.368     0.526     20000
TRT2         -0.220    -0.215     0.532     20000
TIME1        -0.902    -0.903     0.495     20000
TIME2         0.0225    0.0203    0.491     20000
TIME3        -0.640    -0.643     0.488     20000
TRT1*TIME1   -1.915    -1.916     0.692     20000
TRT1*TIME2   -1.224    -1.219     0.690     20000
TRT1*TIME3   -0.063    -0.066     0.696     20000
TRT2*TIME1    0.321     0.316     0.701     20000
TRT2*TIME2   -0.543    -0.540     0.696     20000
TRT2*TIME3    0.580     0.589     0.694     20000

EXACT (PROC MIXED):
Effect       trt   time   Estimate   StdErr
Intercept    _     _       0.2200    0.3742
trt          1     _       2.3600    0.5292
trt          2     _      -0.2200    0.5292
time         _     1      -0.9000    0.4899
time         _     2       0.0200    0.4899
time         _     3      -0.6400    0.4899
trt*time     1     1      -1.9200    0.6928
trt*time     1     2      -1.2200    0.6928
trt*time     1     3      -0.0600    0.6928
trt*time     2     1       0.3200    0.6928
trt*time     2     2      -0.5400    0.6928
trt*time     2     3       0.5800    0.6928

Slide 52: %TADPLOT output for the "intercept": trace plot, autocorrelation plot, and posterior density.

Slide 53: Marginal/cell means

The effects on the previous 2-3 slides are not of particular interest in themselves. Marginal means can be derived using the same contrast vectors used to compute least squares means in PROC GLM/MIXED/GLIMMIX, etc.:

   lsmeans trt time trt*time / e;

 - μ_Ai: marginal mean for trt i
 - μ_Bj: marginal mean for time j
 - μ_AiBj: cell mean for trt i, time j

Slide 54: Examples of marginal/cell means

Each marginal mean (e.g., μ_A1) and cell mean (e.g., μ_A1B1) is a linear combination k'θ of the fixed effects, so a posterior sample of the mean is obtained by applying k' to each saved MCMC sample of θ. See the sketch below.
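
For example, a hedged sketch of how the trt 1 marginal mean could be computed from each saved sample, using the same contrast coefficients PROC MIXED uses for that least squares mean under the corner parameterization (the column names below are assumptions about what soldata contains; adjust to the actual names produced by the full online code):

data mmeans;
   set soldata;
   /* marginal mean for trt 1 = intercept + trt1 + average over the 4 times;        */
   /* time4 and its trt1 interaction are zeroed out in the corner parameterization  */
   mu_A1 = int + TRT1 + (TIME1 + TIME2 + TIME3)/4
               + (TRT1TIME1 + TRT1TIME2 + TRT1TIME3)/4;
run;

proc means data=mmeans mean median std;
   var mu_A1;     /* should track the PROC MIXED value of about 1.40 */
run;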

Slide 55: Marginal/cell ("LS") means

MCMC (Monte Carlo error):
Variable    Mean      Median    Std Dev
A1           1.403     1.401    0.223
A2          -0.293    -0.292    0.223
A3          -0.162              0.224
B1          -0.501    -0.500    0.216
B2           0.366     0.365    0.213
B3           0.465     0.466    0.217
B4           0.932     0.931    0.216
A1B1        -0.234    -0.231    0.373
A1B2         1.382              0.371
A1B3         1.880     1.878    0.374
A1B4         2.583              0.372
A2B1        -0.584    -0.585    0.375
A2B2        -0.524    -0.526    0.373
A2B3        -0.062    -0.058    0.373
A2B4        -0.003    -0.005    0.377
A3B1        -0.684              0.377
A3B2         0.240     0.242    0.374
A3B3        -0.422    -0.423    0.376
A3B4         0.218              0.374

EXACT (PROC MIXED):
trt   time   Estimate     Standard Error
1            1.4000       0.2236
2           -0.2900       0.2236
3           -0.1600       0.2236
      1     -0.5000       0.2160
      2      0.3667       0.2160
      3      0.4667       0.2160
      4      0.9333       0.2160
1     1     -0.2400       0.3742
1     2      1.3800       0.3742
1     3      1.8800       0.3742
1     4      2.5800       0.3742
2     1     -0.5800       0.3742
2     2     -0.5200       0.3742
2     3     -0.0600       0.3742
2     4     -3.61E-16     0.3742
3     1     -0.6800       0.3742
3     2      0.2400       0.3742
3     3     -0.4200       0.3742
3     4      0.2200       0.3742

Slide 56: Posterior densities of μ_A1, μ_B1, μ_A1B1

Dotted lines: normal density inferences based on PROC MIXED. Solid lines: MCMC.

Slide 57: Generalized linear mixed models (probit link model)

Stage 1: the data distribution (Bernoulli, probit link). Stage 2: the distribution of the random effects. Stage 3: priors on the fixed effects and remaining parameters.

Slide 58: Rethinking the prior on β

A flat prior might not be the best idea for binary data, especially when the data are "sparse". Animal breeders call this the "extreme category problem":
 - e.g., if all of the responses in a fixed-effects subclass are either 1 or 0, then the ML/posterior-mode estimate of the corresponding marginal mean will approach -/+ ∞.
 - PROC LOGISTIC has the FIRTH option for this very reason.
Alternative: a proper normal prior on β. Typically, 16 < σ²_β < 50 is probably sufficient on the underlying latent scale (conditionally N(0,1)).

Slide 59: Recall the latent variable concept (Albert and Chib, 1993)

Suppose that for animal i the latent liability is ℓi = xi'β + zi'u + ei with ei ~ N(0,1), and that yi = 1 if ℓi > 0 and yi = 0 otherwise. Then P(yi = 1 | β, u) = Φ(xi'β + zi'u).

Slide 60: Data augmentation with ℓ = {ℓi}

Conditional on ℓi, the distribution of Yi becomes degenerate (a point mass): yi = 1 if ℓi > 0, and yi = 0 otherwise.

Slide 61: Rewrite the hierarchical model

Stage 1a: yi | ℓi is a point mass at 1 if ℓi > 0 and at 0 otherwise.
Stage 1b: ℓi | β, u ~ N(xi'β + zi'u, 1).
Those two stages define the likelihood function.

Slide 62: Joint posterior density

Now the joint posterior is proportional to p(y | ℓ) p(ℓ | β, u) p(β) p(u | σ²u). Let's for now assume σ²u is known.

Slide 63: FCD for the liabilities

ℓi | β, u, y ~ normal with mean xi'β + zi'u and variance 1, truncated to (0, ∞) if yi = 1 and to (-∞, 0] if yi = 0; i.e., draw from truncated normals.
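
A minimal data-step sketch of one such truncated-normal draw by inversion (the variable mu stands for the current value of xi'β + zi'u; the names and values are illustrative, not from the original code):

data liab;
   seed1 = 777;
   mu = 0.4; y = 1;                                  /* illustrative values          */
   p0 = probnorm(0 - mu);                            /* P(liability <= 0 | mu)       */
   if y = 1 then u = p0 + ranuni(seed1)*(1 - p0);    /* restrict to (0, infinity)    */
   else          u = ranuni(seed1)*p0;               /* restrict to (-infinity, 0]   */
   liability = mu + probit(u);                       /* inverse-CDF draw             */
run;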

Slide 64: FCD (cont'd)

Fixed and random effects: with the current liabilities ℓ playing the role of the data, the FCD for β and u have the same normal forms as in the linear mixed model of slides 35-36, where the mixed-model coefficient matrix now has R = I.

Slide 65: Alternative sampling strategies for fixed and random effects

 1. Joint multivariate draw from p(β, u | ℓ, y): faster mixing, but computationally expensive.
 2. Univariate draws from the FCD using partitioned matrix results (refer to slides 36, 37, and 49): slower mixing.

Slide 66: Recall the "binarized" RCBD data.

Slide 67: MCMC analysis

5000 burn-in cycles; 500,000 additional cycles, saving every 10th (50,000 saved cycles). Full conditional univariate sampling on the fixed and random effects. "Known" σ²u = 0.50. Remember: there is no σ²e (the liability residual variance is fixed at 1).

Slide 68: Fixed effect comparison of inferences (conditional on "known" σ²u = 0.50)

MCMC:
Variable     Mean      Median    Std Dev   N
intercept    0.349     0.345     0.506     50000
DIET1       -0.659    -0.654     0.640     50000
DIET2        0.761     0.750     0.682     50000
DIET3       -0.993               0.649     50000
DIET4        0.760     0.753     0.686     50000

PROC GLIMMIX (Solutions for Fixed Effects):
Effect      diet   Estimate   Standard Error
Intercept          0.3097     0.4772
diet        1     -0.5935     0.5960
diet        2      0.6761     0.6408
diet        3     -0.9019     0.6104
diet        4      0.6775     0.6410
diet        5      0          .

Slide 69: Marginal mean comparisons based on K'β

MCMC:
Variable   Mean      Median    Std Dev   N
mm1       -0.310    -0.302     0.499     50000
mm2        1.110     1.097     0.562     50000
mm3       -0.651    -0.644     0.515     50000
mm4        1.109     1.092     0.563     50000
mm5        0.349     0.345     0.506     50000

PROC GLIMMIX (diet Least Squares Means):
diet   Estimate   Standard Error
1     -0.2838     0.4768
2      0.9858     0.5341
3     -0.5922     0.4939
4      0.9872     0.5343
5      0.3097     0.4772

Slide 70: Diet 1 marginal mean (μ + α1).

Slide 71: Posterior density discrepancy between MCMC and empirical Bayes for the diet marginal means

Dotted lines: normal approximation based on PROC GLIMMIX. Solid lines: MCMC. Do we run the risk of overstating precision with conventional methods?

Slide 72: How about probabilities of success? i.e., Φ(K'β), the normal CDF of the marginal means

PROC GLIMMIX (delta method):
diet   Estimate   Standard Error   Mean     Standard Error Mean
1     -0.2838     0.4768           0.3883   0.1827
2      0.9858     0.5341           0.8379   0.1311
3     -0.5922     0.4939           0.2769   0.1653
4      0.9872     0.5343           0.8382   0.1309
5      0.3097     0.4772           0.6216   0.1815

MCMC:
Variable   Mean     Median   Std Dev   N
prob1      0.391    0.381    0.173     20000
prob2      0.833    0.864    0.126     20000
prob3      0.282    0.260    0.157     20000
prob4      0.833    0.863    0.126     20000
prob5      0.623    0.635    0.173     20000

Slide 73: Comparison of posterior densities for the diet marginal mean probabilities

Dotted lines: normal approximation based on PROC GLIMMIX. Solid lines: MCMC. The largest discrepancies occur along the boundaries.

Slide 74: Posterior densities of Φ(μ + α1) and Φ(μ + α2).

Slide 75: Posterior density of Φ(μ + α2) - Φ(μ + α1)

Probability( Φ(μ + α2) - Φ(μ + α1) < 0 ) = 0.0164; "two-tailed" P-value analogue = 2*0.0164 = 0.0328 (computed from the saved samples, as sketched below).

prob21_diff          Frequency   Percent
prob21_diff < 0         819       1.64
prob21_diff >= 0      49181      98.36
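
A sketch of how this posterior probability can be read off the saved samples, assuming they sit in a data set (called soldata here) containing the marginal-mean columns mm1 and mm2 from slide 69; the data set and variable names are assumptions about the full online code:

data probdiff;
   set soldata;
   prob21_diff = probnorm(mm2) - probnorm(mm1);   /* difference of success probabilities */
   negative = (prob21_diff < 0);
run;

/* the proportion of samples below zero estimates P( Phi(mu+alpha2) - Phi(mu+alpha1) < 0 ) */
proc freq data=probdiff;
   tables negative;
run;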

Slide 76: How does that compare with PROC GLIMMIX?

Estimates:
Label                Estimate   Standard Error   DF      t Value   Pr > |t|   Mean     Standard Error Mean
diet 1 lsmean       -0.2838     0.4768           10000   -0.60     0.5517     0.3883   0.1827
diet 2 lsmean        0.9858     0.5341           10000    1.85     0.0650     0.8379   0.1311
diet1 vs diet2 dif  -1.2697     0.6433           10000   -1.97     0.0484     Non-est.

Recall, we assumed "known" σ²u, hence a normal rather than t-distributed test statistic.

Slide 77: What if the variance components are not known?

Specify priors on the variance components. Options:
 1. Conjugate: scaled inverted chi-square, denoted χ⁻²(νm, νm s²m).
 2. Flat (and bounded as well?).
 3. Gelman's (2006) prior.

Slide 78: Relationship between the scaled inverted chi-square and inverted gamma distributions

The scaled inverted chi-square distribution is a reparameterization of the inverted gamma; Gelman's prior can be viewed as a limiting special case (see the correspondence below).
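
In symbols (my reconstruction of the correspondence, written up to normalizing constants):

\sigma^2 \sim \text{Scale-inv-}\chi^2(\nu, s^2):\qquad
   p(\sigma^2) \propto (\sigma^2)^{-(\nu/2+1)} \exp\!\left(-\frac{\nu s^2}{2\sigma^2}\right)

\sigma^2 \sim \text{Inv-Gamma}(a, b):\qquad
   p(\sigma^2) \propto (\sigma^2)^{-(a+1)} \exp\!\left(-\frac{b}{\sigma^2}\right)

\text{so that } a = \nu/2 \text{ and } b = \nu s^2/2.

If Gelman's (2006) uniform-on-σ prior is the one meant here, it corresponds formally to the limiting case ν = -1 with νs² = 0, i.e., p(σ²) ∝ (σ²)^{-1/2}.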

Slide 79: Gibbs sampling and mixed effects models

Recall the hierarchical (mixed effects) model: a normal data stage given β and u with residual variance σ²e, a normal distribution on u with variance σ²u, and priors on β and the two variance components.

Slide 80: Joint posterior density and FCD

The FCD for β and u are the same (normal) as before; the FCD for the variance components are scaled inverted chi-square (χ⁻²). A sketch of the draws is given below.
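
A hedged IML sketch of those variance component draws in the form they are usually sampled (following the conjugate results in Sorensen and Gianola, 1995): with a χ⁻²(ν, νs²) prior, the full conditional is again scaled inverted chi-square, so a draw is the updated scale divided by a chi-square deviate. The current u and e values and the hyperparameters below are illustrative; in the Gibbs sampler they come from the current cycle:

proc iml;
seed = 1234;
u = {0.2, -0.1, 0.3, -0.4, 0.05};        /* q x 1 current draw of random effects        */
e = {0.5, -0.7, 0.2, 0.1, -0.3, 0.4};    /* n x 1 current residuals y - X*beta - Z*u    */
nu_u = 4;  s2_u = 0.1;                   /* prior: sigma2_u ~ chi^-2(nu_u, nu_u*s2_u)   */
nu_e = 4;  s2_e = 0.5;                   /* prior: sigma2_e ~ chi^-2(nu_e, nu_e*s2_e)   */

/* FCD: sigma2_u | u  proportional to  (u`u + nu_u*s2_u) / chi-square(q + nu_u) */
call rangam(seed, (nrow(u)+nu_u)/2, g);  /* 2*g ~ chi-square with q + nu_u df   */
sigma2u = (ssq(u) + nu_u*s2_u)/(2*g);

/* FCD: sigma2_e | e  proportional to  (e`e + nu_e*s2_e) / chi-square(n + nu_e) */
call rangam(seed, (nrow(e)+nu_e)/2, g);
sigma2e = (ssq(e) + nu_e*s2_e)/(2*g);
print sigma2u sigma2e;
quit;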

Slide 81: Back to the split plot in time example

Empirical Bayes (EGLS based on REML):

title 'Split Plot in Time using Mixed';
title2 'UnKnown Variance Components';
proc mixed data=ear covtest;
   class trt time rabbit;
   model temp = trt time trt*time / solution;
   random rabbit(trt);
   ods output solutionf = solutionf;
run;
proc print data=solutionf;
   where estimate ne 0;
run;

Fully Bayes: 5000 burn-in cycles, 200,000 subsequent cycles, saving every 10th post burn-in, using Gelman's prior on the variance components. Code available online.

Slide 82: Variance component inference

PROC MIXED (Covariance Parameter Estimates):
Cov Parm       Estimate   Standard Error   Z Value   Pr > Z
rabbit(trt)    0.08336    0.09910          0.84      0.2001
Residual       0.5783     0.1363           4.24      <.0001

MCMC:
Variable   Mean    Median   Std Dev   N
sigmau     0.127   0.0869   0.141     20000
sigmae     0.632   0.611    0.150     20000

Slide 83: MCMC plots (random effects variance and residual variance).

Slide 84: Estimated effects ± SE (SD)

PROC MIXED:
Effect       trt   time   Estimate   StdErr
Intercept    _     _       0.22      0.3638
trt          1     _       2.36      0.5145
trt          2     _      -0.22      0.5145
time         _     1      -0.90      0.4810
time         _     2       0.02      0.4810
time         _     3      -0.64      0.4810
trt*time     1     1      -1.92      0.6802
trt*time     1     2      -1.22      0.6802
trt*time     1     3      -0.06      0.6802
trt*time     2     1       0.32      0.6802
trt*time     2     2      -0.54      0.6802
trt*time     2     3       0.58      0.6802

MCMC:
Variable      Mean      Median    Std Dev   N
intercept     0.217     0.214     0.388     20000
TRT1          2.363     2.368     0.550     20000
TRT2         -0.220    -0.219     0.550     20000
TIME1        -0.898    -0.893     0.499     20000
TIME2         0.0206    0.0248    0.502     20000
TIME3        -0.640    -0.635     0.501     20000
TRT1*TIME1   -1.924    -1.931     0.708     20000
TRT1*TIME2   -1.222    -1.220     0.710     20000
TRT1*TIME3   -0.057               0.715     20000
TRT2*TIME1    0.318     0.315     0.711     20000
TRT2*TIME2   -0.540    -0.541     0.711     20000
TRT2*TIME3    0.585     0.589     0.710     20000

Slide 85: Marginal ("least squares") means

PROC MIXED (Least Squares Means):
Effect      trt   time   Estimate     Standard Error   DF
trt         1            1.4000       0.2135           12
trt         2           -0.2900       0.2135           12
trt         3           -0.1600       0.2135           12
time              1     -0.5000       0.2100           36
time              2      0.3667       0.2100           36
time              3      0.4667       0.2100           36
time              4      0.9333       0.2100           36
trt*time    1     1     -0.2400       0.3638           36
trt*time    1     2      1.3800       0.3638           36
trt*time    1     3      1.8800       0.3638           36
trt*time    1     4      2.5800       0.3638           36
trt*time    2     1     -0.5800       0.3638           36
trt*time    2     2     -0.5200       0.3638           36
trt*time    2     3     -0.0600       0.3638           36
trt*time    2     4      4.44E-16     0.3638           36
trt*time    3     1     -0.6800       0.3638           36
trt*time    3     2      0.2400       0.3638           36
trt*time    3     3     -0.4200       0.3638           36
trt*time    3     4      0.2200       0.3638           36

MCMC:
Variable   Mean      Median    Std Dev
A1          1.399     1.401    0.240
A2         -0.292    -0.290    0.237
A3         -0.160    -0.161    0.236
B1         -0.502    -0.501    0.224
B2          0.364     0.363    0.222
B3          0.467     0.466    0.224
B4          0.934     0.936    0.222
A1B1       -0.244    -0.246    0.389
A1B2        1.378     1.379    0.391
A1B3        1.882     1.880    0.391
A1B4        2.581     2.584    0.391
A2B1       -0.586              0.393
A2B2       -0.526    -0.525    0.385
A2B3       -0.058    -0.054    0.387
A2B4        0.0031    0.0017   0.386
A3B1       -0.676    -0.678    0.388
A3B2        0.239     0.241    0.386
A3B3       -0.422    -0.427    0.392
A3B4        0.219     0.216    0.385

Slide 86: Posterior densities of μ_A1, μ_B1, μ_A1B1

Dotted lines: t densities based on estimates and standard errors from PROC MIXED. Solid lines: MCMC.

Slide 87: How about fully Bayesian inference in generalized linear mixed models?

Probit link GLMM:
 - Extensions to handle unknown variance components are exactly the same given the augmented liability variables, i.e., the scaled inverted chi-square is conjugate for σ²u.
 - There is no "overdispersion" (σ²e) to contend with for binary data. But stay tuned for binomial/Poisson data!

Slide 88: Analysis of the "binarized" RCBD data

Empirical Bayes:

title 'Posterior inference conditional on unknown VC';
proc glimmix data=binarize;
   class litter diet;
   model y = diet / covb solution dist=bin link=probit;
   random litter;
   lsmeans diet / diff ilink;
   estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / ilink;
   estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / ilink;
   estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0;
run;

Fully Bayes: 10,000 burn-in cycles, 200,000 cycles thereafter, saving every 10th, with Gelman's prior on the variance component.

Slide 89: Inferences on the variance component

PROC GLIMMIX, Covariance Parameter Estimates:
Method = RSPL:      Estimate 0.5783,  Standard Error 0.5021
Method = Laplace:   Estimate 0.6488,  Standard Error 0.6410
Method = Quad:      Estimate 0.6662,  Standard Error 0.6573

MCMC (Analysis Variable: sigmau):
Mean    Median   Std Dev   N
2.048   1.468    2.128     20000

Slide 90: Inferences on the marginal means (μ + αi)

PROC GLIMMIX, Method = Laplace (diet Least Squares Means):
diet   Estimate   Standard Error   DF
1     -0.3024     0.5159           36
2      1.0929     0.5964           36
3     -0.6428     0.5335           36
4      1.0946     0.5976           36
5      0.3519     0.5294           36

MCMC:
Variable   Mean     Median   Std Dev   N
mm1       -0.297   -0.301    0.643     20000
mm2        1.322    1.283    0.716     20000
mm3       -0.697   -0.690    0.662     20000
mm4        1.319    1.285    0.720     20000
mm5        0.465    0.442    0.671     20000

The MCMC posterior standard deviations are larger: they take into account the uncertainty on the variance components.

Slide 91: Posterior densities of (μ + αi)

Dotted lines: t36 densities based on estimates and standard errors from PROC GLIMMIX (method=laplace). Solid lines: MCMC.

Slide 92: MCMC inferences on the probabilities of "success", based on Φ(μ + αi).

Slide 93: MCMC inferences on marginal (population-averaged) probabilities

Potentially big issues with empirical Bayes inference here: it depends on the quality of the variance component inference and on asymptotics.

Slide 94: Inference on Diet 1 vs. Diet 2 probabilities

PROC GLIMMIX (Estimates, inverse-link scale):
Label                 Mean       Standard Error Mean
diet 1 lsmean         0.3812     0.1966
diet 2 lsmean         0.8628     0.1309
diet1 vs diet2 dif    Non-est.   (P-value for the difference = 0.0559)

MCMC:
Variable     Mean    Median   Std Dev   N
Prob diet1   0.400   0.382    0.212     20000
Prob diet2   0.857   0.899    0.137     20000
Prob diff    0.457   0.464    0.207     20000

prob21_diff          Frequency   Percent
prob21_diff < 0         180        0.90
prob21_diff >= 0      19820       99.10

Probability( Φ(μ + α2) - Φ(μ + α1) < 0 ) = 0.0090 ("one-tailed").

Slide 95: Any formal comparisons between GLS/REML/EB (M/PQL) and MCMC for GLMM?

Check Browne and Draper (2006).
 - Normal data (LMM): generally, inferences based on GLS/REML and on MCMC are sufficiently close; since GLS/REML is faster, it is the method of choice under classical assumptions.
 - Non-normal data (GLMM): quasi-likelihood based methods are particularly problematic in terms of bias of point estimates and interval coverage for variance components, with side effects on fixed effects inference. Bayesian methods with diffuse priors are well calibrated for both properties for all parameters. Comparisons with Laplace have not been done yet.

Slide 96: A pragmatic take on using MCMC vs. PL for GLMM under classical assumptions

If datasets are too small to warrant asymptotic considerations, then the experiment is likely to be poorly powered; otherwise, PL inference might be approximately equal to MCMC inference. However, differences could depend on dimensionality, deviation of the data distribution from normal, and complexity of the design. The real big advantage of MCMC is multi-stage hierarchical models (see later).

Slide 97: Implications of design on fully Bayes vs. PL inference for GLMM?

RCBD: it is known for the LMM that inferences on treatment differences in an RCBD are resilient to the estimate of the block variance component. Is inference on differences in treatment effects thereby insensitive to VC inference in the GLMM as well? Whole-plot treatment factor comparisons in split plot designs likely show greater sensitivity (i.e., to the whole-plot VC). And how sensitive is inference for conditional versus "population-averaged" probabilities?

Slide 98: Ordinal categorical data

Back to the GF83 data.
 - The Gibbs sampling strategy was laid out by Sorensen and Gianola (1995) and Albert and Chib (1993).
 - It involves simple extensions of what was considered earlier for linear/probit mixed models.

Slide 99: Joint posterior density

Built up in stages, as for the binary probit model: stages 1A and 1B define the likelihood through the liabilities and thresholds, stage 2 covers the fixed and random effects, and stage 3 is a prior on the variance component (or something diffuse).

Slide 100: Anything different in the FCD compared to the binary probit model?

Liabilities: still truncated normals, but now truncated between the thresholds that bound the observed category. Thresholds: their FCD are uniform between adjacent liabilities, which leads to painfully slow mixing; a better strategy is based on Metropolis sampling (Cowles et al., 1996).
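
For reference, the threshold full conditional alluded to here has the standard Albert and Chib (1993) uniform form (my reconstruction), which is what makes the mixing so slow when category counts are large:

\tau_j \mid \boldsymbol{\ell}, \mathbf{y}, \text{rest} \;\sim\;
   \text{Uniform}\!\left( \max\{\ell_i : y_i = j\},\; \min\{\ell_i : y_i = j+1\} \right),

i.e., each threshold must fall between the largest liability in category j and the smallest liability in category j+1, an interval that is typically very narrow with many observations.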

Slide 101: Fully Bayesian inference on GF83

5000 burn-in samples; 50,000 samples post burn-in, saving every 10th. Diagnostic plots for σ²u.

Slide 102: Posterior summaries

Variable          Mean      Median    Std Dev   5th Pctl   95th Pctl
intercept        -0.222    -0.198     0.669     -1.209      0.723
hy                0.236     0.223     0.396     -0.399      0.894
age              -0.036    -0.035     0.392     -0.690      0.598
sex              -0.172    -0.171     0.393     -0.818      0.480
sire1            -0.082    -0.042     0.587                 0.734
sire2             0.116     0.0491    0.572     -0.641      0.937
sire3             0.194     0.106     0.625     -0.640      1.217
sire4            -0.173    -0.110     0.606     -1.118      0.595
sigmau            1.362     0.202     8.658      0.002     14.148
thresh2           0.830     0.804     0.302      0.383      1.366
probfemalecat1    0.598     0.609     0.188      0.265      0.885
probfemalecat2    0.827     0.864     0.148      0.530      0.986
probmalecat1      0.539     0.545     0.183      0.230      0.836
probmalecat2      0.790     0.821     0.154      0.491      0.974

Slide 103: Posterior densities of the sex-specific cumulative probabilities (first two categories)

How would you interpret a "standard error" in this context?

Slide 104: Posterior densities of the sex-specific probabilities (each category).

Slide 105: What if some FCD are not recognizable?

Examples: Poisson mixed models, logistic mixed models. Hmmm... we need a different strategy:
 - Use Gibbs sampling whenever you can.
 - Use Metropolis-Hastings sampling for FCD that are not recognizable.
NEXT!

