
1 §❹ The Bayesian Revolution: Markov Chain Monte Carlo (MCMC). Robert J. Tempelman. Applied Bayesian Inference, KSU, April 29, 2012.

2 Simulation-based inference. Suppose you're interested in the following integral/expectation: E[g(x)] = ∫ g(x) f(x) dx, where f(x) is a density and g(x) is a function. You can draw random samples x1, x2, …, xn from f(x) and then compute the Monte Carlo estimate ĝ = (1/n) Σ g(xi), with Monte Carlo standard error s_g/√n (s_g being the sample standard deviation of the g(xi)). As n → ∞, ĝ → E[g(x)].
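Not part of the original slides: a minimal SAS sketch of this estimator, in the same style as the examples that follow. The density f(x) (standard normal), the function g(x) = x², and the sample size are arbitrary illustrative choices; here E[g(x)] = 1, and the PROC MEANS standard error of the mean of g is the Monte Carlo standard error s_g/√n.

data mc;
   seed1 = 1234;
   do j = 1 to 10000;               /* n Monte Carlo draws                  */
      x = rannor(seed1);            /* x drawn from f(x), a standard normal */
      g = x*x;                      /* g(x) of interest; E[x**2] = 1        */
      output;
   end;
run;

proc means data=mc mean stderr;     /* MEAN estimates E[g(x)]; STDERR is the Monte Carlo standard error */
   var g;
run;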

3 Beauty of Monte Carlo methods. You can determine the distribution of any function of the random variable(s). Distribution summaries include: means, medians, key percentiles (2.5%, 97.5%), standard deviations, etc. Generally more reliable than using the "delta method", especially for highly non-normal distributions.

4 Using the method of composition for sampling (Tanner, 1996). Involves two stages of sampling. Example: suppose Y_i | λ_i ~ Poisson(λ_i) and, in turn, λ_i | α, β ~ Gamma(α, β). Then, marginally, Y_i follows a negative binomial distribution with mean α/β and variance (α/β)(1 + β⁻¹).

5 Using the method of composition for sampling from the negative binomial: 1. Draw λ_i | α, β ~ Gamma(α, β). 2. Draw Y_i | λ_i ~ Poisson(λ_i).
data new;
   seed1 = 2; alpha = 2; beta = 0.25;
   do j = 1 to 10000;
      call rangam(seed1,alpha,x);      /* x ~ Gamma(alpha, scale 1)        */
      lambda = x/beta;                 /* lambda ~ Gamma(alpha, rate beta) */
      call ranpoi(seed1,lambda,y);     /* y | lambda ~ Poisson(lambda)     */
      output;
   end;
run;
proc means mean var; var y; run;
The MEANS Procedure output (values not reproduced here) agrees with E(y) = α/β = 2/0.25 = 8 and Var(y) = (α/β)(1 + β⁻¹) = 8×(1+4) = 40.

6 Another example? Student t. 1. Draw λ_i | ν ~ Gamma(ν/2, ν/2). 2. Draw t_i | λ_i ~ Normal(0, 1/λ_i). Then t ~ Student t with ν degrees of freedom.
data new;
   seed1 = 29523; df=4;
   do j = 1 to 10000;            /* loop limit missing in the source; 10000 used as a placeholder */
      call rangam(seed1,df/2,x);
      lambda = x/(df/2);         /* lambda ~ Gamma(df/2, rate df/2), mean 1 */
      t = rannor(seed1)/sqrt(lambda);
      output;
   end;
run;
proc means mean var p5 p95; var t; run;
data new; t5 = tinv(.05,4); t95 = tinv(.95,4); run;
proc print; run;
(The simulated mean, variance, and 5th/95th percentiles, and the exact t quantiles t5 and t95, are not reproduced here.)

7 Expectation-Maximization (EM). OK, I know that EM is NOT a simulation-based inference procedure; however, it is based on data augmentation. It is an important progenitor of Markov chain Monte Carlo (MCMC) methods. Recall the plant genetics example.

8 Data augmentation. Augment the "data" by splitting the first cell (probability (2+θ)/4) into two cells with probabilities ½ and θ/4, giving 5 categories: counts x1 and x2 (with x1 + x2 = y1) plus y2, y3, y4. The resulting complete-data likelihood in θ is proportional to θ^(x2+y4)(1−θ)^(y2+y3) — looks like a Beta distribution (kernel) to me!

9 Data augmentation (cont'd). So the joint distribution of the "complete" data is multinomial over the five cells with probabilities ½, θ/4, (1−θ)/4, (1−θ)/4, θ/4. Consider the part just including the "missing data": given y1 and θ, the split of y1 into (x1, x2) is binomial, e.g., x2 | θ, y ~ Binomial(y1, θ/(θ+2)).

10 Expectation-Maximization. Start with the complete-data log-likelihood: l_c(θ) = (x2 + y4) log θ + (y2 + y3) log(1−θ) + constant. 1. Expectation (E-step): replace x2 by its conditional expectation E[x2 | y, θ^(t)] = y1 θ^(t)/(θ^(t)+2).

11 2. Maximization step. Use first or second derivative methods to maximize. Set the derivative to 0: (E[x2] + y4)/θ − (y2 + y3)/(1−θ) = 0, which gives θ^(t+1) = (E[x2] + y4)/(E[x2] + y2 + y3 + y4).

12 Recall the data.
Genotype   Probability   Data (counts)
A_B_       (2+θ)/4       y1 = 1997
aaB_       (1−θ)/4       y2 = 906
A_bb       (1−θ)/4       y3 = 904
aabb       θ/4           y4 = 32
0 ≤ θ ≤ 1. θ → 0: close linkage in repulsion; θ → 1: close linkage in coupling.

13 PROC IML code:
proc iml;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   theta = 0.20;                          /* starting value */
   do iter = 1 to 20;
      Ex2 = y1*(theta)/(theta+2);         /* E-step */
      theta = (Ex2+y4)/(Ex2+y2+y3+y4);    /* M-step */
      print iter theta;
   end;
run;
(The printed iteration history of iter and theta is not reproduced here.) Slower than Newton-Raphson/Fisher scoring, but generally more robust to poorer starting values.

14 How do we derive an asymptotic standard error using EM? From Louis (1982), the observed information is the conditional expectation of the complete-data information minus the variance of the complete-data score, both taken over the missing data given the observed data: I(θ̂; y) = E[−∂²l_c/∂θ² | y, θ̂] − Var[∂l_c/∂θ | y, θ̂]. Given: the complete-data log-likelihood above and x2 | y, θ ~ Binomial(y1, θ/(θ+2)).

15 Finish off. Now E[−∂²l_c/∂θ² | y, θ̂] = (E[x2 | y, θ̂] + y4)/θ̂² + (y2 + y3)/(1−θ̂)², and Var[∂l_c/∂θ | y, θ̂] = Var(x2 | y, θ̂)/θ̂², with Var(x2 | y, θ̂) = y1 p(1−p) and p = θ̂/(θ̂+2). Hence the asymptotic standard error is SE(θ̂) = 1/√I(θ̂; y).
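Not part of the original slides: a minimal PROC IML sketch of how Louis's (1982) identity can be applied to this example, assuming the complete-data log-likelihood (x2+y4)log θ + (y2+y3)log(1−θ) and the binomial conditional distribution of x2 given above; the EM iteration is simply re-run so that the sketch is self-contained.

proc iml;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   theta = 0.20;                                       /* starting value */
   do iter = 1 to 50;                                  /* run EM to (effective) convergence */
      Ex2 = y1*theta/(theta+2);
      theta = (Ex2+y4)/(Ex2+y2+y3+y4);
   end;
   p = theta/(theta+2);                                /* x2 | y, theta ~ Binomial(y1, p)   */
   Ex2 = y1*p;
   Icomp  = (Ex2+y4)/theta##2 + (y2+y3)/(1-theta)##2;  /* E[complete-data information | y]  */
   Vscore = y1*p*(1-p)/theta##2;                       /* Var[complete-data score | y]      */
   Iobs = Icomp - Vscore;                              /* Louis (1982): observed information */
   se = 1/sqrt(Iobs);
   print theta se;
quit;

For these counts the standard error works out to roughly 0.006, consistent with the sigma value used for the normal overlay on the histogram slide further below.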

16 Stochastic Data Augmentation (Tanner, 1996). Posterior identity: p(θ | y) = ∫ p(θ | x, y) p(x | y) dx. Predictive identity: p(x | y) = ∫ p(x | θ, y) p(θ | y) dθ. Substituting one into the other implies p(θ | y) = ∫ K(θ, φ) p(φ | y) dφ, with transition function K(θ, φ) = ∫ p(θ | x, y) p(x | φ, y) dx for a Markov chain. This suggests an "iterative" method-of-composition approach for sampling.

17 Sampling strategy for p(θ | y). Start somewhere (starting value θ = θ^[0]). Cycle 1: sample x^[1] from p(x | θ^[0], y), then sample θ^[1] from p(θ | x^[1], y). Cycle 2: sample x^[2] from p(x | θ^[1], y), then sample θ^[2] from p(θ | x^[2], y). Etc. It's like alternating stochastic "E-steps" and "M-steps".

18 What are these full conditional densities (FCD)? Recall the "complete" likelihood function and assume the prior on θ is "flat". The FCD are: x | θ, y ~ Binomial(n = y1, p = 2/(θ+2)), where x is the augmented count in the ½ cell, and θ | x, y ~ Beta(α = y1 − x + y4 + 1, β = y2 + y3 + 1).

19 IML code for the chained data augmentation example:
proc iml;
   seed1=4;
   ncycle = 10000;                       /* total number of samples */
   theta = j(ncycle,1,0);
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   beta = y2+y3+1;
   theta[1] = ranuni(seed1);             /* starting value: initial draw between 0 and 1 */
   do cycle = 2 to ncycle;
      p = 2/(2+theta[cycle-1]);
      xvar= ranbin(seed1,y1,p);          /* draw the augmented count x                   */
      alpha = y1+y4-xvar+1;
      xalpha = rangam(seed1,alpha);
      xbeta = rangam(seed1,beta);
      theta[cycle] = xalpha/(xalpha+xbeta);   /* Beta(alpha,beta) draw as a ratio of gammas */
   end;
   create parmdata var {theta xvar };
   append;
run;
data parmdata;
   set parmdata;
   cycle = _n_;
run;

20 Trace plot.
proc gplot data=parmdata;
   plot theta*cycle;
run;
Burn-in? With a "bad" starting value, one should discard the first "few" samples to ensure that one is truly sampling from p(θ | y); the starting value should have no impact ("convergence in distribution"). How to decide on this stuff? Cowles and Carlin (1996). Here, throw away the first 1000 samples as "burn-in".

21 Histogram of samples post burn-in.
proc univariate data=parmdata ;
   where cycle > 1000;
   var theta ;
   histogram/normal(color=red mu= sigma=0.0060);
run;
(The mu= value for the normal overlay is not preserved here.) Bayesian inference: N = 9000 saved samples, with posterior mean, posterior standard deviation, and quantiles compared against the asymptotic likelihood inference (observed Bayesian quantiles vs. quantiles from the normal asymptotic approximation); numeric values not reproduced here.

22 Zooming in on the trace plot. Hints of autocorrelation — expected with Markov chain Monte Carlo simulation schemes. The number of drawn samples is NOT equal to the number of independent draws. The greater the autocorrelation, the greater the problem: you need more samples!

23 Sample autocorrelation.
proc arima data=parmdata plots(only)=series(acf);
   where cycle > 1000;
   identify var= theta nlag=1000 outcov=autocov ;
run;
(The "Autocorrelation Check for White Noise" table — To Lag, Chi-Square, DF, Pr > ChiSq, autocorrelations — is not reproduced here.)

24 How to estimate the effective number of independent samples (effective sample size, ESS). Consider the posterior mean based on m samples: θ̄ = (1/m) Σ θ^(j). Initial positive sequence estimator of its variance (Geyer, 1992; Sorensen and Gianola, 1995): Var(θ̄) ≈ [−γ(0) + 2 Σ_t Γ(t)]/m, where γ(k) is the lag-k autocovariance and Γ(t) = γ(2t) + γ(2t+1) is the sum of adjacent lag autocovariances.

25 Initial positive sequence estimator. Choose the truncation point so that all included Γ(t) are positive (i.e., stop at the first negative Γ(t)); the effective sample size is then ESS = γ(0)/Var(θ̄). SAS PROC MCMC chooses a slightly different cutoff (see its documentation). Extensive autocorrelation across lags leads to a smaller ESS.

26 SAS code.
%macro ESS1(data,variable,startcycle,maxlag);
data _null_;
   set &data nobs=_n;
   call symputx('nsample',_n);
run;
proc arima data=&data ;
   where cycle > &startcycle;            /* 'cycle' indexes the saved MCMC samples in &data */
   identify var= &variable nlag=&maxlag outcov=autocov ;
run;
proc iml;
   use autocov;
   read all var{'COV'} into cov;
   nsample = &nsample;
   nlag2 = nrow(cov)/2;
   Gamma = j(nlag2,1,0);
   cutoff = 0;
   t = 0;
   do while (cutoff = 0);
      t = t+1;
      Gamma[t] = cov[2*(t-1)+1] + cov[2*(t-1)+2];   /* Gamma(t) = gamma(2t) + gamma(2t+1) */
      if Gamma[t] < 0 then cutoff = 1;
      if t = nlag2 then do;
         print "Too much autocorrelation";
         print "Specify a larger max lag";
         stop;
      end;
   end;                                  /* end of do while loop */
   varm = (-Cov[1] + 2*sum(Gamma)) / nsample;
   ESS = Cov[1]/varm;                    /* effective sample size */
   stdm = sqrt(varm);                    /* Monte Carlo standard error */
   parameter = "&variable";
   print parameter stdm ESS;
run;
%mend ESS1;
Recall: 9000 MCMC post burn-in cycles.

27 Executing %ESS1: %ESS1(parmdata,theta,1000,1000); Recall: 1000 MCMC burn-in cycles. The printed output gives the Monte Carlo standard error (stdm) and ESS for theta (stdm not reproduced here); ESS ≈ 2967, i.e., the 9000 saved samples carry information equivalent to drawing 2967 independent draws from the density.

28 How large an ESS should I target? Routinely, in the thousands or greater — it depends on what you want to estimate. Recommend no less than 100 for estimating "typical" location parameters (mean, median, etc.), and several times that for "typical" dispersion parameters like variances. Want to provide key percentiles, i.e., the 2.5th and 97.5th? Then you need an ESS in the thousands! See Raftery and Lewis (1992) for further direction.

29 Worthwhile to consider this sampling strategy? Not too much difference, if any, from likelihood inference here. But how about smaller samples? e.g., y1 = 200, y2 = 91, y3 = 90, y4 = 3 — a different story.

30 Gibbs sampling: origins (Geman and Geman, 1984). Gibbs sampling was first developed in statistical physics in relation to a spatial inference problem. Problem: a true image θ was corrupted by a stochastic process to produce an observable image y (data). Objective: restore or estimate the true image θ in light of the observed image y. Inference on θ was based on the Markov random field joint posterior distribution, through successively drawing from updated FCD, which were rather easy to specify; these FCD each happened to be Gibbs distributions. The misnomer has been used ever since to describe a rather general process.

31 Gibbs sampling. An extension of chained data augmentation to the case of several unknown parameters. Consider p = 3 unknown parameters with joint posterior density p(θ1, θ2, θ3 | y). Gibbs sampling is the MCMC sampling strategy for the case where all FCD are recognizable: p(θ1 | θ2, θ3, y), p(θ2 | θ1, θ3, y), and p(θ3 | θ1, θ2, y).

32 Gibbs sampling: the process. 1) Start with some "arbitrary" starting values (but within the allowable parameter space). 2) Draw θ1^[1] from p(θ1 | θ2^[0], θ3^[0], y). 3) Draw θ2^[1] from p(θ2 | θ1^[1], θ3^[0], y). 4) Draw θ3^[1] from p(θ3 | θ1^[1], θ2^[1], y). 5) Repeat steps 2)-4) m times. Steps 2-4 constitute one cycle of Gibbs sampling; m is the length of the Gibbs chain, and (after convergence) one cycle yields one random draw from p(θ1, θ2, θ3 | y).

33 General extension of Gibbs sampling. When there are d parameters and/or blocks of parameters: again specify starting values θ1^(0), …, θd^(0), then sample from the FCD in each cycle: sample θ1^(k+1) from p(θ1 | θ2^(k), …, θd^(k), y); sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^(k), …, θd^(k), y); …; sample θd^(k+1) from p(θd | θ1^(k+1), …, θ_(d−1)^(k+1), y). Generically, sample each θi from its FCD given the most recently drawn values of all other parameters (see the sketch below).
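Not part of the original slides: a minimal PROC IML sketch of a two-parameter Gibbs sampler, using simulated normal data with unknown mean (mu) and variance (sigma2). The flat prior on mu and the prior p(sigma2) ∝ 1/sigma2 are illustrative assumptions chosen so that both FCD are recognizable: mu | sigma2, y is normal and sigma2 | mu, y is a scaled inverted chi-square.

proc iml;
   seed = 123;
   n = 50;
   y = j(n,1,0);
   do i = 1 to n;
      y[i] = 10 + 2*rannor(seed);                  /* simulated data: N(10, 4)   */
   end;
   ybar = y[:];
   ncycle = 5000;
   mu = j(ncycle,1,0);
   sigma2 = j(ncycle,1,1);                         /* starting value sigma2 = 1  */
   do k = 2 to ncycle;
      /* FCD of mu | sigma2, y :  Normal(ybar, sigma2/n)                          */
      mu[k] = ybar + sqrt(sigma2[k-1]/n)*rannor(seed);
      /* FCD of sigma2 | mu, y :  scaled inverted chi-square, drawn as SS/chi2(n) */
      ss = ssq(y - mu[k]);
      chisq = 2*rangam(seed, n/2);                 /* chi-square(n) deviate       */
      sigma2[k] = ss/chisq;
   end;
   create gibbs var {mu sigma2};
   append;
quit;

Post-burn-in summaries (PROC MEANS/UNIVARIATE) and ESS diagnostics would then be applied to the saved mu and sigma2 draws exactly as on the earlier slides.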

34 Throw away enough burn-in samples (the first k cycles) so that the remaining draws can be treated as samples from the joint posterior density.

35 Mixed model example with known variance components and a flat prior on β. Recall: y = Xβ + Zu + e, where u ~ N(0, G) and e ~ N(0, R) (e.g., G = Aσ²_u and R = Iσ²_e). Write θ = (β′, u′)′; i.e., with the variance components known and p(β) ∝ 1, WE ALREADY KNOW THE JOINT POSTERIOR DENSITY: p(θ | y) is multivariate normal, centered at the mixed model equation solutions with covariance matrix C⁻¹ (C being the mixed-model-equations coefficient matrix on the R⁻¹ scale).

36 FCD for the mixed effects model with known variance components. OK, it's really pointless to use MCMC here, but let's demonstrate. It can be shown that the FCD are univariate normal: θ_i | θ_(−i), y ~ N(θ̂_i, 1/C_ii), with θ̂_i = (r_i − Σ_(j≠i) C_ij θ_j)/C_ii, where r_i is the i-th element of the right-hand side W′R⁻¹y, C_ij is the element in the i-th row and j-th column of the coefficient matrix, and C_ii is its i-th diagonal element.

37 Two ways to sample β and u. 1. Block draw from p(β, u | y): faster MCMC mixing (less/no autocorrelation across MCMC cycles), but slower computing time (depending on the dimension of θ), i.e., it requires a Cholesky-type factorization involving C; some alternative strategies are available (Garcia-Cortes and Sorensen, 1995). 2. Series of univariate draws from the FCD: faster computationally, but slower MCMC mixing. Partial solution: "thinning" the MCMC chain, e.g., saving every 10th cycle rather than every cycle.
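Not from the original slides: a sketch of option 1 (the block draw), assuming a coefficient matrix coeff (C) and right-hand side wry (W′R⁻¹y) have already been formed as in the univariate sampler shown later (Slide 49). It draws the whole vector θ from N(C⁻¹W′R⁻¹y, C⁻¹) using the factor returned by ROOT(); for large problems one would avoid the explicit inverse, e.g., via the strategies of Garcia-Cortes and Sorensen (1995).

start blockdraw(coeff, wry, seed);
   Cinv = inv(coeff);                 /* posterior covariance  C^{-1}                    */
   mn = Cinv*wry;                     /* posterior mean = mixed model equation solutions */
   U = root(Cinv);                    /* upper triangular U with U`U = C^{-1}            */
   dim = nrow(coeff);
   z = j(dim,1,0);
   do j = 1 to dim;
      z[j] = rannor(seed);            /* independent standard normal deviates            */
   end;
   return( mn + U`*z );               /* one joint draw from N(mn, C^{-1})               */
finish blockdraw;

/* usage inside the Gibbs loop:  solution = blockdraw(coeff, wry, seed); */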

38 Example: a split plot in time (data from Kuehl, 2000, p. 493). An experiment designed to explore mechanisms for early detection of phlebitis during amiodarone therapy. Three intravenous treatments: (A1) amiodarone, (A2) the vehicle solution only, (A3) a saline solution. 5 rabbits per treatment in a completely randomized design, with 4 repeated measures per animal (30-min intervals).

39 SAS data step.
data ear;
   input trt rabbit time temp;
   y = temp;
   A = trt;
   B = time;
   trtrabbit = compress(trt||'_'||rabbit);
   wholeplot=trtrabbit;
cards;
   etc. (data lines omitted)

40 The data ("spaghetti plot"). [Figure: individual rabbit ear-temperature profiles over time; not reproduced here.]

41 Profile (interaction) means plots. [Figure not reproduced here.]

42 A split plot model assumption for repeated measures. Schematic: within Treatment 1, Rabbits 1, 2, and 3 are each measured at Times 1-4. THE RABBIT IS THE EXPERIMENTAL UNIT FOR TREATMENT; THE RABBIT IS THE BLOCK FOR TIME.

43 Suppose a compound symmetry (CS) assumption was appropriate. CONDITIONAL SPECIFICATION: model the variation between experimental units (i.e., rabbits), e.g., temp_ijk = μ + trt_i + rabbit(trt)_ij + time_k + (trt×time)_ik + e_ijk, with rabbit(trt)_ij ~ N(0, σ²_u) and e_ijk ~ N(0, σ²_e). This is a partially nested or split-plot design: for treatments, the rabbit is the experimental unit; for time, the rabbit is the block!

44 Analytical (non-simulation) inference based on PROC MIXED. Let's assume the variance components are "known" (σ²_u = 0.1, σ²_e = 0.6, held fixed via PARMS) and use flat priors on the fixed effects, p(β) ∝ 1.
title 'Split Plot in Time using Mixed';
title2 'Known Variance Components';
proc mixed data=ear noprofile;
   class trt time rabbit;
   model temp = trt time trt*time /solution;
   random rabbit(trt);
   parms (0.1) (0.6) /hold = 1,2;
   ods output solutionf = solutionf;
run;
proc print data=solutionf;
   where estimate ne 0;
run;

45 (Partial) Output. The SolutionF table lists estimates, standard errors, and DF for the intercept, trt 1, trt 2, time 1-3, and the six trt*time interaction terms (numeric values not reproduced here).

46 MCMC inference. First set up the dummy variables (based on the zero-out-the-last-level restrictions), i.e., the corner parameterization implicit in SAS linear models software:
/* Based on the zero out last level restrictions */
proc transreg data=ear design order =data;
   model class(trt|time / zero=last);
   id y trtrabbit;
   output out=recodedsplit;
run;
proc print data=recodedsplit (obs=10);
   var intercept &_trgind;
run;

47 Partial output (first two rabbits). Columns: Obs, _NAME_, Intercept, trt1, trt2, time1-time3, trt1*time1-trt1*time3, trt2*time1-trt2*time3, trt, time, y, trtrabbit — part of the (full-rank) X matrix. (Printed rows not reproduced here.)

48 MCMC using PROC IML.
proc iml;
   seed = &seed;
   nburnin = 5000;      /* number of burn-in samples                                         */
   total = ;            /* total number of Gibbs cycles beyond burn-in (value missing in the source) */
   thin= 10;            /* saving every "thin"-th cycle                                      */
   ncycle = total/thin; /* leaving a total of ncycle saved samples                           */
Full code available online.

49 Key subroutine (univariate sampling).
start gibbs;                  /* univariate Gibbs sampler */
   do j = 1 to dim;           /* dim = p + q              */
      /* generate from the full conditionals for the fixed and random effects:              */
      /* FCD mean = (r_j - sum over k^=j of C[j,k]*solution[k])/C[j,j], variance = 1/C[j,j]  */
      solt = wry[j] - coeff[j,]*solution + coeff[j,j]*solution[j];
      solt = solt/coeff[j,j];
      vt = 1/coeff[j,j];
      solution[j] = solt + sqrt(vt)*rannor(seed);
   end;
finish gibbs;

50 Output the samples to a SAS data set called soldata.
proc means mean median std data=soldata;
run;
ods graphics on;
%tadplot(data=soldata, var=_all_);
ods graphics off;
%TADPLOT is a SAS autocall macro well suited for post-processing MCMC samples.

51 Comparisons for the fixed effects: MCMC (some Monte Carlo error) vs. EXACT (PROC MIXED). The MCMC table reports the mean, median, standard deviation, and N of the saved samples for the intercept, TRT1, TRT2, TIME1-TIME3, and the TRT×TIME terms, alongside the PROC MIXED estimates and standard errors (numeric values not reproduced here).

52 %TADPLOT output for the "intercept": trace plot, autocorrelation plot, and posterior density. [Plots not reproduced here.]

53 Marginal/cell means. The effects on the previous 2-3 slides are not of particular interest in themselves. Marginal means can be derived using the same contrast vectors that are used to compute least squares means in PROC GLM/MIXED/GLIMMIX, etc.: lsmeans trt time trt*time / e; Here μ_Ai is the marginal mean for trt i, μ_Bj is the marginal mean for time j, and μ_AiBj is the cell mean for trt i, time j.

54 Examples of marginal/cell means. With the corner parameterization, a cell mean is μ_AiBj = intercept + trt_i + time_j + (trt×time)_ij (terms for the reference levels set to zero), a treatment marginal mean averages the cell means over the four times, μ_Ai = (1/4) Σ_j μ_AiBj, and a time marginal mean averages over the three treatments, μ_Bj = (1/3) Σ_i μ_AiBj.
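Not from the original slides: a sketch of how such means can be computed draw-by-draw from the saved MCMC samples (the full code is online). The data set soldata is the one created on Slide 50; the variable names intercept, trt1, time1-time3, and trt1time1-trt1time3 are assumptions based on the comparison tables on the neighboring slides.

data lsmeans_mcmc;
   set soldata;                                     /* saved draws of the fixed effects (assumed names) */
   /* cell means for trt 1 (corner parameterization: last levels set to zero) */
   muA1B1 = intercept + trt1 + time1 + trt1time1;
   muA1B2 = intercept + trt1 + time2 + trt1time2;
   muA1B3 = intercept + trt1 + time3 + trt1time3;
   muA1B4 = intercept + trt1;                       /* time 4 is the reference level  */
   muA1   = (muA1B1 + muA1B2 + muA1B3 + muA1B4)/4;  /* marginal ("LS") mean for trt 1 */
run;

proc means data=lsmeans_mcmc mean median std;
   var muA1B1 muA1;
run;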

55 Marginal/cell ("LS") means: MCMC (Monte Carlo error) vs. EXACT (PROC MIXED). Posterior means, medians, and standard deviations for A1-A3, B1-B4, and the twelve cell means AiBj are compared against the PROC MIXED estimates and standard errors (numeric values not reproduced here).

56 Posterior densities of μ_A1, μ_B1, and μ_A1B1. Dotted lines: normal density inferences based on PROC MIXED; solid lines: MCMC. [Figure not reproduced here.]

57 Generalized linear mixed models (probit link model). Stage 1: y_i | β, u ~ Bernoulli(p_i) with p_i = Φ(x_i′β + z_i′u). Stage 2: u | σ²_u ~ N(0, Aσ²_u). Stage 3: priors on β (and, later, on σ²_u).

58 Rethinking the prior on β. A flat prior p(β) ∝ 1 might not be the best idea for binary data, especially when the data are "sparse". Animal breeders call this the "extreme category problem": e.g., if all of the responses in a fixed-effects subclass are either 1 or 0, then the ML estimate/posterior mode of the corresponding marginal mean will approach −/+∞. PROC LOGISTIC has the FIRTH option for this very reason. Alternative: a proper normal prior, β ~ N(0, σ²_β I). Typically a value of σ²_β up to about 50 is probably sufficient on the underlying latent scale (which is conditionally N(0,1)).

59 Recall the latent variable concept (Albert and Chib, 1993). Recall Pr(y_i = 1 | β, u) = Φ(x_i′β + z_i′u). Suppose for animal i we define a latent liability ℓ_i | β, u ~ N(x_i′β + z_i′u, 1) and set y_i = 1 if ℓ_i > 0 (y_i = 0 otherwise). Then Pr(y_i = 1 | β, u) = Pr(ℓ_i > 0 | β, u) = Φ(x_i′β + z_i′u), as before.

60 Data augmentation with ℓ = {ℓ_i}: conditional on ℓ_i, the distribution of y_i becomes degenerate (a point mass), i.e., p(y_i | ℓ_i) = 1(ℓ_i > 0) when y_i = 1 and 1(ℓ_i ≤ 0) when y_i = 0.

61 Rewrite the hierarchical model. Stage 1a) y_i | ℓ_i: degenerate, as above. Stage 1b) ℓ_i | β, u ~ N(x_i′β + z_i′u, 1). Those two stages define the likelihood function.

62 Joint posterior density. Now p(ℓ, β, u | y) ∝ [Π_i p(y_i | ℓ_i) p(ℓ_i | β, u)] p(u | σ²_u) p(β). Let's for now assume σ²_u is known.

63 FCD of the liabilities: ℓ_i | β, u, y is N(x_i′β + z_i′u, 1) truncated to (0, ∞) if y_i = 1 and to (−∞, 0] if y_i = 0 — i.e., draw from truncated normals.
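Not from the original slides: a minimal inverse-CDF sketch for one such truncated normal draw, given the current linear predictor eta = x_i′β + z_i′u (the module name and arguments are illustrative, and no guard against numerically extreme eta is included).

start drawliab(eta, y, seed);
   u = ranuni(seed);
   p0 = probnorm(0 - eta);                    /* Pr(liability <= 0 | eta)              */
   if y = 1 then
      l = eta + probit( p0 + u*(1 - p0) );    /* N(eta,1) truncated to (0, infinity)   */
   else
      l = eta + probit( u*p0 );               /* N(eta,1) truncated to (-infinity, 0]  */
   return(l);
finish drawliab;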

64 FCD (cont'd): fixed and random effects. With the liabilities treated as the "data" and the residual variance fixed at 1, the FCD of β and u are exactly the normal FCD of the linear mixed model shown earlier (Slides 36-37), where the coefficient matrix is C = [X′X, X′Z; Z′X, Z′Z + A⁻¹/σ²_u] and the right-hand side is W′ℓ.

65 Alternative sampling strategies for the fixed and random effects. 1. A joint multivariate draw from p(β, u | ℓ, σ²_u, y): faster mixing, but computationally expensive. 2. Univariate draws from the FCD using partitioned matrix results (refer to Slides 36, 37, and 49): slower mixing.

66 Recall the "binarized" RCBD data. [Data display not reproduced here.]

67 MCMC analysis. 5000 burn-in cycles; 500,000 additional cycles, saving every 10th, for 50,000 saved cycles. Full conditional univariate sampling for the fixed and random effects. "Known" σ²_u = 0.50. Remember: there is no σ²_e (the residual variance on the liability scale is fixed at 1).

68 Fixed effect comparison of inferences (conditional on "known" σ²_u = 0.50): MCMC vs. PROC GLIMMIX. The GLIMMIX "Solutions for Fixed Effects" table gives estimates and standard errors for the intercept and diets 1-5 (diet 5 is the reference level), alongside the MCMC posterior means, medians, standard deviations, and N for the intercept and DIET1-DIET4 (numeric values not reproduced here).

69 Marginal mean comparisons based on K′β. PROC GLIMMIX diet least squares means (estimates and standard errors) vs. MCMC posterior summaries for mm1-mm5 (numeric values not reproduced here).

70 Diet 1 marginal mean (μ + τ1). [Plots not reproduced here.]

71 Posterior density discrepancy between MCMC and empirical Bayes for the diet marginal means? Dotted lines: normal approximation based on PROC GLIMMIX; solid lines: MCMC. Do we run the risk of overstating precision with conventional methods? [Figure not reproduced here.]

72 How about probabilities of success? i.e., Φ(K′β), the normal CDF of the marginal means. MCMC posterior summaries for prob1-prob5 are compared with the PROC GLIMMIX estimates and delta-method standard errors on the probability scale (numeric values not reproduced here).
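Not from the original slides: a sketch of how the MCMC column of this comparison can be produced, by applying the normal CDF to each saved draw of the marginal means. The data set name dietdraws and the variable names mm1-mm5 are assumptions that follow the labels in the tables above.

data probdata;
   set dietdraws;                 /* assumed data set of saved marginal-mean draws */
   prob1 = probnorm(mm1);         /* success probability for diet 1, per draw      */
   prob2 = probnorm(mm2);
run;

proc means data=probdata mean median std;
   var prob1 prob2;
run;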

73 Comparison of posterior densities for the diet marginal mean probabilities. Dotted lines: normal approximation based on PROC GLIMMIX; solid lines: MCMC. The largest discrepancies occur along the boundaries (probabilities near 0 or 1). [Figure not reproduced here.]

74 Posterior densities of Φ(μ + τ1) and Φ(μ + τ2). [Figure not reproduced here.]

75 Posterior density of Φ(μ + τ2) − Φ(μ + τ1). Probability(Φ(μ + τ2) − Φ(μ + τ1) < 0) and the corresponding "two-tailed" P-value (twice that probability) are read from the frequency table of prob21_diff < 0 versus prob21_diff ≥ 0 (numeric values not reproduced here).
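Not from the original slides: a sketch of how such a posterior probability can be read off the saved draws (continuing the assumed probdata data set from the sketch above); the proportion of draws with a negative difference estimates Pr(Φ(μ + τ2) − Φ(μ + τ1) < 0 | y).

data probdata;
   set probdata;
   prob21_diff = prob2 - prob1;
   negative = (prob21_diff < 0);
run;

proc freq data=probdata;
   tables negative;               /* the percent in the '1' row estimates the posterior probability */
run;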

76 How does that compare with PROC GLIMMIX? The ESTIMATE table reports the diet 1 lsmean, diet 2 lsmean, and the diet 1 vs. diet 2 difference with standard errors, DF, t values, and Pr > |t|, plus the inverse-linked means and their standard errors; the difference on the probability scale is reported as non-estimable (numeric values not reproduced here). Recall, we assumed "known" σ²_u, hence a normal rather than a t-distributed test statistic.

77 What if the variance components are not known? Specify priors on the variance components. Options? 1. Conjugate (scaled inverted chi-square), denoted χ⁻²(ν_m, ν_m s²_m). 2. Flat (and bounded as well?). 3. Gelman's (2006) prior.

78 Relationship between the scaled inverted chi-square and the inverted gamma. Scaled inverted chi-square: p(σ² | ν, s²) ∝ (σ²)^−(ν/2+1) exp(−νs²/(2σ²)). Inverted gamma: the same density with shape α = ν/2 and scale β = νs²/2. Gelman's (2006) prior — a uniform prior on the standard deviation σ — corresponds to the improper limiting case ν = −1, s² = 0.

79 Gibbs sampling and mixed effects models. Recall the following hierarchical model: y | β, u, σ²_e ~ N(Xβ + Zu, Iσ²_e); u | σ²_u ~ N(0, Aσ²_u); p(β) ∝ 1; σ²_u ~ χ⁻²(ν_u, ν_u s²_u); σ²_e ~ χ⁻²(ν_e, ν_e s²_e).

80 Joint posterior density and FCD. The FCD for β and u are the same as before: normal. The FCD for the variance components are scaled inverted chi-squares (χ⁻²): e.g., σ²_u | else ~ χ⁻²(q + ν_u, u′A⁻¹u + ν_u s²_u) and σ²_e | else ~ χ⁻²(n + ν_e, e′e + ν_e s²_e), with e = y − Xβ − Zu.
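Not from the original slides: a minimal PROC IML fragment for one such variance component update, assuming the current random-effect vector u, independent levels (A = I), and hyperparameters nu_u and s2u are already in memory; the chi-square deviate is formed as 2×Gamma(df/2).

   /* FCD draw of sigma2_u: inverse chi-square with df = q + nu_u and scale u'u + nu_u*s2u */
   q = nrow(u);
   ssu = u`*u + nu_u*s2u;                 /* u'A^{-1}u + nu_u*s2u, with A = I here */
   df = q + nu_u;
   chisq = 2*rangam(seed, df/2);          /* chi-square(df) deviate                */
   sigma2u = ssu/chisq;
   /* sigma2_e is updated analogously from the residuals e = y - X*beta - Z*u      */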

81 Back to the split plot in time example. Empirical Bayes (EGLS based on REML):
title 'Split Plot in Time using Mixed';
title2 'UnKnown Variance Components';
proc mixed data=ear covtest ;
   class trt time rabbit;
   model temp = trt time trt*time /solution;
   random rabbit(trt);
   ods output solutionf = solutionf;
run;
proc print data=solutionf;
   where estimate ne 0;
run;
Fully Bayes: 5000 burn-in cycles, followed by additional cycles (count not reproduced here), saving every 10th post burn-in, with Gelman's prior on the variance components. Code available online.

82 Variance component inference. PROC MIXED covariance parameter estimates (estimate, standard error, Z value, Pr > Z) for rabbit(trt) and Residual vs. MCMC posterior summaries (mean, median, standard deviation, N) for sigmau and sigmae (numeric values not reproduced here).

83 MCMC plots for the random effects variance and the residual variance. [Trace/density plots not reproduced here.]

84 Estimated effects ± SE (posterior SD). PROC MIXED estimates and standard errors for the intercept, trt, time, and trt*time effects vs. MCMC posterior means, medians, standard deviations, and N (numeric values not reproduced here).

85 Least squares means. PROC MIXED LSMEANS (estimates, standard errors, DF) for trt, time, and trt*time vs. MCMC marginal ("least squares") means for A1-A3, B1-B4, and the cell means AiBj, with μ_A1, μ_B1, and μ_A1B1 highlighted (numeric values not reproduced here).

86 Posterior densities of μ_A1, μ_B1, and μ_A1B1. Dotted lines: t densities based on the estimates/standard errors from PROC MIXED; solid lines: MCMC. [Figure not reproduced here.]

87 How about fully Bayesian inference in generalized linear mixed models? For the probit link GLMM, the extensions to handle unknown variance components are exactly the same given the augmented liability variables, i.e., the scaled inverted chi-square is conjugate for σ²_u. There is no "overdispersion" (no σ²_e) to contend with for binary data — but stay tuned for binomial/Poisson data!

88 Analysis of the "binarized" RCBD data. Empirical Bayes:
title 'Posterior inference conditional on unknown VC';
proc glimmix data=binarize;
   class litter diet;
   model y = diet / covb solution dist=bin link = probit;
   random litter;
   lsmeans diet / diff ilink;
   estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / ilink;   /* 5 diet levels; diet 5 is the reference */
   estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / ilink;
   estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0;
run;
Fully Bayes: burn-in cycles and cycles thereafter (counts not reproduced here), saving every 10th, with Gelman's prior on the VC.

89 Inferences on the VC. PROC GLIMMIX covariance parameter estimates (estimate, standard error) under METHOD=RSPL, METHOD=LAPLACE, and METHOD=QUAD vs. the MCMC posterior summary (mean, median, standard deviation, N) for sigmau (numeric values not reproduced here).

90 Inferences on the marginal means (μ + τ_i). PROC GLIMMIX (METHOD=LAPLACE) diet least squares means (estimate, standard error, DF) vs. MCMC posterior summaries for mm1-mm5 (numeric values not reproduced here). The MCMC posterior standard deviations are larger: they take into account the uncertainty about the variance components.

91 Posterior densities of (μ + τ_i). Dotted lines: t_36 densities based on the estimates and standard errors from PROC GLIMMIX (METHOD=LAPLACE); solid lines: MCMC. [Figure not reproduced here.]

92 MCMC inferences on the probabilities of "success", based on Φ(μ + τ_i). [Figure not reproduced here.]

93 MCMC inferences on the marginal (population-averaged) probabilities, based on the population-averaged form (for a probit link, Φ((μ + τ_i)/√(1 + σ²_u))). Potentially big issues with empirical Bayes inference here — it depends upon the quality of the VC inference and on asymptotics!
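Not from the original slides: a sketch of one way to form these per draw, assuming saved draws of the marginal means (mm1, mm2, …) and of the variance component (sigmau, taken here to hold the draw of σ²_u) in a data set named dietdraws2; the Φ(·/√(1+σ²_u)) form is the standard probit population-averaging result rather than a formula confirmed by the slide.

data margprob;
   set dietdraws2;                               /* assumed data set of saved draws         */
   pa_prob1 = probnorm( mm1/sqrt(1 + sigmau) );  /* population-averaged probability, diet 1 */
   pa_prob2 = probnorm( mm2/sqrt(1 + sigmau) );
run;

proc means data=margprob mean median std;
   var pa_prob1 pa_prob2;
run;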

94 Inference on the diet 1 vs. diet 2 probabilities. PROC GLIMMIX: the diet 1 lsmean and diet 2 lsmean inverse-linked means and standard errors, with the diet 1 vs. diet 2 difference reported as non-estimable on the probability scale. MCMC: posterior summaries for Prob diet 1, Prob diet 2, and Prob diff, together with Probability(Φ(μ + τ2) − Φ(μ + τ1) < 0) ("one-tailed") and the corresponding P-value from the frequency table of prob21_diff < 0 versus prob21_diff ≥ 0 (numeric values not reproduced here).

95 Any formal comparisons between GLS/REML/EB (M/PQL) and MCMC for GLMM? Check Browne and Draper (2006). Normal data (LMM): generally, inferences based on GLS/REML and MCMC are sufficiently close; since GLS/REML is faster, it is the method of choice under classical assumptions. Non-normal data (GLMM): quasi-likelihood based methods are particularly problematic in terms of bias of point estimates and interval coverage for variance components, with side effects on the fixed effects inference; Bayesian methods with diffuse priors are well calibrated for both properties for all parameters. Comparisons with Laplace had not been done yet.

96 A pragmatic take on using MCMC vs. PL for GLMM under classical assumptions? If a dataset is too small to warrant asymptotic considerations, then the experiment is likely to be poorly powered; otherwise, PL might ≈ MCMC inference. However, differences could depend on dimensionality, deviation of the data distribution from normal, and complexity of the design. The real big advantage of MCMC is multi-stage hierarchical models (see later).

97 Implications of design on fully Bayes vs. PL inference for GLMM? RCBD: it is known for the LMM that inferences on treatment differences in an RCBD are resilient to the estimates of the block VC — is inference on differences in treatment effects thereby insensitive to VC inference in the GLMM? Whole-plot treatment factor comparisons in split plot designs? Greater sensitivity (i.e., to the whole-plot VC). Also: sensitivity of inference for conditional versus "population-averaged" probabilities?

98 Ordinal categorical data. Back to the GF83 data. The Gibbs sampling strategy was laid out by Sorensen and Gianola (1995) and Albert and Chib (1993) — simple extensions of what was considered earlier for linear/probit mixed models.

99 Joint posterior density. Stages: 1A) y_i given its liability ℓ_i and the thresholds (degenerate: the observed category is determined by which thresholds bound ℓ_i); 1B) ℓ_i | β, u ~ N(x_i′β + z_i′u, 1); plus the usual stages for u | σ²_u and the priors on β, the thresholds, and σ²_u (or something diffuse).

100 Anything different in the FCD compared to the binary probit? Liabilities: each ℓ_i is now drawn from a normal truncated between the two thresholds that bound its observed category. Thresholds: the FCD of each threshold is uniform, bounded by the largest liability among observations in the category below it and the smallest liability in the category above it. This leads to painfully slow mixing — a better strategy is based on Metropolis sampling (Cowles et al., 1996).

101 Fully Bayesian inference on GF83. Burn-in samples, post-burn-in samples, and the thinning interval as listed on the slide (counts not reproduced here). Diagnostic plots for σ²_u. [Plots not reproduced.]

102 Posterior summaries (mean, median, standard deviation, 5th and 95th percentiles) for: intercept, hy, age, sex, sire 1-4, sigmau, thresh, and the sex-specific category probabilities (probfemalecat and probmalecat terms). Numeric values not reproduced here.

103 Posterior densities of the sex-specific cumulative probabilities (first two categories). How would you interpret a "standard error" in this context? [Figure not reproduced here.]

104 Posterior densities of the sex-specific probabilities (each category). [Figure not reproduced here.]

105 What if some FCD are not recognizable? Examples: Poisson mixed models, logistic mixed models. Hmm — we need a different strategy: use Gibbs sampling whenever you can, and use Metropolis-Hastings sampling for the FCD that are not recognizable. NEXT!

