
Slide 1: Applied Bayesian Inference, KSU, April 29, 2012. §❸ Empirical Bayes. Robert J. Tempelman.

Slide 2: Origins of hierarchical modeling. A homework question from Mood (1950, p. 164, exercise 23), recounted by Searle et al. (1992): "Suppose intelligence quotients for students in a particular age group are normally distributed about a mean of 100 with standard deviation 15. The IQ, say Y_1, of a particular student is to be estimated by a test on which he scores 130. It is further given that test scores are normally distributed about the true IQ as a mean with standard deviation 5. What is the maximum likelihood estimate of the student's IQ? (The answer is not 130)"

Slide 3: Answer provided by one student (C. R. Henderson). The model: y_j = μ + a_i + e_j, where the true IQ μ + a_i ~ N(100, 15²) and the test measurement error e_j ~ N(0, 5²). This is not really ML, but it does maximize the posterior density of (μ + a_i) | y_j; the observed score is "shrunk" toward the population mean.
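For concreteness, here is the standard normal-normal posterior-mean calculation behind the "shrunk" answer (a worked check of the algebra, not reproduced from the slide itself):

E(\mu + a_i \mid y_j = 130) = 100 + \frac{15^2}{15^2 + 5^2}\,(130 - 100) = 100 + \frac{225}{250}\times 30 = 127.

So the observed score of 130 is pulled 10% of the way back toward the population mean of 100, giving 127 rather than 130.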

Slide 4: Later versions of Mood's textbooks (1963, 1974) were revised: "What is the maximum likelihood estimate?" was replaced by "What is the Bayes estimator?" The homework problem was the inspiration for C. R. Henderson's work on best linear unbiased prediction (BLUP), which was subsequently also referred to as empirical Bayes prediction for linear models.

Slide 5: What is empirical Bayes? An excellent primer: Casella, G. (1985). An introduction to empirical Bayes analysis. The American Statistician 39(2): 83-87.

Slide 6: Casella's problem, specified hierarchically. Suppose we observe t normal random variables y_1, ..., y_t, each a random draw from a normal distribution with its own mean θ_i: y_i | θ_i ~ N(θ_i, σ²). Suppose it is further known (believed) that θ_i ~ N(μ, τ²), i.e., a "random effects model". Here μ and τ² are the hyperparameters.

Slide 7: "ML" solution: θ̂_i = y_i, i.e., each mean estimated by its own observation alone. Bayes estimator of θ_i (the posterior mean): E(θ_i | y_i) = μ + [τ²/(τ² + σ²)](y_i − μ), which shrinks y_i toward the prior mean μ.

Slide 8: What is empirical Bayes? Empirical Bayes = Bayes with the unknown hyperparameters (μ, τ²) replaced by estimates obtained from the data themselves. Does it work?
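A minimal sketch of the plug-in estimator this slide alludes to, in the form commonly used for the equal-variance normal means problem (the particular moment-style estimates of μ and of the shrinkage weight are standard choices, not read off the slide):

\hat{\theta}_i^{\,EB} = \bar{y} + \left(1 - \frac{(t-3)\,\sigma^2}{\sum_{j=1}^{t}(y_j - \bar{y})^2}\right)(y_i - \bar{y}),

where \bar{y} estimates μ and the bracketed factor estimates the shrinkage weight τ²/(τ² + σ²); this is the Stein-type shrinkage estimator discussed on the following slides.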

Slide 9: From Casella (1985). Observed data: batting averages based on the first 45 at-bats for 7 NY Yankees in 1971, alongside each player's "known" batting average. MONEYBALL!

Slide 10: From Casella (1985). The "Stein effect": estimates can be improved by using information from all coordinates when estimating each coordinate (Stein, 1981). Stein ≡ shrinkage-based estimators. (Figure: ML vs. EB estimates of the batting averages, roughly in the 0.200 to 0.300 range.)

Slide 11: When might Bayes/Stein-type estimation be particularly useful?
– When the number of classes (t) is large.
– When the number of observations (n) per class is small.
– When the ratio of τ² (between-class variance) to σ² (residual variance) is small.
"Shrinkage is a good thing" (Allison et al., 2006).

Slide 12: Microarray experiments: a wonderful application of the power of empirical Bayes methods. Microarray analysis in a nutshell:
– conducting t-tests on differential gene expression between two (or more) groups for thousands of different genes;
– multiple-comparison issues are obvious (and inspired research on FDR control).

Slide 13: Can we do better than t-tests? "By sharing the variance estimate across multiple genes, can form a better estimate for the true residual variance of a given gene, and effectively boost the residual degrees of freedom" (Wright and Simon, 2003).

Slide 14: Hierarchical model formulation. Data stage: a sampling distribution for the gene-specific residual variance estimate s²_g given the true residual variance σ²_g (a scaled chi-square on the residual degrees of freedom). Second stage: a conjugate prior on σ²_g with hyperparameters a and b.

Slide 15: Empirical Bayes (EB) estimation. The Bayes estimate of σ²_g is its posterior mean given s²_g. Empirical Bayes = Bayes with estimates of a and b:
– Marginal ML estimation of a and b advocated by Wright and Simon (2003).
– Method of moments might be good if G (the number of genes) is large.
Modify the t-test statistic accordingly, including its posterior degrees of freedom.
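A sketch of the conjugate algebra this slide summarizes, assuming the second-stage prior is inverse gamma, σ²_g ~ IG(a, b), and that s²_g carries d residual degrees of freedom (this parameterization is an assumption; Wright and Simon's exact notation may differ):

s_g^2 \mid \sigma_g^2 \sim \frac{\sigma_g^2}{d}\,\chi^2_d, \qquad
\sigma_g^2 \mid s_g^2 \sim \mathrm{IG}\!\left(a + \tfrac{d}{2},\; b + \tfrac{d\,s_g^2}{2}\right), \qquad
\tilde{\sigma}_g^2 = E(\sigma_g^2 \mid s_g^2) = \frac{2b + d\,s_g^2}{2a + d - 2}.

The moderated t-statistic then replaces s²_g by the shrunken \tilde{\sigma}_g^2, with correspondingly boosted ("posterior") degrees of freedom.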

Slide 16: Observed Type I error rates (from Wright and Simon, 2003). "Pooled": as if the residual variance were the same for all genes.

Slide 17: Power for P < 0.001 and n = 5 (Wright and Simon, 2003).

Slide 18: Power for P < 0.001 and n = 10 (Wright and Simon, 2003). Less of a need for shrinkage with larger n.

Slide 19: "Shrinkage is a good thing" (David Allison, UAB). (Microarray data from MSU.) Volcano plots: −log10(P-value) vs. estimated treatment effect for a simple design (no subsampling), comparing the regular contrast t-test with the shrinkage (on variance) based contrast t-test.

Slide 20: Bayesian inference in the linear mixed model (Lindley and Smith, 1972; Sorensen and Gianola, 2002).
First stage: y ~ N(Xβ + Zu, R) (let R = Iσ²_e).
Second-stage priors:
– Subjective: a prior on β.
– Structural: u | σ²_u ~ N(0, G).
Third-stage priors on the variance components (let G = Aσ²_u; A is known).
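Written out, the three-stage hierarchy on this slide is as follows (the specific subjective prior for β and the third-stage forms are illustrative assumptions; the slide leaves them generic):

y \mid \beta, u, \sigma^2_e \sim N(X\beta + Zu,\; I\sigma^2_e)
\beta \sim N(\beta_0, V_\beta)\ (\text{subjective}), \qquad u \mid \sigma^2_u \sim N(0,\; A\sigma^2_u)\ (\text{structural})
p(\sigma^2_u),\; p(\sigma^2_e)\ (\text{third stage; e.g., flat or scaled inverse chi-square}).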

Slide 21: Starting point for Bayesian inference. Write the joint posterior density: p(β, u, σ²_u, σ²_e | y) ∝ p(y | β, u, σ²_e) p(u | σ²_u) p(β) p(σ²_u) p(σ²_e). Want to make fully Bayesian probability statements on β and u?
– Integrate out the uncertainty on all other unknowns.

Slide 22: Let's suppose (for now) that the variance components are known, i.e., no third-stage prior is necessary: conditional inference. Rewrite p(β, u | y, σ²_u, σ²_e) as proportional to
– Likelihood: p(y | β, u, σ²_e)
– Prior: p(β) p(u | σ²_u)

Slide 23: Bayesian inference with known VC. The joint posterior of β and u is then multivariate normal, centered at the solution of a system of linear equations, with covariance matrix equal to the inverse of the corresponding coefficient matrix. In other words, conditional (on the VC) posterior inference reduces to solving those equations.

Slide 24: Flat prior on β. Note that as the diagonal elements of V_β → ∞, the diagonal elements of V_β⁻¹ → 0. Hence the posterior mean of (β, u) is given by the solution of Henderson's mixed model equations (MME) (Robinson, 1991), shown below.
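For reference, Henderson's MME in the general (R, G) notation used here; with R = Iσ²_e and G = Aσ²_u they reduce to the familiar form in which the variance ratio σ²_e/σ²_u multiplies A⁻¹:

\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}.

The posterior covariance matrix of (β, u) given the VC is the inverse of the coefficient matrix on the left.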

Slide 25: Inference. Write the quantities of interest as K'β + M'u. The posterior density of K'β + M'u is normal, with mean K'β̂ + M'û and variance obtained from the corresponding elements of the inverted MME coefficient matrix (take M = 0 for contrasts involving the fixed effects only).

Slide 26: RCBD example with random blocks. Weight gains of pigs in a feeding trial (Gill, 1978). Block on litters.

Slide 27:
data rcbd;
  input litter diet1-diet5;
  datalines;
1 79.5 80.9 79.1 88.6 95.9
2 70.9 81.8 70.9 88.6 85.9
3 76.8 86.4 90.5 89.1 83.2
4 75.9 75.5 62.7 91.4 87.7
5 77.3 77.3 69.5 75.0 74.5
6 66.4 73.2 86.4 79.5 72.7
7 59.1 77.7 72.7 85.0 90.9
8 64.1 72.3 73.6 75.9 60.0
9 74.5 81.4 64.5 75.5 83.6
10 67.3 82.3 65.9 70.5 63.2
;
data rcbd_2 (drop=diet1-diet5);
  set rcbd;
  diet = 1; gain=diet1; output;
  diet = 2; gain=diet2; output;
  diet = 3; gain=diet3; output;
  diet = 4; gain=diet4; output;
  diet = 5; gain=diet5; output;
run;

Slide 28: RCBD model. Linear model: gain_ij = μ + diet_j + u_i + e_ij, with
– fixed diet effects diet_j,
– random litter effects u_i.
Prior on the random effects: u_i ~ NIID(0, σ²_u), independent of the residuals e_ij ~ NIID(0, σ²_e).

Slide 29: Posterior inference on β and u conditional on known VC.
title 'Posterior inference conditional on known VC';
proc mixed data=rcbd_2;
  class litter diet;
  model gain = diet / covb solution;
  random litter;
  parms (20) (50) / hold = 1,2;
  lsmeans diet / diff;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;
"Known" variance, so tests are based on the normal distribution (arbitrarily large df) rather than Student's t.

Slide 30: Portions of output.
Solution for Fixed Effects
Effect     diet  Estimate
Intercept        79.76
diet       1     -8.58
diet       2     -0.88
diet       3     -6.18
diet       4      2.15
diet       5      0
(Covariance Matrix for Fixed Effects: values not legible in this transcript.)

Slide 31: Posterior densities of marginal means and contrasts.
Label                Estimate  Standard Error  DF    t Value  Pr > |t|
diet 1 lsmean         71.1800      2.6458      1E4    26.90    <.0001
diet 2 lsmean         78.8800      2.6458      1E4    29.81    <.0001
diet1 vs diet2 dif    -7.7000      3.1623      1E4    -2.43    0.0149

Slide 32: Two-stage generalized linear models. Consider again the probit model for binary data:
– Likelihood function
– Priors
– Third-stage prior (if σ²_u were not known)
The likelihood and priors are sketched below.
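A sketch of the two-stage probit setup being referenced (the liability-scale likelihood is standard; the specific prior forms are assumptions consistent with the rest of the slides):

p(y \mid \beta, u) = \prod_{i=1}^{n} \Phi(x_i'\beta + z_i'u)^{\,y_i}\left[1 - \Phi(x_i'\beta + z_i'u)\right]^{\,1-y_i},
\qquad u \mid \sigma^2_u \sim N(0,\; A\sigma^2_u), \qquad p(\beta) \propto \text{constant},

with a third-stage prior p(σ²_u) added only when σ²_u is treated as unknown.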

Slide 33: Joint posterior density for all parameters: p(β, u, σ²_u | y) ∝ p(y | β, u) [stage 1] × p(u | σ²_u) [stage 2] × p(σ²_u) [stage 3], the three-stage model. Let's condition on σ²_u being known (for now), which drops the third stage and leaves the two-stage model p(β, u | y, σ²_u) ∝ p(y | β, u) p(u | σ²_u).

Slide 34: Log joint posterior density. Let's write: log joint posterior = log likelihood + log prior, i.e., L = L_1 + L_2.

Slide 35: Maximize the joint posterior density w.r.t. θ = [β′ u′]′, i.e., compute the joint posterior mode of θ = [β′ u′]′.
– Analogous to pseudo-likelihood (PL) inference in PROC GLIMMIX (also called penalized likelihood).
Fisher scoring/Newton-Raphson is used; refer to §❶ for details on v and W.

Slide 36: PL (or approximate EB). Then, at convergence, the joint posterior of (β, u) given σ²_u is approximated as normal, centered at the joint posterior mode, with covariance matrix given by the inverse of the expected negative Hessian evaluated at the mode.

Slide 37: How software typically sets this up: the Fisher-scoring update can be written as a weighted mixed-model analysis of working "pseudo-variates" (pseudo-data), as sketched below.
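A sketch of that pseudo-data formulation in its usual pseudo-likelihood/PQL form; the exact definitions of v and W live in §❶, so this generic version is stated as an assumption:

y^{*} = X\beta^{(t)} + Zu^{(t)} + W^{-1}v, \qquad
\begin{bmatrix} X'WX & X'WZ \\ Z'WX & Z'WZ + G^{-1} \end{bmatrix}
\begin{bmatrix} \beta^{(t+1)} \\ u^{(t+1)} \end{bmatrix}
=
\begin{bmatrix} X'Wy^{*} \\ Z'Wy^{*} \end{bmatrix},

where v is the gradient of the log-likelihood with respect to the linear predictor and W is its expected negative Hessian; iterating these weighted MME to convergence yields the joint posterior mode.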

Slide 38: Application. Go back to the same RCBD; suppose we binarize the data:
data binarize;
  set rcbd_2;
  y = (gain > 75);
run;

Slide 39: RCBD example with random blocks. Weight gains of pigs in a feeding trial. Block on litters.

Slide 40: PL inference using GLIMMIX code (known VC).
title 'Posterior inference conditional on known VC';
proc glimmix data=binarize;
  class litter diet;
  model y = diet / covb solution dist=bin link=probit;
  random litter;
  parms (0.5) / hold = 1;
  lsmeans diet / diff ilink;
  estimate 'diet 1 lsmean' intercept 1 diet 1 0 0 0 0 / df=10000 ilink;
  estimate 'diet 2 lsmean' intercept 1 diet 0 1 0 0 0 / df=10000 ilink;
  estimate 'diet1 vs diet2 dif' intercept 0 diet 1 -1 0 0 0 / df=10000;
run;
Estimate: on the underlying normal (liability) scale. Mean (from ilink): the estimated probability of success.

Slide 41:
Solutions for Fixed Effects
Effect     diet  Estimate  Standard Error
Intercept         0.3097       0.4772
diet       1     -0.5935       0.5960
diet       2      0.6761       0.6408
diet       3     -0.9019       0.6104
diet       4      0.6775       0.6410
diet       5      0            .

Slide 42:
Covariance Matrix for Fixed Effects
Effect     diet  Row   Col1      Col2      Col3      Col4      Col5      Col6
Intercept         1    0.2277   -0.1778   -0.1766   -0.1782   -0.1766
diet       1      2   -0.1778    0.3552    0.1760    0.1787    0.1760
diet       2      3   -0.1766    0.1760    0.4107    0.1755    0.1784
diet       3      4   -0.1782    0.1787    0.1755    0.3725    0.1755
diet       4      5   -0.1766    0.1760    0.1784    0.1755    0.4109
diet       5      6

Slide 43: Delta method: how well does this generally work?
Estimates
Label                Estimate  Standard Error  DF      t Value  Pr > |t|  Mean     Standard Error Mean
diet 1 lsmean         -0.2838      0.4768      10000    -0.60    0.5517   0.3883      0.1827
diet 2 lsmean          0.9858      0.5341      10000     1.85    0.0650   0.8379      0.1311
diet1 vs diet2 dif    -1.2697      0.6433      10000    -1.97    0.0484   Non-est.
Φ: standard normal cdf.
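The "Mean" and "Standard Error Mean" columns come from applying the inverse link and the delta method to the liability-scale estimate. As a check on the first row (my own arithmetic, not taken from the slide):

\hat{p} = \Phi(\hat{\eta}) = \Phi(-0.2838) \approx 0.3883, \qquad
SE(\hat{p}) \approx \phi(\hat{\eta})\,SE(\hat{\eta}) = \phi(-0.2838)\times 0.4768 \approx 0.383 \times 0.4768 \approx 0.183,

which matches the reported 0.3883 and 0.1827.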

Slide 44: What if the variance components are not known? With the two-stage model, inference on θ₁ = (β, u) is conditional on known VC; with the three-stage model (unknown VC), fully Bayesian inference on θ₁ requires integrating the VC out of the joint posterior, which was essentially NOT POSSIBLE PRE-1990s.

Slide 45: Approximate empirical Bayes (EB) option. Goal: approximate p(θ₁ | y). Note that p(θ₁ | y) = ∫ p(θ₁ | y, θ₂) p(θ₂ | y) dθ₂, i.e., it can be viewed as a "weighted" average of the conditional densities p(θ₁ | y, θ₂), the weight function being the marginal posterior p(θ₂ | y) of the variance components.

Slide 46: Bayesian justification of REML. With flat priors on β, maximizing p(σ²_u, σ²_e | y) with respect to σ²_u and σ²_e yields the REML estimates (Harville, 1974). If p(σ²_u, σ²_e | y) is (nearly) symmetric, then plugging its mode (the REML estimates) into p(β, u | y, σ²_u, σ²_e) is a reasonable approximation to p(β, u | y), with perhaps one important exception: the uncertainty about the variance components is ignored. This is what PROC MIXED (GLIMMIX) essentially does by default (REML/RSPL/RMPL)!

Slide 47: Back to the linear mixed model. Suppose a three-stage density with flat priors on β, σ²_u and σ²_e. Then p(σ²_u, σ²_e | y) is the posterior marginal density of interest.
– Maximize it w.r.t. σ²_u and σ²_e to get the REML estimates.
Empirical Bayes strategy: plug in the REML variance component estimates to approximate p(β, u | y) by p(β, u | y, σ̂²_u, σ̂²_e).

Slide 48: What is ML estimation of VC from a Bayesian perspective? Determine the "marginal likelihood" p(y | β, σ²_u, σ²_e) = ∫ p(y | β, u, σ²_e) p(u | σ²_u) du. Maximize this with respect to β, σ²_u and σ²_e to get ML estimates of β, σ²_u and σ²_e
– ...assuming flat priors on all three.
This is essentially what PROC MIXED (GLIMMIX) does with ML (MSPL/MMPL)!

Slide 49: Approximate empirical Bayes inference. Given the REML estimates σ̂²_u and σ̂²_e, where Ĝ = Aσ̂²_u and R̂ = Iσ̂²_e, inference on β and u then proceeds from p(β, u | y, σ̂²_u, σ̂²_e), i.e., the known-VC normal posterior of slides 23-25 with the estimates plugged in.

Slide 50: Approximate "REML" (PQL) analysis for GLMM. For a GLMM, the required marginal posterior is not known analytically and must be approximated
– e.g., the residual pseudo-likelihood method (RSPL/MSPL) in SAS PROC GLIMMIX.
First approximations were proposed by Stiratelli et al. (1984) and Harville and Mee (1984).

Slide 51: Other methods for estimating VC in PROC GLIMMIX, based on maximizing the marginal likelihood (more ML-like than REML-like):
– Method = QUAD: adaptive quadrature; exact, but useful only for simple models.
– Method = LAPLACE: generally a better approximation to the marginal likelihood than MMPL/MSPL, and computationally more efficient than QUAD.

Slide 52: Could we have considered a "residual/restricted" Laplace instead? I.e., maximize a Laplace approximation of p(σ²_u | y), with β integrated out, with respect to the variance components (Tempelman and Gianola, 1993, 1996; Tempelman, 1998; Wolfinger, 1993). Premise: "REML" is generally less biased than "ML".

Slide 53: Ordinal categorical data. Recall the threshold concept; let's extend it to the mixed model, as sketched below.
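A sketch of the threshold (cumulative probit) mixed model being extended here, in the standard Gianola and Foulley (1983) form; the notation (thresholds τ_k, C categories) is mine:

P(Y_i \le k \mid \beta, u) = \Phi(\tau_k - x_i'\beta - z_i'u), \quad k = 1,\dots,C-1, \quad \tau_1 \le \dots \le \tau_{C-1},
\qquad P(Y_i = k \mid \beta, u) = \Phi(\tau_k - \eta_i) - \Phi(\tau_{k-1} - \eta_i), \quad \eta_i = x_i'\beta + z_i'u,

with τ_0 = −∞ and τ_C = +∞.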

Slide 54: Joint posterior density. Likelihood: the product over observations of the cumulative-probit category probabilities. Priors: on the thresholds τ, the fixed effects β, and the random effects u | σ²_u ~ N(0, Aσ²_u).

Slide 55: Inference given "known" σ²_u. Then the joint posterior mode of (τ, β, u) serves as the point estimate; use Fisher's scoring to compute it.

Slide 56: Joint posterior mode. Let P_ik denote the probability of response category k in subclass i. First derivatives of the log joint posterior are taken with respect to the thresholds, β and u; see §❶ for details on p and v.

Slide 57: Second derivatives. The (expected) second-derivative matrix is assembled from the corresponding weight matrices; see §❶ for details on T, L, and W.

Slide 58: Fisher's scoring. The parameters are updated iteratively using the expected negative Hessian; at convergence, the joint posterior mode and its asymptotic posterior covariance matrix (the inverse coefficient matrix) are obtained. Full details in GF (1983).

Slide 59: Recall the GF83 data (three records per line):
H A G S Y   H A G S Y   H A G S Y
1 2 M 1 1   1 2 F 1 1   1 3 M 1 1
1 2 F 2 2   1 3 M 2 1   1 3 M 2 3
1 3 F 2 1   1 3 F 2 1   1 3 F 2 1
1 2 M 3 1   1 2 M 3 2   1 3 F 3 2
1 3 M 3 1   2 2 F 1 1   2 2 F 1 1
2 2 M 1 1   2 3 M 1 3   2 2 F 2 1
2 2 F 2 3   2 3 M 2 1   2 2 F 3 2
2 3 M 3 3   2 2 M 4 2   2 2 F 4 1
2 3 F 4 1   2 3 F 4 1   2 3 M 4 1
2 3 M 4 1
H: Herd (1 or 2); A: Age of dam (2 = young heifer, 3 = older cow); G: Gender or sex (M, F); S: Sire of calf (1, 2, 3, or 4); Y: Ordinal response (1, 2, or 3).

Slide 60: SAS data step:
data gf83;
  input herdyear dam_age calfsex $ sire y @@;
  if herdyear = 2 then hy = 1; /* create dummy variables */
  else hy = 0;
  if dam_age = 3 then age = 1;
  else age = 0;
  if calfsex = 'F' then sex = 1;
  else sex = 0;
datalines;
1 2 M 1 1  1 2 F 1 1  etc.

Slide 61: Reproducing the analyses in GF83 (based on the created dummy variables); σ²_u = 1/19 (as chosen by GF83):
ods select parameterestimates Estimates;
proc glimmix data=gf83;
  class sire; /* added: sire must be a classification variable for the random statement */
  model y = hy age sex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'fem marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'fem marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 1 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 hy 0.5 age 0.5 sex 0 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 hy 0.5 age 0.5 sex 0 / ilink;
run;

Slide 62:
Solutions for Fixed Effects
Effect     y  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept  1   0.3755       0.5580       3    0.67     0.5492
Intercept  2   1.0115       0.5789       3    1.75     0.1789
hy            -0.2975       0.4950      20   -0.60     0.5546
age            0.1269       0.4987      20    0.25     0.8017
sex            0.3906       0.4967      20    0.79     0.4409

Estimates
Label                        Estimate  Standard Error  DF  t Value  Pr > |t|  Mean    Standard Error Mean
female marginal mean cat. 1   0.6808       0.3829      20    1.78     0.0906  0.7520     0.1212
female marginal mean cat. 2   1.3168       0.4249      20    3.10     0.0057  0.9060     0.07123
male marginal mean cat. 1     0.2902       0.3607      20    0.80     0.4305  0.6142     0.1380
male marginal mean cat. 2     0.9262       0.3902      20    2.37     0.0277  0.8228     0.1014

REPRODUCED IN GIANOLA AND FOULLEY (1983)

Slide 63: Reproducing the analyses in GF83 (alternative, using a less-than-full-rank classification model):
ods select parameterestimates estimates;
proc glimmix data=gf83;
  class sire herdyear dam_age calfsex;
  model y = herdyear dam_age calfsex / dist = mult link = cumprobit solution;
  random sire / solution;
  parms (0.05263158) / noiter;
  estimate 'female marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 / ilink;
  estimate 'female marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 1 / ilink;
  estimate 'male marginal mean cat. 1' intercept 1 0 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
  estimate 'male marginal mean cat. 2' intercept 0 1 herdyear 0.5 0.5 dam_age 0.5 0.5 calfsex 0 1 / ilink;
run;

Slide 64:
Solutions for Fixed Effects
Effect     y  calfsex  herdyear  dam_age  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept  1                               0.2050       0.4734       3    0.43     0.6943
Intercept  2                               0.8409       0.4946       3    1.70     0.1876
herdyear               1                   0.2975       0.4950      20    0.60     0.5546
herdyear               2                   0            .            .    .        .
dam_age                          2        -0.1269       0.4987      20   -0.25     0.8017
dam_age                          3         0            .            .    .        .
calfsex       F                            0.3906       0.4967      20    0.79     0.4409
calfsex       M                            0            .            .    .        .

Estimates
Label                             Estimate  Standard Error  DF  t Value  Pr > |t|  Mean    Standard Error Mean
female marginal mean category 1    0.6808       0.3829      20    1.78     0.0906  0.7520     0.1212
female marginal mean category 2    1.3168       0.4249      20    3.10     0.0057  0.9060     0.07123
male marginal mean category 1      0.2902       0.3607      20    0.80     0.4305  0.6142     0.1380
male marginal mean category 2      0.9262       0.3902      20    2.37     0.0277  0.8228     0.1014

Slide 65: Conditional versus marginal ("population-averaged") probabilities. Conditional (on u) and marginal (integrated over u) category probabilities generally differ, and the marginal probabilities probably matter just as much. Also, there is no corresponding closed form for (cumulative) logistic mixed models; the probit case is sketched below.
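A sketch of what the two probabilities look like for the cumulative probit with a single random effect u_i ~ N(0, σ²_u); this is the standard closed-form result, with notation mine:

\text{Conditional: } P(Y_i \le k \mid u_i) = \Phi(\tau_k - x_i'\beta - u_i), \qquad
\text{Marginal: } P(Y_i \le k) = \int \Phi(\tau_k - x_i'\beta - u)\,\phi(u;\,0,\sigma^2_u)\,du
= \Phi\!\left(\frac{\tau_k - x_i'\beta}{\sqrt{1 + \sigma^2_u}}\right),

which is why the probit link admits a closed-form marginal probability while the cumulative logit does not.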

Slide 66: Accounting for unknown σ²_u? Some alternative methods available in SAS PROC GLIMMIX:
ods html select covparms;
title "Default RSPL";
proc glimmix data=gf83;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;
title "Quadrature";
proc glimmix data=gf83 method=quad;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;
title "Laplace";
proc glimmix data=gf83 method=laplace;
  class sire;
  model y = hy age sex / dist = mult link = cumprobit;
  random intercept / subject=sire;
run;

Slide 67: Which one should I pick?
Covariance Parameter Estimates
Method    Cov Parm   Subject  Estimate  Standard Error
RSPL      Intercept  sire     0.2700        0.4837
QUAD      Intercept  sire     0.02568       0.2947
LAPLACE   Intercept  sire     0.02488       0.2898
An "ML" vs. "REML" thing?

Slide 68: Yet another option: "residual" Laplace (Tempelman and Gianola, 1993). Rather than using a point estimate of σ²_u, one might also weight inferences on β and u based on p(σ²_u | y). (Figure: log p(σ²_u | y) plotted against σ²_u.)

Slide 69: Summary of GLMM as conventionally done today. Some issues:
– 1. Approximate p(β, u | y) by conditioning on point estimates of the variance components. Is that wise?
– 2. MML point estimates of σ²_u are often badly biased. Upwards or downwards? Unpredictable.
– 3. Uncertainty in the MML estimates is not accounted for.
– 4. Marginal versus conditional inference on treatment probabilities? (Applies to other distributions as well, e.g., Poisson.)
– Implications? We'll see later with a comparison between empirical Bayes and fully Bayes (using MCMC). There is an obvious dependency on n, q, etc.

