Introduction to Modeling Change Over Time with Generalized Mixed Models using SAS PROC GLIMMIX
A Short Course, 14 May 2007
Instructor: Walt Stroup, Ph.D., Professor & Chair, UNL Department of Statistics

Outline of Short Course (G/C = Growth/Change Model)
Introduction
  motivating examples
  Social Science HLM-speak vs. BioStat GLMM-speak
GLMM / HLM essential background
  recurring modeling issues
  SAS / GLIMMIX syntax
G/C Models - 1st part of the picture: Factorial trt designs
  with various error structures & distributions
  with repeated measures & correlated errors
G/C Models - 2nd part of the picture: Random Effects issues
  random coefficients
  prediction vs. estimation
G/C Models - 3rd part of the picture: GLM issues
  binary, count, rate, zero-inflated models
Power & Planning
Nonlinear mixed models
14 May 2007 SSP Core Facility

Recurring Themes
“Mixed Model” Issues
  fixed or random?
  error terms: which one & are they correlated?
  std error & d.f.
  prediction or estimate? (“inference space”)
“GLM” Issues
  what distribution? incl. “is it really a distribution & does it matter?”
  what link: “data” vs “model” scale?
  overdispersion
  computational issues

Recurring Themes
George Bernard Shaw: “America and England are two peoples separated by a common language.”
Generalized Mixed Models have
  AgStat-speak
  BioStat-speak
  Social/Behavioral Science Stat (HLM) speak
One goal: serve as translator

I. Introduction
General considerations for modeling
Several examples illustrating generalized and mixed models
Typology of models
Background theory
Decision chart to match model with software available in SAS

General Model considerations
A Model is a description of the components of an observation
  observation = systematic + random
Nelder: random = ephemeral + noise, or random = random model + random error
Alternative: random = design components + remaining variation
“All models are wrong but some are useful” (G.E.P. Box)

General Mixed Model Setting
Y is vector of responses (observable)
u is vector of random (design induced) effects [not (directly) observable]
Relevant distributions
  Y|u ~ f_C(μ, R)
  u ~ f_R(0, G)
Model is of conditional mean of Y|u
Inexact (but useful) correspondence
  Y|u: HLM Level 1; Biostat subject-specific
  u: HLM Level ≥ 2

Typology of Models
Type   Mean Model    Distribution
NLMM   h(X,β,Z,u)    y|u general, u normal **
GLMM   h(Xβ+Zu)      y|u general, u normal *
LMM    Xβ+Zu         u, y|u normal
NLM    h(X,β)        y normal
GLM    h(Xβ)         y general
LM     Xβ            y normal
*  for PROC GLIMMIX
** for this course
(G/N)LMM can be more general

Example 1: Random Effects Model
Data: Output 4.1, p. 94, SAS for Linear Models, 4th ed.
  20 packages of ground beef
  3 samples per package
  2 counts per sample
  response variable: microbial count
response = mean + package + sample(package) + error
i.e. observation = systematic + random model + error

Model for Example 1
y_ijk is observation [log(count)]
μ is overall mean (systematic / fixed)
p_i, s(p)_ij are random model effects
e_ijk is random error
Convention: fixed Greek; random Latin
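The model equation itself did not survive on this slide. A reconstruction consistent with the terms defined above is sketched here; the normality and independence assumptions are the usual ones for a nested random effects model and are an assumption, not something taken from the slide.

```latex
y_{ijk} = \mu + p_i + s(p)_{ij} + e_{ijk},
\qquad
p_i \sim N(0, \sigma^2_p), \quad
s(p)_{ij} \sim N(0, \sigma^2_s), \quad
e_{ijk} \sim N(0, \sigma^2)
```

Here i = 1, ..., 20 indexes packages, j = 1, 2, 3 samples within package, and k = 1, 2 counts within sample.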

Hierarchical Levels
[Figure: students (Level 1) within classroom (Level 2) within school (Level 3); size/level key: small = 1, medium = 2, large = 3]

Hierarchical Level to Statistical Model
[Figure: school (Level 3), classroom, and students mapped to statistical model terms, in GLIMMIX-speak and HLM-speak]

Modeling Issues
Estimate the σ²'s (variance components)
Estimate, standard error, and interval estimate of μ
Estimates of package, sample effects
  a.k.a. estimates of school and classroom effects
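As a rough sketch of how these three issues map onto GLIMMIX statements for Example 1, something like the following could be used. The data set and variable names (beef, package, sample, logcount) are hypothetical placeholders, not the names used in SAS for Linear Models.

```sas
/* Sketch only: beef, package, sample, logcount are assumed names */
proc glimmix data=beef;
   class package sample;
   /* intercept-only mean model: SOLUTION and CL give the estimate, */
   /* standard error, and interval estimate of the overall mean     */
   model logcount = / solution cl ddfm=kr;
   /* G-side variance components; SOLUTION on RANDOM prints the     */
   /* predicted package and sample(package) effects                 */
   random package sample(package) / solution;
run;
```

The Covariance Parameter Estimates table from this run would carry the variance component estimates.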

Singer: HLM to MIXED
Unconditional means model
  one-way random effects model
Include Level 2 Covariate
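The two models named on this slide might be coded in GLIMMIX roughly as follows. The names hsb, school, mathach, and meanses are assumptions chosen to resemble the school-achievement examples in Singer (1998); substitute your own.

```sas
/* Sketch only: hsb, school, mathach, meanses are assumed names */

/* Unconditional means model = one-way random effects model */
proc glimmix data=hsb;
   class school;
   model mathach = / solution;
   random intercept / subject=school;   /* level-2 (between-school) variance */
run;

/* Same model with a level-2 (school-level) covariate added */
proc glimmix data=hsb;
   class school;
   model mathach = meanses / solution ddfm=kr;
   random intercept / subject=school;
run;
```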

Example 2: Blocking & Multi-Location
Data: SAS for Linear Models: Output 3.7, discussed as mixed model in Section 4.3; Output 11.30; SAS for Mixed Models, 2nd ed., Section 6.6
Output discussed here
  3 treatments
  8 locations; locations represent a population
  3-12 blocks depending on location
response = trt + loc + blk(loc) + trt*loc + error
i.e. observation = systematic + random model + error

Example 2 framed by Extending School / Classroom Example
[Figure: school / classroom / students hierarchy, shown with Treatment added]

Model with Treatment
[Figure: school, classroom, students, and Treatment mapped to model terms]

Modeling Issues
Appropriate error term to test treatment
Standard error of treatment mean (inference space)
Intra-block vs. inter-block analysis
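One way to see the error-term and inference-space issue is to fit Example 2 twice, once with locations random and once with locations fixed. This is a sketch under assumed names (multiloc, loc, blk, trt, y), not the code from the SAS texts.

```sas
/* Sketch only: multiloc, loc, blk, trt, y are assumed names */

/* Locations random: trt is tested against trt*loc variation, and   */
/* standard errors of trt means include among-location variance     */
proc glimmix data=multiloc;
   class loc blk trt;
   model y = trt / ddfm=kr;
   random loc blk(loc) trt*loc;
   lsmeans trt / diff;
run;

/* Locations fixed: inference applies only to the observed locations */
proc glimmix data=multiloc;
   class loc blk trt;
   model y = trt loc trt*loc;
   random blk(loc);
   lsmeans trt / diff;
run;
```

Comparing the denominator degrees of freedom and standard errors from the two runs shows how the choice changes the test of treatment.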

ANOVA (ignoring block)
Test of TRT affected If Location fixed:

Inference Space

Where does Uncertainty Arise?
[Figure: Loc 1, Loc 2, ..., Loc 7, Loc 8]
Only from variation among obs within locations? Locations fixed
Or does variation among locations also contribute? Locations random

Intra- vs. Inter-block analysis
Intra- (fixed) block analysis is based only on within-block treatment differences
Inter-block analysis also accounts for variance among blocks (random combines inter- and intra-)
The two lead to equivalent tests when all treatments appear equally in each block
  not equivalent otherwise
In most cases, combined inter-/intra-block analysis is more efficient

Example 3: Repeated Measures/Longitudinal
Data: SAS for Linear Models, Output 8.1; SAS for Mixed Models, Chapter 5
  3 treatments (2 test drugs + placebo)
  n_i patients per treatment
  8 times of measurement (1, 2, 3, ..., 8 hours post trt)
  baseline measurement at time 0
response = trt + hour + trt*hour + pat(trt) + error
i.e. observation = systematic + random model + error
Variations on this theme are “latent growth models”

Growth Models: Singer HLM-speak to GLIMMIX-speak
Unconditional Linear Growth Model
[Figure: HLM vs. GLIMMIX formulations; Level 1 = within subjects, Level 2 = between subjects; PA and SS forms]
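The slide's equations were lost, but an unconditional linear growth model is commonly written with a subject-specific intercept and slope at Level 1 and random deviations of both at Level 2. A minimal GLIMMIX sketch, assuming hypothetical names (growth, id, time, y), might look like this:

```sas
/* Sketch only: growth, id, time, y are assumed names */
proc glimmix data=growth;
   class id;
   /* fixed part: population-average intercept and slope over time */
   model y = time / solution ddfm=kr;
   /* random part: subject-specific intercept and slope deviations; */
   /* TYPE=UN lets the two be correlated                            */
   random intercept time / subject=id type=un;
run;
```

TIME is deliberately left out of the CLASS statement so that it acts as a regression variable.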

Singer (1998)
Excellent paper translating HLM-speak to Proc MIXED
Uses Raudenbush & Bryk examples
Fair warning to readers, however: it's dated
  new features & output revisions in SAS
  some of the output encouraged confusion or poor practice
Specifics
  revised output of Fit Statistics
  misleading output for variance estimates deleted
  Kenward-Roger procedure for d.f. & std errors
I’ll update & make the switch to Proc GLIMMIX

Modeling Issues
Errors may be correlated
  may affect conclusions
How to select covariance model
Denominator degrees of freedom
Bias in standard errors and test statistics
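A common way to work through these issues is to refit the same mean model under several candidate covariance structures and compare fit statistics and tests. The sketch below assumes hypothetical names (drug, patient, trt, hour, y) for the Example 3 data.

```sas
/* Sketch only: drug, patient, trt, hour, y are assumed names */
proc glimmix data=drug;
   class trt patient hour;
   /* DDFM=KR applies the Kenward-Roger d.f. and bias adjustment */
   model y = trt hour trt*hour / ddfm=kr;
   /* R-side: AR(1) correlation among the hourly errors within a patient */
   random _residual_ / type=ar(1) subject=patient(trt);
run;
```

Replacing TYPE=AR(1) with TYPE=CS or TYPE=UN gives the competing models; for normal data the Fit Statistics table (AIC, AICC, BIC) can help choose among them.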

Impact of Correlated Errors
Columns: Covariance Model, den df, F-value, Pr>F
errors independent: den df 483, F 7.11, Pr>F <0.0001
errors correlated, no structure (bias corrected): den df 69 (98.1), F 4.06 (3.66)
errors correlated, AR(1): 3.93; bias corrected: 424, 3.89

Example 4
Data: SAS for Mixed Models, Section 14.5
  2 treatments (Test Drug, Control)
  8 clinics; clinics represent a population
  n_ij subjects at jth location on ith treatment
  response: favorable or unfavorable (f_ij = # fav)
response = trt + clinic + trt*clinic + error
i.e. observation = systematic + random model + error

Modeling Issues
Response (f_ij / n_ij) is binomial, not normal
Response may not be linear in model parameters
Errors may not be additive
Variances of binomial & normal are different
  heterogeneous
  depends on location parameter

Generalized Linear Mixed Model
e.g. Logistic mixed model
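The equations for this slide were lost. For Example 4, a logistic mixed model could be written as follows; the symbols (τ for treatment, c for clinic, and the variance names) are notation assumed here for illustration, following the fixed-Greek, random-Latin convention.

```latex
\begin{aligned}
f_{ij} \mid c_j, (\tau c)_{ij} &\sim \mathrm{Binomial}(n_{ij}, \pi_{ij}) \\
\log\!\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) &= \eta + \tau_i + c_j + (\tau c)_{ij} \\
c_j \sim N(0, \sigma^2_c), &\qquad (\tau c)_{ij} \sim N(0, \sigma^2_{\tau c})
\end{aligned}
```

The random clinic effects sit inside the link function, so the model describes the conditional probability for a given clinic.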

Example 5
Data: SAS for Linear Models, Output 10.39
  2 treatments
  n_i persons per treatment
  4 times of measurement
  response = number of seizures (count)
  baseline and age observations
response = trt + hour + trt*hour + baseline + age + pat(trt) + error
i.e. observation = systematic + random model + error

Modeling Issues
Count typically not ~ normal
  Poisson (or negative binomial) more likely
Generalized Linear Model Issues
  linear model not a good direct model of mean
  variance depends on mean
Repeated Measures Issues
  observations within subjects correlated over time
  between-subject variance
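As one possible starting point for Example 5, a Poisson GLMM with a random patient effect addresses the distribution, link, and between-subject variance issues together. The names (seizure, patient, visit, count, baseline, age) are assumptions for illustration.

```sas
/* Sketch only: seizure, patient, visit, count, baseline, age are assumed names */
proc glimmix data=seizure;
   class trt patient visit;
   /* log link keeps the modeled mean positive; baseline and age are covariates */
   model count = trt visit trt*visit baseline age
         / dist=poisson link=log solution;
   /* G-side random patient effect: between-subject variance, which also */
   /* induces correlation among a patient's repeated counts             */
   random intercept / subject=patient(trt);
run;
```

Changing DIST=POISSON to DIST=NEGBIN would fit the negative binomial alternative mentioned above.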

Example 6
Data: SAS for Mixed Models, Section 1.5.6
  5 treatments observed in each of 4 randomized blocks
  several measurements between 130 and 180 growing degree days
response = (trt,day) + block + blk*trt + error
i.e. observation = systematic + random model + error

Emergence over TIME by TRT
[Figure: emergence curves by treatment. Black: NoTill; Red: SumBlade (summer); Cyan: SB&SD; Green: SpDisk (spring); Blue: SpPlow]

Modeling Issues
“Usual” mixed model and repeated measures issues, plus
Linear model is a poor model of trt*day means

Nonlinear Mixed Model

Typology of Models
Type   Mean Model    Distribution
NLMM   h(X,β,Z,u)    y|u general, u normal **
GLMM   h(Xβ+Zu)      y|u general, u normal *
LMM    Xβ+Zu         u, y|u normal
NLM    h(X,β)        y normal
GLM    h(Xβ)         y general
LM     Xβ            y normal
*  for PROC GLIMMIX
** for this course
(G/N)LMM can be more general

Generalized Mixed Model SAS Software Decision Table

Essential GLMM Background

First: How do I run a SAS Program?
It’s easier than the urban legends would have you believe

Basic Parts of SAS Program
DATA Step
    Data your_choice_of_name;
    Input list of variables;   /* comment: $ after alphameric var */
    Datalines;
    data: one line / obs, one column per variable
    ;
PROC Step
    Proc GLIMMIX Data=your_choice_of_name;
    CLASS block group & trt var;
    MODEL response=block trt covar / options;
    ...
    Run;
Modify existing data set (Data __; Set __;)
    Data new_data_set_name;
    Set [old, e.g.] your_choice_of_name;
    program & data manipulation statements, e.g. LogY=Log(Y);

Example of SAS Program
DATA Step
    data demo1;
       input classroom trt $ time count;
       sc=sqrt(count);   /* square root of the count */
    datalines;
    1 std 1 12
    1 std 2 16
    1 std 4 17
    1 std 8 24
    2 exper 1 17
    2 exper 2 24
    2 exper 4 30
    2 exper 8 32
    11 std 1 16
    11 std 2 15
    11 std 4 22
    11 std 8 23
    8 exper 1 15
    8 exper 2 20
    8 exper 4 24
    8 exper 8 27
    ;
PROC Step
    proc glimmix data=demo1;
       class classroom trt time;
       model sc=trt time trt*time / dist=normal ddfm=kr;
       random classroom(trt);
       lsmeans trt*time;
       ods output lsmeans=lsm;   /* save LS-means for plotting */
    run;
Data; Set; + new PROC
    data plot_growth;
       set lsm;
       log_time=log2(time);
    symbol i=join value=circle;
    proc gplot data=plot_growth;
       plot estimate*log_time=trt;
    run;

II. Generalized Mixed Model Theory
Clarify Fixed vs Random effects
Linear Models
  LM to LMM + GLM to GLMM
Estimation and Inference for
  LMM
  GLM
  GLMM
For GLMM:
  what follows naturally from GLM and LMM
  special issues

Fixed vs. Random Effects?
Fixed Effect?
  levels observed = population of interest (except regression)
  levels deliberately chosen
  inference: systematic relationship between y and  
Random Effect?
  observed levels represent target population
  random sample? -- ideal (but seldom perfectly realized)
  makes sense to conceptualize probability distribution
Bottom Line: do observed levels of effect plausibly represent a probability distribution?
  yes: random effect
  no: fixed effect

General Structure of Model
Nelder: observation = systematic + random
General approach: likelihood consists of two parts
  observation (y | u)
  random effects u
Model is mathematical description of μ = E(y | u)
Distribution:
  observation: y | u ~ f(μ, R)
  random effects: u ~ MVN(0, G)
Model: μ = h(X, β, Z, u)
  h(·) called “inverse link”

Linear Model (LM)
No random effects
  simple ANOVA (one error term)
  multiple regression

Generalizations of LM
LM (Linear Model): obs ~ normal, fixed effects only
  obs ~ non-normal: GLM (Generalized Linear Model)
  random effects: LMM (Linear Mixed Model)
  obs ~ non-normal + random effects: GLMM (Generalized Linear Mixed Model)

GLM: Generalized Linear Model
Binomial: logistic regression; probit models
Poisson: log-linear models

LMM: Linear Mixed Model
Multi-error models; split-plot, multi-location
Repeated measures, a.k.a. longitudinal data
More vocabulary:
  “G-side” concerns V(u)
  “R-side” concerns V(e)

GLMM: Generalized Linear Mixed Model
Modelling will involve
  Distribution
  Link (or inverse link)
  G-side
  R-side
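The formulas for the LM, GLM, LMM, and GLMM slides were lost in extraction. Using the notation from the General Structure slide (with g denoting the link, a symbol used later in the course), the four pieces listed above can be summarized as:

```latex
\begin{aligned}
\text{Distribution:}\quad & y \mid u \sim f(\mu, R), \qquad \mu = E(y \mid u) \\
\text{Link:}\quad & \eta = g(\mu) = X\beta + Zu, \qquad \mu = h(\eta) = g^{-1}(\eta) \\
\text{G-side:}\quad & u \sim N(0, G) \\
\text{R-side:}\quad & R = \mathrm{Var}(y \mid u)
\end{aligned}
```

Setting g to the identity with a normal distribution gives the LMM, dropping Zu gives the GLM, and doing both gives the LM.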

Some Grounding Before Moving On
“Hessian Fly” example, Gotway & Stroup (1997, JABES)
“Hessian Fly” not so important, but design & data structure are
  16 treatments, 4 replications: 4x4 lattice
  16 incomplete blocks organized into 4 complete blocks
  Response: Y_ij/n_ij (damaged / obs per trt x block unit)
[Layout figure; labels as extracted: 1 2 5 6 3 4 7 8 9 13 10 14 11 12 15 16]

Linear Model (LM)
    proc glimmix;
       class block entry;
       model pct=block entry;

    proc glimmix;
       class inc_block entry;
       model pct=inc_block entry;

Linear Mixed Model (LMM)
    proc glimmix;
       class block entry;
       model pct=entry;
       random block;   /* G-side modeling of block effect */
Incomplete block (recovery of interblock information): replace “block” by “inc_block”

LMM G-side / R-side
Two alternative “G-side” specifications:
    proc glimmix;
       class block entry;
       model pct=entry;
       random block;

    proc glimmix;
       class block entry;
       model pct=entry;
       random intercept / subject=block;
R-side specification:
    proc glimmix;
       class block entry;
       model pct=entry;
       random _residual_ / type=cs subject=block;
Here, it doesn’t matter (all equivalent), but for more complex models the distinctions will matter

Generalized Linear Model (GLM)
    proc glimmix;
       class block entry;
       model y/n = block entry;
or replace “block” by “inc_block” for intra-block logit ANOVA
More on GLIMMIX syntax later
Here, note Y/N causes default to Binomial distribution & Logit link (same as GENMOD)

Generalized Linear Mixed Model (GLMM)
    proc glimmix;
       class block entry;
       model y/n = entry;
       random intercept / subject=block;

    proc glimmix;
       class block entry;
       model y/n = entry;
       random block;
Marginal model (not equivalent):
    proc glimmix;
       class block entry;
       model y/n = entry;
       random _residual_ / type=cs subject=block;

II. Inference in LM, GLM, LMM, and GLMM

II. Examples of Estimable Functions

II. Common Inference Results for GLM

II. GLM: Inference with Unknown Scale Parameter

II. Extension of GLM
Scale Parameter
Quasi-Likelihood
Overdispersion
“Working Correlation”

II. GLM: Deviance and Likelihood Ratio Test

II. LMM: The “Mixed Model Equations”
Mixed Model Solution
Marginal Model Solution
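The equations on this slide were lost. In their standard textbook form (Henderson's mixed model equations for y = Xβ + Zu + e with Var(u) = G and Var(e) = R), which is presumably what the slide displayed, the mixed model solution is:

```latex
\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}
```

The equivalent marginal model solution works with V = Var(y):

```latex
V = ZGZ' + R, \qquad
\hat{\beta} = (X'V^{-1}X)^{-}X'V^{-1}y, \qquad
\hat{u} = GZ'V^{-1}(y - X\hat{\beta})
```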

II. LMM Inference – G and R known

II. LMM Inference – G and R unknown

II. LMM: Variance Component Estimation
Several methods
  for variance-component-only models: use EMS from ANOVA
  maximum likelihood; problem: biased
  restricted maximum likelihood
Several computational approaches
  Newton-Raphson
  Fisher scoring
  EM

What’s Wrong with ML? An example to illustrate
SAS for Mixed Models, Data Set 1.5.1
Incomplete block design from Cochran & Cox, Experimental Designs, p 456
  15 treatments
  15 blocks
  4 treatments observed per block

C&C Example: ML and two alternatives
Intrablock (fixed block) analysis; equivalent to PROC GLM
    proc glimmix data=cc456;
       class trt bloc;
       model y=trt bloc;
Inter/Intra-block (random block) analysis, default; PROC MIXED default gives same result
    proc glimmix data=cc456;
       class trt bloc;
       model y=trt;
       random bloc;
Inter/Intra-block (random block) analysis, ML; same as Proc MIXED METHOD=ML
    proc glimmix data=cc456 method=mspl;
       class trt bloc;
       model y=trt;
       random bloc;

ML vs Alternative Results: Which is Right?
Intrablock (fixed block): Type III Tests of Fixed Effects
  Effect  Num DF  Den DF  F Value  Pr > F
  trt     14      31      1.23     0.3012
Intra/interblock (random block), default: Type III Tests of Fixed Effects
  Effect  Num DF  Den DF  F Value  Pr > F
  trt     14      36.2    1.48     0.1676
Intra/interblock (random block), ML: Type III Tests of Fixed Effects
  Effect  Num DF  Den DF  F Value  Pr > F
  trt     14      49.04   2.02     0.0352

Simulation: ML or REML
1000 simulated data sets using C & C, p 456 design
σ²_B/σ² = 0.5
Recorded type I error rate for F_trt
  intrablock
  REML random block
  ML random block
Output columns: Variable, N, Mean; rows: fixd_rej, REML_rej, ML_rej

II. LMM with estimated G and R Bias in std error and test statistics

II. LMM: Degrees of Freedom

II. Degrees of Freedom (2)

II. Satterthwaite Approximation

II. Satterthwaite Approximation in LMM
For vector K (e.g. treatment contrast):
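The formula that followed was lost. The form usually given for a single estimable function k'β in a linear mixed model, and the one documented for the SAS mixed model procedures, is sketched below; the symbols C, g, A, and θ are notation assumed here.

```latex
t = \frac{k'\hat{\beta}}{\sqrt{k'\hat{C}k}}, \qquad
\hat{C} = (X'\hat{V}^{-1}X)^{-}, \qquad
\nu \approx \frac{2\,(k'\hat{C}k)^2}{g'Ag}
```

Here g is the gradient of k'Ck with respect to the covariance parameters θ, evaluated at their estimates, and A is the asymptotic covariance matrix of those estimates.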

II. GLMM Estimation

II. Working Correlation
Recall Gotway & Stroup (1997) Hessian Fly Example
Gotway and Stroup considered spatial variation among e.u.
[Layout figure; labels as extracted: 1 2 5 6 3 4 7 8 9 13 10 14 11 12 15 16]
    proc glimmix;
       class block entry;
       model y/n=entry;
       random intercept / subject=block;
       random _residual_ / type=sp(sph)(row col);
MODEL sets up Binomial GLM, Logit link
RANDOM _RESIDUAL_ sets up a working correlation based on SPHERICAL semivariogram

II. Marginal (PA) vs Subject-Specific Inference
Population Averaged (PA)
SS (true GLMM)

II. More on PA (marginal) vs. SS

II. Estimation of GLMM
model E(y|u)
  inverse link: E(y|u) = h(Xβ+Zu)
  link: g[E(y|u)] = η = Xβ+Zu
to estimate β and u, need to evaluate f(y), f(y|u)
approximate, e.g. by Taylor series expansion
  Penalized Quasi-Likelihood (SAS %GLIMMIX)
  SAS PROC GLIMMIX (next slides)
numerically integrate joint density
  Gauss-Hermite quadrature (Proc NLMIXED)
stochastically evaluate integral
  Markov chain Monte Carlo (WinBUGS, not in this course)

II. Computational Method Comparison
GEE
  computationally easy
  meaning of marginal results in GLM?
Linearized GLMM (current PROC GLIMMIX)
  uses familiar LMM analogs (but many are ad hoc & need further research)
  allows considerable R-side flexibility
  adequate for many GLMMs; breaks down for certain cases (binary data)
Integral Approximation (PROC NLMIXED)
  better approximation than linearized GLMM
  BUT: ML only, simple G-side models only, no R-side
Laplace
  computationally less demanding than integral approximation but often “accurate enough”
  same limitations as integral approximation
MCMC
  simple models only; limited & temperamental software
  but in extreme cases, only way to get accurate results

Modeling Considerations

Basic Parts of SAS Program
DATA Step
    Data your_choice_of_name;
    Input list of variables;   /* comment: $ after alphameric var */
    Datalines;
    data: one line / obs, one column per variable
    ;
PROC Step
    proc glimmix data=demo1;
       class classroom trt time;
       model sc=trt time trt*time / dist=normal ddfm=kr;
       random classroom(trt);
       lsmeans trt*time;
       ods output lsmeans=lsm;
    run;

III. Modeling Considerations
Overdispersion
Marginal (PA) vs Conditional (SS) models
“Data” vs “Model” Scale

III. Model Considerations
Variance Model & Overdispersion
Choice of Link Function
Choice of Distribution
Choice of Model Effects
Correlated Errors?
Any of the above could show up as “overdispersion”

III. GLMM: Model Considerations
Common dilemma
  design, e.g. like “Hessian fly” example
  BINOMIAL data
  recover interblock information: BLOCK random
[Layout figure; labels as extracted: 1 2 5 6 3 4 7 8 9 13 10 14 11 12 15 16]
Analysis reveals that the data are overdispersed

III. Hessian Fly Example
    proc glimmix data=HessianFly;
       class block entry;
       model y/n = entry;
       random block;
Fit Statistics
  -2 Res Log Pseudo-Likelihood   182.21
  Generalized Chi-Square         107.96
  Gener. Chi-Square / DF           2.25
Evidence of overdispersion when Gener. Chi-Square / DF >> 1

III. Overdispersion
Observed variance > variance under presumed model
Symptom: Deviance/DFE or chi-square/DFE >> 1
Uniquely a GLM / GLMM issue
  not a consideration with LM, LMM
  y|u ~ normal implies variance not a function of mean
When is there an issue?
  if Var(y) = f[E(y)]
  and using scale adjustment requires unrealistic assumptions

III. Common fix for Overdispersion
    proc glimmix data=HessianFly;
       class block entry;
       model y/n = entry;
       random block;
       random _residual_;
Covariance Parameter Estimates
  Cov Parm       Subject  Estimate  Standard Error
  Intercept      block    .
  Residual (VC)           2.2668    0.4627
Issue: not a true likelihood
Covariance Parameter Estimates
  Cov Parm       Subject  Estimate  Standard Error
  Intercept      block

Impact of Scale Parameter on Inference
No scale parameter: Type III Tests of Fixed Effects
  Effect  Num DF  Den DF  F Value  Pr > F
  entry   15      45      6.90     <.0001
With scale parameter adjustment: Type III Tests of Fixed Effects
  Effect  Num DF  Den DF  F Value  Pr > F
  entry   15      45      3.03     0.0020
Failure to account for overdispersion tends to increase type I error rate
But is this the best way to address the problem?

III. Mean – Variance Overdispersion Models

III. Marginal or Conditional Formulation
For many models (notably LMM) there are equivalent forms
  conditional (mixed, SS) model
  marginal (PA) model
  lead to the same marginal log-likelihood
Distinction results from
  G-side model: random model effects
  R-side model: marginal model

III. Example: variance component (G-side) vs. Compound symmetry (R-side)

III. Compound Symmetry Equivalent
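The algebra for this slide was lost. For normal data with a random block effect, the equivalence can be sketched as follows; the symbols (b for block, J for a matrix of ones, and the variance names) are notation assumed here.

```latex
\begin{aligned}
\text{G-side:}\quad & y_{ij} = \mu_i + b_j + e_{ij}, \quad
b_j \sim N(0, \sigma^2_b), \; e_{ij} \sim N(0, \sigma^2)
\;\Rightarrow\; \mathrm{Var}(y_j) = \sigma^2_b J + \sigma^2 I \\
\text{R-side (CS):}\quad & \mathrm{Var}(y_j) = \sigma_{cs} J + \sigma^2 I
\end{aligned}
```

The two marginal covariance matrices coincide when σ_cs = σ²_b. The compound symmetry form also allows a negative covariance, which the variance component form cannot represent.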

III. G-side / R-side
G-side
    proc glimmix;
       class block entry;
       model y/n=entry;
       random block;

    proc glimmix;
       class block entry;
       model y/n=entry;
       random intercept / subject=block;
R-side model
    proc glimmix;
       class block entry;
       model y/n=entry;
       random _residual_ / type=CS subject=block;
same model in MIXED form
    proc mixed;
       class block entry;
       model y=entry;
       repeated / type=CS subject=block;

III. Variance Component vs CS in GLMM
Variance component model is GLMM
CS model is GEE
They are not equivalent

III. Conditional vs. Marginal Results
Conditional
  Fit Statistics: Gener. Chi-Square / DF  2.27
  Covariance Parameter Estimates
    Cov Parm       Subject  Estimate
    Intercept      block
    Residual (VC)           2.2668
  Type III Tests of Fixed Effects
    Effect  Num DF  Den DF  F Value  Pr > F
    entry   15      45      3.03     0.0020
Marginal
  Fit Statistics: Gener. Chi-Square / DF  2.30
  Covariance Parameter Estimates
    Cov Parm   Subject  Estimate
    CS         block
    Residual            2.2992
  Type III Tests of Fixed Effects
    Effect  Num DF  Den DF  F Value  Pr > F
    entry   15      45      2.99     0.0023
Which is right?
  fit statistic?
  can you simulate data using mechanism implied by model?

III. Marginal or Conditional?
How to choose? Conditional: G-side; Marginal: R-side
Fit statistic? (may help; may deceive)
General recommendation
  G-side formulation preferred for non-normal data
  G-side effects operate inside the link function & hence always lead to valid conditional & marginal distributions
  R-side effects operate outside the link function
  for non-normal data, models implied by R-side effects may be vacuous

III. Impact of Model Effects
Back to Hessian Fly Data
Incomplete Block Design
Try more appropriate model
    proc glimmix;
       class inc_block entry;
       model y/n=entry;
       random intercept / subject=inc_block;
Fit Statistics
  Gener. Chi-Square / DF  1.41
Covariance Parameter Estimates
  Cov Parm   Subject    Estimate
  Intercept  inc_block  0.4971
Type III Tests of Fixed Effects
  Effect  Num DF  Den DF  F Value  Pr > F
  entry   15      33      6.33     <.0001

III. Inference
After model fit & estimation, inference begins
Want at least some of the following
  comparisons among groups (trt, entry...)
  test hypotheses
  obtain confidence intervals
  obtain predictions
Also further model checking

III. Scale issue for GLM, GLMM
For GLM, GLMM there are two “natural scales”
  linear (or model) scale (e.g. logit)
  data scale
May be other scales, depending on context
  odds
  odds ratio

III. Choosing the Scale
Example: Hessian Fly; binomial dist, logit link
Data: measured as 0/1; per e.u. as Y/N
Main focus: entry effect on P{indiv resp = 1}

III. Scale and Inference

III. Inverse Linking
Estimation occurs on model scale
But reporting typically must occur on data scale
“delta” rule

III. Model & Data Scale: Hessian Fly Example
Solutions for Fixed Effects
  Effect     entry  Estimate  Standard Error  DF  t Value  Pr > |t|
  Intercept                   0.4886          15  -3.90    0.0014
  entry      1      3.8001    0.6327          33  6.01     <.0001
  entry      2      3.4821    0.6186              5.63
Estimates
  Label           Estimate  Std Error  Lower   Upper   Mean    Std Error Mean  Lower Mean  Upper Mean
  entry 1         1.8944    0.4608     0.9568  2.8319  0.8693                  0.7225      0.9444
  entry 2         1.5765    0.4321     0.6974  2.4555  0.8287                  0.6676      0.9210
  diff entry 1-2  0.3179    0.5793             1.4965  0.5788  0.1412          0.2972      0.8171
Estimate, Std Error, Lower, Upper: linear or model scale
Mean, Std Error Mean, Lower Mean, Upper Mean: data scale
Which of these make NO sense?
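To make the inverse link and the “delta” rule concrete, the short DATA step below takes the entry 1 estimate and limits from the output above (logit scale) and converts them to the data scale by hand. The variable names are illustrative only; in practice the ILINK option on the LSMEANS or ESTIMATE statement asks GLIMMIX to do this conversion.

```sas
/* Hand calculation of data-scale quantities for entry 1 */
data _null_;
   eta    = 1.8944;   /* estimate on the logit (model) scale */
   se_eta = 0.4608;   /* its standard error                  */
   lower  = 0.9568;   /* model-scale confidence limits       */
   upper  = 2.8319;

   p       = 1 / (1 + exp(-eta));      /* inverse logit link            */
   se_p    = p * (1 - p) * se_eta;     /* delta rule: |dp/d eta| * se   */
   p_lower = 1 / (1 + exp(-lower));    /* limits are inverse-linked,    */
   p_upper = 1 / (1 + exp(-upper));    /* not built from se_p           */

   put p=6.4 se_p=6.4 p_lower=6.4 p_upper=6.4;
run;
```

The values written to the log should agree with the Mean, Lower Mean, and Upper Mean columns for entry 1, with a data-scale standard error of roughly 0.052. Applying the same inverse link to the difference (entry 1-2) yields a number that is not a difference of probabilities, which is the point of the closing question on the slide.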

on to GLIMMIX

IV. GLIMMIX Syntax
SAS software for GLMs & Mixed models
Basic GLIMMIX syntax
Similarities & Differences vs existing SAS Procs
New features

IV. SAS Software for Linear Models
LM Proc GLM, MIXED Proc GLIMMIX GLM Proc GENMOD Proc NLMIXED LMM Proc MIXED GLMM Proc GLIMMIX Proc NLMIXED

IV. PROC GLIMMIX Syntax
What’s familiar (from MIXED & GENMOD)
  CLASS
  MODEL
  DIST and LINK options in MODEL (like GENMOD)
  RANDOM (for G-side)
  ESTIMATE, CONTRAST, LSMEANS
  ODS
What’s new or different
  RANDOM _RESIDUAL_ (replaces REPEATED for R-side)
  LSMESTIMATE
  new options in LSMEANS (e.g. better options for factorial exp)
  NLOPTIONS
  Model diagnostics

IV. Relation between GLMM Structure and GLIMMIX Code
    proc glimmix;
       class variables;
       model <resp>=<fixed effects> / dist= link= ;
       random <g-side effects> / <options>;
       random _residual_ / type= subject= ;
    run;

IV. NLOPTIONS Statement
New Statement in GLIMMIX
Controls optimization technique, line search method, number of iterations, etc.
    proc glimmix;
       class id a b;
       model y=a b a*b;
       random _residual_ / type=cs subject=id(a);
       nloptions tech=nrridg maxiter=100;
TECH=NRRIDG causes GLIMMIX to use MIXED computing algorithm (good for comparison...)

IV. Programming Statements
Similar to GENMOD, NLIN, NLMIXED
GLIMMIX supports statements using DATA step syntax
Use to transform variables, define quantities to output, user-defined link, variance, etc.
For example....
    proc glimmix;
       class block entry;
       pct=y/n;
       model pct=entry;
       random intercept / subject=block;

IV. Some GLIMMIX Defaults Useful to Know
In MODEL statement
  response Y= : NORMAL distribution & IDENTITY link
  response Y/N= : BINOMIAL distribution and LOGIT link
For distributions without scale parameter in variance function (e.g. Binomial, Poisson)
  no scale parameter assumed (unlike %GLIMMIX macro)
  obtain scale parameter with RANDOM _RESIDUAL_
Optimization method automatically matched based on DISTRIBUTION & LINK

IV. Estimation Methods in PROC GLIMMIX
Defaults depend on model, distribution, and link
May be altered with METHOD= option in PROC statement
METHOD= options: variations on pseudo-likelihood
                                    subject specific               population averaged
                                    (conditional or mixed) model   (marginal) model
  Restricted obj fct (like REML)    RSPL                           RMPL
  Unrestricted obj fct (like ML)    MSPL                           MMPL

IV. Defaults & Methods (continued)
GLMM default method is RSPL
  for LMM, this is REML
  GLIMMIX uses a different algorithm than MIXED; TECH=NRRIDG uses the MIXED algorithm
  you can get slightly different numbers with MIXED/GLIMMIX
METHOD=MSPL yields ML estimates
Methods appear in literature as MPL, PQL
Adaptive Gaussian quadrature and Laplace algorithms will be added to V 9.2
  not available yet & not discussed here

IV. Examples
Poisson regression, log link
    proc glimmix;
       class id;
       model y=x / dist=poisson;
    run;
Poisson regression, log link, add scale parameter
    proc glimmix;
       class id;
       model y=x / dist=poisson;
       random _residual_;
    run;
Poisson regression, log link, change variance function
    proc glimmix;
       class id;
       _variance_=_mu_*_mu_;
       model y=x / dist=poisson;
    run;

IV. “GLM-mode” vs “GLMM-mode”
“GLM-mode”: max likelihood
    proc glimmix;
       class id;
       model y=x / dist=poisson;
       random _residual_;
“GLMM-mode”: pseudo likelihood
Use following trick to get GLM (GENMOD) type model via pseudo-likelihood
    proc glimmix;
       class id;
       model y=x / dist=poisson;
       random _residual_ / subject=id;
this is a GEE with indep working corr

IV. Distributions supported by GLIMMIX
Discrete
  Binary
  Binomial
  Poisson
  Geometric
  Negative Binomial
  Multinomial (Nominal, Ordinal)
Continuous
  Beta
  Normal
  Lognormal
  Gamma
  Exponential
  Inverse Gaussian
  Shifted T

IV. MIXED to GLIMMIX: R-side
    proc mixed;
       class loc id trt time;
       model y=trt | time;
       random loc;
       repeated / type=ar(1) subject=id(loc);

    proc glimmix;
       class loc id trt time;
       model y=trt | time;
       random intercept / subject=loc;
       random _residual_ / type=ar(1) subject=id(loc);
When you use GLIMMIX, you will notice it is much fussier about the SUBJECT= option when nested subject structure is present (MIXED is more likely to let you get away with ignoring SUBJECT)

IV. More on R-side: alternative form of random residual
e.g. when time points missing, unsorted etc.
    proc mixed;
       class loc id trt time;
       model y=trt | time;
       random loc;
       repeated time / type=ar(1) subject=id(loc);

    proc glimmix;
       class loc id trt time;
       model y=trt | time;
       random intercept / subject=loc;
       random time / type=ar(1) subject=id(loc) residual;
       /* vs  random _residual_ / type=ar(1) subject=id(loc); */

IV. MIXED to GLIMMIX: Estimate
MIXED: single row ESTIMATE statements
    proc mixed;
       class trt;
       model y=trt a x trt*a trt*x;
       estimate '10 3' trt 1 -1 trt*a 10 -10 trt*x 3 -3;
       estimate '20 3' trt 1 -1 trt*a 20 -20 trt*x 3 -3;
       estimate '30 3' trt 1 -1 trt*a 30 -30 trt*x 3 -3;
GLIMMIX: multi-row with multiplicity adjustment
    proc glimmix;
       class trt;
       model y=trt a x trt*a trt*x;
       estimate '10 3' trt 1 -1 trt*a 10 -10 trt*x 3 -3,
                '20 3' trt 1 -1 trt*a 20 -20 trt*x 3 -3,
                '30 3' trt 1 -1 trt*a 30 -30 trt*x 3 -3 / adjust=scheffe;

IV. MIXED vs. GLIMMIX: LSMEANS
Example: Factorial
    PROC MIXED;
       class A B;
       model y=A|B;
       lsmeans A B/diff;                /* gives you table of all possible differences */
       lsmeans A*B/diff slice=(A B);    /* tests, but does not estimate, simple effects A given B, vice versa */

    PROC GLIMMIX;
       class A B;
       model y=A|B;
       lsmeans A B/diff lines;                     /* gives multiple range display users love */
       lsmeans A*B / slice=(A B) slicediff=(A B);  /* restricts A*B diffs to actual simple effects, e.g. A1-A2|Bj */

IV. GLIMMIX: LSMEANS (1) Main Effects
    proc glimmix data=AxB_example;
       class block A B;
       model y=A|B/ddfm=satterth;
       random block block*B;
       lsmeans A B/diff lines;
       lsmeans A*B/slicediff=(A B);
    run;
B Least Squares Means
  B  Estimate  Standard Error  DF     t Value  Pr > |t|
  1            1.3226          13.69  14.01    <.0001
  2  20.05
  4  21.38
  8  19.13
T Grouping for B Least Squares Means
LS-means with the same letter are not significantly different.
  B  Estimate
  4            A
  2
  8
  1

IV. GLIMMIX – LSMEANS (2) Simple Effects
proc glimmix data=AxB_example;
  class block A B;
  model y=A|B/ddfm=satterth;
  random block block*B;
  lsmeans A B/diff lines;
  lsmeans A*B/slicediff=(A B);
run;
A*B Least Squares Means
A B Estimate Standard Error
r 1 1.4769 2 4 8
s 1 1.4769 2 4 8
Simple Effect Comparisons of A*B Least Squares Means By B
Simple Effect Level A _A Estimate Standard Error DF t Value Pr > |t|
B 1 r s 2.9400 1.3144 16 2.24 0.0399
B 2 2.6400 2.01 0.0618
B 4 -0.15 0.8810
B 8 -0.76 0.4578

IV. GLIMMIX – LSMEANS (3)
lsmeans a*b / diff; gave you this:
Differences of A*B Least Squares Means
A B _A _B Estimate Standard Error DF t Value Pr > |t|
r 1 2 1.8796 19.49 -4.17 0.0005 4 -4.35 0.0003 8 -2.55 0.0192 s 2.9400 1.3144 16 2.24 0.0399 -2.77 0.0121 -4.46 -3.09 0.0060 -0.18 0.8583 3.0400 1.62 0.1219 5.74 <.0001 etc

IV. GLIMMIX -- LSMESTIMATE
Not estimable:
  estimate 'A|B' a*b ;
Must write:
  estimate 'A|B' a 1 -1 a*b ;
New GLIMMIX alternative:
  lsmestimate a*b 'A|B' ;

IV. ODS Graphics With GLIMMIX
Not available with MIXED
ods html;
ods graphics on;
ods select MeanPlot;
proc glimmix data=AxB_example;
  class block A B;
  model y=A|B/ddfm=satterth;
  random block block*B;
  lsmeans A*B/plot=MeanPlot (sliceby=A join cl);
run;
ods graphics off;
ods html close;

Factorial Treatment Design
Treatment Design vs Experiment (or study) Design
Factorial is a type of treatment design
Factor A, a levels; Factor B, b levels; etc.
Main inference tools:
  simple effects; e.g. method effect | variety j
  interaction; i.e. are the simple effects equal for all j?
  main effects

specific form depends on design

GLIMMIX Features
Can estimate / test simple effects or main effects, depending on which is appropriate
ODS graphics can graph / plot effects of interest
SLICE can focus on simple effects in presence of interaction
SLICEDIFF can estimate simple effects of interest

Modeling & Design

But My Study is not a Designed Experiment!
Comparative Study: any study whose purpose is to compare treatments or conditions (includes assessing change over time).
Includes “quasi-experiments” & surveys with comparative objectives + designed experiments.
Design principles apply to all!
Most modeling issues are study design issues
Most modeling errors result from poor understanding of design principles

If you are modeling, you need to understand design principles!!

Key Terms in Design
Treatment Design: factors and levels & how they are structured in the study. E.g. factorial, planned obs over time
Experiment Design: organization of experimental units (e.g. into matched pairs, blocks, strata, clusters); plan by which they are assigned to treatment levels.
Experimental Unit (e.u.): smallest entity to which treatment levels (or treatment combinations) are independently assigned. E.U.s are legitimate units of replication
Sampling Unit (s.u.): unit on which measurement is taken. May be the e.u. itself or a subset of the e.u. A.k.a. pseudo-replicate
Pseudo-replication: use of S.U.s as units of replication; common form of inappropriate design & analysis

Factorial & Experiment Designs
idea: experimental unit is the smallest entity to which a treatment level is independently applied
e.u. may be a different size for different factors
e.g. from SAS for Mixed Models, Section 4.6: 2 type × 3 dose example
  dose applied to cage; type to animal in cage
  e.u. for dose: cage with 2 animals
  e.u. for type (and dose × type): animal
  → split-plot
many variations (including repeated measures)

Adding to Model
[Diagram: Treatment = school participates / does not participate in Prof Devel; within each school, curriculum (std or exp) is applied to classrooms of students.]

V. Factorial Treatment Designs
Basic Features
Come in many (many, many) design forms
Experiment design & “quasi-experiment” or survey “study design” are key to deciding what’s random & what’s fixed
Non-mixed (LM and GLM only) software is UNACCEPTABLE for these types of problems
Includes repeated measures (change... growth)
Normal and non-normal data

Type x Dose Design
[Diagram: Type 1 and Type 2 units within each dose.]
or... Dose = Professional Development Trt, Type = Curriculum

From SAS for Mixed Models
Treatment design: 2 x 2 factorial
Experiment design: many variations
Here are 7 (seven)

Even with 2 x 2 factorial
these seven are not all; we’re just getting started!

Microchip wafer Split Block Example
[Diagram: wafer with Side (L, R) and Position (same meaning on both sides).]

Choosing right model – step 1 What is the experimental unit?

Common Models in PROC MIXED/GLIMMIX
MODEL → treatment design
RANDOM → experiment (study) design

Model for split-plot: school-classroom example

Model for split-plot – Dose x Type example

Conventional ANOVA H a.k.a. between subjects error HH a.k.a.
within subjects error 14 May 2007 SSP Core Facility

Standard errors of various terms
Note: you can use MS() directly except for dose|typej 14 May 2007 SSP Core Facility

Programming in Proc GLIMMIX
proc glimmix;
  class bloc type dose;
  model y=type|dose;
  random intercept dose / subject=bloc;   ** i.e. random bloc bloc*dose;
  lsmeans type*dose / diff lines slicediff=(type dose) slice=(type dose);
  ods output lsmeans=lsm;
run;
LSMEANS options:
  DIFF: all possible mean differences
  LINES: with “MRT lines”
  SLICEDIFF=: simple effect differences only
  SLICE=: simple effect tests only
You can use ODS to output LSMEANS and GPLOT for interaction plots, or use ODS graphics directly

Type x Dose: Selected Output
Covariance Parameter Estimates
Cov Parm    Subject   Estimate   Standard Error
Intercept   block     2.0735     2.7320
dose        block     4.5132     2.8291
Residual              4.3189     1.5270
Type III Tests of Fixed Effects
Effect      Num DF   Den DF   F Value   Pr > F
type        1        16       2.78      0.1151
dose        3        12       13.63     0.0004
type*dose                     2.29      0.1176

Type x Dose LSMeans
type*dose Least Squares Means
type dose Estimate Standard Error DF t Value Pr > |t|
r 1 1.4769 20.23 13.54 <.0001 2 18.85 4 19.08 8 16.79
s 11.55 17.06 19.22 17.47

Type x Dose: “MRT Lines”
T Grouping for type*dose Least Squares Means
LS-means with the same letter are not significantly different.
type dose Estimate s 4 A r 2 8 1 B C
however ...

A Factorial Inference Flowchart
The Prime Directive: Interactions first!!!!!
Interaction?
  Non-ignorable → Interpret Simple Effects
  Negligible → Interpret Main Effects
Full Wheelbarrow

Plots of Differences between Means
LSMEANS allows various plots of mean differences
DiffPlot: plots interval estimates of mean differences
AnomPlot (ANalysis of Means): plots the difference between each treatment and the overall mean
ControlPlot: plots each treatment vs. control (e.g. like the Dunnett test)

SAS for Mean Difference Plots
From Type x Dose example
ods html;
ods graphics on;
ods select AnomPlot DiffPlot;
proc glimmix data=variety_eval;
  class block type dose;
  model y=type|dose/ddfm=satterth;
  random block block*dose;
  lsmeans dose/plot=DiffPlot;
  lsmeans dose/plot=AnomPlot;
  *lsmeans type*dose/plot=DiffPlot;
  *lsmeans type*dose/plot=AnomPlot;
run;
ods graphics off;
ods html close;

SAS for Mean Difference Plots: DIFFPLOT

SAS for Mean Difference Plots: ANoMPLOT

Mean Difference Plots – Control Plots
From SAS for Linear Models – Output
Randomized Complete Block
5 Irrigation Treatments: Flood (control), Basin, Spray, Sprinkler, Trickle
ods html;
ods graphics on;
ods select ControlPlot;
proc glimmix order=data;
  class bloc irrig;
  model fruitwt=irrig;
  random bloc;
  lsmeans irrig/diff=control('flood') plot=controlplot adjust=dunnett;
run;
ods graphics off;
ods html close;

Dunnett-style Control Plot

Back to Type x Dose Data: Interaction Plot

Type x Dose: Simple Effects
SLICE: test only
Tests of Effect Slices for type*dose Sliced By dose
dose Num DF Den DF F Value Pr > F
1 16 5.00 0.0399 2 4.03 0.0618 4 0.02 0.8810 8 0.58 0.4578
Tests of Effect Slices for type*dose Sliced By type
type Num DF Den DF F Value Pr > F
r 3 19.49 8.12 0.0010 s 13.58 <.0001
SLICEDIFF: estimates, etc.
Simple Effect Comparisons of type*dose Least Squares Means By dose
Simple Effect Level type _type Estimate Standard Error DF t Value Pr > |t|
dose 1 r s 2.9400 1.3144 16 2.24 0.0399
dose 2 2.6400 2.01 0.0618
dose 4 -0.15 0.8810
dose 8 -0.76 0.4578

Type x Dose: Simple Effect Estimates by Type
Simple Effect Comparisons of type*dose Least Squares Means By type Simple Effect Level dose _dose Estimate Standard Error DF t Value Pr > |t| type r 1 2 1.8796 19.49 -4.17 0.0005 4 -4.35 0.0003 8 -2.55 0.0192 -0.18 0.8583 3.0400 1.62 0.1219 3.3800 1.80 0.0876 type s -4.33 -6.02 <.0001 -4.65 0.0002 -1.69 0.1066 -0.32 0.7530 2.5800 1.37 0.1855 14 May 2007 SSP Core Facility

Effect of dose?
Dose levels equally spaced on the log scale → use Log(Dose):
  contrast 'logdose linear' dose ;
  contrast 'logdose quad' dose ;
  contrast 'logdose cubic' dose ;
  contrast 'type x linear' dose*type ;
  contrast 'type x quad' dose*type ;
  contrast 'type x cubic' dose*type ;
otherwise.....
  contrast 'dose linear' dose ;
  contrast 'dose quad' dose ;
  contrast 'dose cubic' dose ;
  contrast 'type x linear' dose*type ;
  contrast 'type x quad' dose*type ;
  contrast 'type x cubic' dose*type ;
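The contrast coefficients did not survive on the slide. As a rough illustration of where such coefficients come from, the sketch below (plain Python, outside SAS) builds orthogonal polynomial contrasts by Gram-Schmidt for the dose levels 1, 2, 4, 8 used in the Type x Dose output. The function name `orthogonal_contrasts` is hypothetical, and the coefficients are unscaled, so they are not necessarily the ones used in the course.

```python
import math

def orthogonal_contrasts(levels, max_degree):
    """Unscaled orthogonal polynomial contrasts via Gram-Schmidt.

    levels: numeric factor levels (equally or unequally spaced).
    Returns one coefficient list per degree 1..max_degree.
    """
    n = len(levels)
    basis = [[1.0] * n]  # degree 0: the constant vector
    contrasts = []
    for degree in range(1, max_degree + 1):
        v = [float(x) ** degree for x in levels]
        # Remove the components along all lower-degree vectors.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        basis.append(v)
        contrasts.append(v)
    return contrasts

doses = [1, 2, 4, 8]
# On the log scale the doses are equally spaced (0, 1, 2, 3 in log base 2).
log_contrasts = orthogonal_contrasts([math.log2(d) for d in doses], 3)
# On the raw scale the spacing is unequal, so the coefficients differ.
raw_contrasts = orthogonal_contrasts(doses, 3)

for name, coefs in zip(("linear", "quad", "cubic"), log_contrasts):
    print("logdose", name, [round(c, 3) for c in coefs])
```

On the log scale the linear and quadratic contrasts are proportional to the familiar -3 -1 1 3 and 1 -1 -1 1; on the raw dose scale the linear contrast is simply dose minus its mean.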

LogDose contrast results
Contrasts Num Den Label DF DF F Value Pr > F logdose linear logdose quad logdose cubic type x linear type x quad type x cubic

Direct Regression – borrow from ANCOVA
proc glimmix data=variety_eval;
  class block type dose;
  model y=type logdose(type) ld_sq(type) / noint ddfm=satterth solution;
  random intercept dose / subject=block;
  contrast 'equal quad by type?' ld_sq(type) 1 -1;
run;
Contrasts
Label                  Num DF   Den DF   F Value   Pr > F
equal quad by type?    1        17       0.04      0.8497
Solutions for Fixed Effects
Effect type Estimate Standard Error DF t Value
r 1.4204 19.62 14.21 s 11.98 logdose(type) 9.8890 2.0181 21.45 4.90 5.44 ld_sq(type) 0.6447 -4.35 -4.16
can re-fit with LD_SQ common to both types

Example 3
From SAS for Mixed Models, Section 4.7
4 “conditions”, 3 diets
Condition applied in incomplete block design: 2 conditions per block
Diet applied to cages within condition
Condition is whole plot, diet is split-plot

“Plot plan”
[Diagram: layout of diet 1, diet 2, diet 3 within conditions and blocks.]

Model?
blocking? yes
e.u. with respect to condition: “1/2 block”
e.u. with respect to diet: “1/3 condition e.u.”
e.u. w.r.t. cond x diet: same as diet

SAS Program
data & program: file ch4-ex3.sas
proc glimmix data=fix2;
  class cage condition diet;
  model gain=condition diet condition*diet/ddfm=satterth;
  random intercept condition / subject=cage;
run;

Selected Output
Type III Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
condition 3 23.61 2.71 0.0677 diet 2 20.17 0.93 0.4090 condition*diet 6 1.73 0.1661
Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error
Intercept cage 3.0376 5.0791 condition . Residual 8.7672
How should one deal with a negative variance component estimate?
  revert to ANOVA via PROC GLM?
  in MIXED, use the NOBOUND option?
  in GLIMMIX, use LowerB
  alternatively, redefine the model: it may be CS, with plots in a block negatively correlated

Comparison with SAS Proc GLM
proc glm data=fix2;
  class cage condition diet;
  model gain=cage condition cage*condition diet condition*diet;
  random cage cage*condition/test;
  lsmeans condition diet condition*diet;
Tests of Hypotheses for Mixed Model Analysis of Variance
Source DF Type III SS Mean Square F Value Pr > F
cage * condition Error
  Error: MS(cage*condition)
  * This test assumes one or more other fixed effects are zero.
cage*condition * diet condition*diet
  Error: MS(Error)

More GLM output
Least Squares Means
condition gain LSMEAN: Non-est Non-est Non-est Non-est
diet gain LSMEAN: normal restrict suppleme
condition diet gain LSMEAN: normal Non-est restrict Non-est suppleme Non-est normal Non-est restrict Non-est suppleme Non-est normal Non-est restrict Non-est suppleme Non-est normal Non-est restrict Non-est suppleme Non-est
DON’T use Proc GLM with mixed models!
non-estimability results from an inappropriate definition of estimability (based on fixed & random effects)
inescapable consequence of Proc GLM with mixed model

GLM vs MIXED issues
REML default: negative variance component estimates are set to 0
  if BLOCK affected, type I error rate ↑
  if error term affected, power may ↓
  better to allow negative estimates
    In MIXED: NOBOUND or METHOD=TYPE3
    In GLIMMIX: LowerB
  vs. GLM, which uses the implied MS regardless
GLM: inappropriate NON-EST is an artifact of the incomplete block design
Standard errors for means and many simple effects (including SLICE) are incorrect in GLM (no fix!!)

GLIMMIX Option (1) – Like NOBOUND in MIXED
proc glimmix data=fix2;
  class cage condition diet;
  model gain=condition|diet/ddfm=kr;
  random intercept condition / subject=cage;
  parms / lowerb=(1e-4,-10,1e-4);
run;
Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error
Intercept cage 5.0288 4.7149 condition 4.8693 Residual
Type III Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
condition 3 4.718 4.31 0.0798 diet 2 16 0.82 0.4561 condition*diet 6 1.52 0.2333

GLIMMIX Option (2) – is it really correlation?
proc glimmix data=fix2;
  class cage condition diet;
  model gain=condition|diet/ddfm=kr;
  random intercept / subject=cage;
  random _residual_ / type=cs subject=condition*cage;
run;
Covariance Parameter Estimates
Cov Parm Subject Estimate
Intercept cage 5.0271 CS cage*condition Residual
Type III Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
condition 3 4.717 4.31 0.0798 diet 2 16 0.82 0.4561 condition*diet 6 1.52 0.2334
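The near-identical tests from Options (1) and (2) reflect the fact that a random effect with a possibly negative "variance" and an R-side compound symmetry (CS) covariance describe the same marginal structure. A minimal sketch in plain Python (outside SAS), with hypothetical parameter values rather than the estimates from this example:

```python
def cs_matrix(sigma_cs, sigma2, n):
    """Compound-symmetry covariance: sigma_cs everywhere, plus sigma2 on the diagonal."""
    return [[sigma_cs + (sigma2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def cs_is_valid(sigma_cs, sigma2, n):
    """A CS matrix has eigenvalues sigma2 (n-1 times) and sigma2 + n*sigma_cs.

    It is a proper covariance matrix when both are positive, which
    allows sigma_cs to be negative down to -sigma2/n.
    """
    return sigma2 > 0 and sigma2 + n * sigma_cs > 0

def cs_correlation(sigma_cs, sigma2):
    """Correlation between two observations in the same unit."""
    return sigma_cs / (sigma_cs + sigma2)

# Hypothetical values: 3 diets per condition-by-cage unit,
# negative covariance among them.
n_diets = 3
sigma_cs, sigma2 = -2.0, 9.0

V = cs_matrix(sigma_cs, sigma2, n_diets)
print(V)
print(cs_is_valid(sigma_cs, sigma2, n_diets))   # still a legitimate covariance
print(cs_correlation(sigma_cs, sigma2))         # negative within-unit correlation
```

Read this way, a negative variance component estimate is not nonsense: it says that units within the same whole plot are negatively correlated.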

Modeling Change over Time
Regression over time
Latent growth / change models
Random coefficients over time
Repeated measures experiment
Longitudinal Data

From Acock – BMI Data
Note: my sample differs from Acock’s, so the numbers won’t match

Basic Growth Model
Simplest model involves slope & intercept
In “Stat-speak” this is just linear regression

Basic Growth Model in SAS
in PROC GLM
proc glm;
  model bmi=year;
run;
Parameter Estimate Standard Error t Value Pr > |t|
Intercept 38.44 <.0001 year 4.44
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 19.68 <.0001 Error 229 Corrected Total 230
R-Square Coeff Var Root MSE bmi Mean
very deceptive (more shortly)

Growth Model in SAS - II
in PROC GLIMMIX
proc glimmix;
  class id;
  model bmi=year/solution;
  random _residual_ /subject=id;
  estimate 'y-hat in 1997' intercept 1 year 0 / cl;
  estimate 'y-hat in 2000' intercept 1 year 3 / cl;
  estimate 'y-hat in 2003' intercept 1 year 6 / cl;
run;
selected output next page

Basic Growth Model – Selected GLIMMIX Output
Covariance Parameter Estimates
Cov Parm Estimate Standard Error
Residual (VC) 2.0558
Note: residual VC est = MSE from GLM ANOVA
Solutions for Fixed Effects
Effect Estimate Standard Error DF t Value Pr > |t|
Intercept 0.5563 32 38.44 <.0001 year 0.6844 0.1543 197 4.44
Estimates
Label Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
y-hat in 1997 0.5563 197 38.44 <.0001 0.05 y-hat in 2000 0.3086 75.95 y-hat in 2003 45.82

G/C Model – Issue I – Account for ID
Recall: R² for the Basic Growth Model was very low
You must account for variation among subjects (ID)
okay:
  proc glm;
    class id;
    model bmi=id year;
  run;
better:
  proc glimmix;
    class id;
    model bmi=year/solution;
    random id;   /* or random intercept / subject = id */

Selected Output
from GLM: R-Square
vs. from GLIMMIX:
Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error
Intercept id 4.4950 Residual 5.1293 0.5168
Solutions for Fixed Effects
Effect Estimate Standard Error DF t Value Pr > |t|
Intercept 0.7712 32 27.73 <.0001 year 0.6844 197 9.19
estimates don’t change; std errors do

Growth Change Modeling Issue - II
Correlated Errors
Recall: correlation is modeled by the covariance model
Failure to model correlation increases P{type I error}
Over-modeling correlation decreases power

Covariance models

More covariance models

Issues in Repeated Measures
Impact of covariance structure?
Selection of appropriate covariance?
Bias in std errors, test statistics
Degrees of freedom
Nonlinear models over time
Non-normal errors

Basic G/C Model with Covariance Model
Also known as Autocorrelation
proc glimmix;
  class id;
  model bmi=year / solution ddfm=kr;
  random intercept / subject=id;
  random _residual_ / subject=id type=ar(1);
run;
DDFM=KR: degree of freedom and std error bias must be dealt with (more later)
Competing covariance models are compared via Fit Statistics: AICC, BIC, HQIC, CAIC

Selected Output for G/C Model w/ Autocorrelation
Covariance Parameter Estimates (variance, covariance & correlation estimates)
Cov Parm    Subject   Estimate   Standard Error
Intercept   id        4.6202
AR(1)                 0.5623     0.1144
Residual              7.7165     1.8981
Fit Statistics (used to assess cov model)
-2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) CAIC (smaller is better) HQIC (smaller is better) Generalized Chi-Square Gener. Chi-Square / DF 7.72
Solutions for Fixed Effects
Effect Estimate Standard Error DF t Value Pr > |t|
Intercept 0.8042 32 26.52 <.0001 year 0.6896 0.1102 197 6.26
estimate: slight effect; std error: bigger effect
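To see what these three covariance parameters imply, the sketch below (plain Python, outside SAS) converts them into the marginal correlation between two BMI measurements on the same subject. It assumes the values as read from the flattened output above (intercept variance 4.6202, AR(1) parameter 0.5623, residual variance 7.7165) and the usual form of this model, Cov = intercept variance + residual variance × ρ^lag; the function name is hypothetical.

```python
def marginal_correlation(lag, var_subject, var_resid, rho):
    """Correlation between two observations on one subject, `lag` time steps apart.

    Random intercept (G-side) contributes var_subject to every pair;
    AR(1) residual (R-side) contributes var_resid * rho**lag.
    """
    covariance = var_subject + var_resid * rho ** lag
    total_variance = var_subject + var_resid
    return covariance / total_variance

# Estimates as read from the output above (assumed, not re-fitted).
VAR_ID, RHO, VAR_RESID = 4.6202, 0.5623, 7.7165

for lag in (1, 2, 3):
    print(lag, round(marginal_correlation(lag, VAR_ID, VAR_RESID, RHO), 3))
```

The correlation decays with lag but levels off at the intraclass correlation, var_subject / (var_subject + var_resid), rather than at zero; that floor is what the random intercept adds to a pure AR(1) model.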

random coeff correl errors prediction add Gender add emotional prob

Repeated Measure Experiments a.k.a. Longitudinal Data
Assign e.u. to treatments
May use any design (completely random, blocked, row-column, split-plot ....)
Observations at planned times
Objectives:
  (1) assess changes in response over time
  (2) assess treatment effect on (1)

Typical repeated Measures Data
from SAS for Linear Models, Chapter 8
and SAS for Mixed Models, 2nd ed, Chapter 5

From BMI Data: Are G/C Curves Equal by Gender?
interaction plot of G/C curve by gender

FYI – SAS Code to Get Interaction Plot
ods html;
ods graphics on;
ods select MeanPlot;
proc glimmix data=bmi_uni_anc;
  class gender id year;
  model bmi=gender|year / solution ddfm=kr;
  random intercept / subject=id(gender);
  random _residual_ / type=ar(1) subject=id(gender);
  lsmeans gender*year / plot=MeanPlot (sliceby=gender join cl);
run;
ods graphics off;
ods html close;

Model translates to:
proc glimmix data=bmi_uni_anc;
  class gender id year;
  model bmi=gender|year / solution ddfm=kr;
  random intercept / subject=id(gender);
  random _residual_ / type=ar(1) subject=id(gender);

Back to SAS for Mixed Models Example

Middle Ground between MANOVA and Split-Plot in Time via Proc GLIMMIX
PROC GLIMMIX;
  CLASSES SUBJ TRT TIME;
  MODEL Y= TRT TIME TRT*TIME;
  RANDOM INTERCEPT / SUBJECT=SUBJ(TRT);
  RANDOM TIME / TYPE=AR(1) SUBJECT=SUBJ(TRT) RESIDUAL;
  *LSMEANS TRT TIME TRT*TIME;
  TITLE 'MIXED - AR(1) ERRORS';
RUN;
RANDOM specifies between subjects effects (G-side)
RANDOM...RESIDUAL specifies within subjects effect (R-side)
in many models, G- and R-side effects are not identifiable

Modeling Covariance among Repeated Measures
PROC MIXED DATA=univ;
  CLASSES SUBJ TRT TIME;
  MODEL Y= TRT TIME TRT*TIME;
  REPEATED TIME / TYPE=UN SSCP SUBJECT=SUBJ(TRT);
  ODS OUTPUT CovParms=cp;
run;
data times;
  do time1=1 to 8;
    do time2=1 to time1;
      dist=time1-time2;
      output;
    end;
  end;
data covplot;
  merge times cp;
proc gplot data=covplot;
  plot adjcorr*dist=time1;
Computes the covariance between pairs of measurements (same subject, different times) based on the sum of squares & cross-products matrix, then plots them by distance

Plot of Covariance by Distance

Idealized Plots
[Plots of covariance by distance for: CS (= random Subj(Trt)); AR(1) only; AR(1) + Subj(Trt).]

Model Fitting Criteria in Version 8
1. Compound Symmetry
proc glimmix;
  classes subj trt time;
  model y= trt time trt*time;
  random time / residual type=cs subject=subj(trt);
  title 'mixed - compound symmetry';
Fit Statistics
-2 Res Log Likelihood        839.39
AIC (smaller is better)      843.39
AICC (smaller is better)     843.47
BIC (smaller is better)      845.75
CAIC (smaller is better)     847.75
HQIC (smaller is better)     844.02
Generalized Chi-Square       767.61
Gener. Chi-Square / DF       4.80
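The fit statistics above are all penalized versions of the same -2 Res Log Likelihood. The sketch below (plain Python, outside SAS) reproduces the printed values from the standard penalty formulas. Three inputs are assumptions that do not appear on the slide: d = 2 covariance parameters (CS), s = 24 subjects, and n* = 160 (observations minus fixed-effect rank). They were back-solved so that the formulas match the printed numbers.

```python
import math

def fit_statistics(neg2_res_ll, d, s, n_star):
    """Information criteria as penalized -2 (restricted) log likelihood.

    d: number of covariance parameters counted in the penalty
    s: number of subjects
    n_star: effective number of observations
    """
    return {
        "AIC": neg2_res_ll + 2 * d,
        "AICC": neg2_res_ll + 2 * d * n_star / (n_star - d - 1),
        "BIC": neg2_res_ll + d * math.log(s),
        "CAIC": neg2_res_ll + d * (math.log(s) + 1),
        "HQIC": neg2_res_ll + 2 * d * math.log(math.log(s)),
    }

# -2 Res Log Likelihood is from the slide; d, s and n_star are assumed.
stats = fit_statistics(839.39, d=2, s=24, n_star=160)
for name, value in stats.items():
    print(f"{name:5s} {value:.2f}")
```

Because every criterion starts from the same likelihood, models with the same fixed effects differ only in how heavily each criterion charges for extra covariance parameters, which is why CAIC and BIC lean toward simpler covariance models than AIC.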

Comparison of Models Smaller is Better
Compound Symmetry: Neg2LogLike Parms AIC AICC HQIC BIC CAIC
AR(1) + Subj(TRT) random effect: Neg2LogLike Parms AIC AICC HQIC BIC CAIC
Unstructured: Neg2LogLike Parms AIC AICC HQIC BIC CAIC
ANTE(1): Neg2LogLike Parms AIC AICC HQIC BIC CAIC
TOEP: Neg2LogLike Parms AIC AICC HQIC BIC CAIC

How do Model Fitting Criteria Compare?
Guerin & Stroup (2000) compared AIC, BIC, HQIC, CAIC for simulated AR(1) and ARH(1) data
CAIC tends to select simpler models
AIC tends to select the most complex models *
complex -- AIC > HQIC > BIC > CAIC -- simple
Model too simple (correlation model not adequate) → Type I error rate too high
Model too complex (correlation over-modeled) → Type I error control not affected, but power suffers
* Since 2000, SAS added AICC to address the AIC issue
Best choice depends on severity of Type I vs. Type II error

An Inference Issue
CS: Type 3 Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F: TRT TIME <.0001 TRT*TIME
AR(1)+between subj: Type 3 Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F: TRT TIME <.0001 TRT*TIME
UN: Type 3 Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F: TRT TIME <.0001 TRT*TIME
UN is similar to MANOVA, but the MANOVA Trt*Time p-value was 0.50

Bias & Options for Adjusting
SAS default uses estimated (co)variance components in V
  std errors biased downward; t- and F-statistics biased upward
“Robust” (a.k.a. “sandwich”) estimate of K’V^(-1)K available using the EMPIRICAL option in MIXED
Kenward & Roger (Biometrics, 1997) proposed an adjustment; available using the DDFM=KR option in the MODEL statement of MIXED
Guerin & Stroup (2000) evaluated the KR option of SAS Version 8 with simulated AR(1) and ARH(1) data
Biased F resulted in inflated Type I error rates unless the KR option was used (for α=0.05, rejection rates >0.10 for TYPE=AR(1), up to 0.20 with TYPE=ANTE(1), UN)

Sandwich (“Robust”) Estimator

How does the sandwich estimator perform?
proc glimmix empirical;
  classes subj trt time;
  model y=trt time trt*time;
  random intercept/ subject=subj(trt);
  random time / type=ar(1) subject=subj(trt) residual;
run;
Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F: TRT TIME <.0001 TRT*TIME <.0001
vs. F=1.48; p=0.0921 using default

Kenward and Roger
proc glimmix;
  classes subj trt time;
  model y= trt time trt*time/ddfm=kr;
  random intercept / subject=subj(trt);
  random time / type=ar(1) subject=subj(trt) residual;
Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F: TRT TIME <.0001 TRT*TIME

Alternative KR adjustment
in SAS, the KR adjustment uses the Hessian matrix by default
you can cause it to use the Information matrix instead
no documented advantage one way or another
PROC glimmix scoremod scoring=51;
  CLASSES SUBJ TRT TIME;
  MODEL Y= TRT TIME TRT*TIME/ddfm=kr;
  RANDOM intercept / subject=SUBJ(TRT);
  Random _residual_ / TYPE=AR(1) SUBJECT=SUBJ(TRT);
  nloptions technique=nrridg;
Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F: TRT TIME <.0001 TRT*TIME
vs. F=1.24, p= using Hessian

Alternative Model for Change in BMI by Gender
proc glimmix data=bmi_uni_anc;
  class gender id year;
  model bmi=gender yr(gender) / noint solution ddfm=kr;
  random intercept / subject=id(gender);
  random _residual_ / type=ar(1) subject=id(gender);
  contrast 'male vs female intercept' gender 1 -1;
  contrast 'male vs female slope' yr(gender) 1 -1;
run;

Covariance Parameter Estimates Solutions for Fixed Effects
Selected Output Contrasts Label Num DF Den DF F Value Pr > F male vs female intercept 1 165.9 3.89 0.0501 male vs female slope 204.5 1.57 0.2111 Covariance Parameter Estimates Cov Parm Subject Estimate Intercept id(gender) AR(1) 0.2928 Residual 7.8871 Solutions for Fixed Effects Effect gender Estimate Standard Error DF t Value Pr > |t| 0.6084 165.9 33.20 <.0001 1 0.5596 39.01 yr(gender) 0.7860 204.5 9.58 0.6462 8.56 14 May 2007 SSP Core Facility

Alternative Model
proc glimmix data=bmi_uni;
  class gender id;
  model bmi=gender year(gender) / noint solution ddfm=kr;
  random intercept year(gender) / subject=id type=un;
  contrast 'male vs female intercept' gender 1 -1;
  contrast 'male vs female slope' year(gender) 1 -1;
run;
This is a random coefficient model (next section)

Response Surface Split Plot with Repeated Measures
4 treatment factors (A, B, C, D), 2 levels each
3 factors (A, B, C) applied to P (subject)
treatment design: central composite design
subjects split into 2 sub-units
level of D randomly assigned to each sub-unit
observations at 3 planned times (H)

Central Composite Design

Model for Central Composite Split-Split Plot

SAS Statements
proc glimmix;
  class ca cb cc p d u;
  *model y=a b c a*a b*b c*c a*b a*c b*c d d*a d*b d*c t t*t t*a t*b t*c t*d/htype=1 htype=3 ddfm=kr;
  model y=d a(d) b(d) c(d) a*a b*b c*c a*b a*c b*c t(d) t*t t*a t*b t*c / noint solution htype=1 ddfm=kr;
  random p(ca cb cc) d*p(ca cb cc);

Solutions for Fixed Effects Covariance Parameter Estimates
Standard Error 2.3344 1 a(d) 1.8101 b(d) c(d) 4.4019 3.5352 a*a 0.4980 3.2427 b*b c*c 5.1647 a*b 6.2083 1.8872 a*c b*c 1.2083 t(d) 9.4200 0.5504 t*t 1.1114 a*t 0.1160 0.5078 b*t 1.7331 c*t 0.3513 Key output Covariance Parameter Estimates Cov Parm Subject Estimate Intercept p(ca*cb*cc) d 4.5151 Residual Fit Statistics AICC (smaller is better) 573.40 14 May 2007 SSP Core Facility

Complex Split-split-plot revisited
Recall: A, B, C applied to units P
P split in two, levels of D applied to each half
Measured at 3 times
Previous analysis assumed a split on time
Actually repeated measures
Split-plot + repeated measures

CCD Split-plot + repeated measures
proc glimmix data=CCD_SpltPlt;
  class ca cb cc p d u;
  *model y=a b c a*a b*b c*c a*b a*c b*c d d*a d*b d*c t t*t t*a t*b t*c t*d/htype=1 htype=3 ddfm=kr;
  model y=d a(d) b(d) c(d) a*a b*b c*c a*b a*c b*c t(d) t*t t*a t*b t*c / noint solution htype=1 ddfm=kr;
  random intercept / subject=p(ca cb cc);
  random _residual_ / type=sp(pow)(t) subject=d*p(ca cb cc);
run;
AICC: 573.4 as split-split-plot; 551.1 as repeated measures using SP(POW)
note: SP(POW) is the generalization of AR(1) for unequally spaced times
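The relationship between SP(POW) and AR(1) can be checked numerically. In the sketch below (plain Python, outside SAS) the correlation between two observations is ρ raised to the distance between their time points; with unit spacing this is exactly the AR(1) pattern. The function names and time values are illustrative only.

```python
def sp_pow_corr(times, rho):
    """Spatial power correlation: rho ** |t_i - t_j| for actual time values."""
    return [[rho ** abs(ti - tj) for tj in times] for ti in times]

def ar1_corr(n, rho):
    """AR(1) correlation: rho ** |i - j| for n equally spaced occasions."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

rho = 0.5

# Equally spaced times: SP(POW) and AR(1) give the same matrix.
equal = sp_pow_corr([1, 2, 3], rho)
print(equal == ar1_corr(3, rho))

# Unequally spaced times: the gap of 3 between the last two occasions
# is honored, which AR(1) on the occasion index would ignore.
unequal = sp_pow_corr([0, 1, 4], rho)
for row in unequal:
    print([round(r, 4) for r in row])
```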

Unreplicated Split-Plot
SAS for Mixed Models, Section 16.7
Quilt divided in half
Each “half sheet” received a 2 x 2 x 3 factorial:
  2 pH levels (low, high)
  2 temp (cold, hot)
  3 dry cycles (air, machine-delicate, machine-normal)
Material cut from each unit washed 10, 20, 30, 40, 50 times
Breaking strength monitored
Materials observed, so reps by sheet lost

Model for Breaking Strength Experiment
y_ijklm = mu_ijkl + r_m + w_ijkm + e_ijklm, where
mu_ijkl is the mean of the ijk-th pH × water temperature × dry cycle combination (i=8,10; j=35,55; k=air, delicate, normal) at the l-th time of washing (l=10, 20, 30, 40, 50),
r_m is the effect of the m-th block (m=1,2 in the design, but m=1 only in the data),
w_ijkm is the ijkm-th between subjects (or whole-plot) error effect,
e_ijklm is the within subjects (or split-plot) error effect

ANOVA for Breaking Strength Experiment
Source of Variation d.f.
block 1 pH (P) wash temp (T) dry cycle (D) 2 PT PD TD PTD between subject error 11
no. of washes (W) 4 WP WT WD 8 WPT WPD WTD WPTD within subjects error 48
but these become when blocking by “half quilt” distinction lost

Breaking Strength vs # Washes by pH

Breaking Strength vs # Washes by Temp

Breaking Strength vs # Washes by Dry Cycle

Revised ANOVA Pool negligible effects to get between & within error
Source of Variation d.f.
pH (P) 1 wash temp (T) dry cycle (D) 2 between subject error 7
linear effect of no. of washes (W Lin) W Lin × P W Lin × T W Lin × D within subjects error 43

GLIMMIX Program for Breaking Strength Experiment
proc glimmix data=shellie;
  class pH water_temp dry_cycle;
  model breaking_strength=pH water_temp dry_cycle w w*pH w*water_temp w*dry_cycle / solution;
  random pH*water_temp*dry_cycle;
  contrast 'air vs dryer effect on wear' w*dry_cycle ;
  contrast 'delicate v normal effect on wear' w*dry_cycle ;
run;

Revised GLIMMIX - Estimate Regression over # of Washes
proc glimmix data=shellie;
  class pH water_temp dry_cycle;
  model breaking_strength= w(pH) w(water_temp) w(dry_cycle)/noint solution;
  random pH*water_temp*dry_cycle;
  estimate 'slope: ph 8, cold, air' w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 1 0 0;
  estimate 'slope: ph 8, cold, delicate' w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 0 1 0;
  estimate 'slope: ph 8, cold, normal' w(ph) 1 0 w(water_temp) 1 0 w(dry_cycle) 0 0 1;
  estimate 'slope: ph 8, hot, air' w(ph) 1 0 w(water_temp) 0 1 w(dry_cycle) 1 0 0;
  estimate 'slope: ph 8, hot, delicate' w(ph) 1 0 w(water_temp) 0 1 w(dry_cycle) 0 1 0;
etc. for all pH – temp – dry cycle combinations

Regression – Selected Output
Label Estimate Standard Error slope: ph 8, cold, air slope: ph 8, cold, delicate slope: ph 8, cold, normal slope: ph 8, hot, air slope: ph 8, hot, delicate slope: ph 8, hot, normal slope: ph 10, cold, air slope: ph 10, cold, delicate slope: ph 10, cold, normal slope: ph 10, hot, air slope: ph 10, hot, delicate slope: ph 10, hot, normal avg slope: ph 8 avg slope: ph 10 avg slope: cold water avg slope: hot water avg slope: air dry avg slope: delicate dry avg slope: normal dry Solution for Fixed Effects Effect water temp Dry cycle pH Estimate Standard Error Intercept 0.1070 14 May 2007 SSP Core Facility

Prediction & Inference Space

VI. Prediction, “BLUP” and Inference Space
Estimation vs. Prediction
When “BLUP” is a good thing
Inference Space: what is it? how can we use it?
Performance evaluation issues
Multi-location issues

Estimation, Prediction, and Inference Space
Estimation is based on estimable functions
Estimation applies to fixed effects only; inference is to the entire population
Prediction is based on “predictable functions”
Prediction applies to fixed & random effects; narrows the scope of inference to the specific subset defined by M’u
Examples: locations, workers, teachers, patients...

Prediction Example 1 Growth Change Modeling Issue - III
Random Coefficients
Recall Basic Growth Model

proc glimmix data=bmi_uni;
  class id;
  model bmi=year / solution ddfm=kr;
  random intercept year / subject=id type=un solution;
  random _residual_ / subject=id type=ar(1);
run;

Selected Output (partial listing)

Covariance Parameter Estimates
  Cov Parm Subject Estimate
  UN(1,1) id
  UN(2,1) 0.5873
  UN(2,2) 0.2676
  AR(1) 0.3024
  Residual 4.6021

Solutions for Fixed Effects
  Effect Estimate Standard Error t Value
  Intercept 0.6480 32.96
  year 0.6870 0.1212 5.67

Solution for Random Effects (partial listing)
  Effect Subject Estimate Std Err Pred DF
  Intercept id 73 2.1023 1.3487 165
  year 0.3118
  id 281
  id 496

You can obtain Subject-Specific Estimates
proc glimmix data=bmi_uni;
  class id;
  model bmi=year / solution ddfm=kr;
  random intercept year / subject=id type=un solution;
  random _residual_ / subject=id type=ar(1);
  estimate 'popn avg slope' year 1 / cl;
  estimate 'id (73) specific slope' year 1 | year 1 / subject 1 0 cl e;
  estimate 'id (496) specific slope' year 1 | year 1 / subject cl;
  estimate 'popn avg intercept' intercept 1 / cl;
  estimate 'predicted bmi in 1997' intercept 1 year 0 / cl;
  estimate 'id (73) specific intercept' intercept 1 | intercept 1 / subject 1 0 cl e;
  estimate 'id (496) specific intercept' intercept 1 | intercept 1 / subject cl;
  estimate 'predicted bmi in 2000' intercept 1 year 3 / cl;
  estimate 'id (73) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject 1 0 cl;
  estimate 'id (496) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject cl;
  estimate 'predicted bmi in 2003' intercept 1 year 6 / cl;
  estimate 'id (73) specific 2003 bmi' intercept 1 year 6 | intercept 1 year 6 / subject 1 0 cl;
  estimate 'id (496) specific 2003 bmi' intercept 1 year 6 | intercept 1 year 6 / subject cl;
run;

Best Linear Unbiased Prediction
Look closer at Estimate statement

  estimate 'popn avg slope' year 1 / cl;
  estimate 'id (73) specific slope' year 1 | year 1 / subject 1 0 cl e;
  estimate 'id (496) specific slope' year 1 | year 1 / subject cl;
  estimate 'predicted bmi in 2000' intercept 1 year 3 / cl;
  estimate 'id (73) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject 1 0 cl;
  estimate 'id (496) specific 2000 bmi' intercept 1 year 3 | intercept 1 year 3 / subject cl;

Coefficients to right of vertical bar ( | ) apply to random effects – this is a new idea
BLUP: estimation (prediction) of random effects

Selected Estimates from Random Coeff BMI Model
Label Estimate Standard Error DF Lower Upper
  popn avg slope 0.6870 0.1214 31.57 0.4396 0.9344
  id (73) specific slope 0.5262 0.3833 18.35 1.3303
  id (496) specific slope 0.6146 1.4187
  popn avg intercept 0.6459 31.5
  predicted bmi in 1997
  id (73) specific intercept 1.4916 33.36
  id (496) specific intercept
  predicted bmi in 2000 0.7330 31.99
  id (73) specific 2000 bmi 0.9928 9.56
  id (496) specific 2000 bmi
  predicted bmi in 2003 0.9605 31.84
  id (73) specific 2003 bmi 1.5462 20.15
  id (496) specific 2003 bmi

Inference Space Example II:
Workers and machines
From McLean, Sanders & Stroup (1991, American Statistician)
Also Chapter 6, ex 2, SAS for Mixed Models
2 machines
3 operators (sample from population)
inference can apply to population of workers or specific worker
KEY CONCEPT: Inference Space

Worker-Machine Example: Fixed Effect Inference
proc glimmix;
  class machine operator;
  model y=machine / ddfm=kr;
  random operator machine*operator;
  lsmeans machine / diff;
  /* these ESTIMATE statements give same result as the LSMEANS output */
  estimate 'BLUE - machine 1' intercept 1 machine 1 0;
  estimate 'BLUE - diff' machine 1 -1;
run;

Type III Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F
  machine 1 2 20.26 0.0460
  (based on MS(mach) / MS(Mach*oper))

machine Least Squares Means
  machine Estimate Std Error DF t Value Pr > |t|
  1 0.2467 2.973 206.50 <.0001
  2 210.59

Differences of machine Least Squares Means
  machine _machine Estimate Std Error DF t Value Pr > |t|
  1 2 0.2240 -4.50 0.0460

Worker-Machine Example: Prediction
these statements apply inference to specific workers or worker-machine combinations

  /* machine 1 averaged over ONLY THE WORKERS IN THE STUDY */
  estimate 'BLUP - m1 narrow' intercept 3 machine 3 0 | operator 1 1 1 machine*operator / divisor=3;
  /* diff between machines for workers in study ONLY */
  estimate 'BLUP - diff nrw' machine 3 -3 | machine*operator / divisor=3;
  /* operator 1 averaged over machines */
  estimate 'BLUP - oper 1' intercept 2 machine 1 1 | operator 2 0 0 machine*operator / divisor=2;
  /* operator 1 with machine 1 only */
  estimate 'BLUP - m1 op1' intercept 1 machine 1 0 | operator machine*operator ;
  /* oper-specific difference between machines */
  estimate 'BLUP - diff op1' machine 1 -1 | machine*operator ;

Worker-Machine Example: Prediction (2)
Estimates
  Label Estimate Standard Error DF t Value Pr > |t|
  BLUE - machine 1 0.2467 2.973 206.50 <.0001
  BLUE - diff 0.2240 2 -4.50 0.0460
  BLUP - m1 narrow 6 566.53
  BLUP - diff nrw 0.1272 -7.93 0.0002
  BLUP - oper 1 0.1151 6.698 449.30
  BLUP - m1 op1 0.1724 7.885 297.48
  BLUP - diff op1 0.2567 7.976 -3.42 0.0092

BLUE – inference to population of workers
BLUP – inference to specific worker or set of workers
note impact on standard error

BLUP a.k.a. “Shrinkage Estimator”
Covariance Parameter Estimates
  Cov Parm Estimate
  operator 0.1073
  machine*operator
  Residual

BLUP is regressed toward mean
BLUP is E(u|Y)
Degree of shrinkage depends on variance component estimates
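A minimal worked case of the shrinkage, assuming a balanced one-way random-effects model $y_{ij} = \mu + u_j + e_{ij}$ with $u_j \sim N(0, \sigma_u^2)$, $e_{ij} \sim N(0, \sigma^2)$ and $n$ observations per level. This simplified model is an illustration only, not the worker-machine model itself:

```latex
\hat{u}_j = E(u_j \mid y)
          = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2 / n}\,
            \bigl(\bar{y}_{j\cdot} - \mu\bigr)
```

The multiplier lies between 0 and 1. It approaches 1 (little shrinkage) when $\sigma_u^2$ is large relative to $\sigma^2/n$, and approaches 0 (strong shrinkage toward the mean) when $\sigma_u^2$ is small. In practice $\mu$ and the variance components are replaced by their estimates.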

Relationship to Proc GLM
proc glm;
  class machine operator;
  model y=machine|operator;
  random operator machine*operator / test;
  lsmeans machine operator machine*operator / stderr;
  lsmeans machine / stderr e=machine*operator;
  estimate 'diff' machine 1 -1 / e;
run;

operator y LSMEAN Standard Error 1 vs ,
machine operator y LSMEAN Standard Error 1 vs 51.30,
machine y LSMEAN Standard Error 1
machine y LSMEAN Standard Error 1
std error neither Mixed broad or narrow
produced by estimate “m1” intercept 3 machine 3 0 | operator machine*operator 0 / divisor=3
same as BLUP specific to workers in GLIMMIX

Prediction Example II: Multi-Location Data
From SAS for Mixed Models
9 Locations
3 blocks per location
4 treatments
Major issues
  are blocks fixed or random?
  if random, how does one estimate location-specific treatment effects?

ANOVA (ignoring block)
Test of TRT affected
If Location fixed:

Inference Space

Where does Uncertainty Arise?
[Figure: locations Loc 1, Loc 2, ..., Loc 7, Loc 8]
Only from variation among obs within locations? Locations fixed
Or does variation among locations also contribute? Locations random

Location-Specific Effects: BLUP
Implies linear combination of fixed and random effects (predictable function = BLUP)

Basic SAS Programs
for fixed locations:

proc glimmix data=MultiCenter;
  class location block treatment;
  model response=location treatment location*treatment;
  random block(location);
  lsmeans treatment;
  lsmeans location*treatment / slice=location slicediff=location;
run;

for random locations:

proc glimmix data=MultiCenter;
  class location block treatment;
  model response=treatment / ddfm=KR;
  random location block(location) location*treatment;
  lsmeans treatment / diff;
  estimate 'trt1 vs trt2' treatment ;
  estimate 'loc A vs loc B' | location ;
  estimate 'trt 1 BLUP' intercept 8 treatment 8 | location / divisor=8;
  estimate 'trt1 at loc A blup' intercept 1 treatment | location 1 0 location*treatment 1 0;
  /* etc. - see ch6 MultiCenter.sas for program in detail */

“Take Home” points
Inference space usually implies random locations
“Broad” inference on treatments applies to entire population
Location-specific inference may be of interest
  Requires BLUP
Hans Peter Piepho has proposed mixed-model based measures of commonality among locations
Making locations fixed to maximize error d.f. to test TRT is inappropriate

GLM Issues

VII. “GLM” Issues
Bernoulli data
  as a binomial
  special problems with BINARY data
Counts
Rates

Common Non-Normal Models
Bernoulli (binary) observations
Categorical data
  Binomial
  multinomial
Counts
  Poisson
  Over dispersed (e.g. negative binomial)
Rates
Survival times
  Gamma, Weibull
Dispersion measures
  variance
Contingency tables

Elements of GLM (Generalized Linear Model)
Systematic model: $X\beta$
Assumed distribution
  implied variance structure
Link function
Examples
  $y \sim \text{Bernoulli}(p)$: $p = \Phi(X\beta)$ or $\text{logit}(p) = X\beta$
  $Y \sim \text{Poisson}(\lambda)$: $\log(\lambda) = X\beta$

GLM Example From SAS for Linear Models
Output 10.1, re-expressed in 10.5
Challenger space shuttle data
relate prob{failure} to temperature at launch
DATA: TEMP, TD (# times thermal distress in O-ring), NO_TD

Approach to modeling
Assess relationship between TEMP and Prob{TD=1}, i.e., O-rings show thermal distress
Distribution: Bernoulli
Natural parameter: logit = log[p/(1-p)]
Model: logit(Pr{TD}) = a + b(Temp)
Inverse link form: Pr{TD} = exp[a+b(Temp)] / {1 + exp[a+b(Temp)]}
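The inverse link form can be checked by hand against the ILINK results reported on the “Relevant Output (2)” slide. The sketch below is illustrative only: the data set name `ilink_check` and the variable names are assumptions, and the two logit-scale values are copied from that slide rather than refitted.

```sas
/* Hypothetical check data set: logit-scale estimates copied from the
   'Relevant Output (2)' slide. Names are illustrative assumptions,
   not part of the course files. */
data ilink_check;
  input temp logit_hat;
  /* inverse logit exp(eta)/(1+exp(eta)), written in the
     algebraically equivalent form 1/(1+exp(-eta)) */
  p_hat = 1 / (1 + exp(-logit_hat));
  datalines;
50 3.4348
60 1.1131
;
run;

proc print data=ilink_check noobs;
  format p_hat 6.4;
run;
```

The printed `p_hat` values should agree with the Mean column of that slide (0.9688 at 50 degrees, 0.7527 at 60 degrees), which is what the ILINK option computes.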

SAS Program: Proc GLIMMIX
proc glimmix data=Challenger;
  model td/total=temp;
  estimate 'logit at 50 deg' intercept 1 temp 50 / ilink;
  estimate 'logit at 60 deg' intercept 1 temp 60 / ilink;
  estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
  estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
  estimate 'logit at 70 deg' intercept 1 temp 70 / ilink;
  estimate 'logit at 80 deg' intercept 1 temp 80 / ilink;
run;

Relevant Output

Fit Statistics
  Pearson Chi-Square 11.13
  Pearson Chi-Square / DF 0.80
  (no evidence of overdispersion)

Parameter Estimates
  Effect Estimate Standard Error DF t Value Pr > |t|
  Intercept 7.3786 14 2.04 0.0608
  temp 0.1082 -2.14 0.0500

Relevant Output (2)

Estimates (Estimate and Standard Error: logit scale; Mean and Standard Error Mean: data scale)
  Label Estimate Standard Error DF t Value Pr > |t| Mean Standard Error Mean
  logit at 50 deg 3.4348 2.0232 14 1.70 0.1117 0.9688
  logit at 60 deg 1.1131 1.0259 1.09 0.2962 0.7527 0.1909
  logit at 64.7 deg 0.6576 0.03 0.9738 0.5055 0.1644
  logit at 64.8 deg 0.6518 -0.00 0.9985 0.4997 0.1630
  logit at 70 deg 0.5953 -2.03 0.0618 0.2300 0.1054
  logit at 80 deg 1.4140 -2.50 0.0256

Alternatives
Express data in binomial form
  SAS for Linear Models, 4th ed., output 10.5
Probit link

Logit vs Probit
[Figure. Red: probit; Blue: logit]

Probit Model
proc glimmix data=Challenger;
  model td/total=temp / link=probit solution;
  estimate 'logit at 50 deg' intercept 1 temp 50 / ilink;
  estimate 'logit at 60 deg' intercept 1 temp 60 / ilink;
  estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
  estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
  estimate 'logit at 70 deg' intercept 1 temp 70 / ilink;
  estimate 'logit at 80 deg' intercept 1 temp 80 / ilink;
run;

Probit Output

Parameter Estimates
  Effect Estimate Standard Error DF t Value Pr > |t|
  Intercept 8.7750 4.0286 14 2.18 0.0470
  temp -2.31 0.0364

Fit Statistics
  Pearson Chi-Square 10.98
  Pearson Chi-Square / DF 0.78

Estimates
  Label Estimate Standard Error DF t Value Pr > |t| Mean Standard Error Mean
  logit at 50 deg 2.0201 1.1413 14 1.77 0.0985 0.9783
  logit at 60 deg 0.6692 0.6024 1.11 0.2854 0.7483 0.1921
  logit at 64.7 deg 0.3960 0.09 0.9324 0.5136 0.1579
  logit at 64.8 deg 0.3925 0.05 0.9587 0.5083 0.1566
  logit at 70 deg 0.3244 -2.10 0.0541 0.2477 0.1026
  logit at 80 deg 0.7277 -2.79 0.0144
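For the probit link the inverse link is the standard normal CDF, so the Mean column can be reproduced from the probit-scale estimates with the SAS function PROBNORM. This is a sketch under stated assumptions: the data set name `probit_check` and the variable names are hypothetical, and the two estimates are copied from the Estimates table on this slide.

```sas
/* Hypothetical check data set: probit-scale estimates copied from the
   Estimates table; names are illustrative assumptions. */
data probit_check;
  input temp probit_hat;
  /* PROBNORM is the standard normal CDF, i.e. the inverse probit link */
  p_hat = probnorm(probit_hat);
  datalines;
50 2.0201
60 0.6692
;
run;

proc print data=probit_check noobs;
  format p_hat 6.4;
run;
```

The results should match the reported means, 0.9783 at 50 degrees and 0.7483 at 60 degrees. Note that the ESTIMATE labels still say “logit” even though the estimates are on the probit scale.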

Option 3: Use Binary Data
proc glimmix data=O_Ring;
  /* Careful!! Without DIST=, GLIMMIX uses the normal default: */
  /* model td_bin=temp / solution; */
  model td_bin=temp / dist=binomial link=logit solution;
  estimate 'logit at 50 deg' intercept 1 temp 50 / ilink;
  estimate 'logit at 60 deg' intercept 1 temp 60 / ilink;
  estimate 'logit at 64.7 deg' intercept 1 temp 64.7 / ilink;
  estimate 'logit at 64.8 deg' intercept 1 temp 64.8 / ilink;
  estimate 'logit at 70 deg' intercept 1 temp 70 / ilink;
  estimate 'logit at 80 deg' intercept 1 temp 80 / ilink;
run;

Binary Output

Fit Statistics
  Pearson Chi-Square 23.17
  Pearson Chi-Square / DF 1.10
  (no evidence of overdispersion)

Parameter Estimates
  Effect Estimate Standard Error DF t Value Pr > |t|
  Intercept 7.3786 21 2.04 0.0543
  temp 0.1082 -2.14 0.0438

Estimates
  Label Estimate Standard Error DF t Value Pr > |t| Mean Standard Error Mean
  logit at 50 deg 3.4348 2.0232 21 1.70 0.1043 0.9688
  logit at 60 deg 1.1131 1.0259 1.09 0.2902 0.7527 0.1909
  logit at 64.7 deg 0.6576 0.03 0.9737 0.5055 0.1644
  logit at 64.8 deg 0.6518 -0.00 0.9985 0.4997 0.1630
  logit at 70 deg 0.5953 -2.03 0.0552 0.2300 0.1054
  logit at 80 deg 1.4140 -2.50 0.0209

Binary Data + Random Effects
Binary data in GLM with random effect can be troublesome
Pseudo-likelihood tends to produce biased variance / covariance component estimates
  e.g. variance estimates biased down for small cluster size
Larger sample sizes tend to be required
No overdispersion estimate

Binary GLMM example
courtesy of Oliver Schabenberger
200 subjects
random intercept
logistic link

data binary;
  do subject = 1 to 200;
    ranint = rannor(&seed);
    do i = 1 to &n;
      linp = &b0 + ranint;
      pi = 1/(1 + exp(-linp));
      y = ranbin(0,1,pi);
      output;
    end;
  end;
  drop i;
run;

Binary GLMM
Schabenberger used two programs

proc glimmix data=binary;
  class subject;
  model y(event='1') = / dist=binary link=logit s;
  random intercept / subject=subject;
  ods select ParameterEstimates CovParms;
run;

proc nlmixed data=binary;
  parms s2 1 intercept -1;
  model y ~ binary(1/(1+exp(-intercept+gamma)));
  random gamma ~ normal(0,s2) subject=subject;
  ods select Dimensions ParameterEstimates;
run;

GLIMMIX vs NLMIXED Binary Results
GLIMMIX, cluster size n=4
  Covariance Parameter Estimates
    Cov Parm Subject Estimate Standard Error
    Intercept subject 0.5251 0.1699
  Solutions for Fixed Effects
    Effect Estimate Standard Error DF
    Intercept 199

GLIMMIX, cluster size n=20
  Covariance Parameter Estimates
    Cov Parm Subject Estimate Standard Error
    Intercept subject 0.9905 0.1373
  Solutions for Fixed Effects
    Effect Estimate Standard Error DF
    Intercept 199

NLMIXED, cluster size n=4
  Parameter Estimates
    Parameter Estimate Standard Error DF
    s2 0.8159 0.2718 199
    intercept 0.1085

NLMIXED, cluster size n=20
  Parameter Estimates
    Parameter Estimate Standard Error DF
    s2 1.1512 0.1659 199
    intercept

Diagnostics & Alternative Models
Example using count data
  SAS Linear Models, Output 10.24
Historically, count data assumed ~ Poisson
  Implies mean=variance
In practice, often variance>mean, overdispersion
Requires modification
  scale to correct std error, test statistics for overdispersion
  use different distribution

Basic analysis + model checking
proc glimmix data=a;
  class BLOCK CTL_TRT a b;
  model count=CTL_TRT a b a*b / dist=poisson;
  random intercept / subject=BLOCK;
  output out=check pred=xbeta pred(ilink)=pred residual=r pearson=resid_pearson;
run;

Model checking plots:
  Residuals vs pred
    use std resid or deviance res
    std’ize pred scale
    look for
      unequal scatter (wrong dist or var fct)
      pattern in resid (wrong model or link)
  y* vs. η (xbeta)
    linear or wrong link

data plot;
  merge check;
  adjlamda=2*sqrt(pred);
  ystar=xbeta+(count-pred)/pred;
  absres=abs(resid_pearson);
proc gplot;
  plot resid_pearson*(pred xbeta);
  plot (resid_pearson)*adjlamda;
  plot ystar*xbeta;
  plot absres*adjlamda;
run;

Evidence of Overdispersion
Fit Statistics
  -2 Res Log Pseudo-Likelihood
  Generalized Chi-Square
  Gener. Chi-Square / DF

Gener. chi-square / DF should be ≈ 1
  >1 indicates overdispersion
  <1 indicates underdispersion

Example: plot of residuals x adjlamda

Another look – absolute value resid vs adjlamda

Link? Plot ystar x XBeta
should be linear – no strong evidence of problem

Strategy 1: Adjust using scale parameter

Implementation with GLIMMIX
proc glimmix data=a;
  class BLOCK CTL_TRT a b;
  model count=CTL_TRT a b a*b / dist=poisson htype=1,3;
  random intercept / subject=BLOCK;
  random _residual_;
run;

Selected Output

UnScaled
  Type I Tests of Fixed Effects
    Effect Num DF Den DF F Value Pr > F
    CTL_TRT 1 27 55.83 <.0001
  Type III Tests of Fixed Effects
    Effect Num DF Den DF F Value Pr > F
    CTL_TRT .
    A 2 27 9.19 0.0009
    B 0.06 0.9402
    A*B 4 3.11 0.0315

Scaled
  Type I Tests of Fixed Effects
    Effect Num DF Den DF F Value Pr > F
    CTL_TRT 1 27 16.23 0.0004
  Type III Tests of Fixed Effects
    Effect Num DF Den DF F Value Pr > F
    CTL_TRT .
    A 2 27 2.67 0.0875
    B 0.02 0.9822
    A*B 4 0.90 0.4753

Note discrepancy for CTL_TRT and A main effect
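A rough arithmetic check on these tables, assuming the scale adjustment acts approximately as a single multiplicative overdispersion factor that divides each unscaled F statistic. The symbol $\hat{\psi}$ is introduced only for this illustration:

```latex
\hat{\psi} \approx \frac{F_{\text{unscaled}}}{F_{\text{scaled}}}:
\qquad
\frac{55.83}{16.23} \approx 3.44, \quad
\frac{9.19}{2.67} \approx 3.44, \quad
\frac{3.11}{0.90} \approx 3.46
```

The three ratios agree to within rounding, which is consistent with an estimated scale of roughly 3.4, that is, a variance about 3.4 times what the Poisson assumption implies. The fitted scale estimate itself is not shown in this output.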

Alternative 2: different distribution e.g. Negative Binomial
$\lambda$ is the mean and k is the aggregation parameter
small k: aggregation; k → ∞: Poisson

Negative Binomial with GLIMMIX
proc glimmix data=a;
  class BLOCK CTL_TRT a b;
  model count=CTL_TRT a b a*b / dist=negbin htype=1,3;
  random intercept / subject=BLOCK;
run;

Type I Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F
  CTL_TRT 1 27 10.08 0.0037

Type III Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F
  CTL_TRT .
  A 2 27 3.53 0.0436
  B 0.03 0.9753
  A*B 4 1.02 0.4139

Fit Statistics
  -2 Res Log Pseudo-Likelihood 84.48
  Generalized Chi-Square 28.32
  Gener. Chi-Square / DF 0.94
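For reference, one common parameterization of the negative binomial with mean $\lambda$ and aggregation parameter $k$ is sketched below. It is an assumption that this is the form intended on the preceding slide, and software may instead report a scale parameter equal to $1/k$:

```latex
E(Y) = \lambda, \qquad
\operatorname{Var}(Y) = \lambda + \frac{\lambda^{2}}{k}
```

As $k \to \infty$ the extra term vanishes and the variance equals the mean, the Poisson case. A small $k$ inflates the variance, which is how the model absorbs overdispersion; the Gener. Chi-Square / DF of 0.94 above is close to 1, as expected when the variance is modeled adequately.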

Modeling with Offsets
There are cases when modeling count alone is naive
This occurs when counts are “per unit”
  Number of plants per plot
  Number of patients per county
  Number of students per district
  Number of boating accidents per year per lake
  Number of defects per lot
Accurate model must take units into account
  Essentially, based on log(count/unit)
  Log(count) is link; log(unit) is “offset”

Offset defined
Idea: raw count may be artifact of unit size
Count / unit more informative
Offset
  adjusts for size
  is a regressor whose coefficient is assumed to be 1.0
  used especially in conjunction with Poisson models with log link
  accounts for heterogeneity in rates resulting from difference in size

Modeling with Offsets
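The role of the offset can be written as a short derivation, assuming a Poisson model with log link and writing "size" for the unit in which counts are observed:

```latex
\log\!\left(\frac{E[\text{count}]}{\text{size}}\right) = X\beta
\quad\Longleftrightarrow\quad
\log E[\text{count}] = X\beta + 1.0 \cdot \log(\text{size})
```

Modeling the rate is therefore the same as modeling the count with $\log(\text{size})$ added to the linear predictor as a regressor whose coefficient is fixed at 1.0 rather than estimated.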

Example: Courtesy of Oliver Schabenberger
Some of the data
X is predictor variable
SIZE is the “unit” to be taken into account

  Obs size x count
  1 5001 4.597 4
  2 7550 4.245 76
  3 1744 3.918
  4 1451 3.273
  5 5313 4.140 12
  6 3687 3.438
  7 3022 4.763
  8 8809 4.445
  9 4436 4.191
  10 2621 4.835

Naive Modeling (not accounting for SIZE)
proc glimmix data=test;
  model count = x / s dist=poisson;
  ods select FitStatistics ParameterEstimates;
run;

Fit Statistics
  -2 Log Likelihood 647.12
  AIC (smaller is better) 651.12
  AICC (smaller is better) 651.45
  BIC (smaller is better) 654.50
  CAIC (smaller is better) 656.50
  HQIC (smaller is better) 652.35
  Pearson Chi-Square
  Pearson Chi-Square / DF 28.39

Parameter Estimates
  Effect Estimate Standard Error DF t Value Pr > |t|
  Intercept 2.0978 0.4143 38 5.06 <.0001
  x 0.1002 -0.16 0.8725

Poisson Model with Offset
proc glimmix data=test;
  offs = log(size);
  model count = x / s dist=poisson offset=offs;
  ods select FitStatistics ParameterEstimates;
run;

Fit Statistics
  -2 Log Likelihood 318.41
  AIC (smaller is better) 322.41
  AICC (smaller is better) 322.73
  BIC (smaller is better) 325.79
  CAIC (smaller is better) 327.79
  HQIC (smaller is better) 323.63
  Pearson Chi-Square 347.09
  Pearson Chi-Square / DF 9.13

Parameter Estimates
  Effect Estimate Standard Error DF t Value Pr > |t|
  Intercept 0.5052 38 -14.48 <.0001
  x 0.2247 0.1225 1.83 0.0746

Alternative to Offset??
Could count/size be treated as binomial?

proc glimmix data=test;
  offs = log(size);
  model count = x / s dist=poisson offset=offs;
  output out=gmxout1 pred(ilink)=mu;
  id _xbeta_ offs _linp_;
  ods exclude all;
run;

proc glimmix data=test;
  model count/size = x / s dist=binomial;
  output out=gmxout2 pred(ilink)=prob;
  ods exclude all;
run;

data gmxout2;
  set gmxout2;
  predcount = prob * size;
run;

Compare Poisson/Offset vs Binomial Results
Poisson results (MU = pred count)
  Obs _xbeta_ offs _linp_ mu
  1 9.3321
  2
  3 2.7939
  4 2.0109
  5 8.9468
  6 5.3028
  7 5.8535
  8
  9 7.5561
  10 5.1595

Binomial results
  Obs size x count prob predcount
  1 5001 4.597 4 9.3320
  2 7550 4.245 76
  3 1744 3.918 2.7939
  4 1451 3.273 2.0109
  5 5313 4.140 12 8.9469
  6 3687 3.438 5.3028
  7 3022 4.763 5.8533
  8 8809 4.445
  9 4436 4.191 7.5561
  10 2621 4.835 5.1594

predicted counts nearly identical
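One way to see why the two sets of predicted counts agree so closely, sketched under the assumption that the event probability $p$ per unit of size is small (as here, where a predicted count of about 9.33 corresponds to a size of 5001 for Obs 1):

```latex
\operatorname{logit}(p) = \log p - \log(1 - p) \approx \log p \quad (p \ll 1),
\qquad
\log(\text{size} \cdot p) = \log p + \log(\text{size})
```

For small $p$ the logit and log links nearly coincide, and a Binomial(size, $p$) count is well approximated by a Poisson count with mean $\text{size} \cdot p$. The binomial model for count/size is then close to the Poisson model with offset $\log(\text{size})$. The agreement would weaken if $p$ were not small.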

ZIP and Hurdle Models
Mixture models for count data
  ZIP = “zero-inflated Poisson”
  ZINB = “zero-inflated Negative Binomial”
  in principle, other zero-inflated models limited only by imagination
Accommodate excess zeros
  Excess zeros cause overdispersion
Are not in exponential family
  Cannot be fit with PROC GLIMMIX
  Can be fit using PROC NLMIXED

ZIP Model
[Equation: observation; prob of 0 from Bernoulli process; prob of zero from Poisson process]
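Written out, assuming $\pi$ denotes the probability that an observation comes from the zero-generating Bernoulli process and $\lambda$ the Poisson mean (symbols chosen here for illustration), the ZIP probabilities are:

```latex
P(Y = 0) = \pi + (1 - \pi)\,e^{-\lambda},
\qquad
P(Y = y) = (1 - \pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!},
\quad y = 1, 2, \ldots
```

A zero can arise either from the Bernoulli process or as an ordinary Poisson zero, which is why the two terms are added in $P(Y = 0)$. Taking logs of these two expressions gives the two branches of the log likelihood `ll` in the NLMIXED ZIP program later in this section, with `infprob` in the role of $\pi$.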

Hurdle Model
Two part model
  One process generates zeros
  Another process generates non-zeros
[Equation: observation; zeros from Z process; non-zeros from distribution truncated at zero]
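A sketch of the corresponding probabilities, assuming for illustration that the non-zero part is a zero-truncated Poisson with mean parameter $\lambda$ and that $\pi$ is the probability of a zero (both the Poisson choice and the symbols are assumptions):

```latex
P(Y = 0) = \pi,
\qquad
P(Y = y) = (1 - \pi)\,
  \frac{e^{-\lambda}\lambda^{y} / y!}{1 - e^{-\lambda}},
\quad y = 1, 2, \ldots
```

Unlike the ZIP model, every zero comes from the single zero-generating process, and the count distribution contributes only to positive values.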

ZIP or Hurdle?
Number of doctor visits per year
Number of fish caught by sport fishermen
Cancer mortality

From SAS for Mixed Models, 2nd ed, Ch 15
%let pi = 0.27;
data zip;
  do s = 1 to 100;
    u = rannor(556712);
    do i = 1 to 20;
      x = int(ranuni(0)*100);
      y = int(rannor(0)*100);
      if (ranuni(0) < &pi) then do;
        count = 0;
        lambda = .;
      end;
      else do;
        /* coefficients follow the stated true parameter values b0=-2, b1=b2=0.01 */
        lambda = exp(-2 + 0.01*x + 0.01*y + u);
        count = ranpoi(0,lambda);
      end;
      output;
    end;
  end;
  drop i u lambda;
run;
Credit: Oliver Schabenberger

ZIP Model with Random Effects
proc nlmixed data=zip;
  parameters b0=0 b1=0 b2=0 a0=0 s2u=1;
  /* linear predictor for the inflation probability */
  linpinfl = a0;
  /* infprob = inflation probability for zeros */
  /*         = logistic transform of the linear predictor */
  infprob = 1/(1+exp(-linpinfl));
  /* Poisson mean */
  lambda = exp(b0 + b1*x + b2*y + u);
  /* Build the ZIP log likelihood */
  if count=0 then ll = log(infprob + (1-infprob)*exp(-lambda));
  else ll = log((1-infprob)) + count*log(lambda) - lgamma(count+1) - lambda;
  model count ~ general(ll);
  random u ~ normal(0,s2u) subject=s;
  estimate "inflation probability" infprob;
run;

ZIP NLMIXED Selected Results
true parameter values: b0=-2 b1=b2=0.01 a0= s2u=1

Fit Statistics
  -2 Log Likelihood 2803.6
  AIC (smaller is better) 2813.6
  AICC (smaller is better) 2813.7
  BIC (smaller is better) 2826.7

Parameter Estimates
  Parameter Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper Gradient
  b0 0.1530 99 -13.06 <.0001 0.05
  b1 7.78
  b2 25.78
  a0 0.1594 -6.86
  s2u 1.0828 0.2095 5.17 0.6671 1.4985

Additional Estimates
  Label Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
  inflation probability 0.2510 99 8.38 <.0001 0.05 0.1915 0.3104

GLMM Multi-Clinic Binomial Data
SAS for Linear Models, Output 10.9
also SAS for Mixed Models, Ch 14
from Beitler & Landis, Biometrics, 1985
2 treatments (drug, cntl)
8 clinics, represent population
nij patients observed on trt i at clinic j
yij have favorable response

GLMM for Beitler Landis Data
proc glimmix data=a;
  class clinic trt;
  model fav/nij = trt / dist=binomial link=logit;
  random intercept trt / subject=clinic;
  lsmeans trt / odds;
  estimate 'lsm - cntl' intercept 1 trt 1 0 / ilink;
  estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
  estimate 'diff' trt 1 -1;
  contrast 'diff' trt 1 -1;
run;

Covariance Parameter Estimates
  Cov Parm Subject Estimate
  Intercept clinic 2.0103
  trt

If you drop Clinic x Trt

conditional (SS) model:

proc glimmix data=a;
  class clinic trt;
  model fav/nij = trt / dist=binomial link=logit;
  random intercept / subject=clinic;
  lsmeans trt / odds;
  estimate 'lsm - cntl' intercept 1 trt 1 0 / ilink;
  estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
  estimate 'diff' trt 1 -1;
  contrast 'diff' trt 1 -1;
run;

marginal (PA) model:

proc glimmix data=a;
  class clinic trt;
  model fav/nij = trt / dist=binomial link=logit;
  random _residual_ / type=cs subject=clinic;
  lsmeans trt / odds;
  estimate 'lsm - cntl' intercept 1 trt 1 0 / ilink;
  estimate 'lsm - drug' intercept 1 trt 0 1 / ilink;
  estimate 'diff' trt 1 -1;
  contrast 'diff' trt 1 -1;
run;

Selected Output – Conditional Model
Type III Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F
  trt 1 7 5.98 0.0444

Covariance Parameter Estimates
  Cov Parm Estimate Standard Error
  clinic 2.0327 1.2637

Estimates
  Label Estimate Standard Error DF t Value Pr > |t| Mean Standard Error Mean
  lsm - cntl 0.5586 7 -2.05 0.0793 0.2411 0.1022
  lsm - drug 0.5552 -0.76 0.4720 0.3960 0.1328
  diff 0.2963 -2.45 0.0444

trt Least Squares Means
  trt Estimate Standard Error DF t Value Pr > |t| Odds
  cntl 0.5586 7 -2.05 0.0793 0.3178
  drug 0.5552 -0.76 0.4720 0.6557

GLMM with NLMIXED
1. data step to define indicator for Trt=1 (because NLMIXED lacks CLASS statement)

data a;
  input clinic trt $ fav unfav;
  nij=fav+unfav;
  t1=(trt='drug');

2. then, run NLMIXED

proc nlmixed;
  parms mu=1 tau=0 s2c=2;
  eta=mu+tau*t1+cj;
  pij=exp(eta)/(1+exp(eta));
  model fav~binomial(nij,pij);
  random cj~normal(0,s2c) subject=clinic;
  estimate 'trt effect' tau;
  estimate 'ctl p_hat' exp(mu)/(1+exp(mu));
  estimate 'drug p_hat' exp(mu+tau)/(1+exp(mu+tau));
  estimate 'diff on p_hat scale' exp(mu+tau)/(1+exp(mu+tau)) - exp(mu)/(1+exp(mu));
run;

NLMIXED with CxT term included
first, also define Trt=2 indicator, here denoted t2

proc nlmixed;
  parms mu=1 tau=0 s2c=2 s2ct=0.08;
  eta=mu+tau*t1+cj+c1j*t1+c2j*t2;
  pij=exp(eta)/(1+exp(eta));
  model fav~binomial(nij,pij);
  random cj c1j c2j~normal([0,0,0],[s2c,0,s2ct,0,0,s2ct]) subject=clinic;
  estimate 'trt effect' tau;
  estimate 'ctl p_hat' exp(mu)/(1+exp(mu));
  estimate 'drug p_hat' exp(mu+tau)/(1+exp(mu+tau));
  estimate 'diff on p_hat scale' exp(mu+tau)/(1+exp(mu+tau)) - exp(mu)/(1+exp(mu));
run;

Binary Repeated Measures
2 treatments
20 subjects (animals) per trt
5 times of measurement
response at each measurement 0/1
suggested by companion animal vaccine trials

Several approaches
  GEE using GENMOD
  PQL using %GLIMMIX
    random subj(trt), or CS
  G-H quadrature using NLMIXED
  (not shown) but you could use MIXED
type 1 error control of PQL + random subj(trt) not acceptable
power of PQL/CS or NLMIXED > GEE

various SAS pgm for binary rpt-M data
GEE:

proc genmod;
  class trt animal day;
  model y=trt|day / dist=bin type1 type3;
  repeated subject=animal(trt) / type=exch;

PQL (use one of the two RANDOM statements):

proc glimmix;
  class trt animal day;
  model y=trt|day / dist=binomial link=logit;
  random animal(trt);                               /* PQL, random an(trt) */
  random day / rside type=cs subject=animal(trt);   /* PQL, CS */

NLMixed: next page

NLMixed
data nlmx;
  set univar;
  t1=(trt=1); t2=(trt=2);
  d1=(day=1); d2=(day=2); d3=(day=3); d4=(day=4); d5=(day=5);
proc nlmixed;
  parms mu=1 a1=1 b1=1 b2=1 b3=1 b4=1
        ab11=1 ab12=1 ab13=1 ab14=1 sb2=1;
  eta=mu+a1*t1+b1*d1+b2*d2+b3*d3+b4*d4+
      ab11*t1*d1+ab12*t1*d2+ab13*t1*d3+ab14*t1*d4;
  pi=exp(eta+bse)/(1+exp(eta+bse));
  model y~binary(pi);
  random bse~normal(0,sb2) subject=id;
  contrast 'trt' a1;
  contrast 'day' b1,b2,b3,b4;
  contrast 'trt x day' ab11,ab12,ab13,ab14;
run;

Poisson Repeated Measures
Output, SAS for Linear Models
Leppik, et al (1985); Thall & Vail (1990)
2 treatments
28 patients on trt=0; 31 on trt=1
4 times of measurement
epilepsy: # seizures in 4 test periods
baseline & age covariates

Model for seizure data using GEE
proc genmod data=seizure;
  class id trt time;
  /* this model first */
  *model y=trt time trt*time log_base trt*log_base log_age /
         dist=poisson link=log type1 type3;
  /* then this model */
  model y=trt time log_base(trt) log_age /
        dist=poisson link=log type1 type3;
  repeated subject=id / type=exch corrw;
run;

see SAS file for %GLIMMIX approach

GENMOD to GLIMMIX

using GEE:

proc genmod data=seizure;
  class id trt time;
  model y=trt time log_base(trt) log_age / dist=poisson link=log type1 type3;
  repeated subject=id / type=exch corrw;

equivalent GLIMMIX:

proc glimmix data=seizure;
  class id trt time;
  model y=trt time log_base(trt) log_age / dist=poisson link=log;
  random time / type=cs subject=id residual;

Degrees of Freedom & Standard Errors
Recall Satterthwaite approximation & Kenward-Roger bias adjustment in LMM
Same issues exist with GLMM
  But not nearly as well researched
You can use SATTERTH and KR options in GLIMMIX with non-normal data & non-identity link
  But what do they do?

Power

VIII. Power
Many software packages for power & sample size
  e.g. SAS PROC POWER
  for FIXED effect models only
What if you have “Mixed Model Issues”?
  random effects
  split-plot structure
  errors potentially correlated: longitudinal or spatial data
  any other non-standard model structure
Methods based on PROC GLIMMIX
  adapted from Stroup (2002, JABES)

Mixed Model Background – G, R unknown

Computing Power using SAS
create data set like proposed design (O’Brien: “exemplary data set”)
run PROC GLIMMIX with covariance components fixed
noncentrality parameter $\phi$ = (F computed by GLIMMIX) × rank(K) [or chi-sq with GLM]
  use GLIMMIX to compute $\phi$
critical F ($F_{crit}$) is value s.t. $P\{F(\text{rank}(K), \nu, 0) > F_{crit}\} = \alpha$ [or chi-square]
Power = $P\{F[\text{rank}(K), \nu, \phi] > F_{crit}\}$
SAS functions can compute $F_{crit}$ & Power

Compute Power with GLIMMIX – CRD example
/* step 1 - create data set with same structure as proposed design
   use MU (expected mean) instead of observed Y_ij values */
/* this example shows power for 5, 10, and 15 e.u. per trt */
data crdpwrx1;
  input trt mu;
  do n=5 to 15 by 5;
    do eu=1 to n;
      output;
    end;
  end;
cards;
1 100
2 94
3 90
;

Compute Power with GLIMMIX – CRD example
/* step 2 - use PROC GLIMMIX to compute non-centrality parameters
   for ANOVA tests & contrasts
   ODS statements output them to new data sets */
proc sort data=crdpwrx1;
  by n;
proc glimmix data=crdpwrx1;
  by n;   /* one analysis per sample size, matching the PROC SORT above */
  class trt;
  model mu=trt;
  parms (100) / hold=1;
  contrast 'et1 v et2' trt ;
  contrast 'c vs et' trt ;
  ods output tests3=b;
  ods output contrasts=c;
run;

Type III Tests of Fixed Effects
  Effect Num DF Den DF F Value Pr > F
  trt 2 12 1.27 0.3169

Contrasts
  Label Num DF Den DF F Value Pr > F
  et1 v et2 1 12 0.40 0.5390
  c vs et 2.13 0.1698

/* step 3: combine ANOVA & contrast n-c parameter data sets
   use SAS functions PROBF and FINV to compute power */
data power;
  set b c;
  alpha=0.05;
  ncparm=numdf*fvalue;
  fcrit=finv(1-alpha,numdf,dendf,0);
  power=1-probf(fcrit,numdf,dendf,ncparm);
proc print;

  Obs n Effect NumDF DenDF FValue ProbF Label alpha ncparm fcrit power
  1 5 trt 2 12 1.27 0.3169 0.05
  0.40 0.5390 et1 v et2
  3 2.13 0.1698 c vs et
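The F values for $n = 5$ can be reproduced by hand from the exemplary means 100, 94, 90. This check assumes that the value held by PARMS (100) is the residual variance $\sigma^2 = 100$, and that the two contrasts compare treatment 2 with treatment 3 and treatment 1 with the average of treatments 2 and 3. The contrast coefficients are not shown, so that reading is an assumption. Here $\phi$ denotes the noncentrality parameter, numerator d.f. times F:

```latex
\begin{aligned}
\bar{\mu} &= \tfrac{1}{3}(100 + 94 + 90) \approx 94.67 \\
\phi_{trt} &= \frac{n \sum_i (\mu_i - \bar{\mu})^2}{\sigma^2}
            = \frac{5\,(28.44 + 0.44 + 21.78)}{100} \approx 2.53,
  \qquad F = \frac{\phi_{trt}}{2} \approx 1.27 \\
\phi_{2\,v\,3} &= \frac{(94 - 90)^2}{\sigma^2\,(1/n + 1/n)}
                = \frac{16}{40} = 0.40 \\
\phi_{1\,v\,(2,3)} &= \frac{\bigl(100 - \tfrac{1}{2}(94 + 90)\bigr)^2}
                           {\sigma^2\,\bigl(1/n + 1/(2n)\bigr)}
                    = \frac{64}{30} \approx 2.13
\end{aligned}
```

These agree with the reported F values of 1.27, 0.40 and 2.13. Because the single-d.f. contrasts have numerator d.f. 1, their F values are themselves the noncentrality parameters passed to PROBF.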

More Advanced Example
Plots in 8 x 3 grid
Main variation along 8 “rows”
3 x 2 treatment design
Alternative designs
  randomized complete block (4 blocks, size 6)
  incomplete block (8 blocks, size 3)
  split plot
RCBD “easy” but ignores natural variation

Picture the 8 x 3 Grid Gradient 14 May 2007 SSP Core Facility

SAS Programs to Compare 8 x 3 Design
Split-Plot

data a;
  input bloc trtmnt @@;
  do s_plot=1 to 3;
    input dose @@;
    mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
    output;
  end;
cards;
;
proc glimmix data=a noprofile;
  class bloc trtmnt dose;
  model mu=bloc trtmnt|dose;
  random trtmnt/subject=bloc;        /* whole-plot error: bloc*trtmnt */
  parms (4) (6) / hold=1,2;          /* hold both variance components */
  lsmeans trtmnt*dose / diff;
  contrast 'trt x lin' trtmnt*dose ;
  ods output diffs=b;
  ods output contrasts=c;
run;

8 x 3 - Incomplete Block
data a;
  input bloc @@;
  do eu=1 to 3;
    input trtmnt dose @@;
    mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
    output;
  end;
cards;
;
proc glimmix data=a noprofile;
  class bloc trtmnt dose;
  model mu=trtmnt|dose;
  random intercept / subject=bloc;   /* random block effect */
  parms (4) (6) / hold=1,2;
  lsmeans trtmnt*dose / diff;
  contrast 'trt x lin' trtmnt*dose ;
  ods output diffs=b;
  ods output contrasts=c;
run;

8 x 3 Example - RCBD
data a;
  input trtmnt dose;
  do bloc=1 to 4;
    mu=trtmnt*(0*(dose=1)+4*(dose=2)+8*(dose=3));
    output;
  end;
cards;
;
proc glimmix data=a noprofile;
  class bloc trtmnt dose;
  model mu=bloc trtmnt|dose;
  parms (10) / hold=1;               /* single residual variance, held at 10 */
  lsmeans trtmnt*dose / diff;
  contrast 'trt x lin' trtmnt*dose ;
  ods output diffs=b;
  ods output contrasts=c;
run;

Power for GLMs
2 treatments, P{favorable outcome}: for trt 1 p=0.30; for trt 2 p=0.25
Power if n1=300; n2=600?

data a;
  input trt y n;
datalines;
;
proc glimmix;
  class trt;
  model y/n=trt / chisq;
  ods output tests3=pwr;
run;
data power;
  set pwr;
  alpha=0.05;
  ncparm=numdf*chisq;
  fcrit=cinv(1-alpha,numdf,0);
  power=1-probchi(fcrit,numdf,ncparm);
run;
proc print;
run;
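The data lines for this example are not shown. A minimal sketch of a complete run follows, assuming the exemplary counts are the expected values y = n*p (90 of 300 for trt 1, 150 of 600 for trt 2), by analogy with using MU in place of observed values in the CRD example. The variable p and the name ccrit are illustrative choices, not taken from the slides.

```sas
/* Exemplary data: expected counts y = n*p (an assumption) */
data a;
  input trt p n;
  y = n*p;          /* trt 1: 300*0.30 = 90;  trt 2: 600*0.25 = 150 */
  datalines;
1 0.30 300
2 0.25 600
;
run;

/* Fit the binomial GLM; CHISQ requests a chi-square test of trt */
proc glimmix data=a;
  class trt;
  model y/n = trt / chisq;
  ods output tests3=pwr;
run;

/* Power from the central and noncentral chi-square distributions */
data power;
  set pwr;
  alpha  = 0.05;
  ncparm = numdf*chisq;                       /* as on the slide; numdf = 1 here */
  ccrit  = cinv(1-alpha, numdf, 0);           /* critical value under H0 */
  power  = 1 - probchi(ccrit, numdf, ncparm); /* P{chi-sq(numdf, ncparm) > ccrit} */
run;

proc print data=power;
run;
```

With two treatments the test has one numerator degree of freedom, so the non-centrality parameter is simply the chi-square statistic computed from the exemplary data.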

Power for GLMM
Same trt and sample size per location as before; 10 locations
Var(Location)=0.25; Var(Trt*Loc)=0.125
Variance components: variation in log(OddsRatio)
Power?

data a;
  input trt y n;
  do loc=1 to 10;
    output;
  end;
datalines;
;
proc glimmix data=a initglm;
  class trt loc;
  model y/n = trt / oddsratio;
  random intercept trt / subject=loc;
  random _residual_;
  parms (0.25) (0.125) (1) / hold=1,2,3;
  ods output tests3=pwr;
run;

GLMM Power Analysis Results
Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
1    2     1.286     9   0.884   1.871

Gives you expected Conf Limits for # Locations & N / Loc contemplated

Obs  Effect  NumDF  DenDF  alpha  ncparm  fcrit  power
1    trt            9      0.05

Gives you the power of the test of TRT effect on prob(favorable)

GLMM Power: Impact of Sample Size?
N of subjects per trt per location? N of Locations?
Three cases: n=300/600, 10 loc; n=600/1200, 10 loc; n=300/600, 20 loc

/* n=300/600, 10 loc */
data a;
  input trt y n;
  do loc=1 to 10;
    output;
  end;
datalines;
;
/* n=600/1200, 10 loc */
data a;
  input trt y n;
  do loc=1 to 10;
    output;
  end;
datalines;
;
/* n=300/600, 20 loc */
data a;
  input trt y n;
  do loc=1 to 20;
    output;
  end;
datalines;
;
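The three cases can also be generated and analyzed in one pass. The sketch below is one possible arrangement, not the course's own program: the data set name cases and the variables case, mult, nloc, and p are hypothetical, and the counts again assume y = n*p. The GLIMMIX statements and the held variance components are those of the GLMM power slide.

```sas
/* One row per case: multiplier on the base sample sizes, number of locations */
data cases;
  input case mult nloc;
  do trt = 1 to 2;
    p = ifn(trt=1, 0.30, 0.25);          /* P{favorable} per treatment */
    n = mult * ifn(trt=1, 300, 600);     /* subjects per trt per location */
    y = n*p;                             /* expected count (assumption) */
    do loc = 1 to nloc;
      output;
    end;
  end;
  datalines;
1 1 10
2 2 10
3 1 20
;
run;

/* Same model as the GLMM power slide, run separately for each case */
proc glimmix data=cases initglm;
  by case;
  class trt loc;
  model y/n = trt / oddsratio;
  random intercept trt / subject=loc;
  random _residual_;
  parms (0.25) (0.125) (1) / hold=1,2,3;
  ods output tests3=pwr;
run;

/* Power from the central and noncentral F distributions */
data power;
  set pwr;
  alpha  = 0.05;
  ncparm = numdf*fvalue;
  fcrit  = finv(1-alpha, numdf, dendf, 0);
  power  = 1 - probf(fcrit, numdf, dendf, ncparm);
run;

proc print data=power;
run;
```

Keeping the cases in one data set with BY processing makes it easy to add further combinations of sample size and number of locations by appending rows to the data lines.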

GLMM Power: Impact of Sample Size?
Recall: for 10 locations, N=300/600, CI for OddsRatio was (0.884, 1.871); Power was 0.274

For 10 locations, N=600/1200 (N alone has almost no impact):
Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
1    2     1.286     9   0.891   1.855
Obs  Effect  NumDF  DenDF  alpha  ncparm  fcrit  power
1    trt            9      0.05

For 20 locations, N=300/600:
Odds Ratio Estimates
trt  _trt  Estimate  DF  95% Confidence Limits
1    2     1.286     19  1.006   1.643
Obs  Effect  NumDF  DenDF  alpha  ncparm  fcrit  power
1    trt            19     0.05

Spatial Data

Example 5 - Spatial from SAS for Mixed Models, Sect. 11.7
“Alliance” Data from Stroup, Baenziger, and Mulitze (1994), in GLIMMIX-speak:

data two;
  set alliance;
  obs = _n_;
run;
proc glimmix data=two;
  class Entry Rep obs;
  model Yield=Entry/ddfm=kr;
  random intercept/subject=rep;
  random obs / type=sp(sph)(latitude longitude);
  parms (0.1) (43.4) (27.5) (11.5);
  lsmeans entry;
run;

IX. Spatial Data Example from SAS for Mixed Models
Spatial errors in Treatment Comparison studies only
No spatial mapping, Kriging
Standard parametric models from Geostatistics
RSMOOTH alternative
Issues

From Stroup, Baenziger & Mulitze (Crop Science, 1994)
56 varieties, 4 blocks, e.u. = 4.3 × 1.2 m plots

Contour Plot of Response
[Contour plot; plot labels: B = Buckskin, N = NE86503]

Additional GLIMMIX Code to Plot Spatial Variability
  output out=gmxout2 pred=p;
  ods output lsmeans=lsm2;
  id entry latitude longitude _zgamma_;
run;
proc means data=gmxout2;
  var _zgamma_;
run;
proc print data=gmxout2(OBS=20);
run;
proc g3d data=gmxout2;
  plot latitude*longitude=_zgamma_ /grid;
run;

Plot of Spherical Covariance

Alternative Using RSMOOTH
Advantage in theory: RSMOOTH does not require a parametric model of spatial variation, which can be unrealistic (e.g. Alliance data: spatial variation is from winter kill)

proc glimmix data=alliance;
  class Entry Rep;
  model Yield=Entry /ddfm=kr;
  *model Yield=Entry latitude longitude/ddfm=kr;
  random intercept/subject=rep;
  random latitude longitude / type=rsmooth;
run;

RSMOOTH? From Penalized Spline
Ruppert, Wand, and Carroll (2003, Semiparametric Regression, Cambridge)

RSMOOTH (2) Rewrite the model

RSMOOTH (2)
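For orientation, the penalized-spline idea described by Ruppert, Wand, and Carroll can be written as a mixed model. The notation below is an assumption: γ for the spline coefficients follows GLIMMIX naming (the Zγ term appears to correspond to the _zgamma_ output variable), and λ for the smoothing parameter is illustrative. It is a generic sketch, not a transcription of the slide equations.

```latex
\begin{align*}
\mathbf{y} &= \mathbf{X}\boldsymbol\beta + \mathbf{Z}\boldsymbol\gamma + \mathbf{e},
\qquad \boldsymbol\gamma \sim N(\mathbf{0}, \sigma_\gamma^2 \mathbf{I}),\quad
\mathbf{e} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}) \\
(\hat{\boldsymbol\beta}, \hat{\boldsymbol\gamma}) &= \arg\min_{\boldsymbol\beta,\,\boldsymbol\gamma}
\;\lVert \mathbf{y} - \mathbf{X}\boldsymbol\beta - \mathbf{Z}\boldsymbol\gamma \rVert^2
+ \lambda\, \boldsymbol\gamma'\boldsymbol\gamma,
\qquad \lambda = \sigma^2 / \sigma_\gamma^2
\end{align*}
```

Treating the spline coefficients as random effects turns the choice of smoothing parameter into variance-component estimation, which is why the smoother can be requested through a RANDOM statement.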

RSMOOTH yields following Spatial Plot

RSMOOTH vs SP(SPH)
SP(SPH): Type III Tests of Fixed Effects
Effect  Num DF  Den DF  F Value  Pr > F
Entry

RSMOOTH: Type III Tests of Fixed Effects
Effect  Num DF  Den DF  F Value  Pr > F
Entry

However... Plot of LSMeans from two approaches
LSM_RSMOOTH average: 31.06
LSM_SP_SPH average: 24.40
????

Nonlinear Mixed Models

Some NLMM Issues Consulting problem at UNL
Why nonlinear mixed model (NLMM) seemed appropriate
Problems in implementation → NLMM issues
Alternatives whose implications are not adequately understood

Wheat Sawfly Study
Gary Hein, Research Entomologist, Scottsbluff, NE RREC
Sawflies inhabit/damage wheat
5 tillage treatments: impact on sawflies
Exp design used 4 randomized blocks
Sawfly emergence measured at planned times during growing season

Emergence over TIME by TRT
Black: NoTill; Red: SumBlade (summer); Cyan: SB&SD; Green: SpDisk (spring); Blue: SpPlow

“Conventional” Analysis
Emerge = μ + TRT + blk + blk*trt + DATE + TRT*DATE + date*blk(trt)
blk*trt a.k.a. between subjects or “whole-plot” error
date*blk(trt) = within subjects or “split-plot” error

ANOVA:
Source             df
blk                3
TRT                4
betw subj error    12
DATE               12
TRT*DATE           48
within subj error  180

Standard ANOVA model: emerge = μ + blk + TRT + w.p. error + TIME + TRT*TIME + s.p. error
The Mixed Procedure

Covariance Parameter Estimates
Cov Parm  Estimate
blk
blk*trt
Residual

Type 3 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
trt
date                               <.0001
trt*date                           <.0001

CS covariance fit adequately
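A PROC MIXED call consistent with this output might look like the following sketch. The data set name sawfly is hypothetical; the variable names emerge, blk, trt, and date are taken from the model statement and output labels on the slides. The random effects are chosen to match the listed covariance parameters (blk, blk*trt, Residual).

```sas
/* Split-plot-in-time ("conventional") analysis of sawfly emergence */
proc mixed data=sawfly;            /* hypothetical data set name */
  class blk trt date;
  model emerge = trt date trt*date;
  random blk blk*trt;              /* blk*trt = whole-plot (between-subjects) error */
run;
```

Including blk*trt as a random effect induces compound symmetry among the repeated measurements on a plot, which is the covariance structure the slide reports as fitting adequately.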

Break out TRT*DATE effect
Type 1 Tests of Fixed Effects
Effect     Num DF  Den DF  F Value  Pr > F
trt
lin                                 <.0001
quad
cubic                               <.0001
date
lin*trt
quad*trt                            <.0001
cubic*trt
trt*date                            <.0001

Alternative Modeling Considerations
Modeling μij
Decompose μij in “standard ANOVA”: μ + Trt + Time + Trt*Time
Further decompose via polynomial regression
Nonlinear decomposition, e.g. Gompertz
Transform yijk to “linearize” response profile over date:
  logit or probit (assume sigmoid profile is symmetric)
  complementary log-log (allows asymmetry)

Parameter Estimates
Parameter  Estimate  Standard Error  DF  t Value  Pr > |t|
a1         0.9949    0.03629         19  27.42    <.0001
a2                                                <.0001
a3                                                <.0001
a4                                                <.0001
a5                                                <.0001
b1                                                <.0001
b2                                                <.0001
b3
b4                                                <.0001
b5
c1                                                <.0001
c2                                                <.0001
c3
c4                                                <.0001
c5
s2w
s2s

These are ML estimates. Bias?

Fit of Gompertz

Trt Comparisons with NLMIXED
Contrasts
Label                  Num DF  Den DF  F Value  Pr > F
among a
among b
among c
a: nt vs sum bld
a: nt+sb vs sb&sd
a: sp dsk vs sp plow
a: nt+sb vs sp d+p
b: nt vs sum bld
b: nt+sb vs sb&sd
b: sp dsk vs sp plow
b: nt+sb vs sp d+p
c: nt vs sum bld
c: nt+sb vs sb&sd
c: sp dsk vs sp plow
c: nt+sb vs sp d+p

Issues with Test Results
Denominator degrees of freedom?
  DF in NLMIXED based on simple N-1 rule
  MIXED uses Satterthwaite/KR
  NLMIXED analog?
Bias in test statistics?
  In MIXED, ML variance estimates biased → test statistics biased → excessive type I error rates
  Familiar in MIXED. Same in NLMIXED?

Alternative NLMIXED Analysis
Use MIXED to obtain REML estimates of σ²_W and σ²_S
Include REML variance component estimates in NLMIXED as known
NLMIXED will compute std errors and test statistics using REML estimates

NLMIXED REML Tests
MLE: σ²_W = 0.002926, σ²_S = 0.01598
REML: σ²_W = , σ²_S =

Label                  Num DF  Den DF  F Value  Pr > F
among a
among b
among c
a: nt vs sum bld
a: nt+sb vs sb&sd
a: sp dsk vs sp plow
a: nt+sb vs sp d+p
b: nt vs sum bld
b: nt+sb vs sb&sd
b: sp dsk vs sp plow
b: nt+sb vs sp d+p
c: nt vs sum bld
c: nt+sb vs sb&sd
c: sp dsk vs sp plow
c: nt+sb vs sp d+p

Vs. ML: .1085 .0966 .0161 .0177

Hein: “What if we transform the data to linearize it, then use MIXED?”
Denote response variable emerge by y then:

Plot of CLogLog over Date by Trt

MIXED Analysis of CLogLog
Type 1 Tests of Fixed Effects
Effect    Num DF  Den DF  F Value  Pr > F
trt
lin                                <.0001
lin*trt
trt*date                           <.0001

Tests of Lin and Lin*Trt correspond to equality of b_i and c_i for all treatments in the Gompertz NLMM

Decomposing Contrasts
Label                 Num DF  Den DF  F Value  Pr > F
trt (b)
c
b: nt v sum bld
b: nt&sb vs sb&sd
b: sp d v p
b: nt&sb v sp d&p
c: nt v sum bld
c: nt&sb vs sb&sd
c: sp d v p
c: nt&sb v sp d&p

Vs NLMM: .169 .154 .674 .026 .611 .028

NLMM too conservative? Or is linearized LMM too liberal?

Unresolved Issues

Unresolved NLMIXED Issues
REML vs. ML variance component estimates
Degrees of Freedom
Starting Values and Convergence
Are NLMIXED tests too conservative?
Implications for standard errors?
Correlated error repeated measures?
When are linearized models analyzed using LMM (e.g. PROC MIXED) preferable?
Design

GLIMMIX vs MIXED/GENMOD
GLIMMIX has very useful mean comparison options not available in MIXED, especially for Factorial Simple Effects
GLIMMIX can model true GLMMs
GLIMMIX is “touchy” (e.g. use of SUBJECT=)
Many Research Issues:
  RSMOOTH
  Properties of NonNormal KR, working correlation, DDF, etc.
  Computational Methods

Does GLIMMIX replace MIXED/GENMOD?
For GLMMs: no question
For GLMs / LMMs, for the most part: YES
Most GENMOD & MIXED programs can be duplicated in GLIMMIX
Mean Comparison features
No need to “trick” GENMOD into GLMM with marginal model (e.g. split-plot, rpt measures)
