SAS® Global Forum 2014 March Washington, DC Got Randomness?

Slides:



Advertisements
Similar presentations
A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.
Advertisements

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32.
Data: Crab mating patterns Data: Typists (Poisson with random effects) (Poisson Regression, ZIP model, Negative Binomial) Data: Challenger (Binomial with.
Moving away from Linear-Gaussian assumptions Cons: Some things become much harder. No baked-in test of global fit Non-recursive models Error correlations.
Logistic Regression I Outline Introduction to maximum likelihood estimation (MLE) Introduction to Generalized Linear Models The simplest logistic regression.
Weekend Workshop I PROC MIXED. Random or Fixed ?RANDOMFIXEDLevels: Selected at random from infinite population Finite number of possibilities Another.
STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for.
EPI 809/Spring Probability Distribution of Random Error.
Logistic Regression Example: Horseshoe Crab Data
PROC GLIMMIX: AN OVERVIEW
Overview of Logistics Regression and its SAS implementation
The Power of Proc Nlmixed. Introduction Proc Nlmixed fits nonlinear mixed-effects models (NLMMs) – models in which the fixed and random effects have a.
Generalized Linear Mixed Model English Premier League Soccer – 2003/2004 Season.
Comparison of Repeated Measures and Covariance Analysis for Pretest-Posttest Data -By Chunmei Zhou.
Some Terms Y =  o +  1 X Regression of Y on X Regress Y on X X called independent variable or predictor variable or covariate or factor Which factors.
Log-Linear Models & Dependent Samples Feng Ye, Xiao Guo, Jing Wang.
Multiple regression analysis
1 Experimental design and analyses of experimental data Lesson 6 Logistic regression Generalized Linear Models (GENMOD)

EPI 809/Spring Multiple Logistic Regression.
OLS versus MLE Example YX Here is the data:
EPI809/Spring Testing Individual Coefficients.
Linear statistical models 2009 Count data  Contingency tables and log-linear models  Poisson regression.
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation.
Generalized Linear Models
Review of Lecture Two Linear Regression Normal Equation
GEE and Generalized Linear Mixed Models
Introduction to Multilevel Modeling Using SPSS
SAS Lecture 5 – Some regression procedures Aidan McDermott, April 25, 2005.
Inference for regression - Simple linear regression
Logistic Regression III: Advanced topics Conditional Logistic Regression for Matched Data Conditional Logistic Regression for Matched Data.
1 G Lect 11W Logistic Regression Review Maximum Likelihood Estimates Probit Regression and Example Model Fit G Multiple Regression Week 11.
Fixed vs. Random Effects Fixed effect –we are interested in the effects of the treatments (or blocks) per se –if the experiment were repeated, the levels.
Biostatistics Case Studies 2015 Youngju Pak, PhD. Biostatistician Session 4: Regression Models and Multivariate Analyses.
1 Experimental Statistics - week 4 Chapter 8: 1-factor ANOVA models Using SAS.
April 11 Logistic Regression –Modeling interactions –Analysis of case-control studies –Data presentation.
1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation.
1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation Note: Homework Due Thursday.
ALISON BOWLING THE GENERAL LINEAR MODEL. ALTERNATIVE EXPRESSION OF THE MODEL.
April 6 Logistic Regression –Estimating probability based on logistic model –Testing differences among multiple groups –Assumptions for model.
2 December 2004PubH8420: Parametric Regression Models Slide 1 Applications - SAS Parametric Regression in SAS –PROC LIFEREG –PROC GENMOD –PROC LOGISTIC.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
Forecasting Choices. Types of Variable Variable Quantitative Qualitative Continuous Discrete (counting) Ordinal Nominal.
GEE Approach Presented by Jianghu Dong Instructor: Professor Keumhee Chough (K.C.) Carrière.
The Completely Randomized Design (§8.3)
BUSI 6480 Lecture 8 Repeated Measures.
Topic 23: Diagnostics and Remedies. Outline Diagnostics –residual checks ANOVA remedial measures.
Topic 30: Random Effects. Outline One-way random effects model –Data –Model –Inference.
1 STA 517 – Chp4 Introduction to Generalized Linear Models 4.3 GENERALIZED LINEAR MODELS FOR COUNTS  count data - assume a Poisson distribution  counts.
1 STA 617 – Chp10 Models for matched pairs Summary  Describing categorical random variable – chapter 1  Poisson for count data  Binomial for binary.
1 Topic 4 : Ordered Logit Analysis. 2 Often we deal with data where the responses are ordered – e.g. : (i) Eyesight tests – bad; average; good (ii) Voting.
1 STA 617 – Chp12 Generalized Linear Mixed Models SAS for Model (12.3) with Matched Pairs from Table 12.1.
Topic 20: Single Factor Analysis of Variance. Outline Analysis of Variance –One set of treatments (i.e., single factor) Cell means model Factor effects.
Today: March 7 Data Transformations Rank Tests for Non-Normal data Solutions for Assignment 4.
Applied Epidemiologic Analysis - P8400 Fall 2002 Labs 6 & 7 Case-Control Analysis ----Logistic Regression Henian Chen, M.D., Ph.D.
Logistic regression (when you have a binary response variable)
1 Statistics 262: Intermediate Biostatistics Regression Models for longitudinal data: Mixed Models.
Dependent Variable Discrete  2 values – binomial  3 or more discrete values – multinomial  Skewed – e.g. Poisson Continuous  Non-normal.
Experimental Statistics - week 9
Applied Epidemiologic Analysis - P8400 Fall 2002 Labs 6 & 7 Case-Control Analysis ----Logistic Regression Henian Chen, M.D., Ph.D.
Biostatistics Case Studies Peter D. Christenson Biostatistician Session 3: Missing Data in Longitudinal Studies.
G Lecture 71 Revisiting Hierarchical Mixed Models A General Version of the Model Variance/Covariances of Two Kinds of Random Effects Parameter Estimation.
1 Experimental Statistics - week 11 Chapter 11: Linear Regression and Correlation.
Generalized Linear Models
This Week Review of estimation and hypothesis testing
Generalized Linear Model
Generalized Linear Models
6-1 Introduction To Empirical Models
Presentation transcript:

SAS® Global Forum 2014 March 23-26 Washington, DC Got Randomness? SASTM for Mixed and Generalized Linear Mixed Models David A. Dickey NC State University TM SAS and its products are the registered trademarks of SAS Institute, Cary, NC Copyright © 2010, SAS Institute Inc. All rights reserved.

Data: Challenger (Binomial with random effects) Data: Ships (Poisson with offset) Data: Crab mating patterns (X rated) (Poisson Regression, ZIP model, Negative Binomial) Data: Typists (Poisson with random effects) Data: Flu samples (Binomial with random effects)

“Generalized”  non normal distribution Binary for probabilities: Y=0 or 1 Mean E{Y}=p Variance p(1-p) Pr{Y=j}= pj(1-p)(1-j) Link: L=ln(p/(1-p)) = “Logit” Range (over all L): 0<p<1 Poisson for counts: Y in {0,1,2,3,4, ….} Mean count l Variance l Pr{Y=j} = exp(- l )(lj)/(j!) Link: L = log(l) Range (over all L): l >0 2/3 1/3 Like Dislike Pr{ Like }=2/3 l=0.4055 2/3 .055 .27 .007 .001 0 1 2 3 4 5

Mixed (not generalized) Models: Fixed Effects and Random Effects

Fixed? or Random? Replication : Same levels Different levels Inference for: Only Observed Population of Levels Levels Levels : Picked on Picked at Purpose Random Inference on: Means Variances Example: Only These All Doctors Drugs All Clinics Example: Only These All Farms Fertilizers All Fields

Generalized (not mixed) linear models. Use link L = g(E{Y}), e.g. ln(p/(1-p)) = ln(E{Y}/(1-E{Y}) Assume L is linear model in the inputs with fixed effects. Estimate model for L, e.g. L=g(E{Y})=bo + b1 X Use maximum likelihood Example: L = -1 + .18*dose Dose = 10, L=0.8, p=exp(0.8)/(1+exp(0.8))= “inverse link” = 0.86

Challenger was mission 24 From 23 previous launches we have: 6 O-rings per mission Y=0 no damage, Y=1 erosion or blowby p = Pr {Y=1} = f{mission, launch temperature) Features: Random mission effects Logistic link for p proc glimmix data=O_ring; class mission; model fail = temp/dist=binomial s; random mission; run; Generalized Mixed

We “hit the boundary” Estimated G matrix is not positive definite. Covariance Parameter Estimates Cov Standard Parm Estimate Error mission 2.25E-18 . Solutions for Fixed Effects Effect Estimate Error DF t Value Pr > |t| Intercept 5.0850 3.0525 21 1.67 0.1106 temp -0.1156 0.04702 115 -2.46 0.0154 We “hit the boundary”

Likelihood

Just logistic regression – no mission variance component

input fluseasn year t week pos specimens; pct_pos=100*pos/specimens; Flu Data CDC Active Flu Virus Weekly Data % positive data FLU; input fluseasn year t week pos specimens; pct_pos=100*pos/specimens; logit=log(pct_pos/100/(1+(pct_pos/100))); label pos = "# positive specimens"; label pct_pos="% positive specimens"; label t = "Week into flu season (first = week 40)"; label week = "Actual week of year"; label fluseasn = "Year flu season started"; Empirical Logit % positive

S(j) = sin(2pjt/52) C(j)=cos(2pjt/52) (1) GLM all effects fixed (harmonic main effects insignificant) “Sinusoids” S(j) = sin(2pjt/52) C(j)=cos(2pjt/52) PROC GLM DATA=FLU; class fluseasn; model logit = s1 c1 fluseasn*s1 fluseasn*c1 fluseasn*s2 fluseasn*c2 fluseasn*s3 fluseasn*c3 fluseasn*s4 fluseasn*c4; output out=out1 p=p; data out1; set out1; P_hat = exp(p)/(1+exp(p)); label P_hat = "Pr{pos. sample} (est.)"; run;

(2) MIXED analysis on logits Random harmonics. Normality assumed Toeplitz(1) PROC MIXED DATA=FLU method=ml; ** reduced model; class fluseasn; model logit = s1 c1 /outp=outp outpm=outpm ddfm=kr; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); run;

(3) GLIMMIX analysis Random harmonics. Binomial assumed (overdispersed – lab effects?) PROC GLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model pos/specimens = s1 c1 ; * s2 c2 s3 c3 s4 c4; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn; ** Toep(1) - no converge; random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); random _residual_; covtest glm; output out=out2 pred(ilink blup)=pblup pred(ilink noblup)=overall pearson = p_resid; run;

pred(ilink blup)=pblup pred(ilink noblup)= overall pearson = p_resid; output out=out2 pred(ilink blup)=pblup pred(ilink noblup)= overall pearson = p_resid; run; Pearson Residuals: Used to check fit when using default (pseudo-likelihood) Variance should be near 1 proc means mean var; var p_resid; run; Without random _residual_; variance 3.63.  With random _residual_; variance 0.83.  --------------------------------------------------------------- Fit Statistics -2 Res Log Pseudo-Likelihood 341.46 Generalized Chi-Square 1707.29 Gener. Chi-Square / DF 4.59 ------------------------------------------------------------------------- Or… use method=quad

random _residual_ does not affect the fit (just standard errors) Type III Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F S1 1 8 34.93 0.0004 c1 1 8 25.49 0.0010 Tests of Covariance Parameters Based on the Residual Pseudo-Likelihood Label DF -2 Res Log P-Like ChiSq Pr > ChiSq Note Independence 6 1312.34 970.88 <.0001 MI MI: P-value based on a mixture of chi-squares Output due to covtest glm;  random _residual_ does not affect the fit (just standard errors)

Could try 2 parameter Beta distribution instead: PROC GLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model f = s1 c1 /dist=beta link=logit s; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); output out=out3 pred(ilink blup)=pblup pred(ilink noblup)=overall pearson=p_residbeta; run;

Without BLUPS and with BLUPS Binomial Assumption Without BLUPS and with BLUPS

Without BLUPS and with BLUPS Beta Assumption Without BLUPS and with BLUPS

Poisson Example: Wave induced damage incidents in 40 ships (ship groups) Variables: Factorial Effects (fixed, classificatory): Ship Type 5 levels A,B,C,D,E Year Constructed 4 levels Years of Operation 2 levels Covariate (“offset” - continuous) = Time in service (“Aggregate months”) Incidents (dependent, counts) Source” McCullough & Nelder (but I ignore cases where year constructed > period of operation) Constructed Operated 60-64 65-69 70-74 75-79 60-74 ABCDE -X-

proc glimmix data=ships; Title "Ignoring Ship Variance"; class operation construct shiptype; model incidents = operation construct shiptype/ dist=poisson s offset=log_service; run; ln(service) Poisson: ln(l) – ln(service) = b0 + b1(operation) + b2(construct) + b3(ship_type) ln(l/service) >1 Ship variance? -2 Log Likelihood 136.56 (more fit statistics) Pearson Chi-Square 42.28 Pearson Chi-Square / DF 1.69

Without ship variance component: Type III Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F Operation 1 25 10.57 0.0033 Construct 3 25 9.72 0.0002 Shiptype 4 25 6.50 0.0010

PROC GLIMMIX data=ships method=quad; class operation construct shiptype ship; model incidents = operation construct shiptype/ dist=poisson s offset=log_service; covtest "no ship effect" glm; * random ship; random intercept / subject = operation*construct*shiptype; run; Covariance Parameter Estimates Standard Cov Parm Subject Estimate Error Intercept Operat*Constr*Shipty 0 .

Fit Statistics -2 Log Likelihood 136.56 Fit Statistics for Conditional Distribution -2 log L(incidents | r. effects) 136.56 Pearson Chi-Square 42.27 Pearson Chi-Square / DF 1.24 Type III Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F Operation 1 25 10.57 0.0033 no Construct 3 25 9.72 0.0002  shiptype 4 25 6.50 0.0010 changes Tests of Covariance Parameters  covtest "no ship effect" glm; Based on the Likelihood Label DF -2 Log Like ChiSq Pr > ChiSq Note no ship effect 1 136.56 . 1.0000 MI MI: P-value based on a mixture of chi-squares.

(reference: SAS GLIMMIX course notes): Horseshoe Crab study (reference: SAS GLIMMIX course notes): Female nests have “satellite” males Count data – Poisson? Generalized Linear Features (predictors): Carapace Width, Weight, Color, Spine condition Random Effect: Site Mixed Model To nest  Go State

proc glimmix data=crab; class site; model satellites = weight width / Fit Statistics Gener. Chi-Square / DF 2.77 Cov Parm Subject Estimate Intercept site 0.1625 Effect Estimate Pr > |t| Intercept -1.1019 0.2527 weight 0.5042 0.0035 width 0.0318 0.5229 proc glimmix data=crab; class site; model satellites = weight width / dist=poi solution ddfm=kr; random int / subject=site; output out=overdisp pearson=pearson; run; proc means data=overdisp n mean var; var pearson; run; proc univariate data=crab normal plot; var satellites; run; Histogram # Boxplot 15.5+* 1 0 .* 1 0 . 12.5+* 1 | .* 1 | .** 3 | 9.5+** 3 | .*** 6 | .** 4 | 6.5+******* 13 | .******** 15 +-----+ .********** 19 | | 3.5+********** 19 | | .***** 9 *--+--* .******** 16 | | 0.5+******************************* 62 +-----+ ----+----+----+----+----+----+- * may represent up to 2 counts N Mean Variance --------------------------- 173 -0.0258264 2.6737114 Zero Inflated ?

Zero Inflated Poisson (ZIP) Q: Can zero inflation cause overdispersion (s2>m)? Recall: in Poisson, s2=m=l

A: yes!

Zero Inflated Poisson - (ZIP code  ) Dickey ncsu Zero Inflated Poisson - (ZIP code  ) SAS Global Forum Washington DC 20745 proc nlmixed data=crab; parms b0=0 bwidth=0 bweight=0 c0=-2 c1=0 s2u1=1 s2u2=1; x=c0+c1*width+u1; p0 = exp(x)/(1+exp(x)); * width affects p0; eta= b0+bwidth*width +bweight*weight +u2; lambda=exp(eta); if satellites=0 then loglike = log(p0 +(1-p0)*exp(-lambda)); else loglike = log(1-p0)+satellites*log(lambda)-lambda-lgamma(satellites+1); expected=(1-p0)*lambda; id p0 expected lambda; model satellites~general(loglike); Random U1 U2~N([0,0],[s2u1,0,s2u2]) subject=site; predict p0+(1-p0)*exp(-lambda) out=out1; run;

weight affects l width affects p0 Parameter Estimates Parameter Estimate t Pr>|t| Lower Upper b0 2.7897 2.55 0.0268 0.3853 5.1942 bwidth -0.0944 -1.65 0.1267 -0.2202 0.0314 bweight 0.4649 2.38 0.0366 0.0347 0.8952 c0 13.3739 4.42 0.0010 6.7078 20.0401 c1 -0.5447 -4.61 0.0008 -0.8049 -0.2844 s2u1 0.5114 1.12 0.2852 -0.4905 1.5133 s2u2 0.1054 1.67 0.1239 -0.0339 0.2447 weight affects l width affects p0 Variance for p0 l

From fixed part of model, compute Pr{count=j} and plot (3D) versus Weight, Carapace width

Another possibility: Negative binomial Number of failures until kth success ( p=Prob{success} )

Negative binomial: In SAS, k (scale) is our 1/k proc glimmix data=crab; class site; model satellites = weight width / dist=nb solution ddfm=kr; random int / subject=site; run; Fit Statistics -2 Res Log Pseudo-Likelihood 539.06 Generalized Chi-Square 174.83 Gener. Chi-Square / DF 1.03 Covariance Parameter Estimates Cov Parm Subject Estimate Std. Error Intercept site 0.09527 0.07979 Scale 0.7659 0.1349 Standard Effect Estimate Error DF t Value Pr > |t| Intercept -1.2022 1.6883 168.5 -0.71 0.4774 weight 0.6759 0.3239 156.6 2.09 0.0386 width 0.01905 0.08943 166.2 0.21 0.8316

Population average model vs. Individual Specific Model 8 typists Y=Error counts (Poisson distributed) ln(li)= ln(mean of Poisson) = m+Ui for typist i so li=em+Ui conditionally (individual specific) Distributions for Y, U~N(0,1) and m=1 l=em=e1=2.7183 = mean for “typical” typist (typist with U=0)

Population average model Expectation ||||| | | of individual distributions averaged across population of all typists. Run same simulation for 8000 typists, compute mean of conditional population means, exp(m+U). The MEANS Procedure Variable N Mean Std Dev Std Error ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ lambda 8000 4.4280478 6.0083831 0.067175 Z=(4.428-2.7183)/0.06718 = 25.46 !! Population mean is not em Conditional means, m+U, are lognormal. Log(Y)~N(1,1)  E{Y}=exp(m+0.5s2) = e1.5 = 4.4817 #SASGF13 Copyright © 2013, SAS Institute Inc. All rights reserved.

Main points: Generalized linear models with random effects are subject specific models. Subject specific models have fixed effects that represent an individual with random effects 0 (individual at the random effect distributional means). Subject specific models when averaged over the subjects do not give the model fixed effects. Models with only fixed effects do give the fixed effect part of the model when averaged over subjects and are thus called population average models.