GEE and Generalized Linear Mixed Models

GEE and Generalized Linear Mixed Models Tom Greene

Outline
• Subject specific and population average inference in generalized linear models
• Review of classical generalized linear models with independent observations
• Generalized Estimating Equations
• Contrasts of GLMMs with GEEs
• GEE example

Classes of Generalized Linear Models
• General Linear Models (linear regression, ANOVA, ANCOVA): E(Y) = Xβ. Responses independent.
• Linear Mixed Models: E(Y|b) = Xβ + Zb. Responses correlated; correlation modeled in part by "random effects".
• Generalized Linear Models (logistic regression, Poisson regression, etc.): g(E(Y)) = Xβ. Responses independent.
• Generalized Linear Mixed Models (GLMM): g(E(Y|b)) = Xβ + Zb. Responses correlated; correlation modeled in part by "random effects".
• Generalized Estimating Equations approach (GEE): g(E(Y)) = Xβ. Responses correlated.

Classes of Generalized Linear Models for Correlated Data
• Linear Mixed Models: E(Y|b) = Xβ + Zb. Responses correlated; correlation modeled in part by "random effects".
• Generalized Estimating Equations approach (GEE): g(E(Y)) = Xβ. Responses correlated. Population average inference.
• Generalized Linear Mixed Models (GLMM): g(E(Y|b)) = Xβ + Zb. Responses correlated; correlation modeled in part by "random effects". Subject specific inference.

Classes of Generalized Linear Models for Correlated Data

Population Average Inference: Generalized Estimating Equations approach (GEE), g(E(Y)) = Xβ, responses correlated
• Analysis describes differences in the mean of Y across the entire population
• Analysis is informative from the population perspective; most relevant from the perspective of:
  • Policy makers
  • Providers desiring to optimize outcomes across the entire population

Subject Specific Inference: Generalized Linear Mixed Models (GLMM), g(E(Y|b)) = Xβ + Zb, responses correlated
• Analysis describes differences in the mean of Y conditional on the patient's specific random effect b
• Most relevant from an individual patient's perspective
• Often b represents a dimension of frailty; hence Xβ tells about the relationship of Y to X among patients with the same frailty

Extreme Example
• Subject specific effects of X on Pr(Death): OR = 20 per 1 unit increase in X
• Population average effect of X on Pr(Death): OR = 2.7 per 1 unit increase in X
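The gap between a subject specific and a population average odds ratio arises because a nonlinear (logistic) curve is averaged over subjects with different random effects. The sketch below is a hypothetical illustration, not the calculation behind the slide: it assumes a random-intercept logistic model, and the values of `beta0` and `sigma` are made up. It integrates numerically over the random intercept to obtain the population average odds ratio implied by a subject specific odds ratio of 20.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(eta, sigma, n=2001, width=8.0):
    """Average expit(eta + b) over b ~ N(0, sigma^2) with the trapezoid rule.

    The integral is taken over the standardized effect z = b / sigma
    on the interval [-width, width].
    """
    if sigma == 0:
        return expit(eta)
    h = 2.0 * width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -width + k * h
        w = 0.5 if k in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += w * expit(eta + sigma * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def odds(p):
    return p / (1.0 - p)

# Hypothetical values, chosen only for illustration.
beta0 = -1.5            # subject specific intercept (assumed)
ss_or = 20.0            # subject specific odds ratio per unit of X
beta1 = math.log(ss_or)
sigma = 5.0             # SD of the random intercept (assumed)

p0 = marginal_prob(beta0, sigma)            # population mean risk at X = 0
p1 = marginal_prob(beta0 + beta1, sigma)    # population mean risk at X = 1
pa_or = odds(p1) / odds(p0)

print(f"subject specific OR   = {ss_or:.1f}")
print(f"population average OR = {pa_or:.2f}")
```

The larger the assumed `sigma`, the more the population average odds ratio is pulled toward 1; with `sigma = 0` the two odds ratios coincide.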

Example: Toenail Data
• Toenail Dermatophyte Onychomycosis (TDO): common toenail infection, difficult to treat, affecting more than 2% of the population
• Design: randomized, double-blind, parallel group, multicenter study for the comparison of two new compounds (A and B) for oral treatment
• 2 × 189 patients randomized, 36 centers
• 48 weeks of total follow up (12 months)
• 12 weeks of treatment (3 months)
• Measurements at months 0, 1, 2, 3, 6, 9, 12
• Research question: severity relative to treatment of TDO?

Review of Generalized Linear Models (Independent Responses)
• Independent responses Y_i, i = 1, 2, …, N
• Y_i has a distribution from the exponential family, with density f(y; θ, ø)
• Mean model: μ_i = E(Y_i | X_i1, X_i2, …, X_ip), with g(μ_i) = β_0 + β_1 X_i1 + β_2 X_i2 + … + β_p X_ip
• Variance function: Var(Y_i) = ø V(μ_i), where V(μ_i) is a known function determined by the assumed distribution of Y within the exponential family

Review of Generalized Linear Models (Independent Responses), continued
• Mean model: μ_i = E(Y_i | X_i1, X_i2, …, X_ip), with g(μ_i) = β_0 + β_1 X_i1 + β_2 X_i2 + … + β_p X_ip
• Variance function: Var(Y_i) = ø V(μ_i), where v_i = V(μ_i) is a known function determined by the assumed distribution of Y within the exponential family
• The mean model is the only part we have to get right for valid large-sample inference!
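Why only the mean model matters can be seen from the quasi-likelihood estimating equation for β. The form below is the standard textbook one, written in the notation of this slide (φ stands for the scale parameter ø); it is offered as a sketch and is not taken from the slides themselves.

```latex
U(\beta) \;=\; \sum_{i=1}^{N} \frac{\partial \mu_i}{\partial \beta}\,
               \frac{Y_i - \mu_i}{\phi\, v_i} \;=\; 0,
\qquad v_i = V(\mu_i)
```

If the mean model is correct, then E(Y_i − μ_i) = 0, so E[U(β)] = 0 whatever is used for v_i. A wrong variance function therefore costs efficiency and calls for robust standard errors, but it does not by itself make the estimate of β inconsistent.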

Extension to GEE for Longitudinal Data
GEE: Generalized Estimating Equations (Liang & Zeger, 1986; Zeger & Liang, 1986)
• Method is semi-parametric: estimating equations are derived without full specification of the joint distribution of a subject's observations
• Instead, it requires specification of:
  • The mean model for the marginal distributions of the y_ij
  • The variance function of y_ij given μ_ij
  • The "working" correlation matrix for the vector of repeated observations from each subject
• Relies on the independence across subjects (or clusters) to estimate consistently the variance of the regression coefficients

GEE Method Outline
1. Relate the marginal response μ_ij = E(y_ij) to a linear combination of the covariates: g(μ_ij) = X_ij^T β
  • y_ij is the response for subject i at time j, j = 1, 2, …, n
  • X_ij is a p × 1 vector of covariates
  • β is a p × 1 vector of regression coefficients
  • g(·) is the link function
2. Describe the variance of y_ij as a function of the mean: V(y_ij) = v(μ_ij) ø
  • ø is a possibly unknown scale parameter
  • v(·) is a known variance function

Link and Variance Functions
• Normally-distributed response: g(μ_ij) = μ_ij ("identity link"); v(μ_ij) = 1; V(y_ij) = ø
• Binary response (Bernoulli): g(μ_ij) = log[μ_ij / (1 − μ_ij)] ("logit link"); v(μ_ij) = μ_ij (1 − μ_ij); ø = 1
• Poisson response: g(μ_ij) = log(μ_ij) ("log link"); v(μ_ij) = μ_ij; ø = 1
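The three link and variance pairs listed above can be written down directly. The following is a minimal sketch; the names `FAMILIES` and `describe` are hypothetical helpers introduced here, not part of any library.

```python
import math

def identity(mu):
    return mu

def logit(mu):
    return math.log(mu / (1.0 - mu))

def log_link(mu):
    return math.log(mu)

# Link g and variance function v for each response type on the slide.
FAMILIES = {
    "normal":  {"link": identity, "variance": lambda mu: 1.0},
    "binary":  {"link": logit,    "variance": lambda mu: mu * (1.0 - mu)},
    "poisson": {"link": log_link, "variance": lambda mu: mu},
}

def describe(family, mu, phi=1.0):
    """Return (g(mu), V(y)) where V(y) = phi * v(mu)."""
    fam = FAMILIES[family]
    return fam["link"](mu), phi * fam["variance"](mu)

# A binary response with mean 0.5 has linear predictor 0 and variance 0.25.
print(describe("binary", 0.5))
# A Poisson response with mean 3 has variance 3.
print(describe("poisson", 3.0))
# For a normal response the variance is the scale parameter itself.
print(describe("normal", 2.0, phi=4.0))
```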

GEE Method Outline
3. Choose the form of an n × n "working" correlation matrix R_i for each Y_i, where n is the number of repeated measures per subject

Working Correlation Structures
Several candidate structures for R_i, including AR(1).
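For reference, the working correlation structures most often offered by GEE software can be written as follows. These are the standard definitions, given here as an assumption about what is meant; j and k index the repeated measures of one subject and α denotes the correlation parameter(s).

```latex
\begin{aligned}
\text{Independence:} \quad & R_{jk} = 0            && (j \neq k) \\
\text{Exchangeable:} \quad & R_{jk} = \alpha       && (j \neq k) \\
\text{AR(1):}        \quad & R_{jk} = \alpha^{|j-k|} \\
\text{Unstructured:} \quad & R_{jk} = \alpha_{jk}  && (j \neq k)
\end{aligned}
\qquad\text{with } R_{jj} = 1 \text{ in every case.}
```

Exchangeable uses one parameter regardless of n, AR(1) lets the correlation decay with the separation between measurements, and unstructured uses n(n − 1)/2 parameters.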

GEE Estimation
• Define A_i = n × n diagonal matrix with V(μ_ij) as the jth diagonal element
• Define R_i(α) = n × n "working" correlation matrix (of the n repeated measures)
• Working variance-covariance matrix for Y_i equals V_i(α) = ø A_i^{1/2} R_i(α) A_i^{1/2}
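The construction of the working covariance matrix can be sketched numerically. This is an illustration only: the mean vector `mu` is hypothetical, the binary variance function μ(1 − μ) is assumed, and α = 0.4212 is borrowed from the exchangeable working correlation reported in the toenail GENMOD output later in the deck.

```python
import math

def exchangeable(n, alpha):
    """n x n correlation matrix with alpha off the diagonal."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

def ar1(n, alpha):
    """n x n correlation matrix with alpha**|j-k| in position (j, k)."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]

def working_covariance(mu, corr, variance=lambda m: m * (1.0 - m), phi=1.0):
    """V = phi * A^(1/2) R A^(1/2), with A = diag(variance(mu_j))."""
    sd = [math.sqrt(variance(m)) for m in mu]
    n = len(mu)
    return [[phi * sd[j] * corr[j][k] * sd[k] for k in range(n)]
            for j in range(n)]

mu = [0.5, 0.4, 0.2]                      # hypothetical fitted means for one subject
V = working_covariance(mu, exchangeable(3, 0.4212))
for row in V:
    print(["%.4f" % v for v in row])
```

Swapping `exchangeable(3, 0.4212)` for `ar1(3, 0.4212)` changes only the off-diagonal entries; the diagonal always equals ø times the variance function.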

GEE vs. GLMM
1) Target of Inference:
• GEE: population average
• GLMM: subject specific
• Note: there is recent work on performing population average inference under GLMMs

GEE vs. GLMM
2) Outputs:
• GEE:
  • Coefficients relating Y to X
• GLMM:
  • Coefficients relating Y to X conditional on b
  • Estimates of subject specific random effects
  • Variance of subject specific random effects

GEE vs. GLMM
3) Robustness:
• GEE (with robust variance estimates): inference is valid in large samples even if the distribution of Y and/or the variance of Y are incorrectly specified
• GLMM (with model-based estimates): valid inference generally requires correct specification of the distribution of Y and of the variance of Y
• Notes:
  • Recent proposals for robust variance estimates under GLMM
  • Inference for Linear Mixed Models remains valid if Y is not normal for large N
• Caveat to GEE robustness: GEE can be biased if time dependent covariates are used, unless an independence working correlation matrix is used
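The "robust variance estimates" referred to here are usually the empirical (sandwich) estimator of Liang and Zeger. A sketch in the notation of the GEE Estimation slide, with D_i = ∂μ_i/∂β (an n × p matrix, a symbol introduced here) and V_i = V_i(α) the working covariance:

```latex
\begin{aligned}
U(\beta) &= \sum_{i=1}^{N} D_i^{\top} V_i^{-1}\,(Y_i - \mu_i) = 0
  && \text{(estimating equation)} \\[4pt]
\widehat{\operatorname{Cov}}(\hat\beta) &= B^{-1} M\, B^{-1}
  && \text{(robust / sandwich)} \\[4pt]
B &= \sum_{i=1}^{N} D_i^{\top} V_i^{-1} D_i, \qquad
M = \sum_{i=1}^{N} D_i^{\top} V_i^{-1}\,(Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^{\top}\, V_i^{-1} D_i
\end{aligned}
```

When the working covariance is correct, M estimates the same quantity as B and the sandwich reduces to the model-based variance B^{-1}. When it is wrong, the middle term M still reflects the observed within-subject covariation, which is the source of the robustness claimed above.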

GEE vs. GLMM
4) Efficiency (power and width of confidence intervals):
• GEE:
  • Usually fairly efficient if the variance function is correctly specified
  • Between subject comparisons are nearly efficient if an independence covariance structure is used for balanced data
• GLMM: maximum likelihood estimates are asymptotically efficient as long as the model is correctly specified

GEE vs. GLMM
5) Missing Data:
• "Classical" GEE (with robust variance estimates):
  • Valid inference if data are Missing Completely At Random (MCAR), even if the variance model is wrong
  • If the variance model is correct, the estimate of β is still consistent if data are MAR but not MCAR (but standard errors are not correct)
• GLMM (with model-based estimates): valid inference if data are Missing At Random (MAR)
• Note: various strategies exist for valid GEE inference if data are MAR

Missing data
Three general approaches to dealing with missing data under GEE which assume MAR but not MCAR:
1. Inverse probability weighting (Robins, Rotnitzky and Zhao, JASA, 1995)
2. Multiple imputation
3. Inverse probability weighting with augmentation, or doubly robust estimation

Each method can incorporate covariate information not included in the GEE model itself. This can make the MAR assumption much more plausible.
Methods 2 and 3 can be considerably more efficient than standard inverse probability weighting.
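As a rough sketch of the first approach, inverse probability weights under monotone dropout are the reciprocals of the cumulative probability of still being observed. The example assumes that the per-visit probabilities of remaining in the study (conditional on being observed at the previous visit) have already been estimated, for instance from a logistic regression on past outcomes and covariates; the numbers and the function name `ipw_weights` are hypothetical.

```python
def ipw_weights(cond_probs):
    """Weights for visits 0..K of one subject under monotone dropout.

    cond_probs[k] is the estimated probability of being observed at
    visit k+1 given that the subject was observed at visit k.
    The baseline visit is assumed to be always observed (weight 1).
    """
    weights = [1.0]
    cumulative = 1.0
    for lam in cond_probs:
        if not 0.0 < lam <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        cumulative *= lam              # P(still observed at this visit)
        weights.append(1.0 / cumulative)
    return weights

# Hypothetical subject: retention probabilities 0.9, 0.8 and 0.5 at visits 1-3.
w = ipw_weights([0.9, 0.8, 0.5])
print([round(x, 3) for x in w])
```

Observations that were unlikely to be seen receive large weights, so they stand in for similar subjects who dropped out. Very small retention probabilities produce unstable weights, which is one reason the augmented and imputation-based methods can be more efficient.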

GEE vs. GLMM
6) Small to Moderate Samples:
• GEE (with robust variance estimates):
  • Estimated standard errors are unstable and biased downwards
  • Inefficient estimating equation for estimating variance
  • Effectively uses a fully unstructured variance model
  • "Sample size" means the number of independent units
  • Various corrections have been proposed (available in PROC GLIMMIX)
• GLMM (with model-based estimates): large-sample approximations are often invoked, but performance is usually better than GEE with small to moderate N if the model is correctly specified

More Toenail Data
• Multicenter trial comparing active vs. control oral treatments for toenail infection
• Repeated measurements of binary outcome: 0 = none or mild separation, 1 = severe separation
• 1908 observations in 294 patients, mostly over 1 year

**** Standard GENMOD GEE program using Robust SEs *****;
**** Binary outcome leads to default logistic link function ****;
proc genmod descending;
  class id;
  model outcome = treatment month treatment*month / dist=bin;
  * type=exch requests an exchangeable working correlation; corrw prints it;
  repeated subject=id / type=exch covb corrw;
  * exp reports the exponentiated estimate, here an odds ratio per month;
  estimate 'Control Slope' month 1 / exp;
  estimate 'Treatment Slope' month 1 treatment*month 1 / exp;
run;

Working Correlation Matrix
        Col1    Col2    Col3    Col4    Col5    Col6    Col7
Row1  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212
Row2  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212  0.4212
Row3  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212  0.4212
Row4  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212  0.4212
Row5  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212  0.4212
Row6  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000  0.4212
Row7  0.4212  0.4212  0.4212  0.4212  0.4212  0.4212  1.0000

Analysis Of GEE Parameter Estimates
(output of the standard GENMOD GEE program with robust SEs, exchangeable working correlation)

Empirical Standard Error Estimates

Parameter         Estimate   Standard Error   95% Confidence Limits       Z   Pr > |Z|
Intercept          -0.5819           0.1720    -0.9191    -0.2446     -3.38     0.0007
treatment           0.0072           0.2595    -0.5013     0.5157      0.03     0.9779
month              -0.1713           0.0300    -0.2301    -0.1125     -5.71     <.0001
treatment*month    -0.0777           0.0541    -0.1838     0.0283     -1.44     0.1509

Contrast Estimate Results
(output of the two ESTIMATE statements in the standard GENMOD GEE program)

Slide note: "Can ignore in this case"

Label                  Mean Estimate   Mean Confidence Limits   L'Beta Estimate   Standard Error
Control Slope                 0.4573      0.4427      0.4719            -0.1713           0.0300
Exp(Control Slope)                                                        0.8426           0.0253
Treatment Slope               0.4381      0.4165      0.4599            -0.2490           0.0450
Exp(Treatment Slope)                                                      0.7796           0.0351

Label                  Alpha   L'Beta Confidence Limits   Chi-Square   Pr > ChiSq
Control Slope           0.05     -0.2301     -0.1125           32.60       <.0001
Exp(Control Slope)      0.05      0.7945      0.8936
Treatment Slope         0.05     -0.3373     -0.1607           30.57       <.0001
Exp(Treatment Slope)    0.05      0.7137      0.8515
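The exponentiated rows of the contrast table can be reproduced by hand from the L'Beta estimates and standard errors, which helps in reading the output. The sketch below uses only numbers printed in the table; the delta-method standard error and the helper name `odds_ratio_summary` are choices made here, not something stated on the slide. It also shows that the "Mean Estimate" column is simply the inverse logit of the slope, which is presumably why the slide marks something as ignorable in this case.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def odds_ratio_summary(estimate, se, z=1.959964):
    """Odds ratio, delta-method SE and Wald 95% limits from a log odds ratio."""
    lower, upper = estimate - z * se, estimate + z * se
    return {
        "or": math.exp(estimate),
        "se_or": math.exp(estimate) * se,          # delta method
        "ci": (math.exp(lower), math.exp(upper)),
    }

# Monthly slopes on the log-odds scale, taken from the GEE output.
control_slope = -0.1713
treatment_slope = -0.1713 + (-0.0777)    # month + treatment*month = -0.2490

control = odds_ratio_summary(control_slope, 0.0300)
# The SE 0.0450 is read from the table; it already accounts for the
# covariance between the two coefficients.
treatment = odds_ratio_summary(treatment_slope, 0.0450)

print("control   OR per month: %.4f" % control["or"])
print("treatment OR per month: %.4f" % treatment["or"])
# Inverse logit of a slope, as in the "Mean Estimate" column:
print("expit(control slope):   %.4f" % expit(control_slope))
```

Read this way, the odds of severe separation fall by roughly 16% per month in the control arm and roughly 22% per month in the treatment arm, on the population average scale.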

**** GLIMMIX GLMM Estimating Subject Specific Effects ****;
**** Binary outcome leading to default logistic link function ****;
proc glimmix method=RSPL data=toenail;
  class id;
  model outcome (event="1") = treatment month treatment*month / s dist=binary;
  * random intercept for each patient;
  random int / subject=id;
  estimate 'Control Slope' month 1 / or;
  estimate 'Treatment Slope' month 1 treatment*month 1 / or cl;
run;

Solutions for Fixed Effects

Effect            Estimate   Standard Error     DF   t Value   Pr > |t|
Intercept          -0.7204           0.2370    292     -3.04     0.0026
treatment         -0.02594           0.3360   1612     -0.08     0.9385
month              -0.2782          0.03222   1612     -8.64     <.0001
treatment*month   -0.09583          0.05105   1612     -1.88     0.0607
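The subject specific month coefficient from the GLMM (−0.2782) is larger in magnitude than the population average coefficient from the GEE (−0.1713), a ratio of about 1.62. For a random-intercept logistic model this is the expected direction. A commonly quoted approximation, attributed to Zeger, Liang and Albert (1988) and given here as background rather than as part of the slides, relates the two scales through the random-intercept variance σ_b² (a symbol introduced here):

```latex
\beta^{\mathrm{PA}} \;\approx\; \frac{\beta^{\mathrm{SS}}}{\sqrt{1 + c^{2}\sigma_b^{2}}},
\qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588, \quad c^{2} \approx 0.346
```

The approximation implies that population average coefficients shrink toward zero as between-subject heterogeneity grows, and that the two sets of coefficients agree when σ_b² = 0. It should not be read as an exact relation for these fits, since the two programs use different estimation methods.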

*** Small Sample;
data small;
  set toenail;
  if id <= 20;
run;

** Standard GENMOD GEE with Robust SEs: 17 Patients Only ***;
** Binary outcome leading to default logistic link function **;
proc genmod data=small descending;
  class id;
  model outcome = treatment month treatment*month / dist=bin;
  repeated subject=id / type=exch covb corrw;
run;

Parameter         Estimate   Standard Error   95% Confidence Limits       Z   Pr > |Z|
Intercept          -0.3558           0.6272    -1.5851     0.8736     -0.57     0.5706
treatment           0.0527           0.9679    -1.8444     1.9497      0.05     0.9566
month              -0.1543           0.0991    -0.3485     0.0400     -1.56     0.1196
treatment*month     0.0272           0.1725    -0.3109     0.3654      0.16     0.8746

**** GLIMMIX GEE program using Robust SEs;
**** Binary outcome leads to default logistic link function;
**** Restricted to 17 patients;
**** Small N Adjustment of Morel, Bokossa, and Neerchal (2003);
proc glimmix method=RSPL empirical=mbn data=small;
  class id;
  model outcome (event="1") = treatment month treatment*month / s dist=binary ddfm=kenwardroger;
  random _residual_ / subject=id type=cs;
run;

Solutions for Fixed Effects

Effect            Estimate   Standard Error     DF   t Value   Pr > |t|
Intercept          -0.3605           0.7369     15     -0.49     0.6317
treatment          0.05762           1.1209     15      0.05     0.9597
month              -0.1530           0.1197     94     -1.28     0.2043
treatment*month    0.02560           0.1984     94      0.13     0.8976

THAT’s ALL