1 Generalized Estimating Equations (GEEs) Purpose: to introduce GEEs These are used to model correlated data from Longitudinal/ repeated measures studies.


1 Generalized Estimating Equations (GEEs)
Purpose: to introduce GEEs
These are used to model correlated data from
– Longitudinal/repeated measures studies
– Clustered/multilevel studies

2 Outline
Examples of correlated data
Successive generalizations
– Normal linear model
– Generalized linear model
– GEE
Estimation
Example: stroke data
– exploratory analysis
– modelling

3 Correlated data
1. Repeated measures: same subjects, same measure, successive times – expect successive measurements to be correlated
[Figure: subjects i = 1,…,n are randomized to treatment groups A, B and C and measured at successive measurement times]

4 Correlated data
2. Clustered/multilevel studies
[Figure: hierarchy with Level 3, Level 2 and Level 1]
E.g., Level 3: populations; Level 2: age-sex groups; Level 1: blood pressure measurements in a sample of people in each age-sex group
We expect correlations within populations and within age-sex groups due to genetic, environmental and measurement effects

5 Notation
Repeated measurements: y_ij, i = 1,…,N subjects; j = 1,…,n_i times for subject i
Clustered data: y_ij, i = 1,…,N clusters; j = 1,…,n_i measurements within cluster i
Use “unit” for subject or cluster

6 Normal Linear Model
For unit i: E(y_i) = μ_i = X_i β; y_i ~ N(μ_i, V_i)
X_i: n_i × p design matrix
β: p × 1 parameter vector
V_i: n_i × n_i variance-covariance matrix, e.g., V_i = σ²I if measurements are independent
For all units: E(y) = μ = Xβ, y ~ N(μ, V), where V is block diagonal with blocks V_1,…,V_N
This V is suitable if the units are independent

7 Normal linear model: estimation
We want to estimate β and V
Estimate β by solving the set of score equations
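The score equations are not legible in this transcript. Assuming the standard generalized least squares form for the model y_i ~ N(X_i β, V_i) (an assumption here, not a recovery of the slide), they can be written as follows, with the closed-form solution that applies when the V_i are known:

```latex
\sum_{i=1}^{N} X_i^{T} V_i^{-1} \left( y_i - X_i \beta \right) = 0
\quad\Longrightarrow\quad
\hat{\beta} = \Bigl( \sum_{i=1}^{N} X_i^{T} V_i^{-1} X_i \Bigr)^{-1}
              \sum_{i=1}^{N} X_i^{T} V_i^{-1} y_i
```

When the V_i are unknown they must be estimated as well, which is why an iterative scheme is needed.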

8 Generalized linear model (GLM)

9 Generalized estimating equations (GEE)

10 Generalized estimating equations
D_i is the matrix of derivatives ∂μ_i/∂β_j
V_i is the ‘working’ covariance matrix of Y_i
A_i = diag{var(Y_ik)}, R_i is the correlation matrix for Y_i
φ is an overdispersion parameter
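The equations that these symbols belong to are not legible in this transcript. As a sketch, assuming the usual form of the GEEs, the symbols fit together as follows (the decomposition of V_i shown is the standard one and is an assumption about what the slide displayed):

```latex
U(\beta) = \sum_{i=1}^{N} D_i^{T} V_i^{-1} \left( y_i - \mu_i \right) = 0,
\qquad
V_i = \phi \, A_i^{1/2} R_i \, A_i^{1/2}
```

With an identity link, D_i = X_i, so these reduce to score equations of the normal linear model type.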

11 Overdispersion parameter
φ is estimated from the residuals by a formula in which N is the total number of measurements and p is the number of regression parameters
The square root of the overdispersion parameter is called the scale parameter
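The formula itself is missing from this transcript. A minimal sketch, assuming the common Pearson-type estimator (sum of squared Pearson residuals divided by N − p), is given below. The function name, the data and the Poisson-style variance function are hypothetical and only serve as an illustration.

```python
def overdispersion(y, mu, var_mu, p):
    """Pearson-type estimate of phi (assumed form):
    sum of (y - mu)^2 / var(mu) over all measurements, divided by N - p."""
    n_total = len(y)
    if n_total <= p:
        raise ValueError("need more measurements than regression parameters")
    pearson_sq = sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mu, var_mu))
    return pearson_sq / (n_total - p)


# Hypothetical counts and fitted means; variance function var(mu) = mu
y = [2, 5, 1, 7, 3, 6]
mu = [3.0, 4.0, 2.0, 6.0, 3.0, 6.0]

phi = overdispersion(y, mu, var_mu=mu, p=2)
scale = phi ** 0.5  # the scale parameter is the square root of phi
print(phi, scale)
```

A value of phi well above 1 would suggest more variability than the assumed variance function allows for.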

12 Estimation (1)
More generally, unless V_i is known, iteration is needed to solve the estimating equations
1. Guess V_i and estimate β by b and hence μ
2. Calculate residuals, r_ij = y_ij − μ̂_ij
3. Estimate V_i from the residuals
4. Re-estimate b using the new estimate of V_i
Repeat steps 2-4 until convergence

13 Estimation (2) – For GEEs

14 Iterative process for GEEs
Start with R_i = identity (i.e., independence) and φ = 1: estimate β
Use the estimates to calculate fitted values μ̂_i and residuals y_i − μ̂_i
These are used to estimate A_i, R_i and φ
Then the GEEs are solved again to obtain improved estimates of β
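The iterative process can be sketched in plain Python for the simplest case: an identity link, one covariate, equal-sized units and an exchangeable working correlation. Everything in this block is an illustrative assumption (the function names, the moment estimators for φ and ρ, and the made-up data); it is not the routine used for the results in this presentation.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def apply_exch_inverse(v, rho):
    """Return R^{-1} v for the exchangeable matrix R = (1 - rho) I + rho J."""
    c = rho / (1.0 + (len(v) - 1) * rho)
    total = sum(v)
    return [(vi - c * total) / (1.0 - rho) for vi in v]


def fit_gee_exchangeable(units, p=2, tol=1e-8, max_iter=50):
    """Fit E(y_ij) = b0 + b1 * x_ij; units is a list of (x, y) list pairs."""
    rho, beta, phi = 0.0, [0.0, 0.0], 1.0   # start from independence
    for _ in range(max_iter):
        # Solve sum_i X_i^T R^{-1} (y_i - X_i beta) = 0 for the current rho
        a = [[0.0, 0.0], [0.0, 0.0]]
        b = [0.0, 0.0]
        for x, y in units:
            w1 = apply_exch_inverse([1.0] * len(x), rho)  # R^{-1} 1
            wx = apply_exch_inverse(x, rho)               # R^{-1} x
            a[0][0] += sum(w1)
            a[0][1] += sum(wx)
            a[1][0] += sum(wi * xi for wi, xi in zip(w1, x))
            a[1][1] += sum(wi * xi for wi, xi in zip(wx, x))
            b[0] += sum(wi * yi for wi, yi in zip(w1, y))
            b[1] += sum(wi * yi for wi, yi in zip(wx, y))
        new_beta = solve_2x2(a, b)

        # Residuals, then moment estimates of phi and rho (assumed estimators)
        sum_sq = cross = 0.0
        n_obs = n_pairs = 0
        for x, y in units:
            r = [yi - (new_beta[0] + new_beta[1] * xi) for xi, yi in zip(x, y)]
            sum_sq += sum(ri * ri for ri in r)
            n_obs += len(r)
            for j in range(len(r)):
                for k in range(j + 1, len(r)):
                    cross += r[j] * r[k]
                    n_pairs += 1
        phi = sum_sq / (n_obs - p)
        new_rho = cross / ((n_pairs - p) * phi) if phi > 0 else rho

        converged = (max(abs(u - v) for u, v in zip(new_beta, beta)) < tol
                     and abs(new_rho - rho) < tol)
        beta, rho = new_beta, new_rho
        if converged:
            break
    return beta, rho, phi


# Made-up data: 4 units, each measured at times 0, 1, 2
units = [([0, 1, 2], [1.1, 2.9, 5.2]),
         ([0, 1, 2], [2.0, 4.1, 5.9]),
         ([0, 1, 2], [0.2, 2.1, 3.8]),
         ([0, 1, 2], [1.6, 3.4, 5.7])]
beta, rho, phi = fit_gee_exchangeable(units)
print(beta, rho, phi)
```

The sketch does not guard against an estimated ρ outside the range where R is invertible, and it returns only point estimates; robust standard errors would need a separate sandwich calculation.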

15 Correlation
For unit i, V_i has correlation matrix R_i with elements ρ_lm
For repeated measures, ρ_lm = correlation between times l and m
For clustered data, ρ_lm = correlation between measures l and m
For all models considered here V_i is assumed to be the same for all units

16 Types of correlation
1. Independent: V_i is diagonal
2. Exchangeable: all measurements on the same unit are equally correlated
Plausible for clustered data
Other terms: spherical and compound symmetry

17 Types of correlation
3. Correlation depends on time or distance between measurements l and m, e.g. the first-order auto-regressive model has terms ρ, ρ², ρ³ and so on
Plausible for repeated measures where correlation is known to decline over time
4. Unstructured correlation: no assumptions about the correlations
Lots of parameters to estimate – may not converge
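To make the four structures concrete, here is a small sketch that builds the independent, exchangeable and AR(1) working correlation matrices for a unit with n measurements, and counts the free parameters of the unstructured form. The function names are illustrative, not taken from any package.

```python
def independent(n):
    """Identity matrix: no correlation between measurements."""
    return [[1.0 if l == m else 0.0 for m in range(n)] for l in range(n)]


def exchangeable(n, rho):
    """Every pair of measurements on the same unit has correlation rho."""
    return [[1.0 if l == m else rho for m in range(n)] for l in range(n)]


def ar1(n, rho):
    """First-order auto-regressive: correlation rho^|l - m|."""
    return [[rho ** abs(l - m) for m in range(n)] for l in range(n)]


def n_unstructured_params(n):
    """Number of distinct correlations when nothing is assumed."""
    return n * (n - 1) // 2


for row in ar1(4, 0.5):
    print(row)
print(n_unstructured_params(8))  # e.g. 8 measurement times per unit
```

The independent form has no correlation parameters, the exchangeable and AR(1) forms need only one (ρ), while the unstructured count grows quadratically with n, which is one reason fitting it may fail to converge.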

18 Missing Data For missing data, can estimate the working correlation using the all available pairs method, in which all non-missing pairs of data are used in the estimators of the working correlation parameters.

19 Choosing the Best Model
Standard Regression (GLM)
AIC = −2 × log-likelihood + 2 × (number of parameters)
Smaller values indicate a better balance of fit and parsimony.
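As a small worked illustration of the AIC formula, the sketch below compares three candidate models. The model names, log-likelihoods and parameter counts are invented for the example and do not come from the stroke data.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical candidates: (log-likelihood, number of parameters)
candidates = {
    "time only": (-120.4, 2),
    "time + group": (-118.9, 4),
    "time * group": (-118.1, 6),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # the smallest AIC is preferred

for name, value in scores.items():
    print(name, round(value, 1))
print("preferred:", best)
```

In this made-up case the extra parameters raise the log-likelihood only slightly, so the penalty term dominates and the simplest model has the smallest AIC.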

20 Choosing the Best Model
GEE
QIC(V) – a function of V, so it can be used to choose the best correlation structure.
QIC_u – a measure that can be used to determine the best subset of covariates for a particular model.
The best model is the one with the smallest value!

21 Other approaches – alternatives to GEEs
1. Multivariate modelling – treat all measurements on the same unit as dependent variables (even though they are measurements of the same variable) and model them simultaneously (Hand and Crowder, 1996)
e.g., SPSS uses this approach (with exchangeable correlation) for repeated measures ANOVA

22 Other approaches – alternatives to GEEs
2. Mixed models – fixed and random effects
e.g., y = Xβ + Zu + e
β: fixed effects; u: random effects ~ N(0, G); e: error terms ~ N(0, R)
var(y) = ZGZᵀ + R
so correlation between the elements of y is due to random effects
Verbeke and Molenberghs (1997)

23 Example of correlation from random effects
Cluster sampling – randomly select areas (PSUs) then households within areas
Y_ij = μ + u_i + e_ij
Y_ij: income of household j in area i
μ: average income for population
u_i: random effect of area i ~ N(0, σ_u²); e_ij: error ~ N(0, σ_e²)
E(Y_ij) = μ; var(Y_ij) = σ_u² + σ_e²;
cov(Y_ij, Y_km) = σ_u², provided i = k (and j ≠ m); cov(Y_ij, Y_km) = 0, otherwise.
So V_i is exchangeable with correlation elements ρ = σ_u² / (σ_u² + σ_e²) = ICC (ICC: intraclass correlation coefficient)
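A short numerical check of this example, writing σ_u² for the variance of the area effect u_i and σ_e² for the error variance (symbol names and values are assumptions made for the illustration): the covariance matrix of one cluster is built directly from the model, and its off-diagonal correlation is compared with the ICC.

```python
def icc(sigma_u2, sigma_e2):
    """Intraclass correlation: share of total variance due to the cluster effect."""
    return sigma_u2 / (sigma_u2 + sigma_e2)


def cluster_covariance(n, sigma_u2, sigma_e2):
    """Covariance matrix V_i of n measurements in one cluster under
    Y_ij = mu + u_i + e_ij: sigma_u2 everywhere, plus sigma_e2 on the diagonal."""
    return [[sigma_u2 + (sigma_e2 if j == m else 0.0) for m in range(n)]
            for j in range(n)]


# Hypothetical variance components
V = cluster_covariance(3, sigma_u2=4.0, sigma_e2=12.0)
rho = V[0][1] / (V[0][0] * V[1][1]) ** 0.5  # correlation of two households

for row in V:
    print(row)
print(rho, icc(4.0, 12.0))
```

Every off-diagonal correlation equals the same value, which is exactly the exchangeable structure.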

24 Numerical example: Recovery from stroke
Treatment groups
A = new OT intervention
B = special stroke unit, same hospital
C = usual care in different hospital
8 patients per group
Measurements of functional ability – Barthel index measured weekly for 8 weeks
Y_ijk: patients i, groups j, times k
Exploratory analyses – plots
Naïve analyses
Modelling

25 Numerical example: time plots Individual patients and overall regression line

26 Numerical example: time plots for groups

27 Numerical example: research questions Primary question: do slopes differ (i.e. do treatments have different effects)? Secondary question: do intercepts differ (i.e. are groups same initially)?

28 Numerical example: Scatter plot matrix

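29 Numerical example: Correlation matrix

| week | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 2 | 0.93 | | | | | | |
| 3 | 0.88 | 0.92 | | | | | |
| 4 | 0.83 | 0.88 | 0.95 | | | | |
| 5 | 0.79 | 0.85 | 0.91 | 0.92 | | | |
| 6 | 0.71 | 0.79 | 0.85 | 0.88 | 0.97 | | |
| 7 | 0.62 | 0.70 | 0.77 | 0.83 | 0.92 | 0.96 | |
| 8 | 0.55 | 0.64 | 0.70 | 0.77 | 0.88 | 0.93 | 0.98 |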
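One rough way to read this matrix is to average the correlations at each lag (number of weeks apart) and set them beside the values an AR(1) structure would imply, ρ^lag, with ρ taken as the mean lag-1 correlation. The sketch below does that with the correlations from this slide; the helper name and the choice of ρ are illustrative assumptions.

```python
# Lower triangle of the correlation matrix: rows are weeks 2..8, columns weeks 1..7
lower = [
    [0.93],
    [0.88, 0.92],
    [0.83, 0.88, 0.95],
    [0.79, 0.85, 0.91, 0.92],
    [0.71, 0.79, 0.85, 0.88, 0.97],
    [0.62, 0.70, 0.77, 0.83, 0.92, 0.96],
    [0.55, 0.64, 0.70, 0.77, 0.88, 0.93, 0.98],
]


def mean_by_lag(lower):
    """Average the correlations that are the same number of weeks apart."""
    sums, counts = {}, {}
    for r, row in enumerate(lower):        # row r holds week r + 2
        for c, value in enumerate(row):    # column c holds week c + 1
            lag = (r + 2) - (c + 1)
            sums[lag] = sums.get(lag, 0.0) + value
            counts[lag] = counts.get(lag, 0) + 1
    return {lag: sums[lag] / counts[lag] for lag in sorted(sums)}


lag_means = mean_by_lag(lower)
rho1 = lag_means[1]  # illustrative choice of the AR(1) parameter

for lag, observed in lag_means.items():
    print(lag, round(observed, 3), round(rho1 ** lag, 3))
```

The printed columns are lag, mean observed correlation and the AR(1) value, so the rate at which correlation declines with time can be compared directly.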

30 Numerical example 1. Pooled analysis ignoring correlation within patients

31 Numerical example 2. Data reduction

32 Numerical example
3. Repeated measures analyses using various variance-covariance structures
For the stroke data, from the scatter plot matrix and correlations, an auto-regressive structure (e.g. AR(1)) seems most appropriate
Use GEEs to fit models

33 Numerical example
4. Mixed/Random effects model
Use model Y_ijk = (α_j + a_ij) + (β_j + b_ij)k + e_ijk
(i) α_j and β_j are fixed effects for groups
(ii) other effects are random and all are independent
Fit the model and use estimates of fixed effects to compare the α_j’s and β_j’s

34 Numerical example: Results for intercepts

| Method | Intercept A | Asymp SE | Robust SE |
|---|---|---|---|
| Pooled | 29.821 | 5.772 | |
| Data reduction | 29.821 | 7.572 | |
| GEE, independent | 29.821 | 5.683 | 10.395 |
| GEE, exchangeable | 29.821 | 7.047 | 10.395 |
| GEE, AR(1) | 33.492 | 7.624 | 9.924 |
| GEE, unstructured | 30.703 | 7.406 | 10.297 |
| Random effects | 29.821 | 7.047 | |

Results from Stata 8

35 Numerical example: Results for intercepts

| Method | B - A | Asymp SE | Robust SE |
|---|---|---|---|
| Pooled | 3.348 | 8.166 | |
| Data reduction | 3.348 | 10.709 | |
| GEE, independent | 3.348 | 8.037 | 11.884 |
| GEE, exchangeable | 3.348 | 9.966 | 11.884 |
| GEE, AR(1) | -0.270 | 10.782 | 11.139 |
| GEE, unstructured | 2.058 | 10.474 | 11.564 |
| Random effects | 3.348 | 9.966 | |

Results from Stata 8

36 Numerical example: Results for intercepts

| Method | C - A | Asymp SE | Robust SE |
|---|---|---|---|
| Pooled | -0.022 | 8.166 | |
| Data reduction | -0.018 | 10.709 | |
| GEE, independent | -0.022 | 8.037 | 11.130 |
| GEE, exchangeable | -0.022 | 9.966 | 11.130 |
| GEE, AR(1) | -6.396 | 10.782 | 10.551 |
| GEE, unstructured | -1.403 | 10.474 | 10.906 |
| Random effects | -0.022 | 9.966 | |

Results from Stata 8

37 Numerical example: Results for slopes

| Method | Slope A | Asymp SE | Robust SE |
|---|---|---|---|
| Pooled | 6.324 | 1.143 | |
| Data reduction | 6.324 | 1.080 | |
| GEE, independent | 6.324 | 1.125 | 1.156 |
| GEE, exchangeable | 6.324 | 0.463 | 1.156 |
| GEE, AR(1) | 6.074 | 0.740 | 1.057 |
| GEE, unstructured | 7.126 | 0.879 | 1.272 |
| Random effects | 6.324 | 0.463 | |

Results from Stata 8

38 Numerical example: Results for slopes

| Method | B - A | Asymp SE | Robust SE |
|---|---|---|---|
| Pooled | -1.994 | 1.617 | |
| Data reduction | -1.994 | 1.528 | |
| GEE, independent | -1.994 | 1.592 | 1.509 |
| GEE, exchangeable | -1.994 | 0.655 | 1.509 |
| GEE, AR(1) | -2.142 | 1.047 | 1.360 |
| GEE, unstructured | -3.556 | 1.243 | 1.563 |
| Random effects | -1.994 | 0.655 | |

Results from Stata 8

39 Numerical example: Results for slopes

| Method | C - A | Asymp SE | Robust SE |
|---|---|---|---|
| Pooled | -2.686 | 1.617 | |
| Data reduction | -2.686 | 1.528 | |
| GEE, independent | -2.686 | 1.592 | 1.502 |
| GEE, exchangeable | -2.686 | 0.655 | 1.509 |
| GEE, AR(1) | -2.236 | 1.047 | 1.504 |
| GEE, unstructured | -4.012 | 1.243 | 1.598 |
| Random effects | -2.686 | 0.655 | |

Results from Stata 8

40 Numerical example: Summary of results
All models produced similar results leading to the same conclusion – no treatment differences
Pooled analysis and data reduction are useful for exploratory analysis – easy to follow, give good approximations for estimates but variances may be inaccurate
Random effects models give very similar results to GEEs
– don’t need to specify variance-covariance matrix
– model specification may/may not be more natural
