
Andrew Thomson on Generalised Estimating Equations (and simulation studies)


1 Andrew Thomson on Generalised Estimating Equations (and simulation studies)

2 Topics Covered
– What are GEE?
– Their relationship with robust standard errors
– Why they are not as complicated as they appear
– How simulation does (or does not) answer the differences between GEE approaches

3 Issues…
– My results are questionable (thanks to Richard…)
– They are not shown in their entirety
– But they agree with other studies
– The fixed cluster size results are definitely correct

4 A simple example
Consider simple uncorrelated linear regression, e.g. height on weight, and minimize the sum of squares.
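For reference, the sum of squares being minimized, assuming the usual intercept-and-slope model with y_i the height and x_i the weight of subject i:

```latex
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
```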

5 Simple example II
Differentiate with respect to each parameter and set the derivatives equal to 0. In general, if we have p parameters, minimizing the sum of squares is the same as solving p estimating equations.
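For the intercept-and-slope model, setting both derivatives to zero gives the two estimating equations Σ(y_i − β0 − β1 x_i) = 0 and Σ x_i (y_i − β0 − β1 x_i) = 0. The sketch below solves them in closed form and checks that both are (numerically) zero at the solution. The height and weight values are made up for illustration.

```python
def solve_estimating_equations(x, y):
    """Solve the two least-squares estimating equations for y = b0 + b1 * x.

    Equation 1: sum(y_i - b0 - b1 * x_i) = 0
    Equation 2: sum(x_i * (y_i - b0 - b1 * x_i)) = 0
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def estimating_equations(x, y, b0, b1):
    """Evaluate both estimating equations at (b0, b1); both are ~0 at the solution."""
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    eq1 = sum(residuals)
    eq2 = sum(xi * r for xi, r in zip(x, residuals))
    return eq1, eq2


# Hypothetical data: weight (kg) as x, height (cm) as y.
weight = [60.0, 65.0, 72.0, 80.0, 85.0]
height = [160.0, 166.0, 170.0, 178.0, 181.0]

b0, b1 = solve_estimating_equations(weight, height)
print(b0, b1)
print(estimating_equations(weight, height, b0, b1))
```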

6 Extensions
– Non-linear regression (e.g. logistic)
– Weighting, based on the correlation between the responses

7 Surprisingly – Not that bad
For each cluster j, D_j is a 2 × m_ij matrix.
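A sketch of the standard GEE form (Liang and Zeger), assuming this is the equation the slide refers to. Here Y_j is the vector of the m_ij responses in cluster j, μ_j its mean, and β = (β_0, β_1) an intercept and intervention effect (an assumption for an unadjusted two-arm trial). In this notation D_j is m_ij × 2, so the 2 × m_ij matrix on the slide corresponds to D_j^T.

```latex
\sum_{j} D_j^{T} V_j^{-1} \left( Y_j - \mu_j \right) = 0,
\qquad
D_j = \frac{\partial \mu_j}{\partial \beta^{T}} \quad (m_{ij} \times 2)
```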

8 A is an m_ij × m_ij diagonal matrix whose diagonal elements are the variances of the individual responses. The working correlation matrix is:
– Independence: the identity matrix
– Exchangeable: 1s on the diagonal, ρ everywhere else
Unadjusted studies:
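In the usual GEE notation (assumed here, ignoring any scale parameter), the working covariance V_j combines A_j with the working correlation matrix R_j. For a binary outcome with proportion p_i in arm i and the two structures named on the slide:

```latex
V_j = A_j^{1/2} R_j A_j^{1/2}, \qquad A_j = p_i (1 - p_i)\, I_{m_{ij}},
\qquad
R_j^{\mathrm{ind}} = I_{m_{ij}}, \qquad
R_j^{\mathrm{exch}} = (1 - \rho)\, I_{m_{ij}} + \rho\, \mathbf{1}\mathbf{1}^{T}
```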

9 So what is D_j^T V_j^{-1}?
– Independence, control
– Independence, IV
– Exchangeable, control
– Exchangeable, IV
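The four cases can be worked out from one identity. This sketch assumes a logistic model with an intercept and an intervention indicator (reading IV as the intervention arm), so that V_j = p(1 − p)R and D_j^T = p(1 − p) × [row of ones; row of the indicator]. The p(1 − p) factors cancel. Each column sum of the inverse exchangeable matrix is w = 1/(1 + (m − 1)ρ), so D_j^T V_j^{-1} is w times [1…1; 0…0] for control clusters and w times [1…1; 1…1] for intervention clusters, with w = 1 under independence (ρ = 0). The closed-form inverse used below is the standard Sherman–Morrison result. The function names are hypothetical.

```python
def exchangeable(m, rho):
    """m x m exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if r == c else rho for c in range(m)] for r in range(m)]


def exchangeable_inverse(m, rho):
    """Closed-form inverse: (I - rho / (1 + (m - 1) * rho) * 11^T) / (1 - rho)."""
    k = rho / (1.0 + (m - 1) * rho)
    return [[((1.0 if r == c else 0.0) - k) / (1.0 - rho) for c in range(m)]
            for r in range(m)]


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dt_vinv(m, rho, intervention):
    """D_j^T V_j^{-1} for a logistic model with intercept and intervention indicator.

    Assumes V_j = p(1-p) R, so the p(1-p) factors in D_j^T and V_j^{-1} cancel.
    Returns a 2 x m matrix: row 1 for the intercept, row 2 for the indicator.
    """
    x = 1.0 if intervention else 0.0
    dt = [[1.0] * m, [x] * m]  # D_j^T divided by p(1-p)
    return matmul(dt, exchangeable_inverse(m, rho))


m, rho = 4, 0.1
print(matmul(exchangeable(m, rho), exchangeable_inverse(m, rho)))  # ~ identity
print(dt_vinv(m, rho, intervention=True))  # every entry ~ 1 / (1 + 3 * 0.1)
```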

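10 Missing Out Some Algebra
– Independence: estimate each arm's proportion, and estimate the OR as the ratio of the two arms' odds
– Exchangeable: the same, but with observations weighted according to ρ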
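Carrying the algebra through, this sketch assumes a logistic model with only an intercept and an intervention indicator, arm i = 0 (control) or 1 (intervention), and a given ρ. The two estimating equations then separate by arm. Writing y_{ij+} for the number of events in cluster j of arm i and w_ij = 1/(1 + (m_ij − 1)ρ), with w_ij = 1 under independence:

```latex
\hat{p}_i = \frac{\sum_j w_{ij}\, y_{ij+}}{\sum_j w_{ij}\, m_{ij}},
\qquad
\widehat{\mathrm{OR}} = \frac{\hat{p}_1 / (1 - \hat{p}_1)}{\hat{p}_0 / (1 - \hat{p}_0)}
```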

11 Simple Interpretation
– Independence gives equal weight to each observation
– Exchangeable weights each cluster inversely to the variance of its observed proportion, which depends on ρ
– There is no obvious working correlation matrix which gives equal weight to each cluster
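A small sketch of these weights, assuming the closed-form arm estimate p̂ = Σ w_j y_j / Σ w_j m_j with w_j = 1/(1 + (m_j − 1)ρ) and ρ held fixed. Each cluster's total weight m_j/(1 + (m_j − 1)ρ) equals m_j when ρ = 0, so every observation counts equally. It tends to 1 as ρ → 1, so every cluster would count equally, but ρ = 1 is not a usable working correlation. The cluster counts are made up.

```python
def arm_proportion(events, sizes, rho):
    """GEE estimate of one arm's proportion with an exchangeable working
    correlation and a fixed rho (rho = 0 reproduces independence).

    events[j] = number of events in cluster j, sizes[j] = cluster size m_j.
    """
    weights = [1.0 / (1.0 + (m - 1) * rho) for m in sizes]
    num = sum(w * y for w, y in zip(weights, events))
    den = sum(w * m for w, m in zip(weights, sizes))
    return num / den


def cluster_weights(sizes, rho):
    """Total weight w_j * m_j carried by each cluster."""
    return [m / (1.0 + (m - 1) * rho) for m in sizes]


# Hypothetical arm with very unequal cluster sizes.
events = [2, 5, 30]
sizes = [10, 20, 100]
print(arm_proportion(events, sizes, rho=0.0))  # pooled proportion 37/130
print(arm_proportion(events, sizes, rho=0.2))  # smaller clusters gain relative weight
print(cluster_weights(sizes, rho=0.0))         # [10.0, 20.0, 100.0]
```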

12 Note on Simulation
Simulation is used to make inferences about a method's behaviour when its theoretical properties are unclear. The simulator has a choice over:
– the parameters varied
– the output measured
These choices should answer relevant questions.

13 Relevance for simulation studies
– Equal cluster sizes give the same point estimate under the independence and exchangeable working correlations
– So any potential benefit of one approach over the other in terms of precision (measured by MSE) cannot be found
– Simulation studies should always consider the variable cluster size case
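Assume the arm proportion under an exchangeable working correlation takes the weighted form p̂_i = Σ_j w_ij y_ij+ / Σ_j w_ij m_ij with w_ij = 1/(1 + (m_ij − 1)ρ). Then equal cluster sizes m_ij = m make every weight the same, so it cancels and the estimate equals the independence estimate for any ρ. Here K_i is the number of clusters in arm i.

```latex
\hat{p}_i
= \frac{w \sum_j y_{ij+}}{w \sum_j m}
= \frac{\sum_j y_{ij+}}{K_i\, m},
\qquad w = \frac{1}{1 + (m - 1)\rho}
```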

14 Unadjusted studies
– What outcome (OR, RR, RD) are we interested in measuring?
– What weights do we use for each cluster?
– Does the estimating procedure, e.g. confidence interval construction, have the right size?
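For reference, the three effect measures computed from the two arms' proportions. This is a minimal sketch with made-up proportions, p1 for the intervention arm and p0 for control.

```python
def effect_measures(p1, p0):
    """Odds ratio, risk ratio and risk difference comparing arm 1 with arm 0."""
    odds_ratio = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
    risk_ratio = p1 / p0
    risk_difference = p1 - p0
    return odds_ratio, risk_ratio, risk_difference


print(effect_measures(0.3, 0.2))  # OR ~1.714, RR ~1.5, RD ~0.1
```

The measures are not interchangeable. In this example an OR of about 1.71 corresponds to an RR of 1.5.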

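15 Estimating the Variance
– Done using robust standard errors: F is a matrix which depends on V and D, and the covariance of each cluster's responses is estimated from its residuals
– With the independence working correlation, this is identical to the usual robust standard errors
– So criticism of GEE is also criticism of RSE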
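A sketch of the usual GEE sandwich estimator, assumed to be the one on this slide. F is the "bread" built from D and V, and the residual cross-product is the "filling":

```latex
\widehat{\operatorname{Var}}(\hat\beta)
= F^{-1} \left[ \sum_j D_j^{T} V_j^{-1}\, \hat e_j \hat e_j^{T}\, V_j^{-1} D_j \right] F^{-1},
\qquad
F = \sum_j D_j^{T} V_j^{-1} D_j,
\qquad
\hat e_j = Y_j - \hat\mu_j
```

With V_j = A_j (independence) and a canonical link such as the logit, D_j^T V_j^{-1} reduces to the transposed covariate matrix. This gives the familiar cluster-robust (Huber–White) variance.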

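16 Problems and solutions
– The robust variance estimator is biased downwards for small samples (< 40 clusters), so p-values are too small
– We “know” what this bias is (a function of D and V); let's call it H
– We replace the estimated covariance in the middle of the sandwich with a version corrected using H
– Basically, we are changing the filling of our sandwich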
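One well-known correction of this kind is that of Mancl and DeRouen (2001). Assuming that is the bias correction meant here, H_j is a cluster-level leverage matrix, and the residual cross-product in the filling is replaced as follows:

```latex
F = \sum_j D_j^{T} V_j^{-1} D_j,
\qquad
H_j = D_j\, F^{-1} D_j^{T} V_j^{-1},
\qquad
\hat e_j \hat e_j^{T}
\;\longrightarrow\;
(I - H_j)^{-1}\, \hat e_j \hat e_j^{T}\, (I - H_j^{T})^{-1}
```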

17 C.I. Construction
1. Wald test
 a) Independence
 b) Exchangeable
 c) Bias corrected
2. Score test (adjusted score test): evaluate the score equations at H_0 to obtain a χ² statistic

18 More on the score test
– The score test is conservative
– Using the bias correction will make it worse
– So multiply the χ² statistic by J/(J − 1), where J is the number of clusters
– CI construction is done using the bisection algorithm
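A minimal sketch of the test-inversion idea. The 95% CI is the set of null values b whose adjusted statistic stays below the χ²₁ critical value 3.841, and each endpoint is found by bisection. The statistic here is a hypothetical stand-in, a quadratic in b. In practice it would come from evaluating the score equations under each null value, which is not shown.

```python
CHI2_1_95 = 3.841458820694124  # 95th percentile of chi-squared with 1 df


def adjusted(stat, n_clusters):
    """Scale a score statistic by J / (J - 1), J = number of clusters."""
    return stat * n_clusters / (n_clusters - 1)


def bisect_boundary(f, inside, outside, tol=1e-10, max_iter=200):
    """Find where f crosses the critical value between a point inside the CI
    (f < critical) and a point outside it (f >= critical)."""
    for _ in range(max_iter):
        mid = 0.5 * (inside + outside)
        if f(mid) < CHI2_1_95:
            inside = mid
        else:
            outside = mid
        if abs(outside - inside) < tol:
            break
    return 0.5 * (inside + outside)


def score_ci(stat_at, estimate, step=10.0):
    """Invert the test: bisect on each side of the point estimate."""
    lower = bisect_boundary(stat_at, estimate, estimate - step)
    upper = bisect_boundary(stat_at, estimate, estimate + step)
    return lower, upper


# Hypothetical example: 6 clusters, toy statistic centred on beta_hat.
J = 6
beta_hat, se = 0.5, 0.4


def stat(b):
    """Toy adjusted statistic for H0: beta = b (stand-in for a real score test)."""
    return adjusted(((beta_hat - b) / se) ** 2, J)


print(score_ci(stat, beta_hat))
```

The interval here is symmetric only because the toy statistic is quadratic. A real score statistic would generally give an asymmetric interval, which is why a root-finder like bisection is needed.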

19 Results! Size (5% nominal), for 4–6 clusters and 15–20 clusters:
– Naïve: 12%
– Independence: 11% and 9.5%
– Exchangeable: 9% and 8%
– Bias corrected (B.C.): 7.5% and 7%
– Adjusted score: 5.2% and 5%
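When reading size estimates like these, it helps to know the Monte Carlo error of a simulated rejection rate. With R independent simulated data sets, the standard error of an estimated rejection proportion p̂ is sqrt(p̂(1 − p̂)/R). The number of replicates behind these results is not stated, so R = 1000 below is purely illustrative, and the uniform p-values are a stand-in for a correctly sized test under H_0.

```python
import random


def rejection_rate(p_values, alpha=0.05):
    """Empirical size (or power): proportion of p-values below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def monte_carlo_se(rate, n_reps):
    """Binomial standard error of a simulated rejection proportion."""
    return (rate * (1.0 - rate) / n_reps) ** 0.5


# Stand-in p-values under H0 (uniform), R = 1000 replicates.
random.seed(1)
p_values = [random.random() for _ in range(1000)]
rate = rejection_rate(p_values)
se = monte_carlo_se(rate, len(p_values))
print(rate, se, (rate - 1.96 * se, rate + 1.96 * se))
print(monte_carlo_se(0.05, 1000))  # ~0.0069: 5.2% vs 5% is within noise at R = 1000
```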

20 Power
– H_0 is not true
– Simulation studies tend to use the beta-binomial distribution to simulate data
– Common ρ (?)
– If size is above nominal, power will be inflated as well
– If methods have the same size, does MSE have an effect?
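A sketch of the beta-binomial approach with a common ρ, using variable cluster sizes as slide 27 recommends. Each cluster's probability is drawn from Beta(a, b) with a = p(1 − ρ)/ρ and b = (1 − p)(1 − ρ)/ρ. This has mean p and gives an intracluster correlation of 1/(a + b + 1) = ρ. The arm proportions differ, so H_0 is false, as in a power scenario. The cluster-size range and parameter values are illustrative assumptions.

```python
import random


def beta_params(p, rho):
    """Beta(a, b) parameters with mean p and intracluster correlation rho."""
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho
    return a, b


def simulate_arm(p, rho, n_clusters, min_size, max_size, rng):
    """Simulate (events, size) for each cluster in one arm.

    Cluster sizes are uniform on [min_size, max_size] (an illustrative
    choice); events are Binomial(size, p_j) with p_j ~ Beta(a, b).
    """
    a, b = beta_params(p, rho)
    clusters = []
    for _ in range(n_clusters):
        size = rng.randint(min_size, max_size)
        p_j = rng.betavariate(a, b)
        events = sum(rng.random() < p_j for _ in range(size))
        clusters.append((events, size))
    return clusters


rng = random.Random(2024)
control = simulate_arm(p=0.2, rho=0.05, n_clusters=5, min_size=10, max_size=60, rng=rng)
intervention = simulate_arm(p=0.3, rho=0.05, n_clusters=5, min_size=10, max_size=60, rng=rng)
print(control)
print(intervention)
```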

21 Power results
– In general, power is inflated, due to the incorrect size
– Naïve > Ind > Exch > B.C. = Score
– This result is expected and surprising at the same time
– Score and B.C. actually attain the nominal level (considered later)

22 Adjusted studies
– Very few have been done (2.5)
– The beta-binomial distribution is not amenable to including covariates
– Cluster-level covariate: the same argument applies for the fixed/variable cluster size issue
– Results are identical

23 Why is the adjusted score test powerful?
1. The score test is just better
2. Power is based on p-values, rather than on whether C.I.s contain 1; it is possible to have a significant p-value while the confidence interval contains 1
3. The score statistic is not obtained for all data sets, because of model-fitting failures

24 Fitting the models
– R: various libraries (gee, and geepack with its geese function); no score test; crashes
– Stata: xtgee; no score test
– SAS: PROC GENMOD; score test, but no score-test CI construction
– S-Plus: code from the authors (allegedly)

25 Convergence
Depends on the number of clusters:
– 15–20 clusters: 100% convergence
– 10 clusters: 99.7% convergence
– 4–6 clusters: 99% convergence
Score test: lose even more in SAS
– 15–20 clusters: lose another 0.5%
– 4–6 clusters: lose another 1%

26 Conclusions
– If you wish to use GEE, the adjusted score test is the (only?) appropriate approach for a small number of clusters
– This is perhaps questionable
– It is the most complicated model to fit in terms of code

27 What Should Simulation Do?
– Reflect what you'll see in practice:
 – variable cluster size
 – individual-level covariates (ideally imbalanced)
– Look not only at size but at power (and coverage)
– Measure MSE for the no-IV cases
– Assess sensitivity to departures from assumptions

28 Number of Studies that do this: 0
– Mine does, perhaps through ‘luck’ rather than judgement
– I designed it 2 years ago
– I decided 2 months ago that it was actually quite good

29 ‘Luck’
– 1 supervisor, 2 advisors
– One advisor suggested MSE
– The other was adamant that I did a sensitivity analysis
– Richard obviously made an outstanding contribution
– Something of a consortium approach

30 Data sharing
– Given this, it might be useful to have the data files available online
– These could be used for any further analysis methods that may become available
– Server space? Interactivity? Results?

31 Thank You

