Different Distributions David Purdie. Topics Application of GEE to: Binary outcomes: – logistic regression Events over time (rate): –Poisson regression.


1 Different Distributions David Purdie

2 Topics
Application of GEE to:
- Binary outcomes: logistic regression
- Events over time (rate): Poisson regression
- Survival data: Cox regression

3 General form for distributions from the exponential family
Outcome for subject i at time j: $Y_{ij}$, with $E(Y_{ij}) = \mu_{ij}$
Generalized linear model: $g(\mu_{ij}) = X_i \beta$, where $X_i = (x_{i1}, \ldots, x_{ij})$ is the matrix of covariates for subject i

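4 Binary outcomes: logistic regression
Outcome: $\Pr(Y_{ij} = 1) = \mu_{ij}$ (probability of an event), $\Pr(Y_{ij} = 0) = 1 - \mu_{ij}$
Logit link function: $g(\mu_{ij}) = \log\{\mu_{ij} / (1 - \mu_{ij})\}$
Logistic model: $\log\{\mu_{ij} / (1 - \mu_{ij})\} = X_i \beta$, where $\mu_{ij} = E(Y_{ij} \mid X_i)$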
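A minimal numeric sketch of the logit link may help here. It is written in Python rather than SAS so that it runs without a SAS installation, and the function names `logit` and `inv_logit` are illustrative, not taken from the presentation. It shows how a value of the linear predictor maps back to a probability.

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# A linear predictor of 0 corresponds to a probability of 0.5.
p_half = inv_logit(0.0)
# Round trip: the logit of a probability maps back to the same probability.
p_back = inv_logit(logit(0.2))
```

Because the model is linear on the logit scale, exponentiating a coefficient gives an odds ratio rather than a ratio of probabilities.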

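5 Events over time: Poisson regression
Outcome: $Y_i$ = number of events in time period $t_i$
$E(Y_i) = \lambda_i t_i$ and $\mathrm{Var}(Y_i) = \lambda_i t_i$ (where $\lambda_i$ is the event rate)
Log link function: $\log(\lambda_i)$
Poisson model: $\log(\lambda_i) = X_i \beta$, equivalently $\log E(Y_i) = \log(t_i) + X_i \beta$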

6 Survival data: Cox regression
Parameter: $t_{ij}$ (time to event $y_{ij}$)
Based on a hazard function: $h_t$
Outcome: $T_{ij}$ = time till event $y_{ij}$
Log link function: $\log(h_t)$
Cox model: $\log(h_t) = \log(\lambda_t) + X_i \beta$, where $\lambda_t$ is the baseline hazard rate.

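7 Alternating logistic regression
If the responses are binary, it may make more sense to use a matrix of odds ratios rather than correlations.
Replace $\mathrm{corr}(Y_{ij}, Y_{ik})$ with: $\mathrm{OR}(Y_{ij}, Y_{ik}) = \dfrac{\Pr(Y_{ij}=1, Y_{ik}=1)\,\Pr(Y_{ij}=0, Y_{ik}=0)}{\Pr(Y_{ij}=1, Y_{ik}=0)\,\Pr(Y_{ij}=0, Y_{ik}=1)}$
The ALR algorithm models $\gamma_{ijk} = \log\{\mathrm{OR}(Y_{ij}, Y_{ik})\}$ as $\gamma_{ijk} = z_{ijk}' \alpha$, where $\alpha$ are regression parameters and $z_{ijk}$ is fixed and needs to be specified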
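As a rough illustration of the quantity ALR works with, the pairwise log odds ratio can be computed from the four joint probabilities of a pair of binary responses. This is a sketch in Python (not SAS); the function name `pairwise_log_or` and the joint probabilities are hypothetical.

```python
import math

def pairwise_log_or(p11, p10, p01, p00):
    """Log odds ratio for a pair of binary responses from their joint probabilities."""
    return math.log((p11 * p00) / (p10 * p01))

# Independent responses with Pr(Y=1) of 0.3 and 0.6: each joint probability
# is the product of the margins, so the odds ratio is 1 and its log is 0.
independent = pairwise_log_or(0.3 * 0.6, 0.3 * 0.4, 0.7 * 0.6, 0.7 * 0.4)

# A positively associated pair (hypothetical joint probabilities summing to 1).
associated = pairwise_log_or(0.4, 0.1, 0.1, 0.4)
```

A log odds ratio of 0 corresponds to no within-subject association, which is the binary-data analogue of a zero working correlation.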

8 Mixed Models for Non-Normal Data
$E(y \mid u) = \mu$, $\mathrm{var}(y \mid u) = \phi V(\mu)$, $g(\mu) = X\beta + Zu$
Random coefficients u have distribution f(u)
y|u has the usual GLM distribution
Binary outcome: binomial for y|u and beta for u
Count outcome: Poisson for y|u and gamma for u
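To see why a gamma random effect is a natural partner for a Poisson outcome, a short worked derivation may help. It assumes a multiplicative parameterisation that is not stated on the slide: y given u is Poisson with mean μu, and u is gamma distributed with mean 1 and variance 1/k. Under those assumptions the marginal variance exceeds the mean, which is the overdispersion the random effect is meant to capture.

```latex
% Assumption (not from the slides): y | u ~ Poisson(\mu u), u ~ Gamma with E(u) = 1, Var(u) = 1/k
\begin{aligned}
E(y) &= E\{E(y \mid u)\} = \mu\,E(u) = \mu \\
\mathrm{Var}(y) &= E\{\mathrm{Var}(y \mid u)\} + \mathrm{Var}\{E(y \mid u)\}
                 = \mu + \mu^{2}/k
\end{aligned}
```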

9 Example - binary
Study of bladder cancer
- All patients had superficial bladder tumours on entry, which were removed
- Two randomly allocated treatments (group*): Placebo (n=47), Thiotepa (n=38)
- Many multiple recurrences of tumours
- month* is month since treatment (1 to 53)
- Baseline covariates: number of initial tumours (number*) and size of largest tumour (size*)
- Lots of missing data: 3585 out of 4505 potential observations (80%) are missing
- Model missing data (yes/no) using a binomial GEE with a logit link function to assess whether data are missing at random
* Name in data set
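Modelling missingness needs one row per subject and month, including months with no visit. Below is a minimal sketch of that expansion, in Python rather than SAS so that it runs standalone. The function name `expand_with_missing` and the toy visit data are hypothetical; only the figures in the last two lines come from the slide, and 85 subjects by 53 months is consistent with the 4505 potential observations quoted.

```python
def expand_with_missing(visits, n_months):
    """Build one row per subject-month with a 0/1 missing indicator.

    visits maps a subject id to the set of months at which that subject was observed.
    """
    rows = []
    for subject in sorted(visits):
        for month in range(1, n_months + 1):
            missing = 0 if month in visits[subject] else 1
            rows.append((subject, month, missing))
    return rows

# Hypothetical toy data: two subjects, five possible months.
toy = {1: {1, 3}, 2: {1, 2, 3, 4}}
rows = expand_with_missing(toy, 5)
prop_missing = sum(r[2] for r in rows) / len(rows)

# Figures quoted on the slide: (47 + 38) subjects x 53 months of follow-up.
slide_potential = (47 + 38) * 53
slide_prop = 3585 / slide_potential
```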

10

11 Visits per subject
            N   Mean  Min  Max
Placebo    47    8.7    1   19
Thiotepa   38   13.5    1   38
Total      85   10.8    1   38

12 Plot of missing proportion over time

13 Format for the data in SAS
Subject  group  number  count  month  missing  size
1        0      1       0      1      0        3
1        0      1       .      2      1        3
1        0      1       .      3      1        3
...
2        0      2       0      1      0        1
2        0      2       .      2      1        1
2        0      2       0      3      0        1
...
48       1      1       0      1      0        3
48       1      1       .      2      1        3
48       1      1       .      3      1        3
...
49       1      3       0      1      0        1
49       1      3       .      2      1        1
49       1      3       .      3      1        1

14 Logistic GEE in SAS
proc genmod data=tumour_miss descending;
  class group subject month;
  model missing=group month size number / dist=binomial type3;
  repeated subject=subject / type=ind corrw within=month;
  estimate 'effect of thiotepa' group -1 1 / exp;
run;
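The `exp` option on the `estimate` statement reports the group contrast on the odds ratio scale. The arithmetic can be sketched as follows, assuming Wald-type limits computed on the log scale and then exponentiated. The sketch is in Python rather than SAS, the function name `odds_ratio_ci` is illustrative, and the estimate and standard error are hypothetical values, not output from this analysis.

```python
import math

def odds_ratio_ci(log_or, se, z=1.96):
    """Exponentiate a log odds ratio and its Wald limits (z = 1.96 for 95%)."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Hypothetical estimate and standard error on the log scale, for illustration only.
or_, lower, upper = odds_ratio_ci(math.log(0.5), 0.25)
```

The interval is symmetric on the log scale, so on the odds ratio scale the point estimate is the geometric mean of the two limits rather than their midpoint.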

15 ORs for group (Thiotepa vs placebo)
Corr structure         OR     95% CI        P-value
ind                    0.53   0.33 - 0.84   0.007
exch (ρ=0.12)          0.41   0.23 - 0.72   0.002
AR(1)                  0.53   0.33 - 0.84   0.007
mdep(1)                0.53   0.33 - 0.84   0.007
mdep(3)                0.56   0.35 - 0.88   0.013
unstr                  -      -             -
Log OR structure
logor=exch (OR=1.05)   0.51   0.32 - 0.81   0.004
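The structures compared in this table differ in how the working correlation between visits j and k of the same subject is filled in. A rough sketch in plain Python (not SAS) follows; the function names are hypothetical and the correlation values are illustrative, apart from the 0.12 reported for the exchangeable fit.

```python
def exchangeable(n, rho):
    """Exchangeable structure: every pair of visits shares the same correlation."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def ar1(n, rho):
    """AR(1) structure: correlation decays as rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

def m_dependent(n, rhos):
    """m-dependent structure: rhos[d - 1] at lag d up to m = len(rhos), zero beyond."""
    m = len(rhos)
    return [[1.0 if j == k
             else (rhos[abs(j - k) - 1] if abs(j - k) <= m else 0.0)
             for k in range(n)] for j in range(n)]

exch = exchangeable(4, 0.12)   # 0.12 as reported for the exchangeable fit
auto = ar1(4, 0.5)             # illustrative value
mdep1 = m_dependent(4, [0.3])  # illustrative value
```

An independence structure is the special case with every off-diagonal entry equal to zero, and an unstructured matrix estimates each off-diagonal entry separately, which is why it needs far more response pairs than the others.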

16 Example - Poisson
- Response: number of new tumours (count*)
- month* is month since treatment (1 to 53)
- Baseline covariates: number of initial tumours (number*) and size of largest tumour (size*)
- timesince* is the number of months since the last visit
- Missing data are dependent upon treatment group and time
- Model new tumour counts using a Poisson GEE with a log link function to assess treatment effect
* Name in data set

17 Count of tumours by treatment group
            N    Mean   Std    Min  Max
Placebo    407   0.70   1.73     0    9
Thiotepa   513   0.23   0.99     0    9
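For Poisson counts the variance equals the mean, so the variance-to-mean ratio should be close to 1. The sketch below, in Python rather than SAS, computes that ratio from the means and standard deviations in this table; the function name `variance_to_mean` is illustrative. It ignores covariates and within-subject correlation, so it is only a crude indication of overdispersion.

```python
# Summary statistics as quoted on the slide: (mean, standard deviation) per group.
summary = {"Placebo": (0.70, 1.73), "Thiotepa": (0.23, 0.99)}

def variance_to_mean(mean, sd):
    """Ratio of the sample variance to the sample mean; about 1 for Poisson counts."""
    return sd ** 2 / mean

ratios = {group: variance_to_mean(m, s) for group, (m, s) in summary.items()}
```

Both ratios come out a little above 4, which suggests the counts are more variable than a plain Poisson model would imply.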

18 New tumour counts over time by treatment group

19

20 Plot of observed means over time

21 Poisson GEE in SAS
proc genmod data=tumour_count;
  class group subject month;
  model count=group size number timesince / dist=poisson scale=deviance;
  repeated subject=subject / type=exch withinsubject=month corrw;
  estimate 'effect of thiotepa' group -1 1 / exp;
run;

22 RRs for group (Thiotepa vs placebo)
Corr structure   RR     95% CI        P-value
ind              0.33   0.18 - 0.59   0.0002
exch (ρ=0.08)    0.38   0.21 - 0.68   0.0012
AR(1)            0.35   0.19 - 0.62   0.0004
mdep(1)          0.35   0.19 - 0.62   0.0004
mdep(5)          0.37   0.21 - 0.65   0.0006
unstr*           0.49   0.29 - 0.83   0.008
*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.

23 Using an offset
data tumour_count;
  set tumour_count;
  off=log(timesince+1);
run;
proc genmod data=tumour_count;
  class group subject month;
  model count=group size number / dist=poisson scale=deviance offset=off type3;
  repeated subject=subject / type=unstr withinsubject=month;
  estimate 'effect of thiotepa' group -1 1 / exp;
run;
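An offset enters the linear predictor with its coefficient fixed at 1, so the model describes a rate per unit of exposure and the exponentiated group coefficient is a rate ratio that does not depend on the length of the interval. A minimal sketch of that idea, in Python rather than SAS: the function name `expected_count` and the coefficients are hypothetical, and `exposure` stands in for whatever exposure measure is used (the slide uses `timesince+1`).

```python
import math

def expected_count(beta0, beta_group, group, exposure):
    """Expected count under a log-link Poisson model with a log-exposure offset."""
    offset = math.log(exposure)  # enters with its coefficient fixed at 1
    return math.exp(offset + beta0 + beta_group * group)

# Hypothetical coefficients, chosen only for illustration.
beta0, beta_group = -1.0, math.log(0.5)

# Ratio of treated to placebo expected counts at several exposure lengths.
rate_ratios = [
    expected_count(beta0, beta_group, 1, t) / expected_count(beta0, beta_group, 0, t)
    for t in (1.0, 3.0, 12.0)
]

# Tripling the exposure triples the expected count within a group.
scaling = expected_count(beta0, beta_group, 0, 3.0) / expected_count(beta0, beta_group, 0, 1.0)
```

Fitting `timesince` as an ordinary covariate instead estimates its coefficient freely, which is one reason the rate ratios with and without the offset can differ.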

24 RRs for group (Thiotepa vs placebo)
Corr structure   RR     95% CI        P-value
ind              0.46   0.25 - 0.86   0.014
exch (ρ=0.07)    0.48   0.26 - 0.91   0.023
AR(1)            0.46   0.25 - 0.84   0.012
mdep(1)          0.46   0.25 - 0.84   0.012
mdep(5)          0.46   0.25 - 0.84   0.011
unstr*           0.85   0.36 - 2.02   0.708
*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.

25 Interpretation and Presentation Descriptive: plots of means or tables of means (percentages, etc.) Tables of parameter estimates and confidence intervals (odds ratios or relative risks) P-values for effects or interactions (possibly just in the text) Emphasize results from descriptive analysis and effect estimates.

26 Statistical Methods What is the distribution of the outcome? How were the data summarized? Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups. What was the form of the correlation structure? What hypotheses were being tested? How were missing data handled? How were variances calculated? What statistical package was used?

27 Example: Statistical Methods Mean numbers of new tumours were used to summarise the data. Poisson regression was used to model tumour counts using the time between successive observations as an offset. Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups. The main hypothesis being tested was whether Thiotepa affected the numbers of new tumours. The correlation between successive observations was examined and an appropriate correlation structure was specified. Drop-outs and non-attendance were examined to assess for differences between the treatment groups. Robust variance estimation techniques were used to calculate standard errors and confidence intervals. All analyses were performed using SAS version 8.2.

