From t-test to multilevel analyses (Linear regression, GLM, …)

Stein Atle Lie, statistician, professor Uni Health, Uni Research

Outline
Paired t-test (mean and standard deviation)
Two-group t-test (means and standard deviations)
Linear regression
GLM (general linear models)
GLMM (general linear mixed model)
…
PASW (formerly SPSS), Stata, R, gllamm (Stata)

Multilevel models
“Same thing – many names”:
Random effects models
Mixed effects models
Variance component models
Frailty models (in survival analyses)
Latent variables

Objective
Take the general thinking from simple statistical methods into more sophisticated data structures and statistical analyses.
Focus on interpreting the results in relation to those found with basic statistical methods.

Multilevel data
Types of data:
Repeated measures for the same individual
The same measure is repeated several times on the same individual
Several observers have measured the same individual
Several different measures for the same individual
A categorical variable with "many" levels (multicenter data)

Null hypotheses
In ordinary statistics (using both paired and two-sample t-tests) we define a null hypothesis, $H_0: \mu_1 = \mu_2$.
We assume that the mean of group (or measure) 1 is equal to the mean of group (or measure) 2.
Alternatively, $H_0: \Delta = \mu_1 - \mu_2 = 0$.

p-value
Definition: “If our null hypothesis is true, what is the probability of observing the data* that we did?”
* And hence the mean, t-statistic, etc.

p-value
We assume that our null hypothesis is true ($\mu_0 = 0$ or $\mu_1 - \mu_2 = 0$).
We observe our data (mean value etc.), under the assumption of normally distributed data.
The p-value is the probability of observing our data (or something more extreme) under the given assumptions.
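As a rough illustration of "our data or something more extreme", the sketch below computes a two-sided p-value for an estimate and its standard error. It uses a standard normal reference distribution, which is an assumption made here for simplicity; the t-tests in these slides use the t distribution, which the normal only approximates for larger samples. The function name and the input values are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, std_error, null_value=0.0):
    """Two-sided p-value under a standard normal reference distribution.

    Assumption: the test statistic is approximately N(0, 1) under H0.
    """
    z = (estimate - null_value) / std_error
    # Probability of a statistic at least as far from 0 as |z|, in either tail.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: an observed mean difference and its standard error.
p = two_sided_p_value(estimate=1.96, std_error=1.0)
print(round(p, 3))
```

An estimate exactly equal to the null value gives a p-value of 1, since every possible outcome is then at least as extreme as the one observed.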

Paired t-test
The straightforward way to analyze two repeated measures is a paired t-test.
The measure at time 1 or location 1 (e.g. Data1) is compared directly with the measure at time 2 or location 2 (e.g. Data2).
Is the difference between Data1 and Data2 (Diff = Data1 - Data2) different from 0?
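The calculation behind the paired t-test can be sketched in a few lines: form the differences, then test whether their mean is 0. The data values below are made up for illustration and are not the data from the slides; only the variable names Data1 and Data2 follow the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(data1, data2):
    """Return (mean difference, standard error, t statistic, degrees of freedom)."""
    diffs = [a - b for a, b in zip(data1, data2)]  # Diff = Data1 - Data2
    n = len(diffs)
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / sqrt(n)  # SD of the differences, not of the raw data
    return mean_diff, std_error, mean_diff / std_error, n - 1


# Hypothetical example data (5 complete pairs).
data1 = [5.1, 6.0, 5.5, 7.2, 6.3]
data2 = [4.8, 5.4, 5.6, 6.5, 5.9]

mean_diff, std_error, t_stat, df = paired_t(data1, data2)
print(mean_diff, std_error, t_stat, df)
```

Because the calculation works on pair-wise differences, an individual with only one of the two measures contributes nothing to it.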

Paired t-test (n=10)
PASW:
T-TEST PAIRS=Data1 WITH Data2 (PAIRED).

Paired t-test
The paired t-test is only performed on complete (balanced) data.
What happens if we delete two observations from Data2? (Only 8 complete pairs remain.)

Paired t-test (n=8)
PASW:
T-TEST PAIRS=Data1 WITH Data2 (PAIRED).

Two group t-test
If we now consider the data from time 1 and time 2 (or location 1 and location 2) to be independent (even if they are not), we can use a two group t-test on the full dataset of 2*10 observations.

Two group t-test (n=20 [10+10])
PASW: T-TEST GROUPS=Grp(1 2) /VARIABLES=Data.

Two group t-test
Observe that the means for Grp1 and Grp2 are equal to the means for Data1 and Data2, and that the mean difference is also equal.
The difference between the paired t-test and the two group t-test lies in the variance and the number of observations, and therefore in the standard deviation and standard error, and hence in the p-value and confidence intervals.
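A minimal sketch of this comparison, using made-up data rather than the data from the slides: both approaches give the same mean difference, but the standard error comes from the variance of the differences in one case and from the pooled group variance in the other. The two-group version below assumes equal variances in the two groups.

```python
from math import sqrt
from statistics import mean, variance


def paired_se(g1, g2):
    """Standard error of the mean difference, based on pair-wise differences."""
    diffs = [a - b for a, b in zip(g1, g2)]
    return sqrt(variance(diffs) / len(diffs))


def two_group_se(g1, g2):
    """Standard error of the mean difference, pooled variance (equal variances assumed)."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical data: the same 5 individuals measured twice.
data1 = [5.1, 6.0, 5.5, 7.2, 6.3]
data2 = [4.8, 5.4, 5.6, 6.5, 5.9]

mean_diff = mean(data1) - mean(data2)  # identical for both tests
se_paired = paired_se(data1, data2)
se_two_group = two_group_se(data1, data2)
print(mean_diff, se_paired, se_two_group)
```

In this example the two measures are positively correlated within individuals, so the paired standard error is the smaller of the two. That ordering depends on the correlation and is not guaranteed for every dataset.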

Two group t-test
The two group t-test is performed on all available data.
What happens if we delete two observations from Grp2? (Only 8 complete pairs remain, but 18 observations remain.)

Two group t-test (n=18 [10+8])
PASW: T-TEST GROUPS=Grp(1 2) /VARIABLES=Data.

Two group t-test ($\sigma_1 = \sigma_2$)
[Figure: labels $\sigma_1$, $\sigma_2$, $\mu_1$, $\mu_2$, $\Delta$]

ANOVA (analysis of variance) ($\sigma_1 = \sigma_2 = \sigma_3$)
[Figure: labels $\mu_1$, $\mu_2$, $\mu_3$]

Linear regression
If we now perform an ordinary linear regression with the data as outcome (dependent variable) and the group variable (Grp=1 and 2) as independent variable, the coefficient for group is identical to the mean difference, and the standard error, t-statistic and p-value are identical to those found in a two-group t-test (with equal variances assumed).

Linear regression
Now exchange the independent variable for group (Grp=1 and 2) with a dummy variable (dummy=0 for grp=1 and dummy=1 for grp=2).
The coefficient for the dummy is equal to the coefficient for grp (the mean difference), and the coefficient for the constant term is now equal to the mean for grp1 (the standard error is not!).
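The equivalence between the regression coefficient and the mean difference can be checked with the closed-form least squares formulas. The sketch below uses made-up data and a hypothetical helper `simple_ols`; it is not output from PASW, Stata or R. Note that with this coding the slope is the mean of grp2 minus the mean of grp1.

```python
from statistics import mean


def simple_ols(x, y):
    """Least squares intercept and slope for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data for the two groups (unequal sizes are fine).
grp1 = [5.1, 6.0, 5.5, 7.2, 6.3]
grp2 = [4.8, 5.4, 5.6, 6.5]
y = grp1 + grp2

grp = [1] * len(grp1) + [2] * len(grp2)    # coding 1 / 2
dummy = [0] * len(grp1) + [1] * len(grp2)  # coding 0 / 1

b0_grp, b1_grp = simple_ols(grp, y)
b0_dummy, b1_dummy = simple_ols(dummy, y)

print(b1_grp, b1_dummy, mean(grp2) - mean(grp1))  # all three agree
print(b0_dummy, mean(grp1))                       # constant = mean of grp1
print(b0_grp)                                     # extrapolation to Grp = 0
```

With the 1/2 coding the constant term is the fitted value at Grp=0, which corresponds to no actual group. The 0/1 dummy coding is what makes the constant directly interpretable.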

Linear models in Stata In ordinary linear models (regress and glm) in Stata one may add an option for clustered data – to obtain standard errors adjusted for intragroup correlation. (This is ideal when you want to adjust for clustered data, but are not interested in the correlation within or between groups)

Linear models in Stata
Thus, we now have an alternative to the paired t-test. The mean difference is identical to that obtained from the paired t-test, and the standard errors (and p-values) are adjusted for intragroup correlation.
As an alternative we may use the program gllamm (Generalized Linear Latent And Mixed Models) in Stata.

gllamm (n=20)
gllamm (Stata):
. gllamm data dummy, i(id)
number of level 1 units = 20
number of level 2 units = 10
        data |  Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
       dummy |
       _cons |
Variance at level ( )
Variances and covariances of random effects
level 2 (id)
var(1): ( )

Linear models in Stata
If we now delete two of the observations in Grp2, we have coefficients (“mean differences”) calculated from all the data (n=18) and standard errors corrected for intragroup correlation, using the commands <regress>, <glm> or <gllamm>.

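Intra class correlation (ICC)
gllamm reports the variance at level 1 and the level 2 (id) variance, var(1).
The total variance is hence the sum of these two variances (and the total standard deviation is its square root).
The proportion of variance attributed to level 2 is therefore ICC = (level 2 variance) / (total variance) = 0.578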
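Written as a formula, and borrowing the variance component notation used for the random intercept model ($\sigma_u^2$ for the level 2 variance between individuals, $\sigma_e^2$ for the level 1 residual variance), the calculation is:

```latex
\mathrm{ICC}
  = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
  = \frac{\text{level 2 variance}}{\text{total variance}}
```

An ICC close to 1 means that most of the variation lies between individuals, so repeated measures on the same individual are strongly correlated. An ICC close to 0 means the repeated measures behave almost like independent observations.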

Linear regression
Ordinary linear regression assumes the data are normal and i.i.d. (independent and identically distributed)

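Linear regression
Regression line: $y = b_0 + b_1 x$
[Figure: Y plotted against X with the regression line (intercept $b_0$, slope $b_1$), a residual, and data points $(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_n, y_n)$. Example variable pairs: Cortisol * Months, Height * Weight, Cortisol * Time]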

Linear regression
Assumptions:
1) $y_1, y_2, \ldots, y_n$ are independent and normally distributed
2) The expectation of $Y_i$ is $E(Y_i) = b_0 + b_1 x_i$ (linear relation between X and Y)
3) The variance of $Y_i$ is $\mathrm{var}(Y_i) = \sigma^2$ (equal variance for ALL values of X)

Linear regression
Assumptions - residuals ($e_i$): $y_i = b_0 + b_1 x_i + e_i$
1) $e_1, e_2, \ldots, e_n$ are independent and normally distributed
2) The expectation of $e_i$ is $E(e_i) = 0$
3) The variance of $e_i$ is $\mathrm{var}(e_i) = \sigma^2$

Ordinary linear regression
The formula for an ordinary regression can be expressed as:
$y_i = b_0 + b_1 x_i + e_i$
$e_i \sim N(0, \sigma_e^2)$

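Random intercept model
Regression lines: $y_{ij} = b_0 + b_1 x_{ij} + v_{ij}$
[Figure: Y plotted against X with group-specific regression lines (intercepts $b_0 + u_j$, common slope $b_1$), data points $(x_{11}, y_{11}), \ldots, (x_{ij}, y_{ij}), \ldots, (x_{np}, y_{np})$, and the standard deviations $\sigma_u$ and $\sigma_e$]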

For a random intercept model, we can express the regression line(s) and the variance components as
$y_{ij} = b_0 + b_1 x_{ij} + v_{ij}$
$v_{ij} = u_j + e_{ij}$
$e_{ij} \sim N(0, \sigma_e^2)$ (individual)
$u_j \sim N(0, \sigma_u^2)$ (group)

Alternatively we may express the formulas, for the simple variance component model, in terms of random intercepts:
$y_{ij} = b_{0j} + b_1 x_{ij} + e_{ij}$
$b_{0j} = b_0 + u_j$
$e_{ij} \sim N(0, \sigma_e^2)$ (individual)
$u_j \sim N(0, \sigma_u^2)$ (group)
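To make the two variance components concrete, the sketch below simulates data from a random intercept model and then recovers the components with a simple method-of-moments (one-way ANOVA) estimator. This is a deliberate simplification: the fixed part (b0 and b1) is treated as known and the design is balanced, whereas lmer, MIXED and gllamm estimate everything jointly by (restricted) maximum likelihood. All parameter values and function names are assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_random_intercept(n_groups, n_per_group, b0, b1, sd_u, sd_e, seed=1):
    """Simulate y_ij = b0 + b1*x_ij + u_j + e_ij as (group, x, y) rows."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        u_j = rng.gauss(0.0, sd_u)          # one random intercept per group
        for _ in range(n_per_group):
            x_ij = rng.uniform(0.0, 10.0)
            e_ij = rng.gauss(0.0, sd_e)     # one residual per observation
            rows.append((j, x_ij, b0 + b1 * x_ij + u_j + e_ij))
    return rows


def anova_variance_components(rows, b0, b1, n_per_group):
    """Method-of-moments estimates of (var_u, var_e); balanced data, known b0 and b1."""
    groups = {}
    for j, x, y in rows:
        groups.setdefault(j, []).append(y - b0 - b1 * x)  # v_ij = u_j + e_ij
    group_means = [mean(values) for values in groups.values()]
    grand_mean = mean(group_means)
    k = len(groups)
    ms_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ss_within = 0.0
    for values, m in zip(groups.values(), group_means):
        ss_within += sum((v - m) ** 2 for v in values)
    ms_within = ss_within / (k * (n_per_group - 1))
    var_u = max((ms_between - ms_within) / n_per_group, 0.0)
    return var_u, ms_within


rows = simulate_random_intercept(300, 5, b0=2.0, b1=0.5, sd_u=1.0, sd_e=1.0)
var_u, var_e = anova_variance_components(rows, b0=2.0, b1=0.5, n_per_group=5)
icc = var_u / (var_u + var_e)
print(var_u, var_e, icc)  # the true values in this simulation are 1, 1 and 0.5
```

The estimates will not match the true values exactly, and they change with the seed and the number of groups.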

Random slope model
For a random slope model (the intercepts are equal), we can express the regression line(s) and the variance components as
$y_{ij} = b_0 + b_{1j} x_{ij} + e_{ij}$
$b_{1j} = b_1 + w_j$
$e_{ij} \sim N(0, \sigma_e^2)$ (individual)
$w_j \sim N(0, \sigma_w^2)$ (group)

Random slope and intercept model
For a random slope and random intercept model, we can express the regression line(s) and the variance components as
$y_{ij} = b_{0j} + b_{1j} x_{ij} + e_{ij}$
$b_{1j} = b_1 + w_j$
$b_{0j} = b_0 + u_j$
$e_{ij} \sim N(0, \sigma_e^2)$ (individual)
$u_j \sim N(0, \sigma_u^2)$ (group)
$w_j \sim N(0, \sigma_w^2)$ (group)

Cortisol data
Cortisol level in saliva, measured each morning on 3 days in two periods*
55 individuals
278 observations (52 missing)
* The real data were measured 5 times per day, on 3 days and in 3 periods. From the article: Harris A, Marquis P, Eriksen HR, Grant I, Corbett R, Lie SA, Ursin H. Diurnal rhythm in British Antarctic personnel. Rural Remote Health Apr-Jun;10(2):1351.

Cortisol data – missing data

Cortisol data – long data format
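Mixed-model routines such as lmer and MIXED expect one row per observation (long format) rather than one row per individual (wide format). The sketch below shows one way to reshape wide rows into long rows while skipping missing values. The actual layout of the cortisol file is not shown in the slides, so the column names (`Day1`, `Day2`, `Day3`) and the values are hypothetical; only `ID` and `Kortisol` follow the variable names used in the model syntax.

```python
def wide_to_long(wide_rows, id_key, value_keys):
    """Turn one row per individual into one row per non-missing observation."""
    long_rows = []
    for row in wide_rows:
        for key in value_keys:
            value = row.get(key)
            if value is None:  # missing measurement: no row is created
                continue
            long_rows.append({"ID": row[id_key], "Day": key, "Kortisol": value})
    return long_rows


# Hypothetical wide data: two individuals, three mornings, two values missing.
wide = [
    {"ID": 1, "Day1": 10.2, "Day2": None, "Day3": 9.1},
    {"ID": 2, "Day1": 8.4, "Day2": 7.9, "Day3": None},
]

long_rows = wide_to_long(wide, id_key="ID", value_keys=["Day1", "Day2", "Day3"])
for r in long_rows:
    print(r)
```

In long format an individual with missing measurements still contributes the observations that were recorded, which is why the mixed models can use all 278 observations.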

Cortisol data
[Figure: panels Period1 and Period2]

library(lme4)
lmer(Kortisol ~ 1 + Day2 + Day3 + Period2 + (1 | ID), data = kortisol)
Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept)
 Residual
Number of obs: 278, groups: ID, 55
Fixed effects:
            Estimate Std. Error t value
(Intercept)
Day2
Day3
Period2
ICC=0.219

Cortisol data
[Figure: panels Period1 and Period2]

PASW:
MIXED Kortisol BY ID WITH Period2 Day2 Day3
  /FIXED=Period2 Day2 Day3 | SSTYPE(3)
  /METHOD=REML
  /PRINT=SOLUTION
  /RANDOM=ID | COVTYPE(VC).
ICC=0.219

Linear mixed model (random intercept model)
lmer(Kortisol ~ 1 + Day + Period2 + (1 | ID), data = kortisol)
Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept)
 Residual
Number of obs: 278, groups: ID, 55
Fixed effects:
            Estimate Std. Error t value
(Intercept)
Day
Period2
ICC=0.220

Linear mixed model (random intercept model)
[Figure: panels Period1 and Period2]

Linear mixed model (random slope model)
lmer(Kortisol ~ 1 + Day + Period2 + (Day - 1 | ID), data = kortisol)
Random effects:
 Groups   Name Variance Std.Dev.
 ID       Day
 Residual
Number of obs: 278, groups: ID, 55
Fixed effects:
            Estimate Std. Error t value
(Intercept)
Day
Period2

[Figure: panels Period1 and Period2]

Linear mixed model (random slope & intercept)
lmer(Kortisol ~ 1 + Day + Period2 + (1 + Day | ID), data = kortisol)
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept)
          Day
 Residual
Number of obs: 278, groups: ID, 55
Fixed effects:
            Estimate Std. Error t value
(Intercept)
Day
Period2
ICC=0.257

[Figure: panels Period1 and Period2]

Summary
Parameter estimates for categorical variables (preferably dummy variables) from linear models can be interpreted as mean differences, as in an ordinary t-test.
The same holds in models for repeated or clustered observations!

Software
Personal opinion
PASW/SPSS
  Very easy to do simple models (menu/syntax)
  Arrange data
Stata
  Steeper learning curve to start
  Easy to extend the simpler models to more sophisticated models
  gllamm
R
  Steep learning curve
  Nice graphics
