# Statistical Analysis Overview I Session 2 Peg Burchinal Frank Porter Graham Child Development Institute, University of North Carolina-Chapel Hill.



Overview: Statistical analysis overview I-b
- Nesting and intraclass correlation
- Hierarchical linear models
  - 2-level models
  - 3-level models

Nesting
- Nesting violates the linear-model assumption that observations are independent.
- When observations within a cluster are positively correlated, ignoring this dependency underestimates standard errors and so inflates test statistics.
  - You CAN DRAW INCORRECT CONCLUSIONS (e.g., declaring an effect significant when it is not).
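A small illustration of the inflation described above, as a sketch in plain Python (standard library only). It compares the naive standard error of a mean, which assumes every child is independent, with a cluster-robust (CR0 sandwich) standard error that sums residuals within each classroom before squaring. The scores and classroom labels are made up for illustration, and with only three clusters the cluster-robust estimate is itself unstable; it is shown only to make the direction of the bias visible.

```python
import math
from collections import defaultdict


def naive_se(y):
    """SE of the mean assuming all observations are independent."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)
    return math.sqrt(s2 / n)


def cluster_robust_se(y, clusters):
    """CR0 cluster-robust SE of the mean: residuals are summed within each
    cluster before squaring, so within-cluster correlation is retained."""
    n = len(y)
    mean = sum(y) / n
    cluster_sums = defaultdict(float)
    for v, c in zip(y, clusters):
        cluster_sums[c] += v - mean
    return math.sqrt(sum(s ** 2 for s in cluster_sums.values())) / n


# Hypothetical scores: children in the same classroom score alike.
scores = [5.0, 5.2, 4.9, 5.1, 7.0, 7.1, 6.9, 7.2, 6.0, 6.1, 5.9, 6.2]
classes = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

print(naive_se(scores))
print(cluster_robust_se(scores, classes))
```

With these toy data the cluster-robust SE is close to twice the naive SE, so a test statistic built on the naive SE would be roughly twice as large as it should be.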

Nesting and Design
- Educational data are often collected in schools, classrooms, or special treatment groups.
  - Lack of independence among individuals reduces variability, because of:
    - Pre-existing similarities (students within a cluster are more similar than students selected at random)
    - A shared instructional environment (variability in instruction is greater across classrooms than within a classroom)
- Educational treatments are often assigned to schools or classrooms.
  - Advantages: avoids contamination and makes the study more acceptable (simple random assignment of individuals is often not possible).
  - Disadvantage: the analysis must take the dependency, or relatedness, of responses within clusters into account.

Intraclass Correlation (ICC)
- For models with clustering of individuals, the ICC is the "cluster effect": the proportion of variance in the outcome that lies between clusters (it compares between-cluster variance with within-cluster variance).
- Example: for children clustered in classrooms, the ICC is the proportion of variance associated with differences between classrooms.

Intraclass Correlation
- The intraclass correlation (ICC) measures the relatedness, or dependence, of clustered data: the proportion of variance that lies between clusters.
- $\text{ICC} = \rho = \sigma_b^2 / (\sigma_b^2 + \sigma_w^2)$, where $\sigma_b^2$ is the between-cluster variance and $\sigma_w^2$ the within-cluster variance.
  - ICC = 0: no correlation among individuals within a cluster
  - ICC = 1: all responses within a cluster are identical
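The ICC formula translates directly into code. A minimal sketch, where `var_between` and `var_within` are hypothetical names for the between-cluster and within-cluster variances:

```python
def icc(var_between: float, var_within: float) -> float:
    """Proportion of total variance that lies between clusters."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variances must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_between / total


print(icc(0.0, 2.0))  # 0.0: no clustering
print(icc(1.0, 3.0))  # 0.25
print(icc(2.0, 0.0))  # 1.0: responses identical within clusters
```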

Nesting, Design, and ICC
- Taking the ICC into account results in less power for a given sample size, because clustered observations carry less independent information.
- Effective sample size $= mk / (1 + \rho(m-1))$, where $1 + \rho(m-1)$ is the design effect
  - $m$ = number of individuals per cluster
  - $k$ = number of clusters
  - $\rho$ = ICC
- The effective sample size equals the number of clusters ($k$) when ICC = 1 and the number of individuals ($mk$) when ICC = 0.
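A sketch of the effective-sample-size arithmetic, assuming equal cluster sizes. The design effect is taken to be $1 + \rho(m-1)$ and the effective sample size is $mk$ divided by it, which reproduces the two limiting cases on the slide ($k$ when ICC = 1, $mk$ when ICC = 0). The design of 50 classrooms with 20 children each is hypothetical.

```python
def design_effect(m: int, icc: float) -> float:
    """Variance inflation from cluster sampling with m individuals per cluster."""
    return 1 + icc * (m - 1)


def effective_sample_size(m: int, k: int, icc: float) -> float:
    """Number of independent observations that m * k clustered ones are worth."""
    return m * k / design_effect(m, icc)


# Hypothetical design: k = 50 classrooms with m = 20 children each.
for rho in (0.0, 0.05, 0.20, 1.0):
    print(rho, round(effective_sample_size(20, 50, rho), 1))
```

Even a modest ICC of .05 cuts the effective sample size roughly in half in this example (about 513 of 1,000 children).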

ICC and Hierarchical Linear Models
- Hierarchical linear models (HLMs) take nesting into account: the clustering of the data is specified explicitly in the model.
- The ICC is therefore reflected in the estimated standard errors, test statistics, and p-values.

2-Level HLM
- One level of nesting
  - Longitudinal: repeated measures of individuals over time. Typically, random intercepts and slopes describe individual patterns of change over time.
  - Clusters: individuals nested within classes, families, therapy groups, etc. Typically, a random intercept describes the cluster effect.

2-Level HLM: Random-Intercepts Models
- Corresponds to a one-way ANOVA with random effects (mixed-model ANOVA)
- Example: classrooms randomly assigned to treatment or control conditions
  - All study children within a classroom are in the same condition
  - One post-treatment outcome per child (the pre-treatment score can be used as a covariate to increase power)
  - Level 1 = children in classroom; Level 2 = classroom
- The ICC reflects the degree of similarity among students within a classroom.
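Because the random-intercepts model corresponds to a one-way random-effects ANOVA, the ICC can be estimated from ANOVA mean squares when clusters are balanced: $\hat\tau_{00} = (MS_B - MS_W)/m$ and $\hat\sigma^2 = MS_W$, so $\widehat{\text{ICC}} = (MS_B - MS_W)/(MS_B + (m-1)MS_W)$. A stdlib-only sketch; the equal-cluster-size restriction and the toy classroom scores are assumptions of this example, and HLM software typically uses (restricted) maximum likelihood rather than this moment estimator.

```python
def anova_icc(groups: list[list[float]]) -> float:
    """Balanced one-way random-effects ANOVA estimate of the ICC.
    groups: one list of scores per cluster, all of the same length m."""
    k = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = sum(sum(g) for g in groups) / (k * m)
    means = [sum(g) / m for g in groups]
    ssb = m * sum((mu - grand) ** 2 for mu in means)
    ssw = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    msb = ssb / (k - 1)
    msw = ssw / (k * (m - 1))
    tau00 = max((msb - msw) / m, 0.0)  # between-classroom variance, truncated at 0
    sigma2 = msw                       # within-classroom variance
    return tau00 / (tau00 + sigma2)


# Hypothetical scores for three classrooms of four children each.
classrooms = [[5.0, 5.2, 4.9, 5.1], [7.0, 7.1, 6.9, 7.2], [6.0, 6.1, 5.9, 6.2]]
print(round(anova_icc(classrooms), 2))  # about 0.98: classmates are very alike
```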

2-Level HLM Random-Intercept Model: Level 1 (individual students within the classroom)
- Unconditional model: $Y_{ij} = B_{0j} + r_{ij}$
- Conditional model: $Y_{ij} = B_{0j} + B_1 X_{ij} + r_{ij}$
  - $Y_{ij}$ = outcome for the $i$th student in the $j$th class
  - $B_{0j}$ = intercept (e.g., mean) for the $j$th class
  - $B_1$ = coefficient for the individual-level covariate $X_{ij}$
  - $r_{ij}$ = random error term for the $i$th student in the $j$th class, $E(r_{ij}) = 0$, $\operatorname{var}(r_{ij}) = \sigma^2$

2-Level HLM Random-Intercept Model: Level 2 (classrooms)
- Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$
- Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$
  - $B_{0j}$ = intercept (e.g., mean) for the $j$th class
  - $\gamma_{00}$ = grand mean in the population
  - $\gamma_{01}$ = treatment effect for $W_{j1}$, a contrast-coded variable indicating treatment status (-.5 if control, .5 if treatment)
  - $\gamma_{02}$ = coefficient for $W_{j2}$, a class-level covariate
  - $u_{0j}$ = random effect associated with the $j$th classroom, $E(u_{0j}) = 0$, $\operatorname{var}(u_{0j}) = \tau_{00}$

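2-Level HLM Random-Intercept Model: Combined
- Combined (unconditional): $Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$, from $Y_{ij} = B_{0j} + r_{ij}$ and $B_{0j} = \gamma_{00} + u_{0j}$
- Combined (conditional): $Y_{ij} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + B_1 X_{ij} + u_{0j} + r_{ij}$, from $Y_{ij} = B_{0j} + B_1 X_{ij} + r_{ij}$ and $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$
- $\operatorname{Var}(Y_{ij}) = \operatorname{Var}(u_{0j} + r_{ij}) = \tau_{00} + \sigma^2$
- $\text{ICC} = \rho = \tau_{00} / (\tau_{00} + \sigma^2)$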

Example: 2-Level HLM, Random Intercepts
- Purdue Curriculum Study (Powell & Diamond)
  - Onsite or remote coaching
  - 27 Head Start classes randomly assigned to onsite coaching and 25 to remote coaching
  - Post-test scores on writing
    - Onsite: n = 196, M = 6.70, SD = 1.54
    - Remote: n = 171, M = 7.05, SD = 1.64

Example: 2-Level HLM, Random Intercepts (results)
- Level 1: $\text{Writing}_{ij} = B_{0j} + B_1 \text{Writing-pre}_{ij} + r_{ij}$
  - $B_1 = .56$, se = .05, p < .001
  - $E(r_{ij}) = 0$, $\operatorname{var}(r_{ij}) = \sigma^2 = 1.67$
- Level 2: $B_{0j} = \gamma_{00} + \gamma_{01} \text{Onsite}_j + u_{0j}$
  - $\gamma_{00}$ (intercept: remote-group adjusted mean) = 3.74, se = .31
  - $\gamma_{01}$ (onsite-remote difference) = -.37, se = .17, p = .03
  - $E(u_{0j}) = 0$, $\operatorname{var}(u_{0j}) = \tau_{00}$
- $\text{ICC} = \tau_{00} / (\tau_{00} + \sigma^2)$
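The p-value for the onsite-remote difference can be reproduced approximately from the estimate and its standard error with a Wald z test. This sketch uses a standard normal reference distribution, which is an assumption; HLM software often uses a t distribution with degrees of freedom based on the number of classrooms, which gives a slightly larger p-value.

```python
from statistics import NormalDist


def wald_test(estimate: float, se: float) -> tuple[float, float]:
    """Two-sided Wald z test of H0: coefficient = 0 (normal approximation)."""
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Onsite-remote difference and its standard error, as reported on the slide.
z, p = wald_test(-0.37, 0.17)
print(round(z, 2), p)
```

With the reported estimate of -.37 and se of .17, z is about -2.18 and p is about .03, consistent with the slide.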

2-Level HLM: Longitudinal (Random-Slopes and Random-Intercepts Models)
- Does not correspond to a one-way ANOVA with random effects
- Example: longitudinal assessment of children's literacy skills during the pre-K years
  - Level 1 = individual growth curve; Level 2 = group growth curve

Level 1: Longitudinal HLM
- Level 1: individual growth curve
  - Unconditional model: $Y_{ij} = B_{0j} + B_{1j} \text{Age}_{ij} + r_{ij}$
  - Conditional model: $Y_{ij} = B_{0j} + B_{1j} \text{Age}_{ij} + B_2 X_{ij} + r_{ij}$
    - $Y_{ij}$ = outcome on the $i$th occasion for the $j$th student
    - $\text{Age}_{ij}$ = age at the $i$th assessment of the $j$th student
    - $B_{0j}$ = intercept for the $j$th student
    - $B_{1j}$ = slope for age for the $j$th student
    - $B_2$ = coefficient for the time-varying covariate $X_{ij}$
    - $r_{ij}$ = random error term on the $i$th occasion for the $j$th student, $E(r_{ij}) = 0$, $\operatorname{var}(r_{ij}) = \sigma^2$
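One descriptive way to see the level-1 model is to fit an ordinary least squares line to each child's repeated scores, giving one intercept and one slope per child. This is a sketch of that first step only (sometimes called a two-stage or "slopes-as-outcomes" approach), not the HLM estimator, which weights children by the precision of their data and shrinks individual estimates toward the population curve. The three children, their ages (centered at an assumed starting age), and their scores are hypothetical.

```python
def ols_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical scores at three centered ages (0 = start of pre-K) per child.
children = {
    "child1": ([0.0, 0.5, 1.0], [8.0, 10.0, 12.0]),
    "child2": ([0.0, 0.5, 1.0], [10.0, 11.0, 12.0]),
    "child3": ([0.0, 0.5, 1.0], [6.0, 9.0, 12.0]),
}
curves = {name: ols_line(ages, ys) for name, (ages, ys) in children.items()}
mean_intercept = sum(b0 for b0, _ in curves.values()) / len(curves)
mean_slope = sum(b1 for _, b1 in curves.values()) / len(curves)
print(curves)
print(mean_intercept, mean_slope)  # rough stand-ins for gamma00 and gamma10
```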

Level 2: Longitudinal HLM
- Level 2: predicting individual trajectories
  - Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$
  - Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$; $B_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + \gamma_{12} W_{j2} + u_{1j}$
    - $B_{0j}$ = intercept for the $j$th student
    - $B_{1j}$ = slope for age for the $j$th student
    - $\gamma_{00}$ = intercept in the population
    - $\gamma_{10}$ = slope in the population
    - $\gamma_{01}$ = treatment effect on the intercept for $W_{j1}$, a student-level covariate
    - $\gamma_{11}$ = treatment effect on the slope for $W_{j1}$, a student-level covariate

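Level 2: Longitudinal HLM
- Level 2: predicting individual trajectories
  - Unconditional model: $B_{0j} = \gamma_{00} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$
  - Conditional model: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $B_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + u_{1j}$
    - $u_{0j}$ = random effect for the individual intercept
    - $u_{1j}$ = random effect for the individual slope
    - $E(u_{0j}) = 0$, $\operatorname{var}(u_{0j}) = \tau_{00}$; $E(u_{1j}) = 0$, $\operatorname{var}(u_{1j}) = \tau_{11}$; $\operatorname{cov}(u_{0j}, u_{1j}) = \tau_{01}$
    - $\operatorname{var}\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} = \mathbf{T} = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$
  - Level-1 and level-2 error terms are independent: $\operatorname{cov}(r_{ij}, u_{0j}) = \operatorname{cov}(r_{ij}, u_{1j}) = 0$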
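With random intercepts and slopes, the outcome variance is no longer constant over time. For the unconditional model, with the level-1 error independent of the random effects, $\operatorname{Var}(Y_{ij}) = \tau_{00} + 2\,\text{Age}_{ij}\,\tau_{01} + \text{Age}_{ij}^2\,\tau_{11} + \sigma^2$. A sketch; the variance-component values in the example are hypothetical, and Age is assumed to be centered so that Age = 0 is where the intercept is defined.

```python
import math


def outcome_variance_at_age(age: float, tau00: float, tau11: float,
                            tau01: float, sigma2: float) -> float:
    """Var(u0 + u1*age + r) = tau00 + 2*age*tau01 + age**2*tau11 + sigma2."""
    return tau00 + 2 * age * tau01 + age ** 2 * tau11 + sigma2


def intercept_slope_correlation(tau00: float, tau11: float, tau01: float) -> float:
    """Correlation between random intercepts and random slopes."""
    return tau01 / math.sqrt(tau00 * tau11)


# Hypothetical components: a negative covariance makes variance dip, then grow.
for age in (0.0, 1.0, 2.0):
    print(age, outcome_variance_at_age(age, tau00=4.0, tau11=1.0, tau01=-1.0, sigma2=2.0))
print(intercept_slope_correlation(4.0, 1.0, -1.0))  # -0.5
```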

Example: Longitudinal HLM
- Purdue Curriculum Study (Powell & Diamond)
  - Level 1: estimating individual growth curves for children in one treatment condition (Remote)
  - Level 2: estimating the population growth curve for the Remote condition

| Blending | Pre | Post | Follow-up |
|---|---|---|---|
| N | 187 | 171 | 63 |
| M (SD) | 9.48 (5.34) | 13.75 (4.57) | 15.14 (4.60) |

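Example (results)
- Level 1: $\text{blending}_{ij} = B_{0j} + B_{1j} \text{Age}_{ij} + r_{ij}$ ($\sigma^2$ estimated)
- Level 2: $B_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $B_{1j} = \gamma_{10} + u_{1j}$
- Estimated results:
  - Intercept: $\gamma_{00}$ = 11.86 (se = .48); $\tau_{00}$ = 10.03**
  - Season ($W_{j1}$): $\gamma_{01}$ = 2.43* (se = .70)
  - Slope: $\gamma_{10}$ = 1.51* (se = .60); $\tau_{11}$ = 4.24**
  - Intercept-slope covariance: $\tau_{10}$ = -1.45**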
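Two quantities that help interpret the random-effects estimates: the implied correlation between child intercepts and slopes, $\tau_{10}/\sqrt{\tau_{00}\tau_{11}}$, and an approximate 95% range of individual slopes, $\gamma_{10} \pm 1.96\sqrt{\tau_{11}}$ (assuming normally distributed random effects). The sketch plugs in the slide's estimates; note that assigning 10.03, 4.24, and -1.45 to $\tau_{00}$, $\tau_{11}$, and $\tau_{10}$ is a reconstruction of symbols lost from the original slide, so treat the output as illustrative.

```python
import math

# Estimates from the slide; the tau labels are an assumed reconstruction.
gamma10 = 1.51   # average slope
tau00 = 10.03    # variance of child intercepts
tau11 = 4.24     # variance of child slopes
tau10 = -1.45    # intercept-slope covariance

corr = tau10 / math.sqrt(tau00 * tau11)
half_width = 1.96 * math.sqrt(tau11)  # normal-theory 95% range of child slopes
slope_range = (gamma10 - half_width, gamma10 + half_width)

print(round(corr, 2))
print(tuple(round(v, 2) for v in slope_range))
```

Under that reading, intercepts and slopes are weakly negatively correlated (about -.22), and individual growth rates vary widely around the average slope of 1.51 (roughly -2.5 to 5.5 per unit of age).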

3-Level HLM
- Two levels of nesting
- Examples:
  - Longitudinal assessments of children in randomly assigned classrooms
    - Level 1: child-level data
    - Level 2: child's growth curve
    - Level 3: classroom-level data
  - Two levels of nesting such as children nested in classrooms that are nested in schools
    - Level 1: child-level data
    - Level 2: classroom-level data
    - Level 3: school-level data

3-Level Model: Random Intercepts
- Children nested in classrooms, classrooms nested in schools
  - Level 1 (child-level model): $Y_{ijk} = \pi_{0jk} + e_{ijk}$
    - $Y_{ijk}$ is the achievement of child $i$ in class $j$ in school $k$
    - $\pi_{0jk}$ is the mean score of class $j$ in school $k$
    - $e_{ijk}$ is the random "child effect"
  - Level 2 (classroom-level model): $\pi_{0jk} = \beta_{00k} + r_{0jk}$
    - $\beta_{00k}$ is the mean score for school $k$
    - $r_{0jk}$ is the random "class effect"
  - Level 3 (school-level model): $\beta_{00k} = \gamma_{000} + u_{00k}$
    - $\gamma_{000}$ is the grand mean score
    - $u_{00k}$ is the random "school effect"

3-Level Model: Random Intercepts
- Children nested in classrooms, classrooms nested in schools
  - Level 1 (child-level model): $Y_{ijk} = \pi_{0jk} + e_{ijk}$
    - $e_{ijk}$ is the random "child effect", $E(e_{ijk}) = 0$, $\operatorname{var}(e_{ijk}) = \sigma^2$
  - Level 2 (classrooms within schools): $\pi_{0jk} = \beta_{00k} + r_{0jk}$
    - $r_{0jk}$ is the random "class effect", $E(r_{0jk}) = 0$, $\operatorname{var}(r_{0jk}) = \tau_\pi$
    - Assumes the variance among classes within a school is the same for every school
  - Level 3 (between schools): $\beta_{00k} = \gamma_{000} + \gamma_{001}\,\text{trt}_k + u_{00k}$
    - $E(u_{00k}) = 0$, $\operatorname{var}(u_{00k}) = \tau_\beta$

Partitioning variance
- Proportion of variance within classrooms: $\sigma^2 / (\sigma^2 + \tau_\pi + \tau_\beta)$
- Proportion of variance among classrooms within schools: $\tau_\pi / (\sigma^2 + \tau_\pi + \tau_\beta)$
- Proportion of variance among schools: $\tau_\beta / (\sigma^2 + \tau_\pi + \tau_\beta)$
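The three proportions sum to one, and they also give the correlations implied by the three-level random-intercepts model: two children in the same classroom correlate $(\tau_\pi + \tau_\beta)/(\sigma^2 + \tau_\pi + \tau_\beta)$, while two children in different classrooms of the same school correlate $\tau_\beta/(\sigma^2 + \tau_\pi + \tau_\beta)$. A sketch with hypothetical variance components:

```python
def partition_variance(sigma2: float, tau_pi: float, tau_beta: float) -> dict[str, float]:
    """Share of total variance at each level of a 3-level random-intercepts model."""
    total = sigma2 + tau_pi + tau_beta
    return {
        "within classrooms": sigma2 / total,
        "among classrooms within schools": tau_pi / total,
        "among schools": tau_beta / total,
    }


def same_class_correlation(sigma2: float, tau_pi: float, tau_beta: float) -> float:
    """Correlation between two children in the same classroom (and school)."""
    return (tau_pi + tau_beta) / (sigma2 + tau_pi + tau_beta)


def same_school_other_class_correlation(sigma2: float, tau_pi: float, tau_beta: float) -> float:
    """Correlation between two children in different classrooms of one school."""
    return tau_beta / (sigma2 + tau_pi + tau_beta)


# Hypothetical variance components.
shares = partition_variance(sigma2=70.0, tau_pi=20.0, tau_beta=10.0)
print(shares)
print(same_class_correlation(70.0, 20.0, 10.0))
print(same_school_other_class_correlation(70.0, 20.0, 10.0))
```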

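3-Level HLM: Level-2 Longitudinal and Level-3 Random Intercepts
- Typically, treatment is randomly assigned at the classroom level and children are followed longitudinally (e.g., Purdue Curriculum Study)
  - $Y_{ijk}$ = outcome on occasion $i$ for child $j$ in class $k$
  - Level 1 (within child): $Y_{ijk} = \pi_{0jk} + \pi_{1jk} \text{Age}_{ijk} + e_{ijk}$; $E(e_{ijk}) = 0$, $\operatorname{var}(e_{ijk}) = \sigma^2$
  - Level 2 (between children): $\pi_{0jk} = \beta_{00k} + r_{0jk}$; $\pi_{1jk} = \beta_{10k} + r_{1jk}$; $E(r_{0jk}) = 0$, $\operatorname{var}(r_{0jk}) = \tau_{\pi 00}$; $E(r_{1jk}) = 0$, $\operatorname{var}(r_{1jk}) = \tau_{\pi 11}$
  - Level 3 (between classes): $\beta_{00k} = \gamma_{000} + u_{00k}$; $\beta_{10k} = \gamma_{100} + u_{10k}$; $E(u_{00k}) = 0$, $\operatorname{var}(u_{00k}) = \tau_{\beta 00}$; $E(u_{10k}) = 0$, $\operatorname{var}(u_{10k}) = \tau_{\beta 11}$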
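To make the three-level growth structure concrete, here is a sketch that simulates data from it: classroom effects are drawn first, then each child's intercept and slope around the classroom values, then scores at each assessment age. It assumes, for simplicity, that the intercept and slope random effects are uncorrelated at each level (the model on the slide does not state the covariances), and all parameter values are hypothetical.

```python
import math
import random


def simulate_three_level_growth(n_classes, children_per_class, ages,
                                gamma000, gamma100,
                                tau_pi00, tau_pi11, tau_beta00, tau_beta11,
                                sigma2, seed=0):
    """Simulate Y_ijk = pi_0jk + pi_1jk * Age_ijk + e_ijk, with
    pi_0jk = beta_00k + r_0jk, pi_1jk = beta_10k + r_1jk,
    beta_00k = gamma000 + u_00k, beta_10k = gamma100 + u_10k.
    Random effects are independent normals (zero covariances assumed)."""
    rng = random.Random(seed)
    rows = []  # (class k, child j, age, outcome)
    for k in range(n_classes):
        # Level 3: classroom intercept and slope
        beta00k = gamma000 + rng.gauss(0.0, math.sqrt(tau_beta00))
        beta10k = gamma100 + rng.gauss(0.0, math.sqrt(tau_beta11))
        for j in range(children_per_class):
            # Level 2: child's intercept and slope around the classroom values
            pi0jk = beta00k + rng.gauss(0.0, math.sqrt(tau_pi00))
            pi1jk = beta10k + rng.gauss(0.0, math.sqrt(tau_pi11))
            for age in ages:
                # Level 1: observed score on each occasion
                y = pi0jk + pi1jk * age + rng.gauss(0.0, math.sqrt(sigma2))
                rows.append((k, j, age, y))
    return rows


rows = simulate_three_level_growth(
    n_classes=4, children_per_class=5, ages=[0.0, 0.5, 1.0],
    gamma000=6.0, gamma100=1.0,
    tau_pi00=1.0, tau_pi11=0.25, tau_beta00=0.2, tau_beta11=0.05,
    sigma2=0.5)
print(len(rows))  # 4 classes x 5 children x 3 occasions = 60
```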

Example: Purdue Curriculum Study
- Level 1: individual growth curve
- Level 2: classroom growth curve
- Level 3: treatment differences in classroom growth curves

| Writing | Pre | Post | Follow-up |
|---|---|---|---|
| Onsite, N | 199 | 196 | 79 |
| Onsite, M (SD) | 5.98 (1.49) | 6.70 (1.54) | 6.92 (1.74) |
| Remote, N | 187 | 171 | 63 |
| Remote, M (SD) | 6.01 (1.55) | 7.04 (1.64) | 7.48 (1.62) |
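The raw pre-to-post and pre-to-follow-up changes in the group means can be read off the writing table. This descriptive arithmetic ignores nesting, the pre-test covariate, and attrition (the follow-up samples are much smaller), so it is only a rough companion to the model-based comparison.

```python
# Group means transcribed from the writing table: (Pre, Post, Follow-up).
means = {
    "Onsite": (5.98, 6.70, 6.92),
    "Remote": (6.01, 7.04, 7.48),
}

# Raw mean change from pre-test to post-test and to follow-up, per group.
gains = {
    group: (round(post - pre, 2), round(follow - pre, 2))
    for group, (pre, post, follow) in means.items()
}
print(gains)
```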

Purdue Curriculum Study

Threats
- Homogeneity of variance at each level
  - Nonnormal data with heavy tails
  - Bad data
  - Differences in variability among groups
- Normality assumption
  - Examine residuals
  - Robust standard errors (large n)
- Inferences with small samples
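Examining residuals can start with simple screens: standardized residuals far from zero point to bad data, and positive excess kurtosis points to heavy tails. A stdlib-only sketch; the cutoff of 3 and the toy residuals are assumptions, and in an HLM the same checks would be applied to the residuals at each level.

```python
import statistics


def residual_diagnostics(residuals: list[float], cutoff: float = 3.0):
    """Return (excess kurtosis, indices whose |standardized residual| > cutoff)."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)
    z = [(r - mean) / sd for r in residuals]
    # Moment-based excess kurtosis: near 0 for normal data, positive for heavy tails
    excess_kurtosis = sum(v ** 4 for v in z) / n - 3.0
    flagged = [i for i, v in enumerate(z) if abs(v) > cutoff]
    return excess_kurtosis, flagged


# Hypothetical residuals; the last value mimics a bad data point.
residuals = [1.0, -1.0] * 9 + [0.0, 15.0]
kurt, flagged = residual_diagnostics(residuals)
print(round(kurt, 1), flagged)
```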

3-Level HLM: longitudinal assessments of individuals in clustered settings

