Presentation on theme: "Issues in Experimental Design Reliability and ‘Error’"— Presentation transcript:

1 Issues in Experimental Design Reliability and ‘Error’

2 More things to think about in experimental design
The relationship of reliability and power
The treatment effect is not the same for everyone
–Some benefit more than others
Sounds like no big deal (or even obvious), but all of the designs discussed assume an equal effect of treatment for all individuals

3 Reliability
What is reliability? Often thought of as consistency, but this is more of a by-product of reliability
–Not to mention that you could have perfectly consistent scores lacking variability (i.e. constants) for which one could not obtain measures of reliability
Reliability may refer to a measure’s ability to capture an individual’s true score, i.e. to distinguish accurately one person from another on some measure
It is the correlation of scores on some measure with their true scores regarding that construct

4 Classical True Score Theory
Each subject’s score is true score + error of measurement
Obs var = True var + Error var
Reliability = True var / Obs var = 1 – Error var / Obs var
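To make the decomposition concrete, here is a small simulation sketch in R (added for illustration; the numbers are made up, not from the slides): true scores and measurement error are generated separately, and reliability falls out as the ratio of true to observed variance.

  # Simulation sketch (hypothetical values): observed = true + error
  set.seed(123)
  n        <- 10000
  true     <- rnorm(n, mean = 50, sd = 10)   # true scores, variance 100
  error    <- rnorm(n, mean = 0,  sd = 5)    # measurement error, variance 25
  observed <- true + error
  var(observed)                    # roughly 100 + 25
  var(true) / var(observed)        # reliability, roughly .80
  1 - var(error) / var(observed)   # the same quantity, written the other way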

5 Reliability and power
Reliability = True var / Obs var = 1 – Error var / Obs var
If observed variance goes up, power will decrease
However, if observed variance goes up, we don’t automatically know what happens to reliability
Obs var = True var + Error var
If it is error variance that is causing the increase in observed variance, reliability will decrease
–Reliability goes down, power goes down
If it is true variance that is causing the increase in observed variance, reliability will increase
–Reliability goes up, power goes down
The point is that the psychometric properties of the variables play an important, and not altogether obvious, role in how we will interpret results, and not having a reliable measure is a recipe for disaster
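As a quick illustration of the variance/power link (an added sketch with made-up numbers, not from the slides), base R’s power.anova.test shows power dropping when within-group (error) variance grows while between-group variance is held fixed.

  # Hypothetical design: 3 groups of 20, between-group variance fixed at 4
  power.anova.test(groups = 3, n = 20, between.var = 4, within.var = 20)$power
  # Doubling the within-group (error) variance lowers the returned power
  power.anova.test(groups = 3, n = 20, between.var = 4, within.var = 40)$power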

6 Error in ANOVA
Typical breakdown in a between-groups design
–SS_total = SS_b/t + SS_e
Variation due to treatment and random variation (error)
The F statistic is a ratio of these variances
F = MS_b / MS_e
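A generic one-way example in R (hypothetical data, added for illustration) shows where these pieces appear in standard output:

  set.seed(42)
  dat <- data.frame(
    group = factor(rep(c("control", "treatment"), each = 30)),
    score = c(rnorm(30, 50, 10), rnorm(30, 55, 10))
  )
  fit <- aov(score ~ group, data = dat)
  summary(fit)   # the group and Residuals rows give SS_b and SS_e,
                 # their mean squares, and F = MS_b / MS_e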

7 Error in ANOVA
Classical True Score Theory
–Each subject’s score = true score + error of measurement
MS_e can thus be further partitioned
–Variation due to true differences between subjects on the scores, and error of measurement (unreliability)
MS_e = MS_er + MS_es
–MS_er reflects measurement error
–MS_es reflects systematic differences between individuals
MS_es itself has two sources
–Individual differences
–Treatment differences
»Subject by treatment interaction

8 Error in ANOVA
The reliability of the measure will determine the extent to which the two sources of variability (MS_er and MS_es) contribute to the overall MS_e
If reliability = 1.00, MS_er = 0
–The error term is a reflection only of systematic individual differences
If reliability = 0.00, MS_es = 0
–The error term is a reflection of measurement error only
MS_er = (1 – Rel) * MS_e
MS_es = (Rel) * MS_e
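As a quick numeric sketch (the values are made up), the two formulas can be applied directly once a reliability estimate for the measure is in hand:

  MS_e  <- 15.2    # hypothetical error mean square from an ANOVA table
  rel   <- 0.85    # hypothetical reliability of the dependent measure
  MS_er <- (1 - rel) * MS_e   # portion attributable to measurement error
  MS_es <- rel * MS_e         # portion attributable to systematic subject differences
  c(MS_er = MS_er, MS_es = MS_es)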

9 Error in ANOVA
We can actually test whether the systematic variation is significantly larger than the variation due to error of measurement
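The slides do not spell the test out; one plausible form, in the spirit of Zimmerman & Williams, is an F ratio of the systematic component to the measurement-error component. The sketch below is an assumption on my part, including the degrees of freedom used, not a procedure taken from the sources.

  MS_e  <- 15.2    # hypothetical error mean square
  df_e  <- 18      # hypothetical error degrees of freedom
  rel   <- 0.85    # hypothetical reliability
  MS_es <- rel * MS_e
  MS_er <- (1 - rel) * MS_e
  F_ratio <- MS_es / MS_er
  # Using the error df for both terms is an assumption, not a stated rule
  p_value <- pf(F_ratio, df_e, df_e, lower.tail = FALSE)
  c(F = F_ratio, p = p_value)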

10 Error in ANOVA
With a reliable measure, the bulk of MS_e will be attributable to systematic individual differences
However, with strong main effects/interactions, we might see a significant F for this test even though the contribution to the model is not very much
Calculate an effect size (eta-squared)
–SS_es / SS_total
–Lyons and Howard suggest (based on Cohen’s rules of thumb) that a value below .33 indicates further investigation may not be necessary
How much of the variability seen in our data is due to systematic variation outside of the main effects?
–Subjects responding differently to the treatment
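A one-line version of that effect size calculation, using made-up sums of squares:

  SS_es    <- 230     # hypothetical systematic portion of the error SS
  SS_total <- 1100    # hypothetical total SS for the design
  SS_es / SS_total    # about .21, under the .33 rule of thumb cited above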

11 Gist
Discerning the true nature of treatment effects, e.g. for clinical outcomes, is not easy, and it is not accomplished just because one has done an experiment and seen a statistically significant effect
Small though significant effects with not-so-reliable measures would not be reason to go with any particular treatment, as most of the variance is due to poor measures and subjects that do not respond similarly to that treatment
–One reason to suspect individual differences due to the treatment would be heterogeneity of variance
–For example, lots of variability in the treatment group, not so much in the control
Even with larger effects and reliable measures, a noticeable amount of the unaccounted-for variance may be due to subjects responding differently to the treatment
Methods for dealing with the problem are outlined in Bryk and Raudenbush (hierarchical linear modeling), but one strategy may be to single out suspected covariates and control for them (ANCOVA or blocking)

12 Repeated Measures and Hierarchical Linear Modeling
Another issue with ANOVA designs again concerns the subject by treatment interaction, this time with regard to repeated measurements
An RM design can be seen as a special case of HLM in which the repeated measure (e.g. time) is nested within subjects
The outcome is predicted by the repeated measure as before, but one can allow the intercept and slope(s) to vary over subjects, with that variance taken into account by the model
In this manner the HLM approach specifically examines the treatment by subject interaction, getting a sense of the correlation between starting point and subsequent change

13 Repeated Measures and Hierarchical Linear Modeling
Briefly, HLM is a regression approach in which intercepts and/or coefficients are allowed to vary depending on other variables
As an example, the basic linear model for RM is the same; however, the intercept may be allowed to vary as a function of another variable (in this case, Subject)
This gives a new regression equation (note how this compares to RM in the GLM notes)
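The regression equations shown on the original slide are not reproduced in this transcript. A standard way to write the random-intercept version, using illustrative notation of my own rather than the slide’s, is:

  Level 1 (within subject):   Score_ti = b_0i + b_1 * Time_ti + e_ti
  Level 2 (between subjects): b_0i = g_00 + u_0i
  Combined:                   Score_ti = g_00 + b_1 * Time_ti + u_0i + e_ti

where u_0i is the subject-specific deviation in the intercept and e_ti is the residual at each time point.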

14 Example with One-way
From before: stress measured the week before, the week of, and the week after the midterm exam
Using lmer in R, fitting a linear model where Time predicts stress level but the intercept is allowed to vary by Subject (a random intercept) reproduces the same ANOVA result

  library(lme4)
  lmemod0 <- lmer(Score ~ Time + (1 | Subject), data = rmdata)
  anova(lmemod0)

Repeated measures ANOVA (from before):

  Source    df    SS       MS        F        p
  Subject    9    654.3    72.7
  Time       2    204.8    102.4     6.747    .0065
  Error     18    273.2    15.178

Output of anova(lmemod0):

  Analysis of Variance Table
        Df  Sum Sq  Mean Sq  F value
  Time   2   204.8    102.4   6.7467

15 Example with One-way
However, if I were to allow the coefficients to vary as well, I would also note that starting point matters, in that there is a negative relation between the intercept and the general effect of time
If one starts out stressed, there is less of a jump during the midterm, and a stronger decline by the end
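A hedged sketch of what that model might look like, reusing the hypothetical objects from slide 14 (rmdata, Score, Time, Subject) and treating Time as numeric so a single slope can vary by subject:

  library(lme4)
  # Random intercept and random Time slope for each subject
  lmemod1 <- lmer(Score ~ Time + (Time | Subject), data = rmdata)
  VarCorr(lmemod1)    # reports the correlation between the random intercept
                      # and the random Time slope; a negative value matches
                      # the pattern described above
  summary(lmemod1)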

16 Summary
Even though ANOVA designs may seem straightforward on the surface, and even if one has control over the administration of the variable of interest, one can see that issues remain, and that the basic approach may be inadequate for resolving the true nature of effects

17 Resources
Zimmerman & Williams (1986)
Bryk & Raudenbush (1988)
Lyons & Howard (1991)

