1. RELIABILITY
Reliability refers to the consistency of a test or measurement.
Reliability studies:
- Test-retest reliability: equipment and/or procedures
- Intra- or inter-rater reliability: assessing the reliability of individual raters or a group of raters
2. Terminology
- Reliability
- Consistency
- Precision
- Repeatability
- Agreement
"Reliability" and "agreement" are not synonymous!
3. Quantification of Reliability
In terms of the "consistency" of measurements:
- Relative consistency
  - The consistency of the position or rank of individuals in the group relative to others
  - Quantified by the intraclass correlation coefficient (ICC)
- Absolute consistency
  - An indication of the "precision" of a score
  - Allows confidence intervals to be constructed about a score
  - Quantified by the standard error of measurement (SEM) or variations thereof: minimum difference, standard error of prediction (SEP), etc.
4. Other Procedures Used to Quantify Reliability
- Pearson product-moment correlation (Pearson r)
  - Cannot detect systematic error
- The coefficient of variation
  - Standard deviation ÷ mean
- Limits of agreement (Bland-Altman plots)
  - Bland-Altman plots compare two measurement techniques on the same variable
  - Example: DEXA vs. UWW for body composition (a brief sketch of both procedures follows)
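As a minimal sketch of the coefficient of variation and the Bland-Altman limits of agreement, assuming two numpy arrays of paired scores on the same subjects (the data below are made up for illustration):

```python
import numpy as np

# Hypothetical paired body-fat scores (%) from two techniques
dexa = np.array([18.2, 22.5, 30.1, 25.4, 15.8])
uww  = np.array([17.5, 23.0, 29.2, 26.1, 16.4])

# Coefficient of variation: standard deviation / mean
cv = np.std(dexa, ddof=1) / np.mean(dexa)

# Bland-Altman: bias = mean difference; the 95% limits of
# agreement are bias +/- 1.96 * SD of the differences
diff = dexa - uww
bias = diff.mean()
half_width = 1.96 * np.std(diff, ddof=1)
print(f"CV = {cv:.3f}")
print(f"bias = {bias:.2f}, LoA = ({bias - half_width:.2f}, {bias + half_width:.2f})")
```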
5. Reliability Theory
Each observed score is composed of two parts: a true score and error.
- True score: the mean of an infinite number of scores from a subject
- Error: observed score − true score = error
- Sources of error: biological variability, instrumentation, error by the subject, error by the tester, etc.
Similarly, for a group of scores, the total variance (σ²T) in the data has two components:
- True score variance (σ²t)
- Error variance (σ²e)
6. Reliability Theory
Therefore, if we form the ratio of the true score variance (σ²t) to the total variance (σ²T), we have a reliability coefficient defined as:

R = σ²t / σ²T = σ²t / (σ²t + σ²e)
7. Reliability Theory
The closer to 1.0, the higher the reliability.
Problem: we don't actually know the "true score" for each subject; therefore, we don't know the true score variability.
We use an index of true score variability (σ²t) based on between-subjects variability; the formal definition of reliability therefore becomes:

R = σ²b / (σ²b + σ²e)

where σ²b is the between-subjects variance. For example, if σ²b = 9.0 and σ²e = 1.0, then R = 9.0 / (9.0 + 1.0) = 0.90.
8. Variance Estimates
- Variance estimates are derived from the single-factor, within-subjects ANOVA model
- The appropriate mean square (MS) values are recorded from the ANOVA table
- NOTE: these are the values used to calculate the ICCs (a sketch of the computation follows)
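As a minimal sketch of where those mean squares come from, assuming the scores sit in an n-subjects × k-trials numpy array (the function name is hypothetical):

```python
import numpy as np

def repeated_measures_ms(X):
    """Mean squares from a single-factor, within-subjects ANOVA.

    X: (n subjects x k trials) array of scores.
    Returns MSB (subjects), MST (trials), and MSE (error) -- the
    values read off the ANOVA table for the ICC and SEM calculations.
    """
    n, k = X.shape
    grand = X.mean()
    # Partition the total sum of squares into subjects, trials, error
    ss_subjects = k * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_trials = n * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((X - grand) ** 2).sum() - ss_subjects - ss_trials
    return (ss_subjects / (n - 1),           # MSB
            ss_trials / (k - 1),             # MST
            ss_error / ((n - 1) * (k - 1)))  # MSE
```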
9. Intraclass Correlation Coefficients
- The ICC is a relative measure: a ratio of variances from the ANOVA, and therefore unitless
- More akin to R² from regression than to the Pearson r
- 1 = perfect reliability; 0 = no reliability
- The ICC is relative, and its magnitude depends on the between-subjects variability:
  - ↑ between-subjects variability = ↑ ICC
  - ↓ between-subjects variability = ↓ ICC
- Therefore, ICCs are context-specific
- "There is literally no such thing as the reliability of a test, unqualified; the coefficient has meaning only when applied to specific populations" (Streiner & Norman, 1995)
10. Error
Two types of error: systematic error and random error, where:

systematic error + random error = total error
11. ERROR
- Systematic error: constant error and bias
  - Constant error: affects all scores equally
  - Bias: affects certain scores differently from others
  - Examples: learning effects, fatigue during the test
- Random error
  - Examples: luck, alertness, attentiveness by the tester, normal biological variability
  - These random errors should both increase and decrease scores randomly
12. Systematic Error
- It is argued that systematic error deals with VALIDITY, not RELIABILITY!
- Systematic error is a "natural phenomenon" that does not contribute to unreliability per se
- Should we include systematic error?
13. Calculations of Reliability
- We are interested in calculating the ICC
- First step: conduct a single-factor, within-subjects (repeated measures) ANOVA
  - This is an inferential test for systematic error
- All subsequent equations are derived from the ANOVA table
- NOTE: both one-way and two-way ICC models can be computed from the same single-factor, within-subjects ANOVA
16. ANOVA Table
Three sources of variability:
- Subjects (MSB or MSS): between-subjects variability (for calculating the ICC)
- Trials (MST): systematic error (for calculating the ICC)
- Error (MSE): random error (for calculating the ICC)
Two factors:
- Trials: differences between trials
- Subjects: differences between subjects
- Interaction term = trials × subjects
17. ANOVA Table
Two reasons for noting the three different sources of variability:
- As we will see, there are 6 different ICC models
  - Two are "one-way models" and four are "two-way models"
  - One-way models lump together the "trial" and "error" variability
  - Two-way models keep them separate
- Between-subjects ANOVAs are different from within-subjects ANOVAs
  - The variability due to subjects is treated separately in the within-subjects ANOVA (due to the repeated testing of the same subjects, the between-subjects variability can be partitioned from the error term)
18. ICC Models
Shrout & Fleiss (1979) developed 6 forms of the ICC. There are 3 general models (Models 1, 2, and 3), and each can be calculated in two different ways:
- If the individual scores are "single" scores from each subject for each trial, the ICC model is given the second designation "1"
- If the scores in the analysis represent the average of k scores from each subject, the ICC is given the second designation "k"
19. ICC Models
Usually presented in the context of determining rater reliability.
- Model 1 (1,1 and 1,k)
  - Each subject is assumed to be assessed by a different set of raters than other subjects
  - Random effect of raters
- Model 2 (2,1 and 2,k)
  - Each subject is assumed to be assessed by the same group of raters, and these raters were randomly sampled
  - Still a random effect of raters
20. ICC Models
- Model 3 (3,1 and 3,k)
  - Each subject is assessed by the same group of raters, but these raters are the only ones of interest
  - No desire to generalize the calculated ICCs beyond the confines of the study or laboratory
  - Does not include systematic error in the model
22. Example Using Model 3,1
- Test-retest reliability
- No desire to generalize to other devices or testers
- Systematic error is not accounted for, but we conduct an ANOVA to test for systematic error
- This receives the same criticism as the Pearson r for not accounting for systematic error (a sketch of the calculation follows)
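As a minimal sketch of this calculation, using the mean squares from the ANOVA sketch above (ICC(2,1) is shown for contrast; the function names are hypothetical):

```python
def icc_3_1(msb, mse, k):
    """ICC(3,1) (Shrout & Fleiss): two-way mixed model, single scores.
    Systematic (trial) variance is excluded; k = number of trials."""
    return (msb - mse) / (msb + (k - 1) * mse)

def icc_2_1(msb, mst, mse, n, k):
    """ICC(2,1) (Shrout & Fleiss): two-way random model, single scores.
    Trial (systematic) variance stays in the denominator;
    n = number of subjects."""
    return (msb - mse) / (msb + (k - 1) * mse + k * (mst - mse) / n)

# Usage with the ANOVA sketch above, for an n x k score matrix X:
#   msb, mst, mse = repeated_measures_ms(X)
#   icc = icc_3_1(msb, mse, k=X.shape[1])
```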
23. Interpreting the ICC
If ICC = 0.95:
- 95% of the observed score variance is due to true score variance
- 5% of the observed score variance is due to error
Two factors to consider when examining the magnitude of the ICC:
- Which version of the ICC was used?
- The magnitude of the ICC depends on the between-subjects variability in the data
Because of the relationship between ICC magnitude and between-subjects variability, standard error of measurement (SEM) values should be reported along with the ICC.
24. Implications of a Low ICC
- A low ICC may reflect low reliability or a lack of real differences between subjects; this is another argument to include SEM values
- Type I vs. Type II error:
  - Type I error: rejecting H0 when there is no effect (i.e., when H0 is true)
  - Type II error: failing to reject H0 when there is an effect (i.e., when H0 is false)
- A low ICC means that more subjects will be necessary to overcome the increased percentage of the observed score variance due to error.
25. Standard Error of Measurement
- ICC: a relative measure of reliability; no units
- SEM: an absolute index of reliability; same units as the measurement of interest
- Usually used to construct confidence intervals
- The SEM is the standard error in estimating observed scores from true scores.
26. Calculating the SEM
There are two basic ways to calculate the SEM. Method #1 uses the standard deviation of all scores and the ICC:

SEM = SD × √(1 − ICC)
27. Calculating the SEM
Method #2 uses the ANOVA output directly: the SEM is the square root of the error mean square:

SEM = √MSE
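A minimal sketch of both methods (function names are hypothetical; `sd` is the standard deviation of all scores and `mse` is the error mean square from the within-subjects ANOVA):

```python
import numpy as np

def sem_from_icc(sd, icc):
    """Method #1: SEM = SD * sqrt(1 - ICC)."""
    return sd * np.sqrt(1.0 - icc)

def sem_from_anova(mse):
    """Method #2: SEM = square root of the ANOVA error mean square."""
    return np.sqrt(mse)
```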
28. SEM
- We can report SEM values in addition to the ICC values and the results of the ANOVA
- We can calculate the minimum difference (MD) that can be considered "real" between scores
- We can also construct 95% confidence intervals about a subject's estimated true score based on the SEM or SEP.
29. Minimum Difference
The SEM can be used to determine the minimum difference (MD) to be considered "real", calculated as follows:

MD = SEM × 1.96 × √2
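A minimal sketch, assuming an SEM computed as above (the function name is hypothetical):

```python
import numpy as np

def minimum_difference(sem, z=1.96):
    """MD = z * SEM * sqrt(2). The sqrt(2) enters because a
    difference score carries the measurement error of two scores;
    z = 1.96 corresponds to a 95% level."""
    return z * sem * np.sqrt(2.0)
```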
30. Confidence Intervals
First, we must estimate the subject's true score (T):

T = X + ICC(S − X)

where X = grand mean and S = observed score.
31. Confidence Intervals
Second, we must determine the standard error of prediction (SEP):

SEP = SD × √(1 − ICC²)

where SD = standard deviation and ICC = intraclass correlation coefficient.
32. Confidence Intervals
Third, we can calculate the 95% confidence interval as:

95% CI = T ± 1.96 × SEP
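Putting the three steps together, a minimal sketch (function and argument names are hypothetical; `s` is a subject's observed score, while `grand_mean` and `sd` come from the reliability study):

```python
import numpy as np

def true_score_ci(s, grand_mean, sd, icc, z=1.96):
    """95% CI about a subject's estimated true score.

    Step 1: T = X + ICC * (S - X)        (estimated true score)
    Step 2: SEP = SD * sqrt(1 - ICC**2)  (standard error of prediction)
    Step 3: CI = T +/- z * SEP
    """
    t_hat = grand_mean + icc * (s - grand_mean)
    sep = sd * np.sqrt(1.0 - icc ** 2)
    return (t_hat - z * sep, t_hat + z * sep)
```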