MEASUREMENT: RELIABILITY Lu Ann Aday, Ph.D. The University of Texas School of Public Health
RELIABILITY: Definition Extent of random variation in answers to questions as a function of when they are asked (test-retest), who asked them (inter-rater), and the fact that a given question is one of a number of questions that could have been asked to measure the concept of interest (internal consistency).
RELIABILITY: Computation Requires repeated measures to estimate stability over time (test-retest), equivalence across data gatherers (inter-rater), or equivalence across questions/items intended to measure the same underlying concept (internal consistency).
RELIABILITY: Test-retest Definition: correlation between answers to same question by same respondent at two different points in time
RELIABILITY: Test-retest Factors affecting: vague question wording; transient personal states, e.g., physical or mental; situational factors, e.g., presence of other people
RELIABILITY: Test-retest Computation: Compute correlation coefficient between answers to same question by same respondent at two different points in time:
Respondent | Q1, Time 1 | Q1, Time 2
1 | Agree | Agree
2 | Agree | Agree
3 | Agree | Agree
4 | Agree | Disagree
5 | Agree | Agree
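As a rough companion to the five answer pairs above, the sketch below computes simple percent agreement between Time 1 and Time 2. This is a swapped-in measure, not the correlation coefficient the slide names: because every Time 1 answer is "Agree", that column has no variance, so a correlation coefficient is undefined for these particular data. The function name percent_agreement is an illustrative assumption.

```python
# Answers to Q1 from the five respondents in the example above.
time1 = ["Agree", "Agree", "Agree", "Agree", "Agree"]
time2 = ["Agree", "Agree", "Agree", "Disagree", "Agree"]


def percent_agreement(first, second):
    """Share of respondents who gave the same answer on both occasions."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty answer lists")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


agreement = percent_agreement(time1, time2)
print(agreement)  # 4 of 5 respondents repeat their answer -> 0.8
```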
RELIABILITY: Test-retest Correlation coefficients: Interval: Pearson r; Ordinal: Spearman rho; Nominal: chi-square-based measures of association. Correlation desired: .70+
RELIABILITY: Test-retest Comparisons of means: Interval: paired t-test, repeated measures analysis of variance. Advantages: more accurately take into account that the first and second measurements are not independent; more directly compare the actual answers at the two points in time
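A minimal sketch of the two coefficients named for interval and ordinal answers, using only the standard library. The scores are hypothetical (they are not from the slides), and the helper names pearson_r, average_ranks, and spearman_rho are assumptions made for illustration. Spearman rho is computed here as Pearson r on ranks, with tied values sharing their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; fails with ZeroDivisionError if a list is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho computed as Pearson r on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical 1-5 ratings from five respondents at two points in time.
scores_t1 = [3, 4, 2, 5, 4]
scores_t2 = [3, 5, 2, 4, 4]

r = pearson_r(scores_t1, scores_t2)
rho = spearman_rho(scores_t1, scores_t2)
print(round(r, 2), round(rho, 2))  # compare each against the .70 threshold
```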
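The paired t-test mentioned above can be sketched as follows. It works on the within-respondent differences, which is how it accounts for the two measurements not being independent. The scores are hypothetical and the function name paired_t is an assumption; the resulting statistic would be compared against a t distribution with n - 1 degrees of freedom, which this sketch does not do.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    # Standard error of the mean difference (sample standard deviation).
    std_err = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / std_err, len(diffs) - 1


# Hypothetical interval-level scores for five respondents.
scores_t1 = [10, 12, 9, 14, 11]
scores_t2 = [11, 12, 10, 13, 13]

t_stat, df = paired_t(scores_t1, scores_t2)
print(round(t_stat, 3), df)
```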
RELIABILITY: Inter-rater Definition: correlation between answers to same question by same respondent obtained by different data gatherers at (approximately) the same point in time
RELIABILITY: Inter-rater Factors affecting: lack of adequate interviewer training; lack of standardization of data collection protocols and procedures
RELIABILITY: Inter-rater Computation: Compute correlation coefficient between answers to same question by same respondent obtained by different data gatherers:
Respondent | Q1, Int. A | Q1, Int. B
1 | BP=140/90 | BP=140/90
2 | BP=150/80 | BP=150/80
3 | BP=145/95 | BP=145/95
4 | BP=145/95 | BP=120/80
5 | BP=140/90 | BP=140/90
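To make the computation concrete, the sketch below correlates the two interviewers' readings for the five respondents above. Using only the systolic value (the first number of each reading) is an assumption made to keep the example to one interval variable; pearson_r is an illustrative helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Systolic readings recorded by Interviewer A and Interviewer B.
systolic_a = [140, 150, 145, 145, 140]
systolic_b = [140, 150, 145, 120, 140]

r = pearson_r(systolic_a, systolic_b)
print(round(r, 2))  # about 0.24
```

With only five respondents, the single discrepant reading for respondent 4 is enough to pull the coefficient far below the .80 usually desired for inter-rater reliability.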
RELIABILITY: Inter-rater Correlation coefficients (coefficients for 3+ data gatherers noted in parentheses): Interval: Pearson r (eta); Ordinal: Spearman rho (chi-square); Nominal: Kappa (chi-square). Correlation desired: .80+
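For nominal answers recorded by two data gatherers, kappa can be sketched as below. This is Cohen's kappa for exactly two raters, (observed agreement - chance agreement) / (1 - chance agreement). The Yes/No codes are hypothetical and the function name cohen_kappa is an assumption.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty lists of codes")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category proportions.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to the same ten respondents.
interviewer_a = ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No", "No"]
interviewer_b = ["Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "No"]

kappa = cohen_kappa(interviewer_a, interviewer_b)
print(round(kappa, 2))  # 80% raw agreement, 50% expected by chance -> 0.6
```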
RELIABILITY: Internal Consistency Definition: correlation between answers by same respondent to different questions about the same underlying concept (usually summarized in scales)
RELIABILITY: Internal Consistency Factors affecting: number of different questions asked to capture the underlying concept; level of association (correlation) between answers the same respondents give to different questions about the concept
RELIABILITY: Internal Consistency Computation: Compute internal consistency (underlying correlation) coefficients between answers by same respondent to different questions about the same concept:
Respondent | Q1 | Q2 | Q3
1 | Agree | Disagree | Agree
2 | Agree | Disagree | Agree
3 | Agree | Disagree | Agree
4 | Agree | Agree | Agree
5 | Agree | Disagree | Agree
RELIABILITY: Internal Consistency Computation: Corrected item-total correlation. Add up the scores for answers to different questions about the same concept to create a total score. Subtract the score for the answer to a given question from the total score to create item-specific "corrected" total scores. Compute Pearson correlation coefficients between the score for each of the items and the corresponding "corrected" total score.
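The three steps above can be sketched as follows. The response matrix is hypothetical (five respondents, three items scored 1-5), and the names pearson_r and corrected_item_total are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; fails with ZeroDivisionError if a list is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_item_total(responses):
    """One correlation per item: item score vs. total score minus that item."""
    totals = [sum(row) for row in responses]  # step 1: total score
    results = []
    for item in range(len(responses[0])):
        item_scores = [row[item] for row in responses]
        # Step 2: remove the item from the total to get the corrected total.
        corrected = [t - s for t, s in zip(totals, item_scores)]
        # Step 3: correlate the item with its corrected total.
        results.append(pearson_r(item_scores, corrected))
    return results


# Hypothetical scores: one row per respondent, one column per item.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]

item_total_r = corrected_item_total(responses)
print([round(r, 2) for r in item_total_r])
```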
RELIABILITY: Internal Consistency Computation: Split-half reliability coefficient. Randomly divide a series of questions about the same concept into halves and add up the scores for answers to the questions in the respective halves. Compute the Spearman-Brown prophecy coefficient for the correlation between the scores for each half, adjusting for the fact that the respective scores are based on only half the original number of items.
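A minimal sketch of the split-half procedure described above. The response matrix is hypothetical, the fixed seed is an assumption added only so the random split is repeatable, and the function names are illustrative. The adjustment step applies the Spearman-Brown correction with a lengthening factor of 2, i.e. 2r / (1 + r).

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; fails with ZeroDivisionError if a list is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(responses, seed=0):
    """Return (half-test correlation, Spearman-Brown adjusted reliability)."""
    n_items = len(responses[0])
    items = list(range(n_items))
    random.Random(seed).shuffle(items)  # random division into halves
    half_a, half_b = items[: n_items // 2], items[n_items // 2:]
    score_a = [sum(row[i] for i in half_a) for row in responses]
    score_b = [sum(row[i] for i in half_b) for row in responses]
    r_half = pearson_r(score_a, score_b)
    # Adjust for each half containing only half the original items (k = 2).
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical scores: six respondents (rows) by four items (columns).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 4, 5, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [4, 4, 3, 5],
]

half, full = split_half_reliability(responses)
print(round(half, 2), round(full, 2))
```

Different random splits give somewhat different coefficients, so in practice the seed (or the split itself) would be reported alongside the result.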
RELIABILITY: Spearman-Brown prophecy formula Computation: $\frac{k \cdot r_o}{1 + (k-1) \cdot r_o}$ where k = factor by which scale is increased or decreased; $r_o$ = alpha based on original length. Example: $\frac{2 \times .70}{1 + (2-1) \times .70} = .82$
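The formula above translates directly into a small function. The name spearman_brown and the input checks are assumptions; the first call reproduces the slide's example, and the second (halving the scale, k = 0.5) is an added illustration of the "decreased" case.

```python
def spearman_brown(r_original, k):
    """Predicted reliability after changing scale length by a factor of k."""
    if k <= 0:
        raise ValueError("k must be a positive length factor")
    return (k * r_original) / (1 + (k - 1) * r_original)


doubled = spearman_brown(0.70, 2)    # scale doubled in length
halved = spearman_brown(0.70, 0.5)   # scale cut to half its length

print(round(doubled, 2))  # 0.82, matching the example above
print(round(halved, 2))   # 0.54
```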
RELIABILITY: Cronbach alpha coefficient Computation: $\frac{k \cdot r_a}{1 + (k-1) \cdot r_a}$ where k = number of items in the scale; $r_a$ = average Pearson r between items. Example: $\frac{10 \times .32}{1 + (10-1) \times .32} = .82$
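A short sketch of this computation, which is the standardized form of alpha because it is built from the average inter-item correlation. The first call reproduces the example above; the response matrix is hypothetical, and the names pearson_r, mean_inter_item_r, and standardized_alpha are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; fails with ZeroDivisionError if a list is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_inter_item_r(responses):
    """Average Pearson r over every distinct pair of items (columns)."""
    columns = list(zip(*responses))
    pairs = list(combinations(columns, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


def standardized_alpha(k, r_avg):
    """Alpha from the number of items k and the average inter-item r."""
    return (k * r_avg) / (1 + (k - 1) * r_avg)


slide_alpha = standardized_alpha(10, 0.32)
print(round(slide_alpha, 2))  # 0.82, matching the example above

# Hypothetical scores: five respondents (rows) by three items (columns).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
data_alpha = standardized_alpha(len(responses[0]), mean_inter_item_r(responses))
print(round(data_alpha, 2))
```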
WHEN TO UNDERTAKE RELIABILITY ANALYSIS
Reliability/Dimensions | Test-retest | Inter-rater | Internal consistency
Questions | Concerned about stability of wording | Concerned about equivalence of data gatherers | Constructing summary scales of attitudes or other abstract concepts
Studies | Esp. important in longitudinal or experimental designs | Monitored, but not usually measured directly in surveys | Esp. used in attitudinal surveys
Stages | Pilot test or pretest | Pretest plus monitor in final study | Pretest or final study