Presentation on theme: "Reliability and Validity of Research Instruments" (presentation transcript)
1 Reliability and Validity of Research Instruments: An overview
2 Measurement error
Error variance: the extent of variability in test scores that is attributable to error rather than to a true measure of the behavior.
Observed score = true score + error variance
(actual score obtained) = (stable score) + (chance/random error and systematic error)
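The observed-score equation can be illustrated with a small simulation. The true score of 80 and the error spread of 5 below are purely hypothetical values chosen for illustration:

```python
import random
import statistics

random.seed(0)

true_score = 80   # hypothetical stable score (assumed for illustration)
error_sd = 5      # hypothetical spread of chance/random error

# Observed score = true score + random error, repeated 1000 times
observed = [true_score + random.gauss(0, error_sd) for _ in range(1000)]

# Observed scores scatter around the true score; their variance
# (about error_sd ** 2 = 25) is the error variance.
print(statistics.mean(observed), statistics.variance(observed))
```

With only random error, the mean of many observations recovers the true score; systematic error would instead shift the mean away from it.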
3 Validity
The accuracy of the measure in reflecting the concept it is supposed to measure.
4 Reliability
Stability and consistency of the measuring instrument.
A measure can be reliable without being valid, but it cannot be valid without being reliable.
5 Validity
The extent to which, and how well, a measure measures a concept. Types:
face
content
construct
concurrent
predictive
criterion-related
6 Face validity
Just on its face, the instrument appears to be a good measure of the concept: "intuitive, arrived at through inspection."
E.g. concept = pain level; measure = verbal rating scale ("rate your pain from 1 to 10").
Face validity is sometimes considered a subtype of content validity.
Question: is there any time when face validity is not desirable?
7 Content validity
Content of the measure is justified by other evidence, e.g. the literature.
The entire range or universe of the construct is measured.
Usually evaluated and scored by experts in the content area.
A CVI (content validity index) of .80 or more is desirable.
8 Construct validity
Sensitivity of the instrument to pick up minor variations in the concept being measured. Can an instrument to measure anxiety pick up different levels of anxiety, or just its presence or absence?
Ways of arriving at construct validity:
hypothesis-testing method
convergent and divergent validity
multitrait-multimethod matrix approach
contrasted-groups approach (measure two groups known to differ on the construct)
factor-analysis approach
9 Concurrent validity
Correspondence of one measure of a phenomenon with another measure of the same construct, administered at the same time.
Two tools are used to measure the same concept and then a correlational analysis is performed. The tool already demonstrated to be valid is the "gold standard" with which the other measure must correlate.
10 Predictive validity
The ability of one measure to predict another, future measure of the same concept.
If IQ predicts SAT, and SAT predicts QPA, then shouldn't IQ predict QPA? (We could skip SATs for admission decisions.)
If scores on a parenthood-readiness scale indicate levels of integrity, trust, intimacy, and identity, couldn't this test be used to predict successful achievement of the developmental tasks of adulthood?
The researcher is usually looking for a more efficient way to measure a concept.
11 Criterion-related validity
The ability of a measure to measure a criterion (usually set by the researcher).
If the criterion set for professionalism in nursing is belonging to nursing organizations and reading nursing journals, then couldn't we just count memberships and subscriptions to come up with a professionalism score?
Can you think of a simple criterion to measure leadership?
Concurrent and predictive validity are often listed as forms of criterion-related validity.
12 Reliability
Homogeneity, equivalence, and stability of a measure over time and subjects: the instrument yields the same results over repeated measures and subjects.
Expressed as a correlation coefficient (degree of agreement between times and subjects), 0 to +1.
The reliability coefficient expresses the relationship between error variance, true variance, and the observed score.
The higher the reliability coefficient, the lower the error variance. Hence, the higher the coefficient, the more reliable the tool! .70 or higher is acceptable.
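In classical test theory, the reliability coefficient is the proportion of observed-score variance that is true-score variance. A minimal sketch, using hypothetical variance components:

```python
# Hypothetical variance components (assumed values for illustration)
true_variance = 70.0
error_variance = 30.0
observed_variance = true_variance + error_variance

# Reliability = true variance / observed variance:
# the lower the error variance, the higher the coefficient.
reliability = true_variance / observed_variance
print(reliability)  # 0.7 -- just meets the usual .70 benchmark
```

Shrinking the error variance (say, to 10.0) while holding true variance fixed pushes the coefficient up, which is exactly the "lower error variance, more reliable tool" relationship stated above.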
13 Stability
The same results are obtained over repeated administration of the instrument.
test-retest reliability
parallel, equivalent, or alternate forms
14 Test-retest reliability
The administration of the same instrument to the same subjects two or more times, under similar conditions (not before and after treatment).
Scores are correlated and expressed as a Pearson r (usually .70 is acceptable).
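A test-retest Pearson r can be computed by hand. The six subjects' scores below are hypothetical:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for six subjects at two administrations
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time1, time2)
print(round(r, 2))  # well above the usual .70 cutoff, so stability looks acceptable
```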
15 Parallel or alternate forms reliability
Parallel or alternate forms of a test are administered to the same individuals and the scores are correlated.
This is desirable when the researcher believes that repeated administration will result in "test-wiseness."
Sample: "I am able to tell my partner how I feel" / "My partner tries to understand my feelings"
17 Item-to-total correlations
Each item on an instrument is correlated to the total score; an item with a low correlation may be deleted. The highest and lowest correlations are usually reported.
Only important if you desire homogeneity of items.
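A common variant correlates each item with the total of the remaining items (the "corrected" item-total correlation, so the item is not correlated with itself). The 4-item, five-respondent data below are hypothetical, with the last item deliberately running against the others:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical 4-item scale, five respondents (rows)
responses = [
    [4, 3, 4, 1],
    [5, 4, 5, 2],
    [2, 2, 3, 5],
    [4, 5, 4, 2],
    [1, 2, 1, 4],
]

item_total = []
for item in range(4):
    item_scores = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]  # total minus the item
    item_total.append(round(pearson_r(item_scores, rest), 2))

print(item_total)
# The last item correlates negatively with the rest -- a candidate for
# deletion (or reverse scoring) if homogeneity of items is desired.
```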
18 Split-half reliability
Items are divided into two halves and then compared. Odd versus even items, or the first half (e.g., items 1-50) versus the second half, are two ways to split the items.
Only important when homogeneity and internal consistency are desirable.
19 Kuder-Richardson coefficient (KR-20)
An estimate of homogeneity when items have a dichotomous response, e.g. "yes/no" items.
Should be computed for a test on initial reliability testing, and recomputed for the actual sample.
Based on the consistency of responses to all of the items of a single form of a test.
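KR-20 needs only the per-item "yes" proportions and the variance of the total scores: KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of totals). The yes/no data below are hypothetical:

```python
# Hypothetical dichotomous responses (1 = yes, 0 = no): 5 people x 4 items
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
k = len(data[0])   # number of items
n = len(data)      # number of respondents

totals = [sum(row) for row in data]
mean_t = sum(totals) / n
var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance

# p = proportion answering "yes", q = 1 - p, summed over items
pq = 0.0
for item in range(k):
    p = sum(row[item] for row in data) / n
    pq += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq / var_t)
print(round(kr20, 2))  # 0.8 for this toy sample
```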
20 Cronbach's alpha
Used with a Likert-scale or linear graphic response format. Compares the consistency of responses across all items on the scale.
May need to be computed for each sample.
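Cronbach's alpha generalizes KR-20 to non-dichotomous items: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals). A sketch on hypothetical 1-5 Likert responses:

```python
# Hypothetical 1-5 Likert responses: 5 people x 3 items
data = [
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 4],
    [2, 2, 3],
    [4, 5, 4],
]
k = len(data[0])
n = len(data)

def pvar(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

item_vars = sum(pvar([row[i] for row in data]) for i in range(k))
total_var = pvar([sum(row) for row in data])

alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(round(alpha, 2))
```

Running KR-20 on dichotomous data gives the same number as alpha, which is why the two are usually described together.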
21 Equivalence
Consistency of agreement of observers using the same measure, or among alternate forms of a tool.
parallel or alternate forms (described under stability)
interrater reliability
22 Interrater reliability
Used with observational data.
Concordance between two or more observers' scores of the same event or phenomenon.
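Two common indices of interrater concordance are simple percent agreement and Cohen's kappa, which corrects agreement for chance. The two raters' codings below (hypothetical "aggressive"/"calm" labels for ten observed events) are invented for illustration:

```python
from collections import Counter

# Hypothetical codings of ten observed events by two raters
rater_a = ["agg", "calm", "agg", "calm", "calm", "agg", "calm", "agg", "calm", "calm"]
rater_b = ["agg", "calm", "agg", "agg", "calm", "agg", "calm", "calm", "calm", "calm"]

n = len(rater_a)
agree = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = agree / n

# Chance agreement: probability both raters pick the same category at random,
# based on each rater's own category frequencies
pa, pb = Counter(rater_a), Counter(rater_b)
p_chance = sum((pa[c] / n) * (pb[c] / n) for c in pa)

# Cohen's kappa: observed agreement beyond chance, scaled by its maximum
kappa = (percent_agreement - p_chance) / (1 - p_chance)
print(round(percent_agreement, 2), round(kappa, 2))
```

Kappa is noticeably lower than raw agreement here, showing why chance-corrected indices are preferred for observational data.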
23 Critiquing
Were reliability and validity data presented, and are they adequate?
Was the appropriate method used?
Was the reliability recalculated for the sample?
Are the limitations of the tool discussed?