1 Measurement Reliability and Validity: Evaluating Measures
2 Sources of measurement variation
Three components contribute to a respondent's/individual's score on a measure:
- A true score component (this is the good stuff)
- Non-systematic ('random') error ('noise')
- Systematic error ('bias')
3 Variance goods and bads
(Good) The 'true score' variance reflects the correct placement of the individual on the measure: it correctly reflects the individual's level on the concept being measured.
(Kind of bad) The non-systematic error variance widens the distribution of scores around the true score, but does not change the estimates of the population mean, percentages, etc.
4 Variance goods and bads
(Very bad) Systematic error (bias) misleads the researcher into believing that the population mean, percentages, etc. differ from the true values, or that relationships exist that really don't (and vice versa).
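The contrast between the two error types can be sketched with a small simulation. The true mean, error sizes, and sample size below are arbitrary assumptions chosen for illustration: zero-mean noise widens the spread of scores but leaves the estimated mean essentially unchanged, while a constant bias shifts the estimated mean away from the true value.

```python
import random

random.seed(42)

TRUE_MEAN = 50   # hypothetical true population mean (invented)
N = 10_000

true_scores = [random.gauss(TRUE_MEAN, 10) for _ in range(N)]

# Non-systematic error: zero-mean random noise added to each score
noisy = [t + random.gauss(0, 5) for t in true_scores]

# Systematic error: a constant bias of +3 points
biased = [t + 3 for t in true_scores]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(f"true   mean={mean(true_scores):.2f} var={var(true_scores):.1f}")
print(f"noisy  mean={mean(noisy):.2f} var={var(noisy):.1f}")    # mean ~unchanged, variance larger
print(f"biased mean={mean(biased):.2f} var={var(biased):.1f}")  # mean shifted by the bias
```

Averaged over many cases, the noise cancels out; the bias never does, which is why it is the more dangerous of the two.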
6 An example:
Let's say you give someone a ruler and want them to measure the height of preschoolers.
Because the measurer cannot simply line up the kids against the ruler and read off the score (she must find where one foot ends, hold that spot, move the ruler, etc.), error is introduced into the scores (height measurements).
- Non-systematic error is introduced if the measurer slips high sometimes and low other times when moving the ruler.
- Systematic error is introduced if she has a consistent tendency to start the second foot at a point that is actually between 11 and 12 inches.
7 Another example:
When we want to measure some psychological trait of people, we have no obvious way of doing it. We may come up with a list of questions meant to get at the trait, but we know that no single question will measure the trait as well as a ruler measures height. So we combine a number of questions, in the hope that the total from multiple questions will be a better measure than the score on any single question.
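Why combining items helps can be shown with a small simulation (the trait distribution, error size, and item count are made-up assumptions): each item is the true trait plus its own random error, so the mean of several items tracks the trait more closely than any single item does.

```python
import random

random.seed(0)

N_PEOPLE, N_ITEMS = 2000, 10   # invented sample and index sizes

traits = [random.gauss(0, 1) for _ in range(N_PEOPLE)]
# Each item = true trait + that item's own random error
items = [[t + random.gauss(0, 1) for _ in range(N_ITEMS)] for t in traits]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

single = [row[0] for row in items]                  # score on one item
index = [sum(row) / N_ITEMS for row in items]       # mean of all 10 items

print(f"single item vs trait: r = {pearson_r(single, traits):.2f}")
print(f"10-item index vs trait: r = {pearson_r(index, traits):.2f}")  # noticeably higher
```

Note that averaging only cancels the non-systematic error; if every item shares the same bias, the index inherits it.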
8 Measurement Validity
If the 'score' on the measure you use faithfully records a person's 'real' position on, or amount of, the concept under study, then the measure is valid.
Does your measure of 'fear of technology' really indicate subjects' relative levels on that concept?
Are forces other than 'fear of technology' determining subjects' scores?
9 Validity is crucial
If your method of measurement is not valid, your whole research project is worthless.
This is why careful explication (conceptualization, operationalization) is so important.
10 Threats to validity
There are a great number of threats to measurement validity. They may come from:
- The measures themselves
- The administration of the measures (anything from the lighting in the room to the attitude exhibited by the interviewer)
- The subjects (how they approach the research)
11 How to determine the validity of measures
- Reliability: if the measure is reliable, there is greater reason to expect it to be valid
- Tests of validity: a set of tests/approaches for determining measurement validity
- Outcome of the research: if the results of the research resemble those predicted, there is greater support for the measures
12 Reliability
When a measurement procedure is reliable, it will yield consistent scores when the phenomenon being measured is not changing.
If the phenomenon is changing, the scores will change in direct correspondence with the phenomenon.
13 Reliability
Reliability is an estimate of the amount of non-systematic error in scores produced by a measurement procedure.
It is often considered a minimum level of (or 'prerequisite for') measurement validity (Schutt).
Reliability is easier to assess than validity: statistical estimates of reliability are available.
14 Tests of reliability: Test-retest reliability
The same measure is applied to the same sample of individuals at two points in time.
Example: Students are given the same survey to complete two weeks apart. The results for each respondent are compared.
Two weeks should not be so short that the respondents remember their answers to the questions, nor so long that history or their own biological maturation changes their real scores.
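The comparison is typically summarized as a correlation between the two sets of scores. A minimal sketch, using invented scores for eight students:

```python
# Hypothetical scores for the same 8 students, two weeks apart (invented data)
time1 = [72, 65, 80, 55, 90, 68, 74, 61]
time2 = [70, 66, 82, 58, 88, 65, 75, 60]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.3f}")  # values near 1 indicate good stability
```

A high test-retest correlation says the scores are stable; it does not by itself say they are valid.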
15 Tests of reliability: Interitem reliability
"When researchers use multiple items to measure a single concept, they must be concerned with interitem reliability (or internal consistency). For example, if we are to have confidence that a set of questions reliably measures depression, the answers to the questions should be highly associated with one another. The stronger the association among the individual items and the more items that are included, the higher the reliability of the index. Cronbach's alpha is a statistic commonly used to measure interitem reliability."
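Cronbach's alpha can be computed directly from item and total-score variances. A minimal sketch, using invented responses on a hypothetical 4-item depression index:

```python
# Invented responses of 6 people to a hypothetical 4-item index (1-5 scale)
scores = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

k = len(scores[0])                                          # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])

# Standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")  # values of about 0.70+ are often treated as acceptable
```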
16 Tests of reliability: Alternate-forms reliability
Researchers compare subjects' answers to slightly different versions of survey questions:
- Reverse the order of response choices
- Modify question wording in minor ways
Split-halves reliability: the survey sample is randomly divided into two halves, and each half is administered a different form of the questionnaire. If the outcomes are very similar, then the measure is reliable on this criterion.
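The split-halves procedure can be sketched as follows: randomly assign respondents to two halves, give each half a different form, and compare the resulting distributions. All the numbers below are invented for illustration.

```python
import random

random.seed(1)

# Hypothetical: 200 respondents randomly split into two halves; each half
# answers a slightly different wording of the same question (1-5 scale)
n = 200
ids = list(range(n))
random.shuffle(ids)                       # random assignment to the halves
half_a, half_b = ids[:n // 2], ids[n // 2:]

# Simulated answers: both wordings tap the same underlying opinion
form_a = [random.gauss(3.5, 0.8) for _ in half_a]
form_b = [random.gauss(3.5, 0.8) for _ in half_b]

mean_a = sum(form_a) / len(form_a)
mean_b = sum(form_b) / len(form_b)
print(f"form A mean = {mean_a:.2f}, form B mean = {mean_b:.2f}")
# Similar results across the two forms support reliability on this criterion
```

The random split matters: it ensures any difference between the halves' answers reflects the forms, not the people.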
17 Tests of reliability: Interobserver reliability
Multiple observers apply the same method to the same people, events, places, or texts, and the results are compared.
This is most important when the rating task is complex.
Content analysis very commonly uses this method.
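Agreement between two observers is often summarized with a chance-corrected statistic such as Cohen's kappa. A minimal sketch with invented content-analysis codes:

```python
# Invented codes from two observers rating the same 10 texts
# (hypothetical content-analysis categories "pos", "neg", "neu")
obs1 = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "neu"]
obs2 = ["pos", "neg", "neu", "pos", "neg", "neg", "neu", "pos", "neg", "pos"]

n = len(obs1)
observed = sum(a == b for a, b in zip(obs1, obs2)) / n  # raw agreement rate

# Agreement expected by chance, from each observer's marginal proportions
cats = set(obs1) | set(obs2)
expected = sum((obs1.count(c) / n) * (obs2.count(c) / n) for c in cats)

# Kappa: how far observed agreement exceeds chance, as a share of the possible excess
kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```

Kappa is preferred over raw agreement because two observers who code at random will still agree some of the time.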
18 Measurement Validity
Measurement validity addresses the question of whether our measure is actually measuring what we think it is.
We may actually be measuring something else, or nothing at all.
19 Evaluating measurement validity: Face validity
Careful inspection of a measure to see if it seems valid, i.e., that it makes sense to someone who is knowledgeable about the concept and its measurement.
All measures should be evaluated in this way.
Face validity is a weak form of validation; it is not convincing on its own.
20 Evaluating measurement validity: Content validity
Content validity is a means to determine whether the measure covers the entire range of meaning of the concept and excludes meanings not falling under the concept.
- Compare the measure to the view of the concept generated by a literature review
- Have experts in the area review the measure and look for missing dimensions of the concept, or for inclusion of dimensions it should not cover
21 Evaluating measurement validity: Criterion validity
Compare scores on the measure to those generated by an already validated measure, or to the performance of a group known to be high or low on the measure.
- Concurrent validity: the measure is compared to the criterion at the same time
- Predictive validity: scores on the measure are compared to subjects' future performance (e.g., a test of sales potential)
Note: Many times, proper criteria for the test are not available.
22 Evaluating measurement validity: Construct validity
Compare the performance of the measure to the performance of other measures, as predicted by theory.
Two forms, convergent validity and discriminant validity, are often combined in an analysis of construct validity.
23 Convergent validity
Compare the results from your measure to those generated using other measures of the same concept.
They should be highly correlated (triangulation).
24 Discriminant validity
Performance on the measure to be validated is compared to scores on measures of different but related concepts.
Correlations with those measures should be low to moderate. Too high a correlation indicates either that the concepts are not distinct or that your measure cannot tell them apart. Too low a correlation indicates that your measure does not measure your concept validly.
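Both convergent and discriminant checks reduce to correlations. A sketch using invented scores on a hypothetical anxiety measure: the correlation with another anxiety measure should be high, while the correlation with a related-but-distinct concept (here, depression) should be only low to moderate.

```python
# Invented scores for 8 subjects (all numbers are hypothetical)
my_anxiety = [12, 18, 9, 22, 15, 7, 20, 11]      # measure being validated
other_anxiety = [11, 19, 10, 21, 14, 8, 22, 12]  # established measure, same concept
depression = [14, 15, 8, 16, 20, 9, 13, 12]      # related but distinct concept

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Convergent correlation should be high; discriminant, low to moderate
print(f"convergent r   = {pearson_r(my_anxiety, other_anxiety):.2f}")
print(f"discriminant r = {pearson_r(my_anxiety, depression):.2f}")
```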
25 Source: Trochim, Research Methods Knowledge Base
26 Outcome of the research: Theoretical performance
If your measure generates predictable statistical relationships with a number of measures of other concepts, you can make a stronger claim to the validity of the measure.
One would not expect either non-systematic or systematic error variance to act in the way predicted by your theory/model.