Reliability
- A measurement procedure is reliable when it yields consistent scores while the phenomenon being measured is not changing.
- The degree to which scores are free of "measurement error."
- Consistency of measurement.
VALIDITY
- The extent to which measures indicate what they are intended to measure.
- The match between the conceptual definition and the operational definition.
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
- Necessary but not sufficient: reliability is a prerequisite for measurement validity.
- One needs reliability, but it is not enough on its own.
Example
- Measuring height with a reliable bathroom scale: the scale gives consistent readings, but they are not a valid measure of height.
- Measuring "aggression" with observer agreement by watching a child hit a Bobo doll.
Types of Reliability Measurement
1. Stability Reliability
2. Equivalence Reliability
Stability Reliability
- Test-retest: SAME TEST, DIFFERENT TIMES.
- Tests the phenomenon at two different times.
- The degree to which the two measurements of "Sam Ting" (the same thing), using the same measure, are related to one another.
- Only works if the phenomenon is unchanging between administrations.
Examples of Stability
- Administering the same questionnaire at two different times.
- Re-examining a client before deciding on an intervention strategy.
- Running a "trial" twice (e.g., errors in tennis serving).
Notes on Stability Reliability
- When the ratings are made by an observer rather than by the subjects themselves, this is called intraobserver (or intrarater) reliability.
- Answers about the past are less reliable when they are very specific, because the questions may exceed the subjects' capacity to remember accurately.
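To make the test-retest idea concrete, here is a minimal Python sketch (not from the slides; the scores are invented) that quantifies stability reliability as the Pearson correlation between the two administrations:

```python
# Test-retest (stability) reliability as a Pearson correlation.
# The scores below are invented example data: the same 8 respondents
# answering the same questionnaire at two different times.
from statistics import correlation  # available in Python 3.10+

time1 = [12, 15, 9, 20, 14, 18, 11, 16]
time2 = [13, 14, 10, 19, 15, 17, 12, 15]

# An r close to 1.0 suggests stable scores across administrations,
# assuming the phenomenon itself did not change between the two times.
r = correlation(time1, time2)
print(f"test-retest reliability r = {r:.2f}")
```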
Equivalence Reliability
- Inter-item (split-half)
- Parallel forms [different types of measures]
- Interobserver agreement: is every observer scoring the same way?
1. Inter-item Reliability (Internal Consistency)
- The association among answers to a set of questions designed to measure the same concept.
Note on Inter-item Reliability
- The stronger the association among individual items, and the more items included, the higher the reliability of an index.
- Cronbach's alpha is a statistic commonly used to measure inter-item reliability.
- Cronbach's alpha is based on the average of all the possible correlations among all the split halves of a set of questions on a questionnaire.
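As a rough illustration of the statistic named above, here is a short NumPy sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score); the response matrix is invented:

```python
# Cronbach's alpha for a set of items intended to measure one concept.
# Rows are respondents, columns are items; the data are invented.
import numpy as np

items = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
])

k = items.shape[1]                         # number of items
item_vars = items.var(axis=0, ddof=1)      # variance of each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```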
2. Parallel Forms Reliability
- Split-half (inter-item)
- Different types of measures
- Interobserver reliability: is everyone measuring the same thing?
- Different measures, same time.
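For the split-half entry above, a minimal sketch (invented data) that correlates odd-item and even-item half scores and applies the Spearman-Brown correction to estimate full-scale reliability:

```python
# Split-half reliability: split the items into two halves, correlate the
# half scores, then apply the Spearman-Brown correction to estimate the
# reliability of the full-length scale. Data are invented (6 items, 5 people).
from statistics import correlation

responses = [
    [4, 5, 4, 5, 3, 4],
    [2, 3, 2, 2, 1, 2],
    [5, 5, 4, 5, 5, 4],
    [3, 3, 3, 4, 3, 3],
    [1, 2, 2, 1, 2, 1],
]

# Odd-even split: one half is items 0, 2, 4; the other is items 1, 3, 5.
half1 = [sum(row[0::2]) for row in responses]
half2 = [sum(row[1::2]) for row in responses]

r = correlation(half1, half2)
split_half = 2 * r / (1 + r)  # Spearman-Brown step-up formula
print(f"half-score r = {r:.2f}, corrected split-half reliability = {split_half:.2f}")
```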
3. Interobserver Reliability
- The correspondence between measures made by different observers of the same phenomenon.
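A small sketch of how interobserver agreement is often checked, using invented codes from two hypothetical observers; percent agreement is reported along with Cohen's kappa, which corrects for chance agreement:

```python
# Interobserver reliability: do two observers score the same behavior alike?
# The codes below are invented: two observers coding 10 trials as
# "hit" (1) or "no hit" (0), e.g. the Bobo-doll aggression example.
obs_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
obs_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

n = len(obs_a)
p_observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n

# Chance agreement: probability both code 1 plus probability both code 0.
p_a1, p_b1 = sum(obs_a) / n, sum(obs_b) / n
p_chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"agreement = {p_observed:.0%}, Cohen's kappa = {kappa:.2f}")
```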
Note for Stat Students Only
- The text inadvertently describes a third type of reliability that we are not concerned with in this class: "goodness of fit" around a slope line. It is sometimes referred to as random measurement error.
- Save this for grad school =)
Note on Reliability
- For statistics people: the following quote refers to "goodness of fit" around a slope line due to measurement error.
- Secondary definition of reliability from a previous slide: "…or that the measured scores change in direct correspondence to actual changes in the phenomenon."
Face Validity
- Confidence gained from careful inspection of a measure to see if it is appropriate "on its face."
- In our [collective] intersubjective, informed judgment, have we measured what we want to measure?
- (N.B.: relies on good judgment.)
Example of Face Validity
- Rosenberg's self-esteem scale questions.
Content Validity
- Also called "sampling validity": establishes that the measure covers the full range of the concept's meaning, i.e., covers all dimensions of the concept.
- N.B.: depends on "good" judgment.
Examples of Content Validity
- The earlier SES scale from class.
- The authoritarian personality questions from Walizer & Wienir.
Note
- Actually, I think face and content validity are probably "Sam Ting."
EMPIRICAL Validity
- Establishes that the results from one measure match those obtained with a more direct or already-validated measure of the same phenomenon (the "criterion").
- Includes:
  - Concurrent validity
  - Predictive validity
Concurrent Validity
- Exists when a measure yields scores that are closely related to scores on a criterion measured at the same time.
- Does the new instrument correlate highly with an old measure of the same concept that we assume (judge) to be valid? (Use of "good" judgment.)
Examples of Concurrent Validity
- Aronson's doodle measure of achievement motivation.
- ACT vs. SAT scores.
Predictive Validity
- Exists when a measure is validated by predicting scores on a criterion measured in the future.
- Are future events, which we judge to be a result of the concept we are measuring, anticipated [predicted] by the scores we are attempting to validate?
- Use of "good" judgment.
Examples of Predictive Validity
- Bronson screening test for "at-risk" parenting, followed up later by interviewing and observing family members and school staff.
- SAT/ACT scores and later college "performance" (grades).
- Grades are "judged" to be measured validly.
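Both concurrent and predictive validity come down to correlating the measure being validated with a criterion judged to be valid; the only difference is when the criterion is measured. A minimal sketch with invented SAT/GPA numbers:

```python
# Criterion validity check: correlate the measure being validated with a
# criterion judged to be valid. Criterion measured at the same time ->
# concurrent validity; criterion measured later (e.g. SAT scores vs. later
# college GPA) -> predictive validity. All numbers are invented.
from statistics import correlation

sat_scores = [1100, 1350, 980, 1420, 1210, 1050, 1300, 1150]
college_gpa = [2.9, 3.6, 2.5, 3.8, 3.1, 2.8, 3.4, 3.0]

r = correlation(sat_scores, college_gpa)
print(f"predictive validity r = {r:.2f}")
```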
29 What’s a Construct? [NSB]* Multidimensional conceptSESIndustrializationFuzzy concept / hard to defineEgo strengthLoveConcept build out of other conceptsForce=mass * acceleration* Ya better know these!!!!!
Consider This
- If a construct is hard to conceptualize, doesn't it make sense that it will be more difficult to operationalize and validate?
Construct Validity
- Used when no clear criterion exists for validation purposes. Established by showing:
  1. that the measure is related to a variety of other measures, as specified in a theory;
  2. that the operationalization has a set of interrelated items; and
  3. that the operationalization has not included separate concepts.
Construct Validity (continued)
- Check the intercorrelation of the items used to measure the construct.
- Use theory to predict a relationship, use a judged-to-be-valid measure of the other variable, then check for the relationship.
- Demonstrate that your measure is NOT related to judged-to-be-valid measures of unrelated concepts.
Convergent Validity
- Achieved when one measure of a concept is associated with different types of measures of the same concept (this relies on the same logic as measurement triangulation).
- The measures are intercorrelated.
Example of Questions that Interrelate
- Questions for companionate… intimacy:
  - We get along well.
  - We communicate.
  - We like the same stuff.
  - Our chemistry is good.
  - We support each other.
Discriminant Validity
- Scores on the measure to be validated are compared with scores on measures of different but related concepts; discriminant validity is achieved if the measure to be validated is NOT strongly associated with the measures of those different concepts.
- The measure is not related to unrelated concepts.
Questions for Passion
- I think my partner is HOT.
- My partner turns me on.
- When I'm with my partner, I just feel the electricity.
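Putting the last few slides together, a simulated sketch (all data generated, the construct names are just for illustration): items built to share one construct ("intimacy") should intercorrelate (convergent validity), while the resulting scale score should not correlate strongly with a measure of a different concept (discriminant validity):

```python
# Convergent vs. discriminant validity sketch with simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
intimacy_true = rng.normal(size=n)

# Three intimacy items: mostly the shared construct plus item-specific noise.
intimacy_items = np.stack(
    [intimacy_true + rng.normal(scale=0.5, size=n) for _ in range(3)], axis=1
)
# A concept measured independently of intimacy.
unrelated = rng.normal(size=n)

# Convergent: items measuring the same construct intercorrelate highly.
item_corr = np.corrcoef(intimacy_items, rowvar=False)
off_diag = item_corr[np.triu_indices(3, k=1)]
print(f"convergent: mean inter-item r = {off_diag.mean():.2f}")  # high

# Discriminant: the scale score is NOT strongly related to the other concept.
intimacy_score = intimacy_items.sum(axis=1)
r_disc = np.corrcoef(intimacy_score, unrelated)[0, 1]
print(f"discriminant: r with unrelated concept = {r_disc:.2f}")  # near 0
```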
Using Theory
- The measure of the construct predicts what theory says it should.