Presentation on theme: "Reliability and Validity"— Presentation transcript:
1 Reliability and Validity 3. Threats to internal validity
2 Measurement. MEASUREMENT is any process by which a value is assigned to the level or state of some quality of an object of study.
3 Measuring violence. Violence against a woman can range from beatings, to sexual violence or torture, to broken bones and very serious injury caused by pouring acid on the victim or burning her alive.
4 Measurement. Measurement involves the expression of information in quantities (numbers) rather than by verbal statement. It provides a powerful means of reducing qualitative data to a more condensed form for summarization, manipulation, and analysis.
5 Measurement. The best measure should be both reliable and valid.
6 What is reliability? We often speak about "reliable cars." On the news, people talk about a "usually reliable source." In both cases, the word reliable usually means "dependable" or "trustworthy." In research, the term "reliable" also means dependable in a general sense, but that's not a precise enough definition.
7 Reliability. Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.
8 Reliability. A measure is considered reliable if a person's score on the same test given twice is similar.
9 Reliability of measuring devices. The slightest variations in measuring devices in Olympic track and field events (whether it is a tape or a clock) could mean the difference between the gold and silver medals.
10 Reliability of measuring devices. Olympic measuring devices, then, must be reliable from one throw or race to another and from one competition to another. They must also be reliable when used in different parts of the world, as temperature, air pressure, humidity, interpretation, or other variables might affect their readings.
11 There are two ways that reliability is usually estimated: test/retest and internal consistency.
12 Test-Retest Reliability. We estimate test-retest reliability when we administer the same test to the same sample on two different occasions.
13 Test/Retest. The idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The three main components of this method are as follows: 1) implement your measurement instrument at two separate times for each subject; 2) compute the correlation between the two separate measurements; 3) assume there is no change in the underlying condition between test 1 and test 2.
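The retest logic above can be sketched in a few lines of Python. The scores below are made-up illustrations, not data from any real study, and `pearson_r` is a hand-rolled helper (a library such as SciPy would normally be used instead).

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores: the same 8 subjects tested on two occasions.
test1 = [12, 15, 11, 18, 14, 16, 10, 17]
test2 = [13, 14, 12, 19, 13, 17, 11, 16]

r = pearson_r(test1, test2)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

A high correlation (close to 1) suggests the instrument yields consistent scores across the two administrations, provided the underlying condition really did not change between them.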
14 Internal Consistency. Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. After collecting the responses, run a correlation between groups of questions to determine if your instrument is reliably measuring that concept. Your computer output generates one number for Cronbach's alpha, and, just like a correlation coefficient, the closer it is to one, the higher the reliability estimate of your instrument.
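As a rough sketch (with invented 1-5 responses, not real questionnaire data), Cronbach's alpha can be computed directly from its standard formula: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item)."""
    k = len(items)
    n = len(items[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Each respondent's total score across all items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical responses: 3 questions intended to tap the same concept,
# answered by 5 respondents on a 1-5 scale.
q1 = [4, 3, 5, 2, 4]
q2 = [4, 2, 5, 3, 4]
q3 = [5, 3, 4, 2, 5]

print(f"Cronbach's alpha = {cronbach_alpha([q1, q2, q3]):.2f}")
```

When the items move together across respondents, the variance of the totals dwarfs the sum of the item variances and alpha approaches 1.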
15 Example: Deviance scale. The offenses include the following: "How many times in the past year have you…
carried a hidden weapon other than a plain pocket knife?
attacked someone with the idea of seriously hurting or killing them?
been involved in gang fights?
hit or threatened to hit a teacher or other adult at school?
hit or threatened to hit your parents?
hit or threatened to hit other students?
had or tried to have sexual relations with someone against their will?"
17 Validity. Validity involves the degree to which you are measuring what you are supposed to measure. More simply, it is the accuracy of your measurement.
18 Four types of validity: conclusion validity, internal validity, construct validity, external validity.
19 Example. Say we are studying the effect of strict attendance policies on class participation. Suppose we observe that class participation did increase after the policy was established. Each type of validity would highlight a different aspect of the relationship between our treatment (strict attendance policy) and our observed outcome (increased class participation).
20 Conclusion validity. Conclusion validity asks: is there a relationship between the program and the observed outcome? Or, in our example, is there a connection between the attendance policy and the increased participation we saw?
21 Internal Validity. The key question in internal validity is whether observed changes can be attributed to your program or intervention (i.e., the cause) and not to other possible causes (sometimes described as "alternative explanations" for the outcome).
23 Construct validity. It asks: is there a relationship between how I operationalized my concepts in this study and the actual causal relationship I'm trying to study? Or, in our example, did our treatment (attendance policy) reflect the construct of attendance, and did our measured outcome (increased class participation) reflect the construct of participation?
24 External validity. External validity refers to our ability to generalize the results of our study to other settings. In our example, could we generalize our results to other classrooms?
25 Reliability & Validity. We often think of reliability and validity as separate ideas but, in fact, they're related to each other. One of my favorite metaphors is the target. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target. If you don't, you are missing the center. The more you are off for that person, the further you are from the center.
27 Reliability & Validity. The figure above shows four possible situations. In the first one, you are hitting the target consistently, but you are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (that is, it's consistent but wrong).
28 Reliability & Validity. The second shows hits that are randomly spread across the target. You seldom hit the center of the target but, on average, you are getting the right answer for the group (but not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. Here, you can clearly see that reliability is directly related to the variability of your measure.
29 Reliability & Validity. The third scenario shows a case where your hits are spread across the target and you are consistently missing the center. Your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario -- you consistently hit the center of the target. Your measure is both reliable and valid.
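The four target scenarios can be mimicked with a toy simulation: systematic bias plays the role of (in)validity, and random spread plays the role of (un)reliability. All numbers below are arbitrary illustration values, not from the slides.

```python
import random

random.seed(0)
TRUE_VALUE = 100  # the "center of the target"

def simulate(bias, spread, n=1000):
    """Draw n 'shots': measurements with a systematic bias and random spread."""
    shots = [TRUE_VALUE + bias + random.gauss(0, spread) for _ in range(n)]
    mean = sum(shots) / n
    sd = (sum((s - mean) ** 2 for s in shots) / n) ** 0.5
    return mean, sd

scenarios = {
    "reliable, not valid":              (10, 1),   # consistent but off-center
    "valid on average, not reliable":   (0, 10),   # scattered around center
    "neither reliable nor valid":       (10, 10),  # scattered and off-center
    "reliable and valid (Robin Hood)":  (0, 1),    # tight and on-center
}

for name, (bias, spread) in scenarios.items():
    mean, sd = simulate(bias, spread)
    print(f"{name:33s} mean={mean:6.1f}  spread={sd:5.1f}")
```

The group mean reveals validity (how far from 100 it sits), while the spread reveals reliability, mirroring the four panels of the target figure.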
30 Threats to Internal Validity: single group threats, multiple group threats, social interaction threats.
31 Single group threats. These apply when you are studying a single group receiving a program or treatment. Thus, all of these threats can be greatly reduced by adding a control group, comparable to your program group, to your study.
33 Single Group Design Threats: history, maturation, mortality, instrumentation, testing, regression to the mean.
34 History threat. Occurs when a historical event affects your program group such that it causes the outcome you observe (rather than your treatment being the cause). In our earlier example, this would mean that the stricter attendance policy did not cause an increase in class participation; rather, the expulsion from school of several students with low participation affected group participation.
35 History Threat. Consider a new math program for first graders. We know that lots of first graders watch the public TV program Sesame Street, and every Sesame Street show presents some very elementary math concepts. Perhaps these shows cause the outcome, not your math program.
36 Maturation Threat. A new math program is tested. The children would have had the exact same outcome even if they had never had your special math training program. All you are doing is measuring normal maturation or growth in understanding that occurs as part of growing up -- your math program has no effect.
37 Maturation Threat. Occurs when standard events over the course of time cause your outcome. For example, if by chance the students who participated in your study on class participation all "grew up" naturally and realized that class participation increased their learning (how likely is that?), that could be the cause of your increased participation, not the stricter attendance policy.
38 Testing Threat. This threat only occurs in the pre-post design. The pretest made some of the children more aware of that kind of math problem -- it "primed" them for the program so that when you began the math training they were ready for it in a way that they wouldn't have been without the pretest. This is what is meant by a testing threat: taking the pretest (not getting your program) affects how participants do on the posttest.
39 Instrumentation Threat. Concerns the reliability of the instrument used to gauge the dependent variable. Examples include the proficiency of a human observer(s) or interviewer(s), or an inadequate test for first graders.
40 Mortality Threat. Mortality means that subjects are dropping out of the study. Let's assume that in a math tutoring program we have a nontrivial dropout rate between pretest and posttest. Assume that the kids who are dropping out are the low pretest math achievement test scorers.
41 Mortality Threat. If you look at the average gain from pretest to posttest using all of the scores available to you at each occasion, you would include these low-pretest subsequent dropouts in the pretest and not in the posttest. By dropping the potential low scorers from the posttest, you'd be artificially inflating the posttest average over what it would have been if no students had dropped out.
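The inflation described above is easy to see with made-up numbers: suppose every student truly gains exactly 5 points, but the two weakest pretest scorers drop out before the posttest.

```python
# Hypothetical pretest/posttest scores for 6 students in the tutoring program.
pre = [40, 45, 70, 75, 80, 85]
post = [s + 5 for s in pre]       # every student's true gain is 5 points
stayed = [s >= 70 for s in pre]   # the two lowest scorers (40, 45) drop out

pre_mean = sum(pre) / len(pre)
post_mean_observed = (sum(p for p, kept in zip(post, stayed) if kept)
                      / sum(stayed))

print(f"pretest mean (all students):      {pre_mean:.1f}")
print(f"posttest mean (dropouts missing): {post_mean_observed:.1f}")
print(f"apparent gain: {post_mean_observed - pre_mean:.1f} (true gain: 5.0)")
```

The apparent gain is over three times the true gain, purely because the low scorers are counted in the pretest average but missing from the posttest average.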
42 Regression to the Mean. Subjects with extreme scores on a first measure of the dependent variable tend to have scores closer to the mean on a second measure. According to Campbell (1969, p. 414): "Take any dependent measure that is repeatedly sampled, move along it as in a time dimension, and pick a point that is the 'highest (lowest) so far.' On the average, the next point will be lower (higher), nearer the general trend."
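Campbell's point can be demonstrated with a toy simulation (all parameters here are invented): subjects selected for extreme scores on a first measurement score closer to the mean the second time, purely because the measurement noise that helped make them extreme does not repeat.

```python
import random

random.seed(1)

# Each subject has a stable true score; each measurement adds independent noise.
def measure(true_score, noise_sd=10):
    return true_score + random.gauss(0, noise_sd)

true_scores = [random.gauss(100, 10) for _ in range(5000)]
first = [measure(t) for t in true_scores]
second = [measure(t) for t in true_scores]

# Select the subjects who were extreme (top 5%) on the first measurement.
cutoff = sorted(first)[int(0.95 * len(first))]
extreme = [i for i, f in enumerate(first) if f >= cutoff]

m1 = sum(first[i] for i in extreme) / len(extreme)
m2 = sum(second[i] for i in extreme) / len(extreme)
print(f"extreme group, first measure:  {m1:.1f}")
print(f"extreme group, second measure: {m2:.1f}  (closer to the mean of 100)")
```

No treatment intervened between the two measurements, yet the extreme group's average drops toward 100; a single-group pre-post study that selects subjects for extreme pretest scores will mistake this drift for a program effect.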
43 Multiple Group Threats. These involve the comparability of the two groups in your study, and whether or not any factor other than your treatment causes the outcome. They also (conveniently) mirror the single group threats to internal validity.
44 Social Interaction Threats. The results of such research are affected by the human interactions involved. The social threats to internal validity refer to the social pressures in the research context that can lead to posttest differences that are not directly caused by the treatment itself. Many of these threats can be minimized by isolating the two groups from each other.
45 Diffusion or Imitation of Treatment. This occurs when a control group learns about the program either directly or indirectly from the treatment group. Comparison group subjects, seeing what the program group is getting, might set up their own experience to try to imitate that of the program group. It can jeopardize your ability to assess whether your program is causing the outcome.
46 Compensatory Rivalry. The control group knows what the treatment group is getting (a special math tutoring program) and develops a competitive attitude with them. The students feel jealous. This could lead them to decide to compete with the program group "just to show them" how well they can do.
47 Resentful Demoralization. The opposite of compensatory rivalry. Students in the control group know what the program group is getting. Instead of developing a rivalry, they get discouraged or angry and they give up (sometimes referred to as the "screw you" effect!). This threat is likely to exaggerate posttest differences between groups, making your program look even more effective than it actually is.
48 Compensatory Equalization of Treatment. When control and treatment group participants are aware of each other's conditions, they may wish they were in the other group (depending on the perceived desirability of the program, it could work either way). If the special math tutoring program was being done with state-of-the-art computers, you can bet that the parents of the children assigned to the traditional non-computerized control group will pressure the principal to "equalize" the situation. Perhaps the principal will give the comparison group some other good, or let them have access to the computers for other subjects.
49 Compensatory Equalization of Treatment. If these "compensating" programs equalize the groups on posttest performance, it will tend to work against your detecting an effective program even when it does work.
50 The Solomon Four-Group Design. The Solomon Four-Group Design is designed to deal with a potential testing threat. Recall that a testing threat occurs when the act of taking a test affects how people score on a retest or posttest. Note that two of the groups receive the treatment and two do not. Further, two of the groups receive a pretest and two do not. By explicitly including testing as a factor in the design, we are able to assess experimentally whether a testing threat is operating.
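A minimal sketch of the design's logic, with entirely hypothetical posttest means: comparing the two pretested groups and the two unpretested groups gives two estimates of the treatment effect, and comparing the two untreated groups isolates the testing effect.

```python
# Solomon four-group layout (R = random assignment, O = observation, X = treatment):
#   Group 1: R  O  X  O    (pretest + treatment)
#   Group 2: R  O     O    (pretest only)
#   Group 3: R     X  O    (treatment only, no pretest)
#   Group 4: R        O    (neither)

# Hypothetical posttest means, chosen only to illustrate the comparisons:
posttest = {
    "pretest+treatment": 78,
    "pretest only":      66,
    "treatment only":    74,
    "neither":           62,
}

# Treatment effect estimated with and without a pretest:
effect_with_pretest = posttest["pretest+treatment"] - posttest["pretest only"]
effect_without_pretest = posttest["treatment only"] - posttest["neither"]

# Testing effect: how much does merely taking the pretest raise posttest scores?
testing_effect = posttest["pretest only"] - posttest["neither"]

print(f"treatment effect (pretested groups):   {effect_with_pretest}")
print(f"treatment effect (unpretested groups): {effect_without_pretest}")
print(f"testing threat estimate:               {testing_effect}")
```

If the two treatment-effect estimates agree, pretesting did not distort the program's effect; a nonzero gap between the two untreated groups signals that a testing threat is operating.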