# Reliability and Validity


## Measurement

Measurement is any process by which a value is assigned to the level or state of some quality of an object of study.

## Measuring violence

Violence against a woman can range from beatings, to sexual violence or torture, to broken bones and very serious injuries caused by acid attacks or by burning the victim alive.

Measurement involves the expression of information in quantities (numbers) rather than in verbal statements. It provides a powerful means of reducing qualitative data to a more condensed form for summarization, manipulation, and analysis.

A good measure should be both reliable and valid.

## What is reliability?

We often speak about "reliable cars," and on the news people talk about a "usually reliable source." In both cases, the word reliable means "dependable" or "trustworthy." In research, "reliable" also means dependable in a general sense, but that is not a precise enough definition.

Reliability is the consistency of your measurement: the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.

A measure is considered reliable if a person's score on the same test, given twice, is similar.

## Reliability of measuring devices

The slightest variation in the measuring devices used in Olympic track and field events (whether a tape or a clock) could mean the difference between the gold and silver medals.

Olympic measuring devices, then, must be reliable from one throw or race to another, and from one competition to another. They must also be reliable when used in different parts of the world, since temperature, air pressure, humidity, human interpretation, or other variables might affect their readings.

There are two ways that reliability is usually estimated:

- test/retest
- internal consistency

## Test-Retest Reliability

We estimate test-retest reliability when we administer the same test to the same sample on two different occasions.

The idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The method has three main components:

1. administer your measurement instrument at two separate times to each subject;
2. compute the correlation between the two separate measurements;
3. assume there is no change in the underlying condition between test 1 and test 2.
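As a sketch of step 2, the test-retest estimate is simply the correlation between the two administrations. The scores below are purely hypothetical:

```python
import numpy as np

# Hypothetical scores for the same five subjects on two occasions
test1 = np.array([10.0, 14.0, 8.0, 12.0, 16.0])
test2 = np.array([11.0, 13.0, 9.0, 12.0, 15.0])

# Pearson correlation between the two administrations;
# values near 1 indicate high test-retest reliability
r = np.corrcoef(test1, test2)[0, 1]
print(round(r, 2))  # → 0.99
```

Subjects kept roughly the same rank order across the two occasions, so the correlation is high and the measure would be judged reliable.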

## Internal Consistency

Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. After collecting the responses, you run a correlation between the groups of questions to determine whether your instrument is reliably measuring that concept. Your statistics output reports a single number, Cronbach's alpha, which behaves like a correlation coefficient: the closer it is to one, the higher the reliability estimate of your instrument.
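A minimal sketch of how that one number is computed, using the standard formula alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores); the response data are hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item across respondents
    total_var = items.sum(axis=1).var(ddof=1)  # variance of each respondent's summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 5 respondents x 3 items meant to tap one concept
responses = [[2, 2, 3],
             [4, 4, 4],
             [1, 2, 1],
             [5, 5, 5],
             [3, 3, 2]]
alpha = cronbach_alpha(responses)
```

Here the three items move together across respondents, so alpha comes out close to one; items tapping unrelated concepts would pull it toward zero.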

## Example: deviance scale

The offenses include the following. "How many times in the past year have you…

- carried a hidden weapon other than a plain pocket knife?
- attacked someone with the idea of seriously hurting or killing them?
- been involved in gang fights?
- hit or threatened to hit a teacher or other adult at school?
- hit or threatened to hit your parents?
- hit or threatened to hit other students?
- had or tried to have sexual relations with someone against their will?"

## Validity

Validity involves the degree to which you are measuring what you are supposed to measure; more simply, the accuracy of your measurement.

## Four types of validity

- Conclusion validity
- Internal validity
- Construct validity
- External validity

## Example

Say we are studying the effect of strict attendance policies on class participation, and suppose we observe that class participation did increase after the policy was established. Each type of validity would highlight a different aspect of the relationship between our treatment (the strict attendance policy) and our observed outcome (increased class participation).

## Conclusion validity

Conclusion validity asks: is there a relationship between the program and the observed outcome? Or, in our example, is there a connection between the attendance policy and the increased participation we saw?

## Internal validity

The key question in internal validity is whether the observed changes can be attributed to your program or intervention (i.e., the cause) and not to other possible causes (sometimes described as "alternative explanations" for the outcome).


## Construct validity

Construct validity asks whether there is a relationship between how I operationalized my concepts in this study and the actual causal relationship I am trying to study. In our example, did our treatment (the attendance policy) reflect the construct of attendance, and did our measured outcome, increased class participation, reflect the construct of participation?

## External validity

External validity refers to our ability to generalize the results of our study to other settings. In our example, could we generalize our results to other classrooms?

## Reliability & Validity

We often think of reliability and validity as separate ideas but, in fact, they're related to each other. One of my favorite metaphors here is the target. Think of the center of the target as the concept you are trying to measure. For each person you measure, you are taking a shot at the target. If you measure the concept perfectly for a person, you hit the center of the target; if you don't, you miss the center. The more you are off for that person, the further you are from the center.

[Figure: four targets illustrating the combinations of reliable/unreliable and valid/invalid measurement]

The figure above shows four possible situations. In the first, you are hitting the target consistently, but you are missing the center; that is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable but not valid (it is consistent but wrong).

The second shows hits that are randomly spread across the target. You seldom hit the center, but on average you are getting the right answer for the group (though not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. Here you can clearly see that reliability is directly related to the variability of your measure.

The third scenario shows a case where your hits are spread across the target and you are consistently missing the center; your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario: you consistently hit the center of the target, and your measure is both reliable and valid.

## Threats to Internal Validity

- Single group threats
- Multiple group threats
- Social interaction threats

## Single group threats

Single group threats apply when you are studying a single group receiving a program or treatment. All of these threats can be greatly reduced by adding to your study a control group that is comparable to your program group.


The single group design threats are:

- History
- Maturation
- Mortality
- Instrumentation
- Testing
- Regression to the mean

## History threat

A history threat occurs when a historical event affects your program group such that it causes the outcome you observe (rather than your treatment being the cause). In our earlier example, this would mean that the stricter attendance policy did not cause the increase in class participation; rather, the expulsion from school of several students with low participation changed the group's participation.

Consider another example: a new math program for first graders. We know that lots of first graders watch the public TV program Sesame Street, and every Sesame Street show presents some very elementary math concepts. Perhaps these shows cause the outcome, and not your math program.

## Maturation threat

Suppose the new math program is tested, but the children would have had the exact same outcome even if they had never had your special math training. All you are measuring is the normal maturation or growth in understanding that occurs as part of growing up; your math program has no effect.

In general, a maturation threat occurs when standard developments over the course of time cause your outcome. For example, if by chance the students in your class participation study all "grew up" naturally and realized that class participation increased their learning (how likely is that?), that maturation, not the stricter attendance policy, could be the cause of the increased participation.

## Testing threat

This threat only occurs in the pre-post design. Suppose the pretest made some of the children more aware of that kind of math problem; it "primed" them for the program, so that when you began the math training they were ready for it in a way they wouldn't have been without the pretest. This is what is meant by a testing threat: taking the pretest (not receiving your program) affects how participants do on the posttest.

## Instrumentation threat

An instrumentation threat concerns the reliability of the instrument used to gauge the dependent variable. Examples include the changing proficiency of a human observer or interviewer, or an inadequate test for first graders.

## Mortality threat

Mortality means that subjects are dropping out of the study. Let's assume that in the math tutoring program we have a nontrivial dropout rate between pretest and posttest, and that the kids who drop out are the low scorers on the pretest math achievement test.

If you look at the average gain from pretest to posttest using all of the scores available to you at each occasion, you would include these low-scoring subsequent dropouts in the pretest but not in the posttest. By dropping the potential low scorers from the posttest, you would artificially inflate the posttest average over what it would have been if no students had dropped out.
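A small simulation makes the inflation concrete. The numbers are invented: every student truly gains about 5 points, but low pretest scorers drop out, so the naive gain (all available scores at each occasion) overstates the true gain:

```python
import numpy as np

rng = np.random.default_rng(1)
pretest = rng.normal(50, 10, 200)            # simulated pretest scores
posttest = pretest + rng.normal(5, 2, 200)   # everyone truly gains ~5 points

stayed = pretest > 40                        # low pretest scorers drop out

# Naive gain: all scores available at each occasion
# (dropouts counted in the pretest mean but missing from the posttest mean)
naive_gain = posttest[stayed].mean() - pretest.mean()

# Gain if no one had dropped out
full_gain = (posttest - pretest).mean()
```

Because the stayers are an above-average subset, `naive_gain` exceeds `full_gain`, making the program look better than it is.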

## Regression to the mean

Subjects with extreme scores on a first measure of the dependent variable tend to have scores closer to the mean on a second measure. According to Campbell (1969, p. 414): "Take any dependent measure that is repeatedly sampled, move along it as in a time dimension, and pick a point that is the 'highest (lowest) so far.' On the average, the next point will be lower (higher), nearer the general trend."

## Multiple group threats

Multiple group threats involve the comparability of the two groups in your study, and whether any factor other than your treatment causes the outcome. They also (conveniently) mirror the single group threats to internal validity.

## Social interaction threats

The results of such research are affected by the human interactions involved. The social threats to internal validity refer to social pressures in the research context that can lead to posttest differences not directly caused by the treatment itself. Many of these threats can be minimized by isolating the two groups from each other.

## Diffusion or imitation of treatment

This occurs when a control group learns about the program, either directly or indirectly, from the treatment group. Comparison group subjects, seeing what the program group is getting, might set up their own experience to try to imitate that of the program group. This can jeopardize your ability to assess whether your program is causing the outcome.

## Compensatory rivalry

The control group knows what the treatment group is getting (the special math tutoring program) and develops a competitive attitude toward them. The students feel jealous, which could lead them to decide to compete with the program group "just to show them" how well they can do.

## Resentful demoralization

This is the opposite of compensatory rivalry. Students in the control group know what the program group is getting, but instead of developing a rivalry they get discouraged or angry and give up (sometimes referred to as the "screw you" effect!). This threat is likely to exaggerate posttest differences between groups, making your program look even more effective than it actually is.

## Compensatory equalization of treatment

When control and treatment group participants are aware of each other's conditions, they may wish they were in the other group (depending on the perceived desirability of the program, it could work either way). If the special math tutoring program were being run on state-of-the-art computers, you can bet that the parents of the children assigned to the traditional, non-computerized control group would pressure the principal to "equalize" the situation. Perhaps the principal will give the comparison group some other good, or let them have access to the computers for other subjects.

If these "compensating" programs equalize the groups on posttest performance, they will tend to work against your detecting an effective program even when it does work.

## The Solomon Four-Group Design

The Solomon Four-Group Design is built to deal with a potential testing threat. Recall that a testing threat occurs when the act of taking a test affects how people score on a retest or posttest. In this design, two of the groups receive the treatment and two do not; further, two of the groups receive a pretest and two do not. By explicitly including testing as a factor in the design, we can assess experimentally whether a testing threat is operating.
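One simple way to read the design is to contrast posttest means across the four groups. The means below are hypothetical, and the two averaged contrasts are one common (not the only) way to separate the treatment effect from the testing effect:

```python
# Hypothetical posttest means for the four Solomon groups
# (O = observation, X = treatment):
#   Group 1: O X O   pretest + treatment
#   Group 2: O   O   pretest only
#   Group 3:   X O   treatment only
#   Group 4:     O   neither
posttest = {
    "pretest+treatment": 85.0,
    "pretest_only":      70.0,
    "treatment_only":    80.0,
    "control":           68.0,
}

# Treatment effect, averaged over the pretested and unpretested arms
treatment_effect = ((posttest["pretest+treatment"] - posttest["pretest_only"]) +
                    (posttest["treatment_only"] - posttest["control"])) / 2

# Testing effect: how much merely taking the pretest shifts the posttest
testing_effect = ((posttest["pretest+treatment"] - posttest["treatment_only"]) +
                  (posttest["pretest_only"] - posttest["control"])) / 2
```

With these made-up means, the treatment effect (13.5 points) is much larger than the testing effect (3.5 points); a testing effect near zero would indicate that the pretest itself is not driving the posttest scores.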
