Reliability and Validity

Presentation transcript:

Reliability and Validity 3. Threats to internal validity

Measurement Measurement is any process by which a value is assigned to the level or state of some quality of an object of study.

Measuring violence Violence against a woman can range from beatings, to sexual violence or torture, to broken bones and very serious injury caused by acid attacks or by burning the victim alive.

Measurement Measurement involves the expression of information in quantities (numbers) rather than in verbal statements. It provides a powerful means of reducing qualitative data to a more condensed form for summarization, manipulation, and analysis.

Measurement The best measure should be both reliable and valid

What is reliability? We often speak about "reliable cars." On the news, people talk about a "usually reliable source." In both cases, the word reliable usually means "dependable" or "trustworthy." In research, the term "reliable" also means dependable in a general sense, but that's not a precise enough definition.

Reliability Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.

Reliability A measure is considered reliable if a person's score on the same test given twice is similar

Reliability of measuring devices The slightest variations in measuring devices in Olympic track and field events (whether it is a tape or clock) could mean the difference between the gold and silver medals

Reliability of measuring devices Olympic measuring devices, then, must be reliable from one throw or race to another and from one competition to another They must also be reliable when used in different parts of the world, as temperature, air pressure, humidity, interpretation, or other variables might affect their readings

There are two ways that reliability is usually estimated: test/retest and internal consistency.

Test-Retest Reliability We estimate test-retest reliability when we administer the same test to the same sample on two different occasions

Test/Retest The idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The three main components of this method are as follows: 1) implement your measurement instrument at two separate times for each subject; 2) compute the correlation between the two separate measurements; 3) assume there is no change in the underlying condition between test 1 and test 2.
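As a minimal sketch of the test/retest computation, assuming hypothetical scores for five subjects on two administrations of the same test, the reliability estimate is simply the correlation between the two sets of scores (a value near 1 indicates high test-retest reliability):

```python
import numpy as np

# Hypothetical scores for five subjects on the same test,
# administered on two separate occasions.
test1 = np.array([12, 15, 11, 18, 14], dtype=float)
test2 = np.array([13, 14, 10, 19, 15], dtype=float)

# Test-retest reliability: the correlation between the two administrations.
r = np.corrcoef(test1, test2)[0, 1]
print(round(r, 2))
```

With these made-up scores the correlation is close to 1, so the instrument would be judged highly reliable under the assumption that nothing changed between the two administrations.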

Internal Consistency Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. After collecting the responses, run a correlation between the groups of questions to determine whether your instrument is reliably measuring that concept. Your computer output reports this as a single number, Cronbach's alpha, and, just like a correlation coefficient, the closer it is to one, the higher the reliability estimate of your instrument.
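A minimal sketch of the Cronbach's alpha computation, using made-up responses from six respondents to four items intended to measure the same concept (the formula is the standard alpha formula; the data are hypothetical):

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 items that are meant to
# measure the same concept (higher = more of the trait).
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [5, 4, 5, 5],
    [2, 3, 2, 3],
], dtype=float)

k = items.shape[1]                         # number of items
item_vars = items.var(axis=0, ddof=1)      # variance of each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale

# Cronbach's alpha: closer to 1 means higher internal consistency.
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))
```

Because these illustrative items rise and fall together across respondents, alpha comes out well above 0.9; items that did not hang together would drive it toward zero.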

Example: Deviance scale The offenses include the following: "How many times in the past year have you… carried a hidden weapon other than a plain pocket knife? attacked someone with the idea of seriously hurting or killing them? been involved in gang fights? hit or threatened to hit a teacher or other adult at school? hit or threatened to hit your parents? hit or threatened to hit other students? had or tried to have sexual relations with someone against their will?"

Validity Validity involves the degree to which you are measuring what you are supposed to measure. More simply, it is the accuracy of your measurement.

Four types of validity Conclusion validity Internal validity Construct validity External validity

Example Say we are studying the effect of strict attendance policies on class participation. Suppose we observe that class participation did increase after the policy was established. Each type of validity would highlight a different aspect of the relationship between our treatment (strict attendance policy) and our observed outcome (increased class participation).

Conclusion validity Conclusion validity asks: is there a relationship between the program and the observed outcome? Or, in our example, is there a connection between the attendance policy and the increased participation we saw?

Internal Validity The key question in internal validity is whether observed changes can be attributed to your program or intervention (i.e., the cause) and not to other possible causes (sometimes described as "alternative explanations" for the outcome)

Internal Validity

Construct validity Construct validity asks whether there is a relationship between how I operationalized my concepts in this study and the actual causal relationship I'm trying to study. Or, in our example, did our treatment (attendance policy) reflect the construct of attendance, and did our measured outcome (increased class participation) reflect the construct of participation?

External validity External validity refers to our ability to generalize the results of our study to other settings. In our example, could we generalize our results to other classrooms?

Reliability & Validity We often think of reliability and validity as separate ideas but, in fact, they're related to each other. One of my favorite metaphors is the target. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target. If you don't, you are missing the center. The more you are off for that person, the further you are from the center.

Reliability & Validity

Reliability & Validity The figure above shows four possible situations. In the first one, you are hitting the target consistently, but you are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (that is, it's consistent but wrong).

Reliability & Validity The second shows hits that are randomly spread across the target. You seldom hit the center of the target but, on average, you are getting the right answer for the group (though not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. Here, you can clearly see that reliability is directly related to the variability of your measure.

Reliability & Validity The third scenario shows a case where your hits are spread across the target and you are consistently missing the center. Your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario: you consistently hit the center of the target. Your measure is both reliable and valid.

Threats to Internal Validity Single group threats Multiple group threats Social interaction threats

Single group threats Apply when you are studying a single group receiving a program or treatment. Thus, all of these threats can be greatly reduced by adding to your study a control group that is comparable to your program group.

Single Group Threats

Single Group Design Threats History Maturation Mortality Instrumentation Testing Regression to the Mean

History threat Occurs when a historical event affects your program group such that it causes the outcome you observe (rather than your treatment being the cause). In our earlier example, this would mean that the stricter attendance policy did not cause the increase in class participation; rather, the expulsion from school of several students with low participation changed the group's average participation.

History Threat Consider a new math program for first graders. We know that lots of first graders watch the public TV program Sesame Street, and every Sesame Street show presents some very elementary math concepts. Perhaps these shows cause the outcome and not your math program.

Maturation Threat Suppose the new math program is tested. The children would have had the exact same outcome even if they had never had your special math training program. All you are doing is measuring the normal maturation or growth in understanding that occurs as part of growing up; your math program has no effect.

Maturation Threat Occurs when standard events over the course of time cause your outcome. For example, if by chance, the students who participated in your study on class participation all "grew up" naturally and realized that class participation increased their learning (how likely is that?) - that could be the cause of your increased participation, not the stricter attendance policy.

Testing Threat This threat only occurs in the pre-post design The pretest made some of the children more aware of that kind of math problem -- it "primed" them for the program so that when you began the math training they were ready for it in a way that they wouldn't have been without the pretest This is what is meant by a testing threat -- taking the pretest (not getting your program) affects how participants do on the posttest.

Instrumentation Threat Concerns the reliability of the instrument used to gauge the dependent variable. Examples include changes in the proficiency of a human observer or interviewer, or an inadequate test for first graders.

Mortality Threat Mortality means that subjects are dropping out of the study. Let's assume that in the math tutoring program we have a nontrivial dropout rate between pretest and posttest. Assume that the kids who are dropping out are the low scorers on the pretest math achievement test.

Mortality Threat If you look at the average gain from pretest to posttest using all of the scores available to you on each occasion, you would include these subsequent dropouts (with their low scores) in the pretest average but not in the posttest average. Because the potential low scorers have dropped out of the posttest, you'd be artificially inflating the posttest average over what it would have been if no students had dropped out.
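A small simulation with made-up scores shows how dropout alone can inflate the apparent gain, even when no individual student's score changes at all:

```python
import numpy as np

# Hypothetical pretest scores for 10 students; assume no real gain,
# so each student's posttest score equals their pretest score.
pretest = np.array([40, 45, 50, 55, 60, 65, 70, 75, 80, 85], dtype=float)
posttest = pretest.copy()

# The three lowest pretest scorers drop out before the posttest.
stayed = pretest >= 55

pre_mean = pretest.mean()            # computed over everyone
post_mean = posttest[stayed].mean()  # dropouts are missing at posttest

# The apparent "gain" is purely a mortality artifact.
print(pre_mean, post_mean)
```

Here the pretest average is 62.5 and the posttest average is 70.0: a 7.5-point "improvement" produced entirely by who dropped out, not by any treatment effect.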

Regression to the Mean Subjects with extreme scores on a first measure of the dependent variable tend to have scores closer to the mean on a second measure. According to Campbell (1969, p. 414): "Take any dependent measure that is repeatedly sampled, move along it as in a time dimension, and pick a point that is the 'highest (lowest) so far.' On the average, the next point will be lower (higher), nearer the general trend."
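A short simulation of this effect, under the assumption that each observed score is a stable true score plus independent measurement noise (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each subject has a stable true score; each observed score adds
# independent measurement noise.
true_score = rng.normal(50, 10, size=10_000)
measure1 = true_score + rng.normal(0, 10, size=10_000)
measure2 = true_score + rng.normal(0, 10, size=10_000)

# Select subjects with extreme (top 5%) scores on the first measure.
extreme = measure1 >= np.quantile(measure1, 0.95)

# On the second measure, those same subjects score closer to the mean,
# even though nothing about them changed.
print(measure1[extreme].mean(), measure2[extreme].mean())
```

The selected group's second-measure average falls back toward the overall mean of 50, illustrating why an extreme group picked on one measure looks "improved" (or "worsened") on the next measure without any treatment at all.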

Multiple Group Threats Involve the comparability of the two groups in your study, and whether any factor other than your treatment causes the outcome. They also (conveniently) mirror the single group threats to internal validity.

Social Interaction Threats The results of such research are affected by the human interactions involved. The social threats to internal validity refer to the social pressures in the research context that can lead to posttest differences that are not directly caused by the treatment itself. Many of these threats can be minimized by isolating the two groups from each other.

Diffusion or Imitation of Treatment This occurs when the control group learns about the program either directly or indirectly from the treatment group. Comparison group subjects, seeing what the program group is getting, might set up their own experience to try to imitate that of the program group. This can jeopardize your ability to assess whether your program is causing the outcome.

Compensatory Rivalry The control group knows what the treatment group is getting (a special math tutoring program) and develops a competitive attitude. The students feel jealous. This could lead them to decide to compete with the program group "just to show them" how well they can do.

Resentful Demoralization The opposite of compensatory rivalry. Students in the control group know what the program group is getting. Instead of developing a rivalry, they get discouraged or angry and give up (sometimes referred to as the "screw you" effect!). This threat is likely to exaggerate posttest differences between groups, making your program look even more effective than it actually is.

Compensatory Equalization of Treatment When control and treatment group participants are aware of each other's conditions, they may wish they were in the other group (depending on the perceived desirability of the program, it could work either way). If the special math tutoring program was being done with state-of-the-art computers, you can bet that the parents of the children assigned to the traditional non-computerized control group will pressure the principal to "equalize" the situation. Perhaps the principal will give the comparison group some other good, or let them have access to the computers for other subjects.

Compensatory Equalization of Treatment If these "compensating" programs equalize the groups on posttest performance, it will tend to work against your detecting an effective program even when it does work

The Solomon Four-Group Design The Solomon Four-Group Design deals with a potential testing threat. Recall that a testing threat occurs when the act of taking a test affects how people score on a retest or posttest. Note that two of the groups receive the treatment and two do not. Further, two of the groups receive a pretest and two do not. By explicitly including testing as a factor in the design, we are able to assess experimentally whether a testing threat is operating.
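The structure of the design can be sketched as a 2x2 crossing of pretesting and treatment (this layout follows the conventional experimental-design notation of randomly assigned groups; the code is just an illustrative representation):

```python
# A compact sketch of the Solomon Four-Group Design. All four groups
# are randomly assigned and all receive a posttest; they differ only
# in whether they get a pretest and whether they get the treatment.
solomon = {
    "group1": {"pretest": True,  "treatment": True},
    "group2": {"pretest": True,  "treatment": False},
    "group3": {"pretest": False, "treatment": True},
    "group4": {"pretest": False, "treatment": False},
}

# Two groups receive the treatment and two receive a pretest, so the
# design crosses pretesting with treatment as a 2x2 factorial:
# comparing treated-vs-untreated posttest differences within the
# pretested pair against the same difference within the unpretested
# pair shows whether a testing threat is operating.
n_treated = sum(g["treatment"] for g in solomon.values())
n_pretested = sum(g["pretest"] for g in solomon.values())
print(n_treated, n_pretested)  # 2 2
```

If the treatment effect looks the same in the pretested and unpretested pairs, taking the pretest did not alter how participants responded to the program.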
