Presentation is loading. Please wait.

Presentation is loading. Please wait.

Validity and Reliability Neither Valid nor Reliable Reliable but not Valid Valid & Reliable Fairly Valid but not very Reliable Think in terms of ‘the purpose.

Similar presentations


Presentation on theme: "Validity and Reliability Neither Valid nor Reliable Reliable but not Valid Valid & Reliable Fairly Valid but not very Reliable Think in terms of ‘the purpose."— Presentation transcript:

1 Validity and Reliability Neither Valid nor Reliable Reliable but not Valid Valid & Reliable Fairly Valid but not very Reliable Think in terms of ‘the purpose of tests’ and the ‘consistency’ with which the purpose is fulfilled/met

2 Validity Depends on the PURPOSE Depends on the PURPOSE E.g. a ruler may be a valid measuring device for length, but isn’t very valid for measuring volume E.g. a ruler may be a valid measuring device for length, but isn’t very valid for measuring volume Measuring what ‘it’ is supposed to Measuring what ‘it’ is supposed to Matter of degree (how valid?) Matter of degree (how valid?) Specific to a particular purpose! Specific to a particular purpose! Must be inferred from evidence; cannot be directly measured Must be inferred from evidence; cannot be directly measured Learning outcomes Learning outcomes 1.Content coverage (relevance?) 2.Level & type of student engagement (cognitive, affective, psychomotor) – appropriate?

3 Reliability Consistency in the type of result a test yields Consistency in the type of result a test yields –Time & space –participants Not perfectly similar result but ‘very close- to’ being similar Not perfectly similar result but ‘very close- to’ being similar When someone says you are a ‘reliable’ person, what do they really mean? When someone says you are a ‘reliable’ person, what do they really mean? Are you a reliable person? Are you a reliable person?

4 What do you think…? Forced-choice assessment forms are high in reliability, but weak in validity (true/false) Forced-choice assessment forms are high in reliability, but weak in validity (true/false) Performance-based assessment forms are high in both validity and reliability (true/false) Performance-based assessment forms are high in both validity and reliability (true/false) A test item is said to be unreliable when most students answered the item wrongly (true/false) A test item is said to be unreliable when most students answered the item wrongly (true/false) When a test contains items that do not represent the content covered during instruction, it is known as an unreliable test (true/false) When a test contains items that do not represent the content covered during instruction, it is known as an unreliable test (true/false) Test items that do not successfully measure the intended learning outcomes (objectives) are invalid items (true/false) Test items that do not successfully measure the intended learning outcomes (objectives) are invalid items (true/false) Assessment that does not represent student learning well enough are definitely invalid and unreliable (true/false) Assessment that does not represent student learning well enough are definitely invalid and unreliable (true/false) A valid test can sometimes be unreliable (true/false) A valid test can sometimes be unreliable (true/false) –If a test is valid, it is reliable! (by-product)

5 Indicators of quality Validity Validity Reliability Reliability Utility Utility Fairness Fairness Question: how are they all inter-related?

6 Two methods of estimating reliability Test/Retest Test/retest is the more conservative method to estimate reliability. Simply put, the idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The three main components to this method are as follows: Test/Retest Test/retest is the more conservative method to estimate reliability. Simply put, the idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The three main components to this method are as follows: 1.) implement your measurement instrument at two separate times for each subject; 2). compute the correlation between the two separate measurements; and 3) assume there is no change in the underlying condition (or trait you are trying to measure) between test 1 and test 2.

7 Internal Consistency Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. For example, you could write two sets of three questions that measure the same concept (say class participation) and after collecting the responses, run a correlation between those two groups of three questions to determine if your instrument is reliably measuring that concept. Internal Consistency Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. For example, you could write two sets of three questions that measure the same concept (say class participation) and after collecting the responses, run a correlation between those two groups of three questions to determine if your instrument is reliably measuring that concept. Two methods of estimating reliability

8 The primary difference between test/retest and internal consistency estimates of reliability is that test/retest involves two administrations of the measurement instrument, whereas the internal consistency method involves only one administration of that instrument. The primary difference between test/retest and internal consistency estimates of reliability is that test/retest involves two administrations of the measurement instrument, whereas the internal consistency method involves only one administration of that instrument.

9 Validity Cook and Campbell (1979) define it as the "best available approximation to the truth or falsity of a given inference, proposition or conclusion.“ In short, were we right? In short, were we right? Let's look at a simple example. Say we are studying the effect of strict attendance policies on class participation. We see that class participation did increase after the policy was established. Each type of validity would highlight a different aspect of the relationship between our treatment (strict attendance policy) and our observed outcome (increased class participation).

10 Types of Validity 1. Conclusion validity 2. Internal Validity 3. Construct validity 4. External validity

11 Conclusion validity asks is there a relationship between the program and the observed outcome? Or, in our example, is there a connection between the attendance policy and the increased participation we saw?

12 Internal Validity asks if there is a relationship between the program and the outcome we saw, is it a causal relationship? For example, did the attendance policy cause class participation to increase?

13 Threats To Internal Validity There are three main types of threats to internal validity – single group multiple group social interaction threats. Single Group Threats apply when you are studying a single group receiving a program or treatment. Thus, all of these threats can be greatly reduced by adding a control group that is comparable to your program group to your study.

14 History Threat occurs when an historical event affects your program group such that it causes the outcome you observe (rather than your treatment being the cause). In our earlier example, this would mean that the stricter attendance policy did not cause an increase in class participation, but rather, the expulsion of several students due to low participation from school impacted your program group such that they increased their participation as a result. Threats To Internal Validity- Single Group

15 A Maturation Threat to internal validity occurs when standard events over the course of time cause your outcome. For example, if by chance, the students who participated in your study on class participation all "grew up" naturally and realized that class participation increased their learning (how likely is that?) - that could be the cause of your increased participation, not the stricter attendance policy. A Maturation Threat to internal validity occurs when standard events over the course of time cause your outcome. For example, if by chance, the students who participated in your study on class participation all "grew up" naturally and realized that class participation increased their learning (how likely is that?) - that could be the cause of your increased participation, not the stricter attendance policy.

16 Threats To Internal Validity- Single Group A Testing Threat to internal validity is simply when the act of taking a pre-test affects how that group does on the post-test. For example, if in your study of class participation, you measured class participation prior to implementing your new attendance policy, and students became forewarned that there was about to be an emphasis on participation, they may increase it simply as a result of involvement in the pretest measure - and thus, your outcome could be a result of a testing threat - not your treatment. A Testing Threat to internal validity is simply when the act of taking a pre-test affects how that group does on the post-test. For example, if in your study of class participation, you measured class participation prior to implementing your new attendance policy, and students became forewarned that there was about to be an emphasis on participation, they may increase it simply as a result of involvement in the pretest measure - and thus, your outcome could be a result of a testing threat - not your treatment.

17 Threats To Internal Validity- Single Group An Instrumentation Threat to internal validity could occur if the effect of increased participation could be due to the way in which that pretest was implemented. An Instrumentation Threat to internal validity could occur if the effect of increased participation could be due to the way in which that pretest was implemented.

18 Threats To Internal Validity- Single Group A Mortality Threat to internal validity occurs when subjects drop out of your study, and this leads to an inflated measure of your effect. For example, if as a result of a stricter attendance policy, most students drop out of a class, leaving only those more serious students in the class (those who would participate at a high level naturally) - this could mean your effect is overestimated and suffering from a mortality threat. A Mortality Threat to internal validity occurs when subjects drop out of your study, and this leads to an inflated measure of your effect. For example, if as a result of a stricter attendance policy, most students drop out of a class, leaving only those more serious students in the class (those who would participate at a high level naturally) - this could mean your effect is overestimated and suffering from a mortality threat.

19 Multiple Group Threats involve the comparability of the two groups in your study, and whether or not any other factor other than your treatment causes the outcome. They also mirror the single group threats to internal validity. A Selection-History threat occurs when an event occurring between the pre and post test affects the two groups differently. A Selection-Maturation threat occurs when there are different rates of growth between the two groups between the pre and post test. Selection-Testing threat is the result of the different effect from taking tests between the two groups. A Selection-Instrumentation threat occurs when the test implementation affects the groups differently between the pre and post test. A Selection-Mortality Threat occurs when there are different rates of dropout between the groups which leads to you detecting an effect that may not actually occur. Finally, a Selection-Regression threat occurs when the two groups regress towards the mean at different rates.

20 Construct validity Asks if there is there a relationship between how I operationalized my concepts in this study to the actual causal relationship I'm trying to study In our example, did our treatment (attendance policy) reflect the construct of attendance, and did our measured outcome - increased class participation - reflect the construct of participation?

21 External validity Refers to our ability to generalize the results of our study to other settings. In our example, could we generalize our results to other classrooms? Other schools?

22 Types of validity measures Face validity Face validity Construct validity Construct validity Content validity Content validity Criterion validity Criterion validity 1.Predictive 2.Concurrent

23 Face Validity Does it appear to measure what it is supposed to measure? Does it appear to measure what it is supposed to measure? Example: Let’s say you are interested in measuring, ‘Propensity towards violence and aggression’. By simply looking at the following items, state which ones qualify to measure the variable of interest: Example: Let’s say you are interested in measuring, ‘Propensity towards violence and aggression’. By simply looking at the following items, state which ones qualify to measure the variable of interest: –Have you been arrested? –Have you been involved in physical fighting? –Do you get angry easily? –Do you sleep with your socks on? –Is it hard to control your anger? –Do you enjoy playing sports?

24 Construct Validity Does the test measure the ‘human’ CHARACTERISTIC(s) it is supposed to? Does the test measure the ‘human’ CHARACTERISTIC(s) it is supposed to? Examples of constructs or ‘human’ characteristics: Examples of constructs or ‘human’ characteristics: –Mathematical reasoning –Verbal reasoning –Musical ability –Spatial ability –Mechanical aptitude –Motivation

25 Content Validity Does the DV measure the empirical, operationalized variable being studied? Does the DV measure the empirical, operationalized variable being studied? Does the exam cover adequately cover the range of material? Does the exam cover adequately cover the range of material?

26 Criterion Validity The degree to which content on a test (predictor) correlates with performance on relevant criterion measures (concrete criterion in the "real" world?) The degree to which content on a test (predictor) correlates with performance on relevant criterion measures (concrete criterion in the "real" world?) If they do correlate highly, it means that the test (predictor) is a valid one! If they do correlate highly, it means that the test (predictor) is a valid one! E.g. if you taught skills relating to ‘public speaking’ and had students do a test on it, the test can be validated by looking at how it relates to actual performance (public speaking) of students inside or outside of the classroom E.g. if you taught skills relating to ‘public speaking’ and had students do a test on it, the test can be validated by looking at how it relates to actual performance (public speaking) of students inside or outside of the classroom

27 Two Types of Criterion Validity Concurrent Criterion Validity = how well performance on a test estimates current performance on some valued measure (criterion)? (e.g. test of dictionary skills can estimate students’ current skills in the actual use of dictionary – observation) Concurrent Criterion Validity = how well performance on a test estimates current performance on some valued measure (criterion)? (e.g. test of dictionary skills can estimate students’ current skills in the actual use of dictionary – observation) Predictive Criterion Validity = how well performance on a test predicts future performance on some valued measure (criterion)? (e.g. reading readiness test might be used to predict students’ achievement in reading) Predictive Criterion Validity = how well performance on a test predicts future performance on some valued measure (criterion)? (e.g. reading readiness test might be used to predict students’ achievement in reading)

28 Reliability Measure of consistency of test results from one administration of the test to the next Measure of consistency of test results from one administration of the test to the next Generalizability – consistency (interwoven concepts) – if a test item is reliable, it can be correlated with other items to collectively measure a construct or content mastery Generalizability – consistency (interwoven concepts) – if a test item is reliable, it can be correlated with other items to collectively measure a construct or content mastery

29 Measuring Reliability Test – retest Test – retest Give the same test twice to the same group with any time interval between tests Equivalent forms (similar in content, difficulty level, arrangement, type of assessment, etc.) Equivalent forms (similar in content, difficulty level, arrangement, type of assessment, etc.) Give two forms of the test to the same group in close succession Split-half Split-half Test has two equivalent halves. Give test once, score two equivalent halves (odd items vs. even items) Cronbach Alpha (SPSS) Cronbach Alpha (SPSS) Inter-item consistency – one test – one administration Inter-rater Consistency (subjective scoring) Inter-rater Consistency (subjective scoring)


Download ppt "Validity and Reliability Neither Valid nor Reliable Reliable but not Valid Valid & Reliable Fairly Valid but not very Reliable Think in terms of ‘the purpose."

Similar presentations


Ads by Google