Validity and Reliability


1 Validity and Reliability

2 VALIDITY In scientific research, validity refers to whether a study is able to scientifically answer the questions it is intended to answer. Instrument selection is important for validity because instruments are used to collect data, and data are used to make inferences related to the questions. Thus, the inferences about the specific uses of an instrument should also be validated.

3 What do we mean by validity of inferences?
Our inferences should be relevant to the purpose of the study (appropriate). If we want to see what our students' attitudes are towards learning English, there is no use in making inferences from their scores on English tests.
Our inferences should be meaningful and correct. We should be able to say something about the meaning of the information we collect. E.g., what does a high score on a particular test mean?

4 Our inferences should be useful
They should help researchers make a decision about what they are trying to find out. E.g., if you want to see the positive effects of formative assessment on student achievement, you should have information that will help you infer whether your students' achievement is affected by formative assessment or not. Thus, validity depends on the amount and type of evidence you have!

5 Kinds of evidence of validity
Content-related evidence of validity concerns the content and format of the instrument: the degree to which an instrument logically appears to measure the intended variable. How appropriate is the content? Is the format appropriate? Does it logically get at the intended variable? How adequately does the sample of items or questions represent the content to be assessed?

6 Two points to consider in content-related evidence
i) adequacy of sampling Whether the content of the instrument is an adequate sample of the domain of content it is supposed to represent. E.g., if you want to assess your students' achievement at the macro level, you need a sufficient number of items that tap this skill.

7 ii) format of the instrument
Clarity of printing, size of type, adequacy of work space, appropriateness of language, clarity of directions, etc. E.g., if you want to see students' attitudes towards English, the questionnaire should be in their native language if their level of target-language proficiency is not high enough.

8 How do we obtain content-related evidence of validity?
Write out the definition of what you want to measure and give this definition (together with the instrument and a description of the intended sample) to a number of judges. The judges look at the definition and place a checkmark in front of each item in the instrument that they feel does not measure the objectives. They also place a checkmark in front of each aspect of the definition that is not assessed by the instrument. In addition, they evaluate the appropriateness of the format. The researcher then rewrites the flagged items. This continues until all judges approve all items.

9 Example
Judge No: ___________
Match to Portfolio Assessment Objectives (each item is rated from No Match to Perfect Match)
A. RANGE
1. ability to link ideas in a variety of ways
2. ability to use a wide range of genres (stories, reports, articles, etc.)
3. evidence of various topics
B. FLEXIBILITY
4. evidence of variations in the style, vocabulary, tone, language, voice and ideas
5. evidence for the appropriateness of style, vocabulary, tone, language and voice
C. CONNECTIONS
6. evidence of applications of already-known concepts to newly-learned ones
7. evidence of new concepts and/or metaphors

10 General Aims of the Portfolio Assessment System
1. improving students' writing abilities
2. improving students' metacognitive skills
3. leading students to become autonomous language learners

Specific objectives of the Portfolio Assessment System
I- Helping students improve their linguistic skills in writing in terms of:
A) grammar, punctuation and spelling
B) vocabulary
C) coherence and cohesion
II- Helping students improve their metacognitive skills in terms of:
A) applying and/or creating new concepts or ideas
B) using varieties in writing appropriately
C) analysing and synthesising what they have learned/read
D) using other sources
III- Helping students become autonomous language learners in terms of:
A) applying their own views
B) connecting other sources with what they know

11 Criterion-Related Evidence
Comparing performance on one instrument with performance on some other instrument. Two forms are available:
a) predictive validity: compares the scores on the original test with scores on one or more criterion measures obtained in a follow-up testing
b) concurrent validity: compares the test results with results obtained through a parallel, substitute measure administered at about the same time

12 In both forms, a correlation coefficient is used.
A correlation coefficient (r) shows the degree of relationship that exists between the scores individuals obtain on two instruments. A positive relationship: a high (low) score on one instrument is accompanied by a high (low) score on the other. A negative relationship: a high (low) score on one instrument is accompanied by a low (high) score on the other. Correlation coefficients fall somewhere between -1.00 and +1.00. An r of .00 indicates that no relationship exists.
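
To make this concrete, here is a minimal sketch in Python of how r can be computed from two sets of scores. The function and the scores are hypothetical illustrations, not part of the original presentation or data from a real study.

def pearson_r(x, y):
    # Pearson correlation coefficient between two equal-length lists of scores.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / ((var_x * var_y) ** 0.5)

test_scores = [70, 85, 60, 90, 75]  # hypothetical scores on the original test
criterion = [65, 88, 58, 93, 72]    # hypothetical scores on a criterion measure
print(round(pearson_r(test_scores, criterion), 2))  # near +1.00: a strong positive relationship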

13 Construct-Related Evidence
Establishing a link between the underlying theoretical construct we wish to measure and the visible performance we choose to observe. Construct validation consists of building a strong logical case, based on circumstantial evidence, that a test measures the construct it is intended to measure.

14 Generally, there are three steps:
i) the variable being measured is clearly defined
ii) hypotheses, based on a theory underlying the variable, are formed about how people who possess a lot versus a little of the variable will behave in a particular situation
iii) the hypotheses are tested both logically and empirically

15 RELIABILITY The consistency of the scores obtained.
It is possible to have quite reliable but invalid scores (whereas unreliable scores can never be valid!). What is desirable is to have both high reliability and high validity.

16 Errors of Measurement When someone takes the same test twice, they rarely perform exactly the same, due to many factors. Such factors result in errors of measurement. Because of errors of measurement, researchers expect some variation in scores. Reliability estimates help researchers have an idea of how much variation to expect.

17 This estimate is another application of the correlation coefficient, known as a reliability coefficient.
A reliability coefficient again expresses a relationship, but this time between the scores of the same individuals on the same instrument at two different times, or between two parts of the same instrument. There are three best-known ways to obtain a reliability coefficient.

18 1. Test-Retest Method Administering the same test twice to the same group after a certain time interval. A reliability coefficient indicates the relationship between the two sets of scores obtained. The reliability coefficient is affected by the length of the time interval: the longer the interval, the lower the coefficient. The interval should be chosen by the researcher on the assumption that the individuals would retain their relative positions. Most of the time, a 1-3 month interval is sufficient!

19 2. Equivalent-Forms Method
Two different but equivalent (parallel) forms of an instrument are administered to the same group of individuals during the same time period. The questions (items) differ, but they sample the same content. A high reliability coefficient indicates strong evidence that the two forms are measuring the same thing.

20 3. Internal-Consistency Methods
There are several internal-consistency methods and they all require only a single administration of an instrument.

21 Split-half procedure The two halves of a test (e.g. odd items vs. even items) are scored separately, and a correlation coefficient is calculated for the two sets of scores. The Spearman-Brown prophecy formula is then applied to estimate the reliability of the whole test, as in the sketch below. The formula also shows that the reliability of a test (instrument) can be increased by adding more items.
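
A minimal sketch of the split-half computation in Python, assuming dichotomously scored items; the item matrix is made-up illustrative data, and statistics.correlation requires Python 3.10 or later.

from statistics import correlation

# Hypothetical item scores: one row per student, one column per item (1 = correct, 0 = wrong).
item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
]

odd_half = [sum(row[0::2]) for row in item_scores]   # total on items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in item_scores]  # total on items 2, 4, 6, 8

r_half = correlation(odd_half, even_half)  # reliability estimate for half the test
r_full = (2 * r_half) / (1 + r_half)       # Spearman-Brown prophecy formula
print(round(r_full, 2))

The Spearman-Brown step is also what justifies the point above: lengthening a test with comparable items raises the estimated reliability.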

22 Kuder-Richardson Approaches
Two formulas: KR20 and KR21. KR21 can be used when all items are assumed to be of equal difficulty; it requires only the number of items on the test, the mean, and the standard deviation of the scores. KR20 is more complicated to compute but must be used when you cannot assume that all items are of equal difficulty. Both are sketched below.
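
A minimal sketch of both formulas in Python; all the numbers (test length, mean, standard deviation, item difficulties, variance) are hypothetical values chosen only to make the arithmetic visible.

# KR21: needs only the number of items (K), the mean (M), and the
# standard deviation (s) of the total scores.
K, M, s = 50, 40.0, 4.0
kr21 = (K / (K - 1)) * (1 - (M * (K - M)) / (K * s ** 2))
print(round(kr21, 2))  # 0.51 for these made-up numbers

# KR20: additionally needs each item's difficulty p_i (the proportion of
# examinees answering item i correctly), with q_i = 1 - p_i.
item_p = [0.9, 0.8, 0.8, 0.7, 0.6]  # hypothetical difficulties for a 5-item test
k, s2 = len(item_p), 1.8            # s2: hypothetical variance of the 5-item total scores
kr20 = (k / (k - 1)) * (1 - sum(p * (1 - p) for p in item_p) / s2)
print(round(kr20, 2))  # 0.65 for these made-up numbers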

23 Alpha Coefficient (Cronbach's alpha) (α)
A general form of the KR20 formula. Used to calculate the reliability of items that are not scored simply right versus wrong, e.g. essay items where more than one answer is possible (see the sketch below).
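
A minimal sketch of Cronbach's alpha in Python; the 1-5 essay ratings below are made-up illustrative data.

from statistics import pvariance

# Hypothetical ratings: one row per student, one column per essay item (scored 1-5).
ratings = [
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
]

k = len(ratings[0])                                   # number of items
item_vars = [pvariance([row[i] for row in ratings]) for i in range(k)]
total_var = pvariance([sum(row) for row in ratings])  # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # about 0.91 for these made-up ratings

With dichotomous (0/1) items this computation reduces to KR20, which is why alpha is called the general form.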

24 Scoring Agreement When there is subjective evaluation (as in essay scoring), there is the possibility of observer differences. In such cases, scoring agreement should be reported, and raters require training to obtain reliability as high as possible. The expectation is a correlation of at least .90, or at least 80% agreement, as illustrated below.
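
A minimal sketch of percent agreement between two raters in Python; the scores are hypothetical, and "agreement" is taken here to mean identical scores on a paper (some studies instead count scores within one point of each other).

# Hypothetical scores assigned by two raters to the same ten essays (1-5 scale).
rater_a = [4, 3, 5, 2, 4, 3, 5, 1, 4, 3]
rater_b = [4, 3, 4, 2, 4, 3, 5, 2, 4, 3]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(f"{percent_agreement:.0f}% agreement")  # 80% for these made-up scores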

25 In the case of subjective rating, we can talk about two kinds of reliability:
Intra-rater reliability: similar to the test-retest strategy. The same rater scores the papers of the same group of students on two separate occasions (e.g. two weeks apart). Thus, intra-rater reliability is an estimate of the consistency of judgments over time.

26 Inter-rater reliability: similar to the equivalent-forms strategy, since the scores are obtained from two different raters. Inter-rater reliability estimates the extent to which two or more raters agree on the score that should be assigned to a written sample. A correlation coefficient is calculated between the scores; the obtained coefficient is then adjusted by the use of the Spearman-Brown prophecy formula (see the sketch below).
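
A minimal sketch of the inter-rater computation in Python, assuming the final operational score combines both raters' scores (which is what the Spearman-Brown adjustment estimates); the scores are hypothetical, and statistics.correlation requires Python 3.10 or later.

from statistics import correlation

# Hypothetical scores assigned by two raters to the same ten essays.
rater_a = [12, 15, 9, 18, 14, 11, 16, 10, 13, 17]
rater_b = [13, 14, 10, 17, 15, 10, 16, 11, 12, 18]

r = correlation(rater_a, rater_b)  # inter-rater correlation: reliability of a single rater
r_combined = (2 * r) / (1 + r)     # Spearman-Brown: estimated reliability of the combined scores
print(round(r, 2), round(r_combined, 2))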

