
1
Lesson Six Reliability

2
**Contents**

- Definition of reliability
- Factors contributing to unreliability
- Types of reliability
- Indication of reliability: the reliability coefficient
- Ways of obtaining the reliability coefficient:
  - Alternate/parallel forms
  - Test-retest
  - Split-half and KR-21/KR-20
- Two ways of testing reliability
- How to make tests more reliable

Yun-Pi Yuan

3
**Definition of Reliability (1)**

“The consistency of measures across different times, test forms, raters, and other characteristics of the measurement context” (Bachman, 1990, p. 24).

If you give the same test to the same testees on two different occasions, the test should yield similar results.

4
**Definition of Reliability (2)**

A reliable test is consistent and dependable: its scores are consistent and reproducible. Reliability is the accuracy or precision with which a test measures something; that is, the consistency, dependability, or stability of test results.

5
**Factors Contributing to Unreliability**

$X = T + E$ (observed score = true score + error score)

Reliability is concerned with freedom from nonsystematic fluctuation. Fluctuations can occur in:

- the student
- scoring
- test administration
- the test itself

6
**Types of Reliability**

- Student- (or person-) related reliability
- Rater- (or scorer-) related reliability
  - Intra-rater reliability
  - Inter-rater reliability
- Test administration reliability
- Test (or instrument-related) reliability

7
**Student-Related Reliability (1)**

The error score comes from the test takers:

- Temporary illness
- Fatigue
- Anxiety
- Other physical or psychological factors
- Test-wiseness (i.e., strategies for efficient test taking)

8
**Student-Related Reliability (2)**

Principles:

- Assess on several occasions.
- Assess when the person is prepared and best able to perform well.
- Ensure that the person understands what is expected (e.g., instructions are clear).

9
**Rater (or Scorer) Reliability (1)**

Fluctuations: human error, subjectivity, and bias.

Principles:

- Use experienced, trained raters.
- Use more than one rater.
- Raters should carry out their assessments independently.

10
**Rater Reliability (2)**

Two kinds of rater reliability:

- Intra-rater reliability
- Inter-rater reliability

11
**Intra-Rater Reliability**

Fluctuations include:

- Unclear scoring criteria
- Fatigue
- Bias toward particular “good” and “bad” students
- Simple carelessness

12
**Inter-Rater Reliability (1)**

Fluctuations include:

- Lack of attention to scoring criteria
- Inexperience
- Inattention
- Preconceived biases

13
**Inter-Rater Reliability (2)**

- Used with subjective tests when two or more independent raters are involved in scoring.
- Train the raters before scoring (e.g., TWE; department oral and composition tests for recommended students).

14
**Inter-Rater Reliability (3)**

Compare the scores that different raters give to the same testee. If the correlation r between the raters' scores is high, there is inter-rater reliability.
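The comparison above can be sketched as a Pearson correlation between two raters' scores. This is only an illustration: the slides do not say which correlation statistic is intended, and the function name `pearson_r` and the score lists are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either rater gave everyone the same score.
    return cov / sqrt(ss_x * ss_y)

# Hypothetical composition scores: one entry per testee, same order for both raters.
rater_a = [4, 3, 5, 2, 4, 1]
rater_b = [4, 2, 5, 3, 4, 1]

r = pearson_r(rater_a, rater_b)
print(f"inter-rater r = {r:.2f}")
```

The same function could be applied to two administrations of a test (test-retest) or to two parallel forms, since each method correlates two sets of scores from the same testees.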

15
**Test Administration Reliability**

- Street noise (e.g., during a listening comprehension test)
- Photocopying variations
- Lighting
- Variations in temperature
- Condition of desks and chairs
- Monitors

16
**Test Reliability**

Measurement errors come from the test itself:

- The test is too long
- The test has a time limit
- The test format allows for guessing
- Ambiguous test items
- Test items with more than one correct answer

17
**Reliability Coefficient (r)**

Quantifying the reliability of a test allows us to compare the reliability of different tests.

- $0 \le r \le 1$ (ideally $r = 1$, which means the test gives precisely the same results for particular testees regardless of when it happens to be administered)
- $r = 1$: 100% reliable
- A good achievement test: $r \ge .90$
- $r < .70$: the test should not be used

18
**How to Get Reliability Coefficient**

- Two forms, two administrations: alternate/parallel forms
- One form, two administrations: test-retest
- One form, one administration (internal consistency):
  - split-half (Spearman-Brown procedure)
  - KR-21
  - KR-20

19
**Alternate/Parallel Forms**

Two forms, two administrations:

- Equivalent forms (i.e., different items testing the same topic) are taken by the same test taker on different days.
- If r is high, the test is said to have good reliability.
- This is the most stringent form.

20
**Test-Retest**

One form, two administrations:

- The same test is administered to the same testees after a short time lag, and r is then calculated between the two sets of scores.
- Appropriate for highly speeded tests.

21
**Split-half (Spearman-Brown Procedure)**

- One test, one administration
- Split the test into halves (e.g., odd questions vs. even questions) to form two sets of scores.
- Also called internal consistency

[Figure: questions Q1 to Q6 divided into a first half and a second half]

22
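**Split-half (2)**

Note that r, the correlation between the two halves, is not the reliability of the full test.

There is a mathematical relationship between test length and reliability: the longer the test, the more reliable it is. This is expressed by the Spearman-Brown prophecy formula:

$$\text{rel}_{\text{total}} = \frac{nr}{1 + (n-1)r}$$

Here n is the factor by which the test is lengthened and r is the reliability before lengthening.

E.g., if the correlation between the 2 parts of a test is r = .6, the reliability of the full test (n = 2) is .75. With n = 3, the formula gives .82.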
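The prophecy formula can be checked against the slide's own numbers with a few lines of Python. This is a minimal sketch; the function name `spearman_brown` is just an illustrative choice.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test with reliability r is made n times as long."""
    return n * r / (1 + (n - 1) * r)

half_r = 0.6  # correlation between the two halves, from the slide's example

full_test = spearman_brown(half_r, 2)  # two halves make up the full test
tripled = spearman_brown(half_r, 3)    # n = 3 with the same r

print(round(full_test, 2))  # 0.75
print(round(tripled, 2))    # 0.82
```

Because the predicted reliability rises with n for any positive r, the function also shows why a longer test is expected to be more reliable.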

23
**Kuder-Richardson Formula 21**

$$\text{KR-21} = \frac{k}{k-1}\left\{1 - \frac{\bar{x}\,(1 - \bar{x}/k)}{s^2}\right\}$$

- k = number of items
- $\bar{x}$ = mean
- s = standard deviation (for the formula, see Bailey 100)

The standard deviation describes how spread out a set of scores is (i.e., how far the scores deviate from the mean):

- $0 \le s$
- The larger s is, the more spread out the scores are.
- E.g., given 2 sets of scores, (5, 4, 3) and (7, 4, 1), which group in general behaves more similarly?
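A short Python sketch of the KR-21 computation and of the standard deviation comparison follows. It assumes the population variance (`statistics.pvariance`), which the slide does not specify. The function name `kr21` and the 10-item score list are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def kr21(total_scores, k):
    """KR-21 from test takers' total scores on a k-item test.

    Assumption: s^2 is the population variance of the total scores.
    Undefined if every test taker has the same total (s^2 = 0).
    """
    x_bar = mean(total_scores)
    s2 = pvariance(total_scores)
    return (k / (k - 1)) * (1 - (x_bar * (1 - x_bar / k)) / s2)

# The slide's two sets of scores: a smaller s means the group behaves more similarly.
sd_first = pstdev([5, 4, 3])
sd_second = pstdev([7, 4, 1])
print(f"s = {sd_first:.2f} vs s = {sd_second:.2f}")

# Hypothetical total scores of five test takers on a 10-item test.
scores = [3, 5, 6, 7, 9]
print(f"KR-21 = {kr21(scores, 10):.2f}")
```

KR-21 needs only the number of items, the mean, and the standard deviation, so it can be computed without item-level data.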

24
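**Kuder-Richardson Formula 20**

$$\text{KR-20} = \frac{k}{k-1}\left[1 - \frac{\sum pq}{s^2}\right]$$

- p = item difficulty (proportion of people who got an item right)
- q = 1 − p (i.e., proportion of people who got an item wrong)
- k and s are the number of items and the standard deviation, as in KR-21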
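As an illustration, KR-20 can be computed from a table of right/wrong item responses. The sketch below assumes items are scored 1 (right) or 0 (wrong) and that s² is the population variance of the total scores. The function name `kr20` and the response data are hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for a list of rows, one row per test taker, one 0/1 entry per item."""
    n_people = len(responses)
    k = len(responses[0])
    # p for each item: proportion of test takers who answered it correctly.
    p = [sum(row[i] for row in responses) / n_people for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # s^2: variance of the total scores (assumed population variance).
    totals = [sum(row) for row in responses]
    s2 = pvariance(totals)
    return (k / (k - 1)) * (1 - sum_pq / s2)

# Hypothetical responses of five test takers to a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"KR-20 = {kr20(responses):.2f}")
```

Unlike KR-21, this calculation uses the difficulty of each item separately, so it requires the item-level responses.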

25
**Ways of Testing Reliability**

- Examine the amount of variation: standard error of measurement (SEM). The smaller, the better.
- Calculate the reliability coefficient r. The bigger, the better.

26
**Standard Error of Measurement (1)**

- The average SD of an individual's scores over a large number of testings
- Captures the variability of an individual's scores
- Indicates how large the error component is likely to be
- Particularly useful in the interpretation of test scores

$$\text{SEM} = s\sqrt{1 - \text{rel}}$$

27
**Standard Error of Measurement (2)**

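The average of a set of scores equals the “true” score of the individual, because the error scores average out to zero:

$$\begin{aligned}
X_1 &= T_1 + E_1 \\
X_2 &= T_2 + E_2 \\
&\;\;\vdots \\
X_n &= T_n + E_n \\
\bar{X} &= \bar{T} + 0
\end{aligned}$$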

28
**Standard Error of Measurement (3)**

E.g., for the GRE, with SD = 100 and rel. = .91:

$$\text{SEM} = 100\sqrt{1 - .91} = 30$$

How do we apply the SEM in the interpretation of a score? For a given spread of scores, the greater the reliability coefficient, the smaller the SEM will be.
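The GRE example can be reproduced in Python, together with one common way of applying the SEM: reporting a band of one SEM on either side of an observed score. The band convention, the function name `sem`, and the observed score of 500 are assumptions for illustration and do not come from the slides.

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SD times the square root of (1 - reliability)."""
    return sd * sqrt(1 - reliability)

gre_sem = sem(100, 0.91)
print(round(gre_sem))  # 30

# Hypothetical observed score; the band is observed score plus or minus 1 SEM.
observed = 500
low, high = observed - gre_sem, observed + gre_sem
print(f"score band: {low:.0f} to {high:.0f}")

# For the same SD, a higher reliability coefficient gives a smaller SEM.
print(sem(100, 0.75) > sem(100, 0.91))  # True
```

A band of this kind is a reminder that an observed score is an estimate, and that small differences between two scores may fall within measurement error.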

29
**Ways of Enhancing Reliability**

General strategies:

- Consider possible sources of unreliability.
- Reduce or average out nonsystematic fluctuations in:
  - raters
  - persons
  - test administration
  - instruments

30
**How to Make Tests More Reliable? (1)**

- Take enough samples of behavior.
- Try to avoid ambiguous items.
- Provide clear and explicit instructions.
- Ensure tests are well laid out and perfectly legible.
- Provide uniform and non-distracting conditions of administration.
- Try to use objective tests.

31
**How to Make Tests More Reliable? (2)**

- Try to use direct tests.
- Have independent, trained raters.
- Provide a detailed scoring key.
- Try to identify the test takers by number, not by name.
- Try to have multiple independent scorings in subjective tests.

(Hughes, 1989, pp )
