Published by Braiden Gedney. Modified about 1 year ago.

1
Lesson Six: Reliability

2
Yun-Pi Yuan
Contents
Definition of reliability
Factors contributing to unreliability
Types of reliability
Indication of reliability: reliability coefficient
Ways of obtaining the reliability coefficient:
  Alternate/parallel forms
  Test-retest
  Split-half & KR-21/KR-20
Two ways of testing reliability
How to make tests more reliable

3
Definition of Reliability (1)
“The consistency of measures across different times, test forms, raters, and other characteristics of the measurement context” (Bachman, 1990, p. 24).
If you give the same test to the same testees on two different occasions, the test should yield similar results.

4
Definition of Reliability (2)
A reliable test is consistent and dependable.
Scores are consistent and reproducible.
Reliability is the accuracy or precision with which a test measures something; that is, the consistency, dependability, or stability of test results.

5
Factors Contributing to Unreliability
X = T + E (observed score = true score + error score)
Reliability is concerned with freedom from nonsystematic fluctuation.
Fluctuations in:
  the student
  scoring
  test administration
  the test itself

6
Types of Reliability
Student- (or person-) related reliability
Rater- (or scorer-) related reliability
  Intra-rater reliability
  Inter-rater reliability
Test administration reliability
Test (or instrument-related) reliability

7
Student-Related Reliability (1)
The error score comes from the test takers:
  Temporary illness
  Fatigue
  Anxiety
  Other physical or psychological factors
  Test-wiseness (i.e., strategies for efficient test taking)

8
Student-Related Reliability (2)
Principles:
  Assess on several occasions.
  Assess when the person is prepared and best able to perform well.
  Ensure that the person understands what is expected (e.g., instructions are clear).

9
Rater (or Scorer) Reliability (1)
Fluctuations: human error, subjectivity, and bias
Principles:
  Use experienced, trained raters.
  Use more than one rater.
  Raters should carry out their assessments independently.

10
Rater Reliability (2)
Two kinds of rater reliability:
  Intra-rater reliability
  Inter-rater reliability

11
Intra-Rater Reliability
Fluctuations including:
  Unclear scoring criteria
  Fatigue
  Bias toward particular good and bad students
  Simple carelessness

12
Inter-Rater Reliability (1)
Fluctuations including:
  Lack of attention to scoring criteria
  Inexperience
  Inattention
  Preconceived biases

13
Inter-Rater Reliability (2)
Used with subjective tests when two or more independent raters are involved in scoring.
Train the raters before scoring (e.g., TWE, dept. oral and composition tests for recommended students).

14
Inter-Rater Reliability (3)
Compare the scores given to the same testee by different raters.
If the correlation r between the raters' scores is high, there is inter-rater reliability.
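The comparison of two raters' scores can be sketched in a few lines of Python. This is only an illustration: it assumes that r is the Pearson correlation, and the scores in `rater_a` and `rater_b` are hypothetical, not data from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no spread")
    return cov / sqrt(var_x * var_y)


# Hypothetical composition scores given by two raters to the same five testees.
rater_a = [4, 5, 3, 2, 5]
rater_b = [4, 4, 3, 2, 5]

r = pearson_r(rater_a, rater_b)
print(f"inter-rater r = {r:.2f}")
```

A value close to 1 suggests the two raters rank the testees in nearly the same way; a low value suggests the scoring criteria or rater training need attention.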

15
Test Administration Reliability
Street noise (listening comprehension test)
Photocopying variations
Lighting
Variations in temperature
Condition of desks and chairs
Monitors

16
Test Reliability
Measurement errors come from the test itself:
  Test is too long
  Test has a time limit
  Test format allows for guessing
  Ambiguous test items
  Items with more than one correct answer

17
Reliability Coefficient (r)
Quantifying the reliability of a test allows us to compare the reliability of different tests.
0 ≤ r ≤ 1 (ideally r = 1, which means the test gives precisely the same results for particular testees regardless of when it happens to be administered).
If r = 1: 100% reliable
A good achievement test: r ≥ .90
If r < .70: the test should not be used

18
How to Get the Reliability Coefficient
Two forms, two administrations: alternate/parallel forms
One form, two administrations: test-retest
One form, one administration (internal consistency):
  split-half (Spearman-Brown procedure)
  KR-21
  KR-20

19
Alternate/Parallel Forms
Two forms, two administrations: equivalent forms (i.e., different items testing the same topic) taken by the same test taker on different days.
If r is high, the test is said to have good reliability.
This is the most stringent form.
[Diagram: a test plan yields Form A and Form B]

20
Test-Retest
One form, two administrations
The same test is administered to the same testees with a short time lag, and then r is calculated.
Appropriate for highly speeded tests
[Diagram: Test A, Trial 1 and Trial 2]

21
Split-half (Spearman-Brown Procedure)
One test, one administration
Split the test into halves (e.g., odd questions vs. even questions) to form two sets of scores.
Also called internal consistency
[Diagram: Q1 to Q6 divided into First Half and Second Half]
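The odd/even split can be sketched as follows. The response matrix is hypothetical (five testees, six items, 1 = right, 0 = wrong), and the sketch assumes the two sets of half scores are compared with a Pearson correlation. The result is the correlation between two half-length tests, not the reliability of the full test.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_scores(responses):
    """Return (odd-item totals, even-item totals), one pair per testee."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return odd, even


# Hypothetical data: one row per testee, one column per item.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

odd, even = split_half_scores(responses)
r_half = pearson_r(odd, even)
print(odd, even, round(r_half, 2))
```

An odd/even split is used here rather than first half vs. second half because fatigue or time pressure tends to affect late items more, which would make the two halves less comparable.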

22
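Split-half (2)
Note that the correlation r between the two halves is not the reliability of the whole test.
There is a mathematical relationship between test length and reliability: the longer the test, the more reliable it is.
Spearman-Brown Prophecy Formula: $\text{rel}_{\text{total}} = \dfrac{nr}{1 + (n-1)r}$, where n is the factor by which the test is lengthened.
E.g., if the correlation between the two halves of the test is r = .6, the reliability of the full test (n = 2) is .75.
If the number of items is tripled (n = 3): rel. = .82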
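The prophecy formula is easy to check numerically. The function name `spearman_brown` is an illustrative choice; the two calls reproduce the figures on the slide.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test with reliability r is made n times as long."""
    return (n * r) / (1 + (n - 1) * r)


# Correlation between the two halves is .6; the full test is twice as long.
full_test = spearman_brown(0.6, 2)
# The same starting point lengthened threefold.
tripled = spearman_brown(0.6, 3)

print(round(full_test, 2), round(tripled, 2))
```

With n = 1 the formula returns r unchanged, and each further increase in n adds less than the one before, so lengthening a test gives diminishing returns.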

23
Kuder-Richardson Formula 21
$\text{KR-21} = \dfrac{k}{k-1}\left\{1 - \dfrac{\bar{x}\,(1 - \bar{x}/k)}{s^{2}}\right\}$
k = number of items; $\bar{x}$ = mean; s = standard deviation (for the formula, see Bailey 100)
Standard deviation: a description of the spread-outness in a set of scores (or score deviations from the mean)
0 ≤ s; the larger s, the more spread out the scores
E.g., two sets of scores: (5, 4, 3) and (7, 4, 1); which group in general behaves more similarly?
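A short sketch can answer the question about the two score sets and evaluate KR-21. Two assumptions are made here: s is taken as the population standard deviation (the slide does not say which version is meant), and the test statistics passed to `kr21` (50 items, mean 35, s = 8) are hypothetical.

```python
from statistics import pstdev


def kr21(k, mean, sd):
    """KR-21 from the number of items k, the mean score, and the standard deviation."""
    if k < 2 or sd <= 0:
        raise ValueError("need at least 2 items and a positive standard deviation")
    return (k / (k - 1)) * (1 - (mean * (1 - mean / k)) / sd ** 2)


# The two score sets from the slide share a mean of 4 but differ in spread.
s_tight = pstdev([5, 4, 3])
s_wide = pstdev([7, 4, 1])
print(round(s_tight, 2), round(s_wide, 2))

# Hypothetical 50-item test with mean 35 and standard deviation 8.
rel = kr21(k=50, mean=35, sd=8)
print(round(rel, 3))
```

The first set has the smaller standard deviation, so its scores behave more similarly. KR-21 needs only k, the mean, and s, which makes it convenient when item-level data are unavailable.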

24
Kuder-Richardson Formula 20
$\text{KR-20} = \dfrac{k}{k-1}\left[1 - \dfrac{\sum pq}{s^{2}}\right]$
p = item difficulty (percent of people who got an item right)
q = 1 - p (i.e., percent of people who got an item wrong)
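KR-20 can be computed directly from a matrix of right/wrong responses. In this sketch the response data are hypothetical, p and q are treated as proportions between 0 and 1, and s² is assumed to be the population variance of the total scores.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 from a matrix of item scores (rows = testees, 1 = right, 0 = wrong)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    s2 = pvariance(totals)  # variance of the total scores
    if k < 2 or s2 == 0:
        raise ValueError("need at least 2 items and some spread in total scores")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion who got item j right
        sum_pq += p * (1 - p)                     # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / s2)


# Hypothetical data: five testees, six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

rel = kr20(responses)
print(round(rel, 3))
```

Unlike KR-21, this version uses each item's own difficulty, so it does not assume that all items are equally difficult.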

25
Ways of Testing Reliability
Examine the amount of variation: Standard Error of Measurement (SEM)
  The smaller the better
Calculate the “reliability coefficient” r
  The bigger the better

26
Standard Error of Measurement (1)
The average SD of an individual's scores over a large number of testings
The essence of the variability of an individual's scores
Indicates how large the error component is likely to be
Particularly useful in the interpretation of test scores
$SEM = S\sqrt{1 - \text{rel.}}$

27
Standard Error of Measurement (2)
Average of a set of scores = “true” score of the individual
$X_1 = T_1 + E_1$
$X_2 = T_2 + E_2$
$\vdots$
$X_n = T_n + E_n$
$\bar{X} = T + 0$ (the errors average out to zero)

28
Standard Error of Measurement (3)
E.g., GRE: SD = 100, rel. = .91
$SEM = 100\sqrt{1 - .91} = 30$
How do we apply the SEM in the interpretation of the score?
For a given spread of scores, the greater the reliability coefficient, the smaller the SEM will be.
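One common way to apply the SEM is to report a band around an observed score instead of a single point. The sketch below reproduces the GRE figures from the slide; the observed score of 600 and the choice of a band of plus or minus one SEM are assumptions made for illustration.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, width=1.0):
    """Band of +/- width SEMs around an observed score."""
    e = width * sem(sd, reliability)
    return observed - e, observed + e


gre_sem = sem(100, 0.91)                # figures from the slide
low, high = score_band(600, 100, 0.91)  # 600 is a hypothetical observed score
print(round(gre_sem), round(low), round(high))
```

Read this way, two testees whose observed scores differ by less than the SEM cannot be confidently ranked. Raising the reliability from .91 to .96 with the same SD would shrink the SEM from 30 to 20.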

29
Ways of Enhancing Reliability
General strategies:
  Consider possible sources of unreliability.
  Reduce or average out nonsystematic fluctuations in:
    raters
    persons
    test administration
    instruments

30
How to Make Tests More Reliable? (1)
Take enough samples of behavior.
Try to avoid ambiguous items.
Provide clear and explicit instructions.
Ensure tests are well laid out and perfectly legible.
Provide uniform and undistracted conditions of administration.
Try to use objective tests.

31
How to Make Tests More Reliable? (2)
Try to use direct tests.
Have independent, trained raters.
Provide a detailed scoring key.
Try to identify the test takers by number, not by name.
Try to have multiple independent scorings in subjective tests (Hughes, 1989, pp ).
