Lesson Six: Reliability. Yun-Pi Yuan.

1 Lesson Six Reliability

2 Contents: Definition of reliability. Factors contributing to unreliability. Types of reliability. Indication of reliability: reliability coefficient. Ways of obtaining the reliability coefficient: alternate/parallel forms, test-retest, split-half, and KR-21/KR-20. Two ways of testing reliability. How to make tests more reliable.

3 Definition of Reliability (1): “The consistency of measures across different times, test forms, raters, and other characteristics of the measurement context” (Bachman, 1990, p. 24). If you give the same test to the same testees on two different occasions, the test should yield similar results.

4 Definition of Reliability (2): A reliable test is consistent and dependable. Scores are consistent and reproducible. Reliability is the accuracy or precision with which a test measures something; that is, the consistency, dependability, or stability of test results.

5 Factors Contributing to Unreliability: $X = T + E$ (observed score = true score + error score). Reliability is concerned with freedom from nonsystematic fluctuation. Fluctuations arise in the student, in scoring, in test administration, and in the test itself.

6 Types of Reliability: Student- (or person-) related reliability. Rater- (or scorer-) related reliability: intra-rater reliability and inter-rater reliability. Test administration reliability. Test (or instrument-related) reliability.

7 Student-Related Reliability (1): The error score comes from the test takers: temporary illness; fatigue; anxiety; other physical or psychological factors; test-wiseness (i.e., strategies for efficient test taking).

8 Student-Related Reliability (2): Principles: assess on several occasions; assess when the person is prepared and best able to perform well; ensure that the person understands what is expected (e.g., instructions are clear).

9 Rater (or Scorer) Reliability (1): Fluctuations include human error, subjectivity, and bias. Principles: use experienced, trained raters; use more than one rater; raters should carry out their assessments independently.

10 Rater Reliability (2): Two kinds of rater reliability: intra-rater reliability and inter-rater reliability.

11 Intra-Rater Reliability: Fluctuations include unclear scoring criteria; fatigue; bias toward particular good and bad students; simple carelessness.

12 Inter-Rater Reliability (1): Fluctuations include lack of attention to scoring criteria; inexperience; inattention; preconceived biases.

13 Inter-Rater Reliability (2): Used with subjective tests when two or more independent raters are involved in scoring. Train the raters before scoring (e.g., TWE, dept. oral and composition tests for recommended students).

14 Inter-Rater Reliability (3): Compare the scores given to the same testee by different raters. If r is high, there is inter-rater reliability.
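
One common way to compute the r mentioned on this slide is the Pearson correlation between two raters' scores for the same testees. The following is a minimal sketch under that assumption; the function name `pearson_r` and the score lists are hypothetical and not from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one rater gives constant scores")
    return cov / sqrt(ss_x * ss_y)

# Hypothetical composition scores: one entry per testee, same order for both raters.
rater_a = [85, 70, 92, 60, 78]
rater_b = [82, 72, 95, 58, 80]

r = pearson_r(rater_a, rater_b)
print(f"inter-rater r = {r:.2f}")
```

The same calculation applies to alternate/parallel forms and test-retest: replace the two raters' lists with the scores from Form A and Form B, or from Trial 1 and Trial 2.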

15 Test Administration Reliability: Street noise (e.g., during a listening comprehension test); photocopying variations; lighting; variations in temperature; condition of desks and chairs; monitors.

16 Test Reliability: Measurement errors come from the test itself: the test is too long; the test has a time limit; the test format allows for guessing; test items are ambiguous; an item has more than one correct answer.

17 Reliability Coefficient (r): Quantifies the reliability of a test and allows us to compare the reliability of different tests. $0 \le r \le 1$ (the ideal is $r = 1$, which means the test gives precisely the same results for particular testees regardless of when it happens to be administered). If $r = 1$, the test is 100% reliable. A good achievement test has $r \ge .90$. If $r < .70$, the test should not be used.

18 How to Get Reliability Coefficient: Two forms, two administrations: alternate/parallel forms. One form, two administrations: test-retest. One form, one administration (internal consistency): split-half (Spearman-Brown procedure), KR-21, KR-20.

19 Alternate/Parallel Forms: Two forms, two administrations: equivalent forms (i.e., different items testing the same topic) taken by the same test taker on different days. If r is high, the test is said to have good reliability. This is the most stringent form. (Diagram: Test plan, Form A, Form B.)

20 Test-Retest: One form, two administrations. The same test is administered to the same testees with a short time lag, and r is then calculated. Appropriate for highly speeded tests. (Diagram: Test A, Trial 1, Trial 2.)

21 Split-half (Spearman-Brown Procedure): One test, one administration. Split the test into halves (e.g., odd questions vs. even questions) to form two sets of scores. Also called internal consistency. (Diagram: Q1 Q2 Q3 Q4 Q5 Q6, First Half, Second Half.)

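22 Split-half (2): Note that the correlation r between the two halves is not the reliability of the whole test. There is a mathematical relationship between test length and reliability: the longer the test, the more reliable it is. Spearman-Brown Prophecy Formula: $\text{rel}_{\text{total}} = \dfrac{nr}{1 + (n-1)r}$, where n is the factor by which the test is lengthened. E.g., if the correlation between the two halves of the test is r = .6, the reliability of the full test (n = 2) is .75. If the test is lengthened to 3 times the half length (n = 3, still with r = .6), the reliability is .82.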
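
The split-half procedure and the Spearman-Brown formula can be sketched in code as follows. This is an illustrative sketch, not the author's procedure: it assumes items are scored 0/1, uses an odd/even split, and the function names and item data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

def spearman_brown(r, n):
    """Predicted reliability when a test with reliability r is made n times longer."""
    return n * r / (1 + (n - 1) * r)

def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per testee.

    Returns (half-test correlation, Spearman-Brown estimate for the full test).
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half, 2)

# The slide's worked example: r = .6 between the two halves.
print(round(spearman_brown(0.6, 2), 2))  # full test (n = 2)
print(round(spearman_brown(0.6, 3), 2))  # three times the half length (n = 3)

# Hypothetical 0/1 scores of five testees on a six-item test.
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
r_half, r_full = split_half_reliability(items)
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```

The first two printed values reproduce the .75 and .82 from the slide.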

23 Kuder-Richardson Formula 21: $\text{KR-21} = \dfrac{k}{k-1}\left\{1 - \dfrac{\bar{x}\,(1 - \bar{x}/k)}{s^2}\right\}$, where k = number of items, $\bar{x}$ = mean, and s = standard deviation (for the formula, see Bailey 100). The standard deviation describes how spread out a set of scores is (i.e., how far scores deviate from the mean): $0 \le s$, and the larger s is, the more spread out the scores. E.g., given two sets of scores, (5, 4, 3) and (7, 4, 1), which group in general behaves more similarly?
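
As a rough illustration, KR-21 and the standard-deviation comparison can be computed as below. The sketch assumes the population form of the variance (dividing by the number of scores); the slide defers the exact SD formula to Bailey, so that choice is an assumption, as are the function name `kr21` and the total scores used.

```python
from statistics import mean, pstdev, pvariance

def kr21(total_scores, k):
    """KR-21 from testees' total scores on a k-item test (items scored 0/1)."""
    if k < 2:
        raise ValueError("KR-21 needs at least 2 items")
    m = mean(total_scores)
    s2 = pvariance(total_scores)  # assumption: population variance
    if s2 == 0:
        raise ValueError("KR-21 is undefined when all scores are identical")
    return (k / (k - 1)) * (1 - (m * (1 - m / k)) / s2)

# The two score sets from the slide: same mean (4), different spread.
tight = [5, 4, 3]
wide = [7, 4, 1]
print(f"s(tight) = {pstdev(tight):.2f}, s(wide) = {pstdev(wide):.2f}")

# Hypothetical total scores of five testees on a 10-item test.
scores = [2, 4, 5, 7, 9]
print(f"KR-21 = {kr21(scores, k=10):.2f}")
```

The set (5, 4, 3) has the smaller standard deviation, so its scores behave more similarly.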

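24 Kuder-Richardson Formula 20: $\text{KR-20} = \dfrac{k}{k-1}\left[1 - \dfrac{\sum pq}{s^2}\right]$, where k = number of items, s = standard deviation of the total scores, p = item difficulty (the proportion of people who got an item right), and $q = 1 - p$ (i.e., the proportion of people who got an item wrong).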
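
A minimal sketch of KR-20 from item-level data, again assuming 0/1 item scores and the population variance of the total scores. The function name `kr20` and the item matrix are hypothetical.

```python
from statistics import pvariance

def kr20(item_scores):
    """KR-20 from a testee-by-item matrix of 0/1 scores (one list per testee)."""
    n = len(item_scores)      # number of testees
    k = len(item_scores[0])   # number of items
    if k < 2:
        raise ValueError("KR-20 needs at least 2 items")
    totals = [sum(row) for row in item_scores]
    s2 = pvariance(totals)    # assumption: population variance of total scores
    if s2 == 0:
        raise ValueError("KR-20 is undefined when all total scores are identical")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion answering item j correctly
        sum_pq += p * (1 - p)                       # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / s2)

# Hypothetical 0/1 scores of five testees on a six-item test.
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
print(f"KR-20 = {kr20(items):.2f}")
```

Unlike KR-21, this needs each testee's score on every item, not just the totals.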

25 Ways of Testing Reliability: Examine the amount of variation: Standard Error of Measurement (SEM); the smaller the better. Calculate the “reliability coefficient” r; the bigger the better.

26 Standard Error of Measurement (1): The average SD of an individual's scores over a large number of testings. It captures the variability of an individual's scores and indicates how large the error component is likely to be. Particularly useful in the interpretation of test scores. $\text{SEM} = s\sqrt{1 - \text{rel}}$, where s is the standard deviation of the test scores and rel is the reliability coefficient.

27 Standard Error of Measurement (2): The average of a set of scores = the “true” score of the individual. $X_1 = T_1 + E_1$, $X_2 = T_2 + E_2$, ..., $X_n = T_n + E_n$; averaging gives $\bar{X} = \bar{T} + 0$.
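
The idea that errors average out over many testings can be illustrated with a small simulation. This is only a sketch: it assumes one testee with a fixed true score and normally distributed, zero-mean errors, and the true score, error SD, and function name are all hypothetical.

```python
import random
from statistics import mean, pstdev

def simulate_observed_scores(true_score, error_sd, n_testings, seed=0):
    """Simulate X = T + E for one testee over repeated testings.

    Assumption: E is drawn from a normal distribution with mean 0.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_testings)]

scores = simulate_observed_scores(true_score=500, error_sd=30, n_testings=10_000)
print(f"mean observed score = {mean(scores):.1f}")  # close to the true score
print(f"SD of observed scores = {pstdev(scores):.1f}")  # close to the error SD
```

Under these assumptions, the mean of the observed scores approaches the true score, and their SD approaches the error SD, which is the quantity the SEM estimates.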

28 Standard Error of Measurement (3): E.g., for the GRE, SD = 100 and rel. = .91, so $\text{SEM} = 100\sqrt{1 - .91} = 30$. How do we apply the SEM in the interpretation of a score? For a given spread of scores, the greater the reliability coefficient, the smaller the SEM will be.
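
The GRE example can be checked in code, together with one conventional way of applying the SEM: reporting a band around the observed score. The slides do not spell out the band interpretation, so it is an assumption here, as are the function names and the observed score of 600.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)

def score_band(observed, sem, z=1.0):
    """Band of +/- z SEMs around an observed score."""
    return observed - z * sem, observed + z * sem

# The slide's example: SD = 100, reliability = .91.
sem = standard_error_of_measurement(100, 0.91)
print(f"SEM = {sem:.0f}")

# Hypothetical observed score of 600, with a band of one SEM on either side.
low, high = score_band(600, sem)
print(f"band: {low:.0f} to {high:.0f}")
```

If errors are assumed to be roughly normal, a band of one SEM on either side is usually read as covering the true score about two times out of three.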

29 Ways of Enhancing Reliability: General strategies: consider possible sources of unreliability; reduce or average out nonsystematic fluctuations in raters, persons, test administration, and instruments.

30 How to Make Tests More Reliable? (1): Take enough samples of behavior. Try to avoid ambiguous items. Provide clear and explicit instructions. Ensure tests are well laid out and perfectly legible. Provide uniform and undistracted conditions of administration. Try to use objective tests.

31 How to Make Tests More Reliable? (2): Try to use direct tests. Have independent, trained raters. Provide a detailed scoring key. Try to identify the test takers by number, not by name. Try to have multiple independent scorings in subjective tests (Hughes, 1989, pp. 36-42).
