
1 © McGraw-Hill Higher Education. All rights reserved. Chapter 3 Reliability and Objectivity

2 Chapter 3 Outline
– Selecting a Criterion Score
– Types of Reliability
– Reliability Theory
– Estimating Reliability – Intraclass R
– Spearman-Brown Prophecy Formula
– Standard Error of Measurement
– Objectivity
– Reliability of Criterion-referenced Tests
– Reliability of Difference Scores

3 Objectivity (Interrater Reliability)
– Agreement of competent judges about the value of a measure.

4 Reliability
– Dependability of scores.
– Consistency.
– Degree to which a test is free from measurement error.

5 Selecting a Criterion Score
Criterion score – the measure used to indicate a person's ability.
– Can be based on the mean score or the best score.
Mean score – average of all trials.
– Usually a more reliable estimate of a person's true ability.
Best score – optimal score a person achieves on any one trial.
– May be used when the criterion score is meant to indicate maximum possible performance.

6 Potential Methods to Select a Criterion Score
1. Mean of all trials.
2. Best score of all trials.
3. Mean of selected trials, based on the trials on which the group scored best.
4. Mean of selected trials, based on the trials on which the individual scored best (i.e., omit outliers).
The appropriate method depends on the situation (a sketch of the first two methods follows).
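
To make the first two methods concrete, here is a minimal Python sketch using made-up trial scores; the data and variable names are illustrative only, not from the text.

```python
# Hypothetical trial scores for one person (higher = better), e.g., jump distance in cm.
trials = [182, 190, 187, 185]

mean_criterion = sum(trials) / len(trials)   # method 1: mean of all trials
best_criterion = max(trials)                 # method 2: best score of all trials

print(f"Mean criterion score: {mean_criterion:.1f}")  # 186.0
print(f"Best criterion score: {best_criterion}")      # 190
```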

7 Norm-referenced Test
– Designed to reflect individual differences.

8 In the Norm-referenced Framework
– Reliability: the ability to detect reliable differences between subjects.

9 Types of Reliability
– Stability
– Internal Consistency

10 Stability (Test-Retest) Reliability
– Each subject is measured with the same instrument on two or more different days.
– Scores are then correlated.
– An intraclass correlation should be used.

11 Internal Consistency Reliability
– Consistent rate of scoring throughout a test or from trial to trial.
– All trials are administered in a single day.
– Trial scores are then correlated; an intraclass correlation should be used.

12 Sources of Measurement Error
– Lack of agreement among raters (i.e., objectivity).
– Lack of consistent performance by the person being tested.
– Failure of the instrument to measure consistently.
– Failure of the tester to follow standardized procedures.

13 Reliability Theory
X = T + E (Observed score = True score + Error)
σ²X = σ²t + σ²e (Observed score variance = True score variance + Error variance)
Reliability = σ²t ÷ σ²X
Reliability = (σ²X − σ²e) ÷ σ²X
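
As a small illustration of the variance partition, the following sketch plugs hypothetical variance components into both forms of the reliability ratio; the numbers are invented for illustration.

```python
# Hypothetical variance components: reliability = true variance / observed variance.
true_var = 80.0    # sigma^2_t (assumed)
error_var = 20.0   # sigma^2_e (assumed)
obs_var = true_var + error_var                       # sigma^2_X = 100.0

reliability = true_var / obs_var                     # 0.80
reliability_alt = (obs_var - error_var) / obs_var    # same value, second form of the ratio
print(reliability, reliability_alt)                  # 0.8 0.8
```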

14 Reliability depends on:
– Decreasing measurement error.
– Detecting individual differences among people (the ability to discriminate among different ability levels).

15 Reliability
– Ranges from 0 to 1.00.
– When R = 0, there is no reliability.
– When R = 1.00, there is maximum reliability.

16 Reliability from Intraclass R
– ANOVA is used to partition the variance of a set of scores.
– Parts of the variance are used to calculate the intraclass R.

17 Estimating Reliability
Intraclass correlation from one-way ANOVA: R = (MS_A – MS_W) ÷ MS_A
– MS_A = mean square among subjects (also called between subjects)
– MS_W = mean square within subjects
– Mean square = variance estimate
This represents the reliability of the mean test score for each person (a sketch of the calculation follows).
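
The following is a minimal sketch, assuming a small hypothetical data set of 4 subjects × 3 trials, of how the one-way mean squares and intraclass R could be computed directly in Python (not SPSS output).

```python
# One-way intraclass R from made-up data: 4 subjects x 3 trials.
scores = [
    [10, 12, 11],
    [15, 14, 16],
    [ 8,  9,  8],
    [13, 13, 12],
]
n = len(scores)            # number of subjects
k = len(scores[0])         # number of trials

grand_mean = sum(sum(row) for row in scores) / (n * k)
subject_means = [sum(row) / k for row in scores]

# Mean square among subjects (MS_A)
ss_among = k * sum((m - grand_mean) ** 2 for m in subject_means)
ms_a = ss_among / (n - 1)

# Mean square within subjects (MS_W)
ss_within = sum((x - m) ** 2 for row, m in zip(scores, subject_means) for x in row)
ms_w = ss_within / (n * (k - 1))

R = (ms_a - ms_w) / ms_a   # reliability of each person's mean score across the k trials
print(round(R, 3))         # about 0.97 for these made-up data
```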

18 Sample SPSS One-way Reliability Analysis

19 Estimating Reliability
Intraclass correlation from two-way ANOVA: R = (MS_A – MS_R) ÷ MS_A
– MS_A = mean square among subjects (also called between subjects)
– MS_R = mean square residual
– Mean square = variance estimate
Used when trial-to-trial variance is not considered measurement error (e.g., Likert-type scale). A comparable sketch follows.
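
A comparable sketch for the two-way case, again with hypothetical data: the systematic trial-to-trial variance is removed before forming the residual mean square.

```python
# Two-way intraclass R from made-up data: 4 subjects x 3 trials.
scores = [
    [10, 12, 11],
    [15, 14, 16],
    [ 8,  9,  8],
    [13, 13, 12],
]
n, k = len(scores), len(scores[0])

grand_mean = sum(sum(row) for row in scores) / (n * k)
subject_means = [sum(row) / k for row in scores]
trial_means = [sum(row[j] for row in scores) / n for j in range(k)]

ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
ss_trials = n * sum((m - grand_mean) ** 2 for m in trial_means)
ss_residual = ss_total - ss_subjects - ss_trials

ms_a = ss_subjects / (n - 1)                 # mean square among subjects
ms_r = ss_residual / ((n - 1) * (k - 1))     # mean square residual

R = (ms_a - ms_r) / ms_a   # trial-to-trial variance is not counted as measurement error here
print(round(R, 3))
```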

20 Sample SPSS Two-way Reliability Analysis

21 What is acceptable reliability? It depends on:
– age
– gender
– experience of the people tested
– size of reliability coefficients others have obtained
– number of days or trials
– stability vs. internal consistency coefficient

22 What is acceptable reliability?
– Most physical measures are stable from day to day; expect test-retest Rxx between .80 and .95.
– Expect lower Rxx for tests with an accuracy component (e.g., .70).
– For written tests, want Rxx > .70.
– For psychological instruments, want Rxx > .70.
– Critical issue: the time interval between the two test sessions for stability reliability estimates; 1 to 3 days apart is usually appropriate for physical measures.

23 Factors Affecting Reliability
Type of test.
– Maximum-effort test: expect Rxx ≥ .80
– Accuracy-type test: expect Rxx ≥ .70
– Psychological inventories: expect Rxx ≥ .70
Range of ability.
– Rxx is higher for heterogeneous groups than for homogeneous groups.
Test length.
– Longer test, higher Rxx.

24 Factors Affecting Reliability
Scoring accuracy.
– The person administering the test must be competent.
Test difficulty.
– The test must discriminate among ability levels.
Test environment, organization, and instructions.
– The setting should be favorable to good performance; examinees should be motivated to do well, ready to be tested, and know what to expect.

25 Factors Affecting Reliability
– Fatigue decreases Rxx.
– Practice trials increase Rxx.

26 Coefficient Alpha
– Also known as Cronbach's alpha.
– Most widely used with attitude instruments.
– Same as the two-way intraclass R through ANOVA.
– An estimate of the Rxx of a criterion score that is the sum of trial scores in one day.

27 Coefficient Alpha
R_alpha = [K ÷ (K − 1)] × [(S²x − ΣS²trials) ÷ S²x]
– K = number of trials or items
– S²x = variance of the criterion score (sum of all trials)
– ΣS²trials = sum of the variances for all trials
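
A minimal sketch of coefficient alpha computed directly from trial scores; the data are hypothetical, and the helper uses the sample variance.

```python
# Coefficient alpha from made-up data: 5 people x 3 trials.
scores = [
    [3, 4, 4],
    [5, 5, 4],
    [2, 3, 2],
    [4, 4, 5],
    [3, 3, 3],
]
n, k = len(scores), len(scores[0])

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)  # sample variance

totals = [sum(row) for row in scores]                             # criterion score = sum of trials
trial_vars = [variance([row[j] for row in scores]) for j in range(k)]

alpha = (k / (k - 1)) * ((variance(totals) - sum(trial_vars)) / variance(totals))
print(round(alpha, 3))   # about 0.90 for these made-up scores
```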

28 Kuder-Richardson (KR)
– Estimates internal consistency reliability by determining how all items on a test relate to the total test.
– KR formulas 20 and 21 are typically used to estimate the Rxx of knowledge tests.
– Used with dichotomous items (scored as right or wrong).
– KR20 = coefficient alpha.

29 KR20
KR20 = [K ÷ (K − 1)] × [(S²x − Σpq) ÷ S²x]
– K = number of trials or items
– S²x = variance of the test scores
– p = proportion answering the item correctly
– q = proportion answering the item incorrectly
– Σpq = sum of the pq products for all K items

30 KR20 Example

Item   p     q     pq
1      .50   .50   .25
2      .25   .75   .1875
3      .80   .20   .16
4      .90   .10   .09

Σpq = 0.6875
If Mean = 2.45 and SD = 1.2 (so S²x = 1.44), what is KR20?
KR20 = (4 ÷ 3) × (1.44 − 0.6875) ÷ 1.44
KR20 = .70
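
A quick Python check of this worked example, using the slide's p values, mean, and standard deviation.

```python
# KR-20 check for the 4-item example above.
p = [0.50, 0.25, 0.80, 0.90]                     # proportion answering each item correctly
q = [1 - pi for pi in p]                         # proportion answering each item incorrectly
sum_pq = sum(pi * qi for pi, qi in zip(p, q))    # 0.6875

k = len(p)
var_x = 1.2 ** 2                                 # SD = 1.2, so variance = 1.44

kr20 = (k / (k - 1)) * ((var_x - sum_pq) / var_x)
print(round(kr20, 2))                            # 0.70
```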

31 KR21
If all test items are assumed to be equally difficult, KR20 can be simplified to KR21.
KR21 = [(K × S²) − (Mean × (K − Mean))] ÷ [(K − 1) × S²]
– K = number of trials or items
– S² = variance of the test
– Mean = mean of the test
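
Applying KR21 to the same summary statistics from the previous example, as a sketch; because KR21 assumes equally difficult items and the item difficulties above actually differ, it comes out lower than the KR20 value of .70.

```python
# KR-21 from the same summary statistics (k = 4 items, mean = 2.45, SD = 1.2).
k = 4
mean = 2.45
var = 1.2 ** 2   # 1.44

kr21 = ((k * var) - (mean * (k - mean))) / ((k - 1) * var)
print(round(kr21, 2))   # about 0.45, lower than KR-20 because the items are not equally difficult
```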

32 Equivalence Reliability (Parallel Forms)
– Two equivalent forms of a test are administered to the same subjects.
– Scores on the two forms are then correlated.

33 Spearman-Brown Prophecy Formula
Used to estimate the rxx of a test that is changed in length.
r_kk = (k × r11) ÷ [1 + (k − 1) × r11]
– k = number of times the test is changed in length; k = (# trials wanted) ÷ (# trials have)
– r11 = reliability of the test you are starting with
The Spearman-Brown formula gives an estimate of the maximum reliability that can be expected (an upper-bound estimate).
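
A small sketch of the prophecy formula as a Python function, using a hypothetical starting reliability of .70 and a doubled test length.

```python
# Spearman-Brown prophecy formula: estimated reliability after changing test length.
def spearman_brown(r11, k):
    """r11 = current reliability; k = (# trials wanted) / (# trials have)."""
    return (k * r11) / (1 + (k - 1) * r11)

# e.g., doubling a 6-trial test (k = 12 / 6 = 2) that currently has r11 = .70
print(round(spearman_brown(0.70, 2), 2))   # 0.82
```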

34 Standard Error of Measurement (SEM)
– The degree to which you expect a test score to vary due to measurement error.
– The standard deviation of a test score.
SEM = Sx × √(1 − Rxx)
– Sx = standard deviation of the group
– Rxx = reliability coefficient
A small SEM indicates high reliability.

35 SEM Example: Written Test
Sx = 5, Rxx = .88
SEM = 5 × √(1 − .88) = 1.73
Confidence intervals:
– 68%: X ± 1.00 (SEM)
– 95%: X ± 1.96 (SEM)
If X = 23: 23 + 1.73 = 24.73 and 23 − 1.73 = 21.27
We are 68% confident the true score is between 21.27 and 24.73.
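
A quick Python check of this SEM example and its confidence intervals.

```python
# SEM and confidence intervals for the written-test example (Sx = 5, Rxx = .88, X = 23).
import math

sx, rxx, score = 5, 0.88, 23
sem = sx * math.sqrt(1 - rxx)                     # ≈ 1.73

ci68 = (score - 1.00 * sem, score + 1.00 * sem)   # ≈ (21.27, 24.73)
ci95 = (score - 1.96 * sem, score + 1.96 * sem)   # ≈ (19.61, 26.39)
print(round(sem, 2), ci68, ci95)
```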

36 Objectivity (Rater Reliability)
The degree of agreement between raters. Depends on:
– clarity of the scoring system.
– the degree to which the judge can assign scores accurately.
If a test is highly objective, objectivity is obvious and rarely calculated. As subjectivity increases, the test developer should report an estimate of objectivity.

37 Two Types of Objectivity
– Intrajudge objectivity: consistency in scoring when a test user scores the same test two or more times.
– Interjudge objectivity: consistency between two or more independent judgments of the same performance.
Calculate objectivity like reliability, but substitute judges' scores for trials.

38 Criterion-referenced Test
– A test used to classify a person as proficient or nonproficient (pass or fail).

39 In the Criterion-referenced Framework
– Reliability is defined as consistency of classification.

40 Reliability of Criterion-referenced Test Scores
– To estimate reliability, a double-classification or contingency table is formed.

41 Contingency Table (Double-classification Table)

                 Day 2
               Pass   Fail
Day 1  Pass      A      B
       Fail      C      D

42 Proportion of Agreement (Pa)
– The most popular way to estimate the Rxx of a CRT.
– Pa = (A + D) ÷ (A + B + C + D)
– Pa does not take into account that some consistent classifications could happen by chance.

43 Example for Calculating Pa

                 Day 2
               Pass   Fail
Day 1  Pass     45     12
       Fail      8     35

44 Pa = (A + D) ÷ (A + B + C + D)
Pa = (45 + 35) ÷ (45 + 12 + 8 + 35)
Pa = 80 ÷ 100 = .80
(Same table: A = 45, B = 12, C = 8, D = 35.)
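
A quick Python check of this proportion-of-agreement example.

```python
# Proportion of agreement for the contingency table above.
A, B, C, D = 45, 12, 8, 35
Pa = (A + D) / (A + B + C + D)
print(Pa)   # 0.8
```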

45 Kappa Coefficient (K)
– An estimate of CRT Rxx with a correction for chance agreements.
K = (Pa − Pc) ÷ (1 − Pc)
– Pa = proportion of agreement
– Pc = proportion of agreement expected by chance
Pc = [(A + B)(A + C) + (C + D)(B + D)] ÷ (A + B + C + D)²

46 Example for Calculating K

                 Day 2
               Pass   Fail
Day 1  Pass     45     12
       Fail      8     35

47 K = (Pa − Pc) ÷ (1 − Pc)
Pa = .80
(Same table: A = 45, B = 12, C = 8, D = 35.)

48 Pc = [(A + B)(A + C) + (C + D)(B + D)] ÷ (A + B + C + D)²
Pc = [(45 + 12)(45 + 8) + (8 + 35)(12 + 35)] ÷ (100)²
Pc = [(57)(53) + (43)(47)] ÷ 10,000 = 5,042 ÷ 10,000
Pc = .5042

49 Kappa (K)
K = (Pa − Pc) ÷ (1 − Pc)
K = (.80 − .5042) ÷ (1 − .5042)
K = .597

50 Modified Kappa (Kq)
– Kq may be more appropriate than K when the proportion of people passing a criterion-referenced test is not predetermined.
– Most situations in exercise science do not predetermine the number of people who will pass.

51 Modified Kappa (Kq)
Kq = (Pa − 1/q) ÷ (1 − 1/q)
– q = number of classification categories
– For pass-fail, q = 2
Kq = (.80 − .50) ÷ (1 − .50)
Kq = .60
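
A quick Python check of the kappa and modified kappa values for the same contingency table.

```python
# Kappa and modified kappa for the table A = 45, B = 12, C = 8, D = 35.
A, B, C, D = 45, 12, 8, 35
n = A + B + C + D

Pa = (A + D) / n                                        # 0.80
Pc = ((A + B) * (A + C) + (C + D) * (B + D)) / n ** 2   # 0.5042
kappa = (Pa - Pc) / (1 - Pc)                            # ≈ 0.597

q = 2                                                   # pass/fail = 2 categories
kq = (Pa - 1 / q) / (1 - 1 / q)                         # 0.60
print(round(kappa, 3), round(kq, 2))
```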

52 Modified Kappa
– Interpreted the same as K.
– When the proportion of masters = .50, Kq = K. Otherwise, Kq > K.

53 Interpretation of Rxx for Criterion-referenced Tests
Pa (Proportion of Agreement)
– Affected by chance classifications.
– Pa values < .50 are unacceptable.
– Pa should be > .80 in most situations.
K and Kq (Kappa and Modified Kappa)
– Interpretable range: 0.0 to 1.0.
– Minimum acceptable value = .60.

54 When reporting results: report both indices of Rxx.

55 Formative Evaluation of Chapter Objectives
– Define and differentiate between reliability and objectivity for norm-referenced tests.
– Identify factors that influence the reliability and objectivity of norm-referenced test scores.
– Identify factors that influence the reliability of criterion-referenced test scores.
– Select a reliable criterion score based on measurement theory.

56 Chapter 3 Reliability and Objectivity

