Reliability in assessment Cees van der Vleuten Maastricht University www.ceesvandervleuten.com Certificate Course on Assessment 6 May 2015.


1 Reliability in assessment Cees van der Vleuten Maastricht University www.ceesvandervleuten.com Certificate Course on Assessment 6 May 2015

2 Overview: What is reliability conceptually? Evidence from the literature? How to improve reliability?

3 What is reliability? Correlation (r_{x,y})

4 What is reliability? High correlation (r_{x,y} → 1.0); low correlation (r_{x,y} → 0.0)

5 Measurement influence

6 Reliability in achievement tests. Test = item; r = split-half reliability coefficient.
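A minimal sketch of a split-half coefficient, assuming an odd/even item split and a small made-up score matrix (rows are candidates, columns are items scored 0/1). The function names and the data are illustrative only and do not come from the slides.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores):
    """Correlate each candidate's score on odd-position items with
    their score on even-position items (one possible way to split)."""
    half_a = [sum(row[0::2]) for row in scores]
    half_b = [sum(row[1::2]) for row in scores]
    return pearson(half_a, half_b)


# Hypothetical data: 5 candidates, 6 items.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
r_half = split_half(scores)
print(f"split-half r = {r_half:.2f}")
```

Note that the correlation between two halves describes a test of half the length; it is commonly stepped up to full length with the Spearman-Brown formula.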

7 Reliability in achievement tests. Test = item; r across all colours = Cronbach’s alpha.
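A minimal sketch of Cronbach's alpha using its standard variance form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The score matrix is made up for illustration (rows are candidates, columns are items) and the function name is an assumption, not something taken from the slides.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates-by-items score matrix."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 candidates, 6 items scored 0/1.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Population variance is used for both the items and the totals; sample variance would give the same alpha as long as it is used consistently, because only the ratio matters.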

8 Reliability and test length. Spearman-Brown prophecy formula. [Figure: reliability plotted against test length, actual versus predicted.] See: http://en.wikipedia.org/wiki/Spearman–Brown_prediction_formula
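For reference, the Spearman-Brown prediction formula in its standard form. The symbols are notation chosen here, not taken from the slide: $\rho$ is the reliability of the current test, $k$ is the factor by which the test is lengthened (k = 2 means twice as long), and $\rho^{*}$ is the predicted reliability of the lengthened test.

```latex
\rho^{*} = \frac{k\,\rho}{1 + (k - 1)\,\rho}
```

For example, doubling a test with reliability 0.50 gives a predicted reliability of 2(0.50) / (1 + 0.50) ≈ 0.67.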

9 Three reliability theories: Classical test theory; Generalizability theory; Item-response theory.
Further reading:
De Champlain, A. F. (2010). A primer on classical test theory and item response theory for assessments in medical education. Medical education, 44(1), 109-117.
Bloch, R., & Norman, G. (2012). Generalizability theory for the perplexed: A practical introduction and guide: AMEE Guide No. 68. Medical teacher, 34(11), 960-992.

11 Overview: What is reliability conceptually? Evidence from the literature? How to improve reliability?

12 Reliabilities across methods

Testing time in hours            1     2     4     8
MCQ [1]                          0.62  0.77  0.87  0.93
Case-Based Short Essay [2]       0.68  0.81  0.89  0.94
PMP [1]                          0.36  0.53  0.69  0.82
Oral Exam [3]                    0.50  0.67  0.80  0.89
Long Case [4]                    0.60  0.75  0.86  0.92
OSCE [5]                         0.54  0.70  0.82  0.90
Mini CEX [6]                     0.73  0.84  0.92  0.96
Practice Video Assessment [7]    0.62  0.77  0.87  0.93
Incognito SPs [8]                0.61  0.76  0.86  0.93

[1] Norcini et al., 1985
[2] Stalenhoef-Halling et al., 1990
[3] Swanson, 1987
[4] Wass et al., 2001
[5] Van der Vleuten, 1988
[6] Norcini et al., 1999
[7] Ram et al., 1999
[8] Gorter, 2002

This table has been published in: Van Der Vleuten, C. P., & Schuwirth, L. W. (2005). Assessing professional competence: from methods to programmes. Medical education, 39(3), 309-317. See: http://www.ceesvandervleuten.com/publications/assessment-overviews
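The 2-, 4- and 8-hour values in the table above are numerically consistent with applying the Spearman-Brown formula to the 1-hour values. Whether the published figures were actually derived that way is an assumption; the sketch below only shows that the projection reproduces them to two decimals. The function and variable names are illustrative.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test with reliability r is made k times longer."""
    return k * r / (1 + (k - 1) * r)


# 1-hour reliabilities copied from the table.
one_hour = {
    "MCQ": 0.62,
    "Case-Based Short Essay": 0.68,
    "PMP": 0.36,
    "Oral Exam": 0.50,
    "Long Case": 0.60,
    "OSCE": 0.54,
    "Mini CEX": 0.73,
    "Practice Video Assessment": 0.62,
    "Incognito SPs": 0.61,
}

hours = (1, 2, 4, 8)
projected = {
    method: [round(spearman_brown(r1, k), 2) for k in hours]
    for method, r1 in one_hour.items()
}

for method, values in projected.items():
    print(f"{method:<28}", "  ".join(f"{v:.2f}" for v in values))
```

Read this way, the table makes the slide's point directly: every method reaches roughly 0.80 or higher given enough testing time, and the methods differ mainly in how many hours that takes.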

13 Reliability oral examination (Swanson, 1987)

Testing time in hours              1     2     4     8
Number of cases                    2     4     8     12
Two new examiners for each case    0.61  0.76  0.86  0.93
New examiner for each case         0.50  0.69  0.82  0.90
Same examiner for all cases        0.31  0.47  0.48  (only three values appear in the source)

Here multiple sources of error (cases, examiners) are combined in a single reliability estimate. This is the strength of generalizability theory.

14 Reliabilities across methods (same table as slide 12)

15 Checklist or rating scale reliability in OSCE [1]
[1] Van Luijk & van der Vleuten, 1990

16 The literature clearly suggests
- Reliability is a matter of sampling
  - Across contexts
  - Across assessors or any other factor influencing the assessment
- Objectivity is NOT the same as reliability
  - Many subjective judgments make a robust judgment
- There are no intrinsically more reliable methods of assessment
- Most of our assessments in actual practice are not very reliable!

18 Overview: What is reliability conceptually? Evidence from the literature? How to improve reliability?

19 Consequently…
- One single measure is no measure
- Combine information
  - Across time
  - Across multiple measures
- Be aware of substantial false-positive and false-negative errors in a single measure.

20
Reliability    Expected % false decisions
1.00           0
0.95           10
0.80           20
0.70           25
0.60           30
0.50           33
0.00           50

21 Finally…
- Reliability and sampling are strongly related
- Objectification and standardization do not intrinsically lead to more reliability
- Do not objectify or standardize where it is not needed (e.g. when assessing complex skills in the real world).

22 This Powerpoint can be found at: www.ceesvandervleuten.com

