Presentation on theme: "Lessons learned in assessment"— Presentation transcript:
1 Lessons learned in assessment: History, Research and Practical Implications
Cees van der Vleuten, Maastricht University
MHPE, Unit 1, 3 June 2010
PowerPoint at:
2 Medical Education Anno 2008
- Steep explosion of knowledge in medical education
- 10 international journals and many national ones
- 2 large international conferences, 2 large regional conferences and many national ones
- Mix of practice, research and theory
- Many training programmes, including masters and PhDs
- Groups of professionals appointed in medical schools
- A flourishing community of practice!
3 Overview of presentation
- Where is education going?
- Lessons learned in assessment
- Areas of development and research
5 Where is education going? Underlying educational principles:
- Continuous learning of, or practicing with, authentic tasks (in steps of complexity; with constant attention to transfer)
- Integration of cognitive, behavioural and affective skills
- Active, self-directed learning, in collaboration with others
- Fostering domain-independent skills and competencies (e.g. team work, communication, presentation, science orientation, leadership, professional behaviour…)
6 Where is education going? The same principles, annotated with their theoretical underpinnings: constructivism, cognitive psychology, collaborative learning theory, cognitive load theory, and empirical evidence.
7 Where is education going? Work-based learning: practice, practice, practice… Optimising learning by:
- More reflective practice
- More structure in the haphazard learning process
- More feedback, monitoring, guiding, reflection, role modelling
- Fostering a learning culture or climate
- Fostering domain-independent skills (professional behaviour, team skills, etc.)
8 Where is education going? The same work-based learning points, annotated with their underpinnings: deliberate practice theory, emerging work-based learning theories, and empirical evidence.
9 Where is education going?
- Educational reform is on the agenda everywhere
- Education is professionalizing rapidly
- A lot of 'educational technology' is available
- How about assessment?
10 Overview of presentation
- Where is education going?
- Lessons learned in assessment
- Areas of development and research
11 Miller's pyramid of competence (Knows, Knows how, Shows how, Does): lessons learned while climbing this pyramid with assessment technology.
Miller, G. E. The assessment of clinical skills/competence/performance. Academic Medicine (Supplement) 1990; 65: S63-S67.
12 Assessing 'knowing how'. 1960s: written complex simulations (PMPs), targeting the 'knows how' level of the pyramid.
13 Key findings on written simulations (Van der Vleuten, 1995)
- Performance on one problem hardly predicted performance on another
- High correlations with simple MCQs
- Experts performed less well than intermediates
- The stimulus format matters more than the response format
14 Assessing 'knowing how': specific lessons learned
- Simple short scenario-based formats work best (Case & Swanson, 2002)
- Validity is a matter of good quality assurance around item construction (Verhoeven et al., 1999)
- Generally, medical schools can do a much better job (Jozefowicz et al., 2002)
- Sharing (good) test material across institutions is a smart strategy (Van der Vleuten et al., 2004).
15 Moving from assessing 'knows'
What is arterial blood gas analysis most likely to show in patients with cardiogenic shock?
A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis
16 To assessing 'knowing how'
A 74-year-old woman is brought to the emergency department because of crushing chest pain. She is restless, confused, and diaphoretic. On admission, temperature is 36.7 C, blood pressure is 148/78 mm Hg, pulse is 90/min, and respirations are 24/min. During the next hour, she becomes increasingly stuporous, blood pressure decreases to 80/40 mm Hg, pulse increases to 120/min, and respirations increase to 40/min. Her skin is cool and clammy. An ECG shows sinus rhythm and 4 mm of ST segment elevation in leads V2 through V6. Arterial blood gas analysis is most likely to show:
A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis
18 Maastricht item review process (diagram): items from the disciplines (anatomy, physiology, internal medicine, surgery, psychology) pass through a review committee (pre-test review) into an item pool for test administration; item analyses and student comments then feed a post-test review, with information going to users and items entering the item bank.
19 Assessing 'knowing how': general lessons learned
- Competence is specific, not generic
- Assessment is only as good as what you are prepared to put into it.
20 Assessing 'showing how'. 1970s: performance assessment in vitro (the OSCE), targeting the 'shows how' level of the pyramid.
21 Key findings around OSCEs (Van der Vleuten & Swanson, 1990)
- Performance on one station poorly predicted performance on another (many OSCEs are unreliable)
- Validity depends on the fidelity of the simulation (many OSCEs test fragmented skills in isolation)
- Global rating scales do well (improved discrimination across expertise groups; better inter-case reliabilities; Hodges, 2003)
- OSCEs impacted the learning of students
22 Reliabilities across methods, as a function of testing time (– = not reported)

Method                       1 h    2 h    4 h    8 h
MCQ¹                         0.62   0.76   –      0.93
PMP¹                         0.36   0.53   0.69   0.82
Case-based short essay²      0.68   0.73   0.84   0.82
Oral exam³                   0.50   0.69   0.82   0.90
Long case⁴                   0.60   0.75   0.86   0.90
OSCE⁵                        0.47   0.64   0.78   0.88

¹Norcini et al., 1985; ²Stalenhoef-Halling et al., 1990; ³Swanson, 1987; ⁴Wass et al., 2001; ⁵Petrusa, 2002
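The pattern in this table, reliability climbing as testing time (and thus sampling) increases, is what the Spearman-Brown prophecy formula predicts. As an illustrative sketch (not part of the original slides), the formula reproduces the tabulated PMP values from the one-hour baseline; treating testing time as the lengthening factor k is an assumption:

```python
def spearman_brown(r1: float, k: float) -> float:
    """Predicted reliability when a test is lengthened by factor k,
    given the reliability r1 of the unit-length test."""
    return k * r1 / (1 + (k - 1) * r1)

# One-hour baselines from the slide (Norcini et al., 1985)
for name, r1 in [("MCQ", 0.62), ("PMP", 0.36)]:
    predictions = [round(spearman_brown(r1, k), 2) for k in (1, 2, 4, 8)]
    print(name, predictions)  # PMP yields 0.36, 0.53, 0.69, 0.82, matching the table
```

The PMP row of the table matches these predictions to two decimals, which supports the slide's broader point: unreliability is driven by undersampling of content, not by the format itself.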
24 Checklist versus rating-scale reliability in the OSCE (figure; Van Luijk & Van der Vleuten, 1990)
25 Assessing 'showing how': specific lessons learned
- OSCE-ology: patient training, checklist writing, standard setting, etc. (Petrusa, 2002)
- OSCEs are not inherently valid or reliable; that depends on the fidelity of the simulation and on the sampling of stations (Van der Vleuten & Swanson, 1990).
26 Assessing 'showing how': general lessons learned
- Objectivity is not the same as reliability (Van der Vleuten, Norman & De Graaff, 1991)
- Subjective expert judgment has incremental value (Van der Vleuten & Schuwirth, in preparation)
- Sampling across content and judges/examiners is eminently important
- Assessment drives learning.
27 Assessing 'does'. 1990s: performance assessment in vivo by judging work samples (Mini-CEX, CBD, MSF, DOPS, portfolio).
28 Key findings on assessing 'does'
- Ongoing work; this is where we currently are
- Reliability findings point to feasible sampling (8-10 judgments seems to be the magical number; Williams et al., 2003)
- Scores tend to be inflated (Govaerts et al., 2007)
- Qualitative/narrative information is (more) useful (Govaerts et al., 2007)
- Lots of work still needs to be done: How (much) to sample across instruments? How to aggregate information?
29 Reliabilities across methods, as a function of testing time (– = not reported)

Method                       1 h    2 h    4 h    8 h
MCQ¹                         0.62   0.76   –      0.93
PMP¹                         0.36   0.53   0.69   0.82
Case-based short essay²      0.68   0.73   0.84   0.82
Oral exam³                   0.50   0.69   0.82   0.90
Long case⁴                   0.60   0.75   0.86   0.90
OSCE⁵                        0.47   0.64   0.78   0.88
Mini-CEX⁶                    0.73   0.84   0.92   0.96
Practice video assessment⁷   0.62   0.76   –      0.93
Incognito SPs⁸               0.61   0.76   0.92   0.93

¹Norcini et al., 1985; ²Stalenhoef-Halling et al., 1990; ³Swanson, 1987; ⁴Wass et al., 2001; ⁵Petrusa, 2002; ⁶Norcini et al., 1999; ⁷Ram et al., 1999; ⁸Gorter, 2002
30 Assessing 'does': specific lessons learned
- Reliable sampling is possible
- Qualitative information carries a lot of weight
- Assessment impacts work-based learning (more feedback, more reflection…)
- Validity strongly depends on the users of these instruments and therefore on the quality of implementation.
31 Assessing 'does': general lessons learned
- Work-based assessment cannot (yet) replace standardised assessment; or: no single measure can do it all (Tooke report, UK)
- Validity strongly depends on the implementation of the assessment (Govaerts et al., 2007)
- But there is a definite place for (more subjective) expert judgment (Van der Vleuten & Schuwirth, under editorial review).
32 Competency/outcome categorizations
CanMEDS roles: Medical expert, Communicator, Collaborator, Manager, Health advocate, Scholar, Professional
ACGME competencies: Medical knowledge, Patient care, Practice-based learning & improvement, Interpersonal and communication skills, Professionalism, Systems-based practice
33 Measuring the unmeasurable: 'domain-independent' skills sit alongside 'domain-specific' skills at every level of the pyramid (Knows, Knows how, Shows how, Does).
34 Measuring the unmeasurable: the importance of domain-independent skills
- If things go wrong in practice, these skills are often involved (Papadakis et al., 2005; 2008)
- Success in the labour market is associated with these skills (Meng, 2006)
- Practice performance is related to school performance (Papadakis et al., 2004).
36 Measuring the unmeasurable
- Self-assessment
- Peer assessment
- Co-assessment (combined self, peer and teacher assessment)
- Multisource feedback
- Log book/diary
- Learning process simulations/evaluations
- Product evaluations
- Portfolio assessment
37Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Acad Med, 80(10 Suppl), S46-54.
38 Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3).
39 Driessen, E., van Tartwijk, J., van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: Why do they meet with mixed success? A systematic review. Med Educ, 41(12).
40 General lessons learned
- Competence is specific, not generic
- Assessment is only as good as what you are prepared to put into it
- Objectivity is not the same as reliability
- Subjective expert judgment has incremental value
- Sampling across content and judges/examiners is eminently important
- Assessment drives learning
- No single measure can do it all
- Validity strongly depends on the implementation of the assessment
41 Practical implications: competence is specific, not generic
- One measure is no measure
- Increase sampling (across content, examiners, patients…) within measures
- Combine information across measures and across time
- Be aware of (sizable) false positive and false negative decisions
- Build safeguards into examination regulations.
42 Practical implications: assessment is only as good as what you are prepared to put into it
- Train your staff in assessment
- Implement quality-assurance procedures around test construction
- Share test material across institutions
- Reward good assessment and good assessors
- Involve students as a source of quality-assurance information
43 Practical implications: objectivity is not the same as reliability
- Don't trivialize the assessment (and compromise on validity) with unnecessary objectification and standardization
- Don't be afraid of holistic judgment
- Sample widely across sources of subjective influence (raters, examiners, patients)
44 Practical implications: subjective expert judgment has incremental value
- Use expert judgment for assessing complex skills
- Who counts as an expert depends on the assessment context (i.e. peer, patient, clerk, etc.)
- Invite assessors to provide qualitative information or to mediate feedback
45 Practical implications: sampling across content and judges/examiners is eminently important
- Use efficient test designs: a single examiner per test item (question, essay, station, encounter…) and different examiners across items
- Psychometrically analyse the sources of variance affecting the measurement, to optimise the sampling plan and the sample sizes needed
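The second point above can be made concrete with a small decision-study calculation from generalizability theory. The variance components below are hypothetical (in practice they would be estimated from actual rating data); the sketch shows how the absolute (Phi) coefficient of a crossed persons-by-raters design grows as more raters are sampled:

```python
def phi_coefficient(var_person: float, var_rater: float,
                    var_residual: float, n_raters: int) -> float:
    """Phi (absolute) coefficient for a crossed persons-x-raters design:
    person variance divided by person variance plus error variance
    averaged over the number of raters sampled."""
    error = (var_rater + var_residual) / n_raters
    return var_person / (var_person + error)

# Hypothetical variance components: person, rater, person-x-rater residual
for n in (1, 2, 4, 8, 10):
    print(f"{n:>2} raters -> Phi = {phi_coefficient(1.0, 0.3, 1.2, n):.2f}")
```

With these made-up components the coefficient passes 0.80 at around eight raters, the order of magnitude the deck cites for work-based judgments (Williams et al., 2003); running the same calculation with estimated components is one way to justify a sampling plan.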
46 Practical implications: assessment drives learning
- For every evaluative action there is an educational reaction
- Verify and monitor the impact of assessment (evaluate the evaluation); many intended effects do not actually materialise -> hidden curriculum
- No assessment without feedback!
- Embed the assessment within the learning programme (cf. Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2))
- Use the assessment strategically to reinforce desirable learning behaviours
47 Practical implications: no single measure can do it all
- Use a cocktail of methods across the competency pyramid
- Arrange methods in a programme of assessment
- Any method may have utility (including 'old' assessment methods), depending on its utility within the programme
- Compromises on the quality of individual methods should be made in light of their function in the programme
- Treat assessment design like curriculum design: responsible people/committee(s), an overarching structure, stakeholder involvement
- Implement, monitor and change (assessment programmes 'wear out')
48 Practical implications: validity strongly depends on the implementation of the assessment
- Pay special attention to implementation (good educational ideas often fail due to implementation problems)
- Involve your stakeholders in the design of the assessment
- Many naive ideas exist around assessment; train and educate your staff and students.
49 Overview of presentation
- Where is education going?
- Where are we with assessment?
- Where are we going with assessment?
- Conclusions
50 Areas of development and research
- Understanding expert judgment
51 Understanding human judgment
- How does the mind of expert judges work?
- How is it influenced?
- What is the link between clinical expertise and judgment expertise?
- There is a clash between the psychology literature on expert judgment and psychometric research.
52 Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
53 Qualitative methodology as an inspiration

Criterion       Quantitative approach   Qualitative approach
Truth value     Internal validity       Credibility
Applicability   External validity       Transferability
Consistency     Reliability             Dependability
Neutrality      Objectivity             Confirmability

Strategies for establishing trustworthiness: prolonged engagement, triangulation, peer examination, member checking, structural coherence, time sampling, stepwise replication, dependability audit, thick description, confirmability audit.
Procedural measures and safeguards: assessor training & benchmarking, appeal procedures, triangulation across sources and saturation, assessor panels, intermediate feedback cycles, decision justification, moderation, scoring rubrics, …
54 Driessen, E. W., Van der Vleuten, C. P. M., Schuwirth, L. W. T., Van Tartwijk, J., & Vermunt, J. D. (2005). The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: A case study. Medical Education, 39(2).
55 Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
- Construction and governance of assessment programmes (Van der Vleuten & Schuwirth, 2005)
56 Assessment programmes
- How to design assessment programmes?
- Strategies for governance (implementation, quality assurance)?
- How to aggregate information for decision making? When is enough enough?
57 A model for designing assessment programmes (figure; Dijkstra et al., in preparation)
58 Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
- Construction and governance of assessment programmes
- Understanding and using assessment impacting learning
59 Assessment impacting learning
- Lab studies convincingly show that testing improves retention and performance (Larsen et al., 2008)
- Relatively little empirical research supports educational practice
- Theoretical insights are absent.
60 Theoretical model under construction (Cilliers, in preparation; diagram): sources of impact (assessment strategy, assessment task, volume of assessable material, sampling, cues, individual assessor) act through determinants of action (impact appraisal: likelihood, severity; response appraisal: efficacy, costs, value; perceived agency; interpersonal factors: normative beliefs, motivation to comply) on the consequences of impact: cognitive processing strategies and metacognitive regulation strategies (choice, effort, persistence), and ultimately outcomes of learning.
61 Areas of development and research
- Understanding expert judgment
- Building non-psychometric rigour into assessment
- Construction and governance of assessment programmes
- Understanding and using assessment impacting learning
- Understanding and using qualitative information.
62 Understanding and using qualitative information
- Assessment is dominated by the quantitative discourse (Hodges, 2006)
- How to improve the use of qualitative information?
- How to aggregate qualitative information?
- How to combine qualitative and quantitative information?
- How can expert judgment be used here?
63 Finally
- Assessment in medical education has a rich history of research and development with clear practical implications (we've covered some ground in 40 years!)
- We are moving beyond the psychometric discourse into an educational design discourse
- We are starting to measure the unmeasurable
- Expert human judgment is being reinstated as an indispensable source of information, both at the method level and at the programmatic level
- Lots of exciting developments still lie ahead of us!
64 This presentation can be found at: www.fdg.unimaas
65 Literature
Cilliers, F. (In preparation). Assessment impacts on learning, you say? Please explain how. The impact of summative assessment on how medical students learn.
Driessen, E., van Tartwijk, J., van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: Why do they meet with mixed success? A systematic review. Med Educ, 41(12).
Driessen, E. W., Van der Vleuten, C. P. M., Schuwirth, L. W. T., Van Tartwijk, J., & Vermunt, J. D. (2005). The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: A case study. Medical Education, 39(2).
Dijkstra, J., Schuwirth, L., & Van der Vleuten, C. (In preparation). A model for designing assessment programmes.
Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: A reformulation and research agenda. Acad Med, 80(10 Suppl), S46-54.
Gorter, S., Rethans, J. J., Van der Heijde, D., Scherpbier, A., Houben, H., Van der Vleuten, C., et al. (2002). Reproducibility of clinical performance assessment in practice using incognito standardized patients. Medical Education, 36(9).
Govaerts, M. J., Van der Vleuten, C. P., Schuwirth, L. W., & Muijtjens, A. M. (2007). Broadening perspectives on clinical performance assessment: Rethinking the nature of in-training assessment. Adv Health Sci Educ Theory Pract, 12.
Hodges, B. (2006). Medical education and the maintenance of incompetence. Med Teach, 28(8).
Jozefowicz, R. F., Koeppen, B. M., Case, S. M., Galbraith, R., Swanson, D. B., & Glew, R. H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77(2).
Meng, C. (2006). Discipline-specific or academic? Acquisition, role and value of higher education competencies. PhD dissertation, Universiteit Maastricht, Maastricht.
Norcini, J. J., Swanson, D. B., Grosso, L. J., & Webster, G. D. (1985). Reliability, validity and efficiency of multiple choice question and patient management problem item formats in assessment of clinical competence. Medical Education, 19(3).
Papadakis, M. A., Hodgson, C. S., Teherani, A., & Kohatsu, N. D. (2004). Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Acad Med, 79(3).
Papadakis, M. A., Teherani, A., et al. (2005). Disciplinary action by medical boards and prior behavior in medical school. N Engl J Med, 353(25).
Papadakis, M. A., Arnold, G. K., et al. (2008). Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Annals of Internal Medicine, 148.
66 Literature
Petrusa, E. R. (2002). Clinical performance assessments. In G. R. Norman, C. P. M. Van der Vleuten & D. I. Newble (Eds.), International Handbook of Research in Medical Education. Dordrecht: Kluwer Academic Publishers.
Ram, P., Grol, R., Rethans, J. J., Schouten, B., Van der Vleuten, C. P. M., & Kester, A. (1999). Assessment of general practitioners by video observation of communicative and medical performance in daily practice: Issues of validity, reliability and feasibility. Medical Education, 33(6).
Stalenhoef-Halling, B. F., Van der Vleuten, C. P. M., Jaspers, T. A. M., & Fiolet, J. B. F. M. (1990). A new approach to assessing clinical problem-solving skills by written examination: Conceptual basis and initial pilot test results. Paper presented at the Teaching and Assessing Clinical Competence conference, Groningen.
Swanson, D. B. (1987). A measurement framework for performance-based tests. In I. Hart & R. Harden (Eds.), Further Developments in Assessing Clinical Competence. Montreal: Can-Heal Publications.
van der Vleuten, C. P., Schuwirth, L. W., Muijtjens, A. M., Thoben, A. J., Cohen-Schotanus, J., & van Boven, C. P. (2004). Cross institutional collaboration in assessment: A case on progress testing. Med Teach, 26(8).
Van der Vleuten, C. P. M., & Swanson, D. B. (1990). Assessment of clinical skills with standardized patients: State of the art. Teaching and Learning in Medicine, 2(2).
Van der Vleuten, C. P. M., & Newble, D. I. (1995). How can we test clinical reasoning? The Lancet, 345.
Van der Vleuten, C. P. M., Norman, G. R., & De Graaff, E. (1991). Pitfalls in the pursuit of objectivity: Issues of reliability. Medical Education, 25.
Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessment of professional competence: From methods to programmes. Medical Education, 39.
Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (Under editorial review). On the value of (aggregate) human judgment. Med Educ.
Van Luijk, S. J., Van der Vleuten, C. P. M., & Schelven, R. M. (1990). The relation between content and psychometric characteristics in performance-based testing. In W. Bender, R. J. Hiemstra, A. J. J. A. Scherpbier & R. P. Zwierstra (Eds.), Teaching and Assessing Clinical Competence. Groningen: Boekwerk Publications.
Wass, V., Jones, R., & Van der Vleuten, C. (2001). Standardized or real patients to test clinical competence? The long case revisited. Medical Education, 35.
Williams, R. G., Klamen, D. A., & McGaghie, W. C. (2003). Cognitive, social and environmental sources of bias in clinical performance ratings. Teaching and Learning in Medicine, 15(4).