Computer Based Testing of Medical Knowledge. Tom Mitchell, Nicola Aldridge Intelligent Assessment Technologies Ltd. Walter Williamson Faculty of Medicine, University of Dundee. Peter Broomhead Brunel University.
Project carried out in the Medical School at Dundee University in autumn 2002 / spring. Computerisation of an existing paper-based test of medical knowledge. Test comprised 270 short-answer free-text items. Marking of the paper-based tests consumed an unsustainable amount of faculty resources. Computer system developed and rolled out for 2003 tests. Overview.
The GMC defines “core” knowledge which is essential for a medical student. The Medical School at Dundee has implemented this by teaching to 12 learning outcomes. Assessment of the course involves written and practical tests. The GMC review team rated Dundee “Excellent”, but also recommended a new assessment to improve student feedback and course audit: a Progress Test. Background.
What is a Progress Test? A comprehensive assessment of medical knowledge. It informs students about their year-on-year progress against learning outcomes, and highlights gaps in their knowledge and their performance relative to their peers. At Dundee the Progress Test is administered annually throughout the five years of the undergraduate programme – each year group sits the same test. Progress Tests.
Piloted in April / June. Test designed by Professor M. Friedman. MCQ discounted: testing recall of knowledge, not recognition. “A doctor does not get five choices.” Many US schools are moving to an open-ended format. The first test comprised 250 short-answer free-text items. Longer-term aim is to build up a bank of items. The Dundee Progress Test.
Items are short-answer free-text. What simple clinical test can distinguish between solid and cystic scrotal swellings? Accept: Transillumination, shining light through swelling. Allow: Light goes through cyst. Don’t accept on own: shine light at/on/behind… Progress Test Items (1).
Free-text responses… (mark awarded, then the student response as written)
1  transillumination of the area with a light source in a darkened room, cystic lesions will transilluminate but solid ones wont.
1  shine a light through it - cystic lesions allow light through, solid lesions don't
1  Illumination - can light pass throught the swelling - cystic if it does
1  shine a torch behind the swelling. cystic swelling will transilluminate
1  using a torch to shine a light through the swelling
1  Tranillumination of the scrotum with a torch
1  trans illumination of the scrotal swelling
0  using a pen torch to illuminate the swelling
0  illumination of the swelling using a light source
Progress Test Items (2).
150+ students per academic year, students in total. 3 hour test, 250 – 270 short-answer free-text items. Admin : Print, collation, etc. of different 30 page test booklets (items in different order), test admin, script storage etc. Marking : 800 scripts, 750 x 240 = 180,000 items to mark + data entry, rapid feedback required. Plus, moderation of marking guidelines required. Paper-Based Testing (1).
Moderation. To achieve consistent marking, the marking guidelines must be moderated in light of real student responses. The approach at Dundee was to use the Year 5 marking process to moderate the marking guidelines. A group of senior academics marks Year 5; the resulting marking guidelines are then used by a team of 6 markers to mark all other years. Paper-Based Testing (2).
Problems with the paper test. Moderation. Script-by-script marking is a tedious and inefficient way to moderate marking guidelines, and required significant time from senior academics. Marking. ≈ 160 scripts per year group; a team of 6 markers can together mark around 15 scripts per hour. ≈ 30 man-days just to mark scripts. Admin. Data entry for 180,000 marks. Feedback. Due to the intensity of work required, timely feedback was not achieved. Conclusion: Paper-based progress test was “unsustainable”. Paper-Based Testing (3).
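The workload figures quoted for the paper test can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the 750 scripts and 240 items per script used in the 180,000 calculation on the earlier slide, and it reads "15 scripts per hour" as the rate for the whole team of 6. The function name `marking_workload` is hypothetical.

```python
def marking_workload(scripts, items_per_script, team_rate_per_hour, team_size):
    """Return (item_marks, team_hours, marker_hours) for hand marking.

    team_rate_per_hour is the number of scripts the whole team marks per hour
    (an assumed reading of "can together mark around 15 scripts per hour").
    """
    item_marks = scripts * items_per_script        # individual marks to record
    team_hours = scripts / team_rate_per_hour      # elapsed hours for the team
    marker_hours = team_hours * team_size          # total individual effort
    return item_marks, team_hours, marker_hours


marks, team_hours, marker_hours = marking_workload(750, 240, 15, 6)
print(marks, team_hours, marker_hours)  # 180000 50.0 300.0
```

The slides do not state the length of a marking day. Under these assumptions, the quoted ≈ 30 man-days corresponds to about 10 marker-hours per day.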
Computerised pilot ran in autumn 2002 : To assess the reaction of the students to a computerised progress test; To examine the accuracy of computerised marking for progress test items; To contribute towards defining the specification of a full system. The pilot system used IAT’s free-text marking engine, AutoMark (see 2002 CAA paper). A Computerised Pilot (1).
How do we mark free-text responses by computer? IAT’s marking engine does not operate on raw text, but on the output of a sentence analyser. Computerised Marking.
How do we represent the mark scheme? Each mark scheme answer is represented as a template. Each template specifies one particular form of acceptable or unacceptable answer. A Computerised Mark Scheme.
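To make the idea of a template-based mark scheme concrete, here is a deliberately simplified sketch for the scrotal swelling item. It is not AutoMark's implementation: the real engine works on sentence analyser output, whereas this toy version matches sets of keywords in the raw text. The template contents, the `ACCEPT_TEMPLATES` structure and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical templates. Each template is a list of term groups; a response
# matches a template when every group has at least one term in the response.
ACCEPT_TEMPLATES = [
    [{"transillumination", "transilluminate"}],
    [{"shine", "shining", "pass", "goes"}, {"light"}, {"through"}],
]


def tokenize(text):
    """Lower-case the response and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(template, tokens):
    """True when every term group in the template intersects the tokens."""
    return all(group & tokens for group in template)


def mark(response):
    """Award 1 if any acceptable template matches, otherwise 0."""
    tokens = tokenize(response)
    return 1 if any(matches(t, tokens) for t in ACCEPT_TEMPLATES) else 0


print(mark("using a torch to shine a light through the swelling"))  # 1
print(mark("illumination of the swelling using a light source"))    # 0
```

A keyword matcher like this ignores word order and negation, and it would reject misspelled but creditworthy responses such as "Tranillumination of the scrotum with a torch". Handling such variation is what the sentence analysis and the moderation of mark schemes are for.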
Computerised Mark Schemes.
The Pilot. Computerised mark schemes were developed for 25 items used in previous years’ progress tests. An online test comprising the items was delivered to approximately 30 students in November / December. Student responses were computer marked, and the marking accuracy analysed. The error in computerised marking was ≈ 1%. Student feedback from the pilot was positive. A Computerised Pilot (2).
A Computerised Progress Test.
Computerised Marking (1).
Computerised Marking (2).
Computer-Assisted Moderation (1).
Computer-Assisted Moderation (2).
Subsequent to moderation of marking guidelines. Where necessary, computerised mark schemes were re-worked. Any outstanding tests were re-marked, and the results output. The re-worked computerised mark schemes are now considered “moderated”, and can be used to mark future tests with a high level of confidence. After Moderation.
The academics’ view: Being able to view all student responses to an item together is a major advantage. The process of moderation via computer is actually a positive experience for academics, and could lead to better item writing. On-screen moderation was quicker than expected: responses could be scanned quickly, and most items required little input. Computer-assisted moderation is a significant improvement over the previous “ordeal”. Conclusions on Moderation.
Data from Year 5 Moderation. 5.8% of marks changed by moderators. Most (4.2%) due to omissions in original marking guidelines or problems in item wording. Only 1.6% due to errors in computerised marking. After Re-Working the Comp. Mark Schemes. Agreement between moderated marks and computerised marking 99.4% for Year 5. 0.6% error due to system errors in marking engine. Accuracy of Marking (1).
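The agreement figure above is the percentage of item marks on which the computer and the moderators gave the same mark. A minimal sketch of that calculation follows. The function name `agreement_rate` and the sample marks are invented for illustration and are not data from the study.

```python
def agreement_rate(moderated, computer):
    """Percentage of items where the computer mark equals the moderated mark."""
    if len(moderated) != len(computer):
        raise ValueError("mark lists must be the same length")
    if not moderated:
        raise ValueError("no marks to compare")
    agreed = sum(m == c for m, c in zip(moderated, computer))
    return 100.0 * agreed / len(moderated)


# Invented example: ten item marks, one disagreement.
moderated_marks = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
computer_marks = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(agreement_rate(moderated_marks, computer_marks))  # 90.0
```

The error rate is 100 minus the agreement rate, which is how 99.4% agreement and 0.6% error relate on this slide.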
Responses from 10 Year 2 and Year 3 students were selected at random and hand marked. Table (columns: Number of Students Affected; Marks Gained / Lost by Hand Marking), surviving entries: (0.37%); 1+2 (0.74%). Mean error from the sample was 0.22%, highest error 0.74%. Accuracy of Marking (2).
As a further check, 4 Year 5 students chosen. Two who had unexpectedly over-performed, two who had unexpectedly under-performed. Responses hand marked. No discrepancies between human and computer marking encountered. Accuracy of Marking (3).
Hand-marking the progress test is onerous. 800 scripts, 270 items per script, a team of 6 markers can mark approx 15 scripts per hour. The error in hand marking has been measured at between 5% and 5.5% (two studies). This is comparable with unmoderated computerised marking (5.8%). Moderated computerised marking is significantly better - of the order of 1%. Human vs. Computerised Marking.
Advantages of the computerised system include: Moderation less painful, and more productive. After sample-based moderation, re-marking takes hours, not weeks of work. For this test, marking accuracy is actually improved. Production of reports automated, data entry not required. Moderated items can be re-used in future tests. Flexibility of test-taking is greatly increased. Conclusions (1).
The model of computerised marking and computer-assisted moderation can benefit CAA. Enables use of educationally valued free-text items. “Credibility-gap” addressed – marking can be checked and moderated on a sample of the cohort. Enables banks of moderated free-text items to be assembled. Moderation process benefits item-writing – better assessment, not just better CAA. Conclusions (2).
Project : Complete testing of remaining 150+ students. Add new items for next year’s tests. Technology : Enable item writers / academics to create, test, and modify computerised mark schemes. Integrate marking / moderation functionality with QuestionMark Perception. Future Work.