Marks for identifying uncertainty: Stimulation of learning through Certainty-Based Marking
Tony Gardner-Medwin, Physiology, University College London (www.ucl.ac.uk/LAPT)
Cambridge Assessment, June 2009
Starting points (you may agree or disagree!)
The nature of assessment affects how students learn & think
Objective tests/exercises can stimulate learning & understanding
Formative assessment is more important than summative
Different Q types suit different situations, e.g. T/F, SBA, free text
Scaling to % above chance (% Knowledge) should be universal
Negative marking can be either really constructive or really awful
Students & kids can enjoy assessment if it is stimulating, fair, varied, challenging, immediately rewarding, not humiliating: like a game.
The take-home message: We should reward the acknowledgment of uncertainty
1. How Certainty-Based Marking works
2. How it relates to probability & knowledge
3. How students react & use it
4. CBM as summative assessment
5. Why isn't it used more?
How well do students discriminate reliability?
Ordinary words we use to describe Knowledge
✓ knowledge
✓ uncertainty
don't know
✗ misconception
✗ delusion
Scale labels: decreasing certainty about what is true; increasing certainty about something false; increasing "ignorance".
Knowledge is a function of certainty (confidence, degree of belief)
There are states a lot worse than acknowledged ignorance
"It's not ignorance does so much damage - it's knowin' so derned much that ain't so." (attrib. J. Billings)
"I was gratified to be able to answer promptly, and I did! - I said I didn't know." (Mark Twain)
Student Learning: Principles they readily understand
You need to know the reliability of your knowledge to use it
Confident errors are serious, requiring attention to explanations
Expressing uncertainty when you are uncertain is a good thing
Confidence is about understanding why things cannot be otherwise, not about personality
If over- or under-confident, you must calibrate through practice
Reflection and justification are essential study habits
In evaluation surveys, a majority of students have always said they like CBM, finding it useful and fair. They asked for it to be included in exams, and after 5 years of exam use at UCL they voted 52% : 30% to retain it (in 2005/6), though this was rejected by the conservative medical establishment.
Why test knowledge? Google makes it so easy to find!
In olden times, you had to rely on your own stored information.... you would make a best choice and go for it
School leavers have more sparse (though broader) stored info, but still have a "go for it" culture - to a scary extent!.... responding with an immediate idea & not thinking much
Cheap information (& increased teamwork) require :-
1) Identifying things you will get wrong and not Google! (unknown unknowns rather than don't knows)
2) Judging reliability and uncertainty correctly.... setting a threshold for seeking help.... evaluating conflicting and corroborating information
These lessons are core things that CBM teaches
[Diagram labels: Evidence; Nuggets of knowledge; Inference; Network of Understanding; Choice; Certainty (Degree of Belief)]
To understand = to link correctly the facts that bear on an issue.
CBM places greater demands on justification & stimulates connections
Thinking about uncertainty / justification develops understanding of relationships
Using CBM
1. With UCL LAPT software, online or from CD
2. With Moodle - work in progress
3. With commercial software - some progress, more needed!
4. Secure exams, with OMR Cards [Speedwell]
CBM quite closely follows the ideal ignorance measure
The student loses about 3 marks per 'bit' of ignorance - up to a maximum of 3 bits
What's a good mark scheme?
[Figure: panels plotting the mark expected on average against confidence (estimated probability correct, 0% to 100%), with lines for reply / no reply or for high / mid / low / no reply. Panel titles: No negative marking; Fixed negative marking: +/; Hevner 1932; Davies 2002; Hassmen & Hunt 94; Gardner-Medwin 06.]
The standard LAPT (1,2,3 / 0,-2,-6) scheme seems better than any of these.
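As a rough illustration of why the LAPT marks motivate honest reporting, the sketch below computes the mark expected on average at each certainty level. It assumes that (1,2,3 / 0,-2,-6) means 1, 2 or 3 marks for a correct answer and 0, -2 or -6 for a wrong one at C=1, 2, 3. The function names are hypothetical and not taken from the LAPT software.

```python
# Assumed reading of the LAPT scheme: (mark if correct, mark if wrong) for C = 1, 2, 3.
LAPT_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def expected_mark(p_correct: float, level: int) -> float:
    """Average mark for an answer that has probability p_correct of being right."""
    right, wrong = LAPT_MARKS[level]
    return p_correct * right + (1.0 - p_correct) * wrong


def best_level(p_correct: float) -> int:
    """Certainty level with the highest expected mark (ties go to the lower level)."""
    return max(sorted(LAPT_MARKS), key=lambda c: expected_mark(p_correct, c))


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7, 0.75, 0.85, 0.95):
        marks = {c: round(expected_mark(p, c), 2) for c in LAPT_MARKS}
        print(f"p={p:.2f}  expected marks={marks}  best C={best_level(p)}")
```

Under this reading, the expected marks are p, 4p - 2 and 9p - 6, so C=1 is best below p = 2/3, C=2 between 2/3 and 0.8, and C=3 above 0.8. A student therefore does best on average by choosing the level that matches their true estimate.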
CBM increases the reliability of exam data
'Reliability' indicates to what extent a score measures something about the student's ability, as opposed to 'luck' or chance.
CBM increases the effective test length
With increased 'Reliability' you don't need so many exam questions to get data of equal quality.
CBM increases the reliability of exam data with True/False Questions
To achieve these increases using only % correct would have required on average 58% more questions.
Reliability and efficiency of exams (Quality of data / number of questions) are enhanced with CBM
Data from 6 medical student exams ( T/F Qs each, >300 students).
Certainty-based scores predict the conventional score on different Qs better than conventional scores do.
How should one handle students with poor calibration?
Significantly overconfident in exam: 2 students (1%), e.g. 50%
Significantly underconfident in exam: 41 students (14%), e.g. 83%
Maybe one shouldn't penalise such students
Adjusted confidence-based score: Mark the set of answers at each C level as if they were entered at the C level that gives the highest score**.
** (first combining sets if % correct is not in ascending order)
Mean benefit = 1.5% ± 2.1% (median 0.6%)
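The adjusted score described on this slide can be sketched as follows. This is a simplified illustration, not the procedure used for the exam data: it assumes the LAPT marks (1,2,3 / 0,-2,-6), and it deliberately omits the preliminary step of combining sets when % correct is not in ascending order. All names are hypothetical.

```python
# Assumed LAPT marks: (mark if correct, mark if wrong) for C = 1, 2, 3.
LAPT_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def score_set(n_correct: int, n_wrong: int, level: int) -> int:
    """Total mark for a set of answers if all were marked at the given level."""
    right, wrong = LAPT_MARKS[level]
    return n_correct * right + n_wrong * wrong


def raw_and_adjusted(counts: dict) -> tuple:
    """counts maps the C level a student entered to (n_correct, n_wrong).

    Returns (raw score, adjusted score). The adjusted score re-marks each set
    at whichever level gives that set the highest total.
    NOTE: the 'combine sets first' step from the slide is not implemented here.
    """
    raw = sum(score_set(c, w, level) for level, (c, w) in counts.items())
    adjusted = sum(
        max(score_set(c, w, alt) for alt in LAPT_MARKS)
        for (c, w) in counts.values()
    )
    return raw, adjusted


if __name__ == "__main__":
    # An underconfident student: mostly right, but mostly entered at C=1.
    underconfident = {1: (25, 5), 2: (10, 2), 3: (8, 0)}
    print(raw_and_adjusted(underconfident))
```

For a well-calibrated student the two scores coincide, so the adjustment only helps students whose chosen levels did not match their actual reliability.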
Scaling CBM scores to be directly comparable with conventional scores
NCOR (% Knowledge) is based on number correct, scaled so guesses (50% probability correct) give on average 0%.
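A minimal sketch of this scaling, assuming the usual linear form (fraction correct minus chance, divided by one minus chance). The function name and the `chance` parameter are illustrative, with 0.5 as the True/False default.

```python
def percent_knowledge(n_correct: int, n_questions: int, chance: float = 0.5) -> float:
    """Scale number correct so pure guessing averages 0% and all correct gives 100%.

    chance is the probability of guessing a question right
    (0.5 for True/False; 0.2 would apply to a 5-option question).
    """
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must be in [0, 1)")
    fraction_correct = n_correct / n_questions
    return 100.0 * (fraction_correct - chance) / (1.0 - chance)


if __name__ == "__main__":
    print(percent_knowledge(75, 100))  # 75% correct on True/False questions
```

With this scaling, 75% correct on True/False questions corresponds to 50% knowledge, and scores below chance come out negative.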
Equivalence of scaled CBM scores** and conventional scores for standard setting.
True/False and SBA (5 option) components of a formative test for 345 students were ranked by conventional scores. Then for each decile, mean CBS scores are plotted against % correct above chance (% knowledge).
** $\mathrm{CBS} = \left(\dfrac{\mathrm{Total} - \mathrm{Chance}}{\mathrm{Max} - \mathrm{Chance}}\right)^{p} \times 100\%$, where $p = 0.6$ for TF, $0.48$ for SBA (5 options)
Gardner-Medwin & Curtin 2007 REAP conference, data from Imperial College
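The scaled score (CBS) in the footnote can be sketched as below, reading the trailing p as an exponent on the bracketed ratio. That reading is an assumption, as is the sign-preserving treatment of totals below chance, which the slide does not specify. The example values for Chance and Max (0.5 and 3 marks per True/False question) are also assumptions chosen for illustration.

```python
def scaled_cbs(total: float, chance: float, maximum: float, p: float) -> float:
    """CBS = ((Total - Chance) / (Max - Chance)) ** p * 100, as a percentage.

    total, chance and maximum are CBM mark totals on the same test.
    p is the exponent from the slide (0.6 for T/F, 0.48 for 5-option SBA).
    """
    if maximum <= chance:
        raise ValueError("maximum must exceed chance")
    ratio = (total - chance) / (maximum - chance)
    # Assumption: keep the sign for below-chance totals, since a negative
    # number cannot be raised to a fractional power directly.
    magnitude = abs(ratio) ** p
    return 100.0 * (magnitude if ratio >= 0 else -magnitude)


if __name__ == "__main__":
    # Hypothetical 100-question True/False test:
    # Max = 3 marks per question, Chance = 0.5 marks per question (assumed).
    print(round(scaled_cbs(total=175, chance=50, maximum=300, p=0.6), 1))
```

Because p is less than 1, the scaling lifts mid-range CBM totals, which is what lets the scaled scores line up with % knowledge for standard setting.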
Why doesn't everybody already use CBM? - a puzzle
Enthusiasm was exhausted before the age of 'online'
Some CBM methods were complex, opaque or non-motivating
Reluctance to treat certainty as integral to knowledge
Mistaken worries about 'personality bias'
Under-rating of self-assessment & practice as learning tools
Worry that CBM would need new questions
Worry that CBM would upset standard-setting
Inertia and vested interests
A few of the names associated with confidence testing in education
Andrew Ahlgren, Jim Bruno, Confucius, Robert Ebel, Jack Good, Kate Hevner, Darwin Hunt, Dieudonné Leclercq, Emir Shuford
London Colleagues: Mike Gahan, David Bender, Nancy Curtin
"When you know a thing, to hold that you know it. And when you do not know a thing, to allow that you do not know it. This is knowledge." "Learning without thought is a waste of time." (Confucius)
We fail if we mark a lucky guess as if it were knowledge. We fail if we mark misconceptions as no worse than ignorance.
Lessons from experience with CBM
Practice is needed before use in exams
Exams should re-use questions from an open database only very sparingly
Over-confidence and diffidence are both unhealthy traits that can be moderated by practice to achieve good calibration
With multi-option questions, students tend (at least initially) to over-estimate reliability
Standard setting - it is easy (but important!) to scale CBM marks to match familiar scales based on number correct.
Some Questions about CBM!
Are there problems using it?
Why doesn't my VLE support CBM?
Do students need practice?
Isn't computer marked assessment just factual?
Does CBM increase retention?
Do I need new questions?
What are the best Q types?
What about school education?
Is it relevant to my subject, where opinions differ?
Isn't it bad to encourage guessing?
What if my only assessments are exams?
How do I convince an exam board?
Isn't it right/wrong that really matters?
Response to LAPT numeracy exercises in medical 1st year
[Survey charts, vertical axis in %. First statement: "I think about confidence assessment", answered Every time / Most of the time / Rarely / Never / No reply. Second statement: "I sometimes change my answer while thinking about confidence assessment", rated from 1 (Disagree) to 5 (Agree).]
Students really take to confidence-based marking
Principles that students seem readily to understand :-
both under- and over-confidence are impediments to learning
expressing uncertainty when you are uncertain is a good thing
confident errors are far worse than acknowledged ignorance and are a wake-up call (-6!) to pay attention to explanations
immediate feedback while still thinking about the basis of your answer is a hugely valuable study aid
sound confidence judgement is a valued intellectual skill in every context, and one they can improve
checking an answer and rereading the question are worthwhile
thinking about the basis and reliability of answers can help tie bits of knowledge together (to form understanding)
The correlation, across students, between scores on one set of questions and another is higher for confidence scores than for simple scores. But perhaps they are just measuring ability to handle confidence? No. Confidence scores are better than simple scores at predicting even the conventional scores on a different set of questions. This can only be because they are a statistically more efficient measure of knowledge.
"When you know a thing, to hold that you know it. And when you do not know a thing, to allow that you do not know it. This is knowledge." (Confucius)
Known Knowns... things we know that we know.
Known Unknowns... things that we know that we don't know.
Unknown Unknowns... things we do not know we don't know.
(Donald Rumsfeld)
Will it snow next weekend?
Does a (good) weather forecaster have knowledge? - obviously yes, but expressed through a probability
How can you measure and reward this knowledge? - the origin of CBM >100 years ago.
Does insulin raise blood glucose levels?
Similar, even though the Q is not about a probability. - the probability is your certainty that your answer is right
The key is to have a "proper" or "motivating" reward scheme, which ensures that the person does best by expressing their true level of uncertainty
CBM data is a more valid measure of ability
'Validity' means it measures what you want, rather than just something easily measured.
SUMMARY: Why CBM?
Get students to think more carefully
Reward recognition of uncertainty, either personal or in a group
Highlight misconceptions
Engage students more - the game element of CBM
Encourage criticism of Qs (intolerance of ambiguity or looseness)
In general: enhance self-assessment as a learning experience
NB All of the above arise with little or no practice with CBM. The following do require practice:
More searching diagnostic data
More valid and reliable assessment data (But NB with CBM you have conventional assessment data too.)