Standard setting Determining the pass mark - OSCEs.

The Old Way (1) …..I think that the pass mark in this exam is probably about here….. there is a natural break I can see dividing the bottom from the rest

The Old Way (2) …..The pass mark is 60%.

Assessments
* Match Outcome Objectives
* Integrated
* Clinical Competencies tested throughout
* Progressive testing
* House style
* From approved list
* Student information
* External Examiners

Standard setting: Knowledge / Skills

Standard setting: Knowledge: Angoff, Hofstee. Skills: borderline method.

Borderline procedure Used for clinical (OSCE) examinations

Borderline procedure Used for clinical (OSCE) examinations
* Examiners score the student's performance at the station, e.g. 17/20
* Examiners judge the overall performance: clear pass / borderline / clear fail
* Mark sheets rated borderline are identified and the scores of borderline students averaged
* The process is repeated for each station
* Calculate the median borderline score across all stations
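The steps above can be sketched in code. This is a minimal illustration only; the stations, scores, and examiner judgements are invented example data, not from any real exam.

```python
# Illustrative sketch of the borderline method described above.
# Station names, scores, and judgements are made-up example data.
from statistics import mean, median

# For each station: list of (score, overall judgement) per candidate.
stations = {
    "station_1": [(17, "pass"), (12, "borderline"), (9, "fail"), (13, "borderline")],
    "station_2": [(15, "pass"), (11, "borderline"), (8, "fail")],
}

# Average the scores of candidates rated "borderline" at each station...
borderline_means = [
    mean(score for score, judgement in results if judgement == "borderline")
    for results in stations.values()
]

# ...then take the median of those per-station averages as the overall cut score.
cut_score = median(borderline_means)
print(cut_score)
```

With these toy data the borderline candidates average 12.5 and 11 at the two stations, giving a cut score midway between them.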

Borderline Method

[Figure: score distribution for Station 15 (static), Oral Lesions: count of candidates at each score, with pass/fail judgements marked]

[Figure: score distribution showing the borderline group between the fail and pass regions; median borderline score 58.41]

How wide should this band be? Plus or minus one standard deviation? One standard error? Or something else?

The Standard Error of Measurement
* depends on the reliability of the test (R)
* depends on the standard deviation of the test (SD)
SEM = SD × √(1 − R)
* acts as a confidence interval in high stakes situations

R = true score variance / (true score variance + error variance)
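The SEM formula is easy to apply once SD and R are known. A minimal sketch, with an invented SD and reliability chosen only for round numbers:

```python
# Sketch: standard error of measurement from test SD and reliability.
# The SD and reliability values here are invented examples.
import math

sd = 8.0            # standard deviation of exam scores
reliability = 0.75  # e.g. Cronbach's alpha for the exam

sem = sd * math.sqrt(1 - reliability)
print(sem)  # 4.0
```

A band of one SEM either side of the cut score then acts as the confidence interval mentioned above: candidates inside the band cannot be confidently classified as passing or failing.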

[Figure: borderline group between the fail and pass regions; pass and fail cut scores set 1 SEM either side of the median borderline score]

[Table of exam statistics: number of students, mean score (%), overall borderline score (%), standard deviation, standard error of measurement, pass mark (%), reliability]

Internal Reliability of the Exam: Cronbach's Alpha
Cronbach's alpha shows whether randomly split halves of the exam (split by item) vary together: it is equivalent to the mean of all possible split-half correlations across the items/stations for the population of students.
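Cronbach's alpha can be computed directly from a candidates-by-stations score matrix using the standard item-variance formula. The scores below are invented example data:

```python
# Sketch: Cronbach's alpha for a small score matrix (candidates x stations).
# Uses the standard formula alpha = k/(k-1) * (1 - sum(item vars)/total var).
def cronbach_alpha(scores):
    # scores: list of candidate rows, each a list of station scores
    k = len(scores[0])   # number of items/stations
    n = len(scores)      # number of candidates

    def variance(xs):    # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented scores: 4 candidates x 3 stations.
scores = [
    [14, 15, 13],
    [10, 11, 9],
    [17, 18, 16],
    [12, 12, 11],
]
print(round(cronbach_alpha(scores), 3))  # 0.997
```

The toy matrix is deliberately consistent across stations, so alpha comes out very high; a real OSCE would typically show lower values.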

[Figure: borderline grading scale from Fail through Borderline and Pass to Good Pass, with the borderline mark, pass mark, distinction mark and top mark indicated]

Reducing measurement error
1. Increase reliability of exam
2. Compromise with feasibility/cost
* Content specificity
* More items in the exam
* Blueprinting
* Quality assurance of item writing
* Training examiners/standardised patients
* Feedback from exam performance