Making Appropriate Pass-Fail Decisions
Dwight Harley, Ph.D.
Division of Studies in Medical Education, University of Alberta

Passing Scores
- Essential component of high-stakes exams
- Reaffirm standards
- Their purpose is to ensure that
  - qualified candidates pass
  - unqualified candidates do not pass
- How much is enough?
- Is 50% the passing score on this exam?

Reaffirming Standards
- Performance standard
  - Minimally adequate level of performance to enter practice
- Passing score
  - Point on the score scale that separates those who are successful from those who are not

The Basis for Passing Scores
- Arbitrary judgment unavoidable
- Reflect consensus of experts on reasonable expectations for evidence of competence
- Imposing discrete categories on a continuum
- Set to serve the interests of public and profession
- Process should be as open as possible
- Based on as much relevant data as possible
- Rationale presented as clearly as possible

Process of Setting Passing Scores
- Unreasonable to expect 100% correct
- Possible to construct tests with predetermined passing scores
- Possible to adjust passing scores to achieve an acceptable pass rate
- Possible to estimate a minimum passing score by combining estimates of the importance of individual test items

Passing Score Level
- Determined by the situation and purpose
- Provide society with enough sufficiently competent practitioners
- Raising the passing score increases the average competence of those who pass but decreases their number
- Proportions passing should remain constant
- The more relevant and demanding the requirements for writing the test, the fewer are expected to fail
- If more than a small proportion of successful candidates fail the exam, its validity may be subject to serious challenge.

Criteria for Defensibility
A standard-setting method should:
- produce appropriate classification information
- be sensitive to candidate performance
- be sensitive to instruction
- be statistically sound
- identify the “true” standard
- be easy to implement and compute
- be credible and easily interpretable by lay people

Standard Setting Methods
- More than 3 dozen methods
- Some of the better-known methods include
  - Nedelsky
  - Angoff
  - Bookmark
  - Ebel
  - Jaeger
  - IRT methods

“The Industry Standard”
The Angoff Method is:
- the most commonly used method
- convenient to use
- well-researched
- easily explained
- easily customized
- applicable to several response formats

Angoff Method
- Judges assign probabilities that a hypothetical minimally competent borderline candidate will be able to answer each item correctly.
- For each judge, probabilities are summed to get a minimum performance level (MPL)
- MPLs are averaged to get a final passing score

Minimally Competent
- The effectiveness of the Angoff method rests on the judges’ ability to accurately conceptualize a “minimally competent, borderline candidate.”
- Repeated reference to a formal summary of the behaviours and performance indicators is required
- Judge training and calibration are essential

Angoff Calculations

Item     Judge 1   Judge 2
1        1.00      0.85
2        0.65      0.50
3        0.80      0.75
4        0.45      0.50
5        0.30      0.40
MPL_j    3.2       3.0

Passing score for this test is 3.1 items correct out of 5.
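The arithmetic on this slide can be checked with a short script. This is only a sketch: the function name `angoff_passing_score` and the dictionary layout are illustrative choices, not part of the presentation; the ratings are the ones shown in the table.

```python
from statistics import mean

# Ratings from the slide: one list per judge, one probability per item.
ratings = {
    "Judge 1": [1.00, 0.65, 0.80, 0.45, 0.30],
    "Judge 2": [0.85, 0.50, 0.75, 0.50, 0.40],
}

def angoff_passing_score(ratings):
    """Return (per-judge MPLs, passing score) for a dict of judge -> item probabilities."""
    # Each judge's MPL is the sum of that judge's item probabilities.
    mpls = {judge: sum(probs) for judge, probs in ratings.items()}
    # The passing score is the average of the judges' MPLs.
    return mpls, mean(list(mpls.values()))

mpls, passing_score = angoff_passing_score(ratings)
for judge, mpl in mpls.items():
    print(f"{judge}: MPL = {mpl:.2f}")
print(f"Passing score = {passing_score:.2f} items correct out of 5")
```

Run as written, the sketch reproduces the MPLs of 3.2 and 3.0 and the passing score of 3.1 given on the slide.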

A Minor Variant
- Judges are asked to imagine a pool of 100 minimally competent borderline students and then estimate the number of these students who would answer the item correctly
- Reduces cognitive complexity of the task
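Numerically, this variant is equivalent to the probability version: a count out of 100 is the same estimate expressed as a whole number. A minimal sketch, assuming one judge and five items; the counts below are hypothetical and chosen only for illustration.

```python
# Hypothetical counts: how many of 100 imagined borderline students
# one judge expects to answer each of five items correctly.
counts_out_of_100 = [100, 65, 80, 45, 30]

# Each count divided by 100 is the probability the standard method asks for.
probabilities = [count / 100 for count in counts_out_of_100]

# Summing the whole-number counts first and dividing once gives the judge's MPL.
mpl = sum(counts_out_of_100) / 100
print(f"MPL = {mpl} items correct out of {len(counts_out_of_100)}")
```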

Variations on a Theme
- Scales
- Iterative process
- Feedback between rounds
- Judges’ results
- Past item performance
- p-values
- % passing
- Yes/No procedure

Scales
Probability scales are sometimes provided to simplify the process. For example:
- 5%, 20%, 40%, 60%, 75%, 90%, 95%
- 0%, 5%, 10%, 15% … 95%, 100%
- 20%, 25%, 30% … 95%, 100%

Angoff with Iteration
- Most commonly used modification.
- “Angoff-ing” is done a number of times.
- Time between rounds is used for discussion among judges.
- Intent is to reduce variability among judges on item estimates.
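One way a committee might decide which items to discuss between rounds is to look at how far apart the judges' estimates are on each item. The sketch below uses the range (highest minus lowest estimate) as the measure of variability and a cut-off of 0.10; both are assumptions made for illustration, not something the presentation prescribes. The ratings are those from the Angoff Calculations slide.

```python
# Round-one ratings: one list per judge, one probability per item.
ratings = {
    "Judge 1": [1.00, 0.65, 0.80, 0.45, 0.30],
    "Judge 2": [0.85, 0.50, 0.75, 0.50, 0.40],
}

# Hypothetical threshold: discuss items whose estimates differ by more than this.
DISCUSS_ABOVE = 0.10

def item_spreads(ratings):
    """Range (max - min) of the judges' estimates for each item, rounded to 2 places."""
    per_item = zip(*ratings.values())  # regroup the ratings item by item
    return [round(max(estimates) - min(estimates), 2) for estimates in per_item]

spreads = item_spreads(ratings)
to_discuss = [item for item, spread in enumerate(spreads, start=1)
              if spread > DISCUSS_ABOVE]
print("Spread per item:", spreads)
print("Items to discuss before the next round:", to_discuss)
```

With more than two judges, a standard deviation per item would be a natural alternative to the range.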

Normative Data
- Normative or impact data is presented just prior to the final iteration.
- Improves inter-rater reliability.
- Greatest impact on items that have been greatly over- or underestimated.
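A rough sketch of how past item performance could be set beside the judges' estimates follows. The p-values (proportion of past candidates answering each item correctly) and the 0.25 flagging threshold are hypothetical; only the judges' ratings come from the Angoff Calculations slide. Because p-values describe all candidates rather than borderline ones, a gap is a prompt for discussion, not proof of an error.

```python
# Judges' estimates from the Angoff Calculations slide.
ratings = {
    "Judge 1": [1.00, 0.65, 0.80, 0.45, 0.30],
    "Judge 2": [0.85, 0.50, 0.75, 0.50, 0.40],
}

# Hypothetical p-values from a past administration, one per item.
p_values = [0.95, 0.60, 0.45, 0.50, 0.70]

# Hypothetical threshold for a "great" over- or underestimate.
THRESHOLD = 0.25

def flag_items(ratings, p_values, threshold):
    """Map item number -> 'overestimated' or 'underestimated' for large gaps."""
    flags = {}
    for item, estimates in enumerate(zip(*ratings.values()), start=1):
        mean_estimate = sum(estimates) / len(estimates)
        gap = mean_estimate - p_values[item - 1]
        if gap > threshold:
            flags[item] = "overestimated"
        elif gap < -threshold:
            flags[item] = "underestimated"
    return flags

flagged = flag_items(ratings, p_values, THRESHOLD)
print(flagged)
```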

Yes/No Procedure
- Judges decide whether a single minimally competent borderline student would answer the item correctly
- Attempt to simplify the cognitive complexity of the judges’ task
- Comparable results to the traditional method

Yes/No Calculations

Item     Judge 1   Judge 2
1        1         1
2        1         0
3        1         1
4        0         0
5        0         0
MPL_j    3         2

Passing score = Average of MPLs = (3+2)/2 = 2.5 items correct
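The same calculation as a short script, using the judgments from the table (1 = yes, 0 = no). The variable names are illustrative only.

```python
# Yes/No judgments from the slide: 1 means the borderline student
# would answer the item correctly, 0 means they would not.
yes_no = {
    "Judge 1": [1, 1, 1, 0, 0],
    "Judge 2": [1, 0, 1, 0, 0],
}

# Each judge's MPL is simply the number of "yes" judgments.
mpls = {judge: sum(votes) for judge, votes in yes_no.items()}

# The passing score is the average of the judges' MPLs.
passing_score = sum(mpls.values()) / len(mpls)

print(mpls)
print(f"Passing score = {passing_score} items correct")
```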

In an Emergency
- When a committee is not available, Angoff-ing can be done solo
- Assign Angoff values to each item and sum the values
- Ask a colleague to review your Angoff assignments
- Use an item analysis as a reality check

Rounding Passing Scores
- Rarely do derived passing scores produce exact whole numbers
- Rounding may have an impact on the pass/fail rate
- Consider the consequences of rounding
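To make the consequence concrete, the sketch below compares the pass rate when a derived passing score of 2.5 (the Yes/No result) is rounded down versus rounded up. The ten candidate scores are hypothetical, and `pass_rate` is an illustrative helper, not part of the presentation.

```python
import math

def pass_rate(scores, cut):
    """Fraction of candidates whose score is at or above the cut score."""
    return sum(score >= cut for score in scores) / len(scores)

# Hypothetical raw scores (items correct out of 5) for ten candidates.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

# Derived passing score that is not a whole number.
derived_cut = 2.5

for label, cut in [("round down", math.floor(derived_cut)),
                   ("round up", math.ceil(derived_cut))]:
    print(f"{label}: cut = {cut}, pass rate = {pass_rate(scores, cut):.0%}")
```

With these made-up scores, rounding down passes 9 of 10 candidates and rounding up passes 7 of 10, so the choice of rounding rule alone changes the outcome for two candidates. Python's built-in `round` is avoided here because it rounds exact halves to the nearest even number, which may not be the intended policy.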

Questions?
