MAKING APPROPRIATE PASS-FAIL DECISIONS
DWIGHT HARLEY, Ph.D.
DIVISION OF STUDIES IN MEDICAL EDUCATION
UNIVERSITY OF ALBERTA


PASSING SCORES
- An essential component of high stakes exams
- Reaffirm standards
- Their purpose is to ensure that
  - qualified candidates pass
  - unqualified candidates do not pass
- How much is enough? Is 50% the passing score on this exam?

REAFFIRMING STANDARDS
- Performance standard: the minimally adequate level of performance to enter practice
- Passing score: the point on the score scale that separates those who are successful from those who are not

THE BASIS FOR PASSING SCORES
- Arbitrary judgment is unavoidable
- Passing scores reflect the consensus of experts on reasonable expectations for evidence of competence
- They impose discrete categories on a continuum
- They are set to serve the interests of the public and the profession
- The process should be as open as possible
  - based on as much relevant data as possible
  - with the rationale presented as clearly as possible

THE PROCESS OF SETTING PASSING SCORES
- It is unreasonable to expect 100% correct
- It is possible to construct tests with predetermined passing scores
- It is possible to adjust passing scores to achieve an acceptable pass rate
- It is possible to estimate a minimum passing score by combining estimates of the importance of individual test items

PASSING SCORE LEVEL
- Determined by the situation and purpose
- Should provide society with enough sufficiently competent practitioners
- Raising the passing score increases the average competence of those who pass, but decreases their number
- Proportions passing should remain constant
- The more relevant and demanding the requirements for writing the test, the fewer candidates are expected to fail
- If more than a small proportion of such candidates fails the exam, its validity may be subject to serious challenge

CRITERIA FOR DEFENSIBILITY
A standard setting method should:
- produce appropriate classification information
- be sensitive to candidate performance
- be sensitive to instruction
- be statistically sound
- identify the "true" standard
- be easy to implement and compute
- be credible and easily interpretable by lay people

STANDARD SETTING METHODS
- More than three dozen methods exist
- Some of the better known methods include:
  - Nedelsky
  - Angoff
  - Bookmark
  - Ebel
  - Jaeger
  - IRT methods

"THE INDUSTRY STANDARD"
The Angoff method is:
- the most commonly used method
- convenient to use
- well researched
- easily explained
- easily customized
- applicable to several response formats

ANGOFF METHOD
- Judges assign probabilities that a hypothetical minimally competent borderline candidate will be able to answer each item correctly
- For each judge, the probabilities are summed to get a minimum performance level (MPL)
- The MPLs are averaged to get the final passing score

MINIMALLY COMPETENT
- The effectiveness of the Angoff method rests on the judges' ability to accurately conceptualize a "minimally competent, borderline candidate"
- Repeated reference to a formal summary of the expected behaviours and performance indicators is required
- Judge training and calibration are essential

ANGOFF CALCULATIONS
[Table lost in transcription: for each of the 5 items, Judge 1 and Judge 2 record the probability that a borderline candidate answers correctly; each judge's column sum is that judge's MPL.]
Passing score for this test is 3.1 items correct out of 5.
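The arithmetic behind this slide can be sketched in a few lines. The item ratings below are hypothetical (the slide's original table did not survive transcription); they were chosen so that the result matches the slide's stated passing score of 3.1 out of 5.

```python
# Angoff passing-score calculation (sketch).
# Each judge rates the probability that a minimally competent,
# borderline candidate answers each item correctly.
# These ratings are hypothetical, not the slide's original values.
ratings = {
    "Judge 1": [0.7, 0.6, 0.8, 0.5, 0.6],
    "Judge 2": [0.6, 0.5, 0.7, 0.6, 0.6],
}

# Each judge's minimum performance level (MPL) is the sum of their ratings.
mpls = {judge: sum(probs) for judge, probs in ratings.items()}

# The passing score is the mean of the judges' MPLs.
passing_score = sum(mpls.values()) / len(mpls)

print({j: round(m, 1) for j, m in mpls.items()})  # {'Judge 1': 3.2, 'Judge 2': 3.0}
print(round(passing_score, 1))                    # 3.1
```

With real data, each judge's column of probabilities simply replaces the hypothetical lists.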

A MINOR VARIANT
- Judges are asked to imagine a pool of 100 minimally competent borderline students, then estimate the number of these students who would answer the item correctly
- This reduces the cognitive complexity of the task
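Numerically, the variant reduces to the same arithmetic: dividing each count by 100 recovers an ordinary Angoff probability. A minimal sketch, with hypothetical counts for one judge:

```python
# The 100-student variant (sketch). A judge estimates, for each item,
# how many of 100 borderline students would answer it correctly.
# These counts are hypothetical.
counts = [70, 60, 80, 50, 60]

# Each count / 100 is an ordinary Angoff probability, so summing the
# counts and dividing by 100 gives this judge's MPL directly.
mpl = sum(counts) / 100

print(mpl)  # 3.2
```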

VARIATIONS ON A THEME
- Scales
- Iterative process, with feedback between rounds:
  - judges' results
  - past item performance (p-values, % passing)
- Yes/No procedure

SCALES
Probability scales are sometimes provided to simplify the process. For example:
- 5%, 20%, 40%, 60%, 75%, 90%, 95%
- 0%, 5%, 10%, 15%, ..., 95%, 100%
- 20%, 25%, 30%, ..., 95%, 100%
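One way such a scale can be applied in software is to snap each judge's raw estimate to the nearest allowed value. The seven-point scale below is the first one listed above; nearest-value snapping is an assumption about how the scale is used, not something the slide specifies.

```python
# Snapping raw estimates to a fixed probability scale (sketch).
# Nearest-value snapping is an assumed policy for illustration.
SCALE = [0.05, 0.20, 0.40, 0.60, 0.75, 0.90, 0.95]

def snap(estimate, scale=SCALE):
    """Return the allowed scale value closest to a judge's raw estimate."""
    return min(scale, key=lambda s: abs(s - estimate))

print(snap(0.68))  # 0.75
print(snap(0.33))  # 0.4
```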

ANGOFF WITH ITERATION
- The most commonly used modification
- "Angoff-ing" is done a number of times
- The time between rounds is used for discussion among the judges
- The intent is to reduce variability among the judges' item estimates

NORMATIVE DATA
- Normative or impact data is presented just prior to the final iteration
- This improves inter-rater reliability
- The greatest impact is on items that have been greatly over- or underestimated

YES/NO PROCEDURE
- Judges decide whether or not a single minimally competent borderline student would answer the item correctly
- An attempt to simplify the cognitive complexity of the judges' task
- Gives results comparable to the traditional method

YES/NO CALCULATIONS
[Table lost in transcription: for each item, Judge 1 and Judge 2 record yes (1) or no (0); each judge's column sum is that judge's MPL. Here the MPLs are 3 and 2.]
Passing score = average of MPLs = (3 + 2) / 2 = 2.5 items correct
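As with the Angoff table, the item-level judgments on this slide were lost in transcription. The sketch below uses hypothetical yes/no votes, chosen only so that the judges' MPLs (3 and 2) and the passing score (2.5) match the slide.

```python
# Yes/No Angoff calculation (sketch).
# 1 = a borderline student would answer correctly, 0 = would not.
# These votes are hypothetical, not the slide's original values.
judgments = {
    "Judge 1": [1, 1, 0, 1, 0],
    "Judge 2": [1, 0, 0, 1, 0],
}

# Each judge's MPL is simply the count of "yes" votes.
mpls = {judge: sum(votes) for judge, votes in judgments.items()}
passing_score = sum(mpls.values()) / len(mpls)

print(mpls)           # {'Judge 1': 3, 'Judge 2': 2}
print(passing_score)  # 2.5
```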

IN AN EMERGENCY
- When a committee is not available, Angoff-ing can be done solo
- Assign Angoff values to each item and sum the values
- Ask a colleague to review your Angoff assignments
- Use an item analysis as a reality check

ROUNDING PASSING SCORES
- Derived passing scores rarely come out to exact whole numbers
- Rounding may have an impact on the pass/fail rate
- Consider the consequences of rounding
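A quick way to weigh those consequences is to compute the pass rate under each rounding choice. The candidate scores below are hypothetical; the 3.1 cut score echoes the earlier Angoff example.

```python
import math

# Effect of rounding a derived passing score on the pass rate (sketch).
# The number-correct scores below are hypothetical.
cut = 3.1
scores = [2, 3, 3, 4, 4, 5, 3, 4, 2, 5]

def pass_rate(scores, cut):
    """Fraction of candidates scoring at or above the cut."""
    return sum(s >= cut for s in scores) / len(scores)

# Rounding the cut down to 3 passes candidates who scored exactly 3;
# rounding up to 4 (or leaving it at 3.1) fails them.
print(pass_rate(scores, math.floor(cut)))  # 0.8
print(pass_rate(scores, math.ceil(cut)))   # 0.5
print(pass_rate(scores, cut))              # 0.5
```

With integer number-correct scores, any cut between two whole numbers behaves like rounding up, so the real decision is which whole number to use.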

Questions?