Setting Performance Standards
EPSY 8225
Cizek, G.J., Bunch, M.B., & Koons, H. (2004). An NCME Instructional Module on Setting Performance Standards: Contemporary Methods. Educational Measurement: Issues & Practice, 23(4).

Content Standards
The knowledge, skills, and abilities examinees are expected to achieve. Content standards define the "what" of testing used to determine outcomes, licensure, certification, or mastery.

Performance Standards
Cut scores, achievement levels, passing scores. Performance standards define the "how much" or "how well" of testing, in terms of what examinees are expected to do to be categorized in one group or another. Performance standards are needed because decisions have to be made.

Recent Attention to Standards
- Rise of standards-referenced testing
- New research on standard setting
- Standards for Educational and Psychological Testing
- Federal legislation: IDEA (1997), NCLB (2001), ESSA (2015)

Standards for Educational & Psychological Testing (2014) Scale scores, proficiency levels, and cut scores can be central to the use and interpretation of test scores. For that reason, their defensibility is an important consideration in test score validation for the intended purposes. (p. 95)

Testing Standards Cut score: a specified point on a score scale, such that scores at or above that point are reported, interpreted, or acted upon differently from scores below that point.
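
A small sketch may make the "at or above" semantics concrete. The Python below is illustrative only; the cut scores and level names are invented, not taken from any real test.

```python
# Minimal sketch (not from the module): how cut scores partition a
# score scale. Cut scores and level names below are invented.

def classify(score, cuts, labels):
    """Assign a performance level; a score equal to a cut falls in the
    higher category, matching the "at or above" language above."""
    return labels[sum(score >= c for c in cuts)]

cuts = [440, 450, 470]  # hypothetical ascending scale points
labels = ["Does not meet", "Partially meets", "Meets", "Exceeds"]

for s in [439, 440, 455, 470]:
    print(s, "->", classify(s, cuts, labels))
```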

Testing Standards Cut scores may aid in formulating rules for reaching decisions on the basis of test performance. It should be recognized, however, that the likelihood of misclassification will generally be relatively high for persons with scores close to the cut scores. (p. 97)
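
As a rough illustration of why misclassification risk is highest near the cut, the sketch below assumes normally distributed measurement error with a constant SEM; the cut score and SEM values are invented for illustration.

```python
# Sketch: probability of landing on the wrong side of the cut, assuming
# observed scores are normal around the true score with constant SEM.
from statistics import NormalDist

def p_misclassified(true_score, cut, sem):
    """P(the observed score falls on the wrong side of the cut)."""
    p_below = NormalDist(true_score, sem).cdf(cut)  # P(observed < cut)
    return 1 - p_below if true_score < cut else p_below

cut, sem = 450, 5  # hypothetical values
for t in [435, 445, 449, 451, 455, 465]:
    print(f"true score {t}: P(misclassified) = {p_misclassified(t, cut, sem):.3f}")
```

Running this shows the probability climbing toward .5 as the true score approaches the cut from either side, which is the point the Standards make here.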

Testing Standards A critical step in the development and use of some tests is to establish one or more cut scores dividing the score range to partition the distribution of scores into categories. (p. 100) Such cut scores provide the basis for using and interpreting test results. Thus, in some situations, the validity of test score interpretations may hinge on the cut scores. (p. 100)

Testing Standards Committees examine test items and student performance to recommend cut scores that are used to assign students to each achievement level based on their test performance. The final decision about the cut scores is a policy decision typically made by a policy body such as the board of education for the state. (p. 100)

Testing Standards Such cut scores should be established through a documented process involving appropriate stakeholders and validated through empirical research. (p. 187)

Testing Standards If standard setting employs data on the score distributions for criterion groups or on the relation of test scores to one or more criterion variables, those data should be summarized in technical documentation. (p. 108)

Testing Standards If a judgmental standard-setting process is followed, the method employed should be described clearly, and the precise nature and reliability of the judgments called for should be presented, whether those are judgments of persons, of items or test performances, or of other criterion performances predicted by test scores. (p. 108)

Standard 1.9 When a validation rests in part on the opinions or decisions of expert judges, observers, or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described. The qualifications and experience of the judges should be presented. The description of procedures should include any training and instructions provided, should indicate whether participants reached their judgments independently, and should report the level of agreement reached.
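
The Standards do not prescribe a particular agreement statistic. As one possibility, the sketch below reports the mean pairwise correlation of judges' item ratings; the Angoff-style ratings are invented for illustration.

```python
# One way (among many) to summarize the "level of agreement" in
# Standard 1.9: mean pairwise correlation of judges' item ratings.
import itertools
import statistics

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

ratings = [  # one row of hypothetical item ratings per judge
    [0.60, 0.75, 0.40, 0.90, 0.55],
    [0.65, 0.70, 0.45, 0.85, 0.50],
    [0.55, 0.80, 0.35, 0.95, 0.60],
]

pairs = [pearson(a, b) for a, b in itertools.combinations(ratings, 2)]
print(f"mean pairwise correlation: {statistics.fmean(pairs):.2f}")
```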

Standard 2.14 When possible and appropriate, conditional standard errors of measurement should be reported at several score levels unless there is evidence that the standard error is constant across score levels. Where cut scores are specified for selection or classification, the standard errors of measurement should be reported in the vicinity of each cut score.
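
One common (though not mandated) way to obtain conditional SEMs is from an IRT test information function. The sketch below uses hypothetical 2PL item parameters and a hypothetical cut on the theta scale.

```python
# Sketch: conditional SEM in the vicinity of a cut score, derived from
# a 2PL test information function. Item parameters and cut are invented.
import math

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.4), (1.0, 1.1)]  # (a, b) pairs

def information(theta):
    """Test information: sum of a^2 * P * (1 - P) over 2PL items."""
    total = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total

def csem(theta):
    return 1.0 / math.sqrt(information(theta))  # SE(theta) = 1/sqrt(I)

theta_cut = 0.3
for t in (theta_cut - 0.2, theta_cut, theta_cut + 0.2):
    print(f"theta = {t:+.1f}: CSEM = {csem(t):.2f}")
```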

Standard 5.21 When proposed score interpretations involve one or more cut scores, the rationale and procedures used for establishing cut scores should be documented clearly.

Standard 5.22 When cut scores defining pass-fail or proficiency levels are based on direct judgments about the adequacy of item or test performances, the judgmental process should be designed so that the participants providing the judgments can bring their knowledge and experience to bear in a reasonable way.

Standard 5.23 When feasible and appropriate, cut scores defining categories with distinct substantive interpretations should be informed by sound empirical data concerning the relation of test performance to relevant criteria.
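
One familiar way to bring such criterion data to bear is the contrasting-groups approach. The sketch below uses invented criterion judgments and a crude moving-window rule; real studies typically fit a smooth model (e.g., logistic regression) to locate where P(master) crosses .5.

```python
# Sketch of the contrasting-groups idea: find the score at which
# examinees judged "masters" on an external criterion begin to
# predominate. All data are invented for illustration.

# (test score, criterion judgment: 1 = master, 0 = non-master)
data = [(42, 0), (44, 0), (45, 0), (46, 1), (47, 0), (48, 1),
        (49, 1), (50, 0), (51, 1), (52, 1), (53, 1), (55, 1)]

def p_master_near(score, window=2):
    """Proportion of masters among examinees scoring near this point."""
    nearby = [m for s, m in data if abs(s - score) <= window]
    return sum(nearby) / len(nearby)

for cut in sorted({s for s, _ in data}):
    if p_master_near(cut) >= 0.5:
        print("suggested cut score:", cut)
        break
```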

Standard 7.4 Test documentation should summarize test development procedures, including descriptions and the results of the statistical analyses that were used in the development of the test, evidence of reliability/precision of scores and the validity of their recommended interpretations, and the methods for establishing cut scores.

Standard 11.16 The level of performance required for passing a credentialing test should depend on the knowledge and skills necessary for credential-worthy performance in the occupation or profession and should not be adjusted to control the number or proportion of persons passing the test.

General Considerations: Purpose (Linn, 1994)
1. Exhortation, raising expectations
2. Exemplification, providing examples of competencies
3. Accountability
4. Certification

General Considerations: Development of Performance Level Labels
PLLs are terms that identify the categories of performance (see Table 3-3). These terms have no technical basis, but they play an important role in communicating the meaning of performance levels.

Labels
- Proficient
- Meets standards
- Achieve the standard
- Met expectations
- Mastery
- Pass
- Satisfactory
- Intermediate
- Level III

General Considerations: Development of Performance Level Descriptors
PLDs provide a more complete description of what performance looks like within each category (PLL), describing the knowledge, skills, and abilities of examinees within each level (see Table 3-4).

Descriptors
- Satisfactory achievement.
- Adequate understanding of on-grade content.
- Solid understanding of challenging subject matter.
- Competency indicating preparation for the next grade level.
- Ability to apply on-grade standards capably.
- Acceptable command of grade-level content and processes.

Descriptors (cont.)
- Ability to apply concepts and processes effectively.
- Solid academic performance... competency with challenging subject matter.
- Solid academic performance... prepared for the next grade.
- Mastery of grade-level standards.
- High level of achievement... ability to solve complex problems.

MN MCA Mathematics Results from 2015
Grade: Does not meet 22.0, Partially meets 31.0, Meets 26.8, Exceeds 62,000
Grade: Does not meet 22.5, Partially meets 30.8, Meets 17.9, Exceeds 58,000
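
Impact tables like this one are straightforward to compute once cut scores are proposed. The sketch below uses simulated scores and hypothetical cuts, not the actual MCA values.

```python
# Sketch: computing impact data (percent of examinees in each level)
# from a score distribution and proposed cuts. Everything is invented.
import random

random.seed(8225)
scores = [random.gauss(450, 15) for _ in range(60000)]  # simulated cohort
cuts = [440, 450, 470]
labels = ["Does not meet", "Partially meets", "Meets", "Exceeds"]

counts = [0] * len(labels)
for s in scores:
    counts[sum(s >= c for c in cuts)] += 1  # index = number of cuts cleared

for label, n in zip(labels, counts):
    print(f"{label:16} {100 * n / len(scores):5.1f}%")
```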

General Considerations: Ensure Quality of Participants
The Standards state: "Care must be taken to ensure that these persons understand what they are to do and that their judgments are as thoughtful and objective as possible. The process must be such that well-qualified participants can apply their knowledge and experience to reach meaningful and relevant judgments that accurately reflect their understanding and intentions." (p. 101)

General Considerations: Conceptualizing the Examinee Group
Various standard-setting methods require panelists to conceptualize the group for whom the standards are being set; this may be a minimally competent examinee or an examinee who has mastered a skill.

General Considerations: Feedback to Panel Members
- Normative information (others' ratings)
- Reality information (item statistics)
- Impact information (score distributions)
A compilation sketch follows below.
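
The sketch below shows what each kind of feedback might look like when compiled by a facilitator for a small Angoff-style panel; every number is hypothetical.

```python
# Sketch of the three feedback types named above; all data invented.
import statistics

ratings = {  # panelist -> probability rating per item
    "P1": [0.60, 0.75, 0.40, 0.90],
    "P2": [0.65, 0.70, 0.45, 0.85],
    "P3": [0.50, 0.85, 0.30, 0.95],
}
p_values = [0.55, 0.80, 0.35, 0.88]          # observed item difficulties
examinee_scores = [1, 2, 2, 3, 3, 3, 4, 4]   # hypothetical raw scores

cuts = {p: sum(r) for p, r in ratings.items()}  # each panelist's cut
group_cut = statistics.median(cuts.values())

# Normative feedback: each panelist's cut alongside the group's.
print("panelist cuts:", cuts, "| group median:", round(group_cut, 2))

# Reality feedback: mean rating per item next to the item's p-value.
for i, pv in enumerate(p_values):
    mean_r = statistics.fmean(r[i] for r in ratings.values())
    print(f"item {i + 1}: mean rating {mean_r:.2f} vs. p-value {pv:.2f}")

# Impact feedback: percent of examinees at or above the group cut.
pct = 100 * sum(s >= group_cut for s in examinee_scores) / len(examinee_scores)
print(f"{pct:.0f}% of examinees at or above the cut ({group_cut:.2f})")
```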

Standard Setting Methods
Many methods have been proposed, used, and evaluated. Hambleton (1998) described a set of generic steps common to most methods. These steps can serve as an outline for your standard-setting plan assignment, but the specific procedures should follow your chosen standard-setting method.

Generic Steps
1. Choose a method
2. Prepare training materials and agenda
3. Prepare PLDs
4. Select large representative pool of participants
5. Train participants
6. Complete judgments and compile results
7. Facilitate discussion
8. Engage in multiple rounds of ratings

Generic Steps (cont.)
9. Consider employing different kinds of feedback, including normative, reality, and impact data
10. Conduct final round
11. Conduct an evaluation
12. Complete documentation
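
As a simple illustration of how multi-round judgments might be compiled into a recommendation, the sketch below averages invented panelist cuts across two rounds, modified-Angoff style.

```python
# Minimal sketch of compiling multi-round judgments into a recommended
# cut. All numbers are invented; the shrinking SD across rounds is the
# kind of convergence facilitators look for after feedback/discussion.
import statistics

rounds = {
    1: {"P1": 26.5, "P2": 31.0, "P3": 22.0},  # panelist cuts, round 1
    2: {"P1": 27.0, "P2": 29.0, "P3": 26.0},  # after feedback/discussion
}

for r, cuts in sorted(rounds.items()):
    vals = list(cuts.values())
    print(f"round {r}: mean cut = {statistics.fmean(vals):.1f}, "
          f"SD = {statistics.stdev(vals):.1f}")
# The final recommendation (often the last round's mean or median) goes
# to the policy body, reported along with its variability.
```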

Evaluation
A post-training evaluation may take place to assess the degree to which participants understand the process and are confident that they can proceed; this may serve formative purposes. A post-process evaluation is a minimal expectation and provides summative reflections; it should address confidence in the process and its recommendations (a validity issue).

Some Quiz Questions

1. The Standards for Educational and Psychological Testing (1999) require all of the following related to standard setting except:
A. estimates of classification decision consistency.
B. description of the qualifications and experience of participants.
C. scientifically based (i.e., experimental) standard-setting study designs.
D. estimates of standard errors of measurement for scores in the regions of recommended cut scores.

2. The typical role of the standard-setting panel is to
A. determine one or more cut scores for a particular test.
B. recommend one or more cut scores to authorized decision makers.
C. determine the most appropriate method to use for the standard-setting task.
D. develop performance level descriptors that best match the target examinees.

6. Which of the following is true regarding the composition of a standard-setting panel?
A. It should consist of at least 10 members for each construct measured by a multidimensional test.
B. It should include only participants with previous standard-setting experience.
C. It should be diverse enough to represent all likely examinee demographics.
D. It should be large and representative enough to produce reliable results.

12. Which of the following scenarios would most likely be classified as a "holistic" standard-setting procedure?
A. Standard setters review standardized math portfolios produced by 35 different students.
B. Standard setters review sample performances by 200 students on a single writing prompt.
C. Standard setters estimate the likelihood of a minimally Proficient student answering each of 60 multiple-choice items correctly.
D. Standard setters compare the performances of a group of known experts in a field with the performances of a group of known novices.

Evaluating a Standard-Setting Report
- Procedural evidence
- Internal evidence
- External evidence
See Cizek, Bunch, & Koons (2004).