1 Setting Performance Standards EPSY 8225 Cizek, G.J., Bunch, M.B., & Koons, H. (2004). An NCME Instructional Module on Setting Performance Standards: Contemporary Methods. Educational Measurement: Issues & Practice, 23(4), 31-50.

2 Content Standards The knowledge, skills, and abilities examinees are expected to achieve Content standards define the "what" of testing used to determine outcomes, licensure, certification, or mastery

3 Performance Standards Cut scores, achievement levels, passing scores Performance standards define the “how much” or “how well” of testing in terms of what examinees are expected to do to be categorized in one group or another Performance standards are needed because decisions have to be made

4 Recent Attention to Standards
Rise of standards-referenced testing
New research on standard setting
Standards for Educational and Psychological Testing
Federal legislation: IDEA (1997), NCLB (2001), ESSA (2016)

5 Standards for Educational & Psychological Testing (2014) Scale scores, proficiency levels, and cut scores can be central to the use and interpretation of test scores. For that reason, their defensibility is an important consideration in test score validation for the intended purposes. (p. 95)

6 Testing Standards Cut score: a specified point on a score scale, such that scores at or above that point are reported, interpreted, or acted upon differently from scores below that point.

7 Testing Standards Cut scores may aid in formulating rules for reaching decisions on the basis of test performance. It should be recognized, however, that the likelihood of misclassification will generally be relatively high for persons with scores close to the cut scores. (p. 97)
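The point about misclassification being highest near the cut can be made concrete with a small sketch (not from the slides; the cut score and SEM below are hypothetical), using a normal approximation to an examinee's observed-score distribution:

```python
# Sketch: probability that an examinee with a given true score is
# misclassified by a cut score, under a normal approximation with a
# hypothetical SEM. Misclassification risk peaks at the cut itself.
from statistics import NormalDist

def misclassification_prob(true_score, cut_score, sem):
    """P(observed score falls on the wrong side of the cut)."""
    dist = NormalDist(mu=true_score, sigma=sem)
    if true_score >= cut_score:       # truly at/above the cut
        return dist.cdf(cut_score)    # chance of being observed below it
    return 1.0 - dist.cdf(cut_score)  # truly below: chance of being observed at/above

# Hypothetical test with cut = 40 and SEM = 3: risk rises toward the cut
for t in (30, 37, 40, 43, 50):
    print(t, round(misclassification_prob(t, 40, 3), 3))
```

An examinee whose true score equals the cut has a 50% chance of being classified either way, while examinees far from the cut are almost never misclassified.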

8 Testing Standards A critical step in the development and use of some tests is to establish one or more cut scores dividing the score range to partition the distribution of scores into categories. (p. 100) Such cut scores provide the basis for using and interpreting test results. Thus, in some situations, the validity of test score interpretations may hinge on the cut scores. (p. 100)

9 Testing Standards Committees examine test items and student performance to recommend cut scores that are used to assign students to each achievement level based on their test performance. The final decision about the cut scores is a policy decision typically made by a policy body such as the board of education for the state. (p. 100)

10 Testing Standards Such cut scores should be established through a documented process involving appropriate stakeholders and validated through empirical research. (p. 187)

11 Testing Standards If standard setting employs data on the score distributions for criterion groups or on the relation of test scores to one or more criterion variables, those data should be summarized in technical documentation. (p. 108)

12 Testing Standards If a judgmental standard-setting process is followed, the method employed should be described clearly, and the precise nature and reliability of the judgments called for should be presented, whether those are judgments of persons, of items or test performances, or of other criterion performances predicted by test scores. (p. 108)

13 Standard 1.9 When a validation rests in part on the opinions or decisions of expert judges, observers, or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described. The qualifications and experience of the judges should be presented. The description of procedures should include any training and instructions provided, should indicate whether participants reached their judgments independently, and should report the level of agreement reached.

14 Standard 2.14 When possible and appropriate, conditional standard errors of measurement should be reported at several score levels unless there is evidence that the standard error is constant across score levels. Where cut scores are specified for selection or classification, the standard errors of measurement should be reported in the vicinity of each cut score.
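One common way to estimate a conditional SEM for a number-correct score is Lord's binomial error model; a minimal sketch (the item count and cut score below are hypothetical, not from the Standards):

```python
# Sketch: Lord's binomial-error estimate of the conditional SEM for a
# number-correct score x on an n-item test: sqrt(x * (n - x) / (n - 1)).
# Reporting this near each cut score is one way to satisfy Standard 2.14.
import math

def conditional_sem(x, n):
    """Conditional SEM for raw score x on an n-item test (binomial model)."""
    return math.sqrt(x * (n - x) / (n - 1))

n_items = 60
cut = 42  # hypothetical cut score
for score in (cut - 2, cut, cut + 2):
    print(score, round(conditional_sem(score, n_items), 2))
```

Note that under this model the SEM is largest at mid-range scores and shrinks toward the extremes, which is why a single overall SEM can misstate precision in the vicinity of a cut.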

15 Standard 5.21 When proposed score interpretations involve one or more cut scores, the rationale and procedures used for establishing cut scores should be documented clearly.

16 Standard 5.22 When cut scores defining pass-fail or proficiency levels are based on direct judgments about the adequacy of item or test performances, the judgmental process should be designed so that the participants providing the judgments can bring their knowledge and experience to bear in a reasonable way.

17 Standard 5.23 When feasible and appropriate, cut scores defining categories with distinct substantive interpretations should be informed by sound empirical data concerning the relation of test performance to relevant criteria.

18 Standard 7.4 Test documentation should summarize test development procedures, including descriptions and the results of the statistical analyses that were used in the development of the test, evidence of reliability/precision of scores and the validity of their recommended interpretations, and the methods for establishing cut scores.

19 Standard 11.16 The level of performance required for passing a credentialing test should depend on the knowledge and skills necessary for credential-worthy performance in the occupation or profession and should not be adjusted to control the number or proportion of persons passing the test.

20 General Considerations
Purposes (Linn, 1994):
1. Exhortation, raising expectations
2. Exemplification, providing examples of competencies
3. Accountability
4. Certification

21 General Considerations Development of Performance Level Labels PLLs are terms that identify the categories of performance (see Table 3-3). These terms have no technical basis, but they play an important role in communicating the meaning of performance levels.

22 Labels
Proficient
Meets standards
Achieve the standard
Met expectations
Mastery
Pass
Satisfactory
Intermediate
Level III

23 General Considerations Development of Performance Level Descriptors A more complete description of what performance looks like within each category (PLL), describing the knowledge, skills, and abilities of examinees within each level (see Table 3-4).

24 Descriptors
Satisfactory achievement.
Adequate understanding of the on-grade content.
Solid understanding of challenging subject matter.
Competency indicating preparation for the next grade level.
Ability to apply on-grade standards capably.
Acceptable command of grade-level content and processes.

25 Descriptors (cont.)
Ability to apply concepts and processes effectively.
Solid academic performance... competency with challenging subject matter.
Solid academic performance... prepared for the next grade.
Mastery of grade-level standards.
High level of achievement... ability to solve complex problems.


27 MN MCA Mathematics Results from 2015 (percent of students at each level)

Grade   Does not meet   Partially meets   Meets   Exceeds   Students tested
8       20.2            22.0              31.0    26.8      62,000
11      28.8            22.5              30.8    17.9      58,000

28 General Considerations Ensure Quality of Participants The Standards state “Care must be taken to ensure that these persons understand what they are to do and that their judgments are as thoughtful and objective as possible. The process must be such that well-qualified participants can apply their knowledge and experience to reach meaningful and relevant judgments that accurately reflect their understanding and intentions.” (p. 101)

29 General Considerations Conceptualizing the Examinee Group Various methods of standard setting require panelists to conceptualize the group for whom the standards are being set: this may be a minimally competent examinee or an examinee who has mastered a skill.

30 General Considerations
Feedback to Panel Members
Normative information (others' ratings)
Reality information (item statistics)
Impact information (score distributions)
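Impact information, for example, is simply the distribution of examinees across performance levels implied by a candidate set of cut scores. A minimal sketch (the scores, cuts, and labels below are invented for illustration), treating a score at or above a cut as falling in the higher level:

```python
# Sketch: compute "impact information" -- the percentage of examinees
# landing in each performance level under a proposed set of cut scores.
from bisect import bisect_right

def impact(scores, cuts, labels):
    """Percent of examinees per level; a score at or above a cut moves up."""
    counts = [0] * len(labels)
    for s in scores:
        counts[bisect_right(cuts, s)] += 1
    total = len(scores)
    return {lab: round(100 * c / total, 1) for lab, c in zip(labels, counts)}

# Hypothetical raw scores and cuts for four levels
scores = [38, 41, 45, 47, 50, 52, 55, 58, 60, 63]
cuts = [45, 52, 60]
labels = ["Does Not Meet", "Partially Meets", "Meets", "Exceeds"]
print(impact(scores, cuts, labels))
```

Presenting this kind of table lets panelists see the practical consequences of their ratings before the final round.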

31 Standard Setting Methods Many methods have been proposed, used, and evaluated. Hambleton (1998) described a set of generic steps that are common to most methods. These steps can serve as an outline for your standard setting plan assignment, but the specific procedures should follow your chosen standard-setting method.

32 Generic Steps
Choose a method
Prepare training materials and agenda
Prepare PLDs
Select a large, representative pool of participants
Train participants
Complete judgments and compile results
Facilitate discussion
Engage in multiple rounds of ratings

33 Generic Steps (cont.)
Consider employing different kinds of feedback, including normative, reality, and impact data
Conduct final round
Conduct an evaluation
Complete documentation
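What "compile results" looks like depends on the chosen method. As one concrete example (the modified Angoff method; the ratings below are invented), each panelist estimates the probability that a minimally qualified examinee answers each item correctly, and the panel's recommended cut is the mean of the panelists' summed ratings:

```python
# Sketch: compiling modified-Angoff ratings into a recommended cut score.
# Each panelist's cut is the sum of their per-item probability ratings;
# the panel recommendation is the mean across panelists.
def angoff_cut(ratings_by_panelist):
    """Return (panel mean cut, list of per-panelist cuts)."""
    panelist_cuts = [sum(r) for r in ratings_by_panelist]
    return sum(panelist_cuts) / len(panelist_cuts), panelist_cuts

# Hypothetical ratings: 3 panelists x 5 items
ratings = [
    [0.6, 0.7, 0.5, 0.8, 0.9],
    [0.5, 0.6, 0.6, 0.7, 0.8],
    [0.7, 0.8, 0.5, 0.9, 0.9],
]
panel_cut, per_panelist = angoff_cut(ratings)
print([round(c, 1) for c in per_panelist], round(panel_cut, 2))
```

Between rounds, panelists typically see the spread of per-panelist cuts (normative feedback) before re-rating.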

34 Evaluation A post-training evaluation may assess the degree to which participants understand the process and are confident that they can proceed; this serves formative purposes. A post-process evaluation is a minimal expectation and provides summative reflections. Both should address confidence in the process and its recommendations (a validity issue).

35 Some Quiz Questions

36 1. The Standards for Educational and Psychological Testing (1999) require all of the following related to standard setting except:
A. estimates of classification decision consistency.
B. description of the qualifications and experience of participants.
C. scientifically based (i.e., experimental) standard-setting study designs.
D. estimates of standard errors of measurement for scores in the regions of recommended cut scores.

37 2. The typical role of the standard-setting panel is to
A. determine one or more cut scores for a particular test.
B. recommend one or more cut scores to authorized decision makers.
C. determine the most appropriate method to use for the standard-setting task.
D. develop performance level descriptors that best match the target examinees.

38 6. Which of the following is true regarding the composition of a standard-setting panel?
A. It should consist of at least 10 members for each construct measured by a multidimensional test.
B. It should include only participants with previous standard-setting experience.
C. It should be diverse enough to represent all likely examinee demographics.
D. It should be large and representative enough to produce reliable results.

39 12. Which of the following scenarios would most likely be classified as a "holistic" standard-setting procedure?
A. Standard setters review standardized math portfolios produced by 35 different students.
B. Standard setters review sample performances by 200 students on a single writing prompt.
C. Standard setters estimate the likelihood of a minimally Proficient student answering each of 60 multiple-choice items correctly.
D. Standard setters compare the performances of a group of known experts in a field with the performances of a group of known novices.

40 Evaluating a Standard-Setting Report
Procedural evidence
Internal evidence
External evidence
See Cizek, Bunch, & Koons (2004).
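Internal evidence often includes the consistency of panelists' individual cut scores. A minimal sketch (the panel data below are hypothetical) reporting the panel mean and the standard error of that mean, which is commonly documented alongside the recommended cut:

```python
# Sketch: internal evidence for a standard-setting report -- how much do
# panelists' individual cut scores vary, and how precise is the panel mean?
from statistics import mean, stdev
from math import sqrt

panelist_cuts = [41.0, 43.5, 42.0, 44.0, 40.5, 42.5]  # hypothetical
k = len(panelist_cuts)
panel_cut = mean(panelist_cuts)
se_cut = stdev(panelist_cuts) / sqrt(k)  # standard error of the panel mean
print(round(panel_cut, 2), round(se_cut, 2))
```

A small standard error relative to the score scale suggests panelists converged; a large one flags the cut as unstable and worth another round of discussion or more panelists.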

