Fairness in Testing: Introduction
Suzanne Lane, University of Pittsburgh
Member, Management Committee for the Joint Committee on the Revision of the 1999 Testing Standards

Organization of the 1999 Standards
Part I: Foundational Chapters
Part II: Fairness in Testing
  Chapter 7: Fairness in Testing and Test Use
  Chapter 8: The Rights and Responsibilities of Test Takers
  Chapter 9: Testing Individuals of Diverse Linguistic Backgrounds
  Chapter 10: Testing Individuals with Disabilities
Part III: Testing Applications

Proposed Revision
Combine three of the chapters in Part II into a single chapter, Fairness in Testing:
  Chapter 7: Fairness in Testing and Test Use
  Chapter 9: Testing Individuals of Diverse Linguistic Backgrounds
  Chapter 10: Testing Individuals with Disabilities
Move the combined chapter to Part I: Foundational Chapters

Why Reorganize the Chapters?
Fairness in testing cannot be separated from accessibility.
Individuals should be able to understand and respond to test material without their performance being influenced by construct-irrelevant characteristics.
All examinees for whom a test is intended should have an unobstructed opportunity to demonstrate their standing on the construct(s) being measured by the assessment.

Accessibility Is Essential for All Members of the Testing Population
Accessibility is a fundamental aspect of fairness and is the right of all members of the intended test-taking population.

Draft Fairness Chapter
Four sections:
  Section I: General Views of Fairness
  Section II: Threats to the Fair and Valid Interpretations of Test Scores
  Section III: Minimizing Construct-Irrelevant Components Through the Use of Test Design and Testing Adaptations
  Section IV: The Standards

Four Themes or Clusters
1. Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
2. Conduct studies to examine the validity of test score inferences for the intended examinee population.
3. Provide appropriate accommodations to remove barriers to the accessibility of the construct measured by the assessment and to the valid interpretation of the assessment scores.
4. Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.

This Morning's Round Table
Four members of the Joint Committee to Revise the 1999 Standards: Barbara Plake, Joan Herman, Linda Cook, Frank Worrell
Discussants: Martha Thurlow, Jamal Abedi

Fairness in Testing: Theme 1
Barbara S. Plake, University of Nebraska-Lincoln
Co-Chair, Joint Committee on the Revision of the 1999 Testing Standards

Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
Test Design: use strategies that are as inclusive as possible for a wide range of individuals
  Universal design
  Administration
  Clearly delineate the construct

Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
Test Design: keep linguistic and reading demands consistent with the construct
  Removes construct-irrelevant variance
  Enhances the validity of score interpretations; clarifies the interpretation of standing on the intended construct
  Even when language is part of the construct, the language demand should be commensurate with the levels needed for performance

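For a non-language construct such as mathematics, one rough way to screen whether an item's reading demand exceeds what the construct requires is a readability index computed on item stems. Below is a minimal Python sketch using the Flesch-Kincaid grade-level formula; the naive syllable heuristic and the example item are illustrative simplifications, not anything from the presentation, and operational programs rely on fuller linguistic review.

```python
import re

def naive_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels (at least one per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a passage (rough screen only)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(naive_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

# Flag items whose estimated reading demand exceeds the target grade band.
item_stem = ("A rectangular garden has a length of 8 meters and a width "
             "of 5 meters. What is its area in square meters?")
print(f"Estimated grade level: {fk_grade(item_stem):.1f}")
```
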
Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
Test Development: remove construct-irrelevant components for members of special groups
  Differentially familiar words and symbols
  Sensitivity reviews

Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
Test Development: evaluate the appropriateness of materials, items, and tasks for identifiable subgroups
  Small-sample methodology
  Accumulate data over operational administrations
  Follow up with causal investigations and actions to diminish differential test performance

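One common empirical check on item appropriateness for subgroups is a differential item functioning (DIF) analysis such as the Mantel-Haenszel procedure, which compares item performance for reference- and focal-group members matched on total score. A minimal sketch in Python follows; the simulated data, variable names, and coarse score strata are illustrative assumptions. In practice, and as the slide notes, such data are accumulated over operational administrations before drawing conclusions.

```python
import numpy as np

def mantel_haenszel_odds_ratio(item, group, matching_score):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item           : 0/1 responses to the studied item
    group          : 0 = reference group, 1 = focal group
    matching_score : matching criterion, e.g., a stratified total test score
    """
    num = den = 0.0
    for s in np.unique(matching_score):
        stratum = matching_score == s
        ref, foc = stratum & (group == 0), stratum & (group == 1)
        a = np.sum(item[ref] == 1)   # reference correct
        b = np.sum(item[ref] == 0)   # reference incorrect
        c = np.sum(item[foc] == 1)   # focal correct
        d = np.sum(item[foc] == 0)   # focal incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return num / den if den > 0 else float("nan")  # ~1.0 suggests no DIF

# Illustrative simulated data: item depends only on ability, not group.
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)
ability = rng.normal(0.0, 1.0, n)
item = (ability + rng.normal(0.0, 1.0, n) > 0).astype(int)
strata = np.clip(np.round(ability * 5 + 25), 0, 50).astype(int) // 5
odds_ratio = mantel_haenszel_odds_ratio(item, group, strata)
print(f"MH odds ratio: {odds_ratio:.2f}, ETS delta: {-2.35 * np.log(odds_ratio):.2f}")
```
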
Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
Administration: test takers receive comparable treatment during test administration and scoring
  Adhere to standardized protocols in administration except where flexibility enhances valid score interpretations
  Individualized administrations
  Role of interpersonal dynamics

Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
Documentation: include aspects of the testing process that support valid score interpretations
  Specify how construct-irrelevant variance was addressed in test design and development
  Include results of technical studies that examine measurement quality for subgroups
  Include studies of the impact of accommodations and modifications on valid score interpretations

Fairness in Testing: Theme 2
Joan Herman, CRESST/UCLA

THEME 2: Conduct studies to examine the validity of test score inferences for the intended examinee population.
Conduct such studies where credible evidence indicates the possibility of test bias.
Where sample sizes constrain empirical evidence, use qualitative methods.

Conduct studies to examine the validity of test score inferences for the intended examinee population.
The reliability and validity of score inferences for individuals from relevant subgroups should be specifically examined.

Conduct studies to examine the validity of test score inferences for the intended examinee population.
When differential prediction is an issue, use regression equations computed separately for each group under consideration, or an analysis in which group is entered as a moderator variable.

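A minimal sketch of the moderator-variable approach in Python, using ordinary least squares from statsmodels; the simulated data and effect sizes are illustrative assumptions. A significant group coefficient indicates an intercept difference and a significant score-by-group interaction indicates a slope difference, either of which signals differential prediction.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 400

# Simulated data: a test score predicting a criterion, with group membership.
group = rng.integers(0, 2, n).astype(float)      # 0 = reference, 1 = focal
score = rng.normal(50.0, 10.0, n)
criterion = 0.5 * score + 2.0 * group + rng.normal(0.0, 5.0, n)

# Moderated regression: criterion ~ score + group + score:group.
X = sm.add_constant(np.column_stack([score, group, score * group]))
fit = sm.OLS(criterion, X).fit()

# x1 = score, x2 = group (intercept difference), x3 = interaction (slope difference).
print(fit.params)
print(fit.pvalues)
```
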
Conduct studies to examine the validity of test score inferences for the intended examinee population.
When tests require scoring of constructed responses, evidence of the reliability and validity of inferences should be obtained for relevant subgroups.

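For constructed-response scoring, one piece of such evidence is rater agreement computed within each relevant subgroup: notably lower agreement for one group can flag scoring that functions differently for that group. Below is a minimal sketch using scikit-learn's cohen_kappa_score on simulated double-scored responses; the data and the 0/1 group coding are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(7)
n = 300

# Simulated double-scored constructed responses on a 0-3 rubric.
group = rng.integers(0, 2, n)                    # 0 = reference, 1 = focal
rater1 = rng.integers(0, 4, n)
agree = rng.random(n) < 0.8                      # second rater mostly agrees
rater2 = np.where(agree, rater1, rng.integers(0, 4, n))

# Quadratic-weighted kappa, computed separately for each subgroup.
for g in (0, 1):
    mask = group == g
    kappa = cohen_kappa_score(rater1[mask], rater2[mask], weights="quadratic")
    print(f"group {g}: weighted kappa = {kappa:.2f}")
```
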
Fairness in Testing: Theme 3
Linda Cook, Educational Testing Service

Provide Appropriate Accommodations to Remove Barriers to the Accessibility of the Construct Measured by the Assessment and to the Valid Interpretation of Scores
Provide test accommodations, when appropriate and feasible, to remove construct-irrelevant barriers that would otherwise interfere with an examinee's ability to demonstrate their standing on the target construct(s).

Provide Appropriate Accommodations to Remove Barriers to the Accessibility of the Construct Measured by the Assessment and to the Valid Interpretation of Scores
When test accommodations and/or modifications are permitted, test developers and/or test users are responsible for documenting provisions for their use.

Provide Appropriate Accommodations to Remove Barriers to the Accessibility of the Construct Measured by the Assessment and to the Valid Interpretation of Scores
Whoever assigns, administers, or documents the use of permissible test accommodations and/or modifications should have sufficient information available to them and sufficient expertise to carry out this role.

Provide Appropriate Accommodations to Remove Barriers to the Accessibility of the Construct Measured by the Assessment and to the Valid Interpretation of Scores
When a test is changed to remove barriers to the construct being measured, empirical evidence of the reliability, validity, and comparability of inferences made from the scores should be obtained and documented.

Provide Appropriate Accommodations to Remove Barriers to the Accessibility of the Construct Measured by the Assessment and to the Valid Interpretation of Scores
When tests are translated into a different language, empirical evidence of the reliability, validity, and comparability of inferences made from the scores on the changed test should be documented.

Provide Appropriate Accommodations to Remove Barriers to the Accessibility of the Construct Measured by the Assessment and to the Valid Interpretation of Scores
A test should generally be administered in the test taker's most proficient language for the testing context, unless proficiency in the language of the test is part of the construct being measured.

Provide Appropriate Accommodations to Remove Barriers to the Accessibility of the Construct Measured by the Assessment and to the Valid Interpretation of Scores
When an interpreter is used in testing, the interpreter should be sufficiently fluent in the language and content of the test and in the examinee's native language and culture to translate the test instructions and questions and, where required, to explain the examinee's test responses. Procedures for administering a test when an interpreter is used should be standardized.

Fairness in Testing: Theme 4
Frank C. Worrell, University of California, Berkeley

Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
The focus of this theme is on the use of test scores: interpretation and consequences.
As with the previous themes, the goal is to apply the general principles to relevant subgroups: ELLs, cultural minorities, immigrants, older individuals.

Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
Test developers and publishers need to provide information supporting claims that a test can be used with examinees from specific subgroups (e.g., individuals from different linguistic or cultural backgrounds, individuals with disabilities).

Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
Research evidence is necessary to support the comparability of scores when test scores are disaggregated and reported for subgroups (e.g., gender, ethnicity, age, language proficiency, disability).

Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
Tests should not be used with subgroups if credible evidence suggests that examinees' scores are affected by construct-irrelevant characteristics of the test or of the examinees.

Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
It is inappropriate to use test scores as the sole indicator of an individual's functioning, competence, attitudes, and/or predisposition for the purposes of diagnosis and intervention.

Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
When alternative measures of a construct with equal validity exist, group differences (e.g., in mean scores or in the percentages of examinees from subgroups passing) should be considered in deciding which test to use.

Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.
When a test is used as an instrument of public policy, test users and policy makers must provide evidence (e.g., reliability, validity, and comparability of scores; likely consequences for individuals from relevant subgroups) in support of the proposed use.