Validity & Accommodations: The Journey Toward Accessible Assessments
Karen Barton, CTB/McGraw-Hill

Validity
Validity: the ongoing trust in the accuracy of the test, its administration, and the interpretation and use of its results. According to Messick (1995), "validity is not a property of the test... as such, but rather of the meaning of the test scores... (that) are a function not only of the items or stimulus conditions, but also of the persons responding" (p. 741). Validation must therefore encompass the full testing environment:
– test constructs
– items
– persons
– characteristics and interactions of each
This goes beyond the validity of accommodations to the heart of assessment validity.

[Diagram] Validity: Examinee, Context (CRT, NCLB), Construct, Test/Items

[Diagram] Validity: Examinee (Accommodations, Classification), Context (CRT, NCLB), Construct, Test/Items

[Diagram] Validity: Examinee, Context (CRT, NCLB), Construct (Targeted), Test/Items (Accessible)

[Diagram] Validity: Examinee (Accommodations, Classification), Context (CRT, NCLB), Construct (Targeted), Test/Items (Accessible)

[Diagram] Validity: Examinee (Accommodations, Classification), Context (CRT, NCLB), Construct (Targeted), Test/Items (Accessible); the predetermined pieces

Confounding Variables: Persons
– Disabilities
– Accommodations
– Access/barriers
– Response

Person Variables & Variation
– Identification of disability
– Accommodation policies, selection, and provision
– Access to and instruction in varied standards and depth/breadth of coverage
– Access to test information and opportunity to accurately respond

Given the current state of the state, where diverse examinees approach the assessment platform with various accommodations and in non-standard administrations, what can be done to improve the validity of the assessments?

[Diagram] Validity: Examinee (Accommodations, Classification), Context (CRT, NCLB), Construct (Targeted), Test/Items (Accessible); the predetermined pieces

What is a construct? "A product of informed scientific imagination, an idea developed to permit categorization and description of some directly observable behavior... (The construct itself is) not directly observable... (and) it must first be operationally defined" (Crocker and Algina, 1986, p. 230). A construct ≠ a trait.

[Diagram] Construct → Targeted Traits → Evidence (three parallel trait–evidence chains)
The operational definition includes the specification of traits and observable skills that, together, represent the unobservable construct. The operational definition should be researched and empirically supported.

[Diagram] Example: Math Intelligence (construct) → Computation, Problem Solving, Numbers (traits), each measured by an item
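
To make the decomposition concrete, here is a minimal sketch of the example above expressed as a test blueprint. The structure is illustrative only, and the item identifiers are hypothetical placeholders, not items from the presentation.

```python
# A test blueprint mirroring the slide: an unobservable construct
# operationally defined by traits, each evidenced by observable items.
# Item identifiers are hypothetical placeholders.
blueprint = {
    "construct": "Math Intelligence",  # not directly observable
    "traits": {
        "Computation": ["computation_item_01"],
        "Problem Solving": ["problem_solving_item_01"],
        "Numbers": ["numbers_item_01"],
    },
}

# Validation asks whether evidence from these items, together,
# supports interpreting scores as the targeted construct.
```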

Predetermined Pieces

[Diagram] Access → Precision → Validity

Access
Student access to:
– test information (directions, stimulus)
– requirements (expectations for how to respond)
– response capabilities (the way in which students respond)
Item access to student ability – true performance

Improved Access
Improved student access:
– Accommodations: access tools specific to examinees that allow for assessment such that disability or language does not misrepresent true performance.
Improved item access:
– Minimizing construct-irrelevant variance (systematic error) → improved precision

Precision Threat: Error
Random error (RE)
– Random or inconsistent
– Inherent to the assessment
– Examples: content sampling, summative "snapshot" assessment, scoring, distractions
– Reduces the usefulness of scores
Systematic error (SE)
– Consistent
– Inherent to the examinee
– Examples: students with disabilities without needed accommodation(s), low item accessibility
– Reduces the accuracy of scores
When error is minimized, scores are more trustworthy!
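
A minimal simulation can make the RE/SE distinction concrete: observed scores are modeled as true score plus systematic and random error, with an unaccommodated subgroup receiving a consistent negative bias. All group sizes and error magnitudes below are illustrative assumptions, not values from the presentation.

```python
import numpy as np

# Sketch of the slide's model: observed = true + systematic + random.
rng = np.random.default_rng(0)
n = 10_000
true_score = rng.normal(50, 10, n)            # what we intend to measure
random_err = rng.normal(0, 4, n)              # RE: inconsistent, mean zero
needs_accom = rng.random(n) < 0.10            # hypothetical 10% subgroup
sys_err = np.where(needs_accom, -6.0, 0.0)    # SE: consistent, group-specific

observed = true_score + sys_err + random_err

# RE widens the spread but leaves group means intact;
# SE biases one group's scores downward.
print(round(observed[~needs_accom].mean(), 1))  # ~50: accurate on average
print(round(observed[needs_accom].mean(), 1))   # ~44: true performance masked
```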

[Diagram] Validity: Examinee (Accommodations, Classification), Context (CRT, NCLB), Construct (Targeted), Test/Items (Accessible), annotated with random error (RE) and systematic error (SE) sources

Minimizing Error
Random: Standardization – the belief that random error can be minimized by standardizing test administrations.
Systematic: Construct-Irrelevant Variance (CIV)
– Constant, group specific
– Over/underestimation of scores
– "Students potentially provide the most serious threat to CIV." (Haladyna & Downing, p. 23)
– This brings us back to the test and how students interact with the constructs to be measured.

Accommodations
Such tools change administrations from standard to non-standard, threatening the comparability of results. Providing either a standard or non-standard administration requires sacrifices:
– random errors in a non-standard environment
– systematic errors when a test is standard and inflexible to students' access to test information
The question is: at what point do these sacrifices impede measurement precision, and the validity thereof?

Back to Basics: Valid Assessment Systems
Improved student data:
– Improved collection, particularly in light of Peer Review, to include subgroup data
– Supporting students
– Improving decisions on accommodations and standardization of their provision
– Recognizing the assumptions of policy decisions: classifications and accommodations
Re-conceptualization of "standardization": a more valid conceptualization may be what is standard for each examinee.

Back to Basics: Valid Assessment Systems
Well targeted to a clearly and operationally defined construct
– If we can't define what we want to know, how do we know that what we know is what we want to know?
Balanced and aligned expectations of:
– standards
– skills
– range of difficulties
Improved measurement precision:
– Reduction in random AND systematic errors
– Expanded item sampling
– Increased accessibility
– Flexibility
– UDA (Universal Design for Assessment): the Goldilocks approach

Past Research
Ways to "validate" accommodations: DIF (sketched below), EFA, cluster analyses, qualitative reviews, etc.
Inconclusive results
Difficulties in conducting research:
– Experimental designs
– Concurrent accommodations vs. single accommodations
– Confounding variables
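
For readers unfamiliar with the first technique listed, here is a minimal sketch of a Mantel-Haenszel DIF screen comparing accommodated (focal) and non-accommodated (reference) examinees matched on total score. The function name and data layout are illustrative assumptions, not code from the presentation.

```python
import numpy as np

def mh_d_dif(item_correct, focal, total_score):
    """Mantel-Haenszel DIF for one item, stratified by total score.
    item_correct: 0/1 per examinee; focal: 1 = focal group;
    total_score: matching variable. Returns MH D-DIF (ETS delta scale)."""
    num = den = 0.0
    for s in np.unique(total_score):
        idx = total_score == s
        ref, foc = idx & (focal == 0), idx & (focal == 1)
        A = np.sum(item_correct[ref] == 1)  # reference correct
        B = np.sum(item_correct[ref] == 0)  # reference incorrect
        C = np.sum(item_correct[foc] == 1)  # focal correct
        D = np.sum(item_correct[foc] == 0)  # focal incorrect
        N = A + B + C + D
        if N > 0:
            num += A * D / N
            den += B * C / N
    if den == 0:
        return float("nan")          # no informative strata
    alpha = num / den                # common odds ratio across strata
    return -2.35 * np.log(alpha)     # ETS convention; large |D-DIF| flags DIF
```

As the slide notes, such screens have produced inconclusive results: a flagged item may reflect either a construct-irrelevant barrier or a real group difference.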

Past Research
Lack of consensus on what constitutes a "valid" accommodation:
– Does "boost" = validity?
– Isn't it possible that a valid accommodation might increase measurement precision and reveal student inability: no boost?
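
The "boost" question is often examined with a differential-boost (interaction) design; a minimal sketch follows, with hypothetical array names for scores, accommodation status, and disability status.

```python
import numpy as np

def differential_boost(scores, accommodated, swd):
    """Mean boost (accommodated minus standard) within each group."""
    boost = {}
    for label, grp in (("SWD", swd == 1), ("non-SWD", swd == 0)):
        boost[label] = (scores[grp & (accommodated == 1)].mean()
                        - scores[grp & (accommodated == 0)].mean())
    # Interaction hypothesis: a boost for SWD but not for non-SWD suggests
    # the accommodation removes a barrier without altering the construct.
    # Per the slide, though, "no boost" need not invalidate an accommodation
    # if it still improves measurement precision.
    return boost
```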

Continued & Future Research
Given the confounding variables of both persons and tests, accommodations cannot be validated apart from an in-depth look at the assessment and what it is trying to measure, in concert with how the accommodation, the student, and the test items interact (e.g., research on construct-irrelevant variance by Abedi, Kopriva, Winters, et al.). It must be clear how accommodations affect skill measurement. Therefore, future research should focus deeply on assessment validity in light of how the wide range of students, with all their diversities (and confounding variables), approach assessments.

Continued & Future Research
– Re-evaluation of test constructs
– Research on all students, not limited to disability classifications
– Is there a way to measure individual systematic error?
– Research on distractors: what types of errors do students make, and which distractors do they choose? (A distractor-analysis sketch follows below.)
– Think-aloud studies focused on access and student response preferences
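
The distractor research proposed above can start from a simple cross-tabulation of option choices by ability level. The sketch below assumes pandas Series of option choices and total scores; the names are hypothetical.

```python
import pandas as pd

def distractor_table(choices: pd.Series, total: pd.Series) -> pd.DataFrame:
    """Proportion of examinees choosing each option, by score quartile."""
    quartile = pd.qcut(total, 4, labels=["Q1", "Q2", "Q3", "Q4"])
    table = pd.crosstab(quartile, choices, normalize="index").round(2)
    # A distractor that attracts many high-scoring examinees may signal
    # an access problem or construct-irrelevant difficulty in the item.
    return table
```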

Continued & Future Research
Flexibility:
– New item types and acceptable student response modes
– Approach flexible item types, and research thereof, as parallel item forms and formats for more than the "accommodated" sample.

General, accommodated, alternate, and modified alternate assessments can and should be:
– better aligned to clearly defined constructs,
– more innovative by design,
– valid for more than the middle of the bell curve, and
– more meaningful and useful.