Validity
Validity: the ongoing trust in the accuracy of the test, its administration, and the interpretation and use of its results.
According to Messick (1995), “validity is not a property of the test... as such, but rather of the meaning of the test scores... (that) are a function not only of the items or stimulus conditions, but also of the persons responding...” (p. 741).
Validation must therefore encompass the full testing environment:
–test constructs
–items
–persons
–characteristics and interactions of each
This goes beyond the validity of accommodations to the heart of assessment validity.
Person Variables & Variation
–Identification of disability
–Accommodation policies, selection and provision
–Access to and instruction in varied standards and depth/breadth of coverage
–Access to test information and opportunity to accurately respond
Given the current state of the state, where diverse examinees approach the assessment platform with various accommodations and in non-standard administrations, what can be done to improve the validity of the assessments?
What is a construct?
“A product of informed scientific imagination, an idea developed to permit categorization and description of some directly observable behavior... (The construct itself is) not directly observable... (and) it must first be operationally defined” (Crocker and Algina, 1986, p. 230).
Construct ≠ trait.
[Diagram: a Construct linked to three Targeted Traits, each paired with Evidence]
The operational definition includes the specification of traits and observable skills that, together, represent the unobservable construct. The operational definition should be researched and empirically supported.
[Diagram: Math Intelligence (construct) with three traits, Computation, Problem Solving, and Numbers, each paired with an Item]
Access → Precision → Validity
Access
Student access to
–test information (directions, stimulus),
–requirements (expectation of how to respond),
–response capabilities (the way in which students respond)
Item access to student ability – true performance
Improved Access
Improved student access:
–Accommodations: access tools specific to examinees that allow for assessment such that disability or language does not misrepresent true performance.
Improved item access:
–Minimizing Construct Irrelevant Variance (systematic error) → improved precision
Precision threat: Error
Random error
–Random or inconsistent
–Inherent to the assessment
–Examples: content sampling, summative “snap-shot” assessment, scoring, distractions
–Reduces usefulness of scores
Systematic error
–Consistent
–Inherent to examinee
–Examples: students with disabilities without needed accommodation(s), low item accessibility
–Reduces accuracy of scores
When error is minimized, scores are more trustworthy!
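The contrast between the two error types can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the score scale, the noise level (SD 10), and the 8-point shift are all hypothetical values chosen only to make the pattern visible.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def observe(true_scores, random_sd=0.0, systematic_shift=0.0, seed=0):
    """Observed score = true score + systematic shift + random noise."""
    rng = random.Random(seed)
    return [t + systematic_shift + rng.gauss(0.0, random_sd) for t in true_scores]


def mean_bias(observed, true_scores):
    """Average signed difference between observed and true scores."""
    return statistics.fmean(o - t for o, t in zip(observed, true_scores))


# Hypothetical true scores for 2,000 examinees (mean 50, SD 10).
rng = random.Random(1)
true_scores = [rng.gauss(50.0, 10.0) for _ in range(2000)]

# Random error only: inconsistent noise, no consistent shift.
random_only = observe(true_scores, random_sd=10.0, seed=2)

# Systematic error only: every score in this hypothetical subgroup is
# 8 points too low, e.g. because a needed accommodation was not provided.
systematic_only = observe(true_scores, systematic_shift=-8.0, seed=3)

random_bias = mean_bias(random_only, true_scores)
systematic_bias = mean_bias(systematic_only, true_scores)
random_r = pearson(random_only, true_scores)
systematic_r = pearson(systematic_only, true_scores)

print(f"random error:     bias={random_bias:+.2f}, r with true={random_r:.2f}")
print(f"systematic error: bias={systematic_bias:+.2f}, r with true={systematic_r:.2f}")
```

Under these assumptions, random error leaves the group average close to correct but blurs the ordering of examinees, while the systematic shift preserves the ordering within the affected group yet makes every score wrong by the same amount, so collecting more of the same kind of data does not remove it.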
Minimizing Error
Random: Standardization – belief that random error can be minimized by standardizing test administrations.
Systematic: Construct Irrelevant Variance (CIV)
–Constant ~ group specific
–Over/underestimation of scores
~ “Students potentially provide the most serious threat to CIV.” (Haladyna & Downing, p.23)
~ This brings us back to the test and how students interact with the constructs to be measured.
Accommodations
Accommodations change administrations from standard to non-standard, threatening comparability of results.
Providing either a standard or non-standard administration requires sacrifices:
–random errors in a non-standard environment
–systematic errors when a test is standard and inflexible to students’ access to test information
The question is: at what point do the sacrifices impede measurement precision and the validity thereof?
Back to Basics: Valid Assessment Systems
Improved student data
–Improved collection, particularly in light of Peer Review, to include subgroup data
–Supporting students
–Improving decisions on accommodations and standardization of the provisions thereof
–Recognizing the assumptions of policy decisions: classifications and accommodations
Re-conceptualization of “standardization.”
A more valid conceptualization may be what is standard for each examinee.
Back to Basics: Valid Assessment Systems
Well targeted to a clearly and operationally defined construct
–If we can’t define what we want to know, how do we know that what we know is what we want to know?
Balanced and aligned expectations of:
–standards
–skills
–range of difficulties
Improved measurement precision
–Reduction in random AND systematic errors
–Expanded item sampling
–Increased accessibility
–Flexibility
–UDA ~ the Goldilocks approach
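One standard way to see why expanded item sampling improves precision is the Spearman-Brown formula from classical test theory, which the slides do not cite; it is offered here only as an illustration. It assumes the added items are parallel to the existing ones. Here ρ_xx' is the reliability of the current test, k is the factor by which the test is lengthened, and the values 0.70 and k = 2 are hypothetical.

```latex
\rho_{kk'} = \frac{k\,\rho_{xx'}}{1 + (k - 1)\,\rho_{xx'}}
\qquad\text{e.g.}\qquad
\rho_{xx'} = 0.70,\; k = 2
\;\Rightarrow\;
\rho_{kk'} = \frac{1.40}{1.70} \approx 0.82
```

The gain addresses random error only. Adding more items that share the same access barrier does nothing to reduce systematic error.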
Past Research
Ways to “validate” accommodations: differential item functioning (DIF), exploratory factor analysis (EFA), cluster analyses, qualitative reviews, etc.
Inconclusive results
Difficulties in conducting research:
–Experimental designs
–Concurrent accommodations vs. single accommodations
–Confounding variables
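As an illustration of what a DIF check involves, the sketch below computes the Mantel-Haenszel common odds ratio for a single item, one widely used DIF statistic. The slides do not say which DIF procedure the past studies used, so this choice is an assumption, and the counts are invented. Examinees are matched into strata by total score; each stratum holds the counts (reference correct, reference incorrect, focal correct, focal incorrect), where the focal group might be, for example, examinees tested with an accommodation.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score-matched strata.

    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong).
    A value of 1.0 means the item behaves the same for both groups once
    examinees are matched on total score.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_correct, ref_wrong, focal_correct, focal_wrong in strata:
        n = ref_correct + ref_wrong + focal_correct + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_correct * focal_wrong / n
        denominator += ref_wrong * focal_correct / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return numerator / denominator


def mh_d_dif(alpha):
    """ETS delta-scale transformation; negative values mean the item is
    relatively harder for the focal group."""
    return -2.35 * math.log(alpha)


# Hypothetical counts for one item in low, middle, and high score strata.
strata = [
    (20, 30, 10, 40),
    (30, 20, 20, 30),
    (40, 10, 30, 20),
]

alpha = mantel_haenszel_odds_ratio(strata)
print(f"MH odds ratio = {alpha:.2f}, MH D-DIF = {mh_d_dif(alpha):.2f}")
```

A flagged item shows only that matched groups perform differently on it. Whether the difference reflects the accommodation, the item, or a confounding variable is the interpretive problem the slide describes.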
Past Research
Lack of consensus on what constitutes “valid” accommodations
–Does “boost” = validity?
–Isn’t it possible that a valid accommodation might increase measurement precision and reveal student inability, with no boost?
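The “boost” criterion is usually computed as a difference in score gains between groups. The sketch below assumes that operationalization and uses invented mean scores; the function names are hypothetical.

```python
def boost(mean_accommodated, mean_standard):
    """Score gain for one group when the accommodation is provided."""
    return mean_accommodated - mean_standard


def differential_boost(target_group, comparison_group):
    """Gain for the target group minus gain for the comparison group.

    Each argument is a (mean_accommodated, mean_standard) pair.
    """
    return boost(*target_group) - boost(*comparison_group)


# Hypothetical mean scores: (with accommodation, standard administration).
students_with_disabilities = (50.0, 42.0)
general_population = (61.0, 60.0)

gap = differential_boost(students_with_disabilities, general_population)
print(f"differential boost = {gap:.1f} points")
```

A positive value is often read as evidence that the accommodation removes a barrier specific to the target group. As the slide argues, a value near zero does not by itself show the accommodation is invalid: it may have improved precision and revealed what the students do not know.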
Continued & Future Research
Given the confounding variables of both persons and tests, accommodations cannot be validated apart from an in-depth look at the assessment and what it is trying to measure, in concert with how the accommodation, the student, and the test items interact (e.g., research on construct irrelevant variance by Abedi, Kopriva, Winters, et al.).
It must be clear how the accommodations affect skill measurement.
Therefore, future research should focus deeply on assessment validity in light of how the wide range of students, with all their diversities (and confounding variables), approach assessments.
Continued & Future Research
Re-evaluation of test constructs
Research on all students, not limited to disability classifications
Is there a way to measure individual systematic error?
Research on distractors
–What types of errors do students make, and which distractors do they choose?
Think-aloud studies focused on access and student response preferences
Continued & Future Research
Flexibility:
–New item types and acceptable student response modes
–Approach flexible item types, and research on them, as parallel item forms and formats for more than the “accommodated” sample.
General, accommodated, alternate, and modified alternate assessments can and should be
–Better aligned to clearly defined constructs,
–More innovative by design,
–Valid for more than the middle of the bell, and
–More meaningful and useful.