
1 An Introduction to Validity Arguments for Alternate Assessments. Scott Marion, Center for Assessment. Eighth Annual MARCES Conference, University of Maryland, October 11-12, 2007

2 Overview
A little validity background
Creating and evaluating a validity argument…or translating Kane (and others) to AA-AAS
–Can we make it practical?
A focus on validity in technical documentation

3 Validation is "a lengthy, even endless process" (Cronbach, 1989, p. 151)
Good for consultants, but not so great for state folks and contractors
Are you nervous yet…?

4 Validity Should be Central
We argue that the purpose of the technical documentation is to provide data to support or refute the validity of the inferences from the alternate assessments at both the student and program level

5 Unified Conception of Validity
Drawing on the work of Cronbach, Messick, Shepard, and Kane, the proposed evaluation of technical quality is built around a unified conception of validity
–centered on the inferences related to the construct, including significant attention to the social consequences of the assessment

6 But what is a validity argument, and how do we evaluate the validity of our inferences?

7 A little history
Kane traces the history of validity theory from the criterion model through the content model to the construct model.
It is worth stopping briefly to discuss the content model, because that is where many still appear to operate.

8 "The content model interprets test scores based on a sample of performances in some area of activity as an estimate of overall level of skill in that activity."
The sample of items/tasks and observed performances must be:
–representative of the domain,
–evaluated appropriately and fairly, and
–part of a large enough sample
So, this sounds good, right? (see the sampling sketch below)
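A minimal illustration of the content model's sampling logic may help here: the score on a representative sample of tasks is read as an estimate of overall skill in the domain, and that estimate only stabilizes once the sample is large enough. The Python sketch below is not from the presentation; the domain size, "true skill" value, and sample sizes are hypothetical placeholders chosen purely for illustration.

# A minimal sketch (not from the presentation) of the sampling logic behind the
# content model: the score on a representative sample of tasks is treated as an
# estimate of overall skill in the domain, and the estimate only stabilizes
# when the sample is large enough. All numbers are made up for illustration.

import random

random.seed(1)

DOMAIN_SIZE = 500          # hypothetical pool of tasks defining the domain
true_skill = 0.70          # student's "true" probability of success on a task
domain = [1 if random.random() < true_skill else 0 for _ in range(DOMAIN_SIZE)]
overall_level = sum(domain) / DOMAIN_SIZE   # what we actually want to estimate

for n in (5, 20, 80):                       # increasing sample sizes
    sample = random.sample(domain, n)       # representative (random) sample
    estimate = sum(sample) / n
    print(f"n = {n:3d}: sampled estimate = {estimate:.2f} "
          f"(overall level = {overall_level:.2f})")
# Small samples can miss badly; larger representative samples converge on the
# overall level of skill: the "large enough sample" condition on the slide.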

9 Concerns with the content model
"Messick (1989) argued that content-based validity evidence does not involve test scores or the performances on which the scores are based and therefore cannot be used to justify conclusions about the interpretation of test scores" (p. 17)
–Huh?
More simply…content evidence is a matching exercise and doesn't really help us get at the interpretations we make from scores
Is it useful? Sure, but with the intense focus on alignment these days, content evidence appears to be privileged compared with trying to create arguments for the meaning of test scores

10 The Construct Model
We can trace this evolution from Cronbach and Meehl (1955) through Loevinger (1957) to Cronbach (1971), culminating in Messick (1989)
–Focused attention on the many factors associated with the interpretations and uses of test scores (and not simply with correlations)
–Emphasized the important effect of assumptions in score interpretations and the need to check these assumptions
–Allowed for the possibility of alternative explanations for test scores—in fact, this model even encouraged falsification

11 Limitations of the Construct Model
Does not provide clear guidance for the validation of a test score interpretation and/or use
Does not help evaluators prioritize validity studies
–If, as Anastasi (1986) noted, "almost any information gathered in the process of developing or using a test is relevant to its validity" (p. 3), where should one start, and how do you know when you're done (or are you ever done)?

12 Transitioning to argument…
The call for careful examination of alternative explanations within the construct model is helpful for directing a program of validity research

13 Kane's argument-based framework
"…assumes that the proposed interpretations and uses will be explicitly stated as an argument, or network of inferences and supporting assumptions, leading from observations to the conclusions and decisions. Validation involves an appraisal of the coherence of this argument and of the plausibility of its inferences and assumptions" (Kane, 2006, p. 17).
Sounds easy, right?

14 Two Types of Arguments
An interpretative argument specifies the proposed interpretations and uses of test results by laying out the network of inferences and assumptions leading from the observed performances to the conclusions and decisions based on the performances
The validity argument provides an evaluation of the interpretative argument (Kane, 2006)

15 Kane's approach to validation is more pragmatic, "…involving the specification of proposed interpretations and uses, the development of a measurement procedure that is consistent with this proposal, and a critical evaluation of the coherence of the proposal and the plausibility of its inferences and assumptions."
The challenge is that most assessments do not start with explicit attention to validity in the design phase

16 The Interpretative Argument
Essentially a mini-theory—the interpretative argument provides a framework for interpretation and use of test scores
Like a theory, the interpretative argument guides the data collection and methods, and, most importantly, like theories it is falsifiable as we critically evaluate the evidence and arguments

17 Two stages of the interpretative argument
Development stage—focus on development of measurement tools and procedures as well as the corresponding interpretative argument
–An appropriate confirmationist bias in this stage, since the developers (state and contractors) are trying to make the program the best it can be
Appraisal stage—focus on critical evaluation of the interpretative argument
–Should be more neutral and "arms-length" to provide a more convincing evaluation of the proposed interpretations and uses
"Falsification, obviously, is something we prefer to do unto the constructions of others" (Cronbach, 1989, p. 153)

18 Interpretative argument
"Difficulty in specifying an interpretative argument…may indicate a fundamental problem. If it is not possible to come up with a test plan and plausible rationale for a proposed interpretation and use, it is not likely that this interpretation and use will be considered valid" (Kane, 2006, p. 26).
Think of the interpretative argument as a series of "if-then" statements…
–E.g., if the student performs the task in a certain way, then the observed score should have a certain value (a minimal sketch follows below)
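To make the "if-then" framing concrete, here is a minimal sketch (not part of the presentation) that treats an interpretative argument as a chain of inferences, each pairing a claim with an evidence-based check. The inference labels loosely follow Kane's general scheme (scoring, generalization), but the checks, thresholds, and evidence values are hypothetical placeholders.

# A minimal sketch of an interpretative argument expressed as a chain of
# "if-then" inferences. The checks and data below are hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Inference:
    name: str                       # e.g., "scoring"
    claim: str                      # the "then" part of the if-then statement
    check: Callable[[dict], bool]   # evidence-based test of the assumption

def evaluate_argument(inferences: list, evidence: dict) -> list:
    """Walk the chain; the argument is only as strong as its weakest link."""
    findings = []
    for inf in inferences:
        status = "supported" if inf.check(evidence) else "NOT supported"
        findings.append(f"{inf.name}: {inf.claim} -> {status}")
    return findings

# Hypothetical example: two links of the chain with made-up evidence.
argument = [
    Inference("scoring",
              "if the student performs the task in a certain way, "
              "then the observed score should have a certain value",
              lambda e: e["rater_agreement"] >= 0.80),
    Inference("generalization",
              "if the observed score is X, then scores on comparable "
              "tasks would also be about X",
              lambda e: e["reliability"] >= 0.70),
]

evidence = {"rater_agreement": 0.85, "reliability": 0.62}  # illustrative numbers
for line in evaluate_argument(argument, evidence):
    print(line)

The point of the structure, not the particular numbers, is that each link must hold for the conclusions and decisions at the end of the chain to be defensible.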

19 Criteria for Evaluating Interpretative Arguments
Clarity—should be clearly stated as a framework for validation. Inferences and warrants specified in enough detail to make proposed claims explicit.
Coherence—assuming the individual inferences are plausible, the network of inferences leading from the observations to conclusions and decisions makes sense
Plausibility—assumptions, in particular, are judged in terms of all the evidence for and against them

20 One of the most effective challenges to interpretative arguments (or scientific theories) is to propose and substantiate an alternative argument that is more plausible
–With AA-AAS we have to seriously consider and challenge ourselves with competing alternative explanations for test scores, for example…
"higher scores on our state's AA-AAS reflect greater learning of the content frameworks"
OR
"higher scores on our state's AA-AAS reflect higher levels of student functioning"
(one simple way to probe these rival explanations is sketched below)
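One simple way to put these rival explanations in tension is to ask whether AA-AAS scores track an external indicator of content learning more closely than an indicator of overall student functioning. The sketch below is a minimal illustration, not a prescribed analysis; the indicators, data values, and the use of a plain correlation are all hypothetical simplifications.

# A minimal sketch of one empirical probe of the two competing explanations:
# does the AA-AAS score relate more strongly to an indicator of content
# learning than to an indicator of overall student functioning?
# All variable names and numbers here are hypothetical.

from math import sqrt

def pearson_r(x, y):
    """Plain Pearson correlation, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative (made-up) data for a handful of students.
aa_aas_scores      = [12, 18, 25, 31, 40, 44]   # AA-AAS scale scores
learning_indicator = [ 2,  3,  5,  6,  8,  9]   # e.g., teacher rating of content mastery
functioning_level  = [ 1,  3,  2,  4,  3,  5]   # e.g., adaptive-behavior level

r_learning = pearson_r(aa_aas_scores, learning_indicator)
r_functioning = pearson_r(aa_aas_scores, functioning_level)

print(f"score vs. learning indicator: r = {r_learning:.2f}")
print(f"score vs. functioning level:  r = {r_functioning:.2f}")
# If the second correlation rivals or exceeds the first, the rival explanation
# (scores mainly reflect functioning level) gains plausibility and the
# interpretative argument needs to answer it.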

21 Categories of interpretative arguments (Kane, 2006)
Trait interpretations
Theory-based interpretations
Qualitative interpretations
Decision procedures
Like scientific theories, the specific type of interpretative argument for test-based inferences guides models, data collection, assumptions, analyses, and claims

22 Decision Procedures
Evaluating a decision procedure requires an evaluation of values and consequences
"To evaluate a testing program as an instrument of policy [e.g., AA-AAS under NCLB], it is necessary to evaluate its consequences" (Kane, 2006, p. 53)
Therefore, the values inherent in the testing program must be made explicit, and the consequences of the decisions based on test scores must be evaluated!

23 Prioritizing and Focusing
Shepard (1993) advocated a straightforward means to prioritize validity questions. Using an evaluation framework, she proposed that validity studies be organized in response to the questions:
–What does the testing practice claim to do;
–What are the arguments for and against the intended aims of the test; and
–What does the test do in the system other than what it claims, for good or bad? (Shepard, 1993, p. 429)
The questions are directed to concerns about the construct, relevance, interpretation, and social consequences, respectively.

24 A heuristic to help organize and focus the validity evaluation (Marion, Quenemoen, & Kearns, 2006) [assessment triangle diagram]
COGNITION: Student Population; Academic Content; Theory of Learning
OBSERVATION: Assessment System; Test Development; Administration; Scoring; Reporting
INTERPRETATION: Alignment; Item Analysis/DIF/Bias; Measurement Error; Scaling and Equating; Standard Setting
VALIDITY EVALUATION: Empirical Evidence; Theory and Logic (argument); Consequential Features

25 Synthesizing and Integrating
Haertel (1999) reminded us that the individual pieces of evidence (typically presented in separate chapters of technical documents) do not, by themselves, make the assessment system valid or not; it is only by synthesizing this evidence to evaluate the interpretative argument that we can judge the validity of the assessment program.

26 NHEAI/NAAC Technical Documentation
The "Nuts and Bolts"
The Validity Evaluation
The Stakeholder Summary
The Transition Document

27 The Validity Evaluation
Author: Independent contractor with considerable input from state DOE
Audience: State policy makers, state DOE, district assessment and special education directors, state TAC members, special education teachers, and other key stakeholders. This also will contribute to the legal defensibility of the system.
Notes: This will be a dynamic volume where new evidence is collected and evaluated over time.

28 Table of Contents
I. Overview of the Assessment System
II. Who are the students?
III. What is the content?
IV. Introduction of the Validity Framework and Argument
V. Empirical Evidence
VI. Evaluating the Validity Argument

29 Chapter VI: The Validity Evaluation
A. Revisiting the interpretative argument
–Logical/theoretical relationships among the content, students, learning, and assessment—revisiting the assessment triangle
B. The specific validity evaluation questions addressed in this volume
C. Synthesizing and weighing the various sources of evidence
1. Arguments for the validity of the system
2. Arguments against the validity of the system
D. An overall judgment about the defensibility of inferences from the scores of the AA-AAS in the context of specific uses and purposes

