An Introduction to Validity Arguments for Alternate Assessments
Scott Marion, Center for Assessment
Eighth Annual MARCES Conference, University of Maryland
October 11-12, 2007

Overview
A little validity background
Creating and evaluating a validity argument… or translating Kane (and others) to AA-AAS
–Can we make it practical?
A focus on validity in technical documentation

Validation is "a lengthy, even endless process" (Cronbach, 1989, p. 151)
Good for consultants, but not so great for state folks and contractors
Are you nervous yet?

Validity Should Be Central
We argue that the purpose of the technical documentation is to provide data to support or refute the validity of the inferences from the alternate assessments at both the student and program levels.

Unified Conception of Validity
Drawing on the work of Cronbach, Messick, Shepard, and Kane, the proposed evaluation of technical quality is built around a unified conception of validity:
–centered on the inferences related to the construct
–including significant attention to the social consequences of the assessment

But what is a validity argument, and how do we evaluate the validity of our inferences?

A little history
Kane traces the history of validity theory from the criterion model through the content model to the construct model.
It is worth stopping briefly to discuss the content model, because that is where many still appear to operate.

"The content model interprets test scores based on a sample of performances in some area of activity as an estimate of overall level of skill in that activity."
The sample of items/tasks and observed performances must be:
–representative of the domain,
–evaluated appropriately and fairly, and
–part of a large enough sample.
So, this sounds good, right?

Concerns with the content model
"Messick (1989) argued that content-based validity evidence does not involve test scores or the performances on which the scores are based and therefore cannot be used to justify conclusions about the interpretation of test scores" (p. 17).
–Huh?
More simply, content evidence is a matching exercise and does not really help us get at the interpretations we make from scores.
Is it useful? Sure, but with the intense focus on alignment these days, content evidence appears to be privileged over efforts to create arguments for the meaning of test scores.

The Construct Model
We can trace this evolution from Cronbach and Meehl (1955) through Loevinger (1957) to Cronbach (1971), culminating in Messick (1989)
–Focused attention on the many factors associated with the interpretations and uses of test scores (and not simply with correlations)
–Emphasized the important effect of assumptions in score interpretations and the need to check these assumptions
–Allowed for the possibility of alternative explanations for test scores; in fact, this model even encouraged falsification

Limitations of the Construct Model
Does not provide clear guidance for the validation of a test score interpretation and/or use
Does not help evaluators prioritize validity studies
–If, as Anastasi (1986) noted, "almost any information gathered in the process of developing or using a test is relevant to its validity" (p. 3), where should one start, how do you know when you're done, and are you ever done?

Transitioning to argument…
The call for careful examination of alternative explanations within the construct model is helpful for directing a program of validity research.

Kane's argument-based framework
"…assumes that the proposed interpretations and uses will be explicitly stated as an argument, or network of inferences and supporting assumptions, leading from observations to the conclusions and decisions. Validation involves an appraisal of the coherence of this argument and of the plausibility of its inferences and assumptions" (Kane, 2006, p. 17).
Sounds easy, right…

Two Types of Arguments
An interpretative argument specifies the proposed interpretations and uses of test results by laying out the network of inferences and assumptions leading from the observed performances to the conclusions and decisions based on those performances.
The validity argument provides an evaluation of the interpretative argument (Kane, 2006).

Kane's framework offers a more pragmatic approach to validation, "…involving the specification of proposed interpretations and uses, the development of a measurement procedure that is consistent with this proposal, and a critical evaluation of the coherence of the proposal and the plausibility of its inferences and assumptions."
The challenge is that most assessments do not begin with explicit attention to validity in the design phase.

The Interpretative Argument
Essentially a mini-theory: the interpretative argument provides a framework for the interpretation and use of test scores.
Like a theory, the interpretative argument guides data collection and methods; most importantly, like theories, it is falsifiable as we critically evaluate the evidence and arguments.

Two stages of the interpretative argument
Development stage: focus on development of the measurement tools and procedures as well as the corresponding interpretative argument
–An appropriate confirmationist bias exists in this stage, since the developers (state and contractors) are trying to make the program the best it can be
Appraisal stage: focus on critical evaluation of the interpretative argument
–Should be more neutral and "arm's-length" to provide a more convincing evaluation of the proposed interpretations and uses
"Falsification, obviously, is something we prefer to do unto the constructions of others" (Cronbach, 1989, p. 153)

Interpretative argument
"Difficulty in specifying an interpretative argument…may indicate a fundamental problem. If it is not possible to come up with a test plan and plausible rationale for a proposed interpretation and use, it is not likely that this interpretation and use will be considered valid" (Kane, 2006, p. 26).
Think of the interpretative argument as a series of "if-then" statements…
–E.g., if the student performs the task in a certain way, then the observed score should have a certain value
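
To make the idea of a chained "if-then" interpretative argument concrete, here is a minimal illustrative sketch (added to this transcript, not from the original slides); the specific claims and evidence labels are hypothetical examples, and any real chain would need to be tailored to a state's actual AA-AAS.

# Hypothetical sketch: an interpretative argument represented as a chain of
# "if-then" inferences, each of which must be backed by evidence.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Inference:
    if_condition: str                                   # what is observed or assumed
    then_claim: str                                     # what we conclude from it
    evidence: List[str] = field(default_factory=list)   # studies offered as backing

interpretative_argument = [
    Inference("the student performs the task in a certain way",
              "the observed score should have a certain value",
              ["scoring rubric review", "rater agreement study"]),
    Inference("the observed score has a certain value",
              "the student has attained a particular level of the academic content standards",
              ["alignment study", "standard-setting documentation"]),
    Inference("the student has attained that level of the content standards",
              "the resulting proficiency classification and decisions are appropriate",
              []),  # no evidence yet: a gap the validity evaluation must address
]

# The argument is only as strong as its weakest link; unsupported inferences
# point to where validity studies should be prioritized.
for link in interpretative_argument:
    status = "supported" if link.evidence else "NEEDS EVIDENCE"
    print(f"If {link.if_condition}, then {link.then_claim} [{status}]")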

Criteria for Evaluating Interpretative Arguments
Clarity: the argument should be clearly stated as a framework for validation, with inferences and warrants specified in enough detail to make the proposed claims explicit.
Coherence: assuming the individual inferences are plausible, the network of inferences leading from the observations to the conclusions and decisions makes sense.
Plausibility: the assumptions, in particular, are judged in terms of all the evidence for and against them.

One of the most effective challenges to interpretative arguments (or scientific theories) is to propose and substantiate an alternative argument that is more plausible
–With AA-AAS, we have to seriously consider and challenge ourselves with competing alternative explanations for test scores, for example…
"higher scores on our state's AA-AAS reflect greater learning of the content frameworks"
OR
"higher scores on our state's AA-AAS reflect higher levels of student functioning"

Categories of interpretative arguments (Kane, 2006)
Trait interpretations
Theory-based interpretations
Qualitative interpretations
Decision procedures
Like scientific theories, the specific type of interpretative argument for test-based inferences guides the models, data collection, assumptions, analyses, and claims.

Decision Procedures
Evaluating a decision procedure requires an evaluation of values and consequences
"To evaluate a testing program as an instrument of policy [e.g., AA-AAS under NCLB], it is necessary to evaluate its consequences" (Kane, 2006, p. 53)
Therefore, the values inherent in the testing program must be made explicit, and the consequences of the decisions made on the basis of test scores must be evaluated!

Prioritizing and Focusing
Shepard (1993) advocated a straightforward means to prioritize validity questions. Using an evaluation framework, she proposed that validity studies be organized in response to the questions:
–What does the testing practice claim to do;
–What are the arguments for and against the intended aims of the test; and
–What does the test do in the system other than what it claims, for good or bad? (Shepard, 1993, p. 429)
The questions are directed to concerns about the construct, relevance, interpretation, and social consequences.

[Figure: the assessment triangle heuristic used to organize and focus the validity evaluation (Marion, Quenemoen, & Kearns, 2006)]
Cognition: Student Population; Academic Content; Theory of Learning
Observation: Assessment System; Test Development; Administration; Scoring; Reporting
Interpretation: Alignment; Item Analysis/DIF/Bias; Measurement Error; Scaling and Equating; Standard Setting
Validity Evaluation: Empirical Evidence; Theory and Logic (argument); Consequential Features
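
As a purely illustrative sketch (added here, not part of the slides), the heuristic can be treated as a simple evidence inventory for a technical documentation effort; the grouping below follows the reconstruction above, and the "collected" status values are invented for the example.

# Hypothetical sketch: using the Marion, Quenemoen, & Kearns (2006) heuristic
# as an inventory to flag where validity evidence is still missing.
evidence_inventory = {
    "Cognition": ["student population", "academic content", "theory of learning"],
    "Observation": ["assessment system", "test development", "administration",
                    "scoring", "reporting"],
    "Interpretation": ["alignment", "item analysis/DIF/bias", "measurement error",
                       "scaling and equating", "standard setting"],
    "Validity Evaluation": ["empirical evidence", "theory and logic (argument)",
                            "consequential features"],
}

# Invented example of what has been gathered so far for one program.
collected = {"alignment", "standard setting", "theory of learning"}

for vertex, elements in evidence_inventory.items():
    missing = [e for e in elements if e not in collected]
    if missing:
        print(f"{vertex}: still needs evidence for {', '.join(missing)}")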

Synthesizing and Integrating
Haertel (1999) reminded us that the individual pieces of evidence (typically presented in separate chapters of technical documents) do not, by themselves, make the assessment system valid or invalid; it is only by synthesizing this evidence in order to evaluate the interpretative argument that we can judge the validity of the assessment program.

NHEAI/NAAC Technical Documentation
The "Nuts and Bolts"
The Validity Evaluation
The Stakeholder Summary
The Transition Document

The Validity Evaluation
Author: Independent contractor with considerable input from the state DOE
Audience: State policy makers, state DOE, district assessment and special education directors, state TAC members, special education teachers, and other key stakeholders. This will also contribute to the legal defensibility of the system.
Notes: This will be a dynamic volume in which new evidence is collected and evaluated over time.

Table of Contents
I. Overview of the Assessment System
II. Who are the students?
III. What is the content?
IV. Introduction of the Validity Framework and Argument
V. Empirical Evidence
VI. Evaluating the Validity Argument

Chapter VI: The Validity Evaluation
A. Revisiting the interpretative argument
–Logical/theoretical relationships among the content, students, learning, and assessment: revisiting the assessment triangle
B. The specific validity evaluation questions addressed in this volume
C. Synthesizing and weighing the various sources of evidence
1. Arguments for the validity of the system
2. Arguments against the validity of the system
D. An overall judgment about the defensibility of inferences from AA-AAS scores in the context of specific uses and purposes