MDE / OEAA 1 Un-distorting Measures of Growth: Alternatives to Traditional Vertical Scales Presentation on June 19, 2005 to 25th Annual CCSSO Conference.

Similar presentations
Using Growth Models to improve quality of school accountability systems October 22, 2010.

Comparing State Reading and Math Performance Standards Using NAEP Don McLaughlin Victor Bandeira de Mello National Conference on Large-Scale Assessment.
Statistics.  Statistically significant– When the P-value falls below the alpha level, we say that the test is “statistically significant” at the alpha.
Student Learning Targets (SLT) You Can Do This! Getting Ready for the School Year.
Designs to Estimate Impacts of MSP Projects with Confidence. Ellen Bobronnikov March 29, 2010.
Reliability, the Properties of Random Errors, and Composite Scores.
PRESENTATION AT THE 12TH ANNUAL MARYLAND ASSESSMENT CONFERENCE COLLEGE PARK, MD OCTOBER 18, 2012 JOSEPH A. MARTINEAU JI ZENG MICHIGAN DEPARTMENT OF EDUCATION.
MCAS-Alt: Alternate Assessment in Massachusetts Technical Challenges and Approaches to Validity Daniel J. Wiener, Administrator of Inclusive Assessment.
When Measurement Models and Factor Models Conflict: Maximizing Internal Consistency James M. Graham, Ph.D. Western Washington University ABSTRACT: The.
Using Growth Models for Accountability Pete Goldschmidt, Ph.D. Assistant Professor California State University Northridge Senior Researcher National Center.
Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach Mark D. Reckase Tianli Li Michigan State University.
Meeting NCLB Act: Students with Disabilities Who Are Caught in the Gap Martha Thurlow Ross Moen Jane Minnema National Center on Educational Outcomes
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Measurement Challenges in Growth and Value Added Models Joseph A. Martineau Executive Director of Assessment & Accountability Michigan Department of Education.
Classroom Assessment A Practical Guide for Educators by Craig A
Introduction to GREAT for ELs Office of Student Assessment Wisconsin Department of Public Instruction (608)
Common Core State Standards & Assessment Update The Next Step in Preparing Michigan’s Students for Career and College MERA Spring Conference May 17, 2011.
MDE / OEAA 1 Growing Pains: The State of the Art in Value-Added Modeling Presentation on March 2, 2005 to Michigan School Testing Conference By Joseph.
ASSESSMENT FOR BETTER LEARNING USING NAPLAN DATA Presented by Philip Holmes-Smith School Research Evaluation and Measurement Services.
NCAASE Work with NC Dataset: Initial Analyses for Students with Disabilities Ann Schulte NCAASE Co-PI
ASSESSMENT ACCOMMODATIONS How to Select, Administer, and Evaluate Use of Accommodations for Instruction and Assessment of Students with Disabilities Ohio.
Introduction to Adequate Yearly Progress (AYP) Michigan Department of Education Office of Psychometrics, Accountability, Research, & Evaluation Summer.
Office of Institutional Research, Planning and Assessment January 24, 2011 UNDERSTANDING THE DIAGNOSTIC GUIDE.
1 Comments on: “New Research on Training, Growing and Evaluating Teachers” 6 th Annual CALDER Conference February 21, 2013.
Student Engagement Survey Results and Analysis June 2011.
Mathematics Indicators and Goals. Math Tier II Indicator Indicator 1.8: All junior high students will meet or exceed standards and be identified as proficient.
IOWA Department of Education Substantial Deficiency: Fall, Winter, Spring.
Fall Testing Update David Abrams Assistant Commissioner for Standards, Assessment, & Reporting Middle Level Liaisons & Support Schools Network November.
Assessing Students With Disabilities: IDEA and NCLB Working Together.
A Closer Look at Adequate Yearly Progress (AYP) Michigan Department of Education Office of Educational Assessment and Accountability Paul Bielawski Conference.
Department of Research and Planning Leadership Meeting January 16, 2013 ASSESSMENT CORRELATIONS.
Measured Progress ©2012 Student Growth in the Non-Tested Subjects and Grades: Options for Teacher Evaluators Elena Diaz-Bilello, Center for Assessment.
Issues in Assessment Design, Vertical Alignment, and Data Management : Working with Growth Models Pete Goldschmidt UCLA Graduate School of Education &
Fall 2007 Assessment & Accountability Update Joseph A. Martineau, Interim Director Office of General Assessment & Accountability Michigan Department of.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Fall 2007 MEAP Reporting 2007 OEAA Conference Jim Griffiths – Manager, Assessment Administration & Reporting Sue Peterman - Department Analyst, MEAP.
MEAP / MME New Cut Scores Gill Elementary February 2012.
“Value added” measures of teacher quality: use and policy validity Sean P. Corcoran New York University NYU Abu Dhabi Conference January 22, 2009.
RTI and District Assessments Jack B. Monpas-Huber, Ph.D. Director of Assessment and Student Information Anzara Miller RTI Coordinator / Professional Development.
Ch 10 – Intro To Inference 10.1: Estimating with Confidence 10.2 Tests of Significance 10.3 Making Sense of Statistical Significance 10.4 Inference as.
Pearson Copyright 2010 Some Perspectives on CAT for K-12 Assessments Denny Way, Ph.D. Presented at the 2010 National Conference on Student Assessment June.
1 New York State Growth Model for Educator Evaluation 2011–12 July 2012 PRESENTATION as of 7/9/12.
Successfully “Translating” ELPA Results Session #25 Assessment and Accountability Conference 2008.
Scale Scoring A New Format for Provincial Assessment Reports.
Michigan School Report Card Update Michigan Department of Education.
DVAS Training Find out how Battelle for Kids can help Presentation Outcomes Learn rationale for value-added progress measures Receive conceptual.
NCLB / Education YES! What’s New for Students With Disabilities? Michigan Department of Education.
©2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
TM Peoria Public Schools NWEA – Measure of Academic Progress
Adequate Yearly Progress (AYP) for Special Populations Michigan Department of Education Office of Educational Assessment and Accountability Paul Bielawski.
Application of Growth and Value-Added Models to WASL A Summary of Issues, Developments and Plans for Washington WERA Symposium on Achievement Growth Models.
1 Mississippi Statewide Accountability System Adequate Yearly Progress Model Improving Mississippi Schools Conference June 11-13, 2003 Mississippi Department.
School and District Accountability Reports Implementing No Child Left Behind (NCLB) The New York State Education Department March 2004.
Top to Bottom and Persistently Lowest Achieving Schools Lists Federally Approved Requirements for Identifying Persistently Lowest Achieving Schools August.
Welcome. Outcomes  Learn to analyze growth as a catalyst for change  Understand the process to evaluate the effectiveness of instructional interventions.
Lenoir County Public Schools New North Carolina Principal Evaluation Process 2008.
A GUIDE FOR CANTON PUBLIC SCHOOL DISTRICT’S PARENTS AND STAKEHOLDERS The Mississippi Literacy-Based Promotion Act
Department of Curriculum and Instruction Considerations for Choosing Mathematics Progress Monitoring Measures from K-12 Anne Foegen, Ph.D. Pursuing the.
Florida Algebra I EOC Value-Added Model June 2013.
STAR Reading. Purpose Periodic progress monitoring assessment Quick and accurate estimates of reading comprehension Assessment of reading relative to.
Understanding the Results Ye Tong, Ph.D. Pearson.
Every Student Succeeds Act (ESSA) Accountability
Review, Revise and Amend from Procedures for State Board Policy 74
CHAPTER 7 LINEAR RELATIONSHIPS
Student Growth Measurements and Accountability
Michigan’s Lessons and Uses of the CTEAG
This presentation document has been prepared by Vault Intelligence Limited (“Vault") and is intended for off line demonstration, presentation and educational.
Office of Education Improvement and Innovation
AACC Mini Conference June 8-9, 2011
Virginia Board of Education’s
Presentation transcript:

MDE / OEAA 1 Un-distorting Measures of Growth: Alternatives to Traditional Vertical Scales Presentation on June 19, 2005 to 25th Annual CCSSO Conference on Large-Scale Assessment By Joseph A. Martineau, Psychometrician Office of Educational Assessment & Accountability (OEAA) Michigan Department of Education (MDE)

MDE / OEAA 2 Introduction Measurement of growth or “progress” –Growth models Measurement of educators’ contributions to student growth or progress –Value Added Models (VAM) Both require vertical scales that –Measure the “same thing” along the entire scale –Have the same meaning along the entire scale

MDE / OEAA 3 Distortions in studies of growth Using traditional vertical scales to measure growth can result in the following distortions: –Identification of growth trajectories with little resemblance to true growth trajectories –Attribution of effects on growth to effects on initial status and vice versa –Identification of false effects on initial status or growth –Failure to detect true effects on initial status or growth –Identification of effective interventions as harmful and vice versa

MDE / OEAA 4 Graphical demonstration of one kind of distortion in growth models Grade 5 scale mostly measures differences in number sense Grade 6 scale mostly measures differences in algebra

MDE / OEAA 5 Graphical demonstration of one kind of distortion in growth models Vertically “equated, unidimensional” scales have to bend to accommodate both the grade-5 and grade-6 content mixes This can come out as fitting a unidimensional model if number sense and algebra scores are strongly correlated, but strong correlations do not alleviate distortions in measures of growth

MDE / OEAA 6 Graphical demonstration of one kind of distortion in growth models Any given student’s true achievement may not lie near the vertical scale, so the vertical scale may be incapable of accurately representing student achievement

MDE / OEAA 7 Graphical demonstration of one kind of distortion in growth models Therefore, the true multidimensional achievement of a student becomes projected onto the “unidimensional” vertical scale

MDE / OEAA 8 Graphical demonstration of one kind of distortion in growth models The nearest point on the “unidimensional” vertical scale is the most likely estimate of “unidimensional” student ability

MDE / OEAA 9 Graphical demonstration of one kind of distortion in growth models The true measure of growth and the “unidimensional” measure of growth are remarkably different The distortion can be overestimation of growth (as shown here) or underestimation of growth This can have remarkable effects on studies of growth
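The projection the preceding slides describe can be sketched numerically. In this hypothetical example (all numbers are invented for illustration, not real scores), achievement is a point in a two-dimensional (number sense, algebra) space, and the “unidimensional” vertical scale is assumed to be a straight line through the origin; the most likely unidimensional estimate is then the nearest point on that line:

```python
import numpy as np

# Assumed direction of the "unidimensional" vertical scale in
# (number sense, algebra) space -- a 45-degree blend, purely illustrative.
scale_dir = np.array([1.0, 1.0]) / np.sqrt(2)

def project(theta):
    """Nearest point on the scale line: the most likely unidimensional
    estimate, expressed as signed distance along the line."""
    return float(np.asarray(theta) @ scale_dir)

# Hypothetical true achievement of one student at two grades:
grade5 = np.array([1.0, 0.2])   # grade 5: mostly number sense
grade6 = np.array([1.3, 1.5])   # grade 6: algebra has grown substantially

true_growth = float(np.linalg.norm(grade6 - grade5))   # multidimensional growth
scaled_growth = project(grade6) - project(grade5)      # growth on the vertical scale

print(round(true_growth, 2), round(scaled_growth, 2))
```

In this made-up configuration the projected growth understates the true multidimensional growth; other configurations overstate it, matching the slide’s point that the distortion can go in either direction.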

MDE / OEAA 10 Distortions in studies of value added Using traditional vertical scales to measure educators' contributions to student growth can result in the following distortions: –Mis-estimation of educator effectiveness simply because educators serve students whose growth is occurring outside the range measured well by the test –Attribution of prior educators’ effectiveness to later educators One promise of value added is to cease to hold educators accountable for prior experiences of students This distortion betrays that promise

MDE / OEAA 11 Graphical demonstration of one kind of distortion in value added models Grade 5 scale mostly measures differences in number sense Grade 6 scale mostly measures differences in algebra Scale has to “bend” to accommodate both tests’ content

MDE / OEAA 12 Graphical demonstration of one kind of distortion in value added models True average statewide scores are likely to lie close to (but not on) the vertical scale

MDE / OEAA 13 Graphical demonstration of one kind of distortion in value added models Individual school (or teacher) average true scores are likely to lie farther off the vertical scale than statewide averages Individual school (or teacher) average true scores are likely to be quite different from the statewide averages

MDE / OEAA 14 Graphical demonstration of one kind of distortion in value added models In this carefully chosen scenario, both the statewide averages and the average scores of a given school project onto the vertical scale at exactly the same place

MDE / OEAA 15 Graphical demonstration of one kind of distortion in value added models Even though statewide and school averages are very different in two dimensions, they are estimated to be identical on the “unidimensional” score scale.

MDE / OEAA 16 Graphical demonstration of one kind of distortion in value added models The average state growth is overestimated and the average school-X growth is underestimated, such that both appear equal In a vertical-scale-based value added model, this exceptionally effective school would be identified as average Overestimation of individual school effectiveness can also result from the distortions
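The “carefully chosen scenario” of slides 14–16 can be reproduced with invented numbers: two very different averages in (number sense, algebra) space that project onto the same point of an assumed unidimensional scale line, so the value added model sees no difference between them:

```python
import numpy as np

# Assumed direction of the vertical scale line (illustrative only).
scale_dir = np.array([1.0, 1.0]) / np.sqrt(2)

def project(theta):
    """Most likely unidimensional estimate: nearest point on the scale line."""
    return float(np.asarray(theta) @ scale_dir)

# Hypothetical average true scores in (number sense, algebra) space:
state  = np.array([1.5, 1.5])   # statewide average: close to the scale line
school = np.array([0.5, 2.5])   # school X: far off the line, strong in algebra

# Very different in two dimensions, identical on the unidimensional scale:
print(project(state), project(school))
```

Because the two averages differ only in a direction perpendicular to the assumed scale line, that entire difference is lost in the projection; this is the geometric core of the distortion, not a special property of these particular numbers.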

MDE / OEAA 17 Graphical Demonstration Table 1 on page 13 of the document Interpretation –Effect size of 0.00 is equivalent to 1 part truth, no parts distortion –Effect size of 0.25 is equivalent to 4 parts truth, 1 part distortion –Effect size of 1.00 is equivalent to the results of VAM being 1 part truth, 1 part distortion. –Effect size of 2.00 is equivalent to 1 part truth, 2 parts distortion
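The truth-to-distortion reading above can be restated as the share of a VAM result that is distortion: if the effect size d is the distortion expressed as a fraction of the truth, then d/(1+d) of the combined result is distortion (so d = 0.25 gives 1 part in 5, i.e. 20%). A minimal sketch of that arithmetic:

```python
def distortion_share(d):
    """Fraction of the combined VAM result attributable to distortion,
    reading effect size d as the distortion-to-truth ratio."""
    return d / (1.0 + d)

for d in (0.0, 0.25, 1.0, 2.0):
    print(f"d = {d:.2f} -> {distortion_share(d):.0%} of the result is distortion")
```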

MDE / OEAA 18 Alternatives to Traditional Vertical Scales Given that using vertical scales in growth-based statistical models results in distorted outcomes, where do we go from here? Michigan has investigated several alternatives –Vertically Moderated Standard Setting –Domain-Referenced Measurement of Growth –Link only adjacent grades –Provided stronger out-of-level content representation as vertical linking items Matrix sampling Large number of forms All of these are important to do, but are insufficient to resolve the distortions arising from using vertical scales in growth-based models

MDE / OEAA 19 Alternatives to Traditional Vertical Scales Michigan is investigating other alternatives –Additional testing Fall and Spring More than twice per year Eliminates summer loss/gain problem Completely eliminates distortions!

MDE / OEAA 20 Alternatives to Traditional Vertical Scales Michigan is investigating other alternatives –Additional testing Fall and Spring More than twice per year Eliminates summer loss/gain problem Completely eliminates distortions! –Yeah, whatever!

MDE / OEAA 21 Alternatives to Traditional Vertical Scales Michigan is investigating other alternatives –Supplement grade-level content with substantial quantities of out-of-level items Items like those on lower grade-level tests Items like those on higher grade-level tests Could be done either by P&P or CBT Implementing with CAT –Would require little additional testing because out-of- level items could inform the stopping rules –May not work with NCLB

MDE / OEAA 22 Alternatives to Traditional Vertical Scales Michigan is investigating other alternatives –Supplement grade-level content with substantial quantities of out-of-level items Provides for less precise estimates of growth, but they should at least be undistorted Administer items like those on lower and/or higher grade-level tests Could be done either by P&P or CBT Implementing with CAT –Would require little additional testing because out-of- level items could inform the stopping rules –May not work for NCLB because of on-grade-level requirements
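The stopping-rule idea in the CAT bullets can be illustrated with a minimal sketch. The thresholds below are invented for illustration (the actual design of a Michigan CAT is not specified here): the adaptive test stops once the ability estimate is precise enough or an item cap is reached, and out-of-level items would count toward the same precision target, which is why they would require little additional testing:

```python
def should_stop(se_estimate, items_given, se_target=0.3, max_items=40):
    """Simple CAT stopping rule: stop when the standard error of the
    ability estimate reaches a precision target, or at an item cap.
    Out-of-level items reduce se_estimate like any other items."""
    return se_estimate <= se_target or items_given >= max_items
```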

MDE / OEAA 23 Alternatives to Traditional Vertical Scales Michigan is investigating other alternatives –More complex psychometric models Without changing the administration model, the only way to address the distortions is to change the psychometric model The psychometric model needs to acknowledge and exploit the multidimensional complexity of item response data Multidimensional models can be a liability as well –Public relations (complexity of the model) –Possibility for error (complexity of the model) –Turnaround time (intensity of the analysis) This area is promising as well as challenging
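A psychometric model that “acknowledges the multidimensional complexity” of the response data could be, for example, a compensatory multidimensional 2PL, in which the probability of a correct response depends on a vector of abilities rather than a single score. A minimal sketch with invented parameter values (not calibrated item parameters):

```python
import math

def mirt_2pl(theta, a, d):
    """Compensatory multidimensional 2PL item response model:
    P(correct) = logistic(a . theta + d), where theta is the ability
    vector, a the discrimination vector, and d the item intercept."""
    logit = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-logit))

# A hypothetical grade-6 item loading mostly on the algebra dimension,
# answered by a student with abilities (number sense, algebra) = (0.5, 1.2):
p = mirt_2pl(theta=[0.5, 1.2], a=[0.3, 1.4], d=-0.8)
print(round(p, 3))
```

Estimating such a model requires calibrating a discrimination vector per item, which is part of the complexity, error risk, and turnaround cost the slide lists as liabilities.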

MDE / OEAA 24 Conclusion Growth-based statistical models using vertically scaled student achievement data are much further along than they were several years ago Growth-based statistical models using vertically scaled student achievement data are still not robust enough to support high-stakes use Either the test administration model or the psychometric model needs to reflect the complexity of the intended analyses No existing methods have been proven to allow for high-stakes use of growth-based statistical models, including Value Added Models

MDE / OEAA 25 Contact Information Joseph Martineau, Psychometrician Office of Educational Assessment & Accountability Michigan Department of Education P.O. Box Lansing, MI (517)