
Educational Accountability Systems
Robert L. Linn
National Center for Research on Evaluation, Standards, and Student Testing, University of Colorado at Boulder
Paper prepared for the CRESST Conference: The Future of Test-Based Educational Accountability, January 23, 2007

Test-Based Accountability
- Popular tool for purposes of educational reform
- Accountability is one of few tools available to policymakers to leverage changes in instruction
- In use in many states since the early 1990s
- Quite a range of approaches to using student test results for accountability systems
- Central component of NCLB

Some Rationales for Testing
- Clarify expectations for teaching and learning
- Motivate greater effort on the part of students, teachers, and administrators
- Monitor educational progress of schools and students
- Identify schools that need to be improved
- Provide a basis for distributing rewards and sanctions
- Monitor achievement gaps and encourage the closing of those gaps

No Child Left Behind
- NCLB is the latest in a series of re-authorizations of the Elementary and Secondary Education Act (ESEA) of 1965
- ESEA was the main educational component of President Johnson's "Great Society" program
- ESEA, as re-authorized every few years, is the principal federal law affecting elementary and secondary education throughout the country

Assessments
- Basic skills and norm-referenced tests of the 1980s and early 90s
- A Nation at Risk encouraged more ambitious tests - performance assessments
- NCLB increased uniformity of assessments in reading and mathematics for grades 3-8

Content Standards
- States encouraged to develop content standards by Goals 2000 and IASA
- NCLB requires all states to have academic content standards in reading/English language arts, mathematics, and science
- All states adopted content standards by 2005 to meet requirements of NCLB if they had not already done so

NCLB
States required to adopt "challenging academic content standards" that "specify what children are expected to know and be able to do; [contain] coherent and rigorous content; [and] encourage the teaching of advanced skills" (NCLB, 2001, Part A, Subpart 1, Sec. 1111(a)(D)).

Performance Standards
- Called Academic Achievement Standards by NCLB
- Absolute rather than normative: establish a fixed criterion of performance
- Intended to be challenging
- Relatively small number of levels
- Apply to all, or essentially all, students
- Depend on judgment

Standards Movement
- High expectations of NCLB are consistent with the standards movement of the 1990s
- National Assessment of Educational Progress (NAEP) standards (called achievement levels) set at ambitious levels
- NAEP 1990 proficient level in mathematics set at high levels:
  Grade 4: 87th percentile - 13% proficient or above
  Grade 8: 85th percentile - 15% proficient or above
  Grade 12: 88th percentile - 12% proficient or above


States with the Highest and Lowest Percent Proficient or Above on State Assessments in 2005

Highest:
- Reading, Grade 4: Mississippi, 89%
- Reading, Grade 8: North Carolina, 88%
- Math, Grade 4: North Carolina, 92%
- Math, Grade 8: Tennessee, 87%

Lowest:
- Reading, Grade 4: Missouri, 35%
- Reading, Grade 8: South Carolina, 30%
- Math, Grade 4: Maine & Wyoming, 39%
- Math, Grade 8: Missouri, 16%

Contrasts of Percent Proficient or Above on NAEP and State Assessments (Grade 8 Mathematics)
- NAEP: Missouri 21%, Tennessee 26%
- State assessments: Missouri 16%, Tennessee 87%

Alignment
- Alignment of assessments and content standards viewed as critical by proponents of standards-based reform
- NCLB peer review requires states to demonstrate alignment, usually through studies by independent contractors

Alignment of Assessments to Content Standards
Webb:
- Categorical concurrence
- Depth of knowledge consistency
- Range of knowledge correspondence
- Balance of representation
Porter:
- Content categories by cognitive demand matrix

Alignment of Assessments to Content Standards (Cont'd)
Achieve:
- Content centrality
- Performance centrality
- Challenge
- Balance
- Range

Approaches to Test-Based Accountability
- Status approach: compare assessment results for a given year to fixed targets (the NCLB approach)
- Growth approach: evaluate growth in achievement (allowed for NCLB pilot program states)
  - "Growth" may be measured by comparing performance of successive cohorts of students
  - Growth may be evaluated by longitudinal tracking of students from year to year
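The contrast between the two approaches can be sketched with a few lines of Python on hypothetical data: a status rule looks only at one year's percent proficient against a fixed target, while a longitudinal growth rule follows the same students across years. All numbers, targets, and function names here are illustrative, not any state's actual rules.

```python
# Sketch (illustrative data): status vs. longitudinal growth measures.

def status_ayp(pct_proficient, annual_target):
    """NCLB-style status check: one year's result vs. a fixed target."""
    return pct_proficient >= annual_target

def mean_gain(scores_prior, scores_current):
    """Average longitudinal gain for students tested in both years."""
    gains = [cur - pri for pri, cur in zip(scores_prior, scores_current)]
    return sum(gains) / len(gains)

# A low-scoring school with strong growth: it fails the fixed status
# target even though its students show substantial year-to-year gains.
prior =   [210, 195, 230, 205, 188]
current = [228, 214, 241, 226, 205]

print(status_ayp(pct_proficient=38.0, annual_target=50.0))  # False
print(mean_gain(prior, current))  # 17.2
```

The example illustrates why the two approaches can classify the same school differently: status ignores where students started, while growth credits improvement from any starting point.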

Status and Growth Approaches
- Status approach has many drawbacks when used to identify schools as successes or in need of improvement: it does not account for differences in student characteristics, most importantly differences in prior achievement
- Growth approach has the advantage of accounting for differences in prior achievement, but may set different standards for schools that start in different places

NCLB Pilot Program
- Five states have received approval to use growth model approaches to determining AYP
- Early results suggest that this does not radically alter the proportion of schools failing to make AYP
- Constraints on growth models are severe, most notably the retention of the requirement that they lead to the completely unrealistic goal of 100% proficiency by 2014

Multiple-Hurdle Approach
- NCLB uses a multiple-hurdle approach
- Schools must meet multiple targets each year: participation and achievement, separately for reading and mathematics, for the total student body and for subgroups of sufficient size
- Many ways to fail to make AYP (miss any target), but only one way to make AYP (meet or exceed every target)
- Large schools with diverse student bodies are at a relative disadvantage in comparison to small schools or schools with relatively homogeneous student bodies

Compensatory Approach
- State systems often use a compensatory approach rather than a multiple-hurdle approach
- An advantage of the compensatory approach is that it creates fewer ways for a school to fall short of targets
- Hybrid models are also possible that use a combination of compensatory and multiple-hurdle approaches
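The difference between the two decision rules can be made concrete with a small sketch. The targets, weights, and scores below are hypothetical; the point is only the shape of each rule: conjunctive (every target must be met) versus compensatory (strength on one indicator can offset weakness on another).

```python
# Sketch (hypothetical targets and weights) of the two decision rules.

def multiple_hurdle(scores, targets):
    """AYP-style conjunctive rule: missing any single target fails."""
    return all(scores[k] >= targets[k] for k in targets)

def compensatory(scores, weights, overall_target):
    """Weighted-composite rule: strong areas can offset weak ones."""
    composite = sum(weights[k] * scores[k] for k in weights)
    return composite >= overall_target

targets = {"reading": 60, "math": 60}
school = {"reading": 75, "math": 58}  # strong reading, slightly weak math

print(multiple_hurdle(school, targets))                         # False
print(compensatory(school, {"reading": 0.5, "math": 0.5}, 60))  # True
```

The same school fails the hurdle rule (math 58 < 60) but passes the compensatory rule (composite 66.5 >= 60). Under NCLB, each subgroup-by-subject combination adds another hurdle, which is why large, diverse schools face many more ways to fail.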

Disaggregation
- Critical for monitoring the closing of gaps in achievement
- No real relevance for small schools with homogeneous student bodies
- However, it leads to many hurdles that large, diverse schools must meet

Implications of Subgroup Results
- Schools with multiple subgroups are at a relative disadvantage compared to schools with homogeneous student populations
- May want to consider combining results across more than one year, as is already allowed for students with disabilities

Subgroup Gains in NAEP Mathematics Average Scale Scores (1996 to 2005)
Gain, Grade 4 / Grade 8:
- White: 14 / 8
- Black: 22 / 15
- Hispanic: 19 / 11

Closing Achievement Gaps: NAEP Mathematics Average Scale Scores (1996 to 2005)
Gap change, Grade 4 / Grade 8:
- White-Black: -8 / -7
- White-Hispanic: -5 / -3

Use of Academic Achievement Standards
- Apparent closing or widening of achievement gaps using percent above cut scores can depend on the choice of level, e.g., basic or above vs. proficient or above
- See, for example, Holland, P. W. (2002). Two measures of change in the gaps between the CDFs of test-score distributions. Journal of Educational and Behavioral Statistics, 27, 3-17.

Subgroup Gains in NAEP Mathematics, Percent At or Above Basic or Proficient (1996 to 2005)
(Table of gains for White, Black, and Hispanic students at the Basic and Proficient levels, Grades 4 and 8)

Changes in Achievement Gaps: NAEP Mathematics, Percent At or Above Basic or Proficient (1996 to 2005)
(Table of changes in the White-Black and White-Hispanic gaps at the Basic and Proficient levels, Grades 4 and 8)

Gaps and Percent Above Cuts
- "Using differences in percent above cut scores can give a confusing impression of a rather simple situation" (Holland, 2002)
- Need to look beyond percent basic or above and percent proficient or above
- Compare average scale scores, effect size statistics, and full score distributions
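Holland's point can be illustrated with a short sketch: two groups separated by a constant gap on the score scale show different "percent above cut" gaps depending on where the cut score sits. The distributions and cut scores below are hypothetical, chosen only to make the effect visible.

```python
# Sketch of Holland's (2002) observation: a constant scale-score gap
# looks different when expressed as "percent above cut", depending on
# the cut placement. All distributions and cuts are hypothetical.

from statistics import NormalDist

group_a = NormalDist(mu=250, sigma=35)  # higher-scoring group
group_b = NormalDist(mu=232, sigma=35)  # 18 points lower everywhere (~0.5 SD)

def pct_above(dist, cut):
    """Percent of the distribution at or above the cut score."""
    return 100 * (1 - dist.cdf(cut))

for label, cut in [("basic", 215), ("proficient", 290)]:
    gap = pct_above(group_a, cut) - pct_above(group_b, cut)
    print(f"{label} cut at {cut}: gap = {gap:.1f} percentage points")
```

With a basic-level cut near the middle of the distributions, the gap is roughly 15 percentage points; with a demanding proficient cut in the upper tail, the same underlying 0.5 SD gap shrinks to about 8 points. An effect-size or full-distribution comparison is stable across cut choices.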

Comparing States on Closing Gaps
Gaps measured in terms of percent proficient or above on state assessments can be quite misleading due to the wide variation in the stringency of state definitions of the proficient standard.

Performance Indexes
- Focusing only on percent proficient or above has disadvantages:
  - Does not give credit for students moving from below basic to basic
  - Encourages attention to students thought to be near the proficient cut, possibly at the expense of other students
- Performance index scores avoid these problems

Illustration of MA Index Scores for a Hypothetical School in 2006 & 2007
(Table of points earned by performance level - Proficient, NI high, NI low, W/F high, W/F low - for 400 students; total index points were 18,125 in 2006 and 21,875 in 2007)

School Index Scores
2006 Score = 18,125/400 = 45.3
2007 Score = 21,875/400 = 54.7
Percent Proficient or Above
2006 = 12.5%
2007 = 12.5%
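The index arithmetic can be sketched in a few lines of Python. The 100/75/50/25/0 point values follow the Massachusetts-style scheme the slides describe; the per-level student counts below are hypothetical, chosen only so that the totals reproduce the slide's 400 students and 18,125 / 21,875 index points (the slide's own per-level counts did not survive transcription).

```python
# Sketch of a Massachusetts-style performance index (illustrative only,
# not the official implementation). Each student earns points according
# to performance level; the index is total points per student.

POINTS = {
    "Proficient": 100,
    "NI high": 75,   # Needs Improvement, upper half
    "NI low": 50,
    "W/F high": 25,  # Warning/Failing, upper half
    "W/F low": 0,
}

def index_score(counts):
    """counts maps performance level -> number of students."""
    n = sum(counts.values())
    return sum(POINTS[level] * k for level, k in counts.items()) / n

def pct_proficient(counts):
    return 100 * counts["Proficient"] / sum(counts.values())

# Hypothetical counts consistent with the slide's totals.
year_2006 = {"Proficient": 50, "NI high": 75, "NI low": 100,
             "W/F high": 100, "W/F low": 75}
year_2007 = {"Proficient": 50, "NI high": 100, "NI low": 125,
             "W/F high": 125, "W/F low": 0}

print(index_score(year_2006), pct_proficient(year_2006))  # 45.3125 12.5
print(index_score(year_2007), pct_proficient(year_2007))  # 54.6875 12.5
```

The index rises from 45.3 to 54.7 while percent proficient stays flat at 12.5%: students moving up within the below-proficient levels earn the school credit that a percent-proficient indicator would ignore.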

Score Inflation
- Defined as "... a gain in scores that substantially overstates the improvement in learning it implies" (Koretz, 2005)
- Research has found that gains in scores in high-stakes accountability systems often fail to generalize to other measures of achievement
- A narrow focus on past tests rather than the broader content standards can cause score inflation
- Emphasis on alignment, and the need to repeat a substantial percentage of items on assessments for year-to-year equating, may contribute to score inflation

Validity of Causal Inferences
- Status approach does not provide a defensible basis for inferring that a higher scoring school is more effective than a lower scoring school
- Making an inference about school quality requires eliminating many alternative explanations of differences in student achievement other than differences in instructional effectiveness, e.g.:
  - Prior achievement differences
  - Differences in support from home

Inferences About Schools
- Growth models rule out the alternative explanation of differences in prior achievement
- Nonetheless, causal inferences about school effectiveness are not justified by the growth approach to test-based accountability
- There are many rival explanations for between-school differences in growth besides differences in school quality or effectiveness
- Results are better thought of as descriptive, generating hypotheses about school quality that need to be evaluated

School Characteristics and Instructional Practice
- School differences in achievement and in growth describe outcomes and can be the source of hypotheses about school effectiveness
- Accountability systems need to be informed by direct information about school characteristics and instructional practices

Conclusions
- Test-based accountability has become a pervasive part of efforts to improve education in the U.S.
- The features of accountability systems matter
- The requirement to include nearly all students in test-based accountability has brought needed attention to groups often ignored in the past

Conclusions (continued)
Performance standards are supposed to define the level of achievement that students should reach, but:
- The definition of proficient achievement varies so widely from state to state that it lacks any semblance of common meaning
- Using percent proficient or above as a primary indicator does not give credit for gains of students at other levels
- Using percent proficient or above to monitor gaps in achievement is not an adequate approach

Conclusions (continued)
- The status-based approach to accountability does not provide a valid way of distinguishing successful schools from schools that are in need of improvement
- Growth models have advantages over status models, but are still best thought of as providing descriptive information rather than a basis for causal inferences about school quality