Automated Scoring: Smarter Balanced Studies. CCSSO-NCSA, San Diego, CA, June 2015.

Similar presentations
Performance Tasks for English Language Arts

COMMON CORE STATE STANDARDS (CCSS) PARENT WORKSHOP.
Iowa Assessment Update School Administrators of Iowa November 2013 Catherine Welch Iowa Testing Programs.
Common Core Standards and the Edmonds School District November 4, 2013.
Missouri State Assessments: What do families need to know? [INSERT DATE]
Office of Assessment October 22, Smarter ELA/Literacy Smarter Mathematics Smarter Interim Comp Assessments Smarter Digital Library DCAS Science.
CORE California Office to Reform Education Fall Performance Assessment Pilot October-December 2012.
Smarter Balanced Rubrics, Scoring, and Reporting March 24, 2015
Confidential and Proprietary. Copyright © 2010 Educational Testing Service. All rights reserved. Catherine Trapani Educational Testing Service ECOLT: October.
TOM TORLAKSON State Superintendent of Public Instruction CALIFORNIA DEPARTMENT OF EDUCATION Tom Torlakson, State Superintendent of Public Instruction Senior.
Principal Meeting: Accountability and Assessment Update Brian Huff March 18, 2015.
CALIFORNIA DEPARTMENT OF EDUCATION Tom Torlakson, State Superintendent of Public Instruction Gina Koency California Department of Education (CDE) Senior.
November 20, 2014 West Virginia Department of Education
California Assessment of Student Performance and Progress (CAASPP)
SBAC: UPDATE, RESOURCES, BEST PRACTICES OCTOBER 6, 2014.
Palo Alto Unified School District SMARTER BALANCED ASSESSMENT WORKSHOP Paly – SSC November 4, 2013 Diana Wilmot, Ph.D. Director of Research, Evaluation.
ELA Common Core State Standards Job Alike #8 Assessment.
SMARTER BALANCED QUESTION TYPES OVERVIEW TEXT TXT EXT Assess a broad range of content. Scoring is objective, fast, and inexpensive to score. Difficult.
The Five New Multi-State Assessment Systems Under Development April 1, 2012 These illustrations have been approved by the leadership of each Consortium.
Get Smarter!. OAKS Transition Calibrate Smarter Balanced to OAKS Option 1 Set achievement level on Smarter Balanced that represents equivalent rigor.
Background Information Next Steps.
PARCC Update June 6, PARCC Update Today’s Presentation:  PARCC Field Test  Lessons Learned from the Field Test  PARCC Resources 2.
Topics Digital Library updates
Interim Assessments Overview of Spring 2015 Assessments Training Module.
Overview of English language arts (ELA) assessment Vaughn G. Rhudy, Ed.D., NBCT Assessment Coordinator Stacey Murrell, Ed.D. Interim.
FAN MEETING NOVEMBER 27. CORE Background 8 Districts: Long Beach Los Angeles Fresno Sanger Clovis Sacramento City San Francisco Oakland.
M-STEP A presentation for Macomb County Math Teachers.
SMARTER BALANCED Wisconsin Grades Smarter Balanced supports Wisconsin’s vision of encouraging all students to aim high while giving all educators.
The use of asynchronously scored items in adaptive test sessions. Marty McCall Smarter Balanced Assessment Consortium CCSSO NCSA San Diego CA.
Michigan State Assessments: What Do Families Need to Know?
Arizona English Language Learner Assessment AZELLA
Based on Common Core.  South Carolina will discontinue PASS testing in the year This will be a bridge year for common core and state standards.
The Four P’s of an Effective Writing Tool: Personalized Practice with Proven Progress April 30, 2014.
Oxford Preparatory Academy Scholar Academy Parent Social Topic: Changes in State Testing May 4, 5, and 6, 2015.
ASSOCIATION OF WASHINGTON MIDDLE LEVEL PRINCIPALS WINTER MEETING -- JANUARY 24, 2015 Leveraging the SBAC System to Support Effective Assessment Practices.
Smarter Balanced Interim Assessment System. Session Overview What are the interim assessments? How to access? How to score? Using the THSS and the scoring.
Summer Scoring Training Smarter Balanced Mathematics Deborah J. Bryant September 18, 2015.
Summary of Assessments By the Big Island Team: (Sherry, Alan, John, Bess) CCSS SBAC PARCC AP CCSSO.
What is a Rubric? A rubric is a set of scoring criteria for a performance task. A rubric also serves as a blueprint for the student to use in constructing.
Understanding the 2015 Smarter Balanced Assessment Results Assessment Services.
Getting Ready for Smarter Balanced Jan Martin Assessment Director, SD DOE Feb. 7, 2014.
Performance Task Overview Introduction This training module answers the following questions: –What is a performance task? –What is a Classroom Activity?
Welcome to the Interim Assessment Training for Teachers Office of Assessment March 3, 2015 While you wait for the webinar to begin, please be sure to check.
29 States $176,000,000 for development Includes formative, interim & summative Governed and controlled by states Co-chairs, Judy Park, Utah; Tony Alpert,
Interim Assessments OFFICE OF SUPERINTENDENT OF PUBLIC INSTRUCTION 1.
Math Performance Tasks: Scoring & Feedback Smarter Balanced Professional Development for Washington High-need Schools University of Washington Tacoma Maria.
Smarter Balanced 103: Item Types and Instructional Implementation Rachel Aazzerah Science & Social Science Assessment Specialist
CALIFORNIA DEPARTMENT OF EDUCATION Tom Torlakson, State Superintendent of Public Instruction Riverside COE November 21, 2014 Gina Koency Senior Assessment.
SMARTER BALANCED ASSESSMENT PARA LOS NIÑOS APRIL 30, 2013 Transitioning to the Common Core.
April 2011 Division of Academics, Performance and Support Office of Assessment High School Testing – Scoring Best Practices May 24, 2011 CFN 201.
Tips to Scoring the ELA Smarter Balanced Interim Assessments
Hands-on Automated Scoring
Overview of Assessments
Utilizing the ELA Results
Performance Task Overview
Overview of Spring 2015 Assessments
Shasta County Curriculum Leads November 14, 2014 Mary Tribbey Senior Assessment Fellow Interim Assessments Welcome and thank you for your interest.
Deputy Commissioner Jeff Wulfson Associate Commissioner Michol Stapel
California Assessment of Student Performance and Progress (CAASPP)
Presentation transcript:

Automated Scoring: Smarter Balanced Studies. CCSSO-NCSA, San Diego, CA, June 2015

Smarter Pilot and Field Test Studies
Moved the field forward
– Big data sets
– Many methods, spectacular researchers
Immediate, practical results
– Field test improved on pilot findings
– We learned a lot about advantages and limitations

The Field Test Items in Study
683 English language arts (ELA)/literacy short-text, constructed-response items
– Reading short text - CAT
– Writing brief writes - CAT
– Research performance task (PT) questions
238 mathematics short-text, constructed-response items
– Includes 40 mathematical reasoning items
66 ELA/literacy essay items

Criteria
– Quadratic weighted kappa between engine score and human score less than 0.70
– Pearson correlation between engine score and human score less than 0.70
– Standardized difference between engine score and human score greater than 0.12 in absolute value
– Degradation in quadratic weighted kappa or correlation from human-human to engine-human greater than or equal to 0
– Standardized difference between engine score and human score for a subgroup greater than 0.10 in absolute value
– Notable reduction in perfect agreement rates from human-human to engine-human equal to or greater than 0.05
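These thresholds can be checked mechanically once human and engine scores are on hand. Below is a minimal sketch of how several of the criteria listed above might be computed for a single item; the thresholds come from the slide, while the function name, library choices, and the optional second-human-rating argument are assumptions for illustration, not the consortium's implementation.

```python
# Minimal sketch of the agreement checks named on the slide; thresholds are
# taken from the slide, everything else (names, libraries) is assumed.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

def flag_item(human, engine, human2=None):
    """Return the list of criteria an item fails; an empty list means it passes."""
    human, engine = np.asarray(human), np.asarray(engine)
    flags = []

    qwk = cohen_kappa_score(human, engine, weights="quadratic")
    if qwk < 0.70:
        flags.append(f"engine-human quadratic weighted kappa {qwk:.2f} < 0.70")

    r, _ = pearsonr(human, engine)
    if r < 0.70:
        flags.append(f"engine-human Pearson correlation {r:.2f} < 0.70")

    # Standardized difference: mean engine-human score gap in pooled-SD units.
    pooled_sd = np.sqrt((human.var(ddof=1) + engine.var(ddof=1)) / 2)
    std_diff = (engine.mean() - human.mean()) / pooled_sd
    if abs(std_diff) > 0.12:
        flags.append(f"standardized difference {std_diff:+.2f} exceeds 0.12 in absolute value")

    # Degradation checks compare against a human-human baseline, so they need
    # a second human rating on the same responses.
    if human2 is not None:
        human2 = np.asarray(human2)
        hh_qwk = cohen_kappa_score(human, human2, weights="quadratic")
        if hh_qwk - qwk >= 0:
            flags.append(f"kappa degraded from human-human {hh_qwk:.2f} to engine-human {qwk:.2f}")
        hh_exact = float(np.mean(human == human2))
        eh_exact = float(np.mean(human == engine))
        if hh_exact - eh_exact >= 0.05:
            flags.append("perfect agreement dropped by 0.05 or more versus human-human")

    return flags
```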

Read-Behind Studies
Costs limit the number of responses that get a second human read. Can using scoring engines as a second rater improve scoring?
Result: Scoring scenarios where an automated scoring system serves as a second rater ("read-behind") behind a human rater produce high-quality scores. Machine-human (M-H) and human-human (H-H) agreement results are similar.
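One way to picture the read-behind scenario is as a score-of-record rule: the human rating stands unless the engine's read-behind disagrees by more than an allowed gap, in which case the response goes to adjudication. The sketch below is illustrative only; the two-point tolerance and the function name are assumptions, not the study's actual resolution rules.

```python
# Illustrative read-behind resolution rule (assumed policy, not the study's):
# the human score is the score of record unless the engine's read-behind
# disagrees by at least `adjudication_gap` points.
def resolve_read_behind(human_score: int, engine_score: int, adjudication_gap: int = 2):
    """Return (score_of_record, needs_human_adjudication)."""
    if abs(human_score - engine_score) >= adjudication_gap:
        return None, True       # route to a second human rater for adjudication
    return human_score, False   # human score stands; engine read-behind agrees closely

# Example: a 3 from the human and a 1 from the engine triggers adjudication.
print(resolve_read_behind(3, 1))   # (None, True)
print(resolve_read_behind(3, 3))   # (3, False)
```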

Targeting Responses for Human Review
Can scoring engines detect the responses most likely to be rated differently by humans and machines, so those responses can be routed to second raters?
Result: Using scoring engines to identify candidates for a second human read yielded major reliability improvements over random assignment of responses.
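As a sketch of what "targeting" could look like in practice: assuming the engine reports a per-response confidence value (the field name and the 10% budget here are invented for illustration, not the study's design), the lowest-confidence responses are routed to a second human read instead of a random sample of the same size.

```python
# Illustrative routing of a fixed second-read budget; 'engine_confidence'
# and the 10% budget are assumptions, not the study's design.
import random

def select_for_second_read(responses, budget_fraction=0.10, targeted=True):
    """responses: list of dicts with 'id' and 'engine_confidence' keys."""
    k = max(1, int(len(responses) * budget_fraction))
    if targeted:
        # Route the responses the engine is least sure about.
        ranked = sorted(responses, key=lambda r: r["engine_confidence"])
        return [r["id"] for r in ranked[:k]]
    # Baseline for comparison: a random sample of the same size.
    return [r["id"] for r in random.sample(responses, k)]
```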

Item Characteristics that Correlate with Agreement for Human and Automated Scoring - ELA
Reading short-text items
– Item-specific rubrics yield higher reliability than generic rubrics
– There was higher agreement when the text was fictional
Essays
– Generic rubrics are associated with higher reliability for the conventions trait
– For the other traits, prompt-specific engine training is preferred
Brief writes
– Significantly higher agreement for narrative stimuli
All trends above held for both human and machine scoring.

Item Characteristics that Correlate with Agreement for Human and Automated Scoring - Mathematics
Using an automated scoring system as a read-behind improves score quality, provided non-exact adjudication is used.
In mathematics, hand-scoring agreement was statistically significantly higher than the best engine scores.
– Mathematics responses could be expressed in a large number of ways.
– Student responses tended to be short.

Moving forward
Summative tests
– Use engines as a second rater
– Target second human reads
– Smarter Balanced rules allow vendors to use scoring engines, but none are currently doing so
Interim assessments
– Provide engines to teachers to score specific tasks
Classroom assessment
– Provide engines to teachers to allow assignment of more writing tasks

Policy Issues
Resistance to AI use
– The Chinese Room
– Threat to training, understanding
Inflated expectations lead to disappointment
– Doesn't always work
– Requires planning and coordination
– Is not cheap

Moving forward
Platform integration
– Current engines use batch or stand-alone processing
– Need trained engine apps that work with online delivery engines in real time
Item development
– Studies gave better information about what kinds of items are likely to succeed
– It is desirable to have scoring engine experts involved in task development
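To make the platform-integration point concrete, here is a minimal sketch of the kind of real-time interface the slide calls for: a trained engine wrapped in a small web service that the delivery platform can call per response, rather than exchanging batch files. The endpoint, payload fields, and the stub engine are all hypothetical; no vendor's actual API is implied.

```python
# Hypothetical real-time scoring endpoint (Flask); the stub engine stands in
# for a trained model and is not a real scoring method.
from dataclasses import dataclass
from flask import Flask, request, jsonify

@dataclass
class ScoreResult:
    score: int
    confidence: float

class StubEngine:
    """Placeholder for a trained engine; scores by response length only."""
    def score(self, text: str) -> ScoreResult:
        return ScoreResult(score=min(len(text.split()) // 50, 4), confidence=0.5)

app = Flask(__name__)
engine = StubEngine()   # a real deployment would load a trained model here

@app.post("/score")     # platform POSTs {"item_id": ..., "response_text": ...}
def score_response():
    payload = request.get_json()
    result = engine.score(payload["response_text"])
    return jsonify({"item_id": payload["item_id"],
                    "score": result.score,
                    "confidence": result.confidence})

if __name__ == "__main__":
    app.run(port=8080)
```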

Want Details?
The Field Test Scoring Study and appendices have been posted to SmarterApp: omatedScoringResearchStudies.html
An updated version of the pilot study is on the Smarter website: res/pilot-test-automated-scoring-research-studies/

Thank you for your attention