Interpreting Assessment Results Using Benchmarks
Program Information & Improvement Service, Mohawk Regional Information Center, Madison-Oneida BOCES

Why Use Benchmarks? Benchmarks are useful for comparing the results of individual students or schools to a larger population of students who took the same assessment at the same time.

What are Benchmarks? Each benchmark represents a sample group of students who performed similarly on a given assessment.

Benchmarks, cont. Benchmarks for any given assessment are determined at selected points along the overall performance scale: specifically, at the Low, Average, and High performance levels.

Benchmarks, cont. A “Low” performance is equated with Level 2; “Average” performance with Level 3; “High” performance with Level 4.

Benchmarks, cont. An analysis is done to determine how students who performed at these key points achieved on the standards and key ideas, i.e., Standards (SPI) 1, 2, & 3 for ELA and Key Ideas (KPI) 1, 2, 3, 4, 5, 6, & 7 for Math.

Benchmarks, cont. The best way to determine these points is to select the lowest scale score associated with Low (Level 2), Average (Level 3) and High (Level 4) performance levels.

Benchmarks, cont. Finding enough students who achieve these exact scale scores enables a “benchmark profile” to be constructed. To do this, MORIC uses regional data from all of the students in the 52 school districts it serves.

Benchmarks, cont. In any given administration, enough students achieve the exact scale scores representing each of the benchmarks.

Benchmarks, cont. These students come from among the 6,000 to 6,500 students within the districts served by the Mohawk RIC who take any given assessment at the same time.

Benchmarks, cont. The students from the benchmark groups are anonymously selected and their SPI or KPI scores are analyzed. That is, the scores for all students within a given benchmark group are averaged.

Benchmarks, cont. Because any given assessment is based on items of unequal difficulty, it turns out that students who receive identical scale scores tend to answer the questions in nearly the same way. This is how the “benchmark profiles” for each proficiency level are determined.
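
As a rough illustration of the procedure described on the preceding slides, the sketch below shows how benchmark profiles might be computed from a regional results file. It is a minimal sketch, not MORIC's actual tooling: the file name, the column names (an overall scale_score plus one percent-correct column per ELA standard), and the cut-score values are all hypothetical.

```python
# Illustrative sketch only -- file name, column names, and cut scores are
# hypothetical, not actual SED or MORIC values.
import pandas as pd

# Regional results: one row per student, with an overall scale score and
# a percent-correct column for each standard (SPI) or key idea (KPI).
students = pd.read_csv("regional_ela_results.csv")   # hypothetical file

# Lowest scale score that earns each performance level (the "cut scores").
cut_scores = {2: 616, 3: 650, 4: 720}                # hypothetical values

standard_cols = ["SPI_1", "SPI_2", "SPI_3"]          # ELA standards

benchmark_profiles = {}
for level, cut in cut_scores.items():
    # Benchmark group: students who scored EXACTLY at the cut point.
    group = students[students["scale_score"] == cut]
    # Profile: the group's average performance on each standard.
    benchmark_profiles[level] = group[standard_cols].mean()

profiles = pd.DataFrame(benchmark_profiles)          # one column per level
print(profiles.round(1))
```

Each column of profiles would then be a “benchmark profile”: the typical standard-by-standard performance of students who scored exactly at the Level 2, 3, or 4 cut point.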

Benchmarks, cont. Once it is known how the benchmark groups performed on each learning standard or key idea, there is a relevant context for comparing individual or school scores.

FAQs About Benchmarks

FAQ 1 – Where do benchmark groups come from? Within the 52-district region served by MORIC, there are about 6,000 to 6,400 students who take any given state assessment at grades 4 and 8. Benchmark groups come from this large group of students.

FAQ 2: What exactly is a benchmark? Each year the benchmarking procedure identifies groups of children who score EXACTLY at a scale score cut-point.

FAQ 2, cont. For example, the lowest scale score to earn a Level 4 is designated "Benchmark Level 4", and this represents a group of children who have achieved an advanced level of proficiency. More importantly, these children represent those who scored at the exact "cut off" for Level 4.

FAQ 2, cont. For each assessment, the New York State Education Department establishes the scale score cut-offs (also called “cut scores” or “cut points”).

FAQ 3: How many benchmark groups are there? There are three: the Benchmark Level 2 group (low, not proficient), the Benchmark Level 3 group (average proficiency), and the Benchmark Level 4 group (advanced proficiency).

FAQ 4: Why use benchmarks? Children within a given benchmark group tend to answer items on the assessment in the same way. Thus, comparing a test item result for a particular school against the benchmark group provides a relevant context for interpreting results.

FAQ 4, cont. Since not all test items are of the same difficulty level, this can help to discriminate between test item results within a given sub-skill. It then aids in identifying where instruction could be improved.
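
Continuing the hypothetical sketch above, a school's own standard-level results could be set against the benchmark profiles to provide the kind of context described in this FAQ. Again, the school code and column names are made up for illustration, and profiles refers to the benchmark-profile table built in the earlier sketch.

```python
# Continues the earlier sketch; "students", "standard_cols", and "profiles"
# come from that sketch, and the school code below is hypothetical.
school = students[students["school_id"] == "042101"]

# The school's average percent-correct on each standard (SPI).
school_means = school[standard_cols].mean()

# Gap relative to the Level 3 ("average proficiency") benchmark profile.
# Negative values flag standards where the school's students fall below
# the Level 3 benchmark group -- candidates for a closer instructional look.
gap_vs_level3 = school_means - profiles[3]
print(gap_vs_level3.round(1))
```

Reading results against the benchmark group, rather than against 100 percent correct, keeps the comparison anchored to how similarly scoring students across the region actually performed on items of unequal difficulty.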

FAQ 5: How big are benchmark groups? There are generally enough students within the MORIC region to comprise a given benchmark group. These children's item scores are anonymously pulled together to form the benchmark group.

FAQ 5, cont. We don't know which schools these children come from, but where they come from is not important. What is important is that these students all earned exactly the same overall scale score.

FAQ 6: Why analyze students who got the same scale score? The tests measure overall achievement of learning standards and key ideas. The scale score cut points demarcate distinct levels of performance. Therefore, students who scored at exactly the same scale score achieved the same level of performance. They also tend to answer questions on the test in the same way.

FAQ 7: How reliable are regional benchmarks? MORIC staff work with other statewide educational data analysts through the New York State School Analyst Group (DATAG) to compare MORIC’s benchmarks with those from other regions.

FAQ 7, cont. In five years of state data, there have been no statistically significant differences between MORIC’s benchmarks and those from other regional groups. Because of the way the state assessments are designed, statistically significant differences are not anticipated, either.

FAQ 7, cont. When SED releases the benchmark values for statewide results (usually about one year after the assessments are administered), these have also been found not to be significantly different from those of the MORIC region.

FAQ 7, cont. Therefore, as long as the benchmark groups remain reasonably large (greater than 25 students), basing the benchmark groups upon regional results from all 52 MORIC districts is a defensible procedure.

Data Do’s and Don’ts

Do: Consider these results in the context of other related data (e.g., classroom work and other assessments, such as the Terra Novas, Milestones, or TONYSS).

Do: Use the findings as “conversation starters,” not “conversation enders.” Good analysis of data provides questions to be discussed jointly by administrative and instructional teams.

Do: Make lists of questions generated by the data for the data analyst, staff developers, & the students.

Do: Remember that the tests are a “snapshot” of achievement at a given point in time, not the total view.

Don’t: Make major programmatic decisions on the basis of any one data analysis finding. It is statistically unsound to do so.

Don’t: Read too much into the result of a single test question. Place more trust in the “broader measure” (i.e., the sub-skill results and the SPI/KPI) than in the smaller, narrower measure; it is more statistically sound to rely upon the bigger, broader measure.

Tips for Interpreting Assessment Results

Tip #1: Ask Questions What should be done instructionally? What should or should not be done with the curriculum? Are there non-instructional factors (such as school culture, attendance, etc.) affecting student achievement?

Tip #2: Validate Use multiple measures when making programmatic or instructional changes. The state assessment is one measure of student achievement in a given subject area; draw on other sources of evidence about student performance as well.

Tip #3: Examine The best way to improve overall performance is to examine all of the curriculum content related to a given sub-skill, standard, or key idea.

Tip #4: Focus Focus program improvements around the full breadth of content within that sub-skill area, standard, or key idea.

Tip #5: But, don’t limit! Do not overemphasize any one sub-skill in a single year. State assessments contain questions assessing students’ knowledge of a number of sub-skills. A sub-skill measured one year may not be assessed the following year.

Contact: Maria Fallacaro Educational Data Analyst (315) Click on >”Data Analysis”