Unpublished Work © 2005 by Educational Testing Service Growth Options for California County and District Evaluators’ Meetings May 10 and 19, 2005.



Slide 2: Californians Want to Measure Student Growth
- CST scales are separate by grade
- Each grade has its own Basic (300) and Proficient (350) standards
- Connections do not presently exist between grades

Slide 3: “Measuring growth” can mean different things to different users
- “Vertical scaling”:
  - A catch-all phrase used by a variety of people to represent growth measures
  - A technical term for one particular statistical procedure
- Vertical scaling may or may not be the most useful and cost-effective growth measure for CA
- Today we will explain options for measuring growth and get your input

Slide 4: Progress Toward Determining the Best Growth Measure(s) for CA
- Exploratory study of vertical scaling of CSTs
- Technical Advisory Group
- Interviews of CA school district staff about what growth measures would be useful
- Growth Options Task Force
- Evaluators’ meetings
- Growth Options Task Force follow-up

Slide 5: Vertical Scaling (Technical Definition)
- Connect the scales across grades by having students take “linking” items from adjacent grade tests
- These links place the items (and scores) across grades on a common scale
- Scale scores might range from 200 (grade 2) up to 800 (grade 11)
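The common-item linking step on this slide can be illustrated with a simple mean/sigma (linear) linking sketch. This is a simplification for intuition only: operational CST linking would use IRT methods, and the function name and score arrays below are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_link(lower_grade, upper_grade):
    """Linear (mean/sigma) linking: estimate slope A and intercept B
    that map lower-grade scale units onto the upper-grade scale,
    using the two groups' scores on the shared linking items.
    Returns (A, B) such that upper ~= A * lower + B."""
    A = pstdev(upper_grade) / pstdev(lower_grade)
    B = mean(upper_grade) - A * mean(lower_grade)
    return A, B

# Hypothetical linking-item scores for two adjacent grades
g3_scores = [12, 15, 18, 20, 25]
g4_scores = [16, 19, 24, 27, 34]
A, B = mean_sigma_link(g3_scores, g4_scores)
```

Chaining such links pair by pair (2 to 3, 3 to 4, and so on) is what places all grades on one common scale.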

Slide 6: Vertical Scaling
- Ideal goals:
  - Scale scores increase by grade
  - Scale scores can be compared across grades
  - A 500 “means the same thing” whether it comes from a grade 4 test or a grade 5 test
  - “Growth” of 10 units “means the same thing” in low grades as in high grades
- The ideal is approximated in real life but never exactly met
- The vast majority of vertical scales have been developed for published norm-referenced tests
- Few vertical scales exist for state standards-referenced tests

Slide 7: Exploratory Vertical Scaling Study for California
- ELA grades 2-11; Math grades 2-7
- Linking embedded in 2004 operational CST testing
  - No incremental testing or cost to the state
- Linking items:
  - Measured standards that were common across adjacent grades
  - Placed in “field test buckets”

Slide 8: Design
- N = 3,000 to 5,000 per linking item
- ELA: linking items per grade pair
- Math: linking items per grade pair
- Grade 2 students took some grade 3 items, grade 3 students took some grade 2 items, etc.
- Scales linked sequentially: 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < 11

Slide 9: Evaluation of Links
- Evidence that supports the validity of vertical scaling is the growth of student scores:
  - Better performance of higher-grade students than lower-grade students on common items
  - Scale score distributions that increase as grade increases

Slide 10: Findings
- Higher-grade students consistently did better than lower-grade students on common items that came from the higher-grade operational test
- Higher-grade students did not necessarily do better than lower-grade students when common items came from the lower-grade operational test
- Position effects were evident: items became more difficult when they appeared later in a test

Slide 11: Findings (cont.)
- Scale scores generally increased by grade, except:
  - ELA: grades 9, 10, and 11 showed minimal growth
  - Math: grades 6 and 7 showed essentially no growth

Slide 12: Conclusions of Exploratory Study
- Concerns:
  - ELA: minimal growth in grades 9, 10, and 11
  - Math: minimal growth in grades 6 and 7
- Possible factors affecting vertical links:
  - Item position effects
  - Grade x curriculum interactions
  - Changes in populations
- Not clear whether vertical scaling will work for CSTs at all grades

Slide 13: Phone Interviews
- March/April: respondents from CA counties and districts
- Asked 5 questions

Slide 14: Are you currently using STAR data to make any longitudinal comparisons, and if so, what are you doing with that data?
- Used the NRT or the CST
- Aware of the inappropriateness of using current CST scale scores for growth

Slide 15: Who are the most important potential users in your district of longitudinal information?
- Full range: teachers to superintendents
- Parents
- School boards
- Administrators: instructional planning
- Teachers: expected student performance

Slide 16: If we were able to improve the psychometric underpinnings for making comparisons across grades using CSTs, would that benefit your district? How would you plan to use that information?
- Overwhelming enthusiasm for a legitimate method of making longitudinal comparisons
- Should provide a legitimate procedure so users don’t “hurt themselves”
- Concern about over-burdening the CSTs by adding one more purpose

Slide 17: Longitudinal comparisons do have their limitations and can be misinterpreted, so we’d like your input on what interpretive materials would be most useful to you.
- Current post-test workshops and guides should cover this
- Few saw a need for special efforts
- The largest districts have resources to address this
- Teacher-specific interpretive materials would be helpful

Slide 18: One of the options we are considering is a vertical scale. If we used a vertical scale, there would be some changes, and we would need an in-grade scale that differed from an “across-grade” scale. Would that be a problem in your district?
- Two diametrically opposed opinions:
  - The acquired meaning of 300 and 350 is too important to do away with
  - The meaning of 300 and 350 could be easily supplanted
- Use of both in-grade and across-grade scales seen as complicated and potentially confusing

Slide 19: Growth Options Task Force
- Tom Barrett, Riverside USD
- Paula Carroll, Lodi USD
- J.T. Lawrence, San Diego COE
- Phil Morse, LAUSD
- Jim Parker, Paramount USD
- Jim Stack, SFUSD
- Mary Tribbey, Butte COE
- Mao Vang, Sacramento City USD

Slide 20: Major Options for Tracking Growth
- Vertical Scales
- Norms
- Tables of Expected Growth

Slide 21: Vertical Scales: Advantages
- Scale scores comparable across grades
- Useful for tracking students across many grades
- Suitable for statistical analyses

Slide 22: Vertical Scales: Disadvantages
- Assumption of hierarchical growth may not be met; scores may not grow between grades
- Across-grade scale differs from within-grade scale
- Can highlight inconsistencies (if they exist) in within-grade standards
- Scale scores have no intrinsic meaning
- Caution needed when comparing growth in different parts of the scale
- Special data collection needed

Slide 23: Norms
- CA percentiles, NCEs, or z-scores
- By grade, by content area
- “Typical” growth defined as what is seen cross-sectionally in the state from grade to grade
- Types:
  - Static (using a base year such as 2003)
  - Rolling (using the current year)

Slide 24: Norms: Advantages
- Fairly easy to understand
- Allow comparisons of relative standing and growth relative to the norm group
- Minimal assumptions required
- Comparisons can be made across content areas
- No special data collection needed

Slide 25: Norms: Disadvantages
- Need to keep clear the relative nature of the comparison (static vs. rolling norm)
- No continuous growth scale
- Growth expectations are based on cross-sectional, not longitudinal, data
- “Typical” growth does not necessarily mean a student is progressing sufficiently toward Proficiency

Slide 26: Tables of Expected Growth
- Use longitudinal CA data (e.g., grade 3 and grade 4 performance for the same students)
- Determine the statistical expectation of grade 4 scores typically seen for students with each possible grade 3 score
- Calculate a standard error along with the expectation
- Standardized deviations from expectations can be compared across grades and content areas
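The table-building step on this slide can be sketched as a conditional mean and spread per prior-grade score. A real implementation would smooth expectations across score points and model the standard error more carefully; all names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def expected_growth_table(matched_pairs):
    """matched_pairs: (grade3_score, grade4_score) for the same students.
    Returns {grade3_score: (expected_grade4, se)}, where the expectation
    is the mean grade-4 score among students at that grade-3 score and
    se is the spread of those grade-4 scores."""
    by_score = defaultdict(list)
    for g3, g4 in matched_pairs:
        by_score[g3].append(g4)
    return {g3: (mean(g4s), pstdev(g4s))
            for g3, g4s in by_score.items()}

def standardized_growth(observed, expected, se):
    """(Observed score - Expected score) / SE: a deviation that can be
    compared across grades and content areas."""
    return (observed - expected) / se

# Hypothetical matched longitudinal records
pairs = [(300, 310), (300, 330), (320, 335), (320, 345)]
table = expected_growth_table(pairs)
exp_g4, se = table[300]                   # expectation and SE for grade-3 score 300
z = standardized_growth(330, exp_g4, se)  # above expectation if z > 0
```

Because the table is built from actual matched records, it would need recomputing each year and would exclude students who cannot be matched, which is exactly the disadvantage noted on the next slide.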

Slide 27: Tables of Expected Growth: Advantages
- Fairly easy to understand
- Allow comparisons of growth relative to the norm group
- Minimal assumptions required; could be done for high school courses
- Comparisons can be made across content areas
- Based on actual student growth

Slide 28: Tables of Expected Growth: Disadvantages
- Tables of expectations may need to be recalculated each year
- No continuous growth scale
- “Typical” growth does not necessarily mean a student is progressing sufficiently toward Proficiency
- Matching student data over years is required
- Expectations would not include students who have been in CA less than 1 year or who cannot be tracked

Slide 29: Growth Options Task Force
- Discussed options in detail for a day
- Norms may be most easily understood
- Growth expectations may be most useful for administrators and program evaluation
- Classification may be useful: growth is average / above average / below average
- Standardized growth measures that could be pooled over grades could be useful: (Observed score - Expected score) / SE
- Will work with CDE and ETS to pilot test some options