The Many Challenges of Using Test Scores in Evaluation David E. W. Mott Tests for Higher Standards.

Abstract: The Many Challenges of Using Test Scores in Evaluation

Teacher and school evaluation using test scores is upon us. Probably none of us asked for it, and few of us want it, but it is here! It will not go away, at least not soon. What can division staffs do to encourage reasonable outcomes? The first thing is to start planning. Plan in conjunction with the affected groups. The stakeholders who must be involved are teachers, school staff, and central staff. Only by planning your course of action with these primary players will viable approaches emerge. At the same time, it would be valuable to call in some outside experts. This presentation suggests that one party that should be involved is a local assessment provider. Whatever else is part of your plan, your division will probably need their services. This presentation briefly describes what Tests for Higher Standards is doing in this area, as a prelude to a multi-sided conversation about the current status of the work and possible courses of action.

Ways to Evaluate Test Scores

For Individuals | For Groups (averaged)
Pass/Fail | # Passing, % Passing
Scores | # Correct, % Correct, Scaled Score, Percentile (%ile) (Mean, Median, etc.)
Score Change | Amount of Change, % Change, Scale Score Change, Percentile Change
Possible Change | Amount of Possible Change, % of Possible Change

Any of the above can be corrected for any number of combinations of input variables.

Corrections This is the typical form: Value ± Correction = Adjusted Value. The Correction is an amount determined by a characteristic, or set of characteristics, of the student population. Typical characteristics include gender/ethnic mix, poverty indices, the mix of non-English speakers, school characteristics, etc.

Adjusted Value Meaning Adjusted Values are used to permit fair comparisons:
Student to Student
Class to Class
Teacher to Teacher
School to School
District to District

Regression Equations The adjustments are commonly derived and applied through complex regression equations. They are usually quite specific to the populations for which they were originally derived: Virginia cannot use Tennessee's adjustments. The equations also tend to be unstable over time.
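To make the "Value ± Correction = Adjusted Value" form concrete, here is a minimal sketch of a regression-derived correction. It assumes a single input characteristic and an ordinary least-squares fit; the function name, the poverty index, and the numbers are all made up for illustration and are not taken from any state's model, which would typically use many variables at once.

```python
from statistics import mean


def adjusted_scores(scores, covariate):
    """Adjust scores for one input characteristic (illustrative only).

    Fits score = intercept + slope * covariate by least squares, then
    subtracts the part of each score that the fit attributes to the
    covariate's distance from its average value.
    """
    score_mean = mean(scores)
    cov_mean = mean(covariate)
    spread = sum((c - cov_mean) ** 2 for c in covariate)
    if spread == 0:
        raise ValueError("covariate has no variation; no correction can be fitted")
    slope = sum(
        (c - cov_mean) * (s - score_mean) for c, s in zip(covariate, scores)
    ) / spread
    # Correction = expected score shift caused by the covariate alone.
    corrections = [slope * (c - cov_mean) for c in covariate]
    # Adjusted Value = Value - Correction.
    return [s - k for s, k in zip(scores, corrections)]


# Hypothetical class averages and a hypothetical poverty index per class.
class_means = [70.0, 65.0, 60.0, 55.0]
poverty_index = [0.1, 0.2, 0.3, 0.4]
print(adjusted_scores(class_means, poverty_index))
```

In this made-up data the poverty index explains all of the differences between classes, so every adjusted value lands at the overall mean. Because the slope is fitted to one population, refitting it on another population or another year can give a different correction, which is the specificity and instability described above.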

Two Types of Value-added Models The regression-based adjustment just described is one type of value-added model. It could be called Controlling for Named Input Characteristics. Virginia uses a different approach, which the VDOE calls Student Growth Percentiles (SGP).

Student Growth Percentiles (SGPs) The VDOE's Student Growth Percentile scheme can be seen as a score adjustment that uses one or more previous test scores as the adjusting variable. The presumption is that the student's previous score summarizes all the other input variable adjustments.
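The idea of adjusting by a previous score can be sketched as follows. This is a deliberately simplified stand-in, not the VDOE's estimation method: it assumes one prior score per student and compares each student only with peers who had exactly the same prior score. The function name and the scores are hypothetical.

```python
from collections import defaultdict


def simple_growth_percentiles(prior, current):
    """Percentile rank of each current score among same-prior-score peers.

    Simplification: peers are students with an identical prior score.
    Ties are handled with the mid-rank convention (half of the tied
    scores, including the student's own, count as below).
    """
    peers = defaultdict(list)
    for p, c in zip(prior, current):
        peers[p].append(c)

    percentiles = []
    for p, c in zip(prior, current):
        group = peers[p]
        below = sum(1 for x in group if x < c)
        ties = sum(1 for x in group if x == c)
        percentiles.append(100 * (below + 0.5 * ties) / len(group))
    return percentiles


# Hypothetical scaled scores: four students started at 400, two at 500.
prior = [400, 400, 400, 400, 500, 500]
current = [380, 420, 450, 410, 480, 520]
print(simple_growth_percentiles(prior, current))
# [12.5, 62.5, 87.5, 37.5, 25.0, 75.0]
```

Note that the student who moved from 500 to 480 and the student who moved from 400 to 380 are each ranked only against their own starting group, so no demographic variable enters the calculation directly.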

Ways to Evaluate Test Scores

For Individuals | For Groups (averaged)
Pass/Fail | # Passing, % Passing
Scores | # Correct, % Correct, Scaled Score, Percentile (%ile) (Mean, Median, etc.)
Score Change | Amount of Change, % Change, Scale Score Change, Percentile Change
Possible Change | Amount of Possible Change, % of Possible Change

Any of the above can be corrected for any number of combinations of input variables.

Here we are and here

Why Use SGPs? They are conceptually simpler than regression-based growth models. SGPs require no vertical scaling and no test equating. (%iles are scaleless.) They are relatively simple to compute. SGP methods are not sensitive to the distributions of scores. Scores from year to year are likely to be quite stable.

Problems with the VDOE's SGPs for School Divisions Because the SOLs are administered in certain grades and subjects only, SGPs will not be available for all teachers. The State's method of estimating SGPs is complex and can lead to scores for some students not being used. (These are students who had very high SOL Scaled Scores.) SOLs are given only once per year.

A Problem (?) Associated with ALL Quantile Methods For every School, Teacher, Class, or Student, one half will always be below average. This is the very essence of the technique. There is no Lake Wobegon* effect. For this reason we need to look at the test score in conjunction with the SGP or any other quantile method. * Lake Wobegon, where all the women are strong, all the men are good looking, and all the children are above average.

One More Observation What should we do with students near the top of the distribution of scores? They can't be expected to grow very much; they have no place to go. They can only decline, and some of them will. We can always simply state that students who start at the top will not be averaged. Or we can look at one other possibility: the Amount or % of Possible Change.

The Amount or Percentage of Possible Change What to do with students near the top of the distribution of scores? They can't be expected to grow very much; they have no place to go. They can only decline, and some of them will. We can always simply state that students who start at the top will not be averaged. Or we can look at one other possibility...
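The slides name the Amount and % of Possible Change but do not spell out a formula. One plausible reading, offered here only as an assumption, is that the possible change is the room left between the starting score and the top of the scale, and the percentage is the actual change as a share of that room. The function name and the 600-point ceiling in the example are illustrative.

```python
def possible_change(pre, post, max_score):
    """Return (amount of possible change, % of possible change achieved).

    Assumed definitions (not stated in the slides):
      amount possible = max_score - pre
      % of possible   = 100 * (post - pre) / (max_score - pre)
    A student already at the ceiling has no room to grow, so the
    percentage is reported as None rather than dividing by zero.
    """
    room = max_score - pre
    if room <= 0:
        return room, None
    return room, 100 * (post - pre) / room


# Hypothetical scores on a scale assumed to top out at 600.
print(possible_change(400, 450, 600))  # (200, 25.0)
print(possible_change(580, 590, 600))  # (20, 50.0)
```

Under this reading, a 10-point gain from 580 counts for more than a 50-point gain from 400, which is one way to give students near the top somewhere to go.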

Ways to Evaluate Test Scores

For Individuals | For Groups (averaged)
Pass/Fail | # Passing, % Passing
Scores | # Correct, % Correct, Scaled Score, Percentile (%ile) (Mean, Median, etc.)
Score Change | Amount of Change, % Change, Scale Score Change, Percentile Change
Possible Change | Amount of Possible Change, % of Possible Change

Any of the above can be corrected for any number of combinations of input variables.

Now here

Back out of the clouds...

What Can a School Division Do? TfHS / ROSworks suggests the following: Use the state's method, but simplify it. We have proposed something we call Student Growth Deciles (SGD). Instead of percentile groups, we use ten decile groups. We track only two previous score years. We use actual groups rather than statistical estimations of those groups.

Some Details about our SGD Method
First, administer a pre-test. (We call it the Base Test.)
Score the pre-test and break up the students into ten score groups.
Then merge the two top groups into one and the two bottom groups into one, as test scores are least reliable at the two extremes.
Instruction occurs here.
Next, administer a post-test. (The Growth Test.)
Within each pre-test group, sort all the students into ten (or 8) new SGD groups on the basis of their post-test scores. These are their SGD scores. They represent their growth.

Some Details about our SGD Method (continued) For each analysis unit (School, Teacher, or Class), compute the average of the SGD scores of the students assigned. This number (rounded perhaps to one decimal) is the score for the school, teacher, or class. For demographic analysis units (gender, ethnic, ELL groups, AMOs, etc.), the averages can be computed in the same way. All of the preceding steps should be calculated separately for each grade and subject area.
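The SGD steps above can be sketched in a few functions. This is one reading of the slides, not the TfHS / ROSworks implementation: the function names, the tie handling (tied scores share a group), and the rank-based grouping rule are all assumptions made for illustration. The caller is expected to run it separately for each grade and subject.

```python
from bisect import bisect_left
from collections import defaultdict


def quantile_groups(scores, n_groups=10):
    """Assign each score a group 1..n_groups by rank; tied scores share a group."""
    ordered = sorted(scores)
    n = len(scores)
    # bisect_left counts how many scores are strictly below this one.
    return [bisect_left(ordered, s) * n_groups // n + 1 for s in scores]


def collapse_extremes(decile):
    """Merge deciles 1-2 and 9-10, leaving eight pre-test groups (2..9)."""
    return min(max(decile, 2), 9)


def sgd_scores(pre, post, n_groups=10):
    """Rank students on the post-test within their pre-test group."""
    base = [collapse_extremes(d) for d in quantile_groups(pre)]
    members = defaultdict(list)
    for i, b in enumerate(base):
        members[b].append(i)
    sgd = [0] * len(pre)
    for idx in members.values():
        ranks = quantile_groups([post[i] for i in idx], n_groups)
        for i, r in zip(idx, ranks):
            sgd[i] = r
    return sgd


def unit_average(sgd, units):
    """Average SGD per analysis unit (school, teacher, class, or demographic group)."""
    by_unit = defaultdict(list)
    for s, u in zip(sgd, units):
        by_unit[u].append(s)
    return {u: round(sum(v) / len(v), 1) for u, v in by_unit.items()}


# Hypothetical data: ten students with the same pre-test score, two teachers.
pre = [50] * 10
post = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
teachers = ["A"] * 5 + ["B"] * 5
print(unit_average(sgd_scores(pre, post), teachers))  # {'A': 3.0, 'B': 8.0}
```

With small pre-test groups the rank-based rule cannot fill all ten SGD groups, which is a practical reason to consider the eight-group option or to pool enough students per grade and subject before grouping.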

TfHS / ROSworks SGD Coding Scheme

What Might a Report Look Like? For each analysis unit (School, Teacher, or Class), compute an average of the SGD scores of the students assigned. This number (rounded perhaps to one decimal) is the score for the school, teacher, or class. For demographic analysis units (gender, ethnic, ELL groups, AMOs, etc.), the averages can be computed in the same way. All of the preceding steps should be calculated separately for each grade and subject area.

TfHS / ROSworks SGD Example Report

One Last Vital Part! Involve the entire affected staff in your evaluation project. If you do, they will probably support it, work within it, and make it continually better. If those affected are not involved, a good outcome is in jeopardy. When TfHS / ROSworks proposed the process reported here as part of our company's response to the VDOE's recent RFP, we indicated that we wished to be included in both the planning and implementation processes with any contracted School Division, to be a part of the solution and not a part of the problem.

Contact Information David E. W. Mott, PhD ROSworks LLC -- Reports Online System (ROS) Tests for Higher Standards (TfHS) 5310 Markel Road, Suite 104 Richmond, VA USA toll free fax Discussion Forum & Blog