The Source of Lake Wobegon By Richard P. Phelps (c)2007-2012, Richard P. Phelps.



“Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.” - Garrison Keillor, A Prairie Home Companion

John J. Cannell, M.D.
Residency in rural West Virginia, 1980s
Surprised by claims that state and school district scored “above average” on national tests
Investigated, found that all 50 states claimed to be “above average”

Cannell’s suspects
Outdated or invalid norms
Lax security
Deliberate educator manipulation
– Showing test items to teachers beforehand
– Keeping test forms around for years
– Misleading reporting, etc.

CRESST’s suspects
Outdated or invalid norms
High stakes that induce “teaching to the test” (i.e., test coaching)
(This hypothesis is now generally accepted as accurate among K-12 education researchers)

“We know that tests that are used for accountability tend to be taught to in ways that produce inflated scores.” - Dan Koretz, CRESST, 1992
“Corruption of indicators is a continuing problem where tests are used for accountability or other high-stakes purposes.” - Robert Linn, CRESST, 2000

Explanations for Spuriously High Achievement Scores
From Responses to Cannell in Educational Measurement: Issues and Practice (1988)
Explanation: number of the six responding authors (A-F) who cite it
Inadequate norms: 4
Outdated norms: 5
Curriculum alignment: 3
High stakes pressure: 2
Teaching the test: 3
Incomplete population tested: 3
Inappropriate comparisons: 2

More left-out-variable bias
Linn (2000) cites higher gains on Title 1 pre-post testing over 9 months than over 12 as evidence of inflation
– Does not consider 3 months of forgetting
CRESST study (1991) in one school district also cited as evidence of inflation
– Does not consider curricular misalignment, motivation, test security, variation in stakes

Examining the high-stakes-cause-score-inflation hypothesis
“Strong” version of hypothesis:
– There are no rival hypotheses
“Weak” version of hypothesis:
– More inflation in grades closer to stakes
– Test coaching increases scores
– Correlation between stakes and inflation

Defining “test-score inflation”
State percentile difference between Cannell’s NRTs (late ’80s) and Math NAEP (’90 or ’92)
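The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `score_inflation`, the sign convention (NRT percentile minus NAEP percentile, so that a positive value means the NRT result looks better), and the example numbers are assumptions, not taken from the slides.

```python
def score_inflation(nrt_percentile, naep_percentile):
    """Percentile-point gap between a state's NRT standing and its NAEP standing.

    Assumption: positive values mean the NRT result sits above the NAEP result.
    """
    for name, value in (("nrt_percentile", nrt_percentile),
                        ("naep_percentile", naep_percentile)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return nrt_percentile - naep_percentile


# Hypothetical state: 62nd percentile on its NRT, 48th on Math NAEP.
example = score_inflation(62, 48)
print(example)  # 14
```

A negative result under this convention would mean the state ranked higher on NAEP than on its own NRT.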

Testing the strong hypothesis 1
Average “score inflation” by whether the state rotated items (yes, no)
Average “score inflation” by level of test security (lax, medium, tight)
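The comparison on this slide amounts to averaging score inflation within groups of states. A minimal sketch, assuming one record per state: the field names, the helper `mean_inflation_by`, and every number below are invented for illustration and are not the slide's results.

```python
from collections import defaultdict
from statistics import mean


def mean_inflation_by(records, key):
    """Average the 'inflation' field of the records, grouped by records[key]."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["inflation"])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical rows, invented for illustration only.
states = [
    {"state": "A", "rotated_items": True, "security": "tight", "inflation": 2.0},
    {"state": "B", "rotated_items": False, "security": "lax", "inflation": 12.0},
    {"state": "C", "rotated_items": False, "security": "medium", "inflation": 8.0},
    {"state": "D", "rotated_items": True, "security": "medium", "inflation": 4.0},
]

by_rotation = mean_inflation_by(states, "rotated_items")
by_security = mean_inflation_by(states, "security")
print(by_rotation)
print(by_security)
```

The same helper serves both breakdowns because only the grouping key changes.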

Testing the strong hypothesis 2
Moreover…
Cannell found score inflation in elementary school tests in dozens of states – none of those tests had high stakes.
Cannell also found score inflation in secondary school tests in dozens of states – only one had high stakes.

Test Security in South Carolina: score-inflated test
Cannell, 1989, p.89: “Unlike their other two tests, teachers are allowed to look at test booklets, teachers may obtain test booklets before the day of testing, booklets are not sealed, and testing is not routinely monitored by state officials. Outside test proctors are not used, test questions have not been rotated every year, and answer sheets have not been scanned for suspicious erasures or analyzed for cluster variance. There are no state regulations that govern test security and test administration for norm-referenced testing done independently in the local school districts.”

Test Security in South Carolina: two high-stakes tests
Cannell, 1989, p.89: “South Carolina also administers a graduation exam and a criterion referenced test, both of which have significant security measures. Teachers are not allowed to look at either of these two test booklets, teachers may not obtain booklets before the day of testing, the graduation test booklets are sealed, testing is routinely monitored by state officials, special education students are generally included in all tests used in South Carolina unless their IEP recommends against testing, outside test proctors administer the graduation exam, and most test questions are rotated every year on the criterion referenced test.”

Tomāto Tomăto
Is the high-stakes-cause-test-score-inflation hypothesis caused by semantic distortion?
Tests are “high-stakes” when:
– teachers feel judged by the results?
– parents receive reports of their child’s test scores?
– test scores are widely reported in the newspapers?

Standards for Educational and Psychological Testing:
“High-stakes test. A test used to provide results that have important, direct consequences for examinees, programs, or institutions involved in the testing.” (p.176)
“Low-stakes test. A test used to provide results that have only minor or indirect consequences for examinees, programs, or institutions involved in the testing.” (p.178)

Shortcomings of Cannell’s studies
Responses to his survey of state test security practices do not always specify which practices apply to which tests in states that administered more than one
He calculated score trends for NRTs and, with one exception, not for standards-based tests

Testing the weak hypothesis 1
Q. Do grade levels closer to a high-stakes event (e.g., high school graduation exam) show greater score increases?
Yes, in “washback” studies of: John Bishop (1997), Linda Winfield (1990), Norm Frederiksen (1994)
No, in Cannell’s data

Q. Why disparate results?
A. Low-stakes comparison tests differed
Washback studies used untraceable, sample-based tests, administered with tight security (TIMSS, NAEP)
Cannell used traceable NRTs administered with lax security

Testing the weak hypothesis 2
Q. Is there direct evidence that test coaching raises test scores?
A. No, see Powers (1993), Becker (1990), Powers & Rock (1994), Camara (2001), etc.

Testing the weak hypothesis 3
Perhaps low-stakes tests are subject to score inflation where a jurisdiction administers a separate high-stakes test, thereby creating a general environment of high-stakes pressure?

Q. High-stakes, score inflation related?
A. Maybe negatively.
Regression table columns: Coef, S.E., t, p
Regression table rows: Intercept, NAEP %-ile score, Item rotation?, Level of security?, High-stakes?
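A regression with the predictors named on this slide can be sketched in plain Python. Everything specific here is an assumption: that score inflation is the dependent variable, the numeric coding (item rotation 0/1, security 0 = lax to 2 = tight, high stakes 0/1), the hypothetical helper `ols`, and all data rows, which are invented. The sketch returns coefficients only, not standard errors, t, or p values, and its output says nothing about the actual results.

```python
def ols(rows, targets):
    """Fit least-squares coefficients by solving the normal equations.

    rows: list of predictor rows, each starting with 1.0 for the intercept.
    targets: list of outcome values, one per row.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = []
    for i in range(k):
        line = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        line.append(sum(r[i] * t for r, t in zip(rows, targets)))
        aug.append(line)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Invented data: (NAEP percentile, item rotation 0/1, security 0=lax..2=tight,
# high stakes 0/1) and an invented inflation value per hypothetical state.
predictors = [
    (35, 0, 0, 1), (42, 0, 1, 0), (50, 1, 2, 1), (55, 1, 1, 0),
    (61, 0, 0, 0), (47, 1, 2, 0), (38, 0, 1, 1), (66, 1, 0, 1),
]
inflation = [14.0, 11.5, 3.0, 5.5, 9.0, 4.0, 10.0, 6.5]

design = [(1.0, *p) for p in predictors]
labels = ["Intercept", "NAEP %-ile score", "Item rotation",
          "Level of security", "High-stakes"]
coefficients = ols(design, inflation)
for label, value in zip(labels, coefficients):
    print(f"{label}: {value:.3f}")
```

A negative coefficient on the high-stakes indicator would correspond to the slide's "maybe negatively"; with real data a statistics package would normally be used so that standard errors and p values come with the fit.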

Pink squares: states with a high-stakes test
Blue diamonds: states without any high-stakes test

Two types of tests resist score inflation:
1. Those untraceable to individual jurisdictions or schools (no incentive to cheat)
2. Those with tight security and ample item rotation (no opportunity to cheat)
Traceable tests lacking security and item rotation are candidates for score inflation

Motive is only present with traceable tests. Means and opportunity exist only in the absence of security measures and item rotation. Artificial test score gains (score inflation) are caused by neglect, incompetence, or deliberate educator manipulation, but always require means and opportunity.
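The motive-and-opportunity rule above can be written as a small decision function. This is a sketch under stated assumptions: the name `inflation_risk` and the three labels are hypothetical, and the "unclear" outcome for tests with only one of the two safeguards is an assumption, since the slides do not address that case.

```python
def inflation_risk(traceable, tight_security, item_rotation):
    """Classify a test by the motive/opportunity rule from the slides."""
    if not traceable:
        return "resistant"   # no motive: results cannot be tied to a jurisdiction
    if tight_security and item_rotation:
        return "resistant"   # no opportunity: both safeguards in place
    if not tight_security and not item_rotation:
        return "candidate"   # motive plus means and opportunity
    return "unclear"         # assumption: partial safeguards are not covered


# Hypothetical examples.
print(inflation_risk(traceable=False, tight_security=False, item_rotation=False))  # resistant
print(inflation_risk(traceable=True, tight_security=False, item_rotation=False))   # candidate
```

Keeping the untraceable check first mirrors the argument's order: without motive, the security questions do not matter.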