The Source of Lake Wobegon By Richard P. Phelps (c)2007-2012, Richard P. Phelps.


1 The Source of Lake Wobegon By Richard P. Phelps (c)2007-2012, Richard P. Phelps

2 “Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.” - Garrison Keillor, A Prairie Home Companion

3 John J. Cannell, M.D.
• Residency in rural West Virginia, 1980s
• Surprised by claims that the state and school district scored “above average” on national tests
• Investigated, and found that all 50 states claimed to be “above average”

4 Cannell’s suspects
• Outdated or invalid norms
• Lax security
• Deliberate educator manipulation
  – Showing test items to teachers beforehand
  – Keeping test forms around for years
  – Misleading reporting, etc.

5 CRESST’s suspects
• Outdated or invalid norms
• High stakes that induce “teaching to the test” (i.e., test coaching)
(This hypothesis is now generally accepted as accurate among K-12 education researchers.)

6 “We know that tests that are used for accountability tend to be taught to in ways that produce inflated scores.” - Dan Koretz, CRESST, 1992
“Corruption of indicators is a continuing problem where tests are used for accountability or other high-stakes purposes.” - Robert Linn, CRESST, 2000

7 Explanations for Spuriously High Achievement Scores
From responses to Cannell in Educational Measurement: Issues and Practice (1988). Respondents are authors A–F; each X marks one author offering that explanation.
Inadequate norms: XXXX (4 of 6)
Outdated norms: XXXXX (5 of 6)
Curriculum alignment: XXX (3 of 6)
High stakes pressure: XX (2 of 6)
Teaching the test: XXX (3 of 6)
Incomplete population tested: XXX (3 of 6)
Inappropriate comparisons: XX (2 of 6)

8 More left-out-variable bias
• Linn (2000) cites higher gains on Title 1 pre-post testing over 9 months than over 12 months as evidence of inflation
  – Does not consider 3 months of summer forgetting
• A CRESST study (1991) in one school district is also cited as evidence of inflation
  – Does not consider curricular misalignment, motivation, test security, or variation in stakes
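The forgetting objection on slide 8 reduces to simple arithmetic. A minimal sketch, assuming a hypothetical student who gains a fixed number of points during the nine-month school year and loses some over the three summer months; the function name and the numbers are illustrative only, not taken from Linn or Cannell.

```python
def pre_post_gains(school_year_gain: float, summer_loss: float) -> tuple[float, float]:
    """Hypothetical model: return (9-month gain, 12-month gain).

    A 9-month (fall-to-spring) pre-post window covers only the school year.
    A 12-month (e.g., spring-to-spring) window also spans the summer,
    so any summer forgetting is subtracted from the measured gain.
    """
    nine_month_gain = school_year_gain
    twelve_month_gain = school_year_gain - summer_loss
    return nine_month_gain, twelve_month_gain


# Illustrative numbers only: 10 points gained in school, 3 forgotten over summer.
nine, twelve = pre_post_gains(school_year_gain=10.0, summer_loss=3.0)
print(f"9-month gain: {nine}, 12-month gain: {twelve}")
```

Under this simple model the 9-month gain exceeds the 12-month gain whenever summer_loss is positive, even though no score inflation is present.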

9 Examining the high-stakes-cause-score-inflation hypothesis
• “Strong” version of the hypothesis:
  – There are no rival hypotheses
• “Weak” version of the hypothesis:
  – More inflation in grades closer to stakes
  – Test coaching increases scores
  – Correlation between stakes and inflation

10 Defining “test-score inflation”
State percentile difference between Cannell’s NRTs (late ’80s) and Math NAEP (’90 or ’92)
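A minimal sketch of the slide 10 definition, assuming each state's NRT result and its Math NAEP result are both expressed as percentiles on a comparable scale. The state labels and values below are hypothetical placeholders, not Cannell's or NAEP data.

```python
def score_inflation(nrt_percentile: float, naep_percentile: float) -> float:
    """Test-score inflation as defined on the slide: the state's percentile
    on the norm-referenced test minus its percentile on Math NAEP."""
    return nrt_percentile - naep_percentile


# Hypothetical states: (NRT percentile, NAEP percentile).
states = {
    "State A": (62.0, 50.0),
    "State B": (55.0, 48.0),
}

for name, (nrt, naep) in states.items():
    print(f"{name}: inflation = {score_inflation(nrt, naep):.1f} percentile points")
```

A positive value means the state looks better on its own NRT than on NAEP.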

11 Testing the strong hypothesis 1
State rotated items?          yes     no
Average “score inflation”     9.3     10.0

Level of test security        lax     medium     tight
Average “score inflation”     10.6    9.7        8.9
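The averages on slide 11 are group means of the per-state inflation measure. A minimal sketch of that computation, assuming a hypothetical helper name and invented (security level, inflation) records that stand in for the real state data:

```python
from collections import defaultdict
from statistics import mean


def average_inflation_by_group(records: list[tuple[str, float]]) -> dict[str, float]:
    """Average the score-inflation values of states sharing a category label
    (e.g., a test-security level of "lax", "medium", or "tight")."""
    groups: dict[str, list[float]] = defaultdict(list)
    for label, inflation in records:
        groups[label].append(inflation)
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical (security level, inflation) pairs for five states.
records = [
    ("lax", 11.0), ("lax", 10.0),
    ("medium", 9.5),
    ("tight", 9.0), ("tight", 8.0),
]
print(average_inflation_by_group(records))
```

The same grouping with labels "yes" and "no" produces the item-rotation comparison.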

12 Testing the strong hypothesis 2
Moreover…
• Cannell found score inflation in elementary school tests in dozens of states; none of those tests had high stakes.
• Cannell also found score inflation in secondary school tests in dozens of states; only one had high stakes.

13 Test Security in South Carolina: score-inflated test Cannell, 1989, p.89: “Unlike their other two tests, teachers are allowed to look at test booklets, teachers may obtain test booklets before the day of testing, booklets are not sealed, and testing is not routinely monitored by state officials. Outside test proctors are not used, test questions have not been rotated every year, and answer sheets have not been scanned for suspicious erasures or analyzed for cluster variance. There are no state regulations that govern test security and test administration for norm-referenced testing done independently in the local school districts.”

14 Test Security in South Carolina: two high-stakes tests Cannell, 1989, p.89: “South Carolina also administers a graduation exam and a criterion referenced test, both of which have significant security measures. Teachers are not allowed to look at either of these two test booklets, teachers may not obtain booklets before the day of testing, the graduation test booklets are sealed, testing is routinely monitored by state officials, special education students are generally included in all tests used in South Carolina unless their IEP recommends against testing, outside test proctors administer the graduation exam, and most test questions are rotated every year on the criterion referenced test.”

15 Tomāto, Tomăto
Is the high-stakes-cause-test-score-inflation hypothesis caused by semantic distortion?
Tests are “high-stakes” when:
• teachers feel judged by the results?
• parents receive reports of their child’s test scores?
• test scores are widely reported in the newspapers?

16 Standards for Educational and Psychological Testing:
“High-stakes test. A test used to provide results that have important, direct consequences for examinees, programs, or institutions involved in the testing.” (p.176)
“Low-stakes test. A test used to provide results that have only minor or indirect consequences for examinees, programs, or institutions involved in the testing.” (p.178)

17 Shortcomings of Cannell’s studies
• Responses to his survey of state test-security practices do not always specify which practices apply to which tests in states that administered more than one
• He calculated score trends for NRTs and, with one exception, not for standards-based tests

18 Testing the weak hypothesis 1
Q. Do grade levels closer to a high-stakes event (e.g., a high school graduation exam) show greater score increases?
A. Yes, in “washback” studies by John Bishop (1997), Linda Winfield (1990), and Norman Frederiksen (1994). No, in Cannell’s data.

19 Q. Why the disparate results?
A. The low-stakes comparison tests differed.
• Washback studies used untraceable, sample-based tests administered with tight security (TIMSS, NAEP)
• Cannell used traceable NRTs administered with lax security

20 Testing the weak hypothesis 2
Q. Is there direct evidence that test coaching raises test scores?
A. No; see Powers (1993), Becker (1990), Powers & Rock (1994), Camara (2001), etc.

21 Testing the weak hypothesis 3
Perhaps low-stakes tests are subject to score inflation where a jurisdiction administers a separate high-stakes test, thereby creating a general environment of high-stakes pressure?

22 Q. Are high stakes and score inflation related? A. Maybe negatively.
Predictor              Coef.    S.E.     t        p
Intercept              45.70    10.20    4.48     0.0004
NAEP %-ile score       -0.55    0.15     -3.72    0.0020
Item rotation?         0.57     2.94     0.19     0.8501
Level of security?     0.85     1.66     0.52     0.6141
High-stakes?           -6.47    3.51     -1.84    0.0853
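Each t value in the slide 22 table is the coefficient divided by its standard error. A minimal sketch that recomputes those ratios from the displayed values; the dictionary layout and function name are illustrative assumptions, not part of the original analysis.

```python
# Coefficients and standard errors as displayed in the slide 22 table.
table = {
    "Intercept": (45.70, 10.20),
    "NAEP %-ile score": (-0.55, 0.15),
    "Item rotation?": (0.57, 2.94),
    "Level of security?": (0.85, 1.66),
    "High-stakes?": (-6.47, 3.51),
}


def t_ratio(coef: float, se: float) -> float:
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coef / se


for name, (coef, se) in table.items():
    print(f"{name:20s} t = {t_ratio(coef, se):6.2f}")
```

The recomputed values match the table to about two decimals (for example, -6.47 / 3.51 ≈ -1.84). The small gaps for the NAEP and security rows are presumably due to rounding of the displayed coefficients and standard errors. The high-stakes coefficient is negative with p ≈ 0.085: below 0.10 but not below 0.05, consistent with the slide's hedged "maybe negatively."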

23 [Figure] Pink squares: states with a high-stakes test; blue diamonds: states without any high-stakes test

24 Two types of tests resist score inflation:
1. Those untraceable to individual jurisdictions or schools (no incentive to cheat)
2. Those with tight security and ample item rotation (no opportunity to cheat)
Traceable tests lacking security and item rotation are candidates for score inflation.

25 Motive is present only with traceable tests. Means and opportunity exist only in the absence of security measures and item rotation. Artificial test-score gains (score inflation) are caused by neglect, incompetence, or deliberate educator manipulation, but always require means and opportunity.
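The motive-and-opportunity argument of slides 24 and 25 can be written as a small decision rule. A minimal sketch, assuming hypothetical names (AssessmentProgram, inflation_candidate) and following slide 24's wording that a test resists inflation either when it is untraceable or when it has both tight security and ample item rotation:

```python
from dataclasses import dataclass


@dataclass
class AssessmentProgram:
    """Hypothetical description of a testing program."""
    traceable: bool       # results can be traced to a jurisdiction or school
    tight_security: bool  # e.g., sealed booklets, outside proctors, monitoring
    item_rotation: bool   # ample rotation of items between administrations


def has_motive(test: AssessmentProgram) -> bool:
    # Motive exists only when results reflect on identifiable jurisdictions or schools.
    return test.traceable


def has_opportunity(test: AssessmentProgram) -> bool:
    # Assumption: per slide 24, only tight security together with ample rotation removes opportunity.
    return not (test.tight_security and test.item_rotation)


def inflation_candidate(test: AssessmentProgram) -> bool:
    """A test is a candidate for score inflation only with both motive and opportunity."""
    return has_motive(test) and has_opportunity(test)


# A traceable NRT with lax security versus an untraceable, secure, sample-based test.
nrt_lax = AssessmentProgram(traceable=True, tight_security=False, item_rotation=False)
naep_like = AssessmentProgram(traceable=False, tight_security=True, item_rotation=True)
print(inflation_candidate(nrt_lax), inflation_candidate(naep_like))
```

The two examples mirror slide 19's contrast between Cannell's traceable NRTs and the untraceable, tightly secured TIMSS/NAEP tests. The first is flagged as a candidate and the second is not.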

26 http://www.nonpartisaneducation.org/Review/Articles/v1n2.htm

