Presentation on theme: "Reading Assessment: Still Time for a Change"— Presentation transcript:
1 Reading Assessment: Still Time for a Change
P. David Pearson, UC Berkeley, Professor and Former Dean
• Kia ora. Thank the IRA and especially Pat Edwards for inviting me (3rd time I’ve had a chance to address a world congress as a plenary speaker)—testament to longevity.
• Thank Heather Bell and her colleagues for the delightful venue, program, and organization.
• Best word on the slide is "former."
Slides available at
2 Why did I pick such a boring topic?
I’m a professor! Who needs fun?
The consequences are too grave.
I have a perverse standard of fun.
I’m a professor and I think boring topics are good for people! In general, I don’t like to see people have a good time.
Whether the task is exciting, engaging, or deadly dull, the consequences of not getting assessment right are too grave to allow us to postpone the task any longer. We must, at all costs, stop the irreparable harm that otherwise gets done—to students, parents, teachers, schools, and curriculum.
It could be fun, in a perverse way, if we could turn this assessment problem on its ear, get it right, and make it an ally of teaching and learning rather than an enemy.
One thing you should all know at the outset is that there are 87 slides in this presentation, so for those of you who are compulsive notetakers this is a real challenge. Those who want to relax can download the slides.
3 Valencia and Pearson (1987), Reading Assessment: Time for a Change. In The Reading Teacher.
A set of contrasts between cognitively oriented views of reading and prevailing practices in assessing reading circa 1986.
New views of the reading process tell us that . . . / Yet when we assess reading comprehension, we . . .
1. Prior knowledge is an important determinant of reading comprehension. / Mask any relationship between prior knowledge and reading comprehension by using lots of short passages on lots of topics.
2. A complete story or text has structural and topical integrity. / Use short texts that seldom approximate the structural and topical integrity of an authentic text.
3. Inference is an essential part of the process of comprehending units as small as sentences. / Rely on literal comprehension test items.
4. The diversity in prior knowledge across individuals, as well as the varied causal relations in human experiences, invites many possible inferences to fit a text or question. / Use multiple-choice items with only one correct answer, even when many of the responses might, under certain conditions, be plausible.
5. The ability to vary reading strategies to fit the text and the situation is one hallmark of an expert reader. / Seldom assess how and when students vary the strategies they use during normal reading, studying, or when the going gets tough.
6. The ability to synthesize information from various parts of the text and different texts is a hallmark of an expert reader. / Rarely go beyond finding the main idea of a paragraph or passage.
7. The ability to ask good questions of text, as well as to answer them, is a hallmark of an expert reader. / Seldom ask students to create or select questions about a selection they may have just read.
8. All aspects of a reader’s experience, including habits that arise from school and home, influence reading comprehension. / Rarely view information on reading habits and attitudes as being as important as information about performance.
9. Reading involves the orchestration of many skills that complement one another in a variety of ways. / Use tests that fragment reading into isolated skills and report performance on each.
10. Skilled readers are fluent; their word identification is sufficiently automatic to allow most cognitive resources to be used for comprehension. / Rarely consider fluency as an index of skilled reading.
11. Learning from text involves the restructuring, application, and flexible use of knowledge in new situations. / Often ask readers to respond to the text’s declarative knowledge rather than to apply it to near and far transfer tasks.
In that 1987 article, Sheila Valencia and I tried to convince the reading field, and at least America’s policy makers, that it was time for a change in the nature, development, and format of assessments IF we were to rely on them as the friend, not the enemy, of teaching and learning.
We noted 11 discrepancies between what we knew to be true about reading and what we knew to be the state of the art in reading assessment. Hence the title, Reading Assessment: Time for a Change.
Today I will wonder with you whether it is still time for a change. Short answer: yes. Now let me tell you why.
4 New views of the reading process tell us that . . . / Yet when we assess reading comprehension, we . . .
Prior knowledge is an important determinant of reading comprehension. / Mask any relationship between prior knowledge and reading comprehension by using lots of short passages on lots of topics.
A complete story or text has structural and topical integrity. / Use short texts that seldom approximate the structural and topical integrity of an authentic text.
Inference is an essential part of comprehending units as small as sentences. / Rely on literal comprehension test items.
Side effect: privilege those with the highest general verbal ability. Snippets or "textoids."
5 New views of the reading process tell us that . . . / Yet when we assess reading comprehension, we . . .
The diversity in prior knowledge across individuals, as well as the varied causal relations in human experiences, invites many possible inferences to fit a text or question. / Use multiple-choice items with only one correct answer, even when many of the responses might, under certain conditions, be plausible.
The ability to synthesize information from various parts of the text and different texts is a hallmark of an expert reader. / Rarely go beyond finding the main idea of a paragraph or passage.
The ability to vary reading strategies to fit the text and the situation is one hallmark of an expert reader. / Seldom assess how and when students vary the strategies they use during normal reading, studying, or when the going gets tough.
6 What is thinking?
"You do it in your head, without a pencil." Alexandra, age 4
"You shouldn’t do it in the dark. It’s too scary." Thomas, age 5
Speaking of being metacognitive and strategic, one of the things we used to do in those days was to interview a lot of kids. And when we did, we asked them questions like: What is reading? What is thinking?
In doing the research for this talk, I ran across some old overheads (remember those) and dumped them into a PowerPoint for the first time in their natural lives. Kids’ responses to this question: What is thinking?
7 What is Thinking?
"Thinking is when you’re doing math and getting the answers right." Sissy, age 5
And in response: "NO! You do the thinking when you DON’T know the answer." Alex, age 5
8 What is Thinking?
"It’s very, very easy. The way you do it is just close your eyes and look inside your head." Robert, age 4
9 What is Thinking?
"You think before you cross the street!" What do you think about? "You think about what you would look like smashed up!" Leon, age 5
10 What is Thinking?
"You have to think in swimming class." About what? "About don’t drink the water because maybe someone peed in it… and don’t drown!"
11 New views of the reading process tell us that . . . / Yet when we assess reading comprehension, we . . .
The ability to ask good questions of text, as well as to answer them, is a hallmark of an expert reader. / Seldom ask students to create or select questions about a selection they may have just read.
All aspects of a reader’s experience, including habits that arise from school and home, influence reading comprehension. / Rarely view information on reading habits and attitudes as being as important as information about performance.
Reading involves the orchestration of many skills that complement one another in a variety of ways. / Use tests that fragment reading into isolated skills and report performance on each.
Did some work on number 1. Habits are important outcomes. Scott Paris’s important work on constrained and unconstrained skills.
12 New views of the reading process tell us that . . . / Yet when we assess reading comprehension, we . . .
Skilled readers are fluent; their word identification is sufficiently automatic to allow most cognitive resources to be used for comprehension. / Rarely consider fluency as an index of skilled reading.
Learning from text involves the restructuring, application, and flexible use of knowledge in new situations. / Often ask readers to respond to the text’s declarative knowledge rather than to apply it to near and far transfer tasks.
13 Why Did We Take This Stance?
We need a little mini-history of assessment to understand our motives.
14 The Scene in the US in the 1970s and early 1980s
Behavioral objectives
Mastery learning
Criterion-referenced assessments
Curriculum-embedded assessments
Minimal competency tests: New Jersey
Statewide assessments: Michigan & Minnesota
15 Historical relationships between instruction and assessment
Skill 1: Teach → Assess → Conclude. Skill 2: Teach → Assess → Conclude.
Bloom’s notion of mastery learning was that if you could just be sufficiently transparent and explicit about the nature of the task and the criterion for demonstrating mastery of it, a lot more people would be able to demonstrate mastery.
Bloom argued that we usually fix the instruction and allow performance outcomes to vary. He wanted us to fix the outcome and allow instruction to vary.
This got perverted into all these bits and pieces of low-level skills, not the big stuff like comprehension or composition.
The 1970s skills-management mentality: teach a skill, assess it for mastery, reteach it if necessary, and then go on to the next skill. Foundation: Benjamin Bloom’s ideas of mastery learning.
16 The 1970s, cont.
Skill 1: Teach → Assess → Conclude. Skill 2: Teach → Assess → Conclude. Skill 3: Teach → Assess → Conclude. Skill 4: Teach → Assess → Conclude. Skill 5: Teach → Assess → Conclude. Skill 6: Teach → Assess → Conclude.
And we taught each of these skills until we had covered the entire curriculum for a grade level.
1972, White Bear Lake, Minnesota.
17 Dangers in the Mismatch We Saw in 1987
A false sense of security.
Tests instructionally insensitive to progress on new curricula.
Accountability will do us in and force us to teach to the tests and all the bits and pieces.
We’ll feel good about teaching to specific skill tests when what we need are tests that challenge students to think.
Tests will be insensitive to progress on the higher-order thinking agenda implied by the new curriculum.
As accountability increases, we’ll see more teaching to the test rather than teaching to our highest ideals.
18 Pearson’s First Law of Assessment
The finer the grain size at which we monitor a process like reading and writing, the greater the likelihood that we will end up teaching and testing bits and pieces rather than global processes like comprehension and composition.
As an aside, this is one of the things we learned in that period but could never manage to make stick.
19 The Ideal
The best possible assessment occurs when teachers observe and interact with students as they read authentic texts for genuine purposes. As teachers interact with students, they evaluate the way in which the students construct meaning, intervening to provide support or suggestions when the students appear to have difficulty.
20 Pearson’s Second Law of Assessment
An assessment tool is valued to the degree that it can approximate the good judgment of a professional teacher!
So anything that falls short of the ideal should be evaluated according to how close it comes to that ideal. A multiple-choice test that correlates highly with an informal approach is to be valued more highly. We should be explicitly mindful of the shortcomings of all surrogates for the "real thing."
21 A new conceptualization of the goal
Feature × level of decision-making (beyond school / school / classroom / individual):
Accuracy: IRI or unit test or NRT; test; IRI
Fluency, Word meaning: norm-referenced; unit or NRT; unit; unit assessment
Comprehension: NRT; IRI or unit activities
Critique, Perform: discussion; response; essay
What Sheila and I proposed in this article and some others was this: educators should select some aspects of reading that are worth monitoring and then decide how to monitor them—what tools to use—at each level in a system. And those levels vary from an individual student to an entire district, school authority, state or province, or even nation.
Notice that I did not include norm-referenced tests in every row. Why? Because I think there are some aspects of reading that can never be assessed in anything short of direct performance.
The "time for every purpose under heaven" principle.
22 A 1987 Agenda for the Future
Another way to look at these issues is to imagine that the assessment system has many clients, and each client has decisions to make and questions to answer. Our job as assessment system designers is to help each client make critical decisions in as valid a manner as possible, with the least possible harm done to any individual or aggregation in the system.
23 Pearson’s Third Law of Assessment
When we ask an assessment to serve a purpose for which it was not designed, it is likely to crumble under the pressure, leading to invalid decisions and detrimental consequences.
"A time to every purpose under Heaven."
This is exactly what we do when we milk the scores on a standardized test looking for diagnostic value. A test might be perfectly well suited to monitoring progress or evaluating programs. That does not mean it will help us figure out what to do next for an individual child. Time to stop making silk purses out of sows’ ears. Nor, by the way, should we try to make sows’ ears out of silk purses.
24 Early 1990s in the USA
Standards-based reform: state initiatives, the IASA model.
Trading flexibility for accountability: a move from being accountable for the means and leaving the ends up for grabs (the doctor or lawyer model) TO being accountable for the ends and leaving the means up for grabs (the carpenter or product model).
Just a promissory note: when NCLB came into being 8 years later, this bargain of flexibility for accountability disappeared. So let’s watch for how that happened.
25 Mid-1990s Developments
Assessment got situated within the Standards Movement that took off around the globe.
Content standards: What should students know and be able to do?
Performance standards: What counts as evidence of meeting the content standards?
Opportunity-to-learn standards: What do we have to provide to kids and teachers to achieve the content and performance standards? (The quid pro quo. Somehow it got left behind.)
26 Standards-Based Reform: The Initial Theory of Action
[Diagram: Assessment and Accountability → Clear Expectations and Motivation → Higher Student Learning]
We began our work with the same set of assumptions about standards-based reform that undergirded the IASA. The theory of action, to use our chair Dick Elmore’s favorite term, was that if you put in place a standards-based accountability system (comprised of standards and assessments and the accountability requirement), that will be sufficient to drive the reform engine. The standards determine the content, the assessments make the expectations clear to all, and the accountability system provides the motivation to improve. The final ingredient, which is a critical assumption in this classic standards-based reform model, is flexibility; that is, in return for being accountable, schools and teachers will be granted wide rein in the processes, strategies, and methods they use to improve student learning.
But the studies we reviewed and the experiences of our committee members suggested that this model does not necessarily achieve the goal of higher student learning. Too often, for example, a probationary or reconstituted school threatened with takeover or severe penalties will focus on improving scores rather than changing instruction.
We also found evidence that assumptions in this model did not correspond to reality; namely, the assumption that teachers would develop improved practices if they had both the freedom and the motivation to do so. Changes in practice, we found, seldom occurred without intentional and arduous effort on behalf of school leaders.
A la Tucker and Resnick in the early 1990s.
27 Expanded Theory of Action
[Diagram: Standards, Assessment, Accountability → Clear Expectations and Motivation → Instruction and Professional Development → Higher Student Learning]
So we expanded our theory of action to match what the research we reviewed and the experiences we shared told us.
In our expanded theory of action, two key elements are inserted between the clear expectations provided by assessments and the motivation provided by accountability on the one side, and student learning on the other. Those two elements are instruction and professional development. The implication here is that standards, assessment, and accountability are not enough; standards have to be explicitly and deliberately transformed into instructional practices, and professional development is the pathway to improved instruction.
Only then, our work told us, would student learning improve in the way the theory predicts it should.
A la Elmore and Resnick in the late 1990s.
28 The Golden Years of the 90s?
A flying start in the late 1980s and early 1990s. International activity in Europe, Down Under, North America.
Developmental rubrics
Performance tasks
New Standards
CLAS
Portfolios of various sorts: storage bins; showcase (best work); compliance (Walden, NYC)
Increased use of constructed-response items in NRTs
29 Late 1980s/early 1990s: Portfolios and Performance Assessments Make Assessment Look Like Instruction
[Diagram: activities on standards 1–n, from which we draw conclusions]
In the late 1980s, building on all the good work on performance assessment, portfolio assessment, developmental rubrics, and the like that was begun a decade or two earlier in New Zealand and Australia and in pockets in Europe and North America, we began to experiment with these forms of assessment in the US.
The key to the whole system was to tighten the link between instruction and assessment by making assessment look more like instruction rather than the other way round.
Some in the movement took the point of view that as long as we were going to teach to the tests, we might as well have tests worth teaching to. That proved a fatal flaw in the movement because, as I will point out later, it is high stakes, not necessarily the format of the test, that is the evil that lurks in the heart of assessment.
But for a few years, from roughly 1991 or ’92 through ’96 or ’97, at least in the US, we experienced a proliferation of alternative assessments.
We engage in instructional activities, from which we collect evidence, which permits us to draw conclusions about student growth or accomplishment on several dimensions (standards) of interest.
30 The complexity of modern assessment practices: one to many
Any given activity may offer evidence for many standards, e.g., responding to a story.
[Diagram: Activity X → Standards 1–5]
This was very exciting because it meant that you could use artifacts from your classroom—student work—as evidence that students had mastered important standards.
31 The complexity of performance assessment practices: many to one
For any given standard, there are many activities from which we could gather relevant evidence about growth and accomplishment, e.g., reads fluently.
[Diagram: Activities 1–5 → Standard X]
This, by the way, is the real meaning of curriculum-embedded assessments: instruction as an occasion for assessment.
32 The complexity of portfolio assessment practices: many to many
[Diagram: Activities 1–5 ↔ Standards 1–5]
Any given artifact/activity can provide evidence for many standards. Any given standard can be indexed by many different artifacts/activities.
By the way, it is this complexity that, among other things, probably accounts for the demise of this family of alternative approaches.
33 The perils of performance assessment: or maybe those multiple-choice assessments aren’t so bad after all…
"Thunder is a rich source of loudness."
"Nitrogen is not found in Ireland because it is not found in a free state."
34 The perils of performance assessment "Water is composed of two gins, Oxygin and Hydrogin. Oxygin is pure gin. Hydrogin is gin and water.”"The tides are a fight between the Earth and moon. All water tends towards the moon, because there is no water in the moon, and nature abhors a vacuum. I forget where the sun joins in this fight."
35 The perils of performance assessment
"Germinate: To become a naturalized German."
"Vacumm: A large, empty space where the pope lives."
"Momentum is something you give a person when they go away."
36 The perils of performance assessment
"The cause of perfume disappearing is evaporation. Evaporation gets blamed for a lot of things people forget to put the top on."
"Mushrooms always grow in damp places, which is why they look like umbrellas."
"Genetics explains why you look like your father, and if you don’t, why you should."
37 The perils of performance assessment "When you breath, you inspire. When you do not breath, you expire."
38 Post-1996: The Demise of Performance Assessment
A definite retreat from performance-based assessment as a wide-scale tool:
Psychometric issues
Cost issues
Labor issues
Political issues
Why the demise of performance assessment: generalizability, cost, labor (BUT PD). California: Open Mind.
39 The Remains…
Still alive inside classrooms and schools (a fugitive life).
Hybrid assessments based on the NAEP model: multiple-choice, short answer, extended response.
The persistence of standards-based reform.
40 No Child Left Behind: Accountability in Spades
Reporting at every grade level.
Census assessment rather than sampling (everybody takes the same test).
Disaggregated reporting by income, exceptionality, language, ethnicity.
The Full Employment for Psychometricians Law.
41 NCLB, continued
Assessments for varied purposes: placement, progress monitoring, diagnosis, outcomes/program evaluation. Scientifically based curriculum too.
This may seem like progress, but it can explode on you.
The curriculum: fix both the ends (with assessments) and the means (curriculum and monitoring devices to promote fidelity).
Remember the deal: trading flexibility on the curriculum side for accountability on the outcomes side. Guess what: in 2002, the policy makers reneged on the deal. They fixed both. Where is the professional prerogative there?
42 There is good reason to worry about disaggregation
[Chart: achievement distributions, low to high, for School 1 and School 2]
43 Disaggregation and masking
[Chart: for each school, bar height = average achievement and bar width = number of students. In School 1, Group A has a large N and Group B a small N; in School 2, Group B has the large N and Group A the small N.]
Simpson’s Paradox?
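The masking danger behind this slide is Simpson's Paradox, and a quick sketch makes it concrete. The numbers below are invented purely for illustration (the slide itself gives none): each subgroup scores higher in School 1, yet School 2 posts the higher overall average because enrollment is concentrated in different subgroups.

```python
# Hypothetical illustration of Simpson's Paradox in aggregated school
# reporting. All scores and group sizes are invented for this sketch.

def overall_mean(groups):
    """Enrollment-weighted school average from (mean_score, n_students) pairs."""
    total = sum(mean * n for mean, n in groups.values())
    enrollment = sum(n for _, n in groups.values())
    return total / enrollment

# School 1: Group A is the large group, Group B the small one.
school1 = {"A": (52, 90), "B": (82, 10)}   # (mean score, n students)
# School 2: the sizes are reversed; Group B is the large group.
school2 = {"A": (50, 10), "B": (80, 90)}

# Disaggregated view: School 1 beats School 2 in EVERY subgroup.
assert school1["A"][0] > school2["A"][0]   # 52 > 50
assert school1["B"][0] > school2["B"][0]   # 82 > 80

# Aggregated view: School 2 looks better, because its enrollment sits
# in the higher-scoring group.
print(overall_mean(school1))   # 55.0
print(overall_mean(school2))   # 77.0
```

Which comparison is "right" depends on the decision being made, which is exactly why where the accountability falls matters so much.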
44 Disaggregation: Damned if we do and damned if we don’t
Don’t report: render certain groups invisible.
Do report: blame the victim (they are the group that did not meet the standard).
45 Pearson’s Fourth Law of Assessment Disaggregation is the right approach to reporting results. Just be careful where the accountability falls.
46 Pearson’s Fourth Law: A Corollary
Accountability, in general, falls to the lowest level of reporting in the system.
If it is reported at the state or provincial level, states or provinces fail. If at the district or authority level, districts and authorities fail. If at the school level, schools fail. If at the classroom level, teachers fail. If at the subgroup level, subgroups fail. If at the student level, students fail.
Everybody’s failing it, failing it, failing it, everybody’s failing it.
47 Assessment can be the friend or the enemy of teaching and learning
The curious case of DIBELS… and other benchmark assessments that can wreak havoc on the best-laid curricular plans.
The Dark Side.
48 A word about benchmark assessments…
The world is filled with assessments that provide useful information but are not worth teaching to. They are good thermometers or dipsticks, not good curriculum.
49 The ultimate assessment dilemma…
What do we do with all of these timed tests of fine-grained skills:
Words correct per minute
Words recalled per minute
Letter sounds named per minute
Phonemes identified per minute
Scott Paris: constrained versus unconstrained skills. Pearson: mastery skills versus growth constructs.
50 Why they are so seductive
They mirror at least some of the components of the NRP report.
They correlate with lots of other assessments that have the look and feel of real reading.
They take advantage of the well-documented finding that speed metrics are almost always correlated with ability, especially verbal ability.
Example: alphabet knowledge. 90% of the kids might be 90% accurate, but… they will be normally distributed in terms of letter names per minute (LNPM).
51 How to get a high correlation between a mastered skill and something else
[Scatterplot: Letter Name Fluency (LNPM) versus Letter Name Accuracy]
The wider the distribution of scores, the greater the likelihood of obtaining a high correlation.
52 Face validity problem: What virtue is there in doing things faster?
Naming letters, sounds, words, ideas. What would you do differently if you knew that Susie was faster than Ted at naming X, Y, or Z?
For a paper I did for the new handbook of reading disability research, I have had occasion to go back to a lot of the work done trying to understand the skill infrastructure of kids classified as reading disabled. Curious thing: all kinds of speed metrics—naming letters, naming pictures, naming words—turn out to be entirely predicted by age. So older kids, even older kids with problems, can do lots of things faster than younger kids reading at the same reading level.
I guess that means that if we want WCPM to go up, we should just wait a year or two.
54 They meet only one of the tests of validity: criterion-related validity
They correlate with other measures given at the same time (concurrent validity).
They predict scores on other reading assessments (predictive validity).
55 Fail the test of curricular or face validity
They do not, on the face of it, look like what we are teaching… especially the speeded part. Unless, of course, we change instruction to match the test.
56 Really fail the test of consequential validity
Weekly timed-trials instruction confuses means and ends. Proxies don’t make good goals.
We want kids to read faster, sure, but because they are getting better at all aspects of reading and language performance, not because we practiced timed trials 3 times a day.
57 The Achilles Heel: Consequential Validity
[Diagram: Give DIBELS → Give comprehension test → Use results to craft instruction → Give DIBELS again → Give comprehension test again]
DIBELS does not, by the way, claim to be diagnostic. It is supposedly a progress-monitoring test, telling you how kids are doing on the road to somewhere. But that is not how it gets used.
The emperor has no clothes.
58 The bottom line on so many of these tests: Pearson’s Third Law again
New bumper sticker: Never send a test out to do a curriculum’s job!
The world is filled with tests that provide useful and convenient proxies for the real thing. It is true that WCPM is a good proxy for comprehension. But the minute that indirect indicator morphs itself into a curricular goal, it becomes a monster.
I want kids to read faster and more fluently, sure, because we taught everything well in a balanced curriculum, not because we had all the kids practice timed trials 5 days a week all year long.
By the way, this is another application of Pearson’s Third Law.
59 The dark side of alignment: the transfer problem
I agree about the importance of curriculum-based assessment and situated learning, BUT…
We do expect what you learn in one context to assist you in others. In our heart of hearts, we do NOT believe that kids learn ONLY what you teach, OR that only what is tested is what should get learned (and taught). Note our strong faith in the idea of application.
I agree with a lot of what Bill had to say about curriculum-based assessment. In fact, most of what I suggested as part of an assessment system would qualify as curriculum-based assessment, except for the big-picture assessments. Those are a little different, I think.
In today’s educational discussions, we hear a lot about the notion of situated cognition and situated learning. The key point in this concept is that we have to stop treating learning as a set of abstract principles or constructs that rise above the specifics of each learning situation and guide us whenever we encounter a new instance of that same phenomenon. When you learn a skill, a process, or a fact, granted, you learn it in a specific context. But even if we endorse the notion of situated learning—that context matters—we do expect that what you learn in one situation will serve your similar needs in other contexts. There is still a hint of transfer left in our thinking. I think our big-picture assessments have to require students to export what they have learned in one or more contexts to a new context. Otherwise, I do not see how we can call it authentic assessment.
So, how do we test for transfer?
60 How do we test for transfer? A continuum of cognitive distance
An example: learn about the structure of texts and knowledge about insect societies (bees, ants, termites). Then try new passages: paper wasps, a human society, a biome. How far will the learning travel?
Our problem today: THIS IDEA OF TRANSFER IS NOT EVEN ON OUR CURRENT RADAR SCREEN!!! And it ought to be!
When I grew up academically, transfer of learning was regarded as the gold standard. Hence tests of transfer are also the gold standard of assessment. But what you really want is that proximal-to-distal, near-to-far continuum of assessments.
61 Domain representation
If we teach to the standards and the assessments, will we guarantee that all important aspects of the curriculum are covered?
Linn and Shepard study: improvements on a narrow assessment do not transfer to other assessments.
Shepard et al.: in high-stakes districts, high performance on consequential assessments comes at a price…
First, it makes sense to base our instruction on a set of standards only if we can assume that the set of standards we have developed provides a complete representation of the curriculum domain in question. Lacking some guarantee that all important aspects of the curriculum are covered, it would be foolhardy to limit what we do in classrooms to a set of standards. Just as surely, we do not want to develop a laundry list of skills.
62 Linn and Shepard’s work…
[Chart: test scores by year, contrasting a new standardized test with an old standardized test]
So even with the best of intentions, there can be a kind of covert, insidious teaching to the test that goes on.
63 Shepard et al.’s work
[Chart: ST versus AA scores in low-stakes and high-stakes schools. ST = consequential standardized assessment; AA = more authentic assessment of the same skill domain.]
Note the consequences of high stakes on alternative assessments.
64 Key Concept: Haladyna
Test score pollution: a rise or fall in a score on a test without an accompanying rise or fall in the cognitive or affective outcome allegedly measured by the test.
65 Aligning everything to the standards: A model worth rejecting
[Diagram: Assessment → Instruction]
This model is likely to shape the instruction too narrowly and lead to test score pollution.
66 A better way of thinking about the link between standards, instruction, and assessment
Standards are how we operationalize our values about teaching and learning. They guide the development of both instruction and assessment: teaching and learning activities, and assessment activities.
By the way, in a piece that I did with Monica Yoo and Terry Underwood, examining the secondary standards, curriculum, and assessments in Massachusetts, California, and Texas, we discovered that it was not the standards or even the curricula that drove teachers and schools nuts; it was the tests. The tests did NOT measure the standards, particularly the higher-order ones, well or even at all. And they certainly did not do justice to the curriculum.
This relationship can operate at the regional or local level. It is the logic of lots of good reform projects!
67 Pearson's Fifth Law of Assessment
Alignment is a double-edged sword. If there must be alignment, lead with the instruction and let the assessment follow. If the assessments are aligned to the instruction, things will work out. If instruction is aligned to the assessment, pollution will occur.
68 Pearson's Sixth Law
High stakes will corrupt any assessment, no matter how virtuous or pure in intent. It's the stakes that drive us to madness and distraction: teaching to the test; packing to the portfolio.
69 Corollary to Pearson's Fifth and Sixth Laws
The worst possible combination is high stakes and low challenge. Why? It drags us all to the bottom of our pool of aspirations.
70 So how did we do in responding to the challenges from Valencia & Pearson?
Issue / Grade / Solution:
- Prior knowledge: D. Choice of passages.
- Authentic text: B+. Things are lots better on lots of comprehension assessments.
- Inference: B. Depends on the test.
- Diversity in knowledge means diversity in response: Constructed response and multiple correct answers or graded answers.
- Flexible use of strategies: C. Hard to assess; easy to coach; I'd abandon it except for diagnostic interviews.
- Synthesizing information is paramount: Still too much emphasis on details.
71 So how did we do in responding to the challenges from Valencia & Pearson? (continued)
Issue / Grade / Solution:
- Asking questions as an index of comprehension: D. No progress except in informal classroom assessment.
- Measuring habits, attitudes, and dispositions: C. Some reasonable things out there, but no teeth.
- Orchestrating many skills: Too many mastery skills; not enough growth skills.
- Fluency: Made a fetish out of it.
- Transfer and application: Limited to a few situations.
- Overall grade: Lots of work to do.
72 Where should we be headed?
So, what makes sense for a district or school? Develop an educational improvement system.
73 Elements of an Educational Improvement System
Standards, yes. Assessments, yes: outcome assessments for program evaluation; benchmark assessments for monitoring individual progress; "closer look" diagnostic assessments for determining individual student emphases. A reporting system, yes, as long as we are prepared to live with the dilemmas of disaggregation. Alignment, but of a different sort.
74 Outcome assessments
These drop in out of the sky: curriculum independent, not directly linked to the curriculum. They assess reading in its most global aspects, and they rest on growth constructs, NOT mastery constructs. They could be some sort of standardized assessment, as long as they are not taught to.
Slides available at www.scienceandliteracy.org
75 A plan for early reading benchmark assessments
Every so often, give four benchmark assessments. Still trying to figure out how to work in vocabulary.
76 Benchmarks for Intermediate and Secondary
Comprehend. Deconstruct: what do authors do, and why? Compose.
Narratives: response to literature, author's craft, creative writing.
Information genres: summaries, charts, key ideas; genre (form follows function); writing from sources to convey ideas.
77 Closer Look Assessments
There is no sin in examining the infrastructure of reading. We really do need to know which of those pieces kids have and have not mastered. The question is what to do about them: teach to and practice the weak bits; rely on strengths to bootstrap the weaknesses; or just read more "just right" material. Do the weak skills get better if we bootstrap them to the strengths, or if we just do more orchestrated enactments of the process, i.e., just plain reading? I'd do all three.
78 The Flaw in Teaching to Weaknesses
The Basic Skills Conspiracy of Good Intentions: first you gotta get the words right and the facts straight before you can do the what-ifs and I-wonder-whats. The trouble is that some kids spend their entire careers getting ready and never reach the what-ifs and I-wonder-whats.
79 Monitoring Conditions of Instruction
Collect data on curriculum and instructional practices: we need clear data on the enacted curriculum and instructional practices in order to link them as precisely as possible to achievement. Use the data for program improvement and to design professional development. This is often overlooked, but at our own peril. The best work in this area is Barbara Taylor's at the University of Minnesota.
Look in particular for evidence of these curricular and instructional practices: higher-order thinking, deep knowledge, substantive conversation, and connection to the world beyond the classroom. Use data to design new instructional and staff development programs, with professional development tied to standards.
One other point: in our cases of effective educational improvement systems, one of the common characteristics is internally developed systems for monitoring student progress, both within and across grades.
80 Return to the hard work on assessment
I am encouraged by recent funding of new-century assessments. Could be some good coming out of our Reading for Understanding assessment grants in the US. Possibilities in the Australian work: NAPLAN?
We need tests that take the high road (tests worth teaching to): focus on making and monitoring meaning; focus on the role of reading in knowledge building and the acquisition of disciplinary knowledge; focus on critical reasoning and problem solving; focus on representation of self. Assessment is something you do to and for yourself because it helps you outgrow your current self. This is the unfinished business from the 1990s.
81 Where Could We Be Headed: A Near-Term Research Agenda
The development of more trustworthy, more useful curriculum-based assessments: expanding the logic of the Informal Reading Inventory; getting comprehension assessment right; computerized assessments (yes, but no time today).
82 Expanding the logic of the IRI
A benchmark-books model, à la Reading Recovery, with indices of: the level of text one can read independently; accuracy (including error patterns); fluency; comprehension. Not one, not two, not three, but many, many conceptually and psychometrically comparable passages at every level of text challenge.
83 Comprehension Assessment
Our models for external assessment, modeled after some of the better wide-scale assessments, are OK. We desperately need a school/classroom tool that does for comprehension what running records and benchmark books have done for oral reading accuracy and fluency.
84 Disciplinary Grounding
We're much better off if we ground our comprehension assessments in the inquiry and knowledge traditions of the disciplines rather than to...
85 Pearson's (bet on a) Seventh Law of Assessment
Comprehension assessment begins and ends within the knowledge traditions and inquiry processes of each discipline.
86 Pearson's (bet on a) Corollary to the Seventh Law
Summative (big, external) assessments of reading comprehension will be better if they begin as formative (smaller, internal) assessments of reading comprehension within the knowledge traditions and inquiry processes of each discipline. In other words, figure out how to assess comprehension in a way that respects the disciplinary bases of science, history, mathematics, and literature, and then we can develop a good general test of comprehension by sampling from those really well-grounded formative assessments.
87 My bottom line
We need tests that are instructionally sensitive, psychometrically sound, and trustworthy. We desperately need instructionally sensitive assessments with first-rate psychometric characteristics so that we can build trustworthy internal systems for monitoring student progress. No decision of consequence about any individual, school, district, or other aggregation should be based upon a single indicator of anything. Tests are a means to an end: their value is measured by the degree to which they allow us to make good decisions and provide good instruction. They are not the ends themselves. They are NOT curriculum.
88 To reduce it to a single idea
Six, maybe seven laws. Two, maybe three corollaries. But only one thing truly worth remembering: never send a test out to do a curriculum's job!
Thanks for spending your valuable time with me today. It is your most precious gift, and one ought not to waste it. Take care; I hope the rest of your day, your conference, your stay in New Zealand, your school year, and your life are filled with satisfying work, students who value reading, and a teaching life that promotes your students' opportunity to become literate citizens of our global community.
89 Coda in Stuart McNaughton's Spirit
A new bumper sticker with a tinge of optimism: "Tests in support of teaching and learning." And with that cheery if implausible thought, I'll truly say thank you. Kia ora.
91 Computerized Assessment
With advances in voice recognition, we are close to being able to teach computers to recognize and score students' oral responses. Applications: listen to oral reading of benchmark passages and conduct a first-level diagnosis (thus eliminating a key barrier, time, to more widespread use of this important diagnostic tool). Mention the new Reading for Understanding grants.
92 Computerized Assessment in Early Literacy
More applications of voice recognition: phonemic awareness tasks; word reading tasks; phonics tests (both real words and synthetic words). Comprehension assessment is still a way down the road because of the interpretive problem: the computer has to both listen to and understand the response.
BARLA: Bay Area Reading and Listening Assessment