1Economic Perspectives on Standardized Testing
Richard P. Phelps
(c) 2002, by Richard P. Phelps
2Economic Perspectives on Standardized Testing: Outline
- Why can’t economists and psychologists just get along?
- Overview of economic theory as it pertains to education & testing
- Human capital theory and the economics of information
- Supply & demand; benefits & costs; goods & bads
- The cost of standardized testing (from society’s point of view)
- The benefits of standardized testing (information)
- The benefits of standardized testing (motivation)
- Optimal testing system structures
- Optimal testing industry structures
- Discussion
3Topic 1: Why can’t economists and psychologists just get along?
41) Why can’t economists and psychologists just get along? [answer: sometimes they do]
- Tversky and Kahneman, two cognitive psychologists, asked themselves why rational economic man patronizes casinos, where the odds are against him.
- Their experiments revealed that tolerance of (or attraction to) risk varies widely among individuals, and that most people weigh small risks against low-probability but very large gains “sub-optimally.”
- Tversky’s and Kahneman’s work is now required reading for any economics major.
- Experimental economics, which strongly resembles cognitive psychology in its methods, is now the fastest growing area of research in the field.
51) Why can’t economists and psychologists just get along? [answer: sometimes they do not]
Test Utility research
- Thousands of studies conducted by I/O psychologists from the 1960s through the 1980s
- Dozens of meta-analyses
- Even a few meta-analyses of the meta-analyses
- Few economists, then or now, even aware of the field
Decline in interest in Test Utility research
- Regulatory ruling against validity generalization in the late 1980s by the Civil Rights office in the Reagan administration
- National Research Council forms committee with curious membership to critique a single Test Utility study (critique interpreted by many as a condemnation of all Test Utility research)
6Topic 2: Overview of economic theory as it pertains to education & testing
72) Economic theory as it pertains to education in general
Traditionally, education economics has been conducted in 2 fields:
Labor Economics
- Labor markets for teachers and graduates
- Returns (in wages) to investment (in years) in education
Public Finance
- Returns (in achievement, attainment) to investment (in tax revenues)
- Funding equity, adequacy, efficiency, & intra-metropolitan migration
82) Economic theory as it pertains to testing in particular
Human Capital Theory
- Higher wages over the long term can more than compensate for the earnings foregone while still in school
- …assumed a strong correlation between accumulation (years in school, any school) and earning power (applicable knowledge and skills)
Economics of Information
- Basic economic assumption of “perfect information” is simplistic
- When buyer and seller have “asymmetric” information, classic economic assumptions are not appropriate
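The human capital trade-off on this slide can be read as a net-present-value comparison between two earnings paths. The sketch below is only an illustration: the wages, the number of working years, and the discount rate are hypothetical values chosen for the example, and flat wages are a simplifying assumption rather than anything claimed in the presentation.

```python
def present_value(earnings_by_year, discount_rate):
    """Discount a stream of yearly earnings back to year 0."""
    return sum(e / (1 + discount_rate) ** t for t, e in enumerate(earnings_by_year))


def schooling_npv(extra_years, wage_without, wage_with, working_years, discount_rate):
    """NPV of staying in school `extra_years` longer (hypothetical flat wages)."""
    # Path A: start working now at the lower wage.
    path_work = [wage_without] * working_years
    # Path B: earn nothing while in school, then earn the higher wage.
    path_school = [0.0] * extra_years + [wage_with] * (working_years - extra_years)
    return present_value(path_school, discount_rate) - present_value(path_work, discount_rate)


# Hypothetical inputs, for illustration only.
npv = schooling_npv(extra_years=4, wage_without=30_000, wage_with=45_000,
                    working_years=40, discount_rate=0.03)
print(f"NPV of 4 extra years of schooling: {npv:,.0f}")
```

With these assumed numbers the higher later wage outweighs the foregone earnings; if schooling did not raise the wage at all, the same calculation would return a negative value.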
9Topic 3: Human capital theory and the economics of information
103) Human capital theory: seminal works
- Human Capital (1964), Gary Becker
- Schooling, Experience, and Earnings (1974), Jacob Mincer
- Dozens of World Bank reports
113) Economics of Information: seminal works
“The Market for Lemons” (1970), George Akerlof
- When buyers can evaluate a purchase based only on a quality assessment of the entire group, sellers have an incentive to market poor-quality merchandise and, over time, the average quality of goods declines. Often-used counters to quality decline are guarantees, brand names, franchising, and credentials.
“Economics of Imperfect Information” (1976), Rothschild, Stiglitz, Grossman
- Perfectly competitive markets have perfect information. In markets without perfect information, there is little incentive for private individuals to fill the breach (Consumer Reports is an exception, and not very profitable). Thus, there can be a role for government to promote market efficiency by providing information.
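The quality decline described for the “lemons” market can be sketched as a simple iteration. This is a toy model, not Akerlof's own formulation: it assumes each seller values a good at exactly its quality, that buyers pay the average quality of whatever is currently offered, and that the quality values themselves are made up for the example.

```python
def lemons_unraveling(qualities, rounds=10):
    """Return the price (average offered quality) in each round.

    Assumption: buyers cannot tell goods apart, so they pay the group average;
    sellers whose goods are worth more than that price withdraw them.
    """
    offered = sorted(qualities)
    history = []
    for _ in range(rounds):
        if not offered:
            break
        price = sum(offered) / len(offered)  # buyers pay the expected quality
        history.append(price)
        remaining = [q for q in offered if q <= price]  # better goods leave
        if remaining == offered:
            break  # nobody else withdraws; the market has settled
        offered = remaining
    return history


# Hypothetical qualities 1..10, for illustration only.
history = lemons_unraveling(range(1, 11))
print(history)
```

Each round's price is lower than the last until only the lowest-quality goods remain, which is the decline that guarantees, brand names, franchising, and credentials are meant to counter.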
123) Screening, signaling, filtering, credentialing, I
Education and Jobs: The Great Training Robbery (1970), Ivar Berg
- Employers pay for credentials, not human capital; they know little to nothing of the quality of education programs, only the perception thereof
Generating Inequality (1975), Lester Thurow
- Employers want “trainable” employees, and judge that those who could endure schooling are probably more trainable than those who could not
Work of Piore and Doeringer on “Market Segmentation”
- Neither education nor education credentials matter in “secondary” labor markets, only in the “primary” market, with its career ladders
133) Screening, signaling, filtering, credentialing, II
Market Signaling (1973), Michael Spence
- Diplomas are a signaling device to employers, who take a gamble with every new hire; a diploma is evidence, which the graduate hopes employers will accept, that certain human capital has been obtained, but it is not proof
“On the Weak versus the Strong Version of the Screening Hypothesis” (1979), George Psacharopoulos
- Weak: employers pay only higher starting wages for “better” credentials
- Strong: employers continue to pay higher wages for “better” credentials even after they become familiar with each employee’s actual productivity
“Higher Education as a Filter” (1973), Kenneth Arrow
“The Theory of Screening” (1975), Joseph Stiglitz
143) Empirical and theoretical work on standards
Burton Weisbrod (1964)
- Discovered that 90% of adults are hired within the boundaries of a school district other than the one from which they graduated
- So, employers are not familiar with and have no influence over the education standards used to train virtually all their employees
John Bishop (1980s)
- It is unreasonable to expect a teacher to be both a sympathetic coach and a neutral judge. External exams let them be coaches exclusively, which is in keeping with what most of them probably want anyway.
Robert Costrell (1994)
- School district incentives are to inflate grades and socially promote. If they maintain tough standards, they only hurt their own children in later competition against graduates of other districts where standards are lax and grades inflated.
- Standards must be enforced externally, or they will not be.
164) Benefits & costs; goods & bads
Economists are (small d) democrats
- What is a “good” or a benefit is relative to each individual; the researcher does not get to decide what is good or bad for the consumer; consumers decide for themselves
- But we’d all like more money (freely exchangeable) and more free time
- Economists assume we all want more of something (even if it is spiritual enlightenment), and that we can’t always get it
Benefits have two phases: creation and capture
- Not all potential benefits are realized, or “captured”
- e.g., you do very well and learn a great deal at a college with a terrible reputation, and then cannot get a job because of that reputation
174) The demand for standardized testing
Phelps (1998) - 40 years of public opinion poll data
- The adult public is not ignorant about standardized tests, since all have taken many, for better or for worse
- Support for high-stakes standardized testing is overwhelming, and has been consistently so for decades
- Most stakeholders, including students and parents, are strongly supportive. Teachers are usually supportive, but don’t like being judged for outcomes over which they have little control. Education professors are strongly opposed. Administrators have been on the fence and may now be opposed.
- The year 2000 “testing backlash” was a heavily hyped public relations creation, completely unsupported by the objective evidence.
184) “Natural Experiments” in test demand and valuation: a) countries liberalize education, b) drop test requirements, c) find that standards deteriorate, d) then revert to testing
- Many Western European and North American states (1960s – 1970s)
- Many Post-Colonial, Newly-Independent states (1940s – 1970s)
- Ex-Communist Eastern European states (1990s – 2000s)
194) Trends in test adding/dropping, OECD countries: 1974--1999
204) Countries adding or dropping large-scale, external testing, by type of testing: 1974-1999
Number of countries or provinces...
Type of testing | ...adding testing | ...dropping testing
Assessments | 17 | 0
Upper secondary exit exams 12*
University entrance exams 5
Subject-area end-of-course exams 6
Lower secondary exit or entrance exams | 4 | 2
Inclusion of voc/prof tracks in exit exam system 3
Primary/secondary-level achievement testing 1
Diagnostic testing
TOTAL 51
214) Countries with nationally standardized high-stakes exit exams, by level of education
Column headings: Primary school; Lower secondary school; Upper secondary school
Belgium (French); Italy; Netherlands; Russia; Singapore; Switzerland (some cantons); Canada: Quebec; China; Czech Republic; Denmark; France; Hungary; Iceland; Ireland; Japan; Korea; New Zealand; Norway; Portugal; Sweden; Switzerland; United Kingdom: England & Wales, Scotland; Belgium: (Flemish) & (French); Canada: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland, Quebec; Finland; Germany
224) Demand for testing is not unlimited – saturation is possible
School district response to state test mandates (1991)
State and local tests' purpose and content are… | Percent of districts substituting state test
…exactly the same or very similar | 82
…somewhat or moderately similar | 69
…not at all similar or very little | 41
SOURCE: U.S. GAO, 1993.
23Topic 5: The Cost of Standardized Testing (from society’s point of view)
245) Cost jargon
Marginal cost (the cost of the next unit): For a test, it is the cost that is incurred due to the addition of a test, and only that cost.
- e.g., during test administration, the school building must be maintained, but that would be the case without a test, too. The test is not responsible for this cost.
- Subject-matter instruction occurs whether or not there is external testing, so it also is not a cost of the test.
Opportunity cost (the cost of foregone opportunities, i.e., instead of doing this, you could have been at work making money): For a test, the time a teacher spends preparing for, monitoring, or scoring a test is time he could have been planning his course, grading homework, etc.
- If the teacher makes productive use of the time while students are taking a test, there is no opportunity cost for that time.
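The marginal-cost rule on this slide (count only what the test adds) can be illustrated with a small tally. The cost items and per-student dollar amounts below are hypothetical, invented for the example; only the rule for which items count comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    amount: float                 # dollars per student (hypothetical)
    incurred_without_test: bool   # True if the cost would exist anyway


def marginal_cost(items):
    """Sum only the costs that exist because the test was added."""
    return sum(i.amount for i in items if not i.incurred_without_test)


# Hypothetical cost items, for illustration only.
items = [
    CostItem("building maintenance during testing", 3.0, True),
    CostItem("regular subject-matter instruction", 20.0, True),
    CostItem("test booklets and scoring", 8.0, False),
    CostItem("teacher time diverted from course planning (opportunity cost)", 4.0, False),
]

total = sum(i.amount for i in items)
mc = marginal_cost(items)
print(f"all costs present during testing: ${total:.2f} per student")
print(f"marginal cost of the test:        ${mc:.2f} per student")
```

Attributing every cost present during testing to the test would overstate its cost; in this made-up tally only the last two items are caused by the test.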
255) Average all-inclusive per-student costs of two test types in states having both: 1990-91
Cost factors | Multiple-choice test | Performance test
Start-up development | $2 | $10
Ongoing, annual costs | $16 | $33
SOURCE: U.S. GAO, 1993, p.43
265) Average per-student costs of two test types in states having both, with adjustments
Columns: All systemwide tests; Sample of 11 state performance tests; Sample of 6 multiple-choice tests in those same states
All-inclusive marginal cost: $15, $33, $16
…minus adjustment for regular school year administration: -7, -15
…minus adjustment for replacement of preexisting tests: -6, -12
Marginal cost after adjustments: $5, $11, $2
SOURCE: Phelps, 2000.
275) “Economies” jargon
The unit cost of producing your product declines the more of an “economy” you have (because fixed/overhead costs get spread out)
- Scale: you can sell at lower cost because you make so many of them
- Scope: you can sell at lower cost because you make other stuff that is similar, or in similar ways
- Learning: you figure out ways to be more efficient and productive as you gain experience
There are many “economies” (just like validities)
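The point about fixed costs being spread out can be shown with a one-line unit-cost formula: fixed cost divided by quantity, plus the variable cost per unit. The fixed cost, variable cost, and quantities below are hypothetical numbers picked for the illustration, not figures from the presentation.

```python
def unit_cost(fixed_cost, variable_cost_per_unit, quantity):
    """Average cost per unit when a fixed cost is spread over `quantity` units."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return fixed_cost / quantity + variable_cost_per_unit


# Hypothetical test program: $500,000 to develop, $5 per student to administer.
for students in (1_000, 10_000, 100_000):
    print(f"{students:>7} students: ${unit_cost(500_000, 5.0, students):.2f} per student")
```

The same development cost is far cheaper per student when it is spread over a large population, which is the scale economy the slide describes.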
29Some economies of scope in state performance testing
305) General structure of testing costs
Scorers are...
- GROUPS of teachers or professional scorers
- INDIVIDUAL teachers or professional scorers
- a COMPUTER
Students take tests...
- EN MASSE
- in GROUPS
- ONE at a TIME
315) Slack capacity in U.S. students’ time = opportunity for windfall gain?
Average number of hours per day devoted to…
Region/Country | Sports | TV watching | Playing or socializing | Studying
USA 18.104.22.168.3
East Asia (N = 5) | 0.9 | 2.4 | 1.3 | 3.1
West Europe (N = 4) 1.6 2.0 2.8
East Europe (N = 7) 2.9
32Topic 6: The Benefits of Standardized Testing -- Information
336) Information benefits of testing
For whom? Could be anyone: student, parent, teacher, school, public, postsecondary institution, employer, …
Information can be used beneficially in:
- Diagnosis (of student, teacher, school, …)
- Alignment (to standards, schedule, each other, …)
- Learning for teachers
- Goodwill with public
- Decisions (promotion, placement, selection, …)
346) Information benefits of testing – how are they measured?
- Predictive validity (fairly measurable)
- Allocative efficiency (fairly measurable) (the greater the range restriction, the higher the allocative efficiency?)
- Alignment (not so easy to measure)
- Goodwill (not at all easy to measure)
35Topic 7: The Benefits of Standardized Testing -- Motivation
367) Motivational benefits of testing – how are they measured?
In controlled experiments:
- Ex. A) One group is told the test at the end of the course comes with a reward; control group is told it does not count
- Ex. B) One group is tested throughout the course; control group is not
In large-scale studies, graduates from regions with high-stakes tests are compared to their non-tested counterparts:
- By their relative performance on another, common test
- By their relative wages after graduation
- By their relative rates of dropout, persistence, attainment, …
- “Backwash Effect” (e.g., students in states with high-stakes high school graduation tests perform better even on the 8th-grade level IAEP, TIMSS, or NAEP)
377) Large-scale studies finding benefits to the use of external, high-stakes examinations
- John Bishop (1980s+), several studies: IAEP, TIMSS, SAT, NY State, Canada, …
- Winfield; Fredericksen; Bishop; Jacobson (minimum comp. states)
- Others: Graham, Husted (SAT); Grissmer, Flanagan (NAEP); Phelps (TIMSS+); Carnoy (NAEP); Rosenshine (NAEP); Braun (NAEP); Wenglinsky
397) Bishop's estimates of dollar value of high-stakes exams on student outcomes
Comparison | Difference (in standard deviation units) | Difference (in grade-level-equivalent units) | Difference per student (in net present value) in 1993 dollars*
Canada: High-stakes testing provinces vs. others | .233 (in math), .183 (in science) | .75 (in math), .67 (in science) | $13,370 (in math), $11,940 (in science)
USA: New York State vs. rest of U.S. | .164 (in SAT Verbal + Math) | .75 (verbal + math) | $13,370
IAEP: High-stakes testing countries vs. others | .586 (in math) | 2.0 (in math), .7 (in science) | $35,650 (in math), $12,480 (in science)
TIMSS: High-stakes testing countries vs. others | n/a | .9 (in math), 1.3 (in science) | $16,040 (in math), $23,170 (in science)
* Based on male-female average, averaged across six longitudinal studies, cited in Bishop, 1995a, Table 2, counting only general academic achievement, not accounting for technical abilities.
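One way to read the figures on this slide is to divide each dollar estimate by the matching grade-level-equivalent (GLE) difference. The sketch below does only that arithmetic on the numbers quoted above. The interpretation that a single dollars-per-GLE conversion underlies the estimates is an inference from these numbers, not something the slide states, and the row labels are shorthand introduced here.

```python
# (label, GLE difference, net present value in 1993 dollars), copied from the slide.
rows = [
    ("Canada, math", 0.75, 13_370),
    ("Canada, science", 0.67, 11_940),
    ("New York State, SAT verbal + math", 0.75, 13_370),
    ("IAEP, math", 2.0, 35_650),
    ("IAEP, science", 0.7, 12_480),
    ("TIMSS, math", 0.9, 16_040),
    ("TIMSS, science", 1.3, 23_170),
]

# Dollars per one grade-level equivalent implied by each row.
ratios = {label: dollars / gle for label, gle, dollars in rows}

for label, ratio in ratios.items():
    print(f"{label:<36} ${ratio:,.0f} per GLE")
```

The ratios cluster tightly, which suggests the dollar column scales roughly linearly with the GLE column; small differences would be consistent with rounding in the reported GLE values.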
418) Single or multiple target systems
Becker and Rosen (1990)
- A “single target” examination (e.g., minimum competency) is problematic
- Set too high, slower kids will be discouraged and drop out
- Set too low, advanced kids will be bored and may work less
- Examination systems should have multiple targets
Empirical Studies of 1970s-1980s Minimum Competency Exams (e.g., Ligon, Mangino, Babcock, Johnstone, Brightman, Davis)
- Performance of lowest students did improve, but that of advanced students either stayed flat or decreased
Jonathan Jacobson (1992)
- Longitudinal analysis of students from minimum competency states showed that slowest students gained and middle students lost
- Probably, the test induced resource flows to the slow students and away from the middle students
428) Examples of multiple target systems
Hierarchical, or “tiered,” systems: British system, New York State
- All students must pass exams with broad, common requirements, but at a choice of levels (Advanced or Ordinary; Competency or Honors)
- British just recently changed, creating a hybrid that looks more like continental exam systems
Branched or parallel track systems: most of Continental Europe
- Students choose (or the choice is made for them) where to concentrate their efforts, and they are tested mostly on that concentration
- First branching (junior high level) into academic, general, vocational
- Second branching (high school level) into subject area or vocational concentration
438) Some current research on testing system structure
John Bishop
- Suspects that standardized end-of-course or end-of-year examinations may be the optimal form of standardized testing.
- Why? Perhaps because they combine the best of both worlds:
  - standardized and external
  - concise, targeted, with very strong alignment between curriculum and test
Value-added systems
- Concerns for volatility and fairness mandate that the testing be frequent, at least annual
- Tests are not the only quality control measure; the question is how to optimize the whole set (Phelps, Just for the Kids, others…)
448) The more high-stakes decision points, the better the student performance?
SOURCE: Phelps, 2001
458) Quality control has proportionally greater effect in poorer countries SOURCE: Phelps, 2001
479) The industry structure game, in theory
Selfish consumers want a perfectly competitive industry
- Lots of producers, cutthroat competition
- Easy producer entry to, exit from industry
- Low prices, lots of choice and information
Selfish producers want to be monopolists
- Raise prices, lower quality
- Block new entrants, withhold information
489) The industry structure game, in practice
Consumers want stable suppliers, salespeople they know, brand names they can trust
- So, sure, they want competition, choice, and low prices…
- But they do not want to have to try out a new brand of detergent after every visit to the grocery store
Producers try to avoid outright monopoly, lest they be regulated or split up
- e.g., Microsoft pushes Apple and Corel to the brink of bankruptcy, then tosses each of them a lifeline to keep them in business (barely)
- So, the goal is to approach having a monopoly without quite having one
499) Competitive strategy theory
In industries with steep economies (of scale, scope, learning, …) there is only room for so many producers
- If you do not have the relevant “economies” in your firm, you had better focus on a specialty niche that makes you unique, or else get out
e.g., General Electric/RCA Consumer Electronics (1987)
- Crowded field: Sony, Zenith, Philips, Toshiba, Mitsubishi, others
- Sony: technological edge, reputation for quality, could charge high prices
- Niche players: Mitsubishi (big screen TVs); Sharp (flat panels)
- Low cost players: Koreans had entered the market, Chinese were purchasing the facilities of bankrupt American firms (e.g., Admiral, Philco, Sylvania)
- Japanese manufacturers were building assembly plants in US and Mexico in order to lower their shipping costs for large sets
- GE was “stuck in the middle”: it could not compete on cost or quality and had no unique niche, so it sold out
509) Possible sources of competitive advantage in the testing industry
Advantages related to scale economies
- Huge item banks take time to accumulate and test, and they are copyrighted (“sunk costs” => barrier to entry)
- Established client base, relationships
Advantages related to scope economies
- Much psychometric expertise is equally useful across a variety of tests
- Customers’ needs largely similar across states, countries
- Good brand name provides instant cachet in new markets
Advantages related to learning economies
- Experience working with, knowledge of clients
- Experience gained with a new type of product will lower cost for subsequent, similar projects
519) Niche markets in educational testing (where “economies” may be of little help)
- Custom-made performance tests, “built from scratch”
- Some special education and psychological testing that requires one-on-one administration, highly-specialized protocols, or licensed test administrators
- Some vocational-occupational testing that employs “hands on” demonstrations observed by specialists
- Oral interviews