Published by Janis Loraine May. Modified over 10 years ago.
1
Large-scale testing: Uses and abuses
Richard P. Phelps
Universidad Finis Terrae, Santiago, Chile
January 7, 2014
2
Large-scale testing: Uses and abuses
1. Three types of large-scale tests
2. Measuring test quality
3. A chronology of mistakes
4. Economists misunderstand testing
5. How SIMCE is affected
3
1. Three types of large-scale tests
- Achievement
- Aptitude
- Non-cognitive
4
Achievement tests
Historically, were larger versions of classroom tests
~1900 - "scientific" achievement tests developed (Germany & USA)
- J.M. Rice - systematically analyzed test structures & effects
- E.L. Thorndike - developed scoring scales
SOURCE: Phelps, Standardized Testing Primer, 2007
5
Achievement tests
Purpose: to measure how much you know and can recall
Developed using: content coverage analysis
How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades)
Requires a mastery of content prior to test.
Fairness assumes that all have same opportunity to learn content
Coachable - specific content is known in advance
SOURCE: Phelps, Standardized Testing Primer, 2007
6
Aptitude tests
1890s - A. Binet & T. Simon (France)
- Pre-school children with mental disabilities
- achievement test not possible
- developed content-free test of mental abilities (association, attention, memory, motor skills, reasoning)
1917 - Adapted by U.S. Army to select, assign soldiers in World War 1
1930s - Harvard University president J. Conant
- wanted new admission test to identify students from lower social classes with the potential to succeed at Harvard
- developed the first Scholastic Aptitude Test (SAT)
SOURCE: Phelps, Standardized Testing Primer, 2007
7
Aptitude tests
Purpose: predict how much can be learned
Developed using: skills/job analysis
How validated: predictive validity, correlation with future activity (e.g., university or job evaluations)
Content independent. Measures:
… what student does with content provided
… how student applies skills & abilities developed over a lifetime
Not easily coachable - the content is either…
… not known in advance,
… basic, broad, commonly known by all, curriculum-free;
… less dependent on the quality of schools
SOURCE: Phelps, Standardized Testing Primer, 2007
8
Aptitude tests
Aptitude tests can identify:
- Students bored in school who study what interests them on their own
- Students not well adapted to high school, but well adapted to university
- Students of high ability stuck in poor schools
SOURCE: Phelps, Standardized Testing Primer, 2007
9
Comparing Achievement & Aptitude tests

             Achievement        Aptitude
Measure      past learning      potential
Development  content analysis   job/skills analysis
Validation   retrospective      predictive
Content      dependent          independent
Coachable?   very much          not much
10
Non-cognitive tests
More recently developed - measure values, attitudes, preferences
Types:
- integrity tests
- career exploration
- matchmaking
- employment "fit"
11
Non-cognitive tests
Purpose: to identify "fit" with others or a situation
Developed using: surveys, personal interviews
How validated: success rate in future activities
Content is personal, not learned
"Faking" can be an issue (e.g., "honesty" tests)
12
Comparing Achievement, Aptitude, & Non-Cognitive Tests

             Achievement        Aptitude             Non-Cognitive
Measure      past learning      potential            attitudes, values, preferences
Development  content analysis   job/skills analysis  surveys
Validation   retrospective      predictive
Content      dependent          independent
Coachable?   very much          very little          can be faked
13
2. Measuring test quality
Three measures are important:
1. Predictive validity
2. Content coverage
3. Sub-group differences
Test reports can be "data dumps"
14
Predictive validity (values from -1.0 to +1.0)
… measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion)
A test with low predictive validity provides little information.
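The -1.0 to +1.0 range matches that of a correlation coefficient. As a sketch, assuming predictive validity is reported as the Pearson correlation between admission scores and a later outcome, it can be written as follows. The symbols are illustrative and do not come from the slides: x_i is student i's admission test score, y_i is that student's university outcome (for example, first-year grades), and n is the number of students.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le +1
```

Under this reading, r near +1 means higher scores go with better outcomes, and r near 0 means the score says little about the outcome.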
15
[Figure] A positive correlation between two measures
Source: NIST, Engineering Statistics Handbook
16
[Figure] A negative correlation between two measures
Source: NIST, Engineering Statistics Handbook
17
[Figure] No correlation between two measures
Source: NIST, Engineering Statistics Handbook
18
How does one measure predictive capacity?
Correlation coefficient: a scale running from -1 through 0 to +1
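A minimal sketch of how such a coefficient could be computed, assuming the Pearson correlation is the measure intended. The function name `pearson_r` and the score and grade values are hypothetical, invented for illustration only; they are not data from the presentation or from any real test.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical admission test scores and later first-year grades
# for five students (illustrative values only).
admission_scores = [450, 520, 580, 610, 700]
first_year_grades = [4.1, 4.6, 5.0, 5.3, 6.0]

r = pearson_r(admission_scores, first_year_grades)
print(f"predictive validity (Pearson r): {r:.2f}")
```

A value near +1 corresponds to the positive scatter plot, a value near -1 to the negative one, and a value near 0 to the plot with no visible relationship.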
19
Predictive validities: SAT and PSU
SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
20
Predictive validities: SAT and PSU (faculty: Administracion)
SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013