
Large-scale testing: Uses and abuses Richard P. Phelps Universidad Finis Terrae, Santiago, Chile January 7, 2014.


1 Large-scale testing: Uses and abuses Richard P. Phelps Universidad Finis Terrae, Santiago, Chile January 7, 2014

2 Large-scale testing: Uses and abuses
1. 3 types of large-scale tests
2. Measuring test quality
3. A chronology of mistakes
4. Economists misunderstand testing
5. How SIMCE is affected

3 1. Three types of large-scale tests
- Achievement
- Aptitude
- Non-cognitive

4 Achievement tests
Historically, achievement tests were larger versions of classroom tests.
~1900: “scientific” achievement tests developed (Germany & USA)
- J.M. Rice systematically analyzed test structures & effects
- E.L. Thorndike developed scoring scales
SOURCE: Phelps, Standardized Testing Primer, 2007

5 Achievement tests
Purpose: to measure how much you know and can recall
Developed using: content coverage analysis
How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades)
- Requires mastery of content prior to the test
- Fairness assumes that all have the same opportunity to learn the content
- Coachable: specific content is known in advance
SOURCE: Phelps, Standardized Testing Primer, 2007

6 Aptitude tests
1890s: A. Binet & T. Simon (France)
- Pre-school children with mental disabilities; achievement test not possible
- Developed content-free test of mental abilities (association, attention, memory, motor skills, reasoning)
1917: Adapted by U.S. Army to select and assign soldiers in World War I
1930s: Harvard University president J. Conant
- Wanted new admission test to identify students from lower social classes with the potential to succeed at Harvard
- Developed the first Scholastic Aptitude Test (SAT)
SOURCE: Phelps, Standardized Testing Primer, 2007

7 Aptitude tests
Purpose: to predict how much can be learned
Developed using: skills/job analysis
How validated: predictive validity, i.e., correlation with future activity (e.g., university or job evaluations)
Content independent. Measures:
- what the student does with the content provided
- how the student applies skills & abilities developed over a lifetime
Not easily coachable. The content is either:
- not known in advance,
- basic, broad, commonly known by all, curriculum-free;
- less dependent on the quality of schools
SOURCE: Phelps, Standardized Testing Primer, 2007

8 Aptitude tests
Aptitude tests can identify:
- Students bored in school who study what interests them on their own
- Students not well adapted to high school, but well adapted to university
- Students of high ability stuck in poor schools
SOURCE: Phelps, Standardized Testing Primer, 2007

9 Comparing Achievement & Aptitude tests

              Achievement        Aptitude
Measure       past learning      potential
Development   content analysis   job/skills analysis
Validation    retrospective      predictive
Content       dependent          independent
Coachable?    very much          not much

10 Non-cognitive tests
More recently developed; measure values, attitudes, preferences
Types:
- integrity tests
- career exploration
- matchmaking
- employment “fit”

11 Non-cognitive tests
Purpose: to identify “fit” with others or a situation
Developed using: surveys, personal interviews
How validated: success rate in future activities
- Content is personal, not learned
- “Faking” can be an issue (e.g., “honesty” tests)

12 Comparing Achievement, Aptitude, & Non-Cognitive Tests

              Achievement        Aptitude              Non-Cognitive
Measure       past learning      potential             attitudes, values, preferences
Development   content analysis   job/skills analysis   surveys
Validation    retrospective      predictive
Content       dependent          independent
Coachable?    very much          very little           can be faked

13 2. Measuring test quality
Three measures are important:
1. Predictive validity
2. Content coverage
3. Sub-group differences
Test reports can be “data dumps”

14 Predictive validity (values from -1.0 to +1.0)
Measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion).
A test with low predictive validity provides little information.
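As a rough illustration of how a predictive validity figure might be computed, the sketch below correlates admission test scores with a later university outcome. It assumes the Pearson correlation coefficient is the statistic in use, which the slide does not state. The function name `pearson_r` and all of the data are hypothetical and invented for illustration; they are not taken from the presentation or from any real test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its own mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Sum of cross-products and sums of squared deviations
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one measure is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data (assumption): six students' admission test scores
# and their first-year university grades.
admission_scores = [450, 520, 580, 610, 690, 720]
first_year_grades = [4.1, 4.8, 4.6, 5.3, 5.9, 6.2]

validity = pearson_r(admission_scores, first_year_grades)
print(f"predictive validity (Pearson r): {validity:.2f}")
```

A value near +1.0 would mean that higher admission scores go with better later outcomes; a value near 0 would mean the test says little about those outcomes.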

15 A positive correlation between two measures (Source: NIST, Engineering Statistics Handbook)

16 A negative correlation between two measures (Source: NIST, Engineering Statistics Handbook)

17 No correlation between two measures (Source: NIST, Engineering Statistics Handbook)

18 How does one measure predictive capacity?
Correlation coefficient: a scale running from -1 through 0 to +1
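The slide does not say which correlation coefficient is meant. Assuming it is the common Pearson coefficient, with x_i standing for one student's admission test score and y_i for that student's later outcome (both symbols are introduced here for illustration), the definition can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Here \bar{x} and \bar{y} are the means of the two measures. r = +1 corresponds to a perfect positive linear relationship, r = -1 to a perfect negative one, and r = 0 to no linear relationship.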

19 Predictive validities: SAT and PSU SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

20 Predictive validities: SAT and PSU (faculty: Administracion) SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

