Innovation and Growth of Large Scale Assessments Irwin Kirsch Educational Testing Service February 18, 2013.


1 Innovation and Growth of Large Scale Assessments. Irwin Kirsch, Educational Testing Service. February 18, 2013

2 Overview: Setting a context; Growth in Large Scale Assessments (LSA); Features of Large Scale Assessments (LSA); Growing importance of CBA; Innovations in recent LSA (PIAAC and PISA); Future areas for innovation

3 Setting a Context: Until relatively recently, educational data were not collected in a consistent or standardized manner. In 1958, a group of scholars representing various disciplines met at UNESCO in Hamburg, Germany, to discuss issues surrounding the evaluation of schools and students through the systematic collection of data relating to knowledge, skills and attitudes. Their meeting led to the development of a feasibility study of 13-year-olds in 12 countries covering 5 content areas and to the legal entity known as IEA.

4 Setting a Context: Back in the United States, the Commissioner of Education, Francis Keppel, invited Ralph Tyler in 1963 to develop a plan for the periodic assessment of student learning. Planning meetings were held in 1963 and 1964, and a technical advisory committee was formed. In April 1969, NAEP first assessed in-school 17-year-olds in citizenship, science and writing.

5 Setting a Context: Tyler’s vision for NAEP was that it would focus on what groups of students know and can do rather than on what score an individual might receive on a test. The assessment would be based on identified objectives whose specifications would be determined by subject matter experts. Reports would be based on the performance of selected groups, not individuals, who responded correctly to the exercises and would not rely on grade-level norms.

6 Setting a Context: Prior to IEA and NAEP, there were no assessment programs to measure students or adults as a group. The primary focus of educational testing had been on measuring individual differences in achievement rather than on students’ learning. And the data that were collected dealt primarily with the inputs to education rather than the yield of education.

7 Setting a Context: Interpretations would be limited to the set of items used in each assessment. This basic approach to large scale assessments remained in place through the 1970s. In the 1980s, programs beginning with NAEP began to use item response theory (IRT) to allow for the creation of scales and the broadening of inferences to items not included in the assessment. New methodology involving marginal estimation was developed to optimize the reporting of proficiency distributions based on complex designs such as balanced incomplete block (BIB) spiraling. This approach remains in use today.
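
To make the IRT point on this slide concrete, here is a minimal sketch of how placing items and respondents on a common scale allows statements about items a group never took. The slide does not say which IRT model NAEP adopted; the two-parameter logistic (2PL) model is assumed here purely for illustration, and every item parameter below is hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function (assumed model, not named in the slides).

    theta: respondent proficiency on the reporting scale
    a:     item discrimination
    b:     item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_percent_correct(theta, items):
    """Expected percent correct over a set of (a, b) item parameters."""
    total = sum(p_correct(theta, a, b) for a, b in items)
    return 100.0 * total / len(items)


# Hypothetical (discrimination, difficulty) pairs; not from any real assessment.
administered = [(1.0, -0.5), (1.2, 0.0)]
not_administered = [(0.8, 1.0)]

for theta in (-1.0, 0.0, 1.0):
    seen = expected_percent_correct(theta, administered)
    unseen = expected_percent_correct(theta, not_administered)
    print(f"theta={theta:+.1f}  administered={seen:.1f}%  not administered={unseen:.1f}%")
```

Because the calibrated item parameters and the proficiency estimate share one scale, the same function gives an expected result for the item that was not administered, which is the sense in which inferences are no longer limited to the items used.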

8 Growth and Expansion: … not being satisfied with assertions or self-reports; … in response to policy makers and researchers wanting to know more; … asking more challenging questions; … and creating both the need and opportunity for new methodological and technological developments

9 Growth and Expansion: Number of assessments; Participation of countries; Populations who are surveyed; Domains / Constructs that are measured; Methodology; Modes

10 Growth and Expansion: Large-Scale International Surveys. School-Based: PIRLS, TIMSS, PISA. Adults: IALS, ALL, PIAAC, STEP

11 Curriculum; Life skills; Measurement

12 Growth and Expansion: Curriculum; Life skills; Measurement

13 Features of LSA Assessment: Differ from individual testing in key ways. LSA are primarily concerned with accurately estimating the distribution of proficiency for a group of respondents rather than for individuals. In this way, the focus is on providing information that can inform policy and further research.

14 Features of LSA Assessment: Extensive framework development; Sampling; Weighting; Use of Complex Assessment Designs; IRT Modeling; Population Modeling; Connection to background variables; Increasing reliance on CBA

15 Growing Importance of Computer Based Assessments: Until very recently, all large scale national and international assessments were paper-based assessments with some optional computer-based components. PIAAC (2012) was the first large scale survey of adult skills in which the primary mode of delivery was computer, and paper and pencil became the option. In 2015, PISA will also use computers as the primary mode of delivery, with paper and pencil becoming an option for countries.

16 Why is Computer Based Assessment Important for Surveys such as PIAAC and PISA? Better reflects the ways in which students and adults access, use and communicate information; Enables surveys like PIAAC and PISA to broaden the range of skills that can be measured; Allows these surveys to take better advantage of both operational and measurement efficiencies that technology can provide

17 Goals of the PIAAC 2012 and PISA 2015 Assessment Designs: Establish the comparability of inferences across countries, across assessments and across modes; Broaden what can be measured by both extending the existing constructs and by being able to introduce new constructs; Reduce random and systematic error through the use of more complex designs, automated scoring, timing information, and adaptive testing

18 PIAAC Main Study Cognitive Assessment Design (flow diagram). Entry: ICT use from BQ, branching on No computer experience / Computer experience. Paper elements: CORE 4L + 4N; LITERACY 20 Tasks; NUMERACY 20 Tasks; READING COMPONENTS. Computer elements: CBA-Core Stage 1: ICT; CBA-Core Stage 2: 3L + 3N; then random assignment among LITERACY (Stage 1, 9 tasks; Stage 2, 11 tasks), NUMERACY (Stage 1, 9 tasks; Stage 2, 11 tasks) and PS in TRE modules. Branch labels: Pass / Fail.

19 Average Proficiency Scores by Domain and Subgroups (table; columns: Literacy, Numeracy, PSTRE; rows: No ICT, Failed CBA Core, Refused CBA, CBA)

20 Cumulative Distribution of Numeracy Proficiency by Subgroups

21 Percentage of Item-by-Country Interactions. Literacy: 8%, 146 out of 1748 pairs (76 items x 23 countries). Numeracy: 7%, 118 out of 1748 pairs (76 items x 23 countries). PSTRE: 3%, 8 out of 280 pairs (14 items x 20 countries). * Literacy and numeracy interactions go across modes and time
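
The percentages on this slide follow directly from the counts it reports. As a quick check, the sketch below recomputes the number of item-by-country pairs and the share flagged for each domain; the function name and the dictionary layout are illustrative choices, while the counts are taken from the slide.

```python
def interaction_rate(flagged, items, countries):
    """Return (total item-by-country pairs, percent of pairs flagged)."""
    pairs = items * countries
    return pairs, 100.0 * flagged / pairs


# Counts as reported on the slide: (flagged pairs, items, countries).
reported = {
    "Literacy": (146, 76, 23),
    "Numeracy": (118, 76, 23),
    "PSTRE": (8, 14, 20),
}

rates = {domain: interaction_rate(*counts) for domain, counts in reported.items()}

for domain, (pairs, pct) in rates.items():
    print(f"{domain}: {pairs} pairs, {pct:.2f}% flagged")
```

Rounded to whole percentages, these reproduce the 8%, 7% and 3% shown for literacy, numeracy and PSTRE.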

22 Number of Unique Parameters for Each Country - Numeracy

23 Maintaining and Improving Measurement of Trends: Proposal for PISA 2015 is to enhance and stabilise the measurement of trend data; Refocus the balance between random and systematic errors

24 Maintaining and Improving Measurement of Trends (two charts of construct coverage, MAJOR vs. minor domains). Construct Coverage in the Current PISA Design by Major and Minor Domains: Height of the bars represents the proportion of items measured in each assessment cycle by domain. The reduced height of the bars for the minor domains represents the reduction of items in that domain and therefore the degree to which construct coverage has been reduced. Recommended Approach for Measuring Trends in PISA 2015 and Beyond: Recommended approach stabilizes trend through reducing bias by including all items in each minor domain while reducing the number of students responding to each item. Width conveys the relative number of students who respond to each item within the domain.

25 Maintaining and Improving Measurement of Trends: Impact Over Cycles (figure). Domain Rotation: MAJOR 2006, minor 2009, minor 2012, MAJOR 2015, minor 2018, minor 2021. Legend: New Items; New Items Reflecting New Construct; New Items Reflecting Old Construct; Trend Items. Annotations: Scientific Literacy as a major domain: new items; Scientific Literacy as a minor domain: new trend line from a construct point of view.

26 Future Innovations: Introduction of new item types; Use of fully automated scoring; More flexible use of languages; Development of research around process information contained in log files; Introduction of more complex psychometric models; Development of derivative products

27 Summary: Large scale international assessments continue to grow in importance. Computer based assessments are now feasible and will become the standard for development and delivery; they … better reflect the ways in which people now access, use and communicate information; add efficiency and quality to the data; introduce innovation that broadens what can be measured and reported

28 Questions and Discussion


30 The design for PIAAC was able to … Broaden what was measured; Demonstrate high comparability among countries, over time and across modes; Introduce multi-stage adaptive testing; Include the use of timing information to better distinguish between omitted and not-reached items; Demonstrate an improvement in the quality of the data that was collected

31 Growth and Expansion: Curriculum; Life skills; Measurement

