PIAAC 2013 results: Care needed in reading reports of international surveys. Jeff Evans, j.evans@mdx.ac.uk. ALM Webinar, 18 March 2014

2 Plan 1. Introducing PIAAC (Programme for the International Assessment of Adult Competencies, aka Survey of Adult Skills), including its concept of adult numeracy 2. Social surveys, and several key issues of survey validity 3. Findings for the UK sample and international comparisons; consideration of various interpretations currently circulating

3 PIAAC (Programme for the International Assessment of Adult Competencies, aka Survey of Adult Skills). Fieldwork in 2011-12; results available in Oct. 2013. Measures: Literacy, Numeracy, and Problem solving in technology-rich environments (PSTRE). Samples: adults usually aged 16-65; 5000 [or more*] per country. Builds on the earlier IALS (1990s) and ALLS (2002-06), BUT … larger sample of 24 “industrial” countries in the 1st round; uses computer administration, which allows ‘adaptive routing’ to find the appropriate “level” of the respondent; methodological & fieldwork improvements, e.g. regulation of sampling and fieldwork standards. Some affinity to PISA (15-year-olds), BUT different concepts; PIAAC uses household survey methodology + educ’l testing

4 PIAAC aims. Education Directorate at OECD (PIAAC sponsor): helping countries to: identify and measure differences between individuals and across countries in key “competencies”; relate measures of skills based on these competencies to individual outcomes, e.g. labour market participation / earnings / further learning, or to aggregate outcomes, e.g. economic growth, or social equity in the labour market; assess performance of education / training systems, to enhance competencies through the formal educational system – or in the work-place, through incentives (Schleicher, 2008)

5 PIAAC concepts and measures (1) OECD: competencies: […] abilities, capacities or dispositions embedded in the individual […] cognitive skills & knowledge base are critical elements, [but] important […] to include other aspects such as motivation and value orientation. Numeracy: the ability to access, use, interpret, communicate mathematical information & ideas, to engage in / manage mathematical demands of a range of situations in adult life. Conceptualisation (PIAAC Numeracy Expert Group, 2009)

6 Social Surveys – a distinctive method. Standardised measure for every respondent allows comparison of “like with like”. Emphasises representativeness: sampling is ‘random’, BUT this ALSO produces sampling variation (‘error’). So … need statistical inference, using the SAMPLE (n=5000): significance testing of a hypothesis about the value in the POPULATION, e.g. average numeracy score in the UK … or ‘confidence interval’ estimation of the value in the POPULATION: sample estimate ± margin of error. Thus uses probability to reduce uncertainty: illustrations below
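The idea of a sample estimate plus or minus a margin of error can be sketched in a few lines of Python. This is a minimal illustration that assumes simple random sampling and a normal approximation; the scores and the function name `mean_confidence_interval` are hypothetical. PIAAC's published standard errors also reflect its sample design and the psychometric scaling (see the PIAAC Technical Report), which this sketch ignores.

```python
import math
import statistics


def mean_confidence_interval(scores, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval.

    Assumes simple random sampling and a normal approximation (z = 1.96).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    margin = z * standard_error
    return mean, mean - margin, mean + margin


# Hypothetical numeracy scores for a tiny sample (a real country sample has n of about 5000).
sample = [250, 262, 241, 275, 230, 268, 255, 249]
mean, lower, upper = mean_confidence_interval(sample)
print(f"estimate {mean:.1f}, interval {lower:.1f} to {upper:.1f}")
```

Because the standard error shrinks with the square root of n, a sample of 5000 gives a far narrower interval than this toy sample of 8.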

7 Surveys (non-experiments): issues of validity. Several concerns: appropriateness of indicators for the concepts to be measured … Construct Validity; comparability across countries, or across groups, where one wishes to assess the effect of other differences, such as gender or amount of formal schooling … Internal Validity [Campbell & Stanley (1966), arguing that controlled experiments (now aka RCTs) do not solve everything]; representativeness and generalisability of findings outside the research context … External Validity

8 PIAAC concepts and measures (2) To produce measures, must characterise numerate behaviour; dimensions used in construction / validation of set of items: context (4 types): everyday life, work, societal, further learning; response (or ‘cognitive strategy’, 3 main types): identify / locate / access (information); act on / use; interpret / evaluate; mathematical content (4 main types): quantity & number, dimension & shape, pattern & relationships, data & chance; representations (of mathematical / statistical information): e.g. text, tables, graphs. Also Background Questionnaire: demographic & attitudinal information, e.g. level of trust, political efficacy, health + Job-Related Assessment: use of / need for skills at work

9 Methodology (1) the content validity of the definitions of numeracy and numerate behaviour [‘types’ of items]; the measurement validity of the items presented, including the administration and scoring procedures [‘qualities’ of items]; the reliability of the measurement procedures; the internal validity, or validity of (‘effective’) relationships claimed (within the sample), e.g. between skill scores and desirable life outcomes, e.g. wages, employment, health; the external validity, or representativeness, for the national population of interest, of the results produced from the sample. … Similar dilemmas for most educational assessment, and for both Qual. and Quant. educational research.

10 Methodology (2) Content validity: the extent to which a measure represents all facets of a given concept: … Here the definition of numeracy is based on 4 stipulated dimensions of numerate behaviour: context, content, response, representation. Each item can be categorised on these four dimensions, and the proportion of items falling into each category can be controlled over the scale, so as to enhance the transparency of the operational definition. However, this is a standard definition … (generalising) ... How well does it “fit” adults’ lives in any particular country? Further, the four types of context (everyday, work, society and community, further learning) are under-specified: rather too general to refer to any actual specific social practice or social context, in any particular respondent’s everyday life. (Evans, Wedege & Yasukawa, 2013)

11 Methodology (3) Measurement validity: extent to which person’s responses to set of items actually capture what the conceptualisation of numeracy specifies Depends on the actual range of items used: see 3 illustrative items presented by OECD (2013) / on websites (e.g. CSO Ireland, PIAAC 2012 Results) … and next slide Requires design of procedures for administration of the survey to be standardised across all countries, e.g. training of interviewers / testers; design specs. of the laptops (& software) to be used, and rules for access to calculators and other aids. Full appreciation of the validity of procedures requires assurance of how these procedures are followed in the field … even more crucial when results are compared across countries using different fieldwork teams (see PIAAC Technical Report).

12 Numeracy – Sample Item 3. This sample item (of difficulty level 4) focuses on the following aspects of the numeracy construct: Content: Quantity and number; Process: Act upon, use (compute); Context: Community and society. Correct response: one of the three values (no values between): 595, 596 or 600.

13 Methodology (4) External validity: includes representativeness of the sample for the “population” … check a country’s sample design + other fieldwork aspects, e.g. incentives for completing interview … & judgments depend on knowing about actual field practices. SO any summaries, e.g. mean scores, or gender differences, are sample-based estimates for the population value (of the mean score or size of gender difference ...) for country x. These interval estimates are not exact, but show a margin of error [say, 2* standard errors on either side; *the multiplier depends on the level of confidence desired in the estimate]. This can produce surprises, e.g. PIAAC numeracy, overall country results 2013: Japan = 288 (286 to 290); Finland = 282 (280 to 284); NL / BELG = 280 (278 to 282) (overlap!)
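The overlap in the country results above can be checked mechanically. The sketch below uses the three means from the slide and the 2-point margins implied by its intervals; the helper names are hypothetical. Overlapping intervals are only a rough signal that a ranking may not be reliable; a formal comparison would test the difference between two countries directly.

```python
from itertools import combinations

MARGIN = 2  # points either side of the mean, as implied by the intervals on the slide

# Mean numeracy scores as reported on the slide.
means = {"Japan": 288, "Finland": 282, "NL / Belgium": 280}


def interval(mean, margin):
    """Return the (lower, upper) interval estimate around a sample mean."""
    return (mean - margin, mean + margin)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


intervals = {name: interval(mean, MARGIN) for name, mean in means.items()}

for first, second in combinations(intervals, 2):
    verdict = "overlap" if overlaps(intervals[first], intervals[second]) else "no overlap"
    print(f"{first} {intervals[first]} vs {second} {intervals[second]}: {verdict}")
```

On these figures Japan is clear of both other intervals, while the Finland and NL / Belgium intervals overlap, so the order of those two could easily be reversed in another sample.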

14 Methodology (5) Reliability of test administration across countries and across interviewers, especially assuring same standards / practices in marking (problem with past international surveys) … Computer presentation and marking will help greatly. But it may tend to undermine construct validity, if it reduces the range of types of question that can be asked (example) … And increasing the reliability may lead to concerns about ecological validity: whether the setting of the research is representative of those to which one wishes to generalise the results. For example, on-screen presentation may limit this?

15 Presentation of Results (1) An adult’s performance is not expressed as ‘proportion correct’, since with adaptive routing some are presented with ‘harder’ items. So Item Response Theory (IRT) is used to (‘psychometrically’) estimate a standardised score (e.g. mean 250, std dev 50) (e.g. Tout, 2013). Then, to make numerical scores meaningful, they are commonly related to one of 5 general ‘levels’ of literacy or numeracy …
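Why ‘proportion correct’ breaks down under adaptive routing can be illustrated with the simplest IRT model, the one-parameter (Rasch) logistic model. This choice is an assumption made for illustration only: the operational PIAAC scaling is more elaborate (see the PIAAC Technical Report), and the ability and difficulty values below are hypothetical. The last function shows the kind of linear rescaling to mean 250 and standard deviation 50 given as an example above.

```python
import math


def p_correct(ability, difficulty):
    """Rasch model: probability of a correct response (both values on a logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_proportion_correct(ability, difficulties):
    """Average success probability of one respondent over a set of items."""
    return sum(p_correct(ability, d) for d in difficulties) / len(difficulties)


def to_reported_scale(z_score, mean=250.0, sd=50.0):
    """Linear rescaling of a standardised ability estimate (illustrative values)."""
    return mean + sd * z_score


ability = 0.5                      # hypothetical respondent
easier_items = [-1.0, -0.5, 0.0]   # hypothetical item difficulties (easier route)
harder_items = [0.5, 1.0, 1.5]     # hypothetical item difficulties (harder route)

print(expected_proportion_correct(ability, easier_items))
print(expected_proportion_correct(ability, harder_items))
print(to_reported_scale(ability))
```

The same respondent is expected to get a smaller share of the harder items right, so raw proportions from different routes are not comparable, whereas the estimated ability is on one common scale.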

16 PIAAC Proficiency levels: numeracy
Level (score range): description of tasks
Below Level 1 (lower than 176): Tasks at this level require the respondents to carry out simple processes such as counting, sorting, performing basic arithmetic operations with whole numbers or money, or recognising common spatial representations in concrete, familiar contexts where the mathematical content is explicit with little or no text or distractors.
Level 1 (176-225): Tasks at this level require the respondent to carry out basic mathematical processes in common, concrete contexts where the mathematical content is explicit with little text and minimal distractors. Tasks usually require one-step or simple processes involving counting; sorting; performing basic arithmetic operations; understanding simple percents such as 50%; and locating and identifying elements of simple or common graphical or spatial representations.
Level 2 (226-275): Tasks at this level require the respondent to identify and act on mathematical information and ideas embedded in a range of common contexts where the mathematical content is fairly explicit or visual with relatively few distractors. Tasks tend to require the application of two or more steps or processes involving calculation with whole numbers and common decimals, percents and fractions; simple measurement and spatial representation; estimation; and interpretation of relatively simple data and statistics in texts, tables and graphs.
Level 3 (276-325): Tasks at this level require the respondent to understand mathematical information that may be less explicit, embedded in contexts that are not always familiar and represented in more complex ways. Tasks require several steps and may involve the choice of problem-solving strategies and relevant processes. Tasks tend to require the application of number sense and spatial sense; recognising and working with mathematical relationships, patterns, and proportions expressed in verbal or numerical form; and interpretation and basic analysis of data and statistics in texts, tables and graphs.
Level 4 (326-375): Tasks at this level require the respondent to understand a broad range of mathematical information that may be complex, abstract or embedded in unfamiliar contexts. These tasks involve undertaking multiple steps and choosing relevant problem-solving strategies and processes. Tasks tend to require analysis and more complex reasoning about quantities and data; statistics and chance; spatial relationships; and change, proportions and formulas. Tasks at this level may also require understanding arguments or communicating well-reasoned explanations for answers or choices.
Level 5 (376 or higher): Tasks at this level require the respondent to understand complex representations and abstract and formal mathematical and statistical ideas, possibly embedded in complex texts. Respondents may have to integrate multiple types of mathematical information where considerable translation or interpretation is required; draw inferences; develop or work with mathematical arguments or models; and justify, evaluate and critically reflect upon solutions or choices.
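The mapping from a numeracy score to a proficiency level in the table above can be written as a simple lookup. This is a sketch with hypothetical names; it treats a score of exactly 376 as Level 5 (since Level 4 ends at 375) and assigns fractional scores between two listed ranges to the lower level, both of which are assumptions.

```python
import bisect

# Lower bounds of Levels 1 to 5, taken from the score ranges in the table.
LOWER_BOUNDS = [176, 226, 276, 326, 376]
LABELS = ["Below Level 1", "Level 1", "Level 2", "Level 3", "Level 4", "Level 5"]


def numeracy_level(score):
    """Return the proficiency level label for a numeracy score."""
    # bisect_right counts how many lower bounds the score has reached.
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, score)]


# Country means quoted elsewhere in this presentation all fall in Level 3.
for score in (175, 176, 280, 288, 376):
    print(score, numeracy_level(score))
```

A lookup like this also shows how much a level hides: scores of 276 and 325 get the same label although they are almost a full standard deviation apart on a scale with std dev 50.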

17 Presentation of Results (2) BUT this is a simple, one-dimensional sense of ‘level’ … e.g. “levels embody predetermined assumptions about progression and relative difficulty” (Gillespie (2004), referring to UK Skills for Life). Partly because many adults have different “spiky profiles” and distinctive life experiences: some find type A items (e.g. “data & chance”) more difficult; others, type B items (e.g. “dimension & shape”) ... Some policy makers attempt to stipulate a “minimum level of numeracy needed to cope with the demands of adult life” in a particular country - BUT this is not supported by OECD [cf. IALS] … or by Canada (Bussière, Centre for Literacy Webinar, 17 Feb. 2014) … in Australia, there is debate (see Tout, 2013; Black & Yasukawa, 2014). Such a stipulation tends to assume ‘demands’ are the same across countries, and conflates adults with different work, family, social situations.

18 Some interpretations of PIAAC results (1) In each of 24 countries reporting PIAAC results in 2013, the media seem to focus on “prominent results”: You can check them out in your country (cf. Hamilton, Yasukawa & Evans, ESREA 2014) … For example, in the UK … “the UK (England and Northern Ireland) performed significantly below average in numeracy” …

19 Results (1)

20 Results (1a) Not only Means … look at the Spreads

21 Some interpretations of PIAAC results (2) Prominent in the UK: the UK (England and Northern Ireland) performed significantly below average in numeracy – with particular problems among the 16-24 age group where the UK came 21st out of 24 industrialised countries. … “UK faces a shrinking pool of skills, with England the only country where the skills of young people are below those of older people.”

22 Results (2)

23 Some interpretations of PIAAC results (3) OECD (Country Note: UK, 2013, p. 2): “The median hourly wage of workers who score at Level 4 or 5 in literacy is 94% higher than that of workers who score at or below Level 1.”

24 Results (3)

25 Results (3a): another correlation

26 Other results: early impressions 1. Within-country results complex: much “fun” for media, politicians, spin-doctors, since 1a. some praiseworthy and some regrettable findings for almost everyone 2. Between-country results ‘striking’ … 2a. e.g. much discussion of age / generation differences; patterns vary widely 2b. but need to allow for sampling variation, and it is even harder to control for the wide range of cultural differences between countries or groups

27 Other results: methodological tools 3. In interpretation of results, beware: A. Is the ‘numeracy’ (literacy, PSTRE) measured an appropriate indicator for the ‘numeracy’ referred to in research, policy and pedagogical discussions? [Construct Validity – several dims.] B. Many of the interesting findings are correlations, but not necessarily causal [Internal Validity] C. All scores for countries and subgroups are double estimates: sample estimates and “psychometric” (IRT) estimates [External Validity]

28 What is to be done? (1) … by researchers and tutors / practitioners working together ** Generalising (Evans, Wedege & Yasukawa): bring research evidence / practitioner experience to argue (remind) that: Adult numeracy is distinct from School Maths; Adult numeracy is distinctive in different settings; Adult numeracy is distinctive across different cultures, i.e. different subgroups AND different countries ** One-dimensionality: Adult numeracy is multi-dimensional ** Need other kinds of research: local surveys, case studies, incl. life histories (cf. Barton & Hamilton, 2012 on local literacies)

29 What is to be done? (2) Examples of possible research topics: a. Are numeracy levels higher in England than in NI; and, if so, why? E.g. higher educational qualifications – or higher levels of numerate experience at work? b. Why does a higher proportion of males (17%) than of females (9%) attain scores at Level 4/5 on the numeracy scale in Australia? c. Why are the proportions of people at Level 1 (& below) generally highest in the oldest age groups (people aged 60+)? Does this indicate, as sometimes claimed, that “a person’s skills deteriorate over the life-course”?

30 References
OECD (2013). OECD Skills Outlook 2013: First Results from the Survey of Adult Skills. Paris: OECD. Online: http://www.oecd.org/site/piaac/#d.en.221854
OECD (2013). The Survey of Adult Skills: Reader’s Companion. Paris: OECD. Online: http://www.oecd.org/site/piaac/#d.en.221854
OECD (2013). Survey of Adult Skills First Results: Country Note - England and Northern Ireland. Paris: OECD. Online: http://www.oecd.org/site/piaac/#d.en.221854
Evans, J. (2013/14). What to look for in PIAAC results: reading reports from international surveys; paper given at ALM-20; revised to appear in ALM-IJ.
Evans, J., Wedege, T. & Yasukawa, K. (2013). Critical Perspectives on Adults’ Mathematics Education; in M. A. Clements, A. Bishop, C. Keitel, J. Kilpatrick and F. Leung (Eds.), Third International Handbook of Mathematics Education, New York: Springer.
Tout, D. (2013). Lessons Learned from International Assessments, Fine Print, 36, 2.
Black, S. & Yasukawa, K. (2014). Level 3: another single measure of adult literacy and numeracy, Australian Educational Researcher, 41, 2, April, 125-138.
Barton, D. & Hamilton, M. (2012). Local Literacies: Reading and Writing in One Community. London: Routledge.
Hamilton, M. (2011). Literacy and the Politics of Representation. London: Routledge.

31 Appendices. The current 24 participating countries in PIAAC include: 17 EU members, plus USA, Can., Aus, Japan, Korea, possibly Russian Federation. Developing countries are not involved in Round 1, including BRIC (except Russia). Round 2 includes: Chile, Greece, Indonesia, Israel, Lithuania, New Zealand, Slovenia, Singapore, Turkey. Results expected in 2016. Illustrative items (3 slides): taken from OECD (2013), The Survey of Adult Skills: Reader’s Companion, pp. 28-30. Available online. Claimed “equivalences” among different qualifications in Engl.

32 Numeracy – Sample Item 1

33 Numeracy – Sample Item 2

34 Equivalences ? Notice: columns 1, 2, and final – neat equivalences claimed between different tests and age groups

