CEM Primary Overview
Purposes of assessment
Formative role
- Means of providing feedback to teachers about ongoing progress in learning.
- It has a direct influence on the quality of pupils’ learning experiences and thus on the level of attainment which can be achieved.
Summative role
- The means for communicating the nature and level of pupils’ achievements at various points in their schooling and when they leave.
Certification role
- Used as a means of summarising, for the purposes of selection and qualification, what has been achieved.
Evaluative or quality control role
- Provides part of the information used in judging the effectiveness of educational institutions and of the system as a whole.
(Harlen, Gipps et al., 1992)
Assessment and pupils
Pupils are exposed to assessment on a daily basis:
- Observations of play
- Listening to reading
- Oral questioning
- Spelling tests
- Teacher assessment
- Standardised assessment
- National testing, e.g. SATs, GCSE
These range from informal and low stakes to formal and high stakes.
The role of assessment can be unclear.
Consequential validity
The social consequences of using a particular test for a particular purpose.
- Standardised assessment; national testing, e.g. SATs, GCSE
- The higher the stakes, the more likely it is that decisions will be made about pupils that may affect their educational future
- The higher the stakes, the more important it becomes that the assessment is good, i.e. it measures fairly and measures well
Some testing experts use consequential validity to refer to the social consequences of using a particular test for a particular purpose. The use of a test is said to have consequential validity to the extent that society benefits from that use of the test. Other testing experts believe that the social consequences of using a test, however important they may be, are not properly part of the concept of validity. Messick (1988) makes the point that "...it is not that adverse social consequences of test use render the use invalid but, rather, that adverse social consequences should not be attributable to any source of test invalidity such as construct-irrelevant variance." For example, suppose some subgroups obtain lower scores on a mathematics placement test and, consequently, are required to take developmental courses. According to Messick, this action alone does not render the test scores invalid. However, suppose it was determined that the test was measuring different traits for the particular subgroup than for the larger group, and those traits were not important for doing the required mathematics. In this case, one could conclude that the adverse social consequences (e.g. more subgroup members in developmental mathematics courses) were caused by using the test scores and were traceable to sources of invalidity. In that case, the validity of the test use (course placement) would be jeopardized.
Confidence in the measure
- Developed from items that are good indicators of later progress
- Reliable
  - Test/retest (around 0.98)
  - Internal (0.8 – 0.98)
  - Concurrent (between 0.71 and 0.86 with KS2 SATs)
- Manageable
- Appropriate for the child’s ability
- Equitable
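Internal-consistency figures in the range quoted above are commonly reported as Cronbach’s alpha. The deck does not state which statistic CEM uses, so the sketch below is purely illustrative: the `cronbach_alpha` function and the simulated `items` matrix are assumptions, not CEM’s method or data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Internal-consistency reliability for a (pupils x items) score matrix."""
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1)        # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of pupils' total scores
    return (k / (k - 1)) * (1 - item_var.sum() / total_var)

# Synthetic data: 200 pupils, 10 binary items driven by a shared ability factor
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))
items = (ability + rng.normal(scale=1.0, size=(200, 10)) > 0).astype(float)
print(round(cronbach_alpha(items), 2))
```

Values near 1 indicate that items measure a common trait consistently; a test/retest coefficient would instead correlate scores from two administrations of the same assessment.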
Equitable assessment
- Special educational needs
- Physical or emotional disabilities
- Diverse language and cultural backgrounds
An example – the achievement gap
- Studies from the US and the UK have shown English Language Learners (ELLs) to have lower scores than their native English-speaking peers
- It is not a white vs black issue
- It is not entirely a socio-economic issue
- Most studies are based on high-stakes assessment
Research
- Coleman report, 1966: verbal and non-verbal reasoning, reading and maths
- Swann report, 1985: West Indian children underachieving; IQ not a significant factor
- Harvard meta-analysis: 11 studies with 23,000 participants; ELLs have lower scores in maths and science
Research
- Department for Education, 2010: on average, children of any black background achieved below the national level in reading, writing, mathematics and science at KS1 and KS2
- TIMSS 2008:
  - Children whose parents are both born in the UK were likely to score higher in maths
  - Children who always or almost always spoke the language of the test were generally more able
Equitable assessment
To what extent is the test itself responsible for some of the gap?
Equity & CEM assessment
- Assessments developed in the UK
- Stage-by-stage approach to looking at how the assessments work for international students
- First evidence comes from the vocabulary and reading assessments for 5-11 year olds
Participants
- UK-based sample
- Sample from a group of international schools in East Asia
Analysis
- Compared the responses of each sample to each question in the test
- Differential item functioning (DIF) analysis used to see whether items were easier for one group or another
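One common way to screen items for differential item functioning is the Mantel-Haenszel procedure, which compares the odds of answering an item correctly across groups after matching pupils on total test score. The deck does not say which DIF method was used, so this is a generic sketch with simulated data; all variable names and the data-generating model are assumptions.

```python
import numpy as np

def mantel_haenszel_or(item, group, total):
    """Mantel-Haenszel odds ratio for one item.

    item:  0/1 correctness on the studied item
    group: 0 = reference sample, 1 = focal sample
    total: matching variable, e.g. total test score
    """
    num = den = 0.0
    for s in np.unique(total):        # one 2x2 table per score stratum
        m = total == s
        a = np.sum(m & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(m & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(m & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(m & (group == 1) & (item == 0))  # focal, incorrect
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    return num / den if den else float("nan")

# Simulated data with no built-in DIF: both groups share the same item model
rng = np.random.default_rng(1)
n_pupils = 2000
group = rng.integers(0, 2, n_pupils)
ability = rng.normal(size=n_pupils)
total = np.clip(np.round(ability * 3 + 10), 0, 20).astype(int)
p_correct = 1 / (1 + np.exp(-ability))
item = (rng.random(n_pupils) < p_correct).astype(int)
print(round(mantel_haenszel_or(item, group, total), 2))
```

An odds ratio near 1 suggests the item behaves similarly for both groups once ability is controlled for; ratios well above or below 1 flag the item for review.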
Reading – Word recognition
Reading – Word decoding
Reading – Spelling
Vocabulary
- Drenched
- Daffodil
- Saxophone
- Aquarium
- Lantern
- Transport
- Luggage
What does this tell us?
- The international pupils find some questions harder to answer than others
- Roughly the same number of questions are easier for the international pupils
- The sections of CEM assessments analysed do not appear to advantage non-ELL pupils
What can the US studies tell us?
- Various accommodations have been suggested to level the playing field
- These make specific changes to the test format or test conditions
- Varying degrees of success
- Accommodations must be both effective and valid
Effect size
Effect size reports the magnitude of the difference in achievement between the two groups.
- 0.2: small effect
- 0.8: large effect
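The 0.2 and 0.8 benchmarks are Cohen’s conventions for a standardised mean difference (Cohen’s d): the gap between group means divided by the pooled standard deviation. A minimal sketch, using hypothetical score lists (the numbers are invented for illustration):

```python
import math

def cohens_d(group_a, group_b):
    """Standardised mean difference: (mean_a - mean_b) / pooled SD."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)  # sample variance
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical standardised scores for two groups of pupils
ell_scores = [95, 99, 97, 103, 96]
non_ell_scores = [100, 104, 98, 110, 102]
print(round(cohens_d(non_ell_scores, ell_scores), 2))  # prints 1.22
```

Because d is expressed in standard-deviation units, it allows gaps measured on different tests and scales to be compared directly.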
What worked? (Kieffer, Lesaux, Rivera and Francis, 2009)
0.15 – a small effect
Accommodations examined:
- Simplified English
- English dictionaries or glossaries
- Bilingual dictionaries or glossaries
- Tests written in native language
- Dual language test booklets
- Dual language questions for English passages
- Extra time
Findings
- Only English language dictionaries or glossaries were found to have a positive and significant effect
- The practical impact of using English dictionaries or glossaries might be a reduction of the achievement gap by between 10% and 25%
- The accommodation was found to be valid, as it did not improve the non-ELL scores
- This was true for a homogeneous group of students
- Not controlled for other variables, e.g. non-verbal ability
Other recommendations
- Reduce language load – helps all students
- Include graphical or visual support
- Include local and situated perspectives in test development
- Provide alternative norms
- Test preparation support
- Do not violate ethical norms: accommodations should not increase scores without a corresponding increase in mastery of the curriculum
Way forward
Despite our best efforts, no test is likely to measure all students fairly – so how can standardised tests, which impact upon real lives, be made equitable?
- Further examination of achievement gaps
- Appropriate interpretation of standardised scores for certain groups
- Development of an interpretative tool
More information
- Fairbairn (2009). Inclusive achievement testing for linguistically and culturally diverse test takers.
- Kieffer, Lesaux, Rivera and Francis (2009). Accommodations for English Language Learners Taking Large-Scale Assessments: A Meta-Analysis on Effectiveness and Validity.
- 2005 National Assessment of Educational Progress, National Center for Education Statistics.
- Jamal Abedi, Professor of Education, Vanderbilt University, Tennessee.
- AERA (American Educational Research Association)
- BERA (British Educational Research Association)
Contact
If you are interested in being involved in research on equitable assessment, please contact me.
katharine.bailey@cem.dur.ac.uk