Presentation on theme: "Quality Control in Evaluation and Assessment" — Presentation transcript:
1 Quality Control in Evaluation and Assessment. J. Charles Alderson, Department of Linguistics and Modern English Language, Lancaster University
2 “Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”
3 Outline
· Current practice
· Assessment for certification
· Tradition one: teacher-centred, school-based
· Tradition two: central, quality-controlled
· Basic parameters
· What is needed to ensure the parameters are met
4 Current practice
· Quality of important examinations not monitored
· No obligation to show that exams are relevant, fair, unbiased, reliable, and measure relevant skills
· A university degree in a foreign language qualifies one to examine language competence, despite lack of training in language testing
· In many circumstances merely being a native speaker qualifies one to assess language competence
· Teachers assess students’ ability without having been trained
5 First tradition
· Teacher-centred
· School/university-based assessment
· Teacher develops the questions
· Teacher's opinion the only one that counts
· Teacher-examiners have no explicit marking criteria
· Assumption that, by virtue of being a teacher and having taught the student being examined, the teacher-examiner makes reliable and valid judgements
· Authority, professionalism, reliability and validity of the teacher rarely questioned
· Rare for students to fail
6 Second tradition
· Tests externally developed and administered
· National or regional agencies responsible for development, following accepted standards
· Tests centrally constructed, piloted and revised
· Difficulty levels empirically determined
· Externally trained assessors
· Empirical equating to known standards or levels of proficiency
8 “Validity in general refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent that it measures what it is supposed to measure. It follows that the term valid, when used to describe a test, should usually be accompanied by the preposition for. Any test may then be valid for some purposes, but not for others.” (Henning, 1987)
10 How can validity be established?
· My parents think the test looks good.
· The test measures what I have been taught.
· My teachers tell me that the test is communicative and authentic.
· If I take the Rigo utca test instead of the FCE, I will get the same result.
· I got a good English test result, and I had no difficulty studying in English at university.
11 How can validity be established?
· Does the test look valid to the general public?
· Does the test match the curriculum, or its specifications?
· Is the test based adequately on a relevant and acceptable theory?
12 How can validity be established?
· Does the test yield results similar to those from a test known to be valid for the same audience and purpose?
· Does the test predict a learner’s future achievements?
· Note: a test that is not reliable cannot, by definition, be valid.
13 How can validity be established?
· A test’s items should work well: they should be of suitable difficulty, and good students should get them right, whilst weak students are expected to get them wrong.
· All tests should be piloted, and the results analysed to see if the test performed as predicted.
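The item behaviour described on this slide is conventionally quantified with a facility value (the proportion of candidates answering an item correctly) and a discrimination index (how much better high-scoring candidates do on the item than low-scoring ones). A minimal sketch in Python, assuming binary item scoring; the candidate scores below are invented for illustration only:

```python
# Classical item analysis: facility (difficulty) and discrimination.
# Assumes binary item scores: 1 = correct, 0 = wrong.

def facility(item_scores):
    """Proportion of candidates answering the item correctly (0..1)."""
    return sum(item_scores) / len(item_scores)

def discrimination(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index: the item's facility in the
    top-scoring group minus its facility in the bottom-scoring group,
    where groups are formed by ranking candidates on total test score."""
    n = max(1, round(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n], ranked[-n:]
    group_facility = lambda group: sum(item_scores[i] for i in group) / len(group)
    return group_facility(upper) - group_facility(lower)

# Invented data: ten candidates' scores on one item, and their test totals.
item1  = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]
totals = [48, 45, 44, 30, 41, 25, 39, 22, 20, 43]

print(f"facility: {facility(item1):.2f}")
print(f"discrimination: {discrimination(item1, totals):.2f}")
```

A well-behaved item has a facility in a moderate range and a clearly positive discrimination; an item that strong candidates get wrong more often than weak candidates (negative discrimination) is flagged for revision during piloting.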
14 Factors affecting validity
· Unclear or non-existent theory
· Lack of specifications
· Lack of training of item/test writers
· Lack of, or unclear, criteria for marking
· Lack of piloting/pre-testing
· Lack of detailed analysis of items/tasks
· Lack of standard setting to the CEF
· Lack of feedback to candidates and teachers
15 Reliability
· If I take the test again tomorrow, will I get the same result?
· If I take a different version of the test, will I get the same result?
· If the test had had different items, would I have got the same result?
· Do all markers agree on the mark I got?
· If a marker marks my test again tomorrow, will I get the same result?
16 Reliability
· Over time: test–re-test
· Over different forms: parallel forms
· Over different samples of items: homogeneity
· Over different markers: inter-rater
· Within one rater over time: intra-rater
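The inter-rater strand above can be estimated empirically by correlating two markers' scores for the same set of scripts. A minimal Python sketch, assuming each marker awards a numeric mark per script; the marks below are invented for illustration (operational exam boards typically use dedicated statistics software and more sophisticated rater-agreement models):

```python
# Inter-rater reliability estimated as a Pearson correlation between
# two markers' scores for the same scripts (invented data).
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented marks awarded by two markers to the same eight scripts.
marker_a = [12, 15, 9, 18, 14, 11, 16, 10]
marker_b = [13, 14, 10, 17, 15, 10, 17, 11]

print(f"inter-rater correlation: {pearson(marker_a, marker_b):.2f}")
```

A correlation close to 1 indicates that the two markers rank the scripts very similarly; the same calculation applied to one marker's scores on two occasions gives an intra-rater estimate, and applied to two test administrations a test–re-test estimate.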
17 Factors affecting reliability
· Poor administration conditions: noise, lighting, cheating
· Lack of information beforehand
· Lack of specifications
· Lack of marker training
· Lack of standardisation
· Lack of monitoring
18 Practicality
· Number of tests to be produced
· Length of test in time
· Cost of test
· Cost of training
· Cost of monitoring
· Difficulty in piloting/pre-testing
· Time to report results
19 Factors affecting practicality
· Awareness of complexity and cost
· Time to do the job: ‘quick and dirty’ remains dirty
· Funding to support development, monitoring and further development
· Recognition of the need for training, of testers and of teachers
20 Authenticity
· Genuineness of text
· Naturalness of task
· Naturalness of learners’ response
· Suitability of test for purpose
· Match of test to learners’ needs (if known)
· Face validity
· Expectations of stakeholders and culture
21 Factors affecting ‘authenticity’
· A test is a test is a test
· Availability of resources
· Training of test developers/item writers
· Relative importance of reliability over validity
· Purpose of test: proficiency versus progress or diagnosis
22 Washback
· A test can have positive or negative effects
· A test can affect the content of teaching
· A test can affect the method of teaching
· A test can affect attitudes and motivation
· A test can affect all teachers and students in the same way, or individuals differently
· The importance of a test will affect its washback
23 Factors affecting washback
· Extent to which teachers know the nature of the test
· Extent to which teachers understand the rationale of the test
· Extent to which teachers consider how best to prepare learners for the test
· Nature of teachers’ beliefs about teaching
· Effort teachers are willing to make
· Difficulty of the test
24 Impact
· Effect of test on society
· Effect of test on stakeholders: employers, higher education, parents, politicians
· Intended and unintended
· Beneficial or detrimental
25 Factors affecting impact
· Extent to which the purpose of the test is understood and accepted
· Currency of test
· Face validity of test
· Stakes of test
· Availability of information
· Education of stakeholders regarding the complexity of testing
26 Currency of test
· Extent to which the test is valued by stakeholders
· Different stakeholders may have different perspectives: university vs employer; parents vs teachers; teachers vs principals; politicians vs professionals
27 Factors affecting currency
· Consequences of passing or failing: the stakes
· Extent to which stakeholders take results seriously into consideration
· Beliefs about the value of tests in general
· Extent to which the test matches expectations about tests in general, or language tests in particular
· Difficulty of the test
· Institution offering the test
28 General Issues
· Teacher-based assessment vs central quality control
· Internal vs external assessment
· Quality control of exams (and the associated cost)
· Piloting and pre-testing
· Test analysis and the role of the expert
· The existence of test specifications
· Guidance and training for test developers and markers
29 General Issues (continued)
· Feedback to candidates
· Pass/fail rates
· The currency of the old and the new traditions
· The relationship with other languages and countries
· The standards of the local exams in terms of "Europe"
30 Constraints on testing
· Time – much less than for teaching
· Sample – inevitably limited
· Resources always limited – money, infrastructure, trained personnel
· Assessment culture/tradition
· Lack of awareness of problems and solutions
31 BUT WASHBACK
· Testing is too important to be left to the teacher
· Testing is too important to be left to the tester
· Both are needed, to reflect and influence teaching, validly and reliably.
32 “Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”