Quality Control in Evaluation and Assessment

J Charles Alderson, Department of Linguistics and Modern English Language, Lancaster University

“Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”

Outline
· Current practice
· Assessment for certification
· Tradition one: teacher-centred, school-based
· Tradition two: central, quality controlled
· Basic parameters
· What is needed to ensure parameters are met

Current practice
· Quality of important examinations not monitored
· No obligation to show that exams are relevant, fair, unbiased, reliable, and measure relevant skills
· University degree in a foreign language qualifies one to examine language competence, despite lack of training in language testing
· In many circumstances merely being a native speaker qualifies one to assess language competence
· Teachers assess students’ ability without having been trained

First tradition
· Teacher-centred
· School/university-based assessment
· Teacher develops the questions
· Teacher's opinion the only one that counts
· Teacher-examiners have no explicit marking criteria
· Assumption that by virtue of being a teacher, and having taught the student being examined, teacher-examiner makes reliable and valid judgements
· Authority, professionalism, reliability and validity of teacher rarely questioned
· Rare for students to fail

Second tradition
· Tests externally developed and administered
· National or regional agencies responsible for development, following accepted standards
· Tests centrally constructed, piloted and revised
· Difficulty levels empirically determined
· Externally trained assessors
· Empirical equating to known standards or levels of proficiency

Basic parameters
· Validity
· Reliability
· Practicality
· Authenticity
· Washback
· Impact
· Currency

“Validity in general refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent that it measures what it is supposed to measure. It follows that the term valid when used to describe a test should usually be accompanied by the preposition for. Any test may then be valid for some purposes, but not for others.” (Henning, 1987)

Validity
· Rational, empirical, construct
· Internal and external validity
· Face, content, construct
· Concurrent, predictive
· Construct

How can validity be established?
· My parents think the test looks good.
· The test measures what I have been taught.
· My teachers tell me that the test is communicative and authentic.
· If I take the Rigo utca test instead of the FCE, I will get the same result.
· I got a good English test result, and I had no difficulty studying in English at university.

· Does the test look valid to the general public?
· Does the test match the curriculum, or its specifications?
· Is the test based adequately on a relevant and acceptable theory?

· Does the test yield results similar to those from a test known to be valid for the same audience and purpose?
· Does the test predict a learner’s future achievements?
· Note: a test that is not reliable cannot, by definition, be valid.
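The first question above (concurrent validity) is commonly checked by correlating candidates' scores on the new test with their scores on an established test. The following is a minimal sketch under that assumption, not a procedure prescribed in this presentation; the function name `pearson_r` and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary on both tests")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six candidates on two tests
new_test = [52, 61, 70, 45, 88, 77]
established_test = [50, 65, 72, 40, 90, 75]

r = pearson_r(new_test, established_test)
print(f"concurrent validity coefficient r = {r:.2f}")
```

A high coefficient only suggests that the two tests rank candidates similarly; it says nothing about whether the established test is itself valid for the purpose in question.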

A test’s items should work well: they should be of suitable difficulty, and good students should get them right, whilst weak students are expected to get them wrong. All tests should be piloted, and the results analysed to see whether the test performed as predicted.
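One common way to analyse pilot results is classical item analysis: a facility value (the proportion of candidates answering an item correctly) and a discrimination index (how much better the strongest candidates do on the item than the weakest). The sketch below assumes dichotomously scored items and compares the top and bottom thirds of candidates, which is one convention among several; the function name and the data are hypothetical.

```python
def item_analysis(responses):
    """Classical item analysis for dichotomous items.

    responses: one list per candidate, 1 = correct, 0 = incorrect.
    Returns one (facility, discrimination) tuple per item.
    """
    n = len(responses)
    # Rank candidates by total score, strongest first
    ranked = sorted(responses, key=sum, reverse=True)
    group = max(1, n // 3)  # size of top and bottom groups (assumed: thirds)
    top, bottom = ranked[:group], ranked[-group:]

    results = []
    for i in range(len(responses[0])):
        # Facility: proportion of all candidates who got item i right
        facility = sum(r[i] for r in responses) / n
        # Discrimination: top-group minus bottom-group proportion correct
        discrimination = (sum(r[i] for r in top) - sum(r[i] for r in bottom)) / group
        results.append((facility, discrimination))
    return results


# Hypothetical pilot data: six candidates, four items
pilot = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

for number, (facility, discrimination) in enumerate(item_analysis(pilot), start=1):
    print(f"item {number}: facility={facility:.2f}, discrimination={discrimination:+.2f}")
```

In this invented data set the fourth item has a negative discrimination index: a weak candidate gets it right while the strong candidates get it wrong, which is the kind of result that piloting is meant to catch.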

Factors affecting validity
· Unclear or non-existent theory
· Lack of specifications
· Lack of training of item/test writers
· Lack of / unclear criteria for marking
· Lack of piloting/pre-testing
· Lack of detailed analysis of items/tasks
· Lack of standard setting to CEF
· Lack of feedback to candidates and teachers

Reliability
· If I take the test again tomorrow, will I get the same result?
· If I take a different version of the test, will I get the same result?
· If the test had had different items, would I have got the same result?
· Do all markers agree on the mark I got?
· If a marker marks my test again tomorrow, will I get the same result?

Reliability
· Over time: test – re-test
· Over different forms: parallel
· Over different samples: homogeneity
· Over different markers: inter-rater
· Within one rater over time: intra-rater
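Two of the reliability types listed above have widely used coefficients. Homogeneity is often summarised with Cronbach's alpha, and inter-rater agreement on categorical marks with Cohen's kappa. The sketch below illustrates both under those assumptions; the presentation itself does not name specific statistics, and the function names, the item scores and the CEF-style levels are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def cronbach_alpha(responses):
    """Internal consistency; responses is one list of item scores per candidate.

    Undefined (division by zero) if all candidates have the same total score.
    """
    k = len(responses[0])
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical marks.

    Undefined (division by zero) if both raters use one and the same
    category throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's category proportions
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical item scores for four candidates on three items
scores = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(f"alpha = {cronbach_alpha(scores):.2f}")

# Hypothetical levels awarded by two markers to the same four scripts
marker_1 = ["B1", "B1", "B2", "B2"]
marker_2 = ["B1", "B2", "B2", "B2"]
print(f"kappa = {cohen_kappa(marker_1, marker_2):.2f}")
```

The same `cohen_kappa` function can be applied to one marker's first and second marking of the same scripts as a rough intra-rater check.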

Factors affecting reliability
· Poor administration conditions – noise, lighting, cheating
· Lack of information beforehand
· Lack of specifications
· Lack of marker training
· Lack of standardisation
· Lack of monitoring

Practicality
· Number of tests to be produced
· Length of test in time
· Cost of test
· Cost of training
· Cost of monitoring
· Difficulty in piloting/pre-testing
· Time to report results

Factors affecting practicality
· Awareness of complexity and cost
· Time to do the job: ‘quick and dirty’ remains dirty
· Funding to support development, monitoring and further development
· Recognition of need for training – of testers and of teachers

Authenticity
· Genuineness of text
· Naturalness of task
· Naturalness of learners’ response
· Suitability of test for purpose
· Match of test to learners’ needs (if known)
· Face validity
· Expectations of stakeholders and culture

Factors affecting ‘authenticity’
· A test is a test is a test
· Availability of resources
· Training of test developers/item writers
· Relative importance of reliability over validity
· Purpose of test: proficiency versus progress or diagnosis

Washback
· Test can have positive or negative effects
· Test can affect content of teaching
· Test can affect method of teaching
· Test can affect attitudes and motivation
· Test can affect all teachers and students in same way, or individuals differently
· Importance of test will affect washback

Factors affecting washback
· Extent to which teachers know nature of test
· Extent to which teachers understand rationale of test
· Extent to which teachers consider how best to prepare learners for test
· Nature of teachers’ beliefs about teaching
· Effort teachers are willing to make
· Difficulty of test

Impact
· Effect of test on society
· Effect of test on stakeholders: employers, higher education, parents, politicians
· Intended and unintended
· Beneficial or detrimental

Factors affecting impact
· Extent to which purpose of test is understood and accepted
· Currency of test
· Face validity of test
· Stakes of test
· Availability of information
· Education of stakeholders re complexity of testing

Currency of test
· Extent to which test is valued by stakeholders
· Different stakeholders may have different perspectives: university vs employer; parents vs teachers; teachers vs principals? politicians vs professionals?

Factors affecting currency
· Consequences of passing or failing – stakes
· Extent to which stakeholders take results seriously into consideration
· Beliefs about value of tests in general
· Extent to which test matches expectations about tests in general or language tests in particular
· Difficulty of test
· Institution offering the test

General Issues
· Teacher-based assessment vs central quality control
· Internal vs external assessment
· Quality control of exams (and the associated cost)
· Piloting and pre-testing
· Test analysis and the role of the expert
· The existence of test specifications
· Guidance and training for test developers and markers

General Issues (continued)
· Feedback to candidates
· Pass / fail rates
· The currency of the old and the new traditions
· The relationship with other languages and countries
· The standards of the local exams in terms of "Europe"

Constraints on testing
· Time – much less than for teaching
· Sample – inevitably limited
· Resources always limited – money, infrastructure, trained personnel
· Assessment culture / tradition
· Lack of awareness of problems and solutions

BUT WASHBACK
· Testing is too important to be left to the teacher
· Testing is too important to be left to the tester
· Both are needed, to reflect and influence teaching, validly and reliably.

“Assessment is central to language learning, in order to establish where learners are at present, what level they have achieved, to give learners feedback on their learning, to diagnose their needs for further development, and to enable the planning of curricula, materials and activities.”
