The objective of this presentation is to address the following points: A. Role and types of evaluation criteria (theoretical issues) B. Impact of evaluation criteria on writing performance (empirical evidence) C. Creation of evaluation criteria to assess writing (practice for teachers)
Overview PART ONE: a. What are the issues in assessing writing? b. What is fairness? c. What is positive washback? PART TWO: A classroom-based study on the effect of evaluation criteria on writing performance PART THREE: A practice session on designing task-specific evaluation criteria
What are issues in assessing writing? 1. What to evaluate? Content and/or language 2. How to evaluate? Scales: holistic/analytical 3. How to ensure inter-rater reliability, fairness and positive washback?
What is fairness? Choose or create evaluation criteria according to the level of the learners, task requirements and test purpose (construct validity). Share evaluation criteria with learners (justice: inter-learner equity). Train learners to use evaluation criteria (access: educational opportunity to learn).
Evaluation criteria: types Holistic: only one description of each sub-feature (content, language and organization); used in large-scale assessments. Analytical: level-specific descriptions for each sub-feature (content, language and organization); mostly used to provide feedback.
Role of evaluation criteria 1. Assessment of learning: to check understanding; to check proficiency levels 2. Assessment as learning (feedback): to identify strengths and weaknesses; to track growth through formative and summative assessment 3. Assessment for learning (development): to make the criteria transparent to learners; to document how this affects their writing
Evaluation criteria: research 1. Each task needs a different scale, and the criteria should reflect the writing construct (Hamp-Lyons 1991, 1995). 2. What guides a rater's rating: educational background, interpretations of the construct of language proficiency, and task requirements (Cumming, Kantor & Powers 2001; Eckes 2005; Fahim & Baijani 2011). 3. Correlations between raters' judgments: how to ensure inter-rater reliability (Wang 2009).
What is positive washback? ESL/EFL learners will be able to a. use evaluation criteria as a checklist to fulfill task requirements b. understand the assessor's expectations from tasks in a transparent manner and work to fulfill them c. do self and peer assessments using criteria (thereby learning to maintain inter-rater reliability and provide feedback to each other) d. generalize from task-specific criteria and use this knowledge in other writing assessments
The Study Aim: To examine the role of evaluation criteria in writing performance. To show this, we look at (a) participants' perception of the criteria and their benefit (while and post task) and (b) their awareness and use of criteria in writing. Context: The study was conducted in a PhD-level course titled 'Language Testing and Assessment'. The course followed a formative assessment model: each assessment had task-specific evaluation criteria, which were shared with the learners before they did the tasks. An in-depth study was done to gather evidence of learning through writing assessment (assessment for and as learning). Key concepts: fairness, positive washback.
Research questions If task specific criteria are provided to adult ESL learners, (i) will they benefit from this knowledge? (ii) what kinds of benefit will they experience?
Method of data collection 13 adult learners (8 female), 24 to 45 years of age, participated in the study. 8 participants had prior teaching experience, and 2 of them reported having used criteria in assessment. Stage 1: making task-specific criteria available Stage 2: perception of criteria Stage 3: using criteria (implicit training) Stage 4: talking about benefit(s)
Example of a writing assessment Task prompt: Look at the proficiency test. This was used as an entrance test for the BA English programme at EFL-U. Does this test pass all the five principles of assessment (authenticity, reliability, validity, practicality and washback)? Justify your stance with relevant examples. Write a critical response in about 500 words. Evaluation criteria: 1. Does the response contain an overall thesis statement and comments on all the five principles? Is each principle justified with at least one example? (content) 2. Is the response written in academic language (e.g., passivization, linkers, voice), and does it include referencing details? (language) 3. Is the response presented in three parts (intro-body-conclusion) with adequate links between them? Are ideas linked at intra- and inter-sentential levels? (organization)
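The three criteria in this example (content, language, organization) can be read as an analytic scale. The following is a minimal sketch of how such a scale might be turned into a single score, assuming each feature is rated on a number of levels and carries a weight. The weights, the four-level scale, and the names `Criterion`, `weighted_score` and `CRITERIA` are hypothetical and chosen only for illustration; the actual values used on the course are not given in this presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str        # sub-feature named in the criteria (content, language, organization)
    weight: float    # hypothetical weight; the course's actual weights are not given
    max_level: int   # hypothetical number of performance levels


def weighted_score(criteria, levels):
    """Return a percentage score from the level awarded on each criterion."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = 0.0
    for c in criteria:
        level = levels[c.name]
        if not 0 <= level <= c.max_level:
            raise ValueError(f"level for {c.name} must be between 0 and {c.max_level}")
        # Each criterion contributes its weight scaled by the share of levels reached.
        earned += c.weight * level / c.max_level
    return 100 * earned / total_weight


# Hypothetical weights and a four-level scale, for illustration only.
CRITERIA = [
    Criterion("content", weight=0.5, max_level=4),
    Criterion("language", weight=0.25, max_level=4),
    Criterion("organization", weight=0.25, max_level=4),
]

if __name__ == "__main__":
    awarded = {"content": 3, "language": 4, "organization": 2}
    print(weighted_score(CRITERIA, awarded))
```

Keeping the weights explicit in one place is one way to make them visible to learners before they attempt the task, in line with the idea of sharing criteria.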
Method of analysis Qualitative analysis of perceptions from two sources, (A) participants and (B) tutor as evaluator, to capture instances of learning (positive washback).
Measurement of learning (positive washback) Do the participants a. experience an ease in planning and performing on tasks? b. understand the assessor's expectations for each task? c. use criteria for self and peer assessment meaningfully? d. reflect on strengths and weaknesses post performance? e. generalize planning and writing techniques to write critical responses in the course and outside of it?
Were benefits experienced?
Overall usefulness of evaluation criteria: 100%
Usefulness of task-specific criteria to complete tasks: 94.5%
Use of criteria as a checklist to revise before submission: 98.1%
Usefulness of weight allotted to each feature: 96.3%
Use of criteria to understand assessor's expectations: 96.3%
Usefulness of analytical criteria with level-specific description: 94.5%
Participants reported benefit at the level of planning and post-task reflection, at 96%. This benefit was experienced because evaluation criteria were available for completing the writing assessments.
1a. Examples: participants’ responses I liked the idea of writing with the prompt and evaluation criteria as it helped me to produce responses that were clear and to a greater extent, up to the assessor’s expectations. By the end of the course my response to using the evaluation criteria to plan and write my assignments improved. I think that it is a very significant and necessary aspect of writing an assignment. For the other courses, where we did not receive any evaluation criteria I tried to speculate the expectations of the assessor and create the criteria and then write the assignment. (S:VI) Instances of learning: ease in planning; understanding of the assessor’s expectations; generalization of techniques to other pieces of writing (post-course application)
1b. Examples: participants’ responses I could not follow the evaluation criteria that much meaningfully for the first time. The problem was not obviously with the criteria, but with my understanding of the nature of assignment… But later on, day by day I had been trying to build a sort of familiarity or say rapport with the evaluation criteria, and started adjusting my writing into the criteria. My later assignments would manifest how much labor I devoted to follow those criteria. And the result was satisfactory. I was happy, indeed. (S:RU) Instances of learning: ease in using criteria; positive reflection post performance
Summary of benefits Post task 1. useful to complete peer assessment and provide feedback 2. understand strengths and weaknesses in one's own work, especially in content development (gaps in providing evidence to support claims) During task 1. crucial to finish tasks/assignments on the course in an organized manner 2. understand different levels of performance and check before submission which level their response has met 3. understand the assessor's expectation(s) and the features (content-organization-language) that were part of different levels of performance Source: Participants' responses
2. Peer evaluation In the course there was one assessment task in which the participants had to critique a test for its degree of usefulness. The evaluation criteria for the task were given to the participants before they attempted it, and they reported that they had used the criteria while working on the task. Later, they used the same criteria to do peer assessment on the same task. The correlation between the peer assessment and the tutor's assessment was r = .79, a high positive correlation indicating a high degree of inter-rater reliability. In a one-to-one discussion (through a discussion board on the internet), the participants said that they found peer evaluation methodical because of the use of task-specific criteria. They could understand the direction in which the writing task had to be attempted and could give appropriate scores and feedback to their peers.
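A correlation between peer and tutor scores, like the one reported here, can be computed with the Pearson product-moment formula. The sketch below is illustrative only: it assumes Pearson's r was the statistic used (the slides do not name it), and the score lists are hypothetical, not the study's data. The function name `pearson_r` is also an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one rater gives identical scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same responses, one list per rater type.
peer_scores = [6, 7, 5, 8, 9, 4, 7]
tutor_scores = [5, 7, 6, 8, 8, 5, 6]

if __name__ == "__main__":
    print(round(pearson_r(peer_scores, tutor_scores), 2))
```

Values close to 1 mean that peers and the tutor ranked the responses in much the same way, which is how the reported r = .79 is read as evidence of inter-rater reliability.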
2a. Example
1. Did the criteria help you in assessing the response of your peer? If yes, then why?
Yes, the criteria helped me assessing my peer because it allowed one to look for specifics in the answer and score against that.
2. If you were not given the criteria but only the prompt then would your assessment have differed? If yes, then in what way? Would you have been able to justify the scores that you would have given as a holistic score or analytical score? Which score type would you be likely to give in the absence of a criteria?
Yes, if the criteria was not given then scoring would not have easy and it would not have been based on the specifics. Also, the justification of the scores would have been difficult. The scoring without criteria would have been a holistic one.
3. When you were given back your response as evaluated by your peer, did you agree on the scoring or disagree? Explain why you agreed or disagreed.
I agreed with the scoring because it was objectively scored against the criteria given. (S:SH)
Positive washback: inter-rater reliability, feedback
3. Tutor's assessment Content 1. Some attempts at forming an opinion and justifying it through elaboration and examples. 2. Most of the key ideas present. 3. Argumentation is weak. Organization 1. Macro coherence attempted (all the key ideas were presented in their proper order). 2. Signaling of ideas present (organizational details of the paper presented). 3. Micro coherence not well developed (links between paragraphs and sentences not well developed). Source: Tutor as evaluator
Why were the benefits experienced? 1. Cognitively, the criteria made tasks easier by breaking them down into manageable bits (e.g., key ideas, text structure). 2. They drew learners' attention to structuring content coherently and presenting ideas in an academic manner (comprehensible output, Schmidt 2001, 2010). 3. They provided learners with a checklist to edit and revise work prior to submission. The criteria were thus made available to the participants, and this yielded positive washback (Hughes 2003).
Why were benefits experienced? 4. Noticing specific details of tasks in order to do peer assessment helped learners process ideas at a deeper level. Consequently, they could give each other meaningful feedback on responses (Robinson 2009). 5. Learners felt responsible for what they had written and evaluated: they learned to focus closely on content development. For instance, learners noticed the use of appropriate examples to substantiate a claim because they used the evaluation criteria twice, for writing and for peer assessment. This created a democratic atmosphere of assessment that led to further instances of learning (positive washback) (Shohamy 2002).
Approaches to assessment 1. Assessment of learning (summative) 2. Assessment as learning (formative) 3. Assessment for learning (formative +…) Sources: Nitko 1983, 1989; Earl 2003; English language arts curriculum, British Columbia 2006; Ontario Report 2010
Assessment as and for learning Fairness (Kunnan 2000) Washback (Hughes 2003) Content & language development
Pedagogical implications Assessment can and should be used to support learning. Free-response items should have task-specific evaluation criteria. Criteria can be shared to raise awareness, help learners notice task requirements, revise documents and track growth (positive washback).
We need to design and share evaluation criteria with our learners because doing so can: a) ensure fairness b) give rise to instances of learning (positive washback)
Evaluation criteria: examples TASK: You wish to subscribe to the magazine READER’S DIGEST. Write a letter in 100-150 words to the editor requesting him/her to give you the subscription details. In your letter, you can ask about the subscription rate, mode of payment, delivery and any other query that you may have. Option 1: General criteria You will be graded on content, language and organization. Option 2: Task-specific criteria Enquires about subscription details, mode(s) of payment, details of delivery, time to be taken, and whom to contact in case of problems (Content). Uses vocabulary appropriate to express each language function and a variety of sentence structures accurately (Language). Begins with a formal address to the editor and expresses interest in the magazine; presents all enquiries about the subscription; concludes by thanking the editor and asking to receive the information at the earliest (Organization).
TASK 1 Being and looking fair is important. Do you agree? Discuss with reference to the following pictures. Write your answer in 100-150 words. [Picture A] [Picture B]
Acknowledgements Anand, Ayesha, Barka, Clementine, Jayant, Kezo, Manish, Remya, Rukan, Shehla, Sunitha, Suraj and Vrishali: thank you for your participation and timely responses, without which this project would have remained unfulfilled.
References
Brown, J. D., & Abeywickrama, P. (2011). Language assessment: Principles and classroom practices (2nd ed.). Pearson Education.
Earl, L. (2003). Assessment as learning: Using classroom assessment to maximise student learning. Thousand Oaks, CA: Corwin Press.
Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.
Kunnan, A. J. (2000). Fairness and validation in language assessment. Studies in Language Testing 9. Cambridge: Cambridge University Press.
Reid, J. M. (1993). Teaching ESL writing. New Jersey: Prentice-Hall.
Schmidt, R. (2010). Attention, awareness, and individual differences in language learning. In W. M. Chan, S. Chi, K. N. Cin, J. Istanto, M. Nagami, J. W. Sew, T. Suthiwan, & I. Walker (Eds.), Proceedings of CLaSIC 2010, Singapore, December 2-4 (pp. 721-737). Singapore: National University of Singapore, Centre for Language Studies.
Shohamy, E. (2001). The power of tests: A critical perspective on the use of language tests. Pearson Education.
Upshur, J. A., & Turner, C. E. (1995). Constructing rating scales for second language tests. ELT Journal, 49(1), 3–12.