ASSESSING SPEAKING – PURPOSES AND TECHNIQUES

1 ASSESSING SPEAKING – PURPOSES AND TECHNIQUES
Prepared by Elena Onoprienko, Yulia Polshina, Tatiana Shkuratova Based on material by Fumiyo Nakatsuhara

2 Outline
Key questions
Nature of speaking
Speaking as a skill
Test purposes and types of test
Speaking test tasks
Scoring
Washback effect
These issues will be raised in the lecture to serve as a theoretical springboard for the practice of administering, rating and designing tests.

3 Key questions

4 Key questions
What is speaking? – construct
Why assess speaking? – purpose
How (to test)? – task types
How (to score)? – scoring criteria

5 The nature of speaking

6 Nature of speaking:
spoken language;
speaking as interaction;
speaking as a social activity;
speaking as a situation-based activity.
Before talking about assessment, it is logical to discuss what speaking is: it is essential to know as precisely as possible what we are going to assess. We shall look at speaking from different perspectives, which will help us understand that speaking is an essential part of people's everyday life and what it means to be able to speak. What features of speaking are to be assessed, and what should be taken into consideration when designing a test and developing criteria?

7 What is speaking?
A part of the shared social activity of talking (Luoma, 2004: 29).
In comparison with writing, speaking is
more: transient, dynamic, interpersonal, context-dependent;
less: planned, complex, formal, lexically dense.
The comparison partly draws on Rebecca Hughes's summary in Teaching and Researching Speaking (2002, Pearson Education Ltd).

8 Speaking vs Writing
The main differences lie in two sets of conditions: processing and reciprocity.
Processing is connected with time: speaking goes on under greater pressure of time. The solution to this problem in spoken language is reciprocity: speakers take turns and create a text together (Bygate, 1987). It is important to remember that short turns are more common in spoken interaction: they are usually spontaneous, loosely strung-together phrases rather than neat sentences. Long turns, e.g. oral presentations or lectures, require more planning decisions and thus tend to be more prepared. So decisions must be made about how much time test takers need to prepare long turns.
Reciprocity is concerned with who has speaking rights and with sharing responsibility for maintaining an interaction. A valid test of spoken interaction must include reciprocity: candidates must be involved in an interaction to a greater extent than merely answering questions (Weir, 1993).

9 Spoken language
Pronunciation
Spoken grammar
Lexis
Three features of spoken language distinguish it from written language: pronunciation, grammar and lexis. The grammar and vocabulary commonly used in speech tend to be rather different from those used in writing.

10 Pronunciation
Speech is judged on the basis of pronunciation. What is the standard? Native speaker vs non-native speaker. Communicative effectiveness, which is based on comprehensibility, probably guided by native-speaker standards but defined in terms of realistic learner achievement, is a better standard for learner pronunciation (Luoma, 2004).
What to include in the assessment of pronunciation? Individual sounds, pitch, volume, speed, pausing, stress and intonation.
Pronunciation is the first thing that people tend to judge while listening, and they often do so against a native/non-native speaker standard. However, this is questionable, and there are two points to consider. First, in the contemporary world it is difficult to determine who a 'native' speaker is. Secondly, research shows that very few learners can achieve a native-like level of pronunciation, even though they can be very effective in communicative situations. There are also psychological and social reasons: for example, an accent may be part of a learner's personal identity and not something he or she wishes to eliminate.
Another question in assessing the sounds of speech is whether to take all the features of pronunciation into account. The answer depends on the purpose of assessment: in designing the criteria, developers need to consider what type of information about the sound of speech they need.

11 Spoken grammar:
grammar is easy to judge because it is easy to detect in speech and writing;
speakers do not usually speak in sentences;
speech consists of idea units connected with and, or, but, or that;
planned vs unplanned speech: complex structures vs short idea units;
the internal structure of idea units: topicalisation and tails create an impression of naturalness.
Learners' progress is often tracked according to the grammatical forms that they can produce accurately. Grammar is easy to judge because it is easy to detect and because fully fledged grammars of most languages are available for use as performance standards. However, the grammar evaluated in assessing speaking should be specifically related to the grammar of speech; some specific features of spoken grammar are listed on the slide.
Idea units are about two seconds or seven words long, or shorter, because speakers try to communicate ideas and listeners need to comprehend them in real time, which means working within the limits of speakers' and listeners' working memory. In some situations, such as lectures, speeches and presentations, complex grammatical structures and the influence of written grammar are common; these are examples of planned speech. It is in unplanned speech that short idea units are used, and even in planned speech sentences are shorter than in writing because listeners have to understand them in real time.
Two structures that clearly belong to spoken-like language are topicalisation and tails. Topicalisation, or thematic fronting, gives special informational emphasis to the initial element of a clause in informal speech, as in 'Joe, his name is'; the aim is to emphasize the topic. Tails are noun phrases that come at the end of a clause; by using tails, speakers emphasize the comment they make at the beginning of the sentence. Both create an impression of naturalness.

12 Features of spoken lexis:
'simple' and 'ordinary' words are common in normal spoken discourse, and their natural use marks a highly advanced level of speaking skills (Luoma, 2004);
generic words (important for the naturalness of talk);
vague words;
fixed conventional phrases;
'small words' (the more of them, the better the perceived fluency).
Many rating scales include descriptions of vocabulary, and at the highest levels they describe an ability to express oneself precisely and to demonstrate the richness of one's lexicon. However, simple words are common in spoken discourse. Generic words are also common, because the things and activities people talk about can be seen or are familiar to them. Vague words are another common feature of informal talk; they help the speaker to go on regardless of a missing word. Fixed conventional phrases, sometimes called fillers or hesitation markers, give speakers time to plan and formulate their speech, so they are common in native speakers' talk. Studies show that even advanced learners produce a much narrower range of 'spoken-like' expressions and discourse markers than native speakers.

13 Slips and errors Normal speech contains a fair number of slips and errors such as mispronounced words, mixed sounds, and wrong words due to inattention (Luoma, 2004). One of the controversial issues in assessing speaking is how to treat mistakes and errors. Slips and errors are normal, but in assessing speaking they acquire special significance. So, raters should be trained not to count each ‘error’ they hear.

14 Speaking as a skill

15 Speaking as a skill
What is skillful speech?
task fulfillment/content;
fluency;
accuracy;
vocabulary and grammar range;
interaction.
In general, what we assess in speaking, what appears in the majority of scoring scales, and what corresponds to the ideas discussed earlier are task fulfillment or content, fluency, accuracy, vocabulary and grammar range, and interaction.

16 Speaking as meaningful interaction
Speaking is both personal and a part of the shared social activity of talking. The openness of meanings is not only a convenience in speech; it is also an effective strategy for speakers (Luoma, 2004).
Chatting vs information-related talk.
The role of speaking situations.
Roles, role relationships and politeness.
Speaking is a social event, and the production of speech depends greatly on a partner or partners. In a typical spoken interaction two people talk about things that they think are mutually interesting and relevant in the situation. Their aim can be to pass time, to amuse each other, to share opinions, or to get something done. When people talk they search for meaning, and meaning is not always clear and explicit; moreover, people know that what is said can have more than one meaning. Native speakers use vague expressions for different purposes automatically, and this is a strategically highly skillful way to achieve these aims. According to purpose, we can differentiate between chatting and information-related talk. The natural appearance of open meanings in a discussion involving a learner is a clear sign of highly advanced speaking skills. A learner's involvement in such communication and the ability to produce indirect utterances are signs of naturalness which are not easy for a rater to notice and assess unless they are trained to do so (Luoma, 2004: 21).
Brown et al. (1984) define chatting as the exchange of amicable conversational turns with another speaker (cited in Luoma, 2004). Its primary purpose is to make and maintain social contact. Chatting involves personalities, their social behavior and cultures; appropriate topics for chatting differ between cultures, and this can cause difficulties for assessing speaking. In assessment, chatting is present in the initial stage of test interaction. Information-related talk refers to speech aimed at transferring information about some topic (also from Brown et al.). The most important point about it is getting the message across and confirming that the listener has understood it. The features of information-related talk are establishing common ground, giving information in chunks, logical progression, questions, repetitions and comprehension checks. According to Brown and Yule (1983), there are five information-oriented tasks for language learning, including telling a story from a picture. Good storytelling routines are important for speakers, as one of the most common types of chatting involves personal stories about accidents or embarrassing situations (Luoma, 2004: 24).
Social and situational context greatly influence what is said in a speaking event. For example, in a role play examiners and test takers take different social roles (patients and health workers), and the way they play these roles (norms, manner, spirit) can have a great impact on the performance. Another feature that influences a speaker's choice of words is the speaker's role and role relationships. Politeness is usually the reason why people do not communicate 'maximally efficiently' as they would if they followed Grice's conversational maxims: Quantity (give sufficient information but not too much), Quality (say only what you know to be true), Relation (be relevant), Manner (be clear, brief and orderly). Politeness is difficult to assess because it is interpersonal and social, and the social relationship between test participants is artificial.

17 Why assess speaking?
No single answer: different groups of language learners have different needs, such as:
international travellers: language for travel, leisure;
migrants: survival skills, access to employment;
students: exams, academic communication, social interaction;
professionals: workplace communication, presentations.
Different users have different purposes when they seek information from tests; but most users of a language do need to speak.
Different people have different needs, at a personal level and at a professional level. If the learner realises that the interviewer is interested enough in his personal needs to adapt the test accordingly, he will respond to that expression of interest. He will probably have more to say about topics that concern him personally (Underhill, 2004: 18-19). And most users of a language do need to speak. The objective of teaching spoken language is the development of the ability to interact successfully in that language, and this involves comprehension as well as production. In an oral test, tasks should form a representative sample of the population of oral tasks that we expect candidates to be able to perform; moreover, these tasks should elicit behaviour which truly represents the candidates' ability (Hughes, 2012: 113).

18 TEST PURPOSES AND TYPES OF TEST

19 Test purposes and types of test
proficiency tests
achievement tests
placement tests
diagnostic tests
As with all skills, speaking can be tested for different purposes, so the same types of test are used for speaking.

20 What do we need to decide before giving a speaking test?
what aspects of language we want to assess;
how to elicit ratable language samples from test-takers that suit those aspects of language.
We need to decide:
rating criteria [marking categories, levels, descriptors; holistic vs. analytical scales];
elicitation techniques / test format (types of questions, task types).
We must analyse the kind of speaking that we need to assess in a particular assessment context in terms of social and situational needs. We must also remember that speaking is interactive when we design rating criteria and procedures, and reward candidates when they repeat or mirror the other speaker's phrases and structures or develop topics by referring to earlier turns and building on them, because this shows that they know how to work interactively with other speakers. Thus, the aim in all assessment is to focus on the right thing; this provides the basis for construct validity (Luoma, 2011: 27-28).

21 Performance testing
The term 'performance testing' in second language proficiency assessment is traditionally used to describe the approach in which a candidate produces a sample of spoken or written language that is observed and evaluated by an agreed judging process (McNamara, 1996). Speaking is an activity, so to test whether learners can speak, it is necessary to get them to take part in direct spoken language activities. Performance tests assess what examinees can do with the language rather than what they know.

22 What is performance testing?
a sample of written or spoken language;
simulates behaviour in the real world, unlike paper-and-pencil 'objective' tests;
observed and evaluated by an agreed judging process.

23 Speaking tasks
A communicative task is a piece of classroom work which involves learners in comprehending, manipulating, producing or interacting in the target language while their attention is principally focused on meaning rather than form… (Nunan, 1993: 59). Speaking tasks can be seen as activities that involve speakers in using language for the purpose of achieving a particular goal or objective in a particular speaking situation (Bachman and Palmer, 2010). Tasks outline the content and general format of the talk to be assessed and also provide the context for it (Luoma, 2004: 30-31).

24 Types of information-related talk
Factually oriented talk: description, narration, instruction, comparison.
Evaluative talk: explanation, justification, prediction, decision.
(Bygate, 1987)
One of the key decisions in task design is what the speakers will be asked to do with the language. Bygate differentiates between factually oriented and evaluative types of informational talk, such as description, instruction, storytelling and expressing/justifying an opinion.

25 Communicative functions
'Microfunctions' according to the CEFR:
giving and asking for factual information (describing, reporting, asking);
expressing and asking about attitudes (agreement/disagreement);
suasion (suggesting, requesting, warning);
socialising (attracting attention, addressing, greeting, introducing);
structuring discourse (opening, summarising, changing the topic);
communication repair (signalling non-understanding, appealing for assistance, paraphrasing).
(Council of Europe, 2001: 123; Luoma, 2004: 33)
Another way to evaluate examinees' speaking is to analyze the actions they perform when they say something. This approach was introduced by J. L. Austin (1962), who called these actions speech acts. Using language for real-life purposes is a matter of functional competence, and language functions can be an important element in designing speaking assessment. The CEFR divides functions into macrofunctions and microfunctions: macrofunctions refer to stretches of spoken or written language serving the same functional purpose, such as description, narration or explanation, while microfunctions are connected with individual actions.

26 Features of a speaking task:
input, or material used in the task;
roles of the participants;
settings, or classroom arrangements for paired or group work;
actions, or what is to happen in the task;
monitoring, or who is to select input, choose roles or settings, alter actions;
outcomes, or the goal of the task;
feedback given as evaluation to participants.
(Candlin, 1987, cited by Fulcher, 2003)
A primary concern in designing a task is to elicit enough speech to allow a rating to take place; that is why we need to create conditions that help examinees produce speech. These characteristics can be used to describe tasks and to select tasks for speaking assessment.

27 Speaking test task formats
Individual
Paired
Group
Open-ended tasks
Structured tasks
It is important to set the format of a test task: whether the test takers will be assessed individually, in pairs or in groups. The most typical individual format is the interview. Paired formats include the interview and instructing another examinee; group formats include discussion and role play. All of them have both advantages and disadvantages. Another distinction is the amount of structure that the tasks provide for the test discourse: open-ended speaking tasks guide the discussion but allow room for different ways of fulfilling the task requirement, whereas structured tasks specify quite precisely what examinees should say (Luoma, 2004: 48).

28 Advantages and disadvantages of an interview
+ tester's control over interaction
+ opportunity for an examinee to show the range of their speaking skills
- costly in terms of the tester's time
- interviewer's power over the examinee

29 Advantages and disadvantages of paired formats
+ capable of eliciting more symmetrical contributions to the interaction from test-takers
+ capable of eliciting much richer and more varied language functions
+ positive reaction from test-takers (less anxious), a sign of positive washback
+ practical: time-efficient, cost-effective, less burden and less training for the examiners
- the amount of responsibility placed on examinees who are not trained in interview techniques
It is important that the task is sufficiently clear and sufficiently motivating for all participants, and that everyone understands the rules for managing interaction and for giving each other opportunities to speak (Luoma, 2004: 39).

30 Advantages and disadvantages of group formats
+ well received by learners
+ supports learning
- difficult to administer and manage (size of the groups and mixture of learners' abilities)
- difficult to monitor the progress of the testing
Although it is well received by learners, this format is rarely used in formal testing because of the many constraints. Not only is it difficult to administer (deciding how many students there should be in a group and how to mix students with different levels of language ability), but it is also difficult to assess, because an examiner has to keep track of all test takers' contributions and make judgements at the same time. Students should also know the criteria and the rules of managing the discussion, so that everyone can give each other an opportunity to speak. Nevertheless, for all the reasons mentioned, it is very popular and highly recommended for classroom assessment.

31 SPEAKING TEST TASKS

32 Speaking test tasks:
oral presentation (verbal essay, prepared monologue);
information transfer (description of a picture sequence, questions on a single picture, alternative visual stimuli);
interaction tasks (information gap: student-student, student-examiner; open role play; guided role play);
interview (free, structured);
discussion (student-student, student-examiner).
(O'Sullivan, 2008: 10-11)
The details about advantages and disadvantages are in Barry O'Sullivan's article Notes on Assessing Speaking (pp. 10-11), which will be discussed in detail at seminars. This slide represents his classification of test tasks in terms of discourse type, input, freedom, interaction and other features.

33 Framework for designing test tasks
Operations (activities/skills): informational routines (e.g. telling a story) and improvisational skills (negotiation of meaning and management of interaction).
Conditions under which the tasks are performed (e.g. time constraints, the number of people involved and their familiarity with each other).
Quality of output: the expected level of performance in terms of various relevant criteria, e.g. accuracy, fluency or intelligibility.
(Weir, 1993: 30)
To sum up, Weir suggests this framework for designing a task; it shows which important factors should be taken into consideration when creating a task.

34 Scoring

35 Developing criteria for assessing speaking
The importance of double marking for reducing unreliability is undeniable. These criteria need to reflect the features of spoken language interaction that the test task is designed to generate. The criteria used will depend on the nature of the skills being tested and the level of detail desired by the end users. The crucial question is what the tester wants to find out about a student's performance on appropriate spoken interaction tasks (Weir, 1993: 30).
After designing a task, we need to think about the rating criteria.

36 Rating criteria
Phonological control; grammatical accuracy; vocabulary range; fluency (Council of Europe, 2001).
Test format: an interview with the following structure:
Openings (1 minute).
Conversation on familiar topics (3 minutes): the interviewer asks the candidate to talk about him/herself.
Picture description (2 minutes): the interviewer asks the candidate to describe a photo.
Conversation on topics from the given picture (5 minutes): the interviewer asks the candidate questions linked to the picture (from general to extended questions).
Closings (1 minute).
(Nakatsuhara, 2012)
This is an example of rating criteria and of an interview format for a test.
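To show how such a timed interview format might be laid out when planning or scripting a test session, here is a minimal sketch in Python. The stage names and timings follow the Nakatsuhara (2012) example above; the variable and function names are hypothetical and purely illustrative.

```python
# A minimal sketch of the interview format above as a simple data structure.
# Stage names and timings follow the example on this slide; everything else
# (variable names, the printing helper) is illustrative only.
interview_format = [
    ("Openings", 1),
    ("Conversation on familiar topics", 3),
    ("Picture description", 2),
    ("Conversation on topics from the picture", 5),
    ("Closings", 1),
]

def print_plan(stages):
    """Print each stage with its timing and the total test length."""
    for name, minutes in stages:
        print(f"{name}: {minutes} min")
    print(f"Total: {sum(m for _, m in stages)} min")  # 12 min in this example

print_plan(interview_format)
```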

37 Scoring
Holistic scale, e.g. Trinity College bands A, B, C, D.
Analytic scale, e.g. IELTS: fluency and coherence, lexical resource, grammatical range and accuracy, pronunciation.
Speaking scores express how well the examinees can speak the language being tested. They usually take the form of numbers, but they can also be verbal categories such as 'excellent' or 'fair'. In addition to the plain score, there is usually a shorter or longer statement that describes what each score means, and the series of statements from lowest to highest constitutes a rating scale (Luoma, 2004: 59).
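To make the contrast between the two scale types concrete, here is a minimal sketch in Python. The analytic criterion names follow the IELTS-style list above and the holistic bands follow the Trinity-style A to D bands; the scores, the band chosen and the simple averaging rule are invented for illustration and do not reproduce any real exam's scoring procedure.

```python
# Illustrative only: a holistic scale reports one overall judgement,
# while an analytic scale reports a profile of criterion scores.
# Criterion names follow the slide; the scores and the averaging rule
# below are hypothetical, not the official IELTS procedure.

# Holistic: a single band chosen from verbal categories.
holistic_bands = ["D", "C", "B", "A"]   # lowest to highest
holistic_result = "B"                   # one overall impression

# Analytic: one score per criterion, reported as a profile.
analytic_profile = {
    "Fluency and coherence": 6,
    "Lexical resource": 7,
    "Grammatical range and accuracy": 6,
    "Pronunciation": 7,
}

overall = sum(analytic_profile.values()) / len(analytic_profile)
print("Holistic band:", holistic_result)
print("Analytic profile:", analytic_profile)
print("Overall (simple mean):", overall)  # 6.5 in this made-up case
```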

38 Holistic rating scales
Positive features: practical and fast; rating holistically may be more naturalistic.
Disadvantages: no useful diagnostic information (single score); not always easy to interpret (raters are not required to use the same criteria to arrive at a score).
Holistic scales express an overall impression of an examinee's ability in one score. When holistic scales are used, the raters may be asked to note several different features of the performances or to pay attention to the overall impression only. Holistic scales are practical for decision making because they give only one score, and they make rating quick because there is less to read and remember. They are flexible: they allow many combinations of strengths and weaknesses within a level. The other side of the coin is that they are not practical for diagnosing strengths and weaknesses, and differences between levels depend too much on quantifiers like 'many', 'a few' and 'few', or quality words like 'adequately' and 'well', and not enough on concrete differences (Luoma, 2004: 60-62).

39 Analytic rating scales
Positive features: can provide diagnostic information if scores are reported separately; potentially clear, explicit and detailed; usually more reliable (multiple scores); useful in training raters to focus on our construct; potentially useful in guiding learners.
Disadvantages: time-consuming; may overburden raters.
(Green, 2012)
Analytic scales contain a number of criteria, usually three to five, each of which has descriptors at the different levels of the scale. The scale forms a grid, and the examinees usually get a profile of scores, one for each of the criteria. The advantages are the detailed guidance they give to raters and the rich information they provide on specific strengths and weaknesses in examinee performances (Luoma, 2004: 68).

40 The role of an interviewer
Interrater/intrarater reliability.
The solution: training raters for
understanding the criteria for assessment;
agreement with other raters;
consistency of performance.
Reliability is usually defined as score consistency. Intrarater reliability, or internal consistency, means that raters agree with themselves, over a period of a few days, about the ratings that they give; interrater reliability means that different raters rate performances similarly (Luoma, 2004: 179). Threats to reliability can be eliminated through rater training and a good definition of the criteria.
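As a concrete illustration of what interrater consistency can mean, here is a minimal sketch in Python that computes two simple indicators, exact agreement and Pearson correlation, for a set of invented band scores from two raters. The data and function names are hypothetical; operational testing programmes typically use more elaborate statistics for this purpose.

```python
from statistics import mean

# Hypothetical band scores (0-9) given by two raters to the same ten candidates.
rater_a = [5, 6, 6, 7, 4, 8, 5, 6, 7, 5]
rater_b = [5, 6, 7, 7, 4, 7, 5, 6, 7, 6]

def exact_agreement(a, b):
    """Proportion of candidates to whom both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pearson(a, b):
    """Pearson correlation: do the raters rank the candidates similarly?"""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

print(f"Exact agreement: {exact_agreement(rater_a, rater_b):.2f}")  # 0.70 here
print(f"Correlation:     {pearson(rater_a, rater_b):.2f}")
```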

41 Washback effect

42 Washback Effect: The effect of testing on teaching and learning
Positive/negative washback:
positive: the test stimulates classroom teaching of important skills;
negative: a narrow focus on teaching just for the test.
Before summing up, it is advisable to elicit students' answers to the question 'What, in your opinion, are the positive and negative effects of speaking tests?'
Positive: as a result of a high-stakes test, teachers start teaching what is on the test, thus developing the skills that are being tested.
Negative: students can feel stressed before and disappointed after a test if the procedure for administering the test and the scores seem unreliable. This can cause distrust among students, their parents and decision makers.

