Presentation on theme: "Language Testing Forum 2013, Nottingham"— Presentation transcript:
1 Language Testing Forum 2013, Nottingham
Listening tests: past, present and future
John Field, CRELLA, University of Bedfordshire
2 A problematic skill
Difficult to test because it is an extremely individual operation in terms of both listener and input.
Internalised: takes place in the mind of the test taker.
Highly variable signal: variable at the levels of phoneme, word and speaker.
3 The value of a cognitive approach
It sheds light on what goes on in the mind of the test taker.
We need to know whether high-stakes tests actually test what they claim to test. Can a listening test, for example, accurately predict the ability of a test taker to study at an English-medium university?
At local level, we need to use tests to diagnose learner problems so that the tests can feed into learning. This is especially true of listening.
4 Cognitive validation asks…
Does a test elicit from test takers the kind of process that they would use in a real-world context?
In the case of listening, are we testing the kinds of process that listeners would actually use?
Or do the recordings and formats that we use lead test takers to behave differently from the way they would in real life?
5 Phases of listening (Field 2008, 2013)
Speech signal -> Words -> Meaning
Input decoding
Lexical search
Parsing
Meaning construction
Discourse construction
6 Issues of cognitive validity
A. To what extent do the processes elicited by a test resemble real-world processes?
B. To what extent are the processes elicited by a test comprehensive enough to represent the range of processes that make up a skill?
C. Are the processes finely enough calibrated to reflect what a listener is capable of at the target level?
7 The ghost of listening past: 1913-1974
8 Word identification
Tick the word you hear the examiner say:
[ ] hide  [ ] heard  [ ] hard  [ ] hoard
Test taker hears: I heard her telling him
Test taker chooses: A heard  B hurt  C hot  D hotel
Test taker hears: It’s hot all day long
[Lower Certificate in English, 1972, quoted in Weir 2013]
9 A cognitive perspective
Only taps into the lowest two levels of processing (phoneme recognition and lexical FORM).
The role of the phoneme as a perceptual unit has been much questioned. Processing is now viewed as taking place at multiple levels (including top-down word-level matches that overrule phoneme-level information: the veshtable effect).
And yet: we still use items based on minimal-pair phoneme perception in lower-level and young learner (YL) tests:
The porter said that the train leaves at
A B C D 5.50
10 Dictation
Fear seized him / in the woods. / At one moment / it seemed to him / that enemy soldiers / were watching him / from behind the trees, / crawling out of the bushes. / He ran blindly, paying no attention / to the path / until he was out of breath.
Lower Certificate in English, June 1945 (quoted in Weir, 2013)
The passage will be read three times. During the first reading the candidates will write nothing down. It will be read a second time by groups of words, as divided by bars on the printed copy. … After each group, a pause will be made to allow the candidates to write it down. All essential punctuation will be given by the examiner.
11 From a cognitive perspective
A classic divided attention task (writing vs speaking).
Conversion from one modality (speech) to another (writing).
Little resemblance to any real-life listening task.
Natural processing (Jarvella, 1971) entails assembling words in order to parse them, then erasing them once they have been converted into a piece of information. Dictation requires test takers to hang on to words beyond the end of the phrase / clause.
Encourages test takers to focus attention at word level. This reinforces a tendency among listeners at B1 and below to focus on discrete words rather than chunks.
12 And yet…
Dictation takes the spoken word as its point of departure, unlike today’s formats that rely heavily on written items.
Present-day tasks such as gap filling also entail divided attention effects.
Dictation taps into lexical segmentation, where the listener has to detect word boundaries in connected speech.
Conclusion. There might be value in including in lower-level tests the transcription of clips of authentic speech. Such tests would show:
a) the ability to segment words in connected speech
b) whether test takers can process words in chunks rather than just singly (a mark of progress towards B2 level)
13 The ghost of listening present: ‘comprehension’ in listening and reading
14 Listening test components
Recording
Recording as text
Format
Items
15 Recordings
Does the input impose similar listening demands to those of a real-world speaker?
16 Natural speech (Recording Level B2)
To what extent do these recordings resemble authentic everyday speech?
17 Some conclusions on studio recordings
Actors adapt their delivery to fit punctuation.
They pause regularly at the ends of clauses.
There are few hesitation pauses.
There is no overlap between speakers.
18 Solution: transcribe the speech as speech
M1: the long lunch hour has been replaced by the quick snack + according to a new survey ++ most people take just 30 minutes to eat in the middle of the day + many of us don’t even leave our desks
F: er I’m taking an hour today + but it’s normally sort of half an hour or 20 minutes
M2: I pop out for about ten minutes + get something to eat + and then go back to my desk
M1: a survey at the start of the year + found that only one per cent of people in Britain + regularly take a full 60 minute break + this is very different from forty years ago + when offices everywhere stopped work at one o’clock + people went out to lunch + and didn’t return until two
[loosely based on BBC Radio 4 broadcast]
19 Solution: specify speaker variables for item writers
Accent
Speech rate: speed and consistency
Pausing
Level and placing of focal stress
Number of speakers
Pitch of voice; familiarity of voice
Precision of articulation
20 Recording-as-text
Is the recording content at an appropriate level for the expertise of the listener?
Format
Does the task elicit processes which resemble those that a listener would use in a real-world listening event?
21 Recording
You hear a man and a woman talking about going to the gym. What does the man say about going to the gym?
A. It is too expensive for him.
B. It takes too much of his time.
C. It is too physically demanding.
(FCE Handbook, 2008: 68)
22 Recording as text
Woman: So that didn’t last long, did it? Two weeks going to the gym and you’re already talking about giving it up…
Man: Look, if you’re saying I’m not up to it, you’re wrong. I realise it’s very effective in working every muscle, and when I get started, it’s just like other sports. I don’t even mind feeling exhausted at the end. But, listen, you sort out your kit at home, lug it to the gym, queue to pay your entrance fee, then change and queue for the machines … when you could have been for a run straight from your home and then been free to get on with your life.
Woman: Well, I think you’re wrong and you should make the effort to carry on.
23 Recording as text 2
Woman: So that didn’t last long, did it? Two weeks going to the gym and you’re already talking about giving it up…
Man: Look, if you’re saying I’m not up to it, you’re wrong. I realise it’s very effective in working every muscle, and when I get started, it’s just like other sports. I don’t even mind feeling exhausted at the end. But, listen, you sort out your kit at home, lug it to the gym, queue to pay your entrance fee, then change and queue for the machines … when you could have been for a run straight from your home and then been free to get on with your life.
Woman: Well, I think you’re wrong and you should make the effort to carry on.
25 Recording as text
Test setters tend to base their tests on a written script which has not yet been recorded.
The linguistic criteria they employ rely heavily on lexical frequency and syntactic simplicity.
BUT in processing terms difficulty is often caused by:
a. the density of ideas and the complexity of the links between them
b. the perceptual saliency of phrases and clauses
26 Recording difficulty: cognitive criteria
How frequent is the vocabulary?
How complex is the grammar?
How familiar is the topic?
How long is the recording?
How dense are the idea units in the recording?
How complex are the connections between idea units?
How clearly structured is the overall line of argument?
How concrete or abstract are the points made?
27 Using conventional tasks
Provide items after a first playing of the recording and before a second. This ensures more natural listening, without preconceptions or advance information other than general context.
Keep items short. Loading difficulty on to items (especially MCQ ones) just biases the test in favour of reading rather than listening.
Favour tasks (e.g. multiple matching) that allow items to ignore the order of the recording and to focus on global meaning rather than local detail.
28 Do the items target a sufficiently wide range of levels of processing?
29 Five phases of listening (Field 2008)
Speech signal -> Words -> Meaning
Decoding
Word search
Parsing
Meaning construction
Discourse construction
30 Targets
An item in a test can target any of these levels:
Decoding: She caught the (a) (b) (c) (d) 5.50 train.
Lexical search: She went to London by …….
Factual information: Where did she go and how?
Meaning construction: Was she keen on going by train?
Discourse construction: What two reasons did she give for going by train?
31 Targeting levels of listening
Test takers at proficiency level B1 and below focus heavily on word recognition and have problems in processing language in chunks. In tests at these levels, it may be desirable to focus items mainly on the first three areas.
Higher-level tests should particularly target meaning representation and discourse representation.
32 Information handling
But they don’t.
Reason 1: Item writers tend to focus on discrete points of information. They do not target the connections between them. In real life, the listener has to build an information structure.
Reason 2: It is the item writer who decides what is/is not important in a recording. In the real world, the listener has to identify major and minor points and ignore irrelevant points.
33 Structure building (Gernsbacher, 1990)
Skilled listeners construct a hierarchical representation of a recording.
34 Structure building
Unskilled listeners focus their attention at local level.
They build a linear structure.
35 A structure building task
Three types of pollution
1. ……………………
   a. Example: ……………………
   b. Solution: ……………………
2. ……………………
   a. Cause: ……………………
   b. Result: Climate change
3. ……………………
   a. Result: ……………………
   b. Solution: ……………………
36 The inflexibility of high-stakes tests
Large-scale high-stakes tests have major constraints which prevent them from testing listening in a way that fully represents the skill:
Reliability and ease of marking
Highly controlled test methods, using traditional formats that the candidate knows
Little attention possible to individual variation or alternative answers
37 Advantages of more local tests and tasks
Smaller-scale tests afford the possibility of testing a wider range of listening processes with:
More open-ended questions
More scope for testing information handling
Marking on an individual basis
Possible acceptance of alternative answers
38 Computer delivery
Computer-delivered tests offer the possibility of:
Controlling timing
Providing a first play of the recording before items become visible
Monitoring responses to direct the test taker towards a particular level of difficulty
Exploiting oral questions (including oral MCQ with short options)
39 An important issue for the future…
Testers need to find a means of validating listening tests by means of evidence external to the test.
This would entail establishing the listening proficiency of a listener by subjective assessment of performance.
Methods might include ‘Listen and speak’ activities or the separate assessment of listening performance within a speaking test.
40 References
Field, J. (2008) Listening in the Language Classroom. Cambridge: Cambridge University Press.
Field, J. (2013) Cognitive validity. In Geranpayeh, A. & Taylor, L. (eds.) Examining Listening. Cambridge: Cambridge University Press.
Weir, C.J. (2013) Measured Constructs. Cambridge: Cambridge University Press.
41 Thanks for listening
email@example.com
University of Bedfordshire