Presentation on theme: "LSP TESTING: Good Practice Procedure"— Presentation transcript:
1. LSP TESTING: Good Practice Procedure
Designing LSP tests: Pointers for good practice
2. Executive summary
- Ideally, a test is written by a small team of writers and reviewers. Subject specialists should be consulted for specific knowledge.
- Before a test is ready for administration, it passes through four phases of development, during which feasibility, validity, authenticity and reliability are determined.
- When composing an LSP test, it is important to make sure that the skill-based tasks are representative of professional practice.
- Test tasks are ideally composed of a rubric and subject-specific input materials, which contribute to authenticity.
- Quality control is an essential step in the test-writing process. A number of qualitative and quantitative procedures are available.
3. Contents
- Test writing
- Test timing
- Test content
- Test tasks
- Test analysis
- Further reading
4. Test writers
5. Test writing
The people involved in test development:
- Test writers: Ideally, two or more people are responsible for designing, writing and revising the test.
- Test reviewers: One or two reviewers give feedback to the writers.
- Subject specialists: They work in the LSP subject field and give input on test content and goals.
- Representative end users: They resemble the actual test-taking population as closely as possible and make up the sample population in test piloting.
6. Test timing
7. Test timing
Phase 1: Planning
- Test writers decide on test goals, test format and test content.
Phase 2: Design
- Test writers collect material and compose a first draft.
- Test reviewers evaluate and rewrite the draft.
Phase 3: Development
- The final draft is piloted with a group of representative testees.
- The final draft is adjusted based on qualitative and quantitative conclusions from the pilot.
Phase 4: Live test
- The test is ready for use.
8. Test timing, Phase 1: Planning
Test writers decide on test goals, test format and test content.
- Goal: Ideally, the test goals should be linked to professional reality. To ensure this, subject specialists might be consulted. Getting an idea of their routine language tasks may help in drawing up the test goals. Research has shown that non-representative tasks cause irritation or uncertainty among test takers.
- Format: Although computer-based tests are as reliable as paper-based ones, their limitations and possibilities differ from those of their paper-based counterparts.
- Task type: Depending on the language level, the test goals, the format and the time available, various task types can be used.
9. Test timing, Phase 2: Design
Test writers collect material and compose a first draft. Test reviewers evaluate and rewrite the draft.
- Collecting material: In collaboration with subject specialists, the writers collect authentic and representative material.
- First draft: Ideally, the first draft contains a large number of tasks and task types, so that the reviewers can select which tasks are taken on to the next phase.
- Evaluation: The reviewers compare the draft to the test goals they had in mind. They check the test for validity (i.e. do we test what we want to test?) and authenticity (i.e. does this test represent realistic interactions and situations?) and suggest revisions.
10. Test timing, Phase 3: Development
The final test draft is now piloted: a group of representative end users take the test in conditions similar or identical to the live test setting. Ideally, 30 to 50 respondents are used. This pilot will offer information concerning the…
- authenticity: to what extent does the test include situations / interactions that are meaningful or representative for the test taker?
- validity: to what extent does the test test what it means to test?
- reliability: to what extent do test scores reflect actual language ability?
- feasibility: this concerns practicalities such as timing and rating
… of the test.
Note: If representative end users cannot be reached, a group of colleagues can be used instead. In this case, the test’s reliability cannot be determined, but the feedback will be useful nonetheless.
11. Test timing, Phase 4: Live tests
When the test has been adjusted based on the conclusions from the third phase, it can move on to the final phase: live testing.
The test is ready for use. If any remarks still arise during administration, they are reported to the test writers, who keep them in mind for later versions.
12. Test content
13. Test content
The specificity of content in LSP tests is the subject of much debate among researchers.
One side of the spectrum states that both content and tasks cannot be too specific, whereas the other extreme argues that LSP testing does not make sense, since all language has a specific purpose.
Example: A test of English for biomedical science should not use the same material as a test of English for the humanities, even if the required language proficiency is identical.
Both sides of the debate do agree on the importance of face validity (i.e. how test takers perceive a test and how representative they feel it is).
Interviews with representative end users also show the importance of using familiar material dealing with familiar topics.
14. Test content
Example: Whereas writing a reflective essay might be a representative task within the humanities, it is alien to the biomedical sciences.
Test takers will respond negatively to tasks they perceive as non-representative.
For skill-based exercises, the importance of authenticity cannot be overstressed.
Make sure that both the content and the context of the task relate to the professional reality of the specific purpose.
- Task content: Ask the students to write, speak or read about something within their professional field of expertise.
- Task context: Clarify the context in which a communicative act takes place as accurately as possible. If you ask students to present at a conference, state which one and give ample information regarding the setting and audience.
15. Test content
For knowledge-related exercises, such as grammar or vocabulary exercises, the context does not appear to matter as much as for skill-based tasks.
Face validity is an important element here as well, though: the texts, examples and stimuli should be related to the test takers’ field of expertise.
Quote: “LSP testing cannot be about testing for subject specific knowledge. It must be about testing for the ability / abilities to manipulate language functions appropriately in a wide variety of ways. […] No doubt for face validity reasons, the stimuli in such tests will be field related, however.” (Davies, 2001)
16. Test content
The role of subject specialists
Using the expertise of subject specialists is a much-contested theme in LSP testing research.
In any case, when designing a test for specific purposes in a field outside your expertise, it is always useful to get in touch with people who are familiar with that field. They will be able to tell you which tasks and texts are representative and which aren’t.
17. Test tasks
18. Task types
There is a myriad of possible task types that can be used in LSP testing. Since this overview is restricted to online testing with Curios, however, the next pages will only cover those task types that are available through Curios.
To ensure task completeness, Bachman & Palmer (1996) and Douglas (2001) suggest including the following elements in each task.
Rubric: “Characteristics that specify how test takers are expected to proceed in taking the test.” (Bachman, 1990)
- Objective: The goal of the test task, e.g. to write a text summary.
- Procedure for responding: Information on how the test taker is expected to respond (e.g. checking boxes, writing full sentences...).
- Structure: Information on the number of tasks in the communicative event, their importance, and the degree of distinction among them.
- Time allotment: The testee is told how much time can be spent on the task.
- Evaluation criteria: The explicit information concerning the criteria that will be used to judge the performance. (Douglas, 2000)
19. Task types
Input: Input is the specific purpose material in the TLU (target language use) situation that language users process and respond to.
- Prompt: Contextual information that clarifies the setting in which a communicative event takes place.
- Input data: The data to be processed during a (communicative) task. Here, the degree of authenticity of the material matters a great deal.
  - Situational authenticity: to what extent does the situation in the test represent reality?
  - Interactional authenticity: to what extent does the communicative act from the test correspond to reality?
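The rubric and input elements listed above can be used as a completeness checklist while drafting a task. The following is a minimal sketch in Python; it is not part of Curios or of the cited frameworks, and the class names, field names and sample task are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Rubric:
    # One field per rubric element named in the text; empty means "not yet written".
    objective: str = ""
    procedure_for_responding: str = ""
    structure: str = ""
    time_allotment_minutes: int = 0
    evaluation_criteria: str = ""


@dataclass
class TaskInput:
    prompt: str = ""
    input_data: str = ""


@dataclass
class LspTask:
    rubric: Rubric
    task_input: TaskInput

    def missing_elements(self):
        """Return the names of rubric/input elements that are still empty."""
        missing = []
        for part in (self.rubric, self.task_input):
            for field in fields(part):
                if not getattr(part, field.name):
                    missing.append(field.name)
        return missing


# Hypothetical draft task: time allotment and evaluation criteria not yet filled in.
draft = LspTask(
    rubric=Rubric(
        objective="Write a summary of a research abstract",
        procedure_for_responding="Write full sentences in the answer box",
        structure="One task, equally weighted with the other tasks",
    ),
    task_input=TaskInput(
        prompt="You are preparing a literature overview for your lab meeting",
        input_data="Abstract of a biomedical journal article",
    ),
)
print(draft.missing_elements())
```

A reviewer could run such a check on every task in a first draft before the evaluation step, so that incomplete rubrics are caught before piloting.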
20. Task types
Bachman & Palmer (1996) suggest a framework of analysis which considers the following:
Also according to Bachman & Palmer (1996), the rater’s manual should include at least the following:
- Expected response: This refers to what the test developer intends the test takers to do in response to the rubric and input. If the actual response and the intended response do not generally match, the test task is most probably unclear.
- Assessment criteria: The criteria and procedures by which to judge a language performance, and which scores to derive from it.
Example 3: assessment criteria (taken from TOEFL iBT)
21. Task types and Curios
Curios is the Ghent University online testing environment. It can be accessed through Minerva and Zephyr and allows for the following multiple choice task types:
- Single response: A multiple choice task where only one option is possible.
- Multiple response: A multiple choice task where more than one option is possible.
- True/False: The student is given a statement and should indicate whether it is correct or not.
- Matching: A multiple choice task in which the test taker combines two or more items.
- Hotspot: The test taker digitally pinpoints or highlights areas on a picture or in a text.
22. Task types and Curios
Curios also allows for the following open answer task types:
- Text/numeric: As an answer to a question, students fill in words, short sentences or numbers.
- Cloze: In a running text, one or more words or numbers have been deleted. It is up to the test taker to fill in the gaps.
- C-Cloze: A cloze test which includes the first letter of each deleted word.
- Extended: Students can be asked to reply to an open question or to produce longer answers.
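To make the difference between a cloze and a C-cloze item concrete, here is a minimal sketch in Python that builds both from the same running text. It does not use or imitate Curios; the function name make_cloze, the rule of deleting every fourth word and the sample sentence are assumptions made for illustration only. In practice, test writers usually choose the gaps by hand.

```python
def make_cloze(text, every=4, keep_first_letter=False):
    """Replace every `every`-th word with a gap.

    With keep_first_letter=True, the first letter of each deleted word
    is kept, as in the C-cloze type described above.
    Returns the gapped text and the list of deleted words (the answer key).
    """
    gapped, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        core = word.rstrip(".,;:!?")      # keep trailing punctuation outside the gap
        trailing = word[len(core):]
        if position % every == 0 and core:
            answers.append(core)
            hint = core[0] if keep_first_letter else ""
            gapped.append(hint + "____" + trailing)
        else:
            gapped.append(word)
    return " ".join(gapped), answers


# Hypothetical input text from a biomedical context.
source = "The patient was given a low dose of the new drug."
cloze_text, cloze_key = make_cloze(source)
c_cloze_text, c_cloze_key = make_cloze(source, keep_first_letter=True)
print(cloze_text)
print(c_cloze_text)
```

Both variants share the same answer key; only the amount of support given to the test taker differs.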
23. Getting started with Curios
- Step 1: Accessing Curios. Access Curios via Zephyr or Minerva.
- Step 2: “Nieuwe vragenreeks” (new question series). Each new test starts with this.
- Step 3a: “Nieuwe vraag” (new question), MC. Creating a multiple choice question.
- Step 3b: “Nieuwe vraag” (new question), cloze. Creating a cloze question.
24. Getting started with Curios
- Step 4: Double-checking the scoring. Always double-check the questions using “geavanceerde editeermethode” (advanced editing method).
- Step 5: Publishing the test. Students can only access tests through Minerva or Zephyr.
- Step 6: Taking the test. Try a test before making it public.
- Step 7: Checking the results. Have a look at the scores.
25. Test analysis
26. Test analysis
Determining the quality of a language test should take place in the third phase of development, but it should also be an ongoing concern of test developers.
When the test has been piloted, qualitative and quantitative analyses can help to improve the reliability and validity of the live test.
In the case of LSP tests, three concepts are of vital importance:
- RELIABILITY: reliable scores reflect one’s ability
- VALIDITY: valid questions test what is intended to be tested
- AUTHENTICITY: authentic tests reflect real-life interactions and situations
27. Test analysis: Reliability
An efficient way to check a test’s reliability is to perform an Item Reliability Analysis. This statistical procedure indicates the discriminating potential of a test item. In other words, it checks to what degree more able students get a hard item right and less able students don’t.
The graph on the right shows the statistical data resulting from a reliability analysis. The column showing the Corrected Item-Total Correlation indicates the reliability of each item.
Items scoring within the -.3 ↔ .3 range are considered unreliable and should be removed or rewritten.
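For readers who want to see what the Corrected Item-Total Correlation measures, the sketch below computes it by hand in Python: each item is correlated with the total score on all other items, and items falling inside the -.3 to .3 range are flagged. The score matrix is invented for illustration, the function names are hypothetical, and treating an item with no variance as a correlation of 0 is a simplifying assumption of this sketch, not something prescribed by the source or by SPSS.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance
    (an assumption of this sketch: such an item cannot discriminate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores):
    """scores: one row per test taker, one 0/1 column per item.
    Each item is correlated with the total of the *other* items."""
    n_items = len(scores[0])
    correlations = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        correlations.append(pearson(item, rest))
    return correlations


def flag_unreliable(correlations, threshold=0.3):
    """Indices of items whose correlation lies strictly between -threshold and +threshold."""
    return [i for i, r in enumerate(correlations) if -threshold < r < threshold]


# Invented pilot data: 6 test takers, 4 items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
item_correlations = corrected_item_total(scores)
print(flag_unreliable(item_correlations))
```

With only six invented respondents the figures are purely illustrative; a real pilot with a group of representative end users gives far more stable estimates.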
28. Test analysis
Reliability: How to perform an Item Reliability Analysis
Enter the results of all test takers on all test items in SPSS (available on Athena). The easiest way to do this is to assign a score of 1 to a correctly answered question and 0 to an incorrect answer.
Next, click Analyze – Scale – Reliability Analysis and select the items you want to analyse.
Do not forget to check the box labelled “scale if item deleted”.
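The 1/0 coding step described above can be automated when the raw answers are available electronically. The following Python sketch scores raw answers against an answer key and writes the result as CSV text, a format SPSS can import. The answer key, the responses and the item column names are invented for illustration, and the function names are hypothetical.

```python
import csv
import io


def score_responses(responses, answer_key):
    """Turn raw answers into 1 (correct) or 0 (incorrect) for every item."""
    return [
        [1 if given == correct else 0 for given, correct in zip(row, answer_key)]
        for row in responses
    ]


def to_csv(score_matrix):
    """Render the 0/1 matrix as CSV text with one column per item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f"item{i + 1}" for i in range(len(score_matrix[0]))])
    writer.writerows(score_matrix)
    return buffer.getvalue()


# Invented example: three test takers, four multiple choice items.
answer_key = ["b", "a", "d", "c"]
responses = [
    ["b", "a", "d", "a"],
    ["b", "c", "d", "c"],
    ["a", "a", "b", "c"],
]
matrix = score_responses(responses, answer_key)
csv_text = to_csv(matrix)
print(csv_text)
```

Each row of the output is one test taker and each column one item, which is the layout the reliability analysis expects.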
29. Test analysis: Validity
Validity is the extent to which scores on a test enable inferences to be made which are appropriate, meaningful and useful, given the purpose of the test (i.e. does the test measure what it intends to measure?).
There are various subclassifications of validity. The most significant ones in this LSP testing project are:
- Construct validity: A test has construct validity if scores reflect a theory about a construct. It could be predicted, for example, that two valid tests of listening comprehension would rank learners in the same way, but each would have a weaker relationship with scores on a test of grammatical competence.
- Content validity: A test is said to have content validity if the items or tasks of which it is made up constitute a representative sample of items or tasks for the area of knowledge or ability to be tested. These are often related to a syllabus or course.
- Face validity: This refers to the extent to which a test appears to candidates, or those choosing it on behalf of candidates, to be an acceptable measure of the ability they wish to measure. This is a subjective judgement rather than one based on any objective analysis of the test.
30. Test analysis
Determining construct validity requires a thorough knowledge of the construct to be tested.
A construct is a theoretical concept related to linguistic knowledge, e.g. listening comprehension, metacognition or pragmatic competence.
- E.g. if you mean to test listening skills and ask students to write an essay about an audio sample, are you then testing receptive or productive skills?
- E.g. if you ask students to type an essay within thirty minutes, are you then testing writing skills or typing speed?
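The prediction that two valid listening tests rank learners in the same way, while each relates more weakly to a grammar test, can be checked with a rank correlation. The sketch below uses Spearman's rank correlation, which is one common choice and not a method named in the source. All scores are invented and the function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Rank values from 1 (lowest) upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes both variables have some variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank orders."""
    return pearson(ranks(x), ranks(y))


# Invented scores for the same six learners on three tests.
listening_a = [55, 62, 70, 48, 81, 66]
listening_b = [58, 66, 75, 50, 85, 64]
grammar = [70, 52, 60, 65, 72, 50]

rho_listening = spearman(listening_a, listening_b)
rho_grammar = spearman(listening_a, grammar)
print(round(rho_listening, 2), round(rho_grammar, 2))
```

If the correlation between the two listening tests were no higher than the correlation with the grammar test, that would be a reason to question whether both listening tests measure the same construct.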
31. Test analysis
The most effective way of determining an LSP test’s content validity is to interview subject specialists. Various interview types are possible to determine whether the test tasks correspond to reality.
Note that the interviewer should get the chance to practise his/her interview skills beforehand. Ideally, the pilot settings will resemble the actual conditions as closely as possible.
- Unstructured: The success of this kind of interview depends on the interaction between the researcher and the respondent. There is no fixed interview schedule, but rather a number of themes that are to be addressed.
- Semi-structured: The researcher follows a preset schedule. It is possible, however, to deviate from this when interesting issues arise.
- Structured: The interviewer goes through a fixed series of written questions without deviation. This type of interview closely resembles a questionnaire.
- One-on-one: This kind of interview allows the researcher to zoom in on the views of individual respondents.
- Group: The advantage of interviewing larger numbers at once is that group interactions might spark observations that would otherwise have gone unnoticed.
32. Test analysis
Since face validity is a subjective measure imposed by the test takers, only test takers can judge it. During or after the pilot test, ask the representative end users to give a verbal report. There are various ways of going about this.
- Talk aloud: Informants voice their thoughts while taking the test.
- Think aloud: Informants say what they are thinking and also provide non-verbal information, such as physical movements.
- Concurrent: The verbal report is given in real time.
- Retrospective: The verbal report is given afterwards.
- Mediated: The researcher occasionally intervenes.
- Non-mediated: The researcher does not intervene.
33. Test analysis: Authenticity
If you have interviewed subject specialists and representative end users have given a verbal report, you will also have a good understanding of a test’s situational authenticity (the extent to which a test / task represents real situations) and interactional authenticity (the extent to which a test / task represents realistic conversational interactions).
Note that material intended for L1 users does not always appear interactionally authentic to test takers, since it may depict interactions that are meaningless to them.
E.g. the abovementioned example of the OET of English for veterinary sciences is not representative of the professional practice of researchers within the field of veterinary sciences. A doctor-patient dialogue is, however, very relevant for students who would like to start working in a practice later.
34. Further reading
35. Further reading
For more information on LSP testing, please consult:
- ABRASKEVICIUTE, Ausra et al. (2003). Handbook of LSP Examinations. Tut Press.
- BACHMAN, Lyle F. (2000). “Modern Language Testing at the Turn of the Century: assuring that what we count counts”. Language Testing. 17(1)
- BROADFOOT, Patricia and Paul Black. (2004). “Redefining Assessment? The first ten years of Assessment in Education”. Assessment in Education. 11(1)
- CLAPHAM, C. (2000). “Assessment for academic purposes: where next?”. System. 28(4)
- DAVIES, A. (2001). “The logic of testing Languages for Specific Purposes”. Language Testing. 18(2)
- DOUGLAS, Dan. (2000). Assessing Languages for Specific Purposes. Cambridge University Press.
- DOUGLAS, Dan. (2001). “Language for Specific Purposes assessment criteria: where do they come from?”. Language Testing. 18(2)
- DOVEY, T. (2006). “What purposes, specifically? Re-thinking purposes and specificity in the context of the ‘new vocationalism’”. English for Specific Purposes. 25(4)
- HYLAND, K. (2002). “Specificity revisited: how far should we go now?”. English for Specific Purposes. 21(4)
- ROEVER, Carsten. (2001). “Web-Based Language Testing”. Language Learning & Technology. 5(2)