Designing LSP tests LSP TESTING: Good Practice Procedure Pointers for good practice
Executive summary
Ideally, a test is written by a small team of writers and reviewers. Subject specialists should be consulted for specific knowledge. Before a test is ready for administration, it passes through four phases of development, during which its feasibility, validity, authenticity and reliability are determined. When composing an LSP test, it is important to make sure that the skill-based tasks are representative of professional practice. Test tasks ideally consist of a rubric and subject-specific input materials, which contribute to authenticity. Quality control is an essential step in the writing process, and a number of qualitative and quantitative procedures are available for it.
Contents
- Test writing
- Test timing
- Test content
- Test tasks
- Test analysis
- Further reading
Test writers
Test writing
The people involved in test development:
- Test writers: ideally, two or more people are responsible for designing, writing and revising the test.
- Test reviewers: one or two reviewers give feedback to the writers.
- Subject specialists: work in the LSP subject field and give input on test content and goals.
- Representative end users: resemble the actual test-taking population as closely as possible and make up the sample population in test piloting.
Test timing
Phase 1: Planning
- Test writers decide on test goals, test format and test content.
Phase 2: Design
- Test writers collect material and compose a first draft.
- Test reviewers evaluate and rewrite the draft.
Phase 3: Development
- The final draft is piloted with a group of representative test takers.
- The final draft is adjusted based on qualitative and quantitative conclusions from the pilot.
Phase 4: Live test
- The test is ready for use.
Test timing
Phase 1: Planning
Test writers decide on test goals, test format and test content.
- Goal: Ideally, the test goals should be linked to professional reality. To ensure this, subject specialists may be consulted: getting an idea of their routine language tasks may help in drawing up the test goals. Research has shown that non-representative tasks cause irritation or uncertainty among test takers.
- Format: Although computer-based tests are as reliable as paper-based ones, their limitations and possibilities differ from those of their paper-based counterparts.
- Task type: Depending on the language level, the test goals, the format and the time available, various task types can be used.
Test timing
Phase 2: Design
Test writers collect material and compose a first draft. Test reviewers evaluate and rewrite the draft.
- Collecting material: In collaboration with subject specialists, the writers collect authentic and representative material.
- First draft: Ideally, the first draft contains a large number of tasks and task types, so that the reviewers can select which tasks are carried forward to the next phase.
- Evaluation: The reviewers compare the draft to the intended test goals. They check the test for validity (i.e. do we test what we want to test?) and authenticity (i.e. does this test represent realistic interactions and situations?) and suggest revisions.
Test timing
Phase 3: Development
The final test draft is now piloted: a group of representative end users take the test in conditions similar or identical to the live test setting. Ideally, 30 to 50 respondents are used. The pilot provides information on the test's:
- authenticity: to what extent does the test include situations or interactions that are meaningful or representative for the test taker?
- validity: to what extent does the test measure what it is meant to measure?
- reliability: to what extent do test scores reflect actual language ability?
- feasibility: this concerns practicalities such as timing and rating.
Note: If representative end users cannot be reached, a group of colleagues can be used instead. In this case the test's reliability cannot be determined, but the feedback will be useful nonetheless.
Test timing
Phase 4: Live test
Once the test has been adjusted based on the conclusions from the third phase, it moves on to the final phase: live testing. The test is ready for use. Any remarks that still arise during administration are reported to the test writers, who take them into account for later versions.
Test content
The specificity of content in LSP tests is the subject of much debate among researchers. One end of the spectrum holds that content and tasks cannot be too specific, whereas the other extreme argues that LSP testing does not make sense, since all language has a specific purpose.
Example: A test of English for biomedical science should not use the same material as a test of English for the humanities, even if the required language proficiency is identical.
Both sides of the debate agree on the importance of face validity (i.e. how test takers perceive a test and how representative they feel it is). Interviews with representative end users also point to the importance of using familiar material dealing with familiar topics.
Test content
For skill-based exercises, the importance of authenticity cannot be overstressed. Make sure that both the content and the context of the task relate to the professional reality of the specific purpose.
Example: Whereas writing a reflective essay might be a representative task within the humanities, it is alien to the biomedical sciences. Test takers will respond negatively to tasks they perceive as non-representative.
- Task content: Ask the students to write, speak or read about something within their professional field of expertise.
- Task context: Clarify the context in which a communicative act takes place as accurately as possible. If you ask students to present at a conference, state which one and give ample information about the setting and audience.
Test content
For knowledge-related exercises, such as grammar or vocabulary exercises, the context does not appear to matter as much as for skill-based tasks. Face validity is an important element here as well, though: the texts, examples and stimuli should be related to the test takers' field of expertise.
Quote: LSP testing cannot be about testing for subject specific knowledge. It must be about testing for the ability / abilities to manipulate language functions appropriately in a wide variety of ways. […] No doubt for face validity reasons, the stimuli in such tests will be field related, however. (Davies, 2001)
Test content
The role of subject specialists
Using the expertise of subject specialists is a much-contested theme in LSP testing research. In any case, when designing a test for specific purposes in a field outside your own expertise, it is always useful to get in touch with people who are familiar with the specific purpose. They will be able to tell you which tasks and texts are representative and which aren't.
Test tasks
Task types
There is a myriad of possible task types that can be used in LSP testing. Since this overview is restricted to online testing with Curios, however, the next pages only cover the task types that are available through Curios. To ensure task completeness, Bachman & Palmer (1996) and Douglas (2001) suggest including the following elements in each task.
Rubric: characteristics that specify how test takers are expected to proceed in taking the test (Bachman, 1990).
- Objective: the goal of the test task, e.g. to write a text summary.
- Procedure for responding: information on how the test taker is expected to respond (e.g. checking boxes, writing full sentences...).
- Structure: information on the number of tasks in the communicative event, their importance, and the degree of distinction among them.
- Time allotment: the test taker is told how much time can be spent on the task.
- Evaluation criteria: the explicit information concerning the criteria that will be used to judge the performance (Douglas, 2000).
Task types
Input: the specific purpose material in the target language use (TLU) situation that language users process and respond to.
- Prompt: contextual information that clarifies the setting in which a communicative event takes place.
- Input data: the data to be processed during a (communicative) task. Here, the degree of authenticity of the material matters a great deal.
 - Situational authenticity: to what extent does the situation in the test represent reality?
 - Interactional authenticity: to what extent does the communicative act in the test correspond to reality?
Task types
Bachman & Palmer (1996) suggest a framework of analysis which considers the following:
- Expected response: what the test developer intends the test takers to do in response to the rubric and input. If the actual response and the intended response do not generally match, the test task is most probably unclear.
- Assessment criteria: the criteria and procedures by which a language performance is judged, and the scores to be derived from it.
Also according to Bachman & Palmer (1996), the rater's manual should include at least the following:
Example 3: assessment criteria (taken from TOEFL iBT)
Task types and Curios
Curios is the Ghent University online testing environment. It can be accessed through Minerva and Zephyr and allows for the following multiple choice task types:
- Single response: a multiple choice task where only one option can be selected.
- Multiple response: a multiple choice task where more than one option can be selected.
- True/False: the student is given a statement and should indicate whether it is correct or not.
- Matching: a multiple choice task in which the test taker combines two or more items.
- Hotspot: the test taker digitally pinpoints or highlights areas on a picture or in a text.
Task types and Curios
Curios also allows for the following open answer task types:
- Text/numeric: as an answer to a question, students fill in words, short sentences or numbers.
- Cloze: in a running text, one or more words or numbers have been deleted. It is up to the test taker to fill in the gaps.
- C-Cloze: a cloze test which includes the first letter of each deleted word.
- Extended: students can be asked to reply to an open question or to produce longer answers.
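The difference between a cloze and a C-cloze item can be illustrated with a minimal Python sketch. It is not part of Curios: the function name make_cloze, the choice to delete every n-th word and the gap marker are assumptions made purely for illustration.

```python
import re

def make_cloze(text, every=5, keep_first_letter=False, gap="_____"):
    """Turn running text into a cloze (or C-cloze) item.

    Hypothetical helper, not a Curios function. Every `every`-th word is
    replaced by a gap; with keep_first_letter=True the first letter of the
    deleted word is kept, as in a C-cloze. Returns (gapped_text, answers).
    """
    answers = []
    out = []
    for position, token in enumerate(text.split(), start=1):
        # Separate a simple word from surrounding punctuation, e.g. "drug."
        match = re.fullmatch(r"(\W*)(\w+)(\W*)", token)
        if position % every == 0 and match:
            lead, word, trail = match.groups()
            answers.append(word)
            shown = (word[0] if keep_first_letter else "") + gap
            out.append(lead + shown + trail)
        else:
            # Tokens with inner punctuation (e.g. hyphenated words) are left intact.
            out.append(token)
    return " ".join(out), answers


sentence = "The patient was given a small dose of the new drug yesterday."
cloze_text, cloze_answers = make_cloze(sentence, every=5)
c_cloze_text, _ = make_cloze(sentence, every=5, keep_first_letter=True)
print(cloze_text)    # The patient was given _____ small dose of the _____ drug yesterday.
print(c_cloze_text)  # The patient was given a_____ small dose of the n_____ drug yesterday.
```

In a real test the words to delete would normally be chosen by the test writers with the test goals in mind rather than at a fixed interval.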
Getting started with Curios
- Step 1: Accessing Curios. Access Curios via Zephyr or Minerva.
- Step 2: Nieuwe vragenreeks (new question series). Each new test starts with this.
- Step 3a: Nieuwe vraag (new question, MC). Creating a multiple choice question.
- Step 3b: Nieuwe vraag (new question, cloze). Creating a cloze question.
Getting started with Curios
- Step 4: Double checking the scoring. Always double check the questions using the geavanceerde editeermethode (advanced editing method).
- Step 5: Publishing the test. Students can only access tests through Minerva or Zephyr.
- Step 6: Taking the test. Try a test before making it public.
- Step 7: Checking the results. Have a look at the scores.
Test analysis
Determining the quality of a language test should take place in the third phase of development, but it should also be a persistent concern of test developers. When the test has been piloted, qualitative and quantitative analyses can help to improve the reliability and validity of the live test. In the case of LSP tests, three concepts are of vital importance:
1. RELIABILITY: reliable scores reflect one's ability
2. VALIDITY: valid questions test what is intended to be tested
3. AUTHENTICITY: authentic tests reflect real-life interactions and situations
Test analysis
Reliability
An efficient way to check a test's reliability is to perform an Item Reliability Analysis. This statistical procedure indicates the discriminating potential of a test item. In other words, it checks to what degree able students get a hard item right and less able students don't. The graph on the right shows the statistical data resulting from a reliability analysis. The column showing the Corrected Item-Total Correlation indicates the reliability of each item. Items scoring within the -.3 to .3 range are considered unreliable and should be removed or rewritten.
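For readers without access to SPSS, the Corrected Item-Total Correlation can be approximated in a few lines of Python. The sketch below correlates each item with the total score on all other items and flags items inside the -.3 to .3 range mentioned above. The function names and the tiny data set are hypothetical and serve only as an illustration.

```python
from math import isnan, sqrt

def pearson(x, y):
    """Pearson correlation of two equally long lists; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)

def corrected_item_total(scores):
    """scores: one row per test taker, one 0/1 score per item.

    For each item, correlate the item score with the total score on the
    remaining items (the item itself is left out, hence "corrected").
    """
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        result.append(pearson(item, rest))
    return result

def flag_weak_items(scores, threshold=0.3):
    """Indices of items whose correlation lies inside (-threshold, threshold)
    or cannot be computed (e.g. everybody answered the item correctly)."""
    return [j for j, r in enumerate(corrected_item_total(scores))
            if isnan(r) or abs(r) < threshold]


# Hypothetical pilot data: 4 test takers, 3 items (1 = correct, 0 = incorrect).
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print([round(r, 2) for r in corrected_item_total(pilot)])  # [0.71, 0.71, 0.0]
print(flag_weak_items(pilot))                              # [2]
```

The sketch follows the rule on the slide literally. With a real pilot of 30 to 50 respondents, the same functions can be applied to the full score matrix.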
Test analysis
Reliability: How to perform an Item Reliability Analysis
Enter the results of all test takers on all test items in SPSS (available on Athena). The easiest way to do this is to assign a score of 1 to a correctly answered question and 0 to an incorrect answer. Next, click Analyze > Scale > Reliability analysis and indicate the items you want to analyse. Do not forget to check the box labelled "scale if item deleted".
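The "scale if item deleted" output in SPSS also reports Cronbach's alpha if an item is deleted, which shows whether removing an item would make the scale more consistent. A minimal Python sketch of that calculation follows, using the same 1/0 scoring as above. The function names and the example data are hypothetical, and the code is an illustration rather than a replacement for the SPSS procedure.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (one row per test taker,
    one 0/1 score per item). Returns NaN if the total scores do not vary."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        return float("nan")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_item_deleted(scores):
    """Alpha of the remaining items after dropping each item in turn."""
    k = len(scores[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in scores])
            for j in range(k)]


# Hypothetical pilot data: 4 test takers, 3 items (1 = correct, 0 = incorrect).
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(round(cronbach_alpha(pilot), 2))                     # 0.6
print([round(a, 2) for a in alpha_if_item_deleted(pilot)]) # [0.0, 0.0, 1.0]
```

In this toy example alpha rises from 0.6 to 1.0 when the third item is dropped, which marks that item as a candidate for removal or rewriting.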
Test analysis
Validity: the extent to which scores on a test enable inferences to be made which are appropriate, meaningful and useful, given the purpose of the test (i.e. does the test measure what it intends to measure?). There are various subclassifications of validity. The most significant ones in this LSP testing project are:
- Construct validity: a test has construct validity if scores reflect a theory about a construct. It could be predicted, for example, that two valid tests of listening comprehension would rank learners in the same way, but each would have a weaker relationship with scores on a test of grammatical competence.
- Content validity: a test is said to have content validity if the items or tasks of which it is made up constitute a representative sample of items or tasks for the area of knowledge or ability to be tested. These are often related to a syllabus or course.
- Face validity: the extent to which a test appears to candidates, or those choosing it on behalf of candidates, to be an acceptable measure of the ability they wish to measure. This is a subjective judgement rather than one based on any objective analysis of the test.
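The prediction about two listening tests ranking learners in the same way can be checked with a rank correlation. The Python sketch below computes Spearman's rho; this is one possible check and is not prescribed by the source. The function names and all scores are invented for illustration.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 (lowest) upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Invented scores for five learners on two listening tests and a grammar test.
listening_a = [10, 20, 30, 40, 50]
listening_b = [12, 25, 31, 45, 60]
grammar = [30, 10, 50, 20, 40]
print(round(spearman(listening_a, listening_b), 2))  # 1.0
print(round(spearman(listening_a, grammar), 2))      # 0.3
```

A high correlation between the two listening tests together with a clearly lower correlation with the grammar test would be in line with the prediction. Real conclusions would require pilot data from a much larger group.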
Test analysis
Determining construct validity requires a thorough knowledge of the construct to be tested. A construct is a theoretical concept related to linguistic knowledge, e.g. listening comprehension, metacognition or pragmatic competence.
- Example: If you mean to test listening skills and ask students to write an essay about an audio sample, are you then testing receptive or productive skills?
- Example: If you ask students to type an essay within thirty minutes, are you then testing writing skills or typing speed?
Test analysis
The most effective way of determining an LSP test's content validity is to interview subject specialists. Various interview types can be used to determine whether the test tasks correspond to reality.
- Unstructured: the success of this kind of interview depends on the interaction between the researcher and the respondent. There is no fixed interview schedule, but rather a number of themes that are to be addressed.
- Semi-structured: the researcher follows a preset schedule, but can deviate from it when interesting issues arise.
- Structured: the interviewer goes through a fixed series of written questions without deviation. This type of interview closely resembles a questionnaire.
- One on one: this kind of interview allows the researcher to zoom in on the views of individual respondents.
- Group: the advantage of interviewing larger numbers at once is that group interactions might spark observations that would otherwise have gone unnoticed.
Note that the interviewer should get the chance to practise his/her interview skills beforehand. Ideally, the pilot settings will resemble the actual conditions as accurately as possible.
Test analysis
Since face validity is a subjective judgement made by the test takers, only test takers can be the judge of it. During or after the pilot test, ask the representative end users to give a verbal report. There are various ways of going about this.
- Talk aloud: informants voice their thoughts while taking the test.
- Think aloud: informants say what they are thinking and provide additional non-verbal information, such as physical movements.
- Concurrent: the verbal report is given in real time.
- Retrospective: the verbal report is given afterwards.
- Mediated: the researcher occasionally intervenes.
- Non-mediated: the researcher does not intervene.
Test analysis
Authenticity
If you have interviewed subject specialists and representative end users have given a verbal report, you will also have a good understanding of a test's situational authenticity (the extent to which a test or task represents real situations) and interactional authenticity (the extent to which a test or task represents realistic conversational interactions). Note that material intended for L1 users does not always appear interactionally authentic to test takers, since it may depict interactions that are meaningless to them.
Example: the above-mentioned example of the OET of English for veterinary sciences is not representative of the professional practice of researchers within the field of veterinary sciences. A doctor-patient dialogue is, however, very relevant for students who would like to start working in a practice later.
Further reading
For more information on LSP testing, please consult:
ABRASKEVICIUTE, Ausra et al. (2003). Handbook of LSP Examinations. Tut Press.
BACHMAN, Lyle F. (2000). Modern Language Testing at the Turn of the Century: assuring that what we count counts. Language Testing. 17(1)
BROADFOOT, Patricia and Paul BLACK. (2004). Redefining Assessment? The first ten years of Assessment in Education. Assessment in Education. 11(1)
CLAPHAM, C. (2000). Assessment for academic purposes: where next? System. 28(4)
DAVIES, A. (2001). The logic of testing Languages for Specific Purposes. Language Testing. 18(2)
DOUGLAS, Dan. (2000). Assessing Languages for Specific Purposes. Cambridge University Press.
DOUGLAS, Dan. (2001). Language for Specific Purposes assessment criteria: where do they come from? Language Testing. 18(2)
DOVEY, T. (2006). What purposes, specifically? Re-thinking purposes and specificity in the context of the new vocationalism. English for Specific Purposes. 25(4)
HYLAND, K. (2002). Specificity revisited: how far should we go now? English for Specific Purposes. 21(4)
ROEVER, Carsten. (2001). Web-Based Language Testing. Language Learning & Technology. 5(2)