Kornwipa Poonpon Khon Kaen University, Thailand


1 Learner Corpora and Language Testing and Assessment: Application and Challenges
Kornwipa Poonpon, Khon Kaen University, Thailand, korpul@kku.ac.th
3rd English Language & Literature International Conference, 27th April 2019, Universitas Muhammadiyah, Semarang, Indonesia

2 Overview
Introduction to a corpus & learner corpora
Corpus linguistics & key concepts
Corpus use in language testing and assessment
Corpus studies in Thai contexts
A study of a learner corpus at a Thai university

3 What is a corpus?
“A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.” (Sinclair, 1996)
“A corpus is a large and principled collection of natural texts” (Biber, et al., 1998)

4 What is a corpus?
“[…] the term corpus as used in modern linguistics can best be defined as a collection of sampled texts, written or spoken, in machine-readable form which may be annotated with various forms of linguistic information” (McEnery, Xiao & Tono, 2006)

5 Texts in a corpus
Machine-readable texts
Authentic texts
Sampled texts
Representative of a particular language or language variety

6 Why use a corpus?
Intuition alone is not enough:
Is “however” always replaceable by “nevertheless”?
“think of” & “think about”
Native speaker intuition is unreliable and provides no information on frequency of occurrence.
“head” => body part - Is this the most used sense?

7 Why use a corpus?
A corpus helps answer questions of usage easily:
How frequently is the word “actually” used?
Is “actually” typical of spoken or written modalities?
Does “actually” typically start a sentence or utterance?
What is the function of “actually”?
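The usage questions on this slide translate into simple counts over a corpus. The following is a minimal sketch in plain Python, not the workflow of any particular concordancer: the tokenizer, the function names (`per_million`, `sentence_initial_share`, `kwic`), and the three-sentence sample are all invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(word, tokens):
    """Normalized frequency of `word` per million tokens."""
    return tokens.count(word) / len(tokens) * 1_000_000

def sentence_initial_share(word, text):
    """Share of the occurrences of `word` that begin a sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    initial = sum(1 for s in sentences if tokenize(s)[:1] == [word])
    total = tokenize(text).count(word)
    return initial / total if total else 0.0

def kwic(word, tokens, width=3):
    """Keyword-in-context lines: `width` tokens on either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented toy sample, not corpus data.
SAMPLE = "Actually, I disagree. She was actually quite late. Actually, it rained."
TOKENS = tokenize(SAMPLE)

if __name__ == "__main__":
    print(per_million("actually", TOKENS))
    print(sentence_initial_share("actually", SAMPLE))
    for line in kwic("actually", TOKENS):
        print(line)
```

Frequency per million tokens makes counts comparable across corpora of different sizes, and the keyword-in-context lines are what an analyst would read to answer the question about function, which counting alone cannot settle.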

8 Why use a corpus? Corpus-based approach draws upon authentic or real texts. Computer-based analysis can retrieve differences that intuition alone cannot perceive. Reliable quantitative data

9 Corpus types Written vs. Spoken
General language purpose vs. Specialised language purpose Plain text vs. Annotated (tagged) text Native speaker & Language learner corpora

10 Written vs. Spoken corpora
Written:
The Standard Corpus of Present-Day Edited American English (BROWN)
The Lancaster-Oslo/Bergen Corpus (LOB)
The Freiburg-Brown Corpus of American English (FROWN)
The Freiburg-LOB Corpus of British English (FLOB)
Spoken:
Lancaster/IBM Spoken English Corpus (SEC)
Cambridge and Nottingham Corpus of Discourse in English (CANCODE)
Santa Barbara Corpus of Spoken American English (SBCSAE)
Michigan Corpus of Academic Spoken English (MICASE)
Wellington Corpus of Spoken New Zealand English (WSC)
Corpus of Contemporary American English (COCA)

11 General vs. Specialized corpora
General:
BROWN
LOB
British National Corpus (BNC)
The American National Corpus
Specialized:
Guangzhou Petroleum English Corpus
HKUST Computer Science Corpus
Corpus of Professional Spoken American English (CPSA)
Acknowledgments Corpus
Blog Authorship Corpus
Air Traffic Control Corpus (ATC)

12 Plain text vs. Tagged text
Plain text:
Studying history is fun for some people but for some people they may find out it’s quite hardly to understand by themselves. Reading big thick textbook of history is like consuming sleeping medicine so other type of media like films or movies might be the better choice for some group of people to learn the history.
Tagged text:
Studying_VVG history_NN is_VBZ fun_JJ for_IN some_DT people_NNS but_CC for_IN some_DT people_NNS they_PP may_MD find_VV out_RP it’s_NNS quite_RB hardly_RB to_TO understand_VV by_IN themselves_PP ._SENT Reading_VVG big_JJ thick_JJ textbook_NN of_IN history_NN is_VBZ like_IN consuming_NN sleeping_VVG medicine_NN so_IN other_JJ type_NN of_IN media_NNS like_IN films_NNS or_CC movies_NNS might_MD be_VB the_DT better_JJR choice_NN for_IN some_DT group_NN of_IN people_NNS to_TO learn_VV the_DT history_NN ._SENT
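Tagged text in the `word_TAG` layout shown on this slide can be read back into (word, tag) pairs and counted. The sketch below assumes that layout and uses only the opening tokens of the slide's sample; the helper names `parse_tagged` and `tag_share` are hypothetical and not part of any tagger.

```python
from collections import Counter

def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST underscore, so words that contain
        # an underscore themselves still keep their tag intact.
        word, sep, tag = token.rpartition("_")
        if sep:  # skip anything that carries no tag
            pairs.append((word, tag))
    return pairs

def tag_share(tag, counts):
    """Percentage of all tokens that carry `tag`."""
    return 100 * counts[tag] / sum(counts.values())

# Opening tokens of the tagged sample from the slide.
TAGGED = ("Studying_VVG history_NN is_VBZ fun_JJ for_IN some_DT people_NNS "
          "but_CC for_IN some_DT people_NNS")

PAIRS = parse_tagged(TAGGED)
TAG_COUNTS = Counter(tag for _, tag in PAIRS)

if __name__ == "__main__":
    for tag, n in TAG_COUNTS.most_common():
        print(tag, n, f"{tag_share(tag, TAG_COUNTS):.1f}%")
```

Percentages per tag computed this way are the kind of figure that a ranked list of grammatical features reports, although the tag labels depend on the tagger and tagset in use.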

13 Taggers CLAWS – Part-of-speech tagger of English WordSmith Tools
TagAnt

14 Native speaker vs. Language learner corpora

15 Learner Corpora

16 What is a learner corpus?
Electronic collections of texts produced by language learners (Granger, 2008)
Who are the language learners?
Those who learn a language that is neither their L1 nor an official language of the country where they live: varieties of English in Kachru’s expanding circle (1985).

17 Some (English) Learner corpora
(W = written, S = spoken)
Corpus | L1 | Text type
International Corpus of Learner English (ICLE) | Various | W
The Cologne-Hanover Advanced Learner Corpus (CHALC) | German | W (term papers)
Arab Learner English Corpus (ALEC) | Arabic | W (1st year ss essays)
The Advanced Learner English Corpus (ALEC) | Swedish | W (Eng major essays)
The Chinese Academic Written English Corpus (CAWE) | Chinese | W (Thesis Eng major)
The International Teaching Assistants Corpus (ITAcorp) | - | S
The ANGLISH Corpus | French | -
The Japanese Learner English Corpus (NICT JLE) | Japanese | -
The Barcelona English Language Corpus (BELC) | Spanish | W & S
The International Corpus Network of Asian Learners of English (ICNALE) | 10 Asian languages | -

18 Why learner corpora?
Second language acquisition:
To explore interlanguage produced by second or foreign language learners
To have a better understanding of the factors that influence it
To identify in what respects learners’ language differs from one learner to another or from the language of native speakers
English language teaching:
To develop pedagogical tools and methods that more accurately target the needs of language learners (e.g., Data-driven learning)

19 Corpus Linguistics
Corpus Linguistics is a methodology, which tends to:
involve the analysis of “actual” language use in natural texts
utilise a large and principled collection of natural texts (corpus) as the basis for analysis
make extensive use of computers, utilising both automatic and interactive techniques
depend on both quantitative and qualitative analytical techniques: “The goal of corpus-based investigations is not simply to report quantitative findings, but to explore the importance of these findings for learning about the patterns of language use” (Biber, et al. 1998: 4-5)

20 Corpus Analysis

21 ELT SLA Corpus Linguistics ESP Testing Discourse Analysis

22 Common purposes
To highlight the importance of lexico-grammatical resources
To reveal typical grammatical features (e.g., verbs) associated with grammatical constructions
To complement textbook language
To examine textbooks or course materials (student input)
To explore learner corpora
(Barlow, 2011)

23 Key analysis
Lexical (e.g., single word, collocations, lexical bundles):
frequent words (in academic English, etc.)
range of words (across genres)
word combinations (collocations)
typical usage -- from frequent words to semantic connotations
Grammatical:
frequent (particular) grammatical features
range of grammatical features
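Lexical bundles are usually identified as recurrent word sequences that pass both a frequency cut-off and a range (number of texts) cut-off. The sketch below illustrates that idea on three invented sentences. The thresholds `min_freq=2` and `min_range=2` are toy values chosen for the example, not the cut-offs used in any of the studies cited in this talk.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_bundles(texts, n=3, min_freq=2, min_range=2):
    """n-grams occurring at least min_freq times in at least min_range texts.

    Returns {bundle: (frequency, range)}.
    """
    freq = Counter()        # total occurrences across all texts
    text_range = Counter()  # number of texts containing the n-gram
    for text in texts:
        grams = ngrams(text.lower().split(), n)
        freq.update(grams)
        text_range.update(set(grams))  # count each text once per n-gram
    return {
        " ".join(g): (freq[g], text_range[g])
        for g in freq
        if freq[g] >= min_freq and text_range[g] >= min_range
    }

# Invented toy texts, not learner data.
TEXTS = [
    "on the other hand plastic bags are cheap",
    "on the other hand a ban is costly",
    "plastic bags are useful",
]
BUNDLES = lexical_bundles(TEXTS)

if __name__ == "__main__":
    for bundle, (f, r) in sorted(BUNDLES.items()):
        print(f"{bundle}: freq={f}, range={r}")
```

The range requirement keeps a sequence that one writer repeats many times from counting as a bundle; in real studies the frequency threshold is normally expressed per million words.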

24 Corpus studies & language testing/assessment
Representing Language Use in the University: Analysis of the TOEFL®2000 Spoken and Written Academic Language Corpus D. Biber, S.M. Conrad, R. Reppen, P. Byrd, M. Helt, V. Clark, V. Cortes, E. Csomay, & A. Urzua (2004)

25 Use of learner corpora in language testing and assessment
To develop tests, e.g., cloze tests (e.g., Rees, 1998).
To examine typicality of features in a particular text type, e.g., investigation of modals used in service encounters (Friginal, 2009).
To compare high- and low-scoring essays to find features that differentiate them (e.g., Cumming et al., 2001).
To compare distribution of word-classes used in speech and academic written English in the ICLE and LOCNESS corpora (e.g., Granger & Rayson, 1998).

26 Corpus studies in Thai contexts

27 Studies using non-learner English corpora
Development of wordlists: academic and technical word lists in Chemistry articles (Nuamjapho & Poonpon, 2017), academic word list in business English (Patanasorn, 2017) Vocabulary profile in high school textbooks (Sujinpram, Senchantichai, & Poonpon, 2014), university ESP course materials (Chanchanglek & Sriussadaporn, 2011) Collocations (Buakaew, 2015)

28 Studies using non-learner English corpora
Noun phrases in political news (Siriphum, Thongyoi, & Poonpon, 2016) Reporting verbs in research article introductions in international and Thai medical journals (Jirapanakorn, 2012) Lexical bundles in TESOL conference abstracts (Wongwiwat, 2016), business English coursebooks (Sriumporn, 2011), agricultural science research articles (Shi, 2010), TED-Talk (Suwanwong, 2015), Medical journals (Panthong, Kunthama, & Poonpon, 2017), native English speaker teachers (Steyn & Jaroongkhongdach, 2016)

29 Studies using non-learner English corpora
Data-driven learning--DDL (Boontam, 2018; Darasawang, 2014; Dokchandra, 2015; Eak-in, 2015; Liangpanit, 2010; Tasanameelarp, 2012; Tangpijaikul, 2014; Yaemtui, 2018) (source:

30 Studies using learner corpora
Lexical analysis (Leelasetakul, 2014)
Collocations: knowledge/learning (Detdamrongpreecha, 2014; Khittikote, 2011; Mongkolchai, 2000; Supanfai, 2012; Wangsirisombat, 2011)
Grammatical analysis
Discourse markers in business English conversation (Nookam, 2010), in chat texts (Thongkampra & Poonpon, 2014)
Adverbial connectors in argumentative essays (Patanasorn, 2010; Chanyoo, 2014)
Error analysis of Thai students’ laboratory scientific abstract writing (Ua-umakul & Vittayapirak, 2016)

31 Studies using learner corpora
L1 transfer—prepositions used by English major students (Thumawongsa, 2017) Comparative studies Discourse markers in conversation by Thai students and native English speakers (Sitthirak, 2013), in argumentative writing by Thai and Indonesian students (Andayani, 2014) Adverbial connectors in argumentative essays by Thai and American students (Jangarun & Luksaneeyanawin, 2016) Present perfect by Thai and native English speakers (Tumvichit, 2016)

32 A few studies using learner corpora in language testing and assessment
Lexical profile of Thailand university admission tests (Cherngchawano & Jaturapitakkul, 2014; Sujinpram, Senchantichai, & Poonpon, 2014) Lexical analysis and readability of English language tests for Thai university admissions (Limledjalearnvanit, 2014) Grammatical analysis in speaking test responses (Poonpon, 2011) Syntactic complexity in speaking test responses (Poonpon, 2012)

33 An investigation of lexical and grammatical features in graduate students’ written test responses
A case study at Khon Kaen University

34 Khon Kaen University (KKU), Thailand
22 Faculties and Colleges, more than 2,200 academic staff and 8,500 supporting staff
40,000 students, including 10,000 postgraduates and 500 overseas students
340 study programs (101 undergraduate programs, 138 Master’s degree programs, 77 doctoral degree programs, and 24 graduate diploma programs)
43 international programs

35 Khon Kaen University Academic English Language Test (KKU AELT)
English proficiency test
To measure English proficiency of students who would like to continue their study in graduate programs at Khon Kaen University
To screen prospective graduate students
Reading & Writing skills
3 test hours

36 KKU AELT Structure
Reading: General reading, Academic reading
Writing: An essay

37 Writing prompts Do you agree or disagree with the following statement?
“Plastic bags should be banned on campus.” Use specific reasons and examples to support your answer. Do you agree or disagree with the following statement? “University should be open to all ages of learners.” Use specific reasons and examples to support your answer. Question: Agree/Disagree Style Source: KKU-AELT 2018

38 Scoring scale Language use Topic development Paragraph organization
6 scales (Scales 0-5) (adapted from TOEFL iBT scoring rubric for independent writing)

39 Score Bands Reading Band 5 Band 4 Band 3 Band 2 Band 1 Writing Band 5
PhD students Master’s students

40 If you don’t pass the exam…
AELT -> Pass
AELT -> Not pass -> English courses -> Pass

41 2018 AELT test takers
2,979 test takers: 2,001 MA & 978 PhD
PASS | Reading | Writing
MA (Bands 3-5) | 786 | 559
PhD (Bands 4&5) | 174 | 200
Total | 960 (32.2%) | 795 (26.7%)

42 Research objectives To explore lexical profile in written test responses produced by Thai graduate students To examine grammatical features in written test responses produced by Thai graduate students To compare the lexical profile produced by Thai graduate students with different language proficiency levels To compare the grammatical features produced by Thai graduate students with different language proficiency levels

43 A KKU AELT written test corpus
A collection of 579 written test responses produced by examinees who took the tests during 2018 at Khon Kaen University, Thailand Written test responses were divided into 5 groups representing test responses that received different score bands (bands 1 to 5)

44 Written test responses 2018 (7 tests, N=1,602)
Bands | Test responses | In this study (n=579) | Words/essay (mean)
5 | 9 | 9 | 358
4 | 248 | 140 | 298
3 | 480 | 210 | 251
2 | 372 | 120 | 276
1 | 493 | 100 | 165

45 Band 5 Band 1

46 Instruments CLAWS – Part-of-speech tagger of English AntWord Profiler
WordSmith Tools AntConc

47 Vocabulary profile
Bands | GSL | AWL | Off-list | Total words
5 | 2,805 | 152 | 268 | 3,225
4 | 36,586 | 1,901 | 3,202 | 41,689
3 | 46,237 | 2,130 | 4,269 | 52,636
2 | 15,453 | 626 | 17,017 | 33,096
1 | 7,701 | 328 | 8,429 | 16,458
Total | 108,782 | 5,137 | 33,185 | 147,104
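Because the bands differ greatly in total words, the raw counts in the vocabulary profile are easier to compare as percentages of each band's total. The sketch below only re-expresses the slide's own counts as shares; the names `PROFILE`, `coverage`, and `COVERAGE` are made up for the example, and this is not presented as the profiling tool's own output.

```python
# Token counts per band, copied from the vocabulary profile slide.
PROFILE = {
    5: {"GSL": 2805, "AWL": 152, "Off-list": 268},
    4: {"GSL": 36586, "AWL": 1901, "Off-list": 3202},
    3: {"GSL": 46237, "AWL": 2130, "Off-list": 4269},
    2: {"GSL": 15453, "AWL": 626, "Off-list": 17017},
    1: {"GSL": 7701, "AWL": 328, "Off-list": 8429},
}

def coverage(counts):
    """Each word list's share of a band's total words, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

COVERAGE = {band: coverage(counts) for band, counts in PROFILE.items()}

if __name__ == "__main__":
    for band, shares in COVERAGE.items():
        print(f"Band {band}: {shares}")
```

On these figures the AWL share is roughly 4 to 5% in Bands 3 to 5 and about 2% in Bands 1 and 2, while off-list words make up about half of the tokens in the two lowest bands.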

48 Vocabulary profile by ability groups

49 Grammatical features
Rank | Band 5 | Band 4 | Band 3 | Band 2 | Band 1
1 | N. sing (7.9%) | N. sing (8.7%) | N. sing (8.9%) | N. sing (11.1%) | N. Sing (11.6%)
2 | N. plural (3.3%) | N. plural (3.9%) | N. plural (3.6%) | Gen. prep. (3.4%) | Gen. adj. (3.7%)
3 | Art. (3.1%) | Gen prep. (3.1%) | Gen Adj. (3.3%) | Gen. adj. (3.3%) | Gen. prep. (3.3%)
4 | Gen Adj. (3.1%) | Gen Adj. (3.0%) | Gen prep. (2.9%) | Lexical verb, base (2.7%) | Lexical verb, base (3%)
5 | Gen prep. (2.8%) | Art. (2.8%) | Art. (2.9%) | To-infinitive (2.3%) | To-infinitive (2.2%)
6 | To-infinitive (2.6%) To-infinitive (2.7%) Art. (2.2%) Co. conj. (1.7%)
7 | Verb, inf. (2.1%) Verb, inf. (2.0%) Lexical verb, base (1.9%) N. plural Art. (1.7%)
8 | Gen adv. (1.8%) Aux, modal (1.6%) Verb, inf. (1.9%) N. Plural (1.5%)
9 | Lexical verb, base (1.6%) Verb, inf. (1.4%) is (1.3%)
10 | Co. conj. (1.5%) Co. conj. (1.6%) Aux, modal (1.5%) Is (1.1%)

50 Summary of findings
Test takers with high proficiency produced more language than those at lower proficiency levels.
At low proficiency levels, lower production of written language may result from the test takers’ limited knowledge and fewer opportunities to use the language.
Academic words were used more by test takers at high proficiency levels than by those at lower proficiency levels.

51 Summary of findings ‘Adverbs’ were used more by the Band 5 test takers, reflecting the range of vocabulary and ability to describe or modify actions and feelings. Vocabulary and grammar used by the learners at levels 3 and 4 are quite similar in terms of types and number of occurrences. Thus, differentiating these two groups of learners cannot be done without considering other writing scoring criteria (i.e., topic development and organization).

52 How do these findings inform KKU-AELT developers?
In some way, the findings address the validation of our writing scoring criteria in terms of language use.
More fine-grained scoring descriptors can be developed to help raters differentiate the test takers’ proficiency levels more easily.

53 Challenges of learner corpora application in testing and assessment context
Data collection perspective: time-consuming transcribing process; accuracy of transcripts
Analysis perspective: dealing with language errors

54 Learner Corpora and Language Testing and Assessment: Application and Challenges
THANK YOU.

