Learner corpus research - hands on Tom Cobb Didactique des langues / éducation Université du Québec à Montréal Saturday, October 31 8:15am - 10:15am lextutor.ca/cv/slrf_09/corpus.ppt.

1 Learner corpus research - hands on Tom Cobb Didactique des langues / éducation Université du Québec à Montréal Saturday, October 31 8:15am - 10:15am lextutor.ca/cv/slrf_09/corpus.ppt

2 2 Dr. Cobb will provide a "crash course" in carrying out research using learner corpora and small teacher or researcher built corpora generally. He will lead a walk-through of a study he has conducted using corpus data and address the work that had to be done and issues to be resolved at each stage of the study, offering a behind-the-scenes look at how corpus research is carried out. In addition he will display some new and accessible online tools for corpus work, hoping to encourage instructors or researchers from other areas to get some hands-on experience in the learner corpus paradigm.

3 3 Dr. Cobb will provide a [1] "crash course" in carrying out [1a] research using learner corpora and [1b] small teacher or researcher built corpora generally. He will lead a [2] walk-through of a study he has conducted using corpus data and [2a] address the work that had to be done and [2b] issues to be resolved at each stage of the study, offering a behind-the-scenes look at how corpus research is carried out. In addition he will display some [3] new and accessible online tools for corpus work, hoping to [4] encourage instructors or researchers from other areas to get some hands-on experience in the learner corpus paradigm.

4 4 LEARNER CORPUS crash course  research using learner corpora  or other small corpora walk-through of a study  address the work that had to be done  issues to be resolved at each stage display online tools for corpus work encourage hands-on experience + a bit of context

5 5 At 10.15 you will know… What a corpus is Why corpus research is important What it has contributed to applied linguistics The uses it can have for researchers … for instructors How to build a corpus Choice points in building a corpus … interpreting a instructors Some tools of corpus analysis How to do a learner corpus study Results from some published studies The future of learner corpus studies

6 6 Corpora – what are they?

7 7 What is a corpus? A large collection of language in use, but  Not only large  Not necessarily so large Assembled systematically, according to explicit criteria  of representativeness How large?  Depends on the goal

8 8 Goals and sizes Linguistics goal - to represent entire language 100 million wds still under-represents common collocations Pedagogical goal – Ss meet common words, structures 1-million-words gives 10 hits for frequent words Applied linguistics goal – trace an acquisition feature 1-200,000 words is common

9 9 Sub-Goals and sizes Pedagogical goal – Ss meet common grammar and vocab  Grammar – 1 million is adequate – All structures get many hits  Lexis Basic vocab – 1 million gives 10 hits @ 2k level Main collocations – 1 million gives the main ones Torrential rain? “Raining cats and dogs”? – 1 billion gives 5 hits Identify specialist lexis – 200,000 may be enough
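
The size figures on the two slides above come down to simple arithmetic: expected hits = corpus size × rate of occurrence. A minimal sketch, assuming the rates can be back-calculated from the slide's own numbers (10 hits per million for 2k-level words, 5 hits per billion for "raining cats and dogs"). The function name and constants are illustrative, not part of any Lextutor tool.

```python
def expected_hits(corpus_size_words, rate_per_million):
    """Expected occurrences of an item in a corpus of the given size."""
    return corpus_size_words * rate_per_million / 1_000_000


# Rates back-calculated from the slide's own figures (assumptions, not measurements).
BASIC_WORD_RATE = 10.0    # 10 hits per 1 million words (2k-level vocabulary)
RARE_IDIOM_RATE = 0.005   # 5 hits per 1 billion words ("raining cats and dogs")

if __name__ == "__main__":
    for size in (200_000, 1_000_000, 100_000_000, 1_000_000_000):
        print(f"{size:>13,} words: "
              f"basic word ~{expected_hits(size, BASIC_WORD_RATE):g} hits, "
              f"rare idiom ~{expected_hits(size, RARE_IDIOM_RATE):g} hits")
```

On these assumptions a 1-million-word corpus is expected to contain well under one instance of the rare idiom, which is why the goal determines the size.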

10 10

11 11 A growth industry Brown 1970………………..1,000,000 wds http://icame.uib.no/brown/bcm.html BNC 1994.……………… 100,000,000 wds www.natcorp.ox.ac.uk Cambridge Int’l 2002....1,000,000,000 wds www.cambridge.org./elt/corpus/international_corpus.htm Plus ANC, Bank of English, Cancode …

12 12 Design / composition e.g., Brown (1970s) Page from Lextutor

13 13 What does a corpus represent? A language as a whole BNC Or a part Cancode oral, MICASE academic Or of an individual Jack London’s collected works Or a group of individuals –Class of ESL learners

14 14 How do we read a corpus? Cannot read it naturally –Defeats the goal Needs the help of a search technology  concordance  index  frequency list  many others
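
Two of the search technologies named above, the frequency list and the concordance, can be sketched in a few lines. This is a toy illustration under simple assumptions (lower-cased alphabetic tokens, a fixed character window around the keyword), not how the Lextutor concordancers work. The names `tokenize`, `frequency_list` and `kwic` are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


def kwic(text, keyword, width=30):
    """Key Word In Context: one line per hit, keyword capitalised and centred."""
    lines = []
    pattern = r"\b%s\b" % re.escape(keyword)
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} {m.group().upper()} {right}")
    return lines


if __name__ == "__main__":
    sample = "The back door was open. He came back late. My back hurts."
    for line in kwic(sample, "back", width=20):
        print(line)
    print(frequency_list(sample)[:3])
```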

15 15 Concordancers http://www.lextutor.ca/concordancers/concord_e.html

16 16 Lists http://www.lextutor.ca/freq/compleat_lister/

17 17 Indexes http://www.lextutor.ca/concordancers/text_concord/

18 18 Corpora – why do we need them?

19 19 Why do we need corpora? A. Corpus work is sexy B. We have computers – let’s use them C. Linguistic intuitions are unreliable

20 20 Linguistic intuitions are notoriously unreliable Demo 1: Do you think however is more common in spoken or in written language?  By how much? (3 to 1… etc)

21 21 http://www.lextutor.ca/range/range_corpus/

22 22 Demo 2: What are the main senses of back and which is most common? By what factor? http://www.lextutor.ca/concordancers/concord_e.html

23 23

24 24

25 25 Demo 3: Can you rank order these roughly by frequency band? 0 - 2k 3k - 5k 6k - 10k 11k-15k http://www.lextutor.ca/freq/train/

26 26 Try one? http://www.lextutor.ca/freq/train/

27 27 But not always Demo 4: Which do you think is more common, man and woman, or woman and man? Factor of 10:1, 5:1, 2:1? Go Live http://www.lextutor.ca/concordancers/concord_e.html

28 28 Many linguistic intuitions are unreliable Implicit patterns are extremely slow to extract from input N. Ellis, J. Hulstijn … because of the severe limitations on what we can see and remember … unaided

29 29 Scientific instrumentation - a brief history

30 30 Not only linguistic intuitions are problematic For every appearance, many possible explanations Stand outside on a starry evening, what does it look like?

31 31 The role of the computer in modern science is well known. In disciplines like physics and biology, the computer's ability to store and process inhumanly large amounts of information has disclosed patterns and regularities in nature beyond the limits of normal human experience. Similarly in language study, computer analysis of large texts reveals facts about language that are not limited to what people can experience, remember, or intuit. In the natural sciences, however, the computer merely continues the extension of the human sensorium that began 200 years ago with the telescope and microscope. But language study did not have its telescope or microscope. The computer is its first analytical tool, making feasible for the first time a truly empirical science of language. –Cobb 1999

32 32 Before the computer, linguists could only study small samples of language at a time because of their limitations of their powers of observation and their memories. Even scholars who relentlessly collected instances of usage all their lives only had a few examples of any particular pattern, and there was no way of telling what they had missed.  Sinclair, 2003, p. ix

33 33 Dr Johnson A Dictionary of the English Language  Longman 1755 Based on quotations from literature copied onto many slips of paper But using literature has some problems - Old and recent lit conflated - Is literature truly representative of life’s typical situations? - Is its lexis «un peu recherché»? Early corpora

34 34 120 years later - James Murray, OED 1879 – REAL LANGUAGE examples sent in by post - Oxford City Post Office sets up a special sub-branch for OED

35 35 Most sciences - supplemented by technologies from 15th century BIOLOGY..……….microscope ASTRONOMY..…..telescope NAVIGATION.……astrolabe etc Language study – late 20th century – ….machine readable corpora

36 36 Thus the “corpus revolution” Dictionaries Grammars Courses Studies

37 37 Of particular note… LGSWE

38 38 Corpus – successes

39 39 Fabled Core of English is close to disclosure Main lexis + coverage  2000 wd families = 80%, Carrol et al 76 Main collocations in BNC-speech  84 HF collocations belong in 1k list, Shin & Nation 2007 Main phrasal verbs –  25 Ph vbs = 1/3 of all ph vbs in BNC, Gardner & Davies, 2007 Main morphologies  Bauer & Nation, 1993 Main stress patterns (Murphy & Kandil)  Cf. All this coming together at the same time as the human genome, also a corpus project

40 40 Ancient prescriptivism is close to defeated in language pedagogy Except one debate remains  Corpus-based v. corpus-informed approaches Corpus based  If it’s in the corpus X times, it’s OK  X to be defined Corpus informed  Corpus information is one source of information

41 41 Numerous errors are now corrected (in principle) Definitions no longer harder than the defined word Simple present no longer automatically the first verb tense taught Written language no longer the model for spoken language Status of multi-word units reinstated Grammar no longer taught …  via unknown lexis  as unconnected to lexis

42 42 Task Grammar as connected to lexis? Let’s see what this could mean  + practice “reading concordances” Get out “borders on” (From Sinclair http://www.twc.it/)  What is the pattern?  What does it mean?  Can we call this “word grammar”?

43 43 User extract
041. cember, Karimov became is more than just a way of life – it BORDERS on a religion. But there is of the laws of the sea s
042. n a religion. But there is of the laws of the sea sometimes BORDERS on arrogance. Not only should the international coll
043. ot only should the international collaboration is great and BORDERS on cartel like behaviour. who say using the extremis
044. on cartel like behaviour. who say using the extremist label BORDERS on demagoguery and will only serve Yugoslavia. What
045. ery and will only serve Yugoslavia. What is occurring there BORDERS on genocide. No country or society Careless but losi
046. o country or society Careless but losing two in the one day BORDERS on incompetence. Now Charlie Turkey, the only NATO c
047. competence. Now Charlie Turkey, the only NATO country which BORDERS on Iraq, is playing a key role in Her mastery of the
048. aq, is playing a key role in Her mastery of the short story BORDERS on perfection. kate saunders country’s stagnant grow
049. fection. kate saunders country’s stagnant growth, which now BORDERS on recession. Here again, the challenge looms ugly w
050. ession. Here again, the challenge looms ugly when recession BORDERS on slump. Everybody is on edge, The author, a lifelo
051. incredible. In the case_0 of maxim ‘The collector’s passion BORDERS on the chaos of memories.’ before staged protests at
052. he paranoid and, although and an easy going demeanour which BORDERS on the charismatic, it’s hardly popular music. In so
053. ian province of Kosovo, a professional solicitousness which BORDERS on the dangerous edge of savings accounts versus sha
054. e Soviet Central Asian clash. He said: ‘The hostility there BORDERS on the dangerous.’ Black players and – and to perfor
055. pathological. The sky, a then Claire makes a statement that BORDERS on the downright cocky. When I ask The linear intens
056. the chaos of memories.’ before staged protests at these two BORDERS on the east and west of their speaking to troops in
057. e obsessive. But there is the Sierra Madre” as he dubs them BORDERS on the eccentric. Mountain lions courses and opportu
058. ccentric. Mountain lions courses and opportunities, that it BORDERS on the embarrassing. This the straight, but his winn
059. on the obsessive. He portrays has a streak of bravery which BORDERS on the foolish. She has delicate to buy. A family wi
060. sensational because the amount of work he is required to do BORDERS on the incredible. In the case_0 of maxim ‘The colle
061. rs on the dangerous edge of savings accounts versus shares, BORDERS on the irresponsible. an independent Bosnia in its p
062. the contrary, his private His love for all things maritime BORDERS on the obsessional. He is truly Not surprisingly, th
063. ally acceptable, four even_0 harbour a passion for DIY that BORDERS on the obsessive. But there is the Sierra Madre” as
064. on slump. Everybody is on edge, The author, a lifelong fan, BORDERS on the obsessive. He portrays has a streak of braver
065. right cocky. When I ask The linear intensity of their songs BORDERS on the paranoid and, although and an easy going deme
066. on the surreal. Wander into the The atmosphere of paranoia BORDERS on the pathological. The sky, a then Claire makes a
067. the embarrassing. This the straight, but his winning effort BORDERS on the sensational because the amount of work he is
068. surreal. He had his own most dangerous regions on Earth. It BORDERS on the Serbian province of Kosovo, a professional so
069. lish. She has delicate to buy. A family with three children BORDERS on the socially acceptable, four even_0 harbour a pa
070. east and west of their speaking to troops in Xinjian which BORDERS on the Soviet Central Asian clash. He said: ‘The hos
071. gerous.’ Black players and – and to performing them sort of BORDERS on the surreal. He had his own most dangerous region
072. e obsessional. He is truly Not surprisingly, the atmosphere BORDERS on the surreal. Wander into the The atmosphere of pa
073. arismatic, it’s hardly popular music. In some cases_1, this BORDERS on wholesale plagiarism. That’s * __________________
074. on the irresponsible. an independent Bosnia in its pre war BORDERS. On the contrary, his private His love for all thing
075. ________________________ and on mutual respect for existing BORDERS” on December, Karimov became is more than just a way
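
One way to start on "What is the pattern?" is to tabulate what follows the node in each concordance line. The sketch below is only an aid to reading, under the assumption that the word (and optional article) right after "BORDERS on" is the item of interest. `after_node` is a hypothetical helper, and the sample lines are abridged from the extract above.

```python
import re
from collections import Counter


def after_node(line, node="BORDERS on"):
    """Return (article or None, next word) following the node, or None if absent."""
    m = re.search(re.escape(node) + r"\s+(?:(the|a)\s+)?([A-Za-z-]+)", line)
    if m is None:
        return None
    return m.group(1), m.group(2).lower()


if __name__ == "__main__":
    lines = [
        "a way of life – it BORDERS on a religion.",
        "of the sea sometimes BORDERS on arrogance.",
        "What is occurring there BORDERS on genocide.",
        "the only NATO country which BORDERS on Iraq,",
        "a passion for DIY that BORDERS on the obsessive.",
        "performing them sort of BORDERS on the surreal.",
        "Bosnia in its pre war BORDERS. On the contrary,",
    ]
    found = [after_node(line) for line in lines]
    # Count which article (if any) introduces the item after the node.
    print(Counter(hit[0] for hit in found if hit is not None))
    # List the items themselves, ready to be sorted into senses by eye.
    print(sorted(hit[1] for hit in found if hit is not None))
```

The counting does not interpret anything; deciding which right-hand items are places and which are qualities remains the reader's job.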

44 44 Corpus – failures

45 45 And yet… “The corpus-driven revolution in applied linguistics continues apace, and along with it the paradox that as corpora change the face of applied linguistics (most dictionaries, grammars, and course books now claim to be corpus based) it is largely without the participation of practitioners. Only a few teachers or researchers have ever built a corpus or delved through concordance lines.” - Cobb 2008, review of CBLS

46 46 Stalled enterprise ( -McCarthy, 2008) Teachers and researchers need to become producers, not just consumers, of corpus research Why? To evaluate “corpus based” claims Often vocab but not grammar is CB, etc What kind of corpus? To effectively lobby to get their CB needs met e.g. Gram+lex of specific domains To develop their own CB materials Who still uses a course book? To build their own corpora for action research projects

47 47 Stumbling blocks Some intimidation remains attached to corpus work It is not universally appreciated in SLA - Widdowson Computer stuff looks daunting - Seems more linguistics than applied POLICY OF THIS WORKSHOP: There are some fairly clear reasons to do this and simple ways to get started

48 48 … The classic corpora are not easy-access - Despite long lists on the Web - Even McCarthy’s Cancode is 100% unavailable to researchers - Ref Tribble review of O’keefe et al - Especially in languages other than English - Lextutor users’ requests for German => Solutions <= [1] Band together (CECL) - [2] Make your own =>

49 49 DIY corpus – why?

50 50 German http://www.lextutor.ca/concordancers/braun_info.html

51 51 Why bother – Google is a corpus Ref – Robb

52 52

53 53 Classic case, breadth v. depth Web-as-corpus gives massive volume Even smallish DIY corpus gives Better quality search Families, starts with, ends with Easier access to detail & context Better exposure to pattern + you can make your own, target your own needs Material for learners Material from learners v. corpus

54 54 DIY corpus – how?

55 55 Build your own - HOW Many texts on the Web  E.g., http://www.lextutor.ca/bookbox/  Question of selection replaces question of access Must be or become text files  (whatever.txt) “dot txt”  Whether you want a one-big-file corpus  Or several-small-files corpus

56 56 Only plain.TXT files make corpora One

57 57 One big file: a) Insert One

58 58 One big file: b) Upload http://www.lextutor.ca/tools/corpus_builder2/ One
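
For those who prefer to assemble the one-big-file version offline, a minimal sketch follows. It assumes the source texts are already plain .txt files in a single folder and share one encoding. The function name and the folder and file names in the usage lines are hypothetical, and this is not what the Lextutor corpus builder does internally.

```python
from pathlib import Path


def build_one_file_corpus(source_dir, out_path, encoding="utf-8"):
    """Concatenate every .txt file in source_dir into out_path.

    Returns the names of the files that were included, in sorted order.
    """
    source_dir = Path(source_dir)
    out_path = Path(out_path)
    # Skip the output file itself in case it lives in the same folder.
    files = sorted(p for p in source_dir.glob("*.txt")
                   if p.resolve() != out_path.resolve())
    with out_path.open("w", encoding=encoding) as out:
        for p in files:
            # A blank line between texts keeps them visually separable.
            out.write(p.read_text(encoding=encoding).strip() + "\n\n")
    return [p.name for p in files]


if __name__ == "__main__":
    # Hypothetical folder and file names, for illustration only.
    included = build_one_file_corpus("my_texts", "my_corpus.txt")
    print(f"{len(included)} files merged")
```

Keeping the small files as well costs nothing and leaves the several-small-files option open.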

59 59 DIY corpus for learning materials

60 60 Using CB tools to select / develop learning materials? Using news texts? Check first against CB frequency lists Pre-teaching vocab? Find the CB keywords Writing tests? Check it contains gram+lex the S’s have actually seen Teaching a speaking course? Check models are speech not writing

61 61 Build corpus as learning materials For some purpose Must make some sampling sense EG one London – all London All course materials Corpus of graded readers

62 62 Learning materials – multi-file corpus http://www.lextutor.ca/callwild

63 63 Learning materials – one-file corpus http://conc.lextutor.ca/list_learn/eng/

64 64 Learning materials – one-file corpus http://www.lextutor.ca/corpus_grammar/

65 65 DIY for research purposes

66 66 1. Written production

67 67 Learner text more and more available - Collect & investigate because it is there? Some typical purposes - determine needs - check progress - Cf. active vs. passive ability - explore for experimental hypothesis Constraints Choose topic carefully Does topic suggest just one verb tense? Cf capital punishment vs. my holiday Very different language demands

68 68 Models of LCs Learners vs. NSs Ls vs. Ls – Snapshot or Longitudinal (same Ls at diff times) Or diff Ls at diff stages in learning ≅ longitudinal (Cross-sectional) OR Belz (04, citing Cobb 03) 4 LC variables should be controlled: 1. type of learner (e.g., FL vs. SL), 2. stage of learner 3. text type/purpose/register/conditions, 4. and the availability of a similar corpus of native speaker data

69 69 NS data must be comparable Best example is UCLE’s Locness Louvain Corpus of Native Speaker Essays 149,574 words of argumentative essays written by American university students 18,826 words of literary-mixed essays written by American university students 59,568 words of argumentative and literary essays written by British university students 60,209 words of British A-level argumentative essays.

70 70 Issues in LC SMALL ISSUES – Tag or not? Spell check or not, or at what point? One file or many? BIG ISSUE - Granger 2004, p. 124 What kind of data is a LC? “LC typically fall into the category of natural or open-ended data” while “SLA researchers tend to prefer [1] introspective or [2] experimental/elicited data…” V BIG ISSUE - Is this paradigm an instance of Bley-Vroman’s (1983) “comparative fallacy”?

71 71 Once made, flat or tagged? Pro’s of flat corpus  If for learning materials, = what learners face THEY must make sense of data Tagged does it for them  Easier to make, you can have more  Search inputs require some work, Trial +error Pro’s of tagged corpus  Precise comparisons are possible  Especially for N-N compounds and errors  But learner data poses special problems  Tags are needed for error analysis VP + ADV + D OBJ, etc  Yet learner data confuses taggers

72 72 Error tagger (UCL Err Extractor – Granger 02 ) specific-purpose, known-target tagging - Unlikely to confuse tagger, but a ton of work

73 73 Here’s a set of studies I’m working on LC study typically begins with a practical problem Theoretical conundrums? not so much E.g., this problem: Montreal learners Eight years ESL At 18 many switch to English-language system With insufficient vocabulary for advanced study in English Fully competent only at 1k

74 74 Big question Input: What lexis are these kids getting in school? RQ Do their NNS teachers have enough vocab themselves to get kids over the 1k-hump?

75 75 Procedure Run Vocab size test on Ts Nation’s new 14k – lextutor.ca/tests/ Get small exploration corpus of their production “How could the TESL program be improved?” Argumentative + opinion Get similar sized NS corpus LOCNESS, A-Levels, UK “An invention that has changed how we live” Compare for structure and lexis Quantity (frequency) and quality Focus on lexis 2k+

76 76

77 77 Prelude Look at TESLProg.txt in your handout as demo mini-corpus Writing task was this 5-minute in-class writing exercise  Peter Elbow, keep writing idea Discursive topic  How could UQAM new TESL program be improved? Homework:  - identify your main point  - focus + elaborate for Web publication Each paper gets three rounds of feedback

78 78 Computers have become a huge part of our lives in both the areas of work and education. But are they such a good thing? When calculators came along a drop in ability of students for mental arithmetic was obvious and now they are used for the simplest calculations. The computer could do the same thing. Computers encourage laziness in the general public, why work out something yourself when the computer can do it for you. This is very time saving and efficient but it is causing people to forget basic ideas. For instance, spelling is no longer as important as it was you can simply use a "spellcheck" to correct your English, which is absurd. For the youth of today computers offer links around the world and millions of facts and figures. This could be argued to be educational. However, this is killing the imagination of children and they spend hours sat at a keyboard tapping away in the doom and gloom of the house. They should be out enjoying themselves and gaining experiences for themselves instead of reading about them on a flat screen. It is said that you can meet people through computers and have `relationships'. I find this preposterous and people are losing the ability to communicate and form relationships. Computers can offer escape from the hum-drum routine of daily life by means of games but they are mind-numbing and un-inventive. There is however a more dangerous threat from computers, it is that they can do the work for man. This could lead to high unemployment. Those people who work with computers for long periods of time every day face problems. The repetion of tapping keys all day and staring at the screen can be harmful and not only that it is highly boring to do the same thing over and over again. Computers may be the future but what part will man have in this future. There will be no need for people to go to school as they could be taught at home, people would hardly ever talk and the only career available would be for computer programmers. I agree that computers are helpful but people should not live through their computers and be so reliant on them. They should read books and live more in order to regain their lost imagination and sense of adventure. Also, in schools I feel that work should be done mainly by hand and calculators and computers should only be used minimally in mathematics in order to stop the production of computer addicts and again have normal people. Comparison text from Locness (ex 1)

79 79 Comparison corpus from Locness (2) More lexis? Less? A little? A lot? http://www.lextutor.ca/vp/bnc/

80 80 Which analysis software?

81 81 Basic structure snapshot (Qc corpus) http://www.lextutor.ca/concordancers/text_concord

82 82 http://www.lextutor.ca/concordancers/text_concord

83 83

84 84 http://www.lextutor.ca/tuples/eng/

85 85 Lexis comparison

86 86 Lexis comparison NNS corpus (Quebec TESL trainees) 155 post-1k word families/3356 tokens NS corpus (UK A-Levels essay) 269 post-1k word families/3630 tokens But that’s not all Split up corpus Look at individuals http://www.lextutor.ca/vp/bnc/
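
Because the two corpora differ slightly in size, the family counts above are easier to compare once normalised per 1,000 tokens. The sketch below does that arithmetic and adds a toy profiler. It is a simplification resting on loud assumptions: it counts word types rather than word families (no lemmatisation), and the `first_1k` set in the usage lines is a tiny stand-in, not the real BNC-based 1k list used by Vocabprofile.

```python
import re


def per_thousand(families, tokens):
    """Post-1k families per 1,000 running words, rounded to one decimal."""
    return round(1000 * families / tokens, 1)


def profile(text, first_1k):
    """Count tokens and distinct post-1k word types in a text.

    first_1k: a set of lower-case words treated as the first thousand.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    post_1k_types = {t for t in tokens if t not in first_1k}
    return {"tokens": len(tokens), "post_1k_types": len(post_1k_types)}


if __name__ == "__main__":
    # Figures taken from the slide above.
    print("NNS:", per_thousand(155, 3356))
    print("NS: ", per_thousand(269, 3630))

    # Toy example with a stand-in list (an assumption, not the real 1k list).
    first_1k = {"the", "cat", "sat", "on", "and"}
    print(profile("The cat sat on the mat and meowed", first_1k))
```

On the slide's figures this works out to roughly 46 post-1k families per 1,000 tokens for the trainees against roughly 74 for the A-level writers.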

87 87

88 88 Almost all post-2ks are used by one writer only

89 89 Conclusion Interesting peripheral differences for another study Syntax correct but unelaborated Phrases heavy on the short end, light on the long end Low proportion of noun-noun Vocab - Heavy reliance on 1k vocab Low Post-1k Items used by one person Yet good recognition scores at 3k+ levels  Known words are not getting used   Unlikely to get used in classroom

90 90 2. Oral production corpus

91 91 Let’s learn more about the previous study: Follow trainees into their classrooms Does the predicted pattern occur? If new words appear, are they recycled? * See Horst’s Teacher Talk Corpus study in a forthcoming RIFL (2011) (Note: Different subjects – here we are establishing tools & method)

92 92 Looks like rich lexical input… 18 hrs of NS-T classroom talk

93 93 Summary Post-1k words (learning zone)  1570 families  900 appear in one class-hour only  Inc 300 one TIME only «Recyclage» is not happening  Now add this to the NNS data  Few post-1k used in own writing  The problem starts to make sense
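
The recycling question ("appear in one class-hour only", "one TIME only") is a question about range as well as frequency: in how many files does a word occur, and how often overall? A minimal sketch, assuming one transcript per class-hour held in a dictionary. The function names and the three invented mini-transcripts are placeholders, not data from the study.

```python
import re
from collections import Counter


def range_and_frequency(texts):
    """texts maps a file name to its text.

    Returns (freq, rng): total occurrences of each word, and the number
    of files in which each word occurs at least once.
    """
    freq, rng = Counter(), Counter()
    for text in texts.values():
        tokens = re.findall(r"[a-z]+", text.lower())
        freq.update(tokens)
        rng.update(set(tokens))  # each file counts once per word
    return freq, rng


def one_file_only(texts):
    """Words found in exactly one file (no recycling across files)."""
    _, rng = range_and_frequency(texts)
    return sorted(w for w, n in rng.items() if n == 1)


def one_time_only(texts):
    """Words that occur exactly once in the whole collection."""
    freq, _ = range_and_frequency(texts)
    return sorted(w for w, n in freq.items() if n == 1)


if __name__ == "__main__":
    # Invented placeholder transcripts, for illustration only.
    hours = {
        "hour1": "open your books and read the glossary",
        "hour2": "open the window and read the glossary glossary",
        "hour3": "close your books",
    }
    print(one_file_only(hours))
    print(one_time_only(hours))
```

In a real study the post-1k filter would be applied first, so that only learning-zone words are checked for recycling.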

94 94 Or, Alert’s 108,000 wds, no past tense! Went, saw http://www.lextutor.ca/concordancers/concord_e.html

95 95 3. Goal clarification

96 96 Let’s work through a published study Ovtcharov & Cobb 2006 (en français) Situation: Ottawa Civil service promotions depend on success in L2 oral interview Pass/fail evaluated globally (=impressionistically) “A well developed vocabulary” is one of the stated criteria But what is it? The usual soft focus

97 97 Needed for the study 1. Corpus of transcribed oral interviews Both passes, fails, & borderlines 24 of each, 25-35 minutes 100s of hours work 2. French version of Vocabprofile Lemmatized large-corpus based, k-leveled frequency lists? Miraculously appear in c. 2001 See Cobb & Horst, 2004 3. Usable NS reference corpus Provided by Beeching, 2001 French oral interviews in USA

98 98 Identifiable difference at 2k Strong difference at 3k+MHL (off-list) Result

99 99 (Assuming replication) One less failure-to-communicate in the vastness of high-stakes language instruction The instructional design process has a place to begin Significance

100 100 Corpus research is a fairly simple, bean-counting type of research That can solve complex problems in language learning & teaching, both Practical What do these people need to learn? Can examiners’ impressions be operationalized? Theoretical E.g., Piecing together the portrait of advanced interlanguage (Cobb 2003) So…

101 101 Course tie-up

102 102 At 10.15 you now know… What a corpus is Why it is important What insights it has yielded in applied linguistics The uses it can have for researchers … for instructors How to build a corpus Choice points in building a corpus Some tools of corpus analysis How to do a learner corpus study The results of some published learner corpus studies The future of learner corpus studies

103 103 The Future

104 104 Corpus research carries on shining the light into dark corners - 2007-2009 work from Dee Gardner, Stuart Webb Some increase in corpus awareness - Teacher training programs - MA methods courses Collaboration reduces labour - CECL, the Locness reference corpus - Promise of automatic corpus comparisons at Calper Gold Dev. world can play as tools go online Where do we go from here?

105 105 If we have time… The final challenge to the utility of frequency lists As already seen We are closing in on the Core of English This includes a smaller than expected group of true homonyms No corpus tool-kit so far deals with these systematically E.g. a Vocabprofile analysis does not distinguish bank and bank

106 106 Go live http://www.lextutor.ca/concordancers/text_concord

107 cobb.tom@uqam.ca www.lextutor.ca This PPT at http://www.lextutor.ca/cv/slrf_09/corpus.ppt References list at http://www.lextutor.ca/cv/slrf_09/handout.doc

