
1 Teachers’ Top 10 Uses For a Language Corpus. Saturday, May 18, 9-10h * PLUS Breakout Session at 10h15 * Sunshine State TESOL 2013, «Expanding Traditions: Merging Methodology & Technology», ORLANDO, FLORIDA. Tom Cobb, Didactique des langues, Université du Québec à Montréal. FIND THIS PPT AT WWW.LEXTUTOR.CA/CV/SS-TESOL.PPT

2 Who? Tom Cobb teaches Applied Linguistics at a French university in Montreal. His main interest is adapting the computer tools of linguists to the needs of language teachers through his website, Lextutor. Tom worked abroad for many years (Saudi Arabia, Oman, Hong Kong) before returning to North America, and continues to work as a consultant in both developed and developing countries (Japan, Niger, Benin, Barbados). His research writings are available online.

3 SS-TESOL - Blurb. A “language corpus” is a sampled collection of written or spoken texts large enough to represent part of a language (medical, economic) or even a language as a whole. The applied linguistics literature is full of references to research involving corpora, and ESL teacher-training courses exhort new teachers to get familiar with corpora and use them for various purposes in their teaching. But when teachers get into the classroom, do they follow this advice? And if so, what do they use a corpus for? The Lextutor website offers teachers access to several corpora and checks how they use them. User data shows that more than 1,000 users per day (mainly teachers) consult a corpus on Lextutor. This data, along with email queries and conference presentations, makes it clear what teachers are using corpora for, and has made it possible to evolve the tools in line with teachers’ needs and goals. My talk will outline the main 10 reasons teachers consult a corpus…

8 What? What is a corpus? Why do we need corpora? What difference do they make? What is “the corpus revolution”? Or, “Is there a corpus revolution?” >>>> A brief primer on CORPORA before we get to teachers’ uses

9 Corpora – what are they?

10 Historically…

11 Early corpora. Dr Johnson, A Dictionary of the English Language (Longman, 1755). Based on quotations from literature copied onto many slips of paper. But using literature has some problems.

12 120 years later: James Murray, OED, 1879. REAL LANGUAGE examples sent in by post. Oxford City Post Office sets up a special sub-branch for OED.

13 1960s - Enter The Computer


15 What is a corpus? NOT just «a lot of text»! A large collection of language in use, but assembled systematically, according to explicit criteria of representativeness. How large? Depends on the goal.

16 Goals and sizes
 – Linguistics goal: to represent an entire language. 100 million wds still under-represents common collocations
 – Pedagogical goal: Ss meet common words, structures. 1 million words gives 10 hits for frequent words
 – Applied linguistics goal: trace an acquisition feature. 100,000-word learner corpora are common

17 Drilling down into the pedagogical goal: Ss meet common grammar and vocab
 – Grammar: 1 million is adequate; all structures get many hits
 – Lexis, basic vocab: 1 million gives 10 hits @ 2k level
 – Main collocations: 1 million gives the main ones
 – Torrential rain? “Raining cats and dogs”? 1 billion gives 5 hits
 – Identify specialist lexis: 200,000 may be enough
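The size figures on this slide follow from simple proportional arithmetic: expected hits = (rate per million words) × (corpus size in millions). The sketch below works through two of the slide's figures. It assumes the rates implied by the slide (about 10 per million for a word at the 2k level, and 5 per billion, i.e. 0.005 per million, for a rare idiom); the function names are illustrative and not part of Lextutor.

```python
def expected_hits(rate_per_million: float, corpus_size_words: int) -> float:
    """Expected number of occurrences of an item in a corpus of the given size."""
    return rate_per_million * corpus_size_words / 1_000_000


def corpus_size_needed(rate_per_million: float, wanted_hits: float) -> int:
    """Corpus size (in words) at which we would expect `wanted_hits` occurrences."""
    return round(wanted_hits * 1_000_000 / rate_per_million)


# Assumed rate for a word at the 2k level: about 10 occurrences per million words.
print(expected_hits(10, 1_000_000))  # 10.0

# Assumed rate for a rare idiom: 5 hits per billion words = 0.005 per million.
# In a 1-million-word corpus we would usually expect to see it not at all.
print(expected_hits(0.005, 1_000_000))

# Corpus size needed before we expect to meet that idiom 5 times.
print(corpus_size_needed(0.005, 5))
```

The same arithmetic explains why a million words is enough for classroom vocabulary work but not for rare idioms.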

18 A growth industry
Brown, 1970 ................ 1,000,000 wds
BNC, 1994 .................. 100,000,000 wds
COCA (BYU), 2013 ........... 450,000,000 wds (contemporary corpus of U.S. English, 1990-2012)
Cambridge Int’l, 2002 ...... 1,000,000,000 wds

19 Design / composition, e.g., Brown (1970s). Page from Lextutor.

20 What does a corpus represent? A language as a whole: BNC. Or a part: Cancode (oral), COCA, MICASE (academic). Or an individual: Jack London’s collected works. Or a group of individuals: a class of ESL learners.

21 How do we read a corpus? Cannot read it naturally – defeats the goal. Needs the help of a search technology: concordance, index, frequency list, many others.
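To make two of these search technologies concrete, here is a minimal sketch of a keyword-in-context (KWIC) concordance and a frequency list. It is an illustration only, not how Lextutor's concordancers are built; the tokenizer is deliberately crude and the sample text is invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=4):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30}  [{tok}]  {right}")
    return lines


# Invented sample text standing in for a corpus.
sample = ("She came back late. He hurt his back lifting boxes. "
          "Put it back on the shelf at the back of the room.")
tokens = tokenize(sample)

for line in concordance(tokens, "back"):
    print(line)

print(frequency_list(tokens)[:3])
```

Lining the keyword up in a column is what lets a reader scan many examples at once and notice the different senses and neighbours of a word.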

22 Concordancers

23 Corpora – why do we need them?

24 Why do we need corpora? A. Corpus work is sexy. B. We have computers – let’s use them. C. Linguistic intuitions are unreliable.

25 Linguistic intuitions are notoriously unreliable. Demo 1: Do you think “however” is more common in spoken or in written language? By how much? (3 to 1… etc.)
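Answering “by how much?” from corpus data means comparing rates, not raw counts, because a spoken corpus and a written corpus are rarely the same size. The sketch below normalizes hits to a per-million rate before taking the ratio. The counts and corpus sizes are invented placeholders chosen to show the arithmetic; they are not findings about “however”.

```python
def per_million(hits: int, corpus_size_words: int) -> float:
    """Normalize a raw hit count to occurrences per million words."""
    return hits * 1_000_000 / corpus_size_words


def rate_ratio(hits_a: int, size_a: int, hits_b: int, size_b: int) -> float:
    """How many times more frequent the item is in corpus A than in corpus B."""
    return per_million(hits_a, size_a) / per_million(hits_b, size_b)


# Placeholder figures, invented for illustration only.
written_hits, written_size = 600, 1_000_000
spoken_hits, spoken_size = 100, 500_000

# Raw counts suggest 6 to 1, but the spoken corpus is half the size.
print(written_hits / spoken_hits)                                          # 6.0
print(rate_ratio(written_hits, written_size, spoken_hits, spoken_size))    # 3.0
```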


27 Demo 2: What are the main senses of “back” and which is most common? By what factor? rs/concord_e.html


30 Demo 3: Can you rank order these roughly by frequency band? 0-2k, 3k-5k, 6k-10k, 11k-15k

31 Try one?

32 Many linguistic intuitions are unreliable. Implicit patterns are extremely slow to extract from input (N. Ellis, J. Hulstijn) … because of the severe limitations on what we can see and remember … unaided.

33 And if pattern perception is slow and unreliable for Native Speakers … How much slower for LEARNERS?!

34 Not only linguistic intuitions are problematic. For every appearance, many possible explanations. Stand outside on a starry evening: what does it look like?

35 The role of the computer in modern science is well known. In disciplines like physics and biology, the computer's ability to store and process inhumanly large amounts of information has disclosed patterns and regularities in nature beyond the limits of normal human experience. Similarly in language study, computer analysis of large texts reveals facts about language that are not limited to what people can experience, remember, or intuit. In the natural sciences, however, the computer merely continues the extension of the human sensorium that began 200 years ago with the telescope and microscope. But language study did not have its telescope or microscope. The computer is its first analytical tool, making feasible for the first time a truly empirical science of language. – Cobb 1999

36 Before the computer, linguists could only study small samples of language at a time because of the limitations of their powers of observation and their memories. Even scholars who relentlessly collected instances of usage all their lives only had a few examples of any particular pattern, and there was no way of telling what they had missed. – Sinclair, 2003, p. ix

37 Most sciences: supplemented by technologies from the 15th century
BIOLOGY .......... microscope
ASTRONOMY ........ telescope
NAVIGATION ....... astrolabe, etc.
Language study: late 20th century .... machine-readable corpora

38 Corpus Findings – Very Good News for ESL

39 Fabled Core of English is close to disclosure through 35 yrs of corpus work
 – Main lexis + coverage: 2000 wd families = 80% (Carrol et al 76)
 – Main collocations in BNC-speech: 84 HF collocations belong in 1k list (Shin & Nation 2007)
 – Main phrasal verbs: 25 ph vbs = 1/3 of all ph vbs in BNC (Gardner & Davies, 2007)
 – Main morphologies (Bauer & Nation, 1993)
 – Main stress patterns (Murphy & Kandil)
Cf. all this coming together at the same time as the human genome, also a corpus project
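A coverage figure such as “2000 word families = 80%” is the share of running words (tokens) in a text that belong to a given list. The sketch below shows the calculation under two simplifying assumptions: it matches exact word forms rather than word families, and the word list and sentence are toy examples invented for illustration.

```python
import re


def coverage(text: str, word_list: set[str]) -> float:
    """Proportion of tokens in `text` that appear in `word_list` (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in word_list)
    return known / len(tokens)


# Toy list standing in for a high-frequency word list (assumption, not a real list).
toy_list = {"the", "cat", "sat", "on", "mat"}
text = "The cat sat on the ergonomic mat"

# 6 of the 7 tokens are on the list.
print(f"{coverage(text, toy_list):.0%}")
```

A real profiler would first map each word form to its family (e.g. “sits”, “sitting” to “sit”) before checking the list, which is what makes family-based coverage figures higher than form-based ones.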

40 Numerous errors are now corrected (in principle)
 – Definitions no longer harder than the defined word
 – Simple present no longer automatically the first verb tense taught
 – Written language no longer the model for spoken language
 – Status of multi-word units is reinstated
 – Grammar no longer taught via unknown lexis, or as unconnected to lexis

41 Thus the “corpus revolution”: dictionaries, grammars, courses, studies

42 This is all great, but… What do teachers do with corpora? <<< Back to 10 main uses of Lextutor corpora with ESL learners

44 1. The obvious use – source of examples for the teacher. Teacher finds examples to show students: words, structures, discourse features. Find sentences for test questions: within a rough-tuned level, within a domain. (Speaker note: means we can go live easily from this place.)

45 Display words, collocations, structures in classroom


50 Conclusion: most of “What it means to know a word” (Nation’s 18 kinds of word knowledge) can be shown in a million-word corpus

51 Uses 2-9 are concordancing in a task context, where teachers set up concordances for learners to use independently because they achieve some goal by doing so. Payoff for looking through multiple examples. These were independent uses of concordances, later incorporated in dedicated interfaces.

52 2. Corpus as a writing resource. EXAMPLE: A student writer wants to describe a teacher as “one of the best teacher…”

53 A writing resource click-linked to learner’s text

54 3. Data-Driven Error Analysis

55 … integrated as writing error feedback


57 4-5: Corpus as a reading resource. Expand the text via concordancer hooked up to learner’s text, with potential payoff in strategy development.

58 4. Give lexical info while reading. Or, develop lexical strategies while reading. Or, meta-lexical competence… etc.

59 4a. Encourage use of context before dictionary

60 5. Show if word is worth stopping for

61 6. Word-focus activities… Auto-generate rich semantic cuing


63 7. Group-made concordances for collaborative vocabulary learning. Learners contribute concordance lines, since there are too many words to learn alone…

64 7a. Facilitate transfer of word knowledge to novel context

65 8. Facilitate quick scope-out of a k-level

66 8. Facilitate quick scope-out of a k-level (continued)

67 9. Snapshot of a set of learner essays. Error patterns? Are recently learned words coming through in production? Are new structures coming through? Correctly?
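One of these questions, whether recently taught words are coming through in production, can be sketched as a simple count over a set of essays. This is a hypothetical illustration, not the Lextutor routine: the function name, the essays, and the target words are all invented, and it matches exact word forms only.

```python
import re
from collections import Counter


def word_uptake(essays, target_words):
    """For each target word, count how many essays use it at least once."""
    targets = {w.lower() for w in target_words}
    used_in = Counter()
    for essay in essays:
        # Set intersection so each essay counts a word at most once.
        seen = set(re.findall(r"[a-z']+", essay.lower())) & targets
        used_in.update(seen)
    return {w: used_in[w] for w in sorted(targets)}


# Invented learner essays and recently taught words.
essays = [
    "The results indicate a significant change.",
    "We analyse the data and indicate the trend.",
    "My holiday was very nice.",
]
taught = ["indicate", "analyse", "significant", "derive"]

print(word_uptake(essays, taught))
```

Counting essays rather than total occurrences shows how widely a word has spread across the class, which is usually the teacher's question; a concordance of the same essays would then show whether the word is being used correctly.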


69 10. And, under development, by popular demand, from my best Google-hitting paper (1997): scope out a level by contextual inference, like in L1 but with support

70 Any research supporting all of this?
CONCORDANCE AS A READING RESOURCE
 – Cobb, T. (2009). Internet and literacy in the developing world: Delivering the teacher with the text. In K. Parry (Ed.), Literacy for All in Africa Vol. 2: Reading in Africa: Beyond the School. Kampala: African Book Collective.
CONCORDANCE AS WRITING FEEDBACK
 – Gaskell, D., & Cobb, T. (2004). Can learners use concordance feedback for writing errors? System, 32(3), 301-319.
LEARNER-BUILT CONCORDANCE FOR VOCAB DEVELOPMENT
 – Horst, M., Cobb, T., & Nicolae, I. (2005). Expanding academic vocabulary with a collaborative on-line database. Language Learning & Technology, 9(2), 90-110.
CONCORDANCE INVESTIGATION OF LEARNER PRODUCTION
 – Cobb, T. (2003). Analyzing late interlanguage with learner corpora: Quebec replications of three European studies. Canadian Modern Language Review, 59(3), 393-423.
CONCORDANCE FOR SCOPING OUT A K-LEVEL
 – Cobb, T. (1997). Is there any measurable learning from hands-on concordancing? System, 25(3), 301-315.
 – Cobb, T., & Horst, M. (2011). Does Word Coach coach words? CALICO, 28(3), 639-661.
MORE AT LEXTUTOR.CA/CV/

