
1 Corpora for all. Adam Kilgarriff: Lexical Computing Ltd; Lexicography MasterClass Ltd; Universities of Leeds and Sussex.


3 (Nov 2009, Adam Kilgarriff, ETA-ROC) Overview. What is a corpus? History: humanities; linguistics; language teaching. Corpora in the classroom: scaring the students; alternative strategies.

4 What is a corpus? A collection of texts, when used for linguistic/literary research. Growth in the last two decades: computer power; text available electronically; tools.

5 History. Bible studies. Literary criticism: Shakespeare concordance, 200 years ago. Dictionary-making: Samuel Johnson (1754); Oxford English Dictionary.


7 History. Psychology: how do children learn language? Education: teaching to read; Thorndike and Lorge, 1940s, word lists for teaching. Brown corpus, 1960s, 1m words: first modern corpus.

8 History in linguistics. Pre-Chomsky. Chomsky (from 1957): competence and performance; corpora out of fashion. More recently: computational linguistics/NLP, often corpus-based; web corpora, Google.

9 In English Language Teaching. For vocabulary selection: West's General Service List, 1953, the main reference until the BNC (1994). To find how the language really is: textbook language often wrong; John Sinclair, Birmingham. Learner corpora.

10 Direct and indirect. Indirect: vocab lists; dictionaries: COBUILD (Collins, Birmingham Univ), 1980; Oxford, Longman. British National Corpus (1994): 100m words, enormous for its time. Now: all leading dictionaries use corpora. Textbooks.

11 Corpora in the classroom. Direct use. Tim Johns: data-driven learning. Students explore concordances; discover language facts for themselves; real language; test hypotheses. If they learn like this, they will remember. Since 1994: TALC conferences.

12 “Condensed reading”. Best vocabulary learning: extensive reading. How to focus? Reinforce vocab items. Classroom exercise, Cobb (1999): students need 2500 new words in a year; they pretend they are corpus lexicographers; each week, they work out the meaning of 200 new items.

13 Is C-in-the-C (corpora in the classroom) successful? After twenty years: minority interest; advanced level (university) only; most teachers haven't heard of it. Compare: indirect corpus use; other parts of linguistics. Why?

14 Do they meet student needs? Dictionary is much easier. Concordances: slow and arduous; distractions, confusions. Motivation: not sexy. “I want to learn English, not Corpus Linguistics”

15 Scaring the students. Concordances are hard to read: no context; incomplete sentences; complex structures; difficult vocab; junk (every corpus has it).
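
To make the "no context, incomplete sentences" point concrete, here is a minimal sketch of a keyword-in-context (KWIC) concordancer. It is an illustration only, not a tool from the talk: the function name `kwic` and the `width` parameter are assumptions. Each hit is shown with a fixed number of characters on either side, which is exactly why concordance lines start and stop mid-sentence.

```python
import re


def kwic(text, keyword, width=30):
    """Return one fixed-width concordance line per occurrence of keyword.

    Hypothetical helper for illustration: the context is cut at `width`
    characters on each side, regardless of sentence boundaries.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Left context, right-aligned so that all keywords line up in a column.
        left = text[max(0, match.start() - width):match.start()].rjust(width)
        # Right context, padded so that every line has the same length.
        right = text[match.end():match.end() + width].ljust(width)
        lines.append(left + match.group(0) + right)
    return lines


sample = "The corpus is large. A corpus helps learners. Every corpus has junk in it."
for line in kwic(sample, "corpus", width=12):
    print(line)
```

With a small `width`, the printed lines break off in the middle of neighbouring sentences, which is the reading difficulty the slide describes.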

16 Reading concordances. Best done quickly: read many lines; filter; find patterns; a second per line, also for filtering. Learners: not possible.

17 New strategies. Between corpus and dictionary. GDEX: find good example sentences. Automatic collocations dictionary. Motivation.

18 Corpus and dictionary. Dictionary: high quality but limited in size; might not have what you need. Corpus: vast; for when the dictionary does not tell you enough.

19 Corpus and dictionary. Corpus: unfamiliar, difficult. Dictionary: familiar, high quality, sometimes even

20 Corpus and dictionary. Corpus: unfamiliar, difficult. Dictionary: familiar, high quality, sometimes even loved.

21 Corpus and dictionary. Corpus: unfamiliar, difficult. Dictionary: familiar, high quality, sometimes even loved. Disguise corpus as dictionary: word sketch.

22 In dictionaries: users appreciate examples. Paper: space constraints. Electronic: no space constraints, so give lots of examples. Constraint: cost of selection, editing. GDEX: good example finder.

23 What makes a good example? Readable: EFL users. Informative: typical, for the collocation; context helps the user understand the target word/phrase.

24 GDEX. Get concordance. For each sentence: score it. Sort. Show best ones.

25 GDEX heuristics. Sentence length (10-26 words). Mostly common words: good. Rare words: bad. Sentences: start with a capital, end with one of . ! ? No [, ], http, \. Penalise: other punctuation, numbers; more than 2 or 3 capitals. Typicality: third collocate is a plus.
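
The score-sort-show pipeline and the heuristics listed on the slides can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the GDEX implementation: the slides give no weights, so every numeric weight here is invented, and `gdex_score`, `best_examples`, `COMMON_WORDS` and `extra_collocates` are hypothetical names. A real system would take its common-word list from corpus frequencies.

```python
import re

# Assumption: a tiny stand-in for a corpus-derived list of common words.
COMMON_WORDS = {
    "the", "a", "and", "of", "to", "in", "by", "go", "along",
    "children", "river", "walked", "watched", "slowly", "boats",
}


def gdex_score(sentence, common_words=COMMON_WORDS, extra_collocates=()):
    """Score a candidate example sentence; higher is better.

    All weights are illustrative assumptions, not values from the talk.
    """
    tokens = re.findall(r"[A-Za-z']+", sentence)
    if not tokens:
        return float("-inf")
    score = 0.0
    # Sentence length: 10 to 26 words is preferred.
    if 10 <= len(tokens) <= 26:
        score += 1.0
    # Mostly common words is good; rare words pull the fraction down.
    score += sum(t.lower() in common_words for t in tokens) / len(tokens)
    # Looks like a sentence: starts with a capital, ends with . ! or ?
    stripped = sentence.strip()
    if stripped[:1].isupper() and stripped[-1:] in ".!?":
        score += 1.0
    # Disallowed material: brackets, URLs, backslashes.
    if any(bad in sentence for bad in ("[", "]", "http", "\\")):
        score -= 1.0
    # Penalise digits and some other punctuation.
    score -= 0.25 * len(re.findall(r'[0-9;:()"]', sentence))
    # Penalise more than three capitalised words.
    if sum(t[0].isupper() for t in tokens) > 3:
        score -= 0.5
    # Typicality (assumed reading): a further typical collocate is a plus.
    lowered = {t.lower() for t in tokens}
    if any(c.lower() in lowered for c in extra_collocates):
        score += 0.5
    return score


def best_examples(concordance, n=3, **kwargs):
    """Score every sentence, sort best first, and return the top n."""
    ranked = sorted(concordance, key=lambda s: gdex_score(s, **kwargs), reverse=True)
    return ranked[:n]


candidates = [
    "see http://example.com [12] for MORE Info",
    "The children walked slowly along the river and watched the boats go by.",
]
print(best_examples(candidates, n=1))
```

Keeping the scoring function separate from the ranking step mirrors the slide's pipeline: any heuristic can be reweighted or replaced without touching the sort.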

26 GDEX: models for use. More examples for dictionaries: with manual checking (original project: MEDAL); without; some-some. Corpus query tool: sort concordances, best-first option in Sketch Engine. Automatic collocations dictionary.

27 Motivation. What if... student's favourite topic: not the family, but hip hop, manga, gaming. Student owns corpus. Demo.

28 Summary. Long history. Word frequency lists. Concordances scare students because they are too hard to read. Corpus valuable where dictionary does not tell you enough. Corpus and dictionary: points in between.

29 Sketch Engine. English, Chinese, other languages. In use at OUP, Macmillan, CUP, Collins; many universities. Word sketches. Instant web corpora: WebBootCaT. Free trial. Flyer.

30 Thank you

