
The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project Adam Kilgarriff Lexical Computing Ltd


1 The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk

2 The Cambridge Learner Corpus, English Profile, the Sketch Engine, “freely available”, HOO, DANTE and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk

3 Cambridge Learner Corpus (CLC) Since 1993 – Nearly as old as CECL Leading resource (like ICLE) CUP and Cambridge ESOL – For better dictionaries, ELT courses, tests – Material: all from exams (levels A1-C2) 45m words; 22m error-tagged 200,000 scripts, 138 L1s, 203 nationalities

4 English Profile From 2006 Cambridge Univ, Univ Press, ESOL (+ others) Goal – for each CEFR level, find characteristic lexis and grammar – Main resource: CLC – Talk on Thursday Theodora Alexopoulou, Helen Yannakoudakis

5 Flyers

6 Sketch Engine Leading corpus tool Word sketches – One-page summaries of a word’s grammatical and collocational behaviour In use at OUP, CUP, Collins, Macmillan, INL … 42 languages – Over 150 corpora – Since May including CHILDES: demo – Since last year including CLC

7 Error-coded corpus Challenge – Intuitive to search for x: anywhere; only where it is part of an error; only where it is part of a correction – where x can be a word, phrase, grammar pattern … Requirement for CLC in Sketch Engine

8 Sample text We will only use those informations to take part of our guest survey
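The three search scopes from the previous slide (anywhere, only in an error, only in a correction) can be sketched on this sample sentence. This is a minimal illustration, not the CLC's error coding or the Sketch Engine's query syntax: the inline `{wrong->right}` markup, the function names, and the single annotated error are assumptions made for the example.

```python
import re

# Hypothetical inline markup, for illustration only: {learner form->correction}.
# Only one error is annotated here; the annotation is ours, not taken from the CLC.
SAMPLE = ("We will only use those {informations->information} "
          "to take part of our guest survey")

EDIT = re.compile(r"\{([^{}]*?)->([^{}]*?)\}")


def views(annotated):
    """Split an annotated sentence into the four texts we may want to search."""
    errors = [m.group(1) for m in EDIT.finditer(annotated)]
    corrections = [m.group(2) for m in EDIT.finditer(annotated)]
    learner = EDIT.sub(lambda m: m.group(1), annotated)    # what the learner wrote
    corrected = EDIT.sub(lambda m: m.group(2), annotated)  # with corrections applied
    return learner, corrected, errors, corrections


def search(annotated, x, scope="anywhere"):
    """Return True if the whole word or phrase x occurs in the chosen scope."""
    learner, corrected, errors, corrections = views(annotated)
    pattern = re.compile(r"\b" + re.escape(x) + r"\b")
    if scope == "anywhere":
        targets = [learner, corrected]
    elif scope == "error":
        targets = errors
    elif scope == "correction":
        targets = corrections
    else:
        raise ValueError("scope must be 'anywhere', 'error' or 'correction'")
    return any(pattern.search(t) for t in targets)


if __name__ == "__main__":
    for scope in ("anywhere", "error", "correction"):
        print(scope, search(SAMPLE, "information", scope))
```

Keeping the learner text and the corrected text as separate views is what lets the same query for x be restricted to one side of an error annotation or run over both.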

9 Error-coded corpora in SkE demo

10 freely available

11 Free (MED online) Sense 1: not costing anything Sense 4: not limited by rules … anyone can get hold of it??

12 freely available Free (MED online) Sense 1: not costing anything Sense 4: not limited by rules … anyone can get hold of it?? Available To download onto your com To use

13 Case studies (ICLE vs CLC) – Money: ICLE 225 EUR; CLC No – To everyone: ICLE Yes; CLC Cambridge author/collab – To download? No – To use: Yes

14 Non-geeks Access is important, not download Web is beautiful

15 HOO / HOO+ Helping Our Own HOO: English-NNS NLP researchers – Developer = user: motivation – Shared task/competitive evaluation Organisers define task and prepare ‘gold standard’ Teams participate by running their software over test data Six teams (incl Tübingen), workshop end Sept

16 HOO+ (2012) Probably – English: learner data from CLC – Other languages? – Tasks Essay scoring Determiner, preposition errors ? http://www.clt.mq.edu.au/research/projects/hoo/

17 DANTE Highlights of English lexicography

18 DANTE

19

20

21 http://webdante.com Flyers

22 The KELLY Project EU Lifelong Learning Project Word cards – 9 languages Arabic Chinese English Greek Italian Norwegian Polish Russian Swedish – All 36 pairs – Words the learner should know (at A1 … C2) Partners Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd

23 Interesting question How close to purely corpus-based can a pedagogic list be?

24 Method Take a general corpus Count Review, add, delete using other lists and corpora Translate (72 directed-lg-pairs) Words not in source list which occur in translations: – Review source list http://kelly.sketchengine.co.uk
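The counting step and the "words not in source list which occur in translations" step can be sketched as follows. This is a rough illustration under stated assumptions, not the Kelly project's actual pipeline: the function names and the toy word data are invented for the example, and the manual review step is not modelled.

```python
from collections import Counter


def frequency_list(tokens, size):
    """Take a corpus (as a token list), count, and keep the `size` most frequent words."""
    return [word for word, _ in Counter(tokens).most_common(size)]


def review_candidates(source_list, translations_into_source):
    """Words that other languages' lists translate INTO this language
    but that are missing from this language's own list.

    translations_into_source maps a foreign list entry to its translations
    in the source language. Returns (word, number of occurrences) pairs,
    most frequent first, as candidates for reviewing the source list.
    """
    have = set(source_list)
    counts = Counter(
        word
        for targets in translations_into_source.values()
        for word in targets
        if word not in have
    )
    return counts.most_common()


if __name__ == "__main__":
    # Toy data, invented for illustration (not taken from the Kelly lists).
    corpus = "the sun the music the sun a library".split()
    english_list = frequency_list(corpus, 2)
    into_english = {
        "sv:sol": ["sun"],
        "sv:sjukhus": ["hospital"],
        "it:ospedale": ["hospital"],
    }
    print(english_list)
    print(review_candidates(english_list, into_english))
```

A word that several other languages translate into, but that the source list lacks, is a strong candidate for addition when the source list is reviewed.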

25 Symmetrical pairs: and Cliques: – For x, y, z, … all pairs are symmetrical – 9-language cliques (English members) hospital library music sun theory
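One possible reading of these definitions, sketched in code: a pair is symmetrical when each word is given as the other's translation, and a clique is a set of words, one per language, in which every pair is symmetrical. The data structure, function names, and three-language toy data are assumptions for illustration; the project itself covers 9 languages and 72 directed pairs.

```python
from itertools import combinations

# Toy translation data, invented for illustration: (language, word) -> {target language: word}.
TRANS = {
    ("en", "sun"): {"sv": "sol", "it": "sole"},
    ("sv", "sol"): {"en": "sun", "it": "sole"},
    ("it", "sole"): {"en": "sun", "sv": "sol"},
    ("en", "hospital"): {"sv": "sjukhus", "it": "ospedale"},
    ("sv", "sjukhus"): {"en": "hospital", "it": "ospedale"},
    ("it", "ospedale"): {"en": "hospital", "sv": "lasarett"},  # breaks symmetry with sjukhus
}


def is_symmetric(trans, a, b):
    """a and b are (language, word) nodes; True if each translates to the other."""
    return (trans.get(a, {}).get(b[0]) == b[1]
            and trans.get(b, {}).get(a[0]) == a[1])


def clique_from(trans, start, langs):
    """Follow start's translations into every other language and return the
    members if all pairs among them are symmetrical, otherwise None."""
    members = [start]
    for lang in langs:
        if lang == start[0]:
            continue
        word = trans.get(start, {}).get(lang)
        if word is None:
            return None
        members.append((lang, word))
    if all(is_symmetric(trans, a, b) for a, b in combinations(members, 2)):
        return members
    return None


if __name__ == "__main__":
    langs = ["en", "sv", "it"]
    print(clique_from(TRANS, ("en", "sun"), langs))
    print(clique_from(TRANS, ("en", "hospital"), langs))
```

In the toy data "sun" forms a clique, while "hospital" does not, because one directed translation points to a different word even though every English pair is symmetrical.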

26 Homage

