The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project Adam Kilgarriff Lexical Computing Ltd

The Cambridge Learner Corpus, English Profile, the Sketch Engine, “freely available”, HOO, DANTE and the Kelly Project Adam Kilgarriff Lexical Computing Ltd

Cambridge Learner Corpus (CLC)
Since 1993
– Nearly as old as CECL
Leading resource (like ICLE)
CUP and Cambridge ESOL
– For better dictionaries, ELT courses, tests
– Material: all from exams (levels A1-C2)
45m words; 22m error-tagged
200,000 scripts, 138 L1s, 203 nationalities

English Profile
From 2006
Cambridge Univ, Univ Press, ESOL (+ others)
Goal
– For each CEFR level, find characteristic lexis and grammar
– Main resource: CLC
– Talk on Thursday: Theodora Alexopoulou, Helen Yannakoudakis

Flyers

Sketch Engine
Leading corpus tool
Word sketches
– One-page summaries of a word’s grammatical and collocational behaviour
In use at OUP, CUP, Collins, Macmillan, INL …
42 languages
– Over 150 corpora
– Since May including CHILDES: demo
– Since last year including CLC

Error-coded corpus
Challenge
– Intuitive to search for x
  anywhere
  only where it is part of an error
  only where it is part of a correction
  where x can be a word, phrase, grammar pattern …
Requirement for CLC in Sketch Engine

Sample text We will only use those informations to take part of our guest survey
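The three search scopes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Segment` structure, the `find` function, and the two corrections attached to the sample sentence are hypothetical, and they are not the CLC error codes or the Sketch Engine query syntax. Only single words are handled; phrases and grammar patterns are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A stretch of learner text (hypothetical structure, not the CLC format)."""
    text: str                          # what the learner wrote
    correction: Optional[str] = None   # None means the stretch is not an error


def find(segments: List[Segment], x: str, where: str = "anywhere") -> List[int]:
    """Return indices of segments that contain the word x.

    where = "anywhere":   learner text (error or not) or a correction
    where = "error":      learner text that is part of an error
    where = "correction": the corrected form only
    """
    if where not in ("anywhere", "error", "correction"):
        raise ValueError(f"unknown scope: {where}")
    x = x.lower()
    hits = []
    for i, seg in enumerate(segments):
        written = seg.text.lower().split()
        corrected = seg.correction.lower().split() if seg.correction is not None else []
        in_error = seg.correction is not None and x in written
        in_correction = x in corrected
        if where == "error":
            match = in_error
        elif where == "correction":
            match = in_correction
        else:
            match = x in written or in_correction
        if match:
            hits.append(i)
    return hits


# The sample sentence, with two corrections assumed for illustration only.
sample = [
    Segment("We will only use"),
    Segment("those informations", correction="that information"),
    Segment("to take"),
    Segment("part of", correction="part in"),
    Segment("our guest survey"),
]

print(find(sample, "informations", where="error"))
print(find(sample, "information", where="correction"))
print(find(sample, "survey", where="error"))
```

Keeping the learner's text and the correction side by side in one record is what lets a single query function serve all three scopes.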

Error-coded corpora in SkE demo

freely available

freely available
Free (MED online)
Sense 1: not costing anything
Sense 4: not limited by rules
… anyone can get hold of it??
Available
To download onto your computer
To use

Case studies: ICLE and CLC
Money: 225 EUR (ICLE); No (CLC)
To everyone: Yes (ICLE); Cambridge author/collab (CLC)
To download?: No
To use: Yes

Non-geeks
Access is important, not download
Web is beautiful

HOO / HOO+
Helping Our Own
HOO: English-NNS NLP researchers
– Developer = user: motivation
– Shared task/competitive evaluation
  Organisers define task and prepare ‘gold standard’
  Teams participate by running their software over test data
Six teams (incl Tübingen), workshop end Sept

HOO+ (2012)
Probably
– English: learner data from CLC
– Other languages?
– Tasks
  Essay scoring
  Determiner, preposition errors
  ?

DANTE Highlights of English lexicography

DANTE

Flyers

The KELLY Project
EU Lifelong Learning Project
Word cards
– 9 languages: Arabic, Chinese, English, Greek, Italian, Norwegian, Polish, Russian, Swedish
– All 36 pairs
– Words the learner should know (at A1 … C2)
Partners: Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd

Interesting question How close to purely corpus-based can a pedagogic list be?

Method
Take a general corpus
Count
Review, add, delete using other lists and corpora
Translate (72 directed language pairs)
Words not in source list which occur in translations:
– Review source list
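A rough sketch of the counting and review steps, under stated assumptions: `frequency_list` and `review_candidates` are hypothetical helper names, the corpus is a toy token list, and the manual review step is not modelled. The sketch also checks that nine languages give 72 directed pairs.

```python
from collections import Counter
from itertools import permutations

LANGUAGES = ["Arabic", "Chinese", "English", "Greek", "Italian",
             "Norwegian", "Polish", "Russian", "Swedish"]


def frequency_list(tokens, size):
    """Count tokens and return the `size` most frequent, most frequent first."""
    counts = Counter(tokens)
    return [word for word, _ in counts.most_common(size)]


def review_candidates(source_list, translations_into_source):
    """Words that turn up as translations into a language but are missing
    from that language's own list, so the list may need review."""
    on_list = set(source_list)
    return sorted({w for w in translations_into_source if w not in on_list})


# Every ordered (source, target) pair of distinct languages.
directed_pairs = list(permutations(LANGUAGES, 2))
print(len(directed_pairs))

# Toy data, invented for illustration.
toy_corpus = "the the the sun sun music".split()
english_list = frequency_list(toy_corpus, 2)
candidates = review_candidates(english_list, ["sun", "library", "music", "library"])
print(english_list, candidates)
```

In this toy run "library" and "music" come back from other languages' translations without being on the English list, which is the signal to review that list.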

Symmetrical pairs: and
Cliques:
– For x, y, z, … all pairs are symmetrical
– 9-language cliques (English members): hospital, library, music, sun, theory
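A minimal sketch of the clique test, assuming that a pair is symmetrical when each word is listed as a translation of the other. The data layout, the function names, and the three-language toy entries are all hypothetical and are not taken from the KELLY lists.

```python
from itertools import combinations


def is_symmetric(translations, a, b):
    """True if b is listed as a translation of a and a as a translation of b.
    Words are (language, word) tuples."""
    return b in translations.get(a, set()) and a in translations.get(b, set())


def is_clique(translations, words):
    """True if every pair among the given words is symmetrical."""
    return all(is_symmetric(translations, a, b) for a, b in combinations(words, 2))


# Toy translation table for three languages (illustrative only).
translations = {
    ("en", "sun"): {("sv", "sol"), ("it", "sole")},
    ("sv", "sol"): {("en", "sun"), ("it", "sole")},
    ("it", "sole"): {("en", "sun"), ("sv", "sol")},
    ("en", "music"): {("sv", "musik")},   # no translation back is recorded
}

sun_words = [("en", "sun"), ("sv", "sol"), ("it", "sole")]
print(is_clique(translations, sun_words))
print(is_symmetric(translations, ("en", "music"), ("sv", "musik")))
```

A 9-language clique is the same test with one word from each of the nine languages, which means all 36 pairs must be symmetrical.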

Homage