GDEX: Automatically finding good dictionary examples in a corpus
Kilgarriff, Kivik 2013

Users appreciate examples
- Paper: space constraints
- Electronic: no space constraints
  - Give lots of examples
  - Constraint: cost of selection and editing

Project
- Macmillan English Dictionary (MED)
- Already had 1,000 collocation boxes
  - Average of 8 collocations per box
- New electronic version
  - All 8,000 collocations need examples
  - Authentic: from a corpus

Old method
- Lexicographer:
  - gets a concordance for the collocation
  - reads through until they find a good example
  - cuts, pastes, edits

New method
- Lexicographer:
  - gets a sorted concordance
    - the 20 best examples, in a spreadsheet
  - less reading through
  - ticks the first good one, edits

What makes a good example?
- Readable
  - for EFL users
- Informative
  - typical for the collocation
  - gives context that helps the user understand the target word or phrase

Readability
- 70 years of research
- Not just (or mainly) EFL
  - Educational theory: teaching children to read
  - Instruction manuals: early work for the US military
  - Publishing: people like newspapers and magazines that they find easy to read

Readability tests
- Flesch Reading Ease test, 1948 (basis of the Flesch-Kincaid family)
  - Average sentence length, average word length
  - Built into some word-processing software
- Many similar measures
- Recent work
  - Training data for different reading levels
  - Language modelling
  - Readability tailored to domain and to the reader's L1
- Target levels
  - US grades
  - Now, increasingly: the Common European Framework of Reference (CEFR)
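The 1948 Flesch Reading Ease score combines the two averages on the slide, with word length measured in syllables: 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). Higher scores mean easier text. The Python below is a minimal sketch of that formula. The syllable counter (vowel groups, minus a silent final "e") and the sentence splitter (split on . ! ?) are rough assumptions of ours, not part of the original test, so scores will drift somewhat from tools that use a pronunciation dictionary.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate: count vowel groups (an approximation only)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually adds no syllable ("make"), but "-le" does ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

For example, "The cat sat on the mat." (six one-syllable words, one sentence) scores about 116, far above a sentence built from long Latinate words.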

GDEX
- Get the concordance for the collocation
- Score each sentence
- Sort
- Show the best ones to the lexicographer
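The steps above amount to a score, sort, select loop. This is an illustrative Python sketch, not the GDEX implementation: `gdex_rank` and its `score` parameter are hypothetical names, the scoring function is left pluggable, and the default `top_n=20` simply mirrors the "20 best examples" handed to lexicographers under the new method.

```python
from typing import Callable, Iterable


def gdex_rank(
    concordance: Iterable[str],
    score: Callable[[str], float],
    top_n: int = 20,
) -> list[tuple[float, str]]:
    """Score every concordance sentence, sort best-first, keep the top_n."""
    scored = [(score(sentence), sentence) for sentence in concordance]
    # list.sort is stable even with reverse=True, so equal scores keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

Any function from sentence to number can be plugged in; a toy scorer such as `lambda s: -abs(len(s.split()) - 15)` would favour sentences close to 15 words.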

GDEX heuristics
- Sentence length: 10 to 26 words
- Mostly common words: good
- Rare words: bad
- Sentence starts with a capital letter and ends with . ! or ?
- No [, ], http or \
- Not much other punctuation, and few numbers
- Not too many capitals
- Typicality: a third collocate is a plus
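Most of these heuristics are mechanical string checks. The sketch below turns each one into a number where higher is better, ready to be weighted. The assumptions are all ours rather than the slide's: `freq` is a word-frequency table from some reference corpus, the `common_min` and `rare_max` cut-offs are made-up values, `collocates` is a precomputed set of typical third collocates (excluding the collocation's own two words), and "words" are whitespace tokens with edge punctuation stripped. `gdex_features` is a hypothetical name.

```python
import re

# Strings the heuristics rule out entirely.
FORBIDDEN = ("[", "]", "http", "\\")


def gdex_features(
    sentence: str,
    freq: dict[str, int],
    collocates: set[str],
    common_min: int = 1000,  # hypothetical cut-off for a "common" word
    rare_max: int = 5,  # hypothetical cut-off for a "rare" word
) -> dict[str, float]:
    """Score one sentence on each heuristic; higher is better for every feature."""
    text = sentence.strip()
    tokens = text.split()
    # Lower-cased words with surrounding punctuation removed.
    words = [w for w in (t.strip(".,;:!?\"'()").lower() for t in tokens) if w]
    n = max(len(words), 1)  # avoid division by zero on empty input
    common = sum(1 for w in words if freq.get(w, 0) >= common_min)
    rare = sum(1 for w in words if freq.get(w, 0) <= rare_max)
    # Punctuation other than . ! ? and apostrophes.
    other_punct = len(re.findall(r"[^\w\s.!?']", text))
    digits = sum(ch.isdigit() for ch in text)
    # Capital letters after the first character.
    capitals = sum(ch.isupper() for ch in text[1:])
    ends_ok = text[-1:] in (".", "!", "?")
    return {
        "length_ok": 1.0 if 10 <= len(tokens) <= 26 else 0.0,
        "common_ratio": common / n,
        "rare_penalty": -rare / n,
        "well_formed": 1.0 if text[:1].isupper() and ends_ok else 0.0,
        "no_forbidden": 0.0 if any(s in text for s in FORBIDDEN) else 1.0,
        "punct_digits_penalty": -(other_punct + digits) / n,
        "capitals_penalty": -capitals / n,
        "third_collocate": 1.0 if any(w in collocates for w in words) else 0.0,
    }
```

Keeping every feature oriented so that "higher is better" means a later weighting step only has to learn magnitudes, not signs.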

Weighting
- For each sentence
  - score it on each heuristic
  - weight the scores
  - add the weighted scores together
- How to set the weights? Two students:
  - manually judged 1,000 "good examples"
  - weights set so the system makes the same choices as the students
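A minimal sketch of the weighting step, assuming each sentence has already been reduced to a list of heuristic scores and each student judgement is a yes/no label. The slide does not say how the weights were fitted; here a plain perceptron stands in, nudging the weights after every disagreement until the system accepts and rejects the same sentences as the human judges, or until `epochs` runs out (a perceptron only reaches full agreement when the judgements are linearly separable). `weighted_score` and `fit_weights` are hypothetical names.

```python
def weighted_score(features: list[float], weights: list[float]) -> float:
    """Weight each heuristic score and add them together."""
    return sum(w * f for w, f in zip(weights, features))


def fit_weights(
    examples: list[tuple[list[float], bool]],
    epochs: int = 100,
    lr: float = 0.1,
) -> tuple[list[float], float]:
    """Perceptron stand-in: learn weights that reproduce human good/bad choices."""
    if not examples:
        raise ValueError("need at least one judged example")
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, is_good in examples:
            predicted_good = weighted_score(features, weights) + bias > 0
            if predicted_good != is_good:
                mistakes += 1
                # Move the weights towards the human judgement.
                sign = 1.0 if is_good else -1.0
                weights = [w + lr * sign * f for w, f in zip(weights, features)]
                bias += lr * sign
        if mistakes == 0:  # agrees with every judgement; stop early
            break
    return weights, bias
```

For ranking a concordance only the weights matter, since the bias shifts every sentence's score by the same amount.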

Was it successful?
- Did it save lexicographer time?
  - Definitely (says the project manager)
- Rough guess: average number of corpus lines read before finding a good example
  - Unsorted: 20
  - Sorted: 5
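Taken at face value, the rough guess is a four-fold reduction in reading. Assuming, purely for illustration, that the same averages held across all 8,000 collocations in the project, the difference works out as:

```latex
\frac{20}{5} = 4, \qquad
8000 \times 20 = 160\,000 \ \text{lines (unsorted)}, \qquad
8000 \times 5 = 40\,000 \ \text{lines (sorted)}
```

That is roughly 120,000 fewer corpus lines to read, under that assumption.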

Corpus choice
- Started with the BNC (British National Corpus), but
  - too old
  - not enough examples
  - if there are no good examples in the corpus, GDEX can't help
- Changed to UKWaC
  - 20 times bigger; from the web; contemporary
  - Better
  - Most web junk filtered out
  - Usually a good example in the top twenty

GDEX and TALC
- TALC: Teaching and Language Corpora
  - Goal: bring corpora into language teaching
- Usual problem
  - Concordances are tough for learners to read
- Way forward
  - GDEX examples: half way between dictionary and corpus

GDEX: Models for use
- More examples for dictionaries
  - Speed up the lexicographer, as with MED, or
  - Fully automatic "more examples"
- Corpus query tool
  - Option in the Sketch Engine
  - Only show concordances with high scores
- Automatic collocations dictionary

Recent developments  Configurable GDEX For other languages Interface to help set up  Commonest string Between ‘bare collocate’ and example Kivik 2013Kilgarriff: GDEX16
