
GDEX: Automatically finding good dictionary examples in a corpus (Kilgarriff, Kivik 2013)

1 GDEX: Automatically finding good dictionary examples in a corpus

2 Users appreciate examples
- Paper: space constraints
- Electronic: no space constraints
  - Give lots of examples
  - Constraint: the cost of selecting and editing them

3 Project
- Macmillan English Dictionary (MED)
- Already had 1,000 collocation boxes, with an average of 8 collocations per box
- New electronic version:
  - All 8,000 collocations need examples
  - Examples must be authentic, i.e. taken from a corpus

4 Old method
- The lexicographer:
  - gets the concordance for the collocation
  - reads through it until they find a good example
  - cuts, pastes and edits it

5 New method
- The lexicographer:
  - gets a sorted concordance, with the 20 best examples in a spreadsheet
  - has less to read through
  - ticks the first good one and edits it

6 What makes a good example?
- Readable by EFL (English as a Foreign Language) users
- Informative:
  - typical of the collocation
  - gives context that helps the user understand the target word or phrase

7 Readability
- 70 years of research
- Not just (or mainly) about EFL:
  - Educational theory: teaching children to read
  - Instruction manuals: early work was for the US military
  - Publishing: people like newspapers and magazines that they find easy to read

8 Readability tests
- Flesch Reading Ease test (1948); Flesch-Kincaid is a later grade-level variant
  - Based on average sentence length and average word length (in syllables)
  - Built into some word-processing software
- Many similar measures
- Recent work:
  - training data for different reading levels
  - language modelling
  - readability tailored to the domain and to the reader's L1
- Target levels:
  - US school grades
  - now, increasingly, the Common European Framework of Reference (CEFR)
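As an illustration of the kind of measure on this slide, here is a minimal sketch of the Flesch Reading Ease formula. It assumes the word, sentence and syllable counts are already available (counting syllables reliably is the hard part and is not attempted here); the function name is hypothetical. Higher scores mean easier text. This is background to GDEX, not part of it.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (1948): higher scores mean easier text.

    Combines average sentence length (words per sentence) with average
    word length measured in syllables per word.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_length = words / sentences
    avg_syllables_per_word = syllables / words
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


if __name__ == "__main__":
    # 100 words in 5 sentences, 150 syllables in total
    print(round(flesch_reading_ease(100, 5, 150), 3))  # 59.635
```

Longer sentences and longer words both push the score down, which is why GDEX-style heuristics also favour moderate sentence length and common (typically shorter) words.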

9 GDEX
- Get the concordance for the collocation
- For each sentence: score it
- Sort
- Show the best ones to the lexicographer
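The GDEX loop on this slide (score each sentence, sort, show the best) can be sketched as below. The scoring function is a placeholder parameter, and the names `gdex_rank`, `toy_score` and `top_n` are hypothetical; the default of 20 mirrors the "20 best examples" of the new method. This is an illustration under those assumptions, not the Sketch Engine implementation.

```python
from typing import Callable, List, Tuple


def gdex_rank(sentences: List[str],
              score: Callable[[str], float],
              top_n: int = 20) -> List[Tuple[float, str]]:
    """Score every concordance sentence, sort best-first and keep the top_n."""
    scored = [(score(s), s) for s in sentences]
    # Python's sort is stable, so equally scored lines keep their corpus order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in scorer: prefer sentences close to 15 words long."""
    return -abs(len(sentence.split()) - 15)


if __name__ == "__main__":
    concordance = [
        "Decision made.",
        "After a long meeting the committee finally made a decision about the new budget.",
        "We made a decision.",
    ]
    for s, line in gdex_rank(concordance, toy_score, top_n=2):
        print(s, line)
```

Keeping the scorer separate from the ranking step means the heuristics and their weights can change without touching the sort-and-show logic.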

10 GDEX heuristics
- Sentence length: 10 to 26 words
- Mostly common words is good
- Rare words are bad
- The sentence starts with a capital letter and ends with one of . ! ?
- No [, ], http or \
- Not much other punctuation, and few numbers
- Not too many capitals
- Typicality: a third collocate in the sentence is a plus
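One way each heuristic on this slide might become a numeric feature is sketched below. The thresholds (10 to 26 words, sentence-final . ! ?, the [ ] http \ blacklist) come from the slide. Everything else is an assumption: tokenising with a simple regular expression, passing "common words" in as a caller-supplied set (standing in for a hypothetical frequency list), treating rare words as the complement of that set, and scaling every feature so that higher is better. The function and feature names are hypothetical.

```python
import re
import string
from typing import Dict, FrozenSet

BAD_SUBSTRINGS = ("[", "]", "http", "\\")  # blacklist from the slide


def gdex_features(sentence: str,
                  common_words: FrozenSet[str],
                  third_collocates: FrozenSet[str] = frozenset()) -> Dict[str, float]:
    """Compute one score per GDEX-style heuristic; higher is better for every key."""
    s = sentence.strip()
    words = re.findall(r"[A-Za-z']+", s)
    n = max(len(words), 1)  # avoid division by zero on empty input
    lowered = [w.lower() for w in words]
    # Digits and punctuation, not counting the sentence-final character
    noisy = sum(1 for ch in s[:-1] if ch.isdigit() or ch in string.punctuation)
    # Capitalised words after the first one
    inner_caps = sum(1 for w in words[1:] if w[0].isupper())
    return {
        "length_ok": 1.0 if 10 <= len(words) <= 26 else 0.0,
        # Share of common words; rare words lower this score
        "common_ratio": sum(w in common_words for w in lowered) / n,
        "well_formed": 1.0 if s[:1].isupper() and s[-1:] in (".", "!", "?") else 0.0,
        "no_bad_tokens": 0.0 if any(b in s for b in BAD_SUBSTRINGS) else 1.0,
        "low_noise": 1.0 - min(noisy / n, 1.0),
        "few_capitals": 1.0 - inner_caps / n,
        "third_collocate": 1.0 if any(w in third_collocates for w in lowered) else 0.0,
    }


if __name__ == "__main__":
    common = frozenset("she made a quick decision to leave the party early that night".split())
    print(gdex_features("She made a quick decision to leave the party early that night.",
                        common, frozenset({"quick"})))
```

Returning a dictionary of separate feature scores, rather than one number, leaves the combination step (the weighting on the next slide) free to change.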

11 Weighting
- For each sentence:
  - score it on each heuristic
  - weight the scores
  - add the weighted scores together
- How to set the weights?
  - Two students manually judged 1,000 "good examples"
  - The weights were set so that the system makes the same choices as the students
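The slides do not say how the weights were fitted to the students' judgements. As one plausible sketch, assume each judged sentence is a dictionary of heuristic scores plus a good/not-good label. A simple perceptron, swapped in here purely for illustration and not necessarily what the project used, then adjusts the weights until the weighted sum is positive exactly for the sentences judged good. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
BIAS = "__bias__"  # extra always-on feature so the cut-off need not be zero


def weighted_score(features: Features, weights: Features) -> float:
    """Total score: each heuristic score times its weight, added together."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def predict_good(features: Features, weights: Features) -> bool:
    """Classify a sentence as a good example if its weighted score is positive."""
    return weighted_score({**features, BIAS: 1.0}, weights) > 0


def fit_weights(judged: List[Tuple[Features, bool]],
                epochs: int = 100, lr: float = 0.1) -> Features:
    """Perceptron training: adjust the weights whenever the system's choice
    disagrees with the human judgement, until they agree or epochs run out."""
    weights: Features = {}
    for _ in range(epochs):
        mistakes = 0
        for features, label in judged:
            if predict_good(features, weights) != label:
                mistakes += 1
                direction = 1.0 if label else -1.0
                for name, value in {**features, BIAS: 1.0}.items():
                    weights[name] = weights.get(name, 0.0) + direction * lr * value
        if mistakes == 0:
            break
    return weights


if __name__ == "__main__":
    judged = [
        ({"length_ok": 1.0, "common_ratio": 0.9}, True),
        ({"length_ok": 0.0, "common_ratio": 0.2}, False),
        ({"length_ok": 1.0, "common_ratio": 0.3}, False),
        ({"length_ok": 0.0, "common_ratio": 0.8}, False),
    ]
    print(fit_weights(judged))
```

A perceptron only settles when the judgements are linearly separable. Real human judgements rarely are, so a method such as logistic regression would usually be the more robust choice.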

12 Was it successful?
- Did it save lexicographers' time? Definitely (says the project manager)
- Rough guess at the average number of corpus lines read before finding a good one:
  - unsorted: 20
  - sorted: 5
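Taking the slide's rough guesses at face value and combining them with the 8,000 collocations of the Macmillan project, a back-of-the-envelope estimate of the reading saved looks like this (illustrative only; the slides give no measured totals):

```latex
\begin{aligned}
\text{unsorted:} \quad & 8000 \times 20 = 160{,}000 \text{ lines} \\
\text{sorted:}   \quad & 8000 \times 5 = 40{,}000 \text{ lines} \\
\text{saving:}   \quad & 160{,}000 - 40{,}000 = 120{,}000 \text{ lines, a factor of } 4
\end{aligned}
```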

13 Corpus choice
- Started with the BNC, but:
  - it is too old
  - it has not enough examples
  - if there are no good examples in the corpus, GDEX can't help
- Changed to ukWaC:
  - 20 times bigger; from the web; contemporary
  - better
  - most web junk filtered out
  - usually a good example in the top twenty

14 GDEX and TALC
- TALC: Teaching and Language Corpora
  - Goal: bring corpora into language teaching
- Usual problem: concordances are tough for learners to read
- Way forward: GDEX examples, halfway between dictionary and corpus

15 GDEX: Models for use
- More examples for dictionaries:
  - speed up the lexicographer, as with MED, or
  - fully automatic "more examples"
- Corpus query tool:
  - an option in the Sketch Engine
  - only show concordance lines with high scores
- Automatic collocations dictionary: http://forbetterenglish.com
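The corpus-query option "only show concordances with high scores" amounts to a threshold filter. Here is a minimal sketch, assuming each concordance line already carries a GDEX-style score. The function name and threshold value are hypothetical, and this is not the Sketch Engine's actual interface.

```python
from typing import Iterable, List, Tuple


def high_scoring_lines(scored_lines: Iterable[Tuple[float, str]],
                       threshold: float) -> List[str]:
    """Keep concordance lines whose score reaches the threshold, best first."""
    kept = [(score, line) for score, line in scored_lines if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in kept]


if __name__ == "__main__":
    lines = [(0.4, "see [1] for details"),
             (0.9, "She made a quick decision to leave."),
             (0.7, "They made a decision yesterday.")]
    print(high_scoring_lines(lines, threshold=0.6))
```

Unlike a fixed top-N cut, a threshold can return no lines at all when the corpus has no good examples, which matches the earlier point that GDEX cannot help if the corpus lacks them.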

16 Recent developments
- Configurable GDEX:
  - for other languages
  - an interface to help set it up
- Commonest string: between the "bare collocate" and the example
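The slide gives only the idea of a "commonest string". The sketch below rests on one possible reading. For a collocation such as made + decision, it counts the contiguous word sequences that link the two words across the concordance and returns the most frequent one, e.g. "made a decision". That string is fuller than the bare collocate pair but shorter than a whole example sentence. The function name, the pre-tokenised input, the exact-form matching (no lemmatisation) and the `max_gap` limit are all hypothetical choices.

```python
from collections import Counter
from typing import List, Optional


def commonest_string(sentences: List[List[str]], word1: str, word2: str,
                     max_gap: int = 3) -> Optional[str]:
    """Most frequent contiguous token span that starts at one collocate and
    ends at the other (in either order), with at most max_gap tokens between."""
    counts: Counter = Counter()
    for tokens in sentences:
        lowered = [t.lower() for t in tokens]
        for i, tok in enumerate(lowered):
            if tok not in (word1, word2):
                continue
            other = word2 if tok == word1 else word1
            # Look ahead for the other collocate within the allowed gap
            for j in range(i + 1, min(i + 2 + max_gap, len(lowered))):
                if lowered[j] == other:
                    counts[" ".join(lowered[i:j + 1])] += 1
                    break
    if not counts:
        return None  # the two words never co-occur closely enough
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    corpus = [
        "They made a decision yesterday".split(),
        "He made a decision quickly".split(),
        "She made the final decision".split(),
        "A decision was made".split(),
    ]
    print(commonest_string(corpus, "made", "decision"))
```

A real version would probably match lemmas rather than word forms, so that "make", "made" and "making" all count towards the same string.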
