The Ins and Outs of making a Wordlist Rob Waring Notre Dame Seishin University www.robwaring.org/presentations/


1 The Ins and Outs of making a Wordlist Rob Waring Notre Dame Seishin University www.robwaring.org/presentations/

2 Overview Purpose - What kind of list? List structure Selection factors Definitions or translations? Mechanics Validating


5 Android too

6 Black – in level; Red – out of list; Red underline – out of level; Green – ignored words
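The four-way colour legend above can be sketched as a simple token classifier. This is a minimal illustration, not the tool shown on the slides: the function name `classify_tokens`, the `levels` dictionary layout (word mapped to level number) and the sample words are all assumptions.

```python
import re

def classify_tokens(text, levels, target_level, ignored=frozenset()):
    """Label each word token the way the slide's colour legend does.

    levels: dict mapping lowercase word -> level number (hypothetical layout).
    ignored: lowercase words to skip, e.g. character names.
    """
    labels = []
    for token in re.findall(r"[A-Za-z']+", text):
        word = token.lower()
        if word in ignored:
            label = "ignored"        # green
        elif word not in levels:
            label = "out_of_list"    # red
        elif levels[word] <= target_level:
            label = "in_level"       # black
        else:
            label = "out_of_level"   # red underline
        labels.append((token, label))
    return labels

# Hypothetical three-level list and a level-1 target text.
levels = {"the": 1, "cat": 1, "sat": 1, "on": 1, "ancient": 3}
for token, label in classify_tokens("Tom sat on the ancient zxqv", levels, 1, {"tom"}):
    print(token, label)
```

Lookup here is by exact lowercase form; whether inflections and derivatives should match their headword is a separate decision about word families.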

7 What kind of list - Purpose? To give to students to learn from (paper or digital) To analyze texts against e.g. a graded reader To cover the majority of words in a given field (e.g. top 1000 business words) Master list to source sublists from? Multiple level lists, or one list? For a single class – or general (e.g. all natives, all intermediates) Spoken, written, mixed? For a specific audience? – TOEIC, business, academic – A certain age – A certain level (intermediates)
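One way to make "cover the majority of words" concrete is to measure the share of running words in a text that appear on the list. The sketch below assumes the text is already tokenized and that matching is by exact form; the name `coverage` and the sample data are illustrative only.

```python
def coverage(tokens, wordlist):
    """Return the fraction of running words (tokens) found in the wordlist."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in wordlist)
    return known / len(tokens)

# Hypothetical mini wordlist and text.
wordlist = {"the", "price", "of", "is"}
tokens = "the price of the stock is volatile".split()
print(f"{coverage(tokens, wordlist):.0%} of running words are on the list")
```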

8 What kind of list - Starting point? Use existing wordlists – GSL, Nation’s BNC lists, NGSL, NAWL... Use existing corpus (e.g. BNC, COCA) and dig out what you want Create your own corpus (business, TOEIC, nursing) Does it suit your purpose? Will BNC give you an academic list? Is it structured the way you want? Headwords only? Lemmas? Mixture?

9 BNC raw (by type) BNC Nation Family list

10 List structure I - List with Levels How many levels? Why? What are the breaks between levels? Will learners get from one level to another with ease? Will the breaks be even (say 560 words each) or vary? Level by frequency? utility? range? intuition? Learnability?
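If levels are cut purely by frequency rank with even breaks, the split itself is mechanical. A minimal sketch, assuming the list is already sorted from most to least frequent; `split_into_levels` is a hypothetical helper, and uneven or utility-based breaks would need a different rule.

```python
def split_into_levels(ranked_words, level_size):
    """Cut a frequency-ranked list into consecutive levels of equal size.

    Returns a dict: level number (starting at 1) -> list of words.
    The last level may be shorter than level_size.
    """
    return {
        start // level_size + 1: ranked_words[start:start + level_size]
        for start in range(0, len(ranked_words), level_size)
    }

# Hypothetical ranked list, two words per level.
print(split_into_levels(["the", "of", "and", "to", "a"], 2))
```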

11 Selection Criteria I Representativeness: the list should adequately represent the wide range of uses of language. Frequency and range: a word should occur frequently across a wide range of texts. Word families: a sensible set of criteria is needed for which forms and uses count as members of the same family. Utility: how useful will the words be to the target learners? Idioms and set expressions: some items larger than a word behave like high frequency words.
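Frequency and range can be counted separately: frequency is the total number of occurrences, range is the number of texts a word appears in. The sketch below assumes a very simple tokenizer (lowercase letters and apostrophes) and treats each string as one text; the names are illustrative.

```python
from collections import Counter
import re

def frequency_and_range(texts):
    """Return (frequency, range) counters over a collection of texts.

    frequency[word] = total occurrences across all texts
    range[word]     = number of texts containing the word at least once
    """
    frequency = Counter()
    text_range = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        frequency.update(tokens)
        text_range.update(set(tokens))  # count each word once per text
    return frequency, text_range

# Hypothetical three-text corpus.
freq, rng = frequency_and_range(["the cat sat", "the dog", "the the bird"])
print(freq["the"], rng["the"])
```

A word with high frequency but low range is concentrated in a few texts, which is one reason to look at both numbers before admitting it to a general list.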

12 Selection Criteria II Learnability: how easy is the word to learn? Related words may be easier. Regularity: regular forms are easier than irregular forms, but some derivatives operate differently within a family (e.g. excuse, inexcusable). Coverage: it is not efficient to be able to express the same idea in different ways; it is more efficient to learn a word that covers a quite different idea. Stylistic level and emotional words: West saw second language learners as initially needing neutral vocabulary. Intuition: how well does the list match the teacher’s sense of what to include?

13 Which of these would you put in your list? out of per cent such as of course for example in front of all right as soon as in general in addition to next to on top of instead of in charge of just about provided that as good as with a view to in between by and large at random per se old fashioned grown up matter of fact sq m fait accompli straight forward habeas corpus self-same haute cuisine a good deal laissez faire persona non grata

14 How frequently do lexical phrases occur (BNC)?
Raw rank   Word                 Per million words
177        out of               490
222        per cent             382
272        such as              321
285        of course            309
378        for example          238
1538       in front of          65
1725       all right            58
2159       as soon as           47
2491       in general           41
2970       in addition to       34
3307       next to              30
3755       on top of            26
4378       instead of           21
5409       in charge of         17
5987       just about           15
7396       provided that        11
7885       as good as           10
9125       with a view to       8
11459      in between           6
13507      by and large         5
14369      at random            4
16684      per se               4
19505      old fashioned        3
22060      grown up             2
28441      matter of fact       2
43572      sq m                 1
48241      fait accompli        1
51717      straight forward     1
58511      habeas corpus        1
74321      self-same            0
76170      haute cuisine        0
82928      a good deal          0
83882      laissez faire        0
89371      persona non grata    0
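Per-million figures like those in the table are raw counts scaled by corpus size. The sketch below counts a multi-word phrase in a token list and normalizes the result; the corpus size used is hypothetical and is not the BNC's, and both function names are assumptions.

```python
def count_phrase(tokens, phrase):
    """Count occurrences of a multi-word phrase (a list of tokens) in a token list."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)

def per_million(count, corpus_size):
    """Scale a raw count to occurrences per million running words."""
    return count * 1_000_000 / corpus_size

tokens = "he ran out of time and out of luck".split()
hits = count_phrase(tokens, ["out", "of"])
# corpus_size here is just the length of the toy text, for illustration.
print(hits, per_million(hits, len(tokens)))
```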

15 Selection criteria – a new headword or in the family?
Only mega-headwords (that cover all meaning senses)
Inflections only? - Plurals, verb forms, -er, -est adjectives.
Keep them all together? If not, where do low frequency derivatives go? – USE uses using used user users useful useless usefulness usefully usable misused misuse misusing misuses misuser misusers uselessness uselessly unused usability reuse reuses reused reusing unusable
Derivatives in the family or as a new headword? – interest, interesting, interested, disinterested, interestingly
Homographs with different meanings – book, bank, bat, bill
Nuances – a brain, to brain someone
Phrasal verbs – bring down, bring back, bring up, bring over
Compound words – handbag, policeman, airflow, birdwatching
Multi-word units? – traffic light, lunch box, all right, by and large

16 Selection – where to put derivatives?
Level 1: A different form is a different word. Capitalization is ignored.
Level 2: Regularly inflected words are part of the same family. The inflectional categories are - plural; third person singular present tense; past tense; past participle; -ing; comparative; superlative; possessive.
Level 3: -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses.
Level 4: -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, -ous, in-, all with restricted uses.
Level 5: -age (leakage), -al (arrival), -ally (idiotically), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), ante- (anteroom), anti- (anti-inflation), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (inter-African, interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean), un- (untie; unburden).
Level 6: -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y, pre-, re-.
Level 7: Classical roots and affixes.
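The effect of choosing a cut-off level can be sketched with a hand-built family table in which each member is tagged with the level at which it joins the family. The table below is a hypothetical fragment of the USE family, tagged by the affix levels listed above (-er and -less at Level 3, -ful at Level 4, re- at Level 6); `build_lookup` is an illustrative name, not part of any existing tool.

```python
def build_lookup(families, max_level):
    """Map every word form to the headword it is counted under.

    families: dict headword -> list of (member, level) pairs.
    Members whose level exceeds max_level become their own headwords.
    """
    lookup = {}
    for headword, members in families.items():
        lookup[headword] = headword
        for member, level in members:
            lookup[member] = headword if level <= max_level else member
    return lookup

# Hypothetical fragment of the USE family, tagged by affix level.
families = {
    "use": [("uses", 2), ("using", 2), ("used", 2),
            ("user", 3), ("useless", 3), ("useful", 4), ("reuse", 6)],
}
lookup = build_lookup(families, max_level=3)
print(lookup["user"], lookup["useful"])
```

With `max_level=3`, "user" is counted under "use" while "useful" and "reuse" stand as separate headwords; raising the cut-off folds them back into the family.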

17 Selection Criteria - How will you deal with … I
Proper nouns: SONY, Dave, Jackson, Thomson, Paris, London
Proper nouns that are words - Bell, Sue, Jack, Nation, Mark
Numbers: 1, one, thirty, twenty-seven, thousand, billion
Acronyms – NATO, DNA, UN, NSA, DARPA
Dialectal differences (e.g. US vs UK spelling)
Multi-word units – post office, train station, city hall
Closed lexical sets such as days of the week, months etc.
Typos – mispelings, heros, amatur, arguement, bellweather
Incomplete words – travelin’, roarin’, ‘cept
Slang forms – gonna, wanna, nuffink, wassup
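Some of these categories can be flagged automatically for later human review. The sketch below is a rough heuristic only: the category names and rules are assumptions, capitalization cannot tell "Bell" the name from a sentence-initial "Bell", and an all-caps brand such as SONY will be flagged as an acronym.

```python
import re

def screen_token(token):
    """Assign a rough review category to a raw token (heuristic, not definitive)."""
    if re.fullmatch(r"\d+([.,]\d+)*", token):
        return "number"
    if re.fullmatch(r"[A-Z]{2,}", token):
        return "acronym"
    if token[:1].isupper():
        return "capitalized"       # possible proper noun; needs a human check
    if any(mark in token for mark in "'’‘"):
        return "apostrophe_form"   # e.g. clipped forms, but also contractions
    return "plain"

for token in ["27", "NATO", "Paris", "travelin'", "gonna"]:
    print(token, screen_token(token))
```

Typos, slang and spelled-out numbers ("thirty") pass through as "plain" here; catching them would need reference lists rather than surface patterns.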

18 Selection Criteria - How will you deal with … II
Offensive words – pooh, shit, crap, bugger, bastard, fart
Culturally loaded words – temple vs. church, hijab, sporran
Non-pc words – stewardess, waitress, negro, retarded, stupid
NCLB words - beer, alcohol, drugs, tobacco, smoking
Archaic words – thou, thee, thine, groovy, gay
Prototypical sets – words often taught in sets
– foods – pizza, apple, cake, bread, salt, tomato, zucchini, eggplant, capsicum
– drinks – coffee, tea, juice, water, cola, mojito, screwdriver, bloody Mary
– buildings – office, station, hotel, city hall, auditorium, ice rink
– shops – supermarket, mall, barber, stationer, grocer
– colors – red, blue, green, yellow, pink, violet, scarlet, puce

19 Definitions - What aspects of word knowledge to include? Definition. POS – how detailed do you want to be? Translations – how will you deal with translators who disagree? Example sentence – authentic or contrived? Usage notes – which ones? Synonyms. Antonyms. Distractors? (for online test auto-create software)

20 Definitions - style What style? e.g. Apple – synonym: fruit; short definition: hard red or green fruit; long definition: the fleshy, usually rounded red, yellow or green edible fruit of a usually cultivated tree (genus Malus) of the rose family. Use of a defining vocabulary list? Which one? Which words?

21 Mechanics Word? Excel? Specialized database software such as Access or Filemaker? Versions. Is it important to know which version of your wordlist was given to which users? Do you have the time and patience? SERIOUSLY. Do you have the time and patience?
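Whatever tool holds the list, one lightweight way to tell versions apart is to stamp each release with a fingerprint of its contents. This is only a sketch of one option, assuming a version is defined by its set of lowercase word forms; `wordlist_fingerprint` is a hypothetical helper.

```python
import hashlib

def wordlist_fingerprint(words):
    """Return a short, order-independent fingerprint of a wordlist's contents."""
    normalized = "\n".join(sorted({word.strip().lower() for word in words}))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]

# Same words in a different order give the same fingerprint;
# adding a word gives a different one.
v1 = wordlist_fingerprint(["use", "book", "bank"])
v2 = wordlist_fingerprint(["bank", "use", "book", "bill"])
print(v1, v2, v1 == v2)
```

Recording the fingerprint alongside the date and recipients of each release answers the "which version went to which users" question without relying on file names.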

22 Validating your wordlist How will you evaluate the list’s integrity? How will you check if you missed words? How will you check mis-levelled words? How will you check consistency of definitions, examples, translations?
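Some of these checks can be automated. The sketch below covers only two of them, missing fields and a word appearing at more than one level, and assumes each entry is a dict with hypothetical `word`, `level` and `definition` keys; checking whether a word sits at the right level still needs frequency data or human judgment.

```python
from collections import defaultdict

def validate(entries, required=("definition",)):
    """Return a list of (word, problem) pairs found in the wordlist entries."""
    levels_by_word = defaultdict(set)
    problems = []
    for entry in entries:
        word = entry["word"].lower()
        levels_by_word[word].add(entry["level"])
        for field in required:
            if not entry.get(field):   # missing key or empty value
                problems.append((word, f"missing {field}"))
    for word, levels in levels_by_word.items():
        if len(levels) > 1:
            problems.append((word, f"appears at levels {sorted(levels)}"))
    return problems

# Hypothetical entries with two deliberate problems.
entries = [
    {"word": "apple", "level": 1, "definition": "hard red or green fruit"},
    {"word": "Apple", "level": 2, "definition": "hard red or green fruit"},
    {"word": "bat", "level": 1, "definition": ""},
]
for word, problem in validate(entries):
    print(word, "-", problem)
```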

23 And soooooo much more! Questions? If you want help - waring.rob@gmail.com

