Dr. Charles Browne Professor of Applied Linguistics

1 The New General Service List: Celebrating 60 years of Vocabulary Learning
Dr. Charles Browne Professor of Applied Linguistics Meiji Gakuin University, Tokyo

2 A few current Corpus Projects…
Business English Word List for NHK TV Show in Japan
EnglishCentral (a HUGE video corpus of authentic English)
New General Service List (CEC)
New Academic Word List (CEC)
TOEIC Vocabulary Study List (using past test materials)

3 A few of my many online vocabulary learning projects…

4 EFL Vocabulary Learning in Japan…
The Negative Effect of “Test English”
PROBLEM: Students NEED to learn the first 5000 words of English to use English in the real world… But entrance exams and high school textbooks force students to memorize hundreds of low-frequency words…
RESULT? High school students can’t deal with real world English because they don’t know hundreds of the most important high frequency words…
[Figure: word frequency chart with a “Frequency” axis, marking the high frequency words (HFW). Words shown: the, of, and, permission, chaos, exasperate, digress, abstain, emigrate, torment. Frequency labels shown: 600,000; 84,168; 42,024; 25,537; 23,371; 14,641; 5,000; 4,441 (ace); 2,566 (bid); 2,289 (sum).]
English is said to have about 350,000 words. As noted earlier, 5000 words are enough. University entrance exams are only one goal, not the final one. A much longer life continues after them.

5 When reading or listening to a text, students will of course not know many words…
What percentage of words do you think must be known for them to be able to read easily? 50% ? 75% ? 85% ? 95% ?

6 75% Coverage 1000 high frequency words
[ 19 missing words ] …another possible problem with _____ _____ is how to _____ learner _____ although research suggests that _____ are a very _____ way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ _____ of doing _____ _____. There is a _____ _____ in the _____ classroom of using games with a _____ purpose to increase and _____ learner _____ (Ersoz , 2000, Uberman 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982)

7 85% Coverage 2000 high frequency words
[ 13 missing words ] …another possible problem with _____ _____ is how to _____ learner _____ although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ method of doing _____ _____. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner _____ (Ersoz , 2000, Uberman 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982)

8 95% Coverage 5000 high frequency words
[ 4 missing words ] …another possible problem with vocabulary _____ is how to sustain learner motivation although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the sole method of doing vocabulary review. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner motivation (Ersoz , 2000, Uberman 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner affective filter (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982)

9 Vocabulary Thresholds:
Below 80%, reading comprehension is almost impossible (Hu & Nation, 2001)
95% coverage is the point at which learners can read without the help of dictionaries (Laufer, 1989)
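To make the idea of coverage concrete, here is a minimal sketch of how the share of known running words in a text could be computed and compared with the two thresholds above. The tokenizer, the sample sentence, the learner word list and the wording of the middle band are all illustrative assumptions, not part of the NGSL project.

```python
import re

def coverage(text, known_words):
    """Return the share of running words (tokens) in text found in known_words."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)

def reading_band(cov):
    """Place a coverage ratio relative to the two thresholds cited on the slide."""
    if cov < 0.80:
        return "below 80%: comprehension almost impossible (Hu & Nation, 2001)"
    if cov < 0.95:
        return "between the two thresholds"  # label is shorthand, not from the slide
    return "95% or more: readable without a dictionary (Laufer, 1989)"

# Hypothetical text and learner vocabulary, for illustration only
sample = "The cat sat on the mat and the dog sat on the log"
known = {"the", "cat", "sat", "on", "and", "dog"}

cov = coverage(sample, known)
print(f"{cov:.1%}", reading_band(cov))
```

Here 11 of the 13 running words are known, so the sample lands between the two thresholds.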

10 Goals of the NGSL Project…
to update and greatly expand the size of the corpus used (273 million words) compared to the limited corpus behind the original GSL (about 2.5 million words), with the hope of increasing the generalizability and validity of the list
to create a NGSL of the most important high-frequency words useful for second language learners of English which gives the highest possible coverage of English texts with the fewest words possible
to make a NGSL that is based on a clearer definition of what constitutes a word
to be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original Interim version of the GSL)

11 Original GSL in a nutshell…
West’s 1953 GSL was actually a more fully developed version of Faucett’s 1936 “Interim Report on Vocabulary Selection” (sponsored by the Carnegie Corporation)
Contributors included many famous linguists such as Thorndike, Horn, Maki, Palmer and West
Based on a 2.5 million word hand collected corpus (later increased to 5 million words)
Combined objective (frequency) and subjective (teacher intuition) criteria
Approximately 2200 words giving about 80% coverage in general texts
No systematic attempt to define what a word was: “no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness” (West, 1953, page viii)

12 General Service Lists GSL (West, 1953) http://jbauman.com/aboutgsl

13 Academic Word List AWL (Coxhead 2000) http://www.victoria.ac

14 Getting AWL/GSL lists w/definitions & sound files…
I made a few GSL/AWL apps and have made all the content available for free to teachers and researchers. Please contact me if you need any of the following for the GSL or AWL:
Word lists
Parts of speech
Definitions in easy English
Definitions in Japanese
Sound files for pronunciation of words

15 Original GSL created in 1930s…
2.5m corpus may have had too many agriculture and religion texts?
AGRICULTURE: plow, mill, spade, cultivator
RELIGION: god, Lord, devil, mercy, bless, fellowship, preach, sacred, worship, holy, pray, heaven, grace, kingdom, church
SEA TRAVEL: sailor, oar, vessel, merchant
NOT AS IN USE?: telegraph, chimney, coal, cottage, gaiety, shilling, headdress, saucer, woolen, pupil, amongst

16 Starting Point for NGSL…
Access to Cambridge’s more modern 2 BILLION word corpus
CEC corpora used for preliminary analysis of NGSL
Corpus        Tokens
Newspaper     748,391,436
Academic      260,904,352
Learner       38,219,480
Fiction       37,792,168
Journals      37,478,577
Magazines     37,329,846
Non-Fiction   35,443,408
Radio         28,882,717
Spoken        27,934,806
Documents     19,017,236
TV            11,515,296
Total         1,282,909,322

17 Problems…
Newspaper subsection was too large and dominated the frequencies
Newspaper subsection in CEC had too much of a bias towards financial terms
Academic subcorpus of CEC not really related to needs of General English for 2nd language learners

18 Corpus Development & WYPIIWYGO….

19 Balancing the NGSL Corpus…
CEC corpora included in final analysis for NGSL
Corpus        Tokens
Learner       38,219,480
Fiction       37,792,168
Journals      37,478,577
Magazines     37,329,846
Non-Fiction   35,443,408
Radio         28,882,717
Spoken        27,934,806
Documents     19,017,236
TV            11,515,296
Total         273,613,534*
*273 million word subsection used is 100x larger than original GSL corpus…
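As a quick arithmetic check of the figures on this slide, the following sketch sums the nine sub-corpora and compares the total with the roughly 2.5 million word corpus behind the original GSL. The token counts are copied from the slide. The variable names are only illustrative.

```python
# Token counts for the nine CEC sub-corpora kept in the final NGSL analysis
ngsl_subcorpora = {
    "Learner": 38_219_480,
    "Fiction": 37_792_168,
    "Journals": 37_478_577,
    "Magazines": 37_329_846,
    "Non-Fiction": 35_443_408,
    "Radio": 28_882_717,
    "Spoken": 27_934_806,
    "Documents": 19_017_236,
    "TV": 11_515_296,
}

total = sum(ngsl_subcorpora.values())
gsl_tokens = 2_500_000  # approximate size of the original GSL corpus

ratio = total / gsl_tokens
print(f"{total:,} tokens, about {ratio:.0f}x the original GSL corpus")
```

The sum reproduces the stated total of 273,613,534 tokens, and the ratio comes out at about 109, consistent with the "100x larger" note.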

20 Next steps…
Removed proper nouns
Removed numbers, days of the week, months of the year, etc.
Used statistical procedures to combine the frequencies from the various sub-corpora while adjusting for differences in their relative sizes
Had meetings with Paul Nation to review the list in relation to other frequency lists and add/delete words as deemed appropriate
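The slide does not say which statistical procedure was used to combine the sub-corpus frequencies. One common way to adjust for size differences, shown here purely as an assumed illustration, is to convert each raw count to a rate per million tokens and then average the rates so that every sub-corpus carries equal weight. The sub-corpus sizes come from the slides. The word counts are invented.

```python
def per_million(count, corpus_size):
    """Convert a raw count into occurrences per million tokens."""
    return count / corpus_size * 1_000_000

def combined_frequency(counts, sizes):
    """Average the per-million rates so each sub-corpus has equal weight.

    This is an assumed stand-in, not the procedure used for the NGSL.
    """
    rates = [per_million(counts.get(name, 0), size) for name, size in sizes.items()]
    return sum(rates) / len(rates)

# Sub-corpus sizes from the slides; the counts for one word are hypothetical
sizes = {"Fiction": 37_792_168, "Spoken": 27_934_806, "TV": 11_515_296}
counts = {"Fiction": 3_779, "Spoken": 5_587, "TV": 1_152}

pooled = per_million(sum(counts.values()), sum(sizes.values()))
print(f"equal-weight rate: {combined_frequency(counts, sizes):.1f} per million")
print(f"pooled rate:       {pooled:.1f} per million")
```

The pooled rate lets the largest sub-corpus dominate, which is the problem the Newspaper section caused in the preliminary analysis. The equal-weight average avoids that.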

21 Input from Paul Nation – Thanks!

22 Comparing the GSL and NGSL: Apples and Oranges?
Word Families or Lemmas?

23 Comparing the GSL and NGSL:
“To be or not to be, that is the question.”
10 Tokens: to, to, be, be, or, not, that, is, the, question
8 Types: to, be, or, not, that, is, the, question
7 Lemmas: to, be, or, not, that, the, question

24 Comparing the GSL and NGSL:
“To be or not to be, that is the question.”
Rank  Word      Tokens  Coverage
1     be        3       30%
2     to        2       20%
3     not       1       10%
3     or        1       10%
3     question  1       10%
3     that      1       10%
3     the       1       10%
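The counts in this table can be reproduced with a few lines of code. This is a minimal sketch: the one-entry lemma map covers only what this sentence needs ("is" counted under "be") and is not a general lemmatizer.

```python
import re
from collections import Counter

# Only mapping needed for this sentence (assumption: "is" is a form of "be")
LEMMA_OF = {"is": "be"}

sentence = "To be or not to be, that is the question."
tokens = re.findall(r"[a-z]+", sentence.lower())
types = set(tokens)
lemmas = Counter(LEMMA_OF.get(t, t) for t in tokens)

print(len(tokens), "tokens,", len(types), "types,", len(lemmas), "lemmas")

# Rank lemmas by frequency, then alphabetically, and show their coverage
for lemma, n in sorted(lemmas.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{lemma:<10}{n:>2}{n / len(tokens):>6.0%}")
```

Counting by lemma is what moves "be" to the top of the list: its three tokens (be, be, is) cover 30% of the sentence.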

25 Comparing the GSL and NGSL:
The assumption in Word Families is that if the headword is known, so are all derived forms…
ACCEPT: ACCEPTABILITY, ACCEPTABLE, UNACCEPTABLE, ACCEPTANCE, ACCEPTED, ACCEPTING, ACCEPTS

26 Comparing the GSL and NGSL:
But are they?

27 Comparing the GSL and NGSL:
THE WORD FAMILY APPROACH (Bauer and Nation, 1993)
Level 1: A different form is a different word. Capitalization is ignored.
Level 2: Regularly inflected words are part of the same family.
Level 3 (10 affixes): -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses.
Level 4 (10 affixes): -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, in-, all with restricted uses.

28 Comparing the GSL and NGSL:
Level 5 (48 affixes): -age (leakage), -al (arrival), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), anti- (anti-inflation), ante- (anteroom), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean).

29 Comparing the GSL and NGSL:
Level 6 (10 affixes): -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y
Level 7: Classical roots

30 Comparing the GSL and NGSL:
However, the GSL is not consistent in defining what to count as a word. “no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness” (West, 1953, page viii)
To get some consistency, Bauman and Culligan (1995) grouped the original GSL headwords using Level 4 affixes. Then they ranked the words according to frequencies from the Brown Corpus.
Subsequently, Nation released a word list with the program Range that grouped words up to Level 6 affixes, and also included numbers, days of the week, months of the year, and metric units of measurement.

31 Comparing the GSL and NGSL:
NGSL: A Modified Lexeme Approach
All inflected forms for all parts of speech plus the plural of the gerund
Includes both British & American spellings
Examples
accept: accepts, accepted, accepting, acceptings
acceptable: acceptables
paint: paints, painted, painting, paintings
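The difference from the word family approach can be sketched as a simple lookup from inflected form to headword. The table below contains only the three examples from this slide. The function name and the fallback for unlisted words are illustrative assumptions.

```python
# Headwords and their grouped forms, copied from the slide's examples
FORMS = {
    "accept": ["accepts", "accepted", "accepting", "acceptings"],
    "acceptable": ["acceptables"],
    "paint": ["paints", "painted", "painting", "paintings"],
}

# Invert the table: every form, and every headword itself, points to its headword
HEADWORD_OF = {form: head for head, forms in FORMS.items() for form in forms}
HEADWORD_OF.update({head: head for head in FORMS})

def headword(token):
    """Return the headword for a token; unlisted tokens are returned unchanged."""
    word = token.lower()
    return HEADWORD_OF.get(word, word)

for token in ["Accepted", "acceptable", "paintings", "acceptance"]:
    print(f"{token} -> {headword(token)}")
```

Note that "acceptable" stays a separate headword rather than being folded into "accept", which is where this approach departs from a word family grouping. "acceptance" is not in this small table, so the sketch simply returns it unchanged.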

32 Comparing the GSL and NGSL: Apples and Oranges no longer…
Level 6 on Range…
When both lists are lemmatized, the NGSL provides far more coverage with far fewer words, one of the chief goals of this project…

33 A Dedicated Website… www.newgeneralservicelist.org

34 List downloadable in many forms www.newgeneralservicelist.org
Headword list…

35 List downloadable in many forms www.newgeneralservicelist.org
Lemmatized list…

36 List downloadable in many forms www.newgeneralservicelist.org
List with definitions in easy English…

37 List downloadable in many forms www.newgeneralservicelist.org
List with raw data… (coming soon!)
SFI = Standard Frequency Index, from Carroll’s 1971 “Statistical Analysis of the Corpus”
SFI raw = Standard Frequency Index for the raw data
SFI adj = adjusted for dispersion (range)
U = adjusted frequency in parts per million
D = index of dispersion
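For readers unfamiliar with the index, Carroll’s Standard Frequency Index is usually stated as SFI = 10 × (log10(U) + 4), where U is the frequency per million tokens. The sketch below assumes that formulation. It does not reproduce how U itself is adjusted for dispersion, and the function name is illustrative.

```python
import math

def sfi(u_per_million):
    """Standard Frequency Index from a frequency per million tokens (U).

    Assumes the commonly cited form SFI = 10 * (log10(U) + 4).
    """
    if u_per_million <= 0:
        raise ValueError("U must be positive")
    return 10 * (math.log10(u_per_million) + 4)

# Each tenfold increase in frequency adds 10 points to the index
for u in (1, 10, 100, 1000):
    print(f"U = {u:>5} per million -> SFI = {sfi(u):.0f}")
```

On this scale a word occurring once per million tokens has an SFI of 40.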

38 Now available on free Quizlet Program… www.quizlet.com

40 Quizlet both intuitive and fun… www.quizlet.com

51 Soon to be available on WordEngine… www.wordengine.com

52 New Cambridge Text Series Using NGSL (both in text and online)

53 Links to NGSL Resources…

54 Free Graded Text Editor & Analysis Tool www.er-central.com/ogte/

56 Free Text Helper Tool identifies/gets meanings/gives learning tools for words out of your level…

57 Text Helper in Action…

61 The New General Service List: Celebrating 60 years of Vocabulary Learning
much more to come… Thank you! Dr. Charles Browne Professor of Applied Linguistics Meiji Gakuin University, Tokyo
