Dr. Charles Browne Professor of Applied Linguistics Meiji Gakuin University, Tokyo


1 Dr. Charles Browne Professor of Applied Linguistics Meiji Gakuin University, Tokyo browne@ltr.meijigakuin.ac.jp

2 A few current Corpus Projects… 1. Business English Word List for NHK TV Show in Japan 2. EnglishCentral (a HUGE video corpus of authentic English) 3. New General Service List (CEC) 4. New Academic Word List (CEC) 5. TOEIC Vocabulary Study List (using past test materials)

3 A few of my many online vocabulary learning projects…

4 EFL Vocabulary Learning in Japan… The Negative Effect of Test English
[Figure: word-frequency chart (axis: Frequency; HFW = high-frequency words). Example words shown: and, of, the, chaos, permission, exasperate, digress, abstain, emigrate, torment, sum, bid, ace. Numeric labels: 5,000; 600,000; 2,289; 2,566; 4,441; 14,641; 23,371; 25,537; 42,024; 84,168.]
PROBLEM: Students NEED to learn the first 5000 words of English to use English in the real world… But entrance exams and high school textbooks force students to memorize hundreds of low-frequency words…
RESULT? High school students can't deal with real-world English because they don't know hundreds of the most important high-frequency words…

5 When reading or listening to a text, students will of course not know many words… What percentage of words do you think must be known for them to be able to read easily? 50%? 75%? 85%? 95%?

6 75% Coverage 1000 high frequency words …another possible problem with _____ _____ is how to _____ learner _____ although research suggests that _____ are a very _____ way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ _____ of doing _____ _____. There is a _____ _____ in the _____ classroom of using games with a _____ purpose to increase and _____ learner _____ (Ersoz, 2000, Uberman 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982) [ 19 missing words ]

7 85% Coverage 2000 high frequency words …another possible problem with _____ _____ is how to _____ learner _____ although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ method of doing _____ _____. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner _____ (Ersoz, 2000, Uberman 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982) [ 13 missing words ]

8 95% Coverage 5000 high frequency words …another possible problem with vocabulary _____ is how to sustain learner motivation although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the sole method of doing vocabulary review. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner motivation (Ersoz, 2000, Uberman 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner affective filter (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982) [ 4 missing words ]

9 Vocabulary Thresholds:
Below 80% coverage, reading comprehension is almost impossible (Hu & Nation, 2001).
95% coverage is the point at which learners can read without the help of dictionaries (Laufer, 1989).
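The coverage figures on the preceding slides can be reproduced for any text and word list. The following is a minimal sketch, not the procedure used in the talk: the tokenizer, the function names (`coverage`, `reading_band`, `cloze`) and the wording of the middle band are assumptions made for illustration. Only the 80% and 95% thresholds come from the slide.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def coverage(text, known_words):
    """Fraction of running words (tokens) in text that appear in known_words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

def reading_band(cov):
    """Place a coverage ratio relative to the 80% and 95% thresholds."""
    if cov < 0.80:
        return "below 80%: comprehension almost impossible"
    if cov < 0.95:
        return "between 80% and 95%"
    return "95% or more: readable without a dictionary"

def cloze(text, known_words):
    """Replace every word not in known_words with a blank, as on slides 6 to 8."""
    def blank(match):
        word = match.group(0)
        return word if word.lower() in known_words else "_____"
    return re.sub(r"[A-Za-z']+", blank, text)

# Hypothetical mini word list, for demonstration only.
known = {"the", "cat", "on"}
sample = "The cat sat on the mat"
print(round(coverage(sample, known), 2))
print(reading_band(coverage(sample, known)))
print(cloze(sample, known))
```

Passing a real list (for example the first 1000, 2000 or 5000 headwords) as `known_words` gives the kind of gapped passage shown on the coverage slides.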

10 Goals of the NGSL Project…
1. To update and greatly expand the size of the corpus used (273 million words) compared to the limited corpus behind the original GSL (about 2.5 million words), with the hope of increasing the generalizability and validity of the list.
2. To create an NGSL of the most important high-frequency words useful for second language learners of English, which gives the highest possible coverage of English texts with the fewest words possible.
3. To make an NGSL that is based on a clearer definition of what constitutes a word.
4. To be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original Interim version of the GSL).

11 Original GSL in a nutshell…
West's 1953 GSL was actually a more fully developed version of Faucett's 1936 Interim Report on Vocabulary Selection (sponsored by the Carnegie Corporation).
Contributors included many famous linguists such as Thorndike, Horn, Maki, Palmer and West.
Based on a 2.5 million word hand-collected corpus (later increased to 5 million words).
Combined objective (frequency) and subjective (teacher intuition) criteria.
Approximately 2200 words giving about 80% coverage in general texts.
No systematic attempt to define what a word was: "no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness" (West, 1953, page viii).

12 General Service Lists GSL (West, 1953) http://jbauman.com/aboutgsl.html#1953

13 Academic Word List AWL (Coxhead 2000) http://www.victoria.ac.nz/lals/resources/academicwordlist/

14 Getting AWL/GSL lists w/definitions & sound files…
I made a few GSL/AWL apps and have made all the content available for free to teachers and researchers. Please contact me if you need any of the following for the GSL or AWL:
-Word lists
-Parts of speech
-Definitions in easy English
-Definitions in Japanese
-Sound files for pronunciation of words
browne@ltr.meijigakuin.ac.jp

15 Original GSL created in 1930s… 2.5 million word corpus may have had too many agriculture and religion texts?
AGRICULTURE: plow, mill, spade, cultivator
SEA TRAVEL: sailor, oar, vessel, merchant
RELIGION: kingdom, god, devil, mercy, bless, fellowship, preach, sacred, worship, holy, pray, heaven, grace, pupil, church, Lord
NOT AS IN USE? telegraph, chimney, coal, cottage, gaiety, shilling, headdress, saucer, woolen, amongst

16 Starting Point for NGSL…
Access to Cambridge's more modern 2 BILLION word corpus
CEC corpora used for preliminary analysis of NGSL

Corpus        Tokens
Newspaper       748,391,436
Academic        260,904,352
Learner          38,219,480
Fiction          37,792,168
Journals         37,478,577
Magazines        37,329,846
Non-Fiction      35,443,408
Radio            28,882,717
Spoken           27,934,806
Documents        19,017,236
TV               11,515,296
Total         1,282,909,322

17 Problems…
Newspaper subsection was too large and dominated the frequencies.
Newspaper subsection in CEC had too much of a bias towards financial terms.
Academic subcorpus of CEC was not really related to the needs of General English for second language learners.

18 Corpus Development & WYPIIWYGO….

19 Balancing the NGSL Corpus…
CEC corpora included in final analysis for NGSL

Corpus        Tokens
Learner          38,219,480
Fiction          37,792,168
Journals         37,478,577
Magazines        37,329,846
Non-Fiction      35,443,408
Radio            28,882,717
Spoken           27,934,806
Documents        19,017,236
TV               11,515,296
Total           273,613,534*

*273 million word subsection used is 100x larger than original GSL corpus…
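As a quick arithmetic check on the table above, the nine sub-corpus sizes can be summed and compared with the roughly 2.5 million words reported for the original GSL corpus. The token counts are taken from the slide; the variable names are just for this sketch.

```python
# Token counts for the sub-corpora kept in the final NGSL analysis (from the slide).
SUBCORPORA = {
    "Learner": 38_219_480,
    "Fiction": 37_792_168,
    "Journals": 37_478_577,
    "Magazines": 37_329_846,
    "Non-Fiction": 35_443_408,
    "Radio": 28_882_717,
    "Spoken": 27_934_806,
    "Documents": 19_017_236,
    "TV": 11_515_296,
}

# Size reported on the slides for the original GSL corpus (about 2.5 million words).
GSL_CORPUS_TOKENS = 2_500_000

total = sum(SUBCORPORA.values())
ratio = total / GSL_CORPUS_TOKENS

print(f"{total:,}")      # 273,613,534
print(round(ratio, 1))   # 109.4
```

The sum matches the stated total, and the ratio of about 109 is consistent with the "100x larger" remark.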

20 Next steps…
Removed proper nouns.
Removed numbers, days of the week, months of the year, etc.
Used statistical procedures to combine the frequencies from the various sub-corpora while adjusting for differences in their relative sizes.
Had meetings with Paul Nation to review the list in relation to other frequency lists and to add or delete words as deemed appropriate.
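The slides do not say which statistical procedure was used to combine the sub-corpus frequencies. One common way to adjust for differing corpus sizes, shown here purely as an assumed illustration, is to convert each raw count to a per-million-words rate and then average the rates so that every sub-corpus carries equal weight. The function names and the sample counts are hypothetical.

```python
def per_million(count, corpus_size):
    """Convert a raw count into occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

def combined_frequency(counts, sizes):
    """Average the per-million rates of one word across all sub-corpora.

    counts: raw count of the word in each sub-corpus (missing means 0)
    sizes:  token size of each sub-corpus
    Each sub-corpus contributes equally, regardless of its size.
    """
    rates = [per_million(counts.get(name, 0), size) for name, size in sizes.items()]
    return sum(rates) / len(rates)

# Hypothetical sizes and counts, for demonstration only.
sizes = {"small": 1_000_000, "large": 10_000_000}
counts = {"small": 100, "large": 100}

# Raw pooling would give 200 / 11,000,000, about 18 per million,
# while the size-adjusted average of 100 and 10 per million is 55.
print(combined_frequency(counts, sizes))  # 55.0
```

The design point is that without normalization a very large sub-corpus (as with the newspaper section mentioned earlier) would dominate the combined ranking.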

21 Input from Paul Nation – Thanks!

22 Comparing the GSL and NGSL: Apples and Oranges?

23 Comparing the GSL and NGSL:
"To be or not to be, that is the question."
10 Tokens: to, to, be, be, or, not, that, is, the, question
8 Types: to, be, or, not, that, is, the, question
7 Lemmas: to, be, or, not, that, the, question
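The three counts for the example sentence can be reproduced with a few lines of code. This is a small sketch: the `LEMMA_OF` mapping is a hand-written, hypothetical lookup that covers only this sentence (it maps "is" to "be"), not a real lemmatizer.

```python
import re

# Hypothetical lemma lookup covering only the example sentence.
LEMMA_OF = {"is": "be"}

def analyse(sentence):
    """Return the tokens (list), types (set) and lemmas (set) of a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())   # every running word
    types = set(tokens)                                # distinct word forms
    lemmas = {LEMMA_OF.get(t, t) for t in types}       # forms merged under base form
    return tokens, types, lemmas

tokens, types, lemmas = analyse("To be or not to be, that is the question.")
print(len(tokens), len(types), len(lemmas))  # 10 8 7
```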

24 Comparing the GSL and NGSL:

Rank  Word      Tokens  Coverage
1     be        3       30%
2     to        2       20%
3     not       1       10%
3     or        1       10%
3     question  1       10%
3     that      1       10%
3     the       1       10%
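The ranked table above follows from counting lemmas in the sentence "To be or not to be, that is the question." and dividing each count by the total number of tokens. The sketch below assumes a hand-written, hypothetical lemma lookup (`LEMMA_OF`) and gives tied words the same rank, as the slide does.

```python
import re
from collections import Counter

# Hypothetical lemma lookup covering only the example sentence.
LEMMA_OF = {"is": "be"}

def lemma_table(sentence):
    """Return rows of (rank, lemma, token count, coverage) sorted by frequency."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    counts = Counter(LEMMA_OF.get(t, t) for t in tokens)
    total = len(tokens)
    # Highest count first; ties are ordered alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    rank, previous = 0, None
    for position, (lemma, n) in enumerate(ordered, start=1):
        if n != previous:          # tied lemmas share the rank of the first one
            rank, previous = position, n
        rows.append((rank, lemma, n, n / total))
    return rows

rows = lemma_table("To be or not to be, that is the question.")
for rank, lemma, n, cov in rows:
    print(f"{rank}  {lemma:<9}{n}  {cov:.0%}")
```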

25 Comparing the GSL and NGSL:
The assumption in Word Families is that if the headword is known, so are all derived forms…
ACCEPT: ACCEPTABILITY, ACCEPTABLE, UNACCEPTABLE, ACCEPTANCE, ACCEPTED, ACCEPTING, ACCEPTS

26 Comparing the GSL and NGSL: But are they?

27 Comparing the GSL and NGSL: THE WORD FAMILY APPROACH (Bauer and Nation, 1993)
Level 1: A different form is a different word. Capitalization is ignored.
Level 2: Regularly inflected words are part of the same family.
Level 3 (10 affixes): -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses.
Level 4 (10 affixes): -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, in-, all with restricted uses.

28 Comparing the GSL and NGSL:
Level 5 (48 affixes): -age (leakage), -al (arrival), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), anti- (anti-inflation), ante- (anteroom), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean).

29 Comparing the GSL and NGSL:
Level 6 (10 affixes): -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y
Level 7: Classical roots

30 Comparing the GSL and NGSL:
However, the GSL is not consistent in defining what to count as a word: "no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness" (West, 1953, page viii).
To get some consistency, Bauman and Culligan (1995) grouped the original GSL headwords using Level 4 affixes. Then they ranked the words according to frequencies from the Brown Corpus.
Subsequently, Nation released a word list with the program Range that grouped words up to Level 6 affixes, and also included numbers, days of the week, months of the year, and metric units of measurement.

31 Comparing the GSL and NGSL:
NGSL: A Modified Lexeme Approach
All inflected forms for all parts of speech, plus the plural of the gerund.
Includes both British & American spellings.
Examples:
–accept: accepts, accepted, accepting, acceptings
–acceptable: acceptables
–paint: paints, painted, painting, paintings
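The grouping shown in these examples can be sketched as a lookup from each inflected form to its headword. The table below contains only the three examples from the slide; the function names are hypothetical, and a real list would cover every NGSL headword. Note that "acceptable" stays a separate headword from "accept", unlike in the word family approach.

```python
from collections import Counter

# Headword -> inflected forms, copied from the slide's examples only.
FORMS = {
    "accept": ["accepts", "accepted", "accepting", "acceptings"],
    "acceptable": ["acceptables"],
    "paint": ["paints", "painted", "painting", "paintings"],
}

def build_lookup(forms):
    """Map every headword and every inflected form to its headword."""
    lookup = {}
    for headword, inflections in forms.items():
        lookup[headword] = headword
        for form in inflections:
            lookup[form] = headword
    return lookup

def headword_counts(tokens, lookup):
    """Count tokens per headword; unknown words count as themselves."""
    counts = Counter()
    for token in tokens:
        key = token.lower()
        counts[lookup.get(key, key)] += 1
    return counts

LOOKUP = build_lookup(FORMS)
print(headword_counts(["Accepted", "accepts", "acceptable", "paintings"], LOOKUP))
```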

32 Comparing the GSL and NGSL: Apples and Oranges no longer… When both lists are lemmatized, the NGSL provides far more coverage with far fewer words, one of the chief goals of this project…

33 A Dedicated Website… www.newgeneralservicelist.org

34 List downloadable in many forms www.newgeneralservicelist.org Headword list…

35 List downloadable in many forms www.newgeneralservicelist.org Lemmatized list…

36 List downloadable in many forms www.newgeneralservicelist.org List with definitions in easy English…

37 List downloadable in many forms www.newgeneralservicelist.org List with raw data… (coming soon!)

38 Now available on free Quizlet Program… www.quizlet.com

40 Quizlet both intuitive and fun… www.quizlet.com

51 Soon to be available on WordEngine… www.wordengine.com

52 New Cambridge Text Series Using NGSL (both in text and online)

53 Links to NGSL Resources…

54 Free Graded Text Editor & Analysis Tool www.er-central.com/ogte/

56 Free Text Helper Tool: identifies words outside your level, gets their meanings, and gives learning tools for them…

57 Text Helper in Action…

61 Dr. Charles Browne Professor of Applied Linguistics Meiji Gakuin University, Tokyo browne@ltr.meijigakuin.ac.jp much more to come… Thank you!