Resources for Using Corpus Linguistics in ELT Kenji Kitao Doshisha University Kyoto, Japan S. Kathleen Kitao Doshisha Women’s College Kyoto, Japan.


1 Resources for Using Corpus Linguistics in ELT Kenji Kitao Doshisha University Kyoto, Japan S. Kathleen Kitao Doshisha Women’s College Kyoto, Japan

2 I. Presentation A. Corpus linguistics and corpus-related resources B. Online resources for corpus linguistics 1. Types of resources 2. Examples of resources C. Using corpus-related resources for language teaching

3 II. Application A. Assigned tasks B. Free exploration

4 Presentation Definitions Corpus (Latin for “body”) A text or collection of texts Now generally used to refer to machine-readable texts

5 Corpus linguistics The use of empirical data from a corpus to study language and to find patterns in actual language use

6 Requirements A corpus Can be a single text or a large collection of texts Larger corpora provide more reliable results, if the purpose is making generalizations about language use

7 Balanced corpora A variety of genres, including academic writing, newspapers, fiction, and spoken language

8 Specialized corpora Examples Academic writing Texts by learners of English, sometimes with a specific native language Teachers can develop their own corpora Newspaper articles Learners’ texts

9 Corpus analysis tool(s) Types Tools with specific corpora Tools that can be used with any text or collection of texts General Word, Excel, etc. Specialized Count words Find examples of specific words or parts of speech Analyze word frequencies Evaluate readability

10 Online Corpora Free to all users Available for a fee or for purchase Available only to restricted users In this presentation, we will only introduce resources that are free.

11 Using Corpus Linguistics for Language Teaching Technology has become widespread and accessible Larger, more powerful computers that can analyze large amounts of data quickly are available Many corpus-related resources have become available Language teachers and learners can use corpora

12 Corpus-related Internet resources 1. General resources on corpus linguistics 2. Vocabulary frequency lists and frequency level checkers 3. Online corpora, concordancers and other text-analysis software 4. E-texts 5. Information about using corpus linguistics for language teaching

13 Resources for Corpus Linguistics http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm

14 1. General resources on corpus linguistics Web sites that help orient users to corpora and to what is available online for teachers to use in the classroom or in preparing material

15 The Compleat Lexical Tutor http://www.lextutor.ca/ Resources for data-driven learning, including concordancers for various corpora and concordancers into which one can enter texts Tutorials, resources for teachers, resources for research

16 Bookmarks for Corpus Linguists http://devoted.to/corpora/ extensive annotated list of links related to corpus linguistics, including software tools frequency lists papers and articles English and non-English corpora

17 2. Vocabulary frequency lists, frequency level checkers, and n-gram extractors Frequency lists Words used most frequently in English and thus words that are most useful for students to know Often divided into sublists
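A frequency list can be approximated for any text with a few lines of code. The following is a minimal sketch, assuming a simple letters-only definition of a word; the function name and the sample sentence are made up for illustration and do not come from any of the tools listed here.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)

# Hypothetical sample text for illustration.
sample = "The cat sat on the mat. The mat was flat."
freq = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in freq.most_common(3):
    print(word, count)
```

Published frequency lists are built from very large corpora and usually group words into families, which this sketch does not attempt.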

18 Specialized word lists Academic Word List http://www.nottingham.ac.uk/~alzsh3/acvocab/index.htm List includes 570 headwords with their word families Site includes an explanation of the word lists, the words in each sublist, suggestions for using the list, and a gapmaker that can be used to produce gap-filling exercises

19 5000 Vocabulary List for Visiting Scholars in the USA http://www.paulnoll.com/Books/5000-Words/index.html A list of the 5000 words determined by the Chinese Academy of Sciences for scholars who need to go abroad for research or advanced studies in the USA. The words are listed in alphabetical order and have sample sentences and examples. There are an additional three thousand words.

20 Frequency-level checkers Produce a list of words at each level of difficulty Help a teacher understand how difficult the vocabulary in a reading passage is and which words students at different levels of proficiency might need to learn N-gram finders Find groups of n words

21 JACET 8000 Word List http://www01.tcp-ip.or.jp/~shin/j8web/j8web.cgi On this web page, you can enter a text and get a list of the words that appear in the text at each of the eight levels of the JACET list. You also get statistics about what percentage of the words (both types and tokens) occur at each of the eight levels.
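The idea behind a frequency-level checker can be sketched as follows. This is only an illustration: the two-level word list below is invented and is not the JACET 8000 data, and the function name `level_profile` is hypothetical. Tokens are all running words; types are distinct words.

```python
import re
from collections import Counter

# Hypothetical two-level word list for illustration only (not the JACET 8000 list).
LEVELS = {1: {"the", "a", "is", "on", "cat"}, 2: {"mat", "sat"}}

def level_profile(text, levels):
    """Return the percentage of tokens and types found at each level."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    types = set(tokens)

    def level_of(word):
        # Return the first level whose list contains the word, else "off-list".
        for level, words in levels.items():
            if word in words:
                return level
        return "off-list"

    token_counts = Counter(level_of(t) for t in tokens)
    type_counts = Counter(level_of(t) for t in types)
    return {
        level: {
            "token_pct": 100 * token_counts[level] / len(tokens),
            "type_pct": 100 * type_counts[level] / len(types),
        }
        for level in token_counts
    }

profile = level_profile("The cat sat on the mat", LEVELS)
print(profile)
```

In the sample sentence, "the" counts twice as a token but only once as a type, which is why the two percentages differ.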

22 N-gram finders Online text analysis tool http://www.online-utility.org/text/analyzer.jsp Finds most frequent groups of 2 and 3 words, plus produces a list of all the words, their occurrences, and their percentages
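What an n-gram finder does can be shown with a short sketch. It assumes a simple letters-only tokenizer and ignores sentence boundaries, so an n-gram may span two sentences; the function name and sample text are hypothetical.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count every sequence of n consecutive words in the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # zip over n shifted copies of the token list to form the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical sample text for illustration.
sample = "on the other hand, on the one hand"
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)

for gram, count in bigrams.most_common(3):
    print(gram, count)
```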

23 Advanced Search – Explore N-grams from the BNC http://pie.usna.edu/explore.html Produces lists of n-grams, based on the number of words and occurrences you specify N-gram phrase extractor http://www.er.uqam.ca/nobel/r21270/cgi-bin/tuples/u_extract.html Produces a KWIC list of n-grams

24 3. Online corpora, concordancers, and other text-analysis software Concordancers A type of software for searching corpora Produces a list of key words in context (KWIC), that is, search terms with the words that come before and after them. May be able to search for parts of speech, e.g., take, followed by a preposition May be able to search for two words that are not next to each other
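A KWIC list can be illustrated with a minimal sketch. It assumes a fixed window of words on each side of the search term; the function name `kwic`, the `width` parameter, and the sample text are hypothetical, and real concordancers offer far more (sorting, part-of-speech search, and so on).

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text for illustration.
sample = "We took a trip to Kyoto. The trip was short."
for left, key, right in kwic(sample, "trip", width=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>12}  {key}  {right}")
```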

25 Corpora (or parts of corpora) may have spoken language, written language, American English, British English, academic English, and so on. Specialized corpora include: parallel corpora, which have the same texts in different languages (to compare the same passages in different languages) learner corpora, which have students’ writing/speaking (to help identify learners’ problems or to study characteristics of their writing)

26 Examples of concordancers Turbo Lingo http://www.staff.amu.edu.pl/~sipkadan/lingo.htm Can enter a text or URL and get a list of KWIC, average sentence length, word frequency list, and other analyses

27 VIEW (Variation in English Words and Phrases) http://view.byu.edu/ Concordancing tool for the British National Corpus, the Corpus of Contemporary American English, and a Time magazine corpus, plus non-English corpora

28 A powerful concordancing tool Has a useful tutorial Click on what you want to do to see samples of searches For example, if you want to learn to use wildcards, click on that word, and you will see several examples. You choose the type of search you want to do, and the search is automatically filled in. You can revise it based on what you want to do.

29 Types of searches Search by exact word, exact phrase, wildcard, or part of speech For example, mysterious Use ? or * as a wildcard For example, * point * Search for an exact word plus a part of speech For example, white [n*]
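The logic of a wildcard search within a word can be sketched with regular expressions. This is an illustration only and not the search engine of the tool described here: it assumes that `*` stands for any run of word characters and `?` for exactly one, and it does not handle part-of-speech tags such as [n*]. The function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled whole-word regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")   # any run of word characters, possibly empty
        elif ch == "?":
            parts.append(r"\w")    # exactly one word character
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def wildcard_search(text, pattern):
    """Return every word in the text that matches the wildcard pattern."""
    return wildcard_to_regex(pattern).findall(text)

# Hypothetical sample text for illustration.
sample = "pointed remarks and a pointless appointment"
print(wildcard_search(sample, "point*"))
print(wildcard_search(sample, "*point*"))
```

With this sample, `point*` matches only the words that begin with "point", while `*point*` also picks up "appointment".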

30 Compare usage of semantically related words {sheer/total} [n*] Search for surrounding words Nouns that follow the verb “wrap” Limit the search to one register Adjectives in tabloid newspapers

31 Compare usage between registers, e.g., news and speaking we [verb] that: ACAD vs SPOKEN Find words with similar, more general, or more specific meanings Similar words to “small” More general than “shriek” More specific than “woman”

32 BNCweb To log in, go to: http://bncweb.lancs.ac.uk/bncwebSignup/ For information, go to: http://bncweb.info

33 On BNCweb, you can do simple searches, or you can restrict your search to written or spoken texts or to a particular type of text. You can also form your own subcorpora.

34 Make frequency lists based on criteria you specify For example, make a frequency list of all adverbs that end in –ly in spoken texts. Look at your query history and save queries to use again.

35 See your results in a sentence view or a KWIC view. Get a list of collocates, with statistics about their frequency. Get information about what type of texts the search term was found in.
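A list of collocates with raw frequencies can be sketched as below. This is a simplified illustration, assuming a fixed window of words on each side of the search term and plain counts; it does not compute the association statistics that a full tool may report. The function name, the `window` parameter, and the sample text are hypothetical.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count the words that occur within `window` words of each hit of `node`."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Words to the left and to the right of this occurrence.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Hypothetical sample text for illustration.
sample = "Heavy rain fell. Light rain fell later. Heavy rain again."
rain_collocates = collocates(sample, "rain", window=1)
for word, count in rain_collocates.most_common():
    print(word, count)
```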

36 Online concordancer http://www.lextutor.ca/concordancers/concord_e.html Can search a variety of corpora, including the Brown Corpus, the British National Corpus (written and spoken), a learner corpus, etc. Produces a KWIC list for a given word and a list of collocates and their frequency

37 WebCorp http://www.webcorp.org.uk/ Uses the Internet as a corpus and produces KWIC as well as providing other information

38 Comparing two texts Text Lex Compare http://www.lextutor.ca/text_lex_compare/ Allows users to enter two texts and get lists of: Words unique to the first text Words shared by the two texts Words unique to the second text Useful for helping a teacher find the new words in a new text
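The three lists correspond to simple set operations on the vocabulary of each text. The following is a minimal sketch under that assumption; the function name and the two sample texts are hypothetical, and the sketch compares exact word forms rather than word families.

```python
import re

def compare_vocab(first, second):
    """Return the words unique to each text and the words they share."""
    def vocab(text):
        return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))

    a, b = vocab(first), vocab(second)
    return {
        "only_first": a - b,    # words found only in the first text
        "shared": a & b,        # words found in both texts
        "only_second": b - a,   # words found only in the second text
    }

# Hypothetical sample texts for illustration.
result = compare_vocab("the cat sat", "the dog sat")
for name, words in result.items():
    print(name, sorted(words))
```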

39 Specialized corpora (a few examples) Spoken English Corpus swb (American English telephone conversations) http://www.ldc.upenn.edu/cgi-bin/lol/swb/speechcorpus?&corpus=swb Technical English e-Xplore Technical English https://learn.sz.htwk-leipzig.de/wc/main.php

40 Parallel corpora CRATER Multilingual Aligned Annotated Corpus http://www.comp.lancs.ac.uk/linguistics/crater/corpus.html Academic English Michigan Corpus of Academic Spoken English http://quod.lib.umich.edu/m/micase/ Some large corpora also have sub-corpora of academic English

41 Online software to assess readability Tests of document readability and suggestions for how to improve readability http://www.online-utility.org/english/readability_test_and_improve.jsp Can analyze texts of any length (some online text analysis programs have limits)

42 Can enter the text directly or enter a URL e.g., http://www.cis.doshisha.ac.jp/kkitao/Japan/shimoda/s1.htm Provides statistics: Number of characters Number of words Number of sentences Number of syllables/word Number of words/sentence

43 Calculates readability indexes, including Gunning Fog Index Coleman-Liau Index Flesch-Kincaid Grade Level Flesch Reading Ease Lists sentences that might be rewritten to improve readability.
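Three of these indexes can be computed directly from the kinds of statistics such a tool reports. The sketch below uses the standard published formulas; whether the online tool implements them in exactly this way is an assumption, and the function names and input numbers are hypothetical.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

def coleman_liau(letters, words, sentences):
    """Coleman-Liau Index, based on letters and sentences per 100 words."""
    letters_per_100 = 100 * letters / words
    sentences_per_100 = 100 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

# Hypothetical statistics for a passage: 10 words per sentence, 1.5 syllables per word,
# 450 letters and 5 sentences in 100 words.
print(round(flesch_reading_ease(10, 1.5), 2))
print(round(flesch_kincaid_grade(10, 1.5), 2))
print(round(coleman_liau(450, 100, 5), 2))
```

Counting syllables automatically is itself only approximate in English, so scores from different tools may differ slightly for the same text.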

44 4. E-texts In some cases, teachers or students may want to develop their own corpora. There are large numbers of e-texts available. Project Gutenberg http://www.gutenberg.org/wiki/Main_Page Large collection of downloadable fiction and non-fiction

45 Internet Public Library: Online Texts http://www.ipl.org/div/subject/browse/hum60.60.00/ A large number of online texts on a wide variety of subjects Drew’s Script-o-Rama http://www.script-o-rama.com/oldindex.shtml A website with a large number of scripts of movies and TV programs American Rhetoric Online Speech Bank http://www.americanrhetoric.com/speechbank.htm A website with a large collection of speeches

46 5. Information about using corpus linguistics for language teaching Corpus-related websites specifically for language teachers Learner corpora and SLA Research http://leo.meikai.ac.jp/%7Etono/ Links to learner corpora made up of language produced by speakers of various languages, links to useful tools, a bibliography, and so on

47 Corpus linguistics: What it is and how it can be applied to teaching http://iteslj.org/Articles/Krieger-Corpus.html An article about corpus linguistics and how it can be used in the language classroom

48 Classroom Application Two types of uses of corpus-related resources “Low contact” uses – teacher uses resources to help in teaching, e.g., to find the difficult words in a reading passage; students do not actually see the corpus “High contact” uses – students use the corpora themselves to learn about language, e.g., to find out which adjectives collocate with “rain”

49 “Data-driven learning” is a high contact use of corpus-related resources. Using corpora to deduce rules of grammar or usage, e.g., to determine if a word’s connotation is positive or negative Advantages of data-driven learning Focus on authentic language Encouragement of students to deduce Real, exploratory activities rather than drills A learner-centered activity

50 Web sites with suggestions for data-driven learning activities How to use concordances in teaching English: Some suggestions http://www.nsknet.or.jp/%7Epeterr-s/concordancing/usingconcs.html

51 Data-Driven Learning (DDL): the idea http://www.ecml.at/projects/voll/rationale_and_help/booklets/resources/menu_booklet_ddl.htm An explanation of DDL, with examples

52 Activities Use a corpus to check grammar http://www.lextutor.ca/grammar_tester/ Use the concordancer in the bottom frame to check the grammar of the sample sentences in the top half

53 Use a concordancer to make a gap-filler or a quiz http://www.lextutor.ca/multi_conc/ http://www.nottingham.ac.uk/~alzsh3/acvocab/awlgapmaker.htm

54 Find examples of a word and group them according to meaning Examples (http://www.lextutor.ca/concordancers/concord_e.html) party run

55 Use the results of a KWIC search to determine how synonyms are used differently Examples http://www.lextutor.ca/concordancers/concord_e.html travel, journey, trip, voyage, tour confident, fearless, pushy, upbeat, self-reliant

56 Use the Academic Word List web page to enter a text and make a gap-filling activity http://www.nottingham.ac.uk/~alzsh3/acvocab/awlgapmaker.htm
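How a gap-filling activity is produced from a text and a word list can be sketched as follows. This is an illustration only and not the gapmaker on the web page above: the function name, the blank string, and the three target words are hypothetical, and the sketch blanks exact word forms rather than whole word families.

```python
import re

def make_gaps(text, targets, blank="_____"):
    """Replace each target word in the text with a blank; return text and answers."""
    targets = {t.lower() for t in targets}
    answers = []

    def replace_word(match):
        word = match.group(0)
        if word.lower() in targets:
            answers.append(word)   # keep the removed word for the answer key
            return blank
        return word

    gapped = re.sub(r"[A-Za-z]+", replace_word, text)
    return gapped, answers

# Hypothetical target words and sentence for illustration.
gapped, answers = make_gaps(
    "We analyze the data and assess the method.",
    ["analyze", "data", "method"],
)
print(gapped)
print(answers)
```

The answer list is returned in the order the words appear, so it can serve as an answer key or be shuffled into a word bank for students.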

57 Resources for Corpus Linguistics http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm Thank you

