Building and analysing your own corpus 1. Building a corpus.


1 Building and analysing your own corpus 1. Building a corpus.

2 Why bother with corpora? “Language users cannot accurately report language usage, even their own” (Sinclair, 1987). “Using a language is a skill that most people are not conscious of; they cannot examine it in detail, but simply use it to communicate” (Sinclair, 1995). “There are many facts about language that cannot be discovered by just thinking about it, or even reading and listening very intently” (Sinclair, 1995). “As language teachers and professionals, we often have strong intuitions about language use… Corpus-based research, however, shows us that our intuitions are often completely wrong.” (Biber, 2005)

3 There are many free online corpora like COCA or COHA, but you could also build your own corpus.

4 1. Building a corpus. You can collect data from a variety of sources, but the most important thing to remember is that you need to save it in plain text (.txt) format. It also needs to be fairly big to make the corpus analysis worthwhile (I would recommend at least 100,000 tokens).
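To check whether a folder of .txt files has reached the suggested 100,000 tokens, a rough count can be scripted. This is a minimal sketch, not part of the original slides: it assumes that a token is any run of letters, digits or apostrophes (AntConc's own token definition may give slightly different totals), and the function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: a "token" is a run of letters, digits or apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def count_tokens(text: str) -> int:
    """Return the number of tokens in a string."""
    return len(TOKEN_RE.findall(text))


def corpus_size(folder: str) -> int:
    """Sum the token counts of every .txt file directly inside a folder."""
    total = 0
    for path in Path(folder).glob("*.txt"):
        # errors="ignore" skips bytes that are not valid UTF-8
        total += count_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total
```

Calling `corpus_size("my_corpus")` (a hypothetical folder name) returns the total for that folder, which can then be compared with the 100,000-token guideline.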

5 White House briefings: transcripts of the press conferences. http://www.whitehouse.gov/briefing-room/press-briefings

6 The Brown family Part of the Brown family of corpora (which includes Brown, Frown, LOB, FLOB and BE06) http://www.Helsinki.fi/varieng/CoRD/corpora/index.html

7 International Corpus of English (ICE) Twenty-four research teams are preparing electronic corpora of their own national or regional variety of English for comparative purposes (e.g. Indian English, Australian English, South African English) http://ice-corpora.net/ice/index.htm

8 Corpora galore: Learner corpora; Courtroom discourse; Academic English; Specialised small corpora: RIP, Sex education

9 UK parliamentary discourse http://www.publications.parliament.uk/pa/cm/cmhansrd.htm Select committees: http://www.parliament.uk/business/committees/committees-a-z/commons-select/

10 Where you collect your data from will depend on the type of corpus that you need to create, but the principles remain the same.

11 Downloading a newspaper 1. Go to a database like LexisNexis or Westlaw. 2. Check which newspapers are available (be careful, sometimes they lie: Westlaw claims to have Corriere but actually has only the articles that have been translated into English).

12 Downloading a paper 3. Choose the newspaper that you want 4. Use the name of the newspaper as the search term

13 Downloading a paper 5. If there is an option to remove duplicates, select it. 6. Choose one day only for the date range.

14 Downloading a paper 7. Download that day's articles and save as txt. To download the articles, click on the save icon on the right of the screen which will open another window. Make sure that you download all the articles in text format.

15 Downloading a paper 8. If there are more than 500 articles for one day then you will have to download them in batches, 1-500 and then 501-1000 (or whatever the maximum is). Click on the link to open and save your new file. 9. Remember to save the file with a sensible name that includes the paper and date, e.g. GUA20121015 (for the Guardian from 15 Oct 2012).
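The naming convention in step 9 (a paper code followed by the date as YYYYMMDD) can be generated automatically so that names stay consistent across many downloads. This is a minimal sketch: the function name and the optional `part` suffix for days split into several batches are assumptions added for illustration, not part of the original slides.

```python
from datetime import date


def corpus_filename(paper_code: str, day: date, part: int = 0) -> str:
    """Build a name like GUA20121015.txt from a paper code and a date.

    `part` is a hypothetical extension for days that had to be
    downloaded in several batches (e.g. articles 1-500 and 501-1000).
    """
    name = f"{paper_code.upper()}{day:%Y%m%d}"
    if part:
        name += f"_{part}"
    return name + ".txt"
```

For example, `corpus_filename("gua", date(2012, 10, 15))` gives `GUA20121015.txt`, and passing `part=2` gives `GUA20121015_2.txt` for a second batch from the same day.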

16 c. Building a corpus of fiction language from Project Gutenberg Project Gutenberg contains about 30,000 books which are no longer bound by copyright restrictions. This could be very useful if you wanted to look at different time periods, or different genres, e.g. children’s writing. Go to http://www.gutenberg.org

17 Building a corpus of fiction language from Project Gutenberg Think of a book you would like to download, for instance The Princess and the Goblin. Type the title of the book into the search box on the left. Scroll down the page to select a text-only format and click to open. The text file will open within your browser.

18 Building a corpus of fiction language from Project Gutenberg Copy and paste into either WordPad (look under ‘programs’ then ‘accessories’) or a Word document. Remember to save as .txt. There is a large introductory section at the beginning of the file which could skew your results. In order to tell AntConc to ignore this you will have to enclose it in angle brackets.
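As an alternative to enclosing the introductory section in angle brackets by hand, the extra material can be cut out before saving. This is a minimal sketch of that different technique, not the method described on the slide. It assumes the file contains marker lines beginning `*** START OF` and `*** END OF`, as many Project Gutenberg plain-text files do; check your own file, because the wording of the markers varies. The function name is made up for illustration.

```python
import re

# Assumption: the file contains marker lines such as
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" and
# "*** END OF THE PROJECT GUTENBERG EBOOK ... ***".
START_RE = re.compile(r"^\*\*\*\s*START OF .*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\*\s*END OF .*$", re.MULTILINE)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(text)
    return text[begin:finish].strip()
```

Any preface or contents page that belongs to the book itself sits after the start marker, so it is not removed and may still need handling by hand.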

19 Building a corpus of fiction language from Project Gutenberg Save your document as text with a sensible name, e.g. ‘PrincessGoblin’, and make sure it is saved somewhere that you can find it easily.

20 Class task Academic English: research paper introductions. Go to the CL 2015 abstract book. Copy the introduction paragraph of each abstract and save it in text format. You need to decide how to label the files. Collect as many as you can.

21 Introduction to research What phraseologies can you discover from a corpus of introductions to research papers? You can use this to help you write your own abstract for your project.
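One simple way to look for recurring phraseology is to count the most frequent word sequences (n-grams) across the collected introductions. This is a minimal sketch, assuming lower-cased tokens made of letters, digits and apostrophes. The function names are made up for illustration, and a concordancer such as AntConc offers a comparable n-gram view without any scripting.

```python
import re
from collections import Counter

# Assumption: tokens are lower-cased runs of letters, digits or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def ngrams(text: str, n: int = 3) -> list:
    """Return every sequence of n consecutive tokens as a string."""
    tokens = TOKEN_RE.findall(text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n: int = 3, k: int = 10) -> list:
    """Count n-grams over several texts and return the k most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text, n))
    return counts.most_common(k)
```

Passing a list of introduction paragraphs to `top_ngrams` returns (n-gram, count) pairs. Sequences that recur across many texts are candidates for the stock phrases of the genre.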

