Using Corpus Resources in English Language Teaching Hilary Nesi Coventry University UK.

1 Using Corpus Resources in English Language Teaching Hilary Nesi Coventry University UK

2 First, a bit of history behind the “corpus revolution”……..

3 Two simultaneous linguistic traditions Chomsky’s Syntactic Structures 1957 Firth’s A Synopsis of Linguistic Theory 1957

4 The American Linguistic Tradition A branch of cognitive psychology – a sort of biology Data can be derived intuitively and presented in isolated sentences The theory does not have social relevance The goal is to develop a system of abstract principles to account for everything that humans know about language ‘in advance of experience’

5 The British Linguistic Tradition A branch of social science Data must be studied in the context of whole texts - attested, authentic instances of use The theory helps solve practical problems of language and society The goal is to study meaning. Language is social interaction, and transmits culture.

6 Competence and Performance ‘Competence’ = knowledge of the language system ‘Performance’ = language that is actually produced

7 Performance contains lots of human errors For example, lecturers say things like: we'll local government will have had a dialogue with their communities and produced a s-,er agenda twent-, a Local Agenda twenty-one nineteen-ninety-six (BASE corpus sslct012)

8 And writers make mistakes with spelling etc… On the other hand, diphtheria toxin, which inhibits protein synthesis in eukaryotic ribosomes, has no affect on the protein synthesis (BAWE corpus 0006c)

9 So is our own competence a better guide to the language system? We can invent perfectly formed sentences, even though we all make slips, false starts, spelling mistakes etc. when we use the language…..

10 However.. There are lots of aspects of language use we can’t invent because we are not consciously aware of them…….

11 Fluent language users are usually quite good at identifying: whether a sentence is correctly structured whether a word is frequent or rare, formal or informal the collocations and connotations of words

12 But they are not very good at identifying: the relative frequency of different grammatical structures small differences in meaning between different structures whether a word is likely to occur in some sorts of contexts but not others

13 For example, Which is used more frequently, the simple or the progressive form of the present tense?

14 What is the difference in meaning between ‘take a job’ and ‘take on a job’?

15 Why does it sound odd to say: ‘I am deeply happy’?

16 Most people can’t answer these sorts of questions intuitively That means that most teachers would not be able to provide this sort of information “off the cuff” It also means that the answers are not in reference books compiled by experts who rely on competence rather than performance

17 The old way of compiling dictionaries and grammar books….. Compilers relied on: their own intuition examples of unusual usage that they had noticed (usually while reading)

18 The new way of compiling dictionaries and grammar books….. Compilers look at: Thousands of ordinary examples of language use – i.e. they examine performance, rather than relying on competence

19 This is the “corpus revolution” The thousands of ordinary examples of language use can be found in a corpus Nowadays most new dictionaries, pedagogical grammars, descriptive grammars and textbooks are corpus-based

21 These books are based on corpora such as: The British National Corpus (100 million words) The Bank of English – 524 million words Started in 1991, but new data is acquired continuously from websites, newspapers etc.

22 A corpus is…. “a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research.” (Sinclair 2005) http://ahds.ac.uk/linguistic-corpora/

23 “text in electronic form” so that large amounts of text can be consulted very quickly, using concordancing software – it used to take years to count the number of times a word occurred in a book – now it only takes a second to find it in a thousand books.
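What concordancing software does with electronic text can be sketched in a few lines of Python. This is only a minimal illustration, not the tool used for the corpora in this talk: the function name `kwic`, the `width` parameter and the sample sentences are assumptions made up for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of keyword."""
    lines = []
    # \b keeps "job" from matching inside "jobs"; the search ignores case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text, invented for this sketch.
sample = (
    "He took a job with the Ministry. She may take a job in Brussels. "
    "Nobody wanted to take on the job of compiling the list."
)
for line in kwic(sample, "job", width=20):
    print(line)
```

Each output line shows one occurrence of the search word with a fixed amount of context on either side, which is the layout of the concordance lines on the later slides.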

24 Corpus analysis can reveal: the relative frequency of different grammatical structures – e.g. it goes is very common, it had been going is very rare
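A rough way to compare the frequency of two structures is to count matches of a search pattern and normalise the count per million words, so that corpora of different sizes can be compared. The sketch below assumes plain-text input and simple regular-expression patterns; the function name `per_million` and the toy text are invented for illustration, and a real study would normally search a grammatically tagged corpus instead.

```python
import re

def per_million(text, pattern):
    """Occurrences of a regex pattern per million word tokens in text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    if not tokens:
        return 0.0
    hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits * 1_000_000 / len(tokens)

# Toy text standing in for a corpus (hypothetical, far too small for real use).
toy = "It goes well. It goes on and on. It had been going badly."

simple = per_million(toy, r"\bit goes\b")
past_perfect_progressive = per_million(toy, r"\bit had been going\b")
print(simple > past_perfect_progressive)
```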

25 Corpus analysis can reveal: small differences in meaning between different structures - e.g. take a job and take on a job

26 take a job
1. to pay off, she cannot now take a job paying less than pounds 12,000 a
2. river. He is now leaving to take a job in Brussels as a European
3. a kitchen assistant before taking a job as a pizza delivery driver 18
4. x years. Three years ago I took a part-time job and have received my tax
5. eir boy to be a lawyer. He took a job with the Ministry of the Interior but
6. e neuroses.' At 16, Moore took a summer job working on the chassis line
7. moving to New York, she took a modelling job and, while doing an ad for
8. block any move for him to take another job in football." Little would see a

27 take on a job
1. Whitbread is strong. Why take on the job of scrapping excess
2. ays be people unwilling to take on the stressful job-loads most
3. A group of students could take on the job of compiling the electoral
4. teaching qualification to take on a demanding job from which you
5. oes not improve when he takes on the job of defending Boston's
6. pounds 200,000. Now he takes on an unpaid job for an organisat
7. He's fat, he's 53 and he's taking on a stress-loaded job. He may be
8. ated plants, while women took on the job of grain preparation.‘

28 What’s the difference? Take a job occurs in contexts which state what sort of job it is; it means to take employment. Take on a job means to assume responsibility for a task, paid or unpaid.

29 Corpus analysis can reveal: whether a word is likely to occur in some sorts of contexts but not others - i.e. what is wrong with ‘I am deeply happy’?

30 ‘I am deeply happy today’
i am deeply honored today
Hume was deeply worried about the view
we are so deeply indebted to
we have always been deeply grateful for the
that can be deeply offensive to people
to express very deeply held feelings of vulnerability
what accounts for stability in deeply divided societies?
horrifying and reprehensible, but also deeply puzzling
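The pattern in these lines can be checked by counting which words follow "deeply". The sketch below is a simplified illustration: the function name `right_collocates` and the short sample (adapted from the lines above) are assumptions, and it only looks at the word immediately to the right of the search word.

```python
import re
from collections import Counter

def right_collocates(text, node, top=5):
    """Count the words that immediately follow the node word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    following = Counter(
        tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == node
    )
    return following.most_common(top)

# A few of the concordance lines above, joined into one sample string.
sample = (
    "i am deeply honored today. Hume was deeply worried about the view. "
    "we are so deeply indebted to. we have always been deeply grateful for the"
)
for word, count in right_collocates(sample, "deeply"):
    print(word, count)
```

Run over a large corpus, a count like this would show which words habitually follow "deeply", and whether "happy" is among them.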

31 ‘just like that’ “We’ll connect you to a network just like that” (advertisement for an internet service provider)

32 How to consult one of the large corpora for yourself… The British National Corpus Searchable at www.natcorp.ox.ac.uk and http://corpus.byu.edu/bnc The Bank of English Sample available at www.collins.co.uk/Corpus/CorpusSearch.aspx The Corpus of Contemporary American English (COCA) - 400+ million words. Searchable at http://www.americancorpus.org

33 The BNC interface

35 Part of the COCA interface

36 You can also consult other online concordancers, e.g. http://springerexemplar.com/

37 But… is it worth your while to consult a corpus? Advantages: A corpus can provide language information that is not available in reference books A corpus can provide lots of authentic examples of the way words and phrases are used in context – you can use these as a basis for teaching materials

38 Disadvantages Concordance lines can be difficult to interpret Because they are examples of performance – real rather than idealised language use - they may contain errors A corpus may not represent the kind of language your students need to produce……

39 A corpus is ‘selected …. to represent ….. a language or language variety’ BUT “no corpus, no matter how large, how carefully designed, can have exactly the same characteristics as the language itself” Any corpus will under-represent: some types of language user some types of text

40 “small” corpora Created to represent types of language that are inadequately represented in the very large corpora used for dictionaries and reference grammars

41 For example British Academic Written English (BAWE) – (6,506,995 words). http://www.coventry.ac.uk/bawe British Academic Spoken English (BASE) – (1,644,942 words) http://www.coventry.ac.uk/base The Michigan Corpus of Academic Spoken English (MICASE) – (1,848,364 words) http://www.hti.umich.edu/m/micase/ The Michigan Corpus of Upper-level Student Papers (MICUSP) - (roughly 2.6 million words) http://micusp.elicorpora.info/

44 A query tool for BASE and BAWE

46 Corpus design What kind of (small) corpus would you like to create? What would it represent? How would you go about collecting representative data?

47 Some uses of corpora Word counts and word lists The production of dictionaries and grammars The study of idiom and collocation Diachronic studies The study and comparison of language varieties The study of language acquisition The production of learning materials etc….
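The first use listed, word counts and word lists, is the simplest to try on your own texts. The following is a minimal sketch assuming plain-text input; the function name `word_list` and the sample sentence are invented for the example, and the tokenisation (lower-cased runs of letters and apostrophes) is deliberately crude.

```python
import re
from collections import Counter

def word_list(text, top=10):
    """Return the most frequent word forms in text with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top)

# Hypothetical sample; in practice text would be read from corpus files.
for word, count in word_list("The cat and the dog and the bird", top=3):
    print(word, count)
```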

