1
Intro to corpus linguistics
Corpora and language variation
John Corbett and Wendy Anderson
2
Today’s session
- Variation within a standard language (register, genre & style)
- Studying language variety (the SCOTS corpus)
Main sources:
- Wendy Anderson and John Corbett (2017) Exploring English with Online Corpora, 2nd edn. London: Palgrave.
- Douglas Biber and Susan Conrad (2009) Register, Genre and Style. Cambridge: CUP.
- Tony McEnery, Richard Xiao and Yukio Tono (2006) Corpus-Based Language Studies. London: Routledge.
3
Standard and non-standard varieties
Most large-scale corpora are based on ‘standard’ English and intended for use in compiling dictionaries, grammars, etc. More recently, corpora of different ‘non-standard’ varieties (SCOTS, local Irish English, learner corpora and ELF corpora) have been developed – usually smaller and often incorporating spoken data. This session looks at ways of using both corpora of standard varieties and non-standard varieties to explore language variation.
4
Variation in standard English
Variation within standard English is often described in terms of:
- Register (variation according to situation)
- Genre (variation according to communicative purpose)
- Style (systematic aesthetic preferences)
5
Comparing register, genre & style
Defining characteristic | Register | Genre | Style
Textual focus | Sample of text excerpts | Whole texts | Sample of text excerpts
Linguistic features | Any lexicogrammatical feature | Specialized expressions, rhetorical organization | Any lexicogrammatical feature
Distribution of linguistic features | Frequent and pervasive in texts from the variety | Seldom occurring; specific to certain points in the text | Frequent and pervasive in texts from the variety
Interpretation | Features serve important communicative functions | Features serve to organise the genre according to conventions | Features are not functional but preferred for aesthetic reasons
6
Register variation in COCA
Occurrences of the ‘get passive’ in different registers
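Comparing registers fairly means normalising raw counts, since register sections differ in size. The following is a minimal sketch of that idea, not the COCA interface: the regular expression, the function name `per_million` and the two sample sentences are all invented for illustration, and the pattern only approximates the get-passive.

```python
import re

# Hypothetical pattern: a form of "get" followed by a word ending in -ed.
# It misses irregular participles ("got caught") and over-matches
# adjectives ("got tired"); a part-of-speech-tagged corpus would do better.
GET_PASSIVE = re.compile(
    r"\b(?:get|gets|got|getting|gotten)\s+\w+ed\b", re.IGNORECASE
)

def per_million(text: str) -> float:
    """Return get-passive candidates per million words of `text`."""
    n_words = len(text.split())
    if n_words == 0:
        return 0.0
    hits = len(GET_PASSIVE.findall(text))
    return hits * 1_000_000 / n_words

# Invented toy samples, NOT COCA data.
samples = {
    "spoken": "he got arrested last night and she got promoted today",
    "academic": "the samples were analysed and the results were reported here",
}
rates = {register: per_million(text) for register, text in samples.items()}
```

With real data, each value in `samples` would be the full text of one register section, and the normalised rates could then be compared directly.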
7
Geographical variation: SCOTS corpus
- AHRC funded
- Data gathered opportunistically
- Freely available online & downloadable
- 4 million words (c. 1, speech/written); all post-1945
- Searchable via word/phrase search, concordancing, Google Maps, ‘collocate clouds’
8
SCOTS corpus
Range of written genres includes:
- Formal (Scottish Parliament, academic articles)
- Informal (personal letters, diaries)
- Literary (in English and Scots)
9
SCOTS corpus
Range of spoken genres includes:
- Formal (lectures, talks, interviews)
- Informal (conversation, spontaneous caregiver-child interaction)
- Examples across English and Scots continuum
10
Beginning to look at SCOTS…
- Review the literature for suggested features
- Check the SCOTS corpus for instances
- Compare across corpora
- Lexis (dialectal vocabulary)
- Grammar
  - Noun morphology
  - Syntax: Determiner + Noun
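Checking a corpus for instances of a feature usually starts with a concordance. The sketch below shows a bare-bones keyword-in-context (KWIC) search over a plain-text string. It is an illustration only, not the SCOTS search tool: the function name `kwic` and the `width` parameter are assumptions, and the sample sentence is taken from the lexis excerpt on a later slide.

```python
import re

def kwic(text: str, word: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines for whole-word matches of `word`."""
    lines = []
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        # Slice up to `width` characters of context on each side.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

sample = "we gave her ten pound tae buy flowers for the bairn; a wreath and that."
for line in kwic(sample, "bairn", width=25):
    print(line)
```

Running the same search over a second corpus gives the cross-corpus comparison mentioned in the list above.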
11
Aitken’s Scots-English continuum
Columns 1-2: Scots; column 3: Common Core; columns 4-5: English

1 | 2 | 3 | 4 | 5
bairn | mair | before | more | child
lass | stane | | stone | girl
kirk | hame | name | home | church
chaft | dee | see | die | jaw
gowpen | heid | tie | head | double handful
ken | hoose | tide | house | know
bide | loose (n) | young | louse (n) | remain
kenspeckle | louse (adj) | winter | loose (adj) | conspicuous
low | yaize (v) | of | use (v) | flame
cowp | yis (n) | is | use (n) | capsize
shauchle | auld | some | old | shuffle
whae’s aucht that? | truith | why | truth | whose is that?
pit the haims on | barra | he | barrow | do in
tummle the wulkies | | they | | turn somersaults
no (adv) | | * | | not (adv)
-na (adv) | | † | | -n’t (adv)

* (Most of the inflectional system, word order, grammar)
† (Pronunciation system and rules of realisation)
12
Exploring lexis
F643: //Aye, mind they// took it up tae Aberdeen and we gave her ten pound tae buy flowers for the bairn; a wreath and that. And the lassie came back and thanked us hersel,
M608: Aye.
F643: later on about that.
F643: sh- we used tae take the kids tae her and then I came through here and cleaned aw this place, so I widnae bring the kids, ye see. So, I cleaned aw this place. //Until the kids.//
F643: //She was feedin the baby in bed and she// must’ve slept on it, ye see.
M608: mm
F643: So, and eh, she had three other lovely children.
14
Grammar – noun morphology
F1091: //Now,// //you goin to tell me what is this.//
M1092: //[child noises]// //Nose.//
F1091: //What’s that? Aye.// And fitt’s this? It’s yer e-? //Come here.//
M1092: //Eeks.// Eeks.
F1091: It’s yer, is it yer een?
M1092: My eeks. Eeks, eeks.
F1091: It’s nae, it’s yer een.
15
Grammar – numeral + noun
M642: He says, ‘Right, I’ll need twa hundred pound for it.’
…
M642: //n- naw! Efter aboot// twa year he says, ‘I’m fed up o youse comin up here every week.’
…
M642: Now, I actually built twa hoose.
BUT:
M642: But see they prefabs, John, over there? //Ye put they//
M608: //Aye.//
M642: prefabs on a flat roof in a run o three inches.
M608: uh-huh
M642: And I pit, I says, ‘Right, I’ll put twa layers o felt on it.’
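In an untagged corpus, one rough way to find numeral + unmarked plural sequences is a pattern built from short word lists. The sketch below assumes two small, hand-picked lists (`NUMERALS` and `NOUNS`, both invented for illustration and far from complete) and is tried on lines from the excerpt above, with apostrophes straightened.

```python
import re

# Hypothetical, deliberately tiny word lists; extend them from the literature.
NUMERALS = r"(?:twa|two|three|four|five|ten|hundred)"
NOUNS = r"(?:pound|year|mile|hoose)"

# The trailing \b rejects marked plurals such as "years" or "pounds".
# Attributive uses ("a ten pound note") would still match and need
# manual checking.
UNMARKED = re.compile(rf"\b({NUMERALS})\s+({NOUNS})\b", re.IGNORECASE)

def find_unmarked_plurals(text: str) -> list[tuple[str, str]]:
    """Return (numeral, noun) pairs where the noun has no plural -s."""
    return UNMARKED.findall(text)

utterances = [
    "Right, I'll need twa hundred pound for it.",
    "Efter aboot twa year he says",
    "Right, I'll put twa layers o felt on it.",
]
for u in utterances:
    print(u, "->", find_unmarked_plurals(u))
```

The third utterance returns no hits, which matches the contrast the slide draws: measure nouns such as pound and year go unmarked, while layers keeps its plural -s.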
16
Grammar – weak & strong verbs
F961: Er, a whole block o dem, erm, Ian [CENSORED: surname], //you might mind, he//
F960: //Yes, I kennt him.//
F961: built dem aa fae de Sandsoond estate. //And dan he//
F960: //Oh.//
F961: sellt dem aa, he did up some o dem, and sellt dem, and he sellt aff idder eens for idder fock ta do up.

Scots has regular -(i)t inflexion for simple past:
Scots | English
kennt | knew
sellt | sold
17
Grammar – reduced verb paradigms
F1114: What’s this? What’s this Mum?
F1113: Sorry?
F1114: What’s this?
F1113: It’s Play-Doh went hard. It’s went all hard. Got to put it back into its tubby. Or it’ll go hard.

The 3-part irregular verb go/went/has gone is reduced to go/went/has went, by analogy with 2-part regular verbs like walk/walked/has walked.
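A reduced paradigm like has went can be searched for with a simple pattern over raw text. This is a sketch only: the pattern `HAVE_WENT` and the function `find_have_went` are invented for illustration. Contracted ’s is ambiguous between has and is, so every hit still needs reading in context.

```python
import re

# Auxiliary "have" (full form, or contracted after a straight or curly
# apostrophe) immediately followed by "went". Adverbs between the
# auxiliary and "went" are not handled.
HAVE_WENT = re.compile(
    r"(?:\b(?:has|have|had)|['’](?:s|ve|d))\s+went\b", re.IGNORECASE
)

def find_have_went(text: str) -> list[str]:
    """Return each matched auxiliary + 'went' sequence found in `text`."""
    return [m.group() for m in HAVE_WENT.finditer(text)]

line = "It's Play-Doh went hard. It's went all hard."
print(find_have_went(line))
```

Only the second sentence matches: in the first, went follows a noun phrase, not the auxiliary.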
18
Discourse markers
‘See’ as a topic-marker; ‘aye’ as a response token:
M642: //See your// hoose, John. We’ll go on tae that. See your hoose? //Your hoose was eh//
M608: //Aye.//
F643: The forester’s house.
19
Take-home messages
- Corpora have re-activated interest in register variation within standard languages, since we can now do statistically robust analyses of the lexicogrammar of situational varieties.
- Variationist corpora (which include standard and non-standard varieties) act as a resource for ongoing description of ‘geographical and social varieties’ and their continuing evolution with respect to written standard languages and world Englishes.
- Untagged variationist corpora like SCOTS can be searched for illustrative uses; however, much more systematic data collection, tagging and searching needs to be done to come up with a clearer picture of language variety within a geographical space like Scotland, which itself is full of internal variation.