Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014
In this lecture...
Stylistics and style
Combining stylistics + corpus linguistics
Examples of studies combining corpus linguistics and stylistics
– Analysis of genres
– Analysis of the works by particular authors
– Analysis of individual texts
– Analysis of variation inside texts
Corpus Tools
– WMatrix
Stylistics Stylistics is the study of literature using methods, theories and concepts from linguistics (Leech and Short 2007: 1). It is ‘[...] the study of the relationship between linguistic form and literary function [...]’ (Leech and Short 2007: 3).
Linguistic style ‘Style is a way in which language is used’ (Leech and Short 2007: 31) ‘[S]tyle consists in choices made from the repertoire of the language.’ (Leech and Short 2007: 31)
Linguistic style ‘Stylistic choice is limited to those aspects of linguistic choice which concern alternative ways of rendering the same subject matter’ (Leech and Short 2007: 31) e.g. horse vs. steed but not horse vs. dog
Linguistic style Style and genre, e.g. science fiction, romance novels, etc. Style and author Style and text Style and parts of texts (e.g. the narration or speech of different characters)
Ways of analysing style Analyst’s intuitions ‘Manual’ comparative analysis
Ways of analysing style Style and comparison ‘Even if style is defined as that variety of language which correlates with context, the recognition and analysis of styles are squarely based on comparison. The essence of variation, and thus of style, is difference, and differences cannot be analysed and described without comparison.’ (Enkvist 1973: 21)
Ways of analysing style Comparative analysis – manually – OK for shorter texts/extracts. Comparative analysis – using computers: – Corpus linguistic methods/tools – Especially useful for longer texts, e.g. prose fiction
Combining corpus linguistics and stylistics The ‘corpus turn’ (Leech and Short 2007:284). On-going trend in stylistics to use methods and tools from corpus-linguistics for the analysis of literary and other texts. Usually referred to as corpus stylistics Other terms: digital stylistics (Louw 2008) electronic text analysis (Adolphs 2006)
Examples of studies Combining corpus linguistics and stylistics – Analysis of genres – Analysis of the works by particular authors – Analysis of individual texts – Analysis of variation inside texts
Genre style Biber (1988) – multivariate statistical techniques – factor analysis – many different variables – variables = linguistic features (e.g. passive constructions). E.g. narrative versus non-narrative texts – important variables = past tense verbs, 3rd person pronouns, perfect aspect, present participle clauses – High scores = narrative – Low scores = non-narrative
A range not a dichotomy Narrative / non-narrative: between the top text-types and the bottom text-types there exists a whole range of text-types in the middle – it’s not just a two-way distinction. Note also – spoken and written genres are mixed together along the dimension.
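The idea of a dimension score can be illustrated with a small sketch. It assumes the scoring step usually described for this kind of analysis: each feature's rate per 1,000 words is standardised across texts (z-score) and a text's score is the sum of its z-scores on the features that load on the dimension. The text names and rates are invented, the population standard deviation is an arbitrary choice, and the factor analysis that selects the features is not shown.

```python
from statistics import mean, pstdev

# Hypothetical feature rates per 1,000 words (invented numbers, not Biber's data).
RATES = {
    "romance_novel": {"past_tense": 80.0, "third_person_pronoun": 60.0, "perfect_aspect": 12.0},
    "news_report": {"past_tense": 50.0, "third_person_pronoun": 30.0, "perfect_aspect": 9.0},
    "academic_prose": {"past_tense": 20.0, "third_person_pronoun": 15.0, "perfect_aspect": 6.0},
}

def dimension_scores(rates):
    """Sum each text's z-scores over all features, giving one score per text."""
    features = sorted(next(iter(rates.values())))
    scores = {text: 0.0 for text in rates}
    for feature in features:
        values = [rates[text][feature] for text in rates]
        mu, sigma = mean(values), pstdev(values)
        for text in rates:
            scores[text] += (rates[text][feature] - mu) / sigma
    return scores

SCORES = dimension_scores(RATES)
# Print texts from the most "narrative" end of the range to the least.
for text, score in sorted(SCORES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{text:15s} {score:+.2f}")
```

With these made-up rates the news report falls between the other two texts, which is the point of the slide: the scores form a range, not two bins.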
Genre style – direct speech Corpus-based study of speech, writing and thought presentation (Semino and Short 2004)
Genre style – direct speech Corpus of approx. 260,000 words of (late) 20th-century written British English: 120 text samples of approx. 2,000 words each, amounting to a total of 258,348 words.
Genre style – direct speech Corpus divided into three sections: – prose fiction (87,709 words), – newspaper news reports (83,603 words), and – biography and autobiography (87,036 words). Each genre section is further divided into a ‘serious’ and a ‘popular’ sub-section.
Genre style – direct speech Corpus tagged – manually The theme park’s manager, Mike Slattery said: ‘By closing Crinkley Bottom, the council has shot Morecambe in the foot. And I’m out of a job.’
Genre style – direct speech
Section of the corpus    Number of instances of DS
Whole corpus             2,974
Fiction                  1,569
Press                    770
(Auto)biography          635

Fiction sub-section      Number of instances of DS
Serious                  629
Popular                  940
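Raw counts are easier to compare across sections once they are normalised by section size. The sketch below uses only the word counts and direct speech (DS) counts reported in the slides and converts them to rates per 1,000 words; the choice of 1,000 as the base is an assumption made for readability.

```python
# Section sizes (words) and direct speech (DS) counts as reported in the slides.
WORDS = {"Fiction": 87_709, "Press": 83_603, "(Auto)biography": 87_036}
DS = {"Fiction": 1_569, "Press": 770, "(Auto)biography": 635}

def per_thousand(count, words):
    """Normalise a raw count to a rate per 1,000 words."""
    return count / words * 1000

RATES = {section: per_thousand(DS[section], WORDS[section]) for section in WORDS}
for section, rate in RATES.items():
    print(f"{section:16s} {rate:.1f} DS per 1,000 words")
```

Because the three sections are of similar size, the normalised rates preserve the ordering of the raw counts: fiction has roughly twice as many instances of DS per 1,000 words as the press section.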
Authorial style Studies attempting to ‘fingerprint’ authors: i.e. to identify linguistic items that distinguish the works by one author from those of others. Burrows (1987): study of Jane Austen’s novels focusing on closed-class words, such as the, and, of, a and to. Burrows found that these words can distinguish the works of different authors, different novels, and even the words spoken by different characters.
Authorial style Hoover (2002) studied a series of corpora containing chunks from novels by different authors. For example, he looked at a corpus containing the first 30,000 words of 29 novels by 17 different authors. The distribution of the 300 most frequent words in the corpus as a whole correctly clusters 15 out of 17 novels.
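A rough sketch of the most-frequent-word approach: take the top N words of the corpus as a whole, describe each text by the relative frequencies of those words, and compare texts by how far apart their profiles are. This is not Hoover's procedure. A plain Manhattan distance stands in for cluster analysis, and the toy "texts", the function names and the cut-off of 5 words are all assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def mfw_profiles(texts, top_n=300):
    """Relative frequencies of the corpus-wide top_n words, per text."""
    counts = {name: Counter(tokenize(text)) for name, text in texts.items()}
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    vocab = [word for word, _ in total.most_common(top_n)]
    profiles = {}
    for name, counter in counts.items():
        size = sum(counter.values())
        profiles[name] = [counter[word] / size for word in vocab]
    return vocab, profiles

def manhattan(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Toy samples (invented); real studies use thousands of words per sample.
TEXTS = {
    "A1": "the cat sat on the mat and the dog sat by the door",
    "A2": "the dog lay on the mat and the cat sat by the door",
    "B1": "a storm of rain and of wind came to a town of stone",
}
VOCAB, PROFILES = mfw_profiles(TEXTS, top_n=5)
D_SAME = manhattan(PROFILES["A1"], PROFILES["A2"])
D_DIFF = manhattan(PROFILES["A1"], PROFILES["B1"])
print(f"A1 vs A2: {D_SAME:.3f}   A1 vs B1: {D_DIFF:.3f}")
```

Samples whose function-word habits are alike end up close together, which is what a clustering step would then exploit.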
Authorial style An analysis of the most frequent word sequences (n-grams) can also be useful, e.g. – of the – in the – to the – it was – he was – and the
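Counting word sequences takes only a few lines. The sketch below is a minimal illustration with an invented sample sentence; the tokeniser is deliberately crude (lower-cased alphabetic strings) and sequences are allowed to run across punctuation, which a real study might not want.

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Count every run of n consecutive word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample sentence, for illustration only.
SAMPLE = "It was the end of the day, and the light of the lamp was low."
BIGRAMS = ngrams(SAMPLE, 2)
print(BIGRAMS.most_common(3))
```

Setting `n` to a larger value gives longer sequences of the kind often called clusters.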
Authorial style Mahlberg (2007, 2009, 2012) Corpus stylistics and Dickens’s fiction Also shows that analysis of frequent word sequences (clusters) can be useful. Clusters containing body parts – “his hands in his pockets” – “his head on one side” – “his hands upon his”
Text style Stubbs’s (2005) study of Joseph Conrad’s Heart of Darkness, first published in 1899. Marlow, the protagonist and first-person narrator, tells of how he was contracted to travel up a river in the Belgian Congo, in order to find an ivory trader called Kurtz, who was the subject of stories of madness and suspect practices. However, Kurtz dies while travelling back down the river.
Text style Main themes – ‘hypocrisy of the colonizers’ – ‘unreliability of progress and civilization’ – ‘breakdowns in communication’ – Light vs. dark – Restraint vs. frenzy – Appearance vs. reality – Marlow’s ‘unreliable and distorted knowledge’ (Stubbs 2005: 8-9)
Text style Stubbs used WordSmith Tools (Scott 2007) and compared the novel with a corpus of fictional texts of around 700,000 words. Overused words in the novel include: seemed, mystery, darkness, absurd, horror, terror, desolation. Several of these words concern uncertainty, perception and knowledge, and they coincide with some of the novel’s themes.
Text style Stubbs shows how the application of corpus methods can provide: – further justification for well-established interpretations, – new insights into the language and meaning potential of the text.
Text style: variation inside texts Culpeper (2002) used WordSmith Tools to do a key-word analysis of the speech of the main characters in Romeo and Juliet. A file with the words spoken by each character was compared to a ‘reference corpus’ containing the words of all the other characters. Findings are relevant to an understanding of how the characters are linguistically constructed (characterisation).
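The comparison described above can be sketched as follows. The slide does not say which keyness statistic was used, so log-likelihood is assumed here as one commonly used option; the function names, the minimum frequency of 2 and the toy texts are likewise assumptions, and the sample "speech" is invented, not taken from the play.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(a, b, c, d):
    """a, b: word frequency in target and reference; c, d: sizes of the two corpora."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_text, reference_text, min_freq=2):
    """Words relatively more frequent in the target, ranked by log-likelihood."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a >= min_freq and a / c > b / d:
            scored.append((word, a, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[2], reverse=True)

# Invented toy data: one character's speech vs. everyone else's.
TARGET = "if he comes or if he stays i would know if i would go or stay"
REFERENCE = "the prince and the nurse and the friar came to the city and the feast"
KEYWORDS = keywords(TARGET, REFERENCE)
for word, freq, score in KEYWORDS:
    print(f"{word:6s} freq={freq} LL={score:.2f}")
```

With samples this small the scores are far too low to mean anything statistically; the sketch only shows the mechanics of comparing one character's file against a reference made from the others.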
Text style: variation inside texts Juliet’s key-words (raw frequencies in brackets): If (31), Or (25), Sweet (16), Be (59), News (9), My (92), Night (27), I (138), Would (20), Yet (18), Thou (71), Words (5), Name (11), Nurse (20), Tybalt’s (6), Send (7), Husband (7), That (82), Swear (5)
Text style: variation inside texts Key-words such as if, or, would, yet can be related to Juliet’s tendency to express uncertainty and anxiety throughout the play: ‘I fear it is: and yet, methinks, it should not, For he hath still been tried a holy man’ (IV.iii.) [Context: Wondering whether the Friar has supplied sleeping potion or poison]
Corpus tools Corpus tools make comparison relatively easy WordSmith Tools (Scott 2007) WMatrix (Rayson 2009) AntConc (Anthony 2011) MLCT (Piao)
Summary Style is the way in which language is used. The notion of ‘style’ is fundamentally based on comparison. Corpus linguistic methods are relevant to the analysis of style in fiction/literature. They have been applied to the analysis of genres, authors and texts. Manual analysis and interpretation of the output from corpus tools is needed.
Summary [...] ‘corpus stylistics’ is not purely a quantitative study of literature. Rather, it is still a qualitative stylistic approach to the study of the language of literature, combined with or supported by corpus-based quantitative methods and technology. (Ho 2011:10)
References
Culpeper, J. (2009) “Keyness: words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet”, International Journal of Corpus Linguistics 14(1): 29-59.
Ho, Y. (2011) Corpus Stylistics in Principles and Practice: A Stylistic Exploration of John Fowles’ The Magus. London: Continuum.
Leech, G. (2008) Language in Literature: Style and Foregrounding. Harlow, UK: Pearson.
Louw, B. (2008) “Consolidating empirical method in data-assisted stylistics: Towards a corpus-attested glossary of literary terms”, in Zyngier, S., Bortolussi, M., Chesnokova, A. and Auracher, J. (eds.) Directions in Empirical Literary Studies, pp. 243-264. Amsterdam: Benjamins.
Mahlberg, M. (2007) “Clusters, key clusters and local textual functions in Dickens”, Corpora 2(1): 1-31.
Mahlberg, M. (2009) “Corpus Stylistics and the Pickwickian watering-pot”, in Baker, P. (ed.) Contemporary Corpus Linguistics, pp. 47-63. London: Continuum.
Mahlberg, M. (2012) Corpus Stylistics and Dickens’s Fiction. London: Routledge.
McIntyre, D. (2010) “Dialogue and Characterization in Quentin Tarantino’s Reservoir Dogs: A Corpus Stylistic Analysis”, in McIntyre, D. and Busse, B. (eds.) Language and Style, pp. 162-182. Basingstoke: Palgrave.
McIntyre, D. and Walker, B. (2010) “How can corpora be used to explore the language of poetry and drama?”, in McCarthy, M. and O’Keeffe, A. (eds.) The Routledge Handbook of Corpus Linguistics. London: Routledge.
Widdowson, H. G. (2008) “The Novel Features of Text. Corpus Analysis and Stylistics”, in Gerbig, A. and Mason, O. (eds.) Language, People, Numbers: Corpus Stylistics and Society, pp. 293-304. Amsterdam: Rodopi.
WMatrix Web-based corpus tool. Developed by Paul Rayson at Lancaster University. Automated grammatical and semantic analysis of texts/corpora. A web-based front end for CLAWS and USAS.
WMatrix Using a web interface: Texts are uploaded onto the WMatrix server (at Lancaster). The upload procedure automatically adds (i) grammatical or part-of-speech (POS) tags; (ii) semantic tags.
WMatrix CLAWS grammatical (POS) tagger: CLAWS = Constituent Likelihood Automatic Word-tagging System. USAS semantic tagger: USAS = UCREL Semantic Analysis System (UCREL = University Centre for Computer Corpus Research on Language).
WMatrix USAS Assigns tags to each word using a hierarchical framework of categorization Based originally on McArthur’s (1981) Longman Lexicon of Contemporary English
The 21 Top Level Semantic Categories of the USAS Tag-set
A GENERAL & ABSTRACT TERMS
B THE BODY & THE INDIVIDUAL
C ARTS & CRAFTS
E EMOTION
F FOOD & FARMING
G GOVERNMENT & PUBLIC DOMAIN
H ARCHITECTURE, HOUSING & THE HOME
I MONEY & COMMERCE (IN INDUSTRY)
K ENTERTAINMENT
L LIFE & LIVING THINGS
M MOVEMENT, LOCATION, TRAVEL, TRANSPORT
N NUMBERS & MEASUREMENT
O SUBSTANCES, MATERIALS, OBJECTS, EQUIPMENT
P EDUCATION
Q LANGUAGE & COMMUNICATION
S SOCIAL ACTIONS, STATES & PROCESSES
T TIME
W WORLD & ENVIRONMENT
X PSYCHOLOGICAL ACTIONS, STATES & PROCESSES
Y SCIENCE & TECHNOLOGY
Z NAMES & GRAMMAR
WMatrix G - Government and the public domain
– G1 Government, politics and elections
  – G1.1 Government, etc.
  – G1.2 Politics
– G2 Crime, law and order
– G3 War, defence and the army: weapons
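The hierarchical shape of the tag-set can be made concrete with a short sketch that walks from a tag such as G1.1 up to its top-level field. The labels are those of the G field shown in the slides; the function name and the tag pattern (one capital letter, an optional number, optional dotted subdivisions) are assumptions, and any extra markers that real tagger output may attach to a tag are not handled.

```python
import re

# Labels for the G field as given in the slides; all other fields are omitted.
LABELS = {
    "G": "Government and the public domain",
    "G1": "Government, politics and elections",
    "G1.1": "Government, etc.",
    "G1.2": "Politics",
    "G2": "Crime, law and order",
    "G3": "War, defence and the army: weapons",
}

def ancestors(tag):
    """Return the path from the top-level field down to the tag itself."""
    match = re.fullmatch(r"([A-Z])(?:(\d+)((?:\.\d+)*))?", tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    letter, number, subdivisions = match.groups()
    path = [letter]
    if number:
        current = letter + number
        path.append(current)
        # ".1.2" splits into ["", "1", "2"]; skip the empty first element.
        for part in subdivisions.split(".")[1:]:
            current += "." + part
            path.append(current)
    return path

PATH = ancestors("G1.1")
print(" > ".join(f"{tag} ({LABELS[tag]})" for tag in PATH))
```

Grouping tagged words by the first element of this path gives counts per top-level field; grouping by the full tag gives the finer-grained semantic categories.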
WMatrix Allows analysis of texts at: – the word level – the grammatical level (POS) – and the semantic level
WMatrix Allows text comparison at: – the word level – the grammatical level (POS) – and the semantic level