Diachronic study and language change Corpus Linguistics Richard Xiao
Aims of this session
Lecture
– Corpora vs. diachronic study
– The state of the art of corpus-based diachronic studies
– Case study: recent change in English grammar
Lab session
– Using the Time corpus to explore full and bare infinitives in American English between the 1920s and the 2000s
Corpora vs. diachronic study
The nature of diachronic study determines its reliance on empirical historical data
Diachronic study is perhaps one of the few areas that can only be investigated using corpus data (cf. Bauer 2002: 109)
– The intuitions of modern speakers have little to offer regarding the language used hundreds or even tens of years ago
Helsinki corpus and related books
Three books based on the Helsinki corpus, from the project English in transition: Change through variation
– Early English in the Computer Age: Explorations through the Helsinki Corpus (Rissanen, Kytö and Palander-Collin 1993)
– English in Transition: Corpus-based Studies in Linguistic Variation and Genre Styles (Rissanen, Kytö and Heikkonen 1997)
– Grammaticalization at Work: Studies of Long-term Developments in English (Rissanen, Kytö and Heikkonen 1997)
Recent grammatical changes
Work undertaken by teams led by Geoff Leech (Lancaster) and Christian Mair (Freiburg) on the basis of the corpora of the Brown family (LOB vs. FLOB, and Brown vs. Frown)
– Change in Contemporary English: A Grammatical Study (Leech, Hundt, Mair and Smith 2009)
– Recent grammatical change in English: data, description, theory (Leech 2004)
– Current changes in English syntax (Leech and Mair 2006)
– Recent grammatical change in written English (Leech and Smith 2006)
– Grammatical change in 20th century English (Mair 2006)
Historical pragmatics
Arnovick (2000) examines the speech event of parting, focusing on the development of Goodbye, which was originally an explicit blessing, God be with you
– The end of the 17th century and the beginning of the 18th century marked a crucial period during which the blessing declined and the closing form Goodbye increased in frequency
Jacobsson (2002) studies Thank you and Thanks in Early Modern English
– As expressions of gratitude they were probably the same in the Early Modern period as they are today, but they had not developed the discourse-marking features (e.g. as a closing sequence of conversation) of today's British English; nor is it possible to see the complex patterns of thanking in different turn-positions
Biber (2004) explores, on the basis of the ARCHER corpus, the patterns of historical change in the preferred devices used to mark stance across the past three centuries
Recent change in English grammar
Case study based on Leech (2004)
– Recent grammatical change in English: data, description, theory, in K. Aijmer and B. Altenberg (eds) Advances in Corpus Linguistics. Amsterdam: Rodopi
What are the major trends in grammatical change over the three intervening decades?
Data collection and tagging
Spoken: 80,000 words from a comparable and balanced range of spoken genres
Modal auxiliaries
A log-likelihood (LL) score greater than 3.84 is significant at p < 0.05
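The 3.84 threshold is the chi-square critical value for one degree of freedom at p < 0.05. The sketch below shows one way such a score could be computed in Python, assuming the two-term log-likelihood formula widely used to compare an item's frequency across two corpora. The slides do not say which variant was used, the function name is illustrative, and the counts are hypothetical rather than figures from Leech (2004).

```python
import math

CRITICAL_P05 = 3.84  # chi-square critical value, 1 degree of freedom, p < 0.05


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood score for one item's frequency in two corpora.

    freq1, freq2: raw frequencies of the item in corpus 1 and corpus 2
    size1, size2: total number of words in corpus 1 and corpus 2
    """
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the item were equally common in both corpora
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts: a modal occurring 1,200 times in an earlier
# one-million-word corpus and 1,000 times in a later one.
ll = log_likelihood(1200, 1_000_000, 1000, 1_000_000)
print(f"LL = {ll:.2f}, significant at p < 0.05: {ll > CRITICAL_P05}")
```

A score above the threshold only says that the difference is unlikely to be due to chance; the direction of change still has to be read from the frequencies themselves.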
A generation gap? BrE is following rather "reluctantly" in the wake of AmE?
Encroachment hypothesis
The apparent decline in canonical modal usage (e.g. will, would, shall, should, may, might, ought, need) is due to the rise, in recent centuries, of the so-called semi-modals, such as be going to and have to, which are presumed to be still increasing in use
– Are semi-modals gradually encroaching on the territory of canonical modals?
Encroachment hypothesis
No strong connection between the patterns shown by the modals and the semi-modals
Semi-modals are much less frequent (in written English) than the modals, but changes in frequency show a mixed picture
– Some of them seem to have increased their usage massively in the period (e.g. need to), but others have declined (e.g. be to)
– Unexpectedly, however, the overall frequency of semi-modals is found to be greater in the BrE than in the AmE corpora in both periods
Frequencies of some semi-modals
Semi-modals in spoken BrE
Trends in spoken English are similar to those in written English, but somewhat more exaggerated
The general increase of semi-modals is even greater in spoken than in written English (+32.3% vs. +10% for written BrE / +18.6% for written AmE)
– But only two of them have increased significantly
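Percentages of this kind express how much a normalised frequency changed between the earlier and the later corpus. A minimal sketch follows, assuming the change is calculated relative to the earlier frequency; the function name and the figures are hypothetical and are not taken from the study.

```python
def percent_change(earlier, later):
    """Percentage change from an earlier to a later normalised frequency."""
    if earlier == 0:
        raise ValueError("percentage change is undefined for an earlier frequency of zero")
    return (later - earlier) * 100 / earlier


# Hypothetical frequencies per million words in the earlier and later corpus
earlier_pmw = 2500.0
later_pmw = 3250.0
print(f"{percent_change(earlier_pmw, later_pmw):+.1f}%")
```

Because the result depends on the size of the earlier frequency, a large percentage for a rare form can correspond to a small absolute change, which is one reason to read percentages alongside significance scores.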
Modal auxiliaries: a summary
In general terms, there is a clear decline in the frequency of canonical modal auxiliaries between 1961 and 1991
During this period, individual modals have been declining at different rates, but there is a tendency for very common modals to hold their own (e.g. will, can), and for infrequent modals (e.g. shall, ought to, need) to decline sharply
– Some middle-ranking modals (e.g. may and must) have also declined sharply
Alongside the decline of modals, there is no clear overall picture regarding semi-modals: although in general semi-modal usage is increasing, some semi-modals are declining, and semi-modals as a whole are much less frequent than true modals
A bigger question… Do the decline in canonical modals (especially formal modals) and the general increase in semi-modals suggest that English is becoming more colloquial over the three intervening decades between 1961 and 1991?
Changes indicative of colloquialization (frequency per million words)
Colloquialization hypothesis
A decline in canonical modals, especially formal usages like shall, ought to and need
An increasing frequency of phenomena associated with spoken language (e.g. progressive, contractions, zero-relative clauses)
A decreasing frequency of phenomena associated with the written language (e.g. passive, pied-piping relative clauses)
A tendency for written British English gradually to acquire norms and characteristics associated with spoken conversational English over the three decades
See Leech, G. (2012) How grammar has been changing in recent English: Using comparable corpora to track linguistic change. 2012, Vol. 4, Issue 4: 13-20
Practical
Using the Time corpus to explore full and bare infinitives in American English between the 1920s and the 2000s
The Time Corpus
The Time corpus (Davies 2007)
– 100+ million words
– Span: 1920s to 2000s
– Wide range of topics (news, sports, business, culture, health, entertainment, etc.)
– Internal consistency
– Chronological gap
HELP + V [help].[vv*] [v*i] Tip: select "Chart"
HELP + PRON + V [help].[vv*] [p*] [v*i]
HELP + NOUN + V [help].[vv*] [n*] [v*i]
HELP + WORD + NOUN + V [help].[vv*] * [n*] [v*i] (* - any word, e.g. Det)
HELP + bare infinitives Combined frequency per million words (from the 1960s: steady rise)
HELP + to V [help].[vv*] to [v*i]
HELP + PRON + to V [help].[vv*] [p*] to [v*i]
HELP + NOUN + to V [help].[vv*] [n*] to [v*i]
HELP + WORD + NOUN + to V [help].[vv*] * [n*] to [v*i]
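The bare and full infinitive queries differ only in the material between HELP and the infinitive and in the presence of to. As a small illustration, the Python sketch below rebuilds the eight query strings exactly as they appear on the slides; the function name is hypothetical, and the code only assembles strings without contacting the corpus interface.

```python
HELP_VERB = "[help].[vv*]"  # lemma HELP tagged as a lexical verb, as on the slides

# Material between HELP and the infinitive:
# nothing, a pronoun, a noun, or any word followed by a noun
INTERVENING_SLOTS = ["", "[p*]", "[n*]", "* [n*]"]


def help_queries(with_to):
    """Return the four HELP + infinitive queries, with or without 'to'."""
    infinitive = "to [v*i]" if with_to else "[v*i]"
    queries = []
    for slot in INTERVENING_SLOTS:
        parts = [HELP_VERB, slot, infinitive]
        # Drop the empty slot so the query has no double space
        queries.append(" ".join(part for part in parts if part))
    return queries


for query in help_queries(with_to=False) + help_queries(with_to=True):
    print(query)
```

Keeping the slots in one list makes it easy to check that the bare and full infinitive searches cover the same four contexts, so their combined frequencies are comparable.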
HELP + full infinitives Combined frequency per million words (1960s-1990s: a decline)
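A combined frequency per million words can be obtained by summing the raw hits of the four queries for a decade and normalising by the number of words in that decade. The sketch below illustrates this under stated assumptions: the function name, the hit counts and the decade size are all hypothetical and are not real Time corpus figures.

```python
def combined_per_million(hits_by_query, corpus_size):
    """Sum raw hits over several queries and normalise to hits per million words."""
    total_hits = sum(hits_by_query.values())
    return total_hits * 1_000_000 / corpus_size


# Hypothetical raw hits for a single decade (NOT real Time corpus data)
decade_size = 10_000_000  # hypothetical number of words in that decade
bare_hits = {
    "HELP + V": 300,
    "HELP + PRON + V": 200,
    "HELP + NOUN + V": 150,
    "HELP + WORD + NOUN + V": 350,
}
full_hits = {
    "HELP + to V": 250,
    "HELP + PRON + to V": 100,
    "HELP + NOUN + to V": 50,
    "HELP + WORD + NOUN + to V": 100,
}

bare_pmw = combined_per_million(bare_hits, decade_size)
full_pmw = combined_per_million(full_hits, decade_size)
# Share of bare infinitives among all HELP + infinitive hits in this decade
bare_share = bare_pmw / (bare_pmw + full_pmw)
print(f"bare: {bare_pmw:.1f} pmw, full: {full_pmw:.1f} pmw, bare share: {bare_share:.0%}")
```

Repeating the calculation for each decade gives two series that can be compared directly, since normalisation removes the effect of decades differing in size.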