Diachronic study and language change Corpus Linguistics Richard Xiao

Aims of this session
Lecture
– Corpora vs. diachronic study
– The state of the art of corpus-based diachronic studies
– Case study: recent change in English grammar
Lab session
– Using the Time corpus to explore full and bare infinitives in American English between the 1920s and the 2000s

Corpora vs. diachronic study
The nature of diachronic study determines its reliance on empirical historical data
Diachronic study is perhaps one of the few areas that can only be investigated using corpus data (cf. Bauer 2002: 109)
– The intuitions of modern speakers have little to offer regarding the language used hundreds or even tens of years ago

Helsinki Corpus and related books
Three books based on the Helsinki Corpus, from the project “English in transition: Change through variation”
– Early English in the Computer Age: Explorations through the Helsinki Corpus (Rissanen, Kytö and Palander-Collin 1993)
– English in Transition: Corpus-based Studies in Linguistic Variation and Genre Styles (Rissanen, Kytö and Heikkonen 1997)
– Grammaticalization at Work: Studies of Long-term Developments in English (Rissanen, Kytö and Heikkonen 1997)

Recent grammatical changes
Work undertaken by teams led by Geoff Leech (Lancaster) and Christian Mair (Freiburg) on the basis of the corpora of the Brown family (LOB vs. FLOB, and Brown vs. Frown)
– Change in Contemporary English: A Grammatical Study (CUP, in press)
– Recent grammatical change in English: data, description, theory (Leech 2004)
– Current changes in English syntax (Leech and Mair 2006)
– Recent grammatical change in written English (Leech and Smith 2006)
– Grammatical change in 20th century English (Mair 2006)

Historical pragmatics
Arnovick (2000) examines the speech event of parting, focusing on the development of Goodbye, which was originally an explicit blessing, God be with you
– The end of the 17th century and the beginning of the 18th century marked a crucial period during which the blessing declined and the closing form Goodbye increased in frequency
Jacobsson (2002) studies Thank you and Thanks in Early Modern English
– As expressions of gratitude they were probably the same in the Early Modern period as they are today, but they ‘had not developed the discourse-marking features of today’s British English; nor is it possible to see the complex patterns of thanking in different turn-positions’
Biber (2004) explores, on the basis of the ARCHER corpus, the patterns of historical change in the preferred devices used to mark stance across the past three centuries

Recent change in English grammar
Case study based on Leech (2004)
– “Recent grammatical change in English: data, description, theory”, in K. Aijmer and B. Altenberg (eds) Advances in Corpus Linguistics. Amsterdam: Rodopi
What are the major trends in grammatical change over the three intervening decades between 1961 and 1991?

Data collection and tagging

Modal auxiliaries
A log-likelihood (LL) score greater than 3.84 is significant at p < 0.05
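The slides do not show how the LL scores were computed. The sketch below uses the two-corpus log-likelihood formulation that is common in corpus linguistics, where the expected frequency of an item in each corpus is proportional to corpus size. The function name and the counts are hypothetical and are not taken from the LOB/FLOB or Brown/Frown data.

```python
import math

# Critical value of chi-square with 1 degree of freedom at p < 0.05,
# as quoted on the slide.
CRITICAL_P05 = 3.84


def log_likelihood(freq1, size1, freq2, size2):
    """Two-corpus log-likelihood for one item (hypothetical helper).

    freq1, freq2: observed frequency of the item in corpus 1 and corpus 2
    size1, size2: total number of words in corpus 1 and corpus 2
    """
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the item were equally likely in both corpora.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Invented counts for illustration only: a modal occurring 350 times in an
# earlier one-million-word corpus and 200 times in a later one.
score = log_likelihood(350, 1_000_000, 200, 1_000_000)
print(f"LL = {score:.2f}, significant at p < 0.05: {score > CRITICAL_P05}")
```

A score above 3.84 only says that the difference is unlikely to be due to chance; the direction of change still has to be read from the frequencies themselves.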

A generation gap?

“Encroachment” hypothesis
The apparent decline in canonical modal usage is due to the rise, in recent centuries, of the so-called semi-modals, such as be going to and have to, which are presumed to be still increasing in use
– Are semi-modals gradually encroaching on the territory of canonical modals?

“Encroachment” hypothesis
No strong connection between the patterns shown by the modals and the semi-modals
Semi-modals are very much less frequent (in written English) than the modals, but changes in frequency show a mixed picture
– Some of them seem to have increased their usage massively between 1961 and 1991, but others have declined
– Unexpectedly, however, the overall frequency of semi-modals is found to be greater in the BrE than in the AmE corpora in both periods

Frequencies of some semi-modals

Semi-modals in spoken BrE
Trends in spoken English are similar to those in written English, but somewhat more exaggerated
The general increase of semi-modals is even greater in spoken than in written English (+32.3%)
– But only two of them have increased significantly

Modal auxiliaries: a summary
In general terms, there is a clear decline in the frequency of canonical modal auxiliaries between 1961 and 1991
During this period, individual modals have been declining at different rates, but there is a tendency for very common modals to hold their own (e.g. will, can), and for infrequent modals (e.g. shall, ought to, need) to decline sharply
– Some middle-ranking modals (e.g. may and must) have also declined sharply
Alongside the decline of modals, there is no clear overall picture regarding semi-modals: although semi-modal usage is increasing in general, some semi-modals are declining, and semi-modals as a whole are much less frequent than ‘true’ modals

A bigger question…
Do the decline in canonical modals (especially formal modals) and the general increase in semi-modals suggest that English became more colloquial over the three intervening decades between 1961 and 1991?

Changes indicative of colloquialization
[Table: frequency per million words]

Colloquialization hypothesis
– A decline in canonical modals, especially formal usages like shall, ought to and need
– An increasing frequency of phenomena associated with spoken language
– A decreasing frequency of phenomena associated with written language
A tendency for written British English gradually to acquire norms and characteristics associated with spoken conversational English over the three decades between 1961 and 1991

Practical
Using the Time corpus to explore full and bare infinitives in American English between the 1920s and the 2000s
– Install and log in via VPN to access the Internet

The Time Corpus
The Time corpus (Davies 2007)
– 100+ million words
– Span: 1920s to 2000s
– Wide range of topics (news, sports, business, culture, health, entertainment, etc.)
– Internal consistency
– Chronological gap

HELP + V
[help].[vv*] [v*i]

HELP + PRON + V
[help].[vv*] [p*] [v*i]

HELP + NOUN + V
[help].[vv*] [n*] [v*i]

HELP + WORD + NOUN + V
[help].[vv*] * [n*] [v*i]

HELP + bare infinitives combined (frequency per million words)

HELP + to V
[help].[vv*] to [v*i]

HELP + PRON + to V
[help].[vv*] [p*] to [v*i]

HELP + NOUN + to V
[help].[vv*] [n*] to [v*i]

HELP + WORD + NOUN + to V
[help].[vv*] * [n*] to [v*i]
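To make the logic of the eight queries concrete, here is a rough Python approximation that classifies a HELP construction in a POS-tagged sentence. It is not the corpus interface's own search engine. It assumes lowercase CLAWS-style tags in which lexical verbs start with vv, infinitives start with v and end in i, pronouns start with p and nouns start with n. The function names and example sentences are hypothetical.

```python
HELP_FORMS = {"help", "helps", "helped", "helping"}


def is_infinitive(tag):
    # Mirrors the [v*i] slot: any verb tag ending in "i".
    return tag.startswith("v") and tag.endswith("i")


def object_lengths(tags):
    """Lengths of the object slot allowed by the four query shapes."""
    lengths = [0]                                  # HELP + V
    if tags and tags[0].startswith(("p", "n")):
        lengths.append(1)                          # HELP + PRON / NOUN + V
    if len(tags) > 1 and tags[1].startswith("n"):
        lengths.append(2)                          # HELP + WORD + NOUN + V
    return lengths


def classify_help(tagged, i):
    """Return 'bare', 'full' or None for the token at position i.

    tagged: list of (word, tag) pairs for one sentence.
    """
    word, tag = tagged[i]
    # Mirrors [help].[vv*]: a form of HELP tagged as a lexical verb.
    if word.lower() not in HELP_FORMS or not tag.startswith("vv"):
        return None
    words = [w.lower() for w, _ in tagged[i + 1:]]
    tags = [t for _, t in tagged[i + 1:]]
    for skip in object_lengths(tags):
        rest_words, rest_tags = words[skip:], tags[skip:]
        if rest_tags and is_infinitive(rest_tags[0]):
            return "bare"
        if (len(rest_tags) > 1 and rest_words[0] == "to"
                and is_infinitive(rest_tags[1])):
            return "full"
    return None


# Invented example sentences with assumed tags.
bare_example = [("They", "pphs2"), ("helped", "vvd"),
                ("him", "ppho1"), ("find", "vvi"), ("it", "pph1")]
full_example = [("It", "pph1"), ("helps", "vvz"), ("the", "at"),
                ("company", "nn1"), ("to", "to"), ("grow", "vvi")]

print(classify_help(bare_example, 1))
print(classify_help(full_example, 1))
```

Counting the 'bare' and 'full' results per decade gives the raw frequencies that the combined slides then normalize per million words.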

HELP + full infinitives combined (frequency per million words)

Full vs. bare infinitives in Time corpus
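Because the decade sub-corpora differ in size, raw counts have to be normalized before full and bare infinitives can be compared over time. A minimal sketch of that step follows. The decade sizes and counts are invented round numbers for illustration only, not Time corpus results, and the function names are hypothetical.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


def bare_share(bare, full):
    """Proportion of HELP + infinitive tokens that use the bare infinitive."""
    return bare / (bare + full)


# Invented figures: sub-corpus size in words, plus raw counts of
# HELP + bare infinitive and HELP + full (to-) infinitive.
decades = {
    "decade A": {"size": 10_000_000, "bare": 200, "full": 600},
    "decade B": {"size": 5_000_000, "bare": 900, "full": 100},
}

for name, d in decades.items():
    bare_pm = per_million(d["bare"], d["size"])
    full_pm = per_million(d["full"], d["size"])
    share = bare_share(d["bare"], d["full"])
    print(f"{name}: bare {bare_pm:.1f} pmw, full {full_pm:.1f} pmw, "
          f"bare share {share:.0%}")
```

Reporting the bare share alongside the per-million figures separates two questions: whether HELP + infinitive as a whole is becoming more frequent, and whether the bare variant is gaining ground on the full one.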