CORPUS LINGUISTICS: AN INTRODUCTION Susi Yuliawati, M.Hum. Universitas Padjadjaran

1 CORPUS LINGUISTICS: AN INTRODUCTION Susi Yuliawati, M.Hum. Universitas Padjadjaran susi.y@unpad.ac.id

2 PRESENTATION TOPICS 1. Corpus Linguistics: Past and Present 2. What is a Corpus? 3. The Role of Computers in Corpus Linguistics

3 Corpus Linguistics: Past McEnery (2006): Corpus linguistics dates back to the pre-Chomskyan period, when it was used by field linguists such as Boas (1940) and by linguists of the structuralist tradition, such as Sapir, Newman, Bloomfield and Pike. Linguists at that time would have used shoeboxes filled with paper slips, rather than computers, as a means of data storage.

4 Corpus Linguistics: Past The ‘corpora’ might have been simple collections of written or transcribed texts and thus not representative. The methodology was essentially ‘corpus based’ in the sense that it was empirical and based on observed data.

5 Corpus Linguistics: Present With developments in technology, especially ever more powerful computers offering ever-increasing processing power and massive storage at relatively low cost, the exploitation of massive corpora became feasible. However, corpus linguistics is not a mindless process of automatic language description. Linguists use corpora to answer questions and solve problems.

6 Corpus Linguistics: Present Key problems and challenges in corpus linguistics: 1. How can we best exploit the opportunities which arise from having texts stored in machine-retrievable form? 2. What linguistic theories will best help structure corpus-based research? 3. What linguistic phenomena should we look for? 4. What applications can make use of the insights and improved description of languages which come out of this research?

7 What is a Corpus? A collection of pieces of language that are selected and ordered according to explicit criteria in order to be used as a sample of the language (Sinclair, 1996). ‘Linguistic criteria’ dependent upon the intended use for the corpus are used to select and put together the texts ‘in a principled way’ (Johansson, 1998). A corpus is a collection of texts based on a set of design criteria, one of which is that the corpus aims to be representative (Cheng, 2012).

8 What is a Corpus? There is an increasing consensus that a corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety (McEnery, 2006).

9 Types of Corpora Nesselhauf (2011): General/reference corpora (vs specialized corpora), e.g. BNC & Bank of English. Historical corpora (vs corpora of present-day language use), e.g. Helsinki Corpus, ARCHER. Regional corpora (vs corpora containing more than one variety), e.g. WCNZE (Wellington Corpus of Written New Zealand English). Learner corpora (vs native speaker corpora), e.g. ICLE (International Corpus of Learner English). Multilingual corpora (vs one-language corpora). Spoken corpora (vs written corpora vs mixed corpora), e.g. LLC (London-Lund Corpus of Spoken English).

10 Available Corpora

11 The Role of Computers in Corpus Linguistics Computerized corpora can be processed and manipulated rapidly at minimal cost. Computers can process machine-readable data accurately and consistently. Computers can avoid human bias in an analysis, thus making the result more reliable. Machine-readability allows further automatic processing to be performed on the corpus, so that corpus texts can be enriched with various metadata and linguistic analyses.

12 The Role of Computers in Corpus Linguistics Corpus linguistics is now inextricably linked to the computer, which has introduced incredible speed, total accountability, accurate replicability, statistical reliability and the ability to handle huge amounts of data.
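
As a rough illustration of the kind of processing that machine-readable texts make possible, the sketch below builds a word frequency list from a tiny in-memory corpus. The two sample sentences and the names `tokenize` and `frequency_list` are assumptions made for this example; they do not come from the slides or from any particular corpus tool, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(texts):
    """Count how often each token occurs across all texts in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


# Hypothetical two-sentence "corpus", used only for illustration.
sample_corpus = [
    "The corpus is a collection of texts.",
    "A corpus is sampled to be representative.",
]

freq = frequency_list(sample_corpus)
for word, count in freq.most_common(3):
    print(word, count)
```

The same counting applied to millions of words is what would have been impractical with paper slips, and it gives the same result every time it is run on the same data.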

13 COHA (Word Comparison)
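
Comparing a word across sub-corpora of different sizes usually relies on normalized rather than raw frequencies. The sketch below shows one common normalization, occurrences per million words. The function name `per_million` and all the counts are invented for illustration; they are not COHA data, and this is not a description of how the COHA interface itself computes its figures.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw count to occurrences per one million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical counts of one word in two periods of unequal size.
period_a = per_million(raw_count=150, corpus_size=2_000_000)
period_b = per_million(raw_count=90, corpus_size=1_000_000)

print(period_a, period_b)
```

Here the word has the higher raw count in the first period (150 vs 90) but the higher relative frequency in the second, which is why normalization matters when sub-corpora differ in size.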

14 COLLINS CORPUS

15 COCA (Keyword in Context)
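
A keyword-in-context (KWIC) display lists every occurrence of a search word together with a few words to its left and right. The following is a minimal sketch of that idea, assuming a simple regex tokenizer and a window measured in words. The function name `kwic` and the sample text are made up for this example and do not reflect how the COCA interface is implemented.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text, used only for illustration.
sample_text = (
    "Linguists use corpora to answer questions. "
    "Large corpora became feasible with computers."
)

hits = kwic(sample_text, "corpora", width=2)
for left, node, right in hits:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20}  {node}  {right}")
```

Aligning the keyword in a central column is what lets recurring patterns to its left and right be read off at a glance.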

16 Indonesian Corpus

17 Specialised Corpus (Sundanese Corpus)

18 Corpus Software & Statistical Information

