CORPUS LINGUISTICS: AN INTRODUCTION Susi Yuliawati, M.Hum. Universitas Padjadjaran



PRESENTATION TOPICS 1. Corpus Linguistics: Past and Present 2. What is a Corpus? 3. The Role of Computers in Corpus Linguistics

Corpus Linguistics: Past McEnery (2006): Corpus linguistics dates back to the pre-Chomskyan period, when it was practised by field linguists such as Boas (1940) and by linguists of the structuralist tradition such as Sapir, Newman, Bloomfield and Pike. Linguists at that time would have used shoeboxes filled with paper slips, rather than computers, as a means of data storage.

Corpus Linguistics: Past The ‘corpora’ might have been simple collections of written or transcribed texts, and thus not representative. The methodology was nevertheless essentially ‘corpus-based’ in the sense that it was empirical and based on observed data.

Corpus Linguistics: Present With developments in technology, especially ever more powerful computers offering increasing processing power and massive storage at relatively low cost, the exploitation of massive corpora became feasible. However, corpus linguistics is not a mindless process of automatic language description. Linguists use corpora to answer questions and solve problems.

Corpus Linguistics: Present Key problems and challenges in corpus linguistics: 1. How can we best exploit the opportunities which arise from having texts stored in machine-retrievable form? 2. What linguistic theories will best help structure corpus-based research? 3. What linguistic phenomena should we look for? 4. What applications can make use of the insights and improved description of languages which come out of this research?

What is a Corpus? A collection of pieces of language that are selected and ordered according to explicit criteria in order to be used as a sample of the language (Sinclair, 1996). ‘Linguistic criteria’ dependent upon the intended use for the corpus are used to select and put together the texts ‘in a principled way’ (Johansson, 1998). A corpus is a collection of texts based on a set of design criteria, one of which is that the corpus aims to be representative (Cheng, 2012).

What is a Corpus? There is an increasing consensus that a corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety (McEnery, 2006).

Types of Corpora Nesselhauf (2011): General/reference corpora (vs specialized corpora), e.g. BNC & Bank of English. Historical corpora (vs corpora of present-day language use), e.g. Helsinki Corpus, ARCHER. Regional corpora (vs corpora containing more than one variety), e.g. WCNZE (Wellington Corpus of Written New Zealand English). Learner corpora (vs native speaker corpora), e.g. ICLE (International Corpus of Learner English). Multilingual corpora (vs one-language corpora). Spoken corpora (vs written corpora vs mixed corpora), e.g. LLC (London-Lund Corpus of Spoken English).

Available Corpora

The Role of Computers in Corpus Linguistics Computerized corpora can be processed and manipulated rapidly at minimal cost. Computers can process machine-readable data accurately and consistently. Computers can avoid human bias in an analysis, thus making the results more reliable. Machine-readability allows further automatic processing to be performed on the corpus, so that corpus texts can be enriched with various metadata and linguistic analyses.
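
To make the point about rapid and consistent processing concrete, here is a minimal Python sketch of a word frequency list, one of the simplest things a computer can derive from machine-readable text. The two sample sentences, the simple regular-expression tokenizer, and the function names tokenize, frequency_list and per_million are illustrative assumptions, not part of the original slides or of any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a word is a run of ASCII letters, optionally followed by
    an apostrophe and more letters (e.g. "don't"). Real corpus tools use
    far more careful tokenization.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_list(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def per_million(count, total):
    """Normalize a raw count to a frequency per million tokens."""
    return count * 1_000_000 / total


# Hypothetical two-sentence "corpus", used only for illustration.
sample_texts = [
    "The corpus is a collection of texts.",
    "A corpus aims to be representative of the language.",
]

counts = frequency_list(sample_texts)
total = sum(counts.values())

for word, n in counts.most_common(4):
    print(word, n, round(per_million(n, total)))
```

Because the same rules are applied to every text, the counts are reproducible: running the sketch again on the same input gives the same list. Normalizing to a per-million figure is a common way to compare frequencies across corpora of different sizes.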

The Role of Computers in Corpus Linguistics Corpus linguistics is now inextricably linked to the computer, which has introduced incredible speed, total accountability, accurate replicability, statistical reliability and the ability to handle huge amounts of data.

COHA (Word Comparison)

COLLINS CORPUS

COCA (Keyword in Context)
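
A keyword-in-context display lines up every occurrence of a search word with a fixed amount of text to its left and right. The Python sketch below shows the general idea only; it is not how COCA itself is implemented, and the function name kwic, the width parameter and the sample text are assumptions made for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one concordance line per whole-word match of keyword.

    Matching is case-insensitive. Each line shows up to `width`
    characters of left context, the matched word in square brackets,
    and up to `width` characters of right context.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Hypothetical sample text, used only for illustration.
sample = (
    "A corpus is a collection of texts. "
    "The corpus aims to be representative."
)

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Aligning the keyword in a single column is what lets a reader scan the left and right contexts for recurring patterns.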

Indonesian Corpus

Specialised Corpus (Sundanese Corpus)

Corpus Software & Statistical Information