Corpus Linguistics Lexicography

Questions for lexicography in corpus linguistics
How common are different words?
How common are the different senses of a given word?
Do words have systematic associations with other words?
Do words have systematic associations with particular registers or dialects?

Six major research questions in lexicography
1. What are the meanings associated with a particular word?
2. What is the frequency of a word relative to other related words?
3. What non-linguistic association patterns does a particular word have (e.g., with registers, historical periods, or dialects)?
4. What words commonly co-occur with a particular word, and how are these “collocational” sequences distributed across registers?
5. How are the senses and uses of a word distributed?
6. How are seemingly synonymous words used and distributed in different ways?

Meaning of Words
KWIC (key word in context) concordance lines can usually reveal the different meanings of a word (CL: 27, Figs. 2.1 and 2.2).
Applications: the meanings of a particular word in a textbook, across learner groups, or within a register.
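A KWIC display is easy to produce yourself. The sketch below is a minimal illustration, not a full concordancer: it assumes plain-text input, matches the keyword as a whole word regardless of case, and keeps a fixed number of characters (the hypothetical `width` parameter) on each side so that the keywords line up in one column.

```python
import re


def kwic(text, keyword, width=30):
    """Return KWIC (key word in context) lines for each whole-word match of keyword."""
    # Collapse line breaks and repeated spaces so each concordance line stays on one row.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


sample = ("They made a deal with the bank. It took a great deal of effort. "
          "Big deal, she said.")
for line in kwic(sample, "deal"):
    print(line)
```

Reading down the aligned column makes the different uses visible at a glance: an agreement (made a deal), an amount (a great deal) and the idiom big deal.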

Frequency of Words
A frequency list can serve this purpose. Listing decisions include how to treat the various forms of a word, what basis to use for comparison, and how to handle a word with multiple grammatical functions.
Forms of a word: in a list of all the words in a corpus, counting lemmas is more useful than counting raw word forms (Figs. 2.3 and 2.4, CL: 28, 29).
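The difference between a word-form list and a lemma list can be sketched as follows. The lemma table here is a hypothetical, hand-written stand-in; a real study would work from a lemmatized corpus or use a lemmatizer. Forms that are missing from the table are counted as they are.

```python
import re
from collections import Counter

# Hypothetical mini lemma table for illustration only.
LEMMAS = {"deals": "deal", "dealt": "deal", "dealing": "deal",
          "makes": "make", "made": "make", "making": "make"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(text, by_lemma=False):
    """Return (item, count) pairs, most frequent first, by word form or by lemma."""
    tokens = tokenize(text)
    if by_lemma:
        # Map each word form to its lemma; unknown forms stay unchanged.
        tokens = [LEMMAS.get(token, token) for token in tokens]
    return Counter(tokens).most_common()


sample = "He deals cards. She dealt with it. They are dealing now. A deal is a deal."
print(frequency_list(sample)[:3])
print(frequency_list(sample, by_lemma=True)[:3])
```

In the word-form list DEAL is spread over four separate entries (deal, deals, dealt, dealing). The lemma list gathers them into one count.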

Frequency of Words
Word frequency can be reported as a percentage or normalized to a rate per one million words. This shows how common a word is and helps with the problems caused by a small corpus, because counts from corpora of different sizes become comparable.
A tagged corpus shows the distribution of the grammatical forms of a word (Fig. 2.5, CL: 32).
Applications: the distribution of a word in a curriculum, a test, a book, or a register.
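Both ways of reporting frequency are simple arithmetic on the raw count and the corpus size. The sketch below also computes the share of each grammatical tag for one word form. It assumes a corpus stored as (word, tag) pairs, and the tag labels are illustrative only, not taken from any particular tagset. The figures in the example are hypothetical.

```python
from collections import Counter


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per one million words."""
    return count * 1_000_000 / corpus_size


def percentage(count, corpus_size):
    """Express a raw count as a percentage of all words in the corpus."""
    return count * 100 / corpus_size


def pos_distribution(tagged_tokens, word):
    """Share of each part-of-speech tag among the occurrences of one word form."""
    tags = Counter(tag for token, tag in tagged_tokens if token.lower() == word)
    total = sum(tags.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in tags.items()}


# Hypothetical figures: 45 hits in a 150,000-word corpus.
print(per_million(45, 150_000), percentage(45, 150_000))
tagged = [("deal", "NOUN"), ("deal", "VERB"), ("Deal", "NOUN"), ("the", "DET")]
print(pos_distribution(tagged, "deal"))
```

The function multiplies before dividing, so the result is exact whenever it can be represented exactly.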

Distribution across registers
Overall characterizations of a word can be misleading, because words are often used in very different ways in different registers. Table 2.1 (CL: 32) shows the frequency of DEAL as a noun by register, and Table 2.2 (CL: 34) shows the frequency of DEAL as a noun and as a verb in two registers.
Raw count: the actual number of occurrences of a word.
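A by-register raw count, like the one in Table 2.1, amounts to tallying the tagged occurrences of a word separately for each register. The sketch below assumes every token is stored as a (register, word, tag) triple. That layout, the register names and the tags are assumptions made for illustration and do not describe the format of any real corpus.

```python
def raw_counts_by_register(tokens, word, tag):
    """Raw count of `word` carrying `tag` in each register (zero counts included)."""
    counts = {}
    for register, token, token_tag in tokens:
        # Register every register seen, so registers without a hit show up as 0.
        counts.setdefault(register, 0)
        if token.lower() == word and token_tag == tag:
            counts[register] += 1
    return counts


# Hypothetical tagged tokens, not real corpus data.
tokens = [("fiction", "deal", "NOUN"), ("fiction", "deal", "VERB"),
          ("academic", "deal", "NOUN"), ("academic", "Deal", "NOUN"),
          ("news", "the", "DET")]
print(raw_counts_by_register(tokens, "deal", "NOUN"))
```

Raw counts like these can only be compared across registers when the registers are the same size. If they are not, the counts must be normed first.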

Distribution of senses across registers
One way to begin investigating the senses of a word is to look at its collocates. Identifying the most common collocates of a word is an efficient and effective way to begin analyzing its senses.
Table 2.3: Common collocates of DEAL as a noun (CL: 37).
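A first pass at collocates can be sketched by counting every word that falls within a few tokens on either side of the node word. This version uses raw co-occurrence frequency only and computes no association measure such as MI or log-likelihood. The span of 4 is an arbitrary default and is not taken from the source.

```python
from collections import Counter


def collocates(tokens, node, span=4, top=10):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            # Do not count the node word as its own collocate.
            counts.update(w for w in window if w != node)
    return counts.most_common(top)


tokens = "a great deal of time and a good deal of money ; the deal was signed".split()
print(collocates(tokens, "deal", span=2))
```

Collocates such as great, good and of already point to the amount sense, and was signed points to the agreement sense.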

Distribution of senses across registers
The sense referring to an amount (as in a great deal) is the most common meaning of DEAL as a noun in both academic prose and fiction, and other uses are relatively common in fiction.
Table 2.4: Common dictionary definitions of DEAL as a noun.
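Once the concordance lines have been assigned to senses by hand, the per-register sense distribution is just a set of proportions. The sketch below assumes that the coded lines are available as (register, sense) pairs. The sense labels and the data are hypothetical.

```python
from collections import Counter, defaultdict


def sense_proportions(coded_lines):
    """Proportion of each sense within each register, from hand-coded concordance lines."""
    by_register = defaultdict(Counter)
    for register, sense in coded_lines:
        by_register[register][sense] += 1
    result = {}
    for register, senses in by_register.items():
        total = sum(senses.values())
        result[register] = {sense: n / total for sense, n in senses.items()}
    return result


# Hypothetical hand-coded lines, not real corpus data.
coded = [("academic", "amount"), ("academic", "amount"), ("academic", "agreement"),
         ("fiction", "amount"), ("fiction", "big deal")]
print(sense_proportions(coded))
```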

Distribution of senses across registers
Findings:
1. In some dictionaries, DEAL in the sense of an amount is not listed until very late in the entry.
2. The use of big deal to express unimportance is not covered by these dictionaries.
3. Register differences are disregarded by all of these dictionaries.

Synonymous words
Table 2.5: Frequency of big, large and great.
There is great variation between the two registers. Big is over ten times more common in fiction than in academic prose, and great is over one and a half times more common in fiction.
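Statements such as "ten times more common in fiction" compare rates rather than raw counts, because the two registers need not be the same size. The sketch below normalizes each count by the size of its own register before computing the ratio. The counts and sizes in the example are hypothetical and are not the figures in Table 2.5.

```python
def times_more_common(count_a, size_a, count_b, size_b):
    """How many times more frequent a word is in register A than in register B.

    Equivalent to (count_a / size_a) / (count_b / size_b), rearranged so the
    division happens once.
    """
    if count_b == 0:
        raise ValueError("word does not occur in register B")
    return (count_a * size_b) / (count_b * size_a)


# Hypothetical: 150 hits in 500,000 words of fiction vs 30 hits in 1,000,000 words of academic prose.
print(times_more_common(150, 500_000, 30, 1_000_000))
```

The raw counts differ by a factor of five, but the rates (300 vs 30 per million words) differ by a factor of ten.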

Synonymous words
Large is three times more common in academic prose than in fiction.
Big: refers to physical size, its most common use in both registers.
Large: refers to quantity or amount in academic prose, but to physical size in fiction.
Great: used in chunks in academic prose (great deal, great number); in fiction it is used in a much wider range of senses, e.g. great man.

Distribution across registers
Normed count: a raw count adjusted to a fixed basis (e.g., per one million words), so that samples of different sizes can be compared.
In Table 2.2 the total sample shows only a slight difference between DEAL as a noun and as a verb, but the difference changes a great deal when registers are taken into account. Find the possible causes.
Applications: the distribution of different forms of words in the college English curriculum and in students' writing.
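Norming divides each raw count by the size of the sample it came from and scales the result to the chosen basis. The sketch below uses hypothetical register sizes and counts, not the figures in Table 2.2. It shows how two totals pooled over registers can be identical while the per-register picture differs sharply.

```python
def normed_counts(raw_counts, register_sizes, basis=1_000_000):
    """Convert raw counts into normed counts (occurrences per `basis` words)."""
    return {reg: raw_counts[reg] * basis / register_sizes[reg] for reg in raw_counts}


# Hypothetical data for illustration only.
sizes = {"fiction": 1_000_000, "academic": 4_000_000}
noun = {"fiction": 50, "academic": 350}
verb = {"fiction": 250, "academic": 150}

# Pooled over both registers, the two uses look identical (400 vs 400).
print(sum(noun.values()), sum(verb.values()))
# Normed per register, they pattern very differently.
print(normed_counts(noun, sizes))
print(normed_counts(verb, sizes))
```

Other bases (e.g. `basis=1_000` for per-thousand-word rates) work the same way. The key requirement is that every count is compared on the same basis.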