Dr. Radhika Mamidi Corpus

What is a Corpus?
- A corpus (plural corpora), or text corpus, is a large and structured set of texts (now usually stored and processed electronically).
- It is used for statistical analysis and hypothesis testing, checking occurrences, or validating linguistic rules.

Corpus
A corpus is:
- Spoken (transcribed) or written
- In any language
- Usually naturally occurring
- Stored electronically
- Searched using dedicated software
The data is processed using the techniques of frequency, phraseology and collocation.
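As a rough illustration of the frequency and collocation techniques named on this slide, here is a minimal Python sketch. The function names (`tokenize`, `word_frequencies`, `collocates`), the window size and the sample text are assumptions made for illustration; they do not come from the slides, and real corpus software does considerably more.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)

def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            counts.update(tokens[lo:i])                   # left context
            counts.update(tokens[i + 1:i + 1 + window])   # right context
    return counts

# Toy sample text (an assumption, not a real corpus).
sample = ("I offered him a cup of tea. She made a cup of tea. "
          "He drank a cup of coffee.")
tokens = tokenize(sample)
freq = word_frequencies(tokens)
cup_collocates = collocates(tokens, "cup", window=2)

print(freq.most_common(4))
print(cup_collocates.most_common(4))
```

On a real corpus the same counts would be computed over millions of tokens, and raw co-occurrence counts would usually be replaced by an association measure.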

Some notable text corpora
English language:
- Bank of English
- British National Corpus [BNC]
- Brown Corpus
- Lancaster-Oslo/Bergen Corpus [LOB]
- International Corpus of English
- Oxford English Corpus
- Scottish Corpus of Texts & Speech

Types of Corpora
A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus).
- Specialized Corpus
- General Corpus
- Learner’s Corpus
- Pedagogic Corpus
- Diachronic Corpus
- Monitor Corpus
- Comparable Corpus
- Parallel Corpus
Monolingual Bilingual

Concordance lines
- Concordance lines show every instance of the word you have asked for (or a sample of these), with a few words before and after;
- They can be sorted to put together similar co-texts;
- They encourage observation of recurring patterns (‘samenesses’).
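The behaviour described above (show each hit with some context, then sort hits so that similar co-texts sit together) can be sketched in a few lines of Python. This is a minimal illustration, not the way any particular concordancer works; the function names and the 30-character context width are assumptions, and the sample text is stitched together from fragments on the "cup of tea" slides.

```python
import re

def concordance(text, phrase, width=30):
    """Return (left, match, right) tuples for every occurrence of `phrase`."""
    lines = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

def sort_by_left_word(lines):
    """Sort concordance lines by the word immediately before the match."""
    def key(line):
        words = line[0].split()
        return words[-1].lower() if words else ""
    return sorted(lines, key=key)

def format_line(line, width=30):
    """Right-align the left context so the matches line up in one column."""
    left, match, right = line
    return f"{left.rstrip():>{width}} {match} {right.lstrip()}"

text = ("She sat down and picked up the cup of tea I'd poured for her. "
        "I will have a nice cup of tea with him before the game. "
        "This won't be everybody's cup of tea.")

hits = concordance(text, "cup of tea")
sorted_hits = sort_by_left_word(hits)
for hit in sorted_hits:
    print(format_line(hit))
```

Sorting on the word to the left is only one choice; sorting on the first word to the right works the same way with a different key function.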

Example: Cup of tea (1)
and we'll discuss it over a cup of tea." He handed back the ID and
in and ask me to make her a cup of tea. When I refuse she'll say
she wouldn't even accept a cup of tea because she didn't have the
where to begin. I offered him a cup of tea and he blurted out: `I will
trolley approaches and a cup of tea is set down on her locker. T
play quietly, while I have a cup of tea, I'll cook you some chips f
on the sofa enjoying a nice cup of tea. Since I've done all this on
much. I will have a nice cup of tea with him before the game and
into my uniform, have a quick cup of tea, and then get breakfast read
for a quick, or not so quick, cup of tea to return the compliment in
a chat about old times over a cup of tea and a biscuit." Eurosta
She sat down and picked up the cup of tea I'd poured for her. She dran
I sure as hell remembered the cup of tea because I mean because it a

Example: Cup of tea (2)
hours on motorways is not my cup of tea, but I do like visiting new p
her. `No, really, she's not my cup of tea. But the powerful deputy edi
lecturers were more my cup of tea than homicidally tanked-up l
of Ruby -- she's not everyone's cup of tea. By the way, I understand yo
marketing may not be everyone's cup of tea. There's an old advertising
two This is much more Linda's cup of tea: a three-bedroom, brand-new
play. This won't be everybody's cup of tea; but you'd be hard pushed to
which are not everybody's cup of tea. And the annual management c
catching. But if Leo isn't your cup of tea, you might like AMERICAN BE
have been here. It's more your cup of tea, as it were, with its High C
was saying. `Not quite your cup of tea, isn't that what you say?" S

Uses of Concordance lines
- Concordance lines make recurrences of pattern apparent.
- They encourage us to see that:
  - Pattern and meaning are associated
  - Many words and phrases occur in a restricted set of contexts
- They encourage us to make unexpected connections between items.
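One way to see that pattern and meaning are associated is to group the "cup of tea" lines by the word immediately to the left of the phrase. The sketch below does this for a handful of fragments copied from the two example slides. The possessive test is a deliberately crude heuristic introduced here for illustration (an assumption, not a method from the slides); in these examples the possessive pattern lines up with the idiomatic use in slide (2), and the other pattern with the literal drink in slide (1).

```python
import re
from collections import Counter

# Fragments copied from the two "cup of tea" concordance slides.
lines = [
    "and we'll discuss it over a cup of tea.",
    "on the sofa enjoying a nice cup of tea.",
    "into my uniform, have a quick cup of tea, and then get breakfast",
    "She sat down and picked up the cup of tea I'd poured for her.",
    "hours on motorways is not my cup of tea, but I do like visiting",
    "she's not everyone's cup of tea.",
    "This won't be everybody's cup of tea;",
    "But if Leo isn't your cup of tea, you might like",
]

# Crude heuristic list (an assumption): possessive determiners.
POSSESSIVES = {"my", "your", "his", "her", "our", "their"}

def left_word(line, phrase="cup of tea"):
    """Return the lowercased word immediately before `phrase`, or None."""
    m = re.search(r"(\S+)\s+" + re.escape(phrase), line, flags=re.IGNORECASE)
    return m.group(1).lower() if m else None

def pattern_of(word):
    """Classify the left-hand word as possessive or other."""
    if word is None:
        return "no match"
    if word in POSSESSIVES or word.endswith("'s"):
        return "possessive + cup of tea"
    return "other + cup of tea"

counts = Counter(pattern_of(left_word(line)) for line in lines)
for pattern, n in counts.items():
    print(f"{pattern}: {n}")
```

A heuristic like this only surfaces candidates; reading the sorted lines is still what confirms the link between pattern and meaning.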

Use of Corpus: Language teaching
- The most frequent words are taught with their most frequent senses.
- Concordance lines are used to study grammar patterns and to note the differences between easily confused pairs.
E.g. interested and interesting:
- interested is used in the phrase ‘interested in’; the pattern ‘someone is interested in something’ is the more frequent one.
- interesting is nearly always used before a noun; the pattern ‘an interesting thing’ is the more frequent one.

Use of Corpus: Dictionary making
- New words, phrases and collocations, new meanings of old words, and real examples are added using a corpus.
- Frequency plays an important role in making the entries for each headword.
- The 1st and 2nd editions of the Longman Dictionary of Contemporary English were written without a corpus; the 3rd edition (1995) was written using one.
- In the 3rd edition you will find example sentences taken from the corpus, as well as a larger number of senses.
Example:
- New words: internet
- New meanings: file, folder, save
- New shades of meaning: ‘I know’

Use of Corpus: Translation
- Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora.
- Aligned parallel corpora are useful for translators studying source-language (SL) and target-language (TL) equivalents.
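To make "aligned" concrete, the sketch below stores a parallel corpus as a list of sentence pairs and looks up every pair whose source sentence contains a given phrase, which is roughly what a translator does when checking TL equivalents. The `AlignedPair` class, the `find_equivalents` function and the three English-French sentence pairs are all invented for illustration; real aligned corpora are far larger and are often aligned below sentence level as well.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlignedPair:
    source: str  # source-language (SL) sentence
    target: str  # target-language (TL) sentence aligned with it

# Invented toy data (English -> French), not taken from any real corpus.
corpus = [
    AlignedPair("I would like a cup of tea.",
                "Je voudrais une tasse de thé."),
    AlignedPair("Opera is not my cup of tea.",
                "L'opéra, ce n'est pas ma tasse de thé."),
    AlignedPair("She reads the newspaper.",
                "Elle lit le journal."),
]

def find_equivalents(pairs, phrase):
    """Return the aligned pairs whose SL sentence contains `phrase`."""
    needle = phrase.lower()
    return [p for p in pairs if needle in p.source.lower()]

matches = find_equivalents(corpus, "cup of tea")
for pair in matches:
    print(pair.source, "=>", pair.target)
```

Because each SL sentence carries its TL counterpart with it, a search on one side immediately shows how the phrase was rendered on the other.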

Use of Corpus: Natural Language Processing
- To make corpora more useful for linguistic research, they are often subjected to a process known as annotation.
- An example of annotating a corpus is part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags.
- Other types of annotation: syntactic, semantic and discourse.
- Such annotated corpora are used to build NLP tools like POS taggers, syntactic parsers or semantic analyzers.
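The link between an annotated corpus and a tool built from it can be sketched as follows. The code reads a tiny POS-tagged corpus in `word/TAG` format and derives a baseline tagger that gives each word its most frequent tag. The three tagged sentences are invented, the Penn Treebank-style tag names are an assumption (the slides do not name a tagset), and a most-frequent-tag baseline is far simpler than the taggers used in practice.

```python
from collections import Counter, defaultdict

# Invented toy corpus in word/TAG format; Penn Treebank-style tags assumed.
tagged_text = """
I/PRP offered/VBD him/PRP a/DT cup/NN of/IN tea/NN
She/PRP likes/VBZ tea/NN
The/DT cup/NN is/VBZ on/IN the/DT table/NN
"""

def parse_tagged(text):
    """Turn word/TAG lines into lists of (word, tag) pairs, one list per sentence."""
    sentences = []
    for line in text.strip().splitlines():
        pairs = [tuple(tok.rsplit("/", 1)) for tok in line.split()]
        sentences.append(pairs)
    return sentences

def train_unigram_tagger(sentences):
    """Map each lowercased word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_words(words, model, default="NN"):
    """Tag words with the model; unseen words get the `default` tag."""
    return [(w, model.get(w.lower(), default)) for w in words]

sentences = parse_tagged(tagged_text)
model = train_unigram_tagger(sentences)
result = tag_words("She offered him the tea".split(), model)
print(result)
```

The tagger knows nothing beyond what the annotation supplies, which is why the size and quality of the annotated corpus matter so much for tools of this kind.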