Introduction to Corpus Linguistics

Introduction to Corpus Linguistics
Rania Al-Sabbagh
Department of English, Faculty of Al-Alsun (Languages)
rsabbagh@alsun.asu.edu.eg

Have You Ever Wondered …
- how many irregular verbs there are in English?
- what the most frequently used words on Egyptian social media are?
- which is more common in Arabic: noun-based or verb-based sentences?
What is the most appropriate way to answer these questions: the qualitative or the quantitative way? And what is the difference between the two?
Week 1

Qualitative vs. Quantitative Research 1
The qualitative way has some drawbacks:
- It will only consider those cases known by the expert, who does not necessarily know everything.
- We will always be confined to the expert’s opinion and run the risk of becoming prescriptive.

Qualitative          Quantitative
A few examples       As many as possible
Experts’ opinions    Statistical data analysis

Qualitative vs. Quantitative Research 2
The quantitative way, however, is not confined to any one person’s knowledge; instead, we rely on real-world examples collected from multiple language users. Typically, this leads to new discoveries about language. The quantitative way is, therefore, descriptive rather than prescriptive.
However, it has drawbacks of its own: researchers typically have to decide which examples to compile and how many to collect.

Prescriptive vs. Descriptive Research
Prescriptive approaches are typically pedagogical: they try to teach people how to use language, or what is right and what is wrong in language usage. Descriptive approaches, however, represent language as it is used, without judging which usage is more standard or better.

Where does Corpus Linguistics Fit?
Corpus Linguistics (CL) is defined as the study of language as expressed in samples of real-world text. As per the definition, CL is a quantitative, descriptive field. The word corpus in CL stands for the data or the collection of texts that we compile to answer our research questions.

Quiz
True or False?
- Qualitative approaches are typically descriptive.
- Descriptive approaches rely on experts’ knowledge.
- Quantitative approaches use statistical data analysis.
- The experts’ opinion is crucial to quantitative research.
- Descriptive approaches try to teach people how to use language.
- Corpus linguistics is a quantitative, descriptive field of study.
- A corpus is a group of real-world texts like novels and social media posts.
- Quantitative approaches can be biased toward the expert’s personal preferences.
- Prescriptive approaches describe how language is actually used in the real world.

Corpus Linguistics and Modern Technology
The idea of doing quantitative and descriptive language analysis is not new. Furthermore, CL can be done manually; it does not depend on computers. However, modern computer technology has helped CL a lot because:
- Computers make the analysis faster, more accurate, and more consistent.
- Computers can store and analyze massive amounts of data: the era of BIG DATA.
- Computers are portable, and cloud services have made CL analysis accessible anywhere.
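
To make the point about speed and consistency concrete, here is a minimal Python sketch of the most basic CL operation, a word frequency list. The function name `word_frequencies` and the two-sentence sample text are illustrative assumptions, not part of the lecture, and the tokenization is deliberately crude.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs in a text.

    Assumption: a 'word' is any run of letters, digits, or underscores,
    lowercased. Real corpora usually need a more careful tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)

# Tiny made-up sample standing in for a real corpus.
sample_corpus = "The cat sat on the mat. The dog sat on the log."
freq = word_frequencies(sample_corpus)

# Show the single most frequent word form with its count.
top_word = freq.most_common(1)
print(top_word)
```

The same function runs unchanged on a text of a few sentences or a few million words, which is exactly what manual counting cannot offer.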

Corpus Linguistics and Other Disciplines: Translation
For translation, we can use corpus linguistics to know:
- The most frequent translation of given words.
- The different possible translations of a given word in different contexts.
- The most frequent translation of a given syntactic structure.

Corpus Linguistics and Other Disciplines: Stylistics
In stylistics, corpus linguistics can be used to identify an author’s recurrent themes, words, phrases, and sentences. This is usually helpful in copyright disputes, as in The Da Vinci Code case.

Corpus Linguistics and Other Disciplines: Sociolinguistics
Corpus linguistics can also be used for sociolinguistic studies. For example, we can use it to find out:
- Who swears more frequently: men or women?
- Which topics women or men frequently discuss on social media vs. in face-to-face communication.

Corpus Linguistics and Other Disciplines: Lexicography
Corpus Linguistics is crucial to lexicography, the industry of making dictionaries. All the examples and collocations included in a dictionary are derived from corpora (the plural of corpus). The lexicographer knows that a word is archaic because it is no longer used in modern corpora.
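
As a rough illustration of how collocation candidates can be pulled from a corpus, the sketch below counts adjacent word pairs (bigrams). This is an assumption-laden simplification: the function name `bigram_counts` and the sample sentences are made up, and this is raw pair counting, not the association measures a real dictionary project would likely apply.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs in a text.

    Assumption: punctuation is ignored, so a pair can span a sentence
    boundary (e.g. the last word of one sentence plus the first word
    of the next). A more careful version would split sentences first.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Pair each token with the token that follows it.
    return Counter(zip(tokens, tokens[1:]))

# Made-up sample text, not taken from any real corpus.
sample_corpus = "She drinks strong tea. He likes strong tea and strong coffee."
pairs = bigram_counts(sample_corpus)

print(pairs[("strong", "tea")])
print(pairs[("strong", "coffee")])
```

Pairs that recur far more often than chance would predict are the ones worth inspecting as collocations.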

Corpus Linguistics and Other Disciplines: Language Learning
Last but not least, corpus linguistics can be widely used to enhance language teaching and learning. For example:
- We can identify our students’ most frequent mistakes and address them in our teaching.
- We can compile books of the most frequent words in a particular language so learners can focus on these words. One example is here.
- We can learn the contexts in which a particular word can be used.
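
Looking up the contexts in which a word is used is commonly done with a keyword-in-context (KWIC) concordance. The following is a minimal sketch; the function name `kwic`, the `window` parameter, and the sample sentences are illustrative assumptions rather than a reference to any particular concordancer.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for every hit.

    Assumption: tokens are lowercased runs of word characters, and
    `window` is the number of tokens shown on each side.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Made-up learner-style sample text.
sample = "Please make a decision soon. They make mistakes, but we make progress."
hits = kwic(sample, "make", window=2)

for left, key, right in hits:
    # Align the keyword in a central column, as concordancers usually do.
    print(f"{left:>12} | {key} | {right}")
```

Reading the right-hand column down the page shows at a glance which nouns the learner pairs with the keyword.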

Corpus Linguistics Can’t …
Corpus Linguistics can’t answer questions like:
- Why do Egyptians mispronounce /θ/ as /s/, as in thanks?
- What social factors drive women to use more politeness markers?
- What will be the most frequent word on Egyptian social media next week?

Quiz
True or False?
- Computers facilitate CL analysis.
- In our case, big data refers to massive amounts of text.
- CL does not answer inferential questions; questions of why.
- CL cannot answer questions about the future of a given language.
- Quantitative, descriptive language analysis can be done manually.
- We can describe CL as an applied field with real-world applications.

Corpus Linguistics and Linguistic Theories
Corpus Linguistics studies are roughly classified into:
- Corpus-based studies: the point of departure is usually a linguistic theory or a claim made in the literature that we would like to test against new data, that is, a new corpus. For example, Lakoff (1979) claimed that women are more likely to use fillers than men. What would I do if I wanted to test this claim in 2017?
- Corpus-driven studies: the point of departure is usually a question that we are seeking an answer to. These studies are typically experimental or empirical in nature.
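
One plausible first step in a corpus-based test of a claim like the one about fillers is to compare normalized frequencies across two subcorpora. The sketch below is only an illustration under stated assumptions: the filler list `FILLERS`, the function `rate_per_thousand`, and both sample texts are invented placeholders, not data or results from any study.

```python
import re

# Hypothetical filler list; a real study would have to justify its own.
# Note that a form such as "like" is ambiguous (filler vs. verb), which
# plain string matching cannot resolve, so it is left out here.
FILLERS = {"um", "uh", "er"}

def rate_per_thousand(text, targets):
    """Occurrences of the target words per 1,000 tokens.

    Normalizing by corpus size lets subcorpora of different lengths
    be compared fairly.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in targets)
    return 1000 * hits / len(tokens)

# Placeholder transcripts standing in for two subcorpora.
subcorpus_a = "um I think it was uh fine"
subcorpus_b = "I think it was fine really"

rate_a = rate_per_thousand(subcorpus_a, FILLERS)
rate_b = rate_per_thousand(subcorpus_b, FILLERS)
print(round(rate_a, 1), round(rate_b, 1))
```

A difference in rates on its own would not settle the claim; it would normally be followed by a significance test on much larger, balanced subcorpora.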