Introduction: Statistics meets corpus linguistics

Introduction: Statistics meets corpus linguistics Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

What is statistics? Science, corpus linguistics and statistics

Think about and discuss
What is your personal experience with statistics (if any)?
Do you think statistics should be given a more prominent place at schools/universities?

What is statistics? Science, corpus linguistics and statistics
Statistics is a “science of collecting and interpreting data” (Diggle & Chetwynd 2011: vii).
Statistics is a discipline which helps us make sense of quantitative data (Brezina 2017, forthcoming).

Generalising…
EXAMPLE 1: Use of adjectives by fiction writers
508, 542, 552, 553, 565, 567, 570, 599, 656, 695, 699
mean: 591.45
median: 567
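The two summary values in Example 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable name `adjectives` is an illustrative choice, not something from the slides.

```python
from statistics import mean, median

# Adjective counts for the eleven fiction writers listed in Example 1.
adjectives = [508, 542, 552, 553, 565, 567, 570, 599, 656, 695, 699]

# The mean adds up all values and divides by the number of writers.
adjective_mean = mean(adjectives)

# The median is the middle value of the sorted list (the 6th of 11 here).
adjective_median = median(adjectives)

print(f"mean = {adjective_mean:.2f}, median = {adjective_median}")
```

The mean is higher than the median because the three largest counts (656, 695, 699) pull it upwards.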

Finding relationship…
EXAMPLE 2: Use of adjectives and verbs by fiction writers
Adjectives: 508, 542, 552, 553, 565, 567, 570, 599, 656, 695, 699
Verbs: 2339, 2089, 2056, 2276, 2233, 2241, 1995, 2043, 1976, 2062
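One common way to quantify a relationship between two such variables is Pearson's correlation coefficient r; the slide itself does not name a measure, so this choice is an assumption. The sketch below also shows that the two lists as transcribed cannot be paired directly, because the transcript gives eleven adjective counts but only ten verb counts. The `toy_` lists are invented purely to demonstrate the function and are not data from the book.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two variables measured on the same cases."""
    if len(xs) != len(ys):
        raise ValueError("both variables need one value per case")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Values exactly as they appear in the transcript: 11 vs. 10 items.
adjectives = [508, 542, 552, 553, 565, 567, 570, 599, 656, 695, 699]
verbs = [2339, 2089, 2056, 2276, 2233, 2241, 1995, 2043, 1976, 2062]

# Hypothetical toy data (NOT from the slides): verbs fall as adjectives rise.
toy_adjectives = [500, 550, 600, 650]
toy_verbs = [2300, 2200, 2100, 2000]

print(pearson_r(toy_adjectives, toy_verbs))
```

Calling `pearson_r(adjectives, verbs)` raises `ValueError`, which is the intended behaviour: a correlation needs one adjective value and one verb value per writer.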

Building models…
Example 3: What’s the area of Great Britain?
Model: a triangle with sides of 900 km and 520 km
Area = (900 × 520) / 2 = 234,000 km²

Building models…
Example 3: What’s the area of Great Britain?
Model: a triangle with sides of 900 km and 520 km
Area = (900 × 520) / 2 = 234,000 km²
Error: 4,152 km²
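The arithmetic behind Example 3 can be sketched as follows. The code assumes the stated error of 4,152 is in km² and is an overestimate (model minus reference); neither assumption is spelled out on the slide. The reference area is not given in the transcript, so it is only derived here from the two slide figures.

```python
# Triangle model of Great Britain, using the two lengths from the slide.
length_km = 900
width_km = 520

# Area of a triangle: half of base times height.
model_area_km2 = length_km * width_km / 2

# Error as stated on the slide (assumed: km², model minus reference).
error_km2 = 4152

# Reference area implied by the slide's own numbers.
implied_reference_km2 = model_area_km2 - error_km2

# Size of the error relative to the implied reference area.
relative_error = error_km2 / implied_reference_km2

print(f"model area: {model_area_km2:,.0f} km²")
print(f"implied reference area: {implied_reference_km2:,.0f} km²")
print(f"relative error: {relative_error:.1%}")
```

The point of the example is that a very simple model can be useful: under these assumptions the triangle misses the implied reference area by less than 2%.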

Two things we can do with stats
Describe data sets: frequencies, dispersions, collocations, graphs
Infer: statistical tests, 95% confidence intervals, p-values, null hypotheses
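The difference between describing and inferring can be illustrated with the adjective counts from Example 1. The sketch below first describes the sample (mean and standard deviation) and then infers a 95% confidence interval for the population mean. It assumes the eleven writers can be treated as a random sample and uses the t distribution with the two-sided critical value for 10 degrees of freedom (2.228) hard-coded; this is one standard way to build such an interval, not necessarily the procedure used in the book.

```python
from math import sqrt
from statistics import mean, stdev

# Adjective counts from Example 1 (eleven fiction writers).
adjectives = [508, 542, 552, 553, 565, 567, 570, 599, 656, 695, 699]

# DESCRIBE: summarise the sample itself.
n = len(adjectives)
sample_mean = mean(adjectives)
sample_sd = stdev(adjectives)  # sample standard deviation (n - 1)

# INFER: estimate where the population mean plausibly lies.
# Two-sided 95% critical value of the t distribution for df = n - 1 = 10.
T_CRIT_DF10 = 2.228
standard_error = sample_sd / sqrt(n)
margin = T_CRIT_DF10 * standard_error
ci_low = sample_mean - margin
ci_high = sample_mean + margin

print(f"mean = {sample_mean:.2f}, SD = {sample_sd:.2f}")
print(f"95% CI: [{ci_low:.1f}, {ci_high:.1f}]")
```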

Basic statistical terminology

Basic statistical terminology: review
assumption, case, confidence interval, dataset, dispersion, distribution, effect size, normal distribution, null hypothesis, outlier, p-value, robust, rogue value, standard deviation, statistical measure, statistical test, variable

Statistical test
Hypothesis (e.g. Men and women use language differently.)
Null hypothesis: There is no difference between how men and women use language.
Corpus (male) vs. Corpus (female): 16 vs. 14
Is the difference due to chance or is it statistically significant?
A scientific hypothesis is the initial building block in the scientific method. Many describe it as an “educated guess,” based on prior knowledge and observation, as to the cause of a particular phenomenon. It is a suggested solution for an unexplained occurrence that does not fit into currently accepted scientific theory. A hypothesis is the inkling of an idea that can become a theory, which is the next step in the scientific method. A key function of this step is deriving predictions from the hypothesis about the results of future experiments, then performing those experiments to see whether they support the predictions.

Statistical test (cont.)
Null hypothesis → Statistical test → p-value
How much evidence do we have in the data to reject the null hypothesis?
p-value: the probability of seeing values at least as extreme as observed if the null hypothesis were true.
p < 0.05: reject the null hypothesis
p > 0.05: do not reject the null hypothesis
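The definition of the p-value can be made concrete with an exact permutation test, sketched below. The slides do not say which test is used, so this is only one possible illustration. If the null hypothesis were true, the group labels would be interchangeable, so the code counts how many relabellings produce a difference in means at least as extreme as the observed one. The per-text frequencies in `male_texts` and `female_texts` are hypothetical values invented for the example.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    relabellings = 0
    # Try every way of assigning n_a of the pooled values to group A.
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point ties count as "as extreme".
        if diff >= observed - 1e-12:
            extreme += 1
        relabellings += 1
    return extreme / relabellings

def decide(p_value, alpha=0.05):
    """Decision rule from the slide, with alpha = 0.05 as the threshold."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

# Hypothetical per-text frequencies of some feature (NOT from the slides).
male_texts = [16, 18, 15, 17, 19]
female_texts = [14, 12, 13, 15, 11]

p = permutation_p_value(male_texts, female_texts)
print(p, decide(p))
```

Enumerating all relabellings is only practical for small samples; for larger ones a random subset of relabellings is normally used instead.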

Building of corpora and research design

Think about and discuss
How many texts do we need to collect to create a corpus?
What does it mean to say that a corpus is representative?
Are large corpora always better than small corpora?

Corpus as a sample

[Figure: corpus sizes labelled 20B, 500M, 100M and 1M]

Representative? Unbiased?
[Figure: several corpora compared for representativeness and bias]

Corpus sampling

Levels of analysis in corpus linguistics

1) DATA EXPLORATION
Key questions: What are the main tendencies in the data?
Key terms: graphs, means, SDs

2) INFERENTIAL STATISTICS: AMOUNT OF EVIDENCE
Key questions: Do we have enough evidence to reject the null hypothesis? Is the effect that we see in the sample due to chance (sampling error) or does it reflect something true about the population?
Key terms: statistically significant, p-values, confidence intervals

3) EFFECT SIZE
Key questions: How large is the effect in the sample? (standardised measure)
Key terms: effect size, e.g. Cohen’s d, r

4) LINGUISTIC INTERPRETATION
Key questions: Is the effect linguistically/socially meaningful?
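The slide names Cohen’s d as an example of a standardised effect size. A minimal sketch of the usual pooled-standard-deviation version follows; the function name `cohens_d` and the two small groups are illustrative assumptions, not data from the book.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)

# Hypothetical values (NOT from the slides): means 4 and 3, both SDs 2.
group_a = [2, 4, 6]
group_b = [1, 3, 5]

d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in standard-deviation units, it can be compared across studies that measure different features, which is what makes it a separate dimension from the p-value.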

Exploring data and data visualisation

Think about and discuss
Why is looking critically at data before analysis important?
What types of errors can we encounter in a dataset?
What types of graphs do you know?

Things to remember
Corpus linguistics is a scientific method.
Successful application of statistical techniques in corpus linguistics depends on the use of a well-constructed unbiased corpus.
Statistics uses mathematical expressions to help us make sense of quantitative data.
Effective visualization summarizes patterns in data without hiding important features.
Although most visible, p-values form only a (small) part of statistics.
‘Statistical significance’, ‘practical importance’ and ‘linguistic meaningfulness’ are three separate dimensions which shouldn’t be confused.