«A chi-square test showed that...» – or did it really?
Bård Uri Jensen

«Allowing [statistical software] to do our thinking is a sure recipe for disaster.» (Good & Hardin, 2012, p. xi)

«Simple» statistical tests:
• the chi-square (X²) test
• the t-test

Statistical hypothesis testing
1. Formulate a hypothesis.
• E.g. in Norwegian L2, Vietnamese learners make more TENSE errors than Somali learners.
2. Formulate a null hypothesis.
• Vietnamese and Somali learners have the same rate of TENSE errors.
3. «Disprove» the null hypothesis = demonstrate that our data would be unlikely if it were true.
• E.g. if the null hypothesis were true, a difference at least as large as the observed one would occur less than 5% of the time.
• = «Significance»
We choose α according to what we consider an acceptable risk of a false conclusion, i.e. of rejecting a null hypothesis that is actually true.
• Often 5% in linguistic research
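As a rough sketch of step 3, the Python below runs Pearson's chi-square test on a 2×2 table and compares its p-value with α. The counts are hypothetical (not from any study), and we assume each of the 200 tokens comes from a different learner, so the independence condition holds. The helper names `chi2_2x2` and `p_value_df1` are our own. For 1 degree of freedom, the upper-tail probability can be written with the complementary error function, so only the standard library is needed. Note that X² tests for a difference in either direction.

```python
import math

def chi2_2x2(table):
    """Pearson's X² (no continuity correction) for a 2x2 table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal rates
            expected = row_totals[i] * col_totals[j] / n
            x2 += (table[i][j] - expected) ** 2 / expected
    return x2

def p_value_df1(x2):
    """Upper-tail probability of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x2 / 2))

# Hypothetical counts, one token per learner: [TENSE errors, correct forms]
table = [[30, 70],   # Vietnamese L1
         [18, 82]]   # Somali L1
alpha = 0.05         # accepted risk of rejecting a true null hypothesis

x2 = chi2_2x2(table)
p = p_value_df1(x2)
print(f"X2(1) = {x2:.2f}, p = {p:.3f}")
print("reject the null hypothesis" if p < alpha else "keep the null hypothesis")
```

With these made-up counts, p ends up just under .05. The arithmetic is the easy part. The conditions discussed below decide whether the p-value means anything.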

Conditions of use
• Independent observations → chi-square test, t-test
• Parametric assumptions (e.g. normally distributed populations) → t-test
• The dangers of repeated testing → any test

A simple example from ornithology

A simple example from corpus linguistics
The observations should be independent. This is an important condition of use for
• the chi-square test
• the t-test
In practice, the observations should come from different individuals.
«Chi-square is a much-abused test in second language research studies, and often one of its assumptions (that of independence of data) is violated as a matter of course.» Larson-Hall (2010, p. 206)
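A small simulation can illustrate why the independence condition matters. This is a sketch under explicitly hypothetical assumptions. Two groups each have 10 speakers, and every speaker contributes 20 tokens. Each speaker has a personal error rate drawn uniformly between 0.1 and 0.9. Both groups come from the same population, so the null hypothesis is true, and a test that works as advertised should reject it only about 5% of the time (α = .05). Pooling the tokens as if they were independent makes the chi-square test reject far more often. With one token per speaker, the same code stays close to the nominal rate. All function names are our own.

```python
import math
import random

def chi2_2x2(a, b, c, d):
    """Pearson's X² (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0

def p_value_df1(x2):
    """Upper-tail probability of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x2 / 2))

def simulate_group(rng, n_speakers, tokens_each):
    """Hypothetical group: each speaker has a personal error rate in [0.1, 0.9]."""
    errors = 0
    for _ in range(n_speakers):
        rate = rng.uniform(0.1, 0.9)
        errors += sum(rng.random() < rate for _ in range(tokens_each))
    return errors, n_speakers * tokens_each - errors

def naive_rejection_rate(n_speakers=10, tokens_each=20, n_sims=500, alpha=0.05, seed=1):
    """Share of simulations in which a chi-square test on pooled tokens rejects H0.
    Both groups are drawn from the same population, so every rejection is false."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a, b = simulate_group(rng, n_speakers, tokens_each)
        c, d = simulate_group(rng, n_speakers, tokens_each)
        if p_value_df1(chi2_2x2(a, b, c, d)) < alpha:
            rejections += 1
    return rejections / n_sims

print("20 tokens from each of 10 speakers:", naive_rejection_rate())
print("1 token from each of 200 speakers:", naive_rejection_rate(n_speakers=200, tokens_each=1))
```

The two calls analyze the same number of tokens per group. The only difference is whether the tokens share speakers.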

Example 1: Chi-square test, non-independent observations
Blom & Paradis 2013
• Journal of Speech, Language, and Hearing Research
• On past tense production in L2 children with language impairment
48 children with English as L2
Overregularization of past tense
• Hypothesis: less common in verb stems ending in /d/ or /t/

                    overregularization   zero marking
  d# or t#                  16                69
  others                    42                98

X²(1) = 3.45, p (one-sided) = …
Problem: n = 225 observations, but N = 48 children.
The observations are not independent, so the result is invalid.
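The Python below checks the arithmetic of Example 1. The four cell counts are read from the slide's table, and recomputing Pearson's X² from them gives 3.45, which matches the reported value. The one-sided p-value is our own computation from these counts, not a figure quoted from the article. The sketch shows that the calculation itself is fine. The problem is what goes into it: the test treats every token as an independent observation, although several tokens come from each child.

```python
import math

# Cell counts from the slide's table (Blom & Paradis 2013);
# each row is (overregularization, zero marking)
dt_stems = (16, 69)      # verb stems ending in /d/ or /t/
other_stems = (42, 98)   # all other verb stems
children = 48            # number of children who produced the tokens

(a, b), (c, d) = dt_stems, other_stems
n = a + b + c + d
# Shortcut formula for Pearson's X² on a 2x2 table (no continuity correction)
x2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_two_sided = math.erfc(math.sqrt(x2 / 2))
# The hypothesis is directional (less overregularization after /d/, /t/),
# so the one-sided p is half the two-sided one if the data go that way.
predicted_direction = a / (a + b) < c / (c + d)
p_one_sided = p_two_sided / 2 if predicted_direction else 1 - p_two_sided / 2

print(f"X2(1) = {x2:.2f}, one-sided p = {p_one_sided:.3f}")
print(f"n = {n} tokens from N = {children} children, "
      f"about {n / children:.1f} tokens per child")
```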

- or did it really? Example 1: Chi-squared test, non-independent observations Solution A:  Pick just one observation from each author/speaker “To exclude the author as one more relevant factor, the database was cleaned so that there is only one example for each verb from any single author.” Sokolova 2012, p. 94

Example 1: Chi-square test, non-independent observations
Solution A:
• Pick just one observation from each author/speaker
• Sokolova 2012
Solution B:
• Calculate average values for each informant
• Use the average values as independent observations
• Test significance with an appropriate test, e.g. a t-test or U-test
• Gujord 2013
Both these solutions might require a larger corpus!
«Solution» C:
• Alter the research question
• Danckaert 2011
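Solution B can be sketched as follows. We use hypothetical counts for five informants per L1 group (not Gujord's data). Each informant's error rate counts as one observation. The groups are compared with an exact Mann-Whitney U test (the U-test mentioned above). We get its null distribution by enumerating every way of splitting the pooled rates into two groups of the original sizes. The function names are our own. Enumeration is only feasible for small groups. For larger samples, a library routine such as SciPy's `scipy.stats.mannwhitneyu` is the practical choice.

```python
from itertools import combinations

# Hypothetical per-informant counts: (TENSE errors, obligatory contexts)
informants = {
    "Vietnamese": [(6, 20), (9, 25), (4, 18), (12, 30), (7, 22)],
    "Somali": [(3, 21), (5, 24), (2, 19), (6, 28), (4, 20)],
}

def rates(counts):
    """One value per informant: that informant's error rate."""
    return [errors / total for errors, total in counts]

def u_statistic(x, y):
    """Mann-Whitney U for x: pairs in which x is larger, ties counting one half."""
    return sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0 for xi in x for yi in y)

def exact_u_test(x, y):
    """U for x against y, and a two-sided exact p-value obtained by enumerating
    every way of splitting the pooled values into groups of the original sizes."""
    pooled = x + y
    center = len(x) * len(y) / 2          # expected U under the null hypothesis
    u_obs = u_statistic(x, y)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [v for i, v in enumerate(pooled) if i not in chosen]
        if abs(u_statistic(xs, ys) - center) >= abs(u_obs - center) - 1e-9:
            extreme += 1
        total += 1
    return u_obs, extreme / total

viet = rates(informants["Vietnamese"])
somali = rates(informants["Somali"])
u, p = exact_u_test(viet, somali)
print(f"U = {u}, exact two-sided p = {p:.4f}, n = {len(viet)} + {len(somali)} informants")
```

The test now rests on 5 + 5 observations rather than the hundred or more tokens behind them. This is why these solutions may require a larger corpus.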

Example 1: Chi-square test, non-independent observations
Solution B:

Example 2: T-test, non-independent observations
Klavan 2012
• PhD thesis, University of Tartu
• Investigation of the adposition ‘peal’ and the adessive case
450 observations of each, from 2 corpora
t = 8.02, p < …
Conclusion: adessive phrases are longer than ‘peal’ phrases.
Problem: the observations are not independent. The conclusion is invalid.

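Example 3: T-test, non-normal populations
Hunter (2011, p. 48)
• PhD thesis, University of Birmingham
• On grammaticality judgements by L2 students
Conclusion: the accuracy (max. = 1) for the teacher group (M = .98, SD = .14) was significantly higher than for the student group (M = .64, SD = .49), t(1) = 4.9, p < .001.
Problem:
• Mean = 0.98, maximum value = 1
• Standard deviation = 0.14
With the mean only 0.02 below the ceiling and an SD of 0.14, the scores must pile up at 1 with a tail stretching downwards. A normal distribution with this mean and SD would place a large share of scores above the maximum of 1.
The distribution cannot possibly be normal. The result is invalid.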
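The reported summary statistics can be checked with a few lines of arithmetic. The sketch below computes two things for each group. The first is the share of a normal distribution with the reported mean and SD that would lie above the maximum score of 1. The second is the SD one would get if every score were either 0 or 1. The reported SDs come very close to the second value. This only suggests that the scores may be essentially binary. It is an inference from the summary statistics, not something stated on the slide. The helper names are our own.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binary_sd(mean):
    """Population SD of scores that are all 0 or 1 and have the given mean."""
    return math.sqrt(mean * (1 - mean))

def normal_mass_above(limit, mean, sd):
    """Share of a normal distribution with this mean and SD lying above `limit`."""
    return 1 - normal_cdf((limit - mean) / sd)

# Means and SDs as reported on the slide (maximum possible accuracy = 1)
reported = {"teachers": (0.98, 0.14), "students": (0.64, 0.49)}

for group, (mean, sd) in reported.items():
    print(f"{group}: reported SD = {sd:.2f}, "
          f"SD if all scores were 0 or 1 = {binary_sd(mean):.2f}, "
          f"normal mass above the maximum = {normal_mass_above(1.0, mean, sd):.0%}")
```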

Example 4: Repeated testing
Leedham 2011
• PhD thesis, The Open University
• Features in the writing of Chinese students in UK universities
Conclusion: there are differences in the frequencies of certain phrases between 3rd-year students and younger students.
Problem: repeated testing without adjusting the probability values. With many tests at α = .05, some differences are expected to come out «significant» by chance alone.
Some of the results are not valid.
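This sketch shows why unadjusted repeated testing is a problem and what an adjustment looks like. The p-values are hypothetical (not Leedham's). `familywise_error` gives the chance of at least one false positive. It assumes that the tests are independent and that every null hypothesis is true. Bonferroni and Holm are two standard corrections. They are shown here as illustrations, not as the method the slide prescribes. All function names are our own.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive in k independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** k

def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]

def holm(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values from four phrase-frequency comparisons
pvals = [0.01, 0.04, 0.03, 0.005]
print(f"20 tests at alpha = .05: {familywise_error(0.05, 20):.2f}")
print("Bonferroni:", bonferroni(pvals))
print("Holm:", holm(pvals))
```

All four raw p-values are below .05. After either adjustment, only some of them remain below .05.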

CV

Moral: There are no simple tests.
1. You should understand the conditions of the test.
2. You should take the conditions into account.
3. You should document properly
• how you perform the test,
• what numbers you put into it,
• how the conditions are met.
«A chi-square test showed that the difference is significant.»

Is it really that important?
«[C]ompared to other social sciences (e.g., psychology, communication, sociology, anthropology, …) or branches of linguistics (e.g., psycholinguistics, phonetics, sociolinguistics…), most of corpus linguistics has paradoxically only begun to develop this methodological awareness.» Gries (forthcoming, p. 1)

Is it really that important?
«It has become increasingly apparent over a period of several years that psychologists, taken in the aggregate, employ the chi-square test incorrectly.» Lewis and Burke (1949)

Whose responsibility is it?

«Corpus linguistics needs to ‘catch up’ [...]» Gries (forthcoming, p. 1)

References
Boneau, A. C. (1960). The effects of violations of assumptions underlying the t test. Psychological Bulletin, 57(1).
Good, P. I., & Hardin, J. W. (2012). Common errors in statistics (and how to avoid them). Hoboken: John Wiley.
Gries, S. (forthcoming). Quantitative designs and statistical techniques.
Larson-Hall, J. (2010). A Guide to Doing Statistics in Second Language Research Using SPSS. New York: Routledge.
Lewis, D., & Burke, C. J. (1949). The use and misuse of the chi-square test. Psychological Bulletin, 46(6).
Blom & Paradis (2013). Past Tense Production by English Second Language Learners With and Without Language Impairment. Journal of Speech, Language, and Hearing Research, 56.
Danckaert, L. (2011). On the left periphery of Latin embedded clauses. Ph.D. thesis. University of Gent.
Gujord, A. H. (2013). Grammatical encoding of past time in L2 Norwegian: The roles of L1 influence and verb semantics. Ph.D. thesis. University of Bergen.
Hunter, J. D. (2011). A multi-method investigation of the effectiveness and utility of delayed corrective feedback in second-language oral production. Ph.D. thesis. University of Birmingham.
Klavan, J. (2012). Evidence in linguistics: corpus-linguistic and experimental methods for studying grammatical synonymy. Ph.D. thesis. University of Tartu.
Leedham, M. (2011). A corpus-driven study of features of Chinese students’ undergraduate writing in UK universities. Ph.D. thesis. The Open University.
Sokolova, S. (2012). Asymmetries in Linguistic Construal: Russian Prefixes and the Locative Alternation. Ph.D. thesis. University of Tromsø.