Brief historical interlude


Brief historical interlude Karl Pearson (b. London, 1857; d. London, 1936)
Considered the founder of mathematical statistics
Developed the chi-square test (published July 1900)
Coined the term "standard deviation"
Developed the product-moment correlation
Now you know who to blame

Chi-square test of significance
χ² tests how cases are distributed across a variable.
One variable: tests how its distribution compares to a second, given distribution.
Two variables: tests whether the two variables are (statistically) related to each other.
Popular for crosstabs.
H0: every i.v. category has the same distribution across the d.v. as the total—i.e., the i.v. doesn't matter (the two variables are unrelated).

Chi-square (cont.) The idea is that we calculate a statistic (i.e., we use a particular formula to calculate a number), where we know that the statistic has a certain distribution (in this case what’s called a chi-square distribution) that will occur simply by chance variation (in a sample). I.e., this statistic has a known (not normal) sampling distribution. Because we know the distribution of this statistic, we can tell whether the result we get (the number that we calculate) is larger than one would expect by chance. If it is, we conclude (in the case of two variables) that they are in fact related.

Chi-square (cont.) Here's the formula:

χ² = Σ (observed − expected)² / expected

summed over all categories (or cells). As you can see, once we figure out what frequencies we're talking about, it's not a very complicated formula. Nevertheless, SPSS will do the calculations for us. (SPSS is really good at multiplying, dividing, stuff like that.)

Chi-square (cont.) Here's the distribution: Pearson's great insight was to figure out that if there is in fact no relationship between two variables, and you draw repeated samples and calculate the formula on the last slide, you'll get this kind of distribution simply because of chance variation. What we do in practice is to draw one sample and make the calculation. If the number we get is large enough, it tells us that we almost certainly didn't get this number by chance. I.e., there really is a relationship between these variables in the population.

Chi-square (cont.) The distribution is what you would expect by chance: it's very likely you would get a small value and very unlikely that you would get larger and larger values. (The x-axis isn't labeled, but the distribution starts at zero.) If you look back at the formula, you can see that the larger the difference between what you expected (given no relationship) and what you observed, the larger the chi-square number will be. If it's large enough, it tells you that you almost certainly didn't get this number by chance. I.e., there really is a relationship between these variables in the population.
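This "chance variation" story can be illustrated with a short simulation (a Python sketch, not part of the original slides; the four category proportions are made up purely for illustration): draw many samples from a population where the null hypothesis is true by construction, compute the statistic for each, and look at how the values spread out.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population proportions for four categories (an assumption
# for illustration; any proportions would do, since the null is true here).
props = [0.4, 0.3, 0.2, 0.1]
n = 100  # sample size per draw

def chi_square_draw():
    """Draw one sample under the null and return its chi-square statistic."""
    draws = random.choices(range(len(props)), weights=props, k=n)
    observed = [draws.count(cat) for cat in range(len(props))]
    return sum((obs - p * n) ** 2 / (p * n) for obs, p in zip(observed, props))

stats = [chi_square_draw() for _ in range(2000)]

# Under the null this statistic follows a chi-square distribution with
# (4 - 1) = 3 degrees of freedom, whose mean is 3: small values are common,
# large values are rare.
mean = sum(stats) / len(stats)
```

Most draws land near the degrees of freedom (here 3), and only about 5% exceed the tabled 5% critical value of 7.815, which is exactly the behavior the slide describes.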

Example: Does an observed frequency distribution match the population distribution? A study of grand juries in one county compared the demographic characteristics of jurors with the general population, to see if the jury panels were representative. The investigators wanted to know whether the jurors were selected at random from the population of this county. (This is an example of comparing one distribution with a second distribution. In this case, the second distribution is that of the population of the county on some characteristic.) The observed data are given on the next page.

Example continued…. Observed data

Age           County-wide population   Number of jurors
21 to 40      42%                      5  (7.6%)
41 to 50      23%                      9  (13.6%)
51 to 60      16%                      19 (28.8%)
61 and over   19%                      33 (50.0%)
Total         100%                     66 (100.0%)

Example continued…. Expected data

Age           Observed   Expected
21 to 40      5          0.42 × 66 = 27.7
41 to 50      9          0.23 × 66 = 15.2
51 to 60      19         0.16 × 66 = 10.6
61 and over   33         0.19 × 66 = 12.5
Total         66

Example continued…. As noted, the test statistic is:

χ² = Σ (observed − expected)² / expected

Given our data,

χ² = (5 − 27.7)²/27.7 + (9 − 15.2)²/15.2 + (19 − 10.6)²/10.6 + (33 − 12.5)²/12.5 ≈ 18.6 + 2.5 + 6.7 + 33.6 ≈ 61.4
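The arithmetic is easy to check in a few lines (a Python sketch using the observed counts and county percentages from the tables above):

```python
# Goodness-of-fit chi-square for the jury example.
observed = [5, 9, 19, 33]                 # jurors in each age group
county_props = [0.42, 0.23, 0.16, 0.19]  # county-wide age distribution
n = sum(observed)                         # 66 jurors in total

expected = [p * n for p in county_props]  # 27.72, 15.18, 10.56, 12.54
chi_square = sum((obs - exp) ** 2 / exp
                 for obs, exp in zip(observed, expected))
print(round(chi_square, 1))               # prints 61.3
```

(Using the unrounded expected counts gives about 61.3 rather than 61.4; the slide's value comes from the rounded table entries.)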

The chi-square table To use the table, we need what is called the degrees of freedom. In this case, the degrees of freedom are the number of categories minus 1, or 4 − 1 = 3. Just as with the t-table, we look up 3 on the table, then look for the test statistic and report the bounds on the p-value.

Example concluded The df are (4 − 1) = 3, and the p-value is therefore roughly 0. (Read the values across the top of the table as the area to the right of the critical value.) So, with a simple random sample, it would be almost impossible for a jury to differ this much from the county age distribution by chance. The inference is that grand juries are not selected at random.
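The table lookup can be mirrored in code by hard-coding a few critical values for 3 degrees of freedom (a Python sketch; the critical values are the usual tabled ones, and 61.4 is the jury-example statistic computed from the rounded expected counts):

```python
# Standard chi-square critical values for df = 3 (from a chi-square table).
critical_df3 = {0.05: 7.815, 0.01: 11.345, 0.001: 16.266}

chi_square = 61.4  # jury-example statistic

# Report every significance level the statistic clears.
for alpha in sorted(critical_df3, reverse=True):
    if chi_square > critical_df3[alpha]:
        print(f"significant at the {alpha:.1%} level")
```

Since 61.4 dwarfs even the 0.1% critical value of 16.266, the p-value is effectively 0.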

Now with a picture… [Figure: chi-square distribution with critical values marked; tabled areas are to the right of each critical value.]

Testing independence with chi-square In a certain town, there are about 1 million voters. An SRS of 10,000 was chosen to study the relationship between gender and participation. Are gender and voting independent? We can answer this with a chi-square test. First, we need the expected values for each cell.

              Men    Women   Total
Voted         2792   3591    6383
Didn't vote   1486   2131    3617
Total         4278   5722    10000

Expected values The expected value for each cell is simply:

expected = (row total × column total) / grand total

For men who voted, this is:

(6383 × 4278) / 10000 = 2730.6

Thus, we have the following:

Observed and expected values (observed, with expected in parentheses; each cell differs from its expected count by 61.4)

              Men              Women
Voted         2792 (2730.6)    3591 (3652.4)
Didn't vote   1486 (1547.4)    2131 (2069.6)

We calculate the test statistic in the same way as before:

The test statistic

χ² = (61.4)²/2730.6 + (61.4)²/3652.4 + (61.4)²/1547.4 + (61.4)²/2069.6 ≈ 1.38 + 1.03 + 2.44 + 1.82 ≈ 6.7

The degrees of freedom are:

df = (rows − 1)(columns − 1) = (2 − 1)(2 − 1) = 1
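The whole 2×2 calculation, including the tail probability, fits in a few lines (a Python sketch using the voting table's counts; for df = 1 the chi-square tail probability has the standard closed form erfc(√(x/2))):

```python
import math

# Chi-square test of independence for the gender-by-voting table.
observed = [[2792, 3591],    # voted:       men, women
            [1486, 2131]]    # didn't vote: men, women

row_totals = [sum(row) for row in observed]         # 6383, 3617
col_totals = [sum(col) for col in zip(*observed)]   # 4278, 5722
n = sum(row_totals)                                 # 10000

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(2) for j in range(2))
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2-1)*(2-1) = 1

# For df = 1, the tail probability has a closed form via erfc.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 2), df, round(p_value, 3))  # prints 6.66 1 0.01
```

The p-value of about 0.01 matches the "around 1%" conclusion on the next slide.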

The p-value The p-value is therefore around 1%. Based on the p-value, we reject the null hypothesis and conclude that voting and gender are not independent. Or, in other words, men and women (in the population) don’t vote the same way.

Caveats, complications Caveat: low frequencies can befuddle chi-square. When the expected frequency in a cell is below about 10, the statistic's sampling distribution no longer matches the chi-square distribution very well. One "solution" is to recode categories so the expected frequencies aren't so small. There are some other "corrections" one can make (which we won't go into here).

Testing caveat #1 There is nothing special about 5% or 1%. If our significance level is 5%, what is the difference between a p-value of 4.9% and a p-value of 5.1%? One is statistically significant, and one is not. But does that make sense? One solution (not often used): report the p-value, not just the conclusion.

Testing caveat #2 Data snooping What does a significance level of 5% mean? There is a 5% chance of rejecting the null hypothesis when it is actually true. If our significance level is 5%, how many results would be “statistically significant” just by chance if we ran 100 tests? We would expect 5 to be “statistically significant,” and 1 to be “highly significant.”
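The arithmetic behind this caveat is simple (a quick sketch, assuming the 100 tests are independent):

```python
alpha = 0.05    # significance level
n_tests = 100   # number of tests run

# On average, alpha * n_tests tests come out "significant" by chance alone.
expected_false_positives = alpha * n_tests        # 5.0

# Chance of at least one spurious "significant" result across all tests.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(expected_false_positives, round(p_at_least_one, 3))  # prints 5.0 0.994
```

With 100 tests at the 5% level, at least one false positive is almost guaranteed, which is why data snooping is so dangerous.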

Testing caveat #2 continued…. So what can we do? 1. One can state how many tests were run before statistically significant results turned up. 2. If possible, one can test one’s conclusions on an independent set of data. 3. Again, there are some statistical procedures that can help—basically playing off the idea that (at the 5% level) one should get 5% of the results significant merely by chance.

Testing caveat #3 Was the result important? What is the magnitude of the difference? (In the example above, 65.3% of men voted and 62.8% of women voted.) Chi-square doesn't tell you that. Even the level of significance doesn't tell you that. (Yes, it's related, but it depends on the number of cases, so you can't simply say that a difference significant at, say, the 1% level is really big.) This leads directly to the next topic: measures of association.

Testing caveat #3 conclusion The moral of the story is: A statistically significant difference may not be important. And… An important difference may not be statistically significant.