2 of 27 Outline Review of Important Definitions and Concepts Chi-Square Tests Goodness of Fit Test of Independence Sample Problems
3 of 27 What are Non-parametric Statistics? Methods of analyzing data that examine the relative position or rank of the data rather than the actual values. Non-parametric statistics do not: assume that the data come from a normal distribution. create any parameter estimates (e.g., means; standard deviations) to assess whether one set of numbers is statistically different from another set of numbers.
4 of 27 Chi-Square Goodness of Fit This test is used to evaluate questions concerning the probabilities associated with each value of a variable by comparing an observed frequency distribution to an expected frequency distribution. It’s most often used when a person has the observed frequencies for several mutually exclusive categories and wants to decide if they have occurred equally frequently.
5 of 27 Chi-Square Goodness of Fit: Test Assumptions
1. Random and independent sampling.
2. Sample size must be sufficiently large.
3. Values of the variable are mutually exclusive and exhaustive. Every subject must fall in exactly one category.
Note: If these assumptions are not met, the critical values in the chi-square table are not necessarily correct.
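6 of 27 Chi-Square Goodness of Fit: Computing by hand $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E is the expected frequency of each category. Note: df = (k - 1), where k is the number of categories.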
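The hand computation can be sketched in a few lines of Python. This is only an illustration: it assumes the usual statistic, chi-square = sum of (O - E)^2 / E, with equal expected frequencies in every category. The function name `goodness_of_fit` and the observed counts are hypothetical and do not come from these slides.

```python
def goodness_of_fit(observed):
    """Chi-square goodness of fit against equal expected frequencies."""
    k = len(observed)                 # number of categories
    expected = sum(observed) / k      # equal share of the total per category
    chi_square = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1                        # df = (k - 1)
    return chi_square, df


# Hypothetical counts for three mutually exclusive categories (not from the slides).
observed_counts = [20, 30, 40]
chi_square, df = goodness_of_fit(observed_counts)
print(round(chi_square, 3), df)
```

The resulting statistic would then be compared with the critical value for the matching df.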
8 of 27 Chi-Square Test of Independence This test is used to examine whether two or more variables are related based on information about probabilities. It assesses whether the observed frequencies of events differ from those that would be expected by chance if the variables were unrelated. One common use is to determine whether there is an association between two categorical variables.
9 of 27 The Chi-Square Test of Independence: Computing by hand Note: df = (k - 1)(q - 1), where k and q are the numbers of categories of the two variables.
10 of 27 Critical Values for the Chi-Square Test
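The table of critical values is not reproduced here. As a rough cross-check, the sketch below computes the upper-tail probability of a chi-square statistic for df = 1 and df = 2 only, using closed forms that hold for those two cases. The function name `chi_square_p_value` is hypothetical, and the .05 critical values in the example (3.841 and 5.991) are the standard textbook values, assumed rather than copied from the slide.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square statistic (df = 1 or 2 only)."""
    if df == 1:
        # With df = 1 the statistic is a squared standard normal value.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # With df = 2 the chi-square distribution is exponential with mean 2.
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only covers df = 1 and df = 2")


# Standard .05 critical values (assumed): 3.841 for df = 1, 5.991 for df = 2.
p_df1 = chi_square_p_value(3.841, 1)
p_df2 = chi_square_p_value(5.991, 2)
print(round(p_df1, 3), round(p_df2, 3))
```

A statistic larger than the critical value corresponds to a p-value smaller than the chosen significance level.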
11 of 27 Chi-Square Example Using SPSS A researcher is interested in whether cats can be trained to line dance. He recruits 200 cats and then tries to train them to line dance by giving them either food or affection as a reward for “dance-like” behavior. At the end of the week he counts how many of the cats can line dance and how many cannot. We have two categorical variables: Training (food vs. affection) and Dance (each cat either learned to dance or did not). Open cat.sav.
17 of 27 How big is the effect?: Cramer’s V Cramer’s V = .36, p < .01. A value of .36 out of a possible 1 indicates a medium association between type of training and whether the cats dance. It can be viewed like a correlation coefficient. The significance level indicates that it is highly unlikely that the observed pattern of data is due to chance.
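The reported value can be checked by hand. The sketch below assumes the standard definition, V = sqrt(chi-square / (N x min(rows - 1, columns - 1))), and plugs in the chi-square of 25.36 and the 200 cats reported elsewhere in these slides. The function name `cramers_v` is hypothetical.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi_square / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


# Chi-square of 25.36 and 200 cats in a 2 x 2 table, as reported in these slides.
v = cramers_v(25.36, 200, 2, 2)
print(round(v, 2))
```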
18 of 27 How big is the effect?: Odds Ratio
1. Calculate the odds that a cat danced given that it had food as a reward. Odds (dancing after food) = (number that had food and danced) / (number that had food but didn’t dance) = 28/10 = 2.8
2. Calculate the odds that a cat danced given that it had affection as a reward. Odds (dancing after affection) = (number that had affection and danced) / (number that had affection but didn’t dance) = 48/114 = 0.421
3. Calculate the odds ratio. Odds ratio = Odds (dancing after food) / Odds (dancing after affection) = 2.8 / 0.421 = 6.65
There was a significant association between the type of training and whether or not cats would dance, $\chi^2(1) = 25.36$, p < .001. This seems to represent the fact that, based on the odds ratio, the odds of cats dancing were 6.65 times higher if they were trained with food than if trained with affection.
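The three steps above can be sketched in Python using the four cell counts from this slide. The helper name `odds` and the variable names are chosen for illustration only.

```python
def odds(n_event, n_no_event):
    """Odds of an event: cases with the event divided by cases without it."""
    return n_event / n_no_event


# Cell counts taken from the odds-ratio calculation on this slide.
food_danced, food_not = 28, 10
affection_danced, affection_not = 48, 114

odds_food = odds(food_danced, food_not)                  # step 1
odds_affection = odds(affection_danced, affection_not)   # step 2
odds_ratio = odds_food / odds_affection                  # step 3
print(round(odds_food, 1), round(odds_affection, 3), round(odds_ratio, 2))
```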
21 of 27 This process tells the computer that it should weight each category combination by the number in the column labeled Frequency. So, for example, the computer “pretends” there are 28 rows of data that have the category combination 0,0 (representing cats that were trained with food and that danced).
23 of 27 Chi-Square Example: Computing expected frequencies for hand-computation A researcher wants to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females. The two variables we are considering here are hometown size (small, medium, or large) and sex (male or female). Another way of putting our research question is: Is sex independent of size of hometown? The data for 30 females and 6 males are in the following table.
24 of 27 The formula for chi-square is: $\chi^2 = \sum \frac{(O - E)^2}{E}$ Where: O is the observed frequency, and E is the expected frequency. The degrees of freedom for the 2-D chi-square statistic are: df = (Columns - 1) x (Rows - 1)
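A minimal sketch of this computation for a two-way table follows. The observed counts for the hometown example are not reproduced in this text, so the sketch is applied instead to the cat counts from the SPSS example, where it should reproduce the reported chi-square of 25.36. The function name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    df = (len(table[0]) - 1) * (len(table) - 1)  # (Columns - 1) x (Rows - 1)
    return chi_square, df


# Rows: food, affection. Columns: danced, did not dance (cat example counts).
cat_table = [[28, 10], [48, 114]]
chi_square, df = chi_square_independence(cat_table)
print(round(chi_square, 2), df)
```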
25 of 27 Computing Expected Frequencies
Expected frequency for each cell: (the cell’s Column Total X the cell’s Row Total) / Grand Total
In our example: Column Totals are 14 (small), 15 (medium), and 7 (large). Row Totals are 30 (female) and 6 (male). Grand Total is 36.
26 of 27 Computing Expected Frequencies
The expected frequencies:
1. Small female cell: 14 X 30 / 36 = 11.667
2. Medium female cell: 15 X 30 / 36 = 12.500
3. Large female cell: 7 X 30 / 36 = 5.833
4. Small male cell: 14 X 6 / 36 = 2.333
5. Medium male cell: 15 X 6 / 36 = 2.500
6. Large male cell: 7 X 6 / 36 = 1.167
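The six expected frequencies can be reproduced from the row and column totals alone. The sketch below uses the totals given in these slides; the dictionary layout and variable names are chosen for illustration.

```python
# Marginal totals from the hometown example.
column_totals = {"small": 14, "medium": 15, "large": 7}
row_totals = {"female": 30, "male": 6}
grand_total = sum(row_totals.values())

# Expected frequency = column total x row total / grand total.
expected = {
    (sex, size): column_total * row_total / grand_total
    for sex, row_total in row_totals.items()
    for size, column_total in column_totals.items()
}

for cell, value in expected.items():
    print(cell, round(value, 3))
```

The expected frequencies sum to the grand total, which is a quick check on the arithmetic.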