Presentation on theme: "Chi-Squares II (other categorical measures of association)"— Presentation transcript:
1Chi-Squares II (other categorical measures of association) Lecture 5
2Measures of Association for Categorical Data One problem with the hypothesis testing framework, which we’ll discuss later, is that any observed difference has the potential to be statistically significant, provided the sample size is large enough. Hence, the results of a hypothesis test could be depicted as a test to determine whether your sample size is large enough to detect the true difference between two populations. A more informative way of describing observed differences relies on effect size indices (statistics that attempt to depict differences in a metric that provides substantive meaning to the observed difference). In the context of chi-square tests, the appropriate effect size indices are measures of association (statistics that depict the magnitude of the relationship between the two variables in the table).
3Measures of Association for Categorical Data For example, consider the two tables below. Both have comparable chi-square and p-values, but most people would say that the one on the left shows evidence of a stronger relationship than the one on the right (particularly given the expected values shown in parentheses). Measures of association for these tables illustrate this difference. [Tables; only these values are recoverable: observed 30 (expected 24.5) and 19, with χ² = 5.14, p = .02; observed 300 (expected 281) and 262, with χ² = 4.94, p = .03.]
4Measures of Association for Categorical Data There are five relevant measures of association for nominal (categorical) data: the contingency coefficient, the phi coefficient, Cramer’s phi coefficient, the odds ratio, and the risk ratio. The first three are interpreted similarly, and the last two have a different interpretation. The contingency coefficient cannot reach a maximum value of 1, so its interpretation is somewhat difficult.
5Measures of Association for Categorical Data The phi coefficient and Cramer’s phi coefficient range from 0 to 1, with 0 indicating no association and 1 indicating a perfect relationship between the two variables in the contingency table. As a rule, values less than .2 indicate a negligible relationship, values from .2 up to .5 indicate an important relationship, and values from .5 up to 1 indicate a very strong relationship. The phi coefficient only applies to 2 x 2 tables, and Cramer’s phi (aka Cramer’s V) applies to any two-way table. The equations for the three indices are similar: C = √(χ²/(χ² + N)), φ = √(χ²/N), and V = √(χ²/(N(k − 1))), where N is the total sample size and k is the smaller of the number of rows and the number of columns.
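The three indices can be computed directly from a table of observed counts. The following is a minimal sketch, assuming the standard textbook formulas; the function names and the example table are hypothetical and are not taken from the slides.

```python
import math

def chi_square(table):
    """Return (Pearson chi-square, N) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under no association: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def association_measures(table):
    """Contingency coefficient, phi, and Cramer's V for a two-way table."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller of rows and columns
    return {
        "contingency_coefficient": math.sqrt(chi2 / (chi2 + n)),
        "phi": math.sqrt(chi2 / n),  # intended for 2 x 2 tables only
        "cramers_v": math.sqrt(chi2 / (n * (k - 1))),
    }

# Hypothetical 2 x 2 table (not the table from the slides)
example_table = [[30, 10], [10, 30]]
measures = association_measures(example_table)
print(measures)
```

For a 2 x 2 table, k − 1 = 1, so phi and Cramer's V coincide, which is why V can be read as a generalization of phi to larger tables.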
6Measures of Association for Categorical Data Going back to our original example, let’s apply what we now know… [Table; recoverable values: observed 300 (expected 281) and 262; χ² = 4.94, p = .03.]
7Measures of Association for Categorical Data Going back to our original example, let’s apply what we now know… [Table; recoverable values: observed 30 (expected 24.5) and 19; χ² = 5.14, p = .02.]
8Measures of Association for Categorical Data These correlation-based statistics (we’ll learn more about correlations in Chapter 9) suggest a new way of thinking about the null hypothesis of the Pearson chi-square test of association. Recall that our null hypothesis is that there is no relationship between the two variables depicted by the table and that we represent this symbolically as H₀: ρ_{i,j} = ρ_{i+1,j}, for all i. That is, the proportion of observations in one row equals the proportion of observations in another row for each column of the table.
9Measures of Association for Categorical Data Recall that this is a test of association and that Cramer’s V is a measure of association. Also, note that the null hypothesis implies that there is no relationship between the two variables in the table (i.e., that the proportion of observations in an individual cell is dictated by the marginal frequencies for the two variables). Hence, we can restate the null hypothesis for the Pearson chi-square test of association as H₀: Cramer’s V = 0. That is, there is no relationship between the two variables in the table, which is equivalent to saying that the proportion of observations in one row equals the proportion of observations in another row for each column of the table.
10Measures of Association for Categorical Data The odds ratio (OR) is a little more difficult to understand, but it also has a straightforward interpretation. Note that the odds of an event are represented as a fraction such as 2/1, sometimes written 2:1 or 2 to 1. The odds represent the likelihood of one event relative to the converse of that event. For example, you could describe the odds of a 1, 2, 3, or 4 on the roll of a six-sided die rather than something other than a 1 through 4 (i.e., a 5 or a 6) as 2/1 or simply 2 (4 to 2, simplified to 2 to 1). This means that an outcome of 1 through 4 is twice as likely as an outcome of 5 or 6. Hence, the odds of a 5 or 6 rather than its converse are 2/4 or 1/2 or simply 0.5: a 5 or 6 is half as likely as a 1 through 4. Note that in this example, the ratio for the odds is a simplification of a ratio of probabilities. The probability of a 1 through 4 is (4/6) and the probability of a 5 or 6 is (2/6). So, the odds of 1 through 4 are (4/6)/(2/6) = 4/2 = 2.
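The die example can be checked with exact fractions. This is a small sketch; the helper name `odds` is an assumption introduced here for illustration.

```python
from fractions import Fraction

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

p_one_to_four = Fraction(4, 6)   # probability of rolling a 1 through 4
p_five_or_six = Fraction(2, 6)   # probability of rolling a 5 or 6

odds_one_to_four = odds(p_one_to_four)
odds_five_or_six = odds(p_five_or_six)

print(odds_one_to_four)  # odds of a 1 through 4
print(odds_five_or_six)  # odds of a 5 or 6
```

Using `Fraction` keeps the arithmetic exact, so the two odds come out as exact reciprocals of one another.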
11Measures of Association for Categorical Data An odds ratio, on the other hand, is a ratio of odds compared between two groups. Let’s compare the odds of a fair die resulting in a 1 through 4 outcome to the odds of a “loaded” die resulting in a 1 through 4 outcome as the ratio of the odds for each event. For the fair die, the odds would be 2/1 or 2 (as stated on the previous slide). For the loaded die, the odds might be 4/1: loading the die has doubled the odds of seeing a 1 through 4. Hence, the odds ratio between an outcome of 1 through 4 for a fair versus a loaded die would be (2/1)/(4/1) = 2/4, which equals .50. That means that the odds of a fair die showing a 1 through 4 are only 50% as large as the odds of a loaded die showing a 1 through 4.
12Measures of Association for Categorical Data Alternatively, you can turn this odds ratio around by inverting the original odds ratio. Hence, 1 / .50 = 2, which is the odds ratio between an outcome of 1-4 for a loaded die versus a fair die. We can confirm this by constructing the odds ratio from the actual odds of each event: (4/1)/(2/1) = 4/2 = 2. Hence, the odds of a 1-4 on a loaded die are 2 times as large as the odds of a 1-4 on a fair die. Example in our text…
13Measures of Association for Categorical Data Table 6.4 The effect of aspirin on the incidence of heart attacks. Odds of heart attack given that participants did not take aspirin: Odds(NoAspirin) = 189/10845 = 0.0174. Odds of heart attack given that participants did take aspirin: Odds(Aspirin) = 104/10933 = 0.0095. OR = Odds(NoAspirin)/Odds(Aspirin) = 0.0174/0.0095 = 1.83. Thus, the odds of having a heart attack given you didn’t take aspirin are 1.83 times the odds of having a heart attack with aspirin.
14Measures of Association for Categorical Data An alternative calculation is simply dividing the cross products. Again we want to divide the odds of the treatment group (experimental group) by the odds of the no-treatment group (control group). Example in our text: Table 6.4 The effect of aspirin on the incidence of heart attacks. AD/BC and BC/AD will yield different ORs and different interpretations. Cross product for the no-aspirin group (heart attacks without aspirin times no heart attacks with aspirin): 189(10933) = 2,066,337. Cross product for the aspirin group (heart attacks with aspirin times no heart attacks without aspirin): 104(10845) = 1,127,880. OR = 2,066,337/1,127,880 = 1.83, the same value obtained by dividing Odds(NoAspirin) by Odds(Aspirin).
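Both routes to the odds ratio can be verified with the counts used in the calculations above. This is a sketch only; the variable names are chosen here for illustration.

```python
# Counts from the aspirin example (Table 6.4 as used in the slides)
heart_attack_no_aspirin = 189
no_heart_attack_no_aspirin = 10845
heart_attack_aspirin = 104
no_heart_attack_aspirin = 10933

# Route 1: ratio of the two odds
odds_no_aspirin = heart_attack_no_aspirin / no_heart_attack_no_aspirin
odds_aspirin = heart_attack_aspirin / no_heart_attack_aspirin
or_from_odds = odds_no_aspirin / odds_aspirin

# Route 2: ratio of the cross products
or_from_cross_products = (
    (heart_attack_no_aspirin * no_heart_attack_aspirin)
    / (heart_attack_aspirin * no_heart_attack_no_aspirin)
)

print(round(or_from_odds, 2), round(or_from_cross_products, 2))
```

The two routes are algebraically identical; the cross-product form simply avoids rounding the intermediate odds.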
15Measures of Association for Categorical Data Another commonly seen measure of association is relative risk (RR). The relative risk is a measure of the relative size of the probabilities of two events: p1 / p2. We know that the probability of a 1 through 4 on a fair die is 4/6 (or 2/3 = .67). From the odds for the loaded die, we can see that the probability of a 1 through 4 is 4/5 (p/(1 − p) = 4, so p = .80). Hence, the relative risk of a 1 through 4 on a fair versus a loaded die is (2/3)/(4/5), or .83. That is, the likelihood of a 1 through 4 on a fair die is 83% of the likelihood of a 1 through 4 on a loaded die. This differs from the odds ratio for these events, which equals .50.
16Measures of Association for Categorical Data Back to the example on the previous slide: Risk of heart attack given that participants did not take aspirin: Risk(NoAspirin) = 189/11034 = 0.0171. Risk of heart attack given that participants did take aspirin: Risk(Aspirin) = 104/11037 = 0.0094. Risk Difference = 0.0171 − 0.0094 = .0077. RR = Risk(NoAspirin)/Risk(Aspirin) = 0.0171/0.0094 = 1.819. Therefore, the risk of having a heart attack given you did not take aspirin is 1.82 times the risk if you had taken aspirin. Note: The odds ratio is only relevant for 2 x 2 tables.
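The risk calculations can be reproduced in a few lines. This is a minimal sketch using the group totals from the slide; the variable names are illustrative.

```python
# Heart attacks and group sizes from the aspirin example
risk_no_aspirin = 189 / 11034   # risk without aspirin
risk_aspirin = 104 / 11037      # risk with aspirin

risk_difference = risk_no_aspirin - risk_aspirin
relative_risk = risk_no_aspirin / risk_aspirin

print(round(risk_no_aspirin, 4), round(risk_aspirin, 4))
print(round(risk_difference, 4), round(relative_risk, 2))
```

Working from unrounded risks gives a relative risk that still rounds to 1.82; the third decimal differs slightly from the slide because the slide divides the already-rounded risks.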
17Measures of Association for Categorical Data Some quick notes on risk and odds. Risk is intuitive but limited: it is future oriented and inapplicable in retrospective studies. Odds is less intuitive, but it is applicable in retrospective and prospective studies. We can make odds more intuitive with some simple transformations.
18Measures of Association for Categorical Data Example: The odds of having a heart attack given you took aspirin are .54 times the odds of having a heart attack given you were in the placebo group. The probability of having a heart attack given you were in the aspirin group is OR/(1 + OR) = .54/1.54 = .35. The probability of having a heart attack given you were in the placebo group is 1.83/2.83 = .65. Together, .35 + .65 = 1.
19Measures of Association for Categorical Data A quick reminder… All of the tests that we present in this course place certain requirements, expectations, or assumptions on the data in order for the test interpretation to be valid. For the chi-square test, the assumptions are: Independence: We assume that observations are independent of one another. That is, the value of any one observation does not depend on and is not influenced by the value of other observations in the dataset. Don’t confuse this with the test of independence, which focuses on independence between variables (not observations). One way to ensure independence among observations is to verify that the categories constitute mutually exclusive codes (an individual cannot be a member of multiple categories). Another way to ensure independence among observations is to use simple random sampling from the population. A third way to ensure independence is to evaluate your research design to determine whether there are opportunities for participants to interact or to form group clusters.
20Measures of Association for Categorical Data Normality: Recall that the chi-square distribution can be formed by summing squared observations from a standard normal curve (z-scores from a normal distribution). This suggests that the chi-square distribution relies on a normality assumption in some way. Look at the tables below. If you fix the margins as indicated, there are several configurations of allocating individuals to cells that allow you to maintain these marginal frequencies. [Tables: alternative cell configurations sharing the same fixed marginal frequencies; cell values not recoverable.]
21Measures of Association for Categorical Data In fact, the distribution of possible values for any single cell in the table is approximately normal, given that the sample size is large enough and the probability of an observation falling in that cell is not extreme. Also, recall that the expected cell frequencies for the chi-square test are defined as Np (total sample size times the probability of being in that cell). Hence, the requirement of normality can be satisfied if the expected cell frequencies are of sufficient size. A rule of thumb is that all of the expected cell frequencies should be 5 or greater.
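The rule of thumb is easy to check before running the test. The following sketch computes expected frequencies from the margins; the function names and both example tables are hypothetical.

```python
def expected_frequencies(table):
    """Expected counts under no association: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def meets_rule_of_thumb(table, minimum=5):
    """True when every expected cell frequency is at least `minimum`."""
    return all(e >= minimum for row in expected_frequencies(table) for e in row)

# Hypothetical tables (not from the slides)
small_table = [[3, 7], [2, 13]]
large_table = [[30, 10], [10, 30]]

print(expected_frequencies(small_table), meets_rule_of_thumb(small_table))
print(expected_frequencies(large_table), meets_rule_of_thumb(large_table))
```

Note that the check applies to the expected counts, not the observed ones: a table can contain a small observed cell and still satisfy the rule.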
22Measures of Association for Categorical Data Sensitivity is the probability of a positive result on some (predictive) measure given that the outcome is actually present. Specificity is the counterpart: the probability of screening negative on the predictive measure given that you do not have the outcome or do not meet criteria for it.
23Measures of Association for Categorical Data These data are similar to mammography data predicting the presence and absence of breast cancer (not real data). We need to consider the conditional and marginal distributions to get at the answer of sensitivity and specificity.
24Measures of Association for Categorical Data sensitivity = 8/10 = .80 (the probability of screening positive in the diagnostic cancer population); specificity = 880/990 = .89 (the probability of screening negative in the non-diagnostic cancer population).
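These two proportions come straight from the screening counts. A minimal sketch, using the counts implied by the slides (8 and 2 among those with cancer, 110 and 880 among those without); the variable names are illustrative.

```python
# Screening counts implied by the slides (not real data)
true_positives = 8      # cancer, screened positive
false_negatives = 2     # cancer, screened negative
false_positives = 110   # no cancer, screened positive
true_negatives = 880    # no cancer, screened negative

# Sensitivity: P(screen positive | cancer)
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: P(screen negative | no cancer)
specificity = true_negatives / (true_negatives + false_positives)

print(round(sensitivity, 2), round(specificity, 2))
```

Both quantities condition on the true status (the rows of the table), not on the screening result.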
25Measures of Association for Categorical Data Going one step further… What if we wanted to use all of this information to answer the question, “What is the probability of having cancer, given you screened positive for cancer?” Guesses? We can answer this with Bayes’ theorem.
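26Measures of Association for Categorical Data P(C) = (8 + 2)/1000 = .01; P(C’) = .99; P(+) = (8 + 110)/1000 = .118 ≈ .12; sensitivity = P(+ | C) = .8. By Bayes’ theorem, P(C | +) = P(+ | C) P(C) / P(+) = (.8)(.01)/.118 ≈ .07, or equivalently 8/118 ≈ .07. Thoughts? Is this what you expected?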
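The Bayes' theorem calculation can be sketched as follows, using the same screening counts (not real data); the variable names are assumptions made for this example.

```python
# Counts from the screening example (not real data)
n_total = 1000
cancer_positive, cancer_negative = 8, 2
no_cancer_positive = 110

p_cancer = (cancer_positive + cancer_negative) / n_total        # P(C)
p_positive = (cancer_positive + no_cancer_positive) / n_total   # P(+)
sensitivity = cancer_positive / (cancer_positive + cancer_negative)  # P(+ | C)

# Bayes' theorem: P(C | +) = P(+ | C) * P(C) / P(+)
p_cancer_given_positive = sensitivity * p_cancer / p_positive

# Direct count-based check: cancer cases among all positive screens
direct = cancer_positive / (cancer_positive + no_cancer_positive)

print(round(p_cancer_given_positive, 3), round(direct, 3))
```

The posterior probability is small even though sensitivity is high, because cancer is rare in this sample and most positive screens come from the much larger cancer-free group.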
27Measures of Association for Categorical Data When the expected cell frequency requirement is not met, you can use an exact statistic to perform the hypothesis test. The exact statistic is based on the empirical probability of observing a certain configuration of cell frequencies with fixed marginal frequencies. Earlier, several such configurations were shown. To perform an exact test, you would rank order the tables based on the value of one of the cells, determine the probability of observing a value in that cell equal to or less than the observed value, and declare that probability as the p-value for your hypothesis test. For this class, you don’t need to know how to do an exact test, but you do need to know that it is an alternative when expected cell frequencies are small. Inclusion of Nonoccurrences: Another requirement of the chi-square test is that all cases in the data set be included in the contingency table. That is, the coding system must be exhaustive: it must represent all elements of the sample.
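For a 2 x 2 table, the exact procedure described above can be sketched with the hypergeometric distribution (the basis of Fisher's exact test, named here as an assumption since the slides do not name the test). The function names and the example table are hypothetical.

```python
from math import comb

def cell_probability(a, row1_total, col1_total, n):
    """P(top-left cell == a) in a 2 x 2 table with fixed margins."""
    return (
        comb(col1_total, a) * comb(n - col1_total, row1_total - a)
        / comb(n, row1_total)
    )

def exact_lower_tail_p(table):
    """P(top-left cell <= observed value), holding the margins fixed."""
    (a, b), (c, d) = table
    row1_total, col1_total, n = a + b, a + c, a + b + c + d
    smallest = max(0, row1_total + col1_total - n)  # smallest feasible value
    return sum(
        cell_probability(x, row1_total, col1_total, n)
        for x in range(smallest, a + 1)
    )

# Hypothetical table with small expected frequencies (not from the slides)
observed = [[1, 9], [4, 11]]
p_value = exact_lower_tail_p(observed)
print(round(p_value, 4))
```

This is a one-sided version, matching the "equal to or less than the observed value" description; a two-sided test would also accumulate configurations in the opposite tail.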
28Measures of Association for Categorical Data A slightly different index, a measure of agreement rather than association, is coefficient kappa (κ, aka Cohen’s kappa). This index is referred to as a measure of agreement rather than a measure of association because it goes beyond merely indicating whether there is a relationship between two variables: kappa actually indicates the degree to which the categorizations of the two variables are identical. Coefficient kappa is commonly used to depict the level of agreement between two raters.
29Measures of Association for Categorical Data For example, a measure of association for the frequency table below gives an overly optimistic picture of the level of agreement between two raters. Cramer’s V for this table equals 0.54, indicating fairly strong association. However, the raters only agree in 12 out of 36 cases. [Table: rater1 by rater2 crosstabulation (frequency, percent, row percent, column percent) for rating categories 0, 1, and 2; cell values not recoverable.]
30Measures of Association for Categorical Data One measure of agreement in the table below would be to sum the relative cell frequencies in cases where the two raters agree (e.g., 0,0; 1,1; and 2,2). In the table on the previous page, the percentage of agreement between the raters would be 33% (12/36), which is not that great. [Table: rater1 by rater2 crosstabulation (frequency, percent, row percent, column percent) for rating categories 0, 1, and 2; cell values not recoverable.]
31Measures of Association for Categorical Data However, such an index is misleading because it ignores the fact that raters may agree by chance. Cohen’s kappa corrects for this problem by depicting the proportion of agreement attained beyond that attainable by chance. As shown below, kappa gives us the proportion of agreement that was attained once the proportion attainable by chance is removed from the actual proportion of agreement. [Figure: a scale of proportion of agreement from 0.00 to 1.00, split into "Attainable by chance" and "Attainable beyond chance", with the actual agreement and κ marked.]
32Measures of Association for Categorical Data The computation formula for kappa demonstrates this relationship: κ = (Σ f_O − Σ f_E) / (N − Σ f_E), with both sums taken over D. In this formula, D indicates the diagonal elements of the frequency table (the cells in which the raters agree), f_O is an observed frequency, and f_E is an expected frequency. The first element of the numerator is the number of observed agreements. The second element of the numerator indicates that you subtract the sum of the expected agreements from this (where each expected value is defined as the product of the corresponding marginal frequencies divided by N, as was the case for the chi-square test). Hence, the numerator gives you the number of observations in agreement beyond those expected by chance. The denominator takes the total number of observations and subtracts the number of expected agreements, giving us the number of observations beyond those expected to agree given the marginal frequencies. Hence, the numerator divided by the denominator (kappa) gives us the proportion of observations in agreement beyond those expected by chance.
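The kappa computation can be sketched in a few lines. The table below is hypothetical: it was chosen so that the raters agree in 12 of 36 cases with equal margins, but it is not the table from the slides, whose cell values are not available. The function name is also an assumption.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square rater-by-rater frequency table."""
    size = len(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Observed agreements: sum of the diagonal cells
    observed = sum(table[i][i] for i in range(size))
    # Expected agreements: row total * column total / N for each diagonal cell
    expected = sum(row_totals[i] * col_totals[i] / n for i in range(size))
    return (observed - expected) / (n - expected)

# Hypothetical 3 x 3 table: 12 agreements out of 36, all margins equal to 12
ratings = [[4, 8, 0],
           [0, 4, 8],
           [8, 0, 4]]
kappa = cohens_kappa(ratings)

# Perfect agreement for comparison
perfect = [[12, 0, 0], [0, 12, 0], [0, 0, 12]]
kappa_perfect = cohens_kappa(perfect)

print(kappa, kappa_perfect)
```

In the hypothetical table the off-diagonal cells are highly patterned, so a measure of association would be large, yet the 12 observed agreements equal the 12 expected by chance and kappa is zero.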
33Measures of Association for Categorical Data Recall that in the table below, Cramer’s V equals 0.54 and the observed level of agreement equals 0.33 (12 out of 36 cases). Cohen’s kappa for this table equals 0.00, indicating that the observed level of agreement (0.33) is no better than that expected by chance alone. [Table: rater1 by rater2 crosstabulation (frequency, percent, row percent, column percent) for rating categories 0, 1, and 2; cell values not recoverable.]
34Measures of Association for Categorical Data 6.29 Dabbs and Morris (1990) examined archival data from military records to study the relationship between high testosterone levels and antisocial behavior in males. Of 4016 men in the Normal Testosterone group, 10.0% had a record of adult delinquency. Of 446 men in the High Testosterone group, 22.6% had a record of adult delinquency. Is this relationship significant? 6.30 What’s the odds ratio? How would you interpret it?
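35Measures of Association for Categorical Data According to the description, the data for this study look like: High Testosterone group, 101 with and 345 without a record of adult delinquency (446 total); Normal Testosterone group, 402 with and 3614 without a record of adult delinquency (4016 total). The critical value for this study is χ²(1) = 3.84 at the .05 level.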
36Measures of Association for Categorical Data According to the description, the data for this study look like: High Testosterone group, 101 with and 345 without a record of adult delinquency; Normal Testosterone group, 402 with and 3614 without. The odds of adult delinquency for the high testosterone group are Odds(high) = 101/345 = 0.2928. The odds of adult delinquency for the normal testosterone group are Odds(normal) = 402/3614 = 0.1112. And the odds ratio is OR = .2928/.1112 = 2.63. The odds of engaging in behaviors of adult delinquency are 2.63 times as high if you are a member of the high testosterone group.
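Both exercises can be worked through in code. This is a sketch under the assumption that the cell counts are 101/345 and 402/3614, as used in the odds calculations; the function and variable names are illustrative.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: high, normal testosterone; columns: delinquent, not delinquent
testosterone = [[101, 345], [402, 3614]]

chi2 = pearson_chi_square(testosterone)
critical_value = 3.84  # chi-square critical value, df = 1, alpha = .05
is_significant = chi2 > critical_value

odds_high = 101 / 345
odds_normal = 402 / 3614
odds_ratio = odds_high / odds_normal

print(round(chi2, 2), is_significant, round(odds_ratio, 2))
```

The statistic is far above the 3.84 critical value, so the relationship is significant at the .05 level, and the odds ratio expresses how large that relationship is.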