Contingency Tables Prepared by Yu-Fen Li.



Contingency table
When working with nominal data that have been grouped into categories, we often arrange the counts in a tabular format known as a contingency table (or cross-tabulation). An r × c table has r rows and c columns. In the simplest case, two dichotomous random variables are involved; the rows of the table represent the outcomes of one variable, and the columns represent the outcomes of the other.

Example
To examine the effectiveness of bicycle safety helmets, we wish to know whether there is an association between the incidence of head injury and the use of helmets among individuals who have been involved in accidents.

What are the hypotheses?
H0: The proportion of persons suffering head injuries among individuals wearing safety helmets at the time of the accident is equal to the proportion of persons sustaining head injuries among those not wearing helmets.
HA: The proportions of persons suffering head injuries are not identical in the two populations.

The Chi-Square Test
The first step in carrying out the test is to calculate the expected count for each cell of the contingency table, given that H0 is true. The chi-square test compares the observed frequencies in each cell of the contingency table (represented by O) with the frequencies expected in each cell if the null hypothesis is true (represented by E).

The Chi-Square Test
The test is used to determine whether the deviations between the observed and the expected counts, O − E, are too large to be attributed to chance. The test statistic is $\chi^2 = \sum_{i=1}^{rc} (O_i - E_i)^2 / E_i$, where rc is the number of cells in the table. Under H0 it has approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom. To ensure that the sample size is large enough to make this approximation valid, no cell in the table should have an expected count less than 1, and no more than 20% of the cells should have an expected count less than 5.
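The statistic and the sample-size rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names and the counts are hypothetical, and the table is passed as flat lists of observed and expected counts.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells, given flat lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts_adequate(expected):
    """Rule of thumb from the slide: no expected count below 1,
    and no more than 20% of the cells with an expected count below 5."""
    if min(expected) < 1:
        return False
    small_cells = sum(1 for e in expected if e < 5)
    return small_cells <= 0.20 * len(expected)


# Hypothetical 2 x 2 table, flattened row by row (not data from the slides).
observed = [20, 30, 30, 20]
expected = [25.0, 25.0, 25.0, 25.0]

stat = chi_square_statistic(observed, expected)
ok = expected_counts_adequate(expected)
print(stat, ok)
```

Note that in a 2 × 2 table a single cell with an expected count below 5 already exceeds the 20% limit, since one cell is 25% of the table.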

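How to compute the expected values?
Under H0, the expected count for a cell is its row total multiplied by its column total, divided by the table total n.
[Tables of observed (O) and expected (E) counts not preserved in the transcript]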
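As a rough sketch of the expected-count calculation, each expected count is the row total times the column total divided by the grand total. The helper name `expected_table` and the counts below are illustrative assumptions, not values from the slides.

```python
def expected_table(observed):
    """Expected counts under independence for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E for a cell = (its row total * its column total) / n
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts (not the bicycle-helmet data).
observed = [[10, 20],
            [30, 40]]
expected = expected_table(observed)
print(expected)
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.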

Chi-square distributions
A chi-square random variable cannot be negative; it assumes values from zero to infinity, and its distribution is skewed to the right.
[Figure: chi-square distributions with 2, 4, and 10 degrees of freedom]

Yates correction
We are using discrete observations to estimate χ², a continuous distribution. The approximation is quite good when the degrees of freedom are large. For a 2 × 2 table we can apply a continuity correction (Yates correction): $\chi^2 = \sum_{i=1}^{4} (|O_i - E_i| - 0.5)^2 / E_i$.

TS formula for a 2 × 2 table
For a 2 × 2 table in the general format shown below:

             Column 1   Column 2   Total
  Row 1         a          b       a + b
  Row 2         c          d       c + d
  Total       a + c      b + d       n

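Another TS formula for a 2 × 2 table
With cell counts a and b in the first row, c and d in the second row, and total n, the test statistic (TS) χ² without continuity correction can be expressed as $\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$. The test statistic with continuity correction can be expressed as $\chi^2 = \frac{n(|ad - bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)}$.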
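The shortcut formulas for a 2 × 2 table can be sketched as follows. The function names and counts are hypothetical. The p-value helper relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper tail equals `erfc(sqrt(x / 2))`. Clamping the corrected difference at zero is a guard added here, not something stated on the slide.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; clamped at 0 so it never overshoots (added guard).
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(stat):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts (not the bicycle-helmet data).
plain = chi_square_2x2(20, 30, 30, 20)
corrected = chi_square_2x2(20, 30, 30, 20, yates=True)
print(plain, p_value_1df(plain))
print(corrected, p_value_1df(corrected))
```

The corrected statistic is never larger than the uncorrected one, so the Yates version gives the more conservative p-value.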

Example
For the bicycle example discussed earlier, applying the Yates correction gives a p-value smaller than 0.001.

Fisher’s exact test
When the sample size is small, one can use Fisher’s exact test to obtain the exact probability of the observed frequencies in the contingency table, given that there is no association between the rows and columns and that the marginal totals are fixed. The details of this test are not presented here because the computations involved can be arduous.
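For small tables the computation is manageable by machine. The sketch below assumes the usual hypergeometric form of the probability of a 2 × 2 table with fixed margins, and one common definition of the two-sided p-value (summing the probabilities of all tables no more likely than the observed one). The function names and counts are illustrative.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2 x 2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p
    return total


# Hypothetical small table.
p_table = table_probability(3, 1, 1, 3)
p_two_sided = fisher_exact_two_sided(3, 1, 1, 3)
print(p_table, p_two_sided)
```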

McNemar’s Test
We cannot use the regular chi-square test for matched data, because it disregards the paired nature of the data. We must take the pairing into account in our analysis. Consider a 2 × 2 table of observed cell counts of exposure status for a sample of n matched case-control pairs.

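McNemar’s Test
If the data of interest in the contingency table are paired rather than independent, we use McNemar’s test to evaluate hypotheses about the data. To conduct McNemar’s test for matched pairs, we calculate the test statistic (with continuity correction) $\chi^2 = (|b - c| - 1)^2 / (b + c)$, where b and c are the numbers of discordant pairs. Under H0 the statistic has approximately a chi-square distribution with 1 degree of freedom.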
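A minimal sketch of the McNemar statistic, assuming the continuity-corrected form and a 1-degree-of-freedom chi-square reference distribution. The discordant-pair counts below are hypothetical, not those of the MI study.

```python
import math


def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar statistic from the two discordant-pair counts."""
    # Clamp at 0 so the correction cannot create a difference when b == c (added guard).
    diff = max(abs(b - c) - 1, 0)
    return diff ** 2 / (b + c)


def p_value_1df(stat):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical discordant-pair counts.
stat = mcnemar_statistic(25, 10)
p = p_value_1df(stat)
print(stat, p)
```

Only the discordant pairs enter the calculation; pairs in which case and control share the same exposure status carry no information about the association.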

Independent vs dependent data
[Two tables, titled "Independent Data" and "Dependent Data", not preserved in the transcript]

Example: matched pairs
Consider the following data taken from a study investigating acute myocardial infarction (MI) among Navajos in the US. In this study, 144 MI cases were age- and gender-matched with 144 individuals free of heart disease.
[Two tables, titled "Independent Data" and "Dependent Data", not preserved in the transcript]

Example: matched pairs
The test statistic gives 0.001 < p < 0.01. Since p is less than α = 0.05, we reject the null hypothesis. For the given population of Navajos, we conclude that victims of acute MI are more likely to suffer from diabetes than individuals free of heart disease who have been matched on age and gender.

Strength of the association
The chi-square test allows us to determine whether an association exists between two independent nominal variables. McNemar’s test does the same thing for paired dichotomous variables. However, neither test provides a measure of the strength of the association.

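The Odds Ratio
If an event occurs with probability p, the odds in favor of the event are p/(1−p) to 1. The odds ratio (OR) compares the odds of the event in two groups. For a 2 × 2 table with cell counts a and b in the first row and c and d in the second row, we can express an estimator of the OR as the cross-product ratio $\widehat{OR} = \frac{ad}{bc}$.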
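The odds and the cross-product estimate can be sketched as below. The layout of the table (cells a, b in the first row and c, d in the second) and the counts are assumptions made for illustration.

```python
def odds(p):
    """Odds in favor of an event that occurs with probability p (0 <= p < 1)."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / (bc) for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Hypothetical counts.
print(odds(0.75))                   # odds of 3 to 1
print(odds_ratio(20, 30, 30, 20))
```

An OR of 1 corresponds to no association; values below 1 and above 1 indicate associations in opposite directions.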

The Confidence Interval (CI) for Odds Ratio
The cross-product ratio is simply a point estimate of the strength of association between two dichotomous variables. To gauge the uncertainty in this estimate, we must calculate a confidence interval (CI) as well; the width of the interval reflects the amount of variability in the estimate of the OR.

The Confidence Interval (CI) for Odds Ratio
When computing a CI for the OR, we must make the assumption of normality. However, the probability distribution of the OR is skewed to the right, and the relative odds can be any positive value between 0 and infinity. In contrast, the probability distribution of the natural logarithm of the OR, i.e. ln(OR), is more symmetric and approximately normal. Therefore, when calculating a CI for the OR, we typically work on the log scale.

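The Confidence Interval (CI) for Odds Ratio
In addition, to ensure that the sample size is large enough, the expected value of each cell in the contingency table should be at least 5. A 95% CI for the natural logarithm of the OR is $\ln(\widehat{OR}) \pm 1.96\,\widehat{se}[\ln(\widehat{OR})]$, where $\widehat{se}[\ln(\widehat{OR})] = \sqrt{1/a + 1/b + 1/c + 1/d}$. A 95% CI for the OR itself is $\left(e^{\ln(\widehat{OR}) - 1.96\,\widehat{se}[\ln(\widehat{OR})]},\ e^{\ln(\widehat{OR}) + 1.96\,\widehat{se}[\ln(\widehat{OR})]}\right)$.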
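A short sketch of the log-scale interval, assuming the standard large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d) for ln(OR). The function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Point estimate and approximate 95% CI for the OR of the table [[a, b], [c, d]]."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_hat)
    # Build the interval on the log scale, then exponentiate the limits.
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return or_hat, lower, upper


# Hypothetical counts (not the bicycle-helmet data).
or_hat, lower, upper = odds_ratio_ci(20, 30, 30, 20)
print(or_hat, lower, upper)
```

Because the limits are exponentiated, the interval is symmetric around the estimate only on the log scale. If the interval excludes 1, the conclusion agrees with rejecting the null hypothesis of no association at the 0.05 level.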

Bicycle Example
We reject the null hypothesis and conclude that wearing a safety helmet at the time of the accident protects against head injury.

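The Odds Ratio and 95% CI for matched pairs
An OR can be calculated to estimate the strength of association between two paired dichotomous variables. It is the ratio of the two discordant-pair counts, $\widehat{OR} = b/c$. A 95% CI for the OR itself is $\exp\left[\ln(b/c) \pm 1.96\sqrt{(b + c)/(bc)}\right]$.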
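For matched pairs, a sketch under the usual assumption that the OR is estimated by the ratio of the discordant-pair counts, with standard error sqrt((b + c) / (bc)) for its logarithm. Which count goes in the numerator depends on how the paired table is laid out, so the orientation here is an assumption, as are the counts.

```python
import math


def matched_pairs_or_ci(b, c, z=1.96):
    """OR estimate b / c and approximate 95% CI from the discordant-pair counts."""
    or_hat = b / c
    se_log = math.sqrt((b + c) / (b * c))
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Hypothetical discordant-pair counts (not those of the MI study).
or_hat, lower, upper = matched_pairs_or_ci(25, 10)
print(or_hat, lower, upper)
```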

MI and DM example

Berkson’s Fallacy
Berkson’s fallacy is a common type of bias in case-control studies, particularly hospital-based and practice-based studies. It arises from differential admission rates between cases and controls. This leads to positive (and spurious) associations between exposure and the case-control status with the lowest admission rate.

Example: Berkson’s Fallacy
Hospitalized patients: individuals who have a disease of the circulatory system are more likely to suffer from respiratory illness than individuals who do not.
Hospitalized patients + nonhospitalized subjects: there is no association between the two diseases.

What happened?
Why do the conclusions drawn from these two samples differ so drastically? To answer this question, we must consider the rates of hospitalization that occur within each of the four disease subgroups.

What happened?
Individuals with both circulatory and respiratory disease are more likely to be hospitalized than individuals in any of the three other subgroups. Subjects with circulatory disease are more likely to be hospitalized than those with respiratory illness. Therefore, the conclusions will be biased if we only sample patients who are hospitalized.
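The mechanism can be illustrated with a small numerical sketch. Every number below is hypothetical: a population in which the two diseases are independent (OR = 1), and hospitalization rates chosen so that the subgroup with both diseases is admitted most often and circulatory disease alone is admitted more often than respiratory illness alone.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / (bc)."""
    return (a * d) / (b * c)


def table_or(counts):
    """OR for circulatory disease versus respiratory illness from subgroup counts."""
    return odds_ratio(counts["both"], counts["circulatory_only"],
                      counts["respiratory_only"], counts["neither"])


# Hypothetical population in which the two diseases are independent.
population = {"both": 100, "circulatory_only": 900,
              "respiratory_only": 900, "neither": 8100}

# Hypothetical hospitalization rates for the four disease subgroups.
hospitalization_rate = {"both": 0.50, "circulatory_only": 0.20,
                        "respiratory_only": 0.10, "neither": 0.05}

# Expected number of hospitalized individuals in each subgroup.
hospitalized = {k: population[k] * hospitalization_rate[k] for k in population}

or_population = table_or(population)
or_hospitalized = table_or(hospitalized)
print(or_population, or_hospitalized)
```

In this sketch the population OR is 1, yet the hospitalized subsample shows an OR above 1, purely because of how the sample was selected.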

What’s the lesson?
We observe an association that does not actually exist. This kind of spurious relationship among variables, which is evident only because of the way in which the sample was chosen, is known as Berkson’s fallacy. The sample must be representative.