Biostatistics course Part 12 Association between two categorical variables Dr. Sc. Nicolas Padilla Raygoza Department of Nursing and Obstetrics Division.


Biostatistics course, Part 12: Association between two categorical variables. Dr. Sc. Nicolas Padilla Raygoza, Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, Campus Celaya-Salvatierra, University of Guanajuato, Mexico

Biosketch Medical Doctor, Autonomous University of Guadalajara. Pediatrician certified by the Mexican Council of Certification in Pediatrics. Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London. Master of Science in Epidemiology, Atlantic International University. Doctor of Science in Epidemiology, Atlantic International University. Associate Professor B, Department of Nursing and Obstetrics, Division of Health Sciences, University of Guanajuato, Campus Celaya-Salvatierra.

Competencies The reader will analyze the relationship between two categorical variables with two or more categories, will apply the Chi-squared test, and will know the Chi-squared test for trend and when to apply it.

Introduction In Part 3, we learned how to tabulate a frequency distribution for a categorical variable. Such a table shows how individuals are distributed across the categories of a variable. For example, in a rural community in Celaya, a random sample of 200 people were asked about their socioeconomic status.

Introduction The table shows the distribution of individuals across the categories of the Socioeconomic Index Level (SEIL).
SEIL       n     %
Low        50    25
Regular    110   55
High       40    20
Total      200   100

Introduction To examine the relationship between two categorical variables, we tabulate one against the other. The result is a two-way table, or cross-tabulation.
SEIL       South   Center   North
Low        33      7        10
Regular    9       81       20
High       2       8        30
Total      44      96       60

Interpretation of a two-way table There is an association between two categorical variables if the distribution of one variable varies according to the value of the other. The question we are interested in is: does the SEIL vary by place of residence? To answer this question we need to examine a cross-tabulation.

Interpretation of a two-way table To compare the distributions in the table, we need to consider percentages. To answer the question of interest, should we consider column percentages or row percentages?
Table layout: rows SEIL (Low, Regular, High, Total); columns place of residence (South, Center, North), each with n and %.
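As a hedged illustration of the comparison, the sketch below computes column percentages, that is, the distribution of SEIL within each place of residence. Choosing column percentages is an assumption made here because the question asks whether SEIL varies by place; the counts are those read from the cross-tabulation above, and the function name `column_percentages` is hypothetical.

```python
# Observed counts as read from the cross-tabulation above
# (rows: SEIL level; columns: place of residence).
places = ["South", "Center", "North"]
observed = {
    "Low":     [33, 7, 10],
    "Regular": [9, 81, 20],
    "High":    [2, 8, 30],
}


def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        level: [100 * row[j] / col_totals[j] for j in range(n_cols)]
        for level, row in table.items()
    }


percentages = column_percentages(observed)
for level, row in percentages.items():
    cells = ", ".join(f"{place}: {value:.1f}%" for place, value in zip(places, row))
    print(f"{level:8s} {cells}")
```

If the three columns show clearly different percentage profiles, that suggests the distribution of SEIL depends on the place of residence.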

Expected frequencies If the null hypothesis is true, that is, if there is no association between SEIL and area of residence, the percentages for each SEIL level in each area should be the same as the percentages in the Total column.

Example of expected frequencies In the total sample, 50 people (25%) have low SEIL. If the null hypothesis is true, we should expect 25% of the people living in the Center to have low SEIL: 25% of 96 = 24.

Interpretation of a two-way table
Table layout: rows SEIL (Low, Regular, High, Total); columns place of residence (South, Center, North) and Total, each with n and %.

Example of expected frequencies If there are no differences in the distribution of SEIL by place of residence, we should expect the percentage of people with low SEIL to be the same in each place of residence. Note that the expected frequencies do not have to be integers. Using the column and row totals, we can calculate the expected number in each cell: expected = (row total x column total) / overall total.
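A minimal sketch of this calculation, using the SEIL by place of residence counts read from the cross-tabulation earlier in the transcript. The function name `expected_frequencies` is hypothetical and the table is entered by hand.

```python
# Observed counts (rows: Low, Regular, High; columns: South, Center, North).
observed = [
    [33, 7, 10],   # Low
    [9, 81, 20],   # Regular
    [2, 8, 30],    # High
]


def expected_frequencies(table):
    """Expected count per cell under no association:
    row total * column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["Low", "Regular", "High"], expected):
    print(label, [round(e, 1) for e in row])
```

The Low/Center cell comes out as 50 x 96 / 200 = 24, matching the worked example, and several other cells (for instance 24.2) are not integers.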

Chi-squared test Expected frequencies are those that we should expect if the null hypothesis were true. To test the null hypothesis, we compare the expected frequencies with the observed frequencies, using the following formula, where O is the observed and E the expected frequency in each cell and the sum runs over all cells:
$$X^2 = \sum \frac{(O - E)^2}{E}$$

Chi-squared test From the formula we can see that:
If there is a large difference between the observed and expected values, X² will be large.
If there is a small difference between the observed and expected values, X² will be small.
If X² is large, the data do not support the null hypothesis, because the observed values are not what we would expect under the null hypothesis.
If X² is small, the data are consistent with the null hypothesis, because the observed values are similar to those expected under it.

Chi-squared test
Table layout: rows SEIL (Low, Regular, High, Total); columns place of residence (South, Center, North), each with observed (O) and expected (E) frequencies, plus a Total (n) column.

Chi-squared test
Table layout: one row for each combination of SEIL (Low, Regular, High) and place of residence (South, Center, North); columns Observed, Expected, O - E, (O - E)², (O - E)²/E.
Total of (O - E)²/E: 138.1
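The cell-by-cell calculation can be sketched as follows. This is an illustration under stated assumptions, not the slide's own working: the counts are those read from the SEIL by place of residence cross-tabulation, the function name `chi_squared` is hypothetical, and the degrees of freedom use the standard (rows - 1) x (columns - 1) rule, which the slides do not state for this table.

```python
# Observed counts (rows: Low, Regular, High; columns: South, Center, North).
observed = [
    [33, 7, 10],
    [9, 81, 20],
    [2, 8, 30],
]


def chi_squared(table):
    """Return (X^2, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count for this cell
            statistic += (o - e) ** 2 / e           # this cell's contribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = chi_squared(observed)
print(f"X^2 = {statistic:.1f} on {df} degrees of freedom")
```

With the counts as read from the transcript, this sketch gives X² of about 126.2 on 4 degrees of freedom. The slide's table reports a total of 138.1; its cell values did not survive in the transcript, so the difference cannot be traced here. Either value is very large for 4 degrees of freedom.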

Chi-squared test in 2 x 2 tables When both variables are binary, the cross-tabulation becomes a 2 x 2 table. The X² test is applied in the same way as for a larger table.

Example A study compared the bacteriological efficacy of clarithromycin and penicillin in children with acute pharyngotonsillitis caused by group A beta-hemolytic Streptococcus. The results are shown below.
Table layout: rows Drug (Clarithromycin, Penicillin, Total); columns Cure, Not cure, Total.

Example To use the Chi-squared test, we must state the null hypothesis. In this case: there is no difference in bacteriological efficacy between the two treatments against group A beta-hemolytic Streptococcus. To test the null hypothesis, we first calculate the expected number in each cell of the table.
Table layout: rows Drug (Clarithromycin, Penicillin, Total); columns Cure and Not cure, each with observed (O) and expected (E) frequencies, plus Total.

Example
Table layout: one row for each combination of Drug (Clarithromycin, Penicillin) and Effect (Cure, Not cure); columns Observed, Expected, O - E, (O - E)², (O - E)²/E.
Total of (O - E)²/E: 3.47

A quick formula for 2 x 2 tables X² can be calculated directly from the observed frequencies in the table and the marginal totals. Label the cells and marginal totals as follows:
Exposure   Result Yes   Result No   Total
Yes        a            b           a + b
No         c            d           c + d
Total      a + c        b + d       N
Then:
$$X^2 = \frac{(ad - bc)^2 \, N}{(a + b)(c + d)(a + c)(b + d)}$$
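A short sketch of the quick formula follows. The counts a = 20, b = 10, c = 10, d = 20 are hypothetical, chosen only for illustration (the clarithromycin/penicillin counts did not survive in the transcript), and the function name `chi_squared_2x2` is also hypothetical.

```python
def chi_squared_2x2(a, b, c, d):
    """Quick X^2 for a 2 x 2 table with cells
    [[a, b], [c, d]], using only the cells and marginal totals."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts, for illustration only.
x2 = chi_squared_2x2(20, 10, 10, 20)
print(f"X^2 = {x2:.2f}")
```

For these counts every expected frequency is 15, so the cell-by-cell sum is 4 x (5² / 15), which agrees with the quick formula.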

Trend test in 2 x c tables We have used the Chi-squared test to evaluate whether two categorical variables are associated in the population. When one variable is binary and the other is ordered categorical (ordinal), we may want to check whether their association follows a trend.

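Trend test in 2 x c tables
Table layout: rows Hypertension, Without hypertension, Total; columns SEIL (Low, Regular, High), each with observed (O) and expected (E) frequencies, plus Total.
Table layout: one row for each combination of Hypertension (Yes, No) and SEIL (Low, Regular, High); columns Observed, Expected, O - E, (O - E)², (O - E)²/E.
Total of (O - E)²/E: 27.2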

Trend test in 2 x c tables To calculate this test, assign a numerical score to each socioeconomic group.
Table layout: rows Hypertension, Without hypertension, Total; columns SEIL (Low, Regular, High) and Total.

Chi-squared test for trend We use the Chi-squared test for trend to assess whether a binary variable varies linearly across the levels of another variable, that is, whether there is a dose-response effect. The null hypothesis for this test is that the mean scores in the two groups (defined by the binary variable) are the same. The Chi-squared test thus becomes a comparison of two means, which is why it has only one degree of freedom.

Chi-squared test for trend
$$X^2 = \frac{\left(\bar{X}_{\text{Yes}} - \bar{X}_{\text{No}}\right)^2}{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$\bar{X}_{\text{Yes}}$ = mean score in the hypertension group
$\bar{X}_{\text{No}}$ = mean score in the non-hypertension group
$n_1$ = total number of people in the hypertension group
$n_2$ = total number of people in the non-hypertension group
$s$ = standard deviation of the overall scores from both groups
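The trend statistic can be sketched as below. Several things here are assumptions: the scores 1, 2, 3 for Low, Regular, High; the hypertension counts, which are hypothetical because the slide's values did not survive in the transcript; the function name `trend_chi_squared`; and the use of the n - 1 denominator for the overall standard deviation, which the slide does not specify.

```python
def trend_chi_squared(scores, counts_yes, counts_no):
    """X^2 for trend: squared difference of mean scores between the two
    groups, divided by s^2 * (1/n1 + 1/n2), with s^2 from all scores pooled."""
    n1 = sum(counts_yes)
    n2 = sum(counts_no)
    mean_yes = sum(s * c for s, c in zip(scores, counts_yes)) / n1
    mean_no = sum(s * c for s, c in zip(scores, counts_no)) / n2

    # Overall variance of the scores across both groups (n - 1 denominator,
    # an assumption of this sketch).
    totals = [y + n for y, n in zip(counts_yes, counts_no)]
    n = n1 + n2
    overall_mean = sum(s * t for s, t in zip(scores, totals)) / n
    variance = sum(t * (s - overall_mean) ** 2 for s, t in zip(scores, totals)) / (n - 1)

    return (mean_yes - mean_no) ** 2 / (variance * (1 / n1 + 1 / n2))


# Hypothetical counts by SEIL (Low, Regular, High), for illustration only.
scores = [1, 2, 3]
hypertension = [30, 20, 10]
no_hypertension = [20, 30, 40]

x2_trend = trend_chi_squared(scores, hypertension, no_hypertension)
print(f"X^2 for trend = {x2_trend:.2f} on 1 degree of freedom")
```

Equally spaced scores treat the step from Low to Regular as equal to the step from Regular to High; other scores could be used if the categories are not thought to be evenly spaced.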

Validity of Chi-squared tests The Chi-squared tests that we reviewed are based on the assumption that the test statistic approximately follows the Chi-squared distribution. This is reasonable for large samples, but for small samples the following guidelines should be used:
For 2 x 2 tables
If the total sample size is greater than 40, X² can be used.
If n is between 20 and 40 and the smallest expected value is at least 5, X² can be used.
Otherwise, use Fisher's exact test.
For 2 x c tables
The X² test is valid if no more than 20% of the expected values are less than 5 and none is less than 1.
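These guidelines can be written as a small check. This is a sketch, assuming the table is given as a list of rows of observed counts; the function name `chi_squared_is_reasonable` is hypothetical, and applying the second rule to every table that is not 2 x 2 is an assumption of the sketch.

```python
def chi_squared_is_reasonable(table):
    """Apply the sample-size guidelines above to a table of observed counts.
    Returns True if the X^2 approximation is considered acceptable."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]

    if len(row_totals) == 2 and len(col_totals) == 2:
        if n > 40:
            return True
        # Between 20 and 40: acceptable only if every expected value is >= 5.
        # Anything else calls for Fisher's exact test.
        return n >= 20 and min(expected) >= 5

    # Larger tables: at most 20% of expected values below 5, none below 1.
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 <= 0.20 and min(expected) >= 1


print(chi_squared_is_reasonable([[20, 10], [10, 20]]))   # n = 60
print(chi_squared_is_reasonable([[3, 2], [1, 4]]))       # n = 10
print(chi_squared_is_reasonable([[33, 7, 10], [9, 81, 20], [2, 8, 30]]))
```

When the check returns False for a 2 x 2 table, the guideline above points to Fisher's exact test instead.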
