Biostatistics course Part 13 Effect measures in 2 x 2 tables Dr. Sc. Nicolas Padilla Raygoza Department of Nursing and Obstetrics Division Health Sciences and Engineering University of Guanajuato Campus Celaya-Salvatierra
Biosketch Medical Doctor, Autonomous University of Guadalajara. Pediatrician certified by the Mexican Council of Certification in Pediatrics. Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London. Master of Science with emphasis in Epidemiology, Atlantic International University. Doctor of Science with emphasis in Epidemiology, Atlantic International University. Associate Professor B, Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, University of Guanajuato, Campus Celaya-Salvatierra, Mexico. firstname.lastname@example.org
Competencies The reader will obtain the Risk Ratio or Odds Ratio from a 2 x 2 table. He (she) will calculate the 95% confidence interval for the RR or OR. He (she) will identify potential confounders and/or interactions. He (she) will apply the Mantel-Haenszel method to obtain the adjusted RR, OR and Chi-squared.
Introduction In part 12 of the course, we tested the association between two categorical variables. Now, we review the methods used to measure the association. We will work with binary variables, so we will use 2 x 2 tables.
Example A nurse in a poor area of Mexico was informed that many local children attending the nursery were sick with respiratory infections. She designed a cohort study to investigate the problem. Over the following years, 1000 children were followed. The main research question was: Is attending the nursery associated with respiratory infection?
Example
                      Respiratory infection
Attending nursery     Yes, n (%)     No, n (%)      Total
Yes                   37 (33.9)      72 (66.1)      109
No                    43 (4.8)       848 (95.2)     891
Total                 80 (8.0)       920 (92.0)     1000
Risk Ratio (RR) In health research, the term "risk" is used instead of proportion. For example: the risk of infection among children attending day care was 33.9%. The risk ratio is therefore the ratio of two proportions.
Risk of respiratory infection among children attending the nursery: 37 / (37 + 72) = 37/109 = 0.339
Risk of respiratory infection among children not attending the nursery: 43 / (43 + 848) = 43/891 = 0.048
The risk ratio (RR) is the ratio of these two risks: Risk ratio = 0.339 / 0.048 = 7.06
Risk Ratio (RR) In general, the risk ratio can be obtained with the following formula, where a, b, c and d are the frequencies in the 2 x 2 table.
               Outcome
Exposure       Yes       No        Total
Yes            a         b         a + b
No             c         d         c + d
Total          a + c     b + d     N
$$RR = \frac{a/(a+b)}{c/(c+d)}$$
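As a minimal sketch, the general formula can be written as a small Python function. The name `risk_ratio` is illustrative and not part of the course material. Without intermediate rounding the nursery data give about 7.03; the 7.06 in the slides comes from dividing the already rounded risks 0.339 and 0.048.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2 x 2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Nursery attendance (exposure) and respiratory infection (outcome)
rr = risk_ratio(37, 72, 43, 848)
print(f"RR = {rr:.2f}")
```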
Odds Ratio (OR) The Odds Ratio (OR) is the ratio of the odds of the outcome among the exposed to the odds of the outcome among the non-exposed. The odds of an outcome are the number with the outcome divided by the number without it.
Odds of infection among children attending the nursery: 37 / 72 = 0.514
Odds of infection among children not attending the nursery: 43 / 848 = 0.051
The Odds Ratio is the ratio of these two odds: OR = 0.514 / 0.051 = 10.08
In general, the Odds Ratio is found with the following formula:
$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
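The same calculation can be sketched in Python using the cross-product form ad / bc. The function name `odds_ratio` is an assumption made for illustration. Computed without rounding the two odds first, the result is about 10.13 rather than the 10.08 obtained from the rounded values 0.514 and 0.051.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table, as the cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# Nursery attendance (exposure) and respiratory infection (outcome)
or_estimate = odds_ratio(37, 72, 43, 848)
print(f"OR = {or_estimate:.2f}")
```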
Confidence intervals In the analysis of data from children attending day care or not, we can use either the RR or the OR to measure the effect of nursery attendance. Each value is only an estimate, so it should be reported with a confidence interval. An approximate 95% confidence interval for the RR is found from the error factor (EF):
Minimum value: RR / EF
Maximum value: RR x EF
$$EF = \exp\left(1.96\sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}}\right)$$
Confidence intervals
The CI for the data on children who attend day care or not is:
$$EF = \exp\left(1.96\sqrt{\frac{1}{37} - \frac{1}{109} + \frac{1}{43} - \frac{1}{891}}\right) = 1.48$$
RR = 7.06
Minimum value: 7.06 / 1.48 = 4.77
Maximum value: 7.06 x 1.48 = 10.45
95% CI = 4.77 to 10.45
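The worked example can be checked with a short Python sketch. It reads the error factor with a square root over the sum of reciprocals, which is what reproduces EF = 1.48. The function name `error_factor_rr` and the `z` parameter are illustrative assumptions, and the RR is taken as the rounded 7.06 reported in the slides.

```python
import math


def error_factor_rr(a, b, c, d, z=1.96):
    """Error factor for the RR: exp(z * sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)))."""
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(z * se_log_rr)


rr = 7.06  # risk ratio as reported in the slides
ef = error_factor_rr(37, 72, 43, 848)
lower, upper = rr / ef, rr * ef
print(f"EF = {ef:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```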
Confidence intervals An approximate 95% confidence interval for the OR is found using the following formula:
Minimum value: OR / EF
Maximum value: OR x EF
$$EF = \exp\left(1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$$
Confidence intervals
The CI for the data on children who attend day care or not is:
$$EF = \exp\left(1.96\sqrt{\frac{1}{37} + \frac{1}{72} + \frac{1}{43} + \frac{1}{848}}\right) = 1.65$$
OR = 10.08
Minimum value: 10.08 / 1.65 = 6.11
Maximum value: 10.08 x 1.65 = 16.63
95% CI = 6.11 to 16.63
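A corresponding sketch for the OR follows, again reading the error factor with a square root over the sum of the four reciprocals. The name `error_factor_or` is hypothetical, and the OR is the rounded 10.08 from the slides. Because the slides round EF to 1.65 before multiplying, the upper limit computed here can differ from 16.63 in the second decimal.

```python
import math


def error_factor_or(a, b, c, d, z=1.96):
    """Error factor for the OR: exp(z * sqrt(1/a + 1/b + 1/c + 1/d))."""
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(z * se_log_or)


or_estimate = 10.08  # odds ratio as reported in the slides
ef = error_factor_or(37, 72, 43, 848)
lower, upper = or_estimate / ef, or_estimate * ef
print(f"EF = {ef:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```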
Which measure is best? Risk Ratios are calculated for cross-sectional and cohort studies. The formula for the 95% confidence interval for the RR requires larger sample sizes than that for the OR. ORs are calculated for case-control and cross-sectional studies. In case-control studies it is not possible to calculate risks, and therefore the RR cannot be calculated. There is an advantage in using the OR: it is a consistent measure of effect, unlike the RR.
Example (Cont…) The data on Mexican children showed a strong association between exposure (attending nursery) and outcome (respiratory infection). However, such an association may be confounded by other factor(s). For example, although children who attend day care seem to have a 7 times higher risk of respiratory infection, the cause of the infection may be something else that is associated with attending day care. In other words, attending the nursery may be a marker of another exposure that causes respiratory infection. If this is true, we say that the association between respiratory infection and nursery attendance is confounded.
How do we identify a potential confounder? To evaluate a potential confounder, we should consider three elements: the exposure, the outcome, and the confounder.
Example The nurse is interested in the association between day care attendance and presence of respiratory infection, but is aware that children might be exposed to other factors that cause respiratory infection. For example, overcrowding at home is a risk factor for respiratory infection. It is therefore a potential confounder of the association between attendance at day care and respiratory infections.
Confounders For a variable to be a potential confounder, it must meet three conditions. It must: be an independent risk factor for the outcome of interest; be associated with the exposure of interest; not be on the causal pathway between exposure and outcome.
Confounders
How do we check these conditions in the study of Mexican children?
Condition 1 for confounding: risk factor for the outcome of interest. Is there an association between overcrowding and respiratory infection (RI)?
Overcrowding at home    RI Yes    RI No    Risk of RI
Yes                     54        55       54/109 = 0.5
No                      21        870      21/891 = 0.02
RR = 25, 95% CI = 15.72 to 39.75, X² = 311.67, P << 0.05
Confounders
How do we check these conditions in the study of Mexican children?
Condition 2 for confounding: association with the exposure. Is there an association between overcrowding and nursery attendance?
Overcrowding at home    Attending nursery: Yes    Attending nursery: No
Yes                     43                        66
No                      35                        856
X² = 170.39, P << 0.05
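The slides do not say how X² was computed. As a sketch under that caveat, the uncorrected Pearson statistic for a 2 x 2 table, n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)], reproduces the values reported for conditions 1 and 2 to within rounding. The function name `chi2_2x2` is an illustrative assumption.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for a 2 x 2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Condition 1: overcrowding vs respiratory infection
print(f"X2 = {chi2_2x2(54, 55, 21, 870):.2f}")
# Condition 2: overcrowding vs nursery attendance
print(f"X2 = {chi2_2x2(43, 66, 35, 856):.2f}")
```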
Confounders How do we check these conditions in the study of Mexican children? Condition 3 for confounding: Is the potential confounder on the causal pathway? In this example, it is unlikely that overcrowding lies on the causal pathway between nursery attendance and respiratory infection, because attending the nursery does not cause overcrowding at home.
Do we have a confounder? In this study, overcrowding has satisfied the three conditions necessary for a confounding variable:
It is an independent risk factor for the outcome of interest. Overcrowding is associated with respiratory infection.
It is associated with the exposure of interest. Overcrowding is associated with attendance at the nursery.
It is not on the causal pathway. Overcrowding at home is unlikely to be a consequence of attendance at the nursery.
Stratified tables We now know that the data must be analyzed further to take the effect of overcrowding into account. To adjust for the confounding variable, we stratify the 2 x 2 table of interest. The unstratified table is called the raw (crude) table. It can be divided into strata defined by the confounding variable: the sample is split into two groups, within each of which the overcrowding status is the same. The two groups are: with overcrowding and without overcrowding.
Stratified tables
We want to find out whether nursery attendance is associated with respiratory infection when comparing children within the same category of overcrowding. The raw table for the relationship between respiratory infection and nursery attendance is:
                      Respiratory infection
Attending nursery     Yes, n (%)     No, n (%)      Total
Yes                   37 (33.9)      72 (66.1)      109
No                    43 (4.8)       848 (95.2)     891
Total                 80 (8.0)       920 (92.0)     1000
Stratified tables
The tables stratified by overcrowding status are shown below.

Overcrowding
               Infection Yes    Infection No    Total
Nursery Yes    61               14              75
Nursery No     5                21              26
Total          66               35              101
RR = 4.23, 95% CI 1.91 to 9.37, X² = 32.88, p < 0.0001

Without overcrowding
               Infection Yes    Infection No    Total
Nursery Yes    10               24              34
Nursery No     4                861             865
Total          14               885             899
RR = 63.6, 95% CI 21.01 to 192.56, X² = 178.84, p < 0.0001
Stratified tables Do you think that attendance at nursery is a risk factor for respiratory infection among children without overcrowding? Yes, children attending day care have about 63 times the risk of respiratory infection of those who do not attend nursery (RR = 63.6). The p value indicates a strong association between attendance at day care and respiratory infection in the group without overcrowding.
Stratified tables Do you think that attendance at nursery is a risk factor for respiratory infection in the group with overcrowding? Yes, children attending day care have more than 4 times the risk of respiratory infection of those not attending the nursery (RR = 4.23). The p value indicates a strong association between attendance at day care and respiratory infection in this group. Within each stratum, the association between attendance at day care and respiratory infection is now independent of overcrowding at home.
Comparison of results
How do these results compare with those of the raw table? The raw table shows a strong relationship between attendance at day care and respiratory infection. The RR differs between the two stratified tables, but the association remains statistically significant in both.
                        RR      95% CI             X²        P-value
Raw                     7.06    4.77 to 10.45      111.88    <0.05
Overcrowding            4.23    1.91 to 9.37       32.88     <0.05
Without overcrowding    63.6    21.01 to 192.56    178.84    <0.05
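The comparison can be reproduced with a short Python sketch that applies the RR, its error-factor confidence interval, and an uncorrected Pearson X² to each table. The helper name `summarize` is hypothetical, and the choice of the uncorrected X² is an assumption that happens to match the reported values. The raw row comes out slightly lower (RR about 7.03) because the slides divide rounded risks.

```python
import math


def summarize(a, b, c, d, z=1.96):
    """Return (RR, lower, upper, X2) for one 2 x 2 table."""
    n = a + b + c + d
    rr = (a / (a + b)) / (c / (c + d))
    ef = math.exp(z * math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return rr, rr / ef, rr * ef, chi2


# (a, b, c, d) = nursery yes/infected, nursery yes/not, nursery no/infected, nursery no/not
tables = {
    "Raw": (37, 72, 43, 848),
    "Overcrowding": (61, 14, 5, 21),
    "Without overcrowding": (10, 24, 4, 861),
}
for name, cells in tables.items():
    rr, low, high, chi2 = summarize(*cells)
    print(f"{name:22s} RR = {rr:6.2f}  95% CI = {low:.2f} to {high:.2f}  X2 = {chi2:.2f}")
```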
Adjusted Risk Ratios
The nurse does not want to present the data divided into strata; she prefers a global estimate of the effect of nursery attendance on respiratory infection, adjusted for overcrowding. This can be done by calculating the RR with the Mantel-Haenszel method. First, look at the 2 x 2 table in each stratum e:
Exposure    Disease Yes    Disease No    Total
Yes         a_e            b_e
No          c_e            d_e
Total                                    n_e
Risk Ratios from Mantel-Haenszel
The adjusted (summary) RR can be obtained with:
$$RR_{MH} = \frac{\sum_e a_e (c_e + d_e)/n_e}{\sum_e c_e (a_e + b_e)/n_e}$$
This gives a weighted average of the RRs estimated in each table, in which tables with a larger sample size carry more weight.
Adjusted Risk Ratio
We calculate the overcrowding-adjusted RR with the Mantel-Haenszel formula:

Overcrowding
               Infection Yes    Infection No    Total
Nursery Yes    61               14              75
Nursery No     5                21              26
Total          66               35              101

Non-overcrowding
               Infection Yes    Infection No    Total
Nursery Yes    10               24              34
Nursery No     4                861             865
Total          14               885             899

$$RR_{MH} = \frac{61(5 + 21)/101 + 10(4 + 861)/899}{5(61 + 14)/101 + 4(10 + 24)/899} = \frac{15.70 + 9.62}{3.71 + 0.15} = \frac{25.32}{3.86} = 6.56$$
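The Mantel-Haenszel sum above can be sketched in Python as follows. The function name `mantel_haenszel_rr` and the tuple layout `(a, b, c, d)` are assumptions made for illustration. Carrying full precision gives about 6.55; the 6.56 in the slides results from rounding the intermediate terms.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio.

    strata: iterable of (a, b, c, d) tuples, one 2 x 2 table per stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * (c + d) / n
        denominator += c * (a + b) / n
    return numerator / denominator


strata = [
    (61, 14, 5, 21),   # overcrowding
    (10, 24, 4, 861),  # without overcrowding
]
print(f"Adjusted RR = {mantel_haenszel_rr(strata):.2f}")
```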
Adjusted Odds Ratio
The adjusted OR is calculated in a similar way to the adjusted RR.
$$OR_{MH} = \frac{\sum_e a_e d_e / n_e}{\sum_e b_e c_e / n_e}$$
Exposure    Disease Yes    Disease No    Total
Yes         a_e            b_e
No          c_e            d_e
Total                                    n_e
Adjusted Odds Ratio
In a cross-sectional study on the use of quinfamide after amoebic dysentery, the number of carriers of Entamoeba histolytica was reported.
                  Non-carrier    Carrier    Total
Quinfamide        100            54         154
Non quinfamide    15             72         87
Total             115            126        241
Adjusted Odds Ratio
We calculate the OR adjusted for residence area with the Mantel-Haenszel formula:

Urban
                  Non-carrier    Carrier    Total
Quinfamide Yes    35             39         74
Quinfamide No     10             51         61
Total             45             90         135

Rural
                  Non-carrier    Carrier    Total
Quinfamide Yes    65             14         79
Quinfamide No     5              21         26
Total             70             35         105

$$OR_{MH} = \frac{(35 \times 51/135) + (65 \times 21/105)}{(39 \times 10/135) + (14 \times 5/105)} = \frac{13.2 + 13}{2.89 + 0.67} = \frac{26.2}{3.56} = 7.4$$
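A matching sketch for the adjusted OR, using the urban and rural tables above. The function name `mantel_haenszel_or` is an illustrative assumption; each stratum is passed as `(a, b, c, d)` read row by row.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio.

    strata: iterable of (a, b, c, d) tuples, one 2 x 2 table per stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


strata = [
    (35, 39, 10, 51),  # urban
    (65, 14, 5, 21),   # rural
]
print(f"Adjusted OR = {mantel_haenszel_or(strata):.1f}")
```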
Mantel-Haenszel X² The nurse now knows that the association between respiratory infection and nursery attendance persists after adjusting for overcrowding, the confounding variable. She now wants a Chi-squared test of the significance of this association, adjusted for the confounder. This can be done by calculating the Mantel-Haenszel X² test.
Mantel-Haenszel X²
To calculate a Chi-squared test adjusted for the confounder, we calculate the Mantel-Haenszel Chi-squared. The null hypothesis is that there is no association between nursery attendance and respiratory infection. H0: OR = 1.
$$X^2_{MH} = \frac{\left[\sum a_e - \sum E(a_e)\right]^2}{\sum V(a_e)}$$
Mantel-Haenszel X²
We go step by step, beginning with the 2 x 2 table of each stratum.
Exposure    Disease Yes    Disease No    Total
Yes         a_e            b_e
No          c_e            d_e
Total                                    n_e
Mantel-Haenszel X²
The Mantel-Haenszel Chi-squared test is a kind of average of the individual Chi-squared tests of each table. To calculate it, we need three values from each table:
a_e: number of ill and exposed
E(a_e): expected value of a_e
V(a_e): variance (standard error squared) of a_e
where E(a_e) = row total x column total / grand total:
$$E(a_e) = \frac{(a_e + b_e)(a_e + c_e)}{n_e}$$
$$V(a_e) = \frac{(a_e + b_e)(c_e + d_e)(a_e + c_e)(b_e + d_e)}{n_e^2 (n_e - 1)}$$
Example
Overcrowding table:
a = 61
E(a) = 75 x 66 / 101 = 49.01
V(a) = (75 x 26 x 66 x 35) / (101² x (101 - 1)) = 4.42
Non-overcrowding table:
a = 10
E(a) = 34 x 14 / 899 = 0.53
V(a) = (34 x 865 x 14 x 885) / (899² x (899 - 1)) = 0.50
To obtain the Mantel-Haenszel Chi-squared test (Chi-squared adjusted for overcrowding), we add these values from the two strata, using the formula:
$$X^2_{MH} = \frac{\left[\sum a_e - \sum E(a_e)\right]^2}{\sum V(a_e)}$$
Example
To obtain the Mantel-Haenszel Chi-squared test (Chi-squared adjusted for overcrowding), we add these values across the strata:
                    a     E(a)     V(a)
Overcrowding        61    49.01    4.42
Non-overcrowding    10    0.53     0.50
Total               71    49.54    4.92
X² Mantel-Haenszel = (71 - 49.54)² / 4.92 = 93.60
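The three per-stratum quantities and the final statistic can be sketched in Python as below. The function name `mantel_haenszel_chi2` is hypothetical. With unrounded intermediate values the statistic is about 93.65; the 93.60 above comes from using E(a) and V(a) rounded to two decimals.

```python
def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-squared: (sum a - sum E(a))^2 / sum V(a).

    strata: iterable of (a, b, c, d) tuples, one 2 x 2 table per stratum.
    """
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        # Expected count of exposed cases under the null hypothesis
        sum_expected += (a + b) * (a + c) / n
        # Variance of the exposed-case count
        sum_variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    return (sum_a - sum_expected) ** 2 / sum_variance


strata = [
    (61, 14, 5, 21),   # overcrowding
    (10, 24, 4, 861),  # non-overcrowding
]
print(f"Mantel-Haenszel X2 = {mantel_haenszel_chi2(strata):.2f}")
```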
Confounding or no confounding How do we decide whether there is confounding? There are no statistical tests to demonstrate confounding. We calculate the statistical tests and effect measures for the raw and stratified tables. Then we calculate the summary (adjusted) estimates, compare them with the raw ones, and conclude whether there is confounding or not.
Confounding or no confounding If there is an important difference between the raw and adjusted estimates, we say that the association of interest is confounded by another factor. Consider the data on nursery attendance and respiratory infection: after adjusting for overcrowding, the RR decreases from 7.06 to 6.56.
Possible effects of confounding Generally there is more than one confounder. Confounders can have different effects:
The association under study may be significant before adjusting for a confounder and not significant after.
The association may remain significant after adjusting for a confounder, but with a less significant p-value.
Strata can show opposite results; in this case it is better to show the stratified results. This is interaction, or effect modification.
A confounder can hide an existing relationship.