# SADC Course in Statistics: Comparing two proportions (Session 14)



## Learning objectives

By the end of this session, you will be able to:

- explain how two sample proportions can be compared using either
  - a normal approximation, or
  - a chi-squared test;
- understand the link between the normal approximation and the chi-squared test.

## Dealing with categorical data

In most of the previous sessions, the focus has been on quantitative measurements. However, many variables collected in practice are categorical, especially those arising from surveys, e.g.

- gender of the household (HH) head (male/female)
- level of education (none, primary, secondary, tertiary)
- whether or not the HH has access to clean water (yes/no)
- failure of a crop (success/failure), etc.

## Some typical questions

- Are animals vaccinated against a specific disease less likely to fall sick than unvaccinated animals?
- Is there an association between the level of poverty and the educational level of the HH head?
- Does the proportion of children who have had the prescribed inoculations differ according to whether or not their HH has access to a health centre within 5 km of their homestead?

## An example comparing proportions

In a long-term study of the relationship between smoking and mortality amongst males with cardiovascular problems, men aged over 60 were monitored. After 6 years, 117 of the 1067 non-smokers had died, compared with 54 of the 356 smokers. Is there evidence of a difference in death rates between smokers and non-smokers?

## Comparing two proportions

Let $\pi_1$ and $\pi_2$ be the population proportions dying in the smokers and non-smokers groups respectively. The hypotheses to be tested are:

$$H_0: \pi_1 = \pi_2 \quad \text{versus} \quad H_1: \pi_1 \neq \pi_2$$

Since the sample sizes are large, we use the normal approximation to the distributions of the sample proportions $p_1$ and $p_2$ (justified by the Central Limit Theorem) and carry out a test based on the normal distribution.

## Expectation and variance of $p_1$, $p_2$

If the number of deaths $r$ in a sample of size $n$ follows a binomial distribution with probability $\pi$, then $E(r) = n\pi$ and $\mathrm{Var}(r) = n\pi(1-\pi)$. Hence, for the observed sample proportion $p = r/n$,

$$E(p) = E\left(\frac{r}{n}\right) = \frac{n\pi}{n} = \pi, \qquad \mathrm{Var}(p) = \frac{1}{n^2}\, n\pi(1-\pi) = \frac{\pi(1-\pi)}{n}.$$

This allows the standard error of $p_1 - p_2$ to be computed for two independent sample proportions from populations with true proportions $\pi_1$ and $\pi_2$.
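
As an informal check of these two results, the short Python sketch below simulates many binomial samples and compares the empirical mean and variance of $p$ with $\pi$ and $\pi(1-\pi)/n$. The values $\pi = 0.15$ and $n = 356$ are illustrative assumptions (roughly the size of the smokers' group), and the function `simulate_proportions` is a hypothetical helper, not part of any package.

```python
import random
import statistics


def simulate_proportions(pi: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n with success probability pi; return each p = r/n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    props = []
    for _ in range(reps):
        r = sum(1 for _ in range(n) if rng.random() < pi)  # binomial count r
        props.append(r / n)
    return props


pi, n = 0.15, 356  # illustrative values (assumed), not estimates from the study
sample_props = simulate_proportions(pi, n, reps=5000)

print(f"mean of p:     {statistics.fmean(sample_props):.5f}  (theory: {pi:.5f})")
print(f"variance of p: {statistics.pvariance(sample_props):.7f}  (theory: {pi * (1 - pi) / n:.7f})")
```

The simulated mean should sit close to $\pi$ and the simulated variance close to $\pi(1-\pi)/n$; increasing `reps` tightens the agreement.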

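## Standard error of $p_1 - p_2$

The standard error of $p_1 - p_2$ is given by:

$$\mathrm{SE}(p_1 - p_2) = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$

Since $\pi_1$ and $\pi_2$ are unknown, we can use the estimate:

$$\widehat{\mathrm{SE}}(p_1 - p_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

However, under the null hypothesis the two groups share a common proportion $\pi = \pi_1 = \pi_2$, which can be estimated by the pooled proportion $p = (r_1 + r_2)/(n_1 + n_2)$. This gives the estimate

$$\widehat{\mathrm{SE}}_0(p_1 - p_2) = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

which is the form used in most software packages.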
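
To see how much the two estimates differ in practice, the following Python sketch computes both for the smoking example. The function names `se_unpooled` and `se_pooled` are hypothetical, and group 1 is assumed to be the smokers (54 deaths out of 356) and group 2 the non-smokers (117 out of 1067).

```python
import math


def se_unpooled(r1: int, n1: int, r2: int, n2: int) -> float:
    """Estimated SE of p1 - p2 using each sample's own proportion."""
    p1, p2 = r1 / n1, r2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_pooled(r1: int, n1: int, r2: int, n2: int) -> float:
    """Estimated SE of p1 - p2 under H0, using the pooled proportion (r1 + r2) / (n1 + n2)."""
    p = (r1 + r2) / (n1 + n2)
    return math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


# Smoking example: group 1 = smokers, group 2 = non-smokers
print(f"unpooled SE: {se_unpooled(54, 356, 117, 1067):.4f}")
print(f"pooled SE:   {se_pooled(54, 356, 117, 1067):.4f}")
```

With these counts the unpooled estimate is about 0.0213 and the pooled estimate about 0.0199; the pooled form is the one that matches the null hypothesis being tested.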

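## Test procedure and results

Returning to our example, with $p_1 = 54/356 = 0.152$ (smokers), $p_2 = 117/1067 = 0.110$ (non-smokers) and pooled proportion $p = 171/1423 = 0.12$, the $z$ statistic for testing $H_0$ is:

$$z = \frac{p_1 - p_2}{\text{standard error of } p_1 - p_2} = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.042}{\sqrt{0.12 \times 0.88 \times \left(\frac{1}{1067} + \frac{1}{356}\right)}} = 2.11$$

This is significant at the 5% level. The exact two-sided p-value is 0.035.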
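
The calculation above can be sketched in Python using only the standard library. The function name `two_proportion_z_test` is hypothetical; the two-sided p-value uses the identity $P(|Z| \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})$ for a standard normal $Z$, so no statistics package is needed.

```python
import math


def two_proportion_z_test(r1: int, n1: int, r2: int, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: pi1 = pi2, using the pooled standard error."""
    p1, p2 = r1 / n1, r2 / n2
    p = (r1 + r2) / (n1 + n2)  # pooled estimate of the common proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(54, 356, 117, 1067)  # smokers vs non-smokers
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

This reproduces $z = 2.11$ and $p = 0.035$. Swapping the two groups only changes the sign of $z$, not the two-sided p-value.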

## Conclusions

There is some evidence (p = 0.035) that mortality rates differ between smokers and non-smokers. The observed proportions of deaths are 11% in the non-smoking group and 15% in the smoking group.

## A second example

In a study of the effectiveness of mosquito nets, results from a household survey were used to address the following objective: is there evidence, amongst children in the sample, of a relationship between the use of a mosquito net and the occurrence of malaria? This is equivalent to asking: does the proportion of children with malaria differ between HHs that use mosquito nets and those that don't?

## Survey results

The survey gave the following results:

- of 1039 children using mosquito nets, 649 had malaria;
- of 6904 children not using mosquito nets, 3849 had malaria.

Can you write out this information as a two-way table, with rows representing whether or not malaria was suffered and columns representing the use of a net?

## Two-way table: observed values

Rows: suffered malaria? Columns: usually sleep under a mosquito net? Each cell shows the count and the column percentage.

| Suffered malaria? | Net: Yes | Net: No | Total |
|---|---|---|---|
| Yes | 649 (62.5%) | 3849 (55.8%) | 4498 (56.6%) |
| No | 390 (37.5%) | 3055 (44.2%) | 3445 (43.4%) |
| Total | 1039 (100.0%) | 6904 (100.0%) | 7943 (100.0%) |

Which two proportions (or percentages) are we interested in comparing?
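
The column percentages in this table can be reproduced with a few lines of Python. This is a minimal sketch: the table is stored as a list of rows (malaria yes/no) with columns (net yes/no), and `column_percentages` is a hypothetical helper name.

```python
def column_percentages(table: list[list[int]]) -> list[list[float]]:
    """Express each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * count / total for count, total in zip(row, col_totals)] for row in table]


# Rows: suffered malaria (yes, no); columns: usually sleeps under a net (yes, no)
observed = [[649, 3849], [390, 3055]]
pcts = column_percentages(observed)

for label, row in zip(["Malaria: yes", "Malaria: no"], pcts):
    print(label, [f"{x:.1f}%" for x in row])
```

The two proportions of interest are the first-row entries, 62.5% for net users and 55.8% for non-users.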

## Null and alternative hypotheses

As before, we can compare the two sample proportions. However, the null and alternative hypotheses are often expressed as:

- $H_0$: occurrence of malaria is independent of use of a mosquito net;
- $H_1$: malaria and use of a net are not independent, i.e. they are associated.

If $H_0$ is true, then use of a mosquito net is not associated with the occurrence of malaria. What values would you then expect in each cell of the table?

## Computation of expected values

Under $H_0$, the expected value in each cell is (row total × column total) / grand total. For the first row:

- Expected value in cell 1 = (4498 / 7943) × 1039 = (4498 × 1039) / 7943 = 588.4
- Expected value in cell 2 = (4498 / 7943) × 6904 = (4498 × 6904) / 7943 = 3909.6

Can you calculate the expected values in the next row? Check that your 2 numbers add to 3445.

## Table of expected values

| Suffered malaria? | Net: Yes | Net: No | Total |
|---|---|---|---|
| Yes | 588.4 | 3909.6 | 4498 |
| No | 450.6 | 2994.4 | 3445 |
| Total | 1039 | 6904 | 7943 |
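
The rule "row total × column total / grand total" applied to every cell can be sketched in Python as follows. The helper name `expected_counts` is hypothetical; it works for any two-way table stored as a list of rows.

```python
def expected_counts(table: list[list[int]]) -> list[list[float]]:
    """Expected cell counts under independence: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[649, 3849], [390, 3055]]  # rows: malaria yes/no; columns: net yes/no
expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Because each expected value is built from the margins, the expected table always has the same row and column totals as the observed one.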

## The chi-square test statistic

Here we test the null hypothesis using a chi-square test. The first step is to compute the chi-square ($\chi^2$) test statistic. The formula is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E},$$

where $O$ and $E$ are the observed and expected values in each cell. For these data, $\chi^2 = 16.57$. Comparing this value with the $\chi^2$ distribution with 1 degree of freedom shows that the result is significant at the 1% level. We conclude that there is strong evidence against the null hypothesis of independence.
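
A minimal Python sketch of the chi-square calculation is given below. The name `chi_square_2x2` is hypothetical. It computes the plain Pearson statistic without Yates' continuity correction (some packages apply that correction to 2 × 2 tables by default, which gives a slightly smaller value), and it obtains the p-value on 1 d.f. from the fact that a $\chi^2_1$ variable is the square of a standard normal, so $P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square statistic (no continuity correction) and its p-value on 1 d.f."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # 1 d.f.: P(X^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2([[649, 3849], [390, 3055]])
print(f"chi-square = {chi2:.2f}, p = {p_value:.1e}")
```

This gives $\chi^2 \approx 16.57$ with a p-value well below 0.001.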

## Comparison with z-test

What would have happened if we had done a z-test to compare the proportions of children with malaria amongst those who use, and those who do not use, a mosquito net? The result would be a z-statistic of 4.07, which again gives a highly significant p-value (p < 0.001, reported by software as 0.000). Note that the square of z (before rounding z to 4.07) is 16.57, which is identical to the chi-square statistic. This is expected: for a 2 × 2 table, the square of the pooled z statistic equals the chi-square statistic, and the square of a standard normal variable has a $\chi^2$ distribution with 1 d.f. So the two tests are equivalent!
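
The equivalence can be checked numerically. This Python sketch computes the pooled z statistic for net users versus non-users and, separately, the chi-square statistic via the standard shortcut for a 2 × 2 table, $\chi^2 = N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$. The cell labels `a`, `b`, `c`, `d` are illustrative names chosen here, following the layout of the observed table.

```python
import math

# Observed 2 x 2 table: a, b = malaria yes (net yes, net no); c, d = malaria no (net yes, net no)
a, b, c, d = 649, 3849, 390, 3055
n1, n2 = a + c, b + d  # children using / not using a net
n = n1 + n2

# z statistic comparing the proportions with malaria, pooled standard error
p1, p2 = a / n1, b / n2
p = (a + b) / n
z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Shortcut form of the Pearson chi-square statistic for a 2 x 2 table
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(f"z = {z:.2f}, z^2 = {z ** 2:.3f}, chi-square = {chi2:.3f}")
```

The printed $z^2$ and $\chi^2$ agree to floating-point precision, which is the algebraic identity the slide describes.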

## Some final remarks

We haven't yet dealt with how best to present the results of a chi-square test, or with further interpretation of the results of this last example. We also have not discussed the assumptions underlying the chi-square test and the actions to take if these assumptions fail. These issues will be dealt with in the next two sessions.

Some practical work follows…