# Comparing Two Proportions

SECTION 13.2 Comparing Two Proportions

In this scenario, we want to compare two populations, or the responses to two treatments, based on two independent samples. We compare the populations by doing inference about the difference $p_1 - p_2$. The statistic that estimates this difference is the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$.

The sampling distribution of $\hat{p}_1 - \hat{p}_2$
The variance of the difference is the sum of the variances of $\hat{p}_1$ and $\hat{p}_2$, which is

$$\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$

Note that the variances add. The standard deviations do not. When the samples are large, the distribution is approximately normal. The mean of this distribution is $p_1 - p_2$.
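A minimal Python sketch of the point that variances add while standard deviations do not. The function name and the values 0.5, 0.4, and 100 are illustrative assumptions, not taken from the slides.

```python
import math

def diff_sampling_distribution(p1, n1, p2, n2):
    """Mean and standard deviation of p1_hat - p2_hat for independent samples."""
    mean = p1 - p2
    var1 = p1 * (1 - p1) / n1   # variance of p1_hat
    var2 = p2 * (1 - p2) / n2   # variance of p2_hat
    sd = math.sqrt(var1 + var2) # variances add, then take the square root
    return mean, sd

# Hypothetical populations and sample sizes, chosen only for illustration.
mean, sd = diff_sampling_distribution(0.5, 100, 0.4, 100)
sd1 = math.sqrt(0.5 * 0.5 / 100)
sd2 = math.sqrt(0.4 * 0.6 / 100)

print(f"mean of difference: {mean:.3f}")
print(f"sd of difference:   {sd:.4f}")
print(f"sd1 + sd2:          {sd1 + sd2:.4f}  (larger, so this is not the sd)")
```

Adding the two standard deviations directly overstates the spread of the difference, which is why the formula works with variances.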

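Assumptions
- Data are from two independent SRSs from the populations.
- The populations are at least ten times as large as the samples.
- A. For a significance test: $n_1\hat{p}_c$, $n_1(1-\hat{p}_c)$, $n_2\hat{p}_c$, and $n_2(1-\hat{p}_c)$ are all at least 5, where $\hat{p}_c = \dfrac{X_1 + X_2}{n_1 + n_2}$ is the combined sample proportion and $X_1$, $X_2$ are the counts of successes in the two samples.
- B. For a confidence interval: $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$ are all at least 5.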

Confidence Intervals for p1 - p2
Draw an SRS of size $n_1$ from a population having proportion $p_1$ of successes and draw an independent SRS of size $n_2$ from another population having proportion $p_2$ of successes. When $n_1$ and $n_2$ are large, an approximate level C confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE$$

In this formula the standard error SE of $\hat{p}_1 - \hat{p}_2$ is

$$SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

and $z^*$ is the upper $(1 - C)/2$ standard normal critical value. The same assumptions apply as for single-proportion confidence intervals.
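The large-sample interval can be sketched in Python as follows. This is an illustration under stated assumptions, not the calculator's implementation: the function name `two_prop_z_interval` and the counts in the usage line are hypothetical, and $z^*$ is taken from the standard library's `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, conf_level=0.95):
    """Large-sample confidence interval for p1 - p2.

    x1, x2 are counts of successes; n1, n2 are sample sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Standard error uses each sample's own proportion (no pooling).
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* is the upper (1 - C)/2 critical value of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts, for illustration only.
low, high = two_prop_z_interval(60, 100, 45, 100)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```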

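Significance Tests for p1 – p2
To test $H_0: p_1 = p_2$, our z test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $\hat{p}_c$ is the combined sample proportion: the total count of successes in both samples divided by the total number of observations, $n_1 + n_2$.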
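A short Python sketch of the test statistic with the combined (pooled) proportion. The function name, the `alternative` labels, and the counts in the usage line are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """z statistic and P-value for H0: p1 = p2.

    alternative is "greater" (p1 > p2), "less" (p1 < p2), or "two-sided".
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Combined sample proportion: all successes over all observations.
    p_c = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se

    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Hypothetical counts, for illustration only.
z, p_value = two_prop_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, two-sided P-value = {p_value:.4f}")
```

The combined proportion is used here because the null hypothesis says both populations share a single proportion, so both samples estimate the same value.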

The Steps for a Two Proportion z-test
1. State the hypotheses and name the test: $H_0: p_1 = p_2$ versus $H_a: p_1 < p_2$, $p_1 > p_2$, or $p_1 \neq p_2$.
2. State and verify your assumptions.
3. Calculate the P-value and other important values, either in the calculator or using the formulas and tables.
4. State conclusions (both statistically and in context). The smaller the P-value, the stronger the evidence against $H_0$.

CALCULATOR FUNCTIONS
You may be able to find these on your own by now, but just in case, you will be looking for:
- 6: 2-PropZTest
- B: 2-PropZInt

Note: x is your number of successes, while n is your total number of trials.

Plus Four Confidence Interval for 2 Proportions
Just like before, this helps us overcome the lack of Normality when the sample sizes are too small for the large-sample procedures. It cannot save us from the fact that small samples produce wide confidence intervals.

To build the interval, add 4 imaginary observations: one success and one failure in each of the two samples. Then use the large-sample procedure with the new sample sizes and counts of successes. Use this when the sample size is at least 5 in each group, with any counts of successes and failures.

The plus four interval may be conservative for very small samples and population proportions close to 0 or 1. It is generally much more accurate than the large-sample interval when the samples are small or a population proportion is close to 0 or 1.
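The plus four adjustment can be sketched in Python as below: one success and one failure are added to each sample, and the large-sample formula is then applied to the adjusted counts. The function name and the small-sample counts in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def plus_four_interval(x1, n1, x2, n2, conf_level=0.95):
    """Plus four confidence interval for p1 - p2."""
    # Add one success and one failure to each sample (4 observations total).
    a1, m1 = x1 + 1, n1 + 2
    a2, m2 = x2 + 1, n2 + 2
    p1_tilde = a1 / m1
    p2_tilde = a2 / m2
    # Large-sample standard error, computed from the adjusted values.
    se = math.sqrt(p1_tilde * (1 - p1_tilde) / m1 + p2_tilde * (1 - p2_tilde) / m2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    diff = p1_tilde - p2_tilde
    return diff - z_star * se, diff + z_star * se

# Hypothetical small samples, for illustration only.
low, high = plus_four_interval(4, 10, 2, 12)
print(f"plus four 95% CI: ({low:.3f}, {high:.3f})")
```

Because the adjusted proportions are never exactly 0 or 1, the standard error stays positive even when a sample has no successes or no failures.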

Example of Two-Proportion Confidence Interval
A surprising number of young adults (ages 19-25) still live at home with their parents. A random sample by the National Institutes of Health included 2253 men and 2629 women in this age group. The survey found that 986 of the men and 923 of the women lived at home. Is this good evidence that different proportions of young men and young women live at home? How large is the difference between the proportions of young men and young women who live at home?

Step 1: Parameters
- Population 1: young men
- Population 2: young women
- $p_1$ = proportion of young men who live at home
- $p_2$ = proportion of young women who live at home

We will construct a 95% confidence interval for the difference between men and women, $p_1 - p_2$.

Step 2: Conditions
- SRSs: The data were obtained from a random sample, so we should be safe generalizing to the respective populations of interest.
- Normality: To check that the large-sample confidence interval is safe, look at the counts of successes and failures in both samples: 986 men living at home and 2253 − 986 = 1267 not, and 923 women living at home and 2629 − 923 = 1706 not. All of these are much larger than 5, so the large-sample method will be accurate.
- Independence: The sample survey in this example selected a single random sample of young adults, not two separate random samples of men and women. We divide the one sample by gender. The two-sample z procedures for comparing proportions are valid in such situations. This is an important fact about these methods.

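Step 3: Calculations
Here are the needed calculations, with $z^* = 1.96$:

$$\hat{p}_1 = \frac{986}{2253} \approx 0.4376, \qquad \hat{p}_2 = \frac{923}{2629} \approx 0.3511$$

$$SE = \sqrt{\frac{(0.4376)(0.5624)}{2253} + \frac{(0.3511)(0.6489)}{2629}} \approx 0.0140$$

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE = 0.0866 \pm (1.96)(0.0140) = 0.0866 \pm 0.0274$$

So, our interval is (0.059, 0.114).
Calculator: ( , )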
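As a check on the arithmetic, the interval can be reproduced in a few lines of Python from the survey counts given in the example. The variable names are assumptions; the value $z^* = 1.96$ is the one used on the slide.

```python
import math

# Counts from the example: young adults living at home, by gender.
x1, n1 = 986, 2253   # men
x2, n2 = 923, 2629   # women

p1_hat = x1 / n1
p2_hat = x2 / n2
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

z_star = 1.96        # critical value for 95% confidence, as on the slide
diff = p1_hat - p2_hat
low = diff - z_star * se
high = diff + z_star * se

print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}, SE = {se:.4f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")  # prints 95% CI: (0.059, 0.114)
```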

Step 4: Interpretation
We are 95% confident that the percent of young men living at home is between 5.9 and 11.4 percentage points higher than the percent of young women who live at home. Because the entire interval lies above 0, this is good evidence that different proportions of young men and young women live at home. We have this level of confidence because, if we repeated our procedure over and over with new samples, about 95% of our intervals would capture the true difference.

Testing a Claim
Considering the previous example, someone makes the claim that young men are more likely to live at home. Do our data support this claim? The hypotheses are $H_0: p_1 = p_2$ and $H_a: p_1 > p_2$. We need to check the Normal assumption again, this time using the combined sample proportion.

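Calculations
With $\hat{p}_1 - \hat{p}_2 \approx 0.0866$ from the sample counts:

$$\hat{p}_c = \frac{986 + 923}{2253 + 2629} = \frac{1909}{4882} \approx 0.3910$$

$$z = \frac{0.0866}{\sqrt{(0.3910)(0.6090)\left(\dfrac{1}{2253} + \dfrac{1}{2629}\right)}} \approx 6.18$$

P-value = $P(Z > 6.18)$, which is less than 0.0001.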
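The test for this example can be sketched in Python using the counts from the survey. The variable names are assumptions, and the P-value comes from the standard library's `statistics.NormalDist` rather than a table or the calculator.

```python
import math
from statistics import NormalDist

# Counts from the example: young adults living at home, by gender.
x1, n1 = 986, 2253   # men
x2, n2 = 923, 2629   # women

# Combined sample proportion under H0: p1 = p2.
p_c = (x1 + x2) / (n1 + n2)

# Normality check for the test: expected successes and failures per sample.
expected_counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
print("all expected counts at least 5:", all(c >= 5 for c in expected_counts))

se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se       # z is about 6.18

# One-sided P-value for Ha: p1 > p2.
p_value = 1 - NormalDist().cdf(z)

print(f"p_c = {p_c:.4f}, z = {z:.2f}, P-value = {p_value:.2e}")
```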

Interpretation
Based on our extremely low P-value, we would reject the null hypothesis. A difference in sample proportions this large would rarely ever occur by chance if there were truly no difference between the proportions of young men and young women who live at home. We are comfortable agreeing with the claim that young men are more likely to live at home.