# Estimating a Population Proportion


SECTION 10.3 Estimating a Population Proportion

NOW WHAT In this section we are interested in the unknown proportion, p of a population as opposed to the unknown mean of a population. Keep in mind, p will have an approximately normal distribution, so it is BACK TO THE WORLD OF z.

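Standard Error We don’t really know p, so we cannot compute the standard deviation of $\hat{p}$ exactly. When we create confidence intervals, $\hat{p}$ will be a decent estimate of p. The standard deviation of $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$. The standard error of $\hat{p}$ is $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.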

Conditions for Inference about a Population Proportion
1. Random: Data are an SRS from the population or are results from a randomized experiment.
2. Normality: The sample size is large enough to assume the sampling distribution of $\hat{p}$ is approximately Normal. Remember, there is no population distribution for p. For a confidence interval, check: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$. NOTE: we don’t use p here because we don’t know p.
3. Independence: Either the sample is collected with replacement or the population is at least ten times as large as the sample, so that we can use our formula for standard deviation.
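The Normality and Independence checks above can be sketched in a few lines of Python. This is only an illustration: the function name `check_conditions` and its parameters are made up for this example, and the Random condition cannot be verified from counts alone, so it is left to the reader.

```python
def check_conditions(successes, n, population_size=None, with_replacement=False):
    """Check the Normality and Independence conditions for a one-proportion interval.

    Hypothetical helper for illustration. The Random condition depends on how
    the data were collected and is not checked here.
    """
    failures = n - successes
    # n * p_hat equals the count of successes; n * (1 - p_hat) equals the count of failures.
    normal_ok = successes >= 10 and failures >= 10

    if with_replacement:
        independent_ok = True
    elif population_size is None:
        independent_ok = None  # cannot be verified without the population size
    else:
        independent_ok = population_size >= 10 * n

    return {"normal": normal_ok, "independent": independent_ok}


# Example with 848 successes out of 1010 and a population of at least 10,100.
print(check_conditions(848, 1010, population_size=10_100))
```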

CAUTION Be sure to check that the conditions for constructing a confidence interval for the population proportion are satisfied before you perform any calculations.

Confidence Interval These procedures are very similar to the ones we used for confidence intervals for a population mean. However, now we are working with proportions instead of means. We will interpret our results in a very similar fashion.

INFERENCE TOOLBOX (p 631) DO YOU REMEMBER WHAT THE STEPS ARE??? Steps for constructing a CONFIDENCE INTERVAL:

1. PARAMETER: Identify the population of interest and the parameter you want to draw a conclusion about.
2. CONDITIONS: Choose the appropriate inference procedure. VERIFY conditions (Random, Normal, Independent) before using it.
3. CALCULATIONS: If the conditions are met, carry out the inference procedure.
4. INTERPRETATION: Interpret your results in the context of the problem. CONCLUSION, CONNECTION, CONTEXT (meaning that our conclusion about the parameter connects to our work in part 3 and includes appropriate context).

Example: Will smoking shorten your life?
Do smokers realize that smoking is bad for their health? Have most smokers tried to quit? The Harris Poll addressed smoking in a sample survey conducted by telephone in January. Because Harris called residential telephone numbers at random, the sample (ignoring practical problems) was an SRS of smokers living in the United States in households with telephone service. The sample size was n = 1010. Here are two findings from this sample survey:

- “Do you believe that smoking will probably shorten your life, or not?” 848 of 1010 said “yes”
- “Have you ever tried to give up smoking?” 707 of 1010 said “yes”

Construct a 95% confidence interval for the proportion of all American smokers who think that smoking will probably shorten their lives.

Example: Will smoking shorten your life? Cont.
--We want to estimate p = the actual proportion of all American smokers who think smoking will shorten their lives, using the sample proportion $\hat{p} = 848/1010 \approx 0.84$.

--We will check to see if we can create a one-proportion confidence interval:

- Random: We are told to treat the sample as an SRS.
- Normal: Since $n\hat{p} = 848$ and $n(1-\hat{p}) = 162$ are both at least 10, we are safe using the Normal approximation.
- Independent: There are at least 10,100 American smokers (ten times the sample size), so we can treat the responses as independent.

Example: Will smoking shorten your life? Cont.
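--About 84% of the smokers in the sample thought that smoking will probably shorten their lives. To extend this result to the population, report the confidence interval: $\hat{p} \pm z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 0.8396 \pm 1.96\sqrt{\dfrac{(0.8396)(0.1604)}{1010}} = 0.8396 \pm 0.0226 = (0.817, 0.862)$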
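As a rough check on this calculation, here is a minimal Python sketch of the large-sample interval. The function name `one_prop_z_interval` is an assumption made for this example (it is not a calculator or library routine), and the critical value comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a proportion (illustrative helper)."""
    p_hat = successes / n
    # Critical value z* that leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = one_prop_z_interval(848, 1010)
print(f"{low:.3f} to {high:.3f}")  # prints "0.817 to 0.862"
```

This should agree with what 1-PropZInt reports for x = 848 and n = 1010, up to rounding.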

Example: Will smoking shorten your life? Cont.
--We are 95% confident that between 81.7% and 86.2% of all smokers believe that smoking will probably shorten their lives because the methods we used will yield an interval such that 95% of all such intervals will capture the true proportion.

Example Notes The Harris Poll suggests avoiding “margin of error” statements because the general public does not understand their meaning. Margin of error covers only the variation due to the play of chance in choosing a random sample. Margin of error DOES NOT INCLUDE variation due to refusal to be interviewed (non-response), question wording, question order, interviewer bias, weighting by demographic control data and screening, etc.

Choosing a Sample Size To determine the appropriate sample size for a desired margin of error m, use the following formula: $n \ge \left(\dfrac{z^*}{m}\right)^2 p^*(1-p^*)$. Round UP to the nearest whole number. Here p* is a guessed value for the sample proportion based on either a pilot study or previous experience. Using a p* of 0.5 will yield the most conservative (largest) estimate of the necessary sample size. If we have a better guess available, we use it, because sampling costs money.
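The sample size calculation can be sketched as follows. The helper name `required_sample_size` and the 3% margin of error in the example are illustrative assumptions, not values from the slides; the formula used is $n \ge (z^*/m)^2 \, p^*(1-p^*)$, rounded up.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin` (illustrative helper)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round UP: rounding down would give a margin of error slightly too large.
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


# Hypothetical target: margin of error of 3 percentage points at 95% confidence.
print(required_sample_size(0.03))                # conservative guess p* = 0.5
print(required_sample_size(0.03, p_guess=0.84))  # using a prior estimate instead
```

The second call needs a smaller sample than the first, which is why a prior estimate of p is worth using when one is available.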

CALCULATOR FUNCTIONS You may be able to find this on your own by now, but just in case, you will be looking for: 1-PropZInt Note: x is your number of successes while n is your total trials

+4 Confidence Interval for Proportions
THE FOLLOWING INFORMATION IS NOT IN YOUR BOOK (at least not in detail) BUT WILL BE ADDRESSED ON YOUR QUIZ AND/OR YOUR TEST

+4 Confidence Interval for Proportions
The confidence interval we have used so far for the population proportion p is easy to calculate and easy to understand because it rests directly on the approximately Normal distribution of $\hat{p}$. Unfortunately, this interval is often quite inaccurate unless the sample is very large. The actual confidence level is usually LESS than the confidence level you asked for in choosing the critical value z*. THAT IS BAD! And accuracy does not consistently get better as the sample size n increases. Fortunately, there is a simple modification that is almost magically effective in improving the accuracy of the confidence interval. We call it the “plus four” method, because all you need to do is add four imaginary observations (2 successes, 2 failures).

+4 Confidence Interval for Proportions
The plus four estimate of p is $\tilde{p} = \dfrac{\text{count of successes} + 2}{n + 4}$. The formula for the confidence interval is exactly as before, with the new sample size and count of successes: $\tilde{p} \pm z^*\sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n+4}}$. With the calculator, just enter the new plus four sample size and count of successes into the usual large-sample procedure. USE THIS interval when the confidence level is at least 90% and the sample size n is at least 10.

+4 Confidence Interval EXAMPLE
Some shrubs have the useful ability to resprout from their roots after their tops are destroyed. Fire is a particular threat to shrubs in dry climates, as it can injure the roots as well as destroy the aboveground material. One study of resprouting took place in a dry area of Mexico. The investigators clipped the tops of samples of several species of shrubs. In some cases, they also applied a propane torch to the stumps to simulate a fire. Of 12 specimens of the shrub Krameria cytisoides, 5 resprouted after fire. Give a 95% confidence interval for the proportion of all shrubs of this species that will resprout after fire.

+4 Confidence Interval EXAMPLE cont.
This sample is not large enough to allow use of the traditional large-sample confidence interval. The plus four confidence interval will be quite accurate even for this small sample.

The usual sample proportion is $\hat{p} = 5/12 \approx 0.42$. Note that we don’t have enough successes or failures to use the large-sample confidence interval: $n\hat{p} = 5$ and $n(1-\hat{p}) = 7$ are both less than 10.

The plus four sample proportion is $\tilde{p} = \dfrac{5+2}{12+4} = \dfrac{7}{16} = 0.4375$.

NOTE: The plus four estimate always moves away from 0 or 1 and toward 0.5. The result is not very different from $\hat{p}$ unless $\hat{p}$ is very near 0 or 1. The plus four adjustment is immediately attractive if, for example, 12 of 12 sample shrubs of a species resprout. We don’t really think that p = 1. The plus four estimate of 14/16 = 0.875 seems more plausible. More importantly, the consistent moving toward 0.5 can have a major effect on the coverage probability of the confidence interval, which is often much closer to the desired 0.95 for the plus four interval.

+4 Confidence Interval EXAMPLE cont.
--Here is the plus four interval: $\tilde{p} \pm z^*\sqrt{\dfrac{\tilde{p}(1-\tilde{p})}{n+4}} = 0.4375 \pm 1.96\sqrt{\dfrac{(0.4375)(0.5625)}{16}} = 0.4375 \pm 0.2431 = (0.194, 0.681)$

--We are 95% confident that between 19% and 68% of this species will resprout after being burned, because we used a method that yields intervals such that 95% of all intervals will capture the true proportion for this population of plants. This interval is so wide because the sample is small.
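The plus four calculation for the shrub data can be sketched in Python. The function name `plus_four_interval` is an assumption for this example; the only change from the large-sample procedure is adding 2 to the count of successes and 4 to the sample size.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_interval(successes, n, confidence=0.95):
    """Plus four confidence interval for a proportion (illustrative helper)."""
    n_tilde = n + 4                       # four imaginary observations
    p_tilde = (successes + 2) / n_tilde   # two of them counted as successes
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - margin, p_tilde + margin


low, high = plus_four_interval(5, 12)   # 5 of 12 shrubs resprouted
print(f"{low:.2f} to {high:.2f}")       # prints "0.19 to 0.68"
```

On the calculator, the same result comes from entering x = 7 and n = 16 into 1-PropZInt.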

+4 Confidence Interval for Proportions
The numerical difference between the large-sample and plus four intervals is often small. Remember that the confidence level is the probability that the interval will catch the true population proportion in very many uses. Small differences every time add up to accurate confidence levels from plus four versus inaccurate levels from the large-sample interval.

HOW MUCH MORE ACCURATE? Computer studies have determined the sample sizes needed to ensure that the confidence level is accurate. They found that for a 95% confidence interval to cover the true parameter at least 94% of the time when p = 0.1, the sample size needs to be 646 for the large-sample interval but only 11 for the plus four interval. The consensus of computational and theoretical studies is that plus four is much better than the large-sample interval for many combinations of n and p.
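One way to see the difference in coverage is to compute it exactly for a given n and p by adding up the binomial probabilities of every outcome whose interval contains p. The sketch below does that; the function names and the choice of n = 12 are assumptions made for illustration, and it is not a reproduction of the computer studies mentioned above.

```python
from math import comb, sqrt
from statistics import NormalDist

Z_STAR = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence


def large_sample_interval(x, n):
    """Large-sample interval from x successes in n trials."""
    p_hat = x / n
    margin = Z_STAR * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def plus_four_interval(x, n):
    """Plus four interval: add 2 successes and 2 failures first."""
    return large_sample_interval(x + 2, n + 4)


def exact_coverage(interval, n, p):
    """Probability that the interval captures p when X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        low, high = interval(x, n)
        if low <= p <= high:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


n, p = 12, 0.1
print("large-sample coverage:", round(exact_coverage(large_sample_interval, n, p), 3))
print("plus four coverage:   ", round(exact_coverage(plus_four_interval, n, p), 3))
```

For this small n and p near 0, the large-sample interval falls well short of 95% coverage, largely because a sample with zero successes produces an interval of zero width.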