6.1 Inference for a Single Proportion: Statistical confidence; Confidence intervals; How confidence intervals behave
2 Sampling Distribution of a Sample Proportion: As n increases, the sampling distribution of the sample proportion becomes approximately Normal.
3 Statistical Inference: After we have selected a sample, we know the responses of the individuals in the sample. However, the reason for taking the sample is to infer from that data some conclusion about the wider population represented by the sample. Statistical inference provides methods for drawing conclusions about a population from sample data. [Diagram: Population to Sample: collect data from a representative sample... Sample to Population: make an inference about the population.]
Methods for drawing conclusions about a population from sample data are called statistical inference. So we’ll use data to make these inferences; i.e., draw conclusions about populations from data in our samples or from our experiments. We'll consider two types of inference: (1) confidence interval estimation and (2) tests of significance. In both of these cases, we'll consider our data as either being a random sample from a population or as data from a randomized experiment. Start with estimation… there are two situations we'll consider: (1) estimating the mean of a population of measurements and (2) estimating the proportion p of Ss in a population of Ss and Fs.
In either case, we'll construct a confidence interval of the form estimate +/- M.O.E., where M.O.E. = margin of error of the estimator. The MOE gives information on how good the estimate is through the variation in the estimator (its standard error) and through the level of confidence in the confidence interval (through a tabulated value). The standard error of an estimator is its estimated standard deviation (treating the estimator as a statistic with a sampling distribution). The best estimator of $\mu$ is $\bar{x}$, and we will learn that $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$. The best estimator of p is $\hat{p}$ (p-hat), and we’ve learned that $\hat{p}$ is approximately $N\left(p, \sqrt{p(1-p)/n}\right)$. We’ll start here…
For inference, we’ll try to make sure that n is fairly large; this will assure approximate Normality of the sampling distribution of p-hat. The mean and standard deviation of p-hat are given by these formulas: $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. We did a simulation using Table B and can use our results to show the formulas make sense…
I’ve modified Example 6.4 on page 320: Assume p = 0.60; i.e., that 60% of the population are “Success”. We will simulate drawing a random sample of size 20 from the population. We can imitate the population by Table B, with each entry standing for a person. Six of the 10 digits (say 0 to 5) stand for people who are “Success”. The remaining four digits, 6 to 9, stand for “Failure”. Because all digits in a random number table are equally likely, this assignment produces a population proportion of “Success” equal to p = 0.60. We then imitate an SRS of 20 people from the population by taking 20 consecutive digits from Table B. The statistic p-hat is the proportion of digits 0 to 5 in the sample of size n = 20. Here are the first 100 entries in Table B, with digits 0 to 5 highlighted. What are the first 5 p-hats? Continue with JMP…
These samples show the sampling variability of p-hat: because the samples are random, we don’t expect to get the same proportion of S’s in each sample of n = 20… but notice that the variability in the p-hats can be characterized as approximately Normal. I used the “Random -> Binomial” formula in JMP and divided by 20.
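The digit-table simulation can also be sketched in Python rather than JMP. This is a minimal sketch, not the slides' own procedure: the number of simulated samples and the random seed are assumptions chosen for illustration. It draws samples of 20 random digits, counts digits 0 to 5 as “Success”, and compares the simulated mean and standard deviation of p-hat with the formulas p and sqrt(p(1-p)/n).

```python
import random
import statistics
from math import sqrt

N = 20                 # sample size used on the slides
P = 0.60               # population proportion of "Success"
NUM_SAMPLES = 10_000   # assumption: number of simulated samples
rng = random.Random(2024)  # assumption: fixed seed so the run is repeatable


def sample_p_hat(n):
    """Take n random digits; digits 0 to 5 stand for Success, 6 to 9 for Failure."""
    digits = [rng.randrange(10) for _ in range(n)]
    successes = sum(1 for d in digits if d <= 5)
    return successes / n


p_hats = [sample_p_hat(N) for _ in range(NUM_SAMPLES)]

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.pstdev(p_hats)
theory_mean = P
theory_sd = sqrt(P * (1 - P) / N)

print(f"simulated mean of p-hat: {sim_mean:.4f}  (formula: {theory_mean:.4f})")
print(f"simulated sd of p-hat:   {sim_sd:.4f}  (formula: {theory_sd:.4f})")
```

With many samples, the simulated values should land close to the formula values, which is the sense in which the simulation shows that the formulas make sense.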
9 Sampling Distribution of a Sample Proportion: As n increases, the sampling distribution of the sample proportion becomes approximately Normal.
10 Large-Sample Confidence Interval for a Proportion: To construct a confidence interval for an unknown population proportion p, we’ll use our best estimator p-hat and construct the CI as estimate +/- M.O.E. Here the MOE is (value from table) × (SE of estimator).
11 Large-Sample Confidence Interval for a Proportion: How do we find the critical value for our confidence interval? If the Normal condition is met, we can use a Normal curve. To find a level C confidence interval, we need to catch the central area C under the standard Normal curve. For example, to find a 95% confidence interval, we can use a critical value of 2 based on the 68-95-99.7 rule. Using a standard Normal table or a calculator, we can get a more accurate critical value: z* is actually 1.96 for a 95% confidence level.
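To see how close the rule-of-thumb value 2 is to the more accurate 1.96, one can compute the central area under the standard Normal curve for each. This is a small sketch using Python's standard-library `statistics.NormalDist`; the helper name `central_area` is made up for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def central_area(z_star):
    """Area under the standard Normal curve between -z_star and z_star."""
    return STANDARD_NORMAL.cdf(z_star) - STANDARD_NORMAL.cdf(-z_star)


area_rule = central_area(2.0)     # critical value from the 68-95-99.7 rule
area_exact = central_area(1.96)   # more accurate critical value for 95%

print(f"area between -2 and 2:       {area_rule:.4f}")
print(f"area between -1.96 and 1.96: {area_exact:.4f}")
```

The value 2 catches slightly more than 95% of the area, which is why 1.96 is the more accurate critical value for a 95% confidence level.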
12 Large-Sample Confidence Interval for a Proportion: Once we find the critical value z*, our confidence interval for the population proportion p is the one-sample z interval for a population proportion. Choose an SRS of size n from a large population that contains an unknown proportion p of successes. An approximate level C confidence interval for p is $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$, where z* is the critical value for the standard Normal density curve with area C between –z* and z*. Use this interval only when the numbers of successes and failures in the sample are both at least 15.
13 Large-Sample Confidence Interval for a Proportion: What does the CI for p actually mean? Figure 6.7 on page 327 shows 25 confidence intervals computed from 25 samples of the same size. Note that they vary quite a bit, but only 1 out of the 25 actually misses the true proportion p. With 95% confidence intervals, approximately 95% of the intervals computed this way should capture p.
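The long-run behavior pictured in the figure can be checked by simulation. The sketch below is illustrative only: the true proportion (0.60), the sample size (100), the number of intervals, and the seed are all assumed values, not taken from the figure. It builds many large-sample 95% intervals and reports the fraction that capture the true p.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p=0.60, n=100, level=0.95, num_intervals=2000, seed=7):
    """Fraction of simulated large-sample intervals that contain the true p.

    All default values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(num_intervals):
        # One simulated sample: each individual is a Success with probability p.
        successes = sum(1 for _i in range(n) if rng.random() < p)
        p_hat = successes / n
        moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits / num_intervals


observed = coverage()
print(f"fraction of intervals capturing p: {observed:.3f}")
```

The observed fraction should be near 0.95 but will rarely equal it exactly, both because of simulation noise and because the large-sample interval is only approximate.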
14 Example: It is claimed that 50% of the beads in a container are red. A random sample of 251 beads is selected, of which 107 are red. Calculate and interpret a 90% confidence interval for the proportion of red beads in the container. Use your interval to comment on the claim that ½ the beads in the container are red.
Excerpt from the standard Normal table:
z      0.03    0.04    0.05
-1.7   0.0418  0.0409  0.0401
-1.6   0.0516  0.0505  0.0495
-1.5   0.0630  0.0618  0.0606
For a 90% confidence level, z* = 1.645. This is an SRS and there are 107 successes and 144 failures. Both are greater than 15. Sample proportion = 107/251 = 0.426. We are 90% confident that the interval from 0.375 to 0.477 captures the actual proportion of red beads in the container. Since this interval gives a range of plausible values for p and since 0.5 is not contained in the interval, we have reason to doubt the claim.
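The arithmetic behind the bead interval can be reproduced with a short Python sketch. It uses the counts from the example and the large-sample formula p-hat +/- z* sqrt(p-hat(1 - p-hat)/n); the variable names are made up for illustration. Because the code carries p-hat at full precision instead of rounding it to 0.426 first, the endpoints may differ from the slide's in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 107, 251
failures = n - successes          # 144
level = 0.90
claimed_p = 0.5

# Condition from the slides: at least 15 successes and 15 failures.
condition_met = successes >= 15 and failures >= 15

p_hat = successes / n                                   # about 0.426
z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)      # about 1.645
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error
lower, upper = p_hat - margin_of_error, p_hat + margin_of_error

claim_is_plausible = lower <= claimed_p <= upper

print(f"condition met: {condition_met}")
print(f"p-hat = {p_hat:.3f}, z* = {z_star:.3f}, MOE = {margin_of_error:.3f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
print(f"0.5 inside the interval: {claim_is_plausible}")
```

Since 0.5 falls outside the interval, the code reaches the same conclusion as the example: the data give reason to doubt the claim.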
Varying confidence levels: Confidence intervals contain the population proportion p in C% of samples, in the long run. Different areas under the curve give different confidence levels C. Example: for an 80% confidence level C, 80% of the Normal curve’s area is contained in the interval from −z* to z*. Practical use of z*: z* is related to the chosen confidence level C; C is the area under the standard Normal curve between −z* and z*. The confidence interval is thus: $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$.
How do we find specific z* values? We can use a table of z values (Table A) or of t values (Table D). In Table D, the appropriate z* value for a particular confidence level C appears just above that confidence level. We can also use software. In JMP: create a new column, choose Edit Formula, and select Normal Quantile( p ) under Probability, where p = (1 − C)/2 is the area to the left of −z*. Because we want the middle probability C, each tail has probability (1 − C)/2, and the quantile returned is −z*. Example: for a 98% confidence level, Normal Quantile(0.01) = −2.326349 (= −z*), so z* = 2.326.
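For readers without JMP, the same lookup can be sketched in Python. This is an assumed equivalent, not part of the slides: `statistics.NormalDist().inv_cdf` plays the role of JMP's Normal Quantile, and the helper name `z_star` is made up for illustration.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Critical value with central area `confidence_level` under the standard Normal curve."""
    tail_area = (1 - confidence_level) / 2          # area to the left of -z*
    return -NormalDist().inv_cdf(tail_area)         # the quantile itself is -z*


# Quantile for a 98% confidence level, as in the JMP example.
quantile_98 = NormalDist().inv_cdf(0.01)
print(f"Normal quantile at 0.01: {quantile_98:.6f}")

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"C = {level:.2f}  ->  z* = {z_star(level):.3f}")
```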
Link between confidence level and margin of error: The confidence level C determines the value of z* (in Table A or D). The margin of error m also depends on z*. A higher confidence level C implies a larger margin of error m (thus less precision in our estimates). A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates).
Properties of Confidence Intervals: The user chooses the confidence level C, and hence z*. The margin of error follows from this choice as (z*)(SE of estimator). We want a high level of confidence and a small margin of error. The margin of error is smaller when (1) z* (and thus the confidence level C) gets smaller; (2) p(1-p) is smaller; (3) n is larger. Increasing the sample size is the usual way to decrease the MOE.
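These three effects on the margin of error can be illustrated numerically. The sketch below uses made-up inputs (p-hat values of 0.5 and 0.2, sample sizes of 100 and 400) purely for illustration; the function name `margin_of_error` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, level):
    """Large-sample margin of error: z* times the standard error of p-hat."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


baseline = margin_of_error(0.5, 100, 0.95)          # reference case
lower_confidence = margin_of_error(0.5, 100, 0.90)  # smaller z*
smaller_spread = margin_of_error(0.2, 100, 0.95)    # smaller p(1-p)
larger_sample = margin_of_error(0.5, 400, 0.95)     # four times the sample size

print(f"baseline (p-hat=0.5, n=100, 95%): {baseline:.4f}")
print(f"lower confidence (90%):           {lower_confidence:.4f}")
print(f"smaller p(1-p) (p-hat=0.2):       {smaller_spread:.4f}")
print(f"larger sample (n=400):            {larger_sample:.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.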
Interpretation of Confidence Intervals: Conditions under which an inference method is valid are never fully met in practice. Exploratory data analysis and judgment should be used when deciding whether or not to use a statistical procedure. Any individual confidence interval either will or will not contain the true population proportion, p. It is wrong to say that the probability is 95% that the true proportion falls in the confidence interval. The correct interpretation of a 95% confidence interval is that we are 95% confident that the true proportion falls within the interval. The confidence interval was calculated by a method that gives correct results in ~95% of all possible samples. (See slide #13 above!) In other words, if many such confidence intervals were constructed, ~95% of these intervals would contain the true proportion.
HW: Read Introduction to Chapter 6 and Sections 6.1 - 6.1.6; do #6.3, 6.5-6.9.
Previous HW: Read Section 5.5; omit Section 5.6. Do Exercises #5.85, 5.87-5.90, 5.93-5.95, 5.99, 5.100, 5.102, 5.144.