Sampling Distribution of a Sample Proportion Lecture 26 Sections 8.1 – 8.2 Wed, Mar 8, 2006.

1 Sampling Distribution of a Sample Proportion Lecture 26 Sections 8.1 – 8.2 Wed, Mar 8, 2006

2 Preview of the Central Limit Theorem We looked at the distribution of the sum of 1, 2, and 3 uniform random variables U(0, 1). We saw that the shapes of their distributions were moving towards the shape of the normal distribution. If we replace “sum” with “average,” we will obtain the same phenomenon, but on the scale from 0 to 1 each time.

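3 Preview of the Central Limit Theorem [Figure: distribution plot; axis ticks 0, 1, 2]

4 [Figure: distribution plot; axis ticks 0, 1, 2]

5 [Figure: distribution plot; axis ticks 0, 1, 2]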

6 Some observations: Each distribution is centered at the same place, ½. The distributions are being “drawn in” towards the center. That means that their standard deviation is decreasing. Can we quantify this?

7 Preview of the Central Limit Theorem [Figure: distribution plot; axis ticks 0, 1, 2] μ = ½, σ² = 1/12

8 Preview of the Central Limit Theorem [Figure: distribution plot; axis ticks 0, 1, 2] μ = ½, σ² = 1/24

9 Preview of the Central Limit Theorem [Figure: distribution plot; axis ticks 0, 1, 2] μ = ½, σ² = 1/36

10 Preview of the Central Limit Theorem This tells us that a mean based on three observations is much more likely to be close to the population mean than is a mean based on only one or two observations.
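
The shrinking variances on slides 7 to 9 (1/12, 1/24, 1/36) can be checked with a short simulation. This is only a sketch, not part of the lecture: the function name, the number of trials, and the seed are assumptions chosen for illustration.

```python
import random
import statistics

def simulate_uniform_averages(n, trials=100_000, seed=0):
    """Estimate the mean and variance of the average of n U(0, 1) draws."""
    rng = random.Random(seed)
    averages = [
        sum(rng.random() for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.fmean(averages), statistics.pvariance(averages)

for n in (1, 2, 3):
    mean, var = simulate_uniform_averages(n)
    # Theory: the mean stays at 1/2 and the variance is 1/(12n).
    print(f"n={n}: mean ~ {mean:.3f}, variance ~ {var:.4f}, "
          f"theory 1/(12n) = {1 / (12 * n):.4f}")
```

The simulated variances should land close to 1/12, 1/24, and 1/36, which is the "drawing in" towards the center described on slide 6.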

11 Parameters and Statistics THE PURPOSE OF A STATISTIC IS TO ESTIMATE A POPULATION PARAMETER. A sample mean is used to estimate the population mean. A sample proportion is used to estimate the population proportion. Sample statistics, by their very nature, are variable. Population parameters are fixed.

12 Some Questions We hope that the sample proportion is close to the population proportion. How close can we expect it to be? Would it be worth it to collect a larger sample? If the sample were larger, would we expect the sample proportion to be closer to the population proportion? How much closer?

13 The Sampling Distribution of a Statistic Sampling Distribution of a Statistic – The distribution of values of the statistic over all possible samples of size n from that population.

14 The Sample Proportion Let p be the population proportion. Then p is a fixed value (for a given population). Let p̂ (“p-hat”) be the sample proportion. Then p̂ is a random variable; it takes on a new value every time a sample is collected. The sampling distribution of p̂ is the probability distribution of all the possible values of p̂.

15 Example Suppose that this class is 3/4 freshmen. Suppose that we take a sample of 2 students, selected with replacement. Find the sampling distribution of p̂.

16 Example [Tree diagram: each selection is F (freshman) with probability 3/4 or N (not a freshman) with probability 1/4.] P(FF) = 9/16, P(FN) = 3/16, P(NF) = 3/16, P(NN) = 1/16

17 Example Let X be the number of freshmen in the sample. The probability distribution of X is
x    P(x)
0    1/16
1    6/16
2    9/16

18 Example Let p̂ be the proportion of freshmen in the sample (p̂ = X/n). The sampling distribution of p̂ is
x      P(p̂ = x)
0      1/16
1/2    6/16
1      9/16

19 Samples of Size n = 3 If we sample 3 people (with replacement) from a population that is 3/4 freshmen, then the proportion of freshmen in the sample has the following distribution.
x      P(p̂ = x)
0      1/64 = 0.02
1/3    9/64 = 0.14
2/3    27/64 = 0.42
1      27/64 = 0.42

20 Samples of Size n = 4 If we sample 4 people (with replacement) from a population that is 3/4 freshmen, then the proportion of freshmen in the sample has the following distribution.
x      P(p̂ = x)
0      1/256 = 0.004
1/4    12/256 = 0.05
2/4    54/256 = 0.21
3/4    108/256 = 0.42
1      81/256 = 0.32
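
The tables for n = 2, 3, and 4 can be reproduced exactly. Because the students are selected with replacement, the number of freshmen X is binomial, so p̂ = X/n takes the value k/n with the binomial probability of k successes. The sketch below assumes that model; the function name `sampling_distribution` is made up for illustration.

```python
from fractions import Fraction
from math import comb

def sampling_distribution(n, p):
    """Exact distribution of p-hat = X/n, where X ~ Binomial(n, p)."""
    return {
        Fraction(k, n): comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n + 1)
    }

p = Fraction(3, 4)  # the class is 3/4 freshmen
for n in (2, 3, 4):
    print(f"n = {n}")
    for value, prob in sampling_distribution(n, p).items():
        print(f"  P(p-hat = {value}) = {prob} = {float(prob):.3f}")
```

`Fraction` reduces automatically, so 6/16 prints as 3/8 and 2/4 prints as 1/2; the values are the same as in the slides.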

21 The Parameters of the Sampling Distributions When n = 1, the sampling distribution is
p̂    P(p̂)
0    1/4
1    3/4
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/16 = 0.1875.

22 The Parameters of the Sampling Distributions When n = 2, the sampling distribution is
p̂      P(p̂)
0      1/16
1/2    6/16
1      9/16
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/32 = 0.09375.

23 The Parameters of the Sampling Distributions When n = 3, the sampling distribution is
p̂      P(p̂)
0      1/64 = 0.02
1/3    9/64 = 0.14
2/3    27/64 = 0.42
1      27/64 = 0.42
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/48 = 0.0625.

24 The Parameters of the Sampling Distributions When n = 4, the sampling distribution is
p̂      P(p̂)
0      1/256 = 0.004
1/4    12/256 = 0.05
2/4    54/256 = 0.21
3/4    108/256 = 0.42
1      81/256 = 0.32
The mean and variance are μ = 3/4 = 0.75 and σ² = 3/64 = 0.046875.
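
The parameters on slides 21 to 24 follow a pattern: the mean is always p = 3/4, and the variance is p(1 – p)/n = (3/16)/n. A minimal sketch that computes both directly from the exact distribution, assuming the binomial model for sampling with replacement (the function name is hypothetical):

```python
from fractions import Fraction
from math import comb

def phat_mean_and_variance(n, p):
    """Exact mean and variance of p-hat = X/n, with X ~ Binomial(n, p)."""
    dist = {
        Fraction(k, n): comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n + 1)
    }
    mean = sum(value * prob for value, prob in dist.items())
    variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())
    return mean, variance

p = Fraction(3, 4)
for n in (1, 2, 3, 4):
    mean, variance = phat_mean_and_variance(n, p)
    # The variance computed from the table should equal p(1 - p)/n.
    assert variance == p * (1 - p) / n
    print(f"n = {n}: mean = {mean}, variance = {variance} = {float(variance)}")
```

Working in exact fractions makes the agreement with p(1 – p)/n an equality rather than a rounding coincidence.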

25 Sampling Distributions Run the program Central Limit Theorem for Proportions.exe. Use n = 30 and p = 0.75; generate 100 samples.

26 100 Samples of Size n = 30 μ = 0.75, σ = 0.079

27 Observations and Conclusions Observation #1: The values of p̂ are clustered around p. Conclusion #1: p̂ is probably close to p.

28 Larger Sample Size Now we will select 100 samples of size 120 instead of size 30. Run the program Central Limit Theorem for Proportions.exe. Pay attention to the spread (standard deviation) of the distribution.

29 100 Samples of Size n = 120 μ = 0.75, σ = 0.0395
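
For readers without the course program, the experiment can be approximated in a few lines. This is a stand-in sketch, not the program used in the lecture: the function name and seed are assumptions, and each sample is simulated as n independent draws that are "freshman" with probability p.

```python
import random
import statistics
from math import sqrt

def simulate_sample_proportions(n, p, num_samples, seed=0):
    """Draw num_samples samples of size n and return the sample proportions."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]

p = 0.75
for n in (30, 120):
    phats = simulate_sample_proportions(n, p, num_samples=100)
    print(
        f"n = {n}: mean of p-hat ~ {statistics.fmean(phats):.3f}, "
        f"sd of p-hat ~ {statistics.pstdev(phats):.4f}, "
        f"theory sqrt(p(1-p)/n) = {sqrt(p * (1 - p) / n):.4f}"
    )
```

The theoretical standard deviations are 0.0791 for n = 30 and 0.0395 for n = 120, matching the values on the slides. Quadrupling the sample size halves the spread.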

30 Observations and Conclusions Observation #2: As the sample size increases, the clustering is tighter. Conclusion #2A: Larger samples give more reliable estimates. Conclusion #2B: For sample sizes that are large enough, we can make very good estimates of the value of p.

31 Larger Sample Size Now we will select 10,000 samples of size 120 instead of only 100 samples. Run the program Central Limit Theorem for Proportions.exe. Pay attention to the shape of the distribution.

32 10,000 Samples of Size n = 120 μ = 0.75, σ = 0.0395

33 10,000 Samples of Size n = 126

34 More Observations and Conclusions Observation #3: The distribution of p̂ appears to be approximately normal.

35 One More Conclusion Conclusion #3: We can use the normal distribution to calculate just how close to p we can expect p̂ to be. However, we must know the values of μ and σ for the distribution of p̂. That is, we have to quantify the sampling distribution of p̂.

36 The Sampling Distribution of p̂ It turns out that the sampling distribution of p̂ is approximately normal with the following parameters: mean μ = p and standard deviation σ = √(p(1 – p)/n). This is the Central Limit Theorem for Proportions, summarized on page 519.

37 The Sampling Distribution of p̂ The approximation to the normal distribution is excellent if np ≥ 5 and n(1 – p) ≥ 5.

38 Why Surveys Work Suppose 51% of the population plan to vote for candidate X, i.e., p = 0.51. What is the probability that an exit survey of 1000 people would show candidate X with less than 45% support, i.e., p̂ < 0.45?

39 Why Surveys Work First, describe the sampling distribution of p̂ if the sample size is n = 1000 and p = 0.51. Check: np = 510 ≥ 5 and n(1 – p) = 490 ≥ 5. So p̂ is approximately normal, with μ = 0.51 and σ = √((0.51)(0.49)/1000) ≈ 0.01581.
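
The description step can be written as a small helper. This is a sketch under two assumptions: the comparison symbol lost from the slide is "at least 5", and the standard deviation is √(p(1 – p)/n), which is consistent with the value 0.01581 used on the next slide. The function name `describe_phat` is hypothetical.

```python
from math import sqrt

def describe_phat(n, p, threshold=5):
    """Return (mean, sd, ok) for the normal approximation to p-hat.

    ok is True when both np and n(1 - p) reach the threshold
    (assumed here to be 5, as in the check on the slide).
    """
    mean = p
    sd = sqrt(p * (1 - p) / n)
    ok = n * p >= threshold and n * (1 - p) >= threshold
    return mean, sd, ok

mean, sd, ok = describe_phat(1000, 0.51)
print(f"mean = {mean}, sd = {sd:.5f}, normal approximation reasonable: {ok}")
```

For a very small sample such as n = 4 with p = 0.75, the same check fails (np = 3 and n(1 – p) = 1), which matches the visibly skewed tables for small n earlier in the lecture.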

40 Why Surveys Work The z-score of 0.45 is z = (0.45 – 0.51)/0.01581 = -3.795. P(p̂ < 0.45) = P(Z < -3.795) = 0.00007385 (not likely!) Or use normalcdf(-E99, 0.45, 0.51, 0.01581).
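
The same calculation can be done without a calculator using Python's standard library. This sketch uses `statistics.NormalDist` in place of the calculator's normalcdf; the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.51
sd = sqrt(p * (1 - p) / n)   # standard deviation of p-hat
z = (0.45 - p) / sd          # z-score of 0.45
prob = NormalDist(mu=p, sigma=sd).cdf(0.45)  # P(p-hat < 0.45)

print(f"sd = {sd:.5f}, z = {z:.3f}, P(p-hat < 0.45) = {prob:.8f}")
```

The result differs from 0.00007385 only in the last digits, because the slide rounds σ and z before looking up the probability.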

41 Why Surveys Work Perform the same calculation, but with a smaller sample size, say n = 50. The probability turns out to be 0.1980, nearly a 20% chance. By symmetry, there is also about a 20% chance that the sample proportion is greater than 57%. Thus, there is about a 40% chance that the sample proportion is off by at least 6 percentage points.
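
To compare the two sample sizes side by side, the two-tailed probability can be wrapped in a function. This is a sketch based on the normal approximation used above; the function name `prob_off_by_at_least` is an assumption, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def prob_off_by_at_least(n, p, margin):
    """Normal-approximation probability that |p-hat - p| >= margin."""
    sd = sqrt(p * (1 - p) / n)
    lower_tail = NormalDist(mu=p, sigma=sd).cdf(p - margin)
    return 2 * lower_tail  # the two tails are equal by symmetry

for n in (50, 1000):
    prob = prob_off_by_at_least(n, 0.51, 0.06)
    print(f"n = {n}: P(off by at least 6 points) ~ {prob:.5f}")
```

With n = 50 the probability is roughly 0.40, while with n = 1000 it drops below 0.001, which is the point of the slide title: surveys work because large samples make large errors very unlikely.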

