The Practice of Statistics Third Edition Chapter 9: Sampling Distributions Copyright © 2008 by W. H. Freeman & Company Daniel S. Yates.


1 The Practice of Statistics Third Edition Chapter 9: Sampling Distributions Copyright © 2008 by W. H. Freeman & Company Daniel S. Yates

2 9.1 Sampling Distributions We must take care to keep straight whether a number describes a sample or a population

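3 Population vs. sample. Mean: $\mu$ (population parameter), $\bar{x}$ (sample statistic). Proportion: $p$ (population parameter), $\hat{p}$ (sample statistic). Parameters are fixed; statistics vary. Sampling variability: the value of a statistic varies in repeated random sampling. The goal is to have the statistic tell us something about the parameter of interest.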

4 Sampling Variability. What would happen if we took many samples? Take a large number of samples from the same population. Calculate $\bar{x}$ or $\hat{p}$ for each sample. Make a histogram of the values of $\bar{x}$ or $\hat{p}$. Examine the shape, center, and spread. In practice, it is too expensive to take many samples from a population like all U.S. residents, but we can imitate many samples by simulation.

5 Customs officials at Guadalajara airport want to be sure that passengers do not bring illegal items into the country. Since they do not have time to check every passenger, they require each person to press a button, which will display either a green light or a red light. If the red light shows, the passenger will be searched. Customs officials claim that the probability that the light turns green on any press of the button is 0.70. We will simulate drawing SRSs of size 100 from the population of travelers passing through the airport.

6 TABLE B - Random digits
Line
101  19223 95034 05756 28713 96409 12531 42544 82853
102  73676 47150 99400 01927 27754 42648 82425 36290
103  45467 71709 77558 00095 32863 29485 82226 90056
104  52711 38889 93074 60227 40011 85848 48767 52573
105  95592 94007 69971 91481 60779 53791 17297 59335
Let digits 0-6 represent getting a green light. Let digits 7-9 represent getting a red light.
1st SRS: $\hat{p}$ = 0.71
2nd SRS: $\hat{p}$ = 0.62
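The two sample proportions can be checked by counting digits. The sketch below assumes the slide reads the table consecutively, so that the first SRS is the first 100 digits starting at line 101 and the second SRS is the next 100; the function name `green_proportion` is only illustrative.

```python
# Digits from Table B, lines 101-105, as quoted on the slide.
TABLE_B = (
    "19223 95034 05756 28713 96409 12531 42544 82853 "
    "73676 47150 99400 01927 27754 42648 82425 36290 "
    "45467 71709 77558 00095 32863 29485 82226 90056 "
    "52711 38889 93074 60227 40011 85848 48767 52573 "
    "95592 94007 69971 91481 60779 53791 17297 59335"
)

def green_proportion(digits):
    """Proportion of digits in 0-6, i.e. simulated green lights."""
    return sum(d in "0123456" for d in digits) / len(digits)

digits = TABLE_B.replace(" ", "")   # 200 digits in total
first_srs = digits[:100]            # assumed: digits 1-100
second_srs = digits[100:200]        # assumed: digits 101-200

p_hat_1 = green_proportion(first_srs)
p_hat_2 = green_proportion(second_srs)
print(p_hat_1, p_hat_2)  # 0.71 0.62
```

Under that reading assumption, the counts reproduce the two values shown on the slide.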

7 This histogram shows what would happen with 1000 SRSs (of size 100). It approximates the sampling distribution of $\hat{p}$. Note: This example gave us a population parameter. This is not the usual situation. For the most part, we will be using the sample statistics to estimate the unknown parameters.

8 Sampling distribution: the ideal pattern that would emerge if we looked at all possible samples of the same size (from the same population). You must understand the difference between a simulation of a sampling distribution and the actual sampling distribution.

9 Consider the process of taking an SRS of size 2 from the population of Table B and computing $\bar{x}$ for the sample. We could perform a simulation, but in this case, we can construct the actual sampling distribution.

10 The sampling distribution of $\bar{x}$ for samples of size n = 2
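For a population this small, the actual sampling distribution can be built by listing every possible sample. A minimal sketch, assuming the population is the ten digits 0-9, each equally likely, and that the two digits are drawn independently (as successive entries in a random digit table would be):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = range(10)  # the digits 0-9, assumed equally likely

# Enumerate all 100 equally likely ordered pairs and tally each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in product(population, repeat=2))
sampling_dist = {mean: Fraction(c, 100) for mean, c in sorted(counts.items())}

for mean, prob in sampling_dist.items():
    print(f"x-bar = {float(mean):3.1f}  probability = {float(prob):.2f}")

# Center of the sampling distribution (compare with the population mean 4.5).
center = sum(mean * prob for mean, prob in sampling_dist.items())
print("mean of x-bar:", float(center))
```

The resulting distribution is triangular and centered at 4.5, the mean of the digits 0-9.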

11 Describing Sampling Distributions. Recall: shape, center, and spread. Survivor: Guatemala was the most watched TV show in the US in 2005. The true proportion of US adults that watched the show is p = 0.37.

12 The figure below shows the results of drawing 1000 SRSs of size n = 100 from a population with p = 0.37. Notice that a sample of size 100 can give a $\hat{p}$ quite far from p = 0.37. A sample of size 100 does not give a trustworthy estimate of the population proportion. Describe the distribution: shape, center, spread.

13 1000 SRSs of size n = 1000. Describe the new distribution. Same distribution with an expanded scale to make the shape clearer. There is less variability with larger sample sizes.
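The comparison between n = 100 and n = 1000 can be imitated by simulation. This is a sketch, not the simulation behind the slides: it treats the population as effectively infinite (each draw is an independent success with probability p = 0.37), and the seed and the helper name `simulate_p_hats` are arbitrary choices.

```python
import random
import statistics

def simulate_p_hats(p, n, reps, rng):
    """Return `reps` sample proportions, each from a simulated SRS of size n."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(2005)  # arbitrary seed so the run is repeatable
p = 0.37

results = {}
for n in (100, 1000):
    p_hats = simulate_p_hats(p, n, reps=1000, rng=rng)
    results[n] = (statistics.mean(p_hats), statistics.stdev(p_hats))
    print(f"n = {n:4d}: mean of p-hat = {results[n][0]:.3f}, "
          f"std. dev. = {results[n][1]:.4f}")
```

Both sets of sample proportions center near 0.37, while the spread for n = 1000 is roughly a third of the spread for n = 100.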

14 The appearance of approximate sampling distributions is a consequence of random sampling. Haphazard sampling does not give such regular and predictable results. When randomization is used in a design for producing data, statistics computed from the data have a definite pattern of behavior over many repetitions, even though the result of a single repetition is uncertain.

15 The Bias of a Statistic. How trustworthy is our statistic as an estimate of the parameter? Below are two sampling distributions of $\hat{p}$ for samples of 100 and 1000, drawn to the same scale. Both are unbiased because their means equal the true proportion. There is less variability when n = 1000.

16 $\hat{p}$ will sometimes fall above and sometimes below the true value of the parameter if we take many samples. Because the sampling distribution is centered at the true value, however, there is no systematic tendency to overestimate or underestimate the parameter ("no favoritism"). The sample proportion $\hat{p}$ from an SRS of any size is an unbiased estimator of the parameter p.

17 The spread of the sampling distribution does not depend very much on the size of the population. A survey of 1,200 people will have the same variability whether the population being sampled is the city of San Francisco or the entire United States. You don't need larger samples for larger populations.

18 Sampling Variability continued. Imagine sampling harvested corn by thrusting a scoop into a lot of corn kernels. The scoop doesn't know whether it is surrounded by a bag of corn or by an entire truckload. As long as the corn is well mixed, the variability depends only on the size of the scoop.

19 Bias and Variability We can think of the true value of the parameter as the bull’s-eye on a target and the sample statistic as an arrow fired at the target. Both bias and variability describe what would happen when we take many shots. Bias: aim is off → consistently miss in the same direction Variability: repeated shots are widely scattered

20 Bias and Variability

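21 9.2 Sample Proportions. The sampling distribution of $\hat{p}$ has mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. For the standard deviation formula to apply, the population needs to be at least 10 times as large as the sample. These formulas are derived for you on pg 582. You should take a look at it.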

22 Since $\mu_{\hat{p}}$ equals p, $\hat{p}$ is always an unbiased estimator of p. The standard deviation of $\hat{p}$ gets smaller as the sample size n increases. Note: since n appears in the denominator and under a radical, in order to cut the standard deviation in half, you would need to take a sample four times as large. The formula for the standard deviation of $\hat{p}$ doesn't apply when the sample is a large part of the population.
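The factor of four follows directly from the standard formula for the standard deviation of a sample proportion, $\sqrt{p(1-p)/n}$, assumed here to be the formula the slide refers to. Writing $\sigma_{\hat{p}}(n)$ for the standard deviation at sample size n (a notation introduced only for this illustration):

```latex
\sigma_{\hat{p}}(4n)
  = \sqrt{\frac{p(1-p)}{4n}}
  = \frac{1}{2}\sqrt{\frac{p(1-p)}{n}}
  = \frac{1}{2}\,\sigma_{\hat{p}}(n)
```

In general, multiplying the sample size by $k^2$ divides the standard deviation by $k$.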

23 Applying to college. A polling organization asks an SRS of 1500 first-year college students whether they applied for admission to any other college. In fact, 35% of all first-year students applied to colleges besides the one they are attending. What is the probability that the random sample of 1500 students will give a result within 2 percentage points of this true value? What information do you need to have in order to answer this question?

24 Using the Normal Approximation for $\hat{p}$. In section 9.1, we found the shape of the sampling distribution of $\hat{p}$ to be approximately Normal, and closer to Normal when n is large.

25 Applying to college revisited. What is the probability that the SRS of 1500 students will give a result within 2 percentage points of the true value? n = 1500, p = 0.35. The sampling distribution of $\hat{p}$ has mean $\mu_{\hat{p}} = 0.35$. Check both rules of thumb to see if we can use Normal approximations. If rule of thumb 1 is satisfied, find the standard deviation. If rule of thumb 2 is satisfied, begin your Normal distribution calculations.

26 We want to calculate the probability that $\hat{p}$ falls within 2 percentage points of p, that is, $P(0.33 \le \hat{p} \le 0.37)$. We see that almost 90% of all samples will give a result within 2 percentage points of the truth about the population.
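The "almost 90%" figure can be reproduced with a short calculation. This sketch assumes the standard formula $\sqrt{p(1-p)/n}$ for the standard deviation of $\hat{p}$ and uses Python's `statistics.NormalDist` for the Normal approximation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1500, 0.35
sd = sqrt(p * (1 - p) / n)               # standard deviation of p-hat
p_hat_dist = NormalDist(mu=p, sigma=sd)  # Normal approximation for p-hat

# Probability that p-hat lands within 2 percentage points of p.
prob = p_hat_dist.cdf(0.37) - p_hat_dist.cdf(0.33)
print(f"sd = {sd:.4f}, P(0.33 <= p-hat <= 0.37) = {prob:.4f}")
```

The result is a little under 0.90; a table lookup with z rounded to two decimals gives a very slightly different value.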

27 Survey undercoverage. One way of checking the effect of undercoverage, nonresponse, and other sources of error in a sample survey is to compare the sample with known facts about the population. About 11% of American adults are Black. The proportion of Black adults in an SRS of 1500 adults should therefore be close to 0.11. (It is unlikely to be exactly 0.11 because of sampling variability.) If a national sample contains only 9.2% Black adults, should we suspect that the sampling procedure is somehow under-representing them? Check both rules of thumb to see if we can estimate the standard deviation and use Normal approximations.

28 Checking rules... ROT 1: 10(1500) = 15,000 ≤ N. It is fair to assume that there are more than 15,000 adults in the population, so we can find the standard deviation. ROT 2: 1500(0.11) = 165 ≥ 10 and 1500(0.89) = 1335 ≥ 10, so we can begin Normal calculations with N(0.11, 0.00808).

29 We want to find $P(\hat{p} \le 0.092) = P(z \le -2.23) = 0.0129$. Only 1.29% of all samples would have so few Black adults. Because it is unlikely that a sample would include so few, we have good reason to suspect that the sampling procedure under-represents Black adults.
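The 1.29% figure can be checked the same way. A minimal sketch, assuming the Normal approximation N(0.11, 0.00808) from the rules-of-thumb slide; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1500, 0.11
sd = sqrt(p * (1 - p) / n)               # about 0.00808
p_hat_dist = NormalDist(mu=p, sigma=sd)

# Probability that an SRS of 1500 contains 9.2% or fewer.
prob = p_hat_dist.cdf(0.092)
z = (0.092 - p) / sd
print(f"sd = {sd:.5f}, z = {z:.2f}, P(p-hat <= 0.092) = {prob:.4f}")
```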

30 Do you jog? The Gallup Poll once asked a random sample of 1540 adults, "Do you happen to jog?" Suppose that in fact 15% of all adults jog. a) Why can we use the formula for standard deviation in this setting? b) Can we use the Normal approximation? c) Find the mean and standard deviation of the proportion of the sample who jog. (Assume the sample is an SRS.) d) Find the probability that between 13% and 17% of the sample jog. e) What sample size would be required to reduce the standard deviation of the sample proportion to one-third the value you found in (c)?

31 n = 1540, p = 0.15. a) ROT 1: 10(1540) = 15,400. It is safe to assume there are more than 15,400 adults in the population, so we can use the formula for the st. dev. b) ROT 2: 1540(0.15) = 231 and 1540(0.85) = 1309. Since both are ≥ 10, Normal approximations apply.

32 n = 1540, p = 0.15. c) Mean = 0.15, St. Dev. = $\sqrt{(0.15)(0.85)/1540} \approx 0.0091$. d) $P(0.13 \le \hat{p} \le 0.17) = P(-2.20 \le z \le 2.20) = 0.9861 - 0.0139 = 0.9722$.

33 n = 1540, p = 0.15. e) Find n such that the st. dev. is 1/3 of its value in (c). Since n is under a radical in the denominator, reducing the standard deviation to one-third of its value requires a sample 9 times as large. The sample size should be 1540(9) = 13,860.
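Parts (c) through (e) of the jogging exercise can be verified numerically. A sketch assuming the standard formula $\sqrt{p(1-p)/n}$; the helper name `sd_p_hat` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

n, p = 1540, 0.15

sd = sd_p_hat(p, n)                        # part (c)
dist = NormalDist(mu=p, sigma=sd)
prob = dist.cdf(0.17) - dist.cdf(0.13)     # part (d)

n_needed = 9 * n                           # part (e): 3 squared times n
sd_needed = sd_p_hat(p, n_needed)

print(f"(c) sd = {sd:.4f}")
print(f"(d) P(0.13 <= p-hat <= 0.17) = {prob:.4f}")
print(f"(e) n = {n_needed}, sd = {sd_needed:.5f} (one-third of {sd:.5f})")
```

The probability in (d) comes out very close to the table-based 0.9722; the small difference comes from rounding z to 2.20.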

34 Summary of Sampling Distributions. Select a large SRS from a population in which the proportion p are successes. The sampling distribution of the proportion $\hat{p}$ of successes in the sample is approximately Normal. The mean is p and the standard deviation is $\sqrt{p(1-p)/n}$.

35 9.3 Sample Means. Sample proportions arise most often when we are interested in categorical variables. What percent of adults attended church last week? What proportion of U.S. adults have watched Survivor: Guatemala? Sample means are the most common statistic used for quantitative variables: income of households, blood pressure of a patient. In the last section we found that the sampling distribution of $\hat{p}$ is approximately Normal under the right conditions. Which were...? Wouldn't it be nice if we could say something similar about the sampling distribution of $\bar{x}$?

36 Activity 9A. Recall: the height of young women varies approximately according to the N(64.5, 2.5) distribution. Simulate the heights of 100 women: place the cursor on top of L1; press MATH, choose PRB, choose 6:randNorm(; enter randNorm(64.5, 2.5, 100) and press ENTER. Plot a histogram: set WINDOW as follows: X[57, 72] and Y[-10, 45]. Describe the approximate shape of your histogram.

37 Activity 9A. Find the mean, median, and standard deviation. Compare $\bar{x}$ with the population mean $\mu$. Compare s with $\sigma$. Graph a boxplot above the histogram. Does the boxplot appear symmetric? How close is the median of the boxplot to the mean of the histogram? Repeat the process and record the new mean, median, and standard deviation. Write your sample means on the post-it and stick it appropriately on the board.

38 Activity 9A. What is the approximate shape of the distribution of $\bar{x}$? Where is the center of the distribution? How does it compare to the mean of the population? Enter all values of $\bar{x}$ from the board into L2. Turn off plot 1 and define plot 3 as a boxplot from L2. Compare X to. Use 1-var stats to find the mean and standard deviation of $\bar{x}$.

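39 Activity 9A. This concept will be further discussed in section 9.3 with the Central Limit Theorem. In summary, the distribution of $\bar{x}$ is approximately Normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.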
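Activity 9A can also be imitated off the calculator. This sketch is an assumed Python stand-in for the class activity, not part of it: it uses `random.gauss` in place of randNorm, and the seed, the number of repetitions (1000), and the helper name `sample_mean` are arbitrary.

```python
import random
import statistics

rng = random.Random(9)          # arbitrary seed so the run is repeatable
MU, SIGMA, N = 64.5, 2.5, 100   # population N(64.5, 2.5), samples of 100

def sample_mean(rng):
    """Simulate the heights of N women and return their mean height."""
    heights = [rng.gauss(MU, SIGMA) for _ in range(N)]
    return statistics.mean(heights)

# One sample mean per "student"; here 1000 of them.
x_bars = [sample_mean(rng) for _ in range(1000)]

mean_of_x_bars = statistics.mean(x_bars)
sd_of_x_bars = statistics.stdev(x_bars)
print(f"mean of x-bar = {mean_of_x_bars:.2f}")
print(f"std. dev. of x-bar = {sd_of_x_bars:.3f}")
```

The sample means cluster around 64.5 with a standard deviation near 2.5 / 10 = 0.25, far smaller than the 2.5 inches for individual heights.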

40 The Mean and Standard Deviation of $\bar{x}$. The behavior of $\bar{x}$ in repeated samples is much like that of the sample proportion $\hat{p}$: $\bar{x}$ is an unbiased estimator of $\mu$. The values of $\bar{x}$ are less spread out for larger samples. The standard deviation decreases at the rate $\sqrt{n}$, so you must take a sample four times as large to cut the st. dev. in half. Use the recipe $\sigma/\sqrt{n}$ for the standard deviation of $\bar{x}$ only when the population is at least 10 times as large as the sample. These facts are true no matter what the population distribution looks like.

41 The height of young women varies approximately according to the N(64.5, 2.5) distribution. If we take an SRS of 10 young women, find the mean and standard deviation of the sample mean height $\bar{x}$. The heights of individual women vary widely about the population mean, but the average height of a sample of 10 women is less variable. In Activity 9A, we plotted the distribution of $\bar{x}$ for samples of size n = 100, so the standard deviation of $\bar{x}$ was $2.5/\sqrt{100} = 0.25$.

42 The shape of the distribution of $\bar{x}$ depends on the shape of the population distribution. If the population distribution is Normal, then so is the distribution of the sample mean. We still need to consider the shape of the sampling distribution of $\bar{x}$ if the shape of the population is unknown or known to be non-Normal. This will be addressed at the end of this section.

43 Recall: Heights of young women follow N(64.5, 2.5) What is the probability that a randomly selected young woman is taller than 66.5 inches?

44 Recall: Heights of young women follow N(64.5, 2.5). What is the probability that the mean height of an SRS of 10 young women is greater than 66.5 inches? The standard deviation will now be $\sigma_{\bar{x}} = 2.5/\sqrt{10} \approx 0.79$. $P(\bar{x} > 66.5) = P(z > 2.53) = 1 - P(z < 2.53) = 1 - 0.9943 = 0.0057$. It is much less likely for a sample to have a mean greater than 66.5 than it is for an individual.
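The contrast between one woman and the mean of ten can be computed side by side. A minimal sketch using `statistics.NormalDist`, with the N(64.5, 2.5) population from the slides and $\sigma/\sqrt{n}$ for the standard deviation of the sample mean; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 64.5, 2.5, 10

individual = NormalDist(mu, sigma)             # one randomly chosen woman
sample_mean = NormalDist(mu, sigma / sqrt(n))  # mean height of an SRS of 10

p_individual = 1 - individual.cdf(66.5)
p_sample_mean = 1 - sample_mean.cdf(66.5)

print(f"P(X > 66.5)     = {p_individual:.4f}")
print(f"P(x-bar > 66.5) = {p_sample_mean:.4f}")
```

An individual exceeds 66.5 inches about 21% of the time, while the mean of ten does so well under 1% of the time.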

45

46 The Central Limit Theorem. What happens to the sampling distribution of $\bar{x}$ when the population distribution is not Normal? As the sample size increases, the distribution of $\bar{x}$ gets closer and closer to a Normal distribution. This is true no matter what shape the population distribution has, as long as the population has a finite standard deviation.

47 The Central Limit Theorem in Action. The distribution of sample means from a strongly non-Normal population becomes more Normal as the sample size increases. Figures: the distribution of 1 observation (that is, the population); the distribution of $\bar{x}$ for 2 observations.

48 Figures: the distribution of $\bar{x}$ for 10 observations; the distribution of $\bar{x}$ for 25 observations.

49

50 The Central Limit Theorem discusses the shape (and only the shape) of the sampling distribution of $\bar{x}$ when n is sufficiently large. If n is not large, the shape of the sampling distribution of $\bar{x}$ more closely resembles the shape of the original population. There are 3 situations to consider.
1) The population has a Normal distribution. Shape of sampling distribution: Normal, regardless of sample size.
2) Any population shape, small n. Shape of sampling distribution: similar to the shape of the parent population.
3) Any population shape, large n. Shape of sampling distribution: close to Normal (CLT).
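The three situations can be explored by simulation. The sketch below is an assumed illustration, not the source of the slide figures: it draws from an exponential population with mean 1 (strongly right-skewed) and uses sample skewness as a rough measure of how far the distribution of $\bar{x}$ is from the symmetric Normal shape. The seed, repetition count, and helper name `skewness` are arbitrary.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(46)  # arbitrary seed so the run is repeatable
REPS = 5000              # number of simulated samples per sample size

skew_by_n = {}
for n in (1, 2, 10, 25):
    # Each x-bar is the mean of n draws from an exponential population.
    x_bars = [sum(rng.expovariate(1.0) for _ in range(n)) / n
              for _ in range(REPS)]
    skew_by_n[n] = skewness(x_bars)
    print(f"n = {n:2d}: skewness of x-bar = {skew_by_n[n]:.2f}")
```

The skewness shrinks toward 0 as n grows, which matches the progression from situation 2 (small n, shape like the population) to situation 3 (large n, close to Normal).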

51 Servicing air conditioners. The time that a technician requires to perform preventive maintenance on an air conditioning unit is governed by the exponential distribution. The mean time is $\mu = 1$ hour and the standard deviation is $\sigma = 1$ hour. Your company has a contract to maintain 70 of these units in an apartment building. You must schedule technicians' time for a visit to this building. Is it safe to budget an average of 1.1 hours for each unit? Or should you budget an average of 1.25 hours? The CLT says that the sample mean time spent working on 70 units has approximately the Normal distribution. n = 70 is large enough. (Recall the shape when n = 25 for the same distribution.)

52 Servicing air conditioners. Calculate the mean and standard deviation. Determine if it is safe to budget 1.1 hours, on average. If you only budget 1.1 hours per unit, there is a 20% chance that the technician will not complete the work in the building within the budgeted time.

53 Determine if it is safe to budget 1.25 hours, on average. Still using the same mean and st. dev.: mean = 1, st. dev. = 0.120. If you budget 1.25 hours per unit, there is less than a 2% chance that the technician will not complete the work in the building within the budgeted time. This is a much safer amount of time.
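Both budget checks follow from the Normal approximation for the mean of 70 service times. A sketch assuming a population mean and standard deviation of 1 hour each (consistent with the stated mean of 1 and st. dev. of 0.120 for the sample mean); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 70          # hours; exponential population assumed
sd_x_bar = sigma / sqrt(n)           # about 0.120
x_bar_dist = NormalDist(mu, sd_x_bar)

# Chance that the average time per unit exceeds each budgeted amount.
p_over_110 = 1 - x_bar_dist.cdf(1.10)
p_over_125 = 1 - x_bar_dist.cdf(1.25)

print(f"sd of x-bar = {sd_x_bar:.3f}")
print(f"P(x-bar > 1.10) = {p_over_110:.4f}")
print(f"P(x-bar > 1.25) = {p_over_125:.4f}")
```

The first probability is about 0.20 and the second is just under 0.02, matching the conclusions on the slides.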

54 Summary of Sampling Distributions. The sampling distribution of a sample mean $\bar{x}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The distribution is Normal if the population distribution is Normal; it is approximately Normal for large samples in any case.

