 # Sampling Distributions and Sample Proportions

## Presentation on theme: "Sampling Distributions and Sample Proportions"— Presentation transcript:


The Big Ideas
A statistic is a random variable, the value of which varies from sample to sample. As a random variable, a statistic has a sampling distribution with a mean and a standard deviation (computed from the distribution of the basic random variables that are combined to calculate the statistic). Sampling distributions are the building blocks of statistical inference.

Parameter vs. Statistic
A parameter is a number that describes some characteristic of the population. A statistic is a number that describes some characteristic of a sample. Remember: “P” for Population and Parameter; “S” for Sample and Statistic.

How do we get from a sample statistic to an estimate of the population parameter?
We use the sampling distribution. Imagine that instead of taking a single sample, as we do in a typical study, we took 3 independent samples from the same population. For each of those 3 samples, we compute the same statistic (e.g., the mean or proportion). We would get a slightly different value of that statistic in each of the 3 samples.

What if we could do an infinite number of samples?
We could collect the same statistic from many, many samples and plot the values as a histogram. What would this look like? For a mean or a proportion from reasonably large random samples, it would be an approximately bell-shaped curve centered on the true population parameter, with fewer and fewer samples having means or proportions farther away from the central value.
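The thought experiment above can be approximated by simulation. The sketch below is only illustrative: the population proportion `p = 0.15`, the sample size `n = 200`, the number of repeated samples, and the function name are all assumptions chosen for the demonstration, not values from the slides.

```python
import random
import statistics
from collections import Counter

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n and
    return the proportion of successes in each sample."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Hypothetical settings for illustration only.
props = simulate_sample_proportions(p=0.15, n=200, num_samples=2000)

print("mean of the sample proportions:", round(statistics.mean(props), 4))
print("std dev of the sample proportions:", round(statistics.stdev(props), 4))

# Crude text histogram: one row per proportion rounded to 2 decimals.
counts = Counter(round(value, 2) for value in props)
for key in sorted(counts):
    print(f"{key:.2f} {'#' * (counts[key] // 10)}")
```

With settings like these, the rows of `#` marks should pile up near 0.15 and thin out on both sides, which is the bell shape described above.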

Sampling Distribution
The distribution of a statistic from an infinite number of samples of the same size as the sample in our study is known as the sampling distribution.
[Figure: Sample 1, Sample 2, Sample 3]

Sampling Distribution
Sampling distribution ~ the distribution of values taken by the statistic in ALL possible samples of the same size from the same population. A sample distribution (the distribution of the data values within one sample) is DIFFERENT from the sampling distribution. Always describe the sampling distribution using SOCS (shape, outliers, center, spread). Parameter ~ describes the population. Statistic ~ describes the sample. Unbiased statistic ~ the mean of its sampling distribution is EQUAL to the true value of the parameter being estimated.

WHAT MAKES A STATISTIC A POOR ESTIMATOR OF A PARAMETER?
HIGH VARIABILITY: Small samples lead to a larger spread in the sampling distribution of the statistic, giving less certainty about the value of the true parameter.
HIGH BIAS: Poor sampling methods create unrepresentative samples, so that the center of the sampling distribution is not equal to the true value of the parameter.
WHY? And what does that look like?

HOW DO WE AVOID HIGH BIAS???
USE APPROPRIATE SAMPLING PROCEDURES THAT WE LEARNED IN PREVIOUS CHAPTERS!!!

HOW DO WE AVOID HIGH VARIABILITY???
First, understand that sampling variability is the variation in the value of a statistic across repeated random samples. The variability of a statistic is described by the spread of its sampling distribution, so to avoid high variability, use larger samples: larger samples give a smaller spread. As long as the population is at least 10 times larger than the sample, the spread of the sampling distribution is approximately the same for any population size. Note: For small populations, it is best to use a census, not a sample.

WHY DOES THE POPULATION SIZE NOT REALLY MATTER MUCH???
Even more, why does a sample of size 260 serve a population of 2,600 just as well as a population of 26,000? If the population is small, then outliers are going to have a greater impact on the sampling process by creating greater variability in the sampling distribution. But it is the size of the sample that drives the sampling variability, so a statistic from a sample of 260 Walton students is about as precise as a statistic from a sample of 260 drawn from all East Cobb high school students. Of course, this assumes one important fact. Which is? THE SAMPLES MUST BE RANDOM!
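A quick simulation can make the population-size claim concrete. This is a sketch under stated assumptions: the two populations are artificial lists of 0s and 1s with a hypothetical success proportion of 0.30, and the function name is made up for illustration. Samples are drawn without replacement, as in an SRS.

```python
import random
import statistics

def sd_of_sample_proportions(population_size, p, n, num_samples, seed):
    """Estimate the spread of the sample proportion by repeatedly
    drawing an SRS of size n (without replacement) from a 0/1 population."""
    rng = random.Random(seed)
    num_successes = round(p * population_size)
    population = [1] * num_successes + [0] * (population_size - num_successes)
    proportions = [
        sum(rng.sample(population, n)) / n for _ in range(num_samples)
    ]
    return statistics.stdev(proportions)

# Same sample size (260), two population sizes; p = 0.30 is an assumed value.
sd_small = sd_of_sample_proportions(2_600, p=0.30, n=260, num_samples=2000, seed=1)
sd_large = sd_of_sample_proportions(26_000, p=0.30, n=260, num_samples=2000, seed=1)

print("spread with population of 2,600: ", round(sd_small, 4))
print("spread with population of 26,000:", round(sd_large, 4))
```

The two spreads should come out close to each other. The smaller population sits right at the "10 times larger" boundary, so its spread may be slightly smaller, but the difference is minor compared with what changing the sample size would do.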

SECTION 9.2 Sample Proportions
The sample proportion $\hat{p}$ is a statistic: $\hat{p}$ = (# of successes) / (total sample size).
Sampling distribution of a sample proportion: choose an SRS of size n from a large population with population proportion p having some characteristic of interest. Let $\hat{p}$ be the proportion of the sample having that characteristic. Then:
1) the sampling distribution of $\hat{p}$ is approximately normal, and is closer to a normal distribution when the sample size n is large
2) the mean of the sampling distribution is exactly p
3) the standard deviation of the sampling distribution is $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$

Rules of Thumb
Use the recipe for the standard deviation of $\hat{p}$ only when the population is at least 10 times as large as the sample. We will use the normal approximation to the sampling distribution of $\hat{p}$ for values of n and p that satisfy: np ≥ 10 and n(1-p) ≥ 10.
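The two rules of thumb are easy to check mechanically. The helper names and the example numbers below are hypothetical, chosen only to show one case where both conditions hold and one where the normal approximation condition fails.

```python
import math

def sample_proportion_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def check_rules_of_thumb(p, n, population_size):
    """Return whether each rule of thumb is satisfied."""
    return {
        # Population at least 10 times as large as the sample.
        "sd_formula_ok": population_size >= 10 * n,
        # Enough expected successes and failures for the normal approximation.
        "normal_approx_ok": n * p >= 10 and n * (1 - p) >= 10,
    }

# Hypothetical case 1: both conditions hold.
ok_case = check_rules_of_thumb(p=0.5, n=100, population_size=5_000)
# Hypothetical case 2: np = 2, so the normal approximation is not justified.
bad_case = check_rules_of_thumb(p=0.02, n=100, population_size=5_000)

print(ok_case)
print(bad_case)
print("sd for case 1:", sample_proportion_sd(0.5, 100))
```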

Standard Deviation Behavior
What will make the size of the standard deviation, $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$, change? If the sample size goes up, the standard deviation goes down. If the sample size goes down, the standard deviation goes up. How would we cut the standard deviation in half? Multiply the sample size by 4, because the standard deviation is proportional to $1/\sqrt{n}$.
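The "multiply by 4" rule follows from the square root in the formula, and a short numerical check can confirm it. In this sketch the values `p = 0.4` and `n = 250` are arbitrary assumptions, and `required_sample_size` is a hypothetical helper that generalizes the rule: to divide the standard deviation by k, multiply the sample size by k squared.

```python
import math

def sample_proportion_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def required_sample_size(n, k):
    """Sample size needed to divide the standard deviation by k."""
    return math.ceil(n * k ** 2)

p, n = 0.4, 250  # arbitrary illustrative values

original_sd = sample_proportion_sd(p, n)
quadrupled_sd = sample_proportion_sd(p, 4 * n)

print("sd with n:     ", original_sd)
print("sd with 4n:    ", quadrupled_sd)
print("ratio:         ", quadrupled_sd / original_sd)
print("n to halve sd: ", required_sample_size(n, 2))
```

The ratio should print as one-half (up to floating-point rounding), whatever values of p and n are chosen.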

EXAMPLE (pg. 477 # 9.15) The Gallup Poll once asked a random sample of 1540 adults, “Do you happen to jog?” Suppose that in fact 15% of all adults jog.
a) Find the mean and standard deviation of the proportion $\hat{p}$ of the sample who jog. (Assume the sample is an SRS.) $\mu_{\hat{p}} = p = 0.15$; $\sigma_{\hat{p}} = \sqrt{0.15(0.85)/1540} \approx 0.0091$
b) Explain why you can use the formula for the standard deviation of $\hat{p}$ in this setting. The population (assumed to be all US adults) is certainly more than 10 times larger than the sample.
c) Check that you can use the normal approximation for the distribution of $\hat{p}$.

EXAMPLE (pg. 477 # 9.15) (cont’d)
c) (answer) np = 1540(0.15) = 231 and n(1-p) = 1540(0.85) = 1309; these are both ≥ 10.
d) Find the probability that between 13% and 17% of the sample jog. normalcdf(0.13,0.17,0.15,√(0.15*0.85/1540)) ≈ 0.9721
e) What sample size would be required to reduce the standard deviation of the sample proportion to one-half the value found in a)? The standard deviation is proportional to 1/√n, so multiply the sample size by 4: 1540 × 4 = 6160.
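For readers without a calculator that has `normalcdf`, the jogging example can be reproduced with Python's standard library. This is a minimal sketch: `statistics.NormalDist` stands in for the calculator function, and the variable names are illustrative only.

```python
import math
from statistics import NormalDist

p = 0.15   # assumed true proportion of adults who jog
n = 1540   # sample size

# a) mean and standard deviation of the sample proportion
mean = p
sd = math.sqrt(p * (1 - p) / n)

# c) conditions for the normal approximation
expected_successes = n * p
expected_failures = n * (1 - p)
normal_ok = expected_successes >= 10 and expected_failures >= 10

# d) probability that the sample proportion falls between 0.13 and 0.17
dist = NormalDist(mu=mean, sigma=sd)
prob = dist.cdf(0.17) - dist.cdf(0.13)

# e) sample size that halves the standard deviation
n_half_sd = 4 * n

print("mean:", mean)
print("sd:", round(sd, 4))
print("normal approximation ok:", normal_ok)
print("P(0.13 <= p-hat <= 0.17):", round(prob, 4))
print("sample size for half the sd:", n_half_sd)
```

The printed probability should be about 0.972, in line with the `normalcdf` result in part d).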