Suppose we are interested in the digits in people’s phone numbers. There is some population mean (μ) and standard deviation (σ). Now suppose we take a sample.
Suppose we are interested in the digits in people’s phone numbers. There is some population mean (μ) and standard deviation (σ). Now suppose we take a sample of 4 digits. For that sample, we can find the sample mean (x̄) and standard deviation (s). Now suppose we look at lots of samples, all of size 4 (4 digits per sample). In that case, we have a set of sample means. How these sample means behave is described by the sampling distribution of the mean: the probability distribution of the sample means.
If we took the mean of those sample means, what would you expect it to be? While the sample mean varies from sample to sample (this is called sampling variability), the sample means target the population mean. In other words, the sample mean is a good estimate of the population mean: the mean of all possible sample means is the population mean. (See the table in the book, pg 251.)
What other statistics (characteristics of a sample) target the population parameters? These are called unbiased estimators: mean, variance, and proportion.
What statistics do not target the population parameters? These are called biased estimators: median, range, and standard deviation*.
* For large samples, the bias for the standard deviation is small, so s is often used to approximate σ anyway.
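The difference between unbiased and biased estimators can be checked directly on a population small enough to list every sample. The sketch below assumes a made-up population of three values (1, 2, 5) and samples of size 2 drawn with replacement; these numbers are illustrative only and are not taken from the table in the book. It averages each statistic over all 9 possible samples and compares that average with the population value. Proportions are left out to keep the sketch short.

```python
from itertools import product
from statistics import mean, median, pstdev, pvariance, stdev, variance

# Hypothetical tiny population, chosen only so every sample can be listed.
population = [1, 2, 5]

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))


def sample_range(values):
    """Range of a collection: largest value minus smallest value."""
    return max(values) - min(values)


def average_over_samples(statistic):
    """Mean of a statistic taken over every possible sample."""
    return mean([statistic(sample) for sample in samples])


# (name, sample statistic, population parameter it is meant to estimate)
# variance/stdev divide by n - 1; pvariance/pstdev divide by N.
checks = [
    ("mean", mean, mean(population)),
    ("variance", variance, pvariance(population)),
    ("median", median, median(population)),
    ("range", sample_range, sample_range(population)),
    ("standard deviation", stdev, pstdev(population)),
]

results = {}
for name, statistic, parameter in checks:
    results[name] = (average_over_samples(statistic), parameter)
    print(f"{name:>18}: average of statistic = {results[name][0]:.3f}, "
          f"population value = {parameter:.3f}")
```

For this population the averages of the sample mean and sample variance match the population values, while the averages of the median, range, and standard deviation do not.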
Thinking back to our experiment of sampling 4 digits from phone numbers: How is the population data distributed? Is each possible value of a sample mean equally likely? (Is a mean of 0 just as likely as a mean of 4?) How spread out are those sample means? If we increased the sample size (maybe from 4 to 8), would you expect the sample means to be more or less spread out?
The Big Point
Amazing result: The sampling distribution of the mean is approximately normal, even though the original data had a uniform (not normal) distribution.
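A quick simulation can make this result visible. The sketch below assumes each digit is equally likely to be 0 through 9 (the uniform distribution mentioned above) and uses the computer's random number generator in place of real phone numbers. The number of samples (10,000) and the seed are arbitrary choices.

```python
import random
from collections import Counter
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 4       # digits per sample, as in the experiment
NUM_SAMPLES = 10_000  # arbitrary number of repeated samples


def draw_sample_mean(n):
    """Mean of n digits, each assumed equally likely to be 0..9."""
    return mean([random.randint(0, 9) for _ in range(n)])


sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# Group the sample means into bins [0, 1), [1, 2), ..., [9, 10).
histogram = Counter(int(m) for m in sample_means)

for bin_start in range(10):
    bar = "#" * (histogram[bin_start] // 50)
    print(f"[{bin_start}, {bin_start + 1}): {bar}")

print("mean of sample means:", round(mean(sample_means), 3))
print("std dev of sample means:", round(pstdev(sample_means), 3))
```

The bars pile up in the middle and thin out toward 0 and 9, which is the bell shape the slide describes, even though a single digit is just as likely to be 0 as 4.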
The Central Limit Theorem
Assuming:
1. Our data (random variable x) has a distribution with mean μ and standard deviation σ. The distribution does not have to be normal.
2. Simple random samples, all of the same size n, are selected from the population.
The Central Limit Theorem
Then:
1. The distribution of the sample means will approach a normal distribution. The distribution will become more normal as the sample size increases.
2. The mean of all the sample means is the population mean (μ).
3. The standard deviation of all sample means is σ/√n.
This tells us that the sample means have an approximately normal distribution, that the mean of that distribution is μ, and that the standard deviation of that distribution is σ/√n.
Notation
The mean of the sample means: μ_x̄ = μ
The standard deviation of the sample means: σ_x̄ = σ/√n
Notice this is showing us that the sample means are less spread out than the original data, and that the larger the sample size, the less spread out the sample means will be.
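For the phone-digit experiment, the spread of the sample means can be computed exactly instead of simulated, because every possible sample can be listed. The sketch below assumes the ten digits 0 through 9 are equally likely and that digits are sampled with replacement. It compares the exact standard deviation of all possible sample means with σ/√n for n = 1, 2, and 4.

```python
from itertools import product
from math import sqrt
from statistics import pstdev

digits = list(range(10))  # assumed population: digits 0..9, equally likely
sigma = pstdev(digits)    # population standard deviation


def sd_of_sample_means(n):
    """Exact standard deviation of the means of all 10**n possible samples."""
    all_means = [sum(sample) / n for sample in product(digits, repeat=n)]
    return pstdev(all_means)


spread = {}
for n in (1, 2, 4):
    spread[n] = (sd_of_sample_means(n), sigma / sqrt(n))
    print(f"n = {n}: sd of sample means = {spread[n][0]:.4f}, "
          f"sigma / sqrt(n) = {spread[n][1]:.4f}")
```

The two columns agree for each n, and the spread shrinks as n grows.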
Practical Rules
If the original data is not normally distributed, you need a sample size of at least 30 for the normal distribution to be a good approximation to the distribution of the sample means.
If the original data is normally distributed, the distribution of the sample means will be normal for any sample size.
Using the Central Limit Theorem
When you’re working with one value from a normally distributed population, use what we already learned: z = (x − μ)/σ
When you’re working with a sample mean, be sure to use the mean and standard deviation of the sampling distribution: z = (x̄ − μ)/(σ/√n)
Example
Replacement times for CD players are normally distributed with a mean of 7.1 years and a standard deviation of 1.4 years.
What is the probability that a single CD player will need replacing in under 6 years?
10 CD players are chosen at random. What is the probability that the mean replacement time for the 10 players is under 6 years?
Before we calculate: Which probability do you expect to be smaller? Why?
Example
For a single CD player: z = (6 − 7.1)/1.4 ≈ −0.79, so P(x < 6) = P(z < −0.79) = 0.2148
So, there is a 21.48% chance that the single CD player will die in under 6 years.
Example
For the sample of 10 players: z = (6 − 7.1)/(1.4/√10) ≈ −2.48, so P(x̄ < 6) = P(z < −2.48) = 0.0066
So, there is a 0.66% chance that the mean replacement time for a sample of 10 CD players will be less than 6 years.
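Both CD player probabilities can be reproduced without a z table. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The variable names are illustrative, and the only inputs are the mean, standard deviation, sample size, and cutoff given in the example.

```python
from math import sqrt
from statistics import NormalDist

MU = 7.1     # mean replacement time, in years
SIGMA = 1.4  # standard deviation, in years
N = 10       # number of CD players in the sample
CUTOFF = 6   # years

# Distribution of a single replacement time.
single = NormalDist(mu=MU, sigma=SIGMA)

# Sampling distribution of the mean of N replacement times:
# same mean, standard deviation SIGMA / sqrt(N).
sample_mean = NormalDist(mu=MU, sigma=SIGMA / sqrt(N))

z_single = (CUTOFF - MU) / SIGMA
z_mean = (CUTOFF - MU) / (SIGMA / sqrt(N))

p_single = single.cdf(CUTOFF)
p_mean = sample_mean.cdf(CUTOFF)

print(f"single player:  z = {z_single:.2f}, P(x < 6)    = {p_single:.4f}")
print(f"mean of {N}:     z = {z_mean:.2f}, P(xbar < 6) = {p_mean:.4f}")
```

The printed probabilities differ slightly from the table values of 0.2148 and 0.0066, because the code does not round z to two decimal places before looking up the area.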
What conclusions can we draw?
Suppose that the mean of 7.1 and standard deviation of 1.4 were for all brands of CD players. Suppose you bought 10 Cheapo-Brand CD players, and the mean replacement time for the 10 players was under 6 years. What would that suggest?