Dr Richard Bußmann CHAPTER 12 Confidence intervals for means
SAMPLING DISTRIBUTION FOR THE MEAN
STUDENT’S T Student’s t-models are unimodal, symmetric, and bell-shaped, just like the Normal model. But t-models (solid curve) with only a few degrees of freedom have a narrower peak than the Normal model (dashed curve) and have much fatter tails. As the degrees of freedom increase, the t-models look more and more like the Normal model.
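The comparison between the two curves can be checked numerically. The following is a minimal sketch that evaluates the standard t density formula at the center (0) and in the tail (3) for a few degrees of freedom; the function names and the choice of evaluation points are illustrative assumptions, not part of the slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (standard formula)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard Normal model."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the height at the center and far out in the tail.
for df in (2, 10, 30, 100):
    print(f"df={df:>3}: density at 0 = {t_pdf(0, df):.4f}, at 3 = {t_pdf(3, df):.4f}")
print(f"Normal: density at 0 = {normal_pdf(0):.4f}, at 3 = {normal_pdf(3):.4f}")
```

With few degrees of freedom the tail density is several times the Normal value, and the gap shrinks as df grows.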
EXAMPLE Data from a survey of 25 randomly selected customers found a mean age of 31.84 years and a standard deviation of 9.84 years. 1. What is the standard error of the mean? 2. How would the standard error change if the sample size had been 100 instead of 25? (Assume that s = 9.84 years.)
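The two questions come down to the formula SE = s / sqrt(n). A short sketch of the arithmetic, where the function name `standard_error` is only an illustrative choice:

```python
import math

def standard_error(s, n):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return s / math.sqrt(n)

s = 9.84  # sample standard deviation in years, from the example
se_25 = standard_error(s, 25)    # 9.84 / 5
se_100 = standard_error(s, 100)  # 9.84 / 10

print(f"n = 25:  SE = {se_25:.3f} years")   # 1.968
print(f"n = 100: SE = {se_100:.3f} years")  # 0.984
```

Quadrupling the sample size halves the standard error, because n enters through its square root.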
PRACTICAL SAMPLING DISTRIBUTION MODEL FOR MEANS
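For reference, the usual textbook statement of this model can be written as below. It is given here as the standard form, assuming the conditions discussed later hold; the symbols are the sample mean (y-bar), the population mean (mu), the sample standard deviation (s), and the sample size (n).

```latex
t = \frac{\bar{y} - \mu}{\mathrm{SE}(\bar{y})},
\qquad
\mathrm{SE}(\bar{y}) = \frac{s}{\sqrt{n}},
\qquad
t \sim t_{n-1}
```

The standardized sample mean follows a Student’s t-model with n - 1 degrees of freedom.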
FINDING T-VALUES The Student’s t-model is different for each value of degrees of freedom. Typically we limit ourselves to the (80%,) 90%, 95%, and 99% confidence levels. What’s the point of being only 50% confident? You need to be fairly certain that you are right, no matter how broadly “right” is defined. We can use technology to find critical values for any number of degrees of freedom and for any confidence level we need. But guess what? You will still have to deal with tables… Find the (right) table in your book!
FINDING T-VALUES FROM T-TABLES A typical t-table is shown here. The table shows the critical values for varying degrees of freedom, df, and for varying confidence levels. Since the t-models get closer to the Normal model as df increases, the final row has critical values from the Normal model and is labeled “∞”.
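As a cross-check on a printed table, critical values can also be computed. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are hypothetical, and the search range of 0 to 100 is an assumption that covers the usual table levels; a statistics package would normally do this in one call.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) via Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value t* with P(-t* < T < t*) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0  # assumed search range
    for _ in range(60):  # bisection on the cumulative probability
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

levels = (0.90, 0.95, 0.99)
for df in (5, 19, 24):
    row = "  ".join(f"{t_critical(c, df):.3f}" for c in levels)
    print(f"df={df:>2}: {row}")

# The "infinity" row of the table comes from the Normal model.
z_row = "  ".join(f"{NormalDist().inv_cdf(0.5 + c / 2):.3f}" for c in levels)
print(f"df=∞ : {z_row}")
```

Each printed row corresponds to one row of the t-table at the 90%, 95%, and 99% confidence levels.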
FINDING T-VALUES FROM T-TABLES For example, suppose we’ve performed a one-sample t-test with 19 df and obtained a t-value of 1.639, and we want the upper-tail P-value. From the table, we see that 1.639 falls between 1.328 and 1.729. All we can say is that the P-value lies between the P-values of these two critical values, so 0.05 < P < 0.10.
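A table can only bracket the P-value, but software can compute it directly. This is a minimal sketch that integrates the t density numerically using only the standard library; `upper_tail_p` is an illustrative name, and the Simpson's-rule integration is an implementation choice rather than anything prescribed by the slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_value, df, steps=2000):
    """P(T > t_value) for t_value >= 0, via Simpson's rule on [0, t_value]."""
    h = t_value / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the probability lies above 0; subtract the area between 0 and t_value.
    return 0.5 - total * h / 3

p = upper_tail_p(1.639, 19)
print(f"upper-tail P-value for t = 1.639, df = 19: {p:.4f}")
```

The computed value should fall inside the range the table gives, between 0.05 and 0.10.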
EXAMPLE: CONSTRUCTING A CONFIDENCE INTERVAL
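As a worked sketch, the interval for the customer-age survey can be built as mean ± t* × SE. It assumes a sample mean of 31.84 years (the value implied by the lower bound 27.78 quoted in the conditions example) and takes t* = 2.064 from a t-table for 95% confidence and 24 degrees of freedom.

```python
import math

n = 25          # sample size from the survey example
mean = 31.84    # assumed sample mean in years (implied by the reported lower bound)
s = 9.84        # sample standard deviation in years
t_star = 2.064  # 95% critical value for df = n - 1 = 24, read from a t-table

se = s / math.sqrt(n)            # standard error of the mean
margin_of_error = t_star * se    # half-width of the interval
lower = mean - margin_of_error
upper = mean + margin_of_error

print(f"SE = {se:.3f}, ME = {margin_of_error:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # (27.78, 35.90)
```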
ASSUMPTIONS & CONDITIONS Independence Assumption: There is no way to check independence of the data, but we should think about whether the assumption is reasonable. Randomization Condition: The data arise from a random sample or suitably randomized experiment. 10% Condition: The sample size should be no more than 10% of the population. For means, our samples generally are, so this condition will only be a problem if our population is small.
ASSUMPTIONS & CONDITIONS Normal Population Assumption: Student’s t-models won’t work for data that are badly skewed. We assume the data come from a population that follows a Normal model. Truly Normal data are an idealization, so we have a “nearly normal” condition we can check. Nearly Normal Condition: The data come from a distribution that is unimodal and symmetric. This can be checked by making a histogram.
NEARLY NORMAL CONDITION For very small samples (n < 15), the data should follow a Normal model very closely. If there are outliers or strong skewness, t methods shouldn’t be used. For moderate sample sizes (n between 15 and 40), t methods will work well as long as the data are unimodal and reasonably symmetric. For sample sizes larger than 40 or 50, t methods are safe to use unless the data are extremely skewed. If outliers are present, analyses can be performed twice, with the outliers and without. These are guidelines, not rules! There are no magic numbers that make a method work! Sometimes small samples are fine, sometimes even the biggest samples are too small.
NORMAL POPULATION ASSUMPTION In business, the mean is often the value of consequence. Even when we must sample from a very skewed distribution, the Central Limit Theorem tells us that the sampling distribution of our sample mean will be close to Normal. We can use Student’s t methods without much worry as long as the sample size is large enough.
NORMAL POPULATION ASSUMPTION The histogram below displays the monthly compensation of 500 CEOs. We see an extremely skewed distribution.
NORMAL POPULATION ASSUMPTION Taking many samples of 100 CEOs, we obtain the nearly Normal plot below for the sample means.
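The effect can be reproduced with a small simulation. The sketch below does not use the CEO data; it assumes a hypothetical right-skewed population of 500 lognormal values and draws repeated samples of 100 with replacement, then compares the skewness of the population with the skewness of the sample means. The seed, the lognormal parameters, and the number of repetitions are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

rng = random.Random(12)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed population (a stand-in for the CEO data).
population = [rng.lognormvariate(0, 1) for _ in range(500)]

# Many samples of size 100, drawn with replacement; keep each sample's mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=100)) for _ in range(2000)
]

print(f"skewness of population:   {skewness(population):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

The sample means are far less skewed than the individual values, which is the Central Limit Theorem at work.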
EXAMPLE: ASSUMPTIONS AND CONDITIONS Data from a survey of 25 randomly selected customers found a mean age of 31.84 years and a standard deviation of 9.84 years. A 95% confidence interval for the mean is (27.78, 35.90). Check conditions for this interval. Independence: Data were gathered from a random sample and should be independent. 10% Condition: These customers are fewer than 10% of the customer population. Nearly Normal: The histogram is unimodal and approximately symmetric.
INTERPRETING CONFIDENCE INTERVALS Confidence intervals for means offer new, tempting, wrong interpretations. Here are some ways to keep from going astray: Don’t say, “95% of all the policies sold by this sales rep have profits between $ and $.” The confidence interval is about the mean, not about the measurements of individual policies. Don’t say, “We are 95% confident that a randomly selected policy will have a net profit between $ and $.” This false interpretation is also about individual policies rather than about the mean of the policies.
INTERPRETING CONFIDENCE INTERVALS Don’t say, “The mean profit is $ 95% of the time.” That’s about means, but still wrong. It implies that the true mean varies, when in fact it is the confidence interval that would have been different had we gotten a different sample. Don’t say, “95% of all samples will have mean profits between $ and $.” That statement suggests that this interval somehow sets a standard for every other interval. In fact, this interval is no more (or less) likely to be correct than any other.
INTERPRETING CONFIDENCE INTERVALS If the confidence interval is for the mean, then do not interpret the results in terms of individuals. Don’t forget that the true mean does not vary, but the confidence interval will vary based on the sample. Don’t suggest that a particular confidence interval somehow sets the standard for every other interval.
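A simulation makes the correct interpretation concrete: the true mean stays fixed while the interval changes from sample to sample, and about 95% of the intervals capture it. The sketch below assumes a hypothetical Normal population (mean 30, standard deviation 10), samples of size 25, and the table value t* = 2.064 for 24 degrees of freedom; all of these numbers are illustrative.

```python
import math
import random
import statistics

TRUE_MEAN = 30.0  # hypothetical population mean; it never changes
TRUE_SD = 10.0    # hypothetical population standard deviation
N = 25            # sample size
T_STAR = 2.064    # 95% critical value for df = 24, from a t-table

def t_interval(sample):
    """95% one-sample t-interval for the mean of the given sample."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - T_STAR * se, m + T_STAR * se

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 2000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = t_interval(sample)  # a different interval for every sample
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should come out close to 0.95, which is what “95% confident” describes: the long-run success rate of the method, not a property of any single interval.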
DEGREES OF FREEDOM
POTENTIAL PITFALLS First, you must decide when to use Student’s t methods. Don’t confuse proportions and means. Use Normal models with proportions. Use Student’s t methods with means. Be careful of interpretation when confidence intervals overlap. Don’t assume that the means of overlapping confidence intervals are equal.
POTENTIAL PITFALLS Student’s t methods work only when the Normal Population Assumption is true. Beware of multimodality. If you see this, try to separate the data into groups. Beware of skewed data. If the data are skewed, try re-expressing them. Investigate outliers. If they are clearly in error, remove them. If they can’t be removed, you might run the analysis with and without the outlier.
POTENTIAL PITFALLS There are other risks when doing inference about means. Watch out for bias. Measurements can be biased. Make sure data are independent. Consider whether there are likely violations of independence in the data collection methods.