Inference: Confidence Intervals


1 Inference: Confidence Intervals
© 2009 W.H. Freeman and Company

2 Learning Objectives By the end of this lecture, you should be able to:
Describe why we express our inference as an interval as opposed to a single value. Describe the difference between a confidence interval and a confidence level. Describe what we mean when we say that there is a “tradeoff” between the width of the confidence interval and the chosen confidence level. Describe the formula for calculating a confidence interval. Determine a value for z* from a z* table. Calculate a confidence interval and describe it using the form “I am _____ % certain that the true value falls between ________ and _________.” Describe the most realistic technique that might be used for reducing margin of error. Given a desired margin of error, calculate the size of the sample needed to obtain that margin.

3 Concept Overview Perhaps the two most fundamental types of statistical inference are the calculation of confidence intervals, and significance testing. In this lecture, we will focus on confidence intervals. Confidence Intervals: This is where we estimate the value of some population variable by using data obtained from a sample. Because the process of inference begins with a sample, and because we recognize that all samples vary, we estimate the true (population) value by providing not one single value, but rather, an interval. In addition to providing an interval, we also state how confident we are that our interval does indeed contain the true (population) value. Tests of Significance: Assessing the evidence for a claim. Estimating the possibility that the value we got from our sample was unrealistically high or low due to a “fluky” sample. This will be discussed in an upcoming lecture.

4 Mean and SD of the Population
Important Note: Recall that for our current discussion of inference, this section assumes we know the mean and SD of the population even though in the real world, we typically do not. As has been mentioned, a later discussion in a statistics course covers how to do statistical inference and confidence intervals even when we don’t know the SD of the population. It is not an overly difficult subject, but we will not have the time to cover it in this 10-week course. Also recall that if we have been given a population SD, but need to know the SD of the sample mean, we must divide the population SD by the square root of ‘n’.
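The last point can be sketched in a few lines of Python. This is only an illustration: the function name `sd_of_sample_mean` and the numbers are made up, not taken from the lecture.

```python
import math

def sd_of_sample_mean(population_sd, n):
    """SD of the sample mean: population SD divided by the square root of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return population_sd / math.sqrt(n)

# Hypothetical numbers: population SD of 10, sample of 25 observations.
print(sd_of_sample_mean(10, 25))  # 10 / 5 = 2.0
```

Note how quadrupling the sample size only halves this SD, because n sits under a square root.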

5 Overview of inference Statistical inference is all about using the information from your sample to draw conclusions about your population. Sample information → Population information. E.g. Average undergrad loan amount at DePaul: Given a survey of 200 randomly sampled DePaul students, what can we infer is the average loan amount for all DePaul students? We begin the inference process by generating a confidence interval. We then report our conclusion in a format similar to: “I am 95% confident that the true average DePaul loan amount lies in the interval between $18,265 and $83,228.” Each of the underlined terms has its own particular importance which will be discussed. Let’s start with one of these terms now: The word “true”: Whenever you see the word ‘true’ in a discussion of confidence intervals, it refers to the population value, i.e. the value we are hoping to discover. (Essentially, you can replace the word ‘true’ with the word ‘population’).

6 Overview Example Suppose in your survey of 200 DePaul students, you come up with a mean loan amount of $43,842. If you had to guess, which of the two options presented here is likely to be the more accurate way of reflecting the true (i.e. population) average? (1) The population mean is $43,842. (2) The population mean is somewhere between $41,000 and $45,000. I would suggest the second one. The reason is that the $43,842 amount we came up with only comes from a single sample of students. Obviously a different sample would give a different value. The technique is to take the value from our sample and, with that value, calculate a confidence interval. This interval is what we report. The way to report our conclusion, then, would be: “Based on our sample, we believe with 95% certainty* that the true (i.e. population) value lies somewhere between $41,000 and $45,000.” * We’ll discuss the “95% certainty” part shortly…

7 Just HOW certain are we? Stating your Confidence Level
EXAMPLE: We look at a random sample of eggs and come up with a “confidence interval” that says the average egg size ranges between and grams. Confidence Interval = to 66.88 The point is that we are pretty sure that the true population value lies somewhere inside this range. But HOW sure? 99%? 90%? 80%? We need to quantify our degree of certainty that the confidence interval contains the true (population) mean. We quantify our degree of certainty by the ‘Confidence Level’. We FIRST decide on our desired confidence level, and THEN we calculate the confidence interval. We get to choose ANY confidence level we want. The penalty we pay for choosing a higher confidence level, is a wider confidence interval.

8 Confidence Interval vs. Confidence Level
In order to calculate a confidence interval, we must FIRST decide on our confidence level (‘C’). The confidence interval is the range of values that (we hope) contains the true value. The width of the confidence interval is determined by the confidence level we choose. If we choose a higher C, we end up with a wider interval and vice versa. Key Point: The confidence level states how sure we are that the confidence interval we calculated contains the true population value.

9 Choosing a higher confidence level means ending up with a wider confidence interval
Obviously we would prefer to state our conclusions with a higher degree of certainty. Which of the following two statements would you prefer? (1) I am 80% sure that the average height of DePaul women is between 54” and 57”. (2) I am 99% sure that the average height of DePaul women is between 45” and 75”. However, you may have observed that there IS a price to pay: when you choose a higher confidence level, you end up with a wider interval. Not surprisingly, we MUCH prefer narrow confidence intervals over wide ones! In the example above, the 99% confidence level seems much more desirable until you recognize that it comes with a much wider interval. If a confidence interval is too wide, the information may well be useless! E.g.: I am 99.99% certain that the true (population) average income of DePaul undergraduates lies between $0 per year and $473,000 per year. So there is a tradeoff between a higher C and a wider interval.

10 Why do we keep choosing 95% as our value for C?
At some point, you may notice that people frequently choose 95% as their confidence level (‘C’). The reason is that most scientific journals have accepted 95% as a somewhat optimal “tradeoff” between confidence level and size of the interval.

11 Downside to Lower C Make sure you are absolutely clear on what ‘C’ represents: It represents the certainty that your interval contains the true population value. Example: What do we mean if we report a 90% confidence level? If C = 90%, then we are saying: “I am 90% sure that the interval contains the true population value.” However, we are ALSO saying that “There is a 10% chance that the interval I’ve provided completely misses the true population value!”
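One way to get a feel for what C means is to simulate many samples and count how often the resulting interval contains the true mean. The sketch below does this under assumptions that are not from the lecture: a Normal population with a made-up mean of 50 and SD of 10, samples of size 25, and the interval formula mean ± z* × σ/√n that is introduced on later slides. The function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist

def coverage(conf_level, true_mean=50.0, sigma=10.0, n=25, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    m = z_star * sigma / n ** 0.5                  # margin of error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - m <= true_mean <= x_bar + m:
            hits += 1
    return hits / trials

# With C = 90%, roughly 9 in 10 intervals should catch the true mean,
# and roughly 1 in 10 should miss it completely.
print(coverage(0.90))
```

The exact fraction printed depends on the random seed, so it will be close to, but usually not exactly, 0.90.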

12 Margin of Error A confidence interval is typically expressed as: mean ± m, where m is called the margin of error. Example: 120 ± 6 → in this example, m = 6. The confidence interval may also be expressed by its endpoints, mean − m to mean + m (here, 114 to 126). The size of ‘m’ is determined by our desired confidence level. Higher confidence C implies a larger margin of error (m), which, in turn, means a wider confidence interval. A wider confidence interval means less precision in our inference conclusion. A lower confidence C results in a smaller margin of error, which, in turn, means a narrower confidence interval. A narrower confidence interval means more precision in our inference conclusion.

13 Tradeoff between C and m
The calculated margin of error (m) depends directly on the value we choose for our confidence level (C). A higher C means a higher m and vice versa. If you want a higher confidence level (e.g. 99%), then you will have to accept a wider margin of error. E.g.: At C = 95% you may end up with m = 4.2. If you later decided to increase C to 99%, you may end up with m = 6.3. We will learn how to determine m shortly. Similarly, if you are willing to accept a lower confidence level (e.g. 90%), then you will have the benefit of a smaller margin of error. E.g.: At C = 95% you may end up with m = 3.9. If you later decided to decrease C to 90%, you may end up with m = 2.3. Restated: It’s great to have a higher confidence level, but the cost is that we end up with a higher margin of error (i.e. a wider confidence interval).
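The tradeoff can be seen numerically by holding the data fixed and changing only C. The following is a rough sketch, not the lecture's own calculation: it uses the margin-of-error formula m = z* × σ/√n from the next slides, and the population SD of 15 and sample size of 50 are invented for illustration.

```python
from statistics import NormalDist

def margin_of_error(conf_level, sigma, n):
    """Margin of error m = z* * sigma / sqrt(n) for a given confidence level."""
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    return z_star * sigma / n ** 0.5

# Hypothetical population SD and sample size, held fixed while C varies.
sigma, n = 15.0, 50
for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}: m = {margin_of_error(c, sigma, n):.2f}")
```

Each step up in C prints a larger m, which is exactly the price described above.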

14 Calculating the margin of error
Recall that the formula for a confidence interval is mean ± m (margin of error). Here is the formula for calculating the margin of error: m = z* × σ/√n, where σ is the population SD and n is the sample size. Where does this “z*” come from? Answer: The value of z* is dictated by our chosen confidence level. Examples: If our chosen C is 95%, then z* = 1.960. If our chosen C is 99%, then z* = 2.576. If our chosen C is 80%, then z* = 1.282. [Figure: Normal curve with central area C between −z* and z*, and margin of error m on each side.]

15 How do we find specific z* values?
Confidence interval = mean ± z* × σ/√n. Specific z* values can be calculated by using a z-table (Normal table). However, as a shortcut, many tables include a “z* table” as well. These tables provide z* values for a series of “popular” confidence levels. An example of such a z* table is shown here. For a particular confidence level, C, the appropriate z* value is just above it. Example: For a 98% confidence level, z* = 2.326. Statistical software would, of course, also provide this information. Because the most “popular” value for C in scientific literature is 95%, it is almost worth memorizing the corresponding z*, which is 1.96.
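As an example of the software route, the sketch below uses `NormalDist` from Python's standard `statistics` module to rebuild a small z* table. The helper name `z_star` is made up for this illustration; the key step is that the middle C of the curve leaves (1 − C)/2 in each tail, so z* is the point with area (1 + C)/2 below it.

```python
from statistics import NormalDist

def z_star(conf_level):
    """Critical value z*: the middle `conf_level` of the standard Normal
    curve lies between -z* and +z*."""
    if not 0 < conf_level < 1:
        raise ValueError("conf_level must be strictly between 0 and 1")
    return NormalDist().inv_cdf((1 + conf_level) / 2)

# A few "popular" confidence levels, as in a printed z* table.
for c in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"C = {c:.0%}  z* = {z_star(c):.3f}")
```

The 95% and 98% rows should agree with the values quoted above (1.96 and 2.326).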

16 Calculating the z* Let’s review some Normal distribution examples:
The area between z = −1.96 and z = +1.96 (just about z = ±2) contains the middle 95% of values under the density curve. The area between z = −1 and z = +1 contains (roughly) the middle 68% of values under the density curve. Pop Quiz: The area between which two z-scores contains the middle 90% of values? Answer: If you want to find the middle 90% of values, you want to find the z-score that has 5% above it, which is z = 1.645, and the z-score that has 5% below it, which is z = −1.645. Pop Quiz: The area between which two z-scores contains the middle 99% of values? Answer: If you want to find the middle 99% of values, you want to find the z-score that has 0.5% above it, which is roughly z = 2.58, and the z-score that has 0.5% below it, which is z = −2.58. Good News: While this is definitely something you should be able to do, I do not plan on asking you to do this calculation on quizzes or exams.
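These areas can be checked in the other direction, going from a z-score to the area it encloses. The following is a small sketch using the standard library's `NormalDist`; the function name `central_area` is an assumption made for this example.

```python
from statistics import NormalDist

def central_area(z):
    """Area under the standard Normal curve between -z and +z."""
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)

# z-scores for the middle 68%, 90%, 95% and 99% of values.
for z in (1.0, 1.645, 1.96, 2.576):
    print(f"between -{z} and +{z}: {central_area(z):.3f}")
```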

17 Link between confidence level and margin of error
The confidence level C determines the value of z*. Higher C → higher z*. Higher z* → higher margin of error. The tradeoff: Higher confidence C implies a larger margin of error m (thus less precision in our estimates of the true mean). A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates of the true mean). [Figure: Normal curve with central area C between −z* and z*, and margin of error m on each side.]

18 Example Density of bacteria in solution:
Measurement equipment has standard deviation σ = 1 × 10^6 bacteria/ml fluid. We take 3 measurements: 24, 29, and 31 × 10^6 bacteria/ml fluid. Mean: x̄ = 28 × 10^6 bacteria/ml. Find the 96% and 70% CI. Confidence Interval for C = 96%: z* = 2.054. Interval = 28 ± 2.054 × (1/√3) = (28 ± 1.19) × 10^6 bacteria/ml. Confidence Interval for C = 70%: z* = 1.036. Interval = 28 ± 1.036 × (1/√3) = (28 ± 0.60) × 10^6 bacteria/ml.
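The arithmetic in this example can be reproduced with a short script. This is a sketch only: the helper name `z_interval` is made up, all quantities are in units of 10^6 bacteria/ml, and the z* values are the ones quoted on the slide rather than computed.

```python
import math

def z_interval(sample_mean, sigma, n, z_star):
    """Return (low, high, m) for the interval sample_mean ± z* * sigma / sqrt(n)."""
    m = z_star * sigma / math.sqrt(n)
    return sample_mean - m, sample_mean + m, m

measurements = [24, 29, 31]   # units of 10^6 bacteria/ml
sigma = 1.0                   # known SD of the equipment, same units
x_bar = sum(measurements) / len(measurements)

for label, z in (("96%", 2.054), ("70%", 1.036)):
    low, high, m = z_interval(x_bar, sigma, len(measurements), z)
    print(f"{label} CI: {x_bar:.0f} ± {m:.2f}  ->  {low:.2f} to {high:.2f}")
```

The margins come out to about 1.19 and 0.60, matching the hand calculation.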

19 The key is in the INTERPRETATION
For the 96% confidence interval for the true density, z* = 2.054, and we write 28 ± 2.054 × (1/√3) = (28 ± 1.19) × 10^6 bacteria/ml. For the 70% confidence interval for the true density, z* = 1.036, and we write 28 ± 1.036 × (1/√3) = (28 ± 0.60) × 10^6 bacteria/ml. For the first example, you can say: “I am 96% sure that the true density of bacteria in the population is within (28 ± 1.19) × 10^6 bacteria/ml.” In the second example, you can say: “I am 70% sure that the true density of bacteria in the population is within (28 ± 0.60) × 10^6 bacteria/ml.”

20 Example: Attempting to reduce our margin of error
Sometimes you may need a specific (typically some maximum) margin of error. This sometimes comes up in situations such as drug trials, manufacturing specs, etc. Note the formula for m: m = z* × σ/√n. The variables used to calculate m are z*, SD, and n. The population variability (σ) is difficult (often impossible) to change. z* is dictated by the desired confidence level. However, we can try to adjust our sample size (n). So we rearrange the formula to solve for n: n = (z* × σ / m)^2. This way, if possible, we can try to increase our sample size more and more until we reach our desired margin of error. Remember, though, that sample size is not always stretchable at will. There are always costs and constraints associated with large samples, so it is not always possible to increase n.

21 What sample size for a given margin of error?
Density of bacteria in solution: Measurement equipment has standard deviation σ = 1 × 10^6 bacteria/ml fluid. How many measurements (i.e. what sample size) should you make to obtain a margin of error of at most 0.5 × 10^6 bacteria/ml with a confidence level of 90%? For a 90% confidence interval, z* = 1.645. Solving m = z* × σ/√n for n gives n = (z* × σ / m)^2 = (1.645 × 1 / 0.5)^2 = 3.29^2 ≈ 10.8. Using only 10 measurements will not be enough to ensure that m is no more than 0.5 × 10^6. Therefore, we need at least 11 measurements. Do examples 6.6 and 6.7.
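The sample-size question can be sketched in code by solving m = z* × σ/√n for n and rounding up, since n must be a whole number of measurements. The function name `required_sample_size` is hypothetical, and z* = 1.645 is the standard table value for a 90% confidence level; σ and the target margin are in units of 10^6 bacteria/ml.

```python
import math

def required_sample_size(z_star, sigma, max_margin):
    """Smallest whole n with z* * sigma / sqrt(n) <= max_margin."""
    return math.ceil((z_star * sigma / max_margin) ** 2)

n = required_sample_size(1.645, 1.0, 0.5)
print(n)  # 11

# Check both sides of the cutoff: 10 measurements is not enough, 11 is.
for k in (10, 11):
    print(k, round(1.645 * 1.0 / math.sqrt(k), 3))
```

Rounding up rather than to the nearest integer matters here: 10.8 rounded down to 10 would leave the margin slightly above 0.5.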

