Estimating with Confidence 10-1

Estimating an unknown parameter
• Ex: The admissions director at a university proposes using the IQ scores of current students as a marketing tool. The university gives him enough money to administer IQ tests to 50 students, so he gives the IQ test to an SRS of 50 of the university's 5000 freshmen. The mean IQ score for the sample is x̄ = 112. What can the director say about the mean score μ of the population of all 5000 freshmen?
• Is the mean IQ score μ of all the freshmen exactly 112? Probably not.

But…
• The law of large numbers tells us that the sample mean x̄ from a large SRS will be close to the unknown population mean μ.
• Because x̄ = 112, we guess that μ is “somewhere around 112”.
• How close to 112 is μ likely to be?
• To answer this question, we ask another: How would the sample mean vary if we took many samples of 50 freshmen from this same population?

Recall…
• From the last chapter, the means of all samples of size 50 are approximately Normally distributed around the true population mean μ, with a standard deviation of σ/√50.
• By the 68-95-99.7 rule, 95% of all samples of size 50 will have a mean that falls within 2 standard deviations of μ.

Suppose we know σ
• Suppose we know that σ = 15 (this is unrealistic, but just go with it). Then the standard deviation of x̄ is σ/√n = 15/√50 ≈ 2.1.
• So, in 95% of all samples of size 50, the mean IQ score x̄ will fall within 4.2 of the true μ (that is, within 2 standard deviations above or below).

Here are all our samples…

How confident are we?
• Statistical inference uses this fact about what would happen in MANY samples to express our confidence in capturing the true μ with our own ONE sample.
• Conclusion: Our sample of 50 freshmen gave x̄ = 112. The resulting interval is 112 ± 4.2, or (107.8, 116.2).
• We say that we are 95% confident that the unknown mean IQ μ for all Big City University freshmen is between 107.8 and 116.2.
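The arithmetic behind the IQ example can be checked in a few lines. This is a minimal sketch that uses the slide's own numbers (σ = 15 assumed known, n = 50, x̄ = 112) and the rough "2 standard deviations" rule rather than the exact critical value.

```python
import math

# Values from the IQ example; sigma is assumed known (unrealistic, as the slide notes).
sigma = 15      # population standard deviation
n = 50          # sample size
x_bar = 112     # observed sample mean

# Standard deviation of the sampling distribution of x-bar
sd_xbar = sigma / math.sqrt(n)

# About 95% of sample means fall within 2 standard deviations of mu (68-95-99.7 rule)
margin = 2 * sd_xbar

interval = (x_bar - margin, x_bar + margin)
print(f"sd of x-bar = {sd_xbar:.2f}, margin = {margin:.2f}")
print(f"approx. 95% CI: ({interval[0]:.1f}, {interval[1]:.1f})")
```

The printed interval rounds to the slide's (107.8, 116.2).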

Confidence interval for a population mean μ when σ is known
• That example was our first scenario for calculating a CI. The calculation depends on 3 important conditions:
1. SRS: the data come from a simple random sample of the population.
2. Normality: the construction of the interval depends on the sampling distribution of x̄ being approximately Normal (which, by the CLT, it will be as long as the sample size is sufficiently large; n ≥ 30 is the usual cutoff).
3. Independence: to keep calculations reasonably accurate when we sample without replacement from a finite population, we sample no more than 10% of the population (our rule of thumb).
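The three conditions can be written as simple checks. This is a sketch with a hypothetical helper name, `check_z_interval_conditions`; the `population_normal` flag stands in for the judgment you make from a boxplot or Normal probability plot when n is below the n ≥ 30 cutoff.

```python
def check_z_interval_conditions(n, population_size, is_srs, population_normal=False):
    """Rule-of-thumb checks for a one-sample z interval (hypothetical helper).

    Cutoffs follow the slide: n >= 30 for the CLT, and a sample of at most
    10% of the population for independence when sampling without replacement.
    """
    return {
        "SRS": is_srs,
        # Normal population (judged from plots), or a large enough sample for the CLT
        "Normality": population_normal or n >= 30,
        # Sample no more than 10% of the population
        "Independence": n <= 0.10 * population_size,
    }

# IQ example: an SRS of 50 of the 5000 freshmen
checks = check_z_interval_conditions(n=50, population_size=5000, is_srs=True)
print(checks)
```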

Different confidence levels and critical values
• We call our confidence level C. While a 95% confidence level is most typical, sometimes you are asked for a 99% or 90% interval.
• For the 95% CI in the example, we went 2 standard deviations above and below x̄ (z* = 1.96, to be precise).
• For 90%, the critical value that cuts off the middle 90% is z* = ±1.645.
• For 99%, it's z* = ±2.576.
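Where the critical values come from: the middle area C leaves (1 − C)/2 in each tail, so z* is the (1 + C)/2 quantile of the standard Normal distribution. A sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* with area `confidence` between -z* and z*."""
    # Each tail holds (1 - C)/2, so z* is the (1 + C)/2 quantile.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}: z* = {z_star(c):.3f}")
```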

Formula
• $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
• On the calculator: if you have raw data, enter your sample data into L1. Press STAT, choose TESTS, and choose ZInterval. Set the input method to Data and enter σ. If you have x̄ and σ instead, select Stats as your input method and type those in along with n (the sample size). Enter the C-level you want (such as .95 or .99), then choose Calculate and press ENTER. You get the interval (lower and upper bounds) and the sample mean.
• *This is our confidence interval for the estimate of the unknown μ: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$, and our margin of error is $z^* \frac{\sigma}{\sqrt{n}}$.
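The formula translates directly into a function. This is a sketch with a hypothetical name, `z_interval`, that loosely mirrors the calculator's two input methods (raw `data`, or a summary `x_bar` and `n`). It is not the calculator's actual implementation.

```python
import math
from statistics import NormalDist, mean

def z_interval(confidence, sigma, x_bar=None, n=None, data=None):
    """One-sample z interval for mu with sigma known: x-bar +/- z* sigma / sqrt(n)."""
    if data is not None:                               # "Data" input: compute x-bar and n
        x_bar, n = mean(data), len(data)
    z = NormalDist().inv_cdf((1 + confidence) / 2)     # critical value z*
    margin = z * sigma / math.sqrt(n)                  # margin of error
    return x_bar - margin, x_bar + margin, margin

# "Stats" input with the IQ example's numbers
lo, hi, moe = z_interval(0.95, sigma=15, x_bar=112, n=50)
print(f"95% CI: ({lo:.2f}, {hi:.2f}), margin of error = {moe:.2f}")
```

Using z* = 1.96 instead of the rough value 2 gives a slightly narrower interval than 112 ± 4.2.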

Margin of error
• There is a tradeoff between margin of error and level of confidence. The margin of error gets smaller as z* gets smaller (but this also lowers our confidence).
• The margin of error also gets smaller as σ gets smaller (this is hard to achieve in reality, but important conceptually). Think of σ and variability as “noise”: it's easier to pin down the true μ when σ is small.
• The margin of error gets smaller as n gets larger. Because n appears under a square root, we must take four times as many observations to cut the margin of error in half.
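Each of the three levers can be checked numerically. This sketch starts from the IQ example's values (95%, σ = 15, n = 50) and changes one input at a time; halving σ and quadrupling n each cut the margin of error exactly in half.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence, sigma, n):
    """z* sigma / sqrt(n) for a one-sample z interval."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * sigma / math.sqrt(n)

base = margin_of_error(0.95, sigma=15, n=50)
lower_conf = margin_of_error(0.90, sigma=15, n=50)    # smaller z*
half_sigma = margin_of_error(0.95, sigma=7.5, n=50)   # less noise
four_x_n = margin_of_error(0.95, sigma=15, n=200)     # four times the data

print(f"baseline:       {base:.2f}")
print(f"90% confidence: {lower_conf:.2f}")
print(f"sigma halved:   {half_sigma:.2f}")
print(f"n quadrupled:   {four_x_n:.2f}")
```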

Example and steps for solving
• Suppose a manufacturer of video terminals wants to test screen tension. We know that when the process is operating properly, σ = 43. Here are the tension readings from an SRS of 20 screens:

Steps
• 1. Parameter: identify the population of interest and the parameter you want to draw conclusions about. The population here is “all video terminals”. We want to estimate μ, the mean tension for all these screens.
• 2. Conditions: choose the appropriate inference procedure and verify the conditions for using it. Since we know σ, we use a one-sample z interval. Now check the conditions:
  1. SRS: yes.
  2. Normality: is the sampling distribution of x̄ approximately Normal? Yes. The sample size (n = 20) is too small to rely on the central limit theorem (n ≥ 30 is our cutoff), so we look at a boxplot of the sample tension readings (on the calculator). It shows no outliers or strong skewness, and a Normal probability plot shows that the sample data are approximately Normally distributed. These data give us no reason to doubt the Normality of the population they came from.
  3. Independence: since we are sampling without replacement, we must assume that at least (10)(20) = 200 video terminals were produced that day.

• Step 3: Calculations. If the conditions are met, carry out the inference procedure; here, a 90% CI. Enter the data in the calculator to get x̄ = 306.3. Then $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 306.3 \pm 1.645 \cdot \frac{43}{\sqrt{20}} \approx 306.3 \pm 15.8$, giving the interval (290.5, 322.1).
• Step 4: Interpretation. We are 90% confident that the true mean tension μ lies between 290.5 and 322.1. Always state this part IN CONTEXT!
• If you want a different confidence level (say 99%), change z* (to 2.576); a higher confidence level widens the interval.
• See p. 631 for a summary!
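A quick check of steps 3 and 4, and of what happens at 99%. The raw tension readings are not reproduced in this transcript, so this sketch recovers x̄ as the midpoint of the reported 90% interval (290.5, 322.1); that value is an assumption derived from the slide, not from the data.

```python
import math
from statistics import NormalDist

sigma, n = 43, 20
# Assumed: x-bar taken as the midpoint of the slide's 90% interval (290.5, 322.1)
x_bar = (290.5 + 322.1) / 2

def tension_interval(confidence):
    """x-bar +/- z* sigma / sqrt(n) for the screen-tension example."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    m = z * sigma / math.sqrt(n)
    return x_bar - m, x_bar + m

for c in (0.90, 0.99):
    lo, hi = tension_interval(c)
    print(f"{c:.0%} CI: ({lo:.1f}, {hi:.1f}), width {hi - lo:.1f}")
```

The 99% interval comes out noticeably wider than the 90% one, as Step 4 says.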

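Sample size for a desired margin of error
• To get a margin of error no larger than m, choose n so that $z^* \frac{\sigma}{\sqrt{n}} \le m$, that is, $n \ge \left(\frac{z^* \sigma}{m}\right)^2$, and round up.
• Note: it's the size of the sample that determines the margin of error; the size of the population does not influence the sample size we need (this is true as long as the population is much larger than the sample).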
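Solving $z^* \sigma / \sqrt{n} \le m$ for n gives $n \ge (z^* \sigma / m)^2$, always rounded up. This sketch uses a hypothetical target: estimating mean screen tension (σ = 43) to within 10 units at 95% confidence. Note that the population size never enters the calculation.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(confidence, sigma, margin):
    """Smallest n with z* sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)   # always round UP

# Hypothetical target: within 10 units at 95% confidence, sigma = 43
print(sample_size_for_mean(0.95, sigma=43, margin=10))
```

Halving the target margin (10 to 5) roughly quadruples the required n.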

What if we don't know σ?
• We previously made the unrealistic assumption that we knew the value of σ. In practice, σ is usually unknown, so the one-sample z interval is rarely used in real life.
• So we use our sample standard deviation s_x as an estimate of σ. But we must pay a penalty for this!
• We divide s_x by √n to estimate the standard deviation of x̄. Because s_x itself varies from sample to sample, this estimate varies too. We call this estimated standard deviation, s_x/√n, the standard error.
• Because of this, we can't use the standard Normal (z) distribution for our critical values; instead we use the t distribution.

Critical t values
• As n gets bigger, the t distribution gets closer and closer to the standard Normal (z) distribution.
• The t distribution is determined by its degrees of freedom, df = n − 1 (not n).
• As the sample size gets bigger, the difference between n − 1 and n matters less and less.
• Table C gives critical values t* based on the degrees of freedom (n − 1); so does the calculator (the calculator is preferable).
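Python's standard library has no t quantile function, so this sketch computes t* directly: it integrates the t density with Simpson's rule and finds the quantile by bisection. (If SciPy is available, `scipy.stats.t.ppf((1 + C) / 2, df)` gives the same values.) The loop shows t* shrinking toward z* = 1.960 as df grows.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of gamma() for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 for the left half plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(confidence, df):
    """Critical value t* leaving area `confidence` between -t* and t* (bisection)."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains t*
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (4, 19, 49, 999):
    print(f"df = {df:>3}: t* = {t_star(0.95, df):.3f}   (z* = {z:.3f})")
```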

Formula
• When we don't know the population standard deviation, only two things change: the critical value is now t* (from Table C or the calculator; calculator recommended), and we use the standard error $\frac{s_x}{\sqrt{n}}$ in place of $\frac{\sigma}{\sqrt{n}}$. The interval is $\bar{x} \pm t^* \frac{s_x}{\sqrt{n}}$.
• On the calculator: same as ZInterval, just choose TInterval.
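The t interval in code: the only changes from the z version are s_x in place of σ and t* in place of z*. This sketch takes t* as an argument (look it up for df = n − 1) and uses a small hypothetical sample, not data from the slides.

```python
import math
from statistics import mean, stdev

def t_interval(data, t_crit):
    """One-sample t interval: x-bar +/- t* s / sqrt(n); t_crit must match df = n - 1."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / math.sqrt(n)   # standard error; stdev uses the n - 1 divisor
    return x_bar - t_crit * se, x_bar + t_crit * se

# Hypothetical sample of n = 5 values; t* = 2.776 for df = 4 at 95% confidence
sample = [12.1, 11.4, 12.8, 11.9, 12.3]
lo, hi = t_interval(sample, t_crit=2.776)
print(f"95% t interval: ({lo:.3f}, {hi:.3f})")
```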

Paired t procedures
• Comparative studies are more convincing than single-sample investigations. A common design is the matched pairs design.
• We use paired t procedures to compare two treatments on pairs of similar subjects, or before-and-after observations on the same subject.

Important distinction
• There are 2 types of comparative studies we learned about earlier: the matched pairs design (which includes before-and-after measurements on each individual in our sample, and comparisons within pairs of similar individuals who were split up and assigned to the 2 treatments), and comparative studies of 2 INDEPENDENT groups.
• When calculating the t interval for a matched pairs design, we are interested in the DIFFERENCE between the 2 conditions (whether this is a before/after on one individual, or 2 similar individuals being compared). You will always have an equal number in both groups if you are doing matched pairs. For this, define L3 as L1 − L2 and do a one-sample t interval on L3.
• In an independent samples design, the 2 samples are INDEPENDENT groups (and may even have different sizes). They are not matched up in any way; this calls for a 2-sample t interval based on L1 and L2.
• *For this chapter, 99% of examples will be of the first kind, where you take the differences and do a one-sample interval on L3. In later chapters we deal with the second kind more, but it still helps to recognize the difference now.

Example
• Caffeine dependence and depression. The population is all people who are dependent on caffeine. We want to estimate the mean difference μ_diff = placebo − caffeine in depression scores.
• 11 people were tested, and their scores on a depression test were measured under each condition (placebo vs. caffeine) (p. 652).
• Calculator: with the placebo scores in L1 and the caffeine scores in L2, define L3 as L1 − L2 and do a one-sample t interval on L3.
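The matched-pairs recipe (L3 = L1 − L2, then a one-sample t interval on L3) as a sketch. The caffeine data are not reproduced in this transcript, so the scores below are hypothetical, and t* is passed in for df = n − 1.

```python
import math
from statistics import mean, stdev

def paired_t_interval(list1, list2, t_crit):
    """Matched pairs: build L3 = L1 - L2, then a one-sample t interval on L3."""
    if len(list1) != len(list2):
        raise ValueError("matched pairs need the same number of observations in each list")
    diffs = [a - b for a, b in zip(list1, list2)]   # this is "L3"
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return d_bar - t_crit * se, d_bar + t_crit * se

# Hypothetical depression scores for 4 subjects under each condition (not the slide's data)
placebo = [10, 14, 9, 12]
caffeine = [7, 10, 8, 9]
# df = 4 - 1 = 3, so t* = 3.182 for 95% confidence (Table C)
lo, hi = paired_t_interval(placebo, caffeine, t_crit=3.182)
print(f"95% CI for mean difference: ({lo:.3f}, {hi:.3f})")
```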

Depression data (columns: Subject, Caffeine, Placebo)

Robustness
• If outliers are present in the sample data, then the population may not be Normal. The t procedures are NOT robust against outliers, because x̄ and s are not resistant to outliers.
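The claim that x̄ and s are not resistant can be seen by adding a single outlier to a small hypothetical data set. The mean and standard deviation jump, while the median (a resistant measure, shown for contrast) does not move.

```python
from statistics import mean, median, stdev

# Hypothetical data, then the same data with one outlier appended
clean = [10, 12, 11, 13, 12]
with_outlier = clean + [60]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean = {mean(data):.2f}, s = {stdev(data):.2f}, median = {median(data)}")
```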

CIs for proportions
• As always, inference is based on the sampling distribution of a statistic.
• Center: the mean of the sampling distribution of p̂ is p. We say that the sample proportion p̂ is an unbiased estimator of the population proportion p.
• Spread: the standard deviation of p̂ is $\sqrt{\frac{p(1-p)}{n}}$, provided that the population is at least 10 times as large as the sample.
• Shape: if the sample size is large enough that both np and n(1 − p) are at least 10, the distribution of p̂ is approximately Normal.

• In reality, we don't know the value of p (if we did, we wouldn't need to construct a CI for it!).
• So we cannot check whether np and n(1 − p) are at least 10.
• In large samples, p̂ will be close to p, so we replace p by p̂ when checking np̂ and n(1 − p̂), and our standard error (the estimated standard deviation of p̂) is $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. The interval is $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

Remember: p̂ (the sample proportion) is the number of successes in your sample divided by the total number of individuals in your sample.

Calculator: CI for a proportion
• Press STAT, choose TESTS, and choose 1-PropZInt. Enter x (let's say 246), n (let's say 439), and the C-level (.95). Choose Calculate.
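The same computation by hand, with the slide's x = 246 and n = 439. This is a sketch of the standard formula p̂ ± z*√(p̂(1 − p̂)/n), which is assumed to match what 1-PropZInt reports. The function name is hypothetical, and it also checks that there are at least 10 successes and 10 failures.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence):
    """CI for p: p-hat +/- z* sqrt(p-hat (1 - p-hat) / n)."""
    # Normality check with p-hat in place of the unknown p: n*p-hat = x, n*(1 - p-hat) = n - x
    if x < 10 or n - x < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = x / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    return p_hat - z * se, p_hat + z * se

lo, hi = one_prop_z_interval(246, 439, 0.95)
print(f"95% CI for p: ({lo:.4f}, {hi:.4f})")
```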

Choosing a sample size
• When planning a study, we may want to actively choose a sample size that will allow us to estimate the parameter within a given margin of error.

p*
• When calculating the sample size for a specific margin of error, we often don't know p̂ or p (we are running the study in the first place to find this out!).
• When you don't have a best-guess estimate for the proportion of successes in the population, use p* = 0.5. Because p*(1 − p*) is largest at p* = 0.5, this is the most conservative choice: it gives the largest required sample size.
• Obviously, if you are given p, or you already know p̂, use that: it's our best estimate.
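Why p* = 0.5 is the conservative choice: the required sample size is proportional to p*(1 − p*), and that product peaks at 0.5. A quick scan over a few guesses:

```python
# p*(1 - p*) drives the required sample size; compare it across guesses for p*
guesses = [0.1, 0.3, 0.5, 0.7, 0.9]
for p in guesses:
    print(f"p* = {p}: p*(1 - p*) = {p * (1 - p):.2f}")

largest = max(guesses, key=lambda p: p * (1 - p))
print("largest at p* =", largest)
```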

Example: p* unknown
• A company wants to do a customer service survey in which customers rate the service on a scale of 1 to 5, with 4 being satisfied and 5 being very satisfied. The president is interested in the percent of customers who rate the service a 4 or a 5. She wants the estimate to be within 3% at a 95% confidence level. It's too expensive and unreliable to try to question every customer, so how many people should they survey?

Example continued
• p* = 0.5, since we don't know the true population proportion p.
• $1.96 \sqrt{\frac{(0.5)(0.5)}{n}} \le 0.03$. Do some algebra:
• $\frac{(1.96)(0.5)}{0.03} \le \sqrt{n}$, so $n \ge \left(\frac{(1.96)(0.5)}{0.03}\right)^2 \approx 1067.1$
• So we round up to 1068 participants.
• *News reports frequently describe the results of surveys with sample sizes between 1000 and 1500 and a margin of error of about 3%.
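The algebra above as a function, with a hypothetical name. It solves $z^* \sqrt{p^*(1-p^*)/n} \le m$ for n and always rounds up. With the survey's values (z* = 1.96, m = 0.03, p* = 0.5) it reproduces 1068. A smaller p*, such as a hypothetical prior estimate of 0.2, needs fewer people.

```python
import math

def sample_size_for_proportion(z_star, margin, p_star=0.5):
    """Smallest n with z* sqrt(p*(1 - p*) / n) <= margin."""
    n = (z_star / margin) ** 2 * p_star * (1 - p_star)
    return math.ceil(n)   # always round up

print(sample_size_for_proportion(1.96, 0.03))               # survey example, p* = 0.5
print(sample_size_for_proportion(1.96, 0.03, p_star=0.2))   # hypothetical prior guess
```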

Summary
• See p. 679 for a good summary…