
Confidence Intervals Lecture 3

Confidence Intervals for the Population Mean (or percentage) For studies with large samples, “approximately 95% of the time, the population mean will be in the interval given by the sample mean plus or minus two standard errors.” A confidence interval is a range of values

WARNING! This DOES NOT imply that for a given experiment the population parameter has a 95% chance of being in the confidence interval. Pregnancy Analogy CORRECT Interpretation: “I am 95% confident that

Learning to work with the Gaussian/Normal Distribution What if I want a 90% confidence interval? We need to learn how to calculate probabilities based on the Normal Distribution! For confidence intervals, we are working with the sample mean, a random variable which approximately follows a normal distribution. We will talk now in general about a random variable, Y, which follows a normal distribution with mean μ and standard deviation σ.

Framework for the Calculation Define z as the value such that (1-α)100% of the population would fall within z standard deviations of the population mean (as in the Empirical Rule). This is equivalent to saying that a single random variable has a (1-α)100% chance of falling within z standard deviations of the population mean. In the context of confidence intervals (and hypothesis tests), z is called a critical value. Then (1-α)100% of the time, the following statements are true:
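The statements themselves did not survive in this transcript. A plausible reconstruction, assuming Y is normal with mean μ and standard deviation σ as defined on the previous slide, is the following chain of equivalent inequalities (a sketch, not the original slide content):

```latex
\begin{gather*}
-z \;\le\; \frac{Y-\mu}{\sigma} \;\le\; z \\
\mu - z\sigma \;\le\; Y \;\le\; \mu + z\sigma \\
Y - z\sigma \;\le\; \mu \;\le\; Y + z\sigma
\end{gather*}
```

The last line is the form that leads to a confidence interval: it brackets the unknown μ using the observed Y.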

The Mathematics FACT: $Z = (Y - \mu)/\sigma$ follows a “standard normal” distribution.

The Z-score All normal calculations are based on a z-score: $z = (Y - \mu)/\sigma$, where Y is a normal random variable (which may be a single value, an average, etc.), μ is its mean, and σ is its standard deviation (or standard error). The z-score measures how many standard deviations (or standard errors) an observed value of a normal random variable is away from its mean. Z-scores follow a standard normal distribution, a normal distribution with mean 0 and standard deviation 1.

Probabilities from the Standard Normal Chapter 4 in textbook & Table A5.2 on p.366 Areas under the Normal curve represent probabilities or proportions of the population. The total area under the curve equals one. Column #2 gives the probability of falling within z standard deviations of the mean, where z is given in column #1. May be used to refer to population distributions or sampling distributions.

Application of a Normal Distribution to a Population Distribution of IQ’s Suppose that I am going to measure IQ scores of UMDNJ – School of Public Health students. The national average of IQ scores is 100 points and the standard deviation is 16 points. For now assume that the national average equals the average for public health students. According to the Empirical Rule, approximately 68% of scores are between 84 and 116, and approximately 95% of scores are between 68 and 132.

IQ Example continued Let’s calculate these percentages exactly. Plus or minus 1 st. dev.: 68.27% = proportion of 0.6827. Plus or minus 2 st. dev.: 95.45% = proportion of 0.9545. The probability of a single observed IQ falling within one standard deviation of the mean is equivalent to the proportion of all IQ’s that fall within one standard deviation of the mean.
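The exact percentages can be checked without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is made up for this illustration.

```python
from statistics import NormalDist

def prob_within(z: float) -> float:
    """Probability that a standard normal variable falls within z
    standard deviations of its mean, i.e. P(-z < Z < z)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(z) - std_normal.cdf(-z)

for k in (1, 2):
    # Prints 0.6827 for k = 1 and 0.9545 for k = 2
    print(f"within +/- {k} st. dev.: {prob_within(k):.4f}")
```

This plays the role of column #2 in the normal table: give it z and it returns the two-sided probability.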

Probabilities for Single Observations Calculate the probability that the IQ score of one public health student will be within 10 points of the national average. Distance between mean and observation = 10. Z-score = 10/16 = 0.625 Std. Dev.’s. Probability ≈ 0.47.

Probabilities for Sample Means (Sampling Distributions) Suppose I do repeated experiments (studies) in which I draw 10 students for each study. Standard error for a mean of 10 measurements: SEM = 16/(square root of 10) = 5.06. Therefore, in approximately 68% of these experiments, the average score will be between (100 − 5.06 =) 94.94 and 105.06, and in approximately 95% of these experiments, the average score will be between (100 − 2*5.06 =) 89.88 and 110.12. Equivalently, for any one experiment, we have approximately a 95% chance that the sample mean will fall between 89.88 and 110.12.

Sampling Average IQ’s, cont. Calculate the probability that the average IQ score of 10 public health students will be within 10 points of the national average given that the public health distribution of IQ’s is the same as the national distribution. Standard error for a mean of 10 measurements: SEM = 16/(square root of 10) = 5.060. Z-score = 10/5.060 = 1.976. Probability within ± 1.976 st.dev.s = about 95%. Much larger probability than when looking at a single measurement!
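The contrast between a single measurement and a mean of 10 can be reproduced directly. This is a minimal sketch that assumes, as the slides do, a normal population with mean 100 and standard deviation 16; the function name `prob_within_points` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 100.0, 16.0  # national IQ mean and standard deviation from the slides

def prob_within_points(distance: float, n: int = 1) -> float:
    """Probability that the mean of n observations falls within
    `distance` points of MU; n = 1 is a single observation."""
    standard_error = SIGMA / sqrt(n)
    sampling_dist = NormalDist(MU, standard_error)
    return sampling_dist.cdf(MU + distance) - sampling_dist.cdf(MU - distance)

single = prob_within_points(10, n=1)   # one student, z = 0.625
mean10 = prob_within_points(10, n=10)  # mean of 10 students, z is about 1.976
print(f"single observation: {single:.3f}")
print(f"mean of 10:         {mean10:.3f}")
```

The only thing that changes between the two calls is the spread: 16 points for one score versus 16/√10 ≈ 5.06 points for the mean of 10.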

95% Large Sample Confidence Interval Suppose that I sample 50 public health students and find that Sample Mean = … points and the Standard Error Estimated from Sample = SEM = 2.5 points. What’s the value of z such that 95% of sample means would fall within z standard errors of the population mean? z = 1.96. Therefore a 95% confidence interval is … ± 1.96(2.5) = … ± 4.9 = ( … , … ). Interpretation: We are 95% confident that the mean IQ for all public health students falls between … and … points. How likely is it that the mean for the public health students is the same as the national mean (100)?

90% Large Sample Confidence Interval Sample Mean = … points. Standard Error of the Mean = SEM = 2.5 points. z = 1.645. Therefore a 90% confidence interval is … ± 1.645(2.5) = … ± 4.11 = ( … , … ). Interpretation: …

What Confidence Level should I use? The more confidence we would like to claim in our interval estimate, the wider the interval must be. The standard confidence level people use is 95%. Smaller confidence levels (such as 90%) may be appropriate, especially for pilot studies or other studies with small sample sizes. Larger confidence levels may also be appropriate when costly reforms rest on the conclusions from those confidence intervals.

Large Sample Confidence Interval for a Population Mean Sample mean ± z · SEM, where $\text{SEM} = \sqrt{s^2/n}$. Typically, 95% confidence intervals are used, with z = 1.96. Here, 1.96 is the 97.5th percentile of the standard normal distribution. To be modified for smaller sample sizes shortly.
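The critical value for any confidence level is the (1 − α/2) percentile of the standard normal distribution, which is why 95% confidence uses the 97.5th percentile. A minimal sketch with the standard library; the function name `z_critical` is an assumption of this example.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value: the (1 - alpha/2) quantile of the
    standard normal distribution, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 90% and 95% this gives the 1.645 and 1.96 used on the slides.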

Overweight Status and Eating Patterns… (AJPH, 2002) 95% CI for mean BMI of girls is 23.3 +/- 1.96*4.9/sqrt(2000) = (23.09, 23.51). 95% CI for mean BMI of boys is 23.0 +/- 1.96*4.8/sqrt(2000) = (22.79, 23.21). Unfortunately, we have target percentiles for BMI, not target means.
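The BMI intervals can be reproduced with the large-sample formula. In this sketch the sample means 23.3 and 23.0 are assumptions: they are the midpoints of the two reported intervals, not values read from the article. The function name `mean_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean: float, sample_sd: float, n: int,
            confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample interval: sample mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Means are assumed midpoints of the intervals quoted on the slide.
girls = mean_ci(23.3, 4.9, 2000)
boys = mean_ci(23.0, 4.8, 2000)
print("girls:", tuple(round(x, 2) for x in girls))  # (23.09, 23.51)
print("boys: ", tuple(round(x, 2) for x in boys))   # (22.79, 23.21)
```

With n = 2000 the margin is only about 0.21 BMI points, which is why both intervals are so narrow.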

Large Sample Confidence Intervals for Population Proportions p ± z (SE), where the estimated variance of the sample proportion is $SE^2 = p(1-p)/n$. Typically, 95% confidence intervals are used, with z = 1.96.

Actual Percent in the Top Targeted 5th Percentile – 95% Confidence Interval For girls, p = .125, n = 2099: SE = sqrt((.125*.875)/2099) = .0072, so p ± z (SE) = (0.111, 0.139), i.e. (11.1%, 13.9%). For boys, p = .166, n = 2141: SE = sqrt((.166*.834)/2141) = .0080, so p ± z (SE) = (0.150, 0.182), i.e. (15.0%, 18.2%). I am 95% confident that the percent of boys that is in the targeted upper 5th percentile is actually between 15.0% and 18.2% of boys.
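The same arithmetic, written out as a small function. This is a sketch of the large-sample interval for a proportion using the p and n values from the slide; the name `proportion_ci` is made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, confidence: float = 0.95):
    """Return (SE, (lower, upper)) for the large-sample interval
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = sqrt(p_hat * (1.0 - p_hat) / n)
    return se, (p_hat - z * se, p_hat + z * se)

for label, p_hat, n in (("girls", 0.125, 2099), ("boys", 0.166, 2141)):
    se, (lower, upper) = proportion_ci(p_hat, n)
    print(f"{label}: SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Both intervals lie well above the targeted 5%, which is the point the slide is making.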

“Men Carrying Pollutant Have More Boys” For description, see news article. n = 101, p = .57, SE = sqrt((.57*.43)/101) = .049. Thus a 95% confidence interval for the proportion is given by .57 ± 1.96 (.049), or equivalently (0.473, 0.667).

Interpretation for Pollutant – Boys E.g., I am 95% confident that the true proportion of boy babies born to parents who both have detectable PCB levels in their blood is between 0.473 and 0.667. I.e., if this is one of the 95% of the times that the true parameter falls in the interval, then the proportion is between 0.473 and 0.667. If the proportion of boys in the general population is 0.51, is there a difference in the proportion of boys from parents with detectable PCB levels?
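One informal way to approach the closing question is to check whether the general-population value 0.51 lies inside the interval. The sketch below does only that; it is not a formal hypothesis test, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.57, 101   # sample proportion of boys and sample size from the slide
reference = 0.51       # proportion of boys in the general population

z = NormalDist().inv_cdf(0.975)          # 95% two-sided critical value
se = sqrt(p_hat * (1.0 - p_hat) / n)     # estimated standard error
lower, upper = p_hat - z * se, p_hat + z * se

reference_inside = lower <= reference <= upper
print(f"95% CI = ({lower:.3f}, {upper:.3f}); contains {reference}: {reference_inside}")
```

Because 0.51 falls inside the interval, this sample alone does not rule out the general-population proportion at the 95% confidence level.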

Assumptions for “Large Sample CI’s” Random sample: every individual in the population has an equal chance of being selected. Independent observations: selecting one subject does not alter the chances of selection of another subject. Large sample size, so that the Central Limit Theorem “kicks in” and the standard error is accurately estimated. How large is “large”? Depends! If the population distribution is approximately normal, then “large” is approximately 40 subjects. The farther the population distribution is from normal, the larger the needed sample size. E.g., for binary variables: at least 30 subjects, with at least 5 in each category.

Other Assumptions Study Population = Target Population Variable accurately measures characteristic of interest Values are correctly measured and recorded

Quick Re-evaluation of Large Sample CI’s for Population Means Why do we use a z-critical value in the CI? Due to the CLT, we can say that if the sample size is large, the sample mean will fall within the specified number (z) of standard errors of the population mean with the stated probability. More specifically, the 95% CI is derived from noting that $(\bar{y} - \mu)/(\sigma/\sqrt{n})$ falls between –1.96 and 1.96 approximately 95% of the time. The standard deviation is assumed to be known.

CI’s for Population Means when the Sample Sizes are Smaller ISSUE: The standard deviation is unknown and, hence, it is estimated. This estimate is generally OK for large sized samples, but not for smaller sized samples. PROBLEM: There is more variation in the z-score with an estimated standard deviation than is allowed for by the Normal distribution.

Thanks to a smart person! SOLUTION: In 1908, William Sealy Gosset, working at the Guinness Brewery in Dublin, Ireland, showed mathematically that the z-score for the sample average in which the population standard deviation has been replaced by the sample standard deviation follows a “t-distribution” with a specified degrees of freedom (d.f.). I.e., $t = (\bar{y} - \mu)/(s/\sqrt{n})$ follows a t-distribution with n − 1 degrees of freedom.

T-Distribution What does it look like? Like the Normal distribution, but with larger variance, depending on the d.f. What are “degrees of freedom”? It is difficult to define; roughly, the “amount of information about the variance.” Practically, for the sample mean, d.f. = n-1.

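Using the T-distribution for CI’s Assumptions: population approximately normal; random and independent sample. d.f. = n-1. CI: $\bar{y} \pm t_{n-1}\, s/\sqrt{n}$, where $t_{n-1}$ is the t critical value for n − 1 degrees of freedom. See Table A5.3 on page 368 for a table of critical values for 90%, 95% and 99% CI’s.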
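A small simulation can show why the t critical value matters for small samples. This sketch is not from the lecture: it assumes a normal population (mean 100, st. dev. 16), samples of size 5, and compares the coverage of intervals built with z = 1.96 against intervals built with 2.776, the standard 95% t-table value for 4 degrees of freedom. The function name `coverage` is hypothetical.

```python
import random
from math import sqrt

def coverage(critical_value: float, n: int = 5, trials: int = 10000,
             mu: float = 100.0, sigma: float = 16.0, seed: int = 0) -> float:
    """Fraction of simulated samples whose interval
    mean +/- critical_value * s / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)  # fixed seed so both runs see the same samples
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor
        s = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        if abs(m - mu) <= critical_value * s / sqrt(n):
            hits += 1
    return hits / trials

z_cov = coverage(1.96)    # z critical value with an estimated st. dev.
t_cov = coverage(2.776)   # t critical value for 4 degrees of freedom
print(f"coverage with z = 1.96:  {z_cov:.3f}")
print(f"coverage with t = 2.776: {t_cov:.3f}")
```

Under these assumptions the z-based intervals capture the true mean noticeably less often than 95% of the time, while the t-based intervals come out close to the nominal level.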

Cadmium levels (ng/gram) in mothers The sample sizes are 14 (for smoking) and 18 (for non-smoking mothers). Distributions are bimodal. CI using the t critical value. Means and standard errors are: Smoking: … & 6.814/sqrt(14) = 1.821 ng/gram. Non-smoking: … & 6.199/sqrt(18) = 1.461 ng/gram. t-critical values for 95% confidence intervals are 2.160 and 2.110, respectively, for 13 and 17 degrees of freedom.

Cadmium Confidence Intervals The confidence intervals are: Smoking: … ± 2.160 (1.821) = (16.481, … ). Non-smoking: … ± 2.110 (1.461) = (11.639, … ). These intervals overlap, suggesting that the mean cadmium levels for the populations of smoking and non-smoking mothers may not be different. After the midterm, we will talk about more specific ways to test this hypothesis. (Comparison of two population means)
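The sample means and upper limits are missing from this transcript, but they can be approximated from what remains. The sketch below is a reconstruction under stated assumptions: it takes the lower limits and standard deviations from the slides, uses the standard 95% t-table values 2.160 (13 d.f.) and 2.110 (17 d.f.), and works backwards. The recovered numbers are estimates, not the author's figures, and the function name is hypothetical.

```python
from math import sqrt

def t_interval_from_lower(lower: float, sd: float, n: int, t_crit: float):
    """Back out the interval center and upper limit from a known lower
    limit, using lower = center - t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / sqrt(n)
    center = lower + margin
    return center, (lower, center + margin)

# Lower limits and st. dev.s are from the slides; t values are table values.
smoke_mean, smoke_ci = t_interval_from_lower(16.481, 6.814, 14, 2.160)
non_mean, non_ci = t_interval_from_lower(11.639, 6.199, 18, 2.110)

# Two intervals overlap when each one starts before the other ends.
overlap = smoke_ci[0] <= non_ci[1] and non_ci[0] <= smoke_ci[1]

print(f"smoking:     mean ~ {smoke_mean:.2f}, CI ~ ({smoke_ci[0]:.2f}, {smoke_ci[1]:.2f})")
print(f"non-smoking: mean ~ {non_mean:.2f}, CI ~ ({non_ci[0]:.2f}, {non_ci[1]:.2f})")
print("intervals overlap:", overlap)
```

The overlap check only restates the visual comparison of the two intervals; it is not a substitute for a two-sample comparison of means.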

HELP: When do I use which CI? In general for means, use the t-interval. For means (not proportions), the t-interval is robust to violations of the Normality assumption. If the sample distribution is far from normal (as determined by, e.g., histograms), non-parametric or other more “exact” methods may be needed. These are easier to discuss in the context of hypothesis testing.

When do I use which CI?, cont… For proportions with large sample sizes, use the z-interval. “Large” = 30 with between 5 and 25 subjects in the group of interest Is the pollutant-boys sample size OK? 101 subjects with at least 43 of each gender. For proportions with small sample sizes, there exist “exact” confidence intervals. See Table A5.1 & pp.16-18

Assumptions for all CI’s presented so far Observations within the sample are independent of one another. Sample consists of randomly selected subjects that are representative of the target population. In particular, each subject in the study population has an equal chance of being selected.