Estimating Means with Confidence

Presentation transcript:

Chapter 12: Estimating Means with Confidence. Copyright ©2011 Brooks/Cole, Cengage Learning

11.1 Introduction to CI for Means
A parameter is a population characteristic; its value is usually unknown, so we estimate it using sample information. A statistic, or estimate, is a characteristic of a sample. A statistic estimates a parameter. A confidence interval is an interval of values computed from sample data that is likely to include the true population value. The confidence level for an interval describes our confidence in the procedure we used: we are confident that most of the confidence intervals we compute using the procedure will contain the true population value.

More Estimation Situations
Situation 1. Estimating the mean of a quantitative variable.
Example research questions: What is the mean time that college students watch TV per day? What is the mean pulse rate of women?
Population parameter: μ (spelled “mu” and pronounced “mew”) = population mean for the variable.
Sample estimate: x̄ = the sample mean for the variable.

More Estimation Situations
Situation 2. Estimating the population mean of paired differences for a quantitative variable.
Example research questions: What is the mean difference in weights for freshmen at the beginning and end of the first semester? What is the mean difference in age between husbands and wives in Britain?
Population parameter: μd = population mean of the differences in the two measurements.
Sample estimate: d̄ = the mean of the differences for a sample of the two measurements.

More Estimation Situations
Situation 3. Estimating the difference between two populations with regard to the mean of a quantitative variable.
Example research questions: How much difference is there in average weight loss for those who diet compared to those who exercise to lose weight? How much difference is there between the mean foot lengths of men and women?
Population parameter: μ1 – μ2 = difference between the two population means.
Sample estimate: x̄1 – x̄2 = difference between the two sample means.

Paired Data
Paired data (or paired samples): pairs of variables are collected, and we are interested only in the population (and sample) of differences, not in the original data.
Each person is measured twice. Two measurements of the same characteristic or trait are made under different conditions.
Similar individuals are paired prior to an experiment. Each member of a pair receives a different treatment. The same response variable is measured for all individuals.
Two different variables are measured for each individual. We are interested in the amount of difference between the two variables.

Independent Samples
Two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample.
Random samples are taken separately from two populations and the same response variable is recorded.
One random sample is taken and a variable recorded, but units are categorized to form two populations.
Participants are randomly assigned to one of two treatment conditions, and the same response variable is recorded.
Two random samples are taken from a population and a separate variable is measured in each sample.

Standard Errors
Rough Definition: The standard error of a sample statistic measures, roughly, the average difference between the statistic and the population parameter. This “average difference” is over all possible random samples of a given size that can be taken from the population.
Technical Definition: The standard error of a sample statistic is the estimated standard deviation of the sampling distribution for the statistic.

Standard Error of a Sample Mean
Example 11.2 Mean Hours Watching TV
Poll: Class of 175 students. In a typical day, about how much time do you spend watching television?
Variable    N   Mean  Median  TrMean  StDev  SE Mean
TV        175   2.09   2.000   1.950  1.644    0.124

Standard Error of the Difference Between Two Sample Means
Example 11.3 Lose More Weight by Diet or Exercise?
Study: n1 = 42 men on diet, n2 = 27 men on exercise routine.
Diet: Lost an average of 7.2 kg with std dev of 3.7 kg.
Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg.
So, s.e.(x̄1 – x̄2) = √(3.7²/42 + 3.9²/27) ≈ 0.94 kg.
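
As a quick check on the two standard errors quoted in Examples 11.2 and 11.3, the following minimal sketch recomputes them from the summary statistics on the slides. The function names `se_mean` and `se_diff_means` are illustrative choices, not names from the source.

```python
import math

def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_diff_means(s1, n1, s2, n2):
    """Unpooled standard error of the difference between two sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Example 11.2: TV hours, s = 1.644 with n = 175
se_tv = se_mean(1.644, 175)

# Example 11.3: diet (s = 3.7, n = 42) versus exercise (s = 3.9, n = 27)
se_weight = se_diff_means(3.7, 42, 3.9, 27)

print(round(se_tv, 3), round(se_weight, 2))  # 0.124 0.94
```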

Using Student’s t-Distribution to Determine the Multiplier
t-distribution: bell-shaped, centered at 0, and more spread out than the standard normal curve because the sample standard deviation s is used in place of σ.
t* multiplier: the value in a t-distribution with df = n – 1 such that the area between –t* and +t* equals the desired confidence level.

11.2 CI Module 3: Confidence Intervals for One Population Mean
Lesson 1: Finding a CI for a Mean for Any Sample Size and Any Confidence Level
General form: Sample estimate ± Multiplier × Standard error
Sample estimate: x̄ = sample mean
Standard error: s.e.(x̄) = s/√n
t-interval: x̄ ± t* × s/√n, where df = n – 1 for the multiplier t*.

Conditions Required: Two Situations in Which the t-Interval Is Valid
Situation 1: The population of the measurements is bell-shaped, and a random sample of any size is measured. In practice, for small samples, the data should show no extreme skewness and should not contain any outliers.
Situation 2: The population of measurements is not bell-shaped, but a large random sample is measured. A somewhat arbitrary definition of a “large” sample is n ≥ 30, but if there are extreme outliers, it is better to have an even larger sample size than n = 30.

Example 11.5 Mean Forearm Length
Data: Forearm lengths (cm) for a random sample of n = 9 men: 25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0
Note: Dotplot shows no obvious skewness and no outliers.
Multiplier t* from Table A.2 with df = 8 is t* = 2.31.
95% Confidence Interval: 25.5 ± 2.31(.507) ⇒ 25.5 ± 1.17 ⇒ 24.33 to 26.67 cm
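
The forearm calculation can be reproduced from the raw data with the standard library alone. This is a minimal sketch: the multiplier 2.31 is taken from the slide rather than computed, and the variable names are illustrative.

```python
import math
import statistics

forearm_cm = [25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0]
t_star = 2.31  # multiplier quoted on the slide for df = 8 and 95% confidence

n = len(forearm_cm)
xbar = statistics.mean(forearm_cm)
s = statistics.stdev(forearm_cm)   # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)              # standard error of the mean
margin = t_star * se
ci = (xbar - margin, xbar + margin)

print(xbar, round(se, 3), tuple(round(v, 2) for v in ci))
# 25.5 0.507 (24.33, 26.67)
```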

Calculating a Confidence Interval for a Population Mean
t-interval: x̄ ± t* × s/√n
Make sure appropriate conditions apply by checking the sample size and/or a shape picture. For small samples (n < 30) with extreme skewness or outliers, you cannot proceed.
Choose a confidence level.
Compute the sample mean and standard deviation.
Calculate the standard error of the mean.
Calculate df = n – 1.
Use Table A.2 (or software) to find the multiplier t*.

Example 11.6 Watching TV
Poll: Class of 175 students. In a typical day, about how much time do you spend watching television? The sample mean was 2.09 hours and the sample standard deviation was 1.644 hours.

Example 11.6 Watching TV
A 95% Confidence Interval: 2.09 ± 1.98(.124) ⇒ 2.09 ± .25 ⇒ 1.84 to 2.34 hours
A 99% Confidence Interval: 2.09 ± 2.63(.124) ⇒ 2.09 ± .33 ⇒ 1.76 to 2.42 hours
Note: the 99% CI estimate is wider than the 95% one.
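
To see why the 99% interval is wider, here is a small sketch that applies the same estimate and standard error with the two multipliers quoted on the slide. The helper `t_interval` is a hypothetical name introduced for illustration.

```python
def t_interval(estimate, se, t_star):
    """Return (lower, upper) for estimate ± t_star × se."""
    margin = t_star * se
    return estimate - margin, estimate + margin

xbar, se = 2.09, 0.124                 # summary values from Example 11.6
ci95 = t_interval(xbar, se, 1.98)      # multiplier quoted for 95% confidence
ci99 = t_interval(xbar, se, 2.63)      # multiplier quoted for 99% confidence

width95 = ci95[1] - ci95[0]
width99 = ci99[1] - ci99[0]
print([round(v, 2) for v in ci95], [round(v, 2) for v in ci99])
# [1.84, 2.34] [1.76, 2.42]
```

Only the multiplier changes between the two intervals, so the higher confidence level is paid for entirely in width.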

Example 11.6 Watching TV
Correct Interpretation of a CI for a Mean
We don’t know whether or not the mean TV viewing hours for Penn State students is really between 1.84 and 2.34 hours. It is if we got one of the 95% of possible samples for which the sample mean is close enough to the true population mean for our procedure to work. It isn’t if we happened to get one of the 5% of possible samples for which the sample mean is too far away from the population mean for the procedure to work.

Example 11.5 Mean Forearm Length
A Common Misinterpretation of a CI for a Mean
We can be fairly certain that the population mean forearm length is covered by the interval from 24.33 to 26.67 cm. This does not give us any information about the range of individual forearm lengths. Do not fall into the trap of thinking that 95% of men’s forearm lengths are in this interval.

Example 11.7 What Students Sleep More?
Q: How many hours of sleep did you get last night, to the nearest half hour?
Class                      N   Mean  StDev  SE Mean
Stat 10 (stat literacy)   25   7.66   1.34     0.27
Stat 13 (stat methods)   148   6.81   1.73     0.14
Note: Bell-shape was reasonable for Stat 10 (with smaller n).
Notes: Interval for Stat 10 is wider (smaller sample size). Two intervals do not overlap ⇒ Stat 10 average significantly higher than Stat 13 average.
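
The slide text does not reproduce the two intervals themselves, so the sketch below rebuilds them from the summary table to illustrate the "do not overlap" note. The multipliers are assumptions taken from a standard t-table for 95% confidence (about 2.06 for df = 24 and about 1.98 for df = 147); they are not stated on the slide.

```python
def t_interval(estimate, se, t_star):
    """Return (lower, upper) for estimate ± t_star × se."""
    margin = t_star * se
    return estimate - margin, estimate + margin

# Summary values from the Example 11.7 table; t* values are assumed table lookups
stat10 = t_interval(7.66, 0.27, 2.06)   # n = 25, df = 24
stat13 = t_interval(6.81, 0.14, 1.98)   # n = 148, df = 147

# The intervals fail to overlap when the lower end of one exceeds the upper end of the other
no_overlap = stat10[0] > stat13[1]
print([round(v, 2) for v in stat10], [round(v, 2) for v in stat13], no_overlap)
```

The gap between the intervals is small, so the conclusion is sensitive to rounding in the reported standard errors.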

Lesson 2: Special Case: Approximate 95% Confidence Intervals for Large Samples
For sufficiently large samples, the interval Sample estimate ± 2 × Standard error is an approximate 95% confidence interval for a population parameter.
Note: The 95% confidence level describes how often the procedure provides an interval that includes the population value. For about 95% of all random samples of a specific size from a population, the confidence interval captures the population parameter.
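
The statement that about 95% of random samples produce an interval capturing the parameter can be illustrated by simulation. The sketch below is only an illustration under assumed settings: a normal population with mean 2.0 and standard deviation 1.5, samples of size 50, and 2000 repetitions are all arbitrary choices, not values from the source.

```python
import math
import random
import statistics

def coverage_of_approx_interval(mu=2.0, sigma=1.5, n=50, reps=2000, seed=1):
    """Fraction of simulated samples whose interval xbar ± 2·s.e. contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if xbar - 2 * se <= mu <= xbar + 2 * se:
            hits += 1
    return hits / reps

print(coverage_of_approx_interval())  # expected to land near 0.95
```

Any single interval either contains the mean or does not; the 95% figure describes the long-run behaviour of the procedure across samples.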

Example 11.8 Watching TV
t-Interval: 2.09 ± 1.98(.124) ⇒ 2.09 ± .25 ⇒ 1.84 to 2.34 hours
Approximate 95% Confidence Interval: 2.09 ± 2(.124) ⇒ 2.09 ± .248 ⇒ 1.842 to 2.338 hours
Article: with 95% confidence, the mean is between 1.8 and 2.3 hours.
Technically: We are 95% confident the interval from 1.842 to 2.338 hours is correct in that it captures the mean television viewing hours for Penn State students.

11.3 CI Module 4: CI for Population Mean of Paired Differences
Data: two variables for each of the n individuals or pairs; use the difference d = x1 – x2.
Population parameter: μd = mean of the differences for the population
Sample estimate: d̄ = sample mean of the differences
Standard error: s.e.(d̄) = sd/√n
CI for μd: d̄ ± t* × s.e.(d̄), where df = n – 1 for the multiplier t*.

11.3 CI Module 4: CI for Population Mean of Paired Differences
Data: two variables for n individuals or pairs; use the difference d = x1 – x2.
Population parameter: μd = mean of differences for the population = μ1 – μ2.
Sample estimate: d̄ = sample mean of the differences
Standard deviation and standard error: sd = std dev of sample of differences; s.e.(d̄) = sd/√n
Confidence interval for μd: d̄ ± t* × sd/√n, where df = n – 1 for the multiplier t*.

Conditions Required: Two Situations in Which the t-Interval Is Valid
Situation 1: The population of differences is bell-shaped, and a random sample of any size is measured. In practice, for small samples, the differences in the sample should show no extreme skewness and should not contain any outliers.
Situation 2: The population of differences is not bell-shaped, but a large random sample is measured. A somewhat arbitrary definition of a “large” sample is n ≥ 30 pairs, but if there are extreme outliers in the sample of differences, it is better to have an even larger sample size than n = 30.

Example 11.9 Screen Time: Computer vs TV
Data: Hours spent watching TV and hours spent on computer per week for n = 25 students.
Task: Make a 90% CI for the mean difference in hours spent using computer versus watching TV.
Note: Boxplot shows no obvious skewness and no outliers.

Example 11.9 Screen Time: Computer vs TV
Results: sample mean difference d̄ = 5.36 hours, with standard error s.e.(d̄) = 3.05 hours.
Multiplier t* from Table A.2 with df = 24 is t* = 1.71.
90% Confidence Interval: 5.36 ± 1.71(3.05) ⇒ 5.36 ± 5.22 ⇒ 0.14 to 10.58 hours
Interpretation: We are 90% confident that the average difference between computer usage and television viewing for students represented by this sample is covered by the interval from 0.14 to 10.58 hours per week, with more hours spent on computer usage than on television viewing.
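
A paired interval is just a one-sample t-interval applied to the differences. The sketch below shows that reduction in a hypothetical helper, `paired_t_interval`, and then checks the Example 11.9 arithmetic from the summary values in the interval formula, since the raw data are not shown in the transcript.

```python
import math
import statistics

def paired_t_interval(x1, x2, t_star):
    """CI for the mean of paired differences d = x1 - x2, given a multiplier t_star."""
    d = [a - b for a, b in zip(x1, x2)]
    d_bar = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(len(d))   # s_d / sqrt(n)
    return d_bar - t_star * se, d_bar + t_star * se

# Summary-only check of Example 11.9: d-bar = 5.36, s.e. = 3.05, t* = 1.71 (df = 24)
d_bar, se, t_star = 5.36, 3.05, 1.71
ci90 = (d_bar - t_star * se, d_bar + t_star * se)
print(tuple(round(v, 2) for v in ci90))  # (0.14, 10.58)
```

Because the lower limit is barely above zero, the data support more computer time than TV time at the 90% level, but only narrowly.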

Calculating a Confidence Interval for a Population Mean of Paired Differences
Make sure appropriate conditions apply by checking the sample size and/or a shape picture of the differences.
Choose a confidence level.
Compute the differences for the n pairs in the sample, then find the mean and std dev for the differences.
Calculate the standard error of the mean difference.
Calculate df = n – 1.
Use Table A.2 (or software) to find the multiplier t*.

11.4 CI Module 5: CI for Difference in Two Population Means (Independent Samples)
Lesson 1: The General (Unpooled) Case

CI for the Difference Between Two Population Means (Independent Samples)
Approximate CI for μ1 – μ2: x̄1 – x̄2 ± t* × √(s1²/n1 + s2²/n2), where t* is the value in a t-distribution with area between –t* and t* equal to the desired confidence level.
The approximate df is difficult to specify. Use computer software, or conservatively use the smaller of the two sample sizes and subtract 1.
Note: for an approximate 95% CI, use a multiplier of 2.

Degrees of Freedom
The t-distribution is only approximately correct and the df formula is complicated (Welch’s approximation): df ≈ (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 – 1) + (s2²/n2)²/(n2 – 1) ].
Statistical software can use this approximation, but if the calculation is done by hand then use a conservative df = smaller of n1 – 1 and n2 – 1.

Necessary Conditions
Two samples must be independent and either:
Situation 1: Populations of measurements are both bell-shaped, and random samples of any size are measured.
Situation 2: Large (n ≥ 30) random samples are measured. But if there are extreme outliers, or extreme skewness, it is better to have an even larger sample than n = 30.

Example 11.11 Effect of a Stare on Driving
Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, and timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection.
No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8
Task: Make a 95% CI for the difference between the mean crossing times for the two populations represented by these two independent samples.

Example 11.11 Effect of a Stare on Driving
Checking Conditions: Boxplots show … No outliers and no strong skewness. Crossing times in stare group generally faster and less variable.

Example 11.11 Effect of a Stare on Driving
Note: The df = 21 was reported by the computer package based on Welch’s approximation formula. The 95% confidence interval for the difference between the population means is 0.14 seconds to 1.93 seconds.
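
The reported df and interval can be reproduced from the raw crossing times. This is a sketch rather than the package's actual routine: `welch_summary` is a hypothetical helper, the df is truncated to a whole number on the assumption that this is how the package arrived at 21, and the multiplier 2.08 is an assumed t-table value for df = 21 at 95% confidence.

```python
import math
import statistics

no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

def welch_summary(x, y):
    """Return (difference in means, unpooled s.e., Welch approximate df)."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1   # s1^2 / n1
    v2 = statistics.variance(y) / n2   # s2^2 / n2
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, df

diff, se, df = welch_summary(no_stare, stare)
t_star = 2.08  # assumed t-table multiplier for df = 21, 95% confidence
ci = (diff - t_star * se, diff + t_star * se)
print(math.floor(df), tuple(round(v, 2) for v in ci))  # 21 (0.14, 1.93)
```

The conservative by-hand choice, df = 12 (the smaller of n1 – 1 and n2 – 1), would use a larger multiplier and give a somewhat wider interval.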

Lesson 2: Equal Variance Assumption and the Pooled Standard Error
It may be reasonable to assume the two populations have equal population standard deviations, or equivalently, equal population variances: σ1² = σ2² = σ².
The estimate of this variance based on the combined or “pooled” data is called the pooled variance. The square root of the pooled variance is called the pooled standard deviation: sp = √{ [(n1 – 1)s1² + (n2 – 1)s2²] / (n1 + n2 – 2) }.

Pooled Standard Error
Pooled s.e.(x̄1 – x̄2) = sp × √(1/n1 + 1/n2)
Note: Pooled df = (n1 – 1) + (n2 – 1) = (n1 + n2 – 2).

Pooled Confidence Interval
Pooled CI for the Difference Between Two Means (Independent Samples): x̄1 – x̄2 ± t* × sp × √(1/n1 + 1/n2), where t* is found using a t-distribution with df = (n1 + n2 – 2) and sp is the pooled standard deviation.

Example 11.14 Male and Female Sleep Times
Q: How much difference is there between how long female and male students slept the previous night?
Data: The 83 female and 65 male responses from students in an intro stat class.
Task: Make a 95% CI for the difference between the two population mean sleep hours for females versus males.
Note: We will assume equal population variances.

Example 11.14 Male and Female Sleep Times
Two-sample T for sleep [with “Assume Equal Variance” option]
Sex      N  Mean  StDev  SE Mean
Female  83  7.02   1.75     0.19
Male    65  6.55   1.68     0.21
Difference = mu (Female) – mu (Male)
Estimate for difference: 0.461
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value = 1.62  P = 0.108  DF = 146
Both use Pooled StDev = 1.72
Notes: The two sample standard deviations are very similar. The sample mean for females is higher than for males. The 95% confidence interval contains 0, so we cannot rule out that the population means may be equal.

Example 11.14 Male and Female Sleep Times
Pooled standard deviation and pooled standard error “by-hand”:
sp = √{ [82(1.75²) + 64(1.68²)] / 146 } ≈ 1.72
Pooled s.e.(x̄1 – x̄2) = 1.72 × √(1/83 + 1/65) ≈ 0.285
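
The by-hand pooled calculation for Example 11.14 can be checked against the software output with a short sketch. The difference 0.461 is copied from that output (it was computed from unrounded means), and the multiplier 1.98 is an assumed approximate t-table value for df = 146 at 95% confidence.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation under the equal-variance assumption."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

n_f, s_f = 83, 1.75   # females
n_m, s_m = 65, 1.68   # males

sp = pooled_sd(s_f, n_f, s_m, n_m)
se_pooled = sp * math.sqrt(1 / n_f + 1 / n_m)

estimate = 0.461      # difference reported in the software output
t_value = estimate / se_pooled
t_star = 1.98         # assumed approximate multiplier for df = 146, 95% confidence
ci = (estimate - t_star * se_pooled, estimate + t_star * se_pooled)

print(round(sp, 2), round(se_pooled, 3), round(t_value, 2))  # 1.72 0.285 1.62
```

With these inputs the interval endpoints come out close to the reported (-0.103, 1.025); small differences in the third decimal place can arise from the rounded multiplier.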

Pooled or Unpooled?
If the larger sample size produced the larger standard deviation, the pooled procedure is acceptable because it will be conservative.
If the smaller standard deviation accompanies the larger sample size, the pooled test can be quite misleading and is not recommended.
If sample sizes are equal, the pooled and unpooled standard errors are equal.
Unless the sample standard deviations are quite similar, it is best to use the unpooled procedure.

Example 11.15 Male and Female Sleep Times
Note: Unpooled s.e. is 0.30 while pooled s.e. is 0.36 (larger).
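
The transcript does not show which summary statistics produced these two standard errors. As an observation rather than a confirmed fact, the Stat 10 and Stat 13 values from the Example 11.7 table (n = 25, s = 1.34 and n = 148, s = 1.73) give exactly these rounded results, so the sketch below uses them on that assumption.

```python
import math

# Assumed inputs: the Stat 10 / Stat 13 summary statistics from Example 11.7
n1, s1 = 25, 1.34
n2, s2 = 148, 1.73

# Unpooled standard error: each sample keeps its own variance
se_unpooled = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Pooled standard error: one common variance estimate weighted by df
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)

print(round(se_unpooled, 2), round(se_pooled, 2))  # 0.3 0.36
```

Here the larger sample has the larger standard deviation, so the pooled standard error is the larger of the two and the pooled procedure errs on the conservative side.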

Confidence Interval for the Difference in Two Population Means
Make sure appropriate conditions apply by checking the sample sizes and/or a shape picture of each sample.
Choose a confidence level.
Compute the mean and std dev for each sample.
Determine whether the std devs are similar enough that the pooled procedure can be used.
Calculate the appropriate standard error (pooled or unpooled).
Calculate the appropriate df.
Use Table A.2 (or software) to find the multiplier t*.