CHAPTER 21: Comparing Two Means


1 CHAPTER 21: Comparing Two Means
Basic Practice of Statistics, 7th Edition. Lecture PowerPoint Slides.

2 In Chapter 21, We Cover …
Two-sample problems
Comparing two population means
Two-sample t procedures
Robustness again
Avoid the pooled two-sample t procedures*
Avoid inference about standard deviations*

3 Two-Sample Problems
Suppose we want to compare the mean of some quantitative variable for the individuals in two populations, Population 1 and Population 2. Our parameters of interest are the population means $\mu_1$ and $\mu_2$. The best approach is to take separate random samples from each population and to compare the sample means. We use the mean response in the two groups to make the comparison. Here’s a table that summarizes these two situations:

4 Comparing Two Population Means
Conditions for inference comparing two means:
We have two SRSs from two distinct populations. The samples are independent. That is, one sample has no influence on the other. We measure the same response variable for both samples.
Both populations are Normally distributed. The means and standard deviations of the populations are unknown. In practice, it is enough that the distributions have similar shapes and that the data have no strong outliers.
Call the variable $x_1$ in the first population and $x_2$ in the second because the variable may have different distributions in the two populations. Here is how we describe the two populations:

Population   Variable   Mean      Standard deviation
1            $x_1$      $\mu_1$   $\sigma_1$
2            $x_2$      $\mu_2$   $\sigma_2$

5 Comparing Two Population Means
Here is how we describe the two samples:

Population   Sample size   Sample mean   Sample standard deviation
1            $n_1$         $\bar{x}_1$   $s_1$
2            $n_2$         $\bar{x}_2$   $s_2$

To do inference about the difference $\mu_1 - \mu_2$ between the means of two populations, we start from the difference $\bar{x}_1 - \bar{x}_2$ between the means of the two samples.

6 Two-Sample t Procedures
To take variation into account, we would like to standardize the observed difference $\bar{x}_1 - \bar{x}_2$ by subtracting its mean, $\mu_1 - \mu_2$, and dividing the result by its standard deviation. Because we don't know the population standard deviations, we estimate them by the sample standard deviations from our two samples. The result is the standard error, or estimated standard deviation, of the difference in sample means:
$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
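As a minimal sketch of the standard error formula, the function below computes it from summary statistics. The function name `standard_error` is hypothetical, and the numbers plugged in are the summary statistics from the community service example later in these slides.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Standard error of xbar1 - xbar2: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Community service example: s1 = 14.68, n1 = 57, s2 = 14.26, n2 = 17.
se = standard_error(14.68, 57, 14.26, 17)
print(round(se, 3))  # about 3.968
```

Note that each sample contributes its own variance divided by its own sample size, so the smaller sample usually dominates the standard error.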

7 Two-Sample t Procedures
When we standardize the estimate by subtracting its mean, $\mu_1 - \mu_2$, and dividing the result by its standard error, the result is the two-sample t statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
The two-sample t statistic has approximately a t distribution. It does not have exactly a t distribution, even if the populations are both exactly Normal. In practice, however, the approximation is very accurate.
There are two practical options for using the two-sample t procedures:
Option 1: Use technology to determine the degrees of freedom.
Option 2: Use the smaller of $n_1 - 1$ and $n_2 - 1$ for the degrees of freedom.
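The slides do not state the formula that software uses for Option 1. Statistical software commonly uses the Welch-Satterthwaite approximation, so the sketch below assumes that formula; the function names `welch_df` and `conservative_df` are hypothetical.

```python
def conservative_df(n1, n2):
    """Option 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1, n1, s2, n2):
    """Option 1 (assumed): Welch-Satterthwaite approximate degrees of freedom."""
    v1 = s1**2 / n1  # estimated variance of xbar1
    v2 = s2**2 / n2  # estimated variance of xbar2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Community service example from these slides: n1 = 57, n2 = 17.
print(conservative_df(57, 17))                   # 16
print(round(welch_df(14.68, 57, 14.26, 17), 2))  # about 26.94
```

The Welch-Satterthwaite value is never smaller than the Option 2 value, which is why Option 2 gives wider intervals and larger P-values.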

8 Two-Sample t Procedures: Confidence Interval for µ1 - µ2
THE TWO-SAMPLE t PROCEDURES
Draw an SRS of size $n_1$ from a large Normal population with unknown mean $\mu_1$, and draw an independent SRS of size $n_2$ from another large Normal population with unknown mean $\mu_2$. A level C confidence interval for $\mu_1 - \mu_2$ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Here, $t^*$ is the critical value for confidence level C for the t distribution with degrees of freedom from either Option 1 (software) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).
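The interval can be sketched in a few lines once a critical value is available. The function name `two_sample_ci` is hypothetical, and the caller supplies $t^*$ from a table or software. The illustration below is not in the slides: it applies the formula to the community service summary statistics, assuming a 95% level with Option 2 degrees of freedom (df = 16), for which the standard table value is $t^* = 2.120$.

```python
import math

def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2: (xbar1 - xbar2) +/- t* x SE."""
    diff = xbar1 - xbar2
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin

# Illustration only: 95% interval with the conservative df = 16, t* = 2.120.
low, high = two_sample_ci(105.32, 14.68, 57, 96.82, 14.26, 17, t_star=2.120)
print(round(low, 2), round(high, 2))
```

Passing `t_star` in, rather than computing it, keeps the sketch free of any dependence on a statistics library.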

9 Example
STATE: People gain weight when they take in more energy from food than they expend. James Levine and his collaborators at the Mayo Clinic investigated the link between obesity and energy spent on daily activity with data from a study with $n_1 = n_2 = 10$ healthy volunteers: 10 who were lean and 10 who were mildly obese but still healthy. They wanted to address the question: Do lean and obese people differ in the average time they spend standing and walking?
PLAN: Give a 90% confidence interval for $\mu_1 - \mu_2$, the difference in average daily minutes spent standing and walking between lean and mildly obese adults.
SOLVE: Examination of the data reveals that all conditions for inference can be (at least reasonably) assumed; the distributions are a bit irregular, but with only 10 observations this is to be expected.

10 Example SOLVE: (cont’d) The descriptive statistics:

Group       n    Mean, $\bar{x}$   Std. Dev., s
1 (lean)    10   […]               […]
2 (obese)   10   […]               67.498

Using Option 2 (conservative degrees of freedom in the absence of technology), $n_1 - 1 = n_2 - 1 = 9$ and $t^* = 1.833$, giving:
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ = […] ± 73.390 = […] to […] minutes
Software using Option 1 gives df = […] and $t^* = 1.752$, for a confidence interval of […] to […] minutes, narrower because Option 2 is conservative.
CONCLUDE: Whichever interval we report, we are (at least) 90% confident that the mean difference in average daily minutes spent standing and walking between lean and mildly obese adults lies in this interval.

11 Two-Sample t Procedures: Two-Sample t Test
THE TWO-SAMPLE t PROCEDURES
To test the hypothesis $H_0: \mu_1 = \mu_2$, calculate the two-sample t statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Find P-values from the t distribution with degrees of freedom from either Option 1 (software) or Option 2 (the smaller of $n_1 - 1$ and $n_2 - 1$).
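A minimal sketch of the test statistic under $H_0: \mu_1 = \mu_2$, computed from summary statistics. The function name `two_sample_t` is hypothetical; the inputs are the community service summary statistics that appear later in these slides.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t statistic for H0: mu1 = mu2 (unequal-variance form)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2) / se

t = two_sample_t(105.32, 14.68, 57, 96.82, 14.26, 17)
print(round(t, 3))  # about 2.142, matching the slide's worked example
```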

12 Two-Sample t Test for the Difference Between Two Means

13 Example Community service and attachment to friends
STATE: Do college students who have volunteered for community service work differ from those who have not? A study obtained data from 57 students who had done service work and 17 who had not. One of the response variables was a measure of attachment to friends. Here are the results:

Group   Condition    n    $\bar{x}$   s
1       Service      57   105.32      14.68
2       No service   17   96.82       14.26

PLAN: The investigator had no specific direction for the difference in mind before looking at the data, so the alternative is two-sided. We will test the following hypotheses:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$

14 Example SOLVE: The two-sample t statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{105.32 - 96.82}{\sqrt{\frac{14.68^2}{57} + \frac{14.26^2}{17}}} = \frac{8.5}{3.9677} = 2.142$$
Software (Option 1) says that the two-sided P-value is […]. Using Option 2, $n_1 - 1 = 56$ and $n_2 - 1 = 16$, so we compare our test statistic of 2.142 to two-sided critical values of a t(16) distribution; Table C shows the P-value is between 0.04 and 0.05.
CONCLUDE: The data give moderately strong evidence (P < 0.05) that students who have engaged in community service are, on the average, more attached to their friends.
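The P-value lookup can be checked without a table. The sketch below is one way to do it using only the standard library: it integrates the t density numerically with Simpson's rule. The function names `t_pdf` and `two_sided_p` are hypothetical, and the Option 1 degrees of freedom (about 26.94) are an assumption computed from the Welch-Satterthwaite approximation, not a value given in the slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: area outside [-|t|, |t|], via Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    return 1.0 - 2.0 * area_0_to_a

# Option 2 (df = 16) and an assumed Option 1 value (Welch df of about 26.94).
print(two_sided_p(2.142, 16))
print(two_sided_p(2.142, 26.94))
```

Both values fall between 0.04 and 0.05, consistent with the Table C bracket, and the Option 2 value is the larger of the two because it uses fewer degrees of freedom.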

15 Robustness Again
The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric.
When the sizes of the two samples are equal and the two populations being compared have distributions with similar shapes, probability values from the t table are quite accurate for a broad range of distributions when the sample sizes are as small as $n_1 = n_2 = 5$.
When the two population distributions have different shapes, larger samples are needed.
As a guide to practice, adapt the guidelines for one-sample t procedures to two-sample procedures by replacing “sample size” with the “sum of the sample sizes,” $n_1 + n_2$.
Caution: In planning a two-sample study, choose equal sample sizes whenever possible. The two-sample t procedures are most robust against non-Normality in this case, and the conservative Option 2 probability values are most accurate.

16 Avoid the Pooled Two-Sample t Procedures*
Many calculators and software packages offer a choice of two-sample t statistics. One is often labeled for “unequal” variances; the other for “equal” variances. The “unequal” variance procedure is our two-sample t. The “equal” variance procedure is the pooled two-sample t, which assumes that both populations have the same standard deviation. Never use the pooled t procedures if you have software or technology that will implement the “unequal” variance procedure.
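To see why the choice matters, the sketch below compares the two statistics on hypothetical summary data (not from the slides) in which the smaller sample has the larger spread. The function names `unequal_variance_t` and `pooled_t` are hypothetical; the pooled form uses the standard pooled variance estimate with $n_1 + n_2 - 2$ degrees of freedom.

```python
import math

def unequal_variance_t(xbar1, s1, n1, xbar2, s2, n2):
    """The two-sample t statistic used in these slides."""
    return (xbar1 - xbar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled t statistic, which assumes equal population standard deviations."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (xbar1 - xbar2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical data: large sample with small spread, small sample with large spread.
args = (50.0, 5.0, 40, 40.0, 20.0, 8)
print(round(unequal_variance_t(*args), 2))  # about 1.41
print(round(pooled_t(*args), 2))            # about 2.85
```

In this hypothetical case the pooled statistic is roughly twice as large, so it would overstate the evidence when the equal-variance assumption fails. When $n_1 = n_2$ the two statistics coincide and only the degrees of freedom differ.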

17 Avoid Inference About Standard Deviations*
There are methods for inference about the standard deviations of Normal populations. The most common such method is the “F test” for comparing the standard deviations of two Normal populations. Unlike the t procedures for means, the F test for standard deviations is extremely sensitive to non-Normal distributions. We do not recommend trying to do inference about population standard deviations in basic statistical practice.

