Chapter 13 Comparing Two Population Parameters AP Statistics Hamilton and Mann
Lipitor or Pravachol Which drug is more effective at lowering “bad cholesterol?” To figure this out, researchers designed a study they called PROVE-IT. They used 4000 people with heart disease as subjects. These people were randomly assigned to one of two treatment groups: Lipitor or Pravachol. At the end of the study, researchers compared the mean “bad cholesterol levels” for the two groups. For Pravachol it was 95 mg/dl versus 62 mg/dl for Lipitor. Is this difference statistically significant? This is a question about comparing two means.
Lipitor or Pravachol The researchers also compared the proportion of subjects in each group who died, had a heart attack, or suffered other serious consequences within two years. For Pravachol, the proportion was 0.263 and for Lipitor it was 0.224. Is this a statistically significant difference? This is a question about comparing two proportions.
Success vs. Failure in Business How do small businesses that fail differ from small businesses that succeed? Business school researchers compared the asset liability ratios of two samples of firms started in 2000, one sample of failed businesses and one of firms that are still going after two years. This observational study compares two random samples, one from each of two different populations.
Two-Sample Problems Comparing two populations or two treatments is one of the most common situations encountered in statistical practice. We call such situations two-sample problems.
Two-Sample Problems A two-sample problem can arise from a randomized comparative experiment that randomly divides subjects into two groups and exposes each group to a different treatment, like the PROVE-IT Study. Comparing random samples separately selected from two populations, like the successful and failed small businesses, is also a two-sample problem. Unlike the matched pairs designs studied earlier, there is no matching of units in the two samples, and the two samples can be of different sizes. Inference procedures for two-sample data differ from those for matched pairs.
Comparing Means and Proportions Who is more likely to binge drink: male or female college students? This is obviously a two-sample problem because we are comparing the population of male college students to female college students. To conduct this study, the Harvard School of Public Health surveyed random samples of male and female undergraduates at four-year colleges and universities about their drinking behaviors. This observational study was designed to compare the proportion of undergraduate males who binge drink with the proportion of undergraduate females who binge drink.
Comparing Means and Proportions A bank wants to know which of two incentive plans will most increase the use of its credit cards. We are comparing the effect of two different treatments here, so it is a two-sample problem. It offers each incentive to a random sample of credit card customers and compares the amount charged during the following six months. This is a randomized experiment designed to compare the mean amount spent under each of the two incentive “treatments.”
CHAPTER 13 SECTION 1 Comparing Two Means HW: 13.1, 13.2, 13.4, 13.6, 13.8, 13.10, 13.11, 13.14, 13.16
Comparing Two Means We can examine two-sample data graphically by comparing dotplots or stemplots (for small samples) and boxplots or histograms (for large samples). Now we will apply the ideas of formal inference in this setting. When both population distributions are symmetric, and especially when they are approximately Normal, a comparison of the mean responses in the two populations is the most common goal of inference.
Notation There are four unknown parameters, the two means and the two standard deviations. We want to compare the two population means, either by giving a confidence interval for their difference $\mu_1 - \mu_2$ or by testing the hypothesis of no difference, $H_0: \mu_1 = \mu_2$. We use the sample means and standard deviations to estimate the unknown parameters.
Population | Variable | Parameter: Mean | Parameter: Standard Deviation | Statistic: Sample Size | Statistic: Mean | Statistic: Standard Deviation
1 | $x_1$ | $\mu_1$ | $\sigma_1$ | $n_1$ | $\bar{x}_1$ | $s_1$
2 | $x_2$ | $\mu_2$ | $\sigma_2$ | $n_2$ | $\bar{x}_2$ | $s_2$
Calcium and Blood Pressure Does increasing the amount of calcium in our diet reduce blood pressure? An examination of a large number of people revealed a relationship between calcium intake and blood pressure. The relationship was strongest for black men. As a result, researchers designed a randomized comparative experiment. The subjects were 21 healthy black men. A randomly chosen group of 10 of the men received calcium supplements for 12 weeks. The other 11 men received a placebo pill that looked similar for the 12 weeks.
Calcium and Blood Pressure The response variable is the decrease in systolic blood pressure for a subject after 12 weeks. An increase appears as a negative response. Group 1 will be the calcium group and Group 2 will be the placebo group. Here are the data.
Group 1 – Calcium Group: 7, -4, 18, 17, -3, -5, 1, 10, 11, -2
Group 2 – Placebo Group: -1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3
Here are the summary statistics.
Group | Treatment | n | $\bar{x}$ | s
1 | Calcium | 10 | 5.000 | 8.743
2 | Placebo | 11 | -0.273 | 5.901
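The summary statistics can be checked directly from the raw data. The sketch below uses only Python's standard library. Note one assumption: the placebo values are a reconstruction chosen to be consistent with the reported n = 11, mean -0.273, and s = 5.901, since the transcript lost some entries.

```python
from statistics import mean, stdev

# Decrease in systolic blood pressure after 12 weeks (negative = increase).
calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
# Assumption: placebo values reconstructed to agree with the reported
# n = 11, mean = -0.273, s = 5.901.
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]

summary = {}
for name, data in (("Calcium", calcium), ("Placebo", placebo)):
    # stdev() is the sample standard deviation (divides by n - 1).
    summary[name] = (len(data), mean(data), stdev(data))
    n, xbar, s = summary[name]
    print(f"{name}: n = {n}, mean = {xbar:.3f}, s = {s:.3f}")
```

If the printed values match the summary table, the data and the table agree.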
Calcium and Blood Pressure Notice that the calcium group experienced a drop in blood pressure, while the placebo group shows a small increase. Is this good evidence that calcium decreases blood pressure in the entire population of healthy black men more than a placebo does? This example fits the two-sample setting because we have a separate sample from each treatment and we have not attempted to match them. Since we are testing a claim, we will conduct a significance test and follow the Inference Toolbox.
Calcium and Blood Pressure Step 1: Hypotheses – We write the hypotheses in terms of the mean decreases we would see in the entire population: $\mu_1$ for black men taking calcium for 12 weeks and $\mu_2$ for black men taking the placebo for 12 weeks. There are two equivalent ways to write the hypotheses: $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 > \mu_2$, or $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 > 0$.
Calcium and Blood Pressure Step 2 – Conditions – We do not know the name of the test yet, but we know the conditions we must check to compare two means. – SRS – The 21 subjects are not an SRS. Therefore, we may not be able to generalize our findings to all healthy black men. Since we randomly assigned treatments, however, any differences can be attributed to the treatments themselves. – Normality – Since we have small samples, we must look at a boxplot and histogram for both samples. There are no serious problems (no outliers or serious departures from Normality). – Independence – Since we randomized the treatments, we can safely treat the calcium and placebo groups as two independent samples.
Calcium and Blood Pressure The natural estimator of the difference $\mu_1 - \mu_2$ is the difference between the sample means: $\bar{x}_1 - \bar{x}_2 = 5.000 - (-0.273) = 5.273$. This statistic measures the average advantage of calcium over the placebo. In order to use this, however, we need to know about its sampling distribution. In other words, we need to know what the mean and standard deviation of the differences would be if we took repeated samples many times.
The Two-Sample z Statistic Here are the facts about the sampling distribution of the difference $\bar{x}_1 - \bar{x}_2$ between the two sample means of independent SRSs. The mean of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$. The variance of the difference is the sum of the variances of the two sample means, $\sigma_1^2/n_1 + \sigma_2^2/n_2$. Therefore, the standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. If both populations are Normal, then the distribution of $\bar{x}_1 - \bar{x}_2$ is also Normal with this mean and standard deviation.
Two-Sample z Statistic When the statistic $\bar{x}_1 - \bar{x}_2$ has a Normal distribution, we can standardize it to obtain a standard Normal z statistic: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$.
Two-Sample z Statistic In the very unlikely case that we know both population standard deviations, the two-sample z statistic is what we would use to conduct inference about $\mu_1 - \mu_2$. Since we rarely know one, much less two, population standard deviations, we are going to move immediately to the more useful t procedures.
Two-Sample t Procedures Because we don’t know the population standard deviations, we estimate them with the standard deviations from our two samples. The result is the standard error, or estimated standard deviation, of the difference in sample means: $SE = \sqrt{s_1^2/n_1 + s_2^2/n_2}$. When we standardize our estimate, the result is the two-sample t statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$.
Two-Sample t Procedures The statistic t has the same interpretation as any z or t statistic: it says how far $\bar{x}_1 - \bar{x}_2$ is from its mean in standard deviation units. The two-sample t statistic has approximately a t distribution. It does not have exactly a t distribution even if the populations are both exactly Normal. The approximation is very close though. There is a catch: we must use a messy formula to calculate the degrees of freedom. Often, the degrees of freedom are not whole numbers.
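A short simulation can make these facts concrete. The sketch below is only an illustration: the population means, standard deviations, and sample sizes are hypothetical values chosen for the demonstration, not values from any study in this chapter. It draws many pairs of independent samples from two Normal populations and compares the simulated mean and standard deviation of the difference in sample means with $\mu_1 - \mu_2$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical Normal populations (illustrative values only).
mu1, sigma1, n1 = 5.0, 9.0, 10
mu2, sigma2, n2 = 0.0, 6.0, 11

def sample_mean(mu, sigma, n):
    """Mean of one random sample of size n from a Normal(mu, sigma) population."""
    return fmean(random.gauss(mu, sigma) for _ in range(n))

# Difference of sample means for many independent pairs of samples.
diffs = [sample_mean(mu1, sigma1, n1) - sample_mean(mu2, sigma2, n2)
         for _ in range(10_000)]

theory_mean = mu1 - mu2
theory_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
sim_mean, sim_sd = fmean(diffs), pstdev(diffs)

print(f"mean of differences: simulated {sim_mean:.3f}, theory {theory_mean:.3f}")
print(f"SD of differences:   simulated {sim_sd:.3f}, theory {theory_sd:.3f}")
```

The simulated values should land close to the theoretical ones, and get closer as the number of repetitions grows.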
Two-Sample t Procedures There are two practical options for using the two-sample t procedures: 1. With technology, use the statistic t with accurate critical values from the approximating t distribution. 2. Without technology, use the statistic t with critical values from the t distribution with degrees of freedom equal to the smaller of $n_1 - 1$ and $n_2 - 1$. These procedures are always conservative for any two Normal populations. Technology will obviously use method 1. We are going to start by looking at how to do method 2.
Two-Sample t Procedures These two-sample t procedures always err on the safe side, reporting higher P-values and lower confidence than may actually be true. The gap between what is reported and the truth is actually quite small unless the sample sizes are both small and unequal. As the sample sizes increase, probability values based on t with degrees of freedom equal to the smaller of $n_1 - 1$ and $n_2 - 1$ become more accurate. Let’s complete our calcium and blood pressure problem from earlier.
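Calcium and Blood Pressure Here are the summary statistics again.
Group | Treatment | n | $\bar{x}$ | s
1 | Calcium | 10 | 5.000 | 8.743
2 | Placebo | 11 | -0.273 | 5.901
Step 3 – Calculations – The two-sample t statistic is $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{5.000 - (-0.273)}{\sqrt{8.743^2/10 + 5.901^2/11}} = \frac{5.273}{3.288} = 1.604$. Since it is a one-sided test, we are looking for the probability of a t value of 1.604 or greater when we have 9 degrees of freedom (the smaller of $n_1 - 1 = 9$ and $n_2 - 1 = 10$). From the table, it is between 0.05 and 0.10.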
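The Step 3 arithmetic can be sketched in code. This is a minimal illustration using only the standard library. The helper names `t_pdf` and `t_upper_tail` are made up for this sketch, and the tail area is found by numerical integration (Simpson's rule) rather than by a table or a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the slide.
n1, xbar1, s1 = 10, 5.000, 8.743   # calcium
n2, xbar2, s2 = 11, -0.273, 5.901  # placebo

se = math.sqrt(s1**2 / n1 + s2**2 / n2)       # standard error of the difference
t_stat = (xbar1 - xbar2) / se                 # two-sample t statistic
df_conservative = min(n1 - 1, n2 - 1)         # "method 2" degrees of freedom
p_value = t_upper_tail(t_stat, df_conservative)  # one-sided: Ha is mu1 > mu2

print(f"SE = {se:.3f}, t = {t_stat:.3f}, df = {df_conservative}, P = {p_value:.4f}")
```

The P-value should fall between 0.05 and 0.10, in line with the table lookup.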
Calcium and Blood Pressure Step 4 – Interpretation – The experiment provides some evidence that calcium reduces blood pressure, but the evidence falls short of the traditional 5% and 1% levels of significance. We would fail to reject H 0 at both significance levels.
Creating a Confidence Interval We can estimate the difference in mean decreases in blood pressure for the hypothetical calcium and placebo populations using a two-sample t interval. We have already checked all of the conditions.
Group | Treatment | n | $\bar{x}$ | s
1 | Calcium | 10 | 5.000 | 8.743
2 | Placebo | 11 | -0.273 | 5.901
Recall that $\bar{x}_1 - \bar{x}_2 = 5.273$ with standard error 3.288. With the conservative 9 degrees of freedom, the 90% critical value is $t^* = 1.833$, so the interval is $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{s_1^2/n_1 + s_2^2/n_2} = 5.273 \pm 1.833(3.288) = 5.273 \pm 6.027$, or about $(-0.75, 11.30)$. Since the 90% confidence interval includes 0, we cannot reject $H_0: \mu_1 - \mu_2 = 0$ against the two-sided alternative at the α = 0.10 level of significance.
Sample Size Matters Sample sizes strongly influence the P-value of a test. A result that fails to be significant at a specified level α in a small sample may be significant in a larger sample. For instance, the difference of 5.273 in the mean decrease in systolic blood pressure between our two groups was not significant. In a larger study with more subjects, the researchers obtained a P-value of 0.008.
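As a rough check of the interval, the calculation can be written out in a few lines. The critical value 1.833 is taken as given from a t table (90% confidence, 9 degrees of freedom, the conservative choice); it is typed in rather than computed.

```python
import math

# Summary statistics from the slide.
n1, xbar1, s1 = 10, 5.000, 8.743   # calcium
n2, xbar2, s2 = 11, -0.273, 5.901  # placebo

t_star = 1.833  # table value: 90% confidence, df = min(n1 - 1, n2 - 1) = 9

diff = xbar1 - xbar2
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
margin = t_star * se
lower, upper = diff - margin, diff + margin

print(f"{diff:.3f} +/- {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
print("interval contains 0:", lower < 0 < upper)
```

Because 0 lies inside the interval, the data are consistent with no difference at the 10% level.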
Robustness Again The two-sample t procedures are more robust than the one-sample t procedures, particularly when the distributions are not symmetric. When the sizes of the two samples are equal and the two populations being compared have distributions with similar shapes, probability values from the t table are quite accurate for a broad range of distributions for samples as small as 5. When the populations have different shapes, larger samples are needed.
Robustness Again As a guide to practice, adapt the guidelines on p. 655 for the use of one-sample t procedures to two-sample t procedures by replacing “sample size” with the “sum of the sample sizes,” as long as both samples are at least 5. These guidelines err on the side of safety, especially when the two samples are of equal size. Whenever possible, try to make both samples the same size. Two-sample procedures are most robust against non-Normality when the sample sizes are equal, and that is also when the conservative P-values are most accurate.
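Software Approximations for the DF Software approximates the degrees of freedom from the data using $df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{1}{n_1 - 1}\left(s_1^2/n_1\right)^2 + \frac{1}{n_2 - 1}\left(s_2^2/n_2\right)^2}$. The t procedures remain exactly as before except that we use the t distribution with df given by this formula to give critical values and find P-values.
Calcium and Blood Pressure Here are the summary statistics again.
Group | Treatment | n | $\bar{x}$ | s
1 | Calcium | 10 | 5.000 | 8.743
2 | Placebo | 11 | -0.273 | 5.901
For improved accuracy, let’s calculate the df given by the formula on the prior slide: $df = \frac{\left(8.743^2/10 + 5.901^2/11\right)^2}{\frac{1}{9}\left(8.743^2/10\right)^2 + \frac{1}{10}\left(5.901^2/11\right)^2} = \frac{116.85}{7.494} \approx 15.59$.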
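The degrees-of-freedom formula is tedious by hand but short in code. The sketch below computes it for the calcium data and then finds the one-sided P-value with that non-integer df. The helper names are made up for this illustration, and the tail area again comes from numerical integration rather than a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution; df may be a non-integer."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def software_df(s1, n1, s2, n2):
    """Approximate degrees of freedom used by software for the two-sample t."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

n1, xbar1, s1 = 10, 5.000, 8.743   # calcium
n2, xbar2, s2 = 11, -0.273, 5.901  # placebo

df = software_df(s1, n1, s2, n2)
t_stat = (xbar1 - xbar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
p_value = t_upper_tail(t_stat, df)

print(f"df = {df:.2f}, t = {t_stat:.3f}, one-sided P = {p_value:.3f}")
# The df always lies between min(n1, n2) - 1 and n1 + n2 - 2.
print(min(n1, n2) - 1 <= df <= n1 + n2 - 2)
```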
Notice that the P-value here is 0.064 compared to the 0.0716 we got from the conservative approach.
Degrees of Freedom The formula from the box will always give us df at least as large as the smaller of $n_1 - 1$ and $n_2 - 1$ and never bigger than $n_1 + n_2 - 2$. The number of degrees of freedom is generally not a whole number. Since the table only has whole numbers, we will need to use technology to do these calculations easily. Let’s do the Calcium and Blood Pressure problem on the calculator! We should use the calculator to do these calculations from now on!
DDT Poisoning Poisoning by the pesticide DDT causes convulsions in humans and other mammals. Researchers seek to understand how the convulsions are caused. In a randomized comparative experiment, they compared 6 white rats poisoned with DDT with a control group of 6 unpoisoned rats. Electrical measurements of nerve activity are the main clue to the nature of DDT poisoning. When a nerve is stimulated, its electrical response shows a sharp spike followed by a much smaller second spike. The experiment found that the second spike is larger in rats fed DDT than in normal rats.
DDT Poisoning The researchers measured the height (or amplitude) of the second spike as a percent of the first spike when a nerve in the rat’s leg was stimulated.
For the poisoned rats the results were: 12.207, 16.869, 25.050, 22.429, 8.456, 20.589
For the control group the results were: 11.074, 9.686, 12.064, 9.351, 8.182, 6.642
Let’s conduct a significance test at the 0.05 significance level to determine if there is a difference using the calculator.
DDT Poisoning Step 1 – Hypotheses – We want to compare the mean height $\mu_1$ of the second-spike electrical response in the population of rats fed DDT with the mean height $\mu_2$ of the second-spike electrical response in the population of normal rats: $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$, or equivalently $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$.
DDT Poisoning Step 2 – Conditions – Since both population standard deviations are unknown we need to conduct a 2-sample t test. – SRS – By randomly assigning the rats to the treatments, we can conclude that differences are a result of the treatment. The researchers are willing to assume that the two samples of rats represent an SRS. – Normality – We don’t know if the populations are Normal and do not have a large enough sample. We must look at a boxplot and histogram. No outliers or heavy skewness. – Independence – Due to the random assignment, the researchers can treat the two groups as independent.
DDT Poisoning Step 3 – Calculations – The two-sample t statistic is t = 2.99. Since it is a two-sided hypothesis, we must find the probability that t is less than -2.99 or greater than 2.99. – The degrees of freedom are df = 5.9 and the P-value from the t(5.9) distribution is 0.0246. Step 4 – Conclusion – Since 0.0246 is less than the significance level of 0.05, we reject the null hypothesis. There is sufficient evidence to conclude that the mean height of the second-spike electrical response in rats fed DDT differs from that of normal rats.
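The calculator results for the DDT example can be reproduced from the raw data. This is a sketch using only the standard library; the helper names are made up for the illustration, and the P-value is found by numerical integration of the t density. A statistics package that offers an unpooled two-sample t test should give essentially the same numbers.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of the t distribution; df may be a non-integer."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Second-spike height as a percent of the first spike.
ddt = [12.207, 16.869, 25.050, 22.429, 8.456, 20.589]
control = [11.074, 9.686, 12.064, 9.351, 8.182, 6.642]

v1 = variance(ddt) / len(ddt)          # s1^2 / n1
v2 = variance(control) / len(control)  # s2^2 / n2

t_stat = (mean(ddt) - mean(control)) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (len(ddt) - 1) + v2**2 / (len(control) - 1))
p_value = 2 * t_upper_tail(abs(t_stat), df)  # two-sided alternative

print(f"t = {t_stat:.2f}, df = {df:.1f}, P = {p_value:.4f}")
print("reject H0 at the 0.05 level:", p_value < 0.05)
```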
Pooled Two-Sample t Procedures Do not use them. If a printout says pooled, do not use that. Instead use the one that says unpooled. On the calculator, always do No for pooled. If you want more information you can read it on p. 800.
Prayer and In Vitro Pregnancy Some women want to have children but cannot for medical reasons. One option for these women is in vitro fertilization. About 28% of women who undergo in vitro fertilization get pregnant. Can praying for these women help increase the pregnancy rate? Researchers developed an experiment to help answer this question. (Why not just survey women who have already gone through in vitro to find out if a higher percentage of women who were prayed for got pregnant?)
Prayer and In Vitro Pregnancy A large group of women who were about to undergo in vitro fertilization served as the subjects. Each subject was randomly assigned to the treatment group (prayed for by people who did not know them) or a control group (no prayer). The results: 44 of the 88 women (50%) got pregnant in the treatment (prayer) group while only 21 out of 81 got pregnant in the control group. This seems like a large difference, but is it statistically significant?
Two-Sample Proportions We will use notation that is similar to what we used for two-sample means. We still want to compare two groups, Population 1 and Population 2. Here is the notation:
Population | Population Proportion | Sample Size | Sample Proportion
1 | $p_1$ | $n_1$ | $\hat{p}_1$
2 | $p_2$ | $n_2$ | $\hat{p}_2$
We compare the populations by doing inference about the difference $p_1 - p_2$ between the population proportions. The statistic that estimates this difference is $\hat{p}_1 - \hat{p}_2$, the difference between the two sample proportions.
Does Preschool Help? To study the long-term effects of preschool programs for poor children, the High/Scope Educational Research Foundation has followed two groups of Michigan children since early childhood. – Group 1: Control Group – 61 children from population 1, poor children with no preschool – Group 2: Treatment Group – 62 children from population 2, poor children with preschool as 3- and 4-year-olds. Both groups were from the same area and had similar backgrounds. So our sample sizes are n 1 = 61 and n 2 = 62.
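Does Preschool Help? One response variable of interest is the need for social services as adults. In the past ten years, 49 of the control sample and 38 of the preschool sample had needed social services. So the sample proportions are: $\hat{p}_1 = 49/61 = 0.803$ and $\hat{p}_2 = 38/62 = 0.613$. To see if the study provides significant evidence that preschool reduces the later need for social services, we are going to create a 95% confidence interval.
Does Preschool Help? To estimate how large the reduction is, we give a confidence interval for the difference. Both the test and the confidence interval start with the difference in the sample proportions: $\hat{p}_1 - \hat{p}_2 = 0.803 - 0.613 = 0.190$. This means we need to know the sampling distribution of $\hat{p}_1 - \hat{p}_2$. So let’s look at that now!
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$ Both $\hat{p}_1$ and $\hat{p}_2$ are random variables because their values would vary if we took repeated samples of the same size. In Chapter 7, we learned that if X and Y are any two random variables then $\mu_{X-Y} = \mu_X - \mu_Y$, and that if X and Y are independent then $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$. In Chapter 9, we learned that $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$ Using all of this information, we can find the mean and standard deviation of $\hat{p}_1 - \hat{p}_2$. The mean is $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$. If the two sample proportions are independent, $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$. Thus $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$ As for shape, the distribution of $\hat{p}_1 - \hat{p}_2$ will be approximately Normal when the distributions of $\hat{p}_1$ and $\hat{p}_2$ are both approximately Normal. In other words, the counts $n_1 p_1$, $n_1(1-p_1)$, $n_2 p_2$, and $n_2(1-p_2)$ must all be large enough. Actually, we are safe performing significance tests about $p_1 - p_2$ as long as all of these values are greater than 5. The distribution of $\hat{p}_1 - \hat{p}_2$ is on the next graph.
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$ The standard deviation of $\hat{p}_1 - \hat{p}_2$ involves the unknown parameters $p_1$ and $p_2$. Just like in Chapter 12, we must replace these by estimates in order to do inference. Just like in Chapter 12, we do this a bit differently for confidence intervals and significance tests.
Confidence Intervals for $p_1 - p_2$ To obtain a confidence interval, replace $p_1$ and $p_2$ in the expression for $\sigma_{\hat{p}_1 - \hat{p}_2}$ with the sample proportions. The result is the standard error of the statistic $\hat{p}_1 - \hat{p}_2$: $SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. The confidence interval again has the form estimate ± $z^* \cdot SE$, that is, $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$.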
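Does Preschool Help? Here is a summary of the information from the preschool problem we discussed earlier.
Population | Population Description | Sample Size | Sample Proportion
1 | Control | $n_1 = 61$ | $\hat{p}_1 = 49/61 = 0.803$
2 | Preschool | $n_2 = 62$ | $\hat{p}_2 = 38/62 = 0.613$
We set up our hypotheses earlier, so we have already done Step 1. Here are the hypotheses as a reminder: $H_0: p_1 = p_2$ versus $H_a: p_1 > p_2$, or $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 > 0$.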
Does Preschool Help Step 2 – Conditions – We are going to construct a two-proportion z interval. – SRS – We were not told how the children were selected, so we must be cautious when drawing conclusions. – Normality – Since the counts of successes and failures in both samples (49 and 12 for the control group, 38 and 24 for the preschool group) are all at least 5, we can assume Normality. – Independence – We are fairly certain that there are at least 610 poor children who did not attend preschool and 620 poor children who did attend preschool in our populations of interest.
Does Preschool Help Step 3 – Calculations – $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (0.803 - 0.613) \pm 1.96 \sqrt{\frac{0.803(0.197)}{61} + \frac{0.613(0.387)}{62}} = 0.190 \pm 1.96(0.0801) = 0.190 \pm 0.157 = (0.033, 0.347)$. Step 4 – Interpretation – We are 95% confident that the percent needing social services is between 3.3 and 34.7 percentage points lower among those who attended preschool. The interval is wide because of the small sample sizes. Also, our results may be questionable due to the fact that the samples may not have been SRSs.
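The two-proportion z interval for the preschool data can be sketched as follows. The code uses `statistics.NormalDist` from the standard library to get the critical value instead of reading 1.96 from a table; variable names are chosen for this illustration.

```python
import math
from statistics import NormalDist

# Counts needing social services, from the slides.
x1, n1 = 49, 61  # control (no preschool)
x2, n2 = 38, 62  # preschool

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat

# Standard error uses each sample's own proportion (no pooling for intervals).
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
lower, upper = diff - z_star * se, diff + z_star * se

print(f"difference = {diff:.3f}, SE = {se:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```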
Significance Tests for $p_1 - p_2$ An observed difference in sample proportions may reflect a difference in the populations, or it may just be due to random sampling variation. Significance tests help us to determine if the difference we see is really there or just chance variation. The null hypothesis will always say that there is no difference in the two populations. Hence $H_0: p_1 = p_2$. The alternative hypothesis will always say what kind of difference we expect.
Significance Tests for $p_1 - p_2$ To conduct a significance test, we must standardize $\hat{p}_1 - \hat{p}_2$ to get a z statistic. If $H_0$ is true, all the observations in both samples come from a single population. So, instead of estimating $p_1$ and $p_2$ separately, we combine the two samples and use the overall sample proportion to estimate the single population parameter p.
Significance Tests for $p_1 - p_2$ We call this single proportion the combined sample proportion. It is $\hat{p}_c = \frac{X_1 + X_2}{n_1 + n_2}$, where $X_1$ and $X_2$ are the counts of successes in the two samples. Now, we use $\hat{p}_c$ in place of both $\hat{p}_1$ and $\hat{p}_2$ in the expression for the standard error of $\hat{p}_1 - \hat{p}_2$. This yields a z statistic that has the standard Normal distribution when $H_0$ is true: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$.
Cholesterol and Heart Attacks High levels of cholesterol in the blood are associated with higher risk of heart attacks. Does using a drug to lower blood cholesterol reduce heart attacks? The Helsinki Heart Study looked at this question by randomly assigning middle-aged men to one of two treatments: 2051 men took the drug gemfibrozil to reduce their cholesterol levels, and a control group of 2030 men took a placebo. During the next 5 years, 56 men in the gemfibrozil group and 84 men in the control group had heart attacks.
Cholesterol and Heart Attacks Is the apparent benefit of gemfibrozil statistically significant? To answer this question, we need to conduct a significance test. To conduct a significance test we need the sample proportions. So let’s find them.
Population | Population Description | Sample Size | Sample Proportion
1 | Gemfibrozil | $n_1 = 2051$ | $\hat{p}_1 = 56/2051 = 0.0273$
2 | Control | $n_2 = 2030$ | $\hat{p}_2 = 84/2030 = 0.0414$
Cholesterol and Heart Attacks Step 1 – Hypotheses – We want to use this comparative randomized experiment to draw conclusions about $p_1$, the proportion of middle-aged men who would suffer heart attacks after taking gemfibrozil, and $p_2$, the proportion of middle-aged men who would suffer heart attacks if they only took a placebo. We hope to show that gemfibrozil reduces heart attacks, so we have a one-sided alternative: $H_0: p_1 = p_2$ versus $H_a: p_1 < p_2$.
Cholesterol and Heart Attacks Step 2 – Conditions – We are going to conduct a two-proportion z test. – SRS – Since the data come from a comparative randomized experiment, we meet this condition. This will allow us to conclude that the treatment caused the differences we observe. Since the men in the experiment were not randomly selected, we may not be able to generalize our results to the population of all middle-aged men. – Normality – We must use $\hat{p}_c$ to check for Normality since we are assuming that both proportions are the same. Here $\hat{p}_c = (56 + 84)/(2051 + 2030) = 0.0343$. So $n_1\hat{p}_c = 70.4$, $n_1(1-\hat{p}_c) = 1980.6$, $n_2\hat{p}_c = 69.6$, and $n_2(1-\hat{p}_c) = 1960.4$, which are all greater than 5. – Independence – Due to the random assignment of men, the two groups of men can be viewed as independent samples.
Cholesterol and Heart Attacks Step 3 – Calculations – $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.0273 - 0.0414}{\sqrt{0.0343(0.9657)\left(\frac{1}{2051} + \frac{1}{2030}\right)}} = \frac{-0.0141}{0.0057} = -2.47$. We believed the drug would decrease heart attacks, so we need the probability that z is less than or equal to -2.47. The P-value is 0.0068.
Cholesterol and Heart Attacks Step 4 – Interpretation – Since our P-value (0.0068) is less than 0.01, our results are significant at the α = 0.01 significance level. So there is strong evidence that gemfibrozil reduced the rate of heart attacks.
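The Helsinki Heart Study calculation can be sketched in code. The function name `two_proportion_z` is made up for this illustration; it uses the combined sample proportion in the standard error, as the test requires, and the Normal tail area comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the combined sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # combined proportion under H0
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Heart attacks: 56 of 2051 on gemfibrozil, 84 of 2030 on placebo.
z = two_proportion_z(56, 2051, 84, 2030)

# One-sided alternative Ha: p1 < p2, so the P-value is the lower tail.
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("significant at alpha = 0.01:", p_value < 0.01)
```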
Don’t Drink the Water The movie A Civil Action tells the story of a legal battle that took place in the small town of Woburn, Massachusetts. A town well that supplied water to East Woburn residents was contaminated by industrial chemicals. During the period that residents drank the water from this well, a sample of 414 births showed 16 birth defects. On the west side of Woburn, a sample of 228 babies born during the same time period revealed 3 with birth defects. The plaintiffs suing the companies responsible for the contamination claimed that these data show that the rate of birth defects was significantly higher in East Woburn, where the contaminated well water was in use. How strong is the evidence supporting the claim? What decision should the judge make?
Don’t Drink the Water To conduct a significance test we need the sample proportions. So let’s find them.
Population | Population Description | Sample Size | Sample Proportion
1 | East Woburn | $n_1 = 414$ | $\hat{p}_1 = 16/414 = 0.0386$
2 | West Woburn | $n_2 = 228$ | $\hat{p}_2 = 3/228 = 0.0132$
Step 1 – Hypotheses – We are interested in seeing whether the proportion of birth defects was higher in East Woburn than in West Woburn: $H_0: p_1 = p_2$ versus $H_a: p_1 > p_2$.
Don’t Drink the Water Step 2 – Conditions – We are going to conduct a two-proportion z test. – SRS – We don’t know that they are SRSs, but we will treat them as SRSs. – Normality – We must check our rules using $\hat{p}_c = (16 + 3)/(414 + 228) = 0.0296$. Since $n_1\hat{p}_c = 12.3$, $n_1(1-\hat{p}_c) = 401.7$, $n_2\hat{p}_c = 6.7$, and $n_2(1-\hat{p}_c) = 221.3$ are each larger than 5, the sampling distribution is approximately Normal. – Independence – We must assume that both populations are at least 10 times as large as the sample of babies.
Don’t Drink the Water Step 3 – Calculations – $z = \frac{16/414 - 3/228}{\sqrt{0.0296(0.9704)\left(\frac{1}{414} + \frac{1}{228}\right)}} \approx \frac{0.0255}{0.0140} \approx 1.82$. The P-value is the probability that z is 1.82 or greater. Step 4 – Interpretation – Since the P-value (0.0344) is smaller than the usual level of significance of 0.05, we reject the null hypothesis and conclude that there is reason to believe that the proportion of birth defects was higher in East Woburn.
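The Woburn test can be sketched the same way, this time including the Normality check on the expected counts. Variable names are chosen for this illustration. The code carries unrounded values throughout, so its P-value may differ slightly in the last digit from one read off a table at z = 1.82.

```python
import math
from statistics import NormalDist

x1, n1 = 16, 414  # East Woburn: birth defects, births sampled
x2, n2 = 3, 228   # West Woburn

p1_hat, p2_hat = x1 / n1, x2 / n2
p_c = (x1 + x2) / (n1 + n2)  # combined sample proportion under H0

# Normality check: all four expected counts should exceed 5.
counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
conditions_ok = all(c > 5 for c in counts)

se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

# One-sided alternative Ha: p1 > p2, so the P-value is the upper tail.
p_value = 1 - NormalDist().cdf(z)

print("expected counts:", [round(c, 1) for c in counts], "all > 5:", conditions_ok)
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```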