Inference about Two Populations

Chapter 13: Inference about Two Populations

13.1 Introduction. A variety of techniques are presented whose objective is to compare two populations. We are interested in: the difference between two means; the ratio of two variances; the difference between two proportions.

13.2 Inference about the Difference between Two Means: Independent Samples. Two random samples are drawn from the two populations of interest. Because we compare two population means, we use the statistic \(\bar{X}_1 - \bar{X}_2\).

The sampling distribution of \(\bar{X}_1 - \bar{X}_2\): \(\bar{X}_1 - \bar{X}_2\) is normally distributed if the (original) population distributions are normal. It is approximately normally distributed if the (original) populations are not normal but the sample sizes are sufficiently large (greater than 30). The expected value of \(\bar{X}_1 - \bar{X}_2\) is \(\mu_1 - \mu_2\). The variance of \(\bar{X}_1 - \bar{X}_2\) is \(\sigma_1^2/n_1 + \sigma_2^2/n_2\).

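Making an inference about \(\mu_1 - \mu_2\). If the sampling distribution of \(\bar{X}_1 - \bar{X}_2\) is normal or approximately normal, we can write
\[Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\]
Z can be used to build a test statistic or a confidence interval for \(\mu_1 - \mu_2\).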
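A minimal Python sketch of the Z statistic and the matching confidence interval, assuming both population variances are known. The helper names `z_two_means` and `z_interval` and all input numbers are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def z_two_means(xbar1, xbar2, var1, var2, n1, n2, delta0=0.0):
    """Z statistic for mu1 - mu2 when the population variances var1, var2 are known."""
    se = math.sqrt(var1 / n1 + var2 / n2)  # standard deviation of xbar1 - xbar2
    return ((xbar1 - xbar2) - delta0) / se


def z_interval(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """Confidence interval for mu1 - mu2 when the population variances are known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = math.sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical inputs, not taken from the slides.
z = z_two_means(52.0, 48.0, 100.0, 125.0, 50, 50)
lcl, ucl = z_interval(52.0, 48.0, 100.0, 125.0, 50, 50)
print(f"z = {z:.3f}, 95% CI = ({lcl:.3f}, {ucl:.3f})")
```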

Making an inference about \(\mu_1 - \mu_2\). In practice, the Z statistic is hardly ever used because the population variances \(\sigma_1^2\) and \(\sigma_2^2\) are not known. Instead, we construct a t statistic using the sample variances \(s_1^2\) and \(s_2^2\).

Making an inference about \(\mu_1 - \mu_2\). Two cases are considered when producing the t statistic: the two unknown population variances are equal, or the two unknown population variances are not equal.

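Inference about \(\mu_1 - \mu_2\): Equal variances. Calculate the pooled variance estimate by:
\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]
Example: \(s_1^2 = 25\), \(s_2^2 = 30\), \(n_1 = 10\), \(n_2 = 15\). Then
\[s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = \frac{645}{23} \approx 28.04\]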
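The pooled-variance arithmetic can be checked with a short Python sketch; `pooled_variance` is a hypothetical helper name, and the inputs are the slide's example values.

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Weight each sample variance by its degrees of freedom (n - 1) and average."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


sp_sq = pooled_variance(25, 30, 10, 15)
print(round(sp_sq, 2))  # 28.04
```

The result lies between the two sample variances and sits closer to 30, the variance of the larger sample, because that sample contributes more degrees of freedom.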

Inference about m1 – m2: Equal variances Calculate the pooled variance estimate by: The pooled Variance estimator n2 = 15 n1 = 10 Example: s12 = 25; s22 = 30; n1 = 10; n2 = 15. Then,

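Inference about \(\mu_1 - \mu_2\): Equal variances. Construct the t statistic as follows:
\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \nu = n_1 + n_2 - 2\]
Perform a hypothesis test: \(H_0: \mu_1 - \mu_2 = 0\) versus \(H_1: \mu_1 - \mu_2 > 0\) (or \(< 0\), or \(\neq 0\)). Or build a confidence interval:
\[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\]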

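Inference about \(\mu_1 - \mu_2\): Unequal variances. Construct the t statistic from the individual sample variances:
\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}\]

Conduct a hypothesis test as needed, or build a confidence interval:
\[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]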
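A minimal sketch of the unequal-variances statistic, assuming the Welch-Satterthwaite formula for \(\nu\) given for this case. The function name `welch_t` and the summary statistics are hypothetical; the script also prints the equal-variances count \(n_1 + n_2 - 2\) for comparison.

```python
import math


def welch_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Unequal-variances t statistic and Welch-Satterthwaite degrees of freedom."""
    a = s1_sq / n1  # estimated variance of xbar1
    b = s2_sq / n2  # estimated variance of xbar2
    t = ((xbar1 - xbar2) - delta0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics, not taken from the slides.
t, df = welch_t(20.0, 17.0, 16.0, 64.0, 12, 30)
print(f"t = {t:.3f}, Welch df = {df:.1f}, equal-variances df = {12 + 30 - 2}")
```

For any inputs, the Welch \(\nu\) falls between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\), so the equal-variances test never has fewer degrees of freedom. In practice \(\nu\) is usually rounded to the nearest integer before consulting a t table.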

Which case to use: equal variances or unequal variances? Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal-variances t-test. This is because, for any two given samples, the number of degrees of freedom for the equal-variances case is greater than or equal to the number of degrees of freedom for the unequal-variances case, and more degrees of freedom give smaller critical values of t.

Example: Making an inference about \(\mu_1 - \mu_2\). Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast? A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. For each person, the number of calories consumed at lunch was recorded.

Example: Making an inference about \(\mu_1 - \mu_2\). Solution: The data are interval. The parameter to be tested is the difference between two means. The claim to be tested is that the mean caloric intake of consumers (\(\mu_1\)) is less than that of non-consumers (\(\mu_2\)).

Example: Making an inference about \(\mu_1 - \mu_2\). The hypotheses are \(H_0: \mu_1 - \mu_2 = 0\) and \(H_1: \mu_1 - \mu_2 < 0\). To check whether the population variances are equal, we use computer output (Xm13-01) to find the sample variances. We have \(s_1^2 = 4103\) and \(s_2^2 = 10{,}670\). It appears that the variances are unequal.

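Example: Making an inference about \(\mu_1 - \mu_2\). Compute manually: from the data we obtain the sample means and variances, and from them the unequal-variances t statistic and its degrees of freedom \(\nu\).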

Example: Making an inference about \(\mu_1 - \mu_2\). Compute manually: the rejection region is \(t < -t_{\alpha,\nu} = -t_{.05,123} \approx -1.658\).

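Example: Making an inference about \(\mu_1 - \mu_2\) (Xm13-01). Since \(t = -2.09 < -1.6573\) and the p-value \(.0193 < .05\), at the 5% significance level there is sufficient evidence to reject the null hypothesis and infer that consumers of high-fiber cereal eat fewer calories at lunch, on average, than non-consumers.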

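Example: Making an inference about \(\mu_1 - \mu_2\). Compute manually: the confidence interval estimator for the difference between two means (unequal variances) is
\[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]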

Example: Making an inference about \(\mu_1 - \mu_2\). An ergonomic chair can be assembled using two different sets of operations (Method A and Method B). The operations manager would like to know whether the assembly times under the two methods differ.

Example: Making an inference about \(\mu_1 - \mu_2\). Two samples are randomly and independently selected: a sample of 25 workers assembled the chair using Method A, and a sample of 25 workers assembled the chair using Method B. The assembly times were recorded. Do the assembly times of the two methods differ?

Example: Making an inference about \(\mu_1 - \mu_2\). Assembly times are recorded in minutes. Solution: The data are interval. The parameter of interest is the difference between two population means. The claim to be tested is whether a difference between the two methods exists.

Example: Making an inference about \(\mu_1 - \mu_2\). Compute manually: the hypotheses are \(H_0: \mu_1 - \mu_2 = 0\) and \(H_1: \mu_1 - \mu_2 \neq 0\). To check whether the two unknown population variances are equal, we calculate \(s_1^2\) and \(s_2^2\) (Xm13-02). We have \(s_1^2 = 0.8478\) and \(s_2^2 = 1.3031\). The two population variances appear to be equal.

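Example: Making an inference about \(\mu_1 - \mu_2\). Compute manually: to calculate the t statistic we first need the pooled variance
\[s_p^2 = \frac{(25 - 1)(0.8478) + (25 - 1)(1.3031)}{25 + 25 - 2} \approx 1.075\]
and then
\[t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{25} + \frac{1}{25}\right)}} = .93\]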

Example: Making an inference about \(\mu_1 - \mu_2\). For \(\alpha = .05\), the rejection region is \(t < -t_{\alpha/2,\nu} = -t_{.025,48} = -2.009\) or \(t > t_{\alpha/2,\nu} = t_{.025,48} = 2.009\). The test: since \(-2.009 < t = .93 < 2.009\), there is insufficient evidence to reject the null hypothesis.

Example: Making an inference about \(\mu_1 - \mu_2\) (Xm13-02): \(-2.0106 < .93 < 2.0106\), and the p-value \(.3584 > .05\).

Example: Making an inference about \(\mu_1 - \mu_2\). Conclusion: There is insufficient evidence to infer at the 5% significance level that the two assembly methods differ in terms of assembly time.

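Example: Making an inference about \(\mu_1 - \mu_2\). A 95% confidence interval for \(\mu_1 - \mu_2\) is calculated as follows:
\[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\]
Thus, at the 95% confidence level, \(-0.3176 < \mu_1 - \mu_2 < 0.8616\). Notice that zero is included in the confidence interval, which agrees with the test's failure to reject \(H_0\).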
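The Example 13.2 figures can be reproduced approximately with the sketch below. The slides do not list the two sample means, so this sketch assumes \(\bar{x}_1 - \bar{x}_2 = 0.272\), the midpoint of the reported interval; the critical value 2.0106 is the one shown in the Excel output. The helper name `pooled_t_interval` is hypothetical.

```python
import math


def pooled_t_interval(diff, s1_sq, s2_sq, n1, n2, t_crit):
    """Equal-variances t statistic (H0: mu1 - mu2 = 0) and CI for mu1 - mu2."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = diff / se
    return t, (diff - t_crit * se, diff + t_crit * se)


# Example 13.2 variances and sample sizes; diff = 0.272 is assumed (interval midpoint).
t, (lcl, ucl) = pooled_t_interval(0.272, 0.8478, 1.3031, 25, 25, t_crit=2.0106)
print(f"t = {t:.2f}, 95% CI = ({lcl:.4f}, {ucl:.4f})")
```

Small differences in the fourth decimal place relative to the slides come from rounding in the reported summary statistics.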

Checking the required conditions for the equal-variances case (Example 13.2). [Figure: assembly-time data for Design A and Design B.] The data appear to be approximately normal.

13.4 Matched Pairs Experiment. What is a matched pairs experiment? Why are matched pairs experiments needed? How do we deal with data produced in this way? The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

13.4 Matched Pairs Experiment. Example 13.3: To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted. In particular, the salaries offered to finance majors were compared to those offered to marketing majors. Two random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one. The data are stored in file Xm13-03. Can we infer that finance majors obtain higher salary offers than do marketing majors among MBAs?

13.4 Matched Pairs Experiment. Solution: Compare two populations of interval data. The parameter tested is \(\mu_1 - \mu_2\), where \(\mu_1\) is the mean of the highest salary offered to Finance MBAs and \(\mu_2\) is the mean of the highest salary offered to Marketing MBAs. The hypotheses are \(H_0: \mu_1 - \mu_2 = 0\) and \(H_1: \mu_1 - \mu_2 > 0\).

13.4 Matched Pairs Experiment. Solution (continued): Assuming equal variances, we compute the equal-variances t statistic from the data (Xm13-03). There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs.

The effect of large sample variability. Question: The difference between the sample means is 65,624 - 60,423 = 5,201. So why could we not reject \(H_0\) in favor of \(H_1: \mu_1 - \mu_2 > 0\)?

The effect of large sample variability. Answer: \(s_p^2\) is large because the sample variances are large: \(s_p^2 = 311{,}330{,}926\). A large variance inflates the standard error \(\sqrt{s_p^2(1/n_1 + 1/n_2)}\) in the denominator of the t statistic, which reduces the value of t and makes it more difficult to reject \(H_0\).
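A quick check of how the large pooled variance weakens the test, using the summary figures quoted for Example 13.3. The critical value \(t_{.05,48} \approx 1.677\) is an approximate table value supplied here as an assumption; it does not appear on the slides.

```python
import math

# Summary figures quoted on the slides for Example 13.3 (n1 = n2 = 25).
xbar_fin, xbar_mkt = 65624, 60423
sp_sq = 311_330_926
n1 = n2 = 25

se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))  # standard error grows with the pooled variance
t = (xbar_fin - xbar_mkt) / se
t_crit = 1.677  # approximate t_{.05,48} from a t table (assumption)
print(f"t = {t:.2f}, reject H0: {t > t_crit}")
```

Even though the means differ by 5,201, the standard error is close to 5,000, so t is only about 1.04.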

Reducing the variability. The values in each sample might vary markedly... [Figure: the range of observations in sample A and in sample B.]

Reducing the variability. ...but the differences between pairs of observations might be quite close to one another, resulting in small variability of the differences. [Figure: the range of the differences.]

The matched pairs experiment. Since the difference of the means is equal to the mean of the differences, we can rewrite the hypotheses in terms of \(\mu_D\) (the mean of the differences) rather than in terms of \(\mu_1 - \mu_2\). This formulation has the benefit of a smaller variability. For example:
Group 1   Group 2   Difference
   10        12         -2
   15        11         +4
Mean1 = 12.5, Mean2 = 11.5, so Mean1 - Mean2 = 1; the mean of the differences is also 1.
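The two-pair table can be verified directly; this short check only illustrates the identity "difference of the means = mean of the differences" on the slide's numbers.

```python
group1 = [10, 15]
group2 = [12, 11]
differences = [a - b for a, b in zip(group1, group2)]  # [-2, 4]

difference_of_means = sum(group1) / len(group1) - sum(group2) / len(group2)  # 12.5 - 11.5
mean_of_differences = sum(differences) / len(differences)                   # (-2 + 4) / 2
print(difference_of_means, mean_of_differences)  # 1.0 1.0
```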

The matched pairs experiment. Example 13.4: It was suspected that salary offers were affected by students' GPA (which caused \(s_1^2\) and \(s_2^2\) to increase). To reduce this variability, the following procedure was used: 25 ranges of GPAs were predetermined; students from each major were randomly selected, one from each GPA range; and the highest salary offer for each student was recorded. From the data presented, can we conclude that Finance majors are offered higher salaries?

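The matched pairs hypothesis test. Solution (by hand): The parameter tested is \(\mu_D\) (\(= \mu_1 - \mu_2\)), where each difference is Finance minus Marketing. The hypotheses: \(H_0: \mu_D = 0\), \(H_1: \mu_D > 0\). The t statistic is
\[t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}\]
with \(n_D - 1\) degrees of freedom. The rejection region is \(t > t_{.05,25-1} = 1.711\).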

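The matched pairs hypothesis test. Solution: From the data (Xm13-04), calculate the difference \(D\) (Finance minus Marketing) for each of the 25 GPA ranges, then the sample mean \(\bar{x}_D\) and the sample standard deviation \(s_D\) of the differences.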

The matched pairs hypothesis test. Solution: Calculate \(t = \bar{x}_D / (s_D/\sqrt{n_D})\) with \(n_D = 25\).
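A minimal sketch of the matched pairs t statistic. The function `paired_t` and the six salary pairs are hypothetical (the Xm13-04 data are not reproduced here); `statistics.stdev` uses the \(n - 1\) divisor, matching \(s_D\).

```python
import math
from statistics import mean, stdev


def paired_t(sample1, sample2):
    """Matched pairs t statistic for H0: mu_D = 0, where D = sample1 - sample2."""
    if len(sample1) != len(sample2):
        raise ValueError("matched pairs require samples of equal length")
    d = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n_d))  # stdev uses the n - 1 divisor
    return t, n_d - 1


# Hypothetical paired salary offers in $000s (not the Xm13-04 data).
finance = [61, 64, 70, 58, 66, 72]
marketing = [59, 60, 66, 57, 61, 69]
t, df = paired_t(finance, marketing)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The test works only with the differences, so the large spread of salaries across GPA ranges does not enter the denominator.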

The matched pairs hypothesis test (Xm13-04): \(t = 3.81 > 1.7109\), and the p-value \(.0004 < .05\).

The matched pairs hypothesis test Conclusion: There is sufficient evidence to infer at 5% significance level that the Finance MBAs’ highest salary offer is, on the average, higher than that of the Marketing MBAs.

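The matched pairs mean difference estimation. The confidence interval estimator of \(\mu_D\) is
\[\bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\,\frac{s_D}{\sqrt{n_D}}\]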
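The interval estimator can be sketched the same way. The helper name `paired_interval`, the differences, and the critical value \(t_{.025,5} = 2.571\) are hypothetical illustration inputs, not taken from Xm13-04.

```python
import math
from statistics import mean, stdev


def paired_interval(differences, t_crit):
    """CI for mu_D: xbar_D +/- t_{alpha/2, n_D - 1} * s_D / sqrt(n_D)."""
    n_d = len(differences)
    half_width = t_crit * stdev(differences) / math.sqrt(n_d)
    xbar_d = mean(differences)
    return xbar_d - half_width, xbar_d + half_width


# Hypothetical differences; t_{.025,5} = 2.571 from a t table.
lcl, ucl = paired_interval([2, 4, 4, 1, 5, 3], t_crit=2.571)
print(f"95% CI for mu_D: ({lcl:.3f}, {ucl:.3f})")
```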

The matched pairs mean difference estimation Using Data Analysis Plus Xm13-04 First calculate the differences, then run the confidence interval procedure in Data Analysis Plus.

Checking the required conditions for the paired observations case The validity of the results depends on the normality of the differences.

13.5 Inference about the ratio of two variances. In this section we draw inference about the ratio of two population variances. This question is interesting because variances can be used to evaluate the consistency of processes, and because the relationship between population variances determines which of the equal-variances or unequal-variances t-test and estimator of the difference between means should be applied.

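Parameter and Statistic. The parameter to be tested is \(\sigma_1^2/\sigma_2^2\). The statistic used is \(s_1^2/s_2^2\). Sampling distribution of \(s_1^2/s_2^2\): the statistic \(\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}\) follows the F distribution with \(\nu_1 = n_1 - 1\) and \(\nu_2 = n_2 - 1\) degrees of freedom, provided both populations are normal.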

Parameter and Statistic. Our null hypothesis is always \(H_0: \sigma_1^2/\sigma_2^2 = 1\). Under this null hypothesis the F statistic becomes
\[F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2}{s_2^2}\]

Testing the ratio of two population variances. Example 13.6 (revisiting Example 13.1; see Xm13-01): In order to perform a test regarding average calorie consumption at lunch in relation to the inclusion of high-fiber cereal at breakfast, the ratio of the two population variances has to be tested first. The hypotheses are \(H_0: \sigma_1^2/\sigma_2^2 = 1\) and \(H_1: \sigma_1^2/\sigma_2^2 \neq 1\).

Testing the ratio of two population variances. Solving by hand: the rejection region is \(F > F_{\alpha/2,\nu_1,\nu_2}\) or \(F < 1/F_{\alpha/2,\nu_2,\nu_1}\). The F statistic value is \(F = s_1^2/s_2^2 = .3845\). Conclusion: because \(.3845 < 1/F_{.025,120,40} \approx 1/1.72 \approx .58\), we reject the null hypothesis in favor of the alternative hypothesis and conclude that there is sufficient evidence at the 5% significance level that the population variances differ.
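The by-hand F test can be reproduced from the sample variances 4102.98 and 10,669.77 used for Example 13.1. The critical values are the approximate table values these slides use (\(F_{.025,40,120} \approx 1.61\), \(F_{.025,120,40} \approx 1.72\)), not exact quantiles for the actual degrees of freedom.

```python
# Sample variances for Example 13.1 (Xm13-01).
s1_sq, s2_sq = 4102.98, 10_669.77
f_stat = s1_sq / s2_sq  # under H0 (equal variances), F = s1^2 / s2^2

# Approximate table values, as used on these slides.
f_upper = 1.61      # F_{.025,40,120}: reject if F exceeds this
f_lower = 1 / 1.72  # 1 / F_{.025,120,40}, about .58: reject if F falls below this
reject = f_stat > f_upper or f_stat < f_lower
print(f"F = {f_stat:.4f}, reject H0: {reject}")
```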

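Estimating the Ratio of Two Population Variances. From the statistic \(F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}\) we can isolate \(\sigma_1^2/\sigma_2^2\) and build the following confidence interval:
\[\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}\]
where \(\nu_1 = n_1 - 1\) and \(\nu_2 = n_2 - 1\).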

Estimating the Ratio of Two Population Variances. Example 13.7: Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 13.1. Solution: We find \(F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} \approx 1.61\) and \(F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} \approx 1.72\). Then
\[\text{LCL} = \left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} = \left(\frac{4102.98}{10{,}669.77}\right)\frac{1}{1.61} = .2388\]
\[\text{UCL} = \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1} = \left(\frac{4102.98}{10{,}669.77}\right)(1.72) = .6614\]
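The interval endpoints follow from the same numbers; a minimal sketch reusing the slide's approximate F values.

```python
# Sample variances from Example 13.1 and the approximate F table values from Example 13.7.
s1_sq, s2_sq = 4102.98, 10_669.77
ratio = s1_sq / s2_sq
f_v1_v2 = 1.61  # F_{alpha/2, nu1, nu2}, approximated by F_{.025,40,120}
f_v2_v1 = 1.72  # F_{alpha/2, nu2, nu1}, approximated by F_{.025,120,40}

lcl = ratio / f_v1_v2
ucl = ratio * f_v2_v1
print(f"{lcl:.4f} < sigma1^2 / sigma2^2 < {ucl:.4f}")
```

The whole interval lies below 1, which agrees with the F test's conclusion that the variances differ.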

13.6 Inference about the difference between two population proportions. In this section we deal with two populations whose data are nominal. For nominal data we compare the population proportions of the occurrence of a certain event. Examples: comparing the effectiveness of a new drug versus an older one; comparing market share before and after an advertising campaign; comparing defective rates between two machines.

Parameter and Statistic. When the data are nominal, we can only count the occurrences of a certain event in the two populations and calculate proportions. The parameter is therefore \(p_1 - p_2\). Statistic: an unbiased estimator of \(p_1 - p_2\) is \(\hat{p}_1 - \hat{p}_2\) (the difference between the sample proportions).

Sampling distribution of \(\hat{p}_1 - \hat{p}_2\). Two random samples are drawn from two populations. The number of successes in each sample is recorded, and the sample proportions are computed. Sample 1: sample size \(n_1\), number of successes \(x_1\), sample proportion \(\hat{p}_1 = x_1/n_1\). Sample 2: sample size \(n_2\), number of successes \(x_2\), sample proportion \(\hat{p}_2 = x_2/n_2\).

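Sampling distribution of \(\hat{p}_1 - \hat{p}_2\). The statistic \(\hat{p}_1 - \hat{p}_2\) is approximately normally distributed if \(n_1p_1\), \(n_1(1 - p_1)\), \(n_2p_2\), and \(n_2(1 - p_2)\) are all greater than or equal to 5. The mean of \(\hat{p}_1 - \hat{p}_2\) is \(p_1 - p_2\). The variance of \(\hat{p}_1 - \hat{p}_2\) is \(\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\).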

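The z statistic:
\[Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}\]
Because \(p_1\) and \(p_2\) are unknown, the standard error must be estimated using the sample proportions. The method depends on the null hypothesis.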

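Testing p1 – p2. There are two cases to consider.
Case 1: \(H_0: p_1 - p_2 = 0\). Calculate the pooled proportion \(\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\). Then
\[z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\]
Case 2: \(H_0: p_1 - p_2 = D\) (\(D\) is not equal to 0). Do not pool the data. Then
\[z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}\]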

Testing p1 – p2 (Case 1). Example 13.8: The marketing manager needs to decide which of two new packaging designs to adopt to help improve sales of his company's soap. A study is performed in two supermarkets: brightly-colored packaging is distributed in supermarket 1, and simple packaging is distributed in supermarket 2. The first design is more expensive; therefore, to be financially viable, it has to outsell the second design.

Testing p1 – p2 (Case 1) Summary of the experiment results Supermarket 1 - 180 purchasers of Johnson Brothers soap out of a total of 904 Supermarket 2 - 155 purchasers of Johnson Brothers soap out of a total of 1,038 Use 5% significance level and perform a test to find which type of packaging to use.

Testing p1 – p2 (Case 1) Solution The problem objective is to compare the population of sales of the two packaging designs. The data are nominal (Johnson Brothers or other soap) The hypotheses are H0: p1 - p2 = 0 H1: p1 - p2 > 0 We identify this application as case 1 Population 1: purchases at supermarket 1 Population 2: purchases at supermarket 2

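Testing p1 – p2 (Case 1). Compute manually: \(\hat{p}_1 = 180/904 = .1991\), \(\hat{p}_2 = 155/1{,}038 = .1493\), and the pooled proportion is \(\hat{p} = (180 + 155)/(904 + 1{,}038) = .1725\), so
\[z = \frac{.1991 - .1493}{\sqrt{(.1725)(1 - .1725)\left(\frac{1}{904} + \frac{1}{1{,}038}\right)}} = 2.90\]
For a 5% significance level the rejection region is \(z > z_\alpha = z_{.05} = 1.645\). Since \(2.90 > 1.645\), we reject \(H_0\).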
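A minimal sketch of the Case 1 computation with the Example 13.8 counts. It also checks the normal-approximation rule, using the sample counts as stand-ins for the unknown \(n p\) values (an assumption made for illustration).

```python
import math
from statistics import NormalDist

x1, n1 = 180, 904   # supermarket 1: brightly-colored packaging
x2, n2 = 155, 1038  # supermarket 2: simple packaging

# Rough check of the normal approximation, using sample counts in place of n*p and n*(1 - p).
assert min(x1, n1 - x1, x2, n2 - x2) >= 5

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # Case 1: H0 says p1 = p2, so pool both samples
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se
z_crit = NormalDist().inv_cdf(0.95)  # z_.05, about 1.645
print(f"z = {z:.2f}, reject H0: {z > z_crit}")
```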

Testing p1 – p2 (Case 1). Excel (Data Analysis Plus), Xm13-08. Conclusion: There is sufficient evidence to conclude, at the 5% significance level, that the brightly-colored design will outsell the simple design.

Testing p1 – p2 (Case 2) Example 13.9 (Revisit Example 13.8) Management needs to decide which of two new packaging designs to adopt, to help improve sales of a certain soap. A study is performed in two supermarkets: For the brightly-colored design to be financially viable it has to outsell the simple design by at least 3%.

Testing p1 – p2 (Case 2) Summary of the experiment results Supermarket 1 - 180 purchasers of Johnson Brothers’ soap out of a total of 904 Supermarket 2 - 155 purchasers of Johnson Brothers’ soap out of a total of 1,038 Use 5% significance level and perform a test to find which type of packaging to use.

Testing p1 – p2 (Case 2) Solution The hypotheses to test are H0: p1 - p2 = .03 H1: p1 - p2 > .03 We identify this application as case 2 (the hypothesized difference is not equal to zero).

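Testing p1 – p2 (Case 2). Compute manually:
\[z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{(.1991 - .1493) - .03}{\sqrt{\frac{.1991(1 - .1991)}{904} + \frac{.1493(1 - .1493)}{1{,}038}}} = 1.15\]
The rejection region is \(z > z_\alpha = z_{.05} = 1.645\). Conclusion: since \(1.15 < 1.645\), do not reject the null hypothesis. There is insufficient evidence to infer that the brightly-colored design will outsell the simple design by 3% or more.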
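The Case 2 statistic, computed without pooling, as a sketch using the same counts from Example 13.8.

```python
import math

x1, n1 = 180, 904
x2, n2 = 155, 1038
D = 0.03  # hypothesized difference p1 - p2 under H0

p1_hat, p2_hat = x1 / n1, x2 / n2
# Case 2 (D != 0): each sample proportion estimates its own variance; no pooling.
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z = ((p1_hat - p2_hat) - D) / se
print(f"z = {z:.2f}, reject H0: {z > 1.645}")
```

Working with unrounded proportions gives \(z \approx 1.14\) rather than 1.15, which comes from proportions rounded to four decimals; the conclusion is the same either way.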

Testing p1 – p2 (Case 2) Using Excel (Data Analysis Plus) Xm13-08

Estimating p1 – p2. Estimating the cost per life saved: Two drugs are used to treat heart attack victims: Streptokinase (available since 1959, costs $460) and t-PA (genetically engineered, costs $2,900). The maker of t-PA claims that its drug outperforms Streptokinase. An experiment was conducted in 15 countries: 20,500 patients were given t-PA and 20,500 patients were given Streptokinase. The number of deaths from heart attacks was recorded.

Estimating p1 – p2 Experiment results A total of 1497 patients treated with Streptokinase died. A total of 1292 patients treated with t-PA died. Estimate the cost per life saved by using t-PA instead of Streptokinase.

Estimating p1 – p2. Solution: The problem objective is to compare the outcomes of two treatments. The data are nominal (a patient lived or died). The parameter to be estimated is \(p_1 - p_2\), where \(p_1\) = death rate with t-PA and \(p_2\) = death rate with Streptokinase.

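Estimating p1 – p2. Compute manually: the sample proportions are \(\hat{p}_1 = 1292/20{,}500 = .0630\) and \(\hat{p}_2 = 1497/20{,}500 = .0730\). The 95% confidence interval estimate is
\[(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = -.0100 \pm 1.96\sqrt{\frac{.0630(1 - .0630)}{20{,}500} + \frac{.0730(1 - .0730)}{20{,}500}} = -.0100 \pm .0049\]
that is, LCL = \(-.0149\) and UCL = \(-.0051\).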

Estimating p1 – p2. Interpretation: We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA. The difference in cost per patient treated is $2,900 - $460 = $2,440. The cost per life saved by switching to t-PA is therefore estimated to be between $2,440/.0149 = $163,758 and $2,440/.0051 = $478,431.
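A sketch of the interval and the cost-per-life arithmetic for the t-PA example. It uses unrounded proportions and \(z_{.025}\) from `statistics.NormalDist`, so its dollar figures differ slightly from those computed with the rounded bounds .0149 and .0051.

```python
import math
from statistics import NormalDist

deaths_tpa, deaths_strep, n = 1292, 1497, 20_500
p1_hat = deaths_tpa / n    # death rate with t-PA
p2_hat = deaths_strep / n  # death rate with Streptokinase

z = NormalDist().inv_cdf(0.975)  # z_.025, about 1.96
se = math.sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
diff = p1_hat - p2_hat
lcl, ucl = diff - z * se, diff + z * se  # negative: fewer deaths with t-PA

extra_cost = 2900 - 460  # additional cost per patient treated with t-PA
# Lives saved per patient range from -ucl to -lcl; dividing the extra cost by each gives cost per life saved.
cost_low = extra_cost / -lcl
cost_high = extra_cost / -ucl
print(f"CI: ({lcl:.4f}, {ucl:.4f}); cost per life saved: ${cost_low:,.0f} to ${cost_high:,.0f}")
```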