Published by Sybil Hamilton. Modified over 9 years ago.
1
Chapter 10: Statistical Inferences Based on Two Samples
Statistics for Business (Env)
2
Statistical Inferences Based on Two Samples
10.1 Comparing Two Population Means by Using Independent Samples: Variances Known
10.2 Comparing Two Population Means by Using Independent Samples: Variances Unknown
10.3 Paired Difference Experiments
10.4 Comparing Two Population Proportions by Using Large, Independent Samples
3
Comparing Two Population Means by Using Independent Samples: Variances Known
Suppose a random sample has been taken from each of two different populations
Suppose that the populations are independent of each other
– Then the random samples are independent of each other
Then the sampling distribution of the difference in sample means is normal if each population is normal, and approximately normal if both sample sizes are large
4
Do the achievement scores for children taught by method A differ from the scores for children taught by method B?
5
A research design that uses a separate sample for each treatment condition (or for each population) is called an independent-measures research design or a between-subjects design.
The goal of an independent-measures research study is to evaluate the mean difference between two populations (or between two treatment conditions).
6
Sampling Distribution of the Difference of Two Sample Means #1
Suppose population 1 has mean $\mu_1$ and variance $\sigma_1^2$
– From population 1, a random sample of size $n_1$ is selected which has mean $\bar{x}_1$ and variance $s_1^2$
Suppose population 2 has mean $\mu_2$ and variance $\sigma_2^2$
– From population 2, a random sample of size $n_2$ is selected which has mean $\bar{x}_2$ and variance $s_2^2$
Then the sampling distribution of the difference of two sample means, $\bar{x}_1 - \bar{x}_2$, …
7
Sampling Distribution of the Difference of Two Sample Means #2
… is normal, if each of the sampled populations is normal
– Approximately normal if the sample sizes $n_1$ and $n_2$ are large
Has mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Has standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
8
[Figure: two population distributions, with means $\mu_1$ and $\mu_2$]
If you select one score from each of these two populations, the closest two values are $X_1 = 50$ and $X_2 = 30$. The two values that are farthest apart are $X_1 = 70$ and $X_2 = 20$.
9
Sampling Distribution of the Difference of Two Sample Means #3
10
z-Based Confidence Interval for the Difference in Means (Variances Known)
Let $\bar{x}_1$ be the mean of a sample of size $n_1$ that has been randomly selected from a population with mean $\mu_1$ and standard deviation $\sigma_1$
Let $\bar{x}_2$ be the mean of a sample of size $n_2$ that has been randomly selected from a population with mean $\mu_2$ and standard deviation $\sigma_2$
Suppose each sampled population is normally distributed or that the sample sizes $n_1$ and $n_2$ are large
Suppose the samples are independent of each other, then …
11
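z-Based Confidence Interval for the Difference in Means (continued)
A $100(1 - \alpha)$ percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$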
12
Example 10.1 The Bank Customer Waiting Time Case #1
A random sample of 100 waiting times observed under the current system of serving customers has a sample mean of 8.79
– Call this population 1
– Assume population 1 is normal or sample size is large
– The variance is 4.7
A random sample of 100 waiting times observed under the new system has a sample mean of 5.14
– Call this population 2
– Assume population 2 is normal or sample size is large
– The variance is 1.9
Then if the samples are independent …
13
Example 10.1 The Bank Customer Waiting Time Case #2
At 95% confidence, $z_{\alpha/2} = z_{0.025} = 1.96$, and
$(8.79 - 5.14) \pm 1.96 \sqrt{4.7/100 + 1.9/100} = 3.65 \pm 0.50 = [3.15, 4.15]$
According to the calculated interval, the bank manager can be 95% confident that the new system reduces the mean waiting time by between 3.15 and 4.15 minutes
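The interval in Example 10.1 can be checked with a few lines of Python. This is a minimal sketch that assumes the figures quoted on the slides (means 8.79 and 5.14, variances 4.7 and 1.9, n = 100 each); the function name `z_interval_diff_means` is made up for illustration.

```python
import math


def z_interval_diff_means(mean1, var1, n1, mean2, var2, n2, z_crit):
    """Confidence interval for mu1 - mu2 when both population variances are known."""
    # Standard deviation of the sampling distribution of (x1_bar - x2_bar).
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    margin = z_crit * std_error
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Bank waiting-time figures from Example 10.1; z_crit = 1.96 gives 95% confidence.
lower, upper = z_interval_diff_means(8.79, 4.7, 100, 5.14, 1.9, 100, z_crit=1.96)
print(f"95% CI for mu1 - mu2: [{lower:.2f}, {upper:.2f}]")
```

Because the whole interval lies above zero, it supports the claim that the new system shortens the mean wait.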
14
z-Based Test About the Difference in Means (Variances Known)
Test the null hypothesis $H_0: \mu_1 - \mu_2 = D_0$
– $D_0$ is the claimed value of the difference between the population means, $\mu_1 - \mu_2$
– $D_0$ is a number whose value varies depending on the situation
– Often $D_0 = 0$, and the null means that there is no difference between the population means
15
z-Based Test About the Difference in Means (Variances Known)
Use the notation from the confidence interval statement on a prior slide
Assume that each sampled population is normal or that the sample sizes $n_1$ and $n_2$ are large
16
Test Statistic (Variances Known)
The test statistic is
$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
If the populations are normal and the samples are independent, the sampling distribution of this statistic under $H_0$ is a standard normal distribution
17
z-Based Test About the Difference in Means (Variances Known)
Reject $H_0: \mu_1 - \mu_2 = D_0$ in favor of a particular alternative hypothesis at a level of significance $\alpha$ if the appropriate rejection point rule holds (i.e. the calculated z is in the rejection region).
Rules are on the next slide…
18
Hypothesis Tests for Two Population Means
Two Population Means, Known Population Variances
Lower-tail test: $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$
Upper-tail test: $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$
Two-tail test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, i.e., $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$
19
Hypothesis tests for $\mu_1 - \mu_2$
Two Population Means, Known Population Variances
Lower-tail test: $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. Reject $H_0$ if $z < -z_{\alpha}$
Upper-tail test: $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$. Reject $H_0$ if $z > z_{\alpha}$
Two-tail test: $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. Reject $H_0$ if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
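The three rejection rules can be written as one small helper. The sketch below uses `statistics.NormalDist` from the Python standard library to look up the critical values; the function name `z_test_decision` and the `tail` labels are assumptions chosen for illustration.

```python
from statistics import NormalDist


def z_test_decision(z, alpha, tail):
    """Return True when H0 is rejected; tail is 'lower', 'upper' or 'two'."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "lower":
        # Reject if z < -z_alpha; inv_cdf(alpha) is already the negative point.
        return z < std_normal.inv_cdf(alpha)
    if tail == "upper":
        # Reject if z > z_alpha.
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # Reject if z < -z_{alpha/2} or z > z_{alpha/2}.
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


print(z_test_decision(1.98, alpha=0.01, tail="upper"))
print(z_test_decision(2.50, alpha=0.05, tail="two"))
```

The same structure applies to the t-based rules later in the chapter, with the critical values taken from a t table instead.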
20
EXAMPLE
Two cities, Boston and Kingston, are both in Massachusetts. The mean household income in Boston is $38,000 for a sample of 40 households. The population s.d. is known to be $6,000. The mean income in Kingston is $35,000 for a sample of 35 households. The population s.d. is known to be $7,000. At the .01 significance level can we conclude the mean income in Boston is more?
21
EXAMPLE (continued)
Step 1 State the null and alternate hypotheses. $H_0: \mu_B \le \mu_K$, $H_1: \mu_B > \mu_K$
Step 2 Select the level of significance. The .01 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. Since both sample sizes are more than 30 and the population standard deviations are known, we can use z as the test statistic.
Step 4 State the decision rule. The null hypothesis is rejected if z is greater than 2.326 or p < .01.
22
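EXAMPLE (continued)
Step 5: Compute the value of z and make a decision.
$z = \frac{38{,}000 - 35{,}000}{\sqrt{6{,}000^2/40 + 7{,}000^2/35}} = 1.98$
Because the computed z of 1.98 is less than the critical z of 2.326 (equivalently, the p-value of about .024 is greater than $\alpha$ of .01), the decision is not to reject $H_0$. We cannot conclude that the mean household income in Boston is larger.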
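As a check on Step 5, the z statistic, critical value, and p-value can be computed directly. This is a sketch using only the numbers stated in the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary figures from the household income example.
mean_b, sigma_b, n_b = 38_000, 6_000, 40   # Boston
mean_k, sigma_k, n_k = 35_000, 7_000, 35   # Kingston
alpha = 0.01

std_error = math.sqrt(sigma_b**2 / n_b + sigma_k**2 / n_k)
z = (mean_b - mean_k) / std_error

# Upper-tail test: H1 says the Boston mean is larger.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```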
23
Comparing Two Population Means by Using Independent Samples: Variances Unknown
In general, the true values of the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known
They have to be estimated from the sample variances $s_1^2$ and $s_2^2$, respectively
24
Comparing Two Population Means by Using Independent Samples: Variances Unknown #2
Also need to estimate the standard deviation of the sampling distribution of the difference between sample means
Two approaches:
1. If it can be assumed that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then calculate the "pooled estimate" of $\sigma^2$
2. If $\sigma_1^2 \ne \sigma_2^2$, then use approximate methods
25
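Pooled Estimate of $\sigma^2$
Assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$
The pooled estimate of $\sigma^2$ is the weighted average of the two sample variances, $s_1^2$ and $s_2^2$
The pooled estimate of $\sigma^2$ is denoted by $s_p^2$:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
The estimate of the standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
$\sqrt{s_p^2 \left(1/n_1 + 1/n_2\right)}$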
26
[Table: statistics for one sample compared with two samples (mean, SS, df), assuming that $\sigma_1^2 = \sigma_2^2 = \sigma^2$]
27
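t-Based Confidence Interval for the Difference in Means (Variances Unknown)
Select independent random samples from two normal populations with equal variances
A $100(1 - \alpha)$ percent confidence interval for the difference in population means $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left(1/n_1 + 1/n_2\right)}$
where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $t_{\alpha/2}$ is based on $(n_1 + n_2 - 2)$ degrees of freedom (df)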
28
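Finding the value of the test statistic requires two steps:
Step One: Pool the sample variances.
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Step Two: Determine the value of t from the following formula.
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(1/n_1 + 1/n_2\right)}}$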
29
Hypothesis tests for $\mu_1 - \mu_2$
Two Population Means, Unknown Population Variances
Lower-tail test: $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. Reject $H_0$ if $t < -t_{\alpha}$
Upper-tail test: $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$. Reject $H_0$ if $t > t_{\alpha}$
Two-tail test: $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
30
Example:
A recent EPA study compared the highway fuel economy of domestic and imported passenger cars. A sample of 15 domestic cars revealed a mean of 33.7 mpg with a sample standard deviation of 2.4 mpg. A sample of 12 imported cars revealed a mean of 35.7 mpg with a sample standard deviation of 3.9 mpg. At the .05 significance level can the EPA conclude that the mpg is higher on the imported cars?
31
Example: (continued)
Step 1 State the null and alternate hypotheses. $H_0: \mu_D \ge \mu_I$, $H_1: \mu_D < \mu_I$
Step 2 State the level of significance. The .05 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. Both sample sizes are less than 30 and the population variances are unknown, so we use the t distribution.
32
Example: (continued)
Step 4 The decision rule is to reject $H_0$ if t < -1.708. There are $n_1 + n_2 - 2$ or 25 degrees of freedom.
Step 5 We compute the pooled variance.
$s_p^2 = \frac{(15 - 1)(2.4)^2 + (12 - 1)(3.9)^2}{15 + 12 - 2} = 9.918$
33
Example: (continued)
We compute the value of t as follows.
$t = \frac{33.7 - 35.7}{\sqrt{9.918 \left(1/15 + 1/12\right)}} = -1.640$
34
Example: (continued)
Since the computed t of –1.64 > critical t of –1.71, $H_0$ cannot be rejected. There is insufficient sample evidence to claim a higher mpg on the imported cars.
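The two-step calculation (pool the variances, then compute t) can be sketched from the summary statistics alone. The helper name `pooled_t` is hypothetical, and the critical value -1.708 is the one quoted in Step 4 rather than something the code derives.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0 (equal variances assumed)."""
    df = n1 + n2 - 2
    # Step one: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Step two: difference in sample means divided by its estimated standard error.
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, t, df


# Domestic cars are sample 1, imported cars are sample 2 (EPA example).
pooled_var, t, df = pooled_t(33.7, 2.4, 15, 35.7, 3.9, 12)
t_crit = -1.708  # lower-tail critical value for 25 df, as quoted in the example
print(f"pooled variance = {pooled_var:.3f}, t = {t:.3f}, df = {df}")
print("Reject H0" if t < t_crit else "Do not reject H0")
```

Passing the boys and girls figures from the next example (41, 3, 15 and 38, 2, 10) to the same helper reproduces the pooled variance and t value worked out there.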
35
Example: Comparing Mean Weights
To show if boys are heavier than girls of the same age, a survey is conducted in which a sample of 15 boys shows a mean weight of 41 kg and a standard deviation of 3 kg. A group of 10 girls of the same age shows a mean weight of 38 kg and a standard deviation of 2 kg. Assume that the weights of both boys and girls follow the normal distribution. At the 0.05 level of significance, test if the average weight of boys is greater than the average weight of girls of the same age.
Step 1 State the null and alternate hypotheses. $H_0: \mu_g \ge \mu_b$, $H_1: \mu_g < \mu_b$
36
Example: (continued)
Step 2 State the level of significance. The .05 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. Both sample sizes are less than 30 and the population variances are unknown, so we use the t distribution.
Step 4 The decision rule is to reject $H_0$ if $t > t_{0.05} = 1.714$, where t is computed from the boys' mean minus the girls' mean. There are $n_1 + n_2 - 2$ or 23 degrees of freedom.
37
Example: (continued)
Step 5 Compute the pooled variance and t.
$s_p^2 = \frac{(15 - 1)(3)^2 + (10 - 1)(2)^2}{15 + 10 - 2} = 7.04$, so $s_p = 2.65$
$t = \frac{41 - 38}{\sqrt{7.04 \left(1/15 + 1/10\right)}} = 2.77$
Since $t = 2.77 > t_{0.05} = 1.714$, we reject $H_0$. So the mean weight of boys is larger than the mean weight of girls of the same age.
38
Example: Directed reading activities in the classroom
A class of 21 third-graders participates in these activities for 8 weeks while a control classroom of 23 third-graders follows the same curriculum without the activities. After the 8 weeks, all children take a reading test (scores in table). At the 0.05 level of significance, can we conclude directed reading activities help improve reading ability?
Step 1 State the null and alternate hypotheses. $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$
39
Example: Directed reading activities (continued)
Step 2 State the level of significance. The .05 significance level is stated in the problem.
Step 3 Find the appropriate test statistic. Both sample sizes are less than 30 and the population variances are unknown, so we use the t distribution.
Step 4 The decision rule is to reject $H_0$ if $t > t_{0.025} = 2.018$ or $t < -t_{0.025} = -2.018$. There are $n_1 + n_2 - 2$ or 42 degrees of freedom.
40
Example: Directed reading activities (continued)
Step 5 Compute the pooled variance and t.
$s_p^2 = \frac{(21 - 1)(11.01)^2 + (23 - 1)(17.15)^2}{21 + 23 - 2} = 211.79$
$t = \frac{51.48 - 41.52}{\sqrt{211.79 \left(1/21 + 1/23\right)}} = \frac{9.96}{4.39} = 2.27$
Since $t = 2.27 > t_{0.025} = 2.018$, we reject $H_0$. So there is a significant difference between the two groups.
41
Example: Directed reading activities (continued)
The same data can also be tested with a one-tailed alternative.
Step 1 State the null and alternate hypotheses. $H_0: \mu_2 \ge \mu_1$, $H_1: \mu_2 < \mu_1$
There are $n_1 + n_2 - 2$ or 42 degrees of freedom. The rule is to reject $H_0$ if $t > t_{0.05} = 1.682$.
Step 5 Compute the pooled variance and t.
$s_p^2 = \frac{(21 - 1)(11.01)^2 + (23 - 1)(17.15)^2}{21 + 23 - 2} = 211.79$
$t = \frac{51.48 - 41.52}{\sqrt{211.79 \left(1/21 + 1/23\right)}} = \frac{9.96}{4.39} = 2.27$
Since $t = 2.27 > t_{0.05} = 1.682$, we reject $H_0$.
42
Pooled Variance t Test: Example
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE & NASDAQ? You collect the following data:

                  NYSE    NASDAQ
Number            21      25
Sample mean       3.27    2.53
Sample std dev    1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield ($\alpha$ = 0.05)?
43
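Calculating the Test Statistic
The test statistic is:
$s_p^2 = \frac{(21 - 1)(1.30)^2 + (25 - 1)(1.16)^2}{(21 - 1) + (25 - 1)} = 1.5021$
$t = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021 \left(1/21 + 1/25\right)}} = 2.040$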
44
Solution
$H_0: \mu_1 - \mu_2 = 0$, i.e. ($\mu_1 = \mu_2$)
$H_1: \mu_1 - \mu_2 \ne 0$, i.e. ($\mu_1 \ne \mu_2$)
$\alpha$ = 0.05
df = 21 + 25 - 2 = 44
Critical values: t = ± 2.0154 (area .025 in each tail)
Test statistic: t = 2.040
Decision: Reject $H_0$ at $\alpha$ = 0.05
Conclusion: There is evidence of a difference in means.
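The same data also give a t-based confidence interval for the difference in mean yields, using the pooled-variance formula from earlier in the chapter. In this sketch the two-tailed critical value 2.0154 for 44 degrees of freedom is an assumption read from a t table, not something the code computes.

```python
import math

# Summary statistics from the NYSE / NASDAQ dividend yield example.
n1, mean1, sd1 = 21, 3.27, 1.30   # NYSE
n2, mean2, sd2 = 25, 2.53, 1.16   # NASDAQ

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (mean1 - mean2) / std_error

t_crit = 2.0154  # assumed table value: t_{0.025} with 44 df
lower = (mean1 - mean2) - t_crit * std_error
upper = (mean1 - mean2) + t_crit * std_error

print(f"t = {t:.3f} on {df} df")
print(f"95% CI for mu1 - mu2: [{lower:.3f}, {upper:.3f}]")
```

The interval only just excludes zero, which agrees with a test statistic that sits barely inside the rejection region.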
45
Two kinds of studies
Independent samples: So far, we have studied two sets of sample data that come from two independent populations (e.g. women and men, or students from program A and from program B).
Paired samples: However, sometimes we want to study two sets of sample data that come from related populations (e.g. "before treatment" and "after treatment").
46
Paired/Dependent Samples
Independent samples are samples that are not related in any way.
Dependent samples are samples that are paired or related in some fashion.
* The same subjects measured at two different points in time (repeated-measures).
* Matched or paired observations
* Hypothesis test proceeds just as in the one sample case.
47
Paired-Sample t Test: Example
Assume you work in the finance department. Is the new financial package faster ($\alpha$ = 0.05 level)? You collect the following processing times (in seconds) for the same set of jobs:

Existing System (1)   New Software (2)   Difference D_i
 9.98                 9.88                .10
 9.88                 9.86                .02
 9.84                 9.75                .09
 9.99                 9.80                .19
 9.94                 9.87                .07
 9.84                 9.84                .00
 9.86                 9.87               -.01
10.12                 9.98                .14
 9.90                 9.83                .07
 9.91                 9.86                .05
48
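Paired-Sample t Test: Example
Is the new financial package faster (0.05 level)?
$\bar{D} = \frac{\sum D_i}{n} = 0.072$
$H_0: \mu_D \le 0$
$H_1: \mu_D > 0$
df = n - 1 = 9, critical value = 1.8331
Test statistic: $t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{0.072 - 0}{0.06215 / \sqrt{10}} = 3.66$
Decision: Reject $H_0$ (the t statistic is in the rejection zone).
Conclusion: The new software package is faster.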
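The paired statistic can be reproduced from the ten pairs of processing times. This is a minimal sketch with illustrative variable names; `statistics.stdev` uses the n - 1 divisor, which matches the sample standard deviation of the differences.

```python
import math
from statistics import mean, stdev

# Processing times in seconds for the same ten jobs.
existing = [9.98, 9.88, 9.84, 9.99, 9.94, 9.84, 9.86, 10.12, 9.90, 9.91]
new = [9.88, 9.86, 9.75, 9.80, 9.87, 9.84, 9.87, 9.98, 9.83, 9.86]

# D_i = existing - new, so a positive mean difference means the new package is faster.
diffs = [e - n for e, n in zip(existing, new)]
n_pairs = len(diffs)

d_bar = mean(diffs)
s_d = stdev(diffs)
t = d_bar / (s_d / math.sqrt(n_pairs))

t_crit = 1.8331  # upper-tail critical value for 9 df, as quoted in the example
print(f"D-bar = {d_bar:.3f}, s_D = {s_d:.4f}, t = {t:.2f}")
print("Reject H0" if t > t_crit else "Do not reject H0")
```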
49
Paired-Sample: Example-twins
Suppose we collect 8 pairs of twins. The first twin in the pair is healthy; the second is not. For each twin, we measure grey matter density (gmd). Is grey matter density in the populations significantly different? Processed data from the 8 pairs is shown below (units not given). Consider the population differences, $D = X_1 - X_2$.
50
Hypothesis Testing Involving Paired Observations (continued)
If $\sigma_D$ is unknown, we can estimate the unknown population standard deviation with a sample standard deviation:
$s_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}}$
The test statistic for $\mu_D$ is now a t statistic, with n - 1 d.f.:
$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$
where $\bar{D}$ is the mean of the differences, $s_D$ is the (sample) s.d. of the differences, and n is the number of pairs (differences)
51
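Confidence Interval of Paired Observations, $\sigma_D$ Unknown (continued)
The confidence interval for $\mu_D$ is
$\bar{D} \pm t_{\alpha/2,\, n-1} \frac{s_D}{\sqrt{n}}$
where $s_D$ is:
$s_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}}$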
52
Paired Samples Hypothesis Testing for Mean Difference, $\sigma_D$ Unknown
Lower-tail test: $H_0: \mu_D \ge 0$, $H_1: \mu_D < 0$. Reject $H_0$ if $t < -t_{\alpha}$
Upper-tail test: $H_0: \mu_D \le 0$, $H_1: \mu_D > 0$. Reject $H_0$ if $t > t_{\alpha}$
Two-tail test: $H_0: \mu_D = 0$, $H_1: \mu_D \ne 0$. Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
Where t has n - 1 d.f.
53
Paired Samples Example
Assume you send your salespeople to a "customer service" training workshop. Has the training made a difference in the number of complaints? You collect the following data:

Number of Complaints
Salesperson   Before (1)   After (2)   Difference D_i = (2) - (1)
Chen           6            4           -2
Li            20            6          -14
Zhang          3            2           -1
Wang           0            0            0
Wan            4            0           -4
Total                                  -21

$\bar{D} = \frac{\sum D_i}{n} = \frac{-21}{5} = -4.2$
54
Paired Samples: Solution
Has the training made a difference in the number of complaints (at the 0.01 level)?
$H_0: \mu_D = 0$
$H_1: \mu_D \ne 0$
$\alpha$ = .01, $\bar{D}$ = -4.2, d.f. = n - 1 = 4
Critical value = ± 4.604
Test statistic: $t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$
Decision: Do not reject $H_0$ (the t statistic is not in the rejection region)
Conclusion: There is not a significant change in the number of complaints.
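The confidence interval for $\mu_D$ from the earlier slide can be applied to the same complaint counts. The sketch below assumes the two-tailed critical value 4.604 (4 d.f., $\alpha$ = .01) quoted in the solution; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Complaints per salesperson: Chen, Li, Zhang, Wang, Wan.
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

# D_i = after - before, matching the "(2) - (1)" column of the table.
diffs = [a - b for a, b in zip(after, before)]
n_pairs = len(diffs)

d_bar = mean(diffs)
s_d = stdev(diffs)
std_error = s_d / math.sqrt(n_pairs)
t = d_bar / std_error

t_crit = 4.604  # t_{0.005} with 4 d.f., as quoted in the solution
lower = d_bar - t_crit * std_error
upper = d_bar + t_crit * std_error

print(f"D-bar = {d_bar:.1f}, s_D = {s_d:.2f}, t = {t:.2f}")
print(f"99% CI for mu_D: [{lower:.2f}, {upper:.2f}]")
```

The interval contains zero, which matches the decision not to reject $H_0$ at the 0.01 level.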
55
EXAMPLE 4
An independent testing agency is comparing the daily rental cost for renting a compact car from Hertz and Avis. A random sample of eight cities revealed the following information. At the .05 significance level can the testing agency conclude that there is a difference in the rental charged?

City          Hertz ($)   Avis ($)
Atlanta       42          40
Chicago       56          52
Cleveland     45          43
Denver        48          48
Honolulu      37          32
Kansas City   45          48
Miami         41          39
Seattle       46          50
56
Step 1 $H_0: \mu_d = 0$, $H_1: \mu_d \ne 0$
Step 2 The stated significance level is .05.
Step 3 The appropriate test statistic is the paired t-test.
Step 4 $H_0$ is rejected if t < -2.365 or t > 2.365; or if p-value < .05. We use the t distribution with n - 1 or 7 degrees of freedom.
Step 5 Perform the calculations and make a decision.
57
City          Hertz   Avis    d    d^2
Atlanta       42      40      2     4
Chicago       56      52      4    16
Cleveland     45      43      2     4
Denver        48      48      0     0
Honolulu      37      32      5    25
Kansas City   45      48     -3     9
Miami         41      39      2     4
Seattle       46      50     -4    16
58
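$\bar{d} = \frac{\sum d}{n} = \frac{8}{8} = 1.00$
$s_d = \sqrt{\frac{\sum d^2 - (\sum d)^2 / n}{n - 1}} = \sqrt{\frac{78 - 8^2/8}{8 - 1}} = 3.162$
$t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{1.00}{3.162 / \sqrt{8}} = 0.894$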
59
P(t > .894) = .20 for a one-tailed t-test at 7 degrees of freedom, so the two-tailed p-value is about .40. Because 0.894 is less than the critical value of 2.365 and the p-value is greater than $\alpha$ of .05, do not reject the null hypothesis. There is no evidence of a difference in the mean amount charged by Hertz and Avis.
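The d and d² columns of the table lend themselves to the computational (shortcut) formula for $s_d$. The following sketch recomputes the paired t statistic that way; the critical value 2.365 is the one stated in Step 4, and the variable names are illustrative.

```python
import math

# Daily rental cost in dollars, cities in table order (Atlanta ... Seattle).
hertz = [42, 56, 45, 48, 37, 45, 41, 46]
avis = [40, 52, 43, 48, 32, 48, 39, 50]

d = [h - a for h, a in zip(hertz, avis)]
n_pairs = len(d)
sum_d = sum(d)                      # total of the d column
sum_d2 = sum(x * x for x in d)      # total of the d^2 column

d_bar = sum_d / n_pairs
# Shortcut formula: s_d = sqrt((sum d^2 - (sum d)^2 / n) / (n - 1)).
s_d = math.sqrt((sum_d2 - sum_d**2 / n_pairs) / (n_pairs - 1))
t = d_bar / (s_d / math.sqrt(n_pairs))

t_crit = 2.365  # two-tailed critical value for 7 df at the .05 level (Step 4)
print(f"sum d = {sum_d}, sum d^2 = {sum_d2}, t = {t:.3f}")
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")
```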
60
Comparing Two Population Proportions
Goal: test a hypothesis or form a confidence interval for the difference between two population proportions, $p_1 - p_2$, using two independent samples from two populations.
The point estimate for the difference is $\hat{p}_1 - \hat{p}_2$
Assumptions: $n_1 p_1 \ge 5$, $n_1 (1 - p_1) \ge 5$, $n_2 p_2 \ge 5$, $n_2 (1 - p_2) \ge 5$
61
Two Population Proportions
Since we begin by assuming the null hypothesis is true, we assume $p_1 = p_2$ and pool the two sample estimates
The pooled estimate for the overall proportion is:
$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$
where $X_1$ and $X_2$ are the numbers from samples 1 and 2 with the characteristic of interest
62
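Two Population Proportions (continued)
The test statistic for $p_1 - p_2$ is a Z statistic:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left(1/n_1 + 1/n_2\right)}}$
where $\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$, $\hat{p}_1 = \frac{X_1}{n_1}$, $\hat{p}_2 = \frac{X_2}{n_2}$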
63
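Confidence Interval for Two Population Proportions
The confidence interval for $p_1 - p_2$ is:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$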
64
Example
Are unmarried workers more likely to be absent from work than married workers? A sample of 250 married workers showed 22 missed more than 5 days last year, while a sample of 300 unmarried workers showed 35 missed more than 5 days. Use a .05 significance level.
65
The null and the alternate hypotheses: $H_0: p_U \le p_M$, $H_1: p_U > p_M$
The null hypothesis is rejected if the computed value of z is greater than 1.65 or the p-value < .05.
The pooled proportion $\bar{p} = \frac{35 + 22}{300 + 250} = .1036$
66
$z = \frac{35/300 - 22/250}{\sqrt{.1036 (1 - .1036) \left(1/300 + 1/250\right)}} = 1.10$
Because the calculated z of 1.10 < a critical z of 1.65 ($\alpha$ of .05), the null hypothesis is not rejected. We cannot conclude that a higher proportion of unmarried workers miss more than 5 days in a year than the married workers.
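The pooled proportion and z statistic for the absenteeism example can be verified as follows. This is a sketch under the example's own figures; the function name `two_proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled proportion and z statistic for H0: p1 - p2 = 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 the two proportions are equal, so the samples are pooled.
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, (p1_hat - p2_hat) / std_error


# Sample 1: unmarried workers (35 of 300); sample 2: married workers (22 of 250).
pooled, z = two_proportion_z(35, 300, 22, 250)
z_crit = NormalDist().inv_cdf(0.95)  # upper-tail critical value at alpha = .05

print(f"pooled proportion = {pooled:.4f}, z = {z:.2f}, critical z = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```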
67
Chapter Ten: Two-Sample Tests of Hypothesis
ONE - Conduct a test of hypothesis about the difference between two independent population means with Known/Unknown Variances
TWO - Conduct a test of hypothesis regarding the difference in two population proportions with Known/Unknown Variances
THREE - Understand the difference between dependent and independent samples.
FOUR - Conduct a test of hypothesis about the mean difference between paired or dependent observations.