1 Economics 173 Business Statistics Lectures 5 & 6 Summer, 2001 Professor J. Petry.

2 Inference about the Comparison of Two Populations (Chapter 12)

3 12.1 Introduction. This chapter presents a variety of techniques whose objective is to compare two populations. We are interested in: – the difference between two means; – the ratio of two variances; – the difference between two proportions.

4 12.2 Inference about the Difference between Two Means: Independent Samples. Two random samples are drawn from the two populations of interest. Because we are interested in the difference between the two means, we compute the sample mean $\bar{x}$ for each sample (and support the analysis with the sample variance $s^2$ as well).

5 The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$. – $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal. – $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the populations are not normal but the sample sizes are large. – The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$. – The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

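6 If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.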
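A minimal Python sketch of this Z statistic, assuming both population standard deviations are known. The function name `z_two_means` and the sample numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def z_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, mu_diff=0.0):
    """Z statistic for mu1 - mu2 when both population standard deviations are known."""
    # Standard error of xbar1 - xbar2: sqrt(sigma1^2/n1 + sigma2^2/n2)
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - mu_diff) / se


# Hypothetical numbers, for illustration only
z = z_two_means(xbar1=52.0, xbar2=50.0, sigma1=4.0, sigma2=3.0, n1=16, n2=9)
print(round(z, 4))
```

Here the standard error is $\sqrt{16/16 + 9/9} = \sqrt{2}$, so a difference of 2 in the sample means gives $Z = \sqrt{2} \approx 1.414$.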

7 In practice, the Z statistic is rarely used, because the population variances are usually unknown. Instead, we construct a t statistic by replacing $\sigma_1^2$ and $\sigma_2^2$ with the sample variances $s_1^2$ and $s_2^2$.

8 Two cases are considered when producing the t statistic: – the two unknown population variances are equal; – the two unknown population variances are not equal.

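9 Case I: The two variances are equal. Calculate the pooled variance estimate
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Example: $s_1^2 = 25$, $s_2^2 = 30$, $n_1 = 10$, $n_2 = 15$. Then
$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = \frac{645}{23} \approx 28.04$$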
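The pooled variance is a weighted average of the two sample variances, each weighted by its degrees of freedom $n_i - 1$. A short Python check of the example values ($s_1^2 = 25$, $s_2^2 = 30$, $n_1 = 10$, $n_2 = 15$); the function name `pooled_variance` is our own.

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate of the common variance: each sample variance weighted by its d.f."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


sp_sq = pooled_variance(25, 30, 10, 15)
print(round(sp_sq, 2))  # 28.04
```

Because the second sample is larger, the pooled value lies closer to 30 than to 25.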

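10 Construct the t statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$
Perform a hypothesis test: $H_0: \mu_1 - \mu_2 = 0$; $H_1: \mu_1 - \mu_2 > 0$, $< 0$, or $\neq 0$. Or build an interval estimate:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$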

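11 Case II: The two variances are unequal. The t statistic uses the two sample variances separately:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad \text{d.f.} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$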

12 Run a hypothesis test as needed, or build an interval estimate.
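The unequal-variances t statistic is usually paired with the Welch–Satterthwaite approximation to the degrees of freedom. A minimal Python sketch under that assumption; the function name `welch_t` and the numbers are hypothetical.

```python
import math


def welch_t(xbar1, xbar2, s1, s2, n1, n2, mu_diff=0.0):
    """t statistic and approximate d.f. for mu1 - mu2 when the variances are unequal."""
    v1 = s1**2 / n1  # estimated variance of xbar1
    v2 = s2**2 / n2  # estimated variance of xbar2
    t = ((xbar1 - xbar2) - mu_diff) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics, for illustration only
t, df = welch_t(xbar1=12.0, xbar2=10.0, s1=2.0, s2=2.0, n1=5, n2=5)
print(round(t, 3), round(df, 1))
```

As a sanity check, when the sample variances and sample sizes happen to be equal, the approximate d.f. reduces to $n_1 + n_2 - 2$ (here 8). The d.f. is generally not an integer, so a t table is read at the nearest available value.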

13 Example 12.1 – Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not? – A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal. – For each person, the number of calories consumed at lunch was recorded.

14 Calories consumed at lunch. Solution: The data are quantitative. The parameter to be tested is the difference between two means. The claim to be tested is that the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

15 Identifying the technique – The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 < 0$ (that is, $\mu_1 < \mu_2$). – To check the relationship between the variances, we use computer output to find the sample standard deviations: $s_1 = 64.05$ and $s_2 = 103.29$. The variances appear to be unequal. – We therefore run the t-test for unequal variances.

16 Calories consumed at lunch. At the 5% significance level, there is sufficient evidence to reject the null hypothesis and infer that consumers of high-fiber cereal eat fewer calories at lunch.

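17 Solving by hand – The interval estimator for the difference between two means (unequal variances) is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$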

18 Example 12.2 – Does job design (referring to worker movements) affect worker productivity? – Two job designs are being considered for the production of a new computer desk. – Two samples are randomly and independently selected: a sample of 25 workers assembled a desk using design A, and a sample of 25 workers assembled the desk using design B. The assembly times were recorded. – Do the assembly times of the two designs differ?

19 Assembly times in minutes. Solution: The data are quantitative. The parameter of interest is the difference between two population means. The claim to be tested is whether a difference between the two designs exists.

20 Solving by hand – The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \neq 0$. – To check the relationship between the two variances, calculate $s_1$ and $s_2$. We have $s_1 = 0.92$ and $s_2 = 1.14$; these are close enough that we treat the two population variances as equal. – We therefore calculate the equal-variances t statistic, which gives $t = 0.93$, and then determine the rejection region.

21 For $\alpha = 0.05$ (area .025 in each tail), the rejection region is $|t| > t_{.025} = 2.009$; notice the absolute value $|t|$. The test: since $|t| = 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.

22 Conclusion: At the 5% significance level, this experiment provides insufficient evidence to infer that the two job designs differ in terms of worker productivity.

23 The Excel printout (annotated fields: t statistic, degrees of freedom, P-value of the one-tail test, P-value of the two-tail test).

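24 A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2(1/n_1 + 1/n_2)}$. Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$. Notice that zero is included in the interval, which agrees with the test's failure to reject $H_0$.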
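The hand calculation for Example 12.2 can be reproduced from summary statistics with a short Python sketch. Two assumptions: the difference of sample means, 0.272, is not printed in the text and is taken here as the midpoint of the reported interval $(-0.3176, 0.8616)$; and the critical value 2.009 is the one used for the rejection region. The function name `pooled_t_interval` is hypothetical.

```python
import math


def pooled_t_interval(diff, s1, s2, n1, n2, t_crit):
    """Equal-variances t statistic (H0: mu1 - mu2 = 0) and confidence interval."""
    # Pooled variance, weighting each sample variance by its d.f.
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = diff / se
    return t, (diff - t_crit * se, diff + t_crit * se)


# Example 12.2: s1 = 0.92, s2 = 1.14, n1 = n2 = 25.
# diff = 0.272 is an assumption (midpoint of the reported interval).
t, (lcl, ucl) = pooled_t_interval(0.272, 0.92, 1.14, 25, 25, 2.009)
print(round(t, 2), round(lcl, 3), round(ucl, 3))
```

The sketch gives $t \approx 0.93$ and an interval of roughly $(-0.317, 0.861)$, close to the printed endpoints; the small gap presumably comes from rounding $s_1$, $s_2$ and the critical value.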

25 Checking the required conditions for the equal-variances case (Example 12.2). [Histograms: Design A, Design B] The distributions are not exactly bell-shaped, but they appear to be approximately normal. Because the technique is robust to moderate departures from normality, we can be reasonably confident in the results.

26 12.4 Matched Pairs Experiment. What is a matched pairs experiment? Why are matched pairs experiments needed? How do we deal with data produced in this way? The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

27 Example 12.3 To determine whether a new steel-belted radial tire lasts longer than the current model, the manufacturer designs the following experiment. – A pair of newly designed tires is installed on the rear wheels of 20 randomly selected cars. – A pair of currently used tires is installed on the rear wheels of another 20 cars. – Drivers drive in their usual way until the tires wear out. – The number of miles driven by each driver is recorded. See the data next.

28 Solution: Compare two populations of quantitative data. The parameter is $\mu_1 - \mu_2$, where $\mu_1$ is the mean distance driven before the new-design tires wear out and $\mu_2$ is the mean distance driven before the existing-design tires wear out. The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 > 0$.

29 The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 > 0$. We run the independent-samples t-test in Excel and conclude that there is insufficient evidence to reject $H_0$ in favor of $H_1$.

30 [Histograms: New design, Existing design] While the sample mean of the new design is larger than that of the existing design, the variability within each sample is large enough that the two sample distributions overlap and cover about the same range. It is therefore difficult to argue that one expected value differs from the other.

31 Example 12.4 – To eliminate the variability among observations within each sample, the experiment was redone. – One tire of each type was installed on the rear wheels of 20 randomly selected cars (each car was sampled twice, creating a pair of observations). – The number of miles until wear-out was recorded.

32 So what really happened here? The values within each sample might vary markedly... [Figure: the range of observations in sample A and in sample B]

33 ...but the differences between paired observations might be quite close to one another, resulting in small variability. [Figure: the range of the differences, relative to 0]

34 Observe the matched pairs t statistic, $t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$, and notice how small variability of the differences (a small $s_D$) helps in rejecting the null hypothesis.

35 Solving by hand – Calculate the difference $x_D = x_1 - x_2$ for each pair. – Calculate the mean $\bar{x}_D$ and the standard deviation $s_D$ of the differences. – Build the statistic $t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$ (with $\mu_D = 0$ under $H_0$). – Run the hypothesis test using the t distribution with $n_D - 1$ degrees of freedom.
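The hand-calculation steps for matched pairs can be sketched in Python with the standard-library `statistics` module. The function name `paired_t` and the three data pairs are hypothetical, for illustration only.

```python
import math
import statistics


def paired_t(x1, x2, mu_d=0.0):
    """Matched-pairs t statistic and its degrees of freedom (n_D - 1)."""
    d = [a - b for a, b in zip(x1, x2)]  # difference for each pair
    n_d = len(d)
    xbar_d = statistics.mean(d)          # mean of the differences
    s_d = statistics.stdev(d)            # sample standard deviation (n - 1 divisor)
    t = (xbar_d - mu_d) / (s_d / math.sqrt(n_d))
    return t, n_d - 1


# Hypothetical data for three pairs, for illustration only
t, df = paired_t([5, 7, 9], [4, 5, 6])
print(round(t, 4), df)
```

The differences are 1, 2, 3, so $\bar{x}_D = 2$, $s_D = 1$, and $t = 2/(1/\sqrt{3}) \approx 3.464$ with 2 degrees of freedom.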

36 – The hypotheses for this problem are $H_0: \mu_D = 0$ and $H_1: \mu_D > 0$. The value of the statistic is $t = 2.817$. The rejection region is $t > t_{\alpha}$ with d.f. $= 20 - 1 = 19$. If $\alpha = .05$, $t_{.05,19} = 1.729$. Since $2.817 > 1.729$, there is sufficient evidence in the data to reject the null hypothesis in favor of the alternative hypothesis. Conclusion: At the 5% significance level, we can infer that the new type of tire lasts longer than the current type.

37 Estimating the mean difference: $\bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}}$, with d.f. $= n_D - 1$.
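A minimal Python sketch of the interval estimator for $\mu_D$, working from summary statistics. The summary values are hypothetical, and the critical value must be looked up in a t table ($t_{.025,3} = 3.182$ for $n_D = 4$); the function name `mean_diff_interval` is our own.

```python
import math


def mean_diff_interval(xbar_d, s_d, n_d, t_crit):
    """Interval estimate xbar_D +/- t_crit * s_D / sqrt(n_D)."""
    half_width = t_crit * s_d / math.sqrt(n_d)
    return xbar_d - half_width, xbar_d + half_width


# Hypothetical summary values; t_crit is t_{alpha/2} with n_D - 1 = 3 d.f.
lcl, ucl = mean_diff_interval(xbar_d=2.0, s_d=1.0, n_d=4, t_crit=3.182)
print(round(lcl, 3), round(ucl, 3))
```

The half-width is $3.182 \times 1/\sqrt{4} = 1.591$, so the interval runs from 0.409 to 3.591.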

38 Checking the required conditions for the paired observations case: the validity of the results depends on the normality of the differences.

39 12.5 Inferences about the Ratio of Two Variances. In this section we discuss how to compare the variability of two populations; in particular, we draw inferences about the ratio of two population variances. This question is interesting because: – variances can be used to evaluate the consistency of processes; – the relationship between the variances determines which technique is used to test the relationship between the means.

40 Point estimator of $\sigma_1^2/\sigma_2^2$ – Recall that $s^2$ is an unbiased estimator of $\sigma^2$. – Therefore, it is not surprising that we estimate $\sigma_1^2/\sigma_2^2$ by $s_1^2/s_2^2$. Sampling distribution of $s_1^2/s_2^2$ – The statistic $\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ follows the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom, provided the populations are normal. – The test statistic for $\sigma_1^2/\sigma_2^2$ is derived from this statistic.

41 Testing $\sigma_1^2/\sigma_2^2$ – Our null hypothesis is always $H_0: \sigma_1^2/\sigma_2^2 = 1$. – Under this null hypothesis, the F statistic $F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ becomes $F = \frac{s_1^2}{s_2^2}$.

42 Example 12.5 (see Example 12.1): Calories consumed at lunch. To perform a test regarding the average calories consumed at lunch in relation to eating high-fiber cereal for breakfast, we first have to test the ratio of the two population variances. The hypotheses are $H_0: \sigma_1^2/\sigma_2^2 = 1$ and $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

43 Solving by hand – The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$; for $\alpha = 0.05$, the lower critical value is about .63. – The value of the F statistic is $F = s_1^2/s_2^2 = .3845$. – Conclusion: Because $.3845 < .63$, we reject the null hypothesis in favor of the alternative hypothesis. There is sufficient evidence at the 5% significance level to argue that the variances of the two groups differ.
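The F statistic is simply the ratio of the sample variances. A Python check using the sample variances of Example 12.1 ($s_1^2 = 4102.98$, $s_2^2 = 10{,}669.77$, as quoted in Example 12.6) and the lower critical value .63 used above; the function name `variance_ratio_f` is our own.

```python
def variance_ratio_f(s1_sq, s2_sq):
    """F statistic for H0: sigma1^2 / sigma2^2 = 1."""
    return s1_sq / s2_sq


f = variance_ratio_f(4102.98, 10669.77)
lower_crit = 0.63  # lower critical value for alpha = 0.05, as used in the slide
print(round(f, 4), f < lower_crit)  # 0.3845 True
```

Only the lower tail matters here because $F < 1$; an F value above 1 would instead be compared with the upper critical value $F_{\alpha/2,\nu_1,\nu_2}$.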

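44 Estimating the Ratio of Two Population Variances. From the statistic $F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following interval estimator:
$$\text{LCL} = \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}, \qquad \text{UCL} = \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\nu_2,\nu_1}$$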

45 Example 12.6 – Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 12.1. – Solution: We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} \approx 1.61$ and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} \approx 1.72$. Then
LCL $= (s_1^2/s_2^2)\,[1/F_{\alpha/2,\nu_1,\nu_2}] = (4102.98/10{,}669.77)(1/1.61) = .2388$
UCL $= (s_1^2/s_2^2)\,F_{\alpha/2,\nu_2,\nu_1} = (4102.98/10{,}669.77)(1.72) = .6614$
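The interval arithmetic of Example 12.6 as a short Python sketch. The F critical values 1.61 and 1.72 are the approximate table values quoted in the example, passed in by hand; the function name `variance_ratio_interval` is our own.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_v1_v2, f_v2_v1):
    """CI for sigma1^2/sigma2^2: LCL = ratio / F(a/2, v1, v2), UCL = ratio * F(a/2, v2, v1)."""
    ratio = s1_sq / s2_sq
    return ratio / f_v1_v2, ratio * f_v2_v1


lcl, ucl = variance_ratio_interval(4102.98, 10669.77, 1.61, 1.72)
print(round(lcl, 4), round(ucl, 4))  # 0.2388 0.6614
```

Since the whole interval lies below 1, it agrees with the rejection of $H_0: \sigma_1^2/\sigma_2^2 = 1$ in Example 12.5.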

46 12.6 Inference about the Difference between Two Population Proportions. In this section we deal with two populations whose data are qualitative. When data are qualitative, we can only ask questions about the proportions of occurrence of certain outcomes. Thus, we hypothesize about the difference $p_1 - p_2$ and draw inferences from the hypothesis test.

47 Sampling Distribution of the Difference Between Two Sample Proportions – Two random samples are drawn from two populations. – The number of successes in each sample is recorded. – The sample proportions are computed. Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$. Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$.

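48 – The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, and $n_2(1 - p_2)$ are all greater than or equal to 5. – The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$. – The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$. Because $p_1$ and $p_2$ are unknown, we use their estimates instead; thus we require $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1 - \hat{p}_2)$ to all be greater than or equal to 5.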

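49 Testing the Difference between Two Population Proportions – We hypothesize on the difference between the two proportions, $p_1 - p_2$. – There are two cases to consider. Case 1: $H_0: p_1 - p_2 = 0$. Calculate the pooled proportion $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$; then
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Case 2: $H_0: p_1 - p_2 = D$ ($D$ is not equal to 0). Do not pool the data:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$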

50 Example 12.7 – A research project employing 22,000 American physicians was conducted to discover whether aspirin can prevent heart attacks. – Half of the participants took aspirin, and half took a placebo. – Over a three-year period, 104 of those who took aspirin and 189 of those who took the placebo had heart attacks. – Is aspirin effective in preventing heart attacks?

51 Solution – Identifying the technique: The problem objective is to compare the population of those who take aspirin with the population of those who do not (population 1: aspirin takers; population 2: placebo takers). The data are qualitative (heart attack / no heart attack). The hypotheses are $H_0: p_1 - p_2 = 0$ and $H_1: p_1 - p_2 < 0$. This is case 1, so we use the pooled proportion.

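52 – Solving by hand: the test statistic is $z = -5.02$. For a 5% significance level, the rejection region is $z < -z_{\alpha} = -z_{.05} = -1.645$. Since $-5.02 < -1.645$, we reject the null hypothesis: there is sufficient evidence to infer that aspirin reduces the proportion of heart attacks.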
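The case 1 (pooled) z statistic for Example 12.7 can be computed directly from the counts. The sketch assumes $n_1 = n_2 = 11{,}000$ (half of the 22,000 physicians in each group); the function name `pooled_two_prop_z` is our own.

```python
import math


def pooled_two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 (case 1), using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Example 12.7: population 1 = aspirin takers, population 2 = placebo takers
z = pooled_two_prop_z(104, 11_000, 189, 11_000)
print(round(z, 2), z < -1.645)
```

From the raw counts this gives $z \approx -5.00$; the slide reports $-5.02$, and the small difference presumably comes from rounding intermediate values in the hand calculation. Either way, the statistic falls far inside the rejection region.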

53 Example 12.8 (Marketing application) – Management needs to decide which of two new packaging designs to adopt to help improve sales of a soap. – A study is performed in two communities: design A is distributed in community 1, and design B is distributed in community 2. The old design is still offered in both communities. – For design A to be financially viable, it has to outsell design B by at least 3%.

54 – Summary of the experiment results: Community 1: 580 packages with new design A sold; 324 packages with the old design sold. Community 2: 604 packages with new design B sold; 442 packages with the old design sold. – Use a 1% significance level and perform a test to decide which type of packaging to use.

55 Solution – Identifying the technique: The problem objective is to compare two populations, each consisting of the values “purchase of the new design” and “purchase of the old design”. The data are qualitative. We need to test $p_1 - p_2$. The hypotheses to test are $H_0: p_1 - p_2 = .03$ and $H_1: p_1 - p_2 > .03$. We have to perform case 2 of the test for the difference in proportions (the hypothesized difference is not zero).

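56 Solving by hand: $\hat{p}_1 = 580/904 = .642$, $\hat{p}_2 = 604/1046 = .577$, and
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - .03}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} \approx 1.55$$
The rejection region is $z > z_{\alpha} = z_{.01} = 2.33$. Conclusion: Since $1.55 < 2.33$, do not reject the null hypothesis. There is insufficient evidence to infer that packaging with design A will outsell design B by 3% or more.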
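The case 2 (unpooled) test for Example 12.8 can be checked in Python. Each sample size is the total number of packages sold in that community ($n_1 = 580 + 324 = 904$, $n_2 = 604 + 442 = 1046$); the function name `unpooled_two_prop_z` is our own.

```python
import math


def unpooled_two_prop_z(x1, n1, x2, n2, d):
    """z statistic for H0: p1 - p2 = D with D != 0 (case 2); the samples are not pooled."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d) / se


# Example 12.8: successes are packages sold with the new design in each community
z = unpooled_two_prop_z(580, 580 + 324, 604, 604 + 442, d=0.03)
print(round(z, 2), z > 2.33)  # 1.55 False
```

The data are not pooled here because, under $H_0: p_1 - p_2 = .03$, the two proportions are not assumed equal, so there is no common proportion to estimate.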

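57 Estimating the Difference Between Two Population Proportions. The interval estimator of $p_1 - p_2$ is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Example 12.9: Estimate with 95% confidence the proportion of men who would avoid a heart attack if they take aspirin regularly.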
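A sketch of the 95% interval for Example 12.9, assuming it uses the aspirin data of Example 12.7 ($x_1 = 104$, $x_2 = 189$, $n_1 = n_2 = 11{,}000$, population 1 = aspirin takers) and $z_{.025} = 1.96$. The function name `two_prop_interval` is our own.

```python
import math


def two_prop_interval(x1, n1, x2, n2, z_crit=1.96):
    """Interval estimate of p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


lcl, ucl = two_prop_interval(104, 11_000, 189, 11_000)
print(round(lcl, 4), round(ucl, 4))
```

Under these assumptions the interval for $p_1 - p_2$ is roughly $-0.0108$ to $-0.0047$. Both limits are negative; read as $p_2 - p_1$, between about 0.47% and 1.08% of men would avoid a heart attack by taking aspirin regularly.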

58 12.7 Market Segmentation (Optional). Market segmentation is a statistical analysis aimed at determining the differences that exist between buyers and non-buyers of a company’s product. Statistics plays a major role in market segmentation: – surveys are used to gather the relevant data; – statistical tests are used to differentiate among segments; – sales and profit estimates are derived.

59 Example 12.10 – A new company in the market offers no-wait oil and filter change services for cars. – The company wants to make decisions about where to advertise and the nature of the advertisement. – A sample of 1000 car owners was selected. The drivers were asked to report whether or not they used a no-wait station, as well as several characteristics of their lives (including age).

60 – The research should reveal whether differences in age exist between customers of no-wait services and customers of other types of facilities (see file XM12-10). Solution – Identifying the technique: The problem objective is to compare the population of ages of no-wait customers to the population of ages of other facility users. The data are quantitative, and the samples are independent. The parameter to be tested is $\mu_1 - \mu_2$ ($\mu$ represents mean age).

61 – The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \neq 0$. – Testing the relationship between the two variances indicates that they can be treated as equal, so we run the test for $\mu_1 - \mu_2$ with equal variances.

62 Statistical Inference: A Review of Chapters 11 and 12 (Chapter 13)

63 13.1 Introduction. In this chapter we build a framework that helps decide which technique (or techniques) should be used to solve a problem.

64 Flow chart of techniques for Chapters 11 and 12

65 Problem objective?
– Describe a single population → Data type?
  – Quantitative → Type of descriptive measurement?
    – Central location: t-test & estimator of $\mu$
    – Variability: $\chi^2$-test & estimator of $\sigma^2$
  – Qualitative: Z test & estimator of $p$
– Compare two populations → Data type?
  – Quantitative → Type of descriptive measurement?
    – Central location → Experimental design? (continued)
    – Variability: F-test & estimator of $\sigma_1^2/\sigma_2^2$
  – Qualitative: Z test & estimator of $p_1 - p_2$

66 Experimental design? (continued)
– Independent samples → Population variances?
  – Equal: t-test & estimator of $\mu_1 - \mu_2$ (equal variances)
  – Unequal: t-test & estimator of $\mu_1 - \mu_2$ (unequal variances)
– Matched pairs: t-test & estimator of $\mu_D$
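The flow chart is a small decision tree, so it can be encoded directly as nested conditions. A minimal Python sketch; the function name `choose_technique`, its argument values, and the returned labels are our own, not part of the course material.

```python
def choose_technique(objective, data_type, measure=None, design=None, equal_variances=None):
    """Follow the Chapter 11-12 flow chart and return the name of the technique.

    objective: "describe" (one population) or "compare" (two populations)
    data_type: "quantitative" or "qualitative"
    measure: "location" or "variability" (quantitative data only)
    design: "independent" or "matched pairs" (comparing locations only)
    equal_variances: True or False (independent samples only)
    """
    if data_type == "qualitative":
        if objective == "describe":
            return "z-test & estimator of p"
        return "z-test & estimator of p1 - p2"
    if objective == "describe":
        if measure == "location":
            return "t-test & estimator of mu"
        return "chi-squared test & estimator of sigma^2"
    if measure == "variability":
        return "F-test & estimator of sigma1^2/sigma2^2"
    if design == "matched pairs":
        return "t-test & estimator of mu_D"
    if equal_variances:
        return "t-test & estimator of mu1 - mu2 (equal variances)"
    return "t-test & estimator of mu1 - mu2 (unequal variances)"


# Example 12.1: two populations, quantitative data, central location,
# independent samples, variances judged unequal
print(choose_technique("compare", "quantitative", "location", "independent", False))
```

The branch order mirrors the chart: data type is checked first, then the descriptive measurement, then the experimental design, and finally the population variances.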

67 Summary of statistical inferences: Chapters 11 and 12. Problem objective: Describe a single population. – Data type: Quantitative. Descriptive measurement: Central location. – Parameter: $\mu$ – Test statistic: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ – Interval estimator: $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ – Required condition: Normal population; d.f. $= n - 1$

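68 Summary (continued). Descriptive measurement: Variability. – Parameter: $\sigma^2$ – Test statistic: $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$ – Interval estimator: LCL $= \frac{(n - 1)s^2}{\chi^2_{\alpha/2}}$, UCL $= \frac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$ – Required condition: Normal population; d.f. $= n - 1$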

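69 Summary (continued). – Data type: Qualitative – Parameter: $p$ – Test statistic: $z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$ – Interval estimator: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$ – Required condition: $np \geq 5$ and $n(1 - p) \geq 5$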

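70 Summary (continued). Problem objective: Compare two populations. – Data type: Quantitative. Descriptive measurement: Central location. – Experimental design: Independent samples » Population variances: $\sigma_1^2 = \sigma_2^2$ » Parameter: $\mu_1 - \mu_2$ » Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2(1/n_1 + 1/n_2)}}$ » Interval estimator: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2(1/n_1 + 1/n_2)}$ » Required condition: Normal populations; d.f. $= n_1 + n_2 - 2$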

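71 Summary (continued). Problem objective: Compare two populations. – Data type: Quantitative. Descriptive measurement: Central location. – Experimental design: Independent samples » Population variances: $\sigma_1^2 \neq \sigma_2^2$ » Parameter: $\mu_1 - \mu_2$ » Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ » Interval estimator: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$ » Required condition: Normal populations; d.f. $= \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$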

72 Summary (continued). Problem objective: Compare two populations. – Data type: Quantitative. Descriptive measurement: Central location. – Experimental design: Matched pairs » Parameter: $\mu_D$ » Test statistic: $t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$ » Interval estimator: $\bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}}$ » Required condition: Normal differences; d.f. $= n_D - 1$

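73 Summary (continued). Problem objective: Compare two populations. – Data type: Quantitative. Descriptive measurement: Variability. – Parameter: $\sigma_1^2/\sigma_2^2$ – Test statistic: $F = s_1^2/s_2^2$ – Interval estimator: LCL $= \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}$, UCL $= \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\nu_2,\nu_1}$ – Required condition: Normal populations; $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$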

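74 Summary (continued). Problem objective: Compare two populations. – Data type: Qualitative – Parameter: $p_1 - p_2$ – Test statistic: Case 1, $H_0: p_1 - p_2 = 0$: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$ with $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$. Case 2, $H_0: p_1 - p_2 = D \neq 0$: $z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}}$ – Interval estimator: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$ – Required condition: $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ all $\geq 5$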

