# Chapter 11: Comparisons Involving Proportions and a Test of Independence (© 2009 Econ-2030 Applied Statistics, Dr. Tadesse)


Slide 1: Chapter 11: Comparisons Involving Proportions and a Test of Independence

- Inferences About the Difference Between Two Population Proportions
- Test of Independence: Contingency Tables
- Hypothesis Test for Proportions of a Multinomial Population

Slide 2: Inferences About the Difference Between Two Population Proportions

- Interval Estimation of $p_1 - p_2$
- Hypothesis Tests About $p_1 - p_2$

Slide 3: Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

- Expected Value

$$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$$

- Standard Deviation (Standard Error)

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

where:
$n_1$ = size of sample taken from population 1
$n_2$ = size of sample taken from population 2

Slide 4: Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

If the sample sizes are large, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution.

The sample sizes are sufficiently large if all of these conditions are met:

- $n_1 p_1 > 5$
- $n_1(1 - p_1) > 5$
- $n_2 p_2 > 5$
- $n_2(1 - p_2) > 5$

Slide 5: Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

Slide 6: Interval Estimation of $p_1 - p_2$

- Interval Estimate

$$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$

Slide 7: Interval Estimation of $p_1 - p_2$

- Example: Market Research Associates

Market Research Associates is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product.

The new campaign has been initiated with TV and newspaper advertisements running for three weeks.

Slide 8: Interval Estimation of $p_1 - p_2$

A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product.

Do the data support the position that the advertising campaign has provided an increased awareness of the client’s product?

Slide 9: Point Estimator of the Difference Between Two Population Proportions

$$\bar{p}_1 - \bar{p}_2 = \frac{120}{250} - \frac{60}{150} = .48 - .40 = .08$$

$p_1$ = proportion of the population of households “aware” of the product after the new campaign
$p_2$ = proportion of the population of households “aware” of the product before the new campaign
$\bar{p}_1$ = sample proportion of households “aware” of the product after the new campaign
$\bar{p}_2$ = sample proportion of households “aware” of the product before the new campaign

Slide 10: Interval Estimation of $p_1 - p_2$

For $\alpha = .05$, $z_{.025} = 1.96$:

$$.08 \pm 1.96(.0510) = .08 \pm .10$$

Hence, the 95% confidence interval for the difference in before and after awareness of the product is -.02 to +.18.
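The interval above can be checked with a short Python sketch. The function name `two_proportion_interval` and its arguments are illustrative choices, not part of the slides; it simply applies the unpooled standard error formula to the survey counts.

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z=1.96):
    """Interval estimate of p1 - p2 using the unpooled standard error.

    x1, x2 are the counts of "successes"; n1, n2 are the sample sizes.
    z is the critical value z_{alpha/2} (1.96 for 95% confidence).
    """
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    point_estimate = p1_bar - p2_bar
    std_error = math.sqrt(
        p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2
    )
    margin = z * std_error
    return point_estimate - margin, point_estimate + margin

# Market Research Associates: 120 of 250 aware after, 60 of 150 aware before.
lower, upper = two_proportion_interval(120, 250, 60, 150)
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Because the interval contains zero, it does not by itself establish that awareness increased.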

Slide 11: Hypothesis Tests About $p_1 - p_2$

Hypotheses. We focus on tests involving no difference between the two population proportions (i.e., $p_1 = p_2$):

- Left-tailed: $H_0: p_1 - p_2 \ge 0$, $H_a: p_1 - p_2 < 0$
- Right-tailed: $H_0: p_1 - p_2 \le 0$, $H_a: p_1 - p_2 > 0$
- Two-tailed: $H_0: p_1 - p_2 = 0$, $H_a: p_1 - p_2 \ne 0$

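Slide 12: Hypothesis Tests About $p_1 - p_2$

Pooled Estimate of Standard Error of $\bar{p}_1 - \bar{p}_2$

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where:

$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$$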

Slide 13: Hypothesis Tests About $p_1 - p_2$

Test Statistic

$$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Slide 14: Hypothesis Tests About $p_1 - p_2$

- Example: Market Research Associates

Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?

Slide 15: Hypothesis Tests About $p_1 - p_2$

1. Develop the hypotheses.

$H_0: p_1 - p_2 \le 0$
$H_a: p_1 - p_2 > 0$

$p_1$ = proportion of the population of households “aware” of the product after the new campaign
$p_2$ = proportion of the population of households “aware” of the product before the new campaign

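Slide 16: Hypothesis Tests About $p_1 - p_2$

2. Specify the level of significance: $\alpha = .05$

3. Compute the value of the test statistic.

$$\bar{p} = \frac{250(.48) + 150(.40)}{250 + 150} = \frac{180}{400} = .45$$

$$\sqrt{.45(.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = .0514$$

$$z = \frac{.48 - .40}{.0514} = \frac{.08}{.0514} = 1.56$$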

Slide 17: Hypothesis Tests About $p_1 - p_2$

Using the Critical Value Approach

4. Determine the critical value and rejection rule. For $\alpha = .05$, $z_{.05} = 1.645$. Reject $H_0$ if $z \ge 1.645$.

5. Compare the test statistic with the critical value. Because 1.56 < 1.645, we cannot reject $H_0$.

We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.

Slide 18: Hypothesis Tests About $p_1 - p_2$

Using the $p$-Value Approach

4. Compute the $p$-value. For $z = 1.56$, the $p$-value = .0594.

5. Compare the $p$-value with the significance level. Because $p$-value > $\alpha = .05$, we cannot reject $H_0$.

We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
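The pooled test can be reproduced with the following sketch. The helper name `pooled_z_test` is hypothetical, and the upper-tail normal area is computed with `math.erfc` rather than read from a table, so the $p$-value comes out slightly above .0594 because $z$ is not rounded to 1.56 first.

```python
import math

def pooled_z_test(x1, n1, x2, n2):
    """Right-tailed z test of H0: p1 - p2 <= 0 using the pooled proportion."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    # Pooled estimate of the common proportion under H0 (p1 = p2).
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / std_error
    # Upper-tail area under the standard normal curve.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Market Research Associates: after campaign (120 of 250), before (60 of 150).
z, p_value = pooled_z_test(120, 250, 60, 150)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

Both approaches agree: the statistic falls short of 1.645 and the $p$-value exceeds .05, so $H_0$ is not rejected.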

Slide 19: Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population

1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, $f_i$, for each of the $k$ categories.
3. Assuming $H_0$ is true, compute the expected frequency, $e_i$, in each category by multiplying the category probability by the sample size.

Slide 20: Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population

4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where:
$f_i$ = observed frequency for category $i$
$e_i$ = expected frequency for category $i$
$k$ = number of categories

Note: The test statistic has a chi-square distribution with $k - 1$ df provided that the expected frequencies are 5 or more for all categories.

Slide 21: Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population

5. Rejection rule:

- $p$-value approach: Reject $H_0$ if $p$-value $< \alpha$
- Critical value approach: Reject $H_0$ if $\chi^2 > \chi^2_\alpha$

where $\alpha$ is the significance level and there are $k - 1$ degrees of freedom.

Slide 22: Multinomial Distribution Goodness of Fit Test

- Example: Finger Lakes Homes

Finger Lakes Homes manufactures four models of prefabricated homes: a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.

Slide 23: Multinomial Distribution Goodness of Fit Test

The number of homes sold of each model for 100 sales over the past two years is shown below.

| Model | Colonial | Log | Split-Level | A-Frame |
|---|---|---|---|---|
| # Sold | 30 | 20 | 35 | 15 |

Slide 24: Multinomial Distribution Goodness of Fit Test

- The Hypotheses

$H_0: p_C = p_L = p_S = p_A = .25$
$H_a$: The population proportions are not $p_C = .25$, $p_L = .25$, $p_S = .25$, and $p_A = .25$

where:
$p_C$ = population proportion that purchase a colonial
$p_L$ = population proportion that purchase a log cabin
$p_S$ = population proportion that purchase a split-level
$p_A$ = population proportion that purchase an A-frame

Slide 25: Multinomial Distribution Goodness of Fit Test

- Rejection Rule

With $\alpha = .05$ and $k - 1 = 4 - 1 = 3$ degrees of freedom, reject $H_0$ if $p$-value $< .05$ or $\chi^2 > 7.815$.

Figure: chi-square distribution with "Do Not Reject $H_0$" to the left of 7.815 and "Reject $H_0$" to the right.

Slide 26: Multinomial Distribution Goodness of Fit Test

- Expected Frequencies

$e_1 = .25(100) = 25$, $e_2 = .25(100) = 25$, $e_3 = .25(100) = 25$, $e_4 = .25(100) = 25$

- Test Statistic

$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$$

Slide 27: Multinomial Distribution Goodness of Fit Test

- Conclusion Using the Critical Value Approach

$\chi^2 = 10 > 7.815$

We reject, at the .05 level of significance, the assumption that there is no home style preference.

Slide 28: Multinomial Distribution Goodness of Fit Test

- Conclusion Using the $p$-Value Approach

| Area in Upper Tail | .10 | .05 | .025 | .01 | .005 |
|---|---|---|---|---|---|
| $\chi^2$ Value (df = 3) | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |

Because $\chi^2 = 10$ is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.

The $p$-value $< \alpha$. We can reject the null hypothesis.
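As a check on the goodness of fit example, here is a minimal Python sketch using only the standard library. The function names are illustrative. The tail-area helper relies on a closed form that holds only for 3 degrees of freedom, which is an assumption that fits this example and not a general chi-square routine.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (f_i - e_i)^2 / e_i over all categories."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

def upper_tail_area_df3(x):
    """Upper-tail area of the chi-square distribution with exactly 3 df.

    Closed form valid only for df = 3; other df need a different formula
    or a statistics library.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Finger Lakes Homes: colonial, log, split-level, A-frame.
observed = [30, 20, 35, 15]
n = sum(observed)
expected = [0.25 * n] * len(observed)  # H0: every proportion equals .25

chi_square = chi_square_statistic(observed, expected)
p_value = upper_tail_area_df3(chi_square)
print(f"chi-square = {chi_square:.1f}, p-value = {p_value:.4f}")
```

The computed tail area falls between .01 and .025, in line with the table lookup above.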

Slide 29: Test of Independence: Contingency Tables

1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, $f_{ij}$, for each cell of the contingency table.
3. Compute the expected frequency, $e_{ij}$, for each cell.

$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$

Slide 30: Test of Independence: Contingency Tables

4. Compute the test statistic.

$$\chi^2 = \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

5. Determine the rejection rule. Reject $H_0$ if $p$-value $< \alpha$ or $\chi^2 > \chi^2_\alpha$,

where $\alpha$ is the significance level and, with $n$ rows and $m$ columns, there are $(n - 1)(m - 1)$ degrees of freedom.

Slide 31: Contingency Table (Independence) Test

- Example: Finger Lakes Homes

Each home sold by Finger Lakes Homes can be classified according to price and to style. Finger Lakes’ manager would like to determine if the price of the home and the style of the home are independent variables.

Slide 32: Contingency Table (Independence) Test

The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either \$99,000 or less or more than \$99,000.

| Price | Colonial | Log | Split-Level | A-Frame |
|---|---|---|---|---|
| ≤ \$99,000 | 18 | 6 | 19 | 12 |
| > \$99,000 | 12 | 14 | 16 | 3 |

Slide 33: Contingency Table (Independence) Test

- Hypotheses

$H_0$: Price of the home is independent of the style of the home that is purchased
$H_a$: Price of the home is not independent of the style of the home that is purchased

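Slide 34: Contingency Table (Independence) Test

- Expected Frequencies

| Price | Colonial | Log | Split-Level | A-Frame | Total |
|---|---|---|---|---|---|
| ≤ \$99K | 18 | 6 | 19 | 12 | 55 |
| > \$99K | 12 | 14 | 16 | 3 | 45 |
| Total | 30 | 20 | 35 | 15 | 100 |

Each expected frequency is the row total times the column total divided by the sample size. For example, the expected number of colonials priced at \$99K or less is $(55)(30)/100 = 16.5$.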

Slide 35: Contingency Table (Independence) Test

- Rejection Rule

With $\alpha = .05$ and $(2 - 1)(4 - 1) = 3$ d.f., reject $H_0$ if $p$-value $< .05$ or $\chi^2 > 7.815$.

- Test Statistic

$$\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.149$$

Slide 36: Contingency Table (Independence) Test

- Conclusion Using the Critical Value Approach

$\chi^2 = 9.149 > 7.815$

We reject, at the .05 level of significance, the assumption that the price of the home is independent of the style of home that is purchased.

Slide 37: Contingency Table (Independence) Test

- Conclusion Using the $p$-Value Approach

| Area in Upper Tail | .10 | .05 | .025 | .01 | .005 |
|---|---|---|---|---|---|
| $\chi^2$ Value (df = 3) | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |

Because $\chi^2 = 9.149$ is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025.

The $p$-value $< \alpha$. We can reject the null hypothesis.
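The independence test can be reproduced with a short Python sketch. The function names `expected_frequencies` and `independence_statistic` are illustrative, and the critical value 7.815 is taken from the df = 3 table above, not computed.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def independence_statistic(table):
    """Chi-square statistic summed over every cell of the contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )

# Columns: colonial, log, split-level, A-frame.
observed = [
    [18, 6, 19, 12],  # $99,000 or less
    [12, 14, 16, 3],  # more than $99,000
]

expected = expected_frequencies(observed)
chi_square = independence_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815  # alpha = .05, df = 3, read from the table above
reject = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}, reject H0: {reject}")
```

The sketch gives a statistic of about 9.149 with 3 degrees of freedom, matching the value computed with the test statistic, and it exceeds the critical value.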