1 Copyright (c) Bani K. Mallick. STAT 651 Lecture #16

2 Topics in Lecture #16: Inference about two population proportions

3 Book Sections Covered in Lecture #16: Chapter 10.3

4 Lecture #15 Review: Categorical Data. In general, we can discuss a problem where the outcome is binary, the success probability is $\pi$, and the number of experiments is $n$. $X$ = the number of successes in the experiment. $\hat{\pi} = X/n$ = the fraction of successes in the experiment.

5 Lecture #15 Review: Categorical Data. The number of successes $X$ in $n$ experiments, each with probability of success $\pi$, is called a binomial random variable. There is a formula for this: $\Pr(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^k (1-\pi)^{n-k}$, where 0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc.
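
As a rough illustration of the binomial formula on slide 5, the sketch below evaluates Pr(X = k) with Python's standard library. The values n = 8 and pi = 0.5 are hypothetical, chosen only so the arithmetic is easy to follow; the function name is also just for this example.

```python
from math import comb

def binomial_pmf(k: int, n: int, pi: float) -> float:
    """Pr(X = k) for X ~ Binomial(n, pi): n!/(k!(n-k)!) * pi^k * (1-pi)^(n-k)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

# Hypothetical values, for illustration only.
n, pi = 8, 0.5
probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]

print(binomial_pmf(3, n, pi))  # 56 / 256 = 0.21875
print(sum(probs))              # the probabilities over k = 0..n add up to 1
```

Dividing k by n turns the same probabilities into the distribution of the sample fraction.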

6 Lecture #15 Review: Categorical Data. The fraction of successes $\hat{\pi} = X/n$ in $n$ experiments, each with probability of success $\pi$, also has a formula: $\Pr(\hat{\pi} = k/n) = \Pr(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^k (1-\pi)^{n-k}$. The binomial formula is used to understand the properties of the sample fraction, e.g., its standard deviation.

7 Lecture #15 Review: If you code your attribute as “0” and “1” in SPSS, then the sample fraction is the same as the sample mean of these “data”. For example, let the “data” be 0,1,0,0,0,1,0,1. Then $n = 8$ and $\hat{\pi} = 3/8$. What is the sample mean of these “data”?
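
The question on this slide can be checked directly. The sketch below uses the eight 0/1 values from the slide and compares the sample fraction with the sample mean; the variable names are only for this example.

```python
from statistics import mean

# The 0/1 "data" from the slide: 1 = success, 0 = failure.
data = [0, 1, 0, 0, 0, 1, 0, 1]

n = len(data)                    # 8 observations
successes = sum(data)            # 3 ones
sample_fraction = successes / n  # 3/8
sample_mean = mean(data)         # ordinary mean of the 0/1 values

print(n, successes, sample_fraction, sample_mean)  # 8 3 0.375 0.375
```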


9 Lecture #15 Review: Categorical Data. A $(1-\alpha)100\%$ CI for the population fraction $\pi$ is $\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$, where $z_{\alpha/2}$ is found by looking up $1 - \alpha/2$ in Table 1.
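
A minimal sketch of the interval reviewed on slide 9, assuming the usual large-sample form: sample fraction plus or minus a normal quantile times the estimated standard error. The counts (60 successes in 150 trials) are hypothetical, and the normal quantile comes from Python's standard library rather than from Table 1.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, conf: float = 0.95):
    """Large-sample CI for a population fraction: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # plays the role of the Table 1 lookup
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical counts, for illustration only.
lower, upper = proportion_ci(60, 150)
print(round(lower, 4), round(upper, 4))  # roughly 0.3216 and 0.4784
```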

10 Lecture #15 Review: Sample Size Calculations. If you want a $(1-\alpha)100\%$ CI to be $\hat{\pi} \pm E$, you should set $n = z_{\alpha/2}^2\,\pi(1-\pi)/E^2$.

11 Lecture #15 Review: Sample Size Calculations. The small problem is that you do not know $\pi$. You have two choices: (a) make a guess for $\pi$; (b) set $\pi = 0.50$ and calculate (most conservative, since it results in the largest sample size).
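
The two choices can be compared numerically. This sketch assumes the standard sample-size formula n = z^2 * pi * (1 - pi) / E^2, where E is the desired half-width of the interval (the name `margin` and the values 0.03 and 0.2 are hypothetical).

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin: float, pi: float = 0.5, conf: float = 0.95) -> int:
    """Smallest n for which z * sqrt(pi(1-pi)/n) is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * pi * (1 - pi) / margin**2)

# Choice (b): the conservative value pi = 0.50.
n_conservative = sample_size(0.03)
# Choice (a): a hypothetical guess of pi = 0.2.
n_guess = sample_size(0.03, pi=0.2)

print(n_conservative, n_guess)  # 1068 683
```

Because pi(1 - pi) is largest at pi = 0.50, no guess can require more observations than the conservative choice.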

12 Comparison of Two Population Proportions. In some cases, we may want to compare two population proportions $\pi_1$ and $\pi_2$. The null hypothesis is $H_0: \pi_1 = \pi_2$. This is the same as $H_0: \pi_1 - \pi_2 = 0$. There are two ways to test this hypothesis. One is via what is called a chi-squared statistic, which gives you only a p-value. This is bad: why?

13 Comparison of Two Population Proportions. In some cases, we may want to compare two population proportions $\pi_1$ and $\pi_2$. The null hypothesis is $H_0: \pi_1 - \pi_2 = 0$. There are two ways to test this hypothesis. One is via what is called a chi-squared statistic, which gives you only a p-value. This is bad: why? If you reject, you have no idea how different the populations are!
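
To make the point concrete, here is a sketch of a Pearson chi-squared test for a 2 x 2 table, assuming no continuity correction. The counts are hypothetical. The p-value uses the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal, so its upper tail probability is erfc(sqrt(x/2)).

```python
from math import erfc, sqrt

def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p_value

# Hypothetical counts: group 1 has 30 successes and 20 failures, group 2 has 20 and 30.
stat, p_value = chi_squared_2x2(30, 20, 20, 30)
print(stat, round(p_value, 4))  # 4.0 0.0455
```

The output is a test statistic and a p-value only; nothing in it says how far apart the two population fractions are.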

14 Comparison of Two Population Proportions. The null hypothesis is $H_0: \pi_1 - \pi_2 = 0$. The other way is to form a CI for the difference in population proportions $\pi_1 - \pi_2$. The estimate of this difference is simply the difference in the sample fractions: $\hat{\pi}_1 - \hat{\pi}_2$.

15 Comparison of Two Population Proportions. The standard error of the difference in the sample fractions is $\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_2}$. The usual way to form a CI is to replace the unknown population fractions by the sample fractions.

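16 Comparison of Two Population Proportions. The estimated standard error of the difference in the sample fractions is $\sqrt{\hat{\pi}_1(1-\hat{\pi}_1)/n_1 + \hat{\pi}_2(1-\hat{\pi}_2)/n_2}$. The $(1-\alpha)100\%$ CI then is $\hat{\pi}_1 - \hat{\pi}_2 \pm z_{\alpha/2}\sqrt{\hat{\pi}_1(1-\hat{\pi}_1)/n_1 + \hat{\pi}_2(1-\hat{\pi}_2)/n_2}$.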
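
The hand calculation on slides 14 to 16 can be sketched as a small function, assuming the usual large-sample form: the difference in sample fractions, plus or minus a normal quantile times the estimated standard error. The counts below are hypothetical and are not the boxers data.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """CI for pi_1 - pi_2 from x1 successes in n1 trials and x2 successes in n2 trials."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2                                       # estimate of pi_1 - pi_2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return diff, se, (diff - z * se, diff + z * se)

# Hypothetical counts, for illustration only.
diff, se, (lower, upper) = two_proportion_ci(120, 200, 80, 200)
print(round(diff, 4), round(se, 4), round(lower, 3), round(upper, 3))
```

If the interval excludes 0, the data are inconsistent with equal population fractions at that confidence level, and the endpoints show how large the difference plausibly is.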

17 Comparison of Two Population Proportions: Boxers versus Briefs. Most books force you to compute this by hand. For female preferences in men: For male preferences: Think the populations are different?

18 Comparison of Two Population Proportions: Boxers versus Briefs. The estimated standard error of the difference in the sample fractions is 0.04944.

19 Comparison of Two Population Proportions: Boxers versus Briefs. Putting this together, we get that the 95% CI is from 0.2664 - 1.96 * 0.04944 = 0.17 up to 0.2664 + 1.96 * 0.04944 = 0.36. So the 95% CI is from 0.17 to 0.36. What is this a CI for? What is the conclusion?

20 Comparison of Two Population Proportions: Boxers versus Briefs. The 95% CI is from 0.17 to 0.36. What is this a CI for? The difference between females and males in the population fractions preferring boxers, which is estimated to lie between 0.17 and 0.36. What is the conclusion? The fraction of females who prefer men to wear boxers is higher than the fraction of males who do, by 17 to 36 percentage points.
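
The arithmetic behind the boxers interval is easy to verify. The sketch below uses only the three numbers quoted on the slides (difference 0.2664, standard error 0.04944, normal quantile 1.96).

```python
# Values quoted on the slides.
diff = 0.2664   # difference in sample fractions
se = 0.04944    # estimated standard error of the difference
z = 1.96        # normal quantile for a 95% CI

lower = diff - z * se
upper = diff + z * se

print(round(lower, 2), round(upper, 2))  # 0.17 0.36
print(lower > 0)                         # True: the interval excludes 0
```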

21 Comparison of Two Population Proportions: Remarkably, but perhaps not surprisingly, you do not have to compute these confidence intervals by hand! The idea: simply pretend, and I do mean pretend, that the binary outcomes are real numbers and run your ordinary t-test CI, reading the unequal variance line. The results will be slightly different from your hand calculations, but actually a bit more accurate.
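
One way to see why the t-test output is close to, but not identical with, the hand calculation is to write out the sample variance of 0/1 data. The short derivation below is a sketch; the notation $y_i$ for the coded observations and $s^2$ for their sample variance is introduced here and is not from the slides.

```latex
% For 0/1 data y_1, ..., y_n with sample mean \hat{\pi}: since y_i^2 = y_i,
\sum_{i=1}^{n} (y_i - \hat{\pi})^2 = \sum_{i=1}^{n} y_i - n\hat{\pi}^2 = n\hat{\pi}(1 - \hat{\pi}),
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \hat{\pi})^2 = \frac{n}{n-1}\,\hat{\pi}(1 - \hat{\pi}).

% Standard error of the difference used on the unequal-variance t-test line:
\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
  = \sqrt{\frac{\hat{\pi}_1(1 - \hat{\pi}_1)}{n_1 - 1} + \frac{\hat{\pi}_2(1 - \hat{\pi}_2)}{n_2 - 1}}.
```

So the two standard errors differ only through $n - 1$ versus $n$ in the denominators, and the t-test replaces the normal quantile with a t quantile. With samples of a few hundred, both changes are small.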

22 Illustration with the Boxers Problem. The value “1” indicates a preference for boxers. Note how women have a higher preference for boxers than do men, in this sample.

23 Illustration with the Boxers Problem

24 Illustration with the Boxers Problem. SPSS output, Independent Samples Test, Boxer versus Briefs Preference:
Levene's Test for Equality of Variances: F = 49.523, Sig. = .000
t-test for Equality of Means:
Row | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference, Lower | Upper
Equal variances assumed | 5.373 | 363 | .000 | .2664 | 4.957E-02 | .1689 | .3639
Equal variances not assumed | 5.393 | 361.642 | .000 | .2664 | 4.939E-02 | .1692 | .3635
Difference in sample means = 0.2664. Standard error of this difference = 0.04939.

25 Illustration with the Boxers Problem: hand CI is 0.17 to 0.36: note similarities! In the Independent Samples Test output from slide 24, the “Equal variances not assumed” row has t = 5.393, df = 361.642, Sig. (2-tailed) = .000. p-value = 0.000 (as reported to three decimals). Note how you use the unequal variances p-value.

26 Illustration with the Boxers Problem: hand CI is 0.17 to 0.36: note similarities! In the Independent Samples Test output from slide 24, the “Equal variances not assumed” row gives a 95% Confidence Interval of the Difference with Lower = .1692 and Upper = .3635. The 95% CI from SPSS is 0.1692 to 0.3635. Nearly the same as the hand calculation. Men and women have different preferences at even 99.9% confidence.

27 US Availability and Rating: Are Better Beers More Widely Available? SPSS output, Group Statistics, Availability in the U.S.:
Very Good versus Other | N | Mean | Std. Deviation | Std. Error Mean
Very Good | 11 | .45 | .52 | .16
Fair or Good | 24 | .75 | .44 | 9.03E-02
The “data” are coded as 0 = not widely available, 1 = widely available. With the “data” coded as 0 and 1, this means that in the sample, 45% of the very good beers were widely available.

28 US Availability and Rating: Are Better Beers More Widely Available? In the Group Statistics output from slide 27, the Fair or Good row has N = 24 and Mean = .75. With the “data” coded as 0 and 1, this means that in the sample, 75% of the fair/good beers were widely available.

29 US Availability and Rating: Are Better Beers More Widely Available? SPSS output, Independent Samples Test, Availability in the U.S.:
Levene's Test for Equality of Variances: F = 3.169, Sig. = .084
t-test for Equality of Means:
Row | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference, Lower | Upper
Equal variances assumed | -1.734 | 33 | .092 | -.30 | .17 | -.64 | 5.12E-02
Equal variances not assumed | -1.628 | 16.864 | .122 | -.30 | .18 | -.68 | 8.77E-02
The Sig. (2-tailed) value is the p-value for the hypothesis that the two population fractions are the same.

30 Comparison of Two Population Proportions: Note that the p-value on the unequal variances line was 0.122 > 0.10. What does this mean?

31 Comparison of Two Population Proportions: Note that the p-value on the unequal variances line was 0.122 > 0.10. What does this mean? There is no evidence that those beers which are very good have any more or less national availability than those which are good or fair.
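
The “Equal variances not assumed” line for the beer data can be reproduced by treating the 0/1 codes as ordinary numbers. This is a sketch under one assumption: the counts 5 of 11 and 18 of 24 are inferred from the reported group sizes and means (.45 and .75) and are not stated on the slides. The function name is only for this example.

```python
from math import sqrt
from statistics import mean, variance

# Assumed counts (inferred from N and Mean in Group Statistics): 1 = widely available.
very_good = [1] * 5 + [0] * 6     # N = 11, mean about .45
fair_good = [1] * 18 + [0] * 6    # N = 24, mean = .75

def welch(a, b):
    """Unequal-variance t statistic, standard error, and approximate df for mean(a) - mean(b)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2   # s^2 / n for each group
    diff = mean(a) - mean(b)
    se = sqrt(v1 + v2)
    t = diff / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return diff, se, t, df

diff, se, t, df = welch(very_good, fair_good)
print(round(diff, 2), round(se, 2), round(t, 3), round(df, 3))
```

Under the assumed counts, the rounded values agree with the SPSS row (mean difference -.30, standard error .18, t = -1.628, df = 16.864). Turning them into a CI or p-value additionally needs a t quantile with that df, which is omitted here.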

32 Construction Example. The construction example was based on a survey made available to me. I will look at the percentages of males sampled in Texas and in states outside of Texas. If these were random samples, they would be a measure of how different states are in their gender distributions in the construction industry.

33 Construction Data: Gender Differences by Texas or Not (1 = male). Something strange: 86% of the sample outside Texas is male; 26% of the sample in Texas is male.

34 Construction Data: Gender Differences by Texas or Not (1 = male). Something strange: 86% of the sample outside Texas is male; 26% of the sample in Texas is male. Not surprising: p-value = 0.000.

35 Comparison of Two Population Proportions: Please study the slides for the next lecture before coming to class. The material is somewhat difficult, and if you do not look at the slides and try to understand them, you will find my lecture all but impossible to understand.

