Slide 1: Copyright © 2004 Pearson Education, Inc.
Chapter 11 Multinomial Experiments and Contingency Tables
11-1 Overview
11-2 Multinomial Experiments: Goodness-of-fit
11-3 Contingency Tables: Independence and Homogeneity

Slide 2: Created by Erin Hodgess, Houston, Texas
Section 11-1 & 11-2 Overview and Multinomial Experiments: Goodness of Fit

Slide 3: Overview
• We focus on analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells).
• Use the $\chi^2$ (chi-square) test statistic (Table A-4).
• The goodness-of-fit test uses a one-way frequency table (a single row or column).
• The contingency table uses a two-way frequency table (two or more rows and columns).

Slide 4: Definition: Multinomial Experiment
This is an experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.

Slide 5: Example: Last Digit Analysis
In 2001, Barry Bonds hit 73 home runs. Table 11-2 summarizes the last digits of those home run distances. Verify that the four conditions of a multinomial experiment are satisfied.

Slide 6: Example: Last Digit Analysis (continued)
1. The number of trials (last digits) is the fixed number 73.
2. The trials are independent, because the last digit of the length of one home run does not affect the last digit of the length of any other home run.
3. Each outcome (last digit) is classified into exactly 1 of 10 different categories: 0, 1, …, 9.
4. Finally, if we assume that the home run distances are measured (rather than estimated), the last digits should be equally likely, so that each possible digit has a probability of 1/10.

Slide 7: Definition: Goodness-of-fit test
A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution.

Slide 8: Goodness-of-Fit Test Notation
O represents the observed frequency of an outcome.
E represents the expected frequency of an outcome.
k represents the number of different categories or outcomes.
n represents the total number of trials.

Slide 9: Expected Frequencies
If all expected frequencies are equal, each expected frequency is the sum of all observed frequencies divided by the number of categories: $E = \frac{n}{k}$

Slide 10: Expected Frequencies (continued)
If the expected frequencies are not all equal, each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category: $E = np$
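As a minimal sketch, the two expected-frequency rules can be written in Python. The function names `expected_equal` and `expected_from_probs` are illustrative, not from the text, and the unequal-probability case uses made-up numbers.

```python
def expected_equal(n: int, k: int) -> list[float]:
    """Equal expected frequencies: E = n / k for each of the k categories."""
    return [n / k] * k


def expected_from_probs(n: int, probs: list[float]) -> list[float]:
    """Unequal expected frequencies: E = n * p for each category's claimed probability p."""
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("claimed probabilities should sum to 1")
    return [n * p for p in probs]


# Last-digit example: 73 home runs spread over 10 equally likely digits (7.3 each).
equal_case = expected_equal(73, 10)
# Hypothetical unequal case: 200 trials with claimed probabilities 0.5, 0.3, 0.2 (100, 60, 40).
unequal_case = expected_from_probs(200, [0.5, 0.3, 0.2])
print(equal_case[0], unequal_case)
```

The check on `sum(probs)` reflects that the claimed probabilities of a multinomial experiment cover every category, so they should add to 1.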

Slide 11: Goodness-of-fit Test in Multinomial Experiments
Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Critical values:
1. Found in Table A-4 using $k - 1$ degrees of freedom, where $k$ = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
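The test statistic and the right-tailed decision can be sketched in Python. The counts below are hypothetical, chosen only so the arithmetic is easy to follow; the critical value 7.815 is the Table A-4 value for 3 degrees of freedom at $\alpha = 0.05$ quoted later in this chapter, and `chi_square_statistic` is an illustrative name.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Goodness-of-fit statistic: the sum of (O - E)**2 / E over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: n = 40 trials over k = 4 equally likely categories, so E = 10 each.
observed = [12, 8, 9, 11]
expected = [10.0] * 4
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1          # k - 1 degrees of freedom
critical_value = 7.815          # Table A-4 value for alpha = 0.05 and df = 3
reject = stat > critical_value  # goodness-of-fit tests are right-tailed
print(f"chi2 = {stat:.3f}, df = {df}, reject H0: {reject}")
```

Here the observed counts sit close to the expected ones, so $\chi^2$ is small and the null hypothesis is not rejected.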

Slide 12:
• A large disagreement between observed and expected values will lead to a large value of $\chi^2$ and a small P-value.
• A significantly large value of $\chi^2$ will cause a rejection of the null hypothesis of no difference between the observed and the expected.
• A close agreement between observed and expected values will lead to a small value of $\chi^2$ and a large P-value.

Slide 13: Figure 11-3: Relationships Among Components in Goodness-of-Fit Hypothesis Test

Slide 14: Example: Last Digit Analysis (continued)
Test the claim that the digits do not occur with the same frequency.
$H_0: p_0 = p_1 = \cdots = p_9$
$H_1$: At least one of the probabilities is different from the others.
$\alpha = 0.05$, $k - 1 = 9$, $\chi^2_{0.05,9} = 16.919$
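Table A-4 is not reproduced here, but the right-tail areas behind its critical values can be checked numerically. The sketch below is a stand-in for the table, not the textbook's method: it assumes the closed-form expressions for the upper regularized gamma function $Q(\text{df}/2, x/2)$, which cover every positive integer number of degrees of freedom, and `chi2_upper_tail` is a hypothetical helper name.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square with df degrees of freedom.

    Evaluates the upper regularized gamma function Q(df/2, x/2) using its
    closed forms for integer (even df) and half-integer (odd df) arguments.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: Q = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y**(i - 1/2) / Gamma(i + 1/2)
    term = math.sqrt(y) / math.gamma(1.5)  # the i = 1 term
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (i + 0.5)  # uses Gamma(i + 3/2) = (i + 1/2) * Gamma(i + 1/2)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


# The area to the right of each quoted critical value should be close to its alpha.
for crit, df, alpha in [(16.919, 9, 0.05), (20.090, 8, 0.01), (7.815, 3, 0.05)]:
    print(f"df={df}: P(chi2 > {crit}) = {chi2_upper_tail(crit, df):.4f} (alpha = {alpha})")
```

The same function also turns a test statistic into a P-value, which is how software reports the result instead of a critical value.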

Slide 15: Example: Last Digit Analysis (continued)

Slide 16: Example: Last Digit Analysis (continued)
The test statistic is $\chi^2 = 251.521$. Since the test statistic exceeds the critical value of 16.919, we reject the null hypothesis. There is sufficient evidence to support the claim that the last digits do not occur with the same relative frequency.

Slide 17: Example: Last Digit Analysis (continued)

Slide 18: Example: Detecting Fraud
In the Chapter Problem, it was noted that statistics can be used to detect fraud. Table 11-1 lists the percentages for leading digits. Test the claim that there is a significant discrepancy between the leading digits expected from Benford’s Law and the leading digits from the 784 checks.
$H_0: p_1 = 0.301, p_2 = 0.176, p_3 = 0.125, p_4 = 0.097, p_5 = 0.079, p_6 = 0.067, p_7 = 0.058, p_8 = 0.051, p_9 = 0.046$
$H_1$: At least one of the proportions is different from the claimed values.
$\alpha = 0.01$, $k - 1 = 8$, $\chi^2_{0.01,8} = 20.090$
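Because the claimed proportions are unequal, each expected count is $E = np$ with $n = 784$. A short Python sketch of that step follows. The observed counts from Table 11-1 are not reproduced in this transcript, so the sketch stops at the expected frequencies rather than recomputing the test statistic. As a side check, it assumes the usual statement of Benford’s Law, $P(d) = \log_{10}(1 + 1/d)$, and confirms that rounding it to three decimals gives the proportions listed in $H_0$.

```python
import math

n_checks = 784
# Claimed Benford proportions for leading digits 1 through 9, as stated in H0.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

# Expected frequency for each leading digit: E = n * p.
expected = [n_checks * p for p in benford]
for digit, e in zip(range(1, 10), expected):
    print(f"leading digit {digit}: E = {e:.3f}")

# Side check (assumes Benford's Law in the form P(d) = log10(1 + 1/d)):
# rounding it to three decimals reproduces the proportions above.
assert all(
    abs(round(math.log10(1 + 1 / d), 3) - p) < 1e-12 for d, p in zip(range(1, 10), benford)
)
```

Every expected count is far above 5, so the large-sample requirement for the $\chi^2$ approximation is comfortably met.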

Slide 19: Example: Detecting Fraud (continued)

Slide 20: Example: Detecting Fraud (continued)
The test statistic is $\chi^2 = 3650.251$. Since the test statistic exceeds the critical value of 20.090, we reject the null hypothesis. There is sufficient evidence to support the claim that there is a significant discrepancy between the leading digits expected from Benford’s Law and the leading digits from the 784 checks.

Slide 21: Example: Detecting Fraud (continued)

Slide 22: Example: Detecting Fraud (continued)

Slide 23: Section 11-3 Contingency Tables: Independence and Homogeneity

Slide 24: Definition: Contingency Table (or two-way frequency table)
A contingency table is a table in which frequencies correspond to two variables. (One variable is used to categorize rows, and a second variable is used to categorize columns.) Contingency tables have at least two rows and at least two columns.

Slide 26: Definition: Test of Independence
This method tests the null hypothesis that the row variable and column variable in a contingency table are not related. (The null hypothesis is the statement that the row and column variables are independent.)

Slide 27: Assumptions
1. The sample data are randomly selected.
2. The null hypothesis $H_0$ is the statement that the row and column variables are independent; the alternative hypothesis $H_1$ is the statement that the row and column variables are dependent.
3. For every cell in the contingency table, the expected frequency $E$ is at least 5. (There is no requirement that every observed frequency must be at least 5.)

Slide 28: Test of Independence
Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Critical values:
1. Found in Table A-4 using degrees of freedom $= (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns.
2. Tests of independence are always right-tailed.

Slide 29: $E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$, where the grand total is the total number of all observed frequencies in the table.

Slide 30: Tests of Independence
$H_0$: The row variable is independent of the column variable.
$H_1$: The row variable is dependent on (related to) the column variable.
This procedure cannot be used to establish a direct cause-and-effect link between the variables in question. Dependence means only that there is a relationship between the two variables.

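Slide 31: Expected Frequency for Contingency Tables
$E = \text{grand total} \cdot \frac{\text{row total}}{\text{grand total}} \cdot \frac{\text{column total}}{\text{grand total}} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$
The first form is $E = np$, where $n$ is the grand total and $p$ is the probability of a cell: under independence, the probability of falling in a given row times the probability of falling in a given column.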

Slide 32: Observed and Expected Frequencies
We will use the mortality table from the Titanic to find expected frequencies.

          Men    Women  Boys  Girls  Total
Survived  332    318    29    27     706
Died      1360   104    35    18     1517
Total     1692   422    64    45     2223

For the upper left hand cell, we find: $E = \frac{(706)(1692)}{2223} = 537.360$

Slide 33: Observed and Expected Frequencies (continued)
Find the expected frequency for the lower left hand cell, assuming independence between the row variable and the column variable: $E = \frac{(1517)(1692)}{2223} = 1154.640$

Slide 34: Observed and Expected Frequencies (continued)
Observed frequencies, with expected frequencies in parentheses:

          Men              Women            Boys            Girls           Total
Survived  332 (537.360)    318 (134.022)    29 (20.326)     27 (14.291)     706
Died      1360 (1154.640)  104 (287.978)    35 (43.674)     18 (30.709)     1517
Total     1692             422              64              45              2223

To interpret this result for the lower left hand cell, we can say that although 1360 men actually died, we would have expected 1154.64 men to die if survival were independent of whether the person is a man, woman, boy, or girl.
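The expected frequencies on slides 32 to 34 follow from $E = (\text{row total})(\text{column total})/(\text{grand total})$. A minimal Python sketch reproduces them from the observed Titanic counts; the helper name `expected_counts` is illustrative.

```python
def expected_counts(table: list[list[int]]) -> list[list[float]]:
    """Expected cell frequencies under independence:
    E = (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed Titanic counts from slide 32.
# Rows: survived, died. Columns: men, women, boys, girls.
titanic = [
    [332, 318, 29, 27],
    [1360, 104, 35, 18],
]
expected = expected_counts(titanic)
for label, row in zip(["Survived", "Died"], expected):
    print(label, [round(e, 3) for e in row])
```

Each column of expected values adds back to the observed column total (for example, 537.360 + 1154.640 = 1692), which is a quick way to catch arithmetic slips.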

Slide 35: Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl.
$H_0$: Whether a person survived is independent of whether the person is a man, woman, boy, or girl.
$H_1$: Surviving the Titanic and being a man, woman, boy, or girl are dependent.

Slide 36: Example (continued)
$\chi^2 = \frac{(332-537.36)^2}{537.36} + \frac{(318-134.022)^2}{134.022} + \frac{(29-20.326)^2}{20.326} + \frac{(27-14.291)^2}{14.291} + \frac{(1360-1154.64)^2}{1154.64} + \frac{(104-287.978)^2}{287.978} + \frac{(35-43.674)^2}{43.674} + \frac{(18-30.709)^2}{30.709}$
$\chi^2 = 78.481 + 252.555 + 3.702 + 11.302 + 36.525 + 117.536 + 1.723 + 5.260 = 507.084$

Slide 37: Example (continued)
The number of degrees of freedom is $(r - 1)(c - 1) = (2 - 1)(4 - 1) = 3$, and $\chi^2_{0.05,3} = 7.815$. Because the test statistic of 507.084 exceeds the critical value, we reject the null hypothesis. Survival and being a man, woman, boy, or girl are dependent.

Slide 38: Test statistic: $\chi^2 = 507.084$ with $\alpha = 0.05$ and $(r - 1)(c - 1) = (2 - 1)(4 - 1) = 3$ degrees of freedom. Critical value: $\chi^2 = 7.815$ (from Table A-4).
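The whole calculation, from expected frequencies to the test statistic, degrees of freedom, and decision, can be sketched in Python. Working at full precision gives a value that agrees with the slide’s 507.084 up to the rounding of the intermediate expected frequencies. `chi_square_independence` is a hypothetical helper name; the counts and the critical value are the ones on the slides.

```python
def chi_square_independence(table: list[list[int]]) -> tuple[float, int]:
    """Return the chi-square test statistic and degrees of freedom (r - 1)(c - 1)
    for an r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Survived / died counts for men, women, boys, girls.
titanic = [[332, 318, 29, 27], [1360, 104, 35, 18]]
stat, df = chi_square_independence(titanic)
critical_value = 7.815  # chi-square_{0.05,3} from Table A-4
print(f"chi2 = {stat:.3f}, df = {df}, reject H0: {stat > critical_value}")
```

The same function applies unchanged to a test of homogeneity, since the two tests differ in how the data were sampled, not in how $\chi^2$ is computed.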

Slide 39: Figure 11-8: Relationships Among Components in $\chi^2$ Test of Independence

Slide 40: Definition: Test of Homogeneity
In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristics.

Slide 41: How to distinguish between a test of homogeneity and a test for independence: Were predetermined sample sizes used for different populations (test of homogeneity), or was one big sample drawn so both row and column totals were determined randomly (test of independence)?

Slide 42: Example: Using Table 10-7 with a 0.05 significance level, test the effect of pollster gender on survey responses by men.

Slide 43: Example (continued)
$H_0$: The proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women.
$H_1$: The proportions are different.

Slide 44: Example (continued)

Slide 45: Example (continued)
The Minitab display includes the test statistic of $\chi^2 = 6.529$ and a P-value of 0.011. Using the P-value approach, we reject the null hypothesis of equal (homogeneous) proportions because the P-value of 0.011 is less than 0.05. There is sufficient evidence to reject the claim of equal proportions.
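Assuming Table 10-7 is a 2 × 2 table (agree/disagree responses by male/female interviewer), the test has $(2 - 1)(2 - 1) = 1$ degree of freedom. In that case the Minitab P-value can be checked without software support, because a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so $P(\chi^2 > x) = \operatorname{erfc}(\sqrt{x/2})$. A minimal sketch under that assumption (the helper name is illustrative):

```python
import math


def chi2_pvalue_df1(stat: float) -> float:
    """Right-tail P-value for a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is the square of a standard normal Z, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


p_value = chi2_pvalue_df1(6.529)  # test statistic from the Minitab display
alpha = 0.05
print(f"P-value = {p_value:.3f}; reject H0: {p_value <= alpha}")
```

If the table had more rows or columns, this shortcut would not apply and a general chi-square right-tail computation would be needed.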

Slide 46: Section 11-4 One-Way ANOVA

Slide 47: Overview
• Analysis of variance (ANOVA) is a method for testing the hypothesis that three or more population means are equal.
• For example: $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$; $H_1$: At least one mean is different.

Slide 48: ANOVA methods require the F-distribution
1. The F-distribution is not symmetric; it is skewed to the right.
2. The values of F can be 0 or positive; they cannot be negative.
3. There is a different F-distribution for each pair of degrees of freedom for the numerator and denominator.
Critical values of F are given in Table A-7.

Slide 49: Figure 11-9: F-distribution

Slide 50: One-Way ANOVA: An Approach to Understanding ANOVA
1. Understand that a small P-value (such as 0.05 or less) leads to the rejection of the null hypothesis of equal means. With a large P-value (such as greater than 0.05), fail to reject the null hypothesis of equal means.
2. Develop an understanding of the underlying rationale by studying the example in this section.

Slide 51: 3. Become acquainted with the nature of the SS (sum of squares) and MS (mean square) values and their role in determining the F test statistic, but use statistical software packages or a calculator for finding those values.

Slide 52: Definition: Treatment (or factor)
A treatment (or factor) is a property or characteristic that allows us to distinguish the different populations from one another. Use computer software or a TI-83 Plus for ANOVA calculations if possible.

Slide 53: One-Way ANOVA Assumptions
1. The populations have approximately normal distributions.
2. The populations have the same variance $\sigma^2$ (or standard deviation $\sigma$).
3. The samples are simple random samples.
4. The samples are independent of each other.
5. The different samples are from populations that are categorized in only one way.

Slide 54: Procedure for testing $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots$
1. Use STATDISK, Minitab, Excel, or a TI-83 calculator to obtain results.
2. Identify the P-value from the display.
3. Form a conclusion based on these criteria:
• If P-value $\le \alpha$, reject the null hypothesis of equal means.
• If P-value $> \alpha$, fail to reject the null hypothesis of equal means.

Slide 55: Example: Readability Scores
Given the readability scores summarized in Table 11-1 and a significance level of $\alpha = 0.05$, use STATDISK, Minitab, Excel, or a TI-83 Plus calculator to test the claim that the three samples come from populations with means that are not all the same.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least one of the means is different from the others.
Please refer to the displays on the next slide. The displays show a P-value of 0.000562 (rounded to 0.001 in some displays). Because the P-value is less than the significance level of $\alpha = 0.05$, we reject the null hypothesis of equal means.

Slide 56: Example: Readability Scores

Slide 57: Example: Readability Scores (cont’d)

Slide 58: Example: Readability Scores (continued)
There is sufficient evidence to support the claim that the three population means are not all the same. We conclude that those books have readability levels that are not all the same.

Slide 59: ANOVA Fundamental Concept
Estimate the common value of $\sigma^2$ using:
1. The variance between samples (also called variation due to treatment) is an estimate of the common population variance $\sigma^2$ that is based on the variability among the sample means.
2. The variance within samples (also called variation due to error) is an estimate of the common population variance $\sigma^2$ based on the sample variances.

Slide 60: Figure 11-10: Relationships Among Components of ANOVA

Slide 61: ANOVA Fundamental Concept
Test statistic for one-way ANOVA: $F = \frac{\text{variance between samples}}{\text{variance within samples}}$
An excessively large F test statistic is evidence against equal population means.

Slide 62: Calculations with Equal Sample Sizes
• Variance between samples $= n s_{\bar{x}}^2$, where $s_{\bar{x}}^2$ = variance of the sample means
• Variance within samples $= s_p^2$, where $s_p^2$ = pooled variance (or the mean of the sample variances)

Slide 63: Example: Sample Calculations (Table 10-9)

Slide 64: Example: Sample Calculations
Calculate the variance between samples, variance within samples, and the F test statistic.
$s_{\bar{x}}^2 = \frac{(5.5)^2 + (6.0)^2 + (6.0)^2 - \left[(5.5 + 6.0 + 6.0)^2 / 3\right]}{3 - 1} = \frac{102.25 - 102.08333}{2} = 0.0833$
$n s_{\bar{x}}^2 = 4(0.0833) = 0.3332$

Slide 65: Example: Sample Calculations (continued)
$s_p^2 = \frac{3.0 + 2.0 + 2.0}{3} = 2.3333$
$F = \frac{n s_{\bar{x}}^2}{s_p^2} = \frac{0.3332}{2.3333} = 0.1428$
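The same arithmetic can be sketched with Python’s `statistics` module, using the sample means, sample variances, and $n = 4$ from the slides. Without intermediate rounding, $F$ comes out to exactly $1/7 \approx 0.1429$; the slide’s 0.1428 reflects rounding $s_{\bar{x}}^2$ to 0.0833. The helper name `f_equal_sizes` is illustrative.

```python
from statistics import mean, variance


def f_equal_sizes(sample_means: list[float], sample_variances: list[float], n: int) -> float:
    """One-way ANOVA F statistic when all k samples have the same size n."""
    between = n * variance(sample_means)  # n * s_xbar^2; variance() divides by k - 1
    within = mean(sample_variances)       # s_p^2, the mean of the sample variances
    return between / within


F = f_equal_sizes([5.5, 6.0, 6.0], [3.0, 2.0, 2.0], n=4)
print(f"F = {F:.4f}")
```

`statistics.variance` uses the $k - 1$ divisor, which matches the $3 - 1$ denominator in the slide’s formula for $s_{\bar{x}}^2$.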

Slide 66: Critical Value of F
• Right-tailed test
• Degrees of freedom with $k$ samples of the same size $n$: numerator df $= k - 1$; denominator df $= k(n - 1)$

Slide 67: Calculations with Unequal Sample Sizes
$F = \frac{\text{variance between samples}}{\text{variance within samples}} = \frac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2 / (k - 1)}{\sum (n_i - 1) s_i^2 / \sum (n_i - 1)}$
where $\bar{\bar{x}}$ = mean of all sample scores combined, $k$ = number of population means being compared, $n_i$ = number of values in the $i$th sample, $\bar{x}_i$ = mean of the values in the $i$th sample, and $s_i^2$ = variance of the values in the $i$th sample.
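A minimal Python sketch of this formula, working from summary statistics (sample sizes, means, and variances). `f_from_summaries` is a hypothetical helper, and the unequal-size samples in the usage lines are made-up numbers, not data from the text. With equal sample sizes the formula reduces to $n s_{\bar{x}}^2 / s_p^2$, so the slide 64 and 65 values give back $F = 1/7$.

```python
def f_from_summaries(ns: list[int], means: list[float], variances: list[float]) -> float:
    """One-way ANOVA F from sample sizes n_i, sample means, and sample variances s_i^2."""
    k = len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / sum(ns)  # mean of all scores combined
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means)) / (k - 1)
    within = sum((n - 1) * v for n, v in zip(ns, variances)) / sum(n - 1 for n in ns)
    return between / within


# Equal sizes: the values from slides 64 and 65.
F_equal = f_from_summaries([4, 4, 4], [5.5, 6.0, 6.0], [3.0, 2.0, 2.0])
# Hypothetical unequal sizes (made-up numbers, for illustration only).
F_unequal = f_from_summaries([2, 3, 4], [2.0, 5.0, 4.0], [2.0, 1.0, 14 / 3])
print(round(F_equal, 4), round(F_unequal, 4))
```

The grand mean is weighted by sample size, which is what "mean of all sample scores combined" means when the samples differ in size.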

Slide 68: Key Components of ANOVA Method
SS(total), or total sum of squares, is a measure of the total variation (around $\bar{\bar{x}}$) in all the sample data combined.
Formula 11-1: $SS(\text{total}) = \sum (x - \bar{\bar{x}})^2$

Slide 69: Key Components of ANOVA Method
SS(treatment) is a measure of the variation between the samples. In one-way ANOVA, SS(treatment) is sometimes referred to as SS(factor). Because it is a measure of variability between the sample means, it is also referred to as SS(between groups) or SS(between samples).
Formula 11-2: $SS(\text{treatment}) = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2 = \sum n_i (\bar{x}_i - \bar{\bar{x}})^2$

Slide 70: Key Components of ANOVA Method
SS(error) is a sum of squares representing the variability that is assumed to be common to all the populations being considered.

Slide 71: Key Components of ANOVA Method (continued)
Formula 11-3: $SS(\text{error}) = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + \cdots + (n_k - 1)s_k^2 = \sum (n_i - 1) s_i^2$

Slide 72: Key Components of ANOVA Method
Formula 11-4: $SS(\text{total}) = SS(\text{treatment}) + SS(\text{error})$

Slide 73: Mean Squares (MS)
Mean squares are the sums of squares SS(treatment) and SS(error) divided by their corresponding numbers of degrees of freedom. MS(treatment) is the mean square for treatment, obtained as follows:
Formula 11-5: $MS(\text{treatment}) = \frac{SS(\text{treatment})}{k - 1}$

Slide 74: Mean Squares (MS) (continued)
MS(error) is the mean square for error, obtained as follows:
Formula 11-6: $MS(\text{error}) = \frac{SS(\text{error})}{N - k}$
Formula 11-7: $MS(\text{total}) = \frac{SS(\text{total})}{N - 1}$
Here $N$ is the total number of values in all samples combined.

Slide 75: Test Statistic for ANOVA with Unequal Sample Sizes
Formula 11-8: $F = \frac{MS(\text{treatment})}{MS(\text{error})}$
• Numerator df $= k - 1$
• Denominator df $= N - k$
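Formulas 11-1 through 11-8 can be combined in one short Python sketch that starts from raw sample values and also checks the identity $SS(\text{total}) = SS(\text{treatment}) + SS(\text{error})$. The three samples are hypothetical, chosen only to keep the arithmetic small, and `one_way_anova` is an illustrative name rather than any package’s API.

```python
from statistics import mean


def one_way_anova(samples: list[list[float]]) -> dict:
    """Sums of squares, mean squares, F, and (numerator, denominator) df for one-way ANOVA."""
    values = [x for s in samples for x in s]
    N, k = len(values), len(samples)
    grand_mean = mean(values)
    sample_means = [mean(s) for s in samples]
    ss_total = sum((x - grand_mean) ** 2 for x in values)                    # Formula 11-1
    ss_treatment = sum(len(s) * (m - grand_mean) ** 2
                       for s, m in zip(samples, sample_means))                # Formula 11-2
    # Squared deviations within each sample add up to (n_i - 1) s_i^2 (Formula 11-3).
    ss_error = sum((x - m) ** 2 for s, m in zip(samples, sample_means) for x in s)
    ms_treatment = ss_treatment / (k - 1)                                     # Formula 11-5
    ms_error = ss_error / (N - k)                                             # Formula 11-6
    return {
        "SS(total)": ss_total,
        "SS(treatment)": ss_treatment,
        "SS(error)": ss_error,
        "F": ms_treatment / ms_error,                                         # Formula 11-8
        "df": (k - 1, N - k),
    }


# Hypothetical samples of sizes 2, 3, and 4.
result = one_way_anova([[1.0, 3.0], [4.0, 5.0, 6.0], [2.0, 3.0, 4.0, 7.0]])
print(result)
```

In practice the F value would then be compared with a Table A-7 critical value using the returned degrees of freedom, or converted to a P-value by software.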

Slide 76: Example: Readability Scores

