1 Chi-Squared Distributions Inference for Categorical Data and Multiple Groups.



2 Comparing Several Distributions
- In our last lesson, we used a χ² goodness-of-fit (GoF) test to compare a single distribution with a known model.
- In this lesson, we will compare several distributions with one another and determine whether they are similar to each other (or "fit" each other).
- The test that we will use in this lesson is the χ² test of homogeneity.

3 Inference for Two-Way Tables
- The first step in the overall test for comparing several proportions is to arrange the data in a two-way table that gives counts for both successes and failures (on the calculator we will use matrices).
- Example: Here is a two-way table (rows × columns):

                 Relapse
                 No    Yes
  Desipramine    14    10
  Lithium         6    18
  Placebo         4    20

- The sample is an SRS, and individuals were randomly assigned to each group.
- We call this a 3 × 2 table because it has 3 rows and 2 columns.

4 The Problem of Multiple Comparisons
Call the population proportions of successes in the three groups p1, p2, and p3. To compare these three population proportions, we might use the two-sample z procedures several times:
- Test H0: p1 = p2 to see if the success rate of desipramine differs from that of lithium.
- Test H0: p1 = p3 to see if desipramine differs from a placebo.
- Test H0: p2 = p3 to see if lithium differs from a placebo.
The weakness of doing three tests is that we get three p-values, one for each test taken alone, and none of them answers the overall question of whether the three proportions differ.
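One way to see why three separate tests are a problem is to estimate how often at least one of them would reject by chance. The sketch below is a rough illustration only: it assumes a significance level of 0.05 for each test (a value not stated on this slide) and treats the three tests as independent, which pairwise tests on shared groups are not.

```python
# Rough illustration: assumes independent tests, which is not strictly
# true here because each pair of tests shares a treatment group.
alpha = 0.05     # assumed significance level for each individual test
num_tests = 3    # p1 vs p2, p1 vs p3, p2 vs p3

# Chance that at least one test rejects when all three nulls are true
familywise_error = 1 - (1 - alpha) ** num_tests
print(round(familywise_error, 4))
```

Under these assumptions the chance of at least one false rejection is noticeably larger than the 0.05 used for each test alone, which is the motivation for a single overall test.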

5 Chi-Squared Test of Homogeneity
Rather than comparing several distributions two at a time, it is much easier to use just one test. One test that allows us to do this is the chi-squared test of homogeneity. This test is almost exactly the same as the chi-squared goodness-of-fit test. Why do we need a new name? The goodness-of-fit test compares counts with a model. But with a homogeneity test, we have no model. We are comparing the data with itself. We are looking to see if data across differing populations are homogeneous (that is, whether the distributions are the same across the populations).

6 Chi-Squared Test of Homogeneity
The assumptions and conditions for all our chi-squared tests are the same. Do you remember what they are? Can you state them without looking at your notes?
To find the expected cell counts, use: expected count = (row total × column total) / table total
Also, degrees of freedom = (r - 1)(c - 1), where r is the number of rows and c is the number of columns.
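The expected-count rule (row total times column total, divided by the table total) and the degrees-of-freedom formula can be sketched in a few lines of Python. The function names `expected_counts` and `degrees_of_freedom` are hypothetical helpers introduced for illustration; the data are the relapse counts from the earlier two-way table.

```python
def expected_counts(observed):
    """Return expected counts: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """Return (r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Relapse table from the earlier slide; columns are No, Yes
relapse = [
    [14, 10],  # Desipramine
    [6, 18],   # Lithium
    [4, 20],   # Placebo
]

print(expected_counts(relapse))
print(degrees_of_freedom(relapse))
```

Because each treatment group has the same size here, every row gets the same expected counts.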

7 Example 1
Medical researchers enlist 90 random subjects for an experiment comparing treatments for depression. The subjects were randomly divided into three groups and given pills to take for a period of three months. Unknown to the subjects, one group received a placebo, the second group the “natural” remedy St. John’s wort, and the third group the prescription drug Posrex. After six months, psychologists and physicians (who did not know which treatment each person received) evaluated the subjects to see if their depression had returned.

                           Treatment
  Diagnosis                Placebo   St. John's wort   Posrex
  Depression returned         24           22             14
  No sign of depression        6            8             16

8 Chi-Squared Test of Homogeneity
Determine if the recurrence of depression is the same for all three treatments.
Step 1: Identify the population Parameter, state the null and alternative Hypotheses, and determine what you are trying to do and what the question is asking.
- We want to know whether the recurrence of depression differs among the three treatments.
- H0: The recurrence of depression is the same for the three treatments.
- HA: The recurrence of depression differs among the three treatments.

9 The Solution
Step 2: Verify the Assumptions by checking the conditions.
- Data are counts assumption
  - Data are counts condition: yes, the data are counts of categorical data.
- Independence assumption
  - Randomization condition: the subjects were randomly selected and randomly assigned to each treatment.
  - 10% condition: it is reasonable to assume that we have less than 10% of the populations.

10 The Solution
Step 2: Verify the Assumptions by checking the conditions.
- Sample size assumption
  - All expected counts must be greater than 5.
  - Since all expected counts are greater than 5, we can proceed.

Observed counts, with expected counts in parentheses:

                           Treatment
  Diagnosis                Placebo   St. John's wort   Posrex
  Depression returned      24 (20)       22 (20)       14 (20)
  No sign of depression     6 (10)        8 (10)       16 (10)

11 The Solution
Step 3: If conditions are met, Name the inference procedure, find the Test statistic, and Obtain the p-value in carrying out the inference.
- We will perform a χ² test of homogeneity.
- Since there are 2 rows and 3 columns, there are (2 - 1) × (3 - 1) = 2 degrees of freedom.
- Test statistic: χ² ≈ 8.4 (the sum of (observed - expected)² / expected over all six cells)
- P-value ≈ 0.015
- Before we state the conclusion, check to see if there is one cell that has a large effect on our p-value.
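The test statistic and p-value for Example 1 can be checked with a short standard-library sketch. The helper names are hypothetical, and the p-value function uses a closed-form upper-tail formula that holds only when the degrees of freedom are even (which is the case here); for other tables a statistics library or calculator would be the usual choice.

```python
import math


def chi_squared_statistic(observed):
    """Sum (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / table_total
            stat += (count - expected) ** 2 / expected
    return stat


def p_value_even_df(stat, df):
    """Upper-tail chi-squared probability; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = stat / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term            # adds (stat/2)^k / k!
        term *= half / (k + 1)
    return math.exp(-half) * total


# Columns: Placebo, St. John's wort, Posrex
depression = [
    [24, 22, 14],  # Depression returned
    [6, 8, 16],    # No sign of depression
]

df = (len(depression) - 1) * (len(depression[0]) - 1)
statistic = chi_squared_statistic(depression)
p_value = p_value_even_df(statistic, df)
print(round(statistic, 2), df, round(p_value, 3))
```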

12 The Solution
Step 4: Make a decision and State the Conclusion in the context of the problem.
With a p-value of 0.015, we reject the null hypothesis at the α = 0.05 level. There is strong evidence that the tested treatments are not all equally effective in preventing the recurrence of depression. Looking at the table, it appears that the people who took the prescription drug Posrex were more likely to remain free of depression than those who took the placebo or St. John's wort.

13 Example 2
An independent random sample of 1067 voters was taken from random locations throughout the country. Each voter’s political affiliation was recorded. Are political party affiliations the same throughout the country?

                Northeast   South   Midwest   West   Total
  Republican        86       137      142      93     458
  Democrat         158       112       79     123     472
  Other             22        23       41      51     137
  Total            266       272      262     267    1067

14 The Solution
Determine if political party affiliation is the same across the country.
Step 1: Identify the population Parameter, state the null and alternative Hypotheses, and determine what you are trying to do and what the question is asking.
- We want to know whether political affiliation is the same throughout the country (that is, whether the regions are homogeneous).
- H0: Political affiliation is the same throughout the country.
- HA: Political affiliation differs throughout the country.

15 The Solution
Step 2: Verify the Assumptions by checking the conditions.
- Data are counts assumption
  - Data are counts condition: yes, the data are counts of categorical data.
- Independence assumption
  - Randomization condition: the data come from a random sample.
  - 10% condition: it is reasonable to assume that we have less than 10% of the populations.

16 The Solution
Step 2: Verify the Assumptions by checking the conditions.
- Sample size assumption
  - All expected counts must be greater than 5.
  - Since all expected counts are greater than 5, we can proceed.

Observed counts, with expected counts in parentheses:

                Northeast     South         Midwest       West          Total
  Republican     86 (114.2)   137 (116.8)   142 (112.5)    93 (114.6)    458
  Democrat      158 (117.7)   112 (120.3)    79 (115.9)   123 (118.1)    472
  Other          22 (34.2)     23 (34.9)     41 (33.6)     51 (34.3)     137
  Total         266           272           262           267           1067

17 The Solution
Step 3: If conditions are met, Name the inference procedure, find the Test statistic, and Obtain the p-value in carrying out the inference.
- We will perform a χ² test of homogeneity.
- Since there are 3 rows and 4 columns, there are (3 - 1) × (4 - 1) = 6 degrees of freedom.
- Test statistic: χ² ≈ 66.8
- P-value ≈ 1.844 × 10^-12
- Before we state the conclusion, check to see if there is one cell that has a large effect on our p-value.
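One way to carry out the "check each cell" step is to list every cell's contribution, (observed - expected)² / expected, and see whether a single cell dominates the total. The sketch below does this for the voter table; the variable names are illustrative, and using the per-cell contribution as the measure of "effect" is an assumption about what the slide intends.

```python
regions = ["Northeast", "South", "Midwest", "West"]
parties = ["Republican", "Democrat", "Other"]
observed = [
    [86, 137, 142, 93],    # Republican
    [158, 112, 79, 123],   # Democrat
    [22, 23, 41, 51],      # Other
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Contribution of each cell to the chi-squared statistic
components = {}
for i, party in enumerate(parties):
    for j, region in enumerate(regions):
        expected = row_totals[i] * col_totals[j] / table_total
        components[(party, region)] = (observed[i][j] - expected) ** 2 / expected

statistic = sum(components.values())
largest_cell = max(components, key=components.get)

# Print cells from largest to smallest contribution
for cell, value in sorted(components.items(), key=lambda item: item[1], reverse=True):
    print(cell, round(value, 2))
print("chi-squared:", round(statistic, 1))
```

In this table the Democrat/Northeast cell contributes the most, but several other cells are also large, so the result does not hinge on one cell alone.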

18 The Solution
Step 4: Make a decision and State the Conclusion in the context of the problem.
With a p-value of 1.844 × 10^-12, we reject the null hypothesis. There is extremely strong evidence that political affiliation is not homogeneous; in other words, political affiliation is not the same throughout the country, and some regions favor one political affiliation over the others.

