DSCI 346 Yamasaki Lecture 3 Chi Square Tests.

1 DSCI 346 Yamasaki Lecture 3 Chi Square Tests

2 Chi Square Test of Homogeneity
Consider the following table:

                  A
              yes    no
B    yes      X11    X12    R1
     no       X21    X22    R2
              C1     C2     N

P(A=yes) = C1/N
P(B=no) = R2/N
P(A=yes and B=yes) = X11/N
P(A=yes | B=no) = X21/R2

DSCI 346 Lect 3 (16 pages)

3 Recall if A and B are independent P(AB) = P(A)P(B)
The question of interest: are A and B independent (also called homogeneous)? That is, are the proportions of B the same no matter which level of A you are on, and vice versa?

Recall that if A and B are independent, P(AB) = P(A)P(B). Let’s look at P(A=yes, B=yes):

From our table: P(A=yes, B=yes) = X11/N
Under independence: P(A=yes)P(B=yes) = (C1/N)(R1/N)

Setting the two equal gives X11/N = (C1/N)(R1/N), so X11 = R1C1/N. We can go through the same process for all the cells.
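
The cell-by-cell calculation above can be sketched in a few lines of Python. The counts below are made up for illustration (they are not from the lecture), and the function name `expected_counts` is an assumption of this sketch.

```python
def expected_counts(observed):
    """Return the expected count R_i * C_j / N for every cell under independence."""
    row_totals = [sum(row) for row in observed]          # R1, R2, ...
    col_totals = [sum(col) for col in zip(*observed)]    # C1, C2, ...
    n = sum(row_totals)                                  # N
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table of observed counts (not from the lecture).
observed = [[30, 20],
            [10, 40]]
expected = expected_counts(observed)
print(expected)
```

Summing the rows and columns of `expected` gives back the same totals as `observed`, which is the sense in which the margins stay fixed.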

4 Observed vs. Expected (if independent)

Observed                   Expected (if independent)
X11    X12    R1           R1C1/N    R1C2/N    R1
X21    X22    R2           R2C1/N    R2C2/N    R2
C1     C2     N            C1        C2        N

Note: the “margins” (i.e., R1, R2, C1, C2, N) stay the same.

5 So now we have to come up with a measure of closeness.
Since we only have a sample (with its inherent variability), we generally don’t get exactly what we expect. So the question becomes: is the sample close enough to what we expected under independence to be consistent with that hypothesis? So now we have to come up with a measure of closeness.

Observed: Xij
Expected: RiCj/N

1st try:

6 So what is considered “close” is relative to what is expected.
2nd try:

Consider if we observed 1,000,060 and we expected 1,000,010. Those numbers would be considered pretty close. However, if we observed 60 and expected 10, those two numbers would be considered pretty far apart, even though the difference is 50 in both cases. So what is considered “close” is relative to what is expected.

3rd try:
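
One way to make "close relative to what is expected" concrete is to divide the squared difference by the expected count, which is the form each term of the chi-square statistic takes. The sketch below applies that to the two pairs of numbers quoted above; the function name `relative_squared_gap` is hypothetical.

```python
def relative_squared_gap(observed, expected):
    """Squared difference scaled by the expected count."""
    return (observed - expected) ** 2 / expected

# Both pairs differ by exactly 50, but the scaled gaps are very different.
large_counts = relative_squared_gap(1_000_060, 1_000_010)
small_counts = relative_squared_gap(60, 10)
print(large_counts, small_counts)
```

The first value is a tiny fraction while the second is 250, matching the intuition that 60 versus 10 is the far larger discrepancy.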

7 BINGO! Third time’s the charm
The statistic χ² = sum over all cells of (Xij - RiCj/N)² / (RiCj/N), that is, (Observed - Expected)² / Expected added up over the cells, has a known (at least in the large-sample approximation sense) distribution. More importantly, that distribution is tabled.

In this last example, both variables, A and B, had just two levels. This calculation does work when there are more than two levels. Suppose that in each cell we were off by just a little bit. If we had a lot of cells, then our statistic could get big simply because we had a lot of cells, so we have to take into account the size of the table. Lots of rows and columns means that the statistic can be larger and still be “close”. What we have are degrees of freedom associated with the statistic that are based on the number of rows and columns.
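
As a minimal sketch, the chi-square statistic can be computed by summing (Observed - Expected)² / Expected over every cell. The observed and expected counts below are hypothetical values chosen for illustration, and `chi_square_statistic` is a name introduced here.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2 x 2 example (not from the lecture); expected = R_i * C_j / N.
observed = [[30, 20], [10, 40]]
expected = [[20.0, 30.0], [20.0, 30.0]]
stat = chi_square_statistic(observed, expected)
print(round(stat, 4))
```

Because every cell adds a non-negative term, a table with more cells can produce a larger total even when each cell is only slightly off, which is why the degrees of freedom matter.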

8 For 2 rows and 2 columns, the degrees of freedom = 1.
Now let’s look at the Chi Square table. It’s laid out like the t table, where rows correspond to degrees of freedom and columns correspond to tail areas (α). So with 1 degree of freedom, P(χ² > 3.84) = .05.

Note: Since we squared the differences, we do not have negative values; there is no such thing as one-tailed.
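
The table entry for 1 degree of freedom can be checked without a table, because a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The sketch below uses only Python's standard library; the function name is an assumption of this example, and the formula applies to 1 degree of freedom only.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

p = chi2_upper_tail_df1(3.84)
print(round(p, 3))
```

The result rounds to .05, in line with the critical value 3.84 used for a 2 x 2 table.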

9 Example: A study to determine the effectiveness of a drug for arthritis compared two groups, each consisting of 200 arthritic patients. One group was inoculated with the serum; the other received a placebo. After a period of time, each person in the study was asked to state whether his arthritic condition had improved. The following results were observed.

[Table of observed counts: columns Treated, Untreated; rows Improved, Not Improved]

Do the data provide sufficient evidence to indicate that the serum was effective in improving the condition of the arthritic patients? Use α = .05

10 H0: Treatment and Improvement are independent
[Table of observed counts: columns Treated, Untreated; rows Improved, Not Improved]

H0: Treatment and Improvement are independent
HA: Treatment and Improvement are not independent
α = .05, degrees of freedom = 1, critical value = 3.84

[Table of expected counts (if independent)]

11 OR because margins stay the same (fixed)
Since the computed χ² > 3.84, we reject the null hypothesis and conclude that treatment and improvement are not independent.

12 3. Chi square tests are easily generalized to more than 2 x 2 tables.
Notes:
1. For the 2 x 2 table (like we just did), the chi square test is equivalent to doing a two-tailed z test comparing two proportions.
2. Chi square tests are automatically two tailed (i.e., we don’t double p-values).
3. Chi square tests are easily generalized to tables larger than 2 x 2.
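
A standard fact behind the first note is that, for a 2 x 2 table, the chi-square statistic (without a continuity correction) equals the square of the pooled two-proportion z statistic. The sketch below checks this numerically on made-up counts; both function names and the data are assumptions of this example, not part of the lecture.

```python
import math

def chi_square_2x2(table):
    """Chi-square statistic for a 2 x 2 table, no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat

def pooled_two_proportion_z(table):
    """z statistic comparing the first-row proportion between the two columns."""
    (a, b), (c, d) = table
    n1, n2 = a + c, b + d                  # column totals act as group sizes
    p1, p2 = a / n1, b / n2                # first-row proportion in each group
    pooled = (a + b) / (n1 + n2)           # proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

table = [[30, 20], [10, 40]]               # hypothetical counts
chi2 = chi_square_2x2(table)
z = pooled_two_proportion_z(table)
print(round(chi2, 4), round(z ** 2, 4))
```

Squaring z discards its sign, which is one way to see why the chi-square test is automatically two tailed.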

13 Consider this example relating color and size of automobile

                      Color
Size       red    white   black   other
compact    x11    x12     x13     x14     r1
mid        x21    x22     x23     x24     r2
luxury     x31    x32     x33     x34     r3
           c1     c2      c3      c4      N

Note that since the margins are fixed, we can get the last row and column of the expecteds by subtraction.

Degrees of freedom = (#rows - 1) * (#columns - 1)
For the car example, df = (3 - 1) * (4 - 1) = 6
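
The link between fixed margins and degrees of freedom can be illustrated with a short sketch: only (#rows - 1) * (#columns - 1) expected counts are computed directly, and the last row and column follow by subtraction from the totals. The margin values below are hypothetical, as are the function names.

```python
def expected_by_subtraction(row_totals, col_totals):
    """Fill a table of expected counts, computing only the 'free' cells directly."""
    n = sum(row_totals)
    rows, cols = len(row_totals), len(col_totals)
    exp = [[0.0] * cols for _ in range(rows)]
    # Free cells: every cell outside the last row and last column.
    for i in range(rows - 1):
        for j in range(cols - 1):
            exp[i][j] = row_totals[i] * col_totals[j] / n
    # Last column: row total minus the cells already filled in that row.
    for i in range(rows - 1):
        exp[i][cols - 1] = row_totals[i] - sum(exp[i][:cols - 1])
    # Last row: column total minus the cells already filled in that column.
    for j in range(cols):
        exp[rows - 1][j] = col_totals[j] - sum(exp[i][j] for i in range(rows - 1))
    return exp

def degrees_of_freedom(rows, cols):
    return (rows - 1) * (cols - 1)

row_totals = [40, 35, 25]        # hypothetical r1, r2, r3
col_totals = [30, 30, 20, 20]    # hypothetical c1, c2, c3, c4
exp = expected_by_subtraction(row_totals, col_totals)
df = degrees_of_freedom(len(row_totals), len(col_totals))
print(df, exp)
```

Every entry produced by subtraction agrees with r_i * c_j / N, so only 6 of the 12 cells in a 3 x 4 table are free to vary.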

14 Chi Square Goodness of Fit Tests
Suppose you have a theoretical distribution you believe should be in effect. You gather a sample. Is your sample consistent with what you expect?

Example: Suppose the operations manager of a taxi cab company believes that the demand for cabs is fairly level throughout Monday-Friday and 25% less on weekends. To investigate, the manager selects 20 samples of each day and tabulates the totals for each day. The results are shown below:

Sunday   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Total
4,502    6,623    …         …           …          …        4,363      56,000

How do you get expected? Under the manager’s belief, Saturday and Sunday should each count for only .75 of a day, so the week would be 5 + 2(.75) = 6.5 full days. On a full day he would expect 56,000/6.5 = 8,615.38. On Saturday and on Sunday he would expect 75% of that: .75 * 8,615.38 = 6,461.54
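
The expected counts follow from simple arithmetic on the total of 56,000 and the 6.5 equivalent full days. A minimal sketch, with variable names chosen for this example:

```python
total_demand = 56_000
weekend_weight = 0.75                              # a weekend day counts as 75% of a weekday

equivalent_full_days = 5 + 2 * weekend_weight      # 5 weekdays + 2 weekend days = 6.5
weekday_expected = total_demand / equivalent_full_days
weekend_expected = weekend_weight * weekday_expected

print(round(weekday_expected, 2), round(weekend_expected, 2))
```

Five weekday expecteds plus two weekend expecteds add back to the observed total, as they must for a goodness of fit test.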

15 H1: Customer demand is something else; α = .05; χ²(6, .05) = 12.5916

H0: Customer demand is evenly spread throughout the week and 25% less on weekends
H1: Customer demand is something else
α = .05
df = # categories - 1 = 7 - 1 = 6
χ²(6, .05) = 12.5916

χ² = (4,502 - 6,461.54)²/6,461.54 + (6,623 - 8,615.38)²/8,615.38 + … + (4,363 - 6,461.54)²/6,461.54 = 3,335.6

Since 3,335.6 > 12.5916, there is sufficient evidence to show that customer demand is something other than what the manager believed.
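
For an even number of degrees of freedom, the upper tail area of the chi-square distribution has a closed form, so the critical value 12.5916 for 6 degrees of freedom can be checked with the standard library alone. This is a sketch with a hypothetical function name; it does not handle odd degrees of freedom.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

p_at_critical = chi2_upper_tail_even_df(12.5916, 6)
p_at_statistic = chi2_upper_tail_even_df(3335.6, 6)
print(round(p_at_critical, 4), p_at_statistic)
```

The tail area at 12.5916 comes out at about .05, while the tail area at 3,335.6 is so small that it underflows to zero in floating point.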

16 One last use of Chi Square table.
Suppose you wanted to test whether a population variance was, say, less than a particular value.

H0: σ² = 4.0
HA: σ² < 4.0
n = 24, s² = 2.146, α = .05

χ² = (n - 1)s²/σ² = 23 * 2.146/4 = 12.34
χ²(23, .95) = 13.09; use the lower tail since HA is <. Reject when the statistic is too low.

Since 12.34 < 13.09, there is sufficient evidence to conclude the variance is < 4.0.

Note: Normality of the underlying data is important for this test. Remember, you aren’t looking at an average, so the Central Limit Theorem doesn’t apply.
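
The test statistic for the variance example is a one-line calculation. In the sketch below, the lower-tail critical value 13.0905 is taken from a standard chi-square table for 23 degrees of freedom and is an assumption of this example rather than something computed here.

```python
n = 24
sample_variance = 2.146
hypothesized_variance = 4.0

# Test statistic: (n - 1) * s^2 / sigma_0^2, with n - 1 = 23 degrees of freedom.
statistic = (n - 1) * sample_variance / hypothesized_variance

# Assumed table value: the point with 5% of the chi-square(23) area below it.
lower_critical_value = 13.0905

reject_h0 = statistic < lower_critical_value   # lower-tail test because HA is "<"
print(round(statistic, 4), reject_h0)
```

Since the alternative is "less than", small values of the statistic count as evidence against H0, which is why the comparison uses the lower tail.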

