Chi-square Test of Independence Hypotheses Neha Jain Lecturer School of Biotechnology Devi Ahilya University, Indore.


1 Chi-square Test of Independence Hypotheses Neha Jain Lecturer School of Biotechnology Devi Ahilya University, Indore

2 Chi-square Test of Independence  The chi-square test of independence is one of the most frequently used hypothesis tests in the sciences.  Chi-square, written symbolically as χ² (pronounced "ki-square"), is a statistical measure used in sampling analysis for comparing an observed variance to a theoretical variance.  A fundamental problem in genetics is determining whether experimentally obtained data fit the results expected from theory (e.g. Mendel's laws as expressed in the Punnett square). This can be assessed with the chi-square test.

3  It can be used to determine whether categorical data show dependency or whether the two classifications are independent. It can also be used to compare theoretical populations with actual data when categories are used.  The test is a technique with which researchers can (i) test the goodness of fit; (ii) test the significance of association between two attributes; and (iii) test the homogeneity or the significance of population variance.

4 Independence Defined  Two variables are independent if, for all cases, the classification of a case into a particular category of one variable (the group variable) has no effect on the probability that the case will fall into any particular category of the second variable (the test variable).  When two variables are independent, there is no relationship between them. We would expect the frequency breakdowns of the test variable to be similar for all groups.

5 Example  Suppose we are interested in the relationship between gender and attending college.  If there is no relationship between gender and attending college and 40% of our total sample attend college, we would expect 40% of the males in our sample to attend college and 40% of the females to attend college.  If there is a relationship between gender and attending college, we would expect a higher proportion of one group to attend college than the other group, e.g. 60% to 20%.

6 Independent and Dependent Relationships

7 Expected Frequencies  Expected frequencies are computed as if there is no difference between the groups, i.e. both groups have the same proportion as the total sample in each category of the test variable.  Since the proportion of subjects in each category of the group variable can differ, we take group category into account in computing expected frequencies as well.  To summarize, the expected frequency for each cell is proportional to both the breakdown for the test variable and the breakdown for the group variable: (row total × column total) / grand total.  Expected frequencies are the number of observations that would be expected for each category of a frequency distribution, assuming the null hypothesis is true.
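The expected-frequency rule can be sketched in a few lines of Python. The counts below are hypothetical: they assume 100 males and 100 females, chosen so that 60% of one group and 20% of the other attend college (40% overall). The function name `expected_frequencies` is an illustrative choice and does not come from the slides.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (row total * column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts (assumed group sizes of 100 each).
# Rows: male, female. Columns: attends college, does not attend.
observed = [[60, 40],
            [20, 80]]

expected = expected_frequencies(observed)
for label, row in zip(["male", "female"], expected):
    print(label, row)
```

Under independence both groups get the same expected breakdown, because 40% of the whole sample attends college.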

8 Observed Frequencies  The test of independence starts with the frequencies or counts we observe in our sample, the observed frequencies.  Observed frequencies are the number of actual observations noted for each category of a frequency distribution.  This is what we get through experimentation.  For example, the frequency of 5 in the sample 4, 6, 5, 7, 4, 5, 2, 5 is 3.

9 Hypothesis  The research hypothesis states that the two variables are dependent or related. The data support it when the observed counts for the categories of the variables in the sample differ from the expected counts by more than chance would explain.  The null hypothesis is that the two variables are independent. The data are consistent with it when the observed counts in the sample are similar to the expected counts.  Suppose that Variable A has r levels and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B; that is, the variables are independent. H0: Variable A and Variable B are independent. Ha: Variable A and Variable B are not independent.  The alternative hypothesis is that knowing the level of Variable A can help you predict the level of Variable B.

10 The Level of Significance  The level of significance is a very important concept in hypothesis testing.  It is always some percentage (usually 5%), which should be chosen with great care, thought and reason.  If we take the significance level at 5 per cent, H0 will be rejected when the sampling result (i.e., observed evidence) has a less than 0.05 probability of occurring if H0 is true.  In other words, the 5 per cent level of significance means that the researcher is willing to take as much as a 5 per cent risk of rejecting the null hypothesis when it (H0) happens to be true. Thus the significance level is the maximum probability of rejecting H0 when it is true, and it is usually fixed in advance, before testing the hypothesis.

11 Expected Frequencies versus Observed Frequencies  The chi-square test of independence plugs the observed frequencies and expected frequencies into a formula which computes how the pattern of observed frequencies differs from the pattern of expected frequencies.  The general formula is χ² = Σ (O − E)² / E

12 where – O = observed frequency in each category – E = expected frequency in each category based on the experimenter's hypothesis – Σ = sum of the calculations over all categories. If two distributions (observed and theoretical) are exactly alike, χ² = 0; but generally, due to sampling error, χ² is not equal to zero.

13 Degrees of Freedom  If there are 10 frequency classes and there is one independent constraint, then there are (10 − 1) = 9 degrees of freedom.  Thus, if n is the number of groups and one constraint is placed by making the totals of observed and expected frequencies equal, the d.f. would be equal to (n − 1).  For a test of independence on a table with r rows and c columns, the d.f. is (r − 1)(c − 1).

14 P-value  The P-value is the probability, computed assuming the null hypothesis H0 is true, of obtaining a value of the test statistic as extreme as or more extreme than the one actually observed.  It is not the probability that H0 is true; the risk of wrongly rejecting a true H0 is set by the significance level chosen in advance.  The p-value equals the significance level at which we would only just reject the null hypothesis. It is compared with the chosen significance level of the test and, if it is smaller, the result is significant. For example, if the null hypothesis is rejected at the 5% significance level, this is reported as "p < 0.05".  Small p-values indicate that the observed data would be unlikely if the null hypothesis were true. The smaller the p-value, the more convincing the rejection of the null hypothesis. It indicates the strength of the evidence against H0, rather than simply concluding "Reject H0" or "Do not reject H0".
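As a rough illustration, the p-value of a chi-square statistic can be computed with the Python standard library alone. The sketch below relies on the standard closed-form series for the right-tail probability of a chi-square distribution with whole-number degrees of freedom. The function name `chi2_sf` is an illustrative choice, not something defined in the slides.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series
    root = math.sqrt(x)
    normal_tail = math.erfc(root / math.sqrt(2.0))        # P(|Z| >= sqrt(x))
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)  # normal density at sqrt(x)
    term = root
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return normal_tail + 2.0 * density * total


# The 5% critical values for 1, 2 and 3 d.f. should each give p close to 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815)]:
    print(df, round(chi2_sf(critical, df), 4))
```

A statistic larger than the critical value gives a p-value below 0.05, so comparing the p-value with the significance level and comparing the statistic with the table value lead to the same decision.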

15 CONDITIONS FOR THE APPLICATION OF THE χ² TEST  The following conditions should be satisfied before the χ² test can be applied:  (i) The observations recorded and used are collected on a random basis.  (ii) All the items in the sample must be independent.  (iii) No group should contain very few items, say fewer than 10. Where frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies are at least 10. Some statisticians take this number as 5, but 10 is the more conservative choice.  (iv) The overall number of items must also be reasonably large. It should normally be at least 50, however small the number of groups may be.

16 Test of Goodness of Fit  As a test of goodness of fit, the χ² test enables us to see how well an assumed theoretical distribution fits the observed data.  When a theoretical distribution is fitted to given data, we want to know how well it fits the observations.  The chi-square test can answer this. If the calculated value of χ² is less than the table value at a given level of significance, the fit is considered good, which means that the divergence between the observed and expected frequencies is attributable to fluctuations of sampling.  But if the calculated value of χ² is greater than the table value, the fit is not considered good.

17 Chi-square Test of Independence  As a test of independence, the χ² test enables us to judge whether or not two attributes are associated.  For instance, we may be interested in knowing whether a new medicine is effective in controlling fever; the χ² test will help us decide this. In such a situation, we proceed with the null hypothesis that the two attributes (viz., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever.  On this basis we first calculate the expected frequencies and then work out the value of χ².  If the calculated value of χ² is less than the table value, we do not reject the null hypothesis: the data are consistent with the two attributes being independent or not associated (i.e., there is no evidence that the new medicine is effective in controlling fever). But if the calculated value of χ² is greater than the table value, we infer that the null hypothesis does not hold, which means the two attributes are associated and the association is unlikely to be due to chance alone (i.e., the new medicine is effective in controlling fever and as such may be prescribed).

18 STEPS INVOLVED IN APPLYING THE CHI-SQUARE TEST  The various steps involved are as follows:  First, calculate the expected frequencies on the basis of the given hypothesis or the null hypothesis.  Obtain the difference between observed and expected frequencies and square each difference, i.e., calculate (Oij − Eij)².  Divide each (Oij − Eij)² by the corresponding expected frequency to get (Oij − Eij)²/Eij.  Sum the (Oij − Eij)²/Eij values. This is the required χ² value.  Compare the χ² value so obtained with the relevant table value of χ² and draw the inference as stated above.
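The steps above can be sketched as a short Python function. The counts are hypothetical (a made-up medicine versus placebo table, not data from the slides), and the function name `chi_square_independence` is an illustrative choice. The critical value 3.841 is the table value for 1 degree of freedom at the 5% level.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Step 1: expected frequencies under the null hypothesis of independence
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Steps 2 to 4: square each difference, divide by the expected count, and sum
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom for an r x c table: (r - 1)(c - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts. Rows: new medicine, placebo.
# Columns: fever controlled, fever not controlled.
observed = [[60, 40],
            [45, 55]]

statistic, df = chi_square_independence(observed)

# Step 5: compare with the table value (df = 1, 5% level)
critical_value = 3.841
reject_null = statistic > critical_value
print(round(statistic, 3), df, reject_null)
```

With these made-up counts the statistic exceeds the table value, so the null hypothesis of independence would be rejected at the 5% level.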

19 Critical Values of the χ² Distribution (P = right-tail probability)

df     0.995   0.975     0.9     0.5     0.1    0.05   0.025    0.01   0.005
 1     0.000   0.001   0.016   0.455   2.706   3.841   5.024   6.635   7.879
 2     0.010   0.051   0.211   1.386   4.605   5.991   7.378   9.210  10.597
 3     0.072   0.216   0.584   2.366   6.251   7.815   9.348  11.345  12.838
 4     0.207   0.484   1.064   3.357   7.779   9.488  11.143  13.277  14.860
 5     0.412   0.831   1.610   4.351   9.236  11.070  12.832  15.086  16.750
 6     0.676   1.237   2.204   5.348  10.645  12.592  14.449  16.812  18.548
 7     0.989   1.690   2.833   6.346  12.017  14.067  16.013  18.475  20.278
 8     1.344   2.180   3.490   7.344  13.362  15.507  17.535  20.090  21.955
 9     1.735   2.700   4.168   8.343  14.684  16.919  19.023  21.666  23.589
10     2.156   3.247   4.865   9.342  15.987  18.307  20.483  23.209  25.188
11     2.603   3.816   5.578  10.341  17.275  19.675  21.920  24.725  26.757
12     3.074   4.404   6.304  11.340  18.549  21.026  23.337  26.217  28.300
13     3.565   5.009   7.042  12.340  19.812  22.362  24.736  27.688  29.819
14     4.075   5.629   7.790  13.339  21.064  23.685  26.119  29.141  31.319
15     4.601   6.262   8.547  14.339  22.307  24.996  27.488  30.578  32.801

20 The Chi-square Test in Genetics  Example 1  Genetic theory states that children having one parent of blood type A and the other of blood type B will always be of one of three types, A, AB or B, and that the proportions of the three types will on average be 1 : 2 : 1. A report states that out of 300 children having one A parent and one B parent, 30 per cent were found to be type A, 45 per cent type AB and the remainder (25 per cent) type B. Test the hypothesis by the χ² test.  Solution:  The observed frequencies of types A, AB and B given in the question are 90, 135 and 75 respectively.  The expected frequencies of types A, AB and B (as per the genetic theory) should have been 75, 150 and 75 respectively.

21  We now calculate the value of χ² as follows: χ² = (90 − 75)²/75 + (135 − 150)²/150 + (75 − 75)²/75 = 3 + 1.5 + 0 = 4.5
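The calculation for Example 1 can be reproduced with a minimal Python sketch using the figures given in the example (300 children, observed shares of 30%, 45% and 25%, expected ratio 1 : 2 : 1). The variable names are illustrative. The closed-form p-value in the last step is an addition that holds only for 2 degrees of freedom, where the right-tail probability equals exp(−χ²/2).

```python
import math

total = 300
observed = [90, 135, 75]   # types A, AB, B: 30%, 45%, 25% of 300
ratio = [1, 2, 1]          # proportions expected from genetic theory

# Expected frequencies: split the total according to the 1 : 2 : 1 ratio
expected = [total * r / sum(ratio) for r in ratio]

# Contribution of each category: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# Three categories with one constraint (equal totals) leave 2 d.f.
df = len(observed) - 1
critical_value = 5.991     # table value for df = 2 at the 5% level

# Exact right-tail probability, valid for df = 2 only
p_value = math.exp(-chi_square / 2)

print(expected, contributions, chi_square, df)
print(chi_square < critical_value, round(p_value, 4))
```

The statistic falls below the table value, so the observed frequencies are consistent with the 1 : 2 : 1 ratio.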

23  The calculated value of χ² is 4.5. With (3 − 1) = 2 degrees of freedom, the table value at the usual 5 per cent level of significance is 5.991. Since the calculated value is less than the table value, the divergence between observed and expected frequencies can be ascribed to chance. This supports the hypothesis of the genetic theory that, on average, types A, AB and B stand in the proportion 1 : 2 : 1.

24 Example 2

27 CAUTION IN USING THE χ² TEST  The chi-square test is very frequently used, but applying it correctly takes care. The test should be applied only when the individual observations in the sample are independent, which means that the occurrence of one observation (event) has no effect on the occurrence of any other observation (event) in the sample under consideration. Small theoretical frequencies, if they occur in certain groups, should be handled with special care. Other common causes of improper application or misuse of the test are  (i) neglect of frequencies of non-occurrence;  (ii) failure to equalise the sum of the observed and the sum of the expected frequencies;  (iii) wrong determination of the degrees of freedom;  (iv) wrong computations, and the like. A researcher applying this test must remain careful about all these points and must thoroughly understand the rationale of the test before using it and drawing inferences about the hypothesis.

28  Thanks

