Chi-square Test of Independence: Hypotheses
Neha Jain, Lecturer, School of Biotechnology, Devi Ahilya University, Indore

Chi-square Test of Independence
The chi-square test of independence is probably the most frequently used hypothesis test in the sciences.
Chi-square, symbolically written as χ² (pronounced "ki-square"), is a statistical measure used in the context of sampling analysis, for example for comparing an observed variance with a theoretical variance.
A fundamental problem in genetics is determining whether experimentally obtained data fit the results expected from theory (i.e. Mendel's laws as expressed in the Punnett square). This question can be answered with the chi-square test.

The chi-square test can be used to determine whether categorical data show dependency or whether two classifications are independent. It can also be used to compare theoretical populations with actual data when categories are used.
The test is, in fact, a technique through which a researcher can (i) test the goodness of fit; (ii) test the significance of association between two attributes; and (iii) test the homogeneity or the significance of population variance.

Independence Defined
Two variables are independent if, for all cases, the classification of a case into a particular category of one variable (the group variable) has no effect on the probability that the case will fall into any particular category of the second variable (the test variable).
When two variables are independent, there is no relationship between them. We would expect the frequency breakdown of the test variable to be similar for all groups.
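Stated in probability terms (a standard restatement; the symbols A and B are introduced here and are not on the original slide): if A is the group variable and B is the test variable, independence means that for every group i and every test category j,
P(A = i and B = j) = P(A = i) × P(B = j)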

Example  Suppose we are interested in the relationship between gender and attending college.  If there is no relationship between gender and attending college and 40% of our total sample attend college, we would expect 40% of the males in our sample to attend college and 40% of the females to attend college.  If there is a relationship between gender and attending college, we would expect a higher proportion of one group to attend college than the other group, e.g. 60% to 20%.

Independent and Dependent Relationships

Expected Frequencies
Expected frequencies are computed as if there is no difference between the groups, i.e. both groups have the same proportion as the total sample in each category of the test variable.
Since the proportion of subjects in each category of the group variable can differ, we take the group category into account in computing expected frequencies as well.
To summarize, the expected frequency for each cell is computed to be proportional to both the breakdown for the test variable and the breakdown for the group variable.
Expected frequencies are the number of observations that would be expected for each category of a frequency distribution if the null hypothesis were true, in a chi-square analysis.
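For a two-way table, this proportionality works out to the standard formula E(row i, column j) = (row i total × column j total) / grand total. A minimal sketch in Python with NumPy, using made-up counts purely for illustration:

```python
import numpy as np

# Hypothetical 2x2 table of observed counts:
# rows = groups (e.g. male, female), columns = test variable (college, no college)
observed = np.array([[40, 60],
                     [40, 60]])

row_totals = observed.sum(axis=1, keepdims=True)   # total for each group
col_totals = observed.sum(axis=0, keepdims=True)   # total for each test category
grand_total = observed.sum()

# Expected count for each cell under independence:
# E_ij = (row_i total * col_j total) / grand total
expected = row_totals @ col_totals / grand_total
print(expected)
```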

Observed Frequencies
The test of independence starts with the frequencies or counts we observe in our sample, the observed frequencies.
Observed frequencies are the number of actual observations recorded for each category of a frequency distribution in a chi-square analysis.
This is what we get through our experimentation.
For example, the frequency of 5 in the sample 4, 6, 5, 7, 4, 5, 2, 5 is 3.

Hypothesis
The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample differ from the expected counts.
The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts.
Suppose that Variable A has r levels and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B; that is, the variables are independent.
H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.
The alternative hypothesis is that knowing the level of Variable A does help you predict the level of Variable B.

The Level of Significance
The level of significance is a very important concept in the context of hypothesis testing.
It is always some percentage (usually 5%) and should be chosen with great care, thought and reason.
If we take the significance level at 5 per cent, this implies that H0 will be rejected when the sampling result (i.e., the observed evidence) has a less than 0.05 probability of occurring if H0 is true.
In other words, the 5 per cent level of significance means that the researcher is willing to take as much as a 5 per cent risk of rejecting the null hypothesis when it (H0) happens to be true. Thus the significance level is the maximum probability of rejecting H0 when it is true, and it is usually determined in advance, before testing the hypothesis.

Expected Frequencies versus Observed Frequencies
The chi-square test of independence plugs the observed frequencies and expected frequencies into a formula which computes how far the pattern of observed frequencies departs from the pattern of expected frequencies.
The general formula is
χ² = Σ (O − E)² / E

where
– O = observed frequency in each category
– E = expected frequency in each category based on the experimenter's hypothesis
– Σ = sum of the calculations over all categories
If the two distributions (observed and theoretical) are exactly alike, χ² = 0; but generally, due to sampling errors, χ² is not equal to zero.

Degrees of Freedom
If there are 10 frequency classes and there is one independent constraint, then there are (10 − 1) = 9 degrees of freedom.
Thus, if n is the number of groups and one constraint is placed by making the totals of the observed and expected frequencies equal, the degrees of freedom equal (n − 1).
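For the chi-square test of independence on an r × c contingency table (a standard result, stated here for completeness rather than taken from this slide), both the row totals and the column totals are constrained, so
d.f. = (r − 1) × (c − 1)
For example, a 2 × 3 table has (2 − 1) × (3 − 1) = 2 degrees of freedom.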

P-value
The p-value is the probability of observing a sample statistic as extreme as the test statistic.
The probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test statistic as extreme as, or more extreme than, that observed by chance alone, if the null hypothesis H0 is true.
It is equal to the significance level of the test at which we would only just reject the null hypothesis.
The p-value is compared with the chosen significance level of the test and, if it is smaller, the result is significant. That is, if the null hypothesis were to be rejected at the 5% significance level, this would be reported as "p < 0.05".
Small p-values suggest that the null hypothesis is unlikely to be true. The smaller the p-value, the more convincing the rejection of the null hypothesis. It indicates the strength of the evidence for, say, rejecting the null hypothesis H0, rather than simply concluding "reject H0" or "do not reject H0".
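A minimal sketch of how a p-value is obtained from a chi-square statistic, assuming SciPy is available; the statistic and degrees of freedom below are illustrative numbers only:

```python
from scipy.stats import chi2

chi_square_statistic = 4.5   # illustrative value of a computed chi-square statistic
df = 2                       # illustrative degrees of freedom

# p-value = probability of a chi-square value at least this large if H0 is true
p_value = chi2.sf(chi_square_statistic, df)   # survival function = 1 - CDF
print(round(p_value, 4))                      # about 0.1054 for these numbers

# compare with the chosen significance level
alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```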

CONDITIONS FOR THE APPLICATION OF THE χ² TEST
The following conditions should be satisfied before the χ² test can be applied:
(i) The observations recorded and used are collected on a random basis.
(ii) All the items in the sample must be independent.
(iii) No group should contain very few items, say fewer than 10. Where frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies become greater than 10. Some statisticians take this number as 5, but 10 is regarded as better by most statisticians.
(iv) The overall number of items must also be reasonably large. It should normally be at least 50, however small the number of groups may be.

Test of Goodness of Fit
As a test of goodness of fit, the χ² test enables us to see how well the assumed theoretical distribution fits the observed data.
When some theoretical distribution is fitted to the given data, we are always interested in knowing how well this distribution fits the observed data. The chi-square test can answer this.
If the calculated value of χ² is less than the table value at a certain level of significance, the fit is considered to be a good one, which means that the divergence between the observed and expected frequencies is attributable to fluctuations of sampling.
But if the calculated value of χ² is greater than its table value, the fit is not considered to be a good one.

Chi-square Test of Independence
As a test of independence, the χ² test enables us to determine whether or not two attributes are associated.
For instance, we may be interested in knowing whether a new medicine is effective in controlling fever; the χ² test will help us decide this issue. In such a situation, we proceed with the null hypothesis that the two attributes (viz., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever.
On this basis we first calculate the expected frequencies and then work out the value of χ².
If the calculated value of χ² is less than the table value, we conclude that the null hypothesis stands, which means that the two attributes are independent or not associated (i.e., the new medicine is not effective in controlling fever). But if the calculated value of χ² is greater than its table value, our inference would be that the null hypothesis does not hold good, which means the two attributes are associated, and the association is not due to some chance factor but exists in reality (i.e., the new medicine is effective in controlling fever and as such may be prescribed).
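A minimal sketch of such a medicine-versus-fever test in Python, assuming SciPy is available; the 2 × 2 counts below are made up purely for illustration:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts (not from the lecture):
# rows = new medicine / placebo, columns = fever controlled / not controlled
observed = [[60, 40],
            [45, 55]]

# correction=False turns off the Yates continuity correction,
# so the statistic matches the plain formula given above
chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

print("chi-square:", round(chi2_stat, 3))
print("degrees of freedom:", dof)
print("expected frequencies:", expected)
# If p_value < 0.05 we reject H0 and conclude medicine and fever control are associated.
print("p-value:", round(p_value, 4))
```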

STEPS INVOLVED IN APPLYING THE CHI-SQUARE TEST
The various steps involved are as follows:
First of all, calculate the expected frequencies on the basis of the given hypothesis or on the basis of the null hypothesis.
Obtain the differences between observed and expected frequencies and find the squares of such differences, i.e., calculate (Oij − Eij)².
Divide each quantity (Oij − Eij)² by the corresponding expected frequency to get (Oij − Eij)²/Eij.
Find the sum of the (Oij − Eij)²/Eij values. This is the required χ² value.
The χ² value obtained should be compared with the relevant table value of χ², and the inference then drawn as stated above.
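A minimal Python sketch of these steps using NumPy; the observed and expected frequencies are placeholders chosen only to illustrate the arithmetic:

```python
import numpy as np

# Step 0: observed and expected frequencies (placeholder numbers for illustration)
observed = np.array([18.0, 22.0, 20.0, 40.0])
expected = np.array([25.0, 25.0, 25.0, 25.0])

# Step 1: differences between observed and expected, squared: (O - E)^2
squared_diff = (observed - expected) ** 2

# Step 2: divide by the corresponding expected frequency: (O - E)^2 / E
contributions = squared_diff / expected

# Step 3: sum over all categories to get the chi-square value
chi_square = contributions.sum()
print(chi_square)   # 12.32 for these placeholder numbers
```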

Critical Values of the χ² Distribution
[Table of critical values of the χ² distribution, indexed by degrees of freedom (df) and probability level (P)]
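Since the table itself is not reproduced here, a minimal SciPy sketch for looking up the corresponding upper-tail critical values at the 5 per cent level:

```python
from scipy.stats import chi2

# Upper-tail critical values of the chi-square distribution at the 5% level
for df in range(1, 6):
    critical_value = chi2.ppf(0.95, df)   # inverse CDF at probability 0.95
    print(df, round(critical_value, 3))
# df = 1: 3.841, df = 2: 5.991, df = 3: 7.815, df = 4: 9.488, df = 5: 11.070
```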

The Chi-square Test in Genetics
Example 1
Genetic theory states that children having one parent of blood type A and the other of blood type B will always be of one of three types, A, AB or B, and that the proportions of the three types will on average be 1 : 2 : 1. A report states that out of 300 children having one A parent and one B parent, 30 per cent were found to be of type A, 45 per cent of type AB and the remainder of type B. Test the hypothesis with the χ² test.
Solution:
The observed frequencies of types A, AB and B, as given in the question, are 90, 135 and 75 respectively.
The expected frequencies of types A, AB and B (as per the genetic theory) should have been 75, 150 and 75 respectively.

We now calculate the value of χ² as follows:
χ² = (90 − 75)²/75 + (135 − 150)²/150 + (75 − 75)²/75
   = 225/75 + 225/150 + 0
   = 3.0 + 1.5 + 0
   = 4.5
With 3 classes and one constraint, the degrees of freedom are 3 − 1 = 2.


The calculated value of χ² is 4.5, which is less than the table value of 5.991 for 2 degrees of freedom at the 5 per cent level, so the divergence can be ascribed to chance. This supports the theoretical hypothesis of genetic theory that, on average, types A, AB and B occur in the proportion 1 : 2 : 1.
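A minimal check of this result in Python, assuming SciPy is available, using its goodness-of-fit function with the observed and expected frequencies above:

```python
from scipy.stats import chisquare

observed = [90, 135, 75]   # counts of types A, AB, B from the report
expected = [75, 150, 75]   # counts expected under the 1 : 2 : 1 ratio

result = chisquare(f_obs=observed, f_exp=expected)
print(result.statistic)         # 4.5
print(round(result.pvalue, 4))  # about 0.1054, greater than 0.05, so H0 is not rejected
```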

Example 2

CAUTION IN USING THE χ² TEST
The chi-square test is no doubt a most frequently used test, but its correct application is an equally uphill task. It should be borne in mind that the test is to be applied only when the individual observations in the sample are independent, which means that the occurrence of one individual observation (event) has no effect upon the occurrence of any other observation (event) in the sample under consideration. Small theoretical frequencies, if they occur in certain groups, should be handled with special care. Other possible reasons for the improper application or misuse of this test are:
(i) neglect of frequencies of non-occurrence;
(ii) failure to equalise the sum of the observed and the sum of the expected frequencies;
(iii) wrong determination of the degrees of freedom;
(iv) wrong computations, and the like.
The researcher, while applying this test, must remain careful about all these things and must thoroughly understand the rationale of this important test before using it and drawing inferences in respect of the hypothesis.

 Thanks