Presentation on theme: "Chi-square (χ²) test. χ² is the most popular hypothesis testing method for discrete data. It is a non-parametric test of statistical significance." - Presentation transcript:
Chi-square (χ²) test. The χ² test is the most popular hypothesis testing method for discrete data. It is a non-parametric test of statistical significance for bivariate tabular analysis, and it helps us analyze data that come in the form of counts. It is used to compare two proportions, for example the proportions of successes in two groups exposed to different treatments. Its most common application is comparing the observed counts of particular cases with the counts expected under a theory. The greater the value of χ², the greater the difference between the observed and expected frequencies. When the result is significant (i.e., p < 0.05), the difference is unlikely to have arisen from sampling fluctuations alone. The test was introduced by Karl Pearson in 1900.
Chi-square is the sum, over all possible categories, of the squared difference between the observed (O) and expected (E) values divided by the expected (E) value:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
It measures the agreement between experimentally obtained (observed) results and the (expected) results suggested by a theory or hypothesis. For a contingency table, the degrees of freedom are df = (c - 1)(r - 1), where c is the number of columns and r is the number of rows.
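The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function names `chi_square` and `degrees_of_freedom` are chosen here for the example, and the counts in the usage line are hypothetical.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_rows, n_cols):
    """Return df = (c - 1)(r - 1) for an r-by-c contingency table."""
    return (n_cols - 1) * (n_rows - 1)


# Hypothetical counts: 60 and 40 observed where 50 and 50 were expected.
stat = chi_square([60, 40], [50, 50])
df = degrees_of_freedom(2, 2)
print(stat, df)
```

Each category contributes its own squared deviation scaled by the expected count, so a category with a small expected count weighs more heavily for the same absolute deviation.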
Requirements for chi-square test
1. The data must be in the form of frequencies counted in each of a set of categories.
2. The total number of observations must exceed 20.
3. There should be no fewer than 5 observations in any one cell.
4. All the observations must be independent of each other; that is, one observation must not influence another.
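The first three requirements can be checked mechanically before running the test. The sketch below is an assumption-laden helper (the name `check_requirements` and its thresholds as parameters are invented for illustration); independence of observations depends on how the data were collected and cannot be verified from the counts themselves.

```python
def check_requirements(table, min_total=20, min_cell=5):
    """Return a list of violated requirements for a table of counts.

    `table` is a list of rows, each a list of cell counts.
    Independence of observations is NOT checked here; it is a
    property of the study design, not of the numbers.
    """
    problems = []
    cells = [count for row in table for count in row]
    # Requirement 1: data must be frequencies (whole, non-negative counts).
    if any(c < 0 or c != int(c) for c in cells):
        problems.append("cells must be non-negative whole-number counts")
    # Requirement 2: the total number observed must exceed 20.
    if sum(cells) <= min_total:
        problems.append("total number of observations must exceed %d" % min_total)
    # Requirement 3: no cell may hold fewer than 5 observations.
    if any(c < min_cell for c in cells):
        problems.append("every cell needs at least %d observations" % min_cell)
    return problems


# Hypothetical small table that fails requirements 2 and 3.
print(check_requirements([[3, 7], [4, 5]]))
```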
Step-by-Step Procedure for Calculating Chi-Square
1. State the hypothesis being tested (the null hypothesis: there is no difference between the observed and expected results).
2. Determine the expected numbers for each observational class.
3. Calculate χ² using the formula.
4. Use the chi-square distribution table to determine the significance of the value.
5. State your conclusion in terms of your hypothesis.
   a. If the p value for the calculated χ² is p > 0.05, do not reject the null hypothesis: the deviation is small enough to be accounted for by chance alone.
   b. If the p value for the calculated χ² is p < 0.05, reject the null hypothesis and conclude that some factor other than chance is operating for the deviation to be so great. For example, a p value of 0.01 means that, if the null hypothesis were true, a deviation at least this large would arise by chance only 1% of the time, so chance alone is an unlikely explanation.
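The procedure can be sketched end to end for the simplest case of 1 degree of freedom. Two assumptions are worth flagging: the table lookup in step 4 is swapped for the closed-form tail probability of the chi-square distribution with 1 df, `erfc(sqrt(χ²/2))`, which holds only for df = 1; and the counts used (60 and 40 observed against 50 and 50 expected) are hypothetical. The function name `chi_square_test_1df` is invented for this example.

```python
import math


def chi_square_test_1df(observed, expected, alpha=0.05):
    """Run steps 3 to 5 of the procedure for a test with 1 degree of freedom.

    Returns (chi2, p_value, reject_null). The p value formula below is
    valid ONLY for df = 1; other df need a table or a statistics library.
    """
    # Step 3: calculate chi-square from the formula.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: significance of the value (closed form instead of a table).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Step 5: conclusion in terms of the null hypothesis.
    reject_null = p_value < alpha
    return chi2, p_value, reject_null


# Steps 1 and 2: the null hypothesis says both classes are equally likely,
# so 100 observations give expected counts of 50 and 50 (hypothetical data).
chi2, p_value, reject_null = chi_square_test_1df([60, 40], [50, 50])
print(chi2, round(p_value, 4), reject_null)
```

With these hypothetical counts the p value falls just under 0.05, so the null hypothesis is rejected at the 5% level but would not be at the 1% level.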
χ² test: Example
In a survey of smoking habits, 100 men and 100 women were asked to classify themselves as smokers or nonsmokers. The following 2 × 2 contingency table summarizes the survey results. Do these figures provide evidence that smoking habits differ between the sexes, or is smoking equally prevalent in both?

Smoker   Male        Female      Total
Yes      54 (O1)     32 (O2)     86 (r1)
No       46 (O3)     68 (O4)     114 (r2)
Total    100 (c1)    100 (c2)    200 (N)

The expected value for each observed value is the product of its row total and column total divided by N:
$E_1$ (for $O_1$) $= \frac{r_1 \times c_1}{N}$, $E_2$ (for $O_2$) $= \frac{r_1 \times c_2}{N}$, $E_3$ (for $O_3$) $= \frac{r_2 \times c_1}{N}$, $E_4$ (for $O_4$) $= \frac{r_2 \times c_2}{N}$
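The expected values for this survey can be derived from the row and column totals alone. The sketch below assumes the usual rule that each expected count is (row total × column total) / N; the function name `expected_counts` is chosen for illustration.

```python
def expected_counts(table):
    """Return expected counts for a contingency table given as a list of rows.

    Each expected count is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Survey counts from the example: rows are smoker Yes/No, columns Male/Female.
observed = [[54, 32], [46, 68]]
expected = expected_counts(observed)
print(expected)  # [[43.0, 43.0], [57.0, 57.0]]
```

Because both column totals are 100, the expected counts are the same for men and women in each row, which is exactly what "no association" would look like.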
The expected (E) frequencies are presented in the table, in parentheses after each observed (O) value:

Smoker   Male: O (E)    Female: O (E)    Total
Yes      54 (43.0)      32 (43.0)        86
No       46 (57.0)      68 (57.0)        114
Total    100            100              200

O      E      O - E    (O - E)^2    (O - E)^2 / E
54     43     11       121          2.81
32     43     -11      121          2.81
46     57     -11      121          2.12
68     57     11       121          2.12
Total                               9.87
We can calculate the χ² value directly from the formula as given below:
$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(54 - 43)^2}{43} + \frac{(32 - 43)^2}{43} + \frac{(46 - 57)^2}{57} + \frac{(68 - 57)^2}{57} = 2.81 + 2.81 + 2.12 + 2.12 = 9.87$
Degrees of freedom (df) = (no. of rows - 1)(no. of columns - 1) = (2 - 1)(2 - 1) = 1
Interpretation: Since the calculated value of χ² (9.87) is greater than the table value of 6.63 for 1 df at the 1% level of significance, we reject the null hypothesis of no association and conclude that a real association exists between gender and smoking habits. We can write p < 0.01 and conclude, more precisely, that males are significantly more likely to smoke than females.
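The arithmetic for the smoking survey can be reproduced with a short script. It is a sketch that hard-codes the observed and expected counts from the example and takes the critical value 6.63 (1 df, 1% level) from the text rather than computing it; the variable names are illustrative.

```python
# Observed and expected counts from the smoking survey, in the order
# male smokers, female smokers, male nonsmokers, female nonsmokers.
observed = [54, 32, 46, 68]
expected = [43.0, 43.0, 57.0, 57.0]

# Per-cell contributions (O - E)^2 / E and their sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

# Degrees of freedom for a 2 x 2 table.
df = (2 - 1) * (2 - 1)

# Table value for 1 df at the 1% level, as quoted in the text.
CRITICAL_VALUE_1DF_1PCT = 6.63
reject_null = chi2 > CRITICAL_VALUE_1DF_1PCT

print([round(c, 2) for c in contributions])  # [2.81, 2.81, 2.12, 2.12]
print(round(chi2, 2), df, reject_null)       # 9.87 1 True
```

The total of 9.87 comes from summing the unrounded contributions; adding the four rounded values gives 9.86, a difference due only to rounding.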
Yates' correction
Yates' correction is a conservative adjustment to chi-square for tables in which one or more cells have frequencies less than five. It is applied only to 2 × 2 tables. Alternatively, we can use Fisher's exact test, for which there is no sample size restriction. The formula with Yates' correction becomes
$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$
Some computer packages (e.g., SPSS) label Yates' correction as continuity correction in their output.
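To see how the correction changes the statistic, the sketch below applies it to the smoking survey counts. This is for illustration only: every cell in that table is well above five, so the correction is not actually needed there. It assumes the common form of the correction, subtracting 0.5 from each absolute deviation before squaring; the function name `yates_chi_square` is invented for the example.

```python
def yates_chi_square(observed, expected):
    """Chi-square with Yates' correction for a 2 x 2 table.

    Subtracts 0.5 from each absolute deviation |O - E| before squaring.
    Assumes every |O - E| is at least 0.5, as in the example below.
    """
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Smoking survey counts and their expected values.
observed = [54, 32, 46, 68]
expected = [43.0, 43.0, 57.0, 57.0]

uncorrected = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
corrected = yates_chi_square(observed, expected)
print(round(uncorrected, 2), round(corrected, 2))
```

The corrected value is smaller than the uncorrected one (about 9.00 against 9.87 here), which is the sense in which the adjustment is conservative.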