Test of Significance. Vikash R Keshri. Moderator: Mr. M. S. Bharambe.
Outline: Introduction: Important terminologies. Tests of significance: – Z test. – t test. – F test. – Chi-square test. – Fisher's exact test. – Significance test for the correlation coefficient. – One-way analysis of variance (ANOVA). Conclusion.
Introduction: All scientific work looks for answers to the following questions: – How probable is it that the difference between the observed and expected results arose by chance alone? – Is the difference statistically significant?
Important Terminologies: Population & Sample: A population is any finite or infinite collection of elements, i.e. individuals, items, observations etc. A sample is a part or subset of the population. The basic problem with a sample is generalization. Parameter & Statistic: A parameter is a constant describing a population. A statistic is a quantity describing the sample, i.e. a function of the observations.
Sampling Distribution: The distribution of the values of a statistic that would arise from all possible samples is called the sampling distribution.
Standard Error (SE): The standard deviation of the sampling distribution is called the standard error. It estimates how far the estimated value is likely to be from the true value.
Confidence Limits: Confidence limits are the bounds of the range within which a stated percentage of all possible sample means will lie. Population mean ± 1 standard error corresponds to about 68.3% of the sample mean values. Population mean ± 1.96 standard errors corresponds to 95.0% of the sample mean values. Population mean ± 2.58 standard errors corresponds to 99% of the sample mean values. Population mean ± 3.29 standard errors corresponds to 99.9% of the sample mean values. The interval between the limits is the confidence interval.
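The multipliers quoted for the confidence limits can be checked numerically. The sketch below assumes the sampling distribution of the mean is normal; the function name `central_coverage` is only illustrative and does not come from the slides.

```python
import math

def central_coverage(z: float) -> float:
    """Fraction of a normal sampling distribution within mean +/- z standard errors."""
    return math.erf(z / math.sqrt(2))

# Multipliers quoted on the slide
for z in (1.0, 1.96, 2.58, 3.29):
    print(f"mean +/- {z} SE covers {central_coverage(z) * 100:.1f}% of sample means")
```

Running it gives coverages close to 68.3%, 95.0%, 99.0% and 99.9%, which matches the limits listed above.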
Hypothesis: A statistical hypothesis is a statement about a parameter (or the form) of a population, e.g. $\bar{x}_1 = \bar{x}_2$ or $\bar{x} = \mu$ or $p_1 = p_2$ or $p = P$. Null Hypothesis ($H_0$): The hypothesis of no difference between two outcome variables. Alternative Hypothesis ($H_1$): There is a difference between the two variables under study. Hypotheses are always about parameters of populations, never about statistics from samples. Test of Significance: Testing the null hypothesis.
Type 1 and Type 2 Error:
Test result | Null hypothesis true | Null hypothesis false
Significant (accepting H1, rejecting H0) | Type 1 error | No error; power (1 - β)
Not significant (accepting H0, rejecting H1) | No error | Type 2 error
Parametric vs. Non-parametric tests: Parametric tests: Based on the assumption that the data follow a normal distribution or a distribution of the normal family. They estimate parameters of the underlying normal distribution. The significance of a difference can be determined. Non-parametric tests: The variable under study does not follow a normal distribution or any other distribution of the normal family. Association can be estimated.
P-Value: The P value measures the degree of evidence against the null hypothesis, i.e. how significant the departure from it is. The P value derived from a statistical test depends on the size and direction of the effect, and on the sample size. P < 0.05 = significant; corresponds to 1.96 standard errors and the 95% confidence interval. P < 0.01 or P < 0.001 = highly significant; corresponds to the 99% and 99.9% confidence intervals. A non-significant departure does not provide positive evidence in favour of the null hypothesis. If P > α, calculate the power: – If power < 80%: the difference could not be detected; repeat the study after making up the deficit in the number of study subjects. – If power ≥ 80%: the difference between the groups is not statistically significant.
One-sided (one-tailed) vs. Two-sided (two-tailed) tests: Two-sided test: A significantly large departure from the null hypothesis in either direction is judged significant. One-sided test: Used when we are interested in measuring the departure in only one particular direction. A one-sided test at level P is the same as a two-sided test at level 2P. Example: a test to compare the population means of two groups A and B. – Alternative hypothesis: mean of A > mean of B: one-tailed test. – Alternative hypothesis: mean of B > mean of A or mean of A > mean of B: two-tailed test.
Steps: Define the research question. Null hypothesis ($H_0$): there is no difference between the groups. Alternative hypothesis ($H_1$): there is some difference between the groups. Select the appropriate test. Calculate the test criterion (c). Decide the acceptable level of significance (α), usually 0.05 (5%). Compare the test criterion with the theoretical value at α. Accept the null hypothesis or the alternative hypothesis. Draw the inference.
Common concerns: Sample mean vs. population mean. Two or more sample means. Sample proportion (percentage) vs. population proportion (percentage). Two or more sample proportions (percentages). Sample correlation coefficient vs. population correlation coefficient. Two sample correlation coefficients.
Why a test of significance? We test a SAMPLE and comment on the POPULATION. Are two different SAMPLES (group means) from the same or from different POPULATIONS (from which the samples were drawn)? Is the difference obtained TRUE or due to chance alone? Will another set of samples also differ? Significance testing deals with the answers to the above questions.
Standard Normal Deviate (Z) test. Assumptions: Samples are selected randomly. Quantitative data. The variable follows a normal distribution in the population. Sufficiently large sample size.
The steps: Identify the problem and the question to be answered. State the null ($H_0$) and alternative ($H_1$) hypotheses. Calculate the standard error. Calculate the critical ratio. Fix the level of significance (α), the critical level of significance. Compare the calculated critical ratio with the theoretical value. Draw the inference.
Comparison of Means of Two Samples: $Z_c = (\bar{x}_1 - \bar{x}_2) / SE(\bar{x}_1 - \bar{x}_2)$. $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{SE_1^2 + SE_2^2} = \sqrt{SD_1^2/n_1 + SD_2^2/n_2}$. Example: From the given data, compare the arm circumference of Indian and American children and draw an inference. Details (American, Indian): No. of subjects: 625. Mean: . Standard deviation: 5.0, 5.4.
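The two-sample Z formula above can be sketched in a few lines. The function name and the numbers in the usage lines are hypothetical, chosen only to show the arithmetic; they are not the arm-circumference data from the slide, whose means did not survive in the transcript.

```python
import math

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z for the difference of two independent large-sample means."""
    # SE of the difference = sqrt(SD1^2/n1 + SD2^2/n2)
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff

# Hypothetical illustration values (not from the slide)
z = z_two_means(mean1=10.0, sd1=3.0, n1=9, mean2=8.0, sd2=4.0, n2=16)
print(f"Z = {z:.3f}; |Z| > 1.96 would be significant at the 5% level")
```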
Interpreting the Z value: Area under the curve: $Z_{0.05} = 1.96$, $Z_{0.01} = 2.58$, $Z_{0.001} = 3.29$. If the calculated Z value ($Z_c$) is greater than $Z_{0.05}$, $Z_{0.01}$ or $Z_{0.001}$, the null hypothesis is rejected and the alternative hypothesis is accepted at that level.
Comparing a Sample Mean with the Population Mean: Z = (difference between sample mean and population mean) / SE of sample mean. SE of sample mean = sample standard deviation / $\sqrt{n}$. Example: Assume that weight in the population follows a normal distribution. Does the mean weight of 17.8 kg of 100 children, with a standard deviation of 1.25 kg, differ from the population mean weight of 20 kg?
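As a rough check of the weight example, the calculation can be written out as below. The function name is illustrative; the inputs (17.8 kg, 20 kg, SD 1.25 kg, n = 100) are the ones stated on the slide.

```python
import math

def z_one_mean(sample_mean, pop_mean, sd, n):
    """Z for a sample mean against a known population mean."""
    se = sd / math.sqrt(n)  # standard error of the sample mean
    return (sample_mean - pop_mean) / se

z = z_one_mean(sample_mean=17.8, pop_mean=20.0, sd=1.25, n=100)
print(f"SE = {1.25 / math.sqrt(100):.3f}, Z = {z:.1f}")
```

Here SE = 0.125 and Z = -17.6, far beyond 3.29 in absolute value, so under these assumptions the sample mean differs significantly from the population mean.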
Difference between Two Sample Proportions: Z = difference in proportions / SE(difference in proportions). $Z = (p_1 - p_2) / \sqrt{PQ(1/n_1 + 1/n_2)}$. Here $p_1$ = proportion in sample 1, $p_2$ = proportion in sample 2, $P = (p_1 n_1 + p_2 n_2)/(n_1 + n_2)$ and $Q = 1 - P$. Example: The given table provides data on the prevalence of overweight and obesity in India and the USA. Can we conclude that the prevalence of overweight and obesity is the same in India and the USA? Details (India, USA): Sample size: 500. Prevalence of overweight or obesity (%): $p_1$ = 28.0, $p_2$ = 30.0. Proportion: .
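A minimal sketch of the pooled-proportion Z test follows. The prevalences 28% and 30% are from the slide; taking the sample size as 500 in each group is an assumption, because the table is incomplete in the transcript. The function name is illustrative.

```python
import math

def z_two_proportions(p1, n1, p2, n2):
    """Z for the difference of two sample proportions using the pooled P."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)   # P
    q = 1 - pooled                             # Q = 1 - P
    se = math.sqrt(pooled * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# n = 500 per group is assumed, not confirmed by the source table
z = z_two_proportions(p1=0.28, n1=500, p2=0.30, n2=500)
print(f"Z = {z:.3f}")
```

Under that assumption |Z| is about 0.70, below 1.96, so the difference in prevalence would not be significant at the 5% level.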
Comparison of a Sample Proportion with the Population Proportion: $Z_c$ = (difference between sample proportion and population proportion) / SE of that difference. $Z_c = (p - P) / \sqrt{PQ/n}$, where p = sample proportion, P = population proportion, Q = 1 - P and n = sample size. Example: In a school health survey, the prevalence of nutritional dwarfism among school-age children in class 10 was recorded in a sample of 250 children. Does it confirm that 20% of school-age children are nutritionally dwarfed?
Variance Ratio test (F-test): Developed by Fisher and Snedecor. Compares the variances of two groups (or samples). Involves the F distribution. Applied if $SD_1^2$ and $SD_2^2$ of the two samples are known. If $SD_1^2 > SD_2^2$, then $SD_1^2 / SD_2^2$ follows the F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. $F = SD_1^2 / SD_2^2$. Example: $SD_1^2$ for the height of 25 adult males is 5.0. $SD_2^2$ for 25 adult females is 9.0. Can we conclude that the variance in height is the same in male and female adults?
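The variance-ratio calculation for the height example can be sketched as follows. The helper name is hypothetical, and it simply places the larger variance in the numerator, as the slide's condition requires; the resulting F must still be compared with a tabulated F value, which this sketch does not look up.

```python
def variance_ratio(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Slide values: variance 5.0 (25 males) and 9.0 (25 females)
f_value, df_num, df_den = variance_ratio(5.0, 25, 9.0, 25)
print(f"F = {f_value:.2f} with {df_num} and {df_den} df")
```

This gives F = 1.80 with 24 and 24 degrees of freedom.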
t-test: Devised by Prof. W. S. Gosset (pen name "Student"). Difference between the normal and t distributions: Very small samples do not follow the normal distribution; they follow the t distribution. Bell shaped vs. symmetrical.
Prerequisites: Unpaired data: – Sample size is small (usually < 30). – Population variance is not known. – Two separate groups of samples are drawn from two separate populations. – These two groups can also be cases and controls. Paired data: – Applied only when each individual gives a pair of observations, e.g. a study of the accuracy of two instruments, or the weight of one individual on two different occasions.
Assumptions: Samples are randomly selected. Quantitative data. The variable under study follows a distribution of the normal family. Sample variances are nearly the same in both groups. Sample size is small (usually < 30).
Unpaired t test: Compares the means of two independent samples. Example: Mean birth weight with standard deviation is given below by socio-economic status. The samples are small and randomly selected, and the variances are nearly the same, so the t test can be applied. Details (HSES, LSES): Sample size: 15, 10. Mean birth weight: . Standard deviation: .
Steps: State the null hypothesis ($H_0$): $\bar{x}_1 = \bar{x}_2$. Alternative hypothesis ($H_1$): $H_0$ is not true. Test criterion: t = (mean difference between the two samples) / SE(mean difference between the two samples), i.e. $t = (\bar{x}_1 - \bar{x}_2) / SE(\bar{x}_1 - \bar{x}_2)$. $SE(\bar{x}_1 - \bar{x}_2) = SD \sqrt{1/n_1 + 1/n_2}$, where the pooled $SD = \sqrt{[(n_1 - 1) SD_1^2 + (n_2 - 1) SD_2^2] / (n_1 + n_2 - 2)}$. Calculate $df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$. Compare the calculated t value with its table value at $t_{0.05}$, $t_{0.01}$, $t_{0.001}$ for $n_1 + n_2 - 2$ df. Inference: if the calculated value is greater than or equal to the theoretical value, the null hypothesis is rejected.
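The unpaired t steps above can be sketched as below. The sample sizes 15 and 10 follow the birth-weight example, but the means and standard deviations are hypothetical placeholders, since the slide's values are missing from the transcript. The function name is illustrative.

```python
import math

def unpaired_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se_diff, df

# Hypothetical means and SDs (kg); only n1 = 15 and n2 = 10 come from the slide
t, df = unpaired_t(mean1=3.0, sd1=0.3, n1=15, mean2=2.7, sd2=0.3, n2=10)
print(f"t = {t:.3f} at {df} df")
```

The calculated t is then compared with the tabulated t at the chosen level for the returned degrees of freedom.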
Difference between Sample Mean and Population Mean: $t = (\bar{x} - \mu) / SE = (\bar{x} - \mu) / (SD/\sqrt{n})$. Degrees of freedom: $n - 1$. Example: The mean Hb level of 25 school children is 10.6 gm% with an SD of 1.15 gm/dl. Is it significantly different from the mean value of 11.0 gm%?
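The Hb example can be worked through with a short sketch. The inputs are the slide's values; the function name is illustrative, and the critical value 2.064 (two-sided 5% point of t at 24 df) is quoted from standard t tables rather than from the slides.

```python
import math

def one_sample_t(sample_mean, pop_mean, sd, n):
    """t statistic and degrees of freedom for a sample mean against a reference mean."""
    se = sd / math.sqrt(n)
    return (sample_mean - pop_mean) / se, n - 1

t, df = one_sample_t(sample_mean=10.6, pop_mean=11.0, sd=1.15, n=25)
print(f"t = {t:.2f} at {df} df")  # compare |t| with the tabulated value 2.064
```

Here |t| is about 1.74 at 24 df, which is below 2.064, so the difference would not be significant at the 5% level.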
For the Difference between Two Small-Sample Proportions: $t = (p_1 - p_2) / \sqrt{PQ(1/n_1 + 1/n_2)}$, where $P = (p_1 n_1 + p_2 n_2)/(n_1 + n_2)$ and $Q = 1 - P$; $df = n_1 + n_2 - 2$. Example: The proportion of infants with frequent diarrhea by type of feeding is given. Is there a significant difference in the incidence of frequent diarrhea between exclusively breast fed (EBF) babies and babies who are not EBF? Details (Exclusively breast fed, Not EBF): Sample size: 30. Percentage of infants with diarrhea: . Proportion: .
Paired t test: Prerequisites: – Each individual provides a pair of results. – The pair of results are correlated. $t = (\bar{d} - 0) / SE(\bar{d}) = \bar{d} / (SD/\sqrt{n})$. $SE = SD/\sqrt{n} = \sqrt{SD^2/n}$. $SD^2 = \sum (d - \bar{d})^2 / (n - 1)$. $\sum (d - \bar{d})^2 = \sum d^2 - (\sum d)^2 / n$.
Example: The fat fold at triceps was recorded on 12 children before the commencement and at the end of a feeding programme. Is there any significant change in the fat fold at triceps at the end of the programme? Table columns: Child no. | Triceps before ($X_1$) | Triceps after ($X_2$) | Difference $d = X_2 - X_1$ | $d^2$. Totals: $\sum d = 9$, $\sum d^2 = 27$.
$t = (\bar{d} - 0) / SE(\bar{d}) = \bar{d} / (SD/\sqrt{n})$, with $\bar{d} = 9/12 = 0.75$. $\sum (d - \bar{d})^2 = \sum d^2 - (\sum d)^2/n = 27 - 81/12 = 27 - 6.75 = 20.25$. $SD^2 = \sum (d - \bar{d})^2 / (n - 1) = 20.25/11 = 1.84$. $SE = SD/\sqrt{n} = \sqrt{SD^2/n} = \sqrt{1.84/12} = \sqrt{0.1533} = 0.39$. $t = 0.75/0.39 = 1.92$, $df = n - 1 = 11$. The calculated t value is less than $t_{0.05}$ at 11 df, so the difference is not statistically significant.
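The paired t arithmetic above can be reproduced from the column totals alone. This is a sketch using the slide's totals (Σd = 9, Σd² = 27, n = 12); the function name is illustrative.

```python
import math

def paired_t_from_sums(sum_d, sum_d2, n):
    """Paired t statistic and df from the sum of differences and sum of squared differences."""
    mean_d = sum_d / n
    ss = sum_d2 - sum_d ** 2 / n      # sum of squared deviations of d
    variance = ss / (n - 1)           # SD^2
    se = math.sqrt(variance / n)      # SE of the mean difference
    return mean_d / se, n - 1

t, df = paired_t_from_sums(sum_d=9, sum_d2=27, n=12)
print(f"t = {t:.2f} at {df} df")
```

Without intermediate rounding the result is about 1.91, slightly lower than the 1.92 obtained when SE is rounded to 0.39; the conclusion is the same.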
Chi-Square ($\chi^2$) test: Underlying theory: If the two variables are not associated, the observed and expected frequencies should be close to each other, and any discrepancies should be due to randomization only. Non-parametric test. Tests statistical significance for bivariate tabular analysis. Evaluates differences between experimental or observed data and expected or hypothetical data.
$\chi^2$ Assumptions: 1. Quantitative data. 2. One or more categories. 3. Independent observations. 4. Adequate sample size. 5. Simple random sample. 6. Data in frequency form.
Contingency table: A frequency table in which the sample is classified by two different attributes. A contingency table may be a 2 x 2 table or an r x c table. Marginal totals = (a + b), (a + c), (c + d) or (b + d). Grand total = N = a + b + c + d. Expected value (E) = R x C / N, where R = row total, C = column total and N = grand total.
Disease | Smoker | Non-smoker | Total
Cancer | 6 (a) | 4 (b) | 10 (a + b)
No cancer | 94 (c) | 96 (d) | 190 (c + d)
Total | 100 (a + c) | 100 (b + d) | 200 (a + b + c + d)
Calculation: $\chi^2 = \sum (O - E)^2 / E$. Degrees of freedom: $df = (r - 1)(c - 1)$. For a 2 x 2 table: $\chi^2 = (ad - bc)^2 N / [(a+b)(b+d)(c+d)(a+c)]$ with 1 df.
In the given example, calculation of expected values: $E_a = 10 \times 100/200 = 5$, $O - E_a = 1$, $(O - E_a)^2 = 1$. $E_b = 10 \times 100/200 = 5$, $O - E_b = -1$, $(O - E_b)^2 = 1$. $E_c = 190 \times 100/200 = 95$, $O - E_c = -1$, $(O - E_c)^2 = 1$. $E_d = 190 \times 100/200 = 95$, $O - E_d = 1$, $(O - E_d)^2 = 1$. $\chi^2 = 1/5 + 1/5 + 1/95 + 1/95 = 0.42$ at 1 df. The calculated $\chi^2$ is less than the tabulated $\chi^2$ at 0.05 for 1 df (3.84), so the difference is not statistically significant.
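The smoker and cancer example can be checked both from the expected values and from the 2 x 2 shortcut formula. The sketch below uses the slide's cell counts; the function names are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square from observed and expected counts, E = row total x column total / N."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

def chi_square_shortcut(a, b, c, d):
    """Shortcut: (ad - bc)^2 N / [(a+b)(b+d)(c+d)(a+c)]."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (b + d) * (c + d) * (a + c))

print(f"{chi_square_2x2(6, 4, 94, 96):.3f}", f"{chi_square_shortcut(6, 4, 94, 96):.3f}")
```

Both routes give about 0.421, well below the 5% critical value of 3.84 for 1 df.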
Yates's continuity correction: Described by F. Yates. When the values in a 2 x 2 table are fairly small, a correction for continuity is required. There is no precise rule for the situations in which the Yates correction needs to be applied. Generally it is applied if the grand total is < 100 or the expected value is < 5 in any cell. $\chi^2 = (|ad - bc| - N/2)^2 N / [(a+b)(b+d)(c+d)(a+c)]$.
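A minimal sketch of the corrected formula follows, applied to the earlier smoker and cancer table purely to show the arithmetic (that table has N = 200 and no expected value below 5, so by the rule of thumb above the correction would not normally be needed there). The function name is illustrative.

```python
def chi_square_yates(a, b, c, d):
    """2 x 2 chi-square with Yates's continuity correction."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2) ** 2 * n
    return numerator / ((a + b) * (b + d) * (c + d) * (a + c))

print(f"corrected chi-square = {chi_square_yates(6, 4, 94, 96):.3f}")
```

The correction lowers the statistic (here from about 0.421 to about 0.105), which makes the test more conservative.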
Exact Probability test or Fisher's Exact test: Cochran's criteria: Recommended by W. G. Cochran. Fisher's exact test should be used: – If n < 20 – and the smallest expected value is less than 5. – For contingency tables with more than 1 df, the criterion is an expected value < 5 in more than 20% of cells. What if the observed value is 0 in one cell? – Chi-square can still be applied if the table fulfills the above criteria for expected values.
Fisher's Exact test (continued): Devised by Fisher, Yates and Irwin. Example: Survival after two different types of treatment. Is the difference in survival statistically significant? The number of tables possible with these marginal totals is 4 = lowest marginal total + 1.
Observed table | Survived | Died | Total
Treatment A | 3 | 1 | 4 (r1)
Treatment B | 2 | 2 | 4 (r2)
Total | 5 (s1) | 3 (s2) | 8 (n)
Possible tables with the same marginal totals (r1 = 4, r2 = 4, s1 = 5, s2 = 3, n = 8):
Table | Treatment A: Survived, Died | Treatment B: Survived, Died
Table 1 | 4, 0 | 1, 3
Table 2 | 3, 1 | 2, 2
Table 3 | 2, 2 | 3, 1
Table 4 | 1, 3 | 4, 0
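Exact probability: $P = r_1!\, r_2!\, s_1!\, s_2! / (n!\, a!\, b!\, c!\, d!)$, where a, b, c and d are the four cell counts. The P values for the four tables are 0.071, 0.429, 0.429 and 0.071. Table 2 is the same as the observed table. Final P value: Conventional approach: P = P of the observed set + P of the more extreme set = 0.429 + 0.071 = 0.5. Mid-P approach given by Armitage and Berry: P = 0.5 x observed P + P of the more extreme set = 0.5 x 0.429 + 0.071 = 0.286. The exact probability is essentially one-sided. For a two-sided test, double the P value.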
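The table probabilities for the treatment example can be enumerated with the hypergeometric form of the exact-probability formula. This is a sketch using the slide's marginal totals; the function and variable names are illustrative.

```python
from math import comb

def table_probability(a, r1, r2, s1):
    """Probability of a 2 x 2 table with top-left cell a and fixed marginal totals."""
    return comb(r1, a) * comb(r2, s1 - a) / comb(r1 + r2, s1)

r1, r2, s1 = 4, 4, 5          # row totals and first column total from the slide
observed_a = 3                # survivors on treatment A in the observed table

# Every value of a that is compatible with the marginal totals
probs = {a: table_probability(a, r1, r2, s1)
         for a in range(max(0, s1 - r2), min(r1, s1) + 1)}

more_extreme = sum(p for a, p in probs.items() if a > observed_a)
conventional_p = probs[observed_a] + more_extreme
mid_p = 0.5 * probs[observed_a] + more_extreme

for a in sorted(probs, reverse=True):
    print(f"a = {a}: P = {probs[a]:.3f}")
print(f"conventional P = {conventional_p:.3f}, mid-P = {mid_p:.3f}")
```

The four probabilities come out as 0.071, 0.429, 0.429 and 0.071, with a conventional one-sided P of 0.5 and a mid-P of about 0.286.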
Significance Test for Correlation Coefficients: Sample correlation coefficient (r) vs. population correlation coefficient ($\rho = 0$ in the population). Is the sample correlation coefficient r from a population with correlation coefficient 0? Valid if at least one variable follows the normal distribution. Null hypothesis $H_0$: $\rho = 0$ (the population correlation coefficient is zero). Std. error of r: $SE(r) = \sqrt{(1 - r^2)/(n - 2)}$. For the small-sample test: $t = (r - 0)/SE(r) = r/SE(r)$ at $n - 2$ df.
Example: The correlation coefficient between intake of calories and protein in adults is 0.8652. The sample size studied was 12. Is this r value statistically significant? First calculate $SE(r) = \sqrt{[1 - (0.8652)^2]/10} = 0.1586$. Then $t = (r - 0)/SE(r) = 0.8652/0.1586 = 5.46$, with $df = n - 2 = 10$. The calculated t value is greater than the tabulated t value for 10 df, so the r value is highly significant.
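The correlation example can be verified with a short sketch. It uses r = 0.8652 and n = 12, the values that appear in the slide's SE calculation; the function name is illustrative.

```python
import math

def correlation_t(r, n):
    """SE of r, t statistic and df for testing a correlation coefficient against zero."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return se, r / se, n - 2

se, t, df = correlation_t(r=0.8652, n=12)
print(f"SE(r) = {se:.4f}, t = {t:.2f} at {df} df")
```

This gives SE(r) of about 0.1586 and t of about 5.46 at 10 df.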
Two Independent Correlation Coefficients: $r_1$ and $r_2$ are two independent correlation coefficients based on sample sizes $n_1$ and $n_2$. First apply the z transformation (also known as Fisher's Z transformation): $Z_1 = \frac{1}{2} \log_e [(1 + r_1)/(1 - r_1)]$ and $Z_2 = \frac{1}{2} \log_e [(1 + r_2)/(1 - r_2)]$. For small samples the t test is used: $t = (Z_1 - Z_2) / \sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$ at $n_1 + n_2 - 6$ df. For the large-sample test of significance: $Z = (Z_1 - Z_2) / \sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$. The Z value follows the normal distribution.
Example: Correlation coefficients between protein and calorie intakes were calculated from two samples of 1200 and 1600. Do the two estimates differ significantly? With $n_1 = 1200$ and $n_2 = 1600$, $r_1$ and $r_2$ are converted to $Z_1$ and $Z_2$ from Fisher's table, and $Z = (Z_1 - Z_2) / \sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$. The calculated Z is greater than the tabulated Z, so the difference in correlation between the two samples is highly significant.
Effect of Sample Size: Suppose the sample sizes are 12 and 16, with the same $r_1$ and $r_2$. Then $n_1 = 12$, $n_2 = 16$, $Z_1$ and $Z_2$ are again taken from Fisher's table, and $t = (Z_1 - Z_2) / \sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$ with $df = n_1 + n_2 - 6 = 22$. The calculated t is less than $t_{0.05}$, so P > 0.05: there is no significant difference between the correlation coefficients.
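The effect of sample size on this comparison can be illustrated with a sketch. The correlation values 0.6 and 0.5 are hypothetical, because the slide's r values are missing from the transcript; only the sample sizes (1200 and 1600, then 12 and 16) come from the examples. `math.atanh` computes the same quantity as Fisher's Z transformation, ½ log_e[(1 + r)/(1 - r)].

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Test statistic for the difference of two independent correlation coefficients."""
    z1 = math.atanh(r1)   # Fisher's Z transformation of r1
    z2 = math.atanh(r2)   # Fisher's Z transformation of r2
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical r values; sample sizes as in the two examples
large = compare_correlations(0.6, 1200, 0.5, 1600)   # refer to the normal distribution
small = compare_correlations(0.6, 12, 0.5, 16)       # refer to t at n1 + n2 - 6 = 22 df
print(f"large samples: {large:.2f}, small samples: {small:.2f}")
```

With the same pair of correlations, the statistic is large for the big samples and small for the small ones, which is the point the two examples make.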
Conclusion: What is the significance of a test of significance? Does it show the strength of the association? Is the result meaningful in a practical sense? A result that fails the test of significance does not mean there is no relationship between the two variables. Significance relates only to the probability of the result arising commonly or rarely by chance. Results may be statistically significant yet have no clinical or biochemical significance. Assumptions for a test of significance: – The groups are equal in all respects other than the factor under study. – Patients are randomly selected for each group. Factors for which a significance test is not foolproof: – Small sample size. – Matching.
Selecting an Appropriate Test: Goal of analysis. Type of data. Distribution of data. No. of groups. Design of study.
References:
Rao VK. Biostatistics: A manual of statistical method for use in health nutrition and anthropometry. 2nd ed. New Delhi: Jaypee Brothers.
Armitage P, Berry G. Statistical Method in Medical Research. 3rd ed. London: Oxford Blackwell Scientific Publication; 1994.
Swinscow TDV, Campbell MJ. Statistics at Square One. 10th ed. London: BMJ Books.
Bland M. An Introduction to Medical Statistics. 3rd ed. New York: Oxford University Press; 200.
Moye LA. Statistical Reasoning in Medicine: The Intuitive P Value Primer. 1st ed. New York: Springer-Verlag.
Mahajan BK. Methods in Biostatistics. 7th ed. New Delhi: Jaypee Brothers; 2010.