Presentation on theme: "ABOUT TWO INDEPENDENT POPULATIONS"— Presentation transcript:
1 HYPOTHESIS TESTING: ABOUT TWO INDEPENDENT POPULATIONS
2 In this lecture, we are going to study the procedures for making inferences about two populations. When comparing two populations we need two samples. Two basic kinds of samples can be used: independent and dependent. The dependence or independence of a sample is determined by the sources used for the data. If the same set of sources is used to obtain the data representing different situations, we have dependent sampling. If two unrelated sets of sources are used, one set from each population, we have independent sampling.
3 Two Independent Populations
The Significance Test for The Difference Between Two Population Means
Mann-Whitney U Test
The Significance Test for The Difference Between Two Population Proportions
2x2 Chi Square Tests
4 The Significance Test for The Difference Between Two Population Means
Hypothesis testing involving the difference between two population means is most frequently employed to determine whether or not it is reasonable to conclude that the two are unequal. In such cases, one of the following pairs of hypotheses may be formulated:
(1) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
(2) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 > \mu_2$
(3) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 < \mu_2$
5 The difference between two population means will be discussed in three different contexts:
When sampling is from normally distributed populations with known population variances
When sampling is from normally distributed populations with unknown population variances
When sampling is from populations that are not normally distributed
6 Sampling from Normally Distributed Populations: Population Variances Known
When each of two independent simple random samples has been drawn from a normally distributed population with a known variance, the test statistic for testing the null hypothesis of equal population means is
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
7 Example: Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down syndrome. The data consist of serum uric acid readings on 12 individuals with Down syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, respectively. The data constitute two independent simple random samples, each drawn from a normally distributed population with a variance equal to 1.
$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
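The calculation for this example can be sketched in a few lines. This is a minimal illustration, not the lecture's own worked solution: the function name `two_sample_z` is hypothetical, and the statistic is assumed to be the usual known-variance form, the difference in sample means divided by $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """z statistic for H0: mu1 = mu2 when both population variances are known."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Values from the example: 12 and 15 subjects, means 4.5 and 3.4, variance 1.
z = two_sample_z(4.5, 3.4, 1.0, 1.0, 12, 15)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

With a two-sided test at the 0.05 level, the statistic would be compared with ±1.96.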
9 Sampling from Normally Distributed Populations: Population Variances Unknown When the population variances are unknown, two possibilities exist. The two population variances may be equal or unequal. When comparing two populations, it is quite natural that we compare their variances or standard deviations.
10 Testing the equality of two population variances: if the calculated F exceeds the tabulated F, the variances are not equal; otherwise the variances are equal.
F Table ($\alpha = 0.05$)
Denominator           Numerator Degrees of Freedom
Degrees of Freedom    1        2        3        4        5        ...    120      ∞
1                     161.4    199.5    215.7    224.6    230.2    ...    253.3    254.3
2                     18.5     19.0     19.16    19.25    19.30    ...    19.49    19.50
3                     10.13    9.55     9.28     9.12     9.01     ...    8.55     8.53
...
120                   3.92     3.07     2.68     2.45     2.29     ...    1.35     1.25
∞                     3.84     3.00     2.60     2.37     2.21     ...    1.22     1.00
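11 Population Variances Equal: When the population variances are unknown but equal, the two sample variances are pooled and the test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
with $n_1 + n_2 - 2$ degrees of freedom.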
12 Example: A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. The data consist of serum amylase determinations on 22 hospitalized subjects and 15 healthy subjects, with means of 120 and 96 units/ml and standard deviations of 40 and 35 units/ml, respectively. The data constitute two independent random samples, each drawn from a normally distributed population. The population variances are unknown. The researchers wish to know if they would be justified in concluding that the population means are different. The variances are equal.
14 $t_{calculated} = 1.88$. Since $t_{table} > t_{calculated}$, accept H0. The mean serum amylase level of hospitalized subjects is not different from the mean serum amylase level of healthy subjects.
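The serum amylase example can be reproduced with a short sketch. It assumes the usual variance-ratio check (larger sample variance over the smaller) and the pooled-variance t statistic; the function name `pooled_t` is hypothetical, and the critical value 2.0301 is an assumption read from a t table for 35 degrees of freedom at the two-sided 0.05 level.

```python
from math import sqrt


def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    """Return (t, degrees of freedom) for the equal-variance two-sample t test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hospitalized: n = 22, mean 120, sd 40. Healthy: n = 15, mean 96, sd 35.
f_ratio = 40**2 / 35**2  # larger sample variance over the smaller one
t, df = pooled_t(120, 96, 40, 35, 22, 15)

# Assumption: two-sided 5% critical value for 35 df, taken from a t table.
T_TABLE = 2.0301
decision = "reject H0" if abs(t) > T_TABLE else "accept H0"

print(f"F = {f_ratio:.2f}, t = {t:.2f}, df = {df}: {decision}")
```

The computed t agrees with the value 1.88 reported on the slide, and the F ratio is what would be compared with the F table to decide between the equal-variance and unequal-variance procedures.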
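15 Population Variances Unequal: When two independent simple random samples have been drawn from normally distributed populations with unknown and unequal variances, the test statistic for testing $H_0: \mu_1 = \mu_2$ is
$$t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
The critical value of $t'$ for level of significance $\alpha$ and a two-sided test is approximately
$$t'_{1-\alpha/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$$
where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, and $t_1$, $t_2$ are the tabulated $t_{1-\alpha/2}$ values for $n_1 - 1$ and $n_2 - 1$ degrees of freedom.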
16 Example: Researchers wish to know if two populations differ with respect to the mean value of total serum complement activity (CH50). The data consist of total serum complement activity determinations on 10 subjects with disease and 20 apparently normal subjects. The sample means and standard deviations are 62.6 and 33.8 for subjects with disease and 47.2 and 10.1 for normal subjects.
17 $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$
$w_1 = (33.8)^2/10 = 114.24$ and $w_2 = (10.1)^2/20 = 5.10$
$t_1 = 2.2622$ and $t_2 = 2.0930$
The critical value is 2.255 and the calculated $t'$ is 1.41. Since $-2.255 < 1.41 < 2.255$, accept H0.
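The numbers 1.41 and 2.255 can be checked with a small sketch of the weighted critical-value approximation, with weights $w_i = s_i^2/n_i$. The function name `cochran_t` is hypothetical. The sample sizes are an assumption: 10 for the sample with standard deviation 33.8 and 20 for the sample with standard deviation 10.1, which is the assignment that reproduces the reported values. The table values 2.2622 (9 df) and 2.0930 (19 df) are two-sided 5% points read from a t table.

```python
from math import sqrt


def cochran_t(mean1, mean2, sd1, sd2, n1, n2, t1, t2):
    """Return (t', approximate critical value) for unequal variances."""
    w1 = sd1**2 / n1
    w2 = sd2**2 / n2
    t_prime = (mean1 - mean2) / sqrt(w1 + w2)
    # Critical value: table values t1 and t2 weighted by w1 and w2.
    critical = (w1 * t1 + w2 * t2) / (w1 + w2)
    return t_prime, critical


# Assumption: t1 and t2 are table values for n1 - 1 = 9 and n2 - 1 = 19 df.
t_prime, critical = cochran_t(62.6, 47.2, 33.8, 10.1, 10, 20, 2.2622, 2.0930)
decision = "reject H0" if abs(t_prime) > critical else "accept H0"

print(f"t' = {t_prime:.2f}, critical value = {critical:.3f}: {decision}")
```

Because the sample with the much larger variance dominates the weights, the critical value lands close to its table value of 2.2622.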
18 MANN-WHITNEY U TEST
The sign test discussed in the preceding lecture does not make full use of all the information present in the two samples when the variable of interest is measured on at least an ordinal scale. Reducing an observation's information content to merely whether it falls above or below the common median wastes information. If, for testing the desired hypothesis, there is available a procedure that makes use of more of the information inherent in the data, that procedure should be used if possible.
19 One such nonparametric procedure that can be used instead of the sign test is the Mann-Whitney U Test. The Mann-Whitney U Test is a nonparametric alternative to the significance test for the difference between two independent population means. Since the test is based on the ranks of the observations, it utilizes more information than the sign test does.
20 The assumptions underlying the Mann-Whitney U Test are as follows:
The two samples, of size n and m, respectively, available for analysis have been independently and randomly drawn from their respective populations.
The measurement scale is at least ordinal.
If the populations differ at all, they differ only with respect to their medians.
21 If the sample sizes are smaller than 20, the calculation of the test statistic U is a two-step procedure. We first determine the sum of the ranks for the first sample. Then, using this sum of ranks, we calculate a U score for each sample. The larger U score is the test statistic. The critical U value is obtained from the U table.
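22 If the sample sizes are greater than 20, we can use the standard normal (z) approximation. This is possible since the distribution of U is approximately normal with mean $n_1 n_2 / 2$ and standard deviation $\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$, so that
$$z = \frac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$$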
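The large-sample approximation can be sketched as follows. The function name `mann_whitney_z` is hypothetical, and the standard deviation is assumed to be the usual form without a correction for ties, $\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$. The input values are invented for illustration only and do not come from the lecture.

```python
from math import sqrt


def mann_whitney_z(u, n1, n2):
    """Normal approximation to the Mann-Whitney U statistic (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


# Hypothetical values for illustration only (not from the lecture).
z = mann_whitney_z(u=400, n1=25, n2=25)

print(f"z = {z:.3f}")
```

The resulting z is then compared with a standard normal critical value such as 1.645 (one-sided) or 1.96 (two-sided) at the 0.05 level.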
23 Example: A researcher designed an experiment to assess the effects of prolonged inhalation of cadmium oxide. Fifteen laboratory animals served as experimental subjects, while 10 similar animals served as controls. The variable of interest was hemoglobin level following the experiment. The results are shown in the table. We wish to know if we can conclude that prolonged inhalation of cadmium oxide reduces hemoglobin level.
24 Exposed animals    Rank    Unexposed animals    Rank
14.4               7       17.4                 24
14.2               6       16.2                 16
13.8               2       17.1                 23
16.5               18      17.5                 25
14.1               4.5     15.0                 8.5
16.6               19      16.0                 15
15.9               14      16.9                 22
15.6               12      15.0                 8.5
14.1               4.5     16.3                 17
15.3               10.5    16.8                 21
15.7               13
16.7               20
13.7               1
15.3               10.5
14.0               3
Sum of ranks       145     Sum of ranks         180
Since n1 and n2 < 20, the critical value is taken from the table: it is 45. The test statistic is 125. Since 125 > 45, reject H0.
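The ranking and the U scores for this example can be sketched as below. Several things here are assumptions rather than the lecture's stated method: the helper names are hypothetical; the tied values 14.1 and 15.3 (exposed) and 15.0 (unexposed) are each taken to occur twice, which is what the tied ranks 4.5, 10.5 and 8.5 and the sample sizes 15 and 10 imply; and the U score uses the common convention $U_1 = n_1 n_2 + n_1(n_1 + 1)/2 - R_1$, $U_2 = n_1 n_2 - U_1$.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 (0-based) share the average of ranks i+1..j.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (rank sum of sample1, U1, U2) from the pooled ranking."""
    ranks = average_ranks(sample1 + sample2)
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(ranks[x] for x in sample1)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return r1, u1, u2


exposed = [14.4, 14.2, 13.8, 16.5, 14.1, 16.6, 15.9, 15.6,
           14.1, 15.3, 15.7, 16.7, 13.7, 15.3, 14.0]
unexposed = [17.4, 16.2, 17.1, 17.5, 15.0, 16.0, 16.9, 15.0, 16.3, 16.8]

r1, u1, u2 = mann_whitney_u(exposed, unexposed)
print(f"R1 = {r1}, U1 = {u1}, U2 = {u2}, larger U = {max(u1, u2)}")
```

Under these assumptions the larger U score matches the value 125 that the slide compares with the tabulated critical value of 45.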
25 The Difference Between Two Population Proportions
The hypothesis most frequently tested about the difference between two population proportions is that the difference is zero. It is possible, however, to test that the difference is equal to some other value.
26 Example: In a study designed to compare a new treatment for migraine headache with the standard treatment, 78 of 100 subjects who received the standard treatment responded favorably. Of the 100 subjects who received the new treatment, 90 responded favorably. Do these data provide sufficient evidence to indicate that the new treatment is more effective than the standard? The calculated z exceeds $Z_{0.05} = 1.645$, so H0 is rejected. The new treatment is more effective than the standard.
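The z value for the migraine example is not shown in the transcript, so the sketch below recomputes it under an assumption: the standard error uses the pooled proportion $\bar{p} = (x_1 + x_2)/(n_1 + n_2)$, which is the usual choice when the hypothesized difference is zero. The function name `two_proportion_z` is hypothetical.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# New treatment: 90 of 100 responded. Standard treatment: 78 of 100 responded.
z = two_proportion_z(90, 100, 78, 100)

Z_TABLE = 1.645  # one-sided critical value at the 0.05 level, as on the slide
decision = "reject H0" if z > Z_TABLE else "accept H0"

print(f"z = {z:.2f}: {decision}")
```

The test is one-sided because the question is whether the new treatment is more effective, not merely different.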
27 2x2 Chi Square Test
We can use the chi-square test to compare frequencies or proportions in two or more groups. The classification of a set of entities according to two criteria can be shown by a table in which the r rows represent the various levels of one criterion of classification and the c columns represent the various levels of the second criterion. Such a table is generally called a contingency table. We will be interested in testing the null hypothesis that in the population the two criteria of classification are independent, against the alternative that they are associated.
28 Layout of the 2x2 contingency table:
First criterion    Second criterion
                   +       -       Total
1                  O11     O12     O1.
2                  O21     O22     O2.
Total              O.1     O.2     N
Each expected frequency Eij should be greater than or equal to 5.
df = (r-1)(c-1) = 1
29 Is squint more common among children with a positive family history? Is there an association between squint and family history of squint?
Observed frequencies for squint and family history (expected frequencies in parentheses):
         +             -             Total
+        20 (14.58)    30 (35.42)    50
-        15 (20.42)    55 (49.58)    70
Total    35            85            120
$\chi^2_{(1,\,0.025)} = 5.024$ is greater than the calculated $\chi^2$, so accept H0. There is no relation between squint and family history.
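The expected frequencies and the chi-square statistic for the squint table can be sketched as follows. The function name `chi_square_2x2` is hypothetical, and the statistic is assumed to be the plain Pearson form $\sum (O - E)^2 / E$ without a continuity correction; the transcript does not show which form the lecture used.

```python
def chi_square_2x2(table):
    """Return (expected frequencies, Pearson chi-square) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, statistic


# Observed counts from the squint example: rows (20, 30) and (15, 55).
expected, chi2 = chi_square_2x2([[20, 30], [15, 55]])

CHI2_TABLE = 5.024  # tabulated value quoted on the slide for 1 df
decision = "reject H0" if chi2 > CHI2_TABLE else "accept H0"

print([[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi2:.2f}: {decision}")
```

The expected frequencies agree with the values 14.58, 35.42, 20.42 and 49.58 on the slide, and all of them are at least 5, so the chi-square test is applicable.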
30 If any expected frequency is less than 5, an alternative procedure called Fisher's Exact Test should be performed.
First criterion    Second criterion
                   +       -       Total
1                  O11     O12     O1.
2                  O21     O22     O2.
Total              O.1     O.2     N
Let O21 = a; $P(O_{21} \le a)$ should be calculated. If $P(O_{21} \le a) < \alpha$, then H1 is accepted.
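31 Observed frequencies for squint and family history:
         +     -     Total
+        5     6     11
-        1     10    11
Total    6     16    22
More extreme table with the same marginal totals:
         +     -     Total
+        6     5     11
-        0     11    11
Total    6     16    22
$P(O_{21} \le 1) = 0.068 + 0.006 = 0.074$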
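The two probabilities that add up to 0.074 can be reproduced with a short sketch, assuming the observed table has rows (5, 6) and (1, 10), so that both row totals are 11 and the first column total is 6. With the marginal totals fixed, the count in cell O21 follows a hypergeometric distribution. The function name `prob_o21` is hypothetical.

```python
from math import comb


def prob_o21(k, row1_total, row2_total, col1_total):
    """Hypergeometric probability that cell O21 equals k with margins fixed."""
    n = row1_total + row2_total
    ways = comb(row2_total, k) * comb(row1_total, col1_total - k)
    return ways / comb(n, col1_total)


# Margins from the example: row totals 11 and 11, first column total 6.
p_one = prob_o21(1, 11, 11, 6)   # the observed table (O21 = 1)
p_zero = prob_o21(0, 11, 11, 6)  # the more extreme table (O21 = 0)
p_lower = p_zero + p_one

print(f"P(O21 = 1) = {p_one:.3f}, P(O21 = 0) = {p_zero:.3f}, "
      f"P(O21 <= 1) = {p_lower:.3f}")
```

The sum is then compared with the chosen significance level $\alpha$, following the rule stated for Fisher's Exact Test.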