1 HYPOTHESIS TESTING: ABOUT TWO INDEPENDENT POPULATIONS
2 In this lecture, we are going to study the procedures for making inferences about two populations. When comparing two populations we need two samples. Two basic kinds of samples can be used: independent and dependent. The dependence or independence of a sample is determined by the sources used for the data. If the same set of sources is used to obtain the data representing different situations, we have dependent sampling. If two unrelated sets of sources are used, one set from each population, we have independent sampling.
3 Two Independent Populations: 1. The Significance Test for the Difference Between Two Population Means 2. Mann-Whitney U Test 3. The Significance Test for the Difference Between Two Population Proportions 4. 2x2 Chi-Square Tests
4 The Significance Test for the Difference Between Two Population Means. Hypothesis testing involving the difference between two population means is most frequently employed to determine whether or not it is reasonable to conclude that the two are unequal. In such cases, one or the other of the following hypotheses may be formulated: (1) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$ (2) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 > \mu_2$ (3) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 < \mu_2$
5 The difference between two population means will be discussed in three different contexts: 1. When sampling is from normally distributed populations with known population variances 2. When sampling is from normally distributed populations with unknown population variances 3. When sampling is from populations that are not normally distributed
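6 Sampling from Normally Distributed Populations: Population Variances Known. When each of two independent simple random samples has been drawn from a normally distributed population with a known variance, the test statistic for testing the null hypothesis of equal population means is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$, where $(\mu_1 - \mu_2)_0$ is the hypothesized difference (zero under $H_0$).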
7 Example: Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down syndrome. The data consist of serum uric acid readings on 12 individuals with Down syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, respectively. The data constitute two independent simple random samples, each drawn from a normally distributed population with a variance equal to 1. $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
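8 $z = (4.5 - 3.4)/\sqrt{1/12 + 1/15} = 2.84$. Since z table (1.96 at $\alpha = 0.05$) < z calculated, reject $H_0$. The two population means are not equal.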
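The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch: the function name `two_sample_z` is made up for illustration, and the critical value 1.96 assumes a two-sided test at a significance level of 0.05, which the transcript does not state explicitly.

```python
import math

def two_sample_z(mean1, mean2, var1, var2, n1, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 when both population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2 - delta0) / standard_error

# Serum uric acid example: n = 12 with mean 4.5, n = 15 with mean 3.4, both variances 1.
z = two_sample_z(4.5, 3.4, 1.0, 1.0, 12, 15)

Z_CRITICAL = 1.96  # assumption: two-sided test at alpha = 0.05
print(f"z = {z:.2f}, reject H0: {abs(z) > Z_CRITICAL}")
```

With these inputs the statistic comes out near 2.84, which exceeds 1.96, in line with the slide's conclusion that the means differ.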
9 Sampling from Normally Distributed Populations: Population Variances Unknown When the population variances are unknown, two possibilities exist. The two population variances may be equal or unequal. When comparing two populations, it is quite natural that we compare their variances or standard deviations.
10 Testing the equality of two population variances: the test leads to one of two conclusions, either the variances are equal or the variances are not equal.
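The formula on this slide did not survive in the transcript. A common way to compare two variances is the variance-ratio (F) statistic, so the sketch below assumes that is what was meant. The function name is hypothetical, and the numbers are the serum amylase summary figures from the example that follows (standard deviations 40 and 35, sample sizes 22 and 15).

```python
def variance_ratio(sd1, n1, sd2, n2):
    """Return (F, df_numerator, df_denominator), placing the larger sample variance on top."""
    var1, var2 = sd1 ** 2, sd2 ** 2
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Assumption: the slide used the variance-ratio test; the data are the amylase summary values.
f_ratio, df_num, df_den = variance_ratio(40, 22, 35, 15)
print(f"F = {f_ratio:.3f} with {df_num} and {df_den} degrees of freedom")
```

The resulting F is then compared with the tabulated F value for the chosen significance level; a ratio close to 1 is consistent with equal variances.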
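11 Population Variances Equal: When the population variances are unknown but assumed to be equal, the two sample variances are pooled, $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, and the test statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$ with $n_1 + n_2 - 2$ degrees of freedom.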
12 Example: A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. The data consist of serum amylase determinations on 22 hospitalized subjects and 15 healthy subjects, with means 120 and 96 units/ml and standard deviations 40 and 35 units/ml, respectively. The data constitute two independent random samples, each drawn from a normally distributed population. The population variances are unknown. They wish to know if they would be justified in concluding that the population means are different. Variances are equal.
13 $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$
14 t calculated = 1.88. Since t table > t calculated, accept $H_0$. The mean serum amylase level of hospitalized subjects is not different from the mean serum amylase level of healthy subjects.
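The value t = 1.88 can be reproduced from the summary statistics with the standard pooled-variance formula. The sketch below assumes that formula; the function name is illustrative, and the table value 2.0301 is the two-sided 5% point of t with 35 degrees of freedom, an assumed significance level since the transcript does not give one.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom for H0: mu1 = mu2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean1 - mean2) / standard_error, df

# Serum amylase example: hospitalized (n = 22) versus healthy (n = 15) subjects.
t_calculated, df = pooled_t(120, 40, 22, 96, 35, 15)

T_TABLE = 2.0301  # assumption: two-sided test at alpha = 0.05, 35 degrees of freedom
print(f"t = {t_calculated:.2f}, df = {df}, reject H0: {abs(t_calculated) > T_TABLE}")
```

The pooled variance here is 1450, and the statistic falls below the table value, so the null hypothesis is retained.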
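15 Population Variances Unequal: When two independent simple random samples have been drawn from normally distributed populations with unknown and unequal variances, the test statistic for testing $H_0: \mu_1 = \mu_2$ is $t' = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$. The critical value of $t'$ for level of significance $\alpha$ and a two-sided test is approximately $t'_{1-\alpha/2} = \dfrac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$, where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, and $t_1$ and $t_2$ are the tabulated values $t_{1-\alpha/2}$ for $n_1 - 1$ and $n_2 - 1$ degrees of freedom.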
16 Researchers wish to know if two populations differ with respect to the mean value of total complement activity ($C_{H50}$). The data consist of total serum complement activity determinations on 20 apparently normal subjects and 10 subjects with disease. The sample means and standard deviations are 62.6 and 33.8 for subjects with disease and 47.2 and 10.1 for normal subjects.
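17 $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$. $w_1 = (33.8)^2/10 = 114.244$ and $w_2 = (10.1)^2/20 = 5.1005$; $t_1 = 2.2622$ (9 df) and $t_2 = 2.0930$ (19 df), so the critical value is $(114.244 \times 2.2622 + 5.1005 \times 2.0930)/(114.244 + 5.1005) = 2.255$. The calculated statistic is $t' = (62.6 - 47.2)/\sqrt{114.244 + 5.1005} = 1.41$. Since $-2.255 < 1.41 < 2.255$, accept $H_0$.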
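The figures 1.41 and 2.255 can be reproduced as follows. This sketch assumes the weighted critical value described on the previous slide, with the sample of 10 carrying the standard deviation 33.8 and the sample of 20 carrying 10.1 (the only assignment that reproduces the slide's results). The function names are illustrative, and the tabulated t values 2.2622 and 2.0930 assume a two-sided test at a significance level of 0.05.

```python
import math

def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t', w1, w2) for the unequal-variance test of H0: mu1 = mu2."""
    w1 = sd1 ** 2 / n1
    w2 = sd2 ** 2 / n2
    t_prime = (mean1 - mean2) / math.sqrt(w1 + w2)
    return t_prime, w1, w2

def weighted_critical_value(w1, w2, t1, t2):
    """Approximate critical value: tabulated t values weighted by w1 and w2."""
    return (w1 * t1 + w2 * t2) / (w1 + w2)

# Complement activity example: sample 1 has n = 10, sample 2 has n = 20.
t_prime, w1, w2 = unequal_variance_t(62.6, 33.8, 10, 47.2, 10.1, 20)

# Assumption: alpha = 0.05 two-sided, so t(0.975, 9) = 2.2622 and t(0.975, 19) = 2.0930.
critical = weighted_critical_value(w1, w2, 2.2622, 2.0930)
print(f"t' = {t_prime:.2f}, critical value = {critical:.3f}, reject H0: {abs(t_prime) > critical}")
```

Because the weights are the estimated variances of the two sample means, the critical value is pulled toward the t value of the sample with the larger variance per observation.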
18 MANN-WHITNEY U TEST The sign test discussed in the preceding lecture does not make full use of all the information present in the two samples when the variable of interest is measured on at least an ordinal scale. Reducing an observation's information content to merely whether it falls above or below the common median wastes information. If, for testing the desired hypothesis, there is available a procedure that makes use of more of the information inherent in the data, that procedure should be used if possible.
19 Such a nonparametric procedure that can be used instead of the sign test is the Mann-Whitney U test. The Mann-Whitney U test is a nonparametric alternative to the significance test for the difference between two independent population means. Since the test is based on the ranks of the observations, it utilizes more information than does the sign test.
20 The assumptions underlying the Mann-Whitney U test are as follows: 1. The two samples, of size n and m, respectively, available for analysis have been independently and randomly drawn from their respective populations. 2. The measurement scale is at least ordinal. 3. If the populations differ at all, they differ only with respect to their medians.
21 The calculation of the test statistic U is a two-step procedure. We first determine the sum of the ranks for the first sample, $R_1$. Then, using this sum of ranks, we calculate a U score for each sample: $U_1 = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1$ and $U_2 = n_1 n_2 - U_1$. The larger U score is the test statistic. If the sample sizes are smaller than 20, the critical U value is obtained from the U table.
22 If the sample sizes are greater than 20, we can use the standard normal (z) approximation. This is possible since the distribution of U is approximately normal with a mean of $n_1 n_2/2$ and a standard deviation of $\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$, so that $z = \dfrac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$.
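A short sketch of the large-sample approximation follows. It assumes the usual standard deviation $\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$ without a correction for ties. The function name and the input values (U = 500 with samples of 25 and 30) are hypothetical and chosen only to show the arithmetic.

```python
import math

def u_to_z(u, n1, n2):
    """Standard normal approximation for the Mann-Whitney U statistic (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# Hypothetical values, not taken from the lecture.
z_u = u_to_z(500, 25, 30)
print(f"z = {z_u:.2f}")
```

The resulting z is compared with the standard normal table in the same way as the z statistics earlier in the lecture.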
23 Example: A researcher designed an experiment to assess the effects of prolonged inhalation of cadmium oxide. Fifteen laboratory animals served as experimental subjects, while 10 similar animals served as controls. The variable of interest was hemoglobin level following the experiment. The results are shown in the table. We wish to know if we can conclude that prolonged inhalation of cadmium oxide reduces hemoglobin level.
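24 Hemoglobin level and rank in the combined sample:
Exposed animals   Rank    Unexposed animals   Rank
14.4              7       17.4                24
14.2              6       16.2                16
13.8              2       17.1                23
16.5              18      17.5                25
14.1              4.5     15.0                8.5
16.6              19      16.0                15
15.9              14      16.9                22
15.6              12      15.0                8.5
14.1              4.5     16.3                17
15.3              10.5    16.8                21
15.7              13
16.7              20
13.7              1
15.3              10.5
14.0              3
Sum of ranks      145                         180
Since n1 and n2 < 20, the critical value is taken from the U table: 45. The calculated U is greater than 45, so reject $H_0$.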
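The ranking and the two U scores can be recomputed directly from the raw hemoglobin values. The sketch below assumes the standard formulas $U_1 = n_1 n_2 + n_1(n_1 + 1)/2 - R_1$ and $U_2 = n_1 n_2 - U_1$, with tied observations given the average of their ranks. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied observations the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (R1, U1, U2), where R1 is the rank sum of sample1 in the combined sample."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return r1, u1, u2

exposed = [14.4, 14.2, 13.8, 16.5, 14.1, 16.6, 15.9, 15.6,
           14.1, 15.3, 15.7, 16.7, 13.7, 15.3, 14.0]
unexposed = [17.4, 16.2, 17.1, 17.5, 15.0, 16.0, 16.9, 15.0, 16.3, 16.8]

r1, u1, u2 = mann_whitney_u(exposed, unexposed)
print(r1, u1, u2)  # 145.0 125.0 25.0
```

The exposed animals receive most of the low ranks, so their rank sum is small and the larger of the two U scores, 125, is the one compared with the table.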
25 The Difference Between Two Population Proportions The most frequent test employed relative to the difference between two population proportions is that their difference is zero. It is possible, however, to test that the difference is equal to some other value.
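26 In a study designed to compare a new treatment for migraine headache with the standard treatment, 78 of 100 subjects who received the standard treatment responded favorably. Of the 100 subjects who received the new treatment, 90 responded favorably. Do these data provide sufficient evidence to indicate that the new treatment is more effective than the standard? With $\hat{p}_1 = 90/100 = 0.90$ (new), $\hat{p}_2 = 78/100 = 0.78$ (standard) and pooled proportion $\bar{p} = 168/200 = 0.84$, $z = (0.90 - 0.78)/\sqrt{0.84 \times 0.16 \times (1/100 + 1/100)} = 2.31$. Since z calculated = 2.31 > $z_{(0.05)} = 1.645$, reject $H_0$. The new treatment is more effective than the standard.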
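The migraine example can be checked numerically. The sketch assumes the usual pooled-proportion z statistic for the null hypothesis of no difference; the function name is illustrative, and 1.645 is the one-sided critical value quoted on the slide.

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """z statistic for H0: p1 = p2 using the pooled sample proportion."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# New treatment: 90 of 100 responded; standard treatment: 78 of 100 responded.
z_prop = two_proportion_z(90, 100, 78, 100)

Z_ONE_SIDED = 1.645  # z(0.05), as given on the slide
print(f"z = {z_prop:.2f}, reject H0: {z_prop > Z_ONE_SIDED}")
```

The test is one-sided because the question is whether the new treatment is better, not merely different.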
27 2x2 Chi-Square Test We can use the chi-square test to compare frequencies or proportions in two or more groups. The classification of a set of entities according to two criteria can be shown by a table in which the r rows represent the various levels of one criterion of classification and the c columns represent the various levels of the second criterion. Such a table is generally called a contingency table. We will be interested in testing the null hypothesis that in the population the two criteria of classification are independent, against the alternative that they are associated.
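28 Layout of the 2x2 table:
                     Second criterion
First criterion      1         2         Total
1                    O11       O12       O1.
2                    O21       O22       O2.
Total                O.1       O.2       N
$\chi^2 = \sum_i \sum_j \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij} = O_{i.} O_{.j}/N$, with df = (r-1)(c-1) = 1 for a 2x2 table. Each $E_{ij}$ should be greater than or equal to 5.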
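As a sketch of how the expected frequencies and the test statistic are obtained, the code below assumes the Pearson chi-square formula without a continuity correction (the transcript does not show which form the slide used). The observed counts are hypothetical, and 5.024 is the table value quoted in the lecture.

```python
def chi_square_2x2(observed):
    """Return (chi-square, expected frequencies) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return chi2, expected

# Hypothetical counts, not taken from the lecture.
observed = [[20, 30],
            [30, 20]]
chi2, expected = chi_square_2x2(observed)

CHI2_TABLE = 5.024  # chi-square(1, 0.025), the table value used in the lecture
smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {chi2:.2f}, smallest E = {smallest_expected}, reject H0: {chi2 > CHI2_TABLE}")
```

Checking the smallest expected frequency first shows whether the chi-square test is appropriate at all, following the rule that every $E_{ij}$ should be at least 5.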
29 Is squint more common among children with a positive family history? Is there an association between squint and family history of squint? [2x2 table of squint by family history; cell counts not preserved in the transcript.] Since $\chi^2_{(1,\,0.025)} = 5.024 >$ the calculated $\chi^2$, accept $H_0$. There is no relation between squint and family history.
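30 If any expected frequencies are less than 5, then an alternative procedure called Fisher's Exact Test should be performed. Let $O_{21} = a$; the test is based on the probability $P(O_{21} \le a)$, computed with the row and column totals held fixed.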
31 $P(O_{21} \le 1) =$ [value not preserved in the transcript]. [2x2 table of squint by family history with marginal totals 6 and 16 and N = 22; cell counts not preserved in the transcript.]