## Presentation on theme: "ABOUT TWO INDEPENDENT POPULATIONS"— Presentation transcript:

HYPOTHESIS TESTING: ABOUT TWO INDEPENDENT POPULATIONS

In this lecture, we are going to study the procedures for making inferences about two populations.
When comparing two populations, we need two samples. Two basic kinds of samples can be used: independent and dependent. The dependence or independence of a sample is determined by the sources used for the data. If the same set of sources is used to obtain the data representing different situations, we have dependent sampling. If two unrelated sets of sources are used, one set from each population, we have independent sampling.

Two Independent Populations
- The Significance Test for the Difference Between Two Population Means
- Mann-Whitney U Test
- The Significance Test for the Difference Between Two Population Proportions
- 2×2 Chi-Square Test

The Significance Test for The Difference Between Two Population Means
Hypothesis testing involving the difference between two population means is most frequently employed to determine whether or not it is reasonable to conclude that the two means are unequal. In such cases, one of the following pairs of hypotheses may be formulated:
- (1) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$
- (2) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 > \mu_2$
- (3) $H_0: \mu_1 = \mu_2$, $H_a: \mu_1 < \mu_2$

The difference between two population means will be discussed in three different contexts:
- When sampling is from normally distributed populations with known population variances
- When sampling is from normally distributed populations with unknown population variances
- When sampling is from populations that are not normally distributed

Sampling from Normally Distributed Populations: Population Variances Known
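When each of two independent simple random samples has been drawn from a normally distributed population with a known variance, the test statistic for testing the null hypothesis of equal population means is

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where $\bar{x}_1, \bar{x}_2$ are the sample means, $\sigma_1^2, \sigma_2^2$ the known population variances, and $n_1, n_2$ the sample sizes.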

Example Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down syndrome. The data consist of serum uric acid readings on 12 individuals with Down syndrome and 15 normal individuals. The sample means are 4.5 mg/100 ml and 3.4 mg/100 ml, respectively. The data constitute two independent simple random samples, each drawn from a normally distributed population with a variance equal to 1.

$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \neq \mu_2$

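$z = \dfrac{4.5 - 3.4}{\sqrt{1/12 + 1/15}} = 2.84$. With $\alpha = 0.05$, the two-sided critical value is $1.96 < 2.84$. Reject H0. The two population means are not equal.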
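The arithmetic of the uric acid example can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `two_sample_z` is hypothetical, and the significance level of 0.05 is an assumption, since the transcript does not state it for this example.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """z statistic for H0: mu1 = mu2 when both population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Serum uric acid example: 12 and 15 subjects, both population variances equal to 1.
z = two_sample_z(4.5, 3.4, 1.0, 1.0, 12, 15)

alpha = 0.05  # assumed significance level (not stated in the example)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided p-value

print(f"z = {z:.2f}, critical value = {z_critical:.2f}, p = {p_value:.4f}")
print("Reject H0" if abs(z) > z_critical else "Accept H0")
```

For a one-sided alternative such as hypothesis pair (2) or (3), the critical value would be taken at `1 - alpha` instead of `1 - alpha / 2`.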

Sampling from Normally Distributed Populations: Population Variances Unknown
When the population variances are unknown, two possibilities exist. The two population variances may be equal or unequal. When comparing two populations, it is quite natural that we compare their variances or standard deviations.

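Testing the equality of two population variances: $H_0: \sigma_1^2 = \sigma_2^2$ is tested with the ratio of the two sample variances, $F = s_1^2 / s_2^2$, with the larger variance in the numerator. If the calculated F exceeds the F table value, the variances are not equal; otherwise the variances are equal.

F Table ($\alpha = 0.05$). Columns are numerator degrees of freedom; rows are denominator degrees of freedom.

| Denominator df | 1 | 2 | 3 | 4 | 5 | ... | 120 | ∞ |
|---|---|---|---|---|---|---|---|---|
| 1 | 161.4 | 199.5 | 215.7 | 224.6 | 230.2 | ... | 253.3 | 254.3 |
| 2 | 18.5 | 19.0 | 19.16 | 19.25 | 19.30 | ... | 19.49 | 19.50 |
| 3 | 10.13 | 9.55 | 9.28 | 9.12 | 9.01 | ... | 8.55 | 8.53 |
| ... | | | | | | | | |
| 120 | 3.92 | 3.07 | 2.68 | 2.45 | 2.29 | ... | 1.35 | 1.25 |
| ∞ | 3.84 | 3.00 | 2.60 | 2.37 | 2.21 | ... | 1.22 | 1.00 |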

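Population Variances Equal: When the population variances are unknown but equal, the two sample variances are pooled and the test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

with $n_1 + n_2 - 2$ degrees of freedom.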

Example A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. The data consist of serum amylase determinations on 22 hospitalized subjects and 15 healthy subjects, with means of 120 and 96 units/ml and standard deviations of 40 and 35 units/ml, respectively. The data constitute two independent random samples, each drawn from a normally distributed population. The population variances are unknown. The researchers wish to know if they would be justified in concluding that the population means are different. The variances are equal.

$H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$

$t_{calculated} = 1.88$. Since $t_{table} > t_{calculated}$, accept H0. The mean serum amylase level of hospitalized subjects is not different from the mean serum amylase level of healthy subjects.
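The value t = 1.88 in the serum amylase example can be reproduced from the summary statistics. The sketch below assumes the usual pooled-variance formula with n1 + n2 - 2 degrees of freedom; the function name `pooled_t` is hypothetical, and the table value 2.0301 (two-sided, α = 0.05, 35 degrees of freedom) is taken from a standard t table, not from the transcript.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Variance ratio, larger sample variance in the numerator.
f_ratio = 40**2 / 35**2

# Hospitalized subjects (n = 22) versus healthy subjects (n = 15).
t_stat, df = pooled_t(120, 40, 22, 96, 35, 15)

t_table = 2.0301  # assumed table value: two-sided, alpha = 0.05, df = 35

print(f"F = {f_ratio:.2f}, t = {t_stat:.2f}, df = {df}")
print("Reject H0" if abs(t_stat) > t_table else "Accept H0")
```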

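Population Variances Unequal: When two independent simple random samples have been drawn from normally distributed populations with unknown and unequal variances, the test statistic for testing $H_0: \mu_1 = \mu_2$ is

$$t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The critical value of $t'$ for an $\alpha$ level of significance and a two-sided test is approximately

$$t'_{1-\alpha/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$$

where $w_1 = s_1^2/n_1$, $w_2 = s_2^2/n_2$, $t_1 = t_{1-\alpha/2}$ with $n_1 - 1$ degrees of freedom, and $t_2 = t_{1-\alpha/2}$ with $n_2 - 1$ degrees of freedom.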

Researchers wish to know if two populations differ with respect to the mean value of total serum complement activity (CH50). The data consist of total serum complement activity determinations on 20 apparently normal subjects and 10 subjects with disease. The sample means and standard deviations are 62.6 and 33.8 for subjects with disease and 47.2 and 10.1 for normal subjects.

$H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$

$w_1 = (33.8)^2/10 = 114.244$ and $w_2 = (10.1)^2/20 = 5.1005$; $t_1 = 2.2622$ (9 degrees of freedom) and $t_2 = 2.0930$ (19 degrees of freedom).

$t' = \dfrac{62.6 - 47.2}{\sqrt{114.244 + 5.1005}} = 1.41$, and the critical value is $\dfrac{w_1 t_1 + w_2 t_2}{w_1 + w_2} = 2.255$.

$-2.255 < 1.41 < 2.255$. Accept H0.
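The two numbers in this example, 1.41 and 2.255, can be reproduced as follows. This is a sketch under stated assumptions: the group with standard deviation 33.8 is taken to have 10 subjects and the group with standard deviation 10.1 to have 20, which is what the weight $w_1 = 33.8^2/10$ and the table value $t_2 = 2.0930$ (19 degrees of freedom) imply. The value 2.2622 for 9 degrees of freedom comes from a standard t table, and the function names are hypothetical.

```python
import math

def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """t' statistic for unequal variances, plus the weights w1 and w2."""
    w1 = sd1**2 / n1
    w2 = sd2**2 / n2
    t_prime = (mean1 - mean2) / math.sqrt(w1 + w2)
    return t_prime, w1, w2

def weighted_critical_value(w1, w2, t1, t2):
    """Approximate critical value: weighted average of the two t table values."""
    return (w1 * t1 + w2 * t2) / (w1 + w2)

t_prime, w1, w2 = unequal_variance_t(62.6, 33.8, 10, 47.2, 10.1, 20)

t1 = 2.2622  # assumed table value: two-sided, alpha = 0.05, 9 df
t2 = 2.0930  # table value quoted in the example: 19 df
critical = weighted_critical_value(w1, w2, t1, t2)

print(f"w1 = {w1:.3f}, w2 = {w2:.4f}")
print(f"t' = {t_prime:.2f}, critical value = {critical:.3f}")
print("Reject H0" if abs(t_prime) > critical else "Accept H0")
```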

MANN-WHITNEY U TEST

The sign test discussed in the preceding lecture does not make full use of all the information present in the two samples when the variable of interest is measured on at least an ordinal scale. Reducing an observation's information content to merely whether it falls above or below the common median wastes information. If, for testing the desired hypothesis, there is available a procedure that makes use of more of the information inherent in the data, that procedure should be used if possible.

A nonparametric procedure that can be used instead of the sign test is the Mann-Whitney U test. It is a nonparametric alternative to the significance test for the difference between two independent population means. Since the test is based on the ranks of the observations, it utilizes more information than the sign test does.

The assumptions underlying the Mann-Whitney U Test are as follows:
- The two samples, of size n and m, respectively, available for analysis have been independently and randomly drawn from their respective populations.
- The measurement scale is at least ordinal.
- If the populations differ at all, they differ only with respect to their medians.

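If the sample sizes are smaller than 20
The calculation of the test statistic U is a two-step procedure. We first rank all observations from both samples together and determine the sum of the ranks for the first sample, $R_1$. Then, using this sum of ranks, we calculate a U score for each sample: $U_1 = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2} - R_1$ and $U_2 = n_1 n_2 - U_1$. The larger U score is the test statistic. The critical U value is obtained from the U table. If the table lists lower-tail critical values, the smaller U score is compared with the tabled value instead.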

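If the sample sizes are greater than 20, we can use the standard normal (z) approximation. This is possible since the distribution of U is approximately normal with mean $n_1 n_2 / 2$ and standard deviation $\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$, so that

$$z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$$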
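As an illustration of the large-sample approximation, the sketch below converts a U score into a z value. The inputs (U = 420, n1 = 25, n2 = 24) are hypothetical numbers chosen only to make the arithmetic easy to follow; they do not come from the lecture. The function name `mann_whitney_z` is also hypothetical, and no correction for ties is applied.

```python
import math
from statistics import NormalDist

def mann_whitney_z(u, n1, n2):
    """Normal approximation to the distribution of U (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# Hypothetical inputs: mean of U is 300 and its standard deviation is 50.
z = mann_whitney_z(u=420, n1=25, n2=24)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```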

Example A researcher designed an experiment to assess the effects of prolonged inhalation of cadmium oxide. Fifteen laboratory animals served as experimental subjects, while 10 similar animals served as controls. The variable of interest was hemoglobin level following the experiment. The results are shown in the table. We wish to know if we can conclude that prolonged inhalation of cadmium oxide reduces hemoglobin level.

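| Exposed animals | Rank | Unexposed animals | Rank |
|---|---|---|---|
| 14.4 | 7 | 17.4 | 24 |
| 14.2 | 6 | 16.2 | 16 |
| 13.8 | 2 | 17.1 | 23 |
| 16.5 | 18 | 17.5 | 25 |
| 14.1 | 4.5 | 15.0 | 8.5 |
| 16.6 | 19 | 16.0 | 15 |
| 15.9 | 14 | 16.9 | 22 |
| 15.6 | 12 | 15.0 | 8.5 |
| 14.1 | 4.5 | 16.3 | 17 |
| 15.3 | 10.5 | 16.8 | 21 |
| 15.7 | 13 | | |
| 16.7 | 20 | | |
| 13.7 | 1 | | |
| 15.3 | 10.5 | | |
| 14.0 | 3 | | |
| Total | 145 | Total | 180 |

Since n1 and n2 < 20, the U table is used. $U_1 = (15)(10) + \dfrac{15(16)}{2} - 145 = 125$ and $U_2 = 150 - 125 = 25$. From the table, the critical value is 45. Since 125 > 45 (equivalently, the smaller score, 25, falls below 45), reject H0.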
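The ranking and the U scores for the cadmium oxide example can be reproduced with the sketch below. One assumption is labeled here and in the code: the repeated readings 14.1 and 15.3 (exposed) and 15.0 (unexposed) are inferred from the tied ranks 4.5, 10.5 and 8.5, from the stated group sizes of 15 and 10, and from the reported U of 125. The helper `average_ranks` is a hypothetical name.

```python
def average_ranks(values):
    """Rank values starting at 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# Hemoglobin levels; repeated readings are inferred from the tied ranks (assumption).
exposed = [14.4, 14.2, 13.8, 16.5, 14.1, 16.6, 15.9, 15.6,
           14.1, 15.3, 15.7, 16.7, 13.7, 15.3, 14.0]
unexposed = [17.4, 16.2, 17.1, 17.5, 15.0, 16.0, 16.9, 15.0, 16.3, 16.8]

n1, n2 = len(exposed), len(unexposed)
ranks = average_ranks(exposed + unexposed)
r1 = sum(ranks[:n1])   # rank sum of the exposed group
r2 = sum(ranks[n1:])   # rank sum of the unexposed group

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2

print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}")
```

The two scores always add up to n1 × n2, which gives a quick check on the ranking.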

The Difference Between Two Population Proportions
The hypothesis most frequently tested about the difference between two population proportions is that their difference is zero. It is possible, however, to test that the difference is equal to some other value.

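In a study designed to compare a new treatment for migraine headache with the standard treatment, 78 of 100 subjects who received the standard treatment responded favorably. Of the 100 subjects who received the new treatment, 90 responded favorably. Do these data provide sufficient evidence to indicate that the new treatment is more effective than the standard?

$H_0: p_{new} = p_{standard}$, $H_a: p_{new} > p_{standard}$

With the pooled proportion $\bar{p} = (90 + 78)/200 = 0.84$,

$$z = \frac{0.90 - 0.78}{\sqrt{0.84 \times 0.16 \times \left(\dfrac{1}{100} + \dfrac{1}{100}\right)}} = 2.31 > z_{(0.05)} = 1.645$$

Reject H0. The new treatment is more effective than the standard.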
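The migraine example can be checked numerically. This sketch assumes the usual pooled-proportion z statistic for testing a zero difference, with a one-sided test at α = 0.05 to match the quoted critical value of 1.645; the function name `two_proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# New treatment: 90 of 100 responded; standard treatment: 78 of 100 responded.
z = two_proportion_z(90, 100, 78, 100)
z_critical = NormalDist().inv_cdf(0.95)   # one-sided test, alpha = 0.05
p_one_sided = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, p = {p_one_sided:.4f}")
print("Reject H0" if z > z_critical else "Accept H0")
```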

2x2 Chi Square Test

We can use the chi-square test to compare frequencies or proportions in two or more groups. The classification of a set of entities according to two criteria can be shown by a table in which the r rows represent the various levels of one criterion of classification and the c columns represent the various levels of the second criterion. Such a table is generally called a contingency table. We will be interested in testing the null hypothesis that, in the population, the two criteria of classification are independent (that is, not associated).

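| First criterion | Second criterion + | Second criterion − | Total |
|---|---|---|---|
| 1 | $O_{11}$ | $O_{12}$ | $O_{1.}$ |
| 2 | $O_{21}$ | $O_{22}$ | $O_{2.}$ |
| Total | $O_{.1}$ | $O_{.2}$ | N |

The expected frequencies are $E_{ij} = O_{i.} O_{.j} / N$, and the test statistic is $\chi^2 = \sum_{i,j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$. Each $E_{ij}$ should be greater than or equal to 5. df = (r-1)(c-1) = 1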

Is squint more common among children with a positive family history? Is there an association between squint and family history of squint?

Observed frequencies, with expected frequencies in parentheses:

| Squint | Family history + | Family history − | Total |
|---|---|---|---|
| + | 20 (14.58) | 30 (35.42) | 50 |
| − | 15 (20.42) | 55 (49.58) | 70 |
| Total | 35 | 85 | 120 |

$\chi^2_{(1,\,0.025)} = 5.024 > \chi^2_{calculated}$. Accept H0. There is no relation between squint and family history.
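The calculated chi-square value did not survive in the transcript, so the sketch below recomputes it from the observed frequencies 20, 30, 15 and 55. It is not known whether the original slide applied a continuity correction, so both the uncorrected statistic and the Yates-corrected one are shown; the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square statistic and expected frequencies for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            diff = abs(table[i][j] - expected[i][j])
            if yates:
                diff = max(diff - 0.5, 0.0)   # continuity correction
            chi2 += diff**2 / expected[i][j]
    return chi2, expected

observed = [[20, 30],
            [15, 55]]

chi2, expected = chi_square_2x2(observed)
chi2_yates, _ = chi_square_2x2(observed, yates=True)
critical = 5.024  # table value quoted in the example: df = 1, 0.025

print("Expected:", [[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, with Yates correction = {chi2_yates:.3f}")
print("Reject H0" if chi2 > critical else "Accept H0")
```

Both versions fall below 5.024, so the decision in the example does not depend on which one was used.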

If any expected frequency is less than 5, an alternative procedure called Fisher's Exact Test should be performed, using the same 2x2 table notation as for the chi-square test.

Let $O_{21} = a$. Then $P(O_{21} \le a)$ should be calculated. If $P(O_{21} \le a) < \alpha$, then H1 is accepted.

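Observed table:

| Squint | Family history + | Family history − | Total |
|---|---|---|---|
| + | 5 | 6 | 11 |
| − | 1 | 10 | 11 |
| Total | 6 | 16 | 22 |

More extreme table with the same marginal totals:

| Squint | Family history + | Family history − | Total |
|---|---|---|---|
| + | 6 | 5 | 11 |
| − | 0 | 11 | 11 |
| Total | 6 | 16 | 22 |

$P(O_{21} \le 1) = P(O_{21} = 1) + P(O_{21} = 0) = 0.068 + 0.006 = 0.074$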
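The probabilities 0.068 and 0.006 follow from the hypergeometric distribution with all marginal totals held fixed (row totals 11 and 11, first column total 6). The sketch below reproduces them with `math.comb`; the function name `prob_o21` is hypothetical.

```python
from math import comb

def prob_o21(a, row1_total, row2_total, col1_total):
    """Hypergeometric probability that cell O21 equals a, with margins fixed."""
    n = row1_total + row2_total
    ways = comb(row2_total, a) * comb(row1_total, col1_total - a)
    return ways / comb(n, col1_total)

# Margins from the example: row totals 11 and 11, first column total 6.
probs = {a: prob_o21(a, 11, 11, 6) for a in (0, 1)}
p_lower_tail = sum(probs.values())   # P(O21 <= 1)

for a, p in probs.items():
    print(f"P(O21 = {a}) = {p:.3f}")
print(f"P(O21 <= 1) = {p_lower_tail:.3f}")
```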