EPIDEMIOLOGY AND BIOSTATISTICS DEPT. 2011 Estimating Population Value with Hypothesis Testing.

1 EPIDEMIOLOGY AND BIOSTATISTICS DEPT. 2011 Estimating Population Value with Hypothesis Testing

2 LULU E. BUDIMAN Introduction Not every member of a population can be examined, so we use data from a sample taken from that population to estimate some measure of the population itself, such as its mean. The sample provides us with the best available estimate of the exact 'truth' about the population. The method of sampling depends on the data available, but the ideal method is random sampling, in which every member of the population has an equal chance of being selected.

3 We estimate limits within which we expect the 'truth' about the population to lie, and state how confident we are about this estimate. There are therefore two types of estimate of a population parameter:
– Point estimate: one particular value
– Interval estimate: an interval centred on the point estimate

4 Estimating population values A point estimate is a single number used to estimate a population parameter. The best point estimate of the population mean is the sample mean. How accurately the sample mean estimates the population mean depends on how well the sample represents the population. An interval estimate is a range of values used to estimate a population parameter.

5 Hypothesis Testing Statistics used to test hypotheses take a common general form [formula shown on the slide; not captured in the transcript]. Hypothesis testing is generally used when some comparison is to be made.

6 Hypothesis testing is the use of statistics to assess how strongly the observed data contradict a given hypothesis. A hypothesis, in statistics, is a claim or statement about a property of a population.

7 The usual process of hypothesis testing consists of four steps.
1. Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation).
2. Identify a test statistic that can be used to assess the truth of the null hypothesis.
3. Compute the P-value, which is the probability that a test statistic at least as extreme as the one observed would be obtained if the null hypothesis were true. The smaller the P-value, the stronger the evidence against the null hypothesis.
4. Compare the P-value to an acceptable significance level (sometimes called an alpha value). If $P \le \alpha$, the observed effect is statistically significant: the null hypothesis is rejected in favour of the alternative hypothesis.

8 Example: Suppose we give a new cancer treatment to a group of patients and find that their survival rate differs from that of patients who do not receive the new treatment. What we are testing is whether the sample of patients who receive the new treatment comes from the population we already know about (cancer patients without the treatment). [Diagram: Treatment A and Treatment B, each split into Survive / Not Survive.] Hypotheses: what are $H_0$ and $H_1$?

9 A hypothesis states that a parameter (mean, proportion, relative risk, correlation coefficient) of the study population, which can be estimated only by observing a sample, is equal to a specified value. If the estimated value of the parameter turns out to be close enough to the hypothesized value, we can accept (more precisely, fail to reject) the hypothesis. If not, we may have to reject the hypothesis.

10 A significance test estimates how likely an observed result (e.g. a difference between two groups) would be if it were due to chance alone. In other words, a significance test is used to judge whether a result observed in a sample can be considered to exist in the population from which the sample was drawn.

11 Example: We are investigating the medical risks associated with a certain occupation. We take a random sample of 20 men aged 30-39, and their mean systolic blood pressure is found to be 141.4 mmHg. Suppose past experience has told us that in the population at large the mean systolic blood pressure for men of this age group is $\mu = 133.2$ mmHg, with standard deviation $\sigma = 15.1$ mmHg. Does the evidence of our sample indicate an increased blood pressure associated with this occupation?

12 Suppose, for the moment, we propose the hypothesis that there is no increase in blood pressure in this occupation, so that the sample of 20 men can be regarded as a random sample from the whole population of men aged 30-39 years. Then we know (from past experience) that the means of samples of 20 will be distributed normally about a mean of $\mu = 133.2$ mmHg, with standard deviation $\sigma/\sqrt{n} = 15.1/\sqrt{20} = 3.38$ (the standard error of the mean).

13 From what we know of the normal distribution, sample means outside the range $133.2 \pm 1.96 \times 3.38$, i.e. outside 126.6 to 139.8, would occur in only 5% of samples of this size, i.e. with probability 0.05. Our sample mean, 141.4 mmHg, lies outside this range. What can we conclude?

14 1. Our hypothesis that there is no increase in systolic blood pressure in this occupation is correct, and our sample mean was large purely through an unfortunate sampling fluke. That is, a result as extreme as our sample mean, which has a probability of less than 0.05, just happened to occur.
2. Our hypothesis that there is no increase in systolic blood pressure in this occupation is wrong.
We cannot be sure which of these alternatives is correct, but because a result this extreme would be so unlikely if (1) were true, we are obliged to conclude (2). Thus we conclude that it is likely that there is an increase in systolic blood pressure among men in this occupation, and the probability P of obtaining a result at least this extreme when there is in fact no increase is less than 0.05. We write this as P < 0.05. This type of argument is called a significance test.
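
The arithmetic of the blood-pressure example can be checked with a short Python sketch. The numbers come from the example itself; the variable names are illustrative, and the two-sided P-value uses the standard library's `NormalDist`, which assumes the normal model stated in the example.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the blood-pressure example.
mu_0 = 133.2    # population mean systolic pressure (mmHg)
sigma = 15.1    # population standard deviation (mmHg)
n = 20          # sample size
x_bar = 141.4   # observed sample mean (mmHg)

# Standard error of the mean: sigma / sqrt(n).
se = sigma / sqrt(n)

# Range that contains 95% of sample means if there is no increase.
lower = mu_0 - 1.96 * se
upper = mu_0 + 1.96 * se

# Standardised distance of the observed mean from mu_0.
z = (x_bar - mu_0) / se

# Two-sided P-value: probability of |Z| at least this large under the hypothesis.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"SE = {se:.2f}")
print(f"95% range under the hypothesis: {lower:.1f} to {upper:.1f}")
print(f"Z = {z:.2f}, P = {p_value:.3f}")
```

Because the observed mean falls outside the printed range, |Z| exceeds 1.96 and the P-value is below 0.05, which matches the conclusion drawn above.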

15 LULU E. BUDIMAN TEST STATISTIC PROVED !!

17 From the formula, the 95% confidence interval for $\mu$ is $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$. Equivalently, if $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is numerically greater than 1.96, we say the difference between $\bar{x}$ and $\mu$ is significant at the 5% level and we write P < 0.05. If Z is numerically greater than 2.58, the difference is significant at the 1% level and we write P < 0.01.

18 From the formula, the 95% confidence interval for $\pi$ is $\pi = p \pm 1.96\sqrt{\pi(1-\pi)/n}$. Equivalently, consider $Z = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$. If $|Z| < 1.96$, the difference is not significant at the 5% level and we write P > 0.05. If $|Z| > 1.96$, the difference is significant at the 5% level and we write P < 0.05. If $|Z| > 2.58$, the difference is highly significant (P < 0.01).
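
As a rough illustration of the confidence interval for a proportion, the sketch below computes an approximate 95% interval. Two things are assumptions rather than part of the slides: the data (45 out of 150) are hypothetical, and the unknown $\pi$ inside the square root is replaced by the sample proportion p, which is the usual large-sample shortcut.

```python
from math import sqrt

def proportion_ci_95(x, n):
    """Approximate 95% confidence interval for a population proportion.

    x is the number of individuals with the characteristic, n the sample size.
    Assumption: the standard error uses the sample proportion p in place of pi.
    """
    p = x / n
    se = sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

# Hypothetical data, for illustration only: 45 of 150 individuals.
low, high = proportion_ci_95(45, 150)
print(f"p = {45 / 150:.2f}, 95% CI: {low:.3f} to {high:.3f}")
```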

19 Estimation Process [Diagram: Population (mean $\mu$ is unknown) → random sample (mean $\bar{X} = 50$) → "I am 95% confident that $\mu$ is between 40 and 60."]

20 Interpretation of a Confidence Interval Of the confidence intervals constructed from different samples, $(1 - \alpha) \times 100\%$ will actually contain the population mean. Equivalently, $1 - \alpha$ is the probability that you obtain a confidence interval that contains the population mean. Often it is more useful to quote two limits between which the parameter is expected to lie, together with the confidence that it lies in that range. The limits are called the confidence limits and the interval between them the confidence interval, e.g. "We are 95% confident that the mean male height lies between 158 cm and 175 cm."
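
The "repeated samples" interpretation can be illustrated by simulation. The sketch below is only a demonstration under stated assumptions: the population values (mean 50, standard deviation 10), the sample size and the number of trials are all hypothetical, and the standard deviation is treated as known so that the 1.96 multiplier applies.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical "true" population mean and SD
N, TRIALS = 25, 2000     # sample size and number of repeated samples

covered = 0
for _ in range(TRIALS):
    # Draw one random sample from a normal population.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # 95% interval with sigma treated as known.
    half_width = 1.96 * SIGMA / sqrt(N)
    if x_bar - half_width <= MU <= x_bar + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The printed share should be close to 95%. Any single interval either contains the true mean or does not; the 95% describes the procedure over many samples.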

21 The width of the confidence interval depends on three sensible factors:
– the degree of confidence we wish to have in it, i.e. the chance of it including the 'truth', e.g. 95%;
– the size of the sample, n;
– the amount of variation among the members of the sample, i.e. its standard deviation, s.
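
The effect of each of these three factors can be seen with a small sketch. The function name `ci_half_width` and the numbers used are hypothetical. Normal quantiles are assumed for simplicity; for small samples with an estimated standard deviation, a t quantile would replace them.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(s, n, confidence=0.95):
    """Half-width of a normal-based confidence interval for a mean."""
    # Two-sided quantile, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / sqrt(n)

# Hypothetical baseline: s = 10, n = 25, 95% confidence.
base = ci_half_width(10, 25)
more_confidence = ci_half_width(10, 25, confidence=0.99)  # wider
larger_sample = ci_half_width(10, 100)                    # narrower
more_variation = ci_half_width(20, 25)                    # wider

print(f"baseline {base:.2f}, 99% {more_confidence:.2f}, "
      f"n=100 {larger_sample:.2f}, s=20 {more_variation:.2f}")
```

Higher confidence and greater variation widen the interval, while a larger sample narrows it. Quadrupling n halves the width.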

22 P-value The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true. Interpreting the results: if the sample findings are unlikely given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level and rejecting the null hypothesis when the P-value is less than the significance level.

39 Conclusions in hypothesis testing Always test the null hypothesis. The two possible conclusions are:
– Reject $H_0$
– Fail to reject $H_0$

46 Hypothesis test of a population mean, $\mu$. The variable X is normally distributed in the population with mean $\mu$ and variance $\sigma^2$. Two situations are considered: (1) $\sigma^2$ known (from previous experience); (2) $\sigma^2$ unknown.
1. $\sigma^2$ known
This procedure applies to a test of any parameter that is estimated by a statistic whose sampling distribution is normal. The procedure is:
a. Specify $H_0: \mu = \mu_0$, where $\mu_0$ is a particular value.
b. Specify $H_1: \mu \ne \mu_0$, say.
c. Select a random sample of observations $x_1, x_2, \ldots, x_n$.
d. Compute from the sample $\bar{x} = \sum x_i / n$.

47 e. Consider the test statistic $Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
f. Determine the critical region from tables of the standard normal distribution (see Table 1). Since the specification of $H_1$ has no direction, the critical region consists of both tails of the distribution. Thus, for a two-tailed test at the $2\alpha$ level of significance, reject $H_0$ if $|Z| > Z(\alpha)$, i.e. if $Z > Z(\alpha)$ or $Z < -Z(\alpha)$, where each tail beyond $\pm Z(\alpha)$ has probability $\alpha$. In particular, if $2\alpha = 0.05$ then $Z(\alpha) = 1.96$; if $2\alpha = 0.01$ then $Z(\alpha) = 2.58$.
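
Instead of reading the critical value from a printed table, it can be obtained from the inverse of the standard normal distribution. The sketch below uses Python's standard library; the function names `z_critical` and `reject_h0` are illustrative, not part of the slides.

```python
from statistics import NormalDist

def z_critical(two_sided_level):
    """Return Z(alpha), where 2 * alpha is the two-tailed significance level."""
    alpha = two_sided_level / 2  # probability in each tail
    return NormalDist().inv_cdf(1 - alpha)

def reject_h0(z, two_sided_level=0.05):
    """Two-tailed decision rule: reject H0 when |Z| > Z(alpha)."""
    return abs(z) > z_critical(two_sided_level)

print(f"2*alpha = 0.05: Z(alpha) = {z_critical(0.05):.2f}")
print(f"2*alpha = 0.01: Z(alpha) = {z_critical(0.01):.2f}")
```

For example, `reject_h0(2.1)` is true at the 5% level, while `reject_h0(2.1, 0.01)` is false because 2.1 does not exceed 2.58.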

48 LULU E. BUDIMAN Standard Normal Distribution Table

49 2. $\sigma^2$ unknown
a. Consider the test statistic $T = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where s is the sample estimator of $\sigma$. T has a t-distribution on $n - 1$ degrees of freedom.
b. Determine the critical region from tables of the t-distribution (Table 2). For a two-tailed test at the $2\alpha$ level of significance, reject $H_0$ if $|T| > t_{n-1}(\alpha)$, i.e. if $T > t_{n-1}(\alpha)$ or $T < -t_{n-1}(\alpha)$.

50 Example: The sleeping times from nine observations are 25, 31, 24, 28, 29, 30, 31, 33 and 35 min. From these we wish to test, at $\alpha = 0.05$, $H_0: \mu = 26$ versus $H_1: \mu \ne 26$. Suppose that the population variance is unknown and must be estimated from the sample. We assume the nine observations are from a normal population.

51 From these data we compute $\bar{x} = 29.56$, $s^2 = 12.53$, $s = 3.539$. From Table 2 (appendix), $t_{0.975}(8) = 2.306$, and we reject $H_0$ if the computed $|T|$ exceeds 2.306. The computed T is $T = (29.56 - 26)/(3.539/\sqrt{9}) = 3.02$, which exceeds 2.306; thus we reject $H_0$ at the 0.05 significance level.
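
The sleeping-time calculation can be reproduced with a few lines of Python. The data and the critical value 2.306 are taken from the example; the variable names are illustrative. The critical value is entered as a constant because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

times = [25, 31, 24, 28, 29, 30, 31, 33, 35]  # sleeping times (min)
mu_0 = 26            # value of the mean under H0
T_CRITICAL = 2.306   # t_{0.975}(8), as quoted from Table 2

n = len(times)
x_bar = mean(times)
s = stdev(times)     # sample standard deviation (divisor n - 1)

# Test statistic T = (x_bar - mu_0) / (s / sqrt(n)).
t_stat = (x_bar - mu_0) / (s / sqrt(n))
reject = abs(t_stat) > T_CRITICAL

print(f"x_bar = {x_bar:.2f}, s^2 = {s**2:.2f}, s = {s:.3f}")
print(f"T = {t_stat:.2f}, reject H0: {reject}")
```

Working from unrounded values gives T of about 3.01 rather than 3.02. The small difference comes from rounding the mean and standard deviation before dividing, and it does not change the conclusion.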

53 Hypothesis test of a population proportion, $\pi$. The procedure is to:
a. Specify $H_0: \pi = \pi_0$, where $\pi_0$ is a particular value.
b. Specify $H_1: \pi \ne \pi_0$, say.
c. Select a random sample of n individuals and determine the number, x, of them with the characteristic.
d. Compute from the sample $p = x / n$.
e. Consider the test statistic $Z = \dfrac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$. This test statistic has a standard normal distribution.
f. Determine the critical region from tables of the standard normal distribution. For a two-tailed test at the $2\alpha$ level of significance, reject $H_0$ if $|Z| > Z(\alpha)$, i.e. if $Z > Z(\alpha)$ or $Z < -Z(\alpha)$.
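
Steps d to f of this procedure can be sketched as follows. The data (60 of 100 individuals, tested against $\pi_0 = 0.5$) are hypothetical and chosen only to make the arithmetic easy to follow; the function name `proportion_z` is likewise illustrative.

```python
from math import sqrt

def proportion_z(x, n, pi_0):
    """Test statistic Z = (p - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)."""
    p = x / n  # sample proportion (step d)
    return (p - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)  # step e

# Hypothetical data, for illustration only.
z = proportion_z(60, 100, 0.5)

# Step f: two-tailed decisions using the tabulated critical values.
reject_5pct = abs(z) > 1.96
reject_1pct = abs(z) > 2.58

print(f"Z = {z:.2f}, reject at 5%: {reject_5pct}, reject at 1%: {reject_1pct}")
```

With these illustrative numbers Z is 2.0, so the hypothesis is rejected at the 5% level but not at the 1% level.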

54 PROBLEMS
1. The mean level of prothrombin in the normal population is known to be 20 mg/100 ml of plasma, and the standard deviation is 4 mg/100 ml. A sample of 40 patients showing vitamin K deficiency has a mean prothrombin level of 18.5 mg/100 ml. How reasonable is it to conclude that the true mean for patients with vitamin K deficiency is the same as that for the normal population?
2. The height of adults living in a suburban area of a large city has a mean of 160 cm, with a standard deviation of 7.5 cm. In a sample of 178 adults living in the inner-city area, the mean height is found to be 156 cm. Assuming the same standard deviation for the two groups, are the mean heights significantly different?

55 3. A program to stop smoking expects to obtain a 75% success rate. The observed number of definitive cessations in a group of 100 adults attending the program is 80. Is this sufficient evidence to conclude that the success rate has increased?
4. From population mortality data, suppose that 4% of males aged 65 die within one year. If it is found that 60 of such males in a group of 1000 die within a year, is this evidence of an increase in mortality in this sample?
