# Hypothesis Testing


To define a statistical test we:

1. Choose a statistic (called the test statistic).
2. Divide the range of possible values for the test statistic into two parts: the Acceptance Region and the Critical Region.

To perform a statistical test we:

1. Collect the data.
2. Compute the value of the test statistic.
3. Make the decision: if the value of the test statistic is in the Acceptance Region, we decide to accept $H_0$; if it is in the Critical Region, we decide to reject $H_0$.

## The z-test for Proportions

Testing the probability of success in a binomial experiment.

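Situation: a success-failure experiment has been repeated $n$ times, and the probability of success $p$ is unknown. We want to test $H_0: p = p_0$ (some specified value of $p$) against an alternative $H_A$: either $p \ne p_0$ (two-sided), or one of the one-sided alternatives $p > p_0$ or $p < p_0$.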

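The Test Statistic, where $x$ is the number of successes in the $n$ trials and $\hat p = x/n$:

$$z = \frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

The Acceptance and Critical Region for $H_A: p \ne p_0$ (two-tailed critical region):

- Accept $H_0$ if $-z_{\alpha/2} \le z \le z_{\alpha/2}$.
- Reject $H_0$ if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$.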

The Acceptance and Critical Region: one-tailed critical regions are used when the alternative hypothesis $H_A$ is one-sided.

- $H_A: p > p_0$: accept $H_0$ if $z \le z_\alpha$; reject $H_0$ if $z > z_\alpha$.
- $H_A: p < p_0$: accept $H_0$ if $z \ge -z_\alpha$; reject $H_0$ if $z < -z_\alpha$.
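The decision rules above can be sketched in Python as follows. This is a minimal illustration, assuming the statistic $z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}$ and taking the critical values $z_\alpha$ and $z_{\alpha/2}$ from the standard library's `statistics.NormalDist` instead of a printed table. The function name `proportion_z_test` and its `alternative` argument are hypothetical, not part of the lecture.

```python
from statistics import NormalDist


def proportion_z_test(x, n, p0, alpha=0.05, alternative="two-sided"):
    """z-test of H0: p = p0 from x successes in n trials.

    alternative: "two-sided" (H_A: p != p0), "greater" (H_A: p > p0)
    or "less" (H_A: p < p0). Returns (z, reject_h0).
    """
    p_hat = x / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        return z, (z < -crit or z > crit)
    crit = std_normal.inv_cdf(1 - alpha)  # z_alpha for one-tailed tests
    if alternative == "greater":
        return z, z > crit
    if alternative == "less":
        return z, z < -crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical counts: 60 heads in 100 tosses, H0: p = 0.5
print(proportion_z_test(60, 100, 0.5))
```

With these hypothetical counts $z = 2.0$, which falls in the two-tailed critical region for $\alpha = 0.05$ ($z_{0.025} \approx 1.960$).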

Comments: whether you use a one-tailed or a two-tailed test is determined by the choice of the alternative hypothesis $H_A$. The alternative hypothesis $H_A$ is usually the research hypothesis: the hypothesis that the researcher is trying to “prove”.

Examples

1. A person wants to determine if a coin should be accepted as being fair. Let $p$ be the probability that a head is tossed. One is trying to determine if there is a difference (positive or negative) from the fair value $p = 0.5$, i.e. $H_A: p \ne 0.5$ (two-tailed).

2. A researcher is interested in determining if a new procedure is an improvement over the old procedure. The probability of success for the old procedure is $p_0$ (known). The probability of success for the new procedure is $p$ (unknown). One is trying to determine if the new procedure is better, i.e. $H_A: p > p_0$ (one-tailed).

3. A researcher is interested in determining if a new procedure is no longer worth considering. The probability of success for the old procedure is $p_0$ (known). The probability of success for the new procedure is $p$ (unknown). One is trying to determine if the new procedure is definitely worse than the one presently being used, i.e. $H_A: p < p_0$ (one-tailed).

## The z-test for the Mean of a Normal Population

Let $\mu$ denote the mean of a normal population. We want to test hypotheses about $\mu$.

The Situation: let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. We want to test if the mean $\mu$ is equal to some given value $\mu_0$. Intuitively, if the sample mean is close to $\mu_0$ the null hypothesis should be accepted; otherwise the null hypothesis should be rejected.

The Test Statistic:

$$z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}$$

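The Acceptance and Critical Region depend on $H_0$ and $H_A$:

- Two-tailed critical region, $H_A: \mu \ne \mu_0$: accept $H_0$ if $-z_{\alpha/2} \le z \le z_{\alpha/2}$; reject $H_0$ if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$.
- One-tailed critical region, $H_A: \mu > \mu_0$: accept $H_0$ if $z \le z_\alpha$; reject $H_0$ if $z > z_\alpha$.
- One-tailed critical region, $H_A: \mu < \mu_0$: accept $H_0$ if $z \ge -z_\alpha$; reject $H_0$ if $z < -z_\alpha$.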
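A minimal Python sketch of the z-test for a mean, working from summary statistics. It assumes $\sigma$ is supplied (for large $n$ the sample standard deviation $s$ is often used in its place) and gets $z_\alpha$, $z_{\alpha/2}$ from `statistics.NormalDist`; the name `mean_z_test` is hypothetical.

```python
from statistics import NormalDist


def mean_z_test(xbar, sigma, n, mu0, alpha=0.05, alternative="two-sided"):
    """z-test of H0: mu = mu0 from summary statistics.

    Returns (z, reject_h0); alternative is "two-sided", "greater" or "less".
    """
    z = (xbar - mu0) / (sigma / n ** 0.5)
    std_normal = NormalDist()
    if alternative == "two-sided":        # H_A: mu != mu0
        crit = std_normal.inv_cdf(1 - alpha / 2)
        return z, (z < -crit or z > crit)
    crit = std_normal.inv_cdf(1 - alpha)  # z_alpha for one-tailed tests
    if alternative == "greater":          # H_A: mu > mu0
        return z, z > crit
    if alternative == "less":             # H_A: mu < mu0
        return z, z < -crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical summary statistics, not from the lecture
print(mean_z_test(xbar=103, sigma=10, n=25, mu0=100))
```

Here $z = 3/(10/\sqrt{25}) = 1.5$, which lies inside the two-tailed acceptance region for $\alpha = 0.05$.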

Example: a manufacturer of glucosamine capsules claims that each capsule contains, on average, 500 mg of glucosamine. To test this claim, $n = 40$ capsules were selected and the amount of glucosamine ($X$) was measured in each capsule. Summary statistics:

We want to test $H_0: \mu = 500$ (the manufacturer's claim is correct) against $H_A: \mu \ne 500$ (the manufacturer's claim is not correct).

The Test Statistic:

$$z = \frac{\bar x - 500}{s/\sqrt{40}}$$

The Critical Region and Acceptance Region: using $\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.960$. We accept $H_0$ if $-1.960 \le z \le 1.960$ and reject $H_0$ if $z < -1.960$ or $z > 1.960$.

The Decision: since $z = -2.75 < -1.960$, we reject $H_0$. Conclusion: the manufacturer's claim is incorrect.
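The critical value and the decision can be checked with a few lines of Python. This sketch only reuses the value $z = -2.75$ reported for the example (the summary statistics themselves are not reproduced here) and computes $z_{0.025}$ with `statistics.NormalDist` rather than a table.

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} = z_0.025
z = -2.75  # value of the test statistic from the glucosamine example
reject_h0 = z < -z_crit or z > z_crit  # two-tailed critical region
print(f"z_0.025 = {z_crit:.3f}; reject H0: {reject_h0}")
```

This prints `z_0.025 = 1.960; reject H0: True`, matching the decision above.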

## Student's t-test

Recall the z-test for means. The test statistic:

$$z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}}$$

Comments: the sampling distribution of this statistic is the standard normal distribution. Replacing $\sigma$ by the sample standard deviation $s$ leaves this distribution (approximately) unchanged only if the sample size $n$ is large.

For small sample sizes, the sampling distribution of

$$t = \frac{\bar x - \mu_0}{s/\sqrt{n}}$$

(when $\mu = \mu_0$) is called Student's t distribution with $n - 1$ degrees of freedom.

Properties of Student's t distribution:

- Similar to the standard normal distribution: symmetric, unimodal, and centred at zero.
- Larger spread about zero. The reason for this is the increased variability introduced by replacing $\sigma$ by $s$.
- As the sample size (and hence the degrees of freedom) increases, the t distribution approaches the standard normal distribution.

[Figure: the t distribution compared with the standard normal distribution]

The Situation: let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from a normal population with mean $\mu$ and standard deviation $\sigma$, where both $\mu$ and $\sigma$ are unknown. We want to test if the mean $\mu$ is equal to some given value $\mu_0$.

The Test Statistic:

$$t = \frac{\bar x - \mu_0}{s/\sqrt{n}}$$

When $H_0$ is true, the sampling distribution of the test statistic is the t distribution with $n - 1$ degrees of freedom.

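| Alternative Hypothesis $H_A$ | Critical Region |
|---|---|
| $\mu \ne \mu_0$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| $\mu > \mu_0$ | $t > t_\alpha$ |
| $\mu < \mu_0$ | $t < -t_\alpha$ |

$t_\alpha$ and $t_{\alpha/2}$ are critical values under the t distribution with $n - 1$ degrees of freedom.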
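A minimal Python sketch of the one-sample t-test. The Python standard library has no t quantile function, so this sketch assumes the critical value ($t_{\alpha/2}$ or $t_\alpha$ with $n - 1$ degrees of freedom) is looked up in a t table and passed in. The helper names `one_sample_t` and `t_in_critical_region` and the data values are hypothetical.

```python
import statistics


def one_sample_t(data, mu0):
    """Return (t, df) for testing H0: mu = mu0 when sigma is unknown."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    t = (xbar - mu0) / (s / n ** 0.5)
    return t, n - 1


def t_in_critical_region(t, t_crit, alternative="two-sided"):
    """t_crit is t_{alpha/2} for "two-sided", t_alpha otherwise (from a t table)."""
    if alternative == "two-sided":
        return t < -t_crit or t > t_crit
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical data, testing H0: mu = 0 vs H_A: mu > 0 with alpha = 0.05
t, df = one_sample_t([1, 2, 3, 4, 5], mu0=0)
print(f"t = {t:.3f}, df = {df}, reject H0: {t_in_critical_region(t, 2.132, 'greater')}")
```

The value 2.132 is the tabled $t_{0.05}$ for 4 degrees of freedom.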

[Figure: critical value $t_\alpha$ (or $t_{\alpha/2}$) under the t distribution]

Critical values for the t distribution are provided in tables. A link to these tables is given with today's lecture.

To use the table, look up the row for the degrees of freedom (df) and the column for $\alpha$.

Note: the values tabled for df = ∞ are the same as the values for the standard normal distribution.
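As an illustration of where the tabled values come from, the following Python sketch computes t critical values numerically. It is an assumption-laden stand-in for a table, not the lecture's method: it integrates the t density with Simpson's rule and inverts the upper tail by bisection. The helper names `t_pdf`, `t_upper_tail` and `t_critical` are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(c, df, steps=1000):
    """P(T > c) for c >= 0: Simpson's rule on [0, c], then symmetry about 0."""
    h = c / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """t_alpha with P(T > t_alpha) = alpha (0 < alpha < 0.5), by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:  # widen the bracket until it holds t_alpha
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 7, 30, 1000):
    print(f"df = {df:4d}: t_0.025 = {t_critical(0.025, df):.3f}")
print(f"normal:     z_0.025 = {NormalDist().inv_cdf(0.975):.3f}")
```

The computed values should agree with a t table to about three decimals (for example $t_{0.025} = 2.571$ for df = 5) and move toward $z_{0.025} = 1.960$ as the degrees of freedom grow.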

Example: let $x_1, x_2, x_3, x_4, x_5, x_6$ denote the weight loss from a new diet for $n = 6$ cases. Assume that $x_1, \ldots, x_6$ is a sample from a normal population with mean $\mu$ and standard deviation $\sigma$, where both $\mu$ and $\sigma$ are unknown. We want to test $H_0: \mu = 0$ (the new diet is not effective) versus $H_A: \mu > 0$ (the new diet is effective).

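The Test Statistic:

$$t = \frac{\bar x - 0}{s/\sqrt{6}} = \frac{\bar x}{s/\sqrt{6}}$$

The Critical Region: reject $H_0$ if $t > t_\alpha$, with $n - 1 = 5$ degrees of freedom.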

The Data The summary statistics:

The Critical Region (using $\alpha = 0.05$): reject $H_0$ if $t > t_{0.05} = 2.015$ (df = 5). The computed test statistic does not fall in this region. Conclusion: accept $H_0$.

## Confidence Intervals

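Confidence interval for the mean $\mu$ of a normal population, using the standard normal distribution ($\sigma$ known):

$$\bar x \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Confidence interval for the mean $\mu$ of a normal population, using the t distribution ($\sigma$ unknown, $n - 1$ degrees of freedom):

$$\bar x \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$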
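Both intervals can be sketched in Python. As with the t-test, this assumes the t critical value $t_{\alpha/2}$ is read from a table (the standard library has no t quantile function), while $z_{\alpha/2}$ comes from `statistics.NormalDist`. The names `z_interval` and `t_interval` and the example numbers are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def z_interval(xbar, sigma, n, alpha=0.05):
    """100(1 - alpha)% interval: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


def t_interval(data, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); t_crit = t_{alpha/2}, n - 1 df, from a table."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    half_width = t_crit * s / n ** 0.5
    return xbar - half_width, xbar + half_width


# Hypothetical inputs, not the lecture's data
print(z_interval(100, 10, 25))
print(t_interval([1, 2, 3, 4, 5], 2.776))  # t_0.025 with 4 df
```

The t interval is wider than a z interval with the same $s$ because $t_{\alpha/2} > z_{\alpha/2}$ for finite degrees of freedom.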


Example: let $x_1, x_2, x_3, x_4, x_5, x_6$ denote the weight loss from a new diet for $n = 6$ cases. The data and the summary statistics:

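Confidence intervals (use $\alpha = 0.05$): using the t distribution with $n - 1 = 5$ degrees of freedom, $t_{0.025} = 2.571$, so the 95% interval is $\bar x \pm 2.571\, s/\sqrt{6}$.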

## Comparing Populations: Proportions and Means

Sums, Differences, Combinations of R.V.'s

A linear combination of random variables $X, Y, \ldots$ is a combination of the form

$$L = aX + bY + \cdots$$

where $a, b$, etc. are numbers, positive or negative. Most common:

- Sum: $X + Y$
- Difference: $X - Y$
- Simple linear combination of $X$: $bX + a$

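Means of Linear Combinations

If $L = aX + bY + \cdots$, the mean of $L$ is

$$\operatorname{Mean}(L) = a\operatorname{Mean}(X) + b\operatorname{Mean}(Y) + \cdots$$

Most common:

- $\operatorname{Mean}(X + Y) = \operatorname{Mean}(X) + \operatorname{Mean}(Y)$
- $\operatorname{Mean}(X - Y) = \operatorname{Mean}(X) - \operatorname{Mean}(Y)$
- $\operatorname{Mean}(bX + a) = b\operatorname{Mean}(X) + a$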

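Variances of Linear Combinations

If $X, Y, \ldots$ are independent random variables and $L = aX + bY + \cdots$, then

$$\operatorname{Variance}(L) = a^2\operatorname{Variance}(X) + b^2\operatorname{Variance}(Y) + \cdots$$

Most common:

- $\operatorname{Variance}(X + Y) = \operatorname{Variance}(X) + \operatorname{Variance}(Y)$
- $\operatorname{Variance}(X - Y) = \operatorname{Variance}(X) + \operatorname{Variance}(Y)$
- $\operatorname{Variance}(bX + a) = b^2\operatorname{Variance}(X)$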

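Combining Independent Normal Random Variables

If $X, Y, \ldots$ are independent normal random variables, then $L = aX + bY + \cdots$ is normally distributed. In particular:

- $X + Y$ is normal with mean $\mu_X + \mu_Y$ and variance $\sigma_X^2 + \sigma_Y^2$.
- $X - Y$ is normal with mean $\mu_X - \mu_Y$ and variance $\sigma_X^2 + \sigma_Y^2$.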
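The mean and variance rules can be checked numerically with a short Python sketch. `linear_combination_moments` (a hypothetical helper) applies the formulas for independent variables, and a seeded simulation with illustrative distributions $X \sim N(10, 2^2)$ and $Y \sim N(4, 3^2)$, chosen here and not taken from the lecture, compares them with the sample mean and variance of simulated $X - Y$.

```python
import random
import statistics


def linear_combination_moments(coeffs, means, variances, const=0.0):
    """Mean and variance of L = a*X + b*Y + ... + const for independent X, Y, ..."""
    mean_l = sum(a * m for a, m in zip(coeffs, means)) + const
    var_l = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean_l, var_l


# Illustrative: X ~ N(10, 2^2), Y ~ N(4, 3^2), independent
print(linear_combination_moments([1, -1], [10, 4], [4, 9]))  # X - Y: mean 6, variance 13
print(linear_combination_moments([3], [10], [4], const=1))   # 3X + 1: mean 31, variance 36

rng = random.Random(0)  # fixed seed so the simulation is reproducible
diffs = [rng.gauss(10, 2) - rng.gauss(4, 3) for _ in range(50_000)]
print(statistics.mean(diffs), statistics.variance(diffs))  # close to 6 and 13
```

Note that the variance of $X - Y$ is $4 + 9 = 13$, not $4 - 9$: the coefficient $-1$ enters squared.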

Comparing proportions

Situation: we have two populations (1 and 2). Let $p_1$ denote the probability (proportion) of “success” in population 1, and let $p_2$ denote the probability (proportion) of “success” in population 2. The objective is to compare the two population proportions.

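We want to test $H_0: p_1 = p_2$ against either the two-sided alternative $H_A: p_1 \ne p_2$ or a one-sided alternative ($H_A: p_1 > p_2$ or $H_A: p_1 < p_2$).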

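The test statistic:

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat p_1 = \frac{x_1}{n_1},\quad \hat p_2 = \frac{x_2}{n_2},\quad \hat p = \frac{x_1 + x_2}{n_1 + n_2}$$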

Where: a sample of $n_1$ is selected from population 1, resulting in $x_1$ successes, and a sample of $n_2$ is selected from population 2, resulting in $x_2$ successes.

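Logic: if $H_0$ is true ($p_1 = p_2 = p$), then $\hat p_1 - \hat p_2$ has mean 0 and variance $p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. Estimating the common $p$ by the pooled proportion $\hat p$ gives the test statistic, which has approximately a standard normal distribution when the samples are large.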

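| Alternative Hypothesis $H_A$ | Critical Region |
|---|---|
| $p_1 \ne p_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| $p_1 > p_2$ | $z > z_\alpha$ |
| $p_1 < p_2$ | $z < -z_\alpha$ |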

Example: in a national study to determine if there was an increase in mortality due to pipe smoking, a random sample of $n_1 = 1067$ male nonsmoking pensioners was observed for a five-year period. In addition, a sample of $n_2 = 402$ male pensioners who had smoked a pipe for more than six years was observed for the same five-year period. At the end of the five-year period, $x_1 = 117$ of the nonsmoking pensioners had died, while $x_2 = 54$ of the pipe-smoking pensioners had died. Is the mortality rate for pipe smokers higher than that for non-smokers?

We want to test $H_0: p_1 = p_2$ against $H_A: p_1 < p_2$ (the mortality rate for pipe smokers is higher).

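The test statistic:

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Note: $\hat p_1 = 117/1067 = 0.1097$, $\hat p_2 = 54/402 = 0.1343$, and the pooled estimate is $\hat p = (117 + 54)/(1067 + 402) = 171/1469 = 0.1164$.

The test statistic:

$$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164 \times 0.8836 \times \left(\frac{1}{1067} + \frac{1}{402}\right)}} \approx -1.31$$

We reject $H_0$ if $z < -z_{0.05} = -1.645$. This is not true, hence we accept $H_0$. Conclusion: there is not a significant ($\alpha = 0.05$) increase in the mortality rate due to pipe smoking.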
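The pipe-smoking calculation can be reproduced with a short Python sketch. It assumes the pooled two-proportion statistic $z = (\hat p_1 - \hat p_2)/\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}$ with population 1 = nonsmokers and population 2 = pipe smokers; `two_proportion_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z for H0: p1 = p2, using the pooled estimate of the common proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = (p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2)) ** 0.5
    return (p1_hat - p2_hat) / se


# Pipe-smoking example: population 1 = nonsmokers, population 2 = pipe smokers
z = two_proportion_z(117, 1067, 54, 402)
z_alpha = NormalDist().inv_cdf(0.95)  # z_0.05 for a one-tailed test
reject_h0 = z < -z_alpha              # H_A: p1 < p2
print(f"z = {z:.2f}, critical value = {-z_alpha:.3f}, reject H0: {reject_h0}")
```

This prints `z = -1.31, critical value = -1.645, reject H0: False`, in line with the conclusion above.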

Estimating a difference in proportions using confidence intervals

Situation: we have two populations (1 and 2). Let $p_1$ denote the probability (proportion) of “success” in population 1, and let $p_2$ denote the probability (proportion) of “success” in population 2. The objective is to estimate the difference in the two population proportions, $\delta = p_1 - p_2$.

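A $100P\% = 100(1 - \alpha)\%$ confidence interval for $\delta = p_1 - p_2$:

$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$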

Example: estimating the increase in the mortality rate for pipe smokers over that for non-smokers, $\delta = p_2 - p_1$.
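A Python sketch of this interval for the pipe-smoking data, assuming the unpooled standard error $\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}$ and $\delta = p_2 - p_1$; the helper name `diff_proportion_ci` is hypothetical.

```python
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """100(1 - alpha)% confidence interval for delta = p2 - p1 (unpooled standard error)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = (p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    diff = p2_hat - p1_hat
    return diff - z * se, diff + z * se


lo, hi = diff_proportion_ci(117, 1067, 54, 402)
print(f"95% CI for p2 - p1: ({lo:.4f}, {hi:.4f})")
```

The interval is roughly $(-0.014, 0.063)$; because it contains 0, it agrees with the test above in not showing a significant increase.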

Comparing Means

Situation: we have two normal populations (1 and 2). Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1, and let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2. Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from population 1, and let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from population 2. The objective is to compare the two population means.

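We want to test $H_0: \mu_1 = \mu_2$ against either the two-sided alternative $H_A: \mu_1 \ne \mu_2$ or a one-sided alternative ($H_A: \mu_1 > \mu_2$ or $H_A: \mu_1 < \mu_2$).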

Consider the test statistic:

$$z = \frac{\bar x - \bar y}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}}$$

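If $H_0$ is true ($\mu_1 = \mu_2$), $z$ will have a standard normal distribution. This will also be true for the approximation obtained by replacing $\sigma_1$ by $s_x$ and $\sigma_2$ by $s_y$,

$$z \approx \frac{\bar x - \bar y}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}},$$

if the sample sizes $n$ and $m$ are large (greater than 30).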

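Note: since the two samples are independent, $\bar x - \bar y$ has mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}$.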

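| Alternative Hypothesis $H_A$ | Critical Region |
|---|---|
| $\mu_1 \ne \mu_2$ | $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ |
| $\mu_1 > \mu_2$ | $z > z_\alpha$ |
| $\mu_1 < \mu_2$ | $z < -z_\alpha$ |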
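A minimal Python sketch of the large-sample comparison of two means from summary statistics, assuming $s_x$ and $s_y$ stand in for $\sigma_1$ and $\sigma_2$ (so $n$ and $m$ should be large). The helper names and the example numbers are hypothetical.

```python
from statistics import NormalDist


def two_sample_z(xbar, sx, n, ybar, sy, m):
    """Large-sample z for H0: mu1 = mu2, with s_x, s_y in place of sigma_1, sigma_2."""
    return (xbar - ybar) / (sx ** 2 / n + sy ** 2 / m) ** 0.5


def z_in_critical_region(z, alpha=0.05, alternative="two-sided"):
    """Critical regions for H_A: mu1 != mu2 ("two-sided"), mu1 > mu2 or mu1 < mu2."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        return z < -crit or z > crit
    crit = std_normal.inv_cdf(1 - alpha)
    if alternative == "greater":
        return z > crit
    if alternative == "less":
        return z < -crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical summary statistics
z = two_sample_z(xbar=10, sx=4, n=100, ybar=9, sy=3, m=100)
print(z, z_in_critical_region(z, alternative="greater"))
```

With these numbers the standard error is $\sqrt{0.16 + 0.09} = 0.5$, so $z = 2.0$, which exceeds $z_{0.05} \approx 1.645$.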

Example: a study was interested in determining if an exercise program had some effect on reducing blood pressure in subjects with abnormally high blood pressure. For this purpose, a sample of $n = 500$ patients with abnormally high blood pressure was required to adhere to the exercise regime. A second sample of $m = 400$ patients with abnormally high blood pressure was not required to adhere to the exercise regime. After a period of one year, the reduction in blood pressure was measured for each patient in the study.

We want to test $H_0: \mu_1 = \mu_2$ (the exercise group did not have a higher average reduction in blood pressure) vs $H_A: \mu_1 > \mu_2$ (the exercise group did have a higher average reduction in blood pressure).

The test statistic:

$$z = \frac{\bar x - \bar y}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$

Suppose the data has been collected and:

The test statistic:

We reject $H_0$ if $z > z_{0.05} = 1.645$. This is true, hence we reject $H_0$. Conclusion: there is a significant ($\alpha = 0.05$) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals

Situation: we have two populations (1 and 2). Let $\mu_1$ denote the mean of population 1, and let $\mu_2$ denote the mean of population 2. The objective is to estimate the difference in the two population means, $\delta = \mu_1 - \mu_2$.

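Confidence interval for $\delta = \mu_1 - \mu_2$ (large samples):

$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$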
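A Python sketch of this large-sample interval, assuming summary statistics are available; the name `diff_means_ci` and the numbers in the usage line are hypothetical.

```python
from statistics import NormalDist


def diff_means_ci(xbar, sx, n, ybar, sy, m, alpha=0.05):
    """Large-sample 100(1 - alpha)% confidence interval for delta = mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    se = (sx ** 2 / n + sy ** 2 / m) ** 0.5
    diff = xbar - ybar
    return diff - z * se, diff + z * se


# Hypothetical summary statistics
print(diff_means_ci(xbar=10, sx=4, n=100, ybar=9, sy=3, m=100))
```

Here the interval is $1 \pm 1.96 \times 0.5$, roughly $(0.02, 1.98)$.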

Example: estimating the increase in the average reduction in blood pressure due to the exercise regime, $\delta = \mu_1 - \mu_2$.

Comparing Means: small samples

Situation: we have two normal populations (1 and 2). Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1, and let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2. Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from population 1, and let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from population 2. The objective is to compare the two population means.

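We want to test $H_0: \mu_1 = \mu_2$ against either the two-sided alternative $H_A: \mu_1 \ne \mu_2$ or a one-sided alternative ($H_A: \mu_1 > \mu_2$ or $H_A: \mu_1 < \mu_2$).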

Consider the test statistic:

$$z = \frac{\bar x - \bar y}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$

If the sample sizes ($m$ and $n$) are large, this statistic will have approximately a standard normal distribution. This will not be the case if the sample sizes are small.

The t test for comparing means: small samples

Situation: we have two normal populations (1 and 2). Let $\mu_1$ and $\sigma$ denote the mean and standard deviation of population 1, and let $\mu_2$ and $\sigma$ denote the mean and standard deviation of population 2. Note: we assume that the standard deviation for each population is the same, $\sigma_1 = \sigma_2 = \sigma$.

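Let $s_x$ and $s_y$ denote the sample standard deviations of the $x$'s and the $y$'s. Both $s_x$ and $s_y$ are estimators of $\sigma$. They can be combined to form a single estimator of $\sigma$, the pooled estimate:

$$s_{\text{Pooled}} = \sqrt{\frac{(n - 1)s_x^2 + (m - 1)s_y^2}{n + m - 2}}$$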

The test statistic:

$$t = \frac{\bar x - \bar y}{s_{\text{Pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}}$$

If $\mu_1 = \mu_2$, this statistic has a t distribution with $n + m - 2$ degrees of freedom.

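| Alternative Hypothesis $H_A$ | Critical Region |
|---|---|
| $\mu_1 \ne \mu_2$ | $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ |
| $\mu_1 > \mu_2$ | $t > t_\alpha$ |
| $\mu_1 < \mu_2$ | $t < -t_\alpha$ |

$t_\alpha$ and $t_{\alpha/2}$ are critical points under the t distribution with $n + m - 2$ degrees of freedom.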
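A minimal Python sketch of the pooled two-sample t-test, assuming equal population standard deviations as stated above and a critical value read from a t table. The helper name `pooled_t` and the sample values are hypothetical; they only mimic the sizes $n = 3$, $m = 6$ of the tumour example.

```python
import statistics


def pooled_t(x, y):
    """Pooled two-sample t statistic and its degrees of freedom n + m - 2."""
    n, m = len(x), len(y)
    sx2, sy2 = statistics.variance(x), statistics.variance(y)  # divisors n - 1 and m - 1
    s_pooled = (((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)) ** 0.5
    se = s_pooled * (1 / n + 1 / m) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se, n + m - 2


# Hypothetical samples with n = 3 and m = 6 (not the lecture's tumour data)
t, df = pooled_t([1, 2, 3], [4, 5, 6, 7, 8, 9])
t_crit = 1.895  # t_0.05 with 7 df, from a t table
print(f"t = {t:.3f}, df = {df}, reject H0 (H_A: mu1 < mu2): {t < -t_crit}")
```

Pooling uses both samples to estimate the single $\sigma$, which is why the degrees of freedom are $n + m - 2$ rather than those of either sample alone.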

Example: a study was interested in determining if administration of a drug reduces cancerous tumour size. For this purpose, $n + m = 9$ test animals are implanted with a cancerous tumour. $n = 3$ are selected at random and administered the drug; the remaining $m = 6$ are left untreated. Final tumour sizes are measured at the end of the test period.

We want to test $H_0: \mu_1 = \mu_2$ (the treated group did not have a lower average final tumour size) vs $H_A: \mu_1 < \mu_2$ (the treated group did have a lower average final tumour size).

The test statistic:

$$t = \frac{\bar x - \bar y}{s_{\text{Pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}}$$

Suppose the data has been collected and:

The test statistic:

We reject $H_0$ if $t < -t_{0.05} = -1.895$, with d.f. $= n + m - 2 = 7$. This is not true, hence we accept $H_0$. Conclusion: the drug treatment does not result in a significantly ($\alpha = 0.05$) smaller final tumour size.
