Virtual University of Pakistan

Lecture No. 39 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah

IN THE LAST LECTURE, YOU LEARNT
Hypothesis Testing regarding μ1 - μ2 (based on Z-statistic)
Hypothesis Testing regarding p (based on Z-statistic)

TOPICS FOR TODAY
Hypothesis Testing Regarding p1-p2 (based on Z-statistic)
The Student’s t-distribution
Confidence Interval for μ based on the t-distribution

In the last lecture, we discussed hypothesis-testing regarding p, the proportion of successes in a binomial population.

Next, we consider the case when we are interested in testing the equality of two population proportions. We illustrate this situation with the help of the following example:

EXAMPLE A leading perfume company in a western country recently developed a new perfume which they plan to market under the name 'Fragrance'.

A number of comparison tests indicate that 'Fragrance' has very good market potential.
The Sales Departments of the company want to plan their strategy so as to reach and impress the largest possible segments of the buying public.

One of the questions is whether the perfume is preferred by younger or older women.
These are two independent populations, a population consisting of the younger women and a population consisting of the older women.

A standard scent test will be used where each sampled woman is asked to sniff several perfumes, one of which is 'Fragrance', and indicate the one that she likes best.

A total of 100 young women were selected at random, and each was given the standard scent test.
Twenty of the 100 young women chose 'Fragrance' as the perfume they liked best.

Two hundred older women were selected at random, and each was given the same standard scent test.
Of the 200 older women, 100 preferred 'Fragrance'.

Test the hypothesis that there is no difference between the proportions of younger and older women who prefer ‘Fragrance’.

SOLUTION We designate p1 as the proportion of younger women who prefer 'Fragrance' and p2 as the proportion of older women who prefer 'Fragrance'.

Hypothesis-Testing Procedure:
Step-1:
H0 : p1 = p2 (i.e. p1 - p2 = 0) (There is no difference between the proportions of young women and older women who prefer 'Fragrance'.)
H1 : p1 ≠ p2 (i.e. p1 - p2 ≠ 0) (The two proportions are not equal.)

Step-2: Level of Significance α = 0.05.

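Step-3: Test Statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c (1 - \hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $\hat{p}_c$, the combined or pooled proportion, is given by:

$$\hat{p}_c = \frac{X_1 + X_2}{n_1 + n_2}$$

This can also be written as

$$\hat{p}_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

which means that $\hat{p}_c$ is the weighted mean of $\hat{p}_1$ and $\hat{p}_2$, with n1 and n2 acting as the weights.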

Important Note: In this example, as the hypothesized value of p1 - p2 is equal to zero, both $\hat{p}_1$ and $\hat{p}_2$ are estimating the common population proportion p. Hence, we use the pooled proportion of the two samples to estimate p.

(The rationale is that the pooled estimator $\hat{p}_c$ is a better estimator of the common population proportion p (as compared with $\hat{p}_1$ or $\hat{p}_2$ alone), as it is based on n1 + n2 observations (i.e. based on a greater amount of information).)

Step-4: Calculations:
X1 is the number of younger women preferring 'Fragrance' = 20.
n1 is the number in the sample = 100.

$$\hat{p}_1 = \frac{X_1}{n_1} = \frac{20}{100} = 0.20$$

X2 is the number of older women preferring 'Fragrance' = 100.
n2 is the number in the sample = 200.

$$\hat{p}_2 = \frac{X_2}{n_2} = \frac{100}{200} = 0.50$$

Now, the pooled or weighted proportion is computed as follows:

$$\hat{p}_c = \frac{20 + 100}{100 + 200} = \frac{120}{300} = 0.40$$

Computation:

$$z = \frac{0.20 - 0.50}{\sqrt{(0.40)(0.60)\left(\frac{1}{100} + \frac{1}{200}\right)}} = \frac{-0.30}{0.06} = -5.00$$
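The arithmetic of Step-4 can be checked with a short Python sketch. The function name `pooled_two_proportion_z` is our own choice for illustration, not something from the lecture; it simply applies the pooled-proportion Z formula of Step-3 to the perfume data.

```python
from math import sqrt


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled sample proportion."""
    p1_hat = x1 / n1                      # sample proportion, first sample
    p2_hat = x2 / n2                      # sample proportion, second sample
    p_pooled = (x1 + x2) / (n1 + n2)      # pooled (weighted) proportion
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error


# Perfume example: 20 of 100 younger women, 100 of 200 older women.
z_perfume = pooled_two_proportion_z(20, 100, 100, 200)
print(round(z_perfume, 2))  # -5.0
```

The pooled proportion is used here only because the null hypothesis states that the two population proportions are equal.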

Step-5: Critical Region
Since H1 does not state any direction (such as p1 < p2), the test is two-tailed. Thus, the critical values for the .05 level are -1.96 and +1.96.

[Figure: Two-Tailed Test, Areas of Rejection and Non-rejection, .05 Level of Significance. On the Z scale, the central area of .95 between -1.96 and +1.96 is where H0 is not rejected; each tail, of area .025, is where H0 is rejected.]

Step-6: Conclusion: The computed z of –5.00 is in the area of rejection, that is, to the left of –1.96. Therefore, the null hypothesis is rejected at the .05 level of significance.

In other words, we conclude that the proportion of young women in the population who prefer 'Fragrance' is not equal to the proportion of older women in the population who prefer 'Fragrance'.

(The difference between the two sample proportions, i.e. $\hat{p}_1 - \hat{p}_2 = -0.30$, is so large that it is highly unlikely that such a large difference could be due to chance (i.e. attributable to sampling fluctuations).)

In fact, the value z = -5.00 lies even further out in the left tail than -2.58, the critical value on the left tail of the sampling distribution if α = 0.01. As such, we can say that our statistic is highly significant.

(In such a situation, the statistic is said to be highly significant because of the fact that we are allowing as small a risk of committing Type-I error as 1%.)

Now, consider another situation:
Suppose that the computed value of our test statistic falls between 1.96 and 2.58 (or between -2.58 and -1.96). In such a situation, we will reject H0 at the 5% level of significance, but we cannot reject H0 at the 1% level.

This means that, if we are willing to allow as much as 5% risk of committing type I error, then we say that we are going to reject H0. But if we are willing to allow only 1% risk of committing type I error, then we conclude that the sample does not provide sufficient evidence to reject H0.
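A minimal Python sketch of this situation is given below. It assumes a hypothetical computed value z = 2.20 (chosen by us purely for illustration) and obtains the two-tailed critical values from the standard library's `statistics.NormalDist`; the helper name `two_tailed_decision` is also our own.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_tailed_decision(z, alpha):
    """Compare |z| with the two-tailed critical value for the given alpha."""
    critical = std_normal.inv_cdf(1 - alpha / 2)
    return "reject H0" if abs(z) > critical else "do not reject H0"


z_hypothetical = 2.20  # illustrative value only, not taken from the lecture

for alpha in (0.05, 0.01):
    critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 and 2.58
    print(alpha, round(critical, 2), two_tailed_decision(z_hypothetical, alpha))
```

The same statistic leads to rejection at the 5% level but not at the 1% level, whereas a value such as z = -5.00 is rejected at both.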

Going back to the example of the perfume, the company would obviously be interested in determining which category of women prefers this perfume in greater numbers. The data clearly indicate that the proportion of women who prefer this particular perfume is higher in the population of older women. (This is the reason why the computed value of our test statistic has come out to be negative.)

Let us consolidate the above ideas by considering another example:

EXAMPLE A candidate for mayor in a large city believes that he appeals to at least 10 per cent more of the educated voters than the uneducated voters. He hires the services of a poll-taking organization, and they find that 62 of 100 educated voters interviewed support the candidate, and 69 of 150 uneducated voters support him. At the 0.05 significance level, is the hypothesis accepted or rejected?

Step-1: The null and alternative hypotheses are
H0 : p1 - p2 ≥ 0.10, and
H1 : p1 - p2 < 0.10,
where p1 = proportion of educated voters who support the candidate, and p2 = proportion of uneducated voters who support the candidate.

Step-2: Level of Significance: α = 0.05.

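Step-3: Test Statistic:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0.10}{\sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}}$$

which, for large sample sizes, is approximately standard normal.

Important Note: In this example, as the hypothesized value of p1 - p2 is not equal to zero, $\hat{p}_1$ and $\hat{p}_2$ are not estimating the same quantity, and, as such, we do not use the pooled proportion $\hat{p}_c$ in the formula of the test statistic.

Step-4: Computation:
Here $\hat{p}_1 = 62/100 = 0.62$ and $\hat{p}_2 = 69/150 = 0.46$, so that

$$z = \frac{(0.62 - 0.46) - 0.10}{\sqrt{\frac{(0.62)(0.38)}{100} + \frac{(0.46)(0.54)}{150}}} = \frac{0.06}{0.0633} = 0.95$$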

Step-5: Critical Region:
As this is a one-tailed test, the critical region is given by Z < -z0.05 = -1.645.

Step-6: Conclusion: Since the calculated value z = 0.95 does not fall in the critical region, we accept the null hypothesis H0 : p1 - p2 ≥ 0.10. The data seem to support the candidate’s view.
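The mayoral example can be reproduced with the following Python sketch. It assumes the unpooled standard error (each sample proportion used in its own variance term), which is appropriate when the hypothesized difference is not zero; the function name `two_proportion_z` and the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, hypothesized_diff):
    """Z statistic for H0: p1 - p2 = hypothesized_diff (unpooled standard error)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - hypothesized_diff) / std_error


# 62 of 100 educated voters and 69 of 150 uneducated voters support the candidate.
z_mayor = two_proportion_z(62, 100, 69, 150, 0.10)

# Lower-tail critical value at the 5% level (about -1.645).
lower_critical = NormalDist().inv_cdf(0.05)
reject_h0 = z_mayor < lower_critical

print(round(z_mayor, 2), round(lower_critical, 3), reject_h0)
```

Since the statistic is positive, it cannot fall in a lower-tail critical region, which agrees with the conclusion of Step-6.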

Until now, we have discussed in considerable detail interval estimation and hypothesis-testing based on the standard normal distribution and the Z-statistic. Next, we begin the discussion of interval estimation and hypothesis-testing based on the t-distribution:

We begin by presenting the formal definition of the t-distribution and stating some of its main properties:

The Student’s t-Distribution
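The mathematical equation of the t-distribution is as follows:

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}, \qquad -\infty < t < \infty$$

This distribution has only one parameter ν, which is known as the degrees of freedom of the t-distribution.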

Properties of Student’s t-distribution:
The t-distribution has the following properties:

i) The t-distribution is bell-shaped and symmetric about the value t = 0, ranging from -∞ to ∞.

ii) The number of degrees of freedom ν determines the shape of the t-distribution. Thus there is a different t-distribution for each number of degrees of freedom. As such, it is a whole family of distributions.

The t-distribution, for small values of ν, is flatter than the standard normal distribution, which means that the t-distribution is more spread out in the tails than is the standard normal distribution.

[Figure: the standard normal distribution compared with the t-distribution having 3 degrees of freedom.]

As the degrees of freedom increase, the t-distribution becomes narrower and narrower, until, as ν tends to infinity, it tends to coincide with the standard normal distribution. (The t-distribution can never become narrower than the standard normal distribution.)

iii) The t-distribution has a mean of zero when ν ≥ 2. (The mean does not exist when ν = 1.)

iv) The median of the t-distribution is also equal to zero.

v) The t-distribution is unimodal.
The density of the distribution reaches its maximum at t = 0 and thus the mode of the t-distribution is t = 0.

(The students will recall that, for any hump-shaped symmetric distribution, the mean, median and mode are equal.)

vi) The variance of the t-distribution is given by $\sigma^2 = \frac{\nu}{\nu - 2}$ for ν > 2. It is always greater than 1, the variance of the standard normal distribution. (This indicates that the t-distribution is more spread out than the standard normal distribution.) For ν ≤ 2, the variance does not exist.
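Some of the properties listed above can be checked numerically. The following Python sketch assumes the standard form of the t density written in terms of the gamma function, and uses a simple midpoint rule over a wide but finite range as a rough stand-in for integration over the whole real line; the names `t_pdf`, `t_variance` and `integrate` are our own.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    coeff = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_variance(nu):
    """Variance nu / (nu - 2), defined only for nu > 2."""
    if nu <= 2:
        raise ValueError("the variance does not exist for nu <= 2")
    return nu / (nu - 2)


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


nu = 5
total_area = integrate(lambda t: t_pdf(t, nu), -200, 200)
numeric_variance = integrate(lambda t: t * t * t_pdf(t, nu), -200, 200)

print(round(total_area, 4))                               # close to 1
print(round(numeric_variance, 3), round(t_variance(nu), 3))

# With 3 degrees of freedom: lower peak and heavier tail than the standard normal.
z = NormalDist()
print(t_pdf(0, 3) < z.pdf(0), t_pdf(3, 3) > z.pdf(3))
```

The last line illustrates the sense in which the t-distribution is flatter than the standard normal: less density at the centre and more in the tails.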

Next, we discuss the application of the t-distribution in statistical inference, that is, those situations where we need to carry out interval estimation and hypothesis-testing on the basis of the t-distribution (situations where the t-distribution is the appropriate sampling distribution).

With reference to interval estimation and hypothesis-testing about μ, it has been mathematically proved that, if the population from which the sample has been drawn is normally distributed, the population variance is unknown, and the sample size is small (less than 30), then the statistic

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

follows the t-distribution having n - 1 degrees of freedom, where

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

First, we discuss the construction of a Confidence Interval for μ based on the t-distribution with the help of an example:

EXAMPLE The masses, in grams, of thirteen ball bearings taken at random from a batch are 21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0, 22.5, 26.9, 26.4, 25.8, 23.2, 21.9

Calculate a 95% confidence interval for the mean mass of the population, supposed normal, from which these masses were drawn.

SOLUTION The 95% confidence interval for the mean mass of the population, μ, is given by

$$\bar{X} \pm t_{0.025}(n - 1)\,\frac{s}{\sqrt{n}}$$

(The derivation of the above confidence interval is very similar to that of the confidence interval for μ based on the Z-statistic.)

Now, in this problem, the sample mean $\bar{X}$ and s come out to be:

$$\bar{X} = \frac{\sum X}{n} = \frac{314.7}{13} = 24.21$$

$$s = \sqrt{\frac{1}{n - 1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]} = \sqrt{\frac{1}{12}\left[7655.59 - 7618.16\right]} = \sqrt{\frac{37.43}{12}} = 1.77$$

The question is: 'How do we find $t_{0.025}(12)$?'
For this purpose, we will need to consult the table of areas under the t-distribution:

Table of areas under the t-distribution:

The above table is an abridged version of the table by Fisher and Yates, and the entries in this table are values of $t_\alpha(\nu)$ for which the area to their right under the t-distribution with ν degrees of freedom is equal to α, as shown below:

[Figure: t-distribution curve with the area α shaded to the right of $t_\alpha(\nu)$.]

Now, in this problem, since n – 1 = 12, and the desired level of confidence is 95%, therefore, the right-tail area is 2½%, and, hence, (using the t-table) we obtain t0.025(12) = 2.179

Substituting these values, we obtain the 95% confidence interval for μ as follows:

$$24.21 \pm 2.179\left(\frac{1.77}{\sqrt{13}}\right)$$

or 24.21 ± 2.179 (0.49)
or 24.21 ± 1.07
or 23.14 to 25.28

Hence, the 95% confidence interval for the mean mass of the ball bearings calculated from the given sample is (23.1, 25.3) grams.
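The interval can be recomputed directly from the thirteen masses with the Python sketch below. It is only an illustration: the critical value 2.179 is hard-coded from the t-table (12 degrees of freedom, right-tail area 0.025) because Python's standard library has no t quantile function, and the variable names are our own.

```python
from math import sqrt
from statistics import mean, stdev

masses = [21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0,
          22.5, 26.9, 26.4, 25.8, 23.2, 21.9]

T_CRITICAL = 2.179  # t_0.025(12), read from the t-table, not computed here

n = len(masses)
x_bar = mean(masses)        # sample mean
s = stdev(masses)           # sample standard deviation (divisor n - 1)
margin = T_CRITICAL * s / sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.2f}, s = {s:.2f}, margin = {margin:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Because the sketch carries unrounded intermediate values, its end points may differ from the hand calculation in the last decimal place.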

IN TODAY’S LECTURE, YOU LEARNT
Hypothesis Testing Regarding p1-p2 (based on Z-statistic)
The Student’s t-distribution
Confidence Interval for μ based on the t-distribution

IN THE NEXT LECTURE, YOU WILL LEARN
Tests and Confidence Intervals based on the t-distribution
The Chi-square Distribution
Hypothesis Testing and Interval Estimation Regarding a Population Variance (based on Chi-square Distribution)
