1 Analysis of Variance: This technique is designed to test the null hypothesis that three or more group means are equal.

2 You may recall from several chapters back the idea of the variance and the associated idea called the standard deviation. Each of these concepts is a measure of how spread out the values of a quantitative variable are. For a variable that has a normal distribution, the square root of the variance, the standard deviation, helped in making probability statements about certain ranges of values of the variable. For what we want to do next, we will focus more on the variance than on the standard deviation.

3 When we did not know the population variance, we saw that the sample variance was considered a good way to estimate it. If the sample size is n and the sample mean is xbar (remember this is an x with a line or bar over it, but it is a drag for me to type), then the sample variance is $\sum_{i=1}^{n}(x_i - \bar{x})^2/(n-1)$. So, to get the sample variance you take each sample data value, subtract the sample mean, and square the deviation. You add up, or take the summation of, the squared deviations for all n points and divide the result by n-1.
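
The recipe above translates directly into a few lines of Python. This is a minimal sketch: the function name `sample_variance` and the data values are illustrative, and the standard library's `statistics.variance` (which also divides by n-1) is used only as a cross-check.

```python
import statistics


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n                                # the sample mean
    squared_deviations = [(x - xbar) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]      # illustrative data with mean 5
print(sample_variance(values))                        # 32 / 7
print(statistics.variance(values))                    # same value from the standard library
```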

4 Say you have two variables, one quantitative and one qualitative. When the qualitative variable has three or more categories, we can do Analysis of Variance. The basic idea is that we test whether the quantitative variable has the same mean for each value of the qualitative variable. As an example, say we have three majors. In each major there is a population mean gpa. The null hypothesis is that these population means are all the same. The alternative is that the means are not all the same. An underlying assumption is that the variance of each population is the same. Operationally, we take a sample of gpa values from the various majors. We will work with both the sample means and the sample variances to test the null hypothesis, and the test here is always a one-tailed (upper-tail) test.
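
Operationally, the data are pairs of (category, value), and the first step is to split the quantitative values into one sample per category. The sketch below uses made-up majors and gpa values purely for illustration; `group_by_category` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (major, gpa) pairs: "major" is the qualitative variable, "gpa" the quantitative one.
observations = [
    ("Biology", 3.1), ("Biology", 2.9), ("Biology", 3.4),
    ("History", 3.3), ("History", 3.6), ("History", 3.2),
    ("Physics", 2.8), ("Physics", 3.0), ("Physics", 2.7),
]


def group_by_category(pairs):
    """Collect the quantitative values for each category of the qualitative variable."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return dict(groups)


groups = group_by_category(observations)
for major, gpas in sorted(groups.items()):
    # One sample mean per group; ANOVA asks whether the population means behind them differ.
    print(major, len(gpas), round(sum(gpas) / len(gpas), 3))
```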

5 [Figure: each group's sample mean on a number line under a single bell curve centered at µ] The null hypothesis in general is $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$. Here we make some comments to help you get a feel for the test that will follow. If the null is true, then we really have only one large group: you see here each major's sample mean gpa placed on the number line, and the population mean is just the single value µ. Here is where variances come into play. We will look at variances derived from the samples and make a statement about how good the sample information is expected to be for estimating the population variance.

6 [Figure: three bell curves centered at µ1, µ2 and µ3, each with its sample mean xbar1, xbar2 or xbar3] Under the alternative hypothesis we would expect each group's distribution to be located in a different place, and I show here group 2's population mean as the farthest to the right. Note: on the last slide and on this slide I have drawn bell-shaped curves, and I am assuming the variance is the same for each curve. I do not know the value of the variance. There are several ways to estimate the variance, and we turn to that next, but I will want you to refer back to each graph from time to time.
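
To see the two pictures side by side, here is a small simulation sketch. The population means, the common standard deviation of 0.3, the group size of 6 and the seed are all made-up values chosen for illustration. Under H0 the sample means should land close together, while under the alternative they spread out; in both cases the within-group spreads look similar because the variance is assumed equal.

```python
import random
import statistics


def draw_group_samples(population_means, sigma, n_per_group, rng):
    """One normal sample per group; all groups share the same sigma (equal-variance assumption)."""
    return [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for mu in population_means]


rng = random.Random(1)   # fixed seed so the run is reproducible
sigma, n = 0.3, 6        # hypothetical common standard deviation and per-group sample size

# Under H0 every group has the same population mean, so there is really one large group.
null_samples = draw_group_samples([3.0, 3.0, 3.0], sigma, n, rng)
# Under the alternative the means differ; group 2 sits farthest to the right.
alt_samples = draw_group_samples([2.9, 3.6, 3.0], sigma, n, rng)

for label, samples in (("H0", null_samples), ("Ha", alt_samples)):
    means = [round(statistics.mean(s), 2) for s in samples]
    sds = [round(statistics.stdev(s), 2) for s in samples]
    print(label, "sample means:", means, "sample sds:", sds)
```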

7 The between group estimate of the variance: Essentially, the between group estimate of the variance uses the sample means from each group (you have at least identified the groups as possibly being distinct) and compares these means to the overall sample mean (ignoring group status). Operationally, the overall sample mean is subtracted from each group's sample mean, the result is squared and weighted by that group's sample size, the weighted squares are added up, and the total is divided by the number of groups minus 1. Under the null hypothesis you see each sample mean under one bell curve, so the sample means are in the same neighborhood. If the null is true that all groups have the same mean, then this between group estimate is deemed a “good” estimate of the population variance, because we are using information from “inside” the bounds of the one distribution. Under the alternative hypothesis of different group means, you see each sample mean under its own bell curve, and with three separate curves the values are more spread out. For illustrative purposes I have xbar2 way to the right. When the sample means are compared to an overall sample mean, the between group estimate is deemed bad, in fact “too big”, because we are using information from samples that are in different neighborhoods (having different means). So depending on which hypothesis is true, this estimate is good or bad, and we will exploit that in our test. But first we will discuss another idea.
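
A short Python sketch of the between group estimate, including the weighting by group size. The function name and the toy data are hypothetical; textbooks often call this quantity the mean square due to treatments (MSTR). The first data set has spread-out group means, the second has identical group means, so the estimate drops to zero.

```python
def between_group_estimate(groups):
    """Between group estimate of the population variance:
    sum of n_j * (xbar_j - overall mean)^2 over the k groups, divided by k - 1."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    overall_mean = sum(sum(g) for g in groups) / n_total   # ignores group labels
    weighted_ss = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        weighted_ss += len(g) * (group_mean - overall_mean) ** 2
    return weighted_ss / (k - 1)


# Toy data (hypothetical): group means 2, 5 and 8 around an overall mean of 5.
spread_out = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Same values rearranged so every group mean equals 2.
same_means = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]

print(between_group_estimate(spread_out))   # 3*(9 + 0 + 9) / 2 = 27.0
print(between_group_estimate(same_means))   # 0.0
```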

8 The within group estimate of the variance: Essentially, the within group estimate takes the sample variance from each group, weights it by that group's sample size minus 1, adds these up, and divides the result by the overall sample size minus the number of groups. Since the population variance is assumed to be the same in each group, and since the within group estimate uses only the variation within each group, this method of estimating the population variance is deemed “good” under either the null or the alternative hypothesis. Another way to see this: although under the alternative hypothesis the means are not equal, we expect the shape of each distribution to be the same, so the sample variance within a given curve should still be okay.
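
The same idea as a Python sketch, with a hypothetical function name and toy data (textbooks often call this quantity the mean square due to error, MSE). Shifting one group's values changes that group's mean but leaves the estimate untouched, which is why it stays “good” under either hypothesis.

```python
import statistics


def within_group_estimate(groups):
    """Within group estimate of the population variance:
    sum of (n_j - 1) * s_j^2 over the k groups, divided by n_T - k."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    pooled_ss = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    return pooled_ss / (n_total - k)


# Toy data (hypothetical): each group has sample variance 1.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Shift group 2 far to the right: its mean changes, its spread does not.
shifted = [[1, 2, 3], [104, 105, 106], [7, 8, 9]]

print(within_group_estimate(groups))    # (2*1 + 2*1 + 2*1) / (9 - 3) = 1
print(within_group_estimate(shifted))   # unchanged
```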

9 F statistic: There is a statistic called the F statistic that is formed in this context by taking the ratio of the between group estimate to the within group estimate: F = (between group estimate)/(within group estimate). Informally, under the null of equal means the ratio is F = (good)/(good) and should then be close to 1. Under the alternative, F = (too big)/(good) and should be greater than 1.
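
Putting the two estimates together gives a sketch of the F statistic. The function `f_statistic` and the data are hypothetical; it also returns the numerator and denominator degrees of freedom, k - 1 and n_T - k. With the well-separated toy groups below, F is far above 1.

```python
import statistics


def f_statistic(groups):
    """Return (F, numerator df, denominator df) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    overall_mean = sum(sum(g) for g in groups) / n_total
    # Between group estimate: weighted squared distances of group means from the overall mean.
    between = sum(len(g) * (statistics.mean(g) - overall_mean) ** 2 for g in groups) / (k - 1)
    # Within group estimate: pooled sample variances.
    within = sum((len(g) - 1) * statistics.variance(g) for g in groups) / (n_total - k)
    return between / within, k - 1, n_total - k


# Toy data (hypothetical): group means 2, 5, 8, each group with sample variance 1.
f, df_num, df_den = f_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, df_num, df_den)   # 27.0 2 6
```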

10 F: The F distribution is skewed, with a long tail on the right. Even when the null hypothesis is true (all group means are the same), sampling variation can make the sample F greater than 1. But the farther F lies above 1, the less likely such values are to occur. When we pick an alpha value to control for a Type I error, we pick a value of F from the table such that, if the F calculated from a sample is greater than the tabled value (sometimes called the critical value), we reject the null hypothesis and go with the alternative. Otherwise, we do not reject the null hypothesis.
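
A simulation makes the point about sampling variation concrete. This sketch assumes normal populations and draws every group from the same standard normal distribution, so the null hypothesis is true by construction; the seed, the 4000 repetitions and the function names are arbitrary choices. With 3 groups of 6 (2 and 15 degrees of freedom), a sizable share of the sample F values (around 40%) still exceed 1, while only roughly 5% exceed the tabled alpha = .05 critical value of 3.68 for those degrees of freedom.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: between group estimate divided by within group estimate."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    overall_mean = sum(sum(g) for g in groups) / n_total
    between_ss = 0.0
    within_ss = 0.0
    for g in groups:
        m = sum(g) / len(g)
        between_ss += len(g) * (m - overall_mean) ** 2
        within_ss += sum((x - m) ** 2 for x in g)   # equals (n_j - 1) * s_j^2
    return (between_ss / (k - 1)) / (within_ss / (n_total - k))


def simulate_null_f(k, n_per_group, reps, seed=0):
    """Sample F values when H0 is true: every group comes from the same normal population."""
    rng = random.Random(seed)
    return [
        f_statistic([[rng.gauss(0.0, 1.0) for _ in range(n_per_group)] for _ in range(k)])
        for _ in range(reps)
    ]


# 3 groups of 6 gives 2 and 15 degrees of freedom; 3.68 is the alpha = .05 table value for those df.
f_values = simulate_null_f(k=3, n_per_group=6, reps=4000)
share_above_1 = sum(f > 1 for f in f_values) / len(f_values)
share_above_crit = sum(f > 3.68 for f in f_values) / len(f_values)
print(f"F > 1 in {share_above_1:.1%} of samples; F > 3.68 in {share_above_crit:.1%}")
```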

11 Summary: We test differences in group means by working with properties of estimates of the population variance. If the group means are the same, we expect the F statistic, a ratio of two estimates of the population variance, to be fairly close to 1. So, when we get a sample F that is larger than the critical F, we reject the null. Equivalently, if the p-value for the F is less than alpha, we reject the null. Note: there are a large number of computations to be made in this section, so we turn to Excel again to do our calculations. Note also that F has two degrees of freedom (df): the numerator and the denominator degrees of freedom. The numerator degrees of freedom = $k - 1$, where k is the number of groups. The denominator degrees of freedom = $n_T - k$, where $n_T$ is the total number of observations, $n_1 + n_2 + \cdots + n_k$.
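
For reference, the quantities described in words on the earlier slides can be written compactly as follows. The labels MSTR and MSE are common textbook names rather than terms used in these slides; $\bar{x}_j$, $s_j^2$ and $n_j$ are group j's sample mean, sample variance and sample size, and $\bar{\bar{x}}$ is the overall sample mean. The statement about the F distribution assumes normal populations with a common variance.

```latex
\[
\text{MSTR} = \frac{\sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{\bar{x}})^2}{k - 1},
\qquad
\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1)\,s_j^2}{n_T - k},
\qquad
F = \frac{\text{MSTR}}{\text{MSE}}
\]
\[
\text{Under } H_0,\; F \text{ has an F distribution with } k - 1 \text{ and } n_T - k
\text{ degrees of freedom; reject } H_0 \text{ if } F > F_{\alpha} \text{ or if the p-value} < \alpha .
\]
```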

12 F table (starting on page 655): In the chapter there is an example with 3 groups and 18 total observations, so the critical F has 3 - 1 = 2 numerator and 18 - 3 = 15 denominator degrees of freedom. You go across to the column for 2 and then down to the row for 15, which places us on page 656. With alpha = .05 the critical F is 3.68. If our F from the sample information is larger than this, we reject the null and conclude that at least one group mean is different from the others. Similarly, if our p-value from the sample information is less than alpha, we reject the null. Fortunately, Excel gives us both pieces of information.
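
Outside Excel, the critical value and p-value for this particular example can be checked without a table. The sketch relies on a special case: when the numerator has 2 degrees of freedom (three groups), the upper tail of the F distribution has the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, where $\nu_2$ is the denominator df. The function names and the sample F of 5.0 are hypothetical, and these formulas do not apply to other numerator df; for those, a library routine (for example SciPy's `scipy.stats.f`) or Excel is the practical route.

```python
def f_upper_tail_df1_2(f, df2):
    """p-value P(F > f) for an F distribution with 2 numerator df (valid ONLY for df1 = 2)."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def f_critical_df1_2(alpha, df2):
    """Critical value c with P(F > c) = alpha, solving the closed form above (df1 = 2 only)."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


crit = f_critical_df1_2(0.05, 15)   # 3 groups, 18 observations: 2 and 15 df
print(round(crit, 2))               # 3.68, matching the table

sample_f = 5.0                      # hypothetical sample F, for illustration only
p_value = f_upper_tail_df1_2(sample_f, 15)
print(sample_f > crit, p_value < 0.05)   # True True: the two decision rules agree
```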

