1 Lecture 2 ANALYSIS OF VARIANCE: AN INTRODUCTION
2 Between-subjects experiments The caffeine experiment had a between-subjects design; that is, each participant was tested under only one condition. Participants were RANDOMLY ASSIGNED to the conditions, so there was no basis on which the data could be paired. Between-subjects experiments yield INDEPENDENT SAMPLES of data.
3 More than two conditions In more complex experiments, there may be three or more conditions. For example, we could compare the performance of groups of participants who have ingested four different supposedly performance-enhancing drugs with that of a control or placebo group.
4 Factors In the context of analysis of variance (ANOVA), a FACTOR is a set of related treatments, conditions or categories. The ANOVA term factor is a synonym for the term independent variable.
5 One-factor experiments In the drug experiment, there is just ONE set of (drug-related) conditions. The experiment therefore has ONE treatment factor. The conditions making up a factor are known as its LEVELS. In the drug experiment, the treatment factor has 5 levels.
6 Results of the experiment [Table: raw scores in each of the five groups, with the grand mean]
7 Statistics of the results [Table: group (cell) means, standard deviations and variances]
8 The null hypothesis The null hypothesis, $H_0$, states that, in the population, all the means have the same value: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$. We cannot test this hypothesis with the t statistic, which compares only two means at a time.
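9 The alternative hypothesis The alternative hypothesis, $H_1$, is that, in the population, the means do NOT all have the same value. MANY POSSIBILITIES are implied by $H_1$: for example, just one drug mean may differ from the rest, or all five means may differ from one another.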
10 The One-way ANOVA The ANOVA of a one-factor between-groups experiment is also known as the ONE-WAY ANOVA. The one-way ANOVA must be sharply distinguished from the one-factor WITHIN-SUBJECTS (or REPEATED MEASURES) ANOVA, which is appropriate when participants are tested at every level of the treatment factor. The between-subjects and within-subjects ANOVAs are based on different statistical models.
11 There are some large differences among the five treatment means, suggesting that the null hypothesis is false.
12 Mean square (MS) In ANOVA, the numerator of a variance estimate is known as a SUM OF SQUARES (SS). The denominator is known as the DEGREES OF FREEDOM (df). The variance estimate itself is known as a MEAN SQUARE (MS), so that MS = SS/df.
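A minimal Python sketch of MS = SS/df for a single sample, using made-up scores (they are not from the lecture). For one sample, df = n − 1, so the mean square is simply the familiar sample variance; the function names are illustrative only.

```python
import statistics


def sum_of_squares(scores):
    """SS: the sum of squared deviations of the scores from their mean."""
    m = statistics.fmean(scores)
    return sum((x - m) ** 2 for x in scores)


def mean_square(scores):
    """MS = SS / df, where df = n - 1 for a single sample."""
    return sum_of_squares(scores) / (len(scores) - 1)


# Hypothetical scores, for illustration only.
scores = [4, 7, 6, 9, 4]
print("SS =", sum_of_squares(scores))             # SS = 18.0
print("MS =", mean_square(scores))                # MS = 4.5
print("variance =", statistics.variance(scores))  # the same value as MS
```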
13 Accounting for variability The building block for any variance estimate is a DEVIATION of some sort. The TOTAL DEVIATION of any score from the grand mean (GM) can be divided into 2 components: 1. a BETWEEN GROUPS component; 2. a WITHIN GROUPS component. For a score $X$ in a group whose mean is $M$: $X - GM = (M - GM) + (X - M)$, that is, total deviation = between groups deviation + within groups deviation.
14 Example of the breakdown The score, the group mean and the grand mean have been ringed in the table. This breakdown holds for each of the fifty scores in the data set.
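To see that the breakdown holds for every score, here is a small Python sketch on three invented groups of five (hypothetical data, not the drug experiment). The helper name `breakdown` is illustrative; it returns the total, between groups and within groups deviations for one score.

```python
import statistics

# Hypothetical data (three small groups), NOT the lecture's drug data.
groups = {
    "Placebo": [8, 10, 9, 11, 12],
    "Drug A": [12, 14, 11, 13, 15],
    "Drug B": [9, 12, 10, 8, 11],
}


def breakdown(score, group_mean, grand_mean):
    """Return (total, between groups, within groups) deviations for one score."""
    total = score - grand_mean
    between = group_mean - grand_mean
    within = score - group_mean
    return total, between, within


grand_mean = statistics.fmean([x for g in groups.values() for x in g])

for name, scores in groups.items():
    group_mean = statistics.fmean(scores)
    for x in scores:
        total, between, within = breakdown(x, group_mean, grand_mean)
        # The two components always add up to the total deviation.
        assert abs(total - (between + within)) < 1e-9

# First Placebo score: 8 - 11 = (10 - 11) + (8 - 10)
print(breakdown(8, 10, 11))  # (-3, -1, -2)
```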
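15 Breakdown (partition) of the total sum of squares If you square the deviations and sum them over all 50 scores, the cross-product term vanishes (within each group, the deviations $X - M$ sum to zero), and you obtain an expression which breaks down the total variability in the scores into between groups and within groups components: $\sum (X - GM)^2 = \sum (M - GM)^2 + \sum (X - M)^2$, that is, $SS_{total} = SS_{between} + SS_{within}$, where each sum runs over all 50 scores.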
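A short Python sketch that computes the three sums of squares for invented data (three groups of five, not the lecture's scores) and confirms that the partition adds up. The function name `partition` is hypothetical.

```python
import statistics

# Hypothetical data, NOT the lecture's drug data.
groups = [
    [8, 10, 9, 11, 12],
    [12, 14, 11, 13, 15],
    [9, 12, 10, 8, 11],
]


def partition(groups):
    """Return (SS_total, SS_between, SS_within) for a one-way layout."""
    scores = [x for g in groups for x in g]
    gm = statistics.fmean(scores)
    means = [statistics.fmean(g) for g in groups]
    ss_total = sum((x - gm) ** 2 for x in scores)
    # Every score in a group contributes that group's (M - GM)^2, hence len(g).
    ss_between = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ss_total, ss_between, ss_within


ss_total, ss_between, ss_within = partition(groups)
print(ss_total, ss_between, ss_within)  # 60.0 30.0 30.0
```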
16 How ANOVA works The variability BETWEEN the treatment means is compared with the average spread of scores around their means WITHIN the treatment groups. The comparison is made with a statistic called the F-RATIO.
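17 The variances of the scores in each group around their group mean are averaged to obtain a WITHIN GROUPS MEAN SQUARE, MS within. (A simple average is appropriate when the groups are of equal size; in general, MS within = SS within / df within.)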
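A Python sketch, on invented equal-sized groups, showing that SS within divided by its degrees of freedom (total number of scores minus number of groups) gives the same value as the plain average of the group variances. The data and the name `ms_within` are assumptions for illustration.

```python
import statistics

# Hypothetical data (three groups of five), NOT the lecture's drug data.
groups = [
    [8, 10, 9, 11, 12],
    [12, 14, 11, 13, 15],
    [9, 12, 10, 8, 11],
]


def ms_within(groups):
    """MS within = SS within / df within; valid for equal or unequal group sizes."""
    means = [statistics.fmean(g) for g in groups]
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_within = sum(len(g) for g in groups) - len(groups)
    return ss_within / df_within


# With equal group sizes, this is also the plain average of the group variances.
average_of_variances = statistics.fmean([statistics.variance(g) for g in groups])

print(ms_within(groups), average_of_variances)  # 2.5 2.5
```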
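18 From the values of the five treatment means, a BETWEEN GROUPS MEAN SQUARE, MS between, is calculated: MS between = SS between / df between. When every group contains n scores, this equals n times the variance of the treatment means.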
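A Python sketch of the between groups mean square for invented data with n = 5 scores per group. It computes SS between / (k − 1) and checks that, with equal group sizes, this matches n times the variance of the group means. The name `ms_between` is hypothetical.

```python
import statistics

# Hypothetical data (three groups of five), NOT the lecture's drug data.
groups = [
    [8, 10, 9, 11, 12],
    [12, 14, 11, 13, 15],
    [9, 12, 10, 8, 11],
]


def ms_between(groups):
    """MS between = SS between / (k - 1), computed from the group means."""
    grand_mean = statistics.fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / (len(groups) - 1)


# With n scores in every group, the same value is n times the variance of the means.
n = len(groups[0])
means = [statistics.fmean(g) for g in groups]
print(ms_between(groups), n * statistics.variance(means))  # 15.0 15.0
```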
19 The statistic F is calculated by dividing the between groups MS by the within groups MS, thus: $F = \frac{MS_{between}}{MS_{within}}$
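Putting the pieces together, here is a minimal Python sketch of the one-way F ratio for a between-subjects layout, again on invented data. The function `one_way_f` is an illustrative name, not an SPSS or library routine; it returns F together with its two degrees of freedom.

```python
import statistics


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects layout."""
    scores = [x for g in groups for x in g]
    gm = statistics.fmean(scores)
    means = [statistics.fmean(g) for g in groups]
    k, n_total = len(groups), len(scores)
    ss_between = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical data (three groups of five), NOT the lecture's drug data.
groups = [
    [8, 10, 9, 11, 12],
    [12, 14, 11, 13, 15],
    [9, 12, 10, 8, 11],
]
print(one_way_f(groups))  # (6.0, 2, 12)
```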
20 The F ratio
21 The value of MS between, since it is calculated from the MEANS, reflects random error plus any real differences there may be among the population means.
22 The value of MS within, since it is calculated only from the variances of the scores within groups and ignores the values of the group means, reflects ONLY RANDOM ERROR.
23 What F is measuring $F = \frac{\text{error} + \text{real differences}}{\text{error only}}$ If there are differences among the population means, the numerator will be inflated and F will increase. If there are no differences, F will tend to be close to 1.
24 Expectations If the null hypothesis is true, the values of MS between and MS within will be similar, because both variance estimates merely reflect individual differences and random variation or ERROR. If so, the value of F will be around 1. If the null hypothesis is false, real differences among the population means will inflate the value of MS between but the value of MS within will be unaffected. The result will be a LARGE value of F.
25 Range of variation of F The F statistic is the ratio of two sample variances. A variance can take only non-negative values. So the lower limit for F is zero. There is no upper limit for F.
26 Imagine… Suppose the null hypothesis is true. Imagine the experiment were to be repeated thousands and thousands of times, with fresh samples of participants each time. There would be thousands and thousands of data sets, from each of which a value of F could be calculated.
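The thought experiment can be imitated by simulation. The Python sketch below assumes five groups of ten scores (matching df = 4 and 45), draws every group from the same normal population (mean 50 and SD 10 are arbitrary choices), so the null hypothesis is true by construction, and computes F for each of 5000 simulated experiments. All F values are non-negative; their average should sit near 1 (the theoretical mean is df within / (df within − 2) = 45/43, about 1.05), and about 5% should reach 2.58.

```python
import random
import statistics


def one_way_f(groups):
    """F = MS between / MS within for a one-way between-subjects layout."""
    scores = [x for g in groups for x in g]
    gm = statistics.fmean(scores)
    means = [statistics.fmean(g) for g in groups]
    k, n_total = len(groups), len(scores)
    ss_between = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_null_f(n_experiments=5000, k=5, n=10, seed=1):
    """F values from repeated experiments in which H0 is true by construction."""
    rng = random.Random(seed)
    f_values = []
    for _ in range(n_experiments):
        # Every group comes from the same population, so any differences
        # among the sample means are pure random error.
        groups = [[rng.gauss(50, 10) for _ in range(n)] for _ in range(k)]
        f_values.append(one_way_f(groups))
    return f_values


f_values = simulate_null_f()
print("smallest F:", min(f_values))
print("mean F:", statistics.fmean(f_values))
print("proportion of F >= 2.58:", sum(f >= 2.58 for f in f_values) / len(f_values))
```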
27 Sampling distribution To test the null hypothesis, you must be able to locate YOUR value of F in the population or PROBABILITY DISTRIBUTION of such values. The probability distribution of a statistic is known as its SAMPLING DISTRIBUTION. To specify a sampling distribution, you must assign values to properties known as PARAMETERS.
28 Parameters of F Recall that the t distribution has ONE parameter: the DEGREES OF FREEDOM (df ). The F distribution has TWO parameters: the degrees of freedom of the between groups and within groups mean squares, which we shall denote by df between and df within, respectively.
29 Rule for finding the degrees of freedom There's a useful rule for finding the degrees of freedom of a statistic. Take the number of independent observations and subtract the number of parameters estimated. The sample variance of n scores is based upon n independent observations. But to obtain the deviations, we need an estimate of ONE parameter, namely, the mean. So the degrees of freedom of the sample variance is n – 1, not n.
30 Rule for obtaining the df
31 Degrees of freedom of the two mean squares The degrees of freedom of MS between is the number of treatment groups minus 1. (One parameter estimated: the grand mean.) The degrees of freedom of MS within is the total number of scores minus the number of treatment groups. (Five parameters are estimated: the five group means.)
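The rule "independent observations minus parameters estimated" can be written as a one-line function. This Python sketch (the function name is hypothetical) applies it to the drug experiment: five group means with one parameter estimated, and fifty scores with five group means estimated.

```python
def df_rule(n_observations, n_parameters_estimated):
    """Degrees of freedom = independent observations - parameters estimated."""
    return n_observations - n_parameters_estimated


k, n_total = 5, 50  # five treatment groups, fifty scores in all

df_between = df_rule(k, 1)           # k group means; one parameter: the grand mean
df_within = df_rule(n_total, k)      # N scores; k parameters: the group means
df_sample_variance = df_rule(10, 1)  # a single sample of 10 scores; one mean

print(df_between, df_within)  # 4 45
```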
32 The correct F distribution We shall specify an F distribution with the notation F(df between, df within ). We have seen that in our example, df between = 4 and df within = 45. The correct F distribution for our test of the null hypothesis is therefore F(4, 45).
33 The distribution of F(1, 45) F distributions are POSITIVELY SKEWED, i.e., they have a long tail to the right. However, the shape of F varies quite markedly with the values of the df.
34 The distribution of F(4, 45)
35 Distribution of F(4, 45) The critical region is in the upper tail of this F distribution. If we set the significance level at .05, the value of F must be at least 2.58, which is the 95th percentile of the distribution F(4, 45).
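As a check on the value 2.58, here is a standard-library-only Python sketch. It evaluates the F cumulative distribution function through the regularized incomplete beta function (using the well-known continued-fraction method popularized by Numerical Recipes) and then finds the 95th percentile by bisection. This is an assumption of the example, not how SPSS works internally; in practice a statistics library (for example `scipy.stats.f.ppf`) does the same job. The last line also gives the upper-tail p-value for F(4, 45) = 9.09.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Two half-steps per iteration: the even and the odd coefficient.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(x, df1, df2):
    """P(F <= x) for an F(df1, df2) distribution."""
    if x <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))


def f_percentile(q, df1, df2):
    """Value below which a proportion q of F(df1, df2) lies, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < q:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


critical = f_percentile(0.95, 4, 45)
print(round(critical, 2))  # about 2.58
print("p for F(4, 45) = 9.09:", 1.0 - f_cdf(9.09, 4, 45))
```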
36 The F distribution An F distribution is asymmetric, with an infinitely long tail to the right. The critical region lies above the 95th percentile of F(df between, df within), which in this case is the 95th percentile of F(4, 45), namely 2.58.
37 The ANOVA summary table F is large: about nine times unity (roughly the value expected if the null hypothesis is true) and well over the critical value. The p-value (Sig.) is less than .01, so F is significant beyond the .01 level. Write this result as follows: with an alpha level of .05, F is significant: F(4, 45) = 9.09; p < .01. Do NOT write the p-value as .000! Notice that SS total = SS between groups + SS within groups.
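One way to avoid ever writing ".000" is to report any p-value below a chosen floor as "p < floor". The Python helpers below are hypothetical (not part of SPSS); by default they use the .01 floor shown on the slide and otherwise report p to three decimal places. Because the floor may not be set below .001, an exact p that is printed can never round to .000.

```python
def format_p(p, floor=0.01):
    """Format a p-value; values below `floor` are reported as 'p < floor'."""
    if not 0.001 <= floor < 1:
        raise ValueError("floor must be between .001 and 1")
    if p < floor:
        return "p < " + f"{floor:.3f}".rstrip("0").lstrip("0")
    return "p = " + f"{p:.3f}".lstrip("0")


def format_f_result(f, df_between, df_within, p, floor=0.01):
    """Report an F test in the style used on the slide."""
    return f"F({df_between}, {df_within}) = {f:.2f}; {format_p(p, floor)}"


print(format_f_result(9.09, 4, 45, 0.00002))  # F(4, 45) = 9.09; p < .01
```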
38 SPSS advice A few general points. Give close attention to the labels you give to your variables, and to the appearance of your data. Unnecessary decimal places clutter the display. It is particularly important to assign VALUE LABELS to the code numbers you choose for any grouping variables. Specify also the LEVEL OF MEASUREMENT of each variable.
39 Start in Variable View Work in Variable View first, amending the settings so that when you enter Data View, your variables are already labelled, the scores appear without unnecessary decimals and you will have the option of displaying the value labels of your grouping variable.
40 Graphics The latest SPSS graphics require you to specify the level of measurement of the data on each variable. The group code numbers are at the NOMINAL level of measurement, because they are merely CATEGORY LABELS. Make the appropriate entry in the Measure column.
41 Grouping variables To instruct SPSS to analyse data from between-subjects experiments, you must construct a GROUPING VARIABLE consisting of code numbers identifying the treatment condition under which a score was achieved. So we could set 1 = Placebo, 2 = Drug A, 3 = Drug B, 4 = Drug C, and 5 = Drug D.
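Outside SPSS, the same idea can be sketched in Python: each row holds a group code and a score (the layout of Data View), and a dictionary of value labels supplies the key to the codes. The scores and the name `split_by_group` are invented for illustration.

```python
# Code numbers and value labels for the grouping variable, as on the slide.
VALUE_LABELS = {1: "Placebo", 2: "Drug A", 3: "Drug B", 4: "Drug C", 5: "Drug D"}

# Hypothetical rows in the layout of Data View: one row per participant,
# holding (group code, score). The scores are invented for illustration.
rows = [(1, 8), (1, 10), (2, 12), (2, 14), (3, 9), (4, 11), (5, 13), (5, 15)]


def split_by_group(rows, labels):
    """Collect the scores for each condition, keyed by its value label."""
    groups = {label: [] for label in labels.values()}
    for code, score in rows:
        if code not in labels:
            raise ValueError(f"unknown group code: {code}")
        groups[labels[code]].append(score)
    return groups


for label, scores in split_by_group(rows, VALUE_LABELS).items():
    print(label, scores)
```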
42 Data View This is what Data View will look like. The entry of data for an ANOVA on SPSS is similar to the procedure we followed when running an independent-samples t-test. On the right, the VALUE LABELS are displayed, instead of the values themselves. (This option appears in the Data menu.)
43 Assignment of values in Variable View
44 Variable View completed Note the setting of Decimals so that only whole numbers will appear in Data View. Note the informative variable LABELS, which will appear in the output. Note the VALUE LABELS giving the key to the code numbers you have chosen for your grouping variable. (The values themselves are the code numbers you have chosen.)
46 The One-Way ANOVA dialog box
47 More statistics By clicking Options, you can order more statistics than would normally appear in the ANOVA output. Check the Descriptive box to order the extra statistics, then click Continue to return to the ANOVA dialog box.
48 A word of warning Modern computing packages such as SPSS afford a bewildering variety of attractive graphs and displays to help you bring out the most important features of your results. You should certainly use them. But there are pitfalls awaiting the unwary. Suppose the drug experiment had turned out rather differently. The researcher proceeds as follows.
49 Ordering a means plot
50 A picture of the results
51 The picture is false! The table of means shows minuscule differences among the five group means. The value of F is very small indeed. The p-value of F is very high: 1.00 to two decimal places. The experiment has failed to show that any of the drugs works.
52 A small scale view Only a microscopically small section of the scale is shown on the vertical axis. This greatly magnifies even small differences among the group means.
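A small arithmetic sketch in Python of why a truncated vertical axis magnifies differences. It assumes two invented group means that differ by only 0.03 and compares where they are plotted, as a fraction of the axis height, on an axis running from 10.00 to 10.06 versus one starting at zero. The helper name is hypothetical.

```python
def relative_position(value, axis_min, axis_max):
    """Height of `value` as a fraction of the vertical axis (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)


# Two hypothetical group means that differ by only 0.03 (about 0.3%).
low_mean, high_mean = 10.02, 10.05

# Truncated axis, as in the misleading plot: it spans only 10.00 to 10.06.
truncated = (relative_position(high_mean, 10.00, 10.06)
             / relative_position(low_mean, 10.00, 10.06))

# Full axis starting at zero.
full = relative_position(high_mean, 0, 12) / relative_position(low_mean, 0, 12)

print(f"truncated axis: one mean plotted {truncated:.1f} times as high as the other")
print(f"full axis:      one mean plotted {full:.3f} times as high as the other")
```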
53 Putting things right Double-click on the image to get into the Graph Editor. Double-click on the vertical axis to access the scale specifications.
54 Putting things right … Uncheck the minimum value box and enter zero as the desired minimum point. Click Apply.
55 The true picture!
56 The true picture … The effect is dramatic. The profile now reflects the true situation. Always be suspicious of graphs that do not show the complete vertical scale.
57 Summary In the one-way ANOVA, we compare two variance estimates, MS between and MS within, by means of their ratio, which is called the F statistic. If F is large enough to be significant, we conclude that there is at least one significant difference somewhere among the array of treatment means.