1This afternoon’s programme 2.05 – A short talk. 3.00 – 3.20 A break for coffee. 3.20 – Running tests with PASW Statistics 17.
2SESSION 2 Further topics in the analysis of variance
3Only the starting-point In Monday’s session, I revised the one-way ANOVA. We saw that merely obtaining a significant F and therefore rejecting the null hypothesis of equality of the means leaves many questions unanswered. Therefore, the ANOVA is just the first step in the complete analysis of a set of data.
4Does ‘significant’ mean ‘substantial’? The F test produced a significant result. The null hypothesis of equality of the five treatment means must be rejected. With large numbers of observations, however, a statistical test can have too much POWER to reject the null hypothesis. Even tiny differences among the means will result in a significant F, with a minuscule p-value. Modern Internet studies can yield millions of observations, so the possibility of having too much data is no longer remote. ‘Significant’ does not necessarily mean ‘substantial’.
6Breakdown (or partition) of the total sum of squares
7Eta squared The oldest measure of effect size is suggested by the partition of the total sum of squares. In this measure, the between groups sum of squares is expressed as a PROPORTION of the total sum of squares.
9Range of eta squared Theoretically, eta squared can take values between zero (no differences among the group means) and unity (within each group, all the scores have the same value). In practice, its values will always lie somewhere between these limits.
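To make the definition concrete, here is a minimal sketch that partitions the total sum of squares for a one-way layout and expresses the between groups part as a proportion of the total. The scores are hypothetical (they are not the drug-experiment data) and the function name is only illustrative.

```python
from statistics import mean

def eta_squared(groups):
    """Return (ss_between, ss_within, ss_total, eta_sq) for a one-way layout."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Total: squared deviations of every score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Between groups: squared deviations of group means, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: squared deviations of scores from their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between, ss_within, ss_total, ss_between / ss_total

# Hypothetical scores for three groups (assumed for illustration only).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ss_b, ss_w, ss_t, eta_sq = eta_squared(groups)
print(ss_b, ss_w, ss_t, eta_sq)
```

With these made-up scores the between and within parts add up to the total, and eta squared is the between part divided by that total.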
10Why is eta called the ‘correlation ratio’? Suppose that opposite each of the 50 scores in the one-way drug experiment, we were to place its group mean. The correlation between the column of scores and the column of means gives the value of eta. Let’s demonstrate this.
11The Aggregate procedure In SPSS/PASW, the Aggregate procedure places opposite each score a value (such as the mean – but other statistics can be chosen) which summarises the scores in the group. The grouping variable (in this case Drug Condition) is specified as the BREAK VARIABLE. The participant’s score (the DV) is the VARIABLE TO BE SUMMARISED.
19Eta is the correlation between the scores and their group means Eta is the correlation between the scores and their group means. The square of the correlation between the scores and the group means is eta squared.
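The same demonstration can be sketched outside PASW. The snippet below places each score's group mean beside it (the job the Aggregate procedure does), correlates the two columns, and checks that the squared correlation matches the between groups sum of squares as a proportion of the total. The data and the helper `pearson_r` are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for three groups (not the 50 scores of the drug experiment).
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
scores = [x for g in groups for x in g]
# One group mean beside each score, as Aggregate would produce.
group_means = [mean(g) for g in groups for _ in g]

eta = pearson_r(scores, group_means)

# Eta squared computed directly from the sums of squares, for comparison.
grand = mean(scores)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_total = sum((x - grand) ** 2 for x in scores)
print(eta, eta ** 2, ss_between / ss_total)
```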
21Other measures of effect size Several other measures of effect size have been proposed. One of these, Cohen’s f, is used as input for G*Power, a useful package for answering questions about the numbers of participants you would need in a planned study to achieve sufficient power in your statistical tests.
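For the one-way design, Cohen’s f can be obtained from eta squared by a standard conversion. The transcript does not show a formula at this point, so take the following as a reminder of the textbook relationship rather than as the slide's own notation.

```latex
f = \sqrt{\frac{\eta^2}{1 - \eta^2}}
```

So an eta squared of .20, for instance, corresponds to f = .50.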
26Positive bias Eta squared is positively biased as an estimate of effect size. Were the experiment to be repeated many times, the long run average or EXPECTED VALUE of eta squared would be higher than the population value.
27Omega squared For some ANOVA designs, there is available a statistic called omega squared, which corrects for positive bias. But the application of omega squared is problematic in complex designs with repeated measures factors. PASW Statistics 17 does not offer omega squared.
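Since PASW does not report it, omega squared has to be computed by hand from the ANOVA summary table. The sketch below uses the usual textbook formula for the one-way between subjects (fixed effects) design. That choice of formula is an assumption on my part, as the transcript gives none, and the input values are hypothetical.

```python
def omega_squared(ss_between, ss_within, k, n_total):
    """Omega squared for a one-way between subjects design.

    k is the number of groups; n_total is the total number of scores.
    """
    ms_within = ss_within / (n_total - k)   # error mean square
    ss_total = ss_between + ss_within
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)

# Hypothetical summary-table values: 3 groups, 9 scores in all.
ss_between, ss_within = 54.0, 6.0
eta_sq = ss_between / (ss_between + ss_within)
omega_sq = omega_squared(ss_between, ss_within, k=3, n_total=9)
print(eta_sq, omega_sq)   # omega squared comes out smaller than eta squared
```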
28Interpreting values of eta squared and Cohen’s f
29Multiple comparisons When there are three or more groups, the rejection of the null hypothesis leaves many important questions unanswered. Is the mean for the Placebo group significantly different from that of the Drug D group? Is it significantly different from the Drug C group?
30Planned contrasts On Monday, I discussed the making of specific PLANNED comparisons, simple and complex, among the individual treatment means.
31Unplanned or ‘post hoc’ tests Often, however, after we have the results of an experiment, we shall want to do some DATA-SNOOPING – i.e., run unplanned statistical tests of differences among the individual treatment means. Such unplanned tests are known (solecistically) as POST HOC tests. The following points apply both to planned and unplanned comparisons.
33The critical region A small arbitrary probability (usually .05) known as the SIGNIFICANCE LEVEL is fixed in advance. The CRITICAL REGION is a range of values such that, assuming that the null hypothesis is true, the probability that t will fall inside the range is less than or equal to the fixed significance level.
35Type I errors If the significance level α is set at 0.05, any p-value less than 0.05 will result in the rejection of the null hypothesis (H0). If H0 is true, it will be wrongly rejected on 5% of occasions with repeated sampling. A false rejection of the null hypothesis is known as a ‘Type I’ error, and the significance level is therefore also known as the ‘Type I’ or ‘alpha’ error rate.
36Type I error rate per comparison Suppose we gather our data and test the differences among pairs of means for significance. We make several of these tests, setting the significance level at .05 each time and rejecting the null hypothesis whenever the p-value is less than .05. This significance level (.05) is known as the Type I error rate PER COMPARISON.
37An array of ten treatment means Suppose, to make my next point more strongly, I have an array of ten treatment means and want to make comparisons between, say, the Placebo mean and each of nine drug means. I set the type I error rate per comparison at .05. Suppose the null hypothesis is true – in the population, the means all have the same value. In the population, the profile is a pancake.
38The type I error rate per family What is the probability that AT LEAST ONE test will show significance, even if the null hypothesis is true? This is known as the PER FAMILY or FAMILYWISE type I error rate. The older term EXPERIMENTWISE has a similar meaning.
39The familywise type I error rate is unacceptably high! Were I to make 9 INDEPENDENT tests of differences among the 10 means, the familywise type I error rate would be nearly .4 – FORTY PER CENT! (The probability of at least one type I error is 1 minus the probability of none: 1 − .95^9 ≈ .37.)
40Capitalising upon chance With a large array of treatment means, we might decide to make a large number of comparisons. Even if the null hypothesis is true, the familywise Type I error rate might be 0.90 or even higher! Failure to take the heightened probability of the familywise Type I error into account when making sets of comparisons is known as CAPITALISING UPON CHANCE.
41Familywise type I error rate with unplanned pairwise comparisons Suppose we decide to make every possible pairwise comparison. Assume, for simplicity, that the comparisons are independent. The number of possible pairings from ten means is 45. If the per comparison error rate is fixed at .05, the familywise type I error rate is in the region of .90 (that is, 1 − .95^45).
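The arithmetic behind these familywise rates is easy to check. This is a minimal sketch under the same simplifying assumption of independent tests. The function name is illustrative.

```python
from math import comb

def familywise_rate(alpha, n_tests):
    """P(at least one Type I error) across n_tests independent tests."""
    # 1 minus the probability that none of the tests rejects a true H0.
    return 1 - (1 - alpha) ** n_tests

# Placebo compared with each of nine drug means.
print(round(familywise_rate(0.05, 9), 3))

# Every possible pairwise comparison among ten means.
n_pairs = comb(10, 2)
print(n_pairs)
print(round(familywise_rate(0.05, n_pairs), 3))
```

Real pairwise comparisons share means and so are not independent. The figures are therefore a rough guide to how quickly the rate climbs, not exact values.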
42Conservative tests A CONSERVATIVE TEST adjusts the p-value per comparison upwards in order to control the familywise Type I error rate. This is equivalent to setting the per comparison significance level at a lower value than the traditional significance level. There are many different approaches to the making of conservative tests to avoid capitalising upon chance.
43The Bonferroni correction The Bonferroni correction was originally applied to PLANNED comparisons. You plan to make k contrasts among a set of means. You want to keep the per family Type I error rate at the .05 level approximately. You achieve this by multiplying the obtained p-value for each value of t by k.
44Example Returning to our drug experiment, we planned to make four simple contrasts. We want to control the familywise Type I error rate and keep it at the .05 level.
45Results of the four simple contrasts Double-click to get into the editor and use Cell properties to display more places of decimals for the second contrast.
46Applying the Bonferroni correction Multiply the given p-value by 4, the number of planned simple contrasts. For the second contrast, the corrected p-value is .024, four times the tabled value. Report the Bonferroni-corrected p-values rather than the values given in the table. Write that the given p-values have been Bonferroni-corrected. For the second contrast (upper half of the table), write: “t(45) = 2.88; p = .024 (Bonferroni-corrected)”.
47Bonferroni correction for post hoc comparisons You must assume that ALL POSSIBLE pairwise comparisons will be made. If you have ten treatment means, there are 45 possible pairs. So the p-value for each test must be multiplied by 45. Equivalently, the per comparison significance level must be set at .05/45 ≈ .0011. That’s a tough criterion for significance.
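Both forms of the Bonferroni correction can be sketched in a few lines. The uncorrected p-value of .006 used below is an assumption: it is simply the reported corrected value (.024) divided by the four planned contrasts, not a figure read from the output table.

```python
def bonferroni_p(p_value, n_comparisons):
    """Corrected p-value: multiply by the family size, capped at 1."""
    return min(1.0, p_value * n_comparisons)

def bonferroni_alpha(alpha, n_comparisons):
    """Equivalent per comparison significance level."""
    return alpha / n_comparisons

# Planned case: four simple contrasts (uncorrected p assumed to be .006).
corrected = bonferroni_p(0.006, 4)
print(round(corrected, 3))

# Post hoc case: all 45 pairs among ten means.
per_comparison = bonferroni_alpha(0.05, 45)
print(round(per_comparison, 4))
```

The cap at 1 matters only for large uncorrected p-values, where multiplying by the family size would otherwise give an impossible probability.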
49Which one? The Bonferroni is the most conservative of these tests. With a large array of means it’s almost impossible to get anything significant. For between subjects experiments, the Tukey test is preferred. (The Tukey B test is less conservative.) The LSD (least significant difference) test makes no correction; but the test is made only if the ANOVA F value is significant. The Dunnett is the most powerful conservative test, but it is suitable only for the situation where you are comparing the mean of the controls with each of the other treatment means, that is, when you are making simple comparisons. The Scheffé test is good for unplanned complex comparisons.
51Factorial experiments In a FACTORIAL experiment, there are two or more treatment factors. The ANOVA really comes into its own when it is applied to the analysis of data from factorial experiments.
52Types of ANOVA design The three most common types of factorial ANOVA design are: BETWEEN SUBJECTS FACTORIAL designs, in which ALL factors are between subjects. WITHIN SUBJECTS FACTORIAL designs, in which ALL factors are within subjects. MIXED FACTORIAL designs, in which SOME factors are between subjects and some are within subjects.
53A two-factor between subjects factorial experiment Suppose that a researcher has been commissioned to investigate the effects upon simulated driving performance of two new anti-hay fever drugs, A and B. It is suspected that at least one of the drugs may have different effects upon fresh and tired drivers, and the firm developing the drugs needs to ensure that neither drug has an adverse effect upon driving performance in any circumstances. The researcher decides to carry out a two-factor between subjects factorial experiment, in which the factors are: Drug Treatment, with levels Placebo, Drug A and Drug B; Alertness, with levels Fresh and Tired.
55Main effects and interactions A factor is said to have a MAIN EFFECT if, in the population, there are differences among the means at its different levels, ignoring any other factors in the design. A main effect is indicated by differences among the MARGINAL means. In factorial experiments, interest usually centres not on main effects, but on the interplay among the treatment factors, that is, upon INTERACTIONS.
57Observations Main effects are evident in the MARGINAL TOTALS. Not surprisingly the Fresh participants outperformed the Tired participants. Looks as if the Alertness factor has a main effect. Performance was higher in the Drug B group, suggesting a main effect of the Drug Treatment factor as well. But the CELL means are the main focus of interest, because certain patterns in those indicate the presence of an INTERACTION.
58Profile plots You find that the F test for an interaction is significant. What does this mean? The next step is to examine the appropriate profile plots. More than one plot is possible: your choice depends upon which factor is of principal interest.
61Nonparallel profiles In neither plot are the profiles parallel. In the first profile, the factor of Alertness seems to reverse its effect at different levels of the Drug factor. In fact, Drug A actually depressed the performance of the Fresh participants. In the second profile, the ordering of the means at the three levels of the Drug factor changes from level to level of the Alertness factor. Nonparallelism of the profiles indicates the presence of an interaction.
62Simple main effects A factor is said to have a SIMPLE MAIN EFFECT when there are differences among its means at a specific level of another factor. In the first profile plot, the Alertness factor would seem to have simple main effects at all three levels of the Drug factor.
63Interaction A two-factor INTERACTION between two factors is said to occur when the simple main effects of one factor are not homogeneous across all levels of the other factor. The simple main effects of the Alertness factor are not the same across all levels of the Drug factor. It would appear that a two-factor interaction may be present in these data.
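The idea of non-homogeneous simple main effects can be illustrated descriptively from a table of cell means. The means below are hypothetical (the values on the slides are not reproduced in this transcript), chosen only so that the effect of Alertness reverses under Drug A.

```python
# Hypothetical cell means, assumed for illustration only.
cell_means = {
    ("Placebo", "Fresh"): 20.0, ("Placebo", "Tired"): 12.0,
    ("Drug A", "Fresh"): 14.0, ("Drug A", "Tired"): 15.0,
    ("Drug B", "Fresh"): 24.0, ("Drug B", "Tired"): 22.0,
}
drugs = ["Placebo", "Drug A", "Drug B"]

# Simple effect of Alertness at each level of Drug: Fresh mean minus Tired mean.
simple_effects = {
    d: cell_means[(d, "Fresh")] - cell_means[(d, "Tired")] for d in drugs
}
print(simple_effects)

# Parallel profiles would give the same difference at every Drug level.
profiles_parallel = len(set(simple_effects.values())) == 1
print(profiles_parallel)
```

This is only a description of the pattern in the means. Whether the differences are significant is a matter for the interaction F test and the tests of simple main effects.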
64Partition of the total sum of squares in the two-way ANOVA A and B are the two treatment factors, and AB is their interaction.
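As a sketch of that partition, the code below splits the total sum of squares for a small balanced two-factor design into the A, B, AB and within cells (error) components and confirms that they add up to the total. The data are hypothetical, and the approach assumes equal numbers of scores in every cell.

```python
from statistics import mean

# Hypothetical balanced data: data[a_level][b_level] is the list of cell scores.
data = {
    "a1": {"b1": [3, 5], "b2": [6, 8]},
    "a2": {"b1": [7, 9], "b2": [4, 6]},
}

def two_way_partition(data):
    a_levels = list(data)
    b_levels = list(data[a_levels[0]])
    n = len(data[a_levels[0]][b_levels[0]])  # common cell size (balanced design assumed)
    scores = [x for a in a_levels for b in b_levels for x in data[a][b]]
    grand = mean(scores)

    ss_total = sum((x - grand) ** 2 for x in scores)
    # Main effects: deviations of the marginal means from the grand mean.
    ss_a = sum(
        n * len(b_levels)
        * (mean([x for b in b_levels for x in data[a][b]]) - grand) ** 2
        for a in a_levels
    )
    ss_b = sum(
        n * len(a_levels)
        * (mean([x for a in a_levels for x in data[a][b]]) - grand) ** 2
        for b in b_levels
    )
    # Between cells variation; what remains after A and B is the interaction.
    ss_cells = sum(
        n * (mean(data[a][b]) - grand) ** 2 for a in a_levels for b in b_levels
    )
    ss_ab = ss_cells - ss_a - ss_b
    # Within cells (error): deviations of scores from their own cell mean.
    ss_within = sum(
        (x - mean(data[a][b])) ** 2
        for a in a_levels for b in b_levels for x in data[a][b]
    )
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "within": ss_within, "total": ss_total}

parts = two_way_partition(data)
print(parts)
```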
68Effect size in factorial experiments A controversial area. The measure known as COMPLETE ETA SQUARED expresses the contribution of a source (whether a main effect or an interaction) to the total variance in the presence of all other treatment or group sources. The measure known as PARTIAL ETA SQUARED excludes all other treatment or group sources.
69Complete eta squared Expresses the variance attributable to a source in terms of the TOTAL variance.
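In symbols, the two measures are usually written as follows. This is the common textbook notation rather than anything shown in the transcript: "effect" stands for whichever source (A, B or AB) is being assessed, and "error" for the within cells term.

```latex
\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}},
\qquad
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

Because the denominator of the partial measure leaves out the other treatment sources, partial eta squared is never smaller than complete eta squared for the same source.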
74When the interaction is significant We shall often want to ‘unpack’ a significant interaction by testing for simple main effects and making multiple comparisons among the individual treatment means.
75Simple effects with PASW Simple effects are not an option in the ANOVA dialog windows. It is easy to run simple effects on PASW, but we must use SYNTAX to achieve this. A small problem is that we must use, not the ANOVA syntax command, but the command for what is known as Multivariate Analysis of Variance or MANOVA. But it really is VERY EASY to do this.
76Multivariate analysis of variance (MANOVA) In the ANOVA, there is just ONE dependent variable. Multivariate Analysis of Variance (MANOVA) is a generalisation of the ANOVA to the analysis of data from experiments of ANOVA design with two or more DVs. We can, therefore, regard the ANOVA as a special case of the MANOVA.
77Using MANOVA to run ANOVA If there is only one DV, running the MANOVA procedure will run a univariate ANOVA and produce the usual ANOVA summary table. Why bother? Tests for simple effects are options in the MANOVA command. They are not available in other PASW commands.
88The need for a smaller comparison ‘family’ An interaction is significant. We want to make unplanned or post hoc multiple comparisons among the treatment means. But there may be many cells in the design, so that the critical difference for significance may be impossibly large. In terms of the Bonferroni test, you could be multiplying the p-value by a large factor or setting the per comparison significance level at a tiny value. We need to justify making the comparisons among a smaller array of means.
89First, we test for simple main effects We might argue that if we have a significant simple main effect of the Drug factor at one level of Body State or Alertness (Fresh, say), we can define the comparison family in relation to the means at that level only. This will produce a less conservative test. When testing for simple main effects, however, we should use the Bonferroni correction to control the familywise Type I error rate. In our example, since there are two simple main effects, the criterion for significance should be that p is less than 0.025, rather than 0.05.
90Reduce the data set. There is more than one way of making the multiple comparisons. You can easily run a one-way ANOVA on the data from the scores of the fresh participants only, then ask for a Tukey test.
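The data-reduction step can be sketched as follows: keep only the Fresh participants, then compute the one-way F ratio across the three Drug groups. The scores are hypothetical. The Tukey test itself is not attempted here, because it needs the studentized range distribution, which the Python standard library does not provide.

```python
from statistics import mean

# Hypothetical rows of (drug, alertness, score), assumed for illustration only.
rows = [
    ("Placebo", "Fresh", 20), ("Placebo", "Fresh", 22),
    ("Drug A", "Fresh", 13), ("Drug A", "Fresh", 15),
    ("Drug B", "Fresh", 24), ("Drug B", "Fresh", 26),
    ("Placebo", "Tired", 12), ("Drug A", "Tired", 15), ("Drug B", "Tired", 22),
]

# Reduce the data set: select the Fresh cases and group them by drug.
fresh = {}
for drug, alertness, score in rows:
    if alertness == "Fresh":
        fresh.setdefault(drug, []).append(score)

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    k, n_total = len(groups), len(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

f_ratio, df_b, df_w = one_way_f(list(fresh.values()))
print(f_ratio, df_b, df_w)
```

Note that the error term here comes from the Fresh cases alone, so its degrees of freedom are fewer than those of the error term in the full two-way analysis.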
95Summary A report of an ANOVA F test should be accompanied by a measure of effect size, such as eta squared, omega squared or Cohen’s f. Follow Lisa DeBruine’s guidelines. Beware of capitalising upon chance: follow-up tests should be conservative. When unpacking significant interactions, use syntax to test for simple main effects. The obtaining of a significant simple main effect can be an argument for a smaller comparison ‘family’.
96Further reading For a thorough and readable coverage of elementary (and not so elementary) statistics, I recommend … Howell, D. C. (2007). Statistical methods for psychology (6th ed.). Belmont, CA: Thomson/Wadsworth.
97For PASW/SPSS Kinnear, P. R., & Gray, C. D. (2009). PASW 17 for Windows Made Simple. Hove and New York: Psychology Press. In addition to practical advice about using PASW Statistics 17, we also offer informal explanations of many of the techniques.