1 Analysis of variance (ANOVA): the General Linear Model (GLM). Kazimieras Pukėnas
2 INTRODUCTION Analysis of variance (ANOVA) is used to uncover the main and interaction effects of categorical independent variables (called "factors") on an interval dependent variable. The General Linear Model is "general" in the sense that both regression and ANOVA models can be implemented within it. The GLM Univariate procedure provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables. The factor variables divide the population into groups. Using this GLM procedure, you can test null hypotheses about the effects of other variables on the means of various groupings of a single dependent variable. You can investigate interactions between factors as well as the effects of individual factors. In addition, the effects of covariates and covariate interactions with factors can be included.
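To illustrate why the model is called "general", the sketch below fits a one-way ANOVA as an ordinary least-squares regression on dummy-coded factor levels. It is a plain-Python illustration with a small hypothetical data set, not the SPSS implementation; the helper names `solve_linear` and `anova_via_regression` are hypothetical. The regression F statistic it returns equals the one-way ANOVA ratio MSA/MSW.

```python
# Sketch: one-way ANOVA expressed as a linear regression (the "general" in GLM).
# Plain Python; the data and helper names are hypothetical illustrations.

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def anova_via_regression(groups):
    """Regress y on an intercept plus k-1 dummy variables; return (coefficients, F)."""
    k = len(groups)
    rows, y = [], []
    for i, values in enumerate(groups):
        for v in values:
            # Group 0 is the reference level; dummy j-1 equals 1 when the case is in group j.
            rows.append([1.0] + [1.0 if i == j else 0.0 for j in range(1, k)])
            y.append(float(v))
    p = len(rows[0])
    # Normal equations (X'X) beta = X'y
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * yy for r, yy in zip(rows, y)) for u in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    n = len(y)
    y_bar = sum(y) / n
    ss_model = sum((f - y_bar) ** 2 for f in fitted)            # equals SSA
    ss_resid = sum((yy - f) ** 2 for yy, f in zip(y, fitted))   # equals SSW
    f_stat = (ss_model / (k - 1)) / (ss_resid / (n - k))
    return beta, f_stat


beta, f_stat = anova_via_regression([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(beta, f_stat)  # approximately [2.0, 3.0, 6.0] and 27.0
```

The intercept is the mean of the reference group and each dummy coefficient is that group's difference from it; adding a covariate would simply add another column to the design matrix.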
3 The GLM Multivariate procedure provides analysis of variance for multiple dependent variables by one or more factor variables or covariates. The GLM Repeated Measures procedure provides analysis of variance when the same measurement is made several times on each subject or case. If between-subjects factors are specified, they divide the population into groups. Using this general linear model procedure, you can test null hypotheses about the effects of both the between-subjects factors and the within-subjects factors. You can investigate interactions between factors as well as the effects of individual factors. In addition, the effects of constant covariates and covariate interactions with the between-subjects factors can be included.
4 GLM Univariate, one-way ANOVA One-way ANOVA tests differences in a single interval dependent variable among two, three, or more groups formed by the categories of a single categorical independent variable (factor). Data requirements: in all GLM models, the dependent variable(s) $X_1, \dots, X_k$ must be continuous. The independent variables may be categorical factors (numeric or string) or quantitative covariates. The data are a random sample from a normal population. The variance of the dependent variable(s) is assumed to be the same in every cell formed by the categories of the factor(s). Analysis of variance is robust to departures from normality, although the data should be symmetric. The equal-variance assumption can be checked with a homogeneity-of-variance test such as Levene's test.
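5 GLM Univariate, one-way ANOVA In brief, one-way ANOVA works as follows. The idea of the analysis of variance is to take a summary of the variability in all the observations and partition it into separate sources. The total sum of squares, SST, is partitioned into two separate, additive pieces: the sum of squares among (between) groups, SSA, and the sum of squares within groups, SSW:
$$SST = SSA + SSW,$$
where
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2, \qquad SSA = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2, \qquad SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2.$$
6 GLM Univariate, one-way ANOVA Here $x_{ij}$ - the j-th observation in the i-th group; $\bar{x}$ - the overall mean of all observations; $\bar{x}_i$ - the sample mean of the i-th group; $k$ - the number of independent groups (populations); $n_i$ - the size of the i-th group; $N = \sum_{i=1}^{k} n_i$ - the total number of observations. The mean squares are $MSA = SSA/(k-1)$ and $MSW = SSW/(N-k)$. The ratio $F = MSA/MSW$ measures the statistical significance of the differences among the group means: if the null hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_k$ is true (with homogeneity of variances assumed), then $MSA \approx MSW$ and F is close to 1, whereas real differences among the group means inflate MSA and therefore F.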
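The partition above can be checked numerically. The sketch below computes SST, SSA, SSW, the mean squares and F directly from the definitions, for a small hypothetical data set of three groups; the function name `one_way_anova` is hypothetical.

```python
# Sketch: one-way ANOVA sums of squares computed directly from the definitions.
# The data below are hypothetical.

def one_way_anova(groups):
    """Return SST, SSA, SSW, MSA, MSW and F = MSA/MSW for a list of groups."""
    all_x = [x for g in groups for x in g]
    n_total, k = len(all_x), len(groups)
    grand_mean = sum(all_x) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    sst = sum((x - grand_mean) ** 2 for x in all_x)
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msa = ssa / (k - 1)          # between-groups mean square, k - 1 df
    msw = ssw / (n_total - k)    # within-groups mean square, N - k df
    return {"SST": sst, "SSA": ssa, "SSW": ssw, "MSA": msa, "MSW": msw, "F": msa / msw}


res = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(res)  # SST = 60.0 = SSA (54.0) + SSW (6.0); MSA = 27.0, MSW = 1.0, F = 27.0
```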
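7 The statistical hypotheses under consideration: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$ (all group means are equal) versus $H_1$: not all $\mu_i$ are equal. Decision rule: with $F = MSA/MSW$, the null hypothesis $H_0$ is rejected (not all means are equal) if $F > F_{\alpha}(k-1, N-k)$; $H_0$ is not rejected (no significant difference between the means is detected) if $F \le F_{\alpha}(k-1, N-k)$, where $\alpha$ is the significance level, $F_{\alpha}(k-1, N-k)$ is the upper-$\alpha$ critical value of the F distribution with $k-1$ and $N-k$ degrees of freedom, and N is the total number of observations. Equivalently, $H_0$ is rejected when the p-value (Sig. in SPSS output) is less than $\alpha$.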
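The decision rule can be sketched without a statistics package. The code below evaluates the F distribution through the regularized incomplete beta function (the classical continued-fraction method) and finds the critical value by bisection. It is a self-contained illustration under these assumptions, with hypothetical helper names (`betainc_reg`, `f_sf`, `f_critical`); in practice SPSS or a statistics library computes these values.

```python
# Sketch: F-test decision rule with a hand-rolled F distribution (standard library only).
# Helper names are hypothetical; a statistics library would normally be used instead.
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """P(F > f) for an F(df1, df2) variable, i.e. the p-value of an observed F."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Upper-alpha critical value F_alpha(df1, df2), found by bisection on f_sf."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical one-way example: k = 3 groups, N = 9 observations, observed F = 27
k, n_total, f_obs, alpha = 3, 9, 27.0, 0.05
crit = f_critical(alpha, k - 1, n_total - k)
p_value = f_sf(f_obs, k - 1, n_total - k)
print(f"F = {f_obs}, F_crit = {crit:.3f}, p = {p_value:.4f}")  # F_crit about 5.143, p about 0.001
print("reject H0" if f_obs > crit else "do not reject H0")
```

Comparing F with the critical value and comparing the p-value with $\alpha$ always lead to the same decision, because `f_sf` is decreasing in F.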
8 GLM Univariate, two-way ANOVA Two-way ANOVA analyzes one interval dependent variable in terms of the categories (groups) formed by two independent variables (factors), one of which may be conceived as a control variable, and tests the interaction of the two independent variables. The data requirements are similar to those of one-way ANOVA: the data are a random sample from a normal population; in the population, all cell variances are the same; analysis of variance is robust to departures from normality, although the data should be symmetric.
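9 GLM Univariate, two-way ANOVA The two-way ANOVA tests three hypotheses: the main effect of factor A, the main effect of factor B, and the interaction effect of the two factors. Consider an interval-scale dependent variable with unknown cell means $\mu_{ij}$ and a common unknown variance $\sigma^2$, where $i = 1, \dots, a$ indexes the categories of factor A (a is the number of categories of factor A) and $j = 1, \dots, b$ indexes the categories of factor B (b is the number of categories of factor B). Write $\mu_{i\cdot}$ and $\mu_{\cdot j}$ for the marginal means of the levels of factors A and B, and $\mu$ for the overall mean. We can test the hypotheses:
$H_0^{A}: \mu_{1\cdot} = \mu_{2\cdot} = \dots = \mu_{a\cdot}$, i.e., factor A has no influence on the response variable;
10 GLM Univariate, two-way ANOVA $H_0^{B}: \mu_{\cdot 1} = \mu_{\cdot 2} = \dots = \mu_{\cdot b}$, i.e., factor B has no influence on the response variable;
$H_0^{AB}: \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu = 0$ for all $i, j$, i.e., there is no interaction effect of the two factors.
Each null hypothesis is rejected if $F = MS_{\text{effect}}/MS_{\text{error}} > F_{\alpha}(df_{\text{effect}}, df_{\text{error}})$, where $df_A = a-1$, $df_B = b-1$, $df_{AB} = (a-1)(b-1)$, $df_{\text{error}} = N - ab$, and N is the total number of observations.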
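For a balanced design (the same number n of observations in every cell), the two-way sums of squares have simple closed forms, sketched below with hypothetical data. For balanced designs these coincide with the Type III sums of squares that GLM Univariate uses by default; unbalanced designs need the regression approach of the GLM and are not handled by this sketch. The function name `two_way_anova_balanced` is hypothetical.

```python
# Sketch: two-way ANOVA sums of squares for a *balanced* design.
# data[i][j] holds the observations for level i of factor A and level j of factor B.
# The data and the function name are hypothetical.

def two_way_anova_balanced(data):
    a, b, n = len(data), len(data[0]), len(data[0][0])
    if any(len(c) != n for row in data for c in row):
        raise ValueError("this sketch assumes a balanced design")
    cell = [[sum(c) / n for c in row] for row in data]
    grand = sum(sum(row) for row in cell) / (a * b)
    row_m = [sum(cell[i]) / b for i in range(a)]                       # factor A means
    col_m = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]  # factor B means

    ss_a = b * n * sum((m - grand) ** 2 for m in row_m)
    ss_b = a * n * sum((m - grand) ** 2 for m in col_m)
    ss_ab = n * sum((cell[i][j] - row_m[i] - col_m[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for x in data[i][j])

    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_e}
    ms_e = ss_e / df["Error"]
    f = {eff: (ss[eff] / df[eff]) / ms_e for eff in ("A", "B", "AB")}
    return ss, df, f


data = [[[1, 3], [5, 7]],   # A1: cells (A1,B1), (A1,B2)
        [[3, 5], [7, 9]]]   # A2: cells (A2,B1), (A2,B2)
ss, df, f = two_way_anova_balanced(data)
print(ss)  # {'A': 8.0, 'B': 32.0, 'AB': 0.0, 'Error': 8.0}
print(f)   # {'A': 4.0, 'B': 16.0, 'AB': 0.0}
```

Each F is then compared with $F_{\alpha}(df_{\text{effect}}, df_{\text{error}})$, exactly as in the one-way case.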
11 GLM Univariate, two-way ANOVA The one-way and two-way ANOVA procedures in SPSS are performed in a similar manner, so we give step-by-step instructions for a two-way ANOVA only. Open the data file to be analyzed. From the menus choose: Analyze > General Linear Model > Univariate... In the Univariate dialog box (Fig. 1), select a dependent variable and select variables for Fixed Factor(s), Random Factor(s), and Covariate(s), as appropriate for your data. Covariates are interval-level independent variables. They are commonly used as control variables, so that the main and interaction effects of the categorical factors on a continuous dependent variable are tested while controlling for other continuous variables that covary with the dependent variable.
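What "controlling for a covariate" does to the group means can be sketched for a one-way ANCOVA. Assuming a common (homogeneous) within-group slope, each group mean is moved along that slope to the overall covariate mean. The data and the name `adjusted_means` are hypothetical; this is the classical adjusted-means formula, shown for illustration rather than as the SPSS code.

```python
# Sketch: covariate-adjusted group means in a one-way ANCOVA, assuming a common
# within-group regression slope. Data and the function name are hypothetical.

def adjusted_means(groups):
    """groups: list of (covariate_values, dependent_values) pairs, one per group."""
    all_cov = [c for cov, _ in groups for c in cov]
    cov_grand = sum(all_cov) / len(all_cov)
    sxy = sxx = 0.0
    stats = []
    for cov, dep in groups:
        c_bar, y_bar = sum(cov) / len(cov), sum(dep) / len(dep)
        sxy += sum((c - c_bar) * (y - y_bar) for c, y in zip(cov, dep))
        sxx += sum((c - c_bar) ** 2 for c in cov)
        stats.append((c_bar, y_bar))
    b_w = sxy / sxx  # pooled within-group slope of the dependent on the covariate
    # Adjusted mean = raw group mean moved along the common slope to the grand covariate mean
    return b_w, [y_bar - b_w * (c_bar - cov_grand) for c_bar, y_bar in stats]


groups = [([1, 2, 3], [2, 4, 6]),   # group 1: (covariate, dependent)
          ([3, 4, 5], [5, 7, 9])]   # group 2
slope, adj = adjusted_means(groups)
print(slope, adj)  # 2.0 [6.0, 5.0]; the raw means are 4.0 and 7.0
```

In this made-up example the ordering of the groups reverses once the covariate difference between them is removed, which is exactly why covariates are used as controls.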
13 GLM Univariate, two-way ANOVA Leave the default Full Factorial model in the Univariate: Model dialog box, i.e., you can skip Model... and Contrasts.... Click Plots... and, in the Univariate: Profile Plots dialog box (Fig. 2), specify a plot by selecting a factor for the horizontal axis and, optionally, factors for separate lines and separate plots; the plot must then be added to the Plots list (click Add). A profile plot is a line plot in which each point indicates the estimated marginal mean of a dependent variable at one level of a factor. A profile plot of one factor shows whether the estimated marginal means increase or decrease across levels. For two factors, parallel lines indicate that there is no interaction between the factors, and nonparallel lines indicate an interaction. Click Continue.
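Numerically, "parallel lines" means that the vertical gap between any two lines is the same at every point on the horizontal axis, so every difference of differences (interaction contrast) of the cell means is zero. The sketch below computes these contrasts for hypothetical cell means; the function name `interaction_contrasts` is hypothetical. Sample profile lines are almost never exactly parallel, so whether a departure matters is decided by the F test for the interaction.

```python
# Sketch: what "parallel lines" in a two-factor profile plot means numerically.
# cell_means[i][j] is the mean at level i of the x-axis factor and level j of the
# line factor (hypothetical values). Lines are parallel when every
# difference of differences (interaction contrast) is zero.

def interaction_contrasts(cell_means):
    a, b = len(cell_means), len(cell_means[0])
    return [
        (cell_means[i + 1][j + 1] - cell_means[i][j + 1])
        - (cell_means[i + 1][j] - cell_means[i][j])
        for i in range(a - 1)
        for j in range(b - 1)
    ]


parallel = [[2, 6], [4, 8]]     # gap between the lines is 4 at both x levels
converging = [[2, 6], [4, 4]]   # gap shrinks from 4 to 0
print(interaction_contrasts(parallel))    # [0]
print(interaction_contrasts(converging))  # [-4]
```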
15 GLM Univariate, two-way ANOVA Click Post Hoc... to select post hoc tests in the Univariate: Post Hoc Multiple Comparisons for Observed Means dialog box (Fig. 3). Once you have determined that differences exist among the means and a factor has more than two levels, post hoc range tests and pairwise multiple comparisons can determine which means differ. The Bonferroni test and Tukey's honestly significant difference (HSD) test are commonly used multiple comparison tests. The Bonferroni test, however, becomes increasingly conservative as the number of factor levels grows, because the number of pairwise comparisons grows quickly; for a large number of pairwise comparisons, Tukey's test is more powerful. Move the corresponding variables (factors) into the Post Hoc Tests for box, check Tukey, and click Continue.
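The growing conservativeness of Bonferroni can be seen by counting comparisons: with k levels there are k(k-1)/2 pairs, and each pair is tested at $\alpha$ divided by that number (equivalently, each p-value is multiplied by it, capped at 1). The sketch below uses hypothetical p-values and a hypothetical function name `bonferroni`.

```python
# Sketch: Bonferroni adjustment for all pairwise comparisons among k factor levels.
# The p-values and the function name are hypothetical.
from itertools import combinations


def bonferroni(p_values, alpha=0.05):
    """Multiply each p-value by the number of tests (capped at 1) and compare with alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]


for k in (3, 4, 6, 10):
    m = len(list(combinations(range(k), 2)))  # k(k-1)/2 pairwise comparisons
    print(f"k = {k:2d}: {m:2d} comparisons, per-comparison alpha = {0.05 / m:.4f}")
# per-comparison alpha: 0.0167, 0.0083, 0.0033, 0.0011

adj, significant = bonferroni([0.01, 0.02, 0.04])
print(adj, significant)  # only the first comparison remains significant
```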
16 GLM Univariate, two-way ANOVA Fig. 3. Univariate: Post Hoc Multiple Comparisons for Observed Means dialog box
17 GLM Univariate, two-way ANOVA Click Options... At the top of the Univariate: Options dialog box (Fig. 4) you can ask for Estimated Marginal Means to be displayed by moving the variables (factors and interactions) to the right-hand Display Means for box. This is mainly used when you want to adjust the means to remove the effect of a covariate. Without a covariate, in a one-way design or a balanced factorial design (equal cell sizes), the Estimated Marginal Means are the same as the observed sample means, which are displayed with the Descriptive Statistics option at the bottom of the Univariate: Options dialog box. In an unbalanced factorial design, however, the estimated marginal means are unweighted averages of the cell means and can differ from the observed marginal means.
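The difference between estimated and observed marginal means in an unbalanced design can be seen with a tiny hypothetical 2 × 2 data set: the estimated marginal mean gives every cell equal weight, while the observed marginal mean gives every observation equal weight. The function name `marginal_means` is hypothetical.

```python
# Sketch: estimated marginal means (EMMs) vs. observed marginal means in an
# unbalanced two-factor design without covariates. Data are hypothetical.

cells = {                      # (level of A, level of B) -> observations
    ("A1", "B1"): [10, 10, 10],
    ("A1", "B2"): [20],
    ("A2", "B1"): [10],
    ("A2", "B2"): [20, 20, 20],
}


def marginal_means(cells, level):
    own = [obs for (a, _), obs in cells.items() if a == level]
    cell_means = [sum(obs) / len(obs) for obs in own]
    emm = sum(cell_means) / len(cell_means)   # every cell counts equally
    pooled = [x for obs in own for x in obs]
    observed = sum(pooled) / len(pooled)      # every observation counts equally
    return emm, observed


for level in ("A1", "A2"):
    emm, observed = marginal_means(cells, level)
    print(level, "EMM =", emm, "observed mean =", observed)
# A1 EMM = 15.0 observed mean = 12.5
# A2 EMM = 15.0 observed mean = 17.5
```

Here the observed means suggest a difference between A1 and A2 that is produced entirely by the unequal cell sizes; the estimated marginal means remove it.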
19 GLM Univariate, two-way ANOVA Check Estimates of effect size. This option gives a Partial Eta-Squared value for each effect and each parameter estimate. Partial eta-squared is the proportion of variability attributable to a factor after the variability explained by the other effects has been excluded, $SS_{\text{effect}}/(SS_{\text{effect}} + SS_{\text{error}})$; the classical eta-squared, $SS_{\text{effect}}/SS_{\text{total}}$, is the proportion of total variability attributable to the factor. Check Observed power. Observed power is the probability that the test would reject $H_0$ at the chosen significance level in a sample of the same size, assuming that the population effect equals the effect observed in the sample. In general, power is the probability of correctly rejecting a false statistical null hypothesis and equals $1-\beta$, where $\beta$ is the probability of a Type II error. Conventionally, a test with power greater than 0.8 ($\beta \le 0.2$) is considered sufficiently powerful. Check Homogeneity tests. This option produces Levene's test of the homogeneity of variance for each dependent variable across all level combinations of the between-subjects factors (for between-subjects factors only).
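Both effect-size measures follow directly from the ANOVA table. The sketch below uses hypothetical sums of squares from a balanced 2 × 2 design and also shows the equivalent form of partial eta-squared in terms of F and its degrees of freedom; the function names are hypothetical.

```python
# Sketch: eta-squared and partial eta-squared from an ANOVA table.
# Hypothetical balanced 2 x 2 design: SS_A = 8, SS_B = 32, SS_AB = 0, SS_error = 8,
# each effect with 1 df and 4 error df (so F_A = 4 and F_B = 16).

def eta_squared(ss_effect, ss_total):
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f, df_effect, df_error):
    # Equivalent form using only the F statistic and its degrees of freedom
    return f * df_effect / (f * df_effect + df_error)


ss = {"A": 8.0, "B": 32.0, "AB": 0.0, "Error": 8.0}
ss_total = sum(ss.values())
for effect in ("A", "B", "AB"):
    print(effect, round(eta_squared(ss[effect], ss_total), 3),
          round(partial_eta_squared(ss[effect], ss["Error"]), 3))
print(partial_eta_squared_from_f(4.0, 1, 4))  # 0.5, the same as partial eta-squared for A
```

Because each partial eta-squared has its own denominator, the values for different effects (here 0.5 for A and 0.8 for B) need not sum to 1 or less.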
20 Example Data are gathered for individual swimmers in the senior swimming championship over several years. The time in which each swimmer finishes is the dependent variable. The factors are the date of the championship and age (categorical). You might find that age and championship date have significant main effects and that the interaction of age with date is significant. It is assumed that different individuals participated in different championships, i.e., the samples are independent. A fragment of the data file is shown in Fig. 5. The following basic tables are obtained from the GLM Univariate output.
21 Example Fig. 5. Data View
22 Example The Between-Subjects Factors table (Fig. 6) contains general information about the independent variables (factors). SPSS computes Levene's test of homogeneity of variance to check the GLM Univariate assumption that each group (cell) formed by the independent variables has the same variance. In our example, the p-value of Levene's test is greater than the significance level (0.05), as shown in the Levene's Test of Equality of Error Variances table (Fig. 6), so the homogeneity-of-variance assumption is met. Note that Levene's test is relatively robust to departures from normality.
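Levene's statistic is simply a one-way ANOVA F computed on absolute deviations from each group's own mean, where the groups are the cells formed by the factor-level combinations. The sketch below implements this original, mean-centred form with hypothetical data; a median-centred variant also exists, and which variant a given SPSS version reports should be checked in its documentation. The function names are hypothetical.

```python
# Sketch: Levene's test statistic (original mean-centred form), i.e. a one-way ANOVA F
# on absolute deviations from each group's mean. Data and names are hypothetical.

def one_way_f(groups):
    all_x = [x for g in groups for x in g]
    n, k = len(all_x), len(groups)
    grand = sum(all_x) / n
    means = [sum(g) / len(g) for g in groups]
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssa / (k - 1)) / (ssw / (n - k))


def levene_w(groups):
    deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        deviations.append([abs(x - mean) for x in g])
    return one_way_f(deviations)  # compared with the F(k - 1, N - k) distribution


print(levene_w([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0.0: identical spreads
print(levene_w([[4, 5, 6], [0, 5, 10]]))            # larger: second group is more spread out
```

A large W (small p-value) signals unequal variances; in the swimming example the p-value exceeds 0.05, so the assumption is not rejected.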
23 Example Fig. 6. The main outputs of GLM Univariate
24 Example The Tests of Between-Subjects Effects table (Fig. 7) gives information about the main and interaction effects. For the Age main effect, Sig. = 0.000 (i.e., p < 0.001), with a Partial Eta-Squared effect size of 0.300 and the Observed Power reported in the table. Since p < 0.05, we reject H0: there is a significant Age main effect on the dependent variable, Time. For the Championship main effect, Sig. = 0.000 as well, with a Partial Eta-Squared effect size of 0.475 and the Observed Power reported in the table. Since p < 0.05, we reject H0: there is a significant Championship main effect on the dependent variable, Time. Finally, for the Age*Championship interaction, p < 0.05, so we reject H0: there is a significant interaction between Age and Championship.
25 Example Fig. 7. The main outputs of GLM Univariate
26 Example The Estimated Marginal Means table (Fig. 8) shows the mean of the dependent variable (Time) for each level of Age and Championship, along with the standard error of each mean. The Post Hoc Tests Multiple Comparisons table (Fig. 9) for the Tukey test displays all pairwise comparisons between the groups of the independent variable Age. Significant differences in Time were found between the age groups … years and … years, and also between the age groups … years and … years. No significant difference was found between the age groups … years and … years. Each comparison appears twice in the table (group I versus group J and group J versus group I), so every result is repeated.
27 Fig. 8. The main outputs of GLM Univariate
28 Example Fig. 9. The main outputs of GLM Univariate: Post Hoc Tests, Age
29 Example The Homogeneous Subsets table (Fig. 10) also shows that there are two significantly different homogeneous subsets. Similar results were obtained across the levels of the second independent variable (factor), Championship (not shown here).
30 Example Fig. 10. The main outputs of GLM Univariate: Homogeneous Subsets
31 Example Profile plots are an easy way to visualize the relationship of the factors to the dependent variable and to each other. The Estimated Marginal Means profile plot (Fig. 11) shows the marginal means of the continuous dependent variable Time as separate lines for the levels of the factor Championship, with the levels of the other factor, Age, on the X axis (the Y axis shows the magnitude of the mean). The profile plot lines are not parallel and the curves differ clearly in shape, which suggests an interaction between Championship and Age; the final conclusion, however, is based on the Tests of Between-Subjects Effects table.
32 Example Fig. 11. The main outputs of GLM Univariate