Two or more factors in analysis of variance: factorial and nested designs
Factorial design: each level of the first factor is combined with each level of the second one. With two levels per factor, 2 factors give 4 combinations and 3 factors give 8 combinations. In general, the number of combinations is the product of the numbers of levels of all factors.
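A minimal Python sketch of the counting rule, assuming three two-level factors named after the example on the next slide (the level names are illustrative): `itertools.product` enumerates all combinations, and their number equals the product of the numbers of levels.

```python
from itertools import product
from math import prod

# Hypothetical two-level factors (names taken from the example in the text)
factors = {
    "mowing": ["non-mown", "mown"],
    "fertilization": ["non-fertilized", "fertilized"],
    "dominant": ["kept", "removed"],
}

# Every level of each factor is combined with every level of all the others
combinations = list(product(*factors.values()))

# The number of combinations is the product of the numbers of levels
n_combinations = prod(len(levels) for levels in factors.values())
assert len(combinations) == n_combinations

for combo in combinations:
    print(combo)
print(n_combinations)  # 8
```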
Example factors: mowing, fertilization, removal of the dominant species. Usually each combination is replicated several times.
Factorial designs in the field; factors: shape and pattern
Another possibility: nested design. Plant 1 from the first locality has nothing in common with plant 1 from any other locality. [Diagram: factor A (locality) → factor C (plant) → single observations]
Proportional design: each level of one factor has the same proportion of replications at every level of the other factor; the χ² statistic of the contingency table of numbers of replications equals zero, i.e. the factors are completely independent. Ideally all combinations have the same number of observations, but a proportional design is sufficient:
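So, for example, for the non-fertilized, non-mown combination the number of replications is $n_{11} = \frac{n_{1\cdot}\, n_{\cdot 1}}{n}$, the expected frequency in a contingency table ($n_{ij} = n_{i\cdot}\, n_{\cdot j} / n$). That is, each level of the first factor is represented in the same proportion at every level of the second factor; then we consider the factors independent.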
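The proportionality check can be sketched as follows, assuming hypothetical 2×2 tables of replication counts (rows: mowing, columns: fertilization). The helpers `expected_counts` and `chi_square` are written for this example, not taken from any package; χ² is exactly zero only for the proportional table.

```python
import numpy as np

def expected_counts(counts):
    """Expected replications per cell if the factors were independent:
    (row total * column total) / grand total."""
    counts = np.asarray(counts, dtype=float)
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    return row * col / counts.sum()

def chi_square(counts):
    """Pearson chi-square of the table of replication counts."""
    counts = np.asarray(counts, dtype=float)
    exp = expected_counts(counts)
    return float(((counts - exp) ** 2 / exp).sum())

# Hypothetical tables of replications: rows = mowing (no, yes),
# columns = fertilization (no, yes)
proportional = [[4, 8],
                [2, 4]]
non_proportional = [[4, 8],
                    [4, 2]]

print(chi_square(proportional))      # 0.0 -> factors independent
print(chi_square(non_proportional))  # > 0 -> design not proportional
```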
When the factors are "independent" and the design is balanced. [Table: balanced design, weights of rats]
When the factors are "independent" and the design is proportional. [Table: proportional design, weights of rats]
When the factors are "dependent", i.e. the design is neither balanced nor proportional. Judging from the marginal means, it seems that listening to music affects the weight of rats. (Some methods can partly cope with this, e.g. LS means, but the power of the test is lowered for both factors.) [Table: non-proportional design, weights of rats]
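To see how a non-proportional design can mislead, here is a sketch with invented rat weights. It assumes, hypothetically, that the second factor is sex and that music has no effect within either sex; because most rats in the music group are males, the marginal means differ, while the LS means (here simply the unweighted means of the cell means) do not.

```python
import numpy as np

# Hypothetical rat weights (g). Sex is assumed as the second factor;
# within each sex, music has no effect at all.
data = {
    # (music, sex): weights
    ("no", "F"): [200.0] * 8,
    ("no", "M"): [300.0] * 2,
    ("yes", "F"): [200.0] * 2,
    ("yes", "M"): [300.0] * 8,
}

def marginal_mean(music):
    """Mean of all observations at one music level (weighted by cell sizes)."""
    values = [w for (m, _), ws in data.items() if m == music for w in ws]
    return float(np.mean(values))

def ls_mean(music):
    """Unweighted mean of the cell means at one music level."""
    cell_means = [np.mean(ws) for (m, _), ws in data.items() if m == music]
    return float(np.mean(cell_means))

for level in ("no", "yes"):
    print(level, marginal_mean(level), ls_mean(level))
# no 220.0 250.0
# yes 280.0 250.0
```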
Statistica can compute anything, but: if the design is proportional, the result should always be the same whichever type of sums of squares is used. Two-way ANOVA can be computed even for a non-proportional design. The default there (Type III sums of squares, orthogonal) is all right, but depending on the experimental situation I can choose another type myself (perhaps Type I, sequential), and I should know what each type means (and why the results differ).
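Why the type of sums of squares matters only for non-proportional designs can be checked with a small least-squares sketch. It assumes a main-effects model with two two-level factors, dummy coding, and a simulated response in which only factor B has an effect; the Type I (sequential) SS of A are computed once with A entered first and once with A entered after B. The helper functions are hypothetical, built on `numpy.linalg.lstsq`.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def ss_a_first_and_after_b(a, b, y):
    """Type I (sequential) SS of factor A in a main-effects model,
    with A entered first and with A entered after B (0/1 dummy codes)."""
    one = np.ones_like(y)
    A, B = a.astype(float), b.astype(float)
    first = rss(one[:, None], y) - rss(np.column_stack([one, A]), y)
    after_b = rss(np.column_stack([one, B]), y) - rss(np.column_stack([one, A, B]), y)
    return first, after_b

def make_design(counts, rng):
    """Expand a 2x2 table of replication counts into factor codes and a
    hypothetical response in which only factor B has an effect."""
    a, b = [], []
    for i in range(2):
        for j in range(2):
            a += [i] * counts[i][j]
            b += [j] * counts[i][j]
    a, b = np.array(a), np.array(b)
    y = 10.0 + 5.0 * b + rng.normal(0.0, 1.0, size=a.size)
    return a, b, y

rng = np.random.default_rng(1)
results = {}
for name, counts in [("proportional", [[2, 4], [1, 2]]),
                     ("non-proportional", [[4, 1], [1, 4]])]:
    a, b, y = make_design(counts, rng)
    results[name] = ss_a_first_and_after_b(a, b, y)
    print(name, results[name])
```

In the proportional design the two values coincide, so the order of entry does not matter; in the non-proportional design A picks up part of the effect of B when it is entered first.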
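Model of two-way ANOVA: two factors (mowing and fertilization); index $i$ is the level of the first factor (non-mown, mown), index $j$ the level of the second one, and $k$ the replicate within a group; the response is, e.g., the number of species.
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$$
where $\mu$ is the grand mean, $\alpha_i$ the effect of mowing, $\beta_j$ the effect of fertilization, $\gamma_{ij}$ the interaction and $\varepsilon_{ijk}$ the error variability. The parameterization is usually chosen so that $\alpha$, $\beta$ and $\gamma$ are balanced around zero (then $\mu$ really is the mean of everything).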
Three null hypotheses: $\alpha_i = 0$ for all $i$ (mowing has no effect); $\beta_j = 0$ for all $j$ (fertilization has no effect); $\gamma_{ij} = 0$ for all combinations $ij$ (there is no interaction between mowing and fertilization). Null interaction means that the main effects are purely additive.
Null interaction: "the effect of each factor is independent of the level of the other factor". ATTENTION: this means additivity.
Interaction is a deviation from additivity.
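A sketch of how the deviation from additivity can be quantified from a table of cell means, assuming a balanced design so that unweighted means are appropriate: the interaction term of each cell is its mean minus the row mean minus the column mean plus the grand mean. The numbers of species below are invented.

```python
import numpy as np

def interaction_terms(cell_means):
    """gamma_ij = mean_ij - mean_i. - mean_.j + grand mean
    (balanced design assumed, so unweighted means are used)."""
    m = np.asarray(cell_means, dtype=float)
    row = m.mean(axis=1, keepdims=True)
    col = m.mean(axis=0, keepdims=True)
    return m - row - col + m.mean()

# Hypothetical mean numbers of species; rows: non-mown, mown;
# columns: non-fertilized, fertilized
additive = [[10, 14],
            [16, 20]]      # fertilization adds 4 species at both mowing levels
non_additive = [[10, 14],
                [16, 26]]  # adds 4 without mowing but 10 with mowing

print(interaction_terms(additive))      # all zeros
print(interaction_terms(non_additive))  # [[1.5, -1.5], [-1.5, 1.5]]
```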
This can be seen well in a graph (interaction plot). Do not forget to stress that connecting the means by lines is not an interpolation here; we just want to visualize the interaction through the (non)parallelism of the lines.
When reporting the result, it is not enough to write that the interaction is significant; one needs to say why (where the deviation from additivity lies).
Null hypotheses of the main effects are "averaged" over all levels of the other factor: $\alpha_i = 0$ for all $i$, mowing has no effect (on average over all levels of fertilization); $\beta_j = 0$ for all $j$, fertilization has no effect (on average over all levels of mowing).
Use your head when interpreting the results! (And look at the plot.) Example: two medicines administered separately and together (factorial design). The main effects are not significant, but that does not mean the medicines are ineffective; their effects simply cancel out when they are applied together.
Again, the total variability expressed by $SS_{TOT}$ can be partitioned: $SS_{TOT} = SS_A + SS_B + SS_{AB} + SS_{error}$, where $SS_A$, $SS_B$ and $SS_{AB}$ (interaction) are explained by the model and $SS_{error}$ is the error (residual) variability. $SS_{TOT}$ is the sum of squared deviations from the grand mean; $SS_A$ is the sum of squared deviations of the marginal means of the factor A groups from the grand mean, weighted by the number of observations ($SS_B$ similarly); $SS_{AB}$ is the weighted sum of squared deviations of the combination means from the means expected under pure additivity (i.e. without interaction).
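For a balanced design, the verbal definitions correspond to the formulas below. Here $a$ and $b$ are the numbers of levels of factors A and B, and $r$ (the number of replicates per combination, so that $n = abr$) is notation introduced for this sketch; a dot in a subscript marks averaging over that index.

```latex
\begin{aligned}
SS_{TOT}   &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}\left(Y_{ijk}-\bar{Y}\right)^2\\
SS_A       &= br\sum_{i=1}^{a}\left(\bar{Y}_{i\cdot\cdot}-\bar{Y}\right)^2\\
SS_B       &= ar\sum_{j=1}^{b}\left(\bar{Y}_{\cdot j\cdot}-\bar{Y}\right)^2\\
SS_{AB}    &= r\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar{Y}_{ij\cdot}-\left(\bar{Y}_{i\cdot\cdot}+\bar{Y}_{\cdot j\cdot}-\bar{Y}\right)\right)^2\\
SS_{error} &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}\left(Y_{ijk}-\bar{Y}_{ij\cdot}\right)^2\\
SS_{TOT}   &= SS_A + SS_B + SS_{AB} + SS_{error}
\end{aligned}
```

The term $\bar{Y}_{i\cdot\cdot}+\bar{Y}_{\cdot j\cdot}-\bar{Y}$ is the combination mean expected under pure additivity.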
Example: mown/fertilized, with the number of species as the response. Test of the null hypothesis that the mean number of species is zero everywhere.
$a$ and $b$ are the numbers of levels of factors A and B, and $n$ is the total number of observations in all groups. Then $DF_A = a-1$, $DF_B = b-1$, $DF_{AB} = (a-1)(b-1)$, $DF_{TOT} = n-1$ and $DF_{error} = DF_{TOT} - DF_A - DF_B - DF_{AB} = n - ab$. Again, each ratio $MS = SS/DF$ is an estimate of the overall variance if the corresponding null hypothesis is true.
If all effects are fixed, each effect is tested as $F_{effect} = MS_{effect} / MS_{error}$.
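Putting sums of squares, degrees of freedom and F ratios together, a two-way fixed-effects ANOVA for a balanced design can be sketched in NumPy. The data array (numbers of species, shape: levels of A × levels of B × replicates) is hypothetical, and the function name `two_way_anova` is ours.

```python
import numpy as np

def two_way_anova(y):
    """Two-way fixed-effects ANOVA for a balanced array y of shape (a, b, r)."""
    y = np.asarray(y, dtype=float)
    a, b, r = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = y.mean(axis=(0, 2))   # marginal means of factor B
    mean_ab = y.mean(axis=2)       # combination (cell) means
    additive = mean_a[:, None] + mean_b[None, :] - grand  # expected without interaction

    ss = {
        "A": b * r * ((mean_a - grand) ** 2).sum(),
        "B": a * r * ((mean_b - grand) ** 2).sum(),
        "AB": r * ((mean_ab - additive) ** 2).sum(),
        "error": ((y - mean_ab[:, :, None]) ** 2).sum(),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": y.size - a * b}
    ms = {k: ss[k] / df[k] for k in ss}
    f = {k: ms[k] / ms["error"] for k in ("A", "B", "AB")}  # all effects fixed
    ss_tot = ((y - grand) ** 2).sum()
    return ss, df, ms, f, ss_tot

# Hypothetical numbers of species: mowing (2) x fertilization (2) x 3 replicates
species = np.array([[[12, 14, 13], [15, 17, 16]],
                    [[18, 20, 19], [27, 25, 26]]], dtype=float)

ss, df, ms, f, ss_tot = two_way_anova(species)
for k in ("A", "B", "AB", "error"):
    print(k, ss[k], df[k], ms[k], f.get(k))
```

p-values can then be obtained with `scipy.stats.f.sf(F, df_effect, df_error)`. The closed-form sums used here hold only for balanced data.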
Problem: what goes into the denominator depends on which factor has a fixed effect and which has a random effect. This is especially important if one of the factors is experimental (and thus of our major interest) and the other is locality. Important for planning the experimental design!
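A summary of the denominators usually given for a two-factor design with replication. Treat it as a sketch: for the random factor B in the mixed model, textbooks differ (the restricted model uses $MS_{error}$, the unrestricted model $MS_{AB}$).

```latex
\begin{array}{llll}
 & \text{A, B fixed} & \text{A, B random} & \text{A fixed, B random}\\
F_A    & MS_A/MS_{error}    & MS_A/MS_{AB}       & MS_A/MS_{AB}\\
F_B    & MS_B/MS_{error}    & MS_B/MS_{AB}       & MS_B/MS_{error}\ \text{or}\ MS_B/MS_{AB}\\
F_{AB} & MS_{AB}/MS_{error} & MS_{AB}/MS_{error} & MS_{AB}/MS_{error}
\end{array}
```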
I, the experimenter, decide which model to use: a "classic" factorial ANOVA, or an ANOVA without interactions (also called main-effects ANOVA). In the latter, "non-additivity" becomes part of the random variability; this makes it possible to analyze data with one observation per factor combination (better to avoid that, though).
Experimental design [Figure panels: pseudoreplications (WRONG); Latin square; completely randomized blocks]
Completely randomized blocks: I test them by two-way analysis of variance without replication (the error variability is the deviation from additivity, i.e. the interaction between block and treatment). This can give a more powerful test if the blocks explain something, i.e. help to control variability.
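A sketch of the randomized-blocks analysis, assuming a hypothetical table of 3 treatments in 4 blocks that differ strongly from one another. The error term is the block × treatment interaction; for comparison, the treatment effect is also tested by a one-way ANOVA that ignores blocks, so that the block variability falls into the error.

```python
import numpy as np

def randomized_blocks_anova(y):
    """Two-way ANOVA without replication for y of shape (treatments, blocks).
    The block x treatment interaction (deviation from additivity) is the error."""
    y = np.asarray(y, dtype=float)
    t, b = y.shape
    grand = y.mean()
    mean_t = y.mean(axis=1)   # treatment means
    mean_b = y.mean(axis=0)   # block means
    ss_treat = b * ((mean_t - grand) ** 2).sum()
    ss_block = t * ((mean_b - grand) ** 2).sum()
    resid = y - mean_t[:, None] - mean_b[None, :] + grand
    ss_error = (resid ** 2).sum()
    ms_error = ss_error / ((t - 1) * (b - 1))
    f_treat = ss_treat / (t - 1) / ms_error
    f_block = ss_block / (b - 1) / ms_error
    return f_treat, f_block, ss_treat, ss_block, ss_error

# Hypothetical data: 3 treatments (rows) in 4 blocks (columns)
y = np.array([[10, 14, 18, 22],
              [12, 15, 21, 24],
              [13, 17, 22, 26]], dtype=float)

f_treat, f_block, ss_treat, ss_block, ss_error = randomized_blocks_anova(y)

# One-way ANOVA ignoring blocks: block variability ends up in the error
t, b = y.shape
f_oneway = (ss_treat / (t - 1)) / ((ss_block + ss_error) / (t * (b - 1)))
print(f_treat, f_oneway)
```

With these numbers the blocks absorb most of the variability, so the block-based F for treatments is far larger than the one-way F.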
Multiple comparisons: similar to one-way analysis of variance. If I do them "on the interaction", I compare all factorially created groups with each other; if I do them on a main effect, I compare the additive effects of individual levels. I decide what will be compared.
Friedman test: nonparametric ANOVA for completely randomized blocks, based on ranking the values within each block:
$$\chi^2_r = \frac{12}{a\,b\,(a+1)}\sum_{i=1}^{a} R_i^2 - 3\,b\,(a+1)$$
where $a$ is the number of levels of the studied factor, $b$ is the number of blocks and $R_i$ is the sum of ranks for level $i$ of the studied factor.
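The Friedman statistic can be computed directly from the formula. A sketch assuming no ties within blocks (tied values would need average ranks and a tie correction); the table is hypothetical, with rows as levels of the studied factor and columns as blocks.

```python
import numpy as np

def friedman_statistic(y):
    """Friedman chi-square for y of shape (a levels, b blocks), no ties assumed."""
    y = np.asarray(y, dtype=float)
    a, b = y.shape
    # Rank the a values inside each block (column); ranks start at 1
    ranks = y.argsort(axis=0).argsort(axis=0) + 1
    R = ranks.sum(axis=1)  # rank sum for each level of the studied factor
    return 12.0 / (a * b * (a + 1)) * (R ** 2).sum() - 3.0 * b * (a + 1)

# Hypothetical data: 3 levels (rows) in 4 blocks (columns)
y = np.array([[10, 14, 18, 22],
              [12, 15, 21, 24],
              [13, 17, 22, 26]])

print(friedman_statistic(y))  # 8.0
```

Under the null hypothesis the statistic is approximately χ²-distributed with $a-1$ degrees of freedom. For tie-free data, `scipy.stats.friedmanchisquare(*y)`, with each row passed as one sample, should return the same value.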
Two-factor experiment: I compare daisies and sunflowers and their response to the level of nutrients (the response is plant height). Three null hypotheses: 1. The heights of daisies and sunflowers do not differ (sometimes we test a totally unrealistic null hypothesis; this one obviously did not need testing). 2. Plant height is independent of the level of nutrients. 3. The effect of the nutrient level is the same for both species.
We have a problem. The data are positively skewed (the least important problem). There is a distinct heterogeneity of variances (the CV could be constant, i.e. SD depends linearly on the mean). The classic interaction test tests additivity: if fertilization lengthens daisies from 10 to 20 cm, sunflowers should be lengthened from 100 to 110 cm. From a biological point of view, this is not exactly "the same effect" on both species.
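Additive effect:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$$
Multiplicative effect:
$$Y_{ijk} = \mu \cdot \alpha_i \cdot \beta_j \cdot \varepsilon_{ijk}$$
Here every value is multiplied by the error, so SD depends linearly on the mean; $\varepsilon_{ijk}$ has a lognormal distribution centered around 1. After log-transformation,
$$\log Y_{ijk} = \log\mu + \log\alpha_i + \log\beta_j + \log\varepsilon_{ijk}$$
i.e. the multiplicative effect is changed to an additive one.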
Logarithmic transformation: changes a lognormal distribution into a normal one; if SD was linearly dependent on the mean, it leads to homogeneity of variances; it changes multiplicative effects into additive ones. ATTENTION: it does all of this simultaneously; I cannot choose just one of these effects.
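A numeric check of the last point, using hypothetical mean heights in which fertilization doubles both daisies and sunflowers: on the raw scale the interaction contrast is large, while on the log scale it vanishes. `interaction_contrast` is a helper defined only for this illustration.

```python
import numpy as np

def interaction_contrast(m):
    """(m11 - m10) - (m01 - m00) for a 2x2 table of cell means: zero when the
    effect of the column factor does not depend on the row factor."""
    m = np.asarray(m, dtype=float)
    return (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])

# Hypothetical mean heights (cm); rows: daisy, sunflower;
# columns: low, high nutrients. Fertilization doubles both species.
heights = [[10.0, 20.0],
           [100.0, 200.0]]

print(interaction_contrast(heights))          # 90.0 on the raw scale
print(interaction_contrast(np.log(heights)))  # ~0 on the log scale
```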
Many biological data contain zeroes. The often-used transformation $X' = \log(X+1)$ has similar properties, but not exactly the same ones, especially when there are low values of X. The change from multiplicativity to additivity can be particularly inaccurate! Sometimes $X' = \log(bX + a)$ is used, where $a$ and $b$ are constants (but the change from multiplicativity to additivity is then never exactly achieved).
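The inaccuracy of log(X+1) for low values can be illustrated with a factor that doubles the response (hypothetical values 1 → 2 and 10 → 20). On the log scale the effect is the same constant, log 2, at both levels, whereas after log(X+1) it is smaller for the low values. `np.log1p` computes log(X+1).

```python
import numpy as np

def effect(x, transform):
    """Change caused by the factor on the transformed scale."""
    t = transform(np.asarray(x, dtype=float))
    return t[1] - t[0]

# Hypothetical means: the factor doubles the response at low and at high values
low, high = [1.0, 2.0], [10.0, 20.0]

for name, f in [("log(X)", np.log), ("log(X+1)", np.log1p)]:
    print(name, effect(low, f), effect(high, f))
```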
Other transformations in use: for the Poisson distribution (numbers of randomly placed individuals) and for percentages (p as a number between 0 and 1).
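The transformations usually recommended for these two cases are the square root for Poisson counts and the arcsine of the square root for proportions. This sketch assumes these common variants (others, such as the square root of X + 0.5, are also in use) and uses simulated Poisson counts to show the variance-stabilizing effect.

```python
import numpy as np

def sqrt_transform(x):
    """Square-root transformation for Poisson-distributed counts."""
    return np.sqrt(np.asarray(x, dtype=float))

def arcsine_transform(p):
    """Arcsine-square-root transformation for proportions 0 <= p <= 1 (radians)."""
    p = np.asarray(p, dtype=float)
    if np.any((p < 0) | (p > 1)):
        raise ValueError("proportions must lie between 0 and 1")
    return np.arcsin(np.sqrt(p))

# Simulated Poisson counts with a small and a large mean: the raw variance
# follows the mean, the variance after the square root is much more similar
rng = np.random.default_rng(0)
small, large = rng.poisson(4, 100_000), rng.poisson(100, 100_000)
print(small.var(), large.var())
print(sqrt_transform(small).var(), sqrt_transform(large).var())
print(arcsine_transform([0.0, 0.5, 1.0]))  # 0, pi/4, pi/2
```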
Nested design: we measure the length of corolla tubes. Plant 1 from the first locality has nothing in common with plant 1 from any other locality. [Diagram: factor A (locality) → factor C (plant) → single observations]
The top factor in the hierarchy can have either a fixed or a random effect. Factors lower in the hierarchy almost always have random effects (they can also be computed as fixed, but that is very unusual). In the analysis of sums of squares we take the squared differences between each observation (or mean) and its nearest relevant mean one level up in the hierarchy. If the hierarchically lower effects are random, each effect is tested against the nearest hierarchically lower effect.
Test of the null hypothesis that the mean tube length is zero. $F_{locality} = MS_{locality} / MS_{plant}$; $F_{plant} = MS_{plant} / MS_{error} = 2.15/2.24 = 0.96$. The ideal case is a balanced model; Statistica computes it even if the model is not balanced, but then various approximations are used. The null hypothesis at the lower hierarchical level: plants do not differ in the mean length of their tubes within a locality.
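For a balanced design, the nested analysis can be sketched directly from the rule that each observation or mean is compared with its nearest upper mean. The corolla-tube lengths below are invented (3 localities, 2 plants per locality, 2 measurements per plant), and plants are treated as random, so localities are tested against $MS_{plant}$. The function name `nested_anova` is ours.

```python
import numpy as np

def nested_anova(y):
    """Balanced nested ANOVA for y of shape (localities, plants, measurements);
    plants (random) are nested within localities."""
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    mean_loc = y.mean(axis=(1, 2))   # locality means
    mean_plant = y.mean(axis=2)      # plant means
    # Each level is compared with its nearest upper mean in the hierarchy
    ss_loc = b * n * ((mean_loc - grand) ** 2).sum()
    ss_plant = n * ((mean_plant - mean_loc[:, None]) ** 2).sum()
    ss_error = ((y - mean_plant[:, :, None]) ** 2).sum()
    df_loc, df_plant, df_error = a - 1, a * (b - 1), a * b * (n - 1)
    ms_loc, ms_plant, ms_error = ss_loc / df_loc, ss_plant / df_plant, ss_error / df_error
    return {
        "F_locality": ms_loc / ms_plant,  # tested against the level below
        "F_plant": ms_plant / ms_error,
        "MS": (ms_loc, ms_plant, ms_error),
        "SS": (ss_loc, ss_plant, ss_error),
    }

# Hypothetical tube lengths (mm)
tubes = np.array([[[10, 12], [11, 13]],
                  [[14, 16], [15, 15]],
                  [[9, 11], [12, 12]]], dtype=float)

res = nested_anova(tubes)
print(res["F_locality"], res["F_plant"])
```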
Most frequent uses: analysis of variability among individual hierarchical levels, e.g. in taxonomy. Often I am interested mainly (or only) in the hierarchically higher factor, and everything else just increases the power of the test. For example, I can have just six pounds, three grazed and three ungrazed (I cannot have more). In each of them I lay out 10 squares for biomass sampling, and I make three analytical determinations from every square. Analysis of the variability can help me to plan an optimal sampling design.
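For planning the sampling, the mean squares can be turned into variance components. A sketch assuming a balanced random nested level (squares within a pound, determinations within a square), the usual method-of-moments estimator, and hypothetical pilot mean squares; the helper names are ours.

```python
from math import isclose

def variance_components(ms_square, ms_error, n):
    """Method-of-moments estimates for a balanced random nested level:
    variance among squares = (MS_square - MS_error) / n, truncated at zero;
    variance among determinations = MS_error."""
    s2_square = max((ms_square - ms_error) / n, 0.0)
    return s2_square, ms_error

def var_of_pound_mean(s2_square, s2_error, squares, determinations):
    """Sampling variance of one pound's mean for a given number of squares and
    determinations per square (variance among pounds is not included)."""
    return s2_square / squares + s2_error / (squares * determinations)

# Hypothetical mean squares from a pilot study with 3 determinations per square
s2_square, s2_error = variance_components(ms_square=7.0, ms_error=1.0, n=3)

for squares, determinations in [(10, 3), (10, 1), (20, 1), (5, 6)]:
    v = var_of_pound_mean(s2_square, s2_error, squares, determinations)
    print(squares, determinations, squares * determinations, round(v, 3))
```

With these made-up numbers, 20 squares with one determination each (20 analyses) give a more precise pound mean than 10 squares with three determinations each (30 analyses), because most of the variance lies among squares.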
Beware of mixed (pooled) samples: they can save me work, but they must be independently replicated! These are not independent observations.
More complicated ANOVA models: factorial and nested designs can be combined in different ways, with some factors having fixed effects and others random effects.
Split plot (main plots and split plots, two error levels): six plots (three on calcite, three on granite), with three types of impact in each plot.
Repeated-measures ANOVA: I have some experimental design and follow the state of individual objects over time, e.g. growing plants.