# Two and more factors in analysis of variance

Two and more factors in analysis of variance
Factorial and nested designs

Factorial design
Each level of the first factor is combined with each level of the second one. With two levels per factor, 2 factors give 4 combinations and 3 factors give 8 combinations. In general, the number of combinations is the product of the numbers of levels of all factors.
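The counting rule can be checked with a short sketch. The factor names and levels below are illustrative assumptions (the third factor borrows the "removal of dominant" idea), not data from the slides.

```python
from itertools import product
from math import prod

# Hypothetical factors and levels, for illustration only.
factors = {
    "mowing": ["non-mown", "mown"],
    "fertilization": ["non-fertilized", "fertilized"],
    "dominant": ["present", "removed"],
}

# Every level of each factor is combined with every level of the others.
combinations = list(product(*factors.values()))

# The number of combinations equals the product of the numbers of levels.
n_expected = prod(len(levels) for levels in factors.values())

for combo in combinations:
    print(combo)
```

With three two-level factors, `combinations` holds the 8 treatment combinations, each of which would then be replicated.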

Mowing, fertilization, removal of the dominant species
Usually each combination is replicated several times.

Factorial designs in the field. Factors: shape and pattern

Another possibility: nested design
Factor A (locality), factor C (plant), single observations. Plant 1 from the first locality has nothing in common with plant 1 from any other locality.

Factorial design

Proportional design
Each level of one factor has the same proportion of replications at every level of the other factor. The χ2 of the contingency table of numbers of replications equals zero, i.e. the factors are completely independent. Ideally, all combinations have the same number of observations, but a proportional design is enough:

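$$\hat{f}_{ij} = \frac{R_i \, C_j}{n}$$
where $\hat{f}_{ij}$ is the expected frequency in row $i$ and column $j$ of the contingency table, $R_i$ and $C_j$ are the row and column totals, and $n$ is the total number of replications.
This applies, for example, to the non-fertilized, non-mown combination. That is, each level of the first factor has the same proportional representation at every level of the second factor; we then consider the factors independent.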
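A minimal sketch of this check, assuming made-up tables of replication counts: the expected count of each cell is row total times column total divided by the grand total, and χ2 is zero exactly when every observed count equals its expected count.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square of a table of replication counts."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical numbers of replications (rows: mowing, columns: fertilization).
balanced = [[5, 5], [5, 5]]
proportional = [[4, 8], [2, 4]]       # unequal counts, same proportions
non_proportional = [[8, 2], [2, 8]]

for name, table in [("balanced", balanced),
                    ("proportional", proportional),
                    ("non-proportional", non_proportional)]:
    print(name, chi_square(table))
```

The balanced and the proportional table both give χ2 = 0; the non-proportional one does not.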

When factors are “independent” and the design is balanced
Balanced design: weights of rats

When factors are “independent” and the design is proportional
Proportional design: weights of rats

When factors are “dependent”, i.e. the design is neither balanced nor proportional
Non-proportional design: weights of rats
According to the marginal means, it seems that listening to music affects the weight of rats. (There are methods that can partly cope with this [LS means], but the power of the test is lowered for both factors.)
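The table from the slide is not preserved, so the sketch below uses invented cell means and counts (the factor names `diet` and `music` and all numbers are assumptions). It shows how ordinary marginal means, weighted by unequal cell counts, can suggest an effect of music even though the cell means depend on diet only, while unweighted averages of cell means (what LS means amount to in the full model with interaction) do not.

```python
# Hypothetical cell means (g) and numbers of rats; weight depends on diet only.
cell_mean = {
    ("poor", "silence"): 200.0, ("poor", "music"): 200.0,
    ("rich", "silence"): 300.0, ("rich", "music"): 300.0,
}
cell_n = {
    ("poor", "silence"): 8, ("poor", "music"): 2,
    ("rich", "silence"): 2, ("rich", "music"): 8,
}


def marginal_means(music_level):
    """Return (weighted, unweighted) marginal mean for one level of music."""
    cells = [key for key in cell_mean if key[1] == music_level]
    total_n = sum(cell_n[key] for key in cells)
    weighted = sum(cell_mean[key] * cell_n[key] for key in cells) / total_n
    unweighted = sum(cell_mean[key] for key in cells) / len(cells)
    return weighted, unweighted


for level in ("silence", "music"):
    print(level, marginal_means(level))
```

The weighted marginal means differ between the two music levels only because the rich diet is over-represented among the rats that listen to music.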

Statistica can compute anything, but
With a proportional design, the result should always be the same, whichever type of sum of squares I use. Two-way ANOVA can be computed even for a non-proportional design. The default there (Type III sum of squares, orthogonal) is all right, but depending on the experimental situation I can choose another type (perhaps Type I, sequential), and I should know what each one means (and why the results differ).

Model of two-way ANOVA
$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$$
Two factors (mowing and fertilization): index i is the level of the first factor (non-mown, mown), index j is the level of the second one, and k is the replication within a group. The response is e.g. the number of species. Here μ is the grand mean, αi the effect of mowing, βj the effect of fertilization, γij the interaction, and εijk the error variability. The parameterisation is usually chosen so that α, β, and γ are balanced around zero (then μ really is the mean of everything).
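A sketch of the "balanced around zero" parameterisation, assuming a balanced 2 x 2 design and made-up cell means: μ is the mean of all cell means, α and β are deviations of the marginal means from μ, and γ is what remains in each cell.

```python
# Hypothetical mean numbers of species.
# Rows: non-mown, mown; columns: non-fertilized, fertilized.
cell_means = [[10.0, 6.0],
              [20.0, 8.0]]

a, b = len(cell_means), len(cell_means[0])

# Grand mean over all cells (balanced design assumed).
mu = sum(sum(row) for row in cell_means) / (a * b)

# Main effects: deviations of marginal means from the grand mean.
alpha = [sum(row) / b - mu for row in cell_means]
beta = [sum(col) / a - mu for col in zip(*cell_means)]

# Interaction: deviation of each cell mean from the additive prediction.
gamma = [[cell_means[i][j] - (mu + alpha[i] + beta[j]) for j in range(b)]
         for i in range(a)]

print("mu =", mu)
print("alpha =", alpha)
print("beta =", beta)
print("gamma =", gamma)
```

By construction, α, β, and each row and column of γ sum to zero, so μ is the mean of everything.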

Three null hypotheses
αi = 0 for all i: mowing has no effect
βj = 0 for all j: fertilization has no effect
γij = 0 for all combinations of i and j: there is no interaction between mowing and fertilization
Null interaction means that the main effects are purely additive.

Null interaction
“The effect of each factor is independent of the level of the other factor.” ATTENTION: this means additivity.

e.g.

Can be seen well in graphs (interaction plot)
Do not forget to stress that connecting the means is not an interpolation here; we just want to visualize the interaction by the (non-)parallelism of the lines.

Can be seen well in a diagram (interaction plot)
When I report a result, it is not enough to write that the interaction is significant; one needs to say why (where the deviation from additivity lies).

Null hypotheses of main effects: “averaged” over all levels of the other factor
αi = 0 for all i: mowing has no effect (on average over all levels of fertilization)
βj = 0 for all j: fertilization has no effect (on average over all levels of mowing)

You have to use your head when interpreting results!!! (and look at the diagram)
Administer two medicines separately and together (factorial design). The main effects are non-significant, but that does not mean the medicines are ineffective; their effects just cancel when they are applied together.
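The medicine example can be made concrete with invented numbers (the cell means below are an assumption chosen so that the effects cancel exactly): each medicine helps when given alone, the combination does nothing, so both marginal comparisons are flat while the interaction contrast is large.

```python
# Hypothetical mean improvement of patients.
# Rows: medicine A absent / present; columns: medicine B absent / present.
cell_means = [[0.0, 10.0],
              [10.0, 0.0]]

# Marginal means: averaged over the levels of the other medicine.
a_means = [sum(row) / len(row) for row in cell_means]
b_means = [sum(col) / len(col) for col in zip(*cell_means)]

# Main-effect contrasts (present minus absent).
main_effect_a = a_means[1] - a_means[0]
main_effect_b = b_means[1] - b_means[0]

# Interaction contrast: effect of A with B present minus effect of A with B absent.
interaction = (cell_means[1][1] - cell_means[0][1]) - (cell_means[1][0] - cell_means[0][0])

print(main_effect_a, main_effect_b, interaction)
```

Both main-effect contrasts are zero here, yet each medicine clearly works on its own, which is why the interaction plot has to be inspected before interpreting the main effects.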

Holds again: the total variability, expressed by SSTOT, can be divided
$$SS_{TOT} = SS_A + SS_B + SS_{AB} + SS_{error}$$
SSA + SSB + SSAB (interaction) is the part explained by the model; SSerror is the error (residual) part.
SSTOT = sum of squared deviations of the observations from the grand mean
SSA = sum of squared deviations of the marginal means of the factor A groups from the grand mean, weighted by the number of observations (SSB similarly)
SSAB = weighted sum of squared deviations of the combination means from the means expected under pure additivity (i.e. without interaction)
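A minimal sketch of this decomposition for a balanced design, using invented species counts (all data values are assumptions). It computes each sum of squares from its definition and lets one verify that the parts add up to SSTOT.

```python
from statistics import mean

# Hypothetical numbers of species; data[i][j] holds the replicates for
# mowing level i (0 = non-mown, 1 = mown) and fertilization level j.
data = [[[10, 12, 11], [6, 5, 7]],
        [[19, 21, 20], [8, 9, 7]]]

a, b, r = len(data), len(data[0]), len(data[0][0])
all_values = [x for row in data for cell in row for x in cell]
grand = mean(all_values)

cell = [[mean(c) for c in row] for row in data]                   # combination means
mean_a = [mean(cell[i]) for i in range(a)]                         # marginal means of A
mean_b = [mean(cell[i][j] for i in range(a)) for j in range(b)]    # marginal means of B

ss_tot = sum((x - grand) ** 2 for x in all_values)
ss_a = b * r * sum((m - grand) ** 2 for m in mean_a)
ss_b = a * r * sum((m - grand) ** 2 for m in mean_b)
# Deviations of combination means from what pure additivity would predict.
ss_ab = r * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                for i in range(a) for j in range(b))
ss_error = sum((x - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for x in data[i][j])

print(ss_tot, ss_a + ss_b + ss_ab + ss_error)
```

The simple marginal-mean formulas used here rely on the balanced layout; with unequal cell counts the parts no longer add up this neatly.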

Example: mown, fertilized, number of species as the response
Test of the null hypothesis that the mean number of species is zero everywhere

a and b are the numbers of levels of factors A and B; n is the total number of observations over all groups
It holds that DFA = a-1, DFB = b-1, DFAB = (a-1)(b-1), DFTOT = n-1, and DFerror = DFTOT - DFA - DFB - DFAB. Again, the ratio MS = SS/DF is an estimate of the common variance if the null hypothesis is true.

If all the effects are fixed
Test: Feffect = MSeffect / MSerror
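The degrees of freedom and F ratios can be assembled as in the following sketch. The function name `two_way_anova_table` and the sums of squares are illustrative assumptions (a 2 x 2 design with 3 replicates per cell); p-values are left out because they would need an F distribution from a statistics library.

```python
def two_way_anova_table(ss, a, b, n):
    """Degrees of freedom, mean squares and F ratios, all effects fixed.

    ss maps "A", "B", "AB" and "error" to sums of squares;
    a, b are numbers of levels, n is the total number of observations.
    """
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    df["error"] = (n - 1) - df["A"] - df["B"] - df["AB"]
    ms = {term: ss[term] / df[term] for term in df}
    # With fixed effects, every effect is tested against MSerror.
    f = {term: ms[term] / ms["error"] for term in ("A", "B", "AB")}
    return df, ms, f


# Hypothetical sums of squares, for illustration only.
ss = {"A": 90.75, "B": 216.75, "AB": 36.75, "error": 8.0}
df, ms, f = two_way_anova_table(ss, a=2, b=2, n=12)
print(df)
print(f)
```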

Problem: what goes in the denominator depends on which factor has a fixed effect and which has a random effect. This is especially important if one of the factors is experimental (and thus of major interest) and the other is locality. Important for planning the experimental design!
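As a hedged illustration of how the denominator changes, the sketch below encodes the rules commonly taught for a two-way crossed design with replication, under the so-called restricted mixed model. That convention is an assumption here (the slides do not state which one they follow), and the function name `f_denominators` is hypothetical.

```python
def f_denominators(a_random, b_random):
    """Name of the denominator mean square for each F test.

    Assumes a two-way crossed design with replication and the restricted
    mixed-model convention: a main effect is tested against MS_AB when the
    OTHER factor is random, otherwise against MS_error.
    """
    return {
        "A": "AB" if b_random else "error",
        "B": "AB" if a_random else "error",
        "AB": "error",
    }


# A = treatment (fixed), B = locality (random): treatment is tested against
# the treatment x locality interaction, not against the residual.
print(f_denominators(a_random=False, b_random=True))
```

In practice this means the number of localities, not the number of replicates within a locality, drives the power of the test of the experimental factor.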

I, the experimenter, am the one who decides which model to use
“Classic” ANOVA
Factorial ANOVA without interactions (also called main effects ANOVA): “non-additivity” is treated as part of the random variability. This makes it possible to work with data that have only one observation per factor combination (better avoided, though).

Experimental design
[Figure: field layouts labelled RANDOMIZED BLOCKS and LATIN SQUARE; layouts with pseudoreplications are marked WRONG]

Completely randomized blocks
I test by two-way analysis of variance without replication (the error variability is the deviation from additivity, i.e. the interaction between block and treatment). It can give a more powerful test if the blocks explain something, i.e. help to control variability.
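A sketch of this analysis with invented yields (one plot per treatment in each block; all numbers are assumptions). The residual is what is left after removing block and treatment effects, and the last lines show what the F ratio would be if the blocks were ignored.

```python
from statistics import mean

# Hypothetical yields: rows are blocks, columns are treatments (one plot each).
yields = [[10.0, 12.0, 15.0],
          [14.0, 15.0, 19.0],
          [8.0, 11.0, 12.0],
          [12.0, 14.0, 17.0]]

b, a = len(yields), len(yields[0])          # b blocks, a treatments
grand = mean(x for row in yields for x in row)
block_means = [mean(row) for row in yields]
treat_means = [mean(col) for col in zip(*yields)]

ss_tot = sum((x - grand) ** 2 for row in yields for x in row)
ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
ss_block = a * sum((m - grand) ** 2 for m in block_means)
# Without replication, the residual is the block x treatment interaction.
ss_resid = ss_tot - ss_treat - ss_block

df_treat, df_block = a - 1, b - 1
df_resid = df_treat * df_block
f_treat = (ss_treat / df_treat) / (ss_resid / df_resid)

# For comparison: one-way ANOVA that ignores blocks pools block variability
# into the error term.
f_ignoring_blocks = (ss_treat / df_treat) / ((ss_resid + ss_block) / (df_resid + df_block))

print(f_treat, f_ignoring_blocks)
```

In this made-up data set the blocks differ a lot, so removing their variability from the error term gives a much larger F for the treatment.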

Multiple comparisons
Similar to one-way analysis of variance. If I do them “on the interaction”, I compare all factorially defined groups with each other; if I do them on a main effect, I compare the additive effects of single levels. I am the one who decides what will be compared.

Friedman test: nonparametric ANOVA for completely randomized blocks
Based on ranking the values within each block:
$$\chi^2_r = \frac{12}{b\,a\,(a+1)} \sum_{i=1}^{a} R_i^2 - 3\,b\,(a+1)$$
where a is the number of levels of the factor studied, b is the number of blocks, and Ri is the sum of ranks for level i of the factor studied.
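A minimal sketch of the Friedman statistic in its standard form, 12/(b·a·(a+1)) · ΣRi² − 3·b·(a+1), with a, b and Ri as defined above. The data are invented, and no tie correction is applied (an assumption that keeps the example short).

```python
def ranks_within_block(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square; rows are blocks, columns are factor levels."""
    b, a = len(blocks), len(blocks[0])
    rank_sums = [0.0] * a
    for block in blocks:
        for level, rank in enumerate(ranks_within_block(block)):
            rank_sums[level] += rank
    return 12.0 / (b * a * (a + 1)) * sum(r ** 2 for r in rank_sums) - 3.0 * b * (a + 1)


# Hypothetical yields: 4 blocks (rows) x 3 treatments (columns).
yields = [[10.0, 12.0, 15.0],
          [14.0, 15.0, 19.0],
          [8.0, 11.0, 12.0],
          [12.0, 14.0, 17.0]]
print(friedman_statistic(yields))
```

Here every block ranks the treatments in the same order, so the statistic reaches its maximum b(a-1) = 8; it would be compared with a χ2 distribution with a-1 degrees of freedom, or with exact tables for small designs.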

Two-factor experiment: I compare daisy and sunflower and their response to the level of nutrients (the response is plant height).
Three null hypotheses:
1. The heights of daisies and sunflowers do not differ (sometimes we end up testing a totally unrealistic null hypothesis; obviously we did not need to test this one).
2. Plant height is independent of the level of nutrients.
3. The effect of the level of nutrients is the same for both species.

We have a problem
The data are positively skewed (the least important problem).
There is marked inhomogeneity of variances (the CV could be constant, i.e. SD depends linearly on the mean).
The classic interaction test tests additivity: if fertilization elongates daisies from 10 to 20 cm, sunflowers should be elongated from 100 to 110 cm. From a biological point of view, this is not really “the same effect” on both species.

We multiply every value by the error, so SD depends linearly on the mean. εijk has a lognormal distribution centered around 1. After log-transformation, the multiplicative effects become additive.
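The slide's own formula is not preserved. One way to write such a multiplicative model, assuming purely multiplicative main effects and no interaction term (the symbols reuse those of the additive model and are an assumption), is:

```latex
% Assumed multiplicative model, for illustration only
X_{ijk} = \mu \cdot \alpha_i \cdot \beta_j \cdot \varepsilon_{ijk}

% Taking logarithms turns products into sums
\log X_{ijk} = \log \mu + \log \alpha_i + \log \beta_j + \log \varepsilon_{ijk}
```

If εijk is lognormal with median 1, then log εijk is normal with mean 0, so the transformed model has the usual additive form with a normally distributed error.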

Logarithmic transformation
Changes a lognormal distribution to a normal one.
If SD depended linearly on the mean, it leads to homogeneity of variances.
Changes multiplicative effects to additive ones.
ATTENTION: it does all of this simultaneously; I cannot ask for just one of these.
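A small numerical sketch of the last point, assuming that fertilization doubles the height of both species (10 to 20 cm and 100 to 200 cm; the doubled sunflower value is an illustrative assumption). The interaction contrast is far from zero on the original scale and vanishes after taking logarithms.

```python
from math import log

# Hypothetical mean heights in cm; fertilization doubles both species.
heights = {
    ("daisy", "low"): 10.0, ("daisy", "high"): 20.0,
    ("sunflower", "low"): 100.0, ("sunflower", "high"): 200.0,
}


def interaction_contrast(values):
    """Nutrient effect in sunflower minus nutrient effect in daisy."""
    sunflower_effect = values[("sunflower", "high")] - values[("sunflower", "low")]
    daisy_effect = values[("daisy", "high")] - values[("daisy", "low")]
    return sunflower_effect - daisy_effect


raw_contrast = interaction_contrast(heights)
log_contrast = interaction_contrast({key: log(v) for key, v in heights.items()})

print(raw_contrast, log_contrast)
```

On the raw scale the test of interaction would flag a deviation from additivity; on the log scale the same biological effect (doubling) is purely additive.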

Many biological data contain zeroes
The often-used transformation X´ = log(X+1) has similar properties, but not exactly the same, especially when there are low X values. The change from multiplicativity to additivity can be particularly inaccurate!!! Sometimes X´ = log(bX+a) is used, where a and b are constants (but the change from multiplicativity to additivity is never fully achieved).
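The inaccuracy for low values can be seen in a short sketch with invented means (a doubling from 1 to 2 in one group and from 10 to 20 in the other; all numbers are assumptions). With log(X) the two doublings are identical shifts; with log(X+1) they are not.

```python
from math import log

# Hypothetical means: the treatment doubles the value in both groups.
low_group = (1.0, 2.0)      # control, treated
high_group = (10.0, 20.0)   # control, treated


def shift(pair, transform):
    """Treatment effect on the transformed scale."""
    control, treated = pair
    return transform(treated) - transform(control)


def log_plus_one(x):
    return log(x + 1.0)


# Difference between the treatment effects of the two groups.
contrast_log = shift(high_group, log) - shift(low_group, log)
contrast_log1p = shift(high_group, log_plus_one) - shift(low_group, log_plus_one)

print(contrast_log, contrast_log1p)
```

The nonzero contrast under log(X+1) would show up as an apparent interaction even though the effect is purely multiplicative.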

Other transformations used
For a Poisson distribution (numbers of randomly placed individuals)
For percentages (p as a number between 0 and 1)
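The formulas from this slide are not preserved. The sketch below shows the transformations most commonly used for these two cases, a square-root transformation for counts and an arcsine square-root (angular) transformation for proportions; whether the slide used exactly these forms, and which added constant, is an assumption.

```python
from math import asin, sqrt


def sqrt_transform(count, c=0.5):
    """Square-root transformation for counts.

    The added constant c is an assumption; 0, 0.5 and 3/8 all appear in practice.
    """
    return sqrt(count + c)


def arcsine_transform(p):
    """Angular transformation for a proportion p between 0 and 1 (result in radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return asin(sqrt(p))


print([round(sqrt_transform(x), 3) for x in (0, 1, 4, 9)])
print([round(arcsine_transform(p), 3) for p in (0.0, 0.1, 0.5, 0.9, 1.0)])
```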

Nested design
We measure the length of corolla tubes.
Factor A (locality), factor C (plant), single observations. Plant 1 from the first locality has nothing in common with plant 1 from any other locality.

The top factor in the hierarchy can have either a fixed or a random effect
Factors lower in the hierarchy almost always have random effects (computing them as fixed is possible, but very unusual). In the analysis of sums of squares, we take the squared differences between each observation (or mean) and its nearest relevant mean one level up in the hierarchy. If the hierarchically lower effects are random, we test every effect against the nearest hierarchically lower effect.

Null hypothesis on lower hierarchical levels: plants within a locality do not differ in the mean length of their tubes
Test of the null hypothesis that the mean tube length is zero
Flocality = MSlocality/MSplant
Fplant = MSplant/MSerror = 2.15/2.24 = 0.96
Ideal when the model is balanced. Statistica computes it even if it is not, but the results are then various approximations.
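A sketch of the nested computation for a balanced layout, with invented tube lengths (2 localities, 3 plants per locality, 2 flowers per plant; all values are assumptions and unrelated to the 2.15 and 2.24 quoted above). Each level is compared with its nearest higher mean, and each effect is tested against the next lower level.

```python
from statistics import mean

# Hypothetical corolla tube lengths (mm): data[locality][plant] = measurements.
data = [
    [[10.0, 11.0], [12.0, 13.0], [11.0, 12.0]],
    [[14.0, 15.0], [13.0, 15.0], [15.0, 16.0]],
]

a, b, n = len(data), len(data[0]), len(data[0][0])
plant_means = [[mean(plant) for plant in locality] for locality in data]
locality_means = [mean(means) for means in plant_means]
grand = mean(locality_means)  # equals the overall mean in a balanced design

# Each sum of squares uses deviations from the nearest higher mean.
ss_locality = b * n * sum((m - grand) ** 2 for m in locality_means)
ss_plant = n * sum((plant_means[i][j] - locality_means[i]) ** 2
                   for i in range(a) for j in range(b))
ss_error = sum((x - plant_means[i][j]) ** 2
               for i in range(a) for j in range(b) for x in data[i][j])

df_locality, df_plant, df_error = a - 1, a * (b - 1), a * b * (n - 1)
ms_locality = ss_locality / df_locality
ms_plant = ss_plant / df_plant
ms_error = ss_error / df_error

# Plants are a random effect, so locality is tested against plants.
f_locality = ms_locality / ms_plant
f_plant = ms_plant / ms_error

print(f_locality, f_plant)
```

Note that the test of locality has only a(b-1) denominator degrees of freedom: measuring more flowers per plant does not add replication at the locality level.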

Most frequent use
Analysis of variability among single hierarchical levels, e.g. in taxonomy.
Often I am interested mainly (or only) in the hierarchically highest factor; everything else just serves to increase the power of the test. E.g. I can have just 6 enclosures, three grazed and three ungrazed (I cannot have more). In each of them I lay out 10 squares for biomass sampling, and I make three analytical determinations from every square. Analysis of variability can help me plan an optimal sampling design.

Mind mixed samples
They can save me work, but they must be independently replicated! These aren’t independent observations.

More complicated models of ANOVA
Factorial and nested designs can be combined in different ways, with some factors having fixed effects and some random ones.

Split plot (main plots and split plots - two error levels)
6 plots (3 calcite, 3 granite), 3 types of impacts in each plot

ANOVA - Repeated measures
I have some experimental design and I follow the state of individual objects in time, e.g. growing plants, etc.

Replicated BACI - repeated measures