1 Two or more factors in analysis of variance: factorial and nested designs
2 Factorial design: each level of the first factor is combined with each level of the second one. With two levels in each factor: 2 factors -> 4 combinations, 3 factors -> 8 combinations. In general, the number of combinations is the product of the numbers of levels of the individual factors.
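A minimal Python sketch of how a full factorial design is enumerated, using the three two-level treatments from the next slide as hypothetical factor names. It only lists the combinations; it does not randomize them to plots.

```python
from itertools import product
from math import prod

# Hypothetical factors, each with two levels
factors = {
    "mowing": ["non-mown", "mown"],
    "fertilization": ["non-fertilized", "fertilized"],
    "dominant_removal": ["kept", "removed"],
}

# Every level of each factor is combined with every level of the others
combinations = list(product(*factors.values()))

# The number of combinations is the product of the numbers of levels
n_combinations = prod(len(levels) for levels in factors.values())

for combo in combinations:
    print(combo)
print(n_combinations)  # 8 for three two-level factors
```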
3 Mowing, fertilization, removal of the dominant species. Usually each combination is run in several replications.
4 Factorial designs in the field; factors: shape and pattern
5 Another possibility: nested design [diagram: factor A (locality), factor C (plant), single observations]. Plant 1 from the first locality has nothing in common with plant 1 from any other locality.
7 Proportional design: the same proportion of replications of each level of one factor at each level of the other factor. For the contingency table of numbers of replications, χ² equals zero, i.e. the factors are completely independent. Ideally there is the same number of observations in all combinations, but a proportional design is enough:
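8 The expected frequency in a contingency table is $n_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$ (row total × column total / grand total). So, for example, for the non-fertilized, non-mowed combination the observed number of replications equals this expected frequency. I.e. each level of the first factor is represented in the same proportion at each level of the second factor; then we consider the factors independent.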
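As a sketch, proportionality can be checked directly from the table of replication counts: compute the expected frequencies $n_{i\cdot} n_{\cdot j} / n$ and the χ² statistic, which is exactly zero for a proportional design. The counts below are hypothetical.

```python
def expected_counts(table):
    """Expected frequencies n_i. * n_.j / n for a table of replication counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square of observed vs expected replication counts."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical counts: rows = mowing (non-mown, mown), columns = fertilization (no, yes)
proportional = [[4, 8], [2, 4]]
non_proportional = [[4, 8], [4, 2]]

print(chi_square(proportional))      # 0.0: design is proportional
print(chi_square(non_proportional))  # positive: design is not proportional
```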
9 When factors are “independent” and the design is balanced: balanced design (example: weights of rats)
10 When factors are “independent” and the design is proportional: proportional design (example: weights of rats)
11 When factors are “dependent”, i.e. the design is neither balanced nor proportional: non-proportional design (example: weights of rats). According to the marginal means, it seems that listening to music can affect the weight of rats. (There are methods that can partly cope with this [LS means], but the power of the tests is lowered for both factors.)
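A hypothetical illustration (invented cell means and sizes, not the slide's data) of how raw marginal means mislead in a non-proportional design. Music has no effect at all within either level of a second factor B, yet the raw marginal means differ. The least-squares (LS) means, computed here as the unweighted average of the cell means (which is what they reduce to for a two-way model with interaction), do not.

```python
# Hypothetical cell means (g) and cell sizes: music (no, yes) x second factor B (B1, B2).
# Within each level of B, music changes nothing; only B shifts weight.
cell_means = {("no", "B1"): 200.0, ("no", "B2"): 300.0,
              ("yes", "B1"): 200.0, ("yes", "B2"): 300.0}
cell_sizes = {("no", "B1"): 8, ("no", "B2"): 2,
              ("yes", "B1"): 2, ("yes", "B2"): 8}
b_levels = ["B1", "B2"]


def weighted_marginal_mean(music):
    """Raw marginal mean over all rats: each cell counts by its size."""
    total = sum(cell_means[(music, b)] * cell_sizes[(music, b)] for b in b_levels)
    n = sum(cell_sizes[(music, b)] for b in b_levels)
    return total / n


def ls_mean(music):
    """LS (unweighted) marginal mean: each cell mean counts equally."""
    return sum(cell_means[(music, b)] for b in b_levels) / len(b_levels)


for music in ("no", "yes"):
    print(music, weighted_marginal_mean(music), ls_mean(music))
# no 220.0 250.0
# yes 280.0 250.0
```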
12 Statistica can compute anything, but if I have a proportional design, the result should always be the same. Two-way ANOVA can be computed even in a non-proportional design; the default there (Type III sums of squares, orthogonal) is alright, but depending on the experimental situation I can choose another type myself (perhaps Type I, sequential), and I should know what each one means (and why the results differ).
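To see why the order of factors matters only when the design is not proportional, the sketch below computes the sequential (Type I) sum of squares for factor A twice: once with A entered first and once after B. It does this by comparing residual sums of squares of nested least-squares fits. It is plain Python with hypothetical data and two-level factors only, and it is not how Statistica computes its tables.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(columns, y):
    """Residual sum of squares after a least-squares fit of y on the design columns."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    coef = solve(xtx, xty)
    fitted = [sum(c * col[k] for c, col in zip(coef, columns)) for k in range(len(y))]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))


def sequential_ss_a(data, a_first):
    """Type I (sequential) SS of factor A, entered before or after factor B.
    data: list of (a_level, b_level, y) with 0/1-coded levels."""
    y = [row[2] for row in data]
    one = [1.0] * len(data)
    a = [float(row[0]) for row in data]
    b = [float(row[1]) for row in data]
    if a_first:
        return rss([one], y) - rss([one, a], y)
    return rss([one, b], y) - rss([one, b, a], y)


def make_data(counts):
    """Hypothetical responses: y = 10 + 3*A + 10*B plus a small within-cell spread."""
    return [(i, j, 10.0 + 3 * i + 10 * j + k)
            for (i, j), c in counts.items() for k in range(c)]


proportional = make_data({(0, 0): 2, (0, 1): 4, (1, 0): 1, (1, 1): 2})
non_proportional = make_data({(0, 0): 3, (0, 1): 1, (1, 0): 1, (1, 1): 3})

for name, data in (("proportional", proportional), ("non-proportional", non_proportional)):
    print(name, sequential_ss_a(data, True), sequential_ss_a(data, False))
```

In the proportional design both numbers agree (up to rounding). In the non-proportional one, entering A first lets it take credit for variation that is really due to B.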
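13 Model of two-way ANOVA: two factors (mowing and fertilization); index i is the level of the first factor (non-mown, mown), index j is the level of the second one, and k is the replication within a group; the response is e.g. the number of species. $X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$, where $\mu$ is the grand mean, $\alpha_i$ the effect of mowing, $\beta_j$ the effect of fertilization, $\gamma_{ij}$ the interaction and $\varepsilon_{ijk}$ the error variability. The parameterization is usually chosen so that $\alpha$, $\beta$ and $\gamma$ are balanced around zero (then $\mu$ really is the mean of everything).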
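For a balanced design, the sum-to-zero parameterization gives simple estimates from the cell means. $\mu$ is the mean of the cell means; $\alpha_i$ and $\beta_j$ are the marginal means minus $\mu$; $\gamma_{ij}$ is whatever is left over. A sketch with hypothetical mean numbers of species:

```python
def estimate_effects(cell_means):
    """cell_means[i][j]: mean of the replications in cell (i, j); balanced design assumed."""
    a, b = len(cell_means), len(cell_means[0])
    mu = sum(sum(row) for row in cell_means) / (a * b)
    row_means = [sum(row) / b for row in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    alpha = [m - mu for m in row_means]   # effects of mowing, sum to zero
    beta = [m - mu for m in col_means]    # effects of fertilization, sum to zero
    # interaction: deviation of the cell mean from the purely additive prediction
    gamma = [[cell_means[i][j] - row_means[i] - col_means[j] + mu for j in range(b)]
             for i in range(a)]
    return mu, alpha, beta, gamma


# rows: non-mown, mown; columns: non-fertilized, fertilized (hypothetical values)
cell_means = [[20.0, 14.0], [26.0, 24.0]]
mu, alpha, beta, gamma = estimate_effects(cell_means)
print(mu, alpha, beta, gamma)  # 21.0 [-4.0, 4.0] [2.0, -2.0] [[1.0, -1.0], [-1.0, 1.0]]
```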
14 Three null hypotheses: $\alpha_i = 0$ for all i (mowing has no effect); $\beta_j = 0$ for all j (fertilization has no effect); $\gamma_{ij} = 0$ for all combinations ij (there is no interaction between mowing and fertilization). A null interaction means that the main effects are purely additive.
15 Null interaction: “The effect of each factor is independent of the level of the other factor.” ATTENTION: this means additivity.
17 This can be seen well in graphs (interaction plot). Do not forget to stress that connecting the means is not an interpolation here; we just want to visualize the interaction through the (non)parallelism of the lines.
18 This can be seen well in a diagram (interaction plot). When I report the result, it isn't enough to write that the interaction is significant; one needs to say why (where the deviation from additivity is).
19 Null hypotheses of main effects are “averaged” over all levels of the other factor: $\alpha_i = 0$ for all i, mowing has no effect (on average over all levels of fertilization); $\beta_j = 0$ for all j, fertilization has no effect (on average over all levels of mowing).
20 Use your head when interpreting results!!! (and look at the diagram) Administer two medicines separately and together (factorial design): the main effects are insignificant, but that doesn't mean the medicines are ineffective. Their effects just cancel when applied together.
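A toy version of the two-medicine example, with hypothetical cell means. Each medicine alone raises the response by 5, but given together their effects cancel. As a result, both main effects (averaged over the other factor) are exactly zero, while the interaction is large.

```python
# Hypothetical cell means: (medicine A given?, medicine B given?) -> mean response
means = {(0, 0): 10.0, (1, 0): 15.0, (0, 1): 15.0, (1, 1): 10.0}

# Main effects: difference of marginal means, averaged over the other factor
main_a = (means[(1, 0)] + means[(1, 1)]) / 2 - (means[(0, 0)] + means[(0, 1)]) / 2
main_b = (means[(0, 1)] + means[(1, 1)]) / 2 - (means[(0, 0)] + means[(1, 0)]) / 2

# Effect of A when given alone, and the 2x2 interaction contrast
effect_a_alone = means[(1, 0)] - means[(0, 0)]
interaction = (means[(1, 1)] - means[(0, 1)]) - (means[(1, 0)] - means[(0, 0)])

print(main_a, main_b, effect_a_alone, interaction)  # 0.0 0.0 5.0 -10.0
```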
21 Again, the overall variability expressed by $SS_{TOT}$ can be divided: $SS_{TOT} = SS_A + SS_B + SS_{AB} + SS_{error}$, where $SS_A + SS_B + SS_{AB}$ (interaction) is explained by the model and $SS_{error}$ is the error (residual) variability. $SS_{TOT}$ = sum of squared deviations from the grand mean; $SS_A$ = sum of squared deviations of the marginal means of the factor A groups from the grand mean, weighted by the number of observations (similarly $SS_B$); $SS_{AB}$ = weighted sum of squared deviations of the combination means from the means expected under pure additivity (i.e. without interaction).
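A sketch of the decomposition for a balanced design (equal numbers of replications per cell), following the definitions above. The data are hypothetical numbers of species in a 2 × 2 mowing × fertilization experiment with three replications. In unbalanced designs the pieces no longer add up this simply.

```python
def two_way_ss(data):
    """data[i][j]: list of replicate values for level i of A and level j of B.
    Balanced design assumed. Returns SS_A, SS_B, SS_AB, SS_error, SS_TOT."""
    a, b = len(data), len(data[0])
    reps = len(data[0][0])  # replications per cell
    values = [x for row in data for cell in row for x in cell]
    grand = sum(values) / len(values)
    cell = [[sum(c) / reps for c in row] for row in data]
    mean_a = [sum(row) / b for row in cell]
    mean_b = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    ss_tot = sum((x - grand) ** 2 for x in values)
    ss_a = b * reps * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * reps * sum((m - grand) ** 2 for m in mean_b)
    # deviation of each cell mean from what pure additivity would predict
    ss_ab = reps * sum((cell[i][j] - (mean_a[i] + mean_b[j] - grand)) ** 2
                       for i in range(a) for j in range(b))
    ss_err = sum((x - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in data[i][j])
    return ss_a, ss_b, ss_ab, ss_err, ss_tot


# rows: non-mown, mown; columns: non-fertilized, fertilized (hypothetical)
data = [[[18, 20, 22], [13, 14, 15]],
        [[25, 26, 27], [23, 24, 25]]]
ss_a, ss_b, ss_ab, ss_err, ss_tot = two_way_ss(data)
print(ss_a, ss_b, ss_ab, ss_err, ss_tot)  # 192.0 48.0 12.0 14.0 266.0
```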
22 Example: mown, fertilized, number of species as the response. Test of the null hypothesis that the mean number of species is zero everywhere (the intercept)
23 a and b are the numbers of levels of factors A and B, and n is the total number of observations in all groups. Then $DF_A = a-1$, $DF_B = b-1$, $DF_{AB} = (a-1)(b-1)$, $DF_{TOT} = n-1$ and $DF_{error} = DF_{TOT} - DF_A - DF_B - DF_{AB}$. Again, the ratio MS = SS/DF is an estimate of the overall variance if the null hypothesis is true.
24 If all the effects are fixed, the test is: $F_{effect} = MS_{effect} / MS_{error}$
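Given the sums of squares, the degrees of freedom, mean squares and F ratios for the all-fixed case follow mechanically from the formulas above. The SS values here are hypothetical (a 2 × 2 design with 12 observations in total).

```python
def anova_table(ss, a, b, n):
    """ss: dict with keys 'A', 'B', 'AB', 'error'; a, b: numbers of levels;
    n: total number of observations. All effects fixed, so every F uses MS_error."""
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1)}
    df["error"] = (n - 1) - df["A"] - df["B"] - df["AB"]
    ms = {k: ss[k] / df[k] for k in df}
    f = {k: ms[k] / ms["error"] for k in ("A", "B", "AB")}
    return df, ms, f


ss = {"A": 192.0, "B": 48.0, "AB": 12.0, "error": 14.0}  # hypothetical values
df, ms, f = anova_table(ss, a=2, b=2, n=12)
for effect in ("A", "B", "AB"):
    print(effect, df[effect], ms[effect], round(f[effect], 2))
print("error", df["error"], ms["error"])  # error 8 1.75
```

For p-values, each F can be passed with its two DF to an F-distribution survival function, e.g. `scipy.stats.f.sf(F, df_effect, df_error)` if SciPy is available.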
25 Problem: what goes in the denominator depends on which factor has a fixed effect and which a random effect. This is especially important if one of the factors is experimental (and thus of our main interest) and the other is locality. Important for planning the experimental design!
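The choice of denominator can be written down as a small rule. This sketch assumes the classical “restricted” mixed-model convention found in many biostatistics textbooks. Under it, a main effect is tested against MS_AB when the other factor is random and against MS_error otherwise, and the interaction is always tested against MS_error. The “unrestricted” convention tests the random factor of a mixed model differently, so check which one your software uses.

```python
def f_denominator(effect, a_random, b_random):
    """Mean square for the F denominator in a two-way ANOVA with interaction
    (restricted mixed-model convention assumed)."""
    if effect == "AB":
        return "MS_error"
    if effect not in ("A", "B"):
        raise ValueError("effect must be 'A', 'B' or 'AB'")
    other_is_random = b_random if effect == "A" else a_random
    return "MS_AB" if other_is_random else "MS_error"


# Experimental treatment (A, fixed) crossed with locality (B, random)
for effect in ("A", "B", "AB"):
    print(effect, f_denominator(effect, a_random=False, b_random=True))
# A MS_AB
# B MS_error
# AB MS_error
```

In the treatment × locality case, the treatment is tested against the treatment × locality interaction. This is why the number of localities, not just the number of plots, limits the power of the test of the treatment.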
26 I, the experimenter, decide which model I will use: “classic” factorial ANOVA, or ANOVA without interactions (also called main effects ANOVA), where “non-additivity” becomes part of the random variability. The latter makes it possible to work with data with one observation for each factor combination (better avoid it, though).
27 Experimental design [diagram panels: completely randomized blocks; Latin square; WRONG: pseudoreplications]
28 Completely randomized blocks: I test by two-way analysis of variance without replication (the error variability is the deviation from additivity, i.e. the interaction between block and treatment). It can give a more powerful test if the blocks explain something, i.e. help to control variability.
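A sketch of the blocks analysis: with one observation per treatment in each block, the block × treatment interaction (deviation from additivity) serves as the error term. For comparison, the same hypothetical data are also analysed by one-way ANOVA that ignores blocks, to show how much power blocking can add when blocks differ strongly.

```python
def rcbd_f(y):
    """F for treatments in complete randomized blocks without replication.
    y[i][j]: the single observation of treatment i in block j."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    t_means = [sum(row) / b for row in y]
    b_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ss_treat = b * sum((m - grand) ** 2 for m in t_means)
    # error = block x treatment "interaction", i.e. deviations from additivity
    ss_err = sum((y[i][j] - t_means[i] - b_means[j] + grand) ** 2
                 for i in range(a) for j in range(b))
    return (ss_treat / (a - 1)) / (ss_err / ((a - 1) * (b - 1)))


def oneway_f(y):
    """The same data analysed by one-way ANOVA, ignoring blocks."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    t_means = [sum(row) / b for row in y]
    ss_treat = b * sum((m - grand) ** 2 for m in t_means)
    ss_within = sum((x - t_means[i]) ** 2 for i, row in enumerate(y) for x in row)
    return (ss_treat / (a - 1)) / (ss_within / (a * (b - 1)))


# Hypothetical data: 3 treatments (rows) x 4 blocks (columns); blocks differ a lot
y = [[10, 14, 18, 22],
     [12, 15, 20, 25],
     [13, 17, 21, 26]]
print(rcbd_f(y), oneway_f(y))  # roughly 55 vs 0.36
```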
29 Multiple comparisons: similar to one-way analysis of variance. If I do them “on the interaction”, I compare all factorially formed groups with each other; if I do them on a main effect, I compare the additive effects of single levels. I am the one who decides what will be compared.
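The difference in what gets compared can be made concrete by listing the pairs. The factor levels here are hypothetical: a three-level mowing factor crossed with fertilization. Note how quickly the comparisons “on the interaction” multiply, which is why a multiple-comparison procedure is needed.

```python
from itertools import combinations, product

a_levels = ["unmown", "mown once", "mown twice"]  # hypothetical levels
b_levels = ["non-fertilized", "fertilized"]

# "On the interaction": every factorial group against every other group
cells = list(product(a_levels, b_levels))
interaction_pairs = list(combinations(cells, 2))

# "On the main effect" of A: only the (additive) levels of A are compared
main_a_pairs = list(combinations(a_levels, 2))

print(len(interaction_pairs), len(main_a_pairs))  # 15 3
```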
30 Friedman test: nonparametric ANOVA for completely randomized blocks. It is based on ranking the values within each block: $\chi^2_r = \frac{12}{a\,b\,(a+1)} \sum_{i=1}^{a} R_i^2 - 3\,b\,(a+1)$, where a is the number of levels of the studied factor, b is the number of blocks and $R_i$ is the sum of ranks for level i of the studied factor.
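A minimal sketch of this statistic. Tied values within a block get average ranks, but no further correction for ties is applied. The data are hypothetical. Under the null hypothesis the statistic is approximately χ² distributed with a − 1 degrees of freedom.

```python
def average_ranks(values):
    """Ranks 1..n within one block; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(y):
    """y[j][i]: value of level i of the studied factor in block j (b blocks, a levels)."""
    b, a = len(y), len(y[0])
    rank_sums = [0.0] * a
    for block in y:
        for i, r in enumerate(average_ranks(block)):
            rank_sums[i] += r
    return 12 / (a * b * (a + 1)) * sum(r * r for r in rank_sums) - 3 * b * (a + 1)


# Hypothetical data: 4 blocks x 3 levels, same ordering in every block
y = [[1.2, 3.4, 5.0], [2.0, 2.5, 4.1], [0.5, 1.0, 1.5], [3.3, 3.9, 7.2]]
print(friedman_statistic(y))  # 8.0, the maximum b*(a-1) for perfect agreement
```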
31 Two-factorial experiment: I compare daisies and sunflowers and their response to the level of nutrients (the response is plant height). Three null hypotheses: 1. The heights of daisies and sunflowers don't differ (this sometimes happens: we are testing a totally unrealistic null hypothesis that we obviously didn't need to test). 2. Plant height is independent of the level of nutrients. 3. The effect of the level of nutrients is the same for both species.
32 We have a problem: the data are positively skewed (the least important problem). There is distinct inhomogeneity of variances (the CV could be constant, i.e. SD depends linearly on the mean). The classic interaction tests additivity: thus if fertilization elongates daisies from 10 to 20 cm, sunflowers should be elongated from 100 to 110 cm. From a biological point of view, this isn't exactly “the same effect” on both species.
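33 Additive effect: $X_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$. Multiplicative effect: $X_{ijk} = \mu\,\alpha_i\,\beta_j\,\varepsilon_{ijk}$; the error multiplies every value, thus SD depends linearly on the mean, and $\varepsilon_{ijk}$ has a lognormal distribution centered around 1. After log transformation, $\log X_{ijk} = \log\mu + \log\alpha_i + \log\beta_j + \log\varepsilon_{ijk}$, i.e. the multiplicative effect is changed to an additive one.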
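A quick numerical check of this point, with hypothetical mean heights in which fertilization doubles the height of both species. On the raw scale the interaction contrast is large; on the log scale it vanishes.

```python
import math

# Hypothetical mean heights (cm): fertilization doubles both species (multiplicative effect)
heights = {("daisy", "low"): 10.0, ("daisy", "high"): 20.0,
           ("sunflower", "low"): 100.0, ("sunflower", "high"): 200.0}


def interaction_contrast(m, transform=lambda x: x):
    """(effect of nutrients in sunflower) - (effect of nutrients in daisy) on a given scale."""
    t = {k: transform(v) for k, v in m.items()}
    return ((t[("sunflower", "high")] - t[("sunflower", "low")])
            - (t[("daisy", "high")] - t[("daisy", "low")]))


print(interaction_contrast(heights))            # 90.0: strong "interaction" on the raw scale
print(interaction_contrast(heights, math.log))  # approximately 0: additive on the log scale
```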
34 Logarithmic transformation: changes a lognormal distribution to a normal one; if SD depended linearly on the mean, it leads to homogeneity of variances; changes multiplicative effects to additive ones. ATTENTION: it does all of this simultaneously; I cannot have just one of these effects.
35 Many biological data contain zeros. The often-used transformation X′ = log(X+1) has similar properties, but not exactly the same ones, especially when there are low X values. The change from multiplicativity to additivity can be particularly inaccurate!!! Sometimes X′ = log(bX+a) is used, where a and b are constants (but the change from multiplicativity to additivity is never exactly achieved).
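A hypothetical example of why log(X+1) is only approximately right. Factor B doubles the response at both levels of A, so log removes the interaction exactly, but log(X+1) leaves a clear one because the low values are distorted most. (`math.log1p(x)` computes log(1 + x).)

```python
import math

# Hypothetical means: B doubles the response at both levels of A (purely multiplicative)
means = {("A1", "B1"): 1.0, ("A1", "B2"): 2.0, ("A2", "B1"): 10.0, ("A2", "B2"): 20.0}


def contrast(transform):
    """Interaction contrast of the 2x2 means on the transformed scale."""
    t = {k: transform(v) for k, v in means.items()}
    return (t[("A2", "B2")] - t[("A2", "B1")]) - (t[("A1", "B2")] - t[("A1", "B1")])


print(contrast(math.log))    # approximately 0: log makes the effects additive
print(contrast(math.log1p))  # about 0.24: log(X+1) leaves a spurious interaction
```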
36 Other transformations used: for the Poisson distribution (numbers of randomly placed individuals); for percentages (p as a number between 0 and 1)
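The formulas on this slide did not survive extraction. The transformations usually recommended in these two situations are the square root for Poisson counts (sometimes with a small offset such as √(X + 0.5) when small counts occur) and the angular, arcsine square-root transformation for proportions. The sketch below implements those standard forms; they are assumed, not copied from the slide.

```python
import math


def sqrt_transform(count, offset=0.0):
    """Square-root transformation for Poisson-like counts; offset (e.g. 0.5) is an optional variant."""
    return math.sqrt(count + offset)


def arcsine_transform(p):
    """Angular (arcsine square-root) transformation of a proportion p in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.asin(math.sqrt(p))


print(sqrt_transform(9))        # 3.0
print(arcsine_transform(0.5))   # about 0.785 (pi/4)
```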
37 Nested design: we measure the length of corolla tubes [diagram: factor A (locality), factor C (plant), single observations]. Plant 1 from the first locality has nothing in common with plant 1 from any other locality.
38 The top factor in the hierarchy can have either a fixed or a random effect. Factors in lower positions of the hierarchy almost always have random effects (it is possible to compute them as fixed too, but that is a very unusual case). In the analysis of sums of squares, we sum the squared differences between each observation (or mean) and its hierarchically nearest upper relevant mean. If the hierarchically lower effects are random, we test every effect against the nearest hierarchically lower effect.
39 Null hypothesis on the lower hierarchical level: plants do not differ in the mean length of their tubes within a locality. Test of the null hypothesis that the mean tube length is zero. $F_{locality} = MS_{locality}/MS_{plant}$, $F_{plant} = MS_{plant}/MS_{error} = 2.15/2.24 = 0.96$. Ideally the model is balanced; Statistica computes it even if it isn't, but then various approximations are used….
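A sketch of the balanced two-level nested ANOVA described above: localities, plants within localities, and repeated measurements on each plant. Each sum of squares is taken around the nearest upper mean, and each effect is tested against the next lower level. The tube lengths are hypothetical.

```python
def nested_anova(data):
    """data[i][j]: list of measurements on plant j within locality i.
    Balanced: a localities, b plants per locality, n measurements per plant."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    values = [x for loc in data for plant in loc for x in plant]
    grand = sum(values) / len(values)
    plant_means = [[sum(p) / n for p in loc] for loc in data]
    loc_means = [sum(pm) / b for pm in plant_means]
    # squared deviations from the hierarchically nearest upper mean
    ss_loc = b * n * sum((m - grand) ** 2 for m in loc_means)
    ss_plant = n * sum((plant_means[i][j] - loc_means[i]) ** 2
                       for i in range(a) for j in range(b))
    ss_err = sum((x - plant_means[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in data[i][j])
    ms_loc = ss_loc / (a - 1)
    ms_plant = ss_plant / (a * (b - 1))
    ms_err = ss_err / (a * b * (n - 1))
    # each effect is tested against the nearest lower (random) level
    return {"F_locality": ms_loc / ms_plant, "F_plant": ms_plant / ms_err,
            "MS": (ms_loc, ms_plant, ms_err)}


# Hypothetical tube lengths (mm): 2 localities x 2 plants x 2 flowers
data = [[[10, 12], [11, 13]],
        [[15, 17], [16, 14]]]
result = nested_anova(data)
print(result["F_locality"], result["F_plant"])  # 32.0 0.5
```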
40 Most frequent use: analysis of variability among the individual hierarchical levels, e.g. in taxonomy. Often I am interested mainly (or only) in the hierarchically higher factor; everything else just increases the test power. E.g. I can have just 6 enclosures, three grazed and three ungrazed (I am not able to have more). In each of them I lay out 10 squares for biomass sampling, and I do three analytic determinations from every square. Analysis of variability can help me to plan the optimal sampling design.
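One way to use the analysis of variability for planning, sketched under assumptions. In a balanced nested design with s squares per enclosure and d determinations per square, method-of-moments estimates give σ²_det = MS_det and σ²_square = (MS_square − MS_det)/d. The sampling variance of an enclosure mean is then σ²_square/s + σ²_det/(s·d). The MS values below are hypothetical. Between-enclosure variability is left out, since no amount of sampling within an enclosure can reduce it.

```python
def variance_components(ms_square, ms_det, d):
    """Method-of-moments estimates for squares within an enclosure and d determinations
    per square (balanced design). A negative estimate is truncated to zero."""
    var_det = ms_det
    var_square = max((ms_square - ms_det) / d, 0.0)
    return var_square, var_det


def var_of_enclosure_mean(var_square, var_det, s, d):
    """Sampling variance of one enclosure's mean biomass with s squares, d determinations each."""
    return var_square / s + var_det / (s * d)


var_square, var_det = variance_components(ms_square=40.0, ms_det=4.0, d=3)  # hypothetical MS
for s, d in [(10, 3), (10, 1), (20, 1), (5, 6)]:
    print(s, d, round(var_of_enclosure_mean(var_square, var_det, s, d), 3))
```

With these numbers, extra determinations per square buy little compared with extra squares.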
41 Beware of mixed (pooled) samples: they can save me work, but they must be independently replicated! [Diagram label: these aren't independent observations.]
42 More complicated ANOVA models: factorial and nested designs can be combined in different ways, with some factors having fixed effects and some random ones.
43 Split plot (main plots and split plots, two error levels): 6 plots (3 on calcite, 3 on granite), 3 types of impacts in each plot
44 Repeated-measures ANOVA: I have some experimental design and I follow the state of individual objects over time, e.g. growing plants.