Multivariate Description

1 Multivariate Description

2 What Technique?
One response variable, no predictors: distribution summary
One response variable, with predictors: regression models
Many response variables, no predictors: indirect gradient analysis (PCA, CA, DCA, MDS); cluster analysis
Many response variables, with predictors: direct gradient analysis; constrained cluster analysis; discriminant analysis (CVA)

3 Rotate the Variable Space

4 Raw Data

5 Linear Regression

6 Two Regressions

7 Principal Components

8 Gulls Variables

9 Scree Plot

10 Output
> summary(gulls.pca2)
Importance of components: Standard deviation, Proportion of Variance and Cumulative Proportion for each component (numeric values not preserved)
> gulls.pca2$loadings
Loadings: rows Weight, Wing, Bill, H.and.B; columns Comp.1 to Comp.4 (numeric values not preserved)

11 Bi-Plot
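
The PCA steps on the preceding slides (fit, importance of components, loadings, scree plot, biplot) can be sketched as follows. This is only an illustration: the gulls data are not available here, so the built-in USArrests data set is used as a stand-in, and the object names are assumptions.

```r
# Stand-in data: the gulls measurements are not included here, so the
# built-in USArrests data set (4 numeric variables) is used instead.
pca <- princomp(USArrests, cor = TRUE)   # PCA on the correlation matrix

summary(pca)     # standard deviation, proportion and cumulative proportion of variance
pca$loadings     # variable loadings on each component

# Proportion of variance explained by each component, computed by hand
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_prop <- cumsum(prop_var)

screeplot(pca)   # scree plot of the component variances
biplot(pca)      # observations and variables on the first two components
```

Using `cor = TRUE` puts variables measured in different units on a common scale before rotation; whether the original analysis did so is not stated.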

12 Male or Female?

13 Linear Discriminant
> gulls.lda <- lda(Sex ~ Wing + Weight + H.and.B + Bill, gulls)
lda(Sex ~ Wing + Weight + H.and.B + Bill, data = gulls)
Prior probabilities of groups: (values not preserved)
Group means: Wing, Weight, H.and.B, Bill (values not preserved)
Coefficients of linear discriminants: LD1 for Wing, Weight, H.and.B, Bill (values not preserved)
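
A minimal sketch of the same kind of two-group discriminant analysis, assuming the MASS package is installed. Two iris species stand in for the male and female gulls, since the gulls data are not available here.

```r
library(MASS)  # provides lda()

# Stand-in data: two iris species replace the male/female gull groups.
two <- droplevels(subset(iris, Species != "setosa"))

fit <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = two)
fit                      # prior probabilities, group means, LD1 coefficients

pred <- predict(fit)     # discriminant scores and predicted classes
table(observed = two$Species, predicted = pred$class)
```

With two groups there is a single discriminant axis (LD1), as in the gulls output above.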

14 Discriminating

15 Relationship between PCA and LDA

16 CVA

17 CVA

18 Managing Dimensionality (but not acronyms) PCA, CA, RDA, CCA, MDS, NMDS, DCA, DCCA, pRDA, pCCA

19 Type of Data Matrix
Rows: sites or individuals. Columns: species or attributes.
Data sets: desert, macroph, inverts, uses, watervar, rain, gulls

20 Models of Species Response
There are (at least) two models:
Linear - species increase or decrease along the environmental gradient
Unimodal - species rise to a peak somewhere along the environmental gradient and then fall again
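
The two response models can be drawn with a short sketch. The gradient, the coefficients and the Gaussian form of the unimodal curve are all assumptions chosen for illustration.

```r
gradient <- seq(0, 10, length.out = 101)   # hypothetical environmental gradient

# Linear model: abundance changes steadily along the gradient
linear_resp <- 2 + 0.8 * gradient

# Unimodal model: Gaussian curve with optimum, tolerance and maximum abundance
optimum <- 6; tolerance <- 1.5; peak <- 10
unimodal_resp <- peak * exp(-(gradient - optimum)^2 / (2 * tolerance^2))

plot(gradient, unimodal_resp, type = "l", ylab = "abundance")
lines(gradient, linear_resp, lty = 2)
```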

21 A Theoretical Model

22 Linear

23 Unimodal

24 Ordination Techniques
Unconstrained (indirect), linear method: Principal Components Analysis (PCA)
Unconstrained (indirect), weighted averaging (unimodal): Correspondence Analysis (CA)
Constrained (direct), linear method: Redundancy Analysis (RDA)
Constrained (direct), weighted averaging (unimodal): Canonical Correspondence Analysis (CCA)

25 Inferring Gradients from Species (or Attribute) Data

26 Indirect Gradient Analysis
Environmental gradients are inferred from species data alone.
Three methods:
Principal Component Analysis - linear model
Correspondence Analysis - unimodal model
Detrended CA - modified unimodal model

27 PCA - linear model

28 PCA - linear model

29 Terschelling Dune Data

30 PCA gradient - site plot

31 PCA gradient - site/species biplot
Plot labels: standard; biodynamic & hobby; nature

32 Making Effective Use of Environmental Variables

33 Approaches
Use single responses in linear models of environmental variables.
Use axes of a multivariate dimension reduction technique as responses in linear models of environmental variables.
Constrain the multivariate dimension reduction into the factor space defined by the environmental variables.

34 Ordination Constrained by the Environmental Variables

35 Constrained?

36 Working with the Variability that we Can Explain
Start with all the variability in the response variables.
Replace the original observations with their fitted values from a model employing the environmental variables as explanatory variables (discarding the residual variability).
Carry out gradient analysis on the fitted values.
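
The three steps can be sketched in base R with simulated data. All variable names and the simulated responses are assumptions; this shows the idea behind a linear constrained ordination rather than the implementation in any particular package.

```r
set.seed(1)
n <- 30
env <- data.frame(z1 = rnorm(n), z2 = rnorm(n))      # hypothetical environmental variables
Y <- cbind(sp1 = 2 * env$z1 + rnorm(n),              # hypothetical response variables
           sp2 = -env$z1 + env$z2 + rnorm(n),
           sp3 = rnorm(n))

Yc <- scale(Y, center = TRUE, scale = FALSE)         # 1. all the variability (centred)
fit <- lm(Yc ~ z1 + z2, data = env)                  # 2. model on the environmental variables
Yfit <- fitted(fit)                                  #    keep fitted values, drop residuals
constrained <- prcomp(Yfit)                          # 3. ordination of the fitted values
unconstrained <- prcomp(Yc)                          #    unconstrained ordination for comparison

total_var <- sum(unconstrained$sdev^2)
explained_var <- sum(constrained$sdev^2)
explained_var / total_var                            # share of variability that is explained
```

With two environmental variables the fitted values span at most two dimensions, so the constrained ordination has at most two axes with non-zero variance.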

37 Unconstrained/Constrained
Unconstrained ordination axes correspond to the directions of the greatest variability within the data set. Constrained ordination axes correspond to the directions of the greatest variability of the data set that can be explained by the environmental variables.

38 Dune Data Unconstrained

39 Direct Gradient Analysis
Environmental gradients are constructed from the relationship between species and environmental variables.
Three methods:
Redundancy Analysis - linear model
Canonical (or Constrained) Correspondence Analysis - unimodal model
Detrended CCA - modified unimodal model

40 Direct Gradient Analysis
Basic PCA: $y_{ik} = b_{0k} + b_{1k} x_i + e_{ik}$
$x_i$ - the sample scores on the ordination axis
$b_{1k}$ - the regression coefficients for each species (the species scores on the ordination axis)
In RDA there is a further constraint on $x_i$: $x_i = c_1 z_{i1} + c_2 z_{i2}$
Making $y_{ik} = b_{0k} + b_{1k} c_1 z_{i1} + b_{1k} c_2 z_{i2} + e_{ik}$

41 Direct Gradient Analysis
cca(species_data ~ e1 + e2 + ... + en, data=environmental_data)
cca(dune ~ Manure + Moisture + A1, data=dune.env)

42 Dune Data Constrained

43 Lake Nasser - Egypt

44 Nasser Data
Sites – 23 sampling stations on Lake Nasser
3 Data Frames:
Aquatic macrophytes
Invertebrate classes
Water chemistry

45 Lake Nasser Unconstrained

46 Lake Nasser Constrained

47 Modelling Environmental Variables

48 Ways of Building Models
Automated environmental variable selection (stepwise addition or removal of variables from the model – as with multiple regression)
mod0 <- cca(nasser.inverts ~ 1, nasser.watervar)
mod1 <- cca(nasser.inverts ~ ., nasser.watervar)
op <- options(digits=7)
mod <- step(mod0, scope=formula(mod1))
options(op)
mod
plot(mod)

49 Ways of Building Models
Manual selection of environmental variables using prior knowledge (e.g. starting with the full model and removing terms)
mod1 <- cca(nasser.inverts ~ ., nasser.watervar)
mod2 <- cca(nasser.inverts ~ . -WMg, nasser.watervar)
mod3 <- cca(nasser.inverts ~ . -WMg -WEC, nasser.watervar)
mod4 <- cca(nasser.inverts ~ . -WMg -WEC -WCa, nasser.watervar)

50 Ways of Evaluating Models
Graphically using Procrustes Rotation
plot(procrustes(mod2, mod1))
plot(procrustes(mod3, mod2))
plot(procrustes(mod4, mod3))
plot(procrustes(mod4, mod1))
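
What a Procrustes rotation does can be sketched in base R. The helper `procrustes_rotate` is hypothetical and handles rotation only (no scaling), so it illustrates the idea rather than reproducing the `procrustes()` function called above. The site scores are simulated.

```r
procrustes_rotate <- function(X, Y) {
  # Rotate centred Y so that it matches centred X as closely as possible
  # in the least-squares sense (orthogonal Procrustes problem).
  Xc <- scale(X, scale = FALSE)
  Yc <- scale(Y, scale = FALSE)
  s <- svd(crossprod(Yc, Xc))      # SVD of Y'X
  R <- s$u %*% t(s$v)              # optimal orthogonal rotation
  Yrot <- Yc %*% R
  list(rotation = R, Yrot = Yrot, ss = sum((Xc - Yrot)^2))
}

set.seed(2)
X <- matrix(rnorm(20), ncol = 2)   # hypothetical site scores from one model
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2)
Y <- X %*% rot                     # the same configuration, rotated by 30 degrees

res <- procrustes_rotate(X, Y)
res$ss                             # residual sum of squares, essentially zero here
```

A small residual sum of squares means two ordinations tell the same story; a large one means dropping the term changed the configuration.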

51 Procrustes

52 Ways of Evaluating Models
Permutation tests can be used to assess the adequacy of the models, using a pseudo-ANOVA or permutest:
anova(mod1)
anova(mod2)
anova(mod3)
anova(mod4)
permutest(mod1, permutations = 1000)
permutest(mod2, permutations = 1000)
permutest(mod3, permutations = 1000)
permutest(mod4, permutations = 1000)

53 Removing the Effect of Nuisance Variables

54 Getting rid of the Variability that is Not of Interest
Amongst the explanatory variables there may be variability attributable to:
Blocks and other design strata
Covariates that we can measure but that are not the focus of interest
We may want to use only the variability attributable to:
Meaningful Environmental Variables

55 Partial Analyses
Remove the effect of covariates: variables that we can measure but which are of no interest, e.g. block effects, start values, etc.
Carry out the gradient analysis on what is left of the variation after removing the effect of the covariates.
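
A base-R sketch of the partial idea, using simulated data: the block factor and the responses are assumptions. The covariate is regressed out first, and the ordination is run on the residuals.

```r
set.seed(4)
n <- 30
block <- factor(rep(1:3, each = 10))                  # hypothetical nuisance factor
Y <- matrix(rnorm(n * 3), n, 3) + as.integer(block)   # responses shifted by block

resid_Y <- residuals(lm(Y ~ block))   # remove the block effect
partial_pca <- prcomp(resid_Y)        # ordination of what is left

# Block means of the residuals are zero, so the block effect is gone
block_means <- apply(resid_Y, 2, function(col) tapply(col, block, mean))
max(abs(block_means))
```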

56 Lichen-rich Forest Understorey

57 Forest Data
Sites – 28 sites in forests in Finland grazed by reindeer
Species Data – 44 heathland plant species (including many lichens and mosses that are very sensitive to their chemical environment)
Environmental Data – Soil chemical composition (N P K Ca Mg S Al Fe Mn Zn Mo Baresoil Humdepth pH)

58 CCA

59 Removing pH Effect
cca(species_data ~ e1 + e2 + ... + en + Condition(e5), data=environmental_data)
cca(varespec ~ Al + P + K + Baresoil + Condition(pH), data=varechem)

60 Removing pH Effect

61 Interactions in Models
cca(species_data ~ e1 + e2 + ... + en + Condition(e5), data=environmental_data)
cca(varespec ~ Al + P*(K + Baresoil) + Condition(pH), data=varechem)

62 CCA

63 Removing pH Effect

64 Cluster Analysis

65 Different types of data
Continuous data, example: height
Categorical data, ordered (ordinal), example: growth rate (very slow, slow, medium, fast, very fast)
Categorical data, not ordered (nominal), example: fruit colour (yellow, green, purple, red, orange)
Binary data, example: fruit / no fruit

66 Similarity matrix
We define a similarity between units, like the correlation between continuous variables (it can also be a dissimilarity or distance matrix).
A similarity can be constructed as an average of the similarities between the units on each variable (a weighted average can be used).
This provides a way of combining different types of variables.
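
Averaging per-variable similarities can be sketched as below, in the spirit of Gower's coefficient. The data frame, the `similarity` helper and the choice of an unweighted mean are assumptions for illustration.

```r
units <- data.frame(                     # hypothetical plants
  height = c(1.2, 0.8, 2.0),             # continuous
  colour = c("red", "red", "green"),     # categorical, not ordered
  fruit  = c(1, 0, 1)                    # binary
)

similarity <- function(i, j, d) {
  # Continuous: 1 minus the range-scaled absolute difference
  s_height <- 1 - abs(d$height[i] - d$height[j]) / diff(range(d$height))
  # Categorical and binary: 1 if the two units match, 0 otherwise
  s_colour <- as.numeric(d$colour[i] == d$colour[j])
  s_fruit  <- as.numeric(d$fruit[i] == d$fruit[j])
  mean(c(s_height, s_colour, s_fruit))   # unweighted average over variables
}

n <- nrow(units)
S <- outer(1:n, 1:n, Vectorize(function(i, j) similarity(i, j, units)))
S                                        # similarity matrix; 1 - S is a dissimilarity
```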

67 Distance metrics relevant for continuous variables:
Euclidean
city block or Manhattan
(also many other variations)
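
For two units A and B measured on p variables, the Euclidean distance is $\sqrt{\sum_{j=1}^{p} (a_j - b_j)^2}$ and the city block distance is $\sum_{j=1}^{p} |a_j - b_j|$. A small check with two made-up points:

```r
pts <- rbind(A = c(0, 0), B = c(3, 4))   # two hypothetical units

d_euclid <- dist(pts, method = "euclidean")   # sqrt(3^2 + 4^2) = 5
d_city   <- dist(pts, method = "manhattan")   # |3| + |4| = 7
```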

68 A Distance Matrix

69 Uses of Distances
Distance/Dissimilarity can be used to:
Explore dimensionality in data (using PCO)
Provide a basis for clustering/classification
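
Both uses start from the same distance matrix, as this sketch shows. The built-in USArrests data set is a stand-in, and the choice of four groups is arbitrary.

```r
d <- dist(scale(USArrests))              # stand-in data: Euclidean distances

pco <- cmdscale(d, k = 2, eig = TRUE)    # principal coordinates analysis (PCO)
plot(pco$points, xlab = "PCO 1", ylab = "PCO 2")

tree <- hclust(d, method = "average")    # the same distances drive the clustering
groups <- cutree(tree, k = 4)            # cut the tree into 4 groups
```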

70 UK Wet Deposition Network

71 Fitting Environmental Variables

72 A Map based on Measured Variables

73 Fitting Environmental Variables

74 Similarity coefficients for binary data
simple matching: count if both units 0 or both units 1
Jaccard: count only if both units 1
(also many other variants)
simple matching can be extended to categorical data
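
The difference between the two coefficients is whether joint absences count as agreement. A sketch with two made-up presence/absence vectors:

```r
u <- c(1, 1, 0, 0, 1)    # hypothetical presence/absence for unit 1
v <- c(1, 0, 0, 1, 1)    # hypothetical presence/absence for unit 2

both_one  <- sum(u == 1 & v == 1)   # joint presences
both_zero <- sum(u == 0 & v == 0)   # joint absences
mismatch  <- sum(u != v)            # one present, the other absent

simple_matching <- (both_one + both_zero) / length(u)   # counts 1,1 and 0,0
jaccard <- both_one / (both_one + mismatch)             # ignores 0,0
```

Here simple matching is 3/5 = 0.6 and Jaccard is 2/4 = 0.5, because the single joint absence is dropped from the Jaccard calculation.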

75 Clustering methods
hierarchical
divisive: put everything together and split (monothetic / polythetic)
agglomerative: keep everything separate and join the most similar points (classical cluster analysis)
non-hierarchical
k-means clustering

76 Agglomerative hierarchical
Single linkage or nearest neighbour
finds the minimum spanning tree: the shortest tree that connects all points
chaining can be a problem

77 Agglomerative hierarchical
Complete linkage or furthest neighbour
gives compact clusters of approximately equal size (it makes compact groups even when none exist)

78 Agglomerative hierarchical
Average linkage methods
intermediate between single and complete linkage
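
The three linkage rules, and k-means as the non-hierarchical alternative, can be compared on the same distances. The two simulated groups below are assumptions for illustration.

```r
set.seed(5)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),   # hypothetical group 1
           matrix(rnorm(20, mean = 5), ncol = 2))   # hypothetical group 2
d <- dist(x)

single   <- hclust(d, method = "single")     # nearest neighbour
complete <- hclust(d, method = "complete")   # furthest neighbour
average  <- hclust(d, method = "average")    # average linkage

plot(average)                                # dendrogram
groups <- cutree(average, k = 2)             # cut the tree into two clusters

km <- kmeans(x, centers = 2, nstart = 10)    # non-hierarchical k-means
```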

79 From Alexandria to Suez

80 Hierarchical Clustering

81 Hierarchical Clustering

82 Hierarchical Clustering

83 Summarise by Weighted Averages

84 Species and Sites as Weighted Averages of each other
Species (abbreviated names, in table order): Bel per, Jun buf, Jun art, Air pra, Ele pal, Rum ace, Vic lat, Bra rut, Ran fla, Hyp rad, Leo aut, Pot pal, Poa pra, Cal cus, Tri pra, Tri rep, Ant odo, Sal rep, Ach mil, Poa tri, Ely rep, Sag pro, Pla lan, Agr sto, Lol per, Alo gen, Bro hor (abundance values not preserved)

85 Species and Sites as Weighted Averages of each other

86 Reciprocal Averaging - unimodal
Sites: A, B, C, D, E, F
Species: Prunus serotina, Tilia americana, Acer saccharum, Quercus velutina, Juglans nigra

87 Reciprocal Averaging - unimodal
Sites A to F by the five species above, with a Species Score column and a Site Score row for each iteration (values not preserved)

88 Reciprocal Averaging - unimodal
Sites A to F by the five species above, with a Species Score column and a Site Score row for each iteration (values not preserved)

89 Reciprocal Averaging - unimodal
Sites A to F by the five species above, with a Species Score column and a Site Score row for each iteration (values not preserved)

90 Reciprocal Averaging - unimodal
Sites A to F by the five species above, with a Species Score column and a Site Score row for each iteration (values not preserved)

91 Reordered Sites and Species
Sites: A, C, E, B, D, F (with Site Score)
Species: Quercus velutina, Prunus serotina, Juglans nigra, Tilia americana, Acer saccharum (with Species Score)
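
The iteration on these slides can be sketched as follows. The abundance table is made up (the slide values are not available), and the starting scores and the 0 to 1 rescaling are assumptions. Species scores are weighted averages of site scores, site scores are weighted averages of species scores, and the two steps repeat until the scores settle.

```r
# Hypothetical abundance table: 5 species (rows) by 6 sites (columns)
Y <- matrix(c(5, 3, 0, 0, 0, 0,
              2, 4, 3, 0, 0, 0,
              0, 2, 5, 3, 0, 0,
              0, 0, 2, 4, 3, 0,
              0, 0, 0, 2, 4, 5),
            nrow = 5, byrow = TRUE)

site <- seq_len(ncol(Y))                                 # arbitrary starting site scores
for (iter in 1:100) {
  species <- as.vector(Y %*% site) / rowSums(Y)          # species = weighted average of sites
  site <- as.vector(t(Y) %*% species) / colSums(Y)       # sites = weighted average of species
  site <- (site - min(site)) / (max(site) - min(site))   # rescale site scores to 0..1
}

order(species)   # species ordering along the recovered gradient
order(site)      # site ordering along the recovered gradient
```

Sorting rows and columns by these scores concentrates the abundances along the diagonal, which is the reordered table shown on the slide.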

92 Gradient Length

93 Alpha and Beta Diversity
alpha diversity is the diversity of a community (measured either by a diversity index or by species richness);
beta diversity (also known as ‘species turnover’ or ‘differentiation diversity’) is the rate of change in species composition from one community to another along gradients;
gamma diversity is the diversity of a region or a landscape.
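
With species richness as the measure, the three quantities can be computed from a presence/absence table. The table is made up, and the ratio of gamma to mean alpha is only one common way of expressing beta diversity (Whittaker's multiplicative form).

```r
# Hypothetical presence/absence: 3 communities (rows) by 6 species (columns)
comm <- rbind(c(1, 1, 1, 0, 0, 0),
              c(0, 1, 1, 1, 0, 0),
              c(0, 0, 0, 1, 1, 1))

alpha <- rowSums(comm > 0)          # species richness of each community
gamma <- sum(colSums(comm) > 0)     # richness of the whole region
beta  <- gamma / mean(alpha)        # turnover: gamma relative to mean alpha
```

Here each community holds 3 species, the region holds 6, and beta is 2.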

94 A Short Coenocline

95 A Long Coenocline

96 Arches - Artifact or Feature?

97 The Arch Effect
What is it?
Why does it happen?
What should we do about it?

98 CA - with arch effect (sites)

99 CA - with arch effect (species)

100 Long Gradients A B C D

101 Gradient End Compression

102 CA - with arch effect (species)

103 CA - with arch effect (sites)

104 Detrending by Segments

105 DCA - modified unimodal

106 Testing Significance in Ordination

107 Randomisation Tests

108 Randomisation Tests

109 Randomisation Example
Model: cca(formula = dune ~ Moisture + A1 + Management, data = dune.env)
Columns: Df, Chisq, F, N.Perm, Pr(>F); rows: Model, Residual (numeric values not preserved; the Model row is flagged ***)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
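
The logic of a randomisation test can be sketched in base R with a single response and a single predictor. The data are simulated and the `f_stat` helper is hypothetical; the same shuffle-and-refit idea underlies the pseudo-F test reported above.

```r
set.seed(3)
n <- 40
x <- rnorm(n)                  # hypothetical environmental variable
y <- 0.8 * x + rnorm(n)        # hypothetical response

# F statistic of the regression of y on x
f_stat <- function(y, x) summary(lm(y ~ x))$fstatistic[["value"]]

f_obs <- f_stat(y, x)
n_perm <- 999
f_perm <- replicate(n_perm, f_stat(sample(y), x))   # shuffle the response, refit

# Proportion of statistics (observed one included) at least as large as observed
p_value <- (sum(f_perm >= f_obs) + 1) / (n_perm + 1)
p_value
```

Counting the observed statistic among the permutations keeps the p-value above zero; with 999 permutations the smallest possible value is 0.001.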

