1 An Introduction to Multivariate Analysis. Lectures 14-15. Drs. Alan S.L. Leung and Kenneth M.Y. Leung
2 Multivariate analysis. An extension to univariate (with a single variable) and bivariate (with two variables) analysis. Dealing with a number of samples and species/environmental variables simultaneously.
3 Multivariate Data Set. Data usually in the form of a data matrix, e.g.: morphological measurements of organisms (e.g. length); physiological measurements of organisms (e.g. blood pressure); physicochemical measurements of the environment (e.g. air temperature); species abundance; species richness, etc.
7 Similarity (S) between samples. Ranges from 0 to 100% (or 0 to 1). S = 100% if two samples are totally similar (i.e. the entries in the two samples are identical). S = 0 if two samples are totally dissimilar (i.e. the two samples have no species in common).
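8 Bray-Curtis coefficient (Bray & Curtis, 1957). First developed in terrestrial ecology. The similarity between samples j and k is $S_{jk} = 100\left[1 - \frac{\sum_{i=1}^{n} |y_{ij} - y_{ik}|}{\sum_{i=1}^{n} (y_{ij} + y_{ik})}\right]$, where $y_{ij}$ is the abundance of species i in sample j, $y_{ik}$ is the abundance of species i in sample k, and n is the total number of species (the sums run over species, i = 1, ..., n).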
9 Please calculate the Bray-Curtis similarity between samples: X2 and X3; X3 and Y1. In the formula, y_ij is the abundance of species i in sample j, y_ik is the abundance of species i in sample k, and n is the total number of species.
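The Bray-Curtis calculation can be sketched in a few lines of Python. This is a minimal illustration, not the lecturers' worked answer: the data table for samples X2, X3 and Y1 is not preserved in this transcript, so the abundances below (`sample_a` to `sample_d`) are hypothetical, as is the function name.

```python
def bray_curtis_similarity(sample_j, sample_k):
    """Bray-Curtis similarity (in %) between two samples.

    Each sample is a list of abundances, one entry per species,
    with the species in the same order in both lists.
    """
    if len(sample_j) != len(sample_k):
        raise ValueError("both samples must list the same species")
    numerator = sum(abs(yj - yk) for yj, yk in zip(sample_j, sample_k))
    denominator = sum(yj + yk for yj, yk in zip(sample_j, sample_k))
    if denominator == 0:
        raise ValueError("similarity is undefined when both samples are empty")
    return 100.0 * (1.0 - numerator / denominator)


# Hypothetical abundances of four species (NOT the data set on the slide).
sample_a = [10, 0, 5, 1]
sample_b = [10, 0, 5, 1]  # identical to sample_a
sample_c = [0, 7, 0, 0]   # shares no species with sample_a
sample_d = [6, 2, 5, 0]   # partly overlapping with sample_a

print(bray_curtis_similarity(sample_a, sample_b))  # 100.0 (identical samples)
print(bray_curtis_similarity(sample_a, sample_c))  # 0.0 (no species in common)
print(round(bray_curtis_similarity(sample_a, sample_d), 1))  # 75.9
```

The first two calls reproduce the two limiting cases for S given earlier (identical samples, and samples with no species in common).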
12 Transformation. Two distinct roles: (1) to meet the statistical assumptions of parametric analysis (e.g. to correct variance heterogeneity before ANOVA); (2) to weight the contributions of common and rare species in non-parametric multivariate analysis.
13 Why transform the data? To weight the contributions of common and rare species. Transformed and untransformed data can give different results in the computation of dissimilarities between samples, and this affects the final outcome (solution) of nMDS.
14 Choice of transformation in multivariate analysis. [Diagram: transformations ordered by degree of severity: square-root; fourth-root / log(1+y); presence/absence. Other labels on the diagram: intermediate abundance species; rare species; not commonly used.]
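A small sketch of how the choice of transformation changes the similarity between two samples. The abundances are hypothetical (one dominant species that differs strongly between the samples, plus several rare species that mostly agree), and the helper and dictionary names are illustrative only.

```python
import math


def bray_curtis_similarity(sample_j, sample_k):
    """Bray-Curtis similarity (in %) between two equally long abundance lists."""
    numerator = sum(abs(yj - yk) for yj, yk in zip(sample_j, sample_k))
    denominator = sum(yj + yk for yj, yk in zip(sample_j, sample_k))
    return 100.0 * (1.0 - numerator / denominator)


# Transformations in increasing order of severity.
TRANSFORMS = {
    "none": lambda y: y,
    "square-root": lambda y: math.sqrt(y),
    "fourth-root": lambda y: y ** 0.25,
    "log(1+y)": lambda y: math.log(1 + y),
    "presence/absence": lambda y: 1 if y > 0 else 0,
}

# Hypothetical data: species 1 is dominant and differs tenfold,
# the remaining (rare) species are nearly identical in both samples.
sample_a = [1000, 2, 1, 0, 3]
sample_b = [100, 2, 1, 1, 3]

similarities = {}
for name, transform in TRANSFORMS.items():
    similarities[name] = bray_curtis_similarity(
        [transform(y) for y in sample_a],
        [transform(y) for y in sample_b],
    )
    print(f"{name:>16}: S = {similarities[name]:.1f}%")
```

With these numbers the untransformed similarity is driven almost entirely by the dominant species, while the more severe transformations let the agreement among rare species raise the similarity.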
15 Species similarity matrix – Fourth-root transformed Some patterns can be seen, but…
16 Multivariate Techniques. The most widely used multivariate techniques include: Cluster Analysis; Ordination; E.g. Multiple discriminant analysis.
17 Cluster Analysis. Put samples (sites, species, or environmental variables) into groups based on their similarity. Samples within the same group are more similar to each other than samples in different groups.
18 Dendrogram. [Figure: dendrogram of samples.] Statistical software: PRIMER 5 for Windows.
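The grouping step behind a dendrogram can be sketched as agglomerative clustering on Bray-Curtis similarities. The slides do not say which linkage rule was used, so group-average linkage is an assumption here, and the four samples (`A1`, `A2`, `B1`, `B2`) are hypothetical.

```python
from itertools import combinations


def bray_curtis_similarity(sample_j, sample_k):
    """Bray-Curtis similarity (in %) between two equally long abundance lists."""
    numerator = sum(abs(yj - yk) for yj, yk in zip(sample_j, sample_k))
    denominator = sum(yj + yk for yj, yk in zip(sample_j, sample_k))
    return 100.0 * (1.0 - numerator / denominator)


def group_average_clustering(samples):
    """Agglomerative clustering with group-average linkage (assumed rule).

    samples: dict mapping sample name -> list of species abundances.
    Returns the merges in order, each as (cluster_1, cluster_2, similarity),
    where a cluster is a tuple of sample names.
    """
    pair_similarity = {
        frozenset((a, b)): bray_curtis_similarity(samples[a], samples[b])
        for a, b in combinations(samples, 2)
    }

    def average_similarity(c1, c2):
        values = [pair_similarity[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    clusters = [(name,) for name in samples]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(*pair))
        merges.append((c1, c2, average_similarity(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical abundances of four species in four samples.
samples = {
    "A1": [10, 8, 0, 0],
    "A2": [9, 7, 1, 0],
    "B1": [0, 1, 12, 9],
    "B2": [1, 0, 10, 11],
}
merges = group_average_clustering(samples)
for c1, c2, similarity in merges:
    print(f"join {c1} + {c2} at S = {similarity:.1f}%")
```

Each merge corresponds to one node of the dendrogram, drawn at the similarity level where the two groups join.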
19 Ordination. Graphical presentation technique. Ordination map (usually two- or three-dimensional). The relative distances among points in the ordination map represent the similarity among samples (say, in species composition).
20 Two Types of Ordination Techniques. Indirect gradient analysis: only includes biological data (species abundance by samples matrix); environmental data can be correlated with the ordination axes subsequently. Direct gradient analysis: includes both environmental and biological data.
22 PCA. Use original data matrix. Best-fit line: First Principal Component Axis (PC1). Source: Clarke, K. R. & Warwick, R. M. (1994) Change in Marine Communities: an Approach to Statistical Analysis and Interpretation. Plymouth Marine Laboratory, Plymouth: 144pp.
23 Rotation. Second principal component axis (PC2): perpendicular to PC1 (i.e. uncorrelated / orthogonal).
24 Third principal component axis (PC3) Theoretically, many more species can be added
26 The variances extracted by the PCs. [Output tables; the values are not preserved in the transcript. Eigenvalues table, columns: PC, Eigenvalues, %Variation, Cum.%Variation. Eigenvectors table (coefficients in the linear combinations of variables making up the PCs), columns: Variable, PC1, PC2, PC3, PC4, PC5; rows: species A, B, C, D, E.]
28 PCA Assumptions. Linear relationships between variables. Normality of the variables. Ecological data which can fulfill these assumptions are rare.
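To make the eigenvalue and %variation columns concrete, here is a minimal sketch of PCA for just two variables, where the eigenvalues of the 2x2 covariance matrix have a closed form. The abundances of the two species are hypothetical, and a real analysis with five species would use a numerical eigen-solver rather than this shortcut.

```python
import math


def pca_two_variables(x, y):
    """PCA of two variables via their 2x2 covariance matrix.

    Returns (eigenvalues, pc1_direction): the eigenvalues in decreasing
    order and the unit vector pointing along PC1.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

    # Closed-form eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]].
    centre = (var_x + var_y) / 2
    radius = math.hypot((var_x - var_y) / 2, cov_xy)
    eigenvalues = (centre + radius, centre - radius)

    # Eigenvector of the largest eigenvalue gives the PC1 direction.
    if cov_xy != 0:
        vx, vy = cov_xy, eigenvalues[0] - var_x
    else:
        vx, vy = (1.0, 0.0) if var_x >= var_y else (0.0, 1.0)
    length = math.hypot(vx, vy)
    return eigenvalues, (vx / length, vy / length)


# Hypothetical abundances of two species in five samples.
species_a = [2, 4, 6, 8, 10]
species_b = [1, 3, 2, 5, 4]

eigenvalues, pc1 = pca_two_variables(species_a, species_b)
total = sum(eigenvalues)
cumulative = 0.0
for number, value in enumerate(eigenvalues, start=1):
    percent = 100 * value / total
    cumulative += percent
    print(f"PC{number}: eigenvalue = {value:.3f}, "
          f"%variation = {percent:.1f}, cum.%variation = {cumulative:.1f}")
print("PC1 coefficients:", tuple(round(c, 3) for c in pc1))
```

The eigenvalues sum to the total variance of the two variables, which is why each one can be reported as a percentage of variation extracted.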
29 Multidimensional Scaling. A technique for analyzing multivariate data. Visualization of the relationships between samples in a low-dimensional space to facilitate interpretation. There are two types of MDS: metric and non-metric.
30 Metric MDS: assumes the input data are measured on an interval or ratio scale (quantitative). Non-metric MDS (nMDS): the data should be in the form of ranks (quantitative and/or qualitative).
31 Major Advantages of nMDS. Ordination is based on the ranked similarities/dissimilarities between pairs of samples. Ordinal data could be used, e.g. 1 = very low; 2 = low; 3 = mid; 4 = high; 5 = very high. The actual values of the data are not used in the ordination, so few (no?) assumptions are made on the nature and quality of the data.
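The point that only ranks matter can be illustrated with a short sketch: ranking the pairwise dissimilarities gives the same result before and after any order-preserving rescaling (squaring is used here). The dissimilarity values and the function name are hypothetical, and this shows only the ranking step, not a full nMDS fit.

```python
def rank_dissimilarities(dissimilarities):
    """Map each sample pair to the rank of its dissimilarity.

    Rank 1 is the most similar pair; tied values share the average
    of the ranks they span.
    """
    ordered = sorted(dissimilarities, key=dissimilarities.get)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        while (end + 1 < len(ordered)
               and dissimilarities[ordered[end + 1]] == dissimilarities[ordered[start]]):
            end += 1
        for pair in ordered[start:end + 1]:
            ranks[pair] = (start + end) / 2 + 1
        start = end + 1
    return ranks


# Hypothetical dissimilarities (in %) among four samples.
dissimilarities = {
    ("A", "B"): 12.0, ("A", "C"): 55.0, ("A", "D"): 80.0,
    ("B", "C"): 40.0, ("B", "D"): 80.0, ("C", "D"): 20.0,
}
ranks = rank_dissimilarities(dissimilarities)
squared = {pair: value ** 2 for pair, value in dissimilarities.items()}

print(ranks)
print(ranks == rank_dissimilarities(squared))  # True: same rank order
```

Because the ordination only tries to preserve this rank order, the input can be ordinal scores as well as measured quantities.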
32 Bray-Curtis similarity Modified from Clarke & Warwick, 1994
33 An Ecological Example: spatial and temporal variability in benthic macroinvertebrate communities in Hong Kong streams
41 Statistical Analysis: Nested analysis of variance (ANOVA). Spatial factors: Regions (random, orthogonal); Sites (random, nested within Regions); Sections (random, nested within Sites). Temporal factors: Years (random, orthogonal); Seasons (fixed, orthogonal); Days (random, nested within Years and Seasons). Interactions between them.
42 Statistical Analysis: Non-parametric multivariate analysis. Non-metric multidimensional scaling (NMDS): display the stream community data in ordination diagrams intended to reveal underlying patterns in the community structure. Analysis of similarities (ANOSIM): compare the community structure among spaces and times.
50 Multivariate analysis - Temporal. Years [All samples in all sites; Each Region; Each Site; Each Section in each Site]. Seasons (all years & each year) [All samples in all sites; Each Region; Each Site; Each Section in each Site]. Dates within Seasons in each year.
54 ANOSIM R statistic: R = 1 only if all replicates within sites are more similar to each other than any replicates from different sites; R is approximately zero if the similarities between and within sites are the same on average.
Results of one-way ANOSIM between the Lam Tsuen site sampling sections within the dry season in [year not preserved in the transcript]. The pairs that are significantly different (at the 5% significance level) are shown with their R statistic values.
[Lower-triangular table of pairwise R values; the column position of each value was lost in extraction. Row labels with their surviving entries, in order:]
Day 1 A (1): –
Day 1 B (2): n.s.
Day 1 C (3):
Day 2 A (4): 0.281
Day 2 B (5):
Day 2 C (6):
Day 3 A (7): 0.531
Day 3 B (8): 0.500, 0.698
Day 3 C (9):
Day 4 A (10): 0.771, 0.729, 0.813, 0.844, 0.833, 0.792
Day 4 B (11): 0.688, 0.667, 0.521, 0.333, 0.469, 0.615
Day 4 C (12): 0.406, 0.448, 0.417
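The two limiting cases of the R statistic (R = 1 and R near zero) can be checked with a small sketch. It assumes the usual rank-based definition, R = (mean between-group rank - mean within-group rank) / (M/2), with M the number of sample pairs; the replicate names, group labels and dissimilarities are hypothetical, and the permutation test used to judge significance is not included.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(dissimilarity, groups):
    """ANOSIM R from pairwise dissimilarities.

    dissimilarity: dict keyed by (name_1, name_2) with name_1 < name_2.
    groups: dict mapping replicate name -> group label.
    """
    pairs = list(combinations(sorted(groups), 2))
    ranks = average_ranks([dissimilarity[pair] for pair in pairs])
    within = [r for (a, b), r in zip(pairs, ranks) if groups[a] == groups[b]]
    between = [r for (a, b), r in zip(pairs, ranks) if groups[a] != groups[b]]
    return (mean(between) - mean(within)) / (len(pairs) / 2)


# Hypothetical design: two groups with two replicates each.
groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

# Every within-group pair is more similar than any between-group pair.
separated = {("a1", "a2"): 10, ("a1", "b1"): 60, ("a1", "b2"): 70,
             ("a2", "b1"): 65, ("a2", "b2"): 75, ("b1", "b2"): 15}
# All pairs equally dissimilar: no group structure at all.
uniform = {pair: 50 for pair in separated}

print(anosim_r(separated, groups))  # 1.0
print(anosim_r(uniform, groups))    # 0.0
```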
56 ANOSIM: LT 1997 Dry Season. Number of pairs of sections significantly different (percentage) and average R statistic of the significantly different pairs:
The same section between different days: 9/18 (50%); average R = 0.578
Among all sections within the same day: 0/12 (0%); average R = –
Among all sections between different days: 20/36 (56%); average R = 0.602
57 ANOSIM: LT 1997 Wet Season. Number of pairs of sections significantly different (percentage) and average R statistic of the significantly different pairs:
The same section between different days: 15/18 (83%); average R = 0.674
Among all sections within the same day: 2/12 (17%); average R = 0.662
Among all sections between different days: 29/36 (81%); average R = 0.647
58 Implications. The macroinvertebrate community structures are, on average: more similar within the same region (sites of the same region are more similar to each other); more similar within the same site (samples of the same site are more similar to each other); and the patterns are more obvious in the dry seasons.
59 Implications. There is no obvious pattern in the community structure between sections within a site: the spatial scale "Sections" is not an important factor. The community structures of the study sites are, in general, similar between years; however, in some sites, variation between years could be high. Seasonality: there are STRONG seasonality patterns; however, within-season variation (days) is also noticeable.
60 Implications. Patterns in the community structure are uncovered. Regions, Sites and Seasons are important factors to our understanding of the stream communities in Hong Kong. Although there is small-scale variability (within site), large-scale variability (among sites and between regions) plays a more important role in the macroinvertebrate communities.