Multivariate statistics for use in ecological studies Kevin Wilcox ECOL 600 – Community Ecology Spring 2014.


Useful web resources
- Vegan tutorial: http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf
- The little book of R for multivariate analyses: http://little-book-of-r-for-multivariateanalysis.readthedocs.org/en/latest/src/multivariateanalysis.html#means-and-variances-per-group
- Ordination Methods by Michael Palmer: http://ordination.okstate.edu/overview.htm#Nonmetric_Multidimensional_Scaling
- Community analyses lectures by Jari Oksanen: http://cc.oulu.fi/~jarioksa/opetus/metodi/

Univariate statistics to measure community dynamics
- Richness (R or S; either local or regional)
- Shannon index (H’; Shannon & Weaver 1949)
  - Incorporates richness as well as the relative abundances into a metric
  - Emphasizes richness
- Simpson’s index (D or λ; Simpson 1949)
  - Emphasizes evenness
- Pielou’s evenness index (J’)

Univariate indices (cont.)

Species    low.light   mid.light   high.light
A          0.759876    0.383737    0.083528
B          0.620718    0.156324    0.152468
C          0.245997    0.524531    0.185889
D          0.33769     0.571457    0.52394
E          0.207586    0.281312    0.545748
F          0.143989    0.289351    0.561022

Metric     low.light   mid.light   high.light
Richness   6           6           6
H’         1.63        1.71        1.60
D          0.78        0.81        0.77
J’         0.91        0.96        0.89

- Species A and B are dominant in low light
- All species do OK with moderate light
- Species E and F are dominant in high light
- The indices give no information about individual species responses
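As a rough check on the indices in the slide above, the sketch below recomputes them from the abundance table. It assumes H’ uses the natural logarithm, D is the Gini-Simpson form 1 - sum(p^2), and J’ = H’ / ln(S); under those assumptions the output should agree with the slide's values to two decimals. The function and variable names are illustrative only.

```python
import math

# Abundances copied from the slide (species A-F at each light level).
abundances = {
    "low.light":  [0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989],
    "mid.light":  [0.383737, 0.156324, 0.524531, 0.571457, 0.281312, 0.289351],
    "high.light": [0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022],
}

def diversity(counts):
    """Return richness S, Shannon H', Gini-Simpson D and Pielou J'."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    p = [c / total for c in present]              # relative abundances
    richness = len(present)
    shannon = -sum(pi * math.log(pi) for pi in p)
    simpson = 1.0 - sum(pi * pi for pi in p)      # assumed form: 1 - sum(p^2)
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, simpson, evenness

for site, counts in abundances.items():
    s, h, d, j = diversity(counts)
    print(f"{site:10s} S={s}  H'={h:.2f}  D={d:.2f}  J'={j:.2f}")
```

The three light levels end up with nearly identical index values even though the dominant species differ completely, which is the point the slide makes.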

You could look at each species individually
- Lacks clarity with many species
- When looking solely at individual responses, you lose information about entire community dynamics
[Figure: individual species responses; axes: Light, Abundance]
OR…

You could use multivariate statistics
3 parts:
1. Dissimilarity matrices
2. Ordinations
3. Statistical tests of differences between or among communities

Dissimilarity matrix (full):
             low.light   mid.light   high.light
low.light    0           0.347573    0.491285
mid.light    0.347573    0           0.287918
high.light   0.491285    0.287918    0

Same matrix, lower triangle only:
             low.light   mid.light
mid.light    0.347573
high.light   0.491285    0.287918

[Figure: ordination of the three light levels; axis: MDS1]
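The slide does not say which metric produced this matrix, but the values are consistent with Bray-Curtis dissimilarity applied to the light-level abundance table from the univariate slide. A minimal sketch under that assumption (function name is illustrative):

```python
from itertools import combinations

# Abundances of species A-F at each light level, copied from the earlier slide.
abundances = {
    "low.light":  [0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989],
    "mid.light":  [0.383737, 0.156324, 0.524531, 0.571457, 0.281312, 0.289351],
    "high.light": [0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022],
}

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0

# One value per pair of communities: the lower triangle of the matrix.
for site_a, site_b in combinations(abundances, 2):
    d = bray_curtis(abundances[site_a], abundances[site_b])
    print(f"{site_a} vs {site_b}: {d:.4f}")
```

Because the matrix is symmetric with zeros on the diagonal, only the lower triangle carries information, which is why the slide shows both layouts.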

You could use multivariate statistics
3 parts:
1. Dissimilarity metrics and matrices
2. Ordination
3. Statistical tests of differences between or among communities
Software:
- R
- SAS
- SPSS
- PRIMER with PERMANOVA+
- PC-ORD

Dissimilarity metrics are the building blocks used in many multivariate statistics
- Visual representation (ordination)
- Statistical tests (P < 0.05)
Think carefully about which type of matrix or dissimilarity metric you should use

Dissimilarity matrices… brace yourself
A dissimilarity matrix is simply a table that compares all local communities (plots). The higher the number, the more dissimilar the communities are.

Raw community data (rows are plots, columns are species):
         sp1       sp2       sp3       sp4       sp5       sp6       sp7       sp8       sp9       sp10
plot1    0.662205  0.35045   0.778512  0.459916  0.552845  0.570177  0.458688  0.569126  0.536608  0.647962
plot2    0.563042  0.348089  0.478276  0.642296  0.494129  0.505471  0.380392  0.449799  0.658523  0.418476
plot3    0.718789  0.452508  0.558081  0.58513   0.58881   0.503074  0.685387  0.584184  0.337696  0.520809
plot4    0.635745  0.374085  0.501664  0.397816  0.543634  0.570857  0.430936  0.594428  0.492054  0.477197
plot5    0.407958  0.2844    0.619615  0.49797   0.268607  0.540357  0.439914  0.6602    0.465481  0.515861
plot6    0.442134  0.542096  0.546438  0.575878  0.557583  0.314215  0.363373  0.427713  0.61189   0.765725
plot7    0.401636  0.725872  0.316728  0.573688  0.329604  0.463     0.516889  0.499663  0.579506  0.530339
plot8    0.353048  0.653721  0.589779  0.481271  0.549743  0.445126  0.681987  0.598617  0.443214  0.386528
plot9    0.54211   0.46642   0.495843  0.335123  0.695228  0.315519  0.610575  0.516901  0.525599  0.377469
plot10   0.458872  0.573375  0.485013  0.452928  0.604478  0.636183  0.398031  0.508075  0.342392  0.440215

Dissimilarity matrices… brace yourself
A dissimilarity matrix is simply a table that compares all local communities (plots). The higher the number, the more dissimilar the communities are.

         plot1     plot2     plot3     plot4     plot5     plot6     plot7     plot8     plot9
plot2    0.119391
plot3    0.105672  0.129548
plot4    0.062924  0.090572  0.092943
plot5    0.111247  0.137485  0.140632  0.100948
plot6    0.135112  0.112754  0.150243  0.147331  0.168915
plot7    0.173912  0.137066  0.157281  0.159704  0.138085  0.131513
plot8    0.144694  0.158255  0.098638  0.12356   0.131775  0.14664   0.125334
plot9    0.145805  0.135068  0.121951  0.106257  0.184396  0.141295  0.167902  0.112891
plot10   0.130464  0.119984  0.113435  0.088728  0.139998  0.13052   0.142464  0.108512  0.122446

Types of dissimilarity metrics
Euclidean distance
- Operates in species space, meaning that each species (or dependent variable) gets its own orthogonal axis in multidimensional space
- Because the differences are squared, single large differences become very important when determining dissimilarities
- Dissimilarities between pairs of plots with no shared species are not necessarily the same
- This is why Euclidean distance (ED) is usually used for environmental data and not abundance data
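The two cautions about Euclidean distance can be seen with a few made-up plots. The abundances below are hypothetical and chosen only to make the arithmetic obvious.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two plots in species space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical plots (columns are species). Neither pair shares any species.
sparse_1, sparse_2 = [1, 0, 0, 0], [0, 1, 0, 0]
dense_1, dense_2 = [10, 0, 0, 0], [0, 0, 10, 0]
print(euclidean(sparse_1, sparse_2))   # small distance
print(euclidean(dense_1, dense_2))     # ten times larger, same "no overlap"

# Same total absolute difference (4), spread out vs concentrated in one species.
base = [5, 5, 5, 5]
spread = [6, 6, 6, 6]         # four differences of 1
concentrated = [9, 5, 5, 5]   # one difference of 4
print(euclidean(base, spread), euclidean(base, concentrated))
```

Both pairs in the first comparison are completely different communities, yet their distances differ tenfold, and in the second comparison the single large difference doubles the distance.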

Types of dissimilarity metrics
Manhattan-type distances
- Bray-Curtis (abundance data)
- Jaccard (presence-absence data)
- Use sums of absolute differences instead of squared terms, making them less sensitive to single large differences
- Reach a maximum dissimilarity of 1 when there are no shared species between communities
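A small sketch of the upper bound described above, using hypothetical plots. Bray-Curtis is written as sum|x - y| / sum(x + y) and Jaccard as 1 - (shared species / total species); both return 1 whenever two plots share no species, regardless of how abundant those species are.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity on abundances."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

def jaccard(x, y):
    """Jaccard dissimilarity on presence-absence (any abundance > 0 is present)."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return 1.0 - shared / either

# Hypothetical plots (columns are species).
sparse_1, sparse_2 = [1, 0, 0, 0], [0, 1, 0, 0]      # no shared species
dense_1, dense_2 = [10, 0, 0, 0], [0, 0, 10, 0]      # no shared species
overlap_1, overlap_2 = [4, 2, 0, 0], [0, 2, 4, 0]    # one shared species

for a, b in [(sparse_1, sparse_2), (dense_1, dense_2), (overlap_1, overlap_2)]:
    print(bray_curtis(a, b), jaccard(a, b))
```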

Dissimilarity metrics are used to look at differences between communities
Euclidean distances are good for looking at many types of environmental data but are not great for species abundances.
Knapp et al. in prep

Ordinations
Basically, ordinations plot the communities based on all response variables (e.g. species responses) and then squish this into 2 or 3 dimensions.
Example 1: 2 species, 2 axes.
[Figure: Plots 1-3 positioned on two axes, Species A (Sp.A) and Species B (Sp.B)]

Ordinations
Ordinations plot the communities based on the response variables and then squish this into 2 or 3 dimensions.
Example 2: 3 species, 3 axes
Etc. up to n response variables. We can’t visualize this well after 3 axes, but it happens.
[Figure: Plots 1-4 positioned on three axes, Species A, Species B and Species C]

Ordinations
Example 3: 3 species, 2 axes.
[Figure: Plots 1-4 in three-dimensional species space (Species A, B and C) alongside the same plots on two ordination axes (Axis 1, Axis 2), with Sp.A, Sp.B and Sp.C overlaid]

Ordinations
Ordination analyzes communities based on all response variables and then squishes this into fewer dimensions
- 2 species, 2 axes
- 3 species, 3 axes
- Etc. up to n species, n axes
Need to squash n dimensions into 2. Ordination rotates the axes to minimize distance from primary axes and maximize explanation of variance by axes.

Constrained vs unconstrained ordinations
Constrained ordination makes the data fit into measured variables
- This is limiting because you can only relate species differences to things you measure
- However, this is beneficial if you are interested in only a couple of environmental variables
Unconstrained ordination tries to represent the variability of the data even if there are no variables to explain the variation
- For example, if different temperatures in two areas caused altered communities but temperature was not included in the model, you would still be able to detect differences in community structure
- Better for exploratory analyses

Types of unconstrained ordinations
Principal components analysis (PCA)
- Uses Euclidean distances to map plots onto the 2 or 3 axes that explain the majority of variation
- Use with environmental data
- Be sure to standardize response variables if they are on different scales
Principal coordinates analysis (PCO; Gower 1966)
- Acts like PCA but uses a dissimilarity matrix instead of pulling straight from the data
- This is more like plotting a close-fitting trendline instead of the actual data
- Fits the line by maximizing a linear correlation; this can be problematic
- Is sometimes called metric multidimensional scaling (MDS)… not to be confused with NMDS

Types of unconstrained ordinations
Non-metric multidimensional scaling (NMDS)
- PRIMER calls this MDS!!! Ugh.
- Very complicated. In the past, the drawback with this technique was the large amount of computing power necessary… this is no longer an issue.
- Preserves rank order of relationships while plotting more similar local communities closer together in 2D or 3D space; this solves the linearity problem of PCO
- Axes aren’t constrained by distances (e.g. Euclidean), so this method is more flexible

Unconstrained ordinations
Non-metric multidimensional scaling (NMDS)
Stress = mismatch between rank orders of distances in the data and in the ordination
- Excellent: stress < 0.05
- Good: stress < 0.1
- Acceptable: 0.1 < stress < 0.2
- On the edge: 0.2 < stress < 0.3
- Unacceptable: stress > 0.3
To cope with high stress…
- Increase the number of dimensions of your ordination, if possible

Types of constrained ordinations
Constrained analysis of proximities (CAP)
- You can plug any dissimilarity matrix into this
- Performs linear mapping
Redundancy analysis (RDA)
- Constrained version of PCA
Constrained correspondence analysis (CCA)
- Based on chi-squared distances
- Weighted linear mapping

Incorporating environmental data into ordination
Can overlay vectors of environmental data on top of community data
- Vectors supply information about the direction and strength of environmental variables
- Easy to interpret the effects of many variables
- However, it assumes all relationships are linear. This might not be the case…
Oksanen 2013

Incorporating environmental data into ordination
Can overlay surfaces of environmental data on top of community data
- Surfaces provide more detailed information about how communities are arranged along abiotic variables
- More difficult to interpret with more than a couple of variables
- Using treatments is a special case of this
Oksanen 2013

Ordination by itself is not a robust statistical test
Although ordination is great for visualizing your data, we need to back it up.
- One way is to calculate confidence ellipses around the centroid
- Another way is to use resemblance-based permutation methods. They give P values…
For discussion of how to do this in R, see: http://stats.stackexchange.com/questions/34017/confidence-intervals-around-a-centroid-with-modified-gower-similarity

Resemblance-based permutation methods
- One benefit of these techniques is that they compare n-dimensional data instead of ordination data squished into 2 or 3 dimensions
- Many assumptions of regular MANOVAs are violated by ecological community data (see Clarke 1993), which spurred the creation of new methods for analyzing multivariate data
- Three widely used methods:
  - Permutational MANOVA (PERMANOVA)
  - Analysis of similarities (ANOSIM)
  - Mantel test
- One assumption of all three of these tests is equal variance among treatments… This is a problem, but we’ll come back to it

ANOSIM (Clarke 1993)
- Ranks dissimilarities among local communities from 1 to the number of comparisons made
- Then looks at averages of ranked dissimilarities within and among groups
- Compares the resulting R statistic with R values from random permutations to get a p-value
(Originally from Clarke 1993 and reviewed in Anderson and Walsh 2013)
R = (r_B - r_W) / (M / 2), where M = N(N - 1)/2 is the number of plot pairs among N plots
- r_B: mean dissimilarity rank of plot pairs between groups
- r_W: mean dissimilarity rank of plot pairs within a group
- An indicator equal to 1 if plots i and j are in the same group and 0 if they are in different groups marks which pairs count as within-group
- R is used to calculate the P value

ANOSIM (Clarke 1993)
Essentially, during each permutation, plot labels in the dissimilarity matrix are shuffled and an R value is calculated. Over many permutations, this creates a null distribution for R against which the original R can be compared; the p-value is obtained from where the original R falls on the distribution.
[Figure: density of permuted R values, with the actual R marked for comparison to calculate the p value]
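The ANOSIM procedure can be sketched in a few lines. This is an illustrative version rather than the PRIMER or vegan implementation: it uses the usual statistic R = (r_B - r_W) / (M / 2) with M the number of plot pairs, average ranks for ties, random label shuffles, and the common (hits + 1) / (permutations + 1) p-value. The six plots and their dissimilarities are hypothetical.

```python
import random
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(dist, groups):
    """dist maps (i, j) with i < j to a dissimilarity; groups lists plot labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[p] for p in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)

def anosim(dist, groups, n_perm=999, seed=1):
    """Return the observed R and a permutation p-value."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                   # shuffle plot labels
        if anosim_r(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: six plots placed on a line, two well-separated groups.
positions = [0, 1, 2, 10, 11, 13]
groups = ["Y", "Y", "Y", "Z", "Z", "Z"]
dist = {(i, j): abs(positions[i] - positions[j])
        for i, j in combinations(range(len(positions)), 2)}
r_obs, p_value = anosim(dist, groups)
print(f"R = {r_obs:.2f}, p = {p_value:.3f}")
```

With only six plots there are just 20 distinct ways to assign the labels, so the p-value cannot drop much below 0.1 even for perfectly separated groups; small designs limit what a permutation test can show.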

Mantel test
- Doesn’t use ranks
- To compare groups, it uses one dissimilarity matrix and one model matrix to designate contrasts and compare within and among groups
- The p value is calculated as the proportion of z(0,1) (within-group dissimilarities) that is lower than or equal to z(1,0) (between-group dissimilarities)

Dissimilarity matrix:
         plot1     plot2     plot3     plot4     plot5     plot6     plot7     plot8     plot9
plot2    0.119391
plot3    0.105672  0.129548
plot4    0.062924  0.090572  0.092943
plot5    0.111247  0.137485  0.140632  0.100948
plot6    0.135112  0.112754  0.150243  0.147331  0.168915
plot7    0.173912  0.137066  0.157281  0.159704  0.138085  0.131513
plot8    0.144694  0.158255  0.098638  0.12356   0.131775  0.14664   0.125334
plot9    0.145805  0.135068  0.121951  0.106257  0.184396  0.141295  0.167902  0.112891
plot10   0.130464  0.119984  0.113435  0.088728  0.139998  0.13052   0.142464  0.108512  0.122446

Model matrix (1 = same group, 0 = different groups; plots 1-5 and plots 6-10 form the two groups, labelled Group Z and Group Y on the slide):
         plot1  plot2  plot3  plot4  plot5  plot6  plot7  plot8  plot9
plot2    1
plot3    1      1
plot4    1      1      1
plot5    1      1      1      1
plot6    0      0      0      0      0
plot7    0      0      0      0      0      1
plot8    0      0      0      0      0      1      1
plot9    0      0      0      0      0      1      1      1
plot10   0      0      0      0      0      1      1      1      1

(See Legendre & Legendre 2012 for more detail)
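One way to read the Mantel procedure on this slide: multiply the dissimilarity matrix by the model matrix element by element and sum, which (with 1 = same group) gives the total within-group dissimilarity z. Shuffling plot labels gives a null distribution, and the p-value is the proportion of permuted z values that are lower than or equal to the observed one. The sketch below follows that reading, which is an assumption on my part, and uses hypothetical data; Legendre & Legendre 2012 describe the full test.

```python
import random
from itertools import combinations

def model_matrix(groups):
    """1 if plots i and j are in the same group, 0 otherwise (as on the slide)."""
    return {(i, j): 1 if groups[i] == groups[j] else 0
            for i, j in combinations(range(len(groups)), 2)}

def mantel_z(dist, model):
    """Sum of element-wise products of the two lower triangles."""
    return sum(dist[pair] * model[pair] for pair in dist)

def mantel_group_test(dist, groups, n_perm=999, seed=1):
    """Return observed z and the share of permuted z values <= observed."""
    rng = random.Random(seed)
    observed = mantel_z(dist, model_matrix(groups))
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)      # same as permuting rows and columns together
        if mantel_z(dist, model_matrix(labels)) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: six plots on a line, two well-separated groups.
positions = [0, 1, 2, 10, 11, 13]
groups = ["Z", "Z", "Z", "Y", "Y", "Y"]
dist = {(i, j): abs(positions[i] - positions[j])
        for i, j in combinations(range(len(positions)), 2)}
z_obs, p_value = mantel_group_test(dist, groups)
print(f"z = {z_obs}, p = {p_value:.3f}")
```

Unlike ANOSIM, the raw dissimilarities enter the statistic directly, so a few very large values can carry the result.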

PERMANOVA
- Calculates a pseudo-F statistic
- Pseudo-F is identical to a normal F statistic if there is only one response variable
- This pseudo-F is calculated using the original data and compared with a distribution of pseudo-F statistics from many random permutations. This step is the same as in ANOSIM.
(See Anderson 2001, 2005 for more detail)
[Figure: density of permuted pseudo-F values, with the observed pseudo-F marked]
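A minimal one-factor sketch of the pseudo-F calculation, assuming the usual partitioning of squared dissimilarities: SS_total = (1/N) x sum of all squared pairwise dissimilarities, SS_within = sum over groups of (1/n_group) x squared within-group dissimilarities, and pseudo-F = (SS_among / (a - 1)) / (SS_within / (N - a)). The data are hypothetical: one response variable with Euclidean distances, so the result can be checked against an ordinary one-way ANOVA.

```python
import random
from itertools import combinations

def pseudo_f(dist, groups):
    """Pseudo-F from pairwise dissimilarities; dist maps (i, j), i < j."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(d * d for d in dist.values()) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[pair] ** 2
                         for pair in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=1):
    """Return observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical single response variable in six plots, two groups.
values = [1, 2, 3, 4, 5, 6]
groups = ["A", "A", "A", "B", "B", "B"]
dist = {(i, j): abs(values[i] - values[j])
        for i, j in combinations(range(len(values)), 2)}
f_obs, p_value = permanova(dist, groups)
print(f"pseudo-F = {f_obs:.2f}, p = {p_value:.3f}")
```

A one-way ANOVA by hand on the same six values gives SS_among = 13.5 and SS_within = 4 on 1 and 4 degrees of freedom, so F = 13.5, the same number the distance-based calculation returns. That is the single-variable equivalence the slide mentions.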

Choosing a method
- A major assumption in all three methods is equal variance among groups
- This is often violated in real-world communities
- In fact, this change in variance (i.e. dispersion or convergence among replicates, or beta diversity) is often of interest to ecologists
- So… how do we deal with this?
Anderson and Walsh 2013

Choosing a method
[Figure from Anderson and Walsh 2013]

PERMDISP
Permutational analysis of multivariate dispersions (Anderson 2004)
- Compares multivariate dispersion among groups
- Uses any distance or dissimilarity measure you feed into it
2 main reasons to use this:
1. To look for violations of assumptions in tests of centroid location (although, as we discussed above, this may not be as big of a deal as once thought)
2. Variance among local communities within a treatment may be of ecological interest (for more info about using community dissimilarity methods to estimate beta diversity, see Legendre & Caceres 2013)
Chase 2007
Anderson 2004
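The idea behind a dispersion test can be sketched as follows. This is a simplified stand-in, not Anderson's PERMDISP program: it works only in Euclidean species space (the real method handles any dissimilarity by first moving to principal coordinate space), and it permutes the distances-to-centroid among groups rather than permuting residuals. The two groups of plots are hypothetical.

```python
import math
import random

def distances_to_centroid(rows):
    """Euclidean distance of each plot (row) to its group's centroid."""
    centroid = [sum(col) / len(rows) for col in zip(*rows)]
    return [math.sqrt(sum((x - c) ** 2 for x, c in zip(row, centroid)))
            for row in rows]

def anova_f(values, groups):
    """Ordinary one-way ANOVA F statistic."""
    labels = sorted(set(groups))
    grand = sum(values) / len(values)
    ss_among = ss_within = 0.0
    for g in labels:
        vals = [v for v, lab in zip(values, groups) if lab == g]
        mean = sum(vals) / len(vals)
        ss_among += len(vals) * (mean - grand) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    return (ss_among / (len(labels) - 1)) / (ss_within / (len(values) - len(labels)))

def dispersion_test(data, n_perm=999, seed=1):
    """data maps group label -> list of plots. Returns F and permutation p."""
    values, groups = [], []
    for label, rows in data.items():
        values += distances_to_centroid(rows)
        groups += [label] * len(rows)
    observed = anova_f(values, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anova_f(values, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical two-species abundances: one tight group, one dispersed group.
data = {
    "control":   [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]],
    "treatment": [[3.0, 3.0], [5.0, 1.0], [1.0, 5.0], [4.0, 6.0]],
}
f_obs, p_value = dispersion_test(data)
print(f"F = {f_obs:.2f}, p = {p_value:.3f}")
```

A small p-value here says the groups differ in spread, not in location, which is why the same output can serve either as an assumption check or as a beta diversity result.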

SIMPER
Similarity percentages of component species or functional groups
- Bray-Curtis dissimilarity matrix is implicit in a SIMPER analysis
- Can force it to use a Euclidean distance matrix in PRIMER. I have not seen evidence for or against this practice…
- Use this to find out which variables are responsible for observed shifts in multivariate space
Knapp et al. in prep
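The core of SIMPER is that Bray-Curtis dissimilarity is a sum of per-species terms, |x_k - y_k| / sum(x + y), so each species' share of the total can be read off directly. The sketch below does this for a single pair of samples, the low-light and high-light communities from the univariate indices slide. A full SIMPER averages these contributions over all between-group pairs of plots, which this simplified version does not do.

```python
species = ["A", "B", "C", "D", "E", "F"]
# Abundances copied from the univariate indices slide.
low_light = [0.759876, 0.620718, 0.245997, 0.33769, 0.207586, 0.143989]
high_light = [0.083528, 0.152468, 0.185889, 0.52394, 0.545748, 0.561022]

def simper_pair(x, y, names):
    """Return Bray-Curtis dissimilarity and each species' percent contribution."""
    total = sum(a + b for a, b in zip(x, y))
    parts = {name: abs(a - b) / total for name, a, b in zip(names, x, y)}
    dissimilarity = sum(parts.values())
    ranked = sorted(parts.items(), key=lambda item: item[1], reverse=True)
    return dissimilarity, [(name, 100 * part / dissimilarity) for name, part in ranked]

bc, contributions = simper_pair(low_light, high_light, species)
print(f"Bray-Curtis dissimilarity: {bc:.4f}")
for name, percent in contributions:
    print(f"Species {name}: {percent:.1f}% of the dissimilarity")
```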

References
Anderson, Marti J., and Daniel C. I. Walsh. "PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing?" Ecological Monographs 83.4 (2013): 557-574.
Anderson, M. J. "PERMDISP: a FORTRAN computer program for permutational analysis of multivariate dispersions (for any two-factor ANOVA design) using permutation tests." Department of Statistics, University of Auckland, New Zealand (2004).
Anderson, Marti J. "Permutational multivariate analysis of variance." Department of Statistics, University of Auckland, Auckland (2005).
Chase, Jonathan M. "Drought mediates the importance of stochastic community assembly." Proceedings of the National Academy of Sciences 104.44 (2007): 17430-17434.
Clarke, K. R. "Non-parametric multivariate analyses of changes in community structure." Australian Journal of Ecology 18.1 (1993): 117-143.
Gower, John C. "Some distance properties of latent root and vector methods used in multivariate analysis." Biometrika 53.3-4 (1966): 325-338.
Legendre, Pierre, and Miquel De Cáceres. "Beta diversity as the variance of community data: dissimilarity coefficients and partitioning." Ecology Letters 16.8 (2013): 951-963.
Legendre, Pierre, and Louis Legendre. Numerical ecology. Vol. 20. Elsevier, 2012.
Oksanen, Jari. "Multivariate analysis of ecological communities in R: vegan tutorial." R package version (2011): 2-0.
Shannon, Claude E., and Warren Weaver. "The mathematical theory of communication." (1949).
Simpson, Edward H. "Measurement of diversity." Nature (1949).

Interactions between climate and plant community structure alter ecosystem sensitivity and thus ecosystem function

[Conceptual diagram: Precipitation regimes; Ecosystem sensitivity; Ecosystem function and services; pathway 1 marked]
1. Direct impacts of precipitation regimes are based on ecosystem sensitivity
Sensitivity = absolute change in productivity per unit change in precipitation
IPCC 2007

[Conceptual diagram: Precipitation regimes; Soil moisture dynamics; Ecosystem sensitivity; Ecosystem function and services; pathways 1 and 2 marked]
1. Direct impacts of precipitation regimes are based on ecosystem sensitivity
2. Precipitation regimes may alter ecosystem sensitivity through changes in soil moisture dynamics

[Conceptual diagram: Climate regimes; Soil moisture dynamics; Species responses; Community composition; Ecosystem sensitivity; Ecosystem function and services; pathways 1-3 marked]
1. Direct impacts of precipitation regimes are based on ecosystem sensitivity
2. Precipitation regimes may alter ecosystem sensitivity through changes in soil moisture dynamics
3. Individual species responses to long-term climate regime shifts are a potential mechanism that may structure communities

[Conceptual diagram: Climate regimes; Soil moisture dynamics; Species responses; Community composition; Ecosystem sensitivity; Ecosystem function and services; pathways 1-4 marked]
1. Direct impacts of precipitation regimes are based on ecosystem sensitivity
2. Precipitation regimes may alter ecosystem sensitivity through changes in soil moisture dynamics
3. Community composition can directly affect ecosystem services through dominance or diversity effects, or indirectly by altering ecosystem sensitivity to precipitation regimes
4. Individual species responses to long-term climate regime shifts are a potential mechanism that may structure communities

Overarching question… Do interactions between precipitation drivers, plant community structure, and ecosystem sensitivity alter effects of precipitation regimes on ecosystem function?

Shifts in ecosystem function across space and time

A. Changes in overall soil moisture cause a change in the intercept of the productivity-precipitation relationship
B. Different drought sensitivities of component species within a community control the slope and intercept of the relationship by altering ecosystem responses in dry years
C. Growth limitations (e.g. growth rate maximums, co-limitation by other resources such as N) of component species in wet years determine slope and intercept
[Figure: Ecosystem Function against Precipitation, from dry years to wet years, with regions A, B and C marked]

Experimental designs
ANPP data from 2 long-term data sets and linked precipitation data
1. Irrigation transect: relieves water limitation throughout the growing season (1991-2011)
2. Uplands vs lowlands: annually burned, ungrazed watershed (1984-2011)
Looked at slopes between growing season rainfall and ANPP to assess sensitivity in control and manipulated plots.
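Sensitivity as used here is the slope of ANPP against growing season precipitation, which an ordinary least-squares fit gives directly. The sketch below uses hypothetical numbers, not the study's data, purely to show the calculation for a control and a manipulated treatment.

```python
def ols_slope(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical values: growing season precipitation (mm) and ANPP (g/m2).
precipitation = [300, 400, 500, 600, 700]
anpp_control = [350, 400, 450, 500, 550]
anpp_irrigated = [520, 540, 560, 580, 600]

for name, anpp in [("control", anpp_control), ("irrigated", anpp_irrigated)]:
    slope, intercept = ols_slope(precipitation, anpp)
    print(f"{name}: sensitivity = {slope:.2f} g/m2 per mm, intercept = {intercept:.0f}")
```

In this made-up example the irrigated plots have a higher intercept and a shallower slope, the pattern predicted for increased soil water availability.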

I) Reduced soil water capacity
Chronic reduction of soil water availability should cause increased sensitivity due to a reduction in the overall productivity of the system (A; i.e. lowered intercept), while the slope and intercept are altered by the resident plant community. The capacity for growth in wet years (C) should be similar due to unchanged growth potential and lack of limiting nutrients, but the negative response to drought should be increased (B) due to reduction of soil water stores to buffer against drought.
II) Increased soil water availability
Increased soil water availability should decrease sensitivity by increasing overall productivity of the system (A; i.e. increased intercept), while limitations on cumulative growth rates of the extant plant community should reduce the productivity response in wet years (C), thus reducing sensitivity of the system to precipitation inputs (i.e. slope).
[Figure: two panels of Ecosystem Function against Precipitation; panel I marks regions A, B and C, panel II marks regions A and C]

Ecosystem response to altered precipitation regimes / soil conditions
[Figure: Ecosystem Function against Precipitation; a significant difference is marked with *]

Sensitivity shifts?
[Figure: two panels, Soil depth and Irrigation, plotting ANPP (g/m2) against growing season precipitation (mm); significance annotations on the figure: n.s., **, n.s.]

How is community structure modifying these relationships? Smith 2009

Uplands vs lowlands

Species                    Contribution to divergence (%)
Panicum virgatum           21.68
Schizachyrium scoparium    16.76

Uplands vs lowlands
Predictions based on abiotic forcings
- Reduction in dry years because of limited soil water storage to buffer plants during periods of drought
Predictions when incorporating biotic forcings
- Water limitation is not an important factor in wet years
- Growth rate limitations of extant species limit production in wet years
- Decreased drought sensitivity of extant species limits production loss in dry years

Chronic irrigation and community shifts
- After 10 years, reordering of the community occurred
- Panicum virgatum replaced Andropogon gerardii as the dominant species in 2001
- We decided to look at ecosystem sensitivity before and after this community shift to test some of our predictions
Collins et al. 2012

Chronic irrigation
Predictions based on abiotic forcings
- Growth rate or other resources limit production in wet years
Predictions when incorporating biotic shifts
- As species that do not have growth rate limitations take over, productivity responses in wet years should increase

Community change over time and altered sensitivity
[Figure: ordination (Axis 1, Axis 2) of community composition from 1991 to 2011; 2011 is marked with *]

Conclusions and implications
1. Sensitivity of ecosystems to climate drivers may be altered under future precipitation regimes
2. Additionally, community shifts driven by these altered precipitation regimes may cause a change in ecosystem sensitivity
3. Short-term experiments may not pick up these community-driven sensitivity changes
