
1 Multivariate Statistics
ESM 206, 5/17/05

2 WHAT IS MULTIVARIATE STATISTICS?
A collection of techniques to help us understand patterns in, and make predictions with, large datasets with many variables
Ordination: find a (hopefully small) number of composite variables that capture most of the variability among data points
Cluster Analysis: discover natural groupings of similar data points
Discriminant Analysis: find a (hopefully small) number of composite variables that can be used to predict the levels of a categorical dependent variable
Canonical Correlation Analysis: find relationships between two groups of variables
  – “Dependent variable” is multivariate

3 WHAT CAN MULTIVARIATE STATISTICS DO?
Reflect more accurately the true multidimensional nature of environmental systems
Provide a way to handle large datasets with large numbers of variables by summarizing the redundancy
Provide rules for combining variables in an “optimal” way
Provide a means of detecting and quantifying truly multivariate patterns that arise out of the correlational structure of the variable set
Provide a means of exploring complex data sets for patterns and relationships from which hypotheses can be generated and subsequently tested experimentally

4 DISTINGUISHING ECOLOGICAL NICHES OF 3 SPECIES


7 ORDINATION
Simplify the interpretation of complex data by organizing sampling entities along independent gradients or factors defined by combinations of interrelated variables
Uncover a more fundamental set of factors that account for the major patterns across all of the original variables
If a few major gradients explain much of the variability in the data, then the data can be interpreted with respect to these gradients with little loss of information

8 PRINCIPAL COMPONENTS ANALYSIS (PCA)
Most commonly used ordination technique
Given P correlated variables, extract P principal components
  – Linear combinations of the variables
  – Uncorrelated with one another
  – First PC is the direction through the data cloud that captures the most variance in the data
  – Second PC is the direction perpendicular to the first that captures the most remaining variance
  – Etc.
Assumptions of PCA:
  1. Data are multivariate normal
  2. Data are independent
  3. Observed variables depend linearly on underlying factors
May need to transform data to satisfy these
Unless variables are all measured on the same scale, use correlations rather than covariances
  – Gives equal weight to variability in all variables
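The PCA steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used in the course: the function name `pca_correlation` and the synthetic data are assumptions made for the example. It works from the correlation matrix, as the slide recommends when variables are on different scales.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = observations, columns = variables)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable: zero mean, unit sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables (P x P).
    R = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    # Reorder so the first PC is the one that captures the most variance.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Scores: each observation's coordinates along the principal components.
    scores = Z @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Synthetic data for illustration only (NOT the data from the slides):
# three variables driven by one shared latent factor plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.8, 0.6]]) + 0.5 * rng.normal(size=(200, 3))

eigenvalues, eigenvectors, scores = pca_correlation(X)
```

With a correlation matrix the eigenvalues sum to P (here 3), and the score columns are uncorrelated with variances equal to the eigenvalues, which matches the "uncorrelated with one another" property listed above.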

9 EXAMPLE: CHEMICAL SOLUBILITY
72 chemical compounds tested for solubility in each of 6 solvents
  – Solubility measured on a log scale
Strong (but not perfect) correlations among the 6 solvents
Can we use fewer than 6 variables to characterize each chemical?

10 SOLUBILITY PCA
Eigenvalue indicates how much of the variability in the data is explained by the PC
  – Magnitude depends on the number of variables (and on the variances, if done with the covariance matrix)
  – Instead, look at percents
Eigenvector gives the coefficients of the linear relationship of the PC to each variable
  – NOTE: some software scales the eigenvectors differently
Interpretation:
  PC1 is an axis of overall increasing solubility
  PC2 is an axis of differential solubility in 1-Octanol & Ether vs. the other 4 solvents
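The advice to "look at percents" amounts to dividing each eigenvalue by the sum of all eigenvalues. A small sketch follows; the helper name `percent_variance` and the eigenvalues passed to it are hypothetical, chosen only to make the arithmetic easy to follow. They are not the solubility results from the slide.

```python
def percent_variance(eigenvalues):
    """Return (percent, cumulative percent) of variance explained by each PC."""
    total = sum(eigenvalues)
    percents = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative

# Hypothetical eigenvalues for P = 4 standardized variables (they sum to 4).
percents, cumulative = percent_variance([2.0, 1.0, 0.5, 0.5])
print(percents)    # [50.0, 25.0, 12.5, 12.5]
print(cumulative)  # [50.0, 75.0, 87.5, 100.0]
```

Because percentages are unchanged when every eigenvalue is multiplied by the same constant, they remain comparable across analyses with different numbers of variables.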


12 CHARACTERISTICS OF ORDINATION
Organizes sampling entities (e.g., species, sites, observations) along continuous environmental gradients
Assesses relationships within a single set of variables; doesn’t define a relationship between a set of independent variables and one or more dependent variables
  – However, PCs can be used as independent variables in a regression
Reduces dimensionality of a multivariate data set by condensing a large number of original variables into a smaller set of new composite variables with minimal loss of information
Summarizes data redundancy by placing similar entities in proximity in ordination space
Defines new composite variables (e.g., principal components) as weighted linear combinations of the original variables
Eliminates noise from a multivariate data set by recovering patterns in the first few composite dimensions and deferring noise to subsequent axes
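The remark that PCs can serve as independent variables in a regression can be sketched as follows. This is one possible arrangement under stated assumptions: the function name `pc_regression`, the choice of the correlation matrix, ordinary least squares as the fitting method, and the synthetic data are all illustrative and do not come from the slides.

```python
import numpy as np

def pc_regression(X, y, n_components):
    """Regress y on the first n_components principal component scores of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize, then take eigenvectors of the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    # Keep the components with the largest eigenvalues.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    scores = Z @ eigenvectors[:, order]
    # Ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return coef, fitted

# Synthetic data for illustration only: two of the four predictors are correlated.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X[:, 1] += X[:, 0]
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=100)

coef, fitted = pc_regression(X, y, n_components=2)
```

Because the scores are uncorrelated with one another, the coefficient on each retained PC does not change when further PCs are added, which is one reason this approach is attractive when the original variables are strongly correlated.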

13 OTHER ORDINATION TECHNIQUES
Polar Ordination (PO)
Factor Analysis (FA)
  – This is often used as a generic term meaning “ordination” in social sciences
Nonmetric Multidimensional Scaling (NMMDS)
  – Relaxes normality and linearity assumptions by using ranks
Correspondence Analysis (CA)
  – Allows data (e.g., species abundance) to take on peak values at intermediate levels of the gradient
  – Also called Reciprocal Averaging
Detrended Correspondence Analysis (DCA)
  – Deals particularly well with nonlinear relationships
Canonical Correspondence Analysis (CCA)
  – Like CA, but ordination of variables of interest (e.g., species abundance) is constrained to depend linearly on other variables (e.g., environmental characteristics) measured at the same sites

14 FURTHER READING
McGarigal, K., S. Cushman, and S. Stafford. 2000. Multivariate Statistics for Wildlife and Ecology Research (Springer-Verlag, New York).
Gotelli, N.J., and A.M. Ellison. 2004. A Primer of Ecological Statistics (Sinauer, Sunderland, MA); Chapter 12.
Spicer, J. 2005. Making Sense of Multivariate Data Analysis (Sage Publications, Thousand Oaks, CA).

