Introduction to the gradient analysis. Community concept (from Mike Austin)

Presentation transcript:

Introduction to the gradient analysis

Community concept (from Mike Austin)

Continuum concept (from Mike Austin)

The real situation is somewhere between and more complicated

Originally (and theoretically):
Community concept as a basis for classification.
Continuum concept as a basis for ordination or gradient analysis.

In practice:
I need a vegetation map (or categories for a nature conservation agency) - I will use classification.
I am interested in transitions, gradients, etc. - let's go for gradient analysis (ordination).

Methods of the gradient analysis

Over a short gradient, the linear response is a good approximation; over a long gradient, it is not.

However: in most cases, neither the linear nor the unimodal response model is a sufficient description of reality for all the species. I use methods based on either of these models not because I believe that all the species behave according to them, but because I see them as a reasonable compromise between reality and clarity.

Estimating species optima by the weighted averaging method Optimum Tolerance
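The formulas on this slide did not survive in the transcript. As a minimal sketch, assuming the usual weighted averaging definitions (optimum = abundance-weighted mean of the environmental variable, tolerance = abundance-weighted standard deviation around that optimum), the calculation for one species could look as follows. The function name and the example data are hypothetical.

```python
import numpy as np

def wa_optimum_tolerance(abundance, env):
    """Weighted averaging estimate of a species optimum and tolerance.

    abundance: abundance of one species in each sample
    env:       value of the environmental variable in each sample
    """
    abundance = np.asarray(abundance, dtype=float)
    env = np.asarray(env, dtype=float)
    total = abundance.sum()
    # Optimum: mean of env, each sample weighted by the species abundance
    optimum = np.sum(abundance * env) / total
    # Tolerance: weighted standard deviation of env around the optimum
    tolerance = np.sqrt(np.sum(abundance * (env - optimum) ** 2) / total)
    return optimum, tolerance

# Hypothetical species peaking in the middle of the gradient
opt, tol = wa_optimum_tolerance([0, 1, 2, 1, 0], [1, 2, 3, 4, 5])
```

The estimate is only sensible when the sampled gradient covers the whole range in which the species occurs; a species truncated at the end of the gradient would get a biased optimum.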

The techniques based on the linear response model are suitable for homogeneous data sets; the weighted averaging techniques are suitable for more heterogeneous data.

Calibrations (using weighted averages)
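Calibration is the reverse direction: the environmental value of a sample is estimated from the species present in it. A minimal sketch, assuming the estimate is the abundance-weighted average of known species optima (indicator values); the function name and the numbers are hypothetical.

```python
import numpy as np

def wa_calibration(sample_abundances, species_optima):
    """Estimate the environmental value of one sample as the
    abundance-weighted average of the optima of its species."""
    a = np.asarray(sample_abundances, dtype=float)
    optima = np.asarray(species_optima, dtype=float)
    return np.sum(a * optima) / a.sum()

# Hypothetical sample with three species (the third one absent)
estimate = wa_calibration([3, 1, 0], [2.0, 6.0, 10.0])
```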

[Figure: ordination diagram. Species Cactus, Nymphaea, Urtica, Drosera, Menyanthes, Comarum, Chenopodium and Aira plotted along gradients of nutrients and water. Proximity means similarity.]

Two formulations of the ordination problem

1. Find a configuration of samples in the ordination space so that the distances between samples in this space correspond best to the dissimilarities of their species composition. This is done explicitly by the multidimensional scaling methods (metric and non-metric). It requires a measure of dissimilarity between samples.

2. Find "latent" variable(s) (ordination axes) which represent the best predictors for the values of all the species. This approach requires the model of species response to such latent variables to be explicitly specified.

The linear response model is used for linear ordination methods, the unimodal response model for weighted averaging methods. In linear methods, the sample score is a linear combination (weighted sum) of the species scores. In weighted averaging methods, the sample score is a weighted average of the species scores (after some rescaling).

Note: The weighted averaging algorithm contains an implicit standardization by both samples and species. In contrast, in linear ordination we can choose between the standardized and non-standardized forms.

Quantitative data

Transformation is an algebraic function X'_ij = f(X_ij) which is applied to each value independently of the other values.

Standardization is done either with respect to the values of other species in the sample (standardization by samples) or with respect to the values of the species in other samples (standardization by species).

Centering means the subtraction of a mean so that the resulting variable (species) or sample has a mean of zero.

Standardization usually means division of each value by the sample (species) norm or by the total of all the values in a sample (species).

Euclidean distance (ED) - used in linear methods. For ED, standardize by sample norm, not by total. The samples labelled t contain values standardized by the total, those labelled n values standardized by the sample norm. For samples standardized by total, ED12 = 1.41 (√2) whereas ED34 = 0.82; for samples standardized by sample norm, ED12 = ED34 = 1.41.
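The data table from this slide is not preserved in the transcript. The sketch below uses hypothetical samples, chosen only because they reproduce the quoted distances: samples 1 and 2 each contain a single (different) species, samples 3 and 4 each contain three equally abundant species and share none. It shows why standardization by the total makes the species-rich pair look more similar than the species-poor pair, although neither pair has any species in common.

```python
import numpy as np

def standardize_by_total(sample):
    """Divide each value by the sum of all values in the sample."""
    return sample / sample.sum()

def standardize_by_norm(sample):
    """Divide each value by the sample norm (square root of the sum of squares)."""
    return sample / np.sqrt(np.sum(sample ** 2))

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Hypothetical data (6 species); not the original table from the slide
s1 = np.array([10.0, 0, 0, 0, 0, 0])
s2 = np.array([0, 10.0, 0, 0, 0, 0])
s3 = np.array([5.0, 5, 5, 0, 0, 0])
s4 = np.array([0, 0, 0, 5.0, 5, 5])

ed12_total = euclidean(standardize_by_total(s1), standardize_by_total(s2))
ed34_total = euclidean(standardize_by_total(s3), standardize_by_total(s4))
ed12_norm = euclidean(standardize_by_norm(s1), standardize_by_norm(s2))
ed34_norm = euclidean(standardize_by_norm(s3), standardize_by_norm(s4))

print(round(ed12_total, 2), round(ed34_total, 2))  # by total
print(round(ed12_norm, 2), round(ed34_norm, 2))    # by sample norm
```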

Percentage similarity (quantitative Sörensen) - no counterpart in either linear or WA methods; can be used in multidimensional scaling
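A minimal sketch of this index, assuming the common definition PS = 200 · Σ min(x1, x2) / (Σ x1 + Σ x2), which runs from 0 (no shared species) to 100 (identical samples). The function name and test data are hypothetical.

```python
import numpy as np

def percentage_similarity(a, b):
    """Quantitative Sorensen index between two samples, in percent."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    shared = np.minimum(a, b).sum()      # abundance common to both samples
    return 200.0 * shared / (a.sum() + b.sum())

ps = percentage_similarity([2, 0], [1, 1])  # one unit shared out of four in total
```

The corresponding dissimilarity (100 - PS) is what would be passed to a multidimensional scaling routine.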

Weighted averaging methods correspond to the use of

The two formulations may lead to the same solution. (If samples of similar species composition were distant on an ordination axis, this axis could hardly serve as a good predictor of their species composition.) For example, principal component analysis can be formulated as a projection in Euclidean space, or as a search for a latent variable when a linear response is assumed.

By specifying the species response, we specify the (dis)similarity measure.

[Figure: samples plotted in the space of Species 1 and Species 2]

The result of the ordination will be the values of this latent variable for each sample (called the sample scores) and the estimate of species optimum on that variable for each species (the species scores). Further, we require that the species optima be correctly estimated from the sample scores (by weighted averaging) and the sample scores be correctly estimated as weighted averages of the species scores (species optima). This can be achieved by the following iterative algorithm:

Step 1 Start with some (arbitrary) initial site scores {x_i}
Step 2 Calculate new species scores {y_i} by [weighted averaging] regression from {x_i}
Step 3 Calculate new site scores {x_i} by [weighted averaging] calibration from {y_i}
Step 4 Remove the arbitrariness in the scale by standardizing site scores (stretch the axis)
Step 5 Stop on convergence, else GO TO Step 2

= eigenvalue
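The five steps can be sketched as below for the weighted averaging case (first axis only). This is an illustration under stated assumptions, not the CANOCO implementation: the function name is hypothetical, site scores are centred and scaled with sample totals as weights, and the eigenvalue is taken as the amount by which the standardized site scores shrink in one pass of Steps 2 and 3 at convergence.

```python
import numpy as np

def two_way_weighted_averaging(Y, n_iter=1000, tol=1e-12):
    """First ordination axis of a samples x species table Y."""
    Y = np.asarray(Y, dtype=float)
    row_tot = Y.sum(axis=1)              # sample totals
    col_tot = Y.sum(axis=0)              # species totals
    w = row_tot / row_tot.sum()          # sample weights

    def standardize(x):
        x = x - np.sum(w * x)                      # weighted mean 0
        return x / np.sqrt(np.sum(w * x ** 2))     # weighted variance 1

    # Step 1: arbitrary initial site scores
    x = standardize(np.arange(Y.shape[0], dtype=float))
    eigenvalue = 0.0
    for _ in range(n_iter):
        y = (Y.T @ x) / col_tot          # Step 2: species scores from site scores
        x_new = (Y @ y) / row_tot        # Step 3: site scores from species scores
        x_new = x_new - np.sum(w * x_new)
        # Shrinkage of the standardized scores; equals the eigenvalue at convergence
        eigenvalue = np.sqrt(np.sum(w * x_new ** 2))
        x_new = x_new / eigenvalue       # Step 4: stretch the axis
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:                    # Step 5
            break
    y = (Y.T @ x) / col_tot              # species scores for the final site scores
    return x, y, eigenvalue

# Hypothetical table with two groups of samples sharing no species
Y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
site_scores, species_scores, eig = two_way_weighted_averaging(Y)
```

For this perfectly partitioned table the returned eigenvalue is 1; for any table with species shared between the groups it is smaller.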

The larger the eigenvalue, the better the explanatory power of the axis. The amount of variability explained is proportional to the eigenvalue. In weighted averaging, eigenvalues are < 1 (= 1 only for perfect partitioning). In CANOCO, linear methods are scaled so that the total of the eigenvalues = 1 (this is not the case in some other programs).

[Figure: samples × species table of x and 0 entries illustrating perfect partitioning]

Constrained ordination

The axis is a linear combination of measured variables.

Step 1 Start with some (arbitrary) initial site scores {x_i}
Step 2 Calculate new species scores {y_i} by [weighted averaging] regression from {x_i}
Step 3 Calculate new site scores {x_i} by [weighted averaging] calibration from {y_i}
Step 3a Calculate a multiple regression of the site scores {x_i} on the environmental variables and take the fitted values of this regression as the new site scores.
Step 4 Remove the arbitrariness in the scale by standardizing site scores (stretch the axis)
Step 5 Stop on convergence, else GO TO Step 2
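The constrained algorithm can be sketched by adding Step 3a to the weighted averaging iteration. The code below is an illustration only: the function name is hypothetical, and weighting the regression by sample totals is an assumption that fits the weighted averaging case (an unweighted regression would be the natural choice for a linear method).

```python
import numpy as np

def constrained_wa_axis(Y, Z, n_iter=2000, tol=1e-12):
    """First constrained axis. Y: samples x species, Z: samples x env. variables."""
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    row_tot = Y.sum(axis=1)
    col_tot = Y.sum(axis=0)
    w = row_tot / row_tot.sum()                        # sample weights
    sw = np.sqrt(w)
    Z1 = np.column_stack([np.ones(Y.shape[0]), Z])     # add an intercept

    def standardize(x):
        x = x - np.sum(w * x)
        return x / np.sqrt(np.sum(w * x ** 2))

    x = standardize(np.arange(Y.shape[0], dtype=float))    # Step 1
    eigenvalue = 0.0
    for _ in range(n_iter):
        y = (Y.T @ x) / col_tot                        # Step 2
        x_wa = (Y @ y) / row_tot                       # Step 3
        # Step 3a: weighted multiple regression of site scores on Z,
        # fitted values become the new site scores
        beta, *_ = np.linalg.lstsq(Z1 * sw[:, None], x_wa * sw, rcond=None)
        x_new = Z1 @ beta
        x_new = x_new - np.sum(w * x_new)
        eigenvalue = np.sqrt(np.sum(w * x_new ** 2))   # shrinkage at convergence
        x_new = x_new / eigenvalue                     # Step 4
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:                                  # Step 5
            break
    return x, eigenvalue

# Hypothetical data: four samples, four species, one environmental variable
Y = np.array([[5, 3, 0, 0], [2, 4, 1, 0], [0, 2, 4, 1], [0, 0, 3, 5]])
Z = np.array([[1.0], [2.0], [3.5], [4.0]])
scores, eig = constrained_wa_axis(Y, Z)
```

Because the site scores are forced to be a linear combination of the environmental variables, the constrained eigenvalue cannot exceed that of the corresponding unconstrained axis.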

Basic ordination techniques. Detrending. Hybrid analyses.

Detrending - the second axis is BY DEFINITION linearly independent of the first - this does not prevent quadratic dependence

Let’s take a hammer. Done in each iteration.

And straighten the axis. Detrending by segments (highly non-parametric) or by polynomials. Despite its very “heuristic” nature, detrending often makes the second axis interpretable.
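A minimal sketch of detrending by polynomials, under simplifying assumptions: the detrending is applied once to finished axis scores and without sample weights, whereas the slides state that it is done in each iteration. The function name and data are hypothetical. The example builds a second axis that is uncorrelated with the first but depends on it quadratically (the arch), and removes that dependence.

```python
import numpy as np

def detrend_by_polynomial(axis1, axis2, degree=2):
    """Remove polynomial dependence of axis2 on axis1 (residuals of a polynomial fit)."""
    axis1 = np.asarray(axis1, dtype=float)
    axis2 = np.asarray(axis2, dtype=float)
    coeffs = np.polyfit(axis1, axis2, degree)
    return axis2 - np.polyval(coeffs, axis1)

# Hypothetical arch: second axis is a quadratic function of the first
axis1 = np.linspace(-1.0, 1.0, 11)
axis2 = axis1 ** 2 - np.mean(axis1 ** 2)

linear_corr = np.corrcoef(axis1, axis2)[0, 1]      # essentially zero: linearly independent
detrended = detrend_by_polynomial(axis1, axis2)    # essentially zero: the arch is removed
```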

Two approaches

Having both environmental data and data on species composition, we can first calculate an unconstrained ordination and then calculate a regression of the ordination axes on the measured environmental variables (i.e. project the environmental variables into the ordination diagram), or we can calculate a constrained ordination directly. The two approaches are complementary and both should be used!

By calculating the unconstrained ordination first, we do not miss the main part of the variability in species composition, but we can miss that part of the variability that is related to the measured environmental variables.

By calculating a constrained ordination, we do not miss the main part of the biological variability explained by the environmental variables, but we can miss the main part of the variability that is not related to the measured environmental variables.

What shall we do with categorical variables?

[Figure: output tables. ANOVA with grouping variable Var4; regression summary for dependent variable Var7 (Spreadsheet1) on independent variables Var5 and Var6, reporting R, R², adjusted R², F(2,7), p and the standard error of the estimate.]

Dummy variables
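A categorical variable with k levels can be coded as k - 1 dummy (0/1) variables and used as an ordinary set of predictors. The sketch below illustrates this with hypothetical data (10 observations in 3 groups, which gives the same degrees of freedom, F(2,7), as the output above; the values themselves are invented). It shows that a one-way ANOVA and a multiple regression on the two dummy variables give the same F statistic.

```python
import numpy as np

def dummy_code(groups):
    """Code a categorical variable as k-1 dummy variables (first level = reference)."""
    groups = np.asarray(groups)
    levels = np.unique(groups)
    return np.column_stack([(groups == g).astype(float) for g in levels[1:]])

def anova_f(values, groups):
    """F statistic of a one-way ANOVA."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    grand = values.mean()
    ss_between = sum(np.sum(groups == g) * (values[groups == g].mean() - grand) ** 2
                     for g in levels)
    ss_within = sum(np.sum((values[groups == g] - values[groups == g].mean()) ** 2)
                    for g in levels)
    return (ss_between / (len(levels) - 1)) / (ss_within / (len(values) - len(levels)))

def regression_f(values, X):
    """F statistic of a multiple regression of values on the columns of X."""
    values = np.asarray(values, dtype=float)
    design = np.column_stack([np.ones(len(values)), X])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    fitted = design @ beta
    ss_reg = np.sum((fitted - values.mean()) ** 2)
    ss_res = np.sum((values - fitted) ** 2)
    p = X.shape[1]
    return (ss_reg / p) / (ss_res / (len(values) - p - 1))

# Hypothetical data
groups = np.array(["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"])
values = np.array([4.1, 5.0, 4.6, 6.2, 7.1, 6.8, 9.0, 8.4, 9.5, 8.8])

dummies = dummy_code(groups)              # two columns: "is b", "is c"
f_anova = anova_f(values, groups)
f_regression = regression_f(values, dummies)
```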

Predictors and response are correlated, and the distribution is usually non-normal. Use the distribution-free Monte Carlo permutation test.

Monte Carlo permutation test
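The idea can be sketched for the simplest case of one response and one predictor. This is an illustration, not the CANOCO procedure: the function name is hypothetical, the test statistic is taken to be the squared correlation, and the p-value is assumed to be computed as (number of permutations with a statistic at least as large as the observed one + 1) / (number of permutations + 1).

```python
import numpy as np

def permutation_test(x, y, n_perm=199, seed=0):
    """Monte Carlo permutation test of the relationship between x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def statistic(a, b):
        return np.corrcoef(a, b)[0, 1] ** 2

    observed = statistic(x, y)
    # Permuting y breaks any link with x while keeping both distributions intact
    n_as_large = sum(statistic(x, rng.permutation(y)) >= observed
                     for _ in range(n_perm))
    p_value = (n_as_large + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical data with a strong relationship
x = np.arange(20, dtype=float)
y = 2.0 * x + 0.1 * np.sin(x)
r2, p = permutation_test(x, y)
```

With 199 permutations the smallest attainable p-value is 1/200 = 0.005, so the number of permutations sets the resolution of the test.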