Quantifying biological diversity Lou Jost

Standard measures of biological diversity lead to invalid inferences

What are diversity measures used for?
Measure the impact of human or natural disturbances on an ecosystem. Prioritize sites for conservation. Provide a robust community summary statistic that can be compared with values predicted by ecological and evolutionary theories. These uses apply to genetic diversity as well as species diversity.

Meaning of diversity in biology
Compositional complexity as viewed by an organism in the community. In the standard approach, diversity is independent of density: it depends only on the relative abundances p_i. When comparing the diversities of communities, the communities being compared should have the same densities. (In some important approaches this rule can be relaxed.)

Principle of transfers
For most purposes, if all else is equal, a community with five equally common species is more diverse than a community with one very abundant species and four extremely rare ones. For fixed richness and density, we require diversity measures to obey a Principle of Transfers: Diversity should never decrease with transfer of abundance from abundant species to rare species.
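The Principle of Transfers can be checked numerically. The sketch below is only an illustration: it assumes Shannon entropy (natural log) as the measure, and the abundance vectors and the helper name `transfer` are made up for the example.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural log) of a relative-abundance vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def transfer(p, donor, recipient, amount):
    """Return a copy of p with `amount` of relative abundance moved
    from species `donor` to species `recipient` (hypothetical helper)."""
    moved = list(p)
    moved[donor] -= amount
    moved[recipient] += amount
    return moved

# One very abundant species and four extremely rare ones (illustrative values).
uneven = [0.96, 0.01, 0.01, 0.01, 0.01]
# Move 0.10 of relative abundance from the dominant species to a rare one.
after_transfer = transfer(uneven, donor=0, recipient=1, amount=0.10)
# Five equally common species.
even = [0.2] * 5

for name, p in [("uneven", uneven), ("after transfer", after_transfer), ("even", even)]:
    print(f"{name:15s} H = {shannon_entropy(p):.3f}")
```

Richness and total abundance are unchanged by the transfer, so any increase in the measure comes only from the more even distribution.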

Diversity is linked to compositional similarity and differentiation between groups
Compare mean diversity of the groups to the diversity of the pooled groups.

Diversity is linked to compositional similarity and differentiation between groups
Standard method in ecology and genetics: Similarity = mean within-group diversity / pooled diversity.

Biologists have equated diversity with standard measures of complexity:
Shannon entropy: H = -Σ p_i ln p_i. Generalized entropies of order or degree q, for example the Havrda and Charvat, Daroczy, and Tsallis entropies; in the Tsallis form, H_q = (1 - Σ p_i^q)/(q - 1).

q=1: “Shannon entropy”, “Shannon-Weaver index”, or “Shannon-Wiener index”. q=2: “Gini-Simpson index” in ecology; “heterozygosity” or “gene diversity” in genetics. This is the probability that two randomly drawn individuals belong to different species. Here p_i is the true relative abundance of species i in the population, and S is the number of species.
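As a minimal sketch of these definitions, the function below computes a generalized entropy of order q in the Tsallis form, (1 - Σ p_i^q)/(q - 1), treating q=1 as the Shannon limit. The function name and the example abundances are assumptions made for illustration.

```python
import math

def generalized_entropy(p, q):
    """Generalized (Tsallis-form) entropy of order q for relative abundances p.

    q = 1 is handled as the limiting case, which is Shannon entropy;
    q = 2 gives the Gini-Simpson index (heterozygosity in genetics).
    """
    p = [x for x in p if x > 0]  # absent species contribute nothing
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p)
    return (1.0 - sum(x ** q for x in p)) / (q - 1.0)

abundances = [0.5, 0.3, 0.2]  # illustrative community
print("q=0:", generalized_entropy(abundances, 0))  # richness minus one
print("q=1:", generalized_entropy(abundances, 1))  # Shannon entropy
print("q=2:", generalized_entropy(abundances, 2))  # Gini-Simpson index
```

For q=2 the value is one minus the sum of squared relative abundances, which is the probability that two individuals drawn at random (with replacement) belong to different species.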

Vast ecological and genetic literature based on these measures
Do they give reasonable answers to biological questions?

Conservation biology application
20 islands: Each island has the same number of individuals, the same number of species, and the same species relative abundance distribution. Assume that there are no shared species between islands; each island has a completely distinctive set of species. Their diversities are all equal regardless of one’s definition of diversity.

For definiteness suppose the species relative abundances on each island are the same as those actually observed for the trees of Barro Colorado Island, Panama (Hubbell et al. 2005). Conservation biology question: Suppose our goal is to protect half the diversity of the region. How many islands must be preserved? The correct answer has to be ten islands by symmetry. What do the standard diversity measures say?

Why does species richness give the reasonable answer while the other standard measures do not?
Each island must contribute equally to total diversity. Linearity with respect to pooling: for N completely distinct, equally large islands of equal diversity, pooled diversity must equal N ∙ individual diversity. This “Replication Principle” from economics is the requirement for diversity to be self-consistent in these kinds of inferences.

This property is implicit in our intuitive concept of diversity, and many of our rules of inference presuppose this property. Shannon entropy and the Gini-Simpson index (heterozygosity) do not have this property. Species richness, exp(H), and inverse Simpson concentration do have it.
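The 20-island thought experiment can be sketched in a few lines. The Barro Colorado Island abundances are not reproduced here, so the code assumes a hypothetical 100-species island whose abundances are proportional to 1/rank; only the qualitative pattern matters.

```python
import math

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def gini_simpson(p):
    return 1.0 - sum(x * x for x in p)

def pool(island, n):
    """Pool n equally large islands that share no species.
    Every island has the same abundance distribution, so each
    species' pooled relative abundance is its island value / n."""
    return [x / n for _ in range(n) for x in island]

# Hypothetical island: 100 species, abundance proportional to 1/rank.
raw = [1.0 / rank for rank in range(1, 101)]
island = [x / sum(raw) for x in raw]

region = pool(island, 20)     # all 20 islands
preserved = pool(island, 10)  # the 10 islands we keep

# "Fraction of regional diversity preserved" according to each measure.
frac_richness = len(preserved) / len(region)
frac_shannon = shannon(preserved) / shannon(region)
frac_gini = gini_simpson(preserved) / gini_simpson(region)
frac_exp_h = math.exp(shannon(preserved)) / math.exp(shannon(region))
frac_inv_simpson = (1.0 / (1.0 - gini_simpson(preserved))) / (1.0 / (1.0 - gini_simpson(region)))

for name, value in [("species richness", frac_richness),
                    ("Shannon entropy", frac_shannon),
                    ("Gini-Simpson", frac_gini),
                    ("exp(H)", frac_exp_h),
                    ("inverse Simpson", frac_inv_simpson)]:
    print(f"{name:17s} {value:.3f}")
```

Under these assumptions, the measures that obey the Replication Principle report one half, while Shannon entropy and the Gini-Simpson index report that ten islands hold most of the regional "diversity".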

Linking diversity to compositional similarity and differentiation between groups
Standard method in ecology and genetics: Similarity = mean within-group “diversity” / pooled “diversity”.

So the classic measures of biodiversity give badly misleading and self-contradictory results!!

“Numbers equivalents” or “effective number of species” or “Hill numbers”
How many equally common species are needed to give the observed entropy value X? That number is the “numbers equivalent” of the entropy value. It is found by setting X = H(p_1, p_2, ..., p_S) = H(1/D, 1/D, ..., 1/D) and solving for D. D ranges from 1 to S, where S is the actual number of species in the community. Equivalence classes are defined by the value of the standard complexity measures; parameterize these classes by D. The value of D obeys the Replication Principle, so D is a valid measure of diversity.

When we find the effective number of species using any q-order generalized entropy that is a monotonic function of the sum of the relative abundances to the power q, we always get the same formula.

The numbers equivalent of a standard generalized entropy (e.g., Renyi, Tsallis) is the inverse of a weighted power mean (of order q-1) of the species relative abundances.
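The common formula is usually written D = (Σ p_i^q)^(1/(1-q)), with exp(H) as the q=1 limit. The sketch below implements it under that assumption and checks two claims from the slides: D equals the species count for equally common species, and D doubles when a community is pooled with an equally large, completely distinct copy. The function name and test communities are illustrative.

```python
import math

def hill_number(p, q):
    """Effective number of species (Hill number) of order q."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        # q = 1 limit: exponential of Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

community = [0.5, 0.3, 0.15, 0.05]  # illustrative abundances

# Pool the community with a completely distinct copy of equal size.
doubled = [x / 2 for x in community] * 2

for q in (0, 1, 2):
    d_single = hill_number(community, q)
    d_pooled = hill_number(doubled, q)
    print(f"q={q}: D = {d_single:.3f}, pooled D = {d_pooled:.3f}")

# Numbers equivalent of the Gini-Simpson index H is 1 / (1 - H).
gini_simpson = 1.0 - sum(x * x for x in community)
print("1/(1-H):", 1.0 / (1.0 - gini_simpson))
```

Converting an entropy to its numbers equivalent changes the units from an abstract index to "species", which is what makes ratios and percentage changes interpretable.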

Introduced by economists in the 1960s. Introduced to ecology for special cases by MacArthur in the 1960s, and to genetics by Crow and Kimura in 1964 for the special case of heterozygosity. The general case was treated by Mark Hill in ecology in 1973. It didn’t catch on in ecology but did in economics…

We now have diversity measures which possess the properties that are implicit in the diversity concept used by biologists. Biologists had been making inferences about diversity which were invalid because the mathematical properties of standard measures of complexity did not support these rules of inference. Biologists did not notice the problem!!!

Can plot this diversity as a function of q to create a “diversity profile”
This is similar to the Renyi spectrum based on Renyi generalized entropies. The diversity profile contains all the information contained in the relative abundance vector for the community.
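A diversity profile is just the effective number of species evaluated over a range of q. The following sketch tabulates one, assuming the usual Hill-number formula; the grid of q values and the two example communities are arbitrary choices for illustration.

```python
import math

def hill_number(p, q):
    """Effective number of species of order q (q = 1 handled as a limit)."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def diversity_profile(p, orders):
    """Return a list of (q, D_q) pairs for the requested orders."""
    return [(q, hill_number(p, q)) for q in orders]

orders = [0, 0.5, 1, 1.5, 2, 3]
even = [0.25, 0.25, 0.25, 0.25]
uneven = [0.5, 0.3, 0.15, 0.05]

print(" q    even  uneven")
for (q, d_even), (_, d_uneven) in zip(diversity_profile(even, orders),
                                      diversity_profile(uneven, orders)):
    print(f"{q:3.1f}  {d_even:5.2f}  {d_uneven:5.2f}")
```

A perfectly even community gives a flat profile at its species count, while an uneven community's profile falls as q increases, because higher orders give more weight to the dominant species.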

Diversity profile plotting software

How can the effective number of species (Hill numbers) be partitioned into within- and between-group components? A method commonly used in ecology and genetics to set conservation priorities and understand evolutionary processes is “additive partitioning” (Nei 1973, Lande 1996): total (gamma) diversity = mean within-group (alpha) diversity + between-group (beta) diversity, where H is a generalized entropy with q=1 or q=2: H_total - H_within = H_between, or H_gamma - H_alpha = H_beta. For q=2, this partitions the Gini-Simpson index or gene diversity, H = 1 - Σ p_i^2. Additive partitioning of the Gini-Simpson index or gene diversity is incomplete: it produces a between-group component that is confounded with the within-group component. H_total = H_within + H_between, but since 0 <= H < 1, when H_within is close to unity, H_between must be close to 0 no matter how different the groups are.
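The confounding is easy to reproduce. This sketch applies additive partitioning of the Gini-Simpson index to two equally weighted groups that share no species at all; the group compositions (100 equally common species each) and the function names are assumptions chosen to make the arithmetic transparent.

```python
def gini_simpson(p):
    return 1.0 - sum(x * x for x in p)

def additive_partition(groups):
    """Additive partition of the Gini-Simpson index for equally weighted groups.
    Each group is a dict mapping species name -> relative abundance.
    Returns (H_within, H_between, H_total)."""
    n = len(groups)
    species = set().union(*groups)
    pooled = [sum(g.get(s, 0.0) for g in groups) / n for s in species]
    h_within = sum(gini_simpson(g.values()) for g in groups) / n
    h_total = gini_simpson(pooled)
    return h_within, h_total - h_within, h_total

# Two groups, 100 equally common species each, no species in common.
group_a = {f"a{i}": 0.01 for i in range(100)}
group_b = {f"b{i}": 0.01 for i in range(100)}

h_within, h_between, h_total = additive_partition([group_a, group_b])
print(f"H_within  = {h_within:.4f}")
print(f"H_between = {h_between:.4f}")
print(f"H_total   = {h_total:.4f}")
print(f"'similarity' H_within/H_total = {h_within / h_total:.4f}")
```

Even though the two groups are completely different, the between-group component is tiny and the ratio H_within/H_total is close to 1, which would be read as near-identity.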

Axioms: Partitioning diversity into within- and between-group components
Complete partitioning: The within-group component should contain no information about the between-group component, and vice versa. Knowledge of one component should give no information about, nor impose any mathematical constraint on, the value of the other component. The within-group component should be a generalized mean of the diversities of the individual groups. Weakest possible specification: if all groups have diversity D, then the within-group diversity is D. The between-group component must take its minimum value when communities are identical and its maximum value when they share no species. This between-group diversity measures the degree of differentiation of the relative abundance vectors. (Other goals are possible.) These properties are implicit in the way that biologists use the concepts of within- and between-group diversity, even though their measures of within- and between-group diversity did not generally possess these properties.

Partitioning diversity… (Jost, Ecology 2007)
If a meaningful partitioning of the numbers equivalent exists!!!!

Components of Shannon diversity (alpha = within-group, beta = between-group, gamma = total diversity)

“Beta” or between-group diversity for Shannon diversity
Beta ranges from 1 (when all communities have identical compositions) to the exponential of the entropy of the weights (when all assemblages are completely different). When weights are equal, beta ranges from 1 to N where N is number of communities. This beta diversity is the effective number of completely distinct communities in the region or dataset. It is the exponential of Mutual Information.
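A minimal sketch of the Shannon case follows, assuming the multiplicative decomposition gamma = alpha × beta of Jost (2007), with alpha taken as the exponential of the weighted mean entropy. The function name, the community dictionaries, and the weights are illustrative.

```python
import math

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def shannon_partition(communities, weights=None):
    """Return (alpha, beta, gamma) as effective numbers for q = 1.
    communities: list of dicts, species name -> relative abundance.
    weights: statistical weights of the communities (default: equal)."""
    n = len(communities)
    if weights is None:
        weights = [1.0 / n] * n
    species = sorted(set().union(*communities))
    pooled = [sum(w * c.get(s, 0.0) for w, c in zip(weights, communities))
              for s in species]
    alpha = math.exp(sum(w * shannon(c.values())
                         for w, c in zip(weights, communities)))
    gamma = math.exp(shannon(pooled))
    return alpha, gamma / alpha, gamma

def distinct_community(tag, n_species=5):
    """Hypothetical community of equally common species unique to `tag`."""
    return {f"{tag}{i}": 1.0 / n_species for i in range(n_species)}

four_distinct = [distinct_community(t) for t in "ABCD"]
four_identical = [distinct_community("A") for _ in range(4)]

print("distinct, equal weights:  beta =", shannon_partition(four_distinct)[1])
print("identical communities:    beta =", shannon_partition(four_identical)[1])

# Unequal weights: maximum beta is exp(entropy of the weights).
w = [0.5, 0.25, 0.25]
beta_unequal = shannon_partition(four_distinct[:3], w)[1]
print("distinct, unequal weights: beta =", beta_unequal,
      " exp(H(w)) =", math.exp(shannon(w)))
```

With equal weights and completely distinct communities, beta recovers the number of communities, which is the sense in which it counts "effective" communities.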

Shannon beta diversity = regional heterogeneity of the relative abundance vectors
If there are N equally large, completely distinct communities in the region, beta = N. It is not necessary to identify or demarcate the communities in the region: just make many random sample points or points on a grid. As number of samples becomes large, the sample beta approaches the true regional beta.

Shannon beta diversity = regional heterogeneity of the relative abundance vectors
Each color represents a completely distinct community with no shared species with other communities. Beta diversity for the left-hand region = 4; beta diversity for the right-hand region = 1.05.

Within-group or “alpha” diversity for q ≠ 1
Mean generalized entropy, converted to effective number of species. Beta ranges from 1 to N, where N is the number of communities. For q other than 0 or 1, it is impossible to satisfy my partitioning conditions when weights are unequal.
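For equal weights, "mean generalized entropy converted to effective number of species" works out to alpha = [(1/N) Σ_j Σ_i p_ij^q]^(1/(1-q)). The sketch below assumes that form and beta = gamma/alpha; it is restricted to equal weights and q ≠ 1, and the names and test communities are illustrative.

```python
def hill_partition_equal_weights(communities, q):
    """Return (alpha, beta, gamma) of order q for equally weighted communities.
    communities: list of dicts, species name -> relative abundance.
    Assumes q != 1 (the Shannon case needs its own limiting formula)."""
    if abs(q - 1.0) < 1e-12:
        raise ValueError("use the Shannon (q = 1) partition instead")
    n = len(communities)
    species = set().union(*communities)
    pooled = [sum(c.get(s, 0.0) for c in communities) / n for s in species]
    # Mean over communities of the sum of p^q, then convert to a numbers equivalent.
    mean_sum = sum(sum(x ** q for x in c.values() if x > 0)
                   for c in communities) / n
    alpha = mean_sum ** (1.0 / (1.0 - q))
    gamma = sum(x ** q for x in pooled if x > 0) ** (1.0 / (1.0 - q))
    return alpha, gamma / alpha, gamma

def distinct_community(tag, n_species=4):
    """Hypothetical community of equally common species unique to `tag`."""
    return {f"{tag}{i}": 1.0 / n_species for i in range(n_species)}

three_distinct = [distinct_community(t) for t in "ABC"]
three_identical = [distinct_community("A") for _ in range(3)]

for q in (0, 2):
    print(f"q={q} distinct :", hill_partition_equal_weights(three_distinct, q))
    print(f"q={q} identical:", hill_partition_equal_weights(three_identical, q))
```

In both orders, beta runs from 1 for identical communities to N for completely distinct ones, independently of alpha.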

Normalizations of this beta diversity provide measures of compositional similarity and differentiation

Monotonicity issues
q=0, 1: Beta and the normalized similarity and differentiation measures obey the strongest possible monotonicity properties. Other values of q: Beta and the normalized similarity and differentiation measures obey weaker monotonicity properties and should not be used without good reason.

Another problem with classical diversity measures
Diversity is the complexity per individual and depends only on relative abundances. Adding a super-abundant new species will decrease diversity even if the original species are not affected by the new species. An ideal diversity measure would never decrease with the addition of a new species that does not change the abundances of the other species. Hill numbers do not satisfy that property. We need to find a density-dependent analogue of “probability of inter-specific encounters”, such as the rate of interspecific encounters per unit time. This is an active field of work.

Why have people ignored these problems with their measures for so long?
Ecologists and geneticists often treat measures as mere tools for the calculation of p-values (statistical significance). Statistical significance depends at least as much on sample size as on the magnitude of the effect being measured. In natural populations, the null hypothesis of zero differentiation is virtually always false, and if sample size is large enough, a difference can always be demonstrated with any desired degree of statistical significance. P-values are not a substitute for real measures of effect size, and despite its popularity with researchers and journal editors, testing a null hypothesis is rarely the appropriate model in science.

Why have people ignored these problems with their measures for so long?
Ecological problems should usually be cast in terms of estimating a meaningful parameter, rather than testing an always-false null hypothesis (which will always be rejected if sample size is large enough). Biologists need measures whose absolute magnitudes are interpretable. Statistical uncertainties should be expressed by confidence intervals, not p values.

Xie xie Thank you! Special thanks to Dr. Anne Chao for inviting me and giving me the chance to know this beautiful country.

Supplementary Material: Population Genetics
G_ST = 1 - H_S/H_T. We’ve already seen how badly this works as a measure of compositional differentiation in ecology, but it is still the standard measure of compositional differentiation in genetics (it also has other, more legitimate uses in genetics).

A measure of allelic differentiation
The complement of the Morisita-Horn index (q=2) is a measure of dissimilarity for genetics: D = [(H_T - H_S)/(1 - H_S)] [n/(n-1)] = 1 - GD/GS. If all n subpopulations consist of k equally common alleles, this measure gives the proportion of each subpopulation’s alleles that are unique to that subpopulation. This is a measure of pure differentiation, independent of average within-subpopulation heterozygosity, unlike G_ST. It should replace G_ST when heterozygosity-based allelic differentiation is the quantity of interest.
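The contrast between G_ST and D can be sketched directly from the formulas above. The code assumes equally weighted subpopulations and known (not estimated) allele frequencies, so it ignores sampling corrections; the allele labels and function names are invented for the example.

```python
def heterozygosity(freqs):
    return 1.0 - sum(x * x for x in freqs)

def gst_and_d(subpops):
    """Return (G_ST, D) for equally weighted subpopulations.
    subpops: list of dicts, allele name -> frequency.
    Assumes the pooled population is polymorphic (H_T > 0)."""
    n = len(subpops)
    alleles = set().union(*subpops)
    pooled = [sum(s.get(a, 0.0) for s in subpops) / n for a in alleles]
    h_s = sum(heterozygosity(s.values()) for s in subpops) / n
    h_t = heterozygosity(pooled)
    g_st = 1.0 - h_s / h_t
    d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return g_st, d

def equal_alleles(names):
    """Hypothetical subpopulation with equally common alleles."""
    return {name: 1.0 / len(names) for name in names}

shared = [f"s{i}" for i in range(5)]
# No alleles in common: 10 private alleles in each subpopulation.
no_overlap = [equal_alleles([f"a{i}" for i in range(10)]),
              equal_alleles([f"b{i}" for i in range(10)])]
# Half of each subpopulation's alleles are private.
half_overlap = [equal_alleles(shared + [f"a{i}" for i in range(5)]),
                equal_alleles(shared + [f"b{i}" for i in range(5)])]

for label, pops in [("no shared alleles", no_overlap),
                    ("half shared", half_overlap)]:
    g_st, d = gst_and_d(pops)
    print(f"{label:18s} G_ST = {g_st:.3f}  D = {d:.3f}")
```

In the first case every allele is private, yet G_ST stays near zero because within-subpopulation heterozygosity is high; D reports complete differentiation.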

Link to ecological and genetic models
Hubbell’s neutral model of biodiversity. Finite island model in population genetics. These are kind of like the ideal gas or the two-body problem of physics: they are simple enough that we can solve them analytically.

Genetic divergence
Traditional G_ST at equilibrium under the same model is: G_ST = 1 - H_S/H_T ≈ 1/(1 + 4Nm + 4Nu) ≈ 1/(1 + 4Nm), where n = number of subpopulations, N = size of each subpopulation, m = migration rate, and u = mutation rate. Differentiation D at equilibrium is: D ≈ 1/{1 + m/[u(n-1)]}. These factors control neutral divergence in subdivided populations. This is very different from the standard view.
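The two equilibrium approximations can be compared numerically. The sketch below simply encodes the formulas as written on the slide; the parameter values are hypothetical, chosen so that Nm = 10 while m/[u(n-1)] = 0.1.

```python
def gst_equilibrium(N, m, u):
    """Approximate equilibrium G_ST: 1 / (1 + 4Nm + 4Nu)."""
    return 1.0 / (1.0 + 4.0 * N * m + 4.0 * N * u)

def d_equilibrium(m, u, n):
    """Approximate equilibrium D: 1 / (1 + m / [u (n - 1)])."""
    return 1.0 / (1.0 + m / (u * (n - 1)))

# Hypothetical parameters: subpopulation size, migration rate,
# mutation rate, and number of subpopulations.
N, m, u, n = 1000, 0.01, 1e-4, 1001

print("Nm         =", N * m)
print("m/[u(n-1)] =", m / (u * (n - 1)))
print("G_ST       ≈", round(gst_equilibrium(N, m, u), 4))
print("D          ≈", round(d_equilibrium(m, u, n), 4))
```

With these values G_ST is close to zero, as the migrants-per-generation rule would suggest, while D is close to one, so the two measures point in opposite directions.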

Genetic divergence
Population genetics rule of thumb: more than 1 migrant per generation = little or no differentiation in the absence of natural selection. This is counter-intuitive for large N. The rule is wrong, since it only tells us that G_ST will be low, not that real allelic differentiation will be low. G_ST can be low even for completely differentiated subpopulations, or high even when subpopulations show little differentiation.

Genetic divergence
D is controlled by m/[u(n-1)]. G_ST is controlled by Nm + Nu. Find cases where they predict opposite effects and test their predictions.

Nm = 10, so classical theory predicts no differentiation; m/[u(d-1)] = 44, 1, 0.1

Nm = 1/20, so classical theory predicts high differentiation; m/[u(d-1)] = 50, 1, 0.1

Phylogenetic diversity
We should include phylogenetic distance between species in our measures of diversity. We can ask more interesting questions and make more meaningful conservation recommendations.

Faith’s phylogenetic diversity PD

But we should take into account abundances for some applications

Phylogenetic diversity
Mean diversity of a set of lineages. More useful is the amount of evolutionary history represented in the lineages, or the amount of evolutionary work done on the assemblage over some period of time T. This equals mean diversity * T (in years or base changes).

Choice of T

Partitioning phylogenetic diversity
