Phylogenetic Diversity Measures Based on Hill Numbers Anne Chao National Tsing Hua University Institute of Statistics Hsin-Chu, Taiwan 30043 Eco-Stats.


Phylogenetic Diversity Measures Based on Hill Numbers Anne Chao National Tsing Hua University Institute of Statistics Hsin-Chu, Taiwan Eco-Stats Symposium The University of New South Wales Sydney, Australia July 11-12, 2012

Collaborators of this work: Chun-Huo Chiu (National Tsing Hua University, Taiwan) and Lou Jost (EcoMinga Foundation, Ecuador)

Outline:
1. Traditional diversity measures (these do not consider species relatedness). Focus on: doubling property and Hill numbers.
2. Phylogenetic diversity via Hill numbers (these consider taxonomic or phylogenetic distance between species).
3. Simple illustrative examples.
4. Statistical estimation (brief).

Bird species diversity

Diversity for the class Crustacea (greatest diversity in the oceans)

Dazzling Orchid Diversity… Some are abundant, some are rare, some still undiscovered

Variance vs. Diversity: variance describes variation in numerical variables; diversity describes variation among categories (species).

Biodiversity Definition Variety and variability among living organisms and the ecological complexes in which they occur Variation of life at all levels of biological organization

Biodiversity Levels:
Gene diversity: diversity of genes within a species.
Species (or taxonomic/phylogenetic) diversity: diversity among species in an ecosystem.
Ecosystem (or functional) diversity: diversity of different ecosystems on Earth.

Traditional Species Diversity: S species, indexed by 1, 2, ..., S. Species absolute abundance/biomass: $(A_1, A_2, \ldots, A_S)$. Species relative abundance/biomass: $(p_1, p_2, \ldots, p_S)$, with $\sum_{i=1}^{S} p_i = 1$.

Traditional Biodiversity Measures/Indices. “Diversity measures” is itself a diverse issue: different indices/measures quantify different aspects. Two components: species richness, and evenness among abundances.

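[Figure: a community with 4 species is more diverse than a community with 3 species.]

[Figure: two communities with 3 species each, one uneven and one even; the even community is more diverse.]

4 species, uneven, versus 3 species, even: which one is more diverse?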

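Gini-Simpson Index (Gini 1912; Simpson 1949). Take two individuals at random. The probability that they belong to the same species is the Simpson index (concentration, repeat rate): $\sum_{i=1}^{S} p_i^2$. The probability that they belong to different species is the Gini-Simpson index: $1 - \sum_{i=1}^{S} p_i^2$.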

Shannon (1948) entropy: $H = -\sum_{i=1}^{S} p_i \log p_i$. A measure of uncertainty: the uncertainty in the species identity of a randomly sampled individual.
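As a minimal sketch of the two indices above, the following Python snippet computes the Simpson concentration, the Gini-Simpson index and Shannon entropy (natural logarithm) from a vector of relative abundances. The function names and the four-species example community are illustrative assumptions and do not come from the slides.

```python
import math

def simpson_concentration(p):
    """Probability that two randomly drawn individuals are the same species."""
    return sum(x * x for x in p)

def gini_simpson(p):
    """Probability that two randomly drawn individuals are different species."""
    return 1.0 - simpson_concentration(p)

def shannon_entropy(p):
    """Shannon entropy with natural logarithm; zero abundances are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical community: four equally common species.
p = [0.25, 0.25, 0.25, 0.25]
print(gini_simpson(p))               # 0.75
print(round(shannon_entropy(p), 2))  # 1.39
```

These two values are the ones that are later converted into 4 “species” by the Hill-number transformations.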

Doubling Property (MacArthur 1965; Hill 1973). Suppose there are two completely distinct communities (no shared species), each with diversity X. If the two are combined with equal weight, the diversity of the pooled community should be 2X. This is an essential minimum requirement that ecologists expect of a “diversity” measure. It corresponds to the “replication principle” in economics (Dalton 1920), which extends the property to K communities/groups.

Which measures satisfy the doubling property? Species richness: yes. Entropy: no. Gini-Simpson index: no.

Pooled community: species richness = 8. Entropy? It becomes 2.08. Gini-Simpson index? It becomes 0.875.

Species richness = 8; exp(entropy) = 8; inverse Simpson index = 8. If a measure cannot satisfy the replication principle (RP) in this simple, completely distinct case, we would not expect it to work in more complicated cases.
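The doubling check above can be reproduced numerically. The sketch below assumes two completely distinct communities of four equally common species each (an assumption chosen because it reproduces the numbers on these slides) and pools them with equal weight; all function names are illustrative.

```python
import math

def richness(p):
    return sum(1 for x in p if x > 0)

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def gini_simpson(p):
    return 1.0 - sum(x * x for x in p)

def pool_equal_weight(p1, p2):
    """Pool two communities with no shared species, each given weight 1/2."""
    return [0.5 * x for x in p1] + [0.5 * x for x in p2]

# Assumed communities: four equally common species each, no species shared.
a = [0.25] * 4
b = [0.25] * 4
pooled = pool_equal_weight(a, b)

# Raw indices: only species richness doubles (4 -> 8).
print(richness(a), richness(pooled))
print(round(entropy(a), 2), round(entropy(pooled), 2))
print(gini_simpson(a), gini_simpson(pooled))

# Transformed indices: both double (4 -> 8).
print(round(math.exp(entropy(a)), 6), round(math.exp(entropy(pooled)), 6))
print(1 / (1 - gini_simpson(a)), 1 / (1 - gini_simpson(pooled)))
```

Entropy goes from about 1.39 to about 2.08 and the Gini-Simpson index from 0.75 to 0.875, so neither doubles, whereas their transformed versions go from 4 to 8.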

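Hill’s (1973) family of diversity indices of order q: ${}^qD = \left(\sum_{i=1}^{S} p_i^q\right)^{1/(1-q)}$ for $q \ge 0$, $q \ne 1$.
q = 0: ${}^0D$ = species richness.
q = 1: ${}^1D = \lim_{q \to 1} {}^qD = \exp\left(-\sum_{i=1}^{S} p_i \log p_i\right)$ = exponential of entropy.
q = 2: ${}^2D = 1 / \sum_{i=1}^{S} p_i^2$ = inverse of Simpson index.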
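A short sketch of Hill numbers as a single function may help here. It assumes the standard definition, the sum of p_i^q raised to the power 1/(1-q), with q = 1 treated as the limiting case exp(entropy); the function name and the three-species example are hypothetical.

```python
import math

def hill_number(p, q):
    """Hill number of order q for relative abundances p.

    q = 1 is handled as the limit, i.e. the exponential of Shannon entropy.
    """
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical three-species community.
p = [0.5, 0.3, 0.2]
for q in (0, 1, 2):
    print(q, round(hill_number(p, q), 3))
```

Treating q = 1 as a separate branch avoids the division by zero in the exponent; values of q very close to 1 give nearly the same result as the limit.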

“Order” q (Tsallis 2001; Keylock 2005) The order q determines the measure’s sensitivity to species frequencies q > 1, sensitive to common species q < 1, sensitive to rare species q = 1, weighs species by their frequencies, without favoring either common or rare species

Hill numbers: transform to units of “species” Entropy = 1.39 is equivalent to exp(1.39) = 4 “species” Gini-Simpson index = 0.75 is equivalent to 1/(1-0.75) = 4 “species”

Hill Numbers: “Species Equivalent” or “Effective Number of Species”. The number of equally common species that would be needed to give the same diversity as the community under study. For a community of equally common species, Hill numbers are equal to species richness for all orders q.

“Effective number of species”. Community: S species with relative abundances $\{p_1, p_2, \ldots, p_S\}$ and Hill number D of order q. Equivalent simple community: D species with equal relative abundances $\{1/D, \ldots, 1/D\}$.

Hill numbers: an intuitive equivalence. [Figure: a complex community with relative abundances $p_1, p_2, \ldots, p_S$ is equated with a simple community of equally abundant species.]

Examples: four hypothetical communities There are 100 species, 500 individuals A: equally-common B: slightly uneven C: moderately uneven D: highly uneven

Quantifying species diversity by a profile of Hill numbers. [Figure: diversity profiles for the four communities: equally common, slightly uneven, moderately uneven, highly uneven.]
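A diversity profile is the Hill number evaluated over a range of orders q. The sketch below builds four 100-species communities of increasing unevenness and prints their profiles. The abundance vectors are assumptions (geometric series with hypothetical ratios 0.99, 0.95 and 0.80), not the data behind the slide, so only the qualitative pattern should be compared.

```python
import math

def hill_number(p, q):
    """Hill number of order q (q = 1 handled as the exp-entropy limit)."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

S = 100  # number of species, as on the slide
# Assumed abundance patterns; the ratios are hypothetical.
communities = {
    "A (equally common)": normalize([1.0] * S),
    "B (slightly uneven)": normalize([0.99 ** i for i in range(S)]),
    "C (moderately uneven)": normalize([0.95 ** i for i in range(S)]),
    "D (highly uneven)": normalize([0.80 ** i for i in range(S)]),
}

orders = [0, 0.5, 1, 1.5, 2, 3]
profiles = {name: [hill_number(p, q) for q in orders]
            for name, p in communities.items()}

for name, profile in profiles.items():
    print(name, [round(d, 1) for d in profile])
```

The profile of the equally common community is flat at 100, while the profiles of the uneven communities start at 100 for q = 0 and drop faster the more uneven the community is.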

Diversity partitioning via Hill numbers. Gamma (regional) diversity is partitioned into alpha (within-community) diversity and beta (between-community) diversity. There have been intense debates over whether the partition should be additive or multiplicative. Chao et al. (2012) proposed a resolution: both approaches converge to the same classes of similarity measures, namely the Jaccard and Sørensen (q = 0), Horn (q = 1) and Morisita-Horn (q = 2) similarity measures.

Phylogenetic Diversity : Community 1 Community 2 All else being equal, which community is more diverse?

The species in Community 2 are more phylogenetically diverse than those in Community 1. Pielou (1975, p. 17) was the first to notice that the concept of diversity could be broadened to consider taxonomic differences between species.

“I think” Tree of Life The first-known sketch by Charles Darwin of an evolutionary tree describing the relationships among groups of organisms s/darwin/idea/treelg.php

Phylogenetic Diversity Measures: we consider not only the relative abundances of species but also the phylogenetic relationships among species. The measures should also satisfy the essential requirement, the “replication principle”. [Figure: phylogenetic trees with tip abundances $p_1, p_2, p_3$.]

Doubling Property for Phylogenetic Diversity Two completely phylogenetically distinct assemblages (no shared lineages), with the same phylogenetic diversity =X. Assemblages are pooled in equal proportions, then the pooled assemblage has phylogenetic diversity 2X. Similar extension to N assemblages

Doubling property, phylogenetic version: given two completely phylogenetically distinct assemblages (no shared tree branches), each with diversity X, combining the two gives diversity 2X.

Pioneering Work in Phylogenetic Diversity (1). Branch-length-based measure: Phylogenetic Diversity, PD (Faith 1992), the sum of the branch lengths of the phylogeny connecting all species from tips to root. It satisfies the “replication principle”.

Faith (2002) PD: total branches length Lineages completely distinct

Pioneering work (2) Weitzman (1992, 1993, 1998) from a perspective of economic theory of biodiversity preservation “Unfortunately, Noah’s Ark has a limited capacity …. and a (limited) budget available for biodiversity preservation…” What to preserve?

Noah’s Ark: the agony of choice. The woodpecker might have to go! Courtesy of Ramon Teja.

Traditional measure and its phylogenetic counterpart:
Species richness: Faith’s PD (Faith 1992)
Entropy: phylogenetic entropy (Allen et al. 2009)
Gini-Simpson: quadratic entropy (Rao 1982)
Hill numbers: Chao, Chiu and Jost (2010)

Pioneering Work (3).
Quadratic entropy (Rao 1982): $Q = \sum_{i,j} d_{ij}\, p_i p_j$, where $d_{ij}$ is the phylogenetic distance between species i and j, and $p_i$ and $p_j$ are the relative abundances of species i and j. Q is the mean phylogenetic distance between any two randomly chosen individuals in a community.
Phylogenetic entropy (Allen et al. 2009): $H_P = -\sum_{i} L_i\, a_i \log a_i$, where $L_i$ is the length of branch i and $a_i$ is the abundance descending from branch i.
A parametric class based on Tsallis entropy (Pavoine et al. 2009): $I_0$ = Faith’s PD minus the tree height; $I_1$ = phylogenetic entropy $H_P$; $I_2$ = Rao’s Q.
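The two abundance-sensitive measures above can be sketched in a few lines of Python. The example tree, the abundances and the function names are all hypothetical. The sketch also assumes that $d_{ij}$ is taken as the divergence time of species i and j on an ultrametric tree; under that assumption Rao's Q can equivalently be written over branches as the sum of $L_i a_i (1 - a_i)$, which the code checks.

```python
import math

def rao_q(d, p):
    """Rao's quadratic entropy: sum over all pairs of d[i][j] * p[i] * p[j]."""
    n = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(n) for j in range(n))

def rao_q_branches(branches):
    """Branch form of Q for an ultrametric tree (assumes d_ij = divergence time)."""
    return sum(L * a * (1.0 - a) for L, a in branches)

def phylogenetic_entropy(branches):
    """H_P = -sum of L_i * a_i * log(a_i) over branches given as (L_i, a_i)."""
    return -sum(L * a * math.log(a) for L, a in branches if a > 0)

# Hypothetical ultrametric tree of height 2: species A and B diverged
# 1 time unit ago; their common ancestor diverged from C 2 units ago.
p = [0.5, 0.25, 0.25]                      # abundances of A, B, C
d = [[0, 1, 2],
     [1, 0, 2],
     [2, 2, 0]]                            # divergence times
branches = [(1, 0.5), (1, 0.25),           # tip branches of A and B
            (1, 0.75),                     # internal branch above A and B
            (2, 0.25)]                     # tip branch of C

print(rao_q(d, p))                         # 1.0
print(rao_q_branches(branches))            # 1.0
print(round(phylogenetic_entropy(branches), 3))
```

If $d_{ij}$ is instead defined as the tip-to-tip path length, the matrix form is larger by a factor of two on an ultrametric tree, so the convention should be fixed before comparing values.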

Phylogenetic diversity measures Except for Faith’s PD, all indices mentioned above do NOT satisfy the “replication principle”. (Need transformations!) Chao et al. (2010) were motivated to develop a unified class of phylogenetic diversity measures based on Hill numbers Satisfy “replication principle”

Pooled assemblage: Faith’s PD = 24. Phylogenetic entropy $H_P$? It becomes 6.24. Rao’s Q? It becomes 2.625.

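Phylogenetic Diversity Measures: two parameters. Order q of the Hill number. Time parameter T: consider the phylogenetic diversity from the present time (t = 0) back to T years ago. [Figure: a phylogenetic tree with tip abundances $p_1, p_2, p_3, p_4$ and internal lineage abundances $p_2 + p_3$ and $p_1 + p_2 + p_3$, cut into slices 1, 2 and 3, with branch lengths $L_1, \ldots, L_7$.]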

Basic approach based on Hill numbers for shared lineages. At any given moment t, slice the tree to find the lineages (the branches cut, treated as “species”) and their relative abundances (a measure of their importance in the present-day community). Obtain the Hill number ${}^qD(t)$ at moment t. Average over the period from the present time to T years ago. This average is called the “mean diversity of order q over T years”; it is in units of “lineages” (or “species”).

Conceptual framework for q = 0: connecting Faith’s PD to mean species richness. For a fixed T, the nodes divide the phylogenetic tree into Segments 1, 2 and 3 with durations (lengths) $T_1$, $T_2$ and $T_3$. At any moment in Segment 1 there are 4 lineages (i.e., 4 branches cut); in Segment 2 there are 3 lineages; in Segment 3 there are 2 lineages. The mean lineage (species) richness over the time interval [−T, 0] is $(T_1/T) \times 4 + (T_2/T) \times 3 + (T_3/T) \times 2$ = (total branch length in [−T, 0]) / T. This is the mean phylogenetic diversity of order 0 over T years. If T = height of the tree, then it equals Faith’s PD / T.
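The identity between the segment-weighted lineage count and total branch length divided by T can be checked with a few lines of arithmetic. The segment durations below are hypothetical, and the tree shape (species 2 and 3 join first, then species 1, with species 4 separate back to time T) is inferred from the lineage abundance vectors used for q > 0.

```python
# Hypothetical segment durations (any positive values would do).
T1, T2, T3 = 1.0, 2.0, 3.0
T = T1 + T2 + T3

# Segment-weighted mean number of lineages: 4, 3 and 2 lineages per segment.
segment_average = (T1 / T) * 4 + (T2 / T) * 3 + (T3 / T) * 2

# Branch lengths of the assumed four-species tree within [-T, 0].
branch_lengths = {
    "species 1": T1 + T2,
    "species 2": T1,
    "species 3": T1,
    "ancestor of 2 and 3": T2,
    "ancestor of 1, 2 and 3": T3,
    "species 4": T1 + T2 + T3,
}
branch_average = sum(branch_lengths.values()) / T

print(round(segment_average, 4), round(branch_average, 4))
```

Both expressions give 16/6 here, because each branch is counted once for every unit of time during which it is cut by a slice.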

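Conceptual framework for q > 0. To incorporate abundance, use lineage abundance: the sum of the relative abundances descended from the branch. There are $T_1$ assemblages with abundance vector $\{p_1, p_2, p_3, p_4\}$, $T_2$ assemblages with abundance vector $\{p_1, p_2+p_3, p_4\}$ and $T_3$ assemblages with abundance vector $\{p_1+p_2+p_3, p_4\}$. There are $T_1+T_2+T_3 = T$ assemblages in total, and each is given the same weight 1/T. The “mean diversity of order q over T years” is the following average:
$${}^q\bar{D}(T) = \left\{ \frac{T_1}{T}\left[p_1^q + p_2^q + p_3^q + p_4^q\right] + \frac{T_2}{T}\left[p_1^q + (p_2+p_3)^q + p_4^q\right] + \frac{T_3}{T}\left[(p_1+p_2+p_3)^q + p_4^q\right] \right\}^{1/(1-q)}$$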

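Mean Phylogenetic Diversity of order q over T years: general formula.
$${}^q\bar{D}(T) = \left\{ \sum_{i \in B_T} \frac{L_i}{T}\, a_i^q \right\}^{1/(1-q)}, \quad q \ne 1; \qquad {}^1\bar{D}(T) = \exp\left\{ -\sum_{i \in B_T} \frac{L_i}{T}\, a_i \log a_i \right\}$$
$B_T$: the set of all branches in the time interval [−T, 0]. $L_i$: the length (duration) of branch i in the set $B_T$. $a_i$: the total relative abundance descended from branch i.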
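A minimal sketch of the general formula follows, assuming the form in which each branch contributes (L_i / T) times a_i^q and the sum is raised to the power 1/(1-q), with q = 1 as the exponential limit. The tree, the segment durations and the abundances are hypothetical, and the function names are illustrative. The code compares the branch-based formula with the segment-by-segment average for the four-species tree.

```python
import math

def mean_phylo_diversity(branches, q, T):
    """Mean phylogenetic diversity of order q over T years.

    branches: list of (L_i, a_i) pairs for all branches within [-T, 0].
    """
    branches = [(L, a) for L, a in branches if a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum((L / T) * a * math.log(a) for L, a in branches))
    return sum((L / T) * a ** q for L, a in branches) ** (1.0 / (1.0 - q))

# Hypothetical durations and abundances for the four-species tree.
T1, T2, T3 = 1.0, 2.0, 3.0
T = T1 + T2 + T3
p1, p2, p3, p4 = 0.4, 0.3, 0.2, 0.1
branches = [
    (T1 + T2, p1),        # species 1
    (T1, p2),             # species 2
    (T1, p3),             # species 3
    (T2, p2 + p3),        # ancestor of species 2 and 3
    (T3, p1 + p2 + p3),   # ancestor of species 1, 2 and 3
    (T, p4),              # species 4
]

def segment_form(q):
    """Same quantity written as a weighted average over the three segments (q != 1)."""
    s1 = sum(a ** q for a in (p1, p2, p3, p4))
    s2 = sum(a ** q for a in (p1, p2 + p3, p4))
    s3 = sum(a ** q for a in (p1 + p2 + p3, p4))
    return ((T1 / T) * s1 + (T2 / T) * s2 + (T3 / T) * s3) ** (1.0 / (1.0 - q))

for q in (0, 1, 2):
    d = mean_phylo_diversity(branches, q, T)
    print(q, round(d, 3), "times T:", round(T * d, 3))
```

Multiplying the mean diversity by T gives a quantity in units of branch length; for q = 0 it equals the total branch length in [−T, 0]. For a tree in which every species has its own branch of length T, the function returns the ordinary Hill number.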

Interpretation of mean diversity: the mean effective number of completely distinct lineages (species) over T years. Link to traditional diversity: when all species are completely and equally distinct, with branch lengths T (including T = 0, i.e., ignoring phylogeny), the mean diversity reduces to the Hill number of order q.

“Effective number of lineages (species)”. Assemblage: S species with relative abundances $\{p_1, p_2, \ldots, p_S\}$ and mean diversity D for order q and time T. Equivalent assemblage: D lineages with equal relative abundances, completely distinct, all with branch length T.

Related Measure: Branch Diversity, ${}^qPD(T) = T \times {}^q\bar{D}(T)$. For q = 0, branch diversity reduces to Faith’s PD. Branch diversity measures the amount of evolutionary “work” done on the assemblage, or the effective number of lineage-years or lineage-length (or other units) contained in the tree in the time period [−T, 0].

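Generalize and unify existing measures:
Order q = 0: ${}^0\bar{D}(T)$ = (total branch length in [−T, 0]) / T = PD / T.
Order q = 1: ${}^1\bar{D}(T) = \exp(H_P / T)$.
Order q = 2: ${}^2\bar{D}(T) = 1 / (1 - Q/T)$.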

PD/T = 8; $\exp(H_P/T)$ = 8; $1/(1 - Q/T)$ = 8.
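These three values can be reproduced with a very simple tree. The sketch below assumes a pooled assemblage of 8 equally abundant species, each on its own branch of length T = 3 (a star tree). This tree is an assumption chosen because it reproduces PD = 24, $H_P$ of about 6.24 and Q = 2.625; the tree drawn on the original slide may differ. Rao's Q is computed in its branch form, which assumes $d_{ij}$ is the divergence time.

```python
import math

T = 3.0
# Assumed star tree: 8 completely distinct species, equal abundances.
branches = [(T, 1 / 8)] * 8   # (branch length L_i, abundance a_i)

PD = sum(L for L, _ in branches)
H_P = -sum(L * a * math.log(a) for L, a in branches)
Q = sum(L * a * (1 - a) for L, a in branches)

print(PD, round(H_P, 2), Q)            # raw measures
print(PD / T)                          # order 0
print(round(math.exp(H_P / T), 6))     # order 1
print(1 / (1 - Q / T))                 # order 2
```

After the transformations all three orders agree on 8 effective lineages, which is twice the value for a single four-species assemblage of the same kind.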

Taxonomic Diversity of Level = 3 Phylogenetic tree based on the classical Linnaean taxonomic categories

CT: thinned site (gray/blue). CU: un-thinned site (black/red). Data from Shimatani (2001). [Figures: four-level taxonomic tree; phylogenetic tree produced by PHYLOMATIC (Webb & Donoghue 2004).]

Traditional species diversity: Hill numbers for the two sites. [Figure: Hill-number profiles for the thinned site and the un-thinned site.]

[Table: diversity of order q = 0, 1, 2 for Site CT (thinned site) and Site CU (un-thinned site).] Shimatani (2001) concluded that the traditional diversity indices and the taxonomic diversity give different conclusions about the effect of thinning. Our results based on “Mean Phylogenetic Diversity” are consistent with those based on the traditional species diversity for q = 0, 1 and 2.

Diversity profile. Non-phylogenetic: use a profile of Hill numbers (as a function of order q) to quantify the diversity of a community. Phylogenetic: use three profiles (q = 0, 1, 2), each a function of time T, to quantify phylogenetic diversity. All these measures satisfy the “doubling property”.

Based on species richness (q = 0), the diversity of the thinned site dominates that of the un-thinned site for all values of T. But for the common species (q = 1) and very abundant species (q = 2), we have the reverse conclusion. [Figure: mean phylogenetic diversity as a function of T for the un-thinned and thinned sites, one panel per order q.]

Extensions:
The general case of non-ultrametric trees.
Partitioning phylogenetic Hill numbers: phylogenetic alpha, beta and gamma diversity measures and related similarity measures (Chiu, Jost & Chao 2013).
Extension to dendrogram-based functional diversity (Petchey and Gaston 2002).
Extension to distance-based functional diversity.

Statistical estimation for traditional diversity measures depends on the order q:
q = 0, “species richness estimation”: non-surprisingly non-trivial.
q = 1, “Shannon entropy estimation” and its exponential: surprisingly non-trivial.
q = 2, widely used in genetics (gene identity, or heterozygosity): non-surprisingly trivial; a nearly unbiased estimator exists.
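For q = 2, the easy case can be illustrated directly. Under multinomial sampling of n individuals, the statistic that sums n_i (n_i - 1) over species and divides by n (n - 1) is an unbiased estimator of the Simpson concentration; its reciprocal then serves as an estimate of the order-2 Hill number. Whether this is exactly the estimator the slide has in mind is an assumption, and the sample counts and function names below are hypothetical.

```python
def simpson_plugin(counts):
    """Plug-in (maximum likelihood) estimate of sum p_i^2."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def simpson_unbiased(counts):
    """Unbiased estimate of sum p_i^2 under multinomial sampling."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def hill2_estimate(counts):
    """Estimate of the Hill number of order 2 (inverse Simpson)."""
    return 1.0 / simpson_unbiased(counts)

# Hypothetical sample: individuals counted per species.
counts = [50, 30, 15, 4, 1]
print(round(simpson_plugin(counts), 4))
print(round(simpson_unbiased(counts), 4))
print(round(hill2_estimate(counts), 3))
```

The plug-in estimate is slightly larger than the unbiased one, so the plug-in inverse Simpson index tends to understate diversity in small samples.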

q = 0, “species richness estimation”, since Fisher, Corbet and Williams (1943):
Curve fitting (fitting a parametric curve to the species accumulation curve, SAC).
Parametric models for species abundances.
Non-parametric approach.
Rarefaction/extrapolation of the species accumulation curve (by estimating expected species richness for a finite sample size or sample completeness).
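As one concrete instance of the rarefaction idea, the sketch below uses the classical hypergeometric formula for individual-based rarefaction: the expected number of species in a random subsample of m individuals drawn without replacement from a reference sample of n individuals. It illustrates the general approach only and is not the Hill-number framework of the cited references; the counts and the function name are hypothetical.

```python
from math import comb

def rarefied_richness(counts, m):
    """Expected number of species in a subsample of m individuals.

    counts: observed individuals per species in the reference sample.
    Uses E[S_m] = sum over species of 1 - C(n - n_i, m) / C(n, m).
    """
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and the sample size n")
    total = comb(n, m)
    # comb(n - c, m) is 0 when m > n - c, i.e. the species cannot be missed.
    return sum(1 - comb(n - c, m) / total for c in counts if c > 0)

# Hypothetical reference sample of 100 individuals from 5 species.
counts = [50, 30, 15, 4, 1]
for m in (1, 10, 50, 100):
    print(m, round(rarefied_richness(counts, m), 3))
```

The curve starts at 1 species for a single individual and reaches the observed richness when m equals the full sample size; extrapolation beyond n requires an estimator of undetected species, which this sketch does not attempt.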

q = 1, “entropy estimation”, since Shannon (1948):
Traditional bias reduction.
Jackknife for bias reduction.
Bayesian approaches.
Coverage-adjusted estimator.
Estimation via Rényi’s entropies.
Polynomial representation.

Other Related Estimation Issues Hill numbers: Estimation of gamma, alpha and beta diversity and related similarity/differentiation measures Their phylogenetic generalization

Main References:
Chao, A., Chiu, C.-H. and Jost, L. (2010). Phylogenetic diversity measures based on Hill numbers. Philosophical Transactions of the Royal Society B, 365.
Chiu, C.-H., Jost, L. and Chao, A. (2013). Phylogenetic beta diversity, similarity, and differentiation measures based on Hill numbers. To appear in Ecological Monographs.
Chao, A., Gotelli, N. J., Hsieh, T. C., Sander, E. L., Ma, K. H., Colwell, R. K. and Ellison, A. M. (2013). Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species biodiversity studies. To appear in Ecological Monographs.

Nanney (2004) “We are all blind men (and women) trying to describe a monstrous elephant of ecological and evolutionary diversity...”

“Heaven is under our feet as well as over our heads.” Henry David Thoreau, Writer and Naturalist. THANK YOU VERY MUCH!!