“Evolutionary speculation constitutes a kind of metascience, which has the same intellectual fascination for some biologists that metaphysical speculation possessed for some mediaeval scholastics. It can be considered a relatively harmless habit, like eating peanuts, unless it assumes the form of an obsession; then it becomes a vice” (Stanier, 1970)
Linnaean classification
Two major characteristics
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Primates
Family: Hominidae
Genus: Homo
Species: sapiens
BIOL E-127 – 10/01/07
Tree of Life: primary divisions
Tree of Life: three “domains” Based on 16S rRNA (Woese, 1987):
Tree of Life: three “domains” Based on 16S rRNA (Pace, 1997):
Tree basics: meaning
Tree basics: rotation
Tree basics: shape
Tree basics: lengths, unrooted cladograms vs. phylograms
Tree basics: character change
Key phylogenetic terms
Phenetics vs. cladistics
Phenetics vs. cladistics: lysozyme amino acid changes in unrelated ruminants
Maximum Parsimony: the shortest tree (fewest homoplasies)
Microbial systematics
Formerly Pseudomonas (partial list): Ralstonia, Burkholderia, Hydrogenophaga, Sphingomonas, Methylobacterium, Cellvibrio, Xanthomonas, Acidovorax, Hydrogenophilus, Brevundimonas, Pandoraea
Molecular phylogenetics
Zuckerkandl & Pauling. Molecules as documents of evolutionary history. J Theor Biol. 8:
Neutral theory (Motoo Kimura, 1968)
16S rRNA as phylogenetic marker: why a good molecule?
Process to analyze sequence data
Ortholog vs. paralog?
1. Collect Sequence Data
Ortholog vs. paralog?
Figure: genes A and B in species 1 to 4 (A1, B1, A2, B2, A3, B3, A4, B4)
Good dataset: [A1, A2, A3, A4]
Bad dataset: [A1, B2, A3, A4]
2. Sequence Alignment
taxa1 CGGATAAAC
taxa2 CGGATAGAC
taxa3 CGCTGATAAAC
taxa4 CGGATAC
Alignment
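A minimal sketch of what alignment does, for two sequences only. The slide does not name an algorithm. This example assumes Needleman-Wunsch global alignment with a hypothetical scoring scheme (match +1, mismatch -1, gap -1). Real multiple alignments of all four taxa would use a dedicated program.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score).

    The scoring values are illustrative assumptions, not from the slide.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Two of the sequences shown on the slide (taxa1 and taxa4).
aligned_1, aligned_4, total = needleman_wunsch("CGGATAAAC", "CGGATAC")
```

After alignment both strings have the same length, so each column can be treated as one character for the tree-building steps that follow.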
3. Choose Models
Figure labels: Ancestral Sequences; Observed Sequences; ?; Model
Choose “model”
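As an illustration of what a model does, the sketch below uses the Jukes-Cantor (JC69) correction. JC69 is an assumption here; the slide does not name a specific model. It converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3), which accounts for multiple changes at the same site.

```python
import math


def p_distance(s1, s2):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(s1, s2) if x != y) / len(s1)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical pair of aligned sequences differing at 1 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
d = jc69_distance(p)
```

The corrected distance is always at least as large as the observed one, and the gap widens as sequences diverge.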
4. Choose Methods
Choose methods: distance-based
Example: Neighbor Joining (NJ)
Taxa       Characters
Species A  ATGGCTATTCTTATAGTACG
Species B  ATCGCTAGTCTTATATTACA
Species C  TTCACTAGACCTGTGGTCCA
Species D  TTGACCAGACCTGTGGTCCG
Species E  TTGACCAGTTCTCTAGTTCG
Figure: pairwise distance matrix with rows and columns Species A to Species E
M(AB) = d(AB) - [r(A) + r(B)]/(N - 2)
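A minimal sketch of the first Neighbor Joining step, using the five sequences from the slide. It assumes d is the raw count of differing sites (the slide does not say which distance is used) and r(X) is the sum of distances from X to all other taxa. It only picks the first pair to join and does not build the full tree.

```python
from itertools import combinations

# Sequences copied from the slide.
seqs = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def count_differences(s1, s2):
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(s1, s2) if x != y)


def distance_matrix(seqs):
    """Symmetric dict of pairwise distances keyed by (name, name)."""
    d = {}
    for a, b in combinations(sorted(seqs), 2):
        d[a, b] = d[b, a] = count_differences(seqs[a], seqs[b])
    return d


def nj_matrix(seqs):
    """M(ij) = d(ij) - [r(i) + r(j)] / (N - 2) for every pair i < j."""
    names = sorted(seqs)
    n = len(names)
    d = distance_matrix(seqs)
    r = {a: sum(d[a, b] for b in names if b != a) for a in names}
    return {(a, b): d[a, b] - (r[a] + r[b]) / (n - 2)
            for a, b in combinations(names, 2)}


m = nj_matrix(seqs)
# NJ joins the pair with the smallest M value first.
first_pair = min(m, key=m.get)
```

With these raw counts, Species A and Species B have the smallest M value and are joined first. A full NJ run would replace the pair with a new node, recompute distances, and repeat.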
4. Choose Methods
Discrete character methods
Maximum Parsimony (MP): Model: evolution goes through the smallest number of changes
Maximum Likelihood (ML): L(data | model)
Bayesian Inference: Markov chain Monte Carlo (MCMC) method for sampling from the posterior probability distribution
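To make the parsimony criterion concrete, here is a sketch that counts the minimum number of changes a given tree requires. It assumes Fitch's algorithm on a rooted binary tree written as nested tuples; the taxon names and sequences are hypothetical, not from the slides.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree is a taxon name (leaf) or a pair (left_subtree, right_subtree).
    states maps each taxon name to its observed state at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_changes + right_changes
    # No shared state: one change must occur on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum changes over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical four-taxon alignment and the three possible groupings.
alignment = {"t1": "AAG", "t2": "AAC", "t3": "CCG", "t4": "CCC"}
scores = {
    "((t1,t2),(t3,t4))": parsimony_score((("t1", "t2"), ("t3", "t4")), alignment),
    "((t1,t3),(t2,t4))": parsimony_score((("t1", "t3"), ("t2", "t4")), alignment),
    "((t1,t4),(t2,t3))": parsimony_score((("t1", "t4"), ("t2", "t3")), alignment),
}
best = min(scores, key=scores.get)
```

Maximum parsimony prefers the grouping with the lowest score, here the one that pairs t1 with t2 and t3 with t4.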
5. Assess Reliability
I. Bootstrap: re-sampling with replacement to produce pseudo-datasets (random weighting)
II. Jackknife: random deletion of a sub-dataset
III. Permutation test: randomize the dataset to build a null likelihood distribution
Dataset:
taxa1 CGATCGTTA
taxa2 CAATGATAG
taxa3 CGCTGATAA
taxa4 CGCTGATCG
Figure: resampled pseudo-datasets (Dataset1, Dataset2, …)
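A minimal sketch of how bootstrap and jackknife pseudo-datasets can be generated from the alignment on the slide. It assumes resampling is done over alignment columns (sites), and the 50% retention used for the jackknife is a hypothetical choice. Tree building on each pseudo-dataset is left out.

```python
import random

# Alignment copied from the slide.
alignment = {
    "taxa1": "CGATCGTTA",
    "taxa2": "CAATGATAG",
    "taxa3": "CGCTGATAA",
    "taxa4": "CGCTGATCG",
}


def take_columns(alignment, columns):
    """Build a new alignment from the given column indices."""
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in alignment.items()}


def bootstrap_columns(alignment, rng):
    """Sample columns with replacement; the result has the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return take_columns(alignment, columns)


def jackknife_columns(alignment, rng, keep_fraction=0.5):
    """Delete a random subset of columns (sampling without replacement)."""
    n_sites = len(next(iter(alignment.values())))
    n_keep = max(1, int(n_sites * keep_fraction))
    columns = sorted(rng.sample(range(n_sites), n_keep))
    return take_columns(alignment, columns)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
bootstrap_sets = [bootstrap_columns(alignment, rng) for _ in range(100)]
jackknife_sets = [jackknife_columns(alignment, rng) for _ in range(100)]
```

Support for a clade is then usually reported as the fraction of pseudo-datasets whose tree contains that clade.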
5. Assess Reliability Example analysis: ancestry of HIV-1 (Gao et al., 1999)
5. Assess Reliability Further analysis: timing of HIV-1 (Korber et al., Science 288:)