The Tree of Life From Ernst Haeckel, 1891.

1 The Tree of Life From Ernst Haeckel, 1891

2 But, is there only one “Tree of Life?”
There are many theories of evolution.
Basic idea: speciation is caused by physical separation into groups in which different genetic variants become dominant.
Basic tenet: any two species share a common ancestor at some time in the distant past.

3 We are generally considering a “Gene Tree” as opposed to a “Species Tree.”
Divergence within a gene generally happens before splitting into species occurs. In order to get a picture of evolution involving species, there is a need to look at collections of genes as opposed to individual genes.

4 Classical phylogenetic analysis: morphological features
Examples: number of legs, lengths of legs, etc.
Modern biological methods allow for the use of molecular features: gene sequences and protein sequences.
Analysis is based on homologous sequences (e.g., globins) in different species.

5 Use of Molecular Data
Provides objective criteria for constructing phylogenetic trees.
Basic data includes gene sequences and protein sequences.
Analysis is based on homologous sequences in different species.

6 However, gene/protein sequences can be homologous for different reasons:
Orthologs: sequences diverged after a speciation event.
Paralogs: sequences diverged after a duplication event.
Xenologs: sequences diverged after a horizontal transfer (e.g., by virus).

7 Two main kinds of information are contained in trees:
The tree topology: the form or shape of the tree.
The tree metric: distance or branch length.

8 Graph with a cycle: NO! (Family trees can look like this.) Graph without a cycle: YES!
These two trees are topologically equivalent. The one on the right is more common in biology texts.

9 Trees can be unrooted or rooted

10 If there are n leaf nodes or taxa, how many different trees are possible?
$N_R$ = Number of Possible Rooted Trees:
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
$N_U$ = Number of Possible Unrooted Trees:
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
The numerator contains the computationally insidious factorial function. Well, so does the denominator, but it is for a much smaller number.
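The counts can be checked numerically. The following is a minimal Python sketch, assuming the standard closed forms $(2n-3)!/(2^{n-2}(n-2)!)$ for rooted and $(2n-5)!/(2^{n-3}(n-3)!)$ for unrooted binary trees on n labeled leaves; the function names are illustrative and not from the slides.

```python
from math import factorial


def num_rooted_trees(n: int) -> int:
    """Rooted binary trees on n labeled leaves: (2n-3)! / (2^(n-2) * (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted_trees(n: int) -> int:
    """Unrooted binary trees on n labeled leaves: (2n-5)! / (2^(n-3) * (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


# Print n, the unrooted count, and the rooted count for small n.
for n in range(3, 9):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Integer division is exact here because both formulas always evaluate to whole numbers, so no floating-point rounding is involved even for large n.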

11 Three Leaf Nodes and Four Leaf Nodes
Three leaf nodes (A, B, C): only one unrooted tree is possible.
Four leaf nodes (A, B, C, D): three different unrooted trees are possible.

12 A Table Showing the Growth of Unrooted and Rooted Trees
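Leaves (n)   Unrooted trees   Rooted trees
3            1                3
4            3                15
5            15               105
6            105              945
7            945              10395
8            10395            135135
9            135135           2027025
10           2027025          34459425
11           34459425         654729075
12           654729075        $1.375 \times 10^{10}$
What do you notice in this table? Why is this true?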

13 To Further Make the Point
If we are creating a tree with 15 different taxa, there are 213,458,046,676,875 possible rooted trees. Assuming a computer can create a tree in $10^{-9}$ seconds, it would take 2.47 days of computation time to create them all. With 20 species there are 8,200,794,532,637,891,559,375 possible rooted trees, and the same computer would take 259,867 years to generate this many trees!
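The time estimates can be reproduced with a few lines of arithmetic. This sketch assumes the rooted-tree count is the odd double factorial $(2n-3)!! = 1 \cdot 3 \cdot 5 \cdots (2n-3)$, the slide's rate of $10^{-9}$ seconds per tree, and a year of 365.25 days; the names are hypothetical.

```python
SECONDS_PER_TREE = 1e-9          # rate assumed on the slide
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: Julian year


def rooted_tree_count(n_taxa: int) -> int:
    """Product of the odd numbers 1, 3, ..., 2n-3 (the rooted-tree count)."""
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


def enumeration_seconds(n_taxa: int) -> float:
    """Seconds needed to generate every rooted tree at the assumed rate."""
    return rooted_tree_count(n_taxa) * SECONDS_PER_TREE


days_15 = enumeration_seconds(15) / SECONDS_PER_DAY
years_20 = enumeration_seconds(20) / SECONDS_PER_YEAR
print(f"15 taxa: {days_15:.2f} days")
print(f"20 taxa: {years_20:,.0f} years")
```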

14 If we assume the Molecular Clock is working, then the distance from the root to each leaf is the same. The above tree does not assume a Molecular Clock!

Lindell Bromham and David Penny

16 Distance data can be generated from character data:
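Jukes-Cantor: $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, where p = proportion of mismatched sites.
Kimura: $d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$, where P = proportion of transitions and Q = proportion of transversions.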
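As a minimal sketch, the two corrections can be written as small Python functions. It assumes the standard forms $d = -\frac{3}{4}\ln(1 - \frac{4}{3}p)$ for Jukes-Cantor and $d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$ for the Kimura two-parameter model, with p, P, and Q given as fractions of sites (0.1 rather than 10). The function names are illustrative.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the fraction p of mismatched sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P: float, Q: float) -> float:
    """Kimura two-parameter distance from the fractions of transitions (P)
    and transversions (Q)."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


print(jukes_cantor(0.1))      # slightly larger than the raw 0.1
print(kimura_2p(0.06, 0.04))  # same 10% mismatches, split by type
```

Both corrections return a value larger than the raw mismatch fraction, since they estimate substitutions hidden by repeated changes at the same site. When transitions make up one third of the mismatches, the Kimura value coincides with Jukes-Cantor.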

17 Next we create a matrix of these distances
species   A      B      C
B         dAB    -      -
C         dAC    dBC    -
D         dAD    dBD    dCD
For Example: species A B C D 9 - 8 11 12 15 10 E 18 13 5
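A distance matrix of this kind can be built from aligned sequences. The sketch below is illustrative only: the three short sequences are made up, the mismatch fraction is computed site by site, and the Jukes-Cantor correction is assumed to take its standard form. None of the names come from the slides.

```python
import math
from itertools import combinations


def p_distance(s: str, t: str) -> float:
    """Fraction of aligned sites at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def distance_matrix(seqs: dict, corrected: bool = True) -> dict:
    """Map each pair of species names to a distance.

    With corrected=True the raw mismatch fraction p is converted with the
    Jukes-Cantor formula -3/4 * ln(1 - 4p/3).
    """
    matrix = {}
    for (name1, s1), (name2, s2) in combinations(seqs.items(), 2):
        p = p_distance(s1, s2)
        matrix[(name1, name2)] = -0.75 * math.log(1 - 4 * p / 3) if corrected else p
    return matrix


# Hypothetical aligned sequences, for illustration only.
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "ACGAACGTTA"}
raw = distance_matrix(seqs, corrected=False)
jc = distance_matrix(seqs)
for pair in raw:
    print(pair, raw[pair], round(jc[pair], 4))
```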

18 Simple Distance-Based Method
Unweighted pair-group method with arithmetic mean (UPGMA).
Input: distance matrix between species.
Outline: cluster species together. Initially, clusters are singletons. At each iteration, combine the two “closest” clusters to get a new one.


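20 UPGMA Clustering
Let $C_i$ and $C_j$ be clusters, and define the distance between them to be the average distance between their members:
$$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$
When combining two clusters, $C_i$ and $C_j$, to form a new cluster $C_k$, then for any other cluster $C_l$:
$$d(C_k, C_l) = \frac{|C_i|\, d(C_i, C_l) + |C_j|\, d(C_j, C_l)}{|C_i| + |C_j|}$$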
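The clustering loop can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes the size-weighted update $d(C_k, C_l) = (|C_i|\,d(C_i, C_l) + |C_j|\,d(C_j, C_l)) / (|C_i| + |C_j|)$, places each new node at half the distance between the clusters it joins, and breaks ties by taking the first closest pair found. The five-species distances are the ones implied by the worked example on the following slides.

```python
from itertools import combinations


def upgma(names, dist):
    """Return the UPGMA merges as (left, right, height) tuples.

    names: list of leaf labels.
    dist:  dict mapping a pair (x, y) of leaf labels to their distance.
    """
    d = {frozenset(pair): float(v) for pair, v in dist.items()}
    size = {name: 1 for name in names}
    active = list(names)
    merges = []
    while len(active) > 1:
        # Pick the closest pair of active clusters.
        i, j = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((i, j))] / 2
        k = (i, j)  # the new cluster is represented as a nested tuple
        size[k] = size[i] + size[j]
        # Size-weighted average distance from the new cluster to the rest.
        for other in active:
            if other != i and other != j:
                d[frozenset((k, other))] = (
                    size[i] * d[frozenset((i, other))]
                    + size[j] * d[frozenset((j, other))]
                ) / size[k]
        active = [c for c in active if c != i and c != j] + [k]
        merges.append((i, j, height))
    return merges


example = {
    ("A", "B"): 4, ("A", "C"): 2, ("A", "D"): 4, ("A", "E"): 3,
    ("B", "C"): 4, ("B", "D"): 1, ("B", "E"): 4,
    ("C", "D"): 4, ("C", "E"): 3,
    ("D", "E"): 4,
}
for left, right, height in upgma(list("ABCDE"), example):
    print(left, right, height)
```

Recording the height of each merge, rather than individual branch lengths, is enough to draw the tree: every leaf below a node sits at the same distance from it.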

21 Begin with the following distance matrix
Species   B   C   D   E
A         4   2   4   3
B         -   4   1   4
C         -   -   4   3
D         -   -   -   4
Closest pair is {B, D}, so cluster them: C1 = {B, D}.
d(C1,A) = 1/2 (4 + 4) = 4
d(C1,C) = 1/2 (4 + 4) = 4
d(C1,E) = 1/2 (4 + 4) = 4
Tree at end of Stage 1: B and D are joined, each by a branch of length 0.5.

22 Create a new matrix that includes C1
Species   A   C   E
C1        4   4   4
A         -   2   3
C         -   -   3
Closest are A and C, so C2 = {A, C}.
d(C1, C2) = 1/2 (4 + 4) = 4
d(C2, E) = 1/2 (3 + 3) = 3
Tree at end of Stage 2: A and C are joined by branches of length 1; B and D are joined by branches of length 0.5.

23 Once again we revise the distance matrix:
Species   C2   E
C1        4    4
C2        -    3
We create group C3 = {C2, E} = {{A, C}, E}.
d(C1, C3) = 1/6 (d(B,A) + d(B,C) + d(B,E) + d(D,A) + d(D,C) + d(D,E)) = 1/6 (4 + 4 + 4 + 4 + 4 + 4) = 4
Completed tree: B and D join at height 0.5; A and C join at height 1; E joins {A, C} at height 1.5; the root joins {B, D} to {A, C, E} at height 2, half of d(C1, C3).
NOTE: This tree satisfies the Molecular Clock Assumption. This is a basic property of UPGMA-produced trees.

24 The Fitch-Margoliash(FM) Algorithm
A weaker requirement is additivity.
In a “real” tree, distances between species are the sum of distances between intermediate nodes.
[Figure: leaves i, j, and k joined at an internal node by branches labeled a, b, and c.]

25 Consequences of Additivity
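Suppose input distances are additive.
For any three leaves i, j, and k that meet at an internal node m:
d(i,j) = d(i,m) + d(j,m)
d(i,k) = d(i,m) + d(k,m)
d(j,k) = d(j,m) + d(k,m)
Thus
d(k,m) = 1/2 (d(i,k) + d(j,k) – d(i,j))
[Figure: leaves i, j, and k joined at the internal node m by branches labeled a, b, and c.]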

26 Applying this idea to three taxa, A, B, and C:
Tree: leaves A, B, and C joined at an internal node by branches of length x (to A), y (to B), and z (to C).
Using the fact that:
x + y = d(A,B)
x + z = d(A,C)
y + z = d(B,C)
and a little high school algebra, we have
x = 1/2 (d(A,B) + d(A,C) – d(B,C))
y = 1/2 (d(A,B) + d(B,C) – d(A,C))
z = 1/2 (d(A,C) + d(B,C) – d(A,B))
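The three formulas translate directly into code. A minimal sketch, with a hypothetical function name and made-up additive distances chosen so that the branch lengths come out as whole numbers:

```python
def three_point(d_ab: float, d_ac: float, d_bc: float):
    """Branch lengths (x, y, z) from the internal node to A, B, and C."""
    x = 0.5 * (d_ab + d_ac - d_bc)
    y = 0.5 * (d_ab + d_bc - d_ac)
    z = 0.5 * (d_ac + d_bc - d_ab)
    return x, y, z


# Illustrative input: branches of length 2, 3, and 4 give
# d(A,B) = 5, d(A,C) = 6, d(B,C) = 7.
x, y, z = three_point(5, 6, 7)
print(x, y, z)
```

Adding the returned values pairwise gives back the three input distances, which is a quick way to check the algebra.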

27 We will apply this criterion to the following data:
      B      C      D      E
A     .31    1.01   .75    1.03
B     -      1.00   .69    .90
C     -      -      .61    .42
D     -      -      -      .37
We note that A and B are the closest, but to group them without the assumption of equal distance from a common ancestor, we temporarily group C-D-E and use the three taxa case:
d(A,C-D-E) = 1/3 (1.01 + .75 + 1.03) = .93
d(B,C-D-E) = 1/3 (1.00 + .69 + .90) = .863
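The group distances used here are plain averages over all pairs of members. The sketch below assumes the slide's table is read with rows A to D and columns B to E, so that d(A,B) = .31, d(A,C) = 1.01, and so on; the helper names are hypothetical.

```python
from itertools import product

# Pairwise distances as read from the table (assumed layout: rows A-D, columns B-E).
D = {
    ("A", "B"): 0.31, ("A", "C"): 1.01, ("A", "D"): 0.75, ("A", "E"): 1.03,
    ("B", "C"): 1.00, ("B", "D"): 0.69, ("B", "E"): 0.90,
    ("C", "D"): 0.61, ("C", "E"): 0.42,
    ("D", "E"): 0.37,
}


def leaf_distance(x: str, y: str) -> float:
    """Look up a distance regardless of the order of the two names."""
    return D[(x, y)] if (x, y) in D else D[(y, x)]


def group_distance(g1, g2) -> float:
    """Average distance over every pair with one member from each group."""
    pairs = list(product(g1, g2))
    return sum(leaf_distance(x, y) for x, y in pairs) / len(pairs)


print(round(group_distance("A", "CDE"), 3))
print(round(group_distance("B", "CDE"), 3))
```

Groups are passed as strings of one-letter names, so `"CDE"` stands for the temporary group C-D-E.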

28 From the formulas two slides previous we have:
Branch to A: .1885
Branch to B: .1215
Branch to C-D-E: .7415
Recall, the joining of C-D-E was only temporary so that we could get accurate distances for joining A and B. Separating C, D, and E and combining A and B for the rest of the algorithm gives the table:
       C       D      E
A-B    1.005   .72    .965
C      -       .61    .42
D      -       -      .37

29 We now have the table:
       C       D      E
A-B    1.005   .72    .965
C      -       .61    .42
D      -       -      .37
The closest pair is D and E, so we combine everything else into a single group A-B-C:
d(D,A-B-C) = 1/3 (.75 + .69 + .61) = .683
d(E,A-B-C) = 1/3 (1.03 + .90 + .42) = .783
         D      E
A-B-C    .683   .783
D        -      .37

30 This yields an intermediate tree of:
Branch to D: .135
Branch to E: .235
Branch to A-B-C: .548
We keep the edges joining D and E while discarding the grouping A-B-C. We now have four edges of our tree and two groupings.
d(A-B,D-E) = 1/4 (.75 + 1.03 + .69 + .90) = .8425
d(A-B,C) = 1/2 (1.01 + 1.00) = 1.005
d(C,D-E) = 1/2 (.61 + .42) = .515

31 We can now produce the table:
       C       D-E
A-B    1.005   .8425
C      -       .515
Again applying the distance formulas we have the tree:
Branch to A-B: .66625
Branch to C: .33875
Branch to D-E: .17625

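32 All that remains is to compute a and b
Here a is the internal branch leading to the vertex shared by A and B, and b is the internal branch leading to the vertex shared by D and E. The leaf branches found so far are 0.1885 (A), 0.1215 (B), 0.135 (D), and 0.235 (E).
The average distance of A and B from their common vertex is .155. For D and E the average distance is .185.
So for the value of a we have .66625 – .155 = .51125, and for the value of b we have .17625 – .185 = –.00875.
The negative value for b is a cause for concern about the quality of our data. If we are confident of our data, then since –.00875 is close to 0, most researchers would assign a value of 0 to b.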
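The last step can be recomputed from the numbers on the preceding slides. This is a sketch under stated assumptions: the three-taxon formulas are applied to d(A-B,C) = 1.005, d(A-B,D-E) = .8425, and d(C,D-E) = .515, which gives .66625 for the branch toward A-B, and a negative internal branch is simply clamped to 0. Variable and function names are hypothetical.

```python
def three_point(d_xy: float, d_xz: float, d_yz: float):
    """Branch lengths from the internal node to x, y, and z."""
    return (
        0.5 * (d_xy + d_xz - d_yz),
        0.5 * (d_xy + d_yz - d_xz),
        0.5 * (d_xz + d_yz - d_xy),
    )


# x = group A-B, y = C, z = group D-E.
to_ab, to_c, to_de = three_point(1.005, 0.8425, 0.515)

# Average distance of each pair of leaves from its own shared vertex.
avg_ab = (0.1885 + 0.1215) / 2
avg_de = (0.135 + 0.235) / 2

a = to_ab - avg_ab      # internal branch toward the A/B vertex
b = to_de - avg_de      # internal branch toward the D/E vertex
b_used = max(b, 0.0)    # assumption: clamp a small negative length to 0

print(round(a, 5), round(b, 5), b_used)
```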

33 One concern is that we have produced an unrooted tree for our five species and have no real clue about where to place the root. Sometimes physical evidence can help with the placement of a root for the tree; however, many times such evidence does not exist. A common heuristic practice is to include an extra taxon that is more distantly related to those under consideration than they are to each other. Such a taxon is called an outgroup. The biological assumption is that this group must have split from the others before they split from one another.
