Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.

1 Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204

2 Phylogenetics Attempts to infer the evolutionary history of a group of organisms, or of nucleic acid or protein sequences. Phylogenetic methods can be used to study evolutionary relationships between species of organisms as well as between genes: they attempt to reconstruct evolutionary ancestors and to estimate the time of divergence from an ancestor

3 Phylogenetic Trees We can use phylogenetic trees to illustrate the evolutionary relationships among groups of species or genes. Leaf nodes of the tree are the species or genes we are comparing; interior nodes are inferred common ancestors

4 Phylogenetic Trees

5 History Taxonomists used anatomy and physiology to group and classify organisms, relying on morphological features like the presence of feathers or the number of legs. When protein sequencing, and later DNA sequencing, became common, amino acid and DNA sequences became the common way to construct trees

6 Phylogenetic Tree constructed from aa sequences of Cytochrome C protein

7 The Big Picture (1) Determine the species or genes to be studied; (2) acquire homologous sequence data; (3) use multiple sequence alignment software like ClustalW to align; (4) clean up data by hand; (5) use phylogenetic analysis software like PHYLIP based on techniques we will study; (6) verify experimentally

8 Phylogenetics Can be used to solve a number of interesting problems: forensics (HIV mutates rapidly); predicting the evolution of influenza viruses; predicting functions of uncharacterized genes (ortholog detection); drug discovery; vaccine development (target the inferred common ancestor)

9 Types of Data Two categories: numerical data and character data. Numerical data: distance between objects, e.g. evolutionary distance between two species; usually derived from sequence data. Character data: each character has a finite number of states, e.g. number of legs = 1, 2, 4; DNA = {A, C, T, G}

10 Phylogenetic Trees Trees are composed of nodes and branches  Terminal or leaf nodes correspond to a gene or organism for which data has been collected  Internal nodes usually represent an inferred common ancestor that gave rise to two independent lineages sometime in the past

11 Rooted and Unrooted Trees Some trees make an inference about a common ancestor and the direction of evolution and some don’t  First type is called a rooted tree and has a single node designated as root which is the common ancestor  Second type is called an unrooted tree  Specifies only relationship between nodes and says nothing about direction of evolution

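12 Rooted and Unrooted Trees [Figure: a rooted tree with root R, leaves A, B, C, D, E and a time axis, shown beside an unrooted tree of the same five species]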

13 Rooted and Unrooted Trees Roots can usually be assigned to unrooted trees using an outgroup: a species that unambiguously separated earliest from the others being studied, e.g. baboons in the case of humans and gorillas. For three species there are 3 possible rooted trees, but only one possible unrooted tree

14 Rooted and Unrooted Trees In fact, the numbers of rooted ($N_R$) and unrooted ($N_U$) trees for $n$ species are
$N_R = \frac{(2n - 3)!}{2^{n-2}\,(n - 2)!}$
$N_U = \frac{(2n - 5)!}{2^{n-3}\,(n - 3)!}$

Number of species   Rooted trees                         Unrooted trees
2                   1                                    1
3                   3                                    1
4                   15                                   3
5                   105                                  15
10                  34,459,425                           2,027,025
15                  213,458,046,676,875                  7,905,853,580,625
20                  8,200,794,532,637,891,559,375        221,643,095,476,699,771,875
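
As a quick check on the two formulas on this slide, the following Python sketch evaluates both counts. The function names are illustrative and not part of the lecture.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """N_R = (2n - 3)! / (2^(n-2) * (n - 2)!), defined for n >= 2."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """N_U = (2n - 5)! / (2^(n-3) * (n - 3)!), defined for n >= 3."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


# Print species count, rooted count, unrooted count.
for n in (3, 4, 5, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
```

Substituting n - 1 into the rooted formula gives the unrooted formula, so the unrooted count for n species equals the rooted count for n - 1 species.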

15 Rooting Trees Trees can be rooted by using the outgroup method previously mentioned, or by putting the root midway between the two most distant species as determined by branch length. Branch length measures the amount of difference that occurred along a branch. Midpoint rooting assumes the species are evolving in a clock-like manner

16 Rooting a Tree

17 More Tree Terminology The structure of a phylogenetic tree can be represented in Newick format using nested parentheses, e.g. (((B, C), (D, E)), A). If we lack data to tell in which order two or more independent lineages arose in the past, the tree may be multifurcating (an interior node has more than two descendants); otherwise, it is bifurcating (exactly two descendants per interior node)
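
To make the nested-parenthesis notation concrete, here is a small Python sketch that writes a tree held as nested tuples in that notation. The tuple representation and the name `to_newick` are assumptions made for illustration. The second tree is a hypothetical multifurcating version in which the branching order among B, C, D and E is unresolved.

```python
def to_newick(node) -> str:
    """Serialize a nested tuple of leaf names as a parenthesized string."""
    if isinstance(node, str):
        return node  # a leaf is written as its name
    # an interior node is its children, comma-separated, in parentheses
    return "(" + ",".join(to_newick(child) for child in node) + ")"


bifurcating = ((("B", "C"), ("D", "E")), "A")
multifurcating = (("B", "C", "D", "E"), "A")  # order among B..E unresolved

# Newick strings are terminated with a semicolon.
print(to_newick(bifurcating) + ";")
print(to_newick(multifurcating) + ";")
```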

18 Character and Distance Data Character-based methods use aligned DNA or protein sequences directly for tree inference
Species A  ATCGAATCGTTCCGGA
Species B  ATCCAATAGTTCCGGA
Species C  AACGAATCCTACCGGT
Species D  ATCGTTTCCAACCGCT
Species E  ATAGATTCGTTCGGGA

19 Character and Distance Data Distance-based methods must transform the sequence data into a pairwise distance matrix for use during tree inference
Species  A  B  C  D
B        2  -  -  -
C        4  5  -  -
D        7  9  5  -
E        3  5  7  8
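
One simple way to turn aligned sequences into pairwise distances is to count mismatched positions. The slide does not say which distance measure was used, so the sketch below is only an assumed choice. It uses the five sequences from the preceding slide, and the function names are illustrative.

```python
from itertools import combinations

# Aligned sequences for species A-E, as listed on the preceding slide.
sequences = {
    "A": "ATCGAATCGTTCCGGA",
    "B": "ATCCAATAGTTCCGGA",
    "C": "AACGAATCCTACCGGT",
    "D": "ATCGTTTCCAACCGCT",
    "E": "ATAGATTCGTTCGGGA",
}


def mismatch_count(s: str, t: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


def distance_matrix(seqs: dict) -> dict:
    """Map each unordered pair (x, y), x < y, to its mismatch count."""
    return {
        (x, y): mismatch_count(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


dist = distance_matrix(sequences)
for pair, value in dist.items():
    print(pair, value)
```

Raw mismatch counts ignore multiple substitutions at the same site, so corrected distances are often preferred for more divergent sequences.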

20 Distance-Based Methods Given such an input matrix, we want to find an edge-weighted tree whose leaves correspond to the species and in which the distance measured along the tree between two leaves matches the matrix value for that pair

21 UPGMA UPGMA (Unweighted Pair Group Method with Arithmetic mean) is the oldest distance matrix method. It uses a distance matrix representing a measure of genetic distance between the pairs of species being considered; clusters the two closest species; computes a new distance matrix in which the distance from the new cluster to each remaining species is the arithmetic mean of its members' distances; and repeats until all species are grouped
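
The UPGMA steps listed on this slide can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: it returns only the branching order as a nested-parenthesis string and omits branch lengths. The input is the distance matrix from slide 19. The `frozenset` keys and the rule of breaking ties by label order are assumptions.

```python
def upgma(dist: dict) -> str:
    """Cluster leaves by UPGMA.

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    Returns the resulting tree as a nested-parenthesis string.
    """
    d = dict(dist)
    size = {leaf: 1 for pair in dist for leaf in pair}  # leaves per cluster
    while len(size) > 1:
        # Cluster the two closest groups (ties broken by label order).
        a, b = min(
            (tuple(sorted(pair)) for pair in d),
            key=lambda p: (d[frozenset(p)], p),
        )
        merged = f"({a},{b})"
        # New distances: size-weighted mean of the two merged clusters,
        # i.e. the arithmetic mean over all leaf pairs.
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        # Drop every entry that still refers to the old clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Lower-triangular matrix from slide 19 (columns A, B, C, D).
rows = {"B": [2], "C": [4, 5], "D": [7, 9, 5], "E": [3, 5, 7, 8]}
dist = {
    frozenset((r, c)): v
    for r, values in rows.items()
    for c, v in zip("ABCD", values)
}
tree = upgma(dist)
print(tree + ";")
```

On this matrix A and B are joined first (distance 2), then E joins them (mean distance 4), then C and D are joined (distance 5).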


23 Estimation of Branch Length Scaled trees, in which the lengths of the branches correspond to the degree to which sequences have diverged, are called phylograms (a cladogram shows only the branching order). If rates of evolution are assumed to be constant in all lineages, then internal nodes are placed at equal distances from each of the species they give rise to on a bifurcating tree (as in the UPGMA example)

24 UPGMA So UPGMA is very simple and generates rooted trees, however… Its major weakness is that the algorithm assumes that rates of evolution are the same among different lineages. This does not fit existing biological data, so UPGMA is probably not a good choice for building phylogenetic trees

25 Transformed Distance Method Several distance matrix-based alternatives to UPGMA allow different rates of evolution within different lineages. The oldest and simplest is the transformed distance method, which takes advantage of an outgroup. The ingroup lineages evolve separately from each other only after they diverge from one another, and the outgroup diverged before any of them did, so each lineage's distance to the outgroup gives a common frame of reference for comparing how much the lineages have changed
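
The slide does not give the transformation itself. One commonly cited form is d'_ij = (d_ij - d_iO - d_jO)/2 + c, where O is the outgroup and c is the mean distance from the outgroup to the ingroup species. The Python sketch below assumes that form. The distance values and the outgroup label "O" are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations


def transformed_distances(dist: dict, outgroup: str) -> dict:
    """Apply d'_ij = (d_ij - d_iO - d_jO) / 2 + mean_O to ingroup pairs.

    dist maps frozenset({x, y}) to a distance. The formula is the assumed
    form described in the text, not one taken from the slide.
    """
    ingroup = sorted({x for pair in dist for x in pair} - {outgroup})
    to_out = {x: dist[frozenset((x, outgroup))] for x in ingroup}
    mean_out = sum(to_out.values()) / len(ingroup)
    return {
        frozenset((i, j)):
            (dist[frozenset((i, j))] - to_out[i] - to_out[j]) / 2 + mean_out
        for i, j in combinations(ingroup, 2)
    }


# Hypothetical distances among ingroup species A, B, C and outgroup O.
raw = {
    frozenset(("A", "B")): 6, frozenset(("A", "C")): 7,
    frozenset(("B", "C")): 9, frozenset(("A", "O")): 9,
    frozenset(("B", "O")): 11, frozenset(("C", "O")): 10,
}
transformed = transformed_distances(raw, "O")
for pair, value in transformed.items():
    print(sorted(pair), value)
```

The transformed matrix can then be clustered with a method such as UPGMA. The added constant only keeps the values positive and does not change which pair is closest.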

26 Neighbor’s Relation Method One variant of UPGMA tries to pair species in such a way as to minimize the sum of the branch lengths. On a rooted tree, pairs of species separated from each other by only one node are called neighbors. Important relationships hold between the neighbors of a phylogenetic tree with four species

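27 Neighbor’s Relation Method [Figure: a four-species tree in which leaves A and B attach to one interior node by branches a and b, leaves C and D attach to the other interior node by branches c and d, and the two interior nodes are joined by branch e]
The following hold for this tree:
$d_{AC} + d_{BD} = d_{AD} + d_{BC} = a + b + c + d + 2e = d_{AB} + d_{CD} + 2e$
$d_{AB} + d_{CD} < d_{AC} + d_{BD}$
$d_{AB} + d_{CD} < d_{AD} + d_{BC}$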

28 Neighbor’s Relation Method Consider all possible pairwise arrangements of four species, and determine which one satisfies the four-point condition (the set of 2 inequalities). This process can be iterated to generate a complete tree, but it is infeasible for large sets of species
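
For four species there are only three ways to split them into two pairs, so the four-point check is short to write down. The Python sketch below returns the pairing whose distance sum is strictly smaller than both alternatives. The function name is illustrative, and the branch lengths (a = 1, b = 2, c = 3, d = 4, e = 5) are hypothetical values for the four-species tree described on slide 27.

```python
def four_point_neighbors(d: dict, taxa):
    """Return the pairing ((p, q), (r, s)) with the strictly smallest
    d_pq + d_rs, or None if no single pairing is strictly smallest.

    d maps frozenset({x, y}) to the distance between x and y.
    """
    w, x, y, z = taxa
    pairings = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]

    def total(pairing):
        (p, q), (r, s) = pairing
        return d[frozenset((p, q))] + d[frozenset((r, s))]

    ranked = sorted(pairings, key=total)
    # The best sum must beat the runner-up, and hence both other sums.
    if total(ranked[0]) < total(ranked[1]):
        return ranked[0]
    return None


# Path lengths on the tree with a=1, b=2, c=3, d=4 and central branch e=5.
tree_dist = {
    frozenset(("A", "B")): 1 + 2,
    frozenset(("C", "D")): 3 + 4,
    frozenset(("A", "C")): 1 + 5 + 3,
    frozenset(("B", "D")): 2 + 5 + 4,
    frozenset(("A", "D")): 1 + 5 + 4,
    frozenset(("B", "C")): 2 + 5 + 3,
}
print(four_point_neighbors(tree_dist, "ABCD"))
```

Here d_AB + d_CD = 10, while the other two sums are both 20 = 10 + 2e, so A, B and C, D are reported as the neighbor pairs.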

29 Neighbor-Joining Methods Other neighborliness approaches are available as well. Neighbor-joining methods start with all species arranged in a star tree. [Figure: two tree diagrams with leaves a, b, c, d, e]

30 Neighbor-Joining Methods The pair of nodes pulled out (grouped) at each iteration is chosen so that the total length of the branches on the tree is minimized. After a pair of nodes is pulled out, it forms a cluster in the tree and is included in further rounds of iteration (and a new distance matrix is generated). For $N$ nodes, the pair (1, 2) is scored as $Q_{12} = (N - 2)\,d_{12} - \sum_{i} d_{1i} - \sum_{i} d_{2i}$; the pair with the smallest $Q$ gives the smallest total branch length and is joined next
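
As an illustration of the selection step only, the Python sketch below computes Q for every pair and picks the pair with the smallest value. It does not compute branch lengths or update the matrix after joining. The input is the distance matrix from slide 19, and the function name and `frozenset` keys are assumptions.

```python
from itertools import combinations


def q_values(names, d: dict) -> dict:
    """Q_ij = (N - 2) * d_ij - sum_k d_ik - sum_k d_jk for every pair.

    d maps frozenset({x, y}) to the distance between x and y.
    """
    n = len(names)
    row_sum = {
        i: sum(d[frozenset((i, k))] for k in names if k != i)
        for i in names
    }
    return {
        (i, j): (n - 2) * d[frozenset((i, j))] - row_sum[i] - row_sum[j]
        for i, j in combinations(names, 2)
    }


# Lower-triangular matrix from slide 19 (columns A, B, C, D).
rows = {"B": [2], "C": [4, 5], "D": [7, 9, 5], "E": [3, 5, 7, 8]}
dist = {
    frozenset((r, c)): v
    for r, values in rows.items()
    for c, v in zip("ABCD", values)
}
q = q_values("ABCDE", dist)
best_pair = min(q, key=q.get)
print(best_pair, q[best_pair])
```

On this matrix the smallest Q belongs to the pair C, D, even though A and B have the smallest raw distance. Neighbor joining therefore need not pick the same first pair as UPGMA.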
