Phylogenetics - Distance-Based Methods, CIS 667, March 11, 2004


Phylogenetics
Attempts to infer the evolutionary history of a group of organisms or of nucleic acid or protein sequences
- Phylogenetic methods can be used to study evolutionary relationships between species of organisms as well as between genes
- Attempt to reconstruct evolutionary ancestors
- Estimate the time of divergence from an ancestor

Phylogenetic Trees
We can use phylogenetic trees to illustrate the evolutionary relationships among groups of species or genes
Leaf nodes of the tree are the species or genes we are comparing; interior nodes are inferred common ancestors

Phylogenetic Trees

History
Taxonomists used anatomy and physiology to group and classify organisms
- Morphological features such as the presence of feathers or the number of legs
When protein sequencing, and later DNA sequencing, became common, amino acid and DNA sequences became the usual way to construct trees

Phylogenetic Tree constructed from aa sequences of Cytochrome C protein

The Big Picture
Determine the species or genes to be studied
Acquire homologous sequence data
Use multiple sequence alignment software such as ClustalW to align the sequences
Clean up the data by hand
Use phylogenetic analysis software such as Phylip, based on the techniques we will study
Verify experimentally

Phylogenetics
Can be used to solve a number of interesting problems
- Forensics
  - HIV mutates rapidly
- Predicting the evolution of influenza viruses
- Predicting functions of uncharacterized genes (ortholog detection)
- Drug discovery
- Vaccine development
  - Target inferred common ancestor

Types of Data
Two categories
- Numerical data
  - Distance between objects
  - E.g. evolutionary distance between two species
  - Usually derived from sequence data
- Character data
  - Each character has a finite number of states
  - E.g. number of legs = 1, 2, 4
  - DNA = {A, C, T, G}

Phylogenetic Trees
Trees are composed of nodes and branches
- Terminal (leaf) nodes correspond to a gene or organism for which data have been collected
- Internal nodes usually represent an inferred common ancestor that gave rise to two independent lineages at some time in the past

Rooted and Unrooted Trees
Some trees make an inference about a common ancestor and the direction of evolution, and some do not
- The first type is called a rooted tree; it has a single node designated as the root, which is the common ancestor
- The second type is called an unrooted tree
  - It specifies only the relationships between nodes and says nothing about the direction of evolution

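Rooted and Unrooted Trees
[Figure: a rooted tree with root R and leaves A, B, C, D, E drawn against a time axis, next to an unrooted tree of the same five species]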

Rooted and Unrooted Trees
Roots can usually be assigned to unrooted trees using an outgroup
- An outgroup is a species that unambiguously diverged earliest from the others being studied
- E.g. baboons when studying humans and gorillas
For three species there are 3 possible rooted trees, but only one possible unrooted tree

Rooted and Unrooted Trees
In fact, the numbers of rooted trees ($N_R$) and unrooted trees ($N_U$) for $n$ species are
- $N_R = \dfrac{(2n - 3)!}{2^{n-2}\,(n - 2)!}$
- $N_U = \dfrac{(2n - 5)!}{2^{n-3}\,(n - 3)!}$

Data Sets   Rooted Trees                      Unrooted Trees
10          34,459,425                        2,027,025
15          213,458,046,676,875               7,905,853,580,625
20          8,200,794,532,637,891,559,375     221,643,095,476,699,771,875
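
The two counting formulas can be checked directly. The following is a minimal Python sketch; the function names `rooted_trees` and `unrooted_trees` are illustrative and not part of the slides.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted bifurcating trees for n species: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 species")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n species: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 species")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 10, 15, 20):
    print(n, rooted_trees(n), unrooted_trees(n))
```

For three species this gives 3 rooted trees and 1 unrooted tree. Note also that the number of unrooted trees for n species equals the number of rooted trees for n - 1 species.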

Rooting Trees
Trees can be rooted by using the outgroup method mentioned previously, or by placing the root midway between the two most distant species as determined by branch length
- Branch length measures the amount of difference that accumulated along a branch
- Rooting at the midpoint assumes the species are evolving in a clock-like manner

Rooting a Tree

More Tree Terminology
The structure of a phylogenetic tree can be represented in Newick format using nested parentheses
- (((B, C), (D, E)), A)
If we lack the data to tell in which order two or more independent lineages arose, the tree may be multifurcating (an interior node with more than two descendants); otherwise it is bifurcating (exactly two descendants per interior node)
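
To make the nested-parentheses notation concrete, here is a small Python sketch that reads a Newick string without branch lengths into nested tuples and tests whether the tree is bifurcating. The names `parse_newick` and `is_bifurcating` are hypothetical helpers written for this illustration; full Newick also allows branch lengths and internal labels, which this sketch does not handle.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples of leaf names."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        if start == pos:
            raise ValueError("expected a leaf name at position %d" % pos)
        return s[start:pos]

    tree = parse_node()
    if pos != len(s):
        raise ValueError("unexpected text after the tree")
    return tree


def is_bifurcating(tree):
    """True if every interior node has exactly two descendants."""
    if isinstance(tree, str):             # a leaf
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)


tree = parse_newick("(((B, C), (D, E)), A)")
print(tree, is_bifurcating(tree))
```

A string such as "(A, B, C)" parses to a single interior node with three descendants, which `is_bifurcating` reports as multifurcating.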

Character and Distance Data
Character-based methods use aligned DNA or protein sequences directly for tree inference

Species A   ATCGAATCGTTCCGGA
Species B   ATCCAATAGTTCCGGA
Species C   AACGAATCCTACCGGT
Species D   ATCGTTTCCAACCGCT
Species E   ATAGATTCGTTCGGGA

Character and Distance Data
Distance-based methods must first transform the sequence data into a pairwise distance matrix for use during tree inference

Species   A   B   C   D
B         2   -   -   -
C         4   5   -   -
D         7   9   5   -
E         3   5   7   8
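
One simple way to turn an alignment into such a matrix is to count the positions at which two sequences differ. The Python sketch below does this for the five aligned sequences from the character data slide; using a raw mismatch count is an assumption made for illustration, since the slides do not say how the matrix was obtained, and real analyses usually apply a correction for multiple substitutions.

```python
from itertools import combinations

# Aligned sequences for species A to E, as listed on the character data slide.
alignment = {
    "A": "ATCGAATCGTTCCGGA",
    "B": "ATCCAATAGTTCCGGA",
    "C": "AACGAATCCTACCGGT",
    "D": "ATCGTTTCCAACCGCT",
    "E": "ATAGATTCGTTCGGGA",
}


def mismatches(s, t):
    """Count the aligned positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


# Pairwise distance for every pair of species.
distances = {
    (x, y): mismatches(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}

for pair, value in distances.items():
    print(pair, value)
```

For these sequences the counts agree with the matrix on the slide for every pair except B and C, where the mismatch count is 6 rather than 5.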

Distance-Based Methods
Given such an input matrix, we want to find an edge-weighted tree whose leaves correspond to the species and in which the distance measured along the tree between any two leaves equals the matrix value for that pair

UPGMA
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is the oldest distance matrix method
- Uses a distance matrix representing a measure of genetic distance between each pair of species being considered
- Cluster the two closest species
- Compute a new distance matrix, using the arithmetic mean of the distances to the members of the new cluster
- Repeat until all species are grouped
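
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `upgma`, the nested-tuple tree representation, and the use of half the merge distance as the cluster height are choices made for this example, and the input is the five-species matrix from the earlier slide.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster leaves by UPGMA.

    labels: list of leaf names.
    dist:   dict mapping frozenset({x, y}) to the distance between leaves x and y.
    Returns (tree, heights): the tree as nested tuples, and a dict giving the
    height of every cluster (half the distance at which it was formed).
    """
    d = dict(dist)                        # working copy; grows as clusters form
    size = {name: 1 for name in labels}   # leaves contained in each active cluster
    heights = {name: 0.0 for name in labels}
    while len(size) > 1:
        # 1. Find the closest pair of active clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        heights[merged] = d[frozenset((a, b))] / 2
        # 2. Distance from the new cluster to each remaining cluster is the
        #    size-weighted mean, i.e. the arithmetic mean over all leaf pairs.
        for other in size:
            if other not in (a, b):
                d[frozenset((merged, other))] = (
                    size[a] * d[frozenset((a, other))]
                    + size[b] * d[frozenset((b, other))]
                ) / (size[a] + size[b])
        # 3. Replace the two clusters by the merged one.
        size[merged] = size.pop(a) + size.pop(b)
    (tree,) = size
    return tree, heights


raw = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 3,
    ("B", "C"): 5, ("B", "D"): 9, ("B", "E"): 5,
    ("C", "D"): 5, ("C", "E"): 7, ("D", "E"): 8,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

tree, heights = upgma(list("ABCDE"), dist)
print(tree)
print(heights[tree])
```

With this matrix, A and B are clustered first (distance 2), E joins them next (mean distance 4), then C and D are clustered (distance 5), and the two clusters are joined last. Ties between equally close pairs are broken by input order in this sketch.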

UPGMA
[Figure: UPGMA clustering of species A, B, C, E, D]

Estimation of Branch Length
Scaled trees, in which the lengths of the branches correspond to the degree to which sequences have diverged, are called phylograms (a cladogram, by contrast, shows only the branching order)
If rates of evolution are assumed to be constant in all lineages, then internal nodes are placed at equal distances from each of the species they give rise to on a bifurcating tree (as in the UPGMA example)

UPGMA
UPGMA is very simple and generates rooted trees, however...
Its major weakness is that the algorithm assumes that rates of evolution are the same among different lineages
This does not fit existing biological data, so UPGMA probably should not be used to build phylogenetic trees

Transformed Distance Method
Several distance matrix-based alternatives to UPGMA allow different rates of evolution within different lineages
- The oldest and simplest is the transformed distance method, which takes advantage of an outgroup
- The outgroup diverged before the other lineages split from one another, so all of them have had the same amount of time to diverge from it; differences in their distances to the outgroup therefore serve as a frame of reference for how much each lineage has evolved
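
The slides do not give the transformation itself. One formulation found in textbooks, shown here as an assumption rather than as the slide author's definition, is d'(i, j) = (d(i, j) - d(i, O) - d(j, O)) / 2 + mean distance to O, where O is the outgroup. The Python sketch below applies it to the earlier five-species matrix, treating species D as the outgroup purely for illustration; the function name `transformed_distances` is hypothetical.

```python
from itertools import combinations


def transformed_distances(dist, ingroup, outgroup):
    """Apply d'(i, j) = (d(i, j) - d(i, O) - d(j, O)) / 2 + mean_k d(k, O).

    dist:     dict mapping frozenset({x, y}) to a distance.
    ingroup:  list of species names other than the outgroup.
    outgroup: name of the outgroup species O (an assumption supplied by the caller).
    """
    to_out = {k: dist[frozenset((k, outgroup))] for k in ingroup}
    mean_out = sum(to_out.values()) / len(ingroup)
    return {
        (i, j): (dist[frozenset((i, j))] - to_out[i] - to_out[j]) / 2 + mean_out
        for i, j in combinations(ingroup, 2)
    }


raw = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 3,
    ("B", "C"): 5, ("B", "D"): 9, ("B", "E"): 5,
    ("C", "D"): 5, ("C", "E"): 7, ("D", "E"): 8,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

# Treating D as the outgroup is an illustrative choice, not a claim from the slides.
transformed = transformed_distances(dist, ["A", "B", "C", "E"], "D")
for pair, value in transformed.items():
    print(pair, value)
```

The transformed matrix would then be clustered in the same way as in UPGMA, with the outgroup attached last.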

Neighbor’s Relation Method
One variant of UPGMA tries to pair species in such a way as to minimize the sum of the branch lengths
- On a rooted tree, pairs of species separated from each other by only one node are called neighbors
- Important relationships hold between the neighbors on a phylogenetic tree with four species

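Neighbor’s Relation Method
[Figure: four-species tree in which A and B attach to one internal node by branches a and b, C and D attach to the other internal node by branches c and d, and the two internal nodes are joined by branch e]
The following hold for this tree:
$d_{AC} + d_{BD} = d_{AD} + d_{BC} = a + b + c + d + 2e = d_{AB} + d_{CD} + 2e$
$d_{AB} + d_{CD} < d_{AC} + d_{BD}$
$d_{AB} + d_{CD} < d_{AD} + d_{BC}$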

Neighbor’s Relation Method
Consider all possible pairwise arrangements of four species, and determine which arrangement satisfies the four-point condition (the set of 2 inequalities)
This process can be iterated to generate a complete tree, but it is infeasible for large sets of species
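
For a single set of four species, the check amounts to comparing the three ways of splitting them into two pairs and keeping the split with the smallest sum of within-pair distances. A minimal Python sketch, using the distances among A, B, C and D from the earlier matrix, is shown below; `neighbor_pairs` is a hypothetical helper name.

```python
def neighbor_pairs(dist, quartet):
    """Return the pairing of four species with the smallest sum of within-pair distances.

    dist:    dict mapping frozenset({x, y}) to a distance.
    quartet: sequence of four species names.
    """
    w, x, y, z = quartet

    def d(p, q):
        return dist[frozenset((p, q))]

    # The three possible ways to split four species into two pairs.
    pairings = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]
    return min(pairings, key=lambda pr: d(*pr[0]) + d(*pr[1]))


raw = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 7,
    ("B", "C"): 5, ("B", "D"): 9, ("C", "D"): 5,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

print(neighbor_pairs(dist, ["A", "B", "C", "D"]))
```

Here the three sums are 7 (AB with CD), 13 (AC with BD) and 12 (AD with BC), so A and B are neighbors, as are C and D. On measured data the two larger sums are rarely exactly equal, which is why the comparison uses the inequalities rather than the equality.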

Neighbor-Joining Methods
Other neighborliness approaches are available as well
Neighbor-joining methods start with all species arranged in a star tree
[Figure: two trees on species a, b, c, d, e, the first of them a star tree]

Neighbor-Joining Methods
The pair of nodes pulled out (grouped) at each iteration is chosen so that the total length of the branches on the tree is minimized
After a pair of nodes is pulled out, it forms a cluster in the tree and is included in further rounds of iteration (and a new distance matrix is generated)
The pair is selected by minimizing the following quantity, computed for each candidate pair (here nodes 1 and 2 among N nodes):
$Q_{12} = (N - 2)\,d_{12} - \sum_{i} d_{1i} - \sum_{i} d_{2i}$
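
The selection step can be sketched in Python by evaluating Q for every pair and taking the minimum. The example below covers only this first step, not the branch-length calculation or the matrix update, and uses the five-species matrix from the earlier slide; the function name `q_values` is illustrative.

```python
from itertools import combinations


def q_values(labels, dist):
    """Compute Q(x, y) = (N - 2) d(x, y) - sum_i d(x, i) - sum_i d(y, i) for all pairs."""
    n = len(labels)
    # Sum of distances from each node to all other nodes.
    total = {
        x: sum(dist[frozenset((x, y))] for y in labels if y != x)
        for x in labels
    }
    return {
        (x, y): (n - 2) * dist[frozenset((x, y))] - total[x] - total[y]
        for x, y in combinations(labels, 2)
    }


raw = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 3,
    ("B", "C"): 5, ("B", "D"): 9, ("B", "E"): 5,
    ("C", "D"): 5, ("C", "E"): 7, ("D", "E"): 8,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

q = q_values(list("ABCDE"), dist)
best = min(q, key=q.get)
print(best, q[best])
```

For this matrix the smallest value is Q = -35 for the pair C and D, so neighbor joining would group C with D first, whereas clustering by smallest raw distance starts with A and B (distance 2). The difference arises because Q also accounts for how far each node is from all the others.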