Summary on similarity search, or: Why do we care about far homologies?

Presentation transcript:

1 Summary on similarity search, or: why do we care about far homologies? A protein from a new pathogenic bacterium: we have no idea what it does. A protein from a model organism: we know what it does, but we do not know which protein does the same in humans. A protein related to a disease: we have no idea what it does in relation to the disease.

Retinol-binding protein, odorant-binding protein, apolipoprotein D

RBP4 and obesity: retinol-binding protein, odorant-binding protein, apolipoprotein D

Scoring matrices let you focus on the big (or small) picture. Figure labels: retinol-binding protein; PAM250, PAM30, BLOSUM45, BLOSUM80.

PSI-BLAST generates scoring matrices more powerful than PAM or BLOSUM. Figure labels: retinol-binding protein.

Phylogenetic trees

7 Phylogeny is the inference of evolutionary relationships. "Phylogeny" comes from the Greek for "the origin of the tribe". Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are mainly used for phylogenetic analyses. One tree of life: a sketch Darwin made soon after returning from his voyage on HMS Beagle (1831–36) showed his thinking about the diversification of species from a single stock (see Figure, overleaf). This branching, extended by the concept of common descent, ...

8 Haeckel (1879), Pace (2001)

9 Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA and protein sequence data. Pre-molecular analysis: the great apes (chimpanzee, gorilla and orangutan) are separate from the human. Molecular analysis: the chimpanzee is more closely related to the human than the gorilla is. Tree leaf labels: Human, Chimpanzee, Gorilla, Orangutan; Gorilla, Chimpanzee, Orangutan, Human.

10 What can we learn from a phylogenetic tree?

Determine the closest relatives of an organism we are interested in. Example: was the extinct quagga more like a zebra or a horse?

12 Which species is closest to the human? Tree leaf labels: Human, Chimpanzee, Gorilla, Orangutan; Gorilla, Chimpanzee, Orangutan, Human.

13 Human Evolution Modern Man Neanderthals

14 Example: metagenomics. A new field in genomics that aims to study the genomes recovered from environmental samples. It is a powerful tool for accessing the rich biodiversity of native environmental samples. It helps to find the relationships between species and to identify new species.

10^6 cells/ml seawater; 10^7 virus particles/ml seawater; >99% uncultivated microbes. How can we discover new species in the ocean?

16 Relationships can be represented by a phylogenetic tree or dendrogram. Leaf labels: A, B, C, D, E, F.

17 Phylogenetic tree terminology: a tree is a graph composed of nodes and branches. Each branch connects two adjacent nodes. Figure labels: A, B, C, D, E, F, R.

18 Phylogenetic tree terminology: rooted tree and unrooted tree. Leaf labels in both figures: Human, Chimp, Chicken, Gorilla.

19 Rooted vs. unrooted trees

20 How can we build a tree with molecular data? Trees based on DNA sequences (rRNA). Trees based on protein sequences.

Basic algorithm for constructing a rooted tree: Unweighted Pair Group Method using Arithmetic Averages (UPGMA). Assumption: divergence of sequences occurs at a constant rate, so the distance from the root to every leaf is equal.
Sequence a ACGCGTTGGGCGATGGCAAC
Sequence b ACGCGTTGGGCGACGGTAAT
Sequence c ACGCATTGAATGATGATAAT
Sequence d ACACATTGAGTGTGATAATA
Leaf labels: a, b, c, d.

22 Moving from similarity to distance
Sequences:
Sequence a ACGCGTTGGGCGATGGCAAC
Sequence b ACACATTGAGTGTGATCAAC
Sequence c ACACATTGAGTGAGGACAAC
Sequence d ACGCGTTGGGCGACGGTAAT
Distance table:
     a   b   c   d
a    0   8   7   5
b    8   0   3   9
c    7   3   0   8
d    5   9   8   0
Distances*: Dab = 8, Dac = 7, Dad = 5, Dbc = 3, Dbd = 9, Dcd = 8
* Can be calculated using different distance metrics
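The slide notes that distances can be calculated with different metrics but does not say which one produced the table. As a minimal sketch, assuming the simplest option (counting mismatched positions between two aligned sequences, often called the Hamming distance), a pairwise distance could be computed as follows. The function name `hamming_distance` is illustrative, and the counts need not reproduce the slide's table for every pair.

```python
def hamming_distance(seq1: str, seq2: str) -> int:
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq1, seq2))


# Sequences as listed on this slide.
sequences = {
    "a": "ACGCGTTGGGCGATGGCAAC",
    "b": "ACACATTGAGTGTGATCAAC",
    "c": "ACACATTGAGTGAGGACAAC",
    "d": "ACGCGTTGGGCGACGGTAAT",
}

# One entry per unordered pair, e.g. ("a", "b").
names = sorted(sequences)
distances = {
    (x, y): hamming_distance(sequences[x], sequences[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}

for pair, d in distances.items():
    print(pair, d)
```

Other metrics (for example, distances corrected for multiple substitutions at the same site) would give different numbers for the same sequences.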

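23 Constructing a tree starting from a star model
Step 1: Choose the nodes with the shortest distance and fuse them. Here the shortest distance is Dbc = 3, so c and b are fused.
     a   b   c   d
a    0   8   7   5
b    8   0   3   9
c    7   3   0   8
d    5   9   8   0
Figure: star with leaves a, d, c, b.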

24 Step 2: Recalculate the distances from the remaining sequences (a and d) to the new node (e), and remove the fused nodes from the table.
D(ea) = (D(ac) + D(ab) - D(cb))/2 = (7 + 8 - 3)/2 = 6
D(ed) = (D(dc) + D(db) - D(cb))/2 = (8 + 9 - 3)/2 = 7
Original table:
     a   b   c   d
a    0   8   7   5
b    8   0   3   9
c    7   3   0   8
d    5   9   8   0
Reduced table:
     a   d   e
a    0   5   6
d    5   0   7
e    6   7   0
Figure: leaves a and d, and the fused node e (c,b).
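Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration that follows the update formula exactly as written on the slide; textbook UPGMA would instead take the arithmetic average of the two distances, (D(ac) + D(ab))/2. The `frozenset` keys are simply one convenient way to store a symmetric table and are not part of the original material.

```python
# Symmetric distance table from the slide, stored once per unordered pair.
D = {
    frozenset("ab"): 8, frozenset("ac"): 7, frozenset("ad"): 5,
    frozenset("bc"): 3, frozenset("bd"): 9, frozenset("cd"): 8,
}

# Step 1: the closest pair is fused.
closest = min(D, key=D.get)            # {'b', 'c'} with distance 3
left, right = sorted(closest)

# Step 2: keep the distances among the remaining nodes ...
remaining = sorted({node for pair in D for node in pair} - closest)
reduced = {
    frozenset((x, y)): D[frozenset((x, y))]
    for i, x in enumerate(remaining)
    for y in remaining[i + 1:]
}

# ... and add the distance from each remaining node to the new node "e",
# using the slide's rule D(ex) = (D(xc) + D(xb) - D(cb)) / 2.
for x in remaining:
    reduced[frozenset((x, "e"))] = (
        D[frozenset((x, left))] + D[frozenset((x, right))] - D[closest]
    ) / 2

for pair, value in reduced.items():
    print(sorted(pair), value)
```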

25 Step 3: In order to get a tree, un-fuse c and b by calculating their distances to the new node (e).
Note: the distances Dce and Dbe are calculated assuming constant-rate evolution.
     a   d   e
a    0   5   6
d    5   0   7
e    6   7   0
Figure: leaves a and d, and node e with branches to c and b.

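26 Next: we want to fuse the next closest nodes. In the reduced table the shortest distance is Dad = 5, so a and d are fused into a new node (f).
     a   d   e
a    0   5   6
d    5   0   7
e    6   7   0
Figure: node f (a,d), and node e with branches to c and b.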

27 Finally, we need to calculate the distance between e and f.
D(ef) = (D(ea) + D(ed) - D(ad))/2 = (6 + 7 - 5)/2 = 4
     f   e
f    0   4
e    4   0
Figure: tree with leaves a, d, c, b and internal nodes e and f, with labelled branch lengths.

28 From a star to a tree. Figure: the star model (leaves a, d, c, b) resolved into a tree (leaves a, c, b, d; internal nodes f and e).
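The whole walk from star to tree can be sketched as a loop that keeps fusing the closest pair until only two nodes remain. The sketch below assumes the fusion rule used in the preceding slides and records only the branching order (as nested tuples), not branch lengths. The name `fuse_until_tree` is hypothetical.

```python
from itertools import combinations


def fuse_until_tree(table):
    """Repeatedly fuse the closest pair of nodes.

    `table` maps frozenset({x, y}) to a distance. Returns the topology as
    nested tuples together with the distance between the last two nodes.
    """
    table = dict(table)
    while len(table) > 1:
        pair = min(table, key=table.get)
        x, y = sorted(pair, key=str)
        merged = (x, y)                       # the new internal node
        others = {node for p in table for node in p} - pair
        new_table = {
            frozenset((u, v)): table[frozenset((u, v))]
            for u, v in combinations(others, 2)
        }
        for node in others:
            # Rule from the slides: D(new, n) = (D(n,x) + D(n,y) - D(x,y)) / 2
            new_table[frozenset((node, merged))] = (
                table[frozenset((node, x))]
                + table[frozenset((node, y))]
                - table[pair]
            ) / 2
        table = new_table
    (last_pair,) = table
    return tuple(sorted(last_pair, key=str)), table[last_pair]


D = {
    frozenset("ab"): 8, frozenset("ac"): 7, frozenset("ad"): 5,
    frozenset("bc"): 3, frozenset("bd"): 9, frozenset("cd"): 8,
}
topology, final_distance = fuse_until_tree(D)
print(topology, final_distance)
```

For the slide's table this groups b with c (node e) and a with d (node f), with a final distance of 4 between the two internal nodes, matching the hand calculation.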

29 IMPORTANT! Usually we do not assume a constant mutation rate. In that case, in order to choose the nodes to fuse, we have to calculate the relative distance of each node to all other nodes. Neighbor Joining (NJ) is an algorithm suited to cases where the rate of evolution varies.
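The slide does not spell out how the relative distance is computed. As a hedged illustration, the sketch below assumes the standard Neighbor Joining selection criterion, Q(i,j) = (n - 2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), where n is the number of nodes currently in the table. The pair with the smallest Q is fused next. The distance table is the one used in the earlier slides.

```python
from itertools import combinations

# Distance table from the earlier slides.
D = {
    ("a", "b"): 8, ("a", "c"): 7, ("a", "d"): 5,
    ("b", "c"): 3, ("b", "d"): 9, ("c", "d"): 8,
}
taxa = ["a", "b", "c", "d"]


def d(x, y):
    """Symmetric lookup with zero distance on the diagonal."""
    return 0 if x == y else D[tuple(sorted((x, y)))]


n = len(taxa)
# Total distance from each node to all other nodes.
total = {x: sum(d(x, k) for k in taxa) for x in taxa}

# Q corrects each pairwise distance for how far both nodes are from the rest.
Q = {
    (x, y): (n - 2) * d(x, y) - total[x] - total[y]
    for x, y in combinations(taxa, 2)
}

best = min(Q.values())
candidates = [pair for pair, q in Q.items() if q == best]
print(Q)
print("pairs to fuse:", candidates)
```

With only four sequences the two complementary pairs always tie, so both (a, d) and (b, c) are reported here. With more sequences the criterion can pick a different pair than the raw shortest distance would.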

30 Human Evolution Tree Neighbor Joining UPGMA

The downside of phylogenetic trees: using different regions of the same alignment may produce different trees.

Problems with phylogenetic trees

Problems with phylogenetic trees. Figure labels: Bacillus, E. coli, Pseudomonas, Salmonella, Aeromonas, Lechevaliera, Burkholderias.

What to do?

35 Bootstrapping
A. We create new data sets by sampling N positions (alignment columns) with replacement.
B. We generate such pseudo-data sets.
C. For each such data set we reconstruct a tree, using the same method.
D. We note the agreement between the tree reconstructed from each pseudo-data set and the original tree.
Note: we do not change the number of sequences!
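The resampling in steps A to D can be sketched as follows. Several things here are assumptions made for illustration only: the alignment reuses the four sequences from the earlier distance slide, 100 replicates is an arbitrary number, and a full tree reconstruction is replaced by a simple stand-in (finding the closest pair by mismatch count) so that the example stays short.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; the sequences stay the same."""
    n_positions = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_positions) for _ in range(n_positions)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


def mismatches(s, t):
    return sum(x != y for x, y in zip(s, t))


def closest_pair(alignment):
    """Stand-in for tree building: the pair of sequences with fewest mismatches."""
    names = sorted(alignment)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    return min(pairs, key=lambda p: mismatches(alignment[p[0]], alignment[p[1]]))


alignment = {
    "a": "ACGCGTTGGGCGATGGCAAC",
    "b": "ACACATTGAGTGTGATCAAC",
    "c": "ACACATTGAGTGAGGACAAC",
    "d": "ACGCGTTGGGCGACGGTAAT",
}

rng = random.Random(0)  # fixed seed so that runs are repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]

# With four sequences, grouping a with d is the same split as grouping b with c.
split = {("a", "d"), ("b", "c")}
support = sum(closest_pair(r) in split for r in replicates) / len(replicates)
print(f"support for the split ad|bc: {support:.2f}")
```

In a real analysis step C would rebuild a full tree for every replicate with the same method used for the original data, and the support would be reported per branch.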

Bootstrapped tree. Figure labels: less reliable branch; highly reliable branch.

37 Open questions: Do DNA and protein sequences from the same gene produce different trees? Can different genes have different evolutionary histories?
