Homology analysis and molecular phylogeny
Alexis Dereeper, CIBA courses – Brasil 2011

Steps of a phylogenetic analysis
1- Data selection
2- Sequence alignment
3- Method selection:
● distance methods: choose an evolution model, then calculate the pairwise distances
● maximum parsimony
● probabilistic methods: maximum likelihood, Bayesian inference
4- Calculate or estimate the tree that best fits the data
5- Test the reliability of the obtained tree

Phylogeny.fr: “The Phylogeny.fr platform transparently chains programs to automatically perform phylogenetic analysis tasks”

Homology analysis: what is sequence homology?
Homology is not a quantitative concept (unlike similarity or identity, e.g. "28% identity"): genes are either homologous or not.
● Homologs: genes descended from a common ancestral gene
● Paralogs: homologs resulting from a duplication event
● Orthologs: homologs resulting from a speciation event
Homology and function: homology does not systematically imply the same function. The closest orthologs may share the same function, but more distant orthologs rarely have the same phenotypic role (although they may keep the same role in a specific metabolic pathway). Paralogs, on the other hand, tend to acquire different functions rapidly.

Homology analysis: how similar are homologous sequences?
Anywhere from 100% identity down to a few nucleotides or amino acids in common: there is no fixed rule or limit. Significance is estimated with the e-value, the number of alignments with at least the observed score that would be expected by chance in a search of that size. Rough guidelines:
● DNA: e-value < … and identity > 70%
● Protein: e-value < … and identity > 25%
Sequences without noticeable resemblance can still be homologous (similarity detected only at the level of the 3D structure). Conversely, a strong resemblance is generally interpreted as homology rather than as convergent evolution.

Homology analysis: how to detect homology?
By sequence comparison, i.e. sequence alignment:
1- Local alignment (e.g. BLAST): designed to find similar regions; typically used to align one query sequence against a database of sequences (Smith & Waterman algorithm, which BLAST approximates heuristically for speed)
2- Global alignment (e.g. ClustalW): designed to compare homologous sequences over their full length (Needleman & Wunsch algorithm)
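
To make the difference between the two modes concrete, here is a minimal Python sketch of the two dynamic-programming recurrences. It computes scores only (no traceback), and the scoring scheme (+1 match, -1 mismatch, -2 per gap position, linear gaps) is an illustrative assumption; real tools use substitution matrices such as BLOSUM and affine gap penalties.

```python
# Minimal global (Needleman-Wunsch) and local (Smith-Waterman) alignment scores.
# Scoring scheme is an illustrative assumption: +1 match, -1 mismatch, -2 per gap.
MATCH, MISMATCH, GAP = 1, -1, -2


def substitution(a: str, b: str) -> int:
    """Score for aligning residue a with residue b."""
    return MATCH if a == b else MISMATCH


def needleman_wunsch(x: str, y: str) -> int:
    """Optimal global alignment score: both sequences aligned end to end."""
    rows, cols = len(x) + 1, len(y) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * GAP  # leading gaps are penalised in global mode
    for j in range(1, cols):
        f[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            f[i][j] = max(
                f[i - 1][j - 1] + substitution(x[i - 1], y[j - 1]),
                f[i - 1][j] + GAP,
                f[i][j - 1] + GAP,
            )
    return f[-1][-1]


def smith_waterman(x: str, y: str) -> int:
    """Best local alignment score: the best-scoring pair of sub-regions."""
    rows, cols = len(x) + 1, len(y) + 1
    h = [[0] * cols for _ in range(rows)]  # first row/column stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,  # restart: a local alignment never carries a negative score
                h[i - 1][j - 1] + substitution(x[i - 1], y[j - 1]),
                h[i - 1][j] + GAP,
                h[i][j - 1] + GAP,
            )
            best = max(best, h[i][j])
    return best
```

For example, `needleman_wunsch("GATTACA", "GATTACA")` returns 7, and `smith_waterman("AAAGATTACAAAA", "TTTGATTACATTT")` also returns 7 because only the shared GATTACA region is scored, whereas the global score of the same pair is pulled down by the mismatching ends.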

Homology analysis: classical BLAST output
The different BLAST programs:
● BlastN (query: DNA / subject: DNA)
● BlastP (query: protein / subject: protein)
● BlastX (query: translated DNA / subject: protein)
● TBlastN (query: protein / subject: translated DNA)
● TBlastX (query: translated DNA / subject: translated DNA)
Each hit is reported with a score and an e-value; the e-value indicates how significant the score is (the number of hits with at least that score expected by chance).
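
The link between score and e-value can be sketched with the Karlin-Altschul statistics that BLAST relies on: a raw score S is normalised into a bit score S' = (λS - ln K) / ln 2, and the e-value is approximately E = m × n × 2^(-S'), where m is the query length and n the total database length. This simplified sketch ignores the effective-length (edge) corrections BLAST applies; λ and K depend on the scoring system and are simply treated as inputs here.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)
```

With a 100-residue query against a database of 10^6 residues, a bit score of 30 gives E ≈ 0.093. Each extra bit halves E, which is why modest score differences change the e-value by orders of magnitude.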

Blast Explorer
Enables an assisted selection of homologous sequences using various criteria. Post-processing of BLAST results:
● guide tree (similarity tree), with possible selection of branches and leaves
● score / e-value distribution
● taxonomic tree of the hits
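
Blast Explorer performs this selection interactively. As a rough scripted equivalent, the sketch below reads BLAST+ tabular output (`-outfmt 6`, assuming its default twelve columns) and keeps the hits that pass e-value, identity and alignment-length cut-offs. The helper names and threshold values are hypothetical and purely illustrative; they are not Blast Explorer's own criteria.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Yield Hit objects from -outfmt 6 lines, skipping blank and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        yield Hit(fields["qseqid"], fields["sseqid"], float(fields["pident"]),
                  int(fields["length"]), float(fields["evalue"]),
                  float(fields["bitscore"]))


def select_homologs(hits, max_evalue=1e-5, min_pident=25.0, min_length=50):
    """Keep hits passing all thresholds (hypothetical, illustrative defaults)."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.pident >= min_pident
            and h.length >= min_length]
```

Typical use: `select_homologs(parse_tabular(open("hits.tsv")))`, where `hits.tsv` is a hypothetical file produced with `-outfmt 6`.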

Homology analysis: ortholog detection
BBMH method (Best Blast Mutual Hits), also called RBH (Reciprocal Best Hits): the proteome of species 1 is compared with the proteome of species 2, and two genes are considered orthologs when each one is the other's best BLAST hit.
Ortholog databases:
● Inparanoid (eukaryotes)
● HomoloGene (eukaryotes)
● OrthoMCL DB
● COG (Clusters of Orthologous Groups of proteins) (prokaryotes and eukaryotes)
● GreenPhyl (plants)
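
The reciprocal best hit criterion itself fits in a few lines. This sketch assumes the two BLAST runs (species 1 against species 2, and the reverse) have already been reduced to hypothetical `(query, subject, bitscore)` tuples. Ties are resolved by keeping the first best hit seen, and the sketch does not attempt the in-paralog clustering that tools such as Inparanoid add on top of reciprocal best hits.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples from one BLAST run.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

In a case where gene a2's best hit is b1 but b1's best hit is a1, only the pair (a1, b1) is reported and a2 gets no RBH partner.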

Phylogenetic analysis, step 1: multiple alignment (global alignment)
Alignment software: ClustalW, MUSCLE, T-Coffee, 3D-Coffee (optimizes the alignment using 3D structures), MAFFT
Alignment formats: FASTA, Clustal, PHYLIP, NEXUS
Alignment visualization/editing software: SeaView, Jalview, BioEdit
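
Converting between these alignment formats is a frequent chore. A minimal sketch, assuming the sequences are already aligned: it reads FASTA and writes sequential PHYLIP in the strict layout, where names occupy the first 10 characters. Truncation can make two long names identical, so pipelines usually rename sequences first; libraries such as Biopython's `Bio.AlignIO` handle these formats more completely.

```python
def read_fasta(text):
    """Parse FASTA text into an insertion-ordered dict {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the identifier
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_phylip(alignment):
    """Write an aligned {name: sequence} dict in strict sequential PHYLIP."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned (unequal lengths)")
    ncols = lengths.pop()
    lines = [f"{len(alignment)} {ncols}"]
    for name, seq in alignment.items():
        lines.append(f"{name[:10]:<10}{seq}")  # name padded/truncated to 10 chars
    return "\n".join(lines) + "\n"
```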

Phylogenetic analysis, step 2: alignment cleaning
Divergent regions with a low phylogenetic signal (not very informative) are removed. These regions may not be homologous, or may be saturated by substitutions (e.g. synonymous sites in coding regions). The cleaned alignment is more suitable for phylogenetic analysis.
Alignment curation software: Gblocks
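
The idea of discarding unreliable columns can be illustrated with a deliberately simplified filter. This is not the Gblocks algorithm, which also takes into account flanking positions, runs of non-conserved positions and a minimum block length; it only drops columns that are too gappy or too poorly conserved, with cut-offs chosen arbitrarily for the example.

```python
from collections import Counter


def clean_alignment(seqs, max_gap_fraction=0.5, min_conservation=0.5):
    """Drop gappy or poorly conserved columns from a list of aligned sequences.

    A column is kept only if (a) at most `max_gap_fraction` of its characters are
    gaps and (b) its most frequent non-gap residue is found in at least
    `min_conservation` of the sequences. Both cut-offs are illustrative.
    """
    n = len(seqs)
    keep = []
    for col in range(len(seqs[0])):
        column = [s[col] for s in seqs]
        gaps = sum(c == "-" for c in column)
        residues = Counter(c for c in column if c != "-")
        top = residues.most_common(1)[0][1] if residues else 0
        if gaps / n <= max_gap_fraction and top / n >= min_conservation:
            keep.append(col)
    return ["".join(s[i] for i in keep) for s in seqs]
```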

Phylogenetic analysis, step 3: phylogenetic reconstruction
Step 3a: Choose a method for phylogenetic reconstruction. Four main methods/algorithms:
● Distance methods, based on pairwise distances (UPGMA, Neighbor Joining): FastDist, BIONJ, Neighbor
● Maximum parsimony: DNAPars, TNT
● Maximum likelihood: PhyML, PAML
● Bayesian inference: MrBayes, BEAST
Output formats: distance matrix, Newick format
Choose the right compromise between speed and accuracy.
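
As an example of a distance method, the sketch below computes uncorrected p-distances and clusters them with UPGMA, writing the tree in Newick format. UPGMA assumes a constant rate of evolution (a molecular clock), one reason Neighbor Joining and its BIONJ variant are generally preferred; UPGMA is used here only because it is the shortest to write. The input layout (a dict keyed by index pairs) is an assumption of this sketch.

```python
def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Build an ultrametric UPGMA tree and return it as a Newick string.

    names: taxon labels; dist: {(i, j): distance} for every index pair i < j.
    """
    # cluster id -> (Newick subtree, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset(pair): value for pair, value in dist.items()}
    next_id = len(names)
    while len(clusters) > 1:
        closest = min(d, key=d.get)  # merge the closest pair of clusters
        i, j = sorted(closest)
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        height = d.pop(closest) / 2
        node = f"({tree_i}:{height - h_i:.4g},{tree_j}:{height - h_j:.4g})"
        # Size-weighted average distance from the new cluster to the others.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (node, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

With distances A-B = 0.2 and A-C = B-C = 0.6, `upgma(["A", "B", "C"], {(0, 1): 0.2, (0, 2): 0.6, (1, 2): 0.6})` returns `(C:0.3,(A:0.1,B:0.1):0.2);`, a Newick string that the viewers listed in step 4 can display.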

Step 3b: Choose parameters and evolution models
Evolution models describe the substitution rates between nucleotides or amino acids:
● DNA: Jukes-Cantor, Kimura, F81, HKY85, GTR
● Protein: JTT, WAG, Dayhoff
Model testing software selects the substitution model (and parameters) that best fits the dataset, comparing the models' likelihoods with criteria such as likelihood ratio tests, AIC or BIC: ProtTest, ModelTest (based on PhyML in the case of ProtTest and jModelTest).
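
The simplest DNA models translate directly into corrected distances. The sketch below implements the standard Jukes-Cantor (JC69) and Kimura two-parameter (K2P) distance formulas for two aligned DNA sequences, ignoring gaps and ambiguous bases, and returns infinity when the observed divergence is too high to correct (saturation). Model selection tools fit many models by maximum likelihood instead, which this sketch does not attempt.

```python
import math

PURINES, PYRIMIDINES = set("AG"), set("CT")


def _compared_sites(a, b):
    """Site pairs where both sequences carry an unambiguous base."""
    return [(x, y) for x, y in zip(a.upper(), b.upper())
            if x in "ACGT" and y in "ACGT"]


def jukes_cantor(a, b):
    """JC69 distance d = -3/4 ln(1 - 4p/3), p = proportion of differing sites."""
    sites = _compared_sites(a, b)
    p = sum(x != y for x, y in sites) / len(sites)
    arg = 1 - 4 * p / 3
    if arg <= 0:
        return math.inf  # saturation: too divergent for the correction
    return -0.75 * math.log(arg)


def kimura_2p(a, b):
    """K2P distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

    P = proportion of transitions (A<->G, C<->T), Q = proportion of transversions.
    """
    sites = _compared_sites(a, b)
    n = len(sites)
    differences = [(x, y) for x, y in sites if x != y]
    transitions = sum(1 for x, y in differences
                      if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    transversions = len(differences) - transitions
    P, Q = transitions / n, transversions / n
    arg1, arg2 = 1 - 2 * P - Q, 1 - 2 * Q
    if arg1 <= 0 or arg2 <= 0:
        return math.inf
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)
```

For one difference in four sites (p = 0.25), `jukes_cantor("AAAA", "AAAG")` gives about 0.304, higher than p because the model accounts for multiple substitutions at the same site.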

Step 3c: Estimate branch robustness
Bootstrap procedure:
1- Resample the alignment columns: build a pseudo-alignment of the same length by drawing sites at random with replacement, and compute a tree again.
2- Repeat the process N times.
3- For each branch of the initial tree, count how many bootstrap trees contain it. The higher this number, the better supported the branch.
aLRT test (approximate Likelihood Ratio Test) (Anisimova & Gascuel, Syst Biol, 2006): integrated in PhyML, and much faster because PhyML is run only once.
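
The resampling and counting steps of the bootstrap are easy to sketch. `bootstrap_alignment` draws columns with replacement to build one pseudo-alignment, and `clade_support` reports how often each clade of the reference tree reappears among the replicate trees. Representing trees as sets of clades (frozensets of taxon names) is an assumption made to keep the example self-contained; building a tree from each pseudo-alignment would use any method from step 3a.

```python
import random


def bootstrap_alignment(seqs, rng):
    """One pseudo-alignment: same number of columns, drawn with replacement."""
    ncols = len(seqs[0])
    cols = [rng.randrange(ncols) for _ in range(ncols)]
    return ["".join(s[c] for c in cols) for s in seqs]


def clade_support(reference_clades, replicate_trees):
    """Fraction of replicate trees containing each clade of the reference tree.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    """
    n = len(replicate_trees)
    return {clade: sum(clade in tree for tree in replicate_trees) / n
            for clade in reference_clades}
```

Typical use: `rng = random.Random(42)` followed by `[bootstrap_alignment(aln, rng) for _ in range(100)]`, then one tree per pseudo-alignment. Seeding the generator makes the replicates reproducible.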

Step 4: Visualization and editing of the phylogenetic tree
Graphical tools available to display trees in Newick format: TreeDyn, DrawGram, DrawTree, ATV, NJPlot
Graphical output formats: PNG, SVG, PDF…
Step 5: Interpretation of the tree