BLAST program selection guide


BLAST program selection guide http://www.ncbi.nlm.nih.gov/blast/producttable.shtml#tab31

Homology: Orthology, Paralogy, Xenology

Fitch WM.  Trends Genet. 2000 May;16(5):227-31. 

Analogy vs Homology
Analogy: The relationship of any two characters that have descended convergently from unrelated ancestors.
Homology: The relationship of any two characters that have descended, usually with divergence, from a common ancestral character.

Orthology: The relationship of any two homologous characters whose common ancestor lies in the cenancestor of the taxa from which the two sequences were obtained.
Paralogy: The relationship of any two homologous characters arising from a duplication of the gene for that character.
Xenology: The relationship of any two homologous characters whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material for at least one of those characters.

A classic example (Figure from NCBI)

Test Yourself
A1 – B1
A1 – B2
A1 – C3
B1 – C2
C2 – C3
B2 – C3
C3 – AB1

Test Yourself (answers)
A1 – B1 = Orthologs
A1 – B2 = Orthologs
A1 – C3 = Orthologs
B1 – C2 = Paralogs (out-paralogs)
C2 – C3 = Paralogs (in-paralogs)
B2 – C3 = Orthologs
C3 – AB1 = Xenologs

Homology on a Genome-Scale
How many and which genes are common to two or more organisms?
Which genes differentiate one organism from another?
How is homology related to function?

Orthologs are the set of genes/proteins with gene trees identical to the species tree. We can understand other types of homology relationships by comparison to the species tree. But often we don’t know the species tree, and phylogenetic methods are complex.

Consider two genomes
Use BLASTP to compare one set of proteins (proteome) to the other.
Which set will you use as the query and which as the database?
What criteria will you use to define “a match”?
[Figure: GenomeA genes 1–3 and GenomeB genes 1–3] A1, A3, B2 and B3 are homologs (assuming the aligned regions overlap).
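One possible way to make “a match” concrete is sketched below. It assumes BLASTP was run with tabular output (the standard 12-column `-outfmt 6` layout) and that query lengths are known separately. The E-value and query-coverage thresholds are illustrative assumptions, not values from the slides, and the example rows are made up.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular rows: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
                        float(f[10]), float(f[11])))
    return hits


def is_match(hit: Hit, query_lengths: Dict[str, int],
             max_evalue: float = 1e-5, min_coverage: float = 0.5) -> bool:
    """Hypothetical criteria: a low E-value and an alignment that covers
    at least half of the query. Both thresholds are assumptions."""
    coverage = (hit.qend - hit.qstart + 1) / query_lengths[hit.query]
    return hit.evalue <= max_evalue and coverage >= min_coverage


# Made-up rows for illustration only.
rows = [
    "A1\tB1\t62.5\t200\t70\t5\t1\t200\t3\t198\t1e-80\t250",
    "A1\tB3\t30.0\t60\t40\t2\t10\t69\t5\t64\t0.002\t35.0",
]
query_lengths = {"A1": 210}
matches = [h for h in parse_tabular(rows) if is_match(h, query_lengths)]
print([(h.query, h.subject) for h in matches])
```

Requiring coverage as well as a low E-value is one way to address the caveat that homology is only claimed when the aligned regions overlap; other criteria (bit score, percent identity) could be substituted.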

Reciprocal Best Hits
Use BLASTP to compare sets of proteins (proteomes) to each other:
First using GenomeA to query against GenomeB
Then using GenomeB to query against GenomeA
Save only one best match for each query
Save only the reciprocal best matches as “orthologs”
[Figure: GenomeA genes 1–3 and GenomeB genes 1–3, shown at each step] Lose A3-B2 and A1-B3 homology
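The reciprocal-best-hit steps can be sketched in a few lines. This is a minimal illustration, assuming the BLASTP results have already been reduced to (query, subject, bit score) tuples. The function names and the scores are hypothetical, and ranking by bit score is one choice among several (E-value is another).

```python
from typing import Dict, Iterable, Set, Tuple

HitRow = Tuple[str, str, float]  # (query, subject, bit score)


def best_hits(rows: Iterable[HitRow]) -> Dict[str, str]:
    """Keep only the single highest-scoring subject for each query.
    On ties, the first row encountered is kept."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[HitRow],
                         b_vs_a: Iterable[HitRow]) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Made-up scores in which A1, A3, B2 and B3 all hit one another.
a_vs_b = [("A1", "B2", 300.0), ("A1", "B3", 120.0),
          ("A3", "B3", 280.0), ("A3", "B2", 110.0)]
b_vs_a = [("B2", "A1", 300.0), ("B2", "A3", 110.0),
          ("B3", "A3", 280.0), ("B3", "A1", 120.0)]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(rbh))
```

With these scores the A1-B2 and A3-B3 pairs are retained, while the weaker A3-B2 and A1-B3 relationships are discarded even though they are still homologs.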

One case where RBH works
[Figure: GenomeA genes 1–3 and GenomeB genes 1–3]
GenomeA – gene 1: Glucose transport
GenomeB – gene 2: Glucose transport
GenomeA – gene 3: Fructose transport
GenomeB – gene 3: Galactose transport

One case where RBH fails
[Figure: GenomeA genes 1–3 and GenomeB genes 1–3]
In-paralogs: duplication since speciation
GenomeA – gene 1: Glucose transport
GenomeA – gene 3: Glucose transport
GenomeB – gene 2: Fructose transport
GenomeB – gene 3: Galactose transport

Software/Methods for Predicting Orthologs from Genome Sequences
RBH
RSD (Reciprocal Shortest Distance)
INPARANOID
RIO
Orthostrapper
Ortholuge
TribeMCL
OrthoMCL

Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003 Sep;13(9):2178-89.

Pre-computed OrthoMCL results http://www.orthomcl.org/

Evaluating performance
No “gold standard” set of true orthologs
Latent Class Analysis
Agreement between methods provides confidence
27,562 proteins from 6 eukaryotes assigned to Pfams

Performance Metrics
actual \ predicted    negative    positive
negative              TN          FP
positive              FN          TP
Accuracy: proportion correct, (TN + TP) / total
Precision (positive predictive value): proportion of predicted positives that are correct, TP / (FP + TP)
Sensitivity (recall, true positive rate): proportion of positives correctly predicted, TP / (FN + TP)
Specificity: proportion of negatives correctly predicted, TN / (TN + FP)
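The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses made-up counts and assumes every denominator is nonzero. The quantity TP / (FP + TP) is labeled precision here, which is its usual name.

```python
from typing import Dict


def performance_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, sensitivity and specificity.
    Assumes each denominator is nonzero."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tn + tp) / total,   # proportion correct
        "precision": tp / (fp + tp),     # predicted positives that are correct
        "sensitivity": tp / (fn + tp),   # actual positives that are recovered
        "specificity": tn / (tn + fp),   # actual negatives that are recovered
    }


# Hypothetical counts for illustration only.
metrics = performance_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity together matters because a method can score well on one simply by predicting very many or very few orthologs.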

Chen F, Mackey AJ, Vermunt JK, Roos DS. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS ONE. 2007 Apr 18;2(4):e383.

Method Comparison

Is context useful for assigning homology type?
Prokaryotes vs eukaryotes
Evolutionary origin
Paralogs that arise as tandem repeats of single genes
Paralogs that arise from duplication of larger regions
Xenologs that arise from acquisition of a similar gene from another lineage

Example: pectate lyases of soft-rot enterobacteria may be SymBets (symmetrical best hits), but genome context suggests they may not be orthologs.