BLAST program selection guide

1 BLAST program selection guide

2 Orthology, Paralogy, Xenology
Homology: orthology, paralogy, xenology

3 Fitch WM. Trends Genet. 2000 May;16(5):227-31.

4 Analogy vs Homology
Analogy: The relationship of any two characters that have descended convergently from unrelated ancestors.
Homology: The relationship of any two characters that have descended, usually with divergence, from a common ancestral character.

5 Orthology: The relationship of any two homologous characters whose common ancestor lies in the cenancestor of the taxa from which the two sequences were obtained.
Paralogy: The relationship of any two homologous characters arising from a duplication of the gene for that character.
Xenology: The relationship of any two homologous characters whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material for at least one of those characters.
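The three definitions can be made operational on a gene tree whose internal nodes are already labeled with the event that produced them. The sketch below is hypothetical: the `Node` class, the `event` and `transferred` labels and the toy tree are illustrative assumptions, and real methods must infer these labels (e.g. by reconciling the gene tree with a species tree) rather than being handed them. Two genes are called xenologs if a horizontal transfer occurred on either lineage since their last common ancestor; otherwise the event at that ancestor (speciation or duplication) decides orthology versus paralogy.

```python
class Node:
    """A gene-tree node. Internal nodes carry the event that produced them (hypothetical model)."""

    def __init__(self, name, event=None, transferred=False, children=()):
        self.name = name
        self.event = event              # "speciation" or "duplication" for internal nodes
        self.transferred = transferred  # True if this lineage arrived by horizontal transfer
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def lineage(node):
    """Return the node and all of its ancestors, from the node up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def classify(a, b):
    """Classify two leaf genes as 'ortholog', 'paralog' or 'xenolog'."""
    ancestors_b = {id(n) for n in lineage(b)}
    lca = next(n for n in lineage(a) if id(n) in ancestors_b)
    # Xenology: a horizontal transfer on either lineage since the common ancestor.
    for leaf in (a, b):
        for n in lineage(leaf):
            if n is lca:
                break
            if n.transferred:
                return "xenolog"
    if lca.event == "duplication":
        return "paralog"
    if lca.event == "speciation":
        return "ortholog"
    raise ValueError("common ancestor has no event label")


# Toy tree (hypothetical): a duplication followed by speciation in both copies;
# B2 reached its genome by horizontal transfer.
a1, b1, a2, b2 = Node("A1"), Node("B1"), Node("A2"), Node("B2", transferred=True)
root = Node("dup", event="duplication", children=[
    Node("spec1", event="speciation", children=[a1, b1]),
    Node("spec2", event="speciation", children=[a2, b2]),
])

print(classify(a1, b1), classify(a1, a2), classify(a2, b2))
```

On this toy tree the call prints `ortholog paralog xenolog`.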

6 A classic example (Figure from NCBI)

7 Test Yourself
A1 – B1, A1 – B2, A1 – C3, B1 – C2, C2 – C3, B2 – C3, C3 – AB1

8 Test Yourself (answers)
A1 – B1 = orthologs
A1 – B2 = orthologs
A1 – C3 = orthologs
B1 – C2 = paralogs (out-paralogs)
C2 – C3 = paralogs (in-paralogs)
B2 – C3 = orthologs
C3 – AB1 = xenologs

9 Homology on a Genome-Scale
How many and which genes are common to two or more organisms? Which genes differentiate one organism from another? How is homology related to function?

10 Orthologs are the set of genes/proteins with gene trees identical to the species tree.
We can understand other types of homology relationships by comparison to the species tree. But often we don’t know the species tree, and phylogenetic methods are complex.
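Comparing a gene tree with the species tree comes down to comparing topologies. A minimal sketch, assuming both trees are rooted and written as nested Python tuples (a hypothetical representation; real tools read Newick files) and that `species_of` maps each gene to its species: a gene set passes only if it holds one gene per species and its gene tree has exactly the same clades as the species tree.

```python
def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return [leaf for child in tree for leaf in leaves(child)]
    return [tree]


def clades(tree, label=lambda leaf: leaf):
    """Return the set of clades of a rooted tree, each as a frozenset of (relabeled) leaves."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            members = frozenset().union(*(walk(child) for child in node))
            found.add(members)
            return members
        return frozenset([label(node)])

    walk(tree)
    return found


def congruent(gene_tree, species_tree, species_of):
    """True if the gene tree, relabeled by species, has the same topology as the species tree."""
    species = [species_of[g] for g in leaves(gene_tree)]
    if len(set(species)) != len(species):
        return False  # two genes from one species imply a duplication, not one-to-one orthologs
    return clades(gene_tree, lambda g: species_of[g]) == clades(species_tree)


species_tree = (("human", "mouse"), "fly")
species_of = {"H1": "human", "M1": "mouse", "M2": "mouse", "F1": "fly"}
print(congruent((("H1", "M1"), "F1"), species_tree, species_of))
```

Here the gene tree matches the species tree, so the call prints `True`; swapping `"M1"` and `"F1"` would break congruence.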

11 Consider two genomes
Use BLASTP to compare one set of proteins (proteome) to the other. Which set will you use as the query and which as the database? What criteria will you use to define “a match”?
[Figure: BLASTP matches between GenomeA genes 1–3 and GenomeB genes 1–3]
A1, A3, B2 and B3 are homologs (assuming the aligned regions overlap).
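One concrete way to answer “what is a match?” is to filter BLASTP tabular output. The sketch assumes results were written with BLAST+ `-outfmt 6`, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The thresholds in `is_match` are hypothetical placeholders, not recommendations, and the example lines are made up.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity over the alignment
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST+ tabular output (-outfmt 6, default 12 columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))


def is_match(hit: Hit, max_evalue=1e-5, min_pident=30.0, min_length=50) -> bool:
    """Hypothetical match criteria; the thresholds are illustrative only."""
    return hit.evalue <= max_evalue and hit.pident >= min_pident and hit.length >= min_length


# Made-up example lines in -outfmt 6 layout.
example = [
    "A1\tB2\t45.0\t300\t160\t4\t1\t300\t5\t304\t1e-50\t250.0\n",
    "A3\tB3\t25.0\t80\t58\t2\t1\t80\t10\t89\t0.001\t40.0\n",
]
matches = [h for h in parse_outfmt6(example) if is_match(h)]
print([(h.query, h.subject) for h in matches])
```

Only the first line passes, so this prints `[('A1', 'B2')]`. A coverage criterion would additionally need query and subject lengths, which BLAST+ can emit by adding `qlen` and `slen` to a custom `-outfmt "6 ..."` column list.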

12 Reciprocal Best Hits
Use BLASTP to compare the two proteomes to each other:
First use GenomeA as the query against GenomeB
Then use GenomeB as the query against GenomeA
Save only the single best match for each query
Save only the reciprocal best matches as “orthologs”
[Figure: best matches between GenomeA genes 1–3 and GenomeB genes 1–3, in each direction]
Result: the A3–B2 and A1–B3 homology relationships are lost.
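The RBH procedure can be sketched in a few lines. This assumes each search has already been reduced to `(query, subject, bitscore)` tuples (for instance from the tabular BLAST output) and ranks hits by bit score; ties keep the first hit seen, which is an arbitrary choice. The scores below are made up.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    hits: iterable of (query, subject, bitscore). Ties keep the first hit seen.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (geneA, geneB) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up bit scores for a GenomeA-vs-GenomeB search and the reverse search.
a_vs_b = [("A1", "B1", 300), ("A1", "B2", 200), ("A2", "B2", 150), ("A3", "B3", 500)]
b_vs_a = [("B1", "A1", 310), ("B2", "A1", 180), ("B2", "A2", 140), ("B3", "A3", 490)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

This prints `[('A1', 'B1'), ('A3', 'B3')]`: A2 and B2 are homologs, but the pair is discarded because B2's best hit is A1, which is exactly how RBH loses homology relationships.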

13 One case where RBH works
[Figure: GenomeA genes 1–3 and GenomeB genes 1–3]
GenomeA – gene 1: glucose transport
GenomeB – gene 2: glucose transport
GenomeA – gene 3: fructose transport
GenomeB – gene 3: galactose transport

14 One case where RBH fails
[Figure: GenomeA genes 1–3 and GenomeB genes 1–3]
In-paralogs: duplication since speciation
GenomeA – gene 1: glucose transport
GenomeA – gene 3: glucose transport
GenomeB – gene 2: fructose transport
GenomeB – gene 3: galactose transport
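Because RBH keeps only one gene per genome, a second in-paralog is dropped. A simplified illustration of the in-paralog idea used by tools such as INPARANOID (not that program's actual algorithm, which adds further scoring and confidence rules): start from an RBH pair and also keep any gene from the same genome that is more similar to the seed gene than the seed is to its partner in the other genome. The function name, dictionary layout and scores are hypothetical; gene names are used only as labels.

```python
def add_inparalogs(seed_a, seed_b, within_a, between):
    """Grow an RBH pair (seed_a, seed_b) with in-paralogs of seed_a.

    within_a: {(seed_a, other_gene_in_A): score} from a GenomeA-vs-GenomeA search.
    between:  {(gene_in_A, gene_in_B): score} from the GenomeA-vs-GenomeB search.
    A gene is kept if it is more similar to seed_a than seed_a is to seed_b,
    i.e. it probably duplicated after the speciation.
    """
    threshold = between[(seed_a, seed_b)]
    group = {seed_a}
    for (x, y), score in within_a.items():
        if x == seed_a and y != seed_a and score > threshold:
            group.add(y)
    return sorted(group)


# Made-up scores: A1 and A3 are recent duplicates; B2 is the RBH partner of A1.
within_a = {("A1", "A3"): 600, ("A1", "A2"): 90}
between = {("A1", "B2"): 400}
print(add_inparalogs("A1", "B2", within_a, between))
```

This prints `['A1', 'A3']`: A3 is recovered as an in-paralog, while the more distant A2 is not.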

15 Software/Methods for Predicting Orthologs from Genome Sequences
RBH, RSD (Reciprocal Smallest Distance), INPARANOID, RIO, Orthostrapper, Ortholuge, TribeMCL, OrthoMCL

16

17 Li L, Stoeckert CJ Jr, Roos DS
Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res Sep;13(9):
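OrthoMCL clusters a BLASTP-derived similarity graph with the Markov Cluster (MCL) algorithm. The sketch below is a generic plain-Python MCL on a toy matrix, not OrthoMCL itself: it omits how OrthoMCL selects and weights edges, and the inflation value, fixed iteration count (instead of a convergence test) and pruning threshold are illustrative assumptions. MCL alternates expansion (matrix squaring, which spreads random-walk flow) with inflation (element-wise powering, which strengthens strong flows and weakens weak ones).

```python
def column_normalize(m):
    """Scale each column of square matrix m (in place) so it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(weights, inflation=2.0, iterations=30, prune=1e-6):
    """Markov Cluster algorithm on a symmetric similarity matrix; returns clusters of indices."""
    n = len(weights)
    # Self-loops, then column normalization, give a column-stochastic matrix.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    column_normalize(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion
        m = [[v ** inflation if v > prune else 0.0 for v in row] for row in m]  # inflation + pruning
        column_normalize(m)
    # Each row that still carries flow (an attractor) defines one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > prune) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Toy graph: two disconnected triangles of mutually similar proteins.
w = [[0.0] * 6 for _ in range(6)]
for group in ((0, 1, 2), (3, 4, 5)):
    for i in group:
        for j in group:
            if i != j:
                w[i][j] = 1.0
print(mcl(w))
```

On this toy input the result is `[[0, 1, 2], [3, 4, 5]]`; raising `inflation` generally yields finer clusters.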

18 Pre-computed OrthoMCL results

19 Evaluating performance
There is no “gold standard” set of true orthologs.
Latent Class Analysis: agreement between methods provides confidence.
Data: 27,562 proteins from 6 eukaryotes, assigned to Pfam families.

20 Performance Metrics
actual \ predicted    negative    positive
Negative              TN          FP
Positive              FN          TP
Accuracy – proportion of all predictions that are correct: (TP + TN) / total
Precision (positive predictive value) – proportion of predicted positives that are correct: TP / (TP + FP)
Sensitivity (recall, TPR) – proportion of actual positives correctly predicted: TP / (TP + FN)
Specificity – proportion of actual negatives correctly predicted: TN / (TN + FP)
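The four metrics follow directly from the confusion-matrix counts. A minimal sketch with made-up counts; it does not guard against empty denominators (e.g. a method that predicts no positives), which real evaluation code would need to handle.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),    # predicted positives that are correct
        "sensitivity": tp / (tp + fn),  # also called recall or TPR
        "specificity": tn / (tn + fp),
    }


# Made-up counts for a hypothetical ortholog-prediction method.
print(metrics(tp=40, fp=10, tn=45, fn=5))
```

With these counts accuracy is 0.85 and precision 0.8, while sensitivity (40/45) and specificity (45/55) differ, which is why the metrics are reported together.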

21 Chen F, Mackey AJ, Vermunt JK, Roos DS
Chen F, Mackey AJ, Vermunt JK, Roos DS. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS ONE Apr 18;2(4):e383.

22 Method Comparison

23 Is context useful for assigning homology type?
Prokaryotes vs eukaryotes
Evolutionary origin:
  Paralogs that arise as tandem repeats of single genes
  Paralogs that arise from duplication of larger regions
  Xenologs that arise from acquisition of a similar gene from another lineage

24 Example: pectate lyases of soft-rot enterobacteria may be SymBets (symmetrical best hits), but genome context suggests they may not be orthologs
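One simple way to use genome context is to ask whether the neighbours of a candidate ortholog pair are themselves paired. The sketch below is a hypothetical conserved-neighbourhood count, not the method behind this slide: it assumes each genome is a single linear list of gene names (ignoring strand and circular chromosomes), that `orthologs` maps GenomeA genes to GenomeB genes (e.g. from RBH), and that `window` is an arbitrary neighbourhood size. All gene names are made up.

```python
def neighbors(order, gene, window):
    """Genes within `window` positions of `gene` on a linear gene order."""
    i = order.index(gene)
    return order[max(0, i - window):i] + order[i + 1:i + 1 + window]


def context_support(a, b, order_a, order_b, orthologs, window=3):
    """Count neighbours of gene a whose ortholog lies near gene b."""
    near_b = set(neighbors(order_b, b, window))
    return sum(1 for g in neighbors(order_a, a, window) if orthologs.get(g) in near_b)


order_a = ["a1", "a2", "pelA", "a3", "a4"]
order_b = ["b1", "b2", "pelB", "b3", "x9"]
order_c = ["c1", "pelC", "c2"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
print(context_support("pelA", "pelB", order_a, order_b, orthologs, window=2))
print(context_support("pelA", "pelC", order_a, order_c, orthologs, window=2))
```

The first pair shares three conserved neighbours, the second none; a best-hit pair with no shared context would be a candidate for closer inspection rather than automatic acceptance as orthologs.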

