
Using BLAST to study gene evolution – an example.


1 Using BLAST to study gene evolution – an example.
Introduction to bioinformatics, lesson 3b.

2 NCBI diagram

3 Orthologs Homologous sequences are orthologous if they were separated by a speciation event: If a gene exists in a species, and that species diverges into two species, then the copies of this gene in the resulting species are orthologous.

4 Orthologs Orthologs typically retain the same or a similar function over the course of evolution. Identifying orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

5 Orthologs [Diagram: gene a in an ancestor; after a speciation event, descendant 1 and descendant 2 each carry a copy of a.]

6 Paralogs Homologous sequences are paralogous if they were separated by a gene duplication event: If a gene in an organism is duplicated, then the two copies are paralogous.

7 Paralogs Orthologs typically have the same or a similar function.
This is not always true for paralogs: because the original selective pressure is relaxed on one copy of the duplicated gene, that copy is free to mutate and acquire new functions.

8 Paralogs [Diagram: a duplication of gene a produces two paralogous copies, a and b.]

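9 Orthologs and Paralogs
[Diagram: a duplication creates genes a and b; a later speciation yields copies of a and b in species A and species B. The two copies of a are orthologs, as are the two copies of b.]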
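The orthology/paralogy rule can be made mechanical: two genes in a gene tree are orthologs if the node where their lineages last split is a speciation, and paralogs if it is a duplication. Below is a minimal Python sketch, assuming a hand-built gene tree whose internal nodes are already labeled with their event; the `Node` class and gene names such as `a_A` are hypothetical, and real pipelines must first infer the tree and those labels.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Gene-tree node: leaves are genes, internal nodes carry an event label."""
    name: str
    event: str = ""  # "speciation" or "duplication" for internal nodes
    children: list = field(default_factory=list)

def path_to(node, target):
    """Return the nodes from `node` down to the leaf named `target` ([] if absent)."""
    if node.name == target:
        return [node]
    for child in node.children:
        path = path_to(child, target)
        if path:
            return [node] + path
    return []

def relationship(tree, gene1, gene2):
    """Classify two genes by the event at their last common ancestor."""
    p1, p2 = path_to(tree, gene1), path_to(tree, gene2)
    if not p1 or not p2:
        raise ValueError("gene not found in tree")
    lca = None
    for a, b in zip(p1, p2):  # walk the shared prefix of both root-to-leaf paths
        if a is not b:
            break
        lca = a
    return "orthologs" if lca.event == "speciation" else "paralogs"

# Slide-9 scenario (hypothetical labels): duplication into a and b,
# then speciation into species A and species B.
tree = Node("dup", "duplication", [
    Node("spec_a", "speciation", [Node("a_A"), Node("a_B")]),
    Node("spec_b", "speciation", [Node("b_A"), Node("b_B")]),
])
print(relationship(tree, "a_A", "a_B"))  # orthologs
print(relationship(tree, "a_A", "b_B"))  # paralogs
```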

10 NCBI diagram

11 What is conservation? Functionally or structurally important sites are conserved: conserved sites are “slow”-evolving sites, and variable sites are “fast”-evolving sites. Functionally or structurally important sites are subject to stronger evolutionary pressure, i.e. purifying selection.

12 Finding conserved regions from an alignment
S1 KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN
S2 MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG
S3 MPFERCELARTLKRMMDADIRGVSLANWVCLAKWFWDGG
From the MSA and the tree, one can determine how conserved a gene is.
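One simple way to quantify conservation from this alignment is per-column identity: the fraction of sequences that share the most common residue in each column. The sketch below uses hypothetical helper names; identity is only one of many conservation measures, and unlike the slide it ignores the tree.

```python
# The three aligned sequences from the slide (equal length, no gaps).
ALIGNMENT = {
    "S1": "KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN",
    "S2": "MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG",
    "S3": "MPFERCELARTLKRMMDADIRGVSLANWVCLAKWFWDGG",
}

def column_identity(seqs):
    """Fraction of sequences sharing the most common residue, per column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for i in range(length):
        column = [s[i] for s in seqs]
        most_common = max(column.count(aa) for aa in set(column))
        scores.append(most_common / len(column))
    return scores

def conserved_blocks(scores, min_len=3):
    """1-based (start, end) runs of fully conserved columns of at least min_len."""
    blocks, start = [], None
    for i, score in enumerate(scores + [0.0]):  # sentinel closes a trailing run
        if score == 1.0 and start is None:
            start = i
        elif score < 1.0 and start is not None:
            if i - start >= min_len:
                blocks.append((start + 1, i))
            start = None
    return blocks

scores = column_identity(list(ALIGNMENT.values()))
print(sum(s == 1.0 for s in scores))  # 19
print(conserved_blocks(scores))       # [(6, 11), (22, 34)]
```

On these three sequences it reports two fully conserved blocks, CELART (columns 6-11) and GVSLANWVCLAKW (columns 22-34).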

13 Mol. Biol. Evol. (2005) 22:

14 Protocol

15 Search for human-mouse orthologous protein pairs
Step 1 - BLAST

16 Step 1 - BLAST Orthologs are defined as pairs of reciprocal BLAST hits. Genes with more than one potential orthologous sequence are eliminated. Only genes whose human protein is functionally annotated are selected.
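Step 1 can be sketched as a reciprocal-best-hit filter over two BLAST searches (human against mouse, and mouse against human). This is a minimal illustration under stated assumptions, not the study's code: each hit is reduced to a hypothetical `(query, subject, bitscore)` tuple (in BLAST+ tabular output, `-outfmt 6`, the bit score is the last of the 12 default columns), "reciprocal" is read as "best hit in both directions", and a tie for the best hit is treated as "more than one potential orthologous sequence".

```python
from collections import defaultdict

def best_hits(hits):
    """Map each query to its single best-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    Queries whose top bit score is shared by several subjects are dropped
    as ambiguous (an assumed reading of the elimination rule).
    """
    by_query = defaultdict(list)
    for query, subject, bitscore in hits:
        by_query[query].append((bitscore, subject))
    best = {}
    for query, scored in by_query.items():
        top = max(score for score, _ in scored)
        winners = {subject for score, subject in scored if score == top}
        if len(winners) == 1:
            best[query] = winners.pop()
    return best

def reciprocal_best_hits(human_vs_mouse, mouse_vs_human, annotated=None):
    """Human-mouse pairs that are each other's best hit in both searches.

    annotated: optional set of functionally annotated human proteins to keep.
    """
    h2m = best_hits(human_vs_mouse)
    m2h = best_hits(mouse_vs_human)
    pairs = [(h, m) for h, m in h2m.items() if m2h.get(m) == h]
    if annotated is not None:
        pairs = [(h, m) for h, m in pairs if h in annotated]
    return sorted(pairs)

# Hypothetical hits with made-up gene names and scores.
human_vs_mouse = [
    ("hGENE1", "mGENE1", 950.0), ("hGENE1", "mGENE7", 300.0),
    ("hGENE2", "mGENE2", 410.0), ("hGENE2", "mGENE3", 410.0),  # tie: ambiguous
    ("hGENE4", "mGENE4", 500.0),
]
mouse_vs_human = [
    ("mGENE1", "hGENE1", 948.0),
    ("mGENE2", "hGENE2", 405.0),
    ("mGENE4", "hGENE5", 620.0), ("mGENE4", "hGENE4", 498.0),  # not reciprocal
]
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))  # [('hGENE1', 'mGENE1')]
```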

17 Step 2 – Evolutionary Rates
For each orthologous pair: align the sequences at the amino acid level and measure conservation. The data set contained 6,776 human-mouse gene pairs.
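As a hedged illustration of "measure conservation", the sketch below computes two textbook amino acid distances for one aligned pair: the p-distance (proportion of differing residues over ungapped columns) and its Poisson correction, d = -ln(1 - p). The slides do not say which distance the study used; these are stand-ins, and the function names are hypothetical.

```python
import math

def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing amino acids over ungapped aligned columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared

def poisson_distance(p):
    """Poisson-corrected replacements per site, d = -ln(1 - p)."""
    if p >= 1.0:
        return math.inf  # saturated: too diverged to estimate
    return -math.log(1.0 - p)

# Hypothetical aligned fragments: one difference in 9 ungapped columns.
p = p_distance("MKTAYIAKQR", "MKTAHIAKQ-")
print(p, poisson_distance(p))
```

Lower distances mean higher conservation; the correction matters most for fast-evolving pairs, where multiple replacements at one site hide behind a single observed difference.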

18 Step 3 – Assignment of Temporal Categories
Using BLAST to find homologous genes in 6 different eukaryotic genomes.

19 The six genomes: Schizosaccharomyces pombe, Takifugu rubripes, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Saccharomyces cerevisiae

20 What is old? What is presence?
OLD: a homolog is present in all 6 genomes. [Diagram: the younger categories METAZOANS, DEUTEROSTOMES and TETRAPODS, arranged on a tree with Drosophila melanogaster, Caenorhabditis elegans and Takifugu rubripes.]
Presence: a homolog detected by BLAST using an e-value cutoff.

21 METAZOANS - Animals whose bodies consist of many cells, as distinct from Protozoa, which are unicellular; all animals commonly recognized as animals.
DEUTEROSTOMES - The second of the two main groups of bilaterally symmetrical animals. The name derives from 'deutero' (second) and 'stome' (mouth), referring to the origin of the definitive mouth as an opening independent of the blastopore of the embryo.
TETRAPODS - Four-limbed vertebrates, including mammals, birds, reptiles and amphibians.
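The category assignment can be sketched as a small rule applied to each human gene's best BLAST e-value in each genome. This is a hypothetical reading of the slides, not Alba and Castresana's code: OLD requires a homolog in all six genomes, as slide 20 states, while mapping METAZOANS to a hit in Drosophila melanogaster or Caenorhabditis elegans and DEUTEROSTOMES to a hit in Takifugu rubripes is an assumption based on the diagram. The e-values and the 1e-4 cutoff in the usage lines are illustrative.

```python
GENOMES = (
    "Schizosaccharomyces pombe", "Takifugu rubripes", "Caenorhabditis elegans",
    "Drosophila melanogaster", "Arabidopsis thaliana", "Saccharomyces cerevisiae",
)

def temporal_category(best_evalues, cutoff):
    """Assign an age category to a human gene.

    best_evalues: dict genome -> best BLAST e-value (absent genome = no hit).
    cutoff: a hit counts as 'present' when its e-value is <= cutoff.
    The species-to-category mapping below is an assumed reading of the slides.
    """
    present = {g for g, e in best_evalues.items() if e <= cutoff}
    if present.issuperset(GENOMES):
        return "OLD"
    if present & {"Drosophila melanogaster", "Caenorhabditis elegans"}:
        return "METAZOANS"
    if "Takifugu rubripes" in present:
        return "DEUTEROSTOMES"
    return "TETRAPODS"

# Hypothetical gene: weak hit in fly, strong hit in fugu.
gene = {"Drosophila melanogaster": 1e-6, "Takifugu rubripes": 1e-40}
print(temporal_category(gene, cutoff=1e-4))   # METAZOANS
print(temporal_category(gene, cutoff=1e-10))  # DEUTEROSTOMES
```

The second call mirrors the kind of change the 10^-10 control tests: a stricter cutoff can move a gene with only weak distant hits into a younger category.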

22 Results

23 Negative correlation between “age” of genes and the rate of evolution
[Figure: four panels showing conservation.]

24 Control: making BLAST detection more conservative, with an e-value cutoff of 10^-10, did not significantly affect the result.

25 Explanations

26 Functional constraint remained constant throughout the evolutionary history of each gene, but the newer genes are less constrained than older genes.

27 Functional constraints are not constant, rather they are weak at the time of origin of a gene and they become progressively more stringent with age.

28 Eran Elhaik, Niv Sabath, and Dan Graur
Mol. Biol. Evol. 23(1):1–

29 Goal: to show that these results are an artifact caused by our inability to detect similarity when genetic distances are large.

30 Simulation

31 The evolutionary process
[Diagram: replacement probabilities between amino acids (Ala, Arg, Val) and a tree with leaves Rat, Dog, Cat, Mouse and Fly.] Say that the root of the tree carried V. Given V, evolution proceeded along the tree according to the evolutionary model.

32 The evolutionary process
[Diagram: as in slide 31, with V at the root.]

33 The evolutionary process
[Diagram: as in slide 31; V is carried from the root to the next node.] At the next node, V remains V, because over a short evolutionary time a replacement is most likely not observed.

34 The evolutionary process
[Diagram: as in slide 31, now showing V and L.] We continue onward: next, V is replaced by Leu.

35 The evolutionary process
[Diagram: as in slide 31, with residues L, I, M and V filled in.] Again, at the next split, Leu does not change. We continue this way until every node in the tree is filled. The internal nodes represent the sequences of extinct ancestors, and the terminal sequences at the leaves represent sequences that exist today. This process therefore simulates the evolution of a single column in the alignment.

36 The evolutionary process
And repeat the process for all positions to generate long sequences (assumption: each position evolves independently of the rest of the sequence):
Rat    L M T G S H M G N F I I ...
Mouse  L M T G S G M A N H V I ...
Cat    I M T G S H I G Y A M F ...
Dog    M M T G S G I G L T R A ...
Fly    V M T G S W R G R M Y A ...
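The process in slides 31-36 can be sketched in Python. This is a minimal illustration, not the simulation used in the study: it assumes an equal-rate amino acid model (every replacement equally likely) in place of the slides' replacement-probability matrix, and a hypothetical topology and branch lengths for Rat, Mouse, Dog, Cat and Fly. Each site starts from a random residue at the root and is evolved down every branch independently of the other sites.

```python
import math
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
K = len(AMINO_ACIDS)

def same_state_prob(d):
    """P(same residue at both ends) of a branch with d expected replacements/site.

    Equal-rate model: an assumption standing in for a general replacement matrix.
    """
    return 1.0 / K + (K - 1.0) / K * math.exp(-K * d / (K - 1.0))

def evolve_residue(aa, d, rng):
    """Draw the residue at the end of a branch of length d that starts with aa."""
    if rng.random() < same_state_prob(d):
        return aa
    return rng.choice([x for x in AMINO_ACIDS if x != aa])

# Hypothetical tree: (name, branch length, children); leaves have no children.
TREE = ("root", 0.0, [
    ("mammals", 0.3, [
        ("rodents", 0.1, [("Rat", 0.05, []), ("Mouse", 0.05, [])]),
        ("carnivores", 0.1, [("Dog", 0.05, []), ("Cat", 0.05, [])]),
    ]),
    ("Fly", 0.8, []),
])

def evolve_site(node, state, rate, rng, leaves):
    """Push one site's state down the tree, recording the residue at each leaf."""
    name, _, children = node
    if not children:
        leaves[name].append(state)
        return
    for child in children:
        child_state = evolve_residue(state, child[1] * rate, rng)
        evolve_site(child, child_state, rate, rng, leaves)

def simulate(tree, length, rate=1.0, seed=0):
    """Simulate `length` independent sites; `rate` scales every branch length."""
    rng = random.Random(seed)
    leaves = defaultdict(list)
    for _ in range(length):
        root_state = rng.choice(AMINO_ACIDS)  # random residue at the root
        evolve_site(tree, root_state, rate, rng, leaves)
    return {name: "".join(residues) for name, residues in leaves.items()}

for name, seq in simulate(TREE, 12, rate=1.0).items():
    print(f"{name:<6}{seq}")
```

Raising `rate` stretches every branch at once, which is how a faster-evolving gene can be mimicked on the same tree.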

37 Generate terminal sequences with the following phylogenetic relationships:
[Tree with terminal genes A, B, C, D and E.] All the genes originated in the common ancestor of A, B, C, D and E and are, thus, of equal age. A and B are similar to the human and mouse orthologous genes; C, D and E are remote homologous genes from increasingly distant taxa.

38 Simulation: they simulated genes with 101 different rates of evolution.
A higher rate means a higher probability of an amino acid replacement on each branch.

39 Simulation: use BLAST, in the same way that Alba and Castresana used it, to detect homology between gene A and genes C, D and E.

40 Only one difference: the group names
OLD → SENIORS, METAZOANS → ADULTS, DEUTEROSTOMES → TEENAGERS, TETRAPODS → TODDLERS
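The logic of slides 37-40 can be sketched deterministically. This is a hedged stand-in, not the authors' protocol: instead of running BLAST it uses the expected sequence identity under an equal-rate amino acid model, counts a homolog as "detected" when that identity is at least a hypothetical 30%, and uses hypothetical path lengths from gene A to C, D and E and a hypothetical rate range of 0 to 5. Each gene is labeled by its most distant detected homolog.

```python
import math

K = 20  # number of amino acid states in the equal-rate model

def expected_identity(d):
    """Expected fraction of identical sites between sequences d replacements/site apart."""
    return 1.0 / K + (K - 1.0) / K * math.exp(-K * d / (K - 1.0))

# Hypothetical path lengths (at rate 1) from gene A to genes C, D and E.
PATH_FROM_A = {"C": 0.5, "D": 1.0, "E": 1.5}
LABELS = {"E": "SENIORS", "D": "ADULTS", "C": "TEENAGERS"}

def age_label(rate, min_identity=0.30):
    """Label a gene by its most distant homolog that still passes the identity test."""
    for taxon in ("E", "D", "C"):  # most distant first
        if expected_identity(PATH_FROM_A[taxon] * rate) >= min_identity:
            return LABELS[taxon]
    return "TODDLERS"

rates = [0.05 * i for i in range(101)]  # 101 rates from 0.0 to 5.0
labels = [age_label(r) for r in rates]
counts = {name: labels.count(name) for name in ("SENIORS", "ADULTS", "TEENAGERS", "TODDLERS")}
print(counts)  # {'SENIORS': 17, 'ADULTS': 9, 'TEENAGERS': 25, 'TODDLERS': 50}
```

Every simulated gene has the same true age, yet the slowest ones come out as SENIORS and the fastest as TODDLERS, purely because distant similarity falls below the detection threshold.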

41 Results

42 The same result as Alba and Castresana

43 But all the simulated genes are of the same “age”. What is the problem?

44 We can only count genes that are identified as homologous by the protocol

45 Thus, Alba and Castresana may have failed to spot the vast majority of homologs among the fastest-evolving genes.

46 The vast majority of the fastest evolving genes are undetectable even when the cutoffs are extremely permissive.

47 Conclusion

48 The inverse relationship between evolutionary rate and gene age is an artifact caused by our inability to detect similarity when genetic distances are large.

49 Since genetic distance increases with time of divergence and rate of evolution, it is difficult to identify homologs of fast evolving genes in distantly related taxa. Thus, fast evolving genes may be misclassified as “new”.
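The argument can be written out under a simple equal-rate model of amino acid replacement (an assumption; real proteins do not evolve this uniformly). Two homologs whose lineages split t time units ago, each evolving at rate r, are separated by d expected replacements per site, and their expected identity I decays toward the chance-match level 1/K for K amino acid states:

```latex
d = 2rt, \qquad
I(d) = \frac{1}{K} + \frac{K-1}{K}\, e^{-Kd/(K-1)}, \qquad K = 20,
\qquad
\lim_{rt \to \infty} I = \frac{1}{K} = 0.05 .
```

Because d depends only on the product rt, a fast-evolving gene compared with a close relative can be as diverged as a slowly evolving gene compared with a distant one; once I(d) nears 0.05, sequence similarity carries little signal of homology.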

50 So, the only conclusion that can be drawn from Alba and Castresana’s study is that slowly evolving genes evolve slowly!!!

