Using blast to study gene evolution – an example.

Slides:



Advertisements
Similar presentations
EVIDENCE OF EVOLUTION.
Advertisements

1 Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene Paralogs: Two or more genes, often thought of.
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
Phylogenetics workshop: Protein sequence phylogeny week 2 Darren Soanes.
The Concept of Functional Constraint. The intensity of purifying selection is determined by the degree of intolerance characteristic of a site or a genomic.
 Species evolve with significantly different morphological and behavioural traits due to genetic drift and other selective pressures.  Example – Homologous.
GENE TREES Abhita Chugh. Phylogenetic tree Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor.
Tree of Life Chapter 26.
Phylogenetic Trees Understand the history and diversity of life. Systematics. –Study of biological diversity in evolutionary context. –Phylogeny is evolutionary.
Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.
Basics of Comparative Genomics Dr G. P. S. Raghava.
Classification of Living Things. 2 Taxonomy: Distinguishing Species Distinguishing species on the basis of structure can be difficult  Members of the.
Phylogenetic reconstruction
Comparative genomics Joachim Bargsten February 2012.
Molecular Evolution Revised 29/12/06
BIOE 109 Summer 2009 Lecture 4- Part II Phylogenetic Inference.
Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion Translocation Duplication.
CS273a Lecture 8, Win07, Batzoglou Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion.
Bioinformatics and Phylogenetic Analysis
Alternative splicing and evolution Daniel Jeffares.
CS273a Lecture 9/10, Aut 10, Batzoglou Multiple Sequence Alignment.
Protein Modules An Introduction to Bioinformatics.
Short Primer on Comparative Genomics Today: Special guest lecture 12pm, Alway M108 Comparative genomics of animals and plants Adam Siepel Assistant Professor.
The Evidence for Evolution. Problem: How did the great diversity of life originate? Alternative Solutions: A. All living things were created at the same.
The diversity of genomes and the tree of life
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Comparative Genomics of the Eukaryotes
Protein Evolution and Sequence Analysis Protein Evolution and Sequence Analysis.
Origins and impact of constraints in evolution of gene families Boris E. Shakhnovich and Eugene V.Koonin Genome Research 2006, October 19 Stella Veretnik.
Molecular Clock. Rate of evolution of DNA is constant over time and across lineages Resolve history of species –Timing of events –Relationship of species.
Lecture #3 Evidence of Evolution
The Evolutionary History of Biodiversity
Chapter 26: Phylogeny and the Tree of Life Objectives 1.Identify how phylogenies show evolutionary relationships. 2.Phylogenies are inferred based homologies.
1 Orthology and paralogy A practical approach Searching the primaries Searching the secondaries Significance of database matches DB Web addresses Software.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Introduction to Phylogenetics
1 Genome Evolution Chapter Introduction Genomes contain the raw material for evolution; Comparing whole genomes enhances – Our ability to understand.
Chapter 24: Molecular and Genomic Evolution CHAPTER 24 Molecular and Genomic Evolution.
Bioinformatic Tools for Comparative Genomics of Vectors Comparative Genomics.
Comparative genomics Haixu Tang School of Informatics.
Phylogenetic analysis taken from and es/MSAPhylogeny.htm.
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
1 The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Organon Tim Hulsen, Oss, November 11, 2003.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Gene models and proteomes for Saccharomyces cerevisiae (Sc), Schizosaccharomyces pombe (Sp), Arabidopsis thaliana (At), Oryza sativa (Os), Drosophila melanogaster.
Chapter 26 Phylogeny and Systematics. Tree of Life Phylogeny – evolutionary history of a species or group - draw information from fossil record - organisms.
Evidence for Evolution by Natural Selection.
Phylogeny and Taxonomy. Phylogeny and Systematics The evolutionary history of a species or related species Reconstructing phylogeny is done using evidence.
Lesson Overview Lesson Overview Modern Evolutionary Classification 18.2.
Phylogeny and the Tree of Life
Sequence similarity, BLAST alignments & multiple sequence alignments
Genetics and Evolutionary Biology
BLAST program selection guide
Basics of Comparative Genomics
Comparative Genomics.
Pipelines for Computational Analysis (Bioinformatics)
EVIDENCE OF EVOLUTION.
Lecture #3 Evidence of Evolution
The Evidence for Evolution
Chapter 4 The Interrupted Gene.
Phylogeny and Systematics
LECTURE 1: Phylogeny and Systematics
Chapter 26- Phylogeny and Systematics
6.2 Evidence of Evolution Key concepts: What evidence supports the theory of evolution? How do scientists infer evolutionary relationships among organisms?
Evolutionary Trees.
Unit Genomic sequencing
Basics of Comparative Genomics
Evolution and Natural Selection
Phylogeny and the Tree of Life
Presentation transcript:

Using blast to study gene evolution – an example. Introduction to bioinformatics, lesson 3b. Using blast to study gene evolution – an example.

NCBI diagram

Orthologs Homologous sequences are orthologous if they were separated by a speciation event: If a gene exists in a species, and that species diverges into two species, then the copies of this gene in the resulting species are orthologous.

Orthologs Orthologs will typically have the same or similar function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

Orthologs ancestor a speciation a a descendant 2 descendant 2

Paralogs Homologous sequences are paralogous if they were separated by a gene duplication event: If a gene in an organism is duplicated, then the two copies are paralogous.

Paralogs Orthologs will typically have the same or similar function. This is not always true for paralogs due to lack of the original selective pressure upon one copy of the duplicated gene, this copy is free to mutate and acquire new functions.

Paralogs a Duplication a b

Orthologs and Paralogs Duplication a b Orthologs Speciation Orthologs a b a b Species a Species b

NCBI diagram

What is conservation ? Functionally or structurally important sites are conserved: Conserved sites  “slow” evolving sites Variable sites  “fast evolving” sites A functionally or structurally important sites – are subject to stronger evolutionary pressure = Purifying selection force

Finding conservation regions from an alignment S1 KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN S2 MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG S3 MPFERCELARTLKRMMDADIRGVSLANWVCLAKWFWDGG From the MSA and the tree, one can determine how conserved is a gene.

Mol. Biol. Evol. (2005) 22:598-606

Protocol

Search for Human-mouse orthologous protein pairs Step 1 - BLAST Search for Human-mouse orthologous protein pairs

Step 1 - BLAST The orthologs are defined as pairs of reciprocal BLAST hits. Eliminate genes with more than one potential orthologous sequence. Select only genes which the human protein was functionally annotated.

Step 2 – Evolutionary Rates For each orthologous pair: Alignment at the amino acid level. Measure conservation The data set contained 6,776 human-mouse gene pairs.

Step 3 – Assignment of Temporal Categories Using BLAST for finding homologous genes in 6 different eukaryotic genomes.

Schizosaccharomyces pombe Takifugu rubripes Caenorhabditis elegans Drosophila melanogaster Arabidopsis thaliana Saccharomyces cerevisiae

What is Old ? What is Presence ? METAZOANS Presence of any homolog in all the 6 genomes. DEUTEROSTOMES What is Presence ? TETRAPODS Using an e-value cutoff of 10-4 in BLAST. Drosophila melanogaster Caenorhabditis elegans Takifugu rubripes

METAZOANS - Animals whose bodies consist of many cells, as distinct from Protozoa, which are unicellular; all animals commonly recognized as animals. DEUTEROSTOMES - The second of the two main groups of bilaterally symmetrical animals. The name derives from 'deutero' (second) 'stome' (mouth), referring to the origin of the definitive mouth as an opening independent from the blastopore of the embryo. TETRAPODS - Any four-legged animals, including mammals, birds, reptiles and amphibians.

Results

Negative correlation between “age” of genes and the rate of evolution CONSERVATION CONSERVATION CONSERVATION CONSERVATION

Control. Changing the sensitivity of the BLAST detection to a more conservative one of 10-10, did not significantly affect the result.

Explanations

Functional constraint remained constant throughout the evolutionary history of each gene, but the newer genes are less constrained than older genes.

Functional constraints are not constant, rather they are weak at the time of origin of a gene and they become progressively more stringent with age.

Eran Elhaik, Niv Sabath, and Dan Graur Mol. Biol. Evol. 23(1):1–3. 2006

Goal To show that these results are an artifact caused by our inability to detect similarity when genetic distances are large.

Simulation

The evolutionary process Ala Arg Val … Replacement probabilities … Rat Dog Cat Mouse Fly אז נגיד שבשורש של העץ היה V. בהינתן V, התרחשה אבולוציה לאורך העץ ובהינתן המודל האבולוציוני.

The evolutionary process Ala Arg Val … Replacement probabilities … Rat Dog Cat Mouse Fly V אז נגיד שבשורש של העץ היה V. בהינתן V, התרחשה אבולוציה לאורך העץ ובהינתן המודל האבולוציוני.

The evolutionary process Ala Arg Val … Replacement probabilities … Rat Dog Cat Mouse Fly V בקודקוד הבא V נשאר V כי רוב הסיכוי שלא נראה התמרה עפ"נ זמן אבו' קצר.

The evolutionary process Ala Arg Val … Replacement probabilities … Rat Dog Cat Mouse Fly L V אנחנו ממשיכים הלאה אח"כ V מותמר ב-Leu,

The evolutionary process Ala Arg Val … Replacement probabilities … L I M V Rat Dog Cat Mouse Fly ושוב בפיצול הבא Leu לא משתנה. וכך ממשיכים עד שממלאים את כל הקודקודים בעץ. הקודקוד' הפנימיים מייצגים את הרצפים שבאבות הקדמוניים שנכחדו, והרצפים הסופיים שבעלים מייצגים רצפים שקיימים היום. לכן התהליך הזה מחקה אבו, של עמודה אחת באליינמנט.

The evolutionary process And repeat the process for all positions… (assume: each position evolves independently) ואנחנו יכולים לחזור על התהליך ליצירת רצפים ארוכים, כאשר ההנחה היא שכל עמדה עוברת אבולוצ' בצורה ב"ת בשאר הרצף. Rat L M T G S H M G N F I I Mouse L M T G S G M A N H V I Cat I M T G S H I G Y A M F Dog M M T G S G I G L T R A Fly V M T G S W R G R M Y A ...

Generate terminal sequences with the following phylogenetic relationships: All the genes originated in the common ancestor of A,B,C,D,E and are, thus, of equal age. Remote homologous genes from increasingly more distant taxa. Similar to the human and mouse orthologous genes. A B C D E

Simulation They simulated genes with 101 different rates. High rate -> higher likelihood for a amino acid replacement in each branch.

Simulation Use BLAST, at the same way that Alba and Castresana used it, to detect homology between gene A to genes C,D and E.

Only one different – the groups names OLD METAZOANS DEUTEROSTOMES TETRAPODS SENIORS ADULTS TEENAGERS TODDLERS

Results

Same as Alba and Castresana

But all the simulated genes are at the same “age”. What is the problem ???

We can only count genes that are identified as homologous by the protocol

Alba and Castresana may have, thus, failed to spot the vast majority of homologs from among the fastest evolving genes

The vast majority of the fastest evolving genes are undetectable even when the cutoffs are extremely permissive.

Conclusion

The inverse relationship between evolutionary rate and gene age is an artifact caused by our inability to detect similarity when genetic distances are large.

Since genetic distance increases with time of divergence and rate of evolution, it is difficult to identify homologs of fast evolving genes in distantly related taxa. Thus, fast evolving genes may be misclassified as “new”.

Slowly evolving genes evolve slowly !!! So, the only conclusion that can be drawn from Alba and Castresana’s study is that