Pipelines for Computational Analysis (Bioinformatics)

Tutorial on Comparative Genomics part IV NSF DBI

What are the questions?
We can use computational techniques from bioinformatics to address a number of core questions about the evolution of species, drawing on the large number of genome sequences that have been generated. Some core questions are: Which genes are found uniquely in a genome or set of closely related genomes? Which genes were duplicated or lost at different points in the evolution of species? Which protein-encoding genes have evolved particularly rapidly, potentially changing function under selective pressure, on any particular lineage?

How do we answer these questions?
We will need a full bioinformatics pipeline to address these questions systematically. The first few steps in this pipeline identify sets of related genes (based upon sequence similarity) from the same genome and from the genomes of other species; each such set is a gene family. A tool called BLAST is used to do this efficiently. A sample BLAST output from the NCBI website is shown below.
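The gene-family step can be sketched as follows, assuming BLAST was run with its tabular output (`-outfmt 6`), in which the first two columns are the query and subject IDs and the eleventh column is the E-value. The function name `cluster_families`, the E-value cutoff, and the sample hit lines are illustrative assumptions rather than part of the tutorial, and single-linkage grouping is only one simple way to turn pairwise hits into families.

```python
from collections import defaultdict

def cluster_families(blast_tab_lines, max_evalue=1e-5):
    """Group genes into families by single linkage over significant BLAST hits."""
    parent = {}

    def find(gene):
        # Union-find lookup: follow parent links up to the family representative.
        parent.setdefault(gene, gene)
        root = gene
        while parent[root] != root:
            root = parent[root]
        parent[gene] = root
        return root

    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        root_q, root_s = find(query), find(subject)
        # Join the two genes only when the hit passes the (assumed) cutoff.
        if root_q != root_s and evalue <= max_evalue:
            parent[root_q] = root_s

    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: members[0])

# Made-up hits for illustration only (gene names and scores are hypothetical).
example_hits = [
    "sp1_geneA\tsp2_geneA\t92.0\t300\t24\t0\t1\t300\t1\t300\t1e-150\t520",
    "sp2_geneA\tsp3_geneA\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-160\t540",
    "sp1_geneB\tsp2_geneB\t40.0\t80\t48\t0\t1\t80\t5\t84\t0.5\t30",
]
families = cluster_families(example_hits)
```

Here the three `geneA` sequences end up in one family because they are connected by significant hits, while the weak `geneB` hit is ignored and those two genes stay separate.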

Multiple Sequence Alignment
Once a gene family has been obtained, the next step is to produce a multiple sequence alignment. The alignment shows the historical relationship of each amino acid position in a given protein to the corresponding position in every other protein, and it identifies positions that do not share a historical relationship (are not homologous). The alignment at right, taken from als/index.php/eb/article/view/e b.2010.e7/2536, shows a sample multiple sequence alignment.
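As a rough illustration of how alignment places homologous positions in the same column, here is a minimal sketch of Needleman-Wunsch global alignment for two sequences. It is pairwise only; multiple alignment programs typically build on this kind of step and use amino acid substitution matrices and more refined gap penalties. The match, mismatch, and gap scores and the two short peptide strings are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Hypothetical peptide fragments; the second lacks one residue of the first.
aligned_a, aligned_b, alignment_score = needleman_wunsch("MKTAYIAK", "MKTYIAK")
```

In the result, the residue with no counterpart in the shorter sequence is paired with a gap character (`-`), which is how an alignment marks a position without a homologous partner.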

Phylogenies (Gene Trees)
Phylogenetic trees show the relationship of taxa, which can be species or genes, to each other. The vertical direction in the trees shown approximates time, and following the branching pattern shows the relationships. Internal points (nodes) that connect taxa reflect their most recent common ancestor. There are several ways to estimate phylogenetic trees from patterns of sequence similarity, most commonly using an explicit mathematical model for the evolutionary process of sequence divergence. An example tree derived from an alignment is shown below.
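One simple way to go from an alignment to a tree can be sketched as follows: compute pairwise distances under the Jukes-Cantor model (an explicit, very simple model of nucleotide substitution) and cluster them with UPGMA. This is a distance-based stand-in chosen for brevity, not necessarily the method used for the tree in the slide; UPGMA assumes roughly clock-like evolution, and the four toy DNA sequences and their names are invented.

```python
import math
from itertools import combinations

def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned DNA sequences (gap columns ignored)."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    p = sum(x != y for x, y in pairs) / len(pairs)  # proportion of differing sites
    return -0.75 * math.log(1 - 4 * p / 3)

def upgma(dist, names):
    """Cluster taxa by UPGMA; dist maps frozenset({a, b}) -> distance.

    Returns the tree topology as a Newick string without branch lengths.
    """
    d = dict(dist)
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Distance to the new cluster is the size-weighted average.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"

# Toy aligned sequences (invented for illustration, not real data).
toy_alignment = {
    "seqA": "ACGTACGTACGTACGTACGT",
    "seqB": "ACGTACGTACGTACGTACGA",
    "seqC": "ACGTACGTACGTACGAACCT",
    "seqD": "TCGAACGAACGTTCGTAGGT",
}
toy_distances = {
    frozenset((a, b)): jukes_cantor_distance(toy_alignment[a], toy_alignment[b])
    for a, b in combinations(toy_alignment, 2)
}
newick = upgma(toy_distances, list(toy_alignment))
```

The two most similar sequences are joined first, so they share the most recent common ancestor in the resulting tree, and the most divergent sequence joins last.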

Reconciling gene trees and species trees
From large collections of genome sequence data, we have established relationships among many species. These have been assembled into phylogenetic trees, called species trees. We have also seen that gene trees can be built for individual genes. Comparing a gene tree with the species tree tells us where particular genes originated, were duplicated, or were lost. The totality of this information tells us which sets of genes changed along particular species tree lineages. A sample reconciliation of primate genes, taken from the PrIMETV website, is shown.
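A minimal sketch of one common reconciliation idea, often called LCA mapping, is given below. Each gene tree node is mapped to the smallest species tree clade that contains all the species below it, and a node is labelled a duplication when it maps to the same clade as one of its children. Trees are written as nested tuples, the `species_gene` naming convention and the example trees are assumptions made for illustration, and inferring losses is left out.

```python
def clades(tree):
    """List the leaf set of every node in a nested-tuple tree (node's own set last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, leaves = [], frozenset()
    for child in tree:
        child_clades = clades(child)
        found.extend(child_clades)
        leaves |= child_clades[-1]
    found.append(leaves)
    return found

def smallest_clade(species_clades, species_set):
    """Smallest species-tree clade containing species_set (its last common ancestor)."""
    return min((c for c in species_clades if species_set <= c), key=len)

def reconcile(gene_tree, species_tree, species_of=lambda gene: gene.split("_")[0]):
    """Label each internal gene-tree node as 'speciation' or 'duplication'."""
    species_clades = clades(species_tree)
    events = []

    def visit(node):
        if isinstance(node, str):
            return smallest_clade(species_clades, frozenset([species_of(node)]))
        child_maps = [visit(child) for child in node]
        mapping = smallest_clade(species_clades, frozenset().union(*child_maps))
        # A node mapping to the same clade as a child implies a duplication.
        kind = "duplication" if mapping in child_maps else "speciation"
        events.append((kind, sorted(mapping)))
        return mapping

    visit(gene_tree)
    return events

# Hypothetical example: two gene copies in both human and chimp, one in gorilla.
species_tree = (("human", "chimp"), "gorilla")
gene_tree = ((("human_1", "chimp_1"), ("human_2", "chimp_2")), "gorilla_1")
events = reconcile(gene_tree, species_tree)
```

In this made-up case the node joining the two human/chimp gene pairs is labelled a duplication on the lineage ancestral to human and chimp, while the other internal nodes are speciations.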

Detecting Positive Diversifying Selection
Because the genetic code has 64 codons but only 20 amino acids (61 codons encode amino acids and 3 signal stop), most amino acids are encoded by more than one codon, so there are two types of nucleotide changes in protein-coding regions of genomes. Synonymous substitutions change the codon to one that still encodes the same amino acid. Nonsynonymous substitutions change the codon to one that encodes a different amino acid. If a protein is evolving without selective pressure, one expects the rates of these two types of change to be the same. However, because proteins already function, many amino acid mutations make proteins worse and are eliminated from the population by selection. This makes the rate of synonymous change faster than the rate of nonsynonymous change. In rare cases, however, the nonsynonymous rate is faster. This can be an indication of selective pressure to change protein function on a particular gene tree branch. The figures on the next slide show the genetic code and a lineage of a phylogenetic tree with the ratio biased towards more nonsynonymous change.
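The distinction between the two kinds of change can be made concrete with a small counting sketch. It builds the standard genetic code, classifies codon differences between two aligned coding sequences, and normalizes by approximate numbers of synonymous and nonsynonymous sites, in the spirit of simple counting methods. It is a deliberately simplified assumption-laden example: codons that differ at more than one position are skipped, no correction for multiple substitutions is applied, and the two short sequences are invented. Real analyses estimate the ratio along tree branches with statistical codon models.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_sites(codon):
    """Approximate number of synonymous sites in a codon (between 0 and 3)."""
    unchanged = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    unchanged += 1
    # Each position has 3 possible changes, so each one counts for 1/3 of a site.
    return unchanged / 3

def compare_coding_sequences(seq1, seq2):
    """Return (synonymous diffs, nonsynonymous diffs, ratio of per-site rates)."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, ungapped, and a multiple of 3 long")
    syn_diffs = nonsyn_diffs = 0
    syn_sites = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        # Simplification: only codons with exactly one changed base are classified.
        if sum(x != y for x, y in zip(c1, c2)) == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = len(seq1) - syn_sites
    p_syn = syn_diffs / syn_sites if syn_sites else 0.0
    p_nonsyn = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    ratio = p_nonsyn / p_syn if p_syn > 0 else None  # undefined without synonymous change
    return syn_diffs, nonsyn_diffs, ratio

# Invented sequences: GCT->GCC keeps alanine, AAA->AGA changes lysine to arginine.
syn, nonsyn, ratio = compare_coding_sequences("ATGGCTAAAGGT", "ATGGCCAGAGGT")
```

A ratio below 1, as in this toy comparison, is the usual pattern when selection removes most amino acid changes; a ratio above 1 on a lineage is the signal described above.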

Figures for Positive Diversifying Selection
On the left is a figure of the genetic code taken from GeneticCode21-version-2.svg.png. The multiple codons that encode each individual amino acid (all except M and W, which have a single codon each) are shown. On the right is an image showing the ratio of nonsynonymous to synonymous rates for the myostatin gene, where the ratio is greater than 1 on the grey lineages of the species tree for this single-copy gene. This is taken from Mol. Phylo. Evol. 33:782 (2004).
