Using phylogenetic profiles to predict protein function and localization As discussed by Catherine Grasso
Papers: Pellegrini, et al. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. (1999) PNAS 96, 4285-4288. Marcotte, et al. Localizing proteins in the cell from their phylogenetic profiles. (2000) PNAS 97, 12115-12120.
Basic Idea: Sequence alignment is a good way to infer protein function when two proteins do exactly the same thing in two different organisms. Proteins with > 30% sequence identity generally share the same fold, and typically the same function.
Basic Idea: But can we decide whether two proteins function in the same pathway, such as histidine biosynthesis, or in the same biomolecular structure, such as the flagellum or the ribosome, even if they don’t do exactly the same thing? Yes. Assume that if two proteins function together, they must evolve in a correlated fashion, so every organism that has a homolog of one of the proteins must also have a homolog of the other.
Phylogenetic Profile: For a given protein, BLAST against N sequenced genomes. Construct a vector with N coordinates. If the protein has a homolog in organism n, set coordinate n to 1. Otherwise set it to 0. Protein P1: 0 0 1 0 1 1 0 0
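The profile construction above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `binary_profile`, the genome labels, the example E-values, and the significance cutoff of 1e-5 are all hypothetical choices made for the example.

```python
from typing import Dict, List


def binary_profile(best_evalues: Dict[str, float],
                   genomes: List[str],
                   cutoff: float = 1e-5) -> List[int]:
    """Return a 0/1 vector with one coordinate per genome.

    best_evalues maps a genome name to the E-value of the best BLAST hit
    found there. Genomes with no hit are simply absent from the dict.
    The cutoff is an assumed significance threshold, not a value from the talk.
    """
    return [1 if best_evalues.get(g, float("inf")) < cutoff else 0
            for g in genomes]


# Hypothetical example: eight genomes, significant hits in genomes 3, 5 and 6.
genomes = [f"genome_{i}" for i in range(1, 9)]
hits = {"genome_3": 1e-40, "genome_5": 3e-12, "genome_6": 2e-8, "genome_2": 0.5}
p1 = binary_profile(hits, genomes)
print(p1)  # [0, 0, 1, 0, 1, 1, 0, 0], matching Protein P1 on the slide
```

The weak hit in `genome_2` (E = 0.5) does not pass the assumed cutoff, so that coordinate stays 0.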
Functional Link Assign a degree of functional linkage between P1 and P2 based on the number of positions (or bits) at which their profiles differ. Protein P2: 0 1 1 0 1 1 0 0 Protein P1: 0 0 1 0 1 1 0 0
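Counting the positions at which two profiles differ is the Hamming distance. A minimal sketch using the two profiles from the slide (the function name `hamming_distance` is just an illustrative choice):

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(a, b))


P1 = [0, 0, 1, 0, 1, 1, 0, 0]
P2 = [0, 1, 1, 0, 1, 1, 0, 0]
print(hamming_distance(P1, P2))  # 1: the profiles differ only at the second genome
```

A small distance suggests a functional link; how small is small enough depends on N and is not specified on the slide.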
What They Did: Computed phylogenetic profiles for 4,290 proteins in E. coli. Aligned each protein sequence P_i with the proteins from 16 other fully sequenced genomes. Genome n is defined as containing a homolog of P_i if one of its proteins aligns to P_i with a score that is deemed statistically significant.
Conclusions: Comparing profiles is a useful tool for identifying the complex or pathway in which a protein participates. As the number of fully sequenced genomes increases, scientists will be able to construct longer, more informative profiles. In 1999, 100 more genomes were due to be completed in the next few months. This suggests that as eukaryotic genomes come out, profiles will be a useful tool for studying pathways in higher organisms.
Evolutionary Origin of Eukaryotic Cell Mitochondria, chloroplasts and perhaps other organelles descended from microbes captured by progenitors of eukaryotic cells. You exist because of a bad case of indigestion!
Evolutionary Origin of Eukaryotic Cell This endosymbiosis was stabilized by the shifting of organelle genes into the nuclear genome and by the establishment of transport systems to shuttle organellar proteins from the cytoplasm into the organelles. Contemporary mitochondrial genomes encode only a few genes (<20), primarily large integral membrane proteins which can’t be transported.
Evidence: Proteins of these organelles have molecular properties resembling prokaryotic rather than eukaryotic proteins: 1. Average lengths 2. Domain composition 3. Amino acid composition 4. Homologs among prokaryotes
Phylogenetic profiles Will show that proteins with similar phylogenetic profiles localize to similar subcellular locations. Actually, will primarily show this for the mitochondria.
Calculating phylogenetic profiles: In this study, the value at each position of the profile is equal to -1/log E, where E is the BLAST expectation value of the best-matching protein in a genome. This is calculated only for E < 1 × 10^-6; otherwise the value is set to 1.0. So a value near zero is a very strong match and one is no match.
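A short sketch of this real-valued profile entry may help. The slide does not state the base of the logarithm, so base 10 is an assumption here, as are the function name `profile_value` and the handling of E = 0 (which BLAST reports for extremely strong hits).

```python
import math
from typing import Optional


def profile_value(e_value: Optional[float], threshold: float = 1e-6) -> float:
    """Profile entry -1/log(E) for significant hits, 1.0 otherwise.

    Assumptions (not from the talk): log is base 10, a missing hit is
    passed as None, and E == 0 is mapped to 0.0 (the limit as E -> 0).
    """
    if e_value is None or e_value >= threshold:
        return 1.0  # no significant match in this genome
    if e_value <= 0.0:
        return 0.0  # limit of -1/log10(E) as E approaches zero
    return -1.0 / math.log10(e_value)


for e in (None, 1e-3, 1e-10, 1e-100, 0.0):
    print(e, profile_value(e))
```

Under the base-10 assumption, stronger hits map to values closer to zero (for example, E = 1e-100 gives 0.01), while anything at or above the threshold is set to 1.0.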
Three Categories: Prokaryote Derived: Only has homologs in prokaryotes. Eukaryote Derived: Only has homologs in eukaryotes. Organism Specific: Has no homologs. Why split these categories? They should have different functions and roles in mitochondria.
Linear Discriminant Functions [Figure: MP vs. Non-MP, separated by threshold t] Varying t increases prediction accuracy at the expense of coverage.
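The accuracy/coverage trade-off can be illustrated with a toy threshold sweep. Everything here is an assumption for illustration: the scores and labels are made up, a protein is called MP when its discriminant score is at least t, accuracy is taken as the fraction of MP calls that are correct, and coverage as the fraction of true MP proteins that are recovered. The talk does not spell out these definitions.

```python
from typing import List, Tuple


def accuracy_and_coverage(scores: List[float],
                          is_mp: List[bool],
                          t: float) -> Tuple[float, float]:
    """Accuracy and coverage of the rule 'predict MP if score >= t'."""
    predicted = [s >= t for s in scores]
    tp = sum(1 for p, y in zip(predicted, is_mp) if p and y)
    fp = sum(1 for p, y in zip(predicted, is_mp) if p and not y)
    total_mp = sum(is_mp)
    accuracy = tp / (tp + fp) if (tp + fp) else float("nan")
    coverage = tp / total_mp if total_mp else float("nan")
    return accuracy, coverage


# Made-up discriminant scores and true labels for six proteins.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
is_mp = [True, True, False, True, False, False]

for t in (0.2, 0.5, 0.85):
    acc, cov = accuracy_and_coverage(scores, is_mp, t)
    print(f"t={t:.2f}  accuracy={acc:.2f}  coverage={cov:.2f}")
```

On this toy data, the loosest threshold recovers every MP protein at 60% accuracy, while the strictest makes only correct calls but recovers a third of them.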
Testing Algorithm First, predicted the location of yeast proteins of known location (open diamonds). Second, a jackknife test was performed. Repeated 100 times with different random sets (filled diamonds). Coverage 58% at 50% accuracy. Third, used yeast proteins as training set and worm proteins as test set. Coverage 65% at 50% accuracy.
Prediction: Applied algorithm to all yeast proteins. Estimate ~630 total mitochondrion-targeted genes in yeast, or 10% of the genome. Applied algorithm to all worm proteins. Estimate ~660 total mitochondrion-targeted genes in worms, or 4% of the genome.
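As a quick sanity check on these percentages, the gene counts they imply can be back-calculated. The resulting totals are derived here from the slide's rounded figures and are not numbers stated in the talk.

```python
# Back-calculate the total gene count implied by each estimate.
estimates = {
    "yeast": (630, 0.10),  # (predicted mitochondrial genes, fraction of genome)
    "worm": (660, 0.04),
}

implied_totals = {
    organism: round(count / fraction)
    for organism, (count, fraction) in estimates.items()
}
print(implied_totals)  # {'yeast': 6300, 'worm': 16500}
```

So a similar absolute number of predicted mitochondrial genes corresponds to a much smaller share of the larger worm gene set.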
Verifications: Tested whether the functions of newly predicted mitochondrial proteins matched the functions of known mitochondrial proteins better than the functions of a random set of proteins. (Jaccard Coefficient, Pie Charts) Checked the fraction of predicted mitochondrial proteins with predicted transmembrane segments or signal peptides. 2D gel of whole rat liver and human placental mitochondria reveals ~250-350 visible proteins.
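For reference, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below shows only this standard definition. How the study built the sets being compared is not described on the slide, so the functional category labels are invented.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Invented functional categories for two groups of proteins.
predicted = {"respiration", "translation", "transport"}
known = {"respiration", "transport", "signaling"}
print(jaccard(predicted, known))  # 0.5: two shared categories out of four in total
```

A value of 1.0 means the two sets are identical and 0.0 means they share nothing, so a predicted set that resembles the known one more than a random set does should score higher.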
Conclusions: There is information in the phylogenetic profiles, but it is quite noisy. Yields approximate numbers of genes that migrated to the nuclear genomes from the mitochondria. Gives even more evidence for endosymbiotic theory. However, verifications did not confirm results as much as one might like. Perhaps the fundamental assumption is flawed.