
Comparing the Complete S. cerevisiae and C. elegans Proteomes: Orthology and Divergence
Stephen A. Chervitz
Saccharomyces Genome Database, NCBI, Boston University


2. Goals of this study
- Explore protein sequence and domain conservation between S. cerevisiae and C. elegans.
  - Unicellular vs. multicellular lifestyles
- Classify yeast and worm similarity groups using the functional annotation of yeast genes.
- Enhance the SGD website and add value to the worm genomic sequence.

3. Organization of this study
- Shared core biology
  - Whole-protein sequence comparisons
- Divergence
  - Protein domain comparisons
- No gene predictions
- No mitochondrial sequence

4. Definitions
- Orthologs: genes from different species that perform the same biological function and likely evolved from a common ancestral gene.
- Paralogs: genes in the same species that perform different biological functions and likely arose by duplication and divergence from a common ancestral gene.

5. Genome Scorecards
[Images of both organisms, labeled x200 and X20,000]

Feature                 S. cerevisiae   C. elegans
No. of cells            1               ~1000
Size (Mbp)              12              97
Chromosomes             16              6
Predicted ORFs          6,217           19,099
Percent coding          72%             27%
ORFs with gene names    3,344 (53%)     688 (4%)

6. Core biology is carried out by similar numbers of proteins

7. Building a Biological Rosetta Stone

P-value   Yeast ORFs with functional description   Worm orthologs with functional description
1e-10     86%                                      64%
1e-20     89%                                      69%
1e-40     93%                                      61%
1e-60     96%                                      74%
1e-80     96%                                      74%
1e-100    98%                                      77%
1e-200    98%                                      88%
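The Rosetta Stone table reports, for several P-value cutoffs, what percentage of the matched yeast ORFs and worm orthologs carry a functional description. The following is a minimal Python sketch of that tabulation. It assumes the comparison yields (yeast ORF, worm ORF, P-value) pairs and that annotation status is available as a set of IDs. The function name `annotation_coverage`, the IDs, and the sample values are all hypothetical; this is not the study's code or data.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input record: (yeast ORF ID, worm ORF ID, BLAST P-value).
Hit = Tuple[str, str, float]


def annotation_coverage(
    hits: Iterable[Hit],
    yeast_annotated: Set[str],
    worm_annotated: Set[str],
    cutoffs: List[float],
) -> Dict[float, Tuple[float, float]]:
    """For each cutoff, return (% yeast ORFs annotated, % worm ORFs annotated)
    among the ORFs in pairs whose P-value is at or below that cutoff."""
    hits = list(hits)  # allow several passes over the input
    table: Dict[float, Tuple[float, float]] = {}
    for cutoff in cutoffs:
        yeast = {y for y, _, p in hits if p <= cutoff}
        worm = {w for _, w, p in hits if p <= cutoff}
        if not yeast:  # no pairs pass this cutoff
            table[cutoff] = (0.0, 0.0)
            continue
        table[cutoff] = (
            100.0 * len(yeast & yeast_annotated) / len(yeast),
            100.0 * len(worm & worm_annotated) / len(worm),
        )
    return table


# Toy example with made-up IDs and P-values.
sample_hits = [("Y1", "W1", 1e-50), ("Y2", "W2", 1e-15), ("Y3", "W3", 1e-120)]
coverage = annotation_coverage(
    sample_hits,
    yeast_annotated={"Y1", "Y3"},
    worm_annotated={"W3"},
    cutoffs=[1e-10, 1e-100],
)
for cutoff, (yeast_pct, worm_pct) in coverage.items():
    print(f"{cutoff:g}\t{yeast_pct:.0f}%\t{worm_pct:.0f}%")
```

With these toy inputs, the 1e-10 row is 67% / 33% and the 1e-100 row is 100% / 100%. A stricter cutoff keeps fewer pairs, so the percentages describe a smaller, more confidently matched set.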

8. Distribution of core biological functions conserved in both yeast and worm

9. Core Biological Functions
- Signal Transduction: kinases, phosphatases, Ras superfamily and other GTP-binding proteins, GDP/GTP exchange factors, ADP-ribosylation factors, adenylyl/guanylyl cyclases, phosphatidylinositol kinases, EF-hand proteins
- DNA/RNA Metabolism: polymerases, helicases, topoisomerases, repair/recombination-related proteins, nucleases, primases, splicing factors, initiation/elongation factors (transcription & translation), tRNA synthetases, histone acetylases/deacetylases
- Transport & Secretion: ABC transporters, permeases, vesicle coat & fusion proteins, clathrin-associated proteins, protein targeting, signal recognition particle, nuclear pore-associated proteins
- Cytoskeletal: actin, myosin, tubulin, actin-related proteins, actin-interacting proteins, septins, cytokinesis-related proteins

10. Core Biological Functions (cont’d)
- Ribosomal: ribosomal proteins (small & large subunit), ribosome processing proteins
- Protein Folding and Degradation: heat shock proteins, chaperonins, proteasome subunits, ubiquitin-related proteins, peptidyl-prolyl cis-trans isomerases, protein disulfide isomerases, aminopeptidases, post-translational modifying enzymes (farnesyltransferase, myristoyltransferase, glycosylation, GPI-anchoring)
- Intermediary Metabolism: dehydrogenases, reductases, mutases, lyases, isomerases, carboxylases, decarboxylases, nucleotide biosynthetic enzymes, transaminases, deaminases, epimerases, oxygenases, cytochromes, flavoproteins

11. Constructing Sequence Similarity Groups

12. Similarity Groups: MCM DNA replication initiator complex

13. Similarity Groups: Tubulin

14. Multiple Sequence Alignments

15. Domain Analysis
- 122 common eukaryotic protein domains
- Associated with regulation of gene expression and signal transduction
- Compare domain occurrence and domain architectures in yeast and worm protein sequences.
- Detect domains with position-dependent weight matrices (profiles) built with PSI-BLAST.
- Classify domains as worm-only, yeast-only, or shared.
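Once each protein's domains are known, sorting the domains into worm-only, yeast-only, and shared classes is a set comparison. The sketch below assumes that domain detection has already produced, for each proteome, a mapping from protein ID to the set of domain names found in that protein. The protein IDs and the function name `classify_domains` are hypothetical. The domain names are borrowed from the slides that follow and are used here only for illustration.

```python
from typing import Dict, Set


def classify_domains(
    yeast_domains: Dict[str, Set[str]],
    worm_domains: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Split domain names into yeast-only, worm-only, and shared classes.

    Each argument maps a protein ID to the set of domains detected in it.
    """
    # Collect every domain seen anywhere in each proteome.
    in_yeast: Set[str] = set().union(*yeast_domains.values())
    in_worm: Set[str] = set().union(*worm_domains.values())
    return {
        "yeast_only": in_yeast - in_worm,
        "worm_only": in_worm - in_yeast,
        "shared": in_yeast & in_worm,
    }


# Toy input with hypothetical protein IDs.
yeast = {"yp1": {"C6"}, "yp2": {"Protein kinase catalytic", "SH3"}}
worm = {
    "wp1": {"Nuclear hormone receptor"},
    "wp2": {"Protein kinase catalytic", "SH3", "Cadherin"},
}
classes = classify_domains(yeast, worm)
for name in ("yeast_only", "worm_only", "shared"):
    print(name, sorted(classes[name]))
```

Keeping the per-protein mapping, rather than a flat list of domain names, also preserves each protein's domain architecture for the architecture comparison mentioned above.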

16. Worm-Only Domains
- Nuclear hormone receptors
- Epidermal growth factor
- Degenerins
- FMRFamides (neuropeptides)
- Cadherin
- PTB (phosphotyrosine binding)
- T-box, SMAD (transcription factor domains)
- Insulin-like peptides
- Laminin NT

17. Yeast-Only Domains
- C6 (Zn-binding cluster)
- ASPES (DNA-binding)

18. Shared Domains (Yeast & Worm)
- Protein kinase catalytic
- C2H2 finger
- AAA ATPase
- DAG kinase
- Arrestin
- Ankyrin
- SWI/SNF helicase
- RING finger
- bHLH
- RHO GAP/GEF
- Pleckstrin homology
- SH3
- Ubiquitin
- SH2
- cNMP-signaling domains
- CaM EF-hands
- Homeodomains
- Potassium channels
- 7TM receptors
- HINT
- Immunoglobulin
- LRR
- vWA
- MATH
- POZ
- LIM

19. Frequency of occurrence of common domains
Domain counts are normalized as the number of proteins containing a given domain per 1000 genes.
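The normalization on this slide is simple arithmetic: divide the number of proteins that carry a domain by the total number of genes, then multiply by 1000. The short sketch below assumes that the predicted ORF counts from the Genome Scorecards slide serve as the gene totals. The domain count of 100 is a made-up value, and the function name `per_thousand_genes` is hypothetical.

```python
def per_thousand_genes(proteins_with_domain: int, total_orfs: int) -> float:
    """Number of proteins containing a domain per 1000 predicted ORFs."""
    if total_orfs <= 0:
        raise ValueError("total_orfs must be positive")
    return 1000.0 * proteins_with_domain / total_orfs


YEAST_ORFS = 6217   # predicted ORFs, S. cerevisiae (Genome Scorecards slide)
WORM_ORFS = 19099   # predicted ORFs, C. elegans (Genome Scorecards slide)

# Hypothetical: 100 proteins with some domain in each proteome.
yeast_freq = per_thousand_genes(100, YEAST_ORFS)
worm_freq = per_thousand_genes(100, WORM_ORFS)
print(round(yeast_freq, 2), round(worm_freq, 2))  # 16.08 5.24
```

The worm has roughly three times as many predicted ORFs as yeast, so the same raw count works out to about one third of the yeast frequency. This is why raw counts are not directly comparable between the two proteomes.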

20. Conclusions
- Core biological functions are carried out by orthologous proteins that occur in comparable numbers in yeast and worm.
- These proteins represent approximately 40% of the predicted yeast ORFs and 20% of the predicted worm ORFs.
- Regulatory and signaling proteins in the worm lack yeast orthologs but often share domains with yeast proteins.
- Complete results are available online at SGD: http://genome-www.stanford.edu/Saccharomyces/worm

21. Future Directions
- Incorporate results from more sensitive sequence searches.
- Use a more sophisticated clustering scheme.
  - Handle multi-domain proteins and weak similarities.
- Stay up to date with changes in the genomic datasets:
  - addition/removal of protein-coding regions
  - correction of errors in the genomic sequence
  - sequence name changes
- Extend annotation support.
  - Controlled vocabularies, gene function ontologies
- Build a comparative genomics framework for additional genomes.
- Provide more flexible browsing of genome-wide similarities.
  - Prototype yeast genome protein similarity Java viewer

22. Genome-wide protein similarity view
- Explore protein sequence similarities within or between genomes
- Graphical user interface
- Available at SGD for the yeast genome
- Sequence Resources, Protein Similarity View

24. Acknowledgements
Saccharomyces Genome Database (Stanford)
- Gavin Sherlock
- Cathy Ball
- Selina Dwight
- Midori Harris
- Kara Dolinski
- Shuai Weng
- Eric Hester
- Mike Cherry
- David Botstein

25. Acknowledgements (cont’d)
NCBI (National Library of Medicine)
- L. Aravind
- Eugene Koonin
Boston University
- Scott Mohr
- James Freeman
- Temple Smith
Neomorphic Software (Berkeley)
- www.neomorphic.com

26. Extra slides

27. Single-linkage clustering and multi-domain proteins: “chaining”
[Figure, panels 1 to 3]
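Single-linkage clustering places two proteins in the same group whenever a chain of pairwise similarities connects them. As a result, a multi-domain protein can pull together proteins that share no domain with each other (“chaining”). The sketch below shows this effect using a union-find structure. The P-value cutoff, the protein IDs, and the helper names are assumptions for illustration only, not the study's actual clustering parameters or code.

```python
from typing import Dict, Iterable, List, Tuple


class DisjointSet:
    """Union-find with path halving, used to merge proteins into groups."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # new items start as their own root
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def single_linkage(
    proteins: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    cutoff: float,
) -> List[List[str]]:
    """Link any two proteins with a hit at or below the cutoff and return
    the connected components as sorted lists of protein IDs."""
    ds = DisjointSet()
    for protein in proteins:
        ds.find(protein)  # register proteins that have no hits
    for a, b, p_value in hits:
        if p_value <= cutoff:
            ds.union(a, b)
    components: Dict[str, List[str]] = {}
    for protein in list(ds.parent):
        components.setdefault(ds.find(protein), []).append(protein)
    return sorted(sorted(members) for members in components.values())


# Hypothetical proteins: A has domain X, B has domains X and Y, C has domain Y.
# A and C share no domain, and there is no A-C hit; D is unrelated.
example_hits = [("A", "B", 1e-30), ("B", "C", 1e-25)]
groups = single_linkage(["A", "B", "C", "D"], example_hits, cutoff=1e-10)
print(groups)  # [['A', 'B', 'C'], ['D']]
```

A and C end up in one group even though no A-C hit exists. This is the chaining effect that multi-domain proteins such as B introduce into single-linkage groups.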

28. Whole-genome DNA microarray (DeRisi et al. (1997) Science 278: 680)

29. Building a Biological Rosetta Stone

