PSI Structural Genomics Knowledgebase Helen M. Berman Bottlenecks Workshop April 14, 2008.


3 Knowledgebase Vision
• The PSI Structural Genomics Knowledgebase (PSI SG KB) will turn the products of the PSI effort into major advances in knowledge that can be used to understand living systems and human disease
• The PSI SG KB will be a key resource for the advancement of biology, biochemistry, functional genomics, pharmacology, bioinformatics, chemistry, education and clinical medicine

4 Knowledgebase Goals
To provide a “marketplace of ideas” that
• connects protein sequence information to 3D structures and homology models
• enhances functional annotations
• provides access to new experimental protocols and materials
To kick-start and enable advancements in structural genomics
• by communicating the information and technology advances of the PSI and making them visible and accessible
• through presentation and discussion of the most provocative challenges with the general community
• by fostering community collaborations

5 Scope
To capture, make accessible, and highlight elements of the high-throughput pipelines for general use in the community, and to leverage such information through the generation of hundreds of thousands of molecular models and functional annotations. Standard metrics will be used to measure progress.
[Diagram. Pipeline stages: Genomic Based Target Selection; Isolation, Expression, Purification, Crystallization; Data Collection; Structure Determination; PDB Deposition & Release. Knowledgebase elements: Target Selection, Experimental Tracking, Materials, Models, Annotations, Publications, Metrics, Technology.]

6 Knowledgebase Users
• Biologists
• Biochemists
• Functional Genomicists
• Pharmacologists
• Bioinformaticians
• Chemists
• Clinical Researchers and Physicians
• Teachers and Students

7 KB Site Features
• News and Events
• Molecules of Unknown Function
• Link to Functional Sleuth Gallery
• Featured Structure
• Link to Technology Module
• Technology Feature
• Search by Sequence, Keyword, or PDB ID

8 PSI SG KB Portal
• Collects sequences, common features, and common identifiers
• Maintains correspondences in local database
• Delivers aggregate reports, inventories, and e-publications which contain links to PSI projects, modules and external resources
• Delivers featured articles describing PSI news and events, featured molecules and technologies, and molecules of unknown function
• Provides collaborative environments for discussion, annotation, and target suggestions

9 PSI SG KB Portal
[Diagram. Queries: PDB ID, Sequence, Keyword. Linked resources: PSI Modules, PSI Centers, PSI Info Site, Related Biological Resources, Archival Sequence Databases, Domain Databases (Pfam), Literature (PubMed), TargetDB, PepcDB, PDB, Models Portal. PSI SG KB Portal Databases: TargetDB Sequences, PDB Sequences, Portal Resource Database, Keyword Database.]

10 Modules
Modules derived from PSI information and external resources
• Target Selection & Experimental Data Tracking
• Materials Repository
• Models
• Annotation
• Metrics
• Technology
• Outreach

11 Target Selection & Experimental Data Tracking
• Target Selection – PSI-2 BIG4
  • Family definitions and target management
• TargetDB
  • Search by sequence, Target ID, project site, status, update date, protein name, and source organism
  • Links to other sequence databases, domain databases, other structural genomics centers, and PDB
  • Download target data
  • Target statistics summary
• PepcDB
  • All the functionality of TargetDB plus
    – Experimental protocols
    – Detailed status history of experimental trials
    – Information on failed experiments

12 Experimental Tracking
[Screenshots: PepcDB Search Form; Protocol Keywords Search]

16 Materials Repository

17 PSI Materials Repository Module

19 Modeling Portal
Current Phase 1 Model Portal contains
• Models from 4 PSI centers and 2 public model databases (SwissModel and ModBase) integrated on a common UniProt reference system.
• Current release consists of 5.8 million comparative protein models for 1.97 million distinct UniProt entries.
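As a rough reading of the release figures above, the average number of comparative models per UniProt entry can be worked out directly. The sketch below uses only the two rounded totals quoted on the slide; the function name is hypothetical and the result is only as precise as those rounded inputs.

```python
# Rounded totals quoted on the Modeling Portal slide.
TOTAL_MODELS = 5.8e6        # comparative protein models in the current release
UNIPROT_ENTRIES = 1.97e6    # distinct UniProt entries covered by those models


def models_per_entry(total_models: float, entries: float) -> float:
    """Average number of models per covered UniProt entry (hypothetical helper)."""
    if entries <= 0:
        raise ValueError("entries must be positive")
    return total_models / entries


average = models_per_entry(TOTAL_MODELS, UNIPROT_ENTRIES)
print(f"About {average:.1f} models per UniProt entry")
```

An average near three models per entry is consistent with the portal aggregating several sources (four PSI centers plus two public databases) onto one reference system, although the slide does not say how the models are distributed across entries.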




23 Modeling Portal

24 Metrics Module
• Provides objective measures of the progress and output of the PSI project
• Centered around “Goals and Milestones” document

25 PSI-2 Summary Statistics (updated April 1, 2008)
I.1.A  Number of novel experimental PSI-2 structures: 1031
I.1.B  Number of distinct experimental PSI-2 structures (non-redundant sequences): 1428
I.1.D  Total number of experimental PSI-2 structures: 1628
I.1.E  Number of experimentally determined distinct residues: 319977
       Number of experimentally determined novel residues: 225518
I.2.J  Number of experimental structures of human proteins: 61
I.2.K  Number of experimental structures of eukaryotic proteins: 186
I.2.M  Number of experimental structures of membrane proteins: 1
I.2.N  Number of experimental structures determined at the atomic level using X-ray crystallography: 1484
       Number of experimental structures determined at the atomic level using NMR methods: 144
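The summary figures above can be cross-checked with a little arithmetic: the X-ray and NMR counts should add up to the total number of structures, and each method's share follows from the same numbers. The sketch below is only an illustration; the dictionary keys and helper names are hypothetical, and the values are copied from the slide.

```python
# Figures copied from the PSI-2 summary statistics slide (updated April 1, 2008).
# The key names are illustrative labels, not identifiers used by the PSI SG KB.
stats = {
    "novel_structures": 1031,
    "distinct_structures": 1428,
    "total_structures": 1628,
    "xray_structures": 1484,
    "nmr_structures": 144,
}


def method_counts_match_total(s: dict) -> bool:
    """True if the X-ray and NMR counts sum to the reported total."""
    return s["xray_structures"] + s["nmr_structures"] == s["total_structures"]


def share(part: int, whole: int) -> float:
    """Percentage that `part` represents of `whole`."""
    return 100.0 * part / whole


print("Method counts consistent:", method_counts_match_total(stats))
print(f"X-ray share: {share(stats['xray_structures'], stats['total_structures']):.1f}%")
print(f"NMR share:   {share(stats['nmr_structures'], stats['total_structures']):.1f}%")
print(f"Novel share: {share(stats['novel_structures'], stats['total_structures']):.1f}%")
```

The two method counts do sum to the reported total (1484 + 144 = 1628), which suggests the I.2.N rows partition the I.1.D total by experimental method.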

26 PSI-2 Summary Statistics for Domain and Modeling Leverage
I.1.C  Number and Size of BIG Domain Families for which PSI-2 provides the first Experimental Structure Representative: 474
       Number and Size of MEGA Domain Families for which PSI-2 provides the first Experimental Structure Representative: 399
I.1.E  Number of Experimentally Determined Distinct BIG Family Residues: 76579
       Number of Experimentally Determined Distinct MEGA Family Residues: 76121
I.3.A  Total Modeling Leverage: 583735
I.3.B  Novel Modeling Leverage: 114407
Updated January 15, 2008; updated February 21, 2008

27 Technology Module
PSI Centers are actively developing technologies and methodologies for all aspects of the structure determination pipeline
[Diagram. Pipeline stages: Genomic Based Target Selection; Isolation, Expression, Purification, Crystallization; Data Collection; Structure Determination; PDB Deposition & Release; Functional Annotation; Publication.]

28 Technology Module Progress
• Phase 1 Technology Portal in place
• Summary information from all PSI Centers
• Keyword search from KB portal

35 Outreach Module
Provides information to the public about the products and accomplishments of the PSI
• Media reports
• Publications
• Community activities
• Plans for a Nature Gateway

37 Current Annotation Module
• 10 PSI Interactive Services for Sequence, Structure and Functional Annotations
• 11 PSI Galleries and Summaries of Sequence, Structure and Functional Annotations
• 35 other resources for annotation
Provides paths to unravel sequence, structure, function relationships

38 Annotation Module

40 Biological Annotation of Novel Proteins
March 7-8, 2008, Calit2, UCSD
• Participants
  • PSI groups
  • Annotation system authors
  • General biological community
• Outcome
  • Recommendations for standard annotations
  • Processes for community input

41 Standard Annotations
• Genomic features: gene identifier, name and synonyms, operon/regulon mappings
• Protein sequence features: amino acid sequence, taxonomy & phylogeny, sequence database accession, isoform, SNPs, PTMs, sequence families, residue conservation
• Structure features: oligomeric state, structure and functional domains, DNA binding motifs, nests & clefts, sites of interaction, residue regions of protein-protein, ligand-protein, catalytic sites, secondary structure, structural neighbors and comparison of groups of structures with a common feature, properties/features mapped to 3D and their similarities (e.g. electrostatics, cavities, conserved residues, quality assessment)
• Ligands: chemical structure, interactions, functional role
• Functional classification: GO, FunCat, EC, epitope mapping, cellular location, organ location, substrate specificity, disease involvement
• Mapping to Biological Systems: mapping to networks and pathways (e.g. Reactome, KEGG, HPRD, BioCyc, NetPath, MINT, MIPS, DIP, STRING, STITCH, PROLINKS)
• Literature: synonyms for protein names, links to PubMed by database identifier and related text and authors

42 Future Improvements
Experimental Data Tracking
• Standardization of the protocols in PepcDB
• PepcDB data deposition tool
• Integration with the Materials Repository
Materials Repository
• Searchable database of clones
• Ordering system
• Integration with PepcDB and PSI SGKB
Models Module
• Public web service interface
• Additional quality assessment
• Interactive homology modeling

43 Future Improvements
Technology Module
• Improved navigation over technology topic areas
• Keyword search option of descriptions and publications
PSI SGKB
• Integration with Nature Gateway
• Simple presentation and search of standard annotations
• Incorporation of data about ligands and modified residues
• Molecular visualization tool

44 Acknowledgements
KB Team
• Wendy Tao
• Raship Shah
• James Chun
• John Westbrook
Modules
• Torsten Schwede (Models)
• Andrei Kouranov (Exp. Data Tracking)
• Paul Adams (Technology)
• Wladek Minor (Publications)
• Josh La Baer (Materials)
• Rajesh Nair (Metrics)
Access Information

