Data Integration and Management: A PDB Perspective


1 Data Integration and Management: A PDB Perspective
http://www.pdb.org/ info@rcsb.org

2 What is PDB?
Single international repository of three-dimensional data for biological macromolecules
Public community resource
Established at Brookhaven in 1971 (7 structures)
Moved to RCSB in 1998
wwPDB established in 2004
> 25,000 structures in PDB

3 Community
Scientific Community - at all levels
–Structural biologists (crystallography, NMR, cryo-EM)
–Biologists
–Computational biologists
Journals
General Community
–Secondary school
–General public
Internal
–RCSB PDB staff
–wwPDB members

4 Data Representation
Macromolecular Crystallographic Information Framework
XML DTD/Schema Mapping
SQL Schema Mapping
CORBA IDL Mapping
Supporting emerging ontology representations - OWL
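
As a rough illustration of what an XML mapping of mmCIF-style data might look like, the Python sketch below groups items by the category part of their names (the text between the leading underscore and the first dot) and emits one XML element per category. The `datablock` root element, the function names and the sample values are assumptions made for this example; the slide does not describe the actual XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def split_item_name(item_name: str) -> Tuple[str, str]:
    """Split an mmCIF-style item name such as '_cell.length_a'
    into its category ('cell') and attribute ('length_a')."""
    category, _, attribute = item_name.lstrip("_").partition(".")
    return category, attribute


def to_xml(items: Dict[str, str]) -> str:
    """Map a flat dict of item names to values onto nested XML.

    The 'datablock' root element is a hypothetical choice for this sketch.
    """
    root = ET.Element("datablock")
    categories: Dict[str, ET.Element] = {}
    for item_name, value in items.items():
        category, attribute = split_item_name(item_name)
        if category not in categories:
            # First item seen for this category: create its element.
            categories[category] = ET.SubElement(root, category)
        ET.SubElement(categories[category], attribute).text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration.
sample = {
    "_cell.length_a": "78.3",
    "_cell.length_b": "78.3",
    "_exptl.method": "X-RAY DIFFRACTION",
}
xml_text = to_xml(sample)
print(xml_text)
```

Because the category and attribute names come straight from the item names, the same function works for any set of items without code changes.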

5 Elements of Dictionary Metadata
Data Attributes
–Definition
–Examples
–Data type (primitive type/regular expression patterns)
–Range or allowed values
Classes
–Categories
–Subcategories
–Category groups
Associations
–Parent-child relationships
–Interdependencies/exclusivity
–Methods
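
The data-attribute metadata listed above (definition, data type as a regular expression, range, allowed values) can drive validation directly. The following Python sketch shows one way this might work. The `ItemDefinition` class and `validate` function are hypothetical, and the two definitions are simplified illustrations modelled on mmCIF-style item names, not copies of real dictionary entries.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ItemDefinition:
    """Hypothetical, simplified stand-in for one dictionary data attribute."""
    name: str
    definition: str
    pattern: str                           # data type as a regular expression
    minimum: Optional[float] = None        # inclusive lower bound, if any
    maximum: Optional[float] = None        # inclusive upper bound, if any
    allowed: Optional[Sequence[str]] = None  # enumerated values, if any


def validate(item: ItemDefinition, value: str) -> List[str]:
    """Return a list of problems; an empty list means the value passes."""
    if re.fullmatch(item.pattern, value) is None:
        # A type failure makes the range check meaningless, so stop here.
        return [f"{item.name}: {value!r} does not match {item.pattern!r}"]
    problems = []
    if item.allowed is not None and value not in item.allowed:
        problems.append(f"{item.name}: {value!r} is not an allowed value")
    if item.minimum is not None and float(value) < item.minimum:
        problems.append(f"{item.name}: {value} is below {item.minimum}")
    if item.maximum is not None and float(value) > item.maximum:
        problems.append(f"{item.name}: {value} is above {item.maximum}")
    return problems


# Illustrative definitions (assumed for this example).
CELL_LENGTH_A = ItemDefinition(
    name="_cell.length_a",
    definition="Unit-cell length a, in angstroms.",
    pattern=r"-?\d+(\.\d+)?",
    minimum=0.0,
)
EXPTL_METHOD = ItemDefinition(
    name="_exptl.method",
    definition="Experimental method used to determine the structure.",
    pattern=r"[A-Z][A-Z \-]*",
    allowed=("X-RAY DIFFRACTION", "SOLUTION NMR", "ELECTRON MICROSCOPY"),
)

for item, value in [(CELL_LENGTH_A, "78.3"),
                    (CELL_LENGTH_A, "-5.0"),
                    (EXPTL_METHOD, "GUESSWORK")]:
    print(item.name, value, validate(item, value) or "ok")
```

Since the rules live in the definitions rather than in the checking code, swapping in a different dictionary changes what is validated without touching `validate`.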

6 Difficult Issues
Resolving semantic ambiguities – encoding meaning
Integrating controlled vocabularies
Separation of primary and derived information
Supporting rapid evolution of science

7 What’s Driving Data Definition
IUCr-sponsored community effort
Automated data acquisition
Data management and data exchange for PDB
New technologies (e.g. cryo-electron microscopy)
High-throughput structure determination and structural genomics

8 Typical Project Deposition Data Flow
[Diagram. Labels: Target Selection, Protein Production, Structure Determination, PDB Deposition, Merged Project Data, Crystal Production, Project Database, Exchange Dictionary]

9 Data Sharing Nightmare

10 Incremental Data Pipeline

11 Current Integration Strategy
Provide software tools to collect bits of data from the output of each program step
Convert data in log and output files to a common representation
Merge the data corresponding to the successful outcome
Provide an editor tool to enter remaining data and check consistency of results
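
The strategy above can be sketched in Python as follows: parse each step's log into a common key/value representation, merge only the steps that succeeded, and report which required items are still missing for the editor tool to fill in. The "key: value" log format, the step names, the item names and the values are all assumptions invented for this example; real program logs differ and would each need their own parser.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def parse_log(text: str) -> Record:
    """Collect 'key: value' lines from a (hypothetical) program log."""
    record: Record = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() and value.strip():
            record[key.strip()] = value.strip()
    return record


def merge_successful(steps: Iterable[Tuple[str, bool, Record]]) -> Record:
    """Merge records from successful steps; later steps override earlier ones."""
    merged: Record = {}
    for _name, succeeded, record in steps:
        if succeeded:
            merged.update(record)
    return merged


def missing_items(merged: Record, required: Iterable[str]) -> List[str]:
    """List required items that no successful step supplied."""
    return [key for key in required if key not in merged]


# Invented example: one failed phasing attempt is excluded from the merge.
steps = [
    ("data_collection", True, parse_log("wavelength: 0.979\nresolution: 2.1")),
    ("phasing_attempt_1", False, parse_log("phasing_method: MIR")),
    ("phasing_attempt_2", True, parse_log("phasing_method: MAD")),
    ("refinement", True, parse_log("r_factor: 0.19\nresolution: 1.9")),
]
merged = merge_successful(steps)
todo = missing_items(merged, ["wavelength", "resolution", "r_factor", "space_group"])
print(merged)
print("still to enter:", todo)
```

Letting later steps override earlier ones is a design choice made for the sketch; a real pipeline might instead flag conflicting values as a consistency problem.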

12 Data Deposition and Annotation
[Diagram. Labels: PDB ID, Distribution Site, Depositor, Archival Data, Core DB, PDB Entry, ADIT, Annotate, Validate, Depositor Approval, Validation Report, Corrections, Functional Annotation, Step 1 through Step 5]

13 Integrated Data Processing System
[Diagram. Labels: ADIT and ADITsrv (shown three times), Reports, Final Files, MAXIT, Validation, Database Loader, Metadata Dictionaries, Data Views, Client Input Tool, Data Assembled by Depositor]

14 Features of System
Different dictionaries without software changes
Metadata customization of both functionality and content
Automatically scales with changes in content
Can be distributed to multiple deposition sites
Reference data and standard nomenclature (ERFs)
Self-monitoring

15 Data Distribution
[Diagram. Labels: Applications, mmCIF Data Files (Data Reference Standard), API Servers, Relational Database, mmCIF Parsers, XML Files]

16 Automatic Production of Macromolecular Structure API Components
PDB Exchange Dictionary + API Specific Data Dictionaries
CORBA IDL, SQL Schema, XML DTD/Schemas, Data Loaders, Database Access Classes
Metamodel Framework
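
To make the idea of generating components from dictionary metadata concrete, here is a minimal Python sketch that derives a SQL table definition from a category description and loads one row into an in-memory SQLite database. The type-code mapping, the `create_table_sql` function, the column list and the sample row are assumptions made for this example; the slide does not describe how the actual generator works.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical mapping from dictionary type codes to SQL column types.
SQL_TYPES = {"int": "INTEGER", "float": "REAL", "text": "TEXT"}


def create_table_sql(category: str,
                     attributes: Sequence[Tuple[str, str]],
                     key: Sequence[str]) -> str:
    """Build a CREATE TABLE statement: one table per category,
    one column per attribute, primary key from the key attributes."""
    lines: List[str] = [
        f"    {name} {SQL_TYPES[type_code]}" for name, type_code in attributes
    ]
    lines.append(f"    PRIMARY KEY ({', '.join(key)})")
    return f"CREATE TABLE {category} (\n" + ",\n".join(lines) + "\n);"


# Simplified, assumed description of a 'cell' category.
CELL = [("entry_id", "text"), ("length_a", "float"),
        ("length_b", "float"), ("length_c", "float")]
ddl = create_table_sql("cell", CELL, ["entry_id"])
print(ddl)

# Check that the generated statement is valid SQL by using it.
connection = sqlite3.connect(":memory:")
connection.execute(ddl)
connection.execute("INSERT INTO cell VALUES (?, ?, ?, ?)",
                   ("demo", 78.3, 78.3, 37.0))
row = connection.execute("SELECT length_a FROM cell").fetchone()
connection.close()
print(row)
```

The same category description could feed other generators (for example XML schema or access classes), which is the point of keeping the metadata separate from the generated code.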

17 Management
Complex challenges in technology and sociology
Communicate and work with a diverse community
Help create and enforce community policies and standards
Must take advantage of the most current innovations in new technologies
New technologies must be introduced so as to enable, not disrupt, the users of the resource
Beyond all else is the need for good data and a robust data representation

18 Access
RCSB Protein Data Bank Site: http://www.pdb.org/
OpenMMS site (Java implementation): http://openmms.sdsc.edu/
RCSB PDB Software Download Site (C++ and Python implementation, NDB server): http://deposit.pdb.org/mmcif/FILM/
RCSB PDB Dictionary Resource Site: http://deposit.pdb.org/mmcif/
RCSB PDB Beta Data Site: ftp://beta.rcsb.org/pub/pdb/uniformity/data/

19 http://www.pdb.org/ info@rcsb.org
Operated by three members of the RCSB: Rutgers, The State University of New Jersey; San Diego Supercomputer Center at the University of California, San Diego; Center for Advanced Research in Biotechnology/UMBI/NIST
The RCSB PDB is supported by funds from the National Science Foundation (NSF), the National Institute of General Medical Sciences (NIGMS), the Office of Science, Department of Energy (DOE), the National Library of Medicine (NLM), the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the National Institute of Neurological Disorders and Stroke (NINDS).
The RCSB PDB is a member of the wwPDB (http://www.wwpdb.org/)

