An Overview of the RCSB Protein Data Bank

http://www.pdb.org/ | info@rcsb.org

History of the PDB
– 1970s: community discussions about how to establish a PDB; Cold Spring Harbor meeting on protein crystallography; PDB established at Brookhaven (October 1971; 7 structures)
– 1980s: number of structures increases as technology improves; community discussions about requiring depositions; IUCr guidelines established; number of structures deposited increases; independent biological databases established, e.g., the NDB
– 1990s: mmCIF project completed; structural genomics begins; PDB moves to RCSB
– 2000s: RCSB PDB renewed; wwPDB established

PDB Mission: To provide the most accurate, well-annotated data in the most timely and efficient way possible to facilitate new discoveries and advances in science.

Structural Biology
– Understand biological processes through structural analyses
– Several methods (X-ray, NMR, cryo-electron microscopy)

[Figure: number of released PDB entries by year]

Growth of Molecular Complexity

Structural Genomics
From the NIH Request for Proposals for Structural Genomics Centers: “The next step beyond the human genome project.” “These studies should lead to an understanding of structure/function relationships and the ability to obtain structural models of all proteins identified by genomics. This project will require the determination of a large number of protein structures in a high-throughput mode.”

The Rules Driving Structural Genomics
– Much information can be derived from structure that is not available from sequence alone, yet there are 2–3 orders of magnitude more sequences than structures.
– If two sequences are similar, there is a high likelihood that they will have similar structures.
– Two dissimilar sequences can share a similar structure as a result of divergent or convergent evolution.
– Similar structures may confer similar functions.
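Sequence similarity is commonly quantified as percent identity over a pairwise alignment. The function below is a minimal sketch, assuming the two sequences are already aligned (gaps written as "-") and excluding gapped columns from the count; other conventions divide by the full alignment length or by the length of the shorter sequence, and give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Assumes both strings come from the same pairwise alignment (equal length).
    Columns where either sequence has a gap are skipped; this is one of
    several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Two short, made-up aligned fragments (hypothetical example data).
    print(percent_identity("MKT-AYIAK", "MKTLAYLAK"))
```

Here 7 of the 8 ungapped columns match, so the result is 87.5.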

Challenges
– Growth in number of structures
– Increase in complexity of structures
– New methods for structure determination
– Demand for complex queries
– Demand for more annotation
– Integration with other genomic and proteomic information
– Larger and more diverse community of users

PDB Timeline
                                 2008      2003      1998      1993
Average # of Web hits/day        ?         180000    57000     N/A
# of structures deposited/year   9000?     4831      2178      792
Total structures                 60000?    23793     8942      1727
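As a rough worked check of how fast the archive grew, the compound annual growth rate can be computed from the timeline's totals for 1993 (1727 structures) and 2003 (23793 structures), leaving out the uncertain 2008 values. This is a simple illustration of the arithmetic, not an official PDB statistic.

```python
def annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Total structures from the timeline (1993 and 2003 columns).
total_1993 = 1727
total_2003 = 23793

rate = annual_growth_rate(total_1993, total_2003, years=2003 - 1993)
print(f"Average growth 1993-2003: {rate:.1%} per year")
```

A roughly 14-fold increase over ten years works out to about 30% growth per year.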

The Data Pipeline

Target Selection → Crystallomics (Isolation, Expression, Purification, Crystallization) → Structure Determination (X-ray: Data Collection, Structure Solution, Structure Refinement) → Functional Annotation → PDB Deposition → Publication

Data Processing Data Flow

System for Data Collection and Archiving
[Diagram components: Depositor, ADIT (AutoDep Input Tool), MAXIT, Validation, Metadata Dictionaries, Data, Reports, Final Files, Database Loader, Data Views]

Data Processing System Features
– Different dictionaries without software changes
– Simple customization of both functionality and content
– Automatically scales with changes in content
– Can be distributed to multiple deposition sites
– Reference data and standard nomenclature (ERFs)
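Supporting different dictionaries without software changes means the software treats the dictionary as data. The sketch below illustrates that idea, assuming a simplified dictionary that maps item names to regular expressions. The item names are real mmCIF items, but the patterns are hypothetical simplifications; actual mmCIF dictionaries carry far richer definitions (types, enumerations, relationships).

```python
import re
from typing import Dict, List

# Simplified, hypothetical dictionary: item name -> pattern its value must match.
XRAY_DICTIONARY: Dict[str, str] = {
    "_refine.ls_d_res_high": r"^\d+(\.\d+)?$",  # resolution, numeric
    "_exptl.method": r"^(X-RAY DIFFRACTION|SOLUTION NMR)$",
    "_struct.title": r"^.+$",  # any non-empty text
}


def validate(entry: Dict[str, str], dictionary: Dict[str, str]) -> List[str]:
    """Check entry items against a dictionary and return a list of problems.

    The validator is generic: passing a different dictionary changes what is
    checked without changing this code. For simplicity, every dictionary item
    is treated as mandatory (an assumption).
    """
    problems = []
    for name, value in entry.items():
        pattern = dictionary.get(name)
        if pattern is None:
            problems.append(f"{name}: not defined in dictionary")
        elif not re.match(pattern, value):
            problems.append(f"{name}: bad value {value!r}")
    for name in dictionary:
        if name not in entry:
            problems.append(f"{name}: missing")
    return problems


if __name__ == "__main__":
    entry = {"_refine.ls_d_res_high": "1.74", "_exptl.method": "X-RAY DIFFRACTION"}
    print(validate(entry, XRAY_DICTIONARY))
```

An NMR deposition could be checked by passing a different dictionary to the same `validate` function.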

Data Content of Each PDB Entry
– 1970s: name, source, reference, resolution, sequence, secondary structure, crystal data, coordinates, unstructured remarks
– 1990s: name, source, reference, resolution, refinement details, data collection and processing details, symmetry details, biological unit information, missing residues, related entries, sequence, ligands and ions, secondary structure, crystal data, coordinates, few unstructured remarks

Content Coverage

Annotation and Validation
– ADIT: reviewing, adding, and correcting entry information
– MAXIT: file format conversions
– Blast Automation Tool results: sequence discrepancies, protein names, synonyms, source information, EC number
– Validation Server reports:
  – Format and nomenclature consistency
  – Sequence/coordinate mismatches
  – Geometrical checks (NUCheck, PROCHECK)
  – Experimental checks (SFCheck)
– Ligand Depot, ChemDraw
– RasMol for visualization
– PubMed, Citation Tracker, Citation Tool
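One of the validation checks listed is for sequence/coordinate mismatches. The function below is a minimal sketch of that kind of check, not the Validation Server's code. It assumes residues are numbered consecutively from 1 and that the residues with coordinates are supplied as a number-to-name mapping. Real entries use author numbering that can start elsewhere or carry insertion codes.

```python
from typing import Dict, List, Tuple


def compare_sequence_to_coordinates(
    seqres: List[str],
    observed: Dict[int, str],
) -> Tuple[List[int], List[int]]:
    """Compare a deposited sequence with the residues present in the coordinates.

    seqres:   full sequence as three-letter residue names, numbered from 1
              (a simplifying assumption).
    observed: residue number -> residue name for residues with coordinates.

    Returns (missing, mismatched): residue numbers with no coordinates, and
    residue numbers whose coordinate residue name differs from the sequence.
    """
    missing: List[int] = []
    mismatched: List[int] = []
    for number, name in enumerate(seqres, start=1):
        if number not in observed:
            missing.append(number)
        elif observed[number] != name:
            mismatched.append(number)
    return missing, mismatched


if __name__ == "__main__":
    seqres = ["MET", "LYS", "THR", "ALA", "TYR"]
    observed = {2: "LYS", 3: "SER", 4: "ALA", 5: "TYR"}  # hypothetical data
    print(compare_sequence_to_coordinates(seqres, observed))
```

In the example, residue 1 has no coordinates (a missing residue) and residue 3 disagrees with the sequence, so both would be flagged for the annotator.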

Data Uniformity
– Sequence
  – Resolve anomalies relative to Swiss-Prot and GenBank
  – Resolve anomalies between the sequence and the atom records
– Atom nomenclature
  – Atom naming problems in 40% of structures
  – Redundant atom labels
  – Errors in chirality
– Ligands
  – Names standardized
  – http://deposit.pdb.org/public-components-erf.cif
– Biological assembly: biologically active molecule described
Data: ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/
The Protein Data Bank: Unifying the Archive. Nucleic Acids Research 2002, 30:245-248
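Redundant atom labels, one of the nomenclature problems listed, can be detected by counting atom names within each residue. The function below is a hypothetical sketch with atoms represented as simple tuples. Alternate conformations are ignored here, but a real check would treat alternate-location identifiers as distinguishing labels.

```python
from collections import Counter
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, residue name)


def find_redundant_atom_labels(
    atoms: List[Tuple[str, int, str, str]],
) -> Dict[ResidueKey, List[str]]:
    """Report atom names that occur more than once within the same residue.

    Each atom is (chain ID, residue number, residue name, atom name).
    """
    counts = Counter(atoms)  # identical tuples = same label used twice
    duplicates: Dict[ResidueKey, List[str]] = {}
    for (chain, resnum, resname, atom_name), n in counts.items():
        if n > 1:
            duplicates.setdefault((chain, resnum, resname), []).append(atom_name)
    return duplicates


if __name__ == "__main__":
    atoms = [  # hypothetical data
        ("A", 1, "GLY", "N"),
        ("A", 1, "GLY", "CA"),
        ("A", 1, "GLY", "CA"),  # duplicated label
        ("A", 2, "ALA", "N"),
    ]
    print(find_redundant_atom_labels(atoms))
```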

Additional Requirements of Structural Genomics
– All data in the Materials and Methods section of a journal article should be captured
– Tracking of all experiments must be publicly available

Extending Data Dictionaries for Deposition
– X-ray structure determination data items: http://deposit.pdb.org/mmcif/sg-data/xstal.html
– NMR structure determination data items: http://deposit.pdb.org/mmcif/sg-data/nmr.html
– Protein production: http://deposit.pdb.org/mmcif/sg-data/protprod.html

Current Integration Strategy
– Collect bits of mmCIF output from each program step
– Merge the mmCIF data from each step
– Use the ADIT deposition tool to enter remaining data and check results
– Make all data files available in the representation of the exchange dictionary
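The merge step of this strategy can be sketched as follows. The sketch assumes each program's mmCIF output has already been parsed into category → item → value dictionaries. It uses a hypothetical first-value-wins policy: conflicting values are recorded for review, for example during the ADIT step, rather than resolved automatically.

```python
from typing import Dict, List, Tuple

# One parsed mmCIF fragment: category name -> {item name -> value}.
Fragment = Dict[str, Dict[str, str]]


def merge_fragments(fragments: List[Tuple[str, Fragment]]) -> Tuple[Fragment, List[str]]:
    """Merge per-program mmCIF fragments into one set of categories.

    fragments: (step name, fragment) pairs in pipeline order.
    New items are added as they appear. If a later step gives a different
    value for an item that is already set, the first value is kept and a
    conflict message is recorded (an assumed policy).
    """
    merged: Fragment = {}
    conflicts: List[str] = []
    for step, fragment in fragments:
        for category, items in fragment.items():
            target = merged.setdefault(category, {})
            for item, value in items.items():
                if item not in target:
                    target[item] = value
                elif target[item] != value:
                    conflicts.append(
                        f"{category}.{item}: {target[item]!r} kept, {step} gave {value!r}"
                    )
    return merged, conflicts


if __name__ == "__main__":
    steps = [  # hypothetical program outputs
        ("data_collection", {"reflns": {"d_resolution_high": "1.80"}}),
        ("refinement", {"refine": {"ls_d_res_high": "1.80"},
                        "reflns": {"d_resolution_high": "1.85"}}),
    ]
    merged, conflicts = merge_fragments(steps)
    print(merged)
    print(conflicts)
```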

Target Registration Database (TargetDB): http://targetdb.pdb.org/
– All targets downloadable in XML (~51,000 targets)
– Targets downloaded weekly from 18 centers
– Target search by sequence (FASTA), project target ID, project site, status (selected, cloned, expressed, … in PDB), update date, protein name, and source organism
– Report output in HTML, FASTA, and XML
– Integrates PDB entry sequences (~55,600 sequences)
– Includes PDB pre-release sequence data
– Provides links to related sequence databases
– Open to all structural genomics projects
– Summary reports of target or project progress
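Filtering targets by status and producing a FASTA report might look roughly like the following. The XML layout below (element and attribute names) is hypothetical and is not TargetDB's actual schema. The sample records are made up, and only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, simplified target records (not the real TargetDB schema).
SAMPLE_XML = """
<targets>
  <target id="T001" site="CenterA" status="cloned">
    <name>hypothetical protein</name>
    <sequence>MKTAYIAKQR</sequence>
  </target>
  <target id="T002" site="CenterB" status="in PDB">
    <name>thioredoxin</name>
    <sequence>MSDKIIHLTD</sequence>
  </target>
</targets>
"""


def targets_with_status(xml_text: str, status: str) -> List[ET.Element]:
    """Return target elements whose status attribute equals `status`."""
    root = ET.fromstring(xml_text.strip())
    return [t for t in root.iter("target") if t.get("status") == status]


def to_fasta(targets: List[ET.Element]) -> str:
    """Format targets as FASTA: a '>' header line, then the sequence."""
    records = []
    for t in targets:
        header = f">{t.get('id')} {t.get('site')} {t.findtext('name')}"
        records.append(header + "\n" + t.findtext("sequence"))
    return "\n".join(records)


if __name__ == "__main__":
    print(to_fasta(targets_with_status(SAMPLE_XML, "in PDB")))
```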

Beyond TargetDB: PepcDB (Protein Expression, Purification, and Crystallization Database)
– All information about targets, including the protocols for protein production

Incremental Data Pipeline

Current Query System
[Diagram components: WWW user interfaces, Search Fields, SearchLite (keyword search), Query Result Browser, Structure Explorer, FTP tree (download), CGI integration layer, DB integration layer, flat files, derived data, core DB, BMCD, Lucene, POM, Sybase]
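Keyword search engines such as Lucene are built on an inverted index that maps each term to the documents containing it. The code below illustrates that idea in plain Python; it is not Lucene's API. The two short entry descriptions are loosely based on the examples cited elsewhere in this presentation.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of entry IDs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entry_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(entry_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return sorted entry IDs containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


if __name__ == "__main__":
    docs = {
        "4HHB": "human deoxyhaemoglobin crystal structure",
        "1AEW": "horse apoferritin",
    }
    index = build_index(docs)
    print(search(index, "horse apoferritin"))
```

The index is built once, so each query only intersects a few precomputed sets instead of scanning every entry.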

Biological Assembly
– Tutorial at http://www.rcsb.org/pdb/biounit_tutorial.html
– Example (View Structure page): 1AEW, horse apoferritin
Hempstead, P. D., Yewdall, S. J., Fernie, A. R., Lawson, D. M., Artymiuk, P. J., Rice, D. W., Ford, G. C., Harrison, P. M.: Comparison of the three-dimensional structures of recombinant human H and horse L ferritins at high resolution. J Mol Biol 268 pp. 424 (1997)
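A biological assembly is generated by applying rotation-translation operators, x' = R x + t, to the deposited coordinates. In mmCIF files these operators are listed in the pdbx_struct_oper_list category. The code below is a minimal sketch, assuming the operators have already been parsed into a 3×3 rotation matrix and a translation vector each. The example operators (identity and a 2-fold rotation about z) are illustrative only.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Tuple[Vec3, Vec3, Vec3]


def apply_operator(coords: Sequence[Vec3], rotation: Matrix3, translation: Vec3) -> List[Vec3]:
    """Apply x' = R x + t to every coordinate."""
    out = []
    for x, y, z in coords:
        out.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + t
            for row, t in zip(rotation, translation)
        ))
    return out


def build_assembly(coords: Sequence[Vec3],
                   operators: Sequence[Tuple[Matrix3, Vec3]]) -> List[Vec3]:
    """Concatenate one transformed copy of the coordinates per operator."""
    assembly: List[Vec3] = []
    for rotation, translation in operators:
        assembly.extend(apply_operator(coords, rotation, translation))
    return assembly


# Illustrative operators: identity and a 180-degree rotation about z.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
ROT_Z_180 = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))
NO_SHIFT = (0.0, 0.0, 0.0)

if __name__ == "__main__":
    atom = [(1.0, 2.0, 3.0)]
    print(build_assembly(atom, [(IDENTITY, NO_SHIFT), (ROT_Z_180, NO_SHIFT)]))
```

For a large assembly such as a 24-subunit ferritin shell, the same loop simply runs over many more operators.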

Structure Explorer Summary Page
– Search by EC number
– Go to EC site
– Go to NCBI Taxonomy
– Search by author
– Go to PubMed Abstract
– Search for related citations
– Search by Chemical Component

Design of the New PDB Database
– 3-tier architecture
  – Separates database, applications, and presentation
  – Supports high access rates on multiple machines
  – Serves very large data sets
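The 3-tier separation can be sketched with three small layers, using hypothetical class names. The data tier hides how entries are stored, the application tier applies logic, and the presentation tier only formats output. Because of this separation, each tier can be changed or scaled across machines independently. The entry title and resolution in the example are taken from the 4HHB citation in this presentation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    pdb_id: str
    title: str
    resolution: Optional[float]  # in Å; None for methods without one


class EntryRepository:
    """Data tier: the only layer that knows how entries are stored."""

    def __init__(self, rows: Dict[str, Entry]):
        self._rows = rows  # stands in for a database connection

    def get(self, pdb_id: str) -> Optional[Entry]:
        return self._rows.get(pdb_id.upper())


class EntryService:
    """Application tier: logic independent of storage and display."""

    def __init__(self, repo: EntryRepository):
        self._repo = repo

    def summary(self, pdb_id: str) -> Optional[Dict[str, str]]:
        entry = self._repo.get(pdb_id)
        if entry is None:
            return None
        res = "n/a" if entry.resolution is None else f"{entry.resolution:.2f} Å"
        return {"id": entry.pdb_id, "title": entry.title, "resolution": res}


def render_text(summary: Optional[Dict[str, str]]) -> str:
    """Presentation tier: formats results; could be swapped for HTML or XML."""
    if summary is None:
        return "Entry not found"
    return f"{summary['id']}: {summary['title']} ({summary['resolution']})"


if __name__ == "__main__":
    repo = EntryRepository({"4HHB": Entry("4HHB", "human deoxyhaemoglobin", 1.74)})
    print(render_text(EntryService(repo).summary("4hhb")))
```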

Navigation
– Persistent Navigation Bar
– Site Search
– Hierarchical Menu Items
– Persistent Search Box
– Integrated Help (context-sensitive)
– Getting Started

Browsing
– Gene Ontology
– Enzyme Classification
– Taxonomy
– Ligands
– Disease
– CATH/SCOP
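Hierarchical browsing, for example by Enzyme Classification, can be built by nesting entries under each level of their EC number. The code below is a hypothetical sketch with made-up entry IDs. A partial EC number such as "3.2.1.-" would simply appear as its own leaf.

```python
from typing import Dict


def build_ec_tree(entries: Dict[str, str]) -> dict:
    """Nest entry IDs under each level of their EC number.

    entries: entry ID -> EC number such as "3.2.1.17".
    Returns nested dicts keyed by EC prefix ("3", "3.2", "3.2.1", ...);
    the key "_entries" at a node lists the IDs assigned that exact number.
    """
    tree: dict = {}
    for entry_id, ec in entries.items():
        node = tree
        parts = ec.split(".")
        for depth in range(1, len(parts) + 1):
            node = node.setdefault(".".join(parts[:depth]), {})
        node.setdefault("_entries", []).append(entry_id)
    return tree


def count_entries(node: dict) -> int:
    """Count entries at this node and everything below it."""
    total = len(node.get("_entries", []))
    for key, child in node.items():
        if key != "_entries":
            total += count_entries(child)
    return total


if __name__ == "__main__":
    # Hypothetical entry IDs mapped to EC numbers.
    entries = {"entry1": "3.2.1.17", "entry2": "3.2.1.17", "entry3": "1.15.1.1"}
    tree = build_ec_tree(entries)
    print(count_entries(tree["3"]))  # entries under EC class 3 (hydrolases)
```

Counting per node gives the numbers a browser would display next to each level of the hierarchy.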

Searching PubMed Abstracts

Detailed Reports

Molecular Visualization
– Simple viewer built from the Molecular Biology Toolkit (http://mbt.sdsc.edu)
– Envisioned as a future query interface, e.g., “What other structures contain this ligand?”
– Molecular Biology Toolkit authors: John Moreland and Apostol Gramada
Example: 4HHB. Fermi, G., Perutz, M. F., Shaanan, B., Fourme, R.: The crystal structure of human deoxyhaemoglobin at 1.74 Å resolution. J Mol Biol 175 pp. 159 (1984)

Worldwide PDB (wwPDB): http://www.wwpdb.org/
– RCSB (Research Collaboratory for Structural Bioinformatics)
– PDBj (Osaka University)
– Macromolecular Structure Database (EBI)
Goal: to ensure that PDB files remain in a single archive to best serve the worldwide community of depositors and users.

http://www.pdb.org/ Operated by three members of the RCSB: Rutgers, The State University of New Jersey; San Diego Supercomputer Center at the University of California, San Diego; Center for Advanced Research in Biotechnology/UMBI/NIST. The RCSB PDB is supported by funds from the National Science Foundation (NSF), the National Institute of General Medical Sciences (NIGMS), the Office of Science, Department of Energy (DOE), the National Library of Medicine (NLM), the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the National Institute of Neurological Disorders and Stroke (NINDS).

[Logos: MSD-EBI, RCSB PDB, PDBj at Osaka]

RCSB PDB Team: Ken Addess, Helen M. Berman, Wolfgang F. Bluhm, Phil Bourne, Kyle Burkhardt, Al Carlson, Li Chen, Sharon Cousin, Nita Deshpande, Shuchismita Dutta, Zukang Feng, Lew-Christiane Fernandez, Judith L. Flippen-Anderson, Gary Gilliland, Rachel Kramer Green, Vladimir Guranovic, Shri Jain, Jeff Merino-Ott, Rose Oughtred, Irina Persikova, Suzanne Richman, Melcoir Rosas, Kathryn Rosecrans, Bohdan Schneider, Wayne Townsend-Merino, Elizabeth Walker, John Westbrook, Huanwang Yang, Jasmin Yang, Christine Zardecki
