Management and Distribution of Chemical Data in the Protein Data Bank John Westbrook, Dimitris Dimitropoulos, Jasmine Young, Peter Rose, Philip E. Bourne.

Presentation transcript:

Management and Distribution of Chemical Data in the Protein Data Bank
John Westbrook, Dimitris Dimitropoulos, Jasmine Young, Peter Rose, Philip E. Bourne and Helen Berman
RCSB Protein Data Bank
U.S. Government Chemical Databases and Open Chemistry
August 26, 2011

What is the Protein Data Bank?
- Single international archive for information about the structure of large biological molecules
- "PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules"
  Source: Outcome of a Workshop on Archiving Structural Models of Biological Macromolecules (2006) Structure 14:

What is the content of the PDB?
- Public archive (August 2011)
  - More than 75,000 entries
  - More than 550,000 files
  - Requires over 115 GB of storage
  - Data dictionaries
  - Derived data files
  - For each entry
    - Atomic coordinates
    - Sequence information
    - Description of structure
    - Experimental data
    - Release status information
- Internal archive
  - Depositor correspondence
  - Depositor contact information
  - Paper records
  - Documentation
  - Historical records from Day One

Who manages the PDB?
NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK NLM EMBL-EBI, Wellcome Trust, BBSRC, NIGMS, EU NBDC-JST

Who uses the PDB?
- Depositors
- Users

[Figure: number of released entries by year]

Chemical data in PDB
Understanding the interactions between proteins and small molecules is key to understanding biological function.
- Providing accurate chemical descriptions is a major focus of PDB annotation
- All polymer and small molecule chemical components are described in the PDB Chemical Component Dictionary
- Significant software and data infrastructure has been created to maintain this dictionary and to provide a consistent chemical representation across the PDB archive
- Chemical representation in the PDB is under constant scrutiny and is continuously improved

How does new chemistry enter the PDB?
[Flowchart; node labels: Deposited coordinates; Chemical components; Perceived covalent structure; Compare with dictionary; New? (Yes / No); Annotate chemical definition; Chemical Component Dictionary; Standardize residue/atom nomenclature; Process deposited entry]
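One plausible reading of the flowchart labels is: perceive the covalent structure of each deposited component, compare it with the dictionary, annotate a new definition if it is not found, and otherwise reuse the existing one so that nomenclature can be standardized. The sketch below illustrates only that branch. The function name `register_component`, the dictionary layout, and the use of a single structure key (for example an InChIKey-like string) as the comparison criterion are assumptions made for illustration, not the wwPDB annotation software.

```python
def register_component(structure_key, proposed_id, dictionary):
    """Return (component_id, is_new) for one perceived chemical component.

    structure_key: hypothetical canonical key for the perceived covalent
                   structure (assumed; the real comparison is richer).
    proposed_id:   identifier to assign if the chemistry is new.
    dictionary:    dict mapping structure_key -> existing component id.
    """
    if structure_key in dictionary:
        # "New? No": reuse the existing definition, so residue and atom
        # names in the entry can be standardized against it.
        return dictionary[structure_key], False
    # "New? Yes": annotate a new definition and add it to the dictionary.
    dictionary[structure_key] = proposed_id
    return proposed_id, True


# Usage with placeholder keys (not real descriptors).
known = {"KEY-A": "GLC"}
print(register_component("KEY-A", "XXX", known))
print(register_component("KEY-B", "NEW", known))
```

Keeping the dictionary as the single source of identifiers is what makes the "consistent chemical representation" goal checkable: the same perceived chemistry always resolves to the same component id.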

Assessing data quality
Chemical data in PDB are experimentally derived, subject to modeling restraints.
[Figure panels: PDB entry 3dnb, 1.3 Å resolution; PDB entry 6bna, 2.21 Å resolution]

How are data checked now?
- Chemistry
  - Polymer (match to sequence DB and internal consistency)
  - Ligands, ions, inhibitors (match to dictionary)
- Geometry
  - Close contacts
  - Valence geometry
  - Torsion angles
- Experimental data
  - Model vs. structure factors
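As an illustration of the simplest geometry check listed above, a close-contact test can be sketched as a pairwise distance scan that skips covalently bonded pairs. The function name `close_contacts`, the 2.2 Å cutoff, and the input layout are assumptions chosen for the example; the actual validation software and thresholds are not described in the slides.

```python
import math
from itertools import combinations


def close_contacts(atoms, bonded=frozenset(), cutoff=2.2):
    """List non-bonded atom pairs closer than `cutoff` (in Å).

    atoms:  dict mapping atom name -> (x, y, z) coordinates in Å.
    bonded: set of frozenset({name_a, name_b}) pairs to ignore.
    cutoff: illustrative threshold, not an official wwPDB value.
    """
    hits = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms.items(), 2):
        if frozenset((name_a, name_b)) in bonded:
            continue  # covalently bonded atoms are expected to be close
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            hits.append((name_a, name_b, round(distance, 2)))
    return hits


# Usage: A and B are 1.5 Å apart; flagged only when not declared bonded.
atoms = {"A": (0.0, 0.0, 0.0), "B": (1.5, 0.0, 0.0), "C": (5.0, 0.0, 0.0)}
print(close_contacts(atoms))
print(close_contacts(atoms, bonded={frozenset(("A", "B"))}))
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a single ligand; a whole entry would normally use a spatial grid or tree instead.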

On-going focus on data quality
Method-specific Validation Task Forces have been convened to collect recommendations and develop consensus on method-specific issues, including validation checks that should be performed and identification of validation software applications.
X-ray Validation
- 2008 Workshop on Next Generation Validation Tools for the wwPDB
- White paper accepted by Structure
- Chair: Randy J. Read (University of Cambridge)
3DEM Validation
- Meeting September 2010
- Chairs: Richard Henderson (Maps, Cambridge University), Andrej Sali (Models, UCSF)
- White paper in progress
NMR Validation
- Meetings held September 2009, January 2011
- Report in progress
- Chairs: Gaetano Montelione (Rutgers), Michael Nilges (Institut Pasteur)
Small-Angle Scattering
- Members: Jill Trewhella (University of Sydney), Dmitri Svergun (EMBL Hamburg), Andrej Sali (UCSF), Mamoru Sato (Yokohama City University), John Tainer (Scripps)

Documenting PDB chemistry in the Chemical Component Dictionary
- Library of all polymer and non-polymer chemical components in PDB
- ~13,000 chemical component definitions
- 400 additional definitions of amino acid protonation variants
- ~700 new components released this year
- ~1700 component definitions updated this year
- Maintained by members of the wwPDB

wwPDB resources wwpdb.org

Chemical Component Dictionary and data download options
- Chemical definitions in mmCIF, PDBML/XML and SDF/MOL formats
- Tabulations of SMILES, InChI and InChIKey descriptors for each chemical definition
- Bundles of coordinates extracted from PDB entries for each ligand in the archive, stored in mmCIF, PDBML and SDF/MOL formats

Chemical Component Dictionary content
- Molecular names and synonyms
- Chemical formula, formula weight, and formal charge
- Atom and residue nomenclature
- Polymer linking type
- Model coordinates (an example from a PDB entry)
- Computed coordinates (Corina or OpenEye)
- Connectivity and bond types
- Stereochemistry and aromaticity
- Systematic names (ACDLabs & OpenEye)
- SMILES, InChI, and InChIKey descriptors
- Release status and revision history
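To make the content list more concrete, the sketch below reads the simple key-value items of a component definition in mmCIF style. The embedded record is a small hand-written sample for water (values shown for illustration only), and `parse_simple_items` is a hypothetical helper. It handles only single-line `_category.item value` pairs; real dictionary files also use `loop_` tables, multi-line values, and quoting rules that differ from the shell-style quoting `shlex` assumes, so a dedicated mmCIF parser would be needed in practice.

```python
import shlex

# Hand-written sample in mmCIF key-value style (illustrative values).
SAMPLE = """\
data_HOH
#
_chem_comp.id                  HOH
_chem_comp.name                WATER
_chem_comp.type                NON-POLYMER
_chem_comp.formula             'H2 O'
_chem_comp.formula_weight      18.015
_chem_comp.pdbx_formal_charge  0
"""


def parse_simple_items(text):
    """Collect single-line '_category.item value' pairs into a dict."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, comments, and blank lines
        tokens = shlex.split(line)  # keeps quoted values such as 'H2 O' whole
        if len(tokens) == 2:
            items[tokens[0]] = tokens[1]
    return items


record = parse_simple_items(SAMPLE)
print(record["_chem_comp.name"], record["_chem_comp.formula"])
```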

Chemical Component Dictionary Interpretation
Definitions include
- Common or representative forms of the molecule
- Generally neutral and complete molecules
- Off-the-shelf reagents used to prepare an experimental sample
- Model coordinates from a single experimental observation
- Computed coordinates from programs: Corina or OpenEye/Omega

Searching the Chemical Component Dictionary
ligand-expo.rcsb.org
Search options
- Molecular Name
- Formula
- SMILES
- InChI/InChIKey
- PDB component identifier
- Chemical substructure
Browsing options
- Standard and modified amino acids
- Standard and modified nucleotides
- Selected top-selling pharmaceuticals
- Common aromatic ring systems

Ligand Expo: Browse dictionary content

Ligand Expo: View chemical details

Ligand Expo: Find data in related resources

Find small molecules at the RCSB PDB
Simple search for all entries containing a particular ligand

RCSB PDB Small molecule Advanced Search
- Interactive chemical structure search with graphics
- Exact, substructure, superstructure, MW searches
- Restricted formula searches
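The slides do not define "restricted formula search"; one reasonable interpretation is a filter on element counts, where each element of interest is given an allowed range. The sketch below works under that assumption. The helper names `parse_formula` and `formula_matches`, and the space-separated `C6 H12 O6` formula layout with conventional element capitalization, are assumptions for illustration and do not describe the RCSB PDB implementation.

```python
import re


def parse_formula(formula):
    """Turn a formula such as 'C6 H12 O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for token in formula.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized formula token: {token!r}")
        element, number = match.group(1), match.group(2)
        # A missing count means one atom of that element.
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def formula_matches(formula, limits):
    """Check element counts against {element: (minimum, maximum)} limits."""
    counts = parse_formula(formula)
    return all(low <= counts.get(element, 0) <= high
               for element, (low, high) in limits.items())


# Usage: exactly six carbons and no nitrogen.
limits = {"C": (6, 6), "N": (0, 0)}
print(formula_matches("C6 H12 O6", limits))
print(formula_matches("C2 H5 N O2", limits))
```

Elements not mentioned in `limits` are left unconstrained, which is what distinguishes this kind of filter from an exact formula match.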

RCSB PDB report and display of molecular interactions

Access
- RCSB Protein Data Bank
- Ligand Expo: ligand-expo.rcsb.org
- wwPDB
- Dictionary Resources: mmcif.pdb.org, pdbml.pdb.org

Acknowledgements
Operated by two members of the RCSB:
Supported by: NIGMS
The RCSB PDB is a member of the