Bringing Structure to Biology: Small Molecules and the PDBe

PDBe overview
The PDB is a core molecular database at EMBL-EBI. PDBe is a founding partner of the Worldwide Protein Data Bank (wwPDB) and the founder of the Electron Microscopy Data Bank (EMDB). Mission: Bringing Structure to Biology. Major activities: serving as a deposition and annotation site for structural data on biomacromolecules (X-ray, NMR, EM); providing an integrated resource of high-quality macromolecular structural data and related information; and providing tools and services for accessing, exploiting and disseminating structural data to the wider biomedical community.
The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as deposition, data processing and distribution centers for PDB data. Members are RCSB PDB (USA), PDBe (Europe), PDBj (Japan) and BMRB (USA, NMR structures). The wwPDB's mission is to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community. We are actively involved in an effort to integrate data from major biomedical resources at EMBL-EBI and across the world. This "Structure Integration with Function, Taxonomy and Sequence" (SIFTS) initiative integrates data from a number of bioinformatics resources and is used by major global sequence, structure and protein-family resources. PDBe specialises in providing tools and services that exploit the wealth of structural knowledge contained within the PDB archive.

wwPDB partners
Collaborate on "data in": policy issues; weekly releases; the chemical component database; deposition and annotation procedures; archive quality and remediation; journal interactions; validation standards and format specifications.
Friendly competition on "data out": serving PDB data with added value; PDB-based services; other services, resources and activities.

PDB Depositions
10,000th PDBe-annotated structure: April 2011 (2yf6), www.pdbe.org/2yf6
Structure-based drug design seeks to identify and optimize interactions between ligands and their host molecules, typically proteins, given their three-dimensional structures. This optimization process requires knowledge about interaction geometries and the approximate affinity contributions of attractive interactions, which can be gleaned from crystal structures and associated affinity data.

Chemical Component Dictionary
Compounds in the PDB: small molecules bound to macromolecules, and individual components of macromolecules. The wwPDB maintains dictionary descriptions for all unique chemical components: name, synonyms, formula, SMILES, …; atoms and bonds; ideal and representative coordinates. Each new component is assigned a unique 3-letter identifier, and its release coincides with the release of the parent PDB entry.
The Chemical Component Dictionary is an external reference file describing all residue and small-molecule components found in PDB entries. It contains detailed chemical descriptions for standard and modified amino acids and nucleotides, small-molecule ligands and solvent molecules, covering over 14,839 ligands. Each chemical definition includes chemical properties such as stereochemical assignments, chemical descriptors (SMILES and InChI), systematic chemical names and idealized coordinates (generated using Molecular Networks' Corina or, if there are issues, OpenEye's OMEGA). The dictionary is organized by the 3-character alphanumeric code that the PDB assigns to each chemical component. New chemical component definitions appear in the dictionary as the entries in which they are observed are released in the PDB archive; consequently, the dictionary is updated with each weekly PDB release. The dictionary is regularly reviewed and remediated. It is part of the core "reference" information of the PDBe relational database and is consistently referenced by all macromolecular structures for all bound molecules as well as for standard and modified amino acids. Since every residue and every atom in the PDBe database references a component and an atom in this dictionary, this is the repository that defines the link between proteins and chemistry.
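
To make the dictionary layout concrete, here is a minimal Python sketch that reads the single-line `_chem_comp.*` items of one component from mmCIF-style text and checks its identifier. It assumes identifiers are 1 to 3 uppercase letters or digits (e.g. HOH, NA, RTL), ignores the looped atom, bond and descriptor tables, and uses a hypothetical, abbreviated excerpt rather than a real dictionary file; `ChemComp` and `parse_chem_comp` are illustrative names, not part of any PDBe tool.

```python
import re
from dataclasses import dataclass

# Assumption: component IDs are 1-3 uppercase letters/digits (HOH, NA, RTL, ...).
CCD_ID = re.compile(r"[A-Z0-9]{1,3}")


@dataclass
class ChemComp:
    comp_id: str
    name: str
    formula: str


def parse_chem_comp(cif_text: str) -> ChemComp:
    """Collect single-line `_chem_comp.<item> value` pairs from one CIF block.

    Minimal sketch: loops (atoms, bonds, descriptors), multi-line values and
    most CIF quoting rules are deliberately not handled.
    """
    items = {}
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line.startswith("_chem_comp."):
            continue  # skips other categories, e.g. _chem_comp_atom.*
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # value on a later line: not handled in this sketch
        key, value = parts[0], parts[1].strip()
        # Remove one pair of matching quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        items[key[len("_chem_comp."):]] = value

    comp_id = items.get("id", "")
    if not CCD_ID.fullmatch(comp_id):
        raise ValueError(f"unexpected component ID: {comp_id!r}")
    return ChemComp(comp_id, items.get("name", ""), items.get("formula", ""))


# Hypothetical, abbreviated excerpt in the style of a dictionary entry.
EXAMPLE = """
data_RTL
_chem_comp.id      RTL
_chem_comp.name    RETINOL
_chem_comp.formula "C20 H30 O"
"""

print(parse_chem_comp(EXAMPLE))
```

A real reader would use a full mmCIF parser; the point here is only that each entry is keyed by its component code and carries name, formula and the other descriptors listed above.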

Molecule search options (www.pdbe.org/chem)
Compound name; ligand 3-letter code; SMILES; formula, exact or range (e.g. C6-10 N4 O2 S0); chemical substructure.
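
The formula-range option can be illustrated with a short Python sketch. It assumes a query such as `C6-10 N4 O2 S0` means "6 to 10 carbons, exactly 4 nitrogens, exactly 2 oxygens and no sulfur", and that elements not named in the query are unconstrained; PDBe's actual interpretation may differ, and the function names are hypothetical.

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a space-separated formula such as "C8 H10 N4 O2" into counts."""
    counts: dict[str, int] = {}
    for token in formula.split():
        m = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)
        if m is None:
            raise ValueError(f"bad formula token: {token!r}")
        element, n = m.groups()
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def parse_range_query(query: str) -> dict[str, tuple[int, int]]:
    """Parse a query such as "C6-10 N4 O2 S0" into element -> (min, max)."""
    ranges: dict[str, tuple[int, int]] = {}
    for token in query.split():
        m = re.fullmatch(r"([A-Z][a-z]?)(\d+)(?:-(\d+))?", token)
        if m is None:
            raise ValueError(f"bad query token: {token!r}")
        element, lo, hi = m.groups()
        ranges[element] = (int(lo), int(hi) if hi else int(lo))
    return ranges


def matches(formula: str, query: str) -> bool:
    """True if each element named in the query has a count inside its range."""
    counts = parse_formula(formula)
    return all(lo <= counts.get(el, 0) <= hi
               for el, (lo, hi) in parse_range_query(query).items())


print(matches("C8 H10 N4 O2", "C6-10 N4 O2 S0"))    # caffeine: True
print(matches("C8 H10 N4 O2 S", "C6-10 N4 O2 S0"))  # one sulfur: False
```

Writing `S0` explicitly is what excludes sulfur-containing compounds; leaving hydrogen out of the query lets any hydrogen count through.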

PDBe Home Page http://www.ebi.ac.uk/pdbe

Ligands and the PDBe: open chemistry sketchpad

Ligands and the PDBe

Ligands and the PDBe
Meaning of the icons: there is a published paper for the entry; the source organism was human (or a fungus, etc.); the sample of the biomacromolecule was obtained by expression and purification; the data are from X-ray crystallography; the entry contains a protein (not DNA); and the entry contains a small molecule.

2D Ligand Interaction Diagrams (www.pdbe.org/leview)
Generates schematic 2D diagrams of protein-ligand binding-site interactions for any given PDB entry, with interactive control of distance criteria, diagram customisation and image export (png, jpg, eps, …).
Example: S-benzyl-glutathione (GSB), human glyoxalase inhibitor (1guh).
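
The "distance criteria" behind such diagrams boil down to finding ligand and protein atoms within a chosen cutoff of each other. The Python sketch below shows only that step, assuming a single adjustable cutoff (4.0 Å here, an arbitrary choice) and made-up atom labels and coordinates; the real service also classifies contact types, which is not attempted here.

```python
import math
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    label: str                       # hypothetical label, e.g. "LIG:S1"
    xyz: Tuple[float, float, float]  # Cartesian coordinates in Å


def binding_contacts(ligand, protein, cutoff=4.0):
    """List (ligand atom, protein atom, distance) pairs within `cutoff` Å.

    Brute-force search over all atom pairs, sorted by distance; adequate
    for a single binding site.
    """
    pairs = []
    for la in ligand:
        for pa in protein:
            d = math.dist(la.xyz, pa.xyz)
            if d <= cutoff:
                pairs.append((la.label, pa.label, round(d, 2)))
    return sorted(pairs, key=lambda p: p[2])


# Made-up coordinates for illustration only.
ligand = [Atom("LIG:S1", (0.0, 0.0, 0.0))]
protein = [
    Atom("TYR 1:OH", (3.0, 0.0, 0.0)),   # 3.00 Å away
    Atom("LEU 2:CD1", (0.0, 3.0, 2.0)),  # about 3.61 Å away
    Atom("ARG 3:NH1", (0.0, 4.0, 3.0)),  # 5.00 Å away
]
print(binding_contacts(ligand, protein))              # two contacts
print(binding_contacts(ligand, protein, cutoff=3.5))  # one contact
```

Tightening or loosening the cutoff changes which residues appear in the diagram, which is why interactive control of that parameter is useful.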

PDBeXpress: rapid access to protein-ligand interaction statistics (www.pdbe.org/express)
Understand and assess binding-site interactions, and provide chemists with quick answers to common questions without the need to construct complex search queries: What residues interact? Which enzymes interact? What binds here?
PDBeXpress is a small collection of tools that extract and present information and statistics on protein-ligand binding-site interactions, so it can help you understand and assess the interactions that are important to molecular recognition. The idea behind PDBeXpress was to give users quick answers to common questions without having to construct complex search queries or learn how to use specialist interfaces. The service answers three questions. First, for any ligand in the PDB, which residues does it interact with? Second, for any ligand in the PDB, which enzymes does it interact with (there are some 30,000 enzyme structures in the PDB)? Finally, approaching the problem from the opposite direction: what binds here? This tool identifies ligands in the PDB that interact with a given set of residues (a protein environment).

What residues interact? RTL - Retinol
Enter either the PDB three-letter ligand code or the ligand name (e.g. RTL or RETINOL), and the tool retrieves the residues with which this ligand interacts, as observed in PDB entries.

What residues interact? RTL - Retinol
The results are plotted on an interactive graph that shows the frequency of interaction with each amino acid residue type, expressed as a percentage of the total interactions the ligand makes. Clicking on a bar (e.g. LEU) gives details of the interactions, with links to the PDB entries containing them and to PDBeMotif for advanced analysis options (more on this later). The results can be further refined by filtering on a Pfam or CATH domain. The statistical data can be downloaded, and the graph printed or exported as a PDF or image.
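
The percentage plotted per residue type can be sketched in a few lines of Python. This assumes each observed interaction is recorded as one (entry, residue type) pair; how PDBeXpress actually counts interactions (per atom contact or per residue) is not stated here, and the entries below are placeholders, not real PDB IDs.

```python
from collections import Counter


def residue_frequencies(contacts):
    """Percentage of all observed interactions made with each residue type.

    `contacts` is an iterable of (entry, residue_name) pairs, one per
    ligand-residue interaction (an assumed counting scheme).
    """
    counts = Counter(res for _entry, res in contacts)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders residue types from most to least frequent.
    return {res: 100.0 * n / total for res, n in counts.most_common()}


observed = [("entry1", "LEU"), ("entry1", "PHE"),
            ("entry2", "LEU"), ("entry2", "TYR")]
print(residue_frequencies(observed))
# {'LEU': 50.0, 'PHE': 25.0, 'TYR': 25.0}
```

Filtering on a Pfam or CATH domain would amount to dropping pairs from `observed` before counting.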

Which enzymes interact? MAN – Mannose
Enter the PDB three-letter ligand code or the ligand name. For any ligand in the PDB, this tool retrieves the enzymes with which the ligand interacts, as observed in PDB entries.

Which enzymes interact? MAN – Mannose
The results are plotted on an interactive graph, which can be used to drill down to deeper levels of the EC (Enzyme Commission) classification or to view the PDB entries in which the interactions occur.
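
Drilling down the EC hierarchy amounts to grouping EC numbers by a growing prefix. The Python sketch below assumes the enzymes are given as dotted EC strings (partial numbers such as `2.4.1.-` are kept as-is) and uses an illustrative list, not real PDBeXpress output; `ec_breakdown` is a hypothetical helper.

```python
from collections import Counter


def ec_breakdown(ec_numbers, prefix=""):
    """Count EC numbers one level below `prefix` ("" means the top level).

    E.g. prefix "3.2" groups entries into "3.2.1", "3.2.2", ... classes.
    """
    parent = prefix.split(".") if prefix else []
    level = len(parent) + 1
    counts = Counter()
    for ec in ec_numbers:
        fields = ec.split(".")
        # Compare whole fields so that prefix "3" does not match "31...".
        if fields[:len(parent)] == parent and len(fields) >= level:
            counts[".".join(fields[:level])] += 1
    return counts


ecs = ["3.2.1.24", "3.2.1.113", "2.4.1.-", "3.2.1.24"]  # illustrative only
print(ec_breakdown(ecs))           # top level: classes 3 and 2
print(ec_breakdown(ecs, "3.2.1"))  # fully specified EC numbers under 3.2.1
```

Each click on a bar in the graph corresponds to calling the function again with that bar's label as the new prefix.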

What binds here?
Search for ligands that interact with a given set of residues (e.g. "HIS ASP SER"), which might represent a sub-pocket or a conserved motif that you believe is important to binding. Simply specify the residues by clicking in the table; you can specify either a partial or an exact binding environment.
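
One way to read "partial" versus "exact" environments is as multiset containment versus multiset equality of residue types. The Python sketch below makes that assumption explicit; it is not PDBeMotif's actual matching logic, and the ligand names and binding sites are hypothetical.

```python
from collections import Counter


def environment_matches(site_residues, query, exact=False):
    """Compare a binding site's residue types with a query like "HIS ASP SER".

    partial: every query residue occurs in the site (respecting repeats);
    exact:   the site contains the query residues and nothing else.
    """
    site = Counter(site_residues)
    wanted = Counter(query.split())
    if exact:
        return site == wanted
    return all(site[res] >= n for res, n in wanted.items())


def what_binds_here(sites, query, exact=False):
    """Count how often each ligand occurs in a matching binding site."""
    return Counter(lig for lig, residues in sites
                   if environment_matches(residues, query, exact))


# Hypothetical binding sites: (ligand code, residues in contact).
sites = [
    ("LIG1", ["HIS", "ASP", "SER", "GLY"]),
    ("LIG2", ["HIS", "ASP", "SER"]),
    ("LIG1", ["SER", "HIS", "ASP"]),
    ("LIG3", ["HIS", "GLU"]),
]
print(what_binds_here(sites, "HIS ASP SER"))              # partial match
print(what_binds_here(sites, "HIS ASP SER", exact=True))  # exact match
```

The resulting counts are what a "how frequently" graph would plot.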

What binds here?
The results are plotted on an interactive graph showing which ligands bind to the given environment and how frequently. The graph links to the PDB entries containing the interactions, as well as to advanced analysis options within PDBeMotif.

PDBeMotif: powerful and flexible searching
The PDBeXpress modules are built on top of PDBeMotif, a powerful and flexible search engine that allows you to combine protein sequence, chemical structure and 3D data in a single search. In addition to analysing PDB entries, you can upload your own file for analysis.

PDBeMotif: powerful and flexible searching
Construct queries based on: ligands and their 3D environment; secondary-structure elements and small 3D motifs; protein φ/ψ angle sequences (a sequential representation of the protein geometry). Results can be analysed against UniProt, CATH, Pfam or EC.
Example questions: which ligands bind within a given environment, and how frequently do they interact? How does the binding environment of one ligand compare with that of another? Which binding-site features or local structural similarities are shared across different protein families or otherwise unrelated structures?
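
The φ/ψ angles behind such sequences are backbone dihedral angles: φ(i) is defined by atoms C(i-1), N(i), CA(i), C(i) and ψ(i) by N(i), CA(i), C(i), N(i+1). The Python sketch below computes a signed dihedral from four points using the usual IUPAC sign convention; how PDBeMotif then encodes the angles into searchable sequences is not shown and is not assumed here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = _sub(p0, p1)                  # bond p1 -> p0
    b1 = _sub(p2, p1)                  # central bond p1 -> p2
    b2 = _sub(p3, p2)                  # bond p2 -> p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(t / norm for t in b1)   # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * t for t in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * t for t in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    return dihedral(n, ca, c, n_next)


# Toy geometry (not a real residue): a 90-degree twist about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Walking along a chain and collecting (φ, ψ) pairs residue by residue gives the kind of sequential description of backbone geometry mentioned above.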

How do sulphones and sulphonamides prefer to interact?
In the PDB, 39% of ligand sulfonyl groups form a hydrogen bond with either a protein donor or a structural water molecule, while 74% are located in or close to van der Waals distance (3.3-3.9 Å) of an aliphatic group. Notably, of the sulfonyl groups situated in a hydrophobic environment, only 36% simultaneously interact as a hydrogen-bond acceptor, but 79% of the hydrogen-bonded sulfonyl groups simultaneously interact with a hydrophobic group. These findings clearly indicate a dual character of the weakly polar sulfonyl group, acting both as a hydrogen-bond acceptor and as a hydrophobic group.
Figure: closest interactions (distances in Å) formed by the sulfonyl oxygen atoms of a cathepsin S ligand within the active site of human cathepsin S (PDB: 2fra), showing electrostatic, hydrogen-bond and van der Waals contacts.
M. Stahl, A Medicinal Chemist's Guide to Molecular Interactions, J. Med. Chem. 2010, 53, 5061–5084.
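
The two conditional percentages quoted above (36% and 79%) have different denominators, which is easy to mix up. The Python sketch below shows how such figures could be tallied from per-group closest-contact distances. The 3.9 Å hydrophobic limit follows the van der Waals window on the slide; the 3.5 Å hydrogen-bond cutoff is an assumption of this sketch, and the six sulfonyl groups are made up.

```python
from dataclasses import dataclass
from typing import Optional

HBOND_MAX = 3.5  # Å, assumed donor/water-to-sulfonyl-O cutoff (not from the source)
VDW_MAX = 3.9    # Å, upper end of the 3.3-3.9 Å van der Waals window


@dataclass
class SulfonylSite:
    """Closest contacts of one ligand sulfonyl group (None = no partner)."""
    nearest_donor: Optional[float]      # protein donor or structural water
    nearest_aliphatic: Optional[float]  # aliphatic carbon


def _pct(part, whole):
    return 100.0 * part / whole if whole else 0.0


def summarize(sites):
    hb = [s.nearest_donor is not None and s.nearest_donor <= HBOND_MAX
          for s in sites]
    hyd = [s.nearest_aliphatic is not None and s.nearest_aliphatic <= VDW_MAX
           for s in sites]
    both = sum(h and p for h, p in zip(hb, hyd))
    return {
        "h_bonded": _pct(sum(hb), len(sites)),
        "hydrophobic": _pct(sum(hyd), len(sites)),
        # Denominator: hydrophobic groups only.
        "h_bonded_given_hydrophobic": _pct(both, sum(hyd)),
        # Denominator: hydrogen-bonded groups only.
        "hydrophobic_given_h_bonded": _pct(both, sum(hb)),
    }


# Made-up distances for six sulfonyl groups.
sites = [SulfonylSite(2.9, 3.7), SulfonylSite(None, 3.6),
         SulfonylSite(3.0, 4.5), SulfonylSite(None, None),
         SulfonylSite(3.4, 3.5), SulfonylSite(None, 3.8)]
print(summarize(sites))
```

Even on this toy data the two conditional percentages differ (50% versus about 66.7%), mirroring the asymmetry reported for the PDB.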

Ligands need careful validation
CCDC analysis of ligand geometries (using Relibase+/Mogul/EDS): around 20% of recently determined structures have geometric errors that could potentially cause a misleading interpretation of the binding interactions. Categories: Wrong, Unusual/Strained, Correct.
There is a wealth of information about ligand interactions that can be exploited, particularly in a design context. However, this assumes the data are accurate and valid. An analysis put together by colleagues at the CCDC has highlighted some problems with PDB ligand quality. This semi-manual analysis shows that, over three time periods, the fraction of good ligands in the PDB is not increasing rapidly. Around 20% of recently determined structures have geometric errors that could potentially cause a misleading interpretation of the binding interactions. "Wrong" ligands have a serious error in density or geometry, or close contacts; "dubious" ones are possibly strained; "OK" ligands have only minor errors in torsions or rings. Ligands need more attention because their refinement is tricky and error-prone: reliable dictionaries are harder to come by, electron density in macromolecular crystallography (MX) is often not good enough to identify small molecules unambiguously, and problems with ligand quality have little impact on overall quality indicators such as the R factor. We at the PDB need to think about validating ligand quality and implementing processes to prevent bad ligands from being deposited.
Liebeschuetz, J.W., Hennemann, J. The good, the bad and the twisted: A survey of ligand geometry in protein crystal structures. J. Comput. Aid. Mol. Des., 26, 169-183 (2012)

The solution…
Mogul: a knowledge-based library of molecular geometry derived from the Cambridge Structural Database (CSD). It enables rapid validation of the complete geometry of a given query structure and identification of unusual features.
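
The core idea of knowledge-based geometry validation is to compare each observed bond length (or angle) with the distribution seen for the same chemical fragment in small-molecule crystal structures, and flag large deviations. The Python sketch below does only the statistical step: the reference values and feature labels are made up, the CSD fragment search that Mogul performs is replaced by a plain dictionary lookup, and the |z| > 2 threshold is an illustrative assumption.

```python
import statistics


def flag_geometry_outliers(observed, reference, z_max=2.0):
    """Flag observed values whose |z| against reference data exceeds z_max.

    observed:  {feature label: observed value, e.g. a bond length in Å}
    reference: {feature label: list of reference values in Å}
    """
    flagged = {}
    for feature, value in observed.items():
        ref = reference.get(feature, [])
        if len(ref) < 2:
            continue  # too little reference data to judge this feature
        mean = statistics.fmean(ref)
        sd = statistics.stdev(ref)
        if sd == 0:
            continue  # degenerate distribution; z-score undefined
        z = (value - mean) / sd
        if abs(z) > z_max:
            flagged[feature] = round(z, 2)
    return flagged


# Made-up reference distributions standing in for CSD fragment searches.
REFERENCE = {
    "S1-O1": [1.43, 1.44, 1.43, 1.45, 1.44],
    "C1-S1": [1.76, 1.77, 1.78],
}
QUERY = {"S1-O1": 1.52, "C1-S1": 1.77, "C2-C3": 1.54}

print(flag_geometry_outliers(QUERY, REFERENCE))  # only S1-O1 is flagged
```

Features with no reference data (here `C2-C3`) are skipped rather than passed, which is one reason real validation reports also state how much of a ligand could be assessed.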

MoU with CCDC
Under the wwPDB/CCDC Memorandum of Understanding, the wwPDB gets to use Mogul for validation of all current and future compounds in the PDB; to incorporate and redistribute CSD coordinates for all current and future ligand compounds in the PDB; and to use Mogul and CSD coordinates to derive dictionaries for all current and future compounds in the PDB. Mogul is to be implemented as part of a new validation pipeline for PDB structures, and this validation service is to be made available via an independent server…

Prevention is the best cure
Thanks to the collaboration with the CCDC, we can add CSD coordinates for all existing small molecules in the PDB (and variants, e.g. D-amino acids) that also occur in the CSD. We can use these coordinates and Mogul to derive refinement dictionaries (e.g. Grade from Global Phasing, which uses Mogul and RM1), which will improve the quality and consistency of the archive. We can provide reasonable starting coordinates and refinement dictionaries for all existing compounds in the PDB, and we can add CSD coordinates at annotation time for new small molecules that also occur in the CSD.

Future of the PDB?
At present the PDB is a historic archive: we have to accept and distribute everything. It is an "archive", i.e. it records what was described in the literature. It is essentially provider-centric (we capture the X-ray detector type but not ligand function) and is organised by entry rather than by molecule, complex, etc.
User communities and demands are shifting. We must serve the consumers of structural data, who are mostly non-experts: there are more non-expert users than experts, they don't think in terms of PDB entry codes, and they can't tell a good model from a bad one. We have to understand what they know, what they want to find and what they want to do.
Needed: new ways to access structural information; new ways to handle structural information; current-best-practice models; integration with other databases.

PDBe Team February 2012

Funding

Thank you!
Tutorials: http://www.ebi.ac.uk/pdbe/resources/educationTabContent/tutorials/PDBeChem.pdf, http://www.ebi.ac.uk/pdbe-apps/quips?story=XmasFactor&auxpage=XmasChemTut, http://www.ebi.ac.uk/pdbe/docs/Tutorials/PDBeChem.html
Contact us: www.pdbe.org, pdbehelp@ebi.ac.uk
Follow us: http://www.facebook.com/proteindatabank, http://twitter.com/PDBeurope