EBI is an Outstation of the European Molecular Biology Laboratory. Protein Databank in Europe (PDBe) An Introduction.


1 EBI is an Outstation of the European Molecular Biology Laboratory. Protein Databank in Europe (PDBe) An Introduction

2 Protein Databank in Europe, 18.02.09. Introduction: PDBe is based at the European Bioinformatics Institute (EBI), an outstation of the European Molecular Biology Laboratory (EMBL) at Hinxton, UK. It started in 1996 with the goal of providing an autonomous structural database capability in Europe. The aims of the group are to provide: a deposition site through which macromolecular structures can be added to the PDB (AutoDep) or to the EM archive (EMDep); a stable and clean repository of macromolecular structure data; and services that allow users to access, search and retrieve structural data.

3 The Protein Databank in Europe (PDBe) group: is one of the four sites around the world where 3D structures may be deposited; provides a stable and clean repository of macromolecular structure data; has services that allow users to access, search and retrieve structural data from a single web access point.

4 Worldwide Protein Data Bank (wwPDB): consists of four sites: RCSB (USA), PDBj (Japan), BMRB (USA) and PDBe. The PDB is the single repository of all publicly available macromolecular structures. The PDB started in 1971, now has around 54,000 entries, and new entries are added weekly. Structures are deposited by scientists and the contents are freely available. The archive is stored as flat files with a fixed-column line format, although an improved flat-file format (mmCIF) is available.
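In the fixed line format, each field of a coordinate record sits at fixed character columns rather than being delimited. A minimal Python sketch of reading one ATOM record by column position, assuming the ATOM/HETATM column layout of the PDB format specification (version 3.x); `AtomRecord` and `parse_atom_line` are hypothetical names for illustration, and fields such as alternate location, insertion code and element symbol are skipped for brevity:

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM record by its fixed column positions.

    Slices are 0-based and end-exclusive; the comments give the 1-based
    columns used in the PDB format description.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError(f"not an ATOM/HETATM record: {line[:6]!r}")
    return AtomRecord(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
    )


record = parse_atom_line(
    "ATOM   2568  CA  PHE B 175       8.845 -25.172 -21.877  1.00  9.41"
)
print(record.name, record.res_name, record.chain_id, record.res_seq, record.x)
```

Because fields are separated only by their positions, a value that outgrows its columns (for example an atom serial number above 99,999) cannot be represented without corrupting its neighbours, which is one concrete reason the format is hard to extend.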

5 PDBe Tasks: deposition site; data clean-up; database design and implementation; data retrieval.

6 Structure Determination: NMR (high-field spectrometer); cryo-EM (electron microscope); X-ray crystallography (synchrotron).

7 Depositions and Curation: full deposition site from June 1999; 18% of all submissions come via the EBI; close collaboration with the other wwPDB members to maintain a single unified archive. Depositions started June 2002.

8 AutoDep 4.0: a structure deposition and archiving system based on Java/XML technology. Available free of charge under license for academic and industry use. Easy to install and use for in-house archiving before deposition to the PDB via the PDBe interface. http://www.ebi.ac.uk/msd-srv/autodep4

9 Disadvantages of flat files: Macromolecular structures are very complex. The existing PDB format cannot fully describe even existing structures. The format is not readily extensible, for example to cope with structural genomics data. The historical archive is non-uniform and poorly populated. Search and retrieval of flat files is difficult and/or inaccurate.

10 PHENYLALANINE: all looks normal?
ATOM   2567  N   PHE B 175       7.821 -25.530 -22.848  1.00  8.71
ATOM   2568  CA  PHE B 175       8.845 -25.172 -21.877  1.00  9.41
ATOM   2569  C   PHE B 175       9.449 -23.798 -22.169  1.00 10.02
ATOM   2570  O   PHE B 175      10.664 -23.613 -22.103  1.00 10.37
ATOM   2571  CB  PHE B 175       9.928 -26.251 -21.848  1.00  9.53
ATOM   2572  CG  PHE B 175      10.969 -26.137 -22.982  1.00 10.03
ATOM   2573  CD1 PHE B 175      12.356 -25.819 -22.988  1.00 10.51
ATOM   2574  CD2 PHE B 175      11.725 -27.211 -23.402  1.00 10.25
ATOM   2575  CE1 PHE B 175      11.821 -27.095 -22.869  1.00 11.17
ATOM   2576  CE2 PHE B 175      12.282 -26.086 -24.008  1.00 10.95
ATOM   2577  CZ  PHE B 175      10.953 -26.335 -23.622  1.00 11.38
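One way to test whether "all looks normal" is to compute inter-atomic distances from the coordinates on slide 10. The Python sketch below checks the six ring bonds of PHE B 175 and the three distances between opposite (para) ring atoms, which should be roughly 2.8 Å in a planar six-membered ring. The thresholds (1.30 to 1.50 Å for ring bonds, 2.5 Å minimum for para pairs) are illustrative assumptions, not PDBe's validation criteria:

```python
import math

# PHE B 175 ring atoms, coordinates (in Angstrom) as transcribed on slide 10.
RING = {
    "CG": (10.969, -26.137, -22.982),
    "CD1": (12.356, -25.819, -22.988),
    "CD2": (11.725, -27.211, -23.402),
    "CE1": (11.821, -27.095, -22.869),
    "CE2": (12.282, -26.086, -24.008),
    "CZ": (10.953, -26.335, -23.622),
}

# Phenylalanine ring connectivity: CG-CD1-CE1-CZ-CE2-CD2-CG.
BONDS = [("CG", "CD1"), ("CD1", "CE1"), ("CE1", "CZ"),
         ("CZ", "CE2"), ("CE2", "CD2"), ("CD2", "CG")]
# Pairs of atoms on opposite sides of the ring (para positions).
PARA = [("CG", "CZ"), ("CD1", "CE2"), ("CE1", "CD2")]


def distance(a: str, b: str) -> float:
    """Euclidean distance between two named ring atoms."""
    return math.dist(RING[a], RING[b])


def check_ring(bond_range=(1.30, 1.50), min_para=2.5):
    """Return a list of human-readable problems; thresholds are illustrative."""
    problems = []
    low, high = bond_range
    for a, b in BONDS:
        d = distance(a, b)
        if not low <= d <= high:
            problems.append(f"bond {a}-{b}: {d:.2f} A")
    for a, b in PARA:
        d = distance(a, b)
        if d < min_para:
            problems.append(f"para {a}...{b}: {d:.2f} A")
    return problems


for problem in check_ring():
    print(problem)
```

With the coordinates as transcribed, every ring bond comes out between about 1.38 and 1.42 Å, so a bond-length check passes, while all three para distances come out below 1.1 Å. A check on bonded geometry alone would therefore not flag this residue.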

11 PHENYLALANINE: not quite an outlier! All looks normal?

12 PDBe Curation. Authentication of source: that the protein is from human and not rabbit, for example! Authentication of structure: comparison of the structure against the raw data; geometry and stereochemistry; results provided back to the depositor. Validation that the correct methodology was used, whether X-ray, NMR or EM. Conformity to standards: follows the PDB format specifications. Error checks: consistency checks, to identify simple typos (Homo sapiens and not Homo sapien: a single human?); outlier detection, to identify suspect records.

13 Adopt standards: use the NCBI Taxonomy database to ensure correct organism names; use the UniProt database to ensure correct protein descriptions; the Enzyme database; annotated ligand information.
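A consistency check for typos such as "Homo sapien" amounts to comparing free-text names against a controlled vocabulary. A minimal sketch using Python's standard `difflib` module; the four-name `SCIENTIFIC_NAMES` list is a hypothetical stand-in for a real taxonomy export, and the 0.85 similarity cutoff is an assumption:

```python
import difflib

# Hypothetical excerpt of a controlled vocabulary of scientific names; a real
# pipeline would load the full list from a taxonomy database.
SCIENTIFIC_NAMES = [
    "Homo sapiens",
    "Oryctolagus cuniculus",
    "Escherichia coli",
    "Mus musculus",
]


def check_organism(name, vocabulary=SCIENTIFIC_NAMES, cutoff=0.85):
    """Classify a deposited organism name against the controlled vocabulary.

    Returns ("ok", name) for an exact match, ("typo", suggestion) when a
    controlled name is close enough, and ("unknown", None) otherwise.
    """
    if name in vocabulary:
        return "ok", name
    by_lower = {entry.lower(): entry for entry in vocabulary}
    candidate = name.strip().lower()
    if candidate in by_lower:
        return "typo", by_lower[candidate]  # case or whitespace difference only
    close = difflib.get_close_matches(candidate, list(by_lower), n=1, cutoff=cutoff)
    if close:
        return "typo", by_lower[close[0]]
    return "unknown", None


print(check_organism("Homo sapien"))
```

Fuzzy matching only proposes a correction. A curator still has to confirm it, because a close match can also be a genuinely different organism.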

14 What happens when these checks fail? Raise the issue with the depositor. But the depositor might: be unavailable; not be interested; not know the answer anyway; not be sure which data have the problem. The older the entry, the less likely it is that the depositor can or will help.

15 What is the solution? Don't rush to define another format. Represent the structure data in a meaningful way (use a data model).

16 The benefits of a database. Historically, data have been curated as flat files, with few, if any, checks on the consistency of the archive. There are many problems with the legacy files: some can be corrected, or at least detected, automatically during database loading; many must be corrected manually prior to loading. Once loaded, the entire archive can be subjected to various all-against-all comparisons that further enforce uniformity across entries. Spelling errors abound, e.g. 23 versions of this humble bug, ESCHERICHIA COLI, including "$COLI", "COLI", "E. COLI", "ESCHERCHIA COLI", "ESCHERICHI $COLI", "ESCHERICHIA $ COLI", "ESCHERICHIA COLI", "ESCHERICHIA COLI.", "EXCHERICHIA COLI" and "EXPRESCHERICHIA COLI".
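The E. coli example can be used to sketch both steps: a cheap automatic clean-up of the kind that can run during loading, followed by grouping names whose cleaned forms are similar. The grouping below is a greedy single-pass clustering (each name is compared with the representatives found so far) using `difflib.SequenceMatcher`, chosen for illustration only; the `normalise` and `cluster_variants` helpers and the 0.8 cutoff are hypothetical:

```python
import difflib
import re


def normalise(name: str) -> str:
    """Automatic clean-up: upper-case, treat '$' and '.' as spaces, collapse runs of spaces."""
    cleaned = re.sub(r"[$.]", " ", name.upper())
    return " ".join(cleaned.split())


def cluster_variants(names, cutoff=0.8):
    """Greedy grouping: each name joins the first cluster whose representative
    is similar enough (SequenceMatcher ratio >= cutoff), else starts a new one."""
    clusters = []  # list of (representative, [original names])
    for name in names:
        key = normalise(name)
        for representative, members in clusters:
            if difflib.SequenceMatcher(None, key, representative).ratio() >= cutoff:
                members.append(name)
                break
        else:
            clusters.append((key, [name]))
    return clusters


variants = [
    "ESCHERICHIA COLI", "ESCHERCHIA COLI", "EXCHERICHIA COLI", "ESCHERICHIA COLI.",
    "ESCHERICHIA $ COLI", "EXPRESCHERICHIA COLI", "E. COLI",
]
for representative, members in cluster_variants(variants):
    print(representative, members)
```

With these variants the misspellings collapse into one cluster, but "E. COLI" stays on its own: an abbreviation is not a typo, so it needs a synonym table (such as the synonyms a taxonomy database records) rather than edit distance.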

17 Benefits: ligand nomenclature. PDBe maintains a curated database of HET compounds, against which legacy data will be compared. Ligands are often named inconsistently or even entirely incorrectly, e.g. α-D-mannose (MAN) vs β-D-mannose (BMA). Errors are detected using a graph-based structure comparison algorithm. [Figure: beta and alpha anomers]
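Because α- and β-D-mannose share the same atoms and bonds and differ only in the configuration at C1, a ligand check has to compare connectivity and stereochemistry separately. The sketch below uses Weisfeiler-Lehman colour refinement, a standard graph-hashing technique swapped in for illustration (the slide does not say which graph algorithm PDBe uses). The heavy-atom graph and the "alpha"/"beta" tag on C1 are simplified, hypothetical representations:

```python
import hashlib
from collections import Counter

# Simplified heavy-atom graph of a mannopyranose (hydrogens omitted).
EDGES = [
    ("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5"), ("C5", "O5"), ("O5", "C1"),
    ("C1", "O1"), ("C2", "O2"), ("C3", "O3"), ("C4", "O4"), ("C5", "C6"), ("C6", "O6"),
]


def sugar_labels(anomer=None):
    """Label each atom with its element; optionally tag C1 with an anomer flag
    (a hypothetical stand-in for a real stereo descriptor)."""
    atoms = {atom for edge in EDGES for atom in edge}
    labels = {atom: atom[0] for atom in atoms}
    if anomer is not None:
        labels["C1"] = "C/" + anomer
    return labels


def wl_signature(labels, edges, iterations=3):
    """Weisfeiler-Lehman colour refinement: repeatedly recolour each atom from
    its own colour plus the sorted colours of its neighbours, then return the
    multiset of final colours."""
    neighbours = {atom: [] for atom in labels}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    colours = dict(labels)
    for _ in range(iterations):
        refined = {}
        for atom, colour in colours.items():
            around = ",".join(sorted(colours[n] for n in neighbours[atom]))
            digest = hashlib.sha1(f"{colour}|{around}".encode()).hexdigest()
            refined[atom] = digest[:16]
        colours = refined
    return Counter(colours.values())


def same_graph(labels_a, labels_b, edges=EDGES):
    """True if the two labelled graphs get identical WL signatures."""
    return wl_signature(labels_a, edges) == wl_signature(labels_b, edges)


print(same_graph(sugar_labels(), sugar_labels()))               # connectivity only
print(same_graph(sugar_labels("alpha"), sugar_labels("beta")))  # with C1 configuration
```

Without the C1 tag the two anomers hash identically; with it they differ, so once the configuration at C1 has been determined from the coordinates, a residue named MAN that is actually β can be flagged. Equal WL signatures are necessary but not sufficient for two graphs to be isomorphic, so a production check would rely on a full canonical form that includes stereochemistry (InChI is one example).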

18 PDBe Relational Database [diagram: PDBe structure deposition site; database support & development; services; wwPDB partners PDB@RCSB, MSD@EBI and PDBj; annotation; rationalization of data; reference, search and retrieval via the WWW]

19 Database organization [diagram: external processes supply PDB files and derived data; these are loaded into the PDBe search database, with validation against the mmCIF dictionary; the database is queried with SQL; the entire archive is loaded in 24 hours]

20 The PDBe database: is organized on the mmCIF data model and mirrors the content, hierarchy and organization of the mmCIF format; can be loaded from scratch, for all 54,000 current PDB entries, in less than 24 hours; is enriched by the addition of external "derived data". Derived data include cross-references to CATH, SCOP, UniProt, GO, etc. This process also includes characterization of ligand-binding sites, derivation of secondary structure information, and derivation of quaternary structure using PISA.
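Loading with validation against a dictionary, into tables that mirror mmCIF categories, can be sketched with Python's standard `sqlite3` module. Assumptions: one category becomes one table; the mmCIF text is simplified (a single `loop_`, no quoted values); and `DICTIONARY` is a hypothetical stand-in for the real PDBx/mmCIF dictionary. The values reuse three atoms from slide 10:

```python
import sqlite3

# Simplified mmCIF fragment: one loop_, whitespace-separated values, no quoting.
MMCIF = """\
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.B_iso_or_equiv
2567 N  PHE 7.821 8.71
2568 CA PHE 8.845 9.41
2569 C  PHE 9.449 10.02
"""

# Hypothetical stand-in for the dictionary: item name -> Python type.
DICTIONARY = {
    "id": int,
    "label_atom_id": str,
    "label_comp_id": str,
    "Cartn_x": float,
    "B_iso_or_equiv": float,
}
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


def parse_loop(text):
    """Split a single loop_ into (category, item names, rows of raw strings)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines[0] != "loop_":
        raise ValueError("expected loop_")
    tags = [line for line in lines[1:] if line.startswith("_")]
    category = tags[0].split(".")[0].lstrip("_")
    names = [tag.split(".", 1)[1] for tag in tags]
    values = " ".join(line for line in lines[1:] if not line.startswith("_")).split()
    if len(values) % len(names):
        raise ValueError("value count is not a multiple of the item count")
    rows = [values[i:i + len(names)] for i in range(0, len(values), len(names))]
    return category, names, rows


def validate(names, rows):
    """Convert every value to its declared type; unknown items or bad values raise."""
    for name in names:
        if name not in DICTIONARY:
            raise KeyError(f"item not in dictionary: {name}")
    return [tuple(DICTIONARY[n](v) for n, v in zip(names, row)) for row in rows]


def load(text, conn):
    """Validate first, then create one table per category and insert the rows."""
    category, names, rows = parse_loop(text)
    typed_rows = validate(names, rows)
    columns = ", ".join(f"{n} {SQL_TYPES[DICTIONARY[n]]}" for n in names)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {category} ({columns})")
    marks = ", ".join("?" for _ in names)
    conn.executemany(f"INSERT INTO {category} VALUES ({marks})", typed_rows)
    return category


conn = sqlite3.connect(":memory:")
load(MMCIF, conn)
print(conn.execute(
    "SELECT label_atom_id, B_iso_or_equiv FROM atom_site WHERE B_iso_or_equiv > 9 ORDER BY id"
).fetchall())
```

A value that cannot be converted to its declared type raises `ValueError` before anything is inserted, which is the point of validating during loading rather than afterwards.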

21 Some Implementation Issues: The PDBe database is large and complex: 50,000+ PDB entries; 40+ tables in the warehouse, many very large; cross-referenced against SwissProt, PubMed, etc. We need to expose as much of the data as possible without making the interface too complex, with tools for different categories of end user: "novice", experienced and expert users.

22 PISA biological assemblies; PDBeChem ligand data; electron density visualisation; AstexViewer; PDBePro, PDBelite; fold matching; PDBeMotif; linking to domain data, eFamily; sequence mapping, SIFTS.

23 Quaternary Structure: PISA provides an automated method for determining putative protein complexes from PDB entries. Crystal symmetry matrices are applied to a protein structure, and possible complexes are detected by considering buried surface areas. PISA assignments form the basis of the assemblies given for a PDB entry in PDBe.
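Applying a crystal symmetry operator maps every coordinate x to R·x + t. A minimal sketch in plain Python, assuming the operator has already been converted to Cartesian form (real space-group operators act on fractional coordinates and need the unit-cell orthogonalization matrix first). `close_contacts` is a hypothetical and much cruder proxy for the buried-surface calculation PISA actually performs, and the two-atom "chain" is made up:

```python
import math


def apply_operator(rotation, translation, coords):
    """Return R.x + t for every coordinate x.

    Assumes the operator is already expressed in Cartesian coordinates.
    """
    return [
        tuple(sum(rotation[i][j] * x[j] for j in range(3)) + translation[i] for i in range(3))
        for x in coords
    ]


def close_contacts(chain_a, chain_b, cutoff=4.0):
    """Count atom pairs closer than cutoff (Angstrom): a crude stand-in for interface size."""
    return sum(1 for p in chain_a for q in chain_b if math.dist(p, q) <= cutoff)


# Hypothetical two-fold rotation about the z axis, no translation.
TWOFOLD_Z = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))
chain = [(1.0, 0.0, 5.0), (2.5, 1.0, 6.0)]
mate = apply_operator(TWOFOLD_Z, (0.0, 0.0, 0.0), chain)
print(mate, close_contacts(chain, mate))
```

Generating every symmetry mate this way and scoring each interface gives the pool of candidate assemblies that the next step has to rank.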

24 PISA: complex divining! [diagram labels: ASU contents; expand crystal symmetry; analyze surface and contacts; possible assemblies; best!!] Criterion: loss of accessible surface area >10% of the total surface. True complexes also look good!
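The >10% rule on slide 24 is simple arithmetic once accessible surface areas (ASA) are known. This sketch assumes "total surface" means the summed ASA of the isolated chains, and `pick_best` is a hypothetical ranking rule (largest buried fraction among candidates that pass the threshold); PISA itself ranks assemblies with a more elaborate thermodynamic model. The ASA values are made up:

```python
def buried_fraction(chain_asa, assembly_asa):
    """Fraction of the isolated chains' total accessible surface area (ASA)
    that is lost when they form the assembly."""
    total = sum(chain_asa)
    if total <= 0:
        raise ValueError("total ASA must be positive")
    return (total - assembly_asa) / total


def pick_best(candidates, threshold=0.10):
    """Hypothetical ranking: among candidates that bury more than `threshold`
    of their surface, return the name of the one that buries the most."""
    passing = {
        name: buried_fraction(chains, assembly)
        for name, (chains, assembly) in candidates.items()
        if buried_fraction(chains, assembly) > threshold
    }
    return max(passing, key=passing.get) if passing else None


# Made-up ASA values in square Angstrom: (ASA of each isolated chain, ASA of the assembly).
candidates = {
    "dimer": ([10000.0, 10000.0], 17000.0),
    "tetramer": ([10000.0] * 4, 37000.0),
    "crystal contact": ([10000.0, 10000.0], 19500.0),
}
print(pick_best(candidates))
```

Here the dimer buries 15% of its surface, while the tetramer (7.5%) and the crystal contact (2.5%) fall below the threshold.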

25 PISA assemblies [figure: 1E94 PISA assembly]

26 PDBe Searches. Biobar: a Mozilla/Netscape toolbar application for searching the MSD; PDBelite: a web-form application for searching the MSD; PDBepro: an applet for searching the MSD; PDBechem: the complete collection of all the chemical species and small molecules in the PDB; EMsearch: a search tool for electron microscopy depositions; PDBefold: a Secondary Structure Matching (SSM) tool for protein structure comparison; PDBesite: an active-site database search; PDBemotif: 3D structural motif search.

27 Query capabilities in PDBe: browsing (click and read); simple search: select records with some constraints (Biobar); more elaborate search: select specific fields of some records with constraints on some fields (PDBelite); complex querying: the ability to return an answer that results from a "live" computation and was not part of any record in the database (PDBepro).
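The three search levels beyond browsing map loosely onto SQL over a single table. The sketch below uses an in-memory `sqlite3` database with a hypothetical `entry` table and made-up rows (ENT1 to ENT4 are placeholders, not PDB IDs); it illustrates the distinction only and is not how Biobar, PDBelite or PDBepro are implemented:

```python
import sqlite3

# Hypothetical table and made-up rows; ENT1..ENT4 are placeholders, not PDB IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (entry_id TEXT, method TEXT, resolution REAL, organism TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        ("ENT1", "X-RAY DIFFRACTION", 1.8, "Homo sapiens"),
        ("ENT2", "X-RAY DIFFRACTION", 2.5, "Escherichia coli"),
        ("ENT3", "SOLUTION NMR", None, "Homo sapiens"),
        ("ENT4", "X-RAY DIFFRACTION", 1.5, "Mus musculus"),
    ],
)


def simple_search(organism):
    """Simple search: whole records that satisfy a constraint."""
    return conn.execute(
        "SELECT * FROM entry WHERE organism = ? ORDER BY entry_id", (organism,)
    ).fetchall()


def field_search(method, max_resolution):
    """More elaborate search: selected fields, constraints on several fields."""
    return conn.execute(
        "SELECT entry_id, resolution FROM entry"
        " WHERE method = ? AND resolution < ? ORDER BY resolution",
        (method, max_resolution),
    ).fetchall()


def computed_summary():
    """Complex query: an answer computed on the fly that is stored in no record."""
    return conn.execute(
        "SELECT method, COUNT(*), AVG(resolution) FROM entry GROUP BY method ORDER BY method"
    ).fetchall()


print(simple_search("Homo sapiens"))
print(field_search("X-RAY DIFFRACTION", 2.0))
print(computed_summary())
```

The last query returns per-method counts and mean resolutions that exist in no stored row, which is the sense in which a "live" computation differs from retrieving records.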

28 PDBe provides: clean biological data; integrated data; a single web access point; query interfaces for different users (beginner, occasional or expert); interconnected views of the data relating structure, sequence, text and experimental details.

29 A database for all Search database

