EBI is an Outstation of the European Molecular Biology Laboratory. Annotation Procedures for Structural Data Deposited in the PDBe at EBI.


The Protein Data Bank in Europe (PDBe) group
Established in 1996 at the European Bioinformatics Institute to provide an autonomous structural database capability in Europe.
One of the four sites around the world where structural data can be deposited.
Maintains a stable and clean repository for macromolecular structure data.
Provides services that allow users to access, search and retrieve structural data from a single web access point.

Data Processing at PDBe
Depositor → AutoDep4.0 → “Raw” PDB file → Automated + Manual Curation → “Annotated” PDB file → Depositor’s comments → Structure release

Data Deposition at the PDBe using AutoDep4.0
Structure deposition and archival tool developed at the PDBe (EBI).
Based on Java/XML technology.
Freely available under license for academic and industry users.
Easy to install and use for in-house archiving before deposition to the PDB via the PDBe interface.

The Curation Process
Raw information obtained from the depositor:
a) atomic coordinates (proteins, nucleic acids, ligands, solvents)
b) source of the macromolecule
c) number of protein chains present in the asymmetric unit
d) experimental data (structure factor file)
Three phases of curation:
1) Automated curation
2) Manual curation
3) Final checks

Automated Curation
Consists of a series of programs written in Fortran and Perl.
Annotators contribute ideas and programs to improve the curation process.
We work in a Unix command-line interface.
The first step is a big wrapper.

The Wrapper
Automatically generates:
Chain ID for every HETATM and HOH (each gets the chain ID of the closest polypeptide chain)
Quaternary structure, according to PISA (REMARK 300 and 350)
Structure validation: close contacts (REMARK 500) and chirality checks
Solvent molecules that lie farther than expected from the protein (REMARK 525)
HELIX, SHEET, SSBOND and CISPEP records
Residue-by-residue mapping against the UniProt database
DOHLC output
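
The chain ID assignment for HETATM and HOH records can be illustrated with a minimal Python sketch. It is not the PDBe wrapper itself (which the slides describe as Fortran and Perl programs); the function names `parse_atom` and `assign_het_chains` are hypothetical, and "closest polypeptide chain" is assumed here to mean the chain of the nearest ATOM record by Euclidean distance.

```python
import math

def parse_atom(line):
    """Read the fields this sketch needs from a fixed-column ATOM/HETATM line."""
    return {
        "record": line[0:6].strip(),   # columns 1-6: record name
        "chain": line[21],             # column 22: chain ID
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

def assign_het_chains(lines):
    """Give every HETATM line the chain ID of the nearest ATOM record.

    Assumption: "closest polypeptide chain" is taken to mean the chain of
    the single nearest polymer atom; the real wrapper may use another rule.
    """
    polymer = [parse_atom(l) for l in lines if l.startswith("ATOM")]
    out = []
    for line in lines:
        if line.startswith("HETATM") and polymer:
            het = parse_atom(line)
            nearest = min(polymer, key=lambda a: math.dist(a["xyz"], het["xyz"]))
            # Overwrite only the chain ID column, leaving the rest untouched.
            line = line[:21] + nearest["chain"] + line[22:]
        out.append(line)
    return out
```

A brute-force nearest-atom search like this is quadratic in the number of atoms; a production tool would more likely use a spatial grid or tree.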

Contents of a Curated PDB file
Sequence-related information:
1) Sequences (SEQRES): all macromolecules present during crystallization, including expression tags and residues missing from the coordinates due to disorder.
2) Sequence database reference (DBREF): maps the sequence (SEQRES) to the UniProt database by FASTA alignment.

Macromolecular Structure Database: Checks made
Is the UniProt accession number correct? The sequence similarity between the UniProt sequence and the target sequence should be at least ~95%.
Identify the N- and C-terminal cross-references with UniProt and add fragment information (if any) to the COMPND record.
Merge the data from the UniProt entry into COMPND (molecule name), SOURCE (scientific name of the organism) and KEYWDS.
Add the EC number, if available.
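
The ~95% check can be sketched as follows. This assumes the two sequences have already been aligned (gaps written as `-`) and that similarity is measured as percent identity over columns where neither sequence has a gap; the slides do not say which measure PDBe uses, and both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

def passes_identity_check(aligned_a, aligned_b, threshold=95.0):
    """True when the identity meets the (assumed) 95% cut-off from the slide."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Excluding gapped columns means that expression tags or disordered termini present in only one sequence do not lower the score; a stricter definition would count them as mismatches.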

Curation procedures continued
1) If no sequence database reference is available, the sequence is self-referenced (i.e. the database reference will be the PDB entry itself).
2) Additional details regarding the sequence (gaps, cloning artifacts, structural disorder) are provided in REMARK 999.
3) Disagreements between a UniProt sequence and the sequence present in the PDB file (SEQADV) are marked as a) engineered mutation, b) conflict or c) microheterogeneity.
4) Residues missing from the coordinates: listed in REMARK 465
5) Non-hydrogen atoms missing from the coordinates: listed in REMARK 470
6) Zero-occupancy residues: REMARK 475
7) Zero-occupancy atoms: REMARK 480
8) Related PDB entries (same UniProt accession numbers) are listed in REMARK 900
9) Backbone discrepancies
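
Items 6 and 7 above can be illustrated with a short Python sketch that reads the occupancy field (columns 55-60) of ATOM/HETATM lines. The split used here is an assumption: a residue whose atoms all have zero occupancy is reported as a zero-occupancy residue (REMARK 475 candidate), while isolated zero-occupancy atoms in an otherwise occupied residue are reported individually (REMARK 480 candidates). The function name is hypothetical and insertion codes are ignored for brevity.

```python
from collections import defaultdict

def zero_occupancy_report(lines):
    """Return (zero_residues, zero_atoms) found in fixed-column PDB lines."""
    residues = defaultdict(list)
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Residue key: chain ID, residue number, residue name.
        key = (line[21], int(line[22:26]), line[17:20].strip())
        atom_name = line[12:16].strip()
        occupancy = float(line[54:60])
        residues[key].append((atom_name, occupancy))

    zero_residues, zero_atoms = [], []
    for key, atoms in residues.items():
        zeros = [name for name, occ in atoms if occ == 0.0]
        if zeros and len(zeros) == len(atoms):
            zero_residues.append(key)                        # whole residue at zero
        else:
            zero_atoms.extend((key, name) for name in zeros)  # individual atoms
    return zero_residues, zero_atoms
```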

Ligand Curation
Ligands interacting with a protein/DNA chain → substrate, product, inhibitor (drug molecule), metal ion, modified amino acid or nucleotide.
A MODRES record is added for modified amino acids and nucleotides that are part of the polymer (i.e. protein/DNA) chain.
Specialized software, DOHLC (Do Het Link and Connect records), is used to get the bond type, stereochemistry and IUPAC-compliant name for each ligand in the structure.
DOHLC is a graph-based structure comparison algorithm: it checks each ligand/HET against its dictionary definition and renames residues and atoms.
It generates REMARK 620 (metal coordination), LINK and CONECT records.
DOHLC fails on bad geometry, an incomplete ligand or a new HETGROUP.
If no match is found for a HETGROUP, a new ligand is created.
HETGROUP with missing atoms: REMARK 610
HETGROUP with zero-occupancy atoms: REMARK 615
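
The missing-atom check behind REMARK 610 can be sketched very simply. This is deliberately not the DOHLC algorithm, which the slide describes as graph-based; it only compares atom names against a dictionary entry, and so assumes the atoms have already been named according to the dictionary. The function name and the small dictionary are illustrative.

```python
# Illustrative dictionary: heavy-atom names for a sulfate ion (component SO4).
LIGAND_DICTIONARY = {
    "SO4": ["S", "O1", "O2", "O3", "O4"],
}

def missing_het_atoms(observed_atoms, dictionary_atoms):
    """List dictionary atoms absent from the deposited HET group, in dictionary order."""
    observed = set(observed_atoms)
    return [name for name in dictionary_atoms if name not in observed]

# A sulfate deposited with only three of its five atoms.
missing = missing_het_atoms(["S", "O1", "O2"], LIGAND_DICTIONARY["SO4"])
```

A name-based comparison breaks down when the depositor used different atom names, which is presumably why a graph comparison is used in practice.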

Generating Assembly Information
ASU contents → expand crystal symmetry → analyze surface and contacts → best possible assemblies
Loss of accessible surface area >10% of total surface. True complexes also look good!
Biological unit: the biologically relevant form of the molecule.
Quaternary structure: the way protein chains tend to associate with one another.
The matrices forming the quaternary structure are reported as BIOMT records in REMARK 350.
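
How BIOMT records are used can be shown with a minimal Python sketch: each operator is three rows (BIOMT1, BIOMT2, BIOMT3) of a rotation matrix plus a translation, and applying it to the ASU coordinates generates one copy of the assembly. The function names are hypothetical, and the parser assumes whitespace-separated fields, which can fail if large values run together in the fixed-width columns.

```python
def parse_biomt(remark_lines):
    """Collect REMARK 350 BIOMT rows into {operator_id: [row1, row2, row3]}."""
    operators = {}
    for line in remark_lines:
        fields = line.split()
        if (len(fields) == 8 and fields[:2] == ["REMARK", "350"]
                and fields[2].startswith("BIOMT")):
            row = int(fields[2][5]) - 1          # BIOMT1/2/3 -> row 0/1/2
            op_id = int(fields[3])
            # Each row holds three rotation terms followed by one translation term.
            operators.setdefault(op_id, [None] * 3)[row] = [float(v) for v in fields[4:8]]
    return operators

def apply_operator(operator, xyz):
    """Apply one 3x4 operator (rotation then translation) to a coordinate."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in operator)

remark_350 = [
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000       10.00000",
    "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]
operators = parse_biomt(remark_350)
# Operator 1 is the identity; operator 2 is a two-fold rotation about z plus a shift in x.
copies = [apply_operator(operators[i], (1.0, 2.0, 3.0)) for i in sorted(operators)]
```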

PISA assemblies
[Figure: PISA assembly for PDB entry 1E94]

Structure validation

Final Checks
Programs check for PDB format accuracy and internal consistency.
Manual check by another annotator.
Automatic generation of the letter to the depositor, plus manual addition of special comments.