Worldwide Protein Data Bank www.wwpdb.org September 7, 2007.

Presentation transcript:

Worldwide Protein Data Bank September 7, 2007

Agenda
- Welcome and introductions
- Accomplishments
- Remediation rollout summary
- Toward the future
- Break
- Matters arising
  – Incorrect structures
- Executive session
- Feedback to wwPDB
- Set next meeting date

wwPDB Achievements October September 2007
- Continued growth of archive
- Website updates
- Publications and presentations
- Time-stamped archive
- Remediation rollout
- Annotation document
- One stop shop: NMR, cryoEM

Depositions since wwPDB establishment

PDB entry processing
- ,997 entries in PDB
- Today 10-Jul ,578 entries in PDB
- The archive is now 4 times larger than when the 3 sites started
- In 1999, 2361 entries were deposited
- In 2006, 7282 entries were deposited
- We handle more than 3 times as many entries per year with fewer staff, and all wwPDB sites produce high-quality annotated PDB entries
- No current backlog of unprocessed entries
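
The "more than 3 times as many entries per year" claim can be checked against the two deposition counts quoted on the slide. A minimal sketch; the variable names are illustrative only:

```python
# Deposition counts quoted on the slide.
deposited_1999 = 2361
deposited_2006 = 7282

# Ratio of yearly depositions, 2006 relative to 1999.
ratio = deposited_2006 / deposited_1999
print(f"2006 depositions are {ratio:.2f}x the 1999 figure")
```

The ratio comes out slightly above 3, consistent with the slide's wording.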

Time-stamped copies of the archive
- 57 Gbytes of data for 2006, released January 2, Gbytes of data for July 2007 snapshot
- Both include
  – PDB format entries
  – mmCIF format entries
  – PDBML format entries
  – Experimental data
  – Dictionary, schema, and format documentation

Outreach
- wwPDB website
- Discussion forums
- NMR Task Force
- Publications
- Professional society meetings

Joint publications
- Nucleic Acids Research, 35: D301 (2007)
  – The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data
- Nature Structural & Molecular Biology, 14: 354 (2007)
  – Reply to: Building meaningful models of glycoproteins
- Nature Biotechnology, 25: 854 (2007)
  – Response to Overhauling the PDB
- Methods in Molecular Biology, in press
  – Data deposition and annotation at the wwPDB
- Structural Bioinformatics 2nd Edition, in press
  – The wwPDB

Interactions since October 2006
- Exchange visits
  – MSD/RCSB PDB (4)
  – PDBj/RCSB PDB (1)
  – PDBj/BMRB (2)
  – BMRB/RCSB PDB (1)
- Phone conference with site directors twice a year
- VTCs among staff
  – BMRB/RCSB PDB twice a month (ADIT-NMR)
  – MSD/RCSB PDB twice a week (annotation procedures, remediation)
  – RCSB PDB/PDBj and BMRB/PDBj on necessary occasions among staff
  – MSD/RCSB PDB ~2 per day
  – PDBj/RCSB PDB ~2 per day

New initiatives
- One stop shop for NMR data and models
- One stop shop for electron microscopy maps and models (NIH-funded)

Recommendations from 2006 wwPDBAC report
- Implement the recommendations from the November modeling workshop (Berman et al. Structure 14, )
  – Models phased out October 16, 2006
- Roll out remediated data to superusers by December 31, 2006, and to all users by July 1, 2007; provide access to PDB formatted files following the most current format
  – Superusers had access to data in November 2006, all users in April 2007

Recommendations from 2006 wwPDBAC report
- Work with SAXS community to create appropriate representation of these data, and circulate progress reports to the Committee as appropriate
  – Not done
- Expand the four character PDB ID codes before the number of depositions reaches 400,000
  – Number of available PDB ID codes has been increased by allowing IDs to start with a character
- Develop and present a formal recommendation to the wwPDBAC regarding the purview of the PDB at our September 2007 meeting in Princeton, NJ
  – In process
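
To give a rough sense of how much room the ID change could buy, here is a back-of-the-envelope count. It rests on assumptions that the slide does not state: that IDs are four case-insensitive alphanumeric characters, and that the first character was previously limited to the digits 1 to 9. Treat the numbers as illustrative, not as wwPDB policy.

```python
# Assumption: 26 letters + 10 digits, case-insensitive.
ALPHANUMERIC = 36
# Assumption (not from the slide): the first character was limited to digits 1-9.
LEADING_DIGITS = 9

# First character restricted, remaining three characters free.
restricted_ids = LEADING_DIGITS * ALPHANUMERIC ** 3
# First character may be any alphanumeric character.
expanded_ids = ALPHANUMERIC ** 4

print(f"restricted first character: {restricted_ids:,} possible IDs")
print(f"any first character:        {expanded_ids:,} possible IDs")
```

Under these assumptions the restricted scheme tops out a little above the 400,000 figure mentioned in the recommendation, and letting the first character vary freely multiplies the space by four.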

Recommendations from 2006 wwPDBAC report
- Coordinate with the wwPDBAC to obtain formal letters of support when seeking funding
- Establish a coordinated plan to both educate and lobby funding agency representatives
- Establish a charitable organization to serve as a conduit for receipt of both grant funding and gifts from pharmaceutical and biotechnology companies, involving individual Committee members as needed
  – Funding Representatives Round Table Discussion

Remediation

Key drivers
- Chemistry and nomenclature
- Sequence and taxonomy
- Citations
- Viruses

IUPAC, NMR, and the PDB
Atom nomenclature and NMR restraints
John L. Markley

History of the NMR-led requested remediation of hydrogen atom nomenclature
- When BMRB was established in the late 1980s, it adopted the IUPAC atom nomenclature recommendations from Biochemistry 9, , 1970
- At that time, we noted that NMR structures being deposited in the PDB did not adhere to these recommendations (particularly for H-atoms; e.g. HB1/HB2 instead of HB2/HB3), and I brought this to the attention of the director of the PDB at Brookhaven with the request that it be remedied
- A group of NMR spectroscopists led by Kurt Wüthrich worked with the NMR community to develop recommendations for the deposition of NMR structures; all agreed that the prior IUPAC recommendations be maintained (Pure & Appl. Chem., 70, , 1998)
- Over the years, the wwPDB Task Force on NMR has pushed strongly for remediation of atom nomenclature

Accomplished: atom nomenclature remediation
- Nomenclature in PDB now matches that in BMRB
- The single format will avoid confusion and errors
- All discrepancies have been resolved in the remediated files, with the minor exception of atoms at the C-terminus
    IUPAC-IUBMB-IUPAB    wwPDB
    H''                  HXT
    O'                   O
    O''                  OXT
  – Since these atoms are not observed by NMR spectroscopists, we do not consider this to be a problem
  – We plan to write an addendum to the IUPAC-IUBMB-IUPAB Recommendations for submission to Pure & Appl. Chem. to formalize these as accepted atom designators
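
The remaining C-terminus differences amount to a three-entry lookup. A minimal sketch of translating IUPAC-IUBMB-IUPAB designators to the wwPDB ones listed on the slide; the function and variable names are hypothetical, and only the three pairs from the slide are covered:

```python
# C-terminal atom designators as listed on the slide (IUPAC-IUBMB-IUPAB -> wwPDB).
IUPAC_TO_WWPDB_CTERM = {
    "H''": "HXT",
    "O'": "O",
    "O''": "OXT",
}


def to_wwpdb_name(atom_name: str) -> str:
    """Return the wwPDB designator for a C-terminal atom; other names pass through unchanged."""
    return IUPAC_TO_WWPDB_CTERM.get(atom_name, atom_name)


names = ["CA", "O'", "O''", "H''"]
converted = [to_wwpdb_name(name) for name in names]
print(converted)
```

Names that are not in the table are returned as given, which matches the slide's statement that all other discrepancies were already resolved.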

Remediation of NMR structure files
- Required the linking of structure files and restraint files
- Atom names, residue numbers and chain identifiers needed to be updated
- Remediation of restraint files required the unpacking, parsing, and regularization of legacy information contained in PDB MR files into the NMR Restraints Grid

NMR Restraints Grid development
- BMRB, University of Wisconsin-Madison, USA
- MSD, European Bioinformatics Institute, Hinxton, UK
- Department of Computer Sciences/Condor Project, University of Wisconsin, USA
- Department of NMR Spectroscopy, Utrecht University, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University, The Netherlands

NMR Restraints Grid development
- PDB MR files are converted into NMR-STAR
- The NMR-STAR file and the corresponding PDB coordinate file are parsed; the information is connected inside the CCPN framework; the results are written out as NMR-STAR files; and converted restraint files are filtered to remove redundant restraints
- Files are made available in the NMR Restraints Grid with access from links in each corresponding PDB entry
- NMR restraint data files with atom nomenclature corresponding to remediated PDB data files will be available by the end of 2007

Current state of the NMR Restraints Grid
- Grid contains 3583 entries with a total of 3,882,595 parsed restraints
- 3583 entries out of 6508 in PDB have restraints
- Database is updated continuously as new PDB entries are released that have associated NMR restraints
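
The figures on this slide translate into a coverage fraction and an average restraint count per entry. A small sketch of that arithmetic; it assumes the 6508 figure is the set of entries against which coverage is measured, and the variable names are illustrative:

```python
# Figures quoted on the slide.
grid_entries = 3583            # entries with parsed restraints in the Grid
entries_considered = 6508      # assumption: the population the slide compares against
parsed_restraints = 3_882_595  # total parsed restraints

coverage_pct = 100 * grid_entries / entries_considered
mean_restraints = parsed_restraints / grid_entries

print(f"coverage: {coverage_pct:.1f}% of entries")
print(f"mean restraints per Grid entry: {mean_restraints:.0f}")
```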

Recent agenda items considered by the wwPDB NMR Task Force
- Strongly recommend that restraints be mandatory for all NMR depositions to the PDB
- Commissioned the development of procedures for representing uncertainty in NMR structures and for specifying the single model meant to be most representative of the structure
- Task Force should write an article for J. Biomol. NMR on its recommendations for data representation and submission of experimental data
- It was suggested that the Task Force begin to discuss validation issues

Most X-ray structures are supported by structure factors

Less than half of NMR structures are supported by restraint data

Most structural genomics centers regularly provide restraints, but the overall average is low
[Chart: number of NMR structures deposited and percent of deposited structures with restraints, by structural genomics center]

Remediation rollout
Helen M. Berman

Remediation: scope and statistics
- All primary citations verified (45K)
- Sequences & taxonomy updated for 61K sequences
- Ligand stereochemistry and nomenclature for 13M monomers and 170K non-polymer molecules
- Symmetry and coordinate transformations for 280 virus entries
- diffraction source & beamline updates
- ~1000 miscellaneous uniformity issues

Remediation process
- Corrections contributed and reviewed by all wwPDB members
- Corrections on the archival mmCIF data files tracked in a version tracking system (CVS)
- New PDBx/mmCIF, PDBML-XML, and PDB format data files produced
- Validated by each wwPDB group
- Staged public testing began January 2007
- Iterative corrections based on external comments made through July 2007
- Remediated archive released August 1, 2007

Remediation-supporting infrastructure
- Internal (wwPDB) CVS archive of remediation data files
- Internal (wwPDB) rsync distribution site for remediated data files
- Early tests of web, rsync, & ftp distribution sites for dictionaries, PDB, mmCIF, and XML data files
- Complete wwPDB ftp site for remediated data and dictionaries, updated with remediation corrections and weekly PDB updates
- 200K CVS remediated data file updates
- 1M+ remediated file updates to support testing and distribution from January to present

Checking the remediated files
Haruki Nakamura

Different checks
- References to external databases
- Data processing consistency checks
- PDBML/XML validation
- Database loads
- User-contributed diagnostics

References to external databases
- Sequence and taxonomy (UniProt)
- Primary Citations (PubMed)

Data processing consistency checks
- Covalent geometry and stereochemistry
- Compliance with wwPDB Chemical Component Dictionary
  – Molecular and stereochemical assignment
  – Atom and residue nomenclature
- Compliance with PDB Exchange Dictionary
  – Data types, controlled vocabularies, parent-child relations
- External tools such as WhatIF
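
As an illustration of what an atom and residue nomenclature check might look like, the sketch below compares the atom names of a residue against an allowed set. The dictionary here is a hypothetical, heavily abridged stand-in (heavy atoms of two residues only), not the actual Chemical Component Dictionary, and `check_residue` is an invented helper name.

```python
from typing import Dict, List, Set

# Hypothetical, abridged stand-in for dictionary content: heavy atoms only.
COMPONENT_ATOMS: Dict[str, Set[str]] = {
    "GLY": {"N", "CA", "C", "O", "OXT"},
    "ALA": {"N", "CA", "C", "O", "CB", "OXT"},
}


def check_residue(res_name: str, atom_names: List[str]) -> List[str]:
    """Return a list of nomenclature problems for one residue (empty if none)."""
    allowed = COMPONENT_ATOMS.get(res_name)
    if allowed is None:
        return [f"unknown residue name: {res_name}"]
    return [
        f"{res_name}: atom name {name!r} not in dictionary"
        for name in atom_names
        if name not in allowed
    ]


ok = check_residue("ALA", ["N", "CA", "C", "O", "CB"])
bad = check_residue("GLY", ["N", "CA", "C", "O'"])
print(ok)
print(bad)
```

A real check would load the full dictionary and also cover stereochemical assignment, which this sketch leaves out.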

PDBML/XML schema validation
- Version control
- Data type consistency
- Data ranges
- Controlled vocabularies
- Referential integrity
- XPath traversal of PDBML data hierarchy
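
Three of the checks listed above (data ranges, controlled vocabularies, referential integrity) can be sketched with path queries over an XML document. The fragment below is a made-up, simplified structure and vocabulary, not the real PDBML schema; it uses the limited XPath subset of Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document; element and attribute names are illustrative.
DOC = """
<datablock>
  <entity id="1" type="polymer"/>
  <entity id="2" type="water"/>
  <atom_site id="1" entity_id="1" occupancy="1.00"/>
  <atom_site id="2" entity_id="3" occupancy="1.20"/>
</datablock>
"""

# Assumed controlled vocabulary for this sketch.
ALLOWED_ENTITY_TYPES = {"polymer", "non-polymer", "water"}


def validate(xml_text: str) -> list:
    """Collect vocabulary, range, and referential-integrity problems."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    entities = root.findall("./entity")
    entity_ids = {e.get("id") for e in entities}
    for e in entities:
        if e.get("type") not in ALLOWED_ENTITY_TYPES:
            problems.append(f"entity {e.get('id')}: type not in vocabulary")
    for a in root.findall("./atom_site"):
        # Referential integrity: every child must point at an existing parent.
        if a.get("entity_id") not in entity_ids:
            problems.append(f"atom_site {a.get('id')}: unknown entity_id")
        # Data range: occupancy is expected to lie between 0 and 1.
        if not 0.0 <= float(a.get("occupancy")) <= 1.0:
            problems.append(f"atom_site {a.get('id')}: occupancy out of range")
    return problems


problems = validate(DOC)
for p in problems:
    print(p)
```

Full schema validation would normally be delegated to an XSD-aware validator; this sketch only shows the kind of rule each bullet refers to.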

Database loads
- Diagnostics obtained from loading remediated data into existing database systems
  – Relational databases used by MSD-EBI and RCSB PDB
  – XML database used by PDBj

User-contributed diagnostics
- Batch checking of remediated files by Phenix revealed consistency issues with alternate conformations (Ralf Grosse-Kunstleve)
- Batch checking for inconsistent linkages and missing residues by docking software (Tommy Carstensen)
- Nomenclature (Tom Goddard & Chimera Group)
- Sequence and assembly diagnostics (Roland Dunbrack)
- Relational data integrity diagnostics (Dan Bosler)
- Nomenclature and experimental details (Clemens Vonrhein)
- Many specific issues related to chemical assignments, disorder, and nomenclature

Looking toward the future
Kim Henrick

Annotation project
- Standardize annotation rules and policies among wwPDB sites
- Document annotation rules and policies
- Create venue to update annotation rules and policies as necessary

Annotation project: How did we get there?
- Review and discussion of each PDB field by and VTC
- Document written and reviewed by all staff
- Final review by site directors
- Software compliant with new annotation procedures implemented
- Tested software and trained annotators
- Published document on web (January 2007)

Annotation document
- Specification of ALL fields in PDB file
- Clarification of policies
  – Assignment of PDB IDs
  – Release of files and information
  – Changes to entries
- Clarification of data representation
  – Chain ID for all atoms in the file
  – Multi-model representation for alternate conformation or disorder
  – Chimeras
  – Microheterogeneity

PDB IDs and DOIs
- Credit for a PDB entry in CVs
- Used as a reference in publications
  – db4hhb/pdb
- See also: DOIs for Biological Databases, Philip E. Bourne, CrossRef 7th Annual Meeting, 1 November 2006, Cambridge, MA
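
The fragment on the slide looks like the tail of an entry DOI. As a sketch, assuming the `10.2210/pdb<id>/pdb` pattern used for PDB entry DOIs, a DOI can be derived from a four-character ID as follows; `pdb_doi` is a hypothetical helper name:

```python
import re

# Assumed DOI prefix for PDB entries.
DOI_PREFIX = "10.2210"


def pdb_doi(pdb_id: str) -> str:
    """Build the entry DOI for a four-character PDB ID."""
    if not re.fullmatch(r"[0-9A-Za-z]{4}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{DOI_PREFIX}/pdb{pdb_id.lower()}/pdb"


doi = pdb_doi("4HHB")
print(doi)
```

The ID is lower-cased so that the same entry always maps to the same string.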

Outstanding issues
- Microheterogeneity
- Disorder
- Large structures

wwPDB and software developers
- ACA meeting, July 24, 2007, Salt Lake City
- Future Challenges for the PDB: What should the PDB be doing in 2015?
- Attended by software developers and wwPDB staff

July 24 meeting
- Technical discussions
  – TLS
  – Multiple models
  – Large structure demand for one file per structure
  – Microheterogeneity
  – Twinning
  – George Sheldrick, Paul Adams and Garib Murshudov produce a draft of the PDB format to describe twinning and to represent the data in HKLF
- Procedural outcomes
  – Yearly developer meeting
  – Editorial board to assist in difficult annotation problems
  – Ongoing electronic forum

Toward a single processing tool
- This weekend: wwPDB retreat with contributors from RCSB PDB Rutgers and UCSD, BMRB, PDBj, and EBI-EMBL
- Task: come to agreement to pool resources to produce a single deposition tool and design of new processing pipeline