Afternoon session: The archival problem and infrastructure for solutions Prof John R Helliwell Interactive Publications.

Presentation transcript:

Afternoon session: The archival problem and infrastructure for solutions
Prof John R Helliwell
Interactive Publications and the Record of Science
ICSTI Winter Workshop, Paris, Monday, February 8, 2010

JRH research, publications background
Professor of Structural Chemistry
DSc Physics
Approx 200 research papers; 5 books (2 as monographs)
Editor-in-Chief of journals published by IUCr (Acta Crystallographica, Journal of Applied Crystallography, Journal of Synchrotron Radiation)
IUCr Representative to ICSTI

What needs to be in place for interactive content to be available in the future?
– Emulation of legacy software environments?
– How to package, identify and interlink the independent components of a complex article?
– Can we handle distributed articles?
– Can we identify and retrieve slices through large archived data sets?
– How to work with changing data sets?
– What is worth keeping anyway?

The importance of data for publication
Interactive figures depend on data
Semantic value is added to data, or forms additional (meta)data
Fundamental principle of research publication: the work is reproducible
– exact experimental conditions are given
– data are preserved/accessible
– in a recent case of animal clones, ‘samples’ also had to be made available upon request
Increasing requirement to archive primary data

Data and publication in crystallography
A reasonable state of affairs...
– molecular models archived by journals (CIFs: interactive figures)
– reduced diffraction data preserved by databases or some journals (data validation; retracted papers)
... but with room for improvement
– molecular dynamics for the crystalline state difficult to interpret; whole diffraction images preferable for archiving
– scientific fraud in structural biology/chemistry: archiving of diffraction images provides better security against such frauds
– but diffraction images from crystal diffraction experiments are uncompressed and the file sizes are large, so there is limited appetite (and resources) to preserve them

Crystals, diffraction spots and smears, molecules and dynamics

Some archive technical details
Protein Data Bank: 60,000 macromolecular structures
– 80% derived from crystal structure analysis
– archive doubling in size every 2 to 3 years
– coordinate file for typical protein ~0.25 MB; derived from core diffraction data of 1 MB; extracted from ~1 GB of diffraction image data
– data sets need to be archived in quintuplicate (EBI Director to JRH Jan )
– thus 60,000 x 1 GB x 5 = 300 terabytes of primary data for PDB currently
– cost estimate for PDB to be the sole primary archive provider ca GBP 200,000 per annum: unable to take on this responsibility on
Currently researcher agrees to hold project diffraction images for at least 5 years and release them upon request; no archiving commitment from research sponsor
Solution in distributed or federated archives (experimental facilities / laboratories / data repositories)?
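
The storage estimate on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, decimal units (1 TB = 1000 GB) are assumed, and the projection assumes that the quoted doubling time for the archive applies unchanged to the raw image volume, which the slide does not state.

```python
# Figures quoted on the slide; decimal units assumed (1 TB = 1000 GB).
N_STRUCTURES = 60_000          # macromolecular structures in the PDB
IMAGES_GB_PER_STRUCTURE = 1.0  # ~1 GB of diffraction images per structure
COPIES = 5                     # archived in quintuplicate


def primary_archive_tb(n_structures, gb_per_structure, copies):
    """Total raw-image volume in terabytes, counting every archived copy."""
    return n_structures * gb_per_structure * copies / 1000


def projected_tb(current_tb, years, doubling_years):
    """Volume after `years` if the archive doubles every `doubling_years`.

    Assumption (not from the slide): raw image volume grows at the same
    rate as the number of deposited structures.
    """
    return current_tb * 2 ** (years / doubling_years)


current = primary_archive_tb(N_STRUCTURES, IMAGES_GB_PER_STRUCTURE, COPIES)
print(f"current primary data: {current:.0f} TB")

# Range implied by "doubling every 2 to 3 years", six years ahead.
for doubling in (2, 3):
    print(f"doubling every {doubling} y -> {projected_tb(current, 6, doubling):.0f} TB")
```

Under these assumptions the slide's 300 TB would grow to somewhere between 1,200 and 2,400 TB within six years, which is one way to read the case for distributed or federated archives.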