Use cases: 1. ISIS Neutron Source 2. DP for HEP. Matthew Viljoen, STFC, UK. APARSEN-EGI workshop: preserving big data for research, Amsterdam Science Park, 4-6 March 2014.

Presentation transcript:

Use cases: 1. ISIS Neutron Source 2. DP for HEP
Matthew Viljoen, STFC, UK
APARSEN-EGI workshop: preserving big data for research
Amsterdam Science Park, 4-6 March 2014

Use Case 1: ISIS Neutron Source

Setting the Scene – Big Data at SCD, STFC
Solutions using CASTOR, DMF, SDB, Panasas and home-grown software. Primarily Linux based.
Oracle StorageTek SL8500 robot with T10K (A-D) media
18PB on tape and 9PB on disk (CASTOR)
6PB on disk (Panasas)
Users:
  High Energy Particle Physics (CERN users)
  STFC Facilities (Diamond Synchrotron, ISIS Neutron Source, …)
Complete end-to-end data solution offered for large scale facilities: data ingest, data archival, metadata, portal for data retrieval and DOI services

ISIS Neutron Source
Pulsed neutron and muon source at RAL, Harwell, UK. Run by STFC.
~3000 scientists supported.
Research areas include clean energy and the environment, pharmaceuticals and health care, nanotechnology and materials engineering, catalysis and polymers, and fundamental studies of materials.
Techniques: muon spectroscopy, neutron diffraction, neutron spectroscopy, neutron reflectometry, small angle scattering
Data collection: from KBs to GBs per visit. ~11TB to date. New experiments (e.g. IMAT) up to 2TB per visit.

ISIS Data Policy, Management and Access
Well defined policy:
  All raw data and the associated metadata obtained as a result of free (non-commercial) access to ISIS reside in the public domain, with ISIS acting as the custodian.
  All raw data and the associated metadata obtained as a result of ‘commercial-in-confidence’ access to ISIS will be owned exclusively by the commercial user.
  Commercial users must agree with their relevant instrument scientists how they wish their raw data and metadata to be managed before the start of any experiment.
Also:
  Access to raw data and metadata beyond the period that they are stored on instrument-related computers will be via a searchable on-line catalogue.
  Access to the on-line catalogue will be restricted to those who register with STFC/ISIS as users of the on-line catalogue.

Data Management for ISIS DP Here!

Accessing Data via DOI – Landing Page

Accessing Data via DOI – Data Portal

Data Preservation solution – Tessella Safety Deposit Box (SDB)
Primary copy on disk (Windows File Store). Served to users on demand.
Copy of ALL data stored for long term backup and preservation on tape using SDB by Tessella (and DMF).
SDB uses a SIP (Submission Information Package, the OAIS term) at ingest, which reads the NeXus standard file format.
NeXus validator checks data. Metadata generated. Well defined data. (See nexusformat.org. Synchrotron/neutron scattering driven.)
Definable workflows for migration of data to new formats.
Continuous validation of data against ‘bit rot’.
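
The ‘bit rot’ validation mentioned above is commonly done with fixity checks: record a checksum for each file at ingest, then recompute and compare it periodically. The following is a minimal sketch of that general idea only. It is not how SDB or DMF implement validation, and the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths whose content is missing or no longer matches."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

In this sketch, build_manifest would run once when data enters the archive and verify_manifest would run on a schedule. Any path it returns would need to be restored from another copy.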

Unresolved issues
The data preservation store is a dark archive. Mechanisms for accessing it are yet to be put in place.
Future data volume increase. How many copies? All on spinning disk?
Granularity of DOIs, and how do we relate datasets together (raw->reduced->derived)? What if they all have different DOIs?
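
One possible way to think about the raw->reduced->derived question is to give every dataset its own DOI and record which DOIs it was produced from, so that lineage can be walked back to the raw data. The sketch below illustrates that idea only and is not the ISIS catalogue's data model. The Dataset class, the lineage function and the DOIs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    doi: str                      # hypothetical identifier, not a real DOI
    stage: str                    # "raw", "reduced" or "derived"
    derived_from: list[str] = field(default_factory=list)


def lineage(doi: str, catalogue: dict[str, Dataset]) -> list[str]:
    """Return every ancestor DOI of doi, nearest first, without duplicates."""
    seen: list[str] = []
    queue = list(catalogue[doi].derived_from)
    while queue:
        parent = queue.pop(0)
        if parent in seen:
            continue
        seen.append(parent)
        queue.extend(catalogue[parent].derived_from)
    return seen


# Illustrative catalogue with made-up DOIs.
catalogue = {d.doi: d for d in [
    Dataset("10.0000/example.raw.1", "raw"),
    Dataset("10.0000/example.reduced.1", "reduced", ["10.0000/example.raw.1"]),
    Dataset("10.0000/example.derived.1", "derived", ["10.0000/example.reduced.1"]),
]}
```

With links stored this way, datasets can keep different DOIs while a landing page still shows what each one was derived from.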

Use Case 1 - Summary
All ISIS data stored and available for download (with provisos in DM plan).
Data preservation in place for retaining data for a long period.
Scientists responsible for documentation/annotation of their data and provenance.

Use Case 2 – DP for HEP
With thanks to Jamie Shiers for his input.
Views expressed here are my own.

Intro to DP( )HEP
DPHEP is a study group focusing on data persistency and long term analysis for HEP, including LHC data at CERN.
With representation from many national labs, it aims to converge on a common set of specifications for this.

The Problem
Particle accelerators are very expensive, e.g. €3bn for the LHC.
To maximize returns, we need to preserve data and knowledge to reproduce past analyses and perform new ones.
In the past, DP has been approached in a somewhat ‘ad hoc’ way.

Exascale Preservation
Current WLCG archives are 10s of PB (CERN has 100PB).
Over the next 2 decades, estimates are up to 5EB.
Scaling up past DP successes, e.g.:
  LEP - 10TB until Data/SW still available and usable.
  Past DESY HERA experiments – 1PB preserved + usable
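
To get a feel for the scale-up above, the figures can be turned into an implied yearly growth rate. This is a rough sketch under stated assumptions: it starts from the 100PB CERN figure, treats 5EB as 5000PB (decimal units), reads "2 decades" as exactly 20 years, and assumes constant compound growth. None of these choices come from the slides.

```python
def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Constant yearly growth rate that turns start into end after years."""
    return (end / start) ** (1.0 / years) - 1.0


START_PB = 100.0    # assumption: start from the 100PB CERN figure
END_PB = 5000.0     # 5EB, assuming decimal units (1EB = 1000PB)
YEARS = 20.0        # assumption: "2 decades" read as exactly 20 years

rate = annual_growth_rate(START_PB, END_PB, YEARS)
print(f"Implied growth: {rate:.1%} per year")
```

Under these assumptions the implied rate is a little over 20% per year, which corresponds to the archive doubling every three to four years.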

DPHEP Approach
1. Digital library tools & services, together with a Portal
2. Sustainable software, coupled with advanced virtualization techniques and validation frameworks
3. Draw from proven past bit preservation successes together with a sustainable funding model with an outlook to 2040/50
4. Open Data – over and above simple Open Access

Challenges
Not all HEP data is open. Experiments are reviewing their Open Data policies.
Training needs are different for different communities:
  DP service providers in bit preservation
  Software developers
  Scientists
The documentation problem in DP HEP: who, what and how much.
Technological difficulties of DP HEP and scaling up to Exascale.
Porting software is time consuming. Will old software compile on new compilers?

Thank you
Questions?