Usecases: 1.ISIS Neutron Source 2.DP for HEP Matthew Viljoen STFC, UK APARSEN-EGI workshop: preserving big data for research Amsterdam Science Park 4-6.


1 Use cases: 1. ISIS Neutron Source 2. DP for HEP
Matthew Viljoen, STFC, UK
APARSEN-EGI workshop: Preserving big data for research
Amsterdam Science Park, 4-6 March 2014

2 Use Case 1: ISIS Neutron Source

3 Setting the Scene – Big Data at SCD, STFC
Solutions using CASTOR, DMF, SDB, Panasas and home-grown tools. Primarily Linux based.
Oracle StorageTek SL8500 robot with T10K (A-D) media.
18PB on tape and 9PB on disk (CASTOR); 6PB on disk (Panasas).
Users: High Energy Particle Physics (CERN users); STFC facilities (Diamond Synchrotron, ISIS Neutron Source, …).
Complete end-to-end data solution offered for large-scale facilities: data ingest, data archival, metadata, portal for data retrieval and DOI services.

4 ISIS Neutron Source
Pulsed neutron and muon source at RAL, Harwell, UK. Run by STFC.
~3000 scientists supported.
Research areas: clean energy and the environment; pharmaceuticals and health care; nanotechnology and materials engineering; catalysis and polymers; fundamental studies of materials.
Techniques: muon spectroscopy, neutron diffraction, neutron spectroscopy, neutron reflectometry, small-angle scattering.
Data collection: from KBs to GBs per visit. ~11TB to date. New experiments (e.g. IMAT) up to 2TB per visit.

5 ISIS Data Policy, Management and Access
Well-defined policy: http://www.isis.stfc.ac.uk/user-office/data-policy11204.html
3.1.1 All raw data and the associated metadata obtained as a result of free (non-commercial) access to ISIS, reside in the public domain, with ISIS acting as the custodian.
3.1.2 All raw data and the associated metadata obtained as a result of ‘commercial-in-confidence’ access to ISIS will be owned exclusively by the commercial user. Commercial users must agree with their relevant instruments scientists how they wish their raw data and metadata to be managed before the start of any experiment.
Also:
3.3.1 Access to raw data and metadata beyond the period that it is stored on instrument-related computers will be via a searchable on-line catalogue.
3.3.2 Access to the on-line catalogue will be restricted to those who register with STFC/ISIS as users of the on-line catalogue.

6 Data Management for ISIS DP Here!

7 Accessing Data via DOI – Landing Page

8 Accessing Data via DOI – Data Portal

9 Data Preservation solution – Tessella Safety Deposit Box (SDB)
Primary copy on disk (Windows file store), served to users on demand.
Copy of ALL data stored for long-term backup and preservation on tape using SDB by Tessella (and DMF).
SDB uses an OAIS SIP at ingest, which reads the NeXus standard file format.
NeXus validator checks data. Metadata generated. Well-defined data (see nexusformat.org; driven by the synchrotron/neutron scattering community).
Definable workflows for migration of data to new formats.
Continuous validation of data against ‘bit rot’.
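The continuous validation against bit rot mentioned on this slide is commonly done by recording a checksum for each file at ingest and recomputing it later. The following is a minimal sketch of that general technique, not a description of how SDB implements it. The file names, the choice of SHA-256 and the helper names (`sha256_of`, `build_manifest`, `find_damaged`, `demo`) are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose checksum no longer matches."""
    damaged = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(name)
    return damaged


def demo() -> list[str]:
    """Ingest two hypothetical files, corrupt one, and run the check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "run_0001.nxs").write_bytes(b"neutron counts")
        (root / "run_0002.nxs").write_bytes(b"muon counts")
        manifest = build_manifest(root)

        # Simulate silent corruption by flipping one bit in the second file.
        data = bytearray((root / "run_0002.nxs").read_bytes())
        data[0] ^= 0x01
        (root / "run_0002.nxs").write_bytes(bytes(data))

        return find_damaged(root, manifest)


if __name__ == "__main__":
    print(demo())
```

In a real archive the manifest would be stored separately from the data it describes, so that a single storage fault cannot damage both.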

10 Unresolved issues
Data Preservation is a dark archive. The mechanisms for accessing it are yet to be put in place.
Future data volume increase. How many copies? All on spinning disk?
Granularity of DOIs, and how do we relate datasets together (raw -> reduced -> derived)? What if they all have different DOIs?
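To make the DOI question concrete, one possible way to picture it is a catalogue in which every dataset has its own DOI and records the DOIs it was derived from. This is only an illustrative sketch and not a solution proposed in the talk. The DOIs below are made-up placeholders, and the `Dataset` class and `ancestors` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A catalogue entry: its own DOI, its processing stage, and its parents."""
    doi: str
    stage: str  # "raw", "reduced" or "derived"
    derived_from: list[str] = field(default_factory=list)


def ancestors(doi: str, catalogue: dict[str, Dataset]) -> list[str]:
    """Walk the derived_from links and return every upstream DOI once."""
    seen: list[str] = []
    stack = list(catalogue[doi].derived_from)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(catalogue[parent].derived_from)
    return seen


# Placeholder DOIs, invented for this example only.
catalogue = {
    d.doi: d
    for d in [
        Dataset("10.0000/example.raw", "raw"),
        Dataset("10.0000/example.reduced", "reduced", ["10.0000/example.raw"]),
        Dataset("10.0000/example.derived", "derived", ["10.0000/example.reduced"]),
    ]
}

if __name__ == "__main__":
    for upstream in ancestors("10.0000/example.derived", catalogue):
        print(upstream, catalogue[upstream].stage)
```

With links like these, different DOIs per stage need not be a problem in themselves: the chain from a derived result back to the raw data can still be followed, provided the links are recorded when each dataset is registered.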

11 Use case 1 – Summary
All ISIS data stored and available for download (with provisos in DM plan).
Data preservation in place for retaining data for a long period.
Scientists responsible for documentation/annotation of their data and provenance.

12 Use case 2 – DP for HEP
With thanks to Jamie Shiers for his input. Views expressed here are my own.

13 Intro to DP( )HEP
DPHEP is a study group focusing on data persistency and long-term analysis for HEP, including LHC data at CERN.
It has representation from many national labs and aims to converge on a common set of specifications for this.

14 The Problem
Particle accelerators are very expensive, e.g. €3bn for the LHC.
To maximize returns, we need to preserve data and knowledge to reproduce past analyses and perform new ones.
In the past, DP has been approached in a somewhat ‘ad hoc’ way.

15 Exascale Preservation
Current WLCG archives are 10s of PB (CERN has 100PB). Over the next two decades, estimates are up to 5EB.
Scaling up past DP successes, e.g.:
LEP: 10TB until 2000. Data/SW still available and usable.
Past DESY HERA experiments: 1PB preserved and usable.
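The figures on this slide imply a sustained growth rate, which can be worked out with simple compound-growth arithmetic. The sketch below assumes a starting point of 100PB (the CERN figure), a target of 5EB, a period of 20 years and decimal units (1EB = 1000PB). Those pairings are assumptions for illustration; the slide does not say which baseline the 5EB estimate refers to.

```python
import math


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time_years(rate: float) -> float:
    """Years needed for the volume to double at a constant yearly rate."""
    return math.log(2.0) / math.log(1.0 + rate)


START_PB = 100.0    # assumed baseline: the CERN figure from the slide
TARGET_PB = 5000.0  # 5EB expressed in PB, assuming decimal units
YEARS = 20          # "next 2 decades"

rate = annual_growth_rate(START_PB, TARGET_PB, YEARS)
doubling = doubling_time_years(rate)

if __name__ == "__main__":
    print(f"growth factor: {TARGET_PB / START_PB:.0f}x")
    print(f"implied annual growth: {rate:.1%}")
    print(f"implied doubling time: {doubling:.1f} years")
```

Under these assumptions the archive grows 50-fold, which corresponds to roughly 22% growth per year, or a doubling about every three and a half years.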

16 DPHEP Approach
1. Digital library tools & services, together with a portal
2. Sustainable software, coupled with advanced virtualization techniques and validation frameworks
3. Draw from proven past bit preservation successes, together with a sustainable funding model with an outlook to 2040/50
4. Open Data – over and above simple Open Access

17 Challenges
Not all HEP data is open. Experiments are reviewing their Open Data policies.
Training needs are different for different communities: DP service providers in bit preservation, software developers, scientists.
The documentation problem in DP HEP: who, what and how much.
Technological difficulties of DP HEP and scaling up to exascale.
Porting software is time-consuming. Will old software compile on new compilers?

18 Thank you
Questions? matthew.viljoen@stfc.ac.uk

