Replicate Research Data Safely eudat.eu/b2safe www.eudat.eu B2SAFE How to replicate your data using EUDAT’s B2SAFE Version 3 November 2015

Presentation transcript:

Replicate Research Data Safely eudat.eu/b2safe B2SAFE How to replicate your data using EUDAT’s B2SAFE Version 3 November 2015 This work is licensed under the Creative Commons CC-BY 4.0 licence. Attribution: EUDAT –

B2SAFE is a robust, safe and highly available service that allows community and departmental repositories to implement data management policies on research data across multiple administrative domains in a trustworthy manner.

B2SAFE is part of EUDAT, a pan-European initiative that is building a sustainable cross-disciplinary and cross-national data infrastructure; providing a set of shared services for accessing and preserving research data; and supporting multiple research communities by working closely with them to deliver these technical services as part of the EUDAT Collaborative Data Infrastructure (CDI).

A truly pan-European infrastructure: EUDAT offers common data services to both research communities and individuals through a network of 35 European organisations. Our vision is to enable European researchers from any discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure. [Figure: European infrastructures, technology providers, research communities]

Community-driven solutions: EUDAT services are designed, built and implemented based on user community requirements.

Where is B2SAFE in the EUDAT suite? [Figure: the EUDAT service suite, with B2SAFE labelled "Replicate Research Data Safely"]

Better safe than sorry. Data are replicated to guard against data loss in long-term archiving and preservation, to optimize access for users from different regions, and to bring data closer to powerful computers for compute-intensive analysis. In today’s rich data-storage ecosystems, large data centres must offer a robust, safe and highly available replication service so that community and departmental repositories can replicate their research data: “I want to replicate my collection X to two data centres and store the collection safely for 10 years”.

The ideal solution for communities with no facility for archival to: replicate research data into secure data stores; archive and preserve research data in the long term; bring data close to powerful compute resources; co-locate data with different communities; benefit from economies of scale. Features: large-scale storage; robust and highly available; permanent PIDs.

B2SAFE Features (1/2): Based on the execution of auditable data policy rules and the use of persistent identifiers (PIDs). Respects the rights of the data owners to define the access rights for their data and to decide how and when they are made publicly referenceable. Employs the Data Policy Manager to allow centrally managed, community-defined data policies.

B2SAFE Features (2/2): Uses site rule engines to implement and enforce policy rules. Aggregates data from different disciplines into a storage system of trustworthy and capable data service providers. Supports repository packages (e.g. DSPACE, FEDORA) and a lightweight HTTP-based solution.

Who can benefit? Small and medium-sized repositories: lacking the capacity to store data over longer periods of time; without long-term funding for the preservation of their data; without adequate compute capacity for data-intensive computational services. Data producers and data consumers: who need to be sure that trusted centres are taking care of their data; who want to access added-value services on data sources of interest to them; who wish to perform interdisciplinary research on top of data from the heterogeneous EUDAT communities.

What makes B2SAFE unique: Data are stored in the EUDAT Collaborative Data Infrastructure (CDI) with known policies. Therefore, data are stored in transparent infrastructures across Europe. Communities can benefit from the professionally managed EUDAT infrastructure and concentrate their effort and budget on their core research. EUDAT is building a suite of additional services relevant for the “engine under the hood” of e-science infrastructures (e.g. EPOS, EMSO, CLARIN). Data are stored next to HTC & HPC servers, ideal for compute-intensive data processing.

How can you use B2SAFE? Any community or departmental data repository can use B2SAFE. EUDAT experts can help set up the following required technologies: Persistent Identifiers (PIDs); metadata describing the properties and context of the data being replicated; iRODS (recommended) or a similar data management technology for federation. To help these groups use the B2SAFE service, EUDAT offers documentation, training material and a service helpdesk. For more information please

Safe Replication with B2SAFE [Figure: the EUDAT CDI domain of registered data, showing data centre stores, PIDs and the EPIC service]

What happens? Data from the community repository is replicated in other data centres distributed across Europe.

What happens step by step? [Figure: a Digital Object (DO) from the community repository is ingested at the community centre (iRODS) and a unique identifier (PID) is assigned to the DO, using the community's own PID system or iRODS rules; based on community policy, the data is then replicated to Data Centre Store 1 and Data Centre Store 2 (both iRODS), with PID assignment at each store.]

Original DO and replicas. ROR: Repository of Records, the repository where the data was stored first. PPID: Parent PID, the persistent identifier associated with the source object in a replication chain. If the chain has only two elements, the master copy and the first replica, then PPID = ROR.
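The ROR and PPID definitions can be illustrated with a small model of a replication chain. This is only a sketch under stated assumptions: the `DigitalObject` class, the `ingest` and `replicate` helpers, and the PID prefix `11100` are hypothetical names chosen for illustration and are not part of B2SAFE; here the ROR field simply carries the PID of the first stored copy.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitalObject:
    pid: str                    # persistent identifier of this copy
    location: str               # where this copy is stored
    ror: Optional[str] = None   # PID of the copy in the Repository of Records
    ppid: Optional[str] = None  # PID of the direct parent in the chain


def new_pid(prefix: str = "11100") -> str:
    # Hypothetical PID minting: a made-up prefix plus a random suffix.
    return f"{prefix}/{uuid.uuid4()}"


def ingest(location: str) -> DigitalObject:
    # The master copy has no parent and is itself the record of origin.
    return DigitalObject(pid=new_pid(), location=location)


def replicate(source: DigitalObject, target_location: str) -> DigitalObject:
    # ROR always points back to the first copy; PPID points to the
    # object this replica was made from.
    ror = source.ror or source.pid
    return DigitalObject(pid=new_pid(), location=target_location,
                         ror=ror, ppid=source.pid)


master = ingest("community-repository")
replica1 = replicate(master, "data-centre-1")
replica2 = replicate(replica1, "data-centre-2")
```

For `replica1` the chain has two elements, so its PPID and ROR are the same identifier; for `replica2` the PPID is the PID of `replica1` while the ROR still refers to the master copy.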

EUDAT partners are already using B2SAFE

[Figure: community centres (CLARIN, ENES, VPH, Lifewatch, EPOS) and EUDAT centres (CINECA, BSC, EPCC); example request: "Replicate my collection X to three data centres"]

EPOS: EUDAT and the EPOS community set up a collaboration to provide safe back-up and service redundancy to the Italian seismologist community. The automated data transfer between the EPOS community and EUDAT was set up as follows: EPOS joined the EUDAT CDI; EUDAT defined a specific policy with EPOS; the iRODS irsync protocol was chosen to achieve the best performance. To achieve an hourly synchronization, the checksum sync and file-age limit options are used.
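The combination of a checksum comparison and a file-age limit can be sketched as a selection step that decides which files one synchronization pass needs to transfer. This illustrates the general idea only and does not reproduce how irsync works internally; `FileInfo`, `select_for_sync` and the sample file names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FileInfo:
    path: str
    content: bytes
    mtime: float  # last modification time, seconds since the epoch


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def select_for_sync(source: Iterable[FileInfo],
                    replica_checksums: Dict[str, str],
                    now: float,
                    max_age_minutes: int) -> List[str]:
    """Return the paths that a single sync pass should transfer."""
    cutoff = now - max_age_minutes * 60
    selected = []
    for f in source:
        if f.mtime < cutoff:
            continue  # older than the age limit: skipped in this pass
        if replica_checksums.get(f.path) == checksum(f.content):
            continue  # replica already holds identical content
        selected.append(f.path)
    return selected


now = 10_000.0
source_files = [
    FileInfo("a.mseed", b"unchanged", mtime=now - 600),
    FileInfo("b.mseed", b"new content", mtime=now - 600),
    FileInfo("c.mseed", b"old file", mtime=now - 7200),
]
replica = {
    "a.mseed": checksum(b"unchanged"),
    "b.mseed": checksum(b"previous content"),
}
to_transfer = select_for_sync(source_files, replica, now, max_age_minutes=60)
```

With a 60-minute age limit, only `b.mseed` is selected: `a.mseed` matches the replica's checksum and `c.mseed` falls outside the age window. Restricting each pass to recently modified files keeps the amount of checksum work small, which is one plausible reason such options suit an hourly schedule.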

How to replicate the INGV data to B2SAFE: the process. Each digital object ingested by CINECA is registered and assigned a Persistent Identifier (PID). Data are transferred with the iRODS irsync tool, running multiple irsync processes. The data archive so far amounts to 28.6 TB of files. The PIDs are registered in the PID registry, which is hosted at SURFsara and based on the EPIC service. [Figure: PID Registry; EUDAT CDI, CINECA node]

Experimental features. Messaging: the current B2SAFE implementation supports only a simple, synchronous messaging model. Messaging is an experimental feature that provides the results of an asynchronous (server-side triggered) replication process. The messages are posted to a queue that can be accessed via an HTTP interface. Metadata management: users who ingest data into B2SAFE via GridFTP are not able to retrieve the PID of the object. Metadata management is an experimental feature that supports this functionality. When enabled, it provides a set of metadata properties for each data object and stores them in a JSON file placed in (nearly) the same path as the related data object.
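The idea of a JSON metadata file stored next to each data object can be sketched as follows. The file-naming rule, the property names and the PID value are assumptions made for illustration; the slides do not specify the actual layout or schema used by B2SAFE.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_sidecar(data_path: Path, properties: dict) -> Path:
    # Hypothetical naming rule: "<object name>.metadata.json" in the
    # same directory as the data object.
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    sidecar.write_text(json.dumps(properties, indent=2, sort_keys=True))
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    data_object = Path(tmp) / "trace.dat"
    data_object.write_bytes(b"example payload")

    # Hypothetical property names and PID value.
    sidecar = write_metadata_sidecar(
        data_object,
        {"PID": "11100/example", "ROR": "11100/example"},
    )
    # A client that cannot query the PID directly could read it back
    # from the file stored beside the data object.
    loaded = json.loads(sidecar.read_text())
```

A user who can list and download files but cannot query the PID service directly could then recover the identifier by reading the file that sits beside the object.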

B2SAFE Summary. B2SAFE offers: functionality to replicate datasets across different data centres in a safe and efficient way; a long-term solution for archiving and preserving research data; an entry point to bring data closer to powerful computers for compute-intensive analysis.

Future features. Easy setup: B2SAFE provides a script to build rpm and deb packages; the plan is to provide downloadable, easy-to-install packages (i.e. click-install-run). New extensions (connectors): for now, it is possible to ingest into B2SAFE data stored on a file system or in a DSPACE repository; new connectors for FEDORA and ePRINTS are planned. Improve the service with “dynamic data” (streaming data) capabilities. Further integration with B2ACCESS. Support authorization on the basis of community access rules.

eudat.eu/b2safe For more info: B2DROP User Documentation: