EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 www.eudat.eu Data Preservation.

Data Preservation: B2SAFE, B2HANDLE
Data Preservation team: Claudio Cacciari (Cineca), Tobias Weigel (DKRZ), Maarten Hoogerwerf (DANS)

B2 SERVICE SUITE

B2SAFE: Replicate Research Data Safely (eudat.eu/b2safe)
B2SAFE is a robust, safe and highly available service which allows community and departmental repositories to implement data management policies on research data across multiple administrative domains in a trustworthy manner.
Data Management Policies team: Claudio Cacciari (Cineca), Robert Verkerk (SURFsara), Elena Erastova (MPCDF), Javier Quinteros (GFZ), Adil Hasan (SIGMA)

The ideal solution for communities with no facility for archival to:
- replicate research data into secure data stores
- archive and preserve research data in the long term
- bring data close to powerful compute resources
- co-locate data with different communities
- benefit from economies of scale
Features:
- large-scale storage
- robust and highly available
- permanent PIDs

Who benefits
- Repositories lacking the capacity and/or funding to offer reliable storage and access services over longer periods of time
- Repositories without adequate compute capacity for data-intensive computational services based on their data
- Data producers who need to be sure that trusted centres are taking care of their data
- Consumers wishing to access optimized services on data sources of interest to them
- Consumers who wish to apply interdisciplinary data-intensive methods using data collected from various communities

How is it implemented?
The service relies on iRODS technology.

Storage Virtualization

Storage Virtualization - clients

Storage Virtualization - namespace

Distributed system

Metadata

Federation

How does it work?
EUDAT has built an additional layer on top of iRODS to streamline the processes that support replication and long-term data archiving.
iRODS + EUDAT B2SAFE package + back-end storage = B2SAFE service

EUDAT B2SAFE package
EUDAT B2SAFE package = rules + scripts

Data ingestion

Data replication

Data replication and asynchronous responses

Persistent Identifier registration

How can you use it? Exploiting remote services
You can connect to the service using one of the available clients and upload your data sets:
- GridFTP
- Web clients
- WebDAV

How can you use it? Integrating via API
You can connect your existing framework or tools via API. The EUDAT APIs (HTTP API) are work in progress.
Interfaces shown:
- B2SAFE API (iRODS)
- EUDAT HTTP API
- iRODS HTTP REST API
- iRODS Java API
- iRODS C API
- iRODS Python API

How can you use it? Joining the federation
The community deploys a local instance of the B2SAFE service and joins the EUDAT federation.
Sites shown: EUDAT data center A, EUDAT data center B, community data center X

What’s next?
- Authentication
- Authorization
- Domain-specific metadata

Authentication
Federated approach

Authorization
Some principles that we will follow:
- To reflect the permissions and roles of the community
- To be, as much as possible, interoperable with existing tools
Components shown: user attributes, user roles, authorization service (XACML)

XACML

Domain-specific metadata
The current B2SAFE implementation is able to manage system metadata. The next version will be aware of every kind of metadata (descriptive and domain-specific).
Support for domain-specific metadata will be introduced progressively:
- Initially, only the ingestion of metadata will be supported, so that the B2SAFE service is aware of it
- Later, the metadata will be indexed and made discoverable

Roadmap
The next B2SAFE release is expected by April. It will include support for the federated authentication approach of the EUDAT infrastructure and support for metadata ingestion. Authorization and metadata indexing will require more time.

Usage Patterns
- Register a PID for a DO
- Register PIDs for a whole collection recursively
- Replication
- Replication with asynchronous PID registration
- Integrity check between a DO and its replica
- Recover failed transfers from the logging system's queue
- Update the location of a DO in the PID record
- Update the checksum of a DO in the PID record
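
The "integrity check between a DO and its replica" pattern can be illustrated with a minimal Python sketch. This is not how B2SAFE implements the check (the service relies on iRODS rules and scripts); the function names and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="sha256", chunk_size=1 << 20):
    """Hash a binary stream chunk by chunk so large objects fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def replica_is_intact(original, replica, algorithm="sha256"):
    """Return True when both binary streams yield the same checksum."""
    return stream_checksum(original, algorithm) == stream_checksum(replica, algorithm)


# Hypothetical in-memory objects standing in for a DO and its replica.
original_object = io.BytesIO(b"research data")
replica_object = io.BytesIO(b"research data")
intact = replica_is_intact(original_object, replica_object)
```

In practice the streams would be opened from the original and the replicated file (for example with `open(path, "rb")`), and the resulting checksum is the value that would be written to the PID record.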

Preserve my data for 10 years
A data management policy applied to the data sets combines several usage patterns:
- Register PIDs for a whole collection recursively
- Replication
- Recover failed transfers from the logging system's queue
- Integrity check between a DO and its replica
- Update the location of a DO in the PID record
- Update the checksum of a DO in the PID record

Data Policy Manager: what is it?
The Data Policy Manager is a tool to:
- Define policies for data management, for example: replicate data set A from repository X to repository Y and keep the two replicas in sync with a sync frequency of six hours
- Validate the policies
- Distribute policies among the data centers
- Collect the status of the policies
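
The example policy from this slide (replicate data set A from repository X to repository Y, syncing every six hours) could be modelled roughly as below. This is a sketch only: the `ReplicationPolicy` class, its field names, and the validation rules are hypothetical and do not reflect the actual policy format used by the Data Policy Manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical in-memory representation of one replication policy."""
    dataset: str
    source: str
    target: str
    sync_interval_hours: int


def validate(policy: ReplicationPolicy) -> List[str]:
    """Return a list of problems; an empty list means the policy looks valid."""
    errors = []
    if not policy.dataset:
        errors.append("dataset must not be empty")
    if policy.source == policy.target:
        errors.append("source and target repositories must differ")
    if policy.sync_interval_hours <= 0:
        errors.append("sync interval must be a positive number of hours")
    return errors


# The policy described on the slide.
example_policy = ReplicationPolicy(
    dataset="A", source="X", target="Y", sync_interval_hours=6
)
```

Validation happens before distribution, so a policy that points a repository at itself or has no sync interval would be rejected before it reaches any data center.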

Data Policy Manager: how does it work?

B2HANDLE: Register your Research Data
PID services for EUDAT and user communities
Development team: Tobias Weigel, Merret Buurman (DKRZ); Robert Verkerk (SURFsara); Christos Kanellopoulos, Nicolas Liampotis, Themis Zamani (GRNET)

B2 SERVICE SUITE

Challenges in object management
1. Object locations change over time
2. Objects can migrate between repositories
3. Object access may be costly or restricted
4. Object management needs to be scalable and efficient
5. Object management must become independent from storage concerns

Sophisticated use of PIDs can address these challenges
- Persistent Identifiers provide a means to access objects over time and through changes in location and ownership
- EUDAT uses Handle System PIDs (Handles)
- Minimal metadata (PID/Handle records) can facilitate management tasks, enable access and information tools and increase end-user value
Example PID record for an object (treated as a black box):
IDENTIFIER: 11098/ABCDEF
LOCATION: ...
CHECKSUM: ...
REPLICA: ...
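
A small Python sketch can show why such a record helps when object locations change: the identifier stays fixed while the location entry is updated. The `PIDRecord` class, the in-memory registry, and the example URLs are hypothetical and only mirror the fields named on the slide; a real Handle record is stored on a Handle server.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PIDRecord:
    """Minimal metadata kept for one object, mirroring the slide's fields."""
    identifier: str
    location: str
    checksum: str
    replicas: List[str] = field(default_factory=list)


# Hypothetical in-memory registry standing in for a Handle server.
registry: Dict[str, PIDRecord] = {}


def register(record: PIDRecord) -> None:
    registry[record.identifier] = record


def resolve(identifier: str) -> str:
    """Return the current location of the object behind an identifier."""
    return registry[identifier].location


def migrate(identifier: str, new_location: str) -> None:
    """Update the location only; the identifier itself never changes."""
    registry[identifier].location = new_location


register(PIDRecord("11098/ABCDEF", "https://repo-x.example.org/obj", "0123abcd"))
migrate("11098/ABCDEF", "https://repo-y.example.org/obj")
```

Anyone holding `11098/ABCDEF` resolves to the new location after the migration without needing to know that the object moved.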

Exemplary use cases for PIDs
Persistent Identifiers for...
- Any kind of domain data object (e.g. audio files, spreadsheets, ...)
- Metadata objects (e.g. XML documents uploaded by users, metadata harvested from catalogs)
Specific usage includes...
- Linking data with metadata
- Linking different copies of the same data object (replica linking)

Handle structure
A Handle consists of a prefix and a suffix:
- Prefix: administrative namespace hosted on a specific server
- Suffix: usually generated by the service managing the object; can have any form
EUDAT follows a best practice of using UUID-based suffixes.
prefix/suffix: 11098/B41B295C-C39E-11E2-BC56-5A4013FEDFA4
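
The prefix/suffix structure can be sketched in a few lines of Python using the standard `uuid` module. The helper names are hypothetical, and the use of a random (version 4) UUID is an assumption; the slide only states that suffixes are UUID-based.

```python
import uuid
from typing import Tuple


def make_handle(prefix: str) -> str:
    """Build 'prefix/SUFFIX' with a freshly generated UUID as the suffix."""
    suffix = str(uuid.uuid4()).upper()
    return f"{prefix}/{suffix}"


def split_handle(handle: str) -> Tuple[str, str]:
    """Split a Handle at the first slash into (prefix, suffix)."""
    prefix, suffix = handle.split("/", 1)
    return prefix, suffix


# The example Handle from the slide.
example_prefix, example_suffix = split_handle(
    "11098/B41B295C-C39E-11E2-BC56-5A4013FEDFA4"
)
new_handle = make_handle(example_prefix)
```

Splitting at the first slash only is deliberate: the suffix can have any form, so it may itself contain slashes.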

Handle System organizational aspects
- The DONA Foundation administers the global Handle System namespaces (prefixes)
- DONA has approved an initial set of Multi-primary Prefix Administrators (MPAs):
  - CNRI
  - GWDG
  - Coalition for Handle Services – China
- MPAs have the right to issue new prefixes.

ePIC: Persistent Identifiers for eResearch
- A community of practice for PID services
- An organizational framework through which ePIC members can acquire Handle prefixes
  - GWDG acts as the MPA for ePIC
  - Contracts between ePIC members and GWDG
- ePIC is concerned with the quality of Handle services
  - Rules for prefix assignment and basic policies for PID management

B2HANDLE operations: EUDAT Handle server sites
WP6 operations contact: Dejan Vitlacil
You do not have to be an ePIC member to be a B2HANDLE server site.

The B2HANDLE Python library
b2handle: a Python library for interaction with EUDAT Handle services
- setuptools-enabled Python package
- Requires contact with one of the EUDAT Handle server sites
- Stable state; official release of v1.0, also for use by EUDAT user communities, in mid-February

B2HANDLE: Available at GitHub Code repository:

B2HANDLE documentation Technical documentation:

B2HANDLE library features
- Methods to read, create and modify Handles and their records
- Queries against the native Handle REST interface
- Support for multiple locations per object (10320/loc entries)
- Automatic management of Handle value indexes
- Support for Handle reverse lookup via an additional Java servlet
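
To give a feel for what a query against the native Handle REST interface returns, the sketch below builds a request URL and parses a Handle record using only the Python standard library. It does not use the b2handle library. The base URL is that of the global Handle proxy and is an assumption here (an EUDAT Handle server would have its own host); the sample response is illustrative data, not a real record.

```python
import json
from urllib.parse import quote

# Assumption: global Handle proxy; EUDAT sites run their own servers.
DEFAULT_BASE = "https://hdl.handle.net/api/handles/"

# Illustrative response in the JSON layout of the Handle REST interface.
SAMPLE_RESPONSE = """
{"responseCode": 1, "handle": "11098/ABCDEF", "values": [
  {"index": 1, "type": "URL",
   "data": {"format": "string", "value": "https://repo.example.org/objects/abcdef"}},
  {"index": 2, "type": "CHECKSUM",
   "data": {"format": "string", "value": "0123abcd"}}
]}
"""


def handle_api_url(handle, base=DEFAULT_BASE):
    """Return the REST URL for a Handle, keeping the prefix/suffix slash."""
    return base + quote(handle, safe="/")


def values_by_type(response_text):
    """Group the record's values by their type (e.g. URL, CHECKSUM)."""
    document = json.loads(response_text)
    grouped = {}
    for entry in document.get("values", []):
        grouped.setdefault(entry["type"], []).append(entry["data"]["value"])
    return grouped


record = values_by_type(SAMPLE_RESPONSE)
```

Each value in a record carries an index alongside its type; the library's automatic index management exists so that callers can work with types and keys, as in `record["URL"]`, rather than tracking index numbers themselves.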

Quality Assurance for B2HANDLE
Continuous Integration (CI):
- 132+ tests written, covering >86% of code
- Tests run automatically upon every commit (GRNET Jenkins instance)
- Technical documentation built and published on every change to master
Continued support and improvement:
- Patching and release process supported by CI
- Possible future split into a stand-alone pyhandle library published on PyPI

Unifying PID records across EUDAT
- All EUDAT PIDs should follow the same record structure and use the same keys and encodings
- Picking up ideas from the RDA PIT Working Group
- Normative descriptions in a growing document

EUDAT PID policies initiative: Motivation
The key element for PIDs is trust:
- Users need to trust the identifiers, which falls back onto the organizations managing them
- DOIs are citable due to policies and trust in them
Traditional use of PIDs (e.g. citable DOIs) does not match EUDAT data management requirements:
- Different models for object life cycles
- Objects may have a limited life span

EUDAT PID policies initiative: First suggestions
Build trust by enabling third parties to verify the PID service quality:
- Agreed and controlled Quality of Service (QoS)
- Transparent business processes for management of nodes, prefixes, PIDs and objects
- Ensure that no PID is lost (never delete PIDs)
Address user needs and sustain added value:
- Support for several object life cycle models (temporary objects, long-term archival, migration)
- Introduce governance and support mechanisms
- ...
PID policy document up for community review!

Questions?