Workshop Data curation and the EUDAT Collaborative Data Infrastructure


1 Workshop Data curation and the EUDAT Collaborative Data Infrastructure
Amsterdam, Monday, February 22, 2016, 13:30-17:00
René van Horik (DANS), Luca Trani (KNMI), Marjan Grootveld (DANS), Sarah Jones (DCC)
This work is licensed under the Creative Commons CC-BY 4.0 licence. Attribution: EUDAT –

2 Data Curation and the EUDAT CDI
Goal of the workshop: To present and discuss the EUDAT Collaborative Data Infrastructure, with an emphasis on curation aspects Source:

3 Program
# | Time | Topic | Presenter
1 | 13:30-13:40 | Introduction | René van Horik (DANS)
2 | 13:40-14:20 | The EUDAT project and services |
3 | 14:20-15:00 | Use Case: How is EUDAT used in practice | Luca Trani (KNMI)
  | 15:00-15:30 | Break |
4 | 15:30-15:50 | Positioning the services | Marjan Grootveld (DANS)
5 | 15:50-16:30 | Data Management Planning | Sarah Jones (DCC)
6 | 16:30-17:00 | Discussion |

4 Introduction What are essential services for researchers and research communities to manage and curate research data?

5 EUDAT Project and Services

6 EUDAT: A pan-European Infrastructure
EUDAT offers common data services, supporting multiple research communities as well as individuals, through a geographically distributed, resilient network of 35 European organisations. EUDAT's vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure.

7 Community-Driven Solutions
EUDAT services are designed, built and implemented based on user community requirements.
- PHYSICAL SCIENCES & ENGINEERING
- SOCIAL SCIENCES & HUMANITIES
- MATERIALS & ANALYTICAL FACILITIES
- ENVIRONMENTAL SCIENCES
- MAPPER
- BIOMEDICAL & MEDICAL SCIENCES

8

9

10 Where is B2DROP in the B2 Service suite?
B2DROP lets you transfer data stored on B2DROP to other B2 services

11 An ideal solution for researchers and scientists to:
- store and exchange data with colleagues and team members, including research data not finalized for publishing
- share data with fine-grained access controls
- synchronize multiple versions of data across different devices
Features:
- 20GB storage per user
- living objects, so no PIDs
- versioning and offline use
- desktop synchronisation

12 What can users do? Users can
- have access to 20GB of storage space for research data
- access and manage files from any device and any location
- define with whom to exchange data, for how long and how

13 What type of data can I store?
- Any kind of temporary research data
- Primary data or processed data, data as the basis for a publication
- Both empirical and theoretical data
- Any kind of format and file: papers, spreadsheets, audiovisual media, provided the data source or purpose of the data has a scientific background
Data are uploaded from your local or departmental storage device as digital objects. Please read our service Terms of Use for further information.

14 Register to use B2DROP Easy and quick registration at

15 It’s simple to use Intuitive and simple user interface
- User-friendly interface and easy-to-use storage facilities
- Drag and drop files for storage
- Create new files and folders
- Share data with others with one click

16 How & Where are my data stored
B2DROP is hosted at the Jülich Supercomputing Centre. Daily backups of all files in B2DROP are taken and kept on tape. The underlying technology is ownCloud 7.
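Because B2DROP is built on ownCloud, files can in principle also be reached over WebDAV instead of the web interface. The following is a minimal sketch, not an official client: it only builds an upload request and does not send it. The path `remote.php/webdav/` is the standard ownCloud WebDAV endpoint; the host name, user name, password and file path are hypothetical placeholders, and whether a given B2DROP account permits this kind of access is an assumption.

```python
import base64
import urllib.request


def build_upload_request(base_url, username, password, remote_path, payload):
    """Build (but do not send) a WebDAV PUT request for an ownCloud server."""
    # ownCloud exposes user files over WebDAV under remote.php/webdav/.
    url = f"{base_url.rstrip('/')}/remote.php/webdav/{remote_path.lstrip('/')}"
    request = urllib.request.Request(url, data=payload, method="PUT")
    # HTTP Basic authentication header: base64 of "user:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host, credentials and file, for illustration only.
req = build_upload_request(
    "https://b2drop.example.org",
    "alice",
    "app-password",
    "results/run1.csv",
    b"a,b\n1,2\n",
)
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen`; it is left out here so the sketch runs without network access.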

17

18 Where is B2SHARE in the B2 Service suite?
B2SHARE uses other EUDAT services such as B2SAFE and B2STAGE to provide reliability and data retention

19 A winning solution for researchers, scientists and communities to:
- store data safely at a trusted and certified data centre
- register data to guarantee long-term persistence
- control access and share data with colleagues and the world
Features:
- metadata management
- permanent PIDs
- Open Access support

20 What’s the B2SHARE added-value?
Your data is ...
- Hosted, so there are no hardware or network worries on the depositor's side
- Assigned a permanent ID, which means it is always traceable to you
- Stored alongside queryable and findable metadata and automatically available via the B2FIND metadata catalogue
- Managed and stored by a trusted and certified data centre

21 What can users do? Users can …
- Search for data sets that are publicly accessible
- Deposit and share data (only registered users)
- Download data (only registered users)
What happens to data once users have deposited it?
EUDAT has no claim over the data deposited in B2SHARE and depositors remain entirely responsible for the data they deposit. EUDAT retains the right of archiving, i.e. creating replicas at trusted centres to take care of long-term persistence.

22 What data can users upload?
In virtually any kind of format:
- Papers
- Spreadsheets
- Audio-visual media
- Research data
- Primary data
- Processed data
- Data as basis for a publication
- Empirical data
- Theoretical data
The data source or purpose of the data is scientific.
Unlimited file uploads, 2GB per file (for larger files, contact EUDAT).
Just a note... Make sure that you are allowed to upload your data and are compliant with all data protection laws regarding sensitive data and restrictions on where this data is stored. Unrelated personal data should not be stored on B2SHARE.
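The 2GB per-file limit can be checked locally before a deposit is attempted. The sketch below is only an illustration: the function name is made up, and reading "2GB" as 2 x 10^9 bytes is an assumption, since the slide does not say whether decimal or binary units are meant.

```python
import os
import tempfile

# Assumption: "2GB" is read as 2 x 10^9 bytes; the slide does not specify.
MAX_FILE_BYTES = 2 * 10**9


def files_over_limit(paths, limit=MAX_FILE_BYTES):
    """Return the paths whose size on disk exceeds the per-file limit."""
    return [path for path in paths if os.path.getsize(path) > limit]


# Demonstration with two small temporary files and a deliberately tiny limit.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.csv")
    large = os.path.join(tmp, "large.bin")
    with open(small, "wb") as fh:
        fh.write(b"x" * 10)
    with open(large, "wb") as fh:
        fh.write(b"x" * 100)
    too_big = [os.path.basename(p) for p in files_over_limit([small, large], limit=50)]

print(too_big)
```

Files reported by such a check are the ones for which the slide advises contacting EUDAT first.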

23 Search for data sets Click on “Search” at

24 Data Collections and PIDs

25

26 Where is B2SAFE in the EUDAT suite?
Replicate Research Data Safely

27 The ideal solution for communities with no facility for archival to:
- replicate research data into secure data stores
- archive and preserve research data in the long term
- bring data close to powerful compute resources
- co-locate data with different communities
- benefit from economies of scale
Features:
- large-scale storage
- robust and highly available
- permanent PIDs

28 Who can benefit?
Small and medium-sized repositories
Data producers and data consumers
- lacking the capacity to store data over longer periods of time
- without long-term funding for the preservation of their data
- without adequate compute capacity for data-intensive computational services
- who need to be sure that trusted centres are taking care of their data
- who want to access added-value services on data sources of interest to them
- who wish to perform interdisciplinary research on top of data from the heterogeneous EUDAT communities

29 What happens?
Data from the community repository is replicated in other data centres distributed across Europe.

30 B2SAFE Summary B2SAFE offers:
- functionality to replicate datasets across different data centres in a safe and efficient way
- a long-term solution for archiving and preserving research data
- an entry point to bring data closer to powerful computers for compute-intensive analysis
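The idea of replicating a dataset "in a safe way" can be illustrated with a checksum comparison between the original and each copy. This is a conceptual sketch only and says nothing about how B2SAFE is implemented: local directories stand in for data centres, and all names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, site_dirs):
    """Copy source into each site directory and verify every replica.

    Returns a dict mapping replica path to True when its checksum
    matches the checksum of the source file.
    """
    expected = sha256_of(source)
    report = {}
    for site in site_dirs:
        target = Path(site) / Path(source).name
        shutil.copy2(source, target)
        report[str(target)] = sha256_of(target) == expected
    return report


# Two temporary directories play the role of remote data centres.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "dataset.dat"
    source.write_bytes(b"example research data")
    sites = [root / "site_a", root / "site_b"]
    for site in sites:
        site.mkdir()
    report = replicate(source, sites)
    all_verified = all(report.values())

print(len(report), all_verified)
```

A replica whose checksum differs from the source would show up as `False` in the report and could be copied again.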

31

32 The EUDAT Service Suite
The EUDAT service suite is an integrated set of services that helps researchers manage their data through the data lifecycle. As your data moves through the data lifecycle, EUDAT services will help you manage your data using best practices followed by some of the world’s largest communities. The services available cover a wide range of functionalities.
B2SAFE enables communities to replicate and safely store their large-scale data in robust, reliable data centres operated by the EUDAT partners. B2HANDLE registers all data on EUDAT with a unique identifier that can be globally resolved through the standard Handle System. B2DROP allows EUDAT users to easily exchange working data, while B2SHARE allows them to deposit and disseminate final research data at a smaller scale, but more easily than with B2SAFE. B2FIND allows searches on the EUDAT metadata and is one of the key enablers of multi-disciplinary research on EUDAT. B2ACCESS is the simple and secure authentication and authorisation platform of EUDAT, which allows single sign-on to EUDAT’s public and internal services.
B2STAGE, the subject of this talk, offers communities an entry point to ingest and replicate large volumes of data into EUDAT. Data ingested through B2STAGE are registered with a Persistent Identifier using the mechanism adopted by B2SAFE.
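To make "globally resolved" concrete: a handle has the form prefix/suffix and can be turned into a resolvable URL via the public Handle proxy at hdl.handle.net, which also offers a JSON view of the record under `/api/handles/`. The sketch below only builds those URLs; the handle value is a made-up example, not a real EUDAT identifier, and the helper names are hypothetical.

```python
from urllib.parse import quote

RESOLVER = "https://hdl.handle.net"


def split_handle(handle):
    """Split 'prefix/suffix' at the first slash and validate both parts."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return prefix, suffix


def resolver_url(handle):
    """URL that redirects a browser to the object behind the handle."""
    prefix, suffix = split_handle(handle)
    return f"{RESOLVER}/{prefix}/{quote(suffix, safe='/')}"


def record_api_url(handle):
    """URL of the proxy's JSON view of the handle record."""
    prefix, suffix = split_handle(handle)
    return f"{RESOLVER}/api/handles/{prefix}/{quote(suffix, safe='/')}"


example = "12345/example-object-1"  # hypothetical handle
print(resolver_url(example))
print(record_api_url(example))
```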

33 Facilitating communities to:
- move large amounts of data between data stores and high-performance compute resources
- re-ingest computational results back into EUDAT
- deposit large data sets into EUDAT resources for long-term preservation
EUDAT offers the B2STAGE service, which allows big research data to move efficiently between storage and computation. The service also takes care of depositing the computation output from the HPC facilities into EUDAT. B2STAGE can also be used to deposit community data into the EUDAT facilities. B2STAGE uses the established GridFTP protocol to ensure high-speed transfer between sites. Data transfer is reliable and requires very little user interaction. B2STAGE also assigns PIDs to computational output that the user elects to inject back into the EUDAT data centres.
Features:
- high-speed transfer
- reliable and light-weight
- manages permanent PIDs

34 Why use B2STAGE? High level benefits
- Research challenges are getting larger and more complex, e.g. full-Earth climate simulation, coupled simulations of multiple organs in the human body, seismic analyses of earthquakes at continental scale
- Researcher data and compute demands are rising fast
- Efficient transfer of data to high performance computing (HPC) workspaces is essential, especially in distributed computing, where resources are geographically dispersed
B2STAGE was conceived to deal with modern-day research challenges. As hardware and research software improve, the scope for research is broadening. Communities now pursue large-scale simulations, for example developing models for climate simulation encompassing the whole of the Earth, as opposed to isolated regions. Scientists simulate not only organs in the human body, but also their interactions. Similarly, earthquake data are now collected and processed for areas as large as entire continents. The common requirement of such research challenges is that they generate and process increasing volumes of data, with typical workflows requiring data to be processed in a distributed fashion, so as to cope with the pace of data generation. For this to be possible, data need to be transferred efficiently to the high-performance or high-throughput computing resources, and this is where B2STAGE comes in.

35 Specific User Requirements
Why use B2STAGE?
- Facilitates transfer of large data collections from EUDAT storage resources to HPC facilities.
- Provides the means to re-ingest computational results back into the EUDAT infrastructure.
- Ingests data sets into EUDAT resources for long-term preservation.
- Offers reliable, efficient, easy-to-use tools to manage data transfers.
- The Data Staging Script is the only tool handling data transfer using PIDs.
B2STAGE was developed to address specific user requirements. The fundamental use case is to allow data already ingested into EUDAT to move to HPC facilities for processing. This is important not only where the community that deposited the data processes them as originally intended, but also in more advanced scenarios of inter-disciplinary research on open data. In that case B2STAGE moves heterogeneous data for processing, allowing data combinations that were not previously thought of. B2STAGE also allows users to push the results of the computation safely back into EUDAT, where they may be preserved and/or further replicated according to the community policies. B2STAGE is built on the GridFTP protocol, which makes data transfers reliable and efficient. To ease use, EUDAT has developed the companion Data Staging Script, a client-side tool that facilitates the data transfer commands and handles PIDs for the data resources involved.
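For readers unfamiliar with GridFTP, a transfer is typically started with the standard `globus-url-copy` client, giving a source and a destination URL. The sketch below assembles such a command line without running it. It is not the Data Staging Script, whose interface the slides do not describe; the host names and paths are hypothetical, and the number of parallel streams is an arbitrary example value.

```python
import shlex


def gridftp_command(source_url, dest_url, parallel_streams=4):
    """Assemble a globus-url-copy command line for a single transfer.

    -vb prints transfer performance while copying,
    -p  sets the number of parallel data streams.
    """
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    return [
        "globus-url-copy",
        "-vb",
        "-p", str(parallel_streams),
        source_url,
        dest_url,
    ]


# Hypothetical endpoints: a storage site and an HPC scratch area.
cmd = gridftp_command(
    "gsiftp://storage.example.org/archive/run1.tar",
    "gsiftp://hpc.example.org/scratch/run1.tar",
)
print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to hand to `subprocess.run` later, because no shell quoting is involved until it is printed.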

36 Who can use B2STAGE? Researchers can transfer large data collections from EUDAT storage resources to HPC facilities for processing. Community Managers can replicate community data through a lightweight service and ingest data sets to EUDAT storage resources for long term preservation. The main end-users are EUDAT researchers, who can transfer their data between storage and computation as part of their day-to-day workflows. Using B2STAGE to inject community data into EUDAT is generally a function of Community Managers.

37 How can you use B2STAGE?
EUDAT offers B2STAGE to all registered researchers and interested communities, enabling them to make use of the service to stage data out of EUDAT, and ingest computational results back. Access to remote HPC facilities should be negotiated and arranged by individual users in parallel. To help researchers use the B2STAGE service, EUDAT offers documentation, training material and a service helpdesk.

38 B2STAGE summary B2STAGE offers:
- data staging functionalities to easily and efficiently transfer data from EUDAT storage resources to HPC facilities
- a powerful mechanism to ingest data onto EUDAT resources
- a script to facilitate the staging, ingest and retrieval of PID information of transferred data
B2STAGE is unique in handling PIDs for the data.

39

40 B2 Service Suite B2FIND is part of the EUDAT B2service suite that represents an integrated set of services to support researchers and research communities to manage data through the data lifecycle. As your data moves through the data lifecycle, EUDAT services will help you manage your data using best practices followed by some of the world’s largest communities. B2FIND enables browsing and discovery of data stored through the B2SAFE and B2SHARE services.

41 A metadata catalogue service to:
- seek data objects and collections using powerful metadata searches
- catalogue community data by means of selected metadata
- browse through multi-disciplinary data collections filtered by content, provenance and temporal keywords
Features:
- simple to use
- standards-based
- comprehensive catalogue

42 B2FIND is for…
B2FIND addresses two distinct target groups:
- Researchers or end users who want to search and access research data
- Communities and other providers of research data who are interested in publishing their metadata in EUDAT

43 B2FIND is...
In a nutshell, B2FIND provides a simple and user-friendly discovery service on metadata steadily harvested from a wide range of research communities.

44 Why B2FIND?
What is the main added value of B2FIND? First, B2FIND is based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other repositories. Second, metadata is mapped and offered covering a wide range of integrated communities. Currently over 15 communities from different disciplines, from climate research to social sciences, from biodiversity to linguistics, and from archaeology to seismology, are integrated in B2FIND, including metadata from ENES, CESSDA, DARIAH and many others.
In a nutshell, B2FIND allows research communities to publish metadata in EUDAT’s B2FIND catalogue, making their research data searchable, viewable and accessible to the public and improving their interoperability and re-use.

45 B2FIND is easy to use! b2find.eudat.eu
B2FIND has a simple to use web interface reachable at b2find.eudat.eu. More details are given in the User Training presentation (specifically targeting researchers) and in the B2FIND integration presentation (specifically targeting research communities).
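Besides the web interface, a metadata catalogue of this kind can usually be queried programmatically. The sketch below assumes, without confirmation from the slides, that the catalogue exposes a CKAN-style `package_search` action; it builds the query URL and extracts titles from a response of the shape CKAN returns. The sample response is invented for illustration and no request is sent.

```python
from urllib.parse import urlencode


def search_url(base_url, query, rows=10):
    """Build a CKAN-style package_search URL (assumed API, see lead-in)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def titles(response):
    """Extract dataset titles from a CKAN-style search response."""
    if not response.get("success"):
        return []
    return [item.get("title", "") for item in response["result"]["results"]]


url = search_url("https://b2find.eudat.eu", "seismology Europe", rows=5)
print(url)

# Invented sample response with the structure CKAN uses for search results.
sample = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"title": "Seismic waveforms"},
            {"title": "Climate model output"},
        ],
    },
}
print(titles(sample))
```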

46 Other EUDAT Services

47 B2 Service Suite B2ACCESS B2Handle
Note that this slide shows our ambition to integrate these services making it possible to migrate data deposited at B2DROP to B2SHARE, B2SAFE

48

49 EUDAT Communities & Pilots
7 core communities (CLARIN, ELIXIR, ENES, EPOS, ICOS, LTER Europe, VPH)
Data Pilots: 24 applications, of which
- 7 = Earth sciences, energy and environment
- 6 = Biomedical and life sciences
- 6 = Social Sciences and Humanities
- 5 = Physical Sciences and Engineering
Potential user audience = 40,000 users
Cumulative storage resource request of up to 4.3PB
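The figures on this slide can be cross-checked with a few lines of arithmetic. The per-user average below is only a rough indication: it assumes decimal units (1 PB = 10^15 bytes) and divides the maximum requested storage by the potential audience, neither of which is an actual usage figure.

```python
# Data pilot applications per domain, as listed on the slide.
pilots = {
    "Earth sciences, energy and environment": 7,
    "Biomedical and life sciences": 6,
    "Social Sciences and Humanities": 6,
    "Physical Sciences and Engineering": 5,
}
total_pilots = sum(pilots.values())

# Assumption: decimal units, 1 PB = 10**15 bytes and 1 GB = 10**9 bytes.
requested_bytes = 4.3e15
potential_users = 40_000
average_gb_per_user = round(requested_bytes / potential_users / 1e9, 1)

print(total_pilots, average_gb_per_user)
```

The domain counts add up to the 24 applications stated, and the requested storage works out to roughly 100 GB per potential user under these assumptions.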

50 Program
# | Time | Topic | Presenter
1 | 13:30-13:40 | Introduction | René van Horik (DANS)
2 | 13:40-14:20 | The EUDAT project and services |
3 | 14:20-15:00 | Use Case: How is EUDAT used in practice | Luca Trani (KNMI)
  | 15:00-15:30 | Break |
4 | 15:30-15:50 | Positioning the services | Marjan Grootveld (DANS)
5 | 15:50-16:30 | Data Management Planning | Sarah Jones (DCC)
6 | 16:30-17:00 | Discussion |

