EUDAT: Data sharing and management in a collaborative data infrastructure Rob Baxter, EPCC, University of Edinburgh.

European Data – EUDAT
Start date: 1st October 2011
Duration: 39 months
Budget: 16.3 M€ (9.3 M€ EC)
Call: INFRA
Consortium: 25+ partners from 13 countries
– “Data scientists”, data centres, technology providers
Goals:
– create a cost-effective, high-quality Collaborative Data Infrastructure (CDI)
– …that meets users’ needs in a flexible and sustainable way
– …across geographical and discipline boundaries

EUDAT consortium: data centres and data scientists

EUDAT vision – the CDI
[Diagram: the layers of the CDI, with Trust and Data Curation running alongside all of them]
Data Generators / Users: user-focused functionality, data capture & transfer, VREs
Community Support Services: data discovery & navigation, workflow creation, annotation, interpretability
Common Data Services: persistent storage, identification, authenticity, workflow execution, mining

Five core research communities*
CLARIN: Common Language Resources and Technology Infrastructure
LifeWatch: Biodiversity Data and Observatories
EPOS: European Plate Observing System
ENES: Service for Climate Modelling in Europe
VPH: The Virtual Physiological Human
All share common challenges:
– Reference models and architectures
– Persistent data identifiers
– Metadata management
– Distributed data sources
– Data interoperability
EUDAT has to work bottom up to “refactor” cross-community common services
* and even more associate communities!

EUDAT CDI network
[Map of the network, distinguishing generic data centres from community data sites]
Sites labelled: RZG, FZJ, SURFsara, BSC, CSC, EPCC, CINECA

EUDAT CDI capabilities
[Diagram: community A services and community B services reach the Collaborative Data Infrastructure (community repository, data centre) through CDI Gateway services]
Capabilities shown: upload, access, sharing, discovery, by GUI or by API; persistence – replication – archival – description – cataloguing

EUDAT service architecture & technology
[Diagram: community software and data centres attach to the Collaborative Data Infrastructure through an EUDAT core interface; one side is labelled “Joining” the CDI, the other “Using” the CDI]
Services shown: B2SAFE, B2STAGE, B2SHARE (store, UI), B2FIND (catalogue, UI), PID
Technologies shown: http, gftp, iRODS; CKAN, SOLR; Invenio; iRODS; EPIC/Handle; OAI-PMH
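
OAI-PMH, one of the technologies named on this slide, is a standard protocol for harvesting metadata records from a repository. As a rough illustration of what a harvest request looks like, here is a minimal sketch that only builds the request URL. The endpoint `https://repo.example.org/oai` and the function name are assumptions made for the example; `verb`, `metadataPrefix` and `from` are standard OAI-PMH request parameters. It is not a description of how EUDAT's own harvesting is implemented.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this datestamp.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical community repository endpoint, for illustration only.
url = list_records_url("https://repo.example.org/oai", from_date="2014-01-01")
print(url)
```

A catalogue would fetch this URL, parse the XML response, and follow any resumption token the repository returns; those steps are left out here.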

Heterogeneity and Homogeneity
The generic data centres* are quite homogeneous
– (* actually, we’re more generic HPC centres)
– Big disks; big filesystems (Lustre, GPFS); TSM, DMF
The community sites… aren’t
– Anything from a small research group to DKRZ
Challenge is to build (distributed) foundations on the DCs that are easily usable by the CSs
Reclaim the Web!

EUDAT Guiding Policy Principles
1. Data deposited with the EUDAT CDI will be preserved long-term
2. Data are best curated in their own communities
3. Access to data in the EUDAT CDI is free at the point of use
4. For an EUDAT community repository to be designated a Trustworthy Digital Repository (TDR), it follows that EUDAT services and infrastructure must be a suitable target for “TDR outsourcing”
5. EUDAT will not assert ownership of any data it holds

Open Access Principles
Two further principles on open access:
– all data in the CDI should, in time, become full open access. Open access is the norm for CDI data;
– embargo periods for original producers are fully supported, on condition that such data become openly accessible when the embargo period expires.
These imply:
– policy harmonisation
– a common licensing scheme
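
The embargo rule above is simple enough to state as a check. The sketch below is illustrative only: the function name is hypothetical, and it assumes that data become open on the embargo end date itself, which the slide does not specify.

```python
from datetime import date
from typing import Optional


def is_openly_accessible(embargo_end: Optional[date], today: date) -> bool:
    """Open access is the norm; an embargo only delays it.

    Assumption: data open up on the embargo end date itself.
    """
    return embargo_end is None or today >= embargo_end


# No embargo: open straight away.
print(is_openly_accessible(None, date(2014, 3, 1)))
# Embargo still running: restricted for now.
print(is_openly_accessible(date(2015, 1, 1), date(2014, 3, 1)))
```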

EUDAT Licensing
Comparatively easy at one level…
– We will (almost certainly) recommend CC 4.0 BY/SA for open data
Rather difficult at another…
– Persuading all members of the network to sign up!
– Maybe OK if site owns data copyright
– But some sites hold third-party copyrighted data
Will need to be pragmatic
– Follow DANS’s maxim: “Open if possible, restricted if necessary”

Policy Harmonisation
Aim for 2014: a roadmap across EUDAT
– Ideally, comprehensive study with yes/no answers
– Create a “heatmap” of the policy landscape across sites
– Identify areas of harmony, areas needing further work
Adopt a taxonomy/set of headings
– From APARSEN’s Exemplar Good Governance Structures and Data Policies
– From Data Seal of Approval
– From Open Access principles
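
One way to picture the “heatmap” idea: collect yes/no answers per site and per policy heading, then count agreement. The sketch below uses made-up site names, headings and answers; none of them come from the actual EUDAT study. It treats a heading as an area of harmony when every site gives the same answer.

```python
from typing import Dict, List

# Hypothetical survey answers: site -> {policy heading -> yes/no}.
answers: Dict[str, Dict[str, bool]] = {
    "site_a": {"open_access": True, "long_term_preservation": True, "standard_licence": False},
    "site_b": {"open_access": True, "long_term_preservation": True, "standard_licence": True},
    "site_c": {"open_access": True, "long_term_preservation": False, "standard_licence": False},
}


def yes_counts(answers: Dict[str, Dict[str, bool]]) -> Dict[str, int]:
    """Number of sites answering 'yes' under each heading (one heatmap row each)."""
    headings = sorted({h for site in answers.values() for h in site})
    return {h: sum(1 for site in answers.values() if site.get(h, False))
            for h in headings}


def split_by_harmony(answers: Dict[str, Dict[str, bool]]) -> Dict[str, List[str]]:
    """A heading is 'in harmony' when all sites agree (all yes or all no)."""
    n_sites = len(answers)
    result: Dict[str, List[str]] = {"harmony": [], "needs_work": []}
    for heading, count in yes_counts(answers).items():
        key = "harmony" if count in (0, n_sites) else "needs_work"
        result[key].append(heading)
    return result


print(yes_counts(answers))
print(split_by_harmony(answers))
```

Missing answers are counted as “no” here, which is a simplification; a real study would probably track them separately.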

Top Three Challenges: #1 Policy automation
Policies → requirements and constraints
– “ensure at least 3 copies of this object are extant”
– “ensure no copy of this object leaves the UK”
Need automatic means to propagate rules deep into the infrastructure
– Will need tagging, annotation of data objects
In progress
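
The two example policies can be read as machine-checkable constraints on a tagged data object. The following is a minimal sketch under stated assumptions: the `DataObject` class, the country-code tagging of replicas and the identifier are all hypothetical, and in a real deployment such rules would be enforced inside the storage layer rather than checked in a standalone script.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataObject:
    pid: str                      # persistent identifier (hypothetical value below)
    replica_countries: List[str]  # country code of each replica's location

# A policy is a predicate over a tagged data object.
Policy = Callable[[DataObject], bool]


def min_copies(n: int) -> Policy:
    """'ensure at least n copies of this object are extant'"""
    return lambda obj: len(obj.replica_countries) >= n


def stay_within(country: str) -> Policy:
    """'ensure no copy of this object leaves <country>'"""
    return lambda obj: all(c == country for c in obj.replica_countries)


def violations(obj: DataObject, policies: Dict[str, Policy]) -> List[str]:
    """Names of the policies this object currently breaks."""
    return [name for name, check in policies.items() if not check(obj)]


policies = {
    "at least 3 copies": min_copies(3),
    "no copy leaves the UK": stay_within("GB"),
}
obj = DataObject(pid="example/0001", replica_countries=["GB", "GB", "DE"])
print(violations(obj, policies))
```

Here the object has three copies but one of them sits outside the UK, so only the second policy is reported as violated.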

Top Three Challenges: #2 Distributed authorisation
Propagating authz is a special case of policy
Needed especially for replicas managed in different administrative domains
Need fine-grain control for sensitive data
Currently no common solution
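
To make the problem concrete, here is a toy model of authorisation travelling with replicas. It is not a proposed solution: the class, the per-user action sets and the domain names are assumptions for illustration, and it ignores identity federation, trust between domains and revocation, which are the hard parts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# user -> set of allowed actions, e.g. {"alice": {"read", "write"}}
ACL = Dict[str, Set[str]]


@dataclass
class Replica:
    domain: str                            # administrative domain holding this copy
    acl: ACL = field(default_factory=dict)


def propagate(master_acl: ACL, replicas: List[Replica]) -> None:
    """Copy the master ACL to every replica so each domain can decide locally."""
    for replica in replicas:
        replica.acl = {user: set(actions) for user, actions in master_acl.items()}


def allowed(replica: Replica, user: str, action: str) -> bool:
    """Fine-grained check: a specific user performing a specific action."""
    return action in replica.acl.get(user, set())


replicas = [Replica("site_uk"), Replica("site_fi")]
propagate({"alice": {"read", "write"}, "bob": {"read"}}, replicas)
print(allowed(replicas[1], "bob", "read"))
print(allowed(replicas[1], "bob", "write"))
```

Each replica gets its own copy of the rules, so a later change to the master ACL has to be propagated again; keeping those copies consistent across domains is exactly where a common solution is missing.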

Top Three Challenges: #3 Designing for change
Needs to be straightforward to join or leave the CDI
Need to connect at common level (with sufficient richness to make it worthwhile) while not disrupting site-by-site operations
Need connection-oriented, protocol-oriented approach
In place; arguably needs revised for “CDI 2.0”
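
A protocol-oriented design can be sketched as a small common interface plus a registry that sites join or leave without affecting each other. Everything named here (`CDINode`, `Site`, `Registry`, the site names) is hypothetical and only illustrates the idea; it does not describe the actual EUDAT site registry.

```python
from typing import Dict, Iterable, List, Protocol


class CDINode(Protocol):
    """The common level every site connects at (hypothetical interface)."""
    name: str

    def supported_protocols(self) -> List[str]: ...


class Site:
    """A site keeps its own internals; it only exposes the common interface."""

    def __init__(self, name: str, protocols: Iterable[str]) -> None:
        self.name = name
        self._protocols = list(protocols)

    def supported_protocols(self) -> List[str]:
        return list(self._protocols)


class Registry:
    def __init__(self) -> None:
        self._nodes: Dict[str, CDINode] = {}

    def join(self, node: CDINode) -> None:
        self._nodes[node.name] = node

    def leave(self, name: str) -> None:
        # Leaving is a no-op for every other node.
        self._nodes.pop(name, None)

    def nodes_speaking(self, protocol: str) -> List[str]:
        return sorted(n.name for n in self._nodes.values()
                      if protocol in n.supported_protocols())


registry = Registry()
registry.join(Site("data_centre_1", ["iRODS", "http"]))
registry.join(Site("community_site_1", ["http", "OAI-PMH"]))
print(registry.nodes_speaking("http"))
registry.leave("data_centre_1")
print(registry.nodes_speaking("http"))
```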

Conclusions
Heterogeneity is EUDAT’s biggest challenge
– And its reason for being
Adopting open access principles may help
– Possibilities in streamlining policy, licensing
Open questions right now:
– Propagation of authorisation requirements
– Identification of a useful core metadata set
– Streamlining the connections that new nodes need
– Legal entity or not?