Mark van de Sanden Giovanni Morelli

Presentation transcript:

EUDAT: Providing data management support through a European CDI (Collaborative Data Infrastructure)
Mark van de Sanden, Giovanni Morelli

Outline
What kind of problems we want (try) to solve: different management systems for different communities; quality of data sets; classes of users
What about our solutions (B2<services>): B2DROP, B2SHARE, B2SAFE, B2STAGE, B2HANDLE, B2ACCESS, …; B2<service> integration
Project and service enabling
Community / EUDAT interaction
Practical use cases

If there are hundreds of Research Infrastructures, how many different data management systems can be sustained?

Where Does EUDAT Fit In? Into the data model: scientists' personal data; homeless scientists; citizen scientists; community repositories; institute repositories

Where Does EUDAT Fit In?
Data generators and users: user functionalities, data capture & transfer, virtual research environments
Community Support Services: data discovery & navigation, workflow generation, annotation, interpretability
Common Data Services: persistent storage, identification, authenticity, workflow execution, mining
Cross-cutting: data curation; trust
Diagram based on High Level Expert Group (HLEG) on Scientific Data

Who can use EUDAT services
Single researcher: upload and download
Team: upload, add metadata, share
Community: periodic transfers, quality checks, …
Different strategies for different usage scenarios

EUDAT Collaborative Data Infrastructure
EUDAT: generic data service provider (storage, workflows, processing, archive)
Community Repositories (thematic data centres)
Flows shown in the diagram: deposit; access

EUDAT Collaborative Data Infrastructure Community “use” EUDAT

EUDAT Collaborative Data Infrastructure Community “join” EUDAT

B2 Service Suite: B2ACCESS, B2HANDLE

B2DROP
Who: citizen scientists and small teams
What: store and exchange data; synchronize multiple versions; ensure automatic desktop synchronization
Why: ease of use; trusted European service
Features: based on ownCloud, open source (GNU AGPLv3); access and manage permissions to files from any device and any location, via browser, desktop, mobile apps and WebDAV; up to 20 GB of storage space for research data; simple to use and open to all researchers and scientists (e.g. self-registration); synchronize and exchange data with one or multiple users; users decide with whom to exchange data, for how long and how
EUDAT2020: further integration with EUDAT CDI (e.g. B2SHARE); integration with B2ACCESS to enable access by many different Identity Providers; cloud storage federation, in collaboration with GEANT in OpenCloudMesh; assess B2DROP as workspace area for computing facilities
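
Since the slide lists WebDAV among the access routes, a file upload can be sketched as a plain HTTP PUT. This is a minimal sketch, not the service's documented client: the endpoint path `remote.php/webdav/` is the usual ownCloud WebDAV location and is assumed here for the B2DROP host, and `build_upload_request` is a hypothetical helper name. The function only prepares the request, so it can be inspected without contacting the server.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: the standard ownCloud WebDAV path on the B2DROP host.
B2DROP_WEBDAV = "https://b2drop.eudat.eu/remote.php/webdav/"


def build_upload_request(remote_name: str, payload: bytes,
                         user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a WebDAV PUT storing `payload` as `remote_name`."""
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=B2DROP_WEBDAV + urllib.parse.quote(remote_name),  # escape spaces etc.
        data=payload,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload; whether basic authentication or an app-specific password is required depends on the account setup.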

B2SHARE
Who: small to medium teams
What: store data (incl. software) and add domain metadata; share registered research data worldwide; preserve (small-scale) research data for the long term
Why: register data for publications; make data known to a wider community
Features: based on Invenio, open source (GPL v3); supports 8 community metadata templates; data is assigned a persistent identifier and a checksum; access via an HTTP REST API; openly accessible, user self-registration; data owner defines access policy; feature to choose an open-access license; feature to choose a discipline; openly harvestable metadata, harvested by B2FIND
EUDAT2020: further integration with EUDAT CDI (e.g. B2DROP, B2SAFE); integration with B2ACCESS (incl. eduGAIN), focus on authorization; embargo period; editing of metadata; data versioning and annotation; extended HTTP RESTful API; easily installable software package
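
The slide states that deposited data is assigned a checksum. A minimal sketch of how a depositor might compute one locally and later verify a downloaded copy against it is shown here. The slide does not say which algorithm the service uses, so SHA-256 is an assumption, and `checksum` and `matches` are hypothetical helper names.

```python
import hashlib
import io
from typing import BinaryIO


def checksum(stream: BinaryIO, algorithm: str = "sha256",
             chunk_size: int = 1 << 20) -> str:
    """Return '<algorithm>:<hex digest>' for the stream, read in chunks."""
    digest = hashlib.new(algorithm)
    # Read fixed-size chunks so large files never have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return f"{algorithm}:{digest.hexdigest()}"


def matches(stream: BinaryIO, recorded: str) -> bool:
    """Check a stream against a recorded '<algorithm>:<hex digest>' value."""
    algorithm, _, _ = recorded.partition(":")
    return checksum(stream, algorithm) == recorded
```

For a file on disk, `checksum(open(path, "rb"))` produces the value to keep next to the persistent identifier; `matches` can then be run on any later copy.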

Collection of official RDA documents

Service Integration Bidirectional Integration

B2SAFE
Who: community data managers; 'sophisticated' organisations
What: provide an abstraction layer which virtualizes large-scale data resources; guard against data loss in long-term archiving and preservation; optimize access for users from different regions; bring data closer to powerful computers
Why: performance; replication between trusted sites; data preservation
EUDAT2020: support iRODS v4; support metadata; optimize and extend policies to support data curation and provenance; further integration with B2ACCESS; support authorization on the basis of community access rules; assess B2SAFE as workspace area for computing facilities

Data Policy Manager
Data policies are centrally managed; policy rules are implemented and enforced by site-local rule engines; policies are described in an abstract language; community data managers must authenticate to provide trust; supports policies for data replication and integrity checking; central logging for auditable data policies to monitor execution; active collaboration with the RDA Practical Policy WG
EUDAT2020: handover to operations; extend the number of policies supported; focus on data curation and provenance policies; integrate with B2ACCESS
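
The split between a centrally managed, abstract policy and its enforcement by site-local rule engines can be illustrated with a toy model. Everything in this sketch is hypothetical: `ReplicationPolicy`, `Site` and `enforce` are illustration names, the policy format is not the project's abstract policy language, and the in-memory `Site` only stands in for a real rule engine. It shows one replication policy being applied, with an integrity check per replica and an entry in a central log.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical abstract policy: replicate a collection to target sites."""
    collection: str        # key prefix identifying the collection
    target_sites: tuple    # names of the sites that must hold a replica


@dataclass
class Site:
    """Hypothetical stand-in for a site-local rule engine with its own storage."""
    name: str
    storage: dict = field(default_factory=dict)

    def store(self, key: str, data: bytes) -> str:
        """Store a replica and return the checksum of what was stored."""
        self.storage[key] = data
        return hashlib.sha256(self.storage[key]).hexdigest()


def enforce(policy: ReplicationPolicy, source: dict, sites: dict, log: list) -> bool:
    """Replicate matching objects, verify integrity, append results to the log."""
    all_ok = True
    for key, data in source.items():
        if not key.startswith(policy.collection):
            continue  # object is outside the policy's collection
        expected = hashlib.sha256(data).hexdigest()
        for site_name in policy.target_sites:
            actual = sites[site_name].store(key, data)
            status = "ok" if actual == expected else "checksum-mismatch"
            all_ok = all_ok and status == "ok"
            log.append((site_name, key, status))  # central, auditable record
    return all_ok
```

The returned flag and the log correspond to the monitoring role described on the slide: a central component can audit which site executed which policy step and with what result.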

B2STAGE
Who: users and communities with significant computational needs
What: transfer large data collections from EUDAT storage to external HPC facilities for processing; copy large data sets, ingesting them onto EUDAT storage resources
Why: integration/collaboration with PRACE; simplify data transfer
Features: extension of the B2SAFE and B2FIND services, which allow users to store, preserve and find data; access via GridFTP and basic HTTP; a data-staging script facilitates staging, ingestion and retrieval of persistent identifier (PID) information of transferred data; start development of EUDAT client API library and command-line tools; integrated with EUDAT Federated AAI on the basis of X.509 certificates
EUDAT2020: further develop HTTP into a mature interface and extend functionality to metadata; native support for PIDs within GridFTP transfers; extend EUDAT client API library to other B2 services (e.g. B2SHARE, B2FIND, PID); further integration with B2ACCESS

B2FIND
Who: anyone
What: find collections of scientific data quickly and easily, irrespective of their origin, discipline or community; get quick overviews of available data; browse through collections using standardized facets
Why: unique collection; ease of searching
Features: based on CKAN, open source (GNU AGPL v3); faceted search (e.g. 10 facets) and full-text search, recently added timeline search; focus on community-recommended metadata; 13 community repositories harvested, more lined up; openly accessible, no registration needed; openly harvestable metadata; searchable via web-based GUI and HTTP RESTful API
EUDAT2020: harvesting of metadata stored in B2SAFE; community customizations; annotation of datasets; further assess RDF and Linked Data; further assess scalability and performance
Notes: on the EUDAT1 MDTF/B2FIND community wiki page, 15 additional community repository services are planned or in the enabling phase. At the developers' workshop, Heinrich and Yann Le Franc talked about the integration of the B2NOTE service with B2FIND.
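
Because the catalogue is based on CKAN and searchable over HTTP, a query can be sketched with CKAN's standard `package_search` action. This is a sketch under assumptions: the base URL assumes that the B2FIND host exposes the stock CKAN Action API at `/api/3/action/`, the facet names `tags` and `groups` are generic CKAN fields rather than B2FIND's own facets, and the helper names are hypothetical. The code only builds the URL and parses a response body, so no network access is needed to try it.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: stock CKAN Action API location on the B2FIND host.
B2FIND_API = "https://b2find.eudat.eu/api/3/action/package_search"


def build_search_url(query: str, facets=(), rows: int = 10) -> str:
    """Build a CKAN package_search URL with full-text query and optional facets."""
    params = {"q": query, "rows": rows}
    if facets:
        # CKAN expects facet.field as a JSON-encoded list of field names.
        params["facet.field"] = json.dumps(list(facets))
    return f"{B2FIND_API}?{urlencode(params)}"


def dataset_titles(response_text: str) -> list:
    """Extract dataset titles from a CKAN package_search JSON response."""
    body = json.loads(response_text)
    if not body.get("success"):
        return []
    return [item.get("title", "") for item in body["result"]["results"]]
```

Fetching the URL with any HTTP client and passing the body to `dataset_titles` gives a quick overview of matching datasets.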

EUDAT M6 Review - Services and Operations
B2HANDLE
Who: groups or communities who want to make their data citable
What: follows policies to register data and make it referable and citable in the long term; reliability through mutual PID mirroring; provides an abstraction layer between a globally unique persistent identifier and the physical location of data objects; machine-readable via HTTP RESTful API
Why: simple integration; technology agnostic
Development plan: develop the policies for the B2HANDLE service (e.g. PID namespace management); migrate service from Handle v7 to v8; define PID Information Types for data, metadata and collection records; integrate with the Data Type Registry service; consolidate B2HANDLE API library with EUDAT API library
EUDAT 6M EC Review, 28th October 2015, Brussels
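
The abstraction layer between a persistent identifier and the physical location of an object can be illustrated with the Handle System's public proxy, which serves handle records as JSON. This is a sketch under assumptions: it uses the global `hdl.handle.net` proxy rather than any EUDAT-specific endpoint, the handle `11111/example-object` and the repository URL in the usage are made up, and the helper names are hypothetical. The code builds the lookup URL and extracts the `URL` values from a record.

```python
import json
from urllib.parse import quote

# Public Handle System proxy; its REST interface returns handle records as JSON.
HANDLE_PROXY = "https://hdl.handle.net/api/handles/"


def resolve_url(handle: str) -> str:
    """Return the proxy URL at which the record for `handle` can be fetched."""
    # Keep the '/' between prefix and suffix, escape anything unusual.
    return HANDLE_PROXY + quote(handle, safe="/")


def locations(response_text: str) -> list:
    """Return the string-valued URL entries of a handle record."""
    record = json.loads(response_text)
    return [
        value["data"]["value"]
        for value in record.get("values", [])
        if value.get("type") == "URL"
        and value.get("data", {}).get("format") == "string"
    ]
```

If an object moves, only the `URL` value in the record changes; the identifier that users cite stays the same, which is the point of the abstraction layer.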

B2ACCESS
Who: anyone wanting to use the B2 services
What: complies with community ownership and access rights, as the basis of trust; credential conversion approach (e.g. SAML, OpenID, X.509, username/password); identity provider for citizen scientists
Why: use your own ID in a federated environment
EUDAT2020: integration with operational services and all B2 services (B2SHARE, B2DROP, B2STAGE, B2SAFE, B2HANDLE, DPM, CREG, TTS); integration with community IdP domains and portal environments; enabling access via eduGAIN, social IDs, ORCID and CLARIN IdPs; focus on authorization; collaborate on cross e-infrastructure access (e.g. PRACE, EGI); extend European collaboration via AARC (e.g. GEANT, TERENA)

B2ACCESS

EUDAT CDI Production Environment
Operational Services: Service and Resource Provisioning & Coordination
Per project (multiple SP): Helpdesk & Support; Data Project Enabling; Security (EUDAT Security Officer, Security Team, CSIRT)
Per Service Provider (SP): network, configuration; compute resources; service hosting, service on demand; service deployment; storage, storage services
14 general data centres of national or European importance, involved and engaged in national and European projects (PRACE, EGI, WLCG, HBP) and various ESFRI collaborations, offering:
- extendable aggregated storage capacity with a potential beyond the order of 1000 PB
- network connectivity of 10 Gb/s per centre
- large capacity and capability computing facilities (different platforms, technologies, vendors)
- long experience with storage and archive technologies
- operational tools & policies known across different e-infrastructures
The centres (sites) provide Resources and Service Components within their administrative domain according to specified levels of quality. The Operations Coordination, together with the Security, Helpdesk and Data Project Enabling Teams, ensures a reliable service-oriented collaboration of all participating Service Providers.

Operational tools & Central Services
EUDAT Wiki, JIRA, CROWD (AAI), SVN
Service Hosting Framework
rct.eudat.eu: RCT (Project Coord.), to be replaced by DPCP
creg.eudat.eu: CDI Config DB (Sites, Service Comp.)
cmon.eudat.eu: Monitoring (cmon), to be replaced: A&R M.
helpdesk.eudat.eu: Helpdesk TTS (http://eudat.eu/support-request)

Understanding the enabling process: all the actors
Diagram labels: Data pilot document (WP4); Pre sale; Service Portfolio (WP2); Data Project Coordination Portal; Interface; Community; TTS; Service & Resource Provisioning (WP6 – T6.2); Data Projects X, Y, Z; Deploy (WP6 – T6.3); Service X, Y, Z Enabling Teams; Small/Large Customization (WP5); Production; GOCDB; User Support; Monitoring

Understanding the enabling: Deploy actors
Diagram labels: Project Enabler; Service Integrator; TTS; Data Project X; Deploy (WP6 – T6.3); Service X Enabling Team; service integration into community

Understanding the enabling: project lifecycle and relationship with Project Enablers and Service Integrators
Planned: data project/service enabling still under discussion
Enabling (repos): service enabling at community side (repository) only; EUDAT provider selected, but storage service not yet provided
Enabling: service enabling at community and EUDAT side
Pre-Production: service is operational, but there are still some issues, e.g. initial data transfer not complete, security or quality assessment pending, community or provider has not confirmed production readiness
Production: service deployed and integrated across all participating project partners (community repository and EUDAT nodes); community confirmed production readiness
Other diagram labels: Project Enabler(s); Service Integrator(s); Documentation; User Documentation
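
The lifecycle stages named on this slide can be written down as a small state model. This is a sketch only: the stage names come from the slide, but the strictly linear, one-step-at-a-time progression is an assumption (the slide does not say whether stages can be skipped or revisited), and `Stage`, `next_stage` and `can_advance` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Lifecycle stages of a data project, in the order given on the slide."""
    PLANNED = "Planned"
    ENABLING_REPOS = "Enabling (repos)"
    ENABLING = "Enabling"
    PRE_PRODUCTION = "Pre-Production"
    PRODUCTION = "Production"


ORDER = list(Stage)  # Enum iteration preserves definition order


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the following stage, or None once Production is reached."""
    position = ORDER.index(current)
    return ORDER[position + 1] if position + 1 < len(ORDER) else None


def can_advance(current: Stage, target: Stage) -> bool:
    """Assumed rule: a project may only move to the directly following stage."""
    return next_stage(current) is target
```

A coordination tool could use such a check to refuse marking a project as Production before it has passed Pre-Production, where production readiness is confirmed.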

Data pilots overview: 23 data pilots selected for enabling in EUDAT2020 (charts: scientific domain; applicant community)

Data pilots overview: total storage request 1220-4300 TB (chart: requested EUDAT services)

Questions…