Introduction to Data Management in EGI

Vincenzo Spinoso (vincenzo.spinoso@egi.eu), EGI.eu/INFN

Outline
- Categorisation of data services in EGI
- Status and future plans

Components
Data management is performed by interoperable components. Different components address different needs:
- storage management at site level
- transfer between sites
- security
- catalogue and metadata

How are data managed at site level? Storage endpoints

Storage endpoints
- A unique namespace is provided to the client.
- Authentication and encryption guarantee confidentiality and integrity.
- Several protocols are supported for file access and transfer.
- Distributing data across several disk servers guarantees scalability at site level.
- If tapes are provided, access to tape is transparent.
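
To make "one namespace, many disk servers" concrete, here is a minimal Python sketch. It assumes a simple hash-based placement rule and uses hypothetical server names and paths; DPM and StoRM use their own (often space- or load-aware) placement policies, so this only illustrates the principle that the client sees a path while the endpoint decides where the bytes live.

```python
import hashlib

# Hypothetical disk servers behind one storage endpoint.
DISK_SERVERS = ["disk01.example.org", "disk02.example.org", "disk03.example.org"]


def place(path: str, servers: list[str]) -> str:
    """Choose a disk server for `path` with a hash-modulo rule (an assumption)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class Namespace:
    """Single namespace exposed to clients; the physical server stays hidden."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.location: dict[str, str] = {}  # logical path -> disk server

    def store(self, path: str) -> None:
        # The client only supplies `path`; placement is decided internally.
        self.location[path] = place(path, self.servers)

    def resolve(self, path: str) -> str:
        # Look up which disk server actually holds the file.
        return self.location[path]


ns = Namespace(DISK_SERVERS)
for i in range(300):
    ns.store(f"/dteam/run{i}.dat")
print(sorted(set(ns.location.values())))
```

With many files, the load spreads over all servers, which is where the site-level scalability comes from; adding a server with this naive modulo rule would, however, move most files, which is one reason real systems use smarter policies.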

Storage endpoints: DPM; StoRM, running on top of Lustre or GPFS

What about interoperability, access, transfers?

Access, transfers
[Diagram: DPM and StoRM endpoints exposed as a «Storage element» through an abstraction layer of SRM, GridFTP, WebDAV and NFS/pNFS]
Applications and users can interact with the endpoints using different protocols.
- SRM offers storage management: transparent disk/tape management, an interface between different transfer protocols, and a standard interface.
- GridFTP offers advanced data transfer: parallel streams, fault tolerance, security (authorization, encryption) and optimization.
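
GridFTP parallel streams move different parts of the same file concurrently; each block travels with its offset, so blocks may arrive in any order. The Python sketch below illustrates only the split-and-reassemble idea: threads stand in for network streams and an in-memory buffer stands in for the remote file. The function names are hypothetical and this is not an implementation of the GridFTP protocol.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    """Split `size` bytes into at most `streams` contiguous (offset, length) blocks."""
    if size <= 0:
        return []
    streams = max(1, min(streams, size))
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(src: bytes, streams: int = 4) -> bytes:
    """Copy `src` with one worker per block, writing each block at its offset."""
    dst = bytearray(len(src))

    def copy_block(block: tuple[int, int]) -> None:
        offset, length = block
        dst[offset:offset + length] = src[offset:offset + length]

    with ThreadPoolExecutor(max_workers=max(1, streams)) as pool:
        list(pool.map(copy_block, split_ranges(len(src), streams)))
    return bytes(dst)


print(split_ranges(10, 3))  # [(0, 4), (4, 3), (7, 3)]
data = bytes(range(256)) * 100
print(parallel_copy(data, streams=8) == data)  # True
```

Because every block carries its own offset, the receiver can write blocks as they arrive without waiting for earlier ones, which is what lets several TCP streams fill a long, high-bandwidth link.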

Access, transfers (continued)
- WebDAV offers a «web-based network file system»: it is an IETF standard and is widely supported by many operating systems.
- NFS 4.1 provides «local access» (fast, POSIX).
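
Because WebDAV (RFC 4918) is plain HTTP with extra methods, listing a directory is a single PROPFIND request. The sketch below only builds such a request with Python's standard library; the storage element URL is hypothetical, and actually sending it to a grid endpoint would also need authentication (for example an X.509 proxy or a token), which is omitted here.

```python
import urllib.request

# Ask only for the size and modification time of each entry.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<d:propfind xmlns:d="DAV:"><d:prop>'
    b"<d:getcontentlength/><d:getlastmodified/>"
    b"</d:prop></d:propfind>"
)


def listing_request(url: str, depth: str = "1") -> urllib.request.Request:
    """Build (but do not send) a WebDAV directory listing request."""
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


# Hypothetical storage element URL.
req = listing_request("https://se.example.org/dpm/example.org/home/dteam/")
print(req.get_method(), req.get_header("Depth"))  # PROPFIND 1
```

Passing `req` to `urllib.request.urlopen` would send it; a successful listing comes back as a 207 Multi-Status XML document. `Depth: 1` limits the answer to the directory and its immediate children.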

[Diagram: DPM behind the SRM, GridFTP, WebDAV and NFS/pNFS abstraction layer]

Data transfer scheduling
Can transfers be scheduled?

Data transfer scheduling
- Schedules continuous, sustained data transfers across multiple endpoints.
- Prioritizes inter-VO and intra-VO file transfers.
- Many different clients are available, and several protocols are supported (SRM, GridFTP, WebDAV…).
- Useful in the VO management context to control data transfers.
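
One way to picture inter-VO and intra-VO prioritization together is a two-level queue. The Python sketch below assumes a hypothetical policy: VOs are served in proportion to configured (positive) shares, and within a VO higher-priority transfers go first, FIFO among equals. VO names, shares and priorities are illustrative; this is not the algorithm of any particular EGI transfer service.

```python
import heapq
from collections import defaultdict
from itertools import count


class TransferScheduler:
    """Hypothetical policy: fair share between VOs, priority order within a VO."""

    def __init__(self, vo_shares: dict[str, float]) -> None:
        self.shares = vo_shares             # VO -> positive relative share (assumed)
        self.served = defaultdict(int)      # VO -> transfers dispatched so far
        self.queues = defaultdict(list)     # VO -> heap of (-priority, seq, transfer)
        self._seq = count()                 # FIFO tie-break for equal priorities

    def submit(self, vo: str, source: str, dest: str, priority: int = 3) -> None:
        # Higher number = more urgent; negated because heapq is a min-heap.
        heapq.heappush(self.queues[vo], (-priority, next(self._seq), (source, dest)))

    def next_transfer(self):
        ready = [vo for vo, queue in self.queues.items() if queue]
        if not ready:
            return None
        # Inter-VO: serve the VO that is furthest below its share.
        vo = min(ready, key=lambda v: self.served[v] / self.shares[v])
        # Intra-VO: highest priority first, FIFO among equals.
        _, _, transfer = heapq.heappop(self.queues[vo])
        self.served[vo] += 1
        return vo, transfer


sched = TransferScheduler({"alpha": 2.0, "beta": 1.0})
sched.submit("alpha", "a1", "dst")
sched.submit("alpha", "a2", "dst", priority=5)
sched.submit("alpha", "a3", "dst")
sched.submit("beta", "b1", "dst")

order = []
while (item := sched.next_transfer()) is not None:
    order.append(item[1][0])
print(order)  # ['a2', 'b1', 'a1', 'a3']
```

The urgent `a2` jumps ahead inside its VO, while `beta` still gets a slot before `alpha` drains its queue, which is the kind of control a VO manager wants over shared links.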

Catalogue
Where are my files? For example: lfn:grid/20150407/store/data/run1312

Catalogue: LFC
- Hierarchical view of files to users, with a UNIX-like client interface
- Logical File Name (LFN) to Storage URL (SURL) mappings
- Authorization on the namespace
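
The three LFC features above (hierarchical namespace, LFN to SURL mappings, authorization) can be illustrated with a toy in-memory catalogue. The Python sketch below is hypothetical: the LFN is written as a plain path without the `lfn:` prefix, the SURLs point to made-up storage elements, and the ACL is reduced to a set of VOs allowed to read. It is not the LFC client API.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Entry:
    surls: list[str] = field(default_factory=list)  # replicas of the same file
    readers: set[str] = field(default_factory=set)  # VOs allowed to read (simplified ACL)


class Catalogue:
    """Toy LFN -> SURL catalogue with a hierarchical namespace."""

    def __init__(self) -> None:
        self.entries: dict[PurePosixPath, Entry] = {}

    def register(self, lfn: str, surl: str, readers: set[str]) -> None:
        # The ACL is taken from the first registration of this LFN.
        entry = self.entries.setdefault(PurePosixPath(lfn), Entry(readers=set(readers)))
        entry.surls.append(surl)

    def replicas(self, lfn: str, vo: str) -> list[str]:
        entry = self.entries.get(PurePosixPath(lfn))
        if entry is None:
            raise FileNotFoundError(lfn)
        if vo not in entry.readers:  # authorization on the namespace
            raise PermissionError(f"{vo} may not read {lfn}")
        return list(entry.surls)

    def listdir(self, directory: str) -> list[str]:
        """UNIX-like listing: immediate children of `directory`."""
        base = PurePosixPath(directory)
        return sorted({p.relative_to(base).parts[0] for p in self.entries
                       if p != base and p.is_relative_to(base)})


cat = Catalogue()
lfn = "/grid/dteam/20150407/run1312"  # hypothetical LFN
cat.register(lfn, "srm://se1.example.org/dteam/run1312", {"dteam"})
cat.register(lfn, "srm://se2.example.org/dteam/run1312", {"dteam"})
print(cat.listdir("/grid/dteam"))  # ['20150407']
print(cat.replicas(lfn, "dteam"))
```

The user reasons only in terms of the logical name; the catalogue answers with every physical replica the user is allowed to see.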

EGI «whole picture»
- A really complex infrastructure based on elementary «bricks»
- Each VO chooses its «recipe» of components
- Mature and stable
- Integration in a unified release controls the stability of the «off-line» machinery
- Operations control the stability of the «on-line» machinery

Globus Online
Globus Online provides robust, easy-to-use file transfer capabilities:
- web interface
- transfer management
- performance monitoring
- retries after failures, with automatic recovery when possible
It is a service hosted at www.globusonline.eu (US), but the files it moves among EGI sites DO NOT LEAVE Europe: GridFTP «3rd party transfer» is used, so files are copied directly between the EGI endpoints.
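
The "retries after failures" feature can be pictured as a retry loop with exponential backoff wrapped around a third-party transfer request. In the Python sketch below, the `transfer` callable is a hypothetical stand-in for asking the source endpoint to send a file directly to the destination, so the orchestrating service never handles the data itself; the retry count and delays are illustrative assumptions, not Globus settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(transfer: Callable[[], T], attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `transfer`, retrying on OSError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return transfer()
        except OSError:
            if attempt == attempts:
                raise  # give up and report the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
    raise ValueError("attempts must be >= 1")


# Hypothetical third-party copy that fails twice before succeeding.
calls = {"n": 0}


def flaky_third_party_copy() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint busy")  # ConnectionError is an OSError
    return "done"


waits: list[float] = []
print(with_retries(flaky_third_party_copy, sleep=waits.append), waits)  # done [1.0, 2.0]
```

Injecting `sleep` keeps the example fast and testable; a real service would also persist the transfer state so that recovery survives a restart.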

iRODS
- Provides a high-level abstraction layer on top of storage resources: users focus on their data, not on where the data sit on the data grid
- Provides a native metadata catalogue
- Multiple authentication plugins (password, PAM, GSI…)
- Multiple access protocols (POSIX, S3, RADOS…)
- Rule-oriented approach: «policies» can be easily implemented as data management tasks
- Integration in the EGI infrastructure is ongoing
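
The rule-oriented approach means that a policy is attached to an event (for example "a file was stored") and runs automatically. The Python sketch below mimics that idea with a small event dispatcher; it is not the iRODS rule language or its rule engine API, and the event name, paths and resource names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Callable

Rule = Callable[[dict], None]


class PolicyEngine:
    """Maps event names to the policies (rules) that run when they fire."""

    def __init__(self) -> None:
        self.rules: dict[str, list[Rule]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Rule], Rule]:
        def register(rule: Rule) -> Rule:
            self.rules[event].append(rule)
            return rule
        return register

    def fire(self, event: str, obj: dict) -> None:
        for rule in self.rules[event]:
            rule(obj)


engine = PolicyEngine()


@engine.on("post_put")
def record_checksum(obj: dict) -> None:
    # Policy 1: store a checksum as metadata on every new file.
    obj["metadata"]["sha256"] = hashlib.sha256(obj["data"]).hexdigest()


@engine.on("post_put")
def replicate_raw_data(obj: dict) -> None:
    # Policy 2: files under /zone/raw/ get a copy on an archive resource.
    if obj["path"].startswith("/zone/raw/"):
        obj["replicas"].append("archiveResc")


new_file = {"path": "/zone/raw/run1.dat", "data": b"payload",
            "metadata": {}, "replicas": ["diskResc"]}
engine.fire("post_put", new_file)
print(new_file["replicas"], new_file["metadata"]["sha256"][:12])
```

The user only stores the file; checksumming and replication happen as side effects of the policies, which is what lets data management tasks be expressed once and applied everywhere.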

FedCloud IaaS capabilities
- Computing: VM management, VM Marketplace
- Storage: Block Storage, Object Storage

Block Storage
Persistent block-level storage to use with VMs.
- Simple usage: use it as any other block device from VMs; snapshottable.
- High performance: consistent, low-latency performance; SSDs (at some sites).
- Scale to your needs: from GB to TB; create and attach to VMs on demand.

Object Storage
Data storage infrastructure for storing and retrieving data from anywhere at any time.
- API access: simple REST APIs for managing and accessing data.
- Scalable: store as much data as needed; you are accounted only for the space used.
- Sharing: define ACLs on each object and share your data publicly.
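
The per-object sharing model can be pictured as a store in which every object carries its own ACL. The in-memory Python sketch below is hypothetical: real object storage is reached through the REST APIs mentioned above rather than a local class, and the ACL here is simplified to an owner plus a public/private flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredObject:
    data: bytes
    owner: str
    public: bool = False  # simplified ACL: private to the owner, or world-readable


class ObjectStore:
    def __init__(self) -> None:
        self.objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, owner: str, public: bool = False) -> None:
        self.objects[key] = StoredObject(data, owner, public)

    def set_public(self, key: str, owner: str, public: bool) -> None:
        # Only the owner may change an object's ACL.
        obj = self.objects[key]
        if obj.owner != owner:
            raise PermissionError("only the owner can change the ACL")
        obj.public = public

    def get(self, key: str, requester: Optional[str] = None) -> bytes:
        # Anonymous requests (requester=None) succeed only for public objects.
        obj = self.objects[key]
        if not obj.public and requester != obj.owner:
            raise PermissionError(f"{key} is private")
        return obj.data


store = ObjectStore()
store.put("results/plot.png", b"png bytes", owner="alice")
store.set_public("results/plot.png", owner="alice", public=True)
print(store.get("results/plot.png"))  # b'png bytes', readable by anyone now
```

Because access control lives on each object rather than on a disk attached to one VM, the same data can be handed to collaborators anywhere without copying it.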

Block Storage vs Object Storage
- Access. Block Storage: only from within a VM, and only at the same site where the VM is located. Object Storage: from any device connected to the internet.
- Sharing. Block Storage: not possible. Object Storage: possible (data can be kept private or public).
- Accounting. Block Storage: for the entire volume, regardless of how much of it is actually used. Object Storage: only for the data stored.
- Integration. Block Storage: easy with any application capable of reading/writing files on a local disk. Object Storage: requires a client to be integrated within the application.
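
The accounting row is easiest to see with numbers. Assuming a hypothetical 500 GB block volume of which only 120 GB is filled, and the same 120 GB kept as objects instead, the small Python function below returns the space each model accounts for.

```python
def accounted_gb(kind: str, allocated_gb: float, stored_gb: float) -> float:
    """Space that is accounted, following the comparison above."""
    if kind == "block":
        return allocated_gb  # the entire volume, regardless of how much is used
    if kind == "object":
        return stored_gb     # only the data actually stored
    raise ValueError(f"unknown storage kind: {kind}")


print(accounted_gb("block", allocated_gb=500, stored_gb=120))   # 500
print(accounted_gb("object", allocated_gb=500, stored_gb=120))  # 120
```

The 380 GB gap is capacity that block storage reserves (and accounts) but does not use; the trade-off is that block storage needs no client integration in the application.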

Use cases
- Block Storage: application hosting, data processing, databases
- Object Storage: large data file storage and backup, static content, media serving and sharing, big data

To integrate a product in UMD, please follow the instructions at https://wiki.egi.eu/wiki/EGI_Software_Component_Delivery
Questions?