EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org www.glite.org gLite Data Management System Yaodong Cheng, CC-IHEP, Chinese Academy of Sciences.

Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE gLite Data Management System Yaodong Cheng, CC-IHEP, Chinese Academy of Sciences. The 6th Joint Training of OMII-Europe & CNGrid, Hong Kong, January 2008

Outline
  – Grid Data Management Challenge
  – Storage Elements and SRM
  – File Catalogs and DM tools
  – File Transfer Service

The Grid DM Challenge
Heterogeneity
  – Data are stored on different storage systems using different access technologies
Distribution
  – Data are stored in different locations; in most cases there is no shared file system or common namespace
  – Data need to be moved between different locations
Need a common interface to storage resources
  – Storage Resource Manager (SRM)
Need to keep track of where data is stored
  – File and Replica Catalogs
Need scheduled, reliable file transfer
  – File Transfer Service

Introduction
Assumptions:
  – Users and programs produce and require data
  – The lowest granularity of the data is the file level (we deal with files rather than data objects or tables)
  – Data = files
Files:
  – Mostly write once, read many
  – Located in Storage Elements (SEs)
  – Several replicas of one file at different sites
  – Accessible by Grid users and applications from “anywhere”
  – Locatable by the WMS (data requirements in JDL)
Also:
  – The WMS can send (small amounts of) data to/from jobs: Input and Output Sandbox
  – Files may be copied between local filesystems (WNs, UIs) and the Grid (SEs)

gLite Grid Storage Requirements
The Storage Element is the service which allows a user or an application to store data for future retrieval. It must:
Manage local storage (disks) and interface to Mass Storage Systems (tapes) such as:
  – HPSS, CASTOR, dCache, …
Be able to manage different storage systems uniformly and transparently for the user (providing an SRM interface).
Support basic file transfer protocols:
  – GridFTP (mandatory)
  – Others if available (https, ftp, etc.)
Support a native I/O (remote file) access protocol:
  – POSIX-like I/O client library for direct access to data (GFAL)

Basic DM issues
Where to store?
  – SE: Storage Element (CASTOR, DPM, dCache, …)
Where to find?
  – File Catalogue (LFC: LCG File Catalogue)
How to access?
  – Uniform storage manager interface: SRM
  – Basic transfer protocol: GridFTP, …
  – Advanced transfer service: FTS
  – POSIX-like I/O: GFAL

SRM in an example


Storage Resource Management
Data are stored on disk pool servers or Mass Storage Systems.
Storage resource management needs to take into account:
  – Transparent access to files (migration to/from disk pool)
  – File pinning
  – Space reservation
  – File status notification
  – Lifetime management
The SRM (Storage Resource Manager) takes care of all these details.
  – The SRM is a single interface that handles local storage interaction and provides a Grid interface to the outside world.
In gLite, interactions with the SRM are hidden by higher-level services (DM tools and APIs).

gLite SE types
gLite 3.0 data access protocols:
  – File transfer: GSIFTP (GridFTP)
  – File I/O (remote file access):
      – gsidcap
      – insecure RFIO
      – secured RFIO (gsirfio)
Classic SE:
  – GridFTP server
  – Insecure RFIO daemon (rfiod): LAN-limited file access only
  – Single disk or disk array
  – No quota management
  – Does not support the SRM interface

gLite SE types (II)
Mass Storage Systems (e.g. Castor)
  – Files migrated between front-end disk and back-end tape storage hierarchies
  – GridFTP server
  – Insecure RFIO (Castor)
  – Provide an SRM interface with all the benefits
Disk pool managers (e.g. dCache and gLite DPM)
  – Manage distributed storage servers in a centralized way
  – Physical disks or arrays are combined into a common (virtual) file system
  – Disks can be dynamically added to the pool
  – GridFTP server
  – Secure remote access protocols (gsidcap for dCache, gsirfio for DPM)
  – SRM interface

gLite Storage Element

File Naming conventions
Logical File Name (LFN)
  – An alias created by a user to refer to some item of data, e.g. “lfn:cms/ /run2/track1”
Globally Unique Identifier (GUID)
  – A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  – The location of an actual piece of data on a storage system, e.g.
    “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
    “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
Transport URL (TURL)
  – Temporary locator of a replica plus access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
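
The four kinds of name can be told apart by their scheme prefix. The sketch below is illustrative only: the function names and the scheme table are assumptions made for this example (the table is built from the slide's examples plus the gsiftp and gsidcap protocols mentioned earlier), not part of any gLite API.

```python
from urllib.parse import urlparse

# Scheme prefixes as they appear in the slide's examples, plus gsiftp and
# gsidcap from the protocol list. The table itself is an assumption.
KIND_BY_SCHEME = {
    "lfn": "LFN",
    "guid": "GUID",
    "srm": "SURL",
    "sfn": "SURL",
    "rfio": "TURL",
    "gsiftp": "TURL",
    "gsidcap": "TURL",
}


def classify(name: str) -> str:
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' based on the scheme prefix."""
    scheme, sep, rest = name.partition(":")
    if not sep or not rest:
        raise ValueError(f"not a scheme-prefixed grid name: {name!r}")
    kind = KIND_BY_SCHEME.get(scheme.lower())
    if kind is None:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return kind


def storage_host(url: str) -> str:
    """Return the storage host named in a SURL or TURL."""
    if classify(url) not in ("SURL", "TURL"):
        raise ValueError("only SURLs and TURLs name a storage host")
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host part in {url!r}")
    return host
```

LFNs and GUIDs carry no host at all, which is the point of the convention: only the catalogue knows where the replicas behind them live.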

Name conventions
Users primarily access and manage files through “logical filenames”
  – Mapping by the “LFC” catalogue server
  – Defined by the user
LFC Namespace
  – LFC has a directory tree structure
  – /grid/ /

Two sets of commands
LFC = LCG File Catalogue
      – LCG = LHC Computing Grid
      – LHC = Large Hadron Collider
  – Use LFC commands to interact with the catalogue only
      – To create a catalogue directory
      – To list files
  – Used by you and by lcg-utils
lcg-utils
  – Couples catalogue operations with file management
      – Keeps SEs and catalogue in step!
  – Copy files to/from/between SEs
  – Replicate files

What is a file catalog

LFC (LCG File Catalog)
It keeps track of the location of copies (replicas) of Grid files.
The LFN acts as the main key in the database. It has:
  – Symbolic links to it
  – Unique Identifier (GUID)
  – System metadata
  – Information on replicas
  – One field of user metadata
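
A toy in-memory model may make the record structure concrete. This is a sketch under stated assumptions, not the LFC schema: the class and method names are invented here, and a random UUID stands in for however the real catalogue assigns GUIDs.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One logical file, keyed by its LFN in the catalogue."""
    guid: str
    replicas: list = field(default_factory=list)   # SURLs of the copies
    symlinks: list = field(default_factory=list)   # other LFNs pointing here
    comment: str = ""                              # the single user-metadata field


class ToyCatalog:
    """Hypothetical stand-in for a file catalogue; LFN is the main key."""

    def __init__(self):
        self._by_lfn = {}

    def register(self, lfn: str, surl: str) -> str:
        """Record a replica for an LFN, creating the entry on first use."""
        entry = self._by_lfn.get(lfn)
        if entry is None:
            entry = CatalogEntry(guid=str(uuid.uuid4()))
            self._by_lfn[lfn] = entry
        if surl not in entry.replicas:
            entry.replicas.append(surl)
        return entry.guid

    def list_replicas(self, lfn: str) -> list:
        """Return a copy of the replica list for an LFN."""
        return list(self._by_lfn[lfn].replicas)
```

Registering a second SURL under the same LFN returns the same GUID and simply extends the replica list, which mirrors the one-file, many-replicas relationship described above.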

LFC Features
  1. Cursors for large queries
  2. Timeouts and retries from the client
  3. User-exposed transactional API (+ auto rollback on failure)
  4. Hierarchical namespace and namespace operations (for LFNs)
  5. Integrated GSI Authentication + Authorization
  6. Access Control Lists (Unix permissions and POSIX ACLs)
  7. Checksums
  8. Integration with VOMS (VirtualID and VirtualGID)

LFC basics
Logical filenames are defined by the user
LFC Namespace
  – LFC has a directory tree structure
  – /grid/ /
All members of a given VO have read-write permissions in their directory
Commands look like UNIX commands with “lfc-” in front (often)
We will use /grid/gilda/training/hongkong/…

LFC Catalog commands
  lfc-chmod        Change access mode of the LFC file/directory
  lfc-chown        Change owner and group of the LFC file/directory
  lfc-delcomment   Delete the comment associated with the file/directory
  lfc-getacl       Get file/directory access control lists
  lfc-ln           Make a symbolic link to a file/directory
  lfc-ls           List file/directory entries in a directory
  lfc-mkdir        Create a directory
  lfc-rename       Rename a file/directory
  lfc-rm           Remove a file/directory
  lfc-setacl       Set file/directory access control lists
  lfc-setcomment   Add/replace a comment
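
For scripting, the commands in this table can be assembled as argument lists before being handed to a process runner. The helper below is a hypothetical convenience written for this transcript; it only builds the command line and does not contact a catalogue. The `-l` option to `lfc-ls` follows the UNIX `ls` convention; check the man pages on your UI before relying on any option.

```python
import shlex

# Tool names exactly as listed in the table above.
LFC_TOOLS = {
    "chmod", "chown", "delcomment", "getacl", "ln", "ls",
    "mkdir", "rename", "rm", "setacl", "setcomment",
}


def lfc_command(tool: str, *args: str) -> list:
    """Build the argv list for an lfc-* command (nothing is executed)."""
    if tool not in LFC_TOOLS:
        raise ValueError(f"not an LFC command from the table: lfc-{tool}")
    return [f"lfc-{tool}", *args]


if __name__ == "__main__":
    # The directory is the one used in the training session.
    argv = lfc_command("ls", "-l", "/grid/gilda/training/hongkong")
    print(shlex.join(argv))
```

On a real User Interface the list would be passed to `subprocess.run(argv, check=True)`, with the catalogue host set in the environment (the `LFC_HOST` variable in gLite setups).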

lcg-utils DM tools
High-level interface (CL tools and APIs) to:
  – Upload/download files to/from the Grid (UI, CE and WN to/from SEs)
  – Replicate data between SEs and locate the best replica available
  – Interact with the file catalog
Definition: a file is considered to be a Grid File if it is both physically present in an SE and registered in the File Catalog.
lcg-utils ensure the consistency between files in the Storage Elements and entries in the File Catalog.

lcg-utils commands
  lcg-cp    Copies a Grid file to a local destination
  lcg-cr    Copies a file to an SE and registers the file in the LRC
  lcg-del   Deletes one file (either one replica or all replicas)
  lcg-lg    Gets the GUID for a given LFN or SURL
  lcg-rep   Copies a file from SE to SE and registers it in the LRC
  lcg-aa    Adds an alias in RMC for a given GUID
  lcg-la    Lists the aliases for a given LFN, GUID or SURL
  lcg-gt    Gets the TURL for a given SURL and transfer protocol
  lcg-lr    Lists the replicas for a given LFN, GUID or SURL
  lcg-ra    Removes an alias in RMC for a given GUID
  lcg-rf    Registers an SE file in the LRC (optionally in the RMC)
  lcg-uf    Un-registers a file residing on an SE from the LRC
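
A typical upload, replicate, list, download cycle with four of these tools might be scripted as follows. This is a sketch: the helper names are invented, the SE host and file names in the usage note are placeholders, and the option spellings (`--vo`, `-l`, `-d`) are given as commonly documented for lcg-utils and should be checked against the man pages of the installed version. The functions only build argument lists; nothing is executed.

```python
import posixpath


def _file_url(local_path: str) -> str:
    """lcg-utils expects local files as file: URLs with absolute paths."""
    if not posixpath.isabs(local_path):
        raise ValueError("local path must be absolute")
    return f"file:{local_path}"


def upload_and_register(vo: str, local_path: str, lfn: str, se: str) -> list:
    """lcg-cr: copy a local file to an SE and register it under an LFN."""
    return ["lcg-cr", "--vo", vo, "-l", f"lfn:{lfn}", "-d", se, _file_url(local_path)]


def replicate(vo: str, lfn: str, dest_se: str) -> list:
    """lcg-rep: copy an existing Grid file to another SE and register the replica."""
    return ["lcg-rep", "--vo", vo, "-d", dest_se, f"lfn:{lfn}"]


def list_replicas(vo: str, lfn: str) -> list:
    """lcg-lr: list the replicas known for an LFN."""
    return ["lcg-lr", "--vo", vo, f"lfn:{lfn}"]


def download(vo: str, lfn: str, local_path: str) -> list:
    """lcg-cp: copy a Grid file to a local destination."""
    return ["lcg-cp", "--vo", vo, f"lfn:{lfn}", _file_url(local_path)]
```

For example, `upload_and_register("gilda", "/tmp/demo.txt", "/grid/gilda/training/hongkong/demo.txt", "se.example.org")` yields a list that can be passed to `subprocess.run`. Because `lcg-cr` both copies and registers, the SE and the catalogue stay in step, which is the consistency guarantee the previous slide describes.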

Data movement introduction
Grids are naturally distributed systems.
This means that data also need to be distributed.
  – First-generation data distribution mainly concentrated on copy protocols in a grid environment:
      – gridftp
      – http + mod_gridsite
But copies controlled by clients have problems…

Direct Client Controlled Data Movement
Although the transport protocol may be robust, state is held inside the client, which is inconvenient and fragile.
The client only knows about local state and has no global knowledge of data transfers between storage elements.
  – Storage elements can be overwhelmed with replication requests
  – Multiple replications of the same data can happen simultaneously
  – The site has little control over the balance of network resources (risk of DoS)

Transfer Service
Clear need for a service for data transfer
  – Client connects to service to submit request
  – Service maintains state about transfer
  – Client can periodically reconnect to check status or cancel request
  – Service can have knowledge of global state, not just a single request
      – Load balancing
      – Scheduling
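
The submit, poll, cancel pattern listed above can be sketched with a toy in-memory service. Everything here is hypothetical: the class, the `tick` scheduling step and the state names other than “Pending” are assumptions for illustration and do not describe the FTS implementation.

```python
import itertools


class ToyTransferService:
    """Holds transfer state on the service side, so clients can disconnect."""

    def __init__(self, max_active: int = 1):
        self._ids = itertools.count(1)
        self._jobs = {}              # job id -> {"source", "dest", "state"}
        self.max_active = max_active

    def submit(self, source: str, dest: str) -> int:
        """Queue a transfer and return a job id the client can poll later."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"source": source, "dest": dest, "state": "Pending"}
        return job_id

    def status(self, job_id: int) -> str:
        return self._jobs[job_id]["state"]

    def cancel(self, job_id: int) -> None:
        """Cancel a job that has not finished yet."""
        job = self._jobs[job_id]
        if job["state"] in ("Pending", "Active"):
            job["state"] = "Canceled"

    def tick(self) -> None:
        """One scheduling step: finish active jobs, then start pending ones
        in submission order, up to max_active at a time."""
        for job in self._jobs.values():
            if job["state"] == "Active":
                job["state"] = "Done"
        free = self.max_active
        for job in self._jobs.values():
            if free == 0:
                break
            if job["state"] == "Pending":
                job["state"] = "Active"
                free -= 1
```

Because the service sees every queued request, it can cap concurrency (`max_active`) across all clients, which a set of independent client-driven copies cannot do.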

File Transfer Service
FTS is a low-level data movement service
Why is it needed?
  – Improves reliability for transfers
  – Provides asynchronous file transfer: transfers are scheduled when resources are available
  – Provides control of transfer properties (channel concept)
No catalogue interactions yet: users have to handle SURLs

Transfer Service Architecture
  – Clients submit jobs via SOAP over https.
  – Jobs are lists of URLs in srm:// format. Some transfer parameters can be specified (streams, buffer sizes).
  – Clients cannot subscribe for status changes, but can poll.
  – C command line clients. C, Java and Perl APIs available.
  – Backend databases supported: MySQL and Oracle.
  – The web service runs in a Tomcat5 container; agents run as normal daemons.

gLite FTS: Channels
  – The FTS service has a concept of channels
  – A channel is a unidirectional connection between two sites
  – Transfer requests between these two sites are assigned to that channel
  – Channels usually correspond to a dedicated network pipe (e.g., OPN) associated with production
  – But channels can also take wildcards:
      – * to MY_SITE : all incoming
      – MY_SITE to * : all outgoing
      – * to * : catch all
  – Channels control certain transfer properties: transfer concurrency, gridftp streams.
  – Channels can be controlled independently: started, stopped, drained.
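
Assigning a request to a channel with wildcards can be sketched as a most-specific-first lookup. The precedence order used here (exact pair, then incoming wildcard, then outgoing wildcard, then catch-all) is an assumption for illustration; the slides do not state how FTS ranks overlapping channels, and the function name and site names are hypothetical.

```python
def pick_channel(source: str, dest: str, channels: set):
    """Return the (source, dest) channel that should carry a transfer.

    `channels` is a set of (source_site, dest_site) pairs in which either
    side may be the wildcard "*". Returns None if no channel matches.
    """
    candidates = (
        (source, dest),   # dedicated channel between the two sites
        ("*", dest),      # all incoming to dest
        (source, "*"),    # all outgoing from source
        ("*", "*"),       # catch all
    )
    for candidate in candidates:
        if candidate in channels:
            return candidate
    return None
```

With channels `{("CERN", "IHEP"), ("*", "IHEP"), ("*", "*")}`, a transfer from CERN to IHEP lands on the dedicated channel, one from any other site to IHEP on the incoming wildcard, and everything else on the catch-all.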

gLite FTS: Agents
VO Agents
  – Any job submitted to FTS is first handled by the VO agent
  – The VO agent authorises the job and changes its state to “Pending”
  – VO agents can perform other tasks; naturally these can be VO specific:
      – Scheduling
      – File catalog interaction
Channel Agents
  – Transfers on a channel are managed by the channel agent
  – Channel agents can perform inter-VO scheduling

FTS summary
The gLite File Transfer Service (FTS) is an efficient and easy way to manage file movement
  – File movement is asynchronous: submit a job
      – Held in file transfer queue
  – Task execution is delegated to FTS
  – User can monitor job status through the jobID
  – Maintains state of job transfers
  – Manages the network and the storage at both ends
  – Defines the concept of a CHANNEL: a link between two SEs
  – Channels can be managed by the channel administrators, i.e. the people responsible for the network link and storage systems
  – These are potentially different people for different channels
  – Optimizes channel bandwidth usage: lots of parameters that can be tuned by the administrator
  – VOs using the channel can apply their own internal policies for queue ordering (e.g. a professor’s transfer jobs are more important than a student’s)
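
The queue-ordering policy in the last bullet can be sketched with a priority queue. This is an illustrative assumption about how a VO might rank jobs, not a description of FTS internals; the class name, the ranking function and the role labels are hypothetical.

```python
import heapq
import itertools


class VoQueue:
    """Transfer queue ordered by a VO-defined priority, FIFO within a priority."""

    def __init__(self, priority_of):
        # priority_of maps a job owner to a number; lower runs first.
        self._priority_of = priority_of
        self._heap = []
        self._seq = itertools.count()   # tie-breaker that preserves submit order

    def push(self, owner: str, job_id: str) -> None:
        heapq.heappush(self._heap, (self._priority_of(owner), next(self._seq), job_id))

    def pop(self) -> str:
        """Return the job id that should be transferred next."""
        return heapq.heappop(self._heap)[2]
```

With ranks `{"professor": 0, "student": 1}`, a professor's job submitted after a student's job is still popped first, while jobs from owners of equal rank keep their submission order.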

Data Management Services Summary
Storage Element: saves data and provides a common interface
  – Storage Resource Manager (SRM): Castor, dCache, DPM, …
  – Native access protocols: rfio, dcap, nfs, …
  – Transfer protocols: gsiftp, ftp, …
Catalogs: keep track of where data are stored
  – File Catalog, Replica Catalog: LCG File Catalog (LFC)
  – Metadata Catalog: AMGA Metadata Catalogue
Data Movement: schedules reliable file transfer
  – File Transfer Service: gLite FTS (manages physical transfers)
