FP6−2004−Infrastructures−6-SSA-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America gLite Data Management System Tony Calanducci.

Presentation transcript:

FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America gLite Data Management System Tony Calanducci INFN Catania First EELA Grid tutorial for users and system administrators Madrid, 20-24th February 2006

Outline
- Grid Data Management Challenge
- Storage Elements, SRM and gLite I/O
- File and Replica Catalogs (LFC and FiReMan)
- File Transfer Components
- LCG and gLite DMS comparison

The Grid DM Challenge
Heterogeneity
- Data are stored on different storage systems using different access technologies
Distribution
- Data are stored in different locations; in most cases there is no shared file system or common namespace
- Data need to be moved between different locations
- Need a common interface to storage resources -> Storage Resource Manager (SRM)
- Need to keep track of where data is stored -> File and Replica Catalogs
- Need scheduled, reliable file transfer -> File transfer and placement services

Data Management Services Overview
Storage Element: saves data and provides a common interface
- Storage Resource Manager (SRM): Castor, dCache, DPM, …
- Native access protocols: rfio, dcap, nfs, …
- Transfer protocols: gsiftp, ftp, …
I/O Server: provides a POSIX I/O interface to the user
- gLite-I/O
Catalogs: keep track of where data are stored
- File Catalog
- Replica Catalog
- File Authorization Service
- Metadata Catalog
- gLite File and Replica Catalog (FiReMan), AMGA Metadata Catalogue, LCG File Catalog (LFC)
File Transfer: schedules reliable file transfer
- Data Scheduler (only designs exist so far)
- File Transfer Service: gLite FTS (manages physical transfers)
- File Placement Service: gLite FPS (FTS and catalog interaction in a transactional way)

Data services in gLite
File access patterns:
- Write once, read many
- Rare append-only updates with one owner
- Frequently updated at one source; replicas check/pull new version
- (NOT frequent updates, many users, many sites)
File naming:
- Mostly, users see the “logical file name” (LFN)
- LFN must be unique: it includes the logical directory name, in a VO namespace
- E.g. /gLite/myVOname.org/runs/12aug05/data1.res
3 service types for data:
- Storage
- Catalogs
- Movement

SRM in an example
She is running a job which needs:
- Data for physics event reconstruction
- Simulated data
- Some data analysis files
She will write files remotely too.
- They are at CERN, in dCache
- They are at Fermilab, in a disk array
- They are at Nikhef, in a classic SE

SRM in an example
- dCache: own system, own protocols and parameters
- Castor: no connection with dCache or classic SE
- Classic SE: independent system from dCache or Castor
You as a user need to know all the systems!
SRM:
- I talk to them on your behalf
- I will even allocate space for your files
- And I will use transfer protocols to send your files there

Storage Resource Management
Data are stored on disk pool servers or Mass Storage Systems.
Storage resource management needs to take into account:
- Transparent access to files (migration to/from disk pool)
- File pinning
- Space reservation
- File status notification
- Lifetime management
SRM (Storage Resource Manager) takes care of all these details:
- SRM is a Grid Service that takes care of local storage interaction and provides a Grid interface to the outside world
In gLite, interaction with the SRM is hidden by higher-level services (gLite I/O).

Grid Storage Requirements
- Manage local storage and interface to Mass Storage Systems such as HPSS, CASTOR, DiskeXtender (UNITREE), …
- Provide an SRM interface
- Support basic file transfer protocols
  - GridFTP mandatory
  - Others if available (https, ftp, etc.)
- Support a native I/O access protocol
  - POSIX(-like) I/O client library for direct access to data

gLite Storage Element

File and Replica Catalogs
- LCG middleware: LFC (LCG File Catalog)
- gLite middleware: FiReMan

Name conventions (LFC)
Logical File Name (LFN)
- An alias created by a user to refer to some item of data, e.g. “lfn:cms/ /run2/track1”
Globally Unique Identifier (GUID)
- A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
- The location of an actual piece of data on a storage system, e.g.
  “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
  “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
Transport URL (TURL)
- Temporary locator of a replica plus access protocol, understood by a SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
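The four kinds of name above can be told apart by their prefix. The following is a minimal sketch in Python, not part of lcg_utils, GFAL or any gLite API: the prefix table is inferred from the slide's examples, and the function names `classify` and `storage_host` are hypothetical.

```python
from urllib.parse import urlparse

# Prefixes taken from the slide's examples. The table is an illustrative
# assumption, not something defined by the LFC or gLite software.
SCHEME_KIND = {
    "lfn": "LFN",      # logical file name (user-chosen alias)
    "guid": "GUID",    # globally unique identifier
    "srm": "SURL",     # site URL on an SRM-managed SE
    "sfn": "SURL",     # site URL on a classic SE
    "rfio": "TURL",    # transport URLs carry an access protocol
    "dcap": "TURL",
    "gsiftp": "TURL",
}


def classify(name: str) -> str:
    """Return 'LFN', 'GUID', 'SURL', 'TURL' or 'unknown' for a file name."""
    scheme, sep, _rest = name.partition(":")
    if not sep:
        return "unknown"
    return SCHEME_KIND.get(scheme.lower(), "unknown")


def storage_host(name: str):
    """Return the storage host of a SURL or TURL, or None for catalog names."""
    if classify(name) not in ("SURL", "TURL"):
        return None
    return urlparse(name).hostname
```

Only SURLs and TURLs carry a host name; LFNs and GUIDs are location-independent, which is why `storage_host` returns `None` for them.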

The LFC
It keeps track of the location of copies (replicas) of Grid files.
The LFN acts as the main key in the database. It has:
- Symbolic links to it (additional LFNs)
- Unique identifier (GUID)
- System metadata
- Information on replicas
- One field of user metadata
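The record structure described above (LFN as key, with a GUID, symbolic links, replicas and one user-metadata field) can be pictured with a small in-memory model. This is a sketch only: `ToyCatalog` and `CatalogEntry` are hypothetical names, the LFNs and SURLs in the usage are made up, and nothing here reflects how the real LFC stores its data.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    guid: str
    replicas: list = field(default_factory=list)         # SURLs of the copies
    system_metadata: dict = field(default_factory=dict)  # e.g. size, owner
    user_metadata: str = ""                               # the single user field


class ToyCatalog:
    """Illustrative stand-in for a file catalog keyed by LFN."""

    def __init__(self):
        self._entries = {}  # primary LFN -> CatalogEntry
        self._links = {}    # symbolic link (additional LFN) -> primary LFN

    def _resolve(self, name):
        return self._links.get(name, name)

    def register(self, lfn, surl):
        """Record a replica for lfn, creating the entry (and GUID) if needed."""
        lfn = self._resolve(lfn)
        entry = self._entries.get(lfn)
        if entry is None:
            entry = CatalogEntry(guid=str(uuid.uuid4()))
            self._entries[lfn] = entry
        if surl not in entry.replicas:
            entry.replicas.append(surl)
        return entry.guid

    def symlink(self, link, target_lfn):
        """Add an additional LFN that points at an existing primary LFN."""
        if target_lfn not in self._entries:
            raise KeyError(target_lfn)
        self._links[link] = target_lfn

    def list_replicas(self, name):
        """Return the SURLs for a primary LFN or for a symbolic link to it."""
        return list(self._entries[self._resolve(name)].replicas)
```

Registering the same LFN twice with different SURLs yields one GUID with two replicas, and a symbolic link resolves to the same replica list as its target.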

LFC Features
- Cursors for large queries
- Timeouts and retries from the client
- User-exposed transactional API (+ auto rollback on failure)
- Hierarchical namespace and namespace operations (for LFNs)
- Integrated GSI authentication + authorization
- Access Control Lists (Unix permissions and POSIX ACLs)
- Checksums
- Integration with VOMS

Data Management CLIs & APIs
lcg_utils: lcg-* commands + lcg_* API calls
- Provide (all) the functionality needed by the LCG user
- Transparent interaction with file catalogs and storage interfaces when needed
- Abstraction from technology of specific implementations
Grid File Access Library (GFAL): API
- Adds file I/O and explicit catalog interaction functionality
- Still provides the abstraction and transparency of lcg_utils
edg-gridftp tools: CLI
- Complete the lcg_utils with low-level GridFTP operations
- Functionality available as API in GFAL
- May be generalized as lcg-* commands

lcg-utils commands
Replica Management
lcg-cp   Copies a grid file to a local destination
lcg-cr   Copies a file to a SE and registers the file in the catalog
lcg-del  Deletes one file
lcg-rep  Replication between SEs and registration of the replica
lcg-gt   Gets the TURL for a given SURL and transfer protocol
lcg-sd   Sets file status to “Done” for a given SURL in a SRM request
File Catalog Interaction
lcg-aa   Adds an alias in LFC for a given GUID
lcg-ra   Removes an alias in LFC for a given GUID
lcg-rf   Registers in LFC a file placed in a SE
lcg-uf   Unregisters in LFC a file placed in a SE
lcg-la   Lists the aliases for a given SURL, GUID or LFN
lcg-lg   Gets the GUID for a given LFN or SURL
lcg-lr   Lists the replicas for a given GUID, SURL or LFN

LFC C API
Low-level methods (many POSIX-like):
lfc_access, lfc_aborttrans, lfc_addreplica, lfc_apiinit, lfc_chclass, lfc_chdir, lfc_chmod, lfc_chown, lfc_closedir, lfc_creat, lfc_delcomment, lfc_delete,
lfc_deleteclass, lfc_delreplica, lfc_endtrans, lfc_enterclass, lfc_errmsg, lfc_getacl, lfc_getcomment, lfc_getcwd, lfc_getpath, lfc_lchown, lfc_listclass, lfc_listlinks, lfc_listreplica,
lfc_lstat, lfc_mkdir, lfc_modifyclass, lfc_opendir, lfc_queryclass, lfc_readdir, lfc_readlink, lfc_rename, lfc_rewind, lfc_rmdir, lfc_selectsrvr, lfc_setacl,
lfc_setatime, lfc_setcomment, lfc_seterrbuf, lfc_setfsize, lfc_starttrans, lfc_stat, lfc_symlink, lfc_umask, lfc_undelete, lfc_unlink, lfc_utime, send2lfc

LFC commands
Summary of the LFC Catalog commands:
lfc-chmod       Change access mode of the LFC file/directory
lfc-chown       Change owner and group of the LFC file/directory
lfc-delcomment  Delete the comment associated with the file/directory
lfc-getacl      Get file/directory access control lists
lfc-ln          Make a symbolic link to a file/directory
lfc-ls          List file/directory entries in a directory
lfc-mkdir       Create a directory
lfc-rename      Rename a file/directory
lfc-rm          Remove a file/directory
lfc-setacl      Set file/directory access control lists
lfc-setcomment  Add/replace a comment

LFC other commands
Managing ownership and permissions: lfc-chmod, lfc-chown
- Remember that per-user mapping can change in every session. The default is for LFNs and directories to be VO-wide readable. Consistent user mapping will be added soon.
Managing ACLs: lfc-getacl, lfc-setacl
Renaming: lfc-rename
Removing: lfc-rm
- An LFN can only be removed if it has no SURLs associated. LFNs should be removed by lcg-del, rather than lfc-rm.

Files & replicas: Name Conventions (gLite)
Symbolic Link
- In logical filename space
Logical File Name (LFN)
- An alias created by a user to refer to some item of data, e.g. “lfn:cms/ /run2/track1”
Globally Unique Identifier (GUID)
- A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
- The location of an actual piece of data on a storage system, e.g.
  “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
  “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
Transport URL (TURL)
- Temporary locator of a replica plus access protocol, understood by a SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
[Diagram: Symbolic Link 1 … Symbolic Link n and the LFN map to a GUID; the GUID maps to SURL 1 … SURL n (physical files); the SURLs map to TURL 1 … TURL n. Diagram labels: File and Replica Catalog, SRM.]

File names and identifiers in gLite
[Diagram labels: globally unique identifier; Site URL; Transport URL (includes protocol). Annotation: “user need only see these”.]

SRM Interactions
[Diagram: Client, SRM, Storage, with the numbered steps below]
1. The client asks the SRM for the file, providing a SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the availability of the file and its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
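The five steps above can be sketched as a toy exchange between two objects. This is an illustration in Python under stated assumptions: `ToyStorage` and `ToySRM` are hypothetical classes, not the SRM web-service interface, and the TURL layout simply mirrors the rfio example given earlier in the slides.

```python
from urllib.parse import urlparse


class ToyStorage:
    """Hypothetical storage back end that can bring files online."""

    def __init__(self, host, protocols):
        self.host = host
        self.protocols = set(protocols)
        self.online = set()

    def provide(self, path):
        # Steps 2-3: make the file available and report where it is.
        self.online.add(path)
        return self.host, path


class ToySRM:
    """Hypothetical SRM front end that turns a SURL into a TURL."""

    def __init__(self, storage):
        self.storage = storage

    def get_turl(self, surl, protocol):
        # Step 1: the client hands over a SURL and a protocol it can speak.
        if protocol not in self.storage.protocols:
            raise ValueError(f"protocol {protocol!r} not offered by this SE")
        path = urlparse(surl).path
        host, location = self.storage.provide(path)
        # Step 4: the TURL names the protocol and the place to read from.
        return f"{protocol}://{host}/{location}"


# Step 5 (not modelled): the client opens the TURL with the named protocol.
srm = ToySRM(ToyStorage("lxshare0209.cern.ch", ["rfio"]))
turl = srm.get_turl("sfn://lxshare0209.cern.ch/data/alice/ntuples.dat", "rfio")
```

The point of the indirection is that the client never needs to know how the storage system lays out or stages its files; it only needs a SURL and a protocol it supports.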

FireMan: gLite File and Replica Catalog
File Catalog
- Allows operations on the logical file namespaces that it manages (e.g. making directories, renaming files, creating symbolic links)
- Manages LFNs, keeping LFN-GUID mappings internally
Replica Catalog
- Exposes operations concerning the replication aspect of Grid files (e.g. listing, adding and removing replicas of a file identified by its GUID)
- Gives access to the GUID-SURL mappings
File Authorization Service (FAS)
- Request authorization based on the DN and the groups from the user’s delegated credentials
StorageIndex
- Allows WMS interactions (file location for the RB)
Metadata Catalog
- File-based metadata
Fireman = File and Replica Manager
- Provides all the previous services

Fireman Catalog Interface
FiReMan interface structure:
FileCatalog     Logical File Namespace management
ReplicaCatalog  Replica locations
MetaBase        File-based metadata
MetaSchema      Metadata management
FASBase         Authentication and Authorization information (ACLs)
ServiceBase     Service metadata
StorageIndex    WMS interaction and global file location
[Diagram annotation: “Not in Release 1”]

gLite FiReMan Catalog details
- Implemented on top of Oracle and MySQL
- Web Service interface (WSDL)
- Mostly bulk operations
- Stateless interaction
- No transactions outside bulk operations
Interface structure (ServiceBase, FASBase, MetaBase, FileCatalog, ReplicaCatalog, StorageIndex):
- StorageIndex: file location for broker
- FAS: File Authorization Service (ACLs)
- File Catalog: directory structure in LFN namespace
- Replica Catalog: location of replicas
- Meta: additional (user-defined) metadata

gLite-I/O
Client only sees a simple API library and a Command Line Interface
- GUID or LFN can be used, e.g. open(“/grid/myFile”)
GSI delegation to gLite I/O Server
Server performs all operations on the user’s behalf
- Resolves LFN/GUID into SURL and TURL
Operations are pluggable
- Catalog interactions
- SRM interactions
- Native I/O
[Diagram: the Client calls open(LFN) on the Server. Catalog modules (FiReMan; RLS, RMC; AliEn FC) give the LFN - GUID - SURL mappings; the SRM API gives the SURL - TURL mappings; protocol modules: rfio, dcap, gsiftp, aio; MSS.]

gLite I/O commands and API
Summary of the gLite I/O command line tools:
glite-get  Retrieve a file from the Grid using LFN or GUID
glite-put  Put a local file into the Grid, assigning LFN
glite-rm   Remove a file (replica!) from the Grid using LFN or GUID
Summary of the gLite I/O API calls (C only):
glite_open, glite_read, glite_write, glite_creat, glite_fstat, glite_lseek, glite_close, glite_unlink, glite_error, glite_strerror,
glite_posix_open, glite_posix_read, glite_posix_write, glite_posix_creat, glite_posix_fstat, glite_posix_lseek, glite_posix_close, glite_posix_unlink, glite_filehandle

File Open
[Diagram; recoverable label: rfio]

I/O server interactions
[Diagram; recoverable labels: provided by site, provided by VO]

Data Movement (I)
Many Grid applications will distribute a LOT of data across the Grid sites.
Need an efficient and easy way to manage file movement: a file movement service.
gLite File Transfer Service (FTS)
- Manages the network and the storage at both ends
- Defines the concept of a CHANNEL: a link between two SEs
- Channels can be managed by the channel administrators, i.e. the people responsible for the network link and storage systems
- These are potentially different people for different channels
- Optimizes channel bandwidth usage: lots of parameters can be tuned by the administrator
- VOs using the channel can apply their own internal policies for queue ordering (e.g. a professor’s transfer jobs are more important than a student’s)
gLite File Placement Service (FPS)
- It IS an FTS with the additional catalog lookup and registration steps, i.e. LFNs and GUIDs can be used to perform replication. It could have been called File Replication Service (replica = managed/catalogued copy).
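The channel idea, a queue attached to one link between two SEs and ordered by a VO-defined policy, can be sketched as follows. This is an illustrative Python model only: the `Channel` class, the role names and the priority numbers are assumptions, not FTS configuration or API.

```python
import heapq
import itertools


class Channel:
    """Hypothetical transfer queue for one link between two SEs."""

    def __init__(self, source_se, dest_se, priority_of):
        self.source_se = source_se
        self.dest_se = dest_se
        self._priority_of = priority_of   # VO policy: submitter -> int, lower runs first
        self._queue = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order per priority

    def submit(self, submitter, source_surl, dest_surl):
        entry = (self._priority_of(submitter), next(self._order),
                 submitter, source_surl, dest_surl)
        heapq.heappush(self._queue, entry)

    def next_job(self):
        """Pop the job the VO policy ranks first."""
        _prio, _seq, submitter, src, dst = heapq.heappop(self._queue)
        return submitter, src, dst


# Example VO policy from the slide: professors before students.
ROLE_PRIORITY = {"professor": 0, "student": 1}
channel = Channel("se1.example.org", "se2.example.org", ROLE_PRIORITY.__getitem__)
channel.submit("student", "srm://se1.example.org/a", "srm://se2.example.org/a")
channel.submit("professor", "srm://se1.example.org/b", "srm://se2.example.org/b")
channel.submit("student", "srm://se1.example.org/c", "srm://se2.example.org/c")
```

Keeping the policy as a plain function passed to the channel mirrors the split of responsibilities on the slide: administrators own the channel, while each VO decides how its own jobs are ordered.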

Data Movement (II)
File movement is asynchronous: submit a job
- Held in file transfer queue
Data scheduler
- Single service per VO; can be distributed
- VO can apply policies (priorities, preferred sites, recovery modes, ...)
Client interfaces:
- Browser
- APIs
- Web service
“File transfer”
- Uses SURL
“File placement”
- Uses LFN or GUID, accesses catalogues to resolve them

Data movement (II)
File movement is asynchronous: submit a job
- Held in file transfer queue
- FPS fetches job transfer requests and contacts the File Catalogue to obtain source/destination SURLs
- Task execution is delegated to FTS
- User can monitor job status through the jobID
- FTS maintains the state of transfer jobs
- When the job is done, FPS updates the file entry in the catalogue, adding the new replica

Baseline: GridFTP
Data transfer and access protocol for secure and efficient data movement.
Standardized in the Global Grid Forum.
Extends the standard FTP protocol:
- Public-key-based Grid Security Infrastructure (GSI) or Kerberos support (both accessible via GSS-API)
- Third-party control of data transfer
- Parallel data transfer
- Striped data transfer
- Partial file transfer
- Automatic negotiation of TCP buffer/window sizes
- Support for reliable and restartable data transfer
- Integrated instrumentation, for monitoring ongoing transfer performance

Reliable File Transfer
GridFTP is the basis of most transfer systems.
Retry functionality is limited
- Only retries in case of network problems; no possibility to recover from a GridFTP server crash
GridFTP handles one transfer at a time
- No possibility to do bulk optimization
- No possibility to schedule parallel transfers
Need a layer on top of GridFTP that provides reliable scheduled file transfer
- FTS/FPS
- Globus RFT (layer on top of single gridftp server)
- Condor Stork
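The simplest form of such a layer is a retry wrapper around a single-copy primitive. The sketch below is a generic Python illustration, not how FTS, RFT or Stork are implemented: `reliable_transfer` and the `copy` callable are hypothetical, and a real service would also persist job state so that it survives its own restarts.

```python
import time


def reliable_transfer(copy, source, dest, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Call copy(source, dest) until it succeeds or attempts run out.

    copy is any callable that raises OSError on failure; it stands in for
    one GridFTP transfer. Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            copy(source, dest)
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise           # give up and report the last failure
            sleep(delay)        # back off before trying again


def make_flaky_copy(failures):
    """Build a fake copy function that fails a fixed number of times."""
    state = {"left": failures, "done": []}

    def copy(source, dest):
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("simulated server crash")
        state["done"].append((source, dest))

    return copy, state
```

For example, a copy that fails twice succeeds on the third attempt when `max_attempts` is 3, while one that fails three times propagates the error to the caller.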

FTS vs FPS
File Transfer Service (FTS)
- Acts only on SRM SURLs or gsiftp URLs
- submit(source-SURL, destination-SURL)
File Placement Service (FPS)
- A plug-in for the File Transfer Service that allows it to act on logical file names (LFNs)
- Interacts with replica catalogs (similar to gLite-I/O)
- Registers replicas in the catalog
- submit(transferJobs) (transferJob = sourceLFN, destinationSE)
[Diagram labels: Job DB, FTS WebService, FPS plugin, Catalog]

FTS vs FPS (II)
Using the File Transfer Service (FTS)
- Look up the source SURL in the replica catalog
- Initiate and monitor the transfer
- After a successful transfer, register the new replica in the catalog
Using the File Placement Service (FPS)
- Initiate and monitor the transfer
- The plugin takes care of catalog interactions
FTS and FPS offer the same interface
- Difference only in input parameters to the submit command -> SURLs vs. LFNs
- Different configuration -> FPS requires a catalog endpoint
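The relationship between the two services can be sketched as a thin wrapper: FPS resolves the LFN, hands SURLs to FTS, and registers the new replica afterwards. This is a simplified, synchronous Python model under stated assumptions: `ToyFTS` and `ToyFPS` are hypothetical, the catalog is a plain dictionary, the destination SURL layout is invented for the example, and the real services work asynchronously through a job queue.

```python
import itertools
from urllib.parse import urlparse


class ToyFTS:
    """Hypothetical transfer service that only understands SURLs."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.jobs = {}  # job id -> (source SURL, destination SURL, state)

    def submit(self, source_surl, dest_surl):
        job_id = next(self._ids)
        # The physical copy is simulated and always succeeds in this sketch.
        self.jobs[job_id] = (source_surl, dest_surl, "Done")
        return job_id

    def status(self, job_id):
        return self.jobs[job_id][2]


class ToyFPS:
    """Hypothetical placement layer: same submit idea, but takes an LFN."""

    def __init__(self, fts, catalog):
        self.fts = fts
        self.catalog = catalog  # LFN -> list of SURLs (stand-in for a replica catalog)

    def submit(self, source_lfn, dest_se):
        # 1. Look up a source SURL in the catalog.
        source_surl = self.catalog[source_lfn][0]
        # 2. Build a destination SURL (layout assumed for this example).
        dest_surl = f"srm://{dest_se}{urlparse(source_surl).path}"
        # 3. Delegate the transfer to FTS.
        job_id = self.fts.submit(source_surl, dest_surl)
        # 4. Register the new replica only after a successful transfer.
        if self.fts.status(job_id) == "Done":
            self.catalog[source_lfn].append(dest_surl)
        return job_id
```

A caller of `ToyFPS.submit` never sees a SURL, which is the difference in input parameters the slide describes; the catalog dictionary plays the role of the catalog endpoint that FPS needs in its configuration.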

Data Movement Stack

Differences to LCG (II)
Storage Element
- gLite defines the SE to have 3 interfaces:
  - Storage Resource Management (SRM) interface
  - GridFTP interface
  - Native I/O interface (rfio, dcap, nfs, ...)
- LCG only requires the GridFTP interface (“classic SE”)
- gLite: SRM is mandatory for each SE
POSIX-like I/O
GFAL:
- Client-side interaction with the SRM, storage and catalogs
- User certificate is used
- No atomicity guarantee
gLite I/O:
- Provides a server to process SRM, native I/O and catalog interactions
- Client delegates user credential to gLite I/O server
- gLite I/O owns files on SE

Differences to LCG (III)
File Transfer Management:
LCG provides command-line utilities through lcg-util to move data. All the operations are performed on the client.
- Blocking operation: the client has to wait until the copy/replication is done
- Scaling and network resource management issue: if every job issues wide-area file movement operations from the worker nodes in a cluster, this will easily clog up the network
gLite provides services for asynchronous and bulk data movement
- File Transfer
- File Placement (transfer including catalog registration)

DM Interaction Overview
[Diagram. Components: File and Replica Catalog (Fireman, StorageIndex, database); WMS; Storage Element (SRM, storage, gLite I/O, gridFTP); File Transfer and Placement Service (FTS, FPS, Transfer Agent, database); VOMS; MyProxy. Interaction labels: get credential, store credential, file I/O, file namespace and metadata management, file replication, proxy renewal, replica location, WSDL API.]

Grid Metadata Services
Metadata services on the Grid come in 2 flavours:
- File metadata
- Simple, generalized relational DB services
Example from the EGEE-BioMed community:
[Diagram of example tables; recoverable labels: Files, LFN, Production, Images, GUID, Date, Patient ID, Doctor, Name, Hospital, Patient]

What is AMGA?
AMGA is the Metadata Catalogue for gLite:
- AMGA started out as ARDA's tool to investigate metadata access on the Grid
- AMGA is officially released in gLite release 1.5
AMGA works in 2 modes:
- Side by side with a File Catalogue (LFC): file metadata
- Standalone: general relational data on the Grid
AMGA has 2 front ends:
- SOAP with PTF standardised interface
- Text-based TCP streaming protocol (proprietary, documented)
AMGA has ideas from many people: UK GridPP Metadata Group, GAG (HEP), gLite DM team, PTF, LHCb

A Common Interface
AMGA implements a common interface designed in close collaboration between the gLite and ARDA teams (P. Kunszt, R. Rocha, N. Santos, B. Koblitz).
Again: many ideas from UK GridPP Metadata group, LHCb (Bookkeeping, GANGA), GAG, PTF...
Design ideas:
- Versatility: usable for HEP as well as Biomed (security)
- Modular: interface for entry manipulation, schemas, security -> possible add-on to File Catalogue
- Allows stateless & stateful implementations
- Few requirements on back end: can be SQL-DB, XML...
Description of WSDL at

DB Access on the Grid
Traditional DB access doesn't work on the Grid:
"Traditional" way: ODBC, JDBC, ...
– Client application → API → SQL via ODBC, JDBC (proprietary protocols) → Server → SQL-DB
+ Performance
+ Simple implementation
− Security, monitoring
− Authentication, resource management??
"Service" way: LFC, AMI, RefDB, ...
– Client application → API → SOAP, XML-RPC or Text → Server (DB-Service) → SQL → SQL-DB
+ Lightweight client
+ Security: GSI, X.509
− Performance
− Implementation: state
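To illustrate the "service" way, here is a minimal Python sketch in which a small text-command front end sits in front of an SQL database, so the client never speaks SQL or a database wire protocol. It uses the standard-library sqlite3 module in place of a real Grid back end. The commands (put, get), the kv table and the handle_command function are hypothetical and are not part of LFC, AMI, RefDB or AMGA.

```python
import sqlite3


def make_service(db_path=":memory:"):
    """Create the back-end database that is hidden behind the service."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    return conn


def handle_command(conn, line):
    """Translate one text command into SQL; the client never sees SQL.

    Hypothetical commands: 'put <key> <value>' and 'get <key>'.
    A real service would authenticate the caller (e.g. GSI/X.509) first.
    """
    parts = line.strip().split(maxsplit=2)
    if len(parts) == 3 and parts[0] == "put":
        # Parameter binding keeps client input out of the SQL text.
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (parts[1], parts[2]),
        )
        conn.commit()
        return "OK"
    if len(parts) == 2 and parts[0] == "get":
        row = conn.execute("SELECT value FROM kv WHERE key = ?", (parts[1],)).fetchone()
        return row[0] if row else "ERROR no such key"
    return "ERROR bad command"


conn = make_service()
print(handle_command(conn, "put run42 calibrated"))  # OK
print(handle_command(conn, "get run42"))             # calibrated
```

Because only the service holds the database connection, security checks and monitoring can be added in one place, at the cost of an extra hop for every request.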

Access Control on the Grid
Access control to resources on the Grid is done via the Virtual Organization Membership Service (VOMS):
[Diagram: the user authenticates to VOMS with an X.509 certificate and receives a VOMS certificate carrying group and role information; the VOMS certificate is then presented to the resources (AMGA, Oracle), where it drives resource management]
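The group and role information in a VOMS credential is expressed as FQAN strings of the form /vo/group/Role=role/Capability=cap. The following minimal Python sketch shows how a resource might map such strings to permissions. The POLICY table, the operation names and the doctor role are hypothetical, and the sketch skips certificate validation entirely.

```python
def parse_fqan(fqan):
    """Split a VOMS FQAN such as '/biomed/Role=doctor' into (group, role)."""
    group_parts = []
    role = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # the capability field is ignored in this sketch
        else:
            group_parts.append(part)
    return "/" + "/".join(group_parts), role


# Hypothetical policy: (group, role) -> set of allowed operations.
POLICY = {
    ("/biomed", None): {"read"},
    ("/biomed", "doctor"): {"read", "write"},
}


def is_allowed(fqans, operation):
    """Grant the operation if any FQAN in the credential permits it."""
    return any(operation in POLICY.get(parse_fqan(f), set()) for f in fqans)


print(is_allowed(["/biomed/Role=doctor/Capability=NULL"], "write"))  # True
print(is_allowed(["/biomed/Role=NULL/Capability=NULL"], "write"))    # False
```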

Security Concepts
Security is very important for BioMed, less so for HEP
Security ↔ speed: there is a trade-off
Standalone catalogue has:
– ACLs for directories and Unix permissions for directories/entries
– Built-in group management as in AFS
AMGA + LFC back end:
– POSIX ACLs + Unix permissions for directories/entries (ACLs currently not checked: slow!)
– Users/groups via VOMS
Currently no security on a per-attribute basis
– AMGA allows creating views: safer, faster, similar to an RDBMS
Security was tested by the GILDA team for the standalone catalogue; they liked the built-in group management and ACLs, but we need feedback from BioMed!
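The RDBMS analogy for views can be sketched with the standard-library sqlite3 module: a view exposes only the non-sensitive attributes of a table. This is plain SQL, not AMGA syntax, and the table, column names and sample row are hypothetical. SQLite has no per-user grants, so the sketch shows only the column restriction, not the access control a real catalogue would put on the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    "entry TEXT PRIMARY KEY, patient_name TEXT, hospital TEXT, scan_date TEXT)"
)
conn.execute(
    "INSERT INTO images VALUES ('img-001', 'Jane Doe', 'Hospital A', '2006-01-15')"
)

# The view leaves out the sensitive patient_name attribute.
conn.execute(
    "CREATE VIEW images_public AS SELECT entry, hospital, scan_date FROM images"
)

cursor = conn.execute("SELECT * FROM images_public")
view_columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(view_columns)  # ['entry', 'hospital', 'scan_date']
print(rows)          # [('img-001', 'Hospital A', '2006-01-15')]
```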

Basic Concepts
Entry
– Has a key (unique string) and attributes
Attribute
– Has a name (string) and a type (depends on the back end, support for basic types)
– Belongs to a schema
– An entry in a schema has a value for each attribute
Schema (in AMGA: directory)
– Has a name and a list of attributes
– In AMGA: every entry belongs to one schema, and schemas are hierarchical: /collaboration1/jobs
Query
– SELECT... WHERE... clause in an SQL-like query language
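The concepts above can be sketched as a small in-memory data model in Python. This is an illustrative toy, not AMGA's implementation: the Schema class and its method names are hypothetical, and queries are passed as Python predicates rather than in the SQL-like query language.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """A schema (an AMGA directory): a name, typed attributes and entries."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> Python type
    entries: dict = field(default_factory=dict)     # entry key -> {attribute: value}

    def add_attribute(self, attr, attr_type):
        self.attributes[attr] = attr_type
        for values in self.entries.values():
            values.setdefault(attr, None)  # existing entries get an empty value

    def add_entry(self, key, **values):
        if key in self.entries:
            raise KeyError(f"duplicate key: {key}")  # keys are unique
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        row = {attr: None for attr in self.attributes}
        for attr, value in values.items():
            row[attr] = self.attributes[attr](value)  # coerce to the declared type
        self.entries[key] = row

    def find(self, predicate):
        """Return the keys of all entries whose attribute values match."""
        return [key for key, row in self.entries.items() if predicate(row)]


jobs = Schema("/collaboration1/jobs")
jobs.add_attribute("i", int)
jobs.add_attribute("t", str)
jobs.add_entry("lfn-1.dat")
jobs.add_entry("lfn-2.dat", i=2, t="A test")
print(jobs.find(lambda row: row["i"] == 2))  # ['lfn-2.dat']
```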

Example
Example command line session:
mdclient -p8822 lxb0709
Connected to lxb0709:8822
ARDA Metadata Server
Query> dir /
>> >grid<
>> >collection<
Query> dir /grid/arda
>> >lfn-0.dat<
[... rest of LFC entries]
Query> addattr /grid/arda i int t text
Query> listattr /grid/arda
>> >i<
>> >int<
>> >t<
>> >text<
Query> addentries /grid/arda/lfn-0.dat /grid/arda/lfn-1.dat
Query> listentries /grid/arda
>> >lfn-0.dat<
>> >lfn-1.dat<
Query> addentry /grid/arda/lfn-2.dat i 2 t 'A test'
Query> listentries /grid/arda
>> >lfn-0.dat<
>> >lfn-1.dat<
>> >lfn-2.dat<
Query> addattr /grid/arda f float
Query> find /grid/arda/* 'i=2'
>> >lfn-2.dat<

AMGA Implementation
AMGA implementation:
– SOAP and Text front ends
– Supports single calls, sessions and connections
– SSL security with grid certificates
– PostgreSQL, Oracle, MySQL, SQLite back ends
– Works alongside LFC
– C++, Java, Python clients
See & download at project-arda-dev/metadata/
[Diagram: client applications use the C++ API or the Java API through a GSI/SSL security wrapper and reach the MD-Server through the firewall via the TEXT or SOAP front end; the server uses an asynchronous command buffer and talks to PostgreSQL via ODBC/SQL, or to an XML/file back end]

AMGA in Use
AMGA is in preproduction within several projects:
LHCb and ATLAS: GANGA
LHCb Logging and Bookkeeping
EGEE BioMed applications
– Highly secure access to medical image metadata
Generic applications:
– Metadata for the EGEE-GILDA Movie-On-Demand application (gMOD)
– UNOSAT project: used side by side with the LFC catalogue for file metadata of satellite images

References
gLite homepage
DM subsystem documentation
FiReMan catalog user guide
gLite-I/O user guide
FTS/FPS user guide
– https://edms.cern.ch/file/591792/1/EGEE-TECH Transfer-CLI- v1.0.pdf
AMGA documentation

Questions…