Workflows over Grid-based Web services General framework and a practical case in structural biology gLite 3.0 Data Management David García Aristegui Grid Applications Developer CNB/CSIC
Before all of you go crazy...
Grid Acronym Soup (GAS)
–http://www.gridpp.ac.uk/gas/
EGEE Glossary
–http://public.eu-egee.org/faq/acronyms.html
EGEE II Glossary
–http://egee-technical.web.cern.ch/egee-technical/documents/glossary.htm
Introduction
EGEE middleware: called gLite, this middleware exploits experience and existing components from Condor, Globus, EDG, LCG, and others. gLite is a distribution that combines components from many different providers!
gLite 3.0: convergence of LCG 2.7.0 and gLite 1.5.0 in spring 2006. Continuity on the production infrastructure ensured usability by applications.
Data Management System (DMS): provides file manipulation for users and other Grid services. DMS enables the location, access and transfer of data.
Data Management
What does “Data Management” mean?
– Users and applications produce and require data
– Data may be stored in Grid files
– Granularity is at the “file” level (no data “structures”)
– Users and applications need to handle files on the Grid
Files are stored in appropriate permanent resources called “Storage Elements” (SE)
– Present at almost every site, together with computing resources
– Described in detail in the next presentations
– We will treat a storage element as a “black box” where we can store data
    Appropriate data management utilities/services hide the internal structure of the SE
    Appropriate data management utilities/services hide the details of transfer protocols
Data Management System - DMS
EGEE DMS FUNCTIONALITY:
–User does not need to know the data location, just the logical name
–Data is accessed through standard interfaces (POSIX)
–Data can be replicated or transferred to several locations as needed
–Data is shared within a VO
KNOWN EGEE DMS LIMITATIONS:
–Files cannot be changed unless removed or replaced
–No intention of providing a global file management system
–File replication sometimes doesn't affect performance
–Local file system interfaces (ELFI) are still in beta stage
Data Issues and Grid Solutions - I
Resource centers need to meet growing demand for storage
–“Classic” Storage Elements
–Storage Elements capable of managing multiple disk pools
    Disk Pool Manager (DPM, disk)
–Massive storage systems
    dCache (disk/tape), CASTOR (tape)
Data is stored on different storage system technologies
–Common interface required to hide underlying complexity
    Storage Resource Manager (SRM) – storage management protocol
Data Issues and Grid Solutions - II
Data is stored at different locations
–File catalogue to provide a uniform view of Grid data
    LCG File Catalog (LFC)
Applications need to access data management services
–Data management API
    Grid File Access Layer (API) - GFAL
Biomedical applications need data security
–Encrypted Data Storage (EDS) and access control lists (ACLs)
    Hydra
Concepts
What concepts/services do we need to know to understand the EGEE Data Management?
–Storage Resource Manager - SRM
–Storage Element - SE
–File Transfer Service - FTS
–File Catalogs
What APIs and application level services are available for developers? (not for this course)
–Data Management APIs (GFAL)
–Encryption (EDS, Hydra)
–Metadata Catalog (AMGA)
Storage Resource Manager
Storage Elements (SE) can use a wide variety of technologies
Grid jobs need to see these SEs with a uniform interface
–SRM is a protocol to manage storage resources (“classic” storage elements, DPM, dCache, CASTOR...)
–It is NOT a file access protocol
Files are accessed using different file access protocols
–gridFTP (GSI + FTP) for file transfers
–rfio, dcap, GFAL... for file access by applications
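Because SRM manages storage but does not move bytes itself, a client and an SE have to agree on one of the access protocols listed above before any data flows. The sketch below illustrates that selection step only. The function name and the "first client preference that the SE supports" rule are assumptions made for illustration, not the SRM specification.

```python
from typing import Optional, Sequence


def negotiate_protocol(client_preferences: Sequence[str],
                       se_supported: Sequence[str]) -> Optional[str]:
    """Pick the first protocol the client prefers that the SE also offers.

    Hypothetical helper: it models the idea of protocol selection,
    not an actual SRM call.
    """
    supported = {protocol.lower() for protocol in se_supported}
    for protocol in client_preferences:
        if protocol.lower() in supported:
            return protocol.lower()
    return None  # no common protocol: the file cannot be accessed this way


if __name__ == "__main__":
    # A client that would like rfio but can fall back to gsiftp,
    # talking to an SE that offers gsiftp and dcap.
    print(negotiate_protocol(["rfio", "gsiftp"], ["gsiftp", "dcap"]))
```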
Storage Element
–Provides storage space for grid files
–SRM interface (not in the classic storage element)
–Transfer protocol (gsiFTP) ~ GSI-based FTP server
–We have several implementations
    disk: classic (GridFTP server), Disk Pool Manager (DPM), dCache
    tape: CASTOR, dCache
–Security: ACLs now available in DPM, next in CASTOR and dCache
–POSIX file access: Grid File Access Layer (GFAL) library
    Exposes only LCG-needed features
    Single client implementation for all service implementations
    Uses the Grid Information System to discover services
File Transfer Service
FTS is a low-level data movement service
Why is it needed?
–Improves reliability for transfers
–Provides asynchronous file transfer
    schedule transfers when resources are available
–Provides control of transfer properties (channel concept)
No catalogue interactions yet
    users have to handle SURLs
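The asynchronous, retrying behaviour described above can be pictured with a toy queue: submitting a transfer returns a job id immediately, and the copy itself happens later, when a worker pass runs. This is a minimal sketch under stated assumptions. The class, the state names and the retry limit are hypothetical and do not reproduce the real FTS interface. As on the slide, jobs are expressed in SURLs because no catalogue is consulted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class TransferJob:
    job_id: int
    source_surl: str
    dest_surl: str
    state: str = "Submitted"   # illustrative states: Submitted, Done, Failed
    attempts: int = 0


class ToyTransferQueue:
    """Toy model of an asynchronous, retrying transfer service."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self._jobs: Dict[int, TransferJob] = {}
        self._pending: Deque[TransferJob] = deque()

    def submit(self, source_surl: str, dest_surl: str) -> int:
        """Queue a transfer and return at once; nothing is copied yet."""
        job = TransferJob(len(self._jobs) + 1, source_surl, dest_surl)
        self._jobs[job.job_id] = job
        self._pending.append(job)
        return job.job_id

    def status(self, job_id: int) -> str:
        return self._jobs[job_id].state

    def run_pending(self, copy: Callable[[str, str], None]) -> None:
        """One worker pass: try each queued job once, requeue on failure."""
        for _ in range(len(self._pending)):
            job = self._pending.popleft()
            job.attempts += 1
            try:
                copy(job.source_surl, job.dest_surl)
            except OSError:
                if job.attempts < self.max_attempts:
                    self._pending.append(job)  # retried on a later pass
                else:
                    job.state = "Failed"
            else:
                job.state = "Done"
```

A caller would submit jobs, carry on with other work, and poll `status()` later. The `copy` callable stands in for whatever actually moves the file.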
File Catalog: LFC
LFC Catalog => LFC - LCG File Catalogue
–LCG = LHC Computing Grid
–LHC = Large Hadron Collider (CERN)
Provides
–Mapping between LFN, GUID and SURL
–Transactions, sessions, bulk queries
–Hierarchical namespace, symbolic links
–System metadata
–Single string user metadata
All members of a given VO have read-write permissions in their directory
Commands look like UNIX commands with “lfc-” in front (often)
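The LFN/GUID/SURL mapping is easier to see in a small in-memory model: each file has one GUID, any number of LFN aliases, and any number of SURL replicas. The sketch below is a toy under stated assumptions. `ToyCatalog` and its methods are hypothetical names, and it has none of the LFC's transactions, permissions or metadata.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    guid: str
    lfns: Set[str] = field(default_factory=set)    # user-chosen aliases
    surls: Set[str] = field(default_factory=set)   # physical replicas


class ToyCatalog:
    """Toy LFN -> GUID -> SURL mapping (not the real LFC)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}      # guid -> Entry
        self._guid_by_lfn: Dict[str, str] = {}    # lfn -> guid

    def register(self, lfn: str, surl: str) -> str:
        """Register a new file under an LFN and return its fresh GUID."""
        if lfn in self._guid_by_lfn:
            raise ValueError(f"LFN already registered: {lfn}")
        guid = f"guid:{uuid.uuid4()}"
        self._entries[guid] = Entry(guid, {lfn}, {surl})
        self._guid_by_lfn[lfn] = guid
        return guid

    def add_alias(self, guid: str, lfn: str) -> None:
        """Attach one more logical name to an existing file."""
        if lfn in self._guid_by_lfn:
            raise ValueError(f"LFN already registered: {lfn}")
        self._entries[guid].lfns.add(lfn)
        self._guid_by_lfn[lfn] = guid

    def add_replica(self, guid: str, surl: str) -> None:
        """Record another physical copy of the same file."""
        self._entries[guid].surls.add(surl)

    def guid_of(self, lfn: str) -> str:
        return self._guid_by_lfn[lfn]

    def replicas(self, lfn: str) -> List[str]:
        """Resolve a logical name to all known replica locations."""
        return sorted(self._entries[self._guid_by_lfn[lfn]].surls)
```

The point of the design is that users work with LFNs only. The GUID ties aliases and replicas together, so adding a replica never changes the name a user sees.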
File and replica naming conventions
Globally Unique Identifier (GUID)
–A non-human-readable unique identifier for a file, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
Site URL (SURL) (or Physical/Site File Name (PFN/SFN))
–The location of the actual file on a storage system, e.g. “sfn://lxshare0209.cern.ch/data/biomed/ntuples.dat”
Logical File Name (LFN)
–An alias created by a user to refer to some file, e.g. “lfn:/grid/biomed/David20030203/run2/track1”
Transport URL (TURL)
–Temporary locator of a replica + access protocol: understood by a SE, e.g. “gsiftp://lxshare0209.cern.ch//data/biomed/ntuples.dat”
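The four kinds of name can be told apart by their scheme prefix alone. The sketch below classifies the examples from this slide. The lookup table and the function name are illustrative assumptions, and only the prefixes shown on the slide are covered (real TURLs may use other access protocols).

```python
# Scheme prefixes follow the examples on the slide. The table and the
# function name are illustrative assumptions, not part of any gLite API.
KIND_BY_SCHEME = {
    "guid": "GUID",    # globally unique identifier
    "lfn": "LFN",      # logical file name (user-chosen alias)
    "sfn": "SURL",     # replica location on a storage element
    "gsiftp": "TURL",  # replica location plus access protocol
}


def classify(name: str) -> str:
    """Return "GUID", "LFN", "SURL" or "TURL" for a Grid file name."""
    scheme, separator, rest = name.partition(":")
    if not separator or not rest:
        raise ValueError(f"not a prefixed Grid file name: {name!r}")
    try:
        return KIND_BY_SCHEME[scheme.lower()]
    except KeyError:
        raise ValueError(f"unknown scheme {scheme!r} in {name!r}") from None


if __name__ == "__main__":
    for example in (
        "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
        "sfn://lxshare0209.cern.ch/data/biomed/ntuples.dat",
        "lfn:/grid/biomed/David20030203/run2/track1",
        "gsiftp://lxshare0209.cern.ch//data/biomed/ntuples.dat",
    ):
        print(classify(example), example)
```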
Two sets of commands
lfc commands
–Use LFC commands to interact with the catalogue only
    To create a catalogue directory
    To list files
–Used by you and by lcg-utils
lcg-utils
–The LCG Data Management tools (usually called lcg-utils) allow users to copy files between UI, CE, WN and a SE, to register entries in the File Catalogs and to replicate files between SEs.
LFC Catalog commands table
lfc-setcomment   Add/replace a comment
lfc-setacl       Set file/directory access control lists
lfc-rm           Remove a file/directory
lfc-rename       Rename a file/directory
lfc-mkdir        Create a directory
lfc-ls           List file/directory entries in a directory
lfc-ln           Make a symbolic link to a file/directory
lfc-getacl       Get file/directory access control lists
lfc-delcomment   Delete the comment associated with the file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-chmod        Change access mode of the LFC file/directory
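Since the LFC commands mirror their UNIX counterparts, a typical session (create a directory, list it, add a symbolic link) can be sketched as plain command lines. The helpers below only build and print the commands and do not run them. The function names and the catalogue paths are hypothetical, and the `-l` and `-s` options are assumed to behave like those of `ls` and `ln`, so check the man pages on your UI. The LFC clients also expect the catalogue host in the `LFC_HOST` environment variable.

```python
import shlex
from typing import List


def lfc_mkdir(path: str) -> List[str]:
    """Command line to create a catalogue directory."""
    return ["lfc-mkdir", path]


def lfc_ls(path: str, long_listing: bool = False) -> List[str]:
    """Command line to list a catalogue directory (-l assumed as in ls)."""
    return ["lfc-ls"] + (["-l"] if long_listing else []) + [path]


def lfc_ln(target: str, link_name: str) -> List[str]:
    """Command line to make a symbolic link (-s assumed as in ln)."""
    return ["lfc-ln", "-s", target, link_name]


if __name__ == "__main__":
    # Hypothetical paths inside a VO directory; adjust to your own VO.
    session = [
        lfc_mkdir("/grid/biomed/demo"),
        lfc_ls("/grid/biomed/demo", long_listing=True),
        lfc_ln("/grid/biomed/demo/input.dat", "/grid/biomed/demo/latest"),
    ]
    for argv in session:
        # To actually execute: subprocess.run(argv, check=True)
        print(shlex.join(argv))
```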
lcg-utils table
lcg-cp    Copies a Grid file to a local destination
lcg-cr    Copies a file to a SE and registers the file in the LRC
lcg-del   Deletes one file (either one replica or all replicas)
lcg-rep   Copies a file from SE to SE and registers it in the LRC
lcg-sd    Sets file status to “Done” in a specified request
lcg-aa    Adds an alias in RMC for a given GUID
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-la    Lists the aliases for a given LFN, GUID or SURL
lcg-lg    Gets the GUID for a given LFN or SURL
lcg-lr    Lists the replicas for a given LFN, GUID or SURL
lcg-ra    Removes an alias in RMC for a given GUID
lcg-rf    Registers a SE file in the LRC (optionally in the RMC)
lcg-uf    Unregisters a file residing on an SE from the LRC
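A common lcg-utils round trip is: upload and register a file, replicate it to a second SE, list its replicas, and copy it back. The sketch below builds those four command lines without running them. The helper names, the SE host names and the LFN are hypothetical, and the options used (`--vo`, `-l`, `-d`, and the `file:` prefix for local paths) are assumed from common usage, so verify them against the documentation for your release.

```python
import shlex
from typing import List


def lcg_cr(vo: str, local_path: str, lfn: str, dest_se: str) -> List[str]:
    """Copy a local file to an SE and register it under an LFN."""
    return ["lcg-cr", "--vo", vo, "-l", lfn, "-d", dest_se, "file:" + local_path]


def lcg_rep(vo: str, name: str, dest_se: str) -> List[str]:
    """Replicate an already registered file to another SE."""
    return ["lcg-rep", "--vo", vo, "-d", dest_se, name]


def lcg_lr(vo: str, name: str) -> List[str]:
    """List the replicas of a file given its LFN, GUID or SURL."""
    return ["lcg-lr", "--vo", vo, name]


def lcg_cp(vo: str, name: str, local_path: str) -> List[str]:
    """Copy a Grid file to a local destination."""
    return ["lcg-cp", "--vo", vo, name, "file:" + local_path]


if __name__ == "__main__":
    vo = "biomed"
    lfn = "lfn:/grid/biomed/demo/input.dat"  # hypothetical LFN
    round_trip = [
        lcg_cr(vo, "/home/user/input.dat", lfn, "se1.example.org"),
        lcg_rep(vo, lfn, "se2.example.org"),
        lcg_lr(vo, lfn),
        lcg_cp(vo, lfn, "/tmp/input.dat"),
    ]
    for argv in round_trip:
        # To actually execute: subprocess.run(argv, check=True)
        print(shlex.join(argv))
```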