INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Data Management Ron Trompert SARA Grid Tutorial, 18-19 September 2006.


1 INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Data Management Ron Trompert SARA Grid Tutorial, 18-19 September 2006

2 Enabling Grids for E-sciencE INFSO-RI-508833 Grid Tutorial, RC RUG, 18-19 September 2006
Outline
Storage Infrastructures
SRM
Storage Elements in gLite
Low Level Data Management
LCG File Catalog (LFC)
Data management CLIs and APIs
Examples
FTS

3 Storage Infrastructures
Disk-only
Hierarchical storage management (HSM)
– Policy-based management of file backup and archiving that uses storage devices economically, without the user needing to be aware of when files are retrieved from or stored on backup storage media.
– The hierarchy represents different types of storage media, such as disk systems, optical storage, or tape, each type representing a different level of cost and speed of retrieval. For example, as a file ages in an archive, it can be automatically moved to a slower but less expensive form of storage.
– HSM software: TSM, DMF, CASTOR, Enstore, HPSS, …
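The age-based migration described on this slide can be sketched as a small policy function. This is only an illustration: the tier names follow the slide, but the age thresholds and the function name `select_tier` are assumptions and do not come from any of the HSM products listed.

```python
# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
# The age limits (in days) are illustrative values, not a real HSM policy.
TIERS = [
    ("disk", 30),      # files younger than 30 days stay on disk
    ("optical", 365),  # files younger than one year go to optical storage
    ("tape", None),    # everything older ends up on tape
]


def select_tier(age_days: int) -> str:
    """Return the storage tier for a file of the given age in days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for name, max_age in TIERS:
        # The first tier whose age limit has not been reached wins.
        if max_age is None or age_days < max_age:
            return name
    raise AssertionError("unreachable: the last tier has no age limit")


for age in (3, 90, 4000):
    print(age, "days ->", select_tier(age))
```

A real HSM also weighs file size, access frequency and available space; age alone is used here only because it is the example given on the slide.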

4 Storage Infrastructures
HSM example at SARA

5 SRM
SRM standard
– SRM implementations provide uniform access to heterogeneous storage resources on the Grid
Storage Resource Managers
– SRM is a control protocol for:
  • Space reservation
  • File management
    - Pinning
    - Lifetime management
  • Replication
  • Protocol negotiation

6 SRM
SRM implementation
– The SRM interface is implemented as a web service
– Implementations:
  • dCache (disk/HSM)
  • DPM (disk)
  • CASTOR (HSM)
  • SRB (disk/HSM)
  • ….
SRM examples
– srmRm
– srmLs
– srmPrepareToPut
– srmBringOnline
– srmCopy
– srmGetTransferProtocols
– ….

7 Storage Elements in gLite
Classic SE
– No SRM
– Will be deprecated in the autumn of this year (2006)
– Transfer protocols: gridftp
– Storage type: disk
DPM
– SRM
– Transfer protocols: gridftp, secure rfio
– Storage type: disk
dCache
– SRM
– Transfer protocols: gridftp, gsidcap
– Storage type: disk, HSM

8 Low Level Data Management
GridFTP (all SEs)
– globus-url-copy file:///home/ron/file \
  gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file
– Third-party transfer
  • globus-url-copy gsiftp://hostA/pathA gsiftp://hostB/pathB
– Also edg-gridftp-ls, edg-gridftp-rm, edg-gridftp-mkdir etc.
– UberFTP
  • Interactive GridFTP client
  • ftp commands
  • GSI authentication

9 Low Level Data Management
Gsidcap (dCache SEs)
– dccp -p 20000:25000 /tmp/file \
  gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file
– 20000:25000 is derived from the GLOBUS_TCP_PORT_RANGE environment variable
Secure rfio
– rfcp /path/myfile \
  t2se01.physics.ox.ac.uk:/dpm/physics.ox.ac.uk/home/dteam/file
Srmcp (not for Classic SEs)
– srmcp file:////tmp/file \
  srm://srm.grid.sara.nl:8443//pnfs/grid.sara.nl/data/dteam/file
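The slide says the `-p 20000:25000` argument is derived from GLOBUS_TCP_PORT_RANGE. A minimal sketch of that conversion follows, assuming the variable holds two port numbers separated by a comma or a space (the usual Globus convention). The helper name `dccp_port_range` is hypothetical.

```python
import os


def dccp_port_range(env=None) -> str:
    """Turn GLOBUS_TCP_PORT_RANGE ("min,max" or "min max") into "min:max"."""
    env = os.environ if env is None else env
    raw = env.get("GLOBUS_TCP_PORT_RANGE")
    if raw is None:
        raise KeyError("GLOBUS_TCP_PORT_RANGE is not set")
    # Accept both separators by normalising commas to whitespace first.
    parts = raw.replace(",", " ").split()
    if len(parts) != 2:
        raise ValueError(f"expected two port numbers, got: {raw!r}")
    low, high = (int(p) for p in parts)
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {raw!r}")
    return f"{low}:{high}"


# A dictionary stands in for the real environment in this example.
print(dccp_port_range({"GLOBUS_TCP_PORT_RANGE": "20000,25000"}))
```

The returned string can then be passed as the value of the `-p` option in the dccp command shown on the slide.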

10 Information system
LDAP-based
– LDAP servers running on service nodes (GRIS/BDII)
– LDAP servers collecting the information for a site (site BDII)
– LDAP servers collecting the information for all sites (BDII)
The environment variable LCG_GFAL_INFOSYS needs to be set
– It must point to a BDII
lcg-infosites
– Example: finding an SE:
> lcg-infosites --vo tutor se
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
2146321901097784n.a tbn15.nikhef.nl
6268800001163120000n.a tbn18.nikhef.nl
488106596368854044n.a mu2.matrix.sara.nl

11 Information system
lcg-info
– For more advanced searches, for example finding out where to put your files:
> lcg-info --list-se --query 'SE=mu2.matrix.sara.nl' --attrs Path
- SE: mu2.matrix.sara.nl
  - Path /flatfiles/SE00/tutor
ldapsearch
– For the real troopers among us

12 LFC
LFC stands for LCG File Catalog
– LCG stands for LHC Computing Grid
– LHC stands for Large Hadron Collider
Users and programs produce and require data
– The Resource Broker can send (small amounts of) data to/from jobs: Input and Output Sandbox. Not recommended for large amounts of data
Data is stored on the grid
– Located in Storage Elements
– Several replicas of one file at different sites
– Accessible by Grid users and applications from “anywhere”
– Locatable by the WMS/RB (data requirements in JDL)
Also…
– Data may be copied between local filesystems (WNs, UIs) and the Grid, or opened remotely on the SE (GFAL, gsidcap, rfio).

13 LFC
– Keeps track of the location of copies (replicas) of files on the Grid

14 Naming conventions
Logical File Name (LFN)
– An alias created by a user to refer to some item of data, e.g. “lfn:/grid/tutor/mydir/myfile”
Globally Unique Identifier (GUID)
– A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
– The location of an actual piece of data on a storage system, e.g.
  “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
  “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
Transport URL (TURL)
– Locator of a replica plus an access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
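The four kinds of name can be told apart by their scheme prefix. The sketch below classifies a name that way. The function `classify_grid_name` is hypothetical, and the set of TURL schemes is an assumption limited to transfer protocols mentioned elsewhere in this tutorial.

```python
# Scheme prefixes taken from the examples on the slide.
SURL_SCHEMES = {"srm", "sfn"}
# Assumed TURL schemes: transfer protocols named in this tutorial.
TURL_SCHEMES = {"rfio", "gsiftp", "dcap", "gsidcap", "kdcap"}


def classify_grid_name(name: str) -> str:
    """Return "LFN", "GUID", "SURL" or "TURL" based on the scheme prefix."""
    scheme, sep, rest = name.partition(":")
    if not sep or not rest:
        raise ValueError(f"not a scheme-qualified name: {name!r}")
    scheme = scheme.lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme in SURL_SCHEMES:
        return "SURL"
    if scheme in TURL_SCHEMES:
        return "TURL"
    raise ValueError(f"unknown scheme: {scheme!r}")


for example in (
    "lfn:/grid/tutor/mydir/myfile",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
    "sfn://lxshare0209.cern.ch/data/alice/ntuples.dat",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
):
    print(classify_grid_name(example), example)
```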

15 Naming conventions
How do they fit together?
– LFC holds the mapping LFN-GUID-SURL
[Diagram: LFN 1 … LFN i → GUID → SURL 1 … SURL j (held in the LFC); SURL 1 → TURL 11 … TURL 1k; SURL j → TURL j1 … TURL jl]
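The LFN-GUID-SURL mapping can be modelled with two dictionaries. This is a toy in-memory model for illustration only, not the LFC's actual schema or API. The class `ToyCatalog`, its method names, and the `example.org` hosts are all hypothetical.

```python
class ToyCatalog:
    """Toy model: many LFNs -> one GUID -> many SURLs (replicas)."""

    def __init__(self):
        self._guid_by_lfn = {}    # LFN  -> GUID
        self._surls_by_guid = {}  # GUID -> list of SURLs

    def register(self, lfn, guid, surl):
        """Record that `surl` is a replica of the file known as `lfn`."""
        existing = self._guid_by_lfn.setdefault(lfn, guid)
        if existing != guid:
            raise ValueError(f"{lfn} already maps to {existing}")
        replicas = self._surls_by_guid.setdefault(guid, [])
        if surl not in replicas:
            replicas.append(surl)

    def add_alias(self, guid, lfn):
        """Add a further LFN for a GUID that is already registered."""
        if guid not in self._surls_by_guid:
            raise KeyError(f"unknown GUID: {guid}")
        self._guid_by_lfn[lfn] = guid

    def guid(self, lfn):
        return self._guid_by_lfn[lfn]

    def replicas(self, lfn):
        """Return all SURLs for the file behind `lfn`."""
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])


catalog = ToyCatalog()
guid = "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
catalog.register("lfn:/grid/tutor/mydir/myfile", guid, "srm://se1.example.org/data/myfile")
catalog.register("lfn:/grid/tutor/mydir/myfile", guid, "srm://se2.example.org/data/myfile")
catalog.add_alias(guid, "lfn:/grid/tutor/mydir/alias")
print(catalog.replicas("lfn:/grid/tutor/mydir/alias"))
```

TURLs are left out of the model because, as the previous slide notes, they are resolved by the SE for a given SURL and access protocol.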

16 LFC

17 LFC
The LFN acts as the main key in the database. It has:
– Symbolic links to it (additional LFNs)
– Unique Identifier (GUID)
– System metadata
– Information on replicas
– One field of user metadata

18 LFC
Two kinds of LFC
– Central LFC
  For each VO, one site on the grid publishes a global catalog, which records entries (file replicas or dataset entities) across the whole grid.
– Local LFC
  Local catalogs record only the file replicas stored at that site's SEs.

19 LFC
Provides:
– User-exposed transactional C/C++ API (with automatic rollback on failure)
  • Python wrapper provided (Python module lfc)
– Command line tools with administrative functionality
– Hierarchical Unix-like namespace and namespace operations for LFNs
  • lfn:/grid/ /mydir/myfile
  • lfc-mkdir, lfc-chmod
– Integrated GSI authentication and authorization
– Access Control Lists (Unix permissions and POSIX ACLs)
– Checksums
– Sessions (multiple operations inside a single transaction)
– Bulk operations (inside transactions)

20 LFC
Summary of the LFC Catalog commands
lfc-chmod        Change access mode of the LFC file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-delcomment   Delete the comment associated with the file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment

21 LFC
C/C++ API: low-level methods (many POSIX-like):
lfc_access, lfc_aborttrans, lfc_addreplica, lfc_apiinit, lfc_chclass, lfc_chdir, lfc_chmod, lfc_chown, lfc_closedir, lfc_creat, lfc_delcomment, lfc_delete,
lfc_deleteclass, lfc_delreplica, lfc_endtrans, lfc_enterclass, lfc_errmsg, lfc_getacl, lfc_getcomment, lfc_getcwd, lfc_getpath, lfc_lchown, lfc_listclass, lfc_listlinks,
lfc_listreplica, lfc_lstat, lfc_mkdir, lfc_modifyclass, lfc_opendir, lfc_queryclass, lfc_readdir, lfc_readlink, lfc_rename, lfc_rewind, lfc_rmdir, lfc_selectsrvr,
lfc_setacl, lfc_setatime, lfc_setcomment, lfc_seterrbuf, lfc_setfsize, lfc_starttrans, lfc_stat, lfc_symlink, lfc_umask, lfc_undelete, lfc_unlink, lfc_utime, send2lfc

22 LFC Interfaces
Integration with GFAL and lcg_utils APIs
  • lcg-utils/GFAL access the catalog in a transparent way
Integration with the WMS
– The RB can locate Grid files, which allows for data-based matchmaking
– JDL file:
  • InputData = "lfn:/grid/tutor/MyFile";

23 Data Management CLIs & APIs
lcg_utils: lcg-* commands + lcg_* API calls
– Provide (all) the functionality needed by the LCG user
– Transparent interaction with file catalogs and storage interfaces when needed
– Abstraction from the technology of specific implementations
Grid File Access Library (GFAL): API
– Adds file I/O and explicit catalog interaction functionality
– Still provides the abstraction and transparency of lcg_utils

24 Data Management CLIs & APIs
lcg-utils commands: Replica Management
lcg-cp    Copies a grid file to a local destination
lcg-cr    Copies a file to an SE and registers the file in the catalog
lcg-del   Deletes one file
lcg-rep   Replicates between SEs and registers the replica
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-sd    Sets file status to “Done” for a given SURL in an SRM request
lcg-utils commands: File Catalog Interaction
lcg-aa    Adds an alias in the LFC for a given GUID
lcg-ra    Removes an alias in the LFC for a given GUID
lcg-rf    Registers in the LFC a file placed in an SE
lcg-uf    Unregisters in the LFC a file placed in an SE
lcg-la    Lists the aliases for a given SURL, GUID or LFN
lcg-lg    Gets the GUID for a given LFN or SURL
lcg-lr    Lists the replicas for a given GUID, SURL or LFN

25 Data Management CLIs & APIs
lcg-utils C/C++ API:
lcg_cp    lcg_lr
lcg_cr    lcg_ra
lcg_del   lcg_rf
lcg_rep   lcg_uf
lcg_sd    lcg_la
lcg_aa    lcg_lg
lcg_gt

26 Data Management CLIs & APIs
GFAL
– Grid storage interactions today require using some existing software components:
  • The file catalog services, to locate valid replicas of files in order to:
    - Download them to the user's local machine
    - Move them from one SE to another
    - Let jobs running on worker nodes access and manage files stored on remote storage elements
  • The SRM software, to ensure:
    - File existence on disk or disk pool (files are recalled from mass storage if necessary)
    - Space allocation on disk for new files (they may be migrated to mass storage later)

27 Data Management CLIs & APIs
The GFAL features
– Hides interactions with the SRM from the end user
– Provides a POSIX-like interface for file I/O operations
  • POSIX calls prefixed with gfal_
– Based on shared libraries (both threaded and unthreaded versions)
– Needs only one header file (gfal_api.h) to write C applications
– Supports the following protocols:
  • file for local access, also lfn/guid
  • dcap, gsidcap and kdcap for the dCache access protocol
  • rfio for the CASTOR access protocol
  • SRM
– Access to SRMs in secure mode, i.e. using a valid Grid proxy obtained with the voms-proxy-init command.

28 Examples
Using lcg utils and lfc commands:
– Define the server hostname
  • The LFC server must be published in the BDII ($LCG_GFAL_INFOSYS)
  • Use the environment variable: $LFC_HOST=
  • $LFC_HOST must be set

29 Examples
Listing the entries of an LFC directory
lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
where path specifies the LFN pathname (mandatory)
– Remember that the LFC has a directory tree structure
– /grid/ / (callouts on the slide: “LFC Namespace”, “Defined by the user”)
– All members of a VO have read-write permissions under their directory
– You can set LFC_HOME to use relative paths
Examples:
> lfc-ls /grid/tutor/me
> export LFC_HOME=/grid/tutor
> lfc-ls -l me
> lfc-ls -l -R /grid
-l : long listing
-R : list the contents of directories recursively. Don't use it!
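The effect of LFC_HOME on relative paths can be sketched as a plain path join. This assumes the client simply prefixes relative paths with LFC_HOME, which is how the slide describes it. The helper `resolve_lfc_path` is hypothetical and is not part of the LFC tools.

```python
import posixpath


def resolve_lfc_path(path, lfc_home=None):
    """Return the absolute catalog path for `path`, using LFC_HOME if needed."""
    if path.startswith("/"):
        # Absolute paths are used as they are; LFC_HOME plays no role.
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))


# Mirrors the slide: with LFC_HOME=/grid/tutor, "me" means /grid/tutor/me.
print(resolve_lfc_path("me", lfc_home="/grid/tutor"))
print(resolve_lfc_path("/grid/tutor/me"))
```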

30 Examples
Creating directories in the LFC
lfc-mkdir [-m mode] [-p] path...
where path specifies the LFC pathname
Remember that when registering a new file (using lcg-cr, for example), the corresponding destination directory must already exist in the catalog.
Examples:
> lfc-mkdir /grid/tutor/me
You can check the directory with:
> lfc-ls -l /grid/tutor/me
drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo

31 Examples
Let us copy and register a file using lcg-utils
> lcg-cr --vo tutor -l me/test -d mu2.matrix.sara.nl file:`pwd`/test
guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105
> lcg-lr --vo tutor lfn:me/test
sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/generated/2006-09-18/file378fc829-351f-4558-8679-9d2ce530cbb4
> lfc-ls -l me
-rw-rw-r-- 1 30010 2024 114 Sep 18 10:33 test
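When these commands are driven from a script, it helps to build the argument lists in one place. The sketch below only assembles and prints the command lines, using the flags shown on the slide. It does not run them, since that requires a UI machine with lcg-utils and a valid proxy. The helper names are hypothetical.

```python
import os
import shlex


def lcg_cr_command(vo, lfn, dest_se, local_path):
    """Argument list for copying a local file to an SE and registering it."""
    # The slide uses file:`pwd`/test; abspath gives the same absolute form.
    source = "file:" + os.path.abspath(local_path)
    return ["lcg-cr", "--vo", vo, "-l", lfn, "-d", dest_se, source]


def lcg_lr_command(vo, lfn):
    """Argument list for listing the replicas of an LFN."""
    return ["lcg-lr", "--vo", vo, "lfn:" + lfn]


# Print the commands instead of executing them.
print(shlex.join(lcg_cr_command("tutor", "me/test", "mu2.matrix.sara.nl", "test")))
print(shlex.join(lcg_lr_command("tutor", "me/test")))
```

On a machine where the tools are installed, the same lists could be handed to `subprocess.run(..., check=True)`. Passing a list rather than a shell string avoids quoting problems with unusual file names.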

32 Examples
Creating a symbolic link
lfc-ln -s file linkname
lfc-ln -s directory linkname
Creates a link to the specified file or directory with the name linkname
Examples:
> lfc-ln -s /grid/tutor/me/test /grid/tutor/aLink
Let's check the link using lfc-ls with long listing (-l):
> lfc-ls -l
lrwxrwxrwx 1 30010 2024 0 Sep 18 10:38 aLink -> /grid/tutor/me/test
(aLink is the symbolic link; /grid/tutor/me/test is the original file)

33 Examples
Adding/deleting metadata information
lfc-setcomment path comment
lfc-delcomment path
lfc-setcomment adds or replaces a comment associated with a file/directory in the LFC catalog
lfc-delcomment deletes a comment previously added
This is the only metadata (one field) supported by the catalog
Examples:
> lfc-setcomment me/test "nice file"
Let's see what happened:
> lfc-ls --comment /grid/tutor/me/test
/grid/tutor/me/test nice file

34 Examples
Deleting the file
lfc-rm
– lfc-rm removes a file/link/directory only from the catalog
lcg-del
– lcg-del removes a file from SEs and the LFNs/links from the catalog
Example, delete all replicas:
> lcg-del -a --vo tutor guid:8e413879-7cb3-4260-af9f-6964392da7e8
Example, delete only one replica:
> lcg-del --vo tutor -s mu2.matrix.sara.nl guid:8e413879-7cb3-4260-af9f-6964392da7e8
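The difference between deleting all replicas and deleting the replica on one SE can be shown with a toy model of the catalog. This sketch is not lcg-del itself. The function `delete_replicas` is hypothetical, the SURL paths are made up, and dropping the catalog entry once the last replica is gone is an assumption about the intended behaviour.

```python
from urllib.parse import urlparse


def delete_replicas(replicas_by_guid, guid, se=None):
    """Return a new GUID -> SURLs mapping after a deletion.

    se=None models "delete all replicas"; otherwise only the replica
    stored on the SE with that hostname is removed.
    """
    if guid not in replicas_by_guid:
        raise KeyError(f"unknown GUID: {guid}")
    # Work on a copy so the caller's mapping is left untouched.
    updated = {g: list(surls) for g, surls in replicas_by_guid.items()}
    if se is None:
        del updated[guid]
        return updated
    remaining = [s for s in updated[guid] if urlparse(s).hostname != se.lower()]
    if len(remaining) == len(updated[guid]):
        raise ValueError(f"no replica of {guid} on {se}")
    if remaining:
        updated[guid] = remaining
    else:
        # Assumption: the entry disappears with its last replica.
        del updated[guid]
    return updated


guid = "guid:8e413879-7cb3-4260-af9f-6964392da7e8"
catalog = {
    guid: [
        "sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/example-file",
        "srm://se.example.org/data/tutor/example-file",
    ]
}
print(delete_replicas(catalog, guid, se="mu2.matrix.sara.nl"))
print(delete_replicas(catalog, guid))
```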

35 File Transfer Service
A batch system for submitting data transfer jobs
For data-intensive sciences
– Currently in use in the LCG project

36 FTS
Allows for
– Managed transfers by means of channels to sites
  • Channels are between sites, e.g. CERN-SARA.
  • Site admins can adapt the configuration of incoming channels to their site, switch their channel off, etc.
  • Priorities can be set for different VOs.
– Optimisation of network tuning parameters per channel

37 FTS
Command line interface
– glite-transfer-cancel
  • Cancels a file transfer job
– glite-transfer-list
  • Lists ongoing data transfer jobs
– glite-transfer-status
  • Displays the status of an ongoing data transfer job
– glite-transfer-submit
  • Submits a new data transfer job

