CASTOR SRM v1.1 experience Presentation at HEPiX MSS Forum 28/05/2004 Olof Bärring, CERN-IT.


2 Outline
Brief overview of SRM v1.1
CASTOR implementation
Interoperability tests
Problems found
  –SRM specification
  –GSI
SRM @ GGF: GSM WG
  –Input to the definition of SRM-Basic
Conclusions and outlook

3 Brief overview of SRM v1.1
SRM = Storage Resource Manager
First (v1.0) interface definition
  –October 22, 2001
  –JLAB, FNAL and LBNL
  –Some key features:
    »Transfer protocol negotiation
    »Multi-file requests
    »Asynchronous operations
SRM is a management interface
  –Make files available for access (e.g. recall to disk)
  –Prepare resources for receiving files (e.g. allocate disk space)
  –Query status of requests or files managed by the SRM
  –Not a WAN file transfer protocol
URLs
  –SURL: Site specific URL. Protocol neutral
    »srm://
  –TURL: Transfer URL. Protocol specific
    »gsi

4 SRM v1.0 operations
Operation         Description
get               Recall from tape and pin on disk
put               Reserve disk space, pin and maybe make permanent
getRequestStatus  Get the status of a running get/put
setFileStatus     Set the status of a file
pin               Pin file on disk
unPin             Cancel a previous pin operation
mkPermanent       Make existing file permanent
getProtocols      Get list of supported transfer/access protocols
getFileMetadata   Get file metadata
advisoryDelete    Recommend SRM to delete a file
getEstGetTime     Fake get for time estimation
getEstPutTime     Fake put for time estimation
Legend: Asynchronous, Synchronous/stateless
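
The way an asynchronous get pairs with a synchronous getRequestStatus can be sketched as below. This is a toy model, not the SRM SOAP interface: `ToySRM`, `wait_for_turls`, the state names `Pending` and `Ready`, the example SURL and the `toy-transfer://` TURL scheme are all assumptions made for illustration, and it assumes get is one of the asynchronous operations.

```python
import itertools


class ToySRM:
    """In-memory stand-in for an SRM endpoint (hypothetical, not the real API)."""

    def __init__(self, polls_until_ready=2):
        self._ids = itertools.count(1)
        self._requests = {}
        self._polls_until_ready = polls_until_ready

    def get(self, surls):
        """Asynchronous: return a request id at once; staging happens later."""
        request_id = next(self._ids)
        self._requests[request_id] = {"surls": list(surls), "polls": 0}
        return request_id

    def get_request_status(self, request_id):
        """Synchronous: report the current state of a previously issued get."""
        request = self._requests[request_id]
        request["polls"] += 1
        if request["polls"] < self._polls_until_ready:
            return {"state": "Pending", "turls": []}
        # Made-up TURLs: the real ones depend on the negotiated protocol.
        turls = ["toy-transfer://disk/" + surl.rsplit("/", 1)[-1]
                 for surl in request["surls"]]
        return {"state": "Ready", "turls": turls}


def wait_for_turls(srm, surls, max_polls=10):
    """Issue a get, then poll until the files are ready or polls run out."""
    request_id = srm.get(surls)
    for _ in range(max_polls):
        status = srm.get_request_status(request_id)
        if status["state"] == "Ready":
            return status["turls"]
        # A real client would sleep between polls; omitted in this sketch.
    raise TimeoutError(f"request {request_id} not ready after {max_polls} polls")
```

A client would call `wait_for_turls(ToySRM(), ["srm://example.org/data/file1"])` and then open the returned TURLs with the matching transfer protocol.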

5 The copy operation
SRM v1.1 == SRM v1.0 + copy
copy is quite different from other SRM operations:
  –Copy file(s) from/to local SRM to/from another (optionally remote) SRM
  –The target SRM performs the necessary put and get operations and executes the file transfers using the negotiated protocol (e.g. gsiftp)
The copy operation allows a batch job running on a worker node without inbound and outbound WAN access to copy files to a remote storage element
The copy operation was documented only 4 days ago(!)
The copy operation could potentially provide the framework for planning transfers of large data volumes (e.g. LHC T0 to T1 data broadcasting)??

6 CASTOR SRM v1.1
Implements the vital operations
  –get, put, getRequestStatus, setFileStatus, getProtocols
No-ops:
  –pin, unPin, getEstGetTime, getEstPutTime
Implemented but optionally disabled (requested by LCG)
  –advisoryDelete
CASTOR GSI (CGSI) plug-in for gSOAP
  –Also used in GFAL
Evolution @ CERN:
  –First prototype in summer 2003
  –First production version deployed in December 2003
Other sites having deployed the CASTOR SRM
  –CNAF (INFN/Bologna)
  –PIC (Barcelona)

7 CASTOR SRM v1.1
[Architecture diagram. Components shown: CASTOR tape archive, SRM request repository, Grid services, SRM, gridftp, GSI, CASTOR disk cache, stager, RFIO, Tape mover, Tape queue, CASTOR name space, Volume Manager, Local clients]

8 Interoperability tests
CASTOR SRM has been running interoperability tests with various clients, notably
  –GFAL (Jean-Philippe)
  –EDG replica manager (Peter)
  –FNAL/dCache SRM (Timur)

9 Problems found
The interoperability problems can be classified as:
  –Due to problems with the SRM specification
  –Due to assumptions in SRM or SOAP implementations
  –Due to GSI incompatibilities
Debugging GSI incompatibilities is by far the most difficult and time-consuming

10 Problems with SRM spec (1)
Lack of enumeration
  –All enumeration-like types are strings
  –Client needs to find a common denominator (e.g. convert all strings to capital letters)
Request and file state lifecycles
  –Concise for put or get
  –Undefined for copy (a proposal was circulated 4 days ago). This turned out to be an important interoperability issue between CERN/CASTOR and FNAL/dCache SRMs
  –Undefined for mkPermanent, pin, unPin (probably irrelevant for the latter two)?
Request history
  –What an SRM should do with requests that have reached the Done or Failed status
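
The "common denominator" workaround for string-typed states can be sketched as follows. The set of accepted state names is an assumption for illustration (only Done and Failed are named in these slides), and `normalize_state` is a hypothetical helper, not part of any SRM client.

```python
# Hypothetical set of states a client is prepared to handle, stored in capitals.
KNOWN_STATES = {"PENDING", "ACTIVE", "DONE", "FAILED"}


def normalize_state(raw):
    """Map a server-supplied state string to a canonical upper-case form."""
    state = raw.strip().upper()
    if state not in KNOWN_STATES:
        # Reject unknown strings rather than guess what the server meant.
        raise ValueError(f"unrecognised state string: {raw!r}")
    return state
```

With this, `"Done"`, `"done"` and `"DONE"` from different servers all compare equal after normalization, while an unexpected string fails loudly instead of being silently treated as some default state.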

11 Problems with SRM spec (2)
Immutability of request identifier
  –Request id is a 32-bit word
  –Unspecified if an SRM can reuse request ids for finished (Done or Failed) requests
SURL (Site URL) semantics
  –Is it a URL or a URI?
  –If URL, does it support relative and absolute paths?
  –If URI, the name space is virtually flat for an arbitrary client
Pin lifetime
  –Pin lifetime is defined to be subject to site policy
  –No way to query the remaining pin lifetime for a particular file
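
One possible server-side policy for the request id question is sketched below: ids wrap around a fixed-size space, and an id is handed out again only after the server has explicitly forgotten the finished request. `RequestIdAllocator` and its methods are hypothetical names, and this policy is an assumption for illustration, not what the specification or CASTOR prescribes.

```python
class RequestIdAllocator:
    """Hands out ids from a fixed-size space without reusing retained ones."""

    def __init__(self, modulus=2**32):
        self._modulus = modulus      # size of the id space (32-bit by default)
        self._next = 0               # next candidate id
        self._retained = set()       # ids of requests still kept in history

    def allocate(self):
        """Return the next id that is not retained, wrapping around if needed."""
        for _ in range(self._modulus):
            candidate = self._next
            self._next = (self._next + 1) % self._modulus
            if candidate not in self._retained:
                self._retained.add(candidate)
                return candidate
        raise RuntimeError("all request ids are still retained")

    def forget(self, request_id):
        """Drop a finished request from history so its id may be reused."""
        self._retained.discard(request_id)
```

Under this policy a client holding the id of a Done or Failed request never sees it silently rebound to a different request; the cost is that the server must decide when history can be dropped.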

12 Problems with SRM spec (3)
Exception handling and error propagation
  –Unspecified if a multi-file request should fail when a subset of the files got an error
  –Unspecified if and when an SRM can do retries
  –Only one error message, global for all files in a multi-file request, is available for reporting
  –Format and contents of error message undefined
advisoryDelete != delete
  –It may be vital to know what the effect is
    »No effect at all (if so, what happens if SURL is reused for a new file?)
    »Only remove disk resident copy (if so, when?)
    »Remove HSM file (if so, when?)
Directory creation on the fly for put requests
  –If a put request specifies a SURL corresponding to a path for which one or several sub-directory levels do not exist, should it create the missing dirs on the fly (provided the client has the appropriate permissions)?
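
The ambiguity in multi-file failure handling can be made concrete with a small sketch. It assumes every file has already reached Done or Failed, and `request_state` with its `fail_on_any_error` flag is a hypothetical helper showing two interpretations the specification leaves open.

```python
def request_state(file_states, fail_on_any_error):
    """Derive a request-level state from per-file states.

    file_states maps each SURL to "Done" or "Failed". With
    fail_on_any_error=True a single failed file fails the whole request;
    with False the request fails only if every file failed.
    """
    if not file_states:
        raise ValueError("a request must contain at least one file")
    failed = [state == "Failed" for state in file_states.values()]
    if all(failed) or (fail_on_any_error and any(failed)):
        return "Failed"
    return "Done"
```

Two servers choosing different values of the flag would report different states for the same outcome, which is the kind of divergence a client cannot detect from the request status alone.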

13 Problems due to SRM or SOAP implementation details
SRM WSDL discovery
  –FNAL client assumed wsdl and service are hosted by same web-server
Bug in gSOAP v2.3 WSDL importer
Various bugs in CASTOR SRM found but not reported here

14 GSI problems (1)
CASTOR (GSI) – EDG RC (Java TrustManager)
  –TrustManager does not use the GSI default of SSL handshake + credential delegation, but just an SSL handshake
  –TrustManager client would not work with SSL 3.0, which is forced by GSI
  –Solution: EDG RC uses CoG (Globus Java Security Implementation) instead
CASTOR (GSI) – FNAL dCache (Java CoG)
  –FNAL client only used a limited number of encryption algorithms, which did not match those provided by standard GSI
  –Limited Proxy certificate
GSI error reporting not working properly

15 GSI problems (2)
Administration and deployment issues
  –EDG globus patch for supporting dynamic pool accounts requires the GRIDMAPDIR environment variable to be declared, even if the default location was used for the security files
  –Configuration problems (right Root CA not trusted)
  –CERN CA changed the certificate naming scheme (number added at the end of DN). New certificates were not automatically propagated (to, for instance, FNAL).
The effort for debugging GSI problems will scale with the number of SRM implementations
  –Establishing an SRM reference implementation for certifying new servers and clients would help

16 SRM @ GGF: GSM WG
GGF GSM (Grid Storage Management) WG
  –SRM interface specification for GGF will proceed in two steps
    »SRM-Basic
    »SRM-Advanced
  –Current proposal is to have
    »SRM-Basic relatively close to SRM v1.1
    »SRM-Advanced close to SRM v2.1 + vaguely defined features like authorization, access control, monitoring
Suggestion to HEPiX MSS forum how we could use GSM WG
  –SRM-Basic is hopefully sufficient for LHC Tier-0 to Tier-1 data distribution. With that objective it is essential that
    »all existing interoperability problems with the SRM v1.1 definition are addressed as appropriate
    »the addition of new features is kept to the minimum necessary
  –Hopefully we have already come up with some input during these two days

17 Conclusions and outlook
CASTOR SRM v1.1 has been in production for a couple of months at CERN and some other CASTOR Tier-1 sites
SRM interoperability does not come for free
  –Definition not concise enough, room for too much site specific interpretation
  –Is GSI interoperability an illusion and, if so, will it continue to be so?
We currently have no plans for a CASTOR SRM v2.1 implementation. We would rather tighten up SRM v1.1 in the context of the GGF GSM WG and the SRM-Basic definition
