WP1 WMS rel. 2.0 Some issues Massimo Sgaravatto INFN Padova.


1 WP1 WMS rel. 2.0 Some issues Massimo Sgaravatto INFN Padova

2 Outline: Some issues to discuss (and let’s try to decide): LB server choice; New CondorG; Proxy renewal; RLS integration; WP2 Optor integration; Output data upload and registration; LB issues; Gangmatching; Security of files on the WM node; Disk quota management in WM node; VOMS integration; Job exit code; ISB/OSB transfer errors; Accounting integration; User vs host proxies; … ?

3 LB server choice: Allow multiple LB servers for a single WM, for increased reliability and performance. Approach: should the UI be responsible for choosing the LB server (e.g. via round robin)? Should the list of available LB servers be kept in the UI conf file, while waiting for this VO-specific info to be published in a “VO repository” (R-GMA/IS/VOMS)? The list of available NSs would move to this VO repository as well, when available. It is not yet clear what this VO repository could be (discussions within ATF).
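A minimal sketch of the round-robin choice mentioned above, assuming the UI reads a plain list of LB servers from its configuration. The host names, the port and the `LBSelector` class are hypothetical and are not part of the WMS software.

```python
from itertools import cycle

# Hypothetical list, as it might be read from the UI conf file.
LB_SERVERS = [
    "lb1.example.org:9000",
    "lb2.example.org:9000",
    "lb3.example.org:9000",
]


class LBSelector:
    """Hands out LB servers in round-robin order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one LB server is required")
        self._ring = cycle(servers)

    def next_server(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._ring)


selector = LBSelector(LB_SERVERS)
chosen = [selector.next_server() for _ in range(4)]
```

The rotation here lives in memory only. If every job submission is a separate UI process, the position would have to be persisted between runs, or replaced by a random choice; which of these fits is one of the open points.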

4 New CondorG: New CondorG negotiated with the Condor people (more details by Francesco P.). Released by end of March, included in VDT, and to be used in rel. 2.0. Two proxies: X509UserProxy, one per job; X509ManagementProxy, one per user’s DN or one “serving” n jobs for that user’s DN. A CondorG pair for a given X509ManagementProxy. Details of the whole machinery to be discussed: where is the user’s DN to X509ManagementProxy mapping kept and managed? Proxy renewal? …

5 Proxy renewal: A “persistent” proxy renewal daemon is necessary (i.e. if it is restarted it should not lose control of the “managed” jobs, as happens now). Various issues need to be discussed and decided. Renewal of X509UserProxy: done only if requested by the user (if MyProxyServer is specified in the JDL?)? No MyProxyServer in the WM conf file anymore? And what about renewal of X509ManagementProxy? If a new proxy “arrives” from the UI and extends the validity of the existing one, does the new one replace the old one? That is not enough: what if at least one job of that user asked for proxy renewal? Then it is necessary to renew the X509ManagementProxy as well. Who does registration? The NS? Who does un-registration? …
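One possible reading of the “persistent” requirement is that the daemon keeps its table of managed jobs on disk and reloads it at start-up. The sketch below assumes a JSON file as the store; the `RenewalRegistry` class, the file layout and the job identifiers are illustrative assumptions, not the actual daemon design.

```python
import json
import os
import tempfile


class RenewalRegistry:
    """Table of job id -> proxy file path that survives a daemon restart."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        if os.path.exists(path):
            with open(path) as f:
                self.jobs = json.load(f)

    def _save(self):
        # Write to a temporary file first, then rename it into place,
        # so a crash in the middle of a write cannot truncate the table.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.jobs, f)
        os.replace(tmp, self.path)

    def register(self, job_id, proxy_path):
        self.jobs[job_id] = proxy_path
        self._save()

    def unregister(self, job_id):
        self.jobs.pop(job_id, None)
        self._save()


# Simulated restart: a second instance sees what the first one registered.
with tempfile.TemporaryDirectory() as workdir:
    store = os.path.join(workdir, "renewal.json")
    before_restart = RenewalRegistry(store)
    before_restart.register("job-1", "/tmp/proxy-job-1")
    after_restart = RenewalRegistry(store)
    recovered = dict(after_restart.jobs)
```

Who calls `register` and `unregister` (the NS or another service) is exactly the question left open on the slide.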

6 RLS integration: At J+27 the RB/MM will have to query the WP2 RLS instead of the WP2 RC to get the SFNs given an LFN (or an LCN, or a GUID). Negotiation of this WP1-WP2 interface is ongoing. New JDL attribute (VirtualOrganization) to make it possible to refer to the “official” VO’s RLS (needed by WP2 services); not needed anymore once VOMS is integrated, since it will then be possible to get the VO from the user’s proxy. Optional JDL attribute to make it possible to specify a “non-official” RLS? edgReplicaManager::listReplicas to get the SFNs. New BrokerInfo content (under negotiation).

7 Integration with WP2 Optor: A completely different approach from querying the RLS to get the PFNs (the two are mutually exclusive) … The RB calls getAccessCost for all the suitable CEs (the ones where the user is authorized to submit jobs and which match the JDL “Requirements” expression) and for all the specified input data (LFNs, LCNs, GUIDs). A “cost” is returned for each CE. The RB chooses the CE, taking into account this cost and also the other Ranks (how is to be decided). In some cases the WM also has to trigger the replication of files to the closeSE. Not too difficult, but very high impact on the scheduling/planning performed by the RB/MM. WMS-Optor integration is planned after J+27; however, according to WP2, this stuff will be ready and tested well before J+27. Details of the integration to discuss: How? A binary flag in the WM conf file to enable/disable Optor? When?
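How the access cost and the other Ranks are combined is still to be decided. Purely as an illustration, the sketch below assumes a weighted difference (rank minus weighted cost); the `choose_ce` function, the weight and the sample numbers are hypothetical, and the costs stand in for what getAccessCost would return.

```python
def choose_ce(candidates, access_cost, rank, cost_weight=1.0):
    """Pick the CE with the highest score = rank - cost_weight * cost.

    candidates:  CE names that already passed authorization and Requirements
    access_cost: CE name -> cost (stand-in for the Optor result)
    rank:        CE name -> value of the JDL Rank expression
    """
    if not candidates:
        raise ValueError("no suitable CE")

    def score(ce):
        return rank[ce] - cost_weight * access_cost[ce]

    return max(candidates, key=score)


# Hypothetical numbers: ce1 has the better rank, ce2 the cheaper data access.
candidates = ["ce1", "ce2"]
access_cost = {"ce1": 10.0, "ce2": 2.0}
rank = {"ce1": 5.0, "ce2": 4.0}

best_with_cost = choose_ce(candidates, access_cost, rank)
best_rank_only = choose_ce(candidates, access_cost, rank, cost_weight=0.0)
```

Setting `cost_weight` to zero reduces the choice to the plain Rank, which is one way a conf-file flag could enable or disable the Optor contribution.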

8 Output data upload and registration: Problem discussed and solution agreed in the ATF. Approach (details by Fabrizio P.): an optional OutputData JDL attribute to specify output file names, output LFNs and output SEs; at the end, the JobWrapper has to call the WP2 function copyAndRegister. Issues: some details about copyAndRegister to be sorted out; release date of this stuff not decided yet.

9 LB: What exactly happens at J+27 wrt “Advanced query to LB”? “LB – RGMA integration”? How? Interfaces (e.g. for advanced queries)? Issues? Ales??

10 Gangmatching: Problem: take into account both CE and SE information in the matchmaking, for example to require a job to run on a CE close to an SE with “enough space”. Salvo has been working on this for a while, also after some negotiations with the Condor team (A. Roy). See Salvo’s talk for details (e.g. JDL) and discussions. When can this stuff be released? J+27?
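The “CE close to an SE with enough space” requirement can be illustrated in plain Python. This is not ClassAd gangmatching syntax; the `CE` and `SE` records, their field names and the sample values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SE:
    name: str
    free_space_mb: int


@dataclass
class CE:
    name: str
    close_ses: list = field(default_factory=list)  # names of nearby SEs


def match_ce_se(ces, ses, min_free_mb):
    """Return (CE name, SE name) pairs where the SE is close to the CE
    and has at least min_free_mb of free space."""
    ses_by_name = {se.name: se for se in ses}
    pairs = []
    for ce in ces:
        for se_name in ce.close_ses:
            se = ses_by_name.get(se_name)
            if se is not None and se.free_space_mb >= min_free_mb:
                pairs.append((ce.name, se.name))
    return pairs


ses = [SE("se1", 500), SE("se2", 5000)]
ces = [CE("ce1", ["se1"]), CE("ce2", ["se1", "se2"])]
pairs = match_ce_se(ces, ses, min_free_mb=1000)
```

Only `ce2` survives here, because its close SE `se2` is the only one with the requested free space.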

11 Security of files on the WM node: Approach: WP1 services (NS, …) run as edguser.edguser on the WM node. Different users’ subjects are mapped to different local users in the grid-mapfile: user1.user, user2.user, … A patched gridftp server (by Massimo M.) runs on the NS node, so that the InputSandbox files are transferred to the NS node with edguser as group and rwxrwx--- as permission mask. So a user cannot access files belonging to another user anymore. Issues: When? J+27? How? Gridftp server RPM released by WP1?

12 Disk quota management on the WM node: Having different DN users mapped to different local users in the grid-mapfile of the WM node makes it possible to set disk quotas for the various users. The NS is to be modified (for J+27) so that it rejects a job if there is not enough disk quota available to store the input sandbox files. Issues? Marco??
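The check the NS would have to perform can be sketched as follows, assuming the NS knows the size of each input sandbox file and the quota and current usage of the mapped local user. The `accept_job` function and the sample figures are hypothetical.

```python
def accept_job(sandbox_sizes, quota_bytes, used_bytes):
    """Return True if the input sandbox fits in the user's remaining quota.

    sandbox_sizes: sizes in bytes of the input sandbox files
    quota_bytes:   disk quota of the local user the DN is mapped to
    used_bytes:    space that user already occupies on the WM node
    """
    needed = sum(sandbox_sizes)
    return used_bytes + needed <= quota_bytes


MB = 1024 * 1024

# Hypothetical user with a 100 MB quota, 90 MB of which is already used.
small_job_ok = accept_job([2 * MB, 3 * MB], quota_bytes=100 * MB, used_bytes=90 * MB)
large_job_ok = accept_job([8 * MB, 8 * MB], quota_bytes=100 * MB, used_bytes=90 * MB)
```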

13 VOMS integration: E.g. voms-proxy-init –vo CMS puts VO info in the generated proxy. Impact on WP1 software: the VO is retrieved from the user’s proxy, so it is no longer necessary to provide it in the JDL for querying the RLS; the authorization check is no longer done with a matchmaking considering the user cert subject, but according to the VO. Is the proxy used by the various services (NS, LB, etc.) generated by VOMS? Issues: VOMS is deployed at J+37, but it is not clear which integration will take place and when; it is not yet clear which VOMS APIs are available.

14 Job exit code: For release 2.0 we agreed to return the job exit code to the user with dg-job-status. What if the exit code <> 0? Done-ok in any case? Done-failed (and therefore resubmission)?

15 ISB/OSB transfer errors: In release 1.x a job is considered failed (and therefore resubmission is attempted) if the JobWrapper detects errors when transferring an ISB/OSB file between the RB node and the WN. But the failure could simply be due to a user’s error when writing the ISB/OSB expressions in the JDL … And what if the job crashed because of “internal” problems and therefore some OSB files were not produced? Is it ok to mark the job as failed and re-attempt the submission, or is it better to consider the job as done-ok? Approach in release 2.0: the JobAdapter should check and issue globus-url-copy only for the ISB/OSB files which exist (simple for OSB, a bit more complex for ISB), and/or globus-url-copy errors are ignored?
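For the OSB case, the “check before copying” idea amounts to splitting the declared file names into those that exist in the job directory and those that do not. The sketch below shows only that local check; the `split_existing` helper and the file names are hypothetical, and the actual transfer (globus-url-copy) is deliberately left out.

```python
import os
import tempfile


def split_existing(directory, declared_names):
    """Split the declared sandbox file names into (present, missing)."""
    present, missing = [], []
    for name in declared_names:
        if os.path.isfile(os.path.join(directory, name)):
            present.append(name)
        else:
            missing.append(name)
    return present, missing


# Simulated job directory in which only one of two declared OSB files exists.
with tempfile.TemporaryDirectory() as job_dir:
    with open(os.path.join(job_dir, "stdout.log"), "w") as f:
        f.write("done\n")
    to_transfer, not_produced = split_existing(job_dir, ["stdout.log", "result.dat"])
```

Only the names in `to_transfer` would be handed to the copy step; whether a non-empty `not_produced` list should mark the job as failed is the policy question raised on the slide.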

16 Accounting integration: What exactly happens at J+27 (“Accounting infrastructure”)? And later, after release 2.0 (“Full integration of cost estimation/accounting into scheduling policies”)? Dependencies and interfaces with other components and other WPs at J+27 and later?

17 Host vs user proxies: Can we rely on users’ proxies instead of host proxies for authentication when possible, as recommended? E.g. in LB logging. Other cases?

