1 Distributed Job Submission in a Dynamic Virtual Environment
Davide Salomoni INFN-CNAF D.Salomoni - CCR/INFNGrid WS, Palau, May 2009

2 Problem Description
- It is becoming increasingly feasible to run jobs in dynamic environments, i.e. on compute resources created at job execution time.
- See the Virtualization session of this workshop for a review of several techniques for creating and using virtual resources.
- The tools meant to support this tend to be designed around, and to work with, local submission methods.
- This talk focuses on the architecture needed to use dynamically created virtual resources in a grid environment.

3 Yes, but why?
- Possibility to run experiment software on different O/S versions, across different VOs or even within a given VO. (The experiment software itself is not touched per se by this talk, as it normally lives in software areas.)
- Efficiency optimization.
- Proper resource selection may bring billing/accounting advantages.
- Feedback on interest and practical use cases is welcome.

4 A Graphical Representation
[Figure: the WMS, the Grid Information System, and the CEs of Site A, Site B and Site C; the VM images VM-A and VM-B; job requests labelled "Run on VM-A!" and "Run on VM-B!"; Site C is marked "?".]
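The routing in the figure comes down to WMS match-making on the VM images each site publishes. Below is a minimal Python sketch that assumes a toy in-memory snapshot of the Information System. The class `SiteCE`, the function `match` and the site contents are hypothetical illustrations, not part of any gLite API.

```python
from dataclasses import dataclass, field


@dataclass
class SiteCE:
    """One CE as published in a hypothetical Information System snapshot."""
    site: str
    vm_images: set[str] = field(default_factory=set)  # VM images the site can start


def match(info_system: list[SiteCE], wanted_image: str) -> list[str]:
    """WMS-side match-making: keep only the sites that publish the wanted image."""
    return [ce.site for ce in info_system if wanted_image in ce.vm_images]


# Hypothetical snapshot loosely modelled on the figure: Site C publishes no VM.
INFO_SYSTEM = [
    SiteCE("Site A", {"VM-A"}),
    SiteCE("Site B", {"VM-B"}),
    SiteCE("Site C"),
]
```

With this snapshot, `match(INFO_SYSTEM, "VM-A")` selects only Site A. A request for an image nobody publishes returns an empty list, which is the "?" case.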

5 Local Layer Example: CNAF’s WNOD (Worker Nodes On Demand)
[Figure: the WMS and the CE / CREAM above the LSF local layer; wnod_LsfMaster; a physical box whose Dom0 runs wnod_XenMaster and whose DomUs are the Bait VWN (First JobStarter) and the virtual worker node VWN-001 (LSF, Second JobStarter, PostExec).]
Step 0: job submission by the user; the batch job is dispatched on the Bait VWN by the LSF master.
Step 1: the JobStarter (JS) stops, gets the job requirements and sends a notification message for a new batch job to process.
Step 2: request a VM for batch job execution.
Step 3: create or recycle a VM for batch job execution and notify wnod_LsfMaster about it.
Step 4: batch job execution is migrated to the VM.
Step 5: notification: batch job execution started on the VM.
Step 6: notification: batch job execution finished on the VM.
Step 7: close the VWN and wait for a new job.
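The step sequence can be traced with a small simulation. This is a Python sketch under stated assumptions. `XenMasterModel`, `run_batch_job` and the image names are hypothetical. The "create or recycle" policy of step 3 is assumed to mean "reuse an idle VM with the same image, else create one". Step 7 is modelled as returning the VM to the idle pool. None of this reproduces WNOD's actual code.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class VirtualWorkerNode:
    name: str
    image: str
    busy: bool = False


@dataclass
class XenMasterModel:
    """Hypothetical stand-in for wnod_XenMaster: owns the VMs of one physical box."""
    vms: list[VirtualWorkerNode] = field(default_factory=list)
    ids: count = field(default_factory=lambda: count(1))

    def create_or_recycle(self, image: str) -> VirtualWorkerNode:
        # Step 3, first choice: recycle an idle VM that already runs the image.
        for vm in self.vms:
            if not vm.busy and vm.image == image:
                vm.busy = True
                return vm
        # Otherwise create a new VM, named like the slide's VWN-001.
        vm = VirtualWorkerNode(f"VWN-{next(self.ids):03d}", image, busy=True)
        self.vms.append(vm)
        return vm


def run_batch_job(job_id: str, image: str, xen: XenMasterModel) -> list[str]:
    """Walk one batch job through steps 0-7 and return a log of what happened."""
    log = [f"step 0: {job_id} dispatched on the Bait VWN by the LSF master"]
    log.append(f"step 1: JobStarter stops {job_id}, requirement is {image!r}")
    log.append(f"step 2: request a VM running {image!r}")
    vm = xen.create_or_recycle(image)
    log.append(f"step 3: {vm.name} ready, wnod_LsfMaster notified")
    log.append(f"step 4: {job_id} migrated to {vm.name}")
    log.append(f"step 5: execution started on {vm.name}")
    log.append(f"step 6: execution finished on {vm.name}")
    # Step 7, modelled here as returning the VM to the idle pool for recycling.
    vm.busy = False
    log.append(f"step 7: {vm.name} closed, waiting for a new job")
    return log
```

In this model, a second job requesting the same image reuses VWN-001. A job asking for a different image gets a new VWN-002.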

6 Elements toward a solution
- User requirements must be [selected by the user and] carried forward along the WMS → CE → LRMS chain, and ultimately processed by the dynamic VM manager.
- Use the so-called Forward Requirements.
- In particular, adopt an existing Glue Schema 1.x attribute: SRTE (Software RunTime Environment).
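The point of Forward Requirements is that every hop must pass the requirement through untouched, because only the last component acts on it. The Python sketch below shows this. `JobRequest`, `make_hop`, `vm_manager`, the `"SRTE"` key and the `"default"` fallback image are all hypothetical. Real WMS, CE and LRMS interfaces are not modelled.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass(frozen=True)
class JobRequest:
    job_id: str
    forward: dict[str, str] = field(default_factory=dict)  # forward requirements
    trail: tuple[str, ...] = ()  # layers the request has traversed


Hop = Callable[[JobRequest], JobRequest]


def make_hop(layer: str, keeps_forward: bool = True) -> Hop:
    """One layer of the chain; keeps_forward=False models a layer that drops them."""
    def hop(req: JobRequest) -> JobRequest:
        forwarded = dict(req.forward) if keeps_forward else {}
        return replace(req, forward=forwarded, trail=req.trail + (layer,))
    return hop


def vm_manager(req: JobRequest, default_image: str = "default") -> str:
    """The dynamic VM manager picks the image named by the forwarded SRTE value."""
    return req.forward.get("SRTE", default_image)


def submit(req: JobRequest, chain: list[Hop]) -> str:
    for hop in chain:
        req = hop(req)
    return vm_manager(req)
```

Suppose all three hops (`WMS`, `CE`, `LRMS`) keep the forward requirements. Then the VM manager sees the user's choice. If any one hop drops them, the job silently falls back to the default image, which is why every layer of the chain has to cooperate.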

7 SRTE Pros and Cons
- It is an existing Glue 1.x attribute: no need to introduce anything new, and no compatibility issues.
- It may be used to store VM data (and hence to select VMs).
- The semantics of SRTE entries are not uniquely determined.
- For instance, SRTE is currently used to identify which resource centers to select (e.g., "tier1" or "tier2"). This is only useful at the WMS match-making level, i.e. it is not really a forward requirement.
- How would an engine interpret the elements published in the SRTE attribute? → The ambiguity needs to be resolved.

8 Transition Proposal
- Use SRTE to comply with the current Glue 1.x schema.
- Adopt the convention of identifying dynamic VMs in SRTE with the syntax VM_xxx.
- Yes, it is fragile; and yes, it is temporary.

9 In practice…
- The transition proposal can be tested today.
- This requires close cooperation among the WMS, CE, and LRMS (local layer) developers and managers.
- The expected timeframe for first results is the end of June 2009.

10 Next steps
A more complete solution will be discussed in the next presentation. It will involve at least:
- Identification of VMs in the Information System.
- Use of Glue 2.x.
- Definition and standardization of key VM parameters, possibly including consumable resources (e.g. RAM, floating licenses).
- Standardization of the interface toward the LRMS (for instance, to cover multiple LRMSs: LSF, Torque, GridEngine, etc.). This task could be linked to standardization in the management of dynamic virtual environments; see also the virtualization session of this workshop.
- VM retrieval mechanisms.

11 Next steps
The list of next steps in slide 10 also appears in a presentation on the evolution of the gLite WMS, JRA1 All-Hands, 6-7/5/2009.

12 Initiators of this talk
- CNAF: M.Cecchi, A.Ghiselli, A.Italiano, D.Salomoni, V.Venturi
- Milano: D.Rebatto
- Padova: M.Sgaravatto, L.Zangrando
Further contributions are definitely welcome.
