Workload management Owen Maroney, Imperial College London (with a little help from David Colling)


1 Workload management Owen Maroney, Imperial College London (with a little help from David Colling)

2 Contents: Brief review of the WMS architecture used in LCG2. Future UK plans in the WMS area.

3 WMS used in LCG2: EDG release 2(.1) architecture. Slightly hardened and made more robust. Appears to be reliable and scalable to current levels of LCG-2. Uses a (modified) BDII instead of R-GMA (gin/gout); strictly speaking this is a monitoring issue rather than a WMS issue. Now takes less time to submit jobs.

4 WMS used in LCG2: The description that follows was shown at GridPP7 and is mainly taken from an even earlier presentation by Massimo Sgaravatto. So this is just a reminder; however, there have been no changes in the basic architecture between then and LCG2.

5 [Architecture diagram. Components: UI, Network Server, Workload Manager, Job Controller (CondorG), Replica Catalog, Information Service, Computing Element, Storage Element, RB node. Labels: CE characteristics & status, SE characteristics & status.] UI: allows users to access the functionalities of the WMS. Job Description Language (JDL): used to specify job characteristics and requirements. Command shown: edg-job-submit myjob.jdl. Job status: submitted.

myjob.jdl:

    JobType = "Normal";
    Executable = "$(CMS)/exe/sum.exe";
    InputData = "LF:testbed ";
    InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueHostOperatingSystemRelease == "Red Hat 7.3" && other.GlueCEPolicyMaxWallClockTime > 10000;
    Rank = other.GlueCEStateFreeCPUs;

6 Job submission. [WMS architecture diagram, now including RB storage. Labels: Job, Input Sandbox files.] Job status: submitted, waiting. NS: network daemon responsible for accepting incoming requests.

7 Job submission. [WMS architecture diagram. Label: Job.] Job status: submitted, waiting. WM: responsible for taking the appropriate actions to satisfy the request.

8 Job submission. [WMS architecture diagram, now including the Matchmaker.] Job status: submitted, waiting. Where must this job be executed?

9 Job submission. [WMS architecture diagram, including the Matchmaker/Broker.] Job status: submitted, waiting. Matchmaker: responsible for finding the "best" CE to which to submit a job.
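The matchmaking step can be illustrated with a minimal sketch, assuming the Requirements and Rank expressions of the myjob.jdl example on slide 5. The CE records, their values and the function names are hypothetical; this is not the EDG matchmaker code, only the filter-then-rank idea.

```python
from typing import Optional

# Hypothetical CE records. The keys reuse the Glue attribute names that
# appear in the JDL on slide 5; the values are invented for illustration.
CES = [
    {"id": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxWallClockTime": 20000, "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxWallClockTime": 5000, "GlueCEStateFreeCPUs": 12},
    {"id": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxWallClockTime": 15000, "GlueCEStateFreeCPUs": 9},
]


def requirements(other: dict) -> bool:
    """Python rendering of the JDL Requirements expression."""
    return (other["GlueHostOperatingSystemName"] == "linux"
            and other["GlueHostOperatingSystemRelease"] == "Red Hat 7.3"
            and other["GlueCEPolicyMaxWallClockTime"] > 10000)


def rank(other: dict) -> int:
    """Python rendering of the JDL Rank expression."""
    return other["GlueCEStateFreeCPUs"]


def match(ces: list) -> Optional[dict]:
    """Keep the CEs that satisfy Requirements, return the highest-ranked one."""
    candidates = [ce for ce in ces if requirements(ce)]
    return max(candidates, key=rank) if candidates else None


best = match(CES)
print(best["id"] if best else "no matching CE")
```

Here ce-b has the most free CPUs but is excluded by the wall-clock requirement, so the rank only decides between the CEs that already satisfy the requirements.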

10 Job submission. [WMS architecture diagram, including the Matchmaker/Broker.] Job status: submitted, waiting. Where are the needed data (on which SEs)? What is the status of the Grid?
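The first of these questions can be sketched as a lookup, assuming the Replica Catalog maps a logical file name to the SEs holding a replica and that each CE publishes the SEs it is close to. Both tables, the file name and the function name are hypothetical, and treating data location as a hard filter (rather than a preference) is an assumption the slides do not confirm.

```python
# Hypothetical Replica Catalog content: logical file name -> SEs with a replica.
REPLICA_CATALOG = {"LF:example-file": ["se-1", "se-3"]}

# Hypothetical Information Service content: CE -> SEs considered close to it.
CLOSE_SES = {
    "ce-a": ["se-1"],
    "ce-b": ["se-2"],
    "ce-c": ["se-3", "se-2"],
}


def ces_near_data(lfn: str, replica_catalog: dict, close_ses: dict) -> list:
    """Return the CEs that are close to at least one SE holding the file."""
    holders = set(replica_catalog.get(lfn, []))
    return sorted(ce for ce, ses in close_ses.items() if holders & set(ses))


print(ces_near_data("LF:example-file", REPLICA_CATALOG, CLOSE_SES))
```

The resulting list would then be passed on to the requirements and rank evaluation.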

11 Job submission. [WMS architecture diagram, including the Matchmaker.] Job status: submitted, waiting. Result: CE choice.

12 Job submission. [WMS architecture diagram, now including the Job Adapter.] Job status: submitted, waiting. JA: responsible for the final "touches" to the job before performing submission (e.g. creation of wrapper script, etc.).

13 Job submission. [WMS architecture diagram. Label: Job.] Job status: submitted, waiting, ready. JC: responsible for the actual job management operations (done via CondorG).

14 Job submission. [WMS architecture diagram. Labels: Job, Input Sandbox files.] Job status: submitted, waiting, ready, scheduled.

15 Job submission. [WMS architecture diagram. Labels: Job, Input Sandbox, "Grid enabled" data transfers/accesses.] Job status: submitted, waiting, ready, scheduled, running.

16 Job submission. [WMS architecture diagram. Label: Output Sandbox files.] Job status: submitted, waiting, ready, scheduled, running, done.

17 Job submission. [WMS architecture diagram. Labels: Output Sandbox, edg-job-get-output.] Job status: submitted, waiting, ready, scheduled, running, done.

18 Job submission. [WMS architecture diagram. Label: Output Sandbox files.] Job status: submitted, waiting, ready, scheduled, running, done, cleared.
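The status sequence built up over slides 5 to 18 can be written down as a simple linear state machine. This is a minimal sketch: the function names are hypothetical, and only the states shown on the slides are modelled (any failure or cancellation states the real WMS may have are left out).

```python
# Job states in the order in which they appear on the slides.
STATES = ["submitted", "waiting", "ready", "scheduled",
          "running", "done", "cleared"]


def next_state(current: str) -> str:
    """Return the state that follows `current` in the slide sequence."""
    i = STATES.index(current)  # raises ValueError for an unknown state
    if i == len(STATES) - 1:
        raise ValueError(f"{current!r} is the final state")
    return STATES[i + 1]


def path(current: str, target: str) -> list:
    """Return the states passed through from `current` to `target`, inclusive."""
    i, j = STATES.index(current), STATES.index(target)
    if j < i:
        raise ValueError("jobs do not move backwards in this sketch")
    return STATES[i:j + 1]


print(next_state("ready"))
print(path("submitted", "scheduled"))
```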

19 Logging and bookkeeping. [Diagram. Components: UI, Log Monitor, Logging & Bookkeeping, Network Server, Workload Manager, Job Controller (CondorG), Computing Element, RB node. Labels: Log of job events, edg-job-status, Job status.] LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies LB. LB: receives and stores job events; processes the corresponding job status.
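The division of labour between LM and LB can be sketched as follows. Everything concrete here is an assumption made for illustration: the "job_id event" line format is not the real CondorG log format, the event names and their mapping to statuses are invented, and the class and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from a logged event to the job status it implies.
EVENT_TO_STATUS = {
    "submit": "scheduled",
    "execute": "running",
    "terminate": "done",
}


class LoggingAndBookkeeping:
    """Stores job events and derives the current status from them."""

    def __init__(self):
        self.events = defaultdict(list)

    def record(self, job_id: str, event: str) -> None:
        self.events[job_id].append(event)

    def status(self, job_id: str) -> str:
        # The most recent recognised event determines the status.
        for event in reversed(self.events.get(job_id, [])):
            if event in EVENT_TO_STATUS:
                return EVENT_TO_STATUS[event]
        return "unknown"


def log_monitor(lines, lb: LoggingAndBookkeeping) -> None:
    """Parse simplified 'job_id event' log lines and notify the LB."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        job_id, event = line.split()
        lb.record(job_id, event)


lb = LoggingAndBookkeeping()
log_monitor(["job1 submit", "job1 execute", "", "job2 submit"], lb)
print(lb.status("job1"), lb.status("job2"))
```

A status query such as edg-job-status would correspond to the `status` call in this sketch.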

20 Future UK plans: The WMS will change with ARDA (e.g. it will move to a pull rather than a push model for job distribution). The UK emphasis is going to be on testing scalability. The plan is to: (1) instrument the WMS code; (2) build a testbed (between Imperial HEP and LeSC) capable of simulating the load of the entire LCG; (3) understand the characteristics of different sorts of (HEP) job and feed this into the simulation. A further plan is to examine and understand the performance of the WMS in operation.
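The difference between the two distribution models can be shown with a toy sketch. The slides give no detail of the ARDA design, so the function names, the CE names and the scheduling rules below are all assumptions chosen only to contrast who initiates the assignment.

```python
from collections import deque


def push_dispatch(jobs, free_cpus):
    """Push: the broker chooses a CE for each job from its own view of the Grid."""
    view = dict(free_cpus)  # the broker's (possibly stale) copy of CE state
    assignment = {}
    for job in jobs:
        ce = max(view, key=view.get)  # CE with the most free CPUs in that view
        assignment[job] = ce
        view[ce] -= 1
    return assignment


def pull_dispatch(jobs, requests):
    """Pull: jobs wait in a central queue; a CE with a free slot asks for one."""
    queue = deque(jobs)
    assignment = {}
    for ce in requests:  # requests arrive in the order CEs become free
        if not queue:
            break
        assignment[queue.popleft()] = ce
    return assignment


print(push_dispatch(["j1", "j2", "j3"], {"ce-a": 2, "ce-b": 1}))
print(pull_dispatch(["j1", "j2", "j3"], ["ce-b", "ce-a"]))
```

In the push case the quality of the assignment depends on how current the broker's view is; in the pull case a job is only handed out when a CE has actually asked for work, so j3 stays queued until another request arrives.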

21 Future UK plans: Details of the testbed construction are still to be worked out; however, this effort will be integrated into the EGEE/LCG test plan. This effort also neatly dovetails into the GridCC project (see talk at GridPP11?).

