Batch System Operation & Interaction with the Grid
LCG/EGEE Operations Workshop, May 25th 2005
Tony.Cass@CERN.ch

Why a Batch Workshop at HEPiX?
Proposed after the last Operations Workshop. Remember the complaints then?
– ETT doesn't work
– ETT is meaningless when fairsharing is in place
– The solution of a queue per VO, while easy to implement now, is not a good or long-term solution.
– The [ETT] algorithm was questioned and other proposals were given.
Idea was to bring together site managers, grid & local scheduler developers.

Workshop Aims
Understand how different batch scheduling systems are used at HEP sites
– Are there any commonalities?
How do sites see the Grid interface?
How would sites like to see the Grid interface?
What is the impact of the current interface?
How do developers of local and Grid level schedulers see the future?
How/can HEP site managers influence future developments?
Well attended (70-80)
– Definite interest in this area from site managers
See http://www.fzk.de/hepix

Site Presentations --- I
Site reports covered
– Brief overview of the available computing resources, showing (in)homogeneity of resources
– Queue configuration---what and why
– How do users select queues---cpu time alone or specifying other resources (e.g. memory, local disk space availability)
– Need for, and use of, "special" queues---for "production managers", sudden high priority work, other reasons.
  » Question from LHCC referee: If there is some urgent analysis, how can [gLite] send this to a special queue?
– Level of resource utilisation

Site Presentations --- II
Overall, configurations and concerns were broadly equivalent across sites.
Concerns were around
– Scheduling
– Security
– Interface Scalability
Cover these issues in next few slides.

Local Load Scheduling: summary
Batch schedulers at local sites enable fine-grained control over heterogeneous systems and are used to enforce local policies on resource allocation and provide SLA for users (turnround time).
– Large sites have subdivision of user groups
Scheduling is by CPU time, some need to request
– minimum CPU capacity for server
– memory requirement
– available disk work space (/pool, /scratch, /tmp)
Sites want Grid interface to use existing queue(s)
– NOT to create a queue per VO.
– EMPHATICALLY NOT to replicate queue structure per VO
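
As a rough illustration of queue selection by CPU time plus optional memory and scratch-space requests, here is a minimal sketch. The queue names, limits, and the `select_queue` helper are all hypothetical and do not describe any site's real configuration or any batch system's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Queue:
    name: str
    max_cpu_minutes: int   # CPU time limit of the queue
    max_memory_mb: int     # largest memory request the queue accepts
    scratch_gb: int        # work space available to a job in this queue


# Hypothetical queue table; real sites define their own queues and limits.
QUEUES = (
    Queue("short", 60, 1024, 5),
    Queue("medium", 480, 2048, 20),
    Queue("long", 2880, 4096, 100),
)


def select_queue(
    cpu_minutes: int,
    memory_mb: int = 0,
    scratch_gb: int = 0,
    queues: Sequence[Queue] = QUEUES,
) -> Optional[str]:
    """Return the shortest queue that satisfies every request, or None."""
    candidates = [
        q for q in queues
        if q.max_cpu_minutes >= cpu_minutes
        and q.max_memory_mb >= memory_mb
        and q.scratch_gb >= scratch_gb
    ]
    if not candidates:
        return None
    # Prefer the queue with the smallest CPU limit that still fits the job.
    return min(candidates, key=lambda q: q.max_cpu_minutes).name
```

A job asking only for 30 CPU minutes lands in "short" here, while the same job with a 1500 MB memory request is pushed to "medium". The point is that one shared set of queues can serve all VOs if the job's requirements are passed through.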

Grid/Local interface problems
Jeff's presentation!
In short
– Not enough information passed from the site to the Grid
– No information passed from the Grid to the site
Result:
– Queues at some sites whilst others sit empty
– Confused/frustrated site managers
– Inefficient behaviour as people work the system
  » Tragedy of the commons

Should sites (be able to) enforce policies?
Sites are funded for particular tasks and need to show funding agencies and users that they are fulfilling their mission.
This is a Grid. Why does it matter if you are running jobs for X not Y? Y may be happily running jobs at another site.
My view:
– Sites need to understand and feel comfortable with the way they accept jobs from the Grid.
– If they are comfortable, account may be taken of global activity when setting local priorities.
– Let's walk before we try to run…

Can/Should we fix this?
… or should we wait to see some general standard emerge?
Strong support from commercial people (especially Platform and Sun) for HEP to work out solutions to this problem.
– They are interested in what we do.
Standards bodies (GGF,…) won't come up with any common solution soon.
– But this doesn't mean HEP shouldn't participate
  » Raise profile of problems of interest to us
  » Give practical input based on real-world experience.

How to fix?
Improve information available to Grid scheduler
– VO information added in GLUE schema (v1.2)
  » Need volunteer per batch system to maintain dynamic plug-ins and the job manager.
    • CERN will do this for LSF. Need other volunteers!
– but still assumption of homogeneous resources at a site.
– There is a plan to start work on GLUE v2 in November
  » No requirement for backwards compatibility.
  » Discussion should start NOW!
    • But need to assess impact of v1.2 changes before rushing into anything.
Grid scheduler should pass job resource requirements to the local resource manager.
– Not yet. When? How?
– Needs normalisation… Does this need to be per VO?

Security Issues
Sites are still VERY concerned about traceability of users.
Mechanisms seem to be in place to allow this, but sites have little practical experience.
– cf. delays for CERN to block user systematically crashing worker nodes.
– Security group have doubts that sites are fulfilling obligations in terms of log retention.
– Security Challenges mooted; these may help increase confidence…
Whatever, it does NOT seem to be a good idea to have a portal handling user job requests and passing these on with a common certificate…

Interface Scalability
IN2P3 example: GridJobManager asks for the status of each job once per minute (even for 15-hour jobs).
– 5000 queued jobs + 1000 running jobs = 100 queries/s
Being solved by EGEE BLAHP
– Caches query response
But…
– a further example of the need for discussion between sites & developers (IN2P3 fixing this issue independently)
– are there other similar issues out there?
  » cf. LSF targets:
    • Scalability: 5K hosts, 500K active jobs, 100 concurrent users, 1M completed jobs per day
    • Performance: >90% slot utilisation, 5s max command response time, 4kB memory/job, master failover in <5mins
  » What are targets for the CE? RB?
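
The arithmetic behind the IN2P3 figure, and the general idea of caching query responses, can be sketched as follows. This is an illustrative time-to-live cache only; it is not BLAHP's implementation, and `CachedStatus`, `query_batch_system`, and `ttl_s` are hypothetical names.

```python
import time


def status_queries_per_second(queued_jobs, running_jobs, poll_interval_s):
    """Average query rate when every job is polled once per interval."""
    return (queued_jobs + running_jobs) / poll_interval_s


class CachedStatus:
    """Answer per-job status requests from a snapshot of the batch system.

    `query_batch_system` is a hypothetical callable returning a dict of
    job id -> status for all jobs. It is called at most once per `ttl_s`
    seconds, however many individual status requests arrive.
    """

    def __init__(self, query_batch_system, ttl_s, clock=time.monotonic):
        self._query = query_batch_system
        self._ttl_s = ttl_s
        self._clock = clock
        self._snapshot = {}
        self._fetched_at = None

    def status(self, job_id):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl_s:
            # One bulk query replaces thousands of per-job queries.
            self._snapshot = self._query()
            self._fetched_at = now
        return self._snapshot.get(job_id, "UNKNOWN")
```

With 5000 queued and 1000 running jobs polled every 60 seconds, `status_queries_per_second(5000, 1000, 60)` gives the 100 queries/s quoted on the slide. With a 60-second cache in front, the batch system would see roughly one bulk query per minute instead.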

End-to-End Guarantees
The Condor talk raised many interesting points. One in particular was the (in)ability of the overall system to offer end-to-end execution guarantees to the users.
Condor glide-in: pilot job submitted via the Grid which takes a job from a Condor queue.
Fair enough [modulo security…] for system managers PROVIDED pilot job expresses same resource requests as it advertises in a class-ad when it starts.
– Shouldn't claim to be maximum possible length then run short job.
– Class ads and GLUE schema not so different: Both are ways of saying what a node/site can do in a way that can be used to express (and then match) requirements.
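
The "advertise, then match" idea common to class ads and the GLUE schema can be illustrated with plain dictionaries. This is a sketch under simplifying assumptions: the attribute names (`cpu_minutes`, `memory_mb`, `scratch_gb`) are invented for the example and are neither real ClassAd syntax nor GLUE attributes.

```python
def matches(advert, requirements):
    """True if the advertised capabilities meet every stated minimum.

    Attributes missing from the advert are treated as zero, so a
    requirement the node says nothing about is not satisfied.
    """
    return all(
        advert.get(name, 0) >= minimum
        for name, minimum in requirements.items()
    )


def pilot_is_consistent(grid_request, startup_advert):
    """True if a pilot advertises at start-up exactly what it requested.

    `grid_request` is what the pilot asked the site for via the Grid.
    `startup_advert` is what it tells its own queue once it is running.
    """
    return all(
        startup_advert.get(name) == value
        for name, value in grid_request.items()
    )
```

For example, a node advertising `{"cpu_minutes": 480, "memory_mb": 2048}` matches a job requiring 120 CPU minutes and 1024 MB. A pilot that requested 2880 CPU minutes from the site but advertises only 60 fails the consistency check, which is the behaviour the slide objects to.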

Pre-emption & Virtualisation
Strong message from batch system developers that pre-emption is A GOOD THING.
With pre-emption, schedulers can maximise throughput/resource usage by
– suspending many jobs to allow parallel job to run
– suspending long running jobs to provide quick turnround for priority jobs.
Interest in virtualisation as method to ease this
– Also discussed at last operations workshop as a way to ease handling of multiple (conflicting) requirements for OS versions.
– Something to watch.
How would (pre-empted) users like this?
– No guarantee of time to completion once job starts…
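
The second case above, suspending a long-running job so a priority job gets quick turnround, might look like this in outline. The choice of victim (the job that has run longest) and the `admit_priority_job` helper are assumptions made for illustration; real schedulers apply their own pre-emption policies.

```python
def admit_priority_job(running, free_slots, priority_job):
    """Start a priority job, suspending a running job if no slot is free.

    `running` maps job name -> minutes already run. Returns a tuple
    (new_running, suspended_job); suspended_job is None when a free
    slot was available. The input mapping is not modified.
    """
    if free_slots > 0:
        return {**running, priority_job: 0}, None
    if not running:
        raise ValueError("no free slot and no running job to suspend")
    # Assumed policy: suspend whichever job has been running longest.
    victim = max(running, key=running.get)
    still_running = {job: t for job, t in running.items() if job != victim}
    still_running[priority_job] = 0
    return still_running, victim
```

The suspended job keeps its accumulated work but loses any guarantee about when it will finish, which is the user-side concern raised on the slide.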

Push vs Pull
A false dichotomy
– Sites can manipulate pull model to create a local queue
Real issue is early vs. late allocation of task to resource
– Early: site resource utilisation maximised: a free cpu resource can be filled immediately with a job from the local queue
– Late: user doesn't see job sent to site A just before a cpu becomes free at site B.
Questions:
– Long term, will most cpu resources be full?
– What do people want to maximise? Throughput or ?
  » Efficient scheduling important anyway… transparency of grid/local interface will be key.
– Pre-emption, anyone?

Workshop Summary
Useful workshop.
Good that there has been progress since the November workshop at CERN (GLUE schema update), but much is still to be done.
[Still] Need to increase dialogue between site managers and Grid [scheduler] developers
– Site managers know a lot about running services.
– Unfortunate that a meeting change created a clash and reduced scope for EGEE developers to participate in Karlsruhe discussions.
– A smaller session is pencilled in for HEPiX at SLAC, October 10th-14th. More dialogue then?
Not too early to start thinking about GLUE v2!