1 Scheduling under LCG at RAL
UK HEP Sysman, Manchester, 11th November 2004
Steve Traylen, s.traylen@rl.ac.uk

2 RAL, LCG, Torque and Maui
Observations of RAL within LCG vs. traditional batch.
Various issues that arose and what was done to tackle them.
Upcoming changes in the next release.
Some items still to be resolved.

3 LCG Grid vs. Traditional Batch
Observations from the LHCb vs. Atlas period earlier this year.
Matters in common for the LCG Grid and traditional batch:
–RAL must provide 40% to Atlas, 30% to LHCb, … as dictated by GridPP.
Differences between the LCG Grid and traditional batch:
–Batch: 4000 jobs queued for 400 job slots.
–LCG: often fewer than 5 jobs queued for 400 job slots.

4 LCG Grid vs. Traditional Batch
Providing allocations is difficult when LHCb submits at a faster rate: RAL only received LHCb jobs.
The only solution with OpenPBS was to hard-limit LHCb:
–Idle CPUs are a waste of money.
–It is always better to hand out an allocation as soon as possible.
–LHCb jobs pile up because the resource appears free.
–RAL becomes unattractive (via the ETT) to Atlas.
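For illustration only, a hard cap of this kind is normally imposed through OpenPBS/Torque queue attributes via qmgr; the queue name and limits below are assumptions, not RAL's actual settings:

    qmgr -c "set queue lhcb max_running = 100"    # cap concurrently running LHCb jobs
    qmgr -c "set queue lhcb max_user_run = 50"    # optional per-user cap within the queue

The drawback, as the bullets above note, is that the capped slots simply sit idle whenever LHCb is the only VO with work to run.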

5 Queues per VO
Many sites (CNAF, LIP, NIKHEF, …) moved to queues per VO.
Advantages:
–The estimated traversal time is calculated independently for each VO. While LHCb jobs pile up, Atlas jobs are still attracted; hopefully there is always at least one queued job available.
–Queue lengths can be customised.
Disadvantages:
–Changes to the farm just to fit in with LCG.
–Adding VOs becomes harder.
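As a rough sketch of what a per-VO queue looks like in OpenPBS/Torque, the qmgr commands below create a queue restricted to one VO's Unix group; the queue name, group and walltime limit are assumptions, not any particular site's configuration:

    qmgr -c "create queue atlas"
    qmgr -c "set queue atlas queue_type = Execution"
    qmgr -c "set queue atlas acl_group_enable = true"            # restrict the queue by Unix group
    qmgr -c "set queue atlas acl_groups = atlas"
    qmgr -c "set queue atlas resources_max.walltime = 48:00:00"  # per-VO queue length
    qmgr -c "set queue atlas enabled = true"
    qmgr -c "set queue atlas started = true"

Each additional VO repeats a block like this, which is one reason adding VOs becomes harder.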

6 Queues per VO (2)
The ETT calculation is rudimentary; it just increases as jobs are queued, on a per-queue basis.
–RAL only gives 1 CPU to Zeus and 399 to Atlas; the ETT calculation does not really reflect this.
–In fact RAL's queues now have a zero FIFO component.
But it still works: once Zeus jobs pile up they stop coming.

7 Take-up within LCG

8 CPU Scaling
CPU-speed variation:
–Can now be removed within the batch farm by configuring pbs_mom to normalise CPU time.
–The normalised speed is published into the information system.
Walltime scaling is more confusing:
–RAL does it; we fairshare the whole farm on walltime.
–However, what we then advertise is a lie.
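A minimal sketch of the kind of pbs_mom settings involved, assuming the Torque mom_priv/config time multipliers; the factor shown is illustrative, not RAL's real value:

    # mom_priv/config on a worker node
    $cputmult 4.7    # scale reported CPU time up to reference-CPU units
    $wallmult 4.7    # optionally scale reported walltime the same way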

9 CPU Scaling and ETT
Only at RAL is the scaling this extreme:
–Times are normalised to a Pentium 450.
–Nodes are scaled by factors of 4.7 to 5.0 at present.
–So the advertised CPU limits and walltimes are very long (9 days); for example, a physical limit of about two days multiplied by ~4.7 appears as roughly nine normalised days.
Once jobs were queued, RAL became very unattractive.
We modified the info provider to make the "ETT" comparable to other sites.
We will renormalise at some point soon.

10 OpenPBS to Maui/Torque
OpenPBS (as in LCG today) hangs when one node crashes.
–Torque is okay (most of the time).
Torque is just a new version of OpenPBS maintained by www.supercluster.org.
–No integration required.
–Active user community and mailing list.
–Well maintained; bug fixes and patches are accepted and added regularly.
Maui is a more sophisticated scheduler, capable of fairshare for instance.

11 Fairshare with Maui
The default is FIFO and so the same as the default PBS scheduler.
Maui supports fairshare on walltime. For example:
–Consider the last 7 days of operation.
–Give Atlas 50%, CMS 20%.
–Give lhcbsgm a huge priority but limit "them" to one job.
–Reserve one CPU for a 10-minute queue for monitoring jobs.
Maui will strive to reach these targets.
Tools exist to diagnose and understand why Maui is not doing what you hoped for, allowing tuning.
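A minimal maui.cfg sketch of how such targets are usually expressed; the keywords follow Maui's fairshare and standing-reservation syntax, but the values, group names and the reservation/class names are assumptions rather than RAL's actual configuration:

    # Fairshare on dedicated (wall) time over the last 7 daily windows
    FSPOLICY         DEDICATEDPS
    FSINTERVAL       24:00:00
    FSDEPTH          7
    FSWEIGHT         1
    FSGROUPWEIGHT    1

    # Per-VO usage targets
    GROUPCFG[atlas]   FSTARGET=50
    GROUPCFG[cms]     FSTARGET=20

    # Boost the VO software-manager account but cap it at one running job
    USERCFG[lhcbsgm]  PRIORITY=100000 MAXJOB=1

    # Standing reservation holding one CPU for a short monitoring queue
    SRCFG[monitor]  PERIOD=INFINITY TASKCOUNT=1 CLASSLIST=monitor

The diagnostic tools referred to above include, for example, diagnose -f for fairshare usage, diagnose -p for the priority breakdown, and checkjob for individual jobs.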

12 Heterogeneous Clusters
Many farms currently have mixed memory sizes, local disk space, …
Glue contains SubClusters, but they are currently identified only by the hostname:
GlueSubClusterUniqueID=lcgce02.gridpp.rl.ac.uk
So only one hardware type per CE is possible.
Does the RB join the SubCluster against a GlueCE object?

13 Heterogeneous Clusters (2)
It would seem easy to describe a second SubCluster and use a unique key:
GlueSubClusterUniqueID=lcgce02.gridpp.rl.ac.uk-bigmem
Different GlueCEs could then join on this?
–Does this work?
–Information providers may need tweaking.
–Will the RB do this? What else will break?
–Can the JobManager support different attributes per queue to target nodes?
–Advertising fake queues is possible.
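Very roughly, the second SubCluster would just be an extra entry published by the information provider. The LDIF fragment below is only a sketch assuming GLUE 1.x attribute names; the memory value and naming are illustrative, and the exact attribute set would need checking against the deployed schema:

    dn: GlueSubClusterUniqueID=lcgce02.gridpp.rl.ac.uk-bigmem,GlueClusterUniqueID=lcgce02.gridpp.rl.ac.uk,mds-vo-name=local,o=grid
    objectClass: GlueSubCluster
    GlueSubClusterUniqueID: lcgce02.gridpp.rl.ac.uk-bigmem
    GlueSubClusterName: bigmem
    GlueHostMainMemoryRAMSize: 4096
    GlueChunkKey: GlueClusterUniqueID=lcgce02.gridpp.rl.ac.uk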

14 Future Possibilities
One queue per VO, per memory size, per local disk space, per time period, … = a lot of queues.
Some sites only have one queue and insist on users setting requirements per job.
–It is a good idea within the batch farm.
–The Resource Broker does not pass this on to the GRAM transfer; how much do we want this?
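For illustration, per-job requirements in Torque are expressed with qsub's resource list; the values below are assumptions, not a site policy:

    # request memory and walltime explicitly rather than relying on the queue definition
    qsub -l mem=1gb,walltime=36:00:00 job.sh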

15 Maui-based Info Provider
The current info provider only interrogates PBS, and PBS has no idea what is going to happen next.
A Maui-based provider could calculate the ETT better.
But it may be difficult to port to LSF, BQS, …
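The sort of information such a provider might draw on is already visible from Maui's own query tools; a rough sketch (the job id is a placeholder, and exact options would need checking):

    showbf              # resources available for immediate use, i.e. free slots
    showstart <jobid>   # Maui's estimated start time for a queued job
    showq               # current view of running, idle and blocked jobs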

16 Conclusions
Moving to Torque and Maui is transparent.
Maui and queues per VO will introduce more control of resources within a site and increase their occupancy.
Major adjustments are needed to existing queue infrastructures.
Heterogeneous cluster support within LCG…
As LCG resources merge with other EGEE resources, new challenges arise, such as running parallel jobs in production – CrossGrid?

17 References
Maui and Torque homepages, including documentation:
–http://www.supercluster.org/
Maui/Torque RPMs appearing in LCG:
–http://www.gridpp.rl.ac.uk/tb-support/faq/torque.html
More Maui/Torque RPMs and a qstat cache mechanism:
–http://www.dutchgrid.nl/Admin/nikhef/

