Parallel Session B2 - CPU and Resource Allocation. Panelists: Charles Young (BaBar), David Bigagli

1 Parallel Session B2 - CPU and Resource Allocation
• Panelists:
  – Charles Young (BaBar)
  – David Bigagli
• Seed questions:
  – Batch queuing system in use?
  – Turnaround guarantees?
  – Pre-allocation of resources?

2 Batch System
• Vendor or home-brewed?
  – Maintenance and support issues.
• If vendor, licensing and cost issues.
  – Licensing is already a significant fraction of the hardware cost.
  – Licensed per node? Per unit of computing power? Something else?
• Management concerns.
  – Can one really manage 10K nodes?
  – Split into separate management domains?
Slide from Charles Young (BaBar)

3 LSF - Talk given by David Bigagli (Platform Computing)
• What is LSF?
• Developer's view of LSF
  – Architecture
• Scalability
• Dealing with resources
• Load Information Manager
• Batch
• Lively discussion
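LSF's Load Information Manager collects load indices from each host so that the batch layer can decide where to place jobs. The sketch below illustrates only the general idea of load-based placement; it is not LSF's actual algorithm, and `HostLoad`, `pick_host`, the load fields, and the run-queue threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostLoad:
    """Hypothetical snapshot of the load indices a load-information daemon might report."""
    name: str
    run_queue: float   # e.g. 1-minute run-queue length
    free_mem_mb: int   # available memory in MB
    up: bool = True    # host is reachable and accepting work


def pick_host(hosts, min_mem_mb, max_run_queue=4.0):
    """Return the name of the least-loaded eligible host, or None if none qualifies."""
    eligible = [
        h for h in hosts
        if h.up and h.free_mem_mb >= min_mem_mb and h.run_queue <= max_run_queue
    ]
    if not eligible:
        return None
    # Prefer the shortest run queue; break ties by the most free memory.
    best = min(eligible, key=lambda h: (h.run_queue, -h.free_mem_mb))
    return best.name


if __name__ == "__main__":
    cluster = [
        HostLoad("node01", run_queue=3.5, free_mem_mb=2048),
        HostLoad("node02", run_queue=0.4, free_mem_mb=512),
        HostLoad("node03", run_queue=1.1, free_mem_mb=4096),
        HostLoad("node04", run_queue=0.2, free_mem_mb=8192, up=False),
    ]
    print(pick_host(cluster, min_mem_mb=1024))  # node03
```

Here node02 is skipped for lack of memory and node04 because it is down, so the lightly loaded node03 wins over node01.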

4 Discussion: which batch systems are you using and why?
• LSF (~30%?)
  – LSF has worked for us and continues to work.
• PBS (~30%)
  – PBS is free.
  – Collaborators want us to use PBS (because PBS is free).
  – Ability to modify the source.
  – Nobody is using PBS Pro.
• Condor (~20%)
  – Condor costs nothing.
  – Cycle stealing allows us to get computing done.
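Cycle stealing means running batch jobs on desktop machines only while their owners are away. Condor lets sites express this as configurable policy expressions (such as START and SUSPEND); the Python sketch below mimics that style of rule, but the function names, the `Workstation` fields, and the thresholds are hypothetical and are not Condor's default policy.

```python
from dataclasses import dataclass

# Hypothetical thresholds; a real site would tune these.
IDLE_BEFORE_START_S = 15 * 60  # owner must be away this long before a guest job starts
RETURN_IDLE_S = 60             # activity more recent than this means the owner is back
MAX_OWNER_LOAD = 0.3           # non-guest load tolerated on the machine


@dataclass
class Workstation:
    keyboard_idle_s: int  # seconds since the last keyboard/mouse activity
    owner_load: float     # load average excluding any guest job


def may_start(ws):
    """Start a guest job only when the owner appears to be away and the machine is quiet."""
    return ws.keyboard_idle_s > IDLE_BEFORE_START_S and ws.owner_load <= MAX_OWNER_LOAD


def should_suspend(ws):
    """Hand the machine back as soon as the owner returns or starts real work."""
    return ws.keyboard_idle_s < RETURN_IDLE_S or ws.owner_load > MAX_OWNER_LOAD
```

The asymmetry is deliberate: starting requires a long idle period, while suspending reacts to any recent activity, so owners barely notice the guest jobs.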

5 Discussion: which batch systems are you using and why? (continued)
• BQS
  – Home-grown at IN2P3; used at a small number of external sites.
  – Have had it for seven years.
  – Everyone likes it.
• FBS
  – Home-grown at FNAL; used at some external sites.
  – LSF is expensive.
  – FBS is designed to be used on farms.
  – Lightweight and flexible.

6 Other issues
• Mosix
  – Only CERN has looked at it.
  – It appears to be difficult to take individual machines in the cluster down.
• How to deal with abusers?
  – Turn them over to the user community.
• How do people schedule downtime?
  – Train users that jobs running longer than 24 hours are at risk.
  – CERN posts a future shutdown time to the job starter (internal tool).
  – BQS has this feature built in.
  – Condor has eventd for draining.
  – Labs reboot and have maintenance windows.
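Posting a future shutdown time lets a job starter drain a machine gracefully: running jobs continue, but new jobs are refused if they could not finish before the shutdown. A minimal sketch, assuming each job declares a requested wall time; `can_start` and the dates are hypothetical, and this is not the CERN, BQS, or Condor implementation.

```python
from datetime import datetime, timedelta


def can_start(now, requested_walltime, shutdown_at):
    """Start a job only if its requested wall time ends by the posted shutdown.

    shutdown_at is None when no shutdown has been announced.
    """
    if shutdown_at is None:
        return True
    return now + requested_walltime <= shutdown_at


if __name__ == "__main__":
    shutdown = datetime(2023, 1, 11, 8, 0)  # hypothetical maintenance window start
    now = datetime(2023, 1, 10, 20, 0)
    print(can_start(now, timedelta(hours=6), shutdown))   # True: ends at 02:00
    print(can_start(now, timedelta(hours=24), shutdown))  # False: would still be running
```

This also shows why long jobs are the ones at risk: the longer the requested wall time, the earlier a machine stops accepting that job ahead of a posted shutdown.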
