Cost-effective clustering with OpenPBS (13/02/2003). Ben Webb, WGR Research Group, Physical and Theoretical Chemistry.


1 Cost-effective clustering with OpenPBS
Ben Webb
WGR Research Group, Physical and Theoretical Chemistry Lab., University of Oxford
13/02/2003

2 Overview
History of PBS
Interests of the WGR group
OpenPBS architecture: portability, security, scheduling
Grid integration
Alternatives

3 History of PBS
PBS is the Portable Batch System
Developed from 1993 to 1997 for NASA
Intended to replace NQS
Currently available as:
  – OpenPBS (open source)
  – PBSPro (commercial)

4 Interests of the WGR group
High throughput
  – Virtual screening (cancer screensaver)
  – Met by loose grid of over 2 million PCs; United Devices/Intel
High performance
  – Ab initio chemistry
  – Simulation of chemical reactions (free energy)
  – Met by OpenPBS at zero software cost

5 OpenPBS architecture
Server: keeps track of all jobs
Scheduler: tells the server when and where to run jobs
MOM (Machine Oriented Miniserver): runs on each node to start, monitor, and terminate jobs, under instruction from the server
POSIX-compliant batch system
Supports file staging for executables and data
No need for a shared filesystem (e.g. NFS), although this does simplify communication
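A job that relies on file staging rather than a shared filesystem is normally described in a submission script. The Python sketch below only assembles such a script as text. The job name, host name, paths and the helper `build_job_script` are made up for illustration, and the directives (`-N`, `-l nodes=...:ppn=...`, `-W stagein=local@host:remote`) follow the usual qsub conventions rather than anything shown in the talk.

```python
def build_job_script(name, nodes, ppn, walltime, command,
                     stage_in=(), stage_out=()):
    """Return the text of a PBS job script (hypothetical helper).

    stage_in and stage_out are sequences of (local_path, host, remote_path)
    tuples. Each list is written as one comma-separated -W directive.
    """
    def staging(pairs):
        return ",".join(f"{local}@{host}:{remote}"
                        for local, host, remote in pairs)

    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",
        f"#PBS -l nodes={nodes}:ppn={ppn},walltime={walltime}",
    ]
    if stage_in:
        # Files copied to the execution node before the job starts.
        lines.append(f"#PBS -W stagein={staging(stage_in)}")
    if stage_out:
        # Files copied back from the execution node after the job ends.
        lines.append(f"#PBS -W stageout={staging(stage_out)}")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All names below are illustrative only.
    print(build_job_script(
        "abinitio", nodes=2, ppn=2, walltime="01:00:00",
        command="./run_calc input.dat",
        stage_in=[("input.dat", "fileserver", "/data/input.dat")],
        stage_out=[("output.log", "fileserver", "/data/output.log")],
    ))
```

The resulting text would be handed to `qsub`; the server queues the job, the scheduler picks a node, and the MOM on that node performs the staging and runs the command.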

6 An example OpenPBS setup

7 Advantages of PBSPro
Pre-emptive job scheduling
Scheduler backfilling
Improved fault tolerance
Desktop cycle harvesting
Paid support (all OpenPBS support is via mailing lists)
Largely compatible with OpenPBS

8 Portability
Runs on most Unix-like systems, e.g. Linux/Irix/Unicos/HPUX/IA64
MOMs for various architectures take advantage of system-specific features
  – e.g. checkpointing supported on certain architectures
Full server/client/MOM support for heterogeneous networks

9 Queues and nodes
Unlike NQS, PBS does not rely on queues for scheduling decisions
Queues are not tied to nodes, but can specify resources
Routing queues can pass jobs to execution queues, possibly on different PBS servers
Nodes can have any number of virtual processors

10 Resource definition
Server-defined properties group nodes into classes, e.g. intel for all Intel architecture machines
Additional resources (e.g. tape drives, software licences) can be specified by each MOM
  – Custom resources are not utilised by the default scheduler
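Node properties and virtual processor counts are typically listed in the server's nodes file, one node per line. The Python sketch below parses text in that style and selects nodes by property. It assumes the common layout `name[:ts] [property ...] [np=N]`; the node names and the `bigmem` property are invented for the example.

```python
def parse_nodes_file(text):
    """Parse nodes-file style text into {name: info} (illustrative sketch).

    Assumed line layout: name[:ts] [property ...] [np=N]
    A ':ts' suffix marks a timeshared node; np defaults to 1.
    """
    nodes = {}
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        name, rest = tokens[0], tokens[1:]
        timeshared = name.endswith(":ts")
        if timeshared:
            name = name[:-3]
        info = {"np": 1, "properties": set(), "timeshared": timeshared}
        for tok in rest:
            if tok.startswith("np="):
                info["np"] = int(tok[3:])
            else:
                info["properties"].add(tok)
        nodes[name] = info
    return nodes


def nodes_with(nodes, prop):
    """Return the sorted names of nodes carrying the given property."""
    return sorted(n for n, info in nodes.items() if prop in info["properties"])


# Hypothetical cluster: two dual-CPU Intel nodes and one timeshared SGI host.
SAMPLE = """\
node01 intel np=2
node02 intel bigmem np=2
origin:ts sgi
"""

if __name__ == "__main__":
    cluster = parse_nodes_file(SAMPLE)
    print(nodes_with(cluster, "intel"))
```

A job would then ask for a class of node by property, for example with a resource request such as `nodes=2:intel`.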

11 Resource usage
Timeshared nodes: balanced by load
Cluster nodes: jobs allocated to virtual processors, usually exclusively
MOMs track jobs and kill any that exceed resource limits (e.g. CPU or wall time, memory)
No unified mechanism for accounting of running and finished jobs
  – qstat for running jobs
  – Server accounting logs for finished jobs
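Because finished jobs appear only in the server accounting logs, sites often write a small script to summarise them. The Python sketch below totals wall time per user from job-end records. It assumes the semicolon-separated record layout `date time;type;job id;key=value ...` with type `E` for a finished job; the sample records, user names and job ids are invented.

```python
from collections import defaultdict


def hms_to_seconds(hms):
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def walltime_by_user(log_text):
    """Sum resources_used.walltime over job-end ('E') records, per user."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        parts = line.split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue  # not a job-end record
        fields = dict(f.split("=", 1) for f in parts[3].split() if "=" in f)
        user = fields.get("user")
        wall = fields.get("resources_used.walltime")
        if user and wall:
            totals[user] += hms_to_seconds(wall)
    return dict(totals)


# Invented sample records in the assumed layout.
SAMPLE_LOG = """\
02/13/2003 10:00:00;S;101.server;user=ben queue=workq
02/13/2003 11:30:00;E;101.server;user=ben queue=workq resources_used.cput=01:20:00 resources_used.walltime=01:30:00
02/13/2003 12:00:00;E;102.server;user=alice queue=workq resources_used.walltime=00:15:30
02/13/2003 13:00:00;E;103.server;user=ben queue=workq resources_used.walltime=00:30:00
"""

if __name__ == "__main__":
    for user, seconds in sorted(walltime_by_user(SAMPLE_LOG).items()):
        print(user, seconds)
```

Running jobs would still need to be queried separately with `qstat`, which is the gap the slide points out.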

12 Scheduling
Scheduler is just a privileged client
Well-defined PBS scheduling API
Facilities to write schedulers in C/BaSL/Tcl
OpenPBS provides a simple FIFO scheduler, as well as custom schedulers to take advantage of system-specific features
Maui scheduler (third party) also integrates with other batch systems, and provides powerful scheduling
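The idea of a first-in, first-out policy over virtual processors can be illustrated with a toy model. The Python sketch below is not the PBS scheduling API and not the bundled FIFO scheduler; it is a simplified strict-FIFO loop in which each job runs on a single node and no backfilling is attempted. The function name and data shapes are invented.

```python
def fifo_schedule(jobs, free_vps):
    """Toy strict-FIFO placement (illustrative only).

    jobs:     list of (job_id, vps_needed) in submission order
    free_vps: dict mapping node name to free virtual processors
    Returns (placements, waiting): a {job_id: node} dict and the list of
    job ids left queued once the first job that does not fit is reached.
    """
    free = dict(free_vps)  # work on a copy; do not mutate the caller's dict
    placements = {}
    waiting = []
    for index, (job_id, needed) in enumerate(jobs):
        # First node (in name order) with enough free virtual processors.
        node = next((n for n in sorted(free) if free[n] >= needed), None)
        if node is None:
            # Strict FIFO: later jobs may not overtake a blocked one.
            waiting = [jid for jid, _ in jobs[index:]]
            break
        free[node] -= needed
        placements[job_id] = node
    return placements, waiting


if __name__ == "__main__":
    queue = [("a", 2), ("b", 4), ("c", 1)]
    print(fifo_schedule(queue, {"n1": 2, "n2": 2}))
```

In this example job `c` would fit on `n2` but stays queued behind `b`. Letting it run early is what backfilling adds, which the talk lists as a PBSPro feature and which schedulers such as Maui also address.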

13 Security
Uses rhosts mechanism for authentication of clients to the server (consistent user name space not required), but does not require rsh
MOMs can use rsh, ssh or cp (via NFS) to stage files in and out
Access Control Lists can also be used to provide extra security
PBS daemons use non-random port numbers, and TCP for most communication, allowing straightforward firewalling
All daemons run as root! (No reported vulnerabilities to date, however.)

14 Parallel support
Conventional MPI mechanisms rely on well-behaved users, and lack resource tracking
OpenPBS provides a Task Manager (TM) API
  – Allows parallel PBS jobs to spawn processes on nodes other than the master
  – mpiexec (third party) allows start-up of MPI jobs via the TM mechanism (MPICH/EMP/LAM)
  – Current LAM CVS also has a PBS-TM boot SSI (system services interface) for job start-up
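Inside a running parallel job, PBS exposes the allocation through the file named by the `PBS_NODEFILE` environment variable, which lists one host name per allocated virtual processor. The Python sketch below counts slots per host from such a file. It does not use the TM API itself (a C interface), and the helper names and host names are invented for illustration.

```python
import os
from collections import Counter


def count_slots(lines):
    """Count allocated virtual processors per host from node-file lines."""
    hosts = [line.strip() for line in lines if line.strip()]
    return Counter(hosts)


def read_allocation(path=None):
    """Read the node file of the current job (defaults to $PBS_NODEFILE)."""
    path = path or os.environ["PBS_NODEFILE"]
    with open(path) as handle:
        return count_slots(handle)


if __name__ == "__main__":
    # Hypothetical contents for a job granted two processors on each node.
    example = ["node01\n", "node01\n", "node02\n", "node02\n"]
    for host, slots in count_slots(example).items():
        print(host, slots)
```

A launcher built on the TM mechanism, such as mpiexec, starts the remote processes through the MOMs, so they stay under PBS resource tracking instead of depending on rsh and well-behaved users.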

15 Customisation
Full source code available, for commercial or non-commercial use
Site-specific modification routines allow easy customisation of likely targets
Defined C API for job submission, query etc.
Third-party projects and patches, e.g. mpiexec, Cplant (fault tolerance), PyPBS, scalability patches, AFS token management

16 Grid integration
Globus Resource Allocation Manager (GRAM) available for PBS
Maui scheduler or PBSPro default scheduler support advance reservations
Silver metascheduler is grid-aware, has full support for PBS, and can work with or without Globus

17 Comparison with Sun Grid Engine
Both systems perform balancing of jobs/load between managed nodes
PBS server is a single point of failure; SGE supports shadow masters
SGE now seems to be more actively developed than OpenPBS

18 Summary and acknowledgements
OpenPBS is a cheap solution for Linux clustering, conventional supercomputer management, and/or use of idle workstations
Can upgrade easily to PBSPro if desired
PBS includes software developed by NASA Ames Research Center, Lawrence Livermore National Laboratory, and Veridian Information Solutions, Inc.
Visit www.OpenPBS.org for OpenPBS software support, products, and information.
WGR group webpages: http://bellatrix.pcl.ox.ac.uk/

