Process Manager Specification Rusty Lusk 1/15/04.


1 Process Manager Specification Rusty Lusk lusk@mcs.anl.gov 1/15/04

2 Outline
- Process Manager Functionality
- Expected Consumers
- Commands
- Semantics
- Examples
- Schema

3 Process Manager Functionality
- Process Execution
  - Start process groups
  - Provide status information during execution
  - Provide command output and error messages
  - Return exit status information
- Process Group Control
  - Kill process groups
  - Signal process groups
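The functionality listed above can be illustrated with a minimal single-host sketch. It is not the SSS Process Manager or MPD: the class name `ProcessGroupManager` and all of its methods are hypothetical, and it uses only Python's standard `subprocess` module to start a group, report status, collect output and exit status, and signal or kill the group.

```python
import signal
import subprocess
import sys


class ProcessGroupManager:
    """Toy process manager (hypothetical API, one host, no security)."""

    def __init__(self):
        self._groups = {}   # group id -> list of subprocess.Popen objects
        self._next_id = 1

    def start_group(self, argv, count):
        """Start `count` copies of the command `argv`; return a new group id."""
        procs = [
            subprocess.Popen(argv, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, text=True)
            for _ in range(count)
        ]
        gid = self._next_id
        self._next_id += 1
        self._groups[gid] = procs
        return gid

    def status(self, gid):
        """Return (pid, exit status) pairs; exit status is None while running."""
        return [(p.pid, p.poll()) for p in self._groups[gid]]

    def signal_group(self, gid, signum=signal.SIGTERM):
        """Send a signal to every process in the group that is still running."""
        for p in self._groups[gid]:
            if p.poll() is None:
                p.send_signal(signum)

    def kill_group(self, gid):
        """Forcibly terminate every process in the group that is still running."""
        for p in self._groups[gid]:
            if p.poll() is None:
                p.kill()

    def wait_group(self, gid):
        """Wait for all processes; return (exit status, stdout, stderr) each."""
        results = []
        for p in self._groups[gid]:
            out, err = p.communicate()
            results.append((p.returncode, out, err))
        return results
```

For example, `mgr.start_group([sys.executable, "-c", "print('hello')"], 2)` followed by `mgr.wait_group(gid)` returns one result tuple per process in the group.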

4 Expected Consumers
- Components which execute programs
- Components which need to locate running processes
- Components which need to control running processes

5 Schematic of Process Management Component in Scalable Systems Software Context
[Figure: two sides joined by the Process Manager (PM). SSS side: SSS components QM, PM, EM, SD, Sched, and NSM, communicating in SSS XML. Prototype MPD-based implementation side: MPD’s and application processes, reached through mpdrun (XML file), mpiexec (MPI Standard args), QM’s job submission language, or interactive use via simple scripts or hairy GUIs using SSS XML.]

6 Commands
- creates a new process group.
- get status information; includes current process ids, exit status information and stdout/err information.
- send a Unix signal to all processes in a process group
- kill all processes in a process group
- allow process manager to discard process group information after process group has exited.
All commands use the restriction syntax
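As a rough illustration of how such commands might travel as XML messages, the sketch below builds and parses a command with Python's standard `xml.etree.ElementTree`. Everything specific here is an assumption: the tag names (`kill-process-group`, `process-group`), the attribute names (`pgid`, `user`), and the reading of "restriction syntax" as attribute values that select the matching process groups. None of it is taken from the SSS schema.

```python
import xml.etree.ElementTree as ET


def build_command(name, **restrictions):
    """Serialize a command whose attributes restrict which groups it targets.

    `name` and the attribute names are hypothetical, not the SSS schema.
    """
    elem = ET.Element(name)
    group = ET.SubElement(elem, "process-group")
    for key, value in sorted(restrictions.items()):
        group.set(key, str(value))
    return ET.tostring(elem, encoding="unicode")


def parse_command(text):
    """Return (command name, restriction attributes) from a message."""
    elem = ET.fromstring(text)
    group = elem.find("process-group")
    return elem.tag, dict(group.attrib)
```

A receiver could then dispatch on the command name and apply the operation to every process group whose attributes match the restrictions.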

7 Examples node1 node2

8 Examples (continued)

9 Examples (continued) Response:

10 Using the SSS Software Architecture on Chiba City

11 Chiba City
- Medium-sized cluster at Argonne National Laboratory
  - 256 dual-processor 500 MHz PIII’s
  - Myrinet
  - Linux (and sometimes others)
  - No shared file system, for scalability (but now a test platform for PVFS2)
- Dedicated to Computer Science scalability research, not applications
- Many groups use it as a research platform
  - Both academic and commercial
- Also used by friendly, hungry applications
- New requirement: support research requiring specialized kernels and alternate operating systems, for OS scalability research

12 New Challenges
- Want to schedule jobs that require node rebuilds (for new OS’s, kernel module tests, etc.) as part of “normal” job scheduling
- Want to build larger virtual clusters (using VMware or User Mode Linux) temporarily, as part of “normal” job scheduling
- Requires major upgrade of Chiba City systems software

13 Chiba Commits to SSS
- Fork in the road (occurred August 2003):
  - Major overhaul of old Chiba systems software (OpenPBS + Maui scheduler + homegrown stuff), OR
  - Take great leap forward and bet on all-new software architecture of SSS
- Problems with leaping approach:
  - SSS interfaces not finalized
  - Some components don’t yet use library (implement own protocols in open code, not encapsulated in library)
  - Some components not fully functional yet
- Solutions to problems:
  - Collect components that are adequately functional and integrated (PM, SD, EM, BCM)
  - Write “stubs” for other critical components (Sched, QM)
  - Do without some components (CKPT, monitors, accounting) for the time being

14 Features of Adopted Solution
- Stubs adequate, at least for time being
  - Scheduler does FIFO + reservations + backfill, improving
  - QM implements “PBS compatibility mode” (accepts user PBS scripts) as well as asking Process Manager to start parallel jobs directly
- Process Manager wraps MPD-2
- Single ring of MPD’s runs as root, managing all jobs for all users
- MPD’s started by Build-and-Config manager at boot time
- An MPI program called MPISH (MPI Shell) wraps user jobs for handling file staging and multiple job steps
- Python implementation of most components
- Demonstrated feasibility of using SSS component approach to systems software
  - Running normal Chiba job mix for over five months now
  - Moving forward on meeting new requirements for research support
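To make "FIFO + reservations + backfill" concrete, here is a minimal sketch of one scheduling pass in the EASY-backfill style. It is not the Chiba City scheduler stub: the `Job` type, the `schedule_pass` function, and the rule that only the blocked head-of-queue job holds a reservation are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int     # nodes requested
    runtime: int   # user-supplied runtime estimate


def schedule_pass(now, free_nodes, running, queue):
    """Return names of jobs to start now.

    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of Job in FIFO order.
    """
    started = []
    queue = list(queue)
    running = list(running)

    # FIFO: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # Reservation: earliest time at which the blocked head job can start.
    head = queue[0]
    avail = free_nodes
    shadow_time = None
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end
            break
    if shadow_time is None:
        raise ValueError(f"{head.name} requests more nodes than the cluster has")
    extra = avail - head.nodes  # nodes the head job will not need at that time

    # Backfill: later jobs may start now only if they cannot delay the head job.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_in_time = now + job.runtime <= shadow_time
        if ends_in_time or job.nodes <= extra:
            free_nodes -= job.nodes
            if not ends_in_time:
                extra -= job.nodes
            started.append(job.name)
    return started
```

A later job is backfilled either because it finishes before the head job's reserved start time or because it fits in nodes the head job will not need, so the head job is never delayed as long as the runtime estimates hold.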

15 Next Steps
- Integrate other components into this structure
- Integrate other instantiations of components into this structure
- Replace stubs as possible
  - Easiest if they use same XML API’s
- Put “unusual” capabilities into production
  - Rebuilding nodes on the fly

