1 © 2007 IBM Corporation IBM Global Engineering Solutions IBM Blue Gene/P Job Submission

2 IBM Blue Gene/P System Administration Job submission
- Basic procedure
  1. Create a block
  2. Allocate a block
  3. Boot a block
  4. Run a job
  5. Free the block, or run another job

3 Block creation
- Two ways of creating blocks
  - Block builder
  - mmcs_db_console
- Using the block builder is recommended
  - The block builder can create any available block
  - The block builder is much easier to use

4 Block creation
- Block builder
  - Available via Navigator
  - Able to create any block with a valid block size
    - 16, 32, 64, 128 and 256 nodes (mesh)
    - 512 and multiples of 512 nodes (torus/mesh)
  - Starting card
    - 16 node: J00, J01
    - 64 node: N00, N02, N04, N08, N10, N12 or N14
    - 128 node: N00, N04, N08 or N12
    - 256 node: N00 or N08
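
The block-size rule on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the Blue Gene/P tooling: the function name `is_valid_block_size` and the constants are hypothetical, and the code encodes only the sizes listed on the slide (it does not check starting cards or hardware availability).

```python
# Sub-midplane (mesh) block sizes listed on the slide.
SMALL_BLOCK_SIZES = {16, 32, 64, 128, 256}
# Larger blocks must be 512 nodes or a multiple of 512 nodes.
MIDPLANE_NODES = 512


def is_valid_block_size(nodes):
    """Return True if `nodes` is one of the block sizes the slide lists as valid."""
    if nodes in SMALL_BLOCK_SIZES:
        return True
    return nodes >= MIDPLANE_NODES and nodes % MIDPLANE_NODES == 0


if __name__ == "__main__":
    for n in (16, 48, 256, 512, 768, 1024):
        print(n, is_valid_block_size(n))
```

A check like this is only a first filter; the block builder remains the authority on which blocks can actually be created on a given machine.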

5 Block creation
- mmcs_db_console
  - Able to create most of the available blocks
  - Provides a set of commands to create blocks
    - genblock: a base partition
    - genblocks: each base partition on the system
    - gensmallblock: a sub-base partition
    - genBPblock: a set of base partitions
    - genfullblock: the entire system
  - Use Navigator for pass-through or split cables

6 Block deletion
- Available via mmcs_db_console
    mmcs$ delete bgpblock R00-M0
  Type 'help delete' at the mmcs shell prompt for usage
- Block deletion is not available via Navigator's GUI
  - The mmcs_db_console within the Navigator is available

7 Exercise
- Create a block from the block builder
- Create a block from the mmcs_db_console
- Delete a block from the mmcs_db_console

8 Job modes
- There are three job modes: Virtual Node Mode (VNM), SMP Mode, and Dual Mode
- MPI ranks (processes) per node and threads per process:
  - VNM: 4 processes/node, 1 thread/process
  - SMP: 1 process/node, 4 threads/process
  - Dual: 2 processes/node, 2 threads/process
- Core assignment in each mode:
  Mode                CPU 0    CPU 1    CPU 2    CPU 3
  Virtual Node Mode   Rank 0   Rank 1   Rank 2   Rank 3
  SMP Mode            Rank 0   thread   thread   thread
  Dual Mode           Rank 0   thread   Rank 1   thread
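
To make the arithmetic behind the three modes concrete, here is a small Python sketch that computes how many MPI ranks a block provides in each mode. The dictionary keys (`VN`, `SMP`, `DUAL`) and the function name `job_layout` are chosen for illustration only; the per-node figures are the ones given on the slide.

```python
# (processes per node, threads per process) for each job mode, as listed on the slide.
# The key spellings are an assumption made for this sketch.
JOB_MODES = {
    "VN": (4, 1),    # Virtual Node Mode
    "SMP": (1, 4),   # SMP Mode
    "DUAL": (2, 2),  # Dual Mode
}


def job_layout(nodes, mode):
    """Return (total MPI ranks, threads per rank) for `nodes` compute nodes in `mode`."""
    procs_per_node, threads_per_proc = JOB_MODES[mode.upper()]
    return nodes * procs_per_node, threads_per_proc


if __name__ == "__main__":
    for mode in JOB_MODES:
        ranks, threads = job_layout(512, mode)
        print(mode, ranks, "ranks,", threads, "thread(s) per rank")
```

In every mode the product of processes per node and threads per process is 4, which matches the four CPUs per node shown in the core-assignment layout.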

9 Job submission
- Ways to submit a job
  - mmcs_db_console
  - mpirun
  - LoadLeveler

10 Job submission
- mmcs_db_console
  - A console for the Midplane Management Control System (MMCS)
  - Used to configure and allocate blocks of compute nodes and I/O nodes and to run programs on the BG/P system
  - Intended mainly for administrator use
  - Requires access to the service node
  - Environment variables need to be set
    - /etc/profile.d/bgp.sh
  - Caveats when submitting jobs from the console
    - No stdin support
    - stdout and stderr are sent to files

11 Job submission
- mmcs_db_console
    1. $ cd /bgsys/drivers/ppcfloor/bin
    2. $ ./mmcs_db_console
    3. mmcs$ allocate_block R00-M0
    4. mmcs$ boot_block
    5. mmcs$ submit_job /bghome/test/hello.rts /bghome/test
    6. mmcs$ free R00-M0
    7. mmcs$ quit
  Type 'help' at the mmcs shell prompt for available commands

12 Job submission
- mmcs commands
  - allocate_block: marks the block as allocated, but does not boot it
  - boot_block: initializes, loads and starts the block resources
  - submit_job: starts an executable running on the currently selected block
  - free: releases the resources associated with the block ID

13 Job submission
- mpirun
  - Launches jobs on the BG/P hardware and acts as a job monitor
    - mpirun continually monitors the status of the job and terminates when the job is done
    - Transparently forwards stdin and receives stdout and stderr
  - Acts as a gateway for debuggers such as gdb and TotalView
  - Each job requires a partition
    - Can be allocated on the fly (-np or -shape)
    - Or use a predefined partition
  - Can boot partitions from their initial state
    - Disable this feature with -noallocate
    - The user should verify that no overlapping hardware is busy
  - Can optionally leave booted partitions in place with -nofree

14 Job submission
- mpirun
    $ mpirun -partition R00-M0 -mode SMP -cwd /bghome/test -exe /bghome/test/hello.rts
  - partition: specifies which block to use
  - mode: specifies the execution processor mode
  - cwd: specifies the current working directory
  - exe: specifies the program to run
  Type mpirun -h for available options
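
When mpirun is driven from a wrapper script, it helps to build the argument list in one place. The Python sketch below assembles the command from the four options shown on this slide and only prints it; the helper name `build_mpirun_command` is hypothetical, and actually launching the job (for example with `subprocess.run`) would require a front-end node where mpirun is installed.

```python
import shlex


def build_mpirun_command(partition, mode, cwd, exe):
    """Assemble an mpirun argument list from the four options shown on the slide."""
    return [
        "mpirun",
        "-partition", partition,  # which block to use
        "-mode", mode,            # execution processor mode
        "-cwd", cwd,              # current working directory
        "-exe", exe,              # program to run
    ]


if __name__ == "__main__":
    argv = build_mpirun_command(
        "R00-M0", "SMP", "/bghome/test", "/bghome/test/hello.rts"
    )
    # Print a shell-safe rendering of the command instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if a path ever contains spaces.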

15 Job submission
- LoadLeveler
  - Allocates machine resources to run jobs
  - Scheduling of jobs depends on the availability of resources within the system
  - A user submits a job using a job command file
  - Maximizes the efficiency of the cluster by maximizing the utilization of resources

16 Job submission
- Some of the tasks LoadLeveler can perform:
  - Choosing the next job to run
  - Examining the job requirements
  - Collecting available resources in its cluster
  - Dispatching the job to the selected machine
  - Controlling running jobs
  - Creating reservations and scheduling jobs to run in the reservations
  - Job preemption to enable high-priority jobs to run immediately
  - Fair share scheduling to automatically balance resources among users or groups of users
  - Co-scheduling to enable several jobs to be scheduled to run at the same time

17 Example code
1. Write a simple hello world program:
    /* Hello World program */
    #include <stdio.h>
    int main(void)
    {
        printf("Hello World!\n");
        return 0;
    }
2. Compile the program:
    /bgsys/drivers/ppcfloor/comm/bin/mpicc -o hello hello.c
3. Run the program. Assuming that the program lives in /bgsys/apps and you want the results (stdout and stderr) to be written to /bgsys/apps/results, at the mmcs_db_console prompt:
    mmcs$ submit_job /bgsys/apps/hello /bgsys/apps/results/

18 Exercise
- Submit a job using mmcs_db_console
- Free the block after the job finishes
- Submit a job using mpirun
- Submit a job using LoadLeveler

19 Job termination
- mmcs_db_console
  - killjob, kill_job
      1. mmcs$ killjob R00-M0 124
      2. mmcs$ wait_job
  - Terminating a job can take a while
  - Default timeout is 5 minutes

20 Job termination
- mpirun
  - Control-C
    - mpirund will do a cleanup
    - Do not send multiple Control-C signals
      - A second Control-C will force termination
      - A third Control-C is nearly equivalent to kill -9, which may leave the block state in limbo

21 Scripting
- A list of mmcs_db_console commands can be written to a file and used as a script
    $ mmcs_db_console < script_file
- script_file is a plain ASCII text file containing a list of mmcs_db_console commands

22 Scripting
- Sample script_file: create and test several blocks
    $ cat script_file
    genblock R00-M0 R00-M0 64
    allocate R00-M0
    free R00-M0
    genblock R00-M1 R00-M1 64
    allocate R00-M1
    free R00-M1
    …
    quit
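
Because the sample script repeats the same three commands per midplane, it can be generated rather than typed. The Python sketch below reproduces the pattern from the slide; the function name `make_block_test_script` is hypothetical, and the trailing `64` is copied verbatim from the sample without interpreting it (check 'help genblock' in the mmcs shell for the meaning of each argument).

```python
def make_block_test_script(midplanes, genblock_arg="64"):
    """Return mmcs_db_console script text that creates, allocates and frees
    one block per midplane, following the pattern of the sample script_file."""
    lines = []
    for midplane in midplanes:
        block_id = midplane  # the sample names each block after its midplane
        # genblock_arg is passed through unchanged from the sample (assumption:
        # its meaning is not interpreted here).
        lines.append(f"genblock {block_id} {midplane} {genblock_arg}")
        lines.append(f"allocate {block_id}")
        lines.append(f"free {block_id}")
    lines.append("quit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_block_test_script(["R00-M0", "R00-M1"]), end="")
```

The generated text could be saved to a file and fed to the console with the `mmcs_db_console < script_file` redirection described on the previous slide.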

23 Bridge API
- Public API used by job schedulers
  - LoadLeveler, SLURM, Altair PBS Pro, Platform LSF, Cobalt
  - Used by mpirun too
- Has interfaces to manage various Blue Gene resources
  - Create, destroy, and query logical constructs such as jobs and partitions
  - Query physical entities such as midplanes, node cards, switches, and cables
- Essentially a thin abstraction layer over the database
- Requires a polling model to obtain machine state, for example:
  1. Grab a snapshot of the machine state
  2. Create a partition based on free resources
  3. Boot the partition
  4. Poll the partition state until it is INITIALIZED
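
The last step of the polling model can be sketched generically. The Python code below does not call the Bridge API: `get_state` is a hypothetical callable standing in for whatever state query a scheduler uses, `BOOTING` is a placeholder state name invented for the demo, and only `INITIALIZED` comes from the slide. The timeout and interval values are likewise illustrative.

```python
import time


def wait_until_initialized(get_state, poll_interval=1.0, timeout=300.0,
                           sleep=time.sleep):
    """Poll get_state() until it returns "INITIALIZED".

    Returns True on success, or False once roughly `timeout` seconds
    of polling have elapsed without reaching that state.
    """
    waited = 0.0
    while True:
        if get_state() == "INITIALIZED":
            return True
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval


if __name__ == "__main__":
    # Stand-in for a real state query: reports a placeholder state twice,
    # then INITIALIZED.
    states = iter(["BOOTING", "BOOTING", "INITIALIZED"])
    print(wait_until_initialized(lambda: next(states), poll_interval=0.01))
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; a production scheduler would also need to handle error states, which this sketch ignores.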

