© 2007 IBM Corporation IBM Global Engineering Solutions IBM Blue Gene/P LoadLeveler Blue Gene Support Enci Zhong LoadLeveler Development.


2 IBM Blue Gene/P System Administration
Interaction with Blue Gene
[Diagram] Blue Gene jobs are submitted to LoadLeveler. Through the Blue Gene Bridge API, LoadLeveler gets resource and job data and defines partitions; it finds resources for each job and runs the job with Blue Gene mpirun.

3 LoadLeveler Daemons
[Diagram]
- Service Node: LoadL_master (Master) and LoadL_negotiator (Central Manager)
- Front End Node: LoadL_master (Master), LoadL_schedd (Schedd), LoadL_startd and LoadL_starter (Startd/Starter), and the jobs, which run Blue Gene mpirun
- Blue Gene Bridge API

4 LoadLeveler Configuration
[Diagram] Configuration files on the Service Node and the Front End Node: /etc/LoadL.cfg, LoadL_config, LoadL_admin, and LoadL_config.local

5 LoadL_config
SCHEDULER_TYPE = BACKFILL
NEGOTIATOR_CYCLE_DELAY = 10
VM_IMAGE_ALGORITHM = FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY
BG_ENABLED = true
BG_CACHE_PARTITIONS = true
BG_MIN_PARTITION_SIZE = 32
CM_CHECK_USERID = false
BG_ALLOW_LL_JOBS_ONLY = false
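The settings above are plain KEYWORD = value lines. As a quick way to sanity-check a LoadL_config before restarting the daemons, a minimal Python sketch could read those lines into a dictionary. The helper names (parse_loadl_config, is_true) are hypothetical, and the parser deliberately ignores LoadLeveler features such as macros and continuation lines.

```python
def parse_loadl_config(text):
    """Parse simple 'KEYWORD = value' lines into a dict.

    Hypothetical helper: one keyword per line and '#' comments only;
    LoadLeveler's own parser (macros, continuation lines) is not reproduced.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key.upper()] = value  # assumption: keyword names are case-insensitive
    return settings


def is_true(value):
    # The slides mix "true", "True" and "TRUE", so compare case-insensitively.
    return value.strip().lower() == "true"


SLIDE_CONFIG = """
SCHEDULER_TYPE = BACKFILL
NEGOTIATOR_CYCLE_DELAY = 10
VM_IMAGE_ALGORITHM = FREE_PAGING_SPACE_PLUS_FREE_REAL_MEMORY
BG_ENABLED = true
BG_CACHE_PARTITIONS = true
BG_MIN_PARTITION_SIZE = 32
CM_CHECK_USERID = false
BG_ALLOW_LL_JOBS_ONLY = false
"""

cfg = parse_loadl_config(SLIDE_CONFIG)
print(is_true(cfg["BG_ENABLED"]), int(cfg["BG_MIN_PARTITION_SIZE"]))  # True 32
```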

6 LoadL_admin
: type = machine
central_manager = true

: type = machine
central_manager = false
# Allow jobs to be submitted from the SN
schedd_host = true

small: type = class
include_bg = R00-M0

row1: type = class
include_bg = R1

medium: type = class
exclude_bg = R0 R1
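To see how the include_bg and exclude_bg lists might narrow the Blue Gene resources each class can use, here is a small Python model. It assumes, without confirmation from the slide, that each entry acts as a name prefix (so R1 covers the row-1 racks R10, R11, ... and R00-M0 covers one midplane) and that exclude_bg takes precedence; the midplane names R00-M1, R12-M1 and R23-M0 are hypothetical.

```python
def class_allows(midplane, include_bg=(), exclude_bg=()):
    """Decide whether a job class may use a midplane such as "R12-M0".

    Hypothetical model: each include_bg/exclude_bg entry is treated as a
    name prefix, and an exclude_bg match overrides any include_bg match.
    """
    def matches(entry):
        return midplane.startswith(entry)

    if any(matches(entry) for entry in exclude_bg):
        return False
    if include_bg:
        return any(matches(entry) for entry in include_bg)
    return True  # no include_bg list: every midplane not excluded is allowed


# Class definitions from the LoadL_admin example above.
CLASSES = {
    "small": {"include_bg": ["R00-M0"]},
    "row1": {"include_bg": ["R1"]},
    "medium": {"exclude_bg": ["R0", "R1"]},
}

for name, rules in CLASSES.items():
    allowed = [mp for mp in ("R00-M0", "R00-M1", "R12-M1", "R23-M0")
               if class_allows(mp, **rules)]
    print(name, allowed)
```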

7 LoadL_config.local
Front End Node:
START_DAEMONS = TRUE
SCHEDD_RUNS_HERE = True
STARTD_RUNS_HERE = True
MAX_STARTERS = 60
CLASS = small(10) row1(20) medium(30) large(10)
Service Node:
START_DAEMONS = TRUE
SCHEDD_RUNS_HERE = FALSE
STARTD_RUNS_HERE = FALSE
Note: mpirun runs on the FEN. It does not use many resources, so many mpirun processes can share the same FEN.
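On the Front End Node, the CLASS line gives each class a number of initiators, while MAX_STARTERS bounds the machine as a whole. With the values above the classes add up to 10 + 20 + 30 + 10 = 70 initiators, but MAX_STARTERS = 60 would, as we understand it, cap concurrent job steps (and therefore mpirun processes) at 60. A Python sketch of that arithmetic, with hypothetical helper names:

```python
import re


def parse_class_list(value):
    """Turn a CLASS value like "small(10) row1(20)" into {"small": 10, "row1": 20}."""
    return {name: int(count) for name, count in re.findall(r"(\w+)\((\d+)\)", value)}


def concurrent_step_limit(class_value, max_starters):
    # Initiator counts per class may add up to more than MAX_STARTERS;
    # assumption: MAX_STARTERS then caps the total running on the machine.
    return min(sum(parse_class_list(class_value).values()), max_starters)


fen_classes = "small(10) row1(20) medium(30) large(10)"
print(parse_class_list(fen_classes))           # {'small': 10, 'row1': 20, 'medium': 30, 'large': 10}
print(concurrent_step_limit(fen_classes, 60))  # 60 (the classes allow 70 in total)
```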

8 Before Starting LoadLeveler on Blue Gene/P
- Standalone mpirun must work.
- Add userid loadl to the bgpadmin group.
- /usr/lib64/libdb2.so must exist.
- In the login profile of userid loadl, add:
  export BRIDGE_CONFIG_FILE=/bgsys/drivers/ppcfloor/bin/bridge.config
  export DB_PROPERTY=/bgsys/drivers/ppcfloor/bin/db.properties.tpl
  The two files, or local copies of them, must be readable by userid loadl.
- Note: LoadLeveler needs to be restarted after Blue Gene driver updates, database updates, and similar changes.
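Several of these prerequisites can be checked mechanically. The Python sketch below, meant to run as userid loadl, reports a missing /usr/lib64/libdb2.so and unset or unreadable BRIDGE_CONFIG_FILE and DB_PROPERTY files. The function name preflight is hypothetical; it does not verify that standalone mpirun works or that loadl belongs to bgpadmin.

```python
import os

REQUIRED_LIBRARY = "/usr/lib64/libdb2.so"
REQUIRED_VARIABLES = ("BRIDGE_CONFIG_FILE", "DB_PROPERTY")


def preflight(env=None, exists=os.path.exists,
              readable=lambda path: os.access(path, os.R_OK)):
    """List problems that would stop LoadLeveler from talking to Blue Gene.

    Hypothetical helper. Covers only the DB2 library and the two environment
    variables; env/exists/readable can be swapped out for testing.
    """
    env = os.environ if env is None else env
    problems = []
    if not exists(REQUIRED_LIBRARY):
        problems.append(f"{REQUIRED_LIBRARY} does not exist")
    for name in REQUIRED_VARIABLES:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not readable(path):
            problems.append(f"{name} points to an unreadable file: {path}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```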

9 Starting LoadLeveler
- llctl start on both the FEN and the SN
- llstatus: look for "Blue Gene is present"
- llstatus -b:
  Name  Base Partitions  c-nodes   InQ  Run
  BGP   4x4x2            32x32x16  0    0
- llstatus -B all: show all base partitions
- llstatus -P
- llstatus -b -l: show more Blue Gene resources
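The llstatus -b row packs two shapes into one line. Multiplying them out: 4x4x2 is 32 base partitions, or 16 racks at two BPs per rack, and 32x32x16 is 16,384 c-nodes, which works out to 512 c-nodes per BP. A small Python sketch that does the same arithmetic, assuming the column order shown on the slide (the function name is hypothetical):

```python
from math import prod


def parse_llstatus_b(row):
    """Split one data row of `llstatus -b` (columns as shown on the slide)."""
    name, bp_shape, cnode_shape, in_queue, running = row.split()
    base_partitions = prod(int(d) for d in bp_shape.split("x"))
    c_nodes = prod(int(d) for d in cnode_shape.split("x"))
    return {
        "name": name,
        "base_partitions": base_partitions,
        "racks": base_partitions // 2,  # one rack holds two base partitions
        "c_nodes": c_nodes,
        "c_nodes_per_bp": c_nodes // base_partitions,
        "in_queue": int(in_queue),
        "running": int(running),
    }


print(parse_llstatus_b("BGP 4x4x2 32x32x16 0 0"))
```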

10 LoadLeveler Job Command File
# @ job_name = myjob
# @ comment = "BG Job by Size"
# @ error = $(home)/output/$(job_name).$(jobid).err
# @ output = $(home)/output/$(job_name).$(jobid).out
# @ environment = COPY_ALL;
# @ wall_clock_limit = 00:20:00
# @ notification = error
# @ notify_user = $(user)@us.ibm.com
# @ job_type = bluegene
# @ bg_size = 32
# @ queue
/usr/bin/mpirun -exe /bgtest/hello.rts -verbose 1
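Because a job command file is just '# @ keyword = value' lines followed by '# @ queue' and the command, scripts can generate one. The Python sketch below builds a reduced version of the file above; the function name job_command_file is hypothetical, and the result would still be submitted with llsubmit.

```python
def job_command_file(keywords, command):
    """Render '# @ keyword = value' lines, the '# @ queue' line, then the command."""
    lines = [f"# @ {key} = {value}" for key, value in keywords.items()]
    lines.append("# @ queue")  # queues the step described by the keywords above
    lines.append(command)      # runs on the FEN: Blue Gene mpirun
    return "\n".join(lines) + "\n"


jcf = job_command_file(
    {
        "job_name": "myjob",
        "wall_clock_limit": "00:20:00",
        "job_type": "bluegene",
        "bg_size": 32,
    },
    "/usr/bin/mpirun -exe /bgtest/hello.rts -verbose 1",
)
print(jcf, end="")  # save to a file and submit it with llsubmit
```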

11 Blue Gene Job Keywords
- Mutually exclusive (exactly one must be specified):
  - bg_size: number of compute nodes
  - bg_shape: number of BPs in the x, y, and z directions, e.g. 1x2x4
  - bg_partition: a predefined partition
- Optional:
  - bg_connection: MESH, TORUS, or PREFER_TORUS
  - bg_rotate: True or False
  - bg_requirements: c-node memory
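The rules above lend themselves to a pre-submission check. This Python sketch (hypothetical function check_bg_keywords) flags a step that sets none or more than one of bg_size, bg_shape and bg_partition, a malformed bg_shape, or an unknown bg_connection or bg_rotate value. Accepting bg_connection and bg_rotate in any letter case is an assumption; LoadLeveler itself remains the authority when the job is submitted.

```python
import re

EXCLUSIVE = ("bg_size", "bg_shape", "bg_partition")
CONNECTIONS = {"MESH", "TORUS", "PREFER_TORUS"}


def check_bg_keywords(step):
    """Return a list of problems with a step's Blue Gene keywords (hypothetical checker)."""
    errors = []
    given = [k for k in EXCLUSIVE if k in step]
    if len(given) != 1:
        errors.append("specify exactly one of bg_size, bg_shape, bg_partition")
    shape = step.get("bg_shape")
    if shape is not None and not re.fullmatch(r"\d+x\d+x\d+", shape):
        errors.append("bg_shape must be XxYxZ base partitions, e.g. 1x2x4")
    connection = step.get("bg_connection")
    if connection is not None and connection.upper() not in CONNECTIONS:
        errors.append("bg_connection must be MESH, TORUS or PREFER_TORUS")
    rotate = step.get("bg_rotate")
    if rotate is not None and rotate.lower() not in ("true", "false"):
        errors.append("bg_rotate must be True or False")
    return errors


print(check_bg_keywords({"bg_size": "32", "bg_connection": "MESH"}))  # []
print(check_bg_keywords({"bg_size": "32", "bg_shape": "1x2x4"}))      # both size and shape set
```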

12 Submit a Job
- llsubmit
- llq
- llq -b: show Blue Gene specific information
- llq -s: show why the job step remains idle

13 Partition Size and I/O Nodes
- 4 I/O nodes per BP: partition size >= 128
- 8 I/O nodes per BP: partition size >= 64/128
- 16 I/O nodes per BP: partition size >= 32
- 32 I/O nodes per BP: partition size >= 16/32
- Only Blue Gene/P allows partition sizes 16, 64, and 256.
- A partition defined by LoadLeveler cannot be smaller than BG_MIN_PARTITION_SIZE.
- One rack has two base partitions (BPs).
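One way to read the table: a Blue Gene/P BP holds 512 compute nodes, and each I/O node serves a pset of 512 / (I/O nodes per BP) of them, which gives 128, 64, 32 and 16, the smaller value in each row. The Python sketch below combines that pset size with the BG_MIN_PARTITION_SIZE floor; treating one pset as the smallest possible block is our assumption, and the function name is hypothetical.

```python
C_NODES_PER_BP = 512  # compute nodes in one Blue Gene/P base partition (midplane)


def smallest_ll_partition(ions_per_bp, bg_min_partition_size):
    """Smallest partition LoadLeveler would define on a Blue Gene/P BP.

    Assumption: the smallest block is one pset, i.e. the c-nodes served by a
    single I/O node (512 / I/O nodes per BP), and LoadLeveler never goes
    below BG_MIN_PARTITION_SIZE.
    """
    if ions_per_bp not in (4, 8, 16, 32):
        raise ValueError(f"unexpected I/O nodes per BP: {ions_per_bp}")
    pset_size = C_NODES_PER_BP // ions_per_bp
    return max(pset_size, bg_min_partition_size)


for ions in (4, 8, 16, 32):
    print(ions, smallest_ll_partition(ions, bg_min_partition_size=16))
```

With 16 I/O nodes per BP this yields 32 for BG_MIN_PARTITION_SIZE=16 and 128 for BG_MIN_PARTITION_SIZE=128, the two "actual" values quoted for the mixed-ratio case.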

14 Mixed I/O Node Ratios
- One rack has 16 I/O nodes per BP.
- The other racks have 4 I/O nodes per BP.
- A job that asks for 32 compute nodes can run only on the rack with 16 I/O nodes per BP.
- A job that asks for 128 compute nodes can run on any rack.
- BG_MIN_PARTITION_SIZE=16: the actual minimum partition size is 32.
- BG_MIN_PARTITION_SIZE=128: the actual minimum partition size is 128.
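Applying the pset arithmetic (512 / I/O nodes per BP) rack by rack reproduces the mixed-ratio example: a 32-node job fits only where a pset is 32 c-nodes (16 I/O nodes per BP), while a 128-node job fits anywhere. The sketch below is a hypothetical model that ignores BG_MIN_PARTITION_SIZE and current load; the rack names are made up.

```python
C_NODES_PER_BP = 512


def racks_for_job(job_size, ions_per_bp_by_rack):
    """Racks whose pset size is not larger than the requested c-node count.

    Hypothetical model: a pset is 512 / (I/O nodes per BP) c-nodes; other
    constraints, such as BG_MIN_PARTITION_SIZE and free resources, are ignored.
    """
    return sorted(rack for rack, ions in ions_per_bp_by_rack.items()
                  if C_NODES_PER_BP // ions <= job_size)


# Hypothetical machine: R00 has 16 I/O nodes per BP, the other racks have 4.
racks = {"R00": 16, "R01": 4, "R10": 4}
print(racks_for_job(32, racks))   # ['R00']
print(racks_for_job(128, racks))  # ['R00', 'R01', 'R10']
```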

15 Unconnected I/O Nodes
- Each BP has 16 I/O nodes (IONs).
- One rack has all 16 IONs per BP connected.
- The other racks have only 4 of them connected.
- You must set:
  - max_psets_per_bp=4 in the db.properties file
  - BG_MIN_PARTITION_SIZE=128
- Dynamically created partitions use only 4 IONs per BP.
- Predefined partitions (created through mmcs_db_console or the Navigator) can use more IONs and be smaller.

16 Advance Reservation
- In LoadL_admin, add:
  loadl: type = user
  max_reservations = 10
- llmkres -t 14:00 -d 300 -c 1024
- llmkres -t 12/18 08:00 -d 60 -f my_jcf
- In LoadL_config, you can add MAX_RESERVATIONS = 20 (default 10)
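For the first llmkres example, -t 14:00 with -d 300 asks for a window from 14:00 to 19:00, assuming -d counts minutes. The Python sketch below computes that window and applies a simplified fit test for a step with the 00:20:00 wall_clock_limit from the job command file example; the rule that a step must finish before the reservation ends, the date, and the helper names are assumptions for illustration.

```python
from datetime import datetime, timedelta


def reservation_window(start, duration_minutes):
    """(start, end) of a reservation, assuming llmkres -d is in minutes."""
    return start, start + timedelta(minutes=duration_minutes)


def step_fits(window, step_start, wall_clock_limit):
    # Simplified rule (assumption): a step may start inside the reservation
    # only if its wall_clock_limit lets it finish before the reservation ends.
    begin, end = window
    return begin <= step_start and step_start + wall_clock_limit <= end


# llmkres -t 14:00 -d 300 -c 1024, on a hypothetical date
window = reservation_window(datetime(2007, 12, 18, 14, 0), 300)
print(window[1])  # 2007-12-18 19:00:00
limit = timedelta(minutes=20)  # wall_clock_limit = 00:20:00
print(step_fits(window, datetime(2007, 12, 18, 18, 40), limit))  # True
print(step_fits(window, datetime(2007, 12, 18, 18, 45), limit))  # False
```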

17 Advance Reservation
- Reserve for maintenance
- Reserve for a special workload
- Allow other users or groups to use the reservation
- Allow a reservation to be cancelled automatically when no more jobs can run in it
- Allow extra resources to be shared once all the special jobs for the reservation have started

18 Advance Reservation
- TORUS needs more resources than MESH.
- A reservation made through bg_partition reserves exactly the same resources as the predefined partition.
- A reservation made through bg_size or bg_shape can reserve more resources, which lets smaller jobs run inside the reservation.

19 Fair Share Scheduling
- Shares resources "fairly" according to resource entitlement and usage
- In LoadL_config, specify:
  FAIR_SHARE_TOTAL_SHARES = 1000
  FAIR_SHARE_INTERVAL = 720
- llfs: show shares allocated and used
- llfs -s / -r / -z: save / restore / reset the fair share data

20 Fair Share Scheduling
- It's all about job priority!
- SYSPRIO must be specified to enable fair share scheduling; it is very flexible.
- NEGOTIATOR_RECALCULATE_SYSPRIO_INTERVAL must be positive.
- In LoadL_admin, specify fair_shares values for some or all users and groups.
- All users can run jobs, even if fair_shares=0.
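To make entitlement concrete: if FAIR_SHARE_TOTAL_SHARES = 1000 and three hypothetical users have fair_shares of 30, 10 and 0, a proportional split would give them 750, 250 and 0 shares. The Python sketch below computes that split and the unused remainder that a SYSPRIO expression could reward; it is a toy model, not LoadLeveler's exact formula, and it ignores how past usage ages over time.

```python
def allocated_shares(total_shares, fair_shares):
    """Split FAIR_SHARE_TOTAL_SHARES in proportion to each fair_shares value.

    Assumption: entitlement is proportional; entries with fair_shares=0 get
    no allocation but, as the slide notes, can still run jobs.
    """
    weight = sum(fair_shares.values())
    if weight == 0:
        return {name: 0.0 for name in fair_shares}
    return {name: total_shares * share / weight for name, share in fair_shares.items()}


def remaining_shares(allocated, used):
    # A SYSPRIO expression can favor entities with more unused entitlement.
    return {name: allocated[name] - used.get(name, 0.0) for name in allocated}


alloc = allocated_shares(1000, {"alice": 30, "bob": 10, "carol": 0})  # hypothetical users
print(alloc)  # {'alice': 750.0, 'bob': 250.0, 'carol': 0.0}
print(remaining_shares(alloc, {"alice": 800.0, "bob": 100.0}))
```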

21 A Mixed LoadLeveler Cluster
- A Blue Gene system can be in the same cluster as other AIX or Linux machines.
- The Central Manager must run on the service node of the Blue Gene system.
- Only one Blue Gene system can be in a LoadLeveler cluster.
- Job classes can be used to separate Blue Gene FENs, Linux machines, and AIX machines.
- End users submit all jobs the same way.

22 Multicluster Support
In LoadL_admin, add the multicluster definitions:
################################
# MULTICLUSTER DEFINITIONS #
################################
BGL: type = cluster
outbound_hosts = bglfen3
inbound_hosts = bglfen3
local = true

BGP1: type = cluster
outbound_hosts = dd1sys1fen1
inbound_hosts = dd1sys1fen1

BGP2: type = cluster
outbound_hosts = dd2sys1fen2
inbound_hosts = dd2sys1fen2

These three separate clusters form a multicluster environment.
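Each cluster stanza names the hosts that send and receive jobs, and local = true marks the cluster the definitions belong to. A hypothetical Python parser for the stanzas above shows how a tool might separate the local cluster from the remote ones that llsubmit -X and llq -X can target; it handles only the simple "name: key = value" layout used here, not the full LoadL_admin syntax.

```python
import re


def parse_admin_stanzas(text):
    """Parse simplified LoadL_admin stanzas into {name: {keyword: value}}.

    Hypothetical helper: a line "name: key = value" starts a stanza, later
    "key = value" lines belong to it, and '#' starts a comment.
    """
    stanzas, current = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        header = re.match(r"(\S+):\s*(.*)", line)
        if header:
            current = stanzas.setdefault(header.group(1), {})
            line = header.group(2)
        if current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return stanzas


ADMIN = """
# MULTICLUSTER DEFINITIONS #
BGL: type = cluster
  outbound_hosts = bglfen3
  inbound_hosts = bglfen3
  local = true
BGP1: type = cluster
  outbound_hosts = dd1sys1fen1
  inbound_hosts = dd1sys1fen1
BGP2: type = cluster
  outbound_hosts = dd2sys1fen2
  inbound_hosts = dd2sys1fen2
"""
clusters = {name: s for name, s in parse_admin_stanzas(ADMIN).items()
            if s.get("type") == "cluster"}
local = [name for name, s in clusters.items() if s.get("local", "false").lower() == "true"]
remote = sorted(set(clusters) - set(local))
print(local, remote)  # ['BGL'] ['BGP1', 'BGP2']
```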

23 Multicluster Support
- From one cluster, a user can submit jobs to any other cluster:
  llsubmit -X BGP1 my_job_command_file
- From one cluster, a user can query jobs in any other cluster:
  llq -X BGP2

24 Runtime Environment
- Available to prologs and epilogs
- In LoadL_config, add:
  JOB_PROLOG = /bgtest/bg_job_prolog.sh
- /bgtest/bg_job_prolog.sh:
#!/bin/ksh
# Log the Blue Gene runtime variables of this job step to a file in /tmp
name=$(basename "$0" .sh)
echo "$LOADL_BG_PARTITION $LOADL_BG_SIZE $LOADL_BG_CONNECTION $LOADL_BG_BPS $LOADL_BG_IONODES $(date) $LOADL_STEP_OWNER $LOADL_STEP_ID $LOADL_STEP_CLASS" > /tmp/$name.$LOADL_STEP_ID.log
- cat /tmp/bg_job_prolog.bgpdd1sys1.rchland.ibm.com.2.0.log
LL07111910011602 512 MESH R20-M1 N00-J00,N04-J00,N08-J00,N12-J00 Mon Nov 19 10:01:16 CST 2007 ezhong bgpdd1sys1.rchland.ibm.com.2.0 high
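Because the prolog writes one space-separated line per job step, other tools can split it back into fields. The Python sketch below parses the sample log line; it assumes none of the variables is empty, that the date occupies the middle tokens, and that base partition and I/O node lists are comma-separated. The function name is hypothetical.

```python
def parse_prolog_log(line):
    """Split one line written by the sample prolog into named fields.

    Assumptions: the first five variables and the last three are single,
    non-empty tokens, the `date` output fills the middle, and the BP and
    I/O node lists are comma-separated.
    """
    tokens = line.split()
    partition, size, connection, bps, io_nodes = tokens[:5]
    owner, step_id, step_class = tokens[-3:]
    return {
        "partition": partition,
        "size": int(size),
        "connection": connection,
        "base_partitions": bps.split(","),
        "io_nodes": io_nodes.split(","),
        "date": " ".join(tokens[5:-3]),
        "owner": owner,
        "step_id": step_id,
        "class": step_class,
    }


SAMPLE = ("LL07111910011602 512 MESH R20-M1 N00-J00,N04-J00,N08-J00,N12-J00 "
          "Mon Nov 19 10:01:16 CST 2007 ezhong bgpdd1sys1.rchland.ibm.com.2.0 high")
fields = parse_prolog_log(SAMPLE)
print(fields["io_nodes"], fields["date"])
```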

25 Blue Gene Job Info from llq
# llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
bgpdd1sys1.9.0           ezhong     11/21 10:29 R  50  high         bgpdd1sys1

1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted

# llq -b
Id                       Owner      Submitted   LL BG PT Partition        Size
________________________ __________ ___________ __ __ __ ________________ ______
bgpdd1sys1.9.0           ezhong     11/21 10:29 R     FR LL07112110294409 512

1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted

# llq -f %id %BB %BS %PT %BG %dd %st
Step Id                  Partition        Size   PT BG Disp. Date  ST
------------------------ ---------------- ------ -- -- ----------- --
bgpdd1sys1.9.0           LL07112110294409 512    FR    11/21 10:29 R

1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted
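The summary line that ends each llq listing is easy to consume from scripts, for example to wait until no steps are running. A Python sketch, assuming the line keeps exactly the wording shown above (the function name is hypothetical):

```python
import re


def parse_llq_summary(line):
    """Turn llq's trailing summary line into a dict of counts."""
    match = re.match(r"(\d+) job step\(s\) in queue, (.*)", line.strip())
    if match is None:
        raise ValueError(f"not an llq summary line: {line!r}")
    counts = {"in_queue": int(match.group(1))}
    for number, state in re.findall(r"(\d+) (\w+)", match.group(2)):
        counts[state] = int(number)
    return counts


print(parse_llq_summary(
    "1 job step(s) in queue, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted"))
```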

26 Blue Gene Job Info from llq
# llq -l
=============== Job Step bgpdd1sys1.rchland.ibm.com.9.0 ===============
...
Step Type: Blue Gene
Size Requested: 512
Size Allocated: 512
Shape Requested:
Shape Allocated: 1x1x1
Wiring Requested: MESH
Wiring Allocated: MESH
Rotate: True
Blue Gene Status:
Blue Gene Job Id:
Partition Requested:
Partition Allocated: LL07112110294409
BG Partition State: FREE
BG Requirements:
...

27 Multiple Top Dogs
- During a dispatching cycle, resources are reserved for the highest-priority jobs (top dogs) so that other jobs can be backfilled around them.
- In LoadL_config, set MAX_TOP_DOGS =
- In LoadL_admin, set max_top_dogs =
- The default number of top dogs is 1.
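The top-dog idea can be illustrated with a toy backfill model. The Python sketch below is one-dimensional (it counts c-nodes only and ignores Blue Gene shapes and wiring), treats runtimes as exact wall_clock_limit values, and is not LoadLeveler's algorithm. In priority order, a job starts now if it fits without delaying an earlier reservation; otherwise it becomes a top dog with resources reserved at its earliest start, until max_top_dogs is reached. All names are hypothetical.

```python
def usage_at(intervals, t):
    """c-nodes in use at time t; intervals are (start, end, nodes)."""
    return sum(nodes for start, end, nodes in intervals if start <= t < end)


def fits(intervals, total, start, runtime, nodes):
    # Usage only rises at interval starts, so checking the window's start and
    # every interval start inside the window finds its peak.
    end = start + runtime
    points = [start] + [s for s, _, _ in intervals if start < s < end]
    return all(usage_at(intervals, p) + nodes <= total for p in points)


def earliest_start(intervals, total, now, runtime, nodes):
    # The earliest feasible start is always "now" or the end of some interval.
    for t in sorted({now} | {e for _, e, _ in intervals if e > now}):
        if fits(intervals, total, t, runtime, nodes):
            return t
    return None  # request larger than the whole machine


def backfill(total, running, queue, max_top_dogs=1, now=0):
    """One dispatching cycle of a toy backfill scheduler.

    running: (end_time, nodes) for jobs already running
    queue:   (name, nodes, runtime) in priority order, highest first
    Returns (names started now, [(top dog name, reserved start time)]).
    """
    intervals = [(now, end, nodes) for end, nodes in running]
    started, top_dogs = [], []
    for name, nodes, runtime in queue:
        t = earliest_start(intervals, total, now, runtime, nodes)
        if t is None:
            continue
        if t == now:
            started.append(name)
        elif len(top_dogs) < max_top_dogs:
            top_dogs.append((name, t))  # reserve resources for this top dog
        else:
            continue  # waits this cycle without a reservation
        intervals.append((t, t + runtime, nodes))
    return started, top_dogs


# 1024 c-nodes; one running job holds 512 until t=60.
queue = [("A", 1024, 30), ("B", 512, 50), ("C", 512, 90)]
print(backfill(1024, [(60, 512)], queue))                  # (['B'], [('A', 60)])
print(backfill(1024, [(60, 512)], queue, max_top_dogs=2))  # (['B'], [('A', 60), ('C', 90)])
```

B is backfilled because it ends at t=50, before A's reservation. With one top dog, C simply waits; raising max_top_dogs to 2 gives it a reservation at t=90 as well, mirroring what the slide describes for MAX_TOP_DOGS.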

28 Top Dog Query
- A sample Data Access API program: /opt/ibmll/LoadL/full/samples/lldata_access/topdog.c
> make
/usr/bin/g++ -m64 -g -I. -I/opt/ibmll/LoadL/full/include -c -o topdog.o topdog.c
/usr/bin/g++ -m64 -g -I. -I/opt/ibmll/LoadL/full/include -o topdog topdog.o -m64 -L. -L/usr/lib64 -lllapi -lpthread -ldl
> ./topdog
Step                           Owner      q_sysprio  Estimated Start Time
------------------------------ ---------- ---------- ------------------------
bgpsys6.rchland.ibm.com.56.0   ezhong     50000      Thu Jun 21 17:50:32 2007
bgpsys6.rchland.ibm.com.56.1   ezhong     50000      Thu Jun 21 18:00:19 2007
bgpsys6.rchland.ibm.com.55.2   ezhong     50000      Thu Jun 21 17:50:19 2007
bgpsys6.rchland.ibm.com.55.3   ezhong     50000      Thu Jun 21 17:50:32 2007
===== The top dogs were considered for scheduling at Thu Jun 21 17:40:43 2007

29 More about Job Priority
- q_sysprio in the llq -l output is used by the LoadLeveler Central Manager for scheduling.
- Set SYSPRIO_THRESHOLD_TO_IGNORE_STEP = integer in LoadL_config: job steps with a lower q_sysprio are not scheduled to run.
- llmodify -s (an administrator-only command option) assigns a fixed priority, which is not changed by priority recalculation.
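The threshold acts as a filter in front of priority ordering. The Python sketch below drops steps whose q_sysprio is below SYSPRIO_THRESHOLD_TO_IGNORE_STEP and orders the rest by q_sysprio, highest first; treating a step exactly at the threshold as eligible is an assumption, and the function name and step IDs are hypothetical.

```python
def dispatch_order(steps, threshold):
    """Order (step_id, q_sysprio) pairs the way this sketch would consider them.

    Steps below SYSPRIO_THRESHOLD_TO_IGNORE_STEP are dropped (assumption: a
    step exactly at the threshold is still considered); the rest are sorted
    by q_sysprio, highest first, keeping input order for ties.
    """
    eligible = [(step_id, prio) for step_id, prio in steps if prio >= threshold]
    return [step_id for step_id, _ in sorted(eligible, key=lambda s: s[1], reverse=True)]


steps = [("host.55.2", 50000), ("host.56.0", 50000), ("host.57.0", -10), ("host.58.0", 100)]
print(dispatch_order(steps, threshold=0))  # ['host.55.2', 'host.56.0', 'host.58.0']
```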

30 LoadLeveler Download Sites
- For the initial download (including the license information):
  https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=BGL-BLUEGENE
  https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=BGP-BLUEGENEP
  These pages are password protected.
- For updates (open to everyone):
  http://www14.software.ibm.com/webapp/set2/sas/f/lodleveler/home.html
- For LoadLeveler documentation:
  http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.infocenter.doc/library.html

31 Installing LoadLeveler for Blue Gene/P
- Filesets needed:
  IBMJava2-142-ppc64-JRE-1.4.2-5.0.ppc64.rpm
  LoadL-full-license-SLES10-PPC64-3.4.2.1-0.ppc64.rpm
  LoadL-full-SLES10-PPC64-3.4.2.1-0.ppc64.rpm
- From the directory that contains the filesets:
  rpm -ihv LoadL-full-license-SLES10-PPC64-3.4.2.1-0.ppc64.rpm
  /opt/ibmll/LoadL/sbin/install_ll -y -d .

32 Installing LoadLeveler for Blue Gene/L
- See Chapter 10 of the IBM Redbook "IBM System Blue Gene Solution: Configuring and Maintaining Your Environment": http://www.redbooks.ibm.com/abstracts/sg247352.html


Download ppt "© 2007 IBM Corporation IBM Global Engineering Solutions IBM Blue Gene/P LoadLeveler Blue Gene Support Enci Zhong LoadLeveler Development."

Similar presentations


Ads by Google