Experimental Comparative Study of Job Management Systems George Washington University George Mason University

1 Experimental Comparative Study of Job Management Systems George Washington University George Mason University http://ece.gmu.edu/lucite

2 Outline:
1. Review of experiments
2. Results
3. Encountered problems
4. Functional comparison
5. Extension to reconfigurable hardware

3 Review of Experiments

4 Our Testbed (hosts in the gmu.edu and science.gmu.edu domains)
- m1–m4: 4 x Linux RH6.2, 2 x PIII 500 MHz, 128 MB RAM
- m5–m7: 3 x Linux RH6.2, 2 x PIII 450 MHz, 128 MB RAM
- pallj / m0: Linux RH7.0, PIII 450 MHz, 512 MB RAM
- palpc2: Linux, PII 400 MHz, 128 MB RAM
- alicja: Solaris 8, UltraSparc IIi 360 MHz, 512 MB RAM
- anna: Solaris 8, UltraSparc IIi 440 MHz, 128 MB RAM
- magdalena: Solaris 8, UltraSparc IIi 440 MHz, 512 MB RAM
- redfox: Solaris 8, UltraSparc IIi 330 MHz, 128 MB RAM

5 SHORT JOBS (1 s ≤ execution time ≤ 2 minutes)
* benchmarks used to determine the relative CPU factors of execution hosts

6 CPU factors for the medium benchmark list, based on the execution times of bt.W and Sobel1024i
Machine names | Host type | Host model | CPU factor
m1-m4 | Linux | PIII_2_500_128 | 1.65
m5-m7 | Linux | PIII_2_450_128 | 1.55
pallj | Linux | PIII_1_450_512 | 1.60
palpc2 | Linux | P2_1_400_128 | 1.70
alicja | Solaris64 | USIIi_1_360_512 | 1.0
anna | Solaris64 | USIIi_1_440_128 | 1.2
magdalena | Solaris64 | USIIi_1_440_512 | 1.2
redfox | Solaris64 | USIIi_1_330_128 | 1.2
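The slide states only that the CPU factors were derived from the execution times of bt.W and Sobel1024i. A minimal sketch of one way to derive such factors, assuming (our assumption, not the study's stated method) that each factor is the geometric mean, over the benchmarks, of the reference host's time divided by the host's time, with alicja as the reference host (factor 1.0). The benchmark times below are hypothetical.

```python
import math

def cpu_factors(times, reference):
    """Relative CPU factors from benchmark execution times.

    times maps host name -> {benchmark name: execution time in seconds}.
    Assumed rule: factor = geometric mean of t_ref / t_host over the
    reference host's benchmarks, so the reference host gets 1.0 and
    faster hosts get larger factors.
    """
    ref = times[reference]
    factors = {}
    for host, bench in times.items():
        logs = [math.log(ref[b] / bench[b]) for b in ref]
        factors[host] = math.exp(sum(logs) / len(logs))
    return factors

# Hypothetical execution times in seconds, for illustration only.
times = {
    "alicja": {"bt.W": 100.0, "Sobel1024i": 50.0},
    "m1": {"bt.W": 60.0, "Sobel1024i": 31.25},
}
print(cpu_factors(times, "alicja"))
```

The geometric mean keeps one benchmark with a very long runtime from dominating the factor; an arithmetic mean of the ratios would be an equally plausible reading of the slide.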

7 MEDIUM JOBS (2 minutes ≤ execution time ≤ 10 minutes)
* benchmarks used to determine the relative CPU factors of execution hosts

8 LONG JOBS (10 minutes ≤ execution time ≤ 30 minutes)
* benchmarks used to determine the relative CPU factors of execution hosts

9 INPUT/OUTPUT JOBS (1 second ≤ execution time ≤ 10 minutes)

10 Typical experiment
[Figure: jobs 1…N submitted starting at time = 0, and the same jobs finishing execution in the order i_1…i_N]
- Total time of an experiment: on the order of 2 hours
- N = 150 for short and medium jobs, N = 75 for long jobs
- Pseudorandom delays between consecutive job submissions
- Poisson distribution of the job submission rate
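A Poisson submission rate is equivalent to exponentially distributed delays between consecutive submissions. A minimal sketch of generating such a schedule; the function name and the seed parameter are ours, and the study's own generator is not described on the slide.

```python
import random

def submission_times(n_jobs, rate_per_min, seed=None):
    """Return n_jobs submission times in seconds, the first at time 0.

    Delays between consecutive submissions are drawn from an exponential
    distribution with mean 60 / rate_per_min seconds, so the number of
    submissions per unit time follows a Poisson distribution.
    """
    rng = random.Random(seed)
    mean_delay = 60.0 / rate_per_min
    t = 0.0
    times = [t]
    for _ in range(n_jobs - 1):
        t += rng.expovariate(1.0 / mean_delay)  # argument is the rate (1/mean)
        times.append(t)
    return times

# 150 medium jobs at an average rate of 4 jobs/min.
schedule = submission_times(150, 4, seed=1)
```

Fixing the seed reproduces the same pseudorandom schedule, which presumably makes it possible to drive every JMS with an identical workload.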

11 List of experiments

12 Definition of timing parameters
$t_s$ – submission time; $t_b$ – beginning of execution; $t_e$ – end of execution; $t_d$ – delivery time
$T_R = t_b - t_s$ – response time
$T_{EXE} = t_e - t_b$ – execution time
$T_D = t_d - t_e$ – delivery time
$T_{TA} = t_d - t_s$ – turn-around time

13 Typical scenario
$t_s$, $t_b$, and $t_e$ are determined using the gettimeofday() function. The delivery time is zero ($T_D = 0$, i.e. $t_d = t_e$), so $T_{TA} = T_R + T_{EXE}$.
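The timing parameters can be computed directly from the recorded timestamps. A minimal sketch, reading the timing diagram as $T_R = t_b - t_s$, $T_{EXE} = t_e - t_b$, $T_D = t_d - t_e$, and $T_{TA} = t_d - t_s$; in Python, time.time() plays the role of gettimeofday(). The class name and the example timestamps are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class JobTimes:
    """Wall-clock timestamps of one job, in seconds."""
    t_s: float  # submission time
    t_b: float  # beginning of execution
    t_e: float  # end of execution
    t_d: float  # delivery time (equal to t_e in the typical scenario)

    @property
    def response(self) -> float:  # T_R
        return self.t_b - self.t_s

    @property
    def execution(self) -> float:  # T_EXE
        return self.t_e - self.t_b

    @property
    def delivery(self) -> float:  # T_D
        return self.t_d - self.t_e

    @property
    def turnaround(self) -> float:  # T_TA
        return self.t_d - self.t_s

# Hypothetical job: waits 13 s, runs 496 s, results delivered immediately.
job = JobTimes(t_s=0.0, t_b=13.0, t_e=509.0, t_d=509.0)
```

With $T_D = 0$, turnaround equals response plus execution, matching the typical scenario.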

14 Total Throughput
[Figure: jobs 1…N submitted starting at time = 0, and the same jobs finishing execution in the order i_1…i_N]
$T_N$ – time necessary to execute N jobs
$\text{Total Throughput} = \frac{N}{T_N}$
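A minimal sketch of the total throughput formula, reported in jobs/hour as in the results. It assumes $T_N$ runs from the first submission (time = 0) until the last of the N jobs finishes; the function name is ours.

```python
def total_throughput(submit_times, end_times):
    """Total throughput N / T_N, converted to jobs/hour.

    submit_times and end_times are in seconds. Assumption: T_N is measured
    from the first submission to the completion of the last job.
    """
    n = len(end_times)
    t_n = max(end_times) - min(submit_times)
    return n / t_n * 3600.0

# Two jobs, the last finishing one hour after the first submission.
print(total_throughput([0.0, 10.0], [1800.0, 3600.0]))
```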

15 Partial Throughput
[Figure: jobs 1…N submitted starting at time = 0 and finishing execution in the order i_1…i_N; the k-th finishing job i_k is marked]
$T_k$ – time necessary to execute k jobs
$\text{Throughput}(k) = \frac{k}{T_k}$
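The partial throughput can be evaluated for every k at once by sorting the completion times. A minimal sketch, assuming $T_k$ is measured from time = 0 until the k-th job to finish has completed; for k = N this reduces to the total throughput.

```python
def partial_throughput(end_times, start=0.0):
    """Throughput(k) = k / T_k in jobs/hour, for k = 1..N.

    end_times are completion times in seconds; T_k is assumed to run from
    `start` (the first submission) to the k-th completion.
    """
    ordered = sorted(end_times)
    return [k / (t - start) * 3600.0 for k, t in enumerate(ordered, start=1)]

# Completion times in arbitrary order; the result is indexed by k - 1.
curve = partial_throughput([3600.0, 1800.0])
```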

16 Utilization

17 Results

18 Medium jobs – Total Throughput [jobs/hour]
Average job submission rate | LSF | PBS | Codine | Condor
2 jobs/min | 76 | 70 | 68 | 79
4 jobs/min | 97 | 91 | 82 | 114
12 jobs/min | 107 | 102 | 86 | 110

19 Medium jobs – Turn-around Time [s]
Average job submission rate | LSF | PBS | Codine | Condor
2 jobs/min | 496 | 462 | 607 | 505
4 jobs/min | 1134 | 944 | 1293 | 1148
12 jobs/min | 1765 | 1466 | 1949 | 1627

20 Medium jobs – Response Time [s]
Average job submission rate | LSF | PBS | Codine | Condor
2 jobs/min | 13 | 3 | 31 | 28
4 jobs/min | 636 | 452 | 734 | 671
12 jobs/min | 1274 | 984 | 1385 | 1156

21 Medium jobs – Utilization [%]
Average job submission rate | LSF | PBS | Codine | Condor
2 jobs/min | 54 | 41 | 70 | 61
4 jobs/min | 63 | 57 | 71 | 74
12 jobs/min | 73 | 67 | 78 | 69

22 Long jobs – Total Throughput [jobs/hour]
Average job submission rate | LSF | PBS | Codine | Condor
0.5 job/min | 25 | 26 | 18 | 40
2 jobs/min | 28 | 30 | 23 | 42

23 Long jobs – Turn-around Time [s]
Average job submission rate | LSF | PBS | Codine | Condor
0.5 job/min | 1148 | 1079 | 1903 | 1926
2 jobs/min | 2191 | 2163 | 3401 | 2357

24 Long jobs – Response Time [s]
Average job submission rate | LSF | PBS | Codine | Condor
0.5 job/min | 13 | 3 | 3 | 721
2 jobs/min | 860 | 799 | 1478 | 1225

25 Long jobs – Utilization [%]
Average job submission rate | LSF | PBS | Codine | Condor
0.5 job/min | 43 | 46 | 52 | 24
2 jobs/min | 56 | 58 | 64 | 69

26 Short jobs – Total Throughput [jobs/hour]
[Chart: throughput of LSF, PBS, Codine, and Condor at average job submission rates of 4, 6, 12, 30, and 60 jobs/min]
Data labels in extracted order: 240 227 234 160 356 322 337 205 652 414 607 280 1076 576 336 1255 642 370 1027 1210

27 Short jobs – Turn-around Time [s]
Average job submission rate | LSF | PBS | Codine | Condor
4 jobs/min | 42 | 34 | 29 | 50
6 jobs/min | 41 | 33 | 29 | 51
12 jobs/min | 42 | 58 | 29 | 51
30 jobs/min | 68 | 58 | 31 | 52
60 jobs/min | 120 | 62 | 32 | 50

28 Short jobs – Response Time [s]
[Chart: response time of LSF, PBS, Codine, and Condor at average job submission rates of 4, 6, 12, 30, and 60 jobs/min]
Data labels in extracted order: 9 2 1 19 9 3 1 98 1 17 32 8 2 18 83 9 2 18

29 Short jobs – Utilization [%]
[Chart: utilization for LSF, PBS, Codine, and Condor at average job submission rates of 4, 6, 12, 30, and 60 jobs/min]
Data labels in extracted order: 9 18 6 6 15 21 9 8 20 35 16 10 26 38 12 37 38 12 32 37

30 Medium jobs – Total Throughput [jobs/hour] vs. maximum number of jobs per CPU
Configuration | LSF | PBS | Codine | Condor
1 job/CPU, 4 jobs/min | 97 | 91 | 82 | 114
2 jobs/CPU, 4 jobs/min | 90 | 80 | 67 | 105

31 Medium jobs – Turn-around Time [s] vs. maximum number of jobs per CPU
Configuration | LSF | PBS | Codine | Condor
1 job/CPU, 4 jobs/min | 1134 | 944 | 1293 | 1147
2 jobs/CPU, 4 jobs/min | 1297 | 1273 | 1482 | 969

32 Medium jobs – Response Time [s] vs. maximum number of jobs per CPU
Configuration | LSF | PBS | Codine | Condor
1 job/CPU, 4 jobs/min | 636 | 452 | 734 | 671
2 jobs/CPU, 4 jobs/min: only three data labels recovered (387, 285, 386); their assignment to the systems is unclear

33 Medium jobs – Utilization [%] vs. maximum number of jobs per CPU
Configuration | LSF | PBS | Codine | Condor
1 job/CPU, 4 jobs/min | 63 | 57 | 71 | 74
2 jobs/CPU, 4 jobs/min | 63 | 58 | 63 | 54

34 Encountered problems

35 1. Jobs with high requirements on the stack size
Indication: Certain jobs do not finish execution when run under LSF, although the same jobs run correctly outside of any JMS and under the other job management systems.
Source: Variable STACKLIMIT in $LSB_CONFDIR/ /configdir/lsb.queues
Remaining problem: Documentation of default limits.
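One way to confirm this kind of problem is to have the job itself report the stack limit it runs with, once under LSF and once outside any JMS; a smaller limit under the queue points at STACKLIMIT. A minimal sketch using Python's standard resource module (Unix only). Limits are reported in KB, which we believe is the unit lsb.queues uses for STACKLIMIT; check this against the LSF documentation for your version.

```python
import resource

def stack_limit_kb():
    """Return the (soft, hard) stack-size limits of this process in KB.

    None means the limit is unlimited.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def to_kb(value):
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kb(soft), to_kb(hard)

if __name__ == "__main__":
    soft, hard = stack_limit_kb()
    print(f"stack limit: soft={soft} KB, hard={hard} KB")
```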

36 2. Frequently submitted small jobs
Indication: Unexpectedly high response time and turn-around time for a medium job submission rate
Possible solution: Define the variable CHUNK_JOB_SIZE (e.g., = 5) in lsb.queues and the variable LSB_CHUNK_NORUSAGE=y in lsf.conf
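The intuition behind job chunking can be shown with a toy cost model: if every dispatch carries a fixed scheduling overhead, grouping jobs into chunks of CHUNK_JOB_SIZE divides the number of dispatches by roughly that size. The per-dispatch cost is a hypothetical parameter, and this model is not how the LSF scheduler is actually implemented.

```python
import math

def dispatch_count(n_jobs, chunk_size):
    """Number of dispatch operations when jobs are sent in chunks."""
    return math.ceil(n_jobs / chunk_size)

def dispatch_overhead(n_jobs, chunk_size, per_dispatch_s):
    """Total dispatch overhead in seconds, assuming a hypothetical fixed
    cost per_dispatch_s for every dispatch regardless of chunk size."""
    return dispatch_count(n_jobs, chunk_size) * per_dispatch_s

# 150 short jobs: unchunked vs. CHUNK_JOB_SIZE = 5.
print(dispatch_count(150, 1), dispatch_count(150, 5))  # prints: 150 30
```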

37 3. Ordering of machines fulfilling resource requirements
Question: How many machines are dropped from the list based on the first ordering?
Default: r1m:pg
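The question can be made concrete with a two-stage ordering: sort candidates by the first load index, keep only some of them, then sort the survivors by the next index. A minimal sketch in which `keep` is a hypothetical free parameter, since how many hosts LSF actually keeps is exactly what the slide asks; lower values of r1m and pg are treated as better.

```python
def order_hosts(hosts, keys, keep):
    """Order candidate hosts by load indices such as ["r1m", "pg"].

    Hypothetical rule: for each index in turn, sort the remaining hosts
    (lower is better) and keep only the best `keep` of them.
    """
    candidates = list(hosts)
    for key in keys:
        candidates.sort(key=lambda h: h[key])  # stable sort
        candidates = candidates[:keep]
    return [h["name"] for h in candidates]

hosts = [
    {"name": "a", "r1m": 0.1, "pg": 5.0},
    {"name": "b", "r1m": 0.2, "pg": 0.0},
    {"name": "c", "r1m": 0.9, "pg": 0.0},
]
print(order_hosts(hosts, ["r1m", "pg"], keep=2))  # prints: ['b', 'a']
```

Host c is dropped by the first ordering even though its paging rate is the best, which is why the number of hosts kept matters.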

38 4. Random behavior from iteration to iteration
Question: Why is r1m different each time?
Indication: The assignment of jobs to particular machines differs in each iteration of the experiment
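LSF describes r1m as an exponentially averaged CPU run-queue length over one minute. A minimal sketch, assuming a Unix-style load average updated every 5 seconds (the interval and the formula are our assumption, not taken from LSF), shows why r1m can differ between iterations: the value depends on the recent history of the host, including load left over from the previous iteration, not only on the current run-queue length.

```python
import math

def exp_average(samples, interval_s=5.0, window_s=60.0, initial=0.0):
    """Exponentially damped average of run-queue length samples.

    Each sample n updates avg = avg * decay + n * (1 - decay), with
    decay = exp(-interval_s / window_s). Returns the average after
    every sample.
    """
    decay = math.exp(-interval_s / window_s)
    avg = initial
    history = []
    for n in samples:
        avg = avg * decay + n * (1.0 - decay)
        history.append(avg)
    return history

# Same current load, different leftover load from a previous iteration.
idle_start = exp_average([2, 2, 2], initial=0.0)[-1]
busy_start = exp_average([2, 2, 2], initial=3.0)[-1]
```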

39 5. Boundary effects in the calculation of the throughput
Question: How should the steady-state throughput be defined?
Indication: The steady-state partial throughput differs from the total throughput
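One common way to reduce boundary effects is to discard the warm-up and drain-out phases and measure throughput only between two completions in the middle of the run. A minimal sketch; the trimming rule and the default fraction are our illustrative choice, not the definition used in the study.

```python
def steady_state_throughput(end_times, trim_fraction=0.1):
    """Throughput in jobs/hour between the i-th and j-th completions,
    skipping the first and last trim_fraction of completions."""
    ordered = sorted(end_times)
    n = len(ordered)
    i = int(n * trim_fraction)
    j = n - 1 - i
    if j <= i:
        raise ValueError("too few jobs for this trim fraction")
    return (j - i) / (ordered[j] - ordered[i]) * 3600.0

# 100 jobs completing once per minute: steady state of 60 jobs/hour.
rate = steady_state_throughput([60.0 * k for k in range(1, 101)])
```

Unlike the total throughput, this measure does not count the interval before the first completion or the tail while the last jobs drain out.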

40 6. Throughput vs. turn-around time
Question: How can the lack of correlation between the two be explained?
Indication: No correlation between the ranking of JMSes in terms of throughput and their ranking in terms of turn-around time
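The lack of correlation can be quantified with Spearman's rank correlation. A minimal sketch using the medium-job results at 4 jobs/min from slides 18 and 19, ranking higher throughput and lower turn-around time as better; the simple formula below assumes there are no tied values.

```python
def ranks(values, higher_is_better):
    """Rank positions (1 = best) for each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def spearman(rank_a, rank_b):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

systems = ["LSF", "PBS", "Codine", "Condor"]
throughput = [97, 91, 82, 114]      # jobs/hour
turnaround = [1134, 944, 1293, 1148]  # seconds
rho = spearman(ranks(throughput, True), ranks(turnaround, False))
print(f"{rho:.2f}")  # prints: 0.20
```

A value of 0.2 for these four systems indicates only a weak agreement between the two rankings.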

41 Functional comparison

42 Operating system, flexibility, user interface
[Table comparing LSF, Codine, PBS, CONDOR, and RES on: distribution, source code, OS support (Solaris, Linux, Tru64, NT), user interface]
Recovered entries: distribution – com, pub, pub/com, pub, gov (in extracted order); user interface – GUI & CLI (four entries)

43 Scheduling and Resource Management
[Table comparing LSF, Codine, PBS, CONDOR, and RES on: batch jobs, interactive jobs, parallel jobs, accounting]

44 Efficiency and Utilization
[Table comparing LSF, Codine, PBS, CONDOR, and RES on: stage-in and stage-out, timesharing, process migration, dynamic load balancing, scalability]

45 Fault Tolerance and Security
[Table comparing LSF, Codine, PBS, CONDOR, and RES on: checkpointing, daemon fault recovery, authentication, authorization]

46 Documentation and Technical Support
[Table comparing LSF, Codine, PBS, CONDOR, and RES on: documentation, technical support]

47 JMS features supporting extension to reconfigurable hardware
- capability to define new dynamic resources
- strong support for stage-in and stage-out of:
  - configuration bitstreams
  - executable code
  - input/output data
- support for Windows NT and Linux

48 Ranking of Centralized Job Management Systems (1)
Capability to define new dynamic resources:
  Excellent: LSF, PBS, CODINE
  More difficult: CONDOR, RES
Stage-in and stage-out:
  Excellent: LSF, PBS
  Limited: CONDOR
  No: CODINE, RES

49 Ranking of Centralized Job Management Systems (2)
Overall suitability to extend to reconfigurable hardware:
1. LSF
2. CODINE
3. PBS
4. CONDOR
5. RES
Legend: without changing the JMS source code / requires changes to the JMS source code

50 Extension to reconfigurable hardware

51 Extension of LSF to reconfigurable hardware (1): Operation of LSF
[Figure: a job submitted with "bsub app" through the Batch API on the submission host (LIM) passes to the master host (MLIM, MBD, queue) and is dispatched to the execution host (SBD, child SBD, LIM, RES, user job) in 13 numbered steps; load information is exchanged with other hosts]
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server

52 Extension of LSF to reconfigurable hardware (2)
[Figure: the same 13-step flow as in the operation of LSF, extended on the execution host with an ELIM that obtains the status of the FPGA board through the ACS API (step 14) and feeds it into the load information]
ELIM – External Load Information Manager
ACS API – Adaptive Computing Systems API
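As we understand LSF's ELIM convention, an ELIM is an executable on the execution host that periodically writes to standard output a line containing the number of external load indices followed by name/value pairs. A minimal sketch of such an ELIM for the FPGA board; query_board_status(), the index name fpga_free, and the 15-second interval are hypothetical, the real board query would go through the ACS API, and the index would also have to be declared as a resource in the LSF configuration.

```python
import sys
import time

def format_elim_report(indices):
    """One ELIM report line: number of indices, then name/value pairs."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts += [name, str(value)]
    return " ".join(parts)

def query_board_status():
    """Hypothetical placeholder for an ACS API call.

    Returns 1 if the FPGA board is free and 0 if it is busy; this stub
    always reports 1.
    """
    return 1

def run_elim(interval_s=15.0, reports=None):
    """Write one report every interval_s seconds; run forever if reports is None."""
    count = 0
    while reports is None or count < reports:
        line = format_elim_report({"fpga_free": query_board_status()})
        sys.stdout.write(line + "\n")
        sys.stdout.flush()
        count += 1
        time.sleep(interval_s)
```

Installed as the ELIM, the script would call run_elim() with no arguments; run_elim(interval_s=0, reports=1) writes a single line, "1 fpga_free 1".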
