
System Utilization Benchmark on the Cray T3E and IBM SP
Adrian Wong, Leonid Oliker, William Kramer, Teresa Kaltz, Therese Enright and David Bailey
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

Scientific Supercomputer Workload
– Long-running batch jobs (hours)
– Typically 64 nodes per job
– Often a long list of queued jobs
– Job turnaround may be days

Motivations
– The ability to fully utilize a large computer is almost as important as the speed of the computer.
– Large capability mainframes rarely have idle cycles, so users' productivity must be maximized.
– Need a way to measure potential day-to-day utilization.
– No metric, other than anecdotal evidence, to gauge configuration changes.
– Scheduling is more complex on parallel platforms.
A test to assess system capabilities and configuration effects on utilization: Effective System Performance (ESP)

Parallel Job Scheduling
– An optimization problem in packing jobs under space (processor) and time constraints
– A dynamic situation
– Tradeoffs among turnaround, utilization and fairness
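To make the space and time constraints concrete, here is a minimal sketch (not part of ESP) that checks whether a proposed schedule ever uses more processors than the machine has. The `Job` record and `is_feasible` function are hypothetical names chosen for this illustration, and runtimes are assumed to be known exactly.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    procs: int       # processors requested (space dimension)
    start: float     # start time (time dimension)
    runtime: float   # wall-clock duration


def is_feasible(jobs, total_procs):
    """Return True if the schedule never uses more than total_procs processors."""
    events = []
    for j in jobs:
        if j.procs > total_procs:
            return False                                  # can never fit
        events.append((j.start, j.procs))                 # job starts: allocate
        events.append((j.start + j.runtime, -j.procs))    # job ends: release
    # Sort by time; at equal times the negative (release) deltas sort first,
    # so a job may start exactly when another one finishes.
    events.sort()
    in_use = 0
    for _, delta in events:
        in_use += delta
        if in_use > total_procs:
            return False
    return True
```

For example, two 256-processor jobs can share a 512-processor machine, and a third can start at the moment one of them ends.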

Scheduling Strategies
[Figure: job queue in order of submission, and a hole in the machine to be filled]
Best-Fit-First (BFF)
– Scans the queue for the best fit
– Starvation of large jobs
First-Come-First-Serve (FCFS)
– Waits for a hole of the right size
– May idle the system
– Respects submission order
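The two selection rules can be sketched as follows. The sketch assumes each queued job is reduced to a hypothetical `(name, procs)` pair in submission order, and that the scheduler only decides which job fills the current hole. Production schedulers weigh many more factors, so this only contrasts the two policies.

```python
def fcfs_pick(queue, free_procs):
    """First-Come-First-Serve: only the job at the head of the queue may start.
    If it does not fit the hole, nothing starts and the processors sit idle."""
    if queue and queue[0][1] <= free_procs:
        return queue[0]
    return None


def bff_pick(queue, free_procs):
    """Best-Fit-First: scan the whole queue for the job that fills the hole
    most completely; on a tie the earlier-submitted job wins."""
    best = None
    for job in queue:
        _, procs = job
        if procs <= free_procs and (best is None or procs > best[1]):
            best = job
    return best
```

Consider a 128-processor hole with a 256-processor job at the head of the queue. FCFS starts nothing and the hole idles. BFF instead jumps ahead to a 96-processor job. Repeated over a run, this jumping ahead is how BFF can starve large jobs.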

Key OS System Capabilities
– Swapping / gang-scheduling
– Job migration / compaction
– Priority preemption
– Backfill
– Disjoint partitions
– Checkpoint / restart
– Dynamically adjustable queue structures

ESP Design Goals & Attributes
– Transferable metric(s) / valid comparisons
– Reproducible
– Easily interpreted results
– Portable
– Independent of platform size and speed
– Captures the essence of a real workload
– Compact and easily distributed
– Easy to run (< 12 hours)
– Automated / no human intervention
– Focuses on utilization / factors out CPU speed
– Tests the responsiveness and adaptability of the scheduler

ESP Design
– Start with a throughput test
– Job profile determined by historical accounting data
– Find applications with appropriate size and run time
– Use two full configuration jobs to encapsulate a change of operational mode (e.g., interactive to batch)
– Submit jobs in three blocks in pseudo-random order
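One way to get a reproducible pseudo-random submission order is a seeded shuffle split into three blocks. This is a sketch under stated assumptions. The helper name `submission_blocks`, the seed value and the near-even split are illustrative choices, and the placement of the two full configuration jobs is left out.

```python
import random


def submission_blocks(jobs, n_blocks=3, seed=12345):
    """Shuffle the job list with a fixed seed and split it into n_blocks blocks.

    The seed value is an arbitrary, hypothetical choice; fixing it makes the
    'pseudo-random' order the same on every run."""
    rng = random.Random(seed)        # private generator; global state untouched
    order = list(jobs)
    rng.shuffle(order)
    size, extra = divmod(len(order), n_blocks)
    blocks, i = [], 0
    for b in range(n_blocks):
        step = size + (1 if b < extra else 0)   # spread any remainder over early blocks
        blocks.append(order[i:i + step])
        i += step
    return blocks
```

For a given Python version, the same seed yields the same order on every run. That property is what the reproducibility goal asks for.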

ESP Test Schematic
[Figure: test timeline (< 12 hours) with blocks of regular jobs, full config #1, full config #2, an optional shutdown/reboot and a ">10%" marker; a second panel shows the vanilla (throughput) variant]

Individual Applications in Jobmix

Jobmix Application Elapsed Times
[Figure: elapsed times on the T3E and SP, ordered by increasing partition size]

Platforms Tested
Cray T3E
– 512 processors
– 450 MHz Alpha EV56
– Microkernel MPP OS
– NQS and Global Resource Manager
– Oversubscription possible
– BFF strategy with dynamic queue configurations
IBM SP
– 512 processors
– 200 MHz Power3
– Semiautonomous monolithic OSes
– LoadLeveler batch queues
– FCFS with backfill (backfill disabled in the first attempt)

T3E Chronology (with swap)
[Figure annotations: insufficient work; tail-end dilemma; starvation of large jobs]
Normalized time = elapsed time / theoretical minimum time
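The normalization can be written out directly. This sketch assumes the theoretical minimum is the job mix's total processor-time divided by the machine size, i.e. a perfect packing with no idle processors. Each job is represented as a hypothetical `(procs, runtime)` pair.

```python
def theoretical_min(jobs, total_procs):
    """Lower bound on elapsed time: all work packed perfectly onto the machine.
    jobs: iterable of (procs, runtime_seconds) pairs."""
    work = sum(p * t for p, t in jobs)   # total processor-seconds in the job mix
    return work / total_procs


def normalized_elapsed(elapsed, jobs, total_procs):
    """Normalized = Elapsed / Theoretical Min; 1.0 would be a perfect packing."""
    return elapsed / theoretical_min(jobs, total_procs)
```

The reciprocal, theoretical minimum / elapsed, is the fraction of the machine's available processor-time that was spent on the job mix.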

T3E Chronology (without swap)
– Slight decrease in utilization without swap capability
– Higher overall efficiency: swapping carries significant overhead

SP Chronology
[Figure annotation: waiting for the machine to idle]

Queue Wait Times (normalized)
[Figure: jobs sorted by partition size and submit time; panels: T3E Swap, T3E NoSwap, SP]
– BFF: larger jobs wait longer
– FCFS: wait time depends less on job size
– Swap permits more jobs to run simultaneously, giving shorter wait times
– Idling twice causes three distinct regimes of wait times
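The ordering used on these plots (partition size first, then submit time) can be sketched as below. The record fields and the `scale` divisor are assumptions. The slide does not state what the wait times are normalized by, so by default the sketch leaves them unnormalized.

```python
from typing import NamedTuple


class JobRecord(NamedTuple):
    name: str
    procs: int      # partition size
    submit: float   # submission time
    start: float    # time the job actually started


def wait_profile(records, scale=1.0):
    """Return (procs, submit, name, wait / scale) rows sorted by partition size,
    then submit time; `scale` is a hypothetical normalizer (default: none)."""
    rows = [(r.procs, r.submit, r.name, (r.start - r.submit) / scale)
            for r in records]
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows
```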

Restoring Backfill on the SP
– Recognized that backfill is the standard mode for LoadLeveler
– Backfill conflicts with the ESP stipulations
– However, an invalid test shot produced interesting data

Backfill Effect I (Chronology)
[Figure panels: SP FCFS; SP FCFS with backfill]
– Highly efficient, but violates the test
– Need to backfill selectively

Backfill Effect II (Queue Wait Times)
[Figure panels: SP FCFS; SP FCFS with backfill]

Backfill and Flaw in ESP Test
[Figure annotations: FC job submitted; all jobs finish except one; guaranteed FC run time]
– Backfill works as expected, but a long-running job negates the effect of the reservation time; finer-granularity jobs are needed
– Stipulation for full configuration (FC) jobs?
  1. Run immediately (possibly terminating running jobs prematurely): T3E
  2. Run after current jobs finish: SP with backfill
  3. Launch no further jobs until the FC job finishes: SP
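To show how a reservation interacts with backfill, here is a minimal sketch of the generic EASY backfilling rule. EASY is a widely used technique; this is not LoadLeveler's actual implementation. The sketch assumes runtime estimates are exact. Each running job is a hypothetical `(procs, expected_end)` pair that releases its processors at its expected end time. `head_need` is the processor count of the reserved job, e.g. the whole machine for an FC job.

```python
def shadow_time(now, free, running, need):
    """Earliest time at which `need` processors are free, plus the processors
    left over at that moment ('extra'). running: (procs, expected_end) pairs."""
    if need <= free:
        return now, free - need
    avail = free
    for procs, end in sorted(running, key=lambda r: r[1]):
        avail += procs                       # this job releases its processors
        if avail >= need:
            return end, avail - need
    raise ValueError("request exceeds machine size")


def can_backfill(now, free, running, head_need, cand_procs, cand_runtime):
    """EASY-style test: may a queued job start now without delaying the
    reservation held for the job at the head of the queue?"""
    if cand_procs > free:
        return False
    shadow, extra = shadow_time(now, free, running, head_need)
    # Either the candidate finishes before the reservation begins,
    # or it only occupies processors the reserved job will not need.
    return now + cand_runtime <= shadow or cand_procs <= extra
```

Suppose an FC job is reserved behind a 448-processor job that ends at t = 100. A 64-processor job of 50 s may backfill, but one of 150 s may not. If the running job instead ends at t = 10000, the reservation moves out that far and much longer jobs qualify. This matches the slide's point that one long-running job negates the reservation.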

Further Design Issues
– How should the test end?
– Is it possible to use backfill (globally or selectively)?
– Can we formulate a turnaround metric?
– Scalability in size and speed
– Finer granularity of jobs relative to the overall test length
– An additional vanilla throughput test may be needed to evaluate pure scheduler performance

Conclusions & Observations
– SP: can achieve very high utilization with backfill and no topology constraints
– SP: lacks adaptability to a dynamic workload (run-ASAP mode)
– T3E: swapping with high overhead degrades utilization
– T3E: can adapt to dynamic workload requirements

Ongoing and Future Work
– Scheduled test runs on the 512-way Origin 2000 and the Compaq SC
– Vanilla throughput runs on the T3E and SP
– Redesign for the next version of ESP
– Distribute ESP to other interested sites
