Allocation Management Solutions for High Performance Computing. Scott M. Jackson. Workshop on Scheduling and Resource Management for Parallel and Distributed Systems.

1 Allocation Management Solutions for High Performance Computing
Scott M. Jackson
Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS)
Parallel and Distributed Processing Techniques and Applications (PDPTA'05)
June 27, 2005, Las Vegas, USA

2 Allocation Manager
Charges for resource usage.

3 Typical problems faced by HPC managers
- Problem: Resources are hogged by the projects or project members that submit jobs most aggressively.
  Solution: Fair share; fine-grained allocations.
- Problem: It is difficult to know how much additional work to accept on the machine or how big an upgrade is needed.
  Solution: Expiring allocations (the brick approach).
- Problem: Projects start slowly, then simultaneously demand that their entire allocations be fulfilled at the end of the project cycle.
  Solution: Expiring allocations (use-it-or-lose-it).
- Problem: Users can exceed their allocations until accounting is synchronized with the resource management system.
  Solution: Dynamic charging; reservations.
- Problem: You would like to delegate the management and distribution of credits to the established project management hierarchy.
  Solution: Role-based access control; nested accounts.
- Problem: You want to share resources with other organizations but face issues of security, autonomy, and accountability.
  Solution: Strong security; distributed accounting; charge quotes.
- Problem: Each site wants to track and perform historical queries on site-specific accounting statistics.
  Solution: Custom accounting; flexible charging; journaling.
- Problem: You want the accounting system to be transparent to the user.
  Solution: Default projects; credit accounts; auto-generation.

4 Allocation Manager
Gold is an open source allocation system that controls project usage on High Performance Computers. It:
- Ensures resources are used according to the mission plan
- Tracks resource utilization
- Allows for insightful capacity planning
- Facilitates resource sharing between organizations
- Is used in production at PNNL and other sites

5 Resource Hogging
Perhaps the greatest single resource allocation challenge on a High Performance Computer is that of fairly distributing the computing resources among the parties that want to use them. Without some form of allocation control, projects and users would consume computing resources based solely on how aggressively they submit their workload rather than on any site-managed policies or priorities.

6 Fair Share Scheduling
Job priorities are increased or decreased based on short-term historical usage. Systems that employ fair share scheduling include LSF and the Maui Scheduler.
Limitations: does not factor in distant prior usage; cannot enforce absolute allocation limits.
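To make the idea concrete, here is a minimal Python sketch of fair-share prioritization under assumed parameters: usage is tracked per interval, older intervals are down-weighted by a hypothetical decay factor, and each party's priority moves up or down by how far its decayed share sits from a target share. The function names, the decay of 0.5, and the weight of 100 are illustrative choices; this is not the formula LSF or the Maui Scheduler actually uses.

```python
# Hypothetical fair-share sketch: exponentially decayed usage,
# not the actual LSF or Maui Scheduler formula.


def decayed_usage(history, decay=0.5):
    """Sum per-interval usage with older intervals down-weighted.

    history[0] is the most recent interval; history[i] is weighted by decay**i.
    """
    return sum(amount * decay**i for i, amount in enumerate(history))


def fairshare_priorities(usage, targets, base=0.0, weight=100.0, decay=0.5):
    """Return a priority per party: above base if under its target share, below if over.

    usage:   party -> list of per-interval usage (most recent first)
    targets: party -> target fraction of the machine (fractions should sum to 1)
    """
    decayed = {party: decayed_usage(history, decay) for party, history in usage.items()}
    total = sum(decayed.values())
    priorities = {}
    for party, target in targets.items():
        actual = decayed.get(party, 0.0) / total if total > 0 else 0.0
        priorities[party] = base + weight * (target - actual)
    return priorities


if __name__ == "__main__":
    usage = {"biology": [30, 10], "chemistry": [10, 10]}
    targets = {"biology": 0.5, "chemistry": 0.5}
    print(fairshare_priorities(usage, targets))
```

Because old intervals are multiplied by ever-smaller powers of the decay factor, usage from the distant past contributes almost nothing, and a low priority only delays a party's jobs rather than capping its total consumption; both behaviors match the limitations listed on the slide.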

7 Allocations
An allocation is an allotment of computational resource credits for use by specific projects or users during a specified time period. When jobs complete, projects are charged and resource usage is recorded. Jobs are prevented from starting unless they have an active allocation with a sufficient balance.
Limitations: does not manage short-term usage; does not impact scheduling priority.

Project   | Users     | Machine | Time Period | Balance
Biology   | Amy, Bob  | Colony  | FY2005      | 100,000
Chemistry | Bob, Dave | Colony  | FY2005      | 150,000
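The start-time check and completion-time charge described above can be sketched as a small in-memory bank. This is a hypothetical Python model, not Gold's interface: the class and method names are invented for illustration, the rows mirror the example table, and credits are treated as plain integers.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """One row of the allocation table: who may use which machine, when, and how much."""
    project: str
    users: set
    machine: str
    period: str
    balance: int


class AllocationBank:
    """Hypothetical in-memory bank; the method names are illustrative, not Gold's API."""

    def __init__(self):
        self.allocations = {}  # (project, machine, period) -> Allocation
        self.usage_log = []    # (project, user, machine, period, amount) records

    def add(self, alloc):
        self.allocations[(alloc.project, alloc.machine, alloc.period)] = alloc

    def can_start(self, project, user, machine, period, estimated_charge):
        """A job may start only against an active allocation with a sufficient balance."""
        alloc = self.allocations.get((project, machine, period))
        return (
            alloc is not None
            and user in alloc.users
            and alloc.balance >= estimated_charge
        )

    def charge(self, project, user, machine, period, amount):
        """Called when a job completes: debit the allocation and record the usage."""
        alloc = self.allocations[(project, machine, period)]
        alloc.balance -= amount
        self.usage_log.append((project, user, machine, period, amount))


if __name__ == "__main__":
    bank = AllocationBank()
    bank.add(Allocation("Biology", {"Amy", "Bob"}, "Colony", "FY2005", 100_000))
    bank.add(Allocation("Chemistry", {"Bob", "Dave"}, "Colony", "FY2005", 150_000))
    print(bank.can_start("Biology", "Amy", "Colony", "FY2005", 5_000))   # True
    print(bank.can_start("Biology", "Dave", "Colony", "FY2005", 5_000))  # False: not a member
```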

8 Capacity and Workload Planning: The Brick Approach
Controlling project usage is key to capacity and workload planning. Regularly expiring allocations let you establish a project cycle and use the brick approach to allocation management.
[Figure: allocation "bricks" stacked against 100% capacity on a timeline of quarters from -3 Qtr through Now to +4 Qtr]
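One way to read the brick approach is as a capacity check: each project's expiring allocation is a brick covering a run of quarters, and a new brick is accepted only if no quarter's total exceeds 100% of capacity. The sketch below assumes bricks are expressed as percentages of machine capacity per quarter, with quarters numbered relative to now; the function names and example numbers are hypothetical.

```python
# Hypothetical capacity check for the brick approach: each brick is a project's
# allotment for a run of quarters, expressed as a percentage of machine capacity.


def quarterly_load(bricks):
    """Sum every brick's per-quarter amount into a quarter -> load map.

    Each brick is (project, first_quarter, last_quarter, percent_per_quarter),
    with quarters counted relative to now (e.g. -1, 0, +1) and both ends inclusive.
    """
    load = {}
    for _project, first, last, percent in bricks:
        for quarter in range(first, last + 1):
            load[quarter] = load.get(quarter, 0) + percent
    return load


def fits(bricks, new_brick, capacity=100):
    """True if adding new_brick keeps every quarter at or below capacity."""
    load = quarterly_load(bricks + [new_brick])
    return all(total <= capacity for total in load.values())


if __name__ == "__main__":
    committed = [("A", -1, 2, 40), ("B", 0, 3, 50)]
    print(fits(committed, ("C", 1, 1, 20)))  # False: quarter +1 would reach 110%
    print(fits(committed, ("D", 3, 4, 30)))  # True: quarter +3 reaches 80%, +4 reaches 30%
```

Because every brick expires at a known quarter, the free space in future quarters is explicit, which is what makes it possible to judge how much additional work to accept.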

9 Year-end Resource Exhaustion
It is common to see projects get off to a slow start and then have a surge of activity at the end when results are due. Without careful management of expectations, this can be a cause of considerable anxiety.
[Figure: Project A's allotment versus Project A's demand over the project period, against 100% machine capacity]

10 Expiring Allocations
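The slide carries only its title, but the use-it-or-lose-it policy from slide 3 can be sketched as allocations that are spendable only inside their time window. In this hypothetical Python model, each allocation has an inclusive start date and an exclusive end date, unused credit simply stops counting after the end date, and debits draw on the soonest-expiring allocation first. Calendar quarters stand in for the fiscal quarters used elsewhere in the deck; that mapping is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExpiringAllocation:
    period: str
    start: date   # first day the allocation may be used (inclusive)
    end: date     # day the allocation expires (exclusive)
    balance: int


def active(allocations, today):
    return [a for a in allocations if a.start <= today < a.end]


def available_balance(allocations, today):
    """Only allocations whose window contains today count; expired credit is ignored."""
    return sum(a.balance for a in active(allocations, today))


def debit(allocations, today, amount):
    """Debit active allocations, soonest-expiring first, so credit is used before it is lost."""
    if amount > available_balance(allocations, today):
        raise ValueError("insufficient active balance")
    for alloc in sorted(active(allocations, today), key=lambda a: a.end):
        take = min(alloc.balance, amount)
        alloc.balance -= take
        amount -= take
        if amount == 0:
            break


if __name__ == "__main__":
    quarters = [
        ExpiringAllocation("Q1", date(2004, 1, 1), date(2004, 4, 1), 1_000_000),
        ExpiringAllocation("Q2", date(2004, 4, 1), date(2004, 7, 1), 1_000_000),
    ]
    debit(quarters, date(2004, 2, 15), 400_000)
    print(available_balance(quarters, date(2004, 2, 15)))  # 600000
    print(available_balance(quarters, date(2004, 4, 1)))   # 1000000: Q1's remainder expired
```

Since unspent Q1 credit does not roll into Q2, a project cannot bank a slow start and cash it all in at the end of the cycle.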

11 End of Life Spending Spree
Offline banks cannot strictly enforce allocations: without reservations, more jobs can be started than the allocation balance can support.
[Figure: job debit accounts]
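The overspending problem is easy to reproduce with a toy model. The sketch below assumes an offline bank that posts charges only when jobs complete, so every job submitted before the first completion is checked against the same balance; the function name and numbers are hypothetical.

```python
def jobs_started(balance, job_costs, use_reservations):
    """Count jobs that pass the start-time balance check when every job starts
    before any of them completes (so no charge has been posted yet).

    Without reservations the check sees the full balance every time; with
    reservations each started job holds back its estimated cost.
    """
    reserved = 0
    started = 0
    for cost in job_costs:
        available = balance - reserved if use_reservations else balance
        if available >= cost:
            started += 1
            if use_reservations:
                reserved += cost
    return started


if __name__ == "__main__":
    print(jobs_started(100, [50, 50, 50], use_reservations=False))  # 3 jobs, 150 credits of work
    print(jobs_started(100, [50, 50, 50], use_reservations=True))   # 2 jobs, within the balance
```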

12 Reservations and Dynamic Charging
Components: Accounting Manager (Gold), Scheduler (Maui), Resource Manager (PBS, LL, LSF).
0. Deposits are made in an account.
1. A job is submitted.
2. A quote is requested.
3. A reservation is made.
4. The job is allowed to start.
5. The job completes.
6. The reservation is removed and a charge is issued.
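The numbered steps can be sketched as a small accounting manager that holds credits in reservations between quote and charge. This Python model is hypothetical: the per-processor-second rate, the method names, and the credit units are assumptions rather than Gold's actual interface. Steps 1, 4, and 5 (submission, start, and completion) belong to the scheduler and resource manager, so they appear only as the points at which the other calls are made.

```python
import itertools


class AccountingManager:
    """Hypothetical accounting manager following the deposit/quote/reserve/charge steps.

    The rate and method names are illustrative assumptions, not Gold's interface.
    """

    def __init__(self, rate_per_processor_second=1.0):
        self.rate = rate_per_processor_second
        self.balances = {}      # account -> credits
        self.reservations = {}  # reservation id -> (account, amount held)
        self._ids = itertools.count(1)

    def deposit(self, account, amount):  # step 0
        self.balances[account] = self.balances.get(account, 0) + amount

    def quote(self, processors, wallclock_seconds):  # step 2
        return processors * wallclock_seconds * self.rate

    def available(self, account):
        """Balance minus everything currently held by reservations."""
        held = sum(amt for acct, amt in self.reservations.values() if acct == account)
        return self.balances.get(account, 0) - held

    def reserve(self, account, amount):  # step 3
        """Hold credits for a job; return a reservation id, or None if funds are short."""
        if self.available(account) < amount:
            return None
        rid = next(self._ids)
        self.reservations[rid] = (account, amount)
        return rid

    def charge(self, reservation_id, actual_amount):  # step 6
        """Remove the reservation and debit what the job actually used."""
        account, _held = self.reservations.pop(reservation_id)
        self.balances[account] -= actual_amount


if __name__ == "__main__":
    bank = AccountingManager()
    bank.deposit("chemistry", 10_000)          # step 0
    cost = bank.quote(4, 1_000)                # step 2, after the job is submitted
    rid = bank.reserve("chemistry", cost)      # step 3; the scheduler may now start the job
    print(bank.available("chemistry"))         # 6000.0
    print(bank.reserve("chemistry", 7_000))    # None: a second job is held back
    bank.charge(rid, 3_000)                    # step 6, after the job finishes early
    print(bank.available("chemistry"))         # 7000
```

Charging the actual usage rather than the quote lets a job that finishes early return its unused reservation to the account.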


14 Background
- SciDAC (Scientific Discovery through Advanced Computing): a DOE initiative to improve the impact of scientific computing.
- Scalable Systems Software Project (SSS): research, develop, and support an integrated suite of systems software and tools for the effective management and utilization of DOE's terascale computational resources. It is a five-year project involving over a dozen sites.
- QBank: Gold is based on a successful program, QBank, that has been used for years on dozens of government and university computing systems.

15 Allocation Architecture
[Figure: a universe of projects, a universe of users, and a universe of machines, tied together through accounts such as Account 1234 (Name: Chemistry)]

Account 1234 (Chemistry):
Time Period | Allocation
F1Q04       | 1,000,000
F2Q04       | 1,000,000
F3Q04       | 1,000,000
F4Q04       | 1,000,000

Account 1234 (Chemistry), with credit limits:
Time Period | Allocation | Credit Limit
F1Q04       | 1,000,000  | 10,000
F2Q04       | 1,000,000  | 10,000
F3Q04       | 1,000,000  | 10,000
F4Q04       | 1,000,000  | 10,000
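The second table pairs each quarterly allocation with a credit limit. A plausible reading, sketched below as an assumption, is that the credit limit lets an account's balance go negative by at most that amount before charges are refused. The class and field names are hypothetical; the figures come from the table for account 1234.

```python
from dataclasses import dataclass


@dataclass
class QuarterAllocation:
    """One row of the account's allocation table (hypothetical model)."""
    period: str
    allocation: int
    credit_limit: int = 0
    used: int = 0

    def remaining(self):
        return self.allocation - self.used

    def can_charge(self, amount):
        # Assumed semantics: the balance may go negative by at most credit_limit.
        return self.remaining() + self.credit_limit >= amount

    def charge(self, amount):
        if not self.can_charge(amount):
            raise ValueError(f"{self.period}: charge exceeds allocation plus credit limit")
        self.used += amount


if __name__ == "__main__":
    chemistry = [QuarterAllocation(f"F{q}Q04", 1_000_000, 10_000) for q in range(1, 5)]
    q1 = chemistry[0]
    q1.charge(995_000)
    print(q1.can_charge(15_000))  # True: 5,000 left plus 10,000 of credit
    print(q1.can_charge(15_001))  # False
```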

16 Other Gold Features
- Flexible charging
- Credit and debit models
- Guaranteed quotes
- Powerful querying
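Flexible charging is listed without detail. As a hedged illustration only, a charge might be computed as a per-resource rate times the amount used, summed over resources, then multiplied by wallclock time and by any extra factors. The resource names, rates, and multiplier below are hypothetical, not Gold's charge-rate configuration.

```python
import math


def job_charge(resources, wallclock_seconds, rates, multipliers=()):
    """Charge = (sum of rate * amount over resources) * wallclock time * each multiplier.

    resources:   resource name -> amount used (e.g. {"processors": 8})
    rates:       resource name -> credits per unit per second
    multipliers: extra factors, e.g. a hypothetical premium for high-priority service
    """
    per_second = sum(rates[name] * amount for name, amount in resources.items())
    # math.prod(()) is 1, so no multipliers leaves the base charge unchanged.
    return per_second * wallclock_seconds * math.prod(multipliers)


if __name__ == "__main__":
    rates = {"processors": 1.0, "memory_gb": 0.1}
    print(job_charge({"processors": 8, "memory_gb": 16}, 3_600, rates, (2.0,)))
```

Keeping rates and multipliers as data rather than code is one way a site could tailor charging without changing the accounting logic.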

17 Other Gold Features
- Strong security
- Historical journaling
- Transparency features
- Dynamically customizable

18 Web-based GUI

19 Facilitates Resource Sharing Between Organizations

20 Status
- In production use on the PNNL 11.8 TF system and at least one other site; being evaluated at others
- Currently in second beta release
- General release targeted for the 3rd quarter of 2005
Download: http://www.emsl.pnl.gov/docs/mscf/gold
For more information: Scott.Jackson@pnl.gov
http://www.emsl.pnl.gov/docs/mscf/gold
http://sss.scl.ameslab.gov/home.shtml
http://www.scidac.org/ScalableSystems

21 Additional Slides

22 Nested Accounts
[Figure: nested account hierarchy under University, including Biology, Actinides, Chemistry, Physics, Chem 101, Chem 201, and Workshop accounts]
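The hierarchy in the figure can only be partly recovered, but the idea behind nested accounts (slide 3: delegating credit distribution to the established project management hierarchy) can be sketched as a tree in which a parent hands portions of its credit to its sub-accounts. In this hypothetical Python model, placing Chem 101 under Chemistry under University is an assumption made for illustration, as are the class and method names.

```python
class Account:
    """Hypothetical nested account: a parent holds credits and hands portions to its children."""

    def __init__(self, name, balance=0, parent=None):
        self.name = name
        self.balance = balance
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def allocate_to(self, child, amount):
        """Move credits from this account to one of its direct sub-accounts."""
        if child.parent is not self:
            raise ValueError(f"{child.name} is not a sub-account of {self.name}")
        if amount > self.balance:
            raise ValueError(f"{self.name} has only {self.balance} credits")
        self.balance -= amount
        child.balance += amount

    def total(self):
        """Credits held by this account and everything nested beneath it."""
        return self.balance + sum(child.total() for child in self.children)


if __name__ == "__main__":
    university = Account("University", 100)
    chemistry = Account("Chemistry", parent=university)
    chem101 = Account("Chem 101", parent=chemistry)
    university.allocate_to(chemistry, 50)  # the university delegates to the department
    chemistry.allocate_to(chem101, 20)     # the department delegates to a course
    print(university.total(), chemistry.balance, chem101.balance)  # 100 30 20
```

Because each transfer only moves credit down one level, the total at the root never changes, and each manager can redistribute only what was delegated to them.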

23 Gold Welcome Screen

24 List Accounts

25 Modify Projects

26 Account Statement

