Published by Estella McCormick. Modified over 9 years ago.
1
Scalable Systems Software Center
Resource Management and Accounting Working Group
Face-to-Face Meeting
February 24-25, 2003
2
Resource Management and Accounting Working Group
Working group scope
Progress over last quarter
Next steps
Topics for group consideration
3
Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting.
This working group will focus on the following software components:
– Queue Manager
– Scheduler
– Allocation Manager (and accounting)
– Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
– Process Manager
– Cluster Monitor
4
Proposed Component Architecture
[Diagram] Components: Queue Manager, Allocation Manager, Node Monitor, Meta Scheduler, Local Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service, Event Manager
Color key (by working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services
5
Resource Management Prototype Demonstration
[Diagram] Components: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager, Discovery Service
Color key (by working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure
Numbered messages in the diagram:
0 Service-Lookup
1 Submit-Job
2 Query-Job
3 Query-Node
4 Create-Reservation
5 Run-Job
6 Exec-Process
7 Query-Job
8 Delete-Job
9 Withdraw-Allocation
This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
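The end of the demo (a job running past its wallclock limit) can be sketched as a small simulation. This is only an illustration in Python: the `Job` class, its fields, and the `enforce_wallclock` function are hypothetical, and reading messages 7 to 9 (Query-Job, Delete-Job, Withdraw-Allocation) as the wallclock enforcement path is an assumption, since the slide text does not say which component sends each message.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are not taken from the slides."""
    job_id: str
    wallclock_limit: int      # seconds the job is allowed to run
    elapsed: int = 0          # seconds the job has run so far
    state: str = "Running"


def enforce_wallclock(job, trace):
    """Query the job and delete it once it has exceeded its limit.

    The message names appended to `trace` come from the slide; the
    decision logic is an assumed reading of the demo.
    """
    trace.append("Query-Job")
    if job.state == "Running" and job.elapsed > job.wallclock_limit:
        trace.append("Delete-Job")
        job.state = "Deleted"
        trace.append("Withdraw-Allocation")
    return job.state


trace = []
job = Job("demo.1", wallclock_limit=60)
for second in (30, 61):       # poll once within the limit, once past it
    job.elapsed = second
    enforce_wallclock(job, trace)
print(trace)
```

The first poll only queries the job; the second finds it over the limit, deletes it, and withdraws its allocation.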
6
General Progress
Released v1.0 Initial SSS Resource Management Suite
– OpenPBS-SSS 2.3.15-1
– Maui Scheduler 3.2.6
– QBank 2.10.4 (accounting system)
Website created and software available for download (intended for friendly beta testers)
SSSRMAP protocol (using HTTP) validated in Maui Scheduler, Queue Manager, PBS front-end, and Gold Allocation Manager (complex query support validated and its utility shown across a range of usage scenarios)
Scalability testing performed on all components
7
Scheduler Progress
Scheduler implemented interfaces for the system monitor, the event manager, and the service directory, as well as a scheduling extension interface (allowing scheduling plug-ins to add scheduling algorithms and capabilities)
Enhanced native support for LoadLeveler, PBS, SGE, LSF, and BProc based systems
Significantly enhanced web-based scheduler documentation; additional scheduler man pages for select commands
SSS Requirements document completed
8
Scheduler Progress
Security improvements
– Support for DES, HMAC, MD5, and external source secret key based algorithms has been implemented for client/server authentication
– Improved buffer overflow protection has been added to critical scheduler interfaces
– A generalized secret key management facility has been implemented for secure multi-party communication
Scalability improvements
– decreasing memory consumption by over 80%
– enabling support for up to 8,000 nodes
– enabling support for up to 32,000 processors
– enabling support for up to 2,000 simultaneous active jobs
– enabling support for jobs requesting up to 16,000 hosts
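Secret key based client/server authentication of the kind listed above can be sketched with Python's standard `hmac` module. This is not the scheduler's implementation: the function names and message contents are hypothetical, and SHA-256 is swapped in as the digest because MD5 is no longer recommended for new designs.

```python
import hashlib
import hmac


def sign(secret_key: bytes, message: bytes) -> str:
    """Client side: compute an HMAC tag over the request with the shared key."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Server side: recompute the tag and compare in constant time."""
    expected = sign(secret_key, message)
    return hmac.compare_digest(expected, signature)


shared_key = b"example-shared-secret"          # hypothetical key
request = b"Query-Job job_id=demo.1"           # hypothetical request text
tag = sign(shared_key, request)

print(verify(shared_key, request, tag))        # request accepted
print(verify(shared_key, request + b"x", tag)) # tampered request rejected
```

Both sides hold the same key, so the server can detect a modified request or a client that does not know the secret without the key ever crossing the wire.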
9
Scheduler Progress
Fault Tolerance
– migration of all Resource Manager calls to a threaded Resource Manager interface (enabling scheduler survival of interface hangs and crashes)
– incorporation of Resource Manager and Allocation Manager diagnostics and failure tracking statistics
– implementation of improved data checking and handling routines to detect and correct corrupt Resource Manager data
Dynamic job support interfaces have been designed
Limited support for generic resources has been enabled (e.g., software licenses, network bandwidth, global disk caches)
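The idea behind a threaded Resource Manager interface, where the caller survives a hung or crashed call, can be sketched in Python with a worker thread and a timeout. All names here (`call_resource_manager`, `failure_counts`, the three query functions) are hypothetical and only illustrate the pattern; the scheduler's own code is not shown in the slides.

```python
import concurrent.futures
import time

# Hypothetical failure tracking statistics, keyed by failure kind.
failure_counts = {"timeout": 0, "error": 0}


def call_resource_manager(func, timeout_s, *args):
    """Run a resource manager call in a worker thread.

    Returns (True, result) on success and (False, None) if the call
    hangs past timeout_s or raises. A hung worker is abandoned, not
    killed; it finishes (or not) in the background.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        failure_counts["timeout"] += 1
        return False, None
    except Exception:
        failure_counts["error"] += 1
        return False, None
    finally:
        pool.shutdown(wait=False)   # do not block on a hung worker


def healthy_query():
    return ["node1", "node2"]


def hung_query():
    time.sleep(0.5)                 # stands in for an interface hang
    return []


def crashing_query():
    raise RuntimeError("resource manager connection lost")


print(call_resource_manager(healthy_query, 1.0))
print(call_resource_manager(hung_query, 0.05))
print(call_resource_manager(crashing_query, 1.0))
print(failure_counts)
```

The caller gets control back after the timeout in every case, and the counters give a simple basis for the kind of diagnostics the slide mentions.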
10
Queue Manager Progress
Both the Ames Queue Manager and the PNNL PBS front-end have implemented and validated the SSSRMAP HTTP interface
Replaced third-party XML parser with SSS-created routines
Created Resource Management Suite Software website
PNNL created and tested patches for PBS scalability improvements and packaged them as RPMs (and tarball + patch) for beta distribution
Requirements document completed
Updated Process-Manager interface for new XML schema
Ames Queue-Manager has implemented a nearly complete PBS-like command line interface
11
Accounting and Allocation Manager Progress
QBank
– A test harness was installed, test suites created, significant testing performed, and bugs fixed
– Security was strengthened (new qauth uses libcrypto and keeps the key in a separate file, for greater stability and so that binary versions can be distributed)
– The install process for QBank was streamlined and made non-interactive
– Packaged in RPMs and tarballs for Linux and released in v1.0 SSS Resource Management System
– Documentation was significantly improved, including the creation of a user guide, a deployment guide, man pages, and updated online documentation
12
Accounting and Allocation Manager Progress
Gold
– Time-travel implemented
– Initial support for object-joined queries
– Implemented Reservations
– Implemented Balance Checking
Scalability Testing
– Component-level testing was done to test timings to perform barrages of common accounting and allocation operations (charges, reservations, balance checks, etc.)
– Simulations were performed with the Maui Scheduler to test transaction times with the allocation manager interface
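The allocation operations named above (balance checks, reservations, charges) can be illustrated with a toy in-memory ledger. This Python sketch is not Gold's data model or API; the `Allocation` class and its method names are hypothetical, and credits are plain integers.

```python
class Allocation:
    """Toy allocation for one project (hypothetical, not Gold's schema)."""

    def __init__(self, credits):
        self.credits = credits      # credits still owned by the project
        self.reserved = {}          # job_id -> credits held for that job

    def balance(self):
        """Balance check: credits not already held by a reservation."""
        return self.credits - sum(self.reserved.values())

    def reserve(self, job_id, amount):
        """Hold credits before a job starts; refuse if balance is short."""
        if amount > self.balance():
            return False
        self.reserved[job_id] = amount
        return True

    def charge(self, job_id, amount):
        """Release the job's hold and debit what the job actually used."""
        self.reserved.pop(job_id, None)
        self.credits -= amount


acct = Allocation(100)
print(acct.reserve("job1", 60))     # hold 60 credits for job1
print(acct.balance())               # 40 credits remain unreserved
print(acct.reserve("job2", 50))     # refused: only 40 available
acct.charge("job1", 45)             # job1 used less than it reserved
print(acct.balance())               # 55 credits remain
```

Reserving before a job runs and charging afterwards keeps concurrent jobs from jointly overspending an allocation, which is the usual motivation for separating the two operations.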
13
Meta-Scheduler Progress
SSS Requirements document completed
Support has been added for Globus 2.0 and 2.2 based job staging
The initial information service interface has been designed
Security has been enhanced by adding Globus credential caching and enabling generalized secret session key management
Support has been added for retrying resources
Additional functionality includes the basic data management interface and an initial file staging capability
14
Next Work
Release v2 SSS Resource Management and Accounting interface specification
Implement and test SSSRMAP security authentication
Try to get more components under a testing framework
Portability enhancements (AIX, Tru64, possibly Cray)
15
Next Work: Local Scheduler
Test interaction with checkpoint/restart mechanisms when interfaces are ready
Virtual partitioning through resource limit enforcement and tracking
Quality of service support for completion time guarantees
Security integration
Progress on graphical interfaces
16
Next Work: Queue Manager
Implement persistence via database (replacing flat files)
Add Epilogue/Prologue support and job submission verification script
Interface with Node Monitor
Full PBS qsub compatibility (nearly complete)
Implement full input/output handling (need to define PM interfaces, if any)
Add interface with Node Manager to support job-dependent node OS image installation
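Moving queue state from flat files to a database could look roughly like the following Python sketch. SQLite is used only because it ships with Python; the slides do not name a database, and the table layout and function names are hypothetical.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the job table; pass a file path for real persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, owner TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def submit(conn, job_id, owner):
    """Insert a new job in the Queued state inside one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO jobs VALUES (?, ?, 'Queued')", (job_id, owner)
        )


def set_state(conn, job_id, state):
    """Record a state change; the update is committed atomically."""
    with conn:
        conn.execute(
            "UPDATE jobs SET state = ? WHERE job_id = ?", (state, job_id)
        )


def get_state(conn, job_id):
    """Return the job's state, or None if the job is unknown."""
    row = conn.execute(
        "SELECT state FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None


conn = open_queue()
submit(conn, "demo.1", "amy")
set_state(conn, "demo.1", "Running")
print(get_state(conn, "demo.1"))
```

Compared with flat files, each state change is a transaction, so a crash in the middle of an update does not leave a half-written job record.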
17
Next Work: Accounting and Allocation Manager
Quotations (Gold)
Flexible charging (Gold)
Continuing effort on open source of new and old Allocation Managers
SSSRMAP XML Security integration (Gold)
Support for operations on returned fields (sort, sum, max, unique, group by, etc.)
Begin portability testing for Gold and QBank
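The planned operations on returned fields (sort, sum, max, unique, group by) are shown below on a few made-up charge records. This Python sketch only illustrates what each operation returns; the row layout, field names, and the `group_sum` helper are hypothetical and say nothing about how Gold would expose them through SSSRMAP.

```python
from collections import defaultdict

# Made-up query result: one row per charge.
ROWS = [
    {"user": "amy", "project": "chem", "charge": 40},
    {"user": "bob", "project": "bio", "charge": 25},
    {"user": "amy", "project": "bio", "charge": 10},
]


def group_sum(rows, key, field):
    """Group rows by `key` and sum `field` within each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)


sorted_rows = sorted(ROWS, key=lambda r: r["charge"])   # sort by charge
total = sum(r["charge"] for r in ROWS)                  # sum
largest = max(r["charge"] for r in ROWS)                # max
unique_users = sorted({r["user"] for r in ROWS})        # unique
by_user = group_sum(ROWS, "user", "charge")             # group by user

print([r["charge"] for r in sorted_rows])
print(total, largest, unique_users, by_user)
```

Doing these reductions on the server side rather than in each client keeps large result sets off the wire, which is one plausible reason to add them to the query interface.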
18
Issues requiring inter-group discussion