
Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting June 5-6, 2003.


1 Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting June 5-6, 2003

2 Resource Management and Accounting Working Group
– Working group scope
– Progress over last quarter
– Next steps
– Topics for group consideration

3 Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting. This working group will focus on the following software components:
– Queue Manager
– Scheduler
– Accounting and Allocation Manager
– Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
– Process Manager
– Cluster Monitor

4 Proposed Component Architecture
[Diagram: Queue Manager, Allocation Manager, Node Monitor, Meta Scheduler, Local Scheduler, Node Manager, Process Manager, Security, System Information Service, Discovery Service and Event Manager, color-keyed by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services]

5 Resource Management Prototype Demonstration
[Diagram: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager and Discovery Service, color-keyed by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure]
Message sequence:
– Step 0: Service-Lookup
– Step 1: Submit-Job
– Step 2: Query-Job
– Step 3: Query-Node
– Step 4: Create-Reservation
– Step 5: Run-Job
– Step 6: Exec-Process
– Step 7: Query-Job
– Step 8: Delete-Job
– Step 9: Withdraw-Allocation
This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
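The ten numbered messages of the demonstration can be replayed as a single in-process simulation. This is a hypothetical sketch, not the prototype's code: only the message names come from the slide, while the `Job` fields, the per-node-second charge rate and the rule that the job is deleted once it reaches its wallclock limit are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical job record; field names are illustrative, not from the SSS Job Object spec.
@dataclass
class Job:
    job_id: str
    nodes: int
    wallclock_limit: int  # seconds requested at submission
    actual_runtime: int   # seconds the job would run if never stopped

def run_demo(job: Job, rate_per_node_second: float = 1.0) -> List[Tuple[int, str, object]]:
    """Replay the demo's message sequence and return (step, message, detail) tuples."""
    log: List[Tuple[int, str, object]] = []

    def send(step: int, message: str, detail: object) -> None:
        log.append((step, message, detail))

    send(0, "Service-Lookup", "client asks the discovery service for the queue manager")
    send(1, "Submit-Job", job.job_id)
    send(2, "Query-Job", "scheduler reads the queued job")
    send(3, "Query-Node", "scheduler checks free nodes")
    send(4, "Create-Reservation", f"{job.nodes} nodes for {job.wallclock_limit} s")
    send(5, "Run-Job", job.job_id)
    send(6, "Exec-Process", "process manager starts the job's tasks")

    # Assumption: the job is stopped once it reaches its wallclock limit.
    elapsed = min(job.actual_runtime, job.wallclock_limit)
    send(7, "Query-Job", f"job state checked at {elapsed} s")
    if job.actual_runtime > job.wallclock_limit:
        send(8, "Delete-Job", "wallclock limit exceeded")

    # Hypothetical charging rule: bill the node-seconds actually consumed.
    send(9, "Withdraw-Allocation", job.nodes * elapsed * rate_per_node_second)
    return log
```

For example, `run_demo(Job("job.1", nodes=4, wallclock_limit=60, actual_runtime=90))` emits all ten messages in the slide's order and ends with a withdrawal of 240.0; a job that finishes within its limit skips Delete-Job.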

6 General Progress
SSS front-end created for QBank
Ready for re-release of v1.0 Initial SSS Resource Management Suite
– OpenPBS-SSS 2.3.15-1 + sss_xml front-end
– Maui Scheduler 3.2.6
– QBank 2.10.4 + sss_xml front-end
Created Job Object Specification version 2.0
– Takes into account all stages of a job's lifecycle
– Support for job steps, preferences, request choices, charging, meta-scheduling, dynamic jobs, multi-task jobs, awareness policy
– Distinguishes between requested, utilized and dedicated properties
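The distinction between requested, utilized and dedicated properties can be pictured as three values attached to each job attribute. The sketch below assumes numeric properties; the class name, field names and the efficiency ratio are hypothetical and are not taken from the Job Object Specification itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobProperty:
    """One job attribute tracked at three lifecycle stages (hypothetical layout)."""
    name: str
    requested: Optional[float] = None  # what the user asked for at submission
    dedicated: Optional[float] = None  # what the scheduler set aside for the job
    utilized: Optional[float] = None   # what the job actually consumed

    def efficiency(self) -> Optional[float]:
        """Fraction of the dedicated amount that was used, if both values are known."""
        if not self.dedicated or self.utilized is None:
            return None
        return self.utilized / self.dedicated
```

For instance, `JobProperty("Processors", requested=16, dedicated=16, utilized=12).efficiency()` gives 0.75, a figure that only makes sense once the dedicated and utilized values are kept apart.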

7 General Progress
Completed version 2.0 of the SSSRMAP resource management interface specification
– Includes specifications for authentication and encryption
– Has been implemented by the Gold Accounting and Allocation Manager (as a proof of design)
Beginning to see adoption of the SSSRMAP specification
– Commitment from SLURM (LLNL) to write an interface to SSSRMAP
– Commitment from ClusterWorX (Linux Networx) to write an interface to SSSRMAP
– Interest from a bproc-based scheduler (Clemson University) in interfacing to the queue manager via SSSRMAP
– CLUBMask resource manager (Penn State) to interface with the scheduler via SSSRMAP
– Interest from CERN in a data manager interface to the scheduler using SSSRMAP
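To illustrate the general idea of an authenticated XML request, the sketch below serializes a request and attaches a keyed digest computed with a shared secret key. It is not the SSSRMAP v2 wire format or its authentication scheme: the `Request`/`Object`/`Where` element names, the helper functions and the choice of HMAC-SHA256 are all assumptions.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET
from typing import Dict

def build_request(action: str, obj: str, conditions: Dict[str, str]) -> str:
    """Serialize a request as XML. Element and attribute names are hypothetical."""
    request = ET.Element("Request", action=action)
    ET.SubElement(request, "Object").text = obj
    for name, value in conditions.items():
        ET.SubElement(request, "Where", name=name).text = value
    return ET.tostring(request, encoding="unicode")

def sign(message: str, key: bytes) -> str:
    """Return a base64 HMAC-SHA256 of the message (algorithm choice is an assumption)."""
    mac = hmac.new(key, message.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")

def verify(message: str, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), signature)
```

A receiver holding the same key accepts the request only if `verify` succeeds; any change to the XML body after signing makes the check fail.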

8 Scheduler Progress
Implemented XML client-server interface (40% of clients now using SSSRMAP)
New interfaces to support generic resource loads (paging space, I/O, processor load, etc.) for resource limit enforcement and tracking
Documentation on resource limit enforcement and tracking
Added support for multi-task group jobs
Support for dynamic reservations (growing and shrinking to support MPI dynamic jobs)
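A dynamic reservation can be thought of as a set of held nodes that grows or shrinks against a free pool while the job runs. This is a minimal sketch under that assumption; the class, its methods and the lowest-name-first selection policy are hypothetical and do not describe Maui's reservation code.

```python
from typing import Iterable, Set

class DynamicReservation:
    """Hypothetical node reservation that can grow or shrink while its job runs."""

    def __init__(self, pool: Iterable[str], initial: int) -> None:
        self.free: Set[str] = set(pool)
        self.held: Set[str] = set()
        self.grow(initial)

    def grow(self, n: int) -> None:
        """Add n free nodes, lowest names first, or fail without changing anything."""
        if n > len(self.free):
            raise ValueError(f"requested {n} nodes, only {len(self.free)} free")
        chosen = sorted(self.free)[:n]
        self.free.difference_update(chosen)
        self.held.update(chosen)

    def shrink(self, n: int) -> None:
        """Release n held nodes, highest names first, back to the free pool."""
        if n > len(self.held):
            raise ValueError(f"cannot release {n} nodes, only {len(self.held)} held")
        released = sorted(self.held, reverse=True)[:n]
        self.held.difference_update(released)
        self.free.update(released)
```

Checking capacity before touching either set keeps a failed grow or shrink from leaving the reservation half-updated.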

9 Scheduler Progress
Security: support for a user-specified keyfile containing the security token
Performance: continued efforts in memory-footprint reduction
Fault tolerance: implemented a fallback server
Ease of use: initial web GUI developed (communicates directly with the Maui server)
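From a client's point of view, a fallback server means trying the primary first and moving on only when it cannot be reached. The sketch models each server as a Python callable that raises `ConnectionError` when unreachable; the function name is hypothetical and this illustrates the failover pattern, not Maui's implementation.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def call_with_fallback(servers: Sequence[Callable[[str], T]], request: str) -> T:
    """Send the request to each server in order and return the first successful reply."""
    errors: List[Exception] = []
    for server in servers:
        try:
            return server(request)
        except ConnectionError as exc:  # only connection failures trigger failover
            errors.append(exc)
    raise ConnectionError(f"all {len(servers)} servers failed: {errors}")
```

Restricting failover to connection errors means an application-level error from a reachable primary is reported directly instead of being silently retried elsewhere.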

10 Queue Manager Progress
Updated service directory and event manager interfaces
Implemented caching of service directory lookups and prioritization of the returned wire protocol types, for fault tolerance and performance
Beginning implementation of the SSSRMAP v2 wire protocol and XML specification
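Caching directory lookups and prioritizing wire protocols can be combined in one small layer: keep each lookup for a fixed time and return endpoints ordered by a preferred-protocol list. Everything named here is an assumption for illustration, including the `(protocol, address)` endpoint shape, the protocol names and the 60-second default time-to-live.

```python
import time
from typing import Callable, Dict, List, Tuple

Endpoint = Tuple[str, str]  # (wire protocol, address); names are hypothetical

# Hypothetical preference order: earlier protocols are tried first.
PROTOCOL_PREFERENCE = ["sssrmap-v2", "sss-xml", "http"]

def protocol_rank(endpoint: Endpoint) -> int:
    protocol = endpoint[0]
    if protocol in PROTOCOL_PREFERENCE:
        return PROTOCOL_PREFERENCE.index(protocol)
    return len(PROTOCOL_PREFERENCE)  # unknown protocols go last

class DirectoryCache:
    """Caches service-directory lookups for ttl seconds and orders endpoints by protocol."""

    def __init__(self, lookup: Callable[[str], List[Endpoint]], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[Endpoint]]] = {}

    def resolve(self, component: str) -> List[Endpoint]:
        now = self._clock()
        cached = self._entries.get(component)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh entry: skip the directory round trip
        endpoints = sorted(self._lookup(component), key=protocol_rank)
        self._entries[component] = (now, endpoints)
        return endpoints
```

The injectable `clock` makes expiry testable; the cache also means a component can keep resolving peers for a while even if the directory itself is briefly unreachable.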

11 Accounting and Allocation Manager Progress
Gold
– Added support for 95% of QBank's functionality
– Allocation design enhancements
  – Allocations shareable by users, projects and machines (also supports exclusions)
  – Special wildcard types (ANY, NONE, MEMBER, DEFINED)
  – Enhanced support for activation and expiration times (and active state)
– Support added for deposits
  – Use of deposit shares for non-interactive deposit defaults
– Support added for hierarchical accounts (projects)
  – Affects withdrawals, deposits, reservations, balance checks, etc.
  – Support for recursive trickle-up withdrawals and trickle-down deposits
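One plausible reading of hierarchical accounts is sketched below: a withdrawal draws on the account's own balance first and trickles any shortfall up to its ancestors, while a deposit into a parent trickles down to its children in proportion to their shares. This interpretation, the `Account` class and the integer-credit rounding rule (unshared remainder stays with the parent) are assumptions, not Gold's actual data model.

```python
from typing import List, Optional

class Account:
    """Hypothetical project account in a hierarchy; balances are whole credits."""

    def __init__(self, name: str, parent: Optional["Account"] = None, share: int = 0) -> None:
        self.name = name
        self.parent = parent
        self.share = share  # weight used when a parent deposit trickles down
        self.balance = 0
        self.children: List["Account"] = []
        if parent is not None:
            parent.children.append(self)

    def available(self) -> int:
        """Credits usable by this account: its own balance plus every ancestor's."""
        return self.balance + (self.parent.available() if self.parent else 0)

    def withdraw(self, amount: int) -> None:
        """Charge this account, trickling any shortfall up to its ancestors."""
        if amount > self.available():
            raise ValueError(f"{self.name}: insufficient funds for {amount}")
        taken = min(self.balance, amount)
        self.balance -= taken
        if amount > taken:
            # available() already guaranteed the ancestors can cover the rest
            self.parent.withdraw(amount - taken)

    def deposit(self, amount: int) -> None:
        """Split a deposit among children by share; keep any unshared remainder here."""
        total_share = sum(child.share for child in self.children)
        if total_share == 0:
            self.balance += amount
            return
        distributed = 0
        for child in self.children:
            part = amount * child.share // total_share
            child.deposit(part)
            distributed += part
        self.balance += amount - distributed
```

Checking `available()` before modifying any balance keeps a failed withdrawal from leaving the hierarchy partly charged.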

12 Accounting and Allocation Manager Progress
Gold
– Support added for refunds
– Implemented guaranteed quotes
– Implemented transfers
– Support added for debit vs. credit allocations
– Support for operations (aggregate functions) on returned query fields (sort, sum, max, unique, count, group by, etc.)
– Negation of options
– Association metadata added to aid in GUI object navigation
– Enhanced support for transaction logging, journaling, undo and redo
– Implemented a more flexible charging algorithm
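Aggregate operations on returned query fields amount to grouping result rows by one field and summarizing another. The sketch assumes query results arrive as a list of dictionaries; the function name and the `Project`/`Amount` field names in the usage are hypothetical, not Gold's query syntax.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

def aggregate(rows: Iterable[Mapping[str, Any]], group_by: str, field: str) -> Dict[Any, Dict[str, Any]]:
    """Group rows by one field and compute count, sum, max and unique values of another."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[field])
    return {
        key: {
            "count": len(values),
            "sum": sum(values),
            "max": max(values),
            "unique": sorted(set(values)),
        }
        for key, values in sorted(groups.items())  # groups come back sorted by key
    }
```

For example, aggregating charge records by `"Project"` over `"Amount"` returns one summary per project, in sorted project order.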

13 Accounting and Allocation Manager Progress
Gold
– Implemented SSSRMAP version 2.0
– Implemented SSS Job Object version 2.0
– Infrastructure added for Role-Based Access Control
– Support added for method overriding and method scope resolution
– Progress on the open-source front (Gold and sss_xml front-ends)
  – Obtained approval from PNNL IP to apply a BSD open-source license
  – Sent a letter to Fred requesting DOE approval to assert copyright
– Created an Accounting and Allocation Manager Binding document describing use of the SSSRMAP protocol
– Beginning effort to develop a Web-based GUI (JSP)
– Implemented SSSRMAP v2 authentication
– Almost completed implementing SSSRMAP v2 encryption
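Role-Based Access Control grants permissions to roles and roles to users, so an authorization check walks from user to roles to permitted (object, action) pairs. This is a minimal sketch; the role names, the user table and the `"*"` wildcard convention are hypothetical and are not Gold's RBAC schema.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (object, action), e.g. ("Allocation", "Deposit")

# Hypothetical role definitions; "*" matches any object or action.
ROLE_PERMISSIONS: Dict[str, Set[Permission]] = {
    "user": {("Job", "Query"), ("Allocation", "Query")},
    "manager": {("Allocation", "Deposit"), ("Allocation", "Transfer")},
    "admin": {("*", "*")},
}

# Hypothetical user-to-role assignments.
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"user"},
    "bob": {"user", "manager"},
    "root": {"admin"},
}

def is_authorized(user: str, obj: str, action: str,
                  user_roles: Dict[str, Set[str]] = USER_ROLES,
                  role_perms: Dict[str, Set[Permission]] = ROLE_PERMISSIONS) -> bool:
    """Return True if any of the user's roles permits the action on the object."""
    for role in user_roles.get(user, set()):
        for perm_obj, perm_action in role_perms.get(role, set()):
            if perm_obj in (obj, "*") and perm_action in (action, "*"):
                return True
    return False  # unknown users and unlisted commands are denied by default
```

Denying by default means a command is refused unless some role explicitly grants it, which is the property fine-grained command authorization depends on.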

14 Meta-Scheduler Progress
Added basic data scheduling (tested with Globus)
Created interface for data-cache scheduling
Fault tolerance improvements
– Job queue is persistent
– Will recover from network failure, system failure and loss of checkpoint files
Major documentation in all areas
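A persistent job queue survives a crash if every change is written to disk before it is acknowledged and the file is replaced atomically, so a reader never sees a half-written queue. The sketch assumes a JSON file and a write-to-temporary-then-`os.replace` strategy; the class and file layout are hypothetical, not the meta-scheduler's actual storage.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional

class PersistentJobQueue:
    """Hypothetical FIFO job queue that is reloaded from disk after a restart."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.jobs: List[Dict[str, Any]] = []
        if os.path.exists(path):  # recover state left by a previous run
            with open(path, encoding="utf-8") as fh:
                self.jobs = json.load(fh)

    def _save(self) -> None:
        # Write to a temporary file in the same directory, flush it to disk,
        # then atomically swap it into place.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.jobs, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)

    def push(self, job: Dict[str, Any]) -> None:
        self.jobs.append(job)
        self._save()

    def pop(self) -> Optional[Dict[str, Any]]:
        if not self.jobs:
            return None
        job = self.jobs.pop(0)
        self._save()
        return job
```

Creating a new `PersistentJobQueue` on the same path after a failure picks up exactly the jobs that had been saved.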

15 Future Work
Implement v2 SSS Resource Management and Accounting interface specification (all components)
Implement v2 Job Object Specification
Implement default SSSRMAP v2 security authentication and encryption for all components
Release v1.0 Initial SSS Resource Management Suite and improve download and documentation webpages
Release portability enhancements (AIX, Tru64, possibly Cray)
Create per-component interface specification documents (binding to SSSRMAP)
Draft design specification documents

16 Future Work
Local Scheduler
– Test interaction with checkpoint/restart mechanisms when interfaces are ready
– Continued work on resource limit enforcement and tracking
– Quality of service support for completion time guarantees
– Implement SSSRMAP v2.0
– Security integration (authentication and encryption)
– Support for malleable jobs (pre-execution)
– Abstract resource manager interfaces to accept multiple sources of input data and control
– Enable a simulation-to-live submission translator

17 Future Work
Queue Manager
– Implement persistence via a database (replacing flat files)
– Add epilogue/prologue support and a job submission verification script
– Interface with Node Monitor
– Full PBS qsub compatibility
– Implement full input/output handling (need to define PM interfaces, if any)
– Implement SSSRMAP v2.0 (including security)

18 Future Work
Accounting and Allocation Manager
– Implement SSSRMAP v2 encryption (and test authentication)
– Implement Role-Based Access Control (fine-grained command authorization)
– Integration with Directory Service
– Open-source Gold (BSD license)
– Progress on Web-based JSP GUI

19 Future Work
Meta Scheduler
– Continued effort in allocation management, credential management and data management
– Enablement of grid-level prioritization and fairness policies

20 Issues requiring inter-group discussion

