Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting January 15-16, 2004 Argonne, IL.


1 Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting January 15-16, 2004 Argonne, IL

2 Resource Management and Accounting Working Group Working group scope Progress over last quarter Next steps Topics for group consideration

3 Working Group Scope The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting. This working group will focus on the following software components: Queue Manager Scheduler Accounting and Allocation Manager Meta Scheduler Other critical resource management components are being developed in the Process Management and Monitoring Working Group: Process Manager Cluster Monitor

4 Resource Management Component Architecture (diagram) Components: Queue Manager, Allocation Manager, Node Monitor, Meta Scheduler, Local Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service, Event Manager. Color key (by working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services.

5 Resource Management Prototype Demonstration (diagram) Components: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager, Discovery Service. Color key (by working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure. Message sequence: 0 Service-Lookup, 1 Submit-Job, 2 Query-Job, 3 Query-Node, 4 Create-Reservation, 5 Run-Job, 6 Exec-Process, 7 Query-Job, 8 Delete-Job, 9 Withdraw-Allocation. This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
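
The numbered messages on the demonstration slide can be replayed as an ordered trace. The following is a minimal Python sketch, not the demo's actual code: the step numbers and message names come from the slide, while the function name, the timing parameters, and the assumption that steps 8 and 9 are triggered by the wallclock overrun are hypothetical.

```python
# Step numbers and message names are taken from the slide.
# Everything else (run_demo, its parameters, the overrun rule) is hypothetical.
DEMO_STEPS = {
    0: "Service-Lookup",
    1: "Submit-Job",
    2: "Query-Job",
    3: "Query-Node",
    4: "Create-Reservation",
    5: "Run-Job",
    6: "Exec-Process",
    7: "Query-Job",
    8: "Delete-Job",
    9: "Withdraw-Allocation",
}


def run_demo(wallclock_limit, actual_runtime):
    """Return the ordered message trace for one job.

    Assumption: Delete-Job and Withdraw-Allocation (steps 8 and 9) are
    issued only because the job ran past its wallclock limit. The slide
    does not describe a job that finishes in time, so that case simply
    stops after step 7 here.
    """
    trace = [DEMO_STEPS[i] for i in range(8)]  # steps 0 through 7
    if actual_runtime > wallclock_limit:
        trace.append(DEMO_STEPS[8])
        trace.append(DEMO_STEPS[9])
    return trace


if __name__ == "__main__":
    for number, name in enumerate(run_demo(wallclock_limit=60, actual_runtime=120)):
        print(number, name)
```

Which component sends and receives each message is not recoverable from the flattened diagram, so the sketch records only the order.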

6 General Progress Created Node Object Specification version 2.0 Implemented SSSRMAP v2 response/status codes Completed Portability testing for initial release components –AIX, Tru64, HP-UX, IRIX, Solaris, Linux Completed system testing for SSSRMAP v2 and SC Release –on xtorc-sss, a RedHat 9.0 System (configured similarly to the OSCAR-sss target) –Included Maui, Bamboo, Warehouse, Process Manager, Gold, QBank, OpenPBS_sss, sss_xml_svr, etc.

7 General Progress Released RMWG components for SC2003 –packaged as tarballs, RPMs and OSCAR packages –Includes (some new) components: Bamboo Queue Manager v0.9.0, Maui-sss Scheduler v3.2p0, Gold Accounting and Allocation Manager v1.0.a0.0, Warehouse System Monitor v0.6.0 RMWG Webpage updated with SC release –Added Bamboo, Gold and Warehouse –Linked into main SSS home page

8 General Progress Deployed User Oriented Problem Response System –Implemented using RT –Created project and support queues for all RMWG components Created SSSRMAP C-implementation module Completed per-component interface specification documents (binding to SSSRMAP) Something about our functionality milestones

9 Scheduler Progress Generated Maui SSSRMAP binding document Added response code support Created SSS communication library containing reference implementation of SSSRMAP v2.0 XMLized Silver/Maui interface Augmented implementation of SSSRMAP to use more of the advanced features (where, set, op, units) Added support for (Warehouse) System Monitor Interface (and SSSRMAP v2 Node Object)
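
The slide names four request features (where, set, op, units) without showing message syntax. As a rough illustration of how such fields might combine in one XML request, here is a Python sketch using the standard library's ElementTree. The element names, attribute names, and the `build_modify_request` helper are illustrative assumptions, not the SSSRMAP v2 schema or any part of the Maui code base.

```python
import xml.etree.ElementTree as ET


def build_modify_request(obj, set_fields, where_fields):
    """Build an illustrative XML modify request.

    All element and attribute names here are hypothetical; consult the
    SSSRMAP specification for the real message format.

    set_fields:   iterable of (name, value, units_or_None)
    where_fields: iterable of (name, op, value)
    """
    request = ET.Element("Request", {"action": "Modify"})
    ET.SubElement(request, "Object").text = obj

    # "set": the new values to assign, optionally tagged with units.
    for name, value, units in set_fields:
        attrs = {"name": name}
        if units is not None:
            attrs["units"] = units
        ET.SubElement(request, "Set", attrs).text = str(value)

    # "where" plus "op": which objects the request applies to.
    for name, op, value in where_fields:
        ET.SubElement(request, "Where", {"name": name, "op": op}).text = str(value)

    return ET.tostring(request, encoding="unicode")


if __name__ == "__main__":
    print(build_modify_request(
        "Job",
        set_fields=[("WallclockLimit", 3600, "seconds")],
        where_fields=[("JobId", "EQ", "job.42")],
    ))
```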

10 Scheduler Progress Completed suspend/resume and checkpoint/restart based SSS calls (synchronized with anticipated XML and tested with QM as far as we can go) – blocked until we can test with the checkpoint/restart group Enhanced support for dynamic modification of job attributes (dynamic jobs) – blocked until support is provided in PM and QM Added support for policy specification for resource limit enforcement and tracking – blocked until support from PM and QM progresses

11 Queue Manager Progress Initial release of Bamboo made available in Nov. Produced Queue Manager binding document for the SSSRMAP protocol. Data storage via ODBC-compliant database fully implemented. Packaging and installation scripts created for sss-oscar release. SSS suite has been installed on a cluster at Ames; it is not quite production ready, but close.

12 Accounting and Allocation Manager Progress QBank –Portability testing has been completed on Linux, AIX, Tru64, HP-UX, IRIX and Solaris –This is probably as far as we will take it Gold –Released pre-alpha early SC release of Gold: public release under a BSD open source license (14 NOV 2003), packaged as a tarball, RPM (RedHat Linux 9.0 and 7.3, x86), and initial OSCAR packaging –Added support for Service Directory registration –Implemented SSSRMAP v2 response/status codes –Implemented instance-level role-based authorization

13 Accounting and Allocation Manager Progress Gold –Gold test results from PNNL 11.8TF cluster (MPP2) analyzed: accounting was coherent and stable over the 2-week test period; memory and performance issues analyzed with profiler; initial chunking implementation was shown to successfully handle large response messages –Progress on GUI: implemented SSSRMAP SSL and password authentication; User, Project and Machine management views nearly complete; added search filter to List (and Modify, Delete, Undelete) operations –Improved debug logging (implemented log4j and debug flags) –Portability enhancements (archived Java components into a jar file) –Documentation, packaging and installation refinements –Introduced GNU Readline support in interactive client –Creation of interim regression test suite (Condor DAGMan)
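
The slides report that response chunking handled large messages but do not describe the mechanism. The general idea can be sketched in Python as splitting a result set into numbered pieces that the client reassembles. The function names, the chunk envelope fields (`chunk`, `of`, `data`), and the fixed record count per chunk are all assumptions for illustration, not Gold's implementation.

```python
def chunk_records(records, chunk_size):
    """Yield a large result set as numbered chunks.

    Hypothetical envelope: each chunk carries its 1-based index, the
    total number of chunks, and a slice of the records.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    total = (len(records) + chunk_size - 1) // chunk_size  # ceiling division
    for i in range(total):
        start = i * chunk_size
        yield {
            "chunk": i + 1,
            "of": total,
            "data": records[start:start + chunk_size],
        }


def reassemble(chunks):
    """Rebuild the full result set, tolerating out-of-order arrival."""
    result = []
    for chunk in sorted(chunks, key=lambda c: c["chunk"]):
        result.extend(chunk["data"])
    return result


if __name__ == "__main__":
    rows = ["row%d" % n for n in range(10)]
    pieces = list(chunk_records(rows, chunk_size=4))
    print(len(pieces), reassemble(pieces) == rows)
```

Bounding the size of each piece keeps a single large query from forcing either side to hold the whole response as one message, which is the kind of memory pressure the test results mention.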

14 Meta-Scheduler Progress Added threaded support for local scheduler interface (can talk to multiple schedulers simultaneously) Improved Silver installation procedure (autoconf) Enhanced user commands to support direct reservation management Successful deployment and testing of data-staging

15 Future Work Draft and release SSSRMAP v3 protocol specifications Release alpha versions of new components (based on v2) –(Bamboo, Maui, Gold, Warehouse) Portability testing for new (alpha release) components –(at least Linux, AIX, +other_UNIX) Complete Design Specification documents for new components

16 Future Work Local Scheduler Complete integration of SSSRMAP v2 for queue objects Support full suite of AM interface calls Full support for multi-source RM interface Add support for encryption Intelligent decision response based on error codes Full support for checkpoint/restart, dynamic jobs, and resource limit enforcement and tracking when enabled by other components

17 Future Work Queue manager Retrieve exit codes and update to the Jan. 2004 PM XML. Finish prologue/epilogue support (dependent on exit code). Interface with Node Monitor once process monitoring is supported. IO staging (may need API from process manager) Full multi-step job support Add support for optional site job submission verification script

18 Future Work Accounting and Allocation manager Complete Allocation Management portion of GUI Fully implement response chunking (part of v3) Resolve performance issues (reimplement server in Perl?) Automatic association deletion (undeletion) Port Gold to other operating systems Production deployment of Gold on 11.8TF Linux cluster (as primary allocation system) Support for challenge/SSL with Directory Service Open source QBank

19 Future Work Meta Scheduler More Silver client development Update documentation Enhance co-allocation support (tighter specification language) Implement SSSRMAP v2 Wire Protocol and Message Format Add allocation manager interface support

20 Issues requiring inter-group discussion Need process exit codes from process manager Need process manager support for resource limit enforcement Timeframe/schedule for dynamic jobs Schedule for integrating/testing with checkpoint/restart Discuss possibility of support for encryption(/type?) within Service Directory

21 Portability Testing Progress
