Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting January 15-16, 2004 Argonne, IL.

Resource Management and Accounting Working Group
  – Working group scope
  – Progress over last quarter
  – Next steps
  – Topics for group consideration

Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting.
This working group will focus on the following software components:
  – Queue Manager
  – Scheduler
  – Accounting and Allocation Manager
  – Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
  – Process Manager
  – Cluster Monitor

Resource Management Component Architecture
[Architecture diagram]
Components: Queue Manager, Allocation Manager, Node Monitor, Meta Scheduler, Local Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service, Event Manager
Color key (working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services

Resource Management Prototype Demonstration
[Demonstration diagram]
Components: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager, Discovery Service
Color key (working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure
Numbered steps in the diagram:
  0 Service-Lookup
  1 Submit-Job
  2 Query-Job
  3 Query-Node
  4 Create-Reservation
  5 Run-Job
  6 Exec-Process
  7 Query-Job
  8 Delete-Job
  9 Withdraw-Allocation
This demo runs a simple end-to-end test in which a job is submitted and runs past its wallclock limit.
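The numbered steps of the demonstration can be walked through in a small simulation. This is only a sketch: the step numbers and names come from the slide, while the `Job` class, the `run_demo` function, the state names, and the assumption that the second Query-Job is where the wallclock overrun is noticed are all hypothetical and not taken from the SSS components.

```python
from dataclasses import dataclass

# Step numbers and names come from the demo slide; the state changes attached
# to them below are illustrative assumptions, not real component behavior.
DEMO_STEPS = {
    0: "Service-Lookup",
    1: "Submit-Job",
    2: "Query-Job",
    3: "Query-Node",
    4: "Create-Reservation",
    5: "Run-Job",
    6: "Exec-Process",
    7: "Query-Job",
    8: "Delete-Job",
    9: "Withdraw-Allocation",
}


@dataclass
class Job:
    """Hypothetical job record used only for this walk-through."""
    job_id: str
    wallclock_limit_s: int
    elapsed_s: int = 0
    state: str = "Queued"


def run_demo(job: Job, tick_s: int = 10) -> list:
    """Walk the ten steps in order and return (step, name, job state) tuples."""
    trace = []
    for step in sorted(DEMO_STEPS):
        name = DEMO_STEPS[step]
        if name == "Run-Job":
            job.state = "Running"
        elif name == "Query-Job" and job.state == "Running":
            # Assumed: the second Query-Job is where the overrun is noticed,
            # so simulated time advances until the limit has been exceeded.
            while job.elapsed_s <= job.wallclock_limit_s:
                job.elapsed_s += tick_s
        elif name == "Delete-Job":
            job.state = "Removed"
        trace.append((step, name, job.state))
    return trace


demo_job = Job(job_id="demo.1", wallclock_limit_s=30)
demo_trace = run_demo(demo_job)
```

Printing `demo_trace` shows the job moving from queued to running to removed as the steps proceed.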

General Progress
Created Node Object Specification version 2.0
Implemented SSSRMAP v2 response/status codes
Completed portability testing for initial release components
  – AIX, Tru64, HP-UX, IRIX, Solaris, Linux
Completed system testing for SSSRMAP v2 and SC Release
  – on xtorc-sss, a RedHat 9.0 system (configured similarly to the OSCAR-sss target)
  – Included Maui, Bamboo, Warehouse, Process Manager, Gold, QBank, OpenPBS_sss, sss_xml_svr, etc.

General Progress
Released RMWG components for SC2004
  – packaged as tarballs, RPMs and OSCAR packages
  – Includes (some new) components:
      Bamboo Queue Manager v0.9.0
      Maui-sss Scheduler v3.2p0
      Gold Accounting and Allocation Manager v1.0.a0.0
      Warehouse System Monitor v0.6.0
RMWG Webpage updated with SC release
  – Added Bamboo, Gold and Warehouse
  – Linked into main SSS home page

General Progress
Deployed User Oriented Problem Response System
  – Implemented using RT
  – Created project and support queues for all RMWG components
Created SSSRMAP C-implementation module
Completed per-component interface specification documents (binding to SSSRMAP)
Something about our functionality milestones

Scheduler Progress
Generated Maui SSSRMAP binding document
Added response code support
Created SSS communication library containing reference implementation of SSSRMAP v2.0
XMLized Silver/Maui interface
Augmented implementation of SSSRMAP to use more of the advanced features (where, set, op, units)
Added support for (Warehouse) System Monitor Interface (and SSSRMAP v2 Node Object)
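The slide names where, set and op as advanced SSSRMAP features but does not show how they are used. One plausible reading is that where clauses select objects by comparison and set clauses assign new values. The sketch below illustrates that reading on in-memory records only. It is not the SSSRMAP wire format: the operator names in `OPS`, the functions `matches` and `modify`, and the job fields are all hypothetical, and unit handling is left out.

```python
import operator

# Hypothetical operator names; the actual SSSRMAP op vocabulary is not given
# in the slides, so these are placeholders for illustration.
OPS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "GE": operator.ge,
    "LT": operator.lt,
    "LE": operator.le,
}


def matches(record: dict, where: list) -> bool:
    """True if the record satisfies every (name, op, value) condition."""
    return all(OPS[op](record[name], value) for name, op, value in where)


def modify(records: list, where: list, assignments: dict) -> int:
    """Apply the 'set' assignments to every matching record; return the count."""
    changed = 0
    for record in records:
        if matches(record, where):
            record.update(assignments)
            changed += 1
    return changed


jobs = [
    {"id": "j1", "state": "Idle", "procs": 4},
    {"id": "j2", "state": "Idle", "procs": 64},
    {"id": "j3", "state": "Running", "procs": 128},
]

# Hold every idle job that asks for at least 64 processors.
held = modify(
    jobs,
    where=[("state", "EQ", "Idle"), ("procs", "GE", 64)],
    assignments={"state": "Hold"},
)
```

Keeping the conditions as data rather than code is what lets a single generic request type carry arbitrary filters.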

Scheduler Progress
Completed suspend/resume and checkpoint/restart based SSS calls (synchronized with anticipated XML and tested with QM as far as we can go); blocked until we can test with the checkpoint/restart developers
Enhanced support for dynamic modification of job attributes (dynamic jobs); blocked until support is provided in PM and QM
Added support for policy specification for resource limit enforcement and tracking; blocked until support from PM and QM progresses

Queue Manager Progress
Initial release of Bamboo made available in Nov.
Produced Queue Manager binding document for the SSSRMAP protocol.
Data storage via ODBC-compliant database fully implemented.
Packaging and installation scripts created for sss-oscar release.
SSS suite has been installed on a cluster at Ames; it is not quite production ready, but close.

Accounting and Allocation Manager Progress
QBank
  – Portability testing has been completed: Linux, AIX, Tru64, HP-UX, IRIX and Solaris
  – This is probably as far as we are going to take it
Gold
  – Released pre-alpha early SC release of Gold
      Public release under a BSD open source license (14 NOV 2003)
      Packaged as a tarball, RPM (RedHat Linux 9.0 and 7.3, x86), and initial OSCAR packaging
  – Added support for Service Directory registration
  – Implemented SSSRMAP v2 response/status codes
  – Implemented instance-level role-based authorization
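Instance-level role-based authorization can be pictured as roles whose permissions apply either to every instance of an object type or to one named instance. The sketch below shows that idea only. It is not Gold's implementation: `Permission`, `Role`, `is_authorized`, the wildcard `ANY`, and the role and project names are all hypothetical.

```python
from dataclasses import dataclass, field

ANY = "*"  # wildcard: the permission applies to every instance of the object


@dataclass(frozen=True)
class Permission:
    """Hypothetical permission: an action on an object type and instance."""
    obj: str
    action: str
    instance: str = ANY


@dataclass
class Role:
    """Hypothetical role: a named set of permissions."""
    name: str
    permissions: set = field(default_factory=set)


def is_authorized(roles: list, obj: str, action: str, instance: str) -> bool:
    """True if any role grants the action on this object instance."""
    for role in roles:
        for perm in role.permissions:
            if (
                perm.obj == obj
                and perm.action == action
                and perm.instance in (ANY, instance)
            ):
                return True
    return False


# An administrator may modify any project; a project manager only their own.
admin = Role("SystemAdmin", {Permission("Project", "Modify")})
chem_mgr = Role("ChemistryManager", {Permission("Project", "Modify", "chemistry")})
```

With these roles, `is_authorized([chem_mgr], "Project", "Modify", "biology")` is false while the same check with `admin` is true, which is the distinction that instance-level authorization adds over object-level roles.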

Accounting and Allocation Manager Progress
Gold
  – Gold test results from PNNL 11.8TF cluster (MPP2) analyzed
      Accounting was coherent and stable over 2 week test period
      Memory and performance issues analyzed with profiler
      Initial chunking implementation was shown to successfully handle large response messages
  – Progress on GUI
      Implemented SSSRMAP SSL and Password authentication
      User, Project and Machine management views nearly complete
      Added search filter to List (and Modify, Delete, Undelete) operations
  – Improved debug logging (implemented log4j and debug flags)
  – Portability enhancements (archived Java components into a jar file)
  – Documentation, Packaging and Installation refinements
  – Introduced GNU Readline support in interactive client
  – Creation of interim regression test suite (Condor DAGMan)
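Response chunking, as mentioned for large response messages, amounts to splitting one large result into numbered pieces that the client reassembles. The sketch below shows the general technique under assumed names: `chunk_rows`, `reassemble`, and the `chunk`/`of`/`rows` fields are hypothetical and do not reflect Gold's actual message layout.

```python
from typing import Iterable, Iterator, List


def chunk_rows(rows: List[dict], max_rows: int) -> Iterator[dict]:
    """Yield numbered chunks, each holding at most max_rows result rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    # Ceiling division; an empty result still produces one (empty) chunk.
    total = max(1, -(-len(rows) // max_rows))
    for index in range(total):
        start = index * max_rows
        yield {
            "chunk": index + 1,
            "of": total,
            "rows": rows[start:start + max_rows],
        }


def reassemble(chunks: Iterable[dict]) -> List[dict]:
    """Rebuild the full result, checking that chunks arrive in order."""
    rows: List[dict] = []
    for expected, chunk in enumerate(chunks, start=1):
        if chunk["chunk"] != expected:
            raise ValueError(f"expected chunk {expected}, got {chunk['chunk']}")
        rows.extend(chunk["rows"])
    return rows


usage_rows = [{"job": n} for n in range(10)]
chunks = list(chunk_rows(usage_rows, max_rows=4))
```

Because `chunk_rows` is a generator, a server written this way never needs to hold more than one chunk of the serialized response at a time.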

Meta-Scheduler Progress
Added threaded support for local scheduler interface (can talk to multiple schedulers simultaneously)
Improved Silver installation procedure (autoconf)
Enhanced user commands to support direct reservation management
Successful deployment and testing of data-staging
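Talking to several local schedulers at once can be sketched with a thread pool that issues one query per scheduler. This is an illustration of the general approach, not Silver's code: `LocalSchedulerStub`, `query_idle_nodes`, `query_all`, and the cluster names are hypothetical stand-ins for real scheduler connections.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalSchedulerStub:
    """Hypothetical stand-in for a connection to one local scheduler."""

    def __init__(self, name: str, idle_nodes: int):
        self.name = name
        self.idle_nodes = idle_nodes

    def query_idle_nodes(self) -> int:
        # A real client would make a network call here.
        return self.idle_nodes


def query_all(schedulers: list) -> dict:
    """Query every scheduler concurrently and map its name to the idle count."""
    workers = max(1, len(schedulers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input list.
        counts = pool.map(lambda s: s.query_idle_nodes(), schedulers)
        return {s.name: n for s, n in zip(schedulers, counts)}


idle = query_all([
    LocalSchedulerStub("clusterA", 12),
    LocalSchedulerStub("clusterB", 0),
])
```

One slow scheduler then delays only its own answer rather than serializing every query behind it.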

Future Work
Draft and release SSSRMAP v3 protocol specifications
Release alpha versions of new components (based on v2)
  – (Bamboo, Maui, Gold, Warehouse)
Portability testing for new (alpha release) components
  – (at least Linux, AIX, +other_UNIX)
Complete Design Specification documents for new components

Future Work: Local Scheduler
Complete integration of SSSRMAP v2 for queue objects
Support full suite of AM interface calls
Full support for multi-source RM interface
Add support for encryption
Intelligent decision response based on error codes
Full support for checkpoint/restart, dynamic jobs, and resource limit enforcement and tracking when enabled by other components

Future Work: Queue Manager
Retrieve exit codes and update to the Jan PM XML.
Finish prologue/epilogue support (dependent on exit code).
Interface with Node Monitor once process monitoring is supported.
IO staging (may need API from process manager)
Full multi-step job support
Add support for optional site job submission verification script

Future Work: Accounting and Allocation Manager
Complete Allocation Management portion of GUI
Fully implement response chunking (part of v3)
Resolve performance issues (reimplement server in Perl?)
Automatic association deletion (undeletion)
Port Gold to other operating systems
Production deployment of Gold on 11.8TF Linux cluster (as primary allocation system)
Support for challenge/SSL with Directory Service
Open source QBank

Future Work: Meta Scheduler
More Silver client development
Update documentation
Enhance co-allocation support (tighter specification language)
Implement SSSRMAP v2 Wire Protocol and Message Format
Add allocation manager interface support

Issues requiring inter-group discussion
Need process exit codes from process manager
Need process manager support for resource limit enforcement
Timeframe/schedule for dynamic jobs
Schedule for integrating/testing with checkpoint/restart
Discuss possibility of support for encryption(/type?) within Service Directory

Portability Testing Progress