Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting February 24-25, 2003.

Resource Management and Accounting Working Group
  – Working group scope
  – Progress over last quarter
  – Next steps
  – Topics for group consideration

Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling, and accounting.
This working group will focus on the following software components:
  – Queue Manager
  – Scheduler
  – Allocation Manager (and accounting)
  – Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
  – Process Manager
  – Cluster Monitor

Proposed Component Architecture
[Diagram: proposed components, color-coded by working group]
Components: Meta Scheduler, Local Scheduler, Queue Manager, Allocation Manager, Node Monitor, Node Manager, Process Manager, Event Manager, Security System, Information Service, Discovery Service
Color key (working groups): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services

Resource Management Prototype Demonstration
[Diagram: message flow among the prototype components, color-coded by working group]
Components: Job Submission Client, Queue Manager, Local Scheduler, Allocation Manager, Node Monitor, Process Manager, Discovery Service
Color key (working groups): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure
Numbered messages:
  0 Service-Lookup
  1 Submit-Job
  2 Query-Job
  3 Query-Node
  4 Create-Reservation
  5 Run-Job
  6 Exec-Process
  7 Query-Job
  8 Delete-Job
  9 Withdraw-Allocation
This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
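The tail of the demonstration (Query-Job, then Delete-Job once the job overruns) can be illustrated with a minimal sketch. It assumes the scheduler polls the job and issues Delete-Job when elapsed time exceeds the wallclock limit. The transcript does not say which component makes that decision, and the `Job` class and `next_message` function are hypothetical names, not part of the SSS software.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    job_id: str
    start_time_s: float
    wallclock_limit_s: float


def exceeded_wallclock(job: Job, now_s: float) -> bool:
    """True once the job has run longer than its wallclock limit."""
    return (now_s - job.start_time_s) > job.wallclock_limit_s


def next_message(job: Job, now_s: float) -> str:
    """Assumed polling rule: keep querying until the limit is exceeded."""
    return "Delete-Job" if exceeded_wallclock(job, now_s) else "Query-Job"


if __name__ == "__main__":
    job = Job(job_id="demo.1", start_time_s=0.0, wallclock_limit_s=60.0)
    for now in (30.0, 60.0, 61.0):
        print(now, next_message(job, now))
```

Under this assumed rule, a job that has run for exactly its limit is still only queried; deletion is triggered strictly after the limit.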

General Progress
Released v1.0 of the initial SSS Resource Management Suite
  – OpenPBS-SSS
  – Maui Scheduler
  – QBank (accounting system)
Website created and software available for download (intended for friendly beta testers)
SSSRMAP protocol (using HTTP) validated in Maui Scheduler, Queue Manager, PBS front-end, and Gold Allocation Manager (complex query support validated and utility shown within a diversity of usage scenarios)
Scalability testing performed on all components
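As a rough illustration of carrying an XML request over HTTP, the sketch below builds a query document and wraps it in an HTTP POST without sending it. The element names (`Request`, `Object`, `Get`), the `action` attribute, and the URL are assumptions made for illustration; they are not taken from the SSSRMAP specification.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_query(object_name: str, fields: list) -> bytes:
    """Serialize a query as XML. Element names are hypothetical."""
    request = ET.Element("Request", {"action": "Query"})
    ET.SubElement(request, "Object").text = object_name
    for field in fields:
        ET.SubElement(request, "Get", {"name": field})
    return ET.tostring(request, encoding="utf-8")


def as_http_post(url: str, payload: bytes) -> urllib.request.Request:
    """Wrap the XML payload in an HTTP POST request object (not sent)."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


if __name__ == "__main__":
    payload = build_query("Job", ["JobId", "State"])
    # Placeholder endpoint; constructing the Request opens no connection.
    post = as_http_post("http://localhost:8080/", payload)
    print(post.get_method(), post.full_url)
    print(payload.decode("utf-8"))
```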

Scheduler Progress
Implemented interfaces for the system monitor, the event manager, and the service directory, as well as a scheduling extension interface (allowing scheduling plug-ins to enable additional scheduling algorithms and capabilities)
Enhanced native support for LoadLeveler, PBS, SGE, LSF, and BProc-based systems
Significantly enhanced web-based scheduler documentation; added man pages for select scheduler commands
SSS Requirements document completed
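One way to picture a scheduling extension interface is a registry of interchangeable ordering policies. This is a minimal sketch of the plug-in idea only; the registry, the policy names, and the job fields are hypothetical and are not the Maui Scheduler's extension API.

```python
# Registry of scheduling policies, keyed by name (hypothetical design).
PLUGINS = {}


def register(name):
    """Decorator that records a policy function under the given name."""
    def decorator(func):
        PLUGINS[name] = func
        return func
    return decorator


@register("fifo")
def fifo(jobs):
    """Order jobs by submission time, earliest first."""
    return sorted(jobs, key=lambda job: job["submit_time"])


@register("shortest_first")
def shortest_first(jobs):
    """Order jobs by requested wallclock limit, shortest first."""
    return sorted(jobs, key=lambda job: job["wallclock_limit"])


def schedule(jobs, policy):
    """Return job ids in the order chosen by the named policy."""
    return [job["id"] for job in PLUGINS[policy](jobs)]


if __name__ == "__main__":
    jobs = [
        {"id": "a", "submit_time": 1, "wallclock_limit": 300},
        {"id": "b", "submit_time": 2, "wallclock_limit": 60},
    ]
    print(schedule(jobs, "fifo"))
    print(schedule(jobs, "shortest_first"))
```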

Scheduler Progress
Security improvements
  – Support for DES, HMAC, MD5, and external-source secret-key-based algorithms has been implemented for client/server authentication
  – Improved buffer overflow protection has been added to critical scheduler interfaces
  – A generalized secret key management facility has been implemented for secure multi-party communication
Scalability improvements
  – Decreased memory consumption by over 80%
  – Enabled support for up to 8,000 nodes
  – Enabled support for up to 32,000 processors
  – Enabled support for up to 2,000 simultaneous active jobs
  – Enabled support for jobs requesting up to 16,000 hosts
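Secret-key message authentication of the kind listed above can be sketched with Python's standard `hmac` module. This is an illustration of the general technique, not the scheduler's implementation: SHA-256 is swapped in as the digest, and the key and message contents are made up.

```python
import hashlib
import hmac


def sign(secret_key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the message under the shared key."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(secret_key, message), signature)


if __name__ == "__main__":
    key = b"shared-secret"          # illustrative key only
    request = b"action=Query object=Job"
    tag = sign(key, request)
    print(verify(key, request, tag))               # genuine request
    print(verify(key, request + b" tampered", tag))  # altered request
```

The server accepts a request only if the tag it recomputes with its own copy of the key matches the one sent by the client.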

Scheduler Progress
Fault tolerance
  – Migration of all Resource Manager calls to a threaded Resource Manager interface (enabling the scheduler to survive interface hangs and crashes)
  – Incorporation of Resource Manager and Allocation Manager diagnostics and failure tracking statistics
  – Implementation of improved data checking and handling routines to detect and correct corrupt Resource Manager data
Dynamic job support interfaces have been designed
Limited support for generic resources has been enabled (e.g., software licenses, network bandwidth, and global disk caches)
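The benefit of a threaded Resource Manager interface can be sketched as follows: the call runs in a worker thread, and the caller waits only for a bounded time, so a hung or crashing call does not take the caller down with it. This is a generic illustration under that assumption; `call_with_timeout` and `slow_query` are hypothetical names, not scheduler code.

```python
import queue
import threading
import time


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a daemon thread; return (status, value).

    status is "ok", "error" (value is the exception), or "timeout".
    """
    result_q = queue.Queue(maxsize=1)

    def worker():
        try:
            result_q.put(("ok", func(*args)))
        except Exception as exc:  # a crash in the call is reported, not raised
            result_q.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        return result_q.get(timeout=timeout_s)
    except queue.Empty:
        return ("timeout", None)


def slow_query(delay_s):
    """Stand-in for a resource manager call that is slow to answer."""
    time.sleep(delay_s)
    return "node-list"


if __name__ == "__main__":
    print(call_with_timeout(slow_query, 1.0, 0.01))
    print(call_with_timeout(slow_query, 0.05, 0.5))
```

A daemon thread is used so that a call that never returns cannot block interpreter exit; the abandoned call is simply left behind.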

Queue Manager Progress
Both the Ames Queue Manager and the PNNL PBS front-end have implemented and validated the SSSRMAP HTTP interface
Replaced third-party XML parser with SSS-created routines
Created Resource Management Suite Software website
PNNL created and tested patches for PBS scalability improvements and packaged them as RPMs (and tarball + patch) for beta distribution
Requirements document completed
Updated Process Manager interface for new XML schema
Ames Queue Manager has implemented a nearly complete PBS-like command line interface

Accounting and Allocation Manager Progress
QBank
  – A test harness was installed, test suites were created, significant testing was performed, and bugs were fixed
  – Security was strengthened (the new qauth uses libcrypto and keeps the key in a separate file, for greater stability and so that binary versions can be distributed)
  – The install process for QBank was streamlined and made non-interactive
  – Packaged in RPMs and tarballs for Linux and released in the v1.0 SSS Resource Management System
  – Documentation was significantly improved, including the creation of a user guide, a deployment guide, man pages, and updated online documentation

Accounting and Allocation Manager Progress
Gold
  – Time-travel implemented
  – Initial support for object-joined queries
  – Implemented Reservations
  – Implemented Balance Checking
Scalability Testing
  – Component-level testing was done to test timings to perform barrages of common accounting and allocation operations (charges, reservations, balance checks, etc.)
  – Simulations were performed with the Maui Scheduler to test transaction times with the allocation manager interface
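How reservations, balance checks, and charges might interact can be shown with a small in-memory model. It assumes that a reservation places a hold against the balance when a job starts, and that the final charge replaces the hold when the job ends. The `AllocationAccount` class and its methods are hypothetical and do not reflect Gold's actual interface.

```python
class AllocationAccount:
    """Toy allocation account: a balance plus per-job holds."""

    def __init__(self, balance):
        self.balance = balance
        self.reservations = {}  # job_id -> amount held

    def available(self):
        """Balance check: credits not already held by reservations."""
        return self.balance - sum(self.reservations.values())

    def reserve(self, job_id, amount):
        """Place a hold for a job; refuse if it exceeds what is available."""
        if amount > self.available():
            return False
        self.reservations[job_id] = amount
        return True

    def charge(self, job_id, amount):
        """Release the job's hold (if any) and debit the actual usage."""
        self.reservations.pop(job_id, None)
        if amount > self.balance:
            raise ValueError("charge exceeds balance")
        self.balance -= amount


if __name__ == "__main__":
    account = AllocationAccount(balance=100)
    print(account.reserve("job-a", 60))  # hold succeeds
    print(account.reserve("job-b", 60))  # refused: only 40 available
    account.charge("job-a", 45)          # job used less than reserved
    print(account.balance, account.available())
```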

Meta-Scheduler Progress
SSS Requirements document completed
Support has been added for Globus 2.0 and 2.2 based job staging
The initial information service interface has been designed
Security has been enhanced by adding Globus credential caching and enabling generalized secret session key management
Support has been added for retrying resources
Additional functionality includes the basic data management interface and an initial file staging capability

Next Work
Release v2 SSS Resource Management and Accounting interface specification
Implement and test SSSRMAP security authentication
Try to get more components under a testing framework
Portability enhancements (AIX, Tru64, possibly Cray)

Next Work
Local Scheduler
  – Test interaction with checkpoint/restart mechanisms when interfaces are ready
  – Virtual partitioning through resource limit enforcement and tracking
  – Quality of service support for completion time guarantees
  – Security integration
  – Progress on graphical interfaces

Next Work
Queue Manager
  – Implement persistence via database (replacing flat files)
  – Add Epilogue/Prologue support and job submission verification script
  – Interface with Node Monitor
  – Full PBS qsub compatibility (nearly complete)
  – Implement full input/output handling (need to define PM interfaces, if any)
  – Add interface with Node Manager to support job-dependent node OS image installation
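Database-backed persistence for queue state could look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The slides do not name a database, and the table layout, state names, and function names here are illustrative assumptions only.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the job table; ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, owner TEXT NOT NULL, state TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def submit(conn, job_id, owner):
    """Record a newly submitted job in the Queued state."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO jobs (job_id, owner, state) VALUES (?, ?, 'Queued')",
            (job_id, owner),
        )


def set_state(conn, job_id, state):
    """Persist a state transition for one job."""
    with conn:
        conn.execute("UPDATE jobs SET state = ? WHERE job_id = ?", (state, job_id))


def jobs_in_state(conn, state):
    """Return the ids of all jobs currently in the given state."""
    rows = conn.execute(
        "SELECT job_id FROM jobs WHERE state = ? ORDER BY job_id", (state,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_queue()
    submit(conn, "1.demo", "alice")
    submit(conn, "2.demo", "bob")
    set_state(conn, "1.demo", "Running")
    print(jobs_in_state(conn, "Queued"), jobs_in_state(conn, "Running"))
```

Compared with flat files, each transition is written transactionally, so a crash mid-update leaves the previous consistent state in place.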

Next Work
Accounting and Allocation Manager
  – Quotations (Gold)
  – Flexible charging (Gold)
  – Continuing effort to open-source the new and old Allocation Managers
  – SSSRMAP XML security integration (Gold)
  – Support for operations on returned fields (sort, sum, max, unique, group by, etc.)
  – Begin portability testing for Gold and QBank
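The planned operations on returned fields (sort, sum, max, unique, group by) can be illustrated on a list of records. The record layout and field names below are invented for the example and do not come from Gold or QBank.

```python
def sort_by(records, field):
    """Sort records by one field, ascending."""
    return sorted(records, key=lambda record: record[field])


def field_sum(records, field):
    """Sum one numeric field across all records."""
    return sum(record[field] for record in records)


def field_max(records, field):
    """Largest value of one field."""
    return max(record[field] for record in records)


def unique(records, field):
    """Distinct values of one field, in sorted order."""
    return sorted({record[field] for record in records})


def group_sum(records, key, value):
    """Group by `key` and total `value` within each group."""
    totals = {}
    for record in records:
        totals[record[key]] = totals.get(record[key], 0) + record[value]
    return totals


if __name__ == "__main__":
    charges = [
        {"user": "amy", "charge": 10},
        {"user": "bob", "charge": 5},
        {"user": "amy", "charge": 7},
    ]
    print(group_sum(charges, "user", "charge"))
    print(unique(charges, "user"), field_max(charges, "charge"))
```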

Issues requiring inter-group discussion