Process Manager Specification Rusty Lusk 1/15/04.

Outline
- Process Manager Functionality
- Expected Consumers
- Commands
- Semantics
- Examples
- Schema

Process Manager Functionality
- Process execution
  - Start process groups
  - Provide status information during execution
  - Provide command output and error messages
  - Return exit status information
- Process group control
  - Kill process groups
  - Signal process groups

Expected Consumers
- Components that execute programs
- Components that need to locate running processes
- Components that need to control running processes

[Figure: Schematic of the Process Management component in the Scalable Systems Software context. SSS side: SSS components (QM, PM, EM, SD, Sched, NSM) communicating via SSS XML; inputs from the QM's job submission language, interactive use, and simple scripts or hairy GUIs using SSS XML. Prototype MPD-based implementation side: MPDs, mpdrun with an XML file, mpiexec (MPI Standard args), and the application processes.]

Commands
- creates a new process group
- get status information; includes current process ids, exit status information, and stdout/err information
- send a Unix signal to all processes in a process group
- kill all processes in a process group
- allow the process manager to discard process group information after the process group has exited
All commands use the restriction syntax
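The five operations above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the command names did not survive in this transcript, so the class name `ProcessGroupTable` and the method names `create`, `status`, `collect`, `signal`, `kill`, and `cleanup` are hypothetical, and the sketch runs processes on a single host with Python's `subprocess` module rather than through MPD or the SSS XML interface.

```python
import signal
import subprocess


class ProcessGroupTable:
    """Toy single-host model of the process manager operations.

    All names here are hypothetical; they are not the SSS command names.
    """

    def __init__(self):
        self._groups = {}   # process group id -> list of Popen objects
        self._next_id = 1

    def create(self, argv, count=1):
        """Start `count` copies of argv as one process group; return its id."""
        procs = [
            subprocess.Popen(argv, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, text=True)
            for _ in range(count)
        ]
        pgid = self._next_id
        self._next_id += 1
        self._groups[pgid] = procs
        return pgid

    def status(self, pgid):
        """Report pid and exit status (None while still running) per process."""
        return [{"pid": p.pid, "exit_status": p.poll()}
                for p in self._groups[pgid]]

    def collect(self, pgid):
        """Wait for every process; return (exit_status, stdout, stderr) tuples."""
        results = []
        for p in self._groups[pgid]:
            out, err = p.communicate()
            results.append((p.returncode, out, err))
        return results

    def signal(self, pgid, signum=signal.SIGTERM):
        """Send a signal to every process in the group that is still running."""
        for p in self._groups[pgid]:
            if p.poll() is None:
                p.send_signal(signum)

    def kill(self, pgid):
        """Forcibly terminate every process in the group that is still running."""
        for p in self._groups[pgid]:
            if p.poll() is None:
                p.kill()

    def cleanup(self, pgid):
        """Discard bookkeeping for a group, but only once it has fully exited."""
        if any(p.poll() is None for p in self._groups[pgid]):
            raise RuntimeError("process group is still running")
        del self._groups[pgid]
```

A caller would typically `create` a group, poll `status` or block in `collect`, and then call `cleanup`; refusing to clean up a live group mirrors the slide's wording that information is discarded only after the process group has exited.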

Examples node1 node2

Examples (continued)

Examples (continued) Response:

Using the SSS Software Architecture on Chiba City

Chiba City
- Medium-sized cluster at Argonne National Laboratory
  - 256 dual-processor 500 MHz PIIIs
  - Myrinet
  - Linux (and sometimes others)
  - No shared file system, for scalability (but now a test platform for PVFS2)
- Dedicated to computer science scalability research, not applications
- Many groups use it as a research platform
  - Both academic and commercial
- Also used by friendly, hungry applications
- New requirement: support research requiring specialized kernels and alternate operating systems, for OS scalability research

New Challenges
- Want to schedule jobs that require node rebuilds (for new OSs, kernel module tests, etc.) as part of “normal” job scheduling
- Want to build larger virtual clusters (using VMware or User Mode Linux) temporarily, as part of “normal” job scheduling
- Requires a major upgrade of Chiba City systems software

Chiba Commits to SSS
- Fork in the road (occurred August 2003):
  - Major overhaul of old Chiba systems software (OpenPBS + Maui scheduler + homegrown stuff), OR
  - Take a great leap forward and bet on the all-new software architecture of SSS
- Problems with the leaping approach:
  - SSS interfaces not finalized
  - Some components don’t yet use the library (they implement their own protocols in open code, not encapsulated in the library)
  - Some components not fully functional yet
- Solutions to problems:
  - Collect components that are adequately functional and integrated (PM, SD, EM, BCM)
  - Write “stubs” for other critical components (Sched, QM)
  - Do without some components (CKPT, monitors, accounting) for the time being

Features of Adopted Solution
- Stubs adequate, at least for the time being
  - Scheduler does FIFO + reservations + backfill, improving
  - QM implements “PBS compatibility mode” (accepts user PBS scripts) as well as asking the Process Manager to start parallel jobs directly
- Process Manager wraps MPD-2
  - Single ring of MPDs runs as root, managing all jobs for all users
  - MPDs started by the Build-and-Config manager at boot time
- An MPI program called MPISH (MPI Shell) wraps user jobs for handling file staging and multiple job steps
- Python implementation of most components
- Demonstrated feasibility of using the SSS component approach to systems software
  - Running normal Chiba job mix for over five months now
  - Moving forward on meeting new requirements for research support
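To make "FIFO + reservations + backfill" concrete, here is a minimal sketch of one common way to combine them (often called EASY-style backfill). It is not the Chiba stub scheduler, whose code is not shown in these slides: the `Job` fields, the `schedule` function, and its arguments are all assumptions made for illustration. Jobs start in FIFO order while they fit; the first job that does not fit gets a reservation at the earliest time enough nodes free up; later jobs may jump ahead only if they do not delay that reservation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # number of nodes requested
    walltime: int   # requested run time, in arbitrary time units


def schedule(now, free_nodes, running, queue):
    """Return names of jobs to start at time `now` (hypothetical interface).

    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of Job in FIFO (submission) order.
    """
    started = []
    queue = list(queue)
    running = list(running)

    # 1. Plain FIFO: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.walltime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # 2. Reservation: find the earliest time the blocked head job can start.
    head = queue[0]
    avail = free_nodes
    shadow = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow = end_time
            break
    if shadow is None:
        return started  # head job asks for more nodes than will ever be free
    extra = avail - head.nodes  # nodes the head job leaves unused at `shadow`

    # 3. Backfill: later jobs may start now if they cannot delay the head job.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.walltime <= shadow:
            pass                  # finishes before the reservation begins
        elif job.nodes <= extra:
            extra -= job.nodes    # runs past it, but only on spare nodes
        else:
            continue
        free_nodes -= job.nodes
        started.append(job.name)
    return started
```

The sketch trusts the requested walltime; a real scheduler also has to handle jobs that finish early or overrun, which is one reason reservations are usually recomputed on every scheduling pass.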

Next Steps
- Integrate other components into this structure
- Integrate other instantiations of components into this structure
- Replace stubs as possible
  - Easiest if they use the same XML APIs
- Put “unusual” capabilities into production
  - Rebuilding nodes on the fly