Process Management Working Group Process Management “Meatball” Dallas November 28, 2001.

Presentation transcript:

Process Management Working Group Process Management “Meatball” Dallas November 28, 2001

2 Subcomponents
Process/Job management
  – What it includes and what it doesn’t include
  – Current status of interface definition
  – Demo
Monitoring
  – Examples
  – Relationship to process management
Checkpointing
  – Is this a component?
  – Relationship to process management

One Meatball
[Diagram of system components. Box labels: Usage Reports, User DB, Accounting, Scheduler, Process Manager, System Monitor, Queue Manager, Checkpoint/Restart, Data Migration, Meta Scheduler, Node Configuration & Build Manager, Meta Monitor, Meta Manager, Resource Allocation management, Application Environment, High Performance Communication & I/O, Access control, Security manager, File System, User utilities. One annotation reads “Interacts with all components”.]

4 Process Manager Responsibilities
Starts processes (and therefore knows hosts and pids)
Delivers arguments, environment, limits
  – (between fork and exec)
Starts other processes that need to know pids
  – Monitoring (e.g. Paradyn)
  – Debugging (e.g. TotalView)
  – Other (e.g. Myrinet monitor)
Kills jobs
Signals processes
  – May be part of checkpointing
Reports on job start/termination
Provides return codes (job/process)
Handles stdio as directed
Services the application runtime layer
  – Implements PMI (put/get/barrier/spawn, others as discovered)
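To illustrate the "between fork and exec" step, here is a minimal Python sketch of starting one process with arguments, an environment, and a resource limit, then collecting its pid, stdout, and return code. It is not the working group's process manager. The function names, the `JOB_RANK` variable, and the choice of a CPU-time limit are assumptions made for illustration. It relies on POSIX behavior (`preexec_fn` and the `resource` module).

```python
import os
import resource
import subprocess
import sys


def start_process(argv, env, cpu_seconds):
    """Start one process with the given arguments, environment, and CPU limit."""

    def apply_limits():
        # Runs in the child after fork and before exec.
        _soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
        soft = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            soft = min(cpu_seconds, hard)  # never exceed the existing hard limit
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    return subprocess.Popen(
        argv,
        env=env,
        preexec_fn=apply_limits,
        stdout=subprocess.PIPE,  # the manager handles the child's stdout
    )


def run_demo():
    # JOB_RANK is a made-up variable name, used only for this sketch.
    child_env = dict(os.environ, JOB_RANK="0")
    code = "import os, sys; print(os.environ['JOB_RANK']); sys.exit(3)"
    proc = start_process([sys.executable, "-c", code], child_env, cpu_seconds=10)
    output, _ = proc.communicate()
    return proc.pid, output.decode().strip(), proc.returncode
```

`run_demo()` returns the child's pid, the text it printed, and its exit status, the three things the slide says a process manager knows or reports.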

5 P.M. Non-Responsibilities
Policy
Real-time resource usage monitoring

6 Process Manager Component Interface to Other Components
Defined (i.e. proposed XML schema exists)
  – Start-job
      Start-job response
  – Kill-job
      Kill-job response
To do
  – Suspend-job, resume-job
  – Signal-job in general
  – Asynchronous notifications
      Job started
      Job terminated
      Others
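The transcript does not reproduce the proposed XML schema, so the following Python sketch only suggests what building and parsing a start-job request might look like. Every element and attribute name here (`start-job`, `jobid`, `executable`, `arg`, `host`) is hypothetical and should not be read as the working group's schema.

```python
import xml.etree.ElementTree as ET


def build_start_job(job_id, executable, hosts, args=()):
    """Serialize a start-job request (element names are hypothetical)."""
    root = ET.Element("start-job", {"jobid": job_id})
    ET.SubElement(root, "executable").text = executable
    for arg in args:
        ET.SubElement(root, "arg").text = arg
    for host in hosts:
        ET.SubElement(root, "host", {"name": host})
    return ET.tostring(root, encoding="unicode")


def parse_start_job(document):
    """Parse a request produced by build_start_job back into a dict."""
    root = ET.fromstring(document)
    if root.tag != "start-job":
        raise ValueError("not a start-job request: " + root.tag)
    return {
        "jobid": root.get("jobid"),
        "executable": root.findtext("executable"),
        "args": [a.text for a in root.findall("arg")],
        "hosts": [h.get("name") for h in root.findall("host")],
    }
```

A kill-job request and the two response messages could follow the same pattern, with the root tag distinguishing the message type.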

7 The Process Manager Interface to Application Libraries
A Prototype: PMI (formerly known as BNR)
Used by application libraries (e.g. MPI implementations, UPC implementations, common runtime systems for multiple languages and libraries)
Provided by process managers
Simple and general
  – Find out rank and size
  – Put and get into keyval space
  – Barrier
  – Spawn
Currently used by MPICH, provided by MPD
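The put/get/barrier pattern can be sketched with a toy in-process model in which threads stand in for ranks. This is not the real PMI API. The class `ToyPMI`, its method names, and the made-up addresses are assumptions chosen to mirror the operations listed above, and spawn is left out.

```python
import threading


class ToyPMI:
    """In-process stand-in for a process manager's keyval service (illustrative)."""

    def __init__(self, size):
        self.size = size  # number of ranks in the job
        self._keyvals = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(size)

    def put(self, key, value):
        with self._lock:
            self._keyvals[key] = value

    def get(self, key):
        with self._lock:
            return self._keyvals[key]

    def barrier(self):
        # Blocks until all `size` ranks have arrived.
        self._barrier.wait()


def exchange_addresses(size):
    """Each rank publishes an address, then reads every other rank's address."""
    pmi = ToyPMI(size)
    seen = [None] * size

    def rank_main(rank):
        pmi.put(f"addr-{rank}", f"host{rank}:5000")  # made-up address
        pmi.barrier()  # after the barrier, every rank's put is visible
        seen[rank] = [pmi.get(f"addr-{r}") for r in range(size)]

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

The barrier between put and get is the point of the pattern: a library such as an MPI implementation can publish its connection information, wait, and then look up its peers without the process manager knowing what the values mean.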

8 The Chiba City Testbed
Dedicated to scalability research in computer science rather than to applications
Currently 256 dual-processor nodes
Designed to promote experimentation with system software
SciDAC projects can get accounts:
  – Web form at
  – Specify SCIDAC as Project Group
  – Specify closest Argonne SciDAC person as contact (Rusty or Narayan for SSS)
Future plans
  – 1000 nodes, 8000 virtual nodes
      VMware
      User-mode Linux

9 A Demo
Start Service Directory component
Start Process Manager component
  – It registers itself with Service Directory
Start Proto-scheduler component
  – It queries Service Directory for access location (host,port) of process manager
  – It sends job-start requests from hard-coded queue to process manager
  – Process manager runs parallel jobs
All components communicate using XML
  – Use XML schema for process-manager requests, responses
  – Prototypes written in Python with built-in XML parser
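The demo's register-then-look-up flow can be sketched with an in-memory service directory. This is a simplified stand-in, not the prototype shown at the meeting. The class and function names, the component name `"process-manager"`, and the `send` callback are assumptions for illustration.

```python
class ServiceDirectory:
    """Maps a component name to the (host, port) locations it registered."""

    def __init__(self):
        self._entries = {}

    def register(self, component, host, port):
        self._entries.setdefault(component, []).append((host, port))

    def lookup(self, component):
        return list(self._entries.get(component, []))


def run_proto_scheduler(directory, queue, send):
    """Send each queued job to the process manager found in the directory.

    `send(host, port, job)` is a caller-supplied function (hypothetical) that
    delivers one request and returns the response.
    """
    locations = directory.lookup("process-manager")
    if not locations:
        raise LookupError("no process manager registered")
    host, port = locations[0]
    return [send(host, port, job) for job in queue]
```

In the demo the queue is hard-coded, so a real scheduler would replace only the `queue` argument while the directory lookup stays the same.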

10 A Modest Proposal
Multiple wire protocols (WPs) are allowed.
Components declare a WP associated with a port when they register with the service directory. (They can register multiple ports.)
Other components learn the WP associated with a port when they find out the port.
The default protocol is the “basic” protocol.
  – TCP
  – A message consists of a complete XML document
  – After sending, the sender does shutdown on the socket, providing EOF to the receiver to signal the end of the message, but leaving the socket half-open to receive the response.
All components are required to support at least the basic protocol.
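The basic protocol's half-close can be sketched in Python as follows: the sender writes one complete document, calls `shutdown(SHUT_WR)` so the receiver sees EOF, and then reads the response on the same socket. The helper names and the echo-style handler are assumptions for illustration, and the payload is placeholder XML, not a message from the proposed schema.

```python
import socket
import threading


def recv_until_eof(sock):
    """Read until the peer closes or half-closes its sending side."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def send_request(host, port, document):
    """Send one document, signal end-of-message with EOF, return the response."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(document)
        sock.shutdown(socket.SHUT_WR)  # EOF to the receiver; reading still works
        return recv_until_eof(sock)


def serve_once(listener, handler):
    """Accept one connection, read the whole request, send one response."""
    conn, _addr = listener.accept()
    with conn:
        request = recv_until_eof(conn)
        conn.sendall(handler(request))
        # Leaving the with-block closes conn, which gives the client its EOF.


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def handler(request):
        return b"<ok>" + request + b"</ok>"

    server = threading.Thread(target=serve_once, args=(listener, handler))
    server.start()
    reply = send_request("127.0.0.1", port, b"<ping/>")
    server.join()
    listener.close()
    return reply
```

Because EOF delimits the message, neither side needs a length prefix or a terminator. The cost is one TCP connection per request/response pair.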

11 Advantages
Something easy to start with
No “framing problem”
No other software required
Does not preclude other protocols, which include security, streaming, etc.
Can be used to bootstrap switches of protocol.