Process Introspection: A Checkpoint Mechanism for High Performance Heterogeneous Distributed Systems. University of Virginia. Author: Adam J. Ferrari.

Some Basic Terminology.
What is a process? A process is a program in execution, managed by the operating system.
What does introspection mean? Introspection means understanding one's inner self (Merriam-Webster Online).

Goals of the Process Introspection Project.
To construct a checkpoint/restart mechanism for a heterogeneous environment.
This mechanism should be:
  1. Efficient.
  2. Flexible.
  3. Most importantly, platform independent.

Heterogeneous Environments.
Became popular mainly because of their better price/performance ratio.
Some characteristics:
  1. A collection of workstations running different operating systems on varied architectures, connected by a network.
  2. Generally used for compute-intensive applications, where idle or lightly loaded workstations take part in completing a task, making efficient use of idle time.
  3. User-dedicated machines.

Ex: Our Own Department.

Efficient Utilization.
To take advantage of these heterogeneous workstations, processes should be provided with the following facilities:
  1. Process migration.
  2. Load balancing.
  3. Fault tolerance.

Checkpoint/Restart Mechanism.
Two main phases:
  1. Save the current running state of a process.
  2. Reconstruct the original running process from the saved image and resume execution from exactly the point of interruption.
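
The two phases can be sketched by hand for a very small computation. This is only an illustration under stated assumptions, not code from the project: `SumState`, `sum_run`, `sum_checkpoint` and `sum_restart` are hypothetical names, and the "saved image" is a decimal text string chosen because it does not depend on byte order.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical process state: a running sum over 1..limit. */
typedef struct {
    long next;   /* next value to add */
    long limit;  /* last value to add */
    long sum;    /* partial result so far */
} SumState;

/* Run at most `steps` iterations; returns 1 once the work is finished. */
int sum_run(SumState *s, long steps)
{
    while (steps-- > 0 && s->next <= s->limit) {
        s->sum += s->next;
        s->next++;
    }
    return s->next > s->limit;
}

/* Phase 1: save the state as decimal text (independent of byte order). */
int sum_checkpoint(const SumState *s, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%ld %ld %ld", s->next, s->limit, s->sum);
    return n > 0 && (size_t)n < len;
}

/* Phase 2: rebuild the state from the saved image. */
int sum_restart(SumState *s, const char *buf)
{
    return sscanf(buf, "%ld %ld %ld", &s->next, &s->limit, &s->sum) == 3;
}

/* Interrupt after `steps` iterations, then resume from the image alone. */
long sum_with_interruption(long limit, long steps)
{
    char image[64];
    SumState before = {1, limit, 0};
    SumState after = {0, 0, 0};
    int ok;

    sum_run(&before, steps);
    ok = sum_checkpoint(&before, image, sizeof image);
    assert(ok);
    ok = sum_restart(&after, image);
    assert(ok);
    while (!sum_run(&after, 1000)) {
        /* keep going until the remaining work is done */
    }
    return after.sum;
}
```

The resumed run only ever sees `image`, so the same string could in principle be read back by a process on a different machine.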

Advantages of using the Checkpoint/Restart Mechanism.
Process migration:
  1. Distributed load balancing.
  2. Efficient resource utilization.
Crash recovery and transaction rollback.
Useful in system administration.
Lowers the programming burden.
Running complex simulations or complex modeling.

Implementation Challenges/Complexity.
Because the computing environment is heterogeneous, the checkpoint/restart mechanism must be platform independent. It has to:
  1. Capture the state of a running process.
  2. Reinstantiate it on a completely different architecture or OS platform, which may have a different instruction set, data format, and address space layout.

Existing Implementations.
V migration mechanism. Compiler support is used to generate meta-information about a process, describing the locations and types of data items that must be modified at migration time to mask data representation differences.
Disadvantages:
  1. Requires kernel support. Some other examples: MOSIX, Sprite.
  2. Requires data to be stored at the same address in all migrated versions.
Theimer and Hayes. Construct an intermediate source code representation of a running process at the point of migration, and recompile this source at the migration target. Never implemented.

Process Introspection Design.
Process + Introspection: the ability of a process to examine and describe its own internal state in a logical, platform-independent format.
Extends the technique of hand-coding checkpoint/restart mechanisms into an automated approach.

Components Involved.
The Process Introspection Design Pattern.
Process Introspection Library (PIL).
Automatic Process Introspection Compiler (APrIL).
Standard Checkpoint Interface.
Central Checkpoint Coordinator.

Process Introspection Design Pattern.
A design template for writing checkpointable code.
Based on a process model.

Adding functionality to the modules.
Ability to save/restore threads of control.
  1. Poll points (checks for checkpoint requests) are inserted so that call stacks can be saved. Poll point placement is a key performance trade-off.
  2. Serving a checkpoint request: each active subroutine saves its data and its logical point of execution, then returns to its calling subroutine.
  3. Restarting a process from a checkpoint: starting from the initial subroutine that was active at the checkpoint, restore the variables from the checkpoint and use control flow to reach the point of execution recorded in the checkpoint. Then call the next subroutine from the checkpointed stack.

Adding functionality to the process, contd.
Ability to save/restore memory blocks. Must handle different data representations and address space layouts on different platforms.
For pointers:
  1. A raw memory address cannot be saved.
  2. A logical description has to be saved instead. High-level descriptors are needed.
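
One way to picture a logical pointer description is a pair (block, element index) looked up in a table of registered memory blocks. The sketch below is an assumption made for illustration, not PIL's actual descriptor format; `BlockTable`, `PtrDesc`, `ptr_describe` and `ptr_resolve` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 8  /* hypothetical table size */

/* A registered memory block in the current address space. */
typedef struct {
    void  *base;
    size_t elem_size;  /* bytes per element on this platform */
    size_t count;      /* number of elements */
} Block;

typedef struct {
    Block blocks[MAX_BLOCKS];
    int   used;
} BlockTable;

/* Logical description of a pointer: which block, which element. */
typedef struct {
    int    block;  /* table index, or -1 if the address is unknown */
    size_t index;  /* element index inside the block */
} PtrDesc;

int table_register(BlockTable *t, void *base, size_t elem_size, size_t count)
{
    assert(elem_size > 0);
    if (t->used >= MAX_BLOCKS)
        return -1;
    t->blocks[t->used].base = base;
    t->blocks[t->used].elem_size = elem_size;
    t->blocks[t->used].count = count;
    return t->used++;
}

/* Checkpoint side: turn a raw address into a logical description. */
PtrDesc ptr_describe(const BlockTable *t, const void *p)
{
    PtrDesc d = { -1, 0 };
    uintptr_t addr = (uintptr_t)p;
    for (int i = 0; i < t->used; i++) {
        const Block *b = &t->blocks[i];
        uintptr_t lo = (uintptr_t)b->base;
        uintptr_t hi = lo + b->elem_size * b->count;
        if (addr >= lo && addr < hi) {
            d.block = i;
            d.index = (size_t)(addr - lo) / b->elem_size;
            return d;
        }
    }
    return d;
}

/* Restart side: turn the description back into an address that is valid
   in the new address space, where blocks may sit elsewhere. */
void *ptr_resolve(const BlockTable *t, PtrDesc d)
{
    if (d.block < 0 || d.block >= t->used)
        return NULL;
    const Block *b = &t->blocks[d.block];
    if (d.index >= b->count)
        return NULL;
    return (char *)b->base + d.index * b->elem_size;
}

/* Demo: a pointer into `before` is described, then resolved against a copy
   at a different address, as it would be after a restart. */
int pointer_roundtrip_demo(void)
{
    double before[4] = {1.0, 2.0, 3.0, 4.0};
    double after[4]  = {1.0, 2.0, 3.0, 4.0};
    BlockTable saved, restored;
    saved.used = 0;
    restored.used = 0;

    table_register(&saved, before, sizeof before[0], 4);
    table_register(&restored, after, sizeof after[0], 4);

    PtrDesc d = ptr_describe(&saved, &before[2]);
    double *p = ptr_resolve(&restored, d);
    return d.block == 0 && d.index == 2 && p == &after[2] && *p == 3.0;
}
```

Storing an element index rather than a byte offset keeps the description meaningful even when the element size differs on the restart platform.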

[Figure labels: APrIL Compiler, Transformed Code, Hand-coded Checkpointable Modules, Process Introspection Library (PIL), Checkpoint.]

Process Introspection Library (PIL).
A consistent API for manipulating the elements of a process.
Automates and integrates:
  Thread management.
  Logical program counter stack.
  Data format conversion.
  Checkpoint/restart of statically allocated data.
  Checkpoint/restart of dynamically allocated data.
  Pointer analysis/description.
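
As a small illustration of what data format conversion involves, the sketch below writes a 32-bit integer in a fixed byte order so that machines of either endianness read back the same value. The wire format and the function names are assumptions for this example, not PIL's own.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 32-bit value as four bytes, most significant byte first. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read the value back; the result does not depend on host byte order. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Encode then decode; the result should equal the input. */
uint32_t u32_roundtrip(uint32_t v)
{
    unsigned char bytes[4];
    put_u32_be(bytes, v);
    assert(bytes[0] == (unsigned char)(v >> 24));  /* high byte comes first */
    return get_u32_be(bytes);
}
```

Because the code only uses shifts and masks on values, never a cast of the integer's memory, it behaves the same on little-endian and big-endian hosts.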

APrIL: Automatic Process Introspection Compiler.
A source code translator.
Inserts code to keep the PIL tables updated at run time.
Places poll points in the module code, at which the thread executing the module periodically polls for checkpoint requests.
During restart, the process must restore all threads of execution.

Example - Function Prologues
    void example(double *A)
    {
        int i;
        double temp[100];
        PIL_RegisterStackPointer(temp, PIL_Double, 100);
        /* On restart: restore the locals, then jump to the saved poll point. */
        if (PIL_CheckpointStatus & PIL_StatusRestoreNow) {
            int PIL_restore_point = PIL_PopLPCValue();
            A = PIL_RestoreStackPointer();
            i = PIL_RestoreStackInt();
            PIL_RestoreStackDoubles(temp, 100);
            switch (PIL_restore_point) {
            case 1: PIL_DoneRestart(); goto _PIL_PollPt_1;
            case 2: goto _PIL_PollPt_2;   /* restart is not marked done here */
            case 3: PIL_DoneRestart(); goto _PIL_PollPt_3;
            }
        }
        /* function body with its poll points follows (not shown on this slide) */

Example - Poll Points
    _PIL_PollPt_2:
        i = function(A, temp, 100);
    _PIL_PollPt_3:
        if (PIL_CheckpointStatus & PIL_StatusCheckpointNow) {
            if (PIL_CheckpointStatus & PIL_StatusCheckpointInProgress)
                PIL_PushLPCValue(2);   /* checkpoint was already in progress */
            else {
                PIL_PushLPCValue(3);   /* this poll point starts the checkpoint */
                PIL_CheckpointStatus |= PIL_StatusCheckpointInProgress;
            }
            goto _PIL_save_frame_;
        }
        /* remainder of the function body (not shown on this slide) */
    _PIL_save_frame_:
        PIL_SaveStackPointer(A);
        PIL_SaveStackInt(i);
        PIL_SaveStackDoubles(temp, 100);
        return;
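
The prologue and poll point fragments above depend on PIL and cannot be compiled on their own. A self-contained miniature of the same pattern is sketched below, with a toy save stack standing in for PIL. Every name in it (`ST_CHECKPOINT_NOW`, `push`, `pop`, `poll_tick`, `sum_to`, `run_interrupted`) is hypothetical, and the countdown in `poll_tick` merely simulates a checkpoint request arriving from outside.

```c
#include <assert.h>

enum { ST_CHECKPOINT_NOW = 1, ST_RESTORE_NOW = 2 };  /* hypothetical flags */

static int  status;      /* pending checkpoint/restore request bits */
static long saved[8];    /* toy checkpoint stack (stands in for PIL) */
static int  top;
static int  countdown;   /* polls left before a simulated request arrives */

static void push(long v) { assert(top < 8); saved[top++] = v; }
static long pop(void)    { assert(top > 0); return saved[--top]; }

/* Simulate an external checkpoint request after `countdown` polls. */
static void poll_tick(void)
{
    if (countdown > 0 && --countdown == 0)
        status |= ST_CHECKPOINT_NOW;
}

/* Sum 0..n-1. Returns -1 if a checkpoint request interrupted the loop. */
long sum_to(int n)
{
    int i;
    long sum;

    if (status & ST_RESTORE_NOW) {
        /* Prologue: pop in reverse save order, then jump to the poll point. */
        int point = (int)pop();
        sum = pop();
        i = (int)pop();
        status &= ~ST_RESTORE_NOW;
        switch (point) {
        case 1:  goto poll_pt_1;
        default: return -1;  /* unknown logical program counter */
        }
    }

    sum = 0;
    for (i = 0; i < n; i++) {
        sum += i;
poll_pt_1:
        poll_tick();
        if (status & ST_CHECKPOINT_NOW) {
            push(i);    /* save locals ... */
            push(sum);
            push(1);    /* ... and the logical program counter */
            status &= ~ST_CHECKPOINT_NOW;
            return -1;
        }
    }
    return sum;
}

/* Run, get interrupted after some polls, then restart from the saved stack. */
long run_interrupted(int n, int polls_before_request)
{
    long r;

    top = 0;
    status = 0;
    countdown = polls_before_request;
    r = sum_to(n);
    if (r >= 0)
        return r;  /* finished before any request arrived */

    status = ST_RESTORE_NOW;  /* only the saved stack survives the "restart" */
    countdown = 0;
    return sum_to(n);
}
```

Placing the poll inside the loop body means a request is noticed within one iteration, at the cost of one flag test per iteration. This is the placement trade-off in its simplest form.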

APrIL: Automatic Process Introspection Compiler.
[Figure labels: High Level Language, APrIL, Transformed Code, Back End Compilers, PIL, Binary 1, Binary 2, Binary N.]

Checkpoint Coordination and Module Interfaces.
Lets modules interoperate to produce a checkpoint or to restart a process.
SCI (Standard Checkpoint Interface) events:
  Process startup: registers any global or data type definitions.
  Checkpoint start/end: module information.
  Restart: restores the state from the checkpoint.
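
A coordinator that delivers these events to every registered module could look roughly like the sketch below. Only the event kinds come from the slide. The types and functions (`SciEvent`, `Coordinator`, `coordinator_add`, `coordinator_dispatch`) are hypothetical and do not reflect the project's real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Event kinds taken from the slide; the enum itself is an assumption. */
typedef enum {
    EV_STARTUP,
    EV_CHECKPOINT_START,
    EV_CHECKPOINT_END,
    EV_RESTART,
    EV_COUNT
} SciEvent;

/* Each module supplies one handler plus its private state. */
typedef void (*ModuleHandler)(SciEvent ev, void *module_state);

typedef struct {
    ModuleHandler handler;
    void         *state;
} Module;

#define MAX_MODULES 4  /* hypothetical limit */

typedef struct {
    Module modules[MAX_MODULES];
    int    count;
} Coordinator;

int coordinator_add(Coordinator *c, ModuleHandler h, void *state)
{
    assert(h != NULL);
    if (c->count >= MAX_MODULES)
        return -1;
    c->modules[c->count].handler = h;
    c->modules[c->count].state = state;
    return c->count++;
}

/* Deliver one event to every registered module, in registration order. */
void coordinator_dispatch(Coordinator *c, SciEvent ev)
{
    for (int i = 0; i < c->count; i++)
        c->modules[i].handler(ev, c->modules[i].state);
}

/* Example module: just counts how often each event is delivered. */
typedef struct {
    int seen[EV_COUNT];
} CounterModule;

void counter_handler(SciEvent ev, void *state)
{
    CounterModule *m = state;
    m->seen[ev]++;
}

/* Demo: two modules both observe startup and one full checkpoint. */
int coordinator_demo(void)
{
    CounterModule a = {{0}}, b = {{0}};
    Coordinator c;
    c.count = 0;

    coordinator_add(&c, counter_handler, &a);
    coordinator_add(&c, counter_handler, &b);

    coordinator_dispatch(&c, EV_STARTUP);
    coordinator_dispatch(&c, EV_CHECKPOINT_START);
    coordinator_dispatch(&c, EV_CHECKPOINT_END);

    return a.seen[EV_STARTUP] == 1 && b.seen[EV_STARTUP] == 1 &&
           a.seen[EV_CHECKPOINT_START] == 1 && b.seen[EV_CHECKPOINT_END] == 1 &&
           a.seen[EV_RESTART] == 0 && b.seen[EV_RESTART] == 0;
}
```

A real module would register its global and type definitions on startup and save or restore its own data in the checkpoint and restart handlers.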

Judging an implementation.
Little or no programmer effort.
Convenient programmer interface.
Low checkpoint request service latency.
Low runtime overhead.
Control over the number of checkpoints.
Should fit in with the environment.

Example Overhead Measurements. Run times in seconds, latencies in milliseconds.

Project Status.
Prototype PIL implemented. Tested across multiple platforms: Solaris, IRIX, AIX, OSF1, Linux, Win95/NT.
Example applications demonstrated, e.g. matrix multiply, SOR, sort. Hand-coded to use PIL. Checkpointed/restarted across the above platforms.
APrIL under design and construction.

Thank you. Questions?