CSE 490dp Check-pointing and Migration Robert Grimm.

Presentation transcript:

CSE 490dp Check-pointing and Migration Robert Grimm

Problem
How to capture the state of an application?
  – Save and restore application
  – Clone application
  – Move application to a different node
Technical issue

Motivation
Failure resilience
  – Restart application after failure
Performance
  – Balance load across several nodes
  – Co-locate application with (remote) data
Availability
  – Move away from nodes that are going to go down
  – Follow a user as she moves through physical world

Application State
Internal data
  – Memory, objects
Execution state
  – Thread-based: Stack, registers
  – Event-based: Event queue
Connections
  – Open files, sockets
Outside data
  – Executables
  – Stored data

What State to Capture?
Issue: Degree of transparency
  – Fully transparent
      – Application cannot tell the difference
  – No transparency
      – Application needs to do everything itself

Internal Data
Most basic application state
  – Memory – copy
      – C, C++
  – Objects – serialize
      – Modula-3
      – Java
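The "Objects – serialize" item can be illustrated with a minimal sketch. It is written in Python rather than Modula-3 or Java, and the `Counter` class and the `checkpoint` and `restore` helpers are hypothetical names chosen for illustration, not part of the slides. The object's fields are turned into a byte-free, self-describing form (JSON here) and rebuilt later.

```python
import json


class Counter:
    """Hypothetical application object (not from the slides)."""

    def __init__(self, name, count=0, history=None):
        self.name = name
        self.count = count
        self.history = list(history or [])

    def increment(self):
        self.count += 1
        self.history.append(self.count)

    def to_state(self):
        # Internal data only: plain fields that can be written out.
        return {"name": self.name, "count": self.count, "history": self.history}

    @classmethod
    def from_state(cls, state):
        return cls(state["name"], state["count"], state["history"])


def checkpoint(obj):
    """Serialize the object's internal data to a string."""
    return json.dumps(obj.to_state())


def restore(blob):
    """Rebuild an independent object from a saved string."""
    return Counter.from_state(json.loads(blob))


app = Counter("requests")
app.increment()
app.increment()
saved = checkpoint(app)   # state captured after two increments
app.increment()           # the original keeps running
restored = restore(saved)
```

Here `restored` reflects the state at the check-point, while `app` has moved on; the same saved string could also be used to clone the object or to rebuild it on another node.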

Execution State
System must be quiescent
  – All execution is suspended
Thread-based: State is implicit
  – Stack
  – Registers, including PC
  – Condition variable queues
  – Very low level
Event-based: State is explicit
  – Event queue
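To make "Event-based: State is explicit" concrete, here is a small Python sketch under stated assumptions: `EventApp`, its event kinds, and its methods are invented for illustration. Between two handler invocations no handler is running, so the application is quiescent and its whole execution state is the queue of pending events plus its data.

```python
import json
from collections import deque


class EventApp:
    """Hypothetical event-driven application (illustrative only)."""

    def __init__(self, total=0, pending=()):
        self.total = total
        self.queue = deque(pending)

    def post(self, kind, value=0):
        self.queue.append([kind, value])

    def step(self):
        """Run exactly one handler to completion; return False when idle."""
        if not self.queue:
            return False
        kind, value = self.queue.popleft()
        if kind == "add":
            self.total += value
        elif kind == "double":
            self.total *= 2
        return True

    def checkpoint(self):
        # Must be called between step() calls, when no handler is active.
        return json.dumps({"total": self.total, "pending": list(self.queue)})

    @classmethod
    def restore(cls, blob):
        state = json.loads(blob)
        return cls(state["total"], state["pending"])


app = EventApp()
app.post("add", 3)
app.post("double")
app.post("add", 1)
app.step()                      # handles "add 3"; two events remain queued
blob = app.checkpoint()         # data plus pending events
clone = EventApp.restore(blob)
while clone.step():             # the clone finishes the remaining events
    pass
```

A thread-based system would instead have to capture stacks and registers to get the same effect, which is why the slide calls that state implicit and very low level.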

Connections
Open files, sockets, etc.
Problems
  – May change while application is not executing
      – Check-points
  – May not be available on new node
      – Migration

Alternative
Let application restore its connections
  – Harder for thread-based systems
      – Thread may be accessing file or socket
  – Easier for event-based systems
Tell application to restore connections
  – Explicit event
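One possible reading of "Tell application to restore connections" with an explicit event is sketched below in Python. The `LogReader` class and the `"restored"` and `"read-line"` event names are assumptions made for this example. The open file handle is never saved; only the path and offset are, and the handler for the restore event re-opens the file.

```python
import os
import tempfile


class LogReader:
    """Hypothetical application that reads a file line by line."""

    def __init__(self, path):
        self.path = path
        self.offset = 0
        self.handle = None      # connection: deliberately not check-pointed

    def handle_event(self, event):
        if event == "restored":
            # Explicit event: re-acquire the connection from saved data.
            self.handle = open(self.path, "rb")
            self.handle.seek(self.offset)
            return None
        if event == "read-line":
            line = self.handle.readline()
            self.offset = self.handle.tell()
            return line.decode().rstrip("\n")
        raise ValueError(event)

    def checkpoint(self):
        return {"path": self.path, "offset": self.offset}

    @classmethod
    def restore(cls, state):
        reader = cls(state["path"])
        reader.offset = state["offset"]
        reader.handle_event("restored")
        return reader


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")
    with open(path, "wb") as f:
        f.write(b"first\nsecond\nthird\n")

    reader = LogReader(path)
    reader.handle_event("restored")         # initial acquisition, same code path
    first = reader.handle_event("read-line")
    state = reader.checkpoint()
    reader.handle.close()                   # the old connection goes away

    revived = LogReader.restore(state)
    second = revived.handle_event("read-line")
    revived.handle.close()
```

The sketch assumes the file is still reachable under the same path after the restore, which is exactly what may fail after a migration; a real application would need a fallback in the `"restored"` handler.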

Outside Data
Executables, stored data
Make data available everywhere
  – Distributed file system
Move executable(s) with application
  – Support moving code but not other data
Group data and applications
  – Environments in one.world
      – Hierarchy moved as one unit

Three Points in the Design Space
Sprite [Douglis & Ousterhout 91]
Aglets [Lange & Oshima 98]
  – Representative of Java-based agent systems
one.world

Sprite
Process migration motivated by performance
  – Use idle machines
Transferred application state
  – Data
  – Execution state
  – Open connections
      – “It turned out to be particularly difficult in Sprite to migrate the state associated with open files”

Transparency in Sprite
Application seems to be on “home machine”
  – Location-independent kernel calls
      – File system
  – Transfer execution state
      – VM, open files, PIDs, UIDs, resource usage statistics
  – Call back to home machine
      – gettimeofday
  – Modify state on both machines
      – fork, exit, wait
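The call categories on this slide can be pictured as a dispatch table. The Python sketch below is a toy model only and does not reflect Sprite's actual kernel code: `Policy`, `CALL_POLICY`, and `machines_involved` are invented names, and `read` and `write` are assumed examples of file-system calls. Only `gettimeofday`, `fork`, `exit`, and `wait` come from the slide.

```python
from enum import Enum


class Policy(Enum):
    LOCAL = "location-independent"   # e.g. file system calls
    HOME = "call back to home"       # result must match the home machine
    BOTH = "update both machines"    # process state kept on both sides


# Toy classification; "read" and "write" are assumed examples.
CALL_POLICY = {
    "read": Policy.LOCAL,
    "write": Policy.LOCAL,
    "gettimeofday": Policy.HOME,
    "fork": Policy.BOTH,
    "exit": Policy.BOTH,
    "wait": Policy.BOTH,
}


def machines_involved(call, home, current):
    """Return the machines that take part in serving a kernel call."""
    if home == current:
        return [current]             # process has not migrated
    policy = CALL_POLICY[call]
    if policy is Policy.LOCAL:
        return [current]
    if policy is Policy.HOME:
        return [home]
    return [home, current]


served_by_time = machines_involved("gettimeofday", home="A", current="B")
served_by_fork = machines_involved("fork", home="A", current="B")
served_by_read = machines_involved("read", home="A", current="B")
```

Every call that involves the home machine costs a network round trip, which gives some intuition for why this degree of transparency is expensive to maintain.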

Aglets
Mobile agent system
  – “Clean” platform for experimenting with mobile agents
Transferred application state
  – Data
      – Relies on Java serialization
  – Executables
      – Lazily – only currently used classes

Limitations
Not transferred
  – Execution state
      – Not supported by Java
      – Applications need to implement their own state machines
  – Outside data beyond executables
      – Not part of platform
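What "implement their own state machines" might look like is sketched below, in Python rather than Java and without using the Aglets API. `SurveyAgent` and its fields are hypothetical. Because the stack is not transferred, the agent keeps its own "program counter" (`next_stop`) as ordinary data, and every arrival re-enters the same method, which resumes from that field.

```python
import json


class SurveyAgent:
    """Hypothetical agent that visits hosts in order and sums one value per host."""

    def __init__(self, itinerary, next_stop=0, total=0):
        self.itinerary = list(itinerary)
        self.next_stop = next_stop   # explicit replacement for the lost stack/PC
        self.total = total

    def on_arrival(self, host_values):
        """Do this host's work; return the next host, or None when finished."""
        host = self.itinerary[self.next_stop]
        self.total += host_values[host]
        self.next_stop += 1
        if self.next_stop < len(self.itinerary):
            return self.itinerary[self.next_stop]
        return None

    def serialize(self):
        return json.dumps(self.__dict__)

    @classmethod
    def deserialize(cls, blob):
        return cls(**json.loads(blob))


host_values = {"a": 1, "b": 10, "c": 100}   # stand-in for per-host data
agent = SurveyAgent(["a", "b", "c"])
hops = 0
while True:
    next_host = agent.on_arrival(host_values)
    if next_host is None:
        break
    # Simulated migration: only data survives the hop, not the call stack.
    agent = SurveyAgent.deserialize(agent.serialize())
    hops += 1
final_total = agent.total
```

The loop stands in for the platform delivering the agent to each host; in a real system each iteration would run on a different machine.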

one.world
Failure resilience, availability, (performance)
  – checkpoint, restore, move, clone
Transferred state
  – Data
  – Execution state
      – Event queue
  – Outside data
      – Environment hierarchy
Not transferred
  – Open connections

Programming for Change
Pervasive computing environment
  – Highly dynamic
      – Tens of thousands of nodes and services come and go
Applications
  – Cannot assume existence or availability of resources
  – Need to be prepared to re-acquire any resource at any time
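The requirement to "re-acquire any resource at any time" can be sketched as follows. This is an illustrative Python example, not the one.world API: `Registry`, `Printer`, `ResourceGone`, and `print_with_reacquire` are all hypothetical. The application treats a cached resource as a hint, and on failure discards it and looks it up again.

```python
class ResourceGone(Exception):
    """Raised when a resource is missing or no longer usable."""


class Registry:
    """Hypothetical discovery service mapping a service name to a provider."""

    def __init__(self):
        self.providers = {}

    def lookup(self, name):
        if name not in self.providers:
            raise ResourceGone(name)
        return self.providers[name]


class Printer:
    """Hypothetical resource that can disappear at any time."""

    def __init__(self, label):
        self.label = label
        self.alive = True

    def print(self, text):
        if not self.alive:
            raise ResourceGone(self.label)
        return f"{self.label}: {text}"


def print_with_reacquire(registry, text, cached=None):
    """Use the cached printer if it works, else look one up again.

    Returns (output, printer) so the caller can cache the working printer.
    """
    for _ in range(2):                  # one retry after a failure
        try:
            if cached is None:
                cached = registry.lookup("printer")
            return cached.print(text), cached
        except ResourceGone:
            cached = None               # drop the stale reference
    raise ResourceGone("printer")


registry = Registry()
registry.providers["printer"] = Printer("hall")
out1, printer = print_with_reacquire(registry, "page 1")

printer.alive = False                            # the node goes away
registry.providers["printer"] = Printer("lab")   # another service appears
out2, printer = print_with_reacquire(registry, "page 2", cached=printer)
```

The same code path handles a restart from a check-point and an arrival on a new node, since in both cases the cached reference is simply stale.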

Summary
Sprite
  – Full migration, full transparency
      – Does not scale across a global network
Aglets
  – Limited environment with limited migration
one.world
  – Better balance between no migration and full migration (?)