1
CSE 490dp
Check-pointing and Migration
Robert Grimm
2
Problem
How to capture the state of an application?
  – Save and restore application
  – Clone application
  – Move application to a different node
Technical issue
3
Motivation
Failure resilience
  – Restart application after failure
Performance
  – Balance load across several nodes
  – Co-locate application with (remote) data
Availability
  – Move away from nodes that are going to go down
  – Follow a user as she moves through physical world
4
Application State
Internal data
  – Memory, objects
Execution state
  – Thread-based: Stack, registers
  – Event-based: Event queue
Connections
  – Open files, sockets
Outside data
  – Executables
  – Stored data
5
What State to Capture?
Issue: Degree of transparency
  – Fully transparent
      Application cannot tell the difference
  – No transparency
      Application needs to do everything itself
6
Internal Data
Most basic application state
  – Memory: copy
      C, C++
  – Objects: serialize
      Modula-3
      Java
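The "serialize" approach to internal data can be sketched as follows. This is only an illustration: the `Counter` class and the `checkpoint`/`restore` helpers are hypothetical names, and Python's `json` module stands in for the object serialization that Java and Modula-3 provide.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Counter:
    """Hypothetical application object holding internal data."""
    total: int = 0
    history: list = field(default_factory=list)

    def add(self, n: int) -> None:
        self.total += n
        self.history.append(n)


def checkpoint(app: Counter) -> str:
    # Flatten the object into text that could be written to disk
    # or shipped to another node.
    return json.dumps(asdict(app))


def restore(blob: str) -> Counter:
    # Rebuild an equivalent object from the saved text.
    return Counter(**json.loads(blob))


app = Counter()
app.add(3)
app.add(4)
saved = checkpoint(app)
app.add(100)               # work done after the check-point is not in `saved`
restored = restore(saved)  # state as of the check-point
```

The restored object is a separate copy, so the same two helpers cover save/restore and cloning of internal data.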
7
Execution State
System must be quiescent
  – All execution is suspended
Thread-based: State is implicit
  – Stack
  – Registers, including PC
  – Condition variable queues
  – Very low level
Event-based: State is explicit
  – Event queue
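A minimal sketch of why explicit, event-based state is easy to capture: between two handler invocations nothing is executing, so the pending queue is the whole execution state. The `EventLoop` class and its handler (which upper-cases each event) are made up for illustration.

```python
import json
from collections import deque


class EventLoop:
    """Hypothetical single-threaded event loop with an explicit queue."""

    def __init__(self, pending=None, log=None):
        self.queue = deque(pending or [])
        self.log = list(log or [])

    def post(self, event: str) -> None:
        self.queue.append(event)

    def step(self) -> None:
        # Handle exactly one event. Between steps the loop is quiescent.
        event = self.queue.popleft()
        self.log.append(event.upper())

    def checkpoint(self) -> str:
        # Execution state is just the pending events plus internal data.
        return json.dumps({"pending": list(self.queue), "log": self.log})

    @classmethod
    def restore(cls, blob: str) -> "EventLoop":
        state = json.loads(blob)
        return cls(state["pending"], state["log"])


loop = EventLoop()
for name in ("a", "b", "c"):
    loop.post(name)
loop.step()                        # "a" handled before the check-point
blob = loop.checkpoint()
resumed = EventLoop.restore(blob)  # could just as well be on another node
while resumed.queue:
    resumed.step()
```

A thread-based system has no equivalent of `checkpoint` at this level, because the stack and registers are not visible to the application.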
8
Connections
Open files, sockets, etc.
Problems
  – May change while application is not executing
      Check-points
  – May not be available on new node
      Migration
9
Alternative
Let application restore its connections
  – Harder for thread-based systems
      Thread may be accessing file or socket
  – Easier for event-based systems
Tell application to restore connections
  – Explicit event
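The "explicit event" idea might look like the sketch below. The event names (`"restored"`, `"write"`, `"suspend"`) and the `LogWriter` component are assumptions made for this example. Only a description of the connection is saved, and the application re-opens it when told it has been restored.

```python
import os
import tempfile


class LogWriter:
    """Hypothetical event-based component that owns one open file."""

    def __init__(self, path: str):
        self.path = path      # serializable description of the connection
        self.handle = None    # live connection, never check-pointed

    def handle_event(self, event: str, payload: str = "") -> None:
        if event == "restored":
            # Explicit notification: re-acquire the connection.
            self.handle = open(self.path, "a")
        elif event == "write":
            self.handle.write(payload + "\n")
            self.handle.flush()
        elif event == "suspend":
            self.handle.close()
            self.handle = None

    def saved_state(self) -> dict:
        # What a check-point would contain: no file handle.
        return {"path": self.path}


path = os.path.join(tempfile.mkdtemp(), "app.log")

writer = LogWriter(path)
writer.handle_event("restored")
writer.handle_event("write", "before")
writer.handle_event("suspend")
state = writer.saved_state()

revived = LogWriter(state["path"])   # rebuilt from the check-point
revived.handle_event("restored")     # application re-opens its own file
revived.handle_event("write", "after")
revived.handle_event("suspend")

with open(path) as f:
    lines = f.read().splitlines()
```

In a migration, the `"restored"` handler is also where the application would notice that the path does not exist on the new node and fall back to something else.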
10
Outside Data
Executables, stored data
Make data available everywhere
  – Distributed file system
Move executable(s) with application
  – Support moving code but not other data
Group data and applications
  – Environments in one.world
      Hierarchy moved as one unit
11
Three Points in the Design Space
Sprite [Douglis & Ousterhout 91]
Aglets [Lange & Oshima 98]
  – Representative of Java-based agent systems
one.world
12
Sprite
Process migration motivated by performance
  – Use idle machines
Transferred application state
  – Data
  – Execution state
  – Open connections
“It turned out to be particularly difficult in Sprite to migrate the state associated with open files”
13
Transparency in Sprite
Application seems to be on “home machine”
  – Location-independent kernel calls
      File system
  – Transfer execution state
      VM, open files, PIDs, UIDs, resource usage statistics
  – Call back to home machine
      gettimeofday
  – Modify state on both machines
      fork, exit, wait
14
Aglets
Mobile agent system
  – “Clean” platform for experimenting with mobile agents
Transferred application state
  – Data
      Relies on Java serialization
  – Executables
      Lazily – only currently used classes
15
Limitations
Not transferred
  – Execution state
      Not supported by Java
      Applications need to implement their own state machines
  – Outside data beyond executables
      Not part of platform
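What "implement their own state machines" means in practice can be sketched like this. The `Summation` task is hypothetical and is written in Python rather than against the Aglets API. Because the platform cannot capture the stack, the loop position lives in ordinary fields that survive serialization.

```python
import json


class Summation:
    """Hypothetical agent task written as an explicit state machine."""

    def __init__(self, items, index=0, total=0, phase="running"):
        self.items = list(items)
        self.index = index      # replaces the implicit loop counter
        self.total = total
        self.phase = phase      # replaces the implicit program counter

    def step(self) -> None:
        # One small unit of work; the task can be moved between steps.
        if self.phase == "running":
            if self.index < len(self.items):
                self.total += self.items[self.index]
                self.index += 1
            else:
                self.phase = "done"

    def to_json(self) -> str:
        return json.dumps(self.__dict__)

    @classmethod
    def from_json(cls, blob: str) -> "Summation":
        return cls(**json.loads(blob))


task = Summation([1, 2, 3, 4])
task.step()
task.step()                          # partway through the list
blob = task.to_json()                # only data is transferred
moved = Summation.from_json(blob)    # resumes where the fields say it was
while moved.phase != "done":
    moved.step()
```

The cost is visible in the code: a plain `for` loop has to be turned inside out into fields plus a `step` method.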
16
one.world
Failure resilience, availability, (performance)
  – checkpoint, restore, move, clone
Transferred state
  – Data
  – Execution state
      Event queue
  – Outside data
      Environment hierarchy
Not transferred
  – Open connections
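The idea of treating an environment hierarchy as one unit can be illustrated with a toy tree. This is not the one.world API: the `Environment` class and the `clone` and `move` functions below are hypothetical, and they assume that each environment carries its stored data, its event queue, and its children, with open connections deliberately left out.

```python
import copy


class Environment:
    """Toy stand-in for a nested environment (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}        # stored data
        self.events = []      # pending events (execution state)
        self.children = []
        self.parent = None

    def add_child(self, child: "Environment") -> "Environment":
        child.parent = self
        self.children.append(child)
        return child


def clone(env: Environment, new_parent: Environment) -> Environment:
    # Detach first: deepcopy would otherwise follow env.parent and
    # copy the whole tree above this environment as well.
    old_parent, env.parent = env.parent, None
    try:
        duplicate = copy.deepcopy(env)
    finally:
        env.parent = old_parent
    return new_parent.add_child(duplicate)


def move(env: Environment, new_parent: Environment) -> Environment:
    # The subtree travels as one unit; children come along implicitly.
    env.parent.children.remove(env)
    return new_parent.add_child(env)


node_a = Environment("node-a")
node_b = Environment("node-b")
app = node_a.add_child(Environment("app"))
app.events.append("tick")
store = app.add_child(Environment("store"))
store.data["k"] = "v"

twin = clone(app, node_b)   # independent copy of app and its store
move(app, node_b)           # original now lives under node-b as well
```

Grouping data with the application this way avoids needing a distributed file system just to keep an application's own stored data reachable after a move.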
17
Programming for Change
Pervasive computing environment
  – Highly dynamic
      Tens of thousands of nodes and services come and go
Applications
  – Cannot assume existence or availability of resources
  – Need to be prepared to re-acquire any resource at any time
18
Summary
Sprite
  – Full migration, full transparency
      Does not scale across a global network
Aglets
  – Limited environment with limited migration
one.world
  – Better balance between no migration and full migration (?)