
1 Checkpoint and Migration in the Condor Distributed Processing System
Presentation by James Nugent, CS739, University of Wisconsin-Madison

2 Primary Sources
- J. Basney and M. Livny, "Deploying a High Throughput Computing Cluster", in High Performance Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
- Condor web page: http://www.cs.wisc.edu/condor/
- M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System", Computer Sciences Technical Report #1346, University of Wisconsin-Madison, April 1997.
- M. Litzkow, M. Livny, and M. Mutka, "Condor - A Hunter of Idle Workstations", Proceedings of the 8th International Conference of Distributed Computing Systems, pages 104-111, June 1988.

3 Outline
- Goals of Condor
- Overview of Condor
- Scheduling
- Remote Execution
- Checkpointing
- Flocking

4 Goals
- Opportunistically use idle cycles for work
  - Remote machines have intermittent availability
  - Lower reliability, since cluster processing is not the primary function of the machines used
- Portability: only re-linking is required (applications that are not re-linked cannot be checkpointed)
- Remote machines are not a dedicated part of the system

5 Voluntary
- People should see a benefit from using it, or at least no penalty for letting others use their machine
- No security problem from using Condor, or from Condor using your machine
- Low impact on the remote machine's performance

6 Implementation Challenges
- Run serial programs at remote locations without any source code changes (the program must be re-linked)
- Programs may have to move to a different machine
  - Checkpointing allows process migration and provides reliability
- Provide transparent access to remote resources

7 Basic Functioning: Condor Pool
[Diagram: a Condor pool of machines A, B, C, and D with a Central Manager; the CM selects A and B to receive jobs, a machine becomes unavailable, and its job moves to C.]

8 Basic Execution Model
[Diagram: shadow process on the source machine; Condor daemon and job on the remote machine]
1. Create shadow process.
2. Send job to remote machine.
3. Remote machine runs the job.
4. Job sends remote system calls to the shadow process on the source machine.
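
To make step 4 concrete, here is a minimal sketch of the shadow's side of the protocol. It is an illustration only, not Condor source; recv_call, do_syscall, and send_result are hypothetical helpers standing in for Condor's marshalling code.

    /* Illustrative shadow-side loop: the shadow reads system-call requests
     * from the remote job, performs them locally with the submitting user's
     * files and permissions, and sends back the results. */
    typedef struct {
        int  nr;        /* which system call */
        long args[6];   /* marshalled arguments */
    } syscall_req;

    int  recv_call(int job_sock, syscall_req *req);   /* hypothetical helpers */
    long do_syscall(const syscall_req *req);
    void send_result(int job_sock, long result);

    void shadow_loop(int job_sock)
    {
        syscall_req req;
        while (recv_call(job_sock, &req) == 0) {   /* 0 = got a request        */
            long result = do_syscall(&req);        /* execute locally          */
            send_result(job_sock, result);         /* reply to the job (step 4) */
        }
    }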

9 Scheduling
[Flowchart: starting from the CM selecting a machine, check whether that machine has jobs to run and whether the next job can find a remote machine to run on; if both are true, the next job runs, otherwise the loop returns to the start.]

10 Central Manager: Selecting a Machine to Have a Job Run
- Priority(Machine) = Base Priority - (# Users) + (# Running Jobs)
  - (# Users) is the number of users with jobs local to this machine, i.e. whose jobs started on this machine
  - Lower values mean higher priority
  - Ties are broken randomly
- The CM periodically receives information from each machine:
  - the number of remote jobs the machine is running
  - the number of remote jobs the machine wants to run
- This is the only per-machine state the CM stores, which reduces both the stored state and what machines must communicate to the CM.
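
A minimal sketch of this priority rule, with illustrative field names rather than Condor's actual data structures:

    /* Hypothetical machine record; a lower priority value is served first. */
    typedef struct {
        int base_priority;   /* administrator-assigned base */
        int local_users;     /* users whose jobs originated on this machine */
        int running_jobs;    /* remote jobs currently running on it */
    } machine_t;

    int machine_priority(const machine_t *m)
    {
        /* Priority(Machine) = Base Priority - (# Users) + (# Running Jobs);
         * ties between equal values are broken randomly by the caller. */
        return m->base_priority - m->local_users + m->running_jobs;
    }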

11 Fair Scheduling
- If user A submits 4 jobs and user B submits 1, then:
  - User A's jobs wait 1/5 of the time
  - User B's job must wait 4/5 of the time
  - This assumes there is insufficient capacity to schedule all jobs immediately
  - User B contributes as much computing power to the system as user A

12 Up-Down Scheduling: Workstation Priorities
- Each machine has a scheduling index (SI)
  - Lower indexes have higher priority
  - The index goes up when a machine is granted capacity (priority decreases)
  - The index goes down when a machine is denied capacity (priority increases)
  - The index also decreases slowly over time (priority increases over time)
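
The update rule can be sketched as below; this is an illustration of the slide's description, not Condor code, and the decay constant is an assumed parameter.

    /* Up-down index update: lower index = higher priority. */
    void update_index(double *index, int capacity_granted, double decay)
    {
        if (capacity_granted)
            *index += 1.0;   /* granted capacity: index rises, priority drops */
        else
            *index -= 1.0;   /* denied capacity: index falls, priority rises  */
        *index -= decay;     /* slow decrease over time: priority recovers    */
    }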

13 Host Machines
- Priority(Job) = (UserPriority * 10) + Running(1|0) + (WaitTime / 1000000000)
  - Running jobs consume disk space
- Jobs are ordered first by user priority, then by whether they are running, and finally by wait time.
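
A sketch of this ordering as a sort comparator, under the assumption that the formula's terms map onto the fields below (the names are illustrative, not Condor's):

    /* Hypothetical per-job record on a host machine. */
    typedef struct {
        int    user_priority;   /* UserP */
        int    running;         /* 1 if the job is already running, else 0 */
        double wait_time;       /* time spent waiting */
    } job_t;

    static double job_priority(const job_t *j)
    {
        /* Priority(Job) = UserP*10 + Running(1|0) + WaitTime/1e9 */
        return j->user_priority * 10.0 + j->running + j->wait_time / 1e9;
    }

    /* Comparator for qsort(): user priority dominates, then running state,
     * then wait time, matching the weighting of the formula above. */
    static int cmp_jobs(const void *a, const void *b)
    {
        double pa = job_priority((const job_t *)a);
        double pb = job_priority((const job_t *)b);
        return (pa > pb) - (pa < pb);
    }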

14 Classads: Finding Servers for Jobs
- Classads are tuples of attribute names and expressions, e.g. (Requirements = Other.Memory > 32)
- Two classads match if their Requirements attributes evaluate to true in the context of each other
- Example
  - Job: (Requirements = Other.Memory > 32 && Other.Arch == "Alpha")
  - Machine: (Memory = 64), (Arch = "Alpha")
  - These two classads would match.
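
The two-way matching rule can be illustrated with a deliberately simplified sketch. Real classads are general attribute/expression lists; here each ad carries only the attributes used in the example plus a requirements predicate evaluated against the other ad.

    #include <string.h>

    typedef struct classad {
        int         memory;   /* e.g. MB of RAM offered or required */
        const char *arch;     /* e.g. "Alpha" */
        int (*requirements)(const struct classad *other);
    } classad;

    /* Job's Requirements: Other.Memory > 32 && Other.Arch == "Alpha" */
    static int job_requirements(const classad *other)
    {
        return other->memory > 32 && strcmp(other->arch, "Alpha") == 0;
    }

    /* Machine's Requirements: accept any job, in this simplified example. */
    static int machine_requirements(const classad *other)
    {
        (void)other;
        return 1;
    }

    /* Two ads match when each one's requirements hold in the other's context. */
    static int ads_match(const classad *a, const classad *b)
    {
        return a->requirements(b) && b->requirements(a);
    }

With a job ad using job_requirements and a machine ad of (Memory = 64, Arch = "Alpha") using machine_requirements, ads_match returns true, mirroring the slide's example.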

15 Remote Execution: Goals
- Execute a program on a remote machine as if it were executing on the local machine
- The machines in a pool should not need to be identically configured in order to run remote jobs
- The CM should only be involved in starting and stopping jobs, not while they are running

16 Resource Access
- Remote resource access
  - A shadow process runs on the local (submitting) machine
  - The Condor library sends system calls to the local machine
  - Thus there is a performance penalty, and extra network traffic, for the submitting machine
  - The user does not need an account on the remote machine
- Jobs execute as 'nobody'
- Compensates for system differences
- AFS, NFS
  - The shadow checks whether a shared file system can be used (i.e. is the file reachable? it looks up local path mappings)
  - If the file can be reached via the file system, that path is used instead of remote system calls, for performance; a sketch of this check follows
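
A sketch of that check, assuming the shadow has already applied its local path mapping; access(2) stands in here for whatever reachability test Condor actually performs.

    #include <unistd.h>

    enum access_mode { ACCESS_LOCAL_FS, ACCESS_REMOTE_SYSCALL };

    /* If the (mapped) path is visible through AFS/NFS on the execution
     * machine, use it directly; otherwise fall back to remote syscalls. */
    enum access_mode choose_access(const char *mapped_path)
    {
        if (mapped_path != NULL && access(mapped_path, R_OK) == 0)
            return ACCESS_LOCAL_FS;        /* shared FS: avoids RPC overhead */
        return ACCESS_REMOTE_SYSCALL;      /* route I/O through the shadow   */
    }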

17 Interposition: Sending Syscalls
[Diagram: the remote job, linked with the Condor library, exchanging messages with the shadow process on the local machine]

Remote job:
    ...
    fd = open(args);
    ...

Condor library (linked into the job):
    int open(args) {
        send(open, args);      /* 1: forward the call to the shadow */
        receive(result);       /* 3: result comes back */
        store(args, result);
        return result;
    }

Shadow process on the local machine:
    receive(open, args);       /* 2: perform the call locally */
    result = syscall(open, args);
    send(result);
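
Fleshing out the library side of the diagram, a hedged C sketch of an interposed open(): send_call, recv_result, and shadow_fd are hypothetical stand-ins for Condor's marshalling code and its connection to the shadow.

    /* The job is linked so that this open() shadows the C library's version. */
    extern int shadow_fd;                               /* socket to the shadow */
    int  send_call(int fd, const char *name, const char *path, int flags);
    long recv_result(int fd);

    int open(const char *path, int flags, ...)
    {
        send_call(shadow_fd, "open", path, flags);   /* (1) ship the call         */
        long result = recv_result(shadow_fd);        /* (3) shadow's return value */

        /* The library also records (path, flags, result) so the file can be
         * reopened and repositioned after a checkpoint restart. */
        return (int)result;
    }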

18 Limitations
- File operations must be idempotent
  - Example: checkpoint, open a file for append, write to it, then roll back to the previous checkpoint; on re-execution the append is repeated, so a non-idempotent operation leaves duplicated data
- Some functionality is limited
  - No socket communication
  - Only single-process jobs
  - Some signals are reserved for Condor's use (SIGUSR2, SIGTSTP)
  - No interprocess communication (pipes, semaphores, or shared memory)
  - Alarms, timers, and sleeping are not allowed
  - Multiple kernel threads are not allowed (user-level threads are OK)
  - Memory-mapped files are not allowed
  - File locks are not retained between checkpoints
  - Network communication must be brief; communication defers checkpoints

19 When Users Return
[Flowchart: when the user returns, suspend the job and wait 5 minutes; if the user is no longer using the machine, the job resumes execution; if the user is still using it, vacate the job (start checkpointing) and wait another 5 minutes; if the checkpoint is done by then, it migrates, otherwise the job is killed.]

20 Goals of Checkpointing/Migration
- Package up a process, including its dynamic state
- Require no special considerations on the programmer's part; done by a linked-in library
- Provide fault tolerance and reliability

21 Checkpoint Considerations
- Checkpoints must be scheduled to prevent excess checkpoint server/network traffic
  - Example: many jobs submitted as a cluster with a fixed checkpoint interval would all checkpoint at the same time
  - Solution: have the checkpoint server schedule checkpoints to avoid excessive network use
- Checkpoints use a transactional model; a checkpoint will either:
  1. succeed, or
  2. fail and roll back to the previous checkpoint
  - Only the most recent successful checkpoint is stored; a minimal commit sketch follows
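
A minimal sketch of such a commit, assuming a hypothetical write_image() helper that dumps the process state. The key point is that the previous checkpoint is replaced only after the new one is complete, so a failed checkpoint rolls back automatically.

    #include <stdio.h>
    #include <unistd.h>

    int write_image(const char *path);   /* hypothetical: returns 0 on success */

    int commit_checkpoint(const char *ckpt_path)
    {
        char tmp[4096];
        snprintf(tmp, sizeof tmp, "%s.partial", ckpt_path);

        if (write_image(tmp) != 0) {      /* failure: last good image untouched */
            unlink(tmp);
            return -1;                    /* caller rolls back to old checkpoint */
        }
        return rename(tmp, ckpt_path);    /* atomic commit of the new image */
    }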

22 Checkpointing: The Job
[Diagram of the job's address space: stack, text, data, heap, shared libraries, Condor library]
- Identify segments
  - Use the /proc interface to get segment addresses
  - Compare segment addresses to known data to determine which segment is which
    - A global variable marks the data segment
    - A static function marks the text segment
    - The stack pointer and stack end address mark the stack segments
    - All others are dynamic library text or data
- Most segments can be stored with no special action
  - The static text segment is the executable itself and does not need to be stored
- Stack information is saved by setjmp (includes registers and CPU state)
- Signal state must also be saved, for instance if a signal is pending
(A sketch of the marker and setjmp techniques follows.)
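
The marker and setjmp techniques can be sketched as below. This is an illustration of the idea, not Condor's checkpoint library, and the restart comment assumes the saved stack has already been put back in place before the longjmp.

    #include <setjmp.h>
    #include <stdio.h>

    static int  data_marker;           /* global variable: lives in the data segment */
    static void text_marker(void) {}   /* static function: lives in the text segment */

    static jmp_buf ckpt_state;         /* registers, stack pointer, CPU state */

    void show_markers(void)
    {
        /* Comparing these addresses against the ranges listed by /proc tells
         * the library which mapping is data and which is text. */
        printf("data segment near %p, text segment near %p\n",
               (void *)&data_marker, (void *)text_marker);
    }

    int take_checkpoint(void)
    {
        if (setjmp(ckpt_state) == 0)
            return 0;   /* state captured; caller now writes the segments out */
        return 1;       /* reached via longjmp on restart, once the stack is restored */
    }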

23 Restart I
[Diagram of the restored address space: data, text, heap, shared libraries, Condor library]
- A shadow process is created on the machine by the Condor daemon
- The environment is set up
  - Check whether files must use remote syscalls or a networked file system
- The shadow creates the segments
- The shadow restores the segments

24 Restart II
[Diagram: data, stack, temporary stack, text, heap, shared libraries, Condor library, kernel state, signals]
- File state must be restored
  - lseek sets the file pointers back to the correct positions
- Blocked signals must be re-sent
- The stack is restored
- Control returns to the job
  - This appears like a return from a signal handler
(A sketch of restoring file state follows.)
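
A sketch of the file-restoration step, assuming the interposed library recorded each file's path, flags, descriptor number, and offset at checkpoint time; the file_record type is hypothetical.

    #include <fcntl.h>
    #include <unistd.h>

    typedef struct {
        const char *path;
        int         flags;    /* flags used at the original open()        */
        int         fd;       /* descriptor number the job expects to see */
        off_t       offset;   /* file position at checkpoint time         */
    } file_record;

    int restore_file(const file_record *r)
    {
        int fd = open(r->path, r->flags);
        if (fd < 0)
            return -1;
        if (fd != r->fd) {                 /* hand the job back its old fd number */
            if (dup2(fd, r->fd) < 0)
                return -1;
            close(fd);
        }
        /* lseek puts the file pointer back where it was at checkpoint time. */
        return lseek(r->fd, r->offset, SEEK_SET) < 0 ? -1 : 0;
    }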

25 Flocking: Goals
- Ties groups of Condor pools into, effectively, one large pool
- No change to the CM
- Users see no difference between a job running in the flock and in their local pool
- Fault tolerance: a failure in one pool should not affect the rest of the flock
- Allows each group to maintain authority over its own machines
- Having one large pool, instead of a flock, could overload the central manager

26 Flocking
[Diagram: several Condor pools, each with its own Central Manager (CM), a World Machine (WM), and member machines, connected by networks; a possible job assignment crosses pool boundaries through the World Machines.]

27 Flocking: Limitations
- The World Machine uses the normal priority system
  - Machines can run flock jobs when local capacity is available
- The World Machine cannot be used to actually run jobs
- The World Machine can only present itself as having one set of hardware
- The World Machine can only receive one job per scheduling interval, no matter how many machines are available

28 Bibliography
- M.L. Powell and B.P. Miller, "Process Migration in DEMOS/MP", 9th Symposium on Operating Systems Principles, Bretton Woods, NH, October 1983, pp. 110-119.
- E. Zayas, "Attacking the Process Migration Bottleneck", 11th Symposium on Operating Systems Principles, Austin, TX, November 1987, pp. 13-24.
- Michael Litzkow, Miron Livny, and Matt Mutka, "Condor - A Hunter of Idle Workstations", Proceedings of the 8th International Conference of Distributed Computing Systems, pages 104-111, June 1988.
- Matt Mutka and Miron Livny, "Scheduling Remote Processing Capacity In A Workstation-Processing Bank Computing System", Proceedings of the 7th International Conference of Distributed Computing Systems, pp. 2-9, September 1987.
- Jim Basney and Miron Livny, "Deploying a High Throughput Computing Cluster", High Performance Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
- Miron Livny, Jim Basney, Rajesh Raman, and Todd Tannenbaum, "Mechanisms for High Throughput Computing", SPEEDUP Journal, Vol. 11, No. 1, June 1997.

29 Bibliography II
- Jim Basney, Miron Livny, and Todd Tannenbaum, "High Throughput Computing with Condor", HPCU News, Volume 1(2), June 1997.
- D.H.J. Epema, Miron Livny, R. van Dantzig, X. Evers, and Jim Pruyne, "A Worldwide Flock of Condors: Load Sharing among Workstation Clusters", Journal on Future Generations of Computer Systems, Volume 12, 1996.
- Scott Fields, "Hunting for Wasted Computing Power", 1993 Research Sampler, University of Wisconsin-Madison.
- Jim Basney and Miron Livny, "Managing Network Resources in Condor", Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing (HPDC9), Pittsburgh, Pennsylvania, August 2000, pp. 298-299.
- Jim Basney and Miron Livny, "Improving Goodput by Co-scheduling CPU and Network Capacity", International Journal of High Performance Computing Applications, Volume 13(3), Fall 1999.
- Rajesh Raman and Miron Livny, "High Throughput Resource Management", Chapter 13 in The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, California, 1999.
- "Matchmaking: Distributed Resource Management for High Throughput Computing", Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, July 28-31, 1998, Chicago, IL.

30 Communication & Files
- File state is saved by the overloaded library calls (syscall wrappers)
- Signal state must also be saved
- Communication channels are flushed and their contents stored
- If only one endpoint is checkpointable
  - Communication with only one checkpointed endpoint is handled through a 'switchboard' process
  - Example: a license server
  - The switchboard buffers communication until the application is restarted
  - This may be a problem if the license server expects prompt replies

31 Basic Problems
- How are jobs scheduled on remote machines?
- How are jobs moved around?
- How are issues such as portability and user impact dealt with?

32 Basic Functioning: Condor Pool
[Diagram: host machine A submits over the network to the Central Manager, which selects machines A and B among the remote machines; the job plus the Condor library is sent to the selected machine.]

33 Checkpointing: Process
- Identify segments
  - Use the /proc interface
  - Compare to known data to determine which segment is which
    - A global variable marks the data segment
    - A static function marks the text segment
    - The stack pointer and stack end address mark the stack segments
    - All others are dynamic library text or data
- Most segments can just be written out
  - Exception: the static text segment is the executable itself
- Stack information is saved by setjmp (includes registers and CPU state)

34 Additional Scheduling
- Scheduling is affected by parallel PVM applications
  - The master runs on the submit machine and is never preempted; workers are opportunistic
  - Since workers contact the master, this avoids the problem of having two mobile endpoints (workers always know where the master is)
- Jobs can be ordered in a directed acyclic graph.

