1 Checkpoint and Migration in the Condor Distributed Processing System Presentation by James Nugent CS739 University of Wisconsin Madison.


1 Checkpoint and Migration in the Condor Distributed Processing System Presentation by James Nugent CS739 University of Wisconsin Madison

2 Primary Sources  J. Basney and M. Livny, "Deploying a High Throughput Computing Cluster", in High Performance Cluster Computing, Rajkumar Buyya, Editor, Prentice Hall PTR, 1999.  Condor Web page.  M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System", Computer Sciences Technical Report #1346, University of Wisconsin-Madison, April 1997.  Michael Litzkow, Miron Livny, and Matt Mutka, "Condor - A Hunter of Idle Workstations", Proceedings of the 8th International Conference on Distributed Computing Systems, June 1988.

3 Outline  Goals of Condor  Overview of Condor  Scheduling  Remote Execution  Checkpointing  Flocking

4 Goals  Opportunistically use idle cycles for work. Remote machines have intermittent availability and lower reliability, since cluster processing is not the primary function of the machines used.  Portability: only requires re-linking (non-relinked applications cannot be checkpointed).  Remote machines are not a part of the system.

5 Voluntary  People should see a benefit from using it, or at least no penalty for letting others use their machine. No security problems from using Condor, or from Condor using your machine. Low impact on the remote machine's performance.

6 Implementation Challenges  Run serial programs at remote locations without any source code changes to the program; the program only needs to be re-linked.  Programs may have to move to a different machine: checkpointing allows process migration and provides reliability.  Provide transparent access to remote resources.

7 Basic Functioning - Condor Pool  [Diagram: a central manager and machines A, B, C, and D in a Condor pool. The central manager selects available machines (A and B, later C) to receive jobs; an unavailable machine is skipped.]

8 Basic Execution Model  [Diagram: shadow process on the source machine; Condor daemon and job on the remote machine.] 1) Create the shadow process. 2) Send the job to the remote machine. 3) The remote machine runs the job. 4) The job sends remote system calls to the shadow process on the source machine.

9 Scheduling  [Flowchart: starting from Start, the CM selects a machine. If that machine has jobs to run and the next job can find a remote machine to run on, the job runs; otherwise the CM moves on and the cycle repeats.]

10 Central Manager—Select a machine to have a job run  Priority(Machine) = Base Priority - (Users) + (# Running Jobs). (Users) is the number of users who have jobs local to this machine, i.e. their jobs started on this machine. Lower values are a higher priority. Ties are broken randomly. The CM periodically receives info from each machine: the number of remote jobs the machine is running and the number of remote jobs the machine wants to run. This is the only state about machines the CM stores, which reduces stored state and what machines must communicate to the CM.
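As a concrete illustration of the formula above, here is a minimal C sketch, assuming hypothetical struct fields and example values (this is not Condor's code), of how the central manager might rank machines; lower values mean higher priority:

    #include <stdio.h>

    struct machine {
        const char *name;
        int base_priority;   /* administrative base value                       */
        int local_users;     /* users whose jobs were submitted on this machine */
        int running_jobs;    /* remote jobs this machine is already running     */
    };

    /* Priority(Machine) = Base Priority - (Users) + (# Running Jobs) */
    static int machine_priority(const struct machine *m)
    {
        return m->base_priority - m->local_users + m->running_jobs;
    }

    int main(void)
    {
        struct machine a = { "A", 0, 3, 0 };   /* three local users waiting, running nothing */
        struct machine b = { "B", 0, 1, 2 };   /* one local user, already running two jobs   */

        /* A scores -3 and B scores +1, so A is served first; ties break randomly. */
        printf("%s=%d %s=%d\n", a.name, machine_priority(&a),
                                b.name, machine_priority(&b));
        return 0;
    }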

11 Fair Scheduling  If user A submits 4 jobs and user B submits 1, then: User A's jobs wait 1/5 of the time. User B's job must wait 4/5 of the time. This assumes there is insufficient capacity to schedule all jobs immediately. User B contributes as much computing power to the system as User A.

12 Up-Down scheduling—Workstation priorities  Each machine has a scheduling index (SI). Lower indexes have higher priority. The index goes up when a machine is granted capacity (priority decreases). The index goes down when a machine is denied access to capacity (priority increases). The index also decreases slowly over time (priority increases over time).
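A minimal sketch of the up-down rule, assuming simple additive adjustments (the constants and function names are illustrative, not Condor's actual values):

    /* Scheduling index per machine: lower value = higher priority. */
    struct sched_index { double value; };

    /* Granted capacity: index rises, so the machine's priority drops. */
    void on_capacity_granted(struct sched_index *si) { si->value += 1.0; }

    /* Denied capacity: index falls, so the machine's priority improves. */
    void on_capacity_denied(struct sched_index *si) { si->value -= 1.0; }

    /* Called periodically: the index drifts down, so priority slowly recovers. */
    void on_timer_tick(struct sched_index *si) { si->value -= 0.1; }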

13 Host Machines  Priority(Job) = (UserP * 10) + Running(1|0) + (Wait Time / …). Running jobs consume disk space.  Jobs are ordered first by user priority, then by whether they are running or not, and finally by wait time.

14 ClassAds—Finding Servers for Jobs  ClassAds are tuples of attribute names and expressions, e.g. (Requirements = Other.Memory > 32).  Two ClassAds match if their Requirements attributes evaluate to true in the context of each other.  Example. Job: (Requirements = Other.Memory > 32 && Other.Arch == "Alpha"). Machine: (Memory = 64), (Arch = "Alpha"). These two ClassAds would match.
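To make the symmetric match test concrete, here is a small C sketch in which each ad's Requirements expression is modeled as a predicate over the other ad. Real ClassAds are evaluated by a full expression interpreter; the structs and predicate functions below are simplified stand-ins, not Condor's API.

    #include <stdbool.h>
    #include <string.h>

    struct classad {
        int         memory;                       /* MB, as in the slide's example */
        const char *arch;                         /* e.g. "Alpha"                  */
        bool      (*requirements)(const struct classad *other);
    };

    /* Job ad: Requirements = Other.Memory > 32 && Other.Arch == "Alpha" */
    static bool job_reqs(const struct classad *other)
    {
        return other->memory > 32 && strcmp(other->arch, "Alpha") == 0;
    }

    /* Machine ad: accepts any job in this simplified example. */
    static bool machine_reqs(const struct classad *other)
    {
        (void)other;
        return true;
    }

    /* Two ads match when each one's Requirements holds in the context of the other. */
    static bool ads_match(const struct classad *a, const struct classad *b)
    {
        return a->requirements(b) && b->requirements(a);
    }

    int main(void)
    {
        struct classad job     = { 0,  NULL,    job_reqs     };
        struct classad machine = { 64, "Alpha", machine_reqs };
        return ads_match(&job, &machine) ? 0 : 1;   /* matches, as on the slide */
    }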

15 Remote Execution - Goals  Execute a program on a remote machine as if it were executing on the local machine.  The machines in a pool should not need to be identically configured in order to run remote jobs.  The CM should only be involved in starting and stopping jobs, not while they are running.

16 Resource Access  Remote resource access: a shadow process runs on the local (submitting) machine, and the Condor library sends system calls back to it. Thus there is a penalty for submitting from one's own machine with this mechanism, plus extra network traffic. Users do not need an account on the remote machine; jobs execute as 'nobody'. This also compensates for differences between systems.  AFS, NFS: the shadow checks whether a file system can be used (i.e. is the file reachable? It looks up local mappings). If the file can be reached via the file system, that is used instead of remote system calls, for performance reasons.

17 Interposition - Sending Syscalls  [Diagram: the remote job, the Condor library linked into it, and the shadow process on the local machine.] 1) Job code: ... fd = open(args); ... 2) Condor library stub: int open(args) { send(open, args); receive(result); store(args, result); return result; } 3) Shadow process: receive(open, args); result = syscall(open, args); send(result);
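A self-contained C sketch of the same idea, using hypothetical names (condor_open, shadow_handle_open). In Condor the two halves run in different processes and communicate over the network; here they are wired together with a direct call so the example compiles and runs on its own.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Shadow side: performs the real system call on the submitting machine. */
    static int shadow_handle_open(const char *path, int flags)
    {
        return open(path, flags);               /* the real open(2) happens here */
    }

    /* Library side: the stub the re-linked job calls instead of open(2).
     * A real implementation would marshal the arguments, send them to the
     * shadow, wait for the reply, and record the call for checkpoint replay. */
    static int condor_open(const char *path, int flags)
    {
        return shadow_handle_open(path, flags); /* stands in for send/receive */
    }

    int main(void)
    {
        int fd = condor_open("/etc/hostname", O_RDONLY);  /* job code */
        if (fd >= 0) {
            printf("opened fd %d via the interposed call\n", fd);
            close(fd);
        }
        return 0;
    }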

18 Limitations  File operations must be idempotent. Example: checkpoint, open a file for append, write to the file, then roll back to the previous checkpoint; the append will be replayed on restart.  Some functionality is limited: no socket communication; only single-process jobs; some signals are reserved for Condor use (SIGUSR2, SIGTSTP); no interprocess communication (pipes, semaphores, or shared memory); alarms, timers, and sleeping are not allowed; multiple kernel threads are not allowed (user-level threads are OK); memory-mapped files are not allowed; file locks are not retained between checkpoints; network communication must be brief, since communication defers checkpoints.

19 When Users Return  [Flowchart: when the owner returns, the job is suspended. After waiting 5 minutes, if the user is still using the machine the job is vacated (checkpointing starts); otherwise the job resumes execution. After another 5 minutes, if the checkpoint is not yet done the job is killed; otherwise the checkpoint migrates.]

20 Goals of Checkpointing/Migration  Packages up a process, including dynamic state.  Does not require any special considerations on the programmer's part; it is done by a linked-in library.  Provides fault tolerance and reliability.

21 Checkpoint Considerations  Must schedule checkpoints to prevent excess checkpoint server/network traffic. Example: Many jobs submitted in a cluster, checkpoint interval is fixed, so they would all checkpoint at the same time. Solution: Have checkpoint server schedule checkpoints to avoid excessive network use.  Checkpoints use a transactional model A checkpoint will either: 1. Succeed 2. Fail and roll back to the previous checkpoint Only the most recent successful checkpoint is stored.
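One common way to get the all-or-nothing behavior described above is to write the new image to a temporary file and atomically rename it over the old checkpoint. The C sketch below illustrates that pattern under stated assumptions; the file layout and the write_image() helper are hypothetical, not Condor's actual checkpoint format.

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper that writes the process image; returns 0 on success. */
    static int write_image(FILE *out)
    {
        const char fake_image[] = "segments + registers + signal state";
        return fwrite(fake_image, 1, sizeof fake_image, out) == sizeof fake_image ? 0 : -1;
    }

    /* Commit a checkpoint atomically: any failure leaves the previous one intact. */
    int checkpoint_commit(const char *ckpt_path)
    {
        char tmp_path[4096];
        snprintf(tmp_path, sizeof tmp_path, "%s.tmp", ckpt_path);

        FILE *out = fopen(tmp_path, "wb");
        if (!out)
            return -1;

        if (write_image(out) != 0 || fflush(out) != 0 || fsync(fileno(out)) != 0) {
            fclose(out);
            unlink(tmp_path);               /* roll back: the old checkpoint is still valid */
            return -1;
        }
        fclose(out);

        /* rename(2) is atomic, so only the newest successful checkpoint survives. */
        return rename(tmp_path, ckpt_path);
    }

    int main(void)
    {
        return checkpoint_commit("job.ckpt") == 0 ? 0 : 1;
    }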

22 Checkpointing—The Job Stack  [Process image layout: text, data, heap, shared libraries, Condor library, stack.]  Identify segments: use the /proc interface to get segment addresses, and compare the addresses to known data to determine which segment is which.  A global variable marks the data segment.  A static function marks the text segment.  The stack pointer and stack ending address mark the stack segments.  All others are dynamic library text or data.  Most segments can be stored with no special actions needed.  The static text segment is the executable and does not need to be stored.  Stack info is saved by setjmp (includes registers and CPU state).  Signal state must also be saved, for instance if a signal is pending.
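A minimal C sketch of the setjmp-based capture and the segment markers mentioned above. The variable and function names are illustrative, and the /proc walk and the actual image writing are elided; this is a sketch of the technique, not Condor's checkpoint library.

    #include <setjmp.h>
    #include <stdio.h>

    static int  ckpt_marker_data = 1;      /* global variable marking the data segment */
    static void ckpt_marker_text(void) {}  /* static function marking the text segment */

    jmp_buf ckpt_context;                  /* setjmp stores registers and stack state here */

    /* Returns 0 when a checkpoint is taken, 1 when execution resumes from one. */
    int do_checkpoint(void)
    {
        if (setjmp(ckpt_context) != 0)
            return 1;                      /* arrived here via longjmp during restart */

        /* A real implementation would now walk /proc to find the data, heap,
         * stack, and shared-library segments (using the markers above to tell
         * them apart) and write them, together with ckpt_context and any
         * pending-signal state, to the checkpoint file. */
        ckpt_marker_text();                /* keep the text-segment marker referenced */
        printf("data segment marker lives at %p\n", (void *)&ckpt_marker_data);
        return 0;
    }

    int main(void)
    {
        if (do_checkpoint() == 0)
            printf("continuing after taking a checkpoint\n");
        else
            printf("resumed from a checkpoint\n");
        return 0;
    }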

23 Restart—I  The shadow process is created on the machine by the Condor daemon. It sets up the environment and checks whether files must use remote syscalls or a networked FS. The shadow creates the segments, then restores them.  [Diagram: data, text, heap, shared libraries, and Condor library segments being restored.]

24 Restart—II  File state must be restored: lseek sets file pointers to the correct spot. Blocked signals must be re-sent. The stack is restored and control returns to the job; this appears like a return from a signal handler.  [Diagram: data, stack, text, temporary stack, heap, shared libraries, Condor library, kernel state, and signals being restored.]
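A small C sketch of the file-restoration and resume steps, continuing the checkpoint sketch above (ckpt_context refers to the buffer saved there). The saved_file table and helper names are illustrative assumptions, not Condor's data structures.

    #include <fcntl.h>
    #include <setjmp.h>
    #include <unistd.h>

    extern jmp_buf ckpt_context;           /* filled in by setjmp at checkpoint time */

    struct saved_file {
        const char *path;
        int         fd;                    /* descriptor number the job expects      */
        off_t       offset;                /* file position recorded at checkpoint   */
    };

    /* Reopen each file and use lseek to put its pointer back where it was. */
    void restore_files(const struct saved_file *files, int n)
    {
        for (int i = 0; i < n; i++) {
            int fd = open(files[i].path, O_RDWR);   /* or via remote syscalls */
            if (fd < 0)
                continue;
            if (fd != files[i].fd) {                /* move it to the expected number */
                dup2(fd, files[i].fd);
                close(fd);
            }
            lseek(files[i].fd, files[i].offset, SEEK_SET);
        }
    }

    /* Re-raise any pending signals here, then jump back into the saved context;
     * to the job this looks like returning from a signal handler. */
    void resume_job(void)
    {
        longjmp(ckpt_context, 1);
    }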

25 Flocking - Goals  Ties groups of Condor pools into, effectively, one large pool.  No change to the CM.  Users see no difference whether a job runs in the flock or in their local pool.  Fault tolerance: a failure in one pool should not affect the rest of the flock.  Allows each group to maintain authority over their own machines.  Having one large pool, instead of a flock, could overload the central manager.

26 Flocking  [Diagram: several Condor pools connected by networks, each with a central manager (CM), a world machine (WM), and member machines (M); a possible job assignment crosses pool boundaries.]

27 Flocking--Limitations  The world machine uses the normal priority system.  Machines can run jobs on the flock when local space is available.  The world machine cannot be used to actually run jobs.  The world machine can only present itself as having one set of hardware.  The world machine can only receive one job per scheduling interval, no matter how many machines are available.

28 Bibliography  M.L. Powell and B.P. Miller, "Process Migration in DEMOS/MP", 9th Symposium on Operating Systems Principles, Bretton Woods, NH, October 1983.  E. Zayas, "Attacking the Process Migration Bottleneck", 11th Symposium on Operating Systems Principles, Austin, TX, November 1987.  Michael Litzkow, Miron Livny, and Matt Mutka, "Condor - A Hunter of Idle Workstations", Proceedings of the 8th International Conference on Distributed Computing Systems, June 1988.  Matt Mutka and Miron Livny, "Scheduling Remote Processing Capacity In A Workstation-Processing Bank Computing System", Proceedings of the 7th International Conference on Distributed Computing Systems, pp. 2-9, September 1987.  Jim Basney and Miron Livny, "Deploying a High Throughput Computing Cluster", High Performance Cluster Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, 1999.  Miron Livny, Jim Basney, Rajesh Raman, and Todd Tannenbaum, "Mechanisms for High Throughput Computing", SPEEDUP Journal, Vol. 11, No. 1, June 1997.

29 Bibliography II  Jim Basney, Miron Livny, and Todd Tannenbaum, "High Throughput Computing with Condor", HPCU news, Volume 1(2), June 1997.  D. H. J. Epema, Miron Livny, R. van Dantzig, X. Evers, and Jim Pruyne, "A Worldwide Flock of Condors: Load Sharing among Workstation Clusters", Journal on Future Generations of Computer Systems, Volume 12, 1996.  Scott Fields, "Hunting for Wasted Computing Power", 1993 Research Sampler, University of Wisconsin-Madison.  Jim Basney and Miron Livny, "Managing Network Resources in Condor", Proceedings of the Ninth IEEE Symposium on High Performance Distributed Computing (HPDC9), Pittsburgh, Pennsylvania, August 2000.  Jim Basney and Miron Livny, "Improving Goodput by Co-scheduling CPU and Network Capacity", International Journal of High Performance Computing Applications, Volume 13(3), Fall 1999.  Rajesh Raman and Miron Livny, "High Throughput Resource Management", Chapter 13 in The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, California.  Rajesh Raman, Miron Livny, and Marvin Solomon, "Matchmaking: Distributed Resource Management for High Throughput Computing", Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, July 28-31, 1998, Chicago, IL.

30 Communication & Files  File state is saved by the overloaded library calls (syscall wrappers).  Signal state must also be saved.  Communication channels are flushed and stored.  If only one endpoint is checkpointable, communication is handled through a 'switchboard process'. Example: a license server. The switchboard buffers communication until the app is restarted. This may be a problem if the license server expects prompt replies.

31 Basic Problems  How are jobs scheduled on remote machines?  How are jobs moved around?  How are issues such as portability and user impact dealt with?

32 Basic Functioning - Condor Pool  [Diagram: a host machine, the central manager, and remote machines connected by a network; the central manager selects machines A and B, and the job plus the Condor library is sent over the network.]

33 Checkpointing--Process  Identify segments via the /proc interface; compare to known data to determine which segment is which.  A global variable marks the data segment.  A static function marks the text segment.  The stack pointer and stack ending address mark the stack segments.  All others are dynamic library text or data.  Most segments can just be written; the exception is the static text segment, which is the executable.  Stack info is saved by setjmp (includes registers and CPU state).

34 Additional Scheduling  Scheduling is affected by parallel PVM apps (master on the submit machine, never preempted; opportunistic workers). Since the workers contact the master, this avoids the problem of having two mobile endpoints (the workers know where the master is).  Jobs can be ordered in a directed acyclic graph.