Motivations
In a locally distributed system, there is a good chance that several computers are heavily loaded while others are idle or lightly loaded. If we can move jobs around (in other words, distribute the load more evenly), the overall performance of the system can be improved.
Motivations (cont.)
A distributed scheduler is a resource-management component of a distributed operating system that judiciously and transparently redistributes the system's load among its computers in order to maximize overall performance.
Issues In Load Distributing
Load estimation:
- Resource queue lengths
- CPU utilization
Issues In Load Distributing (cont.)
Load distributing algorithms
Basic function: transfer load (tasks) from heavily loaded computers to idle or lightly loaded ones. They can be characterized as:
- Static: decisions are hard-wired into the algorithm using a priori knowledge of the system.
- Dynamic: use system state information (the loads at nodes), at least in part.
- Adaptive: dynamically change the parameters of the algorithm to suit the changing system state.
Issues In Load Distributing (cont.)
Load balancing vs. load sharing
Load sharing algorithms strive to reduce the likelihood of an unshared state (a state in which one computer lies idle while tasks contend for service at another) by transferring tasks to lightly loaded nodes. Load balancing algorithms go a step further by attempting to equalize the loads at all computers.
Issues In Load Distributing (cont.)
Preemptive vs. nonpreemptive transfers
- Preemptive task transfers involve the transfer of a task that has been partially executed.
- Nonpreemptive task transfers involve the transfer of a task that has not yet begun execution.
Components Of A Load Distributing Algorithm
Transfer policy
Determines when a node needs to send tasks to other nodes or can receive tasks from other nodes. Typically a threshold policy: a node becomes a sender when its load exceeds a threshold and a receiver when its load falls below one.
Components Of A Load Distributing Algorithm (cont.)
Selection policy
Determines which task(s) to transfer:
- Newly originated tasks that have caused the node to become a sender by pushing its load above the threshold
- Tasks whose estimated average execution time exceeds a threshold
- Tasks whose response time will improve upon transfer
- The overhead incurred by the transfer should be minimal
- The number of location-dependent system calls made by the selected task should be minimal
Components Of A Load Distributing Algorithm (cont.)
Location policy
Finds suitable nodes (senders or receivers) for load sharing.
Information policy
Decides when information about the states of other nodes should be collected, where it should be collected from, and what information should be collected:
- Demand-driven: a node collects the state of other nodes only when it becomes either a sender or a receiver.
- Periodic: nodes exchange load information periodically.
- State-change-driven: nodes disseminate state information whenever their state changes by a certain degree.
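The three information policies differ mainly in what triggers dissemination. A minimal sketch of the state-change-driven variant (the class, attribute names, and the degree of 2 are illustrative assumptions, not from the slides):

```python
class LoadMonitor:
    """State-change-driven information policy: disseminate the load only
    when it has drifted from the last broadcast value by a set degree."""

    def __init__(self, degree=2):
        self.load = 0
        self.last_broadcast = 0
        self.degree = degree   # change required before disseminating
        self.sent = []         # record of disseminated load values

    def update_load(self, new_load):
        self.load = new_load
        if abs(self.load - self.last_broadcast) >= self.degree:
            self.sent.append(self.load)      # "broadcast" the new state
            self.last_broadcast = self.load

monitor = LoadMonitor(degree=2)
for load in [1, 2, 3, 4, 5]:
    monitor.update_load(load)
# only the loads 2 and 4 were disseminated; smaller changes were absorbed
```

A demand-driven policy would instead collect state only on becoming a sender or receiver, trading message traffic for staleness.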
Stability
- The queuing-theoretic perspective: a system is termed unstable if the long-term arrival rate of work (including the overhead added by load distribution itself) exceeds the rate at which the system can perform work. A load distributing algorithm is effective under a given set of conditions if it improves performance relative to a system not using load distribution.
- The algorithmic perspective: an algorithm is unstable if it can perform fruitless actions indefinitely with nonzero probability (e.g., nodes endlessly passing a task among themselves).
Sender-Initiated Algorithms
Load distributing activity is initiated by an overloaded node (sender) that attempts to send a task to an underloaded node (receiver).
- Transfer policy: a threshold policy based on CPU queue length.
- Selection policy: considers only newly arrived tasks for transfer.
- Location policy: Random, Threshold, or Shortest.
Sender-Initiated Algorithms (cont.)
Random: a task is simply transferred to a node selected at random, with no information exchange between the nodes to aid decision making.
Problem: useless task transfers can occur when a task is sent to a node that is already heavily loaded.
Sender-Initiated Algorithms (cont.)
Threshold: poll a node (selected at random) to determine whether it is a suitable receiver. If so, the task is transferred to the selected node, which must execute the task regardless of its state when the task arrives.
Sender-Initiated Algorithms (cont.)
Shortest: choose the best receiver for a task. A number of nodes (up to PollLimit) are selected at random and polled to determine their queue lengths; the node with the shortest queue is chosen as the destination for the transfer, unless its queue length >= T. The destination node must execute the task regardless of its queue length at the time the transferred task arrives.
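A sketch of the Shortest location policy under these rules (the function name and the values of T and PollLimit are assumptions):

```python
import random

T = 5            # transfer threshold on queue length (assumed value)
POLL_LIMIT = 3   # number of nodes polled at random (assumed value)

def shortest_location_policy(queue_lengths, poll_limit=POLL_LIMIT, threshold=T):
    """Poll up to poll_limit randomly chosen nodes and return the index
    of the one with the shortest CPU queue, or None if even the best
    candidate's queue length is at or above the threshold."""
    polled = random.sample(range(len(queue_lengths)), poll_limit)
    best = min(polled, key=lambda n: queue_lengths[n])
    if queue_lengths[best] >= threshold:
        return None   # no suitable receiver among the polled nodes
    return best

# With three nodes all polled, the idle node (index 2) is chosen:
shortest_location_policy([9, 9, 0], poll_limit=3)
```

Random and Threshold are degenerate forms of the same loop: Random skips polling entirely, and Threshold stops at the first polled node below the threshold.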
Sender-Initiated Algorithms (cont.)
Information policy: demand-driven.
Stability: these algorithms cause system instability at high system loads, where no node is likely to be lightly loaded, so the probability that a sender succeeds in finding a receiver is very low; the polling itself then consumes cycles that the overloaded nodes cannot spare.
Receiver-Initiated Algorithms
Load distributing activity is initiated by an underloaded node (receiver) that tries to obtain a task from an overloaded node (sender).
- Transfer policy: a threshold policy based on CPU queue length.
- Selection policy: any of the approaches discussed earlier.
- Location policy: the receiver polls randomly selected nodes until it finds one whose queue length exceeds the threshold (a sender) or until PollLimit polls have been made; if no sender is found, the receiver waits and retries later.
Receiver-Initiated Algorithms (cont.)
Information policy: demand-driven.
Stability: receiver-initiated algorithms do not cause system instability.
A drawback: most transfers are preemptive, because by the time a receiver's poll arrives, newly arrived tasks at the sender have typically already begun execution.
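The receiver-side polling described above can be sketched as follows (the function name, PollLimit of 3, and threshold value are assumptions):

```python
import random

T = 2  # queue length above which a node is treated as a sender (assumed)

def receiver_initiated_poll(my_id, queue_lengths, poll_limit=3, threshold=T):
    """A receiver polls up to poll_limit randomly selected nodes and
    obtains a task from the first polled node whose queue length
    exceeds the threshold; returns that sender's index, or None."""
    others = [n for n in range(len(queue_lengths)) if n != my_id]
    for node in random.sample(others, min(poll_limit, len(others))):
        if queue_lengths[node] > threshold:
            return node   # found a sender; a (likely preemptive) transfer follows
    return None
```

Compare this with the sender-initiated loop: the roles of poller and polled are simply swapped, which is why the two schemes have mirrored stability behavior.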
Comparison of Sender-Initiated and Receiver-Initiated Algorithms
Symmetrically Initiated Algorithms
Both senders search for receivers and receivers search for senders. These algorithms combine the advantages and disadvantages of both sender-initiated and receiver-initiated algorithms. Example: the above-average algorithm.
The Above-Average Algorithm
Proposed by Krueger and Finkel. It tries to maintain the load at each node within an acceptable range of the system average.
Transfer Policy
Uses two adaptive thresholds, equidistant from the node's estimate of the average load across all nodes (e.g., if the estimated average load is 2, the lower threshold is 1 and the upper threshold is 3). A node whose load exceeds the upper threshold is a sender, and a node whose load is below the lower threshold is a receiver. Nodes whose loads lie between the thresholds are within the acceptable range, so they are neither senders nor receivers.
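The classification implied by these thresholds can be sketched as follows (the function name and the spread of 1, matching the slide's example, are assumptions):

```python
def classify(load, estimated_average, spread=1):
    """Above-average transfer policy: the two thresholds sit equidistant
    from the node's estimate of the system-wide average load."""
    lower, upper = estimated_average - spread, estimated_average + spread
    if load > upper:
        return "sender"
    if load < lower:
        return "receiver"
    return "ok"   # within the acceptable range

# Matching the slide's example (average 2, thresholds 1 and 3):
classify(4, 2)   # a sender
classify(0, 2)   # a receiver
classify(2, 2)   # neither
```

Because the thresholds track the estimated average rather than fixed constants, the classification adapts as ChangeAverage messages shift each node's estimate.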
Location Policy
The location policy has two components:
- A sender-initiated component
- A receiver-initiated component
Sender-initiated component
On becoming a sender, a node broadcasts a TooHigh message, sets a TooHigh timeout, and listens for Accept messages. A receiver that gets a TooHigh message cancels its TooLow timeout, sends an Accept message, increases its load value in anticipation (Load++), and sets an AwaitingTask timeout; if the awaited task does not arrive before that timeout expires, the load value is decreased again. On receiving an Accept message, the node transfers a task if it is still a sender. If no Accept arrives before the TooHigh timeout expires, the sender broadcasts a ChangeAverage message to raise the average load estimate at the other nodes.
Receiver-initiated component
A node, on becoming a receiver, broadcasts a TooLow message, sets a TooLow timeout, and starts listening for TooHigh messages. If a TooHigh message arrives, the receiver performs the same actions it does under sender-initiated negotiation. If the TooLow timeout expires before any TooHigh message is received, the receiver broadcasts a ChangeAverage message to decrease the average load estimate at the other nodes.
Selection and Information Policy
- Selection policy: the algorithm can use any of the approaches discussed earlier.
- Information policy: demand-driven.
Adaptive Algorithms
- A stable symmetrically initiated algorithm
- A stable sender-initiated algorithm
A Stable Symmetrically Initiated Algorithm
The instability in the previous algorithms stems from indiscriminate polling by the sender's negotiation component. This algorithm instead utilizes the information gathered during polling to classify the nodes in the system as Sender/overloaded, Receiver/underloaded, or OK. The knowledge about the state of nodes is maintained by a data structure at each node: a senders list, a receivers list, and an OK list. Initially, each node assumes that every other node is a receiver.
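The per-node lists might be kept as follows (a toy sketch; the names and the move-to-head behavior on fresh information are assumptions):

```python
class PeerLists:
    """Each node's classification of all other nodes. Initially every
    other node is assumed to be a receiver."""

    def __init__(self, peer_ids):
        self.senders = []
        self.ok = []
        self.receivers = list(peer_ids)

    def reclassify(self, node, state):
        # remove the node from whichever list currently holds it ...
        for lst in (self.senders, self.ok, self.receivers):
            if node in lst:
                lst.remove(node)
        # ... and put it at the head of the list matching the fresh report
        {"sender": self.senders,
         "ok": self.ok,
         "receiver": self.receivers}[state].insert(0, node)

lists = PeerLists([1, 2, 3])
lists.reclassify(2, "sender")   # node 2 reported itself overloaded
```

Keeping freshly reported nodes at the heads of the lists means polling starts with the information most likely to still be accurate.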
Transfer policy
A threshold policy where decisions are based on CPU queue length, triggered when a new task originates or when a task departs. It uses two threshold values: a lower threshold (LT) and an upper threshold (UT). A node is a sender if its queue length > UT, a receiver if its queue length < LT, and OK if LT <= queue length <= UT.
Sender-initiated component
When a node becomes a sender, it polls the node at the head of its receivers list. The polled node removes the sender's ID from whichever of its lists currently holds it, puts the ID at the head of its own senders list, and replies with its status (receiver, sender, or OK). If the polled node is still a receiver, the task is transferred to it; otherwise the sender moves the polled node's ID from its receivers list to the appropriate list (senders or OK) and polls the next node. Polling stops when a receiver is found, the receivers list is exhausted, or the number of polls reaches PollLimit.
Receiver-initiated component
When a node becomes a receiver, it polls nodes in the order it judges most likely to yield a sender: the senders list first, then the OK list, then the receivers list. The polled node puts the receiver's ID at the head of its own receivers list. If the polled node is a sender, it transfers a task and informs the receiver of its status after the transfer; otherwise it simply informs the receiver of its current status (receiver or OK). The receiver updates its lists accordingly and polls the next node, up to PollLimit polls.
Selection and Information Policy
- Selection policy: the sender-initiated component considers only newly arrived tasks for transfer; the receiver-initiated component can use any of the approaches discussed earlier.
- Information policy: demand-driven.
A Stable Sender-Initiated Algorithm
Two desirable properties:
- It does not cause instability.
- Load sharing is due to nonpreemptive transfers (which are cheaper) only.
This algorithm uses the sender-initiated load sharing component of the stable symmetrically initiated algorithm as is, but has a modified receiver-initiated component that attracts future nonpreemptive task transfers from sender nodes.
A Stable Sender-Initiated Algorithm (cont.)
The data structure (at each node) of the stable symmetrically initiated algorithm is augmented by an array called the statevector. Each node uses the statevector to keep track of which list (senders, receivers, or OK) it belongs to at every other node in the system. When a sender polls a selected node, the sender's statevector is updated to reflect that the sender now belongs to the senders list at the selected node; the polled node updates its statevector, based on the reply it sent to the sender, to reflect which list it will belong to at the sender.
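The statevector bookkeeping on each poll and reply might look like this (a sketch; all class and method names are assumptions):

```python
class Host:
    """Tracks which list this node believes it occupies at every peer,
    updated as a side effect of polls and replies, so no extra
    messages are needed to keep the picture current."""

    def __init__(self, peers):
        # mirror the rule that initially every node is assumed a receiver
        self.statevector = {p: "receiver" for p in peers}

    def on_poll_sent(self, polled_node):
        # polling a node places us on its senders list
        self.statevector[polled_node] = "sender"

    def on_reply_sent(self, polling_node, my_state):
        # our reply tells the poller which list we now belong to there
        self.statevector[polling_node] = my_state

a = Host(["B", "C"])
a.on_poll_sent("B")   # A polled B, so A now sits on B's senders list
```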
A Stable Sender-Initiated Algorithm (cont.)
The receiver-initiated component is replaced by the following protocol: when a node becomes a receiver, it informs only those nodes that are misinformed about its current state, i.e., the nodes whose receivers lists (according to the statevector) do not contain the receiver's ID. The statevector at the receiver is then updated to reflect that it now belongs to the receivers list at all the nodes it just informed. This technique avoids having receivers broadcast messages to tell every other node that they are receivers. There are no preemptive transfers of partly executed tasks in this algorithm.
Introduction
Task migration is the movement of an executing task from one host processor (the source) in a distributed computing system to another (the destination). Task placement is the selection of a host for a new task and the creation of the task on that host.
Benefits of task migration
- Load balancing: improves performance for a distributed computing system overall, or for a distributed application, by spreading the load more evenly over a set of hosts.
- Reduction in communication overhead: by locating on one host a group of tasks with intensive communication among them.
Benefits of task migration (cont.)
- Resource access: not all resources are available across the network; a task may need to migrate in order to access a special device, or to satisfy a need for a large amount of physical memory.
- Fault tolerance: allows long-running processes to survive the planned shutdown or failure of a host.
Steps involved in task migration
- Suspending (freezing) the task on the source
- Extracting and transmitting the state of the task to the destination
- Reconstructing the state on the destination
- Deleting the task on the source and resuming the task's execution on the destination
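The four steps above can be sketched with toy dict-based hosts (purely illustrative; real mechanisms move address spaces and kernel state, not dictionaries):

```python
import copy

def migrate(task, source, destination):
    """Freeze on the source, ship the state, rebuild it on the
    destination, then delete the original and resume remotely."""
    task["frozen"] = True                 # 1. suspend (freeze) on source
    state = copy.deepcopy(task)           # 2. extract and "transmit" state
    state["frozen"] = False               # 3. reconstruct on destination
    destination[state["name"]] = state
    del source[task["name"]]              # 4. delete on source; resume on destination
    return state

source_host = {"t1": {"name": "t1", "frozen": False, "pc": 42}}
dest_host = {}
migrate(source_host["t1"], source_host, dest_host)
```

Note the ordering: the task is frozen before its state is captured, so the snapshot is consistent, and it is deleted from the source only after the destination copy exists.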
Issues in task migration
- State transfer
- Location transparency
- Structure of a migration mechanism
State transfer
The cost to support remote execution:
- Freezing the task (for as little time as possible)
- Obtaining and transferring the state
- Unfreezing the task
Residual dependencies refer to the amount of resources the former host of a migrated task continues to dedicate to servicing requests from the migrated task. They are undesirable for three reasons: reliability, performance, and complexity.
State transfer mechanisms
- Precopying the state: the bulk of the task state is copied to the new host before the task is frozen, so the freeze need cover only the state modified during precopying.
- Location-transparent file access mechanisms
- Copy-on-reference: copy only the state that the migrated task actually needs for its execution, as it references it.
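A toy model of precopying (the function name and the input shape are assumptions: dirty_rounds[i] is the set of pages the still-running task dirties during copy round i):

```python
def precopy(pages, dirty_rounds, round_limit=3):
    """Copy the bulk of the state while the task keeps running,
    re-copying pages dirtied in the meantime; only the final, small
    residue is transferred with the task frozen."""
    transferred = 0
    to_copy = set(pages)
    for dirtied in dirty_rounds[:round_limit]:
        transferred += len(to_copy)   # copied while the task still runs
        to_copy = set(dirtied)        # dirtied pages must be re-sent
    transferred += len(to_copy)       # freeze, send the final residue
    return transferred

# Four pages; two pages dirtied in round 1, one in round 2:
precopy([1, 2, 3, 4], [{1, 2}, {1}])
```

The trade-off is visible in the model: precopying transfers more bytes in total (re-sent dirty pages) in exchange for a much shorter freeze, since only the last round happens with the task stopped.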
Location Transparency
Task migration should hide the locations of tasks. Location transparency in principle requires that names (process names, file names) be independent of their locations (host names), i.e., a uniform name space throughout the system.
Structure of a migration mechanism
Typically, there is interaction between the task migration mechanism, the memory management system, the interprocess communication mechanisms, and the file system. These mechanisms can be designed to be independent of one another, so that if one mechanism's protocol changes, the others need not change, and so that the migration mechanism can be turned off without interfering with the other mechanisms.