
1 Scheduling in Distributed Systems There is not really a lot to say about scheduling in a distributed system. Each processor does its own local scheduling (assuming that it has multiple processes running on it), without regard to what the other processors are doing. However, when a group of related, heavily interacting processes are all running on different processors, independent scheduling is not always the most efficient way. See the example below:

2 Scheduling in Distributed Systems (continued) Although it is difficult to determine inter-process communication patterns dynamically, in many cases a group of related processes will be started off together. We can assume that processes are created in groups and that intra-group communication is much more prevalent than inter-group communication. We can further assume that a sufficiently large number of processors is available to handle the largest group, and that each processor is multiprogrammed with N process slots. Ousterhout's co-scheduling:
–The idea is to have each processor use a round-robin scheduling algorithm: all processors first run the process in slot 0 for a fixed period, then all processors switch to slot 1 and run it for a fixed period, and so on.
–A broadcast message can be used to tell each processor when to switch processes, keeping the time slices synchronized.
–By putting all the members of a process group in the same slot number, but on different processors, one gets the advantage of N-fold parallelism, with a guarantee that all the processes will run at the same time, maximizing communication throughput.
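The co-scheduling idea above can be sketched as a matrix with one row per processor and one column per time slot; a round-robin sweep over the columns runs each group simultaneously. This is a minimal illustrative sketch, not Ousterhout's actual implementation; all names and counts are made up.

```python
N_SLOTS = 4        # process slots per processor (multiprogramming level)
N_PROCS = 3        # number of processors

# schedule[p][s] = the process in slot s on processor p (None = idle slot)
schedule = [[None] * N_SLOTS for _ in range(N_PROCS)]

def place_group(group, slot):
    """Put all members of a process group in the SAME slot on different
    processors, so they are guaranteed to run at the same time."""
    for proc, process in enumerate(group):
        schedule[proc][slot] = process

place_group(["A0", "A1", "A2"], slot=0)   # group A fills slot 0
place_group(["B0", "B1"], slot=1)         # group B fills slot 1

def run_round():
    """One round-robin sweep: every processor switches to the same slot at
    the same time (in a real system, triggered by a broadcast message)."""
    return [[schedule[p][slot] for p in range(N_PROCS)]
            for slot in range(N_SLOTS)]

for slot, running in enumerate(run_round()):
    print(f"slot {slot}: {running}")
```

In slot 0 all three members of group A run in parallel; in slot 1 both members of group B do, with one processor idle.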

3 Fault Tolerance A system is said to fail when it does not meet its specification. Examples of failures: a supermarket's distributed ordering system, a distributed air traffic control system (a safety-critical system). Component Faults: Computer systems can fail due to a fault in some component, such as a processor, memory, I/O device, cable, or software. A fault is a malfunction, possibly caused by a design error, a manufacturing error, a programming error, physical damage, deterioration over time, harsh environmental conditions, unexpected inputs, operator error, or many other causes. Faults are generally classified as:
–Transient faults: occur once and then disappear. If the operation is repeated, the fault goes away on the second try.
–Intermittent faults: a fault occurs, vanishes, reappears, and so on, like a loose contact on a connector. Intermittent faults cause a great deal of aggravation because they are difficult to diagnose.
–Permanent faults: continue to exist until the faulty component is repaired or replaced, like burnt-out chips, software bugs, and disk head crashes.

4 Fault Tolerance (continued) The goal of designing and building fault-tolerant systems is to ensure that the system as a whole continues to function correctly, even in the presence of faults. This aim is quite different from simply engineering the individual components to be highly reliable, while allowing the system to fail if one of the components does so. Traditional work in the area of fault tolerance has been based on statistical analysis. Very briefly, if some component has a probability p of malfunctioning in a given second, the probability of its not failing for k consecutive seconds and then failing is p(1-p)^k, and the mean time to failure is 1/p. For example, if the probability of a crash is 10^-6 per second, the mean time to failure is 10^6 seconds, or about 11.6 days.
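The arithmetic above can be checked directly: with per-second crash probability p, surviving k seconds and then failing has probability p(1-p)^k (the geometric distribution), and the mean time to failure is 1/p. A short sketch:

```python
p = 1e-6                      # probability of a crash in any given second
mttf_seconds = 1 / p          # mean time to failure of this component
mttf_days = mttf_seconds / (24 * 3600)

print(f"MTTF = {mttf_seconds:.0f} s ~= {mttf_days:.1f} days")   # about 11.6 days

def prob_fail_after(k, p):
    """Probability of surviving k consecutive seconds and then failing."""
    return (1 - p) ** k * p
```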

5 System Failures In a critical distributed system, we are often interested in making the system able to survive component (in particular, processor) faults. System reliability is especially important in a distributed system because of the large number of components present, and hence the greater chance of one of them being faulty. Two types of processor faults:
–Fail-silent faults: a faulty processor just stops and does not respond to subsequent input or produce further output, except perhaps to announce that it is no longer functioning. These are also called fail-stop faults.
–Byzantine faults: a faulty processor continues to run, issuing wrong answers to questions, and possibly working together maliciously with other faulty processors to give the impression that they are all working correctly.
–Dealing with Byzantine faults is much more difficult than dealing with fail-silent faults.
–The term "Byzantine" refers to the Byzantine Empire (330-1453, centered in modern Turkey), in which endless conspiracies, intrigue, and untruthfulness were alleged to be common in ruling circles.

6 Synchronous versus Asynchronous Systems Component faults: transient, intermittent, and permanent; system faults: fail-silent and Byzantine. We now look at faults in a synchronous system versus an asynchronous system. Suppose that we have a system in which, if one processor sends a message to another, it is guaranteed to get a reply within a time T known in advance. Failure to get a reply means that the receiving system has crashed. The time T includes sufficient time to deal with lost messages (by sending them up to n times). In fault-tolerance terms, a system that has the property of always responding to a message within a known finite bound, if it is working, is said to be synchronous. A system not having this property is said to be asynchronous. While this terminology is unfortunate, it is widely used among workers in fault tolerance. Dealing with asynchronous-system faults is harder than dealing with synchronous-system faults because it is hard to tell whether a processor is not responding or just slow.

7 Use of Redundancy The general approach to fault tolerance is to use redundancy. Three kinds of redundancy:
–Information redundancy: extra bits are attached to the information to allow recovery from garbled bits --- for example, a Hamming code for error correction.
–Time redundancy: an action is performed and then, if need be, performed again. Using atomic transactions is an example of this approach: if a transaction aborts, it can be redone with no harm. Time redundancy is especially helpful when the faults are transient or intermittent.
–Physical redundancy: extra equipment is added to make it possible for the system as a whole to tolerate the loss or malfunctioning of some components.
–There are two ways to organize these extra processors; consider the case of a server:
Active replication: all the processors are used all the time as servers (in parallel) in order to hide faults completely.
Primary backup: one processor is used as the server, and it is replaced with a backup if it fails.

8 Error Detection and Error Correction Parity check: 7 data bits and 1 parity bit suffice to detect a single-bit error (or any odd number of bit errors). The parity bit is chosen so that the modulo-2 sum of all 8 bits is zero: parity bit = m7 + m6 + m5 + m4 + m3 + m2 + m1. For example, for the data bits 1011001 (msb to lsb), parity bit = 1+0+1+1+0+0+1 = 0, so we transmit 10110010. The receiver computes parity-check = m7 + m6 + m5 + m4 + m3 + m2 + m1 + parity-bit. If 10100010 is received (a data bit flipped), the check is 1 and the error is detected; likewise, if 10110011 is received (the parity bit itself flipped), the check is 1.
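The even-parity scheme above can be sketched in a few lines. Bit strings are written msb-first, as on the slide; the function names are illustrative.

```python
def add_parity(data7):
    """Append an even-parity bit so the 8-bit word has an even number of 1s."""
    parity = sum(int(b) for b in data7) % 2
    return data7 + str(parity)

def parity_check(word8):
    """Return 0 if parity is consistent, 1 if an odd number of bits flipped."""
    return sum(int(b) for b in word8) % 2

codeword = add_parity("1011001")
print(codeword)                  # 10110010
print(parity_check(codeword))    # 0 -> no error detected
print(parity_check("10100010"))  # 1 -> single data-bit error detected
print(parity_check("10110011"))  # 1 -> error in the parity bit itself
```

Note that an even number of flipped bits leaves the check at 0, which is why a single parity bit can only detect odd numbers of errors.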

9 Error Correction (Hamming Code) Hamming code (3, 1):
–To send "0", we send "000"; to send "1", we send "111".
–The single-bit error patterns 001, 010, 100 change 000 to 001, 010, 100, or change 111 to 110, 101, 011.
–Hence, if this code is used for error correction, all single errors can be corrected. Double errors (error patterns 110, 101, 011) cannot be corrected, but they can be detected.
–Hamming codes in general: (3, 1), (7, 4), (15, 11), (31, 26), ...
–Why can a Hamming code correct a single error? Each bit position (including the parity bits themselves) is checked by some subset of the parity bits. If a single error occurs, some of the parity checks will fail, and the collection of failed checks indicates the position of the error.
–How many parity bits are needed? 2^r >= m + r + 1, where m is the number of message bits, r is the number of parity bits, and the 1 accounts for the no-error case.
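The bound 2^r >= m + r + 1 can be turned into a tiny search for the minimum number of parity bits; it reproduces the (3,1), (7,4), (15,11), (31,26) family listed above.

```python
def parity_bits_needed(m):
    """Smallest r with 2**r >= m + r + 1: parity bits needed to locate any
    single error among m message bits (or report that there is none)."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r

for m in (1, 4, 11, 26):
    r = parity_bits_needed(m)
    print(f"({m + r}, {m}) code needs r = {r} parity bits")
```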

10 Hamming Codes (Examples) Parity bit Pk checks every bit position whose binary index includes the value k. Hamming code (7, 4), positions 7..1:
P1 checks positions 1, 3, 5, 7
P2 checks positions 2, 3, 6, 7
P4 checks positions 4, 5, 6, 7
Hamming code (15, 11), positions 15..1:
P1 checks positions 1, 3, 5, 7, 9, 11, 13, 15
P2 checks positions 2, 3, 6, 7, 10, 11, 14, 15
P4 checks positions 4, 5, 6, 7, 12, 13, 14, 15
P8 checks positions 8, 9, 10, 11, 12, 13, 14, 15

11 Hamming Code (continued) Assume m message bits and r parity bits; the total number of bits transmitted is m + r. A single error can occur in any of the m + r positions, and the parity bits must also encode the case in which there is no error. Therefore, we need 2^r >= m + r + 1. As an example, suppose we are sending the string "0110", so m = 4 and we need r = 3 parity bits. The message to be sent is m7 m6 m5 P4 m3 P2 P1, where m7 = 0, m6 = 1, m5 = 1, and m3 = 0. Compute the parity bits by:
P1 = m7 + m5 + m3 = 1
P2 = m7 + m6 + m3 = 1
P4 = m7 + m6 + m5 = 0
Hence, the message to be sent is "0110011".
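The encoding steps above can be sketched directly; positions are numbered 7..1 from the left, with parity bits at positions 1, 2, and 4.

```python
def hamming74_encode(m7, m6, m5, m3):
    """Return the 7-bit codeword, laid out as m7 m6 m5 P4 m3 P2 P1."""
    p1 = (m7 + m5 + m3) % 2      # checks positions 1, 3, 5, 7
    p2 = (m7 + m6 + m3) % 2      # checks positions 2, 3, 6, 7
    p4 = (m7 + m6 + m5) % 2      # checks positions 4, 5, 6, 7
    return "".join(str(b) for b in (m7, m6, m5, p4, m3, p2, p1))

print(hamming74_encode(0, 1, 1, 0))   # "0110011", as computed on the slide
```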

12 Hamming Code (continued) Say, for example, that during transmission an error occurs at position 6 from the right, so the received message becomes "0010011". To detect and correct the error, compute the following:
For P1, compute m7 + m5 + m3 + P1 = 0
For P2, compute m7 + m6 + m3 + P2 = 1
For P4, compute m7 + m6 + m5 + P4 = 1
If P4 P2 P1 = 0, there is no error; otherwise, P4 P2 P1 indicates the position of the error. With P4 P2 P1 = 110, we know that position 6 is in error. To correct the error, we flip the bit at the 6th position from the right from '0' to '1'. That is, the string changes from "0010011" back to "0110011", and we recover the original message "0110" from the data bits m7 m6 m5 m3.
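The decode-and-correct step above can be sketched as follows: recompute each parity check over the received word; the checks, read as the binary number P4 P2 P1, give the position of a single-bit error (0 means no error detected).

```python
def hamming74_correct(word):
    """word is a 7-char bit string, positions 7..1 left to right.
    Returns (corrected word, error position or 0)."""
    bit = lambda pos: int(word[7 - pos])          # value at position pos
    c1 = (bit(7) + bit(5) + bit(3) + bit(1)) % 2  # recheck P1
    c2 = (bit(7) + bit(6) + bit(3) + bit(2)) % 2  # recheck P2
    c4 = (bit(7) + bit(6) + bit(5) + bit(4)) % 2  # recheck P4
    error_pos = c4 * 4 + c2 * 2 + c1              # syndrome = error position
    if error_pos:                                 # flip the offending bit
        i = 7 - error_pos
        word = word[:i] + str(1 - int(word[i])) + word[i + 1:]
    return word, error_pos

print(hamming74_correct("0010011"))   # ('0110011', 6): position 6 corrected
print(hamming74_correct("0110011"))   # ('0110011', 0): no error
```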

13 Active Replication Active replication is a well-known technique for providing fault tolerance using physical redundancy. It is used in biology (mammals have two eyes, two ears, two lungs, etc.), aircraft (747s have 4 engines but can fly on 3), and sports (multiple referees in basketball games). Fault tolerance in electronic circuits: in Figure 4-21(b), each device is replicated 3 times, and each stage in the circuit is followed by a triplicated voter. Each voter is a circuit with 3 inputs and 1 output. If 2 or 3 of the inputs are the same, the output is equal to that input; if all 3 inputs are different, the output is undefined. This kind of design is known as TMR (Triple Modular Redundancy). (Figure 4-21(b): three stages of triplicated devices A1-A3, B1-B3, C1-C3, each stage followed by triplicated voters V1-V9.)
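A single TMR voter, as described above, is just a 3-input majority function; a sketch:

```python
def vote(a, b, c):
    """Majority voter: if at least two inputs agree, output that value;
    if all three inputs differ, the output is undefined (None here)."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None

print(vote(1, 1, 0))   # 1 -> a single faulty replica is masked
print(vote(0, 0, 0))   # 0 -> all replicas agree
print(vote(0, 1, 2))   # None -> all three disagree, output undefined
```

Chaining replicated stages through triplicated voters, as in the figure, means even a faulty voter is itself outvoted at the next stage.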

14 General issues in active replication Servers act like big finite state machines: they accept requests and produce replies. If each client request is sent to each server, and all requests are received and processed in the same order, then after processing each one, all nonfaulty servers will be in exactly the same state and will give the same replies. A system is said to be k fault tolerant if it can survive faults in k components and still meet its specification. For fail-silent components, having k+1 of them is enough to provide k fault tolerance: if k of them simply stop, the answer from the remaining one can be used. If the components exhibit Byzantine failures, continuing to run when sick and sending out erroneous or random replies, a minimum of 2k+1 processors is needed to achieve k fault tolerance. An implicit precondition for this finite-state-machine model is that all requests arrive at all servers in the same order; this is called the atomic broadcast problem. One way to make sure that all requests are processed in the same order at all servers is to number them globally. For example, all requests could first be sent to a global number server to get a serial number, but then provision would have to be made for the failure of this serial-number server. Another way is to use Lamport's logical clocks and give every request a unique timestamp. The problem with timestamping is that when a server receives a request, it does not know whether any earlier requests are currently under way.
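The global-sequencer approach mentioned above can be sketched as follows: every request is stamped by a single numbering service, and each replica applies requests strictly in sequence-number order, buffering any that arrive early. This is an illustrative sketch only (class and variable names are made up), and it deliberately ignores the failure of the sequencer itself.

```python
import heapq
import itertools

sequencer = itertools.count(1)        # the global serial-number server

class Replica:
    def __init__(self):
        self.pending = []             # min-heap of (seq, request)
        self.next_seq = 1             # next sequence number to apply
        self.log = []                 # requests applied, in order

    def deliver(self, seq, request):
        heapq.heappush(self.pending, (seq, request))
        # Apply requests only when the next expected number is present,
        # so every replica applies exactly the same sequence.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, req = heapq.heappop(self.pending)
            self.log.append(req)
            self.next_seq += 1

r1, r2 = Replica(), Replica()
reqs = [(next(sequencer), op) for op in ("write x", "write y", "read x")]
for seq, op in reqs:                  # arrives in order at r1
    r1.deliver(seq, op)
for seq, op in reversed(reqs):        # arrives out of order at r2
    r2.deliver(seq, op)

print(r1.log == r2.log)               # both replicas applied the same order
```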

15 Primary Backup The idea of the primary-backup method is that at any one instant, one server is the primary and does all the work. If the primary fails, the backup takes over. Ideally, the cutover should take place in a clean way and be noticed only by the client operating system, not by the application programs. Some examples: government (the vice-president), aviation (the co-pilot), automobiles (the spare tire), and diesel-powered electrical generators in hospital operating theaters. Primary backup has two advantages over active replication:
–It is simpler during normal operation, since messages go to just one primary server and not to the whole group. The message-ordering problems also disappear.
–It requires fewer machines, because at any instant only one primary and one backup are needed (although a third machine may be needed when the backup is put into service).
On the other hand, primary backup works poorly in the presence of Byzantine failures, in which the primary falsely claims to be working properly.

16 Primary Backup (continued) (Figure 4-22: 1. Client sends request to primary; 2. primary does the work; 3. primary sends update to backup; 4. backup does the work; 5. backup sends ack to primary; 6. primary sends reply to client.) If the primary crashes before doing the work (step 2), no harm is done. The client will time out and retry; if it tries often enough, it will eventually reach the backup, and the work will be done exactly once. If the primary crashes after doing the work but before sending the update, then when the backup takes over and the request comes in again, the work will be done a second time. If the work has side effects, this could be a problem. If the primary crashes after step 4 but before step 6, the work may end up being done three times: once by the primary, once by the backup as a result of step 3, and once after the backup becomes the primary. If requests carry identifiers, it may be possible to ensure that the work is done only twice, but getting it done exactly once is difficult to impossible.
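The failure case "primary crashes after doing the work but before sending the update" can be sketched as follows: the client's retry reaches the backup, which redoes the work, so the work happens twice. All class and method names here are illustrative, not from the original protocol description.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.work_done = []                    # requests this server executed

    def handle(self, request, backup=None):
        self.work_done.append(request)         # step 2 (or 4): do the work
        if backup is not None:
            backup.handle(request)             # step 3: update the backup
        return f"reply({request})"             # step 6: reply to the client

primary, backup = Server("primary"), Server("backup")

# Normal case: the work is done once on each machine, client gets one reply.
print(primary.handle("req-1", backup))

# Failure case: primary does the work, then crashes before sending the update.
primary.work_done.append("req-2")              # work done, no update sent
# The client times out and retries against the backup, which redoes the work:
print(backup.handle("req-2"))
total = primary.work_done.count("req-2") + backup.work_done.count("req-2")
print(total)                                   # 2: the side effects happen twice
```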

17 Primary Backup (continued) One theoretical and practical problem with the primary-backup approach is deciding when to cut over from the primary to the backup. In our protocol, the backup could send "Are you alive?" messages periodically to the primary; if the primary fails to respond within a certain time, the backup would take over. However, what happens if the primary has not crashed, but is just slow? There is no way to distinguish between a slow primary and one that has gone down. There is a need to make sure that when the backup takes over, the primary really stops trying to act as the primary. The best solution is a hardware mechanism by which the backup can forcibly stop or reboot the primary. A variant of the approach of Figure 4-22 uses a dual-ported disk shared between the primary and the backup. When the primary gets a request, it writes the request to disk before doing any work, and also writes the results to disk. No messages to or from the backup are needed; if the primary crashes, the backup can see the state of the world by reading the disk. The disadvantage of this scheme is that there is only one disk, which becomes a single point of failure.

18 Agreement in Faulty Systems In many distributed systems there is a need for processes to agree on something: electing a coordinator, deciding whether to commit a transaction, etc. The general goal of distributed agreement algorithms is to have all the nonfaulty processors reach consensus on some issue, and to do so within a finite number of steps. Different cases are possible depending on system parameters, including:
–Are messages delivered reliably all the time?
–Can processes crash, and if so, are failures fail-silent or Byzantine?
–Is the system synchronous or asynchronous?
When the communication and the processors are all perfect, reaching agreement is often straightforward, but when they are not, problems arise. So let us look at the following problems:
–Perfect processors but unreliable communication -- the two-army problem.
–Faulty processors but reliable communication -- the Byzantine generals problem.
–Faulty processors and unreliable communication -- I consider this a hopeless situation!

19 Two-army Problem The red army, with 5000 troops, is encamped in a valley. Two blue armies, each 3000 strong, are encamped on the surrounding hillsides overlooking the valley. If the two blue armies can coordinate their attacks on the red army, they will be victorious; however, if either one attacks by itself, it will be slaughtered. The goal of the blue armies is to reach agreement about attacking. The catch is that they can only communicate using an unreliable channel: sending a messenger down the hill, through the valley, and up the other side, subject to capture by the red army. This problem is provably unsolvable. The proof: given a protocol that solves the problem, throw away any redundant or extra messages, so that the resulting protocol is minimal and every message is crucial. In this minimal protocol, there must be a last message delivered from a sender to a receiver. If this message fails to arrive, the protocol fails; however, there is no guarantee that the receiver will receive it. Hence, no protocol can exist under these conditions.

20 Byzantine Generals Problem The red army is still encamped in the valley, but n blue generals all head armies on the nearby hills. Communication is done pairwise by telephone and is perfect, but m of the generals are traitors (faulty) and are actively trying to prevent the loyal generals from reaching agreement by feeding them incorrect and contradictory information. The question is whether the loyal generals can still reach agreement. We define agreement as: by the end of the exchange of messages, every loyal general knows every other general's strength in terms of troops. In the example below, loyal generals tell the truth, while a traitor may tell every other general a different lie. To simplify the problem, suppose General 1 has 1K troops, General 2 has 2K, and so on (n = 4 and m = 1; that is, 4 generals in total, of whom 1 is a traitor). The steps are:
–1. Every general sends a message to every other general announcing his true strength.
–2. Every general, upon receiving the announcements of step 1, makes up a vector containing the number of troops each general has.
–3. Every general sends the vector composed in step 2 to every other general.
–4. Each general examines the received vectors and tries to draw a conclusion from them -- by simple majority rule.
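The 4-general, 1-traitor example above can be sketched as a small simulation: the traitor lies differently to each recipient in step 1 and forges its relayed vectors in step 3, yet the loyal generals still agree on each other's strengths by majority vote (and merely fail to agree on the traitor's). All numbers and names here are illustrative.

```python
from collections import Counter

N = 4
TRAITOR = 3
true_strength = {0: 1000, 1: 2000, 2: 3000}   # loyal generals' troop counts

# Step 1: every general announces a strength to every other general;
# the traitor tells a different lie to each recipient.
def announce(sender, to):
    if sender == TRAITOR:
        return 9000 + to
    return true_strength[sender]

# What each general heard in step 1 (its own entry included for simplicity).
heard = {g: {s: announce(s, g) for s in range(N)} for g in range(N)}

# Steps 2-3: every general relays its vector; the traitor forges its relays,
# again differently for each recipient.
def relay(sender, to):
    if sender == TRAITOR:
        return {g: 7000 + to for g in range(N)}
    return heard[sender]

# Step 4: each loyal general takes a majority vote, entry by entry.
def conclude(me):
    result = {}
    for g in range(N):
        votes = [relay(other, me)[g] for other in range(N) if other != me]
        value, count = Counter(votes).most_common(1)[0]
        result[g] = value if count >= 2 else None   # no majority -> unknown
    return result

print(conclude(0))                                 # traitor's entry is None
print(conclude(0) == conclude(1) == conclude(2))   # loyal generals agree
```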

21 Byzantine Generals Problem (continued)

22 Lamport proved that in a system with m faulty processors, agreement can be achieved only if 2m+1 correctly functioning processors are present, for a total of 3m+1. In other words, agreement is possible only if more than two-thirds of the processors are working properly.

