UNIT-II Distributed Synchronization


2 UNIT-II Distributed Synchronization 1

3 Mutual exclusion
Mutual exclusion ensures that concurrent processes access shared resources or data in a serialized way. If a process, say Pi, is executing in its critical section, then no other process can be executing in its critical section.
Examples:
– updating a DB
– directory management
– sending control signals to an I/O device

4 Mutual Exclusion Algorithms
Non-token based: A site/process can enter a critical section when an assertion (condition) becomes true. The algorithm should ensure that the assertion is true at only one site/process at a time.
Token based: A unique token (a known, unique message) is shared among cooperating sites/processes. The possessor of the token has access to the critical section. The algorithm needs to handle conditions such as loss of the token, crash of the token holder, and the possibility of multiple tokens.

5 General System Model At any instant, a site may have several requests for critical section (CS), queued up, and serviced one at a time. Site States: Requesting CS, executing CS, idle (neither requesting nor executing CS). Requesting CS: blocked until granted access, cannot make additional requests for CS. Executing CS: using the CS. Idle: In token-based approaches, idle site can have the token. 4

6 Mutual Exclusion: Requirements Freedom from deadlocks: two or more sites should not endlessly wait on conditions/messages that never become true/arrive. Freedom from starvation: No indefinite waiting. Fairness: Order of execution of CS follows the order of the requests for CS. (equal priority). Fault tolerance: recognize faults, reorganize, continue. (e.g., loss of token). 5

7 Performance Number of messages per CS invocation: should be minimized. Synchronization delay, i.e., time between the leaving of CS by a site and the entry of CS by the next one: should be minimized. Response time: time interval between request messages transmissions and exit of CS. System throughput, i.e., rate at which system executes requests for CS: should be maximized. If sd is synchronization delay, E the average CS execution time: system throughput = 1 / (sd + E). 6
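
The throughput relation above can be checked numerically. The following is a minimal sketch: the function name and the sample values of T and E are illustrative assumptions, not taken from the slides.

```python
def system_throughput(sd: float, e: float) -> float:
    """Rate of CS executions for synchronization delay sd and
    average CS execution time e: 1 / (sd + E)."""
    if sd < 0 or e < 0 or sd + e <= 0:
        raise ValueError("sd and e must be non-negative and not both zero")
    return 1.0 / (sd + e)


# Assumed sample values: message delay T = 10 ms, CS execution time E = 5 ms.
T, E = 0.010, 0.005
with_delay_T = system_throughput(T, E)       # synchronization delay of T
with_delay_2T = system_throughput(2 * T, E)  # synchronization delay of 2T
print(with_delay_T, with_delay_2T)
```

A smaller synchronization delay always raises throughput, but the gain depends on how large E is relative to the delay.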

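8 Performance metrics [Figure: timeline. Synchronization delay runs from the moment the last site exits the CS to the moment the next site enters the CS. Response time runs from the arrival of the CS request, through the messages being sent and CS entry, to CS exit; E is the CS execution time.]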

9 Performance... Low and High Load: –Low load: No more than one request at a given point in time. –High load: Always a pending mutual exclusion request at a site. Best and Worst Case: –Best Case (low loads): Round-trip message delay + Execution time. 2T + E. –Worst case (high loads). Message traffic: low at low loads, high at high loads. Average performance: when load conditions fluctuate widely. 8

10 Simple Solution
– Control site: grants permission for CS execution.
– A site sends a REQUEST message to the control site. The controller grants access to one site at a time.
– Synchronization delay: 2T. A site releases the CS by sending a message to the controller, and the controller then sends permission to another site.
– System throughput: 1/(2T + E). If the synchronization delay is reduced to T, throughput becomes 1/(T + E), which approaches double as E becomes small relative to T.
– The controller becomes a bottleneck, and congestion can occur.

11 Non-token Based Algorithms
Notations:
– Si: site i
– Ri: request set, containing the ids of all sites from which Si must receive permission before accessing the CS.
– Non-token based approaches use time stamps to order requests for the CS.
– Smaller time stamps get priority over larger ones.
Lamport's Algorithm:
– Ri = {S1, S2, …, Sn}, i.e., all sites.
– Request queue: maintained at each Si, ordered by time stamps.
– Assumption: messages are delivered in FIFO order.

12 Lamport's Algorithm
Requesting CS:
– Si sends REQUEST(tsi, i) to all sites in its request set, where (tsi, i) is the request time stamp, and places the REQUEST in request_queuei.
– On receiving the message, Sj sends a time-stamped REPLY message to Si and places Si's request in request_queuej.
Executing CS: Si enters the CS when both conditions hold:
– Si has received a message with a time stamp larger than (tsi, i) from all other sites.
– Si's request is at the top of request_queuei.
Releasing CS:
– On exiting the CS, Si removes its request from its own queue and sends a time-stamped RELEASE message to all sites in its request set.
– On receiving the RELEASE message, Sj removes Si's request from its queue.

13 Lamport's Algorithm…
Performance:
– 3(N-1) messages per CS invocation: (N - 1) REQUEST, (N - 1) REPLY, (N - 1) RELEASE messages.
– Synchronization delay: T
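
The entry rule of Lamport's algorithm can be sketched in a few lines. This is an illustrative model under stated assumptions, not a full implementation: the class and method names are hypothetical, messages are delivered by direct method calls, and the caller supplies time stamps instead of a real logical clock (every REPLY below is simply stamped 3).

```python
import heapq


class LamportSite:
    """One site; time stamps are (ts, site_id) tuples compared lexicographically."""

    def __init__(self, site_id, all_ids):
        self.id = site_id
        self.queue = []            # request_queue_i, a heap ordered by time stamp
        self.my_request = None
        # Largest time stamp seen so far from every other site.
        self.last_seen = {j: (0, j) for j in all_ids if j != site_id}

    def request(self, ts):
        self.my_request = (ts, self.id)
        heapq.heappush(self.queue, self.my_request)
        return "REQUEST", self.my_request

    def on_request(self, req, reply_ts):
        heapq.heappush(self.queue, req)
        self.last_seen[req[1]] = max(self.last_seen[req[1]], req)
        return "REPLY", (reply_ts, self.id)

    def on_reply(self, stamp):
        self.last_seen[stamp[1]] = max(self.last_seen[stamp[1]], stamp)

    def on_release(self, req):
        self.queue.remove(req)
        heapq.heapify(self.queue)

    def release(self):
        req, self.my_request = self.my_request, None
        self.on_release(req)       # drop own request from own queue
        return "RELEASE", req

    def can_enter(self):
        return (self.my_request is not None
                and self.queue[0] == self.my_request
                and all(s > self.my_request for s in self.last_seen.values()))


# Scenario: S1 requests with (2,1), S2 with (1,2), S3 is idle.
s1, s2, s3 = (LamportSite(i, [1, 2, 3]) for i in (1, 2, 3))
_, r1 = s1.request(2)
_, r2 = s2.request(1)
for sender, req in ((s1, r1), (s2, r2)):
    for site in (s1, s2, s3):
        if site is not sender:
            _, stamp = site.on_request(req, reply_ts=3)
            sender.on_reply(stamp)

s2_enters_first = s2.can_enter()
s1_must_wait = not s1.can_enter()
_, released = s2.release()
for site in (s1, s3):
    site.on_release(released)
s1_enters_after_release = s1.can_enter()
```

In this run S2 enters first because (1,2) precedes (2,1), and S1 can enter only after S2's RELEASE removes (1,2) from S1's queue.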

14 Lamport's Algorithm: Example-1 [Figure: sites S1, S2, S3. S1 requests with time stamp (2,1) and S2 with (1,2); the request queues hold (1,2), (2,1). Step 1: the REQUESTs are sent. Step 2: S2 enters CS.]

15 Lamport's: Example… [Figure: Step 3: S2 leaves CS and its request (1,2) is removed from the queues. Step 4: S1 enters CS with (2,1) at the top of the queues.]

16 Example-2 15

17 Ricart-Agrawala Algorithm
Requesting critical section:
– Si sends a time-stamped REQUEST message to all sites in its request set.
– Sj sends a REPLY to Si if Sj is neither requesting nor executing the CS, or if Sj is requesting the CS and Si's time stamp is smaller than that of Sj's own request. Otherwise the request is deferred (postponed).
Executing CS: Si enters the CS after it has received a REPLY from all sites in its request set.
Releasing CS: Si sends a REPLY to all deferred requests, i.e., a site's REPLY messages are blocked only by sites with smaller time stamps.

18 Ricart-Agrawala: Performance Performance: –2(N-1) messages per CS execution. (N-1) REQUEST + (N-1) REPLY. –Synchronization delay: T. 17
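
The REPLY rule of the Ricart-Agrawala algorithm reduces to one comparison. A minimal sketch follows; the function names and the string state labels are assumptions made for illustration.

```python
def should_reply_now(state, own_request, incoming_request):
    """Decide whether Sj answers a REQUEST immediately or defers it.

    state: 'idle', 'requesting' or 'executing' (state of the receiver Sj).
    own_request / incoming_request: (time stamp, site id) tuples;
    own_request is ignored unless Sj is requesting.
    """
    if state == "idle":
        return True
    if state == "requesting":
        # Smaller time stamp wins; the site id breaks ties.
        return incoming_request < own_request
    return False  # executing the CS: defer until release


def messages_per_cs(n):
    """(N-1) REQUEST plus (N-1) REPLY messages."""
    return 2 * (n - 1)
```

For example, a site requesting with (2,1) replies at once to a request stamped (1,2), while a site requesting with (1,2) defers a request stamped (2,1).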

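19 Ricart-Agrawala: Example [Figure: sites S1, S2, S3. S1 requests with time stamp (2,1) and S2 with (1,2). Step 1: the REQUESTs are sent. Step 2: S2 enters CS while S1's request (2,1) is deferred.]

20 Ricart-Agrawala: Example… [Figure: Step 3: S2 leaves CS and replies to the deferred request (2,1); S1 enters CS.]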

21 Maekawa's Algorithm
– A site requests permission only from a subset of sites.
– Request sets of sites Si and Sj: Ri and Rj are chosen so that they have at least one common site (Sk). Sk mediates conflicts between Ri and Rj.
– A site can send only one REPLY message at a time, i.e., it can send a REPLY message only after receiving a RELEASE message for the previous REPLY message.
Request set rules:
– Sets Ri and Rj have at least one common site.
– Si is always in Ri.
– The cardinality of Ri, i.e., the number of sites in Ri, is K.
– Any site Si is in K of the Ri's.
– N = K(K - 1) + 1, so K is approximately the square root of N.
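
The request set rules can be verified mechanically for a small case. The sketch below uses N = 7 and K = 3, which satisfy N = K(K - 1) + 1; the particular sets are one valid assignment chosen for illustration, and the function name is hypothetical.

```python
from itertools import combinations

# One valid assignment of request sets for N = 7 sites, K = 3.
REQUEST_SETS = {
    1: {1, 2, 3},
    2: {2, 4, 6},
    3: {3, 5, 6},
    4: {1, 4, 5},
    5: {2, 5, 7},
    6: {1, 6, 7},
    7: {3, 4, 7},
}


def check_request_sets(sets):
    """Return True if the sets satisfy the four request set rules."""
    n = len(sets)
    k = len(next(iter(sets.values())))
    pairwise_overlap = all(sets[i] & sets[j] for i, j in combinations(sets, 2))
    contains_self = all(i in s for i, s in sets.items())
    same_size = all(len(s) == k for s in sets.values())
    # Every site must appear in exactly K request sets.
    equal_load = all(sum(site in s for s in sets.values()) == k for site in sets)
    return (pairwise_overlap and contains_self and same_size
            and equal_load and n == k * (k - 1) + 1)


print(check_request_sets(REQUEST_SETS))
```

Each site here asks only 3 of the 7 sites for permission, yet any two request sets still share a site that can arbitrate between them.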

22 Maekawa's Algorithm...
Requesting CS:
– Si sends REQUEST(i) to the sites in Ri.
– Sj sends a REPLY to Si if Sj has NOT sent a REPLY message to any site since it received the last RELEASE message. Otherwise, it queues up Si's request.
Executing CS: Si enters the CS after getting a REPLY from all sites in Ri.
Releasing CS:
– Si sends RELEASE(i) to all sites in Ri.
– Any Sj, after receiving a RELEASE message, sends a REPLY message to the next request in its queue.
– If the queue is empty, Sj updates its status to indicate receipt of the RELEASE.

23 Maekawa's Algorithm...
Performance:
– Synchronization delay: 2T
– Messages: 3 times square root of N (square root of N each for REQUEST, REPLY, RELEASE messages)
Deadlocks:
– Message deliveries are not ordered.
– Assume Si, Sj, Sk concurrently request CS.
– Ri ∩ Rj = {Sij}, Rj ∩ Rk = {Sjk}, Rk ∩ Ri = {Ski}
– It is possible that Sij is locked by Si (forcing Sj to wait at Sij), Sjk by Sj (forcing Sk to wait at Sjk), and Ski by Sk (forcing Si to wait at Ski) -> deadlock among Si, Sj, and Sk.

24 Token-based Algorithms Unique token circulates among the participating sites. A site can enter CS if it has the token. Token-based approaches use sequence numbers instead of time stamps. –Request for a token contains a sequence number. –Sequence number of sites advance independently. Correctness issue is trivial since only one token is present -> only one site can enter CS. Deadlock and starvation issues to be addressed. 23

25 Suzuki-Kasami Algorithm If a site without a token needs to enter a CS, broadcast a REQUEST for token message to all other sites. Token: (a) Queue of request sites (b) Array LN[1..N], the sequence number of the most recent execution by a site j. Token holder sends token to requestor, if it is not inside CS. Otherwise, sends after exiting CS. Token holder can make multiple CS accesses. Design issues: –Distinguishing outdated REQUEST messages. Format: REQUEST(j,n) -> jth site making nth request. Each site has RNi[1..N] -> RNi[j] is the largest sequence number of request from j. –Determining which site has an outstanding token request. If LN[j] = RNi[j] - 1, then Sj has an outstanding request. 24

26 Suzuki-Kasami Algorithm... Passing the token –After finishing CS –(assuming Si has token), LN[i] := RNi[i] –Token consists of Q and LN. Q is a queue of requesting sites. –Token holder checks if RNi[j] = LN[j] + 1. If so, place j in Q. –Send token to the site at head of Q. Performance –0 to N messages per CS invocation. –Synchronization delay is 0 (if the token holder repeats CS) or T. 25

27 Example (sites numbered 0 to 4; req is a site's request array, last and Q travel with the token)
– Initial state: req=[1,0,0,0,0], last=[0,0,0,0,0]
– 1 & 2 send requests: req=[1,1,1,0,0], last=[0,0,0,0,0]
– 0 prepares to exit CS: req=[1,1,1,0,0], last=[1,0,0,0,0], Q=(1,2)
– 0 passes token to 1: req=[1,1,1,0,0], last=[1,0,0,0,0], Q=(2)
– 0 and 3 send requests: req=[2,1,1,1,0], last=[1,0,0,0,0], Q=(2,0,3)
– 1 sends token to 2: req=[2,1,1,1,0], last=[1,1,0,0,0], Q=(0,3)
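
The token-passing step of the Suzuki-Kasami algorithm can be replayed on the example states. This is a minimal sketch with 0-based site indices as in the example; the function names are hypothetical, and message transport is left out.

```python
from collections import deque


def on_request(rn_i, j, n):
    """Record REQUEST(j, n) at site i; outdated requests (n <= RNi[j]) change nothing."""
    rn_i[j] = max(rn_i[j], n)


def release_cs(i, rn_i, token_ln, token_q):
    """Site i, holding the token, finishes its CS.

    Updates LN[i], appends every site with an outstanding request
    (RNi[j] == LN[j] + 1) to Q, and returns the next token holder,
    or None if nobody is waiting.
    """
    token_ln[i] = rn_i[i]
    for j in range(len(rn_i)):
        if j != i and j not in token_q and rn_i[j] == token_ln[j] + 1:
            token_q.append(j)
    return token_q.popleft() if token_q else None


# Replay: site 0 exits after sites 1 and 2 have requested.
ln, q = [0, 0, 0, 0, 0], deque()
first_next = release_cs(0, [1, 1, 1, 0, 0], ln, q)
state_after_first = (list(ln), list(q))

# Sites 0 and 3 request again, then site 1 exits.
rn_1 = [1, 1, 1, 0, 0]
on_request(rn_1, 0, 2)
on_request(rn_1, 3, 1)
second_next = release_cs(1, rn_1, ln, q)
state_after_second = (list(ln), list(q))
```

The first call hands the token to site 1 with Q=(2); the second hands it to site 2 with Q=(0,3), matching the example.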

33 Raymonds Algorithm Sites are arranged in a logical directed tree. Root: token holder. Edges: directed towards root. Every site has a variable holder that points to an immediate neighbor node, on the directed path towards root. (Roots holder point to itself). Requesting CS –If Si does not hold token and request CS, sends REQUEST upwards provided its request_q is empty. It then adds its request to request_q. –Non-empty request_q -> REQUEST message for top entry in q (if not done before). –Site on path to root receiving REQUEST -> propagate it up, if its request_q is empty. Add request to request_q. –Root on receiving REQUEST -> send token to the site that forwarded the message. Set holder to that forwarding site. –Any Si receiving token -> delete top entry from request_q, send token to that site, set holder to point to it. If request_q is non-empty now, send REQUEST message to the holder site. 32

34 Raymond's Algorithm …
Executing CS: a site enters the CS when it receives the token and its own entry is at the top of its request_q. It deletes the top of request_q and enters the CS.
Releasing CS:
– If request_q is non-empty, delete the top entry from request_q, send the token to that site, and set holder to that site.
– If request_q is still non-empty, send a REQUEST message to the holder site.
Performance:
– Average messages: O(log N), as the average distance between 2 nodes in the tree is O(log N).
– Synchronization delay: (T log N) / 2, as the average distance between 2 sites that successively execute the CS is (log N) / 2.
– Greedy approach: an intermediate site getting the token may enter the CS instead of forwarding it down. This affects fairness and may cause starvation.

35 Raymond's Algorithm: Example [Figure: tree of sites S1 to S7. Step 1: the token holder and a token request travelling towards it. Step 2: the token is passed along the tree.]

36 Raymond's Algm.: Example… [Figure: Step 3: the requesting site is the new token holder.]

37 Raymond's Algorithm: Example [Figure sequence: request_q contents shown at the tree nodes (entries include 1, 4 and 7). Several sites want to enter their CS; the token is sent to 6; 6 forwards the token to 1.]

40 Singhals Heuristic Algorithm Instead of broadcast: each site maintains information on other sites, guess the sites likely to have the token. Data Structures: –Si maintains SVi[1..N] and SNi[1..N] for storing information on other sites: state and highest sequence number. –Token contains 2 arrays: TSV[1..N] and TSN[1..N]. –States of a site R : requesting CS E : executing CS H : Holding token, idle N : None of the above –Initialization: SVi[j] := N, for j = N.. i; SVi[j] := R, for j = i-1.. 1; SNi[j] := 0, j = 1..N. S1 (Site 1) is in state H. Token: TSV[j] := N & TSN[j] := 0, j = 1.. N. 39

41 Singhals Heuristic Algorithm … Requesting CS –If Si has no token and requests CS: SVi[i] := R. SNi[i] := SNi[i] + 1. Send REQUEST(i,sn) to sites Sj for which SVi[j] = R. (sn: sequence number, updated value of SNi[i]). –Receiving REQUEST(i,sn): if sn <= SNj[i], ignore. Otherwise, update SNj[i] and do: SVj[j] = N -> SVj[i] := R. SVj[j] = R -> If SVj[i] != R, set it to R & send REQUEST(j,SNj[j]) to Si. Else do nothing. SVj[j] = E -> SVj[i] := R. SVj[j] = H -> SVj[i] := R, TSV[i] := R, TSN[i] := sn, SVj[j] = N. Send token to Si. Executing CS: after getting token. Set SVi[i] := E. 40

42 Singhals Heuristic Algorithm … Releasing CS –SVi[i] := N, TSV[i] := N. Then, do: For other Sj: if (SNi[j] > TSN[j]), then {TSV[j] := SVi[j]; TSN[j] := SNi[j]} else {SVi[j] := TSV[j]; SNi[j] := TSN[j]} –If SVi[j] = N, for all j, then set SVi[i] := H. Else send token to a site Sj provided SVi[j] = R. Fairness of algorithm will depend on choice of Si, since no queue is maintained in token. Arbitration rules to ensure fairness used. Performance –Low to moderate loads: average of N/2 messages. –High loads: N messages (all sites request CS). –Synchronization delay: T. 41

43 Singhal: Example 42 Assume there are 3 sites in the system. Initially: Site 1: SV1[1] = H, SV1[2] = N, SV1[3] = N. SN1[1], SN1[2], SN1[3] are 0. Site 2: SV2[1] = R, SV2[2] = N, SV2[3] = N. SNs are 0. Site 3: SV3[1] = R, SV3[2] = R, SV3[3] = N. SNs are 0. Token: TSVs are N. TSNs are 0. Assume site 2 is requesting token. S2 sets SV2[2] = R, SN2[2] = 1. S2 sends REQUEST(2,1) to S1 (since only S1 is set to R in SV[2]) S1 receives the REQUEST. Accepts the REQUEST since SN1[2] is smaller than the message sequence number. Since SV1[1] is H: SV1[2] = R, TSV[2] = R, TSN[2] = 1, SV1[1] = N. Send token to S2 S2 receives the token. SV2[2] = E. After exiting the CS, SV2[2] = TSV[2] = N. Updates SN, SV, TSN, TSV. Since nobody is REQUESTing, SV2[2] = H. Assume S3 makes a REQUEST now. It will be sent to both S1 and S2. Only S2 responds since only SV2[2] is H (SV1[1] is N now).
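
The initialization rule of Singhal's algorithm, and the choice of sites a requester contacts, can be reproduced for the three-site example. A minimal sketch follows; the function names and the dict-based layout are assumptions.

```python
def initial_state(n):
    """Return (SV, SN) for sites 1..n as nested dicts.

    SVi[j] = R for j < i and N for j >= i; site 1 starts in state H
    because it holds the token. All sequence numbers start at 0.
    """
    sv = {i: {j: ("R" if j < i else "N") for j in range(1, n + 1)}
          for i in range(1, n + 1)}
    sn = {i: {j: 0 for j in range(1, n + 1)} for i in range(1, n + 1)}
    sv[1][1] = "H"
    return sv, sn


def request_targets(sv_i, i):
    """Sites that Si sends REQUEST to: every other Sj with SVi[j] = R."""
    return [j for j, state in sorted(sv_i.items()) if state == "R" and j != i]


sv, sn = initial_state(3)
sv[2][2] = "R"                      # S2 starts requesting
sn[2][2] += 1
s2_targets = request_targets(sv[2], 2)
s3_targets = request_targets(sv[3], 3)
```

With three sites, S2 contacts only S1, while S3 contacts both S1 and S2, as in the example.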

44 Comparison (ll = low load, hl = high load)

Non-token          Resp. Time (ll)   Sync. Delay    Messages (ll)   Messages (hl)
Lamport            2T+E              T              3(N-1)          3(N-1)
Ricart-Agrawala    2T+E              T              2(N-1)          2(N-1)
Maekawa            2T+E              2T             3*sq.rt(N)      5*sq.rt(N)

Token              Resp. Time (ll)   Sync. Delay    Messages (ll)   Messages (hl)
Suzuki-Kasami      2T+E              T              N               N
Singhal            2T+E              T              N/2             N
Raymond            T(log N)+E        T log(N)/2     log(N)          4

45 Deadlock
A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. Equivalently, a deadlock occurs when a process enters a waiting state because a resource it requested is held by another waiting process, which in turn is waiting for another resource.

46 Cont.. If a process is unable to change its state indefinitely because the resources requested by it are being used by another waiting process, then the system is said to be in a deadlock. Deadlock is a common problem in multiprocessing systems, parallel computing and distributed systems. 45

47 Example- 46

48 Example- Suppose a computer has three CD drives and three processes. Each of the three processes holds one of the drives. If each process now requests another drive, the three processes will be in a deadlock. Each process will be waiting for the "CD drive released" event, which can be only caused by one of the other waiting processes. Thus, it results in a circular chain. 47

49 Necessary conditions
A deadlock situation can arise only if all of the following conditions hold simultaneously in a system:
– Mutual exclusion
– Hold and wait (resource holding)
– No preemption
– Circular wait


51 System Model
– The system has only reusable resources.
– Processes are allowed only exclusive access to resources.
– There is only one copy of each resource.
– A process can be in the running state or the blocked state.

52 Deadlocks in Distributed Systems Deadlocks in distributed systems are similar to deadlocks in single processor systems, –They are harder to avoid, prevent or even detect. –They are hard to cure when tracked down because all relevant information is scattered over many machines. 51 Tulika Ringan (AL_IT)

53 Types of Deadlocks
Deadlocks are sometimes classified into the following types:
– Communication deadlocks: competing for buffers for send/receive.
– Resource deadlocks: exclusive access to I/O devices, files, locks, and other resources.
Here we treat everything as a resource, so we only have resource deadlocks.

54 Strategies to Handle Deadlock
The best-known strategies to handle deadlocks:
– Detection (let deadlocks occur, detect them, and try to recover)
– Prevention (statically make deadlocks structurally impossible)
– Avoidance (avoid deadlocks by allocating resources carefully)

55
– Prevention: too expensive in time and network traffic in a distributed system.
– Avoidance: determining safe and unsafe states would require a huge number of messages in a DS.
– Detection: may be practical, and is the primary chapter focus.
– Resolution: more complex than in non-distributed systems.

56 DS Deadlock Detection
– The bipartite graph strategy is modified to use a Wait-For Graph (WFG or TWF).
– All nodes are processes (threads).
– Resource allocation is done by a process (thread) sending a request message to another process (thread) which manages the resource (client-server communication model, RPC paradigm).
– A system is deadlocked if and only if there is a directed cycle (or knot) in the global WFG.

57 DS Deadlock Detection, Cycle vs. Knot
The AND model of requests requires all resources currently being requested to be granted to unblock a computation.
– A cycle is sufficient to declare a deadlock with this model.
The OR model of requests allows a computation making multiple different resource requests to unblock as soon as any one is granted.
– A cycle is a necessary condition.
– A knot is a sufficient condition.

58 [Figure: WFG of processes P1 to P10 across sites S1, S2, S3.] Deadlock in the AND model: there is a cycle but no knot. No deadlock in the OR model.

59 [Figure: WFG of processes P1 to P10 across sites S1, S2, S3.] Deadlock in both the AND model and the OR model: there are cycles and a knot.
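
The difference between a cycle and a knot can be tested directly on a WFG. The sketch below is illustrative: the graph is a dict from each process to the set of processes it waits for, the two small graphs are made-up examples rather than the figures above, and a knot is taken to be a set of nodes from which exactly that set is reachable.

```python
def reachable(wfg, start):
    """All nodes reachable from start by following one or more edges."""
    seen, stack = set(), list(wfg.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(wfg.get(node, ()))
    return seen


def on_cycle(wfg, p):
    """AND model: p is deadlocked if it lies on a directed cycle."""
    return p in reachable(wfg, p)


def in_knot(wfg, p):
    """OR model: p is deadlocked if every node it can reach can reach
    exactly the same set of nodes, including p itself."""
    r = reachable(wfg, p)
    return p in r and all(reachable(wfg, q) == r for q in r)


# P1 -> P2 -> P3 -> P1 is a cycle; P3 also waits for P4, which is not blocked.
cycle_only = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1", "P4"}, "P4": set()}
# Same cycle without the escape edge to P4.
with_knot = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
```

In the first graph P3 could still be unblocked by P4 under the OR model, so the cycle alone does not imply deadlock there; in the second graph no such escape exists.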

60 DS Detection Requirements
Progress:
– No undetected deadlocks: all deadlocks are found.
– Deadlocks are found in finite time.
Safety:
– No false deadlock detection.
– Phantom (false) deadlocks are caused by network latencies.
– This is the principal problem in building correct DS deadlock detection algorithms.

61 Resolution
– Resolution means breaking existing wait-for dependencies in the system.
– It involves rolling back one or more deadlocked processes and assigning their resources to blocked processes in the deadlock.
– When a wait-for dependency is broken, the corresponding information should be cleaned up immediately; otherwise phantom deadlocks may be detected.

62 Control Framework Approaches to DS deadlock detection fall in three domains: –Centralized control one node responsible for building and analyzing a real WFG for cycles –Distributed Control each node participates equally in detecting deadlocks … abstracted WFG –Hierarchical Control nodes are organized in a tree which tends to look like a business organizational chart 61

63 Total Centralized Control Simple conceptually: –Each node reports to the master detection node –The master detection node builds and analyzes the WFG –The master detection node manages resolution when a deadlock is detected Some serious problems: –Single point of failure –Network congestion issues –False deadlock detection 62

64 Total Centralized Control (cont)
The Ho-Ramamoorthy Algorithms:
– Two phase (can be for AND or OR model)
– Each site has a status table of locked and waited resources.
– The control site periodically asks for this table from each node.
– The control site searches for cycles and, if any are found, requests the table again from each node.
– Only the information common to both reports is analyzed for confirmation of a cycle.
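
The confirmation step of the two-phase algorithm amounts to intersecting two reports before searching for a cycle. In the sketch below each report is modelled as a set of wait-for edges (waiter, holder), which is a simplifying assumption; the real status tables list locked and waited resources. The function names are hypothetical.

```python
def has_cycle(edges):
    """Depth-first search for a directed cycle in a set of (u, v) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(u):
        state[u] = 1
        for v in graph.get(u, ()):
            if state.get(v) == 1:
                return True
            if v not in state and visit(v):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(graph))


def confirmed_deadlock(first_report, second_report):
    """Report a deadlock only if a cycle survives in both reports."""
    if not has_cycle(first_report):
        return False
    return has_cycle(set(first_report) & set(second_report))


first = {("P1", "P2"), ("P2", "P3"), ("P3", "P1")}
second = {("P1", "P2"), ("P2", "P3")}   # P3 no longer waits for P1
```

Here the first report contains a cycle, but the edge P3 -> P1 is gone by the second report, so the cycle is not confirmed.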

65 Total Centralized Control (cont) The Ho-Ramamoorthy Algorithms (cont) –One phase (can be for AND or OR model) each site keeps 2 tables; process status and resource status the control site will periodically ask for these tables (both together in a single message) from each node the control site will then build and analyze the WFG, looking for cycles and resolving them when found 64

66 Distributed Control
Each node has the same responsibility for, and will expend the same amount of effort in, detecting deadlock.
– The WFG becomes an abstraction, with any single node knowing just some small part of it.
– Generally detection is launched from a site when some thread at that site has been waiting for a long time on a resource request message.

67 Distributed Control Models
Four common models are used in building distributed deadlock control algorithms:
– Path-pushing: path info is sent from the waiting node to the blocking node.
– Edge-chasing: probe messages are sent along graph edges.
– Diffusion computation: echo messages are sent along graph edges.
– Global state detection: sweep-out, sweep-in WFG construction and reduction.

68 Path-pushing
Obermarck's algorithm for path propagation (an AND model):
– It is based on a database model using transaction processing.
– Sites which detect a cycle in their partial WFG views convey the paths discovered to members of the (totally ordered) transaction.
– The highest priority transaction detects the deadlock: Ex => T1 => T2 => Ex
– The algorithm can detect phantom deadlocks due to its asynchronous snapshot method.

69 Edge Chasing Algorithms
Chandy-Misra-Haas Algorithm (an AND model):
– Probe messages M(i, j, k) are initiated by Pj for Pi and sent to Pk.
– Probe messages work their way through the WFG, and if they return to the sender, a deadlock is detected.
– Make sure you can follow the example in Figure 7.1 of the book.

70 Chandy-Misra-Haas Algorithm [Figure: WFG of processes P1 to P10 across sites S1, S2, S3. P1 launches the detection; the probes shown include Probe (1, 3, 4), Probe (1, 6, 8), Probe (1, 7, 10), and Probe (1, 9, 1).]
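
Edge chasing can be simulated on a single machine by pushing probe triples through the WFG. This is a minimal sketch under stated assumptions: the edge list is read off the message trace used in the diffusion computation example later in the deck, the function name is hypothetical, and site boundaries are ignored.

```python
from collections import deque


def probe_detects_deadlock(wfg, initiator):
    """AND model. wfg maps each blocked process to the processes it waits for.

    A probe (i, j, k) is sent on behalf of initiator i by process j to
    process k. A deadlock is declared when a probe arrives back at i.
    """
    probes = deque((initiator, initiator, k) for k in wfg.get(initiator, ()))
    forwarded = set()  # processes that already forwarded a probe for i
    while probes:
        i, j, k = probes.popleft()
        if k == i:
            return True
        if k in forwarded or not wfg.get(k):
            continue   # duplicate probe, or k is not blocked: discard
        forwarded.add(k)
        probes.extend((i, k, m) for m in wfg[k])
    return False


WFG = {1: {2}, 2: {3}, 3: {4}, 4: {5}, 5: {6, 7},
       6: {8}, 7: {10}, 8: {9}, 10: {9}, 9: {1}}
deadlocked = probe_detects_deadlock(WFG, 1)

# Without the edge P9 -> P1 the probes die out at P9.
no_cycle = {p: set(w) for p, w in WFG.items()}
no_cycle[9] = set()
not_deadlocked = probe_detects_deadlock(no_cycle, 1)
```

The final probe in the first run is (1, 9, 1), sent by P9 to P1, which is the probe that closes the cycle.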

71 Edge Chasing Algorithms (cont)
Mitchell-Merritt Algorithm (an AND model):
– propagates messages in the reverse direction
– uses public-private labeling of messages
– messages may replace their labels at each site
– when a message arrives at a site with a matching public label, a deadlock is detected (only by the process with the largest public label in the cycle), which normally resolves it by self-destructing

72 Mitchell-Merritt Algorithm [Figure: WFG of processes P1 to P10 across sites S1, S2, S3, annotated with labels: Public 1 => 3, Private 1; Public 3, Private 3; Public 2 => 3, Private 2.]
1. P6 initially asks P8 for its Public label and changes its own from 2 to 3.
2. P3 asks P4 and changes its Public label from 1 to 3.
3. P9 asks P1, finds its own Public label 3, and thus detects the deadlock P1=>P2=>P3=>P4=>P5=>P6=>P8=>P9=>P

73 Diffusion Computation
Deadlock detection computations are diffused through the WFG of the system.
– Queries are sent from a computation (process or thread) on a node and diffused across the edges of the WFG.
– When a query reaches an active (non-blocked) computation, the query is discarded. When a query reaches a blocked computation, the query is echoed back to the originator when (and if) all outstanding queries of the blocked computation are returned to it.
– If all queries sent are echoed back to an initiator, there is deadlock.

74 Diffusion Computation of Chandy et al (an OR model)
– A waiting computation on node x periodically sends queries to all computations it is waiting for (the dependent set), marked with the originator ID and target ID.
– Each of these computations in turn will query its dependent set members (only if it is blocked itself), marking each query with the originator ID, its own ID, and a new target ID it is waiting on.
– A computation cannot echo a reply to its requestor until it has received replies from its entire dependent set, at which time it sends a reply marked with the originator ID, its own ID, and the most distant dependent ID.
– When (and if) the original requestor receives echo replies from all members of its dependent set, it can declare a deadlock when an echo reply's originator ID and most distant ID are its own.

75 Diffusion Computation of Chandy et al [Figure: WFG of processes P1 to P10 across sites S1, S2, S3.]

76 Diffusion Computation of Chandy et al
Queries:
– P1 => P2: message at P2 from P1 (P1, P1, P2)
– P2 => P3: message at P3 from P2 (P1, P2, P3)
– P3 => P4: message at P4 from P3 (P1, P3, P4)
– P4 => P5, P5 => P6, P5 => P7, P6 => P8, P7 => P10: etc.
– P8 => P9 (P1, P8, P9), now reply (P1, P9, P1)
– P10 => P9 (P1, P10, P9), now reply (P1, P9, P1)
Replies:
– P8 <= P9: reply (P1, P9, P8)
– P10 <= P9: reply (P1, P9, P10)
– P6 <= P8: reply (P1, P8, P6)
– P7 <= P10: reply (P1, P10, P7)
– P5 <= P6, P5 <= P7, P4 <= P5, P3 <= P4, P2 <= P3: etc.
– P1 <= P2: reply (P1, P2, P1)
P5 cannot reply until both the P6 and P7 replies arrive.
Figure callouts: end condition; deadlock condition.

77 Global State Detection Based on 2 facts of distributed systems: –A consistent snapshot of a distributed system can be obtained without freezing the underlying computation –A consistent snapshot may not represent the system state at any moment in time, but if a stable property holds in the system before the snapshot collection is initiated, this property will still hold in the snapshot 76

78 Global State Detection (the P-out-of-Q request model)
– The Kshemkalyani-Singhal algorithm is demonstrated in the text.
– An initiator computation snapshots the system by sending FLOOD messages along all its outbound edges in an outward sweep.
– A computation receiving a FLOOD message either returns an ECHO message (if it has no dependencies itself) or propagates the FLOOD message to its dependencies.
– An ECHO message is analogous to dropping a request edge in a resource allocation graph (RAG).
– As ECHOs arrive in response to FLOODs, the region of the WFG the initiator is involved with becomes reduced.
– If a dependency does not return an ECHO by termination, that node represents part (or all) of a deadlock with the initiator.
– Termination is achieved by summing weighted ECHO and SHORT messages (returning the initial FLOOD weights).

79 Hierarchical Deadlock Detection These algorithms represent a middle ground between fully centralized and fully distributed Sets of nodes are required to report periodically to a control site node (as with centralized algorithms) but control sites are organized in a tree The master control site forms the root of the tree, with leaf nodes having no control responsibility, and interior nodes serving as controllers for their branches 78

80 Hierarchical Deadlock Detection [Figure: control tree with the master control node at the root, followed by level 1, level 2, and level 3 control nodes.]

81 Hierarchical Deadlock Detection The Menasce-Muntz Algorithm –Leaf controllers allocate resources –Branch controllers are responsible for the finding deadlock among the resources that their children span in the tree –Network congestion can be managed –Node failure is less critical than in fully centralized –Detection can be done many ways: Continuous allocation reporting Periodic allocation reporting 80

82 Hierarchical Deadlock Detection (contd) The Ho-Ramamoorthy Algorithm –Uses only 2 levels Master control node Cluster control nodes –Cluster control nodes are responsible for detecting deadlock among their members and reporting dependencies outside their cluster to the Master control node (they use the one phase version of the Ho-Ramamoorthy algorithm discussed earlier for centralized detection) –The Master control node is responsible for detecting intercluster deadlocks –Node assignment to clusters is dynamic 81

83 82 Agreement Protocols

84 When distributed systems engage in cooperative efforts like enforcing distributed mutual exclusion algorithms, processor failure can become a critical factor Processors may fail in various ways, and their failure modes and communication interfaces are central to the ability of healthy processors to detect and respond to such failures 83

85 The System Model
– There are n processors in the system, and at most m of them can be faulty.
– The processors can communicate directly with other processors via messages (fully connected system).
– A receiver computation always knows the identity of a sending computation.
– The communication system is pipelined and reliable.

86 Faulty Processors May fail in various ways –Drop out of sight completely –Start sending spurious messages –Start to lie in its messages (behave maliciously) –Send only occasional messages (fail to reply when expected to) May believe themselves to be healthy Are not known to be faulty initially by non- faulty processors 85

87 Communication Requirements
Synchronous communication is assumed in this section:
– Healthy processors receive, process, and reply to messages in a lockstep manner.
– The receive, process, reply sequence is called a round.
– In the synchronous-communication model, processes know what messages they expect to receive in a round.
The synchronous model is critical to agreement protocols: the agreement problem is not solvable in an asynchronous system if even one processor may fail.

88 Processor Failures Crash fault –Abrupt halt, never resumes operation Omission fault –Processor omits to send required messages to some other processors Malicious fault –Processor behaves randomly and arbitrarily –Known as Byzantine faults 87

89 Authenticated vs. Non-Authenticated Messages Authenticated messages (also called signed messages) –assure the receiver of correct identification of the sender –assure the receiver that the message content was not modified in transit Non-authenticated messages (also called oral messages) –are subject to intermediate manipulation –may lie about their origin 88

90 Authenticated vs. Non-Authenticated Messages (contd) To be generally useful, agreement protocols must be able to handle non-authenticated messages The classification of agreement problems include: –The Byzantine agreement problem –The consensus problem –the interactive consistency problem 89

91 Agreement Problems

Problem                    Who initiates value    Final agreement
Byzantine Agreement        One Processor          Single Value
Consensus                  All Processors         Single Value
Interactive Consistency    All Processors         A Vector of Values

92 Agreement Problems (contd) Byzantine Agreement –One processor broadcasts a value to all other processors –All non-faulty processors agree on this value, faulty processors may agree on any (or no) value Consensus –Each processor broadcasts a value to all other processors –All non-faulty processors agree on one common value from among those sent out. Faulty processors may agree on any (or no) value 91

93 Interactive Consistency
– Each processor broadcasts a value to all other processors.
– All non-faulty processors agree on the same vector of values such that v_i is the initial broadcast value of non-faulty processor i.
– Faulty processors may agree on any (or no) value.

94 Agreement Problems (contd) The Byzantine Agreement problem is a primitive to the other 2 problems The focus here is thus the Byzantine Agreement problem Lamport showed the first solutions to the problem –An initial broadcast of a value to all processors –A following set of messages exchanged among all (healthy) processors within a set of message rounds 93

95 The Byzantine Agreement problem
The upper bound on the number of faulty processors:
– It is impossible to reach agreement (in a fully connected network) if the number of faulty processors m exceeds (n - 1) / 3 (from Pease et al).
– Lamport et al were the first to provide a protocol to reach Byzantine agreement, which requires m + 1 rounds of message exchanges.
– Fischer et al showed that m + 1 rounds is the lower bound to reach agreement in a fully connected network where only processors are faulty.
– Thus, in a three-processor system with one faulty processor, agreement cannot be reached.
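
The bounds above translate into simple arithmetic. A minimal sketch follows; the function names are hypothetical.

```python
def max_tolerable_faults(n):
    """Largest m that does not exceed (n - 1) / 3."""
    return (n - 1) // 3


def agreement_possible(n, m):
    """Byzantine agreement with oral messages needs n >= 3m + 1."""
    return n >= 3 * m + 1


def min_rounds(m):
    """Lower bound on rounds of message exchange: m + 1."""
    return m + 1


for n in (3, 4, 7, 10):
    print(n, max_tolerable_faults(n))
```

Three processors tolerate no faulty processor at all, while four processors tolerate one and need at least two rounds.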

96 Lamport - Shostak - Pease Algorithm
The Oral Message algorithm OM(m), with m > 0 (some faulty processors), solves the Byzantine agreement problem for 3m + 1 or more processors with at most m faulty processors.
– The initiator sends n - 1 messages, one to every other processor, to start the algorithm.
– Every other processor begins OM(m - 1) activity, sending messages to n - 2 processors.
– Each of these messages causes OM(m - 2) activity, etc., until OM(0) is reached, when the algorithm stops.
– When the algorithm stops, each processor has input from all others and chooses the majority value as its value.

97 Lamport - Shostak - Pease Algorithm (contd) The algorithm has O(n^m) message complexity, with m + 1 rounds of message exchange, where n ≥ 3m + 1. – See the examples on page in the book, where, with 4 nodes, m can only be 1 and the OM(1) and OM(0) rounds must be exchanged. – The algorithm meets the Byzantine conditions: a single value is agreed upon by healthy processors, and that single value is the initiator's value if the initiator is non-faulty.
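The recursion of OM(m) can be sketched as a small single-process simulation. This is a hedged illustration, not the book's code: the way a faulty processor lies (`send` flips values by receiver parity), the default value `RETREAT`, and all function names are assumptions made for the example.

```python
from collections import Counter

DEFAULT = "RETREAT"  # assumed default when no strict majority exists


def majority(values, default=DEFAULT):
    """Strict majority of the received values, else the default."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def send(sender, receiver, value, faulty):
    """Hypothetical fault model: a faulty sender tells even-numbered
    receivers ATTACK and odd-numbered receivers RETREAT."""
    if sender in faulty:
        return "ATTACK" if receiver % 2 == 0 else "RETREAT"
    return value


def om(m, commander, lieutenants, value, faulty):
    """Simulate OM(m); returns {lieutenant: value it settles on}."""
    received = {l: send(commander, l, value, faulty) for l in lieutenants}
    if m == 0:
        return received  # OM(0): use the value received directly
    # Each lieutenant j relays what it received by acting as the
    # commander of OM(m - 1) towards the other lieutenants.
    relayed = {}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        relayed[j] = om(m - 1, j, others, received[j], faulty)
    decision = {}
    for i in lieutenants:
        votes = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decision[i] = majority(votes)
    return decision


if __name__ == "__main__":
    # n = 4, m = 1: loyal commander 0, faulty lieutenant 3.
    print(om(1, 0, [1, 2, 3], "ATTACK", faulty={3}))
    # n = 3, m = 1: too few processors, loyal lieutenant 1 is misled.
    print(om(1, 0, [1, 2], "ATTACK", faulty={2}))
```

In the four-processor run the loyal lieutenants 1 and 2 both settle on the commander's value; in the three-processor run the loyal lieutenant cannot, which is consistent with the n ≥ 3m + 1 requirement.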

98 Dolev et al Algorithm Since the message complexity of the Oral Message algorithm is exponential in m (O(n^m)), polynomial solutions were sought. Dolev et al. found an algorithm which runs with polynomial message complexity and requires 2m + 3 rounds to reach agreement. The algorithm is a trade-off between message complexity and time delay (rounds): fewer messages at the cost of more rounds. See the description of the algorithm on page 87.

99 Additional Considerations to Dolev Consider the case where n > 3m + 1: – More messages are sent than needed. – A set of processors of size 3m + 1 can be selected (called active processors), and message exchange can be limited to these processors. – All active and passive processors using Dolev's algorithm in this way reach Byzantine agreement in 2m + 3 rounds of these limited messages.

100 Applications Atomic commit in a distributed database system: In a distributed system, each site performs its part of the transaction independently and decides individually whether to commit or abort. Once a site has decided, it transfers its decision to all the others. The final decision is then taken according to the common agreement. In this way atomic commit follows the Byzantine agreement approach to the problem.

101 Atomic Commit Protocol Two-phase commit protocol: the most commonly used atomic commit protocol. Implemented as an exchange of messages between the coordinator and the cohorts. Guarantees global atomicity of the transaction even if failures occur while the protocol is executing.
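The coordinator/cohort exchange can be sketched as follows. This is a minimal in-memory illustration under strong assumptions: there is no network, no timeout handling and no logging, so it shows only the failure-free message pattern. The `Cohort` class and its method names are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Cohort:
    """Hypothetical participant holding one part of the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        # Phase 1 reply: vote on whether the local part can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.COMMIT if self.can_commit else Vote.ABORT

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if decision is Vote.COMMIT else "aborted"


def two_phase_commit(cohorts):
    """Coordinator logic: commit only if every cohort votes COMMIT."""
    votes = [c.prepare() for c in cohorts]  # phase 1: collect votes
    unanimous = all(v is Vote.COMMIT for v in votes)
    decision = Vote.COMMIT if unanimous else Vote.ABORT
    for c in cohorts:  # phase 2: broadcast the decision
        c.finish(decision)
    return decision


if __name__ == "__main__":
    group = [Cohort("A"), Cohort("B", can_commit=False)]
    print(two_phase_commit(group), [c.state for c in group])
```

A single ABORT vote forces every cohort to abort, which is the all-or-nothing property the slide calls global atomicity.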



104 Algorithms for Deadlock Detection in Distributed Systems Overview: Deadlocks – An Introduction. Deadlocks in Distributed Systems. Deadlock Handling Techniques. Deadlock Detection Algorithms. Summary.

105 Deadlocks – An Introduction What are deadlocks? A deadlock is a state in which a blocked process can never proceed unless there is some outside intervention. For example: resource R1 is requested by process P1 but is held by process P2.

106 Illustrating A Deadlock Wait-For Graph (WFG): Nodes are the processes in the system. Directed edges are the wait-for (blocking) relation. A cycle represents a deadlock. Starvation: a process's execution is permanently halted. [Figure: Process 1 and Process 2 with Resource 1 and Resource 2, linked by "Waits For" and "Held By" edges.]
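Since a cycle in the WFG represents a deadlock, detection reduces to cycle search in a directed graph. The sketch below uses a depth-first search; the dictionary representation of the graph and the name `find_cycle` are choices made for this example.

```python
def find_cycle(wfg):
    """Return the processes on one wait-for cycle, or None.

    wfg maps each process to the processes it is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(proc):
        color[proc] = GREY
        path.append(proc)
        for holder in wfg.get(proc, ()):
            state = color.get(holder, WHITE)
            if state == GREY:  # back edge: holder is on the current path
                return path[path.index(holder):]
            if state == WHITE:
                found = visit(holder)
                if found:
                    return found
        path.pop()
        color[proc] = BLACK
        return None

    for proc in list(wfg):
        if color.get(proc, WHITE) == WHITE:
            cycle = visit(proc)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle({"P1": ["P2"], "P2": ["P1"]}))  # a two-process cycle
    print(find_cycle({"P1": ["P2"], "P2": []}))      # no cycle
```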

107 Causes Of Deadlocks Mutual Exclusion: resources being held must be in non-shareable mode. Hold and Wait: a process is holding one resource and is waiting for another, which is held by another process. No Preemption: a resource cannot be preempted even if it is being requested. Circular Wait: presence of a cycle of waiting processes.

108 Deadlocks in Distributed Systems Resource Deadlock: most common; occurs when a requested resource is not available. Communication Deadlock: a process waits for certain messages before it can proceed.

109 Handling Deadlocks Deadlock Avoidance: Only fulfill those resource requests that won't cause deadlock in the future. Simulate the resource allocation and determine whether the resultant state is safe or not. Drawbacks: Inefficient. Requires prior resource requirement information for all processes. Scales poorly (high cost).
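One well-known way to "simulate the allocation and test for safety" is a Banker's-style safety check. The slide does not name a specific algorithm, so the sketch below is an assumed instance of the idea; the matrix layout (one row per process, one column per resource type) and the function names are illustrative.

```python
def is_safe(available, allocation, need):
    """True if some order lets every process obtain its remaining need."""
    work = list(available)
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, (alloc, remaining) in enumerate(zip(allocation, need)):
            if not finished[i] and all(r <= w for r, w in zip(remaining, work)):
                # Process i can run to completion and return its resources.
                work = [w + a for w, a in zip(work, alloc)]
                finished[i] = True
                progressed = True
    return all(finished)


def can_grant(request, pid, available, allocation, need):
    """Tentatively grant `request` to process `pid`; keep it only if safe."""
    if any(r > n for r, n in zip(request, need[pid])):
        return False  # asks for more than the declared requirement
    if any(r > a for r, a in zip(request, available)):
        return False  # not enough free resources right now
    new_available = [a - r for a, r in zip(available, request)]
    new_allocation = [row[:] for row in allocation]
    new_need = [row[:] for row in need]
    new_allocation[pid] = [a + r for a, r in zip(allocation[pid], request)]
    new_need[pid] = [n - r for n, r in zip(need[pid], request)]
    return is_safe(new_available, new_allocation, new_need)


if __name__ == "__main__":
    available, allocation, need = [1], [[1], [1]], [[1], [2]]
    print(can_grant([1], 0, available, allocation, need))
    print(can_grant([1], 1, available, allocation, need))
```

The `need` matrix is exactly the prior resource requirement information that the slide lists as a drawback of avoidance.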

110 Handling Deadlocks Deadlock Prevention: Provide all required resources from the start. Prioritize processes and assign resources accordingly. Make prior rules, for example: process P1 cannot request resource R1 unless it releases resource R2. Drawbacks: Inefficient and affects concurrency. Future resource requirements are unpredictable. Starvation is possible.

111 Handling Deadlocks Deadlock Detection: Resource allocation with an optimistic outlook. Periodically examine process status. Detect, then break, the deadlock. Resolution: roll back one or more processes to break the dependency.

112 Deadlock Detection: Control Organizations Centralized Deadlock Detection: one control node (coordinator) maintains the global WFG and searches it for cycles. Distributed Deadlock Detection: each node is equally responsible for maintaining the global WFG and detecting deadlocks. Hierarchical Deadlock Detection: nodes are organized in a tree, where each site detects deadlocks involving only its descendants.

113 Deadlock Detection Algorithms Centralized Deadlock Detection: Ho-Ramamoorthy's one-phase and two-phase algorithms. Distributed Deadlock Detection: Obermarck's path-pushing algorithm; Chandy-Misra-Haas edge-chasing algorithm. Hierarchical Deadlock Detection: Menasce-Muntz algorithm; Ho-Ramamoorthy's algorithm.

114 Centralized Deadlock Detection Ho-Ramamoorthy's 1-Phase Algorithm Each site maintains two status tables: a process table and a resource table. One of the sites becomes the central control site. The central control site periodically asks for the status tables. Contd…

115 Centralized Deadlock Detection Ho-Ramamoorthy's 1-Phase Algorithm Contd… The control site builds the WFG using the status tables. The control site analyzes the WFG and resolves any cycles present. Shortcomings: Phantom deadlocks. High storage and communication costs.
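How a control site might turn collected status tables into a WFG can be sketched as below. The table layout is an assumption made for illustration: each site reports a pair of dictionaries, resources locked per process and resources awaited per process. The function names are hypothetical.

```python
def merge_reports(reports):
    """Combine per-site (locked, awaited) tables into two global tables."""
    locked, awaited = {}, {}
    for site_locked, site_awaited in reports:
        for proc, resources in site_locked.items():
            locked.setdefault(proc, set()).update(resources)
        for proc, resources in site_awaited.items():
            awaited.setdefault(proc, set()).update(resources)
    return locked, awaited


def build_wfg(locked, awaited):
    """Add an edge P -> Q when P awaits a resource that Q has locked."""
    holder = {res: proc for proc, resources in locked.items() for res in resources}
    wfg = {}
    for proc, resources in awaited.items():
        for res in resources:
            owner = holder.get(res)
            if owner is not None and owner != proc:
                wfg.setdefault(proc, set()).add(owner)
    return wfg


def has_cycle(wfg):
    """Depth-first search for a cycle in the wait-for graph."""
    visiting, done = set(), set()

    def visit(node):
        visiting.add(node)
        for nxt in wfg.get(node, ()):
            if nxt in visiting or (nxt not in done and visit(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(wfg))


if __name__ == "__main__":
    site_a = ({"P1": {"R1"}}, {"P1": {"R2"}})  # P1 holds R1, awaits R2
    site_b = ({"P2": {"R2"}}, {"P2": {"R1"}})  # P2 holds R2, awaits R1
    wfg = build_wfg(*merge_reports([site_a, site_b]))
    print(wfg, has_cycle(wfg))
```

Shipping every table to one site on each poll is the storage and communication cost the slide lists as a shortcoming.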

116 Phantom Deadlocks [Figure: processes P0, P1 and P2 and resources R, S and T, spread across System A and System B.] P1 releases resource S and asks for resource T. Two messages are sent to the control site: 1. Releasing S. 2. Waiting for T. Message 2 arrives at the control site first. The control site builds a WFG with a cycle, detecting a phantom deadlock.
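The effect of message reordering can be reproduced in a few lines. The exact resource assignment in the lost figure is not recoverable, so the initial edges below are an assumed scenario: P0 waits for S (held by P1) and P2 waits for a resource held by P0. The function names are illustrative.

```python
def has_cycle(edges):
    """True if the directed wait-for edges contain a cycle."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    def reaches(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return any(reaches(p, p, set()) for p in graph)


def control_site_view(initial_edges, messages):
    """Apply messages in arrival order; record the verdict after each."""
    edges = set(initial_edges)
    verdicts = []
    for kind, edge in messages:
        if kind == "add":
            edges.add(edge)
        else:
            edges.discard(edge)
        verdicts.append(has_cycle(edges))
    return verdicts


# Assumed starting state: P0 waits for P1 (resource S), P2 waits for P0.
INITIAL = {("P0", "P1"), ("P2", "P0")}
RELEASE_S = ("remove", ("P0", "P1"))  # message 1: P1 releases S
WAIT_FOR_T = ("add", ("P1", "P2"))    # message 2: P1 now waits for T (held by P2)

if __name__ == "__main__":
    print("in order:    ", control_site_view(INITIAL, [RELEASE_S, WAIT_FOR_T]))
    print("out of order:", control_site_view(INITIAL, [WAIT_FOR_T, RELEASE_S]))
```

When the messages arrive in the order they were sent, no cycle is ever seen; when message 2 overtakes message 1, the control site briefly sees a cycle that never existed in the real system.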

117 Centralized Deadlock Detection Ho-Ramamoorthy's 2-Phase Algorithm Each site maintains a status table for its processes: resources locked and resources awaited. Phase 1: The control site periodically asks for these locked and awaited tables. It then searches for the presence of cycles in these tables. Contd…

118 Centralized Deadlock Detection Ho-Ramamoorthy's 2-Phase Algorithm Contd… Phase 2: If cycles are found in the phase 1 search, the control site makes a second request for the tables. Only the details common to both table requests are analyzed for cycle confirmation. Shortcomings: Phantom deadlocks.
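The confirmation step can be sketched by intersecting two reports. As a simplification, each report is modeled here directly as a set of (waiter, holder) edges rather than as locked and awaited tables; that representation and the function names are assumptions for the example.

```python
def has_cycle(edges):
    """True if the directed wait-for edges contain a cycle."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
    visiting, done = set(), set()

    def visit(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and visit(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(graph))


def two_phase_check(first_report, second_report):
    """Phase 1: look for a cycle in the first report.
    Phase 2: confirm it using only edges present in both reports."""
    if not has_cycle(first_report):
        return False
    return has_cycle(first_report & second_report)


if __name__ == "__main__":
    first = {("P0", "P1"), ("P2", "P0"), ("P1", "P2")}
    stale = {("P2", "P0"), ("P1", "P2")}  # P0 -> P1 has disappeared
    print(two_phase_check(first, stale))  # cycle not confirmed
    print(two_phase_check(first, first))  # cycle persists
```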

119 Distributed Deadlock Detection Obermarck's Path-Pushing Algorithm Individual sites maintain a local WFG. A virtual node x exists at each site; node x represents external processes. Detection process: Case 1: if site S_n finds a cycle not involving x, a deadlock exists. Case 2: if site S_n finds a cycle involving x, a deadlock is possible. Contd…

120 Obermarck's Path-Pushing Algorithm Contd… In Case 2: Site S_n sends a message containing its detected cycles to other sites. All sites receive the message, update their WFG and re-evaluate the graph. Consider site S_j receiving the message: S_j checks for local cycles. If a cycle not involving x (of S_j) is found, a deadlock exists. If S_j finds a cycle involving x, it forwards the message to other sites. The process continues until a deadlock is found.
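The local test that each site performs (Case 1 versus Case 2) can be sketched as below. This covers only the classification step, not the message passing between sites; the node label `"x"` follows the slides, while `classify` and the graph layout are assumptions for the example.

```python
EXTERNAL = "x"  # virtual node standing for processes at other sites


def _has_cycle(graph):
    visiting, done = set(), set()

    def visit(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and visit(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(graph))


def classify(local_wfg):
    """Return 'deadlock' (Case 1), 'possible' (Case 2) or 'none'."""
    # Drop x and every edge into it to look for purely local cycles.
    internal = {
        proc: {q for q in waits if q != EXTERNAL}
        for proc, waits in local_wfg.items()
        if proc != EXTERNAL
    }
    if _has_cycle(internal):
        return "deadlock"
    if _has_cycle(local_wfg):
        return "possible"  # the cycle passes through x: ask other sites
    return "none"


if __name__ == "__main__":
    print(classify({"P1": {"P2"}, "P2": {"P1"}}))
    print(classify({"x": {"P1"}, "P1": {"P2"}, "P2": {"x"}}))
```

A "possible" result is what triggers sending the detected path to other sites in the steps above.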

121 Distributed Deadlock Detection Chandy-Misra-Haas Edge-Chasing Algorithm The blocked process sends a probe message to the process holding the resource. The probe message contains: the ID of the blocked process; the ID of the process sending the message; the ID of the process to which the message is sent. When a probe is received by a blocked process, it forwards the probe to the processes holding the resources it has requested. If the blocked process receives its own probe, a deadlock exists.
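Probe forwarding can be simulated in one process with a message queue. The sketch below is a simplification: it ignores sites and network delays, and the rule that a process forwards a given initiator's probe only once is an assumption added so the simulation terminates. Names are illustrative.

```python
from collections import deque


def detect_deadlock(initiator, waits_for):
    """Simulate edge chasing started by a blocked `initiator`.

    waits_for[p] is the set of processes holding resources that p is
    blocked on; a running process has no entry or an empty set.
    Each probe is a triple (initiator, sender, receiver).
    """
    probes = deque(
        (initiator, initiator, holder) for holder in waits_for.get(initiator, ())
    )
    forwarded = set()  # processes that already forwarded this probe
    while probes:
        origin, sender, receiver = probes.popleft()
        if receiver == origin:
            return True  # the probe came back to its initiator
        if receiver in forwarded:
            continue
        forwarded.add(receiver)
        # Only a blocked receiver has holders to forward the probe to.
        for holder in waits_for.get(receiver, ()):
            probes.append((origin, receiver, holder))
    return False


if __name__ == "__main__":
    ring = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
    chain = {"P1": {"P2"}, "P2": {"P3"}, "P3": set()}
    print(detect_deadlock("P1", ring), detect_deadlock("P1", chain))
```

Because only small probe messages travel along wait-for edges, no site ever needs to assemble the global WFG.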

122 Hierarchical Deadlock Detection Menasce-Muntz Algorithm Sites (controllers) are organized in a tree structure. Leaf controllers manage the local WFG. Upper controllers handle deadlock detection. Each parent node maintains a global WFG that is the union of the WFGs of its children, and detects deadlocks among its children. Changes are propagated upwards in the tree.
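The tree of controllers can be sketched as below, following only the slide's description (a parent's WFG is the union of its children's). The published algorithm is more involved, so treat this as a simplified model; the `Controller` class and its methods are hypothetical.

```python
def has_cycle(wfg):
    visiting, done = set(), set()

    def visit(node):
        visiting.add(node)
        for nxt in wfg.get(node, ()):
            if nxt in visiting or (nxt not in done and visit(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in list(wfg))


class Controller:
    """A node in the controller tree; leaves carry a local WFG."""

    def __init__(self, name, children=(), local_wfg=None):
        self.name = name
        self.children = list(children)
        self.local_wfg = local_wfg or {}

    def wfg(self):
        """Own edges plus the union of all descendants' edges."""
        combined = {p: set(qs) for p, qs in self.local_wfg.items()}
        for child in self.children:
            for proc, waits in child.wfg().items():
                combined.setdefault(proc, set()).update(waits)
        return combined

    def lowest_detector(self):
        """Name of the lowest controller whose WFG shows a cycle."""
        for child in self.children:
            found = child.lowest_detector()
            if found is not None:
                return found
        return self.name if has_cycle(self.wfg()) else None


if __name__ == "__main__":
    site_a = Controller("A", local_wfg={"P1": {"P2"}})
    site_b = Controller("B", local_wfg={"P2": {"P1"}})
    print(Controller("root", [site_a, site_b]).lowest_detector())
```

In the usage example neither leaf sees a cycle on its own; it only appears once the parent unions the two graphs.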

123 Hierarchical Deadlock Detection Ho-Ramamoorthy's Algorithm Sites are grouped into clusters. Periodically, one site is chosen as the central control site; the central control site chooses a control site for each of the other clusters. The control site of each cluster collects the status tables there, using Ho-Ramamoorthy's 1-phase centralized deadlock detection algorithm. All control sites forward their status reports to the central control site, which combines the WFGs and performs a cycle search.

124 Summary Centralized Deadlock Detection Algorithms: Large communication overhead. Coordinator is a performance bottleneck. Possible single point of failure. Distributed Deadlock Detection Algorithms: High complexity. Detection of phantom deadlocks possible. Hierarchical Deadlock Detection Algorithms: Most common. Efficient.

125 "Choose the least general technique which is still general enough to solve the problem." Edgar Knapp. Thank you.
