DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S

Presentation transcript:

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 6 Synchronization Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5

Clock Synchronization Figure 6-1. When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.

Physical Clocks (1) Figure 6-2. Computation of the mean solar day.

Physical Clocks (2) Figure 6-3. International Atomic Time (TAI) seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep in phase with the sun.

Global Positioning System (1) Figure 6-4. Computing a position in a two-dimensional space.

Global Positioning System (2) Real-world facts that complicate GPS: (1) It takes a while before data on a satellite’s position reaches the receiver. (2) The receiver’s clock is generally not in synch with that of a satellite.

Clock Synchronization Algorithms Figure 6-5. The relation between clock time and UTC when clocks tick at different rates. Slide annotations: correction to local clock C; time of synch; can't let time “run backward”.

Network Time Protocol Figure 6-6. Getting the current time from a time server.

NTP (2) Assume the delay is roughly symmetric. Skew = [(T2-T1)+(T3-T4)]/2. Take many (8) pairs of (skew, delay) and use the one with the best (least) delay. Clocks have “strata”: the lowest stratum is best, and a node only adjusts its clock if its own stratum is higher.
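The skew formula above can be tried out with a small sketch. It assumes T1 is the client's send time, T2 the server's receive time, T3 the server's send time and T4 the client's receive time, and it assumes the round-trip delay is measured as (T4-T1)-(T3-T2). The function names are hypothetical and this is not NTP's actual implementation.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Estimate clock skew and round-trip delay from one request/reply exchange.

    t1: client send time, t2: server receive time,
    t3: server send time, t4: client receive time (assumed meanings).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # skew formula from the slide
    delay = (t4 - t1) - (t3 - t2)          # assumed round-trip delay measure
    return offset, delay


def best_sample(samples):
    """Pick the (offset, delay) pair with the least delay among several exchanges."""
    return min((offset_and_delay(*s) for s in samples), key=lambda pair: pair[1])
```

With a server that is 13 time units ahead and a one-way delay of 2 in each direction, `offset_and_delay(10, 25, 27, 16)` yields an offset of 13 and a delay of 4.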

The Berkeley Algorithm (1) Figure 6-7. (a) The time daemon asks all the other machines for their clock values.

The Berkeley Algorithm (2) Figure 6-7. (b) The machines answer.

The Berkeley Algorithm (3) Figure 6-7. (c) The time daemon tells everyone how to adjust their clock.
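The three steps of the Berkeley algorithm can be sketched as a single averaging function. This is a minimal illustration that assumes the daemon averages its own clock together with all reported values and ignores message delay; the function name and the sample values are hypothetical.

```python
def berkeley_adjustments(daemon_time, reported_times):
    """Return the adjustment each machine should apply, daemon first.

    Times are plain numbers (for example, minutes since midnight).
    """
    clocks = [daemon_time] + list(reported_times)
    average = sum(clocks) / len(clocks)
    # A positive value means "advance your clock", a negative one "slow down".
    return [average - clock for clock in clocks]
```

For a daemon at 3:00 and two machines at 2:50 and 3:25 (180, 170 and 205 minutes), the average is 3:05, so the adjustments are +5, +15 and -20 minutes.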

Clock Synchronization in Wireless Networks (1) Figure 6-8. (a) The usual critical path in determining network delays.

Clock Synchronization in Wireless Networks (2) Figure 6-8. (b) The critical path in the case of RBS. Reference Broadcast Synchronization (RBS): don't reset the clock, just track offsets using logarithmic decay.

Clock Types Time-of-day (wall): local clock (oscillator), global reference, distributed clock. Logical (event ordering): Lamport clock, vector clock, matrix clock.

Lamport’s Logical Clocks (1) The "happens-before" relation → can be observed directly in two situations: (1) If a and b are events in the same process, and a occurs before b, then a → b is true. (2) If a is the event of a message being sent by one process, and b is the event of the message being received by another process, then a → b.

Lamport’s Logical Clocks (2) Figure 6-9. (a) Three processes, each with its own clock. The clocks run at different rates.

Lamport’s Logical Clocks (3) Figure 6-9. (b) Lamport’s algorithm corrects the clocks.

Lamport’s Logical Clocks (4) Figure 6-10. The positioning of Lamport’s logical clocks in distributed systems.

Lamport’s Logical Clocks (5) Updating counter Ci for process Pi: (1) Before executing an event, Pi executes Ci ← Ci + 1 (or Ci ← Ci + Δi in general). (2) When process Pi sends a message m to Pj, it sets m’s timestamp ts (m) equal to Ci after having executed the previous step. (3) Upon the receipt of a message m, process Pj adjusts its own local counter as Cj ← max{Cj , ts (m)}, after which it executes the first step and delivers the message to the application. Use the process ID to break ties if necessary.
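The counter update rules can be sketched as a small class. The class and method names are hypothetical, the increment is fixed at 1, and message transport is left out.

```python
class LamportClock:
    """Logical clock for one process, following the three update steps."""

    def __init__(self, pid):
        self.pid = pid
        self.counter = 0

    def tick(self):
        # Step 1: increment before executing any event.
        self.counter += 1
        return self.counter

    def send(self):
        # Step 2: sending is an event; the message carries the new value.
        return self.tick()

    def receive(self, ts):
        # Step 3: take the maximum, then apply step 1 before delivery.
        self.counter = max(self.counter, ts)
        return self.tick()

    def stamp(self):
        # (counter, pid) pairs compare lexicographically, so the process ID
        # breaks ties between equal counters.
        return (self.counter, self.pid)
```

Comparing `stamp()` tuples gives a total order: `(2, 1) < (2, 2)` because the counters tie and the lower process ID wins.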

Lamport’s Logical Clocks (6) Do we obtain a total order on events? Yes: a single integer, with ties broken using the process ID. What information does a Lamport clock convey? ei → ej => ts(ei) < ts(ej)? Yes, the timestamp of a causally dependent event must be larger because of how the protocol works. ts(ei) < ts(ej) => ei → ej? No, unrelated events can have either order.

Group Messaging Properties Two related properties of interest. Ordering: FIFO (per sender), causal, total. Reliability: best effort, duplicate detection, omission detection/recovery per receiver, atomic to all receivers (all or none). Digital Fountain – fountain codes.

Example: Totally Ordered Multicasting Figure 6-11. Updating a replicated database and leaving it in an inconsistent state. Key concerns are: do operations commute (final state same); what values do clients see (intermediate states).

Causal Message Delivery We want a method by which messages can be delivered in causal order, not necessarily in some total order across all processes. Lamport logical clocks impose a total order that obeys causal order, but this is too restrictive (messages that are not causally related have a delivery order imposed on them unnecessarily). Vector clocks can capture causality information precisely.

Vector Clocks (1) Figure 6-12. Concurrent message transmission using logical clocks.

Vector Clocks (2) Vector clocks are constructed by letting each process Pi maintain a vector VCi with the following two properties: (1) VCi [ i ] is the number of events that have occurred so far at Pi. In other words, VCi [ i ] is the local logical clock at process Pi . (2) If VCi [ j ] = k then Pi knows that k events have occurred at Pj. It is thus Pi’s knowledge of the local time at Pj .

Vector Clocks (3) Steps carried out to accomplish property 2 of the previous slide and enforce causal delivery: (1) Before executing an event (e.g., message send), Pi executes VCi [ i ] ← VCi [i ] + 1. (2) When process Pi sends a message m to Pj, it sets m’s (vector) timestamp ts (m) equal to VCi after having executed the previous step. (3) Upon the receipt of a message m from Pi, process Pj adjusts its own vector by setting VCj [k ] ← max{VCj [k ], ts (m)[k ]} for each k, after which it delivers the message to the application, unless VCj[k] < ts(m)[k] for some k other than i, or VCj[i] + 1 < ts(m)[i], in which case it buffers m.
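The buffering rule in step 3 can be sketched as follows. This sketch assumes that only message sends count as events and that the delivery test is evaluated against the receiver's vector before the element-wise maximum is taken. The class and method names are hypothetical, and the network is replaced by direct method calls.

```python
class CausalProcess:
    """One process that holds back messages until causal delivery is safe."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n      # vector clock, one entry per process
        self.buffer = []       # received but not yet deliverable messages
        self.delivered = []    # payloads handed to the application, in order

    def send(self, payload):
        self.vc[self.pid] += 1                   # step 1
        return (self.pid, list(self.vc), payload)  # step 2: attach a copy of VC

    def _deliverable(self, sender, ts):
        # Buffer if an earlier message from the sender is still missing ...
        if self.vc[sender] + 1 < ts[sender]:
            return False
        # ... or if the sender had seen a message we have not seen yet.
        return all(self.vc[k] >= ts[k] for k in range(len(ts)) if k != sender)

    def receive(self, message):
        self.buffer.append(message)
        progress = True
        while progress:                          # retry after every delivery
            progress = False
            for msg in list(self.buffer):
                sender, ts, payload = msg
                if self._deliverable(sender, ts):
                    self.buffer.remove(msg)
                    self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
                    self.delivered.append(payload)
                    progress = True
```

If P1 sends m2 after having delivered P0's m1, a third process that receives m2 first keeps it buffered until m1 arrives, then delivers m1 followed by m2.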

Enforcing Causal Communication Figure 6-13. Enforcing causal communication.

Enforcing Causal Communication Figure 6-13. Enforcing causal communication, annotated with the vector timestamps held at P0 through P3: (0000), (1000), (1001), (1101) and (0001).

Atomic Multicast We want all nodes in group G to process all messages in the same order. Each node maintains a Lamport logical clock C. When node n sends a message m to G, it includes a timestamp ts(m) = C. When node n' receives m, it updates its clock C' and sends a timestamped ACK a', with ts(a') = C'. Node n collects all ACKs, takes the maximum timestamp T(m) = max{ts(a)} over all a, and sends a commit message c containing T(m) to G. Node n' can deliver m when it receives c and it knows T(m) < T(m') for any other message m'.

Mutual Exclusion Need for synchronization: access to shared variables; access to serially reusable resources. Approaches: centralized, distributed, voting, token-passing. Characteristics: correctness (safety), efficiency (messages passed), fault tolerance, fairness (liveness).

Mutual Exclusion: A Centralized Algorithm (1) Figure 6-14. (a) Process 1 asks the coordinator for permission to access a shared resource. Permission is granted since locked=F.

Mutual Exclusion: A Centralized Algorithm (2) Figure 6-14. (b) Process 2 then asks permission to access the same resource. The coordinator does not reply since the resource is locked (locked=T). Process 2 is enqueued.

Mutual Exclusion: A Centralized Algorithm (3) Figure 6-14. (c) When process 1 releases the resource, it tells the coordinator, which dequeues and replies to 2. When 2 sends release, locked is set to F (assuming the queue is empty).
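The coordinator's behavior across the three panels can be sketched as a small class. The names are hypothetical, and messages are modeled as return values rather than network traffic.

```python
from collections import deque


class Coordinator:
    """Central lock manager with a `locked` flag and a FIFO queue of waiters."""

    def __init__(self):
        self.locked = False
        self.queue = deque()

    def request(self, pid):
        """Return True if permission is granted now; otherwise enqueue and stay silent."""
        if not self.locked:
            self.locked = True
            return True
        self.queue.append(pid)
        return False

    def release(self):
        """Return the process that is granted the resource next, or None."""
        if self.queue:
            return self.queue.popleft()   # the lock passes on, so locked stays True
        self.locked = False
        return None
```

Replaying the figure: process 1 is granted access, process 2 is enqueued, the first release hands the lock to 2, and the second release sets `locked` back to False.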

A Distributed Algorithm (1) Lamport Timestamp DME algorithm Three different cases: (1) If the receiver is not accessing the resource and does not want to access it, it sends back an OK message to the sender. (2) If the receiver already has access to the resource, it simply does not reply. Instead, it queues the request. (3) If the receiver wants to access the resource as well but has not yet done so, it compares the timestamp of the incoming message with the one contained in the message that it has sent everyone. The lowest one wins.
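The three cases map onto a single decision in the receiver. The sketch below assumes that requests are (timestamp, process ID) pairs so that ties are broken by process ID; the class, state and method names are hypothetical. The textbook attributes this algorithm to Ricart and Agrawala.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"


class Participant:
    """Receiver-side logic of the timestamp-based mutual exclusion algorithm."""

    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.my_request = None   # (timestamp, pid) of our outstanding request
        self.deferred = []       # processes whose OK we are holding back

    def want(self, timestamp):
        self.state = WANTED
        self.my_request = (timestamp, self.pid)
        return self.my_request   # to be sent to every other process

    def on_request(self, request):
        """Return True to reply OK at once, False to queue the request."""
        holding = self.state == HELD                                      # case 2
        we_win = self.state == WANTED and self.my_request < request       # case 3
        if holding or we_win:
            self.deferred.append(request[1])
            return False
        return True                                                       # case 1

    def enter(self):
        self.state = HELD

    def release(self):
        """Leave the resource and return the processes that now get an OK."""
        self.state = RELEASED
        self.my_request = None
        waiting, self.deferred = self.deferred, []
        return waiting
```

With illustrative timestamps 8 for process 0 and 12 for process 2, process 0 defers its reply to 2, process 2 replies OK to 0, and process 0's `release()` returns `[2]`.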

A Distributed Algorithm (2) Figure 6-15. (a) Two processes want to access a shared resource at the same moment.

A Distributed Algorithm (3) Figure 6-15. (b) Process 0 has the lowest timestamp, so it wins.

A Distributed Algorithm (4) Figure 6-15. (c) When process 0 is done, it sends an OK also, so 2 can now go ahead.

A Distributed Algorithm (5) Lamport Timestamp DME algorithm Observations: No deadlock: the lowest timestamp wins. No starvation: eventually your request has the lowest timestamp (all later ones will “see” yours and have a larger timestamp). Replaces one overloaded node with N of them. Replaces one point of failure with N of them. “Fix” failures by requiring a “Deny” response to denied requests: the sender can then resend until it gets a reply, and can distinguish a denial from a failure.

Voting Algorithms (1) Lamport Voting DME algorithm Send request as before. Give vote if it has not already been given to someone else. Queue requests in timestamp order. Win when a majority of votes is collected. Return vote to grantor when done with resource. Give returned vote to head of queue if not empty.

Voting Algorithms (2) Lamport Voting DME algorithm Observations: Can tolerate N/2-1 failures. Don't have to send messages to all nodes, just enough to collect the votes needed. Problem: multiple nodes may collect votes and none get a majority, which is a deadlock! Fix this with a “Rescind” message, used if a grantor gets a request with a smaller timestamp. Give the vote to the priority request when it is returned. If a node receives a Rescind message and is not in the CS, it must return the vote and wait for a majority. If it receives Rescind while in the CS, it returns the vote when done.

Voting Algorithms (3) Quorum Voting DME algorithm - coteries Don't need majority – only need to break ties. Each node n has its own quorum Q(n) whose votes it must get to obtain lock. For all n and n', Q(n) intersects Q(n'). Node(s) in intersection of Q(n) and Q(n') decide between the requests of n and n' – safety. Can make |Q(n)| about sqrt(N) for N nodes. Can also make the number of quorums of which a node n is a member about sqrt(N), so no node is overloaded with requests.
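One simple way to obtain pairwise-intersecting quorums of size about 2 sqrt(N) is a grid layout, sketched below. This is an illustrative construction and not necessarily the one the slide has in mind; it assumes N is a perfect square and that nodes are numbered 0 to N-1. The function name is hypothetical.

```python
import math


def grid_quorum(node, n):
    """Quorum of `node`: every node in its row and column of a sqrt(n) x sqrt(n) grid."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch assumes n is a perfect square")
    row, col = divmod(node, side)
    same_row = {row * side + c for c in range(side)}
    same_col = {r * side + col for r in range(side)}
    return same_row | same_col
```

Any two quorums intersect because the row of one node always crosses the column of the other. For N = 16, each quorum has 2*4 - 1 = 7 members, and each node belongs to 7 quorums.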

A Token Ring Algorithm Figure 6-16. (a) An unordered group of processes on a network. (b) A logical ring constructed in software.

Tree Algorithm (1) Form a tree of all nodes with the node currently holding the token as the root. Each node maintains a pointer to its current parent and a list of children, along with a queue of requests (initially empty) and a state (holds token or not). If node n wants to get the lock, it checks whether it has the token. If so, it enters the CS; otherwise it checks whether its queue is empty and, if so, sends a request to its parent. It then puts itself on the queue. If a node receives a request, has the token, and is not in the CS, it sends the token to the requester and makes it its parent. Otherwise it sends a request to its parent (if not self) and enqueues the request. When a node is done with the CS or receives the token, it sends the token to the first node in its queue and makes that node its new parent.
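The rules above can be sketched as a synchronous simulation in which a message is a direct method call, so nothing is lost or reordered. The sketch assumes that a node forwards a request to its parent only when its own queue was empty, and that a node passing the token on re-requests it if its queue is still non-empty. All class and method names are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, name, parent):
        self.name = name
        self.parent = parent    # None means this node currently holds the token
        self.queue = deque()    # pending requesters (possibly including self)
        self.in_cs = False


class TokenTree:
    def __init__(self, parents):
        """parents maps each node name to its parent (None for the token holder)."""
        self.nodes = {name: Node(name, p) for name, p in parents.items()}
        self.cs_order = []      # order in which nodes entered the critical section

    def request(self, name):
        self._enqueue(name, name)

    def release(self, name):
        node = self.nodes[name]
        node.in_cs = False
        self._pass_token(node)

    def _enqueue(self, name, requester):
        node = self.nodes[name]
        was_empty = not node.queue
        node.queue.append(requester)
        if node.parent is None:              # node holds the token
            if not node.in_cs:
                self._pass_token(node)
        elif was_empty:                      # first pending request: ask the parent
            self._enqueue(node.parent, name)

    def _pass_token(self, node):
        if not node.queue:
            return                           # nobody is waiting; keep the token
        head = node.queue.popleft()
        if head == node.name:
            node.in_cs = True
            self.cs_order.append(node.name)
            return
        node.parent = head                   # the new holder becomes our parent
        receiver = self.nodes[head]
        receiver.parent = None
        self._pass_token(receiver)
        if node.queue:                       # still have waiters: ask for it back
            self._enqueue(head, node.name)
```

On a chain 1, 2, 3 with node 1 holding the token and sitting in its critical section, requests from 3 and then 2 are served in that order once node 1 releases.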

Tree Algorithm (2) Node 5 has token, is root of tree. Node 8 requests token, enqueues self.

Tree Algorithm (3) Node 6 receives 8's request, enqueues 8 and requests token. Node 7 requests token, enqueues self.

Tree Algorithm (4) Node 6 receives request, enqueues 7. Node 5 receives request, sends token to 6, makes 6 parent.

Tree Algorithm (5) Node 6 receives token, dequeues 8, sends 8 token, with request. Node 3 requests token, enqueues self.

Tree Algorithm (6) Node 7 receives request, enqueues 3. Node 8 receives token, enters CS, makes self root, enqueues 6.

Tree Algorithm (7) Node 8 leaves CS, dequeues 6, sends 6 token, makes 6 parent. Node 1 requests token, enqueues self.

Tree Algorithm (8) Node 6 receives token, dequeues 7, sends 7 token, makes 7 parent. Node 5 receives request, enqueues 1, requests token.

Tree Algorithm (9) Node 7 receives token, dequeues self, enters CS, makes self parent. Node 6 receives request, enqueues 5, requests token.

Tree Algorithm (10) Node 7 receives request, enqueues 6. Node 4 requests token, enqueues self.

Tree Algorithm (11) Node 5 receives request, enqueues 4. Node 7 leaves CS, dequeues 3, sends 3 token, makes 3 parent.

A Comparison of the Four Algorithms ... plus two more... Figure 6-17. A comparison of three mutual exclusion algorithms. Rows added on the slide (algorithm; messages per entry/exit; delay before entry, in message times; problems): Coteries; ~2 sqrt(n); ~2 sqrt(n); crash of p in quorum. Token tree; 0 to log n; 0 to 2 log n; lost token, p crash.

Leader Election Why? It is often simpler to use a centralized approach. Can get around SPoF and congestion by using hierarchies. How? Bully algorithm. Ring algorithm. Variations on these.

Election Algorithms: The Bully Algorithm (1) P sends an ELECTION message to all processes with higher numbers. (2) If no one responds, P wins the election and becomes coordinator. (3) If one of the higher-ups answers, it takes over. P’s job is done.
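The takeover chain can be sketched as a short loop. This is a simplification: in the algorithm every responding higher-numbered process holds its own election, whereas the sketch follows only one responder at each step, which is enough to reach the same winner. The function name and the `alive` set are hypothetical.

```python
def bully_election(initiator, alive):
    """Return the process that ends up as coordinator.

    alive: set of process numbers that currently respond to messages.
    """
    candidate = initiator
    while True:
        # ELECTION goes to all higher-numbered processes; only live ones answer.
        responders = [p for p in alive if p > candidate]
        if not responders:
            return candidate          # nobody answered: the candidate wins
        candidate = min(responders)   # a higher-up takes over and repeats the step
```

With processes 0 to 6 alive and process 7 crashed, an election started by process 4 ends with process 6 as coordinator.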

The Bully Algorithm (1) Figure 6-20. The bully election algorithm. (a) Process 4 holds an election. (b) Processes 5 and 6 respond, telling 4 to stop. (c) Now 5 and 6 each hold an election.

The Bully Algorithm (2) Figure 6-20. The bully election algorithm. (d) Process 6 tells 5 to stop. (e) Process 6 wins and tells everyone.

A Ring Algorithm Figure 6-21. Election algorithm using a ring. Note that here, each message collects node IDs as it circulates the ring, so the winner will know all live nodes at the end. Strictly, only the best candidate ID must be circulated, and a new message propagated only if its candidate is better.

Ring Algorithm Nodes 6 and 2 initiate elections, believing node 7 (previous leader) to be dead. Here, only the best candidate ID is circulated. A node enters election if it detects leader failure or if it receives an election message. Messages shown: Elect 6, Elect 2.

Ring Algorithm Nodes 0 and 3 receive election messages, enter election. A node sends an election message if it initiates or if it receives a message with a better candidate. Node 0 continues with 6 as the best candidate, while node 3 considers itself the best candidate. Messages shown: Elect 6, Elect 3.

Ring Algorithm Nodes 1 and 4 receive election messages, enter election. Node 1 continues with 6 as the best candidate, while node 4 considers itself the best candidate. Messages shown: Elect 6, Elect 4.

Ring Algorithm Nodes 5 and 2 receive election messages, 5 enters election (2 is already in). Node 2 realizes that 6 is a better candidate than the best it has seen (i.e., 2) so forwards message, while node 5 considers itself the best candidate. Messages shown: Elect 6, Elect 5.

Ring Algorithm Nodes 6 and 3 receive election messages, are both already in. Node 3 realizes that 6 is a better candidate than the best it has seen (i.e., 3) so forwards message, while node 6 knows it has seen a better candidate (itself). Messages shown: Elect 6, Elect 5.

Ring Algorithm Node 4 receives election message, but is already in. Node 4 realizes that 6 is a better candidate than the best it has seen (i.e., 4) so forwards message, while node 6 waits, having suppressed the election message from 5. Message shown: Elect 6.

Ring Algorithm Node 5 receives election message, but is already in. Node 5 realizes that 6 is a better candidate than the best it has seen (i.e., 5) so forwards message. Message shown: Elect 6.

Ring Algorithm Node 6 receives election message with its own ID as the best candidate seen. Since only the candidate itself will start an election message with its own ID, it knows there is no better candidate in the ring and it has won! Message shown: Elect 6.

Ring Algorithm The winner knows it won when it sees its ID on a message, and then circulates an election results message (“6 won!”) around the ring with its ID. The results message causes each node to exit the election and return to normalcy.
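The walkthrough above can be replayed with a small simulation of the best-candidate variant. It assumes that a higher ID is the better candidate, that the ring lists only live nodes in sending order, and that messages are processed one at a time from a FIFO queue. The function name is hypothetical, and the final results message is not simulated.

```python
from collections import deque


def ring_election(ring, initiators):
    """Return the winner's ID, or None if nobody starts an election.

    ring: live node IDs in ring order; initiators: nodes that start an election.
    """
    n = len(ring)
    best_seen = {node: None for node in ring}   # None means not yet in the election
    messages = deque()                          # (index of receiver, candidate ID)
    for node in initiators:
        best_seen[node] = node
        messages.append(((ring.index(node) + 1) % n, node))
    while messages:
        idx, candidate = messages.popleft()
        node = ring[idx]
        if candidate == node:
            return node                         # own ID came back: this node has won
        if best_seen[node] is None:
            # Enter the election with the better of own ID and the received one.
            best_seen[node] = max(node, candidate)
            messages.append(((idx + 1) % n, best_seen[node]))
        elif candidate > best_seen[node]:
            best_seen[node] = candidate         # better candidate: forward it
            messages.append(((idx + 1) % n, candidate))
        # Otherwise the message carries a worse candidate and is suppressed.
    return None
```

With live nodes 0 to 6 and initiators 6 and 2, the simulation follows the slides: nodes 3, 4 and 5 first propose themselves, node 6 suppresses the message from 5, and node 6 wins when its own ID returns.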

Improving Elections Elections are generally held when a node fails. Use heartbeats to detect failure, and make “better” candidates more sensitive (i.e., time out sooner) so that only one node starts an election. The node starting an election can just contact the best candidate it believes to be alive, plus the better ones it thinks have failed (to be sure). To avoid lengthy delays, the initiator can contact the K best candidates, or, if the best “live” candidate(s) don't respond, it can escalate to contact more, lower-tier candidates.

Elections vs. DME While both types of protocol end up selecting a single winner, there are some important differences: (1) A DME participant keeps trying until it wins; this is not needed in leader election (any winner will do). (2) There is no issue of fairness in leader election. (3) All nodes need to know the winner when leader election is done; this is not needed in DME. Hence DME can be more message-efficient (i.e., sublinear) compared to leader election.

Elections in Wireless Environments (1) Figure 6-22. Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b)–(e) The build-tree phase.

Elections in Wireless Environments (2) Figure 6-22. Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b)–(e) The build-tree phase.

Elections in Wireless Environments (3) Figure 6-22. (e) The build-tree phase. (f) Reporting of best node to source.

Elections in Large-Scale Systems (1) Requirements for superpeer selection: (1) Normal nodes should have low-latency access to superpeers. (2) Superpeers should be evenly distributed across the overlay network. (3) There should be a predefined portion of superpeers relative to the total number of nodes in the overlay network. (4) Each superpeer should not need to serve more than a fixed number of normal nodes.

Elections in Large-Scale Systems (2) Figure 6-23. Moving tokens in a two-dimensional space using repulsion forces.