Lecture 11-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 28, 2010 Lecture 11 Leader Election.


Lecture 11-1 Leader Election. Reading: Section 12.3. © 2010, I. Gupta, K. Nahrstedt, S. Mitra, N. Vaidya, M. T. Harandi, J. Hou

Lecture 11-2 Why Election?
Example 1: Your bank maintains multiple servers in its cloud, but for each customer, one of the servers is responsible, i.e., is the leader.
–What if there are two leaders per customer?
–What if servers disagree about who the leader is?
–What if the leader crashes?

Lecture 11-3 Why Election?
Example 2: (2 lectures ago) In the sequencer-based algorithm for total ordering of multicasts, the "sequencer" = leader.
Example 3: A group of cloud servers replicating a file need to elect one among them as the primary replica that will communicate with the client machines.
Example 4: A group of NTP servers: who is the root server?

Lecture 11-4 What is Election?
In a group of processes, elect a leader to undertake special tasks.
What happens when a leader fails (crashes)?
–Some process detects this (how?)
–Then what?
Focus of this lecture: election algorithms that
1. Elect one leader only, among the non-faulty processes
2. Ensure all non-faulty processes agree on who the leader is

Lecture 11-5 Assumptions
Any process can call for an election.
A process can call for at most one election at a time.
Multiple processes can call an election simultaneously.
–All of them together must yield a single leader only.
The result of an election should not depend on which process calls for it.
Messages are eventually delivered.

Lecture 11-6 Problem Specification
At the end of the election protocol, the non-faulty process with the best (highest) election attribute value is elected.
–Attribute examples: leader has highest id or address, or fastest cpu, or most disk space, or most number of files, etc.
A run (execution) of the election algorithm must always guarantee at the end:
–Safety: ∀ non-faulty process p: (p's elected = (q: a particular non-faulty process with the best attribute value) or ⊥)
–Liveness: ∀ election: (election terminates) ∧ ∀ p: non-faulty process, p's elected is not ⊥

Lecture 11-7 Algorithm 1: Ring Election
N processes are organized in a logical ring (similar to the ring in the Chord p2p system).
–p_i has a communication channel to p_((i+1) mod N).
–All messages are sent clockwise around the ring.
Any process p that discovers the old coordinator has failed initiates an "election" message that contains p's own id:attr.
When a process receives an election message, it compares the attr in the message with its own attr.
–If the arrived attr is greater, the receiver forwards the message.
–If the arrived attr is smaller and the receiver has not forwarded an election message earlier, it overwrites the message with its own id:attr and forwards it.
–If the arrived id:attr matches that of the receiver, then this process's attr must be the greatest (why?), and it becomes the new coordinator. This process then sends an "elected" message to its neighbor with its id, announcing the election result.
When a process p_i receives an elected message, it
–sets its variable elected_i := id of the message;
–forwards the message if it is not the new coordinator.
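The single-initiator, failure-free case above can be sketched as a short simulation. This is an illustrative sketch, not the lecture's code: the function name `ring_election`, the message counter, and the `forwarded` flags are all hypothetical, and attr is taken to be the process id.

```python
# Sketch of the ring election (Algorithm 1), assuming attr = id and no
# failures during the run. ids[i] is the id of the process at ring position i.

def ring_election(ids, initiator):
    """Simulate one election; returns (leader_id, total_messages)."""
    n = len(ids)
    forwarded = [False] * n           # has position i forwarded an election msg?
    messages = 0
    pos = initiator
    attr = ids[initiator]             # the id:attr currently in the message
    forwarded[initiator] = True
    while True:
        pos = (pos + 1) % n           # one clockwise hop
        messages += 1                 # one election message per hop
        if ids[pos] == attr:
            break                     # own id came back: pos is the coordinator
        if ids[pos] > attr and not forwarded[pos]:
            attr = ids[pos]           # receiver overwrites with its larger id
        forwarded[pos] = True
    messages += n                     # "elected" message circulates once around
    return attr, messages
```

Running it on a 5-process ring whose initiator sits just clockwise of the highest id, e.g. `ring_election([5, 1, 2, 3, 4], 1)`, reproduces the worst case analyzed below: 2N-1 election messages plus N elected messages, i.e. 3N-1 = 14 messages, with leader id 5.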

Lecture 11-8 Ring-Based Election: Example (attr := id)
The worst-case scenario occurs when the counter-clockwise neighbor (of the initiator) has the highest attr.
In the example: the election was started by process 17; the highest process identifier encountered so far is 24 (the final leader will be 33).

Lecture 11-9 Ring-Based Election: Analysis
The worst-case scenario occurs when the counter-clockwise neighbor of the initiator has the highest attr. In a ring of N processes, in the worst case:
–A total of N-1 messages are required to reach the new coordinator-to-be (election messages).
–Another N messages are required until the new coordinator-to-be ensures it is the new coordinator (election messages – no changes).
–Another N messages are required to circulate the elected messages.
Total message complexity = 3N-1
Turnaround time = 3N-1 message transmission times

Lecture 11-10 Correctness?
Assume no failures happen during the run of the election algorithm:
–Safety and Liveness are satisfied.
What happens if there are failures during the election run?

Lecture 11-11 Example: Ring Election (consider attr == highest id)
1. P2 initiates the election ("Election: 2") after old leader P5 failed.
2. P2 receives "Election: 4"; P4 dies.
3. "Election: 4" is forwarded forever?
May not terminate when a process failure occurs during the election!

Lecture 11-12 Algorithm 2: Modified Ring Election ("Node" = "Process" throughout)
Processes are organized in a logical ring.
Any process that discovers the coordinator (leader) has failed initiates an "election" message. This is the initiator of the election.
The message is circulated around the ring, bypassing failed nodes.
Each node adds its id:attr to the message as it passes it to the next node (without overwriting what is already in the message).
Once the message gets back to the initiator, it elects the node with the best election attribute value.
It then sends a "coordinator" message with the id of the newly-elected coordinator. Again, each node adds (appends) its id to the end of the message.
Once the "coordinator" message gets back to the initiator:
–the election is over if the coordinator's id is in the id-list;
–else the algorithm is repeated (this handles election failure).
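The two-pass structure above (collect ids, then confirm the winner, repeating if the winner died mid-run) can be sketched as follows. This is a simplification I am introducing, not the lecture's code: failures are modeled by giving the simulation the set of live ids for each pass around the ring, and all names are hypothetical.

```python
# Sketch of the modified ring election. alive_per_pass yields, for each pass
# around the ring (election pass, then coordinator pass, repeating), the set
# of ids that are alive during that pass; dead nodes are bypassed.

def modified_ring_election(ids, initiator, alive_per_pass):
    """Repeat elections until a live coordinator is confirmed; return its id."""
    n = len(ids)
    passes = iter(alive_per_pass)
    while True:
        # Election pass: each live node appends its id (attr = id here).
        election_alive = next(passes)
        collected = [ids[(initiator + k) % n] for k in range(1, n + 1)
                     if ids[(initiator + k) % n] in election_alive]
        winner = max(collected)
        # Coordinator pass: circulate the result; live nodes append their ids.
        coord_alive = next(passes)
        acked = [ids[(initiator + k) % n] for k in range(1, n + 1)
                 if ids[(initiator + k) % n] in coord_alive]
        if winner in acked:
            return winner   # the winner survived the announcement round
        # Otherwise the winner died mid-election: repeat the whole protocol.
```

Replaying the slide's example (ring P0..P5 with id = position, P5 already dead, P2 the initiator, P4 dying after the first election pass) first elects 4, detects that 4 is missing from the coordinator id-list, re-runs, and finally elects 3.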

Lecture 11-13 Example: Modified Ring Election
1. P2 initiates the election ("Election: 2").
2. P2 receives "Election: 2,3,4,0,1"; P4 dies.
3. P2 selects 4 and announces the result ("Coord(4): 2", "Coord(4): 2,3", ..., "Coord(4): 2,3,0,1").
4. P2 receives "Coord", but P4 is not included in the id-list.
5. P2 re-initiates the election ("Election: 2", "Election: 2,3", "Election: 2,3,0", "Election: 2,3,0,1").
6. P3 is finally elected ("Coord(3): 2", "Coord(3): 2,3", "Coord(3): 2,3,0", "Coord(3): 2,3,0,1").

Lecture 11-14 Modified Ring Election
How many messages? What is the turnaround time?
How would you redesign the algorithm to be fault-tolerant to an initiator's failure?
–One idea: Have the initiator's successor wait a while, then re-initiate a new election. Do the same for this successor's successor, and so on…
Reconfiguration of the ring upon failures:
–Can be done if all processes "know" about all other processes in the system.
Supports concurrent elections:
–An initiator with a lower id blocks other initiators' election messages.

Lecture 11-15 What about that Impossibility?
The Election problem is a special form of the consensus problem.
In an asynchronous system with no bounds on message delays and arbitrarily slow processes, can we have a generic leader election protocol?
Where does the modified ring election start to give problems with the above asynchronous system assumptions?

Lecture 11-16 What about that Impossibility?
The Election problem is a special form of the consensus problem.
In an asynchronous system with no bounds on message delays and arbitrarily slow processes, can we have a generic leader election protocol?
–No! [Marzullo and Sabel] + impossibility of consensus.
Where does the modified ring election start to give problems with the above asynchronous system assumptions?
–P_i may just be very slow, but not faulty (yet it is not elected as leader!).
–Also: a slow initiator, ring reorganization.

Lecture 11-17 Algorithm 3: Bully Algorithm
Assumptions:
–Synchronous system.
–All messages arrive within T_trans units of time.
–A reply is dispatched within T_process units of time after the receipt of a message.
–If no response is received within 2*T_trans + T_process, the node is assumed to be faulty (crashed).
–attr = id.
–Each process knows all the other processes in the system (and thus their ids).

Lecture 11-18 Algorithm 3: Bully Algorithm (continued)
A node initiates an election by sending an "election" message only to nodes that have a higher id than itself.
–If no answer arrives within the timeout, it announces itself to the lower-id nodes as coordinator.
–If any answer is received, then there is some non-faulty higher node, so it waits for a coordinator message. If none is received after a timeout, it starts a new election.
A node that receives an "election" message replies with an answer, and starts its own election protocol (unless it has already done so).
When a process finds the coordinator has failed, if it knows its id is the highest, it elects itself as coordinator, then sends a coordinator message to all processes with lower identifiers.
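A failure-free run of the protocol above (no node dies after the election starts) can be sketched as a message-counting simulation. This is an illustrative sketch under my own assumptions, not the lecture's code: timeouts are abstracted away by treating an unanswered election as an immediate self-announcement, and all names are hypothetical.

```python
# Sketch of the bully algorithm for a run with a fixed set of live nodes.
# ids is the full membership (including dead nodes); alive is the set of
# live ids; initiator detected the old coordinator's failure.

from collections import deque

def bully_election(ids, alive, initiator):
    """Return (coordinator_id, total_messages) for a failure-free run."""
    messages = 0
    started = set()                     # nodes that already ran their election
    queue = deque([initiator])
    coordinator = None
    while queue:
        p = queue.popleft()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in ids if q > p]
        messages += len(higher)         # "election" sent to ALL higher ids
        answered = [q for q in higher if q in alive]
        messages += len(answered)       # live higher nodes reply with "answer"
        if not answered:
            coordinator = p             # timeout with no answers: p wins
        queue.extend(answered)          # each replier starts its own election
    lower = [q for q in ids if q in alive and q < coordinator]
    messages += len(lower)              # "coordinator" announcement
    return coordinator, messages
```

With ids 0..4, node 4 (the old coordinator) dead, and lowest-id node 0 initiating, every surviving node ends up running an election and node 3 wins, which matches the quadratic worst case analyzed below; if node 3 itself initiates, only a handful of messages are needed.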

Lecture 11-19 Example: Bully Election
1. P2 initiates the election (old coordinator P5 has failed).
2. P2 receives "answer" (OK) replies.
3. P3 & P4 initiate their own elections.
4. P3 receives an "answer" (OK) reply.
5. P4 receives no reply within the timeout.
6. P4 announces itself coordinator.

Lecture 11-20 The Bully Algorithm
The coordinator p_4 fails and p_1 detects this; then p_3 fails.

Lecture 11-21 Analysis of the Bully Algorithm
Best-case scenario: the process with the second-highest id notices the failure of the coordinator and elects itself.
–N-2 coordinator messages are sent.
–Turnaround time is one message transmission time.

Lecture 11-22 Analysis of the Bully Algorithm
Worst-case scenario: the process with the lowest id in the system detects the failure.
–N-1 processes altogether begin elections, each sending messages to processes with higher ids.
–The message overhead is O(N²).
–Turnaround time is approximately 5 message transmission times if there are no failures during the run:
1. Election message from the lowest-id process
2. Answer to the lowest-id process from the 2nd-highest-id process
3. Election message from the 2nd-highest-id process
4. Timeout at the 2nd-highest-id process
5. Coordinator message from the 2nd-highest-id process
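The O(N²) bound on "election" messages alone can be checked with a one-line sum. This is a back-of-the-envelope sketch under the assumption that ids are 0..N-1, the highest-id node (the old coordinator) has failed, and every surviving node runs an election; answer and coordinator messages add terms of the same order.

```python
# Count worst-case "election" messages in the bully algorithm: the node with
# id k sends one election message to each of the N-1-k nodes above it, for
# every surviving node k = 0 .. N-2. The total is (N-1)N/2, i.e. O(N^2).

def worst_case_election_messages(n):
    return sum(n - 1 - k for k in range(n - 1))
```

For example, with N = 5 this gives 4 + 3 + 2 + 1 = 10 election messages, and with N = 100 it gives 4950, growing quadratically with N.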

Lecture 11-23 Summary
Coordination in distributed systems requires a leader process.
The leader process might fail.
Need to (re-)elect the leader process.
Three algorithms:
–Ring algorithm
–Modified Ring algorithm
–Bully algorithm

Lecture 11-24 Readings and Announcements
Readings:
–For today's lecture: Section 12.3
–For next lecture: Section 12.2 (Mutual Exclusion)
Cheating issues
Midterm exam is next Tuesday, Oct 5th
–Everitt 151
MP demos will be later this week
–Signup sheet already posted on the newsgroup – please sign up today!