Simon Razniewski Faculty of Computer Science

Slides:



Advertisements
Similar presentations
CS542 Topics in Distributed Systems Diganta Goswami.
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
CS425 /CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Leader Election Steve Ko Computer Sciences and Engineering University at Buffalo.
Token-Dased DMX Algorithms n LeLann’s token ring n Suzuki-Kasami’s broadcast n Raymond’s tree.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 11: Leader Election All slides © IG.
1 Algorithms and protocols for distributed systems We have defined process groups as having peer or hierarchical structure and have seen that a coordinator.
Distributed Systems Spring 2009
LEADER ELECTION CS Election Algorithms Many distributed algorithms need one process to act as coordinator – Doesn’t matter which process does the.
CS 582 / CMPE 481 Distributed Systems
1. Explain why synchronization is so important in distributed systems by giving an appropriate example When each machine has its own clock, an event that.
Synchronization Clock Synchronization Logical Clocks Global State Election Algorithms Mutual Exclusion.
Computer Science Lecture 11, page 1 CS677: Distributed OS Last Class: Clock Synchronization Logical clocks Vector clocks Global state.
SynchronizationCS-4513, D-Term Synchronization in Distributed Systems CS-4513 D-Term 2007 (Slides include materials from Operating System Concepts,
Synchronization in Distributed Systems CS-4513 D-term Synchronization in Distributed Systems CS-4513 Distributed Computing Systems (Slides include.
Coordination in Distributed Systems Lecture # 8. Coordination Anecdotes  Decentralized, no coordination  Aloha ~ 18%  Some coordinating Master  Slotted.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
Clock Synchronization and algorithm
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Vector timestamps Global state –Distributed Snapshot Election algorithms.
Election Algorithms and Distributed Processing Section 6.5.
Election Algorithms. Topics r Issues r Detecting Failures r Bully algorithm r Ring algorithm.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
1DT066 D ISTRIBUTED I NFORMATION S YSTEM Time, Coordination and Agreement 1.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 6 Synchronization.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Mutual Exclusion Steve Ko Computer Sciences and Engineering University at Buffalo.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Naming Name distribution: use hierarchies DNS X.500 and LDAP.
CS425 /CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
Coordination and Agreement. Topics Distributed Mutual Exclusion Leader Election.
CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 8 Leader Election Section 12.3 Klara Nahrstedt.
Synchronization CSCI 4780/6780. Mutual Exclusion Concurrency and collaboration are fundamental to distributed systems Simultaneous access to resources.
Vector Clock Each process maintains an array of clocks –vc.j.k denotes the knowledge that j has about the clock of k –vc.j.j, thus, denotes the clock of.
ADITH KRISHNA SRINIVASAN
Paxos: Agreement for Replicated State Machines Brad Karp UCL Computer Science CS GZ03 / M st, 23 rd October, 2008.
Synchronization Chapter 5.
Lecture 11-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 28, 2010 Lecture 11 Leader Election.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Mutual Exclusion & Leader Election Steve Ko Computer Sciences and Engineering University.
Distributed Systems Topic 5: Time, Coordination and Agreement
Election Distributed Systems. Algorithms to Find Global States Why? To check a particular property exist or not in distributed system –(Distributed) garbage.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
Page 1 Mutual Exclusion & Election Algorithms Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content.
Lecture 12-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2012 Indranil Gupta (Indy) October 4, 2012 Lecture 12 Mutual Exclusion.
Lecture 7- 1 CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 7 Distributed Mutual Exclusion Section 12.2 Klara Nahrstedt.
COMP 655: Distributed/Operating Systems Summer 2011 Dr. Chunbo Chu Week 6: Synchronyzation 3/5/20161 Distributed Systems - COMP 655.
Mutual Exclusion Algorithms. Topics r Defining mutual exclusion r A centralized approach r A distributed approach r An approach assuming an organization.
CSE 486/586 CSE 486/586 Distributed Systems Leader Election Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Systems 14. Coordination in Distributed Systems Simon Razniewski Faculty of Computer Science Free University of Bozen-Bolzano A.Y. 2014/2015.
Distributed Systems Lecture 9 Leader election 1. Previous lecture Middleware RPC and RMI – Marshalling 2.
Lecture 11: Coordination and Agreement Central server for mutual exclusion Election – getting a number of processes to agree which is “in charge” CDK4:
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Oct 1, 2015 Lecture 12: Mutual Exclusion All slides © IG.
Distributed Systems 31. Theoretical Foundations of Distributed Systems - Coordination Simon Razniewski Faculty of Computer Science Free University of Bozen-Bolzano.
Coordination and Agreement
CSE 486/586 Distributed Systems Leader Election
Distributed Computing
Distributed Processing Election Algorithm
Lecture 17: Leader Election
Parallel and Distributed Algorithms
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
CSE 486/586 Distributed Systems Leader Election
CSE 486/586 Distributed Systems Mutual Exclusion
Fault-Tolerant State Machine Replication
Lecture 10: Coordination and Agreement
Chapter 5 (through section 5.4)
Lecture 11: Coordination and Agreement
CSE 486/586 Distributed Systems Mutual Exclusion
CSE 486/586 Distributed Systems Leader Election
Last Class: Naming Name distribution: use hierarchies DNS
Presentation transcript:

Distributed Systems 27. Theoretical Foundations of Distributed Systems - Coordination Simon Razniewski Faculty of Computer Science Free University of Bozen-Bolzano A.Y. 2016/2017

Co-ordination Algorithms are fundamental in distributed systems: to dynamically re-assign the role of master choose primary server after crash co-ordinate resource access for resource sharing: concurrent updates of entries in a database (data locking) files a shared repository to agree on actions: whether to commit/abort database transaction agree on readings from a group of sensors

Co-ordination Problems Clock Synchronization processes must agree on order of events crucial e.g. for concurrent transactions in databases or change polling in distributed file systems Leader election after crash failure has occurred after network reconfiguration Mutual exclusion distributed form of synchronized access problem not covered here

1. Clock Synchronization

Why to synchronize clocks? Important to end users Ebay auctions Enrolment for sport courses/student dorms Order of chat/mail messages Important internally in distributed systems Concurrent updates in distributed databases Distributed file systems

End Users Synchronize with an authoritative source Network time protocol (NTP): UDP on port 123 “In operation since before 1985, NTP is one of the oldest Internet protocols in current use.” How?

Network Time Protocol (NTP) e.g. time.windows.com  Wireshark

System internal Exact time not necessarily important Important is a relative ordering between events Database updates File system changes

Message order matters! A bank keeps replicas of bank accounts in Milan and Rome Event 1: Customer pays 100 € into his account of 1000 € Event 2: The bank adds 1% interest

General problem: Determining message ordering send receive m 1 2 X Y Z Physical time A 3 t How can A know the order in which the messages were sent?

Time Ordering of Events (Lamport) Observation: For some events E1, E2, it is “obvious” that E1 happened before E2 (written E1  E2) If E1 happens before E2 in process P, then E1  E2 If E1 = send(M) and E2 = receive(M), then E1  E2 (M is a message) If E1  E2 and E2  E3 then E1  E3 (transitivity)

Logical Clocks Goal: Assign “timestamps” ti to events Ei such that E1  E2  t1 < t2 (partial order!) Approach: Processes incrementally number their events send numbers with messages update their “logical clock” to max(OwnTime, SendTime) +1 when they receive a message

Logical Clocks in the Message Scenario 5 2 4 Messages carry numbers 5 1 3 For a tie break, use process numbers as second component!

Combined with classical clocks Three processes, each with its own clock. The clocks run at different rates.

Combined with classical clocks (2) Lamport’s algorithm corrects the clocks.

Lamport’s Logical Clocks (4) Figure 6-10. The positioning of Lamport’s logical clocks in distributed systems.

Back to databases How long to wait before processing an update?

Solution Multicast messages to everyone, including oneself On receiving a message: Queue message based on their logical sending timestamp Acknowledge receipt of messages to all other peers Process acknowledgments and messages from same peer in sending order (how?) Only process messages upon receipt of acknowledgment by all other peers  Unique processing order for all peers “Totally-ordered multicast”

2. Leader Election P2p – who is tracker

Leader Election The problem: Qualitative properties: N processes, with unique IDs (how could we get them?) Old coordinator has crashed/disappeared due to network issues must choose a new unique coordinator among themselves one or more processes might start process simultaneously Qualitative properties: Correctness: Every process has the same value in the variable elected Liveness: All processes participate and eventually discover the identity of the leader (elected cannot stay undefined). Quantitative properties Bandwidth: total number of messages sent around Turnaround: number of steps needed to come to a result

Election on a Ring (Chang/Roberts 1979) Assumptions: UIDs have a linear order processes form a unidirectional logical ring, i.e., each process has channels to two other processes from one it receives messages, to the other it sends messages Goal: process with highest UID becomes leader Obtain UID: MAC

Election on a Ring (cntd) Processes send two kinds of messages: elect(UID), elected(UID) can be in two states: non-participant, participant Two phases 1. Determine leader 2. Announce winner Initially, each process is non-participant

Phase 1: Determine Leader Some process with UID id0 initiates the election by becoming participant sending the message elect(id0) to its neighbour When a non-participant receives a message elect(id) it forwards elect(idmax), where idmax is the maximum of its own and the received UID becomes participant When a participant receives a message elect(id) it forwards the message if id is greater than its own UID it ignores the message if id is less than its own UID

Phase 2: Announce Winner When a participant receives a message elect(id) where id is its own UID it becomes the leader it becomes non-participant sends the message elected(id) to its neighbour When a participant receives a message elected(id) it records id as the leader’s UID Becomes non-participant forwards the message elected(id) to its neighbour

Election on a Ring: Example 24 15 9 4 3 28 17 1 non-participants participants

Properties Correctness: Liveness Bandwidth: Turnaround:  clear, if only one election is running what, if several elections are running at the same time? participants do not forward smaller IDs Bandwidth: at most 3n – 1 (if a single process starts the election, what if several processes start an election?) Turnaround: at most 3n-1 (if …)

Under Which Conditions can it Work? What if there is a failure (process or connection)? the election gets stuck  assumption: no failures or timeouts during election (in token rings, nodes are connected to the network by a connector, which may pass on tokens, even if the node has failed) When is this applicable? token ring/token bus/virtual ring (Chord)

Bully Algorithm (Garcia-Molina) Idea: Process with highest ID imposes itself as the leader Assumption: each process has a unique ID each process knows the IDs of the other processes When is it applicable? IDs don't change Set of participants constant Possibly much faster than ring algorithm Turing Award 2015!

Bully Algorithm: Principles A process detects failure of the leader The process starts an election by notifying the potential candidates (i.e., processes with greater ID) if no candidate replies, the process declares itself the winner of the election if there is a reply, the process stops its election initiative When a process receives a notification it replies to the sender and starts an election if its ID is higher than the one of the sender

Bully Algorithm: Messages Election message: to “call elections” (sent to nodes with higher UID) Answer message: to “vote” (… against the caller, sent to nodes with lower UID) Coordinator message: to announce own acting as coordinator

Bully Algorithm: Actions Initially: The process with highest UID sends coordinator message A process starting an election sends an election message if no answer within time T = 2 Ttransmission + Tprocess, then it sends a coordinator message If a process receives a coordinator message it sets its coordinator variable If a process receives an election message it answers and begins another election (if needed) If a new process starts to coordinate (highest UID), it sends a coordinator message and “bullies” the current coordinator out

Example (1) Process 4 holds an election. (b) Processes 5 and 6 respond, telling 4 to stop. (c) Now 5 and 6 each hold an election. (d) Process 6 tells 5 to stop. (e) Process 6 wins and tells everyone.

Properties of the Bully Algorithm Liveness guaranteed because of timeouts Correctness clear if group of processes is stable (no new processes) not guaranteed if new process declares itself as the leader during election (e.g., old leader is restarted) two processes may declare themselves as leaders at the same time but no guarantee can be given on the order of delivery of those messages

Quantitative Properties Best case: process with 2nd highest ID detects failure Worst case: process with lowest ID detects failure Bandwidth: N - 1 messages in best case O(N2) in worst case Turnaround: 1 message in best case 3 messages in worst case What if we only notify the process with the highest ID (and probe downwards)?

Comparison

Election without UIDs (Itai/Rodeh) Assumptions N processes, unidirectional ring processes do not have UIDs Election each process selects ID at random from set {1,…,K} non-unique! but fast processes pass all IDs around the ring after one round, if there exists a unique ID then elect maximum unique ID otherwise, repeat Liveness? Probabilistically!

Election without UIDs (cntd) How many rounds does it take? the larger the probability of a unique ID, the faster the algorithm expected time: N=4, K=16, expected 1.01 rounds

Learned today Clock synchronization Leader Election Synchronization with reference server Relative ordering: Lamport’s logical clocks Leader Election Bully algorithm Ring-based election