CMPT 431 Lecture IX: Coordination And Agreement


2 CMPT 431 © A. Fedorova A Replicated Service
[Figure: clients connected over a network to the servers (a master and a slave). Arrow labels: W (write), W (data replication), R (read)]

3 A Need For Coordination And Agreement
[Figure: clients connected over a network to the servers (a master and a slave)]
Must coordinate election of a new master
Must agree on a new master

4 Roadmap
Today we will discuss protocols for coordination and agreement
This is a difficult problem because of failures and the lack of a bound on message delay
We will begin with a strong set of assumptions (assume few failures), and then we will relax those assumptions
We will look at several problems requiring communication and agreement: distributed mutual exclusion and election
Finally, we will learn that in an asynchronous distributed system consensus cannot be guaranteed in the presence of failures

5 Distributed Mutual Exclusion (DMTX)
Similar to a local mutual exclusion problem
Processes in a distributed system share a resource
Only one process can access a resource at a time
Examples:
– File sharing
– Sharing a bank account
– Updating a shared database

6 Assumptions and Requirements
Assumptions:
– A synchronous system
– Processes do not fail
– Message delivery is reliable (exactly once)
Protocol requirements:
– Safety: At most one process may execute in the critical section at a time
– Liveness: Requests to enter and exit the critical section eventually succeed
– Fairness: Requests to enter the critical section are granted in the order in which they were received

7 Evaluation Criteria of DMTX Algorithms
Bandwidth consumed
– proportional to the number of messages sent in each entry and exit operation
Client delay
– delay incurred by a process at each entry and exit operation
System throughput
– the rate at which processes can access the critical section (number of accesses per unit of time)

8 DMTX Algorithms
We will consider the following algorithms:
– Central server algorithm
– Ring-based algorithm
– An algorithm based on voting

9 The Central Server Algorithm

10 The Central Server Algorithm
Performance:
– Entering a critical section takes two messages (a request message followed by a grant message)
– System throughput is limited by the synchronization delay at the server (the time between the release message to the server and the grant message to the next client)
Fault tolerance:
– Does not tolerate failures
– What if the client holding the token fails?
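
The request/grant/release exchange described above can be sketched as a small in-memory model. This is only an illustration under the slide's assumptions (no failures, reliable delivery); the class name `CentralLockServer` and its method names are hypothetical, and real message passing is replaced by direct method calls.

```python
from collections import deque


class CentralLockServer:
    """Toy model of the central server: one token, FIFO queue of waiters."""

    def __init__(self):
        self.holder = None        # process currently holding the token
        self.waiting = deque()    # requests queued in arrival order

    def request(self, pid):
        """Handle a request message; True means the grant is sent at once."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)  # token is busy: queue the request
        return False

    def release(self, pid):
        """Handle a release message; return the next grantee (or None)."""
        if self.holder != pid:
            raise ValueError("only the token holder may release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Granting in arrival order is what gives the fairness property in this sketch. Note that the model has no answer to the question on the slide: if the holder never calls `release`, every queued process waits forever.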

11 A Ring-Based Algorithm

12 A Ring-Based Algorithm (cont)
Processes are arranged in a ring
There is a communication channel from process p_i to process p_((i+1) mod N)
They continuously pass the mutual exclusion token around the ring
A process that does not need to enter the critical section (CS) passes the token along
A process that needs to enter the CS retains the token; once it exits the CS, it passes the token on
No fault tolerance
Excessive bandwidth consumption (the token circulates even when no process wants the CS)
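
A minimal simulation of the token passing described above, assuming no failures. The function name `ring_schedule` and its parameters are hypothetical; each process in `wants` enters the critical section once, and `max_hops` is only a safety bound for the simulation.

```python
def ring_schedule(n, wants, start=0, max_hops=100):
    """Pass the token around a ring of n processes, starting at `start`.

    wants: set of process indices that want to enter the CS once.
    Returns (order in which processes entered the CS, token hops used).
    """
    pending = set(wants)
    order = []
    holder = start
    hops = 0
    while pending and hops < max_hops:
        if holder in pending:
            order.append(holder)        # holder keeps the token and enters the CS
            pending.remove(holder)      # ...then exits the CS
        if pending:
            holder = (holder + 1) % n   # pass the token to the next process
            hops += 1
    return order, hops
```

For example, `ring_schedule(4, {1, 3}, start=2)` lets process 3 enter before process 1, because the order of entry follows the ring, not the order of requests.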

13 Maekawa’s Voting Algorithm
To enter a critical section, a process must receive permission from a subset of its peers
Processes are organized in voting sets
A process is a member of M voting sets
All voting sets are of equal size (for fairness)

14 Maekawa’s Voting Algorithm
[Figure: processes p1, p2, p3, p4 and their voting sets]
Intersection of voting sets guarantees mutual exclusion
To avoid deadlock, requests to enter the critical section must be ordered
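
The intersection property can be checked on a concrete family of voting sets. The sketch below assumes a simple grid construction (each process's voting set is its row plus its column in a k x k grid); this is a common simplification for illustration, not the construction from the slides or from Maekawa's paper, and the function names are hypothetical.

```python
from itertools import combinations


def grid_voting_sets(k):
    """Voting set of process p = its row plus its column in a k x k grid."""
    sets = {}
    for p in range(k * k):
        r, c = divmod(p, k)
        row = {r * k + j for j in range(k)}
        col = {i * k + c for i in range(k)}
        sets[p] = row | col
    return sets


def all_pairs_intersect(sets):
    """True if every two voting sets share at least one process."""
    return all(sets[a] & sets[b] for a, b in combinations(sets, 2))
```

With k = 3 (nine processes), every voting set has 2k - 1 = 5 members and every process belongs to 5 voting sets, so the sets are of equal size. Any two sets overlap, so two processes can never both collect all votes from their sets at the same time.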

15 Elections
Election algorithms are used when a unique process must be chosen to play a particular role:
– Master in a master-slave replication system
– Central server in the DMTX protocol
We will look at the bully election algorithm
The bully algorithm tolerates failstop failures
But it works only in a synchronous system with reliable messaging

16 The Bully Election Algorithm
All processes are assigned identifiers
The system always elects a coordinator with the highest identifier:
– Each process must know all processes with higher identifiers than its own
Three types of messages:
– election: a process begins an election
– answer: a process acknowledges the election message
– coordinator: an announcement of the identity of the elected process

17 The Bully Election Algorithm (cont.)
Initiation of election:
– Process p1 detects that the existing coordinator p4 has crashed and initiates the election
– p1 sends an election message to all processes with a higher identifier than its own
[Figure: p1 sends election messages to p2, p3, and p4]

18 The Bully Election Algorithm (cont.)
What happens if there are no further crashes:
– p2 and p3 receive the election message from p1, send back the answer message to p1, and begin their own elections
– p3 sends answer to p2
– p3 receives no answer message from p4, so after a timeout it elects itself as the leader (knowing it has the highest ID among the live processes)
[Figure: election, answer, and coordinator messages among p1, p2, p3, and p4]

19 The Bully Election Algorithm (cont.)
What happens if p3 also crashes after sending the answer message but before sending the coordinator message?
In that case, p2 will time out while waiting for the coordinator message and will start a new election
[Figure: election and answer messages among p1, p2, p3, and p4; p2 starts a new election]

20 The Bully Election Algorithm (summary)
The algorithm does not require a central server
Does not require knowing identities of all the processes
Requires knowing identities of processes with higher IDs
Survives crashes
Assumes a synchronous system (relies on timeouts)
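
The message flow of the bully algorithm can be sketched as a round-free simulation. This is an illustrative model only: the function name `bully_election` is hypothetical, the set of live processes is fixed for the whole election (so the "p3 crashes mid-election" case is not modelled), and a timeout is represented simply by a crashed process never answering.

```python
def bully_election(initiator, ids, alive):
    """Simulate one bully election; return (coordinator, message log).

    ids: all process identifiers; alive: identifiers of live processes.
    Messages are logged as (type, sender, receiver) tuples.
    """
    if initiator not in alive:
        raise ValueError("the initiator must be a live process")
    messages = []
    started = set()
    coordinator = None
    pending = [initiator]
    while pending:
        p = pending.pop(0)
        if p in started:
            continue                        # p already ran its election
        started.add(p)
        answered = False
        for q in sorted(i for i in ids if i > p):
            messages.append(("election", p, q))
            if q in alive:                  # live higher process answers...
                messages.append(("answer", q, p))
                answered = True
                pending.append(q)           # ...and begins its own election
        if not answered:
            coordinator = p                 # timed out: no higher process is alive
    for q in sorted(ids):
        if q < coordinator and q in alive:
            messages.append(("coordinator", coordinator, q))
    return coordinator, messages
```

With `ids=[1, 2, 3, 4]`, `alive={1, 2, 3}`, and process 1 as the initiator, the sketch reproduces the scenario on the slides: process 3 gets no answer from process 4 and announces itself as coordinator to processes 1 and 2.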

21 Consensus With General Failures
The algorithms we’ve covered so far tolerated only failstop failures
Let’s look at reaching consensus in the presence of more general failures
– Omission
– Byzantine

22 Consensus
All processes agree on the same value (or set of values)
When do you need consensus?
– Leader (master) election
– Mutual exclusion
– Transaction involving multiple parties (banking)
We will look at several variants of the consensus problem
– Consensus
– Byzantine generals

23 System Model
There is a set of processes P_i
There is a set of values {v_0, …, v_(N-1)} proposed by processes
Each process P_i decides on d_i
d_i belongs to the set {v_0, …, v_(N-1)}
Assumptions:
– Synchronous system (for now)
– Failstop failures
– Byzantine failures
– Reliable channels

24 Consensus
Step 1: Propose. P1, P2, and P3 submit their proposed values v1, v2, and v3 to the consensus algorithm
Step 2: Decide. P1, P2, and P3 obtain their decisions d1, d2, and d3 from the consensus algorithm
Courtesy of Jeff Chase, Duke University

25 Consensus (C)
P_i selects d_i from {v_0, …, v_(N-1)}.
All P_i select the same v_k (make the same decision): d_i = v_k
Courtesy of Jeff Chase, Duke University

26 Conditions for Consensus
Termination: All correct processes eventually decide.
Agreement: All correct processes select the same d_i.
Integrity: If all correct processes propose the same v, then d_i = v

27 Consensus in a Synchronous System Without Failures
Each process p_i proposes a decision value v_i
All proposed v_i are sent around, such that each process knows all proposed v_i
Once all processes receive all proposed v’s, they apply to them the same function, such as minimum(v_1, v_2, …, v_N)
Each process p_i sets d_i = minimum(v_1, v_2, …, v_N)
The consensus is reached
What if processes fail? Can other processes still reach an agreement?
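
The failure-free case is small enough to write out directly. The sketch below assumes every message is delivered, so the all-to-all exchange is modelled by giving each process the full set of proposals; the function name `consensus_no_failures` is hypothetical.

```python
def consensus_no_failures(proposals):
    """proposals: dict mapping process id -> proposed value.

    Returns a dict mapping process id -> decision value.
    """
    # All-to-all exchange: every process learns every proposed value.
    known = {p: set(proposals.values()) for p in proposals}
    # Every process applies the same deterministic function (minimum).
    return {p: min(values) for p, values in known.items()}
```

Because all processes hold the same set of values and apply the same function, they necessarily decide the same value; any other deterministic function would work as well as the minimum.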

28 Consensus in a Synchronous System With Failstop & Omission Failures
We assume that at most f out of N processes fail
To reach a consensus despite f failures, we must extend the algorithm to take f+1 rounds
At round 1: each process p_i sends its proposed v_i to all other processes and receives v’s from other processes
At each subsequent round, process p_i sends v’s that it has not sent before and receives new v’s
The algorithm terminates after f+1 rounds
Let’s see why it works…
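
The f+1-round algorithm can be simulated with an explicit crash schedule. This is a sketch under stated assumptions: the function name `flood_consensus` and the `crashes` format are hypothetical, a crashing process delivers its round's messages only to a chosen subset of recipients and then stops, and the decision function is the minimum, as on the previous slide.

```python
def flood_consensus(proposals, f, crashes=None):
    """Run f+1 rounds of value flooding; return decisions of the survivors.

    proposals: dict pid -> proposed value.
    crashes: dict pid -> (round, recipients reached before crashing).
    """
    crashes = crashes or {}
    known = {p: {v} for p, v in proposals.items()}   # values each process holds
    sent = {p: set() for p in proposals}             # values already forwarded
    alive = set(proposals)
    for rnd in range(1, f + 2):                      # rounds 1 .. f+1
        inbox = {p: set() for p in proposals}
        for p in sorted(alive):
            new = known[p] - sent[p]                 # only values not sent before
            targets = set(proposals) - {p}
            if p in crashes and crashes[p][0] == rnd:
                targets = set(crashes[p][1])         # partial send, then crash
            for q in targets:
                inbox[q] |= new
            sent[p] |= new
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for p in alive:
            known[p] |= inbox[p]
    return {p: min(known[p]) for p in alive}
```

As a usage example, let process 4 propose the smallest value and crash in round 1 after reaching only process 1. With f = 1 (two rounds), process 1 forwards the value in round 2 and all survivors agree; with only one round, process 1 would decide differently from processes 2 and 3.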

29 Proof that Consensus is Reached
We will prove this by contradiction
Suppose some correct process p_i possesses a value that another correct process p_j does not possess
This must have happened because some other process p_k sent that value to p_i but crashed before sending it to p_j (or the message was lost)
The crash must have happened in round f+1 (the last round). Otherwise, p_i would have sent that value to p_j in round f+1
But why did p_j not receive that value in any of the previous rounds?
There must have been a crash in every previous round: some process sent the value to some other processes, but did not send it to p_j
But this implies that there must have been f+1 failures
This is a contradiction: we assumed at most f failures

30 A Take-Away Point
If you cannot build a fully failproof algorithm...
Build an algorithm that is guaranteed to tolerate some number f of failures
Then build a system that has no more than f failures with high probability

31 Byzantine Generals Problem (BG)
Two types of generals: commander and subordinates
A commander proposes an action (v_i).
Subordinates must agree: d_i = v_leader, d_j = v_leader
[Figure: the leader (commander) sends v_leader to each subordinate (lieutenant)]
Courtesy of Jeff Chase, Duke University

32 Conditions for Consensus
Termination: All correct processes eventually decide.
Agreement: All correct processes select the same d_i.
Integrity: If the commander is correct, then all correct processes decide on the value that the commander proposed

33 Consensus in a Synchronous System With Byzantine Failures
Byzantine failure: a process can forward to another process an arbitrary value v
Byzantine generals: the commander...
– says to one lieutenant that v = A
– says to another lieutenant that v = B
We will show that consensus is impossible with only 3 generals
Pease et al. generalized this: consensus is impossible if N ≤ 3f, where N is the number of generals and f is the number of faulty generals

34 BG: Impossibility With Three Generals
Scenario 1: p2 must decide v (by the integrity condition)
But p2 cannot distinguish between Scenario 1 and Scenario 2
If it decides to believe the commander, it will decide v in Scenario 2
By symmetry, p3 will decide u in Scenario 2
p2 and p3 will have reached different decisions
[Figure: two scenarios with p1 (Commander), p2, and p3; faulty processes are shown shaded (p3 in Scenario 1, the commander in Scenario 2). Messages in Scenario 1: 1:v, 2:1:v, 3:1:u. Messages in Scenario 2: 1:u, 1:v, 2:1:v, 3:1:u. “3:1:u” means “3 says 1 says u”.]

35 Solution With Four Byzantine Generals
We can reach consensus if there are 4 generals and at most 1 is faulty
Intuition: use the majority rule
[Figure: a correct process asks “Who is telling the truth?” Majority rules!]

36 Solution With Four Byzantine Generals
Round 1: The commander sends a value to all other generals
Round 2: The lieutenants exchange the values that they received from the commander
The decision is made based on majority
[Figure: two scenarios with p1 (Commander), p2, p3, and p4; faulty processes are shown shaded. Left: the commander sends 1:v to p2, p3, and p4; relayed messages are 2:1:v, 4:1:v, 3:1:u, and 3:1:w. Right: the commander sends 1:u, 1:w, and 1:v; relayed messages are 2:1:u, 3:1:w, and 4:1:v.]

37 Solution With Four Byzantine Generals
[Figure: p1 (Commander) sends 1:v to p2, p3, and p4; p2 and p4 relay 2:1:v and 4:1:v; p3 relays 3:1:u and 3:1:w]
p2 receives: {v, v, u}. Decides v
p4 receives: {v, v, w}. Decides v

38 Solution With Four Byzantine Generals
[Figure: p1 (Commander) sends 1:u, 1:w, and 1:v; the lieutenants relay 2:1:u, 3:1:w, and 4:1:v]
p2 receives: {u, w, v}. Decides NULL
p4 receives: {u, v, w}. Decides NULL
p3 receives: {w, u, v}. Decides NULL
The result generalizes to systems with N ≥ 3f + 1 (N is the number of processes, f is the number of faulty processes)
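
The two rounds and the majority rule for four generals can be sketched as follows. The names `majority`, `lieutenant_decisions`, `relay`, and `max_faulty` are hypothetical, NULL is represented by `None`, and faulty behaviour is injected by the caller through the `relay` function; this is a model of the one-faulty-general case only, not a general Byzantine agreement implementation.

```python
from collections import Counter


def majority(values):
    """Return the value held by a strict majority, or None (NULL)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def lieutenant_decisions(from_commander, relay):
    """from_commander: dict lieutenant -> value received in round 1.

    relay(sender, receiver, value) returns what `sender` tells `receiver`
    in round 2; a correct sender returns `value` unchanged.
    """
    decisions = {}
    for p in from_commander:
        received = [from_commander[p]]                   # round 1
        for q in from_commander:
            if q != p:
                received.append(relay(q, p, from_commander[q]))  # round 2
        decisions[p] = majority(received)
    return decisions


def max_faulty(n):
    """Largest f satisfying n >= 3f + 1."""
    return (n - 1) // 3
```

Feeding in the two scenarios from the slides gives the same outcomes: with a faulty p3 the correct lieutenants p2 and p4 both decide v, and with a faulty commander sending u, w, and v every lieutenant finds no majority and decides NULL, which is still agreement.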

39 Consensus in an Asynchronous System
In the algorithms we’ve looked at, consensus was reached by using several rounds of communication
The systems were synchronous, so each round always terminated
If a process did not receive a message from another process in a given round, it could assume that the process was faulty
In an asynchronous system this assumption cannot be made!
Fischer-Lynch-Paterson (1985): No consensus can be guaranteed in an asynchronous communication system in the presence of any failures, even a single crash.
Intuition: a “failed” process may just be slow, and can rise from the dead at exactly the wrong time.

40 Consensus in Practice
Real distributed systems are by and large asynchronous
How do they operate if consensus cannot be guaranteed?
Assume a synchronous system: use manual fault resolution if something goes wrong
Fault masking: assume that failed processes always recover, and define a way to reintegrate them into the group.
– If you haven’t heard from a process, just keep waiting…
– A round terminates when every expected message is received.
Failure detectors: construct a failure detector that can determine if a process has failed.
– A round terminates when every expected message is received, or the failure detector reports that its sender has failed.

41 Failure Detectors
First problem: how to detect that a member has failed?
– pings, timeouts, beacons, heartbeats
– recovery notifications
Is the failure detector accurate?
– Does it accurately detect failures?
Is the failure detector live?
– Are there bounds on failure detection time?
In an asynchronous system, it is impossible for a failure detector to be both accurate and live
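
A heartbeat-and-timeout detector, one of the mechanisms listed above, can be sketched in a few lines. The class name `HeartbeatDetector` and its methods are hypothetical, and time is passed in explicitly so the example stays deterministic; a real detector would read a clock and receive heartbeats over the network.

```python
class HeartbeatDetector:
    """Suspect a process if no heartbeat arrived within `timeout` time units."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}              # pid -> time of the latest heartbeat

    def heartbeat(self, pid, now):
        """Record a heartbeat from `pid` received at time `now`."""
        self.last_seen[pid] = now

    def suspected(self, now):
        """Return the set of processes currently suspected of having failed."""
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}
```

The trade-off from the slide shows up directly in the `timeout` parameter: a fixed timeout bounds detection time (the detector is live), but a process that is merely slow is suspected wrongly and then cleared again when its late heartbeat arrives (the detector is not accurate).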

42 Summary
Coordination and agreement are essential in real distributed systems
Real distributed systems are asynchronous
Consensus cannot be guaranteed in an asynchronous distributed system
Nevertheless, people still build useful distributed systems that rely on consensus
Fault recovery and masking are used as mechanisms for helping processes reach consensus
Popular fault masking and recovery techniques are transactions and replication, the topics of the next few lectures