
1 Commitment and Mutual Exclusion CS 188 Distributed Systems February 18, 2015

2 Introduction Many distributed systems require that participants agree on something On changes to important data On the status of a computation On what to do next Reaching agreement in a general distributed system is challenging

3 Commitment Reaching agreement in a distributed system is extremely important Usually impossible to control a system’s behavior without agreement One approach to agreement is to get all participants to prepare to agree Then, once prepared, to take the action

4 Challenges to Commitment
There are challenges to ensuring that commitment occurs Different nodes’ actions aren’t synchronous Communication only via messages Other actions can intervene Failures can occur

5 For Example, An optimistically replicated file system like Ficus
We want to be able to add replicas of a volume Which is a lot easier to do if all nodes hosting existing replicas agree

6 The Scenario
[Figure: nodes A, B, and C each hold a replica of a volume, with version vectors covering elements for A, B, and C. A new node D says "I want a replica, too!" But we need a version vector element for the new replica.]

7 So What’s the Problem? A and C don’t know about the new replica
But they can learn about it as soon as they contact B So why is there any difficulty?

8 One Problem
[Figure: new replicas D and E have been added and updates are applied ("Now for some updates!"). The result: different updates at different replicas, but the same version vector.]
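To make the ambiguity concrete, here is a minimal sketch in Python (hypothetical replica names and counters, not Ficus code) of how two replicas can hold different data yet compare as identical when the new replica has no version vector element:

```python
# Minimal sketch of the version-vector ambiguity (hypothetical names, not Ficus code).
# A and C never learned about the new replicas, so their vectors have no element
# for them, and updates routed through the new replicas leave no trace A and C can see.

def dominates(v1, v2):
    """True if v1 has seen at least everything v2 has (element-wise >=)."""
    return all(v1.get(k, 0) >= v2.get(k, 0) for k in set(v1) | set(v2))

vector_at_A = {'A': 5, 'B': 7, 'C': 3}   # A applied one update via a new replica
vector_at_C = {'A': 5, 'B': 7, 'C': 3}   # C applied a *different* update via another

# The file contents now differ, but the vectors are identical, so neither
# dominates the other and the conflict is invisible: different updates,
# same version vector.
print(dominates(vector_at_A, vector_at_C))   # True
print(dominates(vector_at_C, vector_at_A))   # True
```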

9 And It Can Be a Lot Worse What if replicas are being added and dropped frequently? How will we keep track of which ones are live and which ones are which? It can get very confusing

10 But That’s Not What I Want To Do, Anyway
A common answer from system designers They don’t care about the odd corner cases They don’t expect them to happen So why pay a lot to handle them right? Sometimes a reasonable answer . . .

11 Why You Should Care If you allow a system to behave a certain way
Even if you don’t think it ever will And your system is widely deployed and used Sooner or later that improbable thing will happen And who knows what happens next?

12 The Basic Solution Use a commitment protocol
To ensure that all participating nodes understand what’s happening And agree to it Handles issues of concurrency and failures

13 Transactions A mechanism to achieve commitment By ensuring atomicity
Also consistency, isolation, and durability Very important in the database community A set of asynchronous request/reply communications Either all of the set complete or none do

14 Transactions and ACID Properties
ACID - Atomicity, Consistency, Isolation, and Durability Atomicity - all happen or none Consistency - Outcome equivalent to some serial ordering of actions Isolation - Partial results are invisible outside the transaction Durability - Committed transactions survive crashes and other failures
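As a toy illustration of the atomicity property only (a hypothetical transfer function, not how any real database implements it), an operation either applies all of its updates or none of them:

```python
# Toy illustration of atomicity only (hypothetical transfer function, not a
# real database mechanism): either both account updates happen or neither does.

def transfer(accounts, src, dst, amount):
    snapshot = dict(accounts)            # remember the pre-transaction state
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount          # both updates succeed together...
    except Exception:
        accounts.clear()
        accounts.update(snapshot)        # ...or neither takes effect
        raise

accounts = {'checking': 100, 'savings': 0}
transfer(accounts, 'checking', 'savings', 30)
print(accounts)                          # {'checking': 70, 'savings': 30}
```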

15 Achieving the ACID Properties
In a distributed environment, use the two-phase commit protocol A unanimous voting protocol Do something only if all participants agree it should be done Essentially, hold on to the results of a transaction until all participants agree

16 Basics of Two-Phase Commit
Run at the end of all application actions in a transaction Must end in commit or abort decision Must work despite delays and failures Require access to stable storage Usually started by a coordinator But coordinator has no more power than any other participant

17 The Two Phases Phase one: prepare to commit
All participants are informed that they should get ready to commit All agree to do so Phase two: commitment Actually commit all effects of the transaction

18 Outline of Two-Phase Commit Protocol
1. Coordinator writes prepare to his local stable log 2. Coordinator sends prepare message to all other participants 3. Each participant either prepares or aborts, writing choice to its local log 4. Each participant sends his choice to the coordinator

19 The Two-Phase Commit Protocol, continued
5. The coordinator collects all votes 6. If all participants vote to commit, coordinator writes commit to its log 7. If any participant votes to abort, coordinator writes abort to its log 8. Coordinator sends his decision to all others

20 The Two-Phase Commit Protocol, concluded
9. If other participants receive a commit message, write commit to log and release transaction resources 10. If other participants receive an abort message, write abort to log and release transaction resources 11. Return acknowledgement to coordinator
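Read together, steps 1 through 11 describe a simple message exchange. Here is a minimal single-process sketch of that exchange in Python; the class and function names are invented for illustration, and a real implementation needs real networking, timeouts, and genuinely stable (crash-surviving) logs.

```python
# Minimal sketch of the two-phase commit exchange outlined in steps 1-11.
# Invented names; logs are plain lists standing in for stable storage.

class Participant:
    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.log = []

    def on_prepare(self):
        vote = 'prepared' if self.will_commit else 'abort'
        self.log.append(vote)              # step 3: log the choice first
        return vote                        # step 4: send the choice back

    def on_decision(self, decision):
        self.log.append(decision)          # steps 9-10: record commit or abort
        return 'ack'                       # step 11: acknowledge

def run_coordinator(participants):
    log = ['prepare']                      # step 1: coordinator logs prepare
    votes = [p.on_prepare() for p in participants]           # steps 2-5
    decision = 'commit' if all(v == 'prepared' for v in votes) else 'abort'
    log.append(decision)                   # steps 6-7: decision made durable
    acks = [p.on_decision(decision) for p in participants]   # step 8 onward
    return decision, acks

nodes = [Participant('node2'), Participant('node3'), Participant('node4')]
print(run_coordinator(nodes))              # ('commit', ['ack', 'ack', 'ack'])
```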

21 A Two-Phase Commit Example
[Figure: node 1 is the coordinator. It sends prepare to nodes 2, 3, and 4; each records prepared and replies. Since all voted yes, node 1 sends commit to every node and each records committed.]

22 What About the Abort Case?
Same as commit, except not everyone voted yes Instead of committing, send aborts And abort locally at coordinator On receipt of an abort message, undo everything

23 Overheads of Two-Phase Commit
For n participants, 4*(n-1) messages Each participant (except coordinator) gets a prepare and a commit message Each participant (except coordinator) sends a prepared and a committed message Can optimize committed messages away With potential cost of serious latencies in clearing log records

24 Two-Phase Commit and Failures
Two-phase commit behaves well in the face of all single node failures May not be able to commit But will cleanly commit or abort And, if anyone commits, eventually everyone will Assumes fail-stop failures

25 Some Failure Examples: Example 1
[Figure: the coordinator (node 1) fails after sending prepare, so not all participants get the prepare message. Nodes 2, 3, and 4 consult on timeout and abort.]

26 Some Failure Examples: Example 2
[Figure: node 4 fails before it replied to prepare. Node 1 never got a response from node 4, so it sends abort to nodes 2 and 3.]

27 Some Failure Examples: Example 3
[Figure: node 4 replies prepared and then fails. All voted yes, so node 1 sends commit, but it never gets the committed message from node 4. What happens if node 4 recovers? It consults its log, notices it was prepared, queries the commit status, and then commits.]

28 Handling Failures Non-failed nodes can still recover the outcome if some participants failed The coordinator can determine what the other nodes did Did we commit or did we not? If the coordinator failed, a new coordinator can be elected And it can determine the state of the commit Except . . .

29 An Issue With Two-Phase Commit
What if both the coordinator and another node fail? During the commit phase Two possibilities The other failed node committed The other failed node did not commit

30 Possibility 1
[Figure: during the commit phase, the coordinator (node 1) and node 2 both fail; node 2 had received the commit message and committed before failing.]

31 Possibility 2
[Figure: during the commit phase, the coordinator (node 1) and node 2 both fail; node 2 had only received prepare and did not commit.]

32 What Do the Other Nodes Do?
Here's what they see, in both cases: [Figure: the surviving nodes received only prepare, and their view is identical in both scenarios.] But what happened at the failed nodes? This? Or this?

33 Why Does It Matter? Well, why?
Consider, for each case, what would have happened if node 2 hadn’t failed

34 Handling the Problem Go to three phases instead of two
Third phase provides the necessary information to distinguish the cases So if this two node failure occurs, other nodes can tell what happened

35 Three Phase Commit
[State diagram, coordinator and participant(s): the coordinator sends canCommit and waits; each participant receiving canCommit sends an ack (OK) or a nak. A nak or timeout leads the coordinator to abort; once all have acked, it sends startCommit. Participants receiving startCommit ack and enter the prepared state; again a nak or timeout leads to abort. Once all have acked startCommit, the coordinator sends Commit; participants perform the commit, send an ack, and the coordinator confirms.]

36 Why Three Phases? First phase tells everyone a commit is in progress
Second phase ensures that everyone knows that everyone else was told No chance that only some were told Third phase actually performs the commit Three phases ensure that a failure of the coordinator plus another participant is unambiguous
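A rough sketch of the coordinator's three rounds, using the canCommit/startCommit/Commit message names from the state diagram; send_to_all and collect_acks are assumed helper callables, and the timeout and new-coordinator recovery paths are omitted:

```python
# Rough sketch of the coordinator's three rounds (canCommit, startCommit, Commit).
# send_to_all and collect_acks are assumed helper callables, not a real API;
# timeouts, participant recovery, and coordinator election are omitted.

def three_phase_coordinator(participants, send_to_all, collect_acks):
    # Phase 1: ask whether everyone can commit.
    send_to_all(participants, 'canCommit')
    if not collect_acks(participants, expect='ok'):
        send_to_all(participants, 'abort')
        return 'abort'

    # Phase 2: tell everyone that everyone said yes. After this round every
    # live participant knows the whole group agreed, which is exactly the
    # information two-phase commit lacks when the coordinator and another
    # participant both fail.
    send_to_all(participants, 'startCommit')
    if not collect_acks(participants, expect='prepared'):
        send_to_all(participants, 'abort')
        return 'abort'

    # Phase 3: actually perform the commit.
    send_to_all(participants, 'Commit')
    collect_acks(participants, expect='committed')
    return 'commit'
```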

37 How Does This Work?
[Figure: nodes 3 and 4 have startCommit records in their logs. These status records tell us more than the prepare record did: a startCommit record means node 2 acked the canCommit message and node 1 knew all participants did a canCommit, so it's safe to commit at nodes 3 and 4.]

38 Overhead of Three Phase Commit
For n participants, 6*(n-1) messages Each participant (except coordinator) gets a canCommit, startCommit, and a doCommit message Each participant (except coordinator) ACKs each of those messages Again, the final ACK can be optimized away But the coordinator can't delete its record till it knows of all ACKs

39 Distributed Mutual Exclusion
Another common problem in synchronizing distributed systems One-way communications can use simple synchronization Built into the paradigm Or handled at the shared server More general communications require more complex synchronization To ensure multiple simultaneously running processes interact properly

40 Synchronization and Mutual Exclusion
Mutual exclusion ensures that only one of a set of participants uses a resource At any given moment In certain cases, that’s all the synchronization required In other cases, more synchronization can be built on top of mutual exclusion

41 The Basic Mutual Exclusion Problem
n independent participants are sharing a resource In distributed case, each participant on a different node At any moment, only one participant can use the resource Must avoid deadlock, ensure fairness, and use few resources

42 Mutual Exclusion Approaches
Contention-based Controlled

43 Contention-Based Mutual Exclusion
Each process freely and equally competes for the resource Some algorithm is used to evaluate a request resolution criterion Timestamps, priorities, and voting are ways to resolve conflicting requests The approach assumes everyone cooperates and follows the rules

44 Timestamp Schemes Whoever asked first should get the resource
Runs into obvious problems of distributed clocks Usually handled with logical clocks, not physical clocks

45 Lamport’s Mutual Exclusion Algorithm
Uses Lamport clocks With total order Assumes N processes Any pair can communicate directly Assumes reliable, in-order delivery of messages Though arbitrary message delays allowable

46 Outline of Lamport’s Algorithm
Each process keeps a queue of requests When process wants the resource, it adds request to local queue, in order Sends REQUEST to all other processes All other processes send REPLY msgs When done with resource, process sends RELEASE msg to all others Lamport timestamps on all msgs

47 When Does Someone Get the Resource?
A requesting process gets the resource when: 1) It has received replies from all other processes 2) Its request is at the top of its queue 3) A RELEASE message was received
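A compact sketch of one process following the outline and conditions above; the class and method names are invented, and message delivery is abstracted into a send callback rather than a real transport:

```python
# Sketch of one process in Lamport's mutual exclusion algorithm (invented class
# name; "send" stands in for real message transport). Requests are
# (timestamp, process_id) pairs, so the Lamport total order breaks ties by id.

import heapq

class LamportMutex:
    def __init__(self, my_id, all_ids):
        self.my_id = my_id
        self.others = [p for p in all_ids if p != my_id]
        self.clock = 0
        self.queue = []            # local request queue, kept as a min-heap
        self.replies = set()
        self.my_request = None

    def request(self, send):
        self.clock += 1
        self.my_request = (self.clock, self.my_id)
        heapq.heappush(self.queue, self.my_request)
        self.replies.clear()
        for p in self.others:
            send(p, ('REQUEST', self.my_request))

    def on_message(self, sender, msg, send):
        kind, stamp = msg
        self.clock = max(self.clock, stamp[0]) + 1        # Lamport clock update
        if kind == 'REQUEST':
            heapq.heappush(self.queue, stamp)
            send(sender, ('REPLY', (self.clock, self.my_id)))
        elif kind == 'REPLY':
            self.replies.add(sender)
        elif kind == 'RELEASE':
            self.queue = [r for r in self.queue if r[1] != sender]
            heapq.heapify(self.queue)

    def have_resource(self):
        # Holds the resource when its own request heads the local queue and
        # every other process has replied (so no earlier request is in flight).
        return (self.queue and self.queue[0] == self.my_request
                and self.replies == set(self.others))

    def release(self, send):
        self.queue = [r for r in self.queue if r[1] != self.my_id]
        heapq.heapify(self.queue)
        for p in self.others:
            send(p, ('RELEASE', (self.clock, self.my_id)))
```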

48 Lamport’s Algorithm At Work
[Figure: B requests the resource with timestamp (B, 11) while A's earlier request (A, 10) is already queued at every process. All processes REPLY to B; when A sends RELEASE, A's entry is removed from every queue, B's request reaches the head, and B receives the resource.]

49 Dealing With Multiple Requests
[Figure: B and C both request the resource while A's request (A, 10) is still queued; B and C send REQUEST messages and both requests are queued behind A's at every process, in timestamp order. When A releases the resource, B's request is at the head of the queue, so B receives the resource and C continues to wait.]

50 Complexity of Lamport Algorithm
For N participants, 3*(N-1) messages per completion of the critical section Requester sends N-1 REQUEST messages N-1 other processes each REPLY When the requester relinquishes the critical section, it sends N-1 RELEASE messages

51 A Problem With Lamport Algorithm
One slow/failed process can cripple anyone getting the resource Since no process can claim the resource unless it knows all other processes have seen its request

52 Voting Schemes Processes vote on who should get the shared resource next Can work even if one process fails Or even if a minority of processes fail Variants can allow weighted voting

53 Basics of Voting Algorithms
Process needing shared resource sends a REQUEST to all other processes Each process receiving a request checks if it has already voted for someone else If not, it votes for the requester By replying

54 Obtaining the Shared Resource In Voting Schemes
When a requester gets replies from a majority of voters, it gets the resource Since any voting process only replies to one requester at a time, only one requester can get a majority When done with the resource, send a RELEASE message to all who voted for this process
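A minimal sketch of a single voter's rule and the requester's majority check (invented names; the vote-changing deadlock fix discussed on the next slide is not included):

```python
# Sketch of majority-voting mutual exclusion (illustrative names only).
# Each voter grants its single vote to the first unanswered REQUEST it sees;
# a requester enters only once it holds votes from a strict majority.

class Voter:
    def __init__(self):
        self.voted_for = None

    def on_request(self, requester):
        if self.voted_for is None:          # only one outstanding vote at a time
            self.voted_for = requester
            return True                     # the REPLY is the vote
        return False

    def on_release(self, requester):
        if self.voted_for == requester:
            self.voted_for = None           # free to vote for someone else

def try_acquire(requester, voters):
    votes = sum(1 for v in voters if v.on_request(requester))
    return votes > len(voters) // 2         # strict majority => exclusive access

voters = [Voter() for _ in range(5)]
print(try_acquire('P1', voters))            # True: P1 collects all five votes
print(try_acquire('P2', voters))            # False: no votes left until P1 releases
```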

55 Avoiding Deadlock If more than two processes request the resource, sometimes no one wins a majority Effectively a deadlock condition Can be fixed by allowing processes to change their votes Requires permission from the process that originally got the vote

56 Complexity of Voting Schemes for Mutual Exclusion
O(N) messages per use of the resource, for reasons similar to the Lamport discussion Use of quorums can reduce this to O(SQRT(N))

57 Token Based Mutual Exclusion
Maintain a token shared by all processes needing the resource Current holder of the token has access to resource To gain access to resource, must obtain token

58 Obtaining the Token Typically done by asking for it through some topology of the processes Ring Tree Broadcast

59 Ring Topologies for Tokens
The token circulates along a pre-defined logical ring of processes When the token arrives, if the local process wants the resource, it holds the token while using it Once finished, the token is passed on Good for high loads, but high overhead for low loads
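A small simulation of the ring rule (all the "nodes" live in one process purely for illustration; in practice each node runs this logic and forwards the token over the network):

```python
# Simulated token ring (all "nodes" live in one process for illustration).
# The token visits nodes in a fixed cycle; a node keeps it only while it
# wants the resource, then forwards it to its successor.

def simulate_ring(wants_resource, rounds=2):
    n = len(wants_resource)
    holder = 0
    order_of_use = []
    for _ in range(rounds * n):
        if wants_resource[holder]:
            order_of_use.append(holder)      # node uses the resource
            wants_resource[holder] = False   # finished with it
        holder = (holder + 1) % n            # pass the token along the ring
    return order_of_use

# Nodes 1 and 3 want the resource; the token grants them access in ring order.
print(simulate_ring([False, True, False, True]))   # [1, 3]
```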

60 A Token Ring

61 Tree Topologies Only pass token when needed
Use a tree structure to pass requests from requesting process to current token holder When token passed, re-arrange the tree to put new token holder at root

62 Broadcast Topologies When a process wants the token, it sends a request to all other processes If current token holder isn’t using it, it sends the token to requester If the token is in use, its holder adds the request to the queue Use timestamp scheme to order the queue
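A rough sketch of this broadcast scheme from the current token holder's point of view (illustrative names only; broadcasting the request and shipping the token over the network are abstracted into callbacks):

```python
# Rough sketch of the broadcast scheme from the token holder's point of view
# (illustrative only; send_token stands in for shipping the token to a node,
# after which this node is no longer the holder -- not modeled further here).

import heapq

class TokenHolder:
    def __init__(self):
        self.using_resource = False
        self.pending = []                    # (timestamp, requester) min-heap

    def on_request(self, timestamp, requester, send_token):
        if not self.using_resource:
            send_token(requester)            # idle: hand the token over now
        else:
            heapq.heappush(self.pending, (timestamp, requester))

    def on_finished(self, send_token):
        self.using_resource = False
        if self.pending:
            _, nxt = heapq.heappop(self.pending)
            send_token(nxt)                  # oldest timestamped request first
```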

63 A Common Problem With Token Schemes
What happens if the token-holder fails? Could keep token in stable storage But still unavailable until token-holder recovers Could create new token Must be careful not to end up with two tokens, though Typically by running voting algorithm

