# Distributed Systems Overview Ali Ghodsi

Distributed Systems Overview Ali Ghodsi alig@cs.berkeley.edu

Replicated State Machine (RSM)

- Distributed Systems 101
  - Fault-tolerance (partial, byzantine, recovery, ...)
  - Concurrency (ordering, asynchrony, timing, ...)
- Generic solution for distributed systems: Replicated State Machine approach
  - Represent your system with a deterministic state machine
  - Replicate the state machine
  - Feed input to all replicas in the same order
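The three steps of the replicated state machine approach can be illustrated with a small sketch. The key-value machine, the command tuples, and the helper `run_replicas` are hypothetical names chosen for illustration; the slide does not prescribe any particular state machine.

```python
class KVStateMachine:
    """A deterministic state machine: the next state depends only on
    the current state and the command (no clocks, no randomness)."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "del":
            self.state.pop(key, None)


def run_replicas(log, n_replicas=3):
    """Replicate the machine and feed every replica the same
    commands in the same order."""
    replicas = [KVStateMachine() for _ in range(n_replicas)]
    for command in log:
        for replica in replicas:
            replica.apply(command)
    return [replica.state for replica in replicas]


log = [("set", "x", 1), ("set", "y", 2), ("del", "x", None), ("set", "x", 3)]
states = run_replicas(log)
print(states)
```

Because the machine is deterministic and the order is identical, all replicas end in the same state. Agreeing on that order in the presence of failures is the hard part, which the following slides address.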

Total Order Reliable Broadcast aka Atomic Broadcast

- Reliable broadcast
  - Either all correct nodes get the message or none do (even if the source fails)
- Atomic Broadcast
  - Reliable broadcast that guarantees: all messages are delivered in the same order
- Replicated state machine is trivial with atomic broadcast

Consensus?

- Consensus problem
  - All nodes propose a value
  - All correct nodes must agree on one of the values
  - Must eventually reach a decision (availability)
- Atomic Broadcast → Consensus
  - Broadcast proposal, decide on first received value
- Consensus → Atomic Broadcast
  - Unreliably broadcast message to all
  - 1 consensus per round: propose set of messages seen but not delivered
  - Each round deliver one decided message
- Consensus equivalent to Atomic Broadcast
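The Consensus → Atomic Broadcast direction can be sketched as a round-based loop. This is a single-process illustration under stated assumptions: `consensus` is a hypothetical stand-in for a real consensus instance (it only mimics the interface: every node gets the same decision, and the decision is one of the proposals), and picking the smallest message of the decided set is one arbitrary deterministic rule.

```python
def consensus(proposals):
    """Hypothetical stand-in for one consensus instance: returns the
    non-empty proposal of the lowest node id (same answer for everyone)."""
    for node_id in sorted(proposals):
        if proposals[node_id]:
            return proposals[node_id]
    return frozenset()


def atomic_broadcast(seen):
    """seen: node id -> set of messages that node received via
    unreliable broadcast. Returns node id -> delivery order."""
    delivered = {node: [] for node in seen}
    while True:
        # Each node proposes the messages it has seen but not delivered.
        proposals = {
            node: frozenset(m for m in msgs if m not in delivered[node])
            for node, msgs in seen.items()
        }
        decided = consensus(proposals)
        if not decided:
            break
        # Deliver one decided message per round, chosen deterministically.
        msg = min(decided)
        for node in delivered:
            delivered[node].append(msg)
    return delivered


seen = {1: {"b", "a"}, 2: {"c"}, 3: {"a", "c"}}
delivered = atomic_broadcast(seen)
print(delivered)
```

Note that a node delivers a decided message even if its own unreliable broadcast never received it, since the decision itself carries the message.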

Consensus impossible

- No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model
- 1-crash-robust
  - Up to one node may crash
- Asynchronous model
  - No global clock
  - No bounded message delay
- Life after impossibility of consensus? What to do?

Solving Consensus with Failure Detectors

- Black box that tells us if a node has failed
- Perfect failure detector
  - Completeness: it will eventually tell us if a node has failed
  - Accuracy (no lying): it will never tell us a node has failed if it hasn't
- Perfect FD → Consensus (code for node p with estimate x_i; node r coordinates round r)
  - `x_i := input`
  - `for r := 1 to N do`
  - `    if r = p then forall j do send x_i to j; decide x_i`
  - `    if collect x' from r then x_i := x'`
  - `end`
  - `decide x_i`
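The rotating-coordinator algorithm on this slide can be simulated in a few lines. This is a sketch under simplifying assumptions that are not in the slides: rounds run in lockstep, a faulty node crashes only during its own round as coordinator, and the hypothetical `partial_sends` schedule says which nodes its broadcast reached before the crash. The perfect failure detector is modeled implicitly: nodes skip a coordinator only if it really crashed (accuracy), and they never block on a crashed one (completeness).

```python
def rotating_coordinator(inputs, partial_sends=None):
    """inputs: proposals, node p proposes inputs[p - 1].
    partial_sends: coordinator id -> set of nodes reached before it crashed.
    Returns the decisions of the nodes that did not crash."""
    partial_sends = partial_sends or {}
    n = len(inputs)
    nodes = range(1, n + 1)
    x = {p: inputs[p - 1] for p in nodes}   # current estimate of each node
    crashed = set()
    for r in nodes:                         # node r coordinates round r
        if r in partial_sends:
            reached = partial_sends[r]      # crash in the middle of the broadcast
            crashed.add(r)
        else:
            reached = set(nodes)            # correct coordinator reaches everyone
        for j in reached:
            if j not in crashed:
                x[j] = x[r]                 # adopt the coordinator's estimate
        # Nodes not reached learn from the FD that r crashed and move on.
    return {p: x[p] for p in nodes if p not in crashed}


one_crash = rotating_coordinator(["a", "b", "c", "d"], {1: {2}})
two_crashes = rotating_coordinator(["a", "b", "c", "d"], {1: {2}, 2: set()})
print(one_crash, two_crashes)
```

In the first run node 1 crashes after reaching only node 2, yet its value survives because node 2 coordinates the next round. In the second run node 2 also crashes, and the first correct coordinator (node 3) imposes its own value. Either way, all surviving nodes agree after N rounds.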

Solving Consensus

- Consensus → Perfect FD?
  - No. Don't know if a node actually failed or not!
- What's the weakest FD to solve consensus?
  - Least assumptions on top of asynchronous model!

Enter Omega

- Leader Election
  - Eventually every correct node trusts some correct node
  - Eventually no two correct nodes trust different correct nodes
- Failure detection and leader election are the same
  - Failure detection captures failure behavior: detect failed nodes
  - Leader election also captures failure behavior: detect correct nodes (a single one, the same for all)
- Formally, leader election is an FD
  - Always suspects all nodes except one (the leader)
  - Ensures some properties regarding that node
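One common way to approximate Ω in practice is heartbeats with adaptive timeouts: trust the lowest-id node that is not currently suspected, and lengthen a node's timeout whenever a suspicion turns out to be false. The sketch below assumes this technique; the class name, method names, and the doubling policy are illustrative choices, not something the slides specify.

```python
class Omega:
    """Eventual leader election from heartbeats (local view of one node)."""

    def __init__(self, nodes, timeout=1.0):
        self.nodes = sorted(nodes)
        self.timeout = {n: timeout for n in self.nodes}
        self.last_seen = {n: 0.0 for n in self.nodes}
        self.suspected = set()

    def heartbeat(self, node, now):
        if node in self.suspected:
            # False suspicion: the node was only slow. Back off so that
            # timeouts eventually exceed the real (unknown) delay bound.
            self.suspected.discard(node)
            self.timeout[node] *= 2
        self.last_seen[node] = now

    def leader(self, now):
        for n in self.nodes:
            if now - self.last_seen[n] > self.timeout[n]:
                self.suspected.add(n)
        trusted = [n for n in self.nodes if n not in self.suspected]
        return trusted[0] if trusted else None


omega = Omega([1, 2, 3])
first = omega.leader(0.5)        # nobody has timed out yet
omega.heartbeat(2, 2.0)
omega.heartbeat(3, 2.0)
second = omega.leader(2.0)       # node 1 has been silent for too long
omega.heartbeat(1, 2.5)          # node 1 was slow, not crashed
third = omega.leader(2.9)
print(first, second, third)
```

The leader may change a few times while timeouts adapt, which is exactly what Ω permits: it only promises that all correct nodes eventually settle on the same correct leader.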

Weakest Failure Detector for Consensus

- Omega is the weakest failure detector for consensus
  - How to prove it?
  - Easy to implement in practice

High Level View of Paxos

- Elect a single proposer using Ω
  - Proposer imposes its proposal on everyone
  - Everyone decides
  - Done!
- Problem with Ω
  - Several nodes might initially be proposers (contention)
- Solution is abortable consensus
  - Proposer attempts to enforce decision
  - Might abort if there is contention (safety)
  - Ω ensures eventually 1 proposer succeeds (liveness)
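Abortable consensus can be sketched as single-decree Paxos in which a proposer gives up (returns `None`) whenever it fails to collect a majority. This is an in-memory illustration under stated assumptions: acceptors are local objects rather than remote nodes, ballots are positive integers that the caller keeps unique per proposer, and the function names are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0            # highest ballot promised so far
        self.accepted = (0, None)    # (ballot, value) last accepted

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted     # promise, reporting any earlier accept
        return None                  # reject: a higher ballot is active

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def majority(acceptors):
    return len(acceptors) // 2 + 1


def phase1(acceptors, ballot):
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    return promises if len(promises) >= majority(acceptors) else None


def phase2(acceptors, ballot, value, promises):
    prev_ballot, prev_value = max(promises, key=lambda p: p[0])
    if prev_ballot > 0:
        value = prev_value           # must adopt a possibly chosen value
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority(acceptors) else None   # None = abort


def propose(acceptors, ballot, value):
    promises = phase1(acceptors, ballot)
    return None if promises is None else phase2(acceptors, ballot, value, promises)


acceptors = [Acceptor() for _ in range(3)]
promises_a = phase1(acceptors, 1)                     # proposer A prepares
promises_b = phase1(acceptors, 2)                     # proposer B preempts A
result_a = phase2(acceptors, 1, "A", promises_a)      # A aborts
result_b = phase2(acceptors, 2, "B", promises_b)      # B decides
result_c = propose(acceptors, 3, "C")                 # latecomer adopts B's value
print(result_a, result_b, result_c)
```

Contention can only cause aborts, never two different decisions. Once Ω leaves a single proposer, that proposer's next attempt with a fresh ballot goes through.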

Replicated State Machine

- Paxos approach (Lamport)
  - Client sends input to leader
  - Leader executes Paxos instance to agree on command
  - Well-understood, many papers, optimizations
- Viewstamped Replication approach (Liskov)
  - Have one leader that writes commands to a quorum (no Paxos)
  - When failures happen, use Paxos to agree
  - Less understood (Mazières tutorial)

Paxos Siblings

- Cheap Paxos (LM'04)
  - Fewer messages
  - Directly contact a quorum (e.g. 3 nodes out of 5)
  - If it fails to get a response from 3, expand to 5
- Fast Paxos (L'06)
  - Reduce from 3 message delays to 2
  - Clients optimistically write to a quorum
  - Requires recovery
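The message-saving idea in the Cheap Paxos bullets (contact a minimal quorum first, widen only on failure) can be sketched as follows. This illustrates only that one bullet, not the full Cheap Paxos protocol; `gather_quorum` and the `contact` callback are hypothetical names, and `contact(n)` is assumed to return `True` when node `n` responds.

```python
def gather_quorum(nodes, contact):
    """Try a minimal majority first; expand to all nodes only if needed.
    Returns (responders forming a quorum or None, messages sent)."""
    majority = len(nodes) // 2 + 1
    preferred, spare = nodes[:majority], nodes[majority:]
    responders = [n for n in preferred if contact(n)]
    messages = len(preferred)
    if len(responders) < majority:
        # Someone in the preferred quorum did not answer: ask the rest.
        responders += [n for n in spare if contact(n)]
        messages += len(spare)
    quorum = responders if len(responders) >= majority else None
    return quorum, messages


nodes = [1, 2, 3, 4, 5]
fast = gather_quorum(nodes, lambda n: True)       # common case: 3 messages
slow = gather_quorum(nodes, lambda n: n != 2)     # node 2 is down: expand to 5
failed = gather_quorum(nodes, lambda n: n <= 2)   # only 2 of 5 respond
print(fast, slow, failed)
```

In the failure-free case only 3 of the 5 nodes are contacted, which is where the message saving comes from.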

Paxos Siblings

- Gaios/SMARTER (Bolosky'11)
  - Make logging to disk efficient for crash-recovery
  - Uses pipelining and batching
- Generalized Paxos (LM'05)
  - Commutative operations for replicated state machine

Atomic Commit

- Atomic Commit
  - Commit IFF no failures and everyone votes commit
  - Else abort
- Consensus on Transaction Commit (LG'04)
  - One Paxos instance for every resource manager (RM)
  - Only commit if every instance said Commit
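The commit rule itself is a one-liner, sketched here with a hypothetical encoding of votes: `True` for commit, `False` for abort, and `None` for a participant that failed before voting. In the Paxos-based variant, each entry would be the outcome of that participant's own consensus instance rather than a direct vote.

```python
def atomic_commit(votes):
    """votes: participant -> True (commit), False (abort), None (failed).
    Commit only if every participant voted commit and none failed."""
    if votes and all(v is True for v in votes.values()):
        return "commit"
    return "abort"


all_yes = atomic_commit({"rm1": True, "rm2": True, "rm3": True})
one_no = atomic_commit({"rm1": True, "rm2": False, "rm3": True})
one_failed = atomic_commit({"rm1": True, "rm2": None, "rm3": True})
print(all_yes, one_no, one_failed)
```

Unlike consensus, where any proposed value may win, a single abort vote or failure forces the outcome here.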

Reconfigurable Paxos

- Change the set of nodes
  - Replace failed nodes
  - Add/remove new nodes (change size of quorum)
- Lamport's idea
  - Part of the state of the state machine: the set of nodes
- SMART (EuroSys'06)
  - Many problems (e.g. {A,B,C} → {A,B,D} and A fails)
  - Basic idea: run multiple Paxos instances side by side
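Lamport's idea of keeping the node set inside the replicated state can be sketched as a state machine whose commands include membership changes. The class and command names are hypothetical, and the sketch leaves out the hard part that SMART addresses, namely when a new configuration takes effect relative to instances that are still in flight.

```python
class ReconfigurableStateMachine:
    """The member set is ordinary replicated state: it changes only by
    applying agreed-upon commands, so all replicas see the same set."""

    def __init__(self, members):
        self.members = set(members)

    def apply(self, command):
        op, node = command
        if op == "add":
            self.members.add(node)
        elif op == "remove":
            self.members.discard(node)

    def quorum_size(self):
        return len(self.members) // 2 + 1


sm = ReconfigurableStateMachine({"A", "B", "C"})
before = sm.quorum_size()
for command in [("remove", "C"), ("add", "D")]:   # {A,B,C} -> {A,B,D}
    sm.apply(command)
replaced = (sorted(sm.members), sm.quorum_size())
sm.apply(("add", "E"))                            # growing changes the quorum size
grown = (sorted(sm.members), sm.quorum_size())
print(before, replaced, grown)
```

Because the membership commands go through the same ordered log as every other command, replicas never disagree about which configuration a given log position belongs to.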