Paxos and Zookeeper Roy Campbell
Motivation
Centralized service: a coordination kernel
Maintains:
– configuration information
– naming
– distributed synchronization
– group services
Avoids synchronization and races
File-system based API:
– Manipulates small data nodes: znodes
– State is a hierarchy of znodes
Visualizing Paxos
The proposer requests that the Paxos system accept some command. Paxos is like a postal system: it thinks about the letter for a while (replicating the data and picking a delivery order). Once these are decided, the learners can execute the command.
[Figure: proposer, coordinator, Acceptor, and learners R1, R2, R3]
Overview of roles of processes
The Client issues a request to the distributed system and waits for a response, for instance a write request on a file in a distributed file server.
The Acceptors act as the fault-tolerant "memory" of the protocol. Acceptors are collected into groups called Quorums. Any message sent to an Acceptor must be sent to a Quorum of Acceptors. Any message received from an Acceptor is ignored unless a copy is received from each Acceptor in a Quorum.
Paxos Assumptions
Processors operate at arbitrary speed.
Processors may experience failures.
Processors with stable storage may re-join the protocol after failures
– Using crash-recovery fault tolerance
Processors do not collude, lie, or otherwise attempt to subvert the protocol.
– i.e. Byzantine failures don't occur. See Byzantine Paxos for a solution that tolerates failures from arbitrary/malicious behavior of the processes.
Paxos Network
Processors can send messages to any other processor.
Messages are sent asynchronously and may take arbitrarily long to deliver.
Messages may be lost, reordered, or duplicated.
Messages are delivered without corruption.
– i.e. Byzantine network failures don't occur. See Byzantine Paxos for a solution.
Number of Processors
In general, a consensus algorithm can make progress using 2F+1 processors despite the simultaneous failure of any F processors. However, using reconfiguration, a protocol may be employed which survives any number of total failures as long as no more than F fail simultaneously.
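The 2F+1 rule can be checked with a little arithmetic. The sketch below assumes majority quorums (the slides do not fix a particular quorum system), and the function names are illustrative only.

```python
def cluster_size(f: int) -> int:
    """Smallest cluster that still makes progress with f simultaneous crash failures."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def majority_quorum(n: int) -> int:
    """Size of a majority quorum among n processors (assumed quorum system)."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest f such that a majority of n processors survives f failures."""
    return (n - 1) // 2


if __name__ == "__main__":
    for f in range(4):
        n = cluster_size(f)
        print(f"F={f}: processors={n}, quorum={majority_quorum(n)}")
```

Any two majority quorums of the same cluster share at least one processor, which is why a value accepted by one quorum is visible to the next. Note that an even-sized cluster tolerates no more failures than the odd size just below it.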
Overview of roles of processes
A Proposer advocates a client request, attempting to convince the Acceptors to agree on it.
Learners act as the replication factor for the protocol. Once a Client request has been agreed on by the Acceptors, the Learner may take action (i.e. execute the request and send a response to the client). To improve availability of processing, additional Learners can be added.
Paxos requires a distinguished Proposer (called the leader) to make progress. Many processes may believe they are leaders, but the protocol only guarantees progress if one of them is eventually chosen. If two processes believe they are leaders, they may stall the protocol by continuously proposing conflicting updates. However, the safety properties are still preserved in that case.
Proposal Number & Agreed Value
Each attempt to define an agreed value v is performed with proposals which may or may not be accepted by Acceptors. Each proposal is uniquely numbered for a given Proposer.
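One common way to obtain such numbers is to pair a local counter with a proposer identifier, so that numbers from different Proposers never collide. This is an assumption, not something the slides prescribe; `ProposalNumber` and `NumberGenerator` are hypothetical names.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class ProposalNumber(NamedTuple):
    # Tuples compare field by field: counter first, proposer_id breaks ties.
    counter: int
    proposer_id: int


@dataclass
class NumberGenerator:
    proposer_id: int
    last_counter: int = 0

    def next(self, seen: Optional[ProposalNumber] = None) -> ProposalNumber:
        """Return a number greater than every number this Proposer has used
        and, if given, greater than a competing number it has observed."""
        if seen is not None:
            self.last_counter = max(self.last_counter, seen.counter)
        self.last_counter += 1
        return ProposalNumber(self.last_counter, self.proposer_id)
```

Passing the highest competing number as `seen` lets a Proposer whose round failed pick a number that is high enough to win the next one.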
Basic Paxos
Each instance of the Basic Paxos protocol decides on a single output value. The protocol proceeds over several rounds. A successful round has two phases:
1. Prepare / Promise
2. Accept Request / Accepted
[Figure: message-flow diagram with columns Client, Proposer, Acceptor, Learner; first message: Request Do(X)]
Prepare-Promise: Prepare
1. A Proposer (the leader) creates a proposal identified with a number N.
2. This number must be greater than any previous proposal number used by this Proposer.
3. Then, it sends a Prepare message containing this proposal to a Quorum of Acceptors.
Prepare-Promise: Promise
1. If the proposal's number N is higher than any previous proposal number received from any Proposer by the Acceptor, then the Acceptor must return a promise to ignore all future proposals having a number less than N. If the Acceptor accepted a proposal at some point in the past, it must include the previous proposal number and previous value in its response to the Proposer.
2. Otherwise, the Acceptor can ignore the received proposal. It does not have to answer in this case for Paxos to work. However, for the sake of optimization, sending a denial (Nack) response would tell the Proposer that it can stop its attempt to create consensus with proposal N.
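The Promise rule can be sketched as a small Acceptor state machine. The class and message names here are illustrative, proposal numbers are plain integers for brevity, and a real Acceptor would write its state to stable storage before replying.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class Promise:
    n: int
    accepted_n: Optional[int]   # number of the last accepted proposal, if any
    accepted_value: Any         # value of the last accepted proposal, if any


@dataclass
class Nack:
    n: int
    promised_n: int             # lets the Proposer pick a higher number next time


@dataclass
class Acceptor:
    promised_n: int = -1        # highest proposal number seen so far
    accepted_n: Optional[int] = None
    accepted_value: Any = None

    def on_prepare(self, n: int) -> Union[Promise, Nack]:
        if n > self.promised_n:
            # Promise to ignore proposals numbered below n, and report
            # any previously accepted proposal.
            self.promised_n = n
            return Promise(n, self.accepted_n, self.accepted_value)
        # Optional optimization: tell the Proposer its attempt is hopeless.
        return Nack(n, self.promised_n)
```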
Accept Request
1. If a Proposer receives enough promises from a Quorum of Acceptors, it needs to set a value to its proposal.
2. If any Acceptors had previously accepted any proposal, then they'll have sent their values to the Proposer, who now must set the value of its proposal to the value associated with the highest proposal number reported by the Acceptors.
3. If none of the Acceptors had accepted a proposal up to this point, then the Proposer may choose any value for its proposal.
4. The Proposer sends an Accept Request message to a Quorum of Acceptors with the chosen value for its proposal.
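The value-selection rule in steps 2 and 3 is easy to get wrong, so here is a minimal sketch. It assumes each promise is represented as a pair `(accepted_n, accepted_value)`, with `accepted_n` set to `None` when the Acceptor has not accepted anything; `choose_value` is a hypothetical helper name.

```python
from typing import Any, Iterable, Optional, Tuple


def choose_value(
    promises: Iterable[Tuple[Optional[int], Any]],
    own_value: Any,
) -> Any:
    """Pick the value for the Accept Request from a quorum of promises."""
    best_n: Optional[int] = None
    best_value = own_value
    for accepted_n, accepted_value in promises:
        if accepted_n is None:
            continue  # this Acceptor has not accepted any proposal yet
        if best_n is None or accepted_n > best_n:
            best_n, best_value = accepted_n, accepted_value
    # The Proposer is free to use its own value only if no Acceptor
    # in the quorum reported a previously accepted proposal.
    return best_value
```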
Accepted
If an Acceptor receives an Accept Request message for a proposal N, it must accept it if and only if it has not already promised to only consider proposals having an identifier greater than N. In this case, it should register the corresponding value v and send an Accepted message to the Proposer and every Learner. Else, it can ignore the Accept Request.
Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer does not receive a Quorum of responses (Promise or Accepted). In these cases, another round must be started with a higher proposal number.
Notice that when Acceptors accept a request, they also acknowledge the leadership of the Proposer. Hence, Paxos can be used to select a leader in a cluster of nodes.
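Putting both phases together, a single round can be simulated in memory. This is a sketch under strong simplifying assumptions: no message loss, no concurrency, integer proposal numbers, majority quorums, and the Proposer contacts every Acceptor. `Acceptor` and `run_round` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised_n: int = -1
    accepted_n: Optional[int] = None
    accepted_value: Any = None

    def on_prepare(self, n: int) -> Optional[Tuple[Optional[int], Any]]:
        """Return (accepted_n, accepted_value) as a promise, or None to refuse."""
        if n > self.promised_n:
            self.promised_n = n
            return (self.accepted_n, self.accepted_value)
        return None

    def on_accept(self, n: int, value: Any) -> bool:
        """Accept unless a promise was already made for a number greater than n."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def run_round(acceptors: List[Acceptor], n: int, value: Any) -> Any:
    """Run one round; return the chosen value, or None if the round fails."""
    quorum = len(acceptors) // 2 + 1
    # Phase 1: Prepare / Promise
    promises = [p for p in (a.on_prepare(n) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None  # retry later with a higher proposal number
    prior = [(pn, pv) for pn, pv in promises if pn is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    # Phase 2: Accept Request / Accepted
    accepted = sum(a.on_accept(n, value) for a in acceptors)
    return value if accepted >= quorum else None
```

For example, with three fresh Acceptors, `run_round(acceptors, 1, "X")` chooses `"X"`; a later `run_round(acceptors, 2, "Y")` still returns `"X"`, because the second Proposer must adopt the value already accepted.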
A Paxos for every occasion
Multi Paxos – avoids Prepare and Promise
Cheap Paxos – tolerates F failures with F+1 processors and F auxiliary processors
Fast Paxos – reduces end-to-end messages
Generalized Paxos – exploits commutativity
Byzantine Paxos
What is ZooKeeper?
A highly available, scalable, distributed service for configuration, consensus, group membership, leader election, naming, and coordination
It is difficult to implement these kinds of services reliably:
– brittle in the presence of change
– difficult to manage
– different implementations lead to management complexity when the applications are deployed
ZooKeeper Properties
File API without partial reads/writes
– Simple wait-free data objects organized hierarchically as in file systems
Per-client guarantee of FIFO execution of requests
Linearizability for all requests that change the ZooKeeper state
Built using ZAB, a totally ordered broadcast protocol (based on Paxos)
2F+1 servers can tolerate F crash failures
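To make the "file API without partial reads/writes" idea concrete, here is a toy in-memory znode tree. It is not ZooKeeper's client API: the class `ZNodeTree` and its methods are hypothetical, and it ignores replication, versions, and watches.

```python
from typing import Dict, List


class ZNodeTree:
    """Hierarchy of small data nodes addressed by slash-separated paths."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {"/": b""}

    @staticmethod
    def _parent(path: str) -> str:
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path: str, data: bytes = b"") -> None:
        if not path.startswith("/") or path == "/" or path.endswith("/"):
            raise ValueError(f"invalid path: {path!r}")
        if path in self._data:
            raise KeyError(f"node exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"parent missing: {self._parent(path)}")
        self._data[path] = bytes(data)

    def get(self, path: str) -> bytes:
        # Reads always return the whole value, never a byte range.
        return self._data[path]

    def set(self, path: str, data: bytes) -> None:
        # Writes always replace the whole value.
        if path not in self._data:
            raise KeyError(f"no such node: {path}")
        self._data[path] = bytes(data)

    def children(self, path: str) -> List[str]:
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self._data
            if p != "/" and self._parent(p) == path
        )
```

Because every read and write covers a whole node, a client never observes a half-updated value, which is what makes small znodes suitable for configuration and naming data.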
Any Guarantees?
1. Clients will never detect old data.
2. Clients will get notified of a change to data they are watching within a bounded period of time.
3. All requests from a client will be processed in order.
4. All results received by a client will be consistent with results received by all other clients.
ZooKeeper Servers
1) All servers store a copy of the data on disk
2) A leader is elected at startup
3) Followers service clients; all updates go through the leader
4) Update responses are sent when a majority of servers have persisted the change
ZooKeeper Service
All servers store a copy of the data, logs, and snapshots on disk and use an in-memory database
A leader is elected at startup
Followers service clients; all updates go through the leader
Update responses are sent when a majority of servers have persisted the change
[Figure: ZooKeeper Service with Servers, one Leader, and Clients]
Protocol Guarantees
1) Sequential Consistency - Updates from a client will be applied in the order that they were sent.
2) Atomicity - Updates either succeed or fail. No partial results.
3) Single System Image - A client will see the same view of the service regardless of the server that it connects to.
4) Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
5) Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain bound. Either system changes will be seen by a client within this bound, or the client will detect a service outage.
ZAB algorithm
ZooKeeper is based on the ZAB algorithm
– ZAB: ZooKeeper Atomic Broadcast
Consists of two modes
– Recovery: entered when the service starts or after a leader failure. Ends when a leader emerges and a quorum of servers have synchronized their state with the leader.
– Broadcast: the leader is the server that executes a broadcast by initiating the broadcast protocol. Once a leader has synchronized with a quorum of followers, it begins to broadcast messages.
ZAB broadcast
– The leader broadcasts a proposal for a message to be delivered.
– Before proposing a message, the leader assigns a monotonically increasing unique id, called the zxid. Because ZAB preserves causal ordering, the delivered messages will also be ordered by their zxids.
– Broadcasting consists of putting the proposal with the message attached into the outgoing queue for each follower.
– When a follower receives a proposal, it writes it to disk and sends an acknowledgement to the leader as soon as the proposal is on the disk media.
– When a leader receives ACKs from a quorum, the leader will broadcast a COMMIT and deliver the message locally. Followers deliver the message when they receive the COMMIT from the leader.
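The broadcast steps above can be sketched as a synchronous, in-memory simulation. This is only an illustration: the `Leader` and `Follower` classes are hypothetical, appending to a list stands in for the disk write, and counting the leader's own acknowledgement toward the quorum is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Follower:
    up: bool = True
    log: List[Tuple[int, Any]] = field(default_factory=list)
    delivered: List[Any] = field(default_factory=list)

    def on_proposal(self, zxid: int, message: Any) -> bool:
        """Log the proposal (stand-in for a disk write) and acknowledge it."""
        if not self.up:
            return False
        self.log.append((zxid, message))
        return True

    def on_commit(self, zxid: int) -> None:
        """Deliver the logged message once the leader commits it."""
        if not self.up:
            return
        for logged_zxid, message in self.log:
            if logged_zxid == zxid:
                self.delivered.append(message)


@dataclass
class Leader:
    followers: List[Follower]
    next_zxid: int = 1
    delivered: List[Any] = field(default_factory=list)

    def broadcast(self, message: Any) -> bool:
        zxid = self.next_zxid          # monotonically increasing id
        self.next_zxid += 1
        acks = 1                       # assumption: the leader counts itself
        acks += sum(f.on_proposal(zxid, message) for f in self.followers)
        quorum = (len(self.followers) + 1) // 2 + 1
        if acks < quorum:
            return False               # not enough ACKs, nothing is committed
        self.delivered.append(message)  # deliver locally
        for f in self.followers:
            f.on_commit(zxid)           # COMMIT to followers
        return True
```

With one leader and two followers, the broadcast still commits when one follower is down, since the leader plus the remaining follower form a majority of three.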
ZAB Leader Election
1) UDP based
2) Server with the highest logged transaction gets nominated
3) Election doesn't have to be absolutely correct, just very likely correct
ZAB Leader Election
1) Each server initially nominates itself
2) Servers poll each other to get their votes
3) If there isn't a winner, servers vote for the one with the highest zxid
[Figure: servers exchanging votes; each server is labeled with its lastZxid (21, 22, or 23), its current vote, and the voteZxid]
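The voting steps can be illustrated with a toy simulation. It assumes every server sees every vote in each polling round, that a majority is needed to win, and that ties on zxid are broken by the higher server id; the tie-break and the function name `elect` are assumptions made for this sketch, not details given on the slides.

```python
from collections import Counter
from typing import Dict, Optional


def elect(last_zxid: Dict[int, int], max_rounds: int = 10) -> Optional[int]:
    """Return the id of the elected server, or None if no winner emerges.

    last_zxid maps server id -> highest zxid in that server's log.
    """
    if not last_zxid:
        return None
    quorum = len(last_zxid) // 2 + 1
    # Step 1: each server initially nominates itself.
    votes = {sid: sid for sid in last_zxid}
    for _ in range(max_rounds):
        # Step 2: servers poll each other and tally the votes.
        candidate, count = Counter(votes.values()).most_common(1)[0]
        if count >= quorum:
            return candidate
        # Step 3: no winner yet, so everyone switches to the candidate
        # with the highest zxid (ties broken by server id, an assumption).
        best = max(votes.values(), key=lambda c: (last_zxid[c], c))
        votes = {sid: best for sid in votes}
    return None
```

Electing the server with the highest zxid means the new leader already holds the most recent logged transaction, so it has nothing to fetch from followers before synchronizing them.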
Difference
Paxos:
– Tolerates message losses and reordering
– Quorums
– If a proposer believes it is a leader, it uses a higher number to take over leadership from another leader
ZAB:
– Uses TCP
– No Quorums needed
– New leader cannot take over leadership until all of the followers agree on the leader
Paxos references
Schneider, Fred (1990). "Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial". ACM Computing Surveys 22: 299.
Lamport, Leslie. "The Part-Time Parliament". us/um/people/lamport/pubs/lamport-paxos.pdf
Lamport, Leslie. "Paxos Made Simple". us/um/people/lamport/pubs/paxos-simple.pdf