ZooKeeper: A Distributed Coordination Service for Distributed Applications
Motivation
Large-scale distributed applications require different forms of coordination: configuration, group membership and leader election, and synchronization.
Configuration: a list of operational parameters for the system processes.
Group membership: processes often need to know which other processes are alive and what those processes are in charge of.
Related Work
Amazon Simple Queue Service: queuing.
[25]: leader election.
[27]: configuration.
Chubby [6]: locking service with strong synchronization.
ZooKeeper
ZooKeeper is a distributed coordination service for distributed applications. It supports synchronization, configuration management, and a naming service.
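Why Do We Need ZooKeeper
ZooKeeper is simple: clients coordinate through a shared hierarchical namespace that resembles a file system.
ZooKeeper is replicated: the service runs on a set of servers.
ZooKeeper is ordered: every update is stamped with a number that reflects the order of all transactions.
ZooKeeper is fast: especially in read-dominant workloads.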
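Data Model
Regular znode: created and deleted explicitly by clients.
Ephemeral znode: deleted explicitly by the client, or removed automatically when the session that created it ends.
Sequential flag: ZooKeeper appends a monotonically increasing counter to the name of the new znode.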
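To illustrate the Sequential flag, here is a minimal in-memory sketch of how sequential names are formed. It is not the ZooKeeper client API: the class SequentialNamer and its create method are made up for this example. It assumes the counter is scoped to the parent znode and rendered as a 10-digit zero-padded suffix, which matches names such as /simpleLock/0000000009 in the outputs later in this deck.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of sequential znode naming (hypothetical, not the ZooKeeper API). */
class SequentialNamer {
    // One counter per parent path, mirroring a counter scoped to the parent znode.
    private final Map<String, Integer> counters = new HashMap<>();

    /** Returns parent + "/" + prefix + a 10-digit zero-padded sequence number. */
    String create(String parent, String prefix) {
        int seq = counters.merge(parent, 1, Integer::sum) - 1; // first call yields 0
        return parent + "/" + prefix + String.format("%010d", seq);
    }

    public static void main(String[] args) {
        SequentialNamer namer = new SequentialNamer();
        System.out.println(namer.create("/simpleLock", ""));
        System.out.println(namer.create("/simpleLock", ""));
    }
}
```

Because the suffix is zero-padded, sorting the names as plain strings gives the same order as sorting by sequence number, as long as all children share the same prefix.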
Watches
[Figure: Client1 and Client4 connected to ZooKeeper. Arrow labels: create/exists (WATCH), setData, NOTIFICATION.]
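The watch mechanism can be pictured with a small in-memory model. The sketch below assumes the one-time-trigger behavior of classic ZooKeeper watches: a watch fires once on the next change and must then be set again. The class WatchTable and its methods are hypothetical names for illustration only, not part of the ZooKeeper client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of one-time watches (hypothetical, not the ZooKeeper API). */
class WatchTable {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** Registers a callback that fires on the next change to the path. */
    void watch(String path, Consumer<String> callback) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
    }

    /** Fires and clears all watches on the path; returns how many notifications were sent. */
    int changed(String path) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired == null) {
            return 0;
        }
        for (Consumer<String> callback : fired) {
            callback.accept(path);
        }
        return fired.size();
    }

    public static void main(String[] args) {
        WatchTable table = new WatchTable();
        table.watch("/config", path -> System.out.println("NOTIFICATION for " + path));
        table.changed("/config"); // the watch fires once
        table.changed("/config"); // no watch left, nothing is printed
    }
}
```

This is why the lock and queue listings in this deck pass the watch flag again each time they re-read the children.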
Client API
create(path, data, flags)
delete(path, version)
exists(path, watch)
getData(path, watch)
setData(path, data, version)
getChildren(path, watch)
sync(path)
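The version parameter of setData and delete enables conditional updates. The following in-memory sketch models that rule under two assumptions: the data version starts at 0 and grows by one per successful update, and an expected version of -1 skips the check. VersionedStore is a hypothetical class for illustration; the real client reports a version mismatch with an exception rather than a boolean.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of version-checked updates (hypothetical, not the ZooKeeper API). */
class VersionedStore {
    private static final class Node {
        byte[] data;
        int version; // starts at 0, incremented on every successful setData
    }

    private final Map<String, Node> nodes = new HashMap<>();

    void create(String path, byte[] data) {
        Node node = new Node();
        node.data = data;
        nodes.put(path, node);
    }

    /** Updates the data only if expectedVersion matches; -1 matches any version. */
    boolean setData(String path, byte[] data, int expectedVersion) {
        Node node = nodes.get(path);
        if (node == null) {
            return false;
        }
        if (expectedVersion != -1 && expectedVersion != node.version) {
            return false; // someone else updated the node in the meantime
        }
        node.data = data;
        node.version++;
        return true;
    }

    int version(String path) {
        return nodes.get(path).version;
    }
}
```

A client that read the node at version 0 and tries to write with version 0 after another writer got in first is rejected, which is the basis for optimistic read-modify-write loops.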
ZooKeeper Service Architecture
A read request is handled locally by the server the client is connected to.
A write request is forwarded to the leader, which broadcasts the change to the other ZooKeeper servers through Zab, an atomic broadcast protocol.
Setup ZooKeeper
Download: http://www.apache.org/dyn/closer.cgi/zookeeper/
Configure ZooKeeper in standalone mode or replicated mode.
tickTime: the basic time unit, in milliseconds, used by ZooKeeper. It is used for heartbeats, and the minimum session timeout is twice the tickTime.
dataDir: the location where the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database are stored.
clientPort: the port on which the server listens for client connections.
Standalone Mode
Create the file zoo.cfg with the content:
    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
Start the server: bin/zkServer.sh start
Test with the ZooKeeper client: bin/zkCli.sh -server 127.0.0.1:2181
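To make the tickTime rule concrete, the sketch below loads the zoo.cfg content from this slide with java.util.Properties and derives the minimum session timeout as twice the tickTime. The class ZooCfgSketch and its method names are made up for illustration and are not part of ZooKeeper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Reads zoo.cfg style key=value text (hypothetical helper, not part of ZooKeeper). */
class ZooCfgSketch {
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** The minimum session timeout is twice the tickTime, in milliseconds. */
    static int minSessionTimeoutMs(Properties props) {
        return 2 * Integer.parseInt(props.getProperty("tickTime"));
    }

    public static void main(String[] args) {
        Properties props = parse(
                "tickTime=2000\n"
              + "dataDir=/var/lib/zookeeper\n"
              + "clientPort=2181\n");
        System.out.println("minimum session timeout (ms): " + minSessionTimeoutMs(props));
    }
}
```

With tickTime=2000 the minimum comes out at 4000 ms; the 3000 ms timeout requested in the Java listings later in this deck is below that value.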
Standalone Mode (2)
Commands in the client shell: ls /, get, set, …
Setup ZooKeeper: Replicated Mode
Every server has the same configuration file.
Create a file named myid in the dataDir directory. The content of the myid file is a unique number: the N of that server's server.N entry.
    tickTime=2000
    dataDir=/home/sdn/zookeeper
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=192.168.0.94:2888:3888
    server.2=192.168.0.59:2888:3888
    …
    server.n=192.168.0.59:2888:3888
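A few consequences of this configuration can be checked mechanically. The sketch below assumes that initLimit and syncLimit are counted in ticks, that the myid value must match one of the server.N entries, and that the ensemble needs a majority of servers to be available. EnsembleCfgSketch and the host names zk1, zk2, and zk3 are hypothetical and stand in for real server addresses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.SortedSet;
import java.util.TreeSet;

/** Sanity checks for a replicated-mode configuration (hypothetical helper). */
class EnsembleCfgSketch {
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Collects the N of every server.N key. */
    static SortedSet<Integer> serverIds(Properties props) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                ids.add(Integer.parseInt(key.substring("server.".length())));
            }
        }
        return ids;
    }

    /** Converts a limit expressed in ticks (initLimit, syncLimit) to milliseconds. */
    static int ticksToMs(Properties props, String key) {
        int ticks = Integer.parseInt(props.getProperty(key));
        return ticks * Integer.parseInt(props.getProperty("tickTime"));
    }

    /** True if the myid file content names one of the configured servers. */
    static boolean myidIsValid(Properties props, String myidContent) {
        return serverIds(props).contains(Integer.parseInt(myidContent.trim()));
    }

    /** Smallest number of servers that forms a majority of n. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    public static void main(String[] args) {
        Properties props = parse(
                "tickTime=2000\ninitLimit=5\nsyncLimit=2\n"
              + "server.1=zk1:2888:3888\nserver.2=zk2:2888:3888\nserver.3=zk3:2888:3888\n");
        System.out.println("server ids: " + serverIds(props));
        System.out.println("initLimit (ms): " + ticksToMs(props, "initLimit"));
        System.out.println("syncLimit (ms): " + ticksToMs(props, "syncLimit"));
        System.out.println("majority: " + majority(serverIds(props).size()));
    }
}
```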
Use Cases
Naming service, configuration management, synchronization, message queue, notification system.
Synchronization: Simple Lock
Each client creates a child node under the approot with N=create(P, EPHEMERAL|SEQUENTIAL).
The client whose child node has the smallest number has permission to access the locked object.
When that client finishes working with the object, its child node is deleted, and the client that then has the smallest number gets permission to access the locked object.
[Figure: Client1, Client2, Client3, and Client4 each call N=create(P, EPHEMERAL|SEQUENTIAL) on ZooKeeper, creating the child nodes P(n), P(n+1), P(n+2), and P(n+3) under AppRoot.]
Synchronization: Simple Lock
Check whether the approot exists, and create it if it does not.
Create a child node with the Sequential and Ephemeral flags and receive its number.
On each notification, check whether that number is the smallest among the numbers of the child nodes.
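Synchronization: Simple Lock
    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.data.Stat;

    public class SimpleLock implements Watcher {
        private final String root = "/simpleLock";

        public SimpleLock() throws Exception {
            ZooKeeper zooKeeper = new ZooKeeper("192.168.0.94:2181", 3000, this);
            // Create the lock root if it does not exist yet.
            Stat res = zooKeeper.exists(root, true);
            if (res == null) {
                zooKeeper.create(root, null, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
            // Each contender gets an ephemeral child named by a sequence number.
            String createRes = zooKeeper.create(root + "/", null, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            String name = createRes.substring(root.length() + 1);
            int number = Integer.parseInt(name);
            System.out.println(createRes + ":" + number);
            while (true) {
                synchronized (root) {
                    // Re-read the children, which also sets the watch again.
                    List<String> children = zooKeeper.getChildren(root, true);
                    // Names are zero-padded, so the smallest string is the smallest number.
                    if (name.equals(Collections.min(children))) {
                        dost();
                        break;
                    }
                    root.wait();
                }
            }
            // Closing the session deletes the ephemeral node and releases the lock.
            zooKeeper.close();
        }

        @Override
        public void process(WatchedEvent event) {
            // Any watch event wakes the waiting thread so that it re-checks the children.
            synchronized (root) { root.notify(); }
        }

        private void dost() throws InterruptedException {
            System.out.println("Access System.out at " + System.currentTimeMillis());
            Thread.sleep(5000);
        }
    }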
Synchronization: Simple Lock
    public static void main(String[] args) throws Exception {
        new SimpleLock();
    }
Output (successive accesses are about 5 seconds apart):
    /simpleLock/0000000009:9
    Access System.out at 1416028020742
    /simpleLock/0000000010:10
    Access System.out at 1416028025765
    /simpleLock/0000000011:11
    Access System.out at 1416028030779
    /simpleLock/0000000012:12
    Access System.out at 1416028035806
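The "is my number the smallest" check from the lock steps can be isolated from the ZooKeeper calls and exercised on plain lists of child names. The sketch below assumes child names that end in a 10-digit sequence suffix, as in the output above; LockCheck and its methods are hypothetical helper names.

```java
import java.util.Arrays;
import java.util.List;

/** Decides lock ownership from a list of child names (hypothetical helper). */
class LockCheck {
    /** Parses the trailing 10-digit sequence number of a child name. */
    static int sequenceOf(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    /** True if no child has a smaller sequence number than mine. */
    static boolean isSmallest(List<String> children, String mine) {
        int myNumber = sequenceOf(mine);
        for (String child : children) {
            if (sequenceOf(child) < myNumber) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("0000000010", "0000000009", "0000000011");
        System.out.println(isSmallest(children, "0000000009"));
        System.out.println(isSmallest(children, "0000000010"));
    }
}
```

Comparing parsed numbers rather than raw strings keeps the check correct even if the children carry different name prefixes.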
Synchronization: Barrier
Every client creates a child node of the approot. As soon as the number of child nodes reaches the required count, the clients start work.
[Figure: Client1, Client2, Client3, and Client4 each call N=create(P, EPHEMERAL|SEQUENTIAL) on ZooKeeper, creating the child nodes P(n), P(n+1), P(n+2), and P(n+3) under AppRoot.]
Synchronization: Barrier
Check whether the approot exists, and create it if it does not.
Create a child node with the Sequential and Ephemeral flags.
On each notification, check whether the number of child nodes has reached the required count; if it has, start work.
Synchronization: Barrier
    public static void main(String[] args) {
        new Barrier(3).run();
    }
Run 3 instances:
    /barrier/0000000022 Starting at 1416048008944
    /barrier/0000000023 Starting at 1416048008947
    /barrier/0000000024 Starting at 1416048008948
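The Barrier class itself is not shown on these slides. As a rough illustration of the waiting logic only, the sketch below replaces ZooKeeper with an in-memory counter and threads: every participant registers, then waits until the required count is reached. LocalBarrier is a hypothetical stand-in, and notifyAll plays the role that the children-changed notification would play with a real server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a barrier znode (hypothetical, not the ZooKeeper API). */
class LocalBarrier {
    private final int size;
    private int arrived; // corresponds to the number of child nodes

    LocalBarrier(int size) {
        this.size = size;
    }

    /** Registers the caller, then blocks until `size` participants have registered. */
    synchronized void enter() throws InterruptedException {
        arrived++;
        notifyAll(); // stands in for the notification sent when the children change
        while (arrived < size) {
            wait();
        }
    }

    /** Starts n participant threads and returns how many passed the barrier. */
    static int runParticipants(int n) {
        LocalBarrier barrier = new LocalBarrier(n);
        AtomicInteger passed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    barrier.enter();
                    passed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return passed.get();
    }

    public static void main(String[] args) {
        System.out.println("participants released: " + runParticipants(3));
    }
}
```

In a ZooKeeper-backed version, the counter would be the size of the list returned by getChildren on the barrier root, re-read on every notification.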
Message Queue
Each sender creates a child node of the approot with the Sequential flag; the data of the node is the message. The receiver reads the child node with the smallest number first.
[Figure: sender1 and sender2 each call N=create(P, SEQUENTIAL) on ZooKeeper, adding the child nodes P(n), P(n+1), P(n+2), and P(n+3) under AppRoot; the receiver calls get(smallestP).]
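Message Queue
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.data.Stat;

    public class MessageQueue implements Watcher {
        protected ZooKeeper zooKeeper; // connected as in SimpleLock
        protected String root;         // root znode of the queue

        @Override
        public void process(WatchedEvent event) {
            // Any watch event wakes the receiver so that it re-reads the children.
            synchronized (root) { root.notify(); }
        }

        public static class Sender extends MessageQueue implements Runnable {
            // Each message becomes a new sequential child of the queue root.
            public void sendMessage() throws KeeperException, InterruptedException {
                String sendMessage = "sendMessage at " + System.currentTimeMillis();
                zooKeeper.create(root + "/", sendMessage.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            }
            // run() is not shown on the slide.
        }

        public static class Receiver extends MessageQueue implements Runnable {
            @Override
            public void run() {
                try {
                    while (true) {
                        synchronized (root) {
                            // Reading the children also sets the watch again.
                            List<String> childs = new ArrayList<>(zooKeeper.getChildren(root, true));
                            if (childs.isEmpty()) {
                                root.wait();
                            } else {
                                Collections.sort(childs); // zero-padded names sort in creation order
                                for (String child : childs) {
                                    byte[] data = zooKeeper.getData(root + "/" + child, false, new Stat());
                                    System.out.println("readMessage:" + new String(data));
                                    zooKeeper.delete(root + "/" + child, -1); // -1 matches any version
                                }
                            }
                        }
                    }
                } catch (KeeperException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        }
    }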
Message Queue
Run 2 sender instances and 1 receiver instance:
    readMessage:sendMessage at 1416066118908
    readMessage:sendMessage at 1416066118916
    readMessage:sendMessage at 1416066118926
    readMessage:sendMessage at 1416066118929
    readMessage:sendMessage at 1416066118939
    readMessage:sendMessage at 1416066118942
    readMessage:sendMessage at 1416066118952
    readMessage:sendMessage at 1416066118954
    readMessage:sendMessage at 1416066118964
    readMessage:sendMessage at 1416066118967
    readMessage:sendMessage at 1416066118976
    readMessage:sendMessage at 1416066118984
    readMessage:sendMessage at 1416066118989
    readMessage:sendMessage at 1416066118997
    readMessage:sendMessage at 1416066119001
    readMessage:sendMessage at 1416066119009
    readMessage:sendMessage at 1416066119016
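The order in which getChildren returns child names should not be relied on, so a receiver has to sort them before reading if messages are to come out in creation order. The sketch below shows one way to do that on plain strings, assuming names that end in a 10-digit sequence suffix; QueueOrder and its methods are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Orders queue children by their sequence number (hypothetical helper). */
class QueueOrder {
    /** Parses the trailing 10-digit sequence number of a child name. */
    static int sequenceOf(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    /** Returns a sorted copy, smallest sequence number first; the input is left unchanged. */
    static List<String> inDeliveryOrder(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(QueueOrder::sequenceOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("0000000003", "0000000001", "0000000002");
        System.out.println(inDeliveryOrder(children));
    }
}
```

Sorting a copy avoids modifying the list handed back by the client library.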