Leader Election Using NewSQL Database Systems

Presentation transcript:

Leader Election Using NewSQL Database Systems. Salman Niazi, Mahmoud Ismail, Gautier Berthou and Jim Dowling

Content: Problem, Solution, Evaluation

Leader Election

Leader Election: synchronous systems, asynchronous systems, eventually synchronous systems

Leader Election (Eventually Synchronous System)

Problem: Multiple leaders make conflicting decisions, which leads to data corruption; all hell can break loose.

Unique Leader Election: Essentially an agreement problem.
Paxos: hard to understand; does not perform well for hundreds of servers.
Total order atomic broadcast: implementation?

Leader Election: Out-of-the-box solutions: ZooKeeper, Chubby. Problem: another service to maintain.

A Typical Internet Application (diagram; labels: Coordination Service, Leader Election Service, Service A, Service B, Service C, Service D, Instance 1, Instance 2, HA Database (NewSQL DBs))

That's not new?
Shared-memory-based LE:
Guerraoui, R., Raynal, M.: A Leader Election Protocol for Eventually Synchronous Shared Memory Systems, pp. 75–80. IEEE Computer Society, Los Alamitos (2006)
Fernandez, A., Jimenez, E., Raynal, M.: Electing an Eventual Leader in an Asynchronous Shared Memory System. In: Dependable Systems and Networks, DSN 2007, pp. 399–408 (June 2007)
Using 2PC transactions: no existing work uses 2PC transactions; some work uses compare & swap primitives:
Afek, Y., Stupp, G.: Optimal Time-Space Tradeoff for Shared Memory Leader Election. Journal of Algorithms 25(1): 95–117 (1997)
Serializable transaction isolation

Why NewSQL DB?
Relational databases: failures are considered to be rare; the DB is unavailable until the standby takes over.
NewSQL: built to handle frequent node failures; there is no pause in DB service if a datanode fails; when a datanode fails, transactions can be quickly retried on other datanodes.

Problems with NewSQL: Many NewSQL DBs do not support serializable transactions. Serializable transactions scale poorly, especially in a distributed environment.

Contribution:
Scalable leader election using NewSQL as shared memory.
The majority of processes use a weaker transaction isolation level than serializable; serialize only if needed -> greater scalability.
Combines 2PC and a lease mechanism to ensure a single leader at any given time.
Transaction isolation using row-level locking.
Portable to many NewSQL systems.

Solution: Consists of two registers: Vars and Descriptors. Runs in rounds. In each round:
1. Start Tx
2. Read all descriptors and variables
3. Save to local history
4. Update counter
5. If smallest Id: become leader, kick out dead processes and acquire a lease
6. Commit Tx
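The round structure above can be sketched as a single-process simulation. This is only an illustration under stated assumptions: the `Registers` class, `join`, and `run_round` are hypothetical names, a `threading.Lock` stands in for the database transaction, and eviction of dead processes and lease handling are left out.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Registers:
    """In-memory stand-in for the two registers kept in the database."""
    max_id: int = -1                                   # Vars register: highest id handed out
    descriptors: dict = field(default_factory=dict)    # Descriptors register: id -> counter
    tx: threading.Lock = field(default_factory=threading.Lock)  # stands in for a Tx


def join(regs: Registers) -> int:
    """Allocate the next id and publish a descriptor with counter 1."""
    with regs.tx:
        regs.max_id += 1
        regs.descriptors[regs.max_id] = 1
        return regs.max_id


def run_round(regs: Registers, my_id: int, history: list) -> bool:
    """Run one round; return True if my_id is the smallest id present."""
    # Entering the `with` block plays the role of Start Tx, leaving it of Commit Tx.
    with regs.tx:
        snapshot = dict(regs.descriptors)               # read all descriptors
        history.append(snapshot)                        # save to local history
        regs.descriptors[my_id] = snapshot[my_id] + 1   # update own counter
        return my_id == min(snapshot)                   # smallest id -> leader candidate


regs = Registers()
a, b = join(regs), join(regs)
history_a, history_b = [], []
print(run_round(regs, a, history_a), run_round(regs, b, history_b))
```

Each process keeps its own `history`, which is what later lets it notice that another process's counter has stopped moving.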

Solution. Vars Reg: MaxId: 3, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 10, IP: …), P1 (Counter: 11, IP: …), P2 (Counter: 10, IP: …), P3 (Counter: 12, IP: …).

Solution (Periodic Counter Update). Vars Reg: MaxId: 3, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 11, IP: …), P1 (Counter: 12, IP: …), P2 (Counter: 11, IP: …), P3 (Counter: 13, IP: …).

Solution (Join). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 11, IP: …), P1 (Counter: 12, IP: …), P2 (Counter: 11, IP: …), P3 (Counter: 13, IP: …), P4 (Counter: 1, IP: …).

Solution (Non-Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 11, IP: …), P1 (Counter: 12, IP: …), P2 (Counter: 11, IP: …), P3 (Counter: 13, IP: …), P4 (Counter: 1, IP: …).

Solution (Non-Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 12, IP: …), P1 (Counter: 12, IP: …), P2 (Counter: 12, IP: …), P3 (Counter: 14, IP: …), P4 (Counter: 2, IP: …).

Solution (Non-Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 13, IP: …), P1 (Counter: 12, IP: …), P2 (Counter: 13, IP: …), P3 (Counter: 15, IP: …), P4 (Counter: 3, IP: …).

Solution (Non-Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 14, IP: …), P2 (Counter: 14, IP: …), P3 (Counter: 16, IP: …), P4 (Counter: 4, IP: …).
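In the non-leader failure slides, P1's counter stays at 12 while every other counter advances, and P1's descriptor is then removed. A minimal sketch of how such a stalled process could be spotted from the local history follows. The function name `stalled_ids` and the `window` parameter are assumptions for illustration; the slides do not say how many unchanged rounds trigger an eviction.

```python
def stalled_ids(history, window):
    """Return ids whose counter is identical in the last `window` snapshots."""
    recent = history[-window:]
    if len(recent) < window:
        return set()                      # not enough observations yet
    latest = recent[-1]
    return {
        pid
        for pid, counter in latest.items()
        if all(snap.get(pid) == counter for snap in recent)
    }


# Counter values taken from the first three non-leader failure slides (id -> counter).
history = [
    {0: 11, 1: 12, 2: 11, 3: 13, 4: 1},
    {0: 12, 1: 12, 2: 12, 3: 14, 4: 2},
    {0: 13, 1: 12, 2: 13, 3: 15, 4: 3},
]
print(stalled_ids(history, window=3))     # {1}
```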

Solution (Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 14, IP: …), P2 (Counter: 14, IP: …), P3 (Counter: 16, IP: …), P4 (Counter: 4, IP: …). Lease: 3.5 sec left. Leader process: P0.

Solution (Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 14, IP: …), P2 (Counter: 15, IP: …), P3 (Counter: 17, IP: …), P4 (Counter: 5, IP: …). Lease: 1.5 sec left. Leader process: P0.

Solution (Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P0 (Counter: 14, IP: …), P2 (Counter: 16, IP: …), P3 (Counter: 18, IP: …), P4 (Counter: 6, IP: …). Lease expired. Leader process: no leader.

Solution (Leader Failure). Vars Reg: MaxId: 4, RD: 2000ms, Evict Flag. Descriptors Reg: P2 (Counter: 17, IP: …), P3 (Counter: 19, IP: …), P4 (Counter: 7, IP: …). Leader process: P2.
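The leader failure sequence can be read as a lease hand-over: P0 remains leader until its lease runs out, there is briefly no leader, and then the smallest remaining id (P2) takes over. The sketch below illustrates that reading. The `Lease` class, `leader_at`, `try_acquire`, the 5-second duration, and the single shared clock are all assumptions made for the example, not details given in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[int] = None
    expires_at: float = 0.0          # seconds on a shared clock (assumption)


def leader_at(lease: Lease, now: float) -> Optional[int]:
    """Return the lease holder while the lease is valid, else None."""
    return lease.holder if now < lease.expires_at else None


def try_acquire(lease, candidate, live_ids, now, duration):
    """Grant or renew the lease only for the smallest live id, and only if
    no other process still holds a valid lease."""
    current = leader_at(lease, now)
    if current is not None and current != candidate:
        return False                 # someone else's lease is still valid
    if candidate != min(live_ids):
        return False                 # a smaller id should lead instead
    lease.holder, lease.expires_at = candidate, now + duration
    return True


lease = Lease(holder=0, expires_at=3.5)                    # P0 leads, 3.5 s left
print(try_acquire(lease, 2, {2, 3, 4}, now=2.0, duration=5.0))   # False
print(try_acquire(lease, 2, {2, 3, 4}, now=4.0, duration=5.0))   # True
```

Refusing to take over before the old lease expires is what keeps two leaders from coexisting, at the cost of a short window with no leader.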

Solution (Re-Join). Vars Reg: MaxId: 5, RD: 2000ms, Evict Flag. Descriptors Reg: P2 (Counter: 17, IP: …), P3 (Counter: 19, IP: …), P4 (Counter: 7, IP: …), P5 (Counter: 1, IP: …).

Transaction Isolation: Two groups of processes.
Group A: processes that only update their counters; the majority of the processes. No serialization needed.
Group B: the leader process, processes contending to become leader, and new processes; relatively few processes. Serialize all transactions.

How Transactions are Isolated (diagram): Group A (no serialization required) and Group B (serialization required), each shown accessing the Vars register.
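One plausible reading of the two groups, given that the deck mentions row-level locking, is that Group A reads the Vars row under a shared lock while Group B reads it under an exclusive lock, so Group B transactions exclude everyone else and Group A transactions only exclude Group B. The sketch below builds the corresponding MySQL-style statements. The table name `vars`, its columns, and the helper `vars_lock_query` are hypothetical, and the mapping of groups to lock modes is an assumption rather than something the slides state.

```python
def vars_lock_query(serialize: bool) -> str:
    """Build the read of the (hypothetical) `vars` row with the chosen row lock."""
    base = "SELECT max_id, evict_flag FROM vars WHERE id = 0"
    # Group B (serialize=True): exclusive row lock, one transaction at a time.
    # Group A (serialize=False): shared row lock, many counter updates in parallel.
    return base + (" FOR UPDATE" if serialize else " LOCK IN SHARE MODE")


print(vars_lock_query(serialize=False))
print(vars_lock_query(serialize=True))
```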

Experiments:
NewSQL setup: 6-node MySQL Cluster, 6-core AMD Opteron 2.6 GHz, 32 GB RAM.
ZooKeeper setup: 3-node quorum.
Clients: 12-core Intel Xeon 2.8 GHz, 40 GB RAM.
Network: 1 Gbit switch, 0.2 ms pings.

Experiments:
1. Start N processes.
2. Kill the leader and start a new process.
3. Measure the time taken to elect a new leader.
4. Go to 2.

Evaluation (failover time)

Evaluation (counter update duration)

Recent Related Work:
Microsoft's Project Orleans (Orleans: Distributed Virtual Actors for Programmability and Scalability): uses the Azure Table service for membership management. http://research.microsoft.com/en-US/people/philbe/disckeyotephilbefinal.pdf
Beast Master: coordination server built on top of FoundationDB. Status: under development. https://news.ycombinator.com/item?id=6366665

Questions

LE Properties:
Integrity: there should never be more than one leader in the system.
Termination: a correct process eventually becomes a leader.
Termination: all invocations of the primitive getLeader() invoked by a correct process should return the leader's id.

Integrity: there should never be more than one leader in the system.
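The integrity property can be checked after the fact on a recorded trace of leadership periods. The sketch below is only a test-harness idea: the function `integrity_holds` and the trace format (process id plus a half-open time interval) are assumptions, not part of the presented system.

```python
def integrity_holds(periods):
    """periods: iterable of (process_id, start, end) half-open intervals.

    Returns False if two different processes were leader at the same time.
    """
    latest_end, latest_holder = float("-inf"), None
    for pid, start, end in sorted(periods, key=lambda p: p[1]):
        if pid != latest_holder and start < latest_end:
            return False             # overlaps a period held by another process
        if end > latest_end:
            latest_end, latest_holder = end, pid
    return True


print(integrity_holds([(0, 0.0, 3.5), (2, 4.0, 9.0)]))   # True
print(integrity_holds([(0, 0.0, 3.5), (2, 3.0, 8.0)]))   # False
```

Overlapping periods of the same process are allowed, since a leader renewing its own lease does not violate the property.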

MySQL Cluster Sample HA Setup