Centiman: Elastic, High Performance Optimistic Concurrency Control by Watermarking
Authors: Bailu Ding, Lucja Kot, Alan Demers, Johannes Gehrke

Presentation transcript:

Centiman: Elastic, High Performance Optimistic Concurrency Control by Watermarking Authors: Bailu Ding, Lucja Kot, Alan Demers, Johannes Gehrke Presenter: Monika

Transactions
A transaction is a series of operations executed by a client which either:
– commits all its operations at the server (commit = reflect the updates on server-side objects), or
– aborts and has no effect on the server.

ACID Properties for Transactions
Atomicity, Consistency, Isolation, Durability.

Concurrent Transactions
Two approaches to prevent isolation from being violated:
– Pessimistic: prevent transactions from accessing the same object concurrently, e.g., locks.
– Optimistic: assume nothing bad will happen; let transactions run, then validate before committing.
(Figure: two transactions T0 and T1 each read X, update a local copy, then validate with a validator before writing X.)

But... in a distributed environment:
– Clients and/or servers might crash independently.
– Transactions run concurrently, i.e., with multiple clients.
– Transactions may be distributed and replicated, i.e., span multiple servers.

What is Centiman?
A transaction processing system for the cloud. Features:
– ACID guarantees, supported via an optimistic concurrency control (OCC) protocol.
– Simple to deploy on any key-value store.
– Elastic scaling.
– Sharded validation.
– Only two points of synchronization: the start of validation, and the point where the processor collects all validators' outcomes.

Centiman Architecture
Clients issue transactions to processors and receive responses. Processors read and write the key-value data store on the storage nodes via get(key) -> (val, version) and put(key, val, timestamp). To commit, a processor sends a validation request carrying the transaction's read/write sets to the validators and gets back a success/failure response. A global master monitors load, elastic scaling, migration, and failure recovery.
(Figure: storage nodes, processors, validators, and the global master, connected by the message flows described above.)

Validators and Sharded Validation
A partitioning function splits the key space {k1, ..., ki, ..., kj, ..., kn} into ranges, e.g., {k1...ki} for validator A, {ki...kj} for validator B, and {kj...kn} for validator Z. The processor sends each validator only the portion of a transaction's read/write sets that falls into its range (e.g., {RA1, WA1} to validator A and {RZ1, WZ1} to validator Z).
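
The sharded split and the processor's commit decision can be sketched as follows. This is a minimal illustration assuming range partitioning over string keys; the boundary keys and validator names are made up, not from the paper.

```python
# Illustrative range partitioning of a transaction's key set across
# validators; boundaries are half-open intervals [lo, hi).
BOUNDARIES = [("A", "a", "i"), ("B", "i", "r"), ("Z", "r", "{")]

def shard(keys):
    """Split a key set into per-validator subsets by key range."""
    shards = {name: set() for name, _, _ in BOUNDARIES}
    for k in keys:
        for name, lo, hi in BOUNDARIES:
            if lo <= k < hi:
                shards[name].add(k)
                break
    return shards

def decide(votes):
    """Second synchronization point: commit only if every shard votes commit."""
    return all(votes)

s = shard({"apple", "kiwi", "zebra"})
assert s["A"] == {"apple"} and s["B"] == {"kiwi"} and s["Z"] == {"zebra"}
assert decide([True, True, True]) and not decide([True, False, True])
```

Each validator sees only its own shard, which is what lets validation proceed in parallel without cross-validator coordination until the final vote collection.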

Validation (OCC)
Each validator buffers the write sets of recently validated transactions, indexed by timestamp. Example: a transaction with timestamp 16 has read set {(x, value, 12), (y, value, 13)} and write set {(y, value), (z, value)}. Validator A (keys {X, W, P, Q, R}) receives RS = {(x, 12)}, WS = {}; validator B (keys {Y, Z, L}) receives RS = {(y, 13)}, WS = {y, z}. Validator A checks the buffered write sets of transactions with timestamps 12 < i < 16: T15 wrote x, so the transaction read a stale value of x. Hence abort!
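
The validator's check in this example can be sketched as follows (illustrative code, not the paper's exact data structures):

```python
# A transaction with commit timestamp ts aborts if any buffered write set
# with a timestamp between a read's version and ts contains that key.
def validate(read_set, commit_ts, buffered_write_sets):
    """read_set: {key: version_read}; buffered_write_sets: {ts: set(keys)}."""
    for key, version in read_set.items():
        for ts, keys in buffered_write_sets.items():
            if version < ts < commit_ts and key in keys:
                return False  # stale read: a newer validated write exists
    return True

# Validator A's buffer, as on the slide: T15 and T12 wrote x, among others.
buffer_a = {15: {"x", "w", "q", "r"}, 14: {"w", "q", "r"},
            13: {"w", "q", "r"}, 12: {"x", "w", "q", "r"}}
# The transaction read x at version 12 and tries to commit at timestamp 16:
assert validate({"x": 12}, 16, buffer_a) is False  # T15 wrote x -> abort
assert validate({"q": 15}, 16, buffer_a) is True   # no write in (15, 16)
```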

Global Master
– Monitors system performance.
– Handles failure recovery of validators.
– Coordinates elastic scaling.

Elasticity
(Figure: before migration, processors send all validation requests to one set of validators; during migration, requests are duplicated to both the old validators and the new odd/even validators; after migration, the odd and even validators each own their half of the key space.)

Failure Recovery
Two kinds of failure: processor and validator (already discussed).
Processor failure:
– Each node maintains a write-ahead log, written asynchronously.
– Log entries: init, write set, decision to commit/abort, completed.
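
Assuming recovery replays the log and re-installs the writes of transactions that logged a commit decision but no completion record, the role of these log entries can be sketched as follows (the record structure and function name are illustrative, not from the paper):

```python
# After a processor crash, a transaction whose log shows a commit decision
# but no "completed" record may not have installed all its writes yet.
def needs_reinstall(log_entries, txn_id):
    """log_entries: list of (kind, txn_id) records in append order."""
    kinds = {kind for kind, t in log_entries if t == txn_id}
    return "commit" in kinds and "completed" not in kinds

log = [("init", 1), ("write_set", 1), ("commit", 1),
       ("init", 2), ("write_set", 2), ("commit", 2), ("completed", 2)]
assert needs_reinstall(log, 1) is True   # crashed before installing writes
assert needs_reinstall(log, 2) is False  # already completed
```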

Challenges
– Read-only transactions: writes are not installed to storage atomically, hence there is no easy optimization for read-only transactions.
– Spurious aborts: remember, we do nothing on ABORT!

Spurious Aborts
T3: R(X), W(X), W(Y); T4: R(X), W(X). T3's write of X goes to validator A and its write of Y to validator B. Validator A sees no conflict and adds X to its buffered write sets, but validator B detects a conflict, so T3 aborts globally and never installs any writes. When T4 later reads the version of X written by T2 (with T3 > T2), validator A assumes T4 read a stale value because of T3's buffered write, and aborts T4 unnecessarily.

Watermark Abstraction
Watermarks are metadata that propagate through the system information about the timestamp of the latest completed transaction. The system associates a read on a record with a watermark: get(key) = {value, version, watermark}. Guarantee: all transactions with timestamp i ≤ w have either already installed their writes on r or will never write to r.
(Figure: a read of X returns version 10 with watermark 15; T9 and T10 are below the watermark, T19 and T20 are above it.)

Implementing Watermarks
Each processor P ∈ Ƥ maintains a local processor watermark W_P. The watermark for the system is min over P ∈ Ƥ of W_P. Information about the W_P values is spread using a gossip protocol (asynchronously).
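
A minimal sketch of this computation, assuming each processor assigns consecutive integer timestamps and tracks which of its transactions have completed:

```python
# Local watermark: the largest t such that all transactions with timestamps
# 1..t at this processor have completed (timestamps are consecutive ints).
def local_watermark(completed_timestamps):
    t = 0
    while (t + 1) in completed_timestamps:
        t += 1
    return t

# System watermark: the minimum of the gossiped per-processor watermarks.
def system_watermark(local_watermarks):
    return min(local_watermarks)

assert local_watermark({1, 2, 3, 5}) == 3    # timestamp 4 is still in flight
assert system_watermark([15, 12, 20]) == 12  # the value readers can rely on
```

The minimum is what makes the guarantee safe: no processor anywhere still has an incomplete transaction at or below the system watermark.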

Reducing Spurious Aborts
The write sets of aborted transactions eventually “age out” and fall below the watermark. Spurious aborts are thereby reduced, though they need not drop to zero.
(Figure: a read of X with version 10 and watermark 15; the validator only checks buffered write sets with timestamps > 15, since older entries have aged out.)
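
The aging-out step can be sketched as a simple truncation of the validator's buffer (the structure is illustrative):

```python
# Once the watermark passes a buffered transaction's timestamp, its write set
# can no longer cause a spurious conflict, so the validator drops it.
def truncate(buffered_write_sets, watermark):
    """buffered_write_sets: {ts: set(keys)}; drop entries with ts <= watermark."""
    return {ts: ks for ts, ks in buffered_write_sets.items() if ts > watermark}

buffered = {9: {"x"}, 10: {"x", "y"}, 19: {"z"}}
assert truncate(buffered, 15) == {19: {"z"}}  # T9 and T10 have aged out
```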

Bypassing Validation for Read-Only Transactions
If we ensure that a transaction reads a consistent snapshot of the database, it does not require validation. Snapshot: “The data store is at snapshot i if no version of any record with a timestamp j ≤ i will ever be installed in the future, and no version of any record with a timestamp k > i exists in the data store.” A transaction can read from snapshot i even if the data store is not at snapshot i. Each read returns (value, version v, watermark w), and during the interval [v, w] that version is the most current. So if the intersection of the intervals of all read versions is non-empty, the transaction read a consistent snapshot.
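
Under the [v, w] interpretation above, the local check reduces to an interval-intersection test, sketched here (illustrative):

```python
# Each read of a record returns (version v, watermark w): that version was
# the most current throughout [v, w]. If all read intervals intersect, the
# transaction observed a consistent snapshot and can skip validation.
def read_consistent_snapshot(reads):
    """reads: list of (version, watermark) pairs, one per record read."""
    lo = max(v for v, w in reads)
    hi = min(w for v, w in reads)
    return lo <= hi  # non-empty intersection of all [v, w] intervals

assert read_consistent_snapshot([(10, 15), (12, 14)]) is True   # snapshot in [12, 14]
assert read_consistent_snapshot([(10, 11), (13, 15)]) is False  # must validate
```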

Evaluation
The paper investigates the following questions:
– How do spurious aborts affect the system with and without watermarks?
– How does the system behave when we scale elastically by adding a new validator?
– How effective is the local check for read-only transactions?
– How well does the system scale and perform under load?
System specifications: Amazon EC2 m3.xlarge nodes with 4 vCPUs, 15 GiB RAM, and a high-performance network; for the large-scale experiments, Amazon EC2 nodes with 8 vCPUs, a high-performance network, 50 storage nodes, and 50 processors.

Spurious Aborts (Evaluation)
Truncation time is 10 s (B1), 20 s (B2), etc. As time passes, the write-set buffers at the validators become more polluted, so the abort rate increases until it reaches a plateau; the plateau rises with buffer size due to a larger number of spurious conflicts. With watermarks, each processor updates its local watermark every 1 (W1), 1K (W1K), 10K (W10K), or 100K (W100K) transactions. The watermark-based approach alleviates the problem, as write sets from aborted transactions “age out”.

Elastic Scaling
Measures the overhead due to the scaling process, which comes from duplicate validation requests and from the global synchronization needed to finalize the switch to the new partitioning.

Elastic Scaling (2)
Abort rate and spurious abort rate: V12 denotes scaling from one validator to two; similarly V23 and V34. Scaling begins at the 60th second and finishes at the 120th second. There is a slight increase in the abort rate and spurious abort rate after the scaling is complete.

Local Check for Read-Only Transactions

Synthetic Data (Throughput)
The throughput of the system increases sublinearly with the number of validator nodes (network overhead). For skewed workloads, the throughput is slightly lower, as we see more aborts.

Synthetic Data (Abort Rate)
The abort rate increases with more validators, mostly because more concurrent transactions are fed into the system to stress the validators. Workloads with larger transactions have higher abort rates.

Realistic Benchmarks (TPC-C)
The TPC-C benchmark: a high-update workload of medium-size transactions.

Realistic Benchmarks (TATP)
TATP is a read-heavy, conflict-rare, key-value-store-like workload. The local-check optimization allows transactions to bypass validation, hence the large gain in throughput.

My Thoughts!
Strengths:
– Simple and clear explanation of the concept and its correctness.
– Validation is sharded and proceeds in parallel.
– Elastic scaling.
Concerns:
– The number of aborts grows if the key-value store is slow in processing requests.
– Each processor assigns consecutive integers as timestamps, tagged with processor IDs.
– No evaluation against other existing distributed transactional systems.

References
– k_socc_2015.pdf
– Indy's slides

Thank you!