Causal Consistency Without Dependency Check Messages Willy Zwaenepoel.

INTRODUCTION

Geo-replicated data stores
- Geo-replicated data centers
- Full replication between data centers
- Data in each data center partitioned

The Consistency Dilemma
Strong Consistency
- synchronous replication
- all replicas share same consistent view
- sacrifice availability
Causal Consistency
- asynchronous replication
- all replicas eventually converge
- sacrifice consistency, but replication respects causality
Eventual Consistency
- asynchronous replication
- all replicas eventually converge
- sacrifice consistency

The cost of causal consistency

Can we close the throughput gap? The answer is: yes, but there is a price

STATE OF THE ART: WHY THE GAP?

How is causality enforced?
Each update has associated dependencies
What are dependencies?
- metadata to establish causality relations between operations
- used only for data replication
Internal dependencies
- previous updates of the same client session
External dependencies
- updates of other client sessions that this session has read

Internal dependencies
[Figure: example of 2 users performing operations at different datacenters on the same partition. Labels: Alice, Bob; US Datacenter, Europe Datacenter; operations W(x = 1), W(y = 2), R(y) with y = 0, R(y) with y = 2.]

External dependencies
[Figure: example of 3 users performing operations at datacenters on the same partition. Labels: Alice, Bob, Charlie; US Datacenter, Europe Datacenter; operations W(x = 1), R(x) with x = 1, W(y = x + 1), R(y) with y = 0, R(y) with y = 2.]

How dependencies are tracked & checked
In current implementations
- COPS [SOSP ’11], ChainReaction [EuroSys ’13], Eiger [NSDI ’13], Orbe [SOCC ’13]
- DepCheck(A): “Do you have A installed yet?”
[Figure: Partition 0, Partition 1, …, Partition N in the US Datacenter and in the Europe Datacenter; a client issues Read(A), Read(B), Write(C, A+B); DepCheck(A) and DepCheck(B) are sent between partitions.]
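The dependency-check flow on this slide can be sketched as a small single-process simulation. It is only an illustration under simplifying assumptions, not the code of COPS, ChainReaction, Eiger, or Orbe: the `Partition` class, `dep_check`, `apply_remote_write`, and the integer versions are all hypothetical names and choices.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One partition of a remote datacenter (hypothetical model)."""
    name: str
    installed: dict = field(default_factory=dict)  # key -> installed version

    def dep_check(self, key: str, version: int) -> bool:
        # Answers "Do you have this version of key installed yet?"
        return self.installed.get(key, 0) >= version


def apply_remote_write(target, key, version, deps, owner_of):
    """Install a replicated write only if every dependency is installed.

    deps is a list of (key, version) pairs; owner_of maps a key to the
    partition that stores it. Returns (installed?, DepCheck messages sent).
    """
    messages = 0
    for dep_key, dep_version in deps:
        owner = owner_of[dep_key]
        if owner is not target:
            messages += 1  # a check against another partition costs a message
        if not owner.dep_check(dep_key, dep_version):
            return False, messages  # dependency missing, retry later
    target.installed[key] = version
    return True, messages


p0, p1, pn = Partition("P0"), Partition("P1"), Partition("PN")
owner_of = {"A": p0, "B": p1, "C": pn}

# Write(C, A+B) depends on the versions of A and B that the client read.
deps = [("A", 1), ("B", 1)]
p0.installed["A"] = 1  # A has already been replicated, B has not
first = apply_remote_write(pn, "C", 1, deps, owner_of)
p1.installed["B"] = 1  # B arrives
second = apply_remote_write(pn, "C", 1, deps, owner_of)
print(first, second)
```

In this toy run the write to C is held back until both A and B are installed, and each attempt costs one message per dependency stored on another partition.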

Encoding of dependencies
COPS [SOSP ’11], ChainReaction [EuroSys ’13], Eiger [NSDI ’13]
- “direct” dependencies
- Worst case: O( reads before a write )
Orbe [SOCC ’13]
- Dependency matrix
- Worst case: O( partitions )
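The two worst cases can be illustrated by comparing how much metadata a client session would attach to a write under each encoding. This is a rough sketch, not the data structures of any of the cited systems: `DirectDepsSession` and `PerPartitionSession` are hypothetical, and the per-partition list is a simplified stand-in for a dependency matrix.

```python
class DirectDepsSession:
    """Client session that records one dependency per item read."""

    def __init__(self):
        self.deps = {}  # key -> version read or written

    def read(self, key, version):
        self.deps[key] = max(self.deps.get(key, 0), version)

    def write(self, key, version):
        attached = dict(self.deps)   # metadata shipped with the update
        self.deps = {key: version}   # later writes depend on this write
        return attached


class PerPartitionSession:
    """Client session that keeps one entry per partition instead."""

    def __init__(self, num_partitions):
        self.deps = [0] * num_partitions  # highest version seen per partition

    def read(self, partition, version):
        self.deps[partition] = max(self.deps[partition], version)

    def write(self, partition, version):
        attached = list(self.deps)
        self.deps = [0] * len(self.deps)
        self.deps[partition] = version
        return attached


direct = DirectDepsSession()
bounded = PerPartitionSession(num_partitions=4)
for i in range(100):  # 100 reads of distinct keys before one write
    direct.read(f"k{i}", 1)
    bounded.read(i % 4, 1)

direct_size = len(direct.write("out", 1))
bounded_size = len(bounded.write(0, 1))
print(direct_size, bounded_size)
```

With 100 reads before the write, the direct encoding attaches 100 entries while the per-partition encoding stays at the number of partitions, 4 in this example.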

The main issues
Metadata size is considerable
- for both storage and communication
Remote dependency checks are expensive
- multiple partitions are queried for each update

The cost of causal consistency

The cost of dependency check messages

CAUSAL CONSISTENCY WITH 0/1 DEPENDENCY CHECK MESSAGES

Getting rid of external dependencies
Partitions serve only fully replicated updates
- Replication Confirmation messages are broadcast periodically
External dependencies are removed
- replication information implies dependency installation
Internal dependencies are minimized
- we only track the previous write
- requires at most one remote check: zero if the write is local, one if it is remote
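A minimal sketch of "serve only fully replicated updates" is given here, assuming each datacenter periodically reports which update ids it has stored. The `Partition` class, its method names, and the update ids are hypothetical, and the ordering of updates to the same key is ignored for brevity.

```python
class Partition:
    """Partition replica that hides updates until every datacenter has them."""

    def __init__(self, datacenters):
        self.datacenters = set(datacenters)
        self.pending = {}  # update id -> (key, value, confirming datacenters)
        self.visible = {}  # key -> value served to reads

    def receive(self, update_id, key, value, local_dc):
        # Store the update; the local datacenter trivially has it.
        self.pending[update_id] = (key, value, {local_dc})

    def on_replication_confirmation(self, from_dc, update_ids):
        # Periodic broadcast: from_dc reports which updates it has stored.
        for update_id in update_ids:
            if update_id not in self.pending:
                continue
            key, value, confirmed = self.pending[update_id]
            confirmed.add(from_dc)
            if confirmed == self.datacenters:
                self.visible[key] = value  # fully replicated, safe to serve
                del self.pending[update_id]

    def read(self, key, default=0):
        return self.visible.get(key, default)


eu = Partition({"US", "EU", "ASIA"})
eu.receive("u1", "x", 1, local_dc="EU")  # W(x = 1) replicated from the US
before = eu.read("x")
eu.on_replication_confirmation("US", ["u1"])
partial = eu.read("x")
eu.on_replication_confirmation("ASIA", ["u1"])
after = eu.read("x")
print(before, partial, after)
```

Because a visible update is stored in every datacenter, a reader never needs to ask another partition whether a dependency of something it read has been installed.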

The new replication workflow
[Figure: example of 2 users performing operations at different datacenters on the same partition. Labels: Alice, Bob; US Datacenter, Europe Datacenter, Asia Datacenter; W(x = 1); Replication Confirmation (periodically); R(x) with x = 0 and R(x) with x = 1.]

Reading your own writes
Clients need not wait for the replication confirmation
- they can see their own updates immediately
- other clients’ updates are visible once they are fully replicated
Multiple logical update spaces
- Global update space (fully visible)
- Replication update space (not yet visible)
- Alice’s update space (visible to Alice)
- Bob’s update space (visible to Bob)
- …
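The logical update spaces can be pictured as separate maps consulted in order on a read. The following is a simplified sketch, not the presented system: `DatacenterStore` and its methods are hypothetical, and it assumes a client's own pending write always takes precedence over the global value.

```python
class DatacenterStore:
    """Toy store with global, replication, and per-client update spaces."""

    def __init__(self):
        self.global_space = {}       # fully replicated, visible to everyone
        self.replication_space = {}  # remote updates, not yet visible
        self.client_spaces = {}      # client -> own writes awaiting confirmation

    def write(self, client, key, value):
        self.client_spaces.setdefault(client, {})[key] = value

    def receive_remote(self, key, value):
        self.replication_space[key] = value

    def confirm_local(self, client, key):
        # The client's write is now fully replicated: move it to global.
        self.global_space[key] = self.client_spaces[client].pop(key)

    def confirm_remote(self, key):
        self.global_space[key] = self.replication_space.pop(key)

    def read(self, client, key, default=0):
        own = self.client_spaces.get(client, {})
        if key in own:
            return own[key]  # read your own writes immediately
        return self.global_space.get(key, default)


store = DatacenterStore()
store.write("Alice", "x", 1)
alice_sees = store.read("Alice", "x")
bob_sees = store.read("Bob", "x")
store.confirm_local("Alice", "x")
bob_later = store.read("Bob", "x")
print(alice_sees, bob_sees, bob_later)
```

Alice sees her write at once, while Bob sees it only after it has moved to the global update space.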

The cost of causal consistency

The price paid: update visibility latency increased
With new implementation: ~ max( network latency from origin to furthest replica + network latency from furthest replica to destination + interval of replication information broadcast )
With conventional implementation: ~ network latency from origin to destination

CAUSAL CONSISTENCY WITHOUT DEPENDENCY CHECK MESSAGES ?!

Is it possible?
Only make an update visible when one can locally determine that no causally preceding update will become visible later at another partition

How to do that?
Encode causality by means of Lamport clock
Each partition maintains its Lamport clock
Each update is timestamped with Lamport clock
Update visible
- update.timestamp ≤ min( Lamport clocks )
Periodically compute minimum
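The visibility rule can be sketched with a few partitions that each keep a Lamport clock. This is an illustrative model only: the `Partition` class and `visible_updates` helper are hypothetical, the minimum is computed on demand rather than periodically, and updates still in flight are ignored.

```python
class Partition:
    """Partition with a Lamport clock (hypothetical model)."""

    def __init__(self):
        self.clock = 0

    def local_update(self):
        # Timestamp a new update with the advanced Lamport clock.
        self.clock += 1
        return self.clock

    def receive(self, timestamp):
        # Standard Lamport rule: move past any timestamp that is received.
        self.clock = max(self.clock, timestamp) + 1


def visible_updates(timestamps, partitions):
    # An update is visible once its timestamp is <= the minimum clock.
    stable = min(p.clock for p in partitions)
    return [ts for ts in timestamps if ts <= stable]


partitions = [Partition() for _ in range(3)]
t1 = partitions[0].local_update()
partitions[1].receive(t1)
t2 = partitions[1].local_update()
early = visible_updates([t1, t2], partitions)

partitions[0].receive(t2)
partitions[2].receive(t2)
late = visible_updates([t1, t2], partitions)
print(early, late)
```

Every future update at a partition gets a timestamp above that partition's current clock, so no update with a timestamp at or below the minimum can still appear. In the first check the third partition has seen nothing, its clock is still 0, and no update is visible yet.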

0-msg causal consistency - throughput

The price paid: update visibility latency increased
With new implementation: ~ max( network latency from origin to furthest replica + network latency from furthest replica to destination + interval of minimum computation )
With conventional implementation: ~ network latency from origin to destination

How to deal with “stagnant” Lamport clock?
Lamport clock stagnates if there is no update in a partition
Combine
- Lamport clock
- Loosely synchronized physical clock
- (easy to do)
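One possible way to combine the two clocks is to let the logical value never fall behind the physical clock. The slide does not spell out the combination, so the `HybridClock` class is an assumption made for illustration; the physical clock is injected as a callable so the example is deterministic.

```python
class HybridClock:
    """Lamport clock that is pushed forward by a physical clock (assumed design)."""

    def __init__(self, physical_now):
        self.physical_now = physical_now  # callable returning physical time
        self.value = 0

    def tick(self):
        # Advance like a Lamport clock, but never fall behind physical time.
        self.value = max(self.value + 1, self.physical_now())
        return self.value

    def observe(self, timestamp):
        # Incorporate a timestamp seen on a received update.
        self.value = max(self.value, timestamp)

    def current(self):
        # Advances even without local updates, so the minimum keeps moving.
        self.value = max(self.value, self.physical_now())
        return self.value


fake_time = [100]  # stand-in for a loosely synchronized physical clock
clock = HybridClock(lambda: fake_time[0])

idle = clock.current()     # no update has happened, yet the clock is at 100
stamp = clock.tick()       # first local update
fake_time[0] = 150         # physical time passes with no updates
advanced = clock.current()
clock.observe(500)         # timestamp from a partition with a faster clock
after_observe = clock.tick()
print(idle, stamp, advanced, after_observe)
```

A partition that receives no updates still reports a growing clock value, so it no longer holds back the minimum used for visibility.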

More on loosely synchronized physical clocks
Periodically broadcast clock
Reduces update visibility latency to
- network latency from furthest replica to destination + maximum clock skew + clock broadcast interval
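The visibility-latency estimates from the slides can be compared with a small worked calculation. All numbers are made-up illustrative values in milliseconds, not measurements from the talk, and the function names are hypothetical.

```python
def conventional(origin_to_dest):
    # Conventional implementation: latency from origin to destination.
    return origin_to_dest


def with_confirmation(origin_to_furthest, furthest_to_dest, interval):
    # 0/1-msg and 0-msg variants: wait for the furthest replica, then for
    # the periodic confirmation or minimum computation.
    return origin_to_furthest + furthest_to_dest + interval


def with_physical_clocks(furthest_to_dest, max_skew, broadcast_interval):
    # 0-msg variant with loosely synchronized physical clocks.
    return furthest_to_dest + max_skew + broadcast_interval


# Hypothetical latencies (ms) between three datacenters.
origin_to_dest = 80
origin_to_furthest = 150
furthest_to_dest = 120
interval = 5
max_skew = 1

a = conventional(origin_to_dest)
b = with_confirmation(origin_to_furthest, furthest_to_dest, interval)
c = with_physical_clocks(furthest_to_dest, max_skew, interval)
print(a, b, c)
```

With these assumed values the estimates are 80 ms, 275 ms, and 126 ms, which matches the qualitative ordering on the slides: physical clocks remove one wide-area hop from the visibility latency, but it remains above the conventional case.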

Can we close the throughput gap?
The answer is: yes, but there is a price
The price is increased update visibility latency

Conclusion: Throughput, messages, latency
Method | Throughput | # of dep. check messages | Update visibility
Conventional | < Evt. consistency | O( Rs since W ) or O( partitions ) | D
0/1-msg | ~ Evt. consistency | 0 or 1 | ~ 2 D max
0-msg | ~ Evt. consistency | 0 | ~ 2 D max
0-msg + physical clock | ~ Evt. consistency | 0 | ~ D max