Eventual Consistency
Jinyang Li

Review: Sequential consistency

Sequential consistency properties:
– All read/write ops follow some total ordering
– Reads must see the result of the latest write

Realizing sequential consistency:
– Reads/writes from a single node execute one at a time
– All reads/writes to address X must be ordered by one memory/storage module responsible for X

Realizing sequential consistency

[Diagram: writes W(A)1, W(A)2, W(B)3 flow through caches/replicas to the module responsible for each address; an invalidation precedes R(B) so the read returns the latest write]

Disadvantages of sequential consistency

Requires highly available connections
– Lots of chatter between clients/servers

Not suitable for certain scenarios:
– Disconnected clients (e.g. your laptop)
– Apps might prefer potential inconsistency to loss of availability

Why (not) eventual consistency?

Pros: supports disconnected operations
– Better to read a stale value than nothing
– Better to save writes somewhere than nothing

Cons: potentially anomalous application behavior
– Stale reads and conflicting writes…

Sequential vs. eventual consistency

Sequential: pessimistic conflict handling
– Updates cannot take effect unless they are serialized first

Eventual: optimistic conflict handling
– Let updates happen, worry about whether they can be serialized later

Operating w/o total connectivity

[Diagram: a client issues W(A)1 and W(A)2 against its local replica]
– A client writes to its local replica
– Sync w/ server resolves non-conflicting changes, reports conflicting ones to the user
– No sync between clients

Pair-wise synchronization

[Diagram: replicas holding W(A)1, W(A)2, W(B)3 sync directly with one another]
– Pair-wise sync resolves non-conflicting changes, reports conflicting ones to users

Example usages?

File synchronizers
– One user, many gadgets

File synchronizer

Goals:
1. All replica contents eventually become identical
2. No lost updates
– Do not replace a new version with an old one

Prevent lost updates

Detect if updates were sequential
– If so, replace the old version with the new one
– If not, detect a conflict

How to prevent lost updates?

Strawman: use mtime to decide which version should replace the other.

Problem?
[Diagram: H1 writes f (W(f)a, W(f)b) while H2 concurrently writes W(f)c; comparing mtimes alone cannot distinguish a newer version from a conflicting one]
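
A minimal sketch of the strawman and why it fails, assuming a version is just an (mtime, content) pair (names and values besides the slide's 15648/23657 are illustrative):

```python
# Strawman: keep whichever copy has the larger mtime.
def sync_by_mtime(local, remote):
    """Each version is an (mtime, content) tuple; the larger mtime wins."""
    return local if local[0] >= remote[0] else remote

# H1 and H2 modify f concurrently while disconnected:
h1_copy = (15648, "a")  # H1's write (mtime value from the slide)
h2_copy = (23657, "c")  # H2's concurrent write

# The sync keeps H2's copy and silently loses H1's update: mtime alone
# cannot distinguish "newer than" from "concurrent with".
print(sync_by_mtime(h1_copy, h2_copy))  # (23657, 'c')
```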

Strawman fix

Carry the entire modification history
– If history X is a prefix of Y, Y is newer

[Diagram: H1’s history grows from H1:15648 to (H1:15648, H1:16679); H2’s divergent history (H1:15648, H2:23657) is neither a prefix nor an extension of it, i.e. a conflict]
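
A sketch of the prefix rule, representing a history as a list of (host, mtime) records as on the slide (the helper name is mine):

```python
def compare_histories(x, y):
    """Prefix rule: if x is a prefix of y, y is newer (and vice versa);
    if neither is a prefix of the other, the histories diverged."""
    if x == y[:len(x)]:
        return "y is newer (or equal)"
    if y == x[:len(y)]:
        return "x is newer"
    return "conflict"

a = [("H1", 15648)]
b = [("H1", 15648), ("H1", 16679)]   # H1 wrote again: a is a prefix of b
c = [("H1", 15648), ("H2", 23657)]   # H2 wrote concurrently with H1

print(compare_histories(a, b))  # y is newer (or equal)
print(compare_histories(b, c))  # conflict: both extend a differently
```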

Compress version history

H1:2 implies H1:1, so we only need one number per host.

[Diagram: the histories from the previous slide compress to vectors: W(f)a → H1:1, W(f)b → H1:2, and H2’s write W(f)c → H1:1 H2:1]

Compare vector timestamps

(H1:1, H2:3, H3:2) < (H1:1, H2:5, H3:7): the first is dominated element-wise.
(H1:1, H2:3, H3:2) vs. (H1:2, H2:1, H3:7): neither dominates the other, so the two are concurrent.
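
A small sketch of the comparison rule, with version vectors as host-to-count dicts (the representation is mine):

```python
def leq(v1, v2):
    """v1 <= v2 iff every host's entry in v1 is <= the matching entry
    in v2; hosts absent from a vector default to 0."""
    return all(v1.get(h, 0) <= v2.get(h, 0) for h in set(v1) | set(v2))

def compare(v1, v2):
    if leq(v1, v2) and leq(v2, v1):
        return "equal"
    if leq(v1, v2):
        return "v1 < v2"
    if leq(v2, v1):
        return "v2 < v1"
    return "concurrent"  # neither dominates: conflicting writes

# The two comparisons from the slide:
print(compare({"H1": 1, "H2": 3, "H3": 2},
              {"H1": 1, "H2": 5, "H3": 7}))  # v1 < v2
print(compare({"H1": 1, "H2": 3, "H3": 2},
              {"H1": 2, "H2": 1, "H3": 7}))  # concurrent
```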

Using vector timestamps

[Diagram: W(f)a and W(f)b on H1 produce versions H1:1 and H1:2; W(f)c on H2 produces H1:1 H2:1; the resolved version carries H1:2 H2:1]

Using vector timestamps (cont.)

[Diagram: same writes as above; H1:2 and H1:1 H2:1 are concurrent vectors, so the sync detects a conflict]

How to deal w/ conflicts?

– Easy: mailboxes w/ two different sets of messages
– Medium: changes to different lines of a C source file
– Hard: changes to the same line of a C source file

After conflict resolution, what should the vector timestamp be?
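
One common answer, sketched here (my code, not stated on the slide): take the element-wise max of the two conflicting vectors, which matches the merged vector H1:2 H2:1 shown two slides back:

```python
def after_resolution(v1, v2):
    """Element-wise max of the conflicting vectors: since v1 and v2 are
    concurrent, the max strictly dominates both, so every replica will
    treat the merged version as newer than either input."""
    return {h: max(v1.get(h, 0), v2.get(h, 0))
            for h in sorted(set(v1) | set(v2))}

# Resolving the conflict between H1:2 and H1:1 H2:1 from the earlier slides:
print(after_resolution({"H1": 2}, {"H1": 1, "H2": 1}))  # {'H1': 2, 'H2': 1}
# Caveat: if two hosts resolve the same conflict independently (perhaps
# differently), their merges get identical vectors; bumping the resolving
# host's own entry as well would make each resolution a distinguishable write.
```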

What about file deletion?

Can we forget the vector timestamp for deleted files?

Simple solution: treat deletion as a write
– Conflicts involving a deleted file are easy

Downside:
– Need to remember vector timestamps for deleted files indefinitely
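
A minimal sketch of the simple solution, with a representation of my choosing (a dict of per-file entries): a deletion is just one more write, leaving a tombstone that keeps the file's vector timestamp.

```python
def delete(replica, name, host):
    entry = replica[name]
    entry["vt"][host] = entry["vt"].get(host, 0) + 1  # deletion bumps the clock
    entry["deleted"] = True                           # tombstone, kept forever

replica = {"f": {"vt": {"H1": 2}, "deleted": False}}
delete(replica, "f", "H2")
print(replica["f"])  # {'vt': {'H1': 2, 'H2': 1}, 'deleted': True}
# The downside the slide notes: this tombstone's vector timestamp must be
# remembered indefinitely, even though the file itself is gone.
```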

Tra [Cox, Josephson]

What are Tra’s novel properties?
– Easy to compress storage of vector timestamps
– No need to check every file’s version vector during sync
– Allows partial sync of subtrees
– No need to keep timestamps for deleted files forever

Tra’s key technique

Two vector timestamps per file:
1. One represents modification time
– Tracks what a host has
2. One represents synchronization time
– Tracks what a host knows

Sync time implies no modification has happened since the mod time, e.g. mod time H1:1 H2:5 H3:7 paired with sync time H1:10 H2:20 H3:25.
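
A sketch of the per-file state this implies, with field and helper names of my choosing; the key invariant is that a sync from src may safely overwrite dst only if everything dst has was already known to src:

```python
from dataclasses import dataclass, field

@dataclass
class FileState:
    mtime: dict = field(default_factory=dict)     # what this replica HAS
    synctime: dict = field(default_factory=dict)  # what this replica KNOWS

def safe_to_overwrite(src: FileState, dst: FileState) -> bool:
    """Overwriting dst's copy with src's loses nothing if everything dst
    has was already known to src: dst.mtime <= src.synctime element-wise."""
    hosts = set(dst.mtime) | set(src.synctime)
    return all(dst.mtime.get(h, 0) <= src.synctime.get(h, 0) for h in hosts)

# The slide's example vectors: modified up to H1:1 H2:5 H3:7, but synced
# (hence known unmodified) up to H1:10 H2:20 H3:25.
f = FileState(mtime={"H1": 1, "H2": 5, "H3": 7},
              synctime={"H1": 10, "H2": 20, "H3": 25})
print(safe_to_overwrite(f, FileState(mtime={"H2": 4})))  # True: H2:4 already known
```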

Using sync time

[Diagram: H1 writes f1 (W(f1)a) and f2 (W(f2)b); the per-file mod times advance to H1:1 and H1:2 while H2’s copies still show H1:0 H2:0, until a sync raises H2’s sync time to H1:2 H2:0]

Compress mtime and synctime

– dir synctime = element-wise min of child sync times
– dir mtime = element-wise max of child mod times

Sync(d1 → d1’):
– Skip d1 if the mtime of d1 is less than the synctime of d1’

Can we achieve this with a single mtime?
– Skip d1 if the mtime of d1 is less than the mtime of d1’
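
A sketch of the aggregation and the skip rule, under the same dict representation as above (helper names are mine):

```python
def elementwise(fn, vectors):
    """Combine version vectors host-by-host with fn (min or max)."""
    hosts = sorted(set().union(*vectors))
    return {h: fn(v.get(h, 0) for v in vectors) for h in hosts}

# A directory summarizes its children's vectors:
child_mtimes    = [{"H1": 2, "H2": 0}, {"H1": 1, "H2": 3}]
child_synctimes = [{"H1": 9, "H2": 9}, {"H1": 9, "H2": 9}]

dir_mtime    = elementwise(max, child_mtimes)     # newest change below d1
dir_synctime = elementwise(min, child_synctimes)  # floor of what is known

# Skip rule for sync(d1 -> d1'): skip the whole subtree if everything
# d1 has is already known under d1'.
def can_skip(d1_mtime, d1p_synctime):
    hosts = set(d1_mtime) | set(d1p_synctime)
    return all(d1_mtime.get(h, 0) <= d1p_synctime.get(h, 0) for h in hosts)

print(dir_mtime, dir_synctime)                  # {'H1': 2, 'H2': 3} {'H1': 9, 'H2': 9}
print(can_skip(dir_mtime, {"H1": 9, "H2": 9}))  # True: no need to descend
```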

Synctime enables partial synchronization

Directory d1 contains f1 and f2; suppose a host syncs only the subtree d1/f1:
– With synctime+mtime: the synctime of d1 does not change, but the mtime of d1 increases
– With mtime only: the mtime of d1 increases

The host later syncs subtree d1/f2:
– With synctime+mtime: pulls in the modifications to f2, because the synctime of d1 is still small
– With mtime only: wrongly skips d1 because its mtime is already high enough

Using sync time: partial sync example

[Diagram: after syncing only d1/f1, d1’s synctime stays low while its mtime rises; a later sync of d1/f2 still descends into d1 and pulls in W(f2)b]

How to deal w/ deletion

The deletion notice for a deleted file contains its sync time.

[Diagram: H1 writes f1 (W(f1)a) and deletes f2 (D(f2)); the deletion notice for f2 carries f2’s sync time]

How to deal w/ deletion (cont.)

[Diagram: the deletion notice propagates to H2, which removes its copy of f2 and advances its sync time accordingly]

Another definition of eventual consistency

Eventual consistency (Tra):
– All replica contents are eventually identical
– Do not care about individual writes; just overwrite the old replica w/ the new one

Eventual consistency (Bayou):
– Writes are eventually applied in some total order
– Reads might not see the most recent writes in the total order

Bayou

[Diagram: three nodes N0, N1, N2, each with an empty write log and version vector N0:0 N1:0 N2:0]

Bayou propagation

[Diagram: N0 accepts writes 0:N0 W(x), 1:N0 W(y), 2:N0 W(z), advancing its version vector to N0:3 N1:0 N2:0; N1 accepts 0:N1 W(x), giving N0:0 N1:1 N2:0; a sync from N0 copies N0’s three writes into N1’s log]

Bayou propagation (cont.)

[Diagram: after the sync, N1 holds the merged log 0:N0 W(x), 0:N1 W(x), 1:N0 W(y), 2:N0 W(z), ordered by timestamp, and its version vector covers both nodes’ writes]
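
A minimal sketch of the pair-wise (anti-entropy) exchange these diagrams depict, assuming log entries are (accept-time, node, op) tuples and version vectors count how many of each node's writes have been seen (helper names are mine):

```python
def anti_entropy(sender_log, receiver_log, receiver_vv):
    """One-way sync: the sender forwards, in log order, every write the
    receiver's version vector says it has not yet seen."""
    for accept_time, node, op in sender_log:
        if accept_time >= receiver_vv.get(node, 0):   # receiver lacks it
            receiver_log.append((accept_time, node, op))
            receiver_vv[node] = accept_time + 1
    receiver_log.sort()  # keep (accept_time, node) order, as in the slides

# N0's three writes flow to N1, which has one write of its own:
n0_log = [(0, "N0", "W(x)"), (1, "N0", "W(y)"), (2, "N0", "W(z)")]
n1_log = [(0, "N1", "W(x)")]
n1_vv  = {"N0": 0, "N1": 1}

anti_entropy(n0_log, n1_log, n1_vv)
print(n1_vv)   # {'N0': 3, 'N1': 1}
print(n1_log)  # [(0,'N0','W(x)'), (0,'N1','W(x)'), (1,'N0','W(y)'), (2,'N0','W(z)')]
```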

Bayou propagation (cont.)

[Diagram: N0 syncs back from N1, so N0 and N1 both hold the merged log 0:N0 W(x), 0:N1 W(x), 1:N0 W(y), 2:N0 W(z)]

Which portion of the log is stable?

Bayou propagation (cont.)

[Diagram: N2 receives the merged log as well; all three nodes now hold the same four writes, though their version vectors differ depending on whom they have synced with]

Bayou propagation (cont.)

[Diagram: after further pair-wise syncs the version vectors converge and every node holds the log 0:N0 W(x), 0:N1 W(x), 1:N0 W(y), 2:N0 W(z)]

Bayou uses a primary to commit a total order

Why is it important to make the log stable?
– Stable writes can be committed
– The stable portion of the log can be truncated

Problem: if any node is offline, the stable portion of all logs stops growing.

Bayou’s solution:
– A designated primary defines a total commit order
– The primary assigns CSNs (commit sequence numbers)
– Any write with a known CSN is stable
– All stable writes are ordered before tentative writes
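
A sketch of the resulting total order, assuming log entries are (CSN, accept-time, node, op) tuples and using the slides' ∞ convention for tentative writes (the helper is mine, not Bayou's API):

```python
INF = float("inf")  # the slides write tentative CSNs as infinity

def log_order(entry):
    """Total order over log entries: committed writes sort by CSN; all
    tentative writes (csn == INF) come after them, ordered by accept
    time and then node ID."""
    csn, accept_time, node, _ = entry
    return (csn, accept_time, node)

log = [
    (INF, 0, "N1", "W(x)"),   # tentative: not yet assigned a CSN
    (0,   0, "N0", "W(x)"),   # committed by the primary as CSN 0
    (2,   2, "N0", "W(z)"),
    (1,   1, "N0", "W(y)"),
]
log.sort(key=log_order)
for entry in log:
    print(entry)
# The committed prefix (CSNs 0, 1, 2) is stable and can be truncated;
# the tentative write may still be reordered once the primary commits it.
```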

Bayou propagation w/ CSNs

[Diagram: N0, the primary, logs committed writes 0:0:N0 W(x), 1:1:N0 W(y), 2:2:N0 W(z) (CSN:accept-time:node); N1’s uncommitted write is logged as ∞:0:N1 W(x)]

Bayou propagation w/ CSNs (cont.)

[Diagram: when N1’s tentative write reaches the primary it is committed: ∞:0:N1 W(x) becomes 3:0:N1 W(x) and sorts after the three committed writes in every log]

Bayou’s limitations

– The primary cannot fail
– Server creation & retirement make node IDs grow arbitrarily long
– Anomalous behaviors for apps? (e.g. the calendar app)