Replication: optimistic approaches Marc Shapiro with Yasushi Saito (HP Labs) Cambridge Distributed Systems Group.


Bases de Données Avancées Replication: optimistic approaches
Motivations for this work
  Peer-to-peer, decentralised write sharing
  Lessons and commonalities
  Understand limitations
  Different solutions: spectrum or discrete points?
  Simple formal model

Optimistic replication
  Replicas of shared objects on sites
  Without synchronisation: peer-to-peer read and update!
  Consistency: a posteriori, offline
    Merge independent updates
  Applications:
    high latency networks
    disconnected operation
    cooperative work
  Improves availability & performance

Example: cooperative engineering with CVS
  CVS: developing shared code
  Local, disconnected replica: no interference
  Conflicts:
    Write same file = syntactic
    Overlap in file = violates edit semantics
    Doesn’t compile, test = violates application semantics
  Both sides of a conflict are excluded
  Manual repair

Example: Bayou
  General-purpose database
  Any replica can update, log actions
    action = { dependency check, operation, merge-procedure }
  Optimistic replication:
    epidemic exchange logs
    { roll-back, replay }*; commit
  dep-check: semantic check for conflict
  merge-proc: semantic repair
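
The action triple on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Bayou's actual interface: the names `Action`, `replay`, and `withdraw`, and the dictionary used as the database state, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical database: key -> integer value


@dataclass
class Action:
    """A Bayou-style action: dependency check, operation, merge procedure."""
    dep_check: Callable[[State], bool]   # semantic check for conflict
    operation: Callable[[State], None]   # the intended update
    merge_proc: Callable[[State], None]  # semantic repair if the check fails


def replay(log: List[Action], initial: State) -> State:
    """Roll back to the initial state, then replay the whole log in order."""
    state = dict(initial)  # roll-back: restart from a copy of the initial state
    for action in log:
        if action.dep_check(state):
            action.operation(state)
        else:
            action.merge_proc(state)
    return state


def withdraw(amount: int) -> Action:
    """Hypothetical bank withdrawal that records a rejection on conflict."""
    def op(s: State) -> None:
        s["balance"] -= amount

    def merge(s: State) -> None:
        s["rejected"] = s.get("rejected", 0) + 1

    return Action(lambda s: s["balance"] >= amount, op, merge)


final = replay([withdraw(70), withdraw(50)], {"balance": 100})
print(final)
```

Here the second withdrawal fails its dependency check after the first has been replayed, so its merge procedure runs instead of its operation.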

Basic vocabulary
  While isolated: tentative updates
  When connected, reconcile:
    Propagate & collect updates
    (Conceptually) Restart from initial state
    Replay updates (if possible)
  Overriding goal: consistency

1. Consistency
  Study component issues of consistency

What is consistency?
  Consistent with user intents
    apply operations according to user scenario
  Consistent with data invariants
    dependent actions
    pre- and post-conditions
    conflict resolution
  Replicas consistent with each other
    converge towards same values

Consistency: problem taxonomy
  1. Objects & updates
    Internal vs. external consistency
    Value / value log / operation log
    Single master / multi-master
  2. Detecting dependence vs. concurrency
  3. Concurrency control
  4. Laziness of concurrency control
    Pessimistic / advanced concurrency / optimistic
  5. Convergence

Operation-based reconciliation
  Updates: concurrent, unsynchronised
  Local log of actions = operation descriptions
    object identifier, method, arguments
  Multi-log collects local + remote logs
  Reconciliation schedule: merge multi-log & run sequentially
  Scheduling issues:
    Include vs. exclude
    Execution order
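
A minimal sketch of this reconciliation loop, assuming a toy object model with two hypothetical methods (`add` and `set`) and the most naive schedule possible: include every action and concatenate the logs in site-name order. A real scheduler would also decide what to exclude and in which order to run.

```python
from typing import Dict, List, Tuple

# An action is an operation description: (object identifier, method, argument).
Action = Tuple[str, str, int]
State = Dict[str, int]


def apply(state: State, action: Action) -> None:
    """Execute one logged action against the state."""
    obj, method, arg = action
    if method == "add":
        state[obj] = state.get(obj, 0) + arg
    elif method == "set":
        state[obj] = arg
    else:
        raise ValueError(f"unknown method: {method}")


def reconcile(multi_log: Dict[str, List[Action]], initial: State) -> State:
    """Merge the per-site logs into one schedule and run it sequentially."""
    schedule: List[Action] = []
    for site in sorted(multi_log):  # deterministic site order (an assumption)
        schedule.extend(multi_log[site])
    state = dict(initial)           # restart from the initial state
    for action in schedule:
        apply(state, action)
    return state


multi_log = {
    "site_a": [("x", "add", 1)],
    "site_b": [("x", "add", 2), ("y", "set", 5)],
}
result = reconcile(multi_log, {"x": 10})
print(result)
```

The two `add` actions commute, so any interleaving gives the same value of `x`; mixing `set` and `add` on the same object would make the execution order matter.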

Operation-based model

Dependence vs. concurrency
  Two actions either have a dependency or are commutative / concurrent
  Dependent actions:
    do not conflict
    must be scheduled in dependence order
  Concurrent actions potentially conflict
  Dependence / concurrency detection is a fundamental mechanism

Concurrency control
  Concurrent & no conflict ⇒ commute: execute both, arbitrary order
  Conflict detection options
  Conflict resolution options

Convergence
  Liveness: sites receive same/all actions
  Safety: given same actions, sites compute the same value
  Stability: actions eventually not undone

2. Dependency & Concurrency
  Mechanisms to detect if actions are dependent or concurrent

Scalar clocks and timestamps
  Wall clock: total order
  Lamport clock: total order, consistent with causal dependence
  Schedule in timestamp order
  Can’t detect concurrency
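
A Lamport clock can be sketched as follows. The class and method names are illustrative, and breaking ties with the site identifier is one common way to obtain a total order, assumed here rather than taken from the slides.

```python
from typing import Tuple

Stamp = Tuple[int, str]  # (logical time, site id); the site id breaks ties


class LamportClock:
    def __init__(self, site: str) -> None:
        self.site = site
        self.time = 0

    def local_event(self) -> Stamp:
        """Advance the clock for a local event or a message send."""
        self.time += 1
        return (self.time, self.site)

    def receive(self, stamp: Stamp) -> Stamp:
        """Jump past the sender's time so that send precedes receive."""
        self.time = max(self.time, stamp[0]) + 1
        return (self.time, self.site)


a, b, c = LamportClock("A"), LamportClock("B"), LamportClock("C")
e1 = a.local_event()   # A sends a message
e2 = b.receive(e1)     # B receives it: causally after e1
e3 = c.local_event()   # C acts independently: concurrent with e1 and e2
schedule = sorted([e2, e3, e1])  # schedule in timestamp order
print(schedule)
```

The sorted schedule places `e3` between `e1` and `e2` even though it is causally unrelated to both, which is the sense in which scalar timestamps cannot detect concurrency.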

Happens-before
  e1 precedes e2 in process, or e1 sends and e2 receives ⇒ e1 → e2
  ¬(e1 → e2) ∧ ¬(e2 → e1) ⇒ e1 || e2
  e1 || e2: e1 does not cause e2
  e1 → e2: e1 might cause e2
  Partial order, consistent with causal dependence
  Schedule consistent with →

Syntactic vs. semantic mechanisms
  Scalar timestamps
    no concurrency detection
    very conservative approx. of causality
  Vector timestamps
    detect concurrency
    conservative approx. of causality
  Alternative: explicit semantic constraints
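
Concurrency detection with vector timestamps can be sketched as a comparison function. The dictionary representation (site id mapped to a counter, missing entries read as zero) and the function names are assumptions made for illustration.

```python
from typing import Dict

VectorStamp = Dict[str, int]  # site id -> number of events seen from that site


def leq(u: VectorStamp, v: VectorStamp) -> bool:
    """True if every entry of u is at most the matching entry of v."""
    return all(n <= v.get(site, 0) for site, n in u.items())


def compare(u: VectorStamp, v: VectorStamp) -> str:
    """Classify two vector timestamps under happens-before."""
    if leq(u, v) and leq(v, u):
        return "equal"
    if leq(u, v):
        return "before"      # u happens-before v
    if leq(v, u):
        return "after"       # v happens-before u
    return "concurrent"      # neither dominates the other


print(compare({"A": 1}, {"A": 1, "B": 1}))
print(compare({"A": 2}, {"A": 1, "B": 1}))
```

Two stamps are reported concurrent exactly when each has an entry larger than the other's, which a single scalar timestamp cannot express.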

Locks as semantic constraints
  Read(x) depends on previous Write(x) in same process, or previously-received Write(x), whichever is later
  Write(x) depends on previous Read(*) in same process
  More semantic information than Happens-Before
  Step in the right direction, but still too coarse

IceCube: Primitive constraints
  Declarative (“static”):
    MustHave: a ⊲ b
      if a ∈ s and a ⊲ b then b ∈ s
      (not necessarily contiguous nor in order)
    Order: a → b
      if a, b ∈ s and a → b then a before b in s
      (not necessarily both nor contiguous)
    Within log, across logs
  Imperative (dynamic):
    preCondition (State)

Log constraints
  [Figure labels: parcel, predecessor-successor, alternatives]
  Express user intents:
    Predecessor/successor: a → b ∧ b ⊲ a
      b uses effect of a; “a causes b”
    Parcel: a ⊲ b ∧ b ⊲ a (transaction)
    Alternatives: a → b ∧ b → a

3. Concurrency control & scheduling
  Policies for dealing with concurrent actions

Optimistic concurrency control & scheduling
  Two actions are either:
    dependent ⇒ schedule in dependence order
    concurrent and non-conflicting or commutative ⇒ schedule in any order
    concurrent and conflicting ⇒ schedule in non-conflicting order
      ⇒ or exclude one, the other, or both

Concurrency control
  Concurrent & no conflict ⇒ commute: execute both, arbitrary order
  Conflict detection options: 2 concurrent actions conflict
    only if operate on same object
    only if both write
    only if violate semantic invariant
  Conflict resolution options:
    exclude both
    exclude 1st, include 2nd (or vice-versa)
    execute both in favorable order
    (rewrite and execute both)

What is a conflict?
  1 site executes code + pre/post-conditions
  Pre/post-conditions often unknown
  Dependency between successive actions
  Schedule execution must satisfy pre/post-conditions
  Violation ⇒ conflict
  [Diagram labels:]
    x1 := f(x0), with pre(x0), post(x0, f(x0))
    x’1 := g(x0), with pre(x0), post(x’1, g(x0))
    x2 := g(x1), with pre(x1), post(x1, g(x1))

Thomas’ Write Rule
  Pre- / post-conditions unknown
  Scalar clocks
    no concurrency detection
    implicit concurrency control
  Schedule in clock order
    a later action excludes earlier ones
  Lost updates
  Delete ambiguity: “tombstone” state
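
A last-writer-wins store applying Thomas' Write Rule might look like the sketch below. The class name `LwwStore`, the `(clock, site id)` timestamps, and the use of `None` as the tombstone are assumptions chosen for illustration.

```python
from typing import Dict, Optional, Tuple

Timestamp = Tuple[int, str]   # (scalar clock, site id); site id breaks ties
TOMBSTONE = None              # a delete is stored as a timestamped marker


class LwwStore:
    def __init__(self) -> None:
        self.data: Dict[str, Tuple[Timestamp, Optional[str]]] = {}

    def write(self, key: str, ts: Timestamp, value: Optional[str]) -> bool:
        """Apply the write only if it is later than the stored one."""
        current = self.data.get(key)
        if current is not None and current[0] >= ts:
            return False  # an earlier write is excluded: a lost update
        self.data[key] = (ts, value)
        return True

    def delete(self, key: str, ts: Timestamp) -> bool:
        """Keep a tombstone so that late, earlier writes stay excluded."""
        return self.write(key, ts, TOMBSTONE)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return None if entry is None else entry[1]


r1, r2 = LwwStore(), LwwStore()
w1, w2 = ((1, "A"), "from A"), ((2, "B"), "from B")
r1.write("x", *w1); r1.write("x", *w2)   # receives the writes in clock order
r2.write("x", *w2); r2.write("x", *w1)   # receives them in the opposite order
print(r1.read("x"), r2.read("x"))
```

Both replicas end with the later write regardless of arrival order, and the write from site A is silently lost. Without the tombstone, a replica that had deleted `x` could not tell a late, earlier write from a fresh one.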

Value-based Version Vector concurrency control
  Pre- / post-conditions unknown
  Independent objects
    actions to different objects commute
  VV = per-object vector timestamp
    any concurrent writes to object conflict
  Resolution:
    Manual
    Values: “Resolver” per data type

Bayou scheduling
  Disjoint databases; 1 primary / database
  Transaction: single database
  Action = { dependency check, operation, merge-procedure }
  Optimistic replication:
    epidemic exchange logs
    { roll-back, replay }*; commit
  Conflict ⇒ dependency check fails ⇒ merge-procedure

Bayou dependency checks
  Write-write conflicts: on replay check that data unchanged
  Read-write conflicts: check input data
    can detect concurrent updates
    semantic: only relevant changes
  Application-specific checks
    bank account balance > £100
    fine grain

IceCube: Object constraints
  Shared data type advertises static semantics
    mutually exclusive: a → b ∧ b → a
    best order (e.g. bank: credits before debits): a → b
  Only between concurrent actions
  Also: dynamic constraints
  [Figure labels: commute, best order, mutually exclusive]

IceCube scheduling
  Insight: a conflict is a choice of which action to exclude
  maximise value

IceCube execution model
  [Figure labels: log constraints, object constraints, dynamic constraints]

Search vs. syntactic order

Performance of IceCube heuristics

4. Convergence
  Can a peer-to-peer system converge?
  Hard in the general case
  Formalise to understand limitations, trade-offs

Convergence
  Liveness: sites receive same/all operations
    epidemic
    multicast quickly
  Safety: sites compute the same value
    equivalent schedules
  Stability: actions eventually not undone
    stable schedules
    Users, external world dependency
    Garbage collection

Schedule soundness & equivalence
  s sound:
    Closed for MustHave: a ∈ s ∧ a ⊲ b ⇒ b ∈ s
    Consistent with Order: (a, b ∈ s ∧ a → b) ⇒ a before b in s
  Equivalence: s ≡ t
    s, t sound
    a ∈ s ⇔ a ∈ t
    ordering is irrelevant!
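
The two soundness conditions and the equivalence test translate directly into code. In this sketch a schedule is a list of action names with no repeats, and each constraint set holds pairs `(a, b)` read as "a MustHave b" or "a Order b"; these representations and the function names are assumptions for illustration.

```python
from typing import List, Set, Tuple

Constraints = Set[Tuple[str, str]]


def is_sound(s: List[str], must_have: Constraints, order: Constraints) -> bool:
    """Check closure for MustHave and consistency with Order."""
    pos = {a: i for i, a in enumerate(s)}
    # Closed for MustHave: if a is scheduled and a MustHave b, b is scheduled.
    closed = all(b in pos for a, b in must_have if a in pos)
    # Consistent with Order: if both are scheduled, a comes before b.
    ordered = all(pos[a] < pos[b] for a, b in order if a in pos and b in pos)
    return closed and ordered


def equivalent(s: List[str], t: List[str],
               must_have: Constraints, order: Constraints) -> bool:
    """Sound schedules with the same actions are equivalent; order is ignored."""
    return (is_sound(s, must_have, order)
            and is_sound(t, must_have, order)
            and set(s) == set(t))


must_have = {("b", "a")}   # b MustHave a
order = {("a", "b")}       # a before b
print(is_sound(["a", "b"], must_have, order))
print(is_sound(["b"], must_have, order))
print(equivalent(["a", "c", "b"], ["c", "a", "b"], must_have, order))
```

The pair of constraints used here is the predecessor/successor pattern: `b` cannot be scheduled without `a`, and never before it.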

Stability
  Peer-to-peer, indefinite tentative update + advisory reconciliation OK
  But stability needed:
    Users, external world depend on it
    Garbage collect multilog
  Stable: eventually decisions not changed
    committed: definitely included in all schedules
    aborted: definitely excluded

Correctness of stability
  Actions known to be stable at site i: stable_i = committed_i ∪ aborted_i
  Live: ∀ action a, site i: eventually a ∈ stable_i
  Safe:
    ∀ site i, schedule s_i: s_i sound ∧ committed_i ⊆ s_i
    ∀ site i, k: committed_i ∩ aborted_k = ∅
  Safety invariant: strong, global!

Maintaining disjointness
  ∀ site i, k: committed_i ∩ aborted_k = ∅
  Different possibilities
    Unilateral abort: TWR, Holliday 2000
    Unilateral commit
    Deterministic abort / commit rule: TWR
    Primary (only one) site decides: Bayou, CVS
    Consensus before deciding: Deno, Holliday

Maintaining soundness
  ∀ site i, schedule s_i: s_i sound ∧ committed_i ⊆ s_i
  When aborting a, also abort actions that MustHave a
  When committing a, also abort uncommitted actions that are ‘Order’ed before a
  Maintain both soundness and disjointness.
  Peer-to-peer commitment is hard!
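
The two cascade rules on this slide can be sketched for a single site as follows. This covers only those two rules; it does not address agreement between sites, and it does not check that what a committed action MustHave is itself committed. Function names and the pair representation of constraints are hypothetical.

```python
from typing import Set, Tuple

Constraints = Set[Tuple[str, str]]  # (a, b): "a MustHave b" or "a Order b"


def abort(a: str, must_have: Constraints,
          committed: Set[str], aborted: Set[str]) -> None:
    """Abort a and, transitively, every action that MustHave it."""
    pending = [a]
    while pending:
        x = pending.pop()
        if x in aborted:
            continue
        if x in committed:
            raise ValueError(f"cannot abort committed action {x}")
        aborted.add(x)
        # Every y with "y MustHave x" can no longer be in a sound schedule.
        pending.extend(y for (y, z) in must_have if z == x)


def commit(a: str, must_have: Constraints, order: Constraints,
           committed: Set[str], aborted: Set[str]) -> None:
    """Commit a and abort uncommitted actions that are Order'ed before it."""
    if a in aborted:
        raise ValueError(f"cannot commit aborted action {a}")
    committed.add(a)
    for (x, y) in order:
        if y == a and x not in committed:
            abort(x, must_have, committed, aborted)


must_have = {("b", "a")}   # b MustHave a
order = {("c", "d")}       # c before d
committed: Set[str] = set()
aborted: Set[str] = set()
abort("a", must_have, committed, aborted)          # also aborts b
commit("d", must_have, order, committed, aborted)  # also aborts c
print(sorted(committed), sorted(aborted))
```

Within this one site the committed and aborted sets stay disjoint by construction; keeping `committed_i` and `aborted_k` disjoint across different sites is the part that needs a deterministic rule, a primary, or consensus.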

Stability with TWR
  Independent objects
  Independent writes (no MustHave nor Order)
  All sites take same decision:
    Given two writes to same object, abort the earlier
    Whether concurrent or not
  Write stable when seen by all sites
  Disjointness: committed_i = ∅
  Soundness: no MustHave (no transactions)

Stability in Bayou
  Databases:
    Disjoint
    Independent: no multi-DB transaction
    1 primary / database
  Log constraints: transactions, time order
  Disjointness: Only 1 site decides about a: the primary for the database that a updates
  Soundness: whole transaction commits or aborts

Holliday’s pre-commit protocol
  Log constraints:
    multi-object transactions
    happens-before order
  Read transactions commit locally
  Read-Write transactions: consensus to commit
    convert locks to intentions
    pre-commit, vote
    commit if quorum ‘yes’
    abort if anti-quorum ‘no’ or conflict with committed

Trade-offs
  Deterministic rule
    fast, inflexible
  Partition + primary
    single point of failure
    no MustHave across partition boundaries
  Consensus
    slow
    scalability
    impossibility of consensus in asynchronous systems with failure

5. Conclusions

Need for OR (optimistic replication) not going away
  “Network technology improving: keep everything consistent pessimistically.” True, but:
    Constant latency; unavailable bandwidth
    Mobile access ⇒ unbounded latency
    Increasing numbers of replicas
  “Conflicts are rare.” True, but:
    Do occur
    Very high cost

OR pros & cons
  Peer-to-peer read/write sharing
  OR accepts more updates:
    Performance despite latency
    Availability despite failures
  Increased complexity
    Semantic information
    Not transparent
  Bottleneck moved to commit
    Hard to make peer-to-peer
    Unless (unacceptable?) restrictions
    Unavoidable

The end