Telex/IceCube: a distributed memory with tunable consistency. Marc Shapiro, Pierre Sutra, Pierpaolo Cincilla (INRIA & LIP6, Regal group); Nuno Preguiça


Telex/IceCube: a distributed memory with tunable consistency
Marc Shapiro, Pierre Sutra, Pierpaolo Cincilla (INRIA & LIP6, Regal group)
Nuno Preguiça (Universidade Nova de Lisboa)

1. Telex/IceCube (Distributed TMs, 22-Feb-2012)
It is:
  A distributed shared memory
  Object-based (high-level operations)
  Transactional (ACID… for some definition of ID)
  Persistent
Top-level design goal: availability
  ∞-optimistic ⇒ (reconcile ⇒ cascading aborts)
  Minimize aborts ⇒ non-sequential schedules
  (Only) consistent enough to enforce invariants
  Partial replication

2. Scheduling (∞-optimistic execution)
Sequential execution
  In (semantic) dependence order
  Dynamic checks for preconditions: if OK, application approves schedule
Conflict ⇒ fork
  Latest checkpoint + replay
  Independent actions not rolled back
Semantics:
  Conflict = non-commuting, antagonistic
  Independent

3. Telex lifecycle: Application operations
[Diagram labels: Shared Calendar Application, Telex, appt, +action(appt)]
User requests
  > application: actions, dependence
  > Telex: add to ACG, transmit

4. Telex lifecycle: Schedule
[Diagram labels: Shared Calendar Application, Telex, sched, approve]
ACG
  > Telex: valid path = sound schedule
  > application: tentative execute
  > application: if OK, approve

5. Telex lifecycle: Conflict ⇒ multiple schedules
[Diagram labels: Shared Calendar Application, Telex, !!!, sched1, sched2, approve2]
ACG
  > Telex: valid path = sound schedule
  > application: tentative execute
  > application: if OK, approve

6. Constraints (semantics-based conflict detection)
Action: reified operation
Constraint: scheduling invariant
Binary relations:
  NotAfter
  Enables (implication)
  NonCommuting
Combinations:
  Antagonistic
  Atomic
  Causal dependence
Action-constraint graph (ACG)
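The constraint vocabulary above can be sketched as a small data structure. The slide's relation symbols did not survive extraction, so the readings used here are assumptions: `not_after(a, b)` is read as "a never runs after b", `enables(a, b)` as "b may run only if a is also scheduled", and the three combinations are built from those two. The class `ACG` and its method names are hypothetical and are not the Telex API.

```python
from dataclasses import dataclass, field


@dataclass
class ACG:
    """Hypothetical action-constraint graph; a sketch, not the Telex API."""
    not_after: set = field(default_factory=set)      # (a, b): a never runs after b (assumed reading)
    enables: set = field(default_factory=set)        # (a, b): b may run only if a is scheduled (assumed reading)
    non_commuting: set = field(default_factory=set)  # (a, b): either order is allowed, results may differ

    # Combinations, built under the assumed readings above.
    def antagonistic(self, a, b):
        # NotAfter in both directions: no order works, so a and b never share a schedule.
        self.not_after |= {(a, b), (b, a)}

    def atomic(self, a, b):
        # Enables in both directions: both actions or neither.
        self.enables |= {(a, b), (b, a)}

    def causal(self, a, b):
        # b depends on a: a comes first, and b needs a.
        self.not_after.add((a, b))
        self.enables.add((a, b))

    def is_sound(self, schedule):
        """True if the schedule (a list of actions) satisfies every constraint."""
        pos = {action: i for i, action in enumerate(schedule)}
        if len(pos) != len(schedule):  # an action appears at most once
            return False
        ordered = all(pos[a] < pos[b]
                      for a, b in self.not_after if a in pos and b in pos)
        enabled = all(a in pos for a, b in self.enables if b in pos)
        return ordered and enabled


g = ACG()
g.causal("create_meeting", "book_room")
g.antagonistic("book_room", "cancel_meeting")
print(g.is_sound(["create_meeting", "book_room"]))
print(g.is_sound(["create_meeting", "book_room", "cancel_meeting"]))
```

Under these readings the first schedule is sound and the second is not, because the two antagonistic actions appear together.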

7. Single-site scheduling
Sound schedule: path in the ACG that satisfies constraints
Fork:
  Antagonism
  NonCommuting + dynamic checks
  Several possible schedules
Penalise lost work
Optimal schedule: NP-hard
IceCube heuristics

8. Minimizing aborts (IceCube heuristics)
Iteratively pick a sound schedule
Application executes, checks, approves
  Check invariants
  Reject if violation
    Request alternative schedule
    Restart from previous
Approved schedule
  Preferred for future
  Proposed to agreement
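The propose/check/approve loop can be illustrated with a deliberately naive search. The slides do not describe the IceCube heuristics themselves, so this sketch substitutes exhaustive enumeration, which is only viable for a handful of actions. The function names and the bank-account example are hypothetical; `is_sound` and `approve` stand for the static constraint check and the application's run-time check.

```python
from itertools import combinations, permutations


def candidate_schedules(actions, is_sound):
    """Yield sound schedules, largest first, so that less work is lost."""
    for size in range(len(actions), -1, -1):
        for subset in combinations(actions, size):
            for order in permutations(subset):
                if is_sound(order):
                    yield list(order)


def first_approved(actions, is_sound, approve):
    """Propose schedules until the application approves one."""
    for schedule in candidate_schedules(actions, is_sound):
        if approve(schedule):  # application executes tentatively and checks invariants
            return schedule
    return []


# Hypothetical application: an account that starts at 20 and must never go negative.
DELTA = {"dep50": 50, "wd80": -80, "wd30": -30}


def never_negative(schedule, start=20):
    balance = start
    for action in schedule:
        balance += DELTA[action]
        if balance < 0:
            return False  # invariant violated: reject, ask for an alternative
    return True


best = first_approved(list(DELTA), lambda s: True, never_negative)
print(best)
```

Here no ordering of all three actions keeps the balance non-negative, so the search falls back to two actions and drops `wd80`.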

9. Logging and transmission = multilog
[Diagram labels: /users/shapiro/mydoc; Marc, Pierre, Ruby]
Avoid write contention: per-doc, per-writer logs
  Append to my document log
  Transmit sequentially
  Read all document logs
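A minimal in-memory sketch of the per-document, per-writer layout follows. It assumes nothing about Telex's on-disk format or transmission; the class `Multilog` and its methods are hypothetical. The point it illustrates is that each writer appends only to its own log, so writers never contend, while a reader collects every writer's log for the document.

```python
from collections import defaultdict


class Multilog:
    """Hypothetical sketch: one append-only log per (document, writer) pair."""

    def __init__(self):
        self._logs = defaultdict(list)  # (doc, writer) -> list of records

    def append(self, doc, writer, record):
        # A writer only ever touches its own log for the document.
        self._logs[(doc, writer)].append(record)

    def read_all(self, doc):
        # A reader sees every writer's log for the document, each in append order.
        return {writer: list(log)
                for (d, writer), log in self._logs.items() if d == doc}


logs = Multilog()
logs.append("/users/shapiro/mydoc", "Marc", "add appt")
logs.append("/users/shapiro/mydoc", "Pierre", "move appt")
logs.append("/users/shapiro/mydoc", "Marc", "cancel appt")
print(logs.read_all("/users/shapiro/mydoc"))
```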

10. Telex lifecycle: Receive remote operations
[Diagram labels: Shared Calendar Application, Telex, ACG, getCstrt, +cstrt(antg)]
User requests
  > application: actions, dependence
  > Telex: add to ACG, transmit, merge

11. Eventual consistency
Common stable prefix
Diverge beyond
Combine approved schedules ⇒ consensus on next extension of prefix
Equivalence, not equality
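The idea of a common stable prefix can be sketched as the longest run of actions on which all sites' schedules agree. This is a simplification: the slide asks for equivalence of schedules, whereas the hypothetical `common_prefix` below compares actions by plain equality, position by position.

```python
def common_prefix(schedules):
    """Longest prefix shared by all schedules (compared by equality, a simplification)."""
    prefix = []
    for step in zip(*schedules):  # stops at the shortest schedule
        if any(action != step[0] for action in step):
            break                 # sites diverge from here on
        prefix.append(step[0])
    return prefix


site_a = ["a1", "a2", "b1", "a3"]
site_b = ["a1", "a2", "c1"]
site_c = ["a1", "a2", "b1"]
print(common_prefix([site_a, site_b, site_c]))
```

Everything after the shared prefix is the divergent part that the sites still have to agree on.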

12. Telex lifecycle: Convergence protocol
[Diagram labels: Shared Calendar Application, Telex, agreement]
approved schedules
  > Telex: exchange, agree
  > commit/abort, serialise

13. Fast Gen. Genuine Consensus
[Plot: throughput on LAN (1024 regs.) [Sutra 2010]; series: Generalised Paxos, Fast Paxos, Paxos, FGGC]

14. FGGC
Special-case commutative commands ⇒ collision recovery
Improvements higher on WAN.
[Plot: WAN typical; series: Generalised Paxos, Fast Paxos, Paxos, FGGC]

15. FGGC: Varying commutativity
Each command reads or writes a randomly-chosen register; WAN
[Plot series: Paxos, FGGC 1 register, FGGC registers]
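For the register workload described here, commutativity has a simple form: two commands commute unless they touch the same register and at least one of them writes. The sketch below encodes that rule only; it is not the FGGC protocol, and `Command`, `commute` and `conflicting_pairs` are hypothetical names. With a single register every write collides with everything else, while spreading commands over many registers makes collisions rare.

```python
from typing import NamedTuple


class Command(NamedTuple):
    op: str        # "read" or "write"
    register: int  # index of the register the command touches


def commute(c1, c2):
    """Commands commute unless they share a register and at least one writes."""
    if c1.register != c2.register:
        return True
    return c1.op == "read" and c2.op == "read"


def conflicting_pairs(commands):
    """Index pairs of commands whose relative order must be agreed on."""
    return [(i, j)
            for i in range(len(commands))
            for j in range(i + 1, len(commands))
            if not commute(commands[i], commands[j])]


batch = [Command("write", 0), Command("read", 0),
         Command("read", 1), Command("read", 1), Command("write", 2)]
print(conflicting_pairs(batch))
```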

16. Concurrency control: application safety assertions
Dynamic test by application, e.g.:
  bank account > 0
  file system is a proper tree
  "(…(…(…)…(…))…)"
Application approves/disapproves schedule at run time
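Two of the example assertions can be written as ordinary predicates that an application would run after tentatively executing a schedule. The representations are assumptions made for illustration: the file system is a child-to-parent map whose root has parent `None`, and the bracketed document is a plain string. Both function names are hypothetical.

```python
def is_proper_tree(parent):
    """parent maps each node to its parent; the single root maps to None."""
    roots = [node for node, p in parent.items() if p is None]
    if len(roots) != 1:
        return False
    for node in parent:
        seen = set()
        while node is not None:
            if node in seen or node not in parent:
                return False  # cycle, or a parent that does not exist
            seen.add(node)
            node = parent[node]
    return True


def is_balanced(text):
    """Every ')' closes an earlier '(' and nothing stays open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


fs = {"/": None, "/home": "/", "/home/marc": "/home"}
print(is_proper_tree(fs), is_balanced("(a(b(c)d(e))f)"))
```

A schedule whose replay leaves either predicate false would be disapproved, and the application would ask for an alternative.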

17. Example: Sakura
Shared calendar over Telex
Private calendar + common meetings
Example proposals:
  M1: Marc & Lamia & JeanMi, Monday | Tuesday | Friday
  M2: Lamia & Marc & Pierre, Tuesday | Wed. | Friday
  M3: JeanMi & Pierre & Marc, Monday | Tues. | Thurs.
Change M3 to Friday
Telex:
  (Local) Explore solutions, approve
  (Global) Combine, commit
Run-time check
[Diagram labels: Marc, Lamia; Tues., Fri.; M1, M2]
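The local "explore solutions, approve" step for these three proposals can be sketched as a brute-force search. The run-time check assumed here is that nobody attends two meetings on the same day; the slide does not spell the check out, so that rule, the data layout and the function names are all assumptions. "Change M3 to Friday" is read as restricting M3 to Friday, which is one possible interpretation.

```python
from itertools import product

# Participants and candidate days, taken from the example proposals.
MEETINGS = {
    "M1": ({"Marc", "Lamia", "JeanMi"}, ["Mon", "Tue", "Fri"]),
    "M2": ({"Lamia", "Marc", "Pierre"}, ["Tue", "Wed", "Fri"]),
    "M3": ({"JeanMi", "Pierre", "Marc"}, ["Mon", "Tue", "Thu"]),
}


def approves(assignment, meetings):
    """Assumed run-time check: nobody attends two meetings on the same day."""
    busy = set()
    for name, day in assignment.items():
        for person in meetings[name][0]:
            if (person, day) in busy:
                return False
            busy.add((person, day))
    return True


def solutions(meetings):
    """Try every combination of days and keep those the check approves."""
    names = list(meetings)
    for days in product(*(meetings[n][1] for n in names)):
        assignment = dict(zip(names, days))
        if approves(assignment, meetings):
            yield assignment


for assignment in solutions(MEETINGS):
    print(assignment)

# One reading of "Change M3 to Friday": M3 may only take place on Friday.
changed = dict(MEETINGS, M3=(MEETINGS["M3"][0], ["Fri"]))
print(list(solutions(changed)))
```

Because Marc attends all three meetings, every approved assignment puts them on three different days, and forcing M3 onto Friday narrows the remaining choices for M1 and M2.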

18. Example ACG: calendar application
Schedule: sound cut in the graph

19. Status
Open source
  35,000 lines of (well-commented) Java code
  Available at gforge.inria.fr, BSD license
Applications:
  Sakura co-operative calendar (INRIA)
  Decentralised Collaborative Environment (UPC)
  Database, STMBench7 (UOC)
  Co-operative text editor (UNL, INRIA, UOC)
  Co-operative ontology editor (U. Aegean)
  Co-operative UML editor (LIP6)

20. Lessons learned
Separation of concerns:
  Application: business logic
    Semantics: constraints, run-time checks
  System: persistence, replication, consistency
    Optimistic: availability, WAN operation
    Adapts to application requirements
    Constraints + run-time machine learning
Modular ⇒ efficiency is an issue
High latency, availability ⇒ reduce messages
Commutative operations ⇒ CRDTs
Partial replication is hard!
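The remark that commutative operations lead to CRDTs can be illustrated with a grow-only counter, a standard textbook CRDT. It is not part of Telex as presented in these slides, and the class below is a generic sketch. Each replica increments only its own entry, and merging takes the per-replica maximum, so merges can be applied in any order, any number of times, without any agreement step.

```python
class GCounter:
    """Grow-only counter: a generic CRDT sketch, not Telex code."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Per-replica maximum: commutative, associative and idempotent.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```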


23. Application design
MVC-like execution loop + rollback
Designing for replication is an effort:
  Commute (no constraints) >> Antagonistic >> Non-Commuting
  Weaken invariants
  Make invariants explicit

24. IceCube heuristic superior

25. Multilog avoids write contention

26. Decomposing ACG improves performance

27. Designing applications for replication
Replication is beneficial in many information-sharing scenarios
Understand your application needs
Design for replication:
  Aim for commutativity
  Weaken invariants
  Make invariants explicit
Telex: platform for decentralised, mutable, persistent shared data

28. P2P collaborative work
Shared documents
  Multiple masters
  Different locations, times
Reconciliation:
  Isolated operation, diverge
  Detect & resolve conflicts
  Converge
Conflict = invariant violated
Telex: platform to ease application design & implementation
[Diagram labels: Lamia, Marc, JeanMi, shared document]

29. Goals
Separation of concerns
Support multiple interaction styles
  On-line WYSIWIS = pessimistic
  vs. isolation, disconnected = optimistic
    Operate on local replica, replay remote ops
    Detect conflicts, choose, roll back
Multi-object operations
Safe: enforce application invariants
Converge, commit
Synchronise: only as required by application invariants (no more, no less)

30. Competition
Co-operative text editing:
  All operations assumed to commute (no invariants)
  Single object
Total order: database, atomic broadcast
  Safe, slow, doesn't scale
Peer-to-peer: file sharing, database replication: scalable
  Single object
  Last Writer Wins (LWW): unsafe!
Telex: principled, general-purpose, multi-object, adapts to application requirements, P2P
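Why Last Writer Wins is called unsafe can be seen in a two-replica example. The scenario is hypothetical: both replicas start from a balance of 100, one applies a withdrawal of 30 and the other a deposit of 50, and each stores the resulting balance tagged with a timestamp. Merging keeps only the later write.

```python
def lww_merge(a, b):
    """Each value is (timestamp, balance); keep the write with the later timestamp."""
    return max(a, b)


start = 100
replica_a = (1, start - 30)  # withdrawal of 30, written at time 1
replica_b = (2, start + 50)  # deposit of 50, written at time 2

merged = lww_merge(replica_a, replica_b)
intended = start - 30 + 50   # what applying both operations would give

print(merged[1], intended)
```

The merged balance reflects only the deposit; the withdrawal is silently lost, so the merged state is not what applying both operations would produce, and no application invariant was ever consulted.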