1  Telex/IceCube: a distributed memory with tunable consistency
   Marc Shapiro, Pierre Sutra, Pierpaolo Cincilla (INRIA & LIP6, Regal group)
   Nuno Preguiça (Universidade Nova de Lisboa)
   Distributed TMs, 22-Feb-2012

2  Telex/IceCube
   It is:
   - A distributed shared memory
   - Object-based (high-level operations)
   - Transactional (ACID… for some definition of ID)
   - Persistent
   Top-level design goal: availability
   - ∞-optimistic ⇒ (reconcile ⇒ cascading aborts)
   - Minimize aborts ⇒ non-sequential schedules
   - (Only) consistent enough to enforce invariants
   - Partial replication

3  Scheduling (∞-optimistic execution)
   Sequential execution:
   - In (semantic) dependence order
   - Dynamic checks of preconditions: if they pass, the application approves the schedule
   Conflict ⇒ fork:
   - Restart from the latest checkpoint and replay
   - Independent actions are not rolled back
   Semantics: a conflict means the actions are non-commuting or antagonistic; otherwise they are independent
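
   A minimal sketch of the checkpoint-and-replay step, assuming a generic Action interface and a trivially copyable application state; the class names (AppState, Replayer) are illustrative, not the Telex code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical application state, copyable so we can fork from a checkpoint. */
class AppState {
    final Map<String, String> entries = new HashMap<>();
    AppState copy() { AppState c = new AppState(); c.entries.putAll(entries); return c; }
}

/** An action is a reified, replayable operation on the state. */
interface Action { void execute(AppState s); }

class Replayer {
    private final AppState checkpoint;

    Replayer(AppState checkpointed) { this.checkpoint = checkpointed.copy(); }

    /** Fork on conflict: restart from the latest checkpoint and replay the chosen
     *  schedule. Actions independent of the conflict are simply re-applied as part
     *  of the replay; they are never rolled back individually. */
    AppState replay(List<Action> chosenSchedule) {
        AppState s = checkpoint.copy();
        for (Action a : chosenSchedule) a.execute(s);
        return s;
    }
}
```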

4  Telex lifecycle: application operations (shared-calendar application)
   A user request (e.g. a new appointment, +action(appt)) flows as follows:
   > application: produces actions and their dependence constraints
   > Telex: adds them to the ACG and transmits them
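
   A hedged sketch of this first lifecycle step, showing what the calendar might hand over: an action plus the dependence it declares. TelexDocument, addAction and addCausalDependence are assumed names for illustration, not the published Telex API.

```java
/** Assumed interface to a shared Telex document; not the real API. */
interface TelexDocument {
    /** Register a new action; Telex adds it to the ACG and transmits it. */
    void addAction(String actionId, byte[] payload);
    /** Declare that 'later' causally depends on 'earlier'. */
    void addCausalDependence(String earlierActionId, String laterActionId);
}

class CalendarFrontEnd {
    private final TelexDocument doc;

    CalendarFrontEnd(TelexDocument doc) { this.doc = doc; }

    /** The user modifies an existing appointment: submit the change as a new
     *  action that causally depends on the action that created the appointment. */
    void amendAppointment(String createActionId, String amendActionId, byte[] payload) {
        doc.addAction(amendActionId, payload);
        doc.addCausalDependence(createActionId, amendActionId);
    }
}
```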

5  Telex lifecycle: scheduling
   > Telex: a valid path through the ACG is a sound schedule
   > application: tentatively executes the schedule
   > application: if the checks pass, approves it

6  Telex lifecycle: conflict ⇒ multiple schedules
   > Telex: proposes several sound schedules (sched1, sched2)
   > application: tentatively executes them
   > application: approves the one whose checks pass (here, sched2)

7  Constraints (semantics-based conflict detection)
   Action: a reified operation
   Constraint: a scheduling invariant
   Primitive binary relations: NotAfter, Enables (implication), NonCommuting
   Combinations: Antagonistic, Atomic, Causal dependence
   Actions and constraints together form the action-constraint graph (ACG)
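
   The sketch below spells out one plausible reading of these relations and how the combinations could be built from the primitives; the class, the helper names and the argument order are assumptions, not Telex's actual constraint API.

```java
import java.util.List;

/** Primitive scheduling constraints (assumed encoding, for illustration). */
enum Primitive { NOT_AFTER, ENABLES, NON_COMMUTING }

record Constraint(Primitive kind, String a, String b) {

    /** a must never be scheduled after b. */
    static Constraint notAfter(String a, String b) { return new Constraint(Primitive.NOT_AFTER, a, b); }

    /** Implication: b may appear in a schedule only if a does. */
    static Constraint enables(String a, String b) { return new Constraint(Primitive.ENABLES, a, b); }

    /** All replicas must agree on the relative order of a and b. */
    static Constraint nonCommuting(String a, String b) { return new Constraint(Primitive.NON_COMMUTING, a, b); }

    /** Antagonistic: a and b can never both be in the same schedule. */
    static List<Constraint> antagonistic(String a, String b) {
        return List.of(notAfter(a, b), notAfter(b, a));
    }

    /** Atomic: either both a and b are scheduled, or neither is. */
    static List<Constraint> atomic(String a, String b) {
        return List.of(enables(a, b), enables(b, a));
    }

    /** Causal dependence: b runs only after a, and only if a runs. */
    static List<Constraint> causallyDependent(String a, String b) {
        return List.of(notAfter(a, b), enables(a, b));
    }
}
```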

8  Single-site scheduling
   Sound schedule: a path in the ACG that satisfies the constraints
   Forks arise from antagonism, NonCommuting relations and dynamic checks ⇒ several possible schedules
   Penalise lost work; finding the optimal schedule is NP-hard ⇒ IceCube heuristics
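
   A hedged sketch of the soundness test over one candidate schedule, reusing the hypothetical Constraint types from the previous sketch; the real IceCube scheduler additionally explores alternative schedules and minimises lost work, which is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Checks one candidate schedule against the (assumed) constraint encoding above. */
class SoundnessChecker {

    /** schedule: action ids in proposed execution order, i.e. a path in the ACG. */
    static boolean isSound(List<String> schedule, List<Constraint> constraints) {
        Map<String, Integer> pos = new HashMap<>();
        for (int i = 0; i < schedule.size(); i++) pos.put(schedule.get(i), i);

        for (Constraint c : constraints) {
            Integer pa = pos.get(c.a()), pb = pos.get(c.b());
            switch (c.kind()) {
                case NOT_AFTER -> {
                    // a must not come after b when both are scheduled
                    if (pa != null && pb != null && pa > pb) return false;
                }
                case ENABLES -> {
                    // b may be scheduled only if a is
                    if (pb != null && pa == null) return false;
                }
                case NON_COMMUTING -> {
                    // any local order is sound; agreement later fixes one common order
                }
            }
        }
        return true;
    }
}
```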

9  Minimizing aborts (IceCube heuristics)
   Iteratively pick a sound schedule:
   - The application executes it and checks its invariants
   - On a violation, reject it, request an alternative schedule, and restart from the previous state
   An approved schedule is preferred for future scheduling and proposed to agreement
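
   A sketch of this pick/execute/approve loop under assumed interfaces (Scheduler, Application, Schedule); the actual IceCube heuristics for choosing the next candidate are not reproduced here.

```java
import java.util.Optional;

interface Schedule { }

/** Assumed scheduler interface wrapping the IceCube heuristics. */
interface Scheduler {
    /** Heuristically pick the next sound schedule, preferring previously approved prefixes. */
    Optional<Schedule> nextCandidate();
    /** Report that a candidate violated an application invariant. */
    void reject(Schedule s);
}

/** Assumed application-side hook. */
interface Application {
    /** Tentatively execute the schedule and run the invariant checks. */
    boolean executeAndCheck(Schedule s);
}

class ApprovalLoop {
    /** Returns the first approved schedule, which is then preferred locally and
     *  proposed to the agreement protocol; empty if every candidate fails. */
    static Optional<Schedule> run(Scheduler scheduler, Application app) {
        Optional<Schedule> candidate;
        while ((candidate = scheduler.nextCandidate()).isPresent()) {
            Schedule s = candidate.get();
            if (app.executeAndCheck(s)) return Optional.of(s);
            scheduler.reject(s);   // violation: restart from the previous state, try another
        }
        return Optional.empty();
    }
}
```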

10 Logging and transmission = multilog
   Example: document /users/shapiro/mydoc with writers Marc, Pierre and Ruby
   - Avoid write contention: one log per document, per writer
   - Each writer appends to its own document log and transmits it sequentially
   - Reading the document means reading all of its logs
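
   A minimal sketch of the per-document, per-writer log layout, assuming one plain-text log file per writer under the document's directory; the file layout and encoding are illustrative, not the Telex on-disk format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class Multilog {
    private final Path docDir;     // e.g. the directory backing /users/shapiro/mydoc
    private final String writer;   // e.g. "marc"

    Multilog(Path docDir, String writer) { this.docDir = docDir; this.writer = writer; }

    /** Append an encoded action to this writer's own log only: no write contention. */
    void append(String encodedAction) throws IOException {
        Files.createDirectories(docDir);
        Files.writeString(docDir.resolve(writer + ".log"),
                encodedAction + System.lineSeparator(),
                StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Reading the document means reading every writer's log. */
    List<String> readAll() throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(docDir, "*.log")) {
            for (Path log : logs) lines.addAll(Files.readAllLines(log, StandardCharsets.UTF_8));
        }
        return lines;
    }
}
```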

11 Telex lifecycle: receiving remote operations
   User requests
   > application: actions, dependence
   > Telex: add to the ACG, transmit, and merge remote operations
   Telex may call back into the application (getCstrt) for constraints induced by
   remote actions, e.g. an antagonism (+cstrt(antg))

12 Eventual consistency
   - Sites share a common stable prefix and may diverge beyond it
   - Approved schedules are combined ⇒ consensus decides the next extension of the prefix
   - Convergence is to equivalent schedules, not equal ones

13 Telex lifecycle: convergence protocol
   > Telex sites: exchange approved schedules and agree
   > commit/abort the decided actions and serialise them

14 Fast Genuine Generalised Consensus (FGGC): throughput on a LAN, 1024 registers [Sutra 2010]
   (Chart comparing Paxos, Fast Paxos, Generalised Paxos and FGGC; FGGC comes out ahead.)

15 FGGC: special-casing commutative commands ⇒ collision recovery
   Improvements are higher on a WAN.
   (Chart comparing Paxos, Fast Paxos, Generalised Paxos and FGGC under typical WAN conditions; FGGC comes out ahead.)

16 FGGC: varying commutativity
   Each command reads or writes a randomly chosen register; WAN setting.
   (Chart comparing Paxos with FGGC for 1 to 16384 registers; with more registers, more commands commute.)

17 Concurrency control: application safety assertions
   Dynamic tests performed by the application, e.g.:
   - bank account > 0
   - the file system is a proper tree, e.g. "(…(…(…)…(…))…)"
   The application approves or disapproves each schedule at run time.
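
   A small illustration of such a run-time assertion for the bank-account example on the slide; the Account class and the approve helper are hypothetical, the point being that the application, not Telex, decides whether a tentative schedule is acceptable.

```java
import java.util.List;

/** Hypothetical account state used only for this example. */
class Account {
    private long balance;
    Account(long initial) { this.balance = initial; }
    void apply(long delta) { balance += delta; }
    long balance() { return balance; }
}

class BankInvariant {
    /** Tentatively apply a schedule of debits/credits to a private copy and
     *  approve it only if the slide's invariant ("bank account > 0") holds
     *  after every step; otherwise the schedule is disapproved. */
    static boolean approve(long initialBalance, List<Long> schedule) {
        Account tentative = new Account(initialBalance);
        for (long delta : schedule) {
            tentative.apply(delta);
            if (tentative.balance() <= 0) return false;
        }
        return true;
    }
}
```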

18 Example: Sakura, a shared calendar over Telex
   Private calendars + common meetings. Example proposals:
   - M1: Marc & Lamia & JeanMi, Monday | Tuesday | Friday
   - M2: Lamia & Marc & Pierre, Tuesday | Wednesday | Friday
   - M3: JeanMi & Pierre & Marc, Monday | Tuesday | Thursday
   Change M3 to Friday.
   Telex: (local) explore solutions and approve; (global) combine and commit.
   Run-time check: M1 and M2 share Marc and Lamia, so they cannot both take Tuesday or Friday.
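
   A hedged sketch of how the calendar might derive such constraints, reusing the hypothetical Constraint helpers from the constraints slide: two placements that book a shared participant on the same day, or that put the same meeting on two different days, are declared antagonistic. Placement and SakuraConstraints are illustrative names, not part of Sakura.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** A candidate placement: a meeting on a given day with its participants. */
record Placement(String meeting, String day, Set<String> participants) {}

class SakuraConstraints {
    static List<Constraint> antagonisms(List<Placement> placements) {
        List<Constraint> out = new ArrayList<>();
        for (int i = 0; i < placements.size(); i++) {
            for (int j = i + 1; j < placements.size(); j++) {
                Placement p = placements.get(i), q = placements.get(j);
                boolean sharePerson = p.participants().stream().anyMatch(q.participants()::contains);
                boolean sameDay = p.day().equals(q.day());
                boolean sameMeeting = p.meeting().equals(q.meeting());
                // e.g. M1@Tuesday and M2@Tuesday clash because Marc and Lamia are in both
                if ((sharePerson && sameDay) || (sameMeeting && !sameDay)) {
                    out.addAll(Constraint.antagonistic(
                            p.meeting() + "@" + p.day(), q.meeting() + "@" + q.day()));
                }
            }
        }
        return out;
    }
}
```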

19 Example ACG for the calendar application
   A schedule is a sound cut in the graph.

20 Status
   Open source: 35,000 lines of well-commented Java
   Available at gforge.inria.fr under a BSD licence
   Applications:
   - Sakura co-operative calendar (INRIA)
   - Decentralised Collaborative Environment (UPC)
   - Database, STMBench7 (UOC)
   - Co-operative text editor (UNL, INRIA, UOC)
   - Co-operative ontology editor (U. Aegean)
   - Co-operative UML editor (LIP6)

21 Lessons learned
   Separation of concerns:
   - Application: business logic; semantics as constraints and run-time checks
   - System: persistence, replication, consistency; optimistic, for availability and WAN operation
   - Adapts to application requirements: constraints + run-time machine learning
   Modular design ⇒ efficiency is an issue
   High latency and availability ⇒ reduce messages
   Commutative operations ⇒ CRDTs
   Partial replication is hard!


24 Application design
   MVC-like execution loop + rollback
   Designing for replication is an effort:
   - Prefer commuting operations (no constraints) >> antagonistic >> non-commuting
   - Weaken invariants
   - Make invariants explicit

25 IceCube heuristic is superior (chart)

26 Multilog avoids write contention (chart)

27 Decomposing the ACG improves performance (chart)

28 Designing applications for replication
   Replication is beneficial in many information-sharing scenarios; understand your application's needs.
   Design for replication:
   - Aim for commutativity
   - Weaken invariants
   - Make invariants explicit
   Telex: a platform for decentralised, mutable, persistent shared data

29 P2P collaborative work
   Shared documents, multiple masters, different locations and times (e.g. Lamia, Marc and JeanMi sharing a document)
   Reconciliation:
   - Operate in isolation, diverge
   - Detect & resolve conflicts
   - Converge
   Conflict = an invariant is violated
   Telex: a platform that eases application design & implementation

30 Goals
   Separation of concerns
   Support multiple interaction styles:
   - On-line WYSIWIS = pessimistic, vs. isolated/disconnected = optimistic
   - Operate on the local replica, replay remote operations
   - Detect conflicts, choose, roll back
   Multi-object operations
   Safe: enforce application invariants; converge, commit
   Synchronise only as required by application invariants (no more, no less)

31 Competition
   Co-operative text editing: all operations assumed to commute (no invariants); single object
   Total order (database, atomic broadcast): safe, but slow and does not scale
   Peer-to-peer file sharing and database replication: scalable, but single-object and Last Writer Wins (LWW): unsafe!
   Telex: principled, general-purpose, multi-object, P2P, adapts to application requirements

