Telex/IceCube: a distributed memory with tunable consistency
Marc Shapiro, Pierre Sutra, Pierpaolo Cincilla (INRIA & LIP6, Regal group)
Nuno Preguiça (Universidade Nova de Lisboa)
Distributed TMs — 22-Feb-2012
1. Telex/IceCube
It is:
- A distributed shared memory
- Object-based (high-level operations)
- Transactional (ACID… for some definition of ID)
- Persistent
Top-level design goal: availability
- ∞-optimistic ⇒ (reconcile ⇒ cascading aborts)
- Minimize aborts ⇒ non-sequential schedules
- (Only) consistent enough to enforce invariants
- Partial replication
2. Scheduling (∞-optimistic execution)
- Sequential execution, in (semantic) dependence order
- Dynamic checks for preconditions: if OK, the application approves the schedule
- Conflict ⇒ fork: roll back to the latest checkpoint and replay (sketched below); independent actions are not rolled back
- Semantics: conflict = non-commuting or antagonistic; otherwise independent
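A minimal sketch of the fork-and-replay step described above; the class and method names are hypothetical, not the Telex API:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch, not the Telex API: on a failed precondition the
    // executor drops the offending action and replays the remaining,
    // independent actions from the latest checkpoint.
    final class OptimisticExecutor<S> {
        interface Action<T> {
            T apply(T state);                    // tentative, high-level operation
            boolean preconditionHolds(T state);  // dynamic check at execution time
        }

        private final S checkpoint;              // latest known-good state

        OptimisticExecutor(S checkpoint) { this.checkpoint = checkpoint; }

        S replay(List<Action<S>> schedule) {
            S state = checkpoint;
            for (int i = 0; i < schedule.size(); i++) {
                Action<S> a = schedule.get(i);
                if (!a.preconditionHolds(state)) {        // conflict detected
                    List<Action<S>> rest = new ArrayList<>(schedule);
                    rest.remove(i);                        // abort only this action
                    return replay(rest);                   // replay from the checkpoint
                }
                state = a.apply(state);
            }
            return state;
        }
    }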
3. Telex lifecycle: application operations
- User requests > application: actions, dependence
- > Telex: add to ACG, transmit
(Diagram: the Shared Calendar Application turns an appointment request into +action(appt) and hands it to Telex.)
4. Telex lifecycle: schedule
- > Telex: valid path in the ACG = sound schedule
- > application: tentative execute
- > application: if OK, approve
(Diagram: Telex derives a schedule from the ACG and passes it to the Shared Calendar Application for approval.)
5. Telex lifecycle: conflict ⇒ multiple schedules
- > Telex: valid path = sound schedule
- > application: tentative execute
- > application: if OK, approve
(Diagram: on a conflict, Telex proposes alternative schedules sched1 and sched2; the application approves one of them.)
6. Constraints (semantics-based conflict detection)
- Action: reified operation
- Constraint: scheduling invariant
- Binary relations: NotAfter, Enables (implication), NonCommuting
- Combinations: Antagonistic, Atomic, Causal dependence
- Action-constraint graph (ACG); a data-model sketch follows
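The constraint vocabulary can be pictured as a small data model; the class names and the exact reading of Enables are assumptions, not the real Telex classes:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative data model for the action-constraint graph.
    // Edge(a, b, NOT_AFTER) : a must not be scheduled after b.
    // Edge(a, b, ENABLES)   : b may execute only if a executes.
    final class Acg {
        enum Relation { NOT_AFTER, ENABLES, NON_COMMUTING }

        record Action(String id) {}                            // reified operation
        record Edge(Action from, Action to, Relation kind) {}  // scheduling invariant

        final List<Action> actions = new ArrayList<>();
        final List<Edge> edges = new ArrayList<>();

        // Higher-level constraints as combinations of the primitives:
        void antagonistic(Action a, Action b) {      // a and b can never both commit
            edges.add(new Edge(a, b, Relation.NOT_AFTER));
            edges.add(new Edge(b, a, Relation.NOT_AFTER));
        }

        void atomic(Action a, Action b) {             // both execute or neither does
            edges.add(new Edge(a, b, Relation.ENABLES));
            edges.add(new Edge(b, a, Relation.ENABLES));
        }

        void causalDependence(Action cause, Action effect) {  // effect needs cause, and comes after it
            edges.add(new Edge(cause, effect, Relation.ENABLES));
            edges.add(new Edge(cause, effect, Relation.NOT_AFTER));
        }
    }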
7. Single-site scheduling
- Sound schedule: a path in the ACG that satisfies the constraints (a naive check is sketched below)
- Fork on: antagonism; NonCommuting + dynamic checks
- Several possible schedules; penalise lost work
- Optimal schedule: NP-hard ⇒ IceCube heuristics
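Checking that one candidate schedule is sound is straightforward; the encoding below is illustrative, not Telex code:

    import java.util.List;
    import java.util.Set;

    // Naive soundness check for a candidate schedule (a list of action ids).
    final class ScheduleChecker {
        record NotAfter(String a, String b) {}   // a must not be scheduled after b
        record Enables(String a, String b) {}    // b may execute only if a executes

        static boolean isSound(List<String> schedule,
                               Set<NotAfter> notAfter,
                               Set<Enables> enables) {
            for (NotAfter c : notAfter) {
                int ia = schedule.indexOf(c.a()), ib = schedule.indexOf(c.b());
                if (ia >= 0 && ib >= 0 && ia > ib) return false;   // ordering violated
            }
            for (Enables c : enables) {
                if (schedule.contains(c.b()) && !schedule.contains(c.a()))
                    return false;                                   // implication violated
            }
            // NonCommuting pairs additionally require the application's dynamic checks.
            return true;
        }
    }

Validating a candidate is cheap; the hard part is searching for the sound schedule that loses the least work, which is where the IceCube heuristics come in.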
8. Minimizing aborts (IceCube heuristics)
- Iteratively pick a sound schedule (the loop is sketched below)
- The application executes it, checks the invariants, and approves
- On a violation: reject, request an alternative schedule, restart from the previous one
- An approved schedule is preferred for the future and proposed to agreement
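The outer loop, written with assumed interface names rather than the actual Telex/IceCube API:

    import java.util.List;
    import java.util.Optional;

    // Illustrative outer loop of the reconciler.
    final class Reconciler {
        interface Scheduler { Optional<List<String>> nextSoundSchedule(); }        // heuristic search
        interface Application { boolean executeAndCheck(List<String> schedule); }  // tentative run + invariants
        interface Agreement { void propose(List<String> approved); }               // consensus layer

        static void reconcile(Scheduler scheduler, Application app, Agreement agreement) {
            Optional<List<String>> candidate = scheduler.nextSoundSchedule();
            while (candidate.isPresent()) {
                List<String> schedule = candidate.get();
                if (app.executeAndCheck(schedule)) {       // invariants hold: approve
                    agreement.propose(schedule);            // preferred locally, proposed globally
                    return;
                }
                candidate = scheduler.nextSoundSchedule();  // rejected: ask for an alternative
            }
        }
    }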
9. Logging and transmission = multilog
- Avoid write contention: per-document, per-writer logs (sketch below)
- Append to my own document log; transmit sequentially
- Read all document logs
(Diagram: the document /users/shapiro/mydoc has one log each for Marc, Pierre and Ruby.)
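A minimal sketch of the multilog idea, assuming one append-only file per (document, writer) pair; the layout and class names are illustrative:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Stream;

    // Sketch of a multilog: one append-only log per (document, writer).
    final class Multilog {
        private final Path documentDir;   // e.g. the directory backing /users/shapiro/mydoc
        private final String writer;      // this site's writer name

        Multilog(Path documentDir, String writer) {
            this.documentDir = documentDir;
            this.writer = writer;
        }

        /** Append an operation to this writer's own log only: no write contention. */
        void append(String operation) throws IOException {
            Path log = documentDir.resolve(writer + ".log");
            Files.writeString(log, operation + System.lineSeparator(), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        /** Read the logs of every writer of the document. */
        Map<String, List<String>> readAll() throws IOException {
            Map<String, List<String>> logs = new HashMap<>();
            try (Stream<Path> files = Files.list(documentDir)) {
                for (Path p : files.filter(f -> f.toString().endsWith(".log")).toList()) {
                    logs.put(p.getFileName().toString(), Files.readAllLines(p, StandardCharsets.UTF_8));
                }
            }
            return logs;
        }
    }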
10. Telex lifecycle: receive remote operations
- User requests > application: actions, dependence
- > Telex: add to ACG, transmit, merge
(Diagram: Telex calls getCstrt on the Shared Calendar Application, which returns new constraints, e.g. +cstrt(antg), to add to the ACG.)
11. Eventual consistency
- Common stable prefix; schedules may diverge beyond it (toy model below)
- Combine approved schedules ⇒ consensus on the next extension of the prefix
- Equivalence, not equality
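One way to picture the rule: each site keeps the agreed prefix immutable and a tentative suffix that may differ across sites; only agreement extends the prefix. A toy model, not Telex code:

    import java.util.ArrayList;
    import java.util.List;

    // Toy model: a site's schedule is an agreed prefix plus a tentative suffix;
    // only the agreement protocol extends the prefix.
    final class SiteSchedule {
        private final List<String> stablePrefix = new ArrayList<>();     // identical at every site
        private final List<String> tentativeSuffix = new ArrayList<>();  // may diverge between sites

        void tentativelyAppend(String action) { tentativeSuffix.add(action); }

        /** The agreement decided the next extension; sites converge to
         *  equivalent (not necessarily identical) schedules. */
        void commitExtension(List<String> agreedExtension) {
            stablePrefix.addAll(agreedExtension);
            tentativeSuffix.removeAll(agreedExtension);   // keep only actions not yet decided
        }

        List<String> currentView() {
            List<String> view = new ArrayList<>(stablePrefix);
            view.addAll(tentativeSuffix);
            return view;
        }
    }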
12. Telex lifecycle: convergence protocol
- Approved schedules > Telex: exchange, agree
- > commit/abort, serialise
(Diagram: Telex sites run an agreement protocol over the schedules approved by the Shared Calendar Application.)
13. FGGC (Fast Genuine Generalised Consensus): throughput on a LAN, 1024 registers [Sutra 2010]
(Plot: throughput of Paxos, Fast Paxos, Generalised Paxos and FGGC.)
14. FGGC on a typical WAN
- Special-case commutative commands ⇒ collision recovery
- Improvements are higher on a WAN
(Plot: Paxos, Fast Paxos, Generalised Paxos and FGGC under typical WAN conditions.)
15. FGGC: varying commutativity
- Each command reads or writes a randomly-chosen register; WAN
(Plot: Paxos vs. FGGC, with the number of registers varied from 1 to 16384.)
16. Concurrency control: application safety assertions
- Dynamic test by the application, e.g.: bank account > 0; the file system is a proper tree; brackets nest properly, "(…(…(…)…(…))…)"
- The application approves or disapproves a schedule at run time (example check below)
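For instance, the bank-account assertion could be supplied as a run-time check along these lines (illustrative, not the Telex interface; here the invariant is that the balance never goes negative):

    import java.util.List;

    // Illustrative application-side safety assertion:
    // reject any schedule that drives the account balance negative.
    final class BankAccountChecker {
        record Op(String kind, long amount) {}    // "deposit" or "withdraw"

        /** Tentatively replay the schedule; approve only if the invariant holds throughout. */
        static boolean approve(long initialBalance, List<Op> schedule) {
            long balance = initialBalance;
            for (Op op : schedule) {
                balance += op.kind().equals("deposit") ? op.amount() : -op.amount();
                if (balance < 0) return false;    // invariant violated: disapprove this schedule
            }
            return true;                          // safe: approve
        }
    }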
17. Example: Sakura, a shared calendar over Telex
- Private calendar + common meetings
- Example proposals (one possible encoding is sketched below):
  - M1: Marc & Lamia & JeanMi, Monday | Tuesday | Friday
  - M2: Lamia & Marc & Pierre, Tuesday | Wednesday | Friday
  - M3: JeanMi & Pierre & Marc, Monday | Tuesday | Thursday
  - Change M3 to Friday
- Telex: (local) explore solutions, approve; (global) combine, commit
(Diagram: run-time check against Marc's and Lamia's calendars for the Tuesday/Friday placements of M1 and M2.)
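One plausible encoding of the proposals, assuming the alternative days of one meeting are pairwise antagonistic; double-booking a participant across meetings would be caught by the run-time check:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative encoding of the meeting proposals; names are hypothetical.
    final class CalendarProposals {
        record Placement(String meeting, String day) {}   // one candidate slot for a meeting
        record Antagonism(Placement a, Placement b) {}    // at most one of the two may commit

        /** Alternative days for the same meeting are pairwise antagonistic,
         *  e.g. alternatives("M1", "Monday", "Tuesday", "Friday"). */
        static List<Antagonism> alternatives(String meeting, String... days) {
            List<Placement> placements = new ArrayList<>();
            for (String d : days) placements.add(new Placement(meeting, d));
            List<Antagonism> constraints = new ArrayList<>();
            for (int i = 0; i < placements.size(); i++)
                for (int j = i + 1; j < placements.size(); j++)
                    constraints.add(new Antagonism(placements.get(i), placements.get(j)));
            return constraints;
        }
    }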
18. Example ACG: calendar application
- Schedule: a sound cut in the graph
19. Status
- Open source: 35,000 lines of well-commented Java
- Available at gforge.inria.fr, BSD license
- Applications:
  - Sakura co-operative calendar (INRIA)
  - Decentralised Collaborative Environment (UPC)
  - Database, STMBench7 (UOC)
  - Co-operative text editor (UNL, INRIA, UOC)
  - Co-operative ontology editor (U. Aegean)
  - Co-operative UML editor (LIP6)
20. Lessons learned
- Separation of concerns:
  - Application: business logic
    - Semantics: constraints, run-time checks
  - System: persistence, replication, consistency
    - Optimistic: availability, WAN operation
    - Adapts to application requirements
    - Constraints + run-time machine learning
- Modular ⇒ efficiency is an issue
- High latency, availability ⇒ reduce messages
- Commutative operations ⇒ CRDTs (example below)
- Partial replication is hard!
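The CRDT bullet is the germ of the later CRDT line of work; a standard example (not Telex code) is a grow-only counter whose merges commute, so replicas converge without rollback or consensus:

    import java.util.HashMap;
    import java.util.Map;

    // Standard grow-only counter (G-Counter) CRDT.
    final class GCounter {
        private final Map<String, Long> perReplica = new HashMap<>();

        void increment(String replicaId) { perReplica.merge(replicaId, 1L, Long::sum); }

        long value() { return perReplica.values().stream().mapToLong(Long::longValue).sum(); }

        /** Merge is commutative, associative and idempotent: entry-wise maximum. */
        void merge(GCounter other) {
            other.perReplica.forEach((id, n) -> perReplica.merge(id, n, Math::max));
        }
    }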
-----
23. Application design
- MVC-like execution loop + rollback
- Designing for replication is an effort:
  - Commute (no constraints) >> Antagonistic >> Non-Commuting
  - Weaken invariants
  - Make invariants explicit
24. IceCube heuristic superior
25. Multilog avoids write contention
26. Decomposing the ACG improves performance
27. Designing applications for replication
- Replication is beneficial in many information-sharing scenarios
- Understand your application's needs
- Design for replication: aim for commutativity, weaken invariants, make invariants explicit
- Telex: a platform for decentralised, mutable, persistent shared data
28. P2P collaborative work
- Shared documents, multiple masters, different locations and times
- Reconciliation: operate in isolation and diverge; detect & resolve conflicts; converge
- Conflict = an invariant is violated
- Telex: a platform to ease application design & implementation
(Diagram: Lamia, Marc and JeanMi share a document.)
29. Goals
- Separation of concerns
- Support multiple interaction styles: on-line WYSIWIS = pessimistic vs. isolated/disconnected = optimistic
- Operate on the local replica, replay remote operations
- Detect conflicts, choose, roll back
- Multi-object operations
- Safe: enforce application invariants; converge, commit
- Synchronise only as required by application invariants (no more, no less)
30. Competition
- Co-operative text editing: all operations assumed to commute (no invariants); single object
- Total order (database, atomic broadcast): safe, but slow and doesn't scale
- Peer-to-peer (file sharing, database replication): scalable, but single object and Last Writer Wins (LWW): unsafe!
- Telex: principled, general-purpose, multi-object, adapts to application requirements, P2P