Presentation is loading. Please wait.

Presentation is loading. Please wait.

When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco.

Similar presentations


Presentation on theme: "When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco."— Presentation transcript:

1 When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal

2 Talk Structure Motivation and related work The GMU protocol Experimental results 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 1

3 Motivation and related work 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 2

4 Distributed STMs STMs are being employed in new scenarios: Database caches in three-tier web apps (FénixEDU) HPC programming language (X10) In-memory cloud data grids (Coherence, Infinispan) New challenges: Scalability Fault-tolerance 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal REPLICATION 3

5 Full Replication All sites store the whole set of data Full replication in transactional systems is a very investigated problem: Several solutions in DBMS world: Update anywhere-anytime-anyway solutions [SIGMOD96] Deferred-update replication techniques [JDPD03, VLDB00] Lazy techniques by relaxing consistency properties [SOSP07] Specific solutions for DSTMs: Efficient coding of the read-set [PRDC09] Communication/computation overlapping [NCA10] Lease-based commits [Middleware10] 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 4

6 Partial Replication It is a way to increase scalability. Each site stores a partial copy of the data. Genuine partial replication schemes maximize scalability by ensuring that: Only data sites that replicate data item read or written by a transaction T, exchange messages for executing/committing T. Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10]: considerable overheads in typical workloads 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 5

7 Objectives 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal Requirements Read-only transactions never abort or block Genuine certification mechanism Objectives Partially replicated DSTM Scalability and performance as first class targets Find a sweet spot in the consistency/performance tradeoff 6

8 Issues with Partial Replication Extending existing local multiversion (MV) STMs is not enough Local MV STMs rely on a single global counter to track version advancement Problem: Commit of transactions should involve ALL NODES NO GENUINENESS = POOR SCALABILITY 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 7

9 GMU: Genuine Multiversion Update serializable replication [ICDCS12] 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 8

10 Key concepts In the execution/commit phase of a transaction T, ONLY nodes which store data items accessed by T are involved. It uses multiple versions for each data item It builds visible snapshots = freshest consistent snapshots taking into account: 1. causal dependencies vs. previously committed transactions at the time a transaction began, 2. previous reads executed by the same transaction Vector clocks used to establish visible snapshots 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal G M U 9

11 Main data structures (i) For each node N: VCLog: sequence of vector clocks of “recently” committed transactions on N PrepareVC: vector clock greater than or equal to the most recent vector clock in VCLog 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 10

12 Main data structures (ii) For each transaction T: VC: a vector clock that is initialized with the most recent vector clock in local VCLog, updated upon reads during execution >> to ensure that T observes the most recent serializable snapshot, at commit time >> to assign final vector clock to the transaction (and to its write-set). 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 11

13 Main data structures (iii) A chain of versions per data item id: previous: value: 2 VN: 8 previous: value: 1 VN: 5 previous: value: 0 VN: 2 id Versions: 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 8 0 1 i n-2 n-1 Transaction T commits on node i T’s Vector Clock 12

14 T reads id on node i: Rule 1 Informally: it avoids reading remotely “too old” versions Formally: if it is the first read of T on i wait that VCLog.mostRecVC i [i] >= T.VC[i] this ensures that causal dependencies are enforced 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 13

15 Rule 1 in action Node 0 Node 1 (it stores X) Node 2 (it stores Y) X (2) T 1 :R(X) (1,1,1) (1,2,2) (1,1,1) Y (2) (1,2,2) T 0 :W(X,v) T 0 :W(Y,w) (1,1,1) T 1 :R(Y) Y (2) (1,2,2) Most recent VC in VCLog T 1.VC T 0 :Commit Commit (1,2,2) T 1.VC 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 14

16 T reads id on node i: Rule 2 Informally: it maximizes freshness by moving T’s VC ahead in time “as much as possible” in commit log Formally: if it is the first read of T on i, select the most recent VC in i’s Commit Log s.t. VC[j] <= T.VC[j] for each node j on which T has already read T.VC=MAX{VC, T.VC} Note: this updates only the entries of T.VC of the nodes from which T had not read yet 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 15

17 Rule 2 in action Node 0 Node 1 (it stores X) Node 2 (it stores Y) X (21) Y (11) X (20) T 0 :R(X) (1,1,1) (1,21,21) (1,1,1) Y (21) (1,21,21) T 1 :W(X,v) T 1 :W(Y,w) X (20) (1,20,1) T 0 :R(Y)Y (11) T 0 :Commit (1,20,1) Most recent VC in VCLog T 0.VC T 1 :Commit Commit (1,20,11) T 0.VC 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal (1,1,11) Y (1) 16

18 T reads id on node i: Rule 3 Informally: observe the most recent consistent version of id based on T’s history (previous reads) Formally: iterate over the versions of id and return the most recent one s.t. id.version.VN <= T.VC[i] 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 17

19 Committing read-only transactions Read-only transactions commit locally: No additional validations No possibility of aborts … and are never blocked, as in typical multiversion schemes. 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 18

20 Committing update transactions Run 2PC : Upon prepare message reception (participant-side i): Acquire read & write locks Validate read-set Increase PrepareVC[i] number and send PrepareVC back If all replies are positive (coordinator-side): Build a commit vector clock Broadcast back commit message 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 19

21 Building the commit Vector Clock A variant of the Skeen’s algorithm is implemented [SKEEN85]. This allows to keep track causal dependencies developed by: a transaction T during its execution, the most recent committed transactions at the nodes contacted by T 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 20

22 Consistency criterion GMU ensures Extended Update Serializability: Update Serializability ensures: 1-Copy-Serializabilty (1CS) on the history restricted to committed update transactions 1CS on the history restricted to committed update transactions and any single read-only transaction: but it can admit non-1CS histories containing at least 2 read-only transactions Extended Update Serializability: ensures US property also to executing transactions analogous to opacity in STMs 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 21

23 Experimental Results 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 22

24 Experiments on private cluster 8 core physical nodes TPC-C - 90% read-only xacts - 10% update xacts - 4 threads per node - moderate contention (15% abort rate at 20 nodes) 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 23

25 Experiments on private cluster 8 core physical nodes TPC-C - 90% read-only xacts - 10% update xacts - 4 threads per node - moderate contention (15% abort rate at 20 nodes) 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 24

26 FutureGrid Experiments All nodes are 2-core VMs deployed in the same site TPC-C - 90% read-only xacts - 10% update xacts - 1 thread per node - low/moderate contention, also at 40 nodes 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 25

27 Thanks for the attention 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal 26

28 References 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal [ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. “When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Replication”. The IEEE 32 nd International Conference on Distributed Computing Systems, June, 2012. [JDPD03] Fernando Pedone, Rachid Guerraoui, André Schiper. “The Database State Machine Approach”. Journal of Distributed and Parallel Databases, vol. 14, issue 1, 71-98, July, 2003. [Middleware10] Nuno Carvalho, Paolo Romano, Luís Rodrigues. “Asynchronous lease-based replication of software transactional memory”. Proc. of the 11 th ACM/IFIP/USENIX International Conference on Middleware, 376-396, 2010. [NCA10] Roberto Palmieri, Francesco Quaglia, Paolo Romano. “AGGRO: Boosting STM Replication via Aggressively Optimistic Transaction Processing”. Proc. of the 9 th IEEE International Symposium on Networking Computing and Applications, 20-27, 2010. [PRDC09] Maria Couceiro, Paolo Romano, Nuno Carvalho, Luís Rodrigues. “D2STM: Dependable Distributed Software Trasanctional Memory”. Proc. of 15 th IEEE Pacific Rim International Symposium on Dependable Computing, 307-313, 2009. [SIGMOD96] Jim Gray, Pat Helland, Patrick O’Neil, Dennis Shasha. “The dangers of replication and solutions”. Proc. of the 1996 ACM SIGMOD international conference on Management of data, vol. 25, issue 2, 173-182, June, 1996. [SKEEN85] D. Skeen. “Unpublished communication”, 1985. Referenced in K. Birman, T. Joseph “Reliable Communication in the Presence of Failures”, ACM Trans. on Computer Systems, 47-76, 1987 [SOSP07] G. DeCandia et al. “Dynamo: Amazon’s Highly Available key-value Store”. Proc. of the 21 st ACM SIGOPS Symposium on Operating Systems Principles, 2007 [SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. “P-Store: Genuine Partial Replication in Wide Area Networks”. Proc. of the 29 th Symposium of Reliable Distributed Systems, 2010. [VLDB00] Bettina Kemme, Gustavo Alonso. “Don’t Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication”. Proc. of the 26 th International Conference on Very Large Data Bases, 134-143, 2000. 27


Download ppt "When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco."

Similar presentations


Ads by Google