Gargamel: A Conflict-Aware Contention Resolution Policy for STM
Pierpaolo Cincilla, Marc Shapiro, Sébastien Monnet

Presentation transcript:

Slide 1: Gargamel: A Conflict-Aware Contention Resolution Policy for STM. Pierpaolo Cincilla, Marc Shapiro, Sébastien Monnet.

Slide 2:
Transactions in a TM execute speculatively (sketched below):
● Commit if there are no conflicts
● Abort otherwise
Low contention => good performance
High contention => repetitive aborts, poor performance
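To make the speculative execute/validate/retry cycle concrete, here is a minimal sketch in Python (not the authors' system) of an optimistic transaction loop; the shared store, the versioning scheme, and the atomic/read/write names are all illustrative assumptions.

import threading

memory = {}                       # shared store: location -> (value, version)
commit_lock = threading.Lock()

def atomic(body):
    """Run body(read, write) speculatively; retry until it commits."""
    while True:
        read_set, write_buf = {}, {}

        def read(loc):
            if loc in write_buf:
                return write_buf[loc]
            value, version = memory.get(loc, (None, 0))
            read_set[loc] = version          # remember the version we saw
            return value

        def write(loc, value):
            write_buf[loc] = value

        result = body(read, write)           # speculative execution

        with commit_lock:
            unchanged = all(memory.get(loc, (None, 0))[1] == ver
                            for loc, ver in read_set.items())
            if unchanged:                    # no conflict: commit
                for loc, value in write_buf.items():
                    _, ver = memory.get(loc, (None, 0))
                    memory[loc] = (value, ver + 1)
                return result
        # conflict detected: abort and retry (wasted work under contention)

For example, atomic(lambda read, write: write("x", (read("x") or 0) + 1)) retries the increment until it commits conflict-free; under high contention most iterations end up as aborted, wasted work.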

Slide 3:
(Diagram: clients send requests through a load balancer to a black-box DB.)
● Single-threaded DBs perform well
● To scale => go parallel and distributed
● Concurrency control and contention => distributed configurations scale poorly

Slides 4–5: Current approaches (in TM)
● Contention managers (e.g. Polite, Karma, Greedy, Serializer; a Karma-like example is sketched below): act after conflict detection => do not avoid wasted work
● TM schedulers: prevent conflicts from arising => avoid wasted work
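As an illustration of the first family, below is a hedged sketch, roughly in the spirit of the Karma policy, of a contention manager deciding a conflict after it has been detected; the descriptor fields and decision rule are simplified assumptions, not the published algorithms.

from dataclasses import dataclass

@dataclass
class TxDescriptor:
    tx_id: int
    opened_objects: int = 0   # proxy for work invested so far ("karma")
    retries: int = 0

def on_conflict(attacker: TxDescriptor, victim: TxDescriptor) -> str:
    # The conflict has already happened; whatever the loser did is wasted.
    if attacker.opened_objects + attacker.retries > victim.opened_objects:
        return "abort_victim"   # attacker has accumulated enough priority
    return "wait_and_retry"     # attacker backs off and tries again later

Whatever the decision, the loser's work up to the conflict is wasted, which is what the scheduler-based approaches in the second family try to avoid.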

Slide 6: Current approaches (in distributed databases)
● Partition memory => no global transactions
● Centralise writes => read-dominated workloads
● Weaken semantics (key-value stores, NoSQL) => problematic for applications
● Snapshot isolation (SI)

Slides 7–8: Intuition
● Throughput-optimal = no concurrent conflicting transactions => no aborts
● 1 site = 1 scheduler/load balancer + n worker threads
  – Conflicting transactions => serialised into a ''chain''
  – Non-conflicting transactions => run in parallel
● Relies on conflict estimation

Slide 9: The Gargamel scheduling algorithm
(Diagram: clients submit transactions to the scheduler; for each incoming transaction the scheduler asks "conflict?" against the transactions already scheduled. A sketch of this chain-based step follows below.)
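A hedged sketch of the chain-based scheduling step, assuming a boolean conflict oracle; chain merging (a transaction conflicting with several chains) and the real queue management are omitted, and all names are illustrative.

from collections import deque

class ChainScheduler:
    """One per site: groups incoming transactions into chains."""
    def __init__(self, may_conflict):
        self.may_conflict = may_conflict   # oracle: (tx_a, tx_b) -> bool
        self.chains = []                   # each chain is a FIFO of transactions

    def schedule(self, tx):
        for chain in self.chains:
            if any(self.may_conflict(tx, queued) for queued in chain):
                chain.append(tx)           # conflicting: serialise after the chain
                return chain
        new_chain = deque([tx])
        self.chains.append(new_chain)      # non-conflicting: new parallel chain
        return new_chain

    def next_transaction(self):
        """Called by each of the n worker threads: take the head of some chain."""
        for chain in self.chains:
            if chain:
                return chain.popleft()
        return None

Because conflicting transactions never leave their chain out of order, workers only ever execute mutually non-conflicting transactions in parallel, which is where the "no aborts" property of the intuition comes from.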

Slide 10: Architecture
● Fault tolerance, latency, load => several sites
(Diagram: Site 1 and Site 2 each have their own clients, a scheduler, and worker threads; the schedulers exchange transactions and run an agreement protocol.)

Slide 11: The Gargamel scheduling algorithm (2)
(Diagram: Scheduler 1 and Scheduler 2, each with its own clients, schedule transactions optimistically, placing a "bet" on the global order; an agreement protocol decides the outcome of each bet, and a losing transaction is cancelled before execution or killed after. The possible outcomes are sketched below.)
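The exact protocol is not shown on the slide; the sketch below only illustrates, under assumed names, how the outcome of a lost bet maps onto the cancel/kill cases mentioned above.

def apply_bet_outcome(won_bet: bool, state: str) -> str:
    """state is one of 'queued', 'executing', 'finished'."""
    if won_bet:
        return "keep"            # the local scheduling decision stands
    if state == "queued":
        return "cancel"          # cheapest case: the transaction never ran
    return "kill"                # executing or finished: its work is discarded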

Slide 12: Simulation results
Measured:
● Resource usage – number of running threads in resource-bounded systems
● Price of concurrency – cancel/kill rate with respect to the number of sites and the message delay
Setup:
● Simulation engine
● Simulated workload: TPC-C
  – Varied: number of sites, distance between sites (latency within a site = 0)
  – Constant: workload

Slides 13–14: Resource usage (TPC-C)
(Plots comparing Gargamel against passive replication.)
● No overloaded threads

Slides 15–16: Throughput (TPC-C)
(Plots comparing Gargamel against passive replication.)
● Better throughput

Slide 17: Penalisation (TPC-C)
(Plot comparing Gargamel against passive replication.)
● Better response time

Slide 18: Price of concurrency (TPC-C)
(Plots for 100 ms, 300 ms, and 500 ms message delay: transactions executed vs. the optimum, broken down into cancelled before execution, killed during execution, and killed after termination.)
● Good speedup even at high latency (up to 100 ms)
● Kills are rare

Slide 19: Price of concurrency (TPC-E)
(Plots for 10 ms and 100 ms message delay: transactions executed vs. the optimum, cancelled before execution, and killed after termination.)
● The ''Trade Result'' transaction is the bottleneck
  – Its conflict parameter is not available to the oracle
  – TODO: add the parameter to TPC-E

Slide 20: Impact of message delay
(Plot of cancel and kill rates.)
● Higher kill/cancellation rate in TPC-E than in TPC-C
● Caused by the different incoming rate of the bottleneck transaction (TPC-C: 88%, TPC-E: 10%)

Slide 21: Contributions
● Conflict estimation: write/write conflicts, detected by static analysis (sketched below)
● Chains: structured scheduler queues
● Load balancing + concurrency control = scheduling
● Non-conflicting chains are preventively parallelised
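A hedged sketch of such a write/write conflict oracle: per-class write sets are computed offline (e.g. by static analysis of the transaction code), and two transactions are predicted to conflict when their write sets may overlap. The classes and table names below are invented examples, not the actual TPC-C analysis.

# Write sets per transaction class, as a static analysis might produce them.
WRITE_SETS = {
    "new_order": {"orders", "order_lines", "stock"},
    "payment":   {"customers", "districts", "warehouses"},
    "delivery":  {"orders", "order_lines", "customers"},
}

def may_conflict(tx_class_a: str, tx_class_b: str) -> bool:
    """Write/write conflict estimate: true if the write sets can overlap."""
    return bool(WRITE_SETS[tx_class_a] & WRITE_SETS[tx_class_b])

An oracle of this form can be plugged directly into the chain scheduler sketched earlier; its accuracy bounds how close the resulting schedule gets to the optimum.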

Slides 22–24: Lessons learned
● Measure the optimum on a single site, compare against multiple sites
● No concurrent conflicting transactions => no aborts
● Simulation results: benefits are proportional to the oracle's accuracy and the transaction incoming rate
● Schedules are only as optimal as the conflict estimator
● Trade-off: black-box STM/DB vs. keeping agreement off the critical path
● Share-nothing => no resource contention

Slide 25: Future work
● Simulate Gargamel vs. Tashkent+ and LARD (1 month)
● Implementation and evaluation with real workloads (10 months)
● Partial replication (similar to Tashkent+, comes almost for free) (1 month)
● False negatives => handle DB aborts (1 month)
● Machine-learning oracle (1 month)
● Thesis (6 months)