Gargamel: A Conflict-Aware Contention Resolution Policy for STM
Pierpaolo Cincilla, Marc Shapiro, Sébastien Monnet
2 Transactions in a TM execute speculatively:
● Commit if no conflicts
● Abort otherwise
Low contention => good performance
High contention => repeated aborts, poor performance
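The commit/abort rule above can be sketched as follows. This is a minimal, single-threaded illustration of optimistic execution with version validation, not the mechanism of any particular TM; the names `VersionedStore`, `Transaction` and `try_commit` are assumptions introduced for this example.

```python
class VersionedStore:
    """Toy shared memory: each key maps to (value, version)."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        # Unknown keys read as value 0 at version 0.
        return self.data.get(key, (0, 0))


class Transaction:
    """Speculative transaction: buffers writes, validates reads at commit."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # key -> buffered new value

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def try_commit(self):
        # Abort if any key we read was changed by a committed transaction.
        for key, seen in self.read_versions.items():
            if self.store.read(key)[1] != seen:
                return False
        # No conflict: publish buffered writes and bump versions.
        for key, value in self.writes.items():
            _, version = self.store.read(key)
            self.store.data[key] = (value, version + 1)
        return True
```

With two transactions that both increment the same key, the first `try_commit()` succeeds and the second returns `False`; its work is wasted and must be retried, which is the cost that grows with contention.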
3 ● Single-threaded DBs perform well
● Scale => parallel, distributed
Concurrency control, contention => distributed configurations scale poorly
[Figure: client, load balancer, black-box DB]
4 Current approaches:
● Contention managers (Polite, Karma, Greedy, Serializer)
– Act after conflict detection => do not avoid wasted work
● TM schedulers
– Prevent conflicts from appearing => avoid wasted work
6 Current approaches
● Memory partition => no global transactions
● Centralise writes
● Read-dominated workloads
● Weaken semantics (key-value store, NoSQL)
● Problematic for applications => SI
7 Intuition
● Throughput-optimal = no concurrent conflicting transactions => no aborts
● 1 site = 1 scheduler/load balancer + n worker threads
– Conflicting => sequential ''chain''
– Non-conflicting => parallel
● Relies on conflict estimation
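The chain idea above can be sketched as follows. This is a simplified illustration under stated assumptions, not Gargamel's actual implementation: `ChainScheduler`, `schedule` and `write_sets_overlap` are hypothetical names, a transaction is modelled as a `(name, estimated write set)` pair, finished transactions are never removed from their chain, and a transaction that conflicts with several chains simply merges them into one (a conservative shortcut chosen for brevity).

```python
from typing import Callable, List, Tuple

Txn = Tuple[str, frozenset]  # (name, estimated write set), hypothetical model


class ChainScheduler:
    """Places each transaction in a chain; chains may run in parallel."""

    def __init__(self, conflicts: Callable[[Txn, Txn], bool]):
        self.conflicts = conflicts          # conflict estimator (the oracle)
        self.chains: List[List[Txn]] = []   # each chain executes sequentially

    def schedule(self, txn: Txn) -> int:
        """Append txn to a chain and return that chain's index."""
        hits = [i for i, chain in enumerate(self.chains)
                if any(self.conflicts(txn, other) for other in chain)]
        if not hits:
            # No estimated conflict: start a new chain, run in parallel.
            self.chains.append([txn])
            return len(self.chains) - 1
        # Conflict: serialize after the conflicting chain(s).
        target = hits[0]
        for i in reversed(hits[1:]):        # pop higher indices first
            self.chains[target].extend(self.chains.pop(i))
        self.chains[target].append(txn)
        return target


def write_sets_overlap(a: Txn, b: Txn) -> bool:
    """Toy estimator: two transactions conflict if their write sets meet."""
    return not a[1].isdisjoint(b[1])
```

Each chain would be handed to one worker thread, so conflicting transactions never run concurrently; the quality of the resulting schedule depends entirely on the estimator passed to the constructor.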
9 The Gargamel scheduling algorithm
[Figure: clients, scheduler, conflict test]
10 Architecture
● Fault tolerance, latency, load => several sites
[Figure: Site 1 and Site 2, each with clients, a scheduler and threads; transaction transmission + agreement between sites]
11 The Gargamel scheduling algorithm (2)
● Schedulers agree on the outcome of the bet
[Figure: Scheduler 1 and Scheduler 2 with their clients; execution, agreement; transactions cancelled or killed]
12 Simulation results
Measure:
● Resource usage
– Number of running threads in resource-bounded systems
● Price of concurrency
– Cancel/kill rate with respect to number of sites, message delay
● Simulation engine
● Simulated workload: TPC-C
– Vary: number of sites, distance between sites (latency within sites = 0)
– Constant: workload
13 Resource usage (TPC-C)
● No overloaded threads
[Plot legend: Gargamel, Passive]
15 Throughput (TPC-C)
● Better throughput
[Plot legend: Gargamel, Passive]
17 Penalisation (TPC-C)
● Better response time
[Plot legend: Gargamel, Passive]
18 Price of concurrency (TPC-C)
● Good speedup even at high latency (up to 100ms)
● Kills are rare
[Plot: panels 100ms, 300ms, 500ms; transactions executed vs. optimum; legend: Cancelled (before execution), Kill (during execution), Kill (after termination)]
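The three categories in the plot legend can be read as a classification of what happens to a speculatively scheduled transaction when its bet is lost. The sketch below is one reading of the legend, not the actual Gargamel code; `TxnState`, `LostBetOutcome` and `resolve_lost_bet` are hypothetical names.

```python
from enum import Enum


class TxnState(Enum):
    """Local progress of a transaction when agreement completes."""
    QUEUED = "queued"            # scheduled, not yet started
    RUNNING = "running"          # currently executing
    TERMINATED = "terminated"    # finished executing, not yet committed


class LostBetOutcome(Enum):
    CANCELLED = "cancelled (before execution)"
    KILLED_RUNNING = "kill (during execution)"
    KILLED_TERMINATED = "kill (after termination)"


def resolve_lost_bet(state: TxnState) -> LostBetOutcome:
    """Map a transaction's progress to the legend category (assumed reading)."""
    if state is TxnState.QUEUED:
        # Nothing has run yet, so no work is wasted.
        return LostBetOutcome.CANCELLED
    if state is TxnState.RUNNING:
        # Partial work is thrown away.
        return LostBetOutcome.KILLED_RUNNING
    # All of the transaction's work is thrown away.
    return LostBetOutcome.KILLED_TERMINATED
```

Under this reading a cancellation is cheap, while a kill after termination is the most expensive outcome, which is why the kill rate matters more than the cancel rate when judging the price of concurrency.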
19 Price of concurrency (TPC-E)
● ''Trade Result'' transaction is the bottleneck
– Conflict parameter not available to oracle
– TODO: add parameter to TPC-E
[Plot: panels 10ms, 100ms; transactions executed vs. optimum; legend: Cancelled (before execution), Kill (after termination)]
20 Impact of message delay
● Higher kill/cancellation rate in TPC-E
● Different incoming rate of the bottleneck transaction (TPC-C: 88%, TPC-E: 10%)
[Plot legend: Cancel, Kill]
21 Contributions
● Conflict estimation = w/w conflicts = static analysis
● Chains = structured scheduler queues
● Load balancing + concurrency control = schedules
● Preventively parallelize non-conflicting chains
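A write/write conflict estimator of the kind listed above could look like the following sketch. It assumes that static analysis of each transaction type yields the tables it writes and which input parameter selects the row; the `WRITE_TEMPLATES` table and the transaction types in it are invented for illustration and do not come from TPC-C or TPC-E.

```python
# Hypothetical result of static analysis: for each transaction type,
# the (table, parameter name) pairs identifying the rows it may write.
WRITE_TEMPLATES = {
    "deposit": [("account", "acct")],
    "transfer": [("account", "src"), ("account", "dst")],
    "log_visit": [("visits", "page")],
}


def write_set(txn_type, params):
    """Estimated write set: (table, key) pairs derived from the parameters."""
    return frozenset((table, params[name])
                     for table, name in WRITE_TEMPLATES[txn_type])


def may_conflict(a, b):
    """True if two (type, params) transactions may write the same row."""
    return not write_set(*a).isdisjoint(write_set(*b))


deposit_1 = ("deposit", {"acct": 1})
deposit_3 = ("deposit", {"acct": 3})
transfer_1_2 = ("transfer", {"src": 1, "dst": 2})
visit_1 = ("log_visit", {"page": 1})
```

Here `may_conflict(deposit_1, transfer_1_2)` is true because both write account 1, while `deposit_3` and `visit_1` conflict with nothing else in the example. An estimator like this errs when the parameter that determines the conflict is not visible to it, which is the situation the TPC-E slide reports for ''Trade Result''.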
22 Lessons learned
● Measure of optimum = single site, compared with multiple sites
● No concurrent conflicting transactions => no aborts
● Simulation result = benefits proportional to oracle accuracy and incoming rate
● Schedules are only as good as the estimator
● Trade-off: black-box DB / agreement off the critical path
● Shared-nothing => no resource contention
25 Future work
● Simulate Gargamel vs. Tashkent+ and LARD (1m)
● Implementation + evaluation with real workloads (10m)
● Partial replication (similar to Tashkent+, comes almost for free) (1m)
● False negatives => handle DB aborts (1m)
● Machine-learning oracle (1m)
● Thesis (6m)