Transactional Locking Nir Shavit Tel Aviv University Joint work with Dave Dice and Ori Shalev.


Transactional Locking Nir Shavit Tel Aviv University Joint work with Dave Dice and Ori Shalev

Concurrent Programming
How do we make the programmer’s life simple without slowing computation down to a halt?!
[Figure: objects in shared memory]

A FIFO Queue
[Figure: queue holding a, b, c with Head and Tail pointers; Enqueue(d) adds d at the tail, Dequeue() => a removes a from the head]

A Concurrent FIFO Queue
synchronized{}: one object lock guards the whole queue.
[Figure: the same queue, with P: Dequeue() => a and Q: Enqueue(d) both contending for the object lock]
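The single object lock on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `CoarseLockedQueue` is hypothetical, and Python's `threading.Lock` stands in for the `synchronized{}` block named on the slide.

```python
import threading
from collections import deque


class CoarseLockedQueue:
    """FIFO queue protected by one object-wide lock (hypothetical class name)."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the object lock
        self._items = deque()

    def enqueue(self, item):
        with self._lock:               # comparable to entering synchronized{}
            self._items.append(item)

    def dequeue(self):
        with self._lock:
            return self._items.popleft() if self._items else None


queue = CoarseLockedQueue()
for item in "abc":
    queue.enqueue(item)

# Thread Q enqueues d while thread P dequeues; the lock serializes the two.
results = []
q_thread = threading.Thread(target=queue.enqueue, args=("d",))
p_thread = threading.Thread(target=lambda: results.append(queue.dequeue()))
q_thread.start()
p_thread.start()
q_thread.join()
p_thread.join()
```

Every operation takes the same lock, so P and Q never run inside the queue at the same time. That keeps the code simple and is also why it does not scale.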

Fine Grain Locks
Better Performance, More Complex Code
Worry about deadlock, livelock…
[Figure: queue with P: Dequeue() => a and Q: Enqueue(d) proceeding under separate locks]

Lock-Free (JSR-166)
Even Better Performance, Even More Complex Code
Worry about deadlock, livelock, subtle bugs, hard to modify…
[Figure: queue with P: Dequeue() => a and Q: Enqueue(d)]

Transactional Memory [Herlihy-Moss]
Great Performance, Simple Code
Don’t worry about deadlock, livelock, subtle bugs, etc…
[Figure: queue with P: Dequeue() => a and Q: Enqueue(d) running as transactions]

TM: How Does It Work
synchronized{ } => atomic{ }
Execute all synchronized instructions as an atomic transaction…
Simplicity of Global Lock with Granularity of Fine-Grained Implementation

Hardware TM [Herlihy-Moss]
atomic{ }
Limitations:
- Machines will differ in their support
- When we build 1000 instruction transactions, it will not be for free…

Software Transactional Memory
Implement transactions in Software
All the flexibility of hardware…today
Ability to extend hardware when it is available (Hybrid TM)
But there are problems:
- Performance?
- Ease of programming (software engineering)?
- Mechanical code transformation?

The Brief History of STM
1993 STM (Shavit, Touitou)
1997 Trans Support TM (Moir)
2003 DSTM (Herlihy et al)
2003 WSTM (Fraser, Harris)
2003 OSTM (Fraser, Harris)
2004 ASTM (Marathe et al)
2004 T-Monitor (Jagannathan…)
2004 HybridTM (Moir)
2004 Meta Trans (Herlihy, Shavit)
2005 Lock-OSTM (Ennals)
2005 McTM (Saha et al)
2005 TL (Dice, Shavit)
2006 AtomJava (Hindman…)
[Timeline groups the systems under three labels: Lock-free, Obstruction-free, Lock-based]

As Good As Fine Grained
Postulate (i.e. take it or leave it): If we could implement fine-grained locking with the same simplicity as coarse-grained locking, we would never think of building a transactional memory.
Implication: Let’s try to provide TMs that get as close as possible to hand-crafted fine-grained locking.

Premise of Lock-based STMs
1. Memory Lifecycle: work with GC or any malloc/free
2. Transactification: allow mechanical transformation of sequential code
3. Performance: match fine grained
4. Safety: work on coherent state
Unfortunately: Hybrid, Ennals, Saha, AtomJava deliver only 2 and 3 (in some cases)…

Transactional Locking
TL2 delivers all four properties
How?
- Unlike all prior algs: use Commit time locking instead of Encounter order locking
- Introduce Version Clock mechanism for validation

TL Design Choices
PS = Lock per Stripe (separate array of locks)
PO = Lock per Object (embedded in object)
[Figure: application memory mapped onto an array of versioned write-locks, each holding a version number V#]
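One way to picture the PS layout is a hash from an address to a slot in the lock array, with each slot holding a version number plus a lock bit. The sketch below is an assumption-laden illustration: the table size, the 64-byte stripe width, the use of the lowest bit as the lock bit, and all function names are hypothetical and are not taken from the slides.

```python
NUM_STRIPES = 1 << 10   # assumed size of the separate lock array
LOCK_BIT = 1            # assumed: lowest bit of the word marks "locked"


def stripe_index(address, stripe_bytes=64):
    """Map a memory address to the index of the lock covering its stripe."""
    return (address // stripe_bytes) % NUM_STRIPES


def make_word(version, locked=False):
    """Pack a version number and a lock bit into one versioned write-lock word."""
    return (version << 1) | int(locked)


def is_locked(word):
    return bool(word & LOCK_BIT)


def version_of(word):
    return word >> 1


lock_array = [make_word(0)] * NUM_STRIPES   # every stripe starts unlocked at version 0
```

Two addresses in the same stripe share a lock, so PS trades some false conflicts for not having to change object layout. PO avoids those conflicts by embedding the lock word in each object.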

Encounter Order Locking (Undo Log)
1. To Read: load lock + location
2. Check unlocked, add to Read-Set
3. To Write: lock location, store value
4. Add old value to undo-set
5. Validate read-set v#’s unchanged
6. Release each lock with v#+1
Quick read of values freshly written by the reading transaction
[Ennals, Hybrid, Saha, Harris, …]
[Figure: Mem and Locks columns showing the versioned locks for X and Y moving from V# 0 to V# 1 and then to V#+1 0]
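The six steps can be traced in a small single-threaded sketch. It is not the code of any of the cited systems: the names `EncounterTx` and `Conflict` are hypothetical, each lock is modelled as a `[version, owner]` pair, and lock acquisition is a plain assignment where a real implementation would use an atomic compare-and-swap.

```python
class Conflict(Exception):
    """Raised when a transaction has to abort."""


class EncounterTx:
    def __init__(self, mem, locks):
        self.mem = mem          # address -> value
        self.locks = locks      # address -> [version, owner or None]
        self.read_set = {}      # address -> version seen at first read
        self.undo_log = []      # (address, old value) in write order

    def read(self, addr):
        version, owner = self.locks[addr]
        if owner is None:
            self.read_set.setdefault(addr, version)
        elif owner is not self:
            raise Conflict(addr)
        return self.mem[addr]   # own writes are already in memory

    def write(self, addr, value):
        lock = self.locks[addr]
        if lock[1] is None:
            lock[1] = self      # lock on first encounter (a CAS in practice)
        elif lock[1] is not self:
            raise Conflict(addr)
        self.undo_log.append((addr, self.mem[addr]))
        self.mem[addr] = value  # store in place

    def _release(self, bump):
        for addr in {a for a, _ in self.undo_log}:
            self.locks[addr][0] += bump
            self.locks[addr][1] = None

    def abort(self):
        for addr, old in reversed(self.undo_log):
            self.mem[addr] = old            # roll back from the undo log
        self._release(bump=0)

    def commit(self):
        for addr, seen in self.read_set.items():
            version, owner = self.locks[addr]
            if version != seen or owner not in (None, self):
                self.abort()
                raise Conflict(addr)
        self._release(bump=1)               # release each lock with v#+1
```

Because writes go straight to memory, a transaction reads its own fresh values with no lookup. The cost is that locks are held from the first write until commit.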

Commit Time Locking (Write Buff)
1. To Read: load lock + location
2. Location in write-set? (Bloom Filter)
3. Check unlocked, add to Read-Set
4. To Write: add value to write set
5. Acquire Locks
6. Validate read/write v#’s unchanged
7. Release each lock with v#+1
Hold locks for very short duration
[TL, TL2]
[Figure: Mem and Locks columns showing the versioned locks for X and Y moving from V# 0 to V# 1 at commit and then to V#+1 0]
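The same kind of single-threaded sketch works for commit-time locking. The names `CommitTimeTx` and `Conflict` are hypothetical, locks are modelled as `[version, locked]` pairs, and a plain dictionary lookup stands in for the Bloom filter test of step 2. Acquiring locks in sorted order is an assumption of this sketch, not something stated on the slide.

```python
class Conflict(Exception):
    """Raised when a transaction has to abort."""


class CommitTimeTx:
    def __init__(self, mem, locks):
        self.mem = mem          # address -> value
        self.locks = locks      # address -> [version, locked]
        self.read_set = {}      # address -> version seen at first read
        self.write_set = {}     # address -> buffered new value

    def read(self, addr):
        if addr in self.write_set:          # step 2: own buffered write?
            return self.write_set[addr]
        version, locked = self.locks[addr]
        if locked:
            raise Conflict(addr)
        self.read_set.setdefault(addr, version)
        return self.mem[addr]

    def write(self, addr, value):
        self.write_set[addr] = value        # memory is untouched until commit

    def commit(self):
        acquired = []
        try:
            for addr in sorted(self.write_set):     # step 5: acquire locks
                if self.locks[addr][1]:
                    raise Conflict(addr)
                self.locks[addr][1] = True
                acquired.append(addr)
            for addr, seen in self.read_set.items():    # step 6: validate
                version, locked = self.locks[addr]
                if version != seen or (locked and addr not in self.write_set):
                    raise Conflict(addr)
        except Conflict:
            for addr in acquired:
                self.locks[addr][1] = False
            raise
        for addr, value in self.write_set.items():
            self.mem[addr] = value
            self.locks[addr] = [self.locks[addr][0] + 1, False]   # step 7
```

Locks are taken only inside `commit`, which is what the slide means by holding them for a very short duration. Aborting is cheap because nothing was written to memory.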

Why COM and not ENC?
1. Under low load they perform pretty much the same.
2. COM withstands high loads (small structures or high write %). ENC does not withstand high loads.
3. COM works seamlessly with Malloc/Free. ENC does not work with Malloc/Free.

COM vs. ENC High Load
[Chart: Red-Black Tree, 20% Delete, 20% Update, 60% Lookup; series labels: ENC, Hand, MCS, COM]

COM vs. ENC Low Load
[Chart: Red-Black Tree, 5% Delete, 5% Update, 90% Lookup; series labels: COM, ENC, Hand, MCS]

Obnoxious Statement About Benchmarking
Pick sequential algorithms and show how they do when parallelized
Please no more Specjbb, Splash…
Compare to other STMs and hand-crafted fine-grained implementations

COM: Works with Malloc/Free
To free B from transactional space:
1. Wait till its lock is free.
2. Free(B)
B is never written inconsistently because any write is preceded by a validation while holding lock
[Figure: objects A and B mapped to the PS lock array; VALIDATE on X fails if inconsistent]

ENC: Fails with Malloc/Free
Cannot free B from transactional space because undo-log means locations are written after every lock acquisition and before validation.
Possible solution: validate after every lock acquisition (yuck)
[Figure: objects A and B mapped to the PS lock array; VALIDATE on X]

Problem: Application Safety
1. All current lock-based STMs work on inconsistent states.
2. They must introduce validation into user code at fixed intervals or loops, use traps, OS support,…
3. And still there are cases, however rare, where an error could occur in user code…

Solution: TL2’s “Version Clock”
Have one shared global version clock
Incremented by (small subset of) writing transactions
Read by all transactions
Used to validate that state worked on is always consistent
Later: how we learned not to worry about contention and love the clock

Version Clock: Read-Only COM Trans
1. RV ← VClock
2. On Read: read lock, read mem, read lock: check unlocked, unchanged, and v# <= RV
3. Commit
Reads form a snapshot of memory. No read set!
[Figure: VClock at 100; Mem and Locks columns with V# entries]
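A read-only transaction under the version clock needs no bookkeeping beyond RV, as the following single-threaded sketch suggests. The function name `read_only_tx`, the `Abort` exception, the `[version, locked]` lock pairs, and the `{"now": ...}` clock dictionary are all hypothetical stand-ins chosen for illustration.

```python
class Abort(Exception):
    """Raised when a read cannot be part of a consistent snapshot."""


def read_only_tx(mem, locks, vclock, addrs):
    """Read every address in addrs as one snapshot, or raise Abort."""
    rv = vclock["now"]                      # step 1: RV <- VClock
    snapshot = {}
    for addr in addrs:
        v_before, locked_before = locks[addr]   # read lock
        value = mem[addr]                       # read mem
        v_after, locked_after = locks[addr]     # read lock again
        if locked_before or locked_after:       # check unlocked
            raise Abort(addr)
        if v_before != v_after:                 # check unchanged
            raise Abort(addr)
        if v_after > rv:                        # check v# <= RV
            raise Abort(addr)
        snapshot[addr] = value
    return snapshot                         # step 3: commit, nothing to validate


vclock = {"now": 100}
mem = {"x": 1, "y": 2}
locks = {"x": [90, False], "y": [100, False]}
view = read_only_tx(mem, locks, vclock, ["x", "y"])
```

Every value accepted was written at or before RV, so the reads are mutually consistent without a read set. A location written after the transaction started carries a version above RV and forces an abort.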

Version Clock: Writing COM Trans
1. RV ← VClock
2. On Read/Write: check unlocked and v# <= RV then add to Read/Write-Set
3. Acquire Locks
4. WV = F&I(VClock) (fetch-and-increment)
5. Validate each v# <= RV
6. Release locks with v# ← WV
Reads + Inc + Writes = Linearizable
[Figure: VClock at 100; Mem and Locks columns for X and Y, with RV sampled at the start and new versions installed at commit]
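The six steps of a writing transaction can be put together in one sketch. It is a single-threaded illustration, not the TL2 source: `VersionClock`, `WritingTx`, and `Abort` are hypothetical names, a `threading.Lock` stands in for the atomic increment, and the sketch assumes WV is the value of the clock after the increment, so that a committed version is always greater than the RV of any transaction that started earlier.

```python
import threading


class Abort(Exception):
    """Raised when a transaction has to abort."""


class VersionClock:
    def __init__(self, start=0):
        self._value = start
        self._guard = threading.Lock()      # stands in for an atomic CAS

    def read(self):
        return self._value

    def increment_and_fetch(self):
        with self._guard:
            self._value += 1
            return self._value


class WritingTx:
    def __init__(self, mem, locks, clock):
        self.mem, self.locks, self.clock = mem, locks, clock
        self.rv = clock.read()              # step 1: RV <- VClock
        self.read_set = set()
        self.write_set = {}

    def _check(self, addr):                 # step 2: unlocked and v# <= RV
        version, locked = self.locks[addr]
        if locked or version > self.rv:
            raise Abort(addr)

    def read(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]
        self._check(addr)
        self.read_set.add(addr)
        return self.mem[addr]

    def write(self, addr, value):
        self._check(addr)
        self.write_set[addr] = value

    def commit(self):
        acquired = []
        try:
            for addr in self.write_set:             # step 3: acquire locks
                if self.locks[addr][1]:
                    raise Abort(addr)
                self.locks[addr][1] = True
                acquired.append(addr)
            wv = self.clock.increment_and_fetch()   # step 4
            for addr in self.read_set:              # step 5: validate
                version, locked = self.locks[addr]
                if version > self.rv or (locked and addr not in self.write_set):
                    raise Abort(addr)
        except Abort:
            for addr in acquired:
                self.locks[addr][1] = False
            raise
        for addr, value in self.write_set.items():
            self.mem[addr] = value
            self.locks[addr] = [wv, False]          # step 6: v# <- WV
```

A transaction that reads a location committed after its own start sees a version above RV and aborts at the read itself, which is how user code is kept from ever running on an inconsistent state.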

Version Clock Implementation
On sys-on-chip like Sun T2000™ (Niagara): virtually no contention, just CAS and be happy
On others: add TID to VClock; if VClock has changed since last write, can use new value + TID. Reduces contention by a factor of N.
Future: Coherent Hardware VClock that guarantees unique tick per access.

Performance Benchmarks
Mechanically Transformed Sequential Red-Black Tree using TL2
Compare to STMs and hand-crafted fine-grained Red-Black implementation
On a 16-way Sun Fire™ running Solaris™ 10

Uncontended Large Red-Black Tree
[Chart: 5% Delete, 5% Update, 90% Lookup; series labels: Hand-crafted, TL/PS, TL2/PS, TL/PO, TL2/PO, Ennals, Fraser Harris Lock-free]

Uncontended Small RB-Tree
[Chart: 5% Delete, 5% Update, 90% Lookup; series labels: TL/PO, TL2/PO]

Contended Small RB-Tree
[Chart: 30% Delete, 30% Update, 40% Lookup; series labels: Ennals, TL/PO, TL2/PO]

Speedup: Normalized Throughput
[Chart: Large RB-Tree, 5% Delete, 5% Update, 90% Lookup; series labels: Hand-Crafted, TL/PO]

Overhead
STM scalability is as good if not better than hand-crafted, but overheads are much higher
Overhead is the dominant performance factor, which bodes well for HTM
Read set and validation cost (not locking cost) dominates performance

On Sun T2000™ (Niagara): maybe a long way to go…
[Chart: RB-tree, 5% Delete, 5% Update, 90% Lookup; series labels: Hand-crafted, STMs]

Detail of RB-tree: STMs Only
[Chart: RB-tree, 5% Delete, 5% Update, 90% Lookup]

Conclusions
COM time locking, implemented efficiently, has clear advantages over ENC order locking:
- No meltdown under contention
- Works seamlessly with malloc/free
The version clock (VClock) can guarantee safety, so we don’t need to embed repeated validation in user code

What Next?
Further improve performance
Make TL1 and TL2 library available
Mechanical code transformation tool…
Cut read-set and validation overhead, maybe with hardware support?
Add hardware VClock to Sys-on-chip.

Thank You