Pessimistic Software Lock-Elision Nir Shavit (Joint work with Yehuda Afek Alexander Matveev)

Presentation transcript:

Read-Write Locks
– One of the most prevalent lock forms in concurrent applications
– The 80/20 rule applies to reading vs. writing of data
– Mutual exclusion between write calls, and between writes and read-only calls
– Read-only calls are allowed to proceed in parallel with one another
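The read-write lock semantics described on this slide can be sketched as follows. This is a minimal illustration, not code from the talk: the class name `ReadWriteLock` and its methods are hypothetical, and the sketch has no fairness policy, so writers may starve under a steady stream of readers.

```python
import threading

class ReadWriteLock:
    """Minimal reader-writer lock (hypothetical name): many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of read-only calls currently inside
        self._writer = False    # True while a write call holds the lock

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, never for other readers.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            # Writers exclude both other writers and all readers.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Note that a reader arriving while a writer is active must wait; this is the read-write blocking that the rest of the talk sets out to remove.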

Coming Next Year: HTM and Hardware Lock Elision

Speculative Lock Elision (SLE)
[Diagram: Thread 1 and Thread 2 each start, acquire (lock elided), and release]
– Speculate: try to execute the critical sections concurrently using transactions
– On failure: revert back to the lock
– Rajwar and Goodman: speculative execution of locks by optimistic hardware transactions (Haswell)
– Roy, Hand, and Harris: software implementation of SLE, transactions executed speculatively in software
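The "speculate, then revert to the lock on failure" pattern can be illustrated with a toy software model. This is a sketch under stated assumptions, not the Rajwar and Goodman hardware scheme or the Roy, Hand, and Harris algorithm: `ElidedLock`, `run_critical_section`, the version counter, and the `before_commit` hook are all hypothetical, and the lock is still taken briefly to validate and commit.

```python
import threading

class ElidedLock:
    """Toy model (hypothetical): the lock is held for the whole
    critical section only on the fallback path."""

    def __init__(self):
        self.lock = threading.Lock()  # the original lock being elided
        self.version = 0              # bumped by every committed update
        self.fallbacks = 0            # how often speculation failed

def run_critical_section(elock, store, body, before_commit=None):
    """Run body(read, write) speculatively; revert to the lock on conflict."""
    start = elock.version
    buffered = {}  # speculative writes stay private until commit

    def read(key):
        return buffered.get(key, store[key])

    def write(key, value):
        buffered[key] = value

    result = body(read, write)
    if before_commit is not None:
        before_commit()  # hook used only to inject a conflict in examples

    with elock.lock:  # short validate-and-commit step
        if elock.version == start:
            if buffered:
                store.update(buffered)
                elock.version += 1
            return result

    # Speculation failed: re-execute non-speculatively under the lock.
    with elock.lock:
        elock.fallbacks += 1
        result = body(store.__getitem__, store.__setitem__)
        elock.version += 1
        return result
```

A body such as `lambda read, write: write("x", read("x") + 1)` runs once when there is no conflict and twice (speculatively, then under the lock) when another update commits in between.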

SLE: Good and Bad
Advantages:
– Concurrency among writes, and among reads and writes, as long as they do not share or contend for memory
Disadvantages:
– Contention implies defaulting to the lock; reads are delayed by writes
– System calls and I/O cannot be used: they cause the transaction to fail
– Debugging is hard due to the speculative, non-deterministic behavior
– Speculative execution breaks the lock semantics: you need to rewrite the code

Pessimistic Lock Elision (PLE)
Non-speculatively replace read-write locks by pessimistic software transactions, in a way that:
– Preserves the lock semantics: no code rewriting, and I/O is allowed in transactions
– Always allows read-write concurrency
Disadvantage:
– Does not allow concurrency among writes. How important is this for RW-locked code?

Pessimistic STM [MatveevShavit2011]
– A commit-time privatizing STM in which all transactions execute once and never abort
– Read-only transactions run in parallel with one another and with writes
– To create PLE, we designed a new encounter-order version of this pessimistic STM that has wait-free read-only transactions

Encounter Order Pessimistic STM
– Quiescence mechanism [MatveevShavit2010] to tell when reads terminate
– Write transactions execute sequentially (commits are serialized) by “passing a baton”
– Writes maintain a public undo log
– Wait-free reads collect a snapshot of the memory using the undo log
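One common way to tell when overlapping reads have terminated is a per-thread counter that is odd while a read transaction is running. The sketch below assumes that design; it is not taken from [MatveevShavit2010], and the class `Quiescence` and its method names are hypothetical.

```python
import time

class Quiescence:
    """Toy quiescence detector (hypothetical): one counter per reader thread,
    odd while that thread is inside a read transaction."""

    def __init__(self, num_threads):
        # Each slot is written only by its owning thread.
        self.counters = [0] * num_threads

    def read_begin(self, tid):
        self.counters[tid] += 1   # becomes odd: reader is active

    def read_end(self, tid):
        self.counters[tid] += 1   # becomes even: reader is done

    def snapshot(self):
        # Taken by the writer at the point after which new readers are harmless.
        return list(self.counters)

    def is_quiescent(self, snap):
        # A reader that was active at the snapshot (odd value) has finished
        # once its counter has changed; even values were never in the way.
        return all(c % 2 == 0 or self.counters[i] != c
                   for i, c in enumerate(snap))

    def wait(self, snap):
        # The writer spins until every reader seen in the snapshot has finished.
        while not self.is_quiescent(snap):
            time.sleep(0)
```

Readers never wait in this scheme: they only increment their own counter, which is consistent with the wait-free reads claimed on the slide. Only the writer waits.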

Pessimistic Read-Write Interaction
Write transactions must not write to locations being read by overlapping reads.
Solution:
– On a write, the old value is logged publicly before the new value is written
– In the read phase, logged values of concurrent writes are read
– In the commit phase, the old values are discarded once the quiescence mechanism has ensured that no one is reading them
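The three steps above can be sketched as a toy, single-threaded model that is stepped by hand. It is an illustration under assumptions, not the authors' implementation: the class `PessimisticMemory`, the version fields, and all method names are hypothetical, and the real algorithm must also deal with the timing of concurrent accesses, which this model ignores.

```python
class PessimisticMemory:
    """Toy model (hypothetical names) of a single writer with a public undo log."""

    def __init__(self, initial):
        self.memory = dict(initial)
        self.committed = 0     # number of committed write transactions
        self.undo_log = {}     # address -> old value, published by the writer
        self.log_version = 0   # version the active writer will commit as

    # Writer side: at most one write transaction runs at a time.
    def write_begin(self):
        self.log_version = self.committed + 1

    def write(self, addr, value):
        if addr not in self.undo_log:
            self.undo_log[addr] = self.memory[addr]  # log the old value first
        self.memory[addr] = value                    # then update in place

    def write_commit_point(self):
        self.committed = self.log_version  # readers starting now see new values

    def write_discard_log(self):
        # Only safe after quiescence shows that no older reader is still running.
        self.undo_log.clear()

    # Reader side: never blocks and never aborts.
    def read_begin(self):
        return self.committed  # the reader's snapshot version

    def read(self, snapshot, addr):
        # A reader older than the active writer takes the logged old value.
        if snapshot < self.log_version and addr in self.undo_log:
            return self.undo_log[addr]
        return self.memory[addr]
```

A reader that began before the writer's commit point keeps seeing the pre-write values through the undo log, while a reader that begins afterwards reads memory directly.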

Why does this work well?
– No need for CAS or even memory barriers in the common case
– Even though logging is public, it is done by only one transaction at a time, so it is very easy to implement

Applying Pessimistic Lock-Elision
[Diagram: a program with RW-locks is the input to the STM compiler (Intel STM Compiler with PLE transactions), which outputs a program with PLE. On a processor with HLE (Intel’s Haswell), HLE code is executed with a software fallback to PLE; on a standard processor, PLE code is executed.]
Point 1: The semantics are not changed by adding PLE
Point 2: Concurrency between read and write critical sections
Point 3: HLE has limitations, but HLE + PLE does not
Point 4: PLE works on current processors

Performance
[Plot labels: NORMAL, HYPERTHREADS, NUMA]
We empirically evaluated our algorithm on an Intel 40-way machine with 2 Xeon E chips in a NUMA setup.
1. PLE: Our fully pessimistic encounter-time STM
2. RW_Lock_Egress: An ingress-egress counter based reader-writer mutex implementation for Intel platforms
3. MCS-Lock: Mellor-Crummey and Scott's MCS lock
4. RW_Lock_SPAA: The new RWLock proposal from SPAA 2012

Three Ways to Elide Locks
– Software-only lock elision: if you don’t have hardware support
– A fallback (slow path) for the hardware HLE: Intel’s SLE
– A fallback using HTM: Intel’s RTM

If Your Machine Doesn’t Have Hardware Support
– Automatically replace, at compile time, all read-write locked code with PLE STM code
– As easy as STM in the new C++ compiler
– This will improve on your RW-locks because it allows read-only calls to proceed in parallel with writes
– Write calls are sequential, but they were sequential anyhow…

If Your Machine Has SLE
– There is an XTEST instruction which returns true if the thread is currently executing in SLE
– Execute XTEST after the XACQUIRE instruction (the HLE transaction start instruction)
– At compile time, create a duplicate PLE code path
– If the XTEST fails, the duplicate PLE path is executed

If Your Machine Has RTM
Two copies: one copy is the PLE path, the other is the RTM code path:
– The RTM hardware fallback routine is the start of the PLE code path
– After the XBEGIN, add a read (load) instruction of the is_abort variable
– The PLE code path first executes a small RTM transaction that updates is_abort
– This causes all concurrently executing RTM transactions to fail
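The role of is_abort can be illustrated with a toy software model of the conflict it creates. This is only a simulation under assumptions: real RTM detects the conflict in hardware through the transaction's read set, whereas here the check is an explicit comparison, and the names `AbortFlag`, `rtm_begin`, `rtm_can_commit`, and `ple_begin` are hypothetical.

```python
class AbortFlag:
    """Shared variable that every RTM-path transaction reads (hypothetical model)."""

    def __init__(self):
        self.is_abort = 0

def rtm_begin(flag):
    # Models the load placed right after XBEGIN: the value seen stands in
    # for is_abort being part of the hardware transaction's read set.
    return flag.is_abort

def rtm_can_commit(flag, seen):
    # Models hardware conflict detection: an update to is_abort since the
    # load means the hardware transaction would have been aborted.
    return flag.is_abort == seen

def ple_begin(flag):
    # Models the small transaction at the start of the PLE path that
    # updates is_abort and so conflicts with all in-flight RTM readers.
    flag.is_abort += 1
```

In this model a hardware transaction that began before `ple_begin` can no longer commit, which corresponds to the slide's last point: once the PLE path starts, concurrently executing RTM transactions fail and take the fallback routine.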

Lock-Elision Theory
– We are going to see a lot of use of lock elision in industry…
– So, what are the inherent costs of lock elision using STMs?
– What are the inherent costs of pessimistic STM implementations?
– Can we quantify the interaction between hardware and software transactions (or with locks)?

Thanks