Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.

Good: Hardware Transactional Memory (HTM)
Bad: The HTM is “best-effort”. HTM may always fail due to:
1. L1 cache capacity
2. Interrupt
3. Unsupported instruction
To ensure progress, we need a software fallback

A Possible Solution is: Lock Elision
Elided lock operations: 1. Lock 2. Unlock
Thread 1 and Thread 2 each run:
1. HTM Start
2. Read lock and check it is free
3. code …
4. HTM Commit
No conflict – HTMs commit concurrently

A Possible Solution is: Lock Elision
Thread 1 and Thread 2 each run:
1. HTM Start
2. Read lock and check it is free
code …
CONFLICT … HTM Restart
Wait for Lock
Thread 3 runs:
1. HTM Start
2. Read lock and check it is free
code …
FAIL … HTM Restart
1. Acquire Lock
code …
3. Release Lock
No concurrency between hardware and software

A Possible Solution is: Lock Elision
Good:
– Simple: No need to instrument reads and writes
Bad:
– Serial fallback: A software fallback grabs the global lock and aborts all hardware transactions
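The retry-then-serialize policy behind lock elision can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `HardwareAttempt`, `ElisionStats`, and `run_elided` are hypothetical names, and the hardware transaction itself is modeled as a callback that reports commit or abort, since real code would use the platform's HTM intrinsics at that point.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical stand-in for one best-effort hardware attempt. It is expected
// to run the critical section speculatively and return true on commit, false
// on abort. Real code would use the platform's HTM intrinsics here.
using HardwareAttempt = std::function<bool()>;

// Counters so a caller can see which path was taken (illustration only).
struct ElisionStats {
    int hardware_commits = 0;
    int fallback_runs = 0;
};

// Try the hardware path a bounded number of times; if every attempt aborts,
// run the critical section under the global lock (the serial fallback).
inline void run_elided(std::mutex& global_lock,
                       const HardwareAttempt& try_in_hardware,
                       const std::function<void()>& critical_section,
                       int max_attempts,
                       ElisionStats& stats) {
    assert(max_attempts >= 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_in_hardware()) {  // committed in hardware, nothing else to do
            ++stats.hardware_commits;
            return;
        }
    }
    // Serial fallback: in real lock elision, taking this lock also aborts
    // every hardware transaction that read the lock word.
    std::lock_guard<std::mutex> guard(global_lock);
    critical_section();
    ++stats.fallback_runs;
}
```

A caller passes the same critical section twice: once wrapped as the hardware attempt and once as the plain fallback. The bound `max_attempts` is what guarantees progress when the hardware keeps aborting.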

Another Approach is: Hybrid Transactional Memory
Thread 1 and Thread 2 each run:
1. HTM Start
2. Read lock and check it is free
code …
… more code
Thread 3 runs:
1. HTM Start
2. Read lock and check it is free
code …
FAIL … HTM Restart
1. STM Start
code …
… more code
STM and HTM execute concurrently

Another Approach is: Hybrid Transactional Memory
Good:
– Hardware-Software Concurrency
Bad:
– Complex:
1. Hard to coordinate hardware and software (our focus)
2. Hard to apply to code due to instrumentation (GCC C/C++ TM helps here a lot)

Hybrid TM History
2006: First Hybrid TM [DamronFedorovaLevLuchangcoMoirNussbaum]
– Key Idea: Use per-location metadata (version-locks) to coordinate hardware and software
Bad:
– Hardware is slow: each read/write must read the version-lock and execute a conditional branch to check it

Hybrid TM History
2007: Phased TM [LevMoirNussbaum]
– Key Idea: Use HTM mode or STM mode, but not HTM and STM at the same time
Bad:
– Expensive to switch modes: a single fallback must stop all hardware transactions

Hybrid TM History
2011: Hybrid NOrec (state-of-the-art) [DalessandroCarougeWhiteLevMoirScottSpear]
– Key Idea: No metadata + global clock for coordination

Hybrid NOrec
Good:
– No metadata: Efficient for low concurrency
Bad:
– Limited Scalability: too many aborts due to global clock updates
A software write must abort all hardware transactions
A hardware write must abort all software transactions

Hybrid NOrec [diagram: a software slow path running against a hardware fast path]
Slow-Path: Software
Read clock
Read X (verify clock)
Lock clock
X = 4
Unlock clock
RESTART
Fast-Path: Hardware
Read clock
Read X (pure)
Update clock
ABORT
Read X: check clock => changed => restart/revalidate
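The "check clock => changed => restart/revalidate" rule on the software path can be sketched with a single atomic counter. This is a minimal sketch, assuming a NOrec-style clock whose odd values mean "locked by a writer"; the names `GlobalClock`, `SoftwareTx`, and `write_commit` are hypothetical, and the hardware fast path is not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Global clock (assumed encoding): an odd value means a writer holds it.
struct GlobalClock {
    std::atomic<std::uint64_t> value{0};

    std::uint64_t read() const { return value.load(std::memory_order_acquire); }
    static bool is_locked(std::uint64_t v) { return (v & 1u) != 0; }

    // Lock clock: succeeds only if the clock still equals the caller's snapshot.
    bool try_lock(std::uint64_t snapshot) {
        assert(!is_locked(snapshot));
        return value.compare_exchange_strong(snapshot, snapshot + 1,
                                             std::memory_order_acq_rel);
    }
    // Unlock clock: stepping to the next even value also publishes the update.
    void unlock() { value.fetch_add(1, std::memory_order_release); }
};

// A software transaction remembers the clock value it started from.
struct SoftwareTx {
    GlobalClock& clock;
    std::uint64_t snapshot;

    explicit SoftwareTx(GlobalClock& c) : clock(c), snapshot(begin(c)) {}

    // Read clock: wait until no writer holds it, then take the snapshot.
    static std::uint64_t begin(GlobalClock& c) {
        std::uint64_t v;
        do { v = c.read(); } while (GlobalClock::is_locked(v));
        return v;
    }

    // Read X (verify clock): returns false when the clock has changed since
    // the snapshot, in which case the caller must restart or revalidate.
    bool read(const std::atomic<int>& location, int& out) const {
        out = location.load(std::memory_order_acquire);
        return clock.read() == snapshot;
    }
};

// Software write: lock the clock, write, unlock. Returns false if the clock
// moved since `snapshot`, so the writer itself has to restart.
inline bool write_commit(GlobalClock& clock, std::uint64_t snapshot,
                         std::atomic<int>& location, int new_value) {
    if (!clock.try_lock(snapshot)) return false;
    location.store(new_value, std::memory_order_release);
    clock.unlock();
    return true;
}
```

Because every writer bumps the same clock, any committed write invalidates every in-flight software reader, even one that touched unrelated locations. That is the scalability limit the slide points at.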

A Possible Solution
2011: Hybrid NOrec 2 [RiegelMarlierNowackFelberFetzer]
– Key Idea: Use non-speculative reads inside HTM to verify the global clock and avoid unnecessary aborts
Bad:
– HTM of Intel and IBM has no support for non-speculative reads

A Recent Approach
2014: Invyswell Hybrid [CalciuGottschlichShpeismanPokamHerlihy]
– Key Idea: Allow unsafe concurrency between hardware and software, and use the HTM sandboxing to detect and handle errors

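Invyswell [diagram: a software slow path writing while a hardware fast path reads]
Slow-Path: Software
Lock clock
X = 4 (NEW)
Y = 8 (NEW)
Update clock
Unlock clock
Fast-Path: Hardware
Read X (NEW)
Read Y (OLD)
Func(X, Y): Unsafe
NO ABORT at the time of the reads; hopes HTM aborts in the FUTURE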

Invyswell
Good:
– Far fewer aborts than Hybrid NOrec
Bad:
– Unfortunately, HTM sandboxing may miss errors, so a corrupted transaction may commit and crash the system
– This problem was shown in a recent work: “Pitfalls of Lazy Subscription” by [DiceHarrisKoganLevMoir]

Our New Approach
2015: RH NOrec [MatveevShavit]
– Key Idea: Use a “mixed” fallback path that uses both software and short hardware transactions

RH NOrec [diagram: the mixed slow path writing while a hardware fast path reads]
Mixed Slow-Path
Lock clock
HTM: X = 4 (HIDDEN), Y = 8 (HIDDEN)
Update clock
Unlock clock
Writes are speculative (invisible)
Fast-Path: Hardware
Read X (OLD)
Read Y (OLD)
Func(X, Y): Safe!
X and Y both OLD or both NEW – not a mix

RH NOrec
Key Point 1: Execute software writes in a short hardware transaction
– No need to abort hardware transactions
– Full safety
In practice this works well
– Due to the 80:20 rule: a typical operation has 80% reads and 20% writes
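Why running the writes inside a short hardware transaction gives "both OLD or both NEW" can be shown with a toy model. The sketch below is single-threaded and only mimics the visibility rule of a hardware transaction (writes stay hidden until commit, then appear together); the class `SpeculativeWrites` is hypothetical and is not how real HTM, or the authors' implementation, is built.

```cpp
#include <cassert>
#include <map>

// Toy model of a short hardware transaction: writes are held in a private
// buffer (HIDDEN) and reach memory only at commit, all at once. Real HTM
// keeps speculative state in the cache; this only imitates what others see.
class SpeculativeWrites {
public:
    // Record a write without touching memory.
    void write(int* address, int value) { buffer_[address] = value; }

    // Reads inside the transaction observe its own pending writes.
    int read(int* address) const {
        auto it = buffer_.find(address);
        return it != buffer_.end() ? it->second : *address;
    }

    // Commit: publish every buffered write. In this single-threaded model
    // nothing can run in between, which stands in for hardware atomicity.
    void commit() {
        for (const auto& entry : buffer_) *entry.first = entry.second;
        buffer_.clear();
    }

    // Abort: discard the buffer, so memory is left exactly as it was.
    void abort() { buffer_.clear(); }

private:
    std::map<int*, int> buffer_;
};
```

With `x` and `y` initially OLD, writing both through the buffer leaves them OLD for any outside reader until `commit()`, after which both are NEW. An outside reader never sees X = NEW with Y = OLD, the mix that made `Func(X, Y)` unsafe in the Invyswell scenario.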

RH NOrec
Key Point 2: Execute a maximal amount of initial software reads in a read-only hardware transaction
– Allows deferring the global clock read, which significantly reduces software restarts/revalidations

Fast-Path: Hardware
HTM start
…reads/writes…
Update clock
HTM commit
Mixed Path
Read clock
… reads in software … (verifies clock)
Read some X: check clock => changed => restart/revalidate
RESTART

Fast-Path: Hardware
HTM start
…reads/writes…
Update clock
HTM commit
Mixed Path
HTM Prefix:
HTM start
…reads in HTM… (pure/direct)
Read clock
HTM commit
NO ABORT

Fast-Path: Hardware (shown twice, once against each part of the mixed path)
HTM start
…reads/writes…
Update clock
HTM commit
Mixed Path
HTM Prefix:
HTM start
…reads in HTM… (pure/direct)
Read clock
HTM commit
…reads in software…
Lock clock
HTM Postfix:
HTM start
…writes in HTM…
HTM commit
Unlock clock
NO ABORT

Throughput on 8-core Intel (GCC C/C++)

Conclusion
RH NOrec: a new Hybrid TM that is safe and scalable
Key Idea: Use a “mixed” fallback path that uses two short hardware transactions:
1. HTM Prefix: Executes a maximal amount of initial reads – defers the global clock read
2. HTM Postfix: Executes the software writes – preserves safety and allows hardware-software concurrency

Thank You