Toward High Performance Nonblocking Software Transactional Memory
Virendra J. Marathe (University of Rochester) and Mark Moir (Sun Microsystems Labs)

2 Nonblocking Progress & Transactional Memory
- Nonblocking progress: arbitrary delays in some threads do not prevent others from making forward progress
- TM research began as support for nonblocking concurrent algorithms [Herlihy & Moss, ISCA '93]
- Early software TMs (STMs) were nonblocking, but slow
- Recent shift toward blocking STMs
  - Significant performance improvements
  - General argument: nonblocking STMs are fundamentally slow
- We were not convinced

3 Agenda
- Why is nonblocking progress important?
- Background on STM implementations
- What makes nonblocking STMs slow?
- Making nonblocking STMs fast
- Experimental results
- Conclusions

4 The Virtues of Nonblocking Progress
- Tolerance of arbitrary delays caused by preemption, page faults, and thread faults
  - External scheduler support mitigates some of these problems, but is not portable
  - Ideally, the problem should be contained within the STM
- Some environments cannot tolerate blocking, e.g. transactions in TxLinux interrupt handlers

6 STM Implementations
- Transactions execute speculatively
- Reads and writes go through STM metadata
  - Speculative writes typically acquire ownership of locations using atomic operations (e.g. CAS)
  - Reads are typically logged in a private read set and validated at commit time
- Post-commit/abort cleanup
  - Make speculative updates non-speculative, or roll them back
  - Release ownership of locations
  - In blocking STMs, conflicting transactions must wait for this cleanup to finish
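
A minimal C11 sketch of eager ownership acquisition, assuming a hypothetical orec encoding in which 0 means free and any other value is the owning transaction's ID (orec_try_acquire and orec_release are illustrative names, not the paper's API). It shows why cleanup matters: until orec_release runs, every other writer's CAS on the same orec fails, and a blocking STM simply waits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical orec encoding for this sketch: 0 = free,
   any other value = ID of the owning transaction. */
typedef _Atomic uintptr_t orec_t;

/* Eager acquire: claim the orec for transaction `me` with a single CAS.
   Fails if another transaction already owns it (a conflict). */
static bool orec_try_acquire(orec_t *orec, uintptr_t me) {
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(orec, &expected, me);
}

/* Cleanup after commit/abort: make the orec free again. In a blocking STM,
   conflicting transactions wait until this store happens. */
static void orec_release(orec_t *orec) {
    atomic_store(orec, 0);
}
```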

7 STM Implementations
- Two approaches to speculative writes:
  - Redo log: writes go to a private buffer that is flushed to memory on commit; ownership can be acquired at the first write (eager acquire) or at commit time (lazy acquire)
  - Undo log: writes go directly to memory (which requires eager acquire); old values are logged in a private buffer and restored if the transaction aborts
- Read set validation ensures isolation
  - Several schemes exist (e.g. incremental, commit counter, timestamp)
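
The two write strategies can be contrasted in a small C sketch over plain int locations. The txlog type and the redo_*/undo_* functions are hypothetical names for illustration; ownership acquisition and validation are omitted, so this only shows where new and old values live.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOG 16

/* A log entry: a location plus a value (the new value in a redo log,
   the old value in an undo log). */
typedef struct { int *addr; int val; } log_entry;
typedef struct { log_entry e[MAX_LOG]; size_t n; } txlog;

/* Redo log: buffer the new value privately; memory is untouched. */
static void redo_write(txlog *tl, int *addr, int val) {
    for (size_t i = 0; i < tl->n; i++)
        if (tl->e[i].addr == addr) { tl->e[i].val = val; return; }
    assert(tl->n < MAX_LOG);
    tl->e[tl->n++] = (log_entry){addr, val};
}

/* Redo commit: flush buffered values to memory. (Abort just discards them.) */
static void redo_commit(txlog *tl) {
    for (size_t i = 0; i < tl->n; i++)
        *tl->e[i].addr = tl->e[i].val;
    tl->n = 0;
}

/* Undo log: save the old value, then update memory in place. */
static void undo_write(txlog *tl, int *addr, int val) {
    assert(tl->n < MAX_LOG);
    tl->e[tl->n++] = (log_entry){addr, *addr};
    *addr = val;
}

/* Undo abort: restore old values, newest entry first. (Commit just discards the log.) */
static void undo_abort(txlog *tl) {
    while (tl->n > 0) {
        log_entry *le = &tl->e[--tl->n];
        *le->addr = le->val;
    }
}
```

Replaying the undo log in reverse order matters when a location is written twice: the oldest saved value is restored last, so memory ends up in its pre-transaction state.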

9 What makes nonblocking STMs slow?
- In blocking STMs, a transaction waits for a conflicting transaction that is in its post-commit/abort cleanup phase
- Nonblocking STMs avoid waiting with:
  - Indirection (object-based STMs)
  - Copying and cloning
  - Helping
  - Stealing (Harris & Fraser; also our approach)
- These techniques usually add overhead even in the (contention-free) common case

10 What makes blocking STMs fast?
- Significantly less overhead in the common case
  - Simple metadata structure
  - Streamlined fast path
- Performance optimizations
  - Timestamp-based validation
- A nonblocking STM must incorporate all of these features to be competitive

12 Our Contributions
- Keep the common case simple
  - Resort to the complicated case only when cleanup is delayed
- A more streamlined common-case execution path
- Incorporate recent optimizations (timestamp-based validation)

13 STM Data Structures
- Word-based STM
  - Conflict detection at the granularity of contiguous blocks of memory
  - Appropriate for unmanaged languages (C, C++)
- A table of ownership records (orecs)
  - Each heap location hashes to a single orec
  - Each orec indicates whether it is currently owned or free, and identifies the owner
- Transaction descriptor
  - Read set
  - Write set (redo log): a 2D list in which each row corresponds to an acquired orec
  - Status: Active/Aborted/Committed
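
One way to lay out these structures in C, as a sketch only: the table size, the 8-byte block granularity (BLOCK_SHIFT), the fixed array bounds and all type names are assumptions, not values from the paper. Shifting the address before masking is what makes neighbouring bytes of the same block share an orec.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ORECS   1024  /* assumed table size (power of two) */
#define BLOCK_SHIFT 3     /* assumed block size: 8 bytes share one orec */
#define MAX_READS   64
#define MAX_ROWS    32
#define ROW_ENTRIES 8

typedef enum { TX_ACTIVE, TX_ABORTED, TX_COMMITTED } tx_status;

/* Ownership record: "free" or the identity of its current owner. */
typedef struct { _Atomic uintptr_t word; } orec_t;

static orec_t orec_table[NUM_ORECS];

/* Every heap location hashes to exactly one orec. */
static size_t orec_index(const void *addr) {
    return ((uintptr_t)addr >> BLOCK_SHIFT) & (NUM_ORECS - 1);
}
static orec_t *orec_for(const void *addr) { return &orec_table[orec_index(addr)]; }

/* One write-set row per acquired orec, holding the buffered writes
   to all locations that hash to that orec. */
typedef struct { void *addr; uintptr_t val; } ws_entry;
typedef struct { orec_t *orec; ws_entry entries[ROW_ENTRIES]; size_t n; } ws_row;

typedef struct {
    _Atomic int status;            /* a tx_status value */
    orec_t *read_set[MAX_READS];   /* orecs read, validated later */
    size_t nreads;
    ws_row write_set[MAX_ROWS];    /* redo log as a 2D list */
    size_t nrows;
} tx_desc;
```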

14 Common Case Execution
- In the absence of contention, the algorithm behaves like a blocking STM:
  - Log the transaction's reads and writes
  - Acquire ownership of write set locations via their orecs
  - Ensure that reads are still consistent (read set validation)
  - Make updates permanent after commit, or roll them back after abort
  - Release orecs
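
A single-threaded C sketch of the common-case commit sequence for a redo log STM, assuming orecs were already acquired at write time (eager acquire) and a simplified orec holding a version number and an owner pointer. The types, tx_commit and release_all are hypothetical; a real implementation must also deal with concurrent acquirers and stealing, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { TX_ACTIVE, TX_ABORTED, TX_COMMITTED };

/* Simplified orec for this sketch: a version number plus an owner pointer. */
typedef struct { _Atomic unsigned version; _Atomic(void *) owner; } orec_t;
typedef struct { orec_t *orec; unsigned seen_version; } read_entry;
typedef struct { int *addr; int val; orec_t *orec; } write_entry;

typedef struct {
    _Atomic int status;
    read_entry reads[16];   size_t nreads;
    write_entry writes[16]; size_t nwrites;  /* orecs acquired at write time */
} tx_t;

/* Release every orec this transaction still owns; bump the version only
   if memory was actually changed (i.e. after a commit). */
static void release_all(tx_t *tx, bool bump) {
    for (size_t i = 0; i < tx->nwrites; i++) {
        orec_t *o = tx->writes[i].orec;
        if (atomic_load(&o->owner) == (void *)tx) {  /* writes may share an orec */
            if (bump) atomic_fetch_add(&o->version, 1);
            atomic_store(&o->owner, NULL);
        }
    }
}

static bool tx_commit(tx_t *tx) {
    int expected = TX_ACTIVE;
    /* 1. Read set validation: each orec read is unchanged and not owned by others. */
    for (size_t i = 0; i < tx->nreads; i++) {
        orec_t *o = tx->reads[i].orec;
        void *own = atomic_load(&o->owner);
        if (atomic_load(&o->version) != tx->reads[i].seen_version ||
            (own != NULL && own != (void *)tx))
            goto fail;
    }
    /* 2. Commit point: ACTIVE -> COMMITTED in one CAS. */
    if (!atomic_compare_exchange_strong(&tx->status, &expected, TX_COMMITTED))
        goto fail;
    /* 3. Copy back the redo log. */
    for (size_t i = 0; i < tx->nwrites; i++)
        *tx->writes[i].addr = tx->writes[i].val;
    /* 4. Release orecs. */
    release_all(tx, true);
    return true;
fail:
    atomic_store(&tx->status, TX_ABORTED);
    release_all(tx, false);  /* redo log: nothing to undo in memory */
    return false;
}
```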

15 Uncommon Case: Stealing
- Two flags in the orec support stealing:
  - stolen_orec: whether the orec is in the stolen or unstolen state
  - copier_exists: whether some owner of the orec is still in its cleanup phase
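
A sketch of how the two flags might share one orec word with the owner's identity. The bit positions, the shift by 2 and the helper names are assumptions for illustration; the slide only says that both flags live in the orec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STOLEN_OREC   ((uintptr_t)1)  /* orec is in the stolen state */
#define COPIER_EXISTS ((uintptr_t)2)  /* an owner is still in its cleanup phase */

static inline uintptr_t make_orec(uintptr_t owner_id, bool stolen, bool copier) {
    return (owner_id << 2) | (stolen ? STOLEN_OREC : 0) | (copier ? COPIER_EXISTS : 0);
}
static inline uintptr_t orec_owner(uintptr_t w)    { return w >> 2; }
static inline bool orec_stolen(uintptr_t w)        { return (w & STOLEN_OREC) != 0; }
static inline bool orec_copier_exists(uintptr_t w) { return (w & COPIER_EXISTS) != 0; }
```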

16 Stealing Example
[Figure: shared heap locations hash to ownership records (orecs), each holding a version number/owner ID and the flags S (stolen) and C (copier_exists). T1, the committed owner, is copying back; T2 (stealer 1) and then T3 (stealer 2) steal the orec covering locX, each merging locX's logical value into its write set (locX:11, and locX:12 once T2 has committed). Panels: Copyback in progress, Redo Copyback, Clear C, Copyback complete.]

17 Stealing Complexity
- The stealing mechanism is quite complex
  - Several corner-case race conditions must be handled (see the paper for details)
  - Accessing stolen locations is expensive, requiring a lookup in the last stealer's write set
- However, stealing can be throttled so that it remains an uncommon case

18 Streamlining the Common Case
- To release acquired orecs, prior nonblocking STMs required:
  - Expensive synchronization instructions (e.g. CAS)
  - Indirection and garbage collection
- Blocking STMs release orecs with a simple store instruction
  - So do we (details in the paper)
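
A C11 sketch contrasting the two release styles, assuming a hypothetical encoding in which an odd word means owned (owner ID in the upper bits) and an even word means free (version number in the upper bits). The store-based variant is only correct if no other thread can change an owned orec, a property that stealing breaks; how the paper keeps the store-based path safe is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: odd = owned (owner ID << 1 | 1), even = free (version << 1). */
typedef _Atomic uintptr_t orec_t;
static inline uintptr_t owned_word(uintptr_t id) { return (id << 1) | 1; }
static inline uintptr_t free_word(uintptr_t ver) { return ver << 1; }

/* CAS-based release: fails if the orec no longer holds our ownership word
   (e.g. because another thread changed it while we owned it). */
static bool release_with_cas(orec_t *o, uintptr_t me, uintptr_t new_ver) {
    uintptr_t expected = owned_word(me);
    return atomic_compare_exchange_strong(o, &expected, free_word(new_ver));
}

/* Store-based release: one release-ordered store, so the copied-back values
   become visible before the orec appears free. */
static void release_with_store(orec_t *o, uintptr_t new_ver) {
    atomic_store_explicit(o, free_word(new_ver), memory_order_release);
}
```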

19 Timestamps and Validation
- A significant optimization of read set validation (e.g. TL2)
  - Record the time at which each orec was last modified (done when the owner releases the orec)
  - A reader checks whether the orec was modified after the reader began executing, and if so, conservatively aborts
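
A sketch of TL2-style timestamp validation, assuming a global counter global_clock and an orec that records the time of its last release plus a locked bit. All names are hypothetical, and the sketch omits the re-check of the orec after reading the value that a complete TL2-style read performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t global_clock = 0;

typedef struct {
    _Atomic uint64_t ts;   /* global time of the last release */
    _Atomic bool locked;   /* currently owned by a writer */
} orec_t;

typedef struct { uint64_t start; } tx_t;

/* A transaction remembers the global time at which it began. */
static void tx_begin(tx_t *tx) { tx->start = atomic_load(&global_clock); }

/* Consistent only if the orec is free and was last released no later than
   our start; otherwise the caller aborts conservatively. */
static bool read_is_consistent(const tx_t *tx, orec_t *o) {
    return !atomic_load(&o->locked) && atomic_load(&o->ts) <= tx->start;
}

/* Writer's release: advance the clock and stamp the orec with the new time. */
static void writer_release(orec_t *o) {
    uint64_t now = atomic_fetch_add(&global_clock, 1) + 1;
    atomic_store(&o->ts, now);
    atomic_store(&o->locked, false);
}
```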

20 Adding Timestamps
- Recall: an orec contains a pointer to its owner
- Superimpose a timestamp on this pointer
  - A writer releases an orec by storing the current global time into it
- Timestamps lowered the cost of read set validation significantly
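
A sketch of superimposing a timestamp on the owner pointer in one orec word, assuming transaction descriptors are at least 2-byte aligned so the low bit of a descriptor pointer is always 0. Here a word with the low bit set is a timestamp and a word with it clear is an owner pointer; this tagging scheme and all names are assumptions, not necessarily the paper's exact encoding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct tx { int status; } tx_t;  /* int-aligned, so the pointer's low bit is 0 */
typedef _Atomic uintptr_t orec_t;

static _Atomic uint64_t global_time = 0;

static inline bool orec_is_owned(uintptr_t w)       { return (w & 1) == 0; }
static inline tx_t *orec_owner(uintptr_t w)         { return (tx_t *)w; }
static inline uintptr_t orec_timestamp(uintptr_t w) { return w >> 1; }
static inline uintptr_t make_timestamp(uintptr_t t) { return (t << 1) | 1; }

/* Acquire: replace a timestamp word with our descriptor pointer. */
static bool acquire(orec_t *o, tx_t *me) {
    uintptr_t w = atomic_load(o);
    if (orec_is_owned(w)) return false;
    return atomic_compare_exchange_strong(o, &w, (uintptr_t)me);
}

/* Release: store the current global time back into the orec. */
static void release(orec_t *o) {
    uint64_t now = atomic_fetch_add(&global_time, 1) + 1;
    atomic_store_explicit(o, make_timestamp((uintptr_t)now), memory_order_release);
}
```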

21 Undo Log Variant
- We developed the first nonblocking undo log STM through simple modifications to the redo log variant
- In the redo log STM, orecs are stolen when a committed owner is delayed
- In undo log STMs, stealing largely happens when an aborted owner is delayed
  - The logical values of the locations are then in the aborted owner's undo log
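
The difference between the variants is where a delayed owner keeps the logical value. A C sketch, with hypothetical types and names, of how a stealer might locate it: for a committed redo log owner the value is in the owner's write set, for an undo log owner that has not committed it is the oldest value saved in the undo log, and otherwise memory already holds it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TX_ACTIVE, TX_ABORTED, TX_COMMITTED } tx_status;
typedef enum { REDO_LOG, UNDO_LOG } log_kind;
typedef struct { int *addr; int val; } log_entry;
typedef struct {
    tx_status status;
    log_kind kind;
    log_entry log[8];
    size_t n;
} tx_t;

/* Logical value of *addr while `owner` (possibly delayed) still holds its orec. */
static int logical_value(const tx_t *owner, int *addr) {
    if (owner->kind == REDO_LOG && owner->status == TX_COMMITTED) {
        for (size_t i = owner->n; i-- > 0; )          /* newest buffered write */
            if (owner->log[i].addr == addr) return owner->log[i].val;
    } else if (owner->kind == UNDO_LOG && owner->status != TX_COMMITTED) {
        for (size_t i = 0; i < owner->n; i++)         /* oldest saved value */
            if (owner->log[i].addr == addr) return owner->log[i].val;
    }
    return *addr;                                     /* memory is up to date */
}
```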

23 Experimental Platform
- All STMs implemented in C
- Throughput tests on microbenchmarks
  - Scalable workloads: hash table, binary search tree
  - Torture tests (no scaling): counter, array of counters
- Tests conducted on a 16-processor Sun Fire machine
- STMs compared:
  - TL2, and TL2 with schedctl calls to avoid preemption pathologies
  - Harris and Fraser's word-based nonblocking STM
  - Our Base blocking and nonblocking variants (without store-based release and the optimizations)
  - 3 variants of our Optimized STM (eager redo log, lazy redo log, undo log)

24 Binary Search Tree
[Chart; series: Our Optimized STMs, TL2, HF-STM, Base NB]

25 Hash Table
[Chart; series: TL2-Sched, TL2, Our Optimized STMs]

26 Array of Counters
[Chart; series: TL2-Sched, TL2, Redo Log, Undo Log]

27 Array of Counters: Stealing Rate
[Chart; series: Redo Log, Undo Log]

28 Conclusion
- We presented several variants of a new STM that:
  - Effectively decouples the common case from the nonblocking infrastructure
  - Enables a more streamlined fast path (comparable to state-of-the-art blocking STMs)
  - Enables integration of key optimizations such as timestamp-based transaction validation
- We have shown that the common-case performance of nonblocking STMs can be made competitive with state-of-the-art blocking STMs

29 Thank You! Questions?

30 Common Case Example
[Figure: shared heap locations hash to orecs (version number/owner ID and the flags S and C). T1 acquires the orec covering locX while ACTIVE; after T1 commits, it copies locX's logical value (locX:11) back from its write set and releases the orec with a plain store. Panels: Copyback in progress, Copyback complete.]

31 Basic Idea
- A transaction steals ownership of the location under conflict
  - Inspired by Harris & Fraser's WSTM
- Stealing
  - Requires complex metadata management
  - Leads to high-latency reads and writes
- Switch the stolen location back to the unstolen state as quickly as possible

32 Phase-I STM: Switching an orec Back to the Unstolen State
- If an orec is stolen, the logical values of the locations that map to it may be in the last stealer's write set (pointed to by the orec)
- The stealer reuses such a write set row (for a new transaction) only after the row has been reclaimed
- A subsequent stealer that finds a stolen orec with copier_exists == false switches the orec to the unstolen state
- Stealing and releasing is a complex process
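
A C sketch of the switch-back rule, assuming the hypothetical flag layout owner ID << 2 | flags, and assuming that copier_exists == false means no copy-back is still in flight, so memory already holds the logical values. try_unsteal is an illustrative name, and the many races the paper handles are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define STOLEN_OREC   ((uintptr_t)1)
#define COPIER_EXISTS ((uintptr_t)2)

typedef _Atomic uintptr_t orec_t;  /* owner ID << 2 | flags (hypothetical layout) */

/* A new stealer `me` takes the orec in the ordinary, unstolen state,
   but only when it is stolen and no copier remains. */
static bool try_unsteal(orec_t *o, uintptr_t me) {
    uintptr_t w = atomic_load(o);
    if (!(w & STOLEN_OREC) || (w & COPIER_EXISTS))
        return false;  /* not stolen, or a copy-back is still in progress */
    return atomic_compare_exchange_strong(o, &w, me << 2);
}
```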

33 Phase-I STM: Illustration
[Figure: an orec passing from T1 (first owner, COMMITTED) to T2 (second owner, stealer 1, ACTIVE) and then T3 (third owner, stealer 2, ACTIVE), showing the orec's S and C flag values as C is cleared.]

34 STM API
- stm_begin(my_txn): initializes a transaction
- stm_read(my_txn, loc): speculatively reads location loc
- stm_write(my_txn, loc, val): speculatively writes val to loc
- stm_commit(my_txn): attempts to commit the transaction
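
A toy, single-threaded C implementation of the four calls, included only to show how the API is used and what redo log semantics look like from the caller's side. The slide gives names but not types, so txn_t, the long locations and the bool result of stm_commit are assumptions; there is no conflict detection, so stm_commit always succeeds here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITES 32

typedef struct { long *loc; long val; } wentry;
typedef struct { wentry w[MAX_WRITES]; size_t n; } txn_t;

void stm_begin(txn_t *t) { t->n = 0; }

/* Read-your-own-writes: check the redo log before memory. */
long stm_read(txn_t *t, long *loc) {
    for (size_t i = 0; i < t->n; i++)
        if (t->w[i].loc == loc) return t->w[i].val;
    return *loc;
}

/* Buffer the write; memory is not touched until commit. */
void stm_write(txn_t *t, long *loc, long val) {
    for (size_t i = 0; i < t->n; i++)
        if (t->w[i].loc == loc) { t->w[i].val = val; return; }
    assert(t->n < MAX_WRITES);
    t->w[t->n++] = (wentry){loc, val};
}

/* Toy commit: no validation, just flush the redo log. */
bool stm_commit(txn_t *t) {
    for (size_t i = 0; i < t->n; i++)
        *t->w[i].loc = t->w[i].val;
    t->n = 0;
    return true;
}

/* Usage: move `amount` between two locations in one transaction. */
static void transfer(long *from, long *to, long amount) {
    txn_t t;
    do {
        stm_begin(&t);
        stm_write(&t, from, stm_read(&t, from) - amount);
        stm_write(&t, to, stm_read(&t, to) + amount);
    } while (!stm_commit(&t));
}
```

In a real STM stm_commit can fail, which is why transfer wraps the transaction in a retry loop.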

35 Phase-I STM: Example
[Figure: T1 (first owner, COMMITTED) is copying back while T2 (second owner, stealer 1) and T3 (third owner, stealer 2), both ACTIVE, steal the orec covering locX, each holding locX's logical value (locX:11) in its write set. Panels: Copyback in progress, Redo Copyback, Clear C, Copyback complete.]

36 Phase-I STM: Stealing Mechanism
- A transaction steals an orec when it encounters an orec acquired by a committed transaction
  - The committed transaction is copying back its speculative updates
- Stealing is done in two steps:
  - Merge the victim's speculative updates to the orec's locations into the stealer's write set
  - Acquire the orec with an atomic operation, setting special flags that indicate to the system that the orec is stolen
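
A C sketch of the two stealing steps, assuming the hypothetical orec layout owner ID << 2 | flags. Setting both stolen_orec and copier_exists when stealing from a victim that is still copying back is an assumption of this sketch; the paper's exact flag protocol and its corner-case races are not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STOLEN_OREC   ((uintptr_t)1)
#define COPIER_EXISTS ((uintptr_t)2)
#define MAX_ENTRIES   16

typedef _Atomic uintptr_t orec_t;  /* owner ID << 2 | flags (hypothetical layout) */
typedef struct { int *addr; int val; orec_t *orec; } ws_entry;
typedef struct {
    uintptr_t id;
    bool committed;
    ws_entry ws[MAX_ENTRIES];
    size_t n;
} tx_t;

static uintptr_t owned_by(const tx_t *t, uintptr_t flags) { return (t->id << 2) | flags; }

/* Step 1: copy the victim's buffered values for locations under orec `o`
   into the stealer's write set (a value the stealer already buffered wins). */
static void merge_entries(tx_t *stealer, const tx_t *victim, orec_t *o) {
    for (size_t i = 0; i < victim->n; i++) {
        const ws_entry *v = &victim->ws[i];
        if (v->orec != o) continue;
        bool mine = false;
        for (size_t j = 0; j < stealer->n; j++)
            if (stealer->ws[j].addr == v->addr) { mine = true; break; }
        if (!mine) {
            assert(stealer->n < MAX_ENTRIES);
            stealer->ws[stealer->n++] = *v;
        }
    }
}

/* Step 2: atomically move ownership to the stealer and mark the orec stolen. */
static bool steal(tx_t *stealer, const tx_t *victim, orec_t *o) {
    uintptr_t expected = atomic_load(o);
    if (!victim->committed || (expected >> 2) != victim->id)
        return false;  /* only steal from a committed owner */
    size_t saved = stealer->n;
    merge_entries(stealer, victim, o);
    if (atomic_compare_exchange_strong(o, &expected,
                                       owned_by(stealer, STOLEN_OREC | COPIER_EXISTS)))
        return true;
    stealer->n = saved;  /* lost the race: undo the merge */
    return false;
}
```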

37 Phase-I STM: Stolen orec State
- The logical values of stolen locations are always in the stealer's write set
- Subsequent accesses to these locations must look up the stealer's write set
  - Quite expensive
- Flags indicate when it is safe for a new stealer to switch the orec back to the unstolen state
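
A C sketch of a read that honours a stolen orec, assuming the orec of a stolen location stores a pointer to the last stealer's descriptor with the stolen flag in its low bit (descriptors are assumed pointer-aligned, so the low two bits are free). If the stealer's write set has no entry for the address, memory already holds the logical value. Names and layout are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define STOLEN_OREC ((uintptr_t)1)
#define FLAG_MASK   ((uintptr_t)3)

typedef _Atomic uintptr_t orec_t;  /* stealer descriptor pointer | flags (hypothetical) */
typedef struct { int *addr; int val; } ws_entry;
typedef struct { ws_entry ws[16]; size_t n; } tx_t;

/* Stolen orec: the extra lookup in the stealer's write set is the cost the slide describes. */
static int read_location(orec_t *o, int *addr) {
    uintptr_t w = atomic_load(o);
    if (w & STOLEN_OREC) {
        const tx_t *stealer = (const tx_t *)(w & ~FLAG_MASK);
        for (size_t i = stealer->n; i-- > 0; )
            if (stealer->ws[i].addr == addr)
                return stealer->ws[i].val;
    }
    return *addr;
}
```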