Hybrid Transactional Memory

Hybrid Transactional Memory
Reza Sherafat, Prof. Cristiana Amza
University of Toronto, Dec 4, 2006

Quick Background Review
- A transaction is a sequence of operations that is performed atomically "as a whole".
- Life cycle of a transaction:
  - Initialization: start the transaction by saving the current state.
  - Execution: open objects for reading/writing; data modifications are hidden from other threads; watch for conflicts.
  - Termination: end the transaction, either by
    - successful completion (commit): the modifications take effect and become visible to other threads; or
    - unsuccessful completion (abort): the modifications are discarded.
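To make the life cycle concrete, here is a minimal single-threaded Python sketch. The `Transaction` class and its `begin`/`read`/`write`/`commit`/`abort` methods are hypothetical names, and conflict detection is deliberately left out; the sketch only shows that writes stay buffered (hidden) until commit and are simply dropped on abort.

```python
class Transaction:
    """Toy model of the transaction life cycle (hypothetical API, no conflict detection)."""

    def __init__(self, memory):
        self.memory = memory      # shared state: name -> value
        self.snapshot = None
        self.writes = None
        self.status = "IDLE"

    def begin(self):
        # Initialization: record the state the transaction starts from.
        self.snapshot = dict(self.memory)
        self.writes = {}
        self.status = "ACTIVE"

    def read(self, key):
        # Execution: a transaction sees its own pending writes first.
        if key in self.writes:
            return self.writes[key]
        return self.snapshot[key]

    def write(self, key, value):
        # Execution: modifications are buffered, so other threads cannot see them.
        self.writes[key] = value

    def commit(self):
        # Successful completion: publish all buffered writes at once.
        self.memory.update(self.writes)
        self.status = "COMMITTED"

    def abort(self):
        # Unsuccessful completion: drop the buffered writes.
        self.writes = {}
        self.status = "ABORTED"


memory = {"x": 1, "y": 2}
t1 = Transaction(memory)
t1.begin()
t1.write("x", 10)
hidden = memory["x"]      # 1: the write is not visible yet
t1.commit()               # memory["x"] is now 10

t2 = Transaction(memory)
t2.begin()
t2.write("y", 0)
t2.abort()                # memory["y"] stays 2
```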

Outline
- Motivations
- Hybrid Transactional Memory
- Implementation
- Evaluations
- Conclusions

Motivations
- In parallel programs, concurrent access to shared data must be protected.
- Locks are widely used, but several problems are associated with them:
  - Performance (speedup):
    - overhead of locking (waiting, acquiring, releasing);
    - granularity (hard to balance wait time against locking overhead);
    - over-serialization.
  - Programming:
    - lock-based code is hard to write and debug;
    - deadlocks are hard to avoid.
  - Other problems:
    - priority inversion;
    - if a process crashes while holding a lock, the processes waiting for that lock may be blocked indefinitely.

Transactional Memory (TM)
- Main idea: non-blocking execution.
  - Execute each concurrent transaction speculatively, and apply its changes only when the transaction completes successfully.
  - Concurrent, non-conflicting accesses to shared objects within transactions are allowed; once a conflict is detected, the transaction rolls back and its state is restored (abort).
- TM support is provided through an API to:
  - start a transaction;
  - abort or commit a transaction;
  - wrap objects in TM objects.
- Properties of transactions:
  - Atomicity: a transaction behaves as a single unit (all-or-nothing).
  - Serializability: concurrent transactions appear to execute in some serial order.
  - Obstruction-freedom: a process is guaranteed to make progress in the absence of contention; in particular, there is no deadlock.
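A toy sketch of the API above, assuming optimistic version checking at commit time as the conflict-detection mechanism (one of several possible choices, not necessarily the one used by the systems discussed here). `TMObject`, `Tx`, `Conflict` and `atomic` are hypothetical names; the model is single-threaded, and the "concurrent" commit is simulated inside the transaction body.

```python
class Conflict(Exception):
    """Raised when a transaction finds that data it used has changed."""


class TMObject:
    """Wraps a shared value so that transactions can track it (hypothetical)."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Tx:
    def __init__(self):
        self.read_versions = {}   # TMObject -> version seen at first access
        self.writes = {}          # TMObject -> speculative new value

    def read(self, obj):
        if obj in self.writes:
            return self.writes[obj]
        self.read_versions.setdefault(obj, obj.version)
        return obj.value

    def write(self, obj, value):
        self.read_versions.setdefault(obj, obj.version)
        self.writes[obj] = value

    def commit(self):
        # Validate: every object we touched must be unchanged since we saw it.
        for obj, seen in self.read_versions.items():
            if obj.version != seen:
                raise Conflict()
        for obj, value in self.writes.items():
            obj.value = value
            obj.version += 1


def atomic(body, max_attempts=10):
    """Run body(tx) speculatively; on conflict, discard its writes and retry."""
    for _ in range(max_attempts):
        tx = Tx()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue   # abort: the speculative writes in tx.writes are dropped
    raise RuntimeError("transaction did not commit")


counter = TMObject(0)
attempts = []

def increment(tx):
    attempts.append(1)
    v = tx.read(counter)
    if len(attempts) == 1:
        # Simulated commit by another thread during the first attempt.
        counter.value, counter.version = 100, counter.version + 1
    tx.write(counter, v + 1)

atomic(increment)   # first attempt aborts, the retry commits 101
```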

Conflicting Access to Shared Data
- Conflicts in accessing shared data may result in data inconsistencies.
- A conflict happens when one transaction updates an object that other transactions have accessed (read or written) before they commit.
  - Multiple readers are allowed.
  - Only one writer is allowed at a time.
- The system ensures that transactions that access shared data do not conflict; if no conflicts occur, the transactions are serializable.
- Conflict resolution: once a conflict is detected, a serializable execution can be obtained by aborting all but one of the conflicting transactions.
  - Speculative modifications of aborted transactions are discarded.
  - The values from before the aborted transactions started remain valid.
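The reader/writer rules and the "abort all but one" resolution can be sketched as set operations on each transaction's read and write sets. The greedy policy below (keep transactions in list order, abort any that conflicts with one already kept) is an assumed example policy, and `conflicts`/`resolve` are hypothetical helpers.

```python
def conflicts(a, b):
    """a and b are (read_set, write_set) pairs of object names.
    Readers never conflict with readers; a writer conflicts with any
    other reader or writer of the same object."""
    ra, wa = a
    rb, wb = b
    return bool(wa & (rb | wb)) or bool(wb & ra)


def resolve(active):
    """Keep transactions in order, aborting any that conflicts with one
    already kept, so each group of conflicting transactions keeps one."""
    kept, aborted = [], []
    for name, sets in active:
        if any(conflicts(sets, other) for _, other in kept):
            aborted.append(name)
        else:
            kept.append((name, sets))
    return [n for n, _ in kept], aborted


active = [
    ("T1", ({"x"}, set())),    # reads x
    ("T2", ({"x"}, set())),    # reads x: compatible with T1
    ("T3", (set(), {"x"})),    # writes x: conflicts with both readers
    ("T4", ({"y"}, {"y"})),    # touches only y
]
kept, aborted = resolve(active)   # kept: T1, T2, T4; aborted: T3
```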

Hybrid TM
- Each approach must implement the TM semantics: start a transaction, open an object, detect conflicts, abort, commit.
- Hardware-based approaches:
  - only a bounded number of locations can be accessed within a transaction;
  - versions are maintained in the cache, so overhead is low.
- Software-based approaches:
  - an unbounded number of locations can be accessed within a transaction;
  - slow, due to the overhead of maintaining multiple copies (potentially orders of magnitude slower).
- Hybrid: combines the benefits of both approaches:
  - high performance (unless a transaction exceeds the hardware limits);
  - support for an unlimited number of transactional objects;
  - handles simultaneous access to the same data by transactions in HW and SW modes.
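The trade-off between bounded HW and unbounded SW transactions can be illustrated with a small simulation. Everything here is hypothetical: `HardwareTx` models the hardware's limited tracking capacity as a count of distinct locations, `SoftwareTx` is the same model without a limit (real SW TM would also be much slower), and `run_hybrid` falls back to SW mode only on a capacity overflow.

```python
class CapacityOverflow(Exception):
    """The transaction touched more locations than the hardware can track."""


class HardwareTx:
    """Bounded HW mode: speculative state must fit in a fixed-size buffer."""
    def __init__(self, memory, capacity):
        self.memory, self.capacity = memory, capacity
        self.lines = set()    # locations currently tracked
        self.writes = {}      # location -> speculative value

    def _track(self, loc):
        if loc not in self.lines and len(self.lines) >= self.capacity:
            raise CapacityOverflow(loc)
        self.lines.add(loc)

    def read(self, loc):
        self._track(loc)
        return self.writes.get(loc, self.memory[loc])

    def write(self, loc, value):
        self._track(loc)
        self.writes[loc] = value

    def commit(self):
        self.memory.update(self.writes)


class SoftwareTx(HardwareTx):
    """Unbounded SW mode: same semantics, no capacity limit."""
    def __init__(self, memory):
        super().__init__(memory, capacity=float("inf"))


def run_hybrid(memory, body, hw_capacity):
    """Try the fast HW mode first; fall back to SW mode if HW resources run out."""
    try:
        tx = HardwareTx(memory, hw_capacity)
        body(tx)
        tx.commit()
        return "HW"
    except CapacityOverflow:
        tx = SoftwareTx(memory)   # the failed HW attempt left memory untouched
        body(tx)
        tx.commit()
        return "SW"


memory = {i: 0 for i in range(8)}

def touch(n):
    def body(tx):
        for i in range(n):
            tx.write(i, tx.read(i) + 1)
    return body

small = run_hybrid(memory, touch(2), hw_capacity=4)   # fits: "HW"
large = run_hybrid(memory, touch(8), hw_capacity=4)   # overflows: "SW"
```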

Implementations
- There are two modes for executing transactions: HW and SW.
  - In general, HW mode is preferred because it is faster, unless the hardware runs out of resources.
- Naïve approach: the whole system has a single, universal mode of operation.
- A better approach: each transaction separately chooses its mode of operation when it starts.
  - This gives better performance and better utilization of system resources.
  - Other policies may also be applied to choose the mode; e.g., if a transaction fails a number of times (e.g., 3), it is restarted in SW mode.
- The pure HW and SW implementations must be tailored so that they can coexist.
  - Objects may be accessed simultaneously by transactions in HW and SW modes, so interoperability is a must.
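The per-transaction mode choice with the retry policy above could look like the sketch below. `run_transaction`, `flaky_hw` and the default limit of 3 are illustrative assumptions (the slide gives 3 only as an example), and the SW attempt is assumed to always commit eventually.

```python
def run_transaction(try_hw, try_sw, limit=3):
    """Each transaction picks its own mode: HW first, SW after `limit` HW failures.
    try_hw returns True on commit; try_sw is assumed to always succeed."""
    for attempt in range(limit):
        if try_hw():
            return "HW", attempt + 1
    try_sw()
    return "SW", limit + 1


def flaky_hw(failures):
    """Hypothetical stand-in for a HW attempt that aborts `failures` times, then commits."""
    state = {"left": failures}
    def attempt():
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True
    return attempt


r1 = run_transaction(flaky_hw(1), lambda: True)   # ("HW", 2): commits on the 2nd HW try
r2 = run_transaction(flaky_hw(5), lambda: True)   # ("SW", 4): three HW failures, then SW
```

Because the failure count lives inside each call, one transaction switching to SW mode does not force the others out of HW mode, unlike the naïve system-wide mode.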

Hardware TM
- A HW-TM scheme that can be used for the hybrid implementation relies on the standard cache coherence protocol plus some additional components.
- The cache coherence protocol keeps data consistent across multiple processors:
  - only one processor can have permission to write a cache line;
  - no processor can read a line that another processor has permission to write.
- Additional components on each processor store speculative data and check for conflicts:
  - ISA extensions: instructions for transactional begin, commit, abort, load/store, etc.
  - Additional components on the processor chip (in parallel with the L1 cache):
    - transactional buffer (with an "old" field per entry);
    - transactional state table: the state of the contexts (threads) running on the processor.
- All memory accesses within a transaction are performed transactionally.
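The following toy model shows how conflict detection can piggyback on the coherence rules: a load must take write permission away from a remote writer, and a store must invalidate every remote copy, so those are exactly the moments when another processor's transactional read/write set can be hit. The class name `CoherentTM` and the "requester wins" policy (the remote transaction is the one that aborts) are assumptions for illustration.

```python
class CoherentTM:
    """Toy model: per-CPU transactional read/write sets checked on coherence events."""

    def __init__(self):
        self.rset = {}        # cpu -> lines read transactionally
        self.wset = {}        # cpu -> lines written transactionally
        self.aborted = set()

    def begin(self, cpu):
        self.rset[cpu], self.wset[cpu] = set(), set()
        self.aborted.discard(cpu)

    def abort(self, cpu):
        self.aborted.add(cpu)
        self.rset[cpu], self.wset[cpu] = set(), set()

    def load(self, cpu, line):
        # A read needs the remote writer to give up write permission,
        # so it conflicts with any remote transaction that wrote the line.
        for other in list(self.wset):
            if other != cpu and line in self.wset[other]:
                self.abort(other)
        self.rset[cpu].add(line)

    def store(self, cpu, line):
        # A write needs exclusive permission, invalidating every remote copy,
        # so it conflicts with remote readers and writers of the line.
        for other in list(self.rset):
            if other != cpu and (line in self.rset[other] or line in self.wset[other]):
                self.abort(other)
        self.wset[cpu].add(line)


tm = CoherentTM()
for cpu in (0, 1, 2):
    tm.begin(cpu)
tm.load(0, "A")
tm.load(1, "A")     # readers share the line: no conflict
tm.store(2, "A")    # invalidates both readers' copies: CPUs 0 and 1 abort
```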

HW-TM
- The "old" field keeps speculative values.
- Transactional semantics:
  - Start transaction: the transactional state of the context is set to SELECT or ALL.
  - Abort: the exception flag is set, the corresponding read/write bits are cleared, and speculatively written data is invalidated.
  - Commit: the transactional state is updated.
  - Conflict detection: via the read/write bit vectors.
- If the exception flag is set, any attempt by the transaction to commit or to load/store results in a trap, which is handled by the exception handler.
- Question: how is an abort implemented across multiple processors? Answer: through the cache coherence protocol (CCP).
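A sketch of one context's transactional state, following the semantics listed above: abort sets the exception flag, clears the read/write bits and drops speculative data, and any later load, store or commit traps. `Context`, `AbortTrap` and `signal_abort` are hypothetical names, and the SELECT/ALL mode is stored but not otherwise modeled (every access is treated as transactional).

```python
class AbortTrap(Exception):
    """Stands in for the trap delivered to the software exception handler."""


class Context:
    """One hardware thread context (toy model, hypothetical names)."""

    def __init__(self, memory):
        self.memory = memory
        self.mode = None            # SELECT or ALL while in a transaction
        self.exception = False      # exception flag
        self.read_bits, self.write_bits = set(), set()
        self.spec = {}              # speculatively written data

    def begin(self, mode="ALL"):
        self.mode, self.exception = mode, False

    def signal_abort(self):
        # Abort: set the flag, clear the R/W bits, invalidate speculative data.
        self.exception = True
        self.read_bits.clear()
        self.write_bits.clear()
        self.spec.clear()

    def _check(self):
        if self.exception:
            raise AbortTrap()       # any later load/store/commit traps

    def load(self, line):
        self._check()
        self.read_bits.add(line)
        return self.spec.get(line, self.memory[line])

    def store(self, line, value):
        self._check()
        self.write_bits.add(line)
        self.spec[line] = value

    def commit(self):
        self._check()
        self.memory.update(self.spec)   # speculative data becomes visible
        self.read_bits.clear()
        self.write_bits.clear()
        self.spec.clear()
        self.mode = None


memory = {"A": 1}
ctx = Context(memory)
ctx.begin()
ctx.store("A", 2)
ctx.signal_abort()          # e.g. a conflict reported through the coherence protocol
outcome = "committed"
try:
    ctx.commit()
except AbortTrap:
    outcome = "trapped"     # memory["A"] is still 1
```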

Quick Review of DSTM
[Figure: object X before and after it is accessed within a transaction. The object pointer refers to a locator holding a state pointer and Old/New object contents; opening X creates a new locator, copies the currently valid object contents, and the transaction modifies the copy.]
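The DSTM indirection in the figure can be sketched as follows, based on the published DSTM design: a TM object points to a locator holding the owning transaction's state, an old version and a new version. The class names and the simplistic contention policy (abort the active owner) are assumptions, and the pointer swap that DSTM performs with a compare-and-swap is a plain assignment in this single-threaded model.

```python
import copy
from dataclasses import dataclass

ACTIVE, COMMITTED, ABORTED = "ACTIVE", "COMMITTED", "ABORTED"


@dataclass
class TxDescriptor:
    status: str = ACTIVE


@dataclass
class Locator:
    owner: TxDescriptor   # the "state pointer"
    old: object           # version before the owner's writes
    new: object           # the owner's private, speculative copy


class TMObject:
    def __init__(self, value):
        self.start = Locator(TxDescriptor(COMMITTED), None, value)

    def valid_version(self):
        loc = self.start
        # The new copy is valid only once its owner has committed.
        return loc.new if loc.owner.status == COMMITTED else loc.old

    def open_for_write(self, tx):
        loc = self.start
        if loc.owner is tx:
            return loc.new
        if loc.owner.status == ACTIVE:
            loc.owner.status = ABORTED        # assumed policy: abort the other owner
        fresh = Locator(tx, self.valid_version(), None)
        fresh.new = copy.deepcopy(fresh.old)  # copy the valid version, then modify it
        self.start = fresh                    # DSTM swaps this pointer with a CAS
        return fresh.new


obj = TMObject([1, 2])
t1 = TxDescriptor()
data = obj.open_for_write(t1)
data.append(3)                    # modify the private copy
before = obj.valid_version()      # [1, 2]: t1 has not committed
t1.status = COMMITTED             # commit is a single status change
after = obj.valid_version()       # [1, 2, 3]

t2, t3 = TxDescriptor(), TxDescriptor()
obj.open_for_write(t2)
obj.open_for_write(t3)            # t2 is still ACTIVE, so this policy aborts it
```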

A locator object in Hybrid-TM
- The software TM uses a locator similar to DSTM's: redirection and object copying.
- The locator also keeps track of the readers, instead of each reading transaction storing the values it read in a local hash table.
  - This enables early aborts and avoids validation at commit time.
- A locator consists of:
  - a valid field;
  - the write state (one);
  - the read states (multiple);
  - the old/new objects;
  - the object size.
[Figure: a locator object in Hybrid-TM]
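The benefit of visible readers can be sketched as follows: because the locator lists its readers, a writer can abort them immediately (early abort), and a reader that is still ACTIVE at commit time needs no validation. The field layout is reduced to writer, readers and old/new values (the valid field and object size are omitted), and the "reader aborts an active writer" rule is an assumed contention policy; all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TxState:
    status: str = "ACTIVE"          # ACTIVE, COMMITTED or ABORTED


@dataclass
class Locator:
    writer: Optional[TxState] = None                      # write state (at most one)
    readers: List[TxState] = field(default_factory=list)  # read states (visible readers)
    old: Any = None
    new: Any = None


def current(loc):
    # The new value is current unless there is a writer that has not committed.
    w = loc.writer
    return loc.new if w is None or w.status == "COMMITTED" else loc.old


def open_read(loc, tx):
    w = loc.writer
    if w is not None and w is not tx and w.status == "ACTIVE":
        w.status = "ABORTED"        # assumed policy: the reader wins
    loc.readers.append(tx)          # make this reader visible to later writers
    return current(loc)


def open_write(loc, tx, update):
    # Assumes each transaction opens a given object for writing at most once.
    for r in loc.readers:
        if r is not tx and r.status == "ACTIVE":
            r.status = "ABORTED"    # early abort: known readers are stopped now
    w = loc.writer
    if w is not None and w is not tx and w.status == "ACTIVE":
        w.status = "ABORTED"
    base = current(loc)
    loc.writer, loc.readers = tx, []
    loc.old, loc.new = base, update(base)


def commit(tx):
    # No read-set validation: any conflicting writer would already have
    # written ABORTED into our state. (A real system would use a CAS here.)
    if tx.status == "ACTIVE":
        tx.status = "COMMITTED"
    return tx.status == "COMMITTED"


loc = Locator(new=5)
reader, writer = TxState(), TxState()
seen = open_read(loc, reader)              # 5
open_write(loc, writer, lambda v: v + 1)   # aborts the visible reader immediately
reader_ok = commit(reader)                 # False
writer_ok = commit(writer)                 # True
value = current(loc)                       # 6
```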

Putting Things Together
- Transactions in HW mode may conflict with those in SW mode, and vice versa.
- Opening an object in HW mode: [read the TMObject pointer transactionally], then abort all conflicting HW/SW transactions.
- Opening an object in SW mode: create a state object and load it transactionally, then abort conflicting HW/SW transactions.
- Hardware aborts Hardware: a conflicting load/store (transactional by default) causes an abort.
- Software aborts Hardware: when SW opens a TMObject, it assigns the object a new locator; since the HW transaction read the TMObject pointer transactionally, this update aborts it.
- Hardware aborts Software: when HW opens a TMObject, it writes ABORTED to the state of every transaction holding this object.
- Software aborts Software: write ABORTED to the states reached through the reader/writer pointers.
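The cross-mode abort cases can be put into one small model. It assumes that a SW write open always installs a fresh locator with a copied object, that HW transactions record the TMObjects whose pointer they loaded, and that the coherence-triggered abort can be represented by scanning those read sets; all names (`HybridSystem`, `hw_open`, `sw_open_write`, `abort_sw_owners`) are hypothetical, and SW read opens are omitted.

```python
class HwTx:
    def __init__(self):
        self.status = "ACTIVE"
        self.read_set = set()     # TMObjects whose pointer this HW tx loaded


class SwTx:
    def __init__(self):
        self.status = "ACTIVE"


class TMObject:
    """Pointer to the current locator; a locator here is a small dict."""
    def __init__(self, data):
        self.locator = {"writer": None, "readers": [], "data": data}


def abort_sw_owners(loc, me=None):
    # Write ABORTED into the state of every active SW reader/writer in the locator.
    for sw in [loc["writer"], *loc["readers"]]:
        if sw is not None and sw is not me and sw.status == "ACTIVE":
            sw.status = "ABORTED"


class HybridSystem:
    def __init__(self):
        self.hw_txs = []          # HW transactions currently running

    def hw_open(self, tx, obj):
        tx.read_set.add(obj)                 # transactional load of the pointer
        abort_sw_owners(obj.locator)         # Hardware aborts Software
        return obj.locator["data"]           # HW then modifies the data in place

    def sw_open_write(self, tx, obj):
        old = obj.locator
        abort_sw_owners(old, me=tx)          # Software aborts Software
        obj.locator = {"writer": tx, "readers": [],
                       "data": list(old["data"])}   # copy (data assumed to be a list)
        # Storing a new locator pointer invalidates the line that HW transactions
        # loaded transactionally; the coherence protocol aborts them.
        for hw in self.hw_txs:               # Software aborts Hardware
            if obj in hw.read_set and hw.status == "ACTIVE":
                hw.status = "ABORTED"
        return obj.locator["data"]


system = HybridSystem()
obj = TMObject([0])
hw, sw1, sw2 = HwTx(), SwTx(), SwTx()
system.hw_txs.append(hw)

system.sw_open_write(sw1, obj)   # sw1 is recorded as the writer in a new locator
system.hw_open(hw, obj)          # Hardware aborts Software: sw1 -> ABORTED
system.sw_open_write(sw2, obj)   # Software aborts Hardware: hw -> ABORTED
```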

Software aborts Hardware
- The conflict is detected by the threads running in hardware mode.
[Figure: Threads 1 and 2 run in HW mode and modify the object contents in place through the object pointer; Thread 3 runs in SW mode, uses a locator (state pointer, Old, New) and copies the object contents before modifying them.]

Evaluations
- Three microbenchmarks:
  - VR: small critical section (overhead of starting/committing transactions);
  - HT: simultaneous lookup operations (per-object overhead of transactions);
  - GU: coarse-grained locking vs. transactional memory.
- Each case is run under two scenarios: low and high contention.
- Four synchronization implementations are compared:
  - locks;
  - pure hardware transactional memory;
  - pure software transactional memory;
  - hybrid transactional memory.

Evaluations (Hybrid Execution)
- In all cases of hybrid execution, the fraction of transactions run in SW mode (relative to HW mode) is very small.
- This is because the transactional buffer is relatively large compared to the size of the transactional objects (is this realistic?).
- Since most transactions use HW mode, these results do not give a good view of the overhead of the slower SW mode.

Evaluations (VR)
- As the number of processors grows, contention does not grow significantly.
- This is because the transactions are very small, so conflicts rarely happen.

Evaluations (HT)
- Several lookup operations can indeed be performed simultaneously; however, they are all rolled back together once a conflict with a writer occurs.
- This seems to be significant for transactions of slightly longer duration; here, the lock implementation performs better.
- The paper claims that reader-writer locks would achieve similar behavior; I expect them to perform much better, since concurrent operations, once underway, are never undone.
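For comparison with the HT results, a reader-writer lock lets any number of lookups proceed together and never rolls them back; a writer simply waits until they finish. This is a simple reader-preferring sketch built on Python's `threading.Condition`, not the lock used in the paper's experiments; `ReaderWriterLock`, `insert` and `lookup` are hypothetical names, and under a steady stream of readers, writers can starve.

```python
import threading


class ReaderWriterLock:
    """Many concurrent readers or one writer; admitted readers are never undone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


table = {}
lock = ReaderWriterLock()

def insert(k, v):
    lock.acquire_write()
    try:
        table[k] = v
    finally:
        lock.release_write()

def lookup(k):
    lock.acquire_read()
    try:
        return table.get(k)
    finally:
        lock.release_read()

threads = [threading.Thread(target=insert, args=(i, i * i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```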

Evaluations (GU)
- Why does the execution time of the lock implementation decrease from GU-low to GU-high? Usually it is the other way around!
- Do the locks use back-off?

Conclusions
- Transactional memory outperforms lock-based synchronization in most cases.
- The hybrid transactional memory approach gives a good balance between the scalability of SW and the performance of HW.
- It requires only modest hardware support (transactional buffer, state table).
  - Within system limits: good performance for most transactions.
  - Exceeding system limits: a transaction that cannot complete within the hardware bounds falls back to software mode.
- More needs to be done to ensure progress.

Questions?!

Hybrid has limitations:
- Nested transactions?
- Uses the transactional buffer.
Additional limits for the HW:
- Contexts.
I am not sure how the non-blocking mechanism is implemented across multiple processors.