Hybrid Transactional Memory


1 Hybrid Transactional Memory
Reza Sherafat
Prof. Cristiana Amza
University of Toronto
Dec 4, 2006

2 Quick Background Review
A transaction is a sequence of operations that is performed atomically as a whole.
Life cycle of a transaction:
  Initialization: start the transaction by storing the current state.
  Execution: open objects for read/write; data modifications are hidden from other threads; watch for conflicts.
  Termination: end the transaction in one of two ways:
    Successful completion (commit): the modifications take effect and other threads learn that changes were made.
    Unsuccessful completion (abort): the modifications are discarded.
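
The life cycle above can be sketched in a few lines of Python. This is a toy, single-threaded illustration under stated assumptions, not the mechanism of any system discussed in these slides: the class `ToyTransaction`, its method names, and the use of a plain dict as shared state are all hypothetical.

```python
class ToyTransaction:
    """Toy model of the transaction life cycle (hypothetical, single-threaded)."""

    def __init__(self, store):
        self.store = store        # shared state: a plain dict (assumption)
        self.writes = {}          # speculative writes, hidden from others
        self.status = "ACTIVE"    # initialization: the transaction has started

    def read(self, key):
        # A transaction sees its own speculative writes first.
        return self.writes.get(key, self.store.get(key))

    def write(self, key, value):
        self.writes[key] = value  # buffered, not yet visible in the store

    def commit(self):
        self.store.update(self.writes)  # modifications take effect
        self.status = "COMMITTED"

    def abort(self):
        self.writes.clear()       # modifications are discarded
        self.status = "ABORTED"


store = {"x": 1}

t1 = ToyTransaction(store)
t1.write("x", 2)
hidden = store["x"]       # the store is untouched until commit
t1.commit()
visible = store["x"]      # the committed value is now in the store

t2 = ToyTransaction(store)
t2.write("x", 99)
t2.abort()
after_abort = store["x"]  # aborted writes never reach the store
```

Buffering writes until commit is only one way to hide modifications; the hardware and software schemes on the later slides keep old and new copies instead.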

3 Outline
Motivations
Hybrid Transactional Memory
Implementation
Evaluations
Conclusions

4 Motivations
In parallel programs we must protect concurrent access to shared data. Locks are widely used, but several problems are associated with them:
  Performance (speedup):
    Overhead of locking (wait time, acquire, release)
    Granularity (hard to balance wait time against overhead)
    Over-serialization
  Programming:
    Hard for programmers to write and debug
    Deadlocks are hard to avoid
  Other problems:
    Priority inversion
    Problems arise when a process holding the lock crashes

5 Transactional Memory (TM)
Main idea: non-blocking execution
  Execute each concurrent transaction speculatively.
  Apply changes when the transaction completes successfully.
  Non-conflicting access to shared objects within transactions is allowed.
  Once a conflict is detected, the transaction rolls back and its state is restored (abort).
TM support is provided through an API:
  Start a transaction
  Abort/commit a transaction
  Wrap objects in TM objects
Properties of transactions:
  Atomic: a transaction is like a single unit (all-or-nothing)
  Serializable: concurrent transactions are performed in some serial order
  Obstruction-free: guarantees progress of one process in the absence of contention
  No deadlock

6 Conflicting Access to Shared Data
Conflicts in accessing shared data may result in data inconsistencies.
A conflict happens when an object that has been accessed (read or written) by other transactions is updated before those transactions commit:
  Multiple readers are allowed
  Only one writer is allowed at a time
The system ensures that transactions that access data do not conflict. If no conflicts occur, the transactions are serializable.
Conflict resolution: once a conflict is detected, a serializable execution is obtained by aborting all but one of the conflicting transactions. The speculative modifications of the aborted transactions are discarded, and the values from before those transactions started become valid again.
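
The reader/writer rule and the "abort all but one" resolution can be sketched as follows. The tuple layout, the function names `conflicts` and `resolve`, and the policy of keeping the earliest transaction are assumptions made for illustration; the slides do not say which transaction survives.

```python
def conflicts(a, b):
    """Each access is (txn_id, object_id, mode) with mode 'r' or 'w'.

    Two accesses conflict when different transactions touch the same
    object and at least one of them writes.
    """
    txn_a, obj_a, mode_a = a
    txn_b, obj_b, mode_b = b
    return txn_a != txn_b and obj_a == obj_b and "w" in (mode_a, mode_b)


def resolve(accesses):
    """Return the set of transactions to abort.

    Assumed policy: accesses are listed in arrival order and the earlier
    transaction of each conflicting pair survives.
    """
    aborted = set()
    for i, a in enumerate(accesses):
        if a[0] in aborted:
            continue
        for b in accesses[i + 1:]:
            if b[0] not in aborted and conflicts(a, b):
                aborted.add(b[0])
    return aborted


accesses = [
    ("T1", "X", "r"),
    ("T2", "X", "r"),   # a second reader of X is fine
    ("T3", "X", "w"),   # a writer of X conflicts with both readers
    ("T4", "Y", "w"),   # a different object: no conflict
]
to_abort = resolve(accesses)
```

With this input only the writer of X is chosen for abort, and the surviving transactions no longer conflict with each other.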

7 Hybrid TM
Each approach should implement TM semantics: start transaction, open object, detect conflicts, abort, commit.
Hardware-based approaches:
  Bounded number of locations
  Maintain versions in cache → low overhead
Software-based approaches:
  Unbounded number of locations can be accessed within a transaction
  Slow due to the overhead of maintaining multiple copies (potentially by orders of magnitude)
Hybrid:
  Combines the benefits of both approaches
  High performance (unless the transaction exceeds HW limits)
  Support for unlimited transactional objects
  Handles simultaneous data access from HW/SW modes

8 Implementations
Two modes for executing transactions: HW vs. SW. In general, HW mode is preferred (it is faster) unless we run out of resources.
Naïve approach: the system has a universal mode of operation.
A better approach: transactions have two modes to choose from.
  Each transaction separately chooses its mode of operation when it starts.
  Better performance and utilization of system resources.
  Other policies may also be applied to choose the mode: for example, if the transaction fails a number of times (e.g., 3), start it in SW mode.
Pure HW/SW implementations must be tailored so that they can coexist.
  Objects may be accessed simultaneously by transactions in HW and SW modes.
  Interoperability is a must.
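
A minimal sketch of the per-transaction mode policy described above: try HW mode first, and switch to SW mode after a few failed attempts or as soon as hardware resources run out. The function `run_transaction`, the two exception classes, and the `body(mode)` callable are hypothetical names introduced for this example; the threshold of 3 is the example value from the slide.

```python
MAX_HW_ATTEMPTS = 3  # example threshold from the slide ("e.g., 3")


class Conflict(Exception):
    """Hypothetical: the attempt was aborted by a conflicting transaction."""


class HardwareLimitExceeded(Exception):
    """Hypothetical: the transaction does not fit in the HW resources."""


def run_transaction(body, max_hw_attempts=MAX_HW_ATTEMPTS):
    """Run body(mode) in HW mode first, falling back to SW mode.

    Returns (result, mode_used). body is assumed to raise Conflict or
    HardwareLimitExceeded when an attempt fails.
    """
    for _ in range(max_hw_attempts):
        try:
            return body("HW"), "HW"
        except HardwareLimitExceeded:
            break        # retrying in HW mode cannot help
        except Conflict:
            continue     # retry in HW mode
    while True:          # SW mode: unbounded, so only conflicts can fail it
        try:
            return body("SW"), "SW"
        except Conflict:
            continue


calls = []

def always_conflicts_in_hw(mode):
    calls.append(mode)
    if mode == "HW":
        raise Conflict()
    return 42

result, mode_used = run_transaction(always_conflicts_in_hw)
```

Retrying forever in SW mode is a simplification; a real system would add a contention-management policy to guarantee progress.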

9 Hardware TM
The hybrid implementation can use a HW-TM scheme that relies on the standard cache coherence protocol plus some additional components.
The cache coherence protocol handles data consistency across multiple processors:
  Only one processor has permission to write to a cache line.
  No processor can read a line that another processor has permission to write to.
Additional components on each processor store speculative data and check for conflicts:
  ISA extensions: instructions for transactional begin, commit, abort, load/store, etc.
  Additional components on the processor chip (in parallel with the L1 cache):
    Transactional buffer: old,
    Transactional state table: state of the contexts (threads) running on the processor
All memory accesses within a transaction are done transactionally.

10 HW-TM
The Old field keeps speculative values.
Transactional semantics:
  Start transaction: the transactional state for that context is set to SELECT or ALL.
  Abort: the exception flag is set, the corresponding read/write bits are cleared, and speculatively written data is invalidated.
  Commit: update the transactional state.
  Detect conflicts: read/write bit vector.
If the exception flag is set, any attempt by the transaction to commit or load/store results in a trap that is handled by the exception handler.
Question: how is abort implemented across multiple processors? Through the cache coherence protocol (CCP).
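
The semantics on this slide can be modelled in software as a rough sketch. Everything here is an assumption made for illustration: the class `HwContext`, the use of Python sets and dicts in place of the read/write bit vectors and the transactional buffer, and the `observe_remote_write` hook that stands in for a cache coherence event. It is not a description of the actual hardware.

```python
class TransactionAborted(Exception):
    """Stands in for the trap taken when the exception flag is set."""


class HwContext:
    """Software model of one hardware context (thread) running transactions."""

    def __init__(self, memory):
        self.memory = memory      # shared memory: dict of address -> value
        self.mode = "NONE"        # transactional state of this context
        self.read_set = set()     # stands in for the read bit vector
        self.write_buf = {}       # stands in for write bits + speculative data
        self.exception = False    # exception flag

    def begin(self, mode="ALL"):
        self.mode = mode
        self.exception = False
        self.read_set.clear()
        self.write_buf.clear()

    def _trap_if_flagged(self):
        if self.exception:
            raise TransactionAborted()

    def load(self, addr):
        self._trap_if_flagged()
        self.read_set.add(addr)
        return self.write_buf.get(addr, self.memory[addr])

    def store(self, addr, value):
        self._trap_if_flagged()
        self.write_buf[addr] = value   # speculative until commit

    def abort(self):
        self.exception = True          # later load/store/commit will trap
        self.read_set.clear()          # clear the read/write bits
        self.write_buf.clear()         # invalidate speculative data

    def commit(self):
        self._trap_if_flagged()
        self.memory.update(self.write_buf)
        self.mode = "NONE"
        self.read_set.clear()
        self.write_buf.clear()

    def observe_remote_write(self, addr):
        """Modelled coherence event: another processor writes addr."""
        if self.mode != "NONE" and (addr in self.read_set or addr in self.write_buf):
            self.abort()


memory = {0: 10}
ctx = HwContext(memory)
ctx.begin()
ctx.store(0, 11)
ctx.commit()                      # memory[0] now holds the committed value
```

In this model the abort is delivered lazily: the victim only notices it at its next load, store, or commit, which matches the trap behaviour described above.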

11 Quick Review of DSTM
[Figure: a DSTM transactional object before it is accessed within a transaction, and after it is opened for modification. Labels: Object Pointer, State Pointer, Old, New, State, Object Contents, Valid Copy, Modify.]
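
The locator indirection that DSTM uses can be sketched as below. This is a single-threaded simplification: real DSTM installs the new locator with an atomic compare-and-swap and consults a contention manager before aborting another writer, whereas this sketch uses a plain assignment and always aborts. The class names `Txn`, `Locator`, and `TMObject` are chosen for illustration.

```python
import copy


class Txn:
    def __init__(self):
        self.status = "ACTIVE"    # ACTIVE, COMMITTED, or ABORTED


class Locator:
    def __init__(self, txn, old, new):
        self.txn = txn            # state pointer: the last writer
        self.old = old            # version valid before that writer
        self.new = new            # the writer's private copy


class TMObject:
    def __init__(self, value):
        init = Txn()
        init.status = "COMMITTED"
        self.locator = Locator(init, None, value)

    def current(self):
        """The valid version: New if the last writer committed, else Old."""
        loc = self.locator
        return loc.new if loc.txn.status == "COMMITTED" else loc.old

    def open_write(self, txn):
        loc = self.locator
        if loc.txn is txn:
            return loc.new                    # already opened by this txn
        if loc.txn.status == "ACTIVE":
            loc.txn.status = "ABORTED"        # simplification: always abort
        valid = self.current()
        private = copy.deepcopy(valid)        # copy, then modify the copy
        self.locator = Locator(txn, valid, private)  # real DSTM: CAS
        return private


obj = TMObject([1])
t1 = Txn()
mine = obj.open_write(t1)
mine.append(2)
before_commit = list(obj.current())   # other threads still see the old copy
t1.status = "COMMITTED"               # commit is a single status change
after_commit = list(obj.current())
```

Committing is just a change of the writer's status field, which is what makes the switch from the old version to the new one atomic.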

12 A locator object in Hybrid-TM
The software TM uses a locator similar to DSTM's: redirection and object copying.
The locator also keeps track of the readers, as opposed to using local hash tables that store the last data value seen by each reading transaction. This helps early abort and avoids validation when committing.
A locator consists of:
  Valid field
  Write state (one)
  Read state (multiple)
  Old/new objects
  Object size
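
One possible shape for such a locator is sketched below. The field list follows the slide, but the names, the types, and the exact meaning of the valid field are assumptions; the helper functions `register_reader` and `abort_readers` are hypothetical and only illustrate why tracking readers in the locator allows early abort.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TxnState:
    status: str = "ACTIVE"        # ACTIVE, COMMITTED, or ABORTED


@dataclass
class HybridLocator:
    valid: bool = True            # valid field (semantics assumed)
    writer: Optional[TxnState] = None                       # write state (one)
    readers: List[TxnState] = field(default_factory=list)   # read state (multiple)
    old: Any = None               # old object
    new: Any = None               # new object
    size: int = 0                 # object size


def register_reader(loc, txn):
    """A reader records itself in the locator when it opens the object."""
    loc.readers.append(txn)


def abort_readers(loc, writer):
    """Early abort: a writer marks every other active reader ABORTED."""
    for reader in loc.readers:
        if reader is not writer and reader.status == "ACTIVE":
            reader.status = "ABORTED"
    loc.readers = [r for r in loc.readers if r is writer]


loc = HybridLocator(old=[0], new=[0], size=1)
r1, r2, w = TxnState(), TxnState(), TxnState()
register_reader(loc, r1)
register_reader(loc, r2)
r2.status = "COMMITTED"           # r2 finished before the writer arrived
abort_readers(loc, w)             # only the still-active reader is aborted
```

Because the writer can reach its readers directly, a reader learns about the conflict through its own state and does not need to re-validate what it read at commit time.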

13 Putting Things Together
Transactions in HW may conflict with those in SW, and vice versa.
Opening an object in HW: read the TMObject pointer transactionally, then abort all conflicting HW/SW transactions.
Opening an object in SW: create a state object and load it transactionally, then abort conflicting HW/SW transactions.
Hardware aborts Hardware: a load/store (transactional by default) causes an abort.
Software aborts Hardware: when SW opens a TMObject, it assigns it a new locator. Since the object has been read transactionally by the HW transaction, that transaction is aborted.
Hardware aborts Software: when HW opens a TMObject, it writes ABORTED to the state of the transactions holding this object.
Software aborts Software: write ABORTED to the states reached through the reader/writer pointers.
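
The four abort cases can be put together in one small, single-threaded model. All names here (`SwTxn`, `HwTxn`, `hw_open`, `sw_open`, and the explicit `hw_txns` list) are hypothetical. In particular, the list of HW transactions stands in for what the cache coherence protocol does implicitly, and the model assumes every SW open installs a new locator, as the slide text suggests.

```python
class SwTxn:
    def __init__(self):
        self.status = "ACTIVE"


class HwTxn:
    def __init__(self):
        self.aborted = False
        self.reads = set()        # objects read transactionally
        self.writes = set()       # objects written transactionally


class Locator:
    def __init__(self, writer=None, readers=()):
        self.writer = writer
        self.readers = list(readers)


class TMObject:
    def __init__(self):
        self.locator = Locator()


def _abort_sw(loc, me, include_readers):
    """Write ABORTED to the SW states reachable from the locator."""
    victims = [loc.writer] if loc.writer else []
    if include_readers:
        victims += loc.readers
    for txn in victims:
        if txn is not me and txn.status == "ACTIVE":
            txn.status = "ABORTED"


def hw_open(obj, txn, hw_txns, write=False):
    _abort_sw(obj.locator, None, include_readers=write)   # HW aborts SW
    for other in hw_txns:                                  # HW aborts HW
        if other is txn or other.aborted:
            continue
        if obj in other.writes or (write and obj in other.reads):
            other.aborted = True
    (txn.writes if write else txn.reads).add(obj)


def sw_open(obj, txn, hw_txns, write=False):
    old = obj.locator
    _abort_sw(old, txn, include_readers=write)             # SW aborts SW
    if write:
        obj.locator = Locator(writer=txn)
    else:
        keep = [r for r in old.readers if r.status == "ACTIVE" and r is not txn]
        obj.locator = Locator(readers=keep + [txn])
    for hw in hw_txns:                                     # SW aborts HW:
        if obj in hw.reads or obj in hw.writes:            # the pointer changed
            hw.aborted = True


obj = TMObject()
hw1, hw2 = HwTxn(), HwTxn()
hw_txns = [hw1, hw2]
hw_open(obj, hw1, hw_txns)                 # hw1 reads the object
hw_open(obj, hw2, hw_txns, write=True)     # hw2 writes it: hw1 is aborted
sw1 = SwTxn()
sw_open(obj, sw1, hw_txns, write=True)     # new locator: hw2 is aborted
```

The model ignores object data entirely; it only tracks who is aborted, which is the part the slide is about.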

14 Software aborts Hardware
[Figure: Thread 1 and Thread 2 run in HW mode and modify the object contents in place; Thread 3 runs in SW mode, where a locator (State Pointer, Old, New) is used to copy and modify the object contents. The conflict is detected by the threads in hardware mode.]

15 Evaluations
Three microbenchmarks:
  VR: small critical section (overhead of starting/committing transactions)
  HT: simultaneous lookup operations (per-object overhead of transactions)
  GU: coarse-grained locking vs. transactional memory
For each case, two scenarios: low and high contention.
Four synchronization implementations are compared:
  Lock
  Pure Hardware Transactional Memory
  Pure Software Transactional Memory
  Hybrid Transactional Memory

16 Evaluations (Hybrid Execution)
In all cases of hybrid execution, the ratio of SW-mode to HW-mode transactions is very small. This is because the transactional buffer is large relative to the size of the transactional objects (is this realistic?). Since most transactions run in HW mode, these results do not give a good view of the overhead of the slow SW mode.

17 Evaluations (VR)
When the number of processors grows, contention does not grow significantly. This is because the transactions are too small (conflicts rarely happen).

18 Evaluations (HT)
It is true that several lookup operations can be performed simultaneously; however, those operations are all rolled back together once a conflict with a writer occurs. This seems to be significant for slightly longer transactions.
The lock performance is better. The paper claims that similar behavior would be achieved by reader-writer locks; I expect those would perform much better, since concurrent operations, once underway, are not undone.

19 Evaluations (GU)
Why does the execution time decrease in the lock implementation from GU-low to GU-high? It is usually the inverse! Do the locks use back-off?

20 Conclusions
Transactional memory outperforms lock-based synchronization in most cases.
The Hybrid Transactional Memory approach gives a good balance between the scalability of SW and the performance of HW.
It requires only modest hardware support (transactional buffer, state table).
Within system limits: good performance for most transactions.
Exceeding system limits: falls back to software mode when a transaction cannot complete within the hardware bounds.
More needs to be done to ensure progress.

21 Questions?!

22 Limitations
Nested transactions?
Additional limits for the HW: contexts.
Hybrid has limitations: it uses the transactional buffer.
I am not sure how the non-blocking mechanism is implemented across multiple processors.

