Hybrid Transactional Memory Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen, Intel Labs University of Michigan Intel Labs.


1 Hybrid Transactional Memory Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen, Intel Labs University of Michigan Intel Labs

2 Promise of Transactional Memory (TM)
Simplify Parallel Programming
  1. Easier to program
     - Compose naturally
  2. Easier to get parallel performance
  3. No deadlocks
  4. Maintain consistency in the presence of errors
  5. Avoid priority inversion and convoying
  6. Supports fault tolerance
With a transaction:
    transaction { A = A - 10; B = B + 10; }
    ... if ( error ) abort_transaction; ...
With locks:
    lock(l1); lock(l2); A = A - 10; B = B + 10; unlock(l1); unlock(l2);
    ... if ( error ) recovery_code();
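The lock version on this slide is deadlock-free only if every thread acquires l1 and l2 in the same order. The following is a minimal sketch of that discipline using POSIX mutexes. The `account` type, its `id` field, and `transfer_with_locks` are hypothetical names introduced for illustration; they do not come from the slides.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical account type: each balance is guarded by its own lock. */
typedef struct {
    int id;                 /* unique id, used only to order lock acquisition */
    long balance;
    pthread_mutex_t lock;
} account;

/* Move `amount` from `from` to `to` while holding both locks.
 * Locks are always taken in ascending id order, so two threads that
 * transfer in opposite directions cannot deadlock on each other. */
static void transfer_with_locks(account *from, account *to, long amount) {
    assert(from != to && from->id != to->id);
    account *first  = from->id < to->id ? from : to;
    account *second = from->id < to->id ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

The ordering rule is exactly the kind of convention the programmer has to remember with locks; the transaction version needs no such rule.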

3 Flavors of Transactional Memory
  1. Easier to program
     - Compose naturally
  2. Easier to get parallel performance
  3. No deadlocks
  4. Maintain consistency in the presence of errors
  5. Avoid priority inversion and convoying
  6. Supports fault tolerance
Flavors labeled on the slide: Basic; Support programmer abort; Support nonblocking
Our Work: Efficient support for a TM that supports all these features

4 TM Implementations
Requires versioning support and conflict detection
  - Hardware approach [ Herlihy’93 ]
      - Bounded number of locations
      - Maintain versions in cache → Low overhead
  - Pure-software approach [ Herlihy’03, Harris’03 ]
      - Unbounded number of locations can be accessed within a transaction
      - Slow due to overhead of maintaining multiple copies
          - Potentially orders of magnitude
  - Unbounded hardware approach [ Hammond’04, Ananian’05, Rajwar’05, Moore’06 ]
      - Require significant hardware support
      - Discussed in more detail in the paper

5 Hardware vs. Software TM
Hardware Approach
  - Low overhead
      - Buffers transactional state in cache
  - More concurrency
      - Cache-line granularity
  - Bounded resource
  - Assembly
  - Within a module
  Useful BUT limited to library writers
Software Approach
  - High overhead
      - Uses object copying to keep transactional state
  - Less concurrency
      - Object granularity
  - No resource limits
  - High-level languages
  - Across modules
  Useful BUT limited to special data structures
Neither is satisfactory for broader use

6 This Work
A Hybrid Transactional Memory Scheme
  - Requires modest hardware support
      - Changes are localized
  - Supports unbounded number of locations
  - Performance of hardware when within hardware resource limits
    (low overhead of pure hardware TM)
  - Gracefully falls back to software if the hardware resource limits are exceeded
    (unbounded resources of pure software TM)
Experimentally demonstrate effectiveness of our approach

7 Outline
  - Motivation
  - Proposed Architectural Support
  - Hybrid Transactional Memory
  - Performance Evaluation
  - Conclusions

8 ISA Extensions
  - Start of a Transaction
      - Begin Transaction All ( XBA ) or Select ( XBS )
      - Save Register State ( SSTATE )
      - Specify handler on abort due to conflict ( XHAND )
  - During a Transaction
      - Perform memory loads and stores
      - Override defaults ( LDX, STX, LDR, STR )
  - On Transaction Abort
      - Explicit Abort Transaction ( XA )
      - Restore Register State ( RSTATE )
  - On Transaction Commit
      - Commit Transaction ( XC )

9 Baseline CMP Architecture
  - Our proposed changes
      - Modest and localized
  - Modifications to
      - Core
      - L1 $
  - No changes to
      - Interconnect
      - Coherence Protocol
      - L2 $
      - Memory
[Diagram labels: Core, L1 $, Interconnect, L2 $]

10 Hardware Support for TM
Three requirements:
  - Maintain two versions
  - Detect conflict
      - Same core: Tag
      - Another core: Cache coherence
  - Atomic commit and abort
Bounded
  - Capacity of TM $
  - Associativity of TM $ and L2
[Diagram labels: Core; Regular Accesses; Transactional Accesses; L1 $ (Tag, Data); Transactional $ (Tag, Addl. Tag, Old Data, New Data); To Interconnect]

11 Outline
  - Motivation
  - Proposed Architectural Support
  - Hybrid Transactional Memory
      - Existing pure software scheme
      - Our hybrid scheme
  - Performance Evaluation
  - Conclusions

12 Pure Software TM [ Herlihy’03 ]
  - We use this Pure Software TM as a starting point
  - Implemented without any special architectural support using two techniques
      - Use copies of objects to keep transactional state
          - Make modifications on the copy during a transaction
      - Add a level of indirection
          - Switch the versions when a transaction is committed
[Diagram labels: Object Pointer; State Pointer, Old, New; State; Object Contents (old copy); Object Contents (new copy)]
Valid copy by transaction state:
  State       Valid Copy
  Active      Old
  Aborted     Old
  Committed   New

13 Pure Software TM Scheme Cont’d
Before accessing an object within a transaction
[Diagram labels: Object Pointer; two sets of (State Pointer, Old, New) with their State; Object Contents; X; Valid Copy; Modify]
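The indirection described on the two slides above can be sketched in C as follows. This is a single-threaded illustration under stated assumptions, not the authors' implementation: the names `tx_desc`, `locator`, `tm_object`, `valid_copy`, and `open_for_write` are hypothetical, the object contents are reduced to one `int`, the object pointer is swapped with a plain store where a real scheme would use an atomic compare-and-swap, a conflicting active owner is simply marked aborted, and memory reclamation is ignored.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TX_ACTIVE, TX_ABORTED, TX_COMMITTED } tx_state;

/* The "State" box: one per transaction. */
typedef struct { tx_state state; } tx_desc;

/* The box the object pointer refers to: State Pointer, Old, New. */
typedef struct {
    tx_desc *owner;     /* State Pointer */
    int     *old_copy;  /* Old */
    int     *new_copy;  /* New */
} locator;

/* The "Object Pointer": the only word other threads follow. */
typedef struct { locator *loc; } tm_object;

/* Valid copy rule from the slide: Committed -> New, otherwise -> Old. */
static int *valid_copy(const locator *loc) {
    return loc->owner->state == TX_COMMITTED ? loc->new_copy : loc->old_copy;
}

/* Before modifying an object inside transaction `self`: build a new
 * locator whose Old is the current valid copy and whose New is a private
 * duplicate, then switch the object pointer to it. */
static int *open_for_write(tm_object *obj, tx_desc *self) {
    locator *prev = obj->loc;
    if (prev->owner == self)
        return prev->new_copy;              /* already opened by us */
    if (prev->owner->state == TX_ACTIVE)
        prev->owner->state = TX_ABORTED;    /* simplification: always abort the other owner */

    locator *next = malloc(sizeof *next);
    int *copy = malloc(sizeof *copy);
    assert(next != NULL && copy != NULL);

    next->owner    = self;
    next->old_copy = valid_copy(prev);
    *copy          = *next->old_copy;       /* copy, then modify the copy */
    next->new_copy = copy;
    obj->loc       = next;                  /* real schemes: compare-and-swap */
    return copy;
}
```

Committing is then a single store of `TX_COMMITTED` into the transaction's state, which makes every New copy the transaction created valid at once; aborting leaves all the Old copies valid.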

14 Our Hybrid Transactional Memory
  - Two modes: Hardware and Software mode
      - The two modes need to coexist
      - Non-solution: Make all threads transition modes in lockstep
  - Avoid versioning overheads (allocation and copying) in the hardware mode
      - Still incur the indirection overheads
  - Tricky because it needs to bridge the hardware and software schemes
      - Hardware mode needs to modify data in-place
          - Pure Software TM assumes data is never modified in-place
      - Different sharing granularity
          - Cache-line (Hardware) vs. Object (Software)
      - Different conflict detection scheme
          - Data (Hardware) vs. State (Software)

15 Hybrid Scheme Example
  - In the Software Mode: Copy and Modify
  - In the Hardware Mode: Modify in place
  - Thread 1: HW mode; Thread 2: HW mode; Thread 3: SW mode
  - Conflict detected by the threads in the hardware mode
[Diagram labels: Object Pointer; two sets of (State Pointer, Old, New) with their State; Object Contents; X]

16 Hybrid Scheme Summary
[Diagram labels: Object Pointer; State Pointer, Old, New; State; Object Contents]

Conflict Detection
  Conflicting Thread Mode   Active Thread Mode: Hardware   Active Thread Mode: Software
  Hardware                  Contents                       State
  Software                  Object Pointer                 State

Sharing Granularity
  Conflicting Thread Mode   Active Thread Mode: Hardware   Active Thread Mode: Software
  Hardware                  Cache line                     Object
  Software                  Object (single cell on the slide)

17 Outline
  - Motivation
  - Proposed Architectural Support
  - Hybrid Transactional Memory
  - Performance Evaluation
  - Conclusions

18 Experimental Framework
  - Infrastructure
      - Cycle-accurate execution-driven multi-core simulator
      - Modified GCC
  - Three microbenchmarks
      - Two scenarios: Low and High Contention
  - Compare four synchronization implementations
      - Lock
      - Pure Hardware Transactional Memory
      - Pure Software Transactional Memory
      - Hybrid Transactional Memory

19 Performance
[Figure: Normalized Execution Time vs. Number of Cores. Benchmark: Vector-Reduce. Contention: Low.]

20 Outline
  - Motivation
  - Proposed Architectural Support
  - Hybrid Transactional Memory
  - Performance Evaluation
  - Conclusions

21 Conclusions
  - Transactional Memory is a promising approach
      - Makes parallel programming an easier task
      - Easier to achieve parallel speedup
  - Hybrid Transactional Memory approach works
      - Requires only modest hardware support
      - Common case: Good performance for most transactions
      - Uncommon case: Graceful fallback to software mode when a transaction cannot complete within the hardware bounds

22 Questions ?

23 Transactions
A synchronization mechanism to coordinate accesses to shared data by concurrent threads (an alternative to locks)
Transaction: A group of operations on shared data
    Transaction { A = A - 10; B = B + 10; ... if (error) abort_transaction; }
An API Enhancement:
  1. Abort in the middle of a transaction
     o On encountering an error

24 Transactional Memory (TM)
  - A transaction satisfies the following properties
      1) Atomicity: All-or-nothing
          - On Commit: all operations become visible
          - On Abort: none of the operations are performed
      2) Isolation (Serializable)
          - The transactions committed appear to have been performed in some serial order
  - Additional Properties
      3) Optimistic concurrency control
          - Necessary for achieving good parallel speedup
      4) Non-blocking (Optional)
          - Avoid Priority Inversion
          - Avoid Convoying
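The all-or-nothing property can be illustrated with a small single-threaded emulation in C. It is only a sketch of the semantics, not of any TM implementation: the `accounts` type and `transfer_or_abort` are hypothetical names, and treating a negative balance as the error that triggers the abort is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared data: the A and B of the slides. */
typedef struct { long a; long b; } accounts;

/* Emulates `transaction { A = A - amount; B = B + amount; }` with a
 * programmer abort. All updates go to a private working version first.
 * Returns true on commit, false on abort. */
static bool transfer_or_abort(accounts *shared, long amount) {
    assert(shared != NULL);
    accounts working = *shared;     /* tentative version */
    working.a -= amount;
    working.b += amount;

    if (working.a < 0)              /* assumed error condition */
        return false;               /* abort: no operation becomes visible */

    *shared = working;              /* commit: all operations become visible */
    return true;
}
```

On the abort path neither the debit nor the credit is applied, so the caller needs no separate recovery code to undo a half-finished update.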

25 Advantage 1: Performance
Locks
  - Serialized on locks
  - Finer-granularity locks help
  - Burden on programmer
Transactions
  - Optimistically execute concurrently
  - Abort and restart on data conflict
  - Automatically done by runtime
[Diagram labels: Locks: A, B, L1, A, C, D. Transactions: A, B, C, D; A, A, Data Conflict]
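The "execute optimistically, then restart on conflict" pattern can be shown on a single word with C11 atomics. This is a loose analogue chosen for brevity, not the mechanism in the slides: a real TM detects conflicts across many locations, whereas this sketch covers one. The function name `optimistic_add` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Add `delta` to `*shared` without taking a lock.
 * Returns how many times the update had to be redone. */
static int optimistic_add(_Atomic long *shared, long delta) {
    assert(shared != NULL);
    int retries = 0;
    long seen = atomic_load(shared);          /* read without blocking anyone */
    for (;;) {
        long updated = seen + delta;          /* compute on the private value */
        /* Publish only if nobody changed *shared since we read it. On
         * failure, `seen` is refreshed with the current value and the
         * computation is redone. The weak form may also fail spuriously. */
        if (atomic_compare_exchange_weak(shared, &seen, updated))
            return retries;
        retries++;
    }
}
```

When threads touch different data the loop body runs once, which is the concurrency that coarse locks give up; the retry cost is paid only when a conflict actually occurs.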

26 Advantage 2: Reduces Bugs
  - With locks, programmers need to
      - Remember mapping between shared data and locks that guard them
          - Make sure the appropriate locks are held while accessing shared data
      - Make lock granularity as small as possible
      - Avoid deadlocks due to locks
    All of these can cause subtle bugs
  - With TM, programmer does not have to deal with these problems

27 Other Advantages
  - Allows new programming paradigms
      - Simplifies error handling
      - A new style of programming: Speculate and Verify
    Programmer can abort offending transactions
  - Avoids other problems that locks suffer from
      - Priority Inversion: A low-priority thread can grab a lock and block a higher-priority thread
      - Convoying: If a thread holding a lock blocks on a high-latency event (like context-switch or I/O), it can cause other threads to wait for long periods
      - Fault Tolerant: If a process holding a lock dies, other processes will hang forever
    Runtime system can abort offending transactions

28 [Figure: Normalized Execution Time vs. Number of Cores. Benchmark: Vector-Reduce. Contention: Low.]


