Hybrid Transactional Memory
Sanjeev Kumar, Michael Chu, Christopher Hughes, Partha Kundu, Anthony Nguyen
Intel Labs and University of Michigan



Promise of Transactional Memory (TM)

Simplify parallel programming:
1. Easier to program
   - Compose naturally
2. Easier to get parallel performance
3. No deadlocks
4. Maintain consistency in the presence of errors
5. Avoid priority inversion and convoying
6. Supports fault tolerance

With a transaction:
    transaction {
        A = A - 10;
        B = B + 10;
        ...
        if (error) abort_transaction;
    }

With locks:
    lock(l1); lock(l2);
    A = A - 10;
    B = B + 10;
    ...
    if (error) recovery_code();
    unlock(l1); unlock(l2);

Flavors of Transactional Memory

1. Easier to program
   - Compose naturally
2. Easier to get parallel performance
3. No deadlocks
4. Maintain consistency in the presence of errors
5. Avoid priority inversion and convoying
6. Supports fault tolerance

Flavors: Basic; Support programmer abort; Support nonblocking

Our work: efficient support for a TM that supports all these features

TM Implementations

Requires versioning support and conflict detection.
• Hardware approach [Herlihy'93]
  - Bounded number of locations
  - Maintain versions in cache → low overhead
• Pure-software approach [Herlihy'03, Harris'03]
  - Unbounded number of locations can be accessed within a transaction
  - Slow due to overhead of maintaining multiple copies (potentially orders of magnitude)
• Unbounded hardware approach [Hammond'04, Ananian'05, Rajwar'05, Moore'06]
  - Require significant hardware support
  - Discussed in more detail in the paper

Hardware vs. Software TM

Hardware approach
• Low overhead
  - Buffers transactional state in cache
• More concurrency
  - Cache-line granularity
• Bounded resource
• Assembly
• Within a module
Useful BUT limited to library writers

Software approach
• High overhead
  - Uses object copying to keep transactional state
• Less concurrency
  - Object granularity
• No resource limits
• High-level languages
• Across modules
Useful BUT limited to special data structures

Neither is satisfactory for broader use.

This Work

A Hybrid Transactional Memory Scheme
• Requires modest hardware support
  - Changes are localized
• Supports unbounded number of locations
• Performance of hardware when within hardware resource limits (low overhead of pure hardware TM)
• Gracefully fall back to software if the hardware resource limits are exceeded (unbounded resources of pure software TM)

Experimentally demonstrate effectiveness of our approach
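The fallback policy on this slide can be illustrated with a small Python model. This is only a sketch under stated assumptions: `HW_CAPACITY`, `HardwareOverflow`, `run_hardware`, `run_software`, and `run_transaction` are hypothetical names, and the "hardware" path is simulated by a bounded write buffer rather than by real transactional hardware.

```python
# Toy model of hybrid execution: try a bounded "hardware" mode first and
# fall back to an unbounded "software" mode on overflow.
# All names are hypothetical; no real TM hardware is involved.

HW_CAPACITY = 4  # assumed bound on distinct locations a hardware transaction may write


class HardwareOverflow(Exception):
    """Raised when a transaction exceeds the simulated hardware resources."""


def run_hardware(memory, writes):
    """Buffer writes in a bounded store; publish them only if all of them fit."""
    buffered = {}
    for addr, value in writes:
        buffered[addr] = value
        if len(buffered) > HW_CAPACITY:
            raise HardwareOverflow(addr)  # buffered state is simply discarded
    memory.update(buffered)  # commit
    return "hardware"


def run_software(memory, writes):
    """Unbounded path: work on a private copy, then publish it."""
    private = dict(memory)
    for addr, value in writes:
        private[addr] = value
    memory.clear()
    memory.update(private)
    return "software"


def run_transaction(memory, writes):
    """Return the name of the mode that committed the transaction."""
    try:
        return run_hardware(memory, writes)
    except HardwareOverflow:
        return run_software(memory, writes)


small = {}
large = {}
mode_small = run_transaction(small, [(i, i * 10) for i in range(3)])
mode_large = run_transaction(large, [(i, i * 10) for i in range(10)])
```

In this model the small transaction commits in the hardware mode, while the large one overflows the assumed capacity and is re-executed in the software mode.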

Outline
• Motivation
• Proposed Architectural Support
• Hybrid Transactional Memory
• Performance Evaluation
• Conclusions

ISA Extensions

• Start of a transaction
  - Begin Transaction All (XBA) or Select (XBS)
  - Save Register State (SSTATE)
  - Specify handler on abort due to conflict (XHAND)
• During a transaction
  - Perform memory loads and stores
  - Override defaults (LDX, STX, LDR, STR)
• On transaction abort
  - Explicit Abort Transaction (XA)
  - Restore Register State (RSTATE)
• On transaction commit
  - Commit Transaction (XC)

Baseline CMP Architecture

• Our proposed changes
  - Modest and localized
• Modifications to
  - Core
  - L1 $
• No changes to
  - Interconnect
  - Coherence protocol
  - L2 $
  - Memory

[Diagram labels: Core, L1 $, Interconnect, L2 $]

Hardware Support for TM

Three requirements:
• Maintain two versions
• Detect conflict
  - Same core: Tag
  - Another core: Cache coherence
• Atomic commit and abort

Bounded
• Capacity of TM $
• Associativity of TM $ and L2

[Diagram labels: Core; Regular Accesses; Transactional Accesses; L1 $ (Tag, Data); Transactional $ (Tag, Addl. Tag, Old Data, New Data); To Interconnect]
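To make the "two versions" requirement concrete, here is a toy Python model of a single transactional-cache entry. It is a sketch, not the hardware design: the class `TxLine` and its methods are hypothetical, and treating the additional tag as the id of the owning transaction (the `owner` field) is an assumption made for illustration.

```python
# Toy model of one transactional-cache entry that keeps two versions of a line.
# Names are hypothetical; "owner" is an assumed reading of the additional tag.
from dataclasses import dataclass


@dataclass
class TxLine:
    tag: int          # address tag of the cache line
    old_data: int     # value before the transaction touched the line
    new_data: int     # speculative value written inside the transaction
    owner: int = -1   # id of the transaction using the line (-1 means none)

    def tx_write(self, tx_id, value):
        """Speculative store: only the new version changes."""
        if self.owner not in (-1, tx_id):
            raise RuntimeError("conflict: line is in use by another transaction")
        self.owner = tx_id
        self.new_data = value

    def commit(self):
        """Make the speculative value the committed one."""
        self.old_data = self.new_data
        self.owner = -1

    def abort(self):
        """Discard the speculative value."""
        self.new_data = self.old_data
        self.owner = -1

    def visible(self):
        """Value that other cores observe: always the committed version."""
        return self.old_data


line = TxLine(tag=0x40, old_data=100, new_data=100)
line.tx_write(tx_id=1, value=90)
seen_during = line.visible()        # speculative write is not visible yet
line.abort()
seen_after_abort = line.visible()   # abort leaves the old value in place
line.tx_write(tx_id=2, value=75)
line.commit()
seen_after_commit = line.visible()  # commit publishes the new value
```

Because the old value stays intact until commit, both commit and abort reduce to a single local update of the entry, which is what makes them cheap in this model.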

Outline
• Motivation
• Proposed Architectural Support
• Hybrid Transactional Memory
  - Existing pure software scheme
  - Our hybrid scheme
• Performance Evaluation
• Conclusions

Pure Software TM [Herlihy'03]

• We use this pure software TM as a starting point
• Implemented without any special architectural support using two techniques
  - Use copies of objects to keep transactional state: make modifications on the copy during a transaction
  - Add a level of indirection: switch the versions when a transaction is committed

[Diagram labels: Object Pointer; locator (State Pointer, Old, New); State; Object Contents (old and new copies)]

State       Valid Copy
Active      Old
Aborted     Old
Committed   New
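The State to Valid Copy table can be written down directly. The following Python sketch uses hypothetical names (`TxState`, `Locator`, `valid_copy`) and plain dictionaries as object contents; it only illustrates how one status change switches which copy is current.

```python
# Sketch of the indirection used by an object-based software TM:
# a locator holds an old copy, a new copy, and a pointer to the owner's state.
# Class and function names are hypothetical.
from dataclasses import dataclass

ACTIVE, ABORTED, COMMITTED = "Active", "Aborted", "Committed"


@dataclass
class TxState:
    status: str = ACTIVE


@dataclass
class Locator:
    state: TxState  # state of the transaction that installed this locator
    old: dict       # object contents before that transaction
    new: dict       # that transaction's private copy


def valid_copy(locator):
    """Apply the State -> Valid Copy table: only Committed selects New."""
    return locator.new if locator.state.status == COMMITTED else locator.old


tx = TxState()
loc = Locator(state=tx, old={"balance": 100}, new={"balance": 90})
while_active = valid_copy(loc)["balance"]
tx.status = COMMITTED  # a single status change switches versions
after_commit = valid_copy(loc)["balance"]
```

Since every locator installed by a transaction points at the same state record, committing or aborting the transaction switches all of its objects at once.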

Pure Software TM Scheme (Cont'd)

Before accessing an object within a transaction:

[Diagram labels: Object Pointer; existing locator (State Pointer, Old, New) with its State and Object Contents; new locator (State Pointer, Old, New) with its State and Object Contents; "Valid Copy"; "Modify"; an "X" marking a replaced link]
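The step shown in this diagram can be sketched in Python as follows. This follows the general approach of [Herlihy'03] as understood here, not code from the talk: all names are hypothetical, compare-and-swap is emulated with a lock, and the contention policy (always abort the current owner) is a deliberate simplification.

```python
# Sketch of opening an object for writing in an object-based software TM.
# Names are hypothetical; compare-and-swap is emulated with a lock, and the
# status checks are not race-free the way a real implementation must be.
import copy
import threading

ACTIVE, ABORTED, COMMITTED = "Active", "Aborted", "Committed"


class TxState:
    def __init__(self):
        self.status = ACTIVE


class Locator:
    def __init__(self, state, old, new):
        self.state, self.old, self.new = state, old, new


class TMObject:
    """The object pointer: refers to the current locator."""

    def __init__(self, contents):
        done = TxState()
        done.status = COMMITTED
        self.locator = Locator(done, None, contents)
        self._lock = threading.Lock()

    def cas_locator(self, expected, replacement):
        with self._lock:
            if self.locator is expected:
                self.locator = replacement
                return True
            return False


def open_for_write(obj, tx):
    """Install a new locator whose New field is a private copy to modify."""
    while True:
        current = obj.locator
        owner = current.state
        if owner is tx:
            return current.new            # already opened by this transaction
        if owner.status == ACTIVE:
            owner.status = ABORTED        # simplistic contention policy (assumption)
        valid = current.new if owner.status == COMMITTED else current.old
        fresh = Locator(tx, valid, copy.deepcopy(valid))
        if obj.cas_locator(current, fresh):
            return fresh.new


def read_committed(obj):
    loc = obj.locator
    return loc.new if loc.state.status == COMMITTED else loc.old


account = TMObject({"balance": 100})
t1 = TxState()
private = open_for_write(account, t1)
private["balance"] -= 10
before_commit = read_committed(account)["balance"]
t1.status = COMMITTED
after_commit = read_committed(account)["balance"]
```

The old locator is never modified; it is only unlinked from the object pointer, which corresponds to the replaced link in the diagram.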

Our Hybrid Transactional Memory

• Two modes: hardware and software mode
• The two modes need to coexist
  - Non-solution: make all threads transition modes in lockstep
• Avoid versioning overheads (allocation and copying) in the hardware mode
  - Still incur the indirection overheads
• Tricky because it needs to bridge the hardware and software schemes
  - Hardware mode needs to modify data in place (pure software TM assumes data is never modified in place)
  - Different sharing granularity: cache line (hardware) vs. object (software)
  - Different conflict detection scheme: data (hardware) vs. state (software)

Hybrid Scheme Example

• In the software mode: copy and modify
• In the hardware mode: modify in place
• Thread 1: HW mode; Thread 2: HW mode; Thread 3: SW mode
• Conflict detected by the threads in the hardware mode

[Diagram labels: Object Pointer; two locators (State Pointer, Old, New), each with its State and Object Contents; an "X" marking a replaced link]

Hybrid Scheme Summary

[Diagram labels: Object Pointer; locator (State Pointer, Old, New); State; Object Contents (old and new copies)]

Conflict Detection
                               Active thread: Hardware   Active thread: Software
Conflicting thread: Hardware   Contents                  State
Conflicting thread: Software   Object Pointer            State

Sharing Granularity
                               Active thread: Hardware   Active thread: Software
Conflicting thread: Hardware   Cache line                Object
Conflicting thread: Software   Object (either active-thread mode)

Outline
• Motivation
• Proposed Architectural Support
• Hybrid Transactional Memory
• Performance Evaluation
• Conclusions

Experimental Framework

• Infrastructure
  - Cycle-accurate, execution-driven multi-core simulator
  - Modified GCC
• Three microbenchmarks
  - Two scenarios: low and high contention
• Compare four synchronization implementations
  - Lock
  - Pure Hardware Transactional Memory
  - Pure Software Transactional Memory
  - Hybrid Transactional Memory

Performance

[Plot: normalized execution time vs. number of cores. Benchmark: Vector-Reduce. Contention: Low.]

Outline
• Motivation
• Proposed Architectural Support
• Hybrid Transactional Memory
• Performance Evaluation
• Conclusions

Conclusions

• Transactional Memory is a promising approach
  - Makes parallel programming an easier task
  - Easier to achieve parallel speedup
• Hybrid Transactional Memory approach works
  - Requires only modest hardware support
  - Common case: good performance for most transactions
  - Uncommon case: graceful fallback to software mode when a transaction cannot complete within the hardware bounds

Questions?

Transactions

A synchronization mechanism to coordinate accesses to shared data by concurrent threads (an alternative to locks).

Transaction: a group of operations on shared data

    transaction {
        A = A - 10;
        B = B + 10;
        ...
        if (error) abort_transaction;
    }

An API enhancement:
1. Abort in the middle of a transaction
   o On encountering an error
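The programmer-initiated abort can be sketched in Python with a context manager. This is a minimal single-threaded illustration, not the API from the talk: `Transaction`, `AbortTransaction`, and `transfer` are hypothetical names, and writes are simply buffered in a dictionary until commit.

```python
# Minimal single-threaded sketch of "abort in the middle of a transaction":
# writes are buffered and published only on commit. Names are hypothetical.


class AbortTransaction(Exception):
    """Raised by user code to abandon the current transaction."""


class Transaction:
    def __init__(self, store):
        self.store = store
        self.writes = {}
        self.committed = False

    def read(self, key):
        return self.writes.get(key, self.store[key])

    def write(self, key, value):
        self.writes[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.store.update(self.writes)  # commit: all writes become visible
            self.committed = True
            return False
        # Abort: drop buffered writes; swallow only the explicit abort signal.
        self.writes.clear()
        return issubclass(exc_type, AbortTransaction)


def transfer(store, amount):
    with Transaction(store) as tx:
        tx.write("A", tx.read("A") - amount)
        tx.write("B", tx.read("B") + amount)
        if tx.read("A") < 0:  # the "error" condition
            raise AbortTransaction()
    return tx.committed


accounts = {"A": 25, "B": 0}
ok = transfer(accounts, 10)
rejected = transfer(accounts, 100)
```

The second transfer would overdraw account A, so it aborts and leaves both balances exactly as the first transfer committed them; no separate recovery code is needed.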

Transactional Memory (TM)

• A transaction satisfies the following properties
  1) Atomicity: all-or-nothing
     - On commit: all operations become visible
     - On abort: none of the operations are performed
  2) Isolation (serializable)
     - The committed transactions appear to have been performed in some serial order
• Additional properties
  3) Optimistic concurrency control
     - Necessary for achieving good parallel speedup
  4) Non-blocking (optional)
     - Avoid priority inversion
     - Avoid convoying
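Optimistic concurrency control can be illustrated with a validate-and-retry loop. The Python sketch below is an assumption-laden toy: `VersionedCell` and `atomically` are hypothetical names, a version counter stands in for real conflict detection, and the short commit step is guarded by a lock, so this sketch is not non-blocking.

```python
# Sketch of optimistic concurrency control: compute without holding a lock,
# validate at commit time, and retry on conflict. Names are hypothetical.
import threading


class VersionedCell:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # guards only the short commit step

    def snapshot(self):
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, expected_version, new_value):
        """All-or-nothing: succeed only if nobody committed since the snapshot."""
        with self._commit_lock:
            if self.version != expected_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Apply update(value) as a transaction; return the number of retries."""
    retries = 0
    while True:
        value, version = cell.snapshot()
        if cell.try_commit(version, update(value)):
            return retries
        retries += 1


counter = VersionedCell(0)


def worker():
    for _ in range(1000):
        atomically(counter, lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every successful commit increments the version exactly once, so the committed updates form a serial order even though the threads computed their new values concurrently.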

Advantage 1: Performance

Locks
• Serialized on locks
• Finer-granularity locks help
  - Burden on programmer

Transactions
• Optimistically execute concurrently
• Abort and restart on data conflict
  - Automatically done by runtime

[Diagram labels: operations A, B, C, D; lock L1; "Data Conflict" between two accesses to A]

Advantage 2: Reduces Bugs

• With locks, programmers need to
  - Remember the mapping between shared data and the locks that guard them: make sure the appropriate locks are held while accessing shared data
  - Make lock granularity as small as possible
  - Avoid deadlocks due to locks
• All of these can cause subtle bugs
• With TM, the programmer does not have to deal with these problems
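As an illustration of the bookkeeping that locks demand, the Python sketch below shows one common discipline for avoiding deadlock: always acquire locks in a fixed global order. The `Account` class, its `ident` field, and `transfer` are hypothetical names chosen for this example.

```python
# Sketch of lock-based code that must follow a lock-ordering rule by hand.
# Names are hypothetical; ordering by "ident" is the assumed global order.
import threading


class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()  # the programmer must remember: guards balance


def transfer(src, dst, amount):
    # Acquire both locks in ascending ident order. Acquiring them in
    # argument order instead could deadlock two opposite transfers.
    first, second = sorted((src, dst), key=lambda account: account.ident)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


a = Account(1, 1000)
b = Account(2, 1000)


def a_to_b():
    for _ in range(500):
        transfer(a, b, 1)


def b_to_a():
    for _ in range(500):
        transfer(b, a, 1)


threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Nothing in the language enforces the ordering rule or the association between `lock` and `balance`; both live only in comments and convention, which is the kind of burden this slide refers to.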

Other Advantages

• Allows new programming paradigms
  - Simplifies error handling
  - A new style of programming: speculate and verify
  - Programmer can abort offending transactions
• Avoids other problems that locks suffer from
  - Priority inversion: a low-priority thread can grab a lock and block a higher-priority thread
  - Convoying: if a thread holding a lock blocks on a high-latency event (such as a context switch or I/O), other threads can wait for long periods
  - Fault tolerance: if a process holding a lock dies, other processes will hang forever
  - Runtime system can abort offending transactions

[Plot: normalized execution time vs. number of cores. Benchmark: Vector-Reduce. Contention: Low.]
