Software Transactional Memory

Session: Software Transactional Memory

Agenda (11/19/2018)
- TM Basics
- Playing with Software TM
- Summary

Agenda
- TM Basics: Introduction, Working principles, Implementation requirements
- Playing with Software TM
- Summary

Introduction
- Multi-core processors are now mainstream. The "free lunch" of automatic performance gains is over: performance increases have to come from parallelism.
- Traditionally, programmers use locks to write parallel programs.
- Lock-based synchronization has known problems: deadlocks, difficulty exploiting fine-grain parallelism, and poor composition.
- Transactional memory avoids these problems and eases parallel programming.

Lock-based programming has a number of well-known problems that frequently arise in practice:
- It requires thinking about overlapping operations and partial operations in distantly separated and seemingly unrelated sections of code, a task that is very difficult and error-prone for programmers.
- It requires programmers to adopt a locking policy to prevent deadlock, livelock, and other failures to make progress. Such policies are often informally enforced and fallible, and when these issues arise they are insidiously difficult to reproduce and debug.
- It can lead to priority inversion, where a high-priority thread is forced to wait for a low-priority thread holding exclusive access to a resource that it needs.

The goal is to support a lock-free infrastructure so that programmers can effectively develop parallel, reusable encapsulations that others can use without risk of deadlock.
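To make the locking-policy point concrete, here is a minimal C++17 sketch, assuming a hypothetical Account type protected by its own mutex. If one thread runs transfer(a, b) and another runs transfer(b, a) using two plain nested lock() calls, each can end up holding one mutex while waiting forever for the other. std::scoped_lock instead acquires both mutexes with a deadlock-avoidance algorithm. This is one standard-library way to enforce such a policy, not something the slides prescribe.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical account type used only for this illustration.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Moves `amount` between two accounts while holding both locks.
// std::scoped_lock (C++17) locks several mutexes using a deadlock-avoidance
// algorithm, so opposite-order transfers running concurrently cannot deadlock.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // locking the same mutex twice would be undefined
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}
```

The programmer still has to remember to take both locks around the whole operation. With transactional memory, the two balance updates would simply go inside one atomic block.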

Example: Insert a node into a linked list
    {
        new_node->prev = node;
        new_node->next = node->next;
        node->next->prev = new_node;
        node->next = new_node;
    }



Example: Insert a node into a linked list (lock-based)
Thread 1 acquires the lock and executes the insertion:
    lock(L);
    new_node->prev = node;
    new_node->next = node->next;
    node->next->prev = new_node;
    node->next = new_node;
    unlock(L);
Thread 2 waits for the lock to be released before executing the same code.
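A runnable C++ version of the lock-based insertion might look like the following sketch. It assumes a circular doubly linked list with a sentinel head (so node->next is never null, as the slide's code requires) and a single global mutex for the whole list; Node, list_lock, and the helper functions are illustrative names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Node {
    int value;
    Node* prev = nullptr;
    Node* next = nullptr;
};

std::mutex list_lock;  // one coarse-grained lock guarding the whole list

// The four pointer updates from the slide, run under list_lock so that no
// other thread can observe a half-linked node.
void insert_after(Node* node, Node* new_node) {
    std::lock_guard<std::mutex> guard(list_lock);
    new_node->prev = node;
    new_node->next = node->next;
    node->next->prev = new_node;
    node->next = new_node;
}

// Walks the list forward, checking every back link; returns -1 if broken.
int count_consistent(Node* head) {
    if (head->next->prev != head) return -1;
    int n = 0;
    for (Node* p = head->next; p != head; p = p->next) {
        if (p->next->prev != p) return -1;
        ++n;
    }
    return n;
}

// Two threads each insert `per_thread` nodes right after the sentinel.
int run_two_threads(int per_thread) {
    Node head{0};
    head.prev = head.next = &head;  // empty circular list
    std::vector<Node> pool(2 * per_thread);
    auto worker = [&](int offset) {
        for (int i = 0; i < per_thread; ++i)
            insert_after(&head, &pool[offset + i]);
    };
    std::thread t1(worker, 0), t2(worker, per_thread);
    t1.join();
    t2.join();
    return count_consistent(&head);
}
```

Correctness comes from serializing every insertion, even those that touch unrelated parts of the list. That cost is what the TM version below tries to avoid.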


Example: Insert a node into a linked list (TM-based)
Thread 1 and Thread 2 both execute:
    __tm_atomic {
        new_node->prev = node;
        new_node->next = node->next;
        node->next->prev = new_node;
        node->next = new_node;
    }
Both threads execute in parallel. If they write to the same node, one transaction is aborted and retried.

What is Transactional Memory?
Definition: a sequence of memory operations that either executes completely and atomically, with no conflicts with other threads, or has no effect. Key mechanisms: log, commit/abort.

Quote from "Transactional Memory: Architectural Support for Lock-Free Data Structures", Maurice Herlihy and J. Eliot B. Moss:
A transaction is a finite sequence of machine instructions, executed by a single process, satisfying the following properties:
- Serializability: Transactions appear to execute serially, meaning that the steps of one transaction never appear to be interleaved with the steps of another. Committed transactions are never observed by different processors to execute in different orders.
- Atomicity: Each transaction makes a sequence of tentative changes to shared memory. When the transaction completes, it either commits, making its changes visible to other processes (effectively) instantaneously, or it aborts, causing its changes to be discarded.
Transactional memory provides the following primitive instructions for accessing memory:
- Load-transactional (LT) reads the value of a shared memory location into a private register.
- Load-transactional-exclusive (LTX) reads the value of a shared memory location into a private register, "hinting" that the location is likely to be updated.
- Store-transactional (ST) tentatively writes a value from a private register to a shared memory location. This new value does not become visible to other processors until the transaction successfully commits.
A transaction's read set is the set of locations read by LT, and its write set is the set of locations accessed by LTX or ST. Its data set is the union of the read and write sets.
Transactional memory also provides the following instructions for manipulating transaction state:
- Commit (COMMIT) attempts to make the transaction's tentative changes permanent. It succeeds only if no other transaction has updated any location in the transaction's data set, and no other transaction has read any location in this transaction's write set. If it succeeds, the transaction's changes to its write set become visible to other processes. If it fails, all changes to the write set are discarded. Either way, COMMIT returns an indication of success or failure.
- Abort (ABORT) discards all updates to the write set.
- Validate (VALIDATE) tests the current transaction status. A successful VALIDATE returns True, indicating that the current transaction has not aborted (although it may do so later). An unsuccessful VALIDATE returns False, indicating that the current transaction has aborted, and discards the transaction's tentative updates.
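The read-set/write-set bookkeeping behind LT, ST, VALIDATE, and COMMIT can be emulated in software. The sketch below is a toy model, not Herlihy and Moss's hardware design. Each hypothetical Word carries a version number, and a transaction remembers the version it first saw for every location in its data set. COMMIT succeeds only if none of those versions has changed. LTX is omitted (here it would behave like LT), and the second commit condition, that no other transaction has read this transaction's write set, is not modeled. A single global mutex stands in for the atomicity that hardware obtains from the cache-coherence protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// A shared word whose version is bumped on every committed write.
struct Word {
    long value = 0;
    std::uint64_t version = 0;
};

std::mutex commit_lock;  // serializes commits and version reads in this toy

// Software stand-in for the LT / ST / VALIDATE / COMMIT / ABORT primitives.
class Txn {
public:
    // LT: return our own tentative value if any, else read shared memory
    // and remember the version first observed.
    long LT(Word& w) {
        auto it = writes_.find(&w);
        if (it != writes_.end()) return it->second;
        std::lock_guard<std::mutex> g(commit_lock);
        seen_.emplace(&w, w.version);
        return w.value;
    }
    // ST: buffer a tentative write (invisible to others until COMMIT) and
    // record the location's version, since it is part of the data set.
    void ST(Word& w, long v) {
        {
            std::lock_guard<std::mutex> g(commit_lock);
            seen_.emplace(&w, w.version);
        }
        writes_[&w] = v;
    }
    // VALIDATE: true if nothing in the data set changed; otherwise the
    // tentative updates are discarded and false is returned.
    bool VALIDATE() {
        std::lock_guard<std::mutex> g(commit_lock);
        if (unchanged()) return true;
        discard();
        return false;
    }
    // COMMIT: publish buffered writes only if the data set is unchanged.
    bool COMMIT() {
        std::lock_guard<std::mutex> g(commit_lock);
        bool ok = unchanged();
        if (ok)
            for (auto& [w, v] : writes_) { w->value = v; ++w->version; }
        discard();
        return ok;
    }
    void ABORT() { discard(); }

private:
    bool unchanged() const {  // caller holds commit_lock
        for (const auto& [w, ver] : seen_)
            if (w->version != ver) return false;
        return true;
    }
    void discard() { seen_.clear(); writes_.clear(); }

    std::map<Word*, std::uint64_t> seen_;  // data set: location -> first version
    std::map<Word*, long> writes_;         // write buffer: location -> new value
};
```

As in the intended-use loop from the same paper, a caller whose VALIDATE or COMMIT fails should start a fresh transaction and retry.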

Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
Initial state: a: ver = 100, val = 10; b: ver = 200, val = 40; sum: ver = 105, val = 10. T1's log and T2's log are empty.
The log can be implemented as a "transactional cache", a dedicated chip, or a specific location in system memory, each with corresponding protocols.

Illustration: Contention Addressing (cont.)
T1 logs the version of a. T1's log: a.ver = 100. All values and versions are still as in the initial state.

Illustration: Contention Addressing (cont.)
T1's log: a.ver = 100, b.ver = 200, a.val: 10, b.val: 40
T2's log: a.ver = 100, b.ver = 200, sum: 10, sum.ver = 105
Before updating a.val, b.val, and sum, T1 and T2 log the versions and the data that are going to be overwritten. After logging, T1 compares the logged versions of a and b with the versions it read. Since there is no difference, T1 will commit its transaction.

Illustration: Contention Addressing (cont.)
T1 updates a.val and b.val and their versions: a: ver = 101, val = 20; b: ver = 201, val = 20. sum is unchanged (ver = 105, val = 10), and both logs are as in the previous step.

Illustration: Contention Addressing (cont.)
Before using a.val and b.val, T2 checks their versions. The current versions (101 and 201) differ from the versions in T2's log (100 and 200), so T2 aborts and restarts the transaction.

Illustration: Contention Addressing (cont.)
T2 re-reads the versions of a, b, and sum into its log: a.ver = 101, b.ver = 201, sum.ver = 105. There is no conflict, so T2 will commit the transaction.

Illustration: Contention Addressing (cont.)
Final state: a: ver = 101, val = 20; b: ver = 201, val = 20; sum: ver = 106, val = 40.
T2's log: a.ver = 101, b.ver = 201, sum.ver = 105.
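The walkthrough above can be replayed deterministically in a few lines of C++. This sketch hard-codes the interleaving from the slides in a single thread: T2 logs versions, T1 commits, then T2 validates, aborts, and retries. The type and function names are hypothetical, and a real STM would also have to make T2's validate-and-write step atomic with respect to other committers.

```cpp
#include <cassert>
#include <cstdint>

// A shared object with a value and a version, as in the slides' diagram.
struct Versioned {
    long val;
    std::uint64_t ver;
};

// T2's log: the versions of a and b observed when they were read.
struct ReadLog {
    std::uint64_t a_ver, b_ver;
};

ReadLog t2_log_versions(const Versioned& a, const Versioned& b) {
    return {a.ver, b.ver};
}

// T2 validates before using a and b; commits sum = a + b only if unchanged.
bool t2_try_commit(const ReadLog& log, const Versioned& a, const Versioned& b,
                   Versioned& sum) {
    if (a.ver != log.a_ver || b.ver != log.b_ver) return false;  // conflict: abort
    sum.val = a.val + b.val;
    ++sum.ver;
    return true;
}

// T1: a.val += 10; b.val -= 20; each committed write bumps the version.
void t1_commit(Versioned& a, Versioned& b) {
    a.val += 10; ++a.ver;
    b.val -= 20; ++b.ver;
}

// Replays the slides' interleaving; returns the number of attempts T2 needed.
int replay(Versioned& a, Versioned& b, Versioned& sum) {
    int attempts = 1;
    ReadLog log = t2_log_versions(a, b);     // T2 logs a.ver = 100, b.ver = 200
    t1_commit(a, b);                         // T1 commits: versions 101 / 201
    while (!t2_try_commit(log, a, b, sum)) { // first validation fails
        ++attempts;
        log = t2_log_versions(a, b);         // restart: re-read 101 / 201
    }
    return attempts;
}
```

Starting from a = {10, 100}, b = {40, 200}, and sum = {10, 105}, T2 needs two attempts and ends with sum.val = 40 and sum.ver = 106, matching the final slide.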

What is Transactional Memory? (cont.)
Languages are augmented with a new atomic construct. The lock-based code
    lock(L); x++; y++; unlock(L);
becomes
    __tm_atomic { x++; y++; }
The user specifies; the system implements "under the hood".
Common to all proposals:
- New languages (X10, Chapel)
- Extensions to existing languages (Java, C#, C/C++, ...)
Intended use of the primitives (Herlihy and Moss):
1. use LT or LTX to read from a set of locations,
2. use VALIDATE to check that the values read are consistent,
3. use ST to modify a set of locations, and
4. use COMMIT to make the changes permanent.
If either the VALIDATE or the COMMIT fails, the process returns to step 1.
TM is a concurrency control mechanism.

What is Transactional Memory? (cont.)
Atomicity:
- Strong: TM code is atomic as seen by both TM code and non-TM code.
- Weak: only conflicts between TM code and other TM code are detected.
Conflict detection (or validation):
- Eager: abort the transaction as soon as a conflict is detected.
- Lazy: detect conflicts only at the end of the transaction.
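The difference between eager and lazy conflict detection shows up in how much work a doomed transaction does before it aborts. In this single-threaded C++ sketch (all names are hypothetical), a transaction reads a row of versioned cells while a simulated concurrent commit bumps the version of the first cell. Eager detection re-validates the read set before each access, while lazy detection validates only at commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Cell {
    long val = 0;
    std::uint64_t ver = 0;
};

struct Outcome {
    bool committed;
    std::size_t reads_done;  // reads performed before the attempt ended
};

// Reads every cell in order. Right after reading index `interfere_after`,
// a simulated conflicting transaction commits a write to cells[0].
Outcome run(std::vector<Cell>& cells, std::size_t interfere_after, bool eager) {
    std::vector<std::uint64_t> seen;  // logged versions of cells[0 .. seen.size())
    auto valid = [&] {
        for (std::size_t i = 0; i < seen.size(); ++i)
            if (cells[i].ver != seen[i]) return false;
        return true;
    };
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (eager && !valid()) return {false, i};  // eager: abort at the next access
        seen.push_back(cells[i].ver);              // read cell i; log its version
        if (i == interfere_after) {                // simulated conflicting commit
            cells[0].val += 1;
            ++cells[0].ver;
        }
    }
    return {valid(), cells.size()};                // both schemes validate at commit
}
```

With five cells and the conflicting commit injected right after the first read, the eager run stops after one read, while the lazy run performs all five before discovering the conflict. In exchange for that early exit, eager detection pays for a validation pass on every access.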

What is Transactional Memory? (cont.)
Concurrent read operations:
- Basic locks do not permit multiple readers.
- Transactions automatically allow multiple concurrent readers.
Concurrent access to disjoint data:
- With locks, programmers have to perform fine-grain locking manually, which is difficult, error-prone, and not modular.
- Transactions automatically provide fine-grain locking.
Safe and scalable composition of software modules.

What is Transactional Memory? (cont.)
Overhead:
- The necessary log operations (load, update, store, and version check) take time.
- The code may need to be instrumented to monitor memory accesses for logging.
- The amount of synchronization in the application affects performance: if synchronization accounts for more than 20% of the application, the overhead is relatively large.

TM Implementation Requirements
- Support unbounded (in space and time) transactions: the programmer cannot reason about the size or duration of an atomic block.
- Good, consistent performance: no falling off a cliff.
- Integration with the language environment: tools, garbage collection, debugger, etc.
- Support for full TM semantics, such as nesting, which language constructs rely on.
- Flexible contention management.
Target: an Unbounded Transactional Memory (UTM) system.

Agenda
- TM Basics
- Playing with Software TM: Language extensions, Compiler/library support
- Summary

TM Language Extensions
Language extensions (keywords):
    __tm_atomic { statements }
Function attribute extensions (Windows* / Linux*):
    __declspec(tm_callable)  /  __attribute__((tm_callable))
    __declspec(tm_unknown)   /  __attribute__((tm_unknown))
    __declspec(tm_pure)      /  __attribute__((tm_pure))
    __declspec(tm_seh)       /  (none)
TM language extensions are still evolving.

Examples
    __declspec(tm_pure) void func1();
    void func2();
    __declspec(tm_callable) void func3() { func1(); }

    int foo(int arg) {
        __tm_atomic {
            func3();
            a = b + 10;
            func1();
        }
        func2();
        return a;
    }
These examples only illustrate the use of the annotations; see the backup slides for details.

Class-Level Attributes
    __declspec(tm_callable) class foo {
        int func1();
        int func2();
        virtual void func3();
    };
is equivalent to
    class foo {
        __declspec(tm_callable) int func1();
        __declspec(tm_callable) int func2();
        __declspec(tm_callable) virtual void func3();
    };

Class-Level Attributes (cont.)
    __declspec(tm_callable) class foo {
        int func1();
        __declspec(tm_pure) int func2();
        __declspec(tm_callable) int func3();
        __declspec(tm_unknown) virtual void func4();
    };
is equivalent to
    class foo {
        __declspec(tm_callable) int func1();
        __declspec(tm_pure) int func2();
        __declspec(tm_callable) int func3();
        __declspec(tm_unknown) virtual void func4();
    };

Exception Handling
Exceptions that escape an atomic region cause the atomic region to commit.
    void func(void) {
        __tm_atomic {
            try {
                stmt1;
                if (...) throw ...;
            } catch (...) {
                ...
            }
            stmt3;
        }
    }

    void func(void) {
        try {
            __tm_atomic {
                stmt1;
                if (...) throw ...;
            }
        } catch (...) {
            ...
        }
    }
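The commit-on-escape rule can be mimicked with an ordinary C++ wrapper. In this sketch, a hypothetical atomic_block runs a body against a private write buffer. It publishes the buffer whether the body returns normally or throws, then lets the exception continue. Conflict detection and retry are left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>

using WriteBuffer = std::map<int*, int>;  // location -> tentative value

// Runs `body` with a private write buffer. Mirroring the rule above, the
// buffered writes are committed both on normal exit and when an exception
// escapes; in the latter case the exception is rethrown after the commit.
void atomic_block(const std::function<void(WriteBuffer&)>& body) {
    WriteBuffer buf;
    try {
        body(buf);
    } catch (...) {
        for (auto& [p, v] : buf) *p = v;  // commit, then let the exception escape
        throw;
    }
    for (auto& [p, v] : buf) *p = v;      // normal commit
}
```

In a body that writes to a and then throws, the write to a is visible after the exception reaches the caller.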

Software TM (STM)
Software implementation of transactional memory.
Model:
- Unbounded TM (UTM)
- Closed nesting
- Weak atomicity
Memory accesses inside a tm_region are instrumented.
Functions are cloned for TM and non-TM calls; only TM-callable functions are instrumented.
Each tm_region is duplicated with single-lock access control: if the number of retries exceeds a threshold, the region executes in single-lock mode. The single-lock code path is not instrumented for loads/stores.
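The retry-then-fall-back policy can be sketched as follows. The optimistic attempt and the single-lock fallback are passed in as callables; run_with_fallback, single_lock, and RunResult are hypothetical names. In a real STM, the optimistic path must also cooperate with the lock (for example, by aborting while it is held). This sketch leaves that out.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

std::mutex single_lock;  // global lock for the uninstrumented fallback path

struct RunResult {
    int attempts;        // optimistic attempts made
    bool used_fallback;  // true if the single-lock path ran
};

// Tries `attempt` (returns true on commit, false on abort) up to max_retries
// times. If every attempt aborts, runs `fallback` once under single_lock.
RunResult run_with_fallback(const std::function<bool()>& attempt,
                            const std::function<void()>& fallback,
                            int max_retries) {
    for (int i = 1; i <= max_retries; ++i)
        if (attempt()) return {i, false};
    std::lock_guard<std::mutex> guard(single_lock);
    fallback();
    return {max_retries, true};
}
```

The threshold trades wasted speculative work against serialization. A low limit gives up on optimism quickly, while a high one keeps retrying under heavy contention.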

Software TM Support
Compiler:
- Load/store instrumentation
- Function cloning
- Switch to single-lock mode when calling a function that is not TM-aware
Library:
- Track memory accesses
- Detect memory conflicts and retry if conflicts occur
- Switch to single-lock mode if retries exceed a limit

Software TM Compiler: Function Call
Inside a TM region:
- Determine whether the callee is a TM function (runtime check for indirect function calls).
- If it is a TM function, call its corresponding transactional version.
- If it is not a TM function (e.g., a legacy library), restart the transaction in single-lock mode, or use binary translation.
Outside a TM region: regular call.

Sample Code Generation
    __tm_atomic { a = b + 10; }

Sample Code Generation (cont.)
    descriptor = getTransactionAndMementoSize(&size)
    memento = alloc(size)
    initializeTransaction(descriptor, memento, location)
    Label:
    action = startTransaction(descriptor, tm_mode)
    if (action & restoreLiveVariables)
        liveX = saved_liveX
    if (action & retryTransaction)
        goto Label
    else if (action & saveLiveVariables)
        saved_liveX = liveX
    if (action & InstrumentedCode) {
        reg1 = ReadInteger(descriptor, &b)
        reg2 = reg1 + 10
        WriteInteger(descriptor, &a, reg2)
    } else {
        a = b + 10      // uninstrumented path
    }
    commitTransaction(descriptor)
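The instrumented branch of the generated code can be imitated by hand. In this C++ sketch, the function names ReadInteger, WriteInteger, and commitTransaction are borrowed from the pseudocode above. The Descriptor type and all function bodies are assumptions: a toy write buffer without conflict detection, not the Intel runtime's actual implementation.

```cpp
#include <cassert>
#include <map>

// Hypothetical transaction descriptor: just a private write buffer here.
struct Descriptor {
    std::map<int*, int> writes;
};

// Read barrier: return a value this transaction already wrote, otherwise
// read shared memory (a real STM would also log the read for validation).
int ReadInteger(Descriptor& d, int* addr) {
    auto it = d.writes.find(addr);
    return it != d.writes.end() ? it->second : *addr;
}

// Write barrier: buffer the store; shared memory is untouched until commit.
void WriteInteger(Descriptor& d, int* addr, int value) {
    d.writes[addr] = value;
}

// Publishes buffered writes (validation omitted in this sketch).
void commitTransaction(Descriptor& d) {
    for (auto& [addr, value] : d.writes) *addr = value;
    d.writes.clear();
}

// Hand-written instrumented path for: __tm_atomic { a = b + 10; }
void instrumented_region(Descriptor& d, int* a, int* b) {
    int reg1 = ReadInteger(d, b);
    int reg2 = reg1 + 10;
    WriteInteger(d, a, reg2);
}
```

Until commitTransaction runs, the new value of a exists only in the descriptor, which is what lets an aborted attempt be discarded cheaply.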

STM Compiler: TM-Callable Function
    __tm_callable foo() {....}

    _foo:          jmp _foo$_@nonTXN        ; calls from outside a TM region
    _foo+4:        mov eax, -25264136
                   jmp _foo$_@TXN
    _foo$_@nonTXN: code for foo
    _foo$_@TXN:    instrumented code for foo ; calls from a TM region

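STM Compiler: Function Call
Source:
    __declspec(tm_callable) void foo();
    void (*fptr)();
    void bar() {
        foo();
        __tm_atomic {
            foo();
            fptr();
        }
    }
Generated code:
    void bar() {
        _foo._$nonTXN();
        __tm_atomic {
            _foo._$TXN();
            if (*(fptr + 3) == -25264136)   // marker found: fptr is TM-callable
                (*(fptr + 7))();            // call its transactional entry
            else {
                changeToIrrevocable();      // not a TM function
                (*fptr)();
            }
        }
    }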
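Without the binary marker that the compiler plants at a function's entry, a portable sketch can use a lookup table that maps a function's normal entry point to its transactional clone. The table, the clone names, and the trace string below are hypothetical. They only show the decision the generated code makes for an indirect call inside a TM region: call the TXN clone if one exists, otherwise switch to irrevocable mode and call the original.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Fn = void (*)();

// Hypothetical stand-in for the compiler's entry-point marker.
struct CloneEntry {
    Fn plain;  // normal (non-TXN) entry point
    Fn txn;    // transactional (instrumented) clone
};
std::vector<CloneEntry> clone_table;
std::string trace;  // records which path ran, for demonstration only

void foo_nonTXN() { trace += "foo;"; }
void foo_TXN() { trace += "foo$TXN;"; }  // would use read/write barriers
void legacy() { trace += "legacy;"; }    // no transactional clone
void changeToIrrevocable() { trace += "irrevocable;"; }

Fn find_clone(Fn f) {
    for (const CloneEntry& e : clone_table)
        if (e.plain == f) return e.txn;
    return nullptr;
}

// Indirect call made from inside a TM region.
void call_in_txn(Fn f) {
    if (Fn clone = find_clone(f)) {
        clone();                // TM-aware callee: run the instrumented clone
    } else {
        changeToIrrevocable();  // unknown callee: stop speculating first
        f();
    }
}
```

Outside a TM region, the caller just invokes the original function directly, matching the "regular call" case.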


Compiler Support for TM
- Support the TM language extensions
- Support various architectures: IA-32 architecture, Intel® 64
- Support various operating systems: Windows®, Linux®
- Enable all existing optimizations: IPO, constant propagation, dead code elimination, ...
- Support legacy code
- Optimize TM-specific code: RaR, RaW, WaW, ...
- Support exception handling with TM: C++ exception handling, structured exception handling (Windows®)

Summary
Transactional memory is important for managing concurrency:
- Eliminates deadlocks
- Makes atomic regions easy to compose
- Provides fine-grained concurrency
- Prevents unnecessary, conservative locking
Transactional memory can be implemented in software or hardware.

Backup

Language Extension: __tm_atomic
Syntax: __tm_atomic { statements }
Semantics: in the example below, stmt1 and stmt2 can be executed with automatic concurrency control if foo is called by multiple threads. When the TM runtime detects a memory conflict, it rolls back and re-executes transaction A.
    void foo(void) {
        __tm_atomic {   // Transaction A
            stmt1;
            stmt2;
        }
    }
- Transaction nesting is allowed.
- Single entry, multiple exits; behavior is undefined for a multi-entry TM region.

Semantics: A TM region is a code block that can be executed in parallel with automatic concurrency control. Each memory read/write in a TM region is a TM read/write, supported by the compiler and the runtime library. If the TM runtime detects conflicting TM reads/writes during execution, before they are committed, it aborts and retries from the beginning of the TM region. It does this by rolling back to the saved machine state and program state.

Two transactions in two threads of a process are isolated from each other: neither can see the other's intermediate values of memory operands. Execution behaves as if all operations in one thread complete before, or after, all operations in the other thread. However, there is no similar guarantee of isolation between transactional code in one thread and non-transactional code in another thread.

A transaction specified by __tm_atomic can be nested inside another transaction. The effects of a nested transaction become visible only when the outermost transaction commits; this is often called "closed nesting". On a data conflict, the runtime may roll back to any level of the transaction nest, including the outermost transaction, and re-execute the transaction.

Rules:
- The transactional memory (TM) region must be a single-entry code section. It may have more than one exit.
- If the transactional region has multiple entries, program behavior is implementation dependent. The native compiler issues error messages whenever this implementation dependency is statically detectable.
- continue, break, return, and goto statements whose target is a loop, switch, or if statement outside the __tm_atomic region are permitted. They imply a commit of the __tm_atomic region, and the compiler issues a warning.
- In Intel® C++ STM Compiler Prototype Edition 2.0, __tm_atomic supports legacy functions and treats them as irrevocable operations, that is, operations whose side effects cannot be rolled back. A transaction that executes an irrevocable operation is guaranteed to commit without rolling back. Thus, besides functions declared with __declspec(tm_pure) or __declspec(tm_callable), functions declared with __declspec(tm_unknown) or without any TM attribute can also be called inside __tm_atomic. (TM attributes are described later.)
- TM annotations are supported for C++ classes, template member functions, and virtual functions, as is class-level TM attribute inheritance.
- Transactional versions of malloc, calloc, realloc, and free are supported.

Examples: __tm_atomic (1)
    // stmt1 and stmt2 execute with automatic concurrency control
    // if foo is called by multiple threads. When a memory conflict is
    // detected by the TM runtime, it rolls back and re-executes
    // transaction A.
    void foo(void) {
        __tm_atomic {   // Transaction A
            stmt1;
            stmt2;
        }
    }

Examples: __tm_atomic (2)
    // Nested transaction
    void foo(int *x, int *y) {
        __tm_atomic {             // Transaction A
            *x = *x + 1;
            __tm_atomic {         // Transaction B
                *y = *y + *x;
            }   // commit B, or if memory conflicts are detected,
                // roll back and re-execute B (or A)
        }       // commit A, or if memory conflicts are detected,
                // roll back and re-execute A
    }
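Closed nesting can be modeled with a stack of write buffers. Each nesting level buffers its own writes, an inner commit merges them into the parent, and only the outermost commit publishes to memory. This C++ sketch replays the nested example above with *x = 1 and *y = 10. The NestedTxn class is hypothetical, and conflict detection is omitted.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

class NestedTxn {
public:
    void begin() { levels_.emplace_back(); }  // enter a (possibly nested) transaction

    // Newest tentative value wins; otherwise read shared memory.
    int read(int* p) const {
        for (auto it = levels_.rbegin(); it != levels_.rend(); ++it) {
            auto f = it->find(p);
            if (f != it->end()) return f->second;
        }
        return *p;
    }

    void write(int* p, int v) { levels_.back()[p] = v; }

    // Inner commit: merge into the parent. Outermost commit: publish.
    void commit() {
        std::map<int*, int> top = std::move(levels_.back());
        levels_.pop_back();
        for (const auto& [p, v] : top) {
            if (levels_.empty()) *p = v;
            else levels_.back()[p] = v;
        }
    }

    // Roll back only the innermost level (partial rollback).
    void abort_innermost() { levels_.pop_back(); }

private:
    std::vector<std::map<int*, int>> levels_;  // one write buffer per level
};
```

After transaction B commits, x and y in memory are still 1 and 10. Only the commit of A makes x = 2 and y = 12 visible.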

Examples: __tm_atomic (3) break/continue semantics
    void foo(int *x, int *y) {
        for (int k = 0; k < 100; k++)
            __tm_atomic {              // Transaction A
                if (cond) {
                    *x = *x + 1;
                    continue;
                } else if (test) {
                    break;
                }
            }   // continue/break in transaction A jumps out of transaction A
                // and commits transaction A
    }

    void foo(int *x, int *y) {
        __tm_atomic {                  // Transaction A
            for (int k = 0; k < 100; k++) {
                stmt1;
                if (*x) return;        // leaves transaction A and commits it
            }
            switch (cond) {
            case 1: stmt2; break;
            default: break;
            }
        }   // these break statements do not jump out of transaction A
    }

Function Attribute Extensions
- __declspec(tm_callable) / __attribute__((tm_callable)): function that can be called inside transactions
- __declspec(tm_waiver) / __attribute__((tm_waiver)): enables an irrevocable function to be revocable
- __declspec(tm_only) / __attribute__((tm_only)): function called only inside transactions; works in conjunction with tm_callable
- __declspec(tm_seh) (Windows* only): enables structured exception handling for a function with an atomic region

Specifying a Function as tm_callable
Syntax:
- Windows* OS: annotate a function as tm_callable using __declspec:
    __declspec(tm_callable) function-declaration-statement
- Linux* OS: annotate a function as tm_callable using __attribute__:
    __attribute__((tm_callable)) function-declaration-statement
Semantics: a cloned TM version of the function is generated, with read/write barrier functions, for use inside transactions.

When __declspec(tm_callable) is used, its associated function is annotated as TM-callable, so the compiler generates a cloned TM version of the function. Within the clone, the compiler translates each memory read and write into a TM read barrier function and a TM write barrier function, with support from the software runtime library. The cloned TM version, with TM reads/writes, is used inside transactions; the original function, with normal reads/writes, is used outside transactions.

Rules: __tm_atomic allows a transaction to call tm_callable functions, and tm_callable functions may call other tm_callable and tm_pure functions. A tm_callable function may also contain irrevocable operations and legacy function calls.

Example: tm_callable
    __declspec(tm_callable) void UserFoo(int);
    void UserGoo(float);   // implicitly annotated with tm_unknown

    void func(void) {
        __tm_atomic {
            UserFoo(100);    // legal use
            UserGoo(128.8);  // switches to irrevocable mode
        }
    }