Software Transactional Memory

Presentation on theme: "Software Transactional Memory"— Presentation transcript:

1 Software Transactional Memory
Session: Software Transactional Memory

2 Agenda
TM Basics
Playing with Software TM
Summary
11/19/2018

3 Agenda
TM Basics
  Introduction
  Working principles
  Implementation Requirements
Playing with Software TM
Summary

4 Introduction
Multi-core processors are now mainstream
Performance increases have to come from parallelism; the free lunch of performance is over
Traditionally, programmers use locks to write parallel programs
Lock-based synchronization has known problems: deadlocks, fine-grain parallelism, composition
Transactional memory avoids these problems and eases parallel programming

Lock-based programming has a number of well-known problems that frequently arise in practice:
  Locks require thinking about overlapping operations and partial operations in distantly separated and seemingly unrelated sections of code, a task that is difficult and error-prone for programmers.
  Locks require programmers to adopt a locking policy to prevent deadlock, livelock, and other failures to make progress. Such policies are often informally enforced and fallible, and when these issues arise they are insidiously difficult to reproduce and debug.
  Locks can lead to priority inversion, where a high-priority thread is forced to wait on a low-priority thread that holds exclusive access to a resource it needs.
The goal is to support lock-free infrastructure so that programmers can develop parallel, reusable encapsulations that others can use without risk of deadlock.

5 Example: Insert Node into a linked list
{
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}

6 Example: Insert Node into a linked list
Thread 1 and Thread 2 both execute:
{
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}

7 Example: Insert Node into a linked list
Lock Based
Thread 1 and Thread 2 both execute:
{
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}

8 Example: Insert Node into a linked list
Lock Based
Thread 1 gets the lock and executes:
Lock {
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}
Thread 2 waits for the lock to be released:
Lock {
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}

9 Example: Insert Node into a linked list
TM Based
Thread 1 and Thread 2 both execute:
__tm_atomic {
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}

10 Example: Insert Node into a linked list
TM Based
Thread 1:
__tm_atomic {
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}
Thread 2:
__tm_atomic {
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}
Both threads execute in parallel. If they write to the same node, one thread is aborted and its transaction is retried.
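For comparison with the TM-based slides, here is a minimal sketch of the lock-based variant in standard C++. The `Node` type, the global `list_lock` and the function name `insert_after` are assumptions made for illustration; the slides only show the four pointer updates.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical node type; the slides only show the prev/next fields.
struct Node {
    int value = 0;
    Node* prev = nullptr;
    Node* next = nullptr;
};

// One global lock guards the whole list (coarse-grained locking).
std::mutex list_lock;

// Insert new_node right after node. Like the slide code, this assumes
// node->next is non-null (for example a circular list with a sentinel).
void insert_after(Node* node, Node* new_node) {
    std::lock_guard<std::mutex> guard(list_lock);
    assert(node != nullptr && node->next != nullptr);
    new_node->prev = node;
    new_node->next = node->next;
    node->next->prev = new_node;
    node->next = new_node;
}
```

With a single lock, two threads inserting at unrelated positions still serialize, which is the cost the TM-based version tries to avoid.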

11 What is Transactional Memory?
Transactional memory by definition: a sequence of memory operations that either executes completely and atomically, with no conflicts with other threads, or has no effect
Log
Commit/Abort

Quote from “Transactional Memory: Architectural Support for Lock-Free Data Structures”, Maurice Herlihy and J. Eliot B. Moss:
A transaction is a finite sequence of machine instructions, executed by a single process, satisfying the following properties:
  Serializability: Transactions appear to execute serially, meaning that the steps of one transaction never appear to be interleaved with the steps of another. Committed transactions are never observed by different processors to execute in different orders.
  Atomicity: Each transaction makes a sequence of tentative changes to shared memory. When the transaction completes, it either commits, making its changes visible to other processes (effectively) instantaneously, or it aborts, causing its changes to be discarded.
Transactional memory provides the following primitive instructions for accessing memory:
  Load-transactional (LT) reads the value of a shared memory location into a private register.
  Load-transactional-exclusive (LTX) reads the value of a shared memory location into a private register, “hinting” that the location is likely to be updated.
  Store-transactional (ST) tentatively writes a value from a private register to a shared memory location. This new value does not become visible to other processors until the transaction successfully commits.
A transaction’s read set is the set of locations read by LT, and its write set is the set of locations accessed by LTX or ST. Its data set is the union of the read and write sets.
Transactional memory also provides the following instructions for manipulating transaction state:
  Commit (COMMIT) attempts to make the transaction’s tentative changes permanent. It succeeds only if no other transaction has updated any location in the transaction’s data set, and no other transaction has read any location in this transaction’s write set. If it succeeds, the transaction’s changes to its write set become visible to other processes. If it fails, all changes to the write set are discarded. Either way, COMMIT returns an indication of success or failure.
  Abort (ABORT) discards all updates to the write set.
  Validate (VALIDATE) tests the current transaction status. A successful VALIDATE returns True, indicating that the current transaction has not aborted (although it may do so later). An unsuccessful VALIDATE returns False, indicating that the current transaction has aborted, and discards the transaction’s tentative updates.

12 Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
a: ver = 100, val = 10
b: ver = 200, val = 40
sum: ver = 105, val = 10
T1’s log: (empty)
T2’s log: (empty)
The log can be implemented as a “transactional cache”, a dedicated chip, or a specific location in system memory with corresponding protocols.

13 Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
a: ver = 100, val = 10
b: ver = 200, val = 40
sum: ver = 105, val = 10
T1’s log: a.ver = 100
T2’s log: (empty)

14 Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
a: ver = 100, val = 10
b: ver = 200, val = 40
sum: ver = 105, val = 10
T1’s log: a.ver = 100, b.ver = 200, a.val: 10, b.val: 40
T2’s log: a.ver = 100, b.ver = 200, sum: 10, sum.ver = 105
Before updating a.val, b.val and sum, T1 and T2 log the data that will be overwritten, together with the versions.
After logging, T1 checks the logged versions of a and b against their current versions. There is no difference, so T1 will commit the transaction.

15 Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
a: ver = 101, val = 20
b: ver = 201, val = 20
sum: ver = 105, val = 10
T1’s log: a.ver = 100, b.ver = 200, a.val: 10, b.val: 40
T2’s log: a.ver = 100, b.ver = 200, sum: 10, sum.ver = 105
T1 updates a.val, b.val and the versions.

16 Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
a: ver = 101, val = 20
b: ver = 201, val = 20
sum: ver = 105, val = 10
T1’s log: a.ver = 100, b.ver = 200, a.val: 10, b.val: 40
T2’s log: a.ver = 100, b.ver = 200, sum: 10, sum.ver = 105
Before using a.val and b.val, T2 checks the versions. Since they differ from the logged versions, T2 aborts and restarts the transaction.

17 Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
a: ver = 101, val = 20
b: ver = 201, val = 20
sum: ver = 105, val = 10
T1’s log: a.ver = 100, b.ver = 200, a.val: 10, b.val: 40
T2’s log: a.ver = 101, b.ver = 201, sum: 10, sum.ver = 105
T2 re-reads and logs the versions of a, b and sum. It will commit the transaction since there is no conflict.

18 Illustration: Contention Addressing
Thread T1: atomic { a.val += 10; b.val -= 20; }
Thread T2: atomic { sum = a.val + b.val; }
a: ver = 101, val = 20
b: ver = 201, val = 20
sum: ver = 106, val = 40
T1’s log: a.ver = 100, b.ver = 200, a.val: 10, b.val: 40
T2’s log: a.ver = 101, b.ver = 201, sum.ver = 105
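The walkthrough on slides 12 to 18 can be replayed on a single thread to make the version checks concrete. This is only a sketch under stated assumptions: the `Versioned` and `ReadEntry` types and the functions `snapshot`, `validate`, `commit_write` and `replay` are hypothetical names, and the interleaving is fixed by hand rather than produced by real threads.

```cpp
#include <cassert>
#include <vector>

// A shared object with a value and a version, as drawn on the slides.
struct Versioned {
    int val;
    int ver;
};

// Read-log entry: which object was read and the version seen at that time.
struct ReadEntry {
    const Versioned* obj;
    int seen_ver;
};

// Log the current versions of a and b.
std::vector<ReadEntry> snapshot(const Versioned& a, const Versioned& b) {
    return {{&a, a.ver}, {&b, b.ver}};
}

// True if every logged version still matches the object's current version.
bool validate(const std::vector<ReadEntry>& log) {
    for (const ReadEntry& e : log)
        if (e.obj->ver != e.seen_ver) return false;
    return true;
}

// Publish a new value and bump the version.
void commit_write(Versioned& obj, int new_val) {
    obj.val = new_val;
    obj.ver += 1;
}

// Replays the slide interleaving; returns how many times T2 ran its body.
int replay(Versioned& a, Versioned& b, Versioned& sum) {
    std::vector<ReadEntry> t1_log = snapshot(a, b);  // slides 13-14
    std::vector<ReadEntry> t2_log = snapshot(a, b);  // slide 14
    int t1_a = a.val + 10;                           // T1's tentative results
    int t1_b = b.val - 20;

    if (validate(t1_log)) {                          // slide 15: T1 commits
        commit_write(a, t1_a);
        commit_write(b, t1_b);
    }

    int attempts = 1;
    while (!validate(t2_log)) {                      // slide 16: stale versions
        t2_log = snapshot(a, b);                     // slide 17: abort, re-log
        ++attempts;
    }
    commit_write(sum, a.val + b.val);                // slide 18: T2 commits
    return attempts;
}
```

Starting from the slide values (a = 10 at version 100, b = 40 at version 200, sum at version 105), T2 needs two attempts and sum ends at 40 with version 106.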

19 What is Transactional Memory? (cont.)
Languages are augmented with a new atomic construct. The lock-based code
  lock(L);
  x++;
  y++;
  unlock(L);
becomes
  __tm_atomic {
    x++;
    y++;
  }
The user specifies, the system implements “under the hood”
Common to all proposals
  New languages (X10, Chapel)
  Extensions to existing languages (Java, C#, C/C++, …)
Intended use
  1. Use LT or LTX to read from a set of locations.
  2. Use VALIDATE to check that the values read are consistent.
  3. Use ST to modify a set of locations.
  4. Use COMMIT to make the changes permanent.
  If either the VALIDATE or the COMMIT fails, the process returns to step 1.
TM is a concurrency control mechanism
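The four-step "intended use" loop has the same shape as a compare-and-swap retry loop, which standard C++ can express directly. The sketch below is an analogy, not an implementation of LT/ST/COMMIT: `std::atomic<int>::compare_exchange_weak` covers a single word only, and the function name `add_with_retry` is an assumption.

```cpp
#include <atomic>
#include <cassert>

// Atomically adds delta to cell using read / compute / commit / retry.
// compare_exchange_weak stands in for VALIDATE + COMMIT on one word.
int add_with_retry(std::atomic<int>& cell, int delta) {
    int seen = cell.load();                // step 1: read (like LT)
    for (;;) {
        int desired = seen + delta;        // step 3: tentative value (like ST)
        // steps 2 and 4: succeeds only if cell still holds `seen`
        if (cell.compare_exchange_weak(seen, desired)) return desired;
        // on failure, `seen` now holds the current value; go back to step 1
    }
}
```

A transactional memory generalizes this pattern from one word to an arbitrary read and write set.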

20 What is Transactional Memory? (cont.)
Atomicity
  Strong: TM code is atomic as seen by both TM code and non-TM code
  Weak: Only conflicts between TM code and other TM code are detected
Conflict detection (or validation)
  Eager: Abort the transaction as soon as a conflict is detected
  Lazy: Check for conflicts only at the end of the transaction

21 What is Transactional Memory? (cont.)
Concurrent read operations
  Basic locks do not permit multiple readers
  Transactions automatically allow multiple concurrent readers
Concurrent access to disjoint data
  With locks, programmers have to perform fine-grain locking manually
    Difficult and error prone
    Not modular
  Transactions automatically provide fine-grain locking
Safe and scalable composition of software modules

22 What is Transactional Memory? (cont.)
Overhead
  The necessary log operations (load, update, store and version check) take time
  The code may need instrumentation to monitor memory accesses for logging
  The amount of synchronization within the application can impact performance
    If synchronization amounts to more than 20% of the application, the overhead is relatively large

23 TM Implementation Requirements
Support unbounded (space and time) transactions
  Programmers cannot reason about the size or duration of an atomic block
Good, consistent performance
  No falling off the cliff
Integrated with the language environment
  Tools, garbage collection, debugger, etc.
Support full TM semantics such as nesting
  Language constructs use these semantics
Flexible contention management
An Unbounded Transactional Memory (UTM) system

24 Agenda
TM Basics
Playing with Software TM
  Language Extensions
  Compiler/Library Support
Summary

25 TM Language Extensions
Language Extensions (keywords)
  __tm_atomic { statements }
Function Attribute Extensions (Windows* / Linux*)
  __declspec(tm_callable) / __attribute__((tm_callable))
  __declspec(tm_unknown) / __attribute__((tm_unknown))
  __declspec(tm_pure) / __attribute__((tm_pure))
  __declspec(tm_seh) / none
TM Language Extensions are still evolving

26 Examples
int foo(int arg) {
  __tm_atomic {
    func3();
    a = b + 10;
    func1();
  }
  func2();
  return(a);
}
__declspec(tm_pure) void func1();
void func2();
__declspec(tm_callable) void func3() { func1(); }
This only illustrates the use of annotations; see the backup slides for details.

27 Class Level attributes
__declspec(tm_callable) class foo {
  int func1();
  int func2();
  virtual void func3();
}
Equivalent to:
class foo {
  __declspec(tm_callable) int func1();
  __declspec(tm_callable) int func2();
  __declspec(tm_callable) virtual void func3();
}
This only illustrates the use of annotations; see the backup slides for details.

28 Class Level attributes
__declspec(tm_callable) class foo {
  int func1();
  __declspec(tm_pure) int func2();
  __declspec(tm_callable) int func3();
  __declspec(tm_unknown) virtual void func4();
}
Equivalent to:
class foo {
  __declspec(tm_callable) int func1();
  __declspec(tm_pure) int func2();
  __declspec(tm_callable) int func3();
  __declspec(tm_unknown) virtual void func4();
}
This only illustrates the use of annotations; see the backup slides for details.

29 Exception Handling
Exception caught inside the atomic region:
void func(void) {
  __tm_atomic {
    try {
      stmt1;
      if (..) throw();
    } catch(…) {
      …………
    }
    stmt3;
  }
}
Exception escaping the atomic region:
void func(void) {
  try {
    __tm_atomic {
      stmt1;
      if (…) throw();
    }
  } catch(…) {
    ………
  }
}
Exceptions that escape an atomic region cause the atomic region to commit.
This only illustrates the use of annotations; see the backup slides for details.

30 Software TM (STM)
Software implementation of transactional memory
Model
  Unbounded TM (UTM)
  Closed nesting
  Weak atomicity
Memory accesses are instrumented in the tm_region
Functions are cloned for TM and non-TM calls
  Only TM-callable functions are instrumented
The tm_region is duplicated with single-lock access control
  If the number of retries exceeds a threshold, execute in single-lock mode
  The single-lock code path is not instrumented for loads/stores

31 Software TM Supports
Compiler
  Load/Store instrumentation
  Function cloning
  Switch to single lock when calling a function that is not aware of TM
Library
  Track memory accesses
  Detect memory conflicts and retry if conflicts occur
  Switch to single lock if the number of retries exceeds the limit
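The library policy of retrying and then switching to a single lock can be sketched as a small driver function. Everything here is hypothetical: `run_atomic`, `global_fallback_lock` and the callback signatures are illustrative names, and the sketch leaves out what a real runtime must also do, namely keep optimistic transactions from running while another thread is in single-lock mode.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical single lock used for the serial fallback path.
std::mutex global_fallback_lock;

// Calls `attempt` until it reports a successful commit (returns true).
// After `max_retries` failed attempts, runs `serial` under the global lock.
// Returns the number of failed optimistic attempts.
int run_atomic(const std::function<bool()>& attempt,
               const std::function<void()>& serial,
               int max_retries) {
    assert(max_retries >= 0);
    for (int failures = 0; failures < max_retries; ++failures) {
        if (attempt()) return failures;   // committed optimistically
    }
    // Too many conflicts: run the uninstrumented body in single-lock mode.
    std::lock_guard<std::mutex> guard(global_fallback_lock);
    serial();
    return max_retries;
}
```

The fallback bounds the work wasted on repeated aborts, at the price of serializing the region while the lock is held.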

32 Software TM Compiler: Function Call
Inside TM Region
  Determine whether the callee is a TM function
    Runtime check for indirect function calls
  If it is a TM function, call the corresponding transactional version
  If it is not a TM function (legacy library)
    Restart the transaction in single-lock mode
    Use binary translation
Outside TM Region
  Regular call

33 Sample Code Generation
__tm_atomic { a = b + 10; }

34 Sample Code Generation (cont.)
descriptor = getTransactionAndMementoSize(&size)
memento = alloc(size)
initializeTransaction(descriptor, memento, location)
Label:
action = startTransaction(descriptor, tm_mode)
if (action & restoreLiveVariables)
  liveX = saved_liveX
if (action & retryTransaction)
  goto Label
else if (action & saveLiveVariables)
  saved_liveX = liveX
if (action & InstrumentedCode) {
  reg1 = ReadInteger(descriptor, &b)
  reg2 = reg1 + 10
  WriteInteger(descriptor, &a, reg2)
} else {
  a = b + 10
}
commitTransaction(descriptor)
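To show what the read and write barriers in the generated code might do, here is a toy version that buffers writes until commit. The names `ReadInteger`, `WriteInteger` and `commitTransaction` come from the slide, but their bodies, the `Descriptor` layout, the reference-based signatures and `abortTransaction` are assumptions; conflict detection is left out entirely.

```cpp
#include <cassert>
#include <map>

// Toy descriptor: holds pending writes. The real runtime's layout is not
// shown on the slides, so this structure is an assumption.
struct Descriptor {
    std::map<int*, int> write_buffer;
};

// Read barrier: return this transaction's own pending write if there is one.
int ReadInteger(Descriptor& d, int* addr) {
    assert(addr != nullptr);
    auto it = d.write_buffer.find(addr);
    return it != d.write_buffer.end() ? it->second : *addr;
}

// Write barrier: record the store without touching shared memory.
void WriteInteger(Descriptor& d, int* addr, int value) {
    assert(addr != nullptr);
    d.write_buffer[addr] = value;
}

// Commit: publish buffered writes (conflict checks omitted in this sketch).
void commitTransaction(Descriptor& d) {
    for (const auto& entry : d.write_buffer) *entry.first = entry.second;
    d.write_buffer.clear();
}

// Abort: drop buffered writes so memory is left unchanged.
void abortTransaction(Descriptor& d) { d.write_buffer.clear(); }

// Instrumented form of: __tm_atomic { a = b + 10; }
void atomic_a_equals_b_plus_10(Descriptor& d, int& a, int& b) {
    int reg1 = ReadInteger(d, &b);
    int reg2 = reg1 + 10;
    WriteInteger(d, &a, reg2);
    commitTransaction(d);
}
```

Buffering writes makes abort cheap because memory is untouched until commit; a runtime that writes in place and keeps an undo log makes the opposite trade.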

35 STM Compiler: TM Callable function
__tm_callable foo() {….} _foo : jmp  Calls outside TM Region _foo+4 : mov eax, jmp : code for foo :  Calls from TM Region instrumented code for foo

36 STM Compiler: Function Call
Source:
__declspec(tm_callable) void foo();
void (*fptr)();
bar() {
  foo();
  __tm_atomic {
    fptr();
  }
}
Generated:
bar() {
  _foo._$nonTXN()
  __tm_atomic {
    _foo._$TXN();
    if (*(fptr + 3) == )
      (*(fptr + 7))();
    else {
      changeToIrrevocable();
      (*fptr)();
    }
  }
}
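The runtime check for indirect calls can be illustrated with a lookup table in place of the in-code marker that the generated code inspects. This is a swapped-in technique chosen so the sketch stays portable: the registry `txn_clones`, the function `call_in_transaction`, the `irrevocable` flag and the sample functions are all hypothetical. Only `changeToIrrevocable` is a name taken from the slide, and its body here is a stand-in.

```cpp
#include <cassert>
#include <unordered_map>

using Fn = void (*)();

// Hypothetical registry: original function -> transactional clone.
// The compiler on the slide checks a marker near the function entry instead.
std::unordered_map<Fn, Fn>& txn_clones() {
    static std::unordered_map<Fn, Fn> table;
    return table;
}

bool irrevocable = false;  // set once the transaction can no longer roll back

void changeToIrrevocable() { irrevocable = true; }

// Indirect call made from inside a transaction.
void call_in_transaction(Fn fptr) {
    assert(fptr != nullptr);
    auto it = txn_clones().find(fptr);
    if (it != txn_clones().end()) {
        it->second();            // TM-aware callee: run the instrumented clone
    } else {
        changeToIrrevocable();   // legacy callee: give up rollback first
        fptr();
    }
}

// Sample callees used to exercise the dispatch.
int plain_calls = 0;
int txn_calls = 0;
void foo() { ++plain_calls; }      // original (non-transactional) version
void foo_txn() { ++txn_calls; }    // stands in for the instrumented clone
void legacy() { ++plain_calls; }   // function with no transactional clone
```

Registering `foo_txn` as the clone of `foo` routes transactional calls to the clone, while calling `legacy` first switches the transaction to irrevocable mode.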


38 Compiler support for TM
Support language extensions for TM
Support various architectures
  IA-32 Architectures, Intel® 64
Support various operating systems
  Windows®, Linux®
Enable all existing optimizations
  ipo, constant propagation, dead code elimination, …
Support legacy code
Optimization of TM-specific code
  RaR, RaW, WaW, …
Support exception handling with TM
  C++ Exception Handling
  Structured Exception Handling (Windows®)

39 Summary
Transactional memory is important for managing concurrency
  Eliminates deadlocks
  Makes atomic regions easy to compose
  Allows fine-grained concurrency
  Prevents unnecessary and conservative locking
Transactional memory can be implemented in software or hardware


41 Backup

42 Language Extension: __tm_atomic
Syntax
  __tm_atomic { statements }
Semantics
  A TM region is a code block that can be executed in parallel with automatic concurrency control. Each memory read/write in a TM region is a TM read/write, with support from the compiler and runtime library.
  If the TM runtime detects TM read/write conflicts during execution, before committing the TM reads and writes, it aborts and retries from the beginning of the TM region. It does so by rolling back to the saved machine state and program state.
  Two transactions, in two threads of a process, are isolated from each other. They cannot see intermediate values of memory operands. The execution acts as if all operations in one thread are completed before, or after, all the operations in the other thread. However, there is no similar guarantee of isolation between transactional code in one thread and non-transactional code in another thread.
  A transaction specified by __tm_atomic can be nested inside another transaction. The effects of a nested transaction become visible only when the outermost transaction commits. This is often described as “closed nesting”. On a data conflict, the runtime may roll back to any level in the transaction nest, including the outermost transaction, and re-execute the transaction.
Example
  stmt1 and stmt2 execute with automatic concurrency control if foo is called by multiple threads. When the TM runtime detects a memory conflict, it rolls back and re-executes transaction A.
  void foo(void) {
    __tm_atomic { // Transaction A
      stmt1;
      stmt2;
    }
  }
Rules
  Transaction nesting is allowed.
  The TM region must be a single-entry code section. It may have more than one exit.
  If the TM region has multiple entries, program behavior is implementation dependent. The native compiler implementation issues error messages whenever this implementation dependency is statically detectable.
  continue, break, return and goto statements are permitted. If they are bound to a loop, switch or if statement outside the __tm_atomic region, they imply a commit of the region, and the compiler issues a warning.
  In Intel® C++ STM Compiler Prototype Edition 2.0, __tm_atomic supports legacy functions and treats them as irrevocable operations, that is, operations whose side effects cannot be rolled back. A transaction that executes an irrevocable operation is guaranteed to commit without rolling back. Thus, besides functions specified with __declspec(tm_pure) or __declspec(tm_callable), functions specified with __declspec(tm_unknown) or not annotated with any TM attribute can also be called inside __tm_atomic. (TM attributes are described later.)
  TM annotations are supported for C++ classes, template member functions and virtual functions, together with class-level TM attribute inheritance.
  Transactional versions of malloc, calloc, realloc and free are supported.

43 Examples: __tm_atomic (1)
// stmt1 and stmt2 execute with automatic concurrency control
// if foo is called by multiple threads. When a memory conflict is
// detected by the TM runtime, it rolls back and re-executes
// transaction A.
void foo(void) {
  __tm_atomic { // Transaction A
    stmt1;
    stmt2;
  }
}

44 Examples: __tm_atomic (2)
// Nested transaction
void foo(int *x, int *y) {
  __tm_atomic { // Transaction A
    *x = *x + 1;
    __tm_atomic { // Transaction B
      *y = *y + *x;
    } // commit B, or if memory conflicts are detected, roll back and re-execute B (or A)
  } // commit A, or if memory conflicts are detected, roll back and re-execute A
}

45 Examples: __tm_atomic (3)
break/continue semantics
void foo(int *x, int *y) {
  for (int k=0; k<100; k++)
    __tm_atomic { // Transaction A
      if (cond) { *x = *x + 1; continue; }
      else if (test) { break; }
    } // continue/break in transaction A jumps out of transaction A
      // and commits transaction A
}
void foo(int *x, int *y) {
  __tm_atomic { // Transaction A
    for (int k=0; k<100; k++) {
      stmt1;
      if (*x) return;
    }
    switch (cond) {
      case 1: stmt2; break;
      default: break;
    }
  } // break/return in transaction A does not jump out of transaction A
}

46 Function Attribute Extensions
__declspec(tm_callable) / __attribute__((tm_callable))
  Function that can be called inside transactions
__declspec(tm_waiver) / __attribute__((tm_waiver))
  Enables an irrevocable function to be treated as revocable
__declspec(tm_only) / __attribute__((tm_only))
  Function that is called inside transactions only
  Works in conjunction with tm_callable
__declspec(tm_seh) // Windows* only
  Enables structured exception handling for a function with an atomic region

47 Specifying a Function as tm_callable
Syntax
  Windows* OS: annotate a function as tm_callable using __declspec
    __declspec(tm_callable) function-declaration-statement
  Linux* OS: annotate a function as tm_callable using __attribute__
    __attribute__((tm_callable)) function-declaration-statement
Semantics
  When __declspec(tm_callable) is used, the associated function is annotated as a TM-callable function, and the compiler generates a cloned TM version of it.
  Within the cloned TM version, the compiler translates each memory read and write into a call to a TM read barrier or TM write barrier function, with support from the software runtime library.
  The cloned TM version, with TM reads/writes, is used inside transactions. The original function, with normal reads/writes, is used outside transactions.
Rules
  __tm_atomic allows a transaction to call tm_callable functions, and allows tm_callable functions to call other tm_callable and tm_pure functions.
  A tm_callable function may contain irrevocable operations and legacy function calls.

48 Example: tm_callable
__declspec(tm_callable) void UserFoo(int);
void UserGoo(float); // implicitly annotated with tm_unknown
void func(void) {
  __tm_atomic {
    UserFoo(100);   // legal use
    UserGoo(128.8); // switch to irrevocable mode
  }
}

