Presentation is loading. Please wait.

Presentation is loading. Please wait.

Software Transactional Memory system for C++ Serge Preis, Ravi Narayanaswami Intel Corporation.

Similar presentations


Presentation on theme: "Software Transactional Memory system for C++ Serge Preis, Ravi Narayanaswami Intel Corporation."— Presentation transcript:

1 Software Transactional Memory system for C++ Serge Preis, Ravi Narayanaswami Intel Corporation

2 2 Software and Services Group 2/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Agenda Parallel programming challenges Transactional memory introduction and overview −Software implementation of TM specifics Intel STM system −Language extensions −C++ support −Compiler overview −Library overview Performance results and analysis Conclusion Questions and answers

3 3 Software and Services Group 3/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Parallel programming Shared memory gets more popular −Multi-core is getting momentum −2-socket workstations for maximum desktop performance −Hyper-threading is back Shared resources access may be a bottle-neck −Locks required to avoid races −Single global lock is simple, but limiting −Fine-grain locks may provide best scalability −Fine-grain locks are hard to design, implement and test −Poor implementation is error-prone and limiting −Locks are not composable

4 4 Software and Services Group 4/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others What is transactional memory Syntax: __tm_atomic { // transaction code goes here } Semantics: −Isolation: effects are localized −Atomicity: commit or rollback −Retry if data conflict is detected −Publication and privatization safety −Composability via nesting −Fine-grain: based on data accesses

5 5 Software and Services Group 5/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Example: Insert Node into a link list { new_node->prev = node; new_node->next = node->next; node->next->prev = new_node; node->next = new_node; }

6 6 Software and Services Group 6/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Example: Insert Node into a link list Thread 1 Thread 2 { new_node->prev = node; new_node->next = node->next; node->next->prev = new_node; node->next = new_node; }

7 7 Software and Services Group 7/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Example: Insert Node into a link list Lock { new_node->prev = node; new_node->next = node->next; node->next->prev = new_node; node->next = new_node; } Thread 1 Thread 2 Single global lock get lock and execute waits for lock to be released

8 8 Software and Services Group 8/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Example: Insert Node into a link list __tm_atomic { new_node->prev = node; new_node->next = node->next; node->next->prev = new_node; node->next = new_node; } Thread 1 Thread 2 TM Based Both threads execute in parallel, if they write to same node then abort and retry one of transactions

9 9 Software and Services Group 9/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Transactional memory overview Fine-grain concurrency management without locks −Concurrent readers are welcome −Re-execute entire transaction if conflict is detected Simple syntax and semantics −Looks and behaves like single global lock −Simpler create race-free programs Possible implementations −Purely software (STM) −Software with HW acceleration (HaSTM) −Separate HW and SW (HyTM) −HW-based with TX size restrictions (RTM) −HW-based for short transactions, software for unbounded (VTM) No TM hardware is out yet −SUN* ROCK* planned to be released next year −AMD* has published spec for TM H/W assistance

10 10 Software and Services Group 10/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Software transactional memory Weak atomicity: guarantees are only for transactional code −Same as for locks Unbounded: transactions of arbitrary size are supported −HW resources (memory) is the limit Instrumentation of memory accesses is required within transactions −Spatial and performance overhead −Including called functions −Not always possible −Different data and contention management techniques −Object based or word-based depending on language

11 11 Software and Services Group 11/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Intel STM implementation C/C++ Compiler + run-time library Word-based STM system Close nesting Failure atomicity (__tm_abort) Irrevocable execution support for I/O and legacy Support for C++ constructs Highly optimized System is complete, based on production compiler and published for everyone to try

12 12 Software and Services Group 12/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Basic language extensions Atomic blocks −Nesting is possible >Composability is a great advantage −Support of function calls >Including indirect −Failure atomicity using user aborts >Lexical only −Support of pre-compiled code >Dynamically incompatible with aborts Escape from atomic execution −Programmer ensures correctness __tm_atomic { if (local_var > Global_val) { Global_max = local_var; local_var = Global_arr[++loc_i]; foo(Global_max); } __tm_atomic { foo(Global_max++) } __tm_atomic { cout << “HelloWorld!”; //precompiled __tm_atomic { // some code __tm_abort; // inner TX aborted } if (some_condition) { __tm_abort; // runtime error }

13 13 Software and Services Group 13/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Annotations All functions called from atomic block are either annotated or treated as pre-compiled −Including class methods, template instantiations etc. −Can be deduced by compiler automatically for some cases −Special care is taken over indirect calls Main annotations are: −tm_callable: may be called from atomic region if processed by compiler −tm_safe: same as callable, but safe to mix with tm_abort (no unsafe pre- compiled code inside) −tm_pure: may be called from atomic region as is −tm_unknown: pre-compiled code (same as no annotation at all), overrides all other annotations in some cases Some functions processed specially −Memory allocations −Memory copying −There is special annotation to code TM wrappers for functions

14 14 Software and Services Group 14/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Annotating many functions at once Class annotations −tm_callable, tm_safe and tm_unknow supported −Serves as default to all methods introduced in the class −Inherited and may be overridden on child −May be overridden on individual methods, overriding rules apply Virtual members annotations −All annotations are supported −Inheritance has higher priority than class one −Overriding rules apply Templates annotations −My be overridden on explicit instantiation and specialization −Both functions and classes are supported Derived class Baseunknowncallablesafepure unknownYes callablenoYes no safenoNoYesno purenoNonoYes #define _ds(x) __declspec(x) struct A { virtual int _ds(tm_safe) getA() const; virtual int process(); //unknown }; _ds(tm_callable) struct B { virtual int getA() const; // tm_callable virtual int __ds(tm_unknown) process(); }; struct AB : A, B { // tm_callable int _ds(tm_pure) getA(); //tm_safe→error int process(); // tm_unknown };

15 15 Software and Services Group 15/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Exception Handling with TM Do not let exceptions leave atomic block open −Catch all exceptions at block boundary and commit implicitly −Abort on exception is also supported explicitly also with rethrow Process all handlers belonging to atomic block accordingly Support correct calls to copy c’tors and d’tors from C++ RTL during stack unwinding −Dynamically decide whether transactional or non-transactional version should be called at each frame Do not corrupt C++ RTL EH state during abort and retry −Should not long-jump out of catch block as we do for retry −At the same time should not execute user code after conflict is detected −Transparent for programmer −Extremely complicated task, now in implementation

16 16 Software and Services Group 16/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Intel STM optimizing compiler Language extensions Instrumentation of code −Transaction boundaries −Memory accesses and function calls −Exception handling Optimizations −Reduce deoptimization overhead −Optimized memory instrumentation >Taking order of access into account >Taking care of locals −Annotations propagation −TM mode based on transaction properties −Simplified instrumentation for some TM modes There is still plenty of room for optimization The product oBased on production version of Intel® C/C++ compiler oFull set of classic optimizations including IPO, vectorizer and parallelizer oWindows*/Linux, IA32/Intel 64 oCompatible with GCC on Linux and various versions of Microsoft* Visual Studio* on Windows* oVersion 3.0 is coming oPublicly available to try oFeedback is appreciated

17 17 Software and Services Group 17/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Intel STM library Comprehensive and flexible ABI −Supports various STM implementations −Allows compiler optimizations Effective contention management with switchable strategies −In-place updates −2 dynamically interchangeable strategies −Effective implementation −Additional obstinate mode for long transactions Irrevocable execution support Failure atomicity support for locals Nesting support with local aborts Special handling for memory allocation and copying STM library ABI at a glance TxnDesc* getTransaction() int beginTx(TxnDesc*,int modes_hints) void commitTx(TxnDesc*) int beginInnerTx(TxnDesc*,int) void commitInnerTx(TxnDesc*) void abortTx(TxnDesc*) void switchToSerialMode(TxnDesc*) void write (TxnDesc*,Type*,Type) Type read (TxnDesc*,Type*) void logValue (TxnDesc*,Type*) void logMem(TxnDesc*,void*,size t) void writeAfterRead(Type*,Type) void writeAfterWrite (Type*,Type) Type readAfterRead (Type*) Type readAfterWrite (Type*) Type readForWrite (Type*)

18 18 Software and Services Group 18/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Example of translation __declcpec(tm_callable) foo(); //... __tm_atomic { foo(++c); } Label: action = StartTX(); if (action & restoreLiveVariables) liveX = saved_liveX; if (action & retryTransaction) goto Label; else if (action & saveLiveVariables) saved_liveX = liveX; if (action & InstrumentedCode) { temp = Read (c); temp = temp + 1; Write (c); foo_$TXN(c); // TM-version of foo } else { c = c + 1; foo(c); } CommitTX();

19 19 Software and Services Group 19/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Performance data (SPLASH2 benchmarks) Speedup vs. serial execution on 8TScalability of BARNESS benchmark Performance of STM depends highly on workload

20 20 Software and Services Group 20/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Performance analysis For many workloads STM outperforms Single Global Lock −On 8 or more threads −Results still lower than Fine Grain Locks −More optimizations and improvements are on the way RAYTRACE is and example where ceiling is hit −When all 8 threads are running the HW is 100% busy and thus STM code increases the machine load beyond the limit and performance drops −Picture is quite different for 8T on 16-way HW, but for 16T is the same −Optimization to run short transactions in Global Lock mode helps much to this benchmark Optimizations are not good for all benchmarks: RADIOSYTY performs best in pure STM runs Small transactions optimization is ON

21 21 Software and Services Group 21/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Conclusion Pros −Simple programming model −Composability −Failure atomicity −Decent scalability at 8 threads and beyond for many workloads Cons −Overhead −Profitability highly depend on workload −Retries eat power −Function annotation are not that convenient Intel C/C++ STM compiler prototype edition is publicly available at http://whaif.intel.com −Includes Intel STM library, user documentation and examples −Active discussion forum for questions and comments We target the future

22 22 Software and Services Group 22/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others Q&A


Download ppt "Software Transactional Memory system for C++ Serge Preis, Ravi Narayanaswami Intel Corporation."

Similar presentations


Ads by Google