Software Transactional Memory system for C++ Serge Preis, Ravi Narayanaswami Intel Corporation.

Slide 2. Agenda
Software and Services Group 2/20 Copyright © 2008, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others
Parallel programming challenges
Transactional memory introduction and overview
  − Specifics of software TM implementations
Intel STM system
  − Language extensions
  − C++ support
  − Compiler overview
  − Library overview
Performance results and analysis
Conclusion
Questions and answers

Slide 3. Parallel programming
Shared memory is becoming more popular
  − Multi-core is gaining momentum
  − 2-socket workstations are used for maximum desktop performance
  − Hyper-threading is back
Access to shared resources may be a bottleneck
  − Locks are required to avoid races
  − A single global lock is simple, but limits scalability
  − Fine-grain locks may provide the best scalability
  − Fine-grain locks are hard to design, implement and test
  − A poor implementation is error-prone and limits scalability
  − Locks are not composable
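To make "locks are not composable" concrete, here is a minimal C++17 sketch; the `Account` type and the `transfer` functions are hypothetical illustrations, not code from the talk. Each account is thread-safe on its own, yet chaining two safe calls is not atomic. A correct transfer has to take both accounts' fine-grain locks together, so the caller must know about, and reach into, the locking of both objects.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account: individually thread-safe via its own fine-grain lock.
struct Account {
  std::mutex m;
  long balance = 0;
  void deposit(long x)  { std::lock_guard<std::mutex> g(m); balance += x; }
  void withdraw(long x) { std::lock_guard<std::mutex> g(m); balance -= x; }
};

// Composing two thread-safe calls is NOT atomic: between withdraw() and
// deposit() another thread can observe money that is in neither account.
void transfer_racy(Account& from, Account& to, long x) {
  from.withdraw(x);
  to.deposit(x);
}

// A correct composition must lock both objects at once. std::scoped_lock
// acquires both mutexes with deadlock avoidance, so opposing transfers
// (a->b and b->a) cannot deadlock. Assumes from and to are different accounts.
void transfer(Account& from, Account& to, long x) {
  std::scoped_lock both(from.m, to.m);
  from.balance -= x;
  to.balance += x;
}

// Reads a consistent total by holding both locks.
long total(Account& a, Account& b) {
  std::scoped_lock both(a.m, b.m);
  return a.balance + b.balance;
}
```

The locking protocol for `transfer` leaks out of `Account`: every new operation that spans several accounts must repeat it. An atomic block would let `withdraw` and `deposit` be composed as written.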

Slide 4. What is transactional memory?
Syntax:
    __tm_atomic {
        // transaction code goes here
    }
Semantics:
  − Isolation: effects are localized (not visible to other threads before commit)
  − Atomicity: commit or rollback
  − Retry if a data conflict is detected
  − Publication and privatization safety
  − Composability via nesting
  − Fine-grain: based on data accesses
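The commit-or-retry behaviour can be seen in miniature on a single word. In the sketch below (the `atomic_update` and `parallel_increment` helpers are hypothetical), a new value is computed from a snapshot, committed with `compare_exchange_weak`, and the computation is re-executed if another thread changed the word in the meantime. This is only an analogy: a TM applies the same control flow to every location a transaction reads and writes, not to one word.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Single-word analogue of "execute, then commit, or retry on conflict".
template <class F>
long atomic_update(std::atomic<long>& word, F f) {
  long seen = word.load();
  for (;;) {
    long proposed = f(seen);  // speculative computation on a snapshot
    if (word.compare_exchange_weak(seen, proposed))
      return proposed;        // commit: nobody changed the word meanwhile
    // conflict (or spurious failure): `seen` now holds the current value,
    // so the computation is simply re-executed
  }
}

// Usage: several threads bump a shared counter without a lock.
long parallel_increment(int threads, int per_thread) {
  std::atomic<long> counter{0};
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&] {
      for (int i = 0; i < per_thread; ++i)
        atomic_update(counter, [](long v) { return v + 1; });
    });
  for (auto& th : pool) th.join();
  return counter.load();
}
```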

Slide 5. Example: Insert a node into a linked list
    {
        new_node->prev = node;
        new_node->next = node->next;
        node->next->prev = new_node;
        node->next = new_node;
    }

Slide 6. Example: Insert a node into a linked list
[Diagram: Thread 1 and Thread 2 both execute this code]
    {
        new_node->prev = node;
        new_node->next = node->next;
        node->next->prev = new_node;
        node->next = new_node;
    }

Slide 7. Example: Insert a node into a linked list
    Lock {
        new_node->prev = node;
        new_node->next = node->next;
        node->next->prev = new_node;
        node->next = new_node;
    }
[Diagram: Thread 1 and Thread 2 with a single global lock: one thread gets the lock and executes; the other waits for the lock to be released]
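A runnable C++ version of the single-global-lock variant, as a sketch: `std::mutex` plays the role of the slide's `Lock`, and the list is assumed to be circular with a sentinel head so that `node->next` is never null (the slide's code relies on that but does not say how it is guaranteed). The names `Node`, `insert_after` and `g_list_lock` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Node {
  int value;
  Node* prev = nullptr;
  Node* next = nullptr;
};

std::mutex g_list_lock;  // the "single global lock": serializes every insertion

// Inserts new_node right after node in a circular doubly linked list.
void insert_after(Node* node, Node* new_node) {
  std::lock_guard<std::mutex> guard(g_list_lock);
  new_node->prev = node;
  new_node->next = node->next;
  node->next->prev = new_node;
  node->next = new_node;
}
```

This is correct but fully serial: two insertions into unrelated parts of the list still wait for each other, which is exactly the limitation the TM-based version on the next slide targets.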

Slide 8. Example: Insert a node into a linked list
    __tm_atomic {
        new_node->prev = node;
        new_node->next = node->next;
        node->next->prev = new_node;
        node->next = new_node;
    }
[Diagram: Thread 1 and Thread 2, TM based: both threads execute in parallel; if they write to the same node, one of the transactions is aborted and retried]

Slide 9. Transactional memory overview
Fine-grain concurrency management without locks
  − Concurrent readers are welcome
  − The entire transaction is re-executed if a conflict is detected
Simple syntax and semantics
  − Looks and behaves like a single global lock
  − Simpler to create race-free programs
Possible implementations
  − Purely software (STM)
  − Software with HW acceleration (HaSTM)
  − Separate HW and SW (HyTM)
  − HW-based with transaction size restrictions (RTM)
  − HW-based for short transactions, software for unbounded ones (VTM)
No TM hardware is available yet
  − SUN* ROCK* is planned to be released next year
  − AMD* has published a specification for TM hardware assistance

Slide 10. Software transactional memory
Weak atomicity: guarantees apply only to transactional code
  − Same as for locks
Unbounded: transactions of arbitrary size are supported
  − HW resources (memory) are the only limit
Instrumentation of memory accesses is required within transactions
  − Space and performance overhead
  − Including called functions
  − Not always possible
Different data and contention management techniques
  − Object-based or word-based, depending on the language
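To show what instrumenting memory accesses means in a word-based STM, here is a toy C++17 STM in which every shared read and write inside a transaction goes through `Txn::read` / `Txn::write`. It deliberately uses a different, simpler design than the Intel library (which does in-place updates): writes are buffered until commit, and reads are validated by value against a single global sequence lock, in the style of the published NOrec algorithm. All names are hypothetical, and shared words are `std::atomic<long>` so that the toy itself has no C++ data races.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

using Word = std::atomic<long>;        // a shared, transactionally accessed word

std::atomic<std::uint64_t> g_seq{0};   // global sequence lock; odd = a commit is in progress

struct Abort {};                       // thrown internally to restart a conflicting transaction

class Txn {
 public:
  void begin() {
    reads_.clear();
    writes_.clear();
    do { snapshot_ = g_seq.load(); } while (snapshot_ & 1);  // wait out a running commit
  }

  long read(Word& w) {                 // read barrier
    for (const auto& [addr, val] : writes_)
      if (addr == &w) return val;      // read-after-write: see our own buffered value
    long v = w.load();
    while (snapshot_ != g_seq.load()) {  // someone committed since our snapshot
      snapshot_ = validate();            // throws Abort if a value we read changed
      v = w.load();
    }
    reads_.emplace_back(&w, v);
    return v;
  }

  void write(Word& w, long v) {        // write barrier: buffer until commit
    for (auto& [addr, val] : writes_)
      if (addr == &w) { val = v; return; }
    writes_.emplace_back(&w, v);
  }

  void commit() {
    if (writes_.empty()) return;       // read-only: every read was already validated
    std::uint64_t expected = snapshot_;
    while (!g_seq.compare_exchange_strong(expected, snapshot_ + 1)) {
      snapshot_ = validate();          // lost the race: revalidate, then try again
      expected = snapshot_;
    }
    for (const auto& [addr, val] : writes_) addr->store(val);  // publish the writes
    g_seq.store(snapshot_ + 2);        // release the lock (back to an even value)
  }

 private:
  std::uint64_t validate() {           // returns a fresh even snapshot or throws Abort
    for (;;) {
      std::uint64_t t = g_seq.load();
      if (t & 1) continue;             // a writer is mid-commit; spin
      for (const auto& [addr, val] : reads_)
        if (addr->load() != val) throw Abort{};
      if (t == g_seq.load()) return t;
    }
  }

  std::uint64_t snapshot_ = 0;
  std::vector<std::pair<Word*, long>> reads_, writes_;
};

// Runs body(tx) as a transaction, re-executing it from scratch on conflict.
template <class Body>
void atomically(Body&& body) {
  Txn tx;
  for (;;) {
    tx.begin();
    try {
      body(tx);
      tx.commit();
      return;
    } catch (const Abort&) {
      // conflict detected: retry the whole transaction
    }
  }
}
```

Even in this toy, each instrumented access does extra work (write-set lookup, possible revalidation), which is the space and performance overhead listed above; real STMs spend much of their engineering on reducing it.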

Slide 11. Intel STM implementation
C/C++ compiler + run-time library
Word-based STM system
Closed nesting
Failure atomicity (__tm_abort)
Irrevocable execution support for I/O and legacy code
Support for C++ constructs
Highly optimized
The system is complete, is based on a production compiler, and is published for everyone to try

Slide 12. Basic language extensions
Atomic blocks
  − Nesting is possible
    > Composability is a great advantage
  − Support of function calls
    > Including indirect calls
  − Failure atomicity using user aborts
    > Lexical only
  − Support of pre-compiled code
    > Dynamically incompatible with aborts
Escape from atomic execution
  − Programmer ensures correctness
    __tm_atomic {
        if (local_var > Global_val) {
            Global_max = local_var;
            local_var = Global_arr[++loc_i];
            foo(Global_max);
        }
        __tm_atomic {                 // nested atomic block
            foo(Global_max++);
        }
    }
    __tm_atomic {
        cout << "HelloWorld!";        // pre-compiled code
        __tm_atomic {
            // some code
            __tm_abort;               // inner TX aborted
        }
        if (some_condition) {
            __tm_abort;               // runtime error
        }
    }
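Failure atomicity with closed nesting can be pictured with an undo log: each in-place write first records the old value, an abort replays the log backwards, and an inner transaction rolls back only to the savepoint taken when it began. The `UndoLog` class below is a hypothetical, single-threaded sketch of that bookkeeping (no conflict detection at all), not the Intel runtime's data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy undo log for in-place updates with closed-nested aborts.
class UndoLog {
 public:
  void write(int& location, int value) {
    entries_.emplace_back(&location, location);  // remember the old value first
    location = value;                            // then update in place
  }
  // Starting an inner transaction just records a savepoint.
  std::size_t begin_inner() const { return entries_.size(); }
  // Inner abort (__tm_abort in a nested block): undo only the inner writes.
  void abort_to(std::size_t savepoint) {
    while (entries_.size() > savepoint) {
      auto [addr, old] = entries_.back();
      *addr = old;                               // restore in reverse order
      entries_.pop_back();
    }
  }
  void abort() { abort_to(0); }                  // outer abort: undo everything
  void commit() { entries_.clear(); }            // new values are already in place
 private:
  std::vector<std::pair<int*, int>> entries_;
};
```

For example, after `write(x, 10)`, a savepoint, and inner writes to `y` and `x`, `abort_to(savepoint)` brings `x` back to 10 and `y` back to its original value, while the outer write to `x` survives until an outer `abort()`.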

Slide 13. Annotations
All functions called from an atomic block are either annotated or treated as pre-compiled
  − Including class methods, template instantiations, etc.
  − The compiler can deduce annotations automatically in some cases
  − Special care is taken over indirect calls
Main annotations are:
  − tm_callable: may be called from an atomic region if processed by the compiler
  − tm_safe: same as tm_callable, but safe to mix with __tm_abort (no unsafe pre-compiled code inside)
  − tm_pure: may be called from an atomic region as is
  − tm_unknown: pre-compiled code (same as no annotation at all); overrides all other annotations in some cases
Some functions are processed specially
  − Memory allocation
  − Memory copying
  − A special annotation is available for writing TM wrappers for functions

Slide 14. Annotating many functions at once
Class annotations
  − tm_callable, tm_safe and tm_unknown are supported
  − Serve as the default for all methods introduced in the class
  − Inherited, and may be overridden in a derived class
  − May be overridden on individual methods; overriding rules apply
Virtual member annotations
  − All annotations are supported
  − An annotation inherited from the overridden base method has higher priority than the class annotation
  − Overriding rules apply
Template annotations
  − May be overridden on explicit instantiation and specialization
  − Both functions and classes are supported
Overriding compatibility table (Derived class vs. Base; annotation order: unknown, callable, safe, pure; merged or empty cells were lost in extraction):
  − unknown: Yes
  − callable: no, Yes, no
  − safe: no, No, Yes, no
  − pure: no, No, no, Yes
    #define _ds(x) __declspec(x)
    struct A {
        virtual int _ds(tm_safe) getA() const;
        virtual int process();                  // tm_unknown
    };
    struct _ds(tm_callable) B {
        virtual int getA() const;               // tm_callable
        virtual int _ds(tm_unknown) process();
    };
    struct AB : A, B {                          // tm_callable
        int _ds(tm_pure) getA() const;          // overrides tm_safe: error
        int process();                          // tm_unknown
    };

Slide 15. Exception handling with TM
Do not let exceptions leave an atomic block open
  − Catch all exceptions at the block boundary and commit implicitly
  − Explicitly aborting on an exception is also supported, including with a rethrow
Process all handlers belonging to the atomic block accordingly
Support correct calls to copy constructors and destructors from the C++ RTL during stack unwinding
  − Dynamically decide whether the transactional or non-transactional version should be called at each frame
Do not corrupt the C++ RTL exception-handling state during abort and retry
  − Should not long-jump out of a catch block, as is done for retry
  − At the same time, should not execute user code after a conflict is detected
  − Transparent to the programmer
  − Extremely complicated task; implementation is in progress
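The two exception policies on this slide can be sketched with a toy undo log. `atomic_block` and `OnException` are hypothetical names, and the sketch assumes that once the transaction is closed (committed or rolled back) the exception keeps propagating to the caller. It is single-threaded and does not touch the hard part the slide mentions: keeping the C++ runtime's unwinding state intact across abort and retry.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

enum class OnException { Commit, Abort };

// Runs body(write) as a toy transaction whose writes are logged for undo.
// An exception never leaves the block "open": its effects are either kept
// (implicit commit) or rolled back (explicit abort), then it is rethrown.
template <class Body>
void atomic_block(OnException policy, Body&& body) {
  std::vector<std::pair<int*, int>> undo;
  auto write = [&undo](int& loc, int v) {
    undo.emplace_back(&loc, loc);  // log the old value, then update in place
    loc = v;
  };
  try {
    body(write);
  } catch (...) {
    if (policy == OnException::Abort)
      for (auto it = undo.rbegin(); it != undo.rend(); ++it)
        *it->first = it->second;   // roll back in reverse order
    throw;                         // propagate after the transaction is closed
  }
}
```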

Slide 16. Intel STM optimizing compiler
Language extensions
Instrumentation of code
  − Transaction boundaries
  − Memory accesses and function calls
  − Exception handling
Optimizations
  − Reduce deoptimization overhead
  − Optimized memory instrumentation
    > Taking the order of accesses into account
    > Taking care of locals
  − Annotation propagation
  − TM mode selection based on transaction properties
  − Simplified instrumentation for some TM modes
There is still plenty of room for optimization
The product
  − Based on the production version of the Intel® C/C++ compiler
  − Full set of classic optimizations, including IPO, vectorizer and parallelizer
  − Windows*/Linux, IA32/Intel 64
  − Compatible with GCC on Linux and with various versions of Microsoft* Visual Studio* on Windows*
  − Version 3.0 is coming
  − Publicly available to try
  − Feedback is appreciated

Slide 17. Intel STM library
Comprehensive and flexible ABI
  − Supports various STM implementations
  − Allows compiler optimizations
Effective contention management with switchable strategies
  − In-place updates
  − 2 dynamically interchangeable strategies
  − Effective implementation
  − Additional obstinate mode for long transactions
Irrevocable execution support
Failure atomicity support for locals
Nesting support with local aborts
Special handling for memory allocation and copying
STM library ABI at a glance:
    TxnDesc* getTransaction()
    int      beginTx(TxnDesc*, int modes_hints)
    void     commitTx(TxnDesc*)
    int      beginInnerTx(TxnDesc*, int)
    void     commitInnerTx(TxnDesc*)
    void     abortTx(TxnDesc*)
    void     switchToSerialMode(TxnDesc*)
    void     write(TxnDesc*, Type*, Type)
    Type     read(TxnDesc*, Type*)
    void     logValue(TxnDesc*, Type*)
    void     logMem(TxnDesc*, void*, size_t)
    void     writeAfterRead(Type*, Type)
    void     writeAfterWrite(Type*, Type)
    Type     readAfterRead(Type*)
    Type     readAfterWrite(Type*)
    Type     readForWrite(Type*)

Slide 18. Example of translation
Source:
    __declspec(tm_callable) void foo(int);
    //...
    __tm_atomic {
        foo(++c);
    }
Translation:
    Label:
        action = StartTX();
        if (action & restoreLiveVariables)
            liveX = saved_liveX;
        if (action & retryTransaction)
            goto Label;
        else if (action & saveLiveVariables)
            saved_liveX = liveX;
        if (action & InstrumentedCode) {
            temp = Read(&c);
            temp = temp + 1;
            Write(&c, temp);
            foo_$TXN(temp);      // TM version of foo
        } else {
            c = c + 1;
            foo(c);
        }
        CommitTX();
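The key idea of the translation is that the compiler emits two versions of the block and of each `tm_callable` function, and the run time picks one per transaction. The following is a hypothetical, single-threaded C++ sketch of that code shape only: `TxDesc`, `foo_txn`, `kInstrumentedCode` and `atomic_inc_and_foo` are invented for illustration (they are not the Intel ABI), and the toy provides no concurrency control between the two paths.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical action bit returned by a toy StartTX(); not the real ABI.
enum : unsigned { kInstrumentedCode = 1u };

struct TxDesc {  // toy descriptor: in-place writes plus an undo log
  std::vector<std::pair<int*, int>> undo;
  int read(const int* p) const { return *p; }
  void write(int* p, int v) { undo.emplace_back(p, *p); *p = v; }
};

int g_total = 0;

// Original function and the instrumented clone a TM compiler would emit
// for a tm_callable function (the slide calls it foo_$TXN).
void foo(int x) { g_total += x; }
void foo_txn(TxDesc& tx, int x) { tx.write(&g_total, tx.read(&g_total) + x); }

std::mutex g_serial_lock;  // toy serial mode: run the original code under a lock

// Shape of the translated  __tm_atomic { foo(++c); }
void atomic_inc_and_foo(int& c, unsigned action) {
  if (action & kInstrumentedCode) {
    TxDesc tx;
    int temp = tx.read(&c);  // every access goes through a barrier
    temp = temp + 1;
    tx.write(&c, temp);
    foo_txn(tx, temp);       // TM version of foo
    tx.undo.clear();         // commit: new values are already in place
  } else {
    std::lock_guard<std::mutex> guard(g_serial_lock);
    c = c + 1;               // uninstrumented path: plain loads and stores
    foo(c);                  // original foo
  }
}
```

Both paths leave the same state behind: starting from `c == 0` and `g_total == 0`, one instrumented call followed by one serial call yields `c == 2` and `g_total == 3`.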

Slide 19. Performance data (SPLASH-2 benchmarks)
[Chart: Speedup vs. serial execution on 8 threads]
[Chart: Scalability of the BARNES benchmark]
Performance of STM depends highly on the workload

Slide 20. Performance analysis
For many workloads, STM outperforms a single global lock
  − On 8 or more threads
  − Results are still lower than with fine-grain locks
  − More optimizations and improvements are on the way
RAYTRACE is an example where the ceiling is hit
  − When all 8 threads are running, the HW is 100% busy, so the STM code increases the machine load beyond the limit and performance drops
  − The picture is quite different for 8 threads on 16-way HW, but for 16 threads it is the same
  − The optimization that runs short transactions in global lock mode helps this benchmark a lot
Optimizations are not good for all benchmarks: RADIOSITY performs best in pure STM runs
Small transactions optimization is ON

Slide 21. Conclusion
Pros
  − Simple programming model
  − Composability
  − Failure atomicity
  − Decent scalability at 8 threads and beyond for many workloads
Cons
  − Overhead
  − Profitability depends highly on the workload
  − Retries consume power
  − Function annotations are not very convenient
The Intel C/C++ STM Compiler, Prototype Edition is publicly available at
  − Includes the Intel STM library, user documentation and examples
  − Active discussion forum for questions and comments
We target the future

Slide 22. Q&A