A Proposal to Incorporate Software Transactional Memory (STM) Support in the Open64 Compiler
Dhruva R. Chakrabarti, HP Labs, USA
©2009 HP Confidential


2 How is transactional memory better than locks?
– Traditional threads programming is hard
  – Requires reasoning with locks
  – Prone to synchronization errors
  – Tools exist but complexities still remain
– Transactional memory programming raises the abstraction level
  – Locks are not exposed
  – Program with atomic sections
  – Atomicity, Consistency, and Isolation (ACI) properties guaranteed
  – Underlying system implements transactions
  – Deadlock freedom at the programmer level

3 Quantitative comparison between locks and TM
Source: Rossbach et al., Is Transactional Programming actually easier?, PPoPP 2010
– User study of undergraduates in an OS class
– Same programs written with coarse-grain locks, fine-grain locks, monitors, and TM
– Compared development effort, ease, and programming errors
– Conclusion was that TM was harder to use than coarse-grain locks but easier than fine-grain locks
  – The study used API-based STM libraries for Java (DSTM2 and JDASTM)
  – Use of compiler support for atomic section based programming should be even easier
– Synchronization errors were far fewer with transactions
  – On a similar programming problem, 70% errors with fine-grain locks but 10% errors with transactions

4 What about performance?
– Single-thread overhead is large because of logging costs
– Multi-threaded performance is typically much better than that of coarse-grain locks and approaches that of fine-grain locks
  – Shown on microbenchmarks using hashtable, map, and tree operations [source: PLDI 2006 papers on transactions]
  – Reasonable scalability using large transactions for STAMP benchmarks has been shown [source: http://stamp.stanford.edu]
  – Good results on minimum spanning forest of sparse graphs [source: Kang et al., An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest of Sparse Graphs, PPoPP 2009]
  – Large programs such as Quake and RMS applications have been transactified [http://www.bscmsrc.eu/research/software]
– Numerous research papers have shown how to reduce overheads
  – Some are purely library-based approaches
  – Some optimize the calls made to STM, potentially reducing transaction regions
  – Some feed the STM with information that leads to an optimized STM

5 Tool support
– Few exist
  – Herlihy et al., tm_db (an open source generic debugging library for transactional programs)
  – Debugging/profiling support (Zyulkyarov et al., Debugging programs that use atomic blocks and transactional memory, PPoPP 2010)
  – Fine-grain conflict graph to aid performance analysis (Chakrabarti et al., New abstractions for effective performance analysis of STM programs, PPoPP 2010)
– Going forward, debuggers and performance analysis tools will be important to the adoption of STMs

6 State of STMs today
– STMs have shown a lot of promise
  – Atomic sections are included in emerging languages, e.g. Fortress, X10, Chapel
– Improved programmability over locks
– Does require programmer annotations
– Performance benefits have been shown but pathological situations exist
– Debuggers and performance tools are starting to show up
– A small set of benchmarks exists, and some large programs have been transactified
  – More benchmarks and applications are required
– Not all multi-threaded programming paradigms are easily expressed in terms of atomic sections
  – An example is cond-wait (or retry)
  – Atomic sections will have to co-exist with locks

7 Outline
– What is an atomic section
– What is an STM library
– Basic STM API and a flavor of the draft spec
– Basic STM library/compiler interface and a flavor of the Intel ABI
– Platforms available today and their state
– Proposal to incorporate STM in Open64 framework

8 Shared counter update
Lock-based:
    lock(L)
    ++counter
    unlock(L)
Atomic section-based:
    atomic { ++counter }
For atomic section-based code:
– No need to associate shared data with locks
– Still need to identify atomic sections
– No deadlocks at the programmer level since there are no locks
– Livelocks could be present but are usually resolved by the contention manager
– Data races could still be present
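The contrast between the two versions of the counter update can be sketched in Python. This is only an illustration under stated assumptions: the `atomic()` context manager is hypothetical and is realized here with a single global re-entrant lock, which is the simplest way to give an atomic section its semantics. A real STM would execute the sections optimistically. The helper `run` is likewise an assumption made for the demonstration.

```python
import threading
from contextlib import contextmanager

counter = 0
L = threading.Lock()                 # lock the programmer must associate with `counter`
_global_tx_lock = threading.RLock()  # hypothetical stand-in for an STM runtime


def increment_lock_based():
    """Lock-based version: the programmer names the lock explicitly."""
    global counter
    L.acquire()
    counter += 1
    L.release()


@contextmanager
def atomic():
    """Hypothetical atomic section, modeled with one global re-entrant lock."""
    with _global_tx_lock:
        yield


def increment_atomic():
    """Atomic-section version: no lock is visible to the programmer."""
    global counter
    with atomic():
        counter += 1


def run(increment, n_threads=4, n_iters=1000):
    """Reset the counter, run `increment` from several threads, return the total."""
    global counter
    counter = 0

    def worker():
        for _ in range(n_iters):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

In the atomic version the programmer only marks the region; which lock (if any) protects `counter` is left to the underlying system.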

9 STM library/compiler interface
Source atomic section:
    atomic { x = y; w = z }
The compiler lowers it into calls to the STM:
    TxStart()
    TxRead(y)
    TxWrite(x)
    TxRead(z)
    TxWrite(w)
    TxCommit()
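The lowering on this slide can be mimicked with a toy, single-threaded model. The entry-point names follow the slide (TxStart, TxRead, TxWrite, TxCommit), but their signatures, the `Tx` class, and the dictionary standing in for shared memory are assumptions made for illustration, not the Intel ABI or any real STM library.

```python
# Shared memory is modeled as a dictionary from variable name to value.
memory = {"x": 0, "y": 7, "w": 0, "z": 9}


class Tx:
    """Hypothetical transaction descriptor holding buffered writes."""

    def __init__(self):
        self.write_set = {}


def tx_start():
    return Tx()


def tx_read(tx, addr):
    # A location written earlier in this transaction is read from the buffer.
    if addr in tx.write_set:
        return tx.write_set[addr]
    return memory[addr]


def tx_write(tx, addr, value):
    tx.write_set[addr] = value


def tx_commit(tx):
    # Single-threaded model: no conflict detection, just publish the writes.
    memory.update(tx.write_set)


def lowered_atomic_section():
    """What a compiler might emit for: atomic { x = y; w = z }"""
    tx = tx_start()
    tx_write(tx, "x", tx_read(tx, "y"))
    tx_write(tx, "w", tx_read(tx, "z"))
    tx_commit(tx)
```

The point of the sketch is the shape of the generated code: every shared access inside the atomic section becomes a call into the STM.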

10 Different implementation strategies
– Non-blocking vs blocking
– Strong vs weak isolation
  – It has been shown that strong isolation is very hard to provide
  – Most STMs today only support weak isolation
– Direct vs deferred update
– Flattened vs closed vs open nesting
– Transaction granularity: object vs word
– Pessimistic vs optimistic concurrency control

11 Main STM data structures (blocking implementation)
– Shared lock table
  – A hash function maps a given address to an entry in the (tagless) hash table
  – Designed to get to the lock without locking the hash table
[Figure: shared addresses mapped to entries of the lock table]
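A minimal sketch of such an address-to-lock mapping follows. The table size, the 8-byte word granularity, and the name `lock_index` are assumptions chosen for illustration. Because the table is tagless, distinct addresses can map to the same entry, which is one source of false conflicts.

```python
TABLE_BITS = 10
TABLE_SIZE = 1 << TABLE_BITS   # assumed number of lock-table entries
WORD_SHIFT = 3                 # assumed 8-byte granularity

# One lock word per entry; no address tag is stored alongside it.
lock_table = [0] * TABLE_SIZE


def lock_index(address):
    """Map a shared address to its lock-table entry without any table-wide lock."""
    return (address >> WORD_SHIFT) & (TABLE_SIZE - 1)
```

Indexing is a pure function of the address, so a transaction can reach the lock for a location without first synchronizing on the table itself.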

12 Transactional read
– Reads are typically optimistic: no locks are acquired
– Read from shared memory location into local buffer
– Validate readset to ensure its consistency
– If validation fails, the transaction is rolled back
– The address and its current version are entered into a readset
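The read steps listed on this slide can be sketched as follows. This is a single-threaded model under assumptions: per-location version numbers are kept in a plain dictionary (standing in for lock-table entries), and `Abort`, `validate`, and `tx_read` are hypothetical names.

```python
class Abort(Exception):
    """Raised when a transaction must be rolled back."""


memory = {"y": 7, "z": 9}
versions = {"y": 0, "z": 0}   # bumped whenever a location is committed


def validate(read_set):
    """The read set is consistent if no recorded version has changed."""
    return all(versions[addr] == seen for addr, seen in read_set.items())


def tx_read(read_set, addr):
    version = versions[addr]
    value = memory[addr]            # optimistic read into a local copy, no lock taken
    if not validate(read_set):      # an earlier read has been overwritten
        raise Abort(addr)
    read_set[addr] = version        # remember what was seen for later validation
    return value
```

A later commit by another transaction that bumps the version of an already-read location makes the next `tx_read` fail validation and abort.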

13 Transactional write
– Buffered write
  – Make the change onto a local buffer
  – Add the location and the new value to a write set (redo log)
  – All subsequent reads of this location are serviced from the write set
  – The original location is unchanged
– Direct update
  – Acquire the lock corresponding to the shared location. If already locked, abort
  – Log the old value into a write set (undo log)
  – Directly change the shared location
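The two write strategies can be contrasted in a small single-threaded model. All names here (`buffered_write`, `direct_write`, `rollback`, the `lock_owner` map) are hypothetical, and treating a lock already held by the same transaction as acquirable is an assumption.

```python
class Abort(Exception):
    """Raised when a transaction must be rolled back."""


memory = {"x": 1}
lock_owner = {}   # location -> id of the transaction holding its lock


def buffered_write(redo_log, addr, value):
    """Buffered write: record the new value; shared memory is untouched."""
    redo_log[addr] = value


def buffered_read(redo_log, addr):
    """Reads of a buffered location are serviced from the redo log."""
    return redo_log[addr] if addr in redo_log else memory[addr]


def direct_write(tx_id, undo_log, addr, value):
    """Direct update: lock, log the old value, then write in place."""
    owner = lock_owner.get(addr)
    if owner is not None and owner != tx_id:
        raise Abort(addr)                    # locked by another transaction
    lock_owner[addr] = tx_id
    undo_log.append((addr, memory[addr]))    # old value, for rollback
    memory[addr] = value


def rollback(tx_id, undo_log):
    """Restore old values in reverse order and release this transaction's locks."""
    for addr, old in reversed(undo_log):
        memory[addr] = old
    for addr in [a for a, o in lock_owner.items() if o == tx_id]:
        del lock_owner[addr]
    undo_log.clear()
```

Buffered writes make aborts cheap and commits more expensive. Direct updates do the opposite, since an abort has to replay the undo log.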

14 Transactional commit
– Acquire locks for all entries of the writeset, if not already done
– If a lock is held by another transaction, abort
– Validate the read set, aborting if required
– Copy all buffered data to shared locations, if required
– Release all locks and update the versions of modified locations
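The commit steps on this slide can be sketched for the buffered-write case. This is a single-threaded model under assumptions: `tx_commit`, `lock_owner`, and `versions` are hypothetical names, and locks are taken in sorted address order only to keep the example deterministic.

```python
class Abort(Exception):
    """Raised when a transaction must be rolled back."""


memory = {"x": 0, "y": 7}
versions = {"x": 0, "y": 0}
lock_owner = {}   # location -> id of the transaction holding its lock


def tx_commit(tx_id, read_set, write_set):
    acquired = []
    try:
        # 1. Acquire locks for the write set, aborting on a foreign lock.
        for addr in sorted(write_set):
            owner = lock_owner.get(addr)
            if owner is not None and owner != tx_id:
                raise Abort(addr)
            if owner is None:
                lock_owner[addr] = tx_id
                acquired.append(addr)
        # 2. Validate the read set against the current versions.
        for addr, seen in read_set.items():
            if versions[addr] != seen or lock_owner.get(addr, tx_id) != tx_id:
                raise Abort(addr)
        # 3. Publish buffered values and bump the versions of modified locations.
        for addr, value in write_set.items():
            memory[addr] = value
            versions[addr] += 1
    finally:
        # 4. Release the locks taken here, whether committing or aborting.
        for addr in acquired:
            del lock_owner[addr]
```

For example, `tx_commit(1, {"y": 0}, {"x": 7})` publishes `x = 7` and bumps the version of `x`, so a later transaction that read `x` at version 0 fails validation.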

15 A more elaborate API
– Refer to the Draft Specification of Transactional Language Constructs for C++ (footnote 1) for more details
– Irrevocability of certain statements introduces complications
  – Statements can be either safe or unsafe
  – A conventional atomic section can contain only safe statements
  – Necessitates use of attributes in certain cases
  – Annotation of functions called within a transaction
  – A relaxed transaction can contain unsafe statements
– Supporting explicit abort/cancel of a transaction
  – Only a conventional atomic section can contain an abort/cancel statement
– Allowing nesting of transactions
– Exceptions, exception specifications
Footnote 1: http://software.intel.com/file/21569

16 Compiler/STM interface
– Intel has released the Intel® Transactional Memory Compiler and Runtime Application Binary Interface (footnote 2)
– An interface that a compiler writer or an expert user calling the STM has to conform to
– Enables use of different STMs without changing the application
– A fixed naming convention of library routines is imposed
– Standard interfaces for starting a transaction, getting a handle to a transaction, aborting/committing a transaction, reading/writing memory locations, etc.
Footnote 2: http://software.intel.com/file/8097

17 Platforms available today
– Some examples
  – Intel STM compiler and library
  – IBM xl C/C++ for transactional memory for AIX
  – SkySTM: Sun Studio based compiler and STM
  – Microsoft .NET implementation
  – gcc transactional memory support (currently on a branch)
  – TL2
  – TinySTM
  – …

18 Proposal to incorporate STM support in Open64
– Writing an STM library
  – First step is to support the stock atomic section
    − Provide a blocking implementation
    − Support closed nesting with partial roll-back
  – Initial work is to provide support for the minimal set of entry points needed for simple programs to pass
    − Will follow the ABI document released by Intel
  – Could leverage work done in gcc space
– In the compiler space, mostly front-end work for functional completeness
  – Main task is to lower the atomic section into calls to STM
  – Will need to support annotations to enable static checking of proper use of TM constructs
  – Will follow the draft spec of the API
  – Could leverage work done in gcc space
– Set up a small number of benchmarks and applications for testing

19 Possible optimizations
– Inter-procedural optimizations can reduce overheads substantially
  – Not required for initial implementation but something that would possibly give an Open64-based implementation an edge over other frameworks
  – Redundant calls to STM could be removed
  – Read after read, write after read (of the same location) could be optimized
  – Recognition of local memory accesses could remove barriers altogether
  – IPA could feed STM with critical information to help reduce STM overheads
– The STM library admits numerous optimizations
  – Use of pessimistic concurrency in addition to primarily optimistic
  – Use of application-specific policies
  – Reduction of false conflicts
  – Reduction of validation costs
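As one illustration of removing redundant calls to the STM, the sketch below drops a read barrier when the same location was already read earlier in straight-line code within one transaction. It is a deliberately conservative toy, not a description of any Open64 pass: the `(kind, location)` representation and the name `remove_redundant_reads` are assumptions, and the write-after-read case mentioned on the slide is not handled.

```python
def remove_redundant_reads(ops):
    """Drop read barriers whose value is already available from an earlier read.

    `ops` is a list of ("read" | "write", location) pairs for straight-line
    code inside a single transaction.
    """
    already_read = set()
    kept = []
    for kind, loc in ops:
        if kind == "read":
            if loc in already_read:
                continue              # reuse the value from the earlier barrier
            already_read.add(loc)
        else:
            # A write changes the value, so a later read needs a barrier again.
            already_read.discard(loc)
        kept.append((kind, loc))
    return kept
```

A usage example: the sequence read y, read y, write x, read y keeps only the first read of `y` and the write of `x`.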

20 Summary
– An STM implementation based on Open64 will be useful to the community
– Should conform to the draft API and the ABI
– Will provide a great research platform for further advances in STMs
– Will help development of more transactional applications

