Automatic Data Partitioning in Software Transactional Memories Torvald Riegel, Christof Fetzer, Pascal Felber (TU Dresden, Germany / Uni Neuchatel, Switzerland)

2 No one-size-fits-all TM!
- STMs:
  - Design: invisible vs. visible reads; object-based vs. word-based
  - Parameters (lock-based): number of locks, address → lock mapping
- HTMs:
  - Different interfaces (e.g., Rock vs. AMD’s ASF)
  - Resource bounds
- Heterogeneous workloads: global tuning does not help
- Divide and conquer!?

3 How to divide
- User-driven? Hmm, rather not …
- Temporally:
  - Runtime tuning can handle phases
  - … but only if the whole workload has the same phases
- Memory:
  - "Word-based": the mapping function is difficult
    - Runtime overheads
    - Mapping needs to be stable
    - Memory allocator affects the mapping heavily (see false conflicts)
  - "Object-based": still needs a mapping or per-object data
- Code:
  - Problem: the same function might operate on different data

4 How to conquer?
- Tune concurrency control mechanisms
  - Use different STM implementations
  - Use HTM only where applicable/necessary
  - Tune TM parameters per partition
  - Challenge: threads must agree on which mechanisms to use for each item/location!
  - Two-phase commit or similar is necessary when using several independent TM mechanisms
- Improve mapping/partitioning at other levels
  - E.g., location → lock mapping

5 Data Partitioning
- Partition memory automatically
- We use Pool Allocation (Lattner et al., PLDI '05)
  - Mixed compile-time/runtime technique:
    - Based on pointer analysis for C/C++
    - Nodes in the points-to graph become partitions
    - Partitions are instantiated dynamically at runtime and supplied to the called functions that use them
    - The memory allocator is not affected
- Implementation extends Tanger (STM compiler)
  - STM load/store functions receive a pointer to the partition

6 Example: Points-to graph for STAMP’s Vacation
[Figure: partial, simplified DS graph for main(), showing two separate red-black tree instances; nodes are annotated with their type, if known (e.g., a struct with 4 fields, 2 of which are pointers).]

7 Conquering …
- Partition types determine the STM implementation used per partition (TinySTM):
  - Multiple Locks (general purpose)
  - Single Shared Lock (infrequently updated partitions)
  - Single Exclusive Lock (low-concurrency partitions)
  - Read-Only (no concurrency control necessary)
  - Thread-local, transaction-local
- Loads/stores are dispatched to type-specific STM functions on each call
- Partition types and parameters can be tuned
  - E.g., read-only partitions get tuned on the first write

8 Performance
[Figure: performance results; legend: TinySTM without partitioning support, 2^20 / 2^24 locks; TinySTM with partitioning, 4 different tuning heuristics]
- Exclusive Lock is faster than general-purpose STM
- Partitioning decreases false conflicts in the lock array: the lock hash function gets a 2nd level at compile time
- Partitioning adds runtime overhead

9 Performance (2)
[Figure: performance results; annotations: read-only partitions during the first phase of the benchmark; 5 × 256K locks; 2^26 locks! (with 2^24 locks: livelocks due to false conflicts)]

10 Challenges
- Analysis: calls to libraries?
  - Points-to graphs can probably be attached to libraries (local per-function analysis + call graph)
  - Analysis is bottom-up on the call graph
- TM implementations that don't support two-phase commit
- Dispatch: runtime overheads
  - JIT?
  - Size of binaries
- Tuning partitions and partitioning
  - No direct feedback; partitioning results in even more parameters to be tuned
  - Partition selection / merging at compile time / runtime

11 Questions?
Tanger + TinySTM + …: http://tinystm.org (send email for a version with partitioning support)

12 Backup Slides

13 Are there partitions?

14 Partition Type Performance & Tuning Strategies
- Tuning strategy:
  - Start with the read-only type
  - On reaching a certain number of aborts, switch to:
    1. Single Exclusive Lock
    2. Single Shared Lock
    3. Multiple Locks
- Part-1: switch directly to Multiple Locks; Part-4: try other types first (single locks, Multiple Locks with fewer locks)

15 Analysis
- We use Data Structure Analysis (DSA [1]):
  - Pointer analysis for the LLVM compiler framework
  - Creates a points-to graph with Data Structure (DS) nodes
- Context-sensitive:
  - Data structures are distinguished based on call graphs
- Field-sensitive:
  - Distinguishes between DS fields
- Unification-based:
  - Pointers target a single node in the points-to graph
  - Information about pointers from different places gets merged
  - If the information is incompatible, the node is collapsed (= "nothing known")
- Can safely analyze incomplete programs:
  - Calls to external / not-analyzed functions affect only the data that escapes into / from these functions (such data is marked "External")
  - Analyzing more code increases analysis precision

[1] Chris Lattner, PhD thesis, 2005

16 Analysis (2)
Integration into the Tanger compilation process:
1. Compile and link program parts into an LLVM intermediate representation module
2. Analyze the module using DSA
   - Local intra-function analysis: per-function DS graph
   - Merge DS graphs bottom-up in the call graph (put callees' information into callers)
   - Merge DS graphs top-down in the call graph (vice versa)
3. Transactify the module
   - Use DSA information to decide between object-based / word-based
   - Requirement: if a memory chunk (DS node) is object-based, then it must be safe for object-based access everywhere in the program
   - DSA can give us this guarantee
4. Link in the STM library and generate native code
