Automatic Data Partitioning in Software Transactional Memories Torvald Riegel, Christof Fetzer, Pascal Felber (TU Dresden, Germany / Uni Neuchatel, Switzerland)

2 No one-size-fits-all TM!
- STMs:
  - Design:
    - Invisible vs. visible reads
    - Object-based vs. word-based
  - Parameters:
    - Lock-based: #locks, address → lock mapping
- HTMs:
  - Different interfaces (e.g., Rock vs. AMD's ASF)
  - Resource bounds
- Heterogeneous workloads: global tuning does not help
Divide and conquer!?

3 How to divide?
- User-driven? Hmm, rather not …
- Temporally
  - Runtime tuning can handle phases
  - … but only if the whole workload has the same phases
- Memory
  - "Word-based": mapping function is difficult
    - Runtime overheads
    - Mapping needs to be stable
    - Memory allocator affects mapping heavily (see false conflicts)
  - "Object-based": still need mapping or per-object data
- Code
  - Problem: the same function might operate on different data
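
A word-based STM typically maps each address to an entry in a fixed-size lock array, so the number of locks and the mapping function are the tuning parameters listed on slide 2. The Python sketch below assumes a simple shift-and-mask mapping with 8-byte words; `lock_index`, `false_conflicts` and the example addresses are hypothetical and not TinySTM's code. It shows how allocator placement alone can make unrelated data share locks.

```python
# Hypothetical sketch, not TinySTM's actual mapping.
# Assumed parameter: 8-byte words, so the 3 low address bits select a byte
# inside a word and are ignored by the mapping.
WORD_SHIFT = 3


def lock_index(addr: int, num_locks: int) -> int:
    """Map an address to a slot in a global lock array (shift and mask).

    num_locks must be a power of two so that masking acts as modulo.
    """
    if num_locks <= 0 or num_locks & (num_locks - 1):
        raise ValueError("num_locks must be a power of two")
    return (addr >> WORD_SHIFT) & (num_locks - 1)


def false_conflicts(addrs_a, addrs_b, num_locks: int) -> int:
    """Count lock slots shared by two disjoint sets of addresses.

    Every shared slot is a potential false conflict: transactions that
    access different data still contend on the same lock.
    """
    locks_a = {lock_index(a, num_locks) for a in addrs_a}
    locks_b = {lock_index(b, num_locks) for b in addrs_b}
    return len(locks_a & locks_b)


if __name__ == "__main__":
    # Two unrelated arrays of 1024 words that the allocator happened to
    # place at 1 MiB and 2 MiB.
    array_a = [0x100000 + 8 * i for i in range(1024)]
    array_b = [0x200000 + 8 * i for i in range(1024)]
    for n in (2**10, 2**16, 2**20):
        print(f"{n:>8} locks: {false_conflicts(array_a, array_b, n)} shared slots")
```

With these placements, all 1024 words of one array share a lock with a word of the other at both 2^10 and 2^16 locks; only at 2^20 locks do the two lock sets separate. Shifting either array by a different offset changes the outcome, which is why a word-based mapping is hard to keep stable.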

4 How to conquer?
- Tune concurrency control mechanisms
  - Use different STM implementations
  - Use HTM only where applicable/necessary
  - Tune TM parameters per partition
  - Challenge: threads must agree on which mechanisms to use for each item/location!
  - Two-phase commit or similar is necessary when using several independent TM mechanisms
- Improve mapping/partitioning at other levels
  - E.g., location → lock mapping
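
When one transaction touches partitions managed by different, independent TM mechanisms, every mechanism must agree to commit before any of them makes its writes visible. Below is a minimal, single-threaded Python sketch of such a two-phase commit. `Backend`, its version-based validation and `two_phase_commit` are hypothetical stand-ins; a real `prepare` would also have to acquire locks so that nothing can change between the vote and the commit.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical TM mechanism that manages one partition."""
    name: str
    data: dict = field(default_factory=dict)
    version: int = 0          # bumped on every commit to this partition
    read_version: int = 0     # version observed when the transaction began
    write_set: dict = field(default_factory=dict)

    def begin(self):
        self.write_set.clear()
        self.read_version = self.version

    def write(self, key, value):
        # Writes are buffered until the second phase.
        self.write_set[key] = value

    def prepare(self) -> bool:
        # Phase 1: vote "yes" only if the partition is unchanged since begin().
        return self.version == self.read_version

    def commit(self):
        # Phase 2: make the buffered writes visible.
        self.data.update(self.write_set)
        self.version += 1
        self.write_set.clear()

    def abort(self):
        self.write_set.clear()


def two_phase_commit(backends) -> bool:
    """Commit in all mechanisms, or in none of them."""
    if all(b.prepare() for b in backends):
        for b in backends:
            b.commit()
        return True
    for b in backends:
        b.abort()
    return False
```

If any mechanism votes no, every mechanism aborts, so a transaction can never commit in one partition while failing in another.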

5 Data Partitioning
- Partition memory automatically
- We use Pool Allocation (Lattner et al., PLDI '05)
  - Mixed compile-time/runtime technique
  - Based on pointer analysis for C/C++
  - Nodes in the points-to graph become partitions
  - Partitions are instantiated dynamically at runtime and supplied to the called functions that use them
  - Memory allocator is not affected
- Implementation extends Tanger (STM compiler)
  - STM load/store functions get a pointer to the partition
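
The way partitions are created at runtime and passed to the functions that use them can be modeled roughly as follows. This is a hypothetical Python model, not Tanger's generated code: `Partition` carries only per-partition metadata (here, access counters), ordinary memory stays in one shared dictionary because the allocator is unchanged, and `increment` stands in for a transactified function that the compiler has given an extra partition parameter.

```python
# Hypothetical model. Ordinary memory: the allocator is unchanged,
# so all data lives here regardless of partition.
memory = {}


class Partition:
    """Hypothetical runtime partition handle: metadata only, no allocation."""

    def __init__(self, label):
        self.label = label
        self.loads = 0
        self.stores = 0


def tx_load(part, addr):
    # The instrumented load receives the partition of the data it reads,
    # so per-partition policy or statistics can be applied here.
    part.loads += 1
    return memory.get(addr, 0)


def tx_store(part, addr, value):
    part.stores += 1
    memory[addr] = value


def increment(part, counter_addr):
    # Transactified function: the compiler added the `part` parameter
    # because the analysis found that counter_addr points into it.
    tx_store(part, counter_addr, tx_load(part, counter_addr) + 1)


def example():
    # Two instances of the same kind of data structure get two partitions,
    # even though the same function operates on both.
    tree_1, tree_2 = Partition("tree-1"), Partition("tree-2")
    increment(tree_1, 0x1000)
    increment(tree_1, 0x1000)
    increment(tree_2, 0x2000)
    return tree_1, tree_2
```

Because the partition is an argument, the same `increment` code can operate on data in different partitions. This is the case that splitting by code (slide 3) cannot handle.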

6 Example: Points-to graph for STAMP's Vacation
[Figure: partial, simplified DS graph for main(). Annotations: node type, if known; a struct with 4 fields, 2 of which are pointers; a Red-Black Tree instance; a second Red-Black Tree instance.]

7 Conquering …
- Partition types determine the STM implementation used per partition (TinySTM):
  - Multiple Locks (general purpose)
  - Single Shared Lock (infrequently updated partitions)
  - Single Exclusive Lock (low concurrency partitions)
  - Read-Only (no concurrency control necessary)
  - Thread-local, transaction-local
- Loads/stores dispatched to type-specific STM functions on each call
- Partition types and parameters can be tuned
  - E.g., read-only partitions get tuned on first write
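
Per-call dispatch on the partition type, including retuning a read-only partition on its first write, might look like the Python sketch below. The handlers only record which path ran; they are placeholders for the type-specific TinySTM functions. The thread-local and transaction-local types are omitted, and switching to Single Exclusive Lock on the first write is an assumption, because the slide does not say which type is chosen.

```python
from enum import Enum

# Hypothetical sketch: the handlers below are placeholders, not TinySTM code.


class PType(Enum):
    READ_ONLY = "read-only"
    SINGLE_EXCLUSIVE = "single exclusive lock"
    SINGLE_SHARED = "single shared lock"
    MULTIPLE_LOCKS = "multiple locks"


class Partition:
    def __init__(self, ptype=PType.READ_ONLY):
        self.ptype = ptype
        self.trace = []  # which type-specific path served each access


memory = {}


def load_unsynchronized(part, addr):
    # Read-only partitions need no concurrency control.
    part.trace.append("load:unsynchronized")
    return memory.get(addr, 0)


def load_lock_based(part, addr):
    # Placeholder for the read path of the lock-based partition types.
    part.trace.append(f"load:{part.ptype.value}")
    return memory.get(addr, 0)


def store_lock_based(part, addr, value):
    # Placeholder for the write path of the lock-based partition types.
    part.trace.append(f"store:{part.ptype.value}")
    memory[addr] = value


def store_read_only(part, addr, value):
    # First write to a read-only partition: retune it, then retry the store.
    part.ptype = PType.SINGLE_EXCLUSIVE  # assumed choice of new type
    part.trace.append("retune")
    tx_store(part, addr, value)


LOAD = {PType.READ_ONLY: load_unsynchronized}
STORE = {PType.READ_ONLY: store_read_only}
for _t in (PType.SINGLE_EXCLUSIVE, PType.SINGLE_SHARED, PType.MULTIPLE_LOCKS):
    LOAD[_t] = load_lock_based
    STORE[_t] = store_lock_based


def tx_load(part, addr):
    # Dispatch on the partition's current type on every call.
    return LOAD[part.ptype](part, addr)


def tx_store(part, addr, value):
    STORE[part.ptype](part, addr, value)
```

In a multithreaded STM, the type change would have to become visible to all threads before any of them uses the new mechanism, which is the agreement challenge from slide 4.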

8 Performance
- Exclusive Lock is faster than general-purpose STM
- Partitioning decreases false conflicts in the lock array
- Lock hash function gets a 2nd level at compile time
- Partitioning adds runtime overhead
[Plot: TinySTM without partitioning support (2^20 and 2^24 locks) vs. TinySTM with partitioning (4 different tuning heuristics).]
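
One possible reading of "the lock hash function gets a 2nd level" is that the partition first selects its own region of the lock array and the address is then hashed only within that region. The Python sketch below follows that reading; `LockLayout`, the per-partition lock counts and the 8-byte word shift are assumptions for illustration.

```python
# Hypothetical sketch of a two-level lock mapping.
WORD_SHIFT = 3  # assumed 8-byte words


class LockLayout:
    """Hypothetical two-level lock mapping: partition first, then address."""

    def __init__(self, locks_per_partition):
        # locks_per_partition maps a partition id to its (power-of-two)
        # number of locks; the regions are laid out back to back.
        self.base, self.size = {}, {}
        offset = 0
        for pid, n in locks_per_partition.items():
            if n <= 0 or n & (n - 1):
                raise ValueError("lock counts must be powers of two")
            self.base[pid] = offset
            self.size[pid] = n
            offset += n
        self.total = offset

    def lock_index(self, pid, addr):
        # Level 1: the partition selects its own region of the lock array.
        # Level 2: the address is hashed only within that region.
        return self.base[pid] + ((addr >> WORD_SHIFT) & (self.size[pid] - 1))
```

With a flat array of 1024 locks, addresses 0x100000 and 0x200000 would map to the same slot; with `LockLayout({0: 1024, 1: 4096})` they land in different regions, and each region can be sized independently. When a call site always accesses the same partition, its region's base offset is fixed, which is presumably what allows this level to be resolved at compile time.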

9 Performance (2)
[Plot annotations: Read-Only partitions during the first phase of the benchmark; 5 × 256K locks; 2^26 locks! (2^24 locks: livelocks due to false conflicts).]
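
Taking K = 2^10 (an assumption about the plot's notation), the two lock-array sizes annotated on this slide differ by a factor of about 51:

```latex
\[
5 \times 256\,\mathrm{K} = 5 \times 2^{18} = 1{,}310{,}720,
\qquad 2^{26} = 67{,}108{,}864,
\qquad \frac{2^{26}}{5 \times 2^{18}} = \frac{2^{8}}{5} = 51.2
\]
```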

10 Challenges
- Analysis: calls to libraries?
  - Points-to graphs can probably be attached to libs (local per-function analysis + callgraph)
  - Analysis is bottom-up on the callgraph
- TM implementations that don't support two-phase commit
- Dispatch: runtime overheads
  - JIT?
  - Size of binaries
- Tuning partitions and partitioning
  - No direct feedback; partitioning results in even more parameters to be tuned
  - Partition selection / merging at compile time/runtime

11 Questions?
Tanger + TinySTM + …: http://tinystm.org (send email for version with partitioning support)

12 Backup Slides

13 Are there partitions?

14 Partition Type Performance & Tuning Strategies
- Tuning strategy:
  - Start with read-only type
  - On reaching a certain number of aborts, switch to:
    1. Single Exclusive Lock
    2. Single Shared Lock
    3. Multiple Locks
  - Part-1: switch directly to Multiple Locks; Part-4: try other types first (single locks, fewer multiple locks)
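
The tuning strategy can be modeled as a per-partition state machine that counts aborts and escalates to the next configuration at a threshold. In the Python sketch below, the threshold, the lock counts and the exact Part-4 sequence are assumptions; only the order Read-Only → Single Exclusive Lock → Single Shared Lock → Multiple Locks and the Part-1 shortcut come from the slide.

```python
from dataclasses import dataclass

# Hypothetical sketch of the abort-driven tuning strategy.
ABORT_THRESHOLD = 100  # assumed; the slide only says "a certain number"

# (type, number of locks) configurations, tried in order.
PART_1 = [("read-only", 0), ("multiple locks", 2**20)]
PART_4 = [  # assumed reading of "try other types first"
    ("read-only", 0),
    ("single exclusive lock", 1),
    ("single shared lock", 1),
    ("multiple locks", 2**10),  # fewer locks first,
    ("multiple locks", 2**20),  # then more
]


@dataclass
class TunedPartition:
    strategy: list
    step: int = 0
    aborts: int = 0

    @property
    def config(self):
        return self.strategy[self.step]

    def on_abort(self):
        """Count an abort and escalate to the next configuration at the threshold."""
        self.aborts += 1
        last = len(self.strategy) - 1
        if self.aborts >= ABORT_THRESHOLD and self.step < last:
            self.step += 1
            self.aborts = 0  # each configuration starts with a fresh budget
```

Resetting the abort counter after each switch gives every configuration the same budget before the next escalation. A real tuner might also weigh throughput, which slide 10 notes is hard because there is no direct feedback.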

15 Analysis
- We use Data Structure Analysis (DSA [1]):
  - Pointer analysis for the LLVM compiler framework
  - Creates a points-to graph with Data Structure (DS) nodes
  - Context-sensitive:
    - Data structures distinguished based on call graphs
  - Field-sensitive:
    - Distinguishes between DS fields
  - Unification-based:
    - Pointers target a single node in the points-to graph
    - Information about pointers from different places gets merged
    - If the information is incompatible, the node is collapsed (= "nothing known")
  - Can safely analyze incomplete programs:
    - Calls to external / not analyzed functions have an effect only on the data that escapes into / from these functions (marked "External")
    - Analyzing more code increases analysis precision

[1] Chris Lattner, PhD thesis, 2005
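
The unification-based part of the analysis can be illustrated with a union-find structure, in the spirit of Steensgaard's analysis rather than DSA itself. When one pointer may target two nodes, the nodes are merged and their facts combined, and conflicting type information collapses the result. The Python sketch below omits context and field sensitivity; `DSNode`, `find` and `unify` are hypothetical names.

```python
# Hypothetical sketch of unification with union-find; not DSA's implementation.


class DSNode:
    """Hypothetical, heavily simplified node of a unification-based points-to graph."""

    def __init__(self, type_name=None):
        self.parent = self
        self.type_name = type_name  # None: type not (or no longer) known
        self.collapsed = False      # True: "nothing known" about the layout
        self.external = False       # True: escapes to code that was not analyzed


def find(node):
    # Union-find root lookup with path halving.
    while node.parent is not node:
        node.parent = node.parent.parent
        node = node.parent
    return node


def unify(a, b):
    """Merge two nodes, e.g. because one pointer may point to either of them."""
    a, b = find(a), find(b)
    if a is b:
        return a
    b.parent = a
    conflicting = a.type_name and b.type_name and a.type_name != b.type_name
    if a.collapsed or b.collapsed or conflicting:
        # Incompatible information: collapse the merged node.
        a.collapsed, a.type_name = True, None
    else:
        a.type_name = a.type_name or b.type_name
    a.external = a.external or b.external
    return a
```

Once a node is collapsed or marked External, every pointer that reaches it sees that status, which is what keeps the analysis conservative for incomplete programs.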

16 Analysis (2)
Integration into the Tanger compilation process:
1. Compile and link program parts into an LLVM intermediate representation module
2. Analyze the module using DSA
   - Local intra-function analysis: per-function DS graph
   - Merge DS graphs bottom-up in the callgraph (put callees' information into callers)
   - Merge DS graphs top-down in the callgraph (vice versa)
3. Transactify the module
   - Use DSA information to decide between object-based / word-based
   - Requirement: if a memory chunk (DS node) is object-based, then it must be safe to treat it as object-based everywhere in the program
   - DSA can give us this guarantee
4. Link in the STM library and generate native code
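
Step 2 merges per-function results along the call graph. The Python sketch below shows only the visiting order and a bottom-up merge, with plain sets standing in for DS graphs. The function names and the call graph are hypothetical, and the sketch assumes an acyclic call graph; recursive cycles need special treatment in a real analysis.

```python
# Hypothetical sketch: sets of facts stand in for DS graphs.


def bottom_up_order(call_graph, root):
    """List functions so that every callee comes before its callers.

    call_graph maps a function name to the names of the functions it calls.
    Assumes an acyclic call graph.
    """
    order, seen = [], set()

    def visit(fn):
        if fn in seen:
            return
        seen.add(fn)
        for callee in call_graph.get(fn, []):
            visit(callee)
        order.append(fn)  # post-order: all callees are already listed

    visit(root)
    return order


def top_down_order(call_graph, root):
    # Callers before callees: the reverse of the bottom-up order.
    return list(reversed(bottom_up_order(call_graph, root)))


def merge_bottom_up(call_graph, root, local_facts):
    """Put callees' information into callers."""
    merged = {}
    for fn in bottom_up_order(call_graph, root):
        facts = set(local_facts.get(fn, ()))
        for callee in call_graph.get(fn, []):
            facts |= merged[callee]
        merged[fn] = facts
    return merged
```

For `main` calling `f` and `g`, which both call `h`, the bottom-up order is h, f, g, main, and `main` ends up with its own facts plus those propagated from `f`, `g` and `h`.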

