Published by Samantha Baldwin. Modified over 8 years ago.
1 Automatic Data Partitioning in Software Transactional Memories
Torvald Riegel, Christof Fetzer, Pascal Felber (TU Dresden, Germany / Uni Neuchatel, Switzerland)
2 No one-size-fits-all TM!
- STMs:
  - Design: invisible vs. visible reads, object-based vs. word-based
  - Parameters: for lock-based STMs, the number of locks and the address-to-lock mapping
- HTMs:
  - Different interfaces (e.g., Rock vs. AMD’s ASF)
  - Resource bounds
- Heterogeneous workloads: global tuning does not help
- Divide and conquer!?
3 How to divide
- User-driven? Hmm, rather not …
- Temporally
  - Runtime tuning can handle phases …
  - But only if the whole workload has the same phases
- Memory
  - “Word-based”: mapping function is difficult
    - Runtime overheads
    - Mapping needs to be stable
    - Memory allocator affects mapping heavily (see false conflicts)
  - “Object-based”: still need a mapping or per-object data
- Code
  - Problem: the same function might operate on different data
4 How to conquer?
- Tune concurrency control mechanisms
  - Use different STM implementations
  - Use HTM only where applicable/necessary
  - Tune TM parameters per partition
  - Challenge: threads must agree on which mechanisms to use for each item/location!
  - Two-phase commit or similar is necessary when using several independent TM mechanisms
- Improve mapping/partitioning at other levels
  - E.g., location-to-lock mapping
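The two-phase commit requirement above can be sketched as follows. This is a minimal illustration, not code from Tanger or TinySTM: the `tm_participant` interface, `two_phase_commit`, and the toy participant are all hypothetical names, and the sketch assumes that `abort` is safe to call on a participant that has not been prepared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface that each independent TM mechanism would expose. */
typedef struct tm_participant {
    bool (*prepare)(void *state); /* validate and lock; false means "must abort" */
    void (*commit)(void *state);  /* make writes visible and release locks */
    void (*abort)(void *state);   /* discard writes and release locks */
    void *state;
} tm_participant;

/* Commit a transaction that touched n mechanisms; returns true on commit. */
bool two_phase_commit(tm_participant *parts, size_t n)
{
    assert(parts != NULL || n == 0);
    /* Phase 1: every mechanism must agree that it can commit. */
    for (size_t i = 0; i < n; i++) {
        if (!parts[i].prepare(parts[i].state)) {
            /* One veto aborts the whole transaction in all mechanisms. */
            for (size_t j = 0; j < n; j++)
                parts[j].abort(parts[j].state);
            return false;
        }
    }
    /* Phase 2: all agreed, so the commit can no longer fail. */
    for (size_t i = 0; i < n; i++)
        parts[i].commit(parts[i].state);
    return true;
}

/* Toy participant (assumption, for demonstration only). */
typedef struct { bool can_commit; int commits; int aborts; } toy_state;
static bool toy_prepare(void *s) { return ((toy_state *)s)->can_commit; }
static void toy_commit(void *s) { ((toy_state *)s)->commits++; }
static void toy_abort(void *s) { ((toy_state *)s)->aborts++; }
```

A mechanism that cannot separate "prepare" from "commit" does not fit this interface, which is the difficulty the Challenges slide raises for TM implementations without two-phase commit support.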
5 Data Partitioning
- Partition memory automatically
- We use Pool Allocation (Lattner et al., PLDI 05)
- Mixed compile-time/runtime technique:
  - Based on pointer analysis for C/C++
  - Nodes in the points-to graph become partitions
  - Partitions are instantiated dynamically at runtime and supplied to called functions that use these partitions
- Memory allocator is not affected
- Implementation extends Tanger (STM compiler)
  - STM load/store functions get a pointer to the partition
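To make the last point concrete, here is a rough sketch of what partition-aware load/store calls might look like after the compiler has rewritten a function. The names `partition`, `stm_load`, `stm_store`, and `counter_increment` are assumptions for illustration and are not the Tanger or TinySTM API; the barriers only count accesses and perform no concurrency control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partition descriptor; the fields are illustrative only. */
typedef struct partition {
    const char *name;
    size_t loads;
    size_t stores;
} partition;

/* Placeholder barriers: a real STM would do concurrency control here. */
uintptr_t stm_load(partition *p, const uintptr_t *addr)
{
    assert(p != NULL && addr != NULL);
    p->loads++;
    return *addr;
}

void stm_store(partition *p, uintptr_t *addr, uintptr_t value)
{
    assert(p != NULL && addr != NULL);
    p->stores++;
    *addr = value;
}

typedef struct counter { uintptr_t value; } counter;

/* The partition is an extra argument supplied by the caller, so the same
 * function can operate on data from different partitions. */
void counter_increment(partition *p, counter *c)
{
    uintptr_t v = stm_load(p, &c->value);
    stm_store(p, &c->value, v + 1);
}
```

Calling `counter_increment` once with the partition of one data structure and once with the partition of another reuses the same code while keeping the bookkeeping separate, which addresses the "same function might operate on different data" problem from slide 3.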
6 Example: Points-to graph for STAMP’s Vacation
[Figure: partial, simplified DS graph for main(). Callouts: “Type, if known”; “struct has 4 fields, 2 are pointers”; “A Red-Black Tree instance”; “A second Red-Black Tree instance”.]
7 Conquering …
- Partition types determine the STM implementation used per partition (TinySTM):
  - Multiple Locks (general purpose)
  - Single Shared Lock (infrequently updated partitions)
  - Single Exclusive Lock (low-concurrency partitions)
  - Read-Only (no concurrency control necessary)
  - Thread-local, transaction-local
- Loads/stores are dispatched to type-specific STM functions on each call
- Partition types and parameters can be tuned
  - E.g., read-only partitions get tuned on first write
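The per-call dispatch described above might look roughly like the following. This is a sketch under stated assumptions: the enum, the function table, and all function names are hypothetical, the type-specific loads are placeholders that only count calls, and a real implementation would have to coordinate a type switch with concurrently running transactions.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    PART_MULTI_LOCKS,
    PART_SHARED_LOCK,
    PART_EXCL_LOCK,
    PART_READ_ONLY,
    PART_TYPE_COUNT
} part_type;

typedef struct partition {
    part_type type;
    unsigned long loads[PART_TYPE_COUNT]; /* loads served per type (illustration) */
} partition;

typedef uintptr_t (*load_fn)(partition *, const uintptr_t *);

/* Placeholders for the type-specific implementations. */
static uintptr_t load_multi_locks(partition *p, const uintptr_t *a)
{ p->loads[PART_MULTI_LOCKS]++; return *a; } /* would check a lock array */
static uintptr_t load_shared_lock(partition *p, const uintptr_t *a)
{ p->loads[PART_SHARED_LOCK]++; return *a; } /* would take the shared lock */
static uintptr_t load_excl_lock(partition *p, const uintptr_t *a)
{ p->loads[PART_EXCL_LOCK]++; return *a; }   /* would take the exclusive lock */
static uintptr_t load_read_only(partition *p, const uintptr_t *a)
{ p->loads[PART_READ_ONLY]++; return *a; }   /* no concurrency control */

static const load_fn load_table[PART_TYPE_COUNT] = {
    [PART_MULTI_LOCKS] = load_multi_locks,
    [PART_SHARED_LOCK] = load_shared_lock,
    [PART_EXCL_LOCK]   = load_excl_lock,
    [PART_READ_ONLY]   = load_read_only,
};

/* Dispatch on the partition's current type on every call. */
uintptr_t stm_load(partition *p, const uintptr_t *addr)
{
    assert(p->type < PART_TYPE_COUNT);
    return load_table[p->type](p, addr);
}

/* First write to a read-only partition retypes it (here: to Multiple Locks). */
void stm_store(partition *p, uintptr_t *addr, uintptr_t value)
{
    if (p->type == PART_READ_ONLY)
        p->type = PART_MULTI_LOCKS;
    *addr = value;
}
```

The indirect call on every access is one source of the runtime overhead that the Performance and Challenges slides mention.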
8 Performance
- Exclusive Lock is faster than the general-purpose STM
- Partitioning decreases false conflicts in the lock array: the lock hash function gets a 2nd level at compile time
- Partitioning adds runtime overhead
[Figure legend: TinySTM w/o partitioning support, 2^20 / 2^24 locks; TinySTM with partitioning, 4 different tuning heuristics]
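One way to picture the two-level lock mapping: the partition, known at compile time, selects a lock array, and the address selects a slot within it. The sketch below is an assumption about how this could be arranged, not TinySTM's actual hash function; `LOCKS_PER_PARTITION`, `lock_index`, and `lock_for` are hypothetical, and the shift by 3 assumes 8-byte words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCKS_PER_PARTITION 1024u /* hypothetical size; must be a power of two */

/* Level 1: each partition owns its own lock array. */
typedef struct partition {
    uintptr_t locks[LOCKS_PER_PARTITION];
} partition;

/* Level 2: map an address to a slot by dropping the low (alignment) bits
 * and masking to the array size. */
static size_t lock_index(const void *addr)
{
    return (size_t)(((uintptr_t)addr >> 3) & (LOCKS_PER_PARTITION - 1));
}

uintptr_t *lock_for(partition *p, const void *addr)
{
    assert(p != NULL);
    return &p->locks[lock_index(addr)];
}
```

Two addresses that hash to the same slot still share a lock inside one partition, but no longer conflict when they belong to different partitions, which is the reduction in false conflicts that the slide refers to.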
9 Performance (2)
[Figure annotations:]
- Read-Only partitions during the first phase of the benchmark
- 5 x 256K locks
- 2^26 locks! (2^24 livelocks due to false conflicts)
10 Challenges
- Analysis:
  - Calls to libraries?
    - Points-to graphs can probably be attached to libs (local per-function analysis + callgraph)
    - Analysis is bottom-up on the call graph
- TM implementations that don’t support two-phase commit
- Dispatch:
  - Runtime overheads
  - JIT?
  - Size of binaries
- Tuning partitions and partitioning
  - No direct feedback; partitioning results in even more parameters to be tuned
  - Partition selection / merging at compile-time/runtime
11 Questions?
Tanger + TinySTM + …: http://tinystm.org (send email for a version with partitioning support)
12 Backup Slides
13 Are there partitions?
14 Partition Type Performance & Tuning Strategies
- Tuning strategy:
  - Start with the read-only type
  - On reaching a certain number of aborts, switch to:
    1. Single Exclusive Lock
    2. Single Shared Lock
    3. Multiple Locks
- Part-1: switch directly to Multiple Locks; Part-4: try other types first (single locks, fewer multiple locks)
15 Analysis
- We use Data Structure Analysis (DSA [1]):
  - Pointer analysis for the LLVM compiler framework
  - Creates a points-to graph with Data Structure (DS) nodes
  - Context-sensitive: data structures are distinguished based on call graphs
  - Field-sensitive: distinguishes between DS fields
  - Unification-based:
    - Pointers target a single node in the points-to graph
    - Information about pointers from different places gets merged
    - If the information is incompatible, the node is collapsed (= “nothing known”)
- Can safely analyze incomplete programs:
  - Calls to external / not analyzed functions have an effect only on the data that escapes into / from these functions (which gets marked “External”)
  - Analyzing more code increases analysis precision
[1] Chris Lattner, PhD thesis, 2005
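The unification idea can be illustrated with a union-find structure: when two pointers may refer to the same object, their nodes are merged, and a merge of incompatible type information collapses the result. This is a heavily simplified sketch and not LLVM's DSA implementation; the integer `type_id`, the fixed-size node table, and all function names are assumptions.

```c
#include <assert.h>

#define MAX_NODES 64 /* hypothetical fixed capacity for the sketch */

typedef struct ds_node {
    int parent;    /* union-find parent; a root points to itself */
    int type_id;   /* stand-in for the node's type information */
    int collapsed; /* 1 = "nothing known" about this node's layout */
} ds_node;

static ds_node nodes[MAX_NODES];
static int node_count;

/* Create a fresh node with the given type. */
int ds_new(int type_id)
{
    assert(node_count < MAX_NODES);
    nodes[node_count] = (ds_node){ node_count, type_id, 0 };
    return node_count++;
}

/* Find the representative node, halving paths along the way. */
int ds_find(int n)
{
    while (nodes[n].parent != n) {
        nodes[n].parent = nodes[nodes[n].parent].parent;
        n = nodes[n].parent;
    }
    return n;
}

/* Merge two nodes; incompatible information collapses the merged node. */
int ds_unify(int a, int b)
{
    a = ds_find(a);
    b = ds_find(b);
    if (a == b)
        return a;
    if (nodes[a].type_id != nodes[b].type_id || nodes[b].collapsed)
        nodes[a].collapsed = 1;
    nodes[b].parent = a;
    return a;
}

int ds_is_collapsed(int n)
{
    return nodes[ds_find(n)].collapsed;
}
```

Because every pointer ends up targeting exactly one representative node, each such node can serve as one partition; a collapsed node still forms a partition, but nothing can be assumed about the layout of the data in it.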
16 Analysis (2)
Integration into the Tanger compilation process:
1. Compile and link program parts into an LLVM intermediate representation module
2. Analyze the module using DSA
   - Local intra-function analysis: per-function DS graph
   - Merge DS graphs bottom-up in the callgraph (put callees’ information into callers)
   - Merge DS graphs top-down in the callgraph (vice versa)
3. Transactify the module
   - Use DSA information to decide between object-based / word-based
   - Requirement: if a memory chunk (DS node) is object-based, then it must be safe for object-based access everywhere in the program
   - DSA can give us this guarantee
4. Link in the STM library and generate native code