Presentation is loading. Please wait.

Presentation is loading. Please wait.

Swarat Chaudhuri Penn State Roberto Lublinerman Pavol Cerny Penn State IST Austria Parallel Programming with Object Assemblies Parallel Programming with.

Similar presentations


Presentation on theme: "Swarat Chaudhuri Penn State Roberto Lublinerman Pavol Cerny Penn State IST Austria Parallel Programming with Object Assemblies Parallel Programming with."— Presentation transcript:

1 Swarat Chaudhuri Penn State Roberto Lublinerman Pavol Cerny Penn State IST Austria Parallel Programming with Object Assemblies Parallel Programming with Object Assemblies TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A AA A A A A A

2 Data parallelism: - Highly coarse-grained (MapReduce) - Highly fine-grained (numeric computations on dense arrays) -Problem-specific methods Taming parallelism Task-parallelism Message-passing

3 Taming parallelism Our target: Data-parallel computations over large, unstructured, shared-memory graphs Unknown granularity High-level correctness as well as efficiency. Our target: Data-parallel computations over large, unstructured, shared-memory graphs Unknown granularity High-level correctness as well as efficiency.

4 Delaunay mesh refinement Triangulate a given set of points. Delaunay property: No point is contained within the circumcircle of a triangle. Quality property: No bad triangles—i.e., triangles with an angle > 120 o. Mesh refinement: Fix bad triangles through an iterative algorithm.

5 Retriangulation Cavity: all triangles whose circumcircle contains new point. Quality constraint may not hold for all new triangles.

6 Sequential mesh refinement Mesh m = /* read input mesh */ Worklist wl = new Worklist(m.getBad()); foreach triangle t in wl { Cavity c = new Cavity(t); c.expand(); c.retriangulate(); m.updateMesh(c); wl.add(c.getBad()); } Cavities are contiguous “regions” in the mesh. Worst-case cavities can encompass the whole mesh.

7 Parallelization Computation over complex, unstructured graphs Mesh = Heap-allocated graph. Nodes = triangles. Edges = adjacency Atomicity: Cavities must be retriangulated atomically. Non-overlapping cavities can be processed in parallel. Seems impossible to handle with static analysis: – Shape of data structure changes greatly over time. – Shape of data structure is highly input-dependent. – Without deep algorithmic knowledge, impossible to say if statically if cavities will overlap. Lots of recent work, notably by Pingali et al.

8 List of similar applications Delaunay mesh refinement, Delaunay triangulation Agglomerative clustering, ray tracing Social network maintenance Minimum spanning tree, Maximum flow N-body simulation, epidemiological simulation Sparse matrix-vector multiplication, sparse Cholesky factorization Belief propagation, survey propagation in Bayesian inference Iterative dataflow analysis, Petri net simulation Finite-difference PDE solution

9 Locality of updates in Chorus Cavity On a mesh of ~100,000 triangles from Lonestar benchmarks: Average cavity size = 3.75 triangles. Maximum cavity size = 12 triangles Average-case locality the essence of parallelism. Chorus: parallel computation driven by “neighborhoods” in heaps.

10 Heaps, regions, assemblies Heap = directed graph Nodes = objects Labeled edges = pointers Region = induced subgraph Assembly = region + thread of control Typically speculative and shortlived.

11 Assembly class = set of local variables + set of guarded updates + constructor + public variables. Program = set of classes Synchronization happens in guard evaluation. Programs, assembly classes busy executing update terminated ready to be preempted or execute next update :: Guard: Update

12 g is a condition on the local variables and owned objects of Guards can merge assemblies :: merge (u.f): S :: merge (u.f) when g: S u f gets a bigger region, keeps local state dies. must be in ready state while merge happens

13 Split into assemblies of class T. Other assemblies not affected. Not a synchronization construct. Updates can split an assembly split(T)

14 Attempts to access objects outside region lead to exceptions. Local updates x = u.f; x.f = y; u f

15 Delaunay mesh refinement Use two assembly classes: Triangle and Cavity. – Cavity = local region in mesh. Each triangle: – Determines if it is bad (local check). – If so, merges with neighbors to become cavity. Each cavity: – Determines if it is complete (local check). – If no, merges with a neighbor. – If yes, retriangulates (locally) and splits.

16 Delaunay mesh refinement: sketch assembly Triangle::... action:: merge (v.f, Cavity) when isBad: skip assembly Cavity::... action:: merge (v.f) when (not isComplete):... isComplete: retriangulate(); split(Triangle)

17 Delaunay mesh refinement: sketch assem Triangle::... action:: merge (v.f, Cavity, u) when bad?: skip assem Cavity::... action:: merge (v.f) when (not complete?): skip complete?: retriangulate(); split(Triangle) What happens on a conflict? Cavity i “absorbed” by cavity j. Cavity j now has some “unnecessary” triangles. j will later split. What happens on a conflict? Cavity i “absorbed” by cavity j. Cavity j now has some “unnecessary” triangles. j will later split.

18 Boruvka’s algorithm for minimum spanning tree Assembly = spanning tree Initially, each assembly has one node. As algorithm progresses, trees merge.

19 Race-freedom No aliasing, only ownership transfer. can merge with only when is not in the middle of an update.

20 Deadlock-freedom Classic definition: Process P waits for a resource from Q and vice versa. Deadlock in Chorus: – has a locally enabled merge with – No other progress is possible. But one of the merges can always be carried out. (An assembly can always be killed at its ready state.) u

21 JChorus Chorus + sequential Java. Assembly classes in addition to object classes. 7: assembly Cavity { 8: action { // expand cavity 9: merge(outgoingedges, TriangleObject t): 10: { outgoingedges.remove(t); 11: frontier.add(t); 12: build(); } 13: } 14: Set members; Set border; 15: Queue frontier; // current frontier 16: List outgoingedges; // outgoing edges on which to merge 17: TriangleObject initial;...

22 Division-based implementation Division = set of assemblies mapped to a core. Local access: Merge-actions within a division Split-actions Local updates Remote access: Merge-actions issued across divisions Uses assembly-level locks.

23 Implementation strategies Adaptive divisions. Heuristic for reducing the number of remote merges. During a merge, not only the target assembly, but also assemblies reachable by k pointer indirections, are migrated. Adaptation heuristic does elementary load balancing. Union-find data structure to relate objects and assemblies that they belong to Needed for splits and merges. Token-passing for deadlock prevention and termination detection.

24 Experiments: Delaunay refinement from Lonestar benchmarks Large dataset from Lonestar benchmarks. – 100,364 triangles. – 47,768 initially bad. 1 to 8 threads. Competing approaches: – Object-level locking – DSTM (Software transactions)

25 Locality: mesh snapshots The initial mesh and divisionsMesh after several thousand retriangulations

26 Delaunay: Speedup over sequential

27 Delaunay: Self-relative speedup

28 Delaunay: Conflicts

29 Related models Threads + explicit locking: Global heap abstraction, arbitrary aliasing. Software transactions: Burden of reasoning passed to transaction manager. In most implementations, heap is viewed as global. Static data partitioning: Unpredictable nature of the computation makes static analysis hard. Actors: Based on low-level messaging. If sending references, potential of races. If copying triangles, inefficient. Pingali et al’s Galois: Same problem, but ours is an alternative.

30 More information Parallel programming with object assemblies. Roberto Lublinerman, Swarat Chaudhuri, Pavol Cerny. OOPSLA


Download ppt "Swarat Chaudhuri Penn State Roberto Lublinerman Pavol Cerny Penn State IST Austria Parallel Programming with Object Assemblies Parallel Programming with."

Similar presentations


Ads by Google