Elixir: A System for Synthesizing Concurrent Graph Programs
Dimitrios Prountzos (1), Roman Manevich (2), Keshav Pingali (1)
(1) The University of Texas at Austin
(2) Ben-Gurion University of the Negev
Goal
Allow programmers to easily implement correct and efficient parallel graph algorithms
- Graph algorithms are ubiquitous: social network analysis, computer graphics, machine learning, …
- They are difficult to parallelize because of their irregular nature
- The best algorithm and implementation are usually
  - platform dependent
  - input dependent
- Programmers need to experiment easily with different solutions
- Focus: fixed graph structure
  - Only the labels on nodes and edges change
  - Each activity touches a fixed number of nodes
Example: Single-Source Shortest-Path
- Problem formulation: compute the shortest distance from source node S to every other node
- Many algorithms
  - Bellman-Ford (1957)
  - Dijkstra (1959)
  - Chaotic relaxation (Miranker 1969)
  - Delta-stepping (Meyer et al. 1998)
- Common structure
  - Each node has a label dist holding the shortest distance from S known so far
  - Key operation: relax-edge(u,v)
[Figure: weighted example graph on nodes A to G, highlighting edge (A,C)]
relax-edge(A,C):
  if dist(A) + W_AC < dist(C)
    dist(C) = dist(A) + W_AC
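The relax-edge operation above can be sketched in a few lines of Python. This is only an illustration: the dictionary-based `dist` labels and the function name `relax_edge` are assumptions, not part of the talk.

```python
import math

def relax_edge(dist, u, v, w):
    """Apply relax-edge(u, v) for an edge of weight w.

    Returns True if the label of v was lowered, False otherwise.
    """
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True
    return False

# Hypothetical labels: A is the source, C has not been reached yet.
dist = {"A": 0, "C": math.inf}
changed = relax_edge(dist, "A", "C", 3)  # lowers dist(C) to dist(A) + 3
```

Returning whether the label changed is a convenience for the scheduling loops that call the operator; the slide's formulation only performs the conditional update.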
Dijkstra’s Algorithm
- Scheduling of relaxations:
  - Use a priority queue of nodes, ordered by label dist
  - Iterate over nodes u in priority order
  - On each step, relax all neighbors v of u, i.e. apply relax-edge to all edges (u,v)
[Figure: example graph on nodes A to G with the priority-queue contents at successive steps; entries shown include <C,3>, <B,5>, <E,6>, <D,7>]
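A minimal sequential sketch of this schedule, assuming an adjacency-list graph stored in a dictionary and non-negative edge weights. The example graph is hypothetical and is not the one drawn on the slide.

```python
import heapq
import math

def dijkstra(graph, source):
    """graph maps each node to a list of (neighbor, weight) pairs."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0
    queue = [(0, source)]                 # priority queue ordered by label dist
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue                      # stale entry: u already has a smaller label
        for v, w in graph[u]:             # relax every outgoing edge (u, v)
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(queue, (dist[v], v))
    return dist

# Hypothetical example graph.
example = {
    "S": [("A", 2), ("B", 5)],
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1)],
    "C": [],
}
result = dijkstra(example, "S")
```

Pushing a fresh entry instead of decreasing a key in place keeps the sketch short; stale entries are skipped when they are popped.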
Chaotic Relaxation
- Scheduling of relaxations:
  - Use an unordered set of edges
  - Iterate over edges (u,v) in any order
  - On each step, apply relax-edge to edge (u,v)
[Figure: example graph on nodes A to G with an unordered edge worklist containing (C,D), (B,C), (S,A), (C,E)]
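The unordered schedule can be illustrated with a sketch that sweeps over the edges in a shuffled order until no label changes. The sweep-based loop, the seeded shuffle, and the example edges are assumptions made for illustration; a practical implementation would more likely keep a worklist of active edges.

```python
import math
import random

def chaotic_relaxation(edges, nodes, source, seed=0):
    """edges is a list of (u, v, w) triples with non-negative weights."""
    rng = random.Random(seed)
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    changed = True
    while changed:                        # repeat until a full sweep changes nothing
        changed = False
        order = edges[:]
        rng.shuffle(order)                # any processing order is allowed
        for u, v, w in order:
            if dist[u] + w < dist[v]:     # relax-edge(u, v)
                dist[v] = dist[u] + w
                changed = True
    return dist

# Hypothetical example edges.
example_edges = [("S", "A", 2), ("S", "B", 5), ("A", "B", 1),
                 ("A", "C", 4), ("B", "C", 1)]
example_nodes = ["S", "A", "B", "C"]
```

Whatever order the shuffle produces, the loop reaches the same fixpoint; only the number of relaxations performed along the way differs.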
Insights Behind Elixir
Parallel Graph Algorithm
- Operators: what should be done
- Schedule: how it should be done
  - Unordered/ordered algorithms
  - Order activity processing: static schedule, dynamic schedule
  - Identify new activities: operator delta
[Figure legend: highlighted nodes mark an activity]
Reference: “TAO of parallelism”, PLDI 2011
Insights Behind Elixir
Parallel Graph Algorithm: Dijkstra-style algorithm
  q = new PrQueue
  q.enqueue(SRC)
  while (! q.empty) {
    a = q.dequeue
    for each e = (a,b,w) {
      if dist(a) + w < dist(b) {
        dist(b) = dist(a) + w
        q.enqueue(b)
      }
    }
  }
Parts of the code annotated on the slide:
- Operators
- Schedule
  - Order activity processing: static schedule, dynamic schedule
  - Identify new activities
Contributions
- Language
  - Operators/schedule separation
  - Allows exploration of the implementation space
- Operator delta inference
  - A precise delta is required for efficient fixpoint computations
- Automatic parallelization
  - Inserts synchronization to execute operators atomically
  - Avoids data races and deadlocks
  - Specializes the parallelization based on scheduling constraints
[Figure: Parallel Graph Algorithm = Operators + Schedule (order activity processing: static schedule, dynamic schedule; identify new activities) + Synchronization]
SSSP in Elixir
Graph type:
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
Operator:
  relax = [ nodes(node a, dist ad)
            nodes(node b, dist bd)
            edges(src a, dst b, wt w)
            bd > ad + w ] ➔
          [ bd = ad + w ]
Fixpoint statement:
  sssp = iterate relax ≫ schedule
Operators
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
  relax = [ nodes(node a, dist ad)        (redex pattern)
            nodes(node b, dist bd)        (redex pattern)
            edges(src a, dst b, wt w)     (redex pattern)
            bd > ad + w ] ➔               (guard)
          [ bd = ad + w ]                 (update)
  sssp = iterate relax ≫ schedule
[Figure: edge from a to b with weight w; if bd > ad + w, the labels (ad, bd) are rewritten to (ad, ad+w)]
Fixpoint Statement
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
  relax = [ nodes(node a, dist ad)
            nodes(node b, dist bd)
            edges(src a, dst b, wt w)
            bd > ad + w ] ➔
          [ bd = ad + w ]
  sssp = iterate relax ≫ schedule
- iterate: apply the operator until a fixpoint is reached
- schedule: the scheduling expression
Scheduling Examples
Hand-written Dijkstra-style loop:
  q = new PrQueue
  q.enqueue(SRC)
  while (! q.empty) {
    a = q.dequeue
    for each e = (a,b,w) {
      if dist(a) + w < dist(b) {
        dist(b) = dist(a) + w
        q.enqueue(b)
      }
    }
  }
Elixir specification:
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
  relax = [ nodes(node a, dist ad)
            nodes(node b, dist bd)
            edges(src a, dst b, wt w)
            bd > ad + w ] ➔
          [ bd = ad + w ]
  sssp = iterate relax ≫ schedule
Choices for schedule:
- Locality-enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad
- Dijkstra-style: metric ad ≫ group b
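The separation of operator and schedule can be imitated in ordinary code by keeping the relax guard and update fixed while swapping the worklist policy. The sketch below is an analogy only: the classes `FifoSchedule` and `PrioritySchedule` and the function `sssp` are hypothetical names, and they do not reproduce the semantics of the Elixir scheduling operators group, unroll, or approx metric.

```python
import heapq
import math
from collections import deque

class FifoSchedule:
    """Label-correcting style: process nodes in arrival order."""
    def __init__(self):
        self.items = deque()
    def push(self, node, priority):
        self.items.append(node)           # priority is ignored
    def pop(self):
        return self.items.popleft()
    def __bool__(self):
        return bool(self.items)

class PrioritySchedule:
    """Dijkstra style: process the node with the smallest label first."""
    def __init__(self):
        self.heap = []
    def push(self, node, priority):
        heapq.heappush(self.heap, (priority, node))
    def pop(self):
        return heapq.heappop(self.heap)[1]
    def __bool__(self):
        return bool(self.heap)

def sssp(graph, source, schedule):
    """Apply relax until fixpoint; the schedule decides the processing order."""
    dist = {n: math.inf for n in graph}
    dist[source] = 0
    schedule.push(source, 0)
    while schedule:
        a = schedule.pop()
        for b, w in graph[a]:
            if dist[b] > dist[a] + w:     # guard
                dist[b] = dist[a] + w     # update
                schedule.push(b, dist[b]) # b may enable further relaxations
    return dist

# Hypothetical example graph.
example = {
    "S": [("A", 2), ("B", 5)],
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1)],
    "C": [],
}
fifo_result = sssp(example, "S", FifoSchedule())
priority_result = sssp(example, "S", PrioritySchedule())
```

Both schedules reach the same distances because the operator is unchanged; they differ only in how many relaxations are attempted and in what order.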
Operator Delta Inference
Parallel Graph Algorithm
- Operators
- Schedule
  - Order activity processing: static schedule, dynamic schedule
  - Identify new activities
Identifying the Delta of an Operator
[Figure: relax1 applied on nodes a and b, with question marks on the surrounding edges]
Delta Inference Example
[Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (c,b) with weight w2]
Query program, passed to an SMT solver:
  assume (da + w1 < db)
  assume ¬(dc + w2 < db)
  db_post = da + w1
  assert ¬(dc + w2 < db_post)
Result: (c,b) does not become active
Delta Inference Example – Active
[Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (b,c) with weight w2]
Query program, passed to an SMT solver:
  assume (da + w1 < db)
  assume ¬(db + w2 < dc)
  db_post = da + w1
  assert ¬(db_post + w2 < dc)
Result: apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a
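The two query programs (incoming edge (c,b) on the previous slide, outgoing edge (b,c) on this one) can be explored without a solver by exhaustively testing small integer values. This is a bounded search used here in place of the SMT solver named on the slides, so finding no counterexample is not a proof; the function `find_counterexample` and the tuple encoding of a query are hypothetical. The labels da and dc are independent variables, which models the case c ≠ a.

```python
from itertools import product

def find_counterexample(assumptions, post, assertion, bound=4):
    """Return values that satisfy all assumptions but violate the assertion, or None."""
    for da, db, dc, w1, w2 in product(range(bound), repeat=5):
        env = {"da": da, "db": db, "dc": dc, "w1": w1, "w2": w2}
        if all(assume(env) for assume in assumptions):
            env["db_post"] = post(env)    # label of b after relax1
            if not assertion(env):
                return env
    return None

def relax1_post(e):
    return e["da"] + e["w1"]

# Incoming edge (c, b): can relax2 become enabled after relax1 lowers db?
incoming = (
    [lambda e: e["da"] + e["w1"] < e["db"],
     lambda e: not (e["dc"] + e["w2"] < e["db"])],
    relax1_post,
    lambda e: not (e["dc"] + e["w2"] < e["db_post"]),
)

# Outgoing edge (b, c): can relax2 become enabled after relax1 lowers db?
outgoing = (
    [lambda e: e["da"] + e["w1"] < e["db"],
     lambda e: not (e["db"] + e["w2"] < e["dc"])],
    relax1_post,
    lambda e: not (e["db_post"] + e["w2"] < e["dc"]),
)

incoming_cex = find_counterexample(*incoming)   # no violation within the bound
outgoing_cex = find_counterexample(*outgoing)   # a witness that (b, c) can become active
```

A failed assertion corresponds to an edge that may become active and therefore belongs to the operator delta; an assertion that holds lets the synthesized code skip that edge.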
System Architecture
Algorithm spec → Elixir → C++ program
- Elixir
  - Synthesize code
  - Insert synchronization
- Galois/OpenMP parallel runtime
  - Parallel thread pool
  - Graph implementations
  - Worklist implementations
Experiments
- Explored dimensions
  - Grouping: statically group multiple instances of an operator
  - Unrolling: statically unroll operator applications by a factor K
  - Dynamic scheduler: choose a different policy/implementation for the dynamic worklist
  - ...
- Compare against hand-written parallel implementations
SSSP Results
- Machine: 24-core Intel Xeon @ 2 GHz
- Input: USA Florida road network (1 M nodes, 2.7 M edges)
- Group + Unroll improve locality
[Figure: results plotted per implementation variant]
Breadth-First Search Results
[Figure: two result panels]
- Scale-free graph: 1 M nodes, 8 M edges
- USA road network: 24 M nodes, 58 M edges
Conclusion
- Graph algorithm = Operators + Schedule
- Elixir language: imperative operators + declarative schedule
  - Allows exploring the implementation space
- Automated reasoning for efficiently computing fixpoints
- Correct-by-construction parallelization
- Performance competitive with hand-parallelized code
Thank You!
Backup Slides
Related Work
- DSL synthesis
  - SPIRAL [Püschel et al. IEEE’05], Pochoir [Tang et al. SPAA’11], Green-Marl [Hong et al. ASPLOS’12]
- Synthesis from logical specifications
  - [Itzhaky et al. OOPSLA’10], [Srivastava et al. POPL’10]
  - Sketching [Solar-Lezama et al. PLDI’08], Paraglide [Vechev et al. PLDI’08]
- Term and graph rewriting
  - PROGRES [Schürr’99], GrGen [Geiß’06], GP [Plump’09]
- Finite differencing [Paige’82]
Read paper for…
- Full scheduling language
- Parallelizing ordered iterations
  - Automatic reasoning to enable level-parallel execution
- Specialization of dynamic scheduler
- Synchronization details
- Synthesis procedures
Influence Patterns
[Figure: graph patterns over nodes a, b, c, d, each labeled with node equalities such as b=c, a=d, a=c, b=d and their combinations]