Elixir : A System for Synthesizing Concurrent Graph Programs

Presentation transcript:

Elixir: A System for Synthesizing Concurrent Graph Programs
Dimitrios Prountzos (1), Roman Manevich (2), Keshav Pingali (1)
(1) The University of Texas at Austin
(2) Ben-Gurion University of the Negev

Goal
Allow programmers to easily implement correct and efficient parallel graph algorithms
- Graph algorithms are ubiquitous: social network analysis, computer graphics, machine learning, …
- Difficult to parallelize due to their irregular nature
- Best algorithm and implementation are usually
  - platform dependent
  - input dependent
- Need to easily experiment with different solutions
Focus: fixed graph structure
- Only change labels on nodes and edges
- Each activity touches a fixed number of nodes

Example: Single-Source Shortest-Path
Problem formulation
- Compute the shortest distance from source node S to every other node
Many algorithms
- Bellman-Ford (1957)
- Dijkstra (1959)
- Chaotic relaxation (Miranker 1969)
- Delta-stepping (Meyer et al. 1998)
Common structure
- Each node has a label dist with the known shortest distance from S
- Key operation: relax-edge(u,v)
    if dist(A) + W_AC < dist(C)
        dist(C) = dist(A) + W_AC
[Figure: weighted example graph on nodes A to G, illustrating relax-edge on edge (A,C)]
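
The relax-edge operation on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the talk: the function name `relax_edge`, the dictionary representation of `dist`, and the two-node example are assumptions made here.

```python
import math


def relax_edge(dist, u, v, w):
    """Apply relax-edge(u, v) for an edge of weight w.

    Returns True if dist[v] was lowered, False otherwise.
    """
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True
    return False


# Hypothetical two-node example: source S and an edge S -> A of weight 2.
dist = {"S": 0, "A": math.inf}
changed_first = relax_edge(dist, "S", "A", 2)   # lowers dist["A"] to 2
changed_again = relax_edge(dist, "S", "A", 2)   # nothing left to improve
```

All four algorithms listed on the slide apply this same operation; they differ only in the order in which edges are relaxed.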

Dijkstra’s Algorithm
Scheduling of relaxations:
- Use a priority queue of nodes, ordered by label dist
- Iterate over nodes u in priority order
- On each step: relax all neighbors v of u
  - Apply relax-edge to all (u,v)
[Figure: example graph on nodes A to G; priority-queue entries shown during the run: <B,5> <C,3> <B,5> <B,5> <E,6> <D,7>]
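
A minimal Python sketch of this schedule, assuming non-negative edge weights and an adjacency-list dictionary. The graph `GRAPH` below is a small hypothetical example, not the graph drawn on the slide, and the stale-entry check is a common implementation detail rather than something the slide specifies.

```python
import heapq
import math


def dijkstra(graph, source):
    """graph maps each node to a list of (neighbor, weight) pairs, weights >= 0."""
    dist = {u: math.inf for u in graph}
    dist[source] = 0
    queue = [(0, source)]              # priority queue ordered by label dist
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue                   # stale entry: u already has a smaller label
        for v, w in graph[u]:          # apply relax-edge to every (u, v)
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(queue, (dist[v], v))
    return dist


# Hypothetical example graph.
GRAPH = {
    "S": [("A", 2), ("B", 5)],
    "A": [("C", 1)],
    "B": [("C", 1)],
    "C": [("B", 1)],
}
result = dijkstra(GRAPH, "S")
```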

Chaotic Relaxation
Scheduling of relaxations:
- Use an unordered set of edges
- Iterate over edges (u,v) in any order
- On each step: apply relax-edge to edge (u,v)
[Figure: example graph on nodes A to G with worklist entries (C,D) (B,C) (S,A) (C,E)]
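
A minimal Python sketch of chaotic relaxation, assuming a graph with no negative cycles. The slide only says that edges are processed in any order; re-adding the outgoing edges of a node whose label was lowered is an assumption made here so that the loop reaches a fixpoint. The function name and the example edges are hypothetical.

```python
import math


def chaotic_relaxation(nodes, edges, source):
    """edges is a list of (u, v, w) triples; they are processed in arbitrary order."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    outgoing = {n: [] for n in nodes}
    for u, v, w in edges:
        outgoing[u].append((u, v, w))
    work = set(edges)                  # unordered set of edges
    while work:
        u, v, w = work.pop()           # set.pop() removes an arbitrary element
        if dist[u] + w < dist[v]:      # relax-edge(u, v)
            dist[v] = dist[u] + w
            work.update(outgoing[v])   # edges out of v may now be relaxable
    return dist


# Hypothetical example graph.
NODES = ["S", "A", "B", "C"]
EDGES = [("S", "A", 2), ("S", "B", 5), ("A", "C", 1), ("B", "C", 1), ("C", "B", 1)]
result = chaotic_relaxation(NODES, EDGES, "S")
```

The processing order varies from run to run, but the final labels do not, because the loop only stops once no edge can be relaxed.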

Insights Behind Elixir
Parallel Graph Algorithm = Operators + Schedule
- Operators: what should be done
- Schedule: how it should be done
  - Order activity processing (unordered/ordered algorithms): Static Schedule, Dynamic Schedule
  - Identify new activities: Operator Delta
Reference: “TAO of parallelism”, PLDI 2011

Insights Behind Elixir
Parallel Graph Algorithm = Operators + Schedule
- Operators
- Schedule
  - Order activity processing: Static Schedule, Dynamic Schedule
  - Identify new activities
Dijkstra-style Algorithm:
    q = new PrQueue
    q.enqueue(SRC)
    while (! q.empty) {
      a = q.dequeue
      for each e = (a,b,w) {
        if dist(a) + w < dist(b) {
          dist(b) = dist(a) + w
          q.enqueue(b)
        }
      }
    }

Contributions
- Language
  - Operators/Schedule separation
  - Allows exploration of implementation space
- Operator Delta Inference
  - Precise Delta required for efficient fixpoint computations
- Automatic Parallelization
  - Inserts synchronization to atomically execute operators
  - Avoids data-races / deadlocks
  - Specializes parallelization based on scheduling constraints
[Diagram: Parallel Graph Algorithm = Operators + Schedule (order activity processing, identify new activities; Static Schedule, Dynamic Schedule), plus Synchronization]

SSSP in Elixir
Graph type:
    Graph [ nodes(node : Node, dist : int)
            edges(src : Node, dst : Node, wt : int) ]
Operator:
    relax = [ nodes(node a, dist ad)
              nodes(node b, dist bd)
              edges(src a, dst b, wt w)
              bd > ad + w ] ➔
            [ bd = ad + w ]
Fixpoint statement:
    sssp = iterate relax ≫ schedule

Operators
    Graph [ nodes(node : Node, dist : int)
            edges(src : Node, dst : Node, wt : int) ]
    relax = [ nodes(node a, dist ad)
              nodes(node b, dist bd)
              edges(src a, dst b, wt w)
              bd > ad + w ] ➔
            [ bd = ad + w ]
    sssp = iterate relax ≫ schedule
- Redex pattern: the nodes(...) and edges(...) clauses
- Guard: bd > ad + w
- Update: bd = ad + w
[Figure: edge (a,b) with weight w; labels ad and bd before the update, ad and ad+w after it, applied if bd > ad + w]
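
The three parts of the relax operator can be mirrored in ordinary code. The sketch below is only an illustration of the redex pattern / guard / update split, not how Elixir compiles operators. The helper names `redexes`, `guard`, `update`, and `apply_once` are hypothetical.

```python
import math


def redexes(edges):
    """Redex pattern: every edge binds two nodes a, b and a weight w."""
    yield from edges


def guard(dist, a, b, w):
    """Guard: bd > ad + w."""
    return dist[b] > dist[a] + w


def update(dist, a, b, w):
    """Update: bd = ad + w."""
    dist[b] = dist[a] + w


def apply_once(dist, edges):
    """Apply relax to the first redex whose guard holds; report whether one was found."""
    for a, b, w in redexes(edges):
        if guard(dist, a, b, w):
            update(dist, a, b, w)
            return True
    return False


# Hypothetical single-edge graph.
dist = {"a": 0, "b": math.inf}
edges = [("a", "b", 4)]
first = apply_once(dist, edges)    # guard holds, so b gets label 4
second = apply_once(dist, edges)   # no redex is active any more
```

Calling `apply_once` in a loop until it returns False corresponds to applying the operator until a fixpoint is reached.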

Fixpoint Statement
    Graph [ nodes(node : Node, dist : int)
            edges(src : Node, dst : Node, wt : int) ]
    relax = [ nodes(node a, dist ad)
              nodes(node b, dist bd)
              edges(src a, dst b, wt w)
              bd > ad + w ] ➔
            [ bd = ad + w ]
    sssp = iterate relax ≫ schedule
- iterate relax: apply operator until fixpoint
- schedule: scheduling expression

Scheduling Examples
    Graph [ nodes(node : Node, dist : int)
            edges(src : Node, dst : Node, wt : int) ]
    relax = [ nodes(node a, dist ad)
              nodes(node b, dist bd)
              edges(src a, dst b, wt w)
              bd > ad + w ] ➔
            [ bd = ad + w ]
    sssp = iterate relax ≫ schedule
Example schedules:
- Locality-enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad
- Dijkstra-style: metric ad ≫ group b
Dijkstra-style pseudocode, for comparison:
    q = new PrQueue
    q.enqueue(SRC)
    while (! q.empty) {
      a = q.dequeue
      for each e = (a,b,w) {
        if dist(a) + w < dist(b) {
          dist(b) = dist(a) + w
          q.enqueue(b)
        }
      }
    }
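
The separation of operator and schedule can be illustrated in Python by keeping the relaxation loop fixed and swapping only the worklist policy. This is a rough sketch under assumptions made here: `FifoWorklist` and `PriorityWorklist` are hypothetical classes, the priority policy loosely corresponds to ordering by `metric ad`, and the sketch does not model `group`, `unroll`, or `approx` as defined in the Elixir scheduling language.

```python
import heapq
import math
from collections import deque


class FifoWorklist:
    """Label-correcting policy: process nodes in insertion order."""
    def __init__(self):
        self.items = deque()
    def push(self, node, priority):
        self.items.append(node)        # priority is ignored
    def pop(self):
        return self.items.popleft()
    def __bool__(self):
        return bool(self.items)


class PriorityWorklist:
    """Dijkstra-style policy: process the node with the smallest label first."""
    def __init__(self):
        self.items = []
    def push(self, node, priority):
        heapq.heappush(self.items, (priority, node))
    def pop(self):
        return heapq.heappop(self.items)[1]
    def __bool__(self):
        return bool(self.items)


def sssp(graph, source, worklist):
    """Same relax operator for every policy; only the worklist differs."""
    dist = {u: math.inf for u in graph}
    dist[source] = 0
    worklist.push(source, 0)
    while worklist:
        a = worklist.pop()
        for b, w in graph[a]:
            if dist[b] > dist[a] + w:      # guard
                dist[b] = dist[a] + w      # update
                worklist.push(b, dist[b])
    return dist


# Hypothetical example graph.
GRAPH = {
    "S": [("A", 2), ("B", 5)],
    "A": [("C", 1)],
    "B": [("C", 1)],
    "C": [("B", 1)],
}
by_fifo = sssp(GRAPH, "S", FifoWorklist())
by_priority = sssp(GRAPH, "S", PriorityWorklist())
```

Both policies reach the same fixpoint; they differ in how many relaxations they perform and in what order.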

Operator Delta Inference
[Diagram: Parallel Graph Algorithm = Operators + Schedule; Schedule = order activity processing (Static Schedule, Dynamic Schedule) + identify new activities]

Identifying the Delta of an Operator
[Figure: relax1 applied to edge (a,b), with question marks on the neighboring edges]

Delta Inference Example
[Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (c,b) with weight w2]
Query program (sent to the SMT solver):
    assume (da + w1 < db)
    assume ¬(dc + w2 < db)
    db_post = da + w1
    assert ¬(dc + w2 < db_post)
Result: (c,b) does not become active
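
The query program above can be sanity-checked without an SMT solver by enumerating small integer values. Note that this swaps the SMT solver for brute-force search over a bounded range, so it is only evidence, not a proof. The function name and the value range are assumptions made for this sketch.

```python
from itertools import product


def incoming_edge_counterexample(values):
    """Search for labels and weights that violate the assertion for edge (c, b)."""
    for da, db, dc, w1, w2 in product(values, repeat=5):
        if not (da + w1 < db):
            continue                   # assume: relax1 fires on (a, b)
        if dc + w2 < db:
            continue                   # assume: (c, b) is inactive beforehand
        db_post = da + w1              # effect of relax1
        if dc + w2 < db_post:          # assertion violated: (c, b) became active
            return (da, db, dc, w1, w2)
    return None


witness = incoming_edge_counterexample(range(-3, 4))
```

No counterexample exists in this range, which matches the slide: relax1 only lowers db, so an inactive edge into b stays inactive.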

Delta Inference Example – Active
[Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (b,c) with weight w2]
Query program (sent to the SMT solver):
    assume (da + w1 < db)
    assume ¬(db + w2 < dc)
    db_post = da + w1
    assert ¬(db_post + w2 < dc)
Result: apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a
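
For the outgoing-edge query the assertion can fail, and a bounded search finds a concrete witness. As before, this sketch replaces the SMT solver with brute-force enumeration. The restriction to non-negative values and the `c_is_a` flag are assumptions made here to illustrate why the edge back to a is excluded from the delta.

```python
from itertools import product


def outgoing_edge_witness(values, c_is_a=False):
    """Search for labels and weights under which (b, c) becomes active after relax1."""
    for da, db, dc, w1, w2 in product(values, repeat=5):
        if c_is_a and dc != da:
            continue                   # c and a are the same node, so labels agree
        if not (da + w1 < db):
            continue                   # assume: relax1 fires on (a, b)
        if db + w2 < dc:
            continue                   # assume: (b, c) is inactive beforehand
        db_post = da + w1              # effect of relax1
        if db_post + w2 < dc:          # assertion violated: (b, c) became active
            return (da, db, dc, w1, w2)
    return None


# Non-negative labels and weights only (an assumption of this sketch).
witness = outgoing_edge_witness(range(5))
witness_back_edge = outgoing_edge_witness(range(5), c_is_a=True)
```

A witness exists for a general outgoing edge, so such edges must be rescheduled. With non-negative weights none exists when c is a, which is consistent with the condition c ≠ a on the slide.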

System Architecture
Algorithm Spec
  -> Elixir (synthesize code, insert synchronization)
  -> C++ Program
  -> Galois/OpenMP Parallel Runtime
       - Parallel Thread-Pool
       - Graph Implementations
       - Worklist Implementations

Experiments
Explored dimensions:
- Grouping: statically group multiple instances of operator
- Unrolling: statically unroll operator applications by factor K
- Dynamic Scheduler: choose different policy/implementation for the dynamic worklist
- ...
Compare against hand-written parallel implementations

SSSP Results
- Group + Unroll improve locality
- Platform: 24-core Intel Xeon @ 2 GHz
- Input: USA Florida road network (1 M nodes, 2.7 M edges)
[Plot: results by implementation variant]

Breadth-First Search Results
- Scale-free graph: 1 M nodes, 8 M edges
- USA road network: 24 M nodes, 58 M edges

Conclusion
- Graph algorithm = Operators + Schedule
- Elixir language: imperative operators + declarative schedule
- Allows exploring implementation space
- Automated reasoning for efficiently computing fixpoints
- Correct-by-construction parallelization
- Performance competitive with hand-parallelized code

Thank You!

Backup Slides

Related Work
- DSL synthesis: SPIRAL [Püschel et al. IEEE’05], Pochoir [Tang et al. SPAA’11], Green-Marl [Hong et al. ASPLOS’12]
- Synthesis from logical specifications: [Itzhaky et al. OOPSLA’10], [Srivastava et al. POPL’10]
- Sketching [Solar-Lezama et al. PLDI’08], Paraglide [Vechev et al. PLDI’08]
- Term and graph rewriting: PROGRES [Schürr’99], GrGen [Geiß’06], GP [Plump’09]
- Finite differencing [Paige’82]

Read paper for…
- Full scheduling language
- Parallelizing ordered iterations
  - Automatic reasoning to enable level-parallel execution
- Specialization of dynamic scheduler
- Synchronization details
- Synthesis procedures

Influence Patterns
[Figure: ways in which two edges (a,b) and (c,d) can overlap, labeled b=c, a=d, a=c, b=d, and the combinations b=d with a=c and b=c with a=d]