A Shape Analysis for Optimizing Parallel Graph Programs
Dimitrios Prountzos (1), Keshav Pingali (1,2), Roman Manevich (2), Kathryn S. McKinley (1)
1: Department of Computer Science, The University of Texas at Austin
2: Institute for Computational Engineering and Sciences, The University of Texas at Austin

Motivation
Graph algorithms are ubiquitous
  – Computational biology
  – Social networks
  – Computer graphics
Goal: compiler analysis for optimization of parallel graph algorithms

Organization
Parallelization of graph algorithms in the Galois system
  – Speculative execution
  – Example: Boruvka MST algorithm
Optimization opportunities
  – Reduce speculation overheads
  – Analysis problem: lockset shape analysis
Lockset shape analysis
  – Abstract Data Type (ADT) modeling
  – Hierarchy summarization abstraction
  – Predicate discovery
Evaluation
  – Fast and infers all available optimizations
  – Optimizations give speedups of up to 12x

Boruvka’s Minimum Spanning Tree Algorithm
Build MST bottom-up:
    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
[Figure: example graph on nodes a to g, before and after merging node a with its lightest neighbor lt]
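The loop above can be sketched sequentially in plain Java. This is only an illustration under stated assumptions, not the Galois code: the class name `BoruvkaSketch`, the integer node ids, and the adjacency-map representation are all hypothetical. Merging keeps the lightest of any parallel edges that contraction creates, and the method returns the total weight of the edges added to the MST.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sequential sketch of the contraction-based MST loop.
class BoruvkaSketch {
    // node -> (neighbor -> weight of the lightest edge between the two)
    final Map<Integer, Map<Integer, Integer>> adj = new HashMap<>();

    void addEdge(int u, int v, int w) {
        adj.computeIfAbsent(u, k -> new HashMap<>()).merge(v, w, Math::min);
        adj.computeIfAbsent(v, k -> new HashMap<>()).merge(u, w, Math::min);
    }

    // Contracts the graph in place and returns the total weight of the chosen edges.
    int mstWeight() {
        Deque<Integer> wl = new ArrayDeque<>(adj.keySet());
        int total = 0;
        while (!wl.isEmpty()) {
            Integer a = wl.poll();                         // pick a worklist node
            Map<Integer, Integer> aN = adj.get(a);
            if (aN == null || aN.isEmpty()) continue;      // merged away, or no edges left
            Integer lt = null;
            int minW = 0;
            for (Map.Entry<Integer, Integer> e : aN.entrySet()) {   // find lightest neighbor
                if (lt == null || e.getValue() < minW) {
                    lt = e.getKey();
                    minW = e.getValue();
                }
            }
            aN.remove(lt);                                 // remove edge (a, lt)
            Map<Integer, Integer> ltN = adj.remove(lt);    // remove node lt
            ltN.remove(a);
            for (Map.Entry<Integer, Integer> e : ltN.entrySet()) {  // re-attach lt's neighbors to a
                Map<Integer, Integer> nN = adj.get(e.getKey());
                nN.remove(lt);
                nN.merge(a, e.getValue(), Math::min);
                aN.merge(e.getKey(), e.getValue(), Math::min);
            }
            total += minW;                                 // edge a-lt joins the MST
            wl.add(a);
        }
        return total;
    }

    public static void main(String[] args) {
        BoruvkaSketch g = new BoruvkaSketch();
        g.addEdge(0, 1, 1);
        g.addEdge(1, 2, 2);
        g.addEdge(0, 2, 3);
        g.addEdge(2, 3, 4);
        System.out.println(g.mstWeight());
    }
}
```

The worklist may still contain nodes that were merged away; the sketch simply skips them when they are polled.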

Parallelism in Boruvka
Algorithm = repeated application of operator to graph
  – Active node: node where computation is needed
  – Activity: application of operator to active node
  – Neighborhood: sub-graph read/written to perform activity
  – Unordered algorithms: active nodes can be processed in any order
Amorphous data-parallelism
  – Parallel execution of activities, subject to neighborhood constraints
Neighborhoods are functions of runtime values
  – Parallelism cannot be uncovered at compile time in general
[Figure: graph with activities i1, i2, i3 and their neighborhoods]

Optimistic Parallelization in Galois
Programming model
  – Client code has sequential semantics
  – Library of concurrent data structures
Parallel execution model
  – Thread-level speculation (TLS)
  – Activities executed speculatively
Conflict detection
  – Each node/edge has an associated exclusive lock
  – Graph operations acquire locks on read/written nodes/edges
  – Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
  – All locks released at the end
Two main overheads
  – Locking
  – Undo actions
[Figure: graph with activities i1, i2, i3]
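A minimal sketch of the conflict-detection scheme just described, assuming a hypothetical design in which each node stores its owning iteration in an `AtomicReference` and each iteration keeps an undo log. None of these names (`SpeculationSketch`, `Iteration`, `Conflict`) come from the Galois library; they only illustrate where the two overheads, locking and undo actions, arise.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of exclusive locks plus rollback; not the Galois runtime.
class SpeculationSketch {
    static class Conflict extends RuntimeException {}

    static class Node {
        final AtomicReference<Iteration> owner = new AtomicReference<>(); // exclusive lock
        int data;
    }

    static class Iteration {
        private final List<Node> locks = new ArrayList<>();
        private final Deque<Runnable> undoLog = new ArrayDeque<>();

        // Locking overhead: every access first acquires the node's lock.
        void acquire(Node n) {
            if (n.owner.get() == this) return;                    // already held by us
            if (!n.owner.compareAndSet(null, this)) throw new Conflict();
            locks.add(n);
        }

        // Undo overhead: every write first records how to restore the old value.
        void write(Node n, int value) {
            acquire(n);
            int old = n.data;
            undoLog.push(() -> n.data = old);
            n.data = value;
        }

        void commit() {
            undoLog.clear();
            release();
        }

        void rollback() {
            while (!undoLog.isEmpty()) undoLog.pop().run();       // newest action first
            release();
        }

        private void release() {                                  // all locks released at the end
            for (Node n : locks) n.owner.set(null);
            locks.clear();
        }
    }

    // Single-threaded demo: i2 conflicts with i1 on y and must restore x.
    static boolean demo() {
        Node x = new Node(), y = new Node();
        Iteration i1 = new Iteration(), i2 = new Iteration();
        i1.acquire(y);
        try {
            i2.write(x, 42);
            i2.write(y, 7);                                       // y is owned by i1
            i2.commit();
            return false;
        } catch (Conflict c) {
            i2.rollback();
        }
        boolean restored = x.data == 0 && x.owner.get() == null;
        i1.commit();
        return restored && y.owner.get() == null;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The demo interleaves two iterations on one thread purely to keep the example deterministic; a real runtime would run them on separate threads.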

Overheads (I): Locking
Optimizations
  – Redundant locking elimination
  – Lock removal for iteration-private data
  – Lock removal for lock domination
ACQ(P): set of definitely acquired locks per program point P
Given method call M at P: Locks(M) ⊆ ACQ(P) ⇒ redundant locking
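The redundancy condition can be illustrated on a straight-line sequence of method calls. The sketch below assumes that each call's lock set Locks(M) is already known and is given as a set of strings; the class name and the lock names are hypothetical, and the real analysis computes ACQ(P) over abstract heap states rather than over concrete names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the check Locks(M) ⊆ ACQ(P) on straight-line code.
class RedundantLockingSketch {
    // locksPerCall.get(i) plays the role of Locks(M_i) for the i-th call.
    static List<Boolean> redundant(List<Set<String>> locksPerCall) {
        Set<String> acq = new HashSet<>();            // ACQ(P) before the current call
        List<Boolean> result = new ArrayList<>();
        for (Set<String> locks : locksPerCall) {
            result.add(acq.containsAll(locks));       // Locks(M) ⊆ ACQ(P)
            acq.addAll(locks);                        // locks are held until the iteration ends
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(redundant(List.of(
                Set.of("a", "nghbrs(a)"), Set.of("a"), Set.of("lt"))));
    }
}
```

A call flagged `true` could skip lock acquisition, because every lock it needs is definitely held already.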

Overheads (II): Undo actions
    foreach (Node a : wl) {
      …          (lockset grows)
      Failsafe point
      …          (lockset stable)
    }
Program point P is failsafe if: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
Boruvka MST loop:
    foreach (Node a : wl) {
      Set<Node> aNghbrs = g.neighbors(a);
      Node lt = null;
      for (Node n : aNghbrs) {
        minW,lt = minWeightEdge((a,lt), (a,n));
      }
      g.removeEdge(a, lt);
      Set<Node> ltNghbrs = g.neighbors(lt);
      for (Node n : ltNghbrs) {
        Edge e = g.getEdge(lt, n);
        Weight w = g.getEdgeData(e);
        Edge an = g.getEdge(a, n);
        if (an != null) {
          Weight wan = g.getEdgeData(an);
          if (wan.compareTo(w) < 0)
            w = wan;
          g.setEdgeData(an, w);
        } else {
          g.addEdge(a, n, w);
        }
      }
      g.removeNode(lt);
      mst.add(minW);
      wl.add(a);
    }
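The failsafe condition can be illustrated on a straight-line sequence of calls, where Reaches(P,Q) simply means that Q comes at or after P. The sketch assumes each call's lock set is given as a set of strings; the class name and lock names are hypothetical. On such code the earliest failsafe point is the position right after the last call that acquires a new lock.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of failsafe points on straight-line code.
class FailsafeSketch {
    // Returns the smallest p such that every call at position q >= p needs only locks
    // acquired by calls 0..p-1. The result equals locksPerCall.size() when the last
    // call still acquires a new lock (only the end of the iteration is failsafe).
    static int firstFailsafePoint(List<Set<String>> locksPerCall) {
        Set<String> acq = new HashSet<>();            // ACQ before the current call
        int lastGrowing = -1;                         // last call that grows the lockset
        for (int i = 0; i < locksPerCall.size(); i++) {
            Set<String> locks = locksPerCall.get(i);
            if (!acq.containsAll(locks)) lastGrowing = i;
            acq.addAll(locks);
        }
        return lastGrowing + 1;
    }

    public static void main(String[] args) {
        System.out.println(firstFailsafePoint(List.of(
                Set.of("a"), Set.of("a", "n1"), Set.of("lt"), Set.of("a"), Set.of("lt"))));
    }
}
```

Calls at or after the returned position would not need to record undo actions, since no later call can trigger a conflict.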

Lockset Analysis
    GSet<Node> wl = new GSet<Node>();
    wl.addAll(g.getNodes());
    GBag<Weight> mst = new GBag<Weight>();
    foreach (Node a : wl) { … }    (loop body as listed under "Overheads (II): Undo actions")
Redundant locking: Locks(M) ⊆ ACQ(P)
Undo elimination: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
Need to compute ACQ(P)

Analysis Challenges
The usual suspects:
  – Unbounded memory ⇒ undecidability
  – Aliasing, destructive updates
Specific challenges:
  – Complex ADTs: unstructured graphs
  – Heap objects are locked
  – Adapt abstraction to ADTs
We use Abstract Interpretation [CC’77]
  – Balance precision and realistic performance

Shape Analysis Overview
[Figure: analysis pipeline. ADT specifications (Graph Spec, Set Spec) stand in for the concrete ADT implementations in the Galois library (HashMap-Graph, Tree-based Set, …). Predicate discovery and shape analysis take the specifications and Boruvka.java as input and produce Optimized Boruvka.java.]

ADT Specification
Graph Spec:
    Graph … {
      … set … nodes
      … set … edges
      … + n.rev(src) + n.rev(src).dst + n.rev(dst) + …
      …( nghbrs = n.rev(src).dst + n.rev(dst).src,
         ret = new Set<…>(cont=nghbrs) )
      Set neighbors(Node n);
    }
Boruvka.java:
    Set S1 = g.neighbors(n);
    ...
Abstract ADT state by virtual set
Assumption: implementation satisfies spec

Modeling ADTs
[Figure: concrete graph on nodes a, b, c, whose edge objects point to their endpoints through src and dst, next to the abstract state: the Graph object with its nodes and edges sets, and the set ret returned by neighbors, whose cont field holds nghbrs]
Graph Spec: as on the "ADT Specification" slide
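The path expression nghbrs = n.rev(src).dst + n.rev(dst).src can be read operationally: n.rev(src) is the set of edges whose src field points to n, and following dst from those edges gives the out-neighbors, and symmetrically for rev(dst).src. The sketch below evaluates that expression over an explicit edge list. The `Edge` class and string node names are assumptions made for the example, not the specification language itself.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical evaluation of nghbrs = n.rev(src).dst + n.rev(dst).src.
class NeighborsSpecSketch {
    static final class Edge {
        final String src, dst;
        Edge(String src, String dst) { this.src = src; this.dst = dst; }
    }

    static Set<String> neighbors(List<Edge> edges, String n) {
        Set<String> nghbrs = new HashSet<>();
        for (Edge e : edges) {
            if (e.src.equals(n)) nghbrs.add(e.dst);   // e in n.rev(src), contributes e.dst
            if (e.dst.equals(n)) nghbrs.add(e.src);   // e in n.rev(dst), contributes e.src
        }
        return nghbrs;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge("a", "b"), new Edge("c", "a"), new Edge("b", "c"));
        System.out.println(neighbors(edges, "a"));
    }
}
```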

Abstraction Scheme
[Figure: sets S1 and S2 with their cont fields, concrete state and its abstraction using L(S1.cont) and L(S2.cont)]
    (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)
Parameterized by set of LockPaths: L(Path) ≜ ∀o. o ∊ Path ⇒ Locked(o)
  – Tracks subset of must-be-locked objects
Abstract domain elements have the form: Aliasing-configs → 2^LockPaths …

Joining Abstract States
Aliasing is crucial for precision
May-be-locked does not enable our optimizations
#Aliasing-configs: small constant (≤ 6)

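Because only must-be-locked information is useful, a join that keeps a separate set of lock paths per aliasing configuration would intersect the sets for configurations present in both inputs. The sketch below assumes an abstract state is a map from an aliasing-configuration label to a set of lock-path strings; this encoding and the class name are hypothetical and much simpler than the TVLA-based domain.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical join over states of the form Aliasing-config -> set of lock paths.
class LocksetJoinSketch {
    static Map<String, Set<String>> join(Map<String, Set<String>> s1,
                                         Map<String, Set<String>> s2) {
        Map<String, Set<String>> out = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : s1.entrySet()) {
            out.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        for (Map.Entry<String, Set<String>> e : s2.entrySet()) {
            Set<String> cur = out.get(e.getKey());
            if (cur == null) {
                out.put(e.getKey(), new HashSet<>(e.getValue()));  // config seen on one side only
            } else {
                cur.retainAll(e.getValue());                       // keep must-be-locked paths only
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(
                Map.of("S1!=S2", Set.of("L(S1.cont)", "L(S2.cont)")),
                Map.of("S1!=S2", Set.of("L(S1.cont)"), "S1==S2", Set.of("L(S1.cont)"))));
    }
}
```

Keeping the configurations apart is what preserves precision here: merging them into one set would force an intersection across different aliasing situations.
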
Example Invariant in Boruvka
[Code: Boruvka MST loop, as listed under "Overheads (II): Undo actions"]
The immediate neighbors of a and lt are locked:
    (a ≠ lt) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src)
             ∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src)) ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst) …

Heuristics for Finding Paths
Hierarchy Summarization (HS)
  – x.( fld )*
  – Type hierarchy graph acyclic ⇒ bounded number of paths
  – Preflow-Push: L(S.cont) ∧ L(S.cont.nd)
    Nodes in set S and their data are locked
[Figure: type hierarchy Set S → Node (field cont) → NodeData (field nd)]
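The boundedness argument can be made concrete by enumerating every path of the form x.(fld)* over a type hierarchy graph. The sketch assumes the graph is given as a map from a type name to its fields and their target types; the representation and class name are hypothetical. The recursion terminates only because the graph is assumed acyclic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical enumeration of candidate lock paths x.(fld)* over an acyclic type hierarchy.
class HierarchyPathsSketch {
    // fields.get(T) maps each field name of type T to the type that field points to.
    static List<String> paths(String var, String type,
                              Map<String, Map<String, String>> fields) {
        List<String> out = new ArrayList<>();
        collect(var, type, fields, out);
        return out;
    }

    private static void collect(String path, String type,
                                Map<String, Map<String, String>> fields, List<String> out) {
        out.add(path);
        Map<String, String> fs = fields.getOrDefault(type, Map.<String, String>of());
        for (Map.Entry<String, String> f : fs.entrySet()) {
            collect(path + "." + f.getKey(), f.getValue(), fields, out);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> fields = Map.of(
                "Set", Map.of("cont", "Node"),
                "Node", Map.of("nd", "NodeData"));
        System.out.println(paths("S", "Set", fields));
    }
}
```

For the Set, Node, NodeData hierarchy shown on the slide this yields the paths S, S.cont and S.cont.nd, which are the candidates behind the predicates L(S.cont) and L(S.cont.nd).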

Footprint Graph Heuristic
Footprint Graphs (FG) [Calcagno et al. SAS’07]
  – All acyclic paths from arguments of ADT method to locked objects
  – x.( fld | rev(fld) )*
  – Delaunay Refinement: L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst)) ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)
  – Nodes in set S and all of their immediate neighbors are locked
Composition of HS, FG
  – Preflow-Push: L(a.rev(src).ed)
[Figure: a path composed of an FG segment followed by an HS segment]

Experimental Evaluation
Implemented on top of TVLA
  – Encode abstraction by 3-Valued Shape Analysis [SRW TOPLAS’02]
Evaluation on 4 Lonestar Java benchmarks
Inferred all available optimizations
Number of abstract states practically linear in program size
    Benchmark                   Analysis Time (sec)
    Boruvka MST                 6
    Preflow-Push Maxflow        7
    Survey Propagation          12
    Delaunay Mesh Refinement    16

Impact of Optimizations for 8 Threads
[Figure: chart of the impact of the optimizations; machine: 8-core Intel, 3.00 GHz]

Related Work
Safe programmable speculative parallelism [Prabhu et al. PLDI’10]
  – Focused on value speculation on ordered algorithms
  – Different rollback freedom condition
Transactional Memory compiler optimizations [Harris et al. PLDI’06, Dragojevic et al. SPAA’09]
  – Similar optimizations
  – Don’t target rollback freedom
  – Imprecise for unbounded data structures
Optimizations for parallel graph programs [Mendez-Lojo et al. PPOPP’10]
  – Manual optimizations
  – Failsafe subsumes cautious
Verifying conformance of ADT implementation to specification
  – The Jahob project (Kuncak, Rinard, Wies et al.)

Conclusion
New application for static analysis
  – Optimization of optimistically parallelized graph programs
Novel shape analysis
  – Utilize observations on the structure of concrete states and programming style
Enables optimizations crucial for performance

Thank You!

Backup

Outline of Boruvka MST Code
    GSet<Node> wl = new GSet<Node>();
    wl.addAll(g.getNodes());
    GBag<Weight> mst = new GBag<Weight>();
    // Pick arbitrary worklist node
    foreach (Node a : wl) {
      // Find lightest neighbor
      Set<Node> aNghbrs = g.neighbors(a);
      Node lt = null;
      for (Node n : aNghbrs) {
        minW,lt = minWeightEdge((a,lt), (a,n));
      }
      g.removeEdge(a, lt);
      // Update neighbors of lightest
      Set<Node> ltNghbrs = g.neighbors(lt);
      for (Node n : ltNghbrs) {
        Edge e = g.getEdge(lt, n);
        Weight w = g.getEdgeData(e);
        Edge an = g.getEdge(a, n);
        if (an != null) {
          Weight wan = g.getEdgeData(an);
          if (wan.compareTo(w) < 0)
            w = wan;
          g.setEdgeData(an, w);
        } else {
          g.addEdge(a, n, w);
        }
      }
      g.removeNode(lt);
      // Update worklist and MST
      mst.add(minW);
      wl.add(a);
    }

Approximating Sets of Locked Objects: Reachability-based Scheme
[Figure: sets S1 and S2 with their cont fields, concrete state and its abstraction using Reach(S1.cont) and Reach(S2.cont)]

Approximating Sets of Locked Objects: Hierarchy Summarization Scheme
[Figure: sets S1 and S2 with their cont fields, concrete state and its abstraction using L(S1.cont) and L(S2.cont)]
    (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)

Enabling Optimizations in Galois
Graph API uses flags to enable/disable locking and storing undo actions
  – removeEdge(Node src, Node dst, Flag f);
Challenge: find minimal flag per ADT method call
Solution: lockset analysis
[Figure: flags NONE, LOCKS, UNDO, ALL; example call addEdge(src, dst); acquires Lock(src), Lock(dst), Lock( (src,dst) )]
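Once the analysis has decided, for a given call, whether its locks are redundant and whether it lies after a failsafe point, choosing the flag is a small case split. The sketch below is a guess at that last step: the enum values mirror the labels NONE, LOCKS, UNDO and ALL that appear on the slide, but their exact meaning in the Galois API, the class name, and the method `minimalFlag` are assumptions.

```java
// Hypothetical flag selection; flag semantics are assumed, not taken from the Galois API.
class FlagSketch {
    enum Flag { NONE, LOCKS, UNDO, ALL }

    // locksRedundant: Locks(M) ⊆ ACQ(P) holds at the call.
    // afterFailsafePoint: the call is at or after a failsafe program point.
    static Flag minimalFlag(boolean locksRedundant, boolean afterFailsafePoint) {
        boolean needLocks = !locksRedundant;       // must still acquire locks
        boolean needUndo = !afterFailsafePoint;    // must still record undo actions
        if (needLocks && needUndo) return Flag.ALL;
        if (needLocks) return Flag.LOCKS;
        if (needUndo) return Flag.UNDO;
        return Flag.NONE;
    }

    public static void main(String[] args) {
        System.out.println(minimalFlag(true, false));
    }
}
```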

Optimization Conditions

Speculation Overheads and Optimizations
    Source of Overhead                    Optimization
    Locking shared objects                Redundant locking elimination
                                          Lock elision for iteration-private data
                                          Lock domination
    Backup original state for rollback    Avoid backups after failsafe points

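Modeling ADTs
Graph Spec:
    Graph … {
      … set … ns; // …
      … set … es; // …
      … + n.rev(src) + n.rev(src).dst + n.rev(dst) + …
      …( nghbrs = n.rev(src).dst + n.rev(dst).src,
         ret = new Set<…>(cont=nghbrs) )
      Set neighbors(Node n);
    }
[Figure: concrete graph on nodes a, b, c, d and its abstract state: the Graph object with its sets ns and es, an edge between a and b with fields src, dst and edge data ed (value 5), and the returned set ret whose cont field holds nghbrs]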

Failsafe Points – Eliminating Undo Actions
[Figure: example graph on nodes a, b, c, d with active node a and lightest neighbor lt; legend: graph node, graph edge, edge data]
[Code: Boruvka MST loop, as listed under "Overheads (II): Undo actions"; highlighted call: g.neighbors(lt);]
CAUTIOUS OPERATOR [Mendez et al. PPOPP’10]

Boruvka’s Minimum Spanning Tree Algorithm
Build MST bottom-up:
    repeat {
      pick arbitrary active node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
[Figure: example graph on nodes a to g]

Parallelism in Boruvka’s Algorithm
Dependences between activities are functions of runtime values
Parallelism cannot be uncovered at compile time in general
Don’t-care non-determinism
  – All produced MSTs correct and optimal
[Figure: example graph on nodes a to g]

Failsafe Points – Eliminating Undo Actions
[Figure: example graph on nodes a, b, c, d with active node a and lightest neighbor lt; legend: graph node, graph edge, edge data, acquired, new lock, redundant]
[Code: Boruvka MST loop, as listed under "Overheads (II): Undo actions"; highlighted call: g.neighbors(lt);]
CAUTIOUS OPERATOR [Mendez et al. PPOPP’10]

Redundant Locking Example
[Figure: example graph on nodes a, b, c, d with active node a and lightest neighbor lt; legend: graph node, graph edge, edge data, acquired, new lock, redundant]
[Code: Boruvka MST loop, as listed under "Overheads (II): Undo actions"]

Boruvka’s Minimum Spanning Tree Algorithm
Build MST iteratively
  – Pick random active node
  – Contract edge with lightest neighbor
[Figure: example graph on nodes a to g]
[Code: Boruvka MST loop, as listed under "Overheads (II): Undo actions"]

Abstraction of a Single State
[Figure: example graph on nodes a, b, c, d with a and lt marked; state after the first loop in Boruvka]
    We maintain                                            We lose
    Set of definitely locked objects denoted by lockpaths  Maybe-locked information
    Aliasing of top-level variables                        Cardinality of sets
    Uniquely pointed-to types and objects referenced       Content sharing of multiple collections
      from the stack

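Hierarchy Summarization Intuition
[Figure: type hierarchy graph over Graph, Set, Node, Edge, Weight, Iterator, GSet, GBag and Void, with field edges ns, es, src, dst, cont, ed, nd, past, at, future, all, gcont and bcont. Program variables are attached to their types: g; aNghbrs, ltNghbrs; nIter; a, lt, n; w, wan, minW; e, an; wl; mst.]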