Hypergraph-Partitioning Approaches for Workload Decomposition


1 Hypergraph-Partitioning Approaches for Workload Decomposition
Ümit V. Çatalyürek (Department of Biomedical Informatics, The Ohio State University) and Cevdet Aykanat (Department of Computer Engineering, Bilkent University)

2 What Do We Mean by Decomposition?
- Decomposition (partitioning) of a computation into smaller units of work or work groups = workload partitioning = workload assignment (but not mapping)
- Divide the work and data for efficient parallel computation
CSCAPES Seminar - March 1st, 2007

3 Outline
- Partitioning-based decomposition models / why hypergraphs
  - Standard graph model for SpMxV
  - Hypergraph models for 1D decomposition for SpMxV
  - Fine-grain hypergraph model for 2D decomposition
- Proposed two-phase coarse-grain decomposition for both graph and hypergraph models
- Application: SpMxV
  - Coarse-grain (checkerboard) decomposition of SpMxV
- SpMxV: experimental results
- Conclusion

4 What People Have Done: Existing Graph Models
- Standard graph model
- Multi-constraint graph partitioning
- Skewed partitioning
- Bipartite graph model
- Are they any good? They might be sufficient for some cases, but not for all

5 Why They Are Not Sufficient
- Flaws
  - Wrong cost metric for communication volume
  - Latency: the number of messages is also important
  - Minimize the maximum volume and/or number of messages
  - Processor distance: number of switches, etc.
  - All of the above? Some of the above?
- Limitations
  - The standard graph model can express only symmetric dependencies
    - Workaround for a directed graph: convert to an undirected graph and weight the edges
  - Symmetric = identical partitioning of input and output data
  - Multiple computation phases (a solution: multi-constraint partitioning; we developed a multi-constraint hypergraph partitioner)
- Notes
  - Source: Bruce Hendrickson's shortcomings of standard graph partitioning approaches
  - Blue text: our work helps with these issues
  - Light blue: no explicit minimization, but we put an upper bound on the number of messages
  - Directed graph, convert to undirected graph, weighting: convert directed edges to undirected edges; give weight 1 to a one-way dependency and weight 2 to a two-way dependency
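The directed-to-undirected conversion in the notes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `undirected_weights` and the three-arc dependency list are hypothetical, and self-dependencies are assumed to need no edge.

```python
from collections import defaultdict

def undirected_weights(directed_edges):
    """Collapse directed dependencies (i -> j) into weighted undirected edges.

    A pair linked in one direction gets weight 1; a pair linked in both
    directions gets weight 2.
    """
    directions = defaultdict(set)
    for i, j in directed_edges:
        if i == j:
            continue  # assumption: a self-dependency needs no edge
        key = (min(i, j), max(i, j))
        directions[key].add((i, j))  # a set, so duplicate arcs count once
    return {key: len(arcs) for key, arcs in directions.items()}

# Hypothetical dependency list: 0 <-> 1 is two-way, 1 -> 2 is one-way.
weights = undirected_weights([(0, 1), (1, 0), (1, 2)])
print(weights)
```

The weighting keeps the edge-cut cost proportional to the number of words exchanged across a cut edge, but it does not remove the requirement that input and output data be partitioned identically.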

6 Hypergraph Models
- We proposed the use of hypergraphs for workload partitioning (PhD thesis, Bilkent'99)
  - Coarser-grain: owner computes (mapping [HiPC'95], LP decomposition [Para'95], SpMxV [Irr'96, TPDS'99])
    - 1D partition in SpMxV
  - Fine-grain: assign each operation between input and output [Irr'01, PPSC'01]
    - 2D fine-grain decomposition in SpMxV ($y_i = a_{ij} x_j$)
  - Coarse-grain: 2D checkerboard partitioning in SpMxV
- Advantages
  - Correct cost for communication volume
  - Naturally handles asymmetry
  - Practical to use: public tools, good tools
  - Ensures a better upper bound on the number of messages

7 Application: Parallel Matrix-Vector Multiplication y = Ax
- Parallel iterative solvers
- 1D rowwise or columnwise partitioning of A
- Symmetric partitioning: processor $P_k$ computes linear vector operations on the k-th blocks of the vectors
- Rowwise: $P_k$ computes $y_k = A_k^r x$; entries of the x-vector are communicated
- Columnwise: $P_k$ computes $y^k = A_k^c x_k$, where $y = \sum_k y^k$; entries of the $y^k$ vectors are communicated
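To make the rowwise case concrete, the sketch below counts how many x-vector entries must be moved before the local products can run. It assumes symmetric partitioning (the owner of row i also owns x_i); the function name `rowwise_expand_volume` and the 4 x 4 sparsity pattern are hypothetical.

```python
def rowwise_expand_volume(nonzeros, owner):
    """Count x-vector words moved in a 1D rowwise SpMxV.

    nonzeros: iterable of (i, j) positions with a_ij != 0
    owner:    owner[i] is the processor holding row i, y_i and x_i
    """
    needed = set()  # (receiving processor, column index) pairs
    for i, j in nonzeros:
        if owner[i] != owner[j]:
            needed.add((owner[i], j))  # the owner of row i receives x_j once
    return len(needed)

# Hypothetical pattern; rows 0-1 on processor 0, rows 2-3 on processor 1.
pattern = [(0, 0), (0, 2), (1, 1), (1, 2), (2, 2), (2, 0), (3, 3), (3, 1)]
owner = [0, 0, 1, 1]
volume = rowwise_expand_volume(pattern, owner)
print(volume)
```

Processor 0 needs x_2 for two of its rows but receives it only once, which is why a set of (processor, column) pairs is counted rather than the off-processor nonzeros themselves.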

8 Graph Model for Representing Sparse Matrices
- Figure: sample graph with vertices $v_1$ to $v_{10}$
- edge $(v_i, v_j) \in E \Rightarrow y_i \leftarrow y_i + a_{ij} x_j$ and $y_j \leftarrow y_j + a_{ji} x_i$
- exchange of $x_i$ and $x_j$ values before local matrix-vector products

9 Graph Model Minimizes the Wrong Metric
k l m h j Vi Vk Vj Vm Vh Vl cost() = 2  5 = 10 words, but actual communication volume is 7 words P1 sends xi to both P2 and P4; P2 and P4 send {xj, xk, xl } and {xm, xh }, respectively, to P1 CSCAPES Seminar - March 1st, 2007

10 Hypergraph Model for Representing Sparse Matrices for 1D Decomposition
- Each {vertex, net} pair represents a unique nonzero
- net-cut metric: $\mathrm{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)$
- connectivity - 1 metric: $\mathrm{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)\,(c(n_j) - 1)$

11 Hypergraph Models the Correct Metric
- Figure: rows i, j, k, l, m, h of the matrix and the vertices $V_i, V_j, V_k, V_l, V_m, V_h$ with nets $n_i, n_j, n_k, n_l, n_m, n_h$, distributed among processors $P_1$ to $P_4$
- connectivity values: $c(n_i) = 2$, $c(n_j) = c(n_k) = c(n_l) = c(n_m) = c(n_h) = 1$
- connectivity - 1 metric: $\mathrm{cutsize}(\Pi) = 7$ words
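The gap between the two metrics can be reproduced with a short Python sketch. It rests on assumptions that the transcript does not spell out: the sparsity pattern is taken to be a symmetric star around row i with all diagonal entries nonzero, and the placement (v_i on P1; v_j, v_k, v_l on P2; v_m, v_h on P4) is read off the message description on the graph-model slide. The hypergraph side uses one net per column, whose pins are the rows with a nonzero in that column.

```python
# Assumed placement of the six rows/vertices on processors.
part = {"i": 1, "j": 2, "k": 2, "l": 2, "m": 4, "h": 4}
# Assumed symmetric off-diagonal pattern: row i interacts with every other row.
edges = [("i", v) for v in "jklmh"]

# Standard graph model: every cut edge is charged 2 words.
graph_cost = sum(2 for u, v in edges if part[u] != part[v])

# Column-net hypergraph: net n_c holds every row with a nonzero in column c.
nets = {v: {v} for v in part}      # diagonal entries assumed nonzero
for u, v in edges:
    nets[v].add(u)                 # a_uv != 0, so row u is a pin of n_v
    nets[u].add(v)                 # a_vu != 0, so row v is a pin of n_u

def connectivity(pins):
    """Number of distinct processors holding at least one pin of the net."""
    return len({part[p] for p in pins})

cutsize = sum(connectivity(pins) - 1 for pins in nets.values())
print(graph_cost, cutsize)
```

Under these assumptions the edge-cut cost overcounts because x_i is charged once per cut edge, while the net for column i charges it once per receiving processor.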

12 Fine-Grain Hypergraph Model for 2D Decomposition
- An $M \times M$ matrix A with Z nonzeros is represented by $H = (V, N)$
- Z vertices: one vertex $v_{ij}$ for each $a_{ij} \neq 0$
- 2M nets: one net for each row and for each column of A, $N = N_R \cup N_C$
  - row nets: $N_R = \{m_1, m_2, \dots, m_M\}$
  - column nets: $N_C = \{n_1, n_2, \dots, n_M\}$
- $v_{ij} \in m_i$ and $v_{ij} \in n_j$ iff $a_{ij} \neq 0$
- column net $n_j$ represents the dependency of atomic tasks on $x_j$
- row net $m_i$ represents the dependency of computing $y_i$ on the partial $y_i'$ results
- Figure: row net $m_i$ ($r_i / y_i$) with pins $v_{ih}, v_{ii}, v_{ik}, v_{ij}$ and column net $n_j$ ($c_j / x_j$) with pins $v_{ij}, v_{jj}, v_{lj}$

13 Fine-Grain Hypergraph Model
- Figure: an 8 x 8 sample matrix and its fine-grain hypergraph, with one vertex for each nonzero, row nets $m_1$ to $m_8$, and column nets $n_1$ to $n_8$
- Nonzero vertices (row, column): (1,1), (1,2), (1,6), (2,2), (2,3), (2,5), (3,2), (3,3), (3,5), (4,1), (4,4), (4,7), (5,5), (5,7), (6,6), (6,8), (7,4), (7,6), (7,7), (8,4), (8,8)

14 Fine-Grain Hypergraph Model for 2D Decomposition
- unit net weighting: $w(n) = 1$ for each net $n \in N$
- use the connectivity - 1 metric: $\mathrm{cutsize}(\Pi) = \sum_{n_j \in N_E} (c(n_j) - 1)$
- minimizing cutsize corresponds to minimizing the total volume of communication
- consistency of the model: exact correspondence between cutsize and communication volume
  - maintain symmetric partitioning: $y_i$ and $x_i$ assigned to the same processor
  - consistency condition: $v_{ii} \in n_i$ and $v_{ii} \in m_i$ for each vertex $v_{ii}$ (holds iff $a_{ii} \neq 0$)
- consider a K-way partition $\Pi = \{V_1, V_2, \dots, V_K\}$ of $H = (V, N)$
  - $\Pi$ induces a partition on the nonzeros of matrix A
  - decode $v_{ii} \in V_k \Rightarrow$ assign $y_i$ and $x_i$ to processor $P_k$
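The consistency claim can be checked numerically with a small Python sketch. The nonzero positions are the (row, column) labels that appear on the sample-matrix slide; everything else is an assumption made for illustration, including the function names and the 3-way partition, which is arbitrary and is not the partition shown in the talk.

```python
from collections import defaultdict

NONZEROS = [(1, 1), (1, 2), (1, 6), (2, 2), (2, 3), (2, 5), (3, 2), (3, 3),
            (3, 5), (4, 1), (4, 4), (4, 7), (5, 5), (5, 7), (6, 6), (6, 8),
            (7, 4), (7, 6), (7, 7), (8, 4), (8, 8)]

def fine_grain_cutsize(part):
    """Connectivity-1 cutsize with unit net weights over row and column nets."""
    row_nets, col_nets = defaultdict(set), defaultdict(set)
    for (i, j), p in part.items():
        row_nets[i].add(p)  # vertex v_ij is a pin of row net m_i
        col_nets[j].add(p)  # and of column net n_j
    nets = list(row_nets.values()) + list(col_nets.values())
    return sum(len(parts) - 1 for parts in nets)

def communication_volume(part):
    """Count distinct words sent, with x_i and y_i owned by the part of v_ii."""
    words = set()
    for (i, j), p in part.items():
        if p != part[(j, j)]:
            words.add(("x", j, p))  # expand: x_j is sent to processor p
        if p != part[(i, i)]:
            words.add(("y", i, p))  # fold: processor p sends a partial y_i
    return len(words)

# Arbitrary 3-way partition of the nonzeros, chosen only to exercise the code.
part = {(i, j): (i + 2 * j) % 3 for (i, j) in NONZEROS}
cutsize = fine_grain_cutsize(part)
volume = communication_volume(part)
print(cutsize, volume)
```

The two numbers agree for any partition of this matrix because every diagonal entry is nonzero, so each net always contains the vertex whose part owns the corresponding x or y entry.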

15 Fine-Grain Hypergraph Model for 2D Decomposition
- Figure: a partition of the fine-grain hypergraph of the 8 x 8 sample matrix and the induced assignment of its nonzeros to processors
- $\mathrm{cutsize}(\Pi) = 8$; communication volume = 8

16 Two-phase Coarse-grain Decomposition
- Phase 1: decompose the domain along one dimension, assigning each piece to a group of processors
  - SpMxV: rowwise decomposition
  - graph/hypergraph partitioning: minimize communication volume during the expand phase of reduction
- Phase 2: decompose the domain along the other dimension within each group
  - SpMxV: columnwise decomposition
  - multi-constraint graph/hypergraph partitioning: minimize communication volume during the gather phase of reduction
  - maintains computational balance while preserving coherence among the decompositions within different processor groups
- SpMxV: checkerboard decomposition
- Applicable to both graph and hypergraph models
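One way to read the multi-constraint step is that each column carries one weight per row stripe produced in phase 1, so that balancing every constraint balances every processor in the mesh. The Python sketch below illustrates that reading only; the function names, the 4 x 4 pattern, and both partitions are hypothetical, and no partitioner is invoked.

```python
def column_weight_vectors(nonzeros, row_group, num_groups):
    """Phase 2 vertex weights: one constraint per row stripe from phase 1."""
    weights = {}
    for i, j in nonzeros:
        vec = weights.setdefault(j, [0] * num_groups)
        vec[row_group[i]] += 1  # nonzero a_ij adds work in stripe row_group[i]
    return weights

def block_loads(weights, col_part, num_groups, num_col_parts):
    """Load W[r][q] of the processor at mesh position (r, q)."""
    loads = [[0] * num_col_parts for _ in range(num_groups)]
    for j, vec in weights.items():
        for r, w in enumerate(vec):
            loads[r][col_part[j]] += w
    return loads

# Hypothetical 4 x 4 pattern on a 2 x 2 processor mesh.
pattern = [(0, 0), (0, 1), (1, 1), (1, 3), (2, 0), (2, 2), (3, 2), (3, 3)]
row_group = {0: 0, 1: 0, 2: 1, 3: 1}  # assumed phase 1 result
col_part = {0: 0, 3: 0, 1: 1, 2: 1}   # assumed phase 2 result
weights = column_weight_vectors(pattern, row_group, 2)
loads = block_loads(weights, col_part, 2, 2)
print(loads)
```

Because the same column partition is applied to every row stripe, the x entries a processor needs come only from processors sharing its column partition, and partial y results travel only within a row stripe, which is consistent with the bounded message count the talk attributes to the checkerboard scheme.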

17 Two-phase in SpMxV: Phase 1 Rowwise decomposition thru HP

18 Two-phase in SpMxV: Phase 1 Rowwise decomposition thru HP
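Figure: the sample matrix, with labels 1 to 16 in the permuted order 13, 5, 1, 6, 14, 11, 3, 2, 15, 10, 7, 9, 8, 16, 12, 4, split into row stripe $R_1$ for processors $P_{11}$ and $P_{12}$ and row stripe $R_2$ for processors $P_{21}$ and $P_{22}$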

19 Two-phase in SpMxV: Phase 2 Columnwise decomposition thru Multi-constraint HP
Figure: the sample matrix, with labels 1 to 16 in the permuted order 13, 5, 1, 6, 14, 11, 3, 2, 15, 10, 7, 9, 8, 16, 12, 4, with row stripe $R_1$ for processors $P_{11}$ and $P_{12}$ and row stripe $R_2$ for processors $P_{21}$ and $P_{22}$

20 Two-phase in SpMxV: Phase 2 Columnwise decomposition thru Multi-constraint HP
Figure: the resulting checkerboard decomposition of the sample matrix (labels in the order 13, 5, 1, 6, 14, 11, 3, 2, 15, 10, 7, 9, 8, 16, 12, 4), with processor loads $W_{11} = 12$ for $P_{11}$, $W_{12} = 11$ for $P_{12}$, $W_{21} = 12$ for $P_{21}$, and $W_{22} = 12$ for $P_{22}$

21 Experimental Results: Communication Volume

22 Experimental Results

23 Experimental Results: Communication Volume

24 Experimental Results: Maximum # messages

25 Experimental Results: Partitioning Time

26 Experimental Results: Summary

27 Conclusion
- A suite of models/approaches for workload partitioning
  - 1D decomposition: coarse-grain (owner computes)
  - 2D decomposition: fine-grain
    - Does not restrict the place of computation to the owner of the input or output
  - 2D decomposition: coarse-grain (checkerboard)
    - Two-phase, with a better upper bound on the number of messages
    - Two-phase is applicable to both graph and hypergraph models
- Which one to use
  - For a better balanced workload and/or minimum communication volume: fine-grain
  - If latency is important: use the proposed two-phase approach

28 End of Talk

29 Backup slides

30 Graph Partitioning
- Graph $G = (V, E)$: a set of vertices V and a set of edges E
  - every edge $e_{ij} \in E$ connects a pair of distinct vertices $v_i$ and $v_j$
- K-way graph partition by edge separator: $\Pi = \{V_1, V_2, \dots, V_K\}$
  - $V_k$ is a nonempty subset of V, i.e., $V_k \subseteq V$
  - parts are pairwise disjoint, i.e., $V_k \cap V_l = \emptyset$
  - the union of the K parts is equal to V, i.e., $\bigcup_{k=1}^{K} V_k = V$
- an edge $e_{ij}$ is said to be
  - cut if $v_i \in V_k$ and $v_j \in V_l$ and $k \neq l$
  - uncut if $v_i \in V_k$ and $v_j \in V_k$
- a partition $\Pi$ is said to be balanced if $W_k \leq W_{avg}\,(1 + \varepsilon)$
  - $W_k$: weight of part $V_k$; $\varepsilon$: maximum imbalance ratio
- cost of a partition: $\mathrm{cutsize}(\Pi) = \sum_{e_{ij} \in E_E} w(e_{ij})$, where $E_E$ is the set of cut edges
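The cutsize and balance definitions translate directly into code. The following Python sketch is illustrative only: the function names and the 4-vertex weighted graph are hypothetical, and parts are numbered from 0.

```python
def graph_cutsize(edges, part):
    """Sum the weights of cut edges; edges maps (u, v) to w(e_uv)."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])

def is_balanced(vertex_weight, part, num_parts, epsilon):
    """Check W_k <= W_avg * (1 + epsilon) for every part k."""
    loads = [0] * num_parts
    for v, w in vertex_weight.items():
        loads[part[v]] += w
    w_avg = sum(loads) / num_parts
    return all(load <= w_avg * (1 + epsilon) for load in loads)

# Hypothetical 4-vertex graph split into two parts.
edges = {(0, 1): 3, (1, 2): 2, (2, 3): 1, (0, 3): 4}
vertex_weight = {0: 2, 1: 2, 2: 3, 3: 1}
part = {0: 0, 1: 0, 2: 1, 3: 1}
cut = graph_cutsize(edges, part)
balanced = is_balanced(vertex_weight, part, 2, 0.0)
print(cut, balanced)
```

Here edges (1, 2) and (0, 3) are cut, and both parts weigh 4, so the partition is balanced even with a maximum imbalance ratio of 0.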

31 Graph Partitioning
- Figure: a 2-way partition ($P_1$, $P_2$) of a sample graph with weighted vertices $v_1$ to $v_{10}$ and weighted edges
- Part weights: $W_1 = 16$, $W_2 = 16$
- Balance equation: $W_k \leq W_{avg}\,(1 + \varepsilon)$; this is a balanced partition with $\varepsilon = 0$
- cut edges: $E_E$ = {{v1, v6}, {v4, v8}, {vv, v7}, {v5, v7}}
- $\mathrm{cutsize}(\Pi) = \sum_{e \in E_E} w(e) = 7$

32 Hypergraph Partitioning
- Hypergraph $H = (V, N)$: a set of vertices V and a set of nets N
  - nets (hyperedges) connect two or more vertices
  - every net $n_j \in N$ is a subset of vertices, i.e., $n_j \subseteq V$
  - a graph is a special instance of a hypergraph
- K-way hypergraph partition: $\Pi = \{V_1, V_2, \dots, V_K\}$
  - a net that has at least one pin in a part is said to connect that part
  - connectivity set $C(n_j)$ of a net $n_j$: the set of parts connected by $n_j$
  - connectivity $c(n_j) = |C(n_j)|$ of a net $n_j$: the number of parts connected by $n_j$
  - a net $n_j$ is said to be cut if $c(n_j) > 1$ and uncut if $c(n_j) = 1$
- two cutsize definitions widely used in the VLSI community, with $N_E$ the set of cut nets:
  - net-cut metric: $\mathrm{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)$
  - connectivity - 1 metric: $\mathrm{cutsize}(\Pi) = \sum_{n_j \in N_E} w(n_j)\,(c(n_j) - 1)$
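Both cutsize definitions can be computed from the connectivity of each net. The Python sketch below is a minimal illustration: the function names and the six-vertex, four-net hypergraph are hypothetical, and uncut nets are simply allowed to contribute zero to the connectivity - 1 sum, which is equivalent to summing over the cut nets only.

```python
def connectivity(pins, part):
    """c(n_j): number of distinct parts that contain a pin of the net."""
    return len({part[v] for v in pins})

def net_cut(nets, part, weight):
    """Net-cut metric: total weight of nets that span more than one part."""
    return sum(weight[n] for n, pins in nets.items()
               if connectivity(pins, part) > 1)

def connectivity_minus_one(nets, part, weight):
    """Connectivity-1 metric: each net pays w(n_j) * (c(n_j) - 1)."""
    return sum(weight[n] * (connectivity(pins, part) - 1)
               for n, pins in nets.items())

# Hypothetical hypergraph: six vertices in three parts, four unit-weight nets.
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
nets = {"a": {0, 1}, "b": {1, 2}, "c": {0, 3, 4}, "d": {1, 2, 5}}
weight = {name: 1 for name in nets}
cut_nets = net_cut(nets, part, weight)
con1 = connectivity_minus_one(nets, part, weight)
print(cut_nets, con1)
```

Net a is uncut, net b connects two parts, and nets c and d connect three parts each, so the net-cut metric counts three nets while the connectivity - 1 metric charges 1 + 2 + 2.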

33 Hypergraph Partitioning
- Figure: a 3-way partition ($V_1$, $V_2$, $V_3$) of a sample hypergraph with vertices and nets numbered 1 to 18
- cut nets: $N_E = \{n_1, n_8, n_{15}\}$
- connectivity sets: $C(n_1) = \{V_1, V_2\}$, $C(n_8) = C(n_{15}) = \{V_1, V_2, V_3\}$
- connectivity values: $c(n_1) = 2$, $c(n_8) = c(n_{15}) = 3$
- cutsize values assuming unit net weights:
  - net-cut metric: $\mathrm{cutsize}(\Pi) = |N_E| = 3$
  - connectivity - 1 metric: $\mathrm{cutsize}(\Pi) = 1 + 2 + 2 = 5$

34 Graph Model for Representing Sparse Matrices
- standard graph model $G = (V, E)$ for matrix A
- vertex set V: one vertex $v_i$ for each row/column i of A
  - $v_i \in V$ represents task i of computing the inner product $y_i = \langle r_i, x \rangle$
- edge set E: $(v_i, v_j) \in E \Leftrightarrow a_{ij} \neq 0$ and $a_{ji} \neq 0$
  - each edge denotes a bidirectional interaction between tasks i and j
  - edge $(v_i, v_j) \in E \Rightarrow y_i \leftarrow y_i + a_{ij} x_j$ and $y_j \leftarrow y_j + a_{ji} x_i$
  - exchange of $x_i$ and $x_j$ values before local matrix-vector products $\Rightarrow$ communication of two words
  - edge weighting: $w(v_i, v_j) = 2$
- Figure: vertices $v_i$ ($r_i / c_i$) and $v_j$ ($r_j / c_j$) joined by an edge labeled $a_{ij}, a_{ji}$

