Dynamic Load Balancing (Repartitioning) & Matrix Partitioning. Ümit V. Çatalyürek. Presentation transcript.


1 Dynamic Load Balancing (Repartitioning) & Matrix Partitioning
Ümit V. Çatalyürek, Associate Professor
Department of Biomedical Informatics
Department of Electrical & Computer Engineering
The Ohio State University
Workshop on Combinatorial Scientific Computing & Petascale Simulations 2008
June 10-13, 2008, Santa Fe, NM

2 OSU's CSCAPES Contributions
- Load Balancing
  - Parallel Static Load Balancing
  - Parallel Dynamic Load Balancing
- Parallel Graph Coloring
  - Distance-1 coloring
  - Distance-2 coloring
  - talk by Bozdag, Friday morning
- Parallel Matrix Partitioning
- Parallel Matrix Ordering
Umit Catalyurek "Dynamic Load Bal. & Matrix Partitioning" 2 CSCAPES Workshop, June 10, 2008

3 Roadmap
- Dynamic Load Balancing
  - Motivation
  - Background
  - Classification of Repartitioning Techniques
  - Graph and Hypergraph Approaches
  - New Hypergraph Model for Dynamic Load Balancing
  - Parallel Multilevel Hypergraph Partitioning with Fixed Vertices
  - Experimental Results & Summary
- Matrix Partitioning
  - 1D Hypergraph-based Methods: Row-wise and Column-wise
  - 2D Hypergraph-based Methods: Fine-grain, Jagged-Like, Checkerboard
  - Experimental Results & Summary

4 Partitioning and Load Balancing
Goal: assign data to processors to
- minimize application runtime
- maximize utilization of computing resources
Metrics:
- minimize processor idle time (balance workloads)
- keep inter-processor communication costs low
Impacts performance of a wide range of simulations:
- Adaptive mesh refinement
- Contact detection
- Particle simulations
- Linear solvers & preconditioners (Ax = b)

5 Dynamic Load Balancing/Repartitioning
Applications with workload or locality that changes during simulation require dynamic load balancing (a.k.a. repartitioning):
- Adaptive mesh refinement
- Particle methods
- Contact detection
Repartitioning has an additional cost: moving data from the old to the new decomposition.
executionT = #iter x (computationT + communicationT) + repartT + migrationT
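The execution-time model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the idea of comparing one epoch with and without repartitioning are hypothetical and do not come from the talk.

```python
def total_execution_time(n_iter, t_comp, t_comm, t_repart, t_migrate):
    """executionT = #iter x (computationT + communicationT) + repartT + migrationT."""
    return n_iter * (t_comp + t_comm) + t_repart + t_migrate


def repartitioning_pays_off(n_iter, current, candidate, t_repart, t_migrate):
    """Compare keeping the current decomposition with moving to a candidate one.

    `current` and `candidate` are hypothetical (t_comp, t_comm) pairs giving
    the per-iteration costs under each decomposition.
    """
    keep = total_execution_time(n_iter, current[0], current[1], 0.0, 0.0)
    move = total_execution_time(n_iter, candidate[0], candidate[1], t_repart, t_migrate)
    return move < keep
```

With few iterations between repartitions the one-time repartitioning and migration costs dominate; with many iterations the per-iteration savings dominate.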

7 Classification of Dynamic Load Balancing Approaches
[Diagram not captured in the transcript]

8 Graph and Hypergraph Partitioning
Community
  Graphs: load-balancing (highly successful for PDE problems)
  Hypergraphs: VLSI, recently Computational Science
Model
  Graphs: Vertices = computation/data; Edge = relationship between computation/data (bi-directional)
  Hypergraphs: Vertices = computation/data; Edge = dependency on data elements (multi-way)
Goal
  Graphs: Evenly distribute vertex weight while minimizing weight of cut edges
  Hypergraphs: Evenly distribute vertex weight while minimizing cut size
Algorithms
  Graphs: Kernighan, Lin, Simon, Hendrickson, Leland, Kumar, Karypis, et al.
  Hypergraphs: Kernighan, Schweikert, Fiduccia, Mattheyses, Sanchis, Alpert, Kahng, Hauck, Borriello, Çatalyürek, Aykanat, Karypis, et al.
Serial Partitioner
  Graphs: Chaco (SNL), Jostle (U. Greenwich), METIS (U. Minn.), Party (U. Paderborn), Scotch (U. Bordeaux)
  Hypergraphs: hMETIS (Karypis), PaToH (Çatalyürek), Mondriaan (Bisseling)
Parallel Partitioner
  Graphs: ParMETIS (U. Minn.), PJostle (U. Greenwich)
  Hypergraphs: Zoltan PHG (Sandia), Parkway (Trifunovic)

9 Impact of Hypergraph Models (Where Graph is not Sufficient)
Greater expressiveness → greater applicability
- Structurally non-symmetric systems: circuits, biology
- Rectangular systems: linear programming, least-squares methods
- Non-homogeneous, highly connected topologies: circuits, nanotechnology, databases
- Multiple models for different granularity partitioning: owner compute, fine-grain, checkerboard/cartesian, Mondriaan
Accurate communication model → lower application communication costs
[Figures: example with vertices v and nets n distributed over parts P1-P4; Mondriaan partitioning, courtesy of Rob Bisseling]

11 Hypergraph Model
$\lambda_i$: number of parts that edge $e_i$ connects
Cut = $\sum_i c_i (\lambda_i - 1)$, where $c_i$ is the weight of edge $e_i$
Cut = total communication volume
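As a small illustration of a cut metric of this kind, the sketch below evaluates the connectivity-minus-one form, in which an edge that touches λ parts contributes its weight times (λ - 1). Reading the slide's cut as this metric is an assumption, and the function name and data layout (lists of pins, a vertex-to-part mapping) are hypothetical choices made for the example.

```python
def connectivity_cut(hyperedges, weights, part):
    """Sum of w_i * (lambda_i - 1) over all hyperedges.

    hyperedges: list of pin lists (vertex ids connected by each edge)
    weights:    list of edge weights, same order as hyperedges
    part:       mapping from vertex id to part id
    """
    total = 0
    for pins, weight in zip(hyperedges, weights):
        # lambda_i: number of distinct parts this edge connects
        lam = len({part[v] for v in pins})
        total += weight * (lam - 1)
    return total
```

An edge whose pins all sit in one part has λ = 1 and contributes nothing, which matches the intuition that no data has to leave that part.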

12-15 Hypergraph Repartitioning
- Start with the application hypergraph
- Add
  - one partition vertex for each partition
  - migration edges connecting application vertices to their partition vertices
- Weight the hyperedges:
  - Migration edge weight = size of application objects (migration size)
  - Application edge weight = size of communication elements
- Scale application edge weights by α ≈ number of application communications between repartitions (#iter)
- Perform hypergraph partitioning with partition vertices "fixed"
executionT = #iter x (computationT + communicationT) + repartT + migrationT
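The construction steps above can be sketched as follows. This is a minimal illustration, not Zoltan's implementation: the function names, the vertex numbering (partition vertex for part p stored as id n_vertices + p) and the list-based data layout are all assumptions made for the example.

```python
def build_repartitioning_hypergraph(n_vertices, app_edges, comm_sizes,
                                    old_part, obj_sizes, n_parts, alpha):
    """Return (edges, weights, fixed) for the augmented hypergraph.

    Application vertices are 0..n_vertices-1; the partition vertex of part p
    is given the (hypothetical) id n_vertices + p.
    """
    edges, weights = [], []
    # Application edges, scaled by alpha (roughly #iter between repartitions).
    for pins, size in zip(app_edges, comm_sizes):
        edges.append(list(pins))
        weights.append(alpha * size)
    # One migration edge per application vertex, tied to its current part.
    for v in range(n_vertices):
        edges.append([v, n_vertices + old_part[v]])
        weights.append(obj_sizes[v])
    # Partition vertices are fixed to their own parts.
    fixed = {n_vertices + p: p for p in range(n_parts)}
    return edges, weights, fixed


def augmented_cut(edges, weights, part):
    """Connectivity-minus-one cut of the augmented hypergraph."""
    return sum(w * (len({part[v] for v in pins}) - 1)
               for pins, w in zip(edges, weights))
```

Under this construction a cut migration edge means its application vertex left its old part, so the cut adds up α times the communication volume plus the migration volume, mirroring the two parts of the execution-time model.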

17 Implementation of Hypergraph Repartitioning
- Implemented in the Zoltan toolkit
- Based on a parallel multilevel hypergraph partitioner with recursive bisection (IPDPS'06)
- Automatically constructs the augmented hypergraph
- … with added capability for handling "fixed vertices."

18 Experimental Results
Experiments on
- OSU-RI cluster
  - 64 compute nodes connected with InfiniBand
  - Dual 2.4 GHz AMD Opteron processors with 8 GB RAM
- Sandia-Thunderbird cluster
  - 4,480 compute nodes connected with InfiniBand
  - Dual 3.6 GHz Intel EM64T processors with 6 GB RAM
Zoltan v3 (alpha) hypergraph partitioner & ParMETIS v3.1 graph partitioner
Test problems:
- 2DLipid: density functional theory; 4K x 4K; 5.6M nonzeros
- Xyce: ASIC Stripped; 680K x 680K; 2.3M nonzeros
- Cage14: DNA Electrophoresis; 1.5M x 1.5M; 27M nonzeros
[Figures: Xyce ASIC Stripped; Cage Electrophoresis]

19 Communication Volume
[Plots: 2DLipid, Xyce, Cage14]
- Hypergraph is better
- Zoltan-repart trades communication for migration to minimize total cost
- Scratch methods are comparable for large alpha (#iter)

20 Dynamic Graph: Partitioning Time on T-bird
[Plots: 2DLipid, Cage14, Xyce]

21 Summary of Dynamic Load Balancing
- A novel hypergraph model for dynamic load balancing
  - Single hypergraph that incorporates both communication volume in the application and data migration cost
  - Performs better than or comparably to graph-based dynamic load balancing
- A parallel dynamic load balancing tool
  - Essential for peta-scale applications
  - Scales similarly to graph-based tools
- Future Work
  - There is always room for improvement: speed and/or quality
  - Direct k-way refinement

23 Matrix Partitioning
Hypergraph Models for Sparse-Matrix Partitioning
- 1D
  - row-wise
  - column-wise
- 2D
  - Fine-grain
  - Jagged-like
  - Checkerboard
Serial Tool: PaToH & Matlab interface
- Matrix Partitioning
- Partitioned Matrix Display

24 1D Partitioning
M x N matrices with K processors
Worst case:
- Total Volume = (K-1) x N words or (K-1) x M words
- Total Number of Messages = K x (K-1)

25 2D Partitioning: Jagged-Like
M x N matrices with K = P x Q processors
Worst case:
- Total Volume = (K-P) x N + (Q-1) x M
- Total Number of Messages = K x (K-Q) + K x (Q-1) = K x (K-1)

26 2D Partitioning: Checkerboard
M x N matrices with K = P x Q processors
Worst case:
- Total Volume = (P-1) x N + (Q-1) x M
- Number of Messages per processor = P+Q-2, i.e. at most K x (P+Q-2) in total
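To compare the three schemes side by side, the worst-case figures from the 1D, jagged-like and checkerboard slides can be tabulated with a short Python sketch. The formulas are transcribed as stated on the slides. Two readings are assumptions: the 1D volume (K-1) x N is attributed to row-wise partitioning and (K-1) x M to column-wise, and the checkerboard figure P+Q-2 is taken as a per-processor bound, so its total is K x (P+Q-2). The function name and dictionary layout are hypothetical.

```python
def worst_case_bounds(M, N, P, Q):
    """Worst-case volume (words) and message totals for an M x N matrix
    on K = P x Q processors, following the formulas on the slides."""
    K = P * Q
    return {
        # 1D: assumed mapping of the two stated volumes to RW and CW.
        "row-wise": {"volume": (K - 1) * N, "messages": K * (K - 1)},
        "column-wise": {"volume": (K - 1) * M, "messages": K * (K - 1)},
        # Jagged-like, as stated on the slide.
        "jagged-like": {
            "volume": (K - P) * N + (Q - 1) * M,
            "messages": K * (K - Q) + K * (Q - 1),
        },
        # Checkerboard: P+Q-2 read as messages per processor (assumption).
        "checkerboard": {
            "volume": (P - 1) * N + (Q - 1) * M,
            "messages": K * (P + Q - 2),
        },
    }
```

For example, `worst_case_bounds(1000, 1000, 4, 4)` shows how the checkerboard scheme bounds the message count well below the K x (K-1) of the 1D and jagged-like schemes.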

27-29 cage5
[Figures not captured in the transcript]

30 Experimental Results
- Tested 1,413 matrices (out of 1,877) from the UFL Collection
  - #rows >= 500 and #columns >= 500
  - #non-zeros < 10,000,000
- K-way partitioning for K = 4, 16, 64 and 256
  - only if max{#rows, #columns} >= 50 x K
- Partitioning instance = matrix & K
- For each partitioning instance we run the RW (row-wise), CW (column-wise), JL (jagged-like), CH (checkerboard) and FG (fine-grain) methods
- Linux cluster: 64 dual 2.4 GHz Opteron CPUs, 8 GB RAM
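The selection rules on this slide can be expressed as a small filter. The sketch assumes the size condition is meant as max{#rows, #columns} >= 50 x K, so that larger K values are only tried on sufficiently large matrices; that reading, the function name and the argument names are assumptions for illustration.

```python
K_VALUES = (4, 16, 64, 256)


def partitioning_instances(n_rows, n_cols, nnz):
    """Return the K values for which a matrix yields a partitioning instance."""
    # Matrix-level filter: at least 500 rows and columns, under 10M nonzeros.
    if n_rows < 500 or n_cols < 500 or nnz >= 10_000_000:
        return []
    # Per-K filter (assumed reading): the larger dimension must be >= 50 x K.
    return [k for k in K_VALUES if max(n_rows, n_cols) >= 50 * k]
```

Under this reading a 500 x 500 matrix contributes only the K = 4 instance, while a matrix with at least 12,800 rows or columns contributes all four.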

31 Experimental Results: Total Communication Volume
Performance Profiles
[Plots: All Instances (4040); Square Symmetric (2231)]
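For readers unfamiliar with performance profiles, the sketch below shows one common way to compute them: for each method, the fraction of instances on which its cost is within a factor τ of the best cost any method achieved on that instance. Whether the talk's plots use exactly this definition is an assumption, and the function name and input layout are hypothetical.

```python
def performance_profile(costs, taus):
    """costs: {method: [cost on each instance]}, all costs assumed positive.

    Returns {method: [fraction of instances within tau x best, per tau]}.
    """
    methods = list(costs)
    n = len(costs[methods[0]])
    # Best (smallest) cost achieved by any method on each instance.
    best = [min(costs[m][i] for m in methods) for i in range(n)]
    profile = {}
    for m in methods:
        ratios = [costs[m][i] / best[i] for i in range(n)]
        profile[m] = [sum(r <= t for r in ratios) / n for t in taus]
    return profile
```

A method whose curve is higher at small τ is best or near-best on more instances; the value at τ = 1 is the fraction of instances on which it ties for the best result.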

32 Experimental Results: Total Communication Volume
[Plots: Square Non-symmetric (1102); Rectangular (707)]
Rectangular instances: N>M (662) → CW better than RW; M>N (45)

33 Experimental Results: Total Number of Messages
[Plot not captured in the transcript]

34 Experimental Results: Execution Time
[Plot not captured in the transcript]

35 Summary of Matrix Partitioning
- Hypergraph models for Matrix Partitioning
  - Some are not new, but have not yet been adopted by applications. Why? (Information dissemination problem? Tool?)
- More hypergraph-based methods are being developed!
  - Corner-Model
  - Hybrid Mondriaan with Fine-Grain
- Matlab interface to PaToH for Matrix Partitioning
  - Currently supports: RW, CW, JL, CH, FG
  - Will be available soon
- Work in progress
  - Parallel Matrix Partitioning via Zoltan

36 Thanks
Contact Info: umit@bmi.osu.edu, http://bmi.osu.edu/~umit
Also: http://www.cs.sandia.gov/Zoltan/ and http://www.cscapes.org/

