1 Burkhard Monien, Universität Paderborn; Henning Meyerhenke, Georgia Institute of Technology. SIAM Workshop on Combinatorial Scientific Computing, Darmstadt, Germany, May 20th, 2011

2 Outline: Introduction; Global Methods; Local Search Techniques; Multilevel Methods; Methods based on Random Walks and Diffusion; Related and Future Directions. SIAM CSC, May 20th, 2011, Recent Trends in Graph Partitioning for SC

3 Graph Partitioning in Computer Science and Combinatorial Scientific Computing

4 Application: Numerical Simulations. Numerical simulations are classical parallel applications: the domain and the corresponding PDEs are discretized into a mesh. Task: map the mesh onto processors for an efficient parallel solution of the linear systems (discretized PDEs). Partition the mesh (or its dual graph) such that the load is balanced and the communication within the solvers is minimized. Images: YF-17 fighter [www.aero.polimit.it]; crash analysis [www.crash-analysis.com/pages/gallery.shtml]

5 Application: VLSI Circuit Layout/Design. 1) Numerical simulation of semiconductors. 2) Layout of chip components: communication within a component is cheaper than between components, so find a layout that minimizes inter-component communication. Images: mesh from SRAM simulation [http://www.cogenda.com/article/Genius]; Intel Atom processor [http://photos.macnn.com/news/0912/intelatom45nm-lg2.jpg]

6 Application: Image Segmentation. Segmentation: find larger regions in an image with similar visual characteristics, e.g. to simplify the image as a preprocessing step. The image is modeled as a graph: each pixel is a vertex, edges connect pixels that are spatially close to each other, and edge weights model visual similarity. [http://people.cs.uchicago.edu/~pff/segment/]

7 Problem Formulation. Traditional static graph partitioning problem (GPP): given a graph $G = (V, E, \omega)$, partition $V$ into $k$ parts $V_1, \dots, V_k$ by a mapping $\pi: V \to \{1, \dots, k\}$ such that $\pi$ is balanced (all parts have roughly equal size, e.g. $|V_i| \le (1+\varepsilon)\lceil |V|/k \rceil$) and the weight of the cut edges is minimized. Dynamic case: repartitioning problem → solve the GPP with the additional objective of minimum migration costs.
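Both GPP objectives can be checked directly for a given mapping. A minimal Python sketch, assuming edges given as `(u, v, weight)` tuples, the mapping π as a dict, and the balance constraint in the common form with a tolerance `eps`; the helper names are hypothetical, not from the talk:

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Total weight of the edges whose endpoints lie in different parts.

    edges: list of (u, v, weight) tuples; part: dict vertex -> part index.
    """
    return sum(w for u, v, w in edges if part[u] != part[v])


def is_balanced(part, k, eps, vertex_weight=None):
    """Check the (assumed) balance constraint: every part weighs at most
    (1 + eps) * ceil(total_weight / k). Unit vertex weights by default."""
    if vertex_weight is None:
        vertex_weight = {v: 1 for v in part}
    loads = defaultdict(int)
    for v, p in part.items():
        loads[p] += vertex_weight[v]
    total = sum(vertex_weight.values())
    limit = (1 + eps) * -(-total // k)  # ceil(total / k) for integer weights
    return all(loads[p] <= limit for p in range(k))


# Usage: a path 0-1-2-3 with unit edge weights, split into k = 2 parts.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
halves = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(edges, halves), is_balanced(halves, k=2, eps=0.03))  # 1 True
```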

8 Criticism and Adaptations. The edge cut does not model the cost of solver communication accurately → hypergraph partitioning. Some solvers profit from good partition shapes → shape optimization. Connected parts are often desirable. Synchronous computations: maximum norm instead of summation norm, since the slowest processor determines the running time. B. Hendrickson: Graph Partitioning and Parallel Solvers: Has the Emperor No Clothes? (Extended Abstract). IRREGULAR 1998: 218-225.
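To see why the choice of metric and norm matters, the following sketch computes a per-part communication volume and reports both norms. The volume model (each vertex sends its value once to every other part that holds one of its neighbors) is a common one that we assume here; it is not necessarily the exact model behind this slide.

```python
from collections import defaultdict


def comm_volume_per_part(adj, part):
    """Communication volume per part (assumed model): a vertex v sends its
    value once to every *other* part that contains a neighbor of v."""
    volume = defaultdict(int)
    for v, neighbors in adj.items():
        foreign_parts = {part[u] for u in neighbors if part[u] != part[v]}
        volume[part[v]] += len(foreign_parts)
    return dict(volume)


# Star with center 0 and leaves 1, 2, 3, split into three parts.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
part = {0: 0, 1: 1, 2: 1, 3: 2}
vol = comm_volume_per_part(adj, part)
print(vol)                                   # {0: 2, 1: 2, 2: 1}
print(sum(vol.values()), max(vol.values()))  # 5 2  (sum norm, max norm)
```

Here all three edges are cut (edge cut 3), while the communication volume is 5 in the sum norm and 2 in the maximum norm.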

9 Complexity and Approximation Results. Graph partitioning is an NP-hard optimization problem. O(√log n) approximation algorithms exist for sparsest cut and balanced separators [Arora, Hazan, Kale, SIAM J. Comput. 2010], [Sherman, FOCS 2009]. Approximation algorithms, however, are rather complicated to implement and not fast enough in practice (e.g. they solve many flow problems), and their guarantees are still quite far from the optimum → heuristics are used in practice.

10 Spectral Partitioning, Geometric Approaches, Metaheuristics

11 Spectral Partitioning. Formulate edge-cut minimization as a binary quadratic program, relax the integrality constraint, and solve an eigenvector problem. Pro: mathematical analysis (connected parts under certain conditions), optimized eigensolvers. Con: quality often not comparable to the best methods; (sequential) running time higher than with local optimizers. A. Pothen, H.D. Simon, K.P. Liou: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11 (1990), no. 3, pp. 430-452.
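A compact illustration of the relaxation: the sketch below computes the Fiedler vector of the Laplacian with NumPy's dense `eigh` and splits the vertices at its median, so both halves have equal size. This is a textbook spectral bisection for small graphs, not one of the optimized eigensolvers mentioned above; the function names are illustrative.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def spectral_bisection(n, edges):
    """Relaxed bisection: split the vertices at the median of the Fiedler vector
    (eigenvector of the second-smallest Laplacian eigenvalue)."""
    _, eigvecs = np.linalg.eigh(laplacian(n, edges))  # eigenvalues ascending
    fiedler = eigvecs[:, 1]
    part = np.zeros(n, dtype=int)
    part[np.argsort(fiedler)[n // 2:]] = 1  # upper half of the ordering -> part 1
    return part


def clique(vs):
    """Unit-weight edges of a complete graph on the vertex list vs."""
    return [(u, v, 1) for i, u in enumerate(vs) for v in vs[i + 1:]]


# Two 4-cliques {0..3} and {4..7} joined by the single edge (3, 4).
edges = clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4, 1)]
print(spectral_bisection(8, edges))  # the two cliques end up in different parts
```

Which clique receives label 0 depends on the sign of the computed eigenvector, so only the split itself is meaningful.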

12 Geometric Methods. Coordinate Nested Dissection (CND) and Recursive Inertial Bisection (RIB): bisect with hyperplanes. Space-filling curves (figure: Lebesgue curve). Very fast, low memory consumption, mostly easy to parallelize. But: coordinates are necessary and, more importantly, the methods do not handle artifacts such as holes and fissures well. Mainly used as preprocessing or for specific purposes [Schloegel et al., 2003].
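As an illustration of hyperplane bisection, here is a recursive coordinate bisection sketch. Choosing the axis of largest extent and splitting at the median is one common variant that we assume here; RIB would instead split orthogonally to the principal axis of the point set.

```python
def coordinate_bisection(points, ids, depth):
    """Recursive coordinate bisection (one simple CND variant): split at the
    median along the axis with the largest extent until 2**depth parts exist.
    Returns {point id: part index}."""
    if not ids:
        return {}
    if depth == 0:
        return {i: 0 for i in ids}
    dims = len(points[ids[0]])
    extent = [max(points[i][d] for i in ids) - min(points[i][d] for i in ids)
              for d in range(dims)]
    axis = extent.index(max(extent))
    ordered = sorted(ids, key=lambda i: points[i][axis])
    half = len(ordered) // 2
    left = coordinate_bisection(points, ordered[:half], depth - 1)
    right = coordinate_bisection(points, ordered[half:], depth - 1)
    offset = 2 ** (depth - 1)  # parts of the right half are numbered after the left ones
    result = dict(left)
    result.update({i: p + offset for i, p in right.items()})
    return result


# A 4 x 4 grid of points (id = 4*y + x), split into 4 parts: four 2 x 2 quadrants.
points = [(x, y) for y in range(4) for x in range(4)]
parts = coordinate_bisection(points, list(range(16)), depth=2)
print([parts[i] for i in range(16)])
# [0, 0, 2, 2, 0, 0, 2, 2, 1, 1, 3, 3, 1, 1, 3, 3]
```

Only coordinates are used; the graph edges are ignored entirely, which is why such methods are fast but cannot react to holes or fissures in the mesh.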

13 Metaheuristics: Introduction and Overview. Metaheuristics have been applied successfully to a variety of optimization problems. For graph partitioning: evolutionary/genetic algorithms, Population Reinforced Optimization Based Exploration, fusion and fission, simulated annealing, … Drawback: (mostly) very time-consuming. [http://brainz.org/15-real-world-applications-genetic-algorithms/] See e.g. C. Walshaw: Multilevel Refinement for Combinatorial Optimisation: Boosting Metaheuristic Performance. Hybrid Metaheuristics 2008: 261-289.

14 Kernighan-Lin, Fiduccia-Mattheyses, Helpful Sets

15 Local Search Heuristics: Overview. Kernighan-Lin (KL), Fiduccia-Mattheyses (FM), Helpful Sets (HS), and other variations. These local search methods improve an existing partition by vertex exchanges based on the gain, i.e. the edge-cut improvement achieved by the exchange (figure: example exchange with gain 3).
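The gain notion can be made concrete with a simplified FM-style pass for bisections. Assumptions in this sketch: unit vertex weights, a size bound `max_size` per side, and naive gain recomputation instead of FM's bucket data structure. KL would exchange vertex pairs rather than move single vertices.

```python
def move_gain(adj, part, v):
    """Cut reduction when v switches sides in a bisection: weight of v's edges
    to the other side minus weight of its edges to its own side."""
    external = sum(w for u, w in adj[v].items() if part[u] != part[v])
    internal = sum(w for u, w in adj[v].items() if part[u] == part[v])
    return external - internal


def fm_pass(adj, part, max_size):
    """One simplified Fiduccia-Mattheyses pass on a bisection (modifies `part`).

    Repeatedly moves the unlocked vertex of highest gain whose move keeps the
    target side at most `max_size` vertices, locks it, and finally rolls back
    to the best prefix of moves. Negative-gain moves are allowed, which lets
    the pass climb out of local minima. Gains are recomputed naively (O(n) per
    move); real FM uses bucket priority queues. Returns the cut reduction.
    """
    sizes = [sum(1 for p in part.values() if p == side) for side in (0, 1)]
    locked, moves = set(), []
    total = best = best_len = 0
    while True:
        candidates = [v for v in adj
                      if v not in locked and sizes[1 - part[v]] + 1 <= max_size]
        if not candidates:
            break
        v = max(candidates, key=lambda x: move_gain(adj, part, x))
        total += move_gain(adj, part, v)
        sizes[part[v]] -= 1
        part[v] = 1 - part[v]
        sizes[part[v]] += 1
        locked.add(v)
        moves.append(v)
        if total > best:
            best, best_len = total, len(moves)
    for v in moves[best_len:]:  # undo every move after the best prefix
        part[v] = 1 - part[v]
    return best


# Two triangles {0,1,2} and {3,4,5} joined by edge (2,3); 2 and 3 start swapped.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
part = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
print(fm_pass(adj, part, max_size=4), part)  # 4 {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

The first move in this example has gain 3 (vertex 2), and the pass reduces the cut from 5 to 1.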

16 Discussion of KL/FM/HS. Advantages: very fast (no coordinate data required); reasonably good quality within the multilevel process; very popular, used in the graph tools Metis, Jostle, Chaco, Scotch, KaPPa, Party and the hypergraph tools hMetis, PaToH, Mondriaan, MLPart, Parkway, Zoltan. Disadvantages: KL/FM/HS focus only on the edge cut; not easy to parallelize; no quality guarantees for KL/FM.

17 Global View for Local Methods: Matchings, Weighted Aggregation, n-level

18 Multilevel Strategy. Local methods need a reasonably good starting solution; the multilevel approach restricts the search space and avoids cutting "heavy" edges. General procedure: recursive coarsening, initial partitioning, interpolation and local improvement.
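One coarsening step of the multilevel procedure can be sketched as contracting a matching: matched pairs merge, vertex weights add up, and parallel edges merge with summed weights. The representation and names below are assumptions for illustration. Because every coarse cut edge stands for exactly the fine edges it merged, projecting a coarse partition back to the fine graph preserves its edge cut.

```python
def contract(n, edges, vertex_weight, matching):
    """One coarsening step: contract each matched pair into one coarse vertex.

    Returns (coarse_of, coarse_weight, coarse_edges): the fine-to-coarse map,
    summed vertex weights, and coarse edges {(cu, cv): weight} where parallel
    edges are merged (weights summed) and edges inside a pair disappear.
    """
    coarse_of = [-1] * n
    nc = 0
    for u, v in matching:
        coarse_of[u] = coarse_of[v] = nc
        nc += 1
    for v in range(n):
        if coarse_of[v] == -1:  # unmatched vertices are copied unchanged
            coarse_of[v] = nc
            nc += 1
    coarse_weight = [0] * nc
    for v in range(n):
        coarse_weight[coarse_of[v]] += vertex_weight[v]
    coarse_edges = {}
    for u, v, w in edges:
        cu, cv = coarse_of[u], coarse_of[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            coarse_edges[key] = coarse_edges.get(key, 0) + w
    return coarse_of, coarse_weight, coarse_edges


# Path 0-1-2-3 whose heavy edges (0,1) and (2,3) are matched and contracted.
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 5)]
coarse_of, cw, ce = contract(4, edges, [1, 1, 1, 1], [(0, 1), (2, 3)])
print(coarse_of, cw, ce)  # [0, 0, 1, 1] [2, 2] {(0, 1): 1}

# Interpolation: project a coarse partition back to the fine graph.
coarse_part = [0, 1]
print([coarse_part[c] for c in coarse_of])  # [0, 0, 1, 1]
```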

19 Matching Algorithms: Approximate Maximum Weighted Matching (MWM). Serial algorithms (among others): SHEM (no guarantee), Greedy (½-approximation), LAM (Preis, ½-approximation), PGA’ (Drake and Hougardy, ½-approximation), GPA and ROMA (Maue and Sanders; GPA is a ½-approximation). Comparison [Maue, Sanders, WEA 2007]: GPA yields better quality than Greedy and PGA’, and randomization techniques such as ROMA often improve quality further. Scalable parallelization requires parallel matching.
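For reference, the Greedy ½-approximation from the list above fits in a few lines. This is a plain sketch whose cost is dominated by sorting (O(m log m)), not one of the tuned implementations compared by Maue and Sanders.

```python
def greedy_matching(edges):
    """Greedy 1/2-approximation for maximum weight matching: scan the edges by
    non-increasing weight and keep each edge whose endpoints are both free."""
    matched, matching = set(), []
    for u, v, _ in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path with weights 2, 3, 2: greedy takes the middle edge (weight 3), while
# the optimum {(0, 1), (2, 3)} has weight 4; 3 >= 4 / 2 as guaranteed.
print(greedy_matching([(0, 1, 2), (1, 2, 3), (2, 3, 2)]))  # [(1, 2)]
```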

20 Guiding Matching Algorithms. Maximum total weight is not an exact model for coarsening; other aspects need to be considered as well (e.g. uniformity). Heuristic rationale: contract heavy edges to decrease the cut size, and contract light vertices for uniformity. Idea: use an edge rating function, based on local information, to guide the matching algorithm according to this rationale. Existing MWM algorithms can be reused, and the new edge ratings are more meaningful than the raw edge weights.
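The edge-rating idea only changes the key by which a matching algorithm ranks edges. The sketch below plugs a rating into a greedy matcher. As the example rating we use ω(e)²/(c(u)·c(v)) with vertex weights c, which we believe corresponds to the "expansion*²" rating in the KaPPa paper by Holtgrewe, Sanders, and Schulz; treat that attribution and the function names as assumptions.

```python
def expansion_star2(w, c_u, c_v):
    """Edge rating w^2 / (c(u) * c(v)): high for heavy edges between light vertices."""
    return w * w / (c_u * c_v)


def rated_greedy_matching(edges, vertex_weight, rating):
    """Greedy matching that ranks edges by rating(weight, c(u), c(v)) instead
    of by the raw edge weight; any MWM heuristic could be reused this way."""
    def score(e):
        u, v, w = e
        return rating(w, vertex_weight[u], vertex_weight[v])

    matched, matching = set(), []
    for u, v, _ in sorted(edges, key=score, reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# Path 0-1-2 where vertex 0 is already a heavy (coarse) vertex.
edges = [(0, 1, 3), (1, 2, 2)]
c = {0: 10, 1: 1, 2: 1}
print(rated_greedy_matching(edges, c, lambda w, cu, cv: w))  # [(0, 1)]  weight only
print(rated_greedy_matching(edges, c, expansion_star2))      # [(1, 2)]  rated
```

With the weight alone, the heavy vertex 0 would absorb vertex 1 and grow further; the rating prefers merging the two light vertices, which keeps coarse vertex weights more uniform.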

21 Edge Ratings for Matchings. KaPPa: scalable parallelization with MPI, without the quality penalty of other parallel KL/FM partitioners; FM local search in boundary areas (BFS). Four promising edge ratings for matching all yield significantly better partitions than the edge weight alone; one good choice among them is one of the key ingredients for better quality. Sequential MWM: using edge ratings with GPA yields better partitioning quality than SHEM and Greedy. M. Holtgrewe, P. Sanders, and C. Schulz, "Engineering a scalable high quality graph partitioner," in 24th Intl. Parallel and Distributed Processing Symposium (IPDPS 2010). IEEE, 2010, pp. 1–12.

22 Weighted Aggregation / AMG [Chevalier and Safro, 2009]. AMG: hierarchy-based preconditioner and solver for linear systems. Idea: AMG coarsening algorithms are also suitable for coarsening in multilevel graph partitioning (MGP). Weighted aggregation: choose a coarse vertex set (an independent set with strong coupling), then use an interpolation scheme to assign fractions of fine vertices to coarse ones. ≈15% edge-cut improvement compared to matching when used with simple FM. C. Chevalier, I. Safro: Comparison of Coarsening Schemes for Multilevel Graph Partitioning. Learning and Intelligent Optimization (LION) 2009: 191-205.
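A minimal sketch of weighted aggregation, assuming the usual AMG-style rule: a fine vertex i is split among its coarse neighbors J with fractions P[i, J] proportional to the connecting edge weights, the coarse graph is PᵀWP with self-loops removed, and coarse vertex weights are Pᵀc. The coarse set is given by hand here, and the exact interpolation used by Chevalier and Safro may differ.

```python
import numpy as np


def interpolation(W, coarse):
    """AMG-style interpolation matrix P (n x |C|), assumed form: a coarse vertex
    belongs entirely to itself; every other vertex i is split among its coarse
    neighbors J in proportion to the edge weights W[i, J]."""
    n = W.shape[0]
    P = np.zeros((n, len(coarse)))
    index = {v: j for j, v in enumerate(coarse)}
    for i in range(n):
        if i in index:
            P[i, index[i]] = 1.0
        else:
            strengths = W[i, coarse]
            P[i] = strengths / strengths.sum()  # assumes i has a coarse neighbor
    return P


def aggregate(W, vertex_weight, coarse):
    """Coarse graph via P^T W P (self-loops removed) and coarse vertex weights P^T c."""
    P = interpolation(W, coarse)
    W_coarse = P.T @ W @ P
    np.fill_diagonal(W_coarse, 0.0)
    return P, W_coarse, P.T @ vertex_weight


# Path 0-1-2 with edge weights 1 and 3; coarse set C = {0, 2}.
W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 3.0], [0.0, 3.0, 0.0]])
P, W_coarse, c_coarse = aggregate(W, np.ones(3), [0, 2])
print(P[1], W_coarse[0, 1], c_coarse)  # [0.25 0.75] 1.5 [1.25 1.75]
```

Unlike matching-based contraction, vertex 1 is not assigned to a single coarse vertex but split a quarter to 0 and three quarters to 2.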

23 n-level Graph Partitioning. Main idea: a deep but simple hierarchy with a simple local heuristic. Only one edge is contracted between consecutive levels, chosen by its edge rating value. KL-like local search; main difference: it is very localized, searching only around the uncontracted edge, and it is stopped based on a random walk model. V. Osipov and P. Sanders, "n-level graph partitioning," in Proc. 18th Annual European Symposium on Algorithms (ESA’10), 2010, pp. 278–289.

24 Multilevel Methods: Summary and Conclusions. The multilevel process is crucial for partitioning quality. Several new approaches: edge ratings to guide approximate MWM, AMG/weighted aggregation, and the n-level hierarchy. Substantial quality improvements are possible! Nice to have: a scalable, parallel, and publicly available implementation of all these features, so that users have the choice.

25 Identifying Dense Regions with Random Walks/Diffusion

26 Random Walks and Diffusion. Random walks: a stochastic process on graphs that starts at an arbitrary vertex and picks the next vertex among the neighbors with probability proportional to the edge weight; a walk that is in a dense graph region is likely to stay there. Diffusion: the tendency of a substance to distribute itself evenly in space; it is related to random walks, and its steady state is a balanced load on all vertices.

27 Load Balancing by Diffusion. Denote by $w_v^{(t)}$ the work load of node $v$ after iteration $t$. Discrete diffusion: $w^{(t+1)} = M w^{(t)}$ with $M = I - \alpha L$, where $M$ is doubly stochastic (random walk analogy!) and the Laplacian $L$ is positive semidefinite. The diffusion flow is optimal with respect to the $\ell_2$-norm. Lemma:
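The diffusion scheme on this slide can be simulated in a few lines. In this sketch we assume unit edge weights and α = 1/(max degree + 1), which makes M = I − αL symmetric, nonnegative, and doubly stochastic with a positive diagonal; on a connected graph the loads then converge to the balanced state.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def fos(L, load, alpha, steps):
    """First order diffusion scheme w(t+1) = M w(t) with M = I - alpha * L.
    The total load is conserved because the columns of M sum to 1."""
    M = np.eye(len(load)) - alpha * L
    w = np.asarray(load, dtype=float)
    for _ in range(steps):
        w = M @ w
    return w


# Path 0-1-2-3 (max degree 2 -> alpha = 1/3); all load starts on vertex 0.
L = laplacian(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
print(np.round(fos(L, [4, 0, 0, 0], alpha=1 / 3, steps=200), 6))  # [1. 1. 1. 1.]
```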

28 Shape-Optimizing Partitioning. Idea 1: compute good partition shapes with small surfaces! Idea 2: let a diffusive process decide which elements go where! Results: short partition boundaries, small partition diameters, few cut edges, connected partitions more often, and small migration costs in case of repartitioning, at a higher but reasonable running time. (Figure: Metis (KL) vs. shape-optimized partition.)

29 k-means and the Bubble Framework. Bubble framework: Lloyd’s k-means algorithm transferred to graphs. Basic idea for GP: [Walshaw et al., 1995], [Diekmann et al., 2000]. The graph distance (path length) does not distinguish dense and sparse regions.
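For intuition, the Bubble framework with plain path length as the distance reduces to a Lloyd-style loop on the graph. The sketch below (hypothetical names; a connected, unweighted graph is assumed) uses BFS distances and picks as the new center the vertex with the smallest distance sum within its part. This is exactly the distance the slide criticizes, which motivates the diffusion-based measure that follows.

```python
from collections import deque


def bfs_dist(adj, source):
    """Hop distances from source (unweighted BFS); assumes a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def graph_kmeans(adj, centers, rounds=10):
    """Lloyd-style iteration on a graph: assign every vertex to its nearest
    center, then move each center to the vertex of its part with the smallest
    distance sum to the other part members. Stops when centers are stable."""
    for _ in range(rounds):
        dists = [bfs_dist(adj, c) for c in centers]
        part = {v: min(range(len(centers)), key=lambda i: dists[i][v]) for v in adj}
        new_centers = []
        for i in range(len(centers)):
            members = [v for v in adj if part[v] == i]
            new_centers.append(
                min(members, key=lambda m: sum(bfs_dist(adj, m)[u] for u in members)))
        if new_centers == centers:
            break
        centers = new_centers
    return part, centers


# Path 0-1-...-5 with initial centers at both ends.
adj = {v: [u for u in (v - 1, v + 1) if 0 <= u < 6] for v in range(6)}
print(graph_kmeans(adj, [0, 5]))  # ({0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}, [1, 4])
```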

30 Good Shapes with Disturbed Diffusion. Requirement for the similarity measure: it should reflect how well two vertices/regions are connected → use diffusion! [Schamberger, IPDPS Workshops 2004]. Diffusion load spreads faster into densely connected regions. Disturbed diffusion avoids the balanced state: use a set of privileged source vertices.

31 Disturbed Diffusion FOS/C. FOS/C: First Order Scheme with Constant drain → the source set $S$ determines the structure of the drain vector $d$. Lemma: under suitable conditions the diffusive iteration converges, and its solution can be computed by solving a linear system (FOS/C procedure).
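A small numerical check of the lemma, under assumptions: the drain vector gives every vertex −δ and the sources an extra δ·n/|S| (so its entries sum to zero), and the iteration is w(t+1) = M w(t) + d with M = I − αL. Its fixed point then satisfies αLw = d, which the sketch solves by least squares because L is singular. The exact constants used in Bubble-FOS/C may differ.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def drain_vector(n, sources, delta=1.0):
    """Assumed drain: every vertex loses delta per step and the source vertices
    share the inflow delta * n, so the entries sum to zero."""
    d = np.full(n, -delta)
    d[list(sources)] += delta * n / len(sources)
    return d


def fosc_iterate(L, d, alpha, steps, w0):
    """FOS/C iteration w(t+1) = M w(t) + d with M = I - alpha * L."""
    M = np.eye(len(d)) - alpha * L
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = M @ w + d
    return w


def fosc_steady_state(L, d, alpha, total):
    """Fixed point: w = M w + d  <=>  alpha * L w = d. L is singular (constants
    lie in its kernel), so take the minimum-norm least-squares solution and
    shift it to the conserved total load."""
    w, *_ = np.linalg.lstsq(alpha * L, d, rcond=None)
    return w + (total - w.sum()) / len(w)


L = laplacian(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
d = drain_vector(4, sources=[0])
w_iter = fosc_iterate(L, d, alpha=1 / 3, steps=500, w0=[1, 1, 1, 1])
w_solve = fosc_steady_state(L, d, alpha=1 / 3, total=4.0)
print(np.allclose(w_iter, w_solve), int(np.argmax(w_solve)))  # True 0
```

The steady state does not become balanced: the load stays highest at the source and decreases with distance from it, which is what makes it usable as a similarity measure.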

32 Bubble Operation AssignPartition. Input: centers. Output: partition. For each part $c$: solve the FOS/C procedure with center $z_c$ as source vertex and disturbance by the drain vector, i.e. one linear system for the load vector $w_c$. Assignment of a vertex $v$ to a part: $\pi(v) = \arg\max_c [w_c]_v$. Two balancing procedures also use the diffusion values. Independent operations → parallelism. H. Meyerhenke, B. Monien, S. Schamberger: Graph Partitioning and Disturbed Diffusion. Parallel Computing, 35(10-11):544-569, 2009.
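AssignPartition can be sketched with one FOS/C solve per center. Assumptions: a hypothetical drain vector (−1 everywhere, +n/|S| on the sources), α folded into the right-hand side, each load vector normalized to sum 0 (the minimum-norm solution), and ties broken by the lowest part index. The balancing procedures are omitted.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def fosc_load(L, sources):
    """FOS/C steady state for a source set (assumed drain: -1 everywhere,
    +n/|S| on the sources); min-norm solution, so the loads sum to 0."""
    n = L.shape[0]
    d = np.full(n, -1.0)
    d[list(sources)] += n / len(sources)
    w, *_ = np.linalg.lstsq(L, d, rcond=None)
    return w


def assign_partition(L, centers):
    """Bubble step AssignPartition (sketch): one FOS/C system per center, then
    each vertex joins the part whose load is highest at that vertex."""
    loads = np.column_stack([fosc_load(L, [c]) for c in centers])
    return loads.argmax(axis=1)


# Path with 6 vertices and centers at both ends.
L = laplacian(6, [(i, i + 1, 1) for i in range(5)])
print(assign_partition(L, [0, 5]).tolist())  # [0, 0, 0, 1, 1, 1]
```

The systems for different centers are independent, so the loop over centers is where the parallelism noted on the slide would come from.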

33 Bubble Operation ComputeCenters. Input: partition. Output: center set C. For each part $c$: solve the FOS/C procedure with the vertices of the part as source set and disturbance by the drain vector, i.e. one linear system for the load vector $w_c$. New center of part $c$: $z_c = \arg\max_{v \in V_c} [w_c]_v$. Independent for each part → parallelism.
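ComputeCenters differs from AssignPartition only in the source set: the whole part acts as the source, and the part vertex with the highest load becomes the new center. A sketch under an assumed drain vector (−1 everywhere, +n/|S| on the sources); the helper names are hypothetical.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def fosc_load(L, sources):
    """FOS/C steady state for a source set (assumed drain: -1 everywhere,
    +n/|S| on the sources); min-norm least-squares solution of L w = d."""
    n = L.shape[0]
    d = np.full(n, -1.0)
    d[list(sources)] += n / len(sources)
    w, *_ = np.linalg.lstsq(L, d, rcond=None)
    return w


def compute_centers(L, part, k):
    """Bubble step ComputeCenters (sketch): solve FOS/C with the whole part as
    source set; the new center is the part vertex with the highest load."""
    centers = []
    for p in range(k):
        members = np.flatnonzero(part == p)
        w = fosc_load(L, members)
        centers.append(int(members[np.argmax(w[members])]))
    return centers


# Cycle with 6 vertices split into two arcs of three vertices each.
L = laplacian(6, [(i, (i + 1) % 6, 1) for i in range(6)])
print(compute_centers(L, np.array([0, 0, 0, 1, 1, 1]), k=2))  # [1, 4]
```

On each arc, the middle vertex collects the most load, so the centers move to the "deepest" vertex of each part.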

34 Optimization Criterion. Quadratic optimization problem for the minimum balanced cut (here for bisection): minimize $\frac{1}{4} x^\top L x$ subject to $x \in \{-1, 1\}^n$ and $\sum_i x_i = 0$. Spectral methods: relax the integrality constraint and solve an eigenvector problem. Theorem: under mild conditions, AssignPartition followed by ScaleBalance together compute the global minimum of a similar relaxed optimization problem. H. Meyerhenke: Beyond good shapes: Diffusion-based graph partitioning is relaxed cut optimization. In Proc. 21st International Symposium on Algorithms and Computation (ISAAC). Springer, 2010. Invited to the special ISAAC 2010 issue of Algorithmica.
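The link between the quadratic form and the cut is easy to check numerically: for x ∈ {−1, 1}ⁿ, xᵀLx = Σ over edges of ω_uv (x_u − x_v)², and each cut edge contributes 4·ω_uv, so xᵀLx = 4·cut. The sketch also evaluates the spectral lower bound λ₂·n/4, which follows from relaxing x to real vectors orthogonal to the all-ones vector with ‖x‖² = n; the example graph is our own choice.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def clique(vs):
    """Unit-weight edges of a complete graph on the vertex list vs."""
    return [(u, v, 1) for i, u in enumerate(vs) for v in vs[i + 1:]]


# Two 4-cliques joined by the edge (3, 4).
edges = clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4, 1)]
L = laplacian(8, edges)
x = np.array([1.0] * 4 + [-1.0] * 4)  # +1 on the first clique, -1 on the second
direct = sum(w for u, v, w in edges if x[u] != x[v])
print(x @ L @ x / 4, direct)  # 1.0 1
lam2 = np.linalg.eigvalsh(L)[1]  # second-smallest eigenvalue
print(lam2 * len(x) / 4 <= direct)  # True: the relaxed bound never exceeds a feasible cut
```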

35 The Bubble-FOS/C Heuristic: Discussion. Advantages: mathematical analysis (proven convergence, relaxed edge-cut optimization); good experimental results on FEM graphs. Disadvantage: high running time, due to solving linear systems → use a simpler diffusive mechanism that retains the good properties but accelerates the process.

36 Random Walks for Clustering. A "suitable" random walk length is important! Distance or similarity measures based on random walks: Euclidean Commute Time Distance (ECTD) [Fouss et al., IEEE Trans. KDE 2007], algebraic distances [Chen and Safro, 2010], diffusion distances [Lafon and Lee, PAMI 2006], … Other clustering methods with similar ideas: Markov Clustering [van Dongen, 2000], clustering spatial data using random walks [Harel and Koren, KDD 2001], isoperimetric graph partitioning [Grady and Schwartz, SIAM J. SC, 2006], …
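As one example of a random-walk-based distance, the expected commute time between u and v equals vol(G)·(L⁺_uu + L⁺_vv − 2L⁺_uv), where L⁺ is the pseudoinverse of the Laplacian and vol(G) the sum of weighted degrees; ECTD is its square root. The dense `pinv` in this sketch is only suitable for small graphs.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def clique(vs):
    """Unit-weight edges of a complete graph on the vertex list vs."""
    return [(u, v, 1) for i, u in enumerate(vs) for v in vs[i + 1:]]


def commute_time(L, u, v):
    """Expected commute time of a random walk between u and v."""
    Lp = np.linalg.pinv(L)
    vol = np.trace(L)  # the diagonal of L holds the weighted degrees
    return vol * (Lp[u, u] + Lp[v, v] - 2 * Lp[u, v])


# Path 0-1-2: effective resistance 2 between the ends, vol(G) = 4 -> commute time 8.
print(round(commute_time(laplacian(3, [(0, 1, 1), (1, 2, 1)]), 0, 2), 6))  # 8.0

# Two 4-cliques joined by (3, 4): vertices in the same dense region are much closer.
L = laplacian(8, clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4, 1)])
print(np.sqrt(commute_time(L, 0, 1)) < np.sqrt(commute_time(L, 0, 5)))  # True
```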

37 Faster Local Diffusive Approach. TruncCons: a k-way extension and variant of [Pellegrini, Euro-Par 2007]. Consolidation: the same initial load for all nodes of the current subdomain, 0 for all others. Stop (truncate) FOS after very few iterations. Rationale: local improvement; load flows from the subdomain borders into the graph, so the computational work can be restricted to the area near the part boundaries. Repeat the process with the newly computed partition. H. Meyerhenke, B. Monien, T. Sauerwald: A New Diffusion-based Multilevel Algorithm for Computing Graph Partitions of Very High Quality. In Proc. 22nd IEEE Internatl. Parallel and Distributed Processing Symposium (IPDPS'08). Winner of the Best Algorithms Paper Award.
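A dense sketch of one TruncCons round, assuming unit edge weights, α = 1/(max degree + 1), each part's initial load normalized to a total of 1, and assignment of every vertex to the part with the largest load. Restricting the work to boundary areas, and repeating the rounds, are left out.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, v] -= w
        L[v, u] -= w
        L[u, u] += w
        L[v, v] += w
    return L


def clique(vs):
    """Unit-weight edges of a complete graph on the vertex list vs."""
    return [(u, v, 1) for i, u in enumerate(vs) for v in vs[i + 1:]]


def trunc_cons(L, part, k, alpha, steps):
    """TruncCons-style consolidation (sketch): for each part p, start a FOS
    diffusion with equal load on the vertices of p (total 1, an assumption)
    and 0 elsewhere, stop after `steps` iterations, then give every vertex
    to the part whose load is largest there."""
    n = L.shape[0]
    M = np.eye(n) - alpha * L
    loads = np.zeros((n, k))
    for p in range(k):
        members = part == p
        loads[members, p] = 1.0 / members.sum()
    for _ in range(steps):
        loads = M @ loads  # all k truncated diffusion processes at once
    return loads.argmax(axis=1)


# Two 4-cliques joined by (3, 4); vertex 3 starts in the wrong part.
L = laplacian(8, clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4, 1)])
start = np.array([0, 0, 0, 1, 1, 1, 1, 1])
print(trunc_cons(L, start, k=2, alpha=0.2, steps=2).tolist())  # [0, 0, 0, 0, 1, 1, 1, 1]
```

After only two diffusion steps, vertex 3 receives more load from its own clique than from the other part and moves back, illustrating the local improvement.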

38 DibaP: Diffusion-based Partitioning, a Hybrid Multilevel Algorithm. DibaP combines multilevel, Bubble-FOS/C, and TruncCons. 1) + 2) Coarsening: approximate maximum weighted matching, algebraic multigrid (AMG). 3) Initial partitioning: Bubble-FOS/C. 4) + 5) Local improvement: Bubble-FOS/C on small hierarchy levels, TruncCons on large hierarchy levels.

39 DibaP Experimental Results (1): Partitioning. Walshaw’s archive records the best partitions for 34 benchmark graphs (24 entries per graph): DibaP held more than 80 records at the time, 16 of which remain. (Figure: average results for 8 benchmark graphs, sum norm.) Maximum norm: even slightly higher improvement with DibaP.

40 DibaP Experimental Results (2): Repartitioning [M., M., SIAM CSE 2011]. Repartitioning of 2D synthetic dynamic graph sequences with the MPI-parallel implementation pDibaP: running time about 35 times that of ParMetis, at about 15-30% higher quality.

41 Diffusive Graph Partitioning: Summary and Outlook. Good solution quality; high, but acceptable running time; theoretical foundation; especially suitable for repartitioning. Further acceleration is desirable: combination with other techniques, faster solvers, and faster implementations tailored to parallel hardware. Adaptation to a different scenario: clustering of P2P networks [Gehweiler and Meyerhenke, HPGC’10].


43 What could not be covered… Hypergraph partitioning: it models communication better in many cases, but has so far mostly been used with KL/FM. Can the new techniques developed for graph partitioning be applied to hypergraphs as well? Flow-based algorithms: MQI [Lang and Rao, IPCO 2004] and KaFFPa [Sanders and Schulz, TR 2011] exploit max-flow min-cut; related implementations: [Lang, Mahoney, Orecchia, SEA 2009]. Theoretical work on local partitioning with random walks [Andersen, Chung, Lang, FOCS 2006], [Andersen, Peres, STOC 2009]. Resource awareness [Walshaw, Cross, FGCS 2001], [Moulitsas and Karypis, ICA3PP 2008]. Semidefinite programming and other optimization techniques. Practical methods for other applications (road networks: [Delling et al., TR 2010]).

44 Future Directions in GP. Massive (and hierarchical) parallelism; heterogeneity of architectures and workloads; architecture mapping; transfer of new techniques to hypergraphs; dynamic graphs; better theoretical understanding of new techniques; social networks (power-law degree distribution, dynamics). [http://www.nvidia.com] [prblog.typepad.com]

45 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering. Goal: capture the state of the art in graph partitioning and graph clustering. Participation: provide data, submit solvers. The development phase has started. More info: http://www.cc.gatech.edu/dimacs10/

46 Thank you! Acknowledgments: This work was partially supported by German Research Foundation (DFG) Priority Programme 1307 Algorithm Engineering, by DARPA Ubiquitous High Performance Computing (UHPC), and by the CASS-MT Center of Pacific Northwest National Laboratory (PNNL).

