
Slide 1: Reducing Complexity in Algebraic Multigrid (Copper 2004)
Hans De Sterck, Department of Applied Mathematics, University of Colorado at Boulder; Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

Slide 2: Outline
- introduction: AMG
- complexity growth when using classical coarsenings
- Parallel Modified Independent Set (PMIS) coarsening
- scaling results
- conclusions and future work

Slide 3: Introduction
- solve large sparse linear systems Au = f arising from 3D PDEs
- large problems (10^9 dof): parallel
- unstructured grid problems

Slide 4: Algebraic Multigrid (AMG)
- multi-level iterative method
- algebraic: suitable for unstructured grids

Slide 5: AMG building blocks
Setup phase:
- select coarse "grids"
- define interpolation
- define restriction and coarse-grid operators
Solve phase: cycle through the levels using these operators (see the sketch below).
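The slide lists the setup components without code; as a concrete illustration, here is a minimal two-grid sketch in Python/SciPy, assuming the interpolation matrix P has already been built as a SciPy sparse matrix. The function names, the weighted-Jacobi smoother, and the direct coarse solve are illustrative choices, not taken from the slides.

    # Minimal two-grid sketch: the setup phase builds R and the Galerkin
    # coarse operator from P; the solve phase applies one two-grid cycle.
    import scipy.sparse.linalg as spla

    def setup_two_grid(A, P):
        """Setup phase: restriction and coarse-grid operator from interpolation."""
        R = P.T.tocsr()              # restriction as the transpose of interpolation
        A_c = (R @ A @ P).tocsc()    # Galerkin coarse-grid operator R*A*P
        return R, A_c

    def two_grid_cycle(A, R, P, A_c, b, x, nu=2, omega=2.0 / 3.0):
        """Solve phase: one two-grid cycle with weighted-Jacobi smoothing."""
        Dinv = 1.0 / A.diagonal()
        for _ in range(nu):                       # pre-smoothing
            x = x + omega * Dinv * (b - A @ x)
        r_c = R @ (b - A @ x)                     # restrict the residual
        e_c = spla.spsolve(A_c, r_c)              # solve the coarse-grid problem
        x = x + P @ e_c                           # interpolate the correction back
        for _ in range(nu):                       # post-smoothing
            x = x + omega * Dinv * (b - A @ x)
        return x

In a full AMG V-cycle the coarse solve would itself be replaced by a recursive call on the next level.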

Slide 6: AMG complexity and scalability
Operator complexity: C_op = Σ_l nnz(A_l) / nnz(A_0), summed over all levels l.
E.g., in 3D with standard coarsening: C_op = 1 + 1/8 + 1/64 + ... < 8/7.
C_op is a measure of memory use, and of work in the solve phase.
Scalable algorithm: O(n) operations per V-cycle (C_op bounded) AND number of V-cycles independent of n (convergence factor ρ_AMG independent of n).
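For reference, a small Python helper that evaluates this definition, assuming the hierarchy is available as a list of SciPy sparse matrices, finest level first (the list name is an assumption for illustration):

    def operator_complexity(levels):
        """C_op = sum over all levels l of nnz(A_l) / nnz(A_0)."""
        return sum(A.nnz for A in levels) / levels[0].nnz

    # Sanity check of the geometric bound quoted for ideal 3D coarsening,
    # where each level keeps roughly 1/8 of the nonzeros of the previous one:
    geometric_cop = sum((1.0 / 8.0) ** k for k in range(20))
    assert geometric_cop < 8.0 / 7.0   # 1 + 1/8 + 1/64 + ... < 8/7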

Slide 7: AMG interpolation
- after relaxation the error is algebraically smooth: Ae ≈ 0 (relative to e)
- heuristic: the error after interpolation should also satisfy this relation approximately
- derive interpolation from the i-th equation: a_ii e_i + Σ_{j≠i} a_ij e_j ≈ 0
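One standard way to turn this relation into interpolation weights is direct interpolation; the formula below is a common textbook choice (with N_i the neighbors of i and C_i ⊂ N_i its coarse interpolatory points) and is not necessarily the exact formula shown on the original slide:

    \[
      a_{ii} e_i \approx -\sum_{j \in N_i} a_{ij} e_j
      \quad\Longrightarrow\quad
      e_i \approx \sum_{j \in C_i} w_{ij} e_j ,
      \qquad
      w_{ij} = -\frac{a_{ij}}{a_{ii}}
               \cdot \frac{\sum_{k \in N_i} a_{ik}}{\sum_{k \in C_i} a_{ik}} .
    \]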

Slide 8: AMG interpolation (strong connections)
- "large" a_ij should be taken into account accurately
- "strong connections": i strongly depends on j (and j strongly influences i) if -a_ij ≥ θ max_{k≠i} (-a_ik), with strong threshold θ
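A sketch of this strength test in Python/SciPy, written as a dense-row loop for clarity (this is not the hypre implementation; the function name and the default θ = 0.25 are illustrative):

    import numpy as np
    import scipy.sparse as sp

    def strong_connections(A, theta=0.25):
        """Boolean CSR matrix S with S[i, j] = True iff i strongly depends on j."""
        A = A.tocsr()
        S = sp.lil_matrix(A.shape, dtype=bool)
        for i in range(A.shape[0]):
            row = A.getrow(i).toarray().ravel()
            off = -row.astype(float)           # strength is based on -a_ij
            off[i] = -np.inf                   # exclude the diagonal
            max_off = off.max()
            if max_off <= 0.0:                 # no negative off-diagonal couplings
                continue
            for j in np.where(off >= theta * max_off)[0]:
                if j != i:
                    S[i, j] = True
        return S.tocsr()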

Slide 9: AMG coarsening
- (C1) Maximal Independent Set: independent: no two C-points are connected; maximal: if one more C-point is added, the independence is lost
- (C2) all F-F connections require connections to a common C-point (for good interpolation)
- to ensure (C2), some F-points have to be changed into C-points, so (C1) is violated: more C-points, higher complexity
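For completeness, an illustrative checker for condition (C2), not taken from the slides: it assumes the boolean strength matrix S from the sketch above and a NumPy vector cf with +1 for C-points and -1 for F-points.

    import numpy as np

    def strong_FF_without_common_C(S, cf):
        """Return strong F-F pairs (i, j) that share no strong C-point."""
        S = S.tocsr()
        bad = []
        for i in np.where(cf < 0)[0]:                        # F-points only
            Si = set(S.indices[S.indptr[i]:S.indptr[i + 1]])
            Ci = {k for k in Si if cf[k] > 0}                # strong C-neighbors of i
            for j in Si:
                if cf[j] < 0:                                # strong F-F connection
                    Sj = set(S.indices[S.indptr[j]:S.indptr[j + 1]])
                    Cj = {k for k in Sj if cf[k] > 0}
                    if not (Ci & Cj):
                        bad.append((i, j))
        return bad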

Slide 10: Classical coarsenings
1. Ruge-Stueben (RS)
   - two passes: the 2nd pass ensures that F-F pairs have a common C-point
   - disadvantage: highly sequential
2. CLJP
   - based on parallel independent set algorithms developed by Luby and later by Jones & Plassmann
   - also ensures that F-F pairs have a common C-point
3. Hybrid RS-CLJP ("Falgout")
   - RS in processor interiors, CLJP at interprocessor boundaries

Slide 11: Classical coarsenings: complexity growth
Example: hybrid RS-CLJP (Falgout), 7-point finite difference Laplacian in 3D, θ = 0.25

proc   dof     C_op   max avg nonzeros per row   t_setup   t_solve   iter   ρ_AMG
1      32^3    3.76   161 (5/8)                  2.91      1.81      5      0.035
8      64^3    4.69   463 (6/11)                 8.32      4.37      8      0.098
64     128^3   5.72   1223 (7/16)                36.28     8.59      10     0.172

Increased memory use, long solution times, long setup times: loss of scalability.

Slide 12: Our approach to reducing complexity
- do not add C-points for strong F-F connections that do not have a common C-point
- fewer C-points, reduced complexity, but worse convergence factors expected
- can something be gained?

Slide 13: PMIS coarsening (De Sterck, Yang)
- Parallel Modified Independent Set (PMIS)
- do not enforce condition (C2)
- weighted independent set algorithm: points i that strongly influence many equations (large measure) are good candidates for C-points
- add a random number between 0 and 1 to each measure to break ties

Slide 14: PMIS coarsening (continued)
- pick C-points with maximal measures (as in CLJP), then make all their neighbors fine (as in RS)
- proceed until all points are either coarse or fine (a serial sketch of this loop follows below)
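A serial sketch of this loop in Python/SciPy; the parallel implementation in hypre/BoomerAMG differs, and the measure, symmetrization, and tie handling below are illustrative assumptions.

    import numpy as np

    def pmis_like_coarsening(S, seed=0):
        """S: boolean CSR strength matrix (S[i, j] means i strongly depends on j).
        Returns cf with +1 for C-points and -1 for F-points."""
        rng = np.random.default_rng(seed)
        n = S.shape[0]
        Sint = S.astype(np.int8)
        Ssym = (Sint + Sint.T).astype(bool).tocsr()    # symmetrized strong neighborhoods
        # measure: how many points i strongly influences, plus a random tie-breaker
        w = np.asarray(Sint.sum(axis=0)).ravel().astype(float) + rng.random(n)
        cf = np.zeros(n, dtype=int)                    # 0 = undecided
        undecided = set(range(n))
        while undecided:
            # points whose measure is maximal among their undecided strong neighbors
            # (isolated points become C-points here; real codes treat them specially)
            new_C = [i for i in undecided
                     if all(cf[j] != 0 or w[i] > w[j]
                            for j in Ssym.indices[Ssym.indptr[i]:Ssym.indptr[i + 1]]
                            if j != i)]
            if not new_C:                              # only exact ties left
                for i in undecided:
                    cf[i] = -1
                break
            for i in new_C:
                cf[i] = 1                              # new C-point
                undecided.discard(i)
                for j in Ssym.indices[Ssym.indptr[i]:Ssym.indptr[i + 1]]:
                    if cf[j] == 0:
                        cf[j] = -1                     # strong neighbors become F-points
                        undecided.discard(j)
        return cf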

Slide 15: PMIS: select 1
- select C-points with maximal measure locally
- make their neighbors F-points
- remove neighbor edges
[figure: grid of point measures (integer measure plus random tie-breaker)]

Slide 16: PMIS: remove and update 1
- select C-points with maximal measure locally
- make their neighbors F-points
- remove neighbor edges
[figure: remaining grid of measures after the first selection]

Slide 17: PMIS: select 2
- select C-points with maximal measure locally
- make their neighbors F-points
- remove neighbor edges
[figure: remaining grid of measures with the next locally maximal points selected]

Slide 18: PMIS: remove and update 2
- select C-points with maximal measure locally
- make their neighbors F-points
- remove neighbor edges
[figure: remaining grid of measures after the second selection]

Slide 19: PMIS: final grid
- select C-points with maximal measure locally
- make their neighbors F-points
- remove neighbor edges

Slide 20: Preliminary results: 7-point 3D Laplacian on an unstructured grid (n = 76,527), serial, θ = 0.5, GS relaxation, with GMRES(10) acceleration

                       stand-alone AMG       AMG-GMRES(10)
Coarsening   C_op      # its    time         # its    time
RS           4.43      12       5.74         8        5.37
CLJP         3.83      12       5.39         8        5.15
PMIS         1.56      58       8.59         17       4.53

Implementation in CASC/LLNL's hypre/BoomerAMG library (Falgout, Yang, Henson, Jones, ...).

Slide 21: PMIS results: 27-point finite element Laplacian in 3D, 40^3 dof per proc (IBM Blue)

Falgout (θ = 0.5):
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      1.62   124      12       8.39      6.48      7      14.87
8      2.19   446      16       16.65     11.20     8      27.85
64     2.43   697      17       37.95     16.89     9      54.84
256    2.51   804      19       48.36     18.42     9      66.78
384    2.55   857      19       57.70     21.66     10     79.36

PMIS-GMRES(10) (θ = 0.25):
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      1.11   36       6        4.60      8.42      10     13.02
8      1.11   42       7        5.81      13.27     13     19.08
64     1.11   48       8        6.84      16.83     15     23.67
256    1.12   51       9        7.23      18.50     16     25.73
384    1.12   52       9        7.56      19.62     17     27.18

Slide 22: PMIS results: 7-point finite difference Laplacian in 3D, 40^3 dof per proc (IBM Blue)

Falgout (θ = 0.5):
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      3.62   104      11       4.91      4.02      6      8.99
8      4.45   292      14       10.52     6.95      7      17.47
64     5.07   450      16       21.82     11.89     9      33.71
256    5.28   521      19       26.72     13.18     9      39.90
384    5.35   549      19       30.41     14.13     10     44.54

PMIS-GMRES(10) (θ = 0.25):
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      2.32   56       7        2.28      8.05      13     10.36
8      2.37   65       8        3.78      13.30     17     17.08
64     2.39   71       9        5.19      19.44     21     24.63
256    2.40   73       10       5.59      20.47     22     26.06
384    2.41   74       10       5.97      22.56     24     28.53

Slide 23: Conclusions
- PMIS leads to reduced, scalable complexities for large problems on parallel computers
- using PMIS-GMRES, large problems can be solved efficiently, with good scalability and without requiring much memory (Blue Gene/L)

Slide 24: Future work
- parallel aggressive coarsening and multi-pass interpolation to further reduce complexity (using one-pass RS, PMIS, or hybrid)
- improved interpolation formulas for more aggressively coarsened grids (Jacobi improvement, ...), to reduce the need for GMRES
- parallel First-Order System Least-Squares AMG code for large-scale PDE problems
- Blue Gene/L applications

