Zoltan: Toolkit of parallel combinatorial algorithms for unstructured, dynamic and/or adaptive computations


1. Zoltan: Toolkit of parallel combinatorial algorithms for unstructured, dynamic and/or adaptive computations

Unstructured Communication Tools
- Communication “plans” describe complex interprocessor patterns
- Data migration
- Mapping between different partitions

Dynamic Load Balancing:
- Geometric (coordinate-based): fast, maintains geometric locality
- Topological (graph/hypergraph-based): explicitly model application communication costs
- Interfaces to ParMETIS, Scotch, PaToH

Distributed Data Directories
- Parallel look-ups of off-processor data
- Scalable (O(N)) total memory usage
[Figure: data directory example]

Parallel Graph Coloring
- Finds disjoint sets of vertices, identifying independence
- Distance 1
- Distance 2
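To make the "Distance 1" coloring item concrete, here is a minimal serial sketch of greedy distance-1 coloring. It is not Zoltan's parallel algorithm or API; the function names and the 6-vertex graph are hypothetical and chosen only for illustration. It shows the property the slide relies on: vertices that share a color have no edge between them, so each color class is an independent set.

```python
def greedy_distance1_coloring(adjacency):
    """Give each vertex the smallest color not already used by a neighbor.

    `adjacency` maps a vertex to the list of its neighbors (undirected graph).
    This is a plain serial greedy heuristic, assumed here for illustration.
    """
    colors = {}
    for v in sorted(adjacency):
        # Colors already taken by neighbors that were visited earlier.
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def color_classes(colors):
    """Group vertices by color; each group is an independent set."""
    classes = {}
    for v, c in colors.items():
        classes.setdefault(c, []).append(v)
    return classes


# Hypothetical 6-vertex undirected graph (not taken from the slides).
graph = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2, 5],
    4: [2, 6],
    5: [3, 6],
    6: [4, 5],
}

coloring = greedy_distance1_coloring(graph)
classes = color_classes(coloring)
```

All vertices in one entry of `classes` can be processed concurrently without touching a neighbor. A distance-2 coloring would additionally forbid sharing a color with a neighbor's neighbor.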

2. Zoltan2: Next generation toolkit targeting needs of applications on emerging architectures

Multijagged (MJ) Geometric (Coordinate) Partitioning
- MPI+OpenMP implementation
- Multisection results in less data movement, greater scalability during partitioning than RCB
- Ex: Used in Trilinos’ MueLu multigrid solver on 524K cores
[Figure: 16-part 4x4 MJ partition]

On-node Balanced Graph Coloring
- Finds disjoint sets of vertices for parallelism in multicore execution
- Each label has roughly equal number of vertices
- Balanced coloring reduces idle cores in GPUs

Architecture-aware Geometric Task Mapping
- Map MPI tasks to cores to keep congestion and communication costs low
- Uses MJ to assign interdependent tasks to “nearby” cores
- Reduced MiniGhost execution time on 64K cores of Cielo by 34% relative to default; by 24% relative to custom 2x2x4 grouping
[Figure: Tasks (1-6) mapped to Allocated Nodes in Torus Network (0-8)]
[Figure: 6x6 matrix of X marks, rows and columns labeled 1-6]
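The multisection idea behind the 4x4 MJ partition mentioned above can be sketched in a few lines. This is a simplified serial illustration under stated assumptions, not Zoltan2's MPI+OpenMP implementation: the helper names, the random 2D points, and the equal-count cutting rule are all hypothetical. The points are cut into 4 slabs along x, then each slab is cut into 4 parts along y, giving 16 parts in 2 levels. Recursive bisection would need 4 levels to reach 16 parts.

```python
import random


def split_equal(points, axis, parts):
    """Sort points along `axis` and cut them into `parts` contiguous chunks
    whose sizes differ by at most one."""
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    bounds = [(i * n) // parts for i in range(parts + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(parts)]


def multijagged_2d(points, parts_x, parts_y):
    """Two-level multisection: parts_x slabs along x, then parts_y cuts
    along y inside each slab. Cut positions along y differ from slab to
    slab, which gives the partition its jagged look."""
    result = []
    for slab in split_equal(points, 0, parts_x):
        result.extend(split_equal(slab, 1, parts_y))
    return result


# Hypothetical input: 1000 random points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]

parts = multijagged_2d(points, 4, 4)
sizes = [len(p) for p in parts]
```

Each point is sorted and assigned twice here, once per level, which hints at why fewer, wider levels can mean less data movement than many levels of bisection.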

3. Further information: http://www.cs.sandia.gov/Zoltan
Download via the Trilinos toolkit: http://trilinos.org

Comparison of Zoltan and Zoltan2

Parallelism:
- Zoltan: MPI-only
- Zoltan2: MPI+X

API:
- Zoltan: Application builds model (e.g., graph, hypergraph) for algorithm
- Zoltan2: Application describes its data (matrix, mesh, coordinates); algorithm builds model

Capabilities:
- Zoltan: Parallel partitioning; Parallel coloring; Global and local ordering
- Zoltan2: Parallel partitioning; Architecture-aware task placement; On-node balanced coloring; On-node ordering

Optional TPLs:
- Zoltan: Scotch (INRIA/Bordeaux); ParMETIS (U. Minnesota); PaToH (Ohio St. U.)
- Zoltan2: Scotch (INRIA/U. Bordeaux); ParMETIS (U. Minnesota); ParMA Partition Improvement (RPI); AMD (U. Florida); LDMS: Lightweight Distributed Metric Service (Sandia)

Maturity:
- Zoltan: Highly mature; maintenance only
- Zoltan2: Research platform for emerging archs

Integration:
- Zoltan: No dependence on Trilinos
- Zoltan2: Integrated with Trilinos next-generation software stack

Language:
- Zoltan: C (with F90 and C++ APIs)
- Zoltan2: Templated C++11

Distribution:
- Zoltan: Stand-alone or in Trilinos
- Zoltan2: In Trilinos

