Capabilities: Serial (thread-safe), shared-memory (SuperLU_MT, OpenMP or Pthreads), distributed-memory (SuperLU_DIST, hybrid MPI + OpenMP + CUDA).

1. SuperLU: supernodal sparse LU direct solver

- Capabilities:
  - Serial (thread-safe), shared-memory (SuperLU_MT, OpenMP or Pthreads), distributed-memory (SuperLU_DIST, hybrid MPI + OpenMP + CUDA). All implemented in C, with a Fortran interface.
  - Sparse LU decomposition, triangular solution with multiple right-hand sides.
  - Incomplete LU (ILU) preconditioner in serial SuperLU.
  - Sparsity-preserving ordering:
    - Minimum degree ordering applied to A^T A or A^T + A [MMD, Liu '85]
    - Nested dissection ordering applied to A^T A or A^T + A [(Par)METIS, (PT)-Scotch]
  - User-controllable pivoting: partial pivoting, threshold pivoting, static pivoting.
  - Condition number estimation. Iterative refinement. Componentwise error bounds.
- Download: www.crd.lbl.gov/~xiaoye/SuperLU
- Further information:
  - Contact: Sherry Li, xsli@lbl.gov
  - Developers: Sherry Li, Jim Demmel, John Gilbert, Laura Grigori, Piyush Sao, Meiyue Shao, Ichitaro Yamazaki
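The capabilities listed on this slide can be tried from Python, since SciPy's `scipy.sparse.linalg.splu` wraps the serial SuperLU library. The following is a minimal sketch, not taken from the slides: the matrix and right-hand sides are made-up illustrative values, and the refinement step is written out by hand rather than using SuperLU's built-in driver. It picks a minimum-degree ordering on A^T + A, requests partial pivoting, solves for two right-hand sides at once, and applies one step of iterative refinement.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Small nonsymmetric sparse test matrix (illustrative values only).
A = csc_matrix(np.array([
    [4.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 2.0],
    [1.0, 0.0, 5.0, 0.0],
    [0.0, 1.0, 0.0, 6.0],
]))

# Two right-hand sides, stored as the columns of B.
B = np.array([
    [1.0, 0.0],
    [2.0, 1.0],
    [3.0, 0.0],
    [4.0, 1.0],
])

# permc_spec selects the sparsity-preserving column ordering;
# "MMD_AT_PLUS_A" is minimum degree applied to A^T + A.
# diag_pivot_thresh=1.0 corresponds to classical partial pivoting;
# smaller values favor the diagonal entry (threshold pivoting).
lu = splu(A, permc_spec="MMD_AT_PLUS_A", diag_pivot_thresh=1.0)

# Triangular solves for both right-hand sides in one call.
X = lu.solve(B)

# One hand-written step of iterative refinement: solve for the
# correction using the residual and the existing factors.
R = B - A @ X
X = X + lu.solve(R)

residual = np.linalg.norm(A @ X - B)
print("residual norm:", residual)
```

Other values accepted by `permc_spec` in SciPy are `"NATURAL"`, `"MMD_ATA"`, and `"COLAMD"`; the nested dissection orderings named on the slide are not exposed through this interface.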

2. SuperLU_DIST: Recent advances

- Increased scalability via new DAG-based scheduling algorithms that shorten the critical path.
  - Idle time (MPI_Wait) is significantly reduced (2.6x faster on thousands of cores).
- Architecture-aware: exploits heterogeneous nodes.
  - Offloads fine-grained Schur-complement updates to GPU or MIC accelerators.
  - Programming: MPI + OpenMP + CUDA
  - Pipelined execution of CPU and GPU tasks
  - 3x faster on multi-GPU or multi-Xeon Phi clusters, 2-5x reduction in memory usage.
- References:
  - "A distributed CPU-GPU sparse direct solver", P. Sao, R. Vuduc, and X.S. Li, Euro-Par 2014, LNCS Vol. 8632, Porto, Portugal, August 25-29, 2014.
  - "A Sparse Direct Solver for Distributed Memory Xeon Phi-accelerated Systems", P. Sao, X. Liu, R. Vuduc, and X.S. Li, IPDPS 2015, May 25-29, 2015.

[Figure: pipelined execution on CPU and accelerator, showing the CPU copy and the accelerator copy]
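To make the "Schur-complement update" on this slide concrete, here is a minimal dense sketch of a right-looking blocked LU factorization. It is an illustration under stated assumptions, not SuperLU_DIST's code: the function name `blocked_lu` is hypothetical, the matrix is dense and diagonally dominant so pivoting can be omitted, and everything runs on the CPU with NumPy. Step 3 is the matrix-matrix product that, in a sparse supernodal solver, becomes many small updates of the kind the slide describes offloading to an accelerator.

```python
import numpy as np

def blocked_lu(A, block=2):
    """Right-looking blocked LU without pivoting.

    Returns one array holding U in its upper triangle and the
    strictly lower part of unit-lower-triangular L below it.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(0, n, block):
        e = min(k + block, n)
        # 1. Factor the diagonal block in place (unblocked elimination).
        for j in range(k, e):
            A[j + 1:e, j] /= A[j, j]
            A[j + 1:e, j + 1:e] -= np.outer(A[j + 1:e, j], A[j, j + 1:e])
        if e == n:
            break
        L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        U11 = np.triu(A[k:e, k:e])
        # 2. Panel solves: U12 = L11^{-1} A12 and L21 = A21 U11^{-1}.
        A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
        A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T
        # 3. Schur-complement update of the trailing matrix:
        #    A22 <- A22 - L21 @ U12. This is the compute-heavy step.
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A

# Diagonally dominant test matrix so that no pivoting is needed.
rng = np.random.default_rng(0)
M = rng.random((5, 5)) + 5.0 * np.eye(5)

F = blocked_lu(M, block=2)
L = np.tril(F, -1) + np.eye(5)
U = np.triu(F)
print("factorization error:", np.linalg.norm(L @ U - M))
```

Because the trailing update in step 3 depends only on the two panels computed in step 2, it can be overlapped with the factorization of the next panel, which is the idea behind pipelining CPU and accelerator work.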

3. SuperLU usage and impact

- Over 26,000 downloads in FY 2014. SuperLU is mentioned in 5% of the NERSC projects (weighted by allocation size).
- Used in many high-end simulation codes:
  - ASCEM/Amanzi: Advanced Simulation Capability for Environmental Management, DOE
  - Denovo: radiation transport simulations for nuclear reactors, DOE
  - DGDFT: Discontinuous Galerkin Method for Density Functional Theory, DOE
  - FEAP: finite element analysis, UC Berkeley
  - H2plus: water simulation code, DOE
  - HiFi: multi-fluid modeling for plasma applications, U. Washington
  - M3D-C1: plasma fusion energy, DOE
  - NekTar: high-order spectral-element Navier-Stokes solver, NCAR
  - NIMROD: plasma fusion energy, DOE
  - Omega3P: accelerator cavity design, DOE
  - OpenSees: earthquake engineering, Pacific Earthquake Engineering Research Center
  - PMAMR: CCSE code for carbon sequestration, DOE
  - PHOENIX: stellar and planetary atmosphere code
  - QUEST: Quantum electron simulation toolbox, UC Davis
  - VORPAL: plasma physics simulation code, Tech-X
- Adopted in many commercial mathematical libraries and simulation software, including AMD (circuit simulation), Boeing (aircraft design), Chevron, ExxonMobil (geology), Cray's LibSci, FEMLAB, HP's MathLib, IMSL, NAG, OptimaNumerics, Python (SciPy), Walt Disney Feature Animation.
