Block Low Rank Approximations in LS-DYNA

Block Low Rank Approximations in LS-DYNA
Cleve Ashcraft, Roger Grimes, Bob Lucas, Francois-Henry Rouet, and Clement Weisbecker
June 2, 2017

Multifrontal Solvers at LSTC
- Multifrontal solvers are increasingly used in LS-DYNA:
  - BCSLIB-EXT: world standard for shared memory
  - mf2: distributed memory/OpenMP
  - MUMPS: BLR and other research
- We routinely solve tens of millions of equations
- Users want more: hundreds of millions, even billions
- Today LS-DYNA runs on thousands of cores: "... the current code is limited to 4096 processes so I cannot run the job up to the 96k cores I wanted to."

Outline
- BLR prior art
- Segments
- FSUC results
- Impact on LS-DYNA

This is not our first look at BLR
- BLR for multifrontal was considered in the last millennium (no segments)
- LSTC investigated it in the last decade:
  - Implemented FSUC in mf2 to compute the storage reduction
  - Segments derived from the elimination tree and the initial matrix
  - Absolute tolerance for BLR (sketched below)
  - Preconditioned Conjugate Gradients
- Reducing storage to stay in-core would be worthwhile: non-linear and eigenvalue solvers use lots of triangular solves
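Since the slide leans on an absolute tolerance for BLR, a minimal sketch of the underlying operation may help: an off-diagonal block is replaced by a truncated singular value decomposition, dropping singular values below the tolerance. This is illustrative only, not LSTC's mf2 code; the Cauchy-like test matrix is a hypothetical example of a numerically low-rank block.

```python
import numpy as np

def compress_block(B, tol):
    """Low-rank approximation of an off-diagonal block B with an
    absolute drop tolerance: keep singular values larger than tol.
    Returns X, Y with B ~= X @ Y, so storage falls from m*n to
    rank*(m + n) entries when the rank is small."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    rank = int(np.sum(s > tol))          # absolute truncation criterion
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# Hypothetical example: a smooth kernel block is numerically low rank.
m, n = 200, 150
u = np.linspace(0.0, 1.0, m)[:, None]
v = np.linspace(0.0, 1.0, n)[None, :]
B = 1.0 / (1.0 + u + v)                  # Cauchy-like, rapidly decaying spectrum
X, Y = compress_block(B, tol=1e-6)
print(X.shape[1], np.linalg.norm(B - X @ Y, 2))  # small rank, error below tol
```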

Compression vs. tree height – encouraging

Compression vs. tree height – not so encouraging

Why I gave up the last time around
[runtime plot; surviving labels include "10 Sec.", "8T", "1T", "OOC", "SP", "E-6"]

MUMPS BLR results demand another look ...
- Richer set of segments
- Relative tolerance (sketched below):
  - MUMPS scales the initial matrix
  - Tolerance is relative to the diagonal block
- More sophisticated iterative solvers available:
  - Indefinite problems
  - Block shift-invert eigensolver
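As a contrast with the absolute threshold sketched earlier, here is one plausible reading of a relative criterion: the drop tolerance for an off-diagonal block is scaled by the norm of the associated diagonal block, so compression tracks the scaling of the matrix. This is a hedged sketch of the idea on the slide, not MUMPS code; the function and its arguments are hypothetical.

```python
import numpy as np

def compress_block_relative(B, D, eps):
    """Truncated-SVD compression of off-diagonal block B, with the
    drop threshold scaled by the magnitude of the corresponding
    diagonal block D (a hypothetical reading of 'tolerance relative
    to the diagonal block')."""
    tol = eps * np.linalg.norm(D, 2)         # relative, not absolute, cutoff
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    r = int(np.sum(s > tol))
    return U[:, :r] * s[:r], Vt[:r, :]       # B ~= X @ Y with rank r
```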

Outline
- BLR prior art
- Segments
- FSUC results
- Impact on LS-DYNA

Elimination tree segments (any ordering)

LS-GPart nested dissection
- LSTC's nested dissection algorithm
- Uses level sets from multiple pseudo-peripheral nodes (see the sketch below)
- Cleve Ashcraft and Francois-Henry Rouet
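The level-set machinery the slide mentions is classical: repeated breadth-first searches that locate pseudo-peripheral nodes, in the style of George and Liu. LS-GPart itself is not shown in the talk, so the sketch below is only the standard building block it starts from; `adj` is a hypothetical adjacency map from node to neighbors.

```python
def bfs_levels(adj, root):
    """Breadth-first level sets (layers) of a graph from `root`.
    adj: dict mapping each node to an iterable of its neighbors."""
    levels, frontier, seen = [], [root], {root}
    while frontier:
        levels.append(frontier)
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return levels

def pseudo_peripheral_node(adj, start):
    """Find a node of (nearly) maximal eccentricity by repeated BFS:
    jump to a low-degree node in the deepest level until the number
    of levels stops growing (George/Liu-style heuristic)."""
    node, ecc = start, -1
    while True:
        levels = bfs_levels(adj, node)
        if len(levels) - 1 <= ecc:
            return node, levels            # node plus its level-set structure
        ecc = len(levels) - 1
        node = min(levels[-1], key=lambda v: len(adj[v]))

# Tiny usage example on a path graph 0-1-2-3-4:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(pseudo_peripheral_node(adj, 2)[0])   # 4: an endpoint of the path
```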

Segments from LS-GPart nested dissection

Segment fragments

LS-GPart "wire basket" segments

Impact of different segments
- Work in progress; we need better metrics for evaluating quality
- Numbers from last week:
  - Arbitrary blocking: 82%
  - etree segments: 40%
  - LS-GPart: 36%
  - LS-GPart wire basket: 39%*
  - (* found two bugs on Wednesday)
- Why is LS-GPart only marginally better?

Outline
- BLR prior art
- Segments
- FSUC results
- Impact on LS-DYNA

FSUC storage vs. BLR threshold

FSUC error norm vs. storage

FSUC norm of error vs. storage
Why not horizontal?

Outline
- BLR prior art
- Segments
- FSUC results
- Impact on LS-DYNA

Iterations for non-linear convergence

Final energy

Summary
- Block Low Rank approximations are encouraging
- FSUC integrated into the development version of LS-DYNA (non-MPI frontal matrices for now)
- Implementing FSCU now, then plan on FCSU (the acronyms give the order of the Factor, Solve, Update, and Compress phases in the BLR factorization; see the sketch below)
- Focused on understanding the end-to-end impact on implicit finite element problems
- MPI/OpenMP once the overall impact is better understood
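As a reading aid for those phase orderings, here is a hedged, schematic sketch of one block step of an LU factorization in which the Compress phase (C) is placed before the Solve (FCSU), before the Update (FSCU), or last (FSUC, which only shrinks the stored factors). It assumes the standard BLR naming of the F/S/U/C phases; it is a toy dense version to show where the truncation error enters, not LS-DYNA's mf2 kernel, and `compress_panels` re-expands the low-rank factors only to keep the sketch simple.

```python
import numpy as np
import scipy.linalg as sla

def compress_panels(A, k, e, tol):
    """Truncated-SVD compression of both off-diagonal panels of the
    current block column/row. A real BLR code keeps the low-rank
    factors; re-expanding them here keeps the dense sketch simple
    while still showing where the truncation error enters."""
    for P in (A[e:, k:e], A[k:e, e:]):
        U, s, Vt = np.linalg.svd(P, full_matrices=False)
        r = int(np.sum(s > tol))
        P[:] = (U[:, :r] * s[:r]) @ Vt[:r, :]

def blr_step(A, k, bs, variant="FSUC", tol=1e-8):
    """One block step of an LU factorization on a dense front A (in
    place), showing where the Compress (C) phase sits in each variant."""
    e = k + bs
    Pm, L, U = sla.lu(A[k:e, k:e])                     # F: factor diagonal block
    if variant == "FCSU":
        compress_panels(A, k, e, tol)                  # C before S: solves act on low-rank data
    # S: triangular solves for the column and row panels
    A[e:, k:e] = sla.solve_triangular(U, A[e:, k:e].T, trans='T').T
    A[k:e, e:] = sla.solve_triangular(L, Pm.T @ A[k:e, e:],
                                      lower=True, unit_diagonal=True)
    if variant == "FSCU":
        compress_panels(A, k, e, tol)                  # C before U: low-rank Schur updates
    # U: Schur complement update of the trailing submatrix
    A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    if variant == "FSUC":
        compress_panels(A, k, e, tol)                  # C last: shrinks stored factors only

# Hypothetical usage: one step on a well-conditioned symmetric front.
A = np.random.default_rng(0).standard_normal((300, 300))
A = A @ A.T + 300.0 * np.eye(300)
blr_step(A, k=0, bs=64, variant="FSCU", tol=1e-8)
```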

Thank you!