Symmetric Eigensolvers in Sca/LAPACK Osni Marques


03/17/2008 ParLab – Symmetric Eigensolvers

LAPACK Symmetric Tridiagonal Eigensolvers
- QR (STEQR): all eigenvectors, O(n^3)
- Bisection plus inverse iteration (STEVX): subset of eigenvectors, O(n^2)
- Divide-and-conquer (STEDC): all eigenvectors, faster than the previous two but needs more workspace
- Multiple relatively robust representations (STEGR): faster than all of the above for most matrices from industrial and scientific applications, least workspace (Dhillon, Parlett, Vömel, Marques)
Figure: typical performance (timing) of different eigensolvers on matrices coming from industrial applications. In the picture, “old” refers to the version currently available in LAPACK, which will soon be replaced by a “new” and more robust implementation; n ranges from 1824 to 8012.
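To make the choice of solver concrete, here is a minimal sketch that calls three of these algorithm families from Python through SciPy's `eigh_tridiagonal`. It is an illustration, not part of the original slides. The assumptions: SciPy's driver names `stev` (QR-based), `stebz` (bisection, with inverse iteration for the vectors) and `stemr` (the MRRR kernel that newer LAPACK versions use behind STEGR) stand in for the routines listed above; this SciPy function does not appear to offer a divide-and-conquer (STEDC) option; the test matrix is the 1D Poisson matrix, whose eigenvalues are known in closed form.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

n = 200
d = np.full(n, 2.0)        # diagonal of the 1D Poisson matrix
e = np.full(n - 1, -1.0)   # off-diagonal

# Closed-form eigenvalues, ascending: 2 - 2*cos(j*pi/(n+1)), j = 1..n
exact = 2.0 - 2.0 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))

# Full eigendecomposition with three different LAPACK drivers
errors = {}
for driver in ("stev", "stebz", "stemr"):
    w, z = eigh_tridiagonal(d, e, lapack_driver=driver)
    errors[driver] = np.max(np.abs(w - exact))
    print(driver, "max eigenvalue error:", errors[driver])

# Subset computation: only the five smallest eigenpairs (indices 0..4)
w_sub, z_sub = eigh_tridiagonal(
    d, e, select="i", select_range=(0, 4), lapack_driver="stebz"
)
print("subset shapes:", w_sub.shape, z_sub.shape)
```

The subset call is where bisection plus inverse iteration pays off: only the requested eigenpairs are computed.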

The Essence of the MRRR Algorithm
Factor T − σI = L D L^T, where (L, D) is a relatively robust representation (RRR) for an eigenvalue subset:
- it determines all eigenvalues in the subset to high relative accuracy
- small relative changes in the entries of L and D cause small relative changes in each eigenvalue in the subset
Given an RRR for a set of eigenvalues:
- For each eigenvalue with a large relative gap:
  - compute the eigenvalue to high relative accuracy
  - compute the FP (Fernando-Parlett) vector (eigenvector)
- For each of the remaining groups of eigenvalues:
  - choose a shift σ_new outside the group
  - compute a new RRR, L+ D+ L+^T = L D L^T − σ_new I
  - refine the eigenvalues
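The two factorization steps on this slide can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the LAPACK implementation: the function names `ldl_shifted`, `dstqds` and `assemble` are hypothetical, the shifts are called `sigma` and `tau` here, no pivoting or breakdown handling is done, and the shifts in the demo are chosen so that all pivots stay positive. The second function is the differential stationary qd transform, which computes the new representation directly from (L, D) without forming the tridiagonal matrix again.

```python
import numpy as np

def ldl_shifted(d, e, sigma):
    """Factor T - sigma*I = L D L^T for a symmetric tridiagonal T
    (diagonal d, off-diagonal e). Returns the pivots D and the
    subdiagonal l of the unit lower bidiagonal L. No pivoting."""
    n = len(d)
    D = np.empty(n)
    l = np.empty(n - 1)
    D[0] = d[0] - sigma
    for i in range(n - 1):
        l[i] = e[i] / D[i]
        D[i + 1] = d[i + 1] - sigma - l[i] * e[i]
    return D, l

def dstqds(D, l, tau):
    """Differential stationary qd transform:
    returns (Dp, lp) with Lp Dp Lp^T = L D L^T - tau*I."""
    n = len(D)
    Dp = np.empty(n)
    lp = np.empty(n - 1)
    s = -tau
    for i in range(n - 1):
        Dp[i] = s + D[i]
        lp[i] = D[i] * l[i] / Dp[i]
        s = lp[i] * l[i] * s - tau
    Dp[n - 1] = s + D[n - 1]
    return Dp, lp

def assemble(D, l):
    """Form the dense matrix L D L^T (for checking only)."""
    L = np.eye(len(D)) + np.diag(l, -1)
    return L @ np.diag(D) @ L.T

# Demo on the 1D Poisson matrix; sigma < 0 keeps T - sigma*I positive definite
n = 8
d = np.full(n, 2.0)
e = np.full(n - 1, -1.0)
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)

sigma, tau = -0.5, 0.25
D, l = ldl_shifted(d, e, sigma)
Dp, lp = dstqds(D, l, tau)

err_factor = np.max(np.abs(assemble(D, l) - (T - sigma * np.eye(n))))
err_shift = np.max(np.abs(assemble(Dp, lp) - (assemble(D, l) - tau * np.eye(n))))
print("factorization error:", err_factor)
print("shifted representation error:", err_shift)
```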

Testing LAPACK functionalities
- At installation time: an optional and limited number of test cases to verify the integrity of the installation (LAPACK/TESTING)
- During the development phase: intensive and stressful tests on a variety of computer architectures

Intensive Testing: Requirements and Goals
- Generation of difficult test cases
- Bookkeeping of test cases (so that new or competing algorithms can be stressed in a similar way)
- Various platforms: AMD Athlon, AMD Opteron, Itanium 2, Pentium III, Pentium 4, POWER 3, SGI IP35, SUN sparcv9, CRAY X1
- Various (Fortran) compilers: Intel, SUN, SGI, IBM…
- Accuracy
- Performance (time)
- Tuning of parameters (automatic or manual)
- Algorithmic choices (different IEEE variants)
- Reveal different numerical behaviors (in particular IEEE arithmetic features), as well as performance issues
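As an illustration of what measuring "accuracy" and "performance (time)" can look like, the sketch below computes two scaled error measures for each solver and times the call. The slide does not state the formulas, so the following are assumptions: the residual ‖TZ − ZΛ‖ / (‖T‖ n ε) and the loss of orthogonality ‖Z^T Z − I‖ / (n ε) are the usual LAPACK-style test ratios; the function name `accuracy_metrics`, the random test matrix and the SciPy driver names are chosen for this example only.

```python
import time
import numpy as np
from scipy.linalg import eigh_tridiagonal

def accuracy_metrics(d, e, w, z):
    """Scaled residual and loss of orthogonality for computed eigenpairs (w, z)
    of the symmetric tridiagonal matrix with diagonal d and off-diagonal e."""
    n = len(d)
    eps = np.finfo(float).eps
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    norm_T = np.linalg.norm(T, 1)
    # z * w scales column j of z by w[j], i.e. it forms Z @ diag(w)
    residual = np.linalg.norm(T @ z - z * w, 1) / (norm_T * n * eps)
    orthogonality = np.linalg.norm(z.T @ z - np.eye(z.shape[1]), 1) / (n * eps)
    return residual, orthogonality

rng = np.random.default_rng(0)
n = 200
d = rng.standard_normal(n)
e = rng.standard_normal(n - 1)

report = {}
for driver in ("stev", "stebz", "stemr"):
    t0 = time.perf_counter()
    w, z = eigh_tridiagonal(d, e, lapack_driver=driver)
    elapsed = time.perf_counter() - t0
    res, orth = accuracy_metrics(d, e, w, z)
    report[driver] = (res, orth, elapsed)
    print(f"{driver}: residual={res:.2f} orthogonality={orth:.2f} time={elapsed:.4f}s")
```

Values of order one (or a modest multiple) indicate results that are accurate to working precision; recording them per matrix and per platform is one way to keep the bookkeeping the slide asks for.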

Matrix Types
- Built-in matrices
  - tridiagonal matrix (1D Poisson equation)
  - Wilkinson tridiagonal matrix (eigenvalues clustered in pairs)
- Built-in eigenvalue distributions
  - repeated eigenvalues: λ_1 = 1 and λ_i = 1/k, i = 2, 3, …, n; or λ_i = 1, i = 1, 2, …, n−1, and λ_n = 1/k
  - geometric distribution: λ_i = k^((1−i)/(n−1)), i = 1, 2, …, n
  - different condition numbers (k)
  - different random number distributions
  - can be multiplied by random signs
- Glued matrices
  - combinations of the above cases
  - very tight eigenvalue clusters
- Eigenvalue distributions (D) read from files: Q^T D Q → T with random orthogonal Q
- Tridiagonal matrices from real-world applications
  - Chemistry (analysis of molecules)
  - Harwell-Boeing Collection (structural engineering, etc.)
  - University of Florida Collection (FEM analysis, NASA)
  - Matrices from LAPACK users
  - Lanczos algorithm without reorthogonalization to provoke very close eigenvalues
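A few of these test matrices are easy to generate, which the following sketch illustrates. It is not the testing infrastructure itself, and several details are assumptions: all function names are hypothetical; the random orthogonal Q comes from a QR factorization of a Gaussian matrix; the reduction of Q^T D Q to tridiagonal form uses SciPy's `hessenberg` (for a symmetric matrix the Hessenberg form is tridiagonal up to rounding); and "gluing" is modeled as placing copies of a matrix on the block diagonal and connecting neighbors with a small off-diagonal entry `gamma`.

```python
import numpy as np
from scipy.linalg import hessenberg, eigvalsh_tridiagonal

def geometric_eigenvalues(n, k):
    """Geometric distribution lambda_i = k**((1 - i)/(n - 1)), i = 1..n."""
    i = np.arange(1, n + 1)
    return k ** ((1.0 - i) / (n - 1))

def tridiagonal_from_eigenvalues(lam, rng):
    """Form A = Q^T D Q with a random orthogonal Q, then reduce A to
    tridiagonal form. Returns (diagonal, off-diagonal)."""
    n = len(lam)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = Q.T @ np.diag(lam) @ Q
    A = (A + A.T) / 2            # remove rounding asymmetry
    H = hessenberg(A)            # tridiagonal up to rounding for symmetric A
    return np.diag(H).copy(), np.diag(H, -1).copy()

def wilkinson(m):
    """Wilkinson matrix of order 2m+1: diagonal m, ..., 1, 0, 1, ..., m
    and unit off-diagonal. Its large eigenvalues come in close pairs."""
    d = np.abs(np.arange(-m, m + 1)).astype(float)
    e = np.ones(2 * m)
    return d, e

def glue(d, e, copies, gamma):
    """Place `copies` copies of the tridiagonal (d, e) on the block
    diagonal, connected by off-diagonal entries gamma."""
    D = np.tile(d, copies)
    E = np.concatenate([np.append(e, gamma)] * copies)[:-1]
    return D, E

rng = np.random.default_rng(42)

lam = geometric_eigenvalues(50, 1e6)
d_geo, e_geo = tridiagonal_from_eigenvalues(lam, rng)
w_geo = eigvalsh_tridiagonal(d_geo, e_geo)
print("max eigenvalue deviation:", np.max(np.abs(w_geo - np.sort(lam))))

d_w, e_w = wilkinson(10)
d_glued, e_glued = glue(d_w, e_w, copies=3, gamma=1e-8)
print("glued Wilkinson order:", len(d_glued))
```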

Examples of eigenvalue distributions of matrices from applications

Accuracy and timings for families of matrices, for a number of different computer architectures

Profiles

What have we found?
- LAPACK 3.0 STEGR (and STEDC!) fail on some of the new test matrices
- Different matrix classes with different challenges
  - STEGR is about 10 times slower than STEDC for glued Wilkinson matrices
- Architecture differences
  - Pentium slows down when infinity occurs
  - Vectorization issues on CRAY
- Reference tester for future development

Parallel Eigensolvers
- PDSYEVX: bisection + inverse iteration
- PDSYEVD: parallel divide and conquer (F. Tisseur)
- PDSYEVR: MRRR (C. Vömel)

Pitfalls of Parallelization
- Straightforward approach: n eigenpairs, p processors; cyclic assignment of about n/p eigenpairs to each processor
- Each processor computes orthogonal eigenvectors
- Orthogonality between processors is not guaranteed
- ScaLAPACK: PDSYEVX can break!
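The loss of orthogonality between processors can be reproduced in a small serial experiment. The sketch below is a toy model, not ScaLAPACK code, and everything in it is an assumption made for illustration: two "processors" each own one eigenvalue of the tightly clustered top pair of a Wilkinson matrix of order 21, each runs plain inverse iteration from its own start vector, and they never exchange vectors. The shift is moved by 1e-8 away from the computed eigenvalue only to keep the linear system safely nonsingular.

```python
import numpy as np

def inverse_iteration(T, shift, start, steps=3):
    """Plain inverse iteration: repeatedly solve (T - shift*I) v_new = v."""
    A = T - shift * np.eye(T.shape[0])
    v = start / np.linalg.norm(start)
    for _ in range(steps):
        v = np.linalg.solve(A, v)
        v /= np.linalg.norm(v)
    return v

# Wilkinson matrix of order 21: its two largest eigenvalues nearly coincide
m = 10
diag = np.abs(np.arange(-m, m + 1)).astype(float)
T = np.diag(diag) + np.diag(np.ones(2 * m), 1) + np.diag(np.ones(2 * m), -1)
n = T.shape[0]
w = np.linalg.eigvalsh(T)

# "Processor" A owns w[-2], "processor" B owns w[-1]; no communication
v_a = inverse_iteration(T, w[-2] + 1e-8, np.ones(n))
v_b = inverse_iteration(T, w[-1] + 1e-8, np.arange(1.0, n + 1))

res_a = np.linalg.norm(T @ v_a - w[-2] * v_a)
res_b = np.linalg.norm(T @ v_b - w[-1] * v_b)
overlap = abs(v_a @ v_b)
print("residuals:", res_a, res_b)          # both tiny: each vector looks fine
print("overlap |v_a . v_b|:", overlap)     # far from zero: not orthogonal

# Handling the cluster in one place: orthogonalize v_b against v_a
v_b_fixed = v_b - (v_a @ v_b) * v_a
v_b_fixed /= np.linalg.norm(v_b_fixed)
overlap_fixed = abs(v_a @ v_b_fixed)
res_b_fixed = np.linalg.norm(T @ v_b_fixed - w[-1] * v_b_fixed)
print("overlap after reorthogonalization:", overlap_fixed)
```

Each vector on its own passes a residual check, so the defect is only visible when vectors from different processors are compared. This is why eigenvalue clusters need to be treated together rather than split across processors.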

Parallelization: the right way

MRRR versus DC (Tridiagonal part of PDSYEVR and PDSYEVD)
- Lapw (n=22908, A. Tate): runtime and efficiency of the tridiagonal MRRR/D&C part on the IBM SP5
- Hubbard (n=63504, Ward and Bai): runtime and efficiency of the tridiagonal MRRR/D&C part on the IBM SP5

References
- Performance and Accuracy of LAPACK's Symmetric Tridiagonal Eigensolvers, J. Demmel, O. Marques, B. Parlett, and C. Vömel. SIAM J. Sci. Comp., 30:1508–1526.
- A Testing Infrastructure for Symmetric Tridiagonal Eigensolvers, J. Demmel, O. Marques, B. Parlett, and C. Vömel. ACM TOMS, 35.
- Computations of Eigenpair Subsets with the MRRR Algorithm, B. Parlett, O. Marques, and C. Vömel. Numerical Linear Algebra with Applications, 13.
- The Design and Implementation of the MRRR Algorithm, I. Dhillon, B. Parlett, and C. Vömel. Technical Report UT-CS, December.
- ScaLAPACK's MRRR Algorithm, C. Vömel. LAPACK Working Note 195, November.
(source code available upon request)