August 4, 2007, Deflation Methods in Fermion Inverters, 1 Methods for Fermion Inverters Walter Wilcox Baylor University Joint with Ron Morgan (Mathematics.

Presentation transcript:

August 4, 2007, Deflation Methods in Fermion Inverters, 1 Methods for Fermion Inverters Walter Wilcox Baylor University Joint with Ron Morgan (Mathematics Dept.) and Abdou Abdel-Rehim (Baylor Postdoctoral Fellow)

August 4, 2007, Deflation Methods in Fermion Inverters, 2 Outline: Deflation basics. Morgan/Wilcox algorithm (non-Hermitian): GMRES-DR (Morgan); GMRES-Proj (multiple rhs's of Ax=b); shifting (multi-mass); new results (rhs's with D-BiCGStab). M. Lüscher algorithm. A. Stathopoulos/K. Orginos algorithm (Hermitian). (arXiv: v4) (arXiv: v1; talk) (math-ph/ , arXiv: , )

August 4, 2007, Deflation Methods in Fermion Inverters, 3 Deflation or related in lattice QCD problems (not comprehensive!):
de Forcrand, Nucl. Phys. B (P.S.) 47, p. 228, 1996 (experiments with multiple rhs's).
Edwards, Heller, Narayanan, Nucl. Phys. B 540; Dong, Lee, Liu, Zhang, Phys. Rev. Lett. 85, 5051 (projecting out low overlap $H^2$ eigenmodes).
Neff, Eicker, Lippert, Negele, Schilling, Phys. Rev. D64:114509 (deflation for all-to-all propagators on $\gamma_5 M$).
Giusti, Hoelbling, Lüscher and Wittig, Comput. Phys. Commun. 153, 31 (quark propagator low-mode preconditioning).
DeGrand and Schaefer, Comput. Phys. Commun. (low mode averaging).

August 4, 2007, Deflation Methods in Fermion Inverters, 4 Related talks at this conference J. Bloch, “An Iterative Method to Compute the Overlap Dirac Operator at Nonzero Chemical Potential”. (Bloch, Frommer, Lang, Wettig, arXiv: ) M. Clark, “Adaptive Multi-Grid for QCD” (Brannick et al, arXiv: ) K. Orginos, “A Solver for Multiple Right-Hand Sides” ( v1)

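August 4, 2007, Deflation Methods in Fermion Inverters, 5 Deflation basics. Krylov subspace: $K_m = \mathrm{span}\{r_0, A r_0, A^2 r_0, \dots, A^{m-1} r_0\}$. Starting, residual vectors: $r_0 = b - A x_0$ and $r = q(A)\, r_0$, where $q$ is a polynomial of degree $m$ or less that has value 1 at 0.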

August 4, 2007, Deflation Methods in Fermion Inverters, 6 Matrix: bidiagonal; diagonal is 0.1, 1, 2, 3, …, 1999; superdiagonal is all 1's. Figure: GMRES polynomial of degree 10.
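The test matrix described on this slide is easy to reproduce. The following is a minimal sketch, not the authors' code: the function names `bidiagonal_test_matrix` and `gmres_residual_norms` are invented for illustration, the right-hand side of all ones is an assumption, and a reduced size (n = 200) is used so the dense example runs quickly. It builds the matrix and tracks the residual norm of unrestarted GMRES through an Arnoldi basis of the Krylov subspace.

```python
import numpy as np

def bidiagonal_test_matrix(n=2000):
    """Upper bidiagonal matrix: diagonal 0.1, 1, 2, ..., n-1; superdiagonal all 1's."""
    diag = np.arange(n, dtype=float)
    diag[0] = 0.1
    return np.diag(diag) + np.diag(np.ones(n - 1), k=1)

def gmres_residual_norms(A, b, m):
    """Residual norms of unrestarted GMRES for degrees 0..m, starting from x0 = 0.

    Builds an Arnoldi basis of span{b, Ab, ..., A^(m-1) b} and solves the
    small least-squares problem min ||beta*e1 - Hbar y|| at every step.
    """
    n = len(b)
    beta = np.linalg.norm(b)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    norms = [beta]
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):  # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Hbar = H[: j + 2, : j + 1]
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)
        norms.append(np.linalg.norm(rhs - Hbar @ y))
        if H[j + 1, j] < 1e-14:  # Krylov subspace is exhausted
            break
        V[:, j + 1] = w / H[j + 1, j]
    return np.array(norms)

# Reduced-size illustration (assumed n = 200 and b = ones).
A = bidiagonal_test_matrix(200)
norms = gmres_residual_norms(A, np.ones(200), 30)
print(norms[0], norms[-1])
```

The residual norm can only decrease as the polynomial degree grows, but with the small eigenvalue 0.1 present the decrease is slow, which is the behavior the polynomial plots on the following slides illustrate.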

August 4, 2007, Deflation Methods in Fermion Inverters, 7 GMRES polynomial of degree 100 (close up view)

August 4, 2007, Deflation Methods in Fermion Inverters, 8 GMRES polynomial of degree 150 (close up view)

August 4, 2007, Deflation Methods in Fermion Inverters, 9 Figure: residual norm curve (residual norm vs. matrix-vector products).

August 4, 2007, Deflation Methods in Fermion Inverters, 10 Small eigenvalues and Krylov methods. SPD matrix: convergence is approximately related to $\sqrt{\kappa} = \sqrt{\lambda_{\max}/\lambda_{\min}}$. Example: eigenvalues 0.1, 1, 2, 3, ..., n. If one removes the 4 smallest eigenvalues, $\lambda_{\min}$ rises from 0.1 to 4 and convergence is improved by a factor of about $\sqrt{40} \approx 6$ (remove 10 for an improvement factor of 10). Non-restarted methods like CG and BiCGStab naturally remove some eigenvalues as the iteration proceeds, which leads to superlinear convergence. Problem: Restarted GMRES often cannot "remove" eigenvalues.
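The improvement factors quoted on this slide follow from simple arithmetic on the square root of the condition number. A small sketch, assuming n = 2000 (the ratio does not depend on n in this example) and using hypothetical helper names:

```python
import math

def sqrt_condition_number(eigs):
    """sqrt(lambda_max / lambda_min) for the eigenvalues of an SPD matrix."""
    return math.sqrt(max(eigs) / min(eigs))

def deflation_improvement(eigs, k):
    """Ratio of sqrt(condition number) before and after removing the k smallest eigenvalues."""
    remaining = sorted(eigs)[k:]
    return sqrt_condition_number(eigs) / sqrt_condition_number(remaining)

n = 2000  # assumed size for illustration
eigs = [0.1] + list(range(1, n + 1))
print(round(deflation_improvement(eigs, 4), 2))   # 6.32
print(round(deflation_improvement(eigs, 10), 2))  # 10.0
```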

August 4, 2007, Deflation Methods in Fermion Inverters, 11 Solution: Add approximate eigenvectors to the subspace. Morgan, 1995 (GMRES-IR), 2002 (GMRES-DR). Subspace: $\mathrm{span}\{y_1, y_2, \dots, y_k, r_0, A r_0, \dots, A^{m-k-1} r_0\}$ (eigenvector portion + Krylov portion). The $y_i$'s are chosen to be harmonic Ritz vectors.
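Harmonic Ritz vectors can be extracted from a standard Arnoldi factorization. The sketch below is illustrative only and is not a full GMRES-DR implementation: the function names are invented, breakdown is not handled, and the small eigenproblem uses the matrix $H_m + h_{m+1,m}^2 H_m^{-H} e_m e_m^T$, which is the usual way of writing the harmonic Ritz condition in terms of the Arnoldi matrices.

```python
import numpy as np

def arnoldi(A, r0, m):
    """m steps of Arnoldi: A V[:, :m] = V H with V (n x (m+1)) and H ((m+1) x m).

    No breakdown handling; intended for m much smaller than n.
    """
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):  # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def harmonic_ritz(V, H, k):
    """The k harmonic Ritz pairs of smallest magnitude from an Arnoldi factorization."""
    m = H.shape[1]
    Hm = H[:m, :]
    em = np.zeros(m)
    em[-1] = 1.0
    f = np.linalg.solve(Hm.conj().T, em)
    theta, G = np.linalg.eig(Hm + H[m, m - 1] ** 2 * np.outer(f, em))
    idx = np.argsort(np.abs(theta))[:k]
    return theta[idx], V[:, :m] @ G[:, idx]

# Small bidiagonal example (assumed n = 100, m = 20, k = 3, r0 = ones).
n, m, k = 100, 20, 3
d = np.arange(n, dtype=float)
d[0] = 0.1
A = np.diag(d) + np.diag(np.ones(n - 1), k=1)
V, H = arnoldi(A, np.ones(n), m)
theta, Y = harmonic_ritz(V, H, k)
print(np.abs(theta))
```

The defining property is that each residual $A y_i - \theta_i y_i$ is orthogonal to $A V_m$ rather than to $V_m$, which tends to give better approximations to the eigenvalues nearest zero.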

August 4, 2007, Deflation Methods in Fermion Inverters, 12 GMRES-DR(m,k): Solves linear equations and computes eigenvalues simultaneously. Adding approximate eigenvectors to the Krylov subspace for the linear equations essentially removes the corresponding eigenvalues and can thus improve convergence.

August 4, 2007, Deflation Methods in Fermion Inverters, 13 GMRES-DR vs. other methods Matrix: bidiagonal

August 4, 2007, Deflation Methods in Fermion Inverters, 14 Aspects of GMRES-DR. GMRES-DR creates an Arnoldi-like recurrence, $A V_k = V_{k+1} \bar{H}_k$, where $V_k$ is the n by k matrix whose columns span the approximate eigenvectors, and $\bar{H}_k$ is small, k+1 by k. One has both the approximate eigenvectors and their products with A in compact storage.

August 4, 2007, Deflation Methods in Fermion Inverters, 15 For multiple right-hand sides: Solve the first right-hand side with GMRES-DR. Use the computed eigenvectors for the other right-hand sides. Method is GMRES-Proj. Alternate: 1) projection over eigenvectors, 2) cycles of regular GMRES.
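The projection step of GMRES-Proj can be sketched as a minimum-residual correction over the stored approximate eigenvectors. This is a simplified illustration under stated assumptions: the function name is invented, exact eigenvectors of a small symmetric matrix stand in for the GMRES-DR output, and the product A V is formed explicitly, whereas the actual method can reuse the compact recurrence from GMRES-DR instead.

```python
import numpy as np

def minres_projection(A, b, x0, V):
    """Improve x0 by a minimum-residual projection over the columns of V.

    Finds d minimizing ||(b - A x0) - A V d|| and returns x0 + V d.
    """
    r0 = b - A @ x0
    d, *_ = np.linalg.lstsq(A @ V, r0, rcond=None)
    return x0 + V @ d

# Toy setup: symmetric matrix with eigenvalues 0.1, 1, 2, ..., n-1 (assumed).
rng = np.random.default_rng(1)
n, k = 40, 5
eigs = np.arange(n, dtype=float)
eigs[0] = 0.1
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T
V = Q[:, :k]  # eigenvectors for the k smallest eigenvalues
b = rng.standard_normal(n)

x = minres_projection(A, b, np.zeros(n), V)
r = b - A @ x
# Components of the residual along the deflated eigenvectors, before and after.
print(np.linalg.norm(V.T @ b), np.linalg.norm(V.T @ r))
```

After the projection, the residual has essentially no component along the deflated eigenvectors, so the following GMRES cycle works on a problem whose small eigenvalues have effectively been removed.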

August 4, 2007, Deflation Methods in Fermion Inverters, 16 GMRES-Proj for the 2nd rhs (following GMRES-DR for 1st rhs) Matrix: bidiagonal

August 4, 2007, Deflation Methods in Fermion Inverters, 17 Wilson $20^3 \times 32$ at $\kappa_{cr}$

August 4, 2007, Deflation Methods in Fermion Inverters, 18 Twisted Mass $20^3 \times 32$ at $\kappa_{cr}$

August 4, 2007, Deflation Methods in Fermion Inverters, 19 Multi-masses or multiple shifts. Krylov subspaces are shift invariant in that $A - \sigma I$ generates the same Krylov subspace no matter what the shift $\sigma$ is. So the goal is to solve all shifted systems with ONE Krylov subspace. For non-restarted methods this has been done, for example, for QMR (Freund) and BiCGStab (Frommer).
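Shift invariance is easy to check numerically. The sketch below, using a random test matrix and an invented helper `krylov_basis`, builds orthonormal bases for the Krylov subspaces of $A$ and $A - \sigma I$ from the same starting vector and measures how much of one basis lies outside the span of the other.

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Orthonormal basis of span{r0, A r0, ..., A^(m-1) r0}."""
    Q = np.zeros((len(r0), m))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(1, m):
        w = A @ Q[:, j - 1]
        for _ in range(2):  # orthogonalize twice for numerical safety
            w -= Q[:, :j] @ (Q[:, :j].T @ w)
        Q[:, j] = w / np.linalg.norm(w)
    return Q

rng = np.random.default_rng(0)
n, m, sigma = 50, 8, -2.0  # assumed sizes and shift for illustration
A = rng.standard_normal((n, n))
r0 = rng.standard_normal(n)

Q = krylov_basis(A, r0, m)
Qs = krylov_basis(A - sigma * np.eye(n), r0, m)

# If the two subspaces coincide, projecting Qs onto span(Q) changes nothing.
gap = np.linalg.norm(Qs - Q @ (Q.T @ Qs))
print(gap)
```

The gap is at the level of rounding error, which is why one set of Krylov vectors can serve every shifted system as long as all residuals start out parallel.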

August 4, 2007, Deflation Methods in Fermion Inverters, 20 Restarted methods with multiple shifts. Restarting makes it more difficult because the shifted residuals are not parallel to one another, generating different Krylov subspaces. Frommer and Glassner (restarted GMRES): Force residuals to all be parallel after a restart. Can continue using one Krylov subspace for all shifted systems. Minimal residual property is maintained only for the base shift system.

August 4, 2007, Deflation Methods in Fermion Inverters, 21 GMRES-DR for multiple shifts. Subspaces generated by GMRES-DR are a combination of an approximate-eigenvector portion and a Krylov subspace portion, but remarkably, when put together, they are themselves Krylov subspaces (with a different starting vector). So GMRES-DR can be restarted like GMRES for multiple shifts.

August 4, 2007, Deflation Methods in Fermion Inverters, 22 Multiple right-hand sides with multiple shifts and deflation. Deflating eigenvalues is difficult for multiple shifts because one cannot keep residual vectors parallel unless one has exact eigenvectors. Solution: force the error to be in the direction of one vector, namely $v_{k+1}$ from the recurrence $A V_k = V_{k+1} \bar{H}_k$. Then one can correct the error at the end. Needs the solution of one extra right-hand side.

August 4, 2007, Deflation Methods in Fermion Inverters, 23 Figure: solution of ten right-hand sides. Matrix: bidiagonal. Blue: base system (sigma=0). Red: shifted system (sigma=-2). Green: base (uncorrected).

August 4, 2007, Deflation Methods in Fermion Inverters, 24 What if you don't like restarting, but still want to solve multiple right-hand sides? => Deflated BiCGStab (D-BiCGStab). Problem: Projection over eigenvectors is not good enough to last for the entire run of BiCGStab. Solution: Use a projection over both right and left eigenvectors. Deflated BiCGStab for the second and subsequent right-hand sides: 1) Project over right and left eigenvectors, 2) Run BiCGStab.
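One common way to write a projection over both right and left eigenvectors is the oblique (Petrov-Galerkin) correction $x = x_0 + U (W^H A U)^{-1} W^H r_0$, with right eigenvectors in the columns of $U$ and left eigenvectors in the columns of $W$. The sketch below assumes that form, which the slide does not spell out, and uses invented helper names and a small bidiagonal test matrix. It shows only the projection step; the BiCGStab run that would follow is omitted.

```python
import numpy as np

def smallest_eigvecs(M, k):
    """Eigenvectors of M for the k eigenvalues of smallest magnitude."""
    vals, vecs = np.linalg.eig(M)
    idx = np.argsort(np.abs(vals))[:k]
    return vecs[:, idx]

def left_right_projection(A, b, x0, U, W):
    """Oblique projection: afterwards the residual is orthogonal to the columns of W."""
    r0 = b - A @ x0
    G = W.conj().T @ (A @ U)
    d = np.linalg.solve(G, W.conj().T @ r0)
    return x0 + U @ d

# Small non-symmetric example (assumed n = 30, k = 5).
n, k = 30, 5
diag = np.arange(n, dtype=float)
diag[0] = 0.1
A = np.diag(diag) + np.diag(np.ones(n - 1), k=1)
U = smallest_eigvecs(A, k)           # right eigenvectors
W = smallest_eigvecs(A.conj().T, k)  # left eigenvectors
b = np.ones(n)

x = left_right_projection(A, b, np.zeros(n), U, W)
r = b - A @ x
print(np.linalg.norm(W.conj().T @ b), np.linalg.norm(W.conj().T @ r))
```

Because left eigenvectors are orthogonal to right eigenvectors of other eigenvalues, a residual orthogonal to $W$ has no component along the deflated right eigenvectors in the eigen-expansion, and for exact eigenvectors those components stay absent throughout a Krylov iteration.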

August 4, 2007, Deflation Methods in Fermion Inverters, 25 Deflated BiCGStab for the 2nd rhs with Left-Right Projection Matrix: bidiagonal

August 4, 2007, Deflation Methods in Fermion Inverters, 26 $20^3 \times 32$ Wilson at $\kappa_{cr}$

August 4, 2007, Deflation Methods in Fermion Inverters, 27 $20^3 \times 32$ Wilson at $\kappa_{cr}$: number of eigenvectors. Speedup at $\kappa_{cr}$ on $20^3 \times 32$: BiCG/D-BiCG ~ 5. Speedup at $\kappa_{cr}$ on $16^4$: ~ 2.7.

August 4, 2007, Deflation Methods in Fermion Inverters, 28

August 4, 2007, Deflation Methods in Fermion Inverters, 29 Some questions about Wilson matrix computations. How does the optimal number of eigenvalues depend on the size of the problem? Test: all at $\kappa_{cr}$. Answer: it increases, but not nearly proportional to n (fortunately).
Proj/D-BiCG:
n = 24,576 ($8^4$): k ~ 10
n = 49,152 ($8^3 \times 16$): k ~ 15
n = 98,304 ($8^3 \times 32$): k ~ 20
n = 393,216 ($16^4$): k ~ 20
n = 1,536,000 ($20^3 \times 32$): k ~ 40

August 4, 2007, Deflation Methods in Fermion Inverters, 30 Some questions about Wilson matrix computations. After eigenvalues are deflated, how does the number of iterations vary with the size of the problem? (At $\kappa_{cr}$.) Answer: it increases, but not nearly proportional to n (fortunately).
Proj/D-BiCG:
n = 24,576 ($8^4$): iters ~ 100
n = 49,152 ($8^3 \times 16$): iters ~ 125
n = 98,304 ($8^3 \times 32$): iters ~ 140
n = 393,216 ($16^4$): iters ~
n = 1,536,000 ($20^3 \times 32$): iters ~
~ 45% increase in iters with volume change of 3.

August 4, 2007, Deflation Methods in Fermion Inverters, 31 Hermitian Systems. Lüscher's domain decomposition deflation algorithm. Stathopoulos/Orginos multiple right-hand side deflation algorithm. Both are tested for dynamical Wilson/clover fermions, using $M^H M$.

August 4, 2007, Deflation Methods in Fermion Inverters, 32 Lüscher algorithm. Breaks problem up into a deflation subspace, $S$, defined on $4^4$ blocks, and the orthogonal complement, $S^\perp$. Uses projectors $P_L$ and/or $P_R$. Basic deflated system: where After SAP preconditioning:

August 4, 2007, Deflation Methods in Fermion Inverters, 33 More algorithm details… Outer part: Uses SAP (Schwarz Alternating Procedure) as a right preconditioner on $4^3 \times 8$ blocks and the GCR (Generalized Conjugate Residual, consistent with SAP) algorithm for the Krylov inverter on the global (orthogonal) space. (See Lüscher's PoS LAT2005:002, 2006 talk based on earlier work.) Preconditioning reduces the iteration count of GCR and the deflation overhead. Inner part ("little Dirac"): This system is fairly large. It is even-odd preconditioned and "global mode" deflated. Also uses GCR here. See Frank, Vuik, SIAM J. Sci. Comput. 23, 442, 2001 and Nabben, Vuik, SIAM J. Sci. Comput. 27, 1742, 2006 for mathematical background of domain decomposition and preconditioning.

August 4, 2007, Deflation Methods in Fermion Inverters, 34 Even more algorithm details… Initial eigenvalue computation: Prepares deflated space for later use. Applies inverse iteration (SAP) to $N_s$ (= 20) random global vectors to get low modes, which are then projected onto domains. "Little Dirac" subspace (= $N_s N_d$): 20 x 2592 or 20 x 8192 (small vectors). Total overhead: 150s for $24^3 \times 48$ lattice and 184s for $32^3 \times 64$ lattice (tuned to $m_{cr}$). Solves the $M^H M$ matrix on the full system, knocking out the deflated eigenvectors with the $P_L$ projector while again using the GCR algorithm. Done for one mass at a time. Deflation on domains works because of "local coherence". Only a small number of global vectors projected onto the blocks are needed to project out low modes.

August 4, 2007, Deflation Methods in Fermion Inverters, 35 Analogy Domain decomposition deflation achieved by using low modes which are smooth but far from being approximate eigenmodes => “local coherence”. Tested on 2 flavor Wilson/clover configs (50).

August 4, 2007, Deflation Methods in Fermion Inverters, 36 $32^3 \times 64$ lattice solver times. Peak speedup (BiCG/DFL): 366/32 = 11.4. Integrated speedup (BiCG/DFL): 966/314 = 3.1 (5 masses). ~ 13% outer (?% inner) increase in iters with a volume change of 3.

August 4, 2007, Deflation Methods in Fermion Inverters, 37 Stathopoulos/Orginos algorithm. eig-CG(Nev,m), like GMRES-DR, solves linear equations and simultaneously improves the deflated eigenvectors. The eigenvector part is restarted, which, however, does not affect the solution of the CG linear equations. Incremental eig-CG(s) calls eig-CG, adds Nev new eigenvectors to a separate subspace after each rhs, and does orthogonalization. It is used for the first $s \le s_1$ rhs's. init-CG uses the final information generated by incremental eig-CG. Accuracy is the key!

August 4, 2007, Deflation Methods in Fermion Inverters, 38 More technical page. eig-CG(Nev,m) has a restarted subspace of maximum dimension m (made up of Nev previous eigenvectors, Nev current eigenvectors and (m-2*Nev) Krylov vectors). Uses Rayleigh-Ritz to compute eigenvectors and appends portions of the CG search space (Krylov part) to the eigenvectors. Typically, however, the linear equations converge faster than the eigenvalue part. Incremental eig-CG(s) (s = 2,…) accepts (s-1)*Nev eigenvectors, calls eig-CG for $s \le s_1$ rhs's, and accumulates another Nev approximate Ritz vectors from each new right-hand side. Needs significant storage. init-CG does a standard Galerkin projection on the initial solution vector. A single restart is done. Tested on "several" anisotropic, 2 flavor Wilson fermion gauge fields. Uses single precision, except on dot products. m=100, Nev=10 for 48 total rhs's ($s_1$ = 24).
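The two building blocks named on this slide, Rayleigh-Ritz extraction and a Galerkin projection of the initial guess, can be sketched in a few lines. This is not the eig-CG or init-CG code: the function names are invented, the SPD matrix is a random stand-in for $M^H M$, and a random orthonormal basis stands in for the restarted subspace.

```python
import numpy as np

def rayleigh_ritz(A, Q):
    """Ritz values (ascending) and Ritz vectors of Hermitian A from an orthonormal basis Q."""
    theta, G = np.linalg.eigh(Q.conj().T @ A @ Q)
    return theta, Q @ G

def galerkin_initial_guess(A, b, x0, V):
    """Galerkin projection: afterwards the residual is orthogonal to the columns of V."""
    r0 = b - A @ x0
    G = V.conj().T @ (A @ V)
    return x0 + V @ np.linalg.solve(G, V.conj().T @ r0)

# Toy SPD system (assumed sizes; B.T @ B + I plays the role of M^H M).
rng = np.random.default_rng(3)
n, m, nev = 60, 12, 4
B = rng.standard_normal((n, n))
A = B.T @ B + np.eye(n)
b = rng.standard_normal(n)

Q, _ = np.linalg.qr(rng.standard_normal((n, m)))  # stand-in search subspace
theta, Y = rayleigh_ritz(A, Q)
V = Y[:, :nev]  # Ritz vectors for the nev smallest Ritz values

x0 = galerkin_initial_guess(A, b, np.zeros(n), V)
r0 = b - A @ x0
print(theta[:nev], np.linalg.norm(V.T @ r0))
```

With accurate eigenvectors in V, CG started from this x0 sees a residual free of the low modes, which is why the slide stresses eigenvector accuracy.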

August 4, 2007, Deflation Methods in Fermion Inverters, 39 Convergence of deflated eigenvalues Point: converges as fast as if they weren’t restarting.

August 4, 2007, Deflation Methods in Fermion Inverters, 40 Incremental RHS solver history Spike on last 24 caused by a restart necessary because of eigenvector accuracy.

August 4, 2007, Deflation Methods in Fermion Inverters, 41 Solver performance vs. quark mass. Last 24 right-hand sides only, compared to non-deflated CG. Peak speedup ~ 10 on smaller lattice near $m_{cr}$. Integrated speedup ~ 6 (all rhs's). Peak speedup ~ 6.9 on larger lattice near $m_{cr}$. ~ 190% increase in iters with a volume change of 3.

August 4, 2007, Deflation Methods in Fermion Inverters, 42 Summaries… For Hermitian systems ($M^H M$), the Stathopoulos/Orginos algorithm is effective for a sufficiently large number of rhs's. Uses many eigenvectors, but no spectral preconditioners. Uses eigenvectors on starts (and a single init-CG restart). Krylov/Rayleigh-Ritz based. Needs accurate eigenvectors, which improve over additional rhs's. Like GMRES-DR, solves linear equations at the same time as computing eigenvectors. "Large" $V^2$ problem.

August 4, 2007, Deflation Methods in Fermion Inverters, 43 Lüscher's algorithm, built within his SAP+GCR inverter, applied to $M^H M$, works well for QCD and also defeats critical slowing down. There is an overhead in compute time for subspace generation, but it gets amortized over many rhs's or masses. Uses many inexact eigenvectors and makes extensive use of spectral preconditioners. DD+Krylov+preconditioning. Uses eigenvectors at every iteration, but needs a very small number of iterations. Deflation on domains is a new idea. "Small" $V^2$ problem.

August 4, 2007, Deflation Methods in Fermion Inverters, 44 Deflated GMRES is also Krylov/Rayleigh-Ritz based. Useful for multiple rhs's as well as shifting. D-BiCGStab can be used for multiple rhs's also. We would do Wilson/clover without the $M^H M$ step, plus shifting for various masses. We don't need spectral preconditioners for GMRES-Proj or D-BiCGStab; we use a modest number of fairly accurate eigenvectors, which are used at restarts (GMRES-Proj) or a single time (D-BiCGStab). Better eigenvector accuracy is needed for D-BiCGStab than for GMRES-Proj. "Mild" $V^2$ problem. (Caveat: Our lattices are 8X smaller than Lüscher's.)

August 4, 2007, Deflation Methods in Fermion Inverters, 45 is a breakthrough method for lattice QCD!

August 4, 2007, Deflation Methods in Fermion Inverters, 46 Serial Multi-Mass. Because of the twisted mass, $\mu$, it is not possible to apply multi-mass solvers to twisted mass problems simultaneously with even-odd preconditioning. We can nevertheless accelerate the convergence of twisted-mass problems with multiple masses and even-odd preconditioning. The method is based on solving the systems serially but using an improved initial guess, obtained from a minimal residual projection over the available solutions of the previous systems. The guess improves as the number of solved systems increases.
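The serial strategy can be sketched as a loop in which each new system starts from the minimal-residual combination of the solutions already found. Everything in this example is a stand-in: the matrix family `M0 + mass * D` is a hypothetical mass dependence that is not a pure shift, the function name is invented, and a direct solve replaces the iterative solver that would actually be started from the projected guess.

```python
import numpy as np

def projected_initial_guess(A, b, X):
    """Minimal-residual combination of previous solutions (columns of X)."""
    if X.shape[1] == 0:
        return np.zeros_like(b)
    c, *_ = np.linalg.lstsq(A @ X, b, rcond=None)
    return X @ c

rng = np.random.default_rng(2)
n = 60
M0 = 4.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
D = np.diag(rng.uniform(0.5, 1.5, n))  # hypothetical mass-dependent term
b = rng.standard_normal(n)
masses = [0.8, 0.6, 0.4, 0.2]          # solved serially, high -> low

X = np.zeros((n, 0))
initial_residuals = []
for mass in masses:
    A = M0 + mass * D
    x0 = projected_initial_guess(A, b, X)
    initial_residuals.append(np.linalg.norm(b - A @ x0))
    x = np.linalg.solve(A, b)  # stand-in for the iterative solver started at x0
    X = np.column_stack([X, x])

print(np.linalg.norm(b), initial_residuals)
```

The first system starts from zero, so its initial residual equals the norm of b; later systems start from a residual that can be no larger than that, since the zero combination is always available to the least-squares fit.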

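August 4, 2007, Deflation Methods in Fermion Inverters, 47 Using Serial multi-mass with Twisted Mass Fermions. Table columns: mass number; kappa; $\mu$; $X_0 = 0$; with projection (high -> low); with projection (low -> high). Total MVP: 10,700 ($X_0 = 0$); 6,080 (with projection, high -> low); 6,590 (with projection, low -> high).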