

1 Linear Triangular System

Lx = b
L: n x n nonsingular lower triangular matrix
b: known right-hand-side vector

Forward substitution, row version (overwriting b with x):

  b(1) = b(1)/L(1,1)
  for i = 2:n
    b(i) = (b(i) - L(i,1:i-1)*b(1:i-1))/L(i,i)
  end
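The row-version loop above can be sketched in NumPy as follows (the function name and the 3x3 test system are my own illustration, not from the slides):

```python
import numpy as np

def forward_sub_row(L, b):
    """Row-oriented forward substitution: solve Lx = b, overwriting a copy of b."""
    n = L.shape[0]
    b = b.astype(float).copy()
    b[0] = b[0] / L[0, 0]
    for i in range(1, n):
        # subtract the already-known part L(i,1:i-1)*b(1:i-1), then divide by the diagonal
        b[i] = (b[i] - L[i, :i] @ b[:i]) / L[i, i]
    return b

L = np.array([[2.0, 0, 0], [1, 3, 0], [4, 5, 6]])
x = forward_sub_row(L, np.array([2.0, 5, 32]))
```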

2 Triangular System

Column version (column-sweep method): as soon as a variable is solved, its effect can be subtracted from the subsequent equations of Lx = b.

Forward substitution, column version:

  for j = 1:n-1
    b(j) = b(j)/L(j,j)
    b(j+1:n) = b(j+1:n) - b(j)*L(j+1:n,j)
  end
  b(n) = b(n)/L(n,n)

The column version is more amenable to parallel computing.
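A corresponding sketch of the column-sweep version, again with an illustrative test system:

```python
import numpy as np

def forward_sub_col(L, b):
    """Column-sweep forward substitution: solve Lx = b, overwriting a copy of b."""
    n = L.shape[0]
    b = b.astype(float).copy()
    for j in range(n - 1):
        b[j] /= L[j, j]                  # x_j is now solved
        b[j+1:] -= b[j] * L[j+1:, j]     # subtract its effect from the remaining equations
    b[n-1] /= L[n-1, n-1]
    return b

L = np.array([[2.0, 0, 0], [1, 3, 0], [4, 5, 6]])
b = np.array([2.0, 5, 32])
x = forward_sub_col(L, b)
```

Once `b[j]` is final, the update to `b[j+1:]` touches each remaining row independently, which is what makes this version easier to distribute across processors.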

3 Triangular System: Parallel

As soon as x_i (or a few x_i values) is computed, it is passed downward to the neighboring CPU. As soon as a CPU receives an x_i value, it forwards the value downward to its neighboring CPU, then updates its local portion of the b vector.

Disadvantage: load imbalance; only about 50% of the CPUs are active on average.
Remedy: cyclic or block-cyclic distribution of rows.

4 Triangular System: Inversion

A: N x N lower triangular matrix. Partition A into equal-sized blocks:

  A = [ A1   0  ]        A^(-1) = [ A1^(-1)   0       ]
      [ A21  A2 ]                 [ X         A2^(-1) ]

A can be inverted recursively: invert A1; invert A2; compute the off-diagonal block X = -A2^(-1)*A21*A1^(-1) by matrix multiplication.
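A minimal recursive implementation of this blocked inversion in NumPy (block names A1, A2, A21 follow the partition above; the function name and test matrix are mine):

```python
import numpy as np

def tri_inv(A):
    """Recursively invert a nonsingular lower triangular matrix by 2x2 blocking."""
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0 / A[0, 0]]])
    m = n // 2
    A1, A2 = A[:m, :m], A[m:, m:]        # diagonal blocks
    A21 = A[m:, :m]                      # off-diagonal block
    A1i, A2i = tri_inv(A1), tri_inv(A2)  # invert the diagonal blocks recursively
    X = -A2i @ A21 @ A1i                 # off-diagonal block of the inverse
    out = np.zeros_like(A, dtype=float)
    out[:m, :m], out[m:, m:], out[m:, :m] = A1i, A2i, X
    return out

L = np.tril(np.arange(1.0, 17.0).reshape(4, 4))  # lower triangular, nonzero diagonal
Linv = tri_inv(L)
```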

5 Triangular System: Inversion

First phase: invert the diagonal elements of A.
Second phase: compute the 2x2 diagonal blocks of A^(-1).
...
k-th phase: compute the 2^(k-1) x 2^(k-1) diagonal blocks of A^(-1).

Each phase consists essentially of matrix multiplications: the k-th phase performs N/2^(k-1) pairs of 2^(k-2) x 2^(k-2) matrix multiplications, which can be done in parallel on P = K^3 processors.

6 Gaussian Elimination

Ax = b
A = LU, with L unit lower triangular and U upper triangular.
Ax = b  =>  LUx = b  =>  solve Ly = b, then Ux = y.

The factorization pays off especially with multiple right-hand sides, or when the same equations (same coefficient matrix) are solved many times.

7 A = LU

A: n x n matrix, with A(1:k,1:k) nonsingular for k = 1:n-1.

LU factorization, (kij) version:

  for k = 1:n-1
    for i = k+1:n
      A(i,k) = A(i,k)/A(k,k)
      for j = k+1:n
        A(i,j) = A(i,j) - A(i,k)*A(k,j)
      end
    end
  end

or, in vector form:

  for k = 1:n-1
    A(k+1:n,k) = A(k+1:n,k)/A(k,k)
    A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
  end

After factorization, L is stored in the strictly lower triangular part of A, and U in the upper triangular part (including the diagonal). A(k,k) is the pivot.
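The vectorized form translates almost line-for-line to NumPy; the test matrix below is an illustrative example whose leading principal submatrices are nonsingular:

```python
import numpy as np

def lu_kij(A):
    """In-place (kij) LU factorization without pivoting.
    Returns a copy of A with L strictly below the diagonal and U on/above it."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):
        A[k+1:, k] /= A[k, k]                              # multipliers (column of L)
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # rank-1 trailing update
    return A

A = np.array([[4.0, 3, 2], [8, 7, 9], [4, 6, 5]])
F = lu_kij(A)
L = np.tril(F, -1) + np.eye(3)   # unit lower triangular factor
U = np.triu(F)                   # upper triangular factor
```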

8 Factorization Breakdown

If A(k,k) = 0 at any stage, the method breaks down; the LU factorization may not exist even if A is nonsingular.

Theorem: Let A be an n x n matrix. (1) A has an LU factorization if A(1:k,1:k) is nonsingular for all k = 1:n-1. (2) If the LU factorization exists and A is nonsingular, then the LU factorization is unique.

Pivoting avoids method breakdown. Pivoting is also necessary to improve accuracy: a small pivot increases errors. Use large pivots, and make sure no large entries appear in L or U.
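A tiny illustration of the breakdown condition; the 2x2 matrix is my own example, not from the slides:

```python
import numpy as np

# A is nonsingular (det = -1), but A(1,1) = 0, so the (kij) algorithm would
# divide by zero at step 1: this A has no LU factorization at all.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
det = np.linalg.det(A)

# The leading 1x1 principal submatrix A(1:1,1:1) = [0] is singular, so the
# theorem's sufficient condition fails. Swapping the two rows (pivoting)
# yields the identity, which trivially factors.
leading_minor = A[0, 0]
```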

9 Block LU Factorization

A: n x n matrix, n = r*N. Partition

  A = [ A11  A12 ]     L = [ L11  0   ]     U = [ U11  U12 ]
      [ A21  A22 ]         [ L21  L22 ]         [ 0    U22 ]

with A11 an r x r block, A22 (n-r) x (n-r), A12 r x (n-r), and A21 (n-r) x r. Then A = LU gives:

  A11 = L11*U11                 (LU-factorize A11)
  A12 = L11*U12             =>  solve for U12
  A21 = L21*U11             =>  solve for L21
  A22 = L21*U12 + L22*U22   =>  A' = A22 - L21*U12 = L22*U22

Apply the LU factorization to A' iteratively.

10 Block LU Factorization

A: n x n matrix with A(1:k,1:k) nonsingular for k = 1:n-1; block size 1 <= r <= n. Upon completion, A(i,j) is overwritten by L(i,j) for i > j, and by U(i,j) for i <= j.

  s = 1
  while s <= n
    q = min(n, s+r-1)
    use the scalar algorithm to LU-factorize A(s:q,s:q) into L and U
    solve LZ = A(s:q,q+1:n) for Z and overwrite A(s:q,q+1:n) with Z
    solve WU = A(q+1:n,s:q) for W and overwrite A(q+1:n,s:q) with W
    A(q+1:n,q+1:n) = A(q+1:n,q+1:n) - W*Z
    s = q+1
  end

Matrix multiplication accounts for a significant fraction of the operations.
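A sketch of this blocked algorithm in NumPy, assuming no pivoting is needed; the scalar step reuses the (kij) algorithm, and `np.linalg.solve` stands in for the triangular block solves for brevity. The test matrix is built from a known factorization I chose for illustration:

```python
import numpy as np

def lu_scalar(A):
    """In-place (kij) LU factorization without pivoting (scalar algorithm)."""
    n = A.shape[0]
    for k in range(n - 1):
        A[k+1:, k] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])

def block_lu(A, r):
    """Right-looking block LU with block size r, overwriting a copy of A."""
    A = A.astype(float).copy()
    n = A.shape[0]
    s = 0
    while s < n:
        q = min(n, s + r)
        lu_scalar(A[s:q, s:q])                       # factor the diagonal block
        Lb = np.tril(A[s:q, s:q], -1) + np.eye(q - s)
        Ub = np.triu(A[s:q, s:q])
        if q < n:
            A[s:q, q:] = np.linalg.solve(Lb, A[s:q, q:])        # Z: solve L*Z = A12
            A[q:, s:q] = np.linalg.solve(Ub.T, A[q:, s:q].T).T  # W: solve W*U = A21
            A[q:, q:] -= A[q:, s:q] @ A[s:q, q:]                # Schur-complement update
        s = q
    return A

# Build a test matrix from a known unit-lower L0 and upper U0 (illustrative values)
L0 = np.array([[1.0, 0, 0, 0], [2, 1, 0, 0], [0, 3, 1, 0], [1, 1, 2, 1]])
U0 = np.array([[2.0, 1, 0, 3], [0, 1, 2, 1], [0, 0, 3, 2], [0, 0, 0, 1]])
A = L0 @ U0
F = block_lu(A, 2)
```

Since the leading principal submatrices of A are nonsingular, the LU factorization is unique, so `F` must reproduce `L0` and `U0` exactly.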

11 Permutation Matrix

A permutation matrix is the identity matrix with its rows reordered. A vector p encodes the permutation matrix P: p(k) is the column index of the "1" in the k-th row of P.

PA is a row-permuted version of A; AP is a column-permuted version of A.

An interchange permutation matrix is the identity matrix with two rows swapped. For example, if E is the identity with rows 1 and 4 swapped, then EA swaps rows 1 and 4 of A, and AE swaps columns 1 and 4 of A.

12 Permutation Matrix

A permutation matrix can be expressed as a product of row interchanges. If E_k is the interchange permutation matrix with rows k and p(k) swapped, then P = E_n ... E_2 E_1 (the interchanges applied in order k = 1:n), so P can be encoded by the vector p(1:n).

If x(1:n) is a vector, Px can be computed using p(1:n):

  for k = 1:n
    swap x(k) and x(p(k))
  end

The vector p(1:n) is useful for pivoting.
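The swap loop can be illustrated directly; the permutation vector below is a hypothetical example with p(k) >= k, as pivoting produces, written with 0-based indices:

```python
def apply_p(p, x):
    """Apply the interchanges encoded in p to x, in order k = 0..n-1."""
    x = list(x)
    for k in range(len(x)):
        x[k], x[p[k]] = x[p[k]], x[k]   # swap x(k) and x(p(k))
    return x

# Interchanges: (0,2), (1,2), (2,3), (3,3) -- the last one is a no-op
y = apply_p([2, 2, 3, 3], [10, 20, 30, 40])
```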

13 Partial Pivoting

Pivoting is crucial for preventing breakdown and improving accuracy.

Partial pivoting: choose the largest element in a column (or row) and interchange rows (columns) to bring it to the pivot position, e.g., swapping rows 1 and 3 at the first step, then rows 2 and 3 at the second.

14 LU Factorization with Row Partial Pivoting

A: n x n matrix. After factorization, the strictly lower triangular part of A contains L, the upper triangular part contains U, and the vector p(1:n-1) records the row interchanges performed during partial pivoting.

Algorithm F2:

  for k = 1:n-1
    determine s with k <= s <= n such that |A(s,k)| is largest among |A(k:n,k)|
    swap A(k,1:n) and A(s,1:n)
    p(k) = s
    if A(k,k) != 0
      A(k+1:n,k) = A(k+1:n,k)/A(k,k)
      A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
  end

15 How to Use the Factorized A

Solve Ax = b using the LU factorization with row partial pivoting. The elements of b must be swapped according to the pivoting information in p(1:n-1). Assume A has been LU-factorized with row partial pivoting using algorithm F2:

  for k = 1:n-1
    swap b(k) and b(p(k))
  end
  solve Ly = b
  solve Ux = y

Here L is the unit lower triangular matrix whose strictly lower triangular part equals that of A, and U is the upper triangular part of A (including the diagonal).
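A NumPy sketch of algorithm F2 together with the corresponding solve; `np.linalg.solve` stands in for the forward and back substitutions, and the 3x3 system is an illustrative example (its exact solution is x = (1, 1, 1)):

```python
import numpy as np

def lu_f2(A):
    """Algorithm F2: LU with row partial pivoting, swapping entire rows."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        s = k + np.argmax(np.abs(A[k:, k]))   # pivot row: largest |A(s,k)| in column k
        A[[k, s], :] = A[[s, k], :]           # swap full rows k and s
        p[k] = s
        if A[k, k] != 0:
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p

def solve_f2(F, p, b):
    """Permute b by p, then solve Ly = b and Ux = y."""
    b = b.astype(float).copy()
    n = F.shape[0]
    for k in range(n - 1):
        b[k], b[p[k]] = b[p[k]], b[k]
    y = np.linalg.solve(np.tril(F, -1) + np.eye(n), b)   # forward substitution
    return np.linalg.solve(np.triu(F), y)                # back substitution

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]])
F, p = lu_f2(A)
x = solve_f2(F, p, np.array([6.0, 15, 25]))
```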

16 LU Factorization with Row Partial Pivoting

A: n x n matrix. After factorization, the strictly lower triangular part of A contains the multipliers, the upper triangular part contains U, and the vector p(1:n-1) records the row interchanges performed during partial pivoting.

Algorithm F1:

  for k = 1:n-1
    determine s with k <= s <= n such that |A(s,k)| is largest among |A(k:n,k)|
    swap A(k,k:n) and A(s,k:n)    // only difference from F2
    p(k) = s
    if A(k,k) != 0
      A(k+1:n,k) = A(k+1:n,k)/A(k,k)
      A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
  end

17 How to Use the Factorized A

Solve Ax = b using the LU factorization with partial pivoting (algorithm F1). The elements of b must be swapped according to the pivoting information in p(1:n-1) and multiplied by the appropriate coefficients stored in the lower triangular part of A. Assume A has been LU-factorized with partial pivoting using algorithm F1:

  for k = 1:n-1
    swap b(k) and b(p(k))
    b(k+1:n) = b(k+1:n) - b(k)*A(k+1:n,k)
  end
  solve Ux = b

U is the upper triangular part of A (including the diagonal).
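The F1 variant and its solve, sketched under the same illustrative conventions (same hypothetical 3x3 test system; `np.linalg.solve` used only for the final back substitution):

```python
import numpy as np

def lu_f1(A):
    """Algorithm F1: LU with row partial pivoting, swapping only columns k:n."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        s = k + np.argmax(np.abs(A[k:, k]))
        A[[k, s], k:] = A[[s, k], k:]         # only difference from F2
        p[k] = s
        if A[k, k] != 0:
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p

def solve_f1(F, p, b):
    """Interleave swaps of b with the stored multipliers, then solve Ux = b."""
    b = b.astype(float).copy()
    n = F.shape[0]
    for k in range(n - 1):
        b[k], b[p[k]] = b[p[k]], b[k]
        b[k+1:] -= b[k] * F[k+1:, k]          # apply multipliers from column k
    return np.linalg.solve(np.triu(F), b)     # back substitution

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]])
F, p = lu_f1(A)
x = solve_f1(F, p, np.array([6.0, 15, 25]))
```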

18 Column Partial Pivoting

Column partial pivoting: search row k for the largest element and exchange that column with column k.

A: n x n matrix. After factorization, the strictly lower triangular part of A contains L, the upper triangular part contains U, and the vector p(1:n-1) records the column interchanges performed during partial pivoting.

Algorithm G:

  for k = 1:n-1
    determine s with k <= s <= n such that |A(k,s)| is largest among |A(k,k:n)|
    swap A(1:n,k) and A(1:n,s)
    p(k) = s
    if A(k,k) != 0
      A(k+1:n,k) = A(k+1:n,k)/A(k,k)
      A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
  end

19 How to Use the Factorized A

Solve Ax = b using the LU factorization with column partial pivoting. The elements of x must be swapped, in reverse order, according to the pivoting information in p(1:n-1). Assume A has been LU-factorized with column partial pivoting using algorithm G:

  solve Ly = b
  solve Ux = y
  for k = n-1:-1:1
    swap x(k) and x(p(k))
  end

L is the unit lower triangular matrix whose strictly lower triangular part equals that of A; U is the upper triangular part of A (including the diagonal).
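A sketch of algorithm G and its solve in NumPy (illustrative names and test system; note the reverse-order unswapping of x at the end):

```python
import numpy as np

def lu_g(A):
    """Algorithm G: LU with column partial pivoting (search row k, swap columns)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        s = k + np.argmax(np.abs(A[k, k:]))   # pivot column: largest |A(k,s)| in row k
        A[:, [k, s]] = A[:, [s, k]]           # swap full columns k and s
        p[k] = s
        if A[k, k] != 0:
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p

def solve_g(F, p, b):
    """Solve Ly = b and Ux = y, then undo the column swaps on x in reverse order."""
    n = F.shape[0]
    y = np.linalg.solve(np.tril(F, -1) + np.eye(n), b.astype(float))
    x = np.linalg.solve(np.triu(F), y)
    for k in range(n - 2, -1, -1):
        x[k], x[p[k]] = x[p[k]], x[k]
    return x

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]])
F, p = lu_g(A)
x = solve_g(F, p, np.array([6.0, 15, 25]))
```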

20 Complete Pivoting

Complete pivoting: the largest element in the submatrix A(k:n,k:n) is permuted into position (k,k) as the pivot. This requires both a row interchange and a column interchange.

A: n x n matrix; p(1:n-1) encodes the row interchanges and q(1:n-1) the column interchanges. After factorization, the strictly lower triangular part of A contains L, and the upper triangular part (including the diagonal) contains U.

21 LU Factorization with Complete Pivoting

  for k = 1:n-1
    determine s and t (k <= s <= n, k <= t <= n) such that |A(s,t)| is largest among |A(i,j)| for i = k:n, j = k:n
    swap A(k,1:n) and A(s,1:n)
    swap A(1:n,k) and A(1:n,t)
    p(k) = s
    q(k) = t
    if A(k,k) != 0
      A(k+1:n,k) = A(k+1:n,k)/A(k,k)
      A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
    end
  end

22 How to Use the Factorized A

Solve Ax = b using the LU factorization with complete pivoting. Suppose A has been LU-factorized with complete pivoting, with permutation encoding vectors p(1:n-1) and q(1:n-1):

  for k = 1:n-1
    swap b(k) and b(p(k))
  end
  solve Ly = b for y
  solve Ux = y for x
  for k = n-1:-1:1
    swap x(k) and x(q(k))
  end

L and U are the (unit) lower and upper triangular parts of the factorized A.
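A sketch of complete pivoting and the corresponding solve, under the same illustrative conventions as the earlier sketches:

```python
import numpy as np

def lu_complete(A):
    """LU with complete pivoting: pivot is the largest entry of A(k:n,k:n)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.zeros(n - 1, dtype=int)
    q = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        sub = np.abs(A[k:, k:])
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        s, t = k + i, k + j
        A[[k, s], :] = A[[s, k], :]           # row interchange
        A[:, [k, t]] = A[:, [t, k]]           # column interchange
        p[k], q[k] = s, t
        if A[k, k] != 0:
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p, q

def solve_complete(F, p, q, b):
    """Permute b by p, solve Ly = b and Ux = y, unswap x by q in reverse order."""
    b = b.astype(float).copy()
    n = F.shape[0]
    for k in range(n - 1):
        b[k], b[p[k]] = b[p[k]], b[k]
    y = np.linalg.solve(np.tril(F, -1) + np.eye(n), b)
    x = np.linalg.solve(np.triu(F), y)
    for k in range(n - 2, -1, -1):
        x[k], x[q[k]] = x[q[k]], x[k]
    return x

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]])
F, p, q = lu_complete(A)
x = solve_complete(F, p, q, np.array([6.0, 15, 25]))
```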

23 Parallelization of Gaussian Elimination

Row-wise 1D block decomposition. At step k, the processor holding the pivot row sends row k, A(k,k:n), to its bottom neighboring processor. Each processor, upon receiving data from its top neighbor, immediately forwards it to its bottom neighbor, then updates its own data, then waits for the next data from its top neighbor.

Disadvantage: load imbalance.
Remedy: row-wise block-cyclic distribution.

24 Parallelization with Partial Pivoting

Row-wise block or block-cyclic decomposition, using Gaussian elimination with column partial pivoting (row partial pivoting is more difficult to parallelize in this layout).

The pivot search takes place entirely on the processor holding row k, with no communication among processors. The column index of the new pivot element, together with row k (A(k,k:n)), must be sent out. Each processor, upon receiving data from its top neighbor, immediately forwards it to its bottom neighbor, swaps column k and the new pivot column of its own data, updates its own data, and then waits for the next data from its top neighbor.