
1 A NOVEL APPROACH TO SOLVING LARGE-SCALE LINEAR SYSTEMS. Ken Habgood, Itamar Arel, Department of Electrical Engineering & Computer Science. Gabriel Cramer (1704-1752). April 3, 2009

2 Outline  Problem statement and motivation  Novel approach  Revisiting Cramer’s rule  Matrix condensation  Illustration of the proposed scheme  Implementation results  Challenges ahead

3 Solving large-scale linear systems  Many scientific applications: computer models in finance, biology, physics  Real-time load flow calculations for electric utilities: short-circuit fault and economic analysis  Consumer generation on the electric grid may soon require real-time calculations (hybrid cars, solar panels)

4 Why try to improve?  We want parallel processing for speed  Current schemes use Gaussian elimination  Mainstream approach: LU decomposition  O(N³) computational complexity, O(N²) parallelizable  With N² processors → O(N) time  So far so good …  The “catch”: irregular communication patterns for load balancing across processing nodes

5 Cramer’s rule revisited  For the 2×2 system ax + by = e, cx + dy = f: x = |e b; f d| / |a b; c d|  The solution to a linear system Ax = b is given by x_i = |A_i(b)| / |A|, where A_i(b) denotes A with its i-th column replaced by b  O(N!) computational complexity with naive determinant expansion
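To make the rule concrete, here is a minimal, self-contained C sketch (illustrative only, not the authors' implementation): determinants are evaluated by naive cofactor expansion, which is exactly what makes plain Cramer's rule O(N!), and a made-up 3×3 system is solved.

```c
#include <stdio.h>

#define N 3  /* small illustrative size; cofactor expansion is O(N!) */

/* Determinant by cofactor expansion along the first row. */
static double det(int n, double m[N][N]) {
    if (n == 1) return m[0][0];
    double sum = 0.0, minor[N][N];
    for (int col = 0; col < n; col++) {
        /* build the (n-1)x(n-1) minor that skips row 0 and column 'col' */
        for (int i = 1; i < n; i++)
            for (int j = 0, k = 0; j < n; j++)
                if (j != col) minor[i - 1][k++] = m[i][j];
        sum += (col % 2 ? -1.0 : 1.0) * m[0][col] * det(n - 1, minor);
    }
    return sum;
}

int main(void) {
    double A[N][N] = {{2, 1, -1}, {-3, -1, 2}, {-2, 1, 2}};
    double b[N]    = {8, -11, -3};
    double dA = det(N, A);

    for (int i = 0; i < N; i++) {
        double Ai[N][N];
        /* A_i(b): copy A and replace column i with b */
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                Ai[r][c] = (c == i) ? b[r] : A[r][c];
        printf("x%d = %g\n", i + 1, det(N, Ai) / dA);
    }
    return 0;
}
```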

6 Chio’s matrix condensation  Let D denote the (n-1)×(n-1) matrix obtained by replacing each element a_i,j (i, j ≥ 2) with the 2×2 determinant |a_1,1 a_1,j; a_i,1 a_i,j|; then |A| = |D| / a_1,1^(n-2)  a_1,1 cannot be 0  If a_1,1 is 1 then a_1,1^(n-2) = 1 and no division is needed  Recursive determinant calculation  O(N³) computational complexity
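A minimal C sketch of determinant evaluation by repeated condensation (again illustrative, not the authors' SSE-optimized code). It assumes the pivot never becomes zero, matching the restriction noted above; MAXN and the 4×4 test matrix are arbitrary choices for the example.

```c
#include <stdio.h>
#include <math.h>

#define MAXN 8

/* Determinant via repeated Chio condensation: each pass replaces the
 * (n-1)x(n-1) lower-right block with 2x2 determinants formed against the
 * pivot a[0][0], and the final result is divided by a[0][0]^(n-2).
 * Assumes the pivot is never zero (the restriction noted on the slide). */
static double chio_det(int n, double a[MAXN][MAXN]) {
    double scale = 1.0;  /* accumulates the 1 / a00^(n-2) factors */
    while (n > 2) {
        double p = a[0][0];
        double d[MAXN][MAXN];
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                d[i - 1][j - 1] = p * a[i][j] - a[0][j] * a[i][0];
        scale /= pow(p, n - 2);
        n--;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i][j] = d[i][j];
    }
    double det2 = (n == 2) ? a[0][0] * a[1][1] - a[0][1] * a[1][0] : a[0][0];
    return scale * det2;
}

int main(void) {
    double A[MAXN][MAXN] = {
        { 2,  1, -1, 0},
        {-3, -1,  2, 1},
        {-2,  1,  2, 3},
        { 1,  0,  1, 2},
    };
    printf("det = %g\n", chio_det(4, A));  /* prints -1 */
    return 0;
}
```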

7 Highlights of the approach  Chio’s condensation combined with Cramer’s rule results in O(N⁴)  Goal: remain at O(N³)  Retain attractive parallel-processing potential  Solution: clever bookkeeping to reduce computations  “Mirror” the matrix before applying condensation  Each copy solves for half of the unknowns  Condense each until the matrix size matches its number of unknowns, then mirror the matrices again

8 Mirroring the matrix (figure: worked numerical example matrix)  9 unknowns to solve for

9 Mirroring the matrix (cont’d) (figure: the example matrix shown next to its mirrored copy)

10 Mirroring the matrix (cont’d) (figure)  5 unknowns to solve for in one copy, 4 unknowns to solve for in the other

11 Mirroring the matrix (cont’d) (figure, repeated build of the previous slide)  5 unknowns to solve for, 4 unknowns to solve for

12 Chio’s matrix condensation (figure: the worked example matrix)

13 Chio’s matrix condensation (cont’d) (figure: the first 2×2 determinant of the condensation evaluates to 0)

14 Chio’s matrix condensation (cont’d) (figure: the next 2×2 determinant evaluates to -6)

15 Chio’s matrix condensation (cont’d) (figure: condensed entries filled in step by step)

16 Chio’s matrix condensation (cont’d) (figure: further condensed entries)

17 Chio’s matrix condensation (cont’d) (figure: the matrix after a full condensation pass)

18 Chio’s matrix condensation (cont’d) (figure)  The value in the a_1,1 position cannot be zero

19 Mirroring the matrix (figure: the condensed, smaller example matrix)

20 Mirroring the matrix (cont’d) (figure: the smaller matrix shown next to its mirrored copy)

21 Chio’s matrix condensation (figure: both copies condensed again)  3 unknowns to solve for in one copy, 2 unknowns to solve for in the other

22 Applying Cramer’s rule (figure: the final condensed matrices and the columns selected for the determinants)

23 Applying Cramer’s rule (cont’d) (figure)  The ratio of the two determinants gives the answer for x_9: x_9 = 3

24 Applying Cramer’s rule (cont’d) (figure)  The ratio of the two determinants gives the answer for x_8: 180 / 2688 ≈ 0.07

25 Overview of data flow structure  Mirroring of the matrix keeps the algorithm at O(N³)  The original N×N matrix is mirrored; each copy is condensed from N to N/2, mirrored again into images of size N/2, condensed to N/4, and so on until the matrices are small enough to apply Cramer’s rule directly  Example flow: 24 variables → 12 variables → 6 variables → 3×3 matrices via Chio’s condensation
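A rough operation count shows why this flow stays at O(N³) while re-condensing the full matrix for every unknown would be O(N⁴). The short C program below is a back-of-the-envelope estimate of my own, not taken from the slides; the cost model and the choice N = 1024 are arbitrary.

```c
#include <stdio.h>

/* Rough multiplication count for condensing an m x m matrix down to size
 * 'stop' with Chio's method: each step from size s to s-1 forms (s-1)^2
 * two-by-two determinants (~2 multiplications each). */
static double condense_cost(int m, int stop) {
    double ops = 0.0;
    for (int s = m; s > stop; s--)
        ops += 2.0 * (s - 1) * (s - 1);
    return ops;
}

int main(void) {
    int n = 1024;

    /* Naive Cramer + Chio: one full condensation per unknown -> O(N^4). */
    double naive = (double)n * condense_cost(n, 1);

    /* Mirrored scheme: the matrix is mirrored into two copies before each
     * condensation round, and each copy is condensed only to half its size
     * before being mirrored again. */
    double mirrored = 0.0;
    for (int size = n, copies = 2; size > 1; size /= 2, copies *= 2)
        mirrored += copies * condense_cost(size, size / 2);

    printf("naive    ~ %.3g multiplications\n", naive);
    printf("mirrored ~ %.3g multiplications (stays O(N^3))\n", mirrored);
    return 0;
}
```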

26 Parallel computations  Similar to LU decomposition (access by rows)  Broadcast communication only  Send-ahead on lead-row values  Mirroring provides an advantage: the algorithm mirrors as the matrix reduces in size, so the load is naturally redistributed among processors  LU decomposition needs blocking and interleaving to avoid idle processors, which leads to complex communication patterns (overhead)  Figure 9.2: Parallel Scientific Computing in C++ and MPI, George Em Karniadakis and Robert M. Kirby II
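As an illustration of the “broadcast communication only” pattern, a hypothetical MPI fragment follows: the rank that owns the lead (pivot) row broadcasts it and every rank updates its own rows. The variable names, sizes, and data layout are invented, and the mirror step and the actual condensation update are elided.

```c
#include <mpi.h>
#include <stdlib.h>

/* Hypothetical sketch of the broadcast-only pattern: the rank holding the
 * current lead (pivot) row broadcasts it; every rank then condenses the
 * rows it owns. Matrix distribution and the update loop are omitted. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int n = 1024;                          /* illustrative matrix size */
    double *lead_row = malloc(n * sizeof(double));
    int owner = 0;                         /* rank that owns the pivot row */

    if (rank == owner) {
        /* ... copy the pivot row out of the locally stored rows ... */
    }

    /* Regular, broadcast-only communication: no pairwise exchanges. */
    MPI_Bcast(lead_row, n, MPI_DOUBLE, owner, MPI_COMM_WORLD);

    /* ... each rank: d[i][j] = pivot*a[i][j] - lead_row[j]*a[i][0] ... */

    free(lead_row);
    MPI_Finalize();
    return 0;
}
```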

27 Paradigm shift – key points  Apply Cramer’s rule  Employ matrix condensation for efficient determinant calculations  Highly parallel O(N³) process  Clever bookkeeping to re-use information  Final result: O(N³) computation with O(N²) communication  Key advantage: regular communication patterns with low communication overhead and a balanced processing load

28 Implementation results  Trial platform: single-core Pentium M @ 1.5 GHz, 64 KB L1 cache, 1 MB L2 cache  Coded in C, with SSE used for the core function (Chio’s condensation)  Memory access optimized using cache blocking  Double-precision variables and calculations  Result: consistently ~2.4x slower than Matlab
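For illustration, the cache blocking mentioned above might be applied to the inner Chio update roughly as in the sketch below; the tile size, matrix size, and test data are placeholders, and this is not the authors' SSE-optimized code.

```c
#include <stdio.h>

#define N     1024
#define BLOCK 64   /* tile edge; chosen so a few tiles fit in L1/L2 */

static double a[N][N], d[N][N];

/* Illustrative cache-blocked form of one Chio condensation step: the
 * (n-1)x(n-1) result is produced tile by tile so the pivot row a[0][*],
 * the pivot column a[*][0], and the output tile stay cache-resident. */
static void chio_step_blocked(int n) {
    double p = a[0][0];
    for (int ib = 1; ib < n; ib += BLOCK)
        for (int jb = 1; jb < n; jb += BLOCK)
            for (int i = ib; i < n && i < ib + BLOCK; i++)
                for (int j = jb; j < n && j < jb + BLOCK; j++)
                    d[i - 1][j - 1] = p * a[i][j] - a[0][j] * a[i][0];
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (i == j) ? 2.0 : 1.0 / (i + j + 1);  /* arbitrary data */
    chio_step_blocked(N);
    printf("d[0][0] = %f\n", d[0][0]);
    return 0;
}
```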

29 Challenges ahead  Further code improvement/optimization (the current L2 miss rate is high)  Precision improvement  Parallel implementation  GPU implementation  Distributed-architecture implementation  Sparse-matrix optimization  Other linear algebra applications (e.g., matrix inversion)

30 Thank you

