Accelerating generalized Cholesky decomposition using multiple processors
C.C.Tscherning & M.Veicherts, University of Copenhagen, Jan. 2008
Application in Least-Squares Collocation
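For reference, here is the standard least-squares collocation (LSC) estimate in our own notation. The symbols are assumptions and may differ from those on the slide: x is the observation vector, C_xx the signal covariance of the observations, D the noise covariance, and C_sx the covariances between the quantity s to be predicted and the observations. Factorizing the observation covariance matrix reduces the estimate to triangular solves:

```latex
\hat{s} = C_{sx}\,\bar{C}^{-1}x, \qquad \bar{C} = C_{xx} + D, \qquad
\bar{C} = LL^{T} \;\Rightarrow\; \hat{s} = \left(L^{-1}C_{xs}\right)^{T}\left(L^{-1}x\right)
```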
Error-covariance estimation
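In the usual LSC notation, the error covariance of the predicted quantities can be written as below. This is a sketch in our own symbols: C_ss is the covariance of the quantities s to be predicted, C_sx = C_xs^T their covariance with the observations, and \bar{C} = LL^T the factorized covariance matrix of the observations (signal plus noise). The same triangular factor L used for the solution also yields the error covariance:

```latex
E = C_{ss} - C_{sx}\,\bar{C}^{-1}C_{xs}
  = C_{ss} - \left(L^{-1}C_{xs}\right)^{T}\left(L^{-1}C_{xs}\right)
```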
Cholesky Factorization
L: lower triangular matrix
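The following is a minimal pure-Python sketch of the plain, column-by-column Cholesky factorization C = L Lᵀ of a symmetric positive definite matrix C. The function name `cholesky` and the loop order are our own choices and are not taken from the authors' Fortran code.

```python
import math


def cholesky(c):
    """Return lower-triangular L with L L^T == c (c symmetric positive definite)."""
    n = len(c)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal element: l_jj = sqrt(c_jj - sum_{k<j} l_jk^2)
        s = c[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Elements below the diagonal in column j
        for i in range(j + 1, n):
            L[i][j] = (c[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

For example, `cholesky([[4.0, 2.0], [2.0, 3.0]])` gives `[[2.0, 0.0], [1.0, 1.414...]]`, that is, l₂₂ = √2.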
Generalized Cholesky
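Our reading of "generalized" Cholesky is an assumption. It is based on the conclusion that the method covers both the solution and the error covariances. Under this reading, the observation covariance matrix C is bordered by the covariances c_xs with a quantity to be predicted and by the right-hand side x, and the elimination runs only over the first n pivots. The trailing block left over is the Schur complement, and it holds both the error variance and (with a sign change) the prediction. The names `partial_cholesky` and `collocate` are hypothetical. The sketch handles a single predicted quantity.

```python
import math


def partial_cholesky(a, n):
    """Eliminate the first n pivots of the symmetric matrix a (list of lists).

    For a = [[C, B], [B^T, D]] the returned trailing block is the Schur
    complement D - B^T C^-1 B; only the lower triangle is updated.
    """
    m = len(a)
    a = [row[:] for row in a]            # work on a copy
    for k in range(n):
        pivot = math.sqrt(a[k][k])
        for i in range(k, m):
            a[i][k] /= pivot             # column k of the factor (incl. border)
        for j in range(k + 1, m):        # update the lower triangle to the right
            for i in range(j, m):
                a[i][j] -= a[i][k] * a[j][k]
    t = m - n                            # return trailing block as full symmetric
    return [[a[n + max(p, q)][n + min(p, q)] for q in range(t)] for p in range(t)]


def collocate(c, c_xs, c_ss, x):
    """LSC prediction and error variance from one bordered factorization.

    c: n x n observation covariance (signal + noise); c_xs: covariances between
    the observations and the signal s; c_ss: variance of s; x: observations.
    """
    n = len(c)
    a = [c[i] + [c_xs[i], x[i]] for i in range(n)]   # border with c_xs and x
    a.append(c_xs + [c_ss, 0.0])
    a.append(x + [0.0, 0.0])
    s = partial_cholesky(a, n)
    error_variance = s[0][0]             # c_ss - c_sx C^-1 c_xs
    prediction = -s[0][1]                # c_sx C^-1 x
    return prediction, error_variance
```

With C = [[4, 2], [2, 3]], c_xs = [2, 1], c_ss = 2 and x = [1, 2], `collocate` returns `(0.5, 1.0)`. These match C_sx C⁻¹ x and C_ss − C_sx C⁻¹ C_xs computed directly.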
More Generalized Cholesky
Parallelization
Once the diagonal element has been computed, each remaining element in the row can be reduced independently; hence each processor can take care of one column.
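The following sketch illustrates the dependency structure described on this slide. It computes the upper-triangular factor R (with RᵀR = C, so R = Lᵀ) row by row. In each row, the diagonal is computed serially, and the off-diagonal elements are then spread over a pool of workers, one column each. The function name and the use of Python's `ThreadPoolExecutor` are our assumptions. In CPython, threads will not speed up this pure-Python arithmetic; a compiled implementation would get real parallel speed-up from the same structure.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cholesky_rowwise_parallel(c, nproc=4):
    """Upper-triangular R with R^T R = c, computed row by row.

    Within row i only r_ii must be finished first; each r_ij (j > i) then
    depends only on rows 0..i-1 and r_ii, so columns go to different workers.
    """
    n = len(c)
    r = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        for i in range(n):
            # Serial step: the diagonal element of row i.
            r[i][i] = math.sqrt(c[i][i] - sum(r[k][i] ** 2 for k in range(i)))

            def reduce_element(j, i=i):
                # Independent step: one off-diagonal element (one column j).
                return (c[i][j] - sum(r[k][i] * r[k][j] for k in range(i))) / r[i][i]

            cols = range(i + 1, n)
            # Consuming pool.map waits for every column, so row i is complete
            # before row i + 1 starts.
            for j, value in zip(cols, pool.map(reduce_element, cols)):
                r[i][j] = value
    return r
```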
Blockwise factorization
Should one row be factorized at a time, or should the factorization be carried out on blocks of elements? Out-of-core factorization is needed for large matrices, so we let the processors work on blocked matrices.
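The following is a pure-Python sketch of a right-looking blocked Cholesky factorization with block size `bs`. Each pass does three things: it factors one diagonal block, solves for the blocks below it, and updates the remaining trailing blocks. The operations on different tiles within steps 2 and 3 are independent and could be given to different processors. In an out-of-core setting, each tile would be read from and written back to disk. The name `blocked_cholesky` and the right-looking variant are our choices, not necessarily those of NES_MP.

```python
import math


def blocked_cholesky(c, bs):
    """Lower-triangular L with L L^T = c, computed block column by block column."""
    n = len(c)
    a = [row[:] for row in c]
    for lo in range(0, n, bs):
        hi = min(lo + bs, n)
        # 1) Factor the diagonal block a[lo:hi, lo:hi].
        for j in range(lo, hi):
            a[j][j] = math.sqrt(a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j)))
            for i in range(j + 1, hi):
                a[i][j] = (a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))) / a[j][j]
        # 2) Triangular solve for the blocks below the diagonal block.
        for i in range(hi, n):
            for j in range(lo, hi):
                a[i][j] = (a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))) / a[j][j]
        # 3) Update the trailing lower triangle with this block column.
        for i in range(hi, n):
            for j in range(hi, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, hi))
    # Keep the lower triangle only.
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

The block size only changes the grouping of the arithmetic, not the result. Choosing it trades the number of independent tile tasks against the work per task.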
Block division: column-wise and rectangular blocks
[Figure: the elements c11 to c66 of a 6 × 6 matrix, divided (left) into 3 'column-wise' 1-dim. blocks of size 9 and (right) into 3 rectangular 2-dim. blocks of size 3 × 3.]
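As a small illustration of the rectangular division, the sketch below lists the b × b tiles that cover the lower triangle of an n × n symmetric matrix, together with the element names c_ij (1-based). For n = 6 and b = 3 it gives the 3 rectangular blocks mentioned on the slide. The helper names `lower_tiles` and `tile_labels` are hypothetical. The sketch does not attempt the column-wise 1-dim. division, whose exact layout cannot be recovered from the figure.

```python
def lower_tiles(n, b):
    """(row_range, col_range) of every b x b tile on or below the diagonal."""
    nb = (n + b - 1) // b                  # number of block rows/columns
    tiles = []
    for bi in range(nb):
        for bj in range(bi + 1):           # only tiles on or below the diagonal
            rows = range(bi * b, min((bi + 1) * b, n))
            cols = range(bj * b, min((bj + 1) * b, n))
            tiles.append((rows, cols))
    return tiles                           # nb * (nb + 1) // 2 tiles in total


def tile_labels(rows, cols):
    """Element names c_ij (1-based) of one tile."""
    return [[f"c{i + 1}{j + 1}" for j in cols] for i in rows]
```

For example, the second tile of `lower_tiles(6, 3)` contains c41 to c63.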
Block-size tests
[Figure: two panels, NEQ = 10000 with Nproc = 4, and NEQ = 20000 with Nproc = 2.]
Parallelization
[Figure: flowchart of the Cholesky factorization with NES_MP and related subroutine(s).]
Parallelization Results
Performance test on two PCs (compiler PGF90): GOCE (4 × 3 GHz, 2 GB) and IKOS (4 × 2.66 GHz, 4 GB).
[Table: timings of NES and NES_MP on GOCE and IKOS for PROC = 1, 2, 4 and NEQ = 6400, 8100, 10000.]
Integration in GEOCOL18
Geocol integration tests: timing (in s) for equation solving only.

Server    NEQ   Geocol17a   Geocol18zr
                            1 proc.   2 proc.   4 proc.
GOCE     5000     370         80        46        24
GOCE    10000    2971        851       630       354
GOCE    20000    9464       4249      2844      2081

IKOS: 23 s (NEQ = 5000) and 330 s (NEQ = 10000); only one timing is given per size.
Performance Increase
Conclusion
Generalized Cholesky factorization enables the use of parallelization for both the solution and the error-covariance computation. The time gained by parallelization depends on the number of processors, the block size, and how busy the computer is with other tasks.
Note: further use of multiprocessing
Evaluation of spherical harmonic series (N. Pavlis et al.).
Establishing the normal-equation matrix or computing a column of covariances.
Factorization may start as soon as a row of blocks has been established.
This gives realistic speeds for LSC applications (minutes instead of days).
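The point that factorization may start as soon as a row of blocks has been established can be sketched as a producer/consumer pipeline. A producer thread establishes the lower triangle one block row at a time. The main thread factors each block row as it arrives. This works because row i of L needs only row i of the matrix and the rows of L already computed. The callable `cov(i, j)` stands in for whatever establishes the matrix elements and is hypothetical, as is the function name. Queue and threads come from the Python standard library.

```python
import math
import queue
import threading


def pipelined_cholesky(n, b, cov):
    """Row-oriented Cholesky that starts on block row k as soon as it exists.

    cov(i, j) (hypothetical) returns matrix element (i, j) for j <= i.
    Returns L as a list of rows, L[i] = [l_i0, ..., l_ii].
    """
    nblocks = (n + b - 1) // b
    ready = queue.Queue()

    def producer():
        for k in range(nblocks):
            lo, hi = k * b, min((k + 1) * b, n)
            rows = [[cov(i, j) for j in range(i + 1)] for i in range(lo, hi)]
            ready.put((lo, rows))        # block row k is now established

    worker = threading.Thread(target=producer)
    worker.start()
    L = []
    for _ in range(nblocks):
        lo, rows = ready.get()           # wait only for the next block row
        for offset, row in enumerate(rows):
            i = lo + offset
            li = []
            for j in range(i):
                # l_ij = (c_ij - sum_{k<j} l_ik l_jk) / l_jj
                li.append((row[j] - sum(li[k] * L[j][k] for k in range(j))) / L[j][j])
            li.append(math.sqrt(row[i] - sum(v * v for v in li)))
            L.append(li)
    worker.join()
    return L
```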