
1 Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix
Makoto Yamashita @ Tokyo-Tech
Katsuki Fujisawa @ Chuo University
Mituhiro Fukuda @ Tokyo-Tech
Yoshiaki Futakata @ University of Virginia
Kazuhiro Kobayashi @ National Maritime Research Institute
Masakazu Kojima @ Tokyo-Tech
Kazuhide Nakata @ Tokyo-Tech
Maho Nakata @ RIKEN
ISMP 2009 @ Chicago [2009/08/26]

2 Extremely Large SDPs
- Arising from various fields: Quantum Chemistry, Sensor Network Problems, Polynomial Optimization Problems
- Most of the computation time is spent on the Schur complement matrix (SCM)
- [SDPARA] Parallel computation for the SCM
- In particular, for a sparse SCM

3 Outline
1. SemiDefinite Programming and the Schur complement matrix
2. Parallel Implementation
3. Parallel computation for the sparse Schur complement matrix
4. Numerical Results
5. Future work

4 Standard form of SDP
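The slide's equations were not captured in the transcript; the standard primal-dual pair is usually written as follows (notation assumed here and reused in the later sketches):

```latex
% Standard primal-dual SDP pair over symmetric matrices,
% with A \bullet B = \mathrm{trace}(A^{\mathsf T} B).
\begin{align*}
\text{(P)}\quad & \min_{X}\; C \bullet X
  && \text{s.t. } A_k \bullet X = b_k \ (k = 1,\dots,m),\ X \succeq O,\\
\text{(D)}\quad & \max_{y,\,Z}\; \sum_{k=1}^{m} b_k y_k
  && \text{s.t. } \sum_{k=1}^{m} y_k A_k + Z = C,\ Z \succeq O .
\end{align*}
```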

5 Primal-Dual Interior-Point Methods
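The slide body was not captured in the transcript. As a reminder of the generic framework, a primal-dual interior-point method follows the central path, i.e., the solutions of the perturbed optimality conditions below, driving the parameter mu to 0 and computing a Newton-type search direction at each iteration (notation as above, assumed):

```latex
% Central path conditions followed by the primal-dual interior-point method.
\begin{align*}
A_k \bullet X = b_k \ (k = 1,\dots,m), \qquad
\sum_{k=1}^{m} y_k A_k + Z = C, \qquad
X Z = \mu I, \qquad X \succeq O,\ Z \succeq O .
\end{align*}
```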

6 Computation for the Search Direction
- Schur complement matrix ⇒ Cholesky factorization
- Exploitation of sparsity in: 1. ELEMENTS  2. CHOLESKY
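A hedged sketch of what ELEMENTS and CHOLESKY stand for (the exact formula depends on the chosen search direction; the form below is the HKM-type one used in the SDPA family, with notation as above):

```latex
% Reduced (Schur complement) system for the search-direction component dy:
\begin{align*}
B \, dy = r, \qquad
B_{pq} = A_p \bullet \bigl( X A_q Z^{-1} \bigr), \quad p, q = 1, \dots, m .
\end{align*}
% ELEMENTS = evaluating all entries B_{pq};
% CHOLESKY = factorizing the positive definite matrix B to solve for dy.
```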

7 Bottlenecks on a Single Processor
Apply parallel computation to the bottlenecks.
Time in seconds on an Opteron 246 (2.0 GHz):

            LiOH            HF
m           10592           15018
ELEMENTS    6150  ( 43%)    16719 ( 35%)
CHOLESKY    7744  ( 54%)    20995 ( 44%)
TOTAL       14250 (100%)    47483 (100%)

8 SDPARA
- SDPA parallel version (generic SDP solver)
- MPI & ScaLAPACK
- Row-wise distribution for ELEMENTS
- Parallel Cholesky factorization for CHOLESKY
http://sdpa.indsys.chuo-u.ac.jp/sdpa/

9 Row-wise distribution for evaluating the Schur complement matrix
- Example: 4 CPUs are available
- Each CPU computes only its assigned rows
- No communication between CPUs
- Efficient memory management
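A minimal sketch of the row-wise scheme, assuming a plain cyclic assignment of rows to processors; eval_element() is a hypothetical stand-in for the actual element formula, not SDPARA's code:

```cpp
// Sketch of row-wise ELEMENTS evaluation: each MPI rank owns whole rows of B,
// so the evaluation needs no inter-processor communication.
#include <mpi.h>
#include <cstdio>
#include <vector>

static double eval_element(int p, int q) {           // toy stand-in for B_pq
  return (p == q) ? 2.0 : 1.0 / (1 + p + q);
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int myrank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const int m = 8;                     // size of the Schur complement matrix
  // Cyclic assignment: row p is computed by processor p % nprocs.
  for (int p = myrank; p < m; p += nprocs) {
    std::vector<double> row(m - p);
    for (int q = p; q < m; ++q)        // B is symmetric: upper triangle only
      row[q - p] = eval_element(p, q);
    std::printf("rank %d computed row %d (%zu entries)\n", myrank, p, row.size());
  }

  MPI_Finalize();
  return 0;
}
```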

10 Parallel Cholesky factorization
- We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix
- We redistribute the matrix from the row-wise distribution to a two-dimensional block-cyclic distribution
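A minimal sketch of the target layout, assuming ScaLAPACK's usual two-dimensional block-cyclic convention (block size nb on an nprow x npcol process grid, zero-based indices); it only shows which process owns a given entry, while the actual redistribution in SDPARA goes through ScaLAPACK/BLACS routines:

```cpp
// Which process owns global entry (i, j) under a 2D block-cyclic layout.
#include <cstdio>

struct Grid { int nprow, npcol, nb; };

static void owner(const Grid& g, int i, int j, int& prow, int& pcol) {
  prow = (i / g.nb) % g.nprow;   // block row index, wrapped over process rows
  pcol = (j / g.nb) % g.npcol;   // block column index, wrapped over process cols
}

int main() {
  Grid g{2, 2, 64};              // 2x2 process grid, 64x64 blocks (assumed sizes)
  int pr, pc;
  owner(g, 130, 70, pr, pc);     // an example entry of the Schur complement matrix
  std::printf("entry (130,70) lives on process (%d,%d)\n", pr, pc);
  return 0;
}
```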

11 Computation time on an SDP from Quantum Chemistry [LiOH]
(AIST super cluster: Opteron 246 (2.0 GHz), 6 GB memory/node)

12 Scalability on an SDP from Quantum Chemistry [NF]
Speed-ups: Total 29x, ELEMENTS 63x, CHOLESKY 39x.
Parallelization of ELEMENTS is very effective.

13 Sparse Schur complement matrix
- The Schur complement matrix becomes very sparse for some applications ⇒ the simple row-wise distribution loses its efficiency
- SCM density: from Control Theory 100%, from a Sensor Network problem 2.12%

14 Sparsity of the Schur complement matrix
- Many applications yield a diagonal block structure

15 Exploitation of Sparsity in SDPA
- We choose among the formulas F1, F2, F3 row by row
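All three formulas evaluate the same quantity and differ only in how much of the product X A_beta Z^{-1} they form. A hedged reconstruction (notation as above): F1 forms the dense product, while an F3-type evaluation touches only the nonzeros of the data matrices:

```latex
% All three formulas compute B_{\alpha\beta} = A_\alpha \bullet (X A_\beta Z^{-1}).
% F3-type evaluation, restricted to the nonzeros of A_\alpha and A_\beta:
\begin{align*}
B_{\alpha\beta}
  = \sum_{(i,j)\,:\,(A_\alpha)_{ij}\neq 0} (A_\alpha)_{ij}
    \sum_{(k,l)\,:\,(A_\beta)_{kl}\neq 0} X_{ik}\,(A_\beta)_{kl}\,(Z^{-1})_{lj}.
\end{align*}
% SDPA estimates the cost of each formula and picks the cheapest one per row.
```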

16 ELEMENTS for the sparse Schur complement
[Figure: example sparse SCM with estimated per-row costs 150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3]
Load on each CPU — CPU1: 190, CPU2: 185, CPU3: 188
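A minimal sketch of one way to obtain such balanced loads: estimate each row's cost from its nonzero pattern and assign rows greedily to the least-loaded processor. This is only an illustration using the slide's example costs, not SDPARA's exact scheme; it yields the same balanced totals (190/188/185, up to which CPU gets which).

```cpp
// Greedy cost-based row assignment for a sparse Schur complement matrix.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  // Per-row cost estimates taken from the slide's example.
  std::vector<double> row_cost = {150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3};
  const int ncpu = 3;
  std::vector<double> load(ncpu, 0.0);
  std::vector<int> owner(row_cost.size());

  // Process heavy rows first, each going to the currently least-loaded CPU.
  std::vector<int> order(row_cost.size());
  for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return row_cost[a] > row_cost[b]; });

  for (int r : order) {
    int cpu = static_cast<int>(
        std::min_element(load.begin(), load.end()) - load.begin());
    owner[r] = cpu;
    load[cpu] += row_cost[r];
  }
  for (int c = 0; c < ncpu; ++c)
    std::printf("CPU%d load = %.0f\n", c + 1, load[c]);
  return 0;
}
```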

17 CHOLESKY for the sparse Schur complement
- Parallel sparse Cholesky factorization as implemented in MUMPS
- MUMPS adopts the multifrontal method
- Memory storage on each processor should be contiguous; the distribution used for ELEMENTS matches this method.
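A hedged sketch of driving MUMPS from C/C++, following the pattern of the c_example shipped with MUMPS; struct field names vary slightly between MUMPS versions (e.g. nz vs. nnz), so treat this as an outline rather than SDPARA's actual interface code:

```cpp
// Factorize and solve with a toy 3x3 SPD matrix via the MUMPS C interface.
#include <mpi.h>
#include <cstdio>
#include "dmumps_c.h"

#define ICNTL(i) icntl[(i)-1]        // MUMPS control arrays are 1-based (Fortran style)

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int myrank;
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

  DMUMPS_STRUC_C id;
  id.comm_fortran = -987654;         // MUMPS' "use MPI_COMM_WORLD" magic value
  id.par = 1;                        // host participates in the factorization
  id.sym = 1;                        // symmetric positive definite
  id.job = -1;                       // initialize the MUMPS instance
  dmumps_c(&id);

  // Toy SPD matrix in coordinate format (lower triangle), defined on the host.
  MUMPS_INT irn[] = {1, 2, 3, 2};
  MUMPS_INT jcn[] = {1, 2, 3, 1};
  double a[]      = {4.0, 3.0, 2.0, 1.0};
  double rhs[]    = {1.0, 1.0, 1.0};
  if (myrank == 0) {
    id.n = 3;  id.nnz = 4;           // field is named "nz" in older MUMPS versions
    id.irn = irn;  id.jcn = jcn;  id.a = a;  id.rhs = rhs;
  }
  id.ICNTL(1) = -1; id.ICNTL(2) = -1; id.ICNTL(3) = -1;   // silence diagnostics

  id.job = 6;                        // analysis + factorization + solve
  dmumps_c(&id);
  if (myrank == 0)
    std::printf("x = (%g, %g, %g)\n", rhs[0], rhs[1], rhs[2]);

  id.job = -2;                       // free MUMPS internal data
  dmumps_c(&id);
  MPI_Finalize();
  return 0;
}
```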

18 Computation time for SDPs from Polynomial Optimization Problems
(tsubasa cluster: Xeon E5440 (2.83 GHz), 8 GB memory/node)
Parallel sparse Cholesky achieves mild scalability; ELEMENTS attains a 24x speed-up on 32 CPUs.

19 ELEMENTS load balance on 32 CPUs
- Only the first processor has slightly heavier computation.

20 Automatic selection of sparse / dense SCM Cholesky
- Dense parallel Cholesky achieves higher scalability than sparse parallel Cholesky, so dense becomes better when many processors are used.
- We estimate the computation time of both from their computational cost and scalability (see the sketch below).
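A minimal sketch of the kind of decision rule described here; every constant and cost model below is an illustrative assumption, not the estimator actually used in SDPARA:

```cpp
// Choose sparse vs. dense parallel Cholesky for the SCM from rough
// time estimates: (serial cost) / (assumed parallel efficiency * #CPUs).
#include <cstdio>

static double dense_estimate(double m, int ncpu) {
  double flops = m * m * m / 3.0;          // dense Cholesky ~ m^3/3 flops
  double efficiency = 0.9;                 // assumption: dense ScaLAPACK scales well
  return flops / (1e9 * efficiency * ncpu);
}

static double sparse_estimate(double factor_flops, int ncpu) {
  double efficiency = 0.5;                 // assumption: sparse Cholesky scales less well
  return factor_flops / (1e9 * efficiency * ncpu);
}

int main() {
  double m = 16450;              // SCM dimension (sensor-network-sized example)
  double sparse_flops = 2.0e11;  // assumed output of the symbolic factorization
  for (int ncpu : {1, 4, 16, 64}) {
    bool use_dense = dense_estimate(m, ncpu) < sparse_estimate(sparse_flops, ncpu);
    std::printf("%2d CPUs -> %s Cholesky\n", ncpu, use_dense ? "dense" : "sparse");
  }
  return 0;
}
```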

21 Sparse/dense CHOLESKY for a small SDP from POP
(tsubasa cluster: Xeon E5440 (2.83 GHz), 8 GB memory/node)
Only on 4 CPUs did the automatic selection fail (the scalability of sparse Cholesky is unstable on 4 CPUs).

22 Numerical Results
- Comparison with PCSDP: Sensor Network Problems generated by SFSDP
- Multi-threading: Quantum Chemistry

23 SDPs from Sensor Networks (time unit: seconds)

#sensors = 1,000 (m = 16,450; density 1.23%)
#CPU      1      2      4      8      16
SDPARA    28.2   22.1   16.7   13.8   27.3
PCSDP     M.O.   1527   887    591    368

#sensors = 35,000 (m = 527,096)
#CPU      1      2      4      8      16
SDPARA    1080   845    614    540    506
PCSDP     memory over if #sensors >= 4,000

(M.O. = memory over)

24 MPI + multi-threading for Quantum Chemistry
N.4P.DZ.pqgt11t2p (m = 7230), time in seconds
64x speed-up on 16 nodes x 8 threads
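A minimal sketch of the hybrid pattern (MPI ranks across nodes, threads within a node), assuming OpenMP as the threading layer; eval_element() is again a toy stand-in, not SDPARA's code:

```cpp
// Hybrid ELEMENTS: MPI ranks own rows, OpenMP threads share the columns of
// each row. Compile with an MPI C++ compiler and -fopenmp.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

static double eval_element(int p, int q) {            // toy stand-in for B_pq
  return (p == q) ? 2.0 : 1.0 / (1 + p + q);
}

int main(int argc, char** argv) {
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
  int myrank, nprocs;
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const int m = 64;
  for (int p = myrank; p < m; p += nprocs) {           // MPI: cyclic row ownership
    std::vector<double> row(m - p);
    #pragma omp parallel for schedule(dynamic)         // threads: columns of row p
    for (int q = p; q < m; ++q)
      row[q - p] = eval_element(p, q);
  }
  if (myrank == 0)
    std::printf("done with %d ranks x %d threads\n", nprocs, omp_get_max_threads());
  MPI_Finalize();
  return 0;
}
```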

25 Concluding Remarks & Future Work
1. New parallel schemes for the sparse Schur complement matrix
2. Reasonable scalability
3. Toward extremely large-scale SDPs with a sparse Schur complement matrix
Future work: improvement of multi-threading for the sparse Schur complement matrix

