SDPA: Leading-edge Software for SDP
INFORMS '08
Tokyo Institute of Technology: Makoto Yamashita, Mituhiro Fukuda, Masakazu Kojima, Kazuhide Nakata
Chuo University: Katsuki Fujisawa
National Maritime Research Institute: Kazuhiro Kobayashi
RIKEN: Maho Nakata

2 SDPA (SemiDefinite Programming Algorithm) Project
- Open-source software for solving SemiDefinite Programming (SDP) problems
- Since the 1st release in 1995, it has maintained high quality
- The latest version, SDPA 7, was released in 2008 and has been updated continuously
- A family of related software packages extends its capabilities

3 SDPA Family
- SDPA
- SDPARA (Parallel with MPI)
- SDPA-C (Matrix Completion)
- SDPA-M (Matlab Interface)
- SDPA-GMP (Multiple Precision)
- SDPARA-C (accessible on the SDPA Online Solver as a Web service)

4 Outline of this talk
1. SDP and the improvements of SDPA 7
2. Parallel computation with MPI
3. Multiple precision
4. Online Solver
5. Future works

5 SDPA
- Basic software of the SDPA family
- Primal-Dual Interior-Point Method
- Mehrotra-type predictor-corrector
- Exploitation of sparsity

6 Standard form of SDP
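A LaTeX sketch of the standard form, following the convention of the SDPA documentation (data: symmetric matrices F_0, ..., F_m and a cost vector c in R^m; treat the exact typesetting as a reconstruction):

\begin{align*}
\text{(P)} \quad & \min_{x \in \mathbb{R}^m} \ \sum_{i=1}^{m} c_i x_i
  \quad \text{s.t.} \quad X = \sum_{i=1}^{m} F_i x_i - F_0, \quad X \succeq 0, \\
\text{(D)} \quad & \max_{Y} \ F_0 \bullet Y
  \quad \text{s.t.} \quad F_i \bullet Y = c_i \ (i = 1, \dots, m), \quad Y \succeq 0,
\end{align*}

where $A \bullet B = \mathrm{trace}(A^{\top} B)$ and $\succeq 0$ denotes positive semidefiniteness.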

7 Applications of SDP
- Control theory: Lyapunov conditions
- Combinatorial optimization: Max Cut, the theta function
- Quantum chemistry: reduced density matrices
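For instance, the well-known Max Cut relaxation (an illustration, not taken from the slide) fits the standard form above: for a graph with Laplacian matrix $L$,

\[
  \max \ \tfrac{1}{4} L \bullet X \quad \text{s.t.} \quad X_{ii} = 1 \ (i = 1, \dots, n), \quad X \succeq 0.
\]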

8 Primal-Dual Interior-Point Methods
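A LaTeX sketch of the central-path conditions that primal-dual interior-point methods follow, in the standard-form notation above (a reconstruction, not necessarily the slide's exact formulation):

\[
  X = \sum_{i=1}^{m} F_i x_i - F_0, \qquad F_i \bullet Y = c_i \ (i = 1, \dots, m), \qquad X Y = \mu I, \qquad X \succ 0, \ Y \succ 0.
\]

The method generates iterates that approximately follow this path while driving $\mu \to 0$; the Mehrotra-type predictor-corrector chooses how aggressively $\mu$ is reduced at each iteration.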

9 Computation for the Search Direction
- Schur complement matrix ⇒ Cholesky factorization
- Exploitation of sparsity when forming the Schur complement matrix
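A LaTeX sketch of this step (the element formula below is the standard one for the HRVW/KSH/M search direction used in SDPA-type solvers; treat it as a reconstruction): each iteration solves a linear system whose coefficient matrix is the Schur complement matrix B,

\[
  B \, \Delta x = r, \qquad B_{ij} = \bigl( X^{-1} F_i Y \bigr) \bullet F_j \quad (i, j = 1, \dots, m).
\]

Since $B$ is symmetric positive definite, the system is solved by Cholesky factorization; exploiting the sparsity of the data matrices $F_i$ when forming $B$ is the key to efficiency.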

10 Exploitation of Sparsity in the Schur Complement Matrix
- Minimize the number of numerical operations with proper formulations

11 SDPA 7
SDPA 7 resolves bottlenecks of SDPA 6:
- Introduce sparse Cholesky factorization for the Schur complement matrix
- Adopt a new data structure
- Reduce memory space for temporary variables
- Introduce a configure script for easier installation

12 Sparsity Pattern of the Schur Complement Matrix
- Fully dense Schur complement matrix vs. sparse Schur complement matrix
- Minimum degree ordering to minimize the number of fill-ins

13 New Data Structure
For block-diagonal structures with many blocks:
- SDPA 6: stores the nonzero elements of each block, but stores all blocks
- SDPA 7: stores the nonzero elements of each block, and stores only the nonzero blocks
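A minimal C++ sketch of the idea (hypothetical type and field names, not the actual SDPA 7 source): keep only the blocks that contain nonzeros, and within each kept block keep only the nonzero entries.

#include <vector>

// Hypothetical illustration of the SDPA 7 storage idea:
// a block-diagonal matrix keeps only its nonzero blocks,
// and each stored block keeps only its nonzero entries.
struct SparseBlock {
    int blockIndex;               // position of this block in the block-diagonal structure
    std::vector<int> rowIndex;    // row indices of nonzero entries within the block
    std::vector<int> colIndex;    // column indices of nonzero entries within the block
    std::vector<double> value;    // the nonzero values themselves
};

struct BlockDiagonalMatrix {
    int numBlocks;                      // total number of diagonal blocks
    std::vector<SparseBlock> blocks;    // only blocks containing nonzeros are stored
};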

14 r2S_broydenTri300.dat-s (Xeon 2.80GHz, 2GB memory, Linux 2.4)

                   SDPA 6      SDPA 7
Computing B        1009.0s     0.8s
Cholesky of B      5179.5s     0.5s
Total CPU time     6204.4s     2.8s
Input data         90MB        1MB
Matrix B           272MB       4MB
Dense matrices     7MB         3MB
Total memory       380MB       19MB

Key improvements: sparse Schur complement, new data structure, efficient temporary variables.

15 Configure script
Easier installation:
$ ./configure --with-blas="-lblas" --with-lapack="-llapack"
$ make
$ make install
We can link with an optimized BLAS, e.g., ATLAS, GotoBLAS, or Intel MKL.

16 Matlab Interface
SDPA-M is the Matlab interface:
[mDIM,nBLOCK,bLOCKsTRUCT,c,F] = read_data('example1.dat-s');
[objVal,x,X,Y,INFO] = sdpam(mDIM,nBLOCK,bLOCKsTRUCT,c,F,OPTION);
SeDuMi input interface:
[At,b,c,K] = fromsdpa('example1.dat-s');
[x,y,info] = sedumiwrap(At,b,c,K,[],pars);
- The current version supports only LP and SDP cones
- Parameter control is based on SDPA

17 Extremely Large Problems and Bottlenecks
The largest size requires 8.6GB of memory.
We replace these two bottlenecks (forming the Schur complement matrix and its Cholesky factorization) with parallel computation.
Opteron, 6GB memory

18 Exploitation of Sparsity in SDPA
- We evaluate the Schur complement matrix row-wise, choosing among the formulas F1, F2, F3 for each row
- We keep this scheme in parallel computation

19 Row-wise Distribution for Evaluation of the Schur Complement Matrix
If 4 CPUs are available, each CPU computes only its assigned rows.
- No communication between CPUs
- Efficient memory management
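A hedged C++/MPI sketch of the scheme (the cyclic row assignment and the function computeRowOfB are illustrative assumptions, not the actual SDPARA code):

#include <mpi.h>
#include <vector>

// Hypothetical placeholder: evaluate one row of the Schur complement matrix B.
static void computeRowOfB(int rowIndex, std::vector<double>& row) {
    // ... one of the formulas F1/F2/F3 would be applied here ...
    (void)rowIndex; (void)row;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int m = 1000;                 // size of B (example value)
    std::vector<double> row(m);

    // Each process evaluates only the rows assigned to it (cyclic assignment
    // assumed here); no inter-process communication is needed in this phase.
    for (int i = rank; i < m; i += nprocs) {
        computeRowOfB(i, row);
        // rows are kept locally; redistribution happens before the parallel Cholesky
    }

    MPI_Finalize();
    return 0;
}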

20 Parallel Cholesky Factorization
- We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix
- We redistribute the matrix from the row-wise distribution to a two-dimensional block-cyclic distribution

21 Computation Time for NH3 (Opteron 880, 2.4GHz, 32GB memory/node)

22 Scalability for LiF (Opteron 880, 2.4GHz, 32GB memory/node)
- Total: 28-fold speedup
- Computing B: 43-fold
- Cholesky of B: 46-fold
Row-wise distribution for computing B is very effective.

23 Matrix Completion
- Structural sparsity in the primal variable
- Nakata-Fujisawa-Fukuda-Kojima-Murota [2003]: positive definite matrix completion, chordal graphs, sparse Cholesky factorization
- SDPA-C = SDPA + Completion
- SDPARA-C = SDPARA + Completion

24 Aggregate Sparsity Pattern
- The aggregate sparsity pattern is enough for evaluating the objective function and the equality constraints
- All elements are required for the positive semidefiniteness condition
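In LaTeX, with the standard-form data $F_0, \dots, F_m$, the aggregate sparsity pattern is usually defined as (a reconstruction of the standard definition, not necessarily the slide's notation):

\[
  E = \bigl\{ (i, j) : (F_k)_{ij} \neq 0 \ \text{for some } k \in \{0, 1, \dots, m\} \bigr\}.
\]

Only the entries of the matrix variable indexed by $E$ appear in the objective function and the equality constraints, whereas the positive semidefiniteness condition involves every entry.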

25 Positive Definite Matrix Completion
Assign values to the elements outside the aggregate sparsity pattern so that the matrix becomes positive definite.
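One standard way to state this step in LaTeX (a sketch following the matrix completion literature, not necessarily the slide's exact formula): given the entries $\bar{X}_{ij}$ of the variable on the aggregate pattern $E$, take the maximum-determinant positive definite completion

\[
  \widehat{X} = \operatorname*{arg\,max} \bigl\{ \det X : X_{ij} = \bar{X}_{ij} \ \text{for } (i, j) \in E, \ X \succ 0 \bigr\},
\]

whose inverse $\widehat{X}^{-1}$ has zeros outside $E$; when $E$ is extended to a chordal pattern, $\widehat{X}^{-1}$ admits a Cholesky factorization whose fill-in stays within that pattern.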

26 Multiple Precision
- SDPA uses 'double' precision: 53 significant bits (about 16 decimal digits), but in practice only about 8 digits of accuracy are attainable
- SDPA 7 result (gpp124-1 from SDPLIB):
  Objective function (only 5 digits agree): e+00 (Primal), e+00 (Dual)
  Feasibility: e-12 (Primal), e-07 (Dual)
- Some applications require more accuracy

27 SDPA-GMP
- GMP: GNU Multiple Precision arithmetic library
- Arbitrary fixed precision
- 'double' precision arithmetic is replaced by GMP arithmetic
- Ultra-high accuracy, at the cost of long computation times

28 Ultra Accuracy of SDPA-GMP

29 Comparison of SDPA and SDPA-GMP (384-bit) on gpp124-1 (SDPLIB)

                      SDPA-GMP (7.1.0)              SDPA (7.1.0)
Relative gap          e-26                          e-07
Objective function    e+00 (Primal), e+00 (Dual)    e+00 (Primal), e+00 (Dual)
Feasibility           e-57 (Primal), e-29 (Dual)    e-12 (Primal), e-07 (Dual)
Computation time      sec. (59 iterations)          0.14 sec. (20 iterations)

30 SDPA Online Solver
The SDPA Online Solver offers SDPA/SDPARA/SDPARA-C via the Internet.
Workflow: 1. Input from the user via the Internet interface → 2. Ninf-G → 3. SDPARA on a PC cluster → 4. Solution returned to the user.

31 To Use the Online Solver
- Users without a parallel environment can use SDPARA/SDPARA-C
- No charge
- Registration via the Internet is required, so that passwords protecting users' data are generated automatically
- Access the SDPA Project home page [SDPA Online for your future.]

32 Online Solver Interface

33 Online Solver Usage

34 Conclusion
- The latest version, SDPA 7, attains higher performance than version 6
- The parallel solver enables us to solve extremely large SDPs
- Matrix completion is useful for structural sparsity
- SDPA-GMP generates ultra-high-accuracy solutions
- The Online Solver provides powerful computational resources via the Internet

35 Future Works
- Callable library of SDPA 7
- Automatic selection among SDPA/SDPARA/SDPARA-C