Generic Compressed Matrix Insertion
Peter Gottschling (SmartSoft/TUD), Dag Lindbo (Kungliga Tekniska Högskolan)
SmartSoft, TU Dresden


1 Generic Compressed Matrix Insertion. Peter Gottschling (SmartSoft/TUD), Dag Lindbo (Kungliga Tekniska Högskolan). SmartSoft, TU Dresden. Peter.Gottschling@smartsoft-computing.com, Tel.: +49 (0) 351 463 34018

2 Overview: Software libraries (MTL4, FEniCS). Compressed sparse matrices. Insertion. Benchmarks. Vision.

3 Matrix Template Library 4: Generic library for high-performance numeric operations in mathematical notation. Many new techniques, such as implicit enable-if and meta-tuning. Most modern iterative solvers. Focus on high-performance simulation: FEM/XFEM/FVM/FDM. Commercial version in preparation. Parallel version in progress. Multi-core, GPU support, and multigrid in the near future.

4 Linear equation solver. Innovative product development through the finite element method (FEM).

    template < class LinearOperator, class HilbertSpaceX, class HilbertSpaceB,
               class Preconditioner, class Iteration >
    int cg(const LinearOperator& A, HilbertSpaceX& x, const HilbertSpaceB& b,
           const Preconditioner& M, Iteration& iter)
    {
        typedef typename mtl::Collection<HilbertSpaceX>::value_type Scalar;
        Scalar rho, rho_1, alpha, beta;
        HilbertSpaceX p(size(x)), q(size(x)), r(size(x)), z(size(x));

        r = b - A*x;                    // initial residual
        while (! iter.finished(r)) {
            z = solve(M, r);            // apply the preconditioner
            rho = dot(r, z);
            if (iter.first())
                p = z;
            else {
                beta = rho / rho_1;
                p = z + beta * p;       // new search direction
            }
            q = A * p;
            alpha = rho / dot(p, q);    // step length
            x += alpha * p;
            r -= alpha * q;
            rho_1 = rho;
            ++iter;
        }
        return iter;
    }

5 FEniCS: Free software for solving differential equations. FFC (FEniCS Form Compiler): a high-level math language for formulating differential equations; generates C++ code. DOLFIN (generic FEM kernel): a C++ library for FEM cores (assembler, mesh, and function abstraction) with interfaces to uBLAS, PETSc, Trilinos, and MTL4. The paper focuses on matrix assembly.

6 Compressed Sparse Row Format: Most common general-purpose sparse format. Entries are sorted. A kind of run-length encoding on rows.
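
To make the format concrete, here is a minimal sketch of a CSR container and an element lookup. The names `CsrMatrix`, `get`, and `example` are hypothetical and chosen for illustration; this is not MTL4's internal layout. The sketch assumes that columns are stored in ascending order within each row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container (illustration only).
struct CsrMatrix {
    std::size_t rows, cols;
    std::vector<std::size_t> row_start;  // rows + 1 offsets; row r occupies [row_start[r], row_start[r + 1])
    std::vector<std::size_t> col_index;  // column of each stored entry, ascending within a row
    std::vector<double> value;           // value of each stored entry
};

// Return A(r, c); positions that are not stored are zero.
double get(const CsrMatrix& A, std::size_t r, std::size_t c) {
    assert(r < A.rows && c < A.cols);
    const std::size_t* cols = A.col_index.data();
    const std::size_t* first = cols + A.row_start[r];
    const std::size_t* last = cols + A.row_start[r + 1];
    const std::size_t* it = std::lower_bound(first, last, c);  // binary search within the row
    if (it == last || *it != c) return 0.0;
    return A.value[static_cast<std::size_t>(it - cols)];
}

// A 3x5 matrix with entries (0,0)=1, (0,2)=2, (1,3)=3, (2,1)=4, (2,4)=5.
CsrMatrix example() {
    return CsrMatrix{3, 5, {0, 2, 3, 5}, {0, 2, 3, 1, 4}, {1.0, 2.0, 3.0, 4.0, 5.0}};
}
```

The `row_start` array is what makes the format resemble run-length encoding: instead of storing a row index for every entry, it stores only where each row begins.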

7 In-Flight Insertion: Very simple to use, like dense matrices. Simple realization, but extremely expensive: all following entries are changed, which leads to quadratic complexity. Example: A[0][1]= 6;
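
The cost of in-flight insertion can be illustrated with a small sketch that writes a new entry straight into the compressed arrays and reports how many stored entries had to be shifted. The type `Csr` and the function `set_in_flight` are hypothetical names for this illustration, not part of any of the libraries discussed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR arrays (illustration only).
struct Csr {
    std::vector<std::size_t> row_start;  // rows + 1 offsets
    std::vector<std::size_t> col_index;  // ascending within each row
    std::vector<double> value;
};

// Set A(r, c) = v directly in the compressed arrays.
// Returns the number of stored entries that had to be shifted to make room.
std::size_t set_in_flight(Csr& A, std::size_t r, std::size_t c, double v) {
    assert(r + 1 < A.row_start.size());
    const std::size_t* cols = A.col_index.data();
    const std::size_t* last = cols + A.row_start[r + 1];
    const std::size_t* it = std::lower_bound(cols + A.row_start[r], last, c);
    const std::size_t pos = static_cast<std::size_t>(it - cols);
    if (it != last && *it == c) {  // entry already stored: overwrite, nothing moves
        A.value[pos] = v;
        return 0;
    }
    const std::size_t shifted = A.col_index.size() - pos;
    const std::ptrdiff_t p = static_cast<std::ptrdiff_t>(pos);
    A.col_index.insert(A.col_index.begin() + p, c);  // moves every later entry by one slot
    A.value.insert(A.value.begin() + p, v);
    for (std::size_t k = r + 1; k < A.row_start.size(); ++k)
        ++A.row_start[k];                            // every later row now starts one slot later
    return shifted;
}
```

Inserting at position (0, 1) into a matrix with five stored entries shifts four of them. Since each new entry can move a number of entries proportional to the current matrix size, assembling a whole matrix this way grows quadratically.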

8 Two-phase Insertion: Dedicated insertion phase. The matrix is available after the insertion phase terminates. Later modification is impossible. Works for distributed matrices as well. Used in PETSc, where it includes construction of communication buffers for distributed SpMVP. Janus derives its name from it (two faces).

9 Inserter Concept in MTL4: An inserter is an object providing operations to set up other objects, e.g. matrices or vectors, efficiently. The insertion phase lasts as long as the inserter lives. Insert within a scope (block, function). The matrix is ready when the inserter is destroyed. Later insertion is possible with another inserter. Extends to distributed matrices and vectors. MTL4 inserters have minimal memory usage.
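
The scope-based lifetime described here can be sketched with a toy inserter whose destructor finalizes the matrix. This is a deliberately simplified stand-in: it buffers all entries and rebuilds the compressed arrays on destruction, which is not how MTL4 achieves its minimal memory usage. `ToyCsr` and `ToyInserter` are hypothetical names, and the sketch assumes every position is inserted at most once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical CSR matrix (illustration only).
struct ToyCsr {
    std::size_t rows;
    std::vector<std::size_t> row_start;  // rows + 1 offsets
    std::vector<std::size_t> col_index;
    std::vector<double> value;
};

// Toy inserter: the insertion phase lasts as long as this object lives.
class ToyInserter {
    struct Entry { std::size_t row, col; double val; };
    ToyCsr& A_;
    std::vector<Entry> pending_;
public:
    explicit ToyInserter(ToyCsr& A) : A_(A) {
        assert(A.row_start.size() == A.rows + 1);
        // Keep entries already stored, so a second inserter can extend the matrix.
        for (std::size_t r = 0; r < A.rows; ++r)
            for (std::size_t k = A.row_start[r]; k < A.row_start[r + 1]; ++k)
                pending_.push_back({r, A.col_index[k], A.value[k]});
    }
    ToyInserter(const ToyInserter&) = delete;
    ToyInserter& operator=(const ToyInserter&) = delete;

    void insert(std::size_t r, std::size_t c, double v) {
        assert(r < A_.rows);
        pending_.push_back({r, c, v});  // assumption: each (r, c) is inserted at most once
    }

    // Destruction ends the insertion phase and compresses the matrix.
    ~ToyInserter() {
        std::sort(pending_.begin(), pending_.end(), [](const Entry& a, const Entry& b) {
            return std::tie(a.row, a.col) < std::tie(b.row, b.col);
        });
        A_.row_start.assign(A_.rows + 1, 0);
        A_.col_index.clear();
        A_.value.clear();
        for (const Entry& e : pending_) {
            A_.col_index.push_back(e.col);
            A_.value.push_back(e.val);
            ++A_.row_start[e.row + 1];  // count entries per row
        }
        for (std::size_t r = 0; r < A_.rows; ++r)
            A_.row_start[r + 1] += A_.row_start[r];  // prefix sum turns counts into offsets
    }
};
```

Used inside a block, the matrix becomes valid at the closing brace, and a second inserter in a later block can add further entries.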

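10 Using Inserters

    #include <iostream>
    #include <boost/numeric/mtl/mtl.hpp>

    using namespace mtl;

    int main(int argc, char* argv[])
    {
        compressed2D<double> A(3, 5);
        {
            // Insertion phase: lasts until ins goes out of scope
            matrix::inserter<compressed2D<double> > ins(A);
            ins[0][0] << 1.0;
            ins[0][2] << 2.0;
            ins[1][3] << 3.0;
            ins[2][1] << 4.0;
            ins[2][4] << 5.0;
        }
        // The inserter is destroyed, so A is ready to use
        std::cout << "A is\n" << A << '\n';
        return 0;
    }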

11 Direct Insertion: Reserve s entries per row. Find the insert position by linear or binary search. Move the remainder of the row; this is linear in s, that is, constant. Example: A[0][1]= 6;
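
A minimal sketch of direct insertion, assuming a fixed number of reserved slots per row, might look as follows. `SlottedRows`, `Result`, and `insert_direct` are hypothetical names for illustration; the actual MTL4 implementation may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "stretched" storage with s reserved slots per row (illustration only).
struct SlottedRows {
    std::size_t slots;                   // s: reserved entries per row
    std::vector<std::size_t> used;       // occupied slots per row
    std::vector<std::size_t> col_index;  // rows * slots columns, sorted within the used part of a row
    std::vector<double> value;
    SlottedRows(std::size_t rows, std::size_t s)
        : slots(s), used(rows, 0), col_index(rows * s), value(rows * s) {}
};

enum class Result { updated, inserted, row_full };

// Set A(r, c) = v inside the reserved slots of row r, if there is room.
Result insert_direct(SlottedRows& A, std::size_t r, std::size_t c, double v) {
    assert(r < A.used.size());
    const std::size_t begin = r * A.slots, end = begin + A.used[r];
    std::size_t* cols = A.col_index.data();
    double* vals = A.value.data();
    // Binary search for the insert position within this row only.
    const std::size_t pos =
        static_cast<std::size_t>(std::lower_bound(cols + begin, cols + end, c) - cols);
    if (pos < end && cols[pos] == c) {
        vals[pos] = v;
        return Result::updated;
    }
    if (A.used[r] == A.slots) return Result::row_full;  // saturated row: needs another mechanism
    // Move the remainder of this row by one slot: at most s - 1 moves,
    // independent of the total matrix size.
    std::move_backward(cols + pos, cols + end, cols + end + 1);
    std::move_backward(vals + pos, vals + end, vals + end + 1);
    cols[pos] = c;
    vals[pos] = v;
    ++A.used[r];
    return Result::inserted;
}
```

Because only entries within one row are moved, other rows are never touched, which is the difference from in-flight insertion into fully compressed arrays.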

12 Indirect Insertion: For saturated rows, use a “spare” container: a std::map keyed by index pair. Logarithmic in the number of spare entries. Requires additional allocation. About 10 times slower than direct insertion. Example: A[0][4]= 7;
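
The spare container can be sketched as a `std::map` keyed by a (row, column) pair. Since `std::pair` compares lexicographically, the map iterates in row-major order, so the overflow entries of a row can be merged with its directly stored entries in one pass. The names `Index`, `Spare`, `RowEntry`, and `merge_row` are hypothetical, and the sketch assumes a column appears either in the direct part or in the spare map, never in both.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Index = std::pair<std::size_t, std::size_t>;  // (row, column)
using Spare = std::map<Index, double>;              // overflow entries of saturated rows

struct RowEntry { std::size_t col; double val; };

// Merge the sorted direct entries of row r with the spare entries of that row.
// Assumption: no column occurs in both inputs.
std::vector<RowEntry> merge_row(const std::vector<RowEntry>& direct,
                                const Spare& spare, std::size_t r) {
    std::vector<RowEntry> out;
    auto s = spare.lower_bound(Index{r, 0});            // first spare entry of row r
    const auto s_end = spare.lower_bound(Index{r + 1, 0});  // first spare entry of the next row
    std::size_t d = 0;
    while (d < direct.size() || s != s_end) {
        const bool take_direct =
            s == s_end || (d < direct.size() && direct[d].col < s->first.second);
        if (take_direct) {
            out.push_back(direct[d++]);
        } else {
            out.push_back({s->first.second, s->second});
            ++s;
        }
    }
    return out;
}
```

Each insertion into the map costs logarithmic time in the number of spare entries plus a node allocation, which is consistent with the slide's observation that this path is noticeably slower than direct insertion.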

13 Benchmark: Assemble a CRS matrix. Row order is important, as is the order within a row. Performance measure: number of non-zeros inserted per second. Reassembly. Three libraries: uBLAS (including vector-of-vector), MTL4, PETSc. Ordinary workstation (Intel). All benchmarks run through a simple interface routine for each library, e.g.:

    void insert_row(Matrix& A, int row_idx, int* cols_idx, double* a, int n)
    {
        for (int j = 0; j < n; j++)
            A(row_idx, cols_idx[j]) += a[j];
    }
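
The performance measure can be illustrated with a small timing harness around such an interface routine. Everything here besides the shape of `insert_row` is an assumption made for illustration: `MapMatrix` is a stand-in matrix type, `Rate` and `assemble_ascending` are hypothetical names, and the column pattern (a band starting at the diagonal, wrapping around) is invented, since the slides do not state which columns were used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Stand-in matrix (hypothetical): any type with a mutable operator()(row, col) works.
struct MapMatrix {
    std::map<std::pair<int, int>, double> data;
    double& operator()(int r, int c) { return data[std::make_pair(r, c)]; }
};

// Interface routine in the style of the slide, templated on the matrix type.
template <class Matrix>
void insert_row(Matrix& A, int row_idx, const int* cols_idx, const double* a, int n) {
    for (int j = 0; j < n; j++)
        A(row_idx, cols_idx[j]) += a[j];
}

struct Rate {
    std::size_t entries;  // non-zeros inserted
    double seconds;       // elapsed wall-clock time
    double per_second() const {
        return seconds > 0.0 ? static_cast<double>(entries) / seconds : 0.0;
    }
};

// Insert rows in ascending order and measure the elapsed time.
// Assumption: square matrix, columns (r + j) % rows for j = 0 .. nnz_per_row - 1.
template <class Matrix>
Rate assemble_ascending(Matrix& A, int rows, int nnz_per_row) {
    assert(rows > 0 && nnz_per_row > 0 && nnz_per_row <= rows);
    std::vector<int> cols(static_cast<std::size_t>(nnz_per_row));
    std::vector<double> vals(static_cast<std::size_t>(nnz_per_row), 1.0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < rows; ++r) {
        for (int j = 0; j < nnz_per_row; ++j)
            cols[static_cast<std::size_t>(j)] = (r + j) % rows;
        insert_row(A, r, cols.data(), vals.data(), nnz_per_row);
    }
    const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return Rate{static_cast<std::size_t>(rows) * static_cast<std::size_t>(nnz_per_row),
                dt.count()};
}
```

Calling `assemble_ascending(A, 10000, 5).per_second()` would then give a rate in entries per second for the stand-in type. Note that this sketch also times the generation of the column indices, so it is only a rough model of the measurement.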

14 Benchmark: Assembly rate with ascending rows (10,000 rows, 5 non-zeros/row). MTL4: 46 million entries per second. uBLAS: 5.9 million entries per second. uBLAS (gov): 2 million entries per second. PETSc: 22 million entries per second.

15 Benchmark: Assembly rate with ascending rows (100,000 rows, 50 non-zeros/row). MTL4: 29.6 million entries per second. uBLAS: 6.5 million entries per second. uBLAS (gov): 2.8 million entries per second. PETSc: 32.3 million entries per second.

16 Benchmark: Assembly rate with random rows (10,000 rows, 5 non-zeros/row). MTL4: 41.4 million entries per second. uBLAS: 31,300 entries per second. uBLAS (gov): 1.9 million entries per second. PETSc: 19.9 million entries per second.

17 Benchmark: Assembly rate with random rows (100,000 rows, 50 non-zeros/row). MTL4: 25.6 million entries per second. uBLAS: measurement abandoned. uBLAS (gov): 2.7 million entries per second. PETSc: 25.6 million entries per second.

18 Benchmark: Assembly rate with entirely random entries (10,000 rows, 5 non-zeros/row). MTL4: 4.8 million entries per second. uBLAS: 16,700 entries per second. uBLAS (gov): 1.8 million entries per second. PETSc: 15,900 entries per second.

19 Benchmark: Assembly rate with random rows (10,000 rows, 50 non-zeros/row). MTL4: 2.9 million entries per second. uBLAS: 3,340 entries per second. uBLAS (gov): 1.7 million entries per second. PETSc: 13,400 entries per second.

20 How to do Science in Silicon? (Diagram: Graphic application, CPU, GPU)

21 Scientific Software (Diagram: Scientific application, CPU, GPU, Multi-Core, Par. Arch., Scien. Proc.)

22 Conclusions: Introduced a new approach for setting and modifying compressed sparse matrices. Does not need a preparation phase. Minimal memory footprint. Optimal performance. Tuned block insertion in progress. Extends to distributed data structures.

