Presented by PLASMA (Parallel Linear Algebra for Scalable Multicore Architectures) ‏ The Innovative Computing Laboratory University of Tennessee Knoxville.

1 Presented by PLASMA (Parallel Linear Algebra for Scalable Multicore Architectures), The Innovative Computing Laboratory, University of Tennessee, Knoxville

2 Dongarra_PLASMA_SC07 Why Multicore?
The ILP Wall. Single core: ILP is expensive and does not generate much concurrency. Multicore: TLP provides higher concurrency.
The Memory Wall. Single core: latency is completely exposed. Multicore: latency can be hidden using multithreading.
The Power Wall. Single core: power consumption grows with MHz^3. Multicore: power consumption grows linearly with the number of transistors.

3 Programming for Multicores
Parallel software for multicores should have two characteristics:
fine granularity:
- a high degree of parallelism is needed
- cores are (and probably will be) associated with relatively small local memories
asynchronicity:
- a high degree of parallelism makes synchronizations a bigger bottleneck
- latency must be hidden

4 Developing Parallel Algorithms
[Figure: software stack diagrams built from the layers LAPACK, threaded BLAS, sequential BLAS, and PThreads/OpenMP, with labels marking the layer that provides the parallelism.]

5 Developing Parallel Algorithms: why?
BLAS2 operations cannot be efficiently parallelized because they are bandwidth bound.
- strict synchronizations
- poor parallelism
- poor scalability

6 Tiled Cholesky Factorization
In some cases it is possible to use the LAPACK algorithm by breaking the elementary operations into tiles.
Cholesky (the tile operands were images on the slide and are not preserved in this transcript):
  do DPOTF2 on [...]
  for all [...] do DTRSM on [...] end
  for all [...] do DGEMM on [...] end
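The loop structure above can be illustrated with a minimal pure-Python sketch. It is not PLASMA code: the helpers `potf2`, `trsm`, and `gemm` are simplified stand-ins for DPOTF2, DTRSM, and DGEMM, and the loop bounds and tile operands are the usual right-looking tiled Cholesky, which is an assumption because the slide's operands are missing. The helper names `to_tiles` and `lower_from_tiles` are hypothetical.

```python
import math

def potf2(a):
    """Unblocked Cholesky of one diagonal tile, in place (stand-in for DPOTF2)."""
    n = len(a)
    for j in range(n):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][p] ** 2 for p in range(j)))
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(j))) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # clear the strictly upper part of column j

def trsm(l, b):
    """Solve X * L^T = B in place for one tile B (stand-in for DTRSM)."""
    for r in range(len(b)):
        for c in range(len(l)):
            b[r][c] = (b[r][c] - sum(b[r][p] * l[c][p] for p in range(c))) / l[c][c]

def gemm(a, b, c):
    """Update C -= A * B^T for one tile C (stand-in for DGEMM)."""
    for r in range(len(c)):
        for q in range(len(c[0])):
            c[r][q] -= sum(a[r][p] * b[q][p] for p in range(len(a[0])))

def tiled_cholesky(t):
    """Right-looking tiled Cholesky on an nt x nt grid of square tiles, in place."""
    nt = len(t)
    for k in range(nt):
        potf2(t[k][k])                      # factor the diagonal tile
        for i in range(k + 1, nt):
            trsm(t[k][k], t[i][k])          # solve the panel below it
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                gemm(t[i][k], t[j][k], t[i][j])  # update the trailing tiles

def to_tiles(m, b):
    """Split a dense n x n matrix (list of rows) into b x b tiles; n must be a multiple of b."""
    nt = len(m) // b
    return [[[row[j * b:(j + 1) * b] for row in m[i * b:(i + 1) * b]]
             for j in range(nt)] for i in range(nt)]

def lower_from_tiles(t):
    """Assemble the dense lower-triangular factor from the lower tiles."""
    nt, b = len(t), len(t[0][0])
    n = nt * b
    return [[t[r // b][c // b][r % b][c % b] if c // b <= r // b else 0.0
             for c in range(n)] for r in range(n)]
```

A typical use would be `t = to_tiles(A, 2); tiled_cholesky(t); L = lower_from_tiles(t)` for a symmetric positive definite `A` whose order is a multiple of the tile size. Each kernel call touches only a few tiles, which is what makes the tasks fine-grained.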

7 Tiled LU Factorization
In many cases different algorithms are needed, which must be invented or can be found in the literature.
LU and QR:
Kernels shown on the slide (illustrations not preserved): DTSTRF, DGETRF, DGESSM, DSSSSM

8 Tiled LU Factorization
In many cases different algorithms are needed, which must be invented or can be found in the literature.
LU and QR:

9 Tiled LU Factorization
k=1: DGETRF
k=1, j=2: DGESSM
k=1, j=3: DGESSM
k=1, i=2: DTSTRF
k=1, i=2, j=2: DSSSSM
k=1, i=2, j=3: DSSSSM
k=1, i=3: DTSTRF
k=1, i=3, j=2: DSSSSM
k=1, i=3, j=3: DSSSSM
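The kernel sequence listed on this slide follows a regular loop nest. The sketch below enumerates it for any number of tile rows and columns; the function name `tiled_lu_tasks` is hypothetical, and extending the loop to steps k > 1 is an assumption based on the pattern shown for k=1. It only lists task names and indices and performs no arithmetic.

```python
def tiled_lu_tasks(nt):
    """Yield (kernel, indices) in sequential order, with 1-based indices as on the slide."""
    for k in range(1, nt + 1):
        yield ("DGETRF", (k,))                   # factor the diagonal tile
        for j in range(k + 1, nt + 1):
            yield ("DGESSM", (k, j))             # update the tiles to its right
        for i in range(k + 1, nt + 1):
            yield ("DTSTRF", (k, i))             # factor a tile below the diagonal
            for j in range(k + 1, nt + 1):
                yield ("DSSSSM", (k, i, j))      # update the rest of that tile row
```

For a 3 x 3 grid of tiles, the first nine tasks produced by `tiled_lu_tasks(3)` are the nine listed on the slide for k=1.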

10 Block Data Layout
Fine granularity may require novel data formats to overcome the limitations of BLAS on small chunks of data.
[Figure: column-major layout compared with block data layout.]
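The difference between the two layouts can be made concrete with an index mapping. The sketch below is an illustration, not the PLASMA storage format: it assumes square b x b tiles, column-major ordering of elements inside each tile, and column-major ordering of the tiles themselves. The function names are hypothetical.

```python
def col_major_index(i, j, n):
    """Offset of element (i, j) in an n x n column-major array."""
    return i + j * n

def block_index(i, j, n, b):
    """Offset of element (i, j) when the matrix is stored tile by tile.

    Assumes n is a multiple of b, tiles ordered column-major, and
    column-major order inside each tile, so every tile is contiguous.
    """
    nt = n // b
    tile = (j // b) * nt + (i // b)        # which tile
    inside = (j % b) * b + (i % b)         # position within the tile
    return tile * b * b + inside

def to_block_layout(a, n, b):
    """Copy a flat column-major array into block data layout."""
    out = [None] * (n * n)
    for j in range(n):
        for i in range(n):
            out[block_index(i, j, n, b)] = a[col_major_index(i, j, n)]
    return out
```

With this mapping each tile occupies b*b consecutive entries, so a kernel working on one tile reads a contiguous chunk of memory instead of b separate column segments.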

11 Graph Driven Asynchronous Execution
The whole factorization can be represented as a DAG:
- nodes: tasks that operate on tiles
- edges: dependencies among tasks
Tasks can be scheduled asynchronously and in any order as long as dependencies are not violated.
[Figure: task graph with node types DTSTRF, DGETRF, DGESSM, DSSSSM.]
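One way to obtain such a DAG is to record which tiles each task reads and writes and to add an edge whenever two tasks conflict on a tile. The sketch below does this for the tiled LU task list. The read and write sets per kernel are a conservative tile-level assumption, not taken from the slides, and all function names are hypothetical. The scheduler is a plain topological sort that picks any ready task.

```python
from collections import defaultdict

def lu_tasks_with_accesses(nt):
    """Return (name, reads, writes) per task; the tile sets are assumed, 0-based."""
    tasks = []
    for k in range(nt):
        tasks.append((f"DGETRF({k})", set(), {(k, k)}))
        for j in range(k + 1, nt):
            tasks.append((f"DGESSM({k},{j})", {(k, k)}, {(k, j)}))
        for i in range(k + 1, nt):
            tasks.append((f"DTSTRF({k},{i})", set(), {(k, k), (i, k)}))
            for j in range(k + 1, nt):
                tasks.append((f"DSSSSM({k},{i},{j})", {(i, k)}, {(k, j), (i, j)}))
    return tasks

def build_dag(tasks):
    """Map task index -> set of successor indices, from tile conflicts."""
    last_writer = {}
    readers = defaultdict(list)
    succ = defaultdict(set)
    for idx, (_, reads, writes) in enumerate(tasks):
        for tile in reads | writes:
            if tile in last_writer:               # read or write after a write
                succ[last_writer[tile]].add(idx)
        for tile in writes:
            for r in readers[tile]:               # write after earlier reads
                succ[r].add(idx)
            readers[tile] = []
            last_writer[tile] = idx
        for tile in reads:
            readers[tile].append(idx)
    return succ

def any_valid_order(n_tasks, succ):
    """Return one execution order that respects every edge (ready tasks taken LIFO)."""
    indeg = [0] * n_tasks
    for outs in succ.values():
        for u in outs:
            indeg[u] += 1
    ready = [t for t in range(n_tasks) if indeg[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for u in succ.get(t, ()):
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order
```

In this model the two DGESSM tasks of the first step have no edge between them, so they may run concurrently or in either order, while both must wait for the first DGETRF.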

12 Graph Driven Asynchronous Execution
Priorities: a critical path can be defined as the shortest path that connects all the nodes with the highest number of outgoing edges.
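A simple way to act on this idea is to give each task a priority equal to its number of outgoing edges and to always run the ready task with the highest priority. The sketch below shows that policy on a small hand-made graph. Both the policy and the example graph are illustrative assumptions, not the scheduler used in PLASMA, and the function name is hypothetical.

```python
import heapq

def priority_order(succ, n_tasks):
    """Topological order that prefers ready tasks with more outgoing edges.

    succ maps a task index to an iterable of successor indices.
    Ties are broken by the lower task index.
    """
    indeg = [0] * n_tasks
    for outs in succ.values():
        for u in outs:
            indeg[u] += 1
    out_deg = [len(succ.get(t, ())) for t in range(n_tasks)]
    # heapq is a min-heap, so the out-degree is stored negated
    heap = [(-out_deg[t], t) for t in range(n_tasks) if indeg[t] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, t = heapq.heappop(heap)
        order.append(t)
        for u in succ.get(t, ()):
            indeg[u] -= 1
            if indeg[u] == 0:
                heapq.heappush(heap, (-out_deg[u], u))
    return order

# Hypothetical 7-task graph used only for illustration.
example_dag = {0: [1, 2, 3], 1: [4], 2: [], 3: [4, 5], 4: [6], 5: [6]}
```

On `example_dag`, task 3 (two outgoing edges) is chosen before tasks 1 and 2 once task 0 has finished, because completing it releases more work.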

13 Fork-Join vs Asynchronous
[Figure: execution traces over time for fork-join and asynchronous scheduling, with idle time marked.]

14 Performance: Cholesky

15 Performance: QR

16 Performance: LU

17 Contacts
http://icl.cs.utk.edu/~buttari
http://www-math.cudenver.edu/~langou
http://icl.cs.utk.edu/~kurzak
http://netlib.org/utk/people/JackDongarra
http://icl.cs.utk.edu

