PLASMA (Parallel Linear Algebra for Scalable Multicore Architectures)
Presented by The Innovative Computing Laboratory, University of Tennessee, Knoxville

Slide 2: Why Multicore?
Footer: Dongarra_PLASMA_SC07 / Dongarra_KOJAK_SC07

The ILP Wall
- Single core: ILP is expensive and does not generate much concurrency.
- Multicore: TLP provides higher concurrency.

The Memory Wall
- Single core: latency is completely exposed.
- Multicore: latency can be hidden using multithreading.

The Power Wall
- Single core: power consumption grows with MHz^3.
- Multicore: power consumption grows linearly with the number of transistors.

Slide 3: Programming for Multicores

Parallel software for multicores should have two characteristics:

- Fine granularity:
  - a high degree of parallelism is needed
  - cores are (and probably will be) associated with relatively small local memories
- Asynchronicity:
  - a high degree of parallelism makes synchronizations a bigger bottleneck
  - latency has to be hidden

Slide 4: Developing Parallel Algorithms

[Figure: two software stacks. Labels on one: LAPACK, threaded BLAS, PThreads / OpenMP, parallelism. Labels on the other: LAPACK, sequential BLAS, PThreads / OpenMP, parallelism.]

Slide 5: Developing Parallel Algorithms: why?

BLAS2 operations cannot be efficiently parallelized because they are bandwidth bound.

- strict synchronizations
- poor parallelism
- poor scalability
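The bandwidth-bound claim can be made concrete by counting floating-point operations per byte of memory traffic. The sketch below is a back-of-the-envelope model, not a measurement: the function names are hypothetical, and it assumes double precision and that every operand moves between memory and the core exactly once.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision operands


def gemv_intensity(n):
    """Flops per byte for y = A x (a BLAS2 operation) with an n x n matrix."""
    flops = 2 * n * n                              # one multiply and one add per matrix entry
    traffic = (n * n + 2 * n) * BYTES_PER_DOUBLE   # read A and x, write y
    return flops / traffic


def gemm_intensity(n):
    """Flops per byte for C = A B (a BLAS3 operation) with n x n matrices."""
    flops = 2 * n ** 3                             # n multiply-adds per entry of C
    traffic = 3 * n * n * BYTES_PER_DOUBLE         # read A and B, write C
    return flops / traffic


for n in (100, 1000):
    print(n, round(gemv_intensity(n), 3), round(gemm_intensity(n), 1))
```

Under this model the matrix-vector product never exceeds 0.25 flops per byte however large n becomes, while the matrix-matrix product's ratio grows as n/12, so only the latter can keep many cores busy from a shared memory bus.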

Slide 6: Tiled Cholesky Factorization

In some cases it is possible to use the LAPACK algorithm by breaking the elementary operations into tiles.

Cholesky (tile operands and loop ranges were lost in extraction and are shown as ...):

    do DPOTF2 on ...
    for all ...
        do DTRSM on ...
    end
    for all ...
        do DGEMM on ...
    end
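A minimal sketch of how the three kernels named on this slide could be combined into a right-looking tiled Cholesky factorization. It is written in pure Python for readability rather than speed. The helper names (potf2, trsm, gemm, to_tiles, lower_from_tiles), the loop ranges, and the assumption of square tiles that divide the matrix evenly are illustrative choices, not the PLASMA implementation. The update of diagonal tiles, which a tuned library would more likely perform with a symmetric rank-k kernel, is folded into gemm here.

```python
import math


def potf2(a):
    """Unblocked Cholesky of one tile in place; the lower triangle becomes L."""
    n = len(a)
    for j in range(n):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][p] ** 2 for p in range(j)))
        for i in range(j + 1, n):
            a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(j))) / a[j][j]
        for i in range(j):
            a[i][j] = 0.0  # clear the strict upper triangle


def trsm(l, b):
    """Solve X * L^T = B for X, overwriting B; l is a lower-triangular tile."""
    for row in b:
        for c in range(len(l)):
            row[c] = (row[c] - sum(row[p] * l[c][p] for p in range(c))) / l[c][c]


def gemm(a, b, c):
    """Update C <- C - A * B^T on tiles."""
    for r in range(len(c)):
        for q in range(len(c[0])):
            c[r][q] -= sum(a[r][p] * b[q][p] for p in range(len(a[0])))


def tiled_cholesky(tiles):
    """Factor a symmetric positive definite matrix stored as a grid of tiles."""
    nt = len(tiles)
    for k in range(nt):
        potf2(tiles[k][k])                       # factor the diagonal tile
        for i in range(k + 1, nt):
            trsm(tiles[k][k], tiles[i][k])       # panel below the diagonal
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                gemm(tiles[i][k], tiles[j][k], tiles[i][j])  # trailing update


def to_tiles(m, nb):
    """Copy a dense matrix (list of rows) into an nt x nt grid of nb x nb tiles."""
    nt = len(m) // nb
    return [[[row[j * nb:(j + 1) * nb] for row in m[i * nb:(i + 1) * nb]]
             for j in range(nt)] for i in range(nt)]


def lower_from_tiles(tiles, nb):
    """Assemble the dense lower-triangular factor from the tile grid."""
    n = len(tiles) * nb
    out = [[0.0] * n for _ in range(n)]
    for i in range(len(tiles)):
        for j in range(i + 1):
            for r in range(nb):
                for c in range(nb):
                    out[i * nb + r][j * nb + c] = tiles[i][j][r][c]
    return out


A = [[4.0, 2.0, 2.0, 2.0],
     [2.0, 5.0, 3.0, 3.0],
     [2.0, 3.0, 6.0, 4.0],
     [2.0, 3.0, 4.0, 7.0]]
tiles = to_tiles(A, 2)
tiled_cholesky(tiles)
for row in lower_from_tiles(tiles, 2):
    print(row)
```

Each kernel call touches at most three tiles, which is what makes the individual calls small enough to be scheduled as independent tasks.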

Slide 7: Tiled LU Factorization

In many cases different algorithms are needed, which must be invented or can be found in the literature.

LU and QR:

[Figure: diagrams of the four kernels DTSTRF, DGETRF, DGESSM, DSSSSM]

Slide 9: Tiled LU Factorization

Kernel calls for k=1:

    k=1              DGETRF
    k=1, j=2         DGESSM
    k=1, j=3         DGESSM
    k=1, i=2         DTSTRF
    k=1, i=2, j=2    DSSSSM
    k=1, i=2, j=3    DSSSSM
    k=1, i=3         DTSTRF
    k=1, i=3, j=2    DSSSSM
    k=1, i=3, j=3    DSSSSM
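The call sequence on this slide is consistent with a simple loop nest over tile indices. The sketch below generates the kernel names and indices only; it does no arithmetic. The function name tiled_lu_tasks and the loop structure are inferred from the k=1 sequence listed on the slide and are assumptions, as is the extension to later values of k.

```python
def tiled_lu_tasks(nt):
    """Return the kernel calls, in sequential order, for an nt x nt grid of tiles.

    Indices are 1-based to match the slide.
    """
    tasks = []
    for k in range(1, nt + 1):
        tasks.append(("DGETRF", {"k": k}))                 # factor the diagonal tile
        for j in range(k + 1, nt + 1):
            tasks.append(("DGESSM", {"k": k, "j": j}))     # update tiles right of the diagonal
        for i in range(k + 1, nt + 1):
            tasks.append(("DTSTRF", {"k": k, "i": i}))     # couple diagonal tile with a tile below it
            for j in range(k + 1, nt + 1):
                tasks.append(("DSSSSM", {"k": k, "i": i, "j": j}))  # update the trailing tiles
    return tasks


for name, idx in tiled_lu_tasks(3):
    label = ", ".join(f"{key}={val}" for key, val in idx.items())
    print(f"{label:<16}{name}")
```

For a 3 x 3 grid of tiles, the first nine entries reproduce the k=1 sequence shown on the slide.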

Slide 10: Block Data Layout

Fine granularity may require novel data formats to overcome the limitations of BLAS on small chunks of data.

[Figure: column-major layout compared with block data layout]
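One way to picture the difference between the two layouts is as a reordering of a flat array so that each tile occupies a contiguous run of memory. In the sketch below the function name colmajor_to_bdl is hypothetical, and both the ordering of tiles (column by column) and the ordering of elements inside a tile (column-major) are assumptions, since the slide's figure is not available.

```python
def colmajor_to_bdl(a, n, nb):
    """Reorder an n x n column-major flat list into contiguous nb x nb tiles.

    Assumes nb divides n. Element (i, j) of the input lives at a[j * n + i].
    """
    nt = n // nb
    out = []
    for tj in range(nt):              # tile column
        for ti in range(nt):          # tile row
            for c in range(nb):       # column inside the tile
                for r in range(nb):   # row inside the tile
                    out.append(a[(tj * nb + c) * n + (ti * nb + r)])
    return out


flat = list(range(16))                # a 4 x 4 matrix stored column-major
print(colmajor_to_bdl(flat, 4, 2))
```

After the conversion, a kernel working on one tile reads a single contiguous block instead of nb separate column segments strided by n.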

Slide 11: Graph Driven Asynchronous Execution

The whole factorization can be represented as a DAG:

- nodes: tasks that operate on tiles
- edges: dependencies among tasks

Tasks can be scheduled asynchronously and in any order as long as dependencies are not violated.

[Figure: DAG with nodes labeled DTSTRF, DGETRF, DGESSM, DSSSSM]
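The idea that tasks may run in any order that respects the DAG can be sketched in a few lines. The example below is an illustration under stated assumptions rather than PLASMA's scheduler: it uses the tiled Cholesky kernels instead of the LU kernels in the figure because their tile access patterns are simple to state, it derives edges from hypothetical read and write sets attached to each task, and it runs tasks one at a time in a randomly chosen valid order instead of on real threads. All function names are invented for the sketch.

```python
import random
from collections import defaultdict


def cholesky_tasks(nt):
    """Sequential task list as (name, tiles read, tiles written)."""
    tasks = []
    for k in range(nt):
        tasks.append((f"POTF2({k})", set(), {(k, k)}))
        for i in range(k + 1, nt):
            tasks.append((f"TRSM({i},{k})", {(k, k)}, {(i, k)}))
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                tasks.append((f"GEMM({i},{j},{k})", {(i, k), (j, k)}, {(i, j)}))
    return tasks


def build_dag(tasks):
    """Add an edge u -> v when v must wait for u because they share a tile."""
    succ = defaultdict(set)
    last_writer = {}
    readers = defaultdict(list)
    for v, (_, reads, writes) in enumerate(tasks):
        for tile in reads | writes:
            if tile in last_writer:
                succ[last_writer[tile]].add(v)   # v needs the value u produced
        for tile in writes:
            for u in readers[tile]:
                succ[u].add(v)                   # v must not overwrite what u still reads
            readers[tile] = []
            last_writer[tile] = v
        for tile in reads:
            readers[tile].append(v)
    return succ


def run_async(tasks, succ, pick):
    """Execute tasks in any order allowed by the DAG; pick chooses among ready tasks."""
    indegree = [0] * len(tasks)
    for u in list(succ):
        for v in succ[u]:
            indegree[v] += 1
    ready = [v for v in range(len(tasks)) if indegree[v] == 0]
    order = []
    while ready:
        v = ready.pop(pick(len(ready)))
        order.append(v)
        for w in succ.get(v, ()):
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return order


tasks = cholesky_tasks(3)
dag = build_dag(tasks)
rng = random.Random(0)
for v in run_async(tasks, dag, lambda n: rng.randrange(n)):
    print(tasks[v][0])
```

Swapping the pick function changes the scheduling policy without touching the dependency logic, which is where task priorities would plug in.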

Slide 12: Graph Driven Asynchronous Execution

Priorities: a critical path can be defined as the shortest path that connects all the nodes with the highest number of outgoing edges.

Slide 13: Fork-Join vs Asynchronous

[Figure: execution timelines for fork-join and asynchronous scheduling; axis: Time; marked regions: Idle time]

Slide 14: Performance: Cholesky

Slide 15: Performance: QR

Slide 16: Performance: LU

Slide 17: Contacts

http://icl.cs.utk.edu/~buttari
http://www-math.cudenver.edu/~langou
http://icl.cs.utk.edu/~kurzak
http://netlib.org/utk/people/JackDongarra
http://icl.cs.utk.edu
