
1 High Performance Computing: The GotoBLAS Library

2 HPC: numerical libraries
- Many numerically intensive applications make use of specialty libraries to perform common operations:
  - Linear algebra operators (e.g., dot products, matrix-vector multiplies)
  - Fast Fourier transforms
  - Linear solvers
- To maximize application performance (and throughput), we want these libraries to be highly optimized for each computer architecture
- One commonly used numerical library is BLAS:
  - Contains routines that provide standard building blocks for performing basic vector and matrix operations
  - Commonly used in scientific and engineering software and graphics processing
  - “High-profile” since it is used with the Linpack benchmark, which is used to rank the fastest supercomputers in the world (Top 500 list)
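To make the "building blocks" on this slide concrete, here is a minimal plain-Python sketch of two of the operations named above, a dot product and a matrix-vector multiply. It only illustrates what the operations compute; the function names `dot` and `matvec` are hypothetical and are not the BLAS interface, and a tuned library would compute the same results far faster.

```python
# Reference semantics only: plain-Python versions of two operations that BLAS
# standardizes. The function names are illustrative, not the BLAS API.

def dot(x, y):
    """Dot product of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(a, x):
    """Matrix-vector product y = A x, with A given as a list of rows."""
    return [dot(row, x) for row in a]


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
a = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]

print(dot(x, y))     # 1*4 + 2*5 + 3*6 = 32.0
print(matvec(a, x))  # [1*1 + 2*3, 1*2] = [7.0, 2.0]
```

Applications call operations like these through a fixed interface, which is why swapping in a faster implementation does not require changing the application code.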

3 HPC: GotoBLAS
- GotoBLAS is an implementation of the BLAS library developed by TACC researcher Kazushige Goto.
- Kazushige has been called “the Michael Jordan of high-performance linear algebra kernels.”
- Software is designed for all common chipset architectures, including:
  - Power 4, Power 5
  - Opteron
  - Blue Gene/L
  - Pentium 4/Xeon (32-bit and 64-bit)
  - Itanium 2

4 HPC: GotoBLAS
- Most vendors provide their own BLAS implementation:
  - Significant development overhead incurred for new architectures
  - Large code base with many switching branches based on input sizing
- Kazushige’s approach uses a simplified model:
  - No major context switching
  - Functions separated based on performance impact
    - Non-performance-critical parts written in C
    - Crucial performance kernels written in assembly
  - GotoBLAS tries to minimize the amount of assembler code
    - Actual assembler code is really small
    - Easy to improve and debug
- Benefit: It takes only 3 to 7 days to develop a tuned BLAS for a new architecture
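The separation described on this slide can be sketched as a small hot kernel surrounded by ordinary driver code. The following is an illustrative Python sketch under stated assumptions, not GotoBLAS source: the names `kernel` and `matmul_blocked` and the block size are hypothetical, and both parts are written in Python here purely for readability. In the design described above, the driver role would be played by C and the kernel role by a short assembly routine.

```python
def kernel(a, b, c, i0, i1, j0, j1, k0, k1):
    """Hot inner loop: C[i0:i1, j0:j1] += A[i0:i1, k0:k1] * B[k0:k1, j0:j1].
    This stands in for the small, architecture-specific part."""
    for i in range(i0, i1):
        for k in range(k0, k1):
            aik = a[i][k]
            for j in range(j0, j1):
                c[i][j] += aik * b[k][j]


def matmul_blocked(a, b, block=2):
    """Driver: checks shapes, allocates C, and walks over blocks.
    This bookkeeping stands in for the portable, non-critical part."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for k0 in range(0, k, block):
            for j0 in range(0, n, block):
                kernel(a, b, c,
                       i0, min(i0 + block, m),
                       j0, min(j0 + block, n),
                       k0, min(k0 + block, k))
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_blocked(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

With this split, porting to a new architecture would mean rewriting only the kernel, which is one plausible reading of why a tuned port can be produced in days.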

5 GotoBLAS DGEMM performance

Architecture    Efficiency
Itanium2        98.9%
PPC440 FP2      98.2%
Alpha 21264     96.5%
POWER5          96.2%
Pentium4        95.7%
Opteron         92.8%
PPC970MP        92.0%
SPARC IV        92.0%

Efficiency indicates the ratio of observed performance to the maximum theoretical value.
DGEMM is one of the most widely used BLAS functions; it performs matrix-matrix multiplies.
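As a worked example of the efficiency ratio defined above, the sketch below computes a theoretical peak and an efficiency figure. The helper names are hypothetical. The assumption of 4 floating-point operations per cycle for a 1.9 GHz POWER5 core is mine, not stated in the slides, although it is consistent with the 7600 MFlops top of the y-axis on the POWER5 chart in slide 6. The flop count of 2·m·n·k for a matrix-matrix multiply is the usual convention (one multiply and one add per inner-product term).

```python
def dgemm_flops(m, n, k):
    """Floating-point operations to multiply an m-by-k matrix by a k-by-n
    matrix: one multiply and one add per inner-product term."""
    return 2 * m * n * k


def observed_mflops(flops, seconds):
    """Achieved rate in MFlops for a run that took `seconds`."""
    return flops / seconds / 1e6


def peak_mflops(clock_mhz, flops_per_cycle):
    """Theoretical peak in MFlops for one core."""
    return clock_mhz * flops_per_cycle


def efficiency(observed, peak):
    """Ratio of observed performance to the theoretical maximum."""
    return observed / peak


# Assumption: a 1900 MHz core retiring 4 flops per cycle.
peak = peak_mflops(1900, 4)
print(peak)  # 7600

# An observed rate of 7311.2 MFlops corresponds to the 96.2% in the table.
print(f"{efficiency(7311.2, peak):.1%}")  # 96.2%

# Hypothetical timing: a 2000 x 2000 x 2000 multiply finishing in 2.5 s.
rate = observed_mflops(dgemm_flops(2000, 2000, 2000), 2.5)
print(f"{rate:.0f} MFlops, {efficiency(rate, peak):.1%} of peak")
```

The 2.5 second timing is made up for illustration; only the 96.2% figure comes from the table.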

6 Example GotoBLAS comparisons
[Figure: DGEMM performance on POWER5 1.9 GHz. x-axis: matrix size (0 to 2000); y-axis: MFlops (0 to 7600); series: GOTO, ESSL, ATLAS]

7 HPC: GotoBLAS
- In April 2006, TACC released the latest version of GotoBLAS:
  - Free to use for academic and research purposes
  - Supports a wide range of Fortran compiler interfaces
  - Available to commercial users through UT’s Office of Technology Commercialization
- Source code for the library is now available.
- Redistribution rights are also available.

8 Thanks for your time!
Karl W. Schulz, karl@tacc.utexas.edu
Kazushige Goto, kgoto@tacc.utexas.edu

