Automatically Tuned Linear Algebra Software (ATLAS). R. Clint Whaley, University of Tennessee, www.netlib.org/atlas


1 Automatically Tuned Linear Algebra Software (ATLAS). R. Clint Whaley, University of Tennessee, www.netlib.org/atlas

2 What is ATLAS
• A package that adapts to differing architectures via AEOS techniques
  - Initially, supply BLAS
• Automated Empirical Optimization of Software (AEOS)
  - Machine searches the optimization space
  - Finds the application-apparent architecture
• AEOS requires:
  - Method of code variation
    » Code generation
    » Multiple implementations
    » Parameterization
  - Sophisticated timers
  - Robust search heuristic
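The three AEOS requirements listed above can be illustrated with a small sketch. This is not ATLAS code: the kernel (`blocked_transpose`), the timer (`best_time`), and the search routine (`empirical_search`) are hypothetical names chosen for illustration, and the exhaustive search over a short candidate list stands in for whatever heuristic ATLAS actually uses.

```python
import time

def best_time(fn, reps=3):
    """Timer: run fn several times and keep the minimum wall-clock time."""
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def blocked_transpose(a, n, nb):
    """Parameterized kernel (illustrative): transpose an n x n row-major
    list using nb x nb tiles. nb is the tunable parameter."""
    out = [0.0] * (n * n)
    for i0 in range(0, n, nb):
        for j0 in range(0, n, nb):
            for i in range(i0, min(i0 + nb, n)):
                for j in range(j0, min(j0 + nb, n)):
                    out[j * n + i] = a[i * n + j]
    return out

def empirical_search(candidates, n=64):
    """Search: time every candidate block size on this machine and
    return the fastest one together with all measured timings."""
    a = [float(i) for i in range(n * n)]
    timings = {
        nb: best_time(lambda nb=nb: blocked_transpose(a, n, nb))
        for nb in candidates
    }
    best_nb = min(timings, key=timings.get)
    return best_nb, timings

if __name__ == "__main__":
    nb, timings = empirical_search([4, 8, 16, 32])
    print("fastest block size on this machine:", nb)
```

The chosen block size depends on the machine that runs the search, which is the point of the empirical approach: the parameter is measured rather than derived from a hardware model.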

3 Why ATLAS is needed
• BLAS require many man-hours per platform
  - Only done if financial incentive is there
    » Many platforms will never have an optimal version
  - Lags behind hardware
  - May not be affordable by everyone
  - Improves vendor code
• Allows for portably optimal codes
  - Obsolescence insurance
• Operations may be important, but not general enough for a standard

4 ATLAS Software
• Currently provided
  - Full BLAS (C & F77)
    » Level 3 BLAS
      · Generated GEMM
        - 1-2 hours install time per precision
      · Recursive GEMM-based L3 BLAS
        - Antoine Petitet
    » Level 2 BLAS
      · GEMV & GER kernels
    » Level 1 BLAS
  - Some LAPACK
    » LU, LLt
• Coming soon
  - pthread support
  - Open source kernels
    » SSE & 3DNOW!
    » GOTO ev5/6 BLAS
  - Performance for banded and packed
  - More LAPACK
• Coming not-so-soon
  - Sparse support
  - User customization

5 Algorithmic Approach for Matrix Multiply
• Only generated code is on-chip multiply
• All BLAS operations written in terms of generated on-chip multiply
• All transpose cases coerced through data copy to 1 case of on-chip multiply
  - Only 1 case generated per platform
[Figure: C (M x N) = A (M x K) * B (K x N), with block size NB]
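A minimal sketch of the idea on this slide, under stated assumptions: every transpose combination is reduced by a data copy to one storage layout, and a single blocked kernel then does all the arithmetic. The function names (`kernel`, `gemm`), the choice of layout (both operands stored with the K dimension contiguous), and the fixed block size `nb` are assumptions for illustration, not ATLAS's generated code.

```python
def transpose(x):
    """Return a transposed copy of a list-of-lists matrix."""
    return [list(row) for row in zip(*x)]

def kernel(c, a, bt, i0, j0, k0, nb):
    """The single multiply case: on one nb-sized block,
    C[i][j] += sum over k of A[i][k] * Bt[j][k]."""
    m, n, kk = len(a), len(bt), len(a[0])
    for i in range(i0, min(i0 + nb, m)):
        ai = a[i]
        for j in range(j0, min(j0 + nb, n)):
            bj = bt[j]
            acc = c[i][j]
            for k in range(k0, min(k0 + nb, kk)):
                acc += ai[k] * bj[k]
            c[i][j] = acc

def gemm(a, b, trans_a=False, trans_b=False, nb=2):
    """Return op(A) * op(B), where op(X) is X or its transpose.
    All four transpose cases are coerced by copying into one layout."""
    a_work = transpose(a) if trans_a else [list(r) for r in a]    # M x K
    bt_work = [list(r) for r in b] if trans_b else transpose(b)   # N x K
    m, n, kk = len(a_work), len(bt_work), len(a_work[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, nb):
        for j0 in range(0, n, nb):
            for k0 in range(0, kk, nb):
                kernel(c, a_work, bt_work, i0, j0, k0, nb)
    return c

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(gemm(A, B))                 # A * B
    print(gemm(A, B, trans_a=True))   # A^T * B
```

The copy costs O(N^2) work against the O(N^3) multiply, so only one kernel case has to be tuned per platform.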

6 Algorithmic approach for Level 3 BLAS
• Recur down to L1 cache block size
• Need kernel at bottom of recursion
  - Use gemm-based kernel for portability
[Figure: Recursive TRMM]
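The recursion named on this slide can be sketched for one TRMM case, B := L * B with L lower triangular. The function name `trmm_lower`, the `cutoff` parameter (standing in for the L1 cache block size), and the plain-loop base case are assumptions for illustration; the slide's gemm-based bottom kernel is replaced here by a direct triangular loop to keep the sketch short.

```python
def trmm_lower(l, b, lo=0, hi=None, cutoff=2):
    """In place: B[lo:hi, :] := L[lo:hi, lo:hi] * B[lo:hi, :],
    where L is lower triangular. Splits L as [[L11, 0], [L21, L22]]."""
    if hi is None:
        hi = len(l)
    n = hi - lo
    ncols = len(b[0])
    if n <= cutoff:
        # Base case (stand-in kernel): go bottom-up so every row that is
        # read still holds its original value.
        for i in range(hi - 1, lo - 1, -1):
            b[i] = [
                sum(l[i][k] * b[k][j] for k in range(lo, i + 1))
                for j in range(ncols)
            ]
        return
    mid = lo + n // 2
    # B2 := L22 * B2 (recursive triangular multiply)
    trmm_lower(l, b, mid, hi, cutoff)
    # B2 += L21 * B1 (a general matrix multiply; B1 is still unmodified)
    for i in range(mid, hi):
        for j in range(ncols):
            b[i][j] += sum(l[i][k] * b[k][j] for k in range(lo, mid))
    # B1 := L11 * B1 (recursive triangular multiply)
    trmm_lower(l, b, lo, mid, cutoff)

if __name__ == "__main__":
    L = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
    B = [[1, 2], [3, 4], [5, 6]]
    trmm_lower(L, B, cutoff=1)
    print(B)
```

The order of the three steps matters: the lower block is finished first because it needs the original upper block, which the final recursive call overwrites.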

7 500x500 DGEMM Across Various Architectures

8 500x500 Double Precision RB LU factorization

9 500x500 Recursive BLAS on UltraSparc 2200

