CUBLAS and CUSPARSE MVM Timing
Gavin Harrison


1 CUBLAS and CUSPARSE MVM Timing Gavin Harrison

2 SMVM Algorithm
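
As a baseline for the slides that follow, here is a minimal sketch of sparse matrix-vector multiplication (SMVM) on the CPU. It assumes compressed sparse row (CSR) storage, which the slides do not confirm; the `CsrMatrix` type and the `smvm` function are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container (illustrative; not taken from the slides).
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i owns [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<float> values;         // nonzero values, stored row by row
};

// Computes y = A * x with one dot product per row.
std::vector<float> smvm(const CsrMatrix& a, const std::vector<float>& x) {
    std::vector<float> y(a.rows, 0.0f);
    for (std::size_t i = 0; i < a.rows; ++i) {
        float sum = 0.0f;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            // x is read through col_idx, so the access pattern is irregular.
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

The indirect read `x[a.col_idx[k]]` is the irregular access that makes the choice of memory space for the input vector matter on a GPU.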

3 NVIDIA Memory Hierarchy
Global memory: large, high latency.
Shared memory: shared cache for each set of processors.
Constant/texture memory: read-only regions of global memory with an on-chip cache.
– Constant memory is faster, but has only one port.
– Texture memory does not suffer greatly from irregular access, and also benefits from 2D spatial locality.

4 Tuning SMVM for GPU (GTX 280)
Use multiple threads per row; synchronize with __syncthreads() and combine the partial results.
Access memory at stride.
– The threads of a half warp access sequential addresses.
– This allows for fewer reads from global memory.
Align rows.
– This also helps decrease reads from global memory.
Use texture memory for the input vector.
– The input vector is reused.
– Texture reads are cached and benefit from spatial locality.
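
The "multiple threads per row" and "access at stride" points can be modelled on the CPU. The sketch below is not a GPU kernel and is not the presenter's code: it is a sequential C++ model, assuming CSR storage and 16 cooperating lanes (one half warp) per row. The name `smvm_lanes` and the constant `kLanes` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 16;  // assumption: one half warp cooperates on each row

// Sequential model of a CSR SMVM in which kLanes "threads" share each row.
std::vector<float> smvm_lanes(const std::vector<std::size_t>& row_ptr,
                              const std::vector<std::size_t>& col_idx,
                              const std::vector<float>& values,
                              const std::vector<float>& x) {
    if (row_ptr.empty()) return {};
    const std::size_t rows = row_ptr.size() - 1;
    std::vector<float> y(rows, 0.0f);
    for (std::size_t row = 0; row < rows; ++row) {
        std::array<float, kLanes> partial{};  // stands in for a per-row shared-memory buffer
        // Phase 1: lane t handles entries t, t + kLanes, t + 2 * kLanes, ... of the row,
        // so at every step neighbouring lanes touch neighbouring addresses.
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            for (std::size_t k = row_ptr[row] + lane; k < row_ptr[row + 1]; k += kLanes) {
                partial[lane] += values[k] * x[col_idx[k]];
            }
        }
        // Phase 2: tree reduction of the partial sums. On the GPU, a barrier such as
        // __syncthreads() would separate the steps of this loop.
        for (std::size_t stride = kLanes / 2; stride > 0; stride /= 2) {
            for (std::size_t lane = 0; lane < stride; ++lane) {
                partial[lane] += partial[lane + stride];
            }
        }
        y[row] = partial[0];
    }
    return y;
}
```

On real hardware the lanes in phase 1 run concurrently, which is what lets a half warp's reads of `values` and `col_idx` be served by fewer global-memory transactions; the loop order here only mimics which lane touches which entry.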

5 Improvements in Fermi (GTX 580)
General L1/L2 cache structure.
– L1 cache and shared memory share 64 KB per SM, configurable as 48 KB for one and 16 KB for the other.
– L2 is 768 KB.
Improved support for double-precision floating point.
Added support for 32-bit integer multiplication.
32 SPs per SM.

6 CUSPARSE SMVM Performance

7 CUSPARSE SMVM Speedup Over OSKI (single precision)

8 CUBLAS MVM Performance

9 CUBLAS MVM Speedup over ATLAS

