1 Utilization of GPUs for General Computing
Presenter: Charlene DiMeglio
Paper: "Aspects of GPU for General Purpose High Performance Computing," Suda, Reiji, et al.

2 Overview
- Problem: We want to use the GPU for tasks other than graphics, but the costs can be high.
- Solution: Improve the CUDA drivers.
- Results: Compared to a node of a supercomputer, it is worth it.
- Conclusion: These improvements make using GPGPUs more feasible.

3 Problem: Need for computation power
- Why GPUs? GPUs are not being fully realized as a resource, often sitting idle when not being used for graphics, and they deliver better performance for less power than CPUs.
- What's the issue? Cost:
  - Efficient scheduling: timing data loads to coincide with their use
  - Memory management: using the small amount of available memory effectively
  - Loads and stores: waiting on memory transfers can take hundreds of cycles

4 Solutions
- Brook+ by AMD, Larrabee by Intel
- CUDA by NVIDIA: the greatest technological maturity at the time
- The paper investigates the existing technology and suggests improvements.
(Diagram: 30 multiprocessors, each with 8 streaming processors and 16 KB of shared memory)

5 NVIDIA's Tesla C1060 GPU vs. Hitachi HA8000-tc/RS425 (T2K) Supercomputer
- T2K: fastest supercomputer in Japan

                              T2K           C1060
  Cores/MPs                   16            30
  Clock frequency             2.3 GHz       1.3 GHz
  Single SIMD vector length   4             32
  Single peak                 294 Gflops    933 Gflops
  Main memory                 32 GB         4 GB
  Memory single peak
  Cost                        ~$40,000      ~$2,500
  Power                       300 W         200 W
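The table's figures support a rough efficiency comparison. The following is a back-of-envelope sketch using only the peak, cost, and power numbers above; the dictionary keys and function names are invented for illustration, not taken from the paper.

```python
# Back-of-envelope efficiency comparison using the figures from the table above.
t2k = {"peak_gflops": 294, "cost_usd": 40_000, "power_w": 300}
c1060 = {"peak_gflops": 933, "cost_usd": 2_500, "power_w": 200}

def gflops_per_dollar(system):
    return system["peak_gflops"] / system["cost_usd"]

def gflops_per_watt(system):
    return system["peak_gflops"] / system["power_w"]

print(f"T2K node: {gflops_per_dollar(t2k):.4f} Gflops/$, {gflops_per_watt(t2k):.2f} Gflops/W")
print(f"C1060:    {gflops_per_dollar(c1060):.4f} Gflops/$, {gflops_per_watt(c1060):.2f} Gflops/W")
```

On peak numbers alone, the C1060 comes out roughly 50x better per dollar and nearly 5x better per watt, which is the sense in which the GPU is "worth it" compared to a supercomputer node.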

6 Issues to Overcome
- High SIMD vector length
- Small main memory size
- High register spill cost
- No L2 cache; only read-only texture caches

7 Methods to Hide Away Latency

8
- When computation time between communications > communication latency, it is worth sending the data over to the GPU.
- Increasing bandwidth and message size makes the constant term in the transfer latency proportionally smaller.
- Efficient use of registers prevents spills.
- Deciding where to do the work, GPU vs. CPU: work sharing.
- Minimizing divergent warps using the atomic operations provided by CUDA; divergent warps occur when the threads of a warp must follow both branch paths.
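The first two bullets amount to a simple analytic condition. Below is a minimal sketch of that reasoning, assuming a constant-plus-linear transfer-time model; the function names and the latency/bandwidth figures are assumptions for illustration, not the paper's notation.

```python
# Constant-plus-linear transfer model (illustrative).
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Time to move nbytes: fixed per-message latency + size / bandwidth."""
    return latency_s + nbytes / bandwidth_bytes_per_s

def worth_offloading(compute_s, nbytes, latency_s, bandwidth_bytes_per_s):
    """Offloading pays when compute between communications exceeds transfer cost."""
    return compute_s > transfer_time(nbytes, latency_s, bandwidth_bytes_per_s)

# Larger messages amortize the constant latency term:
LAT, BW = 10e-6, 5e9                          # 10 us latency, 5 GB/s (assumed)
small = transfer_time(1_000, LAT, BW)         # fixed latency dominates
large = transfer_time(100_000_000, LAT, BW)   # bandwidth term dominates
print(LAT / small, LAT / large)               # latency's share of each transfer
```

With these assumed numbers, the fixed latency is about 98% of a 1 KB transfer but well under 1% of a 100 MB transfer, which is why batching data into larger messages helps.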

9 Results
- Variable-sized multi-round data transfer scheduling
(Chart: results plotted against the number of rounds)
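Why the number of rounds matters can be sketched with a toy pipeline model: splitting a transfer into rounds lets the transfer of the next chunk overlap computation on the current one, but every round pays the fixed latency again, so an intermediate round count wins. The formula and all parameter values below are assumptions for illustration, not the paper's experimental setup.

```python
# Toy pipelined multi-round transfer model (illustrative).
def pipelined_time(total_bytes, total_compute_s, rounds, latency_s, bw):
    """Split the data into `rounds` chunks; the transfer of chunk i+1
    overlaps the computation on chunk i."""
    chunk_xfer = latency_s + (total_bytes / rounds) / bw
    chunk_comp = total_compute_s / rounds
    # First chunk must arrive before any compute; last chunk's compute
    # cannot overlap anything.
    return chunk_xfer + (rounds - 1) * max(chunk_xfer, chunk_comp) + chunk_comp

# 100 MB, 20 ms of compute, 10 us latency, 5 GB/s (all assumed figures):
times = {k: pipelined_time(100e6, 0.02, k, 10e-6, 5e9) for k in (1, 10, 1000)}
print(times)
```

In this model 10 rounds beats both 1 round (no overlap) and 1000 rounds (latency paid a thousand times), mirroring the tradeoff the scheduling result explores.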

10 Results
- Use of atomic instructions in CUDA to minimize latency
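The benefit ties back to slide 8's point about divergent warps: on SIMD hardware, a warp that diverges executes both branch paths serially, so rewriting divergent read-modify-write logic around CUDA's atomic operations can avoid that serialization. A toy cost model of the divergence penalty, with invented cycle counts and names (not measurements from the paper):

```python
# Toy model of SIMD branch divergence (illustrative).
def warp_exec_cycles(path_cycles, paths_taken):
    """A warp executes every distinct path any of its threads takes,
    one after another, so divergence costs the sum of those paths."""
    return sum(path_cycles[p] for p in set(paths_taken))

paths = {"then": 100, "else": 80}
uniform = warp_exec_cycles(paths, ["then"] * 32)                    # 100 cycles
divergent = warp_exec_cycles(paths, ["then"] * 16 + ["else"] * 16)  # 180 cycles
```

A fully uniform warp pays for one path; a split warp pays for both, nearly doubling the cost in this example.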

11 Conclusion
- CUDA gives programmers the ability to harness the power of the GPU for general-purpose use.
- The improvements presented make this option more feasible.
- Strategic use of GPGPUs as a resource will improve speed and efficiency.
- However, the material presented is mainly theoretical, with little strong data to back it up.
- The paper offers more suggestions than implementations, promoting GPGPU use.

