 GPU Power Model Nandhini Sudarsanan Nathan Vanderby Neeraj Mishra Usha Vinodh

1 GPU Power Model
Nandhini Sudarsanan sudar003@umn.edu
Nathan Vanderby vande501@umn.edu
Neeraj Mishra mish0088@umn.edu
Usha Vinodh kuma0253@un.edu
Chi Xu xuchi@umn.edu

2 Outline
• Introduction and Motivation
• Analytical Model Description
• Experiment Setup
• Results
• Conclusion and Further Work
CSCI 8205: GPU Power Model, 5/4/11

3 Introduction
• Develop a methodology for building an accurate power model for a GPU.
• Validate it on NVIDIA's GTX 480 GPU.
• Measure the power efficiency of various NVIDIA SDK benchmarks.
• An accurate power model can help:
  • Explore various architectural and algorithmic trade-offs.
  • Figure out how to balance the workload between GPU and CPU.

4 Motivation
• Power consumption: a key criterion for future hardware devices and embedded software.
• The effect of increased power density has not been felt until now:
  • Supply voltage was scaled down too.
  • Current and power density remained constant.
• Further reduction in supply voltage will be difficult in the future:
  • Supply voltage is approaching the threshold voltage.
  • Gate oxide thickness is already close to 1 nm.

5 Motivation

6 GPU Processing Power

7 Price of Power
• Maximum load = a lot of power
• NVIDIA 8800 GTX: 137 W
• Intel Xeon LS5400: 50 W

8 Power Wall
• Power density in GPUs is larger than in even high-end CPUs.
• Power gating and clock gating have been successfully employed in CPUs [Brooks, HPCA 2001].
• Power gating, clock gating, and other hardware-based schemes are not used in most GPUs [Kim, ISCA 2010].
• An accurate power model can help:
  • Explore various architectural and algorithmic trade-offs.
  • Figure out how to balance the workload between GPU and CPU.

9 Background
• Power consumption can be divided into: Power = Dynamic_power + Static_power + Short_Ckt_Power
• Dynamic power is determined by run-time events:
  • Fixed-function units: texture filtering and rasterization
  • Programmable units: memory and floating point
• Static power is determined by:
  • Circuit technology
  • Chip layout
  • Operating temperature
• $P = V_{CC} \cdot N \cdot K_{design} \cdot I_{leak}$
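The two expressions on this slide can be evaluated directly. The sketch below is only an illustration: the slide does not define the symbols, so it assumes the usual reading of this leakage formula (N as the transistor count, K_design as a design-dependent factor, I_leak as a per-transistor leakage current). All function names and numeric values are hypothetical, not measurements from this work.

```python
def static_power(v_cc, n_transistors, k_design, i_leak):
    """Static power P = V_CC * N * K_design * I_leak, in watts.

    Symbol meanings are an assumption; the slide gives only the formula.
    """
    return v_cc * n_transistors * k_design * i_leak


def total_power(dynamic_w, static_w, short_circuit_w):
    """Power = Dynamic_power + Static_power + Short_Ckt_Power."""
    return dynamic_w + static_w + short_circuit_w


# Hypothetical inputs chosen only to exercise the formula.
p_static = static_power(v_cc=1.0, n_transistors=3.0e9, k_design=10.0, i_leak=1.0e-9)
p_total = total_power(dynamic_w=150.0, static_w=p_static, short_circuit_w=5.0)
print(f"static: {p_static:.1f} W, total: {p_total:.1f} W")
```

Because I_leak depends on temperature, a model built this way would have to re-evaluate the static term as the chip heats up.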

10 Previous Power Models
• Statistical power modeling approach for GPUs [Matsuoka 2010]
  • Uses 13 CUDA performance counters (ld, st, branch, TLB miss) to obtain a profile.
  • Finds the correlation between profiles and power by statistical model learning.
  • Information not captured by the counters is lost.
• Cycle-level simulation based power model [Skadron, HWWS'04]
  • Assumes a hypothetical architecture to explore new GPU microarchitectures and to model power and leakage properties.
  • Cycle-level processor simulations are time consuming [Martonosi & Isci 2003].
  • Does not allow a complete view of operating system effects and I/O [Isci 2003].
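To make the statistical approach concrete, here is a minimal sketch of fitting a linear model from counter profiles to measured power by ordinary least squares. It is not the cited authors' method: the linear form, the function names, and the synthetic data are all assumptions made for illustration.

```python
def fit_linear_power_model(profiles, powers):
    """Least-squares fit of power = w[0] + sum(w[j + 1] * profile[j])."""
    rows = [[1.0] + list(p) for p in profiles]  # prepend the intercept term
    n = len(rows[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, powers))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict_power(weights, profile):
    """Evaluate the fitted model on one counter profile."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], profile))


# Synthetic data: two normalized counters, power = 50 + 20 * c0 + 10 * c1.
profiles = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
powers = [70.0, 60.0, 80.0, 100.0]
weights = fit_linear_power_model(profiles, powers)
print(weights, predict_power(weights, (3.0, 2.0)))
```

A fit like this can only explain power through the events the counters expose, which is the limitation noted on the slide.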

11 Outline
• Introduction and Motivation
• Analytical Model Description
  • Parser
  • Power Model
• Experiment Setup
• Results
• Conclusion and Further Work

12 Need for a Parser
• GPGPU-Sim is time consuming.
• GPGPU-Sim output is not tailored to our needs.
• The parser is very fast.
• GPGPU-Sim works only with CUDA 2.3 or earlier.

13 Limitations of the Parser
• Dynamic loops are not determined automatically.
• Branches are assumed to be taken.
• Highly tailored to our specific needs.
• A change in the PTX layout might require changes to the parser.
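The parser itself is not shown in the slides. As a rough idea of what a fast PTX pass might look like, the sketch below counts opcodes in PTX text. The function name, the sample kernel, and the parsing rules are hypothetical, and it handles only a simple layout. Like the parser described above, it produces static counts, so loop trip counts would have to be supplied separately.

```python
import re
from collections import Counter


def count_ptx_opcodes(ptx_text):
    """Count base opcodes (text before the first '.') in PTX source."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comments
        # Skip blanks, directives (.reg, .entry, ...), labels, and braces.
        if (not line or line.startswith(".") or line.endswith(":")
                or line in ("{", "}")):
            continue
        line = re.sub(r"^@!?%\w+\s+", "", line)  # strip a predicate guard
        opcode = line.split()[0].rstrip(";")
        counts[opcode.split(".")[0]] += 1
    return counts


SAMPLE_PTX = """
.visible .entry vec_add(.param .u64 a)
{
    .reg .f32 %f<4>;
    ld.global.f32 %f1, [%rd1];   // load a[i]
    ld.global.f32 %f2, [%rd2];
    add.f32 %f3, %f1, %f2;
    st.global.f32 [%rd3], %f3;
    @%p1 bra BB0_2;
BB0_2:
    ret;
}
"""

counts = count_ptx_opcodes(SAMPLE_PTX)
print(dict(counts))
```

Matching on line shape is what makes such a pass fast, and also why a change in the PTX layout can break it.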

14 Outline
• Introduction and Motivation
• Analytical Model Description
  • Parser
  • Power Model
• Experiment Setup
• Results
• Conclusion and Further Work

15 Power Model
• PTX Level

16 Power Model
• Assembly Level

17 Outline
• Introduction and Motivation
• Analytical Model Description
  • Parser
  • Power Model
• Experiment Setup
• Results
• Conclusion and Further Work

18 Experiment Setup - Hardware
• Measure power consumption and temperature
  • Sample temperature at 10 Hz from the GPU sensor
  • Current clamp on the PCIe and GPU power cables
  • Data acquisition card at 100 Hz
• GPU performance counters
  • Profile 57 counters per kernel
  • 9 executions
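A current-clamp trace becomes power and energy once a rail voltage is known. The sketch below assumes a nominal 12 V rail and a uniformly sampled trace at the 100 Hz acquisition rate mentioned above. The voltage, the function names, and the sample data are assumptions, not values taken from the experiment.

```python
SAMPLE_RATE_HZ = 100.0  # data acquisition rate stated on the slide
RAIL_VOLTAGE_V = 12.0   # assumption: nominal rail voltage, not from the slides


def power_trace(currents_a, voltage_v=RAIL_VOLTAGE_V):
    """Convert current samples (amps) to instantaneous power (watts)."""
    return [voltage_v * i for i in currents_a]


def average_power(powers_w):
    """Mean of the sampled power values."""
    return sum(powers_w) / len(powers_w)


def energy_joules(powers_w, sample_rate_hz=SAMPLE_RATE_HZ):
    """Trapezoidal integration of a uniformly sampled power trace."""
    dt = 1.0 / sample_rate_hz
    return sum((a + b) / 2.0 * dt for a, b in zip(powers_w, powers_w[1:]))


# Hypothetical trace: a steady 10 A for one second (101 samples at 100 Hz).
currents = [10.0] * 101
powers = power_trace(currents)
print(average_power(powers), energy_joules(powers))
```

At 100 Hz a kernel must run for a reasonable fraction of a second to yield more than a handful of samples, which is one reason to prefer long-running benchmarks.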

19 Experiment Setup - Software
• Driver API
  • Generate and modify PTX code
  • Minimize control loops
• CUDA 4.0
  • Built-in binary-to-assembly converter (cuobjdump)
• MATLAB to build the model
• Remote login

20 CUDA – Fermi Architecture
• Third-generation Streaming Multiprocessor (SM)
  • 32 CUDA cores per SM, 4x over GT200
  • 1024-thread block size, 2x over GT200
• Unified address space enables full C++ support
• Improved memory subsystem

21 CUDA – Fermi Architecture
[Figure: Fermi memory hierarchy. Each SM (SM-0 through SM-N) has its own registers, L1 cache, and shared memory; below them sit the L2 cache and global memory.]

22 Benchmarks
• Small number of overhead operations (loop counters, initialization, etc.).
• Computationally intensive work, so that each experiment runs long enough for accurate current measurement.
• High utilization of the CUDA cores, with as few data hazards as possible.
• Grid and block sizes chosen so that all SMs are used, since idle SMs leak.
• Accordingly, 7 benchmarks were selected from the CUDA SDK.
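One simple way to keep every SM busy is to round the grid size up to a multiple of the SM count. The sketch below illustrates that idea only and is not necessarily how the benchmarks were configured. It assumes 15 SMs (the GTX 480's 480 CUDA cores divided by 32 cores per SM), and the function name and default block size are hypothetical.

```python
def launch_config(n_elements, threads_per_block=256, num_sms=15):
    """Pick a grid size that covers n_elements and is a multiple of num_sms.

    Padding blocks have no data, so the kernel must bounds-check its index.
    """
    blocks = -(-n_elements // threads_per_block)  # ceiling division
    blocks = -(-blocks // num_sms) * num_sms      # round up to a multiple of SMs
    return blocks, threads_per_block


blocks, threads = launch_config(1_000_000)
print(blocks, threads)
```

Even a tiny problem then launches at least one block per SM, so no SM sits completely idle while its leakage is being measured.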

23 Benchmarks
• Our benchmarks:
  • 2D convolution
  • Matrix Multiplication
  • Vector Addition
  • Vector Reduction
  • Scalar Product
  • DCT 8x8
  • 3DFD

24 Limitations of PTX
• Higher level than assembly
  • Divide and sqrt: 1 PTX line, but a library routine in assembly
• Compiler optimizations from PTX to assembly
• Doesn't reflect RAW dependencies
• Performance counters use assembly

25 Outline
• Introduction and Motivation
• Analytical Model Description
  • Parser
  • Power Model
• Experiment Setup
• Results
• Conclusion and Further Work

26 Results

27 Outline
• Introduction and Motivation
• Analytical Model Description
  • Parser
  • Power Model
• Experiment Setup
• Results
• Conclusion and Further Work

28 Conclusion and Further Work
• Conclusion
• Further Work
  • Take context switches into account
  • Consider multiple kernels running simultaneously

29 The End
Thanks
Q&A

