The PTX GPU Assembly Simulator and Interpreter N.M. Stiffler Zheming Jin Ibrahim Savran.

1 The PTX GPU Assembly Simulator and Interpreter N.M. Stiffler Zheming Jin Ibrahim Savran

2 Parallel Thread Execution – PTX: A pseudo-assembly language used in NVIDIA’s CUDA programming environment.

3 Compute Unified Device Architecture – CUDA: A parallel computing architecture developed by NVIDIA and used in NVIDIA’s GPUs. Using CUDA, the latest NVIDIA GPUs effectively become open architectures like CPUs. ▫ Perk: GPUs have a parallel architecture that can keep thousands of threads in flight simultaneously. ▫ If a given application is parallelizable, the GPU can offer large performance benefits.

4 Visual

5 Accessing the GPU? Through a subset of C with NVIDIA extensions. ▫ Why is it a subset of C? Device code must be recursion-free and function-pointer-free.

6 Tying It Together

7 NVCC: The nvcc compiler driver (the NVIDIA CUDA compiler) translates code written in CUDA into PTX. The GPU driver contains a compiler that translates the PTX into something that can run on the processing cores.

8 CSCE513 Course Presentation Zheming Jin

9 Compiling CUDA

10 Parallel Thread eXecution: A pseudo-assembly language. Defines a virtual machine and ISA. The nvcc compiler translates code in CUDA into PTX.

11 The GPU device
Device: NVIDIA GTX 260
No. of CUDA cores: 192
Processor clock: 1242 MHz
Memory clock: 999 MHz
Memory bandwidth: 111.9 GB/sec
Max. graphics card power: 182 W
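The memory bandwidth figure follows from the memory clock. The sketch below reproduces it under two assumptions that are not on the slide: a 448-bit memory bus and double-data-rate memory (two transfers per clock). The function name is made up for illustration.

```python
def peak_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8          # bus width in bytes
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


# Assumed: 448-bit bus, double data rate. The 999 MHz clock is from the slide.
print(f"{peak_bandwidth_gb_s(999, 448):.1f} GB/s")  # prints "111.9 GB/s"
```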

12 Ocelot Project (Cont): Aims to compile CUDA programs so that they can run on architectures other than NVIDIA GPUs, allowing those architectures to leverage the parallelism expressed in PTX. The project is freely available on Google Code.

13 Ocelot Compilation process

14 GPU Hardware emulation: A device emulation mode for the purpose of debugging. The device code is compiled for and runs on the host, allowing the programmer to use the host’s native debugging support to debug the application as if it were a host application.
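The idea of running device code on the host can be sketched in a few lines. This is only an illustration, not how the CUDA toolchain implements emulation: the kernel is an ordinary host function, and a hypothetical `launch_emulated` helper calls it once per thread index, so any host debugger can step through it. The kernel adds 2.0 to each element, the same operation as the vecAdd example in this deck.

```python
from types import SimpleNamespace


def vec_add_kernel(thread_idx, a):
    """Host-side stand-in for a kernel: each 'thread' updates one element."""
    i = thread_idx.x
    a[i] = a[i] + 2.0


def launch_emulated(kernel, num_threads, *args):
    """Run the kernel once per thread index, sequentially, on the host.

    Sequential execution is only valid for kernels whose threads do not
    synchronize or communicate with each other.
    """
    for tid in range(num_threads):
        kernel(SimpleNamespace(x=tid), *args)


data = [0.0, 1.0, 2.0, 3.0]
launch_emulated(vec_add_kernel, len(data), data)
print(data)  # prints [2.0, 3.0, 4.0, 5.0]
```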

16 Matrix Multiplications

17 Ibrahim Savran CUDA PTX & an Interpreter

18 Outline
Some notes about CUDA infrastructure
A CUDA example and PTX
PTX overview
– The goals of PTX
– PTX instructions (some)
Case study: PTX Simulator
Interpretation process steps
Demo time
Q/A session
References

19 Caveats: There’s a lot not detailed in the manuals:
– Bit-level formats of instructions
– Handling of divergent thread paths
– What kinds of variables are stored where

20 A CUDA Example and PTX
CUDA:
    __global__ void vecAdd(float* A) {
        int i = threadIdx.x;
        A[i] = A[i] + 2.0f;
    }
    int main() {
        // Kernel invocation
        vecAdd<<<1, N>>>(A);
    }
PTX:
    .reg .b32 r1, r2;
    .global .f32 array[N];
    start:
        mov.b32 r1, %tid.x;
        shl.b32 r1, r1, 2;            // shift thread id left by 2 bits (byte offset = tid * 4)
        ld.global.b32 r2, array[r1];  // thread[tid] gets array[tid]
        add.f32 r2, r2, 2.0;          // add 2.0
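To make the PTX fragment concrete, here is a minimal interpreter sketch that runs it once per thread on the host. It is an illustration under stated assumptions, not the simulator discussed later in the deck. It handles only the opcodes in this fragment, ignores type suffixes, and uses a floating-point add of 2.0 to match the CUDA source. It also adds a final `st.global` so the result is written back, as `A[i] = A[i] + 2.0` implies. The names `parse` and `run_thread` are hypothetical.

```python
import re

PROGRAM = """
mov.b32 r1, %tid.x;
shl.b32 r1, r1, 2;            // byte offset = tid * 4
ld.global.f32 r2, array[r1];  // thread[tid] gets array[tid]
add.f32 r2, r2, 2.0;
st.global.f32 array[r1], r2;  // assumption: store added for this sketch
"""


def parse(program):
    """Turn PTX text into (base_opcode, operands) pairs, dropping comments."""
    instrs = []
    for line in program.splitlines():
        line = line.split("//")[0].strip().rstrip(";")
        if not line:
            continue
        opcode, _, rest = line.partition(" ")
        operands = [op.strip() for op in rest.split(",")]
        instrs.append((opcode.split(".")[0], operands))
    return instrs


def run_thread(instrs, tid, memory):
    """Interpret the instructions for one thread; returns its registers."""
    regs = {}

    def value(op):
        if op == "%tid.x":
            return tid
        if op in regs:
            return regs[op]
        return float(op) if "." in op else int(op)

    def address(op):
        name, reg = re.fullmatch(r"(\w+)\[(\w+)\]", op).groups()
        return name, regs[reg] // 4  # byte offset -> index of a 4-byte float

    for name, ops in instrs:
        if name == "mov":
            regs[ops[0]] = value(ops[1])
        elif name == "shl":
            regs[ops[0]] = value(ops[1]) << value(ops[2])
        elif name == "add":
            regs[ops[0]] = value(ops[1]) + value(ops[2])
        elif name == "ld":
            arr, idx = address(ops[1])
            regs[ops[0]] = memory[arr][idx]
        elif name == "st":
            arr, idx = address(ops[0])
            memory[arr][idx] = value(ops[1])
        else:
            raise ValueError(f"unsupported opcode: {name}")
    return regs


memory = {"array": [0.0, 1.0, 2.0, 3.0]}
instructions = parse(PROGRAM)
for tid in range(len(memory["array"])):
    run_thread(instructions, tid, memory)
print(memory["array"])  # prints [2.0, 3.0, 4.0, 5.0]
```

Each thread gets its own register dictionary while all threads share `memory`, mirroring the split between per-thread `.reg` state and `.global` state in the fragment.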

21 The goals of PTX
– Provide a stable ISA that spans multiple GPU generations.
– Provide a machine-independent ISA for C/C++ and other compilers to target.
– Provide a common source-level ISA for optimizing code generators and translators, which map PTX to specific target machines.
– Provide a scalable programming model that spans GPU sizes from a single unit to many parallel units (1 to 30 parallel units).

22 Instruction Classes of PTX
– Data movement and conversion: including load/store (ld, st) and cvt
– Computational: arithmetic and logic operations
– Control flow: jumps, branches, calls
– Parallel synchronization: bar.sync
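A simulator typically needs to map each instruction to one of these classes, for example to keep per-class counts. The sketch below does that by base opcode. The opcode sets are a small, incomplete sample of PTX opcodes chosen for illustration, and the `classify` function is hypothetical.

```python
# Sample opcodes per class; deliberately incomplete.
INSTRUCTION_CLASSES = {
    "data movement and conversion": {"mov", "ld", "st", "cvt"},
    "computational": {"add", "sub", "mul", "mad", "and", "or", "xor", "shl", "shr"},
    "control flow": {"bra", "call", "ret", "exit"},
    "parallel synchronization": {"bar", "membar", "atom", "red", "vote"},
}


def classify(instruction):
    """Return the class name for one PTX instruction line, or 'unknown'."""
    tokens = instruction.strip().split()
    if tokens and tokens[0].startswith("@"):
        tokens = tokens[1:]  # skip a guard predicate such as "@p"
    if not tokens:
        return "unknown"
    base = tokens[0].rstrip(";").split(".")[0]  # "ld.global.f32" -> "ld"
    for class_name, opcodes in INSTRUCTION_CLASSES.items():
        if base in opcodes:
            return class_name
    return "unknown"


for line in ["ld.global.f32 r2, array[r1];", "add.f32 r2, r2, 2.0;",
             "@p bra done;", "bar.sync 0;"]:
    print(f"{line:32} -> {classify(line)}")
```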

23 PTX Instructions (some)

24 Case Study: PTX Simulator. This PTX simulator was developed by Patrick Moran and Dr. Bakos. Case study: a clustering algorithm, k-means. The goal of this project is to find performance-limiting execution behaviors, such as control divergence and instruction-level latencies.
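Control divergence occurs when threads that execute together take different sides of a branch. One way a simulator could flag it is sketched below: group thread ids into warps, evaluate the branch predicate for each thread, and count the warps whose threads disagree. This is an illustrative assumption about how such a check might look, not the method used in the Moran and Bakos simulator; the function names and the warp size of 32 used in the example are choices made here.

```python
def branch_diverges(predicates):
    """True if the threads of one warp do not all take the same branch side."""
    return len(set(predicates)) > 1


def count_divergent_warps(thread_ids, warp_size, predicate):
    """Count warps in which the branch predicate differs between threads."""
    divergent = 0
    for start in range(0, len(thread_ids), warp_size):
        warp = thread_ids[start:start + warp_size]
        if branch_diverges([bool(predicate(tid)) for tid in warp]):
            divergent += 1
    return divergent


threads = list(range(64))

# Even/odd split: every warp contains both outcomes, so both warps diverge.
print(count_divergent_warps(threads, 32, lambda tid: tid % 2 == 0))  # prints 2

# Split on a warp boundary: each warp is uniform, so nothing diverges.
print(count_divergent_warps(threads, 32, lambda tid: tid < 32))      # prints 0
```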

25 Interpretation Process
1. Run the NVIDIA compiler (nvcc) with the -ptx option.
2. Delete the unnecessary parts from the PTX file. This step takes the "raw" PTX file and turns it into a cleaned-up PTX file, and also produces a file that helps lay out variables in shared memory.
3. Interpret the new PTX file.
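The cleanup step could look roughly like the sketch below. The slides do not say which parts count as unnecessary, so this version assumes that comments, blank lines, and the `.file` and `.loc` debug directives are dropped. It does not handle `/* ... */` block comments and does not produce the shared-memory layout file. The `clean_ptx` function and the sample input are hypothetical.

```python
def clean_ptx(raw_text, dropped_directives=(".file", ".loc")):
    """Return the PTX lines that remain after removing assumed noise."""
    cleaned = []
    for line in raw_text.splitlines():
        code = line.split("//", 1)[0].strip()  # drop trailing // comments
        if not code:
            continue                           # blank or comment-only line
        # Compare the whole first token so ".loc" does not match ".local".
        if code.split()[0] in dropped_directives:
            continue
        cleaned.append(code)
    return cleaned


RAW = """
    // hypothetical compiler output, shortened for illustration
    .file 1 "vecAdd.cu"
    .loc 1 3 0
    mov.b32 r1, %tid.x;   // thread id
    shl.b32 r1, r1, 2;
"""

for line in clean_ptx(RAW):
    print(line)
```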

26 Demo Time

27 Q/A Session ?

28 References
NVIDIA Corporation, NVIDIA CUDA Programming Guide, Version 2.3.
Introduction to CUDA Programming.
Ibrahim Savran’s notes.
Course textbook.
P. A. Moran and J. D. Bakos, "A PTX Simulator for Performance Tuning CUDA Code".
NVIDIA Corporation, PTX ISA Version 1.4, March 31, 2009.

