Tim Madden ODG/XSD.  Graphics Processing Unit  Graphics card on your PC.  “Hardware accelerated graphics”  Video game industry is main driver.  More.


1 Tim Madden ODG/XSD

2  Graphics Processing Unit  Graphics card on your PC.  “Hardware accelerated graphics”  Video game industry is main driver.  More recently used for non-graphics applications.

3  Card on the PCI-Express bus.  GPU card contains its own RAM and processor(s).  What is a core?  A core is an ALU (arithmetic logic unit).  An ALU is basically a single processor that can run a computer program.  Modern PCs are “quad core”: basically 4 processors. This refers to the processor on the motherboard that runs Windows.  A GPU has hundreds of cores!

4  Originally for graphics applications.  Graphics code is developed using the DirectX SDK (Windows and Xbox) or OpenGL (general platform).  OpenGL/DirectX are precompiled libraries that run on the GPU. They only allow graphics.  CUDA: a general SDK allowing the writing of C programs that run on the GPU.  CUDA allows any general application to run on the GPU: non-graphics, scientific programming.

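5  Parallel programming?  What is a “thread”?  A sequence of commands in a program that run one after another.

    // Windows example: Sleep() comes from <windows.h>, printf() from <stdio.h>.
    void oneThread(int N) {
        int counter = 0;
        while (1) {
            printf("Thread %d Count %d\n", N, counter++);
            Sleep(1000);  // pause 1000 ms between prints
        }
    }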
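The slide's oneThread() depends on the Windows Sleep() call and never returns. A minimal portable sketch, assuming C++11 and the standard std::thread library, might look like the following. The names countingThread and runOneThread and the iteration limit are illustrative additions, not part of the talk.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical bounded variant of the slide's oneThread(): it prints
// `iterations` lines and then returns instead of looping forever.
void countingThread(int n, int iterations, int delayMs, int *finalCount) {
    assert(iterations >= 0);
    int counter = 0;
    for (int k = 0; k < iterations; ++k) {
        std::printf("Thread %d Count %d\n", n, counter++);
        std::this_thread::sleep_for(std::chrono::milliseconds(delayMs));
    }
    *finalCount = counter;
}

// Runs countingThread on its own thread and waits for it to finish.
int runOneThread(int n, int iterations, int delayMs) {
    int finalCount = -1;
    std::thread t(countingThread, n, iterations, delayMs, &finalCount);
    t.join();  // after join(), the write to finalCount is visible here
    return finalCount;
}
```

Calling runOneThread(1, 3, 1000) would print three lines one second apart, the same rhythm as the slide's loop, and then return the final counter value.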

6  A typical program on a PC has many threads running at once.  An EPICS IOC has about 20 threads running.  This PowerPoint program is running 8 threads (at time of typing this sentence). [Figure: many copies of the oneThread() code from slide 5, one per running thread.]

7  The more threads running, the slower each thread.  Solution is to add more processors. A “core” is a processor.  A “quad core” PC has 4 processors, each running hundreds of threads. [Figure: four boxes labeled PROCESSOR, each holding several copies of the oneThread() code from slide 5.]
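The idea of many threads shared among a few cores can be tried on the host. The sketch below assumes C++11; hostCoreCount and runManyThreads are hypothetical helper names. It asks the standard library how many hardware threads the machine reports, then starts more threads than that and lets the operating system schedule them.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Number of hardware threads the host reports. The standard allows 0
// when the value cannot be determined, so fall back to 1.
unsigned hostCoreCount() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Starts `numThreads` threads; each one bumps a shared counter
// `perThread` times. Returns the total once every thread has finished.
long runManyThreads(int numThreads, int perThread) {
    assert(numThreads >= 0 && perThread >= 0);
    std::atomic<long> total(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t) {
        pool.emplace_back([&total, perThread] {
            for (int k = 0; k < perThread; ++k) total.fetch_add(1);
        });
    }
    for (std::thread &th : pool) th.join();
    return total.load();
}
```

The atomic counter is a design choice for the sketch: with a plain long, concurrent increments from different cores could be lost.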

8  Make a thread that in turn makes a new thread, etc.  void haveChildren()  Update global thread counter, and printf.  Sleep 500 ms.  Call haveChildren() on a new thread.  Display a window.  If OK is hit on the window, then exit(0).  When haveChildren() is called, threads are created without limit, and a new window displays for each one.  Threads show in Task Manager.
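The demo above can be sketched in a bounded form that is safe to run. This is an assumption-laden sketch rather than the code shown in the talk: the `remaining` limit and the runChain helper are added here, the window is omitted, and each parent waits for its child so that the chain ends cleanly.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

std::atomic<int> g_threadCount(0);  // global thread counter, as on the slide

// Each call counts itself, sleeps, then starts one child thread.
// `remaining` is an added safety limit that is NOT in the slide.
void haveChildren(int remaining, int sleepMs) {
    g_threadCount.fetch_add(1);
    std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
    if (remaining > 1) {
        std::thread child(haveChildren, remaining - 1, sleepMs);
        child.join();  // wait for the whole chain below this thread
    }
}

// Runs a chain of `depth` threads and reports how many were counted.
int runChain(int depth, int sleepMs) {
    assert(depth >= 1);
    g_threadCount.store(0);
    std::thread first(haveChildren, depth, sleepMs);
    first.join();
    return g_threadCount.load();
}
```

With sleepMs set to 500, as on the slide, runChain(10, 500) keeps up to ten threads alive at once for a few seconds, long enough to watch the thread count change in a process monitor.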

9

10  Instead of running hundreds of threads, let us run millions of threads!  A GPU can have 1024 processors. Each processor can run thousands of threads at once.  Adding more processors speeds up the program.

11 Thread

12  On the host (not the GPU) we write a single thread to process an image.  1 pixel at a time.  For a 1k x 1k image, this is 1M operations in sequence.

    // My image data (zero-initialized here; real data would come from the detector)
    short *image = new short[1024*1024]();
    int k;
    for (k = 0; k < 1024*1024; k++) {
        image[k] = image[k] + 1;  // process one pixel per iteration
    }

13  Write code for a single pixel, and call the code in 1M separate threads.  CUDA will dole out threads to cores for you on the GPU.  Pixel X runs on thread X.

    __global__ void subtractDarkImage_k(
        unsigned short *d_Dst,
        unsigned short *d_Src,
        int dataSize
    ){
        // Global thread index: one thread per pixel.
        const int i = blockDim.x * blockIdx.x + threadIdx.x;
        // Threads past the end of the image do nothing.
        if (i >= dataSize) return;
        // Placeholder operation: adds 1 to each pixel.
        d_Dst[i] = d_Src[i] + 1;
    }
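To see how the kernel's index arithmetic maps threads to pixels, the launch can be emulated on the host. The sketch below is plain C++ and does not use CUDA itself. The names blocksFor, kernelBody and emulateLaunch are hypothetical, and the loops run one after another where a GPU would run the threads in parallel.

```cpp
#include <cassert>
#include <vector>

// Blocks needed so that blocks * threadsPerBlock >= dataSize (rounded up).
int blocksFor(int dataSize, int threadsPerBlock) {
    assert(dataSize >= 0 && threadsPerBlock > 0);
    return (dataSize + threadsPerBlock - 1) / threadsPerBlock;
}

// Host-side emulation of the kernel body for one (block, thread) pair.
void kernelBody(unsigned short *dst, const unsigned short *src, int dataSize,
                int blockDimX, int blockIdxX, int threadIdxX) {
    const int i = blockDimX * blockIdxX + threadIdxX;
    if (i >= dataSize) return;  // surplus threads in the last block do nothing
    dst[i] = static_cast<unsigned short>(src[i] + 1);
}

// Visits every (block, thread) pair in sequence and returns the output image.
std::vector<unsigned short> emulateLaunch(const std::vector<unsigned short> &src,
                                          int threadsPerBlock) {
    const int dataSize = static_cast<int>(src.size());
    std::vector<unsigned short> dst(src.size(), 0);
    const int blocks = blocksFor(dataSize, threadsPerBlock);
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < threadsPerBlock; ++t)
            kernelBody(dst.data(), src.data(), dataSize, threadsPerBlock, b, t);
    return dst;
}
```

For a 1024 x 1024 image and an assumed block size of 256 threads, blocksFor returns 4096 blocks. When the pixel count is not a multiple of the block size, the bounds check is what keeps the extra threads from writing past the end of the image.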

14  Plugin to Area Detector to run calculations on the GPU.  When a new image comes from the detector:  Host sends image to GPU.  GPU does calcs.  Host retrieves result from GPU.  Host sends results to EPICS etc.  GPU code is compiled as a DLL.  EPICS Area Detector loads the DLL and runs it.  Allows arbitrary calculations on the GPU. Just make a new DLL.  Separates the cross compile of GPU code from the EPICS build.  One Area Detector plugin for all GPU calculations.  Can define EPICS variables in the DLL. Host queries the DLL for parameters and connects EPICS PVs.  Debug by attaching to the IOC process.  Set traps in the DLL.  Recompile the DLL.  Restart the IOC to load the updated DLL. No IOC rebuild.
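The split between one generic plugin and swappable calculation modules can be sketched as an interface. Everything below is hypothetical and is not the actual Area Detector or DLL API: CalcParam, GpuCalc, AddOffset and runFrame are illustrative names, and the stand-in calculation runs on the host so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parameter record a calculation module could expose so the
// host can connect one EPICS PV per entry.
struct CalcParam {
    std::string name;
    double value;
};

// Hypothetical interface implemented by each calculation module.
struct GpuCalc {
    virtual ~GpuCalc() {}
    virtual std::vector<CalcParam> parameters() const = 0;
    virtual void process(const unsigned short *in, unsigned short *out,
                         std::size_t count) = 0;
};

// Stand-in calculation that runs on the host. A real module would copy
// `in` to the GPU, launch a kernel, and copy the result back.
struct AddOffset : GpuCalc {
    unsigned short offset = 1;
    std::vector<CalcParam> parameters() const override {
        std::vector<CalcParam> p;
        p.push_back(CalcParam{"OFFSET", static_cast<double>(offset)});
        return p;
    }
    void process(const unsigned short *in, unsigned short *out,
                 std::size_t count) override {
        assert(count == 0 || (in != nullptr && out != nullptr));
        for (std::size_t i = 0; i < count; ++i)
            out[i] = static_cast<unsigned short>(in[i] + offset);
    }
};

// Host side: hand one frame to whichever calculation is loaded.
std::vector<unsigned short> runFrame(GpuCalc &calc,
                                     const std::vector<unsigned short> &frame) {
    std::vector<unsigned short> result(frame.size());
    calc.process(frame.data(), result.data(), frame.size());
    return result;
}
```

Because the host only sees the interface, a new calculation means a new module and no change to the host code, which is the point the slide makes about avoiding an IOC rebuild.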

15  Sending image to GPU and back.  Dark Subtraction on Host versus GPU.  Fast convolution on GPU versus Host.  Running several programs on GPU at once.

16

17  Low-end GPU: NVIDIA Quadro NVS 290.  Data transfer to/from GPU: 4 ms round trip for 1 MB image.  Dark subtraction: 1k x 1k image, 16 bit.  8 ms on host  30 ms on GPU  Fast convolution: 1k x 1k image, 16 bit.  250 ms on host  50 ms on GPU  Overhead spawning threads on GPU?  Very simple calculation is better on host.  Complex calculation is better on GPU.
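The host-side numbers above can be reproduced in spirit with a small timing harness. The sketch below assumes C++11 and is not the benchmark code from the talk: subtractDark, timeSubtractDarkMs and speedup are hypothetical names, and clamping at zero is an added assumption to keep unsigned pixels from wrapping around.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Subtracts a dark frame from an image, clamping at zero so unsigned
// pixels cannot wrap around. The clamp is an assumption, not from the slide.
void subtractDark(std::vector<unsigned short> &image,
                  const std::vector<unsigned short> &dark) {
    assert(image.size() == dark.size());
    for (std::size_t i = 0; i < image.size(); ++i)
        image[i] = image[i] > dark[i]
                       ? static_cast<unsigned short>(image[i] - dark[i])
                       : static_cast<unsigned short>(0);
}

// Wall-clock time of one host-side dark subtraction, in milliseconds.
double timeSubtractDarkMs(std::vector<unsigned short> &image,
                          const std::vector<unsigned short> &dark) {
    const auto start = std::chrono::steady_clock::now();
    subtractDark(image, dark);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Ratio of host time to GPU time; values above 1 mean the GPU was faster.
double speedup(double hostMs, double gpuMs) {
    assert(gpuMs > 0.0);
    return hostMs / gpuMs;
}
```

With the slide's figures, speedup(250, 50) is 5 for the fast convolution, while speedup(8, 30) is about 0.27 for dark subtraction, which matches the conclusion that very simple calculations are better left on the host.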

18

