Krzysztof M. Korcyl, Joanna Płażek, Janusz Chwastowski, Piotr Poznański. Cracow University of Technology, ul. Warszawska 24, 31-155 Cracow, Poland


1 Selected issues of histogramming on GPGPUs. Krzysztof M. Korcyl, Joanna Płażek, Janusz Chwastowski, Piotr Poznański, Cracow University of Technology, ul. Warszawska 24, Cracow, Poland. Emails: {kkorcyl, jplazek, jchwastowski, …}

2 Large Scale Data Quality Monitoring System. Diagram components: thousands of sensors, sensor interface cards, data collection nodes, network, data monitoring node(s), GPU histogramming card.

3 Data flow. Diagram components: network card, ring buffer, input data, CPU, RAM, page-locked (pinned) memory in RAM, device global memory, four shared-memory blocks on the device.

4 Storage of histograms in shared memory, banked coding. Layout: thread_0 hist[0], thread_1 hist[0], …, thread_30 hist[0], thread_31 hist[0], thread_0 hist[1], thread_1 hist[1], …, thread_0 hist[255], thread_1 hist[255], …, thread_31 hist[255]. Size: 32 histograms (threads) * 256 bins * 4 B = 32768 B = 32 kB.
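
The interleaved layout on this slide can be modelled with a few lines of index arithmetic. The sketch below is not the authors' kernel; it is a host-side Python model that assumes 32 shared-memory banks, each 4 bytes wide, and the names `banked_index`, `bank_of` and `banks_touched` are hypothetical. It shows that, with bin `b` of thread `t` stored at word `b * 32 + t`, every thread of a warp lands in its own bank regardless of which bin it increments.

```python
WARP_SIZE = 32    # threads per warp, one histogram per thread (from the slide)
NUM_BINS = 256    # bins per histogram (from the slide)
BIN_BYTES = 4     # one 4-byte counter per bin (from the slide)
NUM_BANKS = 32    # assumption: 32 shared-memory banks, each 4 bytes wide


def banked_index(thread: int, bin_: int) -> int:
    """Word offset of hist[bin_] owned by `thread` in the interleaved layout."""
    return bin_ * WARP_SIZE + thread


def bank_of(word_index: int) -> int:
    """Bank that serves a given 4-byte word under the assumption above."""
    return word_index % NUM_BANKS


def banks_touched(bins_per_thread):
    """Banks hit when thread t increments bin bins_per_thread[t]."""
    return [bank_of(banked_index(t, b)) for t, b in enumerate(bins_per_thread)]


# Shared memory needed for one block of 32 histograms.
total_bytes = WARP_SIZE * NUM_BINS * BIN_BYTES
```

Because `bin_ * 32` is always a multiple of the bank count, the bank number reduces to the thread number, which is why this layout avoids bank conflicts under the stated assumption.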

5 Storage of histograms in shared memory, notbanked coding. Layout: thread_0 hist[0], thread_0 hist[1], …, thread_0 hist[30], thread_0 hist[31], thread_0 hist[32], thread_0 hist[33], …, thread_31 hist[255]. Size: 48 histograms (threads) * 256 bins * 4 B = 49152 B = 48 kB.
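
For comparison, the contiguous layout on this slide can be modelled the same way. This is again a host-side Python sketch rather than the authors' code; it assumes 32 banks of 4-byte words, and `notbanked_index` and `conflict_degree` are hypothetical names. With bin `b` of thread `t` stored at word `t * 256 + b`, the bank depends only on the bin number, so threads of one warp that update bins equal modulo 32 hit different words in the same bank.

```python
from collections import defaultdict

NUM_BINS = 256    # bins per histogram (from the slide)
NUM_BANKS = 32    # assumption: 32 shared-memory banks, each 4 bytes wide


def notbanked_index(thread: int, bin_: int) -> int:
    """Word offset of hist[bin_] owned by `thread` in the contiguous layout."""
    return thread * NUM_BINS + bin_


def conflict_degree(bins_per_thread):
    """Largest number of distinct words that one bank must serve when
    thread t increments bin bins_per_thread[t] (1 means no conflict)."""
    words_per_bank = defaultdict(set)
    for thread, bin_ in enumerate(bins_per_thread):
        word = notbanked_index(thread, bin_)
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


# Shared memory needed for 48 histograms, as on the slide.
total_bytes = 48 * NUM_BINS * 4
```

In this model, a warp whose 32 threads all increment bin 0 produces a 32-way conflict, while a warp whose threads happen to hit bins 0 to 31 produces none. The trade-off shown on the slides is that this layout is not tied to 32 histograms per block and fits 48 histograms in 48 kB.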

6 Results. Test machine Zeus CPU - GPGPU: operating system: Scientific Linux 5; processors: 12-core Intel Xeon; RAM: 99 GB. GPU: Tesla M2090; Global memory available on device in bytes: ; Shared memory available per block in bytes: ; Warp size in threads: 32; Number of multiprocessors on device: 16.

7 Results. Input data: 100 events; two data sets, one fully random and the other with half of the channels set to 0. Implementations: banked, banked_halfZero, notbanked, notbanked_halfZero, cpu_halfZero, GPU_FPoperation (uses some floating-point operation), GPU_pinned_RAM (uses page-locked memory).
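
The two input data sets described on this slide could be produced along the following lines. This is a minimal sketch under several assumptions that the slides do not confirm: 64 channels per event, 8-bit channel values (to match the 256-bin histograms), one histogram per channel, and the odd-numbered channels being the zeroed half. The names `make_events` and `cpu_histograms` are hypothetical.

```python
import random

NUM_EVENTS = 100      # from the slide
NUM_CHANNELS = 64     # hypothetical: channel count is not given on the slides
NUM_BINS = 256        # assumption: 8-bit values, matching 256-bin histograms


def make_events(rng, half_zero):
    """Return NUM_EVENTS events, each a list of NUM_CHANNELS channel values."""
    events = []
    for _ in range(NUM_EVENTS):
        event = [rng.randrange(NUM_BINS) for _ in range(NUM_CHANNELS)]
        if half_zero:
            # Assumption: the zeroed half is the odd-numbered channels.
            for ch in range(1, NUM_CHANNELS, 2):
                event[ch] = 0
        events.append(event)
    return events


def cpu_histograms(events):
    """CPU reference: one 256-bin histogram per channel."""
    hists = [[0] * NUM_BINS for _ in range(NUM_CHANNELS)]
    for event in events:
        for ch, value in enumerate(event):
            hists[ch][value] += 1
    return hists


rng = random.Random(12345)  # fixed seed so runs are repeatable
random_events = make_events(rng, half_zero=False)
half_zero_events = make_events(rng, half_zero=True)
```

A reference like `cpu_histograms` is also a convenient way to check a GPU implementation: both should produce identical bin counts for the same input.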


10 Future. Explore histogramming efficiency with CPU and GPGPU for other data types: bit, 16-bit integer, floating-point (range of interesting values + underflow and overflow). Implement data transfer over the network: data computers send data to the histogramming node(s); a server at the histogramming node collects partial data and combines them in CPU RAM; a monitoring thread on the CPU activates the GPU kernel when data are ready. Look into removing the transmission bottleneck by installing a 10 Gb Ethernet card at the histogramming node.
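
The floating-point case mentioned above (a range of interesting values plus underflow and overflow) can be illustrated with a small binning function. This is a sketch of one common convention, not something taken from the slides: bin 0 collects underflow, bins 1 to `nbins` cover the half-open range `[lo, hi)` uniformly, and bin `nbins + 1` collects overflow. The names `float_bin_index` and `fill` are hypothetical, and routing NaN to the overflow bin is an assumption.

```python
def float_bin_index(x, lo, hi, nbins):
    """Map x to a bin: 0 = underflow, 1..nbins = [lo, hi), nbins + 1 = overflow."""
    if x < lo:
        return 0
    if not x < hi:
        # Covers x >= hi and also NaN (assumption: NaN counts as overflow).
        return nbins + 1
    k = int((x - lo) / (hi - lo) * nbins)
    # Clamp in case rounding pushes a value just below hi into bin nbins.
    return 1 + min(k, nbins - 1)


def fill(values, lo, hi, nbins):
    """Histogram `values` into nbins regular bins plus underflow and overflow."""
    hist = [0] * (nbins + 2)
    for x in values:
        hist[float_bin_index(x, lo, hi, nbins)] += 1
    return hist
```

With two extra bins per histogram, the shared-memory budget per thread grows from 256 to 258 counters in the layouts discussed earlier, which would have to be accounted for when sizing blocks.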

