Selected issues of histogramming on GPGPUs

1 Selected issues of histogramming on GPGPUs
Krzysztof M. Korcyl, Joanna Płażek, Janusz Chwastowski, Piotr Poznański
Cracow University of Technology, ul. Warszawska 24, Cracow, Poland
s: {kkorcyl, jplazek, jchwastowski,

2 Large Scale Data Quality Monitoring System
Diagram labels: thousands of sensors; sensor interface cards; data collection nodes; network; data monitoring node(s); GPU histogramming card

3 Input data
RAM to GLOBAL memory; page-locked (pinned) memory

4 Storing histograms in shared memory: "banked" coding
thread_0 hist[0], thread_1 hist[0], ..., thread_30 hist[0], thread_31 hist[0], thread_0 hist[1], thread_1 hist[1], ..., thread_0 hist[255], thread_1 hist[255], ..., thread_31 hist[255]
32 histograms (threads) * 256 bins * 4 B = 32768 B = 32 kB
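The interleaved layout on this slide can be modelled on the host with a little index arithmetic. The sketch below is not the authors' kernel code; the function names are hypothetical, and it assumes 32 shared-memory banks with successive 4-byte words mapped to successive banks. Under that assumption, thread t always lands in bank t, whatever bin it updates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values from the slide: 32 per-thread histograms of 256 four-byte bins.
constexpr std::size_t kThreads = 32;
constexpr std::size_t kBins = 256;
// Assumption: 32 banks, consecutive 4-byte words in consecutive banks.
constexpr std::size_t kBanks = 32;

// Word index of `bin` in the private histogram of `thread` when the
// 32 copies of each bin are stored next to each other ("banked" coding).
constexpr std::size_t banked_index(std::size_t thread, std::size_t bin) {
    return bin * kThreads + thread;
}

// Bank that a given 4-byte word falls into under the assumption above.
constexpr std::size_t bank_of(std::size_t word_index) {
    return word_index % kBanks;
}

// Total shared memory needed for all per-thread histograms, in bytes.
constexpr std::size_t banked_bytes() {
    return kThreads * kBins * sizeof(std::uint32_t);
}
```

For example, `bank_of(banked_index(5, 200))` evaluates to 5, and `banked_bytes()` gives the 32768 B quoted on the slide.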

5 Storing histograms in shared memory: "notbanked" coding
thread_0 hist[0], thread_0 hist[1], ..., thread_0 hist[30], thread_0 hist[31], thread_0 hist[32], thread_0 hist[33], ..., thread_ hist[255]
48 histograms (threads) * 256 bins * 4 B = 49152 B = 48 kB
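A host-side model of the contiguous layout shows how bank usage then depends on the data. This is a sketch under stated assumptions, not the authors' code: the names are hypothetical, and it assumes 32 banks with successive 4-byte words in successive banks. Because 256 is a multiple of 32, the bank is determined by the bin alone, so threads that update the same bin in their own histograms address the same bank.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values from the slide: 48 per-thread histograms of 256 four-byte bins.
constexpr std::size_t kThreads = 48;
constexpr std::size_t kBins = 256;
// Assumption: 32 banks, consecutive 4-byte words in consecutive banks.
constexpr std::size_t kBanks = 32;

// Word index of `bin` when each thread's 256 bins are stored contiguously.
constexpr std::size_t notbanked_index(std::size_t thread, std::size_t bin) {
    return thread * kBins + bin;
}

// Total shared memory needed for all per-thread histograms, in bytes.
constexpr std::size_t notbanked_bytes() {
    return kThreads * kBins * sizeof(std::uint32_t);
}

// Largest number of threads addressing one bank when thread i updates
// bin bins[i] of its own histogram (1 means no two threads share a bank).
std::size_t worst_bank_load(const std::vector<std::size_t>& bins) {
    std::array<std::size_t, kBanks> load{};
    for (std::size_t t = 0; t < bins.size(); ++t) {
        ++load[notbanked_index(t, bins[t]) % kBanks];
    }
    return *std::max_element(load.begin(), load.end());
}
```

In this model, 32 threads that all update bin 0 give a load of 32 on a single bank, while 32 threads updating bins 0 to 31 give a load of 1.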

6 Results
Zeus CPU - GPGPU
operating system: Scientific Linux 5
processors: 12-core Intel Xeon
RAM: 99 GB
Tesla M2090
Global memory available on device in bytes:
Shared memory available per block in bytes: 49152
Warp size in threads: 32
Number of multiprocessors on device: 16

7 Results
Input data: 100 events
Two data sets: one fully random and the other with half of the channels set to 0.
Implementations: banked, banked_halfZero, notbanked, notbanked_halfZero, cpu_halfZero, GPU_FPoperation (uses some floating-point operation), GPU_pinned_RAM (uses page-locked memory).
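The two input sets and a CPU reference histogram could look roughly like the following. The slides do not describe the event format, so this sketch assumes 8-bit channel values, zeroes every second channel for the half-zero set, and uses hypothetical names (`make_input`, `cpu_histogram`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Histogram = std::array<std::uint32_t, 256>;

// Build `n` channel values. Assumption: values are uniform random bytes;
// with half_zero set, every second channel is forced to 0.
std::vector<std::uint8_t> make_input(std::size_t n, bool half_zero,
                                     std::uint32_t seed = 1) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<std::uint8_t> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        const int v = dist(rng);  // drawn even when discarded, so both sets share a stream
        data[i] = (half_zero && i % 2 == 1) ? 0 : static_cast<std::uint8_t>(v);
    }
    return data;
}

// Straightforward single-threaded histogram, usable as a CPU baseline
// and as a correctness check for a GPU implementation.
Histogram cpu_histogram(const std::vector<std::uint8_t>& data) {
    Histogram h{};
    for (std::uint8_t v : data) {
        ++h[v];
    }
    return h;
}
```

In the half-zero set at least half of all entries fall into bin 0, which is the situation where many threads update the same bin at once.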



10 Future
Explore histogramming efficiency with CPU and GPGPU for other data types: bit, 16-bit integer, floating-point (range of interesting values + underflow and overflow).
Implement data transfer over the network:
  Data computers send data to the histogramming node(s).
  A server at the histogramming node collects partial data and combines them in CPU RAM.
  A monitoring thread on the CPU activates the GPU kernel when data are ready.
Look into removing the transmission bottleneck by installing a 10Gb Ethernet card at the histogramming node.
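For the floating-point case mentioned above, a histogram over a range of interesting values with separate underflow and overflow counters might be sketched as follows. This is an illustration only: the type name `RangeHistogram`, the half-open range [lo, hi), and the choice to ignore NaN are all assumptions, not part of the presentation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Fixed-width histogram over [lo, hi) with underflow/overflow counters.
template <std::size_t NBins>
struct RangeHistogram {
    double lo;
    double hi;
    std::array<std::uint32_t, NBins> bins;
    std::uint32_t underflow;
    std::uint32_t overflow;

    RangeHistogram(double lo_, double hi_)
        : lo(lo_), hi(hi_), bins{}, underflow(0), overflow(0) {}

    void fill(double x) {
        if (std::isnan(x)) return;  // assumption: NaN values are ignored
        if (x < lo) { ++underflow; return; }
        if (x >= hi) { ++overflow; return; }
        std::size_t idx =
            static_cast<std::size_t>((x - lo) / (hi - lo) * NBins);
        if (idx >= NBins) idx = NBins - 1;  // guard against rounding at the upper edge
        ++bins[idx];
    }
};
```

With `RangeHistogram<10> h(0.0, 1.0)`, filling -0.5 increments the underflow counter, 0.05 goes to bin 0, 0.95 to bin 9, and both 1.0 and 2.0 count as overflow.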
