2-6-2015 Challenge the future Delft University of Technology Evaluating Multi-Core Processors for Data-Intensive Kernels Alexander van Amesfoort Delft.


1 Evaluating Multi-Core Processors for Data-Intensive Kernels (2-6-2015)
Alexander van Amesfoort, Delft University of Technology, a.s.vanamesfoort@tudelft.nl
Joint work with Ana Varbanescu, Rob van Nieuwpoort, and Henk Sips

2 Outline
Data-intensive applications
Gridding
Platforms
Implementation strategies
Measurements
Guidelines
Conclusion

3 Data-Intensive Applications
Low Arithmetic Intensity (comp : comm)
  - drastic application- and platform-specific effort
Difficult platform choice
Memory wall is still getting bigger
Data-intensive study worthwhile
Provide guidelines and insight into performance behavior and effort

4 Radio Astronomy Imaging
Gridding places irregularly spaced samples on a regular grid
(De)gridding consumes most of the time in imaging
Use gridding as an HPC streaming kernel

5 Gridding (W Projection)
Unpredictable, sparse access patterns
Low AI (0.33)

    forall (i = 0..N_freq; j = 0..N_samples)    // for all samples
        g_index = func1((u, v, w)[j], freq[i]);
        c_index = func2((u, v, w)[j], freq[i]);
        for (x = 0; x < SUPPORT; x++)            // sweep the convolution kernel
            G[g_index+x] += C[c_index+x] * V[i,j];

Parameterize these properties
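The inner loop of the pseudocode can be written out as a small C sketch. The `cplx` type, the `grid_len` bounds check, and the value of `SUPPORT` are assumptions made for illustration and are not taken from the slides. The flop and byte counts in `arithmetic_intensity` show one way to arrive at a figure close to the quoted 0.33, assuming single-precision complex values and a visibility that stays in a register during the sweep; the slides do not say how the authors derived their number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel width, chosen only for illustration. */
enum { SUPPORT = 4 };

/* Single-precision complex value (assumed data type). */
typedef struct { float re, im; } cplx;

/* Add one visibility v to the grid G by sweeping SUPPORT convolution
 * weights from C. g_index and c_index stand in for the results of
 * func1/func2 in the slide pseudocode. */
void grid_sample(cplx *G, size_t grid_len, const cplx *C,
                 size_t g_index, size_t c_index, cplx v)
{
    assert(g_index + SUPPORT <= grid_len);
    for (size_t x = 0; x < SUPPORT; x++) {
        const cplx c = C[c_index + x];
        /* complex multiply: 4 multiplications + 2 additions */
        const float pr = c.re * v.re - c.im * v.im;
        const float pi = c.re * v.im + c.im * v.re;
        /* complex accumulate: 2 additions */
        G[g_index + x].re += pr;
        G[g_index + x].im += pi;
    }
}

/* Flops per byte for one inner-loop iteration under the assumptions above. */
double arithmetic_intensity(void)
{
    const double flops = 8.0;                 /* 6 (multiply) + 2 (add) */
    const double bytes = 3.0 * sizeof(cplx);  /* read C, read G, write G */
    return flops / bytes;
}
```

With 8-byte complex values this gives 8 flops per 24 bytes, or about 0.33.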

6 Platforms and Test Setup
High provided Flop/Byte ratios

Platform          Cores   Clock (GHz)  Local Mem (kB)                Compute (GFlop/s)  Flop/Byte Ratio
Dual Xeon 5320    2x4     1.86         32 + 32 L1, 2 x 4096 L2       59.5               2.8
Core i7 920       4 (HT)  2.66         32 + 32 L1, 256 L2, 8192 L3   85.2               2.7
PS3 Cell          1+6     3.20         256                           153.6              6.0
QS21 Cell         2+16    3.20         256                           409.6              8.0
Geforce 8800 GTX  16      1.35         16+8+8                        345.0              4.0
Geforce GTX 280   30      1.30         16+8+8                        936.0              6.6
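The gap between the platform Flop/Byte ratios and the kernel's arithmetic intensity can be made concrete with a small calculation. The sketch below assumes that the Flop/Byte column is peak compute divided by peak memory bandwidth; the slides do not state this definition. The struct and function names are hypothetical, and the resulting bound is a rough estimate, not a measured result.

```c
#include <assert.h>

/* Peak compute (GFlop/s) and Flop/Byte ratio, copied from the table. */
struct platform {
    const char *name;
    double gflops;
    double flop_per_byte;
};

const struct platform PLATFORMS[] = {
    { "Dual Xeon 5320",    59.5, 2.8 },
    { "Core i7 920",       85.2, 2.7 },
    { "PS3 Cell",         153.6, 6.0 },
    { "QS21 Cell",        409.6, 8.0 },
    { "Geforce 8800 GTX", 345.0, 4.0 },
    { "Geforce GTX 280",  936.0, 6.6 },
};

/* Memory bandwidth in GB/s implied by the table, under the assumption
 * that ratio = peak compute / peak bandwidth. */
double implied_bandwidth_gbs(const struct platform *p)
{
    assert(p->flop_per_byte > 0.0);
    return p->gflops / p->flop_per_byte;
}

/* Upper bound on GFlop/s for a kernel with arithmetic intensity ai
 * (flops per byte) if it is limited only by memory bandwidth,
 * capped at the platform's peak compute. */
double bandwidth_bound_gflops(const struct platform *p, double ai)
{
    const double bound = ai * implied_bandwidth_gbs(p);
    return bound < p->gflops ? bound : p->gflops;
}
```

For example, under these assumptions the PS3 Cell entry implies 25.6 GB/s, so a kernel with an arithmetic intensity of 0.33 would be limited to roughly 8.5 GFlop/s out of 153.6 unless data is reused from local memory.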

7 Implementation Strategies
CPU (pthreads)
  - replicated grid, master-worker queues, SIMD
Cell/B.E. (Cell SDK)
  - master-worker queues, SIMD, double buffering, PPE multi-threading, line reuse
GPU (CUDA)
  - replicated grid, 1D texturing of the convolution matrix
Similar at a high level
  - but different, non-portable code
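The replicated-grid strategy named for the CPU and GPU versions can be illustrated with a minimal sketch: each worker accumulates into a private copy of the grid, and the copies are summed once at the end. This is not the authors' code. The sizes, type, and function names are hypothetical, real-valued cells are used for brevity, and the threading itself (pthreads in the CPU version) is left out, so only the data layout and the final reduction are shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { GRID_SIZE = 16, N_WORKERS = 4 };

/* One private grid per worker, so no two workers ever write the same cell. */
typedef struct {
    float cell[N_WORKERS][GRID_SIZE];
} replicated_grid;

/* Worker w accumulates a value into its own replica only. */
void worker_add(replicated_grid *r, size_t w, size_t index, float value)
{
    assert(w < N_WORKERS && index < GRID_SIZE);
    r->cell[w][index] += value;
}

/* Final reduction: sum all replicas into a single output grid.
 * In a threaded version this runs after all workers have finished. */
void reduce_grids(const replicated_grid *r, float *out)
{
    for (size_t i = 0; i < GRID_SIZE; i++) {
        float sum = 0.0f;
        for (size_t w = 0; w < N_WORKERS; w++)
            sum += r->cell[w][i];
        out[i] = sum;
    }
}
```

The trade-off is memory and reduction time: the footprint grows with the number of workers, in exchange for conflict-free writes during gridding.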

8 CPU Experiments
Core i7 suffers less from irregular accesses
Still 3x more locality needed
Hyperthreading shows a lot of benefit

9 Cell/B.E. Experiments
Achieves the highest performance
Could perform much better with more work
Some optimizations were applied to the computation

10 GPU Experiments
Write conflicts in the grid problematic
Also requires much more work/locality
Tesla C1040 results unexplainable

11 Discussion
Reached good speedups, but still way below peak
A lot of effort
Best performance on Cell/B.E.
  - depends on application requirements
GPUs suitable for lots of data parallelism
  - can exploit 2D or 3D spatial locality
Don’t underestimate standard CPUs
  - flexibility, availability, cost, and ease of programming

12 Guidelines
Good performance requires:
  - regular data accesses
  - data reuse between independent samples
Or else: suffer (redesign algorithm)
  - conceptually, resolve irregularity at a higher level
  - avoid write conflicts
  - stream jobs: overlap/multi-buffering in the hierarchy
  - parameterized job size
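One possible reading of "resolve irregularity at a higher level" is to reorder the samples before gridding so that consecutive samples touch the same or nearby grid cells. The sketch below sorts samples by grid index with the standard `qsort` and counts how often consecutive samples switch to a different index. This is an assumption about how the guideline could be applied, not a technique the slides confirm; the `sample` type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A sample reduced to what matters here: its target grid index. */
typedef struct {
    size_t g_index;
    float value;
} sample;

/* qsort comparator: ascending grid index. */
static int by_grid_index(const void *a, const void *b)
{
    const sample *sa = a;
    const sample *sb = b;
    return (sa->g_index > sb->g_index) - (sa->g_index < sb->g_index);
}

/* Reorder samples so that accesses to the grid become monotonic. */
void sort_samples_by_grid_index(sample *s, size_t n)
{
    assert(s != NULL || n == 0);
    qsort(s, n, sizeof *s, by_grid_index);
}

/* Count how often consecutive samples target a different grid index.
 * Fewer changes means more reuse between neighbouring samples. */
size_t count_index_changes(const sample *s, size_t n)
{
    size_t changes = 0;
    for (size_t i = 1; i < n; i++)
        if (s[i].g_index != s[i - 1].g_index)
            changes++;
    return changes;
}
```

Sorting costs extra time and only pays off if the resulting locality gain outweighs it, which is the kind of trade-off a parameterized job size lets one explore.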

13 Conclusion
Challenges:
  - platform choice
  - fitting the application onto the platform
Similar strategies, different implementation
Provided guidelines focusing on memory and data optimization
  - or change the algorithm

