
Slide 1: Delivering High Performance to Parallel Applications Using Advanced Scheduling
Nikolaos Drosinos, Georgios Goumas, Maria Athanasaki and Nectarios Koziris
National Technical University of Athens, Computing Systems Laboratory
{ndros,goumas,maria,nkoziris}@cslab.ece.ntua.gr, www.cslab.ece.ntua.gr

Slide 2: Overview (Parallel Computing 2003, 5/9/2003)
- Introduction
- Background
- Code Generation: Computation/Data Distribution, Communication Schemes, Summary
- Experimental Results
- Conclusions - Future Work

Slide 3: Introduction
Motivation:
- A lot of theoretical work has been done on arbitrary tiling, but actual experimental results are lacking
- There is no complete method to generate code for non-rectangular tiles

Slide 4: Introduction
Contribution:
- A complete end-to-end SPMD code generation method for arbitrarily tiled iteration spaces
- Simulation of blocking and non-blocking communication primitives
- Experimental evaluation of the proposed scheduling scheme

Slide 5: Overview (agenda repeated)

Slide 6: Background
Algorithmic model:
FOR j1 = min1 TO max1 DO
  ...
  FOR jn = minn TO maxn DO
    Computation(j1, ..., jn);
  ENDFOR
  ...
ENDFOR
- Perfectly nested loops
- Constant flow data dependencies (D)
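The algorithmic model above can be made concrete with a small C sketch; the loop bounds and the particular dependencies below are illustrative assumptions, not taken from the talk:

```c
#define N1 4
#define N2 4

/* A perfectly nested loop: all computation sits in the innermost body,
 * and every iteration (j1, j2) reads earlier iterations at constant
 * offsets -- here the illustrative flow dependencies are (1,0) and (0,1). */
int last_element(void) {
    int A[N1][N2];
    for (int j1 = 0; j1 < N1; j1++)
        for (int j2 = 0; j2 < N2; j2++)
            A[j1][j2] = (j1 > 0 ? A[j1-1][j2] : 1)
                      + (j2 > 0 ? A[j1][j2-1] : 1);
    return A[N1-1][N2-1];
}
```

Because the offsets are constant, the full dependence matrix D can be read directly off the array subscripts, which is what the code generation method relies on.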

Slide 7: Background
Tiling:
- A popular loop transformation
- Groups iterations into atomic units
- Enhances locality on uniprocessors
- Enables coarse-grain parallelism on distributed-memory systems
- Requires a valid tiling matrix H
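The validity condition for H did not survive as text on this slide; the standard condition from the tiling literature (stated here as an assumption, not recovered from the slide) is:

```latex
% A nonsingular tiling matrix H is valid for dependence matrix D iff
% every dependence, expressed in tile coordinates, is non-negative,
% so that tiles can be executed atomically in lexicographic order:
H D \ge 0
```

Intuitively, no dependence may point "backwards" across a tile boundary, otherwise two tiles would each need results from the other.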

Slide 8: Tiling Transformation
Example:
FOR j1 = 0 TO 11 DO
  FOR j2 = 0 TO 8 DO
    A[j1,j2] := A[j1-1,j2] + A[j1-1,j2-1];
  ENDFOR
ENDFOR
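A sketch of the example loop after the 3x3 rectangular tiling of the next slide: the outer (tile) loops enumerate blocks and the inner (point) loops sweep each block. Lexicographic tile order respects the (1,0) and (1,1) dependencies, so both versions compute identical results; the boundary initialization to 1 is our assumption for a self-contained example.

```c
#define M1 12
#define M2 9
#define T  3   /* tile size, matching the 3x3 tiling of the next slide */

static int min_i(int a, int b) { return a < b ? a : b; }

/* The original loop nest from the slide. */
void original(int A[M1][M2]) {
    for (int j1 = 0; j1 < M1; j1++)
        for (int j2 = 0; j2 < M2; j2++)
            A[j1][j2] = (j1 > 0 ? A[j1-1][j2] : 1)
                      + (j1 > 0 && j2 > 0 ? A[j1-1][j2-1] : 1);
}

/* The same computation after rectangular tiling: tile loops (t1, t2)
 * enumerate 3x3 blocks in lexicographic order, point loops (j1, j2)
 * sweep each block; every dependence crosses only into earlier tiles. */
void tiled(int A[M1][M2]) {
    for (int t1 = 0; t1 < M1; t1 += T)
        for (int t2 = 0; t2 < M2; t2 += T)
            for (int j1 = t1; j1 < min_i(t1 + T, M1); j1++)
                for (int j2 = t2; j2 < min_i(t2 + T, M2); j2++)
                    A[j1][j2] = (j1 > 0 ? A[j1-1][j2] : 1)
                              + (j1 > 0 && j2 > 0 ? A[j1-1][j2-1] : 1);
}
```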

Slide 9: Rectangular Tiling Transformation
P = | 3 0 |    H = P^-1 = | 1/3   0  |
    | 0 3 |               |  0   1/3 |

Slide 10: Non-rectangular Tiling Transformation
P = | 3 3 |    H = P^-1 = | 1/3  -1/3 |
    | 0 3 |               |  0    1/3 |
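Under a tiling matrix H, iteration j belongs to tile floor(H j). A small C sketch for the two example tilings above (the helper names are ours):

```c
/* Floor division that is also correct for negative numerators, which
 * occur here because the non-rectangular H has a negative entry. */
static int floordiv(int a, int b) {
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

/* tile = floor(H j) for the rectangular tiling H = (1/3) I. */
void rect_tile(const int j[2], int t[2]) {
    t[0] = floordiv(j[0], 3);
    t[1] = floordiv(j[1], 3);
}

/* tile = floor(H j) for H = [[1/3, -1/3], [0, 1/3]]: the first tile
 * coordinate depends on j1 - j2, producing skewed (parallelogram) tiles. */
void skew_tile(const int j[2], int t[2]) {
    t[0] = floordiv(j[0] - j[1], 3);
    t[1] = floordiv(j[1], 3);
}
```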

Slide 11: Why Non-rectangular Tiling?
- Reduces communication: 8 communication points (rectangular) vs. 6 (non-rectangular) in the figure
- Enables more efficient scheduling schemes: 6 time steps (rectangular) vs. 5 (non-rectangular)

Slide 12: Overview (agenda repeated)

Slide 13: Computation Distribution
We map tiles along the longest dimension to the same processor because this:
- reduces the number of processors required
- simplifies message passing
- reduces total execution time when overlapping computation with communication
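A minimal sketch of this mapping, assuming tiles indexed (t1, t2) with t1 running along the longest dimension: every tile in a given t2 column goes to one processor, so only neighbouring processors ever exchange data. The cyclic wrap-around for surplus columns is our assumption, not stated on the slide.

```c
/* Map tile column t2 to a processor rank; all tiles sharing t2 (the whole
 * run of tiles along the longest dimension t1) land on the same rank. */
int owner(int t2, int nprocs) {
    return t2 % nprocs;   /* cyclic assignment if columns exceed processors */
}
```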

Slide 14: Computation Distribution (figure: tiles assigned to processors P1-P3)

Slide 15: Data Distribution
- Computer-owns rule: each processor owns the data it computes
- Arbitrary convex iteration space, arbitrary tiling
- Rectangular local iteration and data spaces

Slides 16-18: Data Distribution (figures)

Slide 19: Overview (agenda repeated)

Slide 20: Communication Schemes
With whom do I communicate? (figure: processors P1-P3)

Slide 21: Communication Schemes
With whom do I communicate? (second figure: processors P1-P3)

Slide 22: Communication Schemes
What do I send?

Slide 23: Blocking Scheme
(figure: processors P1-P3 on the j1 x j2 tile space; 12 time steps)

Slide 24: Non-blocking Scheme
(figure: processors P1-P3 on the j1 x j2 tile space; 6 time steps)
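The step counts on the last two slides follow from a simple pipeline model, assuming (as the figures suggest) P = 3 processors with T = 4 tiles each. The formulas below are our reading of the schedule, not taken verbatim from the talk: with blocking communication every tile costs a compute step plus a serialized communication step, while with non-blocking communication the transfer of one tile overlaps the computation of the next.

```c
/* Pipeline makespan in time steps for P processors with T tiles each.
 * Blocking: compute and communication serialize, so each tile costs 2
 * steps and each processor starts 2 steps after its predecessor. */
int blocking_steps(int P, int T)    { return 2 * ((P - 1) + T); }

/* Non-blocking: communication of tile k overlaps computation of tile
 * k+1, so a tile costs 1 step and the pipeline fills in P-1 steps. */
int nonblocking_steps(int P, int T) { return (P - 1) + T; }
```

With P = 3 and T = 4 these give the 12 and 6 time steps shown in the figures.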

Slide 25: Overview (agenda repeated)

Slide 26: Code Generation Summary
Advanced Scheduling = Suitable Tiling + Non-blocking Communication Scheme
Pipeline: Sequential Code -> (Dependence Analysis, Tiling Transformation) -> Sequential Tiled Code -> Parallelization (Computation Distribution, Data Distribution, Communication Primitives) -> Parallel SPMD Code

Slide 27: Code Summary - Blocking Scheme (code listing in figure)

Slide 28: Code Summary - Non-blocking Scheme (code listing in figure)

Slide 29: Overview (agenda repeated)

Slide 30: Experimental Results
- 8-node SMP Linux cluster (800 MHz Pentium III, 128 MB RAM, kernel 2.4.20)
- MPICH v1.2.5 (--with-device=p4, --with-comm=shared)
- g++ compiler v2.95.4 (-O3)
- Fast Ethernet interconnect
- Two 3D micro-kernel benchmarks: Gauss Successive Over-Relaxation (SOR) and Texture Smoothing Code (TSC)
- Simulation of communication schemes

Slide 31: SOR
- Iteration space: M x N x N
- Dependence matrix, rectangular tiling, and non-rectangular tiling matrices shown in figure
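For reference, a minimal Gauss-Seidel SOR sweep over an M x N x N iteration space (M successive sweeps over an N x N grid), in the spirit of the benchmark; the actual kernel, boundary handling, grid sizes, and relaxation factor used in the experiments may differ.

```c
#define GRID   8    /* N: grid side */
#define SWEEPS 10   /* M: number of relaxation sweeps */

/* M successive over-relaxation sweeps over the interior of an N x N
 * grid; omega is the over-relaxation factor (1 < omega < 2).  Each
 * point is relaxed toward the average of its four neighbours, reusing
 * values already updated in the current sweep (Gauss-Seidel order). */
void sor(double u[GRID][GRID], double omega) {
    for (int t = 0; t < SWEEPS; t++)
        for (int i = 1; i < GRID - 1; i++)
            for (int j = 1; j < GRID - 1; j++)
                u[i][j] = (1.0 - omega) * u[i][j]
                        + omega * 0.25 * (u[i-1][j] + u[i+1][j]
                                        + u[i][j-1] + u[i][j+1]);
}
```

The sweep index t, row i, and column j form exactly the 3D M x N x N iteration space the slide refers to, with flow dependencies carried across sweeps and across rows.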

Slides 32-33: SOR (performance charts)

Slide 34: TSC
- Iteration space: T x N x N
- Dependence matrix, rectangular tiling, and non-rectangular tiling matrices shown in figure

Slides 35-36: TSC (performance charts)

Slide 37: Overview (agenda repeated)

Slide 38: Conclusions
- Automatic code generation for arbitrarily tiled iteration spaces can be efficient
- High performance can be achieved by means of a suitable tiling transformation and by overlapping computation with communication

Slide 39: Future Work
- Apply the methodology to imperfectly nested loops and non-constant dependencies
- Investigate hybrid programming models (MPI+OpenMP)
- Evaluate performance on advanced interconnection networks (SCI, Myrinet)

Slide 40: Questions?
http://www.cslab.ece.ntua.gr/~ndros
