
1  Reducing the Communication Cost via Chain Pattern Scheduling
Florina M. Ciorba, Theodore Andronikos, Ioannis Drositis, George Papakonstantinou and Panayotis Tsanakas
National Technical University of Athens, Computing Systems Laboratory
cflorina@cslab.ece.ntua.gr, www.cslab.ece.ntua.gr

2  Outline (July 29, 2005 – NCA'05)
- Introduction
- Definitions and notations
- Chain pattern scheduling
  - unbounded #P – high communication
  - fixed #P – moderate communication
- Performance results
- Conclusions
- Future work

3  Introduction
Motivation: much work has been done on parallelizing loops with dependencies, but very little on explicitly minimizing the communication incurred by certain dependence vectors.

4  Introduction
Contribution:
- Enhancing the data locality for loops with dependencies
- Reducing the communication cost by mapping iterations tied by certain dependence vectors to the same processor
- Applicability to various multiprocessor architectures

5  Outline – next: Definitions and notations

6  Definitions and notations
Algorithmic model:

FOR (i1 = l1; i1 <= u1; i1++)
  FOR (i2 = l2; i2 <= u2; i2++)
    ...
    FOR (in = ln; in <= un; in++)
      Loop Body
    ENDFOR
    ...
  ENDFOR
ENDFOR

- Perfectly nested loops
- Constant flow data dependencies
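A minimal runnable sketch of this algorithmic model (in Python, an assumption of mine – the deck's own pseudocode is C-like): a 2-D perfectly nested loop whose body consumes the results of earlier iterations at fixed offsets, using the dependence vectors from the example on the next slides. The loop body itself is a placeholder.

```python
# Sketch of the algorithmic model: a perfectly nested 2-D loop with
# constant flow dependencies. Iteration (i1, i2) reads the iterations
# (i1, i2) - d for each dependence vector d that falls inside the
# index space; out-of-space predecessors count as pre-computed zeros.
DEPS = [(1, 3), (2, 2), (4, 1), (4, 3)]  # dependence vectors of the example

def run_loop(u1, u2):
    val = {}
    for i1 in range(u1):
        for i2 in range(u2):
            preds = [val.get((i1 - a, i2 - b), 0) for (a, b) in DEPS]
            val[(i1, i2)] = 1 + sum(preds)  # placeholder loop body
    return val

vals = run_loop(6, 6)
```

Because every dependence vector has positive components, the plain lexicographic execution order above always finds its predecessors already computed.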

7  Definitions and notations
- J – the index space of an n-dimensional uniform dependence loop
- ECT – earliest computation time of an iteration (time patterns)
- Pat_k – the set of points (called a pattern) of J with ECT k
- Pat_0 – contains the boundary (pre-computed) points
- Pat_1 – the initial pattern
- pat_k – the pattern outline (upper boundary) of Pat_k
- Pattern points – the points that define the polygon shape of a pattern
- Pattern vectors – the dependence vectors d_i whose end points are the pattern points of Pat_1
- Chain of computations – a sequence of iterations executed by the same processor (space patterns)
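The ECT and pattern definitions above can be computed directly. A hedged sketch (Python; function names are mine): the ECT of a point is one more than the maximum ECT of its in-space predecessors, and Pat_k is simply the set of points whose ECT equals k.

```python
# ECT of each index point: 1 + max over in-space predecessors.
# Pat_1 is then the set of points with no predecessor inside the
# index space, Pat_2 the next wavefront, and so on.
DEPS = [(1, 3), (2, 2), (4, 1), (4, 3)]

def ect(u1, u2, deps=DEPS):
    t = {}
    for i1 in range(u1):
        for i2 in range(u2):
            preds = [t[(i1 - a, i2 - b)] for (a, b) in deps
                     if (i1 - a, i2 - b) in t]
            t[(i1, i2)] = 1 + max(preds, default=0)
    return t

def pattern(t, k):
    """Pat_k: all points with earliest computation time k."""
    return {j for j, v in t.items() if v == k}
```

All points within one Pat_k can execute concurrently, which is why the deck calls them time patterns.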

8  Definitions and notations
Example: the index space of a loop with d_1 = (1,3), d_2 = (2,2), d_3 = (4,1), d_4 = (4,3). The pattern vectors are d_1, d_2, d_3. Pat_1, Pat_2, Pat_3 and their outlines pat_1, pat_2, pat_3 are shown, along with a few chains of computations. [figure not included in the transcript]

9  Definitions and notations
- d_c – the communication vector (one of the pattern vectors)
- j = p + λ·d_c – the family of lines of J formed by d_c
- C_r = {j ∈ J | j = r + λ·d_c, λ ≥ 0} – the chain formed by d_c starting at point r
- |C_r| – the number of iteration points of C_r
- C – the set of chains C_r; |C| – the number of chains
- |C_M| – the cardinality of the maximal chain
- D_r^in – the volume of “incoming” data for C_r
- D_r^out – the volume of “outgoing” data for C_r
- D_r^in + D_r^out – the total communication associated with C_r
- #P – the number of available processors
- m – the number of dependence vectors other than d_c
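These chain definitions can be made concrete. A sketch (assuming a 2-D index space anchored at the origin, which is my simplification): a chain starts at any point r whose predecessor along d_c falls outside J, and then follows r + λ·d_c while inside J.

```python
# Enumerate the chains C_r formed by the communication vector d_c:
# each chain is r, r + d_c, r + 2*d_c, ... for as long as the points
# stay inside the index space. The chains partition the index space.
DC = (2, 2)  # d_c = d_2 from the example

def chains(u1, u2, dc=DC):
    cs = []
    for i1 in range(u1):
        for i2 in range(u2):
            if i1 < dc[0] or i2 < dc[1]:   # no in-space predecessor: a start r
                c, j = [], (i1, i2)
                while j[0] < u1 and j[1] < u2:
                    c.append(j)
                    j = (j[0] + dc[0], j[1] + dc[1])
                cs.append(c)
    return cs

cs = chains(6, 6)
num_chains = len(cs)                  # |C|
max_len = max(len(c) for c in cs)     # |C_M|
```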

10  Definitions and notations
The communication vector is d_c = d_2 = (2,2). C_{r=(0,0)} communicates with C_{r=(0,2)}, C_{r=(1,0)} and C_{r=(3,0)}.

11  Outline – next: Chain pattern scheduling

12  Chain pattern scheduling
Scenario 1: unbounded #P – high communication
- All points of a chain C_r are mapped to the same processor
- #P is assumed to be unbounded
- Each chain is mapped to a different processor
Disadvantages:
- Unrealistic: for large index spaces the number of chains formed, and hence of processors needed, is prohibitive
- Provides limited data locality (only for points tied by d_c)
- The total communication volume is V = (D_r^in + D_r^out)·|C| ≈ 2m·|C_M|·|C|

13  Chain pattern scheduling
Scenario 1: unbounded #P – high communication. Each chain is mapped to a different processor; 24 chains are formed.

14  Chain pattern scheduling
Scenario 2: fixed #P – moderate communication
- All points of a chain C_r are mapped to the same processor
- #P is fixed (chosen arbitrarily)
Mapping I: cyclic mapping [8]
- Each chain from the pool of unassigned chains is mapped to a processor in cyclic fashion
Disadvantages:
- Provides limited data locality
- The total communication volume is a function of #P and r_1, …, r_m
- Because one cannot predict for which dependence vector, and in which case, communication is eliminated, the total communication volume is only bounded above by V ≈ 2m·|C_M|·|C|
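Mapping I fits in a few lines (a Python toy illustration, not the paper's implementation): chain k simply goes to processor k mod #P.

```python
def cyclic_mapping(chain_list, num_procs):
    # Mapping I: chains are handed out round-robin; chain k goes to
    # processor k mod #P, regardless of where the chain lies in the
    # index space. Every point of a chain lands on the same processor.
    return {pt: k % num_procs
            for k, chain in enumerate(chain_list)
            for pt in chain}

# three toy chains along d_c = (2, 2), mapped onto #P = 2 processors
toy = [[(0, 0), (2, 2), (4, 4)], [(1, 0), (3, 2)], [(0, 1), (2, 3)]]
owner = cyclic_mapping(toy, 2)
```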

15  Chain pattern scheduling
Scenario 2: fixed #P – moderate communication. Mapping I: cyclic mapping.

16  Chain pattern scheduling
Scenario 2: fixed #P – moderate communication
Mapping II: chain pattern mapping
- Zeroes the communication cost imposed by as many dependence vectors as possible
- #P is divided into a group of n_a processors used in the area above d_c, and another group of n_b processors used in the area below d_c
- Chains above d_c are cyclically mapped to the n_a processors
- Chains below d_c are cyclically mapped to the n_b processors
- This way the communication cost is additionally zeroed along one dependence vector in the area above d_c and along another in the area below d_c
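A sketch of Mapping II, under one assumption of mine: in 2-D, a chain counts as "above" d_c when its start point lies above the line through the origin along d_c (decided by the sign of a cross product). Each group is then cycled independently over its own pool of processors.

```python
DC = (2, 2)  # communication vector

def chain_pattern_mapping(chain_list, na, nb, dc=DC):
    # Mapping II (sketch): split the chains into those above and below
    # d_c, then cycle the "above" group over n_a processors and the
    # "below" group over a disjoint set of n_b processors. Classifying
    # by the cross product of the start point with d_c is an assumption
    # made for this 2-D illustration.
    above = [c for c in chain_list if c[0][0] * dc[1] - c[0][1] * dc[0] < 0]
    below = [c for c in chain_list if c not in above]
    owner = {}
    for k, chain in enumerate(above):
        for pt in chain:
            owner[pt] = k % na             # processors 0 .. n_a - 1
    for k, chain in enumerate(below):
        for pt in chain:
            owner[pt] = na + (k % nb)      # processors n_a .. n_a + n_b - 1
    return owner

toy = [[(0, 1), (2, 3)], [(0, 2)], [(1, 0), (3, 2)], [(2, 0)], [(0, 0)]]
owner = chain_pattern_mapping(toy, na=2, nb=3)
```

Because the two pools are disjoint, a processor's chains all sit on one side of d_c, which is what lets a second dependence vector be zeroed per side.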

17  Chain pattern scheduling
Scenario 2: fixed #P – moderate communication. Mapping II: chain pattern mapping with n_a = 2 and n_b = 3.

18  Chain pattern scheduling
Scenario 2: fixed #P – moderate communication
Mapping II: chain pattern mapping
- The total communication volume in this case is bounded above by V ≈ 2(m − 2)·|C_M|·|C|
Differences from the cyclic mapping:
- Processors do not span the entire index space, but only a part of it
- A different cycle size is chosen to map different areas of the index space

19  Chain pattern scheduling
Mapping II: chain pattern mapping – advantages
- Provides better data locality than the cyclic mapping
- Uses a more realistic #P than the cyclic mapping
- Suitable for:
  - Distributed-memory systems (a chain is mapped to a single processor)
  - Symmetric multiprocessor systems (a chain is mapped to a single node, which may contain more than one processor)
  - Heterogeneous systems (longer chains are mapped to faster processors, shorter chains to slower ones)

20  Outline – next: Performance results

21  Performance results
Simulation setup:
- Simulation program written in C++
- The distributed-memory system was emulated
- Index spaces range from 10×10 to 1000×1000 iterations
- Dependence vectors: d_1 = (1,3), d_c = d_2 = (2,2), d_3 = (4,1), d_4 = (4,3)
- #P ranges from 5 to 8
- Comparison with the cyclic mapping
- The communication reduction achieved ranges from 15% to 35%
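The flavor of this experiment can be reproduced at toy scale (a sketch; its numbers will not match the paper's 15–35% figures): build the chains, apply both mappings, and count how many dependence edges cross a processor boundary. Since both mappings keep every chain on one processor, no edge along d_c ever crosses under either scheme.

```python
DEPS = [(1, 3), (2, 2), (4, 1), (4, 3)]
DC = (2, 2)

def chains(n):
    # chains along d_c in an n x n index space
    cs = []
    for i1 in range(n):
        for i2 in range(n):
            if i1 < DC[0] or i2 < DC[1]:
                c, j = [], (i1, i2)
                while j[0] < n and j[1] < n:
                    c.append(j)
                    j = (j[0] + DC[0], j[1] + DC[1])
                cs.append(c)
    return cs

def comm_volume(owner, n, deps=DEPS):
    # count dependence edges whose endpoints sit on different processors
    return sum(1 for i1 in range(n) for i2 in range(n) for (a, b) in deps
               if (i1 - a, i2 - b) in owner
               and owner[(i1 - a, i2 - b)] != owner[(i1, i2)])

n, P, na, nb = 12, 5, 2, 3
cs = chains(n)
cyclic = {pt: k % P for k, c in enumerate(cs) for pt in c}
above = [c for c in cs if c[0][1] > c[0][0]]   # assumed above/below split
below = [c for c in cs if c[0][1] <= c[0][0]]
pattern = {}
for k, c in enumerate(above):
    for pt in c:
        pattern[pt] = k % na
for k, c in enumerate(below):
    for pt in c:
        pattern[pt] = na + k % nb
```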

22 July 29, 2005NCA'0522 Performance results

23 July 29, 2005NCA'0523 Performance results

24  Outline – next: Conclusions

25  Conclusions
- The total communication cost can be significantly reduced if the communication incurred by certain dependence vectors is eliminated
- The chain pattern mapping outperforms other mapping schemes (e.g. the cyclic mapping) by enhancing data locality

26  Outline – next: Future work

27  Future work
- Simulate other architectures (such as shared-memory systems, SMPs and heterogeneous systems)
- Experiment with the centralized (i.e. master-slave) version of the chain pattern scheduling scheme

28  Thank you. Questions?

