1 Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes Maria Athanasaki, Evangelos Koukis, Nectarios Koziris National Technical University of Athens School of Electrical and Computer Engineering Computing Systems Laboratory

2 National Technical University of Athens Computing Systems Laboratory PDP 2004 Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes
Previous work
- M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris, "Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs using Memory Mapped Network Interfaces", SuperComputing Conference on High Performance Networking and Computing (SC2002), Baltimore, Maryland, November 16-22, 2002.
- G. Goumas, A. Sotiropoulos and N. Koziris, "Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping," Proceedings of the 2001 International Parallel and Distributed Processing Symposium (IPDPS2001), IEEE Press, San Francisco, California, April 2001.

3 Overview
- Tiling for parallelization
- Non-overlapping vs. Overlapping execution scheme
- Grouping
- Application on a cluster of SMPs with a fixed number of nodes
- Experimental-Simulation Results

4 Nested For-Loops
    for (i_1 = l_1; i_1 <= u_1; i_1++)
      for (i_2 = l_2; i_2 <= u_2; i_2++)
        ...
          for (i_n = l_n; i_n <= u_n; i_n++) {
            Loop Body
          }

5 Dependence Vectors [figure; axes: i_1, i_2]
    for (i_1 = 0; i_1 <= 7; i_1++)
      for (i_2 = 0; i_2 <= 7; i_2++)
        A[i_1][i_2] = A[i_1 - 1][i_2] + A[i_1][i_2 - 1];

6 Tiling [figure; axes: i_1, i_2]

7 Tiling [figure; axes: i_1, i_2; labels: Processor 0, Processor 1]
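As a rough illustration of the tiling slides, the sketch below tiles the two-dimensional example loop with square tiles and checks that the tiled sweep gives the same result as the plain one. The tile edge `T`, the one-cell boundary offset (so that `a[i1 - 1]` and `a[i2 - 1]` stay inside the array), and all function names are assumptions made for this example; they are not taken from the slides.

```c
#include <string.h>

#define N 8   /* iteration space is N x N, as in the example loop */
#define T 4   /* hypothetical tile edge; assumed to divide N */

/* Row 0 and column 0 hold boundary values (ones); the interior starts at zero. */
void init(long a[N + 1][N + 1]) {
    for (int i = 0; i <= N; i++)
        for (int j = 0; j <= N; j++)
            a[i][j] = (i == 0 || j == 0) ? 1 : 0;
}

/* Plain loop nest over the interior points 1..N. */
void sweep_plain(long a[N + 1][N + 1]) {
    for (int i1 = 1; i1 <= N; i1++)
        for (int i2 = 1; i2 <= N; i2++)
            a[i1][i2] = a[i1 - 1][i2] + a[i1][i2 - 1];
}

/* Tiled loop nest: the two outer loops walk over tiles, the two inner
 * loops walk over the points of one tile. Visiting tiles in this order is
 * legal here because both dependences point in non-negative directions. */
void sweep_tiled(long a[N + 1][N + 1]) {
    for (int t1 = 1; t1 <= N; t1 += T)
        for (int t2 = 1; t2 <= N; t2 += T)
            for (int i1 = t1; i1 < t1 + T; i1++)
                for (int i2 = t2; i2 < t2 + T; i2++)
                    a[i1][i2] = a[i1 - 1][i2] + a[i1][i2 - 1];
}

/* Returns 1 when the plain and tiled sweeps produce identical arrays. */
int tiled_matches_plain(void) {
    long p[N + 1][N + 1], q[N + 1][N + 1];
    init(p);
    init(q);
    sweep_plain(p);
    sweep_tiled(q);
    return memcmp(p, q, sizeof p) == 0;
}
```

In a parallel setting each row of tiles would go to a different processor, as the slide labels suggest; the sketch only demonstrates that the reordering preserves the result.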

9 Non-Overlapping Scheme [figure; axes: i_1, i_2; labels: Processor 0, Processor 1, Processor 2]

10 Non-Overlapping vs. Overlapping Scheme [figure; two panels, each labelled P0, P1, P2, P3]

11 Overlapping Scheme [figure; axes: i_1, i_2; labels: Processor 0, Processor 1, Processor 2]
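One way to read the difference between the two schemes is through the cost of a single time step. The sketch below is a simplified model, not something stated on the slides: it assumes that a non-overlapping step performs communication and computation one after the other, while an overlapping step runs them concurrently and so lasts as long as the slower of the two. The function names and the parameters `t_comp` and `t_comm` are hypothetical. The number of steps also differs between the schemes and is not modelled here.

```c
/* Assumed cost of one non-overlapping step: receive, compute and send
 * happen one after the other, so the two costs add up. */
double step_cost_non_overlapping(double t_comp, double t_comm) {
    return t_comp + t_comm;
}

/* Assumed cost of one overlapping step: communication proceeds while the
 * CPU computes, so the step lasts as long as the slower of the two. */
double step_cost_overlapping(double t_comp, double t_comm) {
    return t_comp > t_comm ? t_comp : t_comm;
}
```

For example, with `t_comp = 3.0` and `t_comm = 1.0` the model gives a step cost of 4.0 without overlapping and 3.0 with it.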

13 Generalization to SMPs – “Grouping” [figure; labels: SMP0, SMP1, SMP2, SMP3, each with CPU0 and CPU1]

14 Example: Grouping + Non overlapping Communication Scheme [figure; labels: Tile Space, Group Space, SMP node0, SMP node1] Scheduling vector Π=(1,0)

15 Example: Grouping + Overlapping Communication Scheme [figure; labels: Tile Space, Group Space, SMP node0, SMP node1] Scheduling vector Π=(1,1)
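A linear scheduling vector Π assigns each point g of the group space the time step t = Π · g. The sketch below evaluates that dot product for a two-dimensional space and counts the steps needed for a rectangular space whose coordinates start at zero. The function names and the rectangular-space assumption are illustrative only; the two vectors used in the usage note are the ones shown on the grouping slides.

```c
/* Time step of point (g0, g1) under the linear schedule (pi0, pi1). */
int time_step(int pi0, int pi1, int g0, int g1) {
    return pi0 * g0 + pi1 * g1;
}

/* Number of time steps for a rectangular space with coordinates
 * 0..n0-1 and 0..n1-1, assuming both components of the schedule are
 * non-negative so the last point executed is (n0-1, n1-1). */
int schedule_length(int pi0, int pi1, int n0, int n1) {
    return time_step(pi0, pi1, n0 - 1, n1 - 1) + 1;
}
```

For a hypothetical 4 x 3 group space, `schedule_length(1, 0, 4, 3)` gives 4 steps and `schedule_length(1, 1, 4, 3)` gives 6 steps.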

17 Scheduling onto a Fixed Number of SMPs
- Dynamic Scheduling by the Operating System
  - Run time overhead for generating a lot of processes
  - Context switching slows down the execution
- Static Scheduling at Compile Time

18 Scheduling onto a Fixed Number of SMPs
- Cyclic Assignment Schedule
- Mirror Assignment Schedule
- Cluster Assignment Schedule
- Retiling

19 Cyclic Assignment [figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1]

20 Cyclic Assignment [figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1, chunk]

21 Cyclic Assignment – Non Overlapping Communication [figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

22 Cyclic Assignment - Overlapping Communication [figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

23 Cyclic Assignment - Communication [figure: cyclic assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1, chunk]
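The cyclic assignment can be sketched as a round-robin mapping from tile rows to CPUs. The code below assumes that rows are numbered from zero along the mapping dimension and that a chunk is one full pass over all CPUs, which is how the slide labels read; the `Place` type and the function name are hypothetical.

```c
/* Where a row of tiles runs: which SMP node and which CPU inside it. */
typedef struct { int node; int cpu; } Place;

/* Cyclic assignment: rows are dealt out round-robin over all CPUs,
 * filling the CPUs of node 0 first, then node 1, and so on, and
 * starting again from node 0 with every new chunk. */
Place cyclic_place(int row, int nodes, int cpus_per_node) {
    int slot = row % (nodes * cpus_per_node);   /* position inside the chunk */
    Place p = { slot / cpus_per_node, slot % cpus_per_node };
    return p;
}
```

With 2 nodes of 2 CPUs each, rows 0 to 3 go to SMP0/CPU0, SMP0/CPU1, SMP1/CPU0, SMP1/CPU1, and row 4 starts the next chunk on SMP0/CPU0 again.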

25 Mirror Assignment [figure: mirror assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, SMP1, SMP0, CPU0, CPU1, chunk]

26 Mirror Assignment – Non Overlapping Communication [figure: mirror assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

27 Mirror Assignment - Overlapping Communication [figure: mirror assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

28 Mirror Assignment - Communication [figure: mirror assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, SMP1, SMP0, CPU0, CPU1]
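The mirror assignment can be sketched in the same style as a round-robin mapping whose node order flips in every other chunk. This is one reading of the figure labels (SMP0, SMP1, SMP1, SMP0), not a definition taken from the slides: the sketch assumes that only the order of the nodes is reversed and that the CPU order inside a node stays the same. The `Place` type and the function name are hypothetical.

```c
/* Where a row of tiles runs: which SMP node and which CPU inside it. */
typedef struct { int node; int cpu; } Place;

/* Mirror assignment (assumed reading): even chunks visit the nodes in
 * ascending order, odd chunks in descending order, so the rows on both
 * sides of a chunk boundary land on the same node. */
Place mirror_place(int row, int nodes, int cpus_per_node) {
    int chunk_size = nodes * cpus_per_node;
    int chunk = row / chunk_size;
    int slot = row % chunk_size;
    int node = slot / cpus_per_node;
    if (chunk % 2 == 1)
        node = nodes - 1 - node;   /* reversed node order in odd chunks */
    Place p = { node, slot % cpus_per_node };
    return p;
}
```

With 2 nodes of 2 CPUs each, rows 0 to 3 run on SMP0, SMP0, SMP1, SMP1 and rows 4 to 7 on SMP1, SMP1, SMP0, SMP0, so the exchange between rows 3 and 4 stays inside SMP1.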

30 Cluster Assignment [figure: cluster assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1, tiles, “TILE”]

31 Cluster Assignment [figure: cluster assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1, TILES, GROUPS]

32 Cluster Assignment – Non Overlapping Communication [figure: cluster assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

33 Cluster Assignment – Overlapping Communication [figure: cluster assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

34 Cluster Assignment - Communication [figure: cluster assignment on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1, TILES, GROUPS]
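A block-style mapping is one way to picture the cluster assignment: neighbouring rows of tiles are kept together, and each CPU receives one contiguous band. The sketch below is written under that assumption; the band size (rows divided by CPUs, rounded up), the `Place` type and the function name are hypothetical and not taken from the slides.

```c
/* Where a row of tiles runs: which SMP node and which CPU inside it. */
typedef struct { int node; int cpu; } Place;

/* Cluster assignment (assumed reading): the rows are cut into one
 * contiguous band per CPU, and bands are handed out in CPU order. */
Place cluster_place(int row, int rows, int nodes, int cpus_per_node) {
    int cpus = nodes * cpus_per_node;
    int band = (rows + cpus - 1) / cpus;   /* rows per CPU, rounded up */
    int slot = row / band;                 /* index of the owning CPU */
    Place p = { slot / cpus_per_node, slot % cpus_per_node };
    return p;
}
```

With 8 rows on 2 nodes of 2 CPUs each, rows 0 and 1 run on SMP0/CPU0, rows 2 and 3 on SMP0/CPU1, rows 4 and 5 on SMP1/CPU0, and rows 6 and 7 on SMP1/CPU1, so only band boundaries need communication.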

36 Retiling [figure: retiling on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1, old tiles, new tiles]

37 Retiling [figure: retiling on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1, old tiles, new tiles] Retaining computation volume of a tile

38 Retiling – Non Overlapping Communication [figure: retiling on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

39 Retiling – Overlapping Communication [figure: retiling on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1; time axis t]

40 Retiling - Communication [figure: retiling on 2 SMP nodes with 2 CPUs each; labels: SMP0, SMP1, CPU0, CPU1]
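The retiling slides say that the new tiles retain the computation volume of an old tile. A possible two-dimensional reading, sketched below, stretches the tile along the mapping dimension until there is exactly one row of tiles per CPU and shrinks it along the other dimension to keep the volume. The one-row-per-CPU rule, the `TileShape` type, the function name and the divisibility checks are all assumptions made for this example.

```c
/* Tile edge lengths: along the mapping dimension and along the other one. */
typedef struct { int mapped; int other; } TileShape;

/* Retiling (assumed reading): choose the new edge along the mapping
 * dimension so that `extent` splits into exactly `cpus` rows of tiles,
 * then pick the other edge so the tile volume stays the same.
 * Returns 0 on success, -1 when the sizes do not divide evenly. */
int retile(int extent, int cpus, TileShape old, TileShape *out) {
    int volume = old.mapped * old.other;
    if (cpus <= 0 || extent % cpus != 0)
        return -1;
    int mapped = extent / cpus;
    if (mapped == 0 || volume % mapped != 0)
        return -1;
    out->mapped = mapped;
    out->other = volume / mapped;
    return 0;
}
```

For example, an extent of 16 on 4 CPUs with old tiles of 2 x 8 yields new tiles of 4 x 4, which have the same volume of 16 iterations.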

42 Experimental Platform
- Linux SMP (Symmetric Multi-Processors) Cluster
- 2 nodes
- 1GB RAM
- 2 Pentium III 1266MHz
- Myrinet high performance interconnect
- GM low level message passing system

43 The Myrinet interconnect
- User-level Networking
- Based on the GM message passing interface
- All message exchange using DMA
- Directly to/from pinned userspace buffers
- Communication is offloaded to the NIC
- Programmable NIC
- LANai RISC processor @ 133-333MHz
- 2-8MB SRAM
- 2+2Gbps full duplex fiber links

44 GM Architecture
- Comprised of three main parts
- User library
- Kernel driver
- Firmware on NIC
- OS bypass design
- Regions of NIC memory mapped to the VM of a process
[figure; labels: Application, GM Library (User), GM kernel module (Kernel), GM firmware (NIC)]

45 Sending and Receiving messages over Myrinet/GM [figure; two panels: Sending application and Receiving application; labels in each: Host, NIC, Send q or Recv q, Send DMA, Recv DMA, Host DMA, LANai, Buffer, Event q]

46 Initial Code
    for (i = 1; i <= X; i++)
      for (j = 1; j <= Y; j++)
        for (k = 1; k <= Z; k++) {
          A[i][j][k] = func(A[i-1][j][k], A[i][j-1][k], A[i][j][k-1]);
        }

47 Experimental results [two plots, Non Overlapping Execution Scheme and Overlapping Execution Scheme; x axis: Height of Iteration Space; y axis: Speedup / # processors; curves: cyclic, mirror, cluster, retile]
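The vertical axis of the plots, speedup divided by the number of processors, is the usual parallel efficiency. A minimal sketch of that ratio follows; the function and parameter names are hypothetical, and the timings in the usage note are made-up inputs, not measurements from the slides.

```c
/* Parallel efficiency: speedup (sequential time over parallel time)
 * divided by the number of processors used. A value of 1.0 would mean
 * every processor is fully used. */
double efficiency(double t_seq, double t_par, int processors) {
    return (t_seq / t_par) / processors;
}
```

For instance, `efficiency(100.0, 30.0, 4)` is about 0.83: a speedup of roughly 3.3 on 4 processors.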

48 Simulation results [two plots, Overlapping Execution Scheme and Non Overlapping Execution Scheme; x axis: Height of Iteration Space; y axis: Speedup / # processors; curves: cyclic, mirror, cluster, retile]

49 Simulation results [two plots, Non Overlapping Execution Scheme and Overlapping Execution Scheme; x axis: Height of Iteration Space; y axis: Speedup / # processors; curves: cyclic, mirror, cluster, retile]

50 Advantages - Disadvantages
cyclic
  + fast pipeline filling
  - communication
mirror
  + better communication than cyclic
  - idle time steps
  - worse communication than cluster, retile
cluster
  + communication: 1) little volume of data to be transferred 2) data combined in fewer messages
  - slow pipeline filling
retile
  + fast pipeline filling
  + communication: little volume of data to be transferred
  - reorganizes tiles, which annuls the optimal tile shape for cache hits

51 The End

52 Cyclic Assignment - Overlapping Communication [figure; labels: SMP0, SMP1, CPU0, CPU1; two panels with axes P and t, marked as equivalent schedulings: scheduling on a fixed number of processors (annotation: empty pipeline waiting for the necessary data to become available) and scheduling on an unlimited number of processors]

