PERFORMANCE EVALUATION OF LARGE RECONFIGURABLE INTERCONNECTS FOR MULTIPROCESSOR SYSTEMS Wim Heirman, Iñigo Artundo, Joni Dambre, Christof Debaes, Pham.


1 PERFORMANCE EVALUATION OF LARGE RECONFIGURABLE INTERCONNECTS FOR MULTIPROCESSOR SYSTEMS Wim Heirman, Iñigo Artundo, Joni Dambre, Christof Debaes, Pham Doan Tinh, Bui Viet Khoi, Hugo Thienpont, Jan Van Campenhout ISEE, HoChiMinh City, 24 October 2007

2 Abstract
The interconnection network inside a multiprocessor system has a very irregular load. Can we adapt this network to its (time-varying) demands? Yes, using (optical) reconfiguration technology. We show the resulting network speedup, obtained through system-level simulations.

3 Outline
– Introduction to multiprocessor systems (DSM) and interconnection networks
– Reconfigurable interconnects and optical networks
– Simulation results on performance improvement
– Conclusions

4 Multiprocessor interconnects
Multiprocessing is everywhere:
– supercomputers
– servers
– on-chip (multi-core)
Processors need to communicate to solve a single problem.
The interconnection network becomes a main system component.
Our focus: distributed shared-memory (DSM) servers.
[Figure: supercomputer, on-chip, server]

5 Architecture of a distributed shared-memory system
A DSM machine is made of:
– Nodes, each composed of:
  – the processing unit
  – some levels of cache memory
  – the local memory
  – a network interface
– An interconnection network

6 Distributed shared-memory hierarchy
– The network is part of the memory hierarchy.
– Remote memory access requires network communication.
– Network latency strongly influences performance.
Access times: instruction 0.5 ns; cache 5 ns; DDR 50 ns; network 500 ns.
[Figure: grid of nodes, each with CPU, MEM and NetIF]
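To see why the 500 ns network figure matters, the latencies on this slide can be combined into a weighted average access time. The sketch below is only an illustration: the access mix (95% cache, 4% local memory, 1% remote) is a hypothetical assumption and does not come from the talk.

```python
# Latencies quoted on the slide, in nanoseconds.
LATENCY_NS = {"cache": 5.0, "local_memory": 50.0, "remote_memory": 500.0}


def average_access_time(fractions, latency_ns=LATENCY_NS):
    """Weighted mean access time.

    `fractions` maps each level to the share of accesses it serves;
    the shares must sum to 1.
    """
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(share * latency_ns[level] for level, share in fractions.items())


# Hypothetical access mix (an assumption, not a measurement from the talk).
mix = {"cache": 0.95, "local_memory": 0.04, "remote_memory": 0.01}
avg = average_access_time(mix)

# Share of the average access time that is spent on remote accesses.
remote_share = mix["remote_memory"] * LATENCY_NS["remote_memory"] / avg
```

With this assumed mix, remote accesses are 1% of all accesses but account for more than 40% of the average access time, which is the sense in which network latency dominates.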

7 Outline
– Introduction to multiprocessor systems (DSM) and interconnection networks
– Reconfigurable interconnects and optical networks
– Simulation results on performance improvement
– Conclusions

8 Variable communication patterns
Network traffic is non-uniform in space and time => reconfigurable network?
[Figure: grid of nodes (CPU, MEM, NetIF) with load-versus-time plots for links #5, #9 and #13]

9 Proposed topology reconfiguration
– Base network (fixed)
– Extra links, or elinks (reconfigurable)
[Figure: grid of CPU/MEM nodes]

10 Reconfiguration intervals
– Reconfiguration is initiated at reconfiguration points placed at fixed time intervals.
– The topology is optimized for the traffic in the previous interval.
Requirements: selection and switching times << reconfiguration interval << traffic pattern locality
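The interval scheme on this slide can be sketched as follows: traffic is counted per interval, and at each reconfiguration point the extra links for the next interval are chosen from the counts of the interval that just ended. The slide does not state the selection rule, so the greedy "busiest pairs first" choice here is an assumption, and the names `select_elinks` and `plan_elinks` are hypothetical.

```python
from collections import Counter


def select_elinks(traffic, n_elinks):
    """Pick the n_elinks busiest (src, dst) pairs (assumed greedy rule)."""
    ranked = sorted(traffic.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:n_elinks]]


def plan_elinks(packets, interval, n_elinks):
    """Return the elink set used in each interval.

    packets: iterable of (time, src, dst) tuples with time >= 0.
    The plan for interval k is based only on traffic seen in interval k - 1,
    so the first interval starts with no extra links.
    """
    per_interval = {}
    for time, src, dst in packets:
        index = int(time // interval)
        per_interval.setdefault(index, Counter())[(src, dst)] += 1
    if not per_interval:
        return []

    plans = []
    previous = Counter()
    for k in range(max(per_interval) + 1):
        plans.append(select_elinks(previous, n_elinks))
        previous = per_interval.get(k, Counter())
    return plans
```

Because each plan looks one interval back, the approach only pays off when the traffic pattern stays similar for longer than one interval, which is the locality requirement stated on the slide.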

11 Optical interconnects
Optical advantages:
– Low-loss transmission
– Capable of providing large bandwidths
– Almost no crosstalk between channels
– High area density
– Data-transparent reconfigurability
Electrical problems (at high frequency):
– Crosstalk
– Signal distortion
– High power consumption
– High latency (RC delay)
Source: Bhanu Jaiswal, University at Buffalo

12 Optical reconfiguration implementation
Based on wavelength-division multiplexing (WDM). Components:
– tunable laser (VCSEL) per node
– broadcast element
– wavelength-selective receiver per node
For each source node, the elink destination is selected by tuning the laser to the proper wavelength.
[Figure: processor nodes CPU 1 ... CPU n with tunable lasers, fiber links, a broadcast element, and photodetectors at CPU 1 ... CPU n]

13 Selective optical broadcasting
Full broadcast is not realistic:
– Too much power is wasted
– Limited number of available wavelengths (trade-off with cost, tuning speed)
Selective broadcast: each node can reach a subset of other nodes
– not all ‘extra links’ are possible
– some high-traffic paths can have intermediate nodes
– only 1 extra link per node
[Figure: grid of CPU/MEM nodes; 3 possible destinations for the top node, 1 is selected by tuning the node’s transmitter]
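Under selective broadcast, each source has a fixed set of reachable destinations and can drive only one extra link at a time. A minimal sketch of that per-node choice is shown below, assuming the node simply picks its busiest reachable destination; the rule, the `reachable` table and the function name are illustrative assumptions, not the selection method used in the talk.

```python
def choose_destination(src, reachable, traffic):
    """Return the reachable destination that src sends the most traffic to.

    reachable: dict mapping a source node to the destinations its broadcast
               can reach (fixed by the optics; hypothetical values here).
    traffic:   dict mapping (src, dst) to a traffic count.
    Returns None when src has no traffic to any reachable destination.
    Ties are broken in favour of the lowest destination number.
    """
    candidates = [d for d in reachable.get(src, ()) if traffic.get((src, d), 0) > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda d: (traffic[(src, d)], -d))


# Hypothetical example: node 0 can reach nodes 1, 4 and 5 only.
reachable = {0: [1, 4, 5]}
traffic = {(0, 1): 3, (0, 5): 7, (0, 8): 20}
best = choose_destination(0, reachable, traffic)
```

In this example the heaviest flow (to node 8) cannot get an extra link because node 8 is outside the reachable set, so the elink goes to node 5 instead. This is the "not all extra links are possible" limitation from the slide.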

14 Selective optical broadcasting
– Using diffractive optics, light from each node is broadcast to 9 spots.
– Node placement on the prism determines the possible elink destinations.

15 Outline
– Introduction to multiprocessor systems (DSM) and interconnection networks
– Reconfigurable interconnects and optical networks
– Simulation results on performance improvement
– Conclusions

16 Simulation environment
Virtutech SIMICS full-system simulator
SunFire™ 6800 server
– 16 processors at 1 GHz
– 2-level cache system and 0.5 GB main memory
– 2 ns, 19 ns and 100+ ns access time to caches and main memory (local and remote)
– 4x4 torus interconnection network
– Solaris 9.0 operating system
Simulated components:
– Complete NUMA memory system (cache system, directory protocol and allocator)
– Detailed network implementation with different topologies and extra links
– Real-time reconfiguration with prediction models and physical limitations
Benchmark applications: SPLASH-2 (scientific parallel algorithms)
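For reference, the hop distance in a torus such as the 4x4 base network can be computed as below. This is a generic sketch, not code from the simulator; the row-major node numbering and the function names are assumptions.

```python
def torus_hops(a, b, width=4, height=4):
    """Minimum hop count between nodes a and b in a width x height torus.

    Nodes are assumed to be numbered row by row, starting at 0.
    Wrap-around links let a packet travel either way around each ring.
    """
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    dx = abs(ax - bx)
    dy = abs(ay - by)
    return min(dx, width - dx) + min(dy, height - dy)


def mean_hops(width=4, height=4):
    """Average hop distance over all ordered pairs of distinct nodes."""
    n = width * height
    total = sum(
        torus_hops(a, b, width, height)
        for a in range(n)
        for b in range(n)
        if a != b
    )
    return total / (n * (n - 1))
```

For the 4x4 case `mean_hops()` evaluates to 32/15, a little over 2 hops, and the value grows with the torus size. An extra link that connects two frequently communicating nodes directly replaces such a multi-hop path with a single hop.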

17 Average benchmark performance
– Measure the average remote memory access latency
– Calculate the improvement over the non-reconfigurable case
– Averaged over all benchmark applications
– Increasing # elinks: performance increases
– Larger network: more gain (longer hop distance)
– Saturation occurs at # elinks = # nodes
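The metric described on this slide can be written down as a relative latency reduction, averaged over applications. The exact formula is not given in the slides, so the definition below is an assumption, and the numbers in `example` are placeholders for illustration only, not results from the talk.

```python
def latency_improvement(baseline_ns, reconfigured_ns):
    """Relative latency reduction versus the fixed network (0.25 means 25% lower)."""
    if baseline_ns <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline_ns - reconfigured_ns) / baseline_ns


def mean_improvement(results):
    """Average the per-application improvement.

    results maps an application name to a pair
    (baseline latency, latency with reconfiguration).
    """
    gains = [latency_improvement(base, reconf) for base, reconf in results.values()]
    return sum(gains) / len(gains)


# Placeholder latencies in ns (assumed values, not measurements).
example = {"app_a": (500.0, 400.0), "app_b": (800.0, 600.0)}
average_gain = mean_improvement(example)
```

Averaging the per-application ratios, rather than the raw latencies, keeps one slow benchmark from dominating the result.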

18 Average benchmark performance (II)
– f: maximum number of extra links terminating at one node (fan-out)
– Performance increases from f = 1 to f = 2, with saturation afterwards
– Further results will be with:
  – f = 2, # elinks = # nodes
  – prism implementation (f = 1, # elinks = # nodes, 9 destinations limitation)
[Figure axis: network size (# nodes)]
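The effect of the fan-out limit f can be illustrated with a greedy selection that skips any pair whose endpoints already carry f extra links. This is a sketch under assumptions: the greedy ordering is not taken from the talk, and "terminating at one node" is read here as counting both endpoints of an elink.

```python
from collections import Counter


def select_with_fan_limit(traffic, n_elinks, max_fan):
    """Choose up to n_elinks busiest (src, dst) pairs, at most max_fan per node."""
    used = Counter()  # number of elinks already attached to each node
    chosen = []
    ranked = sorted(traffic.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, dst), _ in ranked:
        if len(chosen) == n_elinks:
            break
        if used[src] < max_fan and used[dst] < max_fan:
            chosen.append((src, dst))
            used[src] += 1
            used[dst] += 1
    return chosen


# Hypothetical traffic counts with one very busy node (node 0).
traffic = {(0, 1): 10, (0, 2): 9, (0, 3): 8, (4, 5): 1}
with_f1 = select_with_fan_limit(traffic, 3, max_fan=1)
with_f2 = select_with_fan_limit(traffic, 3, max_fan=2)
```

With f = 1 the busy node gets a single extra link and the remaining elinks go to lighter flows or stay unused; raising f to 2 lets a second heavy flow be served.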

19 Network latency improvement
– Changing the reconfiguration interval length
– Simulations for # elinks = # nodes, f = 2
– Different benchmark: different benefit!
– Remember: tuning speed << interval << traffic locality
[Figure panels: 16 nodes, 32 nodes, 64 nodes]
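The constraint "tuning speed << interval << traffic locality" can be turned into a quick sanity check when picking an interval length. The sketch below reads "<<" as "at least `margin` times smaller"; the default margin of 10 and both function names are assumptions made for illustration.

```python
def switching_overhead(switch_time, interval):
    """Fraction of each interval spent retuning, assuming elinks are unusable meanwhile."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return switch_time / interval


def interval_is_reasonable(switch_time, interval, locality_time, margin=10.0):
    """Check switch_time << interval << locality_time.

    All three arguments must use the same time unit.
    """
    return switch_time * margin <= interval and interval * margin <= locality_time
```

For example, with times expressed in microseconds, `interval_is_reasonable(1, 100, 10000)` holds, while shrinking the interval to 5 violates the first inequality because retuning would then take a fifth of every interval.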

20 Network latency improvement (II)
– selective broadcast: f = 1, only 9 destinations
– full broadcast: f = 2, no limitations

21 Outline
– Introduction to multiprocessor systems (DSM) and interconnection networks
– Reconfigurable interconnects and optical networks
– Simulation results on performance improvement
– Conclusions

22 Conclusions
– The interconnection network presents a significant bottleneck in large multiprocessor systems.
– Reconfigurable interconnects can adapt the network to the traffic at any point in time.
– An optical implementation has been proposed.
– Through simulation, we measured the resulting speedup: a latency reduction of up to 40% can be achieved.
– Obtained speedups depend on the application, network size, and the reconfigurable network constraints.

23 Acknowledgements
wim.heirman@ugent.be
Thank you for your attention!

24 Inter-node distances

