Handling Global Traffic in Future CMP NoCs. Ran Manevich, Israel Cidon, and Avinoam Kolodny. QNoC Research Group, Electrical Engineering Department, Technion.


1 Handling Global Traffic in Future CMP NoCs. Ran Manevich, Israel Cidon, and Avinoam Kolodny. QNoC Research Group, Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel. SLIP 2012

2 Bandwidth Version of Rent’s Rule
B = k · G^R
B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0<R<1.
(Figure: example cluster with G = 16.)
Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
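
The relation on this slide can be tried out numerically. The sketch below assumes the usual power-law form of Rent's rule, B = k · G^R, using the symbols defined on the slide; the function name `rent_bandwidth` and the sample values of k and R are illustrative assumptions, not taken from the talk.

```python
def rent_bandwidth(k: float, g: int, r: float) -> float:
    """External bandwidth B of a cluster of g modules, assuming B = k * g**r."""
    if not 0.0 < r < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    return k * g ** r


# Hypothetical comparison: the same 16-module cluster under a lower and a
# higher Rent's exponent (a higher R means less local traffic).
for r in (0.5, 0.7):
    print(f"R = {r}: B = {rent_bandwidth(k=1.0, g=16, r=r):.2f}")
```

A larger R makes the external bandwidth of a cluster grow faster with its size, which is why R is read as a measure of traffic locality.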

3 Rent’s Exponent Reflects Traffic Locality

4 CMP NoC Traffic Follows Rent’s Rule
(Figure labels: 2D Mesh NoC; ~Average of CMP parallel programs *)
* Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008

5 2D Mesh – Packets Classification by Distance
For illustration purposes, packets are classified according to the distance between source and destination in a K×K mesh:
Nearest Neighbor (NN) – Dist = 1
Local – 1 < Dist < 2+K/8
Global – Dist ≥ 2+K/8
(Figure panels: K=8, K=16)
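
The three classes can be written as a small function. This is a minimal sketch that assumes "Dist" means the Manhattan (hop) distance between source and destination routers, which the slide does not state explicitly; `classify_packet` and the coordinate convention are hypothetical names introduced for illustration.

```python
def classify_packet(src: tuple, dst: tuple, k: int) -> str:
    """Classify a packet in a k x k mesh as 'NN', 'Local' or 'Global'.

    Assumes the distance is the Manhattan hop count between the routers.
    """
    dist = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    if dist == 0:
        raise ValueError("source and destination must differ")
    global_threshold = 2 + k / 8  # Global if Dist >= 2 + K/8
    if dist == 1:
        return "NN"
    if dist < global_threshold:
        return "Local"
    return "Global"


print(classify_packet((0, 0), (1, 2), k=8))   # distance 3, threshold 3
print(classify_packet((0, 0), (1, 2), k=16))  # distance 3, threshold 4
```

With K = 8 the global threshold is 3 hops, so only 2-hop packets are Local; with K = 16 it rises to 4 hops.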

6 Fraction of global packets decreases in large systems
(Figure labels: Rent’s exponent (R) = 0.7; Nearest Neighbor)

7 Dominance of Global Packets in BW/Router and Light Load Latency
Nearest Neighbor traffic is dominant in small systems.
In large systems:
1. Global packets are a minority.
2. Global packets dominate BW/router and average latency.
* Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010

8 Problem!!!
In large systems, global packets (a minority):
- Consume most of the network’s BW.
- Significantly increase average light load latency.

9 Solution - PyraMesh
Hierarchical 2D mesh. Global packets are routed through higher hierarchy levels.
- Overall hops-count is reduced.
- Average latency is reduced.
- Average BW per router is reduced.
(Figure: route from Source to Dest. through the upper level, 8 hops instead of 14!)

10 PyraMesh - Architecture
K – The size of the base mesh.
NL – Number of levels.
NP – Number of pyramids on top of the base mesh.
α_i – Ratio between the sizes of levels i and i+1.
C_i – Number of routers in level i that are connected to a router in level i+1 along a single dimension.
Example configurations (figure labels):
- K = 8, NL = 2, NP = 1, α_i = 4, C_i = 2
- K = 8, NL = 3, NP = 1, α_i = 2, C_i = 1
- K = 8, NL = 2, NP = 4, α_i = 4, C_i = 1

11 PyraMesh – Addressing and Routing
Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter.
Routing – XY.
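
For readers unfamiliar with XY (dimension-order) routing, the sketch below shows it on a single flat mesh level: a packet first travels along X until the column matches, then along Y. How PyraMesh moves packets between levels is not shown here, and `xy_route` is a hypothetical helper, not code from the talk.

```python
def xy_route(src: tuple, dst: tuple) -> list:
    """Return the routers visited by XY routing from src to dst (inclusive)."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] >= x else -1
    while x != dst[0]:          # resolve the X dimension first
        x += step
        path.append((x, y))
    step = 1 if dst[1] >= y else -1
    while y != dst[1]:          # then resolve the Y dimension
        y += step
        path.append((x, y))
    return path


route = xy_route((0, 0), (7, 7))
print(len(route) - 1)  # hop count of a corner-to-corner route in an 8x8 mesh: 14
```

The 14 hops obtained for a corner-to-corner route in an 8×8 mesh are consistent with the flat-mesh hop count quoted on slide 9, assuming that example is also corner to corner.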

12 PyraMesh – Packets Classification
Packets are distributed among levels i according to their travel distance (D) in the base mesh.
DTh_i – Distance threshold of level i. If D > DTh_i, the packet is directed to level i+1.
Example: DTh_i = 6, 12, 20

Highest Level     Travel Distance
4                 D>20
3                 12<D≤20
2                 6<D≤12
1 (Base Mesh)     D≤6
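
The threshold rule can be expressed as a short lookup. The sketch below uses the example thresholds 6, 12, 20 from the slide; the function name `highest_level` and the use of `bisect` are illustrative choices rather than anything specified in the talk.

```python
from bisect import bisect_left


def highest_level(distance: int, thresholds=(6, 12, 20)) -> int:
    """Highest level a packet with base-mesh travel distance D is sent to.

    thresholds[i-1] is DTh_i; a packet climbs from level i to level i+1
    whenever D > DTh_i. Level 1 is the base mesh.
    """
    # bisect_left counts the thresholds strictly smaller than `distance`,
    # i.e. the number of levels the packet climbs above the base mesh.
    return bisect_left(thresholds, distance) + 1


for d in (6, 7, 12, 20, 21):
    print(d, highest_level(d))
```

Because the comparison is strict (D > DTh_i), a packet whose distance equals a threshold stays on the lower level, which matches the "≤" bounds in the slide's table.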

13 PyraMesh – Optimization
(Area overhead, Wiring overhead, Maximum bandwidth per router*, Average light-load latency*) = F(K, NL, NP, α_i, C_i, DTh_i*, R*)
(Figure labels: CONSTRAINTS; OPTIMIZATION OBJECTIVES)

14 Optimization Results
Example of 16x16 System, R = 0.7
Configurations shown: Throughput optimized PyraMesh; Light load latency optimized PyraMesh.
Packets distance thresholds (figure labels): D≤5, 5<D≤8, D>8 and D≤6, 6<D≤18, D>18

15 Light Load Latency Performance
BMesh – The baseline mesh.
Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.
HNoC –

16 Throughput Results, R = 0.7

17 Our Contributions
- The observation that global packets limit scalability of large systems.
- PyraMesh – A novel framework for hierarchical NoCs design.
- Characterization of Rentian traffic in large NoCs.

18 Conclusions
- Global packets limit performance in large (future) CMP systems.
- PyraMesh – A novel class of hierarchical 2D mesh topologies.
- PyraMesh handles global traffic in future CMP NoCs.

19 Thank You!

20 Related Work
CMesh: J. D. Balfour and W. J. Dally. “Design tradeoffs for tiled CMP on-chip networks”. International Conference on Supercomputing, 2006.
GigaNoC: C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors”. Euromicro DSD 2007.
Long Range Links: U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst. 2006.
Hierarchical Rings on a Mesh: S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP 2007.
Hierarchical 2-Levels 2D Mesh: M. Winter, S. Prusseit, and G. P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010.

