1 Near-Optimal Oblivious Routing for 3D-Mesh Networks ICCD 2008 Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering Department University.


1 Near-Optimal Oblivious Routing for 3D-Mesh Networks
ICCD 2008
Rohit Sunkam Ramanujam, Bill Lin
Electrical and Computer Engineering Department, University of California, San Diego

2 Motivation: Networks-on-Chip
Chip multiprocessors (CMPs) are increasingly popular
2D-mesh networks are often used as the on-chip fabric
[Figure: die photos of the Tilera Tile64 and the Intel 80-core chip; labels: I/O Area, single tile, 1.5mm, 2.0mm, 21.72mm, 12.64mm]

3 Motivation: 3D Integrated Circuits
3D benefits
– Reduced wire delays
– Enormous bandwidth
– Heterogeneous system integration
Natural progression
– 3D-mesh for 3D CMPs
[Figure: 2D to 3D]

4 Routing Algorithm Objectives
Maximize throughput
– How much load the network can handle
Minimize hop count
– Minimize routing delay between source and destination

5 Challenges
For the 2D case, a routing algorithm with near-optimal throughput and minimal hop count, called O1TURN, is known [Seo’05].
Surprisingly, the optimality of O1TURN does not extend to the 3D case: its actual throughput degrades severely.
The only known routing algorithm with optimal throughput is Valiant (VAL) load-balancing, but VAL performs poorly on hop count (latency), at twice that of minimal routing.

6 Main Contribution
Developed a new oblivious routing algorithm called “Randomized Partially Minimal” (RPM) routing.
RPM provably guarantees near-optimal worst-case throughput in the 3D case.
– Optimal for even radix k (e.g. 8 x 8 x 8 mesh).
– Within a factor of 1/k^2 of optimal for odd radix (e.g. 7 x 7 x 7 mesh).
Good latency performance.
– Hop count is only 1.33 times that of minimal routing (much better than the 2x cost of VAL, the only known routing algorithm with optimal throughput).
– In practice, 3D meshes are asymmetric because the number of device layers is smaller than the number of tiles per edge.
– E.g., for a 16 x 16 x 4 mesh (4 layers), RPM’s hop count is just 1.1 times that of minimal routing.

7 Outline
Motivation for our work
Existing 2D routing algorithms don’t extend well into 3D
RPM routing algorithm
Simulation results
Extensions and future work

8 Existing Routing Algorithms
The 2D case
Dimension-Ordered Routing (DOR)
– Route minimal XY
Valiant load-balancing (VAL)
– Route source → randomly chosen intermediate node → destination
– Route minimal XY in both phases
ROMM
– Same as VAL, but intermediate node restricted to the minimal direction
Orthogonal 1-TURN (O1TURN)
– Route minimal XY and YX with equal probability
Extending to the 3D case …
Dimension-Ordered Routing (DOR)
– Route minimal XYZ
Valiant load-balancing (VAL)
– Route source → randomly chosen intermediate node → destination
– Route minimal XYZ in both phases
ROMM
– Same as VAL, but intermediate node restricted to the minimal direction
Orthogonal 1-TURN (O1TURN)
– Route along one of 6 minimal orthogonal paths (XYZ, XZY, YXZ, YZX, ZXY, ZYX) with equal probability
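
The 3D variants of DOR, VAL, and O1TURN listed above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names, the coordinate-tuple node representation, and the `dims` argument are assumptions made for the example, and ROMM is omitted.

```python
import random
from itertools import permutations


def ordered_route(src, dst, order):
    """Minimal route that fully corrects one dimension at a time, in `order`."""
    pos = list(src)
    path = [tuple(pos)]
    for dim in order:
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path


def dor_route(src, dst):
    """DOR: always route minimal XYZ."""
    return ordered_route(src, dst, (0, 1, 2))


def val_route(src, dst, dims, rng=random):
    """VAL: minimal XYZ to a random intermediate node, then minimal XYZ to dst."""
    mid = tuple(rng.randrange(d) for d in dims)
    return dor_route(src, mid) + dor_route(mid, dst)[1:]


def o1turn_route(src, dst, rng=random):
    """3D O1TURN: pick one of the 6 dimension orders with equal probability."""
    order = rng.choice(list(permutations((0, 1, 2))))
    return ordered_route(src, dst, order)


if __name__ == "__main__":
    print(dor_route((0, 0, 0), (2, 1, 3)))
    print(len(val_route((0, 0, 0), (2, 1, 3), (4, 4, 4))) - 1, "hops via VAL")
```

DOR and O1TURN always take the minimal number of hops, while VAL's detour through a random node is what doubles its average hop count.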

9 Worst-Case Throughput
The best achievable normalized worst-case throughput is known to be 50% (a well-known result).
Worst-case throughput analysis can be reduced to a maximal weighted matching problem [Towles’02].
VAL achieves this optimal throughput, but has poor latency.
As shown next, DOR, ROMM, and O1TURN are all far from optimal in 3D.
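
For intuition, the throughput a routing algorithm sustains on a given traffic pattern is limited by its most heavily loaded channel. The sketch below counts per-channel load for XYZ dimension-ordered routing under a single permutation pattern. It is not the matching-based worst-case analysis of [Towles’02], which searches over all patterns; the function names and the pattern signature `pattern(node, k)` are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def xyz_path(src, dst):
    """Minimal XYZ dimension-ordered path as a list of nodes."""
    pos = list(src)
    path = [tuple(pos)]
    for dim in range(3):
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path


def channel_loads(k, pattern):
    """Count how many flows cross each directed channel of a k x k x k mesh."""
    loads = Counter()
    for src in product(range(k), repeat=3):
        path = xyz_path(src, pattern(src, k))
        for a, b in zip(path, path[1:]):
            loads[(a, b)] += 1
    return loads


def max_channel_load(k, pattern):
    """Load on the most congested channel (0 if no flow moves)."""
    return max(channel_loads(k, pattern).values(), default=0)


def complement(node, k):
    x, y, z = node
    return (k - x - 1, k - y - 1, k - z - 1)


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, max_channel_load(k, complement))
```

A higher maximum channel load means a lower sustainable injection rate for that pattern; the worst-case analysis looks for the pattern that maximizes this load.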

10 Poor Worst-Case Throughput
Only 6-15%
[Figure: worst-case throughput plot; legend includes VAL/Optimal]

11 How do 2D mesh algorithms fare in 3D?
Worst-case throughput of DOR, ROMM, O1TURN is far from optimal
Average hop count of VAL is far from minimal
Need a routing algorithm that can trade latency for worst-case throughput

8 x 8 x 8 Network                     VAL    DOR    ROMM   O1TURN
Normalized Average-Case Throughput    0.5    0.316  0.454  0.513
Hop Count (normalized to minimal)     2      1      1      1
Normalized Worst-Case Throughput      0.5    0.063  0.132  0.15

12 Why does O1TURN perform poorly in 3D?
O1TURN: worst-case throughput is optimal for 2D, but more than 3 times worse than optimal for 3D
The difference
– 2D traffic matrix is “admissible” for a 2D mesh
– In 3D, the projected traffic on each 2D plane is no longer admissible!
Can we transform the 3D routing problem into routing admissible traffic on each 2D plane?

13 Outline
Motivation for our work
Existing 2D algorithms don’t extend well into 3D
RPM routing algorithm
Simulation results
Extensions and future work

14 Randomized Partially-Minimal Routing (RPM)
Phase-1 Z: source to a random intermediate layer
XY or YX routing on the intermediate layer
Phase-2 Z: intermediate layer to destination
[Figure: 3D mesh with X, Y, Z axes showing the source, the destination, and the random intermediate layer]
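
The three RPM steps on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the intermediate layer is drawn uniformly from all layers, XY and YX are chosen with equal probability, and the names `rpm_route` and `num_layers` are invented for the example. Router-level details such as virtual-channel assignment are not covered.

```python
import random


def _walk(path, dim, target):
    """Extend `path` hop by hop along `dim` until that coordinate equals `target`."""
    pos = list(path[-1])
    step = 1 if target > pos[dim] else -1
    while pos[dim] != target:
        pos[dim] += step
        path.append(tuple(pos))


def rpm_route(src, dst, num_layers, rng=random):
    """Return (path, intermediate_layer) for one RPM-style route."""
    mid_z = rng.randrange(num_layers)      # random intermediate layer
    path = [tuple(src)]
    _walk(path, 2, mid_z)                  # Phase 1: Z to the intermediate layer
    first, second = rng.choice([(0, 1), (1, 0)])  # XY or YX on that layer
    _walk(path, first, dst[first])
    _walk(path, second, dst[second])
    _walk(path, 2, dst[2])                 # Phase 2: Z to the destination
    return path, mid_z


if __name__ == "__main__":
    route, layer = rpm_route((0, 0, 0), (3, 2, 1), num_layers=4)
    print("intermediate layer:", layer)
    print(route)
```

The route is minimal in X and Y but not necessarily in Z, which is where the "partially minimal" in the name comes from.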

15 Main Idea
Load-balance uniformly across the vertical layers
Minimal XY/YX routing used on each layer
Main result: RPM has near-optimal worst-case throughput
– Achieves optimal worst-case throughput when network radix k is even
– Within a factor of 1/k^2 of optimal when k is odd

16 RPM achieves Near-Optimal Worst-Case Throughput (optimal for even radix)
[Figure: worst-case throughput plot; legend includes VAL/Optimal and RPM]

17 Average-Case Throughput
RPM outperforms VAL, DOR, ROMM and O1TURN in average-case throughput on randomly generated traffic.

18 Average Hop Count
Normalized hop count of RPM
– Symmetric meshes: 1.33 times minimal, compared to 2x for VAL
– Asymmetric 16x16x4 mesh: 1.1 times minimal
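
The 1.33 and 1.1 figures can be reproduced with a short calculation, assuming uniformly random source/destination pairs and an intermediate layer chosen uniformly at random (both are assumptions of this sketch, not details stated on the slide). Under those assumptions RPM pays the average Z distance twice, once per phase, while X and Y stay minimal.

```python
from fractions import Fraction
from itertools import product


def mean_distance(k):
    """Mean |a - b| over all ordered coordinate pairs in range(k)."""
    total = sum(abs(a - b) for a, b in product(range(k), repeat=2))
    return Fraction(total, k * k)


def normalized_rpm_hops(kx, ky, kz):
    """Average RPM hop count divided by the average minimal hop count."""
    hx, hy, hz = mean_distance(kx), mean_distance(ky), mean_distance(kz)
    minimal = hx + hy + hz
    rpm = hx + hy + 2 * hz   # Z is traversed in two independent phases
    return rpm / minimal


if __name__ == "__main__":
    print(normalized_rpm_hops(8, 8, 8))            # 4/3
    print(float(normalized_rpm_hops(16, 16, 4)))   # about 1.105
```

In a symmetric mesh the three dimensions contribute equally, so doubling one of them gives 4/3; with only 4 layers the Z share is small, so the penalty drops to about 10%.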

19 Outline
Motivation for our work
Existing 2D routing algorithms don’t extend well into 3D
RPM routing algorithm
Simulation results
Extensions and future work

20 Flit-Level Simulation
Ideal throughput evaluation assumes
– Ideal single-cycle router
– Infinite buffers
– No contention in switches, no flow control
Flit-level simulation
– PopNet network simulator
– 4-stage router pipeline: route computation, VC allocation, switch arbitration, link traversal
– Credit-based flow control
– 8 virtual channels, each 5 flits deep
– Multi-flit packets injected into the network (5 flits/packet)

21 Flit-Level Simulation (cont’d)
Network configurations simulated
– 4 x 4 x 4 mesh
– 8 x 8 x 8 mesh
– 16 x 16 x 4 mesh
Routing algorithms compared: DOR, VAL, ROMM, O1TURN, DUATO, RPM
– DUATO is a minimal adaptive routing algorithm implemented for comparison
Four different traffic traces used
– Transpose traffic: (x,y,z) → (y,z,x)
– Complement traffic: (x,y,z) → (k-x-1, k-y-1, k-z-1)
– Uniform traffic
– Worst-case traffic pattern for DOR (DOR-WC): (x,y,z) → (k-z-1, k-y-1, k-x-1)
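
The three deterministic traffic patterns above map directly to small functions. The sketch below assumes a symmetric mesh with radix `k` in every dimension; the slide does not say how the patterns are adapted to the 16 x 16 x 4 mesh, so that case is not covered. The function names are chosen for this example.

```python
from itertools import product


def transpose(node, k):
    """Transpose traffic: (x, y, z) -> (y, z, x)."""
    x, y, z = node
    return (y, z, x)


def complement(node, k):
    """Complement traffic: (x, y, z) -> (k-x-1, k-y-1, k-z-1)."""
    x, y, z = node
    return (k - x - 1, k - y - 1, k - z - 1)


def dor_worst_case(node, k):
    """DOR-WC traffic: (x, y, z) -> (k-z-1, k-y-1, k-x-1)."""
    x, y, z = node
    return (k - z - 1, k - y - 1, k - x - 1)


def is_permutation(pattern, k):
    """Check that every node of the k x k x k mesh receives exactly one flow."""
    nodes = set(product(range(k), repeat=3))
    return {pattern(n, k) for n in nodes} == nodes


if __name__ == "__main__":
    for pattern in (transpose, complement, dor_worst_case):
        print(pattern.__name__, pattern((1, 2, 3), 8), is_permutation(pattern, 8))
```

All three are permutations: each node sends to exactly one destination and receives from exactly one source.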

22 Uniform Traffic
[Figure: two panels, 8x8x8 Mesh and 16x16x4 Mesh]

23 Transpose Traffic
[Figure: two panels, 8x8x8 Mesh and 16x16x4 Mesh]

24 Complement Traffic
[Figure: two panels, 8x8x8 Mesh and 16x16x4 Mesh]

25 DOR-WC Traffic
[Figure: two panels, 8x8x8 Mesh and 16x16x4 Mesh]

26 To sum it up …
3D IC technology is emerging. Stacking cores in 3 dimensions offers several advantages over 2D placement of cores.
2D minimal mesh routing algorithms have poor worst-case throughput in 3D, and VAL has a high latency penalty.
RPM trades off latency (partially minimal) for better worst-case performance (near-optimal).

27 Thank You
Questions?

