A Novel 3D Layer-Multiplexed On-Chip Network Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering University of California, San Diego.


1 A Novel 3D Layer-Multiplexed On-Chip Network Rohit Sunkam Ramanujam Bill Lin Electrical and Computer Engineering University of California, San Diego

2 Networks-on-Chip Chip-multiprocessors (CMPs) increasingly popular 2D-mesh networks often used as on-chip fabric [Figure: Tilera Tile64 and Intel 80-core chips; labels: I/O Area, single tile, 1.5mm, 2.0mm, 21.72mm, 12.64mm]

3 3D Integrated Circuits ≥ 2 active device layers Reduced chip footprint Reduced wire delays High inter-layer bandwidth Heterogeneous system integration [Figure: Device layer 1 and Device layer 2 connected by a Through Silicon Via; short inter-layer distances]

4 Natural Progression: 3D Mesh for 3D CMPs What routing algorithms to use for 3D mesh networks? [Figure: 2D Mesh and 3D Mesh]

5 Outline Oblivious routing on a 3D mesh Layer-multiplexed 3D architecture Evaluation

6 Oblivious Routing Objectives Maximize throughput – Distribute traffic evenly on network links – Maximize worst-case throughput as traffic is application dependent Minimize hop count – Minimize routing delay between source and destination – Reduce power

7 Routing Algorithms for 3D Mesh Networks Ideal routing algorithm: minimal latency, maximum worst-case throughput. Dimension Ordered Routing (DOR): minimal latency, poor worst-case throughput. Valiant Routing (VAL): optimal worst-case throughput, poor latency. O1TURN Routing: minimal latency, poor worst-case throughput. [Plot: axes are average hop count (normalized to minimal) and worst-case throughput (fraction of network capacity); points: IDEAL, DOR, VAL, O1TURN]

8 Randomized Partially-Minimal Routing (RPM) Phase-1Z: source to the random intermediate layer. XY or YX routing on the intermediate layer. Phase-2Z: intermediate layer to the destination. [Figure: source, destination and random intermediate layer on X, Y, Z axes]
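
The three routing steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rpm_route`, the coordinate convention `(x, y, z)`, and the uniform random choices of intermediate layer and of XY versus YX order are assumptions made for the sketch.

```python
import random

def rpm_route(src, dst, num_layers, rng=random):
    """Return the nodes visited from src to dst under an RPM-style route.

    Assumptions (not from the slides): nodes are (x, y, z) tuples, the
    intermediate layer is drawn uniformly from all layers, and XY or YX
    order is chosen with equal probability.
    """
    mid = rng.randrange(num_layers)                  # random intermediate layer
    order = (0, 1) if rng.random() < 0.5 else (1, 0)  # XY or YX on that layer
    pos = list(src)
    path = [tuple(pos)]

    def walk(axis, target):
        # Move one hop at a time along a single axis until target is reached.
        step = 1 if target > pos[axis] else -1
        while pos[axis] != target:
            pos[axis] += step
            path.append(tuple(pos))

    walk(2, mid)                 # Phase-1Z: source layer to intermediate layer
    for axis in order:           # minimal XY or YX routing on that layer
        walk(axis, dst[axis])
    walk(2, dst[2])              # Phase-2Z: intermediate layer to destination
    return path
```

Calling `rpm_route((0, 0, 0), (3, 2, 1), num_layers=4)` returns one randomly chosen path; the X and Y hops are always minimal, while the Z hops depend on the intermediate layer drawn.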

9 Main Idea Load-balance uniformly across the vertical layers – 2 phases of vertical routing Min XY/YX used on each layer

10 Routing Algorithms for 3D Mesh Networks Randomized Partially Minimal Routing (RPM): near-optimal worst-case throughput, low latency. [Plot: axes are average hop count (normalized to minimal) and worst-case throughput (fraction of network capacity); points: IDEAL, DOR, VAL, O1TURN, RPM; label: 1.1]
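
To see why RPM sits between DOR and VAL on the hop-count axis, the average hop counts can be computed by enumeration. The sketch below assumes uniform random traffic (self-pairs included), a uniformly random intermediate node for VAL, and a uniformly random intermediate layer for RPM; the function names are hypothetical and the numbers it produces are not taken from the slides.

```python
from itertools import product

def mean_distance(k):
    """Mean |a - b| over all ordered pairs (a, b) on a k-node line, a == b included."""
    return sum(abs(a - b) for a, b in product(range(k), repeat=2)) / (k * k)

def average_hops(kx, ky, kz):
    """Average hop count per algorithm on a kx x ky x kz mesh, uniform traffic."""
    dx, dy, dz = mean_distance(kx), mean_distance(ky), mean_distance(kz)
    minimal = dx + dy + dz
    return {
        "DOR": minimal,             # always takes a minimal path
        "VAL": 2 * minimal,         # two minimal phases via a random node
        "RPM": dx + dy + 2 * dz,    # minimal in X and Y, two random phases in Z
    }

def normalized(hops):
    """Hop counts divided by the minimal (DOR) hop count."""
    return {name: value / hops["DOR"] for name, value in hops.items()}
```

Under these assumptions the extra cost of RPM comes only from the Z dimension, so it shrinks as the mesh gets wider relative to the number of layers.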

11 RPM has Near-optimal Worst-case Throughput RPM is optimal for even radix, within 1/k² of optimal for odd radix.

12 Performance of RPM: Average-case Throughput

13 Outline Oblivious routing on a 3D mesh Layer-multiplexed (LM) 3D architecture Evaluation

14 Unique Features of 3D ICs Inter-layer distances are very small (~50 μm) – Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm) – Vertical interconnects implemented using Through-Silicon-Vias (TSVs) have very low delay [Figure labels: 50 μm, 1500 μm, TSV]

15 Unique Features of 3D ICs Inter-layer distances are very small (~50 μm) – Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm) – Vertical wires using Through-Silicon-Vias (TSVs) have very low delay Vertical bandwidth abundant as TSVs can be densely packed in 2D with small via pitch (~4 μm) [Figure label: 4 μm]

16 Unique Features of 3D ICs Inter-layer distances are very small (~50 μm) – Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm) – Vertical wires using Through-Silicon-Vias (TSVs) have very low delay Vertical wiring abundant as TSVs can be packed in 2D with small via pitch (~4 μm) Number of device layers likely to remain small (4-5 layers) due to thermal and manufacturing issues

17 RPM on a 3D Mesh Phase-1Z: source to the random intermediate layer. XY or YX routing on the intermediate layer. Phase-2Z: intermediate layer to the destination. [Figure: source, destination and random intermediate layer on X, Y, Z axes]

18 Proposed Layer-Multiplexed Architecture RPM routing adapted to the LM architecture: RPM-LM. Phase-1Z: source to the random intermediate layer. XY or YX routing on the intermediate layer. Phase-2Z: intermediate layer to the destination. [Figure: source, destination and random intermediate layer on X, Y, Z axes; processors P1, P2, P3, P4 shown at injection and at ejection]

19 Power and Area Savings Decouple vertical routing from horizontal routing Restrict vertical routing to packet injection and packet ejection 5x5 crossbar in LM vs. 7x7 crossbar in 3D mesh [Figure: Layer-Multiplexed Architecture (P1-P4 with packet injection demultiplexer and packet ejection multiplexer) beside Conventional 3D Mesh (P1-P4)]
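
A rough feel for the crossbar saving comes from counting crosspoints. The sketch below uses the common first-order model that a full crossbar with n input and n output ports has n × n crosspoints; this is an assumption for illustration only, not the Orion 2.0 model used in the evaluation, and it ignores the added demultiplexers and multiplexers.

```python
def crosspoints(ports):
    """Crosspoints in a full ports x ports crossbar (first-order model)."""
    return ports * ports

mesh_crossbar = crosspoints(7)   # 3D mesh router: 7x7 crossbar
lm_crossbar = crosspoints(5)     # layer-multiplexed router: 5x5 crossbar

# Fraction of crosspoints removed by going from 7x7 to 5x5.
crosspoint_reduction = 1 - lm_crossbar / mesh_crossbar
```

The whole-router power and area savings reported later in the deck are smaller than this crossbar-only figure, which is expected since buffers, arbiters and the new injection and ejection logic are not part of this count.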

20 Single Hop Vertical Communication Single hop vertical routing more power efficient than one-layer-per-hop routing – Leverages short inter-layer distances in 3D ICs – Better utilizes available vertical bandwidth
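
The difference between the two vertical routing styles can be counted directly. In this sketch, which assumes layers are numbered 0 to L-1 and that the LM architecture reaches any layer in one vertical transfer at injection and one at ejection, the function names are hypothetical.

```python
def vertical_hops_mesh(src_layer, mid_layer, dst_layer):
    """Router-to-router Z hops in a conventional 3D mesh (one layer per hop)."""
    return abs(src_layer - mid_layer) + abs(mid_layer - dst_layer)

def vertical_transfers_lm(src_layer, mid_layer, dst_layer):
    """Vertical transfers in the LM architecture: at most one at injection
    and one at ejection, regardless of how many layers are crossed."""
    return int(src_layer != mid_layer) + int(mid_layer != dst_layer)
```

For a source on layer 0, intermediate layer 3 and destination on layer 1, the mesh route crosses five inter-layer links through intermediate routers, while the LM route makes two single-hop transfers.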

21 Packet Injection Demultiplexer [Figure: block diagram with inputs P1, P2, P3, P4; outputs to the injection ports of the Layer 1 through Layer 4 routers; blocks: Switch Arbitration, Route Selection/Load Balancing, VC Allocation, Flit Counters; credits in from the injection port of routers on layers 1-4]

22 Packet Ejection Multiplexer [Figure: Router on Layer 1 serving P1, P2, P3, P4. For P1, an arbiter selects among inputs L1-P1, L2-P1, L3-P1 and L4-P1, with credits out for each; likewise for P4 with L1-P4, L2-P4, L3-P4 and L4-P4. Inputs carry packets from layer 2, layer 3 and layer 4; label: VCID]

23 Outline Oblivious routing on a 3D mesh Layer-multiplexed 3D architecture Evaluation – Power and Area – Performance

24 Power and Area Evaluation Used Orion 2.0 models for router power and area estimation. 65nm process at 1V and 1GHz Buffers – 4 VCs/port, 5 flits/VC for routers – 5 flits/port for packet injection demultiplexer – 5 flits/port for each packet ejection multiplexer

25 Power Comparison 3D mesh – One 7-port router per tile LM – One 5-port router per tile – One packet injection demultiplexer for every 4 tiles – One packet ejection multiplexer per tile
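
The component counts on this slide can be tallied for a given mesh size. The sketch below takes the per-tile ratios from the slide (one router and one ejection multiplexer per tile, one injection demultiplexer per 4 tiles); the function name and dictionary keys are hypothetical, and no power numbers are attached since the slides do not give per-component values.

```python
def component_counts(kx, ky, layers, tiles_per_demux=4):
    """Count network components for a kx x ky x layers network."""
    tiles = kx * ky * layers
    mesh = {"7-port routers": tiles}
    lm = {
        "5-port routers": tiles,
        "injection demultiplexers": tiles // tiles_per_demux,
        "ejection multiplexers": tiles,
    }
    return mesh, lm

mesh_4x4x4, lm_4x4x4 = component_counts(4, 4, 4)
```

For the 4 x 4 x 4 configuration this gives 64 seven-port routers in the 3D mesh, against 64 five-port routers, 16 injection demultiplexers and 64 ejection multiplexers in the LM design.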

26 Power Evaluation 27% power reduction

27 Area Evaluation 26.5% area reduction

28 Outline Oblivious routing on a 3D mesh Layer-multiplexed 3D architecture Evaluation – Power and Area – Performance

29 RPM on a 3D mesh vs. RPM-LM Worst-case throughput – RPM-LM achieves same (near-optimal) worst-case throughput as RPM Average-case throughput

30 Flit-Level Simulation Ideal throughput evaluation assumes – Ideal single-cycle router – Infinite buffers – No contention in switches, no flow control Flit-level simulation – PopNet network simulator – 5 stage router pipeline – Credit-based flow control – 8 virtual channels, each 5 flits deep – Multi-flit packets injected into the network (5 flits/packet)

31 Flit-Level Simulation (cont’d) Network configurations simulated – 4 x 4 x 4 mesh – 8 x 8 x 4 mesh Four different traffic traces used – Uniform traffic – Transpose traffic: (x,y,z) → (y,z,x) – Complement traffic: (x,y,z) → (k-x-1, k-y-1, k-z-1) – Worst Case traffic pattern for DOR (DOR-WC): (x,y,z) → (k-z-1, k-y-1, k-x-1)
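
The three permutation patterns listed above translate directly into destination functions. This sketch assumes a cubic k x k x k mesh so that a single radix k applies to every dimension; how the patterns were adapted to the 8 x 8 x 4 configuration is not stated on the slide, so that case is left out. Function names are hypothetical.

```python
from itertools import product

def transpose(node):
    """Transpose traffic: (x, y, z) -> (y, z, x)."""
    x, y, z = node
    return (y, z, x)

def complement(node, k):
    """Complement traffic: (x, y, z) -> (k-x-1, k-y-1, k-z-1)."""
    x, y, z = node
    return (k - x - 1, k - y - 1, k - z - 1)

def dor_worst_case(node, k):
    """DOR-WC traffic: (x, y, z) -> (k-z-1, k-y-1, k-x-1)."""
    x, y, z = node
    return (k - z - 1, k - y - 1, k - x - 1)

def destinations(pattern, k):
    """Map every node of a k x k x k mesh to its destination under pattern."""
    return {node: pattern(node) for node in product(range(k), repeat=3)}
```

For example, `destinations(lambda n: complement(n, 4), 4)` builds the full source-to-destination table for a 4 x 4 x 4 mesh; each pattern is a permutation, so every node receives traffic from exactly one source.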

32 Uniform Traffic 8x8x4 Mesh

33 Transpose Traffic 8x8x4 Mesh

34 Worst-case Traffic for DOR 8x8x4 Mesh

35 Summary of Contributions Proposed a 3D layer-multiplexed (LM) architecture, which is an optimization of a 3D mesh. Exploits the optimality of RPM together with the high vertical bandwidth enabled in 3D technology. LM architecture consumes 27% less power and occupies 26% less area than a 3D mesh. RPM-LM has comparable (marginally better) performance to RPM on a 3D mesh.

36 Thank you!!

