Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE: The NoX Router, by Mitchell Hayenga and Mikko Lipasti.

1 The NoX Router
Mitchell Hayenga, Mikko Lipasti
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE

2 Overview
2/19 The NoX Router, Micro’11
New low-latency router technique
– Don’t arbitrate or speculate! Encode.
XOR Property: (A^B) ^ B = A
– Hides arbitration latency
– Eliminates dead cycles
The NoX Router
– Single-cycle/wormhole/mesh implementation
– Frequency competitive with pure speculative
– 2.7%-34.4% better ED² on application traces
– Up to 9.9% better throughput on synthetic traffic
[Figure: router block diagram with labels Control, Input Channel, Switch Fabric]

3 Motivation
Modern On-Chip Networks
– Bandwidth Plentiful, Latency Critical
– Control: Complex, Speculative, Critical Path
– Datapath: Fast, Simple, Wire-Dominated
NoX Tradeoff
– Marginal increase in datapath complexity
– Hide control latency
[Figure: Virtual Channel Router Pipeline Evolution, showing pipeline stages BW, RC, NRC, VA, SA, ST, LT; one pipeline is labeled Intel Teraflops Router]

4 Switch Arbitration Techniques
Non-Speculative
– Arbitration occurs before switch traversal
Speculative Switch Traversal [Mullins ISCA 2004]
– Assume contention doesn’t happen
– Wasted cycle in the event of contention
– Arbiter decides what gets sent on the next cycle
[Figure: Switch Fabric with Control, and a timing diagram over cycles 0-4 with signals clk, port 0, port 1, grant, valid out, data out; labels: No Contention, Contention, B Wins, A Wins]
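
The wasted cycle described on this slide can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the design from [Mullins ISCA 2004]: it assumes single-flit packets, one output port, and a hypothetical fixed-priority grant order (lowest port first). The function name `speculative_link_trace` is made up for illustration.

```python
def speculative_link_trace(flits):
    """Model the output link under speculative switch traversal.

    flits maps input port -> flit payload. All inputs speculate in the
    first cycle. With at most one input the speculation succeeds; with more,
    the first cycle carries nothing useful (None) and the arbiter then
    grants one flit per cycle (assumed order: lowest port first).
    """
    if len(flits) <= 1:
        return list(flits.values())
    trace = [None]  # the cycle lost to the failed speculation
    for port in sorted(flits):
        trace.append(flits[port])
    return trace


A, B = 0xA5, 0x3C
alone = speculative_link_trace({0: A})
collided = speculative_link_trace({0: A, 1: B})
print(alone)
print(collided)
```

In this model a two-way collision takes three link cycles to deliver two flits, which is the overhead the later slides aim to remove.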

5 Switch Arbitration Techniques
Non-Speculative
– Arbitration occurs before switch traversal
Speculative Switch Traversal [Mullins ISCA 2004]
– Assume contention doesn’t happen
– Wasted cycle in the event of contention
– Arbiter decides what gets sent on the next cycle
Encoding
– Blindly transmit, XOR within switch fabric
– No contention - data sent unmodified
– Contention - data sent XOR’d
– Arbiter decides what was sent
[Figure: Switch Fabric with Control, and a timing diagram over cycles 0-4 with signals clk, port 0, port 1, grant, valid out, data out; data out carries A^B under contention; labels: No Contention, Contention, B Wins]
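
The encoding scheme on this slide can be sketched in Python. This is a simplified behavioral model, not the authors' hardware: it assumes single-flit packets, a fixed set of flits contending for one output port, and a hypothetical arbiter `pick_winner` that grants the lowest port number. The function names and the `(word, coded)` pair format are assumptions of this sketch.

```python
from functools import reduce
from operator import xor


def pick_winner(contenders):
    """Hypothetical arbiter: grant the lowest-numbered input port."""
    return min(contenders)


def xor_switch_output(flits):
    """Model one output port of an XOR-based switch.

    flits maps input port -> flit payload (an int). Each cycle every
    still-waiting input drives the fabric, the output carries the XOR of
    those payloads, and the arbiter then removes one winner from the set.
    Returns a list of (word, coded) pairs, one per cycle.
    """
    waiting = dict(flits)
    link = []
    while waiting:
        word = reduce(xor, waiting.values())
        link.append((word, len(waiting) > 1))  # coded only under contention
        del waiting[pick_winner(waiting)]
    return link


A, B = 0xA5, 0x3C
no_contention = xor_switch_output({0: A})
contention = xor_switch_output({0: A, 1: B})
print(no_contention)
print(contention)
```

With no contention the flit is sent unmodified. With two contenders the link carries A^B and then the remaining flit, so every cycle on the link carries information the receiver can use.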

6 Coded Flit Buffer
[Figure: coded flit buffer holding A, A^B^C, B^C, C]
Receive Logic
Works upon simple XOR property.
– (A^B^C) ^ (B^C) = A
Simple Decode
– Always able to decode by XORing two sequential values
– Maintains previous router’s arbitration order/fairness
[Figure: decode example with values A, B^C, A^B^C, C, B and flag bits 0, 0, 1]
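
The decode rule above (XOR two sequential values) can be sketched as follows. This is a minimal model under assumptions: each received word carries a one-bit `coded` flag (suggested by the 0/1 bits in the figure, but the exact encoding is not given in the transcript), and no new flit joins the contention before the sequence resolves. The name `decode_stream` is hypothetical.

```python
def decode_stream(link):
    """Recover flits from (word, coded) pairs received in order.

    Under the assumptions above, a coded word is the XOR of the winning
    flit with everything still waiting upstream, and that remainder is
    exactly the next word on the link. XORing the two yields the winner.
    """
    flits = []
    for i, (word, coded) in enumerate(link):
        if not coded:
            flits.append(word)
        elif i + 1 < len(link):
            flits.append(word ^ link[i + 1][0])
        else:
            raise ValueError("coded word without a following word")
    return flits


A, B, C = 0x11, 0x22, 0x44
received = [(A ^ B ^ C, True), (B ^ C, True), (C, False)]
decoded = decode_stream(received)
print([hex(f) for f in decoded])
```

The flits come out in the order the upstream arbiter granted them (A, then B, then C), matching the slide's point that the previous router's arbitration order is maintained.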

7 Tradeoffs and Scaling
Arbitration
– O(log n) delay for most arbiters
Decode logic
– Constant with respect to # of ports
Switch Fabric
– XOR delay scales slightly worse than a mux/tristate-based solution
– Maybe not an issue (control latency)
[Figure: router block diagrams with labels Control, Input Channel, Switch Fabric]

8 The NoX Router
Network of XORs
Implementation Details
– 8x8 Mesh, 2mm long 64-bit links
– Single Cycle (Router+Link)
– Wormhole
– Dimension ordered routing
– Minimally buffered
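
For readers unfamiliar with dimension-ordered routing on a mesh, here is a minimal Python sketch. The slide does not say which dimension is routed first or how routers are numbered, so the X-first order, the `(x, y)` coordinates, and the name `xy_route` are all assumptions made for illustration.

```python
MESH_DIM = 8  # 8x8 mesh, as on the slide


def xy_route(src, dst):
    """Return the routers visited from src to dst, X dimension first.

    src and dst are (x, y) coordinates; the coordinate convention is an
    assumption of this sketch.
    """
    for cx, cy in (src, dst):
        if not (0 <= cx < MESH_DIM and 0 <= cy < MESH_DIM):
            raise ValueError("coordinate outside the mesh")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # finish all X hops first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then travel in Y
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
print(path)
```

Because every packet finishes one dimension before turning into the other, the route between any two routers is fixed and minimal.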

9 Baseline Designs
Non-Speculative
– Serial arbitration & switch logic
– Long cycle time
– Efficient link utilization
Speculative Techniques [Mullins ISCA 2004]
– Hides arbitration latency
– Potential for wasted link bandwidth
– Spec-Fast & Spec-Accurate [Mullins ASP-DAC 2006]

10 Frequency Analysis
Overheads present in all designs
– 248ps SRAM delay
– 98ps link latency
Architecture      Clock Period   %
Non-Speculative   0.92 ns        -
Spec-Fast         0.69 ns        33.3%
Spec-Accurate     0.72 ns        27.7%
NoX               0.76 ns        21.1%
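
The transcript does not define the % column. One reading that reproduces the listed values to within about 0.1 percentage point is the increase in clock frequency relative to the Non-Speculative design. The Python sketch below checks that reading; the interpretation and the helper name `frequency_gain_percent` are assumptions, while the clock periods are taken from the slide.

```python
BASELINE = "Non-Speculative"
clock_period_ns = {
    "Non-Speculative": 0.92,
    "Spec-Fast": 0.69,
    "Spec-Accurate": 0.72,
    "NoX": 0.76,
}


def frequency_gain_percent(period_ns, baseline_ns):
    """Percentage increase in clock frequency relative to the baseline."""
    return (baseline_ns / period_ns - 1.0) * 100.0


gains = {
    name: frequency_gain_percent(period, clock_period_ns[BASELINE])
    for name, period in clock_period_ns.items()
    if name != BASELINE
}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}%")
```

Under this reading, NoX gives up part of the frequency advantage of the speculative designs while remaining well ahead of the non-speculative baseline.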

11 Synthetic Traffic - Latency
[Figure: latency plots; axis label: bandwidth (MB/s/node)]

12 Synthetic Traffic – ED²
[Figure: ED² plots; axis label: bandwidth (MB/s/node)]

13 Application Traffic - Latency

14 Application Traffic – ED²

15 Power @ Fixed Bandwidth
Traffic Pattern
– Uniform Random
– 2 GB/s/node injection rate
Spec-Fast saturated
Switch/Link glitching in speculative
Marginal additional decode power
Decode negligible

16 Area Floorplanning
[Figure: floorplans of the Standard Router and the NoX Router. Each shows five ports (Port 0 to Port 4), each with a 64x4 SRAM. Labeled blocks: Crossbar, Decoding and Masking, XOR Switch. Labeled dimensions: 140 µm, 70 µm, 101.0 µm, 102.2 µm, 161.2 µm, 28 µm.]

17 Going Further
Input Speedup
– What if we could drive two values from an input buffer in a single cycle
– Final decode step has 2 values available
– Last packet sees no additional delay from contention at the previous router
Multi-hop encoded forwarding
– Don’t decode @ every hop, decode when packets diverge
– Allow new collisions with the “head” flit
– Requires additional sideband info
[Figure: Switch Fabric and Flit Buffer, with values A^B, B, A, B]

18 Conclusion
New encoding-based low-latency router technique
– Hides arbitration latency
– Comparable frequency to speculative switch traversal techniques
– Eliminates wasted interconnect bandwidth
– Promising application to multiple router architectures

19 Thanks – Questions?

20 Virtual Channels Future Work
Physical Channels vs. Virtual Channels
– VC Router Benefits
    • Dynamic bandwidth sharing (performance)
– VC Router Negatives
    • Increased arbitration delay (performance)
    • Increased buffer energy (power)
    • Large unified crossbar (area, power)
Possible but tradeoffs need to be re-evaluated
– Structuring of input buffers/decode logic
– VC credit accounting

21 Multi-Flit Support
Current support is conservative
– Performs similarly to speculative routers if multi-flit packets collide
– Not all bad though
    • ~70% of packets are single-flit coherence packets
    • Only head-flit collisions matter
    • Requests all single-flit
Alternatives
– Fragment multi-flit packets
– Provide sufficient buffering space

