The NoX Router
Mitchell Hayenga, Mikko Lipasti
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE


2/19 The NoX Router, Micro’11
Overview
New low-latency router technique
  – Don’t arbitrate or speculate! Encode.
XOR Property: (A^B) ^ B = A
  – Hides arbitration latency
  – Eliminates dead cycles
The NoX Router
  – Single-cycle/wormhole/mesh implementation
  – Frequency competitive with pure speculative
  – 2.7%-34.4% better ED² on application traces
  – Up to 9.9% better throughput on synthetic traffic
[Figure: router block diagram with Control, Input Channel, and Switch Fabric]

3/19 Motivation
Modern On-Chip Networks
  – Bandwidth plentiful, latency critical
  – Control: complex, speculative, critical path
  – Datapath: fast, simple, wire-dominated
NoX Tradeoff
  – Marginal increase in datapath complexity
  – Hide control latency
[Figure: Virtual Channel Router Pipeline Evolution, shown as pipeline stage diagrams (BW, RC/NRC, VA, SA, ST, LT), including the Intel Teraflops router]

4/19 Switch Arbitration Techniques
Non-Speculative
  – Arbitration occurs before switch traversal
Speculative Switch Traversal [Mullins ISCA 2004]
  – Assume contention doesn’t happen
  – Wasted cycle in the event of contention
    – Arbiter decides what gets sent on the next cycle
[Figure: Switch Fabric and Control block diagram with a timing diagram (clk, port 0, port 1, grant, valid out, data out; cycles 0-4) for the No Contention and Contention cases (B Wins, A Wins)]
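The wasted cycle under speculative switch traversal can be sketched in a few lines of Python. This is an illustrative model only, not the design in [Mullins ISCA 2004]: it assumes a single output port, flits modeled as strings, a hypothetical fixed-priority arbiter (`pick_winner`), and that every cycle after the collision is properly arbitrated. `None` stands for a cycle in which the output carries no valid data.

```python
def speculative_output(pending, pick_winner=min):
    """Return the per-cycle output of one port under speculative traversal.

    pending     -- dict mapping input port number -> flit waiting for this output
    pick_winner -- hypothetical arbiter: picks one port among the contenders
    """
    pending = dict(pending)
    cycles = []
    if len(pending) > 1:
        # Every input assumed it would win and drove the switch at once,
        # so the output is garbage and this cycle is wasted.
        cycles.append(None)
    while pending:
        # From here on the arbiter's decision is known one cycle ahead.
        winner = pick_winner(pending)
        cycles.append(pending.pop(winner))
    return cycles


print(speculative_output({0: "A"}))          # no contention
print(speculative_output({0: "A", 1: "B"}))  # contention
```

In this model two colliding flits occupy the link for three cycles, one of which carries nothing useful.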

5/19 Switch Arbitration Techniques
Non-Speculative
  – Arbitration occurs before switch traversal
Speculative Switch Traversal [Mullins ISCA 2004]
  – Assume contention doesn’t happen
  – Wasted cycle in the event of contention
    – Arbiter decides what gets sent on the next cycle
Encoding
  – Blindly transmit, XOR within switch fabric
  – No contention: data sent unmodified
  – Contention: data sent XOR’d
    – Arbiter decides what was sent
[Figure: Switch Fabric and Control block diagram with a timing diagram (clk, port 0, port 1, grant, valid out, data out; cycles 0-4) for the No Contention and Contention (B Wins) cases; on contention the output carries A^B followed by A]
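The encoding idea can be sketched as a small Python model of one output port. It is a simplification under stated assumptions: flits are plain integers, `pick_winner` is a hypothetical arbiter, and the `encoded` flag is an assumed marker telling the receiver that a word is an XOR of several flits (the slides do not spell out how this is signalled).

```python
from functools import reduce
from operator import xor


def nox_output(pending, pick_winner=min):
    """Return the (word, encoded) pairs sent on one output port, cycle by cycle.

    pending     -- dict mapping input port number -> flit (int) for this output
    pick_winner -- hypothetical arbiter: decides, after the fact, which
                   contender counts as having been sent this cycle
    """
    pending = dict(pending)
    sent = []
    while pending:
        # Every contender drives the switch; the fabric XORs them together.
        word = reduce(xor, pending.values(), 0)
        sent.append((word, len(pending) > 1))
        # Arbitration overlaps with traversal: the winner is retired,
        # the losers transmit again on the next cycle.
        del pending[pick_winner(pending)]
    return sent


A, B = 0xA5, 0x3C
for word, encoded in nox_output({0: A, 1: B}, pick_winner=max):  # B wins
    print(hex(word), encoded)
```

With B winning, the link carries A^B and then A, matching the order shown in the timing diagram. Every cycle carries information the receiver can use, so no cycle is wasted.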

6/19 Receive Logic
Works upon simple XOR property.
  – (A^B^C) ^ (B^C) = A
Simple Decode
  – Always able to decode by XORing two sequential values
  – Maintains previous router’s arbitration order/fairness
[Figure: Coded Flit Buffer with entries labeled A, A^B^C, B^C, C feeding the decode logic]
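The decode rule can be illustrated with a short Python sketch. It assumes that each received word arrives with an `encoded` flag (a hypothetical marker, set when the word is an XOR of more than one flit) and that flits are plain integers. The function name and data layout are illustrative, not taken from the slides.

```python
def nox_decode(received):
    """Recover the original flits from (word, encoded) pairs in arrival order."""
    flits = []
    held = None  # encoded word waiting for its successor
    for word, encoded in received:
        if held is not None:
            # XOR of two sequential values cancels everything they share.
            flits.append(held ^ word)
            held = None
        if encoded:
            held = word          # cannot be decoded until the next word arrives
        else:
            flits.append(word)   # sent unmodified, usable as-is
    return flits


A, B, C = 0x11, 0x22, 0x44
stream = [(A ^ B ^ C, True), (B ^ C, True), (C, False)]
print([hex(f) for f in nox_decode(stream)])
```

For the three-way collision above the flits come out in the order A, B, C, which is the order in which the previous router's arbiter retired them.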

7/19 Tradeoffs and Scaling
Arbitration
  – O(log n) delay for most arbiters
Decode logic
  – Constant with respect to # of ports
Switch Fabric
  – XOR delay scales slightly worse than a mux/tristate-based solution
  – Maybe not an issue (control latency)
[Figure: router block diagrams with Control, Input Channel, and Switch Fabric]

8/19 The NoX Router
Network of XORs
Implementation Details
  – 8x8 Mesh, 2mm long 64-bit links
  – Single Cycle (Router+Link)
  – Wormhole
  – Dimension ordered routing
  – Minimally buffered
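For readers unfamiliar with dimension-ordered routing, here is a minimal Python sketch on an 8x8 mesh. The slide does not say which dimension is resolved first; X-then-Y is assumed here, and the coordinate scheme and function name are illustrative.

```python
MESH = 8  # 8x8 mesh, as listed in the implementation details


def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst, X first."""
    for x, y in (src, dst):
        if not (0 <= x < MESH and 0 <= y < MESH):
            raise ValueError("node outside the mesh")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # finish all X hops first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then all Y hops
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
print(path, "hops:", len(path) - 1)
```

Because each packet turns at most once, from X to Y, the route is deterministic and depends only on the source and destination.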

9/19 Baseline Designs
Non-Speculative
  – Serial arbitration & switch logic
  – Long cycle time
  – Efficient link utilization
Speculative Techniques [Mullins ISCA 2004]
  – Hides arbitration latency
  – Potential for wasted link bandwidth
  – Spec-Fast & Spec-Accurate [Mullins ASP-DAC 2006]

10/19 Frequency Analysis
Overheads present in all designs
  – 248ps SRAM delay
  – 98ps link latency

Architecture      Clock Period   %
Non-Speculative   0.92 ns        -
Spec-Fast         0.69 ns        33.3%
Spec-Accurate     0.72 ns        27.7%
NoX               0.76 ns        21.1%
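The % column is not labeled on the slide. One reading that fits the numbers is the increase in clock frequency relative to the non-speculative design, and the Python sketch below checks that assumption against the listed clock periods. The function name is illustrative.

```python
# Clock periods in ns, as listed in the table.
PERIOD_NS = {
    "Non-Speculative": 0.92,
    "Spec-Fast": 0.69,
    "Spec-Accurate": 0.72,
    "NoX": 0.76,
}


def freq_gain_pct(period_ns, baseline_ns=PERIOD_NS["Non-Speculative"]):
    """Percent increase in clock frequency relative to the baseline period."""
    return (baseline_ns / period_ns - 1.0) * 100.0


for name, period in PERIOD_NS.items():
    print(f"{name:16s} {1.0 / period:.2f} GHz  {freq_gain_pct(period):5.1f}%")
```

Under this reading the listed periods reproduce 33.3% for Spec-Fast and 21.1% for NoX. Spec-Accurate comes out at about 27.8%, slightly above the 27.7% on the slide, possibly because the listed periods are rounded.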

11/19 Synthetic Traffic - Latency
[Figure: latency plot; x-axis: bandwidth (MB/s/node)]

12/19 Synthetic Traffic – ED²
[Figure: ED² plot; x-axis: bandwidth (MB/s/node)]

13/19 Application Traffic - Latency

14/19 Application Traffic – ED²

15/19
Fixed Bandwidth Traffic Pattern
  – Uniform Random
  – 2GB/s/node injection rate
    – Spec-Fast saturated
Switch/Link glitching in speculative
Marginal additional decode power
Decode negligible

16/19 Area Floorplanning
[Figure: floorplans of the Standard Router and the NoX Router. Each shows five input ports (Port 0 to Port 4), each a 64x4 SRAM. Labeled blocks: Crossbar, XOR Switch, Decoding and Masking. Dimension labels: 140 µm, 70 µm, 28 µm; the remaining dimension values are missing from the transcript.]

17/19 Going Further
Input Speedup
  – What if we could drive two values from an input buffer in a single cycle?
  – Final decode step has 2 values available
    – Last packet sees no additional delay from contention at the previous router
Multi-hop encoded forwarding
  – Don’t decode every hop; decode when packets diverge
  – Allow new collisions with the “head” flit
  – Requires additional sideband info
[Figure: Switch Fabric and Flit Buffer holding A^B and B, producing A and B]

18/19 Conclusion
New encoding-based low-latency router technique
  – Hides arbitration latency
  – Comparable frequency to speculative switch traversal techniques
  – Eliminates wasted interconnect bandwidth
  – Promising application to multiple router architectures

19/19 Thanks – Questions?

20/19 Virtual Channels
Future Work
Physical Channels vs. Virtual Channels
  – VC Router Benefits
    – Dynamic bandwidth sharing (performance)
  – VC Router Negatives
    – Increased arbitration delay (performance)
    – Increased buffer energy (power)
    – Large unified crossbar (area, power)
Possible but tradeoffs need to be re-evaluated
  – Structuring of input buffers/decode logic
  – VC credit accounting

21/19 Multi-Flit Support
Current support is conservative
  – Performs similarly to speculative routers if multi-flit packets collide
  – Not all bad though
    – ~70% of packets are single-flit coherence packets
    – Only head-flit collisions matter
    – Requests all single-flit
Alternatives
  – Fragment multi-flit packets
  – Provide sufficient buffering space