1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:

Slides:



Advertisements
Similar presentations
Prof. Natalie Enright Jerger
Advertisements

A Novel 3D Layer-Multiplexed On-Chip Network
Network-on-Chip (2/2) Ben Abdallah Abderazek The University of Aizu
What is Flow Control ? Flow Control determines how a network resources, such as channel bandwidth, buffer capacity and control state are allocated to packet.
ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control Prof. Natalie Enright Jerger.
1 Lecture 17: On-Chip Networks Today: background wrap-up and innovations.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
1 Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background.
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
CSE 291-a Interconnection Networks Lecture 12: Deadlock Avoidance (Cont’d) Router February 28, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter.
1 Lecture 16: On-Chip Networks Today: on-chip networks background.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Lecture 21: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control Final exam reminders:  Plan well – attempt every question.
CSE 291-a Interconnection Networks Lecture 15: Router (cont’d) March 5, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by Ling.
1 Lecture 5: Directory Protocols Topics: directory-based cache coherence implementations.
CSE 291-a Interconnection Networks Lecture 10: Flow Control February 21, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by.
1 Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID.
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
1 Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance.
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Switching, routing, and flow control in interconnection networks.
29-Aug-154/598N: Computer Networks Switching and Forwarding Outline –Store-and-Forward Switches.
1 Lecture 23: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm Next semester:
1 The Turn Model for Adaptive Routing. 2 Summary Introduction to Direct Networks. Deadlocks in Wormhole Routing. System Model. Partially Adaptive Routing.
On-Chip Networks and Testing
Networks-on-Chips (NoCs) Basics
Author : Jing Lin, Xiaola Lin, Liang Tang Publish Journal of parallel and Distributed Computing MAKING-A-STOP: A NEW BUFFERLESS ROUTING ALGORITHM FOR ON-CHIP.
Deadlock CEG 4131 Computer Architecture III Miodrag Bolic.
1 Lecture 26: Networks, Storage Topics: router microarchitecture, disks, RAID (Appendix D) Final exam: Monday 30 th Apr 10:30-12:30 Same rules as the midterm.
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Switch Microarchitecture Basics.
1 Lecture 15: Interconnection Routing Topics: deadlock, flow control.
BZUPAGES.COM Presentation On SWITCHING TECHNIQUE Presented To; Sir Taimoor Presented By; Beenish Jahangir 07_04 Uzma Noreen 07_08 Tayyaba Jahangir 07_33.
Router Architecture. December 21, 2015SoC Architecture2 Network-on-Chip Information in the form of packets is routed via channels and switches from one.
Lecture 16: Router Design
CS440 Computer Networks 1 Packet Switching Neil Tang 10/6/2008.
Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220.
1 Lecture 15: NoC Innovations Today: power and performance innovations for NoCs.
1 Lecture: Networks, Disks, Datacenters, GPUs Topics: networks wrap-up, disks and reliability, datacenters, GPU intro (Sections , App D, Ch 4)
1 Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO’03, Princeton A Gracefully Degrading and.
Virtual-Channel Flow Control William J. Dally
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
1 Switching and Forwarding Sections Connecting More Than Two Hosts Multi-access link: Ethernet, wireless –Single physical link, shared by multiple.
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
Network-on-Chip (2/2) Ben Abdallah, Abderazek The University of Aizu 1 KUST University, March 2011.
Flow Control Ben Abdallah Abderazek The University of Aizu
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
1 Lecture 22: Interconnection Networks Topics: Routing, deadlock, flow control, virtual channels.
The network-on-chip protocol
Lecture 23: Interconnection Networks
Physical constraints (1/2)
Lecture 14: Large Cache Design III
Interconnection Networks: Flow Control
Lecture 23: Router Design
Lecture 16: On-Chip Networks
NoC Switch: Basic Design Principles &
Lecture 17: NoC Innovations
Mechanics of Flow Control
Lecture: Transactional Memory, Networks
Lecture: Interconnection Networks
CEG 4131 Computer Architecture III Miodrag Bolic
Low-Latency Virtual-Channel Routers for On-Chip Networks Robert Mullins, Andrew West, Simon Moore Presented by Sailesh Kumar.
Lecture: Networks Topics: TM wrap-up, networks.
Lecture: Interconnection Networks
CS 6290 Many-core & Interconnect
Lecture 25: Interconnection Networks
Multiprocessors and Multi-computers
Presentation transcript:

1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm: caches, multiprocessors, transactional memory, interconnection networks  Graded exams/assignments returned on Thursday; wrap-up lecture

2 Packets/Flits A message is broken into multiple packets (each packet has header information that allows the receiver to re-construct the original message) A packet may itself be broken into flits – flits do not contain additional headers Two packets can follow different paths to the destination Flits are always ordered and follow the same path Such an architecture allows the use of a large packet size (low header overhead) and yet allows fine-grained resource allocation on a per-flit basis

3 Flow Control The routing of a message requires allocation of various resources: the channel (or link), buffers, control state Bufferless: flits are dropped if there is contention for a link, NACKs are sent back, and the original sender has to re-transmit the packet Circuit switching: a request is first sent to reserve the channels, the request may be held at an intermediate router until the channel is available (hence, not truly bufferless), ACKs are sent back, and subsequent packets/flits are routed with little effort (good for bulk transfers)

4 Buffered Flow Control A buffer between two channels decouples the resource allocation for each channel – buffer storage is not as precious a resource as the channel (perhaps, not so true for on-chip networks) Packet-buffer flow control: channels and buffers are allocated per packet  Store-and-forward  Cut-through Time-Space diagrams HBBBT HBBBT HBBBT Cycle Channel HBBBT HBBBT HBBBT

5 Flit-Buffer Flow Control (Wormhole) Wormhole Flow Control: just like cut-through, but with buffers allocated per flit (not channel) A head flit must acquire three resources at the next switch before being forwarded:  channel control state (virtual channel, one per input port)  one flit of channel bandwidth  one flit buffer The other flits adopt the same virtual/physical channel as the head and only compete for the buffer  Consumes much less buffer space than cut-through routing – does not improve channel utilization as another packet cannot cut in (only one VC per input port)

6 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual channel keeps track of the output channel assigned to the head, and pointers to buffered packets A head flit must allocate the same three resources in the next switch before being forwarded By having multiple virtual channels per physical channel, two different packets are allowed to utilize the channel and not waste the resource when one packet is idle

7 Example Wormhole: Virtual channel: A B B A is going from Node-1 to Node-4; B is going from Node-0 to Node-5 Node-1 Node-0 Node-5 (blocked, no free VCs/buffers) Node-2Node-3Node-4 idle A B A Node-1 Node-0 Node-5 (blocked, no free VCs/buffers) Node-2Node-3Node-4 B A Traffic Analogy: B is trying to make a left turn; A is trying to go straight; there is no left-only lane with wormhole, but there is one with VC

8 Buffer Management Credit-based: keep track of the number of free buffers in the downstream node; the downstream node sends back signals to increment the count when a buffer is freed; need enough buffers to hide the round-trip latency On/Off: the upstream node sends back a signal when its buffers are close to being full – reduces upstream signaling and counters, but can waste buffer space

9 Deadlock Avoidance with VCs VCs provide another way to number the links such that a route always uses ascending link numbers Alternatively, use West-first routing on the 1 st plane and cross over to the 2 nd plane in case you need to go West again (the 2 nd plane uses North-last, for example)

10 Router Functions Crossbar, buffer, arbiter, VC state and allocation, buffer management, ALUs, control logic Typical on-chip network power breakdown:  30% link  30% buffers  30% crossbar

11 Virtual Channel Router Buffers and channels are allocated per flit Each physical channel is associated with multiple virtual channels – the virtual channels are allocated per packet and the flits of various VCs can be interweaved on the physical channel For a head flit to proceed, the router has to first allocate a virtual channel on the next router For any flit to proceed (including the head), the router has to allocate the following resources: buffer space in the next router (credits indicate the available space), access to the physical channel

12 Router Pipeline Four typical stages:  RC routing computation: the head flit indicates the VC that it belongs to, the VC state is updated, the headers are examined and the next output channel is computed (note: this is done for all the head flits arriving on various input channels)  VA virtual-channel allocation: the head flits compete for the available virtual channels on their computed output channels  SA switch allocation: a flit competes for access to its output physical channel  ST switch traversal: the flit is transmitted on the output channel A head flit goes through all four stages, the other flits do nothing in the first two stages (this is an in-order pipeline and flits can not jump ahead), a tail flit also de-allocates the VC

13 Router Pipeline Four typical stages:  RC routing computation: compute the output channel  VA virtual-channel allocation: allocate VC for the head flit  SA switch allocation: compete for output physical channel  ST switch traversal: transfer data on output physical channel RCVASAST -- SAST -- SAST -- SAST Cycle Head flit Body flit 1 Body flit 2 Tail flit RCVASAST -- SAST -- SAST -- SAST SA -- STALL

14 Speculative Pipelines Perform VA and SA in parallel Note that SA only requires knowledge of the output physical channel, not the VC If VA fails, the successfully allocated channel goes un-utilized RC VA SA ST --SAST --SAST --SAST Cycle Head flit Body flit 1 Body flit 2 Tail flit Perform VA, SA, and ST in parallel (can cause collisions and re-tries) Typically, VA is the critical path – can possibly perform SA and ST sequentially Router pipeline latency is a greater bottleneck when there is little contention When there is little contention, speculation will likely work well! Single stage pipeline? RC VA SA ST

15 Recent Intel Router Source: Partha Kundu, “On-Die Interconnects for Next-Generation CMPs”, talk at On-Chip Interconnection Networks Workshop, Dec 2006 Used for a 6x6 mesh 16 B, > 3 GHz Wormhole with VC flow control

16 Recent Intel Router Source: Partha Kundu, “On-Die Interconnects for Next-Generation CMPs”, talk at On-Chip Interconnection Networks Workshop, Dec 2006

17 Recent Intel Router Source: Partha Kundu, “On-Die Interconnects for Next-Generation CMPs”, talk at On-Chip Interconnection Networks Workshop, Dec 2006

18 Title Bullet