ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control. Prof. Natalie Enright Jerger.

Announcements
Project progress reports
– Due March 9
– Worth 15% of project grade
– 1 page:
  - Current status of the project
  - Any difficulties/problems encountered
  - Anticipated changes from the original project proposal
Winter 2011, ECE 1749H: Interconnection Networks (Enright Jerger)

Announcements (2)
2 presentations next week
– Elastic-Buffer Flow Control for On-Chip Networks (Presenter: Islam)
– Express Virtual Channels: Towards the Ideal Interconnection Fabric (Presenter: Yu)
1 critique due

Switching/Flow Control Overview
Topology: determines connectivity of the network
Routing: determines paths through the network
Flow control: determines allocation of resources to messages as they traverse the network
– Buffers and links
– Significant impact on the throughput and latency of the network

Flow Control
Control state records
– the allocation of channels and buffers to packets
– the current state of a packet traversing the node
Channel bandwidth advances flits from this node to the next
Buffers hold flits waiting for channel bandwidth
(Figure: a node's resources: buffer capacity, control state, channel bandwidth)

Packets
Messages: composed of one or more packets
– If message size <= maximum packet size, only one packet is created
Packets: composed of one or more flits
Flit: flow control digit
Phit: physical digit
– Subdivides a flit into chunks equal to the link width
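The message/packet/flit hierarchy above can be sketched as follows. The sizes and the helper name `segment_message` are illustrative assumptions, not values from the course:

```python
# Illustrative sketch: splitting a message into packets and flits.
MAX_PACKET_SIZE = 64   # assumed maximum packet payload, in bytes
FLIT_SIZE = 16         # assumed flit width, in bytes

def segment_message(message_bytes, max_packet=MAX_PACKET_SIZE, flit=FLIT_SIZE):
    """Return the flit count of each packet the message is split into."""
    packets = []
    for offset in range(0, message_bytes, max_packet):
        payload = min(max_packet, message_bytes - offset)
        # ceiling division: number of flit-sized chunks for this payload
        packets.append(-(-payload // flit))
    return packets

# A 64-byte message fits in a single packet of four 16-byte flits:
print(segment_message(64))   # [4]
# A 100-byte message needs two packets:
print(segment_message(100))  # [4, 3]
```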

Packets (2)
Off-chip: channel width limited by pins
– Requires phits
On-chip: abundant wiring means phit size == flit size
(Figure: a message is divided into packets; each packet into a head flit, body flits, and a tail flit; the head flit carries route, sequence number, type, and VC ID fields; each flit is subdivided into phits)

Packets (3)
Packet contains destination/route information
– Flits may not, so all flits of a packet must take the same route
(Figure: a cache line packet: head flit with route, type, VC ID, and address fields; body flits carrying bytes 0-15, 16-31, 32-47, …; tail flit. A coherence command packet fits in a single head & tail flit carrying route, type, VC ID, address, and command)

Switching
Different flow control techniques based on granularity
– Message-based: allocation made at message granularity (circuit switching)
– Packet-based: allocation made to whole packets
– Flit-based: allocation made on a flit-by-flit basis

Message-Based Flow Control
Coarsest granularity
Circuit switching
– Pre-allocates resources across multiple hops, from source to destination
– Resources = links; buffers are not necessary
– Probe sent into network to reserve resources

Circuit Switching
Once the probe sets up the circuit
– Message does not need to perform any routing or allocation at each network hop
– Good for transferring large amounts of data: the circuit setup cost is amortized by sending data with very low per-hop overheads
No other message can use those resources until the transfer is complete
– Throughput can suffer due to setup and hold time for circuits
– Links are idle until setup is complete
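The amortization argument can be made concrete with a minimal latency model; the function name and all parameter values below are hypothetical assumptions, not numbers from the lecture:

```python
# Sketch: circuit-switching latency = round-trip setup + streaming transfer.
def circuit_latency(hops, setup_per_hop, data_flits, cycle_per_flit=1):
    # Probe travels to the destination and an acknowledgement returns,
    # then data streams with no per-hop routing/allocation overhead.
    setup = 2 * hops * setup_per_hop
    transfer = data_flits * cycle_per_flit
    return setup + transfer

# Setup dominates for a short transfer...
print(circuit_latency(hops=3, setup_per_hop=1, data_flits=4))    # 10
# ...but is amortized over a long one:
print(circuit_latency(hops=3, setup_per_hop=1, data_flits=400))  # 406
```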

Circuit Switching Example
Significant latency overhead prior to data transfer
– Data transfer does not pay per-hop overhead for routing and allocation
(Figure: configuration probe, acknowledgement, data, and the resulting circuit between nodes 0 and 5)

Circuit Switching Example (2)
When there is contention
– Significant wait time
– Message from node 1 to node 2 must wait
(Figure: the circuit held between nodes 0 and 5 blocks the configuration probe from node 1 to node 2)

Time-Space Diagram: Circuit Switching
(Diagram: setup (S), acknowledgement (A), data (D), and tail (T) flits over time for a circuit from node 0 to node 8; the setup from node 2 to node 8 is blocked until the first circuit's transfer completes)

Packet-Based Flow Control
Break messages into packets
Interleave packets on links
– Better utilization
Requires per-node buffering to store in-flight packets
Two types of packet-based techniques

Store and Forward
Links and buffers are allocated to the entire packet
Head flit waits at the router until the entire packet is received before being forwarded to the next hop
Not suitable for on-chip
– Requires buffering at each router to hold the entire packet: the packet cannot traverse the link until buffering is allocated to the entire packet
– Incurs high latencies (pays serialization latency at each hop)

Store and Forward Example
High per-hop latency
– Serialization delay paid at each hop
Larger buffering required
Total delay = 4 cycles per hop x 3 hops = 12 cycles
(Figure: a 4-flit packet traversing 3 hops from node 0 to node 5)
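The slide's delay arithmetic can be sketched as a one-line model (the function name is hypothetical; it assumes one cycle per flit per link, as in the example):

```python
# Store-and-forward: the entire packet is serialized at every hop.
def saf_latency(hops, packet_flits, cycle_per_flit=1):
    return hops * packet_flits * cycle_per_flit

# 4-flit packet over 3 hops, matching the example:
print(saf_latency(3, 4))  # 12 cycles
```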

Time-Space Diagram: Store and Forward
(Diagram: head (H), body (B), and tail (T) flits over time; the entire packet is received at each hop before the head flit moves on to the next)

Packet-Based: Virtual Cut Through
Links and buffers are allocated to entire packets
Flits can proceed to the next hop before the tail flit has been received by the current router
– But only if the next router has enough buffer space for the entire packet
Reduces latency significantly compared to SAF
But still requires large buffers
– Unsuitable for on-chip

Virtual Cut Through Example
Lower per-hop latency
Large buffering required
– 4 flit-sized buffers must be allocated before the head proceeds
Total delay = 1 cycle per hop x 3 hops + serialization = 6 cycles
(Figure: a 4-flit packet traversing 3 hops from node 0 to node 5)
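The cut-through delay on the slide follows the same pattern; this hypothetical helper pays the per-hop cost once for the head flit and the serialization cost only once at the end:

```python
# Virtual cut-through: head flit pays one cycle per hop;
# body and tail flits pipeline behind it, so serialization is paid once.
def vct_latency(hops, packet_flits, cycle_per_hop=1):
    return hops * cycle_per_hop + (packet_flits - 1)

# 4-flit packet over 3 hops: 3 + 3 = 6 cycles, matching the slide.
print(vct_latency(3, 4))  # 6
```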

Time-Space Diagram: VCT
(Diagram: the head flit proceeds to the next hop as soon as it is received; body and tail flits pipeline behind it)

Virtual Cut Through
Throughput suffers from inefficient buffer allocation
(Figure: the packet cannot proceed because only 2 flit buffers are available downstream, not enough for the entire packet)

Time-Space Diagram: VCT (2)
(Diagram: insufficient buffers at an intermediate hop stall the packet until space for the entire packet is available)

Flit-Level Flow Control
Helps routers meet tight area/power constraints
Flit can proceed to the next router when there is buffer space available for that flit
– Improves over SAF and VCT by allocating buffers on a flit-by-flit basis

Wormhole Flow Control
Pros
– More efficient buffer utilization (good for on-chip)
– Low latency
Cons
– Poor link utilization: if the head flit becomes blocked, all links spanning the length of the packet are idle
  - Cannot be re-allocated to a different packet
– Suffers from head-of-line (HOL) blocking
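The buffer-space condition that separates wormhole from virtual cut-through can be sketched with two hypothetical predicates: VCT requires room downstream for the whole packet, wormhole for just one flit:

```python
# VCT: the head flit may advance only if the downstream router can
# buffer the entire packet.
def can_advance_vct(free_downstream_buffers, packet_flits):
    return free_downstream_buffers >= packet_flits

# Wormhole: a flit may advance whenever one downstream buffer is free.
def can_advance_wormhole(free_downstream_buffers):
    return free_downstream_buffers >= 1

# 2 free slots, 4-flit packet:
print(can_advance_vct(2, 4))    # False
print(can_advance_wormhole(2))  # True
```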

Wormhole Example
6 flit buffers/input port
(Figure: the blue packet's downstream buffer is full, so blue cannot proceed; the red packet is blocked behind blue and holds its channel, which remains idle until red proceeds, even though the channel itself is free)

Time-Space Diagram: Wormhole
(Diagram: contention at an intermediate hop blocks the head flit; body and tail flits stall in place, holding buffers and links along the path)

Virtual Channels
First proposed for deadlock avoidance
– We'll come back to this
Can be applied to any flow control technique
– First proposed with wormhole

Virtual Channel Flow Control
Virtual channels used to combat HOL blocking in wormhole
Virtual channels: multiple flit queues per input port
– Share the same physical link (channel)
Link utilization improved
– Flits on different VCs can pass a blocked packet

Virtual Channel Flow Control (2)
(Figure: two input ports, A and B, each with multiple VC queues, sharing the output links A and B)

Virtual Channel Flow Control (3)
(Diagram: packets A (AH, A1-A5, AT) and B (BH, B1-B5, BT) arrive on separate inputs and are interleaved flit by flit on the shared output link: AH, BH, A1, B1, A2, B2, …; downstream, each packet's flits arrive every other cycle)

Virtual Channel Flow Control (4)
Packets compete for the VC on a flit-by-flit basis
In the example: on downstream links, flits of each packet are available every other cycle
Upstream links throttle because of limited buffers
Does not mean links are idle
– May be used by packets allocated to other VCs
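The flit-by-flit interleaving in the example can be sketched as a round-robin merge of two VC queues onto one physical link (names are illustrative):

```python
from itertools import zip_longest

# Round-robin interleaving of two VCs' flit queues onto a shared link.
def interleave(vc_a, vc_b):
    out = []
    for a, b in zip_longest(vc_a, vc_b):
        if a is not None:
            out.append(a)
        if b is not None:
            out.append(b)
    return out

A = ["AH", "A1", "A2", "AT"]
B = ["BH", "B1", "B2", "BT"]
# Each packet's flits occupy every other cycle on the link:
print(interleave(A, B))  # ['AH', 'BH', 'A1', 'B1', 'A2', 'B2', 'AT', 'BT']
```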

Virtual Channel Example
6 flit buffers/input port, 3 flit buffers/VC
(Figure: the blue packet is blocked by other packets and its VC's buffers are full, so it cannot proceed; a packet on another VC can still advance over the shared link)

Summary of Techniques

Technique            Links     Buffers            Comments
Circuit switching    Messages  N/A (bufferless)   Setup & ack
Store and forward    Packet    Packet             Head flit waits for tail
Virtual cut through  Packet    Packet             Head can proceed
Wormhole             Packet    Flit               HOL blocking
Virtual channel      Flit      Flit               Interleave flits of different packets

Deadlock
Using flow control to guarantee deadlock freedom gives more flexible routing
– Recall: routing restrictions are needed for deadlock freedom
If the routing algorithm is not deadlock-free, VCs can break the resource cycle
Each VC is time-multiplexed onto the physical link
– Holding a VC implies holding the associated buffer queue, not tying up the physical link resource
Enforce order on VCs

Deadlock: Enforce Order
All messages sent through VC 0 until they cross the dateline
After the dateline, assigned to VC 1
– Cannot be allocated to VC 0 again
(Figure: a 4-node ring with nodes A, B, C, D, each with VCs 0 and 1; the dateline sits on one link of the ring)
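The dateline rule can be sketched on a small ring; the routing direction, node count, and dateline position (between node 3 and node 0) here are illustrative assumptions:

```python
# Dateline ordering on a 4-node clockwise ring: packets start on VC 0 and
# switch to VC 1 when they cross the dateline, breaking the cyclic dependency.
def vcs_along_route(src, dst, nodes=4):
    """Return the VC used on each hop of a clockwise route from src to dst."""
    vcs, vc, cur = [], 0, src
    while cur != dst:
        nxt = (cur + 1) % nodes
        if nxt == 0:   # crossing the dateline between node 3 and node 0
            vc = 1     # cannot be allocated to VC 0 again
        vcs.append(vc)
        cur = nxt
    return vcs

# Route 2 -> 3 -> 0 -> 1 crosses the dateline at the second hop:
print(vcs_along_route(2, 1))  # [0, 1, 1]
```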

Deadlock: Escape VCs
Enforcing order lowers VC utilization
– Previous example: VC 1 underutilized
Escape virtual channels
– Have 1 VC that is deadlock-free
– Example: VC 0 uses DOR, other VCs use an arbitrary routing function
– Access to VCs arbitrated fairly: a packet always has a chance of landing on the escape VC
Assign different message classes to different VCs to prevent protocol-level deadlock
– Prevent request-acknowledgement message cycles

Buffer Backpressure
Need a mechanism to prevent buffer overflow
– Avoid dropping packets
– Upstream nodes need to know buffer availability at downstream routers
Significant impact on the throughput achieved by flow control
Two common mechanisms
– Credits
– On-off

Credit-Based Flow Control
Upstream router stores credit counts for each downstream VC
Upstream router
– When a flit is forwarded, decrement the credit count
– Count == 0 means the downstream buffer is full: stop sending
Downstream router
– When a flit is forwarded and a buffer is freed, send a credit to the upstream router
– Upstream increments its credit count
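A minimal sketch of the credit protocol just described; the class and method names are hypothetical:

```python
# Credit-based backpressure: one credit per downstream buffer slot.
class CreditLink:
    def __init__(self, downstream_buffers):
        self.credits = downstream_buffers

    def try_send_flit(self):
        """Upstream side: decrement on send; stall when credits reach 0."""
        if self.credits == 0:
            return False       # downstream buffer full
        self.credits -= 1
        return True

    def receive_credit(self):
        """Called when the downstream router forwards a flit, freeing a buffer."""
        self.credits += 1

link = CreditLink(downstream_buffers=2)
print(link.try_send_flit(), link.try_send_flit(), link.try_send_flit())
# True True False -> stalled until a credit returns
link.receive_credit()
print(link.try_send_flit())  # True
```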

Credit Timeline
Round-trip credit delay:
– Time between when the buffer empties and when the next flit can be processed from that buffer entry
– With only a single-entry buffer, this would result in significant throughput degradation
– Important to size buffers to tolerate the credit turnaround
(Diagram: a flit departs node 1 at t1; node 2 processes it and returns a credit; node 1 processes the credit and sends the next flit; the interval from t1 to t5 is the credit round-trip delay)

On-Off Flow Control
Credit-based: requires upstream signaling for every flit
On-off: decreases upstream signaling
Off signal
– Sent when the number of free buffers falls below threshold F_off
On signal
– Sent when the number of free buffers rises above threshold F_on

On-Off Timeline
Less signaling but more buffering
– On-chip, buffers are more expensive than wires
(Diagram: when node 2's free buffers fall below F_off it sends the off signal; F_off is set so that flits already in flight before node 1 stops do not overflow the buffers. When free buffers rise above F_on, node 2 sends the on signal; F_on is set so that node 2 does not run out of flits while node 1 resumes sending)
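The threshold behavior can be sketched as follows; the class name and the particular threshold values are illustrative assumptions:

```python
# On-off backpressure: signal state changes only at the thresholds,
# instead of one credit per flit.
class OnOffReceiver:
    def __init__(self, buffers, f_off=2, f_on=4):
        self.free = buffers
        self.f_off, self.f_on = f_off, f_on
        self.sending_on = True   # current signal seen by the upstream node

    def accept_flit(self):
        self.free -= 1
        if self.free < self.f_off:
            self.sending_on = False   # tell upstream to stop

    def forward_flit(self):
        self.free += 1
        if self.free > self.f_on:
            self.sending_on = True    # tell upstream to resume

rx = OnOffReceiver(buffers=5)
for _ in range(4):
    rx.accept_flit()
print(rx.sending_on)  # False: free buffers fell below F_off
```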

Buffer Utilization
(Diagram: time-space view of buffer usage under credit-based flow control; the head flit passes through virtual channel/switch allocation (VA/SA), switch traversal (ST), and link traversal (LT), body and tail flits through buffer write (BW), SA, ST, and LT, while each departing flit sends a credit back upstream through credit link traversal and credit update, replenishing the upstream credit count)

Buffer Sizing
Prevent backpressure from limiting throughput
– Buffers must hold a number of flits >= the turnaround time
Assume:
– 1 cycle propagation delay for data and credits
– 1 cycle credit processing delay
– 3 cycle router pipeline
At least 6 flit buffers needed
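The turnaround arithmetic on the slide, written out (the variable names are descriptive labels for the assumed delays):

```python
# Credit turnaround: the per-VC buffer depth must cover the full round trip
# so a single buffer entry can be continuously reused without stalls.
credit_propagation = 1   # credit wire delay (cycles)
credit_pipeline    = 1   # credit processing delay at the upstream router
flit_pipeline      = 3   # router pipeline at the upstream router
flit_propagation   = 1   # data wire delay (cycles)

turnaround = credit_propagation + credit_pipeline + flit_pipeline + flit_propagation
print(turnaround)  # 6 -> at least 6 flit buffers per VC
```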

Actual Buffer Usage & Turnaround Delay
– Flit arrives at node 1 and uses a buffer (actual buffer usage)
– Flit leaves node 1 and a credit is sent to node 0 (credit propagation delay)
– Node 0 receives the credit (credit pipeline delay)
– Node 0 processes the credit; the freed buffer is reallocated to a new flit (flit pipeline delay)
– New flit leaves node 0 for node 1 (flit propagation delay)
– New flit arrives at node 1 and reuses the buffer

Buffer Sizing: On/Off
9-cycle turnaround
– Requires more buffering than credit-based
– Flit arrives at node 1 and uses buffers; the off signal is sent (off propagation delay)
– Node 0 receives the off signal (off pipeline delay)
– Node 0 processes the off signal and stops sending flits
– Flit leaves node 1; the on signal is sent to node 0 (on propagation delay)
– Node 0 receives the on signal (on pipeline delay)
– Node 0 processes the on signal; the buffer is reallocated to a new flit (flit pipeline delay)
– New flit leaves node 0 (flit propagation delay)
– New flit arrives at node 1 and reuses the buffer; the off signal is sent again

Flow Control and MPSoCs
Wormhole flow control
Real-time performance requirements
– Quality of service
– Guaranteed bandwidth allocated to each node
– Time-division multiplexing
Irregularity
– Different buffer sizes

Flow Control Summary
On-chip networks require techniques with lower buffering requirements
– Wormhole or virtual channel flow control
Avoid dropping packets in the on-chip environment
– Requires a buffer backpressure mechanism
Complexity of flow control impacts router microarchitecture (in 2 weeks)