Prof. Natalie Enright Jerger

Slides:



Advertisements
Similar presentations
Interconnection Networks: Flow Control and Microarchitecture.
Advertisements

Prof. Natalie Enright Jerger
QuT: A Low-Power Optical Network-on-chip
A Novel 3D Layer-Multiplexed On-Chip Network
Circuit-Switched Coherence Natalie Enright Jerger*, Li-Shiuan Peh +, Mikko Lipasti* *University of Wisconsin - Madison + Princeton University 2 nd IEEE.
William Stallings Data and Computer Communications 7 th Edition Chapter 13 Congestion in Data Networks.
REAL-TIME COMMUNICATION ANALYSIS FOR NOCS WITH WORMHOLE SWITCHING Presented by Sina Gholamian, 1 09/11/2011.
What is Flow Control ? Flow Control determines how a network resources, such as channel bandwidth, buffer capacity and control state are allocated to packet.
ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control Prof. Natalie Enright Jerger.
Module R R RRR R RRRRR RR R R R R Efficient Link Capacity and QoS Design for Wormhole Network-on-Chip Zvika Guz, Isask ’ har Walter, Evgeny Bolotin, Israel.
High Performance Router Architectures for Network- based Computing By Dr. Timothy Mark Pinkston University of South California Computer Engineering Division.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
1 Lecture 23: Interconnection Networks Paper: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton.
Network based System on Chip Part A Performed by: Medvedev Alexey Supervisor: Walter Isaschar (Zigmond) Winter-Spring 2006.
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control Final exam reminders:  Plan well – attempt every question.
CSE 291-a Interconnection Networks Lecture 15: Router (cont’d) March 5, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by Ling.
Network-on-Chip Examples System-on-Chip Group, CSE-IMM, DTU.
CSE 291-a Interconnection Networks Lecture 10: Flow Control February 21, 2007 Prof. Chung-Kuan Cheng CSE Dept, UC San Diego Winter 2007 Transcribed by.
Issues in System-Level Direct Networks Jason D. Bakos.
Trace-Driven Optimization of Networks-on-Chip Configurations Andrew B. Kahng †‡ Bill Lin ‡ Kambiz Samadi ‡ Rohit Sunkam Ramanujam ‡ University of California,
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
1 Indirect Adaptive Routing on Large Scale Interconnection Networks Nan Jiang, William J. Dally Computer System Laboratory Stanford University John Kim.
Dragonfly Topology and Routing
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
Switching, routing, and flow control in interconnection networks.
1 Lecture 23: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm Next semester:
1 The Turn Model for Adaptive Routing. 2 Summary Introduction to Direct Networks. Deadlocks in Wormhole Routing. System Model. Partially Adaptive Routing.
Communication issues for NOC By Farhadur Arifin. Objective: Future system of NOC will have strong requirment on reusability and communication performance.
Networks-on-Chips (NoCs) Basics
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
QoS Support in High-Speed, Wormhole Routing Networks Mario Gerla, B. Kannan, Bruce Kwan, Prasasth Palanti,Simon Walton.
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
DUKE UNIVERSITY Self-Tuned Congestion Control for Multiprocessor Networks Shubhendu S. Mukherjee VSSAD, Alpha Development Group.
Author : Jing Lin, Xiaola Lin, Liang Tang Publish Journal of parallel and Distributed Computing MAKING-A-STOP: A NEW BUFFERLESS ROUTING ALGORITHM FOR ON-CHIP.
Deadlock CEG 4131 Computer Architecture III Miodrag Bolic.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Switch Microarchitecture Basics.
1 Lecture 15: Interconnection Routing Topics: deadlock, flow control.
Packet switching network Data is divided into packets. Transfer of information as payload in data packets Packets undergo random delays & possible loss.
Interconnect simulation. Different levels for Evaluating an architecture Numerical models – Mathematic formulations to obtain performance characteristics.
Interconnect simulation. Different levels for Evaluating an architecture Numerical models – Mathematic formulations to obtain performance characteristics.
BZUPAGES.COM Presentation On SWITCHING TECHNIQUE Presented To; Sir Taimoor Presented By; Beenish Jahangir 07_04 Uzma Noreen 07_08 Tayyaba Jahangir 07_33.
Lecture # 03 Switching Course Instructor: Engr. Sana Ziafat.
Lecture 16: Router Design
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
Advanced Processor Group The School of Computer Science A Dynamic Link Allocation Router Wei Song, Doug Edwards Advanced Processor Group The University.
Virtual-Channel Flow Control William J. Dally
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
Chapter 10 Congestion Control in Data Networks and Internets 1 Chapter 10 Congestion Control in Data Networks and Internets.
Flow Control Ben Abdallah Abderazek The University of Aizu
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
1 Lecture 22: Interconnection Networks Topics: Routing, deadlock, flow control, virtual channels.
The network-on-chip protocol
Lecture 23: Interconnection Networks
Interconnection Networks: Flow Control
Azeddien M. Sllame, Amani Hasan Abdelkader
Lecture 23: Router Design
Chapter 3 Part 3 Switching and Bridging
Mechanics of Flow Control
Lecture: Interconnection Networks
Natalie Enright Jerger, Li Shiuan Peh, and Mikko Lipasti
Congestion Control (from Chapter 05)
CEG 4131 Computer Architecture III Miodrag Bolic
Lecture: Interconnection Networks
Chapter 3 Part 3 Switching and Bridging
Lecture 25: Interconnection Networks
Congestion Control (from Chapter 05)
Presentation transcript:

Prof. Natalie Enright Jerger ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control Prof. Natalie Enright Jerger

Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control: determine allocation of resources to messages as they traverse network Buffers and links Significant impact on throughput and latency of network Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Packets Messages: composed of one or more packets If message size is <= maximum packet size only one packet created Packets: composed of one or more flits Flit: flow control digit Phit: physical digit Subdivides flit into chunks = to link width Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Packets (2) Message Header Payload Packet Route Seq# Head Flit Body Flit Tail Flit Flit Type VCID Head, Body, Tail, Head & Tail Phit Off-chip: channel width limited by pins Requires phits On-chip: abundant wiring means phit size == flit size Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Packets(3) Cache line RC Type VCID Addr Bytes 0-15 Bytes 16-31 Bytes 32-47 Bytes 48-63 Head Flit Body Flits Tail Flit Coherence Command RC Type VCID Addr Cmd Head & Tail Flit Packet contains destination/route information Flits may not  all flits of a packet must take same route Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Switching Different flow control techniques based on granularity Circuit-switching: operates at the granularity of messages Packet-based: allocation made to whole packets Flit-based: allocation made on a flit-by-flit basis Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Message-Based Flow Control Coarsest granularity Circuit-switching Pre-allocates resources across multiple hops Source to destination Resources = links Buffers are not necessary Probe sent into network to reserve resources Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Circuit Switching Once probe sets up circuit Message does not need to perform any routing or allocation at each network hop Good for transferring large amounts of data Can amortize circuit setup cost by sending data with very low per-hop overheads No other message can use those resources until transfer is complete Throughput can suffer due setup and hold time for circuits Links are idle until setup is complete Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Circuit Switching Example Configuration Probe 5 Data Circuit Acknowledgement Significant latency overhead prior to data transfer Data transfer does not pay per-hop overhead for routing and allocation Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Circuit Switching Example (2) 1 2 Configuration Probe 5 Data Circuit Acknowledgement When there is contention Significant wait time Message from 1  2 must wait Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Packet-based Flow Control Break messages into packets Interleave packets on links Better utilization Requires per-node buffering to store in-flight packets Two types of packet-based techniques Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Store and Forward Links and buffers are allocated to entire packet Head flit waits at router until entire packet is received before being forwarded to the next hop Not suitable for on-chip Requires buffering at each router to hold entire packet Packet cannot traverse link until buffering allocated to entire packet Incurs high latencies (pays serialization latency at each hop) Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Store and Forward Example Total delay = 4 cycles per hop x 3 hops = 12 cycles 5 High per-hop latency Serialization delay paid at each hop Larger buffering required Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Packet-based: Virtual Cut Through Links and Buffers allocated to entire packets Flits can proceed to next hop before tail flit has been received by current router But only if next router has enough buffer space for entire packet Reduces the latency significantly compared to SAF But still requires large buffers Unsuitable for on-chip Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Virtual Cut Through Example Allocate 4 flit-sized buffers before head proceeds Allocate 4 flit-sized buffers before head proceeds Total delay = 1 cycle per hop x 3 hops + serialization = 6 cycles 5 Lower per-hop latency Large buffering required Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Virtual Cut Through Cannot proceed because only 2 flit buffers available Throughput suffers from inefficient buffer allocation Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Flit Level Flow Control Help routers meet tight area/power constraints Flit can proceed to next router when there is buffer space available for that flit Improved over SAF and VCT by allocating buffers on a flit-basis Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Wormhole Flow Control Pros More efficient buffer utilization (good for on-chip) Low latency Cons Poor link utilization: if head flit becomes blocked, all links spanning length of packet are idle Cannot be re-allocated to different packet Suffers from head of line (HOL) blocking Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Wormhole Example 6 flit buffers/input port Red holds this channel: channel remains idle until read proceeds Channel idle but red packet blocked behind blue Buffer full: blue cannot proceed Blocked by other packets 6 flit buffers/input port Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Virtual Channels First proposed for deadlock avoidance We’ll come back to this Can be applied to any flow control First proposed with wormhole Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Virtual Channel Flow Control Virtual channels used to combat HOL blocking in wormhole Virtual channels: multiple flit queues per input port Share same physical link (channel) Link utilization improved Flits on different VC can pass blocked packet Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Virtual Channel Example Buffer full: blue cannot proceed Blocked by other packets 6 flit buffers/input port 3 flit buffers/VC Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Summary of techniques Links Buffers Circuit-Switching Messages N/A (buffer-less) Store and Forward Packet Virtual Cut Through Wormhole Flit Virtual Channel Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Deadlock Using flow control to guarantee deadlock freedom give more flexible routing Recall: routing restrictions needed for deadlock freedom If routing algorithm is not deadlock free VCs can break resource cycle Each VC is time-multiplexed onto physical link Holding VC implies holding associated buffer queue Not tying up physical link resource Enforce order on VCs Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Deadlock: Enforce Order B D C Dateline B1 D B0 D0 B D1 C1 A C0 All message sent through VC 0 until cross dateline After dateline, assigned to VC 1 Cannot be allocated to VC 0 again Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Deadlock: Escape VCs Enforcing order lowers VC utilization Previous example: VC 1 underutilized Escape Virtual Channels Have 1 VC that is deadlock free Example: VC 0 uses DOR, other VCs use arbitrary routing function Access to VCs arbitrated fairly: packet always has chance of landing on escape VC Assign different message classes to different VCs to prevent protocol level deadlock Prevent req-ack message cycles Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Buffer Backpressure Need mechanism to prevent buffer overflow Avoid dropping packets Upstream nodes need to know buffer availability at downstream routers Significant impact on throughput achieved by flow control Two common mechanisms Credits On-off Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Credit-Based Flow Control Upstream router stores credit counts for each downstream VC Upstream router When flit forwarded Decrement credit count Count == 0, buffer full, stop sending Downstream router When flit forwarded and buffer freed Send credit to upstream router Upstream increments credit count Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Credit Timeline Round-trip credit delay: Node 1 Node 2 t1 Credit Flit departs router t2 Process Credit round trip delay t3 Credit Flit t4 Process t5 Round-trip credit delay: Time between when buffer empties and when next flit can be processed from that buffer entry If only single entry buffer, would result in significant throughput degradation Important to size buffers to tolerate credit turn-around Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) On-Off Flow Control Credit: requires upstream signaling for every flit On-off: decreases upstream signaling Off signal Sent when number of free buffers falls below threshold Foff On signal Sent when number of free buffers rises above threshold Fon Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

On-Off Timeline Less signaling but more buffering Foff threshold reached Node 1 Node 2 t1 Flit Foff set to prevent flits arriving before t4 from overflowing t2 Flit Off Flit t3 Process Flit t4 Flit Flit Flit Flit Fon threshold reached t5 Flit Fon set so that Node 2 does not run out of flits between t5 and t8 On Flit t6 Flit Process t7 Flit Flit Flit t8 Flit Less signaling but more buffering On-chip buffers more expensive than wires

ECE 1749H: Interconnection Networks (Enright Jerger) Buffer Utilization Cycle 1 2 3 4 5 6 7 8 9 10 11 12 Credit count 2 1 1 2 1 VA/SA ST LT BW VA/SA ST Head Flit SA ST LT BW SA ST Body Flit 1 C C-LT C-Up Credit (head) SA ST LT Body Flit 2 C C-LT C-Up Credit (body 1) SA ST LT Tail Flit Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Buffer Sizing Prevent backpressure from limiting throughput Buffers must hold flits >= turnaround time Assume: 1 cycle propagation delay for data and credits 1 cycle credit processing delay 3 cycle router pipeline At least 6 flit buffers Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Actual Buffer Usage & Turnaround Delay Credit propagation delay Credit pipeline delay flit propagation delay Actual buffer usage flit pipeline delay Flit leaves node 1 and credit is sent to node 0 Node 0 processes credit, freed buffer reallocated to new flit New flit arrives at Node 1 and reuses buffer Flit arrives at node 1 and uses buffer Node 0 receives credit New flit leaves Node 0 for Node 1 Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Buffer Sizing: On/Off 9 cycles turnaround Flit pipeline delay, actual buffer usage On propagation delay On pipeline delay Flit pipeline delay Flit propagation delay Flit leaves Node 1. On signal sent to Node 0 Node 0 processes on signal. Buffer reallocated to new flit New flit arrives at Node 1 and reuses buffer. Off signal sent Flit arrives at node 1 and uses buffers Off signal is sent Node 0 receives on signal New flit leaves Node 0 Off propagation delay Off pipeline delay 9 cycles turnaround Requires more buffering than credit-based Node 0 receives off signal Node 0 processes off signal and stops sending flits Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

Flow Control and MPSoCs Wormhole flow control Real time performance requirements Quality of Service Guaranteed bandwidth allocated to each node Time division multiplexing Irregularity Different buffer sizes Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)

ECE 1749H: Interconnection Networks (Enright Jerger) Flow Control Summary On-chip networks require techniques with lower buffering requirements Wormhole or Virtual Channel flow control Avoid dropping packets in on-chip environment Requires buffer backpressure mechanism Complexity of flow control impacts router microarchitecture (next) Winter 2010 ECE 1749H: Interconnection Networks (Enright Jerger)