© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Switch Microarchitecture Basics.

ECE 8813a (2) Reading
1. L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001
2. L. S. Peh and W. J. Dally, “A Delay Model for Router Microarchitectures,” IEEE Micro, January-February 2001
Text: Sections 7.2.1, 7.2.2, and (pages )

ECE 8813a (3) Organization
- Routing, arbitration, switching
- Buffer management and flow control
- Concurrent, pipelined, speculative implementation of basic switch functions

ECE 8813a (4) Major Switch Components
- Microarchitecture level operation
- Buffers/Virtual Channels: how?
- Routing: where?
- Arbitration: who?
- Scheduling/Allocation: when?

ECE 8813a (5) A Virtual Channel Switch
[Figure: virtual channel switch, showing the data plane and the control plane]
From L. S. Peh and W. J. Dally, “A Delay Model for Router Microarchitectures,” IEEE Micro, January-February 2001

ECE 8813a (6) Routing Decisions
- Formally represented as a routing function
  - Mapping from input ports (channels) to output ports (channels)
  - Distinct for oblivious vs. adaptive routing
- Common implementation forms
  - Finite state machine
  - Table lookup
- Centralized vs. distributed
  - Across input ports (virtual channels)
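
The two routing-function styles and the table-lookup form can be sketched as follows. This is a minimal illustration, not taken from the slides: it assumes a 2D mesh with (x, y) node coordinates, with NORTH meaning increasing y, and the names `Port`, `xy_route`, `minimal_adaptive_route`, and `build_route_table` are hypothetical. The function is keyed on the current node and destination rather than on the input channel, which is a simplification.

```python
from enum import Enum


class Port(Enum):
    LOCAL = 0
    EAST = 1
    WEST = 2
    NORTH = 3
    SOUTH = 4


def xy_route(cur, dst):
    """Deterministic dimension-order routing: exactly one output port."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:                      # finish the X dimension first
        return Port.EAST if dx > cx else Port.WEST
    if dy != cy:                      # then correct the Y dimension
        return Port.NORTH if dy > cy else Port.SOUTH
    return Port.LOCAL                 # arrived: eject to the local node


def minimal_adaptive_route(cur, dst):
    """Adaptive routing: the set of all productive output ports."""
    (cx, cy), (dx, dy) = cur, dst
    ports = set()
    if dx != cx:
        ports.add(Port.EAST if dx > cx else Port.WEST)
    if dy != cy:
        ports.add(Port.NORTH if dy > cy else Port.SOUTH)
    return ports or {Port.LOCAL}


def build_route_table(cur, width, height):
    """Table-lookup form: precompute xy_route for every destination."""
    return {(x, y): xy_route(cur, (x, y))
            for x in range(width) for y in range(height)}
```

The deterministic function returns a single port, so later allocation stages see one candidate; the adaptive function returns a set, which is what makes the downstream arbitration more complex.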

ECE 8813a (7) Some Operational Principles
- Data and control planes operate at three rates
  - Phit, flit, packet
  - Resources are allocated and de-allocated at these rates
- Fixed clock cycle model
- State management
  - Of resources: allocation
  - Of data: mapping to resources
- Granularity of allocation/management is key to deadlock freedom in pipelined switches
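
As a small worked example of the three rates, the sketch below counts how many flits a packet occupies and how many physical-channel transfers (phits) each flit needs. The sizes used in the test are made up for illustration, and `transfer_units` is a hypothetical helper; it assumes one phit crosses the link per cycle.

```python
def transfer_units(packet_bits, flit_bits, phit_bits):
    """Break a packet into flits and phits (sizes in bits, rounded up)."""
    flits = -(-packet_bits // flit_bits)          # ceiling division
    phits_per_flit = -(-flit_bits // phit_bits)
    return {
        "flits_per_packet": flits,
        "phits_per_flit": phits_per_flit,
        # Assumption: one phit crosses the physical channel per cycle.
        "link_cycles": flits * phits_per_flit,
    }
```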

ECE 8813a (8) Pipelined Switch Microarchitecture
[Figure: five-stage pipelined switch with physical channels, link control, MUX/DEMUX, input buffers, crossbar, and output buffers]
- Stage 1: IB (Input Buffering)
- Stage 2: RC (Routing Computation)
- Stage 3: VCA (VC Allocation)
- Stage 4: SA (Switch Allocation)
- Stage 5: ST & Output Buffering
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001
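
The timing of the five stages can be sketched with a toy schedule. The assumptions here are not from the slides: every stage takes one cycle, there is no contention, flits arrive one per cycle, and body and tail flits reuse the head flit's route and virtual channel, so they skip RC and VCA. `pipeline_schedule` and `router_latency` are hypothetical names.

```python
STAGES = ["IB", "RC", "VCA", "SA", "ST"]


def pipeline_schedule(num_flits):
    """Return, for each flit, the cycle in which it occupies each stage."""
    head_sa = STAGES.index("SA")      # head flit reaches SA in cycle 3
    schedule = []
    for i in range(num_flits):
        if i == 0:
            # Head flit visits all five stages back to back.
            flit = {stage: cycle for cycle, stage in enumerate(STAGES)}
        else:
            # Body/tail flit i is buffered on arrival (cycle i), then
            # follows the head through SA and ST one cycle per flit.
            flit = {"IB": i, "SA": head_sa + i, "ST": head_sa + i + 1}
        schedule.append(flit)
    return schedule


def router_latency(num_flits):
    """Cycles from head arrival until the last flit finishes ST."""
    return pipeline_schedule(num_flits)[-1]["ST"] + 1
```

Under these assumptions a packet of n flits spends 5 + (n - 1) cycles in one router.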

ECE 8813a (9) Buffer States
- Input buffers
  - free, routing, VCA, transmitting, stalled (flow control)
  - Output port and output virtual channel
  - Flow control information: stop/go, credits
- Output buffers
  - transmitting, stalled (flow control), free
  - Input port and input virtual channel
  - Flow control information: stop/go, credits
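
The input-buffer state listed above can be sketched as a small state machine. The state names follow the slide; the transition events (`head_arrives`, `route_computed`, `vc_granted`, `flit_sent`, `credit_returned`) and the class names are hypothetical, and the sketch assumes credit-based rather than stop/go flow control.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class VCState(Enum):
    FREE = auto()
    ROUTING = auto()
    VCA = auto()
    TRANSMITTING = auto()
    STALLED = auto()


@dataclass
class InputVC:
    state: VCState = VCState.FREE
    out_port: Optional[int] = None    # set by routing computation
    out_vc: Optional[int] = None      # set by VC allocation
    credits: int = 0                  # flow control information

    def head_arrives(self):
        assert self.state is VCState.FREE
        self.state = VCState.ROUTING

    def route_computed(self, out_port):
        assert self.state is VCState.ROUTING
        self.out_port = out_port
        self.state = VCState.VCA

    def vc_granted(self, out_vc, credits):
        assert self.state is VCState.VCA
        self.out_vc, self.credits = out_vc, credits
        self.state = VCState.TRANSMITTING if credits > 0 else VCState.STALLED

    def flit_sent(self, is_tail):
        assert self.state is VCState.TRANSMITTING
        self.credits -= 1
        if is_tail:
            # The tail flit releases the VC and clears its allocations.
            self.state, self.out_port, self.out_vc = VCState.FREE, None, None
        elif self.credits == 0:
            self.state = VCState.STALLED

    def credit_returned(self):
        self.credits += 1
        if self.state is VCState.STALLED:
            self.state = VCState.TRANSMITTING
```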

ECE 8813a (10) Virtual Channel Allocation
- How are candidates for arbitration created? By the routing function
  - Alternatives depend on routing flexibility
- This is the point at which dependencies are created, and hence the point at which deadlock is avoided
[Figure: VC allocator alternatives, labeled deterministic routing, adaptive/link, and fully adaptive]
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001

ECE 8813a (11) Switch Allocator
- Separable allocator
- Separate allocator for speculative and non-speculative requests
- Output port allocated for packet duration
- Low state update rate
[Figure: non-VC allocator and VC allocator]
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001
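
A separable allocator splits the matching problem into two rounds of independent arbitration. The sketch below is one possible input-first variant with round-robin arbiters. The slides do not specify the arbiter type or the pointer-update rule, so both are assumptions, and `RoundRobinArbiter` and `separable_allocate` are hypothetical names.

```python
class RoundRobinArbiter:
    """Grants the first requester at or after a rotating pointer."""

    def __init__(self, n):
        self.n = n
        self.ptr = 0

    def pick(self, requests):
        for offset in range(self.n):
            candidate = (self.ptr + offset) % self.n
            if candidate in requests:
                return candidate
        return None

    def advance(self, granted):
        # The winner gets lowest priority next time.
        self.ptr = (granted + 1) % self.n


def separable_allocate(requests, in_arbs, out_arbs):
    """requests: {input: set of outputs}. Returns {input: granted output}."""
    # Stage 1: each input arbiter selects one of its requested outputs.
    picks = {}
    for inp, outs in requests.items():
        choice = in_arbs[inp].pick(outs)
        if choice is not None:
            picks[inp] = choice
    # Stage 2: each output arbiter selects one input that picked it.
    grants = {}
    for out, arb in enumerate(out_arbs):
        contenders = {inp for inp, choice in picks.items() if choice == out}
        winner = arb.pick(contenders)
        if winner is not None:
            grants[winner] = out
            arb.advance(winner)
            in_arbs[winner].advance(out)   # update only on a final grant
    return grants
```

Because the two stages decide independently, the result is not always a maximum matching: in the first round of the test below, input 1 loses even though giving output 1 to input 0 would have served both.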

ECE 8813a (12) Switch Allocation
- Flits bid on a cycle basis for crossbar slots
  - Possible to increase the granularity of bids or the duration for which a crossbar port can be held
  - SA cannot create deadlock since ports are not held indefinitely
- Success in SA is accompanied by flow control updates
  - For example, transmitting credits
- Traversal of the tail flit reinitializes the input channel and resets input/output buffer allocations
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001
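
The link between a successful switch allocation and a credit update can be sketched with a toy credit loop. This assumes zero-latency links and credit return, which a real router does not have, and `CreditLink` with its methods is a hypothetical name.

```python
from collections import deque


class CreditLink:
    """Upstream credit counter paired with a downstream input buffer."""

    def __init__(self, depth):
        self.credits = depth          # free flit slots downstream
        self.buffer = deque()         # downstream input buffer

    def upstream_send(self, flit):
        """Send a flit if a credit is available; otherwise it stalls."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def downstream_sa_grant(self):
        """A buffered flit wins SA and leaves; a credit returns upstream."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```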

ECE 8813a (13) The Crossbar
- There are many alternative crossbar designs
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001

ECE 8813a (14) Speculation
- What can be speculated?
  - Crossbar allocation while VC allocation is pending
  - More complex for adaptive routing protocols
- Speculative flits vs. non-speculative
  - Header flits vs. body & tail flits
- Overhead of speculation
  - High traffic loads mask failures in speculation
  - Low traffic loads increase probability of success
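
One way to picture how speculative and non-speculative requests interact is to run the two allocations separately and then merge the results, letting non-speculative grants always win. The sketch below shows only that merge step under this assumption. `combine_grants` is a hypothetical helper, and `vca_success` stands for the set of inputs whose head flit obtained an output VC in the same cycle.

```python
def combine_grants(nonspec_grants, spec_grants, vca_success):
    """Merge {input: output} grants, prioritizing non-speculative ones."""
    used_inputs = set(nonspec_grants)
    used_outputs = set(nonspec_grants.values())
    combined = dict(nonspec_grants)
    for inp, out in spec_grants.items():
        if inp in used_inputs or out in used_outputs:
            continue      # a non-speculative flit owns this port
        if inp not in vca_success:
            continue      # VC allocation failed: the speculation is dropped
        combined[inp] = out
    return combined
```

At high load most ports are taken by non-speculative flits, so a failed speculation rarely costs a slot anyone else could have used; at low load the ports are mostly free and speculation tends to succeed.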

ECE 8813a (15) Pipeline Disruptions
- Resource availability disruptions
  - VC availability
  - Downstream buffer space not available
  - Inter-packet gap is a function of deadlock freedom (later)
- Allocated flow disruptions
  - Switch not available
  - Downstream buffer space not available
- Disruptions (pipeline bubbles) propagate to the destination through intermediate routers

ECE 8813a (16) A Look at Channel Dependencies
- Issue: creating structural dependencies
  - Dependencies between messages due to concurrent use of VC buffers
  - Such dependencies must be globally managed to avoid deadlock
- Architectural decision: when is a VC freed?
  - When the tail flit releases an input virtual channel
  - When the tail releases the output virtual channel
    - Remember a VC traverses a link!

ECE 8813a (17) Base Performance
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001

ECE 8813a (18) Buffer Occupancy
- Deeper pipelining increases the buffer turnaround time and decreases occupancy
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001
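
A back-of-the-envelope view of why turnaround time matters: if a buffer slot cannot be reused until its turnaround time has elapsed, a channel that could carry one flit per cycle is limited by how many slots cover that interval. The sketch below expresses this bound. It is a simplified model rather than the delay model from the cited paper, and `max_channel_utilization` is a hypothetical name.

```python
def max_channel_utilization(num_buffers, turnaround_cycles):
    """Upper bound on the fraction of cycles the channel can carry a flit.

    Assumes one flit per cycle at full rate and that each buffer slot is
    unavailable for `turnaround_cycles` after a flit is sent into it.
    """
    return min(1.0, num_buffers / turnaround_cycles)
```

With 4 flit buffers and an 8-cycle turnaround the channel is busy at most half the time; a deeper pipeline lengthens the turnaround, so more buffers are needed to keep the channel full.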

ECE 8813a (19) Impact of Flow Control
L. S. Peh and W. J. Dally, “A Delay Model and Speculative Architecture for Pipelined Routers,” Proc. of the 7th Int’l Symposium on High Performance Computer Architecture, Monterrey, January, 2001

ECE 8813a (20) What Next?
- Buffer organization
- Arbitration
- Switching scheduling
- On-chip vs. off-chip implementations