ECE 8813a: Non-minimal Routing


ECE 8813a (1) Non-minimal Routing
- Non-minimal routing
  - Wormhole switching degrades performance, while virtual cut-through (VCT) has fewer secondary effects
  - Fault tolerance is the main motivator
- Classes
  - Search-based algorithms
  - Virtual channel-based routing
  - Turn-based routing
Non-Minimal Routing

ECE 8813a (2) Reading
- Section 4.7 and/or P.T. Gaughan, et al., “Distributed, deadlock-free routing in faulty, pipelined, direct interconnection networks,” IEEE Transactions on Computers, vol. 45, no. 6, June 1996
- A. Mejia, J. Flich, J. Duato, Sven-Arne Reinemo and Tor Skeie, “Segment Based Routing: An Efficient Fault-Tolerant Routing Algorithm for Meshes and Tori,” Proceedings of the International Parallel and Distributed Processing Symposium, April 2006
- J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007

ECE 8813a (3) Backtracking Protocols
- Backtracking search + resource reservation
- Constrain the search
  - Minimal paths vs. number of misroutes
P.T. Gaughan, et al., “Distributed, deadlock-free routing in faulty, pipelined, direct interconnection networks,” IEEE Transactions on Computers, vol. 45, no. 6, June 1996

ECE 8813a (4) Optimization
- Sensitive to choice of switching technique
  - Naturally suited to circuit switching and pipelined circuit switching
  - Overhead is large with store-and-forward (SAF)
- Deadlock is avoided by not blocking on busy channels
- Livelock is avoided by maintaining and using search history
  - In the header: large headers
  - In the routers: local state, headers comparable to e-cube
- Protocol variations
  - Multi-links
  - k-family
  - Exhaustive: profitable and misrouting
  - Limited misrouting
  - Multi-phase
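The search behaviour described on this slide can be illustrated with a minimal sketch. It assumes a 2D mesh, a set of failed links, and a misroute budget; the names `backtracking_route`, `faulty_links`, and `max_misroutes` are hypothetical and are not taken from Gaughan et al. The probe never blocks on an unusable channel (it backs up instead), and a visited set plays the role of the search history. Because the history is global, this simplified version is not exhaustive: it can miss a path that a full search within the same budget would find.

```python
def neighbors(node, width, height):
    """Yield the mesh neighbours of node that lie inside the mesh."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def manhattan(a, b):
    """Hop distance between two nodes in a fault-free mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def backtracking_route(src, dst, width, height, faulty_links, max_misroutes):
    """Depth-first probe from src to dst; returns a list of nodes or None.

    faulty_links is a set of frozenset({u, v}) pairs (an assumed encoding).
    """
    visited = {src}  # search history: never probe the same node twice

    def probe(node, misroutes, path):
        if node == dst:
            return path
        # Try profitable (distance-reducing) hops before misrouting hops.
        options = sorted(neighbors(node, width, height),
                         key=lambda n: manhattan(n, dst))
        for nxt in options:
            if frozenset((node, nxt)) in faulty_links or nxt in visited:
                continue  # do not block: skip and try another channel
            is_misroute = manhattan(nxt, dst) >= manhattan(node, dst)
            if is_misroute and misroutes >= max_misroutes:
                continue  # misroute budget exhausted
            visited.add(nxt)
            result = probe(nxt, misroutes + int(is_misroute), path + [nxt])
            if result is not None:
                return result
            # Falling through here is the backtrack step.
        return None

    return probe(src, 0, [src])
```

With a failed link on the only minimal row, a budget of zero misroutes finds no path, while a small budget lets the probe detour around the fault.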

ECE 8813a (5) Topology Agnostic Routing
- Topology dependent vs. topology agnostic routing
  - Reliability
  - Increasingly important on-chip
- Approaches
  - Techniques based on virtual channels
    - Expensive on-chip
    - Competes with QoS schemes
  - Techniques based on turn restrictions
    - Difficult to ensure non-minimal paths

ECE 8813a (6) Segment Based Routing
- Topology agnostic routing
- Restriction-based approach
  - Multiple restriction options
    - Select restrictions based on performance goals
  - Source based routing
    - Routing table generation
From A. Mejia, J. Flich, J. Duato, Sven-Arne Reinemo and Tor Skeie, “Segment Based Routing: An Efficient Fault-Tolerant Routing Algorithm for Meshes and Tori,” Proceedings of the International Parallel and Distributed Processing Symposium, April 2006

ECE 8813a (7) Key Idea: Segments & Subnets
- Partition the topology into subnets, then partition each subnet into segments
- Goal: islands of regularity

ECE 8813a (8) Key Idea: Optimization
- Placement of turn restrictions in a segment
  - Placement for latency: shortest path
  - Placement for throughput: distribute traffic
- Topology segmentation
  - Can be optimized for regular topologies

ECE 8813a (9) Requirements
- Avoid deadlock in a segment
- Avoid deadlock when traversing multiple segments
- Ensure routing connectivity when physical connectivity exists
- Avoid congestion in path construction

ECE 8813a (10) Construction of Segments
- Search for starting segment + “regular” segments
  - Unitary segments
- Add one bidirectional restriction in each segment
[Figure labels: starting node, terminal node, unitary segment, bridge segment, failed links]

ECE 8813a (11) Segment Types
- Starting segment
- Regular segment
- Unitary segment
[Figure labels: starting node, terminal node, unitary segment, bridge segment, failed links]

ECE 8813a (12) Deadlock Freedom
- One routing restriction per segment
  - No cycles in a segment
- Every cycle contains a segment
  - Hence cannot be “closed” to create deadlock
- No cycle from the start node back to itself
  - Cannot create cycles across subnets
- Think of a subnet as a union of 1-D segments
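The argument that one restriction prevents a cycle from being closed can be checked mechanically on a small case. The sketch below is a generic channel dependency graph test, not the authors' construction procedure: it assumes channels are directed links, forbids U-turns, and encodes a prohibited turn as a hypothetical triple `(a, b, c)` meaning "arrive at b from a, leave toward c". A routing is deadlock-free under this model if the dependency graph has no cycle.

```python
def channel_dependency_graph(links, prohibited_turns):
    """Build channel -> list of channels that may be requested next.

    links: iterable of undirected (u, v) pairs; each yields two channels.
    prohibited_turns: set of (a, b, c) triples that routing may not take.
    """
    channels = [(u, v) for u, v in links] + [(v, u) for u, v in links]
    deps = {ch: [] for ch in channels}
    for (a, b) in channels:
        for (b2, c) in channels:
            # A dependency exists if the second channel leaves node b,
            # is not a U-turn, and the turn is not prohibited.
            if b2 == b and c != a and (a, b, c) not in prohibited_turns:
                deps[(a, b)].append((b2, c))
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {ch: WHITE for ch in deps}

    def visit(ch):
        color[ch] = GREY
        for nxt in deps[ch]:
            if color[nxt] == GREY:
                return True  # back edge: a cycle can be closed
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[ch] = BLACK
        return False

    return any(color[ch] == WHITE and visit(ch) for ch in deps)
```

On a four-node ring, the unrestricted graph contains a cycle in each direction; adding a single bidirectional restriction at one node, for example `{(1, 2, 3), (3, 2, 1)}`, removes both.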

ECE 8813a (13) Example
From A. Mejia, J. Flich, J. Duato, “On The Potential of Segment Based Routing,” Proceedings of the International Conference on Parallel Processing, 2008

ECE 8813a (14) Routing
- Segment routing is turn based and therefore partially adaptive
- Source routing can be layered on top of segments to balance traffic
From A. Mejia, J. Flich, J. Duato, “On The Potential of Segment Based Routing,” Proceedings of the International Conference on Parallel Processing, 2008

ECE 8813a (15) Performance
From A. Mejia, J. Flich, J. Duato, Sven-Arne Reinemo and Tor Skeie, “Segment Based Routing: An Efficient Fault-Tolerant Routing Algorithm for Meshes and Tori,” Proceedings of the International Parallel and Distributed Processing Symposium, April 2006

ECE 8813a (16) Region-Based Routing
- Recognize that routing decisions implicitly check for region membership
  - Think meshes
- Generalize the idea of regions
  - Can naturally be adapted for fault tolerant routing
[Figure labels: S, D]
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007

ECE 8813a (17) Example of Regions
- Static, off-line topology characterization
- Online querying of network structure
  - Built on segment-based routing
[Figure label: { node set }]
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007

ECE 8813a (18) Example of Regions
- What are the characteristics of these regions?
  - Note: number of regions = f(routing options)
- Note that the use of an output port depends on the input port
  - Check W output port from N input port
[Figure label: { node set }]
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007

ECE 8813a (19) Approach
- Observe that table-based routing is really region based
  - Each entry identifies a region
- Merge entries into compact region specifications at each switch
- Region construction is based on the paths
  - Any set of paths: fault tolerant routing
  - No virtual channels

ECE 8813a (20) Key Idea
- Generate paths
  - All minimal
  - First non-minimal
  - Note: using SR routing
- Record paths at each router
  - Produce region representation for each output port
  - Record input port dependencies
- Program routers
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007

ECE 8813a (21) Creating Regions
- Coalesce routing options based on inputs and outputs
- Represents a compact routing table
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007
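As a rough illustration of coalescing, the sketch below groups per-destination table entries by (input port, output port) and packs each group into axis-aligned rectangles on a 2D mesh. The greedy row-run packing, the table encoding, and the names `coalesce` and `pack_rectangles` are assumptions made for this example; the packing used in the cited paper may differ.

```python
def pack_rectangles(cells):
    """Greedily cover a set of (x, y) cells with rectangles (x0, y0, x1, y1).

    Step 1 finds maximal horizontal runs per row; step 2 stacks identical
    runs from consecutive rows into one rectangle.
    """
    runs = {}  # (x0, x1) -> ascending list of rows containing that run
    for y in sorted({cy for _, cy in cells}):
        xs = sorted(cx for cx, cy in cells if cy == y)
        start = prev = xs[0]
        for x in xs[1:] + [None]:
            if x is None or x != prev + 1:
                runs.setdefault((start, prev), []).append(y)
                start = x
            prev = x
    boxes = []
    for (x0, x1) in sorted(runs):
        rows = runs[(x0, x1)]
        y0 = prev = rows[0]
        for y in rows[1:] + [None]:
            if y is None or y != prev + 1:
                boxes.append((x0, y0, x1, prev))
                y0 = y
            prev = y
    return boxes


def coalesce(table):
    """table maps (input_port, destination) -> output_port.

    Returns a list of (input_port, output_port, rectangle) regions.
    """
    groups = {}
    for (in_port, dst), out_port in table.items():
        groups.setdefault((in_port, out_port), set()).add(dst)
    regions = []
    for key in sorted(groups):
        for box in pack_rectangles(groups[key]):
            regions.append((key[0], key[1], box))
    return regions
```

For a router at (1, 1) in a 4x4 mesh with dimension-order entries for locally injected traffic, the 15 per-destination entries collapse into four regions, one per output port.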

ECE 8813a (22) Region Construction
[Figure: region construction flow; recoverable labels: Segment Routing, Search, Coalesce & Packing, Region Formation]
- Note this can be applied to any topology
- No virtual channels
- Offline optimization of latency vs. distance
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007

ECE 8813a (23) Hardware Overheads
- Each region requires
  - Four registers that define the region
  - Mask registers to define input and output ports
  - Logic to determine routing options
- Hardware cost grows with the number of regions
  - Growth as f(network_size) is much slower
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007
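A software model of the per-region state may help make the register list concrete. This sketch assumes the four registers hold a rectangular bounding box (`x_min`, `y_min`, `x_max`, `y_max`) and that the port masks are sets of port names; both are assumptions for illustration, and hardware would evaluate all regions in parallel rather than in a loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Four "registers" assumed to describe a rectangular region.
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    # Port masks, modelled here as sets of port names such as "N" or "W".
    input_ports: frozenset
    output_ports: frozenset

    def contains(self, dst):
        """True if the destination coordinates fall inside the region."""
        x, y = dst
        return (self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max)


def routing_options(regions, input_port, dst):
    """Union of output ports of every region matching this packet."""
    options = set()
    for region in regions:
        if input_port in region.input_ports and region.contains(dst):
            options |= region.output_ports
    return options
```

A region whose input mask holds only "N" and whose output mask holds only "W" models the earlier example of checking the W output port from the N input port: a packet arriving on any other input gets no option from that region.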

ECE 8813a (24) Implementation
- Initialization of region registers and parallel evaluation of all regions
From J. Flich, A. Mejia, P. Lopez, and J. Duato, “Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Networks on Chip,” Proceedings of the First International Symposium on Networks on Chip, May 2007

ECE 8813a (25) Microarchitecture Issues
- Routing algorithm performance is sensitive to resource allocation schemes in the router
- Key resource management functions include
  - Routing function
  - Selection function
  - Arbitration/scheduling
- Mismatch can lead to poor performance

ECE 8813a (26) Resource Allocation: Selection Functions
- Selection function may be oblivious or informed
  - Common to favor minimal paths and lightly loaded links
- Examples:
  - Meshes: minimum congestion, maximum flexibility, straight lines
- Unlike routing functions, the selection function must be serialized
  - Result updates the channel status, a centralized resource
[Figure labels: VC status/control, VC buffer, selection function, routing function, input VC, output VCs]
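An informed selection function of the kind described here might look like the following sketch. It assumes the routing function has already produced a list of candidate output ports, and it uses the count of free virtual channels per port as a stand-in for load; the function name, the `free_vcs` table, and the tie-breaking order are hypothetical.

```python
def select_output(candidates, free_vcs, minimal):
    """Pick one output port from the routing function's candidates.

    candidates: output ports permitted by the routing function.
    free_vcs: port -> number of free virtual channels (load proxy).
    minimal: subset of ports that lie on a minimal path.
    Returns the chosen port, or None if every candidate is busy.
    """
    usable = [p for p in candidates if free_vcs.get(p, 0) > 0]
    if not usable:
        return None
    # Prefer minimal ports first, then the most lightly loaded port.
    return max(usable, key=lambda p: (p in minimal, free_vcs[p]))
```

Because the winner consumes a channel, a real router would update `free_vcs` after each decision, which is why selections have to be serialized.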

ECE 8813a (27) Selection Functions
- Favor adaptive channels
  - Improve probability of escape channel availability
  - Time dependent selection functions: give adaptivity a chance
- Selection functions for real time traffic
  - Separate best effort and guaranteed packets via VCs or virtual networks
- Note the impact on bisection utilization
- Selection functions for cache coherent systems?
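One way to read "give adaptivity a chance" is that a packet waits a few cycles for an adaptive channel before falling back to the escape channel. The sketch below encodes that reading; the `patience` threshold and the function name are assumptions, not something the slides specify.

```python
def select_channel_class(adaptive_free, escape_free, cycles_waited, patience):
    """Time-dependent choice between adaptive and escape channels.

    Returns "adaptive", "escape", or None (keep waiting).
    """
    if adaptive_free:
        return "adaptive"  # always favor an adaptive channel
    if escape_free and cycles_waited >= patience:
        return "escape"    # waited long enough, take the escape path
    return None
```

Holding back from the escape channel keeps it available for packets that have no other option, at the cost of some added waiting latency.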

ECE 8813a (28) Resource Allocation: Arbitration
- Tradeoffs: channel bandwidth vs. message sizes and types
  - Mix of buffering strategies across message types
- All three strategies must be co-designed for a tuned system
[Figure labels: arbitration, flow control, message size]

ECE 8813a (29) Routing, Selection & Arbitration
- Input driven vs. output driven scheduling
  - Output driven scheduling requires replication of routers amongst inputs
- Lessons from the microprocessor world
  - Impact of complexity, workloads, and concurrency
- Impact of:
  - Symmetry of the topology
  - Locality of traffic
  - Packet size
[Figure labels: locality, uniformity; irregular, hot spot; deterministic routing; adaptive routing]

ECE 8813a (30) Characterization of Techniques
- Deadlock freedom achieved by
  - Path based techniques
    - Restrict paths
  - Buffer based techniques
    - Structured buffer pools
  - Channel based techniques
    - Number of VCs independent of the network

ECE 8813a (31) Summary
- Best routing algorithm driven by multiple considerations
  - Deterministic vs. adaptive
  - Uniform vs. non-uniform traffic
  - Packet sizes
  - Power envelope
  - On-chip vs. off-chip
  - Locality of traffic
  - Hot spots
  - Compatible micro-architecture
  - Symmetric vs. asymmetric topology