Rotary Router : An Efficient Architecture for CMP Interconnection Networks Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio University.

Rotary Router: An Efficient Architecture for CMP Interconnection Networks
Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio
University of Cantabria, Spain
ISCA’07
Presented by Tina Miriam John

Outline: Introduction, The Rotary Router, Avoiding Anomalies, Performance Evaluation, Implementation Practicality, Conclusion.

Introduction
Chip multiprocessors (CMPs) are the most effective way to deal with increasing design complexity. A CMP interconnect must offer low latency and high bandwidth while keeping power consumption and area low. Existing low-cost router architectures suffer from head-of-line (HOL) blocking, and centralized internal storage is not feasible in a CMP framework. With deterministic routing, real traffic patterns do not use network resources in a balanced way. Finally, the small packets typical of CMP networks reduce the benefit of simply increasing bandwidth.
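To make head-of-line blocking concrete, here is a minimal sketch that compares a plain FIFO input with an idealized input that may send any waiting packet. The function names and port names are hypothetical; this is only an illustration of the problem the Rotary Router targets, not how the router itself is built.

```python
def fifo_pick(queue, busy_outputs):
    """Single FIFO input: only the head packet may leave this cycle."""
    if queue and queue[0] not in busy_outputs:
        return queue[0]
    return None  # head is blocked, so everything behind it waits too


def bypass_pick(queue, busy_outputs):
    """Idealized input that may send any waiting packet whose output is free."""
    for dest in queue:
        if dest not in busy_outputs:
            return dest
    return None


# Each packet is represented only by the output port it requests.
waiting = ["east", "north", "south"]
busy = {"east"}
print(fifo_pick(waiting, busy))    # None: "north" and "south" are stuck behind "east"
print(bypass_pick(waiting, busy))  # north
```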

General Router Structure
(Figure: Rotary Router sketch)
The Rotary Router minimizes the effects of small packets and even takes advantage of them. It shows no appreciable HOL blocking, and it uses topology-dependent adaptive routing.
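The slides only state that routing is adaptive and topology-dependent. Since the evaluation uses torus networks, the sketch below assumes minimal adaptive routing on a 2D torus: every output that reduces the distance to the destination counts as "profitable", and having more than one such port is what gives the router adaptivity. Port names and the function name are illustrative.

```python
def profitable_ports(cur, dst, k):
    """Minimal output ports from node `cur` to node `dst` in a k x k torus.

    Port names ("X+", "X-", "Y+", "Y-", "LOCAL") are illustrative."""
    ports = []
    for axis, c, d in zip("XY", cur, dst):
        forward = (d - c) % k   # hops going in the + direction
        backward = (c - d) % k  # hops going in the - direction (wraparound)
        if forward == 0:
            continue            # already aligned in this dimension
        if forward < backward:
            ports.append(axis + "+")
        elif backward < forward:
            ports.append(axis + "-")
        else:                   # exactly half-way round: both directions are minimal
            ports += [axis + "+", axis + "-"]
    return ports or ["LOCAL"]   # no offset left: eject to the local node


print(profitable_ports((0, 0), (2, 1), 4))  # ['X+', 'X-', 'Y+']
```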

General Router Structure
The router contains two independent rings: packets circulate clockwise in one and anticlockwise in the other. Each ring is built from a group of dual-port FIFO buffers (DFBs). A packet circulates through the DFBs of its ring until it reaches a profitable output port. There is no centralized arbitration; instead, arbitration is done independently at each output port, so it does not depend on the number of input ports.
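The circulation rule can be sketched as a walk around one ring. The port layout `RING`, the function `circulate`, and the assumption that output availability stays fixed during the walk are illustrative simplifications, not the paper's design.

```python
# Hypothetical clockwise order of the ports around the rings of a 2D-torus router.
RING = ["LOCAL", "X+", "Y+", "X-", "Y-"]


def circulate(entry, profitable, direction, busy=frozenset(), max_laps=2):
    """Follow a packet that joined a ring at port index `entry`.

    direction: +1 for the clockwise ring, -1 for the anticlockwise ring.
    The packet passes every port that is unprofitable or whose output is
    busy (assumed fixed during the walk) and leaves at the first free
    profitable one. Returns (output_port, DFBs_traversed), or None if it
    is still circulating after `max_laps` full laps."""
    n = len(RING)
    for hops in range(1, max_laps * n + 1):
        port = RING[(entry + direction * hops) % n]
        if port in profitable and port not in busy:
            return port, hops
    return None


# A packet arriving on the X- input (index 3) that may leave on X+ or Y+:
print(circulate(3, {"X+", "Y+"}, +1))                # ('X+', 3)
print(circulate(3, {"X+", "Y+"}, -1, busy={"Y+"}))  # ('X+', 2)
```

Because each output port only has to pick among the packets passing it, the arbitration in this model stays local to that port, matching the slide's point that it is independent of the number of input ports.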

Router Building Blocks
Input stage: made of a FIFO buffer and a demultiplexer. It computes the profitable output ports for each entering packet and selects the ring direction in which the packet will move so as to minimize delay. That delay depends on the number of DFBs traversed and the time spent at each DFB.
Output stage: responsible for taking packets out of the rings and sending them to a neighboring router. It is made of two buffers and a multiplexer, and it applies the flow control mechanism between adjacent routers.
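The slides say only that the input stage picks the ring direction with the smaller expected delay, and that this delay depends on the DFBs traversed and the time spent in each. The sketch below assumes a simple cost model (one cycle per DFB crossed plus one cycle per packet already queued in it); `path_cost`, `choose_direction`, and the port layout are hypothetical.

```python
def path_cost(occupancy, entry, hops, direction):
    """Assumed delay model: 1 cycle per DFB crossed plus 1 cycle per packet
    already queued in it. occupancy[i] is the DFB that leaves port i in this
    ring's direction of travel."""
    n = len(occupancy)
    return sum(1 + occupancy[(entry + direction * j) % n] for j in range(hops))


def choose_direction(ring, entry, profitable, occ_cw, occ_acw):
    """Return (direction, estimated_cost) for the cheaper ring, or None."""
    n = len(ring)
    best = None
    for direction, occ in ((+1, occ_cw), (-1, occ_acw)):
        # DFBs to cross before reaching the first profitable port this way round
        hops = next((h for h in range(1, n)
                     if ring[(entry + direction * h) % n] in profitable), None)
        if hops is None:
            continue
        cost = path_cost(occ, entry, hops, direction)
        if best is None or cost < best[1]:  # ties keep the clockwise ring
            best = (direction, cost)
    return best


ring = ["LOCAL", "X+", "Y+", "X-", "Y-"]
print(choose_direction(ring, 3, {"X+"}, [0] * 5, [0] * 5))          # (-1, 2)
print(choose_direction(ring, 3, {"X+"}, [0] * 5, [0, 0, 3, 2, 0]))  # (1, 3)
```

With empty rings the shorter anticlockwise path wins; once the anticlockwise DFBs fill up, the longer but idle clockwise path becomes cheaper.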

Router Building Blocks
Buffering segment stage: made up of two DFBs (one per ring) connecting every two router ports. Each DFB has two pairs of read/write ports: one pair builds the ring in which packets turn, and the other connects the buffer to the input and output stages. The segment decodes the routing information that the input stage placed in the packet header.
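A DFB can be modeled as a bounded FIFO with a ring side and a port side. In this minimal sketch the class name, the header field `profitable`, and the assumption that each DFB feeds exactly one output port are hypothetical; a single `write` method stands for both write ports, and `dispatch_head` stands for the two read ports.

```python
from collections import deque


class DFB:
    """Sketch of a dual-port FIFO buffer (hypothetical interface).

    Ring pair: receive from the previous DFB / send to the next DFB.
    Port pair: receive from an input stage / send to an output stage."""

    def __init__(self, out_port, capacity):
        self.out_port = out_port  # output stage this DFB is assumed to feed
        self.capacity = capacity  # packets it can hold
        self.fifo = deque()

    def has_room(self):
        return len(self.fifo) < self.capacity

    def write(self, packet):
        # Used both by the previous DFB on the ring and by an input stage.
        if not self.has_room():
            raise OverflowError("DFB full")
        self.fifo.append(packet)

    def dispatch_head(self, output_free, next_dfb):
        """Decode the head packet's header and move it on if possible.

        Returns "output", "ring", or None if the packet must stay put."""
        if not self.fifo:
            return None
        head = self.fifo[0]
        if self.out_port in head["profitable"] and output_free:
            self.fifo.popleft()  # leaves the ring through the output stage
            return "output"
        if next_dfb.has_room():
            next_dfb.write(self.fifo.popleft())  # keeps turning in the ring
            return "ring"
        return None


a, b = DFB("X+", capacity=2), DFB("Y+", capacity=2)
a.write({"profitable": {"Y+"}})
print(a.dispatch_head(output_free=True, next_dfb=b))  # ring
print(b.dispatch_head(output_free=True, next_dfb=a))  # output
```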

Flow Control and Routing Algorithm
Virtual cut-through (VCT) controls the advance of packets between routers. Bubble flow control regulates packet injection into the rings. Occupation-based flow control manages the advance of packets along the rings inside the router.
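Virtual cut-through, the inter-router mechanism named on this slide, lets a packet advance to the next router only when the downstream buffer can hold the entire packet. A minimal sketch, assuming credit-based bookkeeping of free flit slots (the slides do not say how free space is tracked); `VCTLink` is a hypothetical name.

```python
class VCTLink:
    """Credit-based virtual cut-through link to a neighboring router (credits assumed)."""

    def __init__(self, downstream_buffer_flits):
        self.credits = downstream_buffer_flits  # free flit slots downstream

    def can_send(self, packet_flits):
        # VCT rule: a packet may start crossing only if the whole packet fits
        # downstream, so a blocked packet can always be buffered entirely there.
        return self.credits >= packet_flits

    def send(self, packet_flits):
        if not self.can_send(packet_flits):
            raise RuntimeError("not enough downstream space for the whole packet")
        self.credits -= packet_flits

    def return_credits(self, flits):
        # Called when the downstream router frees buffer space.
        self.credits += flits


link = VCTLink(8)
link.send(5)
print(link.can_send(5))  # False: only 3 free flit slots remain
```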

Avoiding Anomalies
Deadlock and livelock: bubble flow control prevents input ports from exhausting the buffering space in the router's internal rings. Because a hole is guaranteed in every ring, packets can always move between routers. This also delays the appearance of congestion and removes the HOL blocking effect.
Starvation: injection traffic needs three holes to enter a router, whereas in-transit traffic requires only two. Starvation of in-transit traffic is reduced by balancing buffer occupation among input ports, which is done by modifying the flow control to increase the number of holes required to inject a packet into the ring.
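The hole thresholds on this slide can be written as a small admission check. The ring model (a Python list in which `None` marks a hole, each hole holding exactly one packet) and the function names are hypothetical; only the thresholds of three holes for injection traffic and two for in-transit traffic come from the slide. Because an admitted packet fills one hole, at least one hole always remains, which keeps the ring able to rotate.

```python
# Holes (free packet-sized slots) needed before a packet may join a ring,
# as stated on the slide: injection waits for 3, in-transit traffic for 2.
REQUIRED_HOLES = {"injection": 3, "in_transit": 2}


def may_enter_ring(free_holes, source):
    return free_holes >= REQUIRED_HOLES[source]


def try_enter(ring_slots, source, packet):
    """Admit `packet` into the hypothetical ring model if enough holes exist."""
    if not may_enter_ring(ring_slots.count(None), source):
        return False
    ring_slots[ring_slots.index(None)] = packet
    # Bubble condition: a hole always survives, so the ring can keep rotating.
    assert ring_slots.count(None) >= 1
    return True


ring = [None, None, "p", "q"]
print(try_enter(ring, "injection", "r"))   # False: only 2 holes
print(try_enter(ring, "in_transit", "r"))  # True: 1 hole left afterwards
```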

Performance Evaluation: Synthetic Workloads
(Figure: maximum normalized throughput for (a) a 4x4 torus and (b) an 8x8 torus)

Performance Evaluation: Synthetic Workloads
(Figure: (a) random traffic; (b) transpose matrix traffic)

Performance Evaluation: Real Workloads
(Figure: (a) normalized execution time; (b) main simulation parameters)

Implementation Practicality: Delay and Area
(Figure: (a) structure of a DFB; (b) atomic modules of a DFB)

Implementation Viability: Power
(Figure: (a) power consumption for an 8x8 torus network; (b) mobility of packets)

Conclusion
The Rotary Router is a novel router architecture targeting CMP systems. It uses a decentralized, scalable structure based on rings. It eliminates HOL blocking, improves performance, and provides a deadlock avoidance mechanism, all at reasonable cost in area and power consumption.


Thanks!!!