Rotary Router: An Efficient Architecture for CMP Interconnection Networks. Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio, University of Cantabria.


1 Rotary Router : An Efficient Architecture for CMP Interconnection Networks Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio University of Cantabria, Spain ISCA’07 Presented By Tina Miriam John

2 Outline Introduction The Rotary Router Avoiding Anomalies Performance Evaluation Implementation Practicality Conclusion

3 Introduction CMP (chip multiprocessor): the most effective way to deal with increasing design complexity. Its interconnect needs lower latency, higher bandwidth, and low power consumption and area. Existing low-cost router architectures suffer from Head of Line (HOL) blocking. Centralized internal storage is not feasible in a CMP framework. With deterministic routing algorithms, real traffic patterns deviate from balanced usage of network resources. The small packet sizes typical of CMP networks reduce the effectiveness of increasing bandwidth.

4 General Router Structure [Figure: Rotary Router sketch] Minimizes the effects of small packets and takes advantage of them. No appreciable HOL blocking. Uses topology dependent adaptive routing.

5 General Router Structure Two independent rings: packets circulate either clockwise or anti-clockwise. Each ring is built from a group of Dual-port FIFO Buffers (DFBs). Packets circulate through the DFBs of a ring until they reach a profitable output port. No centralized arbitration is employed; arbitration is done independently at each router output port, regardless of the number of input ports.
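The ring traversal on this slide can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the function name `hops_to_output`, the numbering of ports around the ring, and the idea that a packet may leave at its entry port are all assumptions made for the example.

```python
def hops_to_output(entry_port, profitable_ports, num_ports, clockwise=True):
    """Count the ring segments a packet crosses before it reaches a
    profitable output port.

    Hypothetical model: ports are numbered 0..num_ports-1 around the ring,
    and one ring segment (one DFB) sits between consecutive ports.
    """
    if not profitable_ports:
        raise ValueError("packet needs at least one profitable output port")
    step = 1 if clockwise else -1
    port = entry_port
    for hops in range(num_ports):
        # The check is local to the port the packet currently faces;
        # no central arbiter is consulted.
        if port in profitable_ports:
            return hops, port
        port = (port + step) % num_ports
    raise ValueError("no profitable port is on this ring")


# A packet entering at port 0 of a 4-port router, profitable port 3.
print(hops_to_output(0, {3}, 4, clockwise=True))
print(hops_to_output(0, {3}, 4, clockwise=False))
```

In this toy model the same packet crosses three segments on the clockwise ring but only one on the anti-clockwise ring, which is why having two rings matters.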

6 Router Building Blocks Input Stage: Made of a FIFO buffer and a demux. Computes the profitable output ports for each entering packet. Selects the ring direction for packet movement so as to minimize delay. Delay depends on the number of DFBs traversed and the time spent at each DFB. Output Stage: Responsible for getting packets out of the rings and sending them to a neighbor router. Made of two buffers and a mux. Applies the flow control mechanism between contiguous routers.
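A minimal sketch of how an input stage might pick a ring direction, assuming a simple delay model that the slides do not specify: each DFB crossed costs one cycle plus one cycle per packet already queued in it. The names `ring_delay` and `choose_direction` and the occupancy lists are hypothetical.

```python
def ring_delay(entry_port, target_port, occupancy, clockwise):
    """Estimated delay to reach target_port on one ring.

    Assumed cost model: 1 cycle per DFB crossed, plus the number of
    packets already waiting in that DFB (occupancy[port]).
    """
    n = len(occupancy)
    step = 1 if clockwise else -1
    port, delay = entry_port, 0
    while port != target_port:
        delay += 1 + occupancy[port]
        port = (port + step) % n
    return delay


def choose_direction(entry_port, profitable_ports, cw_occupancy, ccw_occupancy):
    """Return (delay, direction, target) with the smallest estimated delay."""
    candidates = []
    for target in sorted(profitable_ports):
        candidates.append(
            (ring_delay(entry_port, target, cw_occupancy, True), "cw", target))
        candidates.append(
            (ring_delay(entry_port, target, ccw_occupancy, False), "ccw", target))
    return min(candidates)


# Empty rings: the shorter anti-clockwise path wins.
print(choose_direction(0, {3}, [0, 0, 0, 0], [0, 0, 0, 0]))
# A busy DFB on the anti-clockwise ring makes the longer path cheaper.
print(choose_direction(0, {3}, [0, 0, 0, 0], [5, 0, 0, 0]))
```

The second call shows both factors from the slide at work: fewer DFBs traversed is not enough if the time spent in one of them is high.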

7 Router Building Blocks Buffering Segment Stage: Made up of two DFBs connecting every two router ports. Each DFB has two pairs of R/W ports. One pair builds a ring in which the packets turn. The other pair connects the buffer to the Input and Output stages. Decodes the routing information generated by the Input stage and placed in the packet header.

8 Flow Control and Routing Algorithm Virtual Cut Through: controls the advance of packets among routers. Bubble flow control: regulates packet injection into the rings. Occupation-based flow control: manages the advance of packets in the rings inside the router.

9 Avoiding Anomalies Deadlock and Livelock: Bubble flow control prevents input ports from exhausting the buffering space in the internal rings of the router. Packets can always move between routers because a hole is guaranteed in every ring. This delays the onset of congestion and removes the HOL blocking effect. Starvation: Injection traffic needs three holes to enter a router; in-transit traffic requires only two. In-transit traffic starvation is reduced by balancing buffer occupation among input ports. This is done by modifying the flow control, increasing the number of holes required to inject a packet into the ring.
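The admission rule on this slide can be illustrated with a small Python sketch. The thresholds (two holes for in-transit traffic, three for injection) come from the slide; everything else is an assumption for illustration, including the names `may_enter_ring` and `admit_all` and the reading of a "hole" as one packet-sized free slot in a ring.

```python
IN_TRANSIT_HOLES = 2   # from the slide: in-transit traffic requires two holes
INJECTION_HOLES = 3    # from the slide: injection traffic needs three holes


def may_enter_ring(free_holes, is_injection):
    """Bubble-style admission check (hypothetical helper)."""
    required = INJECTION_HOLES if is_injection else IN_TRANSIT_HOLES
    return free_holes >= required


def admit_all(free_holes, requests):
    """Apply the check to a sequence of requests.

    requests is a list of booleans (True = injection, False = in-transit).
    Each admitted packet consumes one hole. Returns the admission decisions
    and the holes left over.
    """
    decisions = []
    for is_injection in requests:
        ok = may_enter_ring(free_holes, is_injection)
        if ok:
            free_holes -= 1
        decisions.append(ok)
    return decisions, free_holes


decisions, left = admit_all(4, [True, True, False, False, True])
print(decisions, left)
```

Because every entering packet needs at least two holes and consumes one, at least one hole always remains in this model, which matches the guaranteed hole the slide relies on. The stricter threshold for injection gives in-transit packets priority when the ring is nearly full.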

10 Performance Evaluation: Synthetic Workloads [Figure: maximum normalized throughput for (a) a 4x4 torus and (b) an 8x8 torus]

11 Performance Evaluation: Synthetic Workloads [Figure: (a) random traffic, (b) transpose matrix traffic]

12 Performance Evaluation: Real Workloads [Figure: (a) normalized execution time, (b) main simulation parameters]

13 Implementation Practicality: Delay and Area [Figure: (a) structure of a DFB, (b) atomic modules of a DFB]

14 Implementation Viability: Power [Figure: (a) power consumption for an 8x8 torus network, (b) mobility of packets]

15 Conclusion A novel router architecture targeting CMP systems. Utilizes a decentralized and scalable structure based on rings. Eliminates HOL blocking, improves performance and provides a deadlock avoidance mechanism. Reasonable costs in terms of area and power consumption.


17 Thanks!!!

