Flattened Butterfly Topology for On-Chip Networks
John Kim, James Balfour, and William J. Dally
Presented by Jun Pang

Motivation & Goal
Most on-chip networks (2D mesh) are low-radix
  - Pros: simple, short wires
  - Cons: long network diameter and energy inefficiency (many hops)
High-radix networks
  - Far fewer intermediate routers
  - Lower latency and lower power
Goal: show how an on-chip network can use high-radix routers to reduce latency and energy

On-chip network
Plentiful bandwidth: wires are inexpensive, while buffers are expensive
Lower cost: from smaller distance
  - Reduce the number of channels and buffers
  - Concentration: several terminal nodes share resources (routers)
Latency
  - Reduce hop count at the expense of higher serialization latency (T_s) to get lower overall latency
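The latency trade-off on this slide can be made concrete with the usual zero-load latency decomposition, T0 = H * t_r + T_w + L / b (hop count times router delay, plus wire delay, plus serialization latency). The sketch below is only an illustration: the function name and every number in the example are assumptions chosen for the arithmetic, not values from the paper.

```python
def zero_load_latency(hops, router_delay, wire_delay, packet_bits, channel_bits):
    """Zero-load latency in cycles: header latency + wire delay + serialization.

    All arguments are hypothetical inputs; none come from the paper.
    """
    header_latency = hops * router_delay
    # Serialization latency T_s = ceil(L / b): cycles to push the packet
    # through a channel that is channel_bits wide.
    serialization = -(-packet_bits // channel_bits)
    return header_latency + wire_delay + serialization


# Illustrative comparison: fewer hops with narrower channels (higher T_s)
# versus more hops with wider channels (lower T_s).
many_hops = zero_load_latency(hops=6, router_delay=3, wire_delay=6,
                              packet_bits=512, channel_bits=256)
few_hops = zero_load_latency(hops=2, router_delay=3, wire_delay=6,
                             packet_bits=512, channel_bits=128)
print(many_hops, few_hops)
```

With these made-up inputs, halving the channel width doubles T_s, but cutting the hop count from 6 to 2 still lowers the total.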

On-chip Flattened Butterfly
Topology
  - Radix = 10 (concentration factor: 4; 3 in dimension 1; 3 in dimension 2)
  - 2 hops
  - Longer wires -> deeper buffers
Non-minimal global adaptive routing (UGAL)
  - Load balance & performance: path diversity
  - Routing minimally or non-minimally
  - Non-minimal: minimal Direction-ordered routing (prevent deadlock)
  - Only 2 VCs
Fig. 3a
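A minimal sketch of the routing choice described above, assuming a two-dimensional router grid in which every router links directly to all other routers in its row and in its column. The x-then-y ordering, the function names, and the `queue_len` callback are illustrative assumptions; the comparison of queue length times hop count is the general UGAL idea, not necessarily the exact rule used in the paper.

```python
def minimal_route(src, dst):
    """Routers visited after src on a minimal path: fix x first, then y.

    Each dimension is fully connected, so each dimension costs at most one hop.
    """
    path = []
    x, y = src
    if x != dst[0]:
        x = dst[0]
        path.append((x, y))
    if y != dst[1]:
        y = dst[1]
        path.append((x, y))
    return path


def ugal_route(src, dst, queue_len, intermediate):
    """Pick the minimal path or a non-minimal path through `intermediate`.

    queue_len(a, b) is a hypothetical callback returning the occupancy of the
    output queue at router a toward router b.
    """
    min_path = minimal_route(src, dst)
    first_leg = minimal_route(src, intermediate)
    if not min_path or not first_leg:
        return min_path  # already at dst, or intermediate is the source
    non_min_path = first_leg + minimal_route(intermediate, dst)
    # Estimated delay = queue occupancy at the first hop * number of hops.
    cost_min = queue_len(src, min_path[0]) * len(min_path)
    cost_non_min = queue_len(src, non_min_path[0]) * len(non_min_path)
    return min_path if cost_min <= cost_non_min else non_min_path


# The minimal channel toward (3, 0) is congested, so the packet detours.
congested = lambda a, b: 10 if b == (3, 0) else 1
print(ugal_route((0, 0), (3, 0), congested, intermediate=(0, 2)))
```

Any pair of routers in the grid is at most two hops apart on a minimal path, which matches the "2 hops" bullet; the non-minimal path trades extra hops for a less loaded first channel.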

Bypass Channels & Microarchitecture
Goal: reduce distance traveled by packets to reduce latency and energy
Two types of muxes
  - Input muxes: bypass inputs or direct inputs
  - Output muxes: direct outputs or bypass inputs
Yield arbiter to guarantee global fairness
  - If primary input is idle, non-primary input is chosen
  - Control packet: prevent starvation
Combination of minimal and non-minimal routing
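The yield rule on this slide (the non-primary input is chosen only when the primary input is idle) can be sketched as a two-input selection. This is a simplified illustration: the function name and the use of `None` for an idle input are assumptions, and the control-packet mechanism that prevents starvation is not modeled.

```python
def yield_select(primary, non_primary):
    """Choose which input drives the mux this cycle.

    Each argument is a flit (any object) or None when that input is idle.
    The primary input always wins; the non-primary input is used only when
    the primary input has nothing to send.
    """
    if primary is not None:
        return ("primary", primary)
    if non_primary is not None:
        return ("non_primary", non_primary)
    return None  # both inputs idle


print(yield_select("flit_a", "flit_b"))
print(yield_select(None, "flit_b"))
```

Because the primary input always wins here, a busy primary input could block the non-primary input indefinitely, which is the starvation problem the slide's control packet addresses.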

Bypass Channels (continued)
Switch architecture
  - Minimal: simplified crossbar switch
  - Non-minimal: more complexity
  - Non-minimal with bypass channels: less complexity
Flow control & routing
  - Buffers for non-primary inputs
  - Separate buffers for destination of control packets
  - Modify UGAL to support bypass channels

Evaluation
Throughput: up to 50% higher than a concentrated mesh
Power: about 38% lower than a mesh
Latency: about 28% lower than a mesh

Scalability
Channel count grows by a lower factor than in a hypercube
Three ways to scale
  - Concentration factor
  - Dimension of the flattened butterfly
  - Hybrid approach
Future technology helps long wires
Increasing VCs will slightly reduce latency
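The first two scaling options can be compared with a small calculation, assuming every dimension is fully connected so that a router needs c terminal ports plus (k - 1) ports in each of n dimensions. The function names and the 256-node configurations are illustrative assumptions; the hybrid approach is not modeled.

```python
def router_radix(c, k, n):
    """Ports per router: c terminals plus (k - 1) links in each of n dimensions."""
    return c + n * (k - 1)


def network_size(c, k, n):
    """Terminal nodes served: c per router, k**n routers."""
    return c * k ** n


# (concentration c, routers per dimension k, dimensions n)
configs = {
    "64 nodes, baseline": (4, 4, 2),
    "256 nodes, higher concentration": (16, 4, 2),
    "256 nodes, extra dimension": (4, 4, 3),
}
for name, (c, k, n) in configs.items():
    print(f"{name}: radix {router_radix(c, k, n)}, "
          f"nodes {network_size(c, k, n)}, max minimal hops {n}")
```

The baseline row reproduces the radix of 10 quoted for the 64-node network. Under these assumptions, raising the concentration keeps the hop count at 2 but increases the radix faster, while adding a dimension keeps the radix lower at the cost of one more hop.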

Conclusion & Concerns
Flattened butterfly:
  - Interesting idea
  - Maximum distance between nodes = 2
  - Non-minimal routing to balance load
  - Bypass channels to reduce latency
  - Lower latency and power, higher throughput compared to mesh
Concerns:
  - High channel count? (higher than mesh and torus)
  - Low channel utilization? (due to the high channel count)
  - Control complexity? (arbitration, control packets)
  - Bypass channels: a good idea? (Why not just use non-minimal or minimal routing?)