Express Cube Topologies for On-chip Interconnects
Boris Grot, J. Hestness, S. W. Keckler, O. Mutlu
The University of Texas at Austin and Carnegie Mellon University

Presentation transcript:

Express Cube Topologies for On-chip Interconnects
Boris Grot, J. Hestness, S. W. Keckler, O. Mutlu†
The University of Texas at Austin, †Carnegie Mellon University
‡ Part of this work was performed at Microsoft Research
Feb 17, 2009, HPCA '09

The Era of Many-core
- Intel Larrabee: 16+ cores, bidirectional ring interconnect
- UT TRIPS: 2x16 exec tiles, 16 NUCA tiles, multiple networks
- Intel Polaris: 80 tiles, 8x10 mesh
- Tilera TILE64: 64 cores, 5 mesh networks

Networks on a Chip (NOCs)
On-chip advantages:
- No pin constraints
- Rich wiring resources
On-chip limitations:
- 2D substrates limit implementable topologies
- Logic area constrains use of wiring resources
- Energy/power budget caps
Focus: topologies for tomorrow's many-core CMPs

Outline
- Introduction
- Existing topologies
- Multidrop Express Channels (MECS)
- Evaluation
- Generalized Express Cubes
- Summary

2-D Mesh (topology diagram)

2-D Mesh
Pros:
- Low design & layout complexity
- Simple, fast routers
Cons:
- Large diameter
- Energy & latency impact
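To make the "large diameter" drawback concrete, here is a small worked example (my own illustration, not part of the original slides), assuming a square k x k mesh with dimension-order routing and unit-cost hops:

    # Illustrative sketch (assumption: k x k 2-D mesh, dimension-order routing).
    # Worst-case router-to-router hop count (diameter) is 2 * (k - 1),
    # so latency and hop energy grow with the side length of the mesh.

    def mesh_diameter(k: int) -> int:
        """Worst-case hop count between opposite corner routers of a k x k mesh."""
        return 2 * (k - 1)

    if __name__ == "__main__":
        for k in (4, 8, 16):
            print(f"{k}x{k} mesh: diameter = {mesh_diameter(k)} hops")
        # An 8x8 mesh (64 routers) already has a 14-hop worst case.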

Concentration (Balfour & Dally, ICS '06)
Pros:
- Multiple terminals attached to a router node
- Fast nearest-neighbor communication via the crossbar
- Hop count reduction proportional to concentration degree
Cons:
- Benefits limited by crossbar complexity
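A small illustrative calculation of the hop-count reduction (assumptions are mine: square layouts and a concentration degree that divides the terminal count evenly):

    # Illustrative sketch: effect of concentration degree c on a 2-D mesh of N terminals.
    # Assumption: c terminals share each router and the routers form a square mesh.
    import math

    def cmesh_diameter(num_terminals: int, c: int) -> int:
        routers_per_dim = math.isqrt(num_terminals // c)  # 64 terminals, c=4 -> 4x4 routers
        return 2 * (routers_per_dim - 1)

    print(cmesh_diameter(64, 1))  # 14 hops: plain 8x8 mesh
    print(cmesh_diameter(64, 4))  # 6 hops: concentrated 4x4 mesh (CMesh)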

Concentration
Side-effects:
- Fewer channels
- Greater channel width

Replication: CMesh-X2
Benefits:
- Restores bisection channel count
- Restores channel width
- Reduced crossbar complexity
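A rough illustration of the bisection point (the 64-terminal numbers and the one-channel-per-row assumption are mine, not from the slides): concentration shrinks the mesh and with it the number of channels crossing the bisection, and duplicating the network (X2) brings the count back up.

    # Illustrative sketch: channels crossing the vertical bisection of a square mesh.
    # Assumption: one bidirectional channel per row crosses the cut, per network copy.
    def bisection_channels(routers_per_dim: int, num_networks: int = 1) -> int:
        return routers_per_dim * num_networks

    print(bisection_channels(8))      # 8 : 8x8 mesh, 64 terminals
    print(bisection_channels(4))      # 4 : 4x4 CMesh (concentration 4)
    print(bisection_channels(4, 2))   # 8 : CMesh-X2 restores the channel count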

Flattened Butterfly (Kim et al., MICRO '07)
Objectives:
- Improve connectivity
- Exploit the wire budget


Flattened Butterfly (Kim et al., MICRO '07)
Pros:
- Excellent connectivity
- Low diameter: 2 hops
Cons:
- High channel count: k^2/2 per row/column
- Low channel utilization
- Increased control (arbitration) complexity
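To see how quickly the channel count grows, a minimal sketch (my own illustration, using the k^2/2 per-row figure quoted on the slide, where k is the number of routers per row):

    # Illustrative sketch: flattened-butterfly channel count per row (figure from the slide).
    # Assumption: k concentrated routers per row; the same count applies per column.
    def fbfly_channels_per_row(k: int) -> int:
        return k * k // 2   # ~k^2/2: every router pair in a row gets its own channel

    for k in (4, 8, 16):
        print(k, fbfly_channels_per_row(k))   # prints 8, 32, 128: quadratic growth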

Multidrop Express Channels (MECS)
Objectives:
- Connectivity
- More scalable channel count
- Better channel utilization


Multidrop Express Channels (MECS)
Pros:
- One-to-many topology
- Low diameter: 2 hops
- k channels per row/column
Cons:
- Asymmetric
- Increased control (arbitration) complexity
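A minimal sketch of the one-to-many channel idea (function and parameter names are mine, not the paper's): each router sources one multidrop channel per direction, and that single channel can be exited at any router it passes, which is what keeps the per-row channel count linear in k and the diameter at 2 hops.

    # Illustrative sketch of MECS one-to-many channels within a single row of k routers.
    # Assumption: router i sources one eastbound and one westbound multidrop channel,
    # and a packet may drop off at any router the channel passes.
    def mecs_reachable(i, k):
        return {
            "east": list(range(i + 1, k)),        # all routers to the right, one traversal
            "west": list(range(i - 1, -1, -1)),   # all routers to the left, one traversal
        }

    print(mecs_reachable(2, k=8))
    # {'east': [3, 4, 5, 6, 7], 'west': [1, 0]}
    # One row hop plus one column hop reaches any destination: the 2-hop diameter above.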

Analytical Comparison
CMesh, FBfly, and MECS compared on: network size, radix (concentrated), diameter, channel count, channel width, router inputs, and router outputs.

Experimental Methodology
Topologies: Mesh, CMesh, CMesh-X2, FBfly, MECS, MECS-X2
Network sizes: 64 & 256 terminals
Routing: DOR, adaptive
Messages: 64 & 576 bits
Synthetic traffic: uniform random, bit complement, transpose, self-similar
PARSEC benchmarks: Blackscholes, Bodytrack, Canneal, Ferret, Fluidanimate, Freqmine, Vips, x264
Full-system config: M5 simulator, Alpha ISA, 64 OOO cores
Energy evaluation: Orion + CACTI 6
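For readers unfamiliar with the synthetic patterns listed above, a minimal sketch of how two of them are typically defined (my own illustration; function names and the 8x8 node numbering are assumptions, not from the paper):

    # Illustrative sketch of two synthetic traffic patterns from the methodology table.
    # Assumption: 64 nodes with ids 0..63.
    import random

    def uniform_random_dest(src: int, num_nodes: int = 64) -> int:
        """Each packet picks a destination uniformly among all other nodes."""
        dest = random.randrange(num_nodes)
        while dest == src:
            dest = random.randrange(num_nodes)
        return dest

    def bit_complement_dest(src: int, num_nodes: int = 64) -> int:
        """Destination is the bitwise complement of the source id."""
        return src ^ (num_nodes - 1)

    print(uniform_random_dest(5), bit_complement_dest(5))
    # First value is random; second is 5 ^ 63 = 58.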

64 nodes: Uniform Random

256 nodes: Uniform Random

Energy (100K packets, Uniform Random)

64 nodes: PARSEC

Generalized Express Cubes
- Low-dimensional k-ary n-cube: n = {1, 2}, a good fit for planar silicon
- Express channels: improve connectivity; MECS for better wire utilization
- Multiple networks: improve throughput, reduce crossbar area & energy overhead
- Hierarchical scaling
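As a rough illustration of the framework (the field names and derived counts below are my assumptions, not the paper's taxonomy or notation), a Generalized Express Cube design point can be described by a few parameters layered on a k-ary n-cube:

    # Illustrative sketch: describing a Generalized Express Cube design point.
    # Assumption: square 2-D layout; concentration 4 as in the concentrated designs above.
    from dataclasses import dataclass

    @dataclass
    class GECConfig:
        terminals: int       # processing/cache tiles attached to the network
        concentration: int   # terminals per router
        networks: int        # number of replicated network copies
        express: str         # express-channel style, e.g. "none", "fbfly", "mecs"

        def routers(self) -> int:
            return self.terminals // self.concentration

    # Example points resembling the evaluated designs at 64 terminals.
    print(GECConfig(64, 4, 2, "none").routers())   # 16 routers (4x4), CMesh-X2-like
    print(GECConfig(64, 4, 1, "mecs").routers())   # 16 routers (4x4), MECS-like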

Partitioning: a GEC Example
- MECS
- MECS-X2
- Flattened Butterfly
- Partitioned MECS

Summary
MECS:
- A novel one-to-many topology
- Good fit for planar substrates
- Excellent connectivity
- Effective wire utilization
Generalized Express Cubes:
- Framework & taxonomy for NOC topologies
- Extension of the k-ary n-cube model
- Useful for understanding and exploring on-chip interconnect options
- Future: expand & formalize

