Handling Global Traffic in Future CMP NoCs Ran Manevich, Israel Cidon, and Avinoam Kolodny. Group Research QNoC Electrical Engineering Department Technion.

Presentation transcript:

Handling Global Traffic in Future CMP NoCs
Ran Manevich, Israel Cidon, and Avinoam Kolodny
QNoC Research Group, Electrical Engineering Department, Technion – Israel Institute of Technology, Haifa, Israel
SLIP 2012

Bandwidth Version of Rent’s Rule
B = k · G^R
B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0<R<1.
(Figure: example cluster with G = 16.)
Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
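As a rough numeric illustration of the definitions on this slide, the sketch below evaluates the usual power-law form of Rent's rule, B = k · G^R. The function name and the sample values are assumptions made for illustration and are not taken from the talk.

```python
def external_bandwidth(k: float, g: int, r: float) -> float:
    """Cluster external bandwidth B = k * G**R (bandwidth version of Rent's rule).

    k: average bandwidth per module
    g: number of modules in the cluster
    r: Rent's exponent, required to satisfy 0 < r < 1
    """
    if not 0.0 < r < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    if g < 1:
        raise ValueError("a cluster needs at least one module")
    return k * g ** r


if __name__ == "__main__":
    # Hypothetical values: 16 modules, unit bandwidth per module.
    for r in (0.5, 0.7, 0.9):
        print(f"R={r}: B={external_bandwidth(1.0, 16, r):.2f}")
```

A lower exponent means less traffic leaves the cluster, which is the sense in which R reflects traffic locality.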

Rent’s Exponent Reflects Traffic Locality

CMP NoC Traffic Follows Rent’s Rule
(Figure labels: 2D Mesh NoC; ~Average of CMP parallel programs*)
* Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008

2D Mesh – Packets Classification by Distance
For illustration purposes, packets are classified according to the distance between source and destination in a K×K mesh:
Nearest Neighbor (NN) – Dist = 1
Local – 1 < Dist < 2+K/8
Global – Dist ≥ 2+K/8
(Figure: examples for K=8 and K=16.)
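The three distance classes on this slide can be written as a small function. This is a minimal sketch; the function name and the string labels are assumptions chosen for illustration.

```python
def classify_by_distance(dist: int, k: int) -> str:
    """Classify a packet by source-to-destination distance in a K x K mesh.

    NN:     dist == 1
    Local:  1 < dist < 2 + K/8
    Global: dist >= 2 + K/8
    """
    if dist < 1:
        raise ValueError("distance must be at least 1 hop")
    global_threshold = 2 + k / 8
    if dist == 1:
        return "NN"
    if dist < global_threshold:
        return "Local"
    return "Global"


if __name__ == "__main__":
    for k in (8, 16):
        print(k, [classify_by_distance(d, k) for d in range(1, 6)])
```

For K=8 the global threshold is 3 hops, and for K=16 it is 4 hops.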

Fraction of global packets decreases in large systems
Rent’s exponent (R) = 0.7
(Figure label: Nearest Neighbor)

Dominance of Global Packets in BW/Router and Light Load Latency
Nearest Neighbor traffic is dominant in small systems.*
In large systems:
1. Global packets are a minority.
2. Global packets dominate BW/router and average latency.
* Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with a global mesh architecture”, SLIP 2010

Problem!!!
In large systems, global packets (a minority):
Consume most of the network’s BW.
Significantly increase average light load latency.

Solution – PyraMesh
Hierarchical 2D mesh.
Global packets are routed through higher hierarchy levels.
Overall hop count is reduced.
Average latency is reduced.
Average BW per router is reduced.
(Figure: an example route from Source to Dest. through the higher level takes fewer hops than the 14 hops needed in the base mesh.)

PyraMesh – Architecture
K – The size of the base mesh.
NL – Number of levels.
NP – Number of pyramids on top of the base mesh.
α_i – Ratio between the sizes of levels i and i+1.
C_i – Number of routers in level i that are connected to a router in level i+1 along a single dimension.
Example configurations (figures):
K = 8, NL = 2, NP = 1, α_i = 4, C_i = 2
K = 8, NL = 3, NP = 1, α_i = 2, C_i = 1
K = 8, NL = 2, NP = 4, α_i = 4, C_i = 1
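To make the parameters concrete, the sketch below derives the side length of each level from K and the ratios α_i. It rests on two assumptions that the transcript does not confirm: "size" means the side length of a square mesh, and there is a single pyramid (NP = 1). The function name is hypothetical.

```python
def level_sizes(k: int, alphas: list[int]) -> list[int]:
    """Side length of every level, starting with the base mesh (level 1).

    k:      side length of the base mesh (assumed K x K)
    alphas: alphas[i] is the assumed side-length ratio between
            level i+1 and level i+2; NL = len(alphas) + 1
    """
    sizes = [k]
    for alpha in alphas:
        if alpha < 1 or sizes[-1] % alpha != 0:
            raise ValueError("each ratio must evenly divide the level below")
        sizes.append(sizes[-1] // alpha)
    return sizes


if __name__ == "__main__":
    print(level_sizes(8, [4]))     # K = 8, NL = 2, alpha = 4
    print(level_sizes(8, [2, 2]))  # K = 8, NL = 3, alpha = 2
```

Under these assumptions, the first example configuration has a 2×2 upper level, and the second has a 4×4 level topped by a 2×2 level.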

PyraMesh – Addressing and Routing
Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter:
Routing – XY:
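The slide names XY routing, but its formulas did not survive in the transcript. As a generic illustration only, here is a minimal sketch of standard XY (dimension-order) routing on a single 2D mesh level: the packet first travels along X until the column matches, then along Y. The function name and the coordinate convention are assumptions, and the sketch does not model PyraMesh's level-to-level addressing.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Return the routers visited by dimension-order (XY) routing.

    Coordinates are (x, y) grid positions on one mesh level.
    The path includes both the source and the destination.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: move along the X dimension.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along the Y dimension.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (7, 7))
    print(len(route) - 1, "hops")
```

The hop count equals the Manhattan distance, so a corner-to-corner route in an 8×8 mesh takes 14 hops.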

PyraMesh – Packets Classification
Packets are distributed among the levels according to their travel distance (D) in the base mesh.
DTh_i – Distance threshold of level i. If D > DTh_i, the packet is directed to level i+1.
Example: DTh_i = 6, 12, 20
Highest Level    Travel Distance
4                D>20
3                12<D≤20
2                6<D≤12
1 (Base Mesh)    D≤6
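The threshold rule on this slide maps a travel distance to the highest level a packet uses. Below is a minimal sketch using the slide's example thresholds (6, 12, 20); the function name is an assumption, and the lookup via `bisect` is one convenient way to implement the rule rather than the authors' method.

```python
from bisect import bisect_left


def highest_level(distance: int, thresholds: list[int]) -> int:
    """Highest hierarchy level used by a packet with the given travel distance.

    thresholds[i] is DTh of level i+1 (levels are numbered from 1, the base mesh).
    A packet moves up from level i to level i+1 only if distance > DTh_i.
    thresholds must be sorted in increasing order.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # bisect_left counts the thresholds strictly smaller than the distance,
    # which is the number of levels the packet climbs above the base mesh.
    return bisect_left(thresholds, distance) + 1


if __name__ == "__main__":
    dth = [6, 12, 20]  # example thresholds from the slide
    for d in (6, 7, 12, 13, 20, 21):
        print(d, highest_level(d, dth))
```

With these thresholds, distances of 6, 12 and 20 stay on levels 1, 2 and 3, and a distance of 21 reaches level 4, matching the table above.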

PyraMesh – Optimization
{Area overhead, Wiring overhead, Maximum bandwidth per router*, Average light-load latency*} = F(K, NL, NP, α_i, C_i, DTh_i*, R*)
Constraints: area overhead, wiring overhead.
Optimization objectives: maximum bandwidth per router, average light-load latency.

Optimization Results
Example of a 16x16 system, R = 0.7.
Configurations shown (figures): throughput optimized PyraMesh; light load latency optimized PyraMesh.
Packets distance thresholds (in the order listed on the slide): D≤5, 5<D≤8, D>8; and D≤6, 6<D≤18, D>18

Light Load Latency Performance
BMesh – The baseline mesh.
Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.
HNoC –

Throughput Results, R = 0.7

Our Contributions
The observation that global packets limit the scalability of large systems.
PyraMesh – A novel framework for hierarchical NoC design.
Characterization of Rentian traffic in large NoCs.

Conclusions
Global packets limit performance in large (future) CMP systems.
PyraMesh – A novel class of hierarchical 2D mesh topologies.
PyraMesh handles global traffic in future CMP NoCs.

Thank You!

Related Work
CMesh: J. D. Balfour and W. J. Dally. “Design tradeoffs for tiled CMP on-chip networks”. International Conference on Supercomputing,
GigaNoC: C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors”. Euromicro DSD
Long Range Links: U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst
Hierarchical Rings on a Mesh: S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP
Hierarchical 2-Level 2D Mesh: M. Winter, S. Prusseit, and G. P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010.