Exploring Concentration and Channel Slicing in On-chip Network Router

Prabhat Kumar (1), Yan Pan (1), John Kim (2), Gokhan Memik (1), Alok Choudhary (1)
(1) Northwestern University; (2) KAIST, South Korea

Contributions of the Work
Performance implications of concentration.
Integrated vs. external concentration: external concentration gives a 47% reduction in area and a 36% reduction in energy, with 10% performance degradation.
Channel slicing.
Virtual concentration for efficient resource utilization: 69% reduction in area and 32% reduction in energy.

Outline
Motivation; Concentration; Channel slicing; Virtual concentration; Results; Conclusion

Motivation
Limited budget: area, energy, efficiency, performance; cost optimization.
Design options: concentration; channel slicing.
Previous work on concentration: CMESH, Flattened Butterfly, Firefly, Multidrop Express Channels (on-chip).
Previous work on channel slicing: Dragonfly (off-chip).
Speaker notes: On-chip communication is critical for CMPs because it largely determines the performance of the system. Concentration and slicing are previously proposed options whose trade-offs have not been well investigated.

Motivation: Firefly [Pan ISCA’09]
Simplified router microarchitecture

Typical Topology: 2D Mesh
Number of processor nodes = number of routers

Solution: Concentration
Multiple cores share one router.
Benefits: resource sharing; network diameter decreases; local communication cost decreases.
Drawback: router complexity increases significantly.
Figure: C = 4, Radix = 8, Width = 2x
Speaker notes: An example with concentration = 4 is shown on the right. If the bandwidth density is kept constant, the channel width is 2x the channel width of the 2D MESH. Caption to put below the figure: "Concentration = 4; router radix increases from 5 to 8; channel width doubles."
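As a back-of-the-envelope check of the C = 4 example, the sketch below computes first-order parameters of a square 2D mesh with and without concentration. The 64-core network size is a hypothetical choice (the slide does not state one), and the model counts a nominal radix of 4 mesh ports plus one local port per core, ignoring edge routers.

```python
import math

def mesh_params(num_cores: int, concentration: int) -> dict:
    """First-order parameters of a square 2D mesh with `concentration` cores per router."""
    routers = num_cores // concentration
    k = math.isqrt(routers)               # routers per row/column
    assert k * k == routers, "sketch assumes a square k x k mesh"
    return {
        "routers": routers,
        "radix": 4 + concentration,       # N, E, S, W ports plus one local port per core
        "diameter": 2 * (k - 1),          # worst-case hop count across the mesh
        "bisection_channels": k,          # channels crossing the middle cut, one direction
    }

mesh = mesh_params(64, 1)     # 2D mesh: one core per router (hypothetical 64 cores)
cmesh = mesh_params(64, 4)    # concentrated mesh, C = 4
# Constant bisection bandwidth: fewer channels cross the cut, so each must be wider.
width_scale = mesh["bisection_channels"] / cmesh["bisection_channels"]
print(mesh, cmesh, f"channel width scale = {width_scale}x", sep="\n")
```

Under these assumptions the radix grows from 5 to 8, the diameter shrinks from 14 to 6 hops, and holding bisection bandwidth constant doubles the channel width, matching the figure labels.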

Issue: Router Complexity
Router components: crossbar switch, with area ~ (radix)^2; arbitration logic.
Figures: 5x5 crossbar (2D MESH); 8x8 crossbar (2D MESH, C = 4, integrated concentration).
Speaker notes: Below the figures, note that this is just one example of 2D mesh concentration, label them "typical crossbar" and "integrated crossbar" (or similar), and cite the reference for the area model. Crossbar switch area increases quadratically with the radix, and a higher radix also increases the complexity of the arbitration logic. How can we reduce the complexity of the crossbar switch?
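To make the quadratic growth concrete, here is a minimal sketch assuming a first-order matrix-crossbar model in which area is proportional to (ports x channel width)^2. This is a common textbook approximation, not necessarily the reference model the slide intends to cite.

```python
def crossbar_area(radix: int, width: float = 1.0) -> float:
    """Relative matrix-crossbar area: proportional to (ports x channel width)^2."""
    return (radix * width) ** 2

mesh_xbar = crossbar_area(5)                 # 5x5 crossbar, 2D mesh, baseline width
integrated_xbar = crossbar_area(8)           # 8x8 crossbar, C = 4, same width
integrated_wide = crossbar_area(8, 2.0)      # 8x8 crossbar, C = 4, channel width 2x

print(f"radix only:   {integrated_xbar / mesh_xbar:.2f}x")
print(f"radix + 2x W: {integrated_wide / mesh_xbar:.2f}x")
```

Per router, the radix increase alone costs about 2.6x crossbar area; combined with the 2x channel width it is roughly 10x, which is why integrated concentration makes the router so much more complex.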

Design Option: External Concentration
Multiplex the injection ports; de-multiplex the ejection ports.
Benefits: router radix decreases; area decreases.
Cons: reduced switching capacity.
Speaker notes: Explain the philosophy of external concentration. Traffic going from the injection ports to the ejection ports still sees the equivalent of an 8x8 crossbar, while intermediate (east, west, north, south) traffic sees a 5x5 crossbar. State the philosophy in a few words, since this is the last slide of this section.
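The trade-off can be sketched by comparing an integrated 8x8 router with an externally concentrated 5x5 router at the same 2x channel width, again assuming the (ports x width)^2 crossbar model. The `RouterConfig` type and its `inject_per_cycle` field are hypothetical, used only to show the lost switching capacity. This crossbar-only estimate is not meant to reproduce the 47% whole-router area figure reported in the results, which also covers buffers and other components.

```python
from dataclasses import dataclass

@dataclass
class RouterConfig:
    name: str
    radix: int             # crossbar ports
    width: float           # channel width relative to the 2D mesh baseline
    inject_per_cycle: int  # flits the concentrated cores can inject per cycle

    def crossbar_area(self) -> float:
        # First-order model: area ~ (ports x width)^2
        return (self.radix * self.width) ** 2

integrated = RouterConfig("integrated", radix=8, width=2.0, inject_per_cycle=4)
# External concentration: 4 cores multiplexed onto one local port -> 4 mesh + 1 local
external = RouterConfig("external", radix=5, width=2.0, inject_per_cycle=1)

reduction = 1 - external.crossbar_area() / integrated.crossbar_area()
print(f"crossbar area reduction: {reduction:.1%}")
print(f"injection capacity: {integrated.inject_per_cycle} -> "
      f"{external.inject_per_cycle} flits/cycle")
```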

Issue: Arbitration in External Concentration
Two levels of arbitration.
Parallel arbitration: use router switch information for concentration arbitration.
Speaker notes: Add more details or examples.
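One possible reading of "use router switch information for concentration arbitration" is sketched below: the concentrator's round-robin arbiter (first level) only considers cores whose requested router output is currently free, so it does not grant a core that would then lose in the router's switch allocator (second level). Round-robin is an assumed policy, and `concentrator_grant`, `wants_output`, and `free_outputs` are hypothetical names, not the authors' design.

```python
from typing import Collection, List, Optional

class RoundRobinArbiter:
    """Grants one of n requesters per cycle, rotating priority after each grant."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.priority = 0  # requester index currently holding highest priority

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(self.n):
            i = (self.priority + offset) % self.n
            if requests[i]:
                self.priority = (i + 1) % self.n  # winner drops to lowest priority
                return i
        return None

def concentrator_grant(arb: RoundRobinArbiter,
                       wants_output: List[Optional[str]],
                       free_outputs: Collection[str]) -> Optional[int]:
    """Level-1 (concentrator) arbitration that only considers cores whose
    requested router output port is currently free in the router's switch."""
    requests = [port is not None and port in free_outputs for port in wants_output]
    return arb.grant(requests)

arb = RoundRobinArbiter(4)
wants = ["E", "N", None, "E"]   # output port each concentrated core targets
first = concentrator_grant(arb, wants, free_outputs={"E"})
second = concentrator_grant(arb, wants, free_outputs={"E"})
print(first, second)
```

Core 1 is never granted while its north output is busy, and cores 0 and 3 alternate for the free east port.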


Issue: Wide Channels
Constant bandwidth density => wider channels.
Inefficient utilization: cache lines are ~512-1024 bits wide, while request, control, and coherence packets are much narrower.
Router area: switch area ~ (channel width)^2.
Figure: C = 4, Radix = 8, Width = 2x
Speaker notes: Add the reference paper for the area model.
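The utilization problem can be quantified with a small sketch. It assumes each flit occupies the full channel width even when partly empty, and uses hypothetical packet sizes: a 64-bit control packet and a 512-bit cache line plus a 64-bit header.

```python
import math

def utilization(packet_bits: int, channel_bits: int) -> float:
    """Fraction of transferred bits that carry the packet, since each flit
    occupies the full channel width even when partly empty."""
    flits = math.ceil(packet_bits / channel_bits)
    return packet_bits / (flits * channel_bits)

CONTROL_BITS = 64    # hypothetical request/coherence packet size
DATA_BITS = 576      # hypothetical 512-bit cache line plus a 64-bit header

for width in (128, 256, 512):
    print(f"{width:>3}-bit channel: "
          f"control {utilization(CONTROL_BITS, width):.1%}, "
          f"data {utilization(DATA_BITS, width):.1%}")
```

As the channel widens, the short control packets waste most of each flit, which is the inefficiency that motivates slicing.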

Design Option: Channel Slicing
Slice wide channels into narrower ones.
Pros: complexity reduces further; better channel utilization.
Cons: serialization latency increases (for long packets).
Figure: C = 4, Slicing Factor = 4
Speaker notes: Wide channels imply larger area for the router components; find a 2 to 3 word phrase for this.
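Both sides of the slicing trade-off fit in a few lines, assuming serialization latency is the number of flits, ceil(L / W), and the same (ports x width)^2 crossbar model. The 256-bit unsliced width and the packet sizes are hypothetical; the slicing factor of 4 is from the slide.

```python
import math

def serialization_cycles(packet_bits: int, channel_bits: int) -> int:
    """Flits (cycles) needed to push a packet through a channel of the given width."""
    return math.ceil(packet_bits / channel_bits)

def sliced_crossbar_area(radix: int, width: float, slices: int) -> float:
    """Total crossbar area of `slices` parallel networks, each width/slices wide."""
    return slices * (radix * width / slices) ** 2

WIDE = 256       # hypothetical unsliced channel width in bits
SLICES = 4       # slicing factor from the slide

for name, bits in (("control", 64), ("data", 576)):
    before = serialization_cycles(bits, WIDE)
    after = serialization_cycles(bits, WIDE // SLICES)
    print(f"{name}: {before} -> {after} flits")

ratio = sliced_crossbar_area(8, 2.0, SLICES) / sliced_crossbar_area(8, 2.0, 1)
print(f"total crossbar area after slicing: {ratio:.2f}x")
```

Short packets keep their single-flit latency while long data packets pay extra serialization, and the summed crossbar area drops by the slicing factor because each slice's area falls quadratically with its width.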


Combining Concentration and Slicing
Slicing + concentration yields virtual concentration.
Each node is dedicated to one sliced layer; there is no sharing of input bandwidth.
Speaker notes: Explain clearly how local traffic flows.
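A minimal sketch of the "nodes dedicated to a sliced layer" idea, assuming the slicing factor equals the concentration (4), a hypothetical 4x4 grid of router locations (64 cores), and a hypothetical `attach_point` mapping in which a core's index within its group selects its layer. The area figure uses the crossbar-only (ports x width)^2 model, with the 2x total width split across the 4 layers; it is not intended to reproduce the reported 69% whole-router area reduction.

```python
from typing import Tuple

C = 4        # cores per concentration group = number of sliced layers (assumed equal)
K = 4        # hypothetical 4x4 grid of router locations, 64 cores in total

def attach_point(core_id: int) -> Tuple[int, int, int]:
    """Map a core to (layer, x, y): its group selects the router location and
    its index within the group selects its dedicated sliced layer."""
    group, local = divmod(core_id, C)
    return local, group % K, group // K

def total_crossbar_area(routers: int, radix: int, width: float) -> float:
    # First-order model: each crossbar ~ (ports x width)^2
    return routers * (radix * width) ** 2

mesh = total_crossbar_area(64, 5, 1.0)              # 8x8 mesh, 5x5 routers
virtual = total_crossbar_area(C * K * K, 5, 2.0 / C)  # 4 layers x 16 routers, 5x5 each
print(attach_point(5), f"crossbar area vs. mesh: {virtual / mesh:.2f}x")
```

Because every core owns a local port on exactly one layer, each router keeps a radix of 5 and no input port is shared, unlike external concentration.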


Evaluation Setup
Simulation environment: BookSim simulator.
Constant on-chip resources: equal bisection bandwidth for all configurations; equal amount of buffer storage.
Speaker notes: Elaborate a little on how the on-chip resources are held constant.
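One way to hold resources constant is sketched below: channel widths are chosen so bisection bandwidth matches, and per-VC buffer depth is then derived from a fixed total buffer budget. All concrete numbers (64 cores, 128-bit mesh channels, 4 VCs, 8-flit mesh buffers) are hypothetical and need not match the paper's BookSim configuration.

```python
VCS = 4           # hypothetical virtual channels per port
BASE_DEPTH = 8    # hypothetical flits per VC in the baseline mesh

# name: (total routers, ports per router, channel width in bits, layers, k)
configs = {
    "mesh":       (64, 5, 128, 1, 8),
    "integrated": (16, 8, 256, 1, 4),
    "external":   (16, 5, 256, 1, 4),
    "virtual":    (64, 5, 64, 4, 4),   # 4 sliced layers x 16 routers
}

def bisection_bits(layers: int, k: int, width: int) -> int:
    """Bits crossing the middle cut of a k x k mesh per cycle, one direction."""
    return layers * k * width

# Total buffer storage of the baseline mesh, kept equal for every configuration.
r, p, w, _, _ = configs["mesh"]
BUDGET = r * p * VCS * BASE_DEPTH * w

for name, (routers, ports, width, layers, k) in configs.items():
    depth = BUDGET / (routers * ports * VCS * width)   # flits per VC
    print(f"{name:>10}: bisection {bisection_bits(layers, k, width)} bits, "
          f"{depth:g} flits/VC")
```

Under these assumptions every configuration has the same 1024-bit bisection, and configurations with fewer or narrower ports receive deeper buffers from the same storage budget.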

External Concentration
Zero-load latency: 21% reduction for UR (uniform random); 25% reduction for Bitcomp (bit complement).
Throughput: 10% reduction for UR; no change for Bitcomp.
Area: 47% reduction compared to integrated concentration.
Energy: 36% reduction compared to integrated concentration.

Virtual Concentration
Zero-load latency: no change compared to MESH; 16% increase for UR and 12% increase for Bitcomp compared to integrated concentration.
Throughput: no significant difference for UR; 4.5% increase for Bitcomp.
Area: 69% reduction compared to MESH.
Energy: 32% reduction compared to MESH.
Speaker notes: Update the 16% and 12% figures to match the plots.

Area and Energy Consumption
Area: 69% reduction compared to MESH; 88% reduction compared to integrated concentration.
Energy: 32% reduction compared to MESH; 35% reduction compared to integrated concentration.

Conclusion
The combination of concentration and channel slicing provides an efficient NoC design.
External concentration reduces complexity with some performance degradation.
Virtual concentration saves 69% area and 32% energy compared to 2D MESH.
Speaker notes: Make sure the conclusion is consistent with the previous slides; do not use any term that was not used earlier.

Thank you for your patience! Questions?
prabhat-kumar@northwestern.edu