Performance and Energy Comparison of Electrical and Hybrid Photonic Networks for CMPs Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, John Kubiatowicz.

Performance and Energy Comparison of Electrical and Hybrid Photonic Networks for CMPs Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, John Kubiatowicz UC Berkeley ParLab/LBNL

Outline
Introduction
  - Motivation
  - Contributions
  - Baseline Architecture
Electrical Network
  - Architecture
  - Simulator
  - Model
Hybrid Network
  - Architecture
  - Simulator
  - Model
Studied Communications
  - Synthetic Communication
  - Real Applications
Results
  - Synthetic Results
  - Application Results
  - Process to Processor Mapping
Conclusions and Future Work

Motivation
Manycore: NoCs are key to translating raw performance → sustained performance
Electrical NoC performance/energy is constrained by process technology
  - Also, every joule saved counts
Photonic NoC is promising
  - Enabled by recent advances in photonics & chip fabrication
  - Potentially high performance at low energy cost
  - But cannot do packet switching
Use a hybrid network
  - Small packets → electrical NoC
  - Large packets → optical NoC

Contributions
Use both synthetic traces and real application traces to compare electrical vs. hybrid photonic networks
Construct cycle-accurate simulators and compare them with simple analytic models
How can we obtain good performance and low energy usage?

Baseline Architecture
64 small, homogeneous cores on a CMP
Cores ~1.5mm x 1.5mm
22nm process, 5GHz
3D integrated CMOS
  - Layer for processors, layers for memory
We examine two interconnect architectures to compare performance & energy efficiency

Electrical NoC
Bill Dally's CMesh topology
Wormhole routed
Virtual channels
Single electrical layer with multiple memory layers

Electrical Simulator
Processor
  - Ignore computation
  - Communication divided into "phases" (SPMD-style)
    - Send and receive all messages in a phase as fast as possible
Router
  - XY dimension-order routing
  - Express links on periphery
  - Virtual channels & wormhole routing
  - Credit-based flow control
  - 8 input ports
  - 8x8 switch
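
A minimal sketch of the XY dimension-order routing named in the router description above, assuming a plain 2D mesh addressed by (x, y) router coordinates. It ignores the CMesh express links, virtual channels, and flow control. The function name `xy_route` is illustrative and is not taken from the authors' simulator.

```python
def xy_route(src, dst):
    """Return the (x, y) routers visited from src to dst, X dimension first."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Travel along X until the destination column is reached.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Then travel along Y until the destination row is reached.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because every packet finishes its X hops before taking any Y hop, the route between a given pair of routers is fixed, which is the property that makes the routing deterministic.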

Analytic Model for Electrical NoC
Time
  - Bandwidth-only model
  - Assume virtual channels hide latency
Energy
  - Each hop incurs a set amount of energy
  - Link crossing + router traversal
  - Parameters from Dally et al., scaled via ITRS
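
One way to read this model is sketched below. The slide does not give formulas, so the following are assumptions: a phase takes as long as the most heavily loaded link needs to carry its bytes, energy is charged per byte per hop, and routes follow XY order on a mesh. All parameter names and the example values are hypothetical.

```python
from collections import defaultdict


def xy_links(src, dst):
    """Directed mesh links used by an XY (X first, then Y) route."""
    links = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def electrical_phase_model(messages, link_bytes_per_cycle,
                           e_link_per_byte, e_router_per_byte):
    """Estimate (cycles, energy) for one communication phase.

    messages: iterable of (src, dst, size_in_bytes).
    Time is bandwidth-only: latency is assumed hidden by virtual channels.
    """
    load = defaultdict(int)   # bytes crossing each directed link
    energy = 0.0
    for src, dst, size in messages:
        links = xy_links(src, dst)
        for link in links:
            load[link] += size
        # Each hop costs one link crossing plus one router traversal.
        energy += size * len(links) * (e_link_per_byte + e_router_per_byte)
    cycles = max(load.values(), default=0) / link_bytes_per_cycle
    return cycles, energy


if __name__ == "__main__":
    phase = [((0, 0), (2, 0), 64), ((1, 0), (2, 0), 32)]
    print(electrical_phase_model(phase, 16, 1.0, 2.0))
```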

Hybrid NoC
Mesh topology
Electrical "Control Network" (ECN) on processor plane
Multiple optical networks on photonic plane
Small setup messages on ECN and bulk data transfer on optical network

Blocking Photonic Switch
On → message turns
No inactive power consumption
Small switching cost
Small active power while switched on
Capable of routing a single path from any source to any destination

Deadlock in Hybrid NoC
Blocking 4x4 switch
  - Only one path can be routed at a time through a switch
Deadlock is a known issue in circuit switching
Avoid deadlock with:
  - Exponential backoff
  - Dimension-order routing
  - Multiple optical networks
    - Results in more possible paths
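
The exponential backoff listed above could look roughly like the sketch below. The slides do not say how the retry delay is chosen, so the randomized window, the `base` and `cap` values, and the names `backoff_cycles` and `setup_with_backoff` are all assumptions for illustration only.

```python
import random


def backoff_cycles(attempt, base=4, cap=1024, rng=random):
    """Random wait in [0, window), where the window doubles on each failed attempt."""
    window = min(cap, base * (2 ** attempt))
    return rng.randrange(window)


def setup_with_backoff(try_setup, max_attempts=10, rng=random):
    """Retry an optical path setup until it succeeds.

    try_setup: callable returning True once the path has been reserved.
    Returns (attempts_used, cycles_waited); attempts_used is None on failure.
    """
    waited = 0
    for attempt in range(max_attempts):
        if try_setup():
            return attempt + 1, waited
        # Path was blocked: tear down, then wait before retrying.
        waited += 1 + backoff_cycles(attempt, rng=rng)
    return None, waited


if __name__ == "__main__":
    outcomes = iter([False, False, True])   # blocked twice, then succeeds
    print(setup_with_backoff(lambda: next(outcomes), rng=random.Random(0)))
```

Randomizing the wait makes it unlikely that two senders blocking each other retry in the same cycle again, which is the usual reason for choosing this kind of backoff.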

Hybrid Simulator
1:1 processor to electrical router mapping
  - Each electrical router buffers up to 8 path setup messages from its corresponding processor
  - Electrical router does not use virtual channels or wormhole routing (unnecessary, and they consume energy)
Path setup packets are minimally sized: they take one cycle to traverse between 2 routers
Energy includes electro-opto-electrical conversions at the endpoints
  - Most expensive operation energy-wise

Analytic Model for Hybrid NoC
Time
  - Must account for latency of electrical network, bandwidth limits, and contention
  - For contention, serialize the "most-used" link
    - Only one message can be sent along a link at a time
    - Overall time is the time to send all messages on the busiest link
Energy
  - Each message incurs an energy cost on the electrical network, plus the costs on the photonic network
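
The serialization idea can be sketched as follows. This is a hedged reading and not the authors' model: each message is assumed to occupy every link on its XY path for its setup latency plus its optical transfer time, the phase time is the total occupancy of the busiest link, and the photonic energy is reduced to a per-byte conversion cost. All parameter names and example values are hypothetical.

```python
from collections import defaultdict


def xy_links(src, dst):
    """Directed mesh links used by an XY (X first, then Y) route."""
    links = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def hybrid_phase_model(messages, optical_bytes_per_cycle,
                       setup_cycles_per_hop, e_setup_per_hop,
                       e_conversion_per_byte):
    """Estimate (cycles, energy) for one phase on the hybrid network.

    messages: iterable of (src, dst, size_in_bytes).
    """
    busy = defaultdict(float)   # cycles each link is held by some message
    energy = 0.0
    for src, dst, size in messages:
        links = xy_links(src, dst)
        hops = len(links)
        # Electrical setup latency plus optical transfer time.
        msg_cycles = hops * setup_cycles_per_hop + size / optical_bytes_per_cycle
        for link in links:
            busy[link] += msg_cycles   # messages on a link are serialized
        # Electrical setup cost plus E-O-E conversion cost at the endpoints.
        energy += hops * e_setup_per_hop + size * e_conversion_per_byte
    return max(busy.values(), default=0.0), energy


if __name__ == "__main__":
    phase = [((0, 0), (2, 0), 1024), ((1, 0), (2, 0), 512)]
    print(hybrid_phase_model(phase, 64, 1, 1.0, 0.5))
```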

Synthetic Traces
Random messages
Nearest-neighbor
Bit-reverse
Tornado
Look at both small & large messages
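
The slide only names these patterns. The sketch below uses the common textbook definitions on an assumed 8x8 mesh of 64 nodes numbered row by row, so the exact variants used in the study may differ. Function names and the choice to apply the tornado shift in both dimensions are assumptions.

```python
import random


def bit_reverse(node, bits=6):
    """Destination whose binary address is the source address reversed."""
    return int(format(node, f"0{bits}b")[::-1], 2)


def tornado(node, k=8):
    """Shift each coordinate by ceil(k/2) - 1 positions, wrapping around."""
    x, y = node % k, node // k
    shift = (k + 1) // 2 - 1
    return ((x + shift) % k) + ((y + shift) % k) * k


def nearest_neighbors(node, k=8):
    """Mesh neighbours of a node (east, west, south, north), no wraparound."""
    x, y = node % k, node // k
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [cx + cy * k for cx, cy in candidates if 0 <= cx < k and 0 <= cy < k]


def uniform_random(node, n=64, rng=random):
    """Uniformly random destination different from the source."""
    dst = rng.randrange(n - 1)
    return dst if dst < node else dst + 1


if __name__ == "__main__":
    print(bit_reverse(1), tornado(0), nearest_neighbors(0))
```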

Real Applications
SPMD-style applications
From DOE/NERSC workloads
Broken into multiple phases of communication
  - An implicit barrier is assumed at the end of a communication phase

Synthetic Trace Results
For small messages, setup latency makes the hybrid network slower than the electrical network
The hybrid network outperforms electrical-only on large messages, and uses far less energy in both cases

Application Performance

Application Energy

Process-Processor Mapping (1/2)

Process-Processor Mapping (2/2)

Conclusions
Simple analytic models accurately predict both performance and energy consumption
Hybrid NoC: the majority of energy (>94%) is due to optical-to-electrical and electrical-to-optical conversions
Hybrid NoC performs better for larger messages; energy consumption is much lower
Process-to-processor mapping can significantly impact performance as well as energy consumption
  - Finding the optimal mapping is not always of utmost importance; making sure not to use a "bad" mapping is
Overall, hybrid photonic on-chip networks are promising

Future Work
Non-blocking optical mesh interconnection network
Account for data transfer onto chip
More accurate full-system simulators (for both performance and energy)
  - Simulate FP operations & memory traffic
More real applications, including those that are not SPMD (with overlap of computation and communication)
Explore applications with less synchronous communication models
  - Not SPMD
  - Overlap of computation and communication

Acknowledgements
Katherine Yelick (UC Berkeley ParLab & NERSC/LBNL)
Assaf Shacham and Dr. Keren Bergman (Columbia University)
  - Our exploration is based on their earlier work (see references)
BeBOP Research Group (UC Berkeley Computer Science Dept)

References
[1] Assaf Shacham, Keren Bergman, and Luca Carloni. On the Design of a Photonic Network-on-Chip. In Proceedings of the First International Symposium on Networks-on-Chip.
[2] James Balfour and William Dally. Design Tradeoffs for Tiled CMP On-Chip Networks. In Proceedings of the International Conference on Supercomputing.
[3] Shoaib Kamil, Ali Pinar, Daniel Gunter, Michael Lijewski, Leonid Oliker, and John Shalf. Reconfigurable Hybrid Interconnection for Static and Dynamic Applications. In Proceedings of the ACM International Conference on Computing Frontiers.
[4] Bergman et al. Topology Exploration for Photonic NoCs for Chip Multiprocessors. Unpublished to date.
[5] Cactus Homepage
[6] Z. Lin, S. Ethier, T.S. Hahm, and W.M. Tang. Size Scaling of Turbulent Transport in Magnetically Confined Plasmas. Phys. Rev. Lett., 88.
[7] Julian Borrill, Jonathan Carter, Leonid Oliker, David Skinner, and R. Biswas. Integrated Performance Monitoring of a Cosmology Application on Leading HEC Platforms. In Proceedings of the International Conference on Parallel Processing (ICPP).
[8] A. Canning, L.W. Wang, A. Williamson, and A. Zunger. Parallel Empirical Pseudopotential Electronic Structure Calculations for Million Atom Systems. J. Comput. Phys., 160:29.
[9] Xiaoye S. Li and James W. Demmel. SuperLU-dist: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems. ACM Trans. Mathematical Software, 29(2):110-140, June.
[10] J. Qiang, M. Furman, and R. Ryne. A Parallel Particle-in-Cell Model for Beam-Beam Interactions in High Energy Ring Colliders. J. Comp. Phys., 198.
[11] IPM Homepage http://