Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin CS258 Final Presentation Prof.

Slides:



Advertisements
Similar presentations
Interconnection Networks: Flow Control and Microarchitecture.
Advertisements

Prof. Natalie Enright Jerger
A Novel 3D Layer-Multiplexed On-Chip Network
Presentation of Designing Efficient Irregular Networks for Heterogeneous Systems-on-Chip by Christian Neeb and Norbert Wehn and Workload Driven Synthesis.
Flattened Butterfly Topology for On-Chip Networks John Kim, James Balfour, and William J. Dally Presented by Jun Pang.
Circuit-Switched Coherence Natalie Enright Jerger*, Li-Shiuan Peh +, Mikko Lipasti* *University of Wisconsin - Madison + Princeton University 2 nd IEEE.
REAL-TIME COMMUNICATION ANALYSIS FOR NOCS WITH WORMHOLE SWITCHING Presented by Sina Gholamian, 1 09/11/2011.
Parallel System Performance CS 524 – High-Performance Computing.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips CiprianSeiculescu Stavros Volos Naser Khosro Pour Babak Falsafi Giovanni De Micheli.
Allocator Implementations for Network-on-Chip Routers Daniel U. Becker and William J. Dally Concurrent VLSI Architecture Group Stanford University.
Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks Kapre et. al RC Reading Group – 3/29/2006 Presenter: Ilya Tabakh.
IP I/O Memory Hard Disk Single Core IP I/O Memory Hard Disk IP Bus Multi-Core IP R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R Networks.
Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
1 Multi - Core fast Communication for SoPC Multi - Core fast Communication for SoPC Technion – Israel Institute of Technology Department of Electrical.
Design of a High-Throughput Distributed Shared-Buffer NoC Router
1 Lecture 13: Interconnection Networks Topics: flow control, router pipelines, case studies.
1 Lecture 25: Interconnection Networks Topics: flow control, router microarchitecture Final exam:  Dec 4 th 9am – 10:40am  ~15-20% on pre-midterm  post-midterm:
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control Final exam reminders:  Plan well – attempt every question.
CAD and Design Tools for On- Chip Networks Luca Benini, Mark Hummel, Olav Lysne, Li-Shiuan Peh, Li Shang, Mithuna Thottethodi,
Rotary Router : An Efficient Architecture for CMP Interconnection Networks Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio University.
Issues in System-Level Direct Networks Jason D. Bakos.
Trace-Driven Optimization of Networks-on-Chip Configurations Andrew B. Kahng †‡ Bill Lin ‡ Kambiz Samadi ‡ Rohit Sunkam Ramanujam ‡ University of California,
1 Lecture 26: Interconnection Networks Topics: flow control, router microarchitecture.
Measuring Network Performance of Multi-Core Multi-Cluster (MCMCA) Norhazlina Hamid Supervisor: R J Walters and G B Wills PUBLIC.
Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.
Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol.
RFAD LAB, YONSEI University IEEE TRANSACTIONS ON COMPUTERS, VOL. 57, NO. 9, SEPTEMBER 2008 Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors.
McRouter: Multicast within a Router for High Performance NoCs
Photonic Networks on Chip Yiğit Kültür CMPE 511 – Computer Architecture Term Paper Presentation 27/11/2008.
ROBERT HENDRY, GILBERT HENDRY, KEREN BERGMAN LIGHTWAVE RESEARCH LAB COLUMBIA UNIVERSITY HPEC 2011 TDM Photonic Network using Deposited Materials.
Yao Wang, Yu Wang, Jiang Xu, Huazhong Yang EE. Dept, TNList, Tsinghua University, Beijing, China Computing System Lab, Dept. of ECE Hong Kong University.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
1 The Turn Model for Adaptive Routing. 2 Summary Introduction to Direct Networks. Deadlocks in Wormhole Routing. System Model. Partially Adaptive Routing.
On-Chip Networks and Testing
SHAPES scalable Software Hardware Architecture Platform for Embedded Systems Hardware Architecture Atmel Roma, INFN Roma, ST Microelectronics Grenoble,
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick Lawrence Berkeley National Laboratory ACM International Conference.
Report Advisor: Dr. Vishwani D. Agrawal Report Committee: Dr. Shiwen Mao and Dr. Jitendra Tugnait Survey of Wireless Network-on-Chip Systems Master’s Project.
International Symposium on Low Power Electronics and Design NoC Frequency Scaling with Flexible- Pipeline Routers Pingqiang Zhou, Jieming Yin, Antonia.
Déjà Vu Switching for Multiplane NoCs NOCS’12 University of Pittsburgh Ahmed Abousamra Rami MelhemAlex Jones.
SMART: A Single- Cycle Reconfigurable NoC for SoC Applications -Jyoti Wadhwani Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam,
Comparing Memory Systems for Chip Multiprocessors Leverich et al. Computer Systems Laboratory at Stanford Presentation by Sarah Bird.
Scalable Reconfigurable Interconnects Ali Pinar Lawrence Berkeley National Laboratory joint work with Shoaib Kamil, Lenny Oliker, and John Shalf CSCAPES.
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Interconnection Mechanisms Performance Models 20 th April, 2006.
Performance and Energy Comparison of Electrical and Hybrid Photonic Networks for CMPs Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, John Kubiatowicz.
CS 8501 Networks-on-Chip (NoCs) Lukasz Szafaryn 15 FEB 10.
Evaluation of Modern Parallel Vector Architectures Leonid Oliker Future Technologies Group Computational Research Division LBNL
Silicon Nanophotonic Network-On-Chip Using TDM Arbitration
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
University of Michigan, Ann Arbor
Performance and Energy Comparison of Electrical and Hybrid Photonic Networks for CMPs Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin, John Shalf, John Kubiatowicz.
Networks-on-Chip (NoC) Suleyman TOSUN Computer Engineering Deptartment Hacettepe University, Turkey.
Yu Cai Ken Mai Onur Mutlu
Lecture 16: Router Design
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
Assaf Shacham, Keren Bergman, Luca P. Carloni Presented for HPCAN Session by: Millad Ghane NOCS’07.
Team LDPC, SoC Lab. Graduate Institute of CSIE, NTU Implementing LDPC Decoding on Network-On-Chip T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin.
Hybrid Optoelectric On-chip Interconnect Networks Yong-jin Kwon 1.
Virtual-Channel Flow Control William J. Dally
Predictive High-Performance Architecture Research Mavens (PHARM), Department of ECE The NoX Router Mitchell Hayenga Mikko Lipasti.
Network On Chip Cache Coherency Final presentation – Part A Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter.
Design Space Exploration for NoC Topologies ECE757 6 th May 2009 By Amit Kumar, Kanchan Damle, Muhammad Shoaib Bin Altaf, Janaki K.M Jillella Course Instructor:
1 Lecture 29: Interconnection Networks Papers: Express Virtual Channels: Towards the Ideal Interconnection Fabric, ISCA’07, Princeton Interconnect Design.
Runtime Reconfigurable Network-on- chips for FPGA-based systems Mugdha Puranik Department of Electrical and Computer Engineering
Network-on-Chip Paradigm Erman Doğan. OUTLINE SoC Communication Basics  Bus Architecture  Pros, Cons and Alternatives NoC  Why NoC?  Components 
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
A Quantitative Analysis of Stream Algorithms on Raw Fabrics
Azeddien M. Sllame, Amani Hasan Abdelkader
OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
Analysis of a Chip Multiprocessor Using Scientific Applications
Israel Cidon, Ran Ginosar and Avinoam Kolodny
Presentation transcript:

Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs Ankit Jain, Shoaib Kamil, Marghoob Mohiyuddin CS258 Final Presentation Prof. Kubiatowicz May 14, 2008

Motivation Objectives Analytical Model Baseline Architecture Motivation Architecture Simulator Electrical Network Architecture Simulator Hybrid Network “Random” Messages Real Applications Studied Communications Performance Power Process to Processor Mapping Results Conclusions and Future Work

Motivation Manycore: NoCs key to translating raw performance  sustained performance Electrical NoC performance/energy doesn’t scale well with technology Optical NoC promising + Low energy, high bandwidth –Setup overhead Use hybrid network –Small packets  electrical NoC –Large packets  optical NoC

Objective NoC comparison studies ignore real applications  Use real application traces Compare NoC performance with simple models Study tradeoffs involved

Analytic Model Three Models –Bandwidth Model –Bandwidth + Latency Model –Bandwidth + Latency + Contention Model ELECTRICALHYBRID

Baseline Architecture Small, homogenous cores on a CMP Cores ~1.5mm x 1.5mm 22nm process, 5GHz 3D Integrated CMOS –layer for processors, layers for memory We examine two interconnect architectures to compare performance & energy efficiency

Motivation Objectives Analytical Model Baseline Architecture Motivation Architecture Simulator Electrical Network Architecture Simulator Hybrid Network “Random” Messages Real Applications Studied Communications Performance Power Process to Processor Mapping Results Conclusions and Future Work

Electrical NoC Dally’s CMesh topology Wormhole routed Virtual channels Single electrical layer with multiple memory layers

Electrical Simulator (1/2) Processor –destination processors take flits out of the network as soon as they arrive Router –XY dimension order routing –Express links on periphery –Virtual channel wormhole routing –Credit based flow control –8 input ports  8x8 switch

Electrical Simulator (2/2) Channels –Buffering at both ends –Maximum wire length = side of processor core

Motivation Objectives Analytical Model Baseline Architecture Motivation Architecture Simulator Electrical Network Architecture Simulator Hybrid Network “Random” Messages Real Applications Studied Communications Performance Power Process to Processor Mapping Results Conclusions and Future Work

Hybrid NoC Mesh Topology Electrical “Control Network” (ECN) on Processor Plane Multiple optical networks on Photonic Plane Small setup messages on ECN and bulk data transfer on optical network

Blocking Optical Switch On  message turns No inactive powerconsumption Small switching cost Small active power while switched on Capable of routing a single path from any source to any destination

Deadlock in Hybrid NoC Blocking 4x4 switch –only one path can be routed at a time through a switch Deadlock is a known issue in circuit switching –Exponential backoff –Dimension order routing –Optical network choice

Hybrid Simulator (1/2) 1:1 processor to electrical router mapping –Each electrical router buffers up to 8 path setup messages from its corresponding processor Path setup packets are minimally sized: take one cycle to traverse between 2 routers Energy includes E-O-E conversions

Hybrid Simulator (2/2)

Motivation Objectives Analytical Model Baseline Architecture Motivation Architecture Simulator Electrical Network Architecture Simulator Hybrid Network “Random” Messages Real Applications Studied Communications Performance Power Process to Processor Mapping Results Conclusions and Future Work

Synthetic Traces Random messages Nearest-Neighbor Bitreverse Tornado –explicitly designed to stress mesh networks Results will be in final report

Real Applications SPMD style applications Widely Used in scientific community Broken into multiple phases of communication –implicit barrier is assumed at the end of a communication phase

Motivation Objectives Analytical Model Baseline Architecture Motivation Architecture Simulator Electrical Network Architecture Simulator Hybrid Network “Random” Messages Real Applications Studied Communications Performance Power Process to Processor Mapping Results Conclusions and Future Work

Parameter Exploration: Electrical NoC Total buffer size = #vcs X buffer size  router area Small total buffer size good enough!

Parameter Exploration: Hybrid NoC Sensitive to path multiplicity more available paths = less contention Timeouts prevent over- and under-waiting

Performance

Energy

Process-Processor Mapping (1/2)

Process-Processor Mapping (2/2)

NoC as Part of a System Use Merrimac FP unit numbers Scale to 22nm using ITRS roadmap Trace methodology records FP Operations Compare energy used in FP unit vs energy used in interconnect

Motivation Objectives Analytical Model Baseline Architecture Motivation Architecture Simulator Electrical Network Architecture Simulator Hybrid Network “Random” Messages Real Applications Studied Communications Performance Power Process to Processor Mapping Results Conclusions and Future Work

Conclusions Models accurately predict both performance and energy consumption Hybrid NoC: Majority of energy due to Optical-to- Electrical and Electrical-to-Optical conv. (>94%). Additional hardware to the hybrid NoC  low energy expended (in contrast to electrical NoC) Process-to-processor mapping can significantly impact performance as well as energy consumption. –Finding the optimal mapping is not always of utmost importance— making sure not to use a ‘bad’ mapping is. Bulk-synchronous apps use most of their energy in FP ops

Future Work Non-blocking optical mesh interconnection network Account for data transfer onto chip More accurate full system simulators (for both performance and energy) –simulate FP operations & memory traffic More real applications including those that are not SPMD (with overlap of computation and communication) Explore applications with less synchronous communication models –Not SPMD –Overlap of computation and communication

QUESTIONS

Acknowledgements John Shalf (CRD/NERSC, Lawrence Berkeley National Labs) Prof. John Kubiatowicz (UC Berkeley Computer Science Dept) Dr. Keren Bergman (Columbia University) BeBOP Research Group (UC Berkeley Computer Science Dept)

References [1] Assaf Shacham, Keren Bergman, and Luca Carloni. On the Design of a Photonic Network-on- Chip. In Proceedings of the First International Symposium on Networks-on-Chip, [2] James Balfour, and William Dally. Design Tradeoffs for Tiled CMP On-Chip Networks. In Proceedings of the International Conference on Supercomputing, [3] Shoaib Kamil, Ali Pinar, Daniel Gunter, Michael Lijewski, Leonid Oliker, and John Shalf. Reconfigurable Hybrid Interconnection for Static and Dynamic Applications. In Proceedings of the ACM International Conference on Computing Frontiers, [4] Bergman et. al.. Topology Exploration for Photonic NoCs for Chip Multiprocessors. Unpublished to date. [5] Cactus Homepage [6] Z. Lin, S. Ethier, T.S. Hahm, and W.M. Tang. Size Scaling of Turbulent Transport in Magnetically Confined Plasmas. Phys. Rev. Lett., 88, [7] Julian Borrill, Jonathan Carter, Leonid Oliker, David Skinner, and R. Biswas. Integrated performance monitoring of a cosmology application on leading hec platforms. In Proceedings of the International Conference on Parallel Processing (ICPP), [8] A. Canning, L.W. Wang, A. Williamson, and A. Zunger. Parallel Empirical Pseudopotential Electronic Structure Calculations for Million Atom Systems. J. Comput. Phys., 160:29, [9] Xiaoye S. Li and James W. Demmel. SuperLU-dist: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems. ACM Trans. Mathematical Software, 29(2):110140, June [10] J. Qiang, M. Furman, and R. Ryne. A Parallel Particle-in-Cell Model for Beam-Beam Interactions in High Energy Ring Colliders. J. Comp. Phys., 198, [11] IPM Homepage http://